Business Intelligence Is In the Details
November 11, 2009
• Neil Raden – Hired Brains
• George Chalissery – hMetrix
• Ty Alevizos – Tableau
• David Menninger – Vertica Systems

Agenda
• Introduction
• Neil Raden: BI Is In the Details
• Introductory Tableau demo using Vertica
• hMetrix combining Tableau & Vertica to analyze detailed healthcare data
• The Three C’s of Vertica
• Questions and answers

Vertica Analytic Database
• Built for today’s BI workloads
  – Blinding performance
    • 50-200x faster query performance
    • Up to 5TB/hour load times
  – MPP/column database scales from GBs to PBs
  – Aggressive compression reduces storage
  – Leverage your current ETL & reporting tools
• Ultimate deployment flexibility
  – Natively on commodity Linux hardware
  – As an appliance through HP and Dell channels
  – In public clouds: Amazon EC2, etc.
  – In “private” clouds on any VMware platform
• 100+ customers
  – Fastest growing customer base in the analytic DBMS market

BI Is in the Details
How columnar technology leverages the latent value in SQL databases
Presented by: Neil Raden, Hired Brains, Inc.
Hired Brains Research
http://www.hiredbrains.com
© Hired Brains Inc. All rights reserved

“Adequate” Information Systems
• "... An adequate information system has to include information that makes executives question assumptions. It must lead them to ask the right questions, not just feed them the information they expect. It demands that they obtain that information on a regular basis. It requires that they systematically integrate the information into their decision making." - Peter Drucker

3 Important Questions
• Are you unable to combine analytics with operational systems because of performance and capacity constraints in your data warehouse and BI tools?
• Are your analysts confined to only aggregated and/or subsetted data?
• Do you need DBAs to tune your most interesting queries for performance?

At the Breaking Point
• Data warehouse “clients” now include unattended agents
• Data warehouses are stretched by four performance limitations:
  – Data preparation latency
  – Query latency
  – Enhancement latency
  – Query governors
• Existing performance enhancers (physical schema, indexing and aggregation) will not be enough

New Rules
• Bring the application to the data, not the data to the application
• A database that stores large amounts of data should be able to process it
• Only use tools that leverage the latent value of data in a data warehouse

Using SQL for Analytics & Visualization
Tools capable of generating SQL robust enough and fast enough to interactively support complex analysis are called ROLAP, or Relational OLAP

OLAP Varieties
• ROLAP – Uses the relational database to enable analysis without intermediate and aggregated data
• MOLAP – Must extract data and reformat it into proprietary structures such as Microsoft Analysis Services or Essbase

MOLAP vs ROLAP
The spectrum of performance
[Chart: MOLAP and ROLAP compared on a 0-100 scale across Summary, Detailed and Operational categories]

ROLAP Solution
• Optimizes its SQL-generating engine for each database
• Pushes processing as far into the database as possible, keeping the engine light
• Cube-based OLAP held a performance advantage, but without scale

Two ROLAP Approaches

Data Warehouse Performance
• A logical solution to data warehouse performance today should be non-intrusive and complementary to the existing environment
• This requires database technology with a different architecture that provides at least one order of magnitude better performance

What Columnar Databases Do
• In simple terms they invert a database from a row-oriented to a column-oriented format
• Far superior for analytic work
• Vertica, for example:
  – Shared-nothing MPP residing on commodity hardware
  – Aggressive data compression = smaller, faster databases
  – Auto-administration: design, optimization, failover and recovery
  – Concurrent loading and querying

Normal Business Questions
• People who have not been exposed to BI ask normal, even mundane questions that stagger data warehouses:
  – Project profits and inventory levels by yesterday’s three pricing models.
  – Show me, by drug type, by prescriber type, by duration from prescription date, the number of adverse reactions grouped into clusters.
  – Will this plan work or will we get killed?

Convergence is Here
History of the Operational/Analytical Rift
[Timeline, 1950 to 2010. Labels: Batch Reporting, CICS/OLTP, 4GL/PC/SS, C/S OLTP, DW/BI, Y2K/ERP, Operational BI, SaaS/Cloud, Semantics, Process Intelligence, Decision Automation, Convergence]

BI Going Operational
• Provide detail to a rules engine or predictive model, e.g. for fraud detection
• Generate a real-time recommendation
• Issue POs for a continuous replenishment system
• Sift through streaming Web clickstream data to dynamically reconfigure web sites

Market Basket Analysis
• Determines which products are complementary to those purchased
• Can then change promotions, placement of products or, on the Web, make offers in real time
• Very difficult problem for simple OLAP models; good candidate for ROLAP

Visualization of Intra-Day Data

Visualization of Demographic Data

Adoption Levels Reveal Data Warehouse Success

Conclusion
• Until recently, data warehouse performance was considered a manageable IT issue
• But business conditions and expectations, driven by technology, changed
• Only fresh solutions like those from Vertica, using ROLAP and visualization technology, can address these needs

Neil Raden
President & Practice Director
Hired Brains, Inc.
Email: [email protected]
White papers: www.hiredbrains.com/Whitepapers.htm
LinkedIn: www.linkedin.com/in/neilraden
Blog: http://www.intelligententerprise.com/blog/nraden.html
(Office) +1 805 962 7391 GMT - 08:00 Pacific Time
(Mobile) +1 805 284 2322
Twitter: neilraden

Tableau Software, Inc.
Company and Leadership
Based on a breakthrough from Stanford University, Tableau makes visual analytics and business intelligence software that delivers:
• 10-100X productivity improvements
• amazing multi-dimensional discoveries
• web analytics at 1/10th the cost of a “BI Platform”
The company is headquartered in Seattle, WA.

Customers
+ Google
+ Allstate
+ Cornell
+ Harvard
+ Apple
+ NSA
+ Microsoft
+ 1,000’s more

Key Partners
+ Oracle OEM
+ Microsoft Gold Certified
+ Teradata Partner

More information is available at www.tableausoftware.com/vertica
All rights reserved. © 2009 Tableau Software Inc.

hMetrix
November 11th, 2009

About hMetrix
• Analytical services firm, exclusively focused on the healthcare domain.
• Diverse Portfolio – Health Plans, Employers, Government, Policy & Research Firms, Providers, Managers, Pharma.
• Healthcare business knowledge with analytical expertise.
• Small, well integrated team with very productive tools.
• Limited IT or Systems resources.
2009 hMetrix

Why Tableau & Vertica
• Clients wanted direct access to clean data in addition to reports.
• Needed a solution capable of handling large datasets.
• Needed a solution that was easy to use for the end-user.
• Needed a solution that could be implemented & deployed in house.
• Evaluated several options before selecting Tableau & Vertica.

Typical Client Request
• Data is manufactured but intended to represent real life.
• Client for this demo is a firm that specializes in the delivery of optimal care to diabetic patients.
• Need to identify patients who are diabetic in population.
• Identification based on diagnosis codes from administrative data.
• Exclude patients with gestational diabetes from the analysis.
• Live Demo.

The Vertica Difference – The 3 C’s
• Column Store
• Compression and Encoding
• Clustered MPP architecture with High Availability

Column Storage
• Ideal for read-intensive workloads
  – Reduces disk IO by orders of magnitude

    SELECT avg(price) FROM tickstore WHERE symbol = 'GM' and date = '1/17/2008'

[Figure: for this query the column store reads 3 columns, while the row store reads all columns. Sample rows show symbols GM and AAPL, exchanges NYSE and NQDS, prices 30.77, 30.79 and 93.24, and date 1/17/08]

Query-able Encoding and Compression
• Txn Date: Run-length Encoding (few values, sorted). The 16 rows are stored as "1/17/2007, 16".
• Customer ID: Delta Encoding (many values, sorted).
• Trade: Float Compression (many values, unsorted).

Txn Date    Customer ID  Customer ID (delta encoded)  Trade
1/17/2007   0000001      0000001                      100.99
1/17/2007   0000001      0                            75.66
1/17/2007   0000003      2                            36.93
1/17/2007   0000003      2                            146.88
1/17/2007   0000005      4                            283.39
1/17/2007   0000011      10                           93.40
1/17/2007   0000011      10                           23.21
1/17/2007   0000020      19                           344.44
1/17/2007   0000026      25                           21.30
1/17/2007   0000050      49                           23.92
1/17/2007   0000051      50                           50.22
1/17/2007   0000052      51                           38.22
1/17/2007   0000053      52                           21.92
1/17/2007   0000068      67                           74.26
1/17/2007   0000069      68                           152.49
1/17/2007   0000071      70                           89.23

• Greater compression with similar data in columns
• Vertica applies multiple compressions
  – Dependent upon data
  – System chooses which to apply
• Vertica Customer Experience
  – Call detail records (CDR) – 8:1 (87%)
  – Consumer data – 30:1 (96%)
  – Marketing analytics – 20:1 (95%)
  – Network logging – 60:1 (98%)
  – Switch-level SNMP – 13:1 (92%)
  – Trade and quote exchange – 5:1 (80%)
  – Trade execution auditing trails – 10:1 (90%)
  – Weblog and click-stream – 10:1 (90%)
• Vertica queries data in encoded form resulting in less I/O, less RAM

Scale Out on Industry-Standard Hardware
• Seamlessly add more servers to boost query or load speed or to increase capacity
• No additional license fees for additional hardware
• Flexibility to add capacity in increments of 1 or more servers
[Figure: example cluster capacities of <= 5 TB, <= 15 TB and 40 TB]

Discussion
• Email [email protected] to:
  – Request a copy of these slides
  – Request a free Vertica evaluation
• Info on Tableau & Vertica:
  – TableauSoftware.com/Vertica
  – Vertica.com/Tableau
• Contacts:
  – Neil Raden – [email protected]
  – George Chalissery – [email protected]
  – Ty Alevizos – [email protected]
  – Dave Menninger – [email protected]
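
Supplementary sketch: the run-length and delta encodings on the "Query-able Encoding and Compression" slide can be illustrated with a few lines of Python. This is a minimal sketch that reproduces the slide's sample values only; it is not Vertica's implementation, and the function names (rle_encode, rle_count_equal, delta_encode, delta_decode) are hypothetical names chosen for this example. The delta scheme here stores each value as an offset from the first value, which is what the slide's numbers appear to show.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs.

    Works best on sorted columns with few distinct values.
    """
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_count_equal(encoded, target):
    """Count rows equal to target by scanning runs, without expanding them."""
    return sum(length for value, length in encoded if value == target)


def delta_encode(values):
    """Keep the first value as a base and store later values as offsets from it."""
    base = values[0]
    return base, [v - base for v in values[1:]]


def delta_decode(base, offsets):
    """Rebuild the original column from the base value and its offsets."""
    return [base] + [base + offset for offset in offsets]


# Sample columns taken from the slide (customer IDs shown without zero padding).
txn_dates = ["1/17/2007"] * 16
customer_ids = [1, 1, 3, 3, 5, 11, 11, 20, 26, 50, 51, 52, 53, 68, 69, 71]

encoded_dates = rle_encode(txn_dates)
base, offsets = delta_encode(customer_ids)

print(encoded_dates)   # one (value, run_length) pair for the whole column
print(base, offsets)   # base value followed by small offsets
```

The rle_count_equal helper hints at why querying data in encoded form saves I/O and RAM: a filter on the date column only needs to inspect one run rather than 16 separate rows.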