Business Intelligence Is In the Details

November 11, 2009
• Neil Raden – Hired Brains
• George Chalissery – hMetrix
• Ty Alevizos – Tableau
• David Menninger – Vertica Systems
Agenda
• Introduction
• Neil Raden: BI Is In the Details
• Introductory Tableau demo using Vertica
• hMetrix combining Tableau & Vertica to analyze
detailed healthcare data
• The Three C’s of Vertica
• Questions and answers
Vertica Analytic Database
• Built for today’s BI workloads
– Blinding performance
• 50-200x faster query performance
• Up to 5TB/hour load times
– MPP/column database scales from GBs to PBs
– Aggressive compression reduces storage
– Leverage your current ETL & reporting tools
• Ultimate deployment flexibility
– Natively on commodity Linux hardware
– As an appliance thru HP, Dell channels
– In public clouds: Amazon EC2, etc.
– In “private” clouds on any VMware platform
100+ customers – Fastest
growing customer base in the
analytic DBMS market
BI Is in the Details
How columnar
technology leverages
the latent value in SQL
databases
Presented by:
NEIL RADEN
Hired Brains, Inc.
Hired Brains Research
http://www.hiredbrains.com
© Hired Brains Inc. All rights reserved
“Adequate” Information Systems
• "... An adequate information system has to
include information that makes executives
question assumptions. It must lead them to
ask the right questions, not just feed them the
information they expect. It demands that they
obtain that information on a regular basis. It
requires that they systematically integrate
the information into their decision
making." - Peter Drucker
3 Important Questions
• Are you unable to combine analytics
with operational systems because of
performance and capacity constraints
in your data warehouse and BI tools?
• Are your analysts confined to only
aggregated and/or subsetted data?
• Do you need DBA’s to tune your most
interesting queries for performance?
At the Breaking Point
• Data warehouse “clients” now
include unattended agents
• Data warehouses are stretched
by four performance limitations:
– Data preparation latency
– Query latency
– Enhancement latency
– Query governors
• Existing performance enhancers (physical schema, indexing and
aggregation) will not be enough
New Rules
• Bring the application to the data, not the
data to the application
• A database that stores large/huge amounts of
data should be able to process it
• Only use tools that leverage the latent
value of data in a data warehouse
Using SQL for Analytics &
Visualization
Tools capable of generating SQL robust enough and fast
enough to interactively support complex analysis are called
ROLAP, or Relational OLAP
OLAP Varieties
• ROLAP – Uses the relational database to
enable analysis without intermediate and
aggregated data
• MOLAP – must extract data and reformat it
into proprietary structures such as
Microsoft AS or Essbase
MOLAP vs ROLAP
The spectrum of performance
[Chart: relative performance of MOLAP and ROLAP, on a scale of 0 to 100, across Summary, Detailed and Operational data]
ROLAP Solution
• Optimizes its SQL-generating engine for
each database
• Pushes processing as far into the
database as possible, keeping engine light
• Cube-based OLAP held a performance
advantage, but without scale
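As a rough illustration of the ROLAP idea on this slide, the minimal Python sketch below generates an aggregate SQL statement and leaves the grouping to the database, so the client receives only the summarized rows. It uses Python's built-in sqlite3 module as a stand-in for the data warehouse. The table `sales`, its columns, and the helper `build_rollup_sql` are hypothetical names chosen for this example and are not taken from any ROLAP product.

```python
import sqlite3


def build_rollup_sql(table, dimensions, measure):
    """Return a GROUP BY statement that sums `measure` per combination of `dimensions`."""
    # Identifiers cannot be bound as query parameters, so accept only plain names.
    for name in [table, measure, *dimensions]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({measure}) AS total "
        f"FROM {table} GROUP BY {dims} ORDER BY {dims}"
    )


# Hypothetical detail-level data; a real warehouse would hold far more rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "widget", 100),
        ("East", "gadget", 50),
        ("West", "widget", 70),
        ("West", "widget", 30),
    ],
)

# The aggregation runs inside the database; only summary rows come back.
sql = build_rollup_sql("sales", ["region"], "amount")
by_region = conn.execute(sql).fetchall()
by_region_product = conn.execute(
    build_rollup_sql("sales", ["region", "product"], "amount")
).fetchall()

print(sql)
print(by_region)
print(by_region_product)
```

Drilling from region down to region and product only changes the generated statement; no intermediate cube or extract is built in the client.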
Two ROLAP Approaches
Data Warehouse Performance
• Logical solution to data warehouse performance
today should be non-intrusive and
complementary to the existing environment
• This requires database technology with a
different architecture and providing at least one
order of magnitude better performance
What Columnar Databases Do
• In simple terms they invert a database from a
row-oriented to a column-oriented format
• Far superior for analytic work
• Vertica, for example:
– Shared-nothing MPP running on commodity hardware
– Aggressive data compression = smaller, faster
databases
– Auto-administration: design, optimization, failover and
recovery
– Concurrent loading and querying
Normal Business Questions
• People who have not been exposed to BI
ask normal, even mundane questions that
stagger data warehouses:
Project profits and inventory levels by
yesterday’s three pricing models.
Show me, by drug type, by prescriber
type, by duration from prescription
date, the number of adverse
reactions grouped into clusters
Will this plan work or will we get
killed??
Convergence is Here
History of the Operational/Analytical Rift
[Timeline, 1950 to 2010. Labels: Batch Reporting, CICS/OLTP, 4GL/PC/SS, C/S OLTP, DW/BI, Y2K/ERP, Operational BI, SaaS/Cloud, Process Intelligence, Semantics, Decision Automation, Convergence]
BI Going Operational
• Provide detail to a rules engine or
predictive model, e.g. for fraud detection
• Generate a real-time recommendation
• Issue PO’s for a continuous replenishment
system
• Sift through streaming Web clickstream
data to dynamically reconfigure web sites
Market Basket Analysis
• Determines which products are
complementary to those purchased
• Can then change promotions, placement
of products or, on the Web, make offers in
real-time
• Very difficult problem for simple OLAP
models; good candidate for ROLAP
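One way to picture why this suits a relational approach: counting which products appear together is a self-join over detail-level sales rows, which the database can run directly. The sketch below uses Python's built-in sqlite3 module as a stand-in database. The `sales` table, its columns, and the basket contents are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (basket_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (1, "bread"), (1, "butter"), (1, "milk"),
        (2, "bread"), (2, "butter"),
        (3, "bread"), (3, "milk"),
        (4, "butter"), (4, "milk"),
        (5, "bread"), (5, "butter"),
    ],
)

# Join each basket to itself; a.product < b.product keeps one row per
# unordered pair and drops a product paired with itself.
pairs = conn.execute(
    """
    SELECT a.product, b.product, COUNT(*) AS baskets
    FROM sales AS a
    JOIN sales AS b
      ON a.basket_id = b.basket_id AND a.product < b.product
    GROUP BY a.product, b.product
    ORDER BY baskets DESC, a.product, b.product
    """
).fetchall()

print(pairs)
```

The pair with the highest basket count is the strongest candidate for joint promotion or placement. A production analysis would also weigh how often each product sells on its own.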
Visualization of Intra-Day Data
Visualization of Demographic Data
Adoption Levels Reveal Data
Warehouse Success
Conclusion
• Until recently, data warehouse
performance was considered a
manageable IT issue
• But business conditions and expectations,
driven by technology, changed
• Only fresh solutions like those from
Vertica, using ROLAP and visualization
technology, can address these needs
Neil Raden
President & Practice Director
Hired Brains, Inc.
Email: [email protected]
White papers: www.hiredbrains.com/Whitepapers.htm
LinkedIn: www.linkedin.com/in/neilraden
Blog: http://www.intelligententerprise.com/blog/nraden.html
(Office) +1 805 962 7391 GMT - 08:00 Pacific Time
(Mobile) +1 805 284 2322
Twitter: neilraden
Tableau Software, Inc.
Company and Leadership
Based on a breakthrough from Stanford University,
Tableau makes visual analytics and business intelligence
software that delivers:
• 10-100X productivity improvements
• amazing multi-dimensional discoveries
• web analytics at 1/10th the cost of a “BI Platform”
Customers
+ Google
+ Allstate
+ Cornell
+ Harvard
+ Apple
+ NSA
+ Microsoft
+ 1,000’s more
The company is headquartered in Seattle, WA.
Key Partners
+ Oracle OEM
+ Microsoft Gold Certified
+ Teradata Partner
More information is available at
www.tableausoftware.com/vertica
All rights reserved. © 2009 Tableau Software Inc.
hMetrix
November 11th, 2009
About hMetrix
• Analytical services firm, exclusively focused on the healthcare
domain.
• Diverse Portfolio – Health Plans, Employers, Government, Policy &
Research Firms, Providers, Managers, Pharma.
• Healthcare business knowledge with analytical expertise.
• Small, well integrated team with very productive tools.
• Limited IT or Systems resources.
2009 hMetrix
Why Tableau & Vertica
• Clients wanted direct access to clean data in addition to reports.
• Needed a solution capable of handling large datasets.
• Needed a solution that was easy to use for the end-user.
• Needed a solution that could be implemented & deployed in house.
• Evaluated several options before selecting Tableau & Vertica.
Typical Client Request
• Data is manufactured but intended to represent real life.
• Client for this demo is a firm that specializes in the delivery of
optimal care to diabetic patients.
• Need to identify the diabetic patients in the population.
• Identification based on diagnosis codes from administrative data.
• Exclude patients with gestational diabetes from the analysis.
• Live Demo.
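The slides do not show the logic used in the live demo, so the following is only a minimal sketch of the kind of selection described above: flag patients by diagnosis code, then drop those with a gestational diabetes code. The code prefixes are an assumption based on ICD-9-CM conventions (250.xx for diabetes mellitus, 648.8x for gestational diabetes); the actual code sets, the `claims` rows, and the function name `diabetic_cohort` are hypothetical.

```python
# Hypothetical administrative claims rows: (patient_id, diagnosis_code).
claims = [
    ("P1", "250.00"),
    ("P1", "401.9"),
    ("P2", "648.83"),
    ("P2", "250.00"),
    ("P3", "401.9"),
    ("P4", "250.02"),
]

# Assumed ICD-9-CM prefixes; the presentation does not list its code sets.
DIABETES_PREFIX = "250."
GESTATIONAL_PREFIX = "648.8"


def diabetic_cohort(rows):
    """Return patients with a diabetes code and no gestational diabetes code."""
    diabetic = {pid for pid, code in rows if code.startswith(DIABETES_PREFIX)}
    gestational = {pid for pid, code in rows if code.startswith(GESTATIONAL_PREFIX)}
    return sorted(diabetic - gestational)


cohort = diabetic_cohort(claims)
print(cohort)
```

Here P2 is excluded because a gestational code appears anywhere in that patient's history, which is one possible reading of the exclusion rule.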
The Vertica Difference – The 3 C’s
• Column Store
• Compression and Encoding
• Clustered MPP architecture with
High Availability
Column Storage
• Ideal for read-intensive workloads
– Reduces disk IO by orders of magnitude
SELECT avg(price)
FROM tickstore
WHERE symbol = 'GM' AND date = '1/17/2008'
Column Store: reads 3 columns
Row Store: reads all columns
[Figure: sample tickstore values. Symbols: GM, GM, GM, AAPL. Exchanges: NYSE, NYSE, NYSE, NQDS. Prices: 30.77, 30.77, 30.79, 93.24. Date: 1/17/08]
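The difference in I/O can be sketched with a toy model that counts how many stored values each layout has to touch to answer the average-price query. This is an illustration only, not a description of Vertica's storage engine. The sample values are those shown on the slide, paired by position; the column name `exchange` and both function names are assumptions, and the date literal is written the way the sample data shows it.

```python
# Sample values from the slide, paired by position into rows.
rows = [
    {"symbol": "GM", "exchange": "NYSE", "price": 30.77, "date": "1/17/08"},
    {"symbol": "GM", "exchange": "NYSE", "price": 30.77, "date": "1/17/08"},
    {"symbol": "GM", "exchange": "NYSE", "price": 30.79, "date": "1/17/08"},
    {"symbol": "AAPL", "exchange": "NQDS", "price": 93.24, "date": "1/17/08"},
]

# Column layout: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_price_row_store(table, symbol, date):
    """Scan whole rows; every field of every row is read."""
    values_read = 0
    prices = []
    for row in table:
        values_read += len(row)
        if row["symbol"] == symbol and row["date"] == date:
            prices.append(row["price"])
    return sum(prices) / len(prices), values_read


def avg_price_column_store(cols, symbol, date):
    """Read only the columns the query references."""
    needed = ("symbol", "date", "price")
    values_read = sum(len(cols[name]) for name in needed)
    prices = [
        price
        for sym, day, price in zip(cols["symbol"], cols["date"], cols["price"])
        if sym == symbol and day == date
    ]
    return sum(prices) / len(prices), values_read


row_avg, row_reads = avg_price_row_store(rows, "GM", "1/17/08")
col_avg, col_reads = avg_price_column_store(columns, "GM", "1/17/08")
print(row_avg, row_reads)
print(col_avg, col_reads)
```

With only four columns the saving is small. On a wide table the row store still reads every column, while the column store still reads three.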
Query-able Encoding and Compression
Txn Date: Run-length Encoding (Few values, sorted)
1/17/2007 repeated 16 times is stored as "1/17/2007, 16"
Customer ID: Delta Encoding (Many values, sorted)
0000001, 0000001, 0000003, 0000003, 0000005, 0000011, 0000011, 0000020, 0000026, 0000050, 0000051, 0000052, 0000053, 0000068, 0000069, 0000071
is stored as 0000001 followed by the offsets 0, 2, 2, 4, 10, 10, 19, 25, 49, 50, 51, 52, 67, 68, 70
Trade: Float Compression (Many values, unsorted)
100.99, 75.66, 36.93, 146.88, 283.39, 93.40, 23.21, 344.44, 21.30, 23.92, 50.22, 38.22, 21.92, 74.26, 152.49, 89.23
• Greater compression with similar
data in columns
• Vertica applies multiple
compressions
– Dependent upon data
– System chooses which to apply
• Vertica Customer Experience
– Call detail records (CDR) – 8:1 (87%)
– Consumer data – 30:1 (96%)
– Marketing analytics – 20:1 (95%)
– Network logging – 60:1 (98%)
– Switch-level SNMP – 13:1 (92%)
– Trade and quote exchange – 5:1 (80%)
– Trade execution auditing trails – 10:1 (90%)
– Weblog and click-stream – 10:1 (90%)
• Vertica queries data in encoded
form resulting in less I/O, less RAM
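The two sorted-column encodings and the ratio-to-percentage arithmetic on this slide can be sketched in a few lines of Python. This is a simplified illustration and not Vertica's implementation. The delta variant follows the slide's figure, which stores offsets from the first value rather than differences between neighbors. All function names are hypothetical, and the percentages are truncated to whole numbers because that matches the figures quoted on the slide.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode: one (value, run_length) pair per run of equal values."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(sorted_values):
    """Keep the first value as a base and store later values as offsets from it."""
    base = sorted_values[0]
    return base, [value - base for value in sorted_values[1:]]


def delta_decode(base, offsets):
    """Rebuild the original values from the base and its offsets."""
    return [base] + [base + offset for offset in offsets]


def percent_saved(ratio):
    """Space saved by a ratio:1 compression, truncated to a whole percent."""
    return 100 * (ratio - 1) // ratio


txn_dates = ["1/17/2007"] * 16
customer_ids = [1, 1, 3, 3, 5, 11, 11, 20, 26, 50, 51, 52, 53, 68, 69, 71]

encoded_dates = rle_encode(txn_dates)
base, offsets = delta_encode(customer_ids)

print(encoded_dates)
print(base, offsets)
print([percent_saved(r) for r in (8, 30, 20, 60, 13, 5, 10)])
```

Both encodings are lossless, so decoding returns the original column. Small offsets need fewer bits than full identifiers, which is where the saving on sorted columns comes from.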
Scale Out on Industry-Standard Hardware
• Seamlessly add more
servers to boost query or
load speed or to increase
capacity
• No additional license fees
for additional hardware
• Flexibility to add capacity in
increments of 1 or more
servers
[Figure labels: <= 5 TB, <= 15 TB, 40 TB]
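A shared-nothing MPP design can answer an aggregate query by having each server work only on its own slice of the data and then merging small partial results. The sketch below models that idea in plain Python and is not a description of how Vertica distributes data. The node count, the `trades` rows with their trade ids, and the function names are all hypothetical.

```python
import zlib

# Hypothetical rows: (trade_id, symbol, price).
trades = [
    ("t1", "GM", 30.77),
    ("t2", "GM", 30.77),
    ("t3", "GM", 30.79),
    ("t4", "AAPL", 93.24),
]


def node_for(key, nodes):
    """Pick a node with a deterministic hash of the row key."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def distribute(rows, nodes):
    """Assign every row to exactly one node; nodes share nothing."""
    partitions = [[] for _ in range(nodes)]
    for row in rows:
        partitions[node_for(row[0], nodes)].append(row)
    return partitions


def local_aggregate(partition, symbol):
    """Each node returns only a partial (sum, count) for its own rows."""
    prices = [price for _, sym, price in partition if sym == symbol]
    return sum(prices), len(prices)


def cluster_average(rows, symbol, nodes):
    """Merge the per-node partials into the final average."""
    partials = [local_aggregate(p, symbol) for p in distribute(rows, nodes)]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


print(cluster_average(trades, "GM", nodes=1))
print(cluster_average(trades, "GM", nodes=3))
```

The answer does not depend on the number of nodes; adding servers only shrinks the slice each one has to scan.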
Discussion
• Email [email protected] to:
– Request a copy of these slides
– Request a free Vertica evaluation
• Info on Tableau & Vertica:
– TableauSoftware.com/Vertica
– Vertica.com/Tableau
• Contacts:
– Neil Raden – [email protected]
– George Chalissery – [email protected]
– Ty Alevizos – [email protected]
– Dave Menninger – [email protected]