Analytics Drives Big Data Drives Infrastructure
Confessions of Storage-Turned-Analytics Geeks
Dr. Aloke Guha
29th IEEE Conference on Massive Data Storage
May 8th, 2013
What’s Common Between
a Sensor that could Distinguish a fine Cognac,
and Predicting Movies You’d Like on Netflix?
Aloke Guha: Analytics Drives Big Data Drives Infrastructure, 29th IEEE MSST 2013
The Sommelier “Robot”
Predicting What Movies You’d Watch
(Analytics, BigData, DataStore)+
Many Analytics Techniques . . .
• SNARC (Minsky) 1951
• AI (McCarthy) 1956
• Dendral (Feigenbaum) 1965
• Neural Networks
• Expert Systems
• SVM, LDA . . . Vapnik (1992)
• Linear
• Machine Learning
• Naïve Bayes
• Decision Trees
• Regression
• Random Forests
• Time-Series
• Statistics
• Genetic Algorithms
• K-nearest neighbor
• . . . Fraser and Burnell (1970)
• R: Ihaka and Gentleman (1993)
Common Analytics Processing pre-2000
• Sources: Local
• Data: Numeric, Homogeneous
• Processing: Local
• Consumer: Local
• Analytics: Linear/Non-Linear Regression, Neural Networks, SVM, LDA, LSA, Decision Trees, Monte Carlo, Lin-Ops, Expert Systems . . .
Flavor Predictor – Neural Networks
USPTO #5,373,452 (1994)
1988
Pattern Recognition – Genetic Algorithms
USPTO #5,140,530 (1992)
Small to Big
http://article.wn.com/view/2013/04/04/Big_data_forefather_Michael_Stonebraker_shows_no_signs_of_sl/#/related_news
Typical Analytics: 2000-2006
• Sources: Global, Social Networks
• Data: Heterogeneous, Numeric, Text
• Processing: Hosted/Scale
• Consumer: Global
• Analytics: Batch Mode, Social Media Marketing, Churn Detection, Sentiment Analysis, etc.
2007- : Internet Data Analytics
Financial Risk Scoring: Detect
Risk Scoring: detect incremental change in the number of occurrences where corporate officers mention “risk” (or equivalent terms) during the earnings call
Financial Risk Scoring: Listen
*Risk Scoring: detect incremental change in occurrences where corporate officers
mention “risk” (or semantically equivalent terms) during the corporate earnings call
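
The risk-scoring idea on these two slides can be illustrated with a minimal Python sketch. It assumes earnings-call transcripts are already available as plain text, and it uses a small hand-picked term list (`RISK_TERMS`, hypothetical) in place of whatever semantic matching the original system used, which the slides do not describe.

```python
import re
from collections import Counter
from typing import List

# Hypothetical term list: the slides only say "risk" or semantically
# equivalent terms, so these entries are illustrative assumptions.
RISK_TERMS = {"risk", "risks", "exposure", "uncertainty", "headwind", "headwinds"}


def risk_mentions(transcript: str) -> int:
    """Count the tokens in one transcript that appear in RISK_TERMS."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(tokens)
    return sum(counts[term] for term in RISK_TERMS)


def incremental_change(transcripts: List[str]) -> List[int]:
    """Return the call-over-call change in risk mentions, oldest call first."""
    counts = [risk_mentions(t) for t in transcripts]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


calls = [
    "We see little risk this quarter.",
    "Risk remains; headwinds and uncertainty grew.",
]
print(incremental_change(calls))
```

A positive value in the returned list means officers mentioned risk-related terms more often than on the previous call. A production system would likely need phrase-level and semantic matching rather than exact tokens.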
Banking: Credit Worthiness – remember 2008?
Analyze bank reports to assess loans, payments, recoveries, etc. for key bank
indexes, groups of banks, or individual banks
Share of Voice: Online Buzz
Sentiment Analysis
Analytics Processing: 2007-
• Sources: Global, Mobile, New Social (Instagram, . . .)
• Data: Multi-Dimensional, Heterogeneous, Audio/Video
• Processing: Hosted/Scale
• Consumer: Global
• Analytics: Batch, Streaming, . . .
2008 - : Real-Time/Streaming Analytics
Brand Marketing
Brand Management
Customer Support
Customer Support
Lead Generation
. . . More Data, Faster
http://www.cioinsight.com/it-strategy/big-data/data-analytics-allows-pg-to-turn-on-a-dime/?kc=CIOMINUTE05062013CIOA
“Internet of Things”
Machine-to-Machine
Message Queuing Telemetry Transport
http://www.news-sap.com/survey-by-sap-and-harris-interactive-finds-brazil-china-germany-and-india-most-ready-form2m-technology-to-drive-connected-smarter-cities/
AumniData: Batch Processing
• Sources: Twitter, YouTube, Blog/Web Sites, RSS/ATOM Feed
• Requestor/URL Scanner
• Data Collectors (Batch Scheduled)
• Content Store
• NLP + Cruxly Intent Detection (AWS), multiple instances
• NLP Stack + AumniData Classifier + Analytics* (RackSpace VM)
• Content / Metadata Index (MySQL)
• Dashboard Store (SQL Server)
• Dashboard Application (TomCat)
• Dashboard Configuration
• Custom Analytics Display
• Ad-Hoc Query / Summary (3rd party App)
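
The batch architecture on this slide can be read as a scheduled collect, store, classify, index sequence that a dashboard then queries. The Python sketch below illustrates that reading only: the class `BatchPipeline`, its method names, and the keyword rule in `classify` are all hypothetical stand-ins, and in-memory dictionaries replace the content store, index, and NLP services named on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BatchPipeline:
    """Toy batch flow: collect -> content store -> classify -> metadata index."""

    content_store: Dict[str, str] = field(default_factory=dict)         # doc_id -> raw text
    metadata_index: Dict[str, List[str]] = field(default_factory=dict)  # label -> doc_ids

    def collect(self, source: str, documents: List[str]) -> List[str]:
        """Store one scheduled batch from a source and return the new doc ids."""
        ids = []
        for text in documents:
            doc_id = f"{source}-{len(self.content_store)}"
            self.content_store[doc_id] = text
            ids.append(doc_id)
        return ids

    def classify(self, text: str) -> str:
        """Placeholder for the NLP/intent step: a trivial keyword rule (assumption)."""
        return "buy" if "buy" in text.lower() else "other"

    def run_batch(self, source: str, documents: List[str]) -> None:
        """Process one batch end to end and update the index."""
        for doc_id in self.collect(source, documents):
            label = self.classify(self.content_store[doc_id])
            self.metadata_index.setdefault(label, []).append(doc_id)

    def summary(self) -> Dict[str, int]:
        """What a dashboard query might read: label -> document count."""
        return {label: len(ids) for label, ids in self.metadata_index.items()}


pipeline = BatchPipeline()
pipeline.run_batch("twitter", ["I want to buy a phone", "nice weather"])
pipeline.run_batch("rss", ["Buy now"])
print(pipeline.summary())
```

Nothing is visible to the dashboard until a whole batch has been collected and classified, which is the latency trade-off that motivates a streaming design.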
Cruxly: Stream Processing
• Twitter
• Tracker Editor (web app - Heroku): Request (Keywords)
• Streaming API Clients (Heroku Worker, 24x7): Tweets (Keywords)
• Tweets Content Store (DynamoDB)
• NLP (NER, etc.) + Cruxly Intent Detection (AWS), multiple instances
• Tweet ID + Intent Signal (Heroku PostgreSQL)
• Reports / Dashboard
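
In contrast to scheduled batches, the streaming design handles each tweet as it arrives. The Python sketch below shows the general shape with generators: filter by tracked keywords, run an intent detector, and emit a (tweet ID, intent signal) pair. The function names and the rule-based `detect_intent` are hypothetical; the slides do not describe how Cruxly's intent detection works.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def keyword_filter(tweets: Iterable[Dict], keywords: Set[str]) -> Iterator[Dict]:
    """Yield tweets whose whitespace-split words include a tracked keyword."""
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        if words & keywords:
            yield tweet


def detect_intent(text: str) -> str:
    """Placeholder intent detector based on simple rules (assumption)."""
    lowered = text.lower()
    if "?" in lowered:
        return "question"
    if "want" in lowered or "need" in lowered:
        return "purchase"
    return "none"


def process_stream(tweets: Iterable[Dict], keywords: Set[str]) -> Iterator[Tuple[int, str]]:
    """Emit (tweet id, intent signal) pairs one tweet at a time."""
    for tweet in keyword_filter(tweets, keywords):
        yield tweet["id"], detect_intent(tweet["text"])


stream = [
    {"id": 1, "text": "I need a new laptop"},
    {"id": 2, "text": "lunch time"},
    {"id": 3, "text": "Which laptop is best?"},
]
signals = dict(process_stream(stream, {"laptop"}))
print(signals)
```

Because every stage is a generator, no stage waits for the full input, so a signal can be stored as soon as its tweet has been processed.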
Data Analytics Demands . . .
• View: Dashboards, Chart, Report
• Query / RT Query: Ad Hoc / Search / SQL
• Analyze: Custom Analytics, Machine Learning Library, Stats Library, R
• Process: NLP, Classify, Index, Storm, Yarn
• Store
• Data Collector: Text / Sensor Data / Stream . . .
Storage Implications: Back to the Future
IOPS – Stream
MB/s – Batch
Both?
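
The tension between IOPS and MB/s comes from the relation bandwidth = IOPS × I/O size. The Python sketch below works through that arithmetic. The workload figures (4 KB stream writes, 1 MB batch reads, and the IOPS values) are illustrative assumptions, not measurements from the talk.

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth in MB/s for `iops` operations of `io_size_kb` each (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


def iops_needed(target_mb_s: float, io_size_kb: float) -> float:
    """IOPS required to sustain `target_mb_s` with operations of `io_size_kb` each."""
    return target_mb_s * 1024 / io_size_kb


# Assumed stream-style workload: many small operations.
stream_bw = throughput_mb_s(iops=20_000, io_size_kb=4)

# Assumed batch-style workload: few large sequential operations.
batch_bw = throughput_mb_s(iops=200, io_size_kb=1024)

print(stream_bw, batch_bw, iops_needed(200, 4))
```

Under these assumptions the stream workload issues 100 times more operations than the batch workload yet moves less data per second, which is why a system tuned for only one of the two metrics tends to serve the other poorly.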
Storage Implications: Back to the Future II, III
• MapReduce: Master (Job Tracker); Slave #1 . . . Slave #N (Task tracker on each)
• HDFS: Name Node; Data Nodes; HDFS client
• Mgmt Node: Zookeeper, Hive, Pig, Oozie, HUE
Storage Capacity Scaling?
Import/Export Data?
Storage Tiering?
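
For readers less familiar with the processing model this cluster runs, the following single-process Python sketch shows the map, shuffle, and reduce phases using word count. It is only a model of the programming pattern; it does not use Hadoop's APIs, and the function names are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big storage"]))
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data between them, so input, intermediate, and output data all pass through storage. That is where the capacity, import/export, and tiering questions above arise.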
A More General Data Analytics Framework?
• Visualization Library / Interactive Query
• Analytics Processing
• Processing: Stream and Batch
• Map Reduce / Distributed Data Store
• Metadata / In-Mem Store
• Content Store
• Data Ingesters (Smart): Sensor Processing, Data Integration
• Data Ingesters (Basic)
• Storage: SAN; Local Storage / Flash / DAS
Conclusion
• Data Analytics → Big Data → Scale-Out
• Variety → Infrastructure
• Volume → Bandwidth Support
• Velocity → Streaming Support
• We Solved the Processing Problem
• We Need to Solve the Larger Storage Problem
Grateful Acknowledgements
• Kapil Tundwal
• Venky Madireddy
• Dr. Kirill Kireyev
• Dr. Shumin Wu
• Dr. Andrew Lampert
• Joan Wrabetz