BPOE Research Highlights

Transcription

http://prof.ict.ac.cn/jfzhan
INSTITUTE OF COMPUTING TECHNOLOGY
Jianfeng Zhan
ICT, Chinese Academy of Sciences, 2013-10-9
What is the BPOE workshop?
• B: Big Data Benchmarks
• PO: Performance Optimization
• E: Emerging Hardware

Motivation
• Big data covers many research fields.
• Researchers focus on their specific fields and main conferences, and have few chances to learn about each other's work.
• There is a gap between industry and academia.
• Goal: bringing researchers and practitioners in related areas together.
BPOE communities
• Communities of architecture, systems, and data management
• Discuss the mutual influences of architectures, systems, and data management
• Bridge the gap in big data research and practice between industry and academia
BPOE: a series of workshops
• 1st BPOE, in conjunction with IEEE Big Data 2013, Santa Clara, CA, USA: paper presentations + 2 invited talks
• 2nd BPOE, in conjunction with CCF HPC China 2013, Guilin, Guangxi, China: 8 invited talks
• 3rd BPOE, in conjunction with CCF Big Data Technology Conference 2013, Beijing, China: 6 invited talks
Organization
Steering committee
• Lizy K. John, University of Texas at Austin
• Zhiwei Xu, ICT, Chinese Academy of Sciences
• Cheng-zhong Xu, Wayne State University
• Xueqi Cheng, ICT, Chinese Academy of Sciences
• Jianfeng Zhan, ICT, Chinese Academy of Sciences
• Dhabaleswar K. Panda, Ohio State University
PC Co-chairs
• Jianfeng Zhan, ICT, Chinese Academy of Sciences
• Weijia Xu, TACC, University of Texas at Austin
Paper submissions
• 26 papers received; each had at least five reviews
• 30 TPC members
• 16 papers accepted
• Two invited talks
Finalized program
• Three sessions:
  - Performance optimization of big data systems: 3 papers from Intel, 1 from Hasso Plattner Institute, Germany
  - Special session on Big Data Benchmarking and Performance Optimization: 2 invited talks + 1 paper from Academia Sinica
  - Experience and evaluation with emerging hardware for big data: 1 paper from Japan, 3 from the US
Two invited talks
• 13:20-14:05  Invited talk: BigDataBench: Benchmarking Big Data Systems. Yingjie Shi, Chinese Academy of Sciences
• 14:05-14:50  Invited talk: Facebook: Using Emerging Hardware to Build Infrastructure at Scale. Bill Jia, PhD, Manager, Performance and Capacity Engineering, Facebook
Related work on big data benchmarking
• BigDataBench
• Use cases of BigDataBench
What is BigDataBench?
• A big data benchmark suite from ICT, Chinese Academy of Sciences; the presentation will be available soon from http://prof.ict.ac.cn/bpoe2013/program.php
• For evaluating big data (hardware) systems and architectures
• Open-source project: http://prof.ict.ac.cn/BigDataBench
Summary of BigDataBench
[Summary figure]
BigDataBench Methodology
[Methodology figure, summarized:]
• Investigate typical application domains.
• Representative real data sets
  - Data types: structured data, semi-structured data, unstructured data
  - Data sources: text data, graph data, table data, extended ...
• A synthetic data generation tool preserving data characteristics produces big data sets preserving the 4V properties.
• Diverse and important workloads
  - Application types: offline analytics, realtime analytics, online services
  - Basic & important operations and algorithms, extended ...
  - Representative software stacks, extended ...
BigDataBench Big Data Workloads: Representative Datasets

Application Domain | Data Type            | Data Source | Dataset
Search engine      | Unstructured data    | Text data   | Wikipedia entries
Search engine      | Unstructured data    | Graph data  | Google web graph
Search engine      | Semi-structured data | Table data  | Profsearch person resumes
E-commerce         | Unstructured data    | Text data   | Amazon movie reviews
E-commerce         | Structured data      | Table data  | ABC transaction data
Social network     | Unstructured data    | Graph data  | Facebook social graph
Chosen Workloads
• Micro benchmarks
• Basic datastore operations
• Relational queries
• Application scenarios: search engine, social network, e-commerce system
Chosen Workloads: Micro Benchmarks

Application domain: micro benchmarks

Operation/Algorithm | Data Type    | Data Source | Software Stack | Benchmark ID
sort                | unstructured | text        | Hadoop         | 1-1
sort                | unstructured | text        | Spark          | 1-2
sort                | unstructured | text        | MPI            | 1-3
grep                | unstructured | text        | Hadoop         | 2-1
grep                | unstructured | text        | Spark          | 2-2
grep                | unstructured | text        | MPI            | 2-3
wordcount           | unstructured | text        | Hadoop         | 3-1
wordcount           | unstructured | text        | Spark          | 3-2
wordcount           | unstructured | text        | MPI            | 3-3
BFS                 | unstructured | graph       | MPI            | 4
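The transcript has no code, but a minimal sketch makes concrete what a micro-benchmark such as wordcount measures. This is a plain-Python stand-in for illustration only, not BigDataBench's Hadoop/Spark/MPI implementations (benchmark IDs 3-1 to 3-3 above):

```python
# Minimal wordcount sketch mirroring the map -> reduce shape of the
# Hadoop/Spark/MPI versions; illustration only, not BigDataBench code.
from collections import Counter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

if __name__ == "__main__":
    sample = ["big data benchmarks", "big data systems"]
    print(reduce_phase(map_phase(sample)))
```

At benchmark scale, the interesting cost is the shuffle between the two phases, which this single-process sketch elides.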
Chosen Workloads: Basic Datastore Operations

Application domain: basic datastore operations; data type: semi-structured; data source: table

Operation | Software Stack | Benchmark ID
Read      | HBase          | 5-1
Read      | Cassandra      | 5-2
Read      | MongoDB        | 5-3
Read      | MySQL          | 5-4
Write     | HBase          | 6-1
Write     | Cassandra      | 6-2
Write     | MongoDB        | 6-3
Write     | MySQL          | 6-4
Scan      | HBase          | 7-1
Scan      | Cassandra      | 7-2
Scan      | MongoDB        | 7-3
Scan      | MySQL          | 7-4
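To pin down what Read, Write, and Scan mean here, the following hedged sketch uses Python's built-in sqlite3 as a stand-in for the four datastores; the table name and schema are invented for illustration:

```python
# Read / Write / Scan sketch; SQLite stands in for HBase, Cassandra,
# MongoDB, and MySQL, and the kv table/schema are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# Write (IDs 6-1..6-4): insert one record.
conn.execute("INSERT INTO kv VALUES (?, ?)", ("user1", "profile-data"))

# Read (IDs 5-1..5-4): fetch one record by key.
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("user1",)).fetchone()

# Scan (IDs 7-1..7-4): iterate over a key range.
for k, v in conn.execute(
        "SELECT k, v FROM kv WHERE k BETWEEN ? AND ?", ("user0", "user9")):
    print(k, v)
```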
Chosen Workloads: Basic Relational Queries

Application domain: basic relational query; data type: structured; data source: table

Operation         | Software Stack | Benchmark ID
Select query      | Hive           | 8-1
Select query      | Impala         | 8-2
Aggregation query | Hive           | 9-1
Aggregation query | Impala         | 9-2
Join query        | Hive           | 10-1
Join query        | Impala         | 10-2
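The three query classes map to ordinary SQL. A sketch with hypothetical item and orders tables follows; Hive and Impala dialects are close to this, but not identical:

```python
# Select / Aggregation / Join sketch; tables, columns, and data are
# hypothetical, and Hive/Impala syntax differs from SQLite in details.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item   (id INTEGER, name TEXT, price REAL);
    CREATE TABLE orders (id INTEGER, item_id INTEGER, qty INTEGER);
    INSERT INTO item   VALUES (1, 'disk', 50.0), (2, 'cpu', 200.0);
    INSERT INTO orders VALUES (1, 1, 3), (2, 2, 1);
""")

# Select query (IDs 8-1, 8-2): filter rows.
print(conn.execute("SELECT name FROM item WHERE price > 100").fetchall())

# Aggregation query (IDs 9-1, 9-2): group and aggregate.
print(conn.execute(
    "SELECT item_id, SUM(qty) FROM orders GROUP BY item_id").fetchall())

# Join query (IDs 10-1, 10-2): combine the two tables.
print(conn.execute("""
    SELECT item.name, orders.qty
    FROM orders JOIN item ON orders.item_id = item.id""").fetchall())
```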
Chosen Workloads: Application Scenarios (Services)

Application Domain | Operation/Algorithm     | Data Type    | Data Source | Software Stack | Benchmark ID
Search engine      | Nutch server            | structured   | table       | Hadoop         | 11
Search engine      | PageRank                | unstructured | graph       | Hadoop         | 12
Search engine      | Index                   | unstructured | text        | Hadoop         | 13
Social network     | Olio server             | structured   | table       | MySQL          | 14
Social network     | K-means                 | unstructured | graph       | Hadoop         | 15-1
Social network     | K-means                 | unstructured | graph       | Spark          | 15-2
Social network     | Connected components    | unstructured | graph       | Hadoop         | 16
E-commerce         | Rubis server            | structured   | table       | MySQL          | 17
E-commerce         | Collaborative filtering | unstructured | text        | Hadoop         | 18
E-commerce         | Naïve Bayes             | unstructured | text        | Spark          | 19
Related work on big data benchmarking
• BigDataBench
• Use cases of BigDataBench
Case Studies based on BigDataBench
• The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi, and M. Zhao. University of Science and Technology of China, and Florida International University
• AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, and F. Liu. National Computer Network Emergency Response Technical Team/Coordination Center of China
• An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su. Xi'an Jiaotong University, China
• A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, and C. Xu. SIAT, Chinese Academy of Sciences, and Wayne State University
New Solutions of Big Data Systems
……
A Tradeoff?
[Figure: the tradeoff between performance and energy consumption]

Research questions:
• What is the performance of different big data systems under different types of applications?
• What is the performance of different big data systems under different data volumes?
• What is the energy consumption of different big data systems?
Approach:
• Evaluating the three big data systems respectively
• Comparing two of them in terms of performance and energy cost
• Analyzing the running features of the different big data systems, and the underlying reasons
Experiment Platforms
• Xeon: a mainstream processor
• Atom: a low-power processor
• Tilera: a many-core processor

Comparisons: Xeon vs. Atom (same hardware thread count) and Xeon vs. Tilera. Hadoop cluster basic configuration, master/slave nodes: 1/7 for Xeon vs. Atom; 1/7 and 1/1 for Xeon vs. Tilera. All experiments are based on BigDataBench, with Hadoop settings following the Hadoop official website.

Information   | Xeon E5310       | Atom D510         | TilePro36
CPU type      | Intel Xeon E5310 | Intel Atom D510   | Tilera TilePro36
CPU cores     | 4 cores @ 1.6GHz | 2 cores @ 1.66GHz | 36 cores @ 500MHz
L1 I/D cache  | 32KB             | 24KB              | 16KB/8KB
L2 cache      | 4096KB           | 512KB             | 64KB
OoO execution | Yes              | No                | No
FPU           | Yes              | Yes               | No
Connection    | BUS              | BUS               | iMesh
TDP           | 80W              | 13W               | 16W
Implications from the Results
• Xeon vs. Atom
  - Xeon is more powerful than Atom.
  - Atom is more energy-efficient than Xeon on some simple applications.
  - Atom shows no energy advantage on complex applications (the sketch below illustrates why).
• Xeon vs. Tilera
  - Xeon is more powerful than Tilera.
  - Tilera is more energy-efficient than Xeon on some simple applications.
  - Tilera shows no energy advantage on complex applications.
  - Tilera is better suited to I/O-intensive applications.
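The asymmetry in these results follows from energy = average power x runtime: a low-TDP processor wins on energy only when its slowdown is smaller than its power saving. A back-of-the-envelope sketch with hypothetical runtimes (TDP values from the platform table above):

```python
# Energy = power * time. Hypothetical runtimes show why the 13 W Atom
# can beat the 80 W Xeon on a simple job yet lose on a complex one.
def energy_joules(tdp_watts, runtime_seconds):
    """Rough upper bound: assume the chip draws its TDP for the run."""
    return tdp_watts * runtime_seconds

# Simple application: Atom ~3x slower, still ~2x less energy.
print(energy_joules(80, 100), energy_joules(13, 300))   # 8000 vs 3900 J

# Complex application: Atom ~10x slower, energy advantage gone.
print(energy_joules(80, 100), energy_joules(13, 1000))  # 8000 vs 13000 J
```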
Case Studies based on BigDataBench
• The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi, and M. Zhao. University of Science and Technology of China, and Florida International University
• AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, and F. Liu. National Computer Network Emergency Response Technical Team/Coordination Center of China
• An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su. Xi'an Jiaotong University, China
• A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, and C. Xu. SIAT, Chinese Academy of Sciences, and Wayne State University
Greening the Data Center
• IDC says: the digital universe will be 35 zettabytes by 2020.
• Nature says: distilling meaning from big data has never been in such urgent demand.
• Data centers consume about 1.3% of all electricity used.
• The energy bill is the largest single item in the total cost of ownership of a data center.
Power Usage Effectiveness
"If you cannot measure it, you cannot improve it." (Lord Kelvin)
PUE (power usage effectiveness): a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the information technology equipment.
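The slide leaves the formula implicit; the standard definition, which is also consistent with the AoPUE = ApPUE / PUE identity on a later slide, is:

```latex
\mathrm{PUE} = \frac{\text{Total Facility Power}}{\text{IT Equipment Power}}
```

A PUE of 1.0 would mean every watt entering the facility reaches the IT equipment.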
AxPUE
[Figure: AxPUE and PUE]
ApPUE
• ApPUE (Application Performance Power Usage Effectiveness): a metric that measures the power usage effectiveness of IT equipment; specifically, how much of the power entering the IT equipment is used to improve application performance.
• Computation formula:

\mathrm{ApPUE} = \frac{\text{Application Performance}}{\text{IT Equipment Power}}

where Application Performance is the data processing performance of the application, and IT Equipment Power is the average rate of IT equipment energy consumed.
AoPUE
• AoPUE (Application Overall Power Usage Effectiveness): a metric that measures the power usage effectiveness of the overall data center system; specifically, how much of the total facility power is used to improve application performance.
• Computation formulas:

\mathrm{AoPUE} = \frac{\text{Application Performance}}{\text{Total Facility Power}}, \qquad \mathrm{AoPUE} = \frac{\mathrm{ApPUE}}{\mathrm{PUE}}

where Total Facility Power is the average rate of total facility energy used.
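A worked numeric sketch, with all measurements hypothetical, showing how the three metrics relate:

```python
# Hypothetical measurements for one benchmark run.
total_facility_power_w = 500_000.0    # whole data center, watts
it_equipment_power_w   = 400_000.0    # servers/storage/network, watts
app_performance        = 2_000_000.0  # e.g. records processed per second

pue    = total_facility_power_w / it_equipment_power_w  # 1.25
ap_pue = app_performance / it_equipment_power_w         # 5.0 records/s per W
ao_pue = app_performance / total_facility_power_w       # 4.0 records/s per W

assert abs(ao_pue - ap_pue / pue) < 1e-12               # AoPUE = ApPUE / PUE
print(pue, ap_pue, ao_pue)
```

Unlike PUE, higher ApPUE and AoPUE values are better, since application performance sits in the numerator.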
The Roles of BigDataBench
• Conducting experiments based on BigDataBench to demonstrate the rationality of the newly proposed AxPUE metrics, from two aspects:
  - Adopting the comprehensive workloads of BigDataBench to design the application-category-sensitive experiment.
  - Adopting Sort from BigDataBench to design the algorithm-complexity-sensitive experiment.
Case Studies based on BigDataBench
• The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi, and M. Zhao. University of Science and Technology of China, and Florida International University
• AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, and F. Liu. National Computer Network Emergency Response Technical Team/Coordination Center of China
• An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su. Xi'an Jiaotong University, China
• A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, and C. Xu. SIAT, Chinese Academy of Sciences, and Wayne State University
Motivation & Contributions
Motivation
• The properties of big data bring challenges for big data management.
• Performance diagnosis is of great importance for keeping big data systems healthy.
Contributions
• Propose a new performance anomaly detection method based on the ARIMA model for big data applications (a generic sketch of the idea follows below).
• Introduce a signature-based approach employing MIC invariants to correlate a specific kind of performance problem.
• Propose an ensemble approach to diagnose the real causes of performance problems in big data platforms.
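The slides do not reproduce the paper's ensemble algorithm; purely as a generic illustration of the ARIMA-based detection idea (not the authors' method), one can flag points whose in-sample prediction residual is extreme. The sketch assumes numpy and statsmodels are installed; the (1, 0, 1) order and the 3-sigma threshold are arbitrary choices:

```python
# Generic ARIMA-residual anomaly detector; NOT the paper's ensemble
# MIC-based method. Requires: pip install numpy statsmodels
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def detect_anomalies(series, order=(1, 0, 1), sigma=3.0):
    """Flag indices whose one-step-ahead in-sample residual exceeds
    `sigma` standard deviations (all thresholds arbitrary here)."""
    fit = ARIMA(series, order=order).fit()
    resid = fit.resid
    return np.where(np.abs(resid) > sigma * np.std(resid))[0]

rng = np.random.default_rng(0)
metric = rng.normal(100.0, 2.0, 200)   # e.g. a throughput time series
metric[150] = 140.0                    # injected performance anomaly
print(detect_anomalies(metric))        # expected to include index 150
```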
The Roles of BigDataBench
• Conducting experiments based on BigDataBench to evaluate the efficiency and precision of the proposed performance anomaly detection method.
• Using the data generation tool of BigDataBench to generate experiment data.
• Chosen workloads: Sort, Wordcount, Grep, and Naïve Bayes.
Case Studies based on BigDataBench
• The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi, and M. Zhao. University of Science and Technology of China, and Florida International University
• AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, and F. Liu. National Computer Network Emergency Response Technical Team/Coordination Center of China
• An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su. Xi'an Jiaotong University, China
• A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, and C. Xu. SIAT, Chinese Academy of Sciences, and Wayne State University
Main Ideas
• Characterize 16 typical workloads from BigDataBench and HiBench using micro-architecture-level metrics.
• Analyze the similarity among these workloads with statistical techniques such as PCA and clustering (a generic sketch follows below).
• Release two typical workloads related to trajectory data processing from a real-world application domain.
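As a generic illustration of this pipeline (not the authors' exact setup), the sketch below standardizes a workloads-by-metrics matrix, reduces it with PCA, and clusters the result. It assumes numpy and scikit-learn, and uses random data in place of the measured micro-architecture metrics:

```python
# Workload-similarity sketch: standardize -> PCA -> k-means over
# micro-architecture metrics (e.g. IPC, cache miss ratios).
# Random stand-in data; requires: pip install numpy scikit-learn
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
metrics = rng.normal(size=(16, 10))  # 16 workloads x 10 metrics (stand-in)

scaled = StandardScaler().fit_transform(metrics)       # no metric dominates
components = PCA(n_components=3).fit_transform(scaled)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)
print(labels)  # workloads sharing a label behave similarly
```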
Contact information
• Jianfeng Zhan: http://prof.ict.ac.cn/jfzhan, [email protected]
• BPOE: http://prof.ict.ac.cn/bpoe2013
• BigDataBench: http://prof.ict.ac.cn/BigDataBench