BPOE Research Highlights
Transcription
BPOE Research Highlights
Jianfeng Zhan, Institute of Computing Technology (ICT), Chinese Academy of Sciences
http://prof.ict.ac.cn/jfzhan
2013-10-09, presented at WBDB 2013

What is the BPOE workshop?
• B: Big Data Benchmarks
• PO: Performance Optimization
• E: Emerging Hardware

Motivation
• Big data covers many research fields. Researchers work in their own specific fields, with their own main conferences, and have few chances to learn about each other's work.
• There is a gap between industry and academia.
• BPOE brings researchers and practitioners in related areas together.

BPOE Communities
• Communities of architecture, systems, and data management.
• Discuss the mutual influences of architectures, systems, and data management.
• Bridge the gap in big data research and practice between industry and academia.

BPOE: A Series of Workshops
• 1st BPOE, in conjunction with IEEE Big Data 2013, Santa Clara, CA, USA: paper presentations + 2 invited talks
• 2nd BPOE, in conjunction with CCF HPC China 2013, Guilin, Guangxi, China: 8 invited talks
• 3rd BPOE, in conjunction with CCF Big Data Technology Conference 2013, Beijing, China: 6 invited talks

Organization
Steering committee:
• Lizy K. John, University of Texas at Austin
• Zhiwei Xu, ICT, Chinese Academy of Sciences
• Cheng-zhong Xu, Wayne State University
• Xueqi Cheng, ICT, Chinese Academy of Sciences
• Jianfeng Zhan, ICT, Chinese Academy of Sciences
• Dhabaleswar K. Panda, Ohio State University
PC co-chairs:
• Jianfeng Zhan, ICT, Chinese Academy of Sciences
• Weijia Xu, TACC, University of Texas at Austin

Paper Submissions
• 26 papers received, each with at least five reviews
• 30 TPC members
• 16 papers accepted
• Two invited talks

Finalized Program
Three sessions:
• Performance optimization of big data systems: 3 papers from Intel, 1 from the Hasso Plattner Institute, Germany
• Special session on big data benchmarking and performance optimization: 2 invited talks + 1 paper from Academia Sinica
• Experience and evaluation with emerging hardware for big data: 1 paper from Japan, 3 from the US

Two Invited Talks
• 13:20-14:05 Invited talk: BigDataBench: Benchmarking Big Data Systems. Yingjie Shi, Chinese Academy of Sciences
• 14:05-14:50 Invited talk: Facebook: Using Emerging Hardware to Build Infrastructure at Scale. Bill Jia, PhD, Manager, Performance and Capacity Engineering, Facebook

Related Work with Big Data Benchmarking
• BigDataBench
• Use cases on BigDataBench
What is BigDataBench?
• A big data benchmark suite from ICT, Chinese Academy of Sciences
• Evaluates big data (hardware) systems and architectures
• Open-source project: http://prof.ict.ac.cn/BigDataBench
• The presentation will be available soon at http://prof.ict.ac.cn/bpoe2013/program.php

Summary of BigDataBench
(overview figure)

BigDataBench Methodology
• Representative real data sets: investigate typical application domains
  – Data types: structured, semi-structured, and unstructured data
  – Data sources: text data, graph data, table data, extended ...
• Synthetic data generation tool preserving data characteristics: big data sets preserving the 4 V's
• Diverse and important workloads
  – Application types: offline analytics, realtime analytics, online services
  – Basic and important operations and algorithms, extended ...
  – Representative software stacks, extended ...

BigDataBench Big Data Workloads – Representative Datasets
• Search Engine
  – Wikipedia Entries (unstructured data, text data)
  – Google Web Graph (unstructured data, graph data)
  – Profsearch Person Resume (semi-structured data, table data)
• E-commerce
  – Amazon Movie Reviews (unstructured data, text data)
  – ABC Transaction Data (structured data, table data)
• Social Network
  – Facebook Social Graph (unstructured data, graph data)

Chosen Workloads
• Micro benchmarks
• Basic datastore operations
• Relational queries
• Application scenarios: search engine, social network, e-commerce system

Chosen Workloads – Micro Benchmarks (unstructured data)
• Sort (text data): Hadoop 1-1, Spark 1-2, MPI 1-3
• Grep (text data): Hadoop 2-1, Spark 2-2, MPI 2-3
• WordCount (text data): Hadoop 3-1, Spark 3-2, MPI 3-3
• BFS (graph data): MPI 4
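To make the micro benchmarks concrete, here is a minimal PySpark sketch in the spirit of the WordCount workload (cf. benchmark 3-2, WordCount on Spark). The input and output paths are hypothetical placeholders, and BigDataBench ships its own implementations, which may differ from this sketch.

```python
# Minimal PySpark sketch of a WordCount job (cf. benchmark 3-2, WordCount on Spark).
# Paths are hypothetical placeholders; BigDataBench's own implementation may differ.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///bigdatabench/text_input")         # hypothetical input path
counts = (lines.flatMap(lambda line: line.split())             # split each line into words
               .map(lambda word: (word, 1))                    # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))                # sum the counts per word
counts.saveAsTextFile("hdfs:///bigdatabench/wordcount_output")  # hypothetical output path
spark.stop()
```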
Chosen Workloads – Basic Datastore Operations (semi-structured data, table data)
• Read: HBase 5-1, Cassandra 5-2, MongoDB 5-3, MySQL 5-4
• Write: HBase 6-1, Cassandra 6-2, MongoDB 6-3, MySQL 6-4
• Scan: HBase 7-1, Cassandra 7-2, MongoDB 7-3, MySQL 7-4

Chosen Workloads – Basic Relational Queries (structured data, table data)
• Select query: Hive 8-1, Impala 8-2
• Aggregation query: Hive 9-1, Impala 9-2
• Join query: Hive 10-1, Impala 10-2

Chosen Workloads – Service
• Search Engine
  – Nutch server (structured, table data): Hadoop 11
  – PageRank (unstructured, graph data): Hadoop 12
  – Index (unstructured, text data): Hadoop 13
• Social Network
  – Olio server (structured, table data): MySQL 14
  – K-means (unstructured, graph data): Hadoop 15-1, Spark 15-2
  – Connected components (unstructured, graph data): Hadoop 16
• E-commerce
  – Rubis server (structured, table data): MySQL 17
  – Collaborative filtering (unstructured, text data): Hadoop 18
  – Naïve Bayes (unstructured, text data): Spark 19

Use Cases on BigDataBench

Case Studies based on BigDataBench
• The Implications from Benchmarking Three Different Data Center Platforms. Q. Jing, Y. Shi, and M. Zhao. University of Science and Technology of China, and Florida International University
• AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers. R. Zhou, Y. Shi, C. Zhu, and F. Liu. National Computer Network Emergency Response Technical Team / Coordination Center of China, China
• An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform. P. Chen, Y. Qi, X. Li, and L. Su. Xi'an Jiaotong University, China
• A Characterization of Big Data Benchmarks. W. Xiong, Z. Yu, and C. Xu. SIAT, Chinese Academy of Sciences, and Wayne State University

First case study: The Implications from Benchmarking Three Different Data Center Platforms (Q. Jing et al.)

New Solutions of Big Data Systems ...
(figure)

A Tradeoff? Energy Consumption vs. Performance
(figure)

Questions addressed:
• What is the performance of different big data systems under different types of applications?
• What is the performance of different big data systems under different data volumes?
• What is the energy consumption of different big data systems?
Approach: evaluate the three big data systems, compare two of them in terms of performance and energy cost, and analyze the running features of the different systems and the underlying reasons.

Experiment Platforms
• Xeon – mainstream processor
• Atom – low-power processor
• Tilera – many-core processor
Comparisons: Xeon vs. Atom and Xeon vs. Tilera, each as a Hadoop cluster (master/slaves 1/7, plus 1/1 for Xeon vs. Tilera), configured so that the compared systems have the same number of cores or hardware threads, running BigDataBench, with Hadoop settings following the Hadoop official website.

Basic configurations:
• Intel Xeon E5310: 4 cores @ 1.6 GHz, L1 I/D cache 32 KB, L2 cache 4096 KB, out-of-order execution, FPU, bus interconnect, TDP 80 W
• Intel Atom D510: 2 cores @ 1.66 GHz, L1 I/D cache 24 KB, L2 cache 512 KB, in-order execution, FPU, bus interconnect, TDP 13 W
• Tilera TilePro36: 36 cores @ 500 MHz, L1 I/D cache 16 KB/8 KB, L2 cache 64 KB, iMesh interconnect, TDP 16 W

Implications from the Results
• Xeon vs. Atom
  – Xeon is more powerful than Atom.
  – Atom is more energy-efficient than Xeon for some simple applications.
  – Atom shows no energy advantage for complex applications.
• Xeon vs. Tilera
  – Xeon is more powerful than Tilera.
  – Tilera is more energy-efficient than Xeon for some simple applications.
  – Tilera shows no energy advantage for complex applications.
  – Tilera is better suited to I/O-intensive applications.
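These observations follow from energy = power x runtime: a low-power processor only saves energy if its longer runtime does not outweigh its lower power draw. A minimal back-of-the-envelope sketch, using the TDP figures from the configuration list above and purely hypothetical runtimes:

```python
# Back-of-the-envelope energy comparison: energy (J) = power (W) x runtime (s).
# TDPs come from the configuration list above; the runtimes are hypothetical,
# chosen only to illustrate why a low-power processor can lose its energy
# advantage on complex applications that run much longer on it.
PLATFORM_TDP_W = {"Xeon E5310": 80, "Atom D510": 13, "Tilera TilePro36": 16}

# Hypothetical runtimes in seconds for a "simple" and a "complex" workload.
runtimes_s = {
    "simple":  {"Xeon E5310": 100, "Atom D510": 450, "Tilera TilePro36": 400},
    "complex": {"Xeon E5310": 300, "Atom D510": 2500, "Tilera TilePro36": 2200},
}

for workload, per_platform in runtimes_s.items():
    print(workload)
    for platform, seconds in per_platform.items():
        energy_kj = PLATFORM_TDP_W[platform] * seconds / 1000.0
        print(f"  {platform}: {seconds} s, ~{energy_kj:.1f} kJ")
```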
Case Studies based on BigDataBench (continued) – AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers (R. Zhou et al.)

Greening Data Centers
• IDC: the digital universe will reach 35 zettabytes by 2020.
• Nature: distilling meaning from big data has never been in such urgent demand.
• Data centers consume about 1.3% of all electricity used.
• The energy bill is the largest single item in the total cost of ownership of a data center.

Power Usage Effectiveness
• "If you cannot measure it, you cannot improve it." – Lord Kelvin
• PUE (power usage effectiveness): a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the information technology equipment.

AxPUE
(figure: PUE)

ApPUE
• ApPUE (Application Performance Power Usage Effectiveness): a metric that measures the power usage effectiveness of IT equipment; specifically, how much of the power entering the IT equipment is used to improve application performance.
• Formula: ApPUE = Application Performance / IT Equipment Power, where application performance is the data processing performance of the applications and IT equipment power is the average rate at which the IT equipment consumes energy.

AoPUE
• AoPUE (Application Overall Power Usage Effectiveness): a metric that measures the power usage effectiveness of the overall data center system; specifically, how much of the total facility power is used to improve application performance.
• Formula: AoPUE = Application Performance / Total Facility Power, where total facility power is the average rate at which the total facility uses energy.
• Relation to the other metrics: AoPUE = ApPUE / PUE.

The Roles of BigDataBench
Conducting experiments based on BigDataBench to demonstrate the rationality of the newly proposed AxPUE metrics from two aspects:
• Adopting the comprehensive workloads of BigDataBench to design the application-category-sensitive experiment.
• Adopting the Sort workload of BigDataBench to design the algorithm-complexity-sensitive experiment.
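As a worked example of the formulas above, here is a small Python sketch that computes PUE, ApPUE, and AoPUE from hypothetical measurements and checks the relation AoPUE = ApPUE / PUE. The numbers and the records-per-second performance unit are assumptions for illustration only.

```python
# Worked example of the AxPUE metrics, using the formulas from the slides:
#   PUE   = total facility power / IT equipment power
#   ApPUE = application performance / IT equipment power
#   AoPUE = application performance / total facility power = ApPUE / PUE
# All numbers below are hypothetical measurements for illustration.
app_performance = 2_000_000.0   # e.g. records processed per second (assumed unit)
it_power_w = 40_000.0           # average IT equipment power draw, watts
facility_power_w = 60_000.0     # average total facility power draw, watts

pue = facility_power_w / it_power_w
appue = app_performance / it_power_w          # performance per watt of IT power
aopue = app_performance / facility_power_w    # performance per watt of facility power

print(f"PUE   = {pue:.2f}")
print(f"ApPUE = {appue:.2f} records/s per W")
print(f"AoPUE = {aopue:.2f} records/s per W")
assert abs(aopue - appue / pue) < 1e-9        # AoPUE = ApPUE / PUE
```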
Case Studies based on BigDataBench (continued) – An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform (P. Chen et al.)

Motivation & Contributions
Motivation:
• The properties of big data bring challenges for big data management.
• Performance diagnosis is of great importance for keeping big data systems healthy.
Contributions:
• Propose a new performance anomaly detection method based on the ARIMA model for big data applications.
• Introduce a signature-based approach employing MIC invariants to correlate a specific kind of performance problem.
• Propose an ensemble approach to diagnose the real causes of performance problems in a big data platform.

The Roles of BigDataBench
• Conducting experiments based on BigDataBench to evaluate the efficiency and precision of the proposed performance anomaly detection method.
• Using the data generation tool of BigDataBench to generate experiment data.
• Chosen workloads: Sort, WordCount, Grep, and Naïve Bayes.

Case Studies based on BigDataBench (continued) – A Characterization of Big Data Benchmarks (W. Xiong et al.)

Main Ideas
• Characterize 16 typical workloads from BigDataBench and HiBench using micro-architecture-level metrics.
• Analyze the similarity among these workloads with statistical techniques such as PCA and clustering.
• Release two typical workloads related to trajectory data processing in a real-world application domain.

Contact Information
Jianfeng Zhan
http://prof.ict.ac.cn/jfzhan
[email protected]
BPOE: http://prof.ict.ac.cn/bpoe2013
BigDataBench: http://prof.ict.ac.cn/BigDataBench