Distributed NoSQL Storage for Extreme-Scale System Services
Tonglin Li¹, Ioan Raicu¹,²
¹Illinois Institute of Technology, ²Argonne National Laboratory

Abstract

On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance, and they are increasingly used as building blocks of large-scale applications on clouds and in data centers. However, little work has been done on bridging this performance gap on supercomputers with NoSQL data stores.

This work presents a convergence of distributed NoSQL storage systems in clouds and supercomputers. It first presents ZHT, a dynamically scalable zero-hop distributed key-value store that aims to be a building block of large-scale systems on clouds and supercomputers. It also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q (a flexible QoS-fortified distributed key-value storage system for the cloud), FRIEDA-State (state management for scientific applications on the cloud), WaggleDB (a cloud-based interactive data infrastructure for sensor network applications), and Graph/Z (a key-value store based scalable graph processing system). All of these systems have been significantly simplified by building on NoSQL storage, and all have shown scalable performance.

ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table

Motivation
❑ Performance gap between storage and computing resources
❑ Large storage systems suffer from metadata bottlenecks
❑ No suitable key-value store solution on HPC platforms

Design and Implementation
❑ Written in C++, with few dependencies
❑ Modified consistent hashing for zero-hop routing (see the sketch after this section)
❑ Persistent backend: NoVoHT
❑ Persistence, dynamic membership, and fault tolerance via replication
❑ Primitives: insert, lookup, remove, append, cswap, callback

[Figure: System architecture — ZHT instances on physical nodes responding to requests, broadcast primitives, and a membership table of (UUID, IP, port, capacity) entries]

Applications
❑ Distributed storage systems: ZHT/Q, FusionFS, IStore
❑ Job scheduling/launching systems: MATRIX, Slurm++
❑ Other systems: Graph/Z, Fabriq
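The zero-hop design above means a client computes a request's destination locally from the membership table instead of routing through intermediate nodes. The following C++ sketch illustrates that idea with a hash ring; the Node fields mirror the membership table (UUID, IP, port), but the hash function, ring layout, and class names are illustrative assumptions, not ZHT's actual implementation.

```cpp
// Minimal sketch of zero-hop request routing over a static membership table.
// Assumptions: std::hash as the hash function and one ring position per node;
// the real ZHT differs in its hashing and membership handling.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Node {            // one entry of the membership table
    std::string uuid;    // node identifier
    std::string ip;      // address used to send the request
    int         port;
};

class Ring {
public:
    explicit Ring(const std::vector<Node>& nodes) {
        for (const Node& n : nodes)                  // place each node on the ring
            ring_[std::hash<std::string>{}(n.uuid)] = n;
    }
    // Zero-hop lookup: the client alone decides which server owns the key.
    const Node& owner(const std::string& key) const {
        std::size_t h = std::hash<std::string>{}(key);
        auto it = ring_.lower_bound(h);              // first node clockwise from the key
        if (it == ring_.end()) it = ring_.begin();   // wrap around the ring
        return it->second;
    }
private:
    std::map<std::size_t, Node> ring_;               // hash position -> node
};

int main() {
    Ring ring({{"node-a", "10.0.0.1", 50000},
               {"node-b", "10.0.0.2", 50000},
               {"node-c", "10.0.0.3", 50000}});
    const Node& n = ring.owner("some/key");
    // A real client would now send insert/lookup/remove/append/cswap/callback
    // requests directly to n.ip:n.port -- a single network hop.
    std::cout << "key maps to " << n.uuid << " at " << n.ip << ":" << n.port << "\n";
}
```

Because every client can resolve the owner locally, each of the primitives listed above costs one round trip, which is what keeps per-request latency flat as the system scales.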
ZHT/Q: A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud

Motivation
❑ Need to run multiple applications on a single data store
❑ Optimizing a single deployment for many different requirements

Design and Implementation
❑ Request batching proxy
❑ Dynamic batching strategy (sketched after this section)
❑ Components: client API wrapper, request handler, batching strategy engine, batch buckets, condition monitor & sender, result service, and server-side plugins

[Figure: Proxy architecture — requests are pushed into batch buckets (B1…Bn), the strategy engine chooses a strategy and updates batching conditions, the condition monitor sends batches, plugins unpack and insert key-value pairs, and the result service returns batch results with latency feedback]

Highlighted features
❑ Adaptive request batching
❑ QoS support
❑ Traffic-aware automatic performance tuning

Performance
[Figures: Batch request latency distributions under traffic patterns 1–4; single-node throughput (ops/s); request latency vs. client count for workloads with multiple QoS levels]
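The core trade-off in the batching proxy is holding requests long enough to amortize network cost without violating any request's latency QoS. The sketch below expresses that rule in C++; the class and field names (BatchBucket, deadline, the 1 ms slack) are illustrative assumptions rather than ZHT/Q's real strategy engine, which also adapts to observed traffic and latency feedback.

```cpp
// Sketch of deadline-aware request batching, in the spirit of ZHT/Q's
// batching proxy. All names here (Request, BatchBucket, the send callback)
// are assumptions for illustration, not the real ZHT/Q API.
#include <algorithm>
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Request {
    std::string key, value;
    Clock::time_point deadline;   // arrival time + the application's QoS latency budget
};

class BatchBucket {
public:
    BatchBucket(std::size_t max_batch,
                std::function<void(const std::vector<Request>&)> send)
        : max_batch_(max_batch), send_(std::move(send)) {}

    // Queue a request together with its QoS latency budget (milliseconds).
    void push(std::string key, std::string value, int budget_ms) {
        pending_.push_back({std::move(key), std::move(value),
                            Clock::now() + std::chrono::milliseconds(budget_ms)});
        maybe_flush();
    }

    // A condition monitor would call this periodically: flush when the batch
    // is full, or when the tightest deadline is about to expire.
    void maybe_flush() {
        if (pending_.empty()) return;
        auto tightest = std::min_element(pending_.begin(), pending_.end(),
            [](const Request& a, const Request& b) { return a.deadline < b.deadline; });
        bool full = pending_.size() >= max_batch_;
        bool due  = Clock::now() + slack_ >= tightest->deadline;
        if (full || due) {
            send_(pending_);          // ship the whole batch in one network message
            pending_.clear();
        }
    }

private:
    std::size_t max_batch_;
    std::chrono::milliseconds slack_{1};   // assumed headroom for network + server time
    std::vector<Request> pending_;
    std::function<void(const std::vector<Request>&)> send_;
};

int main() {
    BatchBucket bucket(64, [](const std::vector<Request>& batch) {
        std::cout << "sending batch of " << batch.size() << " requests\n";
    });
    bucket.push("key/1", "42", /*budget_ms=*/50);   // latency-tolerant request
    bucket.push("key/2", "17", /*budget_ms=*/1);    // tight QoS: forces an early flush
    bucket.maybe_flush();
}
```

In this framing, guaranteed-QoS requests carry tight budgets and force early flushes, while best-effort requests can carry a large budget so they only leave the bucket when it fills.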
FRIEDA-State: Scalable State Management for Scientific Applications on Cloud

Motivation
❑ Cloud platforms for scientific applications
❑ Need for application reproducibility and persistence of state
❑ Clock drift in dynamic environments

Design and Implementation
❑ Lightweight capturing of state into local files
❑ Merge and reorder events with vector clocks — distributed event ordering (sketched after this section)
❑ Key-value store for storage and query support

Performance
[Figures: Overhead analysis of the file-based storage solution — file write latency, amortized merging, and amortized moving vs. client count; storage solution comparison across local files, 1/2/4/8 Cassandra servers, and DynamoDB (latency in µs)]
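Because VM clocks drift in dynamic environments, FRIEDA-State orders captured events with vector clocks rather than wall-clock timestamps. The sketch below shows the standard vector-clock comparison and merge that this kind of reordering relies on; the map-based representation and function names are assumptions, not FRIEDA-State's actual data structures.

```cpp
// Minimal vector-clock sketch for ordering captured state events.
// Representation (map<process id, counter>) and names are illustrative.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

using VectorClock = std::map<std::string, uint64_t>;   // process id -> event counter

// a happened-before b iff every component of a is <= b and at least one is smaller.
bool happened_before(const VectorClock& a, const VectorClock& b) {
    bool strictly_smaller = false;
    for (const auto& [proc, count] : a) {
        auto it = b.find(proc);
        uint64_t other = (it == b.end()) ? 0 : it->second;
        if (count > other) return false;
        if (count < other) strictly_smaller = true;
    }
    // b may know processes a never saw; those only make b "later".
    if (b.size() > a.size()) strictly_smaller = true;
    return strictly_smaller;
}

// Component-wise maximum: the clock carried by a merged event log.
VectorClock merge(const VectorClock& a, const VectorClock& b) {
    VectorClock out = a;
    for (const auto& [proc, count] : b)
        out[proc] = std::max(out[proc], count);
    return out;
}

int main() {
    VectorClock e1{{"vm-1", 2}, {"vm-2", 1}};
    VectorClock e2{{"vm-1", 3}, {"vm-2", 1}};
    std::cout << std::boolalpha
              << "e1 before e2: " << happened_before(e1, e2) << "\n";  // true
    VectorClock merged = merge(e1, e2);             // {vm-1: 3, vm-2: 1}
    std::cout << "merged vm-1 = " << merged["vm-1"] << "\n";
}
```

Two events whose clocks are incomparable in both directions are concurrent — exactly the distinction that drifting timestamps cannot make reliably.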
WaggleDB: A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks

Motivation
❑ Unreliable networks
❑ Wide range of request rates
❑ Administrators' interaction with nodes
❑ High write concurrency
❑ Many data types from sensors
❑ Scalable architecture

Design and Implementation
❑ Multi-tier architecture with independent components in each tier
❑ Each tier organized as a Phantom domain for dynamic scaling
❑ Message queues as write buffers
❑ Transactional interaction via the database
❑ Column-family store with semi-structured data for the various sensor data types

Performance
[Figures: Throughput and scalability vs. number of nodes; scalable concurrent writes for up to 128 clients; speedup vs. number of distributed queue servers; real-time and average latency during dynamic tier scaling]

Graph/Z: A Key-Value Store Based Scalable Graph Processing System

Motivation
❑ Processing graph queries
❑ Handling big data sets
❑ Fault tolerance

Design and Implementation
❑ Pregel-like (BSP) processing model
❑ Using ZHT as the backend store
❑ Partitioning at the master node

Highlighted features
❑ Data locality
❑ Load balance

[Figure: System architecture — manager, graph partitions, and update flow on top of ZHT]

Contribution
❑ ZHT: a light-weight reliable persistent dynamic scalable zero-hop distributed hash table
  § Design and implementation of ZHT, optimized for high-end computing
  § Verified scalability at 32K-core scale
  § Achieved latencies of 1.1 ms and throughput of 18M ops/sec on a supercomputer, and 0.8 ms and 1.2M ops/sec on a cloud
  § Simulated ZHT at 1 million-node scale for potential use in extreme-scale systems
❑ ZHT/Q: a flexible QoS-fortified distributed key-value storage system for the cloud
  § Supports different QoS latencies on a single deployment for multiple concurrent applications
  § Provides both guaranteed and best-effort services
  § Benchmarked on a real system (16 nodes) and in simulation (512 nodes)
❑ FRIEDA-State: scalable state management for scientific applications on the cloud
  § Design and implementation of FRIEDA-State
  § Lightweight capturing, storage, and vector clock-based event ordering
  § Evaluated on multiple platforms at scales of up to 64 VMs
❑ WaggleDB: a dynamically scalable cloud data infrastructure for sensor networks
  § Design and implementation of WaggleDB
  § Supports high write concurrency, transactional command execution, and tier-independent dynamic scalability
  § Evaluated with up to 128 concurrent clients
❑ GRAPH/Z: a key-value store based scalable graph processing system
  § Design and implementation of GRAPH/Z, a BSP-model graph processing system on top of ZHT
  § Utilizes data locality and minimizes data movement between nodes
  § Benchmarked at scales of up to 16 nodes

Selected Publications

Journal papers
•Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing, Journal of CCPE, 2015.
•Iman Sadooghi, Tonglin Li, Kevin Brandstatter, Ioan Raicu, et al. Understanding the Performance and Potential of Cloud Computing for Scientific Applications, IEEE Transactions on Cloud Computing (TCC), 2015.
•Ke Wang, Kan Qiao, Tonglin Li, Michael Lang, Ioan Raicu, et al. Load-balanced and Locality-aware Scheduling for Data-intensive Workloads at Extreme Scales, Journal of CCPE, 2015.

Conference papers
•Tonglin Li, Ke Wang, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu. A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud, IEEE International Conference on Big Data, 2015.
•Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu. A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks, ScienceCloud, 2015.
•Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan. Scalable State Management for Scientific Applications in the Cloud, IEEE International Congress on Big Data, 2014.
•Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu. FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems, IEEE International Conference on Big Data, 2014.
•Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu. Optimizing Load Balancing and Data-Locality with Data-aware Scheduling, IEEE International Conference on Big Data, 2014.
•Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IPDPS, 2013.

Posters and extended abstracts
•Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ioan Raicu, et al. GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System, IEEE Cluster, 2015.
•Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu. A Cloud-based Interactive Data Infrastructure for Sensor Networks, SC, 2014.
•Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu. Exploring Distributed Hash Tables in High-End Computing, ACM SIGMETRICS Performance Evaluation Review (PER), 2011.

Acknowledgement