Distributed NoSQL Storage for Extreme-Scale System Services

Tonglin Li¹, Ioan Raicu¹,²
¹Illinois Institute of Technology, ²Argonne National Laboratory

ZHT: A Light-weight Reliable Dynamic Scalable Zero-hop Distributed Hash Table

Motivation
• Performance gap between storage and computing resources
• Large storage systems suffer from metadata bottlenecks
• No suitable key-value store solution on HPC platforms

Design and Implementation
• Written in C++ with few dependencies
• Modified consistent hashing
• Persistent back end: NoVoHT
• Primitives: insert, lookup, remove, append, cswap, callback
• Persistence
• Dynamic membership (membership table of UUID, IP, port, capacity, and workload, updated via broadcast)
• Fault tolerance via replication

[Figure: system architecture, showing ZHT instances on each physical node responding directly to client requests.]
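To make the zero-hop design concrete, here is a minimal C++ sketch (illustrative names and types, not ZHT's actual API), assuming every client caches the full membership table: a key is hashed and mapped directly to its owning instance, and the primitives listed above are sent straight to that owner with no intermediate hops.

// Minimal sketch of zero-hop key-to-server mapping. It assumes every client
// caches the full membership table (UUID, IP, port); names and types are
// illustrative, not ZHT's real interface.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Member {                       // one row of the membership table
    std::string uuid, ip;
    uint16_t    port;
};

class ZeroHopClient {
public:
    explicit ZeroHopClient(std::vector<Member> members)
        : members_(std::move(members)) {}

    // Zero-hop lookup: hash the key and index the membership table directly,
    // so the owning server is known without asking any other node.
    const Member& owner(const std::string& key) const {
        std::size_t h = std::hash<std::string>{}(key);
        return members_[h % members_.size()];
    }

    // The primitives (insert, lookup, remove, append, cswap, callback) would
    // all be serialized and sent straight to owner(key); insert is shown.
    void insert(const std::string& key, const std::string& value) {
        const Member& m = owner(key);
        std::cout << "insert(" << key << ") -> " << m.ip << ":" << m.port << "\n";
        (void)value;                  // a real client would send key/value over the network
    }

private:
    std::vector<Member> members_;
};

int main() {
    ZeroHopClient client({{"n0", "10.0.0.1", 50000},
                          {"n1", "10.0.0.2", 50000},
                          {"n2", "10.0.0.3", 50000}});
    client.insert("sample-key", "sample-value");
}

Because the owner is resolved locally, each request costs a single round trip regardless of scale, which is what makes the table zero-hop.
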
Performance
[Figures: latency and throughput measurements at increasing scale.]

Applications
• Distributed storage systems: ZHT/Q, FusionFS, IStore
• Job scheduling/launching systems: MATRIX, Slurm++
• Other systems: Graph/Z, Fabriq
FRIEDA-State: Scalable State Management for Scientific Applications on Cloud

Motivation
• Cloud for scientific applications
• Need for application reproducibility and persistence of state
• Clock drift issues in dynamic environments

Design and Implementation
• Use local files to store captured states
• Merge and reorder events with vector clocks
• Key-value store for storage and query support

[Figure: distributed event ordering.]

Performance
[Figures: overhead analysis of the file-based storage solution (file write latency, amortized merging and moving latency) and a storage solution comparison against 1, 2, 4, and 8 Cassandra servers and DynamoDB across increasing client counts.]
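The vector-clock merge and reorder step above can be illustrated with a short sketch (a simplified model with hypothetical field names, not FRIEDA-State's actual record format): each captured state event carries a vector clock, clocks from different event streams are merged element-wise, and a happens-before test decides which events can be causally ordered despite clock drift.

// Sketch of vector-clock merging and event ordering for captured state
// events; a simplified model, field names are illustrative.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

using VectorClock = std::vector<unsigned>;      // one counter per process/VM

// a happened before b iff a <= b component-wise and a != b
bool happensBefore(const VectorClock& a, const VectorClock& b) {
    bool strictly_less = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictly_less = true;
    }
    return strictly_less;
}

// element-wise maximum, used when merging event streams from local files
VectorClock merge(const VectorClock& a, const VectorClock& b) {
    VectorClock m(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) m[i] = std::max(a[i], b[i]);
    return m;
}

struct Event { std::string state; VectorClock vc; };

int main() {
    Event a{"taskA-start", {1, 0, 0}};
    Event b{"taskB-start", {0, 1, 0}};
    Event c{"taskA-done",  {2, 1, 0}};           // has observed both a and b

    std::cout << std::boolalpha
              << "a -> c: " << happensBefore(a.vc, c.vc) << "\n"   // true: causally ordered
              << "a -> b: " << happensBefore(a.vc, b.vc) << "\n";  // false: concurrent

    VectorClock merged = merge(a.vc, b.vc);      // clock after merging the two streams
    std::cout << "merged clock:";
    for (unsigned x : merged) std::cout << ' ' << x;
    std::cout << "\n";
}

Events related by happens-before can be ordered without trusting wall-clock time, which is what sidesteps the clock-drift issue noted in the motivation.
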
ZHT/Q: A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud

Motivation
• Need to run multiple applications on a single data store
• Optimizing a single deployment for many different requirements

Design and Implementation
• Request batching proxy
• Dynamic batching strategy

Highlighted features
• Adaptive request batching
• QoS support
• Traffic-aware automatic performance tuning

[Figure: batching design. A client API wrapper and request handler push requests into batch buckets (B1 ... Bn) of key-value pairs; a batching strategy engine chooses a strategy while a condition monitor & sender checks conditions and sends batches; server-side plugins unpack and insert the requests, and a result service returns batch results through a response buffer with latency feedback.]
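A much-simplified sketch of the batching proxy follows (hypothetical names; the strategy engine, server-side plugins, and network layer are not modeled): requests are buffered in per-QoS buckets, and a bucket is flushed either when it is full or when its oldest request is about to exhaust its latency budget.

// Simplified sketch of QoS-aware adaptive batching: requests are buffered in
// per-QoS buckets and flushed when a bucket fills up or when the oldest
// request would otherwise miss its latency budget. Illustrative only.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Clock  = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

struct Request { std::string key, value; Clock::time_point arrival; };

class BatchingProxy {
public:
    explicit BatchingProxy(std::size_t max_batch) : max_batch_(max_batch) {}

    // Each request carries a QoS latency budget; requests with the same
    // budget share a bucket so one send can satisfy many of them.
    void submit(Millis budget, Request r) {
        auto& bucket = buckets_[budget];
        bucket.push_back(std::move(r));
        maybeFlush(budget, bucket);
    }

    // In the real system a condition monitor thread would call this periodically.
    void tick() {
        for (auto& [budget, bucket] : buckets_) maybeFlush(budget, bucket);
    }

private:
    void maybeFlush(Millis budget, std::vector<Request>& bucket) {
        if (bucket.empty()) return;
        bool full     = bucket.size() >= max_batch_;
        bool deadline = Clock::now() - bucket.front().arrival >= budget / 2;
        if (full || deadline) {       // keep half the budget as slack for the network hop
            std::cout << "send batch of " << bucket.size()
                      << " (budget " << budget.count() << " ms)\n";
            bucket.clear();           // a real proxy would pack and send the batch here
        }
    }

    std::size_t max_batch_;
    std::map<Millis, std::vector<Request>> buckets_;   // one bucket per QoS class
};

int main() {
    BatchingProxy proxy(3);
    for (int i = 0; i < 4; ++i)
        proxy.submit(Millis(50), {"k" + std::to_string(i), "v", Clock::now()});
    proxy.tick();                     // the periodic check that catches near-deadline buckets
}

Grouping requests by latency budget is what lets a single deployment serve both latency-sensitive and throughput-oriented workloads: tight budgets flush early and small, loose budgets accumulate larger, more efficient batches.
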
Performance
[Figures: batch request latency distributions (CDF of request latency in ms) for workloads with multiple QoS levels, single-node throughput in ops/s for four request patterns, and throughput and scalability on 1 to 16 nodes.]
Selected Publications

Journal papers
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al., A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing, Journal of CCPE, 2015.
• Iman Sadooghi, Tonglin Li, Kevin Brandstatter, Ioan Raicu, et al., Understanding the Performance and Potential of Cloud Computing for Scientific Applications, IEEE Transactions on Cloud Computing (TCC), 2015.
• Ke Wang, Kan Qiao, Tonglin Li, Michael Lang, Ioan Raicu, et al., Load-balanced and Locality-aware Scheduling for Data-intensive Workloads at Extreme Scales, Journal of CCPE, 2015.

Conference papers
• Tonglin Li, Ke Wang, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu, A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud, IEEE International Conference on Big Data, 2015.
• Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu, A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks, ScienceCloud, 2015.
• Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan, Scalable State Management for Scientific Applications in the Cloud, IEEE International Congress on Big Data, 2014.
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al., ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IPDPS, 2013.
• Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu, FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems, IEEE International Conference on Big Data, 2014.
• Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu, Optimizing Load Balancing and Data-Locality with Data-aware Scheduling, IEEE International Conference on Big Data, 2014.

Posters and extended abstracts
• Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ioan Raicu, et al., GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System, IEEE Cluster, 2015.
• Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu, A Cloud-based Interactive Data Infrastructure for Sensor Networks, SC, 2014.
• Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu, Exploring Distributed Hash Tables in High-End Computing, ACM SIGMETRICS Performance Evaluation Review (PER), 2011.
WaggleDB: A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks

Motivation
• Unreliable networks
• Wide range of request rates
• Administrators need to interact with nodes
• High write concurrency
• Many data types for sensors

Design and Implementation
• Multi-tier architecture
• Independent components in each tier
• Each tier organized as a Phantom domain for dynamic scaling
• Message queues as write buffers
• Transactional interaction via the database
• Column-family store with semi-structured data for various data types
• Scalable architecture

Performance
[Figures: scalable concurrent write and speedup of distributed servers for up to 128 clients, real-time and average latency versus the number of queue servers, and dynamic tier scaling over time.]
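A rough sketch of the write path described above (in-process stand-ins with illustrative names replace the real message-queue and column-family services): concurrent sensor writes are appended to a queue acting as the write buffer, and a consumer later drains the queue into a column-family-style table keyed by sensor and column:timestamp.

// Rough sketch of the write path: a message queue buffers bursty, highly
// concurrent sensor writes, and a consumer drains it into a column-family-
// style table. In-process stand-ins replace the real queue and database
// services; all names are illustrative.
#include <iostream>
#include <map>
#include <queue>
#include <string>

struct SensorRecord {
    std::string sensor_id;    // row key
    std::string column;       // e.g. "temperature", "humidity"
    std::string value;        // semi-structured payload
    long        timestamp;
};

// Stand-in for the message-queue tier used as a write buffer.
std::queue<SensorRecord> write_queue;

// Stand-in for a column-family store: row key -> (column:timestamp -> value).
std::map<std::string, std::map<std::string, std::string>> table;

void producer(const SensorRecord& r) {   // in the real system, many concurrent clients
    write_queue.push(r);                 // cheap append absorbs write bursts
}

void consumer() {                        // drains the buffer into the store
    while (!write_queue.empty()) {
        SensorRecord r = write_queue.front();
        write_queue.pop();
        table[r.sensor_id][r.column + ":" + std::to_string(r.timestamp)] = r.value;
    }
}

int main() {
    producer({"node-42", "temperature", "21.5", 1700000000});
    producer({"node-42", "humidity", "0.63", 1700000001});
    consumer();
    for (const auto& [row, cols] : table)
        for (const auto& [col, val] : cols)
            std::cout << row << " | " << col << " = " << val << "\n";
}

Decoupling ingest from storage this way is what lets the queue tier and the database tier scale independently, as in the multi-tier design above.
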
Graph/Z: A Key-Value Store Based Scalable Graph Processing System

Motivation
• Processing graph queries
• Handling big data sets
• Fault tolerance

Design and Implementation
• Pregel-like processing model
• Using ZHT as the storage back end
• Partitioning at the master node
• Data locality
• Load balancing

[Figure: system design, with a manager node and graph partitions stored in ZHT, exchanging updates through the key-value store.]
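The Pregel-like (BSP) processing model can be outlined as below (illustrative only; local maps stand in for the vertex state and messages that Graph/Z would keep in ZHT): in each superstep every vertex consumes the messages addressed to it, updates its value, and emits messages along its out-edges, with a barrier between supersteps, until no vertex remains active.

// Outline of a Pregel-style (BSP) superstep loop propagating the maximum
// vertex value; local maps stand in for the vertex state and message queues
// that Graph/Z would keep in ZHT. Illustrative only.
#include <iostream>
#include <map>
#include <vector>

int main() {
    // Tiny directed graph: adjacency lists and a value per vertex.
    std::map<int, std::vector<int>> edges = {{1, {2, 3}}, {2, {3}}, {3, {}}};
    std::map<int, int> value = {{1, 5}, {2, 9}, {3, 1}};

    // Superstep 0: every vertex sends its value to its neighbors.
    std::map<int, std::vector<int>> inbox;
    for (auto& [v, val] : value)
        for (int nbr : edges[v]) inbox[nbr].push_back(val);

    bool active = true;
    while (active) {                              // one iteration per superstep (barrier)
        active = false;
        std::map<int, std::vector<int>> next_inbox;
        for (auto& [v, msgs] : inbox) {
            for (int m : msgs) {
                if (m > value[v]) {               // vertex compute: keep the largest value seen
                    value[v] = m;
                    for (int nbr : edges[v]) next_inbox[nbr].push_back(m);
                    active = true;                // this vertex stays active next superstep
                }
            }
        }
        inbox = std::move(next_inbox);
    }
    for (auto& [v, val] : value) std::cout << "vertex " << v << ": " << val << "\n";
}

Partitioning the vertex and message maps by key is what maps naturally onto a key-value back end and keeps each partition's computation close to its data.
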
Contribution
• ZHT: a light-weight, reliable, persistent, dynamic, scalable, zero-hop distributed hash table
  - Design and implementation of ZHT, optimized for high-end computing
  - Verified scalability at 32K-core scale
  - Achieved latencies of 1.1 ms and throughput of 18M ops/sec on a supercomputer, and 0.8 ms and 1.2M ops/sec on a cloud
  - Simulated ZHT at 1-million-node scale for potential use in extreme-scale systems
• ZHT/Q: a flexible QoS fortified distributed key-value storage system for the cloud
  - Supports different QoS latency requirements on a single deployment for multiple concurrent applications
  - Provides both guaranteed and best-effort services
  - Benchmarked on a real system (16 nodes) and in simulations (512 nodes)
• FRIEDA-State: scalable state management for scientific applications on cloud
  - Design and implementation of FRIEDA-State
  - Lightweight capture, storage, and vector-clock-based event ordering
  - Evaluated on multiple platforms at scales of up to 64 VMs
• WaggleDB: a dynamically scalable cloud data infrastructure for sensor networks
  - Design and implementation of WaggleDB
  - Supports high write concurrency, transactional command execution, and tier-independent dynamic scalability
  - Evaluated with up to 128 concurrent clients
• GRAPH/Z: a key-value store based scalable graph processing system
  - Design and implementation of GRAPH/Z, a BSP-model graph processing system on top of ZHT
  - Exploits data locality and minimizes data movement between nodes
  - Benchmarked at up to 16-node scale
Abstract
On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance, and they are increasingly used as building blocks of large-scale applications on clouds and in data centers. However, there is little work on bridging the performance gap on supercomputers with NoSQL data stores. This work presents a convergence of distributed NoSQL storage systems in clouds and supercomputers. It first presents ZHT, a dynamically scalable zero-hop distributed key-value store that aims to be a building block of large-scale systems on clouds and supercomputers. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q (a flexible QoS fortified distributed key-value storage system for the cloud), FRIEDA-State (state management for scientific applications on cloud), WaggleDB (a cloud-based interactive data infrastructure for sensor network applications), and Graph/Z (a key-value store based scalable graph processing system); all of these systems have been significantly simplified by building on NoSQL storage systems and have shown scalable performance.

Acknowledgement