NoSQL? No Worries: DynamoDB and ElastiCache
Dan Zamansky, Sr. Product Manager, AWS
Siva Raghupathy, Principal Solutions Architect, AWS

Agenda
• NoSQL
• Why managed database service?
• DynamoDB
• ElastiCache
• Takeaways

NoSQL

Benefits
• Schema-less
• Highly scalable
  – Size
  – Throughput
• Highly available

Constraints
• No cross-table/item transactions
• No complex queries or joins

NoSQL available on AWS

Managed
• Amazon DynamoDB
• Amazon ElastiCache
  – Memcached
  – Redis

Unmanaged
• Apache Cassandra
• MongoDB
• CouchDB
• Riak
• …

Why managed database services?

If you host your databases on-premises
• You handle: App optimization, Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, Server maintenance, Rack & stack, Power, HVAC, net

If you host your databases on Amazon EC2
• You handle: App optimization, Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches
• AWS handles: OS installation, Server maintenance, Rack & stack, Power, HVAC, net

If you choose a managed DB service
• You handle: App optimization
• AWS handles: Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, Server maintenance, Rack & stack, Power, HVAC, net

Who uses AWS Managed Database Services?
Amazon DynamoDB

Amazon DynamoDB
• Managed NoSQL database service
• Accessible via simple and powerful APIs
• Supports both document and key-value data models
• Highly scalable
• Consistent, single-digit millisecond latency at any scale
• Highly durable & available - 3x replication
• No table size or throughput limits

Table
A table holds items, and each item holds attributes.

Hash key
• Mandatory
• Key-value access pattern
• Determines data distribution

Range key
• Optional
• Model 1:N relationships
• Enables rich query capabilities
  – All items for a hash key
  – ==, <, >, >=, <=
  – "begins with"
  – "between"
  – sorted results
  – counts
  – top/bottom N values
  – paged responses

Provisioned Throughput Model
• Throughput is provisioned at the table level
  – One write capacity unit (WCU) is one write per second of an item up to 1 KB
  – One read capacity unit (RCU) is one read per second of an item up to 4 KB
• RCUs measure strongly consistent reads
• Eventually consistent reads cost half as much as strongly consistent reads
• WCU and RCU are independent
• Consumed capacity is measured per operation

Scaling
• Scaling is achieved through partitioning
• Tables are partitioned for
  – Throughput: provision any amount of throughput to a table
  – Size: add any number of items to a table
[Diagram: one table spread over Partition 1, Partition 2, Partition 3, Partition 4 … Partition N]

Indexing
• Local Secondary Index
  – Local to a hash key
  – Alternate range key
• Global Secondary Index
  – Across all hash keys
  – Alternate hash (+range) key

User-files-table: User (hash) | File (range) | Date | Shared | Size
File-size-LSI: User (hash) | Size (range) | File (table key) | Date (projected)
Shared-files-GSI: Shared (hash) | User (table key) | File (table key) | Date (projected)

Data types
• String (S)
• Number (N)
• Binary (B)
• String Set (SS)
• Number Set (NS)
• Binary Set (BS)
• Boolean (BOOL)
• Null (NULL)
• List (L)
• Map (M)
List and Map are used for storing nested JSON documents.

DynamoDB Table and Item API
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
• GetItem
• Query
• Scan
• BatchGetItem
• PutItem
• UpdateItem
• DeleteItem
• BatchWriteItem

DynamoDB Streams API
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords

DynamoDB Streams
• Stream of updates to a table
• Asynchronous
• Exactly once
• Strictly ordered
  – Per item
• Highly durable
• Scale with table
• 24-hour lifetime
• Sub-second latency

DynamoDB Streams and AWS Lambda

Cross-region replication
[Diagram: table in US East (N. Virginia) → DynamoDB Streams → Open Source Cross-Region Replication Library → replicas in Asia Pacific (Sydney) and EU (Ireland)]

Data & Access Modeling
Store data based on how you will access it!

1:1 relationships or key-values
• Use a table or GSI with a hash key
• Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes

Users Table
Hash key | Attributes
UserId = bob | Email = [email protected], JoinDate = 2011-11-15
UserId = fred | Email = [email protected], JoinDate = 2011-12-01

Users-Email-GSI
Hash key | Attributes
Email = [email protected] | UserId = bob, JoinDate = 2011-11-15
Email = [email protected] | UserId = fred, JoinDate = 2011-12-01

1:N relationships or parent-children
• Use a table or GSI with hash and range key
• Use Query API
Example:
  – One device has many readings
  – For DeviceId = 1, find all readings where epoch >= 1435457946

Device-measurements
Hash key | Range key | Attributes
DeviceId = 1 | epoch = 1435457946 | Temperature = 30, pressure = 90
DeviceId = 1 | epoch = 1435457960 | Temperature = 32, pressure = 91
DeviceId = 2 | epoch = 1435458028 | Temperature = 32, pressure = 91

N:M relationships
• Use a table and GSI with hash and range key elements switched
• Use Query API
Example: Given a user, find all games. Or given a game, find all users.
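The N:M pattern above can be sketched in plain Python. This is only an in-memory illustration of "same pairs, keys switched"; `KeyedStore` and `record_play` are hypothetical names, not DynamoDB API calls, and a real GSI is maintained by DynamoDB itself rather than written by hand.

```python
# Hypothetical in-memory stand-in for a table (or index) that has a
# hash key and a range key. It is NOT the DynamoDB API.
class KeyedStore:
    def __init__(self):
        self._partitions = {}

    def put(self, hash_key, range_key):
        self._partitions.setdefault(hash_key, set()).add(range_key)

    def query(self, hash_key):
        # Mimics a Query on the hash key: range keys come back sorted.
        return sorted(self._partitions.get(hash_key, set()))


user_games_table = KeyedStore()  # hash = UserId, range = GameId
game_users_gsi = KeyedStore()    # hash = GameId, range = UserId (switched)


def record_play(user_id, game_id):
    user_games_table.put(user_id, game_id)
    # Assumption for the sketch: mirror the write into the "index" manually.
    game_users_gsi.put(game_id, user_id)


for user_id, game_id in [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]:
    record_play(user_id, game_id)

games_for_bob = user_games_table.query("bob")    # given a user, find all games
users_for_game2 = game_users_gsi.query("Game2")  # given a game, find all users
```

Both lookups are hash-key queries, which is why the pattern needs neither joins nor scans.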
User-Games-Table
Hash key | Range key
UserId = bob | GameId = Game1
UserId = fred | GameId = Game2
UserId = bob | GameId = Game3

Game-Users-GSI
Hash key | Range key
GameId = Game1 | UserId = bob
GameId = Game2 | UserId = fred
GameId = Game3 | UserId = bob

Documents (JSON)
• New data types (M, L, BOOL, NULL) introduced to support JSON
• Document SDKs
  – Simple programming model
  – Conversion to/from JSON
  – Java, JavaScript, Ruby, .NET
• Cannot index (S, N) elements of a JSON object stored in M
  – They need to be modeled as top-level table attributes to be used in LSIs or GSIs

JavaScript | DynamoDB
string | S
number | N
boolean | BOOL
null | NULL
array | L
object | M

DynamoDB use cases - IoT

    case class CameraRecord(
      cameraId: Int, // hash key
      ownerId: Int,
      subscribers: Set[Int],
      hoursOfRecording: Int,
      ...
    )

    case class Cuepoint(
      cameraId: Int, // hash key
      timestamp: Long, // range key
      type: String,
      ...
    )

Video: https://youtu.be/-0FtKBgYiik?t=79

DynamoDB use cases - AdTech
Requirements:
  – Low <5ms response time
  – 1,000,000+ global requests/second
  – 100B items

DynamoDB table
HashKey | RangeKey | Value
Key | Segment | 1234554343254
Key | Segment1 | 1231231433235

Video: https://youtu.be/qV7yAwcMtYE?t=598

DynamoDB use cases - Retail
Video: https://youtu.be/AHk3RhrETi4?t=1616

Amazon DynamoDB Best Practices
• Keep item size small
  – Compress large items
  – Store metadata in Amazon DynamoDB and large blobs in Amazon S3
• Use table per day, week, month etc. for storing time series data
• Use conditional updates for de-duping & versioning
• Avoid hot keys and hot partitions

Example time series tables, all with the same layout:
Events_table_2012: Event_id (hash key) | Timestamp (range key) | Attribute1 … Attribute N
Events_table_2012_05_week1: Event_id (hash key) | Timestamp (range key) | Attribute1 … Attribute N
Events_table_2012_05_week2: Event_id (hash key) | Timestamp (range key) | Attribute1 … Attribute N
Events_table_2012_05_week3: Event_id (hash key) | Timestamp (range key) | Attribute1 … Attribute N

Amazon ElastiCache

Why In-Memory?
[Figure: ms vs. μs]

Why In-Memory?
• Everything is connected - Phones, Tablets, Cars, Air Conditioners, Toasters
• Demand for real-time performance – online games, AdTech, eCommerce, social apps etc.
• Load is spiky and unpredictable
• DB performance is often the bottleneck

Amazon ElastiCache
• AWS managed service that lets you easily create, use and scale in-memory key-value stores in the cloud. It comes in two flavors: Memcached and Redis

Memcached
• In-memory key-value datastore
• Insanely fast!
• Very established
• Supports strings, objects
• Slab allocator
• Multi-threaded
• Patterns for sharding
• No persistence

Redis
• In-memory key-value datastore
• Ridiculously fast!
• More like a NoSQL db
• Supports data types: strings, lists, hashes, sets, sorted sets, bitmaps & HyperLogLogs
• Pub/sub functionality
• Persistence: snapshots or append-only log
• Read replicas
• Single-threaded
• Atomic operations: supports transactions, has ACID properties
• http://redis.io/commands

Memcached or Redis?
Features compared:
• Simple caching to offload DB burden
• Ability to scale horizontally (Redis: yes, with Redis 3.0)
• Multithreaded performance
• Advanced data types
• Sorting/Ranking data sets
• Pub/Sub capability
• HA through replication
• Persistence

How can I leverage In-Memory?
Key Use Cases

Caching
[Diagram: Clients → Elastic Load Balancing → EC2 App Instances. App reads and cache updates go to ElastiCache; database reads and writes go to Amazon RDS or DynamoDB]

Be Lazy

    # Python pseudocode; `cache` and `db` are assumed client objects
    def get_user(user_id):
        # Check the cache
        record = cache.get(user_id)
        if record is None:
            # Cache miss: run a DB query, then populate the cache
            record = db.query("select * from users where id = ?", user_id)
            cache.set(user_id, record)
        return record

    # App code
    user = get_user(17)

Write-through Caching

    # Python pseudocode; `cache` and `db` are assumed client objects
    def save_user(user_id, values):
        # Save to DB
        record = db.query("update users ... where id = ?", user_id, values)
        # Push into cache
        cache.set(user_id, record)
        return record

    # App code
    user = save_user(17, {"name": "Nate Dogg"})

Leaderboards - Redis
• Easy to implement using Sorted Sets
• Simultaneously guarantees:
  – uniqueness and ordering

    # `redis` is assumed to be a connected redis-py client
    def save_score(user, score):
        redis.zadd("leaderboard", {user: score})

    def get_rank(user):
        # ZREVRANK is zero-based, so add 1 for a human-readable rank
        return redis.zrevrank("leaderboard", user) + 1

"It's mine!" "Not if I destroy it first!"

Example

    ZADD "leaderboard" 1201 "Gollum"
    ZADD "leaderboard" 963 "Sauron"
    ZADD "leaderboard" 1092 "Bilbo"
    ZADD "leaderboard" 1383 "Frodo"

    ZREVRANGE "leaderboard" 0 -1
    1) "Frodo"
    2) "Gollum"
    3) "Bilbo"
    4) "Sauron"

    ZREVRANK "leaderboard" "Sauron"
    (integer) 3

Customer Example – Globo App
https://www.youtube.com/watch?v=F34SszLGH6A

Recommendation Engines
Use Redis to store data for recommendation algorithms such as Slope One.

    INCR "item:38923:likes"
    HSET "item:38923:ratings" "Susan" 1

    INCR "item:38923:dislikes"
    HSET "item:38923:ratings" "Tommy" -1

• Redis counters are used to increment the number of likes or dislikes for an item.
• Redis hashes maintain a list of everyone who liked or disliked an item.
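The counter-plus-hash bookkeeping described above might look like this in Python. It is a sketch under assumptions: `rate_item` is a hypothetical helper, and `FakeRedis` is a hypothetical in-memory stand-in that imitates only the `incr` and `hset` calls so the example runs without a server. A real redis-py client exposes methods with the same names.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in; implements only incr() and hset()."""

    def __init__(self):
        self.counters = {}
        self.hashes = {}

    def incr(self, name, amount=1):
        # Same idea as Redis INCR: create the counter at 0, then add.
        self.counters[name] = self.counters.get(name, 0) + amount
        return self.counters[name]

    def hset(self, name, key, value):
        # Same idea as Redis HSET: set one field of a hash.
        self.hashes.setdefault(name, {})[key] = value


def rate_item(client, item_id, user, liked):
    """Record one like (+1) or dislike (-1) from `user` for `item_id`."""
    counter = "likes" if liked else "dislikes"
    client.incr(f"item:{item_id}:{counter}")
    client.hset(f"item:{item_id}:ratings", user, 1 if liked else -1)


client = FakeRedis()
rate_item(client, 38923, "Susan", liked=True)
rate_item(client, 38923, "Tommy", liked=False)
```

Keeping the per-user rating in a hash means a repeated vote from the same user overwrites the hash field instead of adding a second entry, although this sketch would still bump the counter again.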
Task queue (Redis backed)
Ruby based: Resque, http://github.com/resque
Python based: Redis-Queue, http://python-rq.org
• Suits basically anything that can be done asynchronously, outside of the immediate user experience:
  – sending email
  – image or video processing
  – converting or watermarking docs
  – generating (large) reports
  – priming or cleaning caches
  – interacting with external APIs
  – search indexing

Chat and Messaging - Redis
PUBLISH and SUBSCRIBE Redis commands

    SUBSCRIBE "chat:114"
    PUBLISH "chat:114" "Hello all"
    ["message", "chat:114", "Hello all"]
    UNSUBSCRIBE "chat:114"

Redis HA on ElastiCache
Auto-Failover
• Goes to replica with lowest replication lag
• No changes in DNS
[Diagram: asynchronous replication between Availability Zone #1 and Availability Zone #2]
• Writes use the "Primary Endpoint" from the Node Group
• Reads use "replica" endpoints from the Node Group (can use "primary" also)

Takeaways
• Define your access patterns and needs
• Use AWS managed services to offload the undifferentiated heavy lifting of database management
• Pick the right NoSQL tool:
  – ElastiCache-Memcached for key-value caching
  – ElastiCache-Redis for key-value caching and in-memory data structures
  – DynamoDB for storing & indexing key-values and documents