slides - Open Compute Project
Transcription
slides - Open Compute Project
Compute Summit January 16-17, 2013 Santa Clara Friday, January 18, 13 Compute Summit Disaggregated Rack Jason Taylor, PhD Director, Capacity Engineering & Analysis Friday, January 18, 13 Agenda 1 Facebook Scale & Infrastructure 2 Disaggregated Rack Friday, January 18, 13 Facebook Scale 4 domestic regions today. Europe region will come online later this year. Friday, January 18, 13 Facebook Scale 82% of users are outside of the U.S 4 domestic regions today. Europe region will come online later this year. Friday, January 18, 13 Facebook Stats ▪ 1 billion users 300+ million photos added per day ▪ 4.2 billion likes, posts and comments per day ▪ 140+ billion friend connections ▪ 220+ billion photos ▪ 17 billion check-ins ▪ Friday, January 18, 13 Cost and Efficiency ▪ From our 10-Q filed with the SEC in October 2012: ▪ “The first nine months of 2012 ... $1.0 billion for capital expenditures related to the purchase of servers, networking equipment, storage infrastructure, and the construction of data centers.” ▪ At this size, we spend a lot of time thinking about efficiency and costs. Friday, January 18, 13 Architecture Front-End Cluster Web 250 racks Back-End Cluster Service Cluster Friday, January 18, 13 Photos Msg Multifeed 9 racks Other small services Cache (~144TB) Search Ads 30 racks Others UDB ADS-DB Tao Leader Lots of “vanity free” servers. Friday, January 18, 13 Multifeed rack ▪ The rack is our unit of capacity. ▪ ▪ Leaf + agg code runs on all servers ▪ ▪ ▪ All 40 servers work together Leaf L A L A Leaf has most of the the RAM Aggregator uses most of the CPU Lots of network BW within the rack Friday, January 18, 13 Aggregator . . . . L A Life of a “hit” Front-End request starts MC MC Time Back-End Feed agg Web Database MC request completes Friday, January 18, 13 L L L L L L Ads MC Five Standard Servers Standard Systems I Web III Database IV Hadoop V Haystack VI Feed High 2 x E5-‐2670 Med 2 x E5-‐2660 Med 2 x X5650 Low 1 x L5630 High 2 x E5-‐2660 Memory Low 16GB High 144GB Medium 48GB Low 18GB High 144GB Disk Low 250GB High IOPS 3.2 TB Flash High 12 x 3TB SATA High 12 x 3TB SATA Medium 2TB SATA Photos, Video MulVfeed, Search, Ads CPU Services Friday, January 18, 13 Web, Chat Database Hadoop Five Server Types ▪ Advantages: ▪ ▪ ▪ ▪ ▪ Volume pricing Re-purposing Easier operations - simpler repairs, drivers, DC headcount New servers allocated in hours rather than months Drawbacks: ▪ ▪ 40 major services; 200 minor ones - not all fit perfectly Service needs change over time. Friday, January 18, 13 Agenda 1 Facebook Scale & Infrastructure 2 Disaggregated Rack Friday, January 18, 13 Disaggregated Rack Challenge ▪ Can we build hardware that will fit more services and still do well in terms of serviceability and cost? ▪ Can we build hardware that will grow with services over time? Friday, January 18, 13 Server/Service Fit - across services TYPE-6 server TYPE-6 server CPU CPU WASTED CPU RESOURCE RAM MultiFeed Friday, January 18, 13 RAM Other Service A Server/Service Fit - over time TYPE-6 server TYPE-6 server CPU CPU NOT ENOUGH CPU RAM Year 1 Friday, January 18, 13 RAM Year 2 - more CPU needed In-Rack Resources ▪ Building blocks: ▪ ▪ ▪ CPU RAM (key/value pairs) Common resource pairs: ▪ ▪ ▪ ▪ ▪ Disk IOPS Disk space ▪ Growth resources : ▪ ▪ ▪ Flash IOPS Flash space Friday, January 18, 13 CPU vs RAM RAM vs Disk IOPS RAM vs Flash IOPS ▪ ▪ RAM Disk space Flash space Disaggregated Rack ▪ How can we build hardware that is highly configurable and re-configurable but still cost effective? Friday, January 18, 13 A rack of multifeed servers... Network Switch Type-6 Server 80 processors 640 cores COMPUTE Type-6 Server . . . 5.8 TB RAM 80 TB STORAGE => Type-6 Server Type-6 Server Type-6 Server Friday, January 18, 13 30 TB FLASH 40 Feed servers per rack each server with: 2 x E5-2660 144GB RAM 2TB hard drives 760GB of flash * We assume full line-rate network within the rack. Compute ▪ Hardware ▪ ▪ ▪ ▪ ▪ 2 processors 8 or 16 DIMM slots no hard drive - small flash boot partition. big NIC - 10 Gbps or more This is essentially a standard server with network boot. Friday, January 18, 13 Ram Sled ▪ Hardware ▪ ▪ ▪ Performance ▪ ▪ 128GB to 512GB compute: FPGA, ASIC, mobile processor or desktop processor 450k to 1 million key/value gets/sec Cost ▪ Excluding RAM cost: $500 to $700 or a few dollars per GB. Friday, January 18, 13 Storage Sled (Knox) ▪ Hardware ▪ ▪ ▪ Performance ▪ ▪ 15 drives Replace SAS expander w/ small server 3k IOPS Cost ▪ Excluding drives: $500 to $700 or less than $0.01 per GB. Friday, January 18, 13 Flash Sled ▪ Hardware ▪ ▪ Performance ▪ ▪ 500GB to 8TB of flash 600k IOPS Cost ▪ Excluding flash cost: $500 to $700 Friday, January 18, 13 A disaggregated rack for graph search... Network Switch 40 processors 320 cores Compute COMPUTE 20 Compute Servers . . Compute Flash Sled 3.1 TB => RAM 8 Flash Sleds 2 RAM Sleds 60 TB STORAGE 1 Storage Sled RAM Sled . . Storage Sled Friday, January 18, 13 30 TB FLASH => 1:10 RAM:Flash ratio * Add 4 more flash sleds in 2014 to get to a 1:15 RAM:Flash ratio * Virtualization? ▪ Strengths: ▪ ▪ ▪ ▪ Good for heavily-idle workloads Outsourcing datacenter/server ops Unpredictable workloads Potential issues: ▪ ▪ ▪ Overhead Shared bottlenecks under high load The networking can be expensive Friday, January 18, 13 Disaggregated Rack ▪ Strengths: ▪ ▪ ▪ ▪ ▪ ▪ Volume pricing, serviceability, etc. Custom Configurations Hardware evolves with service Smarter Technology Refreshes Speed of Innovation Potential issues: ▪ ▪ Physical changes required Interface overhead Friday, January 18, 13 Questions? Friday, January 18, 13 Compute Summit January 16-17, 2013 Santa Clara Friday, January 18, 13