slides - Open Compute Project

Transcription

Compute Summit
January 16-17, 2013
Santa Clara
Compute Summit
Disaggregated Rack
Jason Taylor, PhD
Director, Capacity Engineering & Analysis
Agenda
1. Facebook Scale & Infrastructure
2. Disaggregated Rack
Facebook Scale
4 domestic regions today. Europe region will come online later this year.
Facebook Scale
82% of users are outside of the U.S.
4 domestic regions today. Europe region will come online later this year.
Facebook Stats
▪ 1 billion users
▪ 300+ million photos added per day
▪ 4.2 billion likes, posts and comments per day
▪ 140+ billion friend connections
▪ 220+ billion photos
▪ 17 billion check-ins
Cost and Efficiency
▪ From our 10-Q filed with the SEC in October 2012:
  ▪ “The first nine months of 2012 ... $1.0 billion for capital expenditures related to the purchase of servers, networking equipment, storage infrastructure, and the construction of data centers.”
▪ At this size, we spend a lot of time thinking about efficiency and costs.
Architecture
[Cluster diagram]
▪ Front-End Cluster: Web (250 racks), Ads (30 racks), Cache (~144TB), Multifeed (9 racks), other small services
▪ Service Cluster: Search, Photos, Msg, Others
▪ Back-End Cluster: UDB, ADS-DB, Tao Leader
Lots of “vanity free” servers.
Multifeed rack
▪ The rack is our unit of capacity.
▪ All 40 servers work together.
▪ Leaf + agg code runs on all servers.
▪ Leaf has most of the RAM.
▪ Aggregator uses most of the CPU.
▪ Lots of network BW within the rack.
[Diagram: the rack's servers, each running a Leaf (L) and an Aggregator (A).]
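The leaf/aggregator split above maps to a simple fan-out-and-merge pattern. Below is a minimal Python sketch of that idea; the names (leaf_shards, leaf_query, aggregate_feed) and data are invented for illustration and are not Facebook's Multifeed code. Each server's leaf role serves candidate stories from its in-RAM shard, and the aggregator role fans a request out to all 40 leaves and does the CPU-heavy merge and ranking.

```python
# Minimal sketch of the leaf/aggregator pattern (invented names and data;
# not Facebook's actual Multifeed implementation).
from concurrent.futures import ThreadPoolExecutor
import heapq
import random

NUM_LEAVES = 40          # one leaf per server in the rack
RESULTS_PER_LEAF = 20    # candidate stories each leaf returns

# Stand-in for the in-RAM index each leaf holds for its shard of activity.
leaf_shards = [
    [(random.random(), f"story-{leaf}-{i}") for i in range(1000)]
    for leaf in range(NUM_LEAVES)
]

def leaf_query(leaf_id, user_id):
    """Leaf role: return this shard's top candidate (score, story) pairs."""
    return heapq.nlargest(RESULTS_PER_LEAF, leaf_shards[leaf_id])

def aggregate_feed(user_id, top_k=40):
    """Aggregator role: fan out to every leaf, then merge and rank (CPU-heavy)."""
    with ThreadPoolExecutor(max_workers=NUM_LEAVES) as pool:
        partials = pool.map(lambda i: leaf_query(i, user_id), range(NUM_LEAVES))
    return heapq.nlargest(top_k, (s for part in partials for s in part))

print(aggregate_feed(user_id=123)[:3])
```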
Life of a “hit”
[Timeline diagram, from “request starts” to “request completes”: the Front-End Web server makes repeated memcache (MC) lookups; the Back-End Feed aggregator fans out to its leaf servers (L); the Ads service and the Database are also consulted along the way.]
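As a rough, self-contained illustration of that flow, here is a toy Python sketch; every helper and value below is invented, and the real request path touches far more services. The web tier answers reads from memcache with a database fallback, then gathers the feed and ads pieces before completing the request.

```python
# Toy walk-through of the request flow (invented helpers and data).
memcache = {}                                   # front-end cache tier (MC)
database = {"profile:123": {"name": "example user"}}

def cache_get(key):
    """MC lookup with database fallback and cache fill (cache-aside)."""
    if key not in memcache:
        memcache[key] = database.get(key)       # Web -> Database on a miss
    return memcache[key]

def feed_backend(user_id):
    """Stand-in for the feed aggregator fan-out to leaf servers."""
    return [f"story-{i}" for i in range(10)]

def ads_backend(user_id):
    """Stand-in for the ads back-end."""
    return ["ad-1", "ad-2"]

def serve_request(user_id):
    profile = cache_get(f"profile:{user_id}")   # Web -> MC (-> Database)
    return {"profile": profile,
            "feed": feed_backend(user_id),      # Web -> Feed agg -> leaves
            "ads": ads_backend(user_id)}        # Web -> Ads

print(serve_request(123))
```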
Five Standard Servers
Standard Systems:
▪ I   Web:      CPU: High (2 x E5-2670)   Memory: Low (16GB)     Disk: Low (250GB)               Services: Web, Chat
▪ III Database: CPU: Med (2 x E5-2660)    Memory: High (144GB)   Disk: High IOPS (3.2 TB Flash)  Services: Database
▪ IV  Hadoop:   CPU: Med (2 x X5650)      Memory: Medium (48GB)  Disk: High (12 x 3TB SATA)      Services: Hadoop
▪ V   Haystack: CPU: Low (1 x L5630)      Memory: Low (18GB)     Disk: High (12 x 3TB SATA)      Services: Photos, Video
▪ VI  Feed:     CPU: High (2 x E5-2660)   Memory: High (144GB)   Disk: Medium (2TB SATA)         Services: Multifeed, Search, Ads
Five Server Types
▪ Advantages:
  ▪ Volume pricing
  ▪ Re-purposing
  ▪ Easier operations - simpler repairs, drivers, DC headcount
  ▪ New servers allocated in hours rather than months
▪ Drawbacks:
  ▪ 40 major services; 200 minor ones - not all fit perfectly
  ▪ Service needs change over time.
Agenda
1. Facebook Scale & Infrastructure
2. Disaggregated Rack
Disaggregated Rack Challenge
▪ Can we build hardware that will fit more services and still do well in terms of serviceability and cost?
▪ Can we build hardware that will grow with services over time?
Server/Service Fit - across services
[Diagram: two TYPE-6 servers side by side. Multifeed uses the full CPU and RAM of its server; Other Service A fills the RAM but leaves much of the CPU as wasted resource.]
Server/Service Fit - over time
[Diagram: the same TYPE-6 server in Year 1 and Year 2. In Year 1 the service fits; by Year 2 more CPU is needed than the server provides (“NOT ENOUGH CPU”), while the RAM still fits.]
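One way to see the cost of a poor fit is to size a hypothetical service against the fixed Type-6 configuration. The Python sketch below uses made-up demand numbers (servers_needed and both workload figures are purely illustrative): whenever one resource, here RAM, drives the purchase, the other resource is stranded.

```python
# Illustrative fit calculation against a fixed Type-6 configuration
# (demand numbers are invented; core count assumes 8-core E5-2660 parts).
TYPE6_CORES = 16
TYPE6_RAM_GB = 144

def servers_needed(ram_gb_needed, cores_needed):
    by_ram = -(-ram_gb_needed // TYPE6_RAM_GB)      # ceiling division
    by_cpu = -(-cores_needed // TYPE6_CORES)
    n = max(by_ram, by_cpu)                         # buy for the binding resource
    return n, n * TYPE6_CORES - cores_needed, n * TYPE6_RAM_GB - ram_gb_needed

# "Other Service A": lots of RAM demand, little CPU demand.
n, wasted_cores, wasted_ram = servers_needed(ram_gb_needed=14_400, cores_needed=400)
print(n, wasted_cores, wasted_ram)   # 100 servers, 1200 idle cores, 0 GB spare RAM
```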
In-Rack Resources
▪ Building blocks:
  ▪ CPU
  ▪ RAM (key/value pairs)
  ▪ Disk IOPS
  ▪ Disk space
  ▪ Flash IOPS
  ▪ Flash space
▪ Common resource pairs:
  ▪ CPU vs RAM
  ▪ RAM vs Disk IOPS
  ▪ RAM vs Flash IOPS
▪ Growth resources:
  ▪ RAM
  ▪ Disk space
  ▪ Flash space
Disaggregated Rack
▪ How can we build hardware that is highly configurable and re-configurable but still cost effective?
A rack of multifeed servers...
[Diagram: a network switch plus 40 Type-6 Feed servers per rack, each server with 2 x E5-2660, 144GB RAM, 2TB hard drives, and 760GB of flash.]
Rack totals: 80 processors / 640 cores of COMPUTE, 5.8 TB of RAM, 80 TB of STORAGE, 30 TB of FLASH.
* We assume full line-rate network within the rack.
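For reference, those rack totals follow directly from the per-server figures on this slide; the only assumptions in the sketch below are that each E5-2660 is an 8-core part and that the totals use decimal terabytes.

```python
# Arithmetic behind the rack totals (per-server specs from the slide;
# assumes 8-core E5-2660 parts and decimal terabytes).
SERVERS = 40
print("processors:", SERVERS * 2)            # 80
print("cores:     ", SERVERS * 2 * 8)        # 640
print("RAM (TB):  ", SERVERS * 144 / 1000)   # 5.76 -> ~5.8 TB
print("disk (TB): ", SERVERS * 2)            # 80 TB
print("flash (TB):", SERVERS * 760 / 1000)   # 30.4 -> ~30 TB
```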
Compute
▪ Hardware
  ▪ 2 processors
  ▪ 8 or 16 DIMM slots
  ▪ no hard drive - small flash boot partition
  ▪ big NIC - 10 Gbps or more
▪ This is essentially a standard server with network boot.
Ram Sled
▪ Hardware
  ▪ 128GB to 512GB
  ▪ compute: FPGA, ASIC, mobile processor or desktop processor
▪ Performance
  ▪ 450k to 1 million key/value gets/sec
▪ Cost
  ▪ Excluding RAM cost: $500 to $700, or a few dollars per GB.
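To put the gets/sec figure in perspective: the per-get work on a RAM sled is little more than a hash lookup plus a small reply on the wire, which is why FPGA-, ASIC-, or mobile-class compute can keep up. The toy benchmark below only times the in-memory lookup half against a plain Python dict and ignores NIC and protocol overhead entirely; it is an illustration, not a model of the sled.

```python
# Toy lookup-only benchmark (plain dict; no network or protocol cost).
import time

store = {f"key:{i}": b"x" * 64 for i in range(100_000)}   # toy key/value store

N = 1_000_000
start = time.perf_counter()
for i in range(N):
    _ = store[f"key:{i % 100_000}"]
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} in-memory gets/sec (lookup only)")
```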
Storage Sled (Knox)
▪ Hardware
  ▪ 15 drives
  ▪ Replace SAS expander w/ small server
▪ Performance
  ▪ 3k IOPS
▪ Cost
  ▪ Excluding drives: $500 to $700, or less than $0.01 per GB.
Flash Sled
▪ Hardware
  ▪ 500GB to 8TB of flash
▪ Performance
  ▪ 600k IOPS
▪ Cost
  ▪ Excluding flash cost: $500 to $700
A disaggregated rack for graph search...
[Diagram: a network switch plus 20 Compute Servers, 2 RAM Sleds, 1 Storage Sled, and 8 Flash Sleds per rack.]
Rack totals: 40 processors / 320 cores of COMPUTE, 3.1 TB of RAM, 60 TB of STORAGE, 30 TB of FLASH => 1:10 RAM:Flash ratio.
* Add 4 more flash sleds in 2014 to get to a 1:15 RAM:Flash ratio. *
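The RAM:Flash ratios on this slide are simple arithmetic over the totals above. In the sketch below, the per-sled flash capacity (~3.75 TB) is inferred from the 8-sled / 30 TB figures rather than stated on the slide.

```python
# Ratio arithmetic for the graph-search rack (per-sled flash capacity is
# inferred from 8 sleds -> 30 TB; RAM total taken from the slide).
RAM_TB = 3.1
FLASH_PER_SLED_TB = 30 / 8

for sleds in (8, 12):        # initial build-out, then +4 flash sleds in 2014
    flash_tb = sleds * FLASH_PER_SLED_TB
    print(f"{sleds} flash sleds: {flash_tb:.1f} TB flash, "
          f"RAM:Flash ~ 1:{flash_tb / RAM_TB:.0f}")
# -> 1:10 with 8 sleds, 1:15 with 12 sleds
```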
Virtualization?
▪ Strengths:
  ▪ Good for heavily-idle workloads
  ▪ Outsourcing datacenter/server ops
  ▪ Unpredictable workloads
▪ Potential issues:
  ▪ Overhead
  ▪ Shared bottlenecks under high load
  ▪ The networking can be expensive
Disaggregated Rack
▪ Strengths:
  ▪ Volume pricing, serviceability, etc.
  ▪ Custom Configurations
  ▪ Hardware evolves with service
  ▪ Smarter Technology Refreshes
  ▪ Speed of Innovation
▪ Potential issues:
  ▪ Physical changes required
  ▪ Interface overhead
Questions?
Compute Summit
January 16-17, 2013
Santa Clara