2012-03-15 - HPC Lugano - Handout

BUILDING HIGH AVAILABILITY SSD
• Company overview
• Architecture & Performance
• Reliability
• Maximizing SSD
• Q&A
Adam Chunn
15 March 2012
Lugano - Switzerland
1 - 56
Select RamSan Facts…
• The largest SSD installations in production in the world
• Currently operating in 10 major financial exchanges worldwide
• Used today by 7 out of 11 of the world’s largest telecoms
• Installed and in production in over 34 countries
Conducted a financial trade • Sent a text message • Shopped online • Placed an online bet • Used pre-paid wireless • Booked a cruise or flight • Gamed online • Used an ATM
…RamSan is Everywhere
2 - 56
Select RamSan Facts…
• The largest SSD installations in production in the world
• Currently operating in 10 major financial exchanges worldwide
• Used today by 7 out of 11 of the world’s largest telecoms
• Installed and in production in over 35 countries
Conducted a financial trade • Sent a text message • Shopped online • Placed an online bet • Used pre-paid wireless • Booked a cruise or flight • Gamed online • Used an ATM
…RamSan is Everywhere
3 - 56
Background on TMS
• Solid State Storage Leader: industry’s highest performance, highest reliability, lowest latency, lowest power SSD solutions
• Deep Domain Expertise: 33 years of experience designing SSDs; 30+ patents granted and pending; many trade secrets
• Global Enterprise Customers: growing enterprise customer base in over 34 countries
• Strong Financial Performance: no venture capital/long-term debt
• World Class Team: strong management and engineering teams; over 400 man-years of SSD experience
4 - 56
Key References
See all of these and more in the Success Stories section of our website at www.ramsan.com.
5 - 56
ARCHITECTURE & PERFORMANCE
6 - 56
L = λW
The long-term average number of customers in a stable system, L, is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W.¹
Above is Little’s Law, which is just a fancy way to say that performance is based on latency and parallelism.
¹ Paraphrased from Little’s Law, John D.C. Little and Stephen C. Graves, MIT
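Little’s Law can be turned around for a storage device: throughput (λ, in IOPS) equals the number of I/Os in flight (L) divided by the time each I/O spends in the system (W). The following Python sketch only illustrates that arithmetic; the function names and the example queue depths and latencies are hypothetical, not TMS measurements.

```python
def iops(outstanding_ios: float, latency_us: float) -> float:
    """Little's Law solved for throughput: lambda = L / W.

    outstanding_ios: average number of I/Os in flight (L)
    latency_us:      average time each I/O spends in the system (W), in microseconds
    Returns I/O operations per second (lambda).
    """
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return outstanding_ios * 1_000_000 / latency_us


def required_parallelism(target_iops: float, latency_us: float) -> float:
    """Little's Law solved for concurrency: L = lambda * W."""
    return target_iops * latency_us / 1_000_000


# Hypothetical figures, not device specifications.
print(iops(32, 100))                      # 320000.0
print(iops(32, 50))                       # 640000.0: halving latency doubles IOPS
print(required_parallelism(1_000_000, 100))  # 100.0 I/Os must be in flight
```

The two levers on the slide map directly onto the two arguments: lower latency (W) or more I/Os in flight (L) both raise throughput.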
7 - 56
Flash Controller Design Basics
[Diagram: FLASH Media, Flash Controller FPGA, Lookup Tables, Write Buffer, CPU, RAM, I/O Interface]
• Each controller handles 10 flash chips
• The Lookup Tables and Write Buffer are RAM accessible from the controller only
• The I/O Interface and controller are both separate FPGAs
• The CPU is an embedded processor that handles all out-of-band operations
• DMAs are all processed completely in FPGA hardware
8 - 56
DMAs are hardware only
[Diagram: FLASH Media, Flash Controller FPGA, Lookup Tables, Write Buffer, CPU, RAM, I/O Interface]
• DMAs are all processed completely in FPGA hardware
9 - 56
Decreasing Latency
The Embedded CPU
[Diagram: FLASH Media, Flash Controller FPGA, Lookup Tables, Write Buffer, CPU, RAM, I/O Interface]
• Remove all non-critical flash memory bookkeeping from the DMA path:
– Write setup
– Garbage collection
– Error handling
– Health calculation
– Wear leveling
– Statistics collection
– Formatting
– Backup/Restore
– Key generation
10 - 56
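In software terms, the rule above is a fast path that only moves data while bookkeeping is handed to a separate worker. The sketch below is a loose Python analogy under that assumption (TMS implements the split in FPGA hardware plus an embedded CPU, not in threads); `FlashDevice`, `write`, `drain` and the task names are hypothetical.

```python
import queue
import threading


class FlashDevice:
    """Toy model: the I/O path only stores data; bookkeeping is queued
    for a background 'embedded CPU' thread (hypothetical analogy)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.housekeeping: queue.Queue = queue.Queue()
        self.log: list[str] = []
        worker = threading.Thread(target=self._background, daemon=True)
        worker.start()

    def write(self, lba: int, data: bytes) -> None:
        # Critical path: store the data and return immediately.
        self.blocks[lba] = data
        # Non-critical work is deferred instead of done inline.
        self.housekeeping.put(("statistics", lba))
        self.housekeeping.put(("health", lba))

    def _background(self) -> None:
        # Out-of-band worker: drains bookkeeping tasks in FIFO order.
        while True:
            task, lba = self.housekeeping.get()
            self.log.append(f"{task}:{lba}")
            self.housekeeping.task_done()

    def drain(self) -> None:
        """Block until every deferred task has been processed."""
        self.housekeeping.join()


dev = FlashDevice()
for lba in range(3):
    dev.write(lba, b"\x00" * 4096)
dev.drain()
print(len(dev.log))  # 6 deferred bookkeeping tasks
```

The caller of `write` never waits for statistics or health updates; those costs land on the worker, which is the point of taking them off the DMA path.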
Increasing Parallelism
[Diagram: FLASH Media, Flash Controller FPGA, Lookup Tables, Write Buffer, CPU, RAM, I/O Interface]
• Increase the number of flash chips that can run concurrently
• This is done by increasing the number of flash chip controllers
• Each TMS flash chip controller can do 36 4KB DMAs in parallel
– (40 if you include the background chip RAID, or VSR, operations)
• A RamSan-70 has 8 controllers, so it can do 288 4KB operations simultaneously
• A RamSan-810 has 40 controllers, so it can do 1440 4KB operations simultaneously
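The parallelism figures above are a simple product of controller count and DMAs per controller. This small sketch reproduces that arithmetic; the helper name is hypothetical, while the 36 and 40 DMA counts and the controller counts come from the slide.

```python
DMAS_PER_CONTROLLER = 36  # 4KB DMAs per TMS flash controller (from the slide)


def concurrent_4k_ops(controllers: int, dmas_per_controller: int = DMAS_PER_CONTROLLER) -> int:
    """Total 4KB operations that can be in flight at once."""
    return controllers * dmas_per_controller


print(concurrent_4k_ops(8))       # 288  (RamSan-70)
print(concurrent_4k_ops(40))      # 1440 (RamSan-810)
print(concurrent_4k_ops(8, 40))   # 320  (RamSan-70, counting background VSR operations)
```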
11 - 56
L = λW
So, what else affects Latency and Parallelism?
12 - 56
L = λW
What else affects Latency?
• CPU speed
– not number of cores
– not number of chips
• Bus architecture
– North/south bridges
– PCIe hierarchy
– PCIe controller
• CPU usage (so, in a roundabout way, core and chip counts do matter)
13 - 56
L = λW
What else affects Latency?
• Operating system and file system
– OSes and file systems optimized for disks tend to rely on slow data access to hide processing time
– Modern OSes and file systems are now written to get the most out of SSDs
• The driver, the bridge between the OS and the hardware
– It must be thin, or else it adds latency
– Linux, Windows, Solaris, VMware, OS X, AIX
– We are actively trying to push the driver into the Linux kernel
• If measuring at the application layer, middleware (for example, databases) can add latency
14 - 56
L = λW
What else affects Parallelism?
• Large blocks
– RamSan products break large-block DMAs apart into multiple, parallel DMAs
– For example, a 64kB DMA is converted into 16 parallel 4kB DMAs
• A single application can be written either with multiple threads of synchronous I/O or with a single thread that keeps multiple asynchronous I/Os outstanding
– Most high-performance middleware does just this (such as Microsoft SQL Server, Oracle, et cetera)
• Running multiple applications can provide the same effect as a single application running multiple threads
– The CPU becomes more and more of a bottleneck, however
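The large-block split and the single-thread asynchronous I/O pattern can be sketched together in Python. This assumes 64kB means 64 KiB (65,536 bytes); `split_dma`, `read_piece` and `read_large` are hypothetical names, and `read_piece` only simulates a device read with `asyncio.sleep(0)` rather than touching real hardware.

```python
import asyncio

CHUNK = 4 * 1024  # 4 KiB, the unit the slide says large DMAs are split into


def split_dma(offset: int, length: int, chunk: int = CHUNK) -> list[tuple[int, int]]:
    """Split one large request into chunk-sized (offset, length) pieces
    that can be issued in parallel. The final piece may be shorter."""
    pieces = []
    end = offset + length
    pos = offset
    while pos < end:
        size = min(chunk, end - pos)
        pieces.append((pos, size))
        pos += size
    return pieces


async def read_piece(offset: int, length: int) -> tuple[int, int]:
    # Stand-in for a real asynchronous device read (hypothetical).
    await asyncio.sleep(0)
    return (offset, length)


async def read_large(offset: int, length: int) -> list[tuple[int, int]]:
    # One thread, many outstanding I/Os: all pieces are in flight together.
    pieces = split_dma(offset, length)
    return await asyncio.gather(*(read_piece(o, n) for o, n in pieces))


results = asyncio.run(read_large(0, 64 * 1024))
print(len(results))  # 16
```

`asyncio.gather` returns results in submission order, so the pieces can be reassembled without extra bookkeeping.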
15 - 56
CSCS Benchmark
• CSCS = Swiss National Supercomputing Centre
• Independent evaluation of PCIe SSDs
• RamSan-70 results:
– “…by far the best IOPS result we have ever measured…” (300K+ random 4K IOPS)
– “Unlike the FusionIO and Virident TachIOn devices, the bandwidth is almost independent of block size…”
16 - 56
RamSan Flash Product Portfolio
• RamSan-70: SLC Flash; 900GB; 1.2M IOPS; 2.5GB/s; full-height, half-length PCIe x8 2.0 card; single-server apps and distributed filesystems
• RamSan-710/810: SLC/eMLC Flash; 5/10TB; 400K/320K IOPS; 5/4GB/s; 1U rackmount, 4x IB or FC ports
• RamSan-720/820: SLC/eMLC Flash; 12/24TB; 500K/450K IOPS; 5/4GB/s; 1U rackmount, 4x IB or FC ports
• RamSan-630: SLC Flash; 10TB; 1M IOPS; 10GB/s; 3U rackmount, 10x IB or FC ports
Rackmount systems: clustered server apps; shared-storage filesystems (GPFS, GFS2, etc.)
17 - 56
SPC Price/Performance Leader
[Charts: Top 10 SPC-1 IOPS™ (SPC-1 IOPS™ / Total TSC Price (USD) vs. SPC-1 IOPS™) and Top 10 SPC-2 MBPS™ (SPC-2 MBPS™ x1k / Total TSC Price (USD) vs. SPC-2 MBPS™), highlighting the TMS RamSan-400 and RamSan-630]
18 - 56
Keys to Performance
• Hardware-only Data Path
– FPGA & Hardware Logic
– Faster than software-shared memory
• Software cannot add performance
– Virtualization is a software overhead for utilizing additional hardware
– QoS is a software overhead to give one application priority over another on shared hardware
19 - 56
RELIABILITY
20 - 56
Flash Quality
• Flash type matters!
– SLC in most RamSans
– Enterprise MLC (eMLC) in RamSan-8x0
• SLC is best but most expensive/least dense
• eMLC chips last 10x longer than normal MLC
• TMS technologies like Variable Stripe RAID™ lengthen system life
[Chart: Typical Chip Endurance, P/E Cycles (Thousands), by flash type: MLC, eMLC, SLC]
21 - 56
Combat Endurance
• Endurance of the system is calculated as:
$$\text{Endurance} = \frac{\text{Flash Capacity} \times \text{Flash Quality}}{\text{Media Write Bandwidth}}$$
22 - 56
Combat Endurance
5TB RamSan-710 (SLC Flash)
$$\frac{5\,\text{TB} \times 100{,}000}{1\,\text{GB/s}} = 15.8\ \text{years endurance}$$
10TB RamSan-810 (eMLC Flash)
$$\frac{10\,\text{TB} \times 30{,}000}{1\,\text{GB/s}} = 9.5\ \text{years endurance}$$
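The endurance formula is easy to check numerically. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB/s = 10^9 bytes per second) and a 365.25-day year, and, like the slide, it ignores write amplification and overprovisioning; `endurance_years` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
TB, GB, MB = 10**12, 10**9, 10**6  # decimal units (assumption)


def endurance_years(capacity_bytes: float, pe_cycles: int, write_bytes_per_s: float) -> float:
    """(capacity x P/E cycles) / sustained write bandwidth, in years.
    Ignores write amplification and overprovisioning (simplifying assumption)."""
    total_writable_bytes = capacity_bytes * pe_cycles
    return total_writable_bytes / write_bytes_per_s / SECONDS_PER_YEAR


print(round(endurance_years(5 * TB, 100_000, 1 * GB), 1))  # 15.8 (RamSan-710, SLC)
print(round(endurance_years(10 * TB, 30_000, 1 * GB), 1))  # 9.5  (RamSan-810, eMLC)
# MLC case: 1TB x 3,000 at 500 MBps, expressed in days
print(round(endurance_years(1 * TB, 3_000, 500 * MB) * 365.25))  # 69
```

Both slide figures are reproduced, and the MLC case comes out at roughly 69 days of sustained writing, consistent with "less than a year".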
23 - 56
Combat Endurance
• Fight endurance with increased capacity
• eMLC has 2x the capacity for the same cost
– 2/3 of the endurance of SLC
• MLC is rated for 3,000 writes, whereas eMLC is rated for 30,000 writes
• MLC is ~1/4 the price of eMLC storage
– Sustained writes do not make sense for MLC
– MLC will last less than a year under sustained writes, at the same cost and half the write workload
$$\frac{1\,\text{TB} \times 3{,}000}{500\,\text{MB/s}} < 1\ \text{year}$$
24 - 56
Flash Problems and TMS Solutions
Problem → Solution
• Limited write-erase cycles → Wear leveling
• Bit errors → ECC
• Block/plane/device failures → Block remapping, RAID, Variable Stripe RAID™
• Disturb errors → Voltage and timing adjustments (read, write, erase)
• Erases need big blocks and take a long time → Overprovisioning
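Wear leveling, the first solution in the table, can be illustrated with a minimal allocator that always hands out the free block with the fewest erases. This is a generic textbook-style sketch, not TMS’s algorithm; the class name, the four-block device and the 100 write cycles are hypothetical.

```python
import heapq


class WearLevelingAllocator:
    """Hand out the least-erased free block so erase counts stay even."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (erase_count, block_id) for the free blocks.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)
        self.erase_counts = [0] * num_blocks

    def allocate(self) -> int:
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block: int) -> None:
        # The block is erased before reuse, so its wear grows by one.
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


alloc = WearLevelingAllocator(num_blocks=4)
for _ in range(100):
    b = alloc.allocate()
    alloc.release(b)
print(alloc.erase_counts)  # [25, 25, 25, 25]
```

Without the heap, naively reusing the same free block would put all 100 erases on one block.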
25 - 56
Four Layers of Data Correction
• System-level RAID 5 (RamSan-720/820 only): protects against module failure; managed by centralized RAID controllers
• Module-level Variable Stripe RAID™: protects against sub-chip failure and extends system longevity; managed by each module across its chips
• Module-level RAID 5: protects against chip failure; managed by each module across its chips
• Chip-level ECC: protects against bit and block errors; managed by each module using its chips
RamSan-720/820 introduce system-level RAID 5 across flash modules, plus the other mechanisms found on all RamSan flash storage systems.
26 - 56
Variable Stripe RAID™ (VSR)
• Patented VSR allows RAID stripe sizes to vary.
• If one die fails in a ten-chip stripe, only the failed die is
bypassed, and then data is restriped across the
remaining nine chips.
[Diagram: a stripe across 10 chips of 16 planes each, with one failed element marked FAIL]
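The restriping idea can be sketched with ordinary XOR parity: when a member fails, the stripe is rebuilt over the remaining healthy members instead of retiring the whole group. This is a simplified illustration, not the patented VSR implementation; it works at whole-chip rather than die or plane granularity, keeps parity on a fixed member (real RAID 5 rotates it), and `stripe`, `rebuild` and the 4-byte chunk size are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, chips: list[int], chunk: int) -> dict[int, bytes]:
    """RAID-5-style stripe over whichever chips are healthy: every chip but
    the last gets a data chunk, the last gets the XOR parity."""
    n_data = len(chips) - 1
    if len(data) != n_data * chunk:
        raise ValueError("data must fill exactly one stripe")
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(n_data)]
    layout = dict(zip(chips[:-1], chunks))
    layout[chips[-1]] = reduce(xor_bytes, chunks)
    return layout


def rebuild(layout: dict[int, bytes], lost: int) -> bytes:
    """Recover one member of a stripe by XOR-ing all the others."""
    return reduce(xor_bytes, (v for k, v in layout.items() if k != lost))


CHUNK = 4
all_chips = list(range(10))                   # ten-chip stripe: 9 data + 1 parity
healthy = [c for c in all_chips if c != 3]    # chip 3 has failed and is bypassed
# The stripe shrinks to 9 members (8 data + 1 parity) on the remaining chips.
layout = stripe(bytes(range(8 * CHUNK)), healthy, CHUNK)
print(len(layout), 3 in layout)  # 9 False
```

The narrower stripe still tolerates one further failure, because any member can be recomputed from the others with `rebuild`.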
27 - 56
2D Flash RAID™ (RS-720/820)
[Diagram: External Interfaces (FC, IB) → two Interfaces → two RAID Controllers → Flash Modules]
• RAID 5 within Flash Modules (9 data + 1 parity)
• TMS 2D Flash RAID™: RAID 5 across Flash Modules (10 data + 1 parity + 1 hot spare)
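Because the two RAID layers stack, their capacity costs multiply. This sketch computes only the fraction of raw flash left for data by the two parity layouts on the slide; it ignores overprovisioning, ECC and VSR reserve, so it is not meant to reproduce the usable/raw figures quoted for specific models, and `efficiency` is a hypothetical helper name.

```python
from fractions import Fraction


def efficiency(data: int, parity: int, spares: int = 0) -> Fraction:
    """Fraction of members in one RAID group that hold user data."""
    return Fraction(data, data + parity + spares)


inside_module = efficiency(9, 1)           # RAID 5 within a flash module
across_modules = efficiency(10, 1, 1)      # RAID 5 across modules + hot spare
combined = inside_module * across_modules  # the two layers multiply

print(inside_module, across_modules, combined)  # 9/10 5/6 3/4
```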
28 - 56
RamSan-70 Overview
1. PCIe 2.0 x8
2. PowerPC CPU
3. Xilinx FPGAs
4. 900GB usable SLC Flash (1374GB raw)
5. 4GB DRAM
6. Super-Capacitors
7. Half-length card
• Usable 450-900GB
• 650,000 4K IOPS
• 2.5 GB/s Bandwidth
• 30 µs sustained 4K Write Latency / 100 µs 4K Read Latency
• 10 Years Life Expectancy
• Series-7™ Flash Controller
29 - 56
MAXIMIZING SSD
30 - 56
Segregation of Workload
• Metadata, Working Data, Archived Data
• Metadata is typically accessed the most, but takes up the least space
• Archived Data is accessed the least, but takes up the most space
• Moving high-access data onto a high-performance medium has the greatest impact
But the question is, what data makes sense to store on SSD?
31 - 56
Performance per Capacity
• Historically, TMS has designed DRAM-based SSD devices that deliver GB/sec per GB of storage [Metadata]
• Our flash-based SSD devices deliver GB/sec per TB of storage [Metadata, Working Data]
• Disk-based products typically grossly underperform SSD, but deliver economical performance at well beyond a TB of storage (>>TB) [Archive, Large Working Data]
32 - 56
Algorithm Matrix
• Low CPU Utilization + Low I/O Wait = Algorithm needs to provide more work
• Low CPU Utilization + High I/O Wait = Great fit for SSD!!
• High CPU Utilization + Low I/O Wait = In-memory work
• High CPU Utilization + High I/O Wait = Use asynchronous I/O
Add disks for growing capacity
Add SSD for same size capacity
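The matrix above amounts to a two-threshold classifier over CPU utilization and I/O wait, the kind of figures a monitoring tool such as `iostat` or `vmstat` reports on Linux. This sketch is illustrative only: `recommend` is a hypothetical helper and the 0.5 cut-off between "low" and "high" is an arbitrary assumption, not a TMS guideline.

```python
def recommend(cpu_util: float, io_wait: float, threshold: float = 0.5) -> str:
    """Map CPU utilization and I/O wait (fractions 0.0-1.0) onto the
    Algorithm Matrix. The threshold is a hypothetical choice."""
    high_cpu = cpu_util >= threshold
    high_wait = io_wait >= threshold
    if not high_cpu and not high_wait:
        return "Algorithm needs to provide more work"
    if not high_cpu and high_wait:
        return "Great fit for SSD"
    if high_cpu and not high_wait:
        return "In-memory work"
    return "Use asynchronous I/O"


print(recommend(0.2, 0.9))  # Great fit for SSD
```

The SSD quadrant is the one where the CPU sits idle waiting on storage: cutting W in Little’s Law there converts directly into more completed work.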
33 - 56
Q&A
34 - 56
RamSan-440 Overview
128 - 512 GB capacity
600,000 IOPS
4.5 GB/s throughput
Latency 15 µs
2-8 FC Ports
Industry Firsts:
512GB Non-volatile RAM storage
RAM SSD with Flash backup
RAID protected RAM and Flash modules
TMS patented IO2 Instant-on Input-Output option.
35 - 56
RamSan-440 Architecture
• RAID Protected RAM Boards
• Management Control Processor
• 4 Dual-ported Fibre Channel or InfiniBand Interfaces
• Redundant Batteries
• Hot Swappable Redundant Power Supplies
• Redundant Fans
• RAID Protected Backup Flash
• 4U Chassis
36 - 56
Series-7 Flash Controller Design
• Lookup Tables and Write Buffer: 4 GB RAM Cache
• Best performance: 4K-aligned I/O
• CPU (out of the primary data path): write setup, garbage collection, error handling, and other out-of-data-path activities
• Flash Controller FPGA: processes all of the “IN DATA” activities
• Super Capacitors
• I/O Interface
37 - 56
Memory Backup
RamSan-630 Overview
1-10TB capacity
1 Million IOPS
10 GB/s throughput
Latency 80-250 µs
Highest density SLC Flash SSD system available.
Leverages proven flash core from the RamSan-20
and RamSan-620
Easily shared and multipathed through ten 8 Gbit
Fibre Channel ports or QDR InfiniBand ports
Enterprise Reliability
Single-Level Cell (SLC) Flash
Fault Tolerant Flash (FTF) Architecture
Active Spare Flash
38 - 56
RamSan-630 Architecture
• 5 Dual-ported FC or IB Interfaces
• 1-10TB of SLC Flash Boards
• Management Control Processor
• Redundant Power Supplies
• Redundant Fans
• 3U Chassis
39 - 56
RamSan-630 Flash Board
• RAID-5 Protected Flash: 480 GB usable, 640 GB raw
• Embedded PowerPC
• On-board RAM
• ECC Protected
• Gateway FPGA
• 4 Flash Controllers
40 - 56
Super capacitors
RamSan-710 Overview
1-5 TB Usable capacity (6.8 TB Raw)
400,000 IOPS
5 GB/s throughput
35-175 µs latency
150K+ Write/Erase Cycles per Cell
Highest density SLC Flash SSD system available in a 1U
Series-7™ Flash Controller
Four 8 Gbit Fibre Channel ports or QDR InfiniBand ports
Enterprise reliability
Single-Level Cell (SLC) Flash
Variable Stripe RAID (VSR)™
Active Spare
41 - 56
RamSan-710 Overview
• 4-20 Flash modules + 1 “Active Spare”
• 2 dual-ported 8Gb FC or QDR IB interfaces
• Management control processor
• Redundant power supplies
• 1U chassis
• N+1 batteries
• Redundant fans
42 - 56
RamSan-710 Overview
43 - 56
RamSan-810 Overview
2-10 TB Usable capacity (13.7 TB Raw)
320,000 IOPS
5 GB/s throughput
70-225 µs latency (est.)
30K+ Write/Erase Cycles per Cell
Highest density eMLC Flash SSD system available in a 1U
Series-7™ Flash Controller
Four 8 Gbit Fibre Channel ports or QDR InfiniBand ports
Enterprise reliability
Enterprise Multi-Level Cell (eMLC) Flash
Variable Stripe RAID (VSR)™
Active Spare
44 - 56
RamSan-810 Architecture
• 4-20 Flash modules + 1 “Active Spare”
• 1-2 interface modules
• Management control processor
• Redundant power supplies
• 1U chassis
• N+1 batteries
• Redundant fans
45 - 56
Motherboard
46 - 56
• Toshiba eMLC Flash
• Series-7 Flash Controller FPGAs
• Gateway FPGA
• DDR DRAM
• PowerPC CPU @ 400 MHz
47 - 56
Applications Suited for eMLC
• Data Warehousing
• Web Content Hosting
• Low Bandwidth Log Files
• READ-intensive, low-WRITE applications
For users writing at 600 MB/s, the lifetime of the eMLC RamSan-810 is rated at 10 years*.
– *2TB = 10TB WRITES per day
– *6TB = 30TB WRITES per day
– *10TB = 50TB WRITES per day
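The daily write figures can be cross-checked against the 600 MB/s rating by converting TB per day into a sustained rate and into full-drive writes per day. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB, MB = 10**12, 10**6  # decimal units (assumption)


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily write volume into a sustained write rate."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB


def drive_writes_per_day(tb_per_day: float, capacity_tb: float) -> float:
    """How many times the full usable capacity is written each day."""
    return tb_per_day / capacity_tb


for capacity, per_day in [(2, 10), (6, 30), (10, 50)]:
    print(capacity, drive_writes_per_day(per_day, capacity),
          round(tb_per_day_to_mb_per_s(per_day)))
# 2 5.0 116
# 6 5.0 347
# 10 5.0 579
```

Every configuration corresponds to five full-capacity writes per day, and the 10TB case works out to about 579 MB/s, in line with the roughly 600 MB/s quoted on the slide.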
48 - 56
RamSan-70 Overview
49 - 56
RamSan-70 Architecture
1. PCIe 2.0 x8
2. 900GB usable SLC Flash (1374GB raw)
3. PowerPC CPU, 333 MHz
4. Xilinx FPGAs
5. 4GB DRAM
6. Super-Capacitors
7. Half-length card
• 450-900GB
• 650,000 IOPS (4K)
• 2.5 GB/s Bandwidth
• 30 µs Write Latency
• 10 Years Life Expectancy (25% writes)
• Series-7™ Flash Controller
50 - 56
RamSan-720 Overview
6 or 12 TB Usable capacity (~ 7.8 or ~15.6 TB Raw)
500,000 IOPS (4K)
5 GB/s throughput
<100µs latency
No Single Point of Failure (nSPoF)
Hot Swappable Flash Cards
Highest density SLC Flash SSD system available in a 1U
Series-7™ Flash Controller
Four 8 Gbit Fibre Channel ports or QDR InfiniBand ports
High Enterprise reliability
Single-Level-Cell (SLC) Flash
Variable Stripe RAID (VSR)™
2D Flash RAID™
51 - 56
RamSan-820 Overview
12 or 24TB Usable capacity (~ 15.6 or ~31.2 TB Raw)
450,000 IOPS (4K)
5 GB/s throughput
<100µs latency
No Single Point of Failure (nSPoF)
Hot Swappable Flash Cards
Highest density eMLC Flash SSD system available in a 1U
Series-7™ Flash Controller
Four 8 Gbit Fibre Channel ports or QDR InfiniBand ports
High Enterprise reliability
Enterprise Multi-Level Cell (eMLC) Flash
Variable Stripe RAID (VSR)™
2D Flash RAID™
52 - 56
RamSan-Green IT
53 - 56
Bandwidth
Latency
Speed & IOPS
HAHNSTÄTTEN • MÜNCHEN
CONFIDENTIAL • COPYRIGHT BY PSP
World’s Fastest Storage Since 1978
Pure SSD-Racepower
Thanks for your attention…
56 - 56