Accelerate MySQL with SanDisk Memory File System NVMFS
Thomas Rochner - DOAG 2015

A Global Leader in Flash Storage Solutions

Financial Strength*
• $14.65 Billion Market Cap
• $6.63 Billion Revenue
• $2 Billion Cash
• $852 Million invested in R&D
• 65% Commercial; 35% Retail

Enterprise SSDs & Storage Software: qualified at 8 of the 9 top Server & Storage OEMs
SanDisk Ventures: $75 Million for Strategic Investments
Vertically Integrated: Design, Development, Manufacturing, Applications in the Marketplace
SanDisk supplies Client & Retail SSDs to all leading PC manufacturers
4th Most Valuable IP
• 5,000 Patents
• Thomson Reuters Top 100 Global Innovator (4th consecutive year)
All leading Smartphone & Tablet manufacturers use SanDisk
* Annual financials as of Q4 2014

The Broadest Portfolio of Enterprise Flash Products
• SAS SSDs
• SATA SSDs
• PCIe SSDs
• Ultra-Low Latency SSDs
• Flash-Optimizing Software
• "Flash Appliances"
• 4 TB 2.5" SSD (Optimus series)
• InfiniFlash™

Latency is important!
Latencies multiplied by 10,000,000:

L1-L3 cache   10 ns     Blink of an eye (1/10 second)
DRAM          100 ns    Heartbeat (1 second)
ioMemory      15 µs     Get coffee (2.5 minutes)
SSD           500 µs    Football game (90 minutes)
HDD           4 ms      Fly to South America (12 hours)

SanDisk PCIe Server Products
• ioMemory SX300/PX600, from 1 TB to 6.4 TB
• Mezzanine and PCIe Application Accelerators
• ioMemory SX350

Capacity, Performance, and Endurance

PX600 (5 years or maximum endurance used; read/write access latency 92 µs/15 µs)
  Capacity          Endurance   4K R/W IOPS   Read/Write (GB/s)
  1 TB ioDrive      12 PBW      196K/330K     2.7/1.5
  1.3 TB ioDrive    16 PBW      235K/375K     2.7/1.7
  2.6 TB ioDrive    32 PBW      350K/385K     2.7/2.2
  5.2 TB ioDrive    64 PBW      285K/385K     2.7/2.1

SX300 (3 years or maximum endurance used; read/write access latency 92 µs/15 µs)
  Capacity          Endurance   4K R/W IOPS   Read/Write (GB/s)
  1.3 TB ioScale    4 PBW       196K/330K     2.7/1.5
  1.6 TB ioScale    5.5 PBW     235K/375K     2.7/1.7
  3.2 TB ioScale    11 PBW      350K/385K     2.7/2.2
  6.4 TB ioScale    22 PBW      285K/385K     2.7/2.1

SX350 (3 years or maximum endurance used; read/write access latency 79 µs/15 µs)
  Capacity          Endurance   4K R/W IOPS   Read/Write (GB/s)
  1.3 TB ioScale    4 PBW       225K/345K     2.8/1.3
  1.6 TB ioScale    5.5 PBW     270K/375K     2.8/1.7
  3.2 TB ioScale    11 PBW      350K/385K     2.8/2.2
  6.4 TB ioScale    22 PBW      340K/385K     2.8/2.2

* Write bandwidth achieved with optional high power mode. Maximum write bandwidth of 1.6 GB/s is achievable with a 25 W power limit.
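To put the PBW (petabytes written) ratings on a more familiar scale, they can be spread over the warranty period. The sketch below converts a few rows of the table into full-drive writes per day (DWPD). DWPD is not quoted on the slide; the sketch assumes decimal units (1 PB = 1000 TB) and writes spread evenly over the whole warranty.

```python
# Convert a rated endurance in petabytes written (PBW) into full-drive
# writes per day (DWPD) over the warranty period.
# Assumptions: decimal units (1 PB = 1000 TB) and writes spread evenly
# over the whole warranty; DWPD itself is not a figure from the slide.

DAYS_PER_YEAR = 365


def drive_writes_per_day(pbw: float, capacity_tb: float, warranty_years: int) -> float:
    """Average number of full-capacity writes per day that the rating allows."""
    total_tb = pbw * 1000                      # petabytes -> terabytes
    days = warranty_years * DAYS_PER_YEAR
    return total_tb / (capacity_tb * days)


# (model, capacity in TB, rated PBW, warranty in years), taken from the table
ROWS = [
    ("PX600", 1.0, 12, 5),
    ("PX600", 5.2, 64, 5),
    ("SX300", 1.3, 4, 3),
    ("SX300", 6.4, 22, 3),
]

if __name__ == "__main__":
    for model, capacity, pbw, years in ROWS:
        dwpd = drive_writes_per_day(pbw, capacity, years)
        print(f"{model} {capacity} TB: {dwpd:.1f} drive writes/day for {years} years")
```

Worked through, the PX600 rows come to roughly 6.6 to 6.7 drive writes per day and the SX300 rows to roughly 2.8 to 3.1.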
Application Performance Determines Value
• Apps (MySQL)
• File System (XFS, Ext4, Btrfs, HDFS, NVMFS)
• Linux Kernel (schedulers, I/O path, syscalls)

Legacy MySQL Challenges: Double-Write and Compression Penalties

Double-write: every MySQL write translates to 2 writes to the storage device (SSD or HDD).
1. The application initiates updates to pages A, B, and C.
2. MySQL copies the updated pages to a memory buffer in DRAM.
3. MySQL writes the pages to the double-write buffer on the media.
4. Once step 3 is acknowledged, MySQL writes the updates to the actual tablespace.

Compression performance penalty (transaction rate compared to the uncompressed baseline):
• Uncompressed (baseline): 100%
• Legacy MySQL compression: 20%, an 80% reduction in TPS
80% performance penalty with legacy MySQL compression enabled.
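Why write every page twice? A 16K page spans many device sectors, so a crash during step 4 can leave a page torn, part old and part new; recovery repairs it from the copy in the double-write buffer. The minimal sketch below assumes a toy device in which a torn page is detected because its sectors disagree (a stand-in for InnoDB's page checksums). It shows both the repair and the 2x write count; with a device that guarantees atomic page writes, the first phase is unnecessary.

```python
# Toy model of the double-write buffer versus an atomic page write.
# Assumptions (not from the slides): a 16K page is 32 sectors of 512 B, a
# crash can persist only some sectors of a page ("torn page"), and a page is
# recognized as torn when its sectors disagree (stand-in for checksums).

PAGE_SECTORS = 32


def make_page(version: int) -> list:
    return [version] * PAGE_SECTORS


def is_torn(page: list) -> bool:
    return len(set(page)) > 1


class Device:
    def __init__(self):
        self.blocks = {}        # (area, page name) -> list of sector contents
        self.page_writes = 0    # page-sized writes issued to the device

    def write(self, loc, page, persisted=PAGE_SECTORS):
        """Write a page; persisted < PAGE_SECTORS simulates a crash mid-write."""
        self.page_writes += 1
        old = self.blocks.get(loc, make_page(0))
        self.blocks[loc] = page[:persisted] + old[persisted:]


def flush_with_double_write(dev, pages, torn=None):
    """`torn` names one page whose tablespace write is cut in half.
    Simplification: the toy keeps writing the other pages afterwards."""
    for name, page in pages.items():            # step 3: double-write buffer
        dev.write(("doublewrite", name), page)
    for name, page in pages.items():            # step 4: actual tablespace
        persisted = PAGE_SECTORS // 2 if name == torn else PAGE_SECTORS
        dev.write(("tablespace", name), page, persisted)


def recover(dev, names):
    """After a crash, repair torn tablespace pages from the double-write copy."""
    for name in names:
        if is_torn(dev.blocks[("tablespace", name)]):
            dev.blocks[("tablespace", name)] = list(dev.blocks[("doublewrite", name)])


def flush_with_atomic_write(dev, pages):
    for name, page in pages.items():            # device guarantees all-or-nothing
        dev.write(("tablespace", name), page)


if __name__ == "__main__":
    pages = {name: make_page(1) for name in "ABC"}
    legacy, atomic = Device(), Device()
    flush_with_double_write(legacy, pages)
    flush_with_atomic_write(atomic, pages)
    print(legacy.page_writes, atomic.page_writes)   # 6 3
```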
Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

SanDisk Solution

(1) Solving the Double-Write Problem: SanDisk NVMFS with Atomic Write
• Enhanced life expectancy of Fusion ioMemory devices
  - Writes to flash reduced by half at similar throughput
• Improved performance consistency
• Reduced latency, increased transactions/sec
• Higher performance
  - Especially for workloads with datasets that are bigger than DRAM
• Perfect fit for ACID-compliant MySQL

MySQL with Atomic Write:
1. The application initiates updates to pages A, B, and C.
2. MySQL copies the updated pages to a memory buffer in DRAM.
3. MySQL writes to the actual tablespace, bypassing the double-write buffer step due to the inherent atomicity guaranteed by the intelligent Fusion ioMemory device.
One, single atomic write!

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

(2) Improving MySQL Compression: SanDisk Contribution to MySQL Community
• Benefits of compression without severe performance penalty
  - Within 10% of uncompressed
• Up to 50% improvement in capacity utilization (1)
• Enhanced life expectancy of flash devices (2)
  - Up to 4x fewer writes to storage with compression and atomic write
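The "up to 4x" figure multiplies two separate factors: dropping the double-write copy halves the bytes written, and compression halves them again when the data compresses 2:1. A back-of-the-envelope sketch, assuming 16K pages and that 2:1 ratio (an illustrative value, not a measured one):

```python
# Where "up to 4x fewer writes" comes from: two independent 2x factors.
# Assumptions (illustrative): 16K pages and a 2:1 compression ratio, i.e. a
# workload that compresses well; real ratios vary with the data.

PAGE_BYTES = 16 * 1024


def bytes_to_flash(pages: int, double_write: bool, compression_ratio: float = 1.0) -> float:
    """Bytes written to flash when flushing `pages` dirty pages."""
    copies = 2 if double_write else 1   # double-write buffer + tablespace, or tablespace only
    return pages * copies * PAGE_BYTES / compression_ratio


legacy = bytes_to_flash(1000, double_write=True)
atomic = bytes_to_flash(1000, double_write=False)
atomic_compressed = bytes_to_flash(1000, double_write=False, compression_ratio=2.0)

print(legacy / atomic)              # 2.0
print(legacy / atomic_compressed)   # 4.0
```

For reference, 4x fewer bytes is a 75% reduction; the Linkbench measurement reported later in the deck is 70%.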
Compression performance penalty (transaction rate compared to the uncompressed baseline):
• Uncompressed (baseline): 100%
• Legacy MySQL compression: 20%
• SanDisk NVM compression: 90%, compression with almost no performance penalty

(1) For workloads that compress well. Improvement will vary.
(2) At similar throughput (assuming the same load).
The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

Legacy MySQL Compression
• The compressed page size is chosen at table creation.
• Compression is performed using regular software compression libraries (zlib).
• Table updates are appended to a page modification log (mlog) at the end of the compressed (8K) page.
• When the mlog gets full, the page is recompressed.
[Diagram: MySQL stores uncompressed data in 16K pages; each 16K page is compressed into a fixed compressed page size of 1K, 2K, 4K, or 8K (8K shown); subsequent inserts and updates are appended to the mlog inside the 8K page.]
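A minimal sketch of this update path, assuming a simplified page that stores raw row bytes and logs each change verbatim (InnoDB's real page and mlog formats are more compact), with Python's zlib as the compressor. Changes are only logged while they fit in the fixed 8K slot; a full mlog forces a recompression of the whole page; and if the recompressed page no longer fits, the page would have to be split and the tree rebalanced.

```python
import random
import zlib

# Toy model of legacy InnoDB compressed pages (illustrative, not InnoDB's format):
# the page holds raw row bytes, the mlog stores each change verbatim, and zlib
# compresses the whole page image. Only the control flow mirrors the slides.

SLOT = 8 * 1024   # fixed compressed page size chosen at table creation


class LegacyCompressedPage:
    def __init__(self, rows):
        self.rows = list(rows)        # logical content of the uncompressed 16K page
        self.mlog = []                # modification log kept inside the 8K slot
        self.recompressions = 0
        self.blob = zlib.compress(b"".join(self.rows))
        if len(self.blob) > SLOT:
            raise ValueError("initial page does not fit the compressed slot")

    def update(self, row: bytes) -> bool:
        """Apply a change. Returns False when the page must be split."""
        self.rows.append(row)
        self.mlog.append(row)
        if len(self.blob) + sum(len(r) for r in self.mlog) <= SLOT:
            return True               # cheap path: the change is only logged
        self.recompressions += 1      # mlog full: recompress the whole page
        self.blob = zlib.compress(b"".join(self.rows))
        self.mlog.clear()
        return len(self.blob) <= SLOT  # False -> split into 2 pages and rebalance


if __name__ == "__main__":
    rng = random.Random(0)
    text = LegacyCompressedPage([b"x" * 100] * 50)
    noise = LegacyCompressedPage([rng.randbytes(100) for _ in range(50)])
    print(all(text.update(b"y" * 100) for _ in range(100)), text.recompressions)
    print(False in [noise.update(rng.randbytes(100)) for _ in range(40)])
```

Highly compressible rows keep fitting after an occasional recompression, while rows of random bytes soon make `update()` return False, which is the split-and-rebalance case.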
The cause of most of the performance penalty of MySQL compression:
• If the recompress operation fails to fit within the compressed block size, the page is split into 2 pages, which triggers an attempt to rebalance the tree.
• Update → Fail → Split → Rebalance → Recompress
[Diagram: recompressing an 8K compressed page and its mlog produces more than 8K, so the page is split into two 8K compressed pages.]

SanDisk Acceleration
• The NVMFS file system reports that less space is used on media
• No limitations due to a pre-selected fixed compressed page size
• Very simple
• Use TRIM to free unused space
• Tables are recompressed with each update
• Only uncompressed 16KB pages are stored in memory; keep the code "as is"
• Move compression to the lowest layer
[Diagram: an uncompressed 16K page occupies 32 sectors of 512B; compressed to 8K, it occupies 16 sectors of 512B on flash and the remaining sectors are left unallocated.]
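In this approach the saving is counted in 512-byte sectors rather than in fixed compressed page sizes: a 16K page that compresses to 8K occupies 16 of its 32 sectors, and the remaining 16 can be released with TRIM. A sketch of that arithmetic, assuming zlib as a stand-in compressor (the slides do not name the algorithm NVMFS uses) and assuming a page that does not shrink is stored uncompressed:

```python
import math
import os
import zlib

# Sector arithmetic behind "move compression to the lowest layer".
# Assumptions: zlib stands in for whatever compressor NVMFS uses, and a page
# that does not shrink is stored uncompressed in all of its sectors.

SECTOR = 512
PAGE = 16 * 1024
SECTORS_PER_PAGE = PAGE // SECTOR   # 16K = 32 sectors of 512 B


def sector_usage(page: bytes) -> tuple:
    """Return (sectors written, sectors released with TRIM) for one page."""
    if len(page) != PAGE:
        raise ValueError("expected a 16K page")
    compressed = len(zlib.compress(page))
    used = min(math.ceil(compressed / SECTOR), SECTORS_PER_PAGE)
    return used, SECTORS_PER_PAGE - used


if __name__ == "__main__":
    print(sector_usage(bytes(PAGE)))        # all-zero page: (1, 31)
    print(sector_usage(os.urandom(PAGE)))   # random data does not shrink: (32, 0)
```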
Benchmark testing results

SanDisk NVMFS Improves Latency Consistency: Lower Latency with Greater Stability
[Chart: 99% latency in milliseconds over time in seconds, NVMFS atomics vs. XFS double-write, with the XFS latency range and the NVMFS latency range marked.]
Sysbench: MariaDB 10.0.15, 4000 OLTP transactions injected per second, 99% latency, 220GB data, 10GB buffer pool.
NVMFS Atomic Write significantly reduces latency while increasing performance consistency.

Performance with Atomic Writes
• Double-write disabled: non-ACID
• Double-write: ACID, without atomic writes
• Atomic writes: ACID
Twice the performance AND with ACID properties.
With Atomic I/O:
• Atomic writes at 99% of the performance of raw writes
• 2x flash device endurance improvement

Maximum Throughput Improvement with Atomic Write (Ops/s)
• NVMFS atomics: 77,497
• XFS: 69,516
• EXT4: 68,202
Linkbench: 110 GB data, 50 GB MySQL buffer pool, MariaDB 10.0.15.
Atomic Write also improves the maximum throughput for use cases where the database's active data set is larger than the amount of DRAM available to MySQL.

Performance Improvement: TPC-C-like
TPC-C-like workload, MariaDB 10, 1,000 warehouses, 75GB DRAM
[Chart: New Order transactions over time for MySQL uncompressed, MySQL compression, and Fusion-io compression.]

Compression with Almost No Performance Hit: Balanced Read/Write Applications
[Chart: Ops/s for EXT4 uncompressed, EXT4 legacy compression, and NVMFS compression.]
(Linkbench, 110GB data, 50GB MySQL buffer pool, MariaDB 10.0.15)

SanDisk NVMFS Greatly Reduces Writes to Flash: Improved Flash Life Expectancy = Improved ROI
[Chart: physical bytes written to flash for EXT4 uncompressed, NVMFS atomics, and NVMFS atomics+compression.]
NVMFS = 70% reduction in data written to Fusion ioMemory!
Linkbench: 110GB data, 50GB MySQL buffer pool, MariaDB 10.0.15, physical bytes written.
NVMFS with flash-aware MySQL can reduce data written to flash by 70%.

4x Better Flash Endurance
• Compression
  - 2x fewer writes to flash
• Atomics
  - 2x fewer writes by disabling double-writes
• Persistent TRIMs
  - Lower write amplification, which benefits flash

SanDisk NVMFS

SanDisk NVMFS: Eliminating Duplicate Logic & Leveraging New Primitives for Optimal Flash Performance & Efficiency
Value
• Increase life expectancy of flash devices
• Consistent low latency
• Consistent high performance
How
• Reducing writes to flash
• Optimizing the IO write path for flash
• Applications leverage an enhanced I/O interface

[Diagram: I/O stack with and without NVMFS.
Application (user space)
Linux VFS (virtual file system) abstraction layer (kernel space)
Ext3 path: Ext3 (file metadata mgmt, block allocation, mapping, recycling, ACID updates, logging/journaling, crash recovery), then the kernel block layer
NVMFS path: NVMFS (file metadata mgmt), then new file system primitive interfaces, then the native Flash Translation Layer (block allocation, mapping, recycling, ACID updates, logging/journaling, crash recovery)]
Who is NVMFS for?
• NVMFS will optimize customer database flash storage by improving:
  - Transactional performance (latency and throughput)
  - Lifespan of flash devices
  - Practical capacity
• Enterprise customers
  - OLTP databases running in a Linux environment
  - Insert-heavy workloads needing to persist large amounts of data
  - Latency-sensitive OLTP workloads
  - Customers concerned about flash endurance
• Hyperscale customers
  - Customers looking to improve CPU utilization per node
  - Customers looking to consolidate clusters of MySQL nodes by being able to store more data

Released MySQL Versions Supporting NVMFS
• MariaDB mainline >= 5.5.31
• Percona Server >= 5.5.31
• Oracle MySQL >= 5.7.4

Thank you!