The Why And How Of SSD Over Provisioning White Paper WP004
October 2012

Corporate Headquarters: 39672 Eureka Dr., Newark, CA 94560, USA ♦ Tel:(510) 623-1231 ♦ Fax:(510) 623-1434 ♦ E-mail: [email protected]
Flash Design Center: 2 Robbins Road, Westford, MA 01886, USA ♦ Tel:(978) 303-8500 ♦ Fax:(978) 303-8757
Flash Design Center: 2600 W. Geronimo, Chandler, AZ 85244, USA ♦ Tel:(480) 792-8900 ♦ Fax:(480) 792-8901
Asia: Plot 18, Lrg Jelawat 4, Kawasan Perindustrian Seberang Jaya 13700, Prai, Penang, Malaysia ♦ Tel:+604-3992909 ♦ Fax:+604-3992903

Table of Contents
1 Overview
2 Flash Management Algorithms
3 The Impact of Over Provisioning
4 Write Amplification
5 Optimus SAS SSD family

1 Overview

Solid State Disks (SSDs) reserve a portion of their total flash address space for “Over Provisioning” (OP): a percentage of the total physical memory that is kept by the SSD for its own use and is not part of the device’s logical address space. The level of OP affects both write performance and endurance (operational lifetime). For both, a higher OP is better, at the cost of logical capacity.
The differences in performance and endurance that result from changes in OP are all a “natural” result of the amount of reserved physical address space. There are no algorithmic changes; the drive simply runs more “efficiently” when the OP is higher. The HDD equivalent is called “short stroking”, and the implications of the change are identical. The exact same device running the exact same firmware simply accesses a larger or smaller range of the physical address space.

The benefit for a storage OEM is that qualifying an SSD with one OP configuration is in essence the same as qualifying all OP configurations. The firmware and the hardware of the drive remain the same; the only differences are in its performance, logical capacity and endurance specifications. Separate qualification processes for different levels of Over Provisioning within the same product family are not required.

SMART Storage Systems offers the Optimus family of products in three specific levels of OP, allowing the Optimus SSDs to address workloads ranging from 10 “Drive Writes Per Day” (DWPD) to 50 DWPD with a level of performance and “Total Cost of Ownership” (TCO) that’s better than any comparable competitor.

2 Flash Management Algorithms

When an SSD is in a FOB (Fresh-Out-of-Box) state [1], almost the entire available flash memory space is in an unallocated/free pool [2]. As host writes come in, the drive allocates enough flash memory from the pool to hold the host data, writes the data into those physical memory locations, and records a logical-block-to-physical-address entry in a table. As new writes come in for previously unwritten logical blocks, more memory is allocated from the free pool to store this new data. This process continues until the entire logical capacity of the drive has been written once. The order and addresses of user data sent to the drive do not necessarily correlate with the physical locations of the data in the flash.
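The first-write allocation described above can be sketched in a few lines of Python. This is a toy model and not the Optimus firmware: the class name `ToyFTL`, the page-granular free pool, and the tiny pool size are all assumptions made for illustration. It covers only writes to logical blocks that have never been written before.

```python
from collections import deque


class ToyFTL:
    """Toy flash translation layer covering first-time writes only (hypothetical)."""

    def __init__(self, physical_pages):
        # FOB state: every physical page sits in the free pool.
        self.free_pool = deque(range(physical_pages))
        # Logical-block-to-physical-address table, empty until written.
        self.l2p = {}

    def write(self, lba):
        """Map a never-written LBA to the next free physical page."""
        if lba in self.l2p:
            raise NotImplementedError("overwrites are outside this sketch")
        page = self.free_pool.popleft()
        self.l2p[lba] = page
        return page


ftl = ToyFTL(physical_pages=8)
for lba in (5, 0, 3):  # host writes arrive in this order
    ftl.write(lba)

print(ftl.l2p)  # {5: 0, 0: 1, 3: 2}
```

Physical pages are handed out in arrival order (0, 1, 2) regardless of the logical block address, which is why the logical and physical layouts do not correlate.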
The physical location used to store data for specific logical addresses is more a function of when the command was received than what the targeted logical block address was. Regardless of a host command’s logical block addresses, the associated data gets written to the next-available flash block in the free pool.

[1] The FOB state can also be achieved through a FORMAT function.
[2] Some overhead is required for drive operation and is excluded from the “unallocated” space.

The process is straightforward as long as new writes always go to logical addresses that haven’t been written before. Things get more complicated when newly received write commands overlap previously written locations. Flash is not direct-rewritable memory. Data is stored in Blocks consisting of Pages that contain the User Data. Flash is written a Page at a time, but must be erased a Block at a time. So once written, the entire Block containing the memory location must be explicitly erased before that location can be written again. This means that write commands that overlap data that has already been written require a Read/Modify/Write cycle in order to be completed.

These Read/Modify/Write cycles are extremely costly from a performance and endurance standpoint. To minimize them, the SSD keeps a fixed amount of physical flash memory in reserve, called the Over Provisioning (OP) space. This memory is not part of the user-addressable logical space (the physical capacity of the drive is greater than the logical capacity).

Where an HDD can just overwrite previously written areas of the media to update those locations, an SSD puts newly received data into newly allocated locations from the pool of free flash. Furthermore, depending on alignment and transfer size, the drive may also need to read data from old adjacent locations and combine it with the new data into a full page to write to the new location.
When this happens, old flash blocks that now contain the “stale” data are marked as invalid and returned to the free pool (where they are erased so that they are ready to be used again).

It is worth mentioning that logical block addresses that have never been written don’t actually “exist”. That is, there is no physical location containing data assigned to that logical block location. As a result, read performance on commands accessing these addresses will be poor. There is no flash location to read, and the drive has to generate the data “manually”. Protection Information (PI) can add additional overhead, slowing read performance further. For this reason, it is critically important to write all addresses at least once prior to reading in order to maximize read performance.

3 The Impact of Over Provisioning

As in any system that dynamically allocates and de-allocates resources from a shared pool, the larger the pool, the higher the operating efficiency and the better the performance. Since the pool starts off completely full of erased flash blocks, new allocations occur quickly, with little-to-no overhead. The allocation routine simply supplies the next available physical address range from the pool. The resulting drive performance is high, but not representative of its sustainable performance. In typical operating environments, the capacity of the drive is quickly written and the initial performance level falls to a lower steady-state level. “Preconditioning” is designed to address this issue during performance testing.

As the pool is used, it becomes “dirty”. The ongoing process of allocation and de-allocation results in fragmentation of the large contiguous unallocated regions in the physical flash free space. This can cause the allocation routine to have to work harder to find a region (or regions) sufficient to meet each new request, slowing performance.
So the allocation routine must also periodically “garbage collect”, moving content around to pack used regions together in order to maximize the extent of the free regions in the pool. This can also cause slower performance.

Over provisioning is generally defined as follows, and is usually quoted as a percentage:

$$\mathrm{OP} = \frac{\text{Physical Capacity}}{\text{Logical Capacity}} - 1$$

A higher percentage of over provisioning means that there is a higher probability of finding an available block for each new write. It also means that the SSD can perform garbage collection less “aggressively”, reducing its impact on latency and improving the overall performance of the drive.

As discussed in Section 2, when the drive starts out in a FOB state all the flash blocks are in the free unallocated pool. In this condition, the drive is effectively 100% over provisioned, which significantly benefits write performance. As writes are completed, capacity is consumed and the effective OP decreases, eventually reaching the drive’s specified OP level.

The OP determines the maximum logical capacity of the drive, but the drive’s consumed logical capacity can change the amount of OP that is effectively reserved. The same physical drive could be reduced in logical size by any additional arbitrary amount, and the logical space “surrendered” becomes part of the OP. By using less of the total logical address space (by reducing the logical block address range accessed by the host), the OP is increased, which implicitly results in higher performance and increased endurance.

4 Write Amplification

The performance of an SSD is in large part a function of how efficiently it manages the OP pool. Unfortunately, the allocation and de-allocation routines, garbage collection, checkpoint data, and the Read/Modify/Write operations all result in more data being written to the flash than is actually received from the host.
Write Amplification (WA) is defined as the ratio of the amount of data written to flash to the amount of data written by the host:

$$\text{Write Amplification} = \frac{\text{Amount of Data Written to Flash}}{\text{Amount of Data Written by Host}}$$

Because WA represents overhead and extra writes to flash, the lower the Write Amplification [3], the higher the performance of the SSD and the longer the drive will last. The reverse is true as well: a high Write Amplification results in reduced performance and accelerated drive wear out. Optimus includes a performance monitoring parameter that can be read by the host to determine the current WA. This parameter can be configured to generate a S.M.A.R.T. warning if WA increases above the chosen threshold.

The OP has a direct effect on the Write Amplification, as it allows the flash management algorithms to run more efficiently. Figure 1 below shows the correlation between Write Amplification and over provisioning.

Figure 1: Write Amplification vs Over Provisioning
[Plot of Write Amplification (vertical axis, 1 to 10) against Overprovisioning in percent (horizontal axis, 0% to 160%). Labeled points: OP=7%, WA=8.60; OP=28%, WA=2.73; OP=156%, WA=1.22.]

[3] Unless the SSD controller deploys a compression scheme, write amplification will always be larger than 1.

As can be seen from the graph above, higher OP has a significant impact on WA. A 7% OP drive has an average WA of 8.6: every full write of the drive generates 8.6 times as many internal writes to Flash. A 156% OP drive has a WA of only 1.22. Subjecting the same drive type to the same workload at these two different OP levels, the higher OP drive would last about 7x longer (8.6 / 1.22 ≈ 7).

Drive Writes Per Day (DWPD) describes how many times per day the full logical capacity of the SSD can be written over its rated lifetime before the drive reaches wear out. It is important to note that this includes consideration for the extra writes imposed by WA, as well as user data, but that it is independent of the physical capacity of the drive.
$$\text{Drive Writes Per Day (DWPD)} = \frac{\text{Endurance} \times (1 + \mathrm{OP})}{\text{Days Per Life} \times \mathrm{WA}}$$

As a result of this relationship, DWPD can be increased significantly simply by increasing OP: a higher OP raises the numerator directly and, by lowering WA, shrinks the denominator as well.

5 Optimus SAS SSD family

SMART Storage Systems Optimus, Optimus Ultra and Optimus Ultra+ SAS SSDs take advantage of the effects of Write Amplification and Over Provisioning to deliver configurations ranging from 10 DWPD to 50 DWPD. The only difference between these models (besides the Label, Inquiry string/VPD identifying the type of drive, and raw capacity) is the logical address space they report to the host system.

For example, an Optimus 400GB SSD has a physical flash capacity of 512GB (and a logical capacity of ~374GB). The “missing” 138GB is the 28% OP, which results in an endurance spec of 10 DWPD. The same 512GB raw capacity space can also be configured into an Optimus Ultra 300GB, effectively over provisioning the drive by an additional 50GB for a total of 71% OP space. This configuration supports 25 DWPD. Finally, by increasing the over provisioning even further to 156%, the Optimus Ultra+ provides a capacity of 200GB and an endurance capability of 50 DWPD.

These configurations are set by limiting the maximum LBA address the drive says it will accept (“MaxLBA”). This is accomplished via a single data-value change: the default setting of a field in a SAS Mode Page [4]. SMART Storage Systems makes this change for customers when they order the specific model types. Customers can also make this change themselves, in their configuration process or on their host systems, and so are not restricted to the three configurations offered by SMART Storage Systems.

This architecture allows our customers to qualify one model from the Optimus family, and then purchase one or all of the models to use in a variety of different workload environments. The qualification burden is significantly reduced because the different models do not run different FW, and share 100% identical HW.
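As a rough check of the figures quoted in this section, the OP definition from Section 3 and the DWPD relationship from Section 4 can be evaluated directly. The sketch below is illustrative Python, not SMART tooling, and the function names are hypothetical. It assumes that OP is computed from the nominal capacities (400, 300 and 200 GB against 512 GB of raw flash), and it compares DWPD only for the two OP levels whose WA is labeled in Figure 1 (28% and 156%). Endurance and days per life are assumed to be equal for both configurations, so they cancel out of the comparison.

```python
def over_provisioning(physical_gb, logical_gb):
    """OP = physical capacity / logical capacity - 1 (Section 3)."""
    return physical_gb / logical_gb - 1.0


def relative_dwpd(op, wa):
    """The (1 + OP) / WA factor of the DWPD formula.

    Endurance and days-per-life are assumed identical across the
    configurations being compared, so they are left out here.
    """
    return (1.0 + op) / wa


RAW_GB = 512  # raw flash capacity quoted for all three models
NOMINAL_GB = {"Optimus": 400, "Optimus Ultra": 300, "Optimus Ultra+": 200}

op_by_model = {name: over_provisioning(RAW_GB, gb) for name, gb in NOMINAL_GB.items()}
for name, op in op_by_model.items():
    print(f"{name}: OP = {op:.0%}")

# WA values labeled in Figure 1 for 28% and 156% OP.
base = relative_dwpd(0.28, 2.73)
high = relative_dwpd(1.56, 1.22)
print(f"DWPD gain, 156% vs 28% OP: {high / base:.1f}x")
```

Under these assumptions the script reproduces the 28%, 71% and 156% OP levels and gives a DWPD gain of about 4.5x between the 28% and 156% configurations, which is in the same range as the step from 10 DWPD to 50 DWPD quoted above.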
The only difference is the maximum LBA range they are configured to accept.

[4] The host can achieve the same effect by just not accessing any logical block addresses above the desired maximum.

Disclaimer:

No part of this document may be copied or reproduced in any form or by any means, or transferred to any third party, without the prior written consent of an authorized representative of SMART Storage Systems (“SMART”). The information in this document is subject to change without notice. SMART assumes no responsibility for any errors or omissions that may appear in this document, and disclaims responsibility for any consequences resulting from the use of the information set forth herein. SMART makes no commitments to update or to keep current information contained in this document. The products listed in this document are not suitable for use in applications such as, but not limited to, aircraft control systems, aerospace equipment, submarine cables, nuclear reactor control systems and life support systems. Moreover, SMART does not recommend or approve the use of any of its products in life support devices or systems or in any application where failure could result in injury or death. If a customer wishes to use SMART products in applications not intended by SMART, said customer must contact an authorized SMART representative to determine SMART's willingness to support a given application. The information set forth in this document does not convey any license under the copyrights, patent rights, trademarks or other intellectual property rights claimed and owned by SMART. The information set forth in this document is considered to be “Proprietary” and “Confidential” property owned by SMART.

ALL PRODUCTS SOLD BY SMART ARE COVERED BY THE PROVISIONS APPEARING IN SMART'S TERMS AND CONDITIONS OF SALE ONLY, INCLUDING THE LIMITATIONS OF LIABILITY, WARRANTY AND INFRINGEMENT PROVISIONS.
SMART MAKES NO WARRANTIES OF ANY KIND, EXPRESS, STATUTORY, IMPLIED OR OTHERWISE, REGARDING INFORMATION SET FORTH HEREIN OR REGARDING THE FREEDOM OF THE DESCRIBED PRODUCTS FROM INTELLECTUAL PROPERTY INFRINGEMENT, AND EXPRESSLY DISCLAIMS ANY SUCH WARRANTIES INCLUDING WITHOUT LIMITATION ANY EXPRESS, STATUTORY OR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

©2012 SMART Storage Systems. All rights reserved.