How to get the most from storage while staying within budget
Buyer’s guide: Storage & Back-up

In 2011, storage represented an average of 15% of the total IT budget. Making the wrong bets on how you architect your storage environment could cause this number to grow more than your finance chief might like. In this buyer’s guide, we assess the changing storage landscape and identify how the various technologies can be implemented to maximum effect.

Contents

Derive maximum value from storage
Forrester analysts Andrew Reichman and Vanessa Alvarez identify the top trends in a rapidly changing storage market and methods for implementing relevant technologies to deliver and manage business data effectively and within budget

Cut costs with primary deduplication
Chris Evans looks at how primary storage deduplication works, what it can achieve and how its use is set to increase

How storage provision must change with virtual desktop infrastructure
Centrally run storage can suck all benefits from a VDI deployment if it is not sufficiently provisioned, writes Cliff Saran

These articles were originally published in the Computer Weekly ezine.

Derive maximum value from storage

Forrester analysts Andrew Reichman and Vanessa Alvarez identify the top trends in a rapidly changing storage market and methods for implementing relevant technologies to deliver and manage business data effectively and within budget

In 2011, storage represented an average of 15% of the total IT budget. Making the wrong bets on how you architect your storage environment could cause this number to grow more than your finance chief might like. We assess the changing storage landscape and identify how the various technologies can be implemented to maximum effect.

Storage will no longer be run as an island

The traditional model for an infrastructure and operations (I&O) organisation is to have distinct server, storage and network teams, with different budgets and priorities – and the result is often strained relationships and poor communication among the groups. Because most firms don’t have effective chargeback, there is little visibility into the overall IT impact of any cross-group strategy moves. Add in the complexity of technical interactions between these silos, and you get a real mess. Change in this approach has been sorely needed for years, and we are starting to see it happen. We expect 2012 to be a banner year for convergence across these silos, and cooperation will bring storage out of the vacuum.

Because storage is so expensive, CIOs and CFOs are paying more attention to purchase decisions, and this trend is pushing those purchases towards greater consistency and fit with the wider IT strategy. The consolidation of applications, increased use of virtual server technology, and the emergence of application-specific appliances and bundles mean that it is more viable to buy consistent solutions for stacks such as Oracle databases and applications, VMware, and Microsoft applications and virtual servers, among other workloads.

Forrester’s advice: Break down the organisational and budgetary walls that prevent I&O people from cooperating. Consider aligning teams by the major workload stacks rather than technology components; you may see much better communication as a result.
Make storage technology decisions in concert with server, network and application strategies, and you will likely start to optimise around the thing you care most about: supporting the business.

Storage to become more specialised for big firms

For years, many firms have simply chosen the “highest common denominator” as their single tier of storage – in other words, if some data needed top-tier block storage, then in many cases this was the only flavour to be deployed. As data volumes have grown over the years, the penalty for such a simple environment has grown, because much of the data does not really need top-tier storage. Additionally, the requirements of specific workloads vary significantly, so the single flavour is often not well suited to big portions of the data being stored. Major workload categories that demand optimisation include virtual servers, virtual desktops, Oracle databases, Microsoft applications, files, data warehouse/business intelligence, mainframe, archives and back-ups. Each of these has a unique performance and availability profile, and each has major applications that need close integration with the storage they use.

Forrester’s advice: I&O professionals should be clear about which of these workloads are major consumers of data in your large storage environment and see if an optimised architecture would make more sense than a generic solution. Once you start measuring and strategising along those lines, develop a set of scenarios about what you could buy and how you could staff along workload-optimised lines, and a strategy will emerge from there.

Cloud storage to become a viable enterprise option

In 2010 and 2011, I&O professionals saw a great deal of attention being paid to multiple forms of cloud, storage included, but still few large enterprises had jumped on board. With more enterprise-class cloud storage service provider options, better service level agreements (SLAs), the emergence of cloud storage gateways, and more understanding of the workloads that make sense, 2012 is likely to be a big year for enterprises moving data that matters into the public cloud. I&O professionals will have to assess what data they can move to the cloud on a workload-by-workload basis. There will not be a dramatic “tear down this datacentre” moment any time soon, but I&O professionals will quietly shift individual data sets to the cloud in situations that make sense, while other pieces of data will remain in a more traditional setting. The appropriate place for your data will depend on its performance, security and geographic access requirements, as well as integration with other applications in your environment.

Forrester’s advice: I&O teams should evaluate their workloads to see if they have some that might make sense to move now. Develop a set of detailed requirements that would enable a move to the cloud, then evaluate service providers to determine what is feasible. Focus on files, archives, low-performance applications and back-up workloads as likely cloud storage candidates, and develop scenarios of how they could run in cloud models currently on the market. Make sure you think about fallback strategies in case the results are poor, so that you are insulated should your provider change its offering or go out of business.

SSD to play a larger part in enterprise storage

While application performance demands continue to increase, spinning disk drives are not getting any faster; they have reached a plateau at 15,000rpm. To fill the gap, the industry has coalesced around solid-state disk (SSD) based on flash memory – the same stuff that’s in your iPod (for the most part). Flash memory is fast, keeps data even when it loses power, and recent improvements in hardware and software have raised its reliability profile to effectively meet enterprise needs. However, SSD remains far more expensive than traditional spinning disk, so it is still challenging to figure out how and where to use it.

In 2012, Forrester expects to see existing and promising new suppliers showcase more mature offerings in a variety of forms, including SSD tiers within disk arrays supported in some cases by automated tiering, SSD data caches, and SSD-only storage systems. Because SSD is fast but relatively expensive, the long-term media mix is likely to include cheap, dense drives for the bulk of data that is not particularly performance sensitive, and a small amount of SSD targeted only at the data that truly needs it. I&O professionals have another option in leveraging the performance power of SSD to enable better deduplication that could bring storage cost down, but these options are still newer to the market.

If you currently use custom performance-enhancing configurations such as “short stroking”, then that data is likely to be a good candidate to get better results on SSD. If you have applications that are struggling to deliver the needed levels of performance, then SSD might be your best option to house their data.

Forrester’s advice: You need to understand the performance requirements and characteristics of your workloads to make effective use of SSD. Don’t overspend on SSD where traditional disk will do – carry out rigorous performance analysis to find out where the bottlenecks are, and pick the tools that will address the gaps you uncover.

Automated tiering will become widely adopted

I&O teams have dreamed of an easy way to put the right data on to the right tier of storage media, but a cost-effective, reliable way of doing so has remained elusive. Tiering, information lifecycle management (ILM) and hierarchical storage management (HSM) look promising, but few firms have managed to get it right and spend less money on storage as a result. Compellent, now owned by Dell, was a pioneer in sub-volume automated tiering – a method that takes the responsibility for deciding what should live where away from the administrator and has enough granularity to address the varied performance needs within volumes. Almost every supplier in the space is eagerly working on a tool that can accomplish this goal, and we are likely to see results in 2012, leading to increased maturity and wider adoption.
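To make sub-volume automated tiering concrete, here is a minimal, hypothetical sketch of the general idea: count I/O per extent and periodically keep only the hottest extents on a small SSD tier. It is not Compellent’s or any other supplier’s algorithm; the class name, extent identifiers and capacity are illustrative assumptions.

```python
# Minimal, hypothetical sketch of sub-volume automated tiering: track I/O per
# extent and keep only the hottest extents on a small SSD tier. General idea
# only, not any supplier's actual algorithm.
from collections import defaultdict

class TieringEngine:
    def __init__(self, ssd_capacity_extents):
        self.ssd_capacity = ssd_capacity_extents   # extents that fit on the SSD tier
        self.access_counts = defaultdict(int)      # extent id -> I/O count this window
        self.ssd_resident = set()                  # extents currently on SSD

    def record_io(self, extent_id):
        """Called for every read or write that lands on an extent."""
        self.access_counts[extent_id] += 1

    def rebalance(self):
        """Run periodically: promote hot extents to SSD, demote the rest to HDD."""
        hottest = sorted(self.access_counts, key=self.access_counts.get, reverse=True)
        target = set(hottest[:self.ssd_capacity])
        promote = target - self.ssd_resident        # candidates to move up to SSD
        demote = self.ssd_resident - target         # stale extents to move back to HDD
        self.ssd_resident = target
        self.access_counts.clear()                  # start a fresh measurement window
        return promote, demote

# Example: room for two extents on SSD; a database hot spot dominates the I/O
engine = TieringEngine(ssd_capacity_extents=2)
for _ in range(100):
    engine.record_io("extent-07")
for _ in range(40):
    engine.record_io("extent-03")
engine.record_io("extent-09")
promote, demote = engine.rebalance()
print("promote to SSD:", sorted(promote))   # ['extent-03', 'extent-07']
print("demote to HDD:", sorted(demote))     # [] (nothing was on SSD yet)
```

The point of the sketch is the granularity: decisions are made per extent, not per volume, so a single volume can have its hot database index pages on SSD while its cold history sits on cheap disk.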
However, some application providers say block storage systems don’t have enough context about the data to effectively predict performance needs, and that the applications should do this rather than the storage systems. They also argue that the added central processing unit (CPU) burden outweighs the benefits, or that SSD will eventually be cheap enough, with advanced deduplication, that a tierless SSD architecture will replace the need for tiering altogether. Suppliers such as NetApp also prefer a caching approach.

Forrester’s advice: There is some validity in some of these arguments, but there is little doubt that automated tiering will play a bigger role in enterprise storage, along with alternatives such as application-driven data management, advanced caching, and SSD-centric systems.

This is an extract from the report “Top 10 Storage Predictions For I&O Professionals” (February 2012) by Forrester analysts Andrew Reichman and Vanessa Alvarez, both of whom are speaking at Forrester’s Infrastructure & Operations EMEA Forum 2012 in Paris (19-20 June).

Cut costs with primary deduplication

Chris Evans looks at how primary storage deduplication works, what it can achieve and how its use is set to increase

Disk space reduction is a key consideration for many organisations that want to reduce storage costs. With this aim in mind, data deduplication has been deployed widely on secondary systems, such as for data back-up, but primary storage deduplication has yet to reach this level of adoption.

Data deduplication is the process of identifying and removing identical pieces of information in a batch of data. Compression removes redundant data to reduce the size of a file but doesn’t do anything to cut the number of files it encounters. Data deduplication, meanwhile, takes a broader view, comparing files, or blocks within files, across a much larger data set and removing redundancies from that. In a data deduplication hardware setting, rather than store two copies of the same data, the array retains metadata and pointers to indicate which further instances of data map to the single instance already held. In instances such as back-up operations, where the same static data may be backed up repeatedly, deduplication can reduce physical storage consumption by ratios as high as 10-to-1 or 20-to-1 (equalling 90% and 95% savings in disk space respectively). Clearly, the potential savings in physical storage are significant. If primary storage could be reduced by up to 90%, this would represent huge savings for organisations that deploy large numbers of storage arrays.

Unfortunately, the reality is not that straightforward. The use case for deduplicated data fits well with back-up but not always so well with primary storage. Compared with large back-up streams, the working data sets in primary storage are much smaller and contain far fewer redundancies. Consequently, ratios for primary storage deduplication can be as low as 2-to-1, depending on the type of data the algorithm gets to work on. Having said that, as more organisations turn towards server and desktop virtual infrastructures, the benefits of primary storage deduplication re-appear. Virtual servers and desktops are typically cloned from a small number of master images, and a workgroup will often run from a relatively small set of spreadsheets and Word documents, resulting in highly efficient deduplication opportunities that can bring ratios of up to 100-to-1. The deduplication saving can even justify the use of solid-state drives (SSDs), where their raw cost would previously have been unjustifiable.

Pros and cons

Of course, primary storage deduplication is no panacea for solving storage growth issues, and there are some disadvantages alongside the obvious capacity and cost savings. There are two key data deduplication techniques in use by suppliers today.
Identification of duplicate data can be achieved either inline, in real time, or asynchronously at a later time, known as post-processing. Inline deduplication requires more resources and can suffer from latency issues as data is checked against metadata before being committed to disk or flagged as a duplicate. Increases in CPU processing power help to mitigate this issue and, with efficient search algorithms, performance can actually be improved if a large proportion of the identified data is duplicated, as this data doesn’t need to be written to disk and metadata can simply be updated.

Post-processing deduplication requires a certain amount of storage to be used as an overhead until the deduplication process can be executed and the duplicates removed. In environments with high data growth rates, this overhead starts to cut into the potential savings.

For both implementations, deduplicated data produces random I/O for read requests, which can be an issue for some storage arrays. Storage array suppliers spent many years optimising their products to make use of sequential I/O and prefetch. Deduplication can work counter to this because, over time, it pulls apart the “natural” sequence of blocks found in unreduced data, making gaps here and placing pointers there and spreading parts of a file across many spindles. Users can deal with this issue by adding flash as a top tier for working data, which provides rapid enough access to combat the type of randomisation that is an issue for spinning disk. Some suppliers – the SSD start-ups mentioned in the panel below – have seen the boost that flash can give to primary data deduplication and designed it into their product architectures from the start.
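To illustrate the mechanics described above, here is a minimal, hypothetical sketch of inline block-level deduplication: incoming blocks are fingerprinted, a match in the metadata index is recorded as a pointer rather than a second copy, and the resulting ratio translates into a space saving of 1 − 1/ratio (so 10-to-1 equals 90%, 20-to-1 equals 95%). The block size, class and data are illustrative, not any supplier’s implementation.

```python
# Illustrative sketch of inline, block-level deduplication (hypothetical, not a
# specific supplier's design). Each 4KB block is hashed; if the fingerprint is
# already in the metadata index, only a pointer is stored, not a second copy.
import hashlib

BLOCK_SIZE = 4096

class DedupeStore:
    def __init__(self):
        self.blocks = {}     # fingerprint -> stored block data (single instance)
        self.pointers = []   # logical block map: one fingerprint per written block

    def write(self, data):
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()   # content fingerprint
            if fp not in self.blocks:                # inline check against metadata
                self.blocks[fp] = block              # unseen data: commit to backing store
            self.pointers.append(fp)                 # always record a pointer in the block map

    def ratio(self):
        logical = len(self.pointers) * BLOCK_SIZE
        physical = sum(len(b) for b in self.blocks.values())
        return logical / physical if physical else 0.0

store = DedupeStore()
# Stand-in for a master image of 100 distinct 4KB blocks, cloned ten times,
# roughly the virtual desktop scenario described in the article.
master_image = b"".join(bytes([i % 256]) * BLOCK_SIZE for i in range(100))
for _ in range(10):
    store.write(master_image)

r = store.ratio()
print(f"dedupe ratio {r:.0f}-to-1, space saving {(1 - 1/r):.0%}")   # 10-to-1, 90%
```

The example also shows why clones deduplicate so well: ten logically identical desktops collapse to a single physical copy of each block plus a map of pointers.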
Supplier implementations of deduplication technology

How have suppliers implemented deduplication technology in their primary storage systems?

NetApp: NetApp was the first supplier to offer primary storage deduplication in its filer products five years ago, in May 2007. Originally called A-SIS (advanced single-instance storage), the feature performs post-processing deduplication on NetApp volumes. Many restrictions were imposed on volumes configured with A-SIS; as volume sizes increased, the effort required to find and eliminate duplicate blocks could have significant performance impacts. These restrictions have been eased as newer filers have been released with faster hardware. A-SIS is a free add-on feature and has been successful in driving NetApp in the virtualisation market.

EMC: Although EMC has had deduplication in its back-up products for some time, the company’s only array platform to currently offer primary storage deduplication is the VNX. This capability is restricted to file-based deduplication, traced to the part of the product that was the old Celerra. EMC has talked about block-level primary storage deduplication for some time, and we expect to see that in a future release.

Dell: In July 2010 Dell acquired Ocarina Networks. Ocarina offered a standalone deduplication appliance that sat in front of traditional storage arrays to provide inline deduplication functionality. Since the acquisition, Dell has integrated Ocarina technology into the DR4000 for disk-to-disk back-up and the DX6000G Storage Compression Node, providing deduplication functionality for object data. Dell is rumoured to be working on deploying primary storage deduplication in its Compellent products.

Oracle and suppliers that support ZFS: As the owner of ZFS, Oracle has had the ability to use data deduplication in its storage products since 2009. The Sun ZFS Storage Appliance supports inline deduplication and compression. The deduplication feature also appears in software from suppliers that use ZFS in their storage platforms. These include Nexenta Systems, which incorporated data deduplication into NexentaStor 3.0 in 2010, and GreenBytes, a start-up specialising in SSD-based storage arrays that also makes use of ZFS for inline data deduplication.

SSD array start-ups: SSD-based arrays are suited to coping with the impacts of deduplication, including random I/O workloads. SSD array start-ups Pure Storage, Nimbus Data Systems and SolidFire all support inline primary data deduplication as a standard feature. In fact, on most of these platforms, deduplication cannot be disabled and is integral to the products.

Suppliers targeting virtualisation: Tintri and NexGen Storage offer arrays optimised for virtualisation environments, and both utilise data deduplication. NexGen has taken a different approach from some of the other recent start-ups and implements post-processing deduplication with its Phased Data Reduction feature.

Primary storage data deduplication offers the ability to reduce storage utilisation significantly for certain use cases and has specific benefits for virtual server and desktop environments. The major storage suppliers have struggled to implement deduplication in their flagship products – NetApp is the only obvious exception – perhaps because it reduces their ability to maximise disk sales. However, new storage start-ups, especially those that offer all- or heavily SSD-reliant arrays, have used that performance boost to leverage data deduplication as a means of justifying the much higher raw storage cost of their devices. So it looks as if primary storage deduplication is here to stay, albeit largely as a result of its incorporation into new forms of storage array.

How storage provision must change with virtual desktop infrastructure

Centrally run storage can suck all benefits from a VDI deployment if it is not sufficiently provisioned, writes Cliff Saran

If 100 people tried to access the same piece of data on conventional PC infrastructure simultaneously, it would result in a denial of service. And this is exactly the case with virtual desktop infrastructure (VDI). There is a growing realisation among IT professionals of an Achilles’ heel to desktop virtualisation, in the way conventional storage works in VDI.

Virtualising hundreds, if not thousands, of desktop computers may make sense from a security and manageability perspective. But each physical machine has local processing, graphics processors and storage. Server CPUs may be up to the task of running most desktop applications, and modern VDI offers local graphics accelerators. But storage needs to run centrally. So if each physical PC has 120GB of local storage, a 1,000 virtual desktop deployment needs at least 120TB of enterprise storage.

However, even this is not enough. For a good user experience on VDI, the infrastructure must minimise latency. It boils down to I/O operations per second (IOPS) – the number of data reads and writes to disks.
Theoretical models of usage claim a desktop PC spends 70-80% of its time performing disk reads and 10-30% of its time writing to disk. But Ruben Spruijt, CTO at IT infrastructure specialist PQR, believes these numbers are underestimates. “In my experience a user’s PC spends 20-40% of the time doing reads, and 60-80% on disk writes,” says Spruijt.

And writing to disk can be difficult in VDI. The more IOPS virtual desktops need, the greater the cost. Consider streaming media, desktop video conferencing and any application that makes frequent disk reads and writes.

Josh Goldstein, vice-president of marketing and product management at XtremIO, says: “Since successful selling of VDI storage requires keeping the cost of a virtual machine in line with or lower than a physical machine, storage vendors artificially lower their IOPS planning assumptions to keep costs in line. This is one of the reasons many VDI projects stall or fail. Storage that worked great in a 50-desktop pilot falls apart with 500 desktops.”
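The arithmetic behind these warnings is simple enough to sketch. The capacity figure below follows the article (120GB per desktop); the per-desktop IOPS figure and the 70% write share are illustrative assumptions only, standing in for whatever a real assessment would measure.

```python
# Back-of-envelope VDI storage sizing, following the article's arithmetic.
# The per-desktop IOPS figure and write share are illustrative assumptions,
# not vendor planning numbers.
def size_vdi_storage(desktops, disk_per_desktop_gb=120,
                     iops_per_desktop=25, write_share=0.7):
    capacity_tb = desktops * disk_per_desktop_gb / 1000   # e.g. 1,000 x 120GB = 120TB
    total_iops = desktops * iops_per_desktop
    write_iops = total_iops * write_share                 # Spruijt: writes dominate (60-80%)
    read_iops = total_iops - write_iops
    return capacity_tb, total_iops, read_iops, write_iops

for n in (50, 500, 1000):
    cap, total, reads, writes = size_vdi_storage(n)
    print(f"{n:>5} desktops: {cap:6.1f}TB capacity, {total:>6.0f} IOPS "
          f"({reads:.0f} read / {writes:.0f} write)")
# A 50-desktop pilot needs roughly 1,250 IOPS; 500 desktops needs ten times
# that, which is why storage that copes with a pilot can fall apart at scale.
```

Whatever per-desktop figure you plug in, the point stands: IOPS scale linearly with desktop count, and most of them are writes.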
Innovation in disk technology

Storage expert Hamish MacArthur, founder of MacArthur Stroud, says: “If you read and write every time, it builds up a lot of traffic. One way manufacturers of disk controllers are tackling this problem is to hold the data. If you can hold a lot of data in the disk controller before writing to the disk, it reduces the number of writes to the disk.”

A new breed of disk controllers is now tailored to virtualised environments. Some products try to sequence disk drive access to minimise the distance the disk heads need to move. Others perform data deduplication to prevent multiple copies of data being stored on disk. This may be combined with a large cache and tiering to optimise access to frequently used data.

Today, the most talked-about move in disk technology is solid-state disks (SSDs), which can be used for tier-one storage to maximise IOPS. SSDs from companies such as Kingston improve VDI performance by boosting IOPS. Graham Gordon is operations director at ISP and datacentre company Internet for Business, which is expanding its product portfolio to offer clients VDI. Gordon says: “SSD is still relatively expensive but it is getting cheaper. There is still some way to go before it becomes an option for companies in the mid-market.”

Exploiting SSD flash niche

EMC recently revealed elements of its upcoming product release resulting from its acquisition of XtremIO. It will use the start-up’s technology to create an entirely flash-based storage array to give enormous – albeit expensive – performance compared with traditional disk drives. Goldstein claimed it could achieve unlimited IOPS which, in a short demonstration, reached 150,000 write and 300,000 read IOPS. The key to the box is its ability to scale out and link up to other XtremIO arrays. Goldstein has demonstrated eight working together as a cluster, which had the potential to achieve over 2.3 million IOPS. He also showed the array creating 100 10TB volumes in 20 seconds, configuring 1PB overall.

However, flash-based technologies and solid-state drives are too expensive to run as primary storage in enterprise environments. Instead, companies are deploying tiered storage arrays, using SSD for immediate access to important data and cheaper hard drives for mass storage.

Taking this a step further, in the Ovum report 2012 Trends to Watch: Storage, analyst Tim Stammers notes that storage vendors are planning to create flash-based caches of data physically located inside servers, to eliminate the latency introduced by SANs. “Despite their location within third-party servers, these caches would be under the control of disk arrays. EMC has been the most vocal proponent of this concept, which is sometimes called host-side caching. EMC’s work in this area is called Project Lightning,” writes Stammers.

Project Lightning has now become a product called VFCache, which places flash memory on a PCIe card that plugs into the server. It allows a copy of data to be held immediately at the server level, rather than having to travel to and from the storage, upping performance yet again.

This is an edited excerpt.
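To illustrate the host-side caching idea Stammers describes, where a flash cache in the server absorbs reads so they avoid the SAN round trip while the array remains the system of record, here is a minimal, hypothetical sketch. The class and method names are invented for illustration and do not represent VFCache’s or any other product’s interface.

```python
# Hypothetical sketch of host-side (server-side) flash read caching in front of
# a SAN-backed volume. Names are illustrative only and do not reflect VFCache
# or any real product's API.
from collections import OrderedDict

class HostSideReadCache:
    def __init__(self, backend_read, backend_write, capacity_blocks=1024):
        self.backend_read = backend_read      # reads a block over the SAN
        self.backend_write = backend_write    # writes a block to the array
        self.capacity = capacity_blocks       # blocks that fit on the PCIe flash card
        self.cache = OrderedDict()            # block id -> data, kept in LRU order

    def read(self, block_id):
        if block_id in self.cache:            # hit: served from local flash,
            self.cache.move_to_end(block_id)  # no SAN round trip needed
            return self.cache[block_id]
        data = self.backend_read(block_id)    # miss: fetch from the array as usual
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:   # evict the least recently used block
            self.cache.popitem(last=False)
        return data

    def write(self, block_id, data):
        self.backend_write(block_id, data)    # write-through: the array stays the
        self.cache[block_id] = data           # system of record for the data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Example with a dictionary standing in for the SAN-backed volume
san = {i: ("block-%d" % i).encode() for i in range(10000)}
cache = HostSideReadCache(san.__getitem__, san.__setitem__, capacity_blocks=256)
cache.read(42)   # first read travels over the SAN
cache.read(42)   # repeat read is served from the server-side flash cache
```

The write-through choice in the sketch mirrors the constraint Stammers highlights: the cache sits in the server, but the disk array stays in control of the authoritative copy.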