Backup and Recovery for Microsoft SQL Server
White Paper

Backup and Recovery for Microsoft SQL Server Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide

Abstract

Users are faced with many options and tradeoffs when choosing a backup strategy for Microsoft SQL Server databases. This white paper maps out those choices and examines how EMC® Data Domain® deduplication storage systems preserve data integrity, meet stringent RTO/RPO objectives, and integrate easily into a multitude of active SQL or third-party backup environments.

February 2012

Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

Microsoft and Microsoft SQL Server, Microsoft Exchange, Microsoft SharePoint, and Microsoft Hyper-V are registered trademarks or trademarks of Microsoft, Inc. in the United States and/or other jurisdictions.

All other trademarks used herein are the property of their respective owners.

Part Number h8116.2

Table of Contents

Executive summary
  Audience
Introduction
  Data Domain Product Background
  Advantages of Data Domain in a SQL Server Environment
  EMC Data Domain Boost
SQL Server Basics
Terminology
  Types of Backups
  Recovery Models
  Recovery Techniques
Data Domain Integration Best Practices
  Compression
  Multiplexing
  Encryption
  Backup Application Based Data Deduplication
  Blocksize
  Stripes
  Backup Command
  Data Transfer Rates
Integration
  Solution Planning
Additional Considerations
  Backup Types
  IP Network Considerations
  Data Domain and Third-Party Backup Applications
Conclusion
Appendix A: Index Fragmentation
  Addressing the Challenge
Appendix B: Additional resources
  Microsoft Resource Links
  EMC Data Domain links

List of Figures

Figure 1: Native MS-SQL Database Backup Tool
Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration
Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up
Figure 4: EMC Data Domain Boost
Figure 5: Microsoft SQL Server Management Studio Databases
Figure 6: Selection Recovery Model
Figure 7: Restore Database Dialog Box
Figure 8: Restore Database Options
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup
Figure 10: NetWorker MS SQL Client Restore GUI Example
Figure 11: Native SQL Backup - Disable Compression
Figure 12: Multi-striped Database Backup – Eight Stripes
Figure 13: Database Backup to a Null Device
Figure 14: Multiple Null Disk Devices
Figure 15: Nominal Database Backup Performance
Figure 16: NetWorker Management Console
Figure 17: DBCC “showcontig” Command Output

Executive summary

Many database administrators prefer native Microsoft SQL Server backups directly to disk over third-party backup applications. With native SQL Server backup, there is no reliance on the backup administrative team to perform backups or play a role in database recovery. Additionally, the database administrator no longer needs to become proficient in deploying, configuring, administering, or maintaining third-party backup applications.

Historically, native SQL backups have had two drawbacks:
• Native SQL backup facilities do not provide automated media management capabilities and therefore must write to disk devices. While backups to disk media eliminated the challenge of manually managing tape cartridges, this method also introduced the need for a considerable amount of additional disk. Conventional wisdom has been that disk costs significantly more than removable tape media.
• Native backup to disk did not, by itself, meet the requirement of retaining an offsite copy of database backups as part of a disaster recovery strategy.

Deployed as database backup media, EMC® Data Domain® deduplication storage systems address these legacy shortcomings of native database backups to disk for the following reasons:
• Data Domain deduplication storage systems optimize storage capacity, making retention and replication of backup data exceptionally cost- and network-efficient by providing 10-30x data reduction.
• Data Domain systems are simple to integrate with traditional backup software, but also offer an alternative: high-speed, cost-effective backup directly to a CIFS network share using native SQL Server backup.
  Users have the choice to eliminate the need for third-party SQL Server backup application agents and their associated operational costs and maintenance fees.
• Data Domain Replicator provides up to 99% reduction in the bandwidth required, which enables users to send data offsite for faster “time-to-DR.”
• EMC NetWorker integration with EMC Data Domain Boost (DD Boost) significantly increases performance by distributing parts of the deduplication process to NetWorker storage nodes or application hosts, and serves as a solid foundation for additional integration between NetWorker and Data Domain systems.
• Data Domain systems benefit from the EMC Data Domain Data Invulnerability Architecture: continuous recovery verification, fault detection and self-healing, and other resiliency features transparent to the backup application.

This white paper provides information about the use of Data Domain deduplication storage as backup media for Microsoft SQL Server backups.

Audience

This white paper is intended as a guide for data protection architects, SQL Server database administrative staff, backup administrators, and EMC partners seeking information about integrating Data Domain deduplication storage systems as a key component in a comprehensive backup and recovery strategy.

Introduction

Microsoft SQL Server backup methodology falls into one of two generic categories. The first consists of native SQL Server database backups. This backup technique creates SQL database backups using tools and utilities native to Microsoft SQL Server and does not rely on third-party backup application software (see Figure 1). The native database backup tool performs a full database backup to disk through a CIFS network share. The tool is easy to use and provides a feature set that addresses business requirements.
Benefits include the use of backup and recovery interfaces familiar to the database administrative staff. This ability is included with Microsoft SQL Server, and there are no additional third-party software license fees.

Figure 1: Native MS-SQL Database Backup Tool

The second backup methodology uses backup application software that integrates with Microsoft SQL Server to perform SQL database backups based on the Virtual Device Interface (VDI). This solution is typically packaged as a database agent specifically for Microsoft SQL Server and a particular backup application. When VDI is used, the backup application allows setting customized backup and recovery parameters similar to those that can be employed when using native Microsoft SQL tools and utilities.

EMC NetWorker backup software can use available snapshot technologies designed to provide application consistency for the backup and recovery processes. The EMC NetWorker Module for Microsoft Applications (NMM) delivers unified, online backup and recovery using the Microsoft Volume Shadow Copy Service (VSS) for Microsoft applications including SQL Server, Exchange, SharePoint, and Hyper-V.

The NetWorker graphical user interface (Figure 2) is an example of a backup application that uses Microsoft VSS protection for SQL.

Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration

When the snapshot type is based on Microsoft VSS, the backup application is the VSS requestor, the SQL Server is the VSS writer, and backup is coordinated with a VSS provider. Advanced backup and recovery features such as disk staging and instant recovery may be available with these implementations, depending on the backup application and agent being used.

Additional Concepts

Customers using the native Microsoft SQL Server database backup methodology sometimes augment their solution with third-party backup client agents that protect the native backup data as a flat file. This two-phased methodology is effectively “backing up a backup.” Among the perceived benefits of the augmented solution is that it allows segregation of the SQL database administrative staff from the data protection staff while providing a means to retain database backups in conformance with sound business practices and standardized corporate retention policies.

A variant of the same augmented methodology uses two backup solutions in combination to satisfy business objectives. Native SQL database backups are performed to a Data Domain system and are subsequently backed up by a third-party backup application and written to the same Data Domain system (Figure 3).

Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up

Data Domain Product Background

Data Domain deduplication storage systems minimize backup and recovery times, storage and network bandwidth, and risk of data loss. EMC offers a range of Data Domain systems to meet the backup and archive requirements of companies of all sizes as they seek to reduce costs and simplify data management.

Data Domain systems also offer replication that is extremely easy to deploy. The primary advantage of Data Domain system replication is that the data is deduplicated and compressed before being sent over the network.

Advantages of Data Domain in a SQL Server Environment

Data Domain systems can be directly integrated into Microsoft SQL Server environments as disk backup media.
In addition, Data Domain systems support all leading enterprise backup and archive applications for seamless integration into existing IT infrastructures.

The use of different backup methodologies with Microsoft SQL Server and Data Domain systems typically has a negligible effect on overall data deduplication ratios. This enables users to perform native database backups in conjunction with database backups controlled by a third-party backup application without affecting deduplication efficiency. This includes third-party backup applications that use a SQL agent, with or without VSS snapshots. Additionally, the use of different numbers of stripes or different blocksize values also has a negligible impact on deduplication ratios.

Data Domain network-efficient replication can be used to create offsite copies of SQL backups faster and more economically than legacy tape-based strategies. Data Domain replication makes advanced disaster recovery preparedness for SQL Server a reality.

EMC Data Domain Boost

EMC Data Domain Boost (DD Boost) distributes parts of the deduplication process from the Data Domain system to the backup server or application client. In addition to storage node support, NetWorker 7.6 SP2 or later supports DD Boost-based backup from application hosts for Microsoft applications and databases. This is driving new efficiency for users with NetWorker and Data Domain. Because only unique data is sent from the NetWorker server or application client to the Data Domain system, less LAN bandwidth is required, backups are up to 50 percent faster, and the overall system is more manageable.

NetWorker provides operational capabilities for configuring, monitoring, and reporting on backups and restores for Data Domain devices. This functionality is provided through the NetWorker Management Console (NMC) portal.
The NMC portal is accessible from any supported remote Internet browser. The NMC Device Configuration Wizard simplifies the configuration of storage devices, backup clients, storage (target) pools, volume labeling, and save set cloning.

DD Boost increases aggregate throughput, with backups up to 50% faster than over NFS, and reduces the amount of data transferred over the network by 80 to 99 percent. These efficiencies can help avoid future costs by leveraging existing backup servers and Ethernet networks.

• Increases backup speed up to 50%
• Reduces network traffic
• Clone-controlled replication
  - Schedules replication
  - Catalog awareness
• Ease of use
  - Wizard automated configuration
  - Monitoring and reporting

Figure 4: EMC Data Domain Boost

With DD Boost, backup applications can control replication between multiple Data Domain systems and provide backup administrators with a single point of management for tracking all backups and duplicate copies. This allows backup administrators to efficiently create DR copies of their backups over the WAN using DD Replicator software and keep the catalog consistent for easy disaster recovery. It also gives administrators the flexibility to manage different retention periods for each copy of data.

With NetWorker, the Data Domain replication process is managed by standard NetWorker cloning, ensuring that NetWorker can recognize and manage a replicated (remote) copy of data and assign unique retention policies to it. The administrator can schedule the cloning process to run at the time most appropriate for the business.

SQL Server Basics

A Microsoft SQL Server instance includes system and user databases.
As depicted in Figure 5, system databases are created at installation and include:
• The “master” database, which records all system-level information for a Microsoft SQL Server instance. It contains records for all login accounts and all system configuration settings. The master database records the existence and location of all other databases.
• The “model” database, which is used as a template that contains the default settings for all databases created within the Microsoft SQL Server instance.
• The “msdb” database, which is used for scheduling, alerts, and jobs.
• The “tempdb” database, which serves as a global resource that contains all temporary tables and temporary stored procedures. It is re-created every time the Microsoft SQL Server instance is started.

Figure 5: Microsoft SQL Server Management Studio Databases

Data protection strategies for the system databases depend on the database being protected. For instance, transaction log backups are not supported for the master database. The master database cannot be restored unless a functional version of it already exists. Recovery procedures for the master database may therefore include re-installing Microsoft SQL Server so that a backup of the pre-disaster master database can then be restored.

The model and msdb databases can contain customized data such as user-specific templates, scheduling information, and backup and restore history information. Without a data protection strategy, these items will need to be manually reconstructed in the event of a disaster. The tempdb database is empty when the SQL instance is shut down and does not require protection, as it is re-created at startup.

Terminology

Entire databases, specific database files, filegroups, and transaction log backups are among the supported backup types with Microsoft SQL Server.
This section defines the terminology associated with a given backup type.

Types of Backups

Database backups
• Database Backup – This is a full backup of an entire database and represents the state of the database at the point when the backup is completed.
• Differential Database Backup – This is a backup of all the files within a database, and contains only the extents modified since the most recent full backup of each file. Restoring a database protected with full and differential backups to the most recent point in time requires recovering the most recent full backup and then the most recent differential backup.

Partial backups
• Partial Backup – Partial backups provide flexibility for backing up databases that contain some number of read-only filegroups. A partial backup contains all data in the primary filegroup, each read/write filegroup, and any optionally specified read-only files or filegroups.
• Differential Partial Backup – This backup contains only the extents modified since the prior partial backup of the same set of filegroups.

File backups
• File Backup – This consists of a full backup of all data in one or more files or filegroups.
• Differential File Backup – This is a backup of one or more files containing the data extents changed since the prior full backup of each file.

Transaction log backups
• Regular transaction log backups are required when using the full or bulk-logged recovery models. This backup contains all log records that have not been backed up previously.

Copy-Only backups
• Backups usually change the database in some way and affect how later backups are restored, such as resetting the differential base in the case of a full database backup or truncating the transaction log in the case of a log backup. Copy-Only backups can be used in cases where a backup of a database is required without changing the database.

Recovery Models

Microsoft SQL Server includes three recovery models: simple, bulk-logged, and full (see Figure 6).
The desired recovery model can be deployed based on requirements. Functionally, each recovery model differs with regard to how backup and recovery strategies are executed.
• The full recovery model includes log backups. This model typically has no exposure to data loss. Point-in-time recovery is possible, up to and including the last committed transaction.
• The bulk-logged recovery model requires log backups. This model permits high-performance bulk copy operations. Recovery to the end of any backup is possible; point-in-time recovery is not supported.
• The simple recovery model consists of performing full backups only. Logs are not backed up. In the event database recovery is required, the most recent full backup can be restored. Any changes that occurred subsequent to the last full backup must be redone. From a transactional perspective, the database can only be recovered to the point of the prior full backup.

Figure 6: Selection Recovery Model

Recovery Techniques

The technique used to restore a database will vary based on the recovery model being used as well as the backup types being performed. Figures 7-10 provide a brief look at restoring a database that was protected using the full recovery model with full and transaction log backups. A single full backup was performed, followed by five transaction log backups.

Figure 7 depicts the restore database dialog box and general database restore attributes. By default the full backup and subsequent transaction log backups are all selected. Clicking the “OK” button would initiate recovery to the most recent possible point in time. Alternatively, recovery to a specific point in time is also possible.

Figure 7: Restore Database Dialog Box

Figure 8 depicts restore database options and available database recovery options. By default an existing database will not be overwritten.
Also note that by default the recovery state is “RESTORE WITH RECOVERY,” which leaves the recovered database in an online and usable state after the restore process completes.

Figure 8: Restore Database Options

Figure 9 is an example of a recovery transaction that restores the initial full backup, followed by the first transaction log backup. The remaining transaction logs were not included in this query for brevity.

Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup

EMC NetWorker and third-party backup applications will each have a unique recovery interface for databases. Many automate and coordinate the recovery of full and transaction log backups in much the same way that native Microsoft SQL Server tools and utilities do. Figure 10 is an example of the NetWorker MS SQL client restore GUI.

Figure 10: NetWorker MS SQL Client Restore GUI Example

Data Domain Integration Best Practices

Table 1 presents a summary of the suggested best practices settings for Microsoft SQL Server backup to Data Domain deduplication storage systems.

Table 1: Recommended Backup Software Settings

PARAMETERS AFFECTING DEDUPLICATION PERFORMANCE                   SETTING
SQL Server 2008 native compression                               NO_COMPRESSION
Third-party backup application SQL Server local compression      Disabled
Third-party backup application multiplexing                      Disabled
Third-party backup application encryption                        Disabled
Third-party backup application deduplication                     Disabled

Compression

In SQL Server 2008 Enterprise and later versions, backup compression can be enabled or disabled. The default product installation does not compress backups. A server-level compression setting can be applied that alters default behavior.
The use of the COMPRESSION keyword within a BACKUP statement explicitly enables backup compression. The use of the NO_COMPRESSION keyword within a BACKUP statement explicitly disables backup compression.

Figure 11 illustrates SQL Server 2008 properties for native compression; the “Compress backup” server-level property is used for backup jobs that do not explicitly enable or disable compression.

Figure 11: Native SQL Backup - Disable Compression

Backup application software compression should be disabled because the Data Domain system can fingerprint unique data segments more efficiently for deduplication if the data segments sampled are not already compressed. In addition, backup windows can be extended and CPU performance on the backup client can be impacted if the backup software is tasked with performing compression. Local compression is performed on the Data Domain storage system.

Multiplexing

When the Data Domain system is integrated as a backup device with a backup application that supports multiplexed backups, EMC recommends disabling multiplexed backups. Multiplexing limits the ability of the Data Domain system to deduplicate incoming data. Multiplexing was historically used as a speed-matching solution, in which multiple slower data streams were combined into a single stream to take advantage of a faster tape drive; backups to disk obtain no advantage from it. Whether deployed as a CIFS share, NFS mount, VTL, or OpenStorage / DD Boost disk pool, Data Domain systems accommodate writing multiple backup streams in parallel without multiplexing.

Encryption

Encrypted files are, by definition, unique. The encryption software that is part of the backup application will create unique files on the fly for each backup, defeating the deduplication capabilities of the deduplication storage system. Data Domain Encryption software provides encryption of data at rest, and the data remains encrypted in flight during replication with Data Domain Replicator software.
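
The compression guidance in this section can be illustrated with a short sketch. The Python helper below only builds the text of a native T-SQL BACKUP statement that explicitly disables SQL Server backup compression; it does not connect to a server. The helper, the database name, and the share path are hypothetical and are not part of any EMC or Microsoft tool.

```python
def build_backup_statement(database: str, target_path: str, compress: bool = False) -> str:
    """Return a T-SQL BACKUP DATABASE statement as text.

    compress=False emits NO_COMPRESSION, the setting Table 1 suggests
    when the backup target is a Data Domain system.
    """
    # Bracket-quote the database name and escape any closing bracket.
    quoted_db = "[" + database.replace("]", "]]") + "]"
    # Escape single quotes inside the path literal.
    quoted_path = "N'" + target_path.replace("'", "''") + "'"
    option = "COMPRESSION" if compress else "NO_COMPRESSION"
    return f"BACKUP DATABASE {quoted_db} TO DISK = {quoted_path} WITH {option};"


# Hypothetical database and CIFS share names, for illustration only.
statement = build_backup_statement("SalesDB", r"\\dd-system\sqlbackup\SalesDB_full.bak")
print(statement)
```

Stating NO_COMPRESSION explicitly, rather than relying on the server-level default, keeps the backup uncompressed even if that default is changed later.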

Backup Application Based Data Deduplication

Disabling deduplication in the backup application software provides better performance and offloads this work to the Data Domain system. Data Domain systems are optimized to provide the very best ingest performance and deduplication ratios.

Table 2: Recommended Backup Software Parameter Settings

PARAMETERS AFFECTING BACKUP AND RECOVERY PERFORMANCE    SETTING
BLOCKSIZE                                               Default 512 byte or higher based on performance improvements
Stripes                                                 Consider the use of multiple stripes to improve backup and restore data transfer rates

Blocksize

The BLOCKSIZE keyword can be used to alter the physical block size used when writing to backup media. By default the backup process automatically selects a block size appropriate for the backup device. Supported sizes are 512, 1K, 2K, 4K, 8K, 16K, 32K, and 64K bytes. The default value used for disk backup is 512 bytes, which yields excellent performance with Data Domain systems. Third-party backup applications may substitute their own default value. This parameter is described here for reference: larger sizes may improve or degrade performance, and users are encouraged to test which value provides optimal results in their environment.

Stripes

While not a keyword within the context of Microsoft SQL Server, the term stripes refers to the number of simultaneous backup streams created for a given backup operation. In the case of disk backups with SQL Server, multi-streamed backups are performed by specifying multiple backup disk targets in the BACKUP command.
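
The BLOCKSIZE and stripes settings described above can be sketched as follows. This Python helper is a hypothetical illustration that assembles the text of a striped BACKUP statement, with one DISK target per stripe and an optional BLOCKSIZE checked against the supported sizes listed above. The database name, share path, and file naming pattern are assumptions made for the example.

```python
SUPPORTED_BLOCK_SIZES = (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)


def build_striped_backup(database, share, stripes, blocksize=None):
    """Return a T-SQL BACKUP statement that writes to `stripes` disk targets."""
    if stripes < 1:
        raise ValueError("stripes must be at least 1")
    if blocksize is not None and blocksize not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"unsupported BLOCKSIZE: {blocksize}")
    # One DISK target per stripe; SQL Server writes to all targets in parallel.
    targets = ",\n   ".join(
        f"DISK = N'{share}\\{database}_stripe{n}.bak'" for n in range(1, stripes + 1)
    )
    options = ["NO_COMPRESSION"]  # per Table 1
    if blocksize is not None:
        options.append(f"BLOCKSIZE = {blocksize}")
    return f"BACKUP DATABASE [{database}]\nTO {targets}\nWITH {', '.join(options)};"


# Hypothetical names; four stripes written to a CIFS share by UNC path.
sql = build_striped_backup("SalesDB", r"\\dd-system\sqlbackup", stripes=4, blocksize=65536)
print(sql)
```

Leaving `blocksize` unset omits the keyword entirely, so SQL Server selects its own default for the device.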

Table 3: Mount Options

MOUNT OPTIONS                                SETTING
When performing native database backups      UNC path
When using a third-party backup server       Dependent on backup application and server OS type

When the Data Domain system is used as disk backup media for native Microsoft SQL Server backups, configuration is performed using a CIFS share. As a general rule, the UNC path to the share should be used instead of a mapped drive because:
a) Scheduled backups may execute when no user is logged in to the server
b) When Sqlservr.exe is executed as a service, it has no relation to a login session

Table 4: Miscellaneous Options

MISCELLANEOUS OPTIONS                                                                                       CONFIGURATION
Comingling native and third-party backup application database backups to the same Data Domain system      Yes
Replication                                                                                                 Yes

Comingling native and third-party backups to a Data Domain system should have only a negligible impact on deduplication ratios because of the variable segment processing and Stream Informed Segment Layout (SISL) architected into Data Domain systems. Since Data Domain Replicator software only sends unique, compressed data segments to the remote system, it is ideal for network-efficient disaster recovery.

Table 5: Infrastructure Configuration

INFRASTRUCTURE             CONFIGURATION
Server Disk Subsystem      Database and log files should be placed on disk storage with performance attributes facilitating required transaction and backup performance metrics
IP Network                 Dedicated backup network that meets or exceeds bandwidth requirements for the desired data transfer rate
EMC Data Domain System     Sized to meet or exceed ingest rate and backup retention capacity requirements

Backup Command

SQL stripes are best used as a speed-matching technology.
Multiple backup streams from a given database can be written simultaneously to a target Data Domain system to achieve an aggregate data transfer rate that aligns with business requirements. Figure 12 illustrates a multi-striped database backup that uses eight stripes to improve the backup data transfer rate. Multiple stripes can be used to better match data transfer rate capabilities between source and destination media.

Figure 12: Multi-striped Database Backup – Eight Stripes

Data Transfer Rates

Multiple business objectives are considered when determining required backup and recovery data transfer rates. Decision criteria include backup window duration, log growth, and recovery time. By definition, slow backups are those that fail to meet business objectives. Understanding the factors that can affect performance is critical to removing them from the environment.

A reasonable place to start any backup performance investigation is to understand the theoretical maximum speed at which SQL Server can process a given database backup. Performing a database backup to a null disk device provides an estimate of that maximum achievable speed in a given environment, because no backup data is written to media and the rate is bounded only by how fast SQL Server can read the database. Figure 13 depicts a database backup to a null device.

Figure 13: Database Backup to a Null Device

The results indicate that the theoretical maximum rate at which the SQL Server backup function can extract data from this database using a single stripe is approximately 80 MB/sec. Regardless of the data transfer rate at which the backup media can accept data, backing up this database as it currently stands will be limited to 80 MB/sec when using a single stripe.

Figure 14 depicts a database backup to multiple null disk devices.
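The single-stripe baseline discussed above can be reproduced from the page count and elapsed time that SQL Server reports when a backup completes. The sketch below is a hedged illustration: the database name is hypothetical, COPY_ONLY is an addition not taken from the paper (it keeps the test run from becoming the base for later differential backups), and the conversion assumes 8 KB pages with 1 MB = 1,048,576 bytes.

```python
# Minimal sketch: estimate the source-side backup rate from a null-device run.

PAGE_SIZE_BYTES = 8192  # SQL Server stores data in 8 KB pages

# Null-device backup for a hypothetical database; nothing is written to media.
NULL_BACKUP_SQL = "BACKUP DATABASE [SalesDB] TO DISK = N'NUL' WITH COPY_ONLY;"


def backup_rate_mb_per_sec(pages, seconds):
    """Convert pages processed and elapsed seconds into MB/sec."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return pages * PAGE_SIZE_BYTES / (1024 * 1024) / seconds


if __name__ == "__main__":
    # Illustrative figures only: 1,024,000 pages processed in 100 seconds.
    print(NULL_BACKUP_SQL)
    print(f"{backup_rate_mb_per_sec(1_024_000, 100.0):.1f} MB/sec")
```

A rate measured this way is an upper bound for that stripe count; a backup to real media cannot exceed it without changes on the source side.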
Figure 14: Multiple Null Disk Devices

Figure 15 depicts the nominal improvement in database backup performance with a moderately tuned eight-stripe SQL database backup. The aggregate data transfer rate is approximately 172 MB/sec, indicating that the network-attached backup devices are not limiting throughput.

Figure 15: Nominal Database Backup Performance

Integration

EMC NetWorker and third-party backup applications used to protect Microsoft SQL Server can also take advantage of Data Domain systems employed as backup media. Data Domain systems are easily configured to present a variety of backup media types and protocols, including VTL, CIFS share, NFS mount, and Data Domain Boost (DD Boost) for backup applications such as EMC NetWorker. Additionally, DD Boost enables a managed replication capability known as “clone controlled replication” with EMC NetWorker. In this scenario, backup images are replicated from one Data Domain system to another under the direct control of NetWorker or other supported backup applications. DD Boost monitoring, reporting, and cataloging of replicated backup images and savesets can be used to architect a comprehensive disaster recovery plan.

Solution Planning

Capacity and performance planning play a critical role in both the successful deployment and the ongoing production use of a Data Domain system. Detailed capacity analysis should be performed by a knowledgeable EMC Velocity partner or an EMC technical consultant. The analysis considers database sizes, growth rates, change rates, and retention periods as input criteria. Performance analysis considers data points such as the required aggregate data transfer rate for backups, the connection topology needed to support that rate, and the Data Domain system required to meet or exceed it.
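The required aggregate data transfer rate mentioned above follows directly from the amount of data and the backup window. The sketch below shows that arithmetic under simple assumptions: sizes are in GB with 1 GB = 1024 MB, the full database is read on every backup, and the example sizes and window are hypothetical.

```python
# Minimal sketch: relate database size, backup window, and transfer rate.

def required_rate_mb_per_sec(database_gb, window_hours):
    """Aggregate MB/sec needed to move `database_gb` within `window_hours`."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return database_gb * 1024 / (window_hours * 3600)


def backup_hours(database_gb, rate_mb_per_sec):
    """Hours needed to move `database_gb` at a sustained `rate_mb_per_sec`."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate_mb_per_sec must be positive")
    return database_gb * 1024 / rate_mb_per_sec / 3600


def meets_window(measured_mb_per_sec, database_gb, window_hours):
    """True when a measured rate is fast enough for the backup window."""
    return measured_mb_per_sec >= required_rate_mb_per_sec(database_gb, window_hours)


if __name__ == "__main__":
    # Hypothetical 2 TB database and 4-hour window, checked against the
    # single-stripe and eight-stripe rates quoted in this paper.
    for rate in (80, 172):
        print(rate, meets_window(rate, 2048, 4), round(backup_hours(2048, rate), 2))
```

The same functions can be applied to restore planning by substituting the recovery time objective for the backup window.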
Beyond capacity and performance planning, Data Domain system replication raises additional considerations.

Additional Considerations

Replication Scope

Replicating all database backups is certainly possible. However, many users will want to implement replication at a more granular level. Production database backups are usually excellent replication candidates, whereas development and test database backups are less critical. An analysis of network bandwidth and destination disk space requirements should be performed by a knowledgeable EMC Velocity partner or an EMC technical consultant.

Replication Topology

Backups are typically replicated to serve as a second backup copy for recovery in the event of a disaster. When backups from a primary site are replicated to a secondary site, planning is relatively straightforward. Users with multiple primary sites may decide to implement a bidirectional replication solution in which database backups from each site are replicated to the other. Proper planning should produce an outline detailing which database backups are replicated to each location.

Tape Consolidation

Some users replicate backup images to a central location for disaster recovery purposes while also using the solution to enable centralized tape creation. The third-party backup application used to create tape-based backup copies will dictate any additional considerations or restrictions that this solution involves. A knowledgeable EMC Velocity partner or an EMC technical consultant will be able to assist with this planning task.

Backup Types

The goal of backups is to satisfy recovery time objectives (RTO) and recovery point objectives (RPO). Outlining a strategy of full, differential, and transaction log backups is beyond the scope of this paper.
That stated, there are a few key points worth noting:

• Performing full backups frequently with Data Domain deduplication storage does not create a storage usage penalty, as redundant database segments do not consume additional disk space. While this makes more frequent full backups practical from a storage standpoint, the load that full backups place on the SQL Server and on the connection topology to the Data Domain system should still be taken into consideration.
• When split-mirror or snapshot backups are performed and controlled by a third-party backup application, the Data Domain system is easily integrated as a backup storage device. The features provided by these backup techniques (low-impact backups, instant recovery, and so on) do not preclude the use of Data Domain technology.

IP Network Considerations

When Data Domain systems are deployed as a CIFS backup share, EMC recommends interconnecting SQL servers and Data Domain systems using a dedicated backup area network. When deployment is in conjunction with a backup application as a CIFS share, NFS mount, or OpenStorage / DD Boost disk pool, EMC similarly recommends interconnecting backup application media servers and Data Domain systems using a dedicated backup area network. Whenever possible, the network used for backup and recovery communications should be segregated from other production networks. This best practice helps ensure that enough network bandwidth is available for backup and restore jobs to meet business objectives.

Network bandwidth requirements may dictate the need for a topology that supports data transfers in excess of 125 MB/s, the theoretical maximum of a single GbE link. All Data Domain systems support the use of multiple GbE network interfaces as well as 10 GbE network interfaces. A knowledgeable Data Domain system engineer will be able to assist with planning the deployment based on user requirements and available resources.
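The 125 MB/s figure above is simply 1 Gb/s divided by 8 bits per byte. As a hedged sketch of how a required transfer rate translates into network links, the functions below compute the theoretical capacity of a link and the number of links needed. The `efficiency` parameter is a hypothetical allowance for protocol overhead and is not a value from the paper.

```python
# Minimal sketch: translate a required backup rate into a link count.
import math


def link_capacity_mb_per_sec(link_gbps):
    """Theoretical capacity of a link in MB/s (decimal units, 8 bits per byte)."""
    return link_gbps * 1000 / 8


def links_needed(required_mb_per_sec, link_gbps, efficiency=1.0):
    """Number of links of speed `link_gbps` needed to carry the required rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable = link_capacity_mb_per_sec(link_gbps) * efficiency
    return math.ceil(required_mb_per_sec / usable)


if __name__ == "__main__":
    # The eight-stripe backup in this paper ran at roughly 172 MB/sec.
    print(links_needed(172, 1))    # GbE links
    print(links_needed(172, 10))   # 10 GbE links
```

Whether multiple GbE links actually aggregate for a single backup job depends on how the interfaces and backup streams are configured, so the result is a planning estimate rather than a guarantee.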
Data Domain and Third-Party Backup Applications

When Data Domain systems are integrated with EMC NetWorker and third-party backup applications, Microsoft SQL Server backup parameters are handled the same way as in a native SQL Server backup implementation. The COMPRESSION and BLOCKSIZE keywords, as well as any striping, are still valid parameters. Some of these settings may not be available when using a third-party backup application.

Figure 16 depicts the NetWorker Management Console interface for configuring Microsoft SQL Server backups.

Figure 16: NetWorker Management Console

Users of third-party backup applications who want to use the full complement of available Microsoft SQL Server backup options should contact their software provider if additional information is required.

Conclusion

A Data Domain system makes an excellent target for Microsoft SQL Server backups because it integrates easily and seamlessly into existing SQL Server environments. Data Domain systems allow the SQL Server administrative team to retain a greater number of full backup images online, which broadens recovery options while occupying a minimal footprint in the data center, and to do so with the native backup tools that are familiar to SQL Server administrators. The addition of a Data Domain system to the environment greatly reduces dependence on legacy tape and provides faster “time-to-DR” with network-efficient replication. When Data Domain Boost integration with EMC NetWorker is used, performance can be greatly improved, and the managed replication includes the remote backup image in the saveset database for easy recovery. It is for all of these reasons that more people choose to build their backup solutions using EMC products and technology.
Appendix A: Index Fragmentation

Index fragmentation affects I/O performance of queries whose data pages do not reside in the Microsoft SQL Server data cache. A variety of techniques are commonly used to reduce index fragmentation, including but not limited to “DBCC INDEXDEFRAG”, “DBCC DBREINDEX”, and “CREATE INDEX ... WITH DROP_EXISTING”. While these techniques are effective in reducing index fragmentation, they can also have a negative impact on deduplication.

Database administrative teams that routinely defragment all indexes at some predetermined frequency may notice reduced data deduplication rates on their Data Domain systems. The end result is reduced storage efficiency. Index defragmentation reorganizes the pages within a database such that Data Domain deduplication sees the backup data stream as new, unique data. In addition to the inefficient use of backup device storage space, this can also impact the ability to replicate database backups using Data Domain replication. A greater quantity of unique data blocks means replicating a greater quantity of data over what may be a bandwidth-limited WAN.

Database administrative teams may therefore find themselves in a situation where index fragmentation impacts query performance, while frequent index defragmentation impacts backup storage device performance in terms of deduplication and replication rates.

Addressing the Challenge

EMC recommends addressing these challenges with a balanced approach. For instance, instead of defragmenting all indexes based on a schedule, consider defragmentation based on thresholds. Additionally, EMC recommends the use of index keys that are less prone to fragmentation in the first place.

Is index fragmentation the only issue impacting transaction performance?
I/O subsystem performance, memory usage, and CPU utilization can all have a negative impact on query performance. These issues should be diagnosed and resolved rather than relying on frequent automatic index defragmentation to improve performance.

File fragmentation can also impact performance. Many small databases sharing the same logical disk volume, combined with the use of the “autogrowth” property, can cause logically sequential database files to allocate non-sequential physical storage on disk. Ideally, administrators should set the size of database files at deployment to accommodate potential future growth. While it may be impossible to anticipate the exact size of a given database three years into the future, presizing files helps to reduce the possibility that file fragmentation will impact query performance. If automatically growing database files is a requirement, consider growing in large chunks rather than small chunks. It may be impractical to locate each database on a unique logical volume, but consider doing so for databases that are expected to grow considerably over time. Finally, disk file fragmentation can be reduced by Windows file system defragmentation utilities such as the Windows “Disk Defragmenter.”

Do all indexes need to be defragmented or just a subset?

EMC recommends the use of index defragmentation tools based on thresholds and limits rather than automatically defragmenting every index on every table whether it is required or not. The suggestion is to understand which indexes, at which fragmentation levels, impact performance. These indexes should be monitored against a specific fragmentation threshold, and action taken to defragment them only when necessary. Selective index defragmentation will have less impact on production and will help preserve the ability to efficiently deduplicate database backups.
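The threshold-based approach recommended above can be sketched as a simple selection step that runs before any defragmentation job. In the sketch below the index names, the 10 and 30 percent thresholds, and the minimum page count are all illustrative assumptions rather than EMC recommendations; appropriate values depend on the workload.

```python
# Minimal sketch: pick only the indexes that cross a fragmentation threshold.
from typing import NamedTuple


class IndexStats(NamedTuple):
    name: str
    fragmentation_pct: float
    pages: int


# Hypothetical thresholds; tune these for the environment.
REORGANIZE_PCT = 10.0   # at or above this, defragment in place
REBUILD_PCT = 30.0      # at or above this, rebuild the index
MIN_PAGES = 1000        # ignore very small indexes


def choose_action(stats):
    """Return 'none', 'reorganize', or 'rebuild' for one index."""
    if stats.pages < MIN_PAGES or stats.fragmentation_pct < REORGANIZE_PCT:
        return "none"
    if stats.fragmentation_pct < REBUILD_PCT:
        return "reorganize"  # for example, DBCC INDEXDEFRAG
    return "rebuild"         # for example, DBCC DBREINDEX


def maintenance_plan(indexes):
    """Map index name to action, omitting indexes that need no work."""
    plan = {}
    for stats in indexes:
        action = choose_action(stats)
        if action != "none":
            plan[stats.name] = action
    return plan


if __name__ == "__main__":
    sample = [
        IndexStats("IX_Orders_Date", 4.0, 50_000),
        IndexStats("IX_Orders_Customer", 18.5, 50_000),
        IndexStats("IX_Audit_Event", 62.0, 120_000),
        IndexStats("IX_Lookup_Code", 75.0, 40),
    ]
    print(maintenance_plan(sample))
```

Indexes left out of the plan are not rewritten, so their pages remain unchanged in the next backup and continue to deduplicate against earlier backups.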
Figure 17 depicts the DBCC SHOWCONTIG command output. It includes extent scan fragmentation data indicating that index “C_CustomerI1” does not require defragmentation at this time.

Figure 17: DBCC “showcontig” Command Output

Structuring indexes and keys so as to minimize fragmentation may or may not be realistic in all cases, but it should be considered as it potentially reduces the need to defragment indexes frequently. Index and key inserts that occur at the end of the table and index are likely to reduce fragmentation. Deletes that occur in contiguous chunks also assist in reducing fragmentation.

Appendix B: Additional Resources

Microsoft Resource Links

Backing Up and Restoring Databases in SQL Server - from SQL Server 2008 Books Online:
http://msdn.microsoft.com/en-us/library/ms187048.aspx

Backing Up and Restoring Databases in SQL Server - from SQL Server 2005 Books Online:
http://msdn.microsoft.com/en-us/library/ms187048(SQL.90).aspx

Optimizing Backup and Restore Performance in SQL Server - SQL Server 2005 Books Online:
http://msdn.microsoft.com/en-us/library/ms190954(SQL.90).aspx

Microsoft SQL Server Community
http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx

EMC Data Domain Links

EMC Backup and Recovery for Microsoft Applications - Deduplication Enabled by EMC CLARiiON and Data Domain white paper
http://www.emc.com/collateral/software/white-papers/h7051-backup-recoverymicrosoft-deduplication-clariion-wp.pdf

EMC Data Domain Family products and deduplication technology
http://www.emc.com/products/family/data-domain-family.htm

Technical Notes - Using EMC® NetWorker® Module for SQL Server® with Data Domain Boost® for Improved Backup and Recovery Performance
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-013159.pdf?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwNWZkOTMwLGRvY3VtZW50VHlwZT1wZGYsbmF2ZU5vZGU9MGIwMTQwNjY4MDQyNzY2OF9Hcmlk

EMC Data Domain Boost Software
http://www.emc.com/products/detail/software/data-domain-boost.htm

EMC Data Domain SISL Scalability Architecture - A Detailed Review white paper
http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sislsclg-arch-wp.pdf

EMC Data Domain Replicator Software - A Detailed Review white paper
http://www.emc.com/collateral/software/white-papers/h7082-data-domainreplicator-wp.pdf.pdf

EMC Data Invulnerability Architecture: Ensuring Data Integrity and Storage System Recoverability white paper
http://www.emc.com/collateral/software/white-papers/h7219-data-domain-datainvul-arch-wp.pdf

EMC NetWorker Software
http://www.emc.com/backup-and-recovery/networker/networker.htm

EMC NetWorker Online Community
https://community.emc.com/community/connect/networkeronline

IDC Study – Worldwide Purpose Built Backup Appliances:
http://www.emc.com/collateral/analyst-reports/idc-worldwide-purpose-built-backup-appliance-20112015.pdf