Document 6514274
Global Technology Services

How to health-check your TSM environment
Holger Speh, Consulting IT Specialist, Global Technology Services

Motivation
- Why is my backup so slow?
- How are my tape drives utilized?
- How much data is transferred every day?
- Where is my bottleneck?
- It's the network!?
- Why is my restore so slow?
- Why do I run out of scratch tapes all the time?
- Healthcheck your environment!

What is a TSM Healthcheck?
- A short engagement to evaluate an existing TSM solution.
- Typically focused on:
  - Optimisation potential
  - TSM server health
  - Overall performance
  - Aligning current configuration with best practices
- Can also be focused on helping with strategic planning for future needs.
- Must be clear and honest, even if it has bad news.

What is a TSM Healthcheck not?
- It is not just an "overview dashboard lights" tool.
- It is not deployment.
- It is not an excuse to sell/buy unneeded software.
- It is not a long term engagement.
- It is not about fixing specific current problems.
- It is not an excuse for the customer to find an IBM engineer to be on the hook forever for recommendations resulting from the assessment.
- It does not include Disaster Recovery analysis.

Methodology overview
1. Prepare an assessment workshop
2. Perform assessment workshop
3. Perform analysis and generate diagrams and tables
4. Create report (doc/presentation)
5. Present analysis results and recommendations

Prepare yourself to understand the situation
- Understand why this has been started
- Understand the customer's expectations
- Understand your own expectations
- Prepare a customer assessment workshop and send customer an assessment worksheet

Perform a detailed client workshop
- Set customer expectations of the assessment
- Gather general information about processes, organisation, strategy, etc.
- Understand your client's requirements!
- Gather as much data as possible about the customer's environment:
  - TSM configuration
  - Network topology
  - SAN topology
  - Systems and software used
  - Disk layout
  - Tape infrastructure
- Set up data gathering/monitoring mechanisms:
  - Remember to monitor OS CPU, memory, disk and network
  - Monitor the same time period that you will analyse from within TSM

Know your tools and understand the data
- This TSM Healthcheck is a fairly technical tool!
- It does not replace a general architectural review!
- It does not hand you every answer!
- You need to be able to interpret the data!
- Use the latest available reference literature!

Workflow and Tools used
[Diagram; recoverable labels: TSM Administrative Commandline; TSM tables Actlog, …, Archives, Backups, Contents, Spacemgfiles, …, Volumeusage; tools MySQL, Perl, Bash Shell Scripts, Gnuplot]
6-May-09

TSM performance measurement areas
- Not all layers can be monitored through TSM
- Client performance for LANfree needs a special server parameter
- OS monitoring for the TSM host is needed

The situation: Consolidation of very large TSM environment
- The customer asked for a proposal to simplify and consolidate his TSM environment
- The following points had to be respected:
  - LAN-free (150-250 Clients), LAN-based (3500-4500 Clients)
  - Central Tape Management (eRMM / RMM)
  - Dynamic load balancing due to heavy data growth
  - Simple and lightweight
  - Use optimal tape performance
  - Use TSM mirroring
  - Use available techniques
  - Don't consider disaster recovery
  - Encryption
  - PnP (Plug'n Play of new systems)
  - Migrate... finally

Quick overview ordered by sections
[Matrix rating every server against the criteria below; the per-cell ratings did not survive extraction.]

Category            Criteria
client activity     24h traffic, daytime gaps, backup, archive, restores/retrieves
client volume       avg hourly nighttime volume, avg hourly daytime volume, daily file-level, daily tdp
client performance  avg mb/sec, max mb/sec
client sessions     avg parallel count, max parallel count
server activity     24h traffic, nighttime gaps, migration, reclamation, expiration
server performance  db backup, expiration, migration
server volume       migration
mount wait          migration

Platform  Servers
HP-UX     i0, i1
zOS       s1, s2, s3, s4, s5, s6, s7, s8
Win       w1, w2
AIX       x1, x2, x3, x4, x5, x6, x7, x8, xa, xb, xc, xd

Section: client activity
- zOS and HP-UX environment
  - Activity around the clock
  - Nearly no free time windows for administrative activities during the day
  - Moderate to frequent restore activities
- Windows and AIX environment
  - Backup operation only during the night
  - Adequate free time windows for administrative activities during the day
  - No restore activities

tsmgroup  platform_name     count        gb
HPUX      HPUX                178      4.47
HPUX      TDP Oracle HP        41      3.86
HPUX      TDP R3 HP            52  3,659.61
MVS       AIX                  77     48.92
MVS       CE Archive        3,593     41.76
MVS       HPUX              3,293    114.53
MVS       IRIX                 16      3.92
MVS       Linux390              3      0.00
MVS       Linux86              26    298.07
MVS       LinuxIA64             1      0.00
MVS       LinuxPPC              6      0.00
MVS       SUN SOLARIS          13      0.75
MVS       TDP Oracle HP       260    156.01
MVS       TDP Oracle SUN        5      1.58
MVS       TDP R3 HP            31  1,260.61
MVS       WinNT                81     20.18
Windows   WinNT                56     28.93
AIX       AIX                   4      0.00
AIX       TDP MSExchg          25    780.03
AIX       WinNT                60     24.28
SUM                         7,821  6,447.52

Section: client volume
- zOS environment
  - Moderate to high data volume per hour during the night
  - Low data volume per hour during the day
  - File-level as well as TDP traffic
- HP-UX environment
  - Moderate data volume per hour at night as well as during the day
  - Little file-level traffic, much TDP
- Windows environment
  - Low utilization per hour
  - Low data traffic
- AIX environment
  - Freshly set up environment, utilization still low
  - Split between day and night recognizable
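The "avg hourly nighttime volume" and "avg hourly daytime volume" criteria used in the client volume section can be derived from per-session transfer records. The following is a minimal sketch, not the tooling behind the slides (which names MySQL, Perl, shell scripts and Gnuplot). The night window of 20:00 to 05:59, the record layout and the sample values are all assumptions made for illustration; hours without any traffic are ignored in the average.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: "night" means 20:00-05:59. The slides do not define the window.
NIGHT_START, NIGHT_END = 20, 6


def is_night(hour):
    """True if the clock hour falls into the assumed night window."""
    return hour >= NIGHT_START or hour < NIGHT_END


def avg_hourly_volume(records):
    """Average GB per hour for night and day.

    records: iterable of (ISO timestamp, gigabytes) pairs, one per session.
    Volumes are first summed per calendar hour, then averaged over the
    hours that actually saw traffic.
    """
    per_hour = defaultdict(float)
    for timestamp, gigabytes in records:
        t = datetime.fromisoformat(timestamp)
        per_hour[(t.date(), t.hour)] += gigabytes

    buckets = {"night": [], "day": []}
    for (_, hour), gigabytes in per_hour.items():
        buckets["night" if is_night(hour) else "day"].append(gigabytes)

    return {name: (sum(v) / len(v) if v else 0.0) for name, v in buckets.items()}


# Hypothetical sample records, not taken from the assessed environment.
sample = [
    ("2007-04-19T22:10:00", 40.0),
    ("2007-04-19T22:40:00", 20.0),
    ("2007-04-19T23:05:00", 30.0),
    ("2007-04-20T10:15:00", 5.0),
    ("2007-04-20T14:30:00", 3.0),
]
result = avg_hourly_volume(sample)
print(result)  # {'night': 45.0, 'day': 4.0}
```

A large gap between the two averages corresponds to the "day/night split recognizable" finding; similar values match the "around the clock" pattern.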
Section: client performance
- All environments
  - Low throughput during file-level activities
  - Only TDP nodes show higher throughput

Section: client sessions
- zOS environment
  - Moderate to many parallel sessions
  - Note: Every active session allocates memory and generates locks in the TSM DB
- HP-UX environment
  - Few to moderate parallel sessions
- Windows environment
  - Few parallel sessions
- AIX environment
  - Very few parallel sessions on freshly set up systems
  - Moderate utilization on all other systems

Section: server activity
- zOS environment
  - Activity round the clock with no free time windows
  - Main activity is migration
  - Sometimes very long running expirations
- HP-UX environment
  - Different behaviour per server
  - adsmi0 only shows server activity during the day
  - adsmi1 shows activity round the clock with a lot of migration
- Windows environment
  - Few nightly activities
  - Sometimes long expiration
- AIX environment
  - Currently server activity only during the day
  - Exception: expiration, which is scheduled at short intervals

Section: server performance
- zOS environment
  - Very good DB performance, exception is s7
  - Migration only moderate
- HP-UX environment
  - Very good DB performance
  - Migration only moderate to bad
- Windows environment
  - Mixed behaviour
  - Moderate (w1) and very good (w2) performance
- AIX environment
  - Mixed behaviour
  - Sometimes very good DB performance (x1-x4)

Section: server volume
- Only migration shows significant volume
- zOS environment
  - Low to very high migration volume
  - Frequent disk pool overflows
- HP-UX environment
  - Low migration volume
  - Frequent disk pool overflows
- Windows environment
  - Low migration volume
- AIX environment
  - Nearly no migration volume

Section: mount wait
- zOS environment
  - Low waiting times
- HP-UX environment
  - Low to moderate waiting times
- Windows environment
  - Low to moderate waiting times
  - High waiting times for client actions
- AIX environment
  - Low waiting times
  - High waiting times for client actions

Potential Storage Pool Overflows
- Daily migration volume compared to disk pool size
- Assumption: only 1x daily migration
- The disk pool is potentially too small if the daily migration volume exceeds the storage pool size

server  stgpool_name        overflows
adsmi0  BACKUP_DISK_LAN1            5
adsmi0  BACKUP_DISK_REDO1          10
adsmi0  BACKUP_DISK_REDO2          10
adsmi1  BACKUP_DISK_REDO1          25
adsmi1  BACKUP_DISK_REDO2          15
adsms1  ARCHIVE_DISK                9
adsms1  BACKUP_DISKJ               21
adsms2  ARCHIVE_DISK                1
adsms2  BACKUP_DISK                12
adsms2  BACKUP_DISKJ               18
adsms3  ARCHIVE_DISK                2
adsms3  BACKUP_DISK                17
adsms4  BACKUP_DISKJ               30
adsms5  ARCHIVE_DISK                7
adsms5  BACKUP_DISKJ               16
adsms6  ARCHIVE_DISK               14
adsms6  BACKUP_DISK                 5
adsms6  BACKUP_DISK2                4
adsms6  BACKUP_DISK3                3
adsms6  BACKUP_DISKJ                1
adsms6  BACKUP_DISKJ_GR             4
adsms7  ARCHIVE_DISK               31
adsms8  ARCHIVE_DISK                1
adsms8  BACKUP_DISKJ               22
adsmx7  BACKUP_3592_J1_FS           4
adsmx8  BACKUP_3592_J1_FS           4

LANfree considerations – by volume
- Small operations no longer LANfree
- Massive reduction of LANfree tape mounts

LANfree clients per platform: old (161 LANfree clients) and new (74 LANfree clients)
platform_name   Old  New
HPUX             55    1
TDP MSExchg      39   37
TDP Oracle HP     9    -
TDP R3 HP        58   36
Grand Total     161   74

Activities by Volume
Platform       GB       Count
HPUX           <1         820
HPUX           1-10       206
HPUX           10-50       20
HPUX           50-100       1
HPUX           SUM      1,047
TDP MSExchg    <1           4
TDP MSExchg    1-10         9
TDP MSExchg    10-50       11
TDP MSExchg    50-100      50
TDP MSExchg    >100       147
TDP MSExchg    SUM        221
TDP Oracle HP  <1       9,392
TDP Oracle HP  1-10     1,042
TDP Oracle HP  SUM     10,434
TDP R3 HP      <1       3,519
TDP R3 HP      1-10     2,240
TDP R3 HP      10-50      376
TDP R3 HP      50-100     259
TDP R3 HP      >100       652
TDP R3 HP      SUM      7,046
SUM (equals the buckets below 50 GB)   17,639
SUM (equals the buckets from 50 GB)     1,109
Overall SUM                            18,748

Volume by date
date            New LAN  New LANFREE      Old LAN  Old LANFREE        Total
2007-04-19    24,166.77     8,402.95    23,353.95     9,215.77    32,569.72
2007-04-20    21,327.20     9,066.45    19,845.86    10,547.78    30,393.65
2007-04-21    18,156.31     8,570.01    16,822.02     9,904.30    26,726.32
2007-04-22     5,233.80     1,621.94     4,995.45     1,860.28     6,855.74
2007-04-23    19,138.22     9,542.61    17,910.95    10,769.87    28,680.82
2007-04-24    20,589.58    10,757.53    19,054.26    12,292.85    31,347.11
2007-04-25    18,260.77    11,085.51    16,958.34    12,387.93    29,346.27
2007-04-26    21,739.70     9,790.04    19,787.00    11,742.74    31,529.75
2007-04-27    20,816.35     9,917.61    19,560.24    11,173.71    30,733.95
2007-04-28    20,546.28     8,830.13    19,231.59    10,144.82    29,376.42
2007-04-29     4,866.14     2,083.47     4,654.87     2,294.74     6,949.61
2007-04-30    16,865.96     8,702.27    15,834.17     9,734.06    25,568.23
2007-05-01    15,993.69    10,719.76    14,715.15    11,998.30    26,713.46
2007-05-02    19,126.66    11,644.60    17,789.90    12,981.37    30,771.27
2007-05-03    22,755.03     9,169.84    21,149.79    10,775.08    31,924.87
2007-05-04    19,576.18     9,992.76    18,225.08    11,343.86    29,568.94
2007-05-05    18,911.51     8,449.42    17,552.52     9,808.40    27,360.93
2007-05-06     5,194.30     1,678.79     5,043.75     1,829.34     6,873.09
2007-05-07    19,298.28     9,299.15    18,092.39    10,505.05    28,597.43
2007-05-08    19,789.73    10,301.16    18,551.39    11,539.50    30,090.89
Median        19,218.25     9,234.50    18,001.67    10,658.83    29,361.34
Max           24,166.77    11,644.60    23,353.95    12,981.37    32,569.72
Min            4,866.14     1,621.94     4,654.87     1,829.34     6,855.74

LANfree considerations – by performance

LANfree clients per platform: old (161 LANfree clients) and new (85 LANfree clients)
platform_name   Old  New
HPUX             55    -
TDP MSExchg      39   39
TDP Oracle HP     9    1
TDP R3 HP        58   45
Grand Total     161   85

Activities by Throughput and GB
Platform       MB/sec   Count  Avg GB/Op
HPUX           0-1        362       0.12
HPUX           1-5        561       0.95
HPUX           5-10        93       4.04
HPUX           10-20       31      12.38
HPUX           SUM      1,047
TDP MSExchg    0-1          1       0.00
TDP MSExchg    1-5          2       0.09
TDP MSExchg    5-10         1       0.09
TDP MSExchg    10-20       18     140.59
TDP MSExchg    20-50      157     145.47
TDP MSExchg    >50         42     154.43
TDP MSExchg    SUM        221
TDP Oracle HP  0-1      2,865       0.05
TDP Oracle HP  1-5      7,407       0.32
TDP Oracle HP  5-10        36       2.08
TDP Oracle HP  10-20      113       3.89
TDP Oracle HP  20-50       13       0.77
TDP Oracle HP  SUM     10,434
TDP R3 HP      0-1      1,383       0.00
TDP R3 HP      1-5      2,063       0.04
TDP R3 HP      5-10       105       2.65
TDP R3 HP      10-20    1,797      17.52
TDP R3 HP      20-50    1,698      75.16
TDP R3 HP      SUM      7,046

Volume by Throughput
date            New LAN  New LANFREE      Old LAN  Old LANFREE        Total
2007-04-19    24,489.57     8,080.15    23,353.95     9,215.77    32,569.72
2007-04-20    21,738.70     8,654.95    19,845.86    10,547.78    30,393.65
2007-04-21    19,242.94     7,483.38    16,822.02     9,904.30    26,726.32
2007-04-22     5,196.03     1,659.71     4,995.45     1,860.28     6,855.74
2007-04-23    19,330.74     9,350.08    17,910.95    10,769.87    28,680.82
2007-04-24    21,497.01     9,850.10    19,054.26    12,292.85    31,347.11
2007-04-25    19,350.50     9,995.77    16,958.34    12,387.93    29,346.27
2007-04-26    22,045.69     9,484.05    19,787.00    11,742.74    31,529.75
2007-04-27    22,113.83     8,620.13    19,560.24    11,173.71    30,733.95
2007-04-28    21,695.53     7,680.89    19,231.59    10,144.82    29,376.42
2007-04-29     4,848.17     2,101.44     4,654.87     2,294.74     6,949.61
2007-04-30    17,271.56     8,296.67    15,834.17     9,734.06    25,568.23
2007-05-01    16,121.22    10,592.24    14,715.15    11,998.30    26,713.46
2007-05-02    20,472.16    10,299.11    17,789.90    12,981.37    30,771.27
2007-05-03    23,449.74     8,475.14    21,149.79    10,775.08    31,924.87
2007-05-04    20,517.47     9,051.46    18,225.08    11,343.86    29,568.94
2007-05-05    19,466.40     7,894.52    17,552.52     9,808.40    27,360.93
2007-05-06     5,168.56     1,704.52     5,043.75     1,829.34     6,873.09
2007-05-07    20,737.75     7,859.68    18,092.39    10,505.05    28,597.43
2007-05-08    22,198.89     7,892.01    18,551.39    11,539.50    30,090.89
Median        20,494.82     8,385.91    18,001.67    10,658.83    29,361.34
Max           24,489.57    10,592.24    23,353.95    12,981.37    32,569.72
Min            4,848.17     1,659.71     4,654.87     1,829.34     6,855.74

Solution strategy:
Equalize potential bottlenecks by introducing a layered architecture that accounts for all included components. Each layer is set up according to defined standards and can easily be expanded horizontally by adding components. Standardized use of the included components provides vertical efficiency, so that data can be processed according to its business value. With planning mechanisms in place, this highly scalable environment can be administered very effectively and efficiently.
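The sums in the "Activities by Volume" table and the Median/Max/Min rows of the "Volume by date" table can be re-checked with a few lines of Python. This is a sketch for verification only. The operation counts and the New LANFREE column are copied from the slides; the 50 GB boundary between "small" and "large" operations is an assumption inferred from the fact that the two unlabeled sums match it, since the slides do not state the threshold.

```python
from statistics import median

# Operation counts per platform and GB bucket ("Activities by Volume").
ACTIVITIES = {
    "HPUX": {"<1": 820, "1-10": 206, "10-50": 20, "50-100": 1},
    "TDP MSExchg": {"<1": 4, "1-10": 9, "10-50": 11, "50-100": 50, ">100": 147},
    "TDP Oracle HP": {"<1": 9392, "1-10": 1042},
    "TDP R3 HP": {"<1": 3519, "1-10": 2240, "10-50": 376, "50-100": 259, ">100": 652},
}

# Assumption: operations below 50 GB count as "small" (inferred, not stated).
SMALL_BUCKETS = {"<1", "1-10", "10-50"}


def split_small_large(activities):
    """Return (small, large) operation counts across all platforms."""
    small = large = 0
    for buckets in activities.values():
        for bucket, count in buckets.items():
            if bucket in SMALL_BUCKETS:
                small += count
            else:
                large += count
    return small, large


small, large = split_small_large(ACTIVITIES)
print(small, large, small + large)  # 17639 1109 18748

# "New LANFREE" column of the by-volume table, 2007-04-19 to 2007-05-08.
NEW_LANFREE = [
    8402.95, 9066.45, 8570.01, 1621.94, 9542.61, 10757.53, 11085.51,
    9790.04, 9917.61, 8830.13, 2083.47, 8702.27, 10719.76, 11644.60,
    9169.84, 9992.76, 8449.42, 1678.79, 9299.15, 10301.16,
]
print(median(NEW_LANFREE), max(NEW_LANFREE), min(NEW_LANFREE))
```

Under that assumption, about 94% of all operations (17,639 of 18,748) would move off LANfree, which is consistent with the "massive reduction of LANfree tape mounts" noted on the slide.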
Findings require infrastructure alignment to implement solution strategy
- TSM Software
  - Consolidate the environment by focussing on only two platforms
  - Adjust config to Best Practices
  - Define standard mechanisms for client backup procedures according to their SLAs
  - Establish a central configuration manager for TSM objects
- Disk Hardware
  - Reduce the complexity coming from LANfree backups and force Disk2Disk by enlarging disk pools by 10-15TB
  - Establish a shared disk environment with GPFS
  - Establish a standardized disk environment with DS8300 for Open Systems and Mainframe
- Tape Hardware
  - Establish a standardized tape environment with TS1120 for Open Systems and Mainframe, which also supports encryption
- SAN Hardware
  - New dual fabric SAN with >200 ports per fabric

What about TSM 6.1?
- DB2 included
  - DB2 instance is hidden
  - Many tables can be seen but not all can be accessed
  - Don't do number crunching directly on the TSM DB
  - Include DB2 parameters in the assessment
- TSM Reporting available
  - Offers a limited view on available data
  - Aggregates data
- Presented method is still valid for TSM 6.1
  - Minor adjustments might be needed

Disclaimer
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation. Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information could include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. The performance data contained herein was obtained in a controlled, isolated environment.
Actual results that may be obtained in other operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customer experiences described herein are based upon information and opinions provided by the customer. The same results may not be obtained by every user. Reference in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe IBM's intellectual property rights, may be used instead. It is the user's responsibility to evaluate and verify the operation on any non-IBM product, program or service. THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR INFRINGEMENT. IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM products discussed herein. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. 
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.