Document 6514274

Global Technology Services
How to health-check your TSM environment
Holger Speh
Consulting IT Specialist
Motivation
Why is my backup so slow?
How are my tape drives utilized?
How much data is transferred every day?
Where is my bottleneck?
It’s the network!?
Why is my restore so slow?
Why do I run out of scratch tapes all the time?
Healthcheck your environment!
What is a TSM Healthcheck?
A short engagement to evaluate an existing TSM solution.
Typically focused on optimisation potential:
TSM server health
Overall performance
Aligning the current configuration with best practices
Can also be focused on helping with strategic planning for future needs.
Must be clear and honest, even if the news is bad.
What is a TSM Healthcheck not?
It is not just an “overview dashboard lights” tool.
It is not a deployment.
It is not an excuse to sell or buy unneeded software.
It is not a long-term engagement.
It is not about fixing specific current problems.
It is not a way for the customer to keep an IBM engineer on the hook indefinitely for recommendations resulting from the assessment.
It does not include disaster recovery analysis.
Methodology overview
1. Prepare an assessment workshop
2. Perform assessment workshop
3. Perform analysis and generate diagrams and tables
4. Create report (doc/presentation)
5. Present analysis results and recommendations
Prepare yourself to understand the situation
Understand why this has been started
Understand the customer’s expectations
Understand your own expectations
Prepare a customer assessment workshop and send the customer an assessment worksheet
Perform a detailed client workshop
Set customer expectations of the assessment
Gather general information about processes, organisation, strategy, etc.
Understand your client’s requirements!
Gather as much data as possible about the customer’s environment:
TSM configuration
Network topology
SAN topology
Systems and software used
Disk layout
Tape infrastructure
Set up data gathering/monitoring mechanisms
Remember to monitor OS CPU, memory, disk and network
Monitor the same time period that you will analyse from within TSM
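Monitoring the OS over the same period as the TSM analysis becomes concrete with a small helper. The following Python sketch assumes the OS samples (for example CPU % parsed from vmstat or nmon output) are already available as (timestamp, value) pairs. The function name `hourly_os_profile` is hypothetical. The helper keeps only samples inside the TSM analysis window and averages them per hour, so OS load and TSM activity can be plotted on the same time axis.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_os_profile(samples, window_start, window_end):
    """Average an OS metric per hour, restricted to the TSM analysis window.

    samples: iterable of (timestamp, value) pairs; the input format is an
    assumption (e.g. CPU % parsed from vmstat/nmon output).
    Only samples with window_start <= timestamp < window_end are used.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        if window_start <= ts < window_end:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    # Sorted by hour so the result can be plotted directly.
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


if __name__ == "__main__":
    start = datetime(2007, 4, 19, 0, 0)
    end = start + timedelta(hours=2)
    demo = [
        (start + timedelta(minutes=10), 40.0),
        (start + timedelta(minutes=40), 60.0),
        (start + timedelta(minutes=70), 80.0),
        (end + timedelta(minutes=5), 99.0),  # outside the window, ignored
    ]
    for hour, avg in hourly_os_profile(demo, start, end).items():
        print(hour.strftime("%Y-%m-%d %H:00"), f"{avg:.1f}")
```

Averaging per hour is one choice among several. The same bucketing can return max values instead when short CPU or I/O peaks matter more than the mean.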
Know your tools and understand the data
This TSM Healthcheck is a highly technical tool!
It does not replace a general architectural review!
It does not provide all answers at hand!
You need to be able to interpret the data!
Use latest available reference literature!
Workflow and Tools used
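[Figure: analysis workflow. Data from TSM tables (Actlog, Archives, Backups, Contents, Spacemgfiles, …, Volumeusage) is retrieved through the TSM Administrative Commandline and processed with MySQL, Perl, Bash Shell Scripts and Gnuplot.]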
6-May-09
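The workflow above pulls data out of TSM with the administrative command line and analyses it in a separate database. The following Python sketch covers that extraction step. It calls `dsmadmc` with the `-dataonly=yes` and `-commadelimited` options to get machine-readable SELECT output, and loads the rows into SQLite as a stand-in for the MySQL database used in the original workflow. The helper names and the admin credentials are illustrative assumptions, not part of the original tooling. So is the assumption that fields containing commas come back double-quoted.

```python
import csv
import io
import sqlite3
import subprocess


def run_select(admin_id, password, query):
    """Run a TSM SELECT via dsmadmc and return the raw comma-delimited output.

    -dataonly=yes suppresses headers/banners, -commadelimited gives one
    comma-separated row per line. dsmadmc may return a non-zero code when a
    SELECT finds no rows, so the return code is deliberately not checked.
    """
    cmd = [
        "dsmadmc",
        f"-id={admin_id}",
        f"-password={password}",
        "-dataonly=yes",
        "-commadelimited",
        query,
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def load_rows(conn, table, columns, text):
    """Parse comma-delimited output and append it to a local analysis table.

    Lines whose field count does not match `columns` (blank lines, stray
    messages) are skipped. `table` and `columns` must be trusted identifiers.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if len(row) == len(columns)]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Offline demo with canned output instead of a live dsmadmc call.
    sample = 'ts1,ANR0406I,"Session 1 started, node A"\nsome stray message\n'
    conn = sqlite3.connect(":memory:")
    print(load_rows(conn, "actlog", ["date_time", "msgno", "message"], sample))
```

Keeping the number crunching in a separate database also avoids putting analytical load on the production TSM server.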
TSM performance measurement areas
Not all layers can be monitored through TSM
Client performance for LAN-free needs a special server parameter
OS monitoring of the TSM host is needed
The situation: consolidation of a very large TSM environment
The customer asked for a proposal to simplify and consolidate its TSM environment
The following points had to be respected:
LAN-free (150-250 clients), LAN-based (3,500-4,500 clients)
Central tape management (eRMM / RMM)
Dynamic load balancing to cope with heavy data growth
Simple and lightweight
Make optimal use of tape performance
Use TSM mirroring
Use available techniques
Don’t consider disaster recovery
Encryption
PnP (plug and play of new systems)
Migrate... finally
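Dynamic load balancing and plug-and-play of new systems both come down to deciding which TSM server a new or growing client should land on. The following Python sketch uses the greedy "largest client first, onto the least-loaded server" heuristic, often called LPT scheduling. This heuristic is an assumption of the sketch and is not taken from the original proposal. The client names, server names and GB figures in the usage example are hypothetical.

```python
import heapq


def assign_clients(clients, servers):
    """Greedy placement: biggest clients first, each onto the least-loaded server.

    clients: {client_name: expected daily GB}; servers: list of server names.
    Returns (placement {client: server}, loads {server: total GB}).
    """
    heap = [(0.0, name) for name in servers]  # (current load, server)
    heapq.heapify(heap)
    placement = {}
    for client, gb in sorted(clients.items(), key=lambda kv: kv[1], reverse=True):
        load, server = heapq.heappop(heap)  # least-loaded server (ties: by name)
        placement[client] = server
        heapq.heappush(heap, (load + gb, server))
    loads = {server: load for load, server in heap}
    return placement, loads


if __name__ == "__main__":
    demo_clients = {"nodeA": 50, "nodeB": 30, "nodeC": 20, "nodeD": 10}
    placement, loads = assign_clients(demo_clients, ["tsm1", "tsm2"])
    print(placement)
    print(loads)
```

A new system can be placed the same way by pushing it through the heap built from the current server loads. A real implementation would also weigh session counts and tape drive capacity, not only GB.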
Quick overview ordered by sections
[Matrix: each criterion below is rated per TSM server; the ratings themselves are not recoverable from the transcription.]
Servers: HP-UX (i0, i1), zOS (s1-s8), Windows (w1, w2), AIX (x1-x8, xa-xd)
Category: criteria
client activity: 24h traffic, daytime gaps, backup, archive, restores/retrieves
client volume: avg hourly nighttime volume, avg hourly daytime volume, daily file-level, daily TDP
client performance: avg MB/sec, max MB/sec
client sessions: avg parallel count, max parallel count
server activity: 24h traffic, nighttime gaps, migration, reclamation, expiration
server performance: DB backup, expiration, migration
server volume: migration
mount wait: migration
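Criteria such as "24h traffic" and "daytime gaps" can be derived from per-session or per-process records (start, end, bytes). One possible source is rows from the TSM SUMMARY table loaded into the analysis database. The following Python sketch spreads each record's bytes evenly over its duration into 24 hour-of-day bins and lists daytime hours without any traffic. The even spreading and the default 08:00-18:00 daytime window are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def hourly_profile(records):
    """Spread each record's bytes evenly over its duration into 24 hour-of-day bins.

    records: iterable of (start, end, nbytes) with datetime start/end.
    """
    bins = [0.0] * 24
    for start, end, nbytes in records:
        total = (end - start).total_seconds()
        if total <= 0:
            bins[start.hour] += nbytes  # zero-length record: book on start hour
            continue
        t = start
        while t < end:
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            seg_end = min(end, next_hour)
            bins[t.hour] += nbytes * (seg_end - t).total_seconds() / total
            t = seg_end
    return bins


def gap_hours(bins, first=8, last=18):
    """Hours in [first, last) without any traffic: candidate admin windows."""
    return [h for h in range(first, last) if bins[h] == 0]


if __name__ == "__main__":
    demo = [
        (datetime(2007, 4, 19, 22, 30), datetime(2007, 4, 20, 0, 30), 2000),
        (datetime(2007, 4, 19, 9, 0), datetime(2007, 4, 19, 9, 30), 100),
    ]
    profile = hourly_profile(demo)
    print("free daytime hours:", gap_hours(profile))
```

Running the same functions on server process records (migration, reclamation, expiration) instead of client sessions gives the "server activity" and "nighttime gaps" rows.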
Section: client activity
zOS and HP-UX environment
Activity around the clock
Nearly no free time windows for administrative activities during the day
Moderate to frequent restore activities
Windows and AIX environment
Backup operation only during the night
Adequate free time windows for
administrative activities during the day
No restore activities
tsmgroup  platform_name    count        gb
HPUX      HPUX               178      4.47
HPUX      TDP Oracle HP       41      3.86
HPUX      TDP R3 HP           52  3,659.61
MVS       AIX                 77     48.92
MVS       CE Archive       3,593     41.76
MVS       HPUX             3,293    114.53
MVS       IRIX                16      3.92
MVS       Linux390             3      0.00
MVS       Linux86             26    298.07
MVS       LinuxIA64            1      0.00
MVS       LinuxPPC             6      0.00
MVS       SUN SOLARIS         13      0.75
MVS       TDP Oracle HP      260    156.01
MVS       TDP Oracle SUN       5      1.58
MVS       TDP R3 HP           31  1,260.61
MVS       WinNT               81     20.18
Windows   WinNT               56     28.93
AIX       AIX                  4      0.00
AIX       TDP MSExchg         25    780.03
AIX       WinNT               60     24.28
SUM                        7,821  6,447.52
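The SUM row of the table is the total over all platform rows. The same data can also be rolled up per TSM server group. The following Python sketch uses the values transcribed from the table. The helper name `totals_by_group` is hypothetical.

```python
from collections import defaultdict

# (tsmgroup, platform_name, count, gb) rows as transcribed from the table.
ROWS = [
    ("HPUX", "HPUX", 178, 4.47),
    ("HPUX", "TDP Oracle HP", 41, 3.86),
    ("HPUX", "TDP R3 HP", 52, 3659.61),
    ("MVS", "AIX", 77, 48.92),
    ("MVS", "CE Archive", 3593, 41.76),
    ("MVS", "HPUX", 3293, 114.53),
    ("MVS", "IRIX", 16, 3.92),
    ("MVS", "Linux390", 3, 0.00),
    ("MVS", "Linux86", 26, 298.07),
    ("MVS", "LinuxIA64", 1, 0.00),
    ("MVS", "LinuxPPC", 6, 0.00),
    ("MVS", "SUN SOLARIS", 13, 0.75),
    ("MVS", "TDP Oracle HP", 260, 156.01),
    ("MVS", "TDP Oracle SUN", 5, 1.58),
    ("MVS", "TDP R3 HP", 31, 1260.61),
    ("MVS", "WinNT", 81, 20.18),
    ("Windows", "WinNT", 56, 28.93),
    ("AIX", "AIX", 4, 0.00),
    ("AIX", "TDP MSExchg", 25, 780.03),
    ("AIX", "WinNT", 60, 24.28),
]


def totals_by_group(rows):
    """Sum count and GB per tsmgroup; GB is rounded to two decimals."""
    counts = defaultdict(int)
    volume = defaultdict(float)
    for group, _platform, count, gb in rows:
        counts[group] += count
        volume[group] += gb
    return {group: (counts[group], round(volume[group], 2)) for group in counts}


if __name__ == "__main__":
    for group, (count, gb) in totals_by_group(ROWS).items():
        print(f"{group:<8}{count:>6}{gb:>10.2f}")
```

The group totals add up to the SUM row up to rounding (7,821 operations and about 6,447.5 GB).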
Section: client volume
zOS environment
Moderate to high data volume per hour during the night
Low data volume per hour during the day
Both file-level and TDP traffic
HP-UX environment
Moderate data volume per hour, during the night as well as during the day
Little file-level traffic, much TDP traffic
Windows environment
Low utilization per hour
Low data traffic
AIX environment
Newly set up environment with still low utilization
Day and night patterns clearly distinguishable
Section: client performance
All environments
Low throughput during file-level activities
Only TDP nodes show higher throughput
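The file-level versus TDP comparison comes down to MB/sec per session, as in the "avg mb/sec" and "max mb/sec" criteria. The following Python sketch assumes each session is available as (kind, bytes, seconds), where kind is "file" or "tdp". Both this labeling and the binary megabyte are assumptions. The kind could, for example, come from SUMMARY records joined with the node's platform.

```python
from collections import defaultdict

MB = 1024 * 1024  # binary megabyte; use 10**6 if reports use decimal MB


def throughput_stats(sessions):
    """Return {kind: (average MB/sec, maximum MB/sec)} over all sessions.

    The average is taken over per-session rates, so every session counts
    equally regardless of size. Sessions without a duration are skipped.
    """
    rates = defaultdict(list)
    for kind, nbytes, seconds in sessions:
        if seconds <= 0:
            continue
        rates[kind].append(nbytes / MB / seconds)
    return {kind: (sum(r) / len(r), max(r)) for kind, r in rates.items()}


if __name__ == "__main__":
    demo = [
        ("file", 10 * MB, 10),
        ("file", 30 * MB, 10),
        ("tdp", 600 * MB, 10),
    ]
    for kind, (avg, peak) in throughput_stats(demo).items():
        print(f"{kind}: avg {avg:.1f} MB/s, max {peak:.1f} MB/s")
```

Dividing total bytes by total time instead would weight large sessions more heavily. Which view is more useful depends on whether many small file-level sessions or a few big TDP runs dominate.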
Section: client sessions
zOS environment
Moderate to many parallel sessions
Note: every active session allocates memory and generates locks in the TSM DB
HP-UX environment
Few to moderate parallel sessions
Windows environment
Few parallel sessions
AIX environment
Very few parallel sessions on newly set up systems
Moderate utilization on all other systems
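Average and maximum parallel session counts can be computed with a sweep over session start and end events. The following Python sketch assumes sessions are given as (start, end) pairs in epoch seconds and clips them to the analysis window. The time-weighted average used here is one reasonable definition of "avg parallel count", not necessarily the one used in the original analysis.

```python
def parallel_sessions(intervals, window_start, window_end):
    """Time-weighted average and maximum number of concurrent sessions.

    intervals: iterable of (start, end) in epoch seconds. Sessions are clipped
    to [window_start, window_end); sessions fully outside are ignored.
    """
    events = []
    for start, end in intervals:
        s, e = max(start, window_start), min(end, window_end)
        if s < e:
            events.append((s, 1))
            events.append((e, -1))
    # At equal timestamps (t, -1) sorts before (t, 1): a session ending exactly
    # when another starts is not counted as overlapping.
    events.sort()
    active = peak = 0
    area = 0.0  # session-seconds accumulated so far
    prev = window_start
    for t, delta in events:
        area += active * (t - prev)
        prev = t
        active += delta
        peak = max(peak, active)
    return area / (window_end - window_start), peak


if __name__ == "__main__":
    demo = [(0, 10), (5, 15), (10, 20)]
    avg, peak = parallel_sessions(demo, 0, 20)
    print(f"avg {avg:.2f} sessions, max {peak}")
```

Since every active session costs memory and DB locks, the maximum is the figure to compare against server limits such as the configured session maximum.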
Section: server activity
zOS environment
Activity around the clock with no free time windows
Main activity is migration
Sometimes very long-running expirations
HP-UX environment
Different behaviour
adsmi0 only shows server activity during the day
adsmi1 shows activity around the clock with a lot of migration
Windows environment
Few nightly activities
Sometimes long expiration
AIX environment
Currently server activity only during the day
Exception: expiration, which is scheduled at short intervals
Section: server performance
zOS environment
Very good DB performance, except on s7
Migration only moderate
HP-UX environment
Very good DB performance
Migration only moderate to poor
Windows environment
Mixed behaviour
Moderate (w1) and very good (w2) performance
AIX environment
Mixed behaviour
Sometimes very good DB performance (x1-x4)
Section: server volume
Only migration shows significant volume
zOS environment
Low to very high migration volume
Frequent disk pool overflows
HP-UX environment
Low migration volume
Frequent disk pool overflows
Windows environment
Low migration volume
AIX environment
Nearly no migration volume
Section: mount wait
zOS environment
Low waiting times
HP-UX environment
Low to moderate waiting times
Windows environment
Low to moderate waiting times
High waiting times for client actions
AIX environment
Low waiting times
High waiting times for client actions
Potential Storage Pool Overflows
Daily migration volume compared to disk pool size
Assumption: only one migration per day
A disk pool is potentially too small if the daily migration volume exceeds the storage pool size
server   stgpool_name        overflows
adsmi0   BACKUP_DISK_LAN1            5
adsmi0   BACKUP_DISK_REDO1          10
adsmi0   BACKUP_DISK_REDO2          10
adsmi1   BACKUP_DISK_REDO1          25
adsmi1   BACKUP_DISK_REDO2          15
adsms1   ARCHIVE_DISK                9
adsms1   BACKUP_DISKJ               21
adsms2   ARCHIVE_DISK                1
adsms2   BACKUP_DISK                12
adsms2   BACKUP_DISKJ               18
adsms3   ARCHIVE_DISK                2
adsms3   BACKUP_DISK                17
adsms4   BACKUP_DISKJ               30
adsms5   ARCHIVE_DISK                7
adsms5   BACKUP_DISKJ               16
adsms6   ARCHIVE_DISK               14
adsms6   BACKUP_DISK                 5
adsms6   BACKUP_DISK2                4
adsms6   BACKUP_DISK3                3
adsms6   BACKUP_DISKJ                1
adsms6   BACKUP_DISKJ_GR             4
adsms7   ARCHIVE_DISK               31
adsms8   ARCHIVE_DISK                1
adsms8   BACKUP_DISKJ               22
adsmx7   BACKUP_3592_J1_FS           4
adsmx8   BACKUP_3592_J1_FS           4
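The overflow rule above is easy to automate once daily migration volumes per storage pool have been extracted. The rule assumes one migration per day and flags a pool when the daily migration volume exceeds its size. The following Python sketch counts overflow days per pool and reports the largest daily volume. Under the same one-migration-per-day assumption, that volume is the pool size needed to absorb one day of data. All server names, pool names and sizes in the example are hypothetical.

```python
def overflow_days(daily_migration_gb, pool_size_gb):
    """Number of days on which the migrated volume exceeded the pool size."""
    return sum(1 for gb in daily_migration_gb if gb > pool_size_gb)


def flag_pools(pools):
    """Return pools that overflowed at least once, most overflow days first.

    pools maps (server, stgpool) -> (pool size in GB, [daily migration GB]).
    Each result is (key, overflow days, largest daily volume in GB).
    """
    flagged = []
    for key, (size_gb, daily) in pools.items():
        days = overflow_days(daily, size_gb)
        if days:
            flagged.append((key, days, max(daily)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical servers, pools and volumes, for illustration only.
    demo_pools = {
        ("srvA", "BACKUPPOOL"): (100, [80, 120, 150, 90]),
        ("srvB", "ARCHIVEPOOL"): (500, [100, 200, 150]),
    }
    for (server, pool), days, peak in flag_pools(demo_pools):
        print(f"{server} {pool}: {days} overflow days, peak {peak} GB")
```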
LANfree considerations – by volume
161 LANfree Clients
platform_name      Total
HPUX                  55
TDP MSExchg           39
TDP Oracle HP          9
TDP R3 HP             58
Grand Total          161

74 LANfree Clients
platform_name      Total
HPUX                   1
TDP MSExchg           37
TDP R3 HP             36
Grand Total           74

Activities by Volume
Platform        GB        Count
HPUX            <1          820
HPUX            1-10        206
HPUX            10-50        20
HPUX            50-100        1
SUM                       1,047
TDP MSExchg     <1            4
TDP MSExchg     1-10          9
TDP MSExchg     10-50        11
TDP MSExchg     50-100       50
TDP MSExchg     >100        147
SUM                         221
TDP Oracle HP   <1        9,392
TDP Oracle HP   1-10      1,042
SUM                      10,434
TDP R3 HP       <1        3,519
TDP R3 HP       1-10      2,240
TDP R3 HP       10-50       376
TDP R3 HP       50-100      259
TDP R3 HP       >100        652
SUM                       7,046
Overall SUM              18,748
Small operations no longer LANfree: 17,639 operations (all bands below 50 GB) move to the LAN; 1,109 operations remain LAN-free

Volume by date (GB)
date             New LAN New LANFREE     Old LAN Old LANFREE       Total
2007-04-19     24,166.77    8,402.95   23,353.95    9,215.77   32,569.72
2007-04-20     21,327.20    9,066.45   19,845.86   10,547.78   30,393.65
2007-04-21     18,156.31    8,570.01   16,822.02    9,904.30   26,726.32
2007-04-22      5,233.80    1,621.94    4,995.45    1,860.28    6,855.74
2007-04-23     19,138.22    9,542.61   17,910.95   10,769.87   28,680.82
2007-04-24     20,589.58   10,757.53   19,054.26   12,292.85   31,347.11
2007-04-25     18,260.77   11,085.51   16,958.34   12,387.93   29,346.27
2007-04-26     21,739.70    9,790.04   19,787.00   11,742.74   31,529.75
2007-04-27     20,816.35    9,917.61   19,560.24   11,173.71   30,733.95
2007-04-28     20,546.28    8,830.13   19,231.59   10,144.82   29,376.42
2007-04-29      4,866.14    2,083.47    4,654.87    2,294.74    6,949.61
2007-04-30     16,865.96    8,702.27   15,834.17    9,734.06   25,568.23
2007-05-01     15,993.69   10,719.76   14,715.15   11,998.30   26,713.46
2007-05-02     19,126.66   11,644.60   17,789.90   12,981.37   30,771.27
2007-05-03     22,755.03    9,169.84   21,149.79   10,775.08   31,924.87
2007-05-04     19,576.18    9,992.76   18,225.08   11,343.86   29,568.94
2007-05-05     18,911.51    8,449.42   17,552.52    9,808.40   27,360.93
2007-05-06      5,194.30    1,678.79    5,043.75    1,829.34    6,873.09
2007-05-07     19,298.28    9,299.15   18,092.39   10,505.05   28,597.43
2007-05-08     19,789.73   10,301.16   18,551.39   11,539.50   30,090.89
Median         19,218.25    9,234.50   18,001.67   10,658.83   29,361.34
Max            24,166.77   11,644.60   23,353.95   12,981.37   32,569.72
Min             4,866.14    1,621.94    4,654.87    1,829.34    6,855.74

Massive reduction of LANfree tape mounts
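The by-volume comparison moves small operations from LAN-free to the LAN and compares old and new daily volumes. The following Python sketch performs that what-if calculation. It assumes each operation is available as (date, GB, was_lanfree). The threshold is a parameter. The operations in the example are hypothetical, and the 50 GB used there only illustrates one of the size bands. Total volume per day is unchanged by construction; only the split between LAN and LAN-free moves. The second helper reports median, maximum and minimum per column, as in the table.

```python
from collections import defaultdict
from statistics import median


def simulate_threshold(ops, threshold_gb):
    """What-if: LAN-free operations smaller than threshold_gb go over the LAN.

    ops: iterable of (date, gb, was_lanfree).
    Returns {date: (new_lan, new_lanfree, old_lan, old_lanfree)} in GB.
    """
    days = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for date, gb, was_lanfree in ops:
        day = days[date]
        # Old split: as observed.
        day[3 if was_lanfree else 2] += gb
        # New split: only large LAN-free operations stay LAN-free.
        day[1 if (was_lanfree and gb >= threshold_gb) else 0] += gb
    return {date: tuple(values) for date, values in sorted(days.items())}


def column_stats(table, column):
    """Median, maximum and minimum of one column across all days."""
    values = [row[column] for row in table.values()]
    return median(values), max(values), min(values)


if __name__ == "__main__":
    # Hypothetical operations, for illustration only.
    demo = [
        ("d1", 0.5, True),
        ("d1", 60.0, True),
        ("d1", 10.0, False),
        ("d2", 5.0, True),
    ]
    table = simulate_threshold(demo, threshold_gb=50)
    for date, (new_lan, new_lf, old_lan, old_lf) in table.items():
        print(date, new_lan, new_lf, old_lan, old_lf, new_lan + new_lf)
    print("new LAN median/max/min:", column_stats(table, 0))
```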
LANfree considerations – by performance
161 LANfree Clients
platform_name      Total
HPUX                  55
TDP MSExchg           39
TDP Oracle HP          9
TDP R3 HP             58
Grand Total          161

85 LANfree Clients
platform_name      Total
TDP MSExchg           39
TDP Oracle HP          1
TDP R3 HP             45
Grand Total           85

Activities by Thruput and GB
[Table: Count and Avg GB/Op per platform and throughput band (MB/sec: 0-1, 1-5, 5-10, 10-20, 20-50, >50); the per-band values cannot be reliably reassigned from the transcription. Operation counts per platform: HPUX 1,047; TDP MSExchg 221; TDP Oracle HP 10,434; TDP R3 HP 7,046.]

Volume by Thruput (GB)
date             New LAN New LANFREE     Old LAN Old LANFREE       Total
2007-04-19     24,489.57    8,080.15   23,353.95    9,215.77   32,569.72
2007-04-20     21,738.70    8,654.95   19,845.86   10,547.78   30,393.65
2007-04-21     19,242.94    7,483.38   16,822.02    9,904.30   26,726.32
2007-04-22      5,196.03    1,659.71    4,995.45    1,860.28    6,855.74
2007-04-23     19,330.74    9,350.08   17,910.95   10,769.87   28,680.82
2007-04-24     21,497.01    9,850.10   19,054.26   12,292.85   31,347.11
2007-04-25     19,350.50    9,995.77   16,958.34   12,387.93   29,346.27
2007-04-26     22,045.69    9,484.05   19,787.00   11,742.74   31,529.75
2007-04-27     22,113.83    8,620.13   19,560.24   11,173.71   30,733.95
2007-04-28     21,695.53    7,680.89   19,231.59   10,144.82   29,376.42
2007-04-29      4,848.17    2,101.44    4,654.87    2,294.74    6,949.61
2007-04-30     17,271.56    8,296.67   15,834.17    9,734.06   25,568.23
2007-05-01     16,121.22   10,592.24   14,715.15   11,998.30   26,713.46
2007-05-02     20,472.16   10,299.11   17,789.90   12,981.37   30,771.27
2007-05-03     23,449.74    8,475.14   21,149.79   10,775.08   31,924.87
2007-05-04     20,517.47    9,051.46   18,225.08   11,343.86   29,568.94
2007-05-05     19,466.40    7,894.52   17,552.52    9,808.40   27,360.93
2007-05-06      5,168.56    1,704.52    5,043.75    1,829.34    6,873.09
2007-05-07     20,737.75    7,859.68   18,092.39   10,505.05   28,597.43
2007-05-08     22,198.89    7,892.01   18,551.39   11,539.50   30,090.89
Median         20,494.82    8,385.91   18,001.67   10,658.83   29,361.34
Max            24,489.57   10,592.24   23,353.95   12,981.37   32,569.72
Min             4,848.17    1,659.71    4,654.87    1,829.34    6,855.74
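The "Activities by Thruput and GB" view groups operations by platform and throughput band and reports the count and average GB per operation. The following Python sketch performs that grouping. It assumes each operation is available as (platform, GB, MB/sec). It also assumes the bands are half-open intervals, so exactly 1 MB/sec falls into 1-5 and 50 MB/sec or more into >50. The example operations are hypothetical.

```python
from collections import defaultdict

# Throughput bands in MB/sec as half-open intervals [low, high).
BANDS = [(0, 1, "0-1"), (1, 5, "1-5"), (5, 10, "5-10"), (10, 20, "10-20"), (20, 50, "20-50")]


def band(mb_per_sec):
    """Map a throughput value to its band label; 50 MB/sec and above is '>50'."""
    if mb_per_sec < 0:
        raise ValueError("throughput cannot be negative")
    for low, high, label in BANDS:
        if low <= mb_per_sec < high:
            return label
    return ">50"


def band_table(ops):
    """ops: iterable of (platform, gb, mb_per_sec).

    Returns {(platform, band): (count, average GB per operation)}.
    """
    acc = defaultdict(lambda: [0, 0.0])
    for platform, gb, rate in ops:
        cell = acc[(platform, band(rate))]
        cell[0] += 1
        cell[1] += gb
    return {key: (n, total / n) for key, (n, total) in sorted(acc.items())}


if __name__ == "__main__":
    # Hypothetical operations, for illustration only.
    demo = [
        ("HPUX", 0.25, 0.5),
        ("HPUX", 0.75, 0.9),
        ("TDP R3 HP", 80.0, 30.0),
        ("TDP R3 HP", 100.0, 60.0),
    ]
    for (platform, label), (count, avg_gb) in band_table(demo).items():
        print(f"{platform:<14}{label:>6}{count:>6}{avg_gb:>9.2f}")
```

Clients whose operations sit mostly in the low bands gain little from LAN-free transport. Grouping by client instead of by platform would give one input for a performance-based LAN-free selection like the 85-client list.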
Solution strategy: equalize potential bottlenecks by introducing a layered architecture that accounts for all included components
Each layer is set up according to defined standards and can easily be expanded horizontally by adding further components.
Through standardized use of the included components, vertical efficiency is achieved: data can be processed according to its business value.
By introducing planning mechanisms, this highly scalable environment can be administered very effectively and efficiently.
Findings require infrastructure alignment to implement the solution strategy
TSM Software
Consolidate the environment by focusing on only two platforms
Adjust the configuration to best practices
Define standard mechanisms for client backup procedures according to their SLAs
Establish a central configuration manager for TSM objects
Disk Hardware
Reduce the complexity of LAN-free backups and force disk-to-disk (Disk2Disk) backups by enlarging disk pools by 10-15 TB
Establish a shared disk environment using GPFS
Establish a standardized disk environment with DS8300 for Open Systems and Mainframe
Tape Hardware
Establish a standardized tape environment with TS1120 for Open Systems and Mainframe, which also supports encryption
SAN Hardware
New dual-fabric SAN with more than 200 ports per fabric
What about TSM 6.1?
DB2 included
DB2 instance is hidden
Many tables can be seen, but not all can be accessed
Don’t do number crunching directly on the TSM DB
Include DB2 parameters in the assessment
TSM Reporting available
Offers a limited view on available data
Aggregates data
The presented method is still valid for TSM 6.1
Minor adjustments might be needed
Disclaimer
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This
information could include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the
product(s) and/or program(s) at any time without notice. Any statements regarding IBM's future direction and intent are subject to
change or withdrawal without notice, and represent goals and objectives only.
The performance data contained herein was obtained in a controlled, isolated environment. Actual results that may be obtained in other
operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no
guarantee that the same or similar results will be obtained elsewhere. Customer experiences described herein are based upon
information and opinions provided by the customer. The same results may not be obtained by every user.
Reference in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or
services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this
document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that
does not infringe IBM's intellectual property rights, may be used instead. It is the user's responsibility to evaluate and verify the
operation on any non-IBM product, program or service.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR
IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR
INFRINGEMENT. IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and
conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License
Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM
products discussed herein.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products
should be addressed to the suppliers of those products.