TSM Performance Tuning
Exploiting the full power of modern industry-standard Linux systems with TSM
Stephan Peinkofer
[email protected]
Agenda
‰ Network Performance
‰ Disk-Cache Performance
‰ Tape Performance
‰ Server Performance
‰ Lessons Learned
‰ Additional Resources
Network
The Problem with High-Speed Networks
‰ Current Ethernet technology can transfer up to 1.25 GB/s
‰ With default settings we cannot saturate even a single Gigabit link
Tuning Network Settings for Gigabit and Beyond
‰ Utilizing (Multi-)Gigabit links requires tuning of:
z TCP Window size
• How much can be sent/received before waiting for an ACK (see the rough sizing after this list)
z Maximum Transfer Unit
• How much can be sent/received per Ethernet frame
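As a rough sizing rule (bandwidth-delay product; a 1 ms LAN round-trip time is an assumption for this example): required window ≈ bandwidth × RTT. 10 Gbit/s × 1 ms ≈ 1.25 MB, and 1 Gbit/s × 1 ms ≈ 125 KB, already more than the ~87 KB Linux default shown on the next slide.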
TCP Window Size
$> cat /etc/sysctl.conf
…
net.ipv4.tcp_rmem = 4096 87389 4194304
net.ipv4.tcp_wmem = 4096 87389 4194304
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
‰ Sets a limit of 4MB for the receive and send window
‰ TSM option TCPWindowsize has to be set to 2 MB on server and client
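A minimal sketch of activating these limits (the value 2048 is given in KB and matches the 2 MB above; option file locations depend on the installation):
$> sysctl -p                                    # re-read /etc/sysctl.conf so the new limits take effect
$> sysctl net.core.rmem_max net.core.wmem_max   # verify the limits
The matching line in dsmserv.opt (server) and dsm.sys (client) would then be:
TCPWINDOWSIZE 2048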
Maximum Transfer Unit
$> ifconfig ethX mtu XXXX
$> cat /etc/sysctl.conf
…
net.ipv4.ip_no_pmtu_disc = 0
‰ Set MTU to max supported size
‰ Enable Path MTU Discovery for communication with hosts not using Jumbo Frames
‰ Only useful if every intermediate system supports Jumbo Frames
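A quick sanity check, assuming an MTU of 9000 on eth0 (9000 bytes minus 28 bytes of IP/ICMP headers leaves 8972 bytes of payload):
$> ifconfig eth0 mtu 9000        # interface and MTU value are assumptions for this example
$> ping -M do -s 8972 <server>   # -M do forbids fragmentation; succeeds only if the whole path passes Jumbo Frames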
Measuring the Success
‰ IPERF was used to benchmark the network performance
‰ http://dast.nlanr.net/Projects/Iperf
Measuring the Success
‰ Server
$> iperf -s -w 1M -f M
‰ Client
$> iperf -c <server> -t 20 -w 1M -f M
------------------------------------------------------------
Client connecting to <server>, TCP port 5001
TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[ 3] local <IP> port 36484 connected with <IP> port 5001
[ 3] 0.0-20.0 sec  10665 MBytes  533 MBytes/sec
Measuring the Success
‰ Influence of TCP Window size on a 10 Gbit Ethernet link
Some Thoughts on Bonding/Trunking
‰ Great for high availability
‰ Mostly not suitable for increasing performance
z Single client can utilize a single link only
z Multiple clients balance across available links only if:
• Clients and server are in the same subnet or
• Balancing algorithm uses IP addresses (unlikely)
‰ We have to keep in mind that:
z Switch is responsible for balancing incoming traffic
z Server is responsible for balancing outgoing traffic
Alternatives to Bonding
‰ Use next Ethernet generation
‰ Balance manually by using multiple IPs (see the sketch below)
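A hedged sketch of manual balancing, assuming the TSM server has one IP address per Gigabit link and client groups are pointed at different addresses in dsm.sys (stanza names and addresses are made up for this example):
* clients in group A use the first link
SErvername         tsm_linkA
   COMMMethod          TCPip
   TCPServeraddress    192.0.2.10
* clients in group B use the second link
SErvername         tsm_linkB
   COMMMethod          TCPip
   TCPServeraddress    192.0.2.11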
Disk Storage
Photo from Helmut Payer, gsiCom
Main Factors for Good Disk-Cache Performance
‰ Stripe-Size
‰ Locality of disk accesses
‰ IO-Subsystem of OS
‰ Number of FC-Links utilized in parallel
Stripe-Size
‰ Rule of thumb:
z Random IO => Small Stripe-Size
z Sequential IO => Large Stripe-Size
‰ TSM Disk-Cache is rather a sequential IO workload
z Use Stripe-Size of 512 KB or larger
‰ TSM Database is rather a random IO workload
z IBM recommends Stripe-Size of 256 KB
Locality of Disk Accesses
‰ How TSM uses Disk-Cache volumes cannot be influenced
‰ How the OS lays out the volumes can be influenced
Locality of Disk Accesses
‰ TSM can allocate multiple disk volumes in parallel
tsm> DEFINE VOLUME /stg/vol1.dsm FORMATSIZE=16G
ANR0984I PROCESS XX for DEFINE VOLUME started ...
...
tsm> DEFINE VOLUME /stg/vol4.dsm FORMATSIZE=16G
ANR0984I PROCESS XY for DEFINE VOLUME started ...
‰ How the volumes are placed on disk depends on the file system
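A hedged way to check the resulting layout of the volumes defined above (filefrag ships with e2fsprogs and also works on XFS; the path matches the DEFINE VOLUME example):
$> filefrag -v /stg/vol1.dsm     # lists the extents; a few large extents indicate good locality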
XFS
‰ Allocates disk-blocks when file system buffer is flushed
[Diagram: Write(1)–Write(4) go into the filesystem cache first; disk blocks are allocated only when the buffers are flushed to disk]
EXT3
‰ Allocates disk-blocks when data hits the file system buffer
[Diagram: Write(1)–Write(4) get disk blocks allocated as soon as they hit the filesystem cache, before the buffers are flushed to disk]
Comparing EXT3 and XFS
‰ XFS has no problems with parallel allocation of Disk-Volumes
‰ XFS has a slight weakness with re-write workloads
‰ On EXT3, volumes have to be defined one after another
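One hedged way to enforce sequential definition on EXT3: Wait=Yes should make the administrative client wait for each format to finish before the next command is issued (treat the parameter as an assumption to be checked against the TSM 5.3 reference):
tsm> DEFINE VOLUME /stg/vol1.dsm FORMATSIZE=16G WAIT=YES
tsm> DEFINE VOLUME /stg/vol2.dsm FORMATSIZE=16G WAIT=YES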
Linux IO-Subsystem
‰ Linux’s IO-Subsystem is rapidly evolving
‰ More and more knobs to turn
‰ More and more complex to tune
Linux IO-Subsystem
‰ Current observation:
z Write performance OK with default settings
z Read performance must be tuned by setting the read-ahead of the block device
$> blockdev --setra <sectors> <device>
(<sectors> is a count of 512-byte sectors, not bytes)
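For example (the device name and the 4 MB target are assumptions):
$> blockdev --getra /dev/sdb        # show the current read-ahead in 512-byte sectors
$> blockdev --setra 8192 /dev/sdb   # 8192 sectors x 512 bytes = 4 MB read-ahead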
IO Multipathing
‰ Typically more than one FC-Link is used for connecting servers to
storage for HA reasons
‰ Available FC-Links can be used in parallel to gain optimal
performance
‰ IO-Balancing algorithm depends on IO-Failover driver
‰ Configuration for exploiting performance benefit depends on
algorithm
IOMP with Qlogic Drivers
‰ Qlogic driver supports assignment of individual LUNs to a
specific FC-Link
z Performance per LUN is not increased
‰ Resulting configuration:
z Use at least 2 LUNs per TSM-Instance and stripe them with Software-RAID 0 (see the sketch after this list)
z Use multiple TSM-Instances per server and use dedicated
LUNs per instance
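A minimal sketch of the striping, assuming the two LUNs appear as /dev/sdb and /dev/sdc and using mdadm's --chunk in KB:
$> mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/sdb /dev/sdc
$> mkfs.xfs /dev/md0                # XFS, as discussed in the Disk-Cache section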
Measuring the Success
‰ IOZONE was used to benchmark disk performance
‰ http://www.iozone.org
Measuring the Success
‰ Write file sequential
$> iozone -s 10g -r 512k -t 1 -i0 -w
‰ Read file sequential
$> iozone -s 10g -r 512k -t 1 -i1 -w
-s 10g  : Amount to Write/Read is 10 GB
-r 512k : Size of Record to Write/Read is 512 KB
-t 1    : Write 1 File in parallel
-i 0|1  : Perform Write | Perform Read
-w      : Don't delete Files after benchmark
Comparison of Stripe-Size
‰ IBM FastT900 with 6 SATA-Disks in a RAID5 volume
‰ Workload: Single file sequential read/write
EXT3 Block Allocation
‰ IBM FastT900 with 6 SATA-Disks in a RAID5 volume
‰ Workload: 12 parallel sequential reads
Comparison of Read-Ahead
‰ STK FlexLine 380 with 7 FC-Disks in a RAID5 volume
Tape Storage
TSM Tape Performance
‰ No real influence on tape performance
‰ Barely seen 125 MB/s for more than a few seconds with
Titanium drives
‰ TSM v5.3 on Linux seems not to be ready for current
high-end tapes yet
‰ Assumption: Some buffers are too small
Photo from Sun Microsystems
Server
Photo from Helmut Payer, gsiCom
Main Factors of Server Performance
‰ PCI Bus throughput
‰ Memory Bandwidth
‰ Number of CPU-Cores
‰ Performance of a CPU-Core
PCI Bus Throughput
‰ Data travels 4 times over PCI Bus
z => PCI Bus is main bottleneck
‰ PCI-X barely achieves half of the theoretical throughput in
typical TSM workloads
‰ PCI-Express performs much better because of its switched
topology
‰ General Rule: Don't try to save money on the peripheral interconnect
Memory Bandwidth
‰ As long as DIRECT-IO is not used, data travels 4 times
through memory
‰ Database operations rely on memory performance too
Number of CPU-Cores
‰ TSM is a multi-threaded application
z The more CPU-Cores available, the more work can be done in parallel
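A trivial check of how many cores (including hyper-threaded siblings) Linux sees:
$> grep -c ^processor /proc/cpuinfo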
Lessons Learned
Tuning
‰ Network
z TCP Window-size: always
z MTU: if applicable
‰ Disk
z Read-ahead
z Define Cache-/DB-/Log-volumes sequentially
Criteria for next Servers
‰ Have fastest peripheral interconnect available
‰ Have 10 Gbit-Ethernet
‰ Have at least 4-Gbit FC-HBAs
‰ Have at least 4 CPU-Cores
‰ Have upper class CPU-Core performance
Additional Resources
‰ IBM Tivoli Storage Manager Performance Tuning Guide v5.3
‰ IBM DS4000 Best Practices and Performance Tuning Guide
Thank you for your Attention
‰ Any questions?
Contact: [email protected]