2. EqualLogic Load Balancers

Transcription

EqualLogic PS Series Load Balancers and Tiering, a Look Under the Covers
Keith Swindell
Dell Storage Product Planning Manager
Topics
• Guiding principles
• Network load balancing
• MPIO
• Capacity load balancing
  – Disk spreading (wide striping)
  – RAID optimizer
• Free space balancing
• Automatic Performance Load Balancer
• Tiered Array
– Shock absorber
• Summary
EqualLogic Load Balancers: overview and guiding principles
Observations
• As you increase the work on a device, its latency goes up
  – When you overload a device, the observed latency becomes unacceptable
• SANs typically run multiple workloads
  – The net effect is that the sum of the workloads is an ever-growing random I/O workload
• Workloads change
  – Sometimes we know about it
  – Many times we don't (we react to it)
• This is resolved by automatically spreading work across the available hardware proportionally
EqualLogic Load Balancers
• The NLB (Network Load Balancer) manages the assignment of individual iSCSI connections to Ethernet ports on the group members
• EqualLogic MPIO extends network load balancing end to end (host <-> array port)
• The CLB (Capacity Load Balancer) manages the utilization of the disk capacity in the pool
• Free Space Balancing manages dynamic space usage between arrays
• The APLB (Automatic Performance Load Balancer) manages the distribution of high-I/O data within the pool
• The Hybrid Array Load Balancer manages the distribution of high-I/O data within a hybrid array
Network Load Balancing
Network Load Balancer
Goal: Interface load is balanced
• Internal communication is balanced
• I/O is directed to the most used array
• Best network paths are preferred
Automatic Network Load Balancer Operation
• Each array port's statistics are analyzed every 6 minutes.
  – All traffic on the port is considered:
    › Host access
    › Replication
    › Internal communications (if more than 1 array)
• Are any of the ports overloaded?
  – Near maximum
  – Large difference in utilization vs. other ports
• If yes, what is the overloaded port and the most eligible connection on that port?
  – Is there a port that is less loaded we can move to?
  – If yes, move the connection to that [array] port*
    › This is done via iSCSI commands
    › It is transparent to the host
• Only one determination is made every 6 minutes so that its effect can be taken into account in the next analysis.
*In multi-member groups the selection will take into account that a server is doing more I/O to a particular member's data; the connection load balancer will move the connection to that member.
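To make the decision loop above concrete, here is a minimal Python sketch. The thresholds (OVERLOAD_UTILIZATION, IMBALANCE_DELTA), the Port structure, and the "most eligible connection" choice are illustrative assumptions, not firmware values; the sketch only models the 6-minute evaluation, the overload check, and the one-move-per-cycle rule described on this slide.

```python
from dataclasses import dataclass, field

ANALYSIS_INTERVAL_MIN = 6          # ports are re-evaluated every 6 minutes
OVERLOAD_UTILIZATION = 0.90        # "near maximum" threshold (assumed value)
IMBALANCE_DELTA = 0.30             # "large difference vs. other ports" (assumed value)

@dataclass
class Port:
    name: str
    utilization: float             # 0.0-1.0; includes host, replication, and internal traffic
    connections: list = field(default_factory=list)

def evaluate_ports(ports):
    """One evaluation cycle: make at most one connection move, then wait 6 minutes."""
    least = min(ports, key=lambda p: p.utilization)
    for port in sorted(ports, key=lambda p: p.utilization, reverse=True):
        overloaded = (port.utilization >= OVERLOAD_UTILIZATION or
                      port.utilization - least.utilization >= IMBALANCE_DELTA)
        if overloaded and port is not least and port.connections:
            conn = port.connections.pop(0)   # "most eligible" connection (simplified)
            least.connections.append(conn)   # transparent iSCSI redirect to the host
            return f"moved {conn} from {port.name} to {least.name}"
    return "no move this cycle"

# Example: two busy ports and one idle port on the group members
ports = [Port("eth0", 0.95, ["conn-a", "conn-b"]),
         Port("eth1", 0.60, ["conn-c"]),
         Port("eth2", 0.20, [])]
print(evaluate_ports(ports))       # -> moved conn-a from eth0 to eth2
```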
Enhanced MPIO effect on network connection load balancing
• The EqualLogic Host Integration Tools include support for Enhanced MPIO
  – Windows, VMware, and Linux
  – Enhanced MPIO automatically creates additional connections and sends I/Os directly to the member that contains the data requested by the initiator
  – Very helpful in a congested networking environment
• The network connection load balancer has built-in knowledge to communicate with the OS-based Enhanced MPIO processes
• The group and the Enhanced MPIO module agree on the optimal connection matrix for best performance
• Only interface balancing is needed when using Enhanced MPIO
Why use EqualLogic Multipathing?
• It is best practice to run MPIO on your servers
  – Improves reliability of storage access
  – Improves resilience to errors
  – Improves performance
• EqualLogic MPIO is easier to use than host/OS MPIO
  – Automatically manages iSCSI sessions
    › Creates iSCSI sessions based on the SAN configuration to ensure high availability and maximize I/O performance
    › Sessions are automatically raised and lowered based on operating needs
  – It comes with the product and does not cost extra
• It performs better than host/OS MPIO
  – Optimizes network performance and throughput
  – Provides end-to-end network balancing (rather than array-to-switch balancing)
  – Routes I/O directly to the member that will be servicing it
    › Reduces overall network traffic
Other Details – Automatic network load balancing
• iSCSI connections are "long lived"
  – Connections are created by servers, and commonly exist for days or weeks
• The operations of the network load balancer can be seen in the PS GUI or SANHQ
  › You see iSCSI connections spread across multiple arrays and array ports
  › You see event messages for server iSCSI logins that are not tied to reboots/restarts
  – In your host MPIO
    › Connections are spread across host ports and array IP addresses
    › Event messages show iSCSI connections being adjusted
• Some of my hosts have a disruption during network load balancing operations
  – This is not normal; call support
  – Typically a combination of
    › Network errors (switches not properly set up, cabling or configuration problems)
    › Server OS settings not properly configured (iSCSI settings or MPIO settings)
Automatic Capacity Balancer Operation
Fit volumes optimally across pool members
• Spread volume data among the member arrays, keeping in-use and free space percentages equivalent
Goals:
• Keep capacity balanced within a pool
• Honor volume RAID preference hints if possible
• In larger pools (3 or more members)
  – Tries to keep a volume on 3 members
    › Could spread to more if needed
  – Performs automatic R10 preference
Automatic Capacity Balancer Operation Details
• Data within a volume is first distributed across members via the CLB.
  – The CLB looks at the total amount of volume reserve space
    › This is compared with the free space on each member to determine the optimal members that will receive data
  – The collection of volume data that the CLB places on a member makes up a volume slice
  – By default, the CLB will strive to create no more than 3 slices per volume
    › This is not a hard limit, as a volume can have as many slices as there are members in a pool if required to hold the capacity of the volume
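A minimal sketch of the slice-placement idea described above, under stated assumptions: the function name, the member_free_gb layout, and the proportional split are illustrative, not the CLB's actual implementation. It only shows choosing up to 3 members by free space (spilling onto more only when needed) and splitting the volume reserve so in-use and free percentages stay roughly equivalent.

```python
def place_volume_slices(volume_reserve_gb, member_free_gb, max_slices=3):
    """member_free_gb: {member name: free capacity in GB}.
    Pick the members with the most free space (up to max_slices of them) and
    split the volume reserve in proportion to their free space."""
    ordered = sorted(member_free_gb, key=member_free_gb.get, reverse=True)
    chosen = ordered[:max_slices]
    # The 3-slice target is soft: use more members only if capacity requires it.
    while (sum(member_free_gb[m] for m in chosen) < volume_reserve_gb
           and len(chosen) < len(ordered)):
        chosen.append(ordered[len(chosen)])
    total_free = sum(member_free_gb[m] for m in chosen)
    return {m: round(volume_reserve_gb * member_free_gb[m] / total_free, 1)
            for m in chosen}

# Example: a 600 GB volume reserve placed across a 3-member pool
print(place_volume_slices(600, {"member01": 4000, "member02": 2000, "member03": 1000}))
# -> {'member01': 342.9, 'member02': 171.4, 'member03': 85.7}
```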
Automatic Capacity Balance
When does the capacity balancer run?
A member
• Is added to or removed from the group
• Is merged or moved to another pool
• Following correction of member free space trouble
A volume
• Is created, deleted, expanded, has a change in preferences, or is bound/unbound (bind / unbind in the CLI)
Timer
• If a balance operation has not run for 36 hours, a timer will start a balance evaluation. An actual balance operation may or may not execute, depending on whether or not the pool is already sufficiently balanced.
Automatic Capacity Balancer Operation
RAID Placement
• At volume creation (or later), an administrator can select a "RAID preference" for the volume.
  – If the desired RAID type is available in the pool, the CLB will attempt to honor the request
  – After re-calculating for RAID preference, the pool rebalance begins
• In larger groups (3 members or more) the CLB heuristically determines if RAID-10 would be advantageous for the volume
• The Automatic Performance Load Balancer in many cases renders automatic RAID placement obsolete
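The best-effort nature of the RAID preference hint can be illustrated with a short sketch. The function name and the data layout are hypothetical; it only captures "restrict placement to members of the preferred RAID type if any exist, otherwise fall back to the whole pool."

```python
def members_for_volume(members, raid_preference=None):
    """members: {member name: raid_type}, e.g. {"m1": "RAID 6", "m2": "RAID 10"}.
    Honor the RAID preference if it exists in the pool; otherwise ignore the hint."""
    if raid_preference:
        matching = {n for n, r in members.items() if r == raid_preference}
        if matching:
            return matching
    return set(members)

pool = {"member01": "RAID 6", "member02": "RAID 10", "member03": "RAID 50"}
print(members_for_volume(pool, "RAID 10"))   # -> {'member02'}
print(members_for_volume(pool, "RAID 5"))    # no match, hint ignored -> all members
```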
Free Space Balancing
• In PS groups there are many volume operations that consume and release free space dynamically
  – Volumes grow/shrink – thin provisioning (map/unmap)
  – Snapshot space grows/shrinks – create/delete; space grows as writes occur to volumes or other snaps
  – Replication – recovery points, freezing of data for transmission
• In multi-member pools, this can cause free space imbalances between members in the pool
  – Data typically changes more quickly on faster members (e.g. consuming snapshot space faster)
  – Free space balancing adjusts this in the background, shifting in-use and free pages between members
• When capacity gets low, the member enters the free space trouble state.
  – In worst-case scenarios, these imbalances can affect the dynamic operations
  – When in the free space trouble state, the load balancer works to more rapidly free space on the member that is running low by swapping in-use pages for free pages with other members.
  – In groups with more than 3 members, it may also change the slice layout amongst members
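A rough sketch of the free space balancing idea above, assuming an illustrative 10% free-space-trouble threshold (the actual firmware thresholds are not given on this slide). It only shows detecting a member whose free percentage is well below the pool average and planning page moves from members that still have space to spare.

```python
FREE_SPACE_TROUBLE_PCT = 0.10      # assumed threshold; actual firmware limits may differ

def plan_free_space_moves(members):
    """members: {name: {"capacity_gb": ..., "free_gb": ...}}.
    If a member is in free space trouble, plan to shift in-use pages off it
    onto members whose free percentage is above the pool average."""
    pool_free = sum(m["free_gb"] for m in members.values())
    pool_cap = sum(m["capacity_gb"] for m in members.values())
    target_pct = pool_free / pool_cap
    moves = []
    for name, m in members.items():
        pct = m["free_gb"] / m["capacity_gb"]
        if pct < FREE_SPACE_TROUBLE_PCT:
            deficit_gb = round((target_pct - pct) * m["capacity_gb"], 1)
            donors = [n for n, d in members.items()
                      if n != name and d["free_gb"] / d["capacity_gb"] > target_pct]
            moves.append({"member": name, "shift_gb": deficit_gb, "from": donors})
    return moves

print(plan_free_space_moves({
    "member01": {"capacity_gb": 10000, "free_gb": 4000},   # plenty of free space
    "member02": {"capacity_gb": 10000, "free_gb": 500},    # free space trouble
}))   # -> [{'member': 'member02', 'shift_gb': 1750.0, 'from': ['member01']}]
```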
Other Details – Capacity and Free Space Balancing
• Free Space Trouble: Member FST vs. Pool FST
  – Member FST occurs when a member of the group is low on space
  – Pool FST occurs when a pool is low on space
• Are there ways I can create more free space in a pool?
  – Yes
  – Unmap non-replicated volumes (V6 FW)
  – Use snapshot space borrowing (V6)
  – Adjust snapshot or replication reserves
  – Convert volumes from full to thin provisioned
  – Delete unneeded snapshots (or volumes)
  – Adjust schedule keep counts
  – Add more storage to the pool
• I'm seeing a lot of performance issues, yet workloads do not appear very heavy – what do I do?
  – Call support ("Houston, we have a problem")
  – It may be that free space balancing is occurring too frequently
    › If demand for dynamic page allocations is greater than the balancing rate, free space can be in short supply on a member
  – The common solution is to increase free space in the pool
• Very high capacity arrays mixed with lower capacity arrays in a pool
  – Example: a PS65x0 with 3TB drives alongside a 6100XV with 146GB disks
  – We adjust the proportions for capacity spread (the PS65x0 array capacity is discounted) to prevent the large-capacity system from carrying the bulk of the workload, as sketched below
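The capacity-discount idea in the last item can be illustrated as follows. The discount factor and function name are invented for illustration and are not documented firmware settings; the point is simply that discounting the very large member's capacity shifts the spread proportions toward the smaller members.

```python
def spread_proportions(member_capacity_gb, discount=None):
    """member_capacity_gb: {member name: usable capacity in GB}.
    discount: {member name: factor 0-1} applied to selected members' capacity
    before computing spread proportions (illustrative mechanism only)."""
    discount = discount or {}
    weighted = {n: c * discount.get(n, 1.0) for n, c in member_capacity_gb.items()}
    total = sum(weighted.values())
    return {n: round(w / total, 2) for n, w in weighted.items()}

# A large-capacity member with 3TB drives mixed with a small 146GB-drive member:
caps = {"ps65x0": 100_000, "ps6100xv": 2_000}
print(spread_proportions(caps))                           # -> {'ps65x0': 0.98, 'ps6100xv': 0.02}
print(spread_proportions(caps, discount={"ps65x0": 0.1})) # -> {'ps65x0': 0.83, 'ps6100xv': 0.17}
```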
Automatic Performance Load Balancing
Automatic Performance Load Balancer
• Automatically operates when there are multiple
members in a pool
• APLB is designed to minimize response time for all
pool members
– Arrays are optimized in pool by trading hot and
cold data
– Trigger: significant latency differences between members
• The APLB operates in near real time
– Data exchanges occur as frequently as every 2 minutes
– APLB can adjust the distribution of data and free space as
conditions change
How does each of the Load Balancers work?
Automatic Performance Load Balancer
• Data movement automation is based on the pool
• Data is moved through the SAN infrastructure
• Data resides on arrays based on:
  – Capacity balancing
  – Access frequency vs. array latency
[Diagram: a SAN with an EQL Group pool (Vol2) spanning Member01 and Member02, with member latencies shown before and after balancing]
Automatic Performance Load Balancer
Data Swapping Algorithm
• Evaluate member latencies over a 2-minute interval
• Track hot data with fine-grained statistics of volume heat by region
• If an out-of-balance condition is detected, select some hot data on the overloaded member and an equal amount of cold data from the same volume on a less loaded member
• Identify the cold member involved in the swap based on member latency and the headroom left on the member
  – First choice is assigned (reserved) unallocated data
  – Second choice is cold allocated in-use data
• Swap the data (up to 150MB), wait, re-evaluate the pool, and continue if needed
• Wait a minimum of 2 minutes, then re-evaluate
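A minimal sketch of one exchange of the data-swapping algorithm above. The latency-gap trigger value and the member dictionary layout are assumptions (the slide only says "significant latency differences"); the 150MB swap limit and the reserved-space-first preference follow the steps listed above.

```python
SWAP_LIMIT_MB = 150                # swap up to 150 MB per exchange
LATENCY_GAP_MS = 10                # "out of balance" trigger; assumed value

def aplb_exchange(members):
    """members: {name: {"latency_ms": ..., "hot_mb": ..., "reserved_free_mb": ..., "cold_mb": ...}}.
    Move up to 150 MB of hot data off the highest-latency member, landing it on
    reserved unallocated space if possible, otherwise trading it for cold data."""
    hot = max(members, key=lambda n: members[n]["latency_ms"])
    cold = min(members, key=lambda n: members[n]["latency_ms"])
    if members[hot]["latency_ms"] - members[cold]["latency_ms"] < LATENCY_GAP_MS:
        return "pool is balanced; wait at least 2 minutes and re-evaluate"
    amount = min(SWAP_LIMIT_MB, members[hot]["hot_mb"])
    if members[cold]["reserved_free_mb"] >= amount:     # first choice: reserved unallocated space
        members[cold]["reserved_free_mb"] -= amount
        members[hot]["reserved_free_mb"] += amount
    else:                                               # second choice: swap for cold in-use data
        members[cold]["cold_mb"] -= amount
        members[hot]["cold_mb"] += amount
    members[hot]["hot_mb"] -= amount
    members[cold]["hot_mb"] += amount
    return f"moved {amount} MB of hot data from {hot} to {cold}"

print(aplb_exchange({
    "member01": {"latency_ms": 40, "hot_mb": 900, "reserved_free_mb": 0,   "cold_mb": 500},
    "member02": {"latency_ms": 12, "hot_mb": 100, "reserved_free_mb": 800, "cold_mb": 900},
}))   # -> moved 150 MB of hot data from member01 to member02
```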
Automatic Performance Load Balancer
Different array types within Pool (Tiered Pool)
Observing Automatic Performance Load Balancing
[Chart: group performance results for Member A and Member B – latency comes down, average IOPS goes up, and queue depth is reduced]
Dell internal testing, May 2011 – Group setup: a single pool containing 3 arrays, PS6500 10k (fastest), PS6000XV 15k (second fastest), and PS6000E 7k (slowest)
Other Details – Automatic Performance Load Balancing
• The configuration is multiple members in a pool
  – Single-member pools can use hybrid arrays for similar results
• The algorithm works well with
  – Different disk capacities
  – Different disk speeds
  – Different RAID types (and RAID performance differences)
• Optimal tiering depends on adequate resources at the different tiers
  – SANHQ can help look into operating data
• If workloads do not optimize over time
  – It may be an issue of not enough resources at a specific tier
  – The workload may not be tiered
  – Technical specialists or support can assist with analyzing SANHQ data
Tiering within an Array
How to tier with a single-member pool (or group)?
• Common array models
  – Are comprised of a single disk speed
    › 7200, 10K, 15K, or SSD
  – Are configured as a single RAID type
    › R6, R10, R50
• There are models that have multiple disk types
  – 60x0XVS (15K + SSD)
  – 61x0XS (10K + SSD)
  – 65x0ES (7200 + SSD)
• These are sometimes called hybrid arrays
Hybrid Operation – Two unique technologies
• 1st balancing technology: fast shifting of hot and cold data within the array between HDD and SSD drives (sketched below)
  – Hot data is monitored, and can be quickly shifted from HDD to SSD
    › The balancer can start reacting within 10 seconds of a workload shift
• 2nd technology: write cache extension / accelerator (shock absorber)
  – A portion of the SSD is used as a controller write cache extension
  – Dramatically increases the size of the write cache
  – I/Os are later played back to the normal storage
• The total usable capacity of the storage array is the sum of HDD and SSD (after RAID & shock absorber)
  – Hybrids exclusively use a RAID type called "Accelerated R6"
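The first technology can be pictured as a simple heat-based promotion loop. The page structure and function name are hypothetical, and the write cache extension ("shock absorber") is not modelled; this only illustrates moving the hottest HDD-resident pages onto whatever SSD capacity is free.

```python
import heapq

REACTION_WINDOW_S = 10             # the balancer can start reacting within ~10 seconds

def promote_hot_pages(pages, ssd_free_pages):
    """pages: list of {"id": ..., "tier": "HDD" or "SSD", "heat": recent access count}.
    Keep the hottest pages on SSD by promoting the most frequently accessed
    HDD pages into the free SSD capacity."""
    hdd_pages = [p for p in pages if p["tier"] == "HDD"]
    for page in heapq.nlargest(ssd_free_pages, hdd_pages, key=lambda p: p["heat"]):
        page["tier"] = "SSD"       # promoted from HDD to SSD inside the array
    return pages

pages = [{"id": 1, "tier": "HDD", "heat": 900},
         {"id": 2, "tier": "HDD", "heat": 5},
         {"id": 3, "tier": "SSD", "heat": 300}]
print(promote_hot_pages(pages, ssd_free_pages=1))   # page 1 is promoted to SSD
```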
How does each of the Load Balancers work?
Hybrid Arrays – Auto-tiering within an array
• Volumes are initially placed on SSD until it is about 2/3 full, then spread across both SAS and SSD drives (see the sketch below)
• When data on a volume is frequently accessed, it turns hot
• If the hot data is on the HDD, the data will be moved to the SSD drives
[Diagram: SQL, Exchange, Oracle, and Archive volumes accessed over switched Gb Ethernet, stored in a pool on PS Series Array 1; pages are placed across SAS and SSD drives using RAID 6 Accelerated]
Automatic and transparent load balancing within the array
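The initial-placement rule in the first bullet can be reduced to a threshold check. The function name is illustrative; the roughly 2/3 fill target comes from the slide's wording, everything else is an assumption.

```python
SSD_INITIAL_FILL_TARGET = 2 / 3    # new volume data lands on SSD until it is about 2/3 full

def tier_for_new_data(ssd_used_gb, ssd_capacity_gb):
    """Initial placement only: write new volume data to SSD while it is under
    roughly 2/3 full, then spread new data across both SSD and SAS."""
    if ssd_used_gb / ssd_capacity_gb < SSD_INITIAL_FILL_TARGET:
        return "SSD"
    return "SSD+SAS"               # spread across both drive types from here on

print(tier_for_new_data(1000, 2000))   # -> SSD      (50% full)
print(tier_for_new_data(1500, 2000))   # -> SSD+SAS  (75% full)
```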
Hybrid SANHQ View – Hybrid Array SSDs vs. SAS Drives
[Chart: IOPS at workload saturation – 80% from SSD, 20% from SAS]
Source: Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays, Dell TR-PS002, March 2011
Hybrid Operation – Multi-tiered VDI workload
[Diagram: a server accessing a gold image volume and two desktop linked-clone volumes over switched Gb Ethernet on a PS61x0XS SSD/HDD hybrid; high-IOPS "hot" data and low-IOPS "warm" data are distributed across the SSD and HDD tiers]
Hybrid Results with SQL Server – Testing conclusions
• OLTP in production exhibits similar characteristics
  – A high percentage of I/O is directed at a small percentage of the total dataset
• EqualLogic hybrid arrays automatically tier OLTP workloads by intelligently and continuously moving the "hot" datasets to the SSD tier
• Performance continues to be optimized even as access patterns change
[Chart: normalized improvements in transactions – hybrid SSD array vs. SAS array]
Source: Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays, Dell TR-PS002, March 2011
Other Details – Hybrid Arrays
• How can I determine if my application needs SSD (or SSD tiering)?
  – SANHQ has a "Group I/O Load Space Distribution" display
• Do hybrid arrays work with the other load balancers (APLB, Network, etc.)?
  – Yes
  – Hybrid arrays can be mixed in a pool with non-hybrid arrays
• Can I have multiple hybrids in a pool?
  – Yes
• Can I bind volumes to hybrids?
  – Yes – choose a RAID preference of Accelerated R6
• Can I bind volumes to the SSDs in a hybrid?
  – Data placement within a hybrid is done by the load balancer only
• In the current generation hybrids (XS, ES), how many SSDs and how much SSD capacity?
  – 7 SSDs, ~2TB of SSD capacity per array
• Are there all-SSD arrays?
  – Yes, the 61x0S models
Summary
• The various load balancers that work in an EqualLogic PS Series pool provide flexible, dynamic operations that quickly adapt to shifting workload requirements
• For maximum automatic balancing, pool multiple arrays together
• Follow best practices
  – Keep firmware up to date
    › Consider V6 FW for the latest load balancing features
  – Array choice & setup
    › Choose RAID types for the required reliability
  – Use SANHQ
  – Host setup
    › MPIO and iSCSI settings
  – Network setup
    › Switch settings & configuration
Q&A
Thank You!
Notices & Disclaimers
These features are representative of feature areas under development. Nothing in this presentation constitutes a commitment that these features will be available in future products. Feature commitments must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.
THIS PRESENTATION REQUIRES A DELL NDA AND MAY NOT BE PROVIDED ELECTRONICALLY OR AS HARDCOPY TO CUSTOMERS OR PARTNERS.