HP CLUSTER MANAGEMENT UTILITY
Various scenarios managed by CMU
Site has customized or multiple images tuned for their applications
Site wants to deploy image quickly across many nodes
Site wants to leverage emerging technologies and new features not yet in mainstream distributions
Staff has HPC and Linux competency
Need for simple central GUI for monitoring and issuing commands
Need for real-time monitoring of node status and activity on the cluster and its subgroups
“Free software” tools don’t work across all platforms and
applications, and lack support
More expensive options may include features that are not required and come with a steep learning curve
HP CMU Overview
Easy, low-cost customizable utility
Scalable cluster CLI and GUI
One-click selection of groups of nodes
with menu-selectable operations
Configurable scalable monitoring
Remote management commands
Proven: over 50,000 licenses, including Top500 sites with thousands of nodes
Broad support for HP hardware platforms
Multiple Linux distributions
Including Hybrid support w/Windows
HP CMU Major Features
– Provisioning (GUI and CLI)
Create &amp; deploy a golden image on all the nodes (or groups of nodes)
• Scalable provisioning: 4000+ nodes
• Unassisted auto-install (kickstart, autoyast, debian preseed) support
• Diskless compute nodes support
– Management (GUI and CLI)
Day-to-day administration of the cluster from one central point.
• halt, re/boot or broadcast commands to a set of nodes
• cmudiff tool for identifying outliers in configuration or operation (see the sketch after this list)
– Monitoring (GUI and CLI)
View cluster activity in real time ‘at a glance’
• receive alerts when something unusual happens on a compute node or on a set of compute nodes
• dynamic resource group creation as jobs are submitted
• collectl support http://collectl.sourceforge.net/
– Lightweight: 1 RPM, easy to install
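As a rough illustration of the outlier-detection idea behind cmudiff (this is not CMU's implementation), the following Python sketch gathers one attribute from each node over ssh, here a BIOS version via dmidecode, and flags nodes whose value differs from the majority; the node names are assumptions:

import subprocess
from collections import Counter

# Hypothetical cmudiff-style outlier check: collect one value per node over
# ssh and report the nodes that disagree with the most common value.
def collect(nodes, command):
    results = {}
    for node in nodes:
        try:
            out = subprocess.run(
                ["ssh", "-o", "BatchMode=yes", node, command],
                capture_output=True, text=True, timeout=30, check=True,
            ).stdout.strip()
        except (subprocess.SubprocessError, OSError):
            out = "<unreachable>"
        results[node] = out
    return results

def outliers(results):
    majority, _ = Counter(results.values()).most_common(1)[0]
    return {n: v for n, v in results.items() if v != majority}

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 17)]          # assumed node names
    bios = collect(nodes, "dmidecode -s bios-version")       # needs root on nodes
    for node, version in outliers(bios).items():
        print(f"outlier: {node} reports BIOS {version}")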
HP CMU configuration GUI
– HP CMU is configured and customized using the HP CMU GUI. Tasks include:
Manually adding, removing, or modifying nodes in the HP CMU database
Invoking the scan node procedure to automatically add several nodes
Adding, deleting, or customizing HP CMU groups
Managing the system images stored by HP CMU
Configuring actions performed when a node's status changes, such as displaying a warning, executing a command, or sending an email
Exporting the HP CMU node list to a simple text file for reuse by other applications
Importing nodes from a simple text file into the HP CMU database
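The exact format of CMU's export/import text file is not shown here; purely as an illustration, the Python sketch below assumes a hypothetical whitespace-separated file (hostname, MAC address, and IP address per line, in a file named cmu_nodes.txt) and reuses it to print an /etc/hosts fragment:

from dataclasses import dataclass

# Hypothetical node-list parser; the three-column "hostname mac ip" layout
# and the cmu_nodes.txt file name are assumptions, not CMU's documented format.
@dataclass
class Node:
    name: str
    mac: str
    ip: str

def load_nodes(path):
    nodes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                              # skip blanks and comments
            name, mac, ip = line.split()[:3]
            nodes.append(Node(name, mac, ip))
    return nodes

for n in load_nodes("cmu_nodes.txt"):
    print(f"{n.ip}\t{n.name}")                        # /etc/hosts style fragment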
HP CMU configuration
– 1. Start HP CMU on the management node.
– 2. Start the GUI client on the GUI workstation.
– 3. Scan the compute nodes.
– 4. Create the network entities.
– 5. Create the golden image. More than one golden image can be created.
– 6. Create the logical groups and user groups.
– 7. Backup each golden image in its logical group.
– 8. Clone the compute nodes.
– 9. Deploy the management agent on the compute nodes.
− Install monitoring rpm.
− Ping all nodes from the management node.
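A minimal sketch of the final verification step, pinging every compute node from the management node in parallel and listing the ones that do not answer; the node naming scheme and count are assumptions:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Ping all compute nodes and report the unreachable ones.
def ping(node):
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

nodes = [f"node{i:03d}" for i in range(1, 65)]         # assumed node names
with ThreadPoolExecutor(max_workers=32) as pool:
    status = dict(zip(nodes, pool.map(ping, nodes)))

down = [n for n, ok in status.items() if not ok]
print(f"{len(nodes) - len(down)}/{len(nodes)} nodes reachable")
if down:
    print("unreachable:", ", ".join(down))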
Compute node administration
– Booting and powering off using the management card of the compute nodes
– Broadcasting a command to selected compute nodes using a secure shell connection or a management card connection (see the sketch after this list)
– Direct node connection by clicking a node to open a secure shell
connection or a management card connection
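As a conceptual sketch of broadcasting over a secure shell connection (not CMU's own broadcast mechanism), the following Python snippet runs one command on a set of selected nodes in parallel and prints each node's result; the node names and the uptime command are placeholders:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Run one command on every selected node over ssh and print the results.
def run_on(node, command):
    p = subprocess.run(["ssh", "-o", "BatchMode=yes", node, command],
                       capture_output=True, text=True)
    return node, p.returncode, (p.stdout.strip() or p.stderr.strip())

selected = ["node001", "node002", "node003"]           # assumed node names
with ThreadPoolExecutor(max_workers=16) as pool:
    for node, rc, out in pool.map(lambda n: run_on(n, "uptime"), selected):
        print(f"[{node}] rc={rc} {out}")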
System disk replication
– Creating a new image. While backing up a compute node system disk,
you can dynamically choose which partitions to backup
– Replicating available images on any number of compute nodes in the cluster
– Managing as many different images as needed for different software
stacks, different operating systems, or different hardware
– Cloning from one to 4096 nodes at a time with a scalable, reliable algorithm that does not stop the entire cloning process if some nodes are broken (see the sketch after this list)
– Customizing reconfiguration scripts associated with each image to execute specific tasks on compute nodes after cloning
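CMU's actual cloning algorithm is not reproduced here; the sketch below only illustrates the fault-tolerance property named above: nodes are cloned in waves, a failure on one node is recorded, and the run continues for the remaining nodes. The clone_one helper is a stand-in for whatever mechanism actually transfers the image.

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def clone_one(node, image):
    """Stand-in for the real image transfer to one node; here it only sleeps."""
    time.sleep(0.1)                                    # placeholder for the transfer

def clone_all(nodes, image, wave_size=64):
    """Clone `image` to `nodes` in waves; record failures instead of aborting."""
    failed = []
    for start in range(0, len(nodes), wave_size):
        wave = nodes[start:start + wave_size]
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            futures = {pool.submit(clone_one, n, image): n for n in wave}
            for fut in as_completed(futures):
                node = futures[fut]
                try:
                    fut.result()
                except Exception as exc:               # a broken node does not stop the run
                    failed.append((node, str(exc)))
    return failed

bad = clone_all([f"node{i:03d}" for i in range(1, 257)], "golden.img")
for node, reason in bad:
    print(f"clone failed on {node}: {reason}")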
Compute node monitoring
– You can monitor up to 4096 nodes using a single window.
– HP CMU provides the connectivity status of each node as well as its sensor values.
– HP CMU provides a default set of sensors such as CPU load, memory
usage, I/O performance, and network performance.
– You can customize this list or create your own sensors. You can
display sensor values for any number of nodes.
– Information provided by HP CMU is used to ensure optimum
performance and for troubleshooting.
– You can set thresholds to trigger alerts. All information is transmitted across the network at regular intervals, using a scalable protocol for real-time monitoring.
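A minimal sketch of the threshold-and-alert idea described above, with made-up sensor names, limits, and sample readings; the alert action here just prints, but it could equally execute a command or send an email:

# Threshold-based alerting sketch; all sensor names, limits, and readings
# below are illustrative assumptions.
THRESHOLDS = {"cpu_load": 0.90, "mem_used": 0.95}      # fraction of capacity

def check(node, readings, alert):
    for sensor, value in readings.items():
        limit = THRESHOLDS.get(sensor)
        if limit is not None and value > limit:
            alert(node, sensor, value, limit)

def warn(node, sensor, value, limit):
    print(f"ALERT {node}: {sensor}={value:.2f} exceeds {limit:.2f}")

samples = {
    "node001": {"cpu_load": 0.42, "mem_used": 0.97},
    "node002": {"cpu_load": 0.99, "mem_used": 0.50},
}
for node, readings in samples.items():
    check(node, readings, warn)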
HP CMU GUI layout (screenshot legend):
A Frame containing the nodes in the cluster
B Tree structure of nodes
C All possible states of a node
D Drop-down menu to view nodes based on classification
• Network Entity
• Logical Group
• User Group
E Tool bar containing start, stop, and refresh buttons. The green LED appears
when monitoring polls the cluster nodes.
F Menu bar
G Title of the figures and tables displayed in the main frame
H The main frame displaying active monitoring and configuration information
Example of building a supercomputer with CMU
[Diagram: system manager environment with HPC open-source and development/job tools in CMU, automatic discovery and configuration of the HP Cluster Platform, and the HP Cluster Test Suite for cluster hardware diagnostics]
Provisioning a cluster stack with CMU
– Scan node automatically registers the nodes with their network information
– Install and configure the cluster stack on one “golden” compute node
Use CMU kickstart/autoyast tools to install a new OS
Install workload scheduler/MPI/LVP/HP-MPI/applications/etc. for their environment
Configure their existing user accounts, filesystems, etc.
– Backup the compute node image with CMU into a repository
Customer choice: disk image or diskless?
– Clone/Distribute this “golden image” to the rest of the cluster
While cloning, perform automatic firmware updates if needed
Perform post-cloning and node-customization tasks (see the sketch after this list)
– Provision multiple stacks on multiple sets of nodes if desired
– Use the image editor for minor image modifications without a full backup
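Post-cloning and node-customization tasks are site-specific; as an illustration only, the Python sketch below personalizes a freshly cloned node by writing its hostname and adding an /etc/hosts entry. The paths and example values are assumptions, and this is not CMU's actual reconfiguration-script interface:

from pathlib import Path

# Example post-cloning customization for one node: set the hostname and
# append a hosts entry. Intended to run on the freshly cloned node.
def personalize(name: str, ip: str, root: str = "/") -> None:
    root_path = Path(root)
    (root_path / "etc/hostname").write_text(name + "\n")
    hosts = root_path / "etc/hosts"
    entry = f"{ip}\t{name}\n"
    existing = hosts.read_text() if hosts.exists() else ""
    if entry not in existing:
        with hosts.open("a") as f:
            f.write(entry)

if __name__ == "__main__":
    personalize("node042", "10.0.0.42")                # example values only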
CMU Diskless Support
– Based on NFS-root design
Golden node is installed on disk, then copied to CMU mgt node
Root file-system is read only and shared among all compute nodes
Specific read-write directories are created for each node and mounted on each node
List of read-write directories and files is customizable (see the sketch below)
– Diskless cluster requirements
1 NFS server for each 256 nodes (CMU supports configuring multiple NFS servers)
4 GB of non-SATA storage on the NFS server for each compute node (about 1 TB per 256-node NFS server)
Additional NFS servers as part of HA solution for NFS
Network infrastructure optimized for diskless needs (1Gb/10GbE & no bottlenecks)
– CMU recommends disk-based clusters
Easier to manage; more cost-effective; no single points of failure
Security is the only advantage of diskless (all data resides on the NFS server)
Disk failures are a valid concern, but CMU cloning quickly restores the image on a replacement disk
Better diskless performance is a myth: the same tunings required for diskless can also be applied to disk-based clusters
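As a sketch of the per-node read-write directories mentioned under the NFS-root design above, the snippet below prepares a writable tree for each node on the NFS server; the export path, the directory list, and the node names are assumptions, not CMU's actual layout:

from pathlib import Path

# Prepare per-node writable directories for an NFS-root diskless setup.
WRITABLE_DIRS = ["etc", "var", "tmp", "root"]          # customizable per site
BASE = Path("/export/cmu_diskless/rw")                 # hypothetical export root

def prepare(node: str) -> Path:
    node_root = BASE / node
    for d in WRITABLE_DIRS:
        (node_root / d).mkdir(parents=True, exist_ok=True)
    return node_root

for i in range(1, 257):                                # one NFS server per 256 nodes
    prepare(f"node{i:03d}")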
CMU GUI Basics
[Screenshot: cluster view showing the status of each node along the bottom]
CMU GUI Basics
Right-click in the main area to select which sensors to display
Default sensors: CPU and memory usage, and disk and network I/O
Simple to add any sensor or alert
CMU provides simple support for monitoring GPU temperature and ECC errors on SL390s servers (see the sketch below)
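Custom sensors are site-defined; as an example of the kind of value a GPU-temperature sensor could report, the sketch below reads every GPU's temperature with nvidia-smi and prints the hottest one. It assumes nvidia-smi is installed on the node and does not show how the value is actually fed into CMU's monitoring:

import subprocess

# Report the hottest GPU temperature on this node via nvidia-smi.
def gpu_temperatures():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    temps = gpu_temperatures()
    print(max(temps) if temps else 0)                  # one numeric value per node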
HP CMU monitoring interface – large cluster view
CMU Remote Management Commands
Multi-window broadcast command (access OS or console)
[Screenshot: a command typed in the broadcast window appears on every selected node]
CMU Remote Management Commands with cmudiff example
[Screenshot: cmudiff compares a set of selected nodes and finds one node running with an old BIOS version]
Worldwide CMU Deployments
GOVERNMENT and RESEARCH LABS
HP Enabling HPC Innovation,
Affordability and Efficiency
Holistic energy mgmt portfolio
Modular and adaptable solutions
Worldwide expertise and support
• link to white papers, documentation
• CMU Forum on IT Resource Center
Outcomes that matter.