HP CLUSTER MANAGEMENT UTILITY
Various scenarios managed by CMU
Site has customized or multiple images tuned for their applications
Site wants to deploy image quickly across many nodes
Site wants to leverage emerging technologies and new features not yet in mainstream distributions
Staff has HPC and Linux competency
Need for simple central GUI for monitoring and issuing commands
Need for real-time monitoring of node status and activity on the cluster and its subgroups
“Free software” tools don’t work across all platforms and
applications, and lack support
More expensive options may include features that are not required and come with a steep learning curve
HP CMU Overview
Easy, low-cost customizable utility
Scalable cluster CLI and GUI
One-click selection of groups of nodes
with menu-selectable operations
Configurable scalable monitoring
Remote management commands
Proven: over 50,000 licenses, including Top500 sites with thousands of nodes
Broad support for HP hardware platforms
Multiple Linux distributions
Including Hybrid support w/Windows
HP CMU Major Features
– Provisioning (GUI and CLI)
Create &amp; deploy a golden image on all the nodes (or groups of nodes)
• Scalable provisioning: 4000+ nodes
• Unassisted auto-install (kickstart, autoyast, debian preseed) support
• Diskless compute nodes support
– Management (GUI and CLI)
Day-to-day administration of the cluster from one central point.
• halt, re/boot or broadcast commands to a set of nodes
• cmudiff tool for identifying outliers in configuration or operation (see the sketch after this list)
– Monitoring (GUI and CLI)
View cluster activity in real time ‘at a glance’
• receive alerts when something unusual happens on a compute node or on a set of compute nodes
• dynamic resource group creation as jobs are submitted
• collectl support http://collectl.sourceforge.net/
– Lightweight: 1 RPM, easy to install
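As a rough illustration of the outlier-detection idea behind cmudiff (this is not CMU's implementation), the following Python sketch gathers one attribute from each node over ssh, here a BIOS version via dmidecode, and flags nodes whose value differs from the majority; the node names are assumptions:

import subprocess
from collections import Counter

# Hypothetical cmudiff-style outlier check: collect one value per node over
# ssh and report the nodes that disagree with the most common value.
def collect(nodes, command):
    results = {}
    for node in nodes:
        try:
            out = subprocess.run(
                ["ssh", "-o", "BatchMode=yes", node, command],
                capture_output=True, text=True, timeout=30, check=True,
            ).stdout.strip()
        except (subprocess.SubprocessError, OSError):
            out = "<unreachable>"
        results[node] = out
    return results

def outliers(results):
    majority, _ = Counter(results.values()).most_common(1)[0]
    return {n: v for n, v in results.items() if v != majority}

if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 17)]          # assumed node names
    bios = collect(nodes, "dmidecode -s bios-version")       # needs root on nodes
    for node, version in outliers(bios).items():
        print(f"outlier: {node} reports BIOS {version}")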
HP CMU configuration GUI
– HP CMU is configured and customized using the HP CMU GUI. Tasks include:
Manually adding, removing, or modifying nodes in the HP CMU database
Invoking the scan node procedure to automatically add several nodes
Adding, deleting, or customizing HP CMU groups
Managing the system images stored by HP CMU
Configuring actions performed when a node's status changes, such as displaying a warning, executing a command, or sending an email
Exporting the HP CMU node list to a simple text file for reuse by other applications
Importing nodes from a simple text file into the HP CMU database
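The exact format of CMU's export/import text file is not shown here; purely as an illustration, the Python sketch below assumes a hypothetical whitespace-separated file (hostname, MAC address, and IP address per line, in a file named cmu_nodes.txt) and reuses it to print an /etc/hosts fragment:

from dataclasses import dataclass

# Hypothetical node-list parser; the three-column "hostname mac ip" layout
# and the cmu_nodes.txt file name are assumptions, not CMU's documented format.
@dataclass
class Node:
    name: str
    mac: str
    ip: str

def load_nodes(path):
    nodes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                              # skip blanks and comments
            name, mac, ip = line.split()[:3]
            nodes.append(Node(name, mac, ip))
    return nodes

for n in load_nodes("cmu_nodes.txt"):
    print(f"{n.ip}\t{n.name}")                        # /etc/hosts style fragment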
HP CMU configuration
– 1. Start HP CMU on the management node.
– 2. Start the GUI client on the GUI workstation.
– 3. Scan the compute nodes.
– 4. Create the network entities.
– 5. Create the golden image. More than one golden image can be created.
– 6. Create the logical groups and user groups.
– 7. Backup each golden image in its logical group.
– 8. Clone the compute nodes.
– 9. Deploy the management agent on the compute nodes.
− Install monitoring rpm.
− Ping all nodes from the management node.
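A minimal sketch of the final verification step, pinging every compute node from the management node in parallel and listing the ones that do not answer; the node naming scheme and count are assumptions:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Ping all compute nodes and report the unreachable ones.
def ping(node):
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

nodes = [f"node{i:03d}" for i in range(1, 65)]         # assumed node names
with ThreadPoolExecutor(max_workers=32) as pool:
    status = dict(zip(nodes, pool.map(ping, nodes)))

down = [n for n, ok in status.items() if not ok]
print(f"{len(nodes) - len(down)}/{len(nodes)} nodes reachable")
if down:
    print("unreachable:", ", ".join(down))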
Compute node administration
– Booting and powering off using the management card of the compute nodes
– Broadcasting a command to selected compute nodes using a secure shell connection or a management card connection (see the sketch after this list)
– Direct node connection by clicking a node to open a secure shell
connection or a management card connection
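As a conceptual sketch of broadcasting over a secure shell connection (not CMU's own broadcast mechanism), the following Python snippet runs one command on a set of selected nodes in parallel and prints each node's result; the node names and the uptime command are placeholders:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Run one command on every selected node over ssh and print the results.
def run_on(node, command):
    p = subprocess.run(["ssh", "-o", "BatchMode=yes", node, command],
                       capture_output=True, text=True)
    return node, p.returncode, (p.stdout.strip() or p.stderr.strip())

selected = ["node001", "node002", "node003"]           # assumed node names
with ThreadPoolExecutor(max_workers=16) as pool:
    for node, rc, out in pool.map(lambda n: run_on(n, "uptime"), selected):
        print(f"[{node}] rc={rc} {out}")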
System disk replication
– Creating a new image. While backing up a compute node system disk,
you can dynamically choose which partitions to backup
– Replicating available images on any number of compute nodes in the cluster
– Managing as many different images as needed for different software
stacks, different operating systems, or different hardware
– Cloning from one to 4096 nodes at a time with a scalable, reliable algorithm that does not stop the entire cloning process if some nodes are broken (see the sketch after this list)
– Customizing reconfiguration scripts associated with each image to execute specific tasks on compute nodes after cloning
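CMU's actual cloning algorithm is not reproduced here; the sketch below only illustrates the fault-tolerance property named above: nodes are cloned in waves, a failure on one node is recorded, and the run continues for the remaining nodes. The clone_one helper is a stand-in for whatever mechanism actually transfers the image.

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def clone_one(node, image):
    """Stand-in for the real image transfer to one node; here it only sleeps."""
    time.sleep(0.1)                                    # placeholder for the transfer

def clone_all(nodes, image, wave_size=64):
    """Clone `image` to `nodes` in waves; record failures instead of aborting."""
    failed = []
    for start in range(0, len(nodes), wave_size):
        wave = nodes[start:start + wave_size]
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            futures = {pool.submit(clone_one, n, image): n for n in wave}
            for fut in as_completed(futures):
                node = futures[fut]
                try:
                    fut.result()
                except Exception as exc:               # a broken node does not stop the run
                    failed.append((node, str(exc)))
    return failed

bad = clone_all([f"node{i:03d}" for i in range(1, 257)], "golden.img")
for node, reason in bad:
    print(f"clone failed on {node}: {reason}")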
Compute node monitoring
– You can monitor up to 4096 nodes using a single window.
– HP CMU provides the connectivity status of each node as well as its sensor values.
– HP CMU provides a default set of sensors such as CPU load, memory
usage, I/O performance, and network performance.
– You can customize this list or create your own sensors. You can
display sensor values for any number of nodes.
– Information provided by HP CMU is used to ensure optimum
performance and for troubleshooting.
– You can set thresholds to trigger alerts. All information is transmitted across the network at regular intervals, using a scalable protocol for real-time monitoring.
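A minimal sketch of the threshold-and-alert idea described above, with made-up sensor names, limits, and sample readings; the alert action here just prints, but it could equally execute a command or send an email:

# Threshold-based alerting sketch; all sensor names, limits, and readings
# below are illustrative assumptions.
THRESHOLDS = {"cpu_load": 0.90, "mem_used": 0.95}      # fraction of capacity

def check(node, readings, alert):
    for sensor, value in readings.items():
        limit = THRESHOLDS.get(sensor)
        if limit is not None and value > limit:
            alert(node, sensor, value, limit)

def warn(node, sensor, value, limit):
    print(f"ALERT {node}: {sensor}={value:.2f} exceeds {limit:.2f}")

samples = {
    "node001": {"cpu_load": 0.42, "mem_used": 0.97},
    "node002": {"cpu_load": 0.99, "mem_used": 0.50},
}
for node, readings in samples.items():
    check(node, readings, warn)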
HP CMU GUI layout (screenshot legend):
A Frame containing the nodes in the cluster
B Tree structure of nodes
C All possible states of a node
D Drop-down menu to view nodes based on classification
• Network Entity
• Logical Group
• User Group
E Tool bar containing start, stop, and refresh buttons. The green LED appears
when monitoring polls the cluster nodes.
F Menu bar
G Title of the figures and tables displayed in the main frame
H The main frame displaying active monitoring and configuration information
Example of building a supercomputer with CMU
[Diagram: system manager environment with HPC open-source and development/job tools in CMU, automatic discovery and configuration of the HP Cluster Platform, and the HP Cluster Test Suite for cluster hardware diagnostics]
Provisioning a cluster stack with CMU
– Scan node automatically registers the nodes with their network information
– Install and configure the cluster stack on one “golden” compute node
Use CMU kickstart/autoyast tools to install a new OS
Install workload scheduler/MPI/LVP/HP-MPI/applications/etc. for their environment
Configure their existing user accounts, filesystems, etc.
– Backup the compute node image with CMU into a repository
Customer choice: disk image or diskless?
– Clone/Distribute this “golden image” to the rest of the cluster
While cloning, perform automatic firmware updates if needed
Perform post-cloning and node-customization tasks (see the sketch after this list)
– Provision multiple stacks on multiple sets of nodes if desired
– Use the image editor for minor image modifications without a full backup
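Post-cloning and node-customization tasks are site-specific; as an illustration only, the Python sketch below personalizes a freshly cloned node by writing its hostname and adding an /etc/hosts entry. The paths and example values are assumptions, and this is not CMU's actual reconfiguration-script interface:

from pathlib import Path

# Example post-cloning customization for one node: set the hostname and
# append a hosts entry. Intended to run on the freshly cloned node.
def personalize(name: str, ip: str, root: str = "/") -> None:
    root_path = Path(root)
    (root_path / "etc/hostname").write_text(name + "\n")
    hosts = root_path / "etc/hosts"
    entry = f"{ip}\t{name}\n"
    existing = hosts.read_text() if hosts.exists() else ""
    if entry not in existing:
        with hosts.open("a") as f:
            f.write(entry)

if __name__ == "__main__":
    personalize("node042", "10.0.0.42")                # example values only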
CMU Diskless Support
– Based on NFS-root design
Golden node is installed on disk, then copied to CMU mgt node
Root file-system is read only and shared among all compute nodes
Specific read-write directories are created for each node and mounted on each node
List of read-write directories and files is customizable (see the sketch below)
– Diskless cluster requirements
1 NFS server for each 256 nodes (CMU supports configuring multiple NFS servers)
4 GB of non-SATA storage on the NFS server for each compute node (about 1 TB per 256-node NFS server)
Additional NFS servers as part of HA solution for NFS
Network infrastructure optimized for diskless needs (1Gb/10GbE & no bottlenecks)
– CMU recommends disk-based clusters
Easier to manage; more cost-effective; no single points of failure
Security is the only advantage of diskless (all data resides on the NFS server)
Disk failures are a valid concern, but CMU cloning quickly restores the image on a replacement disk
Better diskless performance is a myth: the same tunings required for diskless can also be applied to disk-based clusters
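As a sketch of the per-node read-write directories mentioned under the NFS-root design above, the snippet below prepares a writable tree for each node on the NFS server; the export path, the directory list, and the node names are assumptions, not CMU's actual layout:

from pathlib import Path

# Prepare per-node writable directories for an NFS-root diskless setup.
WRITABLE_DIRS = ["etc", "var", "tmp", "root"]          # customizable per site
BASE = Path("/export/cmu_diskless/rw")                 # hypothetical export root

def prepare(node: str) -> Path:
    node_root = BASE / node
    for d in WRITABLE_DIRS:
        (node_root / d).mkdir(parents=True, exist_ok=True)
    return node_root

for i in range(1, 257):                                # one NFS server per 256 nodes
    prepare(f"node{i:03d}")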
CMU GUI Basics
[Screenshot: cluster view showing the status of each node along the bottom]
CMU GUI Basics
Right-click in the main area to select which sensors to display
Default sensors: CPU and memory usage, and disk and network I/O
Simple to add any sensor or alert
CMU provides simple support for monitoring GPU temperature and ECC errors on SL390s servers (see the sketch below)
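Custom sensors are site-defined; as an example of the kind of value a GPU-temperature sensor could report, the sketch below reads every GPU's temperature with nvidia-smi and prints the hottest one. It assumes nvidia-smi is installed on the node and does not show how the value is actually fed into CMU's monitoring:

import subprocess

# Report the hottest GPU temperature on this node via nvidia-smi.
def gpu_temperatures():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    temps = gpu_temperatures()
    print(max(temps) if temps else 0)                  # one numeric value per node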
HP CMU monitoring interface – large cluster view
CMU Remote Management Commands
Multi-window broadcast command (access OS or console)
[Screenshot: a command typed in the broadcast window appears on every selected node]
CMU Remote Management Commands with cmudiff example
[Screenshot: cmudiff compares a set of selected nodes and finds one node running with an old BIOS version]
Worldwide CMU Deployments
GOVERNMENT and RESEARCH LABS
HP Enabling HPC Innovation,
Affordability and Efficiency
Holistic energy mgmt portfolio
Modular and adaptable solutions
Worldwide expertise and support
• link to white papers, documentation
• CMU Forum on IT Resource Center
Outcomes that matter.