Building an HP Insight CMU Linux Cluster

Transcription

WW Hyperscale R&D Division, ISS
June 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Agenda
• Insight CMU Brief Overview
• Installing Insight CMU
• Adding cluster nodes to Insight CMU
• Installing a Linux OS
  • Autoinstalling a node
  • Backing up and cloning an image
  • Creating and deploying a diskless image
• Configure Insight CMU Monitoring
  • Monitoring options: collectl & iLO4
If you are clustering Linux servers, then you are
building a supercomputer. Make sure it acts like one.
HP Insight Cluster Management Utility (CMU)
Hyperscale cluster lifecycle management software
Provision
• Simplified discovery, firmware audits
• Industry-standard bare-metal installation
• Fast and scalable cloning
• Diskless support
Monitor
• 'At a glance' view of entire system; zoom to component
• Customizable
• Lightweight and efficient
• Instant view (2D)
• Time view (3D)
Control
• GUI and CLI options
• Easy, friction-less control of remote servers
• Scalable auditing and management
• 10+ years in deployment, including Top500 sites with 1000s of nodes
• Built for and designed around Linux; scales to 4,000 nodes per cluster
• HP supported, available as factory-integrated cluster option
Typical Cluster Topology
CMU
Site Network
Console
Network
Mgmt Network
Head node
Moonshot Network Topology
CMU Management Network
Moonshot Chassis
CMU Head Node
Moonshot Chassis
Moonshot Chassis
Site Network
Moonshot Chassis
iLO CM Network
2nd Network
Installing Insight CMU
Cluster head node
• Install standard Linux distribution
• Copy the Linux Distribution ISO to the head node as a local repository
Cluster head node
• Configure static site network, cluster management network, and iLO network
  • For this cluster, the iLO network (172.22.x.x) and the management network (172.20.x.x) are the same
Site network
Management/iLO
network
Install Oracle java on the cluster head node
• For CMU v7.1, this must be java version 1.6 update 26 or newer
Install Insight CMU
• Copy the Insight CMU RPM and Insight CMU license to the cluster head node
• Use 'yum' to install Insight CMU to resolve all dependencies
  • Manually install '/usr/lib64/libXdmcp.so.6' first to resolve a missing dependency in the Insight CMU v7.1 RPM
• Install the Insight CMU license file in /opt/cmu/etc/cmu.lic
Configure Insight CMU
• Run ‘/opt/cmu/bin/cmu_mgt_config –c’ to configure the cluster head
node for use by Insight CMU
• This tool performs various checks and configures the cluster head node as a PXE-boot server on the management network.
• PXE-boot support is required to remotely install a Linux OS onto a set of compute
nodes.
• If there are any errors, please correct them and rerun
‘/opt/cmu/bin/cmu_mgt_config –c’.
• If you don’t plan on using CMU to install an OS, you can skip this step.
• Make sure that password-less ssh for root works between all nodes!
Configure Insight CMU
Start Insight CMU
• Before starting Insight CMU, run 'unset_audit' to de-activate 'audit' mode
  • 'audit' mode is used when Insight CMU is installed in a High-Availability configuration
Launch the Insight CMU webpage and GUI
• Make sure your desktop/laptop has a recent version of Oracle java installed
• Enter the head node IP address in your browser
Click to launch the Insight CMU GUI
Click 'yes' to accept installing the OpenGL library support needed by Insight CMU Timeview
Insight CMU GUI
Cluster resources organized in groups
Main Monitoring Display
Cluster Administrators need an X-Server
• Anyone can launch the Insight CMU GUI for monitoring.
  • Administrative tasks are disabled by default.
• For Administrators, the Insight CMU v7.1 GUI executes administrative tasks within an xterm. This xterm is launched from the cluster head node and displays on your desktop/laptop.
• Administrators need an X-server running on their desktop/laptop that can accept and display these Insight CMU xterm windows.
• Desktops/laptops running a Linux or Apple OS come with X-server software.
  • Configure the X-server to accept X-display requests from the cluster head node.
• Desktops/laptops running Windows need to install an X-server.
  • Insight CMU recommends Xming: a free, lightweight X-server available on the web.
  • Cygwin, ReflectionX, and other X-server software work fine too.
Cluster Administrators need an X-Server
Snapshot of Xming Xlaunch screen #3
Disable Access Control to allow the cluster head node to display xterms on the local desktop/laptop when requested by the GUI
Logging into the Insight CMU GUI
Administrators log in here with cluster head node 'root' user credentials
Add cluster nodes to Insight CMU
Cluster Node Preparation
Each compute node should be pre-configured with:
1. Consecutive static iLO IP addresses
   • The iLO NIC is set to DHCP by default.
   • Insight CMU requires iLO static IP addressing to avoid DHCP and to assign node order (see 'scanning nodes').
2. Common iLO username and password
3. Boot order: network PXE-boot before hard disk boot
   • Required for remotely installing an OS on the nodes.
4. Virtual Serial Port attached to COM1
   • The Embedded Serial Port is assigned to COM1 by default (for connecting a display to the server).
   • Insight CMU recommends assigning the Virtual Serial Port to COM1 so that Linux kernel console activity can be viewed remotely via the iLO.
   • If the Virtual Serial Port is assigned to COM2, then add 'console=ttyS1' to the Linux kernel arguments to view console activity remotely via iLO ('console=ttyS0' is the default Linux kernel argument setting).
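For the COM2 case, the kernel-argument change amounts to one edit on the boot loader's kernel line. A sketch for GRUB legacy (kernel version, root device, and baud rate are example values, not from the original slides):

```shell
# /boot/grub/grub.conf kernel line sketch -- example values only
kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/sda3 console=tty0 console=ttyS1,115200
```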
Cluster Node Preparation
• Other recommended BIOS settings (not required by Insight CMU):
  – Drive Write Cache: Enabled (SATA disks only; this setting is 'disabled' by default)
  – HP Power Savings Mode: "High Performance" for HPC workloads, or 'Dynamic' for power savings
  – Intel Hyperthreading: 'enabled' or 'disabled' depending on your workload
  – Turbo Boost: 'enabled' or 'disabled' depending on your workload
Adding cluster nodes to Insight CMU
• Node information needed by Insight CMU:
  • Hostname, IP Address, Netmask, MAC Address, Management Card Type, Management Card IP, and Architecture (plus cartridge ID and node ID for each node in a Moonshot Chassis).
  • Hostname, IP Address, Netmask, and MAC Address are for the Management Network (information for additional networks for each node is configured elsewhere).
  • Management Card Type can be 'none' (only 'ssh'-based power control available; MAC addresses cannot be scanned; no serial console access).
• MAC address needed for installing an OS
  • Insight CMU provides a 'Scan Nodes' tool for querying the MAC Address from the iLO
  • Node information can also be entered manually
  • If Insight CMU is not used to install an OS, enter a fake MAC Address (ex. 'AA-BB-CC-DD-EE-FF')
Managing Nodes via the Insight CMU GUI
Launch the Node Management Window
Add node manually
Scan nodes via iLO
Scanning nodes with the Insight CMU GUI
Options are 'ILO', 'lo100i', and 'ILOCM' (for Moonshot Chassis). IPMI is also available.
Hostname syntax: '%i' provides node numbering. Other format keys are available for Moonshot servers.
Scanning starts with the first iLO IP address. The first node hostname and node IP address are assigned to the server with that iLO IP address. Then both IP addresses and the node hostname are incremented, and the process continues until all nodes are scanned.
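The increment scheme can be sketched in shell. This is a hypothetical illustration of the assignment logic only, not a CMU tool; the helper name, addresses, and hostname format are made up:

```shell
# Hypothetical sketch of CMU's scan-time assignment: iLO IPs, node IPs,
# and hostnames increment in lockstep ('%i' = node number).
assign_nodes() {
  local ilo_base=$1 node_base=$2 fmt=$3 count=$4
  local ilo_pfx=${ilo_base%.*}   ilo_last=${ilo_base##*.}
  local node_pfx=${node_base%.*} node_last=${node_base##*.}
  for ((i=0; i<count; i++)); do
    printf '%s ilo=%s ip=%s\n' "${fmt//%i/$((i+1))}" \
      "${ilo_pfx}.$((ilo_last+i))" "${node_pfx}.$((node_last+i))"
  done
}
assign_nodes 172.22.0.10 172.20.0.10 node%i 3
# node1 ilo=172.22.0.10 ip=172.20.0.10
# node2 ilo=172.22.0.11 ip=172.20.0.11
# node3 ilo=172.22.0.12 ip=172.20.0.12
```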
Scanning nodes with the Insight CMU GUI
Enter the common iLO username and password. Insight CMU will cache these values.
Managing iLO access with Insight CMU
• Insight CMU stores the iLO access credentials
• If you need to update or change these values, use the Insight CMU CLI
Scanning nodes with the Insight CMU GUI
Check to ensure the scanned MAC addresses are valid. If there are errors, check access to the iLO.
If the node information looks good, then add these nodes to the Insight CMU database.
This option replaces all of the existing nodes with the scanned nodes.
Adding Nodes with Insight CMU commands
• Configure iLO credentials with '/opt/cmu/cmucli' (see previous slide).
• Add a node with '/opt/cmu/bin/cmu_add_node'
  • Run '/opt/cmu/bin/cmu_add_node -h' for details
• Scan nodes with '/opt/cmu/bin/cmu_scan_macs'
  • Run '/opt/cmu/bin/cmu_scan_macs -h' for details
• Add or scan nodes with the Insight CMU CLI
  • Run '/opt/cmu/cmucli' to enter the Insight CMU CLI
  • Type 'help' at the 'cmu>' prompt to see the list of commands
  • Type 'help <command>' to view the command syntax (double quotes are important!)
  • 'add_node' to add a node; 'scan_macs' to scan nodes
Configure network topology in Insight CMU
• Nodes in a Network Entity share a common “leaf-level” network switch
Launch the Network Entity Management Window
Create a Network Entity (ex. 'rack1')
Select the Network Entity
Select nodes from the left-hand side and add them to the Network Entity (right side)
Installing a Linux OS
Installing a Linux OS with Insight CMU
• Insight CMU provides 'autoinstall' support
  • Remotely installs a Linux OS on a "bare-metal" server (for initial OS installation)
  • Remotely installs a new Linux distribution on a server (for upgrading to a newer Linux distribution)
• Once one server is installed with Linux, Insight CMU provides 2 methods of distributing that image to the other cluster nodes
  1. Insight CMU 'backup and clone' archives the "golden node" image on the cluster head node (backup), and then copies or clones it to the other cluster nodes
  2. Insight CMU diskless support copies the "golden node" image to one or more NFS servers and boots the other cluster nodes with support for mounting that "golden node" image via NFS
Insight CMU AutoInstall Support
• Enable autoinstall support in the Insight CMU GUI (do this once and restart GUI)
• Export the repository via NFS (do this for each Linux distribution)
A sample AutoInstall Script (kickstart 1 of 3)
[root@cmu1 ~]# cat centos63.cfg
#
# General config options
#
install
nfs --server=CMU_MGT_IP --dir=CMU_REPOSITORY_PATH
lang en_US.UTF-8
keyboard us
skipx
#
# Network setup
#
network --onboot=yes --bootproto=static --ip=CMU_CN_IP --netmask=CMU_CN_NETMASK --hostname=CMU_CN_HOSTNAME
#
# Security and time
#
rootpw changeME
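At install time, Insight CMU substitutes the CMU_* placeholders with per-node values from its database. The idea can be pictured with sed; this is an illustration of the substitution only (the IP, netmask, and hostname are example values), not the actual CMU mechanism:

```shell
# Illustration only: replace the kickstart placeholders with per-node values
# (the real substitution is done internally by Insight CMU at PXE time).
ks_line='network --onboot=yes --bootproto=static --ip=CMU_CN_IP --netmask=CMU_CN_NETMASK --hostname=CMU_CN_HOSTNAME'
echo "$ks_line" | sed -e 's/CMU_CN_IP/172.20.0.10/' \
                      -e 's/CMU_CN_NETMASK/255.255.0.0/' \
                      -e 's/CMU_CN_HOSTNAME/node1/'
# network --onboot=yes --bootproto=static --ip=172.20.0.10 --netmask=255.255.0.0 --hostname=node1
```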
A sample AutoInstall Script (kickstart 2 of 3)
firewall --disabled
authconfig --enableshadow --enablemd5
selinux --disabled
timezone --utc America/New_York
#
# Disk partition information
#
bootloader --location=mbr
zerombr
clearpart --all --initlabel
part /boot --fstype ext2 --size=1024 --ondisk=sda --asprimary
part swap  --size=1024 --ondisk=sda --asprimary
part /     --fstype ext4 --size=1 --ondisk=sda --asprimary --grow
#
# Reboot after all packages have been installed
#
reboot
A sample AutoInstall Script (kickstart 3 of 3)
#
# Packages
#
%packages
@system-admin-tools
@core
@base
@network-server
@development
expect
tcl
openssl
nfs-utils
emacs
openssl-devel
%end
[root@cmu1 ~]#
Creating an Insight CMU AutoInstall Group
• Note that the partitions defined in the sample AutoInstall script are formatted with a native filesystem format
  • CMU cannot backup or clone a Logical Volume (LVM) or software RAID filesystem
  • Kickstart uses LVM by default, so make sure you specify a native format like ext4
• Launch the “New AutoInstall Logical Group” Window
Creating an Insight CMU AutoInstall Group
Give the new group a name
Configure the location of the image repository
Configure the path to the autoinstall script
Creating an Insight CMU AutoInstall Group
Add Nodes to an Insight CMU Logical Group
Launch 'Logical Group Management'.
Select the Logical Group by name.
Select nodes from the left-hand side and use the arrow to move them to the right-hand side.
Nodes are now members of the group, but they are not active in the group until Insight CMU successfully installs the group image on them.
AutoInstalling a node with Insight CMU
Select a node and right-click to launch the remote management menu.
Select 'AutoInstall'.
AutoInstalling a node with Insight CMU
Select the AutoInstall Group and click 'OK'.
Click 'OK' to begin.
AutoInstalling a node with Insight CMU
Monitor progress and watch for errors.
You can also select 'Virtual Serial Port Connection' from the remote management menu and watch the autoinstall process via the serial console.
The node 'ping' status will eventually turn green when it is powered up and connected to the network.
Successful autoinstall of a node via Insight CMU
When the AutoInstall process has completed, the node will reboot into the newly installed image.
Building a Golden Node with Insight CMU
• At this point the autoinstalled node is ready for a software stack.
  • Users can log into the node and install their software, set up user accounts, create mountpoints, etc.
• Install the Insight CMU Monitoring Agent
  • To set up password-less ssh keys for the root account (required for monitoring), run the following command from the cluster head node:
    • /opt/cmu/tools/copy_ssh_keys.exp <nodename> [root password]
    • If you don't provide the root password, you will be prompted for it
  • Install the CMU Monitoring Agent from the Insight CMU GUI
    • Select the node, right-click, and select 'Update->Install CMU Monitoring Client'
[optional] Add local users to the Cluster Head Node
Add a user account
Confirm the local home directory is created
Export /home to the rest of the cluster to create a shared /home
Note the userID and groupID
[optional] Add local users to Golden Node
Log into the Golden Node
Mount /home from the cluster head node
Create the same user with the existing /home
Confirm the user has the same userID and groupID (or fix)
[optional] Install Collectl
• Collectl is an open-source tool for gathering performance data from a server
[optional] Integrate Collectl with Insight CMU
• Configure collectl on the Golden Node to provide metrics to Insight CMU
  • Configure the appropriate subsystems to monitor
  • Export the metrics as key/value pairs
  • Run collectl in "server" mode
  • Configure a 5-second metric polling interval
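These four settings typically live in collectl's configuration file. A hedged sketch only; the directive and flag spellings follow collectl's documentation and should be verified against your collectl version:

```shell
# /etc/collectl.conf sketch -- assumption: the 'lexpr' export module is available
# -scdmn         : monitor cpu, disk, memory, network subsystems (adjust as needed)
# --export lexpr : emit metrics as key/value pairs
# -i5            : 5-second polling interval
DaemonCommands = -scdmn --export lexpr -i5
```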
[optional] Configure yum repository
[optional] Install dependencies for HP conrep
• Configure Insight CMU to use the correct conrep binary
  • Configure the CMU_BIOS_SETTINGS_TOOL setting in /opt/cmu/etc/cmuserver.conf
  • Insight CMU v7.1 contains the latest conrep binary in /opt/cmu/ntbt/rp/opt/hp/hp-scripting-tools/bin/conrep
[optional] Install dependencies for HP conrep
Select the Golden Node, right-click, and choose 'AMS->Show BIOS Settings'
This will likely fail the first time on newer OS versions. This is because the HP conrep tool for extracting BIOS settings is a 32-bit binary, and newer OS versions no longer install 32-bit support by default.
[optional] Install dependencies for HP conrep
To fix the conrep dependencies, log into the golden node and run 'conrep -h' to identify each dependency, and then install it.
Insight CMU installs conrep on each node in /opt/cmu/tmp/conrep.
[optional] Installing SLURM
• SLURM is an open-source workload scheduler
• Build and Install SLURM on the cluster head node with
/home/slurm/slurm.conf as the shared central config file
(everything else in /usr/local/)
• Install SLURM on the Golden Node
• Configure keys and /etc/init.d/slurm
• Configure slurm.conf with nodes, logdirs, scheduler, etc.
• Details TBD
[optional] Installing Hadoop
• Hadoop is a framework for processing unstructured data
• Download and install hadoop on the cluster head node
• Configure the files in the conf/ directory (all data goes on other
disks – NOT on the local OS disk)
• Do NOT configure HDFS yet!
• Configure HDFS after cloning
• Install hadoop on the Golden Node
• Copy config files in conf/ from cluster head node to Golden Node
• Details TBD
Insight CMU Backup Support
• Create a new Logical Group for the image to be backed up
Creating a disk-based image group
Give the Logical Group a name
Configure the device from which the image is to be taken ('sda' is most common; check by running 'mount' or 'df' on the golden node).
Optional: add nodes to this group from an existing group. These nodes will be "not active" in this group until CMU successfully installs this image on them.
Backing up a Golden Node with Insight CMU
• Make sure nodes are added to the Logical Group before Backing Up or Cloning
  • See 'Add Nodes to an Insight CMU Logical Group' for details
• Back up the Golden Node
  • Select the Golden Node; right-click; select 'Backup'
Backing Up a Golden Node with Insight CMU
Select the Logical Group.
Select the root directory partition. Run 'df' or 'mount' on the golden node to determine this ('sda3' is the root partition from the sample AutoInstall script: /boot is 'sda1' and swap is 'sda2').
Make sure that this information is correct before proceeding.
Backing Up a Golden Node with Insight CMU
• Monitor the backup progress
  • CMU reboots the golden node into a CMU diskless environment and archives the partitions on the given disk
  • Watch for and address any errors that may come up
Backing up a Golden Node with Insight CMU
• When the backup is finished, the golden node is rebooted back into its OS
Contents of an Insight CMU Archived Image
• The archived image is in ‘/opt/cmu/image/<logical_group>/’
/boot (sda1)
/ (root: sda3)
reconf.sh is the post-cloning script for this image
reconf.sh: the Insight CMU post-cloning script
• This script is where all per-node post-cloning activities are crafted
  • Insight CMU provides pre-set per-node environment variables for use by this script (see the script for details)
  • Configuring additional networks is the most common per-node post-cloning task
  • The following example configures IB-over-TCP by appending the last two octets of the management network IP address for each node to the IB subnet (192.168.x.x):
### Insert your custom reconfiguration scripts here
## reconfiguring ib0
IBCONF=${CMU_RCFG_PATH}/etc/sysconfig/network-scripts/ifcfg-ib0
IBSUBNET=192.168
TMPFILE=/tmp/cmu-tmp-$$
grep -v IPADDR ${IBCONF} > ${TMPFILE}
IPSUFFIX=`echo $CMU_RCFG_IP | awk -F. '{print $3 "." $4}'`
echo IPADDR=${IBSUBNET}.${IPSUFFIX} >> ${TMPFILE}
mv ${TMPFILE} ${IBCONF}
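For example, with CMU_RCFG_IP set to 172.20.3.45 (a made-up management address), the awk expression above extracts '3.45', so the node's ifcfg-ib0 ends up with IPADDR=192.168.3.45. A quick check of the suffix extraction:

```shell
# Verify the suffix extraction used in reconf.sh with an example address
CMU_RCFG_IP=172.20.3.45   # example value; CMU sets this per node
IBSUBNET=192.168
IPSUFFIX=`echo $CMU_RCFG_IP | awk -F. '{print $3 "." $4}'`
echo IPADDR=${IBSUBNET}.${IPSUFFIX}
# IPADDR=192.168.3.45
```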
Modifying an Insight CMU Archived Image
• An archived image can be unpacked so that changes can be applied
  • /opt/cmu/bin/cmu_image_open -i <logical group>
  • Useful for updating the archived image without having to perform a complete backup.
  • Run 'chroot image_mountpoint' to "enter" the image and make changes as if the image were the local filesystem (type 'exit' when done).
    • If the unpacked image is corrupted, simply delete the 'image_mountpoint' directory and unpack another copy.
• When modifications are completed, repack the archived image
  • /opt/cmu/bin/cmu_image_commit -I <logical group>
  • The original archive is preserved.
  • The modified archive is ready for cloning.
Modifying an Insight CMU Archived Image
Cloning an Insight CMU Archived Image
Select the set of nodes to clone
Select cloning
Cloning an Insight CMU Archived Image
Select the image to clone
Confirm this information
before proceeding
Cloning an Insight CMU Archived Image
The first node in each Network Entity will boot up, partition and format the local disk, and then receive the archive. Then it will wait for the rest of the nodes.
The rest of the nodes will boot up, partition and format their disks, and then receive the archive from the first node.
Cloning an Insight CMU Archived Image
When every node has the archived image, they all unpack it, run the post-configuration script, and then reboot.
The first node waits for the other nodes to finish cloning and reboot before it reboots.
Cloning an Insight CMU Archived Image
[optional] Configure HDFS for Hadoop
• When the cloned nodes come back up, you can format and mount the other disks, configure HDFS, and start hadoop
• Use Insight CMU management tools such as Multi-Window Broadcast and/or Pdsh with CMU_Diff to format the other disks in parallel.
• Details TBD
Diskless Topology
CMU
Diskless
Compute
Nodes
CMU Head Node
And NFS Server
NFS Servers
Insight CMU Diskless Support
• Enable Insight CMU diskless support
  • Enable diskless support in /opt/cmu/etc/cmuserver.conf and restart the Insight CMU GUI
  • Install the system-config-netboot-cmd-cmu RPM from the Insight CMU ISO on the cluster head node
Insight CMU Diskless Support
• Enable tftp diskless support
• Add the /tftpboot directory to the tftp configuration for system-config-netboot diskless support
Insight CMU Diskless Support
• Make sure that the Golden Node image is ready to become a diskless image
  • Add the required packages for Insight CMU diskless support
  • Ensure all software is installed and configured on the Golden Node
Workaround: Skipping the Diskless OS Check
• Insight CMU v7.1 diskless support has been qualified for use with CentOS 6.3 via the exception process, but the Insight CMU v7.1 code still guards against using unqualified Linux OS distributions
  • The workaround is to replace the code that performs the OS check with a dummy script
Creating a Diskless Image from a Golden Node
• Create a new Diskless Logical Group
Creating a Diskless Image from a Golden Node
Give the new diskless group a name
Click the 'diskless' check box
Provide the Golden Node
Click on 'Get Kernel List' to get a list of the available kernels from the Golden Node
Select the kernel to be used as the diskless kernel
Don't add clients from another group to this group. Click 'OK' when finished.
Creating a Diskless Image from a Golden Node
Preparing an Insight CMU Diskless Image
• The Diskless Image is stored in /opt/cmu/image/<logical group>
Script run after the image is created
Script run after each node is added to the group
Directory containing the read-only 'root' filesystem image
Directory containing the per-node read-write filesystems
Preparing an Insight CMU Diskless Image
• Make any cluster-wide changes to the read-only root filesystem
  • Example: add NFS mountpoints to etc/fstab (this file is modified for diskless support)
• Script these changes in reconf-diskless-image.sh so that they can be made automatically if the image is re-created
  • The diskless logical group can be deleted and re-created to reload the image and re-run the local reconf-diskless-image.sh
Adding Nodes to an Insight CMU Diskless Image
• Identify and configure per-node read-write files and directories in snapshot/files.custom
  • Ex. …/ifcfg-ethX or …/ifcfg-ib0 files for additional networks (full filesystem pathname required)
  • The snapshot/files file is provided by Insight CMU and should remain unedited
• Configure any per-node changes in reconf-diskless-snapshot.sh
  • Ex. changing IPADDR in additional network files
  • Insight CMU per-node environment variables are available, similar to the reconf.sh file for post-cloning
• When ready, add the nodes to the diskless logical group
Adding Nodes to an Insight CMU Diskless Image
Select nodes from the left-hand side and add them to the group
A snapshot directory for each node is created and configured
Nodes are not active in this group until they are booted into this image
Adding Nodes to an Insight CMU Diskless Image
• The per-node snapshot directories are where the per-node read-write files and directories are stored and edited
  • The files and directories listed in the snapshot/files and snapshot/files.custom files are mounted from the per-node snapshot directory over the appropriate locations in the read-only root filesystem image when each node boots up
  • Ex. when a node writes to /var/log/messages, that file exists on the NFS server in /opt/cmu/image/<diskless_group>/snapshot/<nodename>/var/log/messages
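The boot-time overlay described above can be pictured as a series of bind mounts. A dry-run sketch that only prints the mounts a node would perform; the group name, node name, and image paths are illustrative, not exact CMU internals:

```shell
# Dry-run sketch: print the bind mounts a diskless node would perform at boot,
# overlaying per-node snapshot files on the read-only root image.
# SNAPDIR and ROOTIMG are illustrative paths.
SNAPDIR=/opt/cmu/image/diskless1/snapshot/node1
ROOTIMG=/opt/cmu/image/diskless1/root
for path in /var/log/messages /etc/sysconfig/network-scripts/ifcfg-ib0; do
  echo "mount --bind ${SNAPDIR}${path} ${ROOTIMG}${path}"
done
```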
Booting Diskless Nodes with Insight CMU
Select the nodes to boot diskless; right-click and select 'Boot'
Select 'network', then select the logical group and click 'OK'
Booting Diskless Nodes with Insight CMU
The nodes will get configured in the DHCP server and then get rebooted
Insight CMU Diskless Filesystem
Configure Insight CMU Monitoring
Insight CMU Monitoring
Insight CMU Default Metrics and Alerts
Insight CMU ActionAndAlertsFile.txt
• /opt/cmu/etc/ActionAndAlertsFile.txt
  • Single file for configuring all metrics, alerts, and reactions
  • Edit this file on the cluster head node
  • One line per metric / alert / reaction
• Action Format:
  <name> <description> <interval> numerical Instantaneous|MeanOverTime <max> <unit> <action command>
  <name> <description> <interval> string Instantaneous|MeanOverTime <unit> <action command>
• Alert Format:
  <name> <message> <level> <interval> <threshold> <comparison_operator> <unit> <action command>
• Reaction Format:
  <alert_name> <message> ReactOnRaise|ReactAlways <command(s) to run>
Insight CMU ActionAndAlertsFile.txt
• Metric example: This is how Insight CMU obtains CPU load from each server
Metric name: 'cpuload'
Description of the 'cpuload' metric
Interval: gather 'cpuload' every 5 seconds (set to '2' for every 10 seconds; set to '12' for every 60 seconds)
Max expected value: 100%
Metric unit: '%'
Metric value is a number
Command to run to obtain the 'cpuload' metric
Sum this value with the previous value and divide by elapsed seconds (based on the interval)
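As the slide notes, the interval field counts 5-second base periods. A quick arithmetic check (the helper function is illustrative, not a CMU command):

```shell
# CMU monitoring interval field is a multiple of the 5-second base period
interval_seconds() { echo $(( $1 * 5 )); }
interval_seconds 1    # 5  (every 5 seconds)
interval_seconds 2    # 10 (every 10 seconds)
interval_seconds 12   # 60 (every 60 seconds)
```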
Restart Insight CMU Monitoring
• Restart Insight CMU monitoring after making changes to the
ActionAndAlertsFile.txt file
• Stop and then start the Insight CMU Monitoring Engine
[optional] Configure Insight CMU to gather Collectl data
• The Insight CMU ActionAndAlertsFile.txt file supports gathering metrics from collectl
  • In the 'Action Command' field, add the keyword COLLECTL followed by the collectl metric names (simple arithmetic is supported)
  • Example: this is one way to gather cpuload using collectl:
    cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL 100 - (cputotals.idle)
  • The Insight CMU ActionAndAlertsFile.txt file includes pre-configured collectl-based metrics
    • Comment out the Insight CMU native metrics and uncomment the collectl-based metrics to enable gathering metrics from collectl
[optional] Configure Insight CMU to gather Collectl data
• Run collectl to get a list of the collectl metric names for use in the ActionAndAlertsFile.txt
Don't forget to include the subsystems of interest!
[optional] Configure Insight CMU with iLO4 Metrics
• iLO4 in the HP ProLiant Gen8 servers can be configured to provide server metric data via a public SNMP port
• Insight CMU can gather this server data directly from iLO4 for monitoring and analysis
  • Hardware metric data is obtained "out-of-band" to avoid disruptions in the OS layer
• This process employs Insight CMU "EXTENDED" Monitoring
  • "Extended" monitoring provides support for processes outside of Insight CMU to gather metrics and submit them to Insight CMU for display and analysis
  • /opt/cmu/bin/cmu_submit_extended_metrics is the Insight CMU command to submit metrics to Insight CMU
  • Submitted metrics must be pre-configured in the Insight CMU ActionAndAlertsFile.txt file with the EXTENDED keyword
[optional] Configure Insight CMU with iLO4 Metrics
• The Insight CMU iLO4 support requires a file containing a list of the servers with iLO4
  • If all servers in the Insight CMU cluster contain iLO4, then use /opt/cmu/bin/cmu_show_nodes to create a nodefile.
• Run /opt/cmu/bin/cmu_config_ams -f <nodefile> to configure iLO4 with a public SNMP port and to enable Agentless Monitoring
• Run /opt/cmu/bin/cmu_config_ams -c to add Agentless Monitoring support to the Insight CMU GUI via the Insight CMU Custom Menu Support
  • Insight CMU Custom Menu Support is configured in /opt/cmu/etc/cmu_custom_menu
[optional] Configure Insight CMU with iLO4 Metrics
[optional] Gather and Review iLO4 Metrics
• Now that the iLO4 SNMP port is enabled, you can get a dump of all of the
available information from iLO4 and view it using the new “AMS” custom
menu commands in the Insight CMU GUI
•
Restart the Insight CMU GUI for the custom menu changes to take effect
[optional] Gather and Review iLO4 Metrics
Select the nodes, right-click, and select AMS->Get/Refresh SNMP Data to get a data dump
When the data download finishes, this pop-up window will appear
[optional] Gather and Review iLO4 Metrics
• The iLO4 Metric data is stored in /opt/cmu/tmp/snmp_node_data/<nodename>.raw
• CMU reads the MIBs stored in /opt/cmu/snmp_mibs/, adds human-readable text to each SNMP OID (and value, where applicable), and stores these results in /opt/cmu/tmp/snmp_node_data/<nodename>.txt
[optional] Gather and Review iLO4 Metrics
Data can be viewed in the GUI by selecting the node(s), right-clicking, and selecting AMS->View/Compare SNMP Data
The data is filtered through CMU_Diff, in case multiple nodes were selected, to highlight any differences
[optional] Configure Insight CMU with iLO4 Metrics
• /opt/cmu/bin/cmu_get_ams_metrics is the program to gather a pre-configured list of SNMP metrics from the iLO4 management cards of a given list of nodes and submit the data to Insight CMU
  • The default pre-configured list of SNMP metrics is in the /opt/cmu/etc/cmu_ams_metrics file
[optional] Integrate Insight CMU with iLO4 Metrics
[optional] Integrate Insight CMU with iLO4 Metrics
• The /opt/cmu/bin/cmu_get_ams_metrics program requires the snmpget command
[optional] Integrate Insight CMU with iLO4 Metrics
• Test the /opt/cmu/bin/cmu_get_ams_metrics command with the configured SNMP metrics from /opt/cmu/etc/cmu_ams_metrics using a subset of nodes
  • The '-d' option displays the results to the screen instead of submitting them to Insight CMU
[optional] Integrate Insight CMU with iLO4 Metrics
• Configure the EXTENDED iLO4 metrics in the ActionAndAlertsFile.txt file
  • Commands configured after the EXTENDED keyword will be executed by Insight CMU at the configured interval
Interval set to '4' to poll iLO4 every 20 seconds
Configure only the first EXTENDED metric with the program that will gather the other EXTENDED metrics
[optional] Integrate Insight CMU with iLO4 Metrics
• Restart Insight CMU monitoring and the GUI to view the new metrics
[optional] Check BIOS Versions with Insight CMU
Select the nodes, right-click, and choose 'AMS->Show BIOS Version'
The BIOS Vendor, Version, and Release Date for each node is obtained, organized, and displayed in a concise format
[optional] Check BIOS settings with Insight CMU
Select the nodes, right-click, and choose 'AMS->Show BIOS Settings'
The output of conrep is saved to a file in /opt/cmu/tmp/conrep on each node, and then the file is displayed via CMU_Diff to highlight any differences.
More information available at:
http://www.hp.com/go/cmu
Thank you!