SightLine Power Agent for VOS Systems User’s Guide

SightLine Power Agent 
for VOS Systems
User’s Guide
version 6.1
Stratus Computer
R464-01
Notice
The information contained in this document is subject to change without notice.
UNLESS EXPRESSLY SET FORTH IN A WRITTEN AGREEMENT SIGNED BY AN AUTHORIZED
REPRESENTATIVE OF STRATUS COMPUTER (DE), INC., STRATUS MAKES NO WARRANTY OR
REPRESENTATION OF ANY KIND WITH RESPECT TO THE INFORMATION CONTAINED HEREIN,
INCLUDING WARRANTY OF MERCHANTABILITY AND FITNESS FOR A SPECIFIC PURPOSE.
Stratus Computer (DE), Inc., assumes no responsibility or obligation of any kind for any errors contained
herein or in connection with the furnishing, performance, or use of this document.
The software described in Stratus documents (a) is the property of Stratus Computer Systems, S. a. r. I.,
Luxembourg or the third party, (b) is furnished only under license, and (c) may be copied or used only as
expressly permitted under the terms of the license.
This document is protected by copyright. All rights are reserved. No part of this document may be copied,
reproduced, or translated, either mechanically or electronically, without the prior written consent of Stratus
Computer.
Stratus, the Stratus logo, Continuum, XA, Continuous Processing, StrataLINK, and StrataNET are
registered trademarks of Stratus Computer, Inc.
SightLine and Power Agent are trademarks of FORTEL Inc.
All other trademarks are registered to their respective owners.
Manual Name: SightLine Power Agent for VOS Systems User’s Guide
Part Number: R464
Revision Number: 01
SightLine Version Number: 6.1
VOS Release Number: 13.5
Printing Date: March 2002
Stratus Computer (DE), Inc.
111 Powdermill Road
Maynard, Massachusetts 01754-3409
© 2001–2002 by Stratus Computer Systems, S. a. r. I., Luxembourg. All rights reserved.
Preface
The SightLine Power Agent for VOS Systems User’s Guide describes how to install and use SightLine,
a Stratus client/server software product that monitors, analyzes, and reports on the performance of
computer systems both historically and in real-time. SightLine runs on all Stratus Continuum models
running VOS release 13.5 or later.
This documentation is made up of three parts:
1. Getting Started, which provides quick instructions for host installation, post-installation
   configuration, installation on the PC, and a brief overview of how to begin performance monitoring.
2. Power Agent, which describes in detail how to install and configure the SightLine host software on
   computer systems running the Stratus VOS operating system.
3. Analysis, which describes the sample files provided on your installation CD.
Document Conventions
This document uses the following typographical conventions.
Item | Convention | Example
Acronyms | All uppercase | CUAFIFO
a.m., p.m. | Lowercase, separated by periods | 9:00 a.m., 12:22 p.m.
Book and guide titles | Title caps, italic type | See the PC User’s Guide for details.
Chapter titles | Title caps, in quotation marks | See Chapter 1, “Introduction.”
Code sample, including keywords and variables within text and as separate paragraphs, and user-defined program elements within text | Monospace | #include <iostream.h>
Command-line commands and options (switches) | All lowercase, bold | copy command; /a option
Commands on menus and buttons | Bold; capitalization follows interface (usually title caps) | Date and Time; Apply; New Query button
Device names | All uppercase | LPT1, COM1
Dialog box titles | Bold; title caps | Protect Document dialog box; Import/Export Setup dialog box
Dialog box options | Bold; capitalization follows interface (usually initial caps) | Close all programs and log on as a different user?; Find Entire Cells Only check box
Key names, key combinations, and key sequences | All uppercase | CTRL, TAB; CTRL+ALT+DEL; SHIFT, F7; ALT, F, O
Logical key names | Title caps, bold | Backspace key
Logical operators | All uppercase, bold | AND; XOR
Macros | Bold (if predefined); usually all uppercase | LOWORD
Menu names | Bold; title caps | File menu
New terms or emphasis | Italic | You can look up entries in the online index.
Programs and applications | Usually title caps for application program names | SightLine; Microsoft Word
Programs and applications | Italic type for internal program names | The datamgr program.
User input | Generally lowercase, monospace, unless case-sensitive or to match standard capitalization conventions | Type -ppassword
Windows, named | Title caps | Help window
Windows, unnamed | All lowercase | document window
Related Manuals
See also the following guide:
• SightLine Expert Advisor/Vision User’s Guide (available on the CD-ROM that contains the
  EA/V software)
Ordering Manuals
You can order manuals in the following ways:
• If your system is connected to the Remote Service Network (RSN™), issue the
  maint_request command at the system prompt. Complete the on-screen form with all of
  the information necessary to process your manual order.
• Customers in North America can call the Stratus Customer Assistance Center (CAC) at
  (800) 221-6588 or (800) 828-8513, 24 hours a day, 7 days a week. All other customers can
  contact their nearest Stratus sales office, CAC office, or distributor; see the file
  cac_phones.doc in the directory >system>doc for CAC phone numbers outside the U.S.
Manual orders will be forwarded to Order Administration.
SightLine Power Agent for VOS Systems
Getting Started
version 6.1
Contents
Installation Steps
VOS Product Tape
FTP Bundled Image
Post-Installation Configuration
Expert Advisor/Vision Software Installation
Beginning Performance Monitoring
Figures
Figure 1. SightLine Modules
Getting Started
Before you install the SightLine Power Agent software, make sure you can satisfy the following
requirements:
• Make sure you have the correct kit for your VOS hardware platform and TCP “flavor,” and
  the necessary AccessKeys provided by your software provider.
• You will need about 15 MB of disk space to hold the program modules, and some
  additional space (configurable) to hold the collected performance data. You should allow for
  at least 25 MB of total disk space.
• You will need a Windows NT® or Windows 2000® Workstation or Server with at least 50 MB
  of free disk space and a TCP/IP connection to the VOS system.
The software is easy to install. If you are familiar with your platform’s installation facility, you can
install SightLine by following the steps in the next three sections:
• The first section details the VOS host system installation.
• The second section details the SightLine Expert Advisor/Vision software installation on the
  workstation.
• The third section gets you started monitoring and analyzing your VOS system(s).
Installation Steps
You can install the host kits in one of two ways: from a VOS product tape using
install_new_release, or from an FTP bundled image.
VOS Product Tape
You can install the software directly from the tape by using the following commands:
1. Load the SightLine Power Agent for VOS tape and run install_new_release, as
   described in the Stratus manual VOS Installation Guide (R386-02).
2. Change to the SightLine directory and run the post-installation command macro
   (install_sl.cm). Follow the instructions described in the Post-Installation Configuration
   section.
FTP Bundled Image
1. FTP the kit for your VOS hardware platform from the appropriate FTP site. It should be
   transferred in binary mode (type = binary). Place the kit in any directory you choose.
   SightLine will be installed in a subdirectory.
2. Create the SightLine directory:
       create_dir SightLine
3. Unbundle the package. You will need to have the necessary command macros
   (unbundle.cm) and program modules (decode_vos_file.pm and gzip.pm) installed
   and their location defined in your command library path using the add_library_path
   command. A description of bundle/unbundle and all the necessary files can be found on the
   Stratus public FTP site, ftp://ftp.stratus.com/pub/vos/utility/README.txt.
       unbundle kit-name SightLine
4. Run the post-installation command macro (install_sl.cm), and follow the instructions
   described in the next section.
       change_current_dir SightLine
       install_sl
Post-Installation Configuration
SightLine is now installed in the SightLine directory (called FRTLHOME) and contains the following
subdirectories:
bin        Program modules directory
data       HostTraceFile directory
etc        Configuration directory
log        Log files directory
lib        Library files directory
install    Base files used during the installation process
The next step is to run the post-installation configuration script, install_sl.cm. The macro will
complete the configuration portion of the installation. It will prompt you for the following
information:
• the AccessKey
• the collection interval
• the amount of data to be stored on the host system
• the IP address or name of the host system
• a 3-character short id that uniquely identifies this host
1. If you are upgrading from ViewPoint, the following prompt will appear:
       There is an existing installation of ViewPoint in FRTLHOME
2.     Enter the Access Key string:
   Type the AccessKey string after this prompt.
3.     Enter the data collection interval [default = 30]:
   Enter an integer number of seconds for the collection interval.
4.     Enter the data retention period for the host trace file
       (example formats: 24h, 10m, or 3d) [default = 10m]:
   Enter the maximum amount of data to be stored locally. This can be defined in hours (h),
   days (d), or megabytes (m). The default is 10m, or 10 megabytes.
5.     You must specify the system_name or IP address of this system
       The system name you use must resolve to the IP address of this
       system
       Enter the system name or the IP address:
   Specify an IP address or system name that will resolve to the IP address of the host
   system. A PING will be executed to make sure the name entered resolves properly.
6.     When monitoring data from multiple machines, the ViewPoint/PC
       client uses a 3 character identifier to uniquely identify a
       particular machine.
       You MUST choose a 1 to 3 character string that will make it
       obvious which machine a particular metric is coming from.
       Enter a new short-id or accept [default = system_name]:
   Enter up to 3 characters to be used as the short-id.
When these configuration parameters have been entered, the macro will ask if you want to start
the software now. When you answer this prompt, the post-installation configuration will be
complete.
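The retention formats accepted by the prompt in step 4 can be checked before you run the macro. The following is a minimal sketch in Python, not part of the SightLine software; the function name parse_retention and the decision to reject a zero amount are assumptions made for illustration. Note that the m suffix means megabytes, not minutes.

```python
import re

# Unit letters from the install_sl prompt: hours, days, megabytes.
UNITS = {"h": "hours", "d": "days", "m": "megabytes"}


def parse_retention(text="10m"):
    """Split a retention answer such as '24h' into (amount, unit name).

    Hypothetical helper: the real macro may accept or reject other input.
    """
    match = re.fullmatch(r"(\d+)([hdm])", text.strip().lower())
    if match is None:
        raise ValueError(f"expected a number followed by h, d, or m: {text!r}")
    amount = int(match.group(1))
    if amount == 0:
        # Assumption: a zero retention period is not meaningful.
        raise ValueError("retention amount must be greater than zero")
    return amount, UNITS[match.group(2)]
```

For example, parse_retention("3d") yields 3 days, and calling it with no argument yields the documented default of 10 megabytes.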
N O T E ————————————————————————
Do not start the software at this point if you want to configure the
analyze_system interface. Configure this interface by editing
the FRTLHOME>etc>analyze_system.conf file. If you do not
want this interface, edit FRTLHOME>bin>slagent.cm and set
frtlasi to 0. (Refer to Appendix A for information about the
analyze_system interface.)
To start the software manually, enter the following:
change_current_dir FRTLHOME>bin
slagent start
N O T E ————————————————————————
You will need to replace FRTLHOME with the full path to your
SightLine directory.
To stop the software manually, pass stop as the first parameter to the slagent macro instead
of start. The command looks like this (substituting the path for FRTLHOME):
slagent stop
The slagent macro can be used to start the ViewPoint Agent processes in diagnostic mode.
Diagnostic mode causes additional information to be written to the log files in the FRTLHOME>log
directory, which can be of use in diagnosing problems. Use this syntax:
slagent [-da ] [ -ds ] [ -dd ] [ -dt ] { start | stop | restart | status }
where:
-da    starts agentmgr in diagnostic mode
-ds    starts servd in diagnostic mode
-dd    starts datamgr in diagnostic mode
-dt    starts threshd in diagnostic mode
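The slagent syntax above can be assembled programmatically, for example when generating operator instructions. This is a sketch only, written in Python for illustration; the helper slagent_command is hypothetical and simply builds the command text from the flags and actions documented above. It does not run anything on the VOS host.

```python
# Diagnostic flags and the process each one affects, as listed above.
DIAG_FLAGS = {"agentmgr": "-da", "servd": "-ds", "datamgr": "-dd", "threshd": "-dt"}
ACTIONS = ("start", "stop", "restart", "status")


def slagent_command(action, diagnostic=()):
    """Build an slagent command line (hypothetical helper, text only)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # A KeyError here means the process name has no documented diagnostic flag.
    flags = [DIAG_FLAGS[name] for name in diagnostic]
    return " ".join(["slagent", *flags, action])


print(slagent_command("start", ["agentmgr", "datamgr"]))
```

The flags appear in the order the process names are given; the manual does not state whether the order matters.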
There are other configuration parameters that can be modified. See the remaining chapters in this
User’s Guide for detailed information about configuring the host software to achieve operational
objectives.
I M P O R T A N T ————————————————————
Please review the Release Notes accompanying this software
for any information that may have been available too late to be
included in this guide.
Expert Advisor/Vision Software Installation
Microsoft® Windows NT™ or Windows 2000 or higher is required for Expert Advisor/Vision (EA/V)
on the PC. Use the SightLine installation CD-ROM or FTP’ed image to install EA/V as follows:
1. Select Start | Settings | Control Panel | Add/Remove Programs.
2. Select Add New Programs.
3. Browse to the CD-ROM, or to the location to which the install image has been transferred
   using FTP, and select Setup.exe.
4. Respond to the InstallShield dialog boxes to complete the installation.
If you have any questions about the installation process, more detail is provided in Section 1.3 of
the Expert Advisor/Vision User’s Guide. You are now ready to connect to your system and begin
monitoring the performance of the system.
Beginning Performance Monitoring
The steps required to begin performance monitoring are outlined below. To transfer data from the
host system, a TCP/IP connection is needed between the host system and the PC. More
complete details are provided in Chapter 3 of the Power Agent section of this User’s Guide.
1. The first time you run EA/V on the PC, and after you enter an AccessKey, the agents that
   are on the same subnet as the EA/V workstation will be displayed in the Enterprise View
   (the left pane of the EA/V application window). If there isn’t anything showing in the
   Enterprise View, right-click on the Enterprise icon and select AutoDiscover.
   Right-click on a hostname to examine and modify the settings for a particular host. If there
   isn’t an entry for a particular host, you will need to manually add a Network Host Session
   as follows:
   • Right-click the Enterprise icon and select New Host.
   • Enter the hostname in the Name field, and specify the host name or IP address in the
     Host field.
   • Click Edit to specify the size and location of the PC trace file.
   • Click OK to exit the Configure Network Host Session dialog box.
2. When connecting for the first time, right-click on the hostname in the Enterprise View and
   select Connect Now! If data collection was already started on the host, you will see the
   Define Times to Download dialog box, indicating that there is already historical data to
   download. Select the times you want to download and click OK. If you do not want to
   proceed to live data, clear the Proceed to Live Data box and choose an end time for your
   download. If the box is checked, then after all requested historical data has been
   downloaded, you will get a new, nearly real-time data block every collection interval (default
   = 30 seconds).
3. If you encounter an error, double-click the Status line next to the connection to view a
   communications log.
4. The standard VOS environment (a set of plots and other objects on multiple pages) should
   load as soon as the data transmission begins. If not, load it by choosing File | Open |
   Environment from the menu. Each platform’s sample environment files are in a <platform>
   directory under \Expert Advisor Vision (for example,
   \FORTEL SightLine\Expert Advisor Vision\VOS). When the Open Environment dialog box
   appears, follow these steps:
   • Browse to the proper directory.
   • Highlight <filename>.VEN by clicking on it.
   • Choose the system you want to analyze in the Force Into Trace System field, and
     click OK. A set of standard plots will appear on the screen. They will be updated
     with new information every interval.
   • Use the standard Windows controls to maximize and restore each plot.
5. You are now ready to begin exploring the activity and performance of your system.
   • Create new plots and environments specific to your system and save them for future
     use.
   • Set thresholds to detect performance problems and generate alerts.
   • Use AutoAnalyze to produce reports on host activity, exception reports based on
     pre-defined thresholds, and a list of recommendations for each exception.
   • Use AutoCorrelate™ to determine the causes of poor performance.
   • Capture interesting plots and import them into a spreadsheet or word-processing
     program for annotation or reporting.
6. To stop the data transfer from the VOS host to the PC, right-click on the hostname in the
   Enterprise View and select Disconnect.
Figure 1 illustrates the relationship between the SightLine EA/V workstation and the SightLine
Power Agent on a VOS host:
[Figure 1. SightLine Modules. Labels in the diagram: Power Agent, datamgr, servd, protomgr,
threshd, e-mail, script, SightLine Expert Advisor/Vision.]
Each part is described briefly here; later sections will describe each component in more detail.
agentmgr   Real-time Agent gathers and provides data to clients.
datamgr    Receives data from agentmgr and manages a performance database.
servd      Listens for EA/V PC download requests and starts protomgr.
threshd    Receives live data from agentmgr and produces alerts.
protomgr   Gets data from datamgr or agentmgr and transmits to EA/V on the PC.
EA/V       Management application for alerting, metric display, performance analysis, and
           reporting.
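The component descriptions above imply a small data-flow graph. The sketch below, in Python, encodes only the relationships stated in those descriptions and lists the routes by which data collected by agentmgr can reach EA/V. The dictionary and the paths function are illustrative names, not part of the product. servd is omitted from the graph because it starts protomgr rather than carrying data itself.

```python
# Edges read "source sends data to destination", taken from the descriptions above.
DATA_FLOW = {
    "agentmgr": ["datamgr", "threshd", "protomgr"],
    "datamgr": ["protomgr"],
    "protomgr": ["EA/V"],
    "threshd": [],  # produces alerts rather than forwarding data
    "EA/V": [],
}


def paths(source, target, flow=DATA_FLOW):
    """Return every route from source to target in the (acyclic) flow graph."""
    if source == target:
        return [[source]]
    found = []
    for nxt in flow.get(source, []):
        for tail in paths(nxt, target, flow):
            found.append([source] + tail)
    return found


for route in paths("agentmgr", "EA/V"):
    print(" -> ".join(route))
# prints:
# agentmgr -> datamgr -> protomgr -> EA/V
# agentmgr -> protomgr -> EA/V
```

The two routes correspond to historical data served from the performance database and live data passed directly from agentmgr.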
Power Agent
version 6.1
Contents
Chapter 1 Introduction
Chapter 2 Step-by-Step Host Installation
Installation Steps
VOS Product Tape
FTP Bundled Image
Post Installation Configuration
Default Directory Structure
The bin Directory
The data Directory
The etc Directory
The log Directory
Associated Programs and Scripts
The datadump.pm Program
The slagent.cm Macro
The db2vtx.pm Macro
The as_iface.pm Macro
The cvtag43to61.pm and cvtpr43to61.pm Macros
Chapter 3 PC-to-VOS Host Configuration Parameters
Chapter 4 Agentmgr
The Class Hierarchy
Default agentmgr.conf File
The CONFIG Statement
AccessKey Configuration
Nproc Configuration
FILTER Statements
The COMPUTATIONS Section
CLASS
CRITERIA
EXCLUSIVE | INCLUSIVE
COMPUTE
VARIABLE
VARSET
Example of a CLASS Specification
The Workloads Class
The Processes Class
Class Definition
CRITERIA
VARIABLE
VARSET
Path Meters Class
Registry.csv Metric Description File
agentmgr Command Line Options
Chapter 5 Datamgr
Configuration
Communication
File Structure
Directory Structure
Centralized Database Management
Datamgr Command Line Options
Chapter 6 Threshd
Configuration
General Structure
Metric Names
String Substitution
Additional String Features
MAILHOST Definition
AGENTMANAGERS Definition
SNMPVARS Definition
SNMPTRAPS Definition
THRESHOLDS Definition
Expressions
Operators for Complex Expressions
Actions
Send E-mail
Send an SNMP Trap
Execute a Script
Messages
Threshd Command Line Options
Chapter 7 Servd
Configuration
Subscript name truncation
Servd Command Line Options
Chapter 8 Protomgr
Default protomgr.conf
Data Source Selection
Short ID Definition
Collection Interval Definition
Network Address Translation (NAT) and Firewall Support
Download Throttling
Enable/Disable Metric Selection
Exclusion Section
Data Redefinition
Event Data
Passing protomgr Command Options
Option Flags
Chapter 9 Command Syntax — Quick Reference
Agentmgr
Datamgr
Threshd
Servd
Protomgr
slagent Command Macro
Chapter 10 Troubleshooting
Appendix A Analyze_system Interface
Configuration
Figures
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
4-1.
4-2.
4-3.
4-4.
4-5.
5-1.
5-2.
5-3.
6-1.
8-1.
8-2.
Regular Expressions
Metrics for the Process Class Stratus VOS
Default Workload Class Definition
Default control.harvest File
Default Line from datamgr.conf File
SightLine EA/V and the Power Agent Components
Default Line from datamgr.conf File
Advanced Session Settings Dialog Box
Summary EventClass Data
Chapter 1
Introduction
SightLine is the Stratus client/server software product that monitors, analyzes, and reports on the
performance of computer systems both historically and in real-time. SightLine consists of two
parts:
1. The SightLine Power Agents, which run on the computer(s) to be monitored (the host(s)).
2. The SightLine Expert Advisor/Vision (EA/V) application, which runs on a workstation
   running Microsoft® Windows NT® or Windows 2000® Server or Workstation.
The Power Agent software reduces the raw performance data and sends it to EA/V on the PC to
be analyzed and displayed. This design minimizes SightLine’s impact on system resources and
gives it a powerful graphical environment for displaying and analyzing system performance.
This guide describes how to install and configure the SightLine Power Agent software on
computer systems running the Stratus VOS operating system. The EA/V software is described in
the SightLine Expert Advisor/Vision User’s Guide, which is available on the CD-ROM that
contains the software.
The intended audience for this material is the VOS system administrator.
The SightLine Power Agent runs on all Stratus Continuum models running VOS release 13.5 or
later.
Version 6.1 of the SightLine Power Agent software for Stratus VOS incorporates fundamental
improvements in design and functionality by introducing Dynamic Symbol Table Changes. In
previous releases of the software, the introduction of new symbols, such as a mounted file
system or a new metric, would require overwriting the trace file on the PC. With the introduction of
Dynamic Symbol Table Changes, new symbols will be merged in with the existing trace file. In
addition, the default metric names have been moved out of the configuration file and are now
contained in the Interface Agent. This greatly improves the modular design of the Interface
Agents by eliminating the need, in the previous version, to add metric name mappings in the
vpcom.conf file in order for a metric to appear in EA/V.
Chapter 2
Step-by-Step Host Installation
Before you install the SightLine Power Agent software, make sure you can satisfy the following
requirements:
• Make sure you have the correct Power Agent software for your VOS hardware platform and
  TCP “flavor,” and the necessary AccessKey provided by your software provider.
• You will need about 15 MB of disk space to hold the program modules, and some
  additional space (configurable) to hold the collected performance data. You should allow for
  at least 25 MB of total disk space.
• You have control over how much disk space will be used on your VOS system for storing
  performance data. You can configure the allocation of space by Days, Hours, or
  Megabytes. If you configure by Days or Hours, expect approximately 20 megabytes of disk
  space to be required for each day. The size of the Power Agent trace files depends directly
  on the size and configuration of the system being monitored. Aspects such as the Interface
  Agents loaded, and the number of processes and disks on the system, can significantly
  affect the size. The size of trace files can also be directly affected by altering the number
  of seconds between samples. The relationship between resources required, both CPU
  cycles and disk space, and the sample interval is fairly linear.
• You will need a PC or workstation with at least 50 MB of free disk space and a TCP/IP
  connection to the VOS system.
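The sizing guidance above can be turned into a rough planning estimate. The Python sketch below is an illustration under stated assumptions, not a formula from the product: it assumes the 20 MB per day figure applies at the default 30-second collection interval, and that trace volume scales with the number of samples taken (so doubling the interval halves the data). Actual sizes depend on the system being monitored, as noted above.

```python
PROGRAM_MODULES_MB = 15.0  # program module footprint quoted in this guide


def estimate_trace_mb(days, interval_s=30, mb_per_day=20.0, baseline_interval_s=30):
    """Rough trace-file size for a retention period given in days.

    Assumptions (not from the manual): mb_per_day holds at baseline_interval_s,
    and volume is proportional to the number of samples collected.
    """
    if days <= 0 or interval_s <= 0:
        raise ValueError("days and interval_s must be positive")
    samples_ratio = baseline_interval_s / interval_s
    return days * mb_per_day * samples_ratio


def estimate_total_mb(days, interval_s=30):
    """Program modules plus estimated trace data, in megabytes."""
    return PROGRAM_MODULES_MB + estimate_trace_mb(days, interval_s)


print(estimate_total_mb(3))      # 3 days at the default 30-second interval
print(estimate_total_mb(3, 60))  # same period, sampling every 60 seconds
```

Treat the result as a starting point and check the actual size of the host trace file after a day of collection.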
The software is easy to install. If you are familiar with your platform’s installation facility, you can
install SightLine by following the steps in the next sections.
Installation Steps
You can install the host kits in one of two ways: from a VOS product tape using
install_new_release, or from an FTP bundled image.
VOS Product Tape
You can install the software directly from the tape by using the following commands:
1. Load the SightLine for VOS tape and run install_new_release, as described in the
   Stratus manual VOS Installation Guide (R386-02).
2. Change to the SightLine directory and run the post-installation command macro
   (install_sl.cm). Follow the instructions described in the Post-Installation Configuration
   section.
FTP Bundled Image
1. FTP the kit for your VOS hardware platform from the appropriate FTP site. It should be
   transferred in binary mode (type = binary). Place the kit in any directory you choose.
   SightLine will be installed in a subdirectory.
2. Create the SightLine directory:
       create_dir SightLine
3. Unbundle the package. You will need to have the necessary command macros
   (unbundle.cm) and program modules (decode_vos_file.pm and gzip.pm) installed
   and their location defined in your command library path using the add_library_path
   command. A description of bundle/unbundle and all the necessary files can be found on the
   Stratus public FTP site, ftp://ftp.stratus.com/pub/vos/utility/README.txt.
       unbundle kit-name SightLine
4. Run the post-installation command macro (install_sl.cm), and follow the instructions
   described in the next section.
       change_current_dir SightLine
       install_sl
Post Installation Configuration
The next step is to run the post-installation configuration script, install_sl.cm. The macro will
complete the configuration portion of the installation. It will prompt you for the following
information:
• the AccessKey
• the collection interval
• the amount of data to be stored on the host system
• the IP address or name of the host system
• a 3-character short-id that uniquely identifies this host
1. If you are upgrading from ViewPoint, the following prompt will appear:
       There is an existing installation of ViewPoint in FRTLHOME
       Do you want to upgrade from this? Enter 'n' to change directory
2.     Enter the Access Key string:
   Type the AccessKey string following this prompt.
3.     Enter the data collection interval [default = 30]:
   Enter an integer number of seconds for the collection interval.
4.     Enter the data retention period for the host trace file
       (example formats: 24h, 10m, or 3d) [default = 10m]:
   Enter the maximum amount of data to be stored locally. This can be defined in hours (h),
   days (d), or megabytes (m). The default is 10m, or 10 megabytes.
5.     You must specify the system_name or IP address of this system
       The system name you use must resolve to the IP address of this
       system
       Enter the system name or the IP address:
   Specify an IP address or system name that will resolve to the IP address of the host
   system. A PING will be executed to make sure the name entered resolves properly.
6.     When monitoring data from multiple machines, the SightLine/PC
       client uses a 3 character identifier to uniquely identify a
       particular machine.
       You MUST choose a 1 to 3 character string that will make it
       obvious which machine a particular metric is coming from.
       Enter a new short-id or accept [default = system_name]:
   Enter up to 3 characters to be used as the short-id.
When these configuration parameters have been entered, the macro will ask if you want to start
the software now. When you answer this prompt, the post-installation configuration will be
complete.
N O T E ————————————————————————
Do not start the software at this point if you want to configure the
analyze_system interface. Configure this interface by editing
the SightLine>etc>analyze_system.conf file. If you do
not want this interface, edit SightLine>bin>slagent.cm and
set frtlasi to 0. (Refer to Appendix A for information about the
analyze_system interface.)
To start the software manually, enter the following:
    change_current_dir FRTLHOME>bin
    slagent start
N O T E ————————————————————————
You will need to replace FRTLHOME with the full path to your
SightLine directory.
To stop the software manually, pass stop as the first parameter to the slagent macro instead
of start. The command looks like this (substituting the path for FRTLHOME):
slagent stop
The slagent macro can be used to start the SightLine Power Agent processes in diagnostic
mode. Diagnostic mode causes additional information to be written to the log files in the
FRTLHOME>log directory, which can be of use in diagnosing problems. Use this syntax:
slagent [-da ] [ -ds ] [ -dd ] [ -dt ] { start | stop | restart | status }
where:
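-da    starts agentmgr in diagnostic mode
-ds    starts servd in diagnostic mode
-dd    starts datamgr in diagnostic mode
-dt    starts threshd in diagnostic mode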
There are other configuration parameters that can be modified. See the remaining chapters in this
Power Agent section of the SightLine for VOS Systems User’s Guide for detailed information
about configuring the host software to achieve operational objectives.
I M P O R T A N T ————————————————————
Please review the Release Notes accompanying this software
for any information that may have been available too late to be
included in this guide.
Default Directory Structure
SightLine is now installed in a frtl directory and contains the following directories under the
FRTLHOME directory:
bin        Program modules directory
data       HostTraceFile directory
etc        Configuration directory
lib        Library files directory
log        Log files directory
install    Base files used during the installation process
Each of these directories is described in more detail below.
The bin Directory
The bin directory contains the executable program files and start/stop scripts. The initial
contents of the bin directory are:
Main SightLine program files:
agentmgr.pm
datamgr.pm
servd.pm
protomgr.pm
as_iface.pm
Associated programs and scripts:
slagent.cm
datadump.pm
db2vtx.pm
cvtag43to61.pm
cvtpr43to61.pm
See the section Associated Programs and Scripts for more details.
The data Directory
The data directory contains the host trace files that store the performance data. This file is
circular and, by default, resides in the Local subdirectory.
The data directory is initially empty. The first time datamgr.pm is executed, the Local
subdirectory is created and contains the following files:
registry     performance metrics registry
Local.htf    host trace file
Local.idx    index file
The etc Directory
The etc directory contains the configuration files. There is one configuration file for each of the
main program modules. Each configuration file has the same name as the corresponding
program file, with a .conf suffix.
The initial contents of the etc directory are:
agentmgr.conf
datamgr.conf
servd.conf
protomgr.conf
threshd.conf
analyze_system.conf
control.harvest
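The naming rule above (program name plus a .conf suffix, stored under etc) can be expressed as a short helper. This Python sketch is illustrative only; conf_path is a hypothetical name, and FRTLHOME stands for the full path to your SightLine directory. The rule covers the five program configuration files in the listing; analyze_system.conf and control.harvest are named differently and are not derived by it.

```python
# Programs whose configuration files appear in the etc listing above.
MAIN_PROGRAMS = ["agentmgr", "datamgr", "servd", "protomgr", "threshd"]


def conf_path(frtlhome, program):
    """Return the VOS path of a program's configuration file under etc.

    Accepts either 'agentmgr' or 'agentmgr.pm'; VOS separates path
    components with the '>' character.
    """
    name = program[:-3] if program.endswith(".pm") else program
    return ">".join([frtlhome.rstrip(">"), "etc", name + ".conf"])


for program in MAIN_PROGRAMS:
    print(conf_path("FRTLHOME", program))
```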
The log Directory
The log directory contains diagnostic log files written by the SightLine programs.
The log directory is initially empty. The following files are generated by the corresponding
SightLine programs:
agentmgr.log
datamgr.log
servd.log
protomgr.log
Associated Programs and Scripts
The datadump.pm Program
The datadump program is a diagnostic tool for inspecting the contents of the performance
database files. You should only use it when you are requested to do so by Technical Support
personnel.
The slagent.cm Macro
The slagent command macro is used to manually start and stop the SightLine VOS processes. It
can be edited to select the command line options for these processes.
The db2vtx.pm Macro
The db2vtx program extracts data from the VOS performance database file and converts it into
VTX and VEV format. It provides facilities to select a section of the trace file, using fixed rules or
rules based on the content of the file.
db2vtx obtains its data using the datamgr database manager agent. This is part of the SightLine
Power Agent software. The datamgr agent can be running on the local system, or on a remote
system connected with TCP/IP. Also, the database can contain performance data on the local
system or a remote system.
The Power Agent software sends the symbol table to Expert Advisor/Vision (EA/V) on the PC,
reads all of the data a second time, and then sends the symbol table to EA/V again. This
procedure is followed to avoid sending unnecessary symbol table changes to EA/V. If the initial
data scan is expected to take a significant amount of time, dialog boxes are displayed on EA/V at
the start and end of the scan. These dialog boxes will automatically time-out.
The as_iface.pm Program
The as_iface program is called by the slagent command macro to start the interface with
analyze_system. (Refer to Appendix A for more information about the analyze_system
interface.)
The cvtag43to61.pm and cvtpr43to61.pm Programs
The cvtag43to61 and cvtpr43to61 programs are used to update ViewPoint version 4.3 programs
and command macros to SightLine version 6.1 programs and command macros.
Chapter 3
PC-to-VOS Host Configuration Parameters
The first time you run SightLine Expert Advisor/Vision (EA/V) on the PC, and after you enter an
AccessKey, the agents that are on the same subnet as the EA/V workstation will be displayed in
the Enterprise View (the left pane of the EA/V application window). If no connections are showing
in the Enterprise View, right-click on the Enterprise icon and select AutoDiscover.
Right-click on a hostname and select Edit Connection to examine and modify the settings for a
particular host. If an entry for a particular host does not exist, you will need to manually add
a connection as follows:
1. Right-click the Enterprise icon and select New Host.
2. Enter the hostname in the Name field, and specify the host name or IP address in the
   Host field. If the host name you’ve entered does not resolve to the managed node’s IP
   address, check the hosts file on the local machine or the local Domain Name Server
   (DNS) set up in your environment to ensure that an entry exists for the managed node.
   If DNS is not set up in your environment, use the PC’s hosts file to ensure the
   resolution of machine names with IP addresses.
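The name-resolution check in step 2 can be sketched in a few lines of Python run on the PC. This is only an illustration: the function names are hypothetical, and the resolver is passed in as a parameter so that the logic can be exercised without touching DNS or the hosts file.

```python
import socket


def resolve_host(name, resolver=socket.gethostbyname):
    """Return the IPv4 address that `name` resolves to, or None if it does not resolve."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def resolves_to(name, expected_ip, resolver=socket.gethostbyname):
    """True if `name` resolves to the managed node's expected IP address."""
    return resolve_host(name, resolver) == expected_ip
```

If resolves_to returns False for the name entered in the Host field, the hosts file or DNS entry for the managed node is the first thing to check.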
To configure your PC trace file, right-click the Host Connection, select Edit Connection, and click
the Edit button next to the Trace File field. The Trace File to Capture Data Into dialog box will
appear. Edit this dialog box as follows:
1. The Save in box in the middle of the dialog box specifies the location of the trace file.
   Navigate to the directory where the trace file is to be stored.
2. Enter the name of the PC trace file in the File name text box. The default name is the first
   eight characters of the hostname, unless it contains non-alphanumeric characters.
3. The Host ID field will contain the default host ID, which identifies which trace file each
   metric comes from. You can update the host ID to make it longer or more host-specific.
4. Optionally, fill the File Info text box with a phrase describing the trace file.
5. The default size of the PC trace file (20 Mbytes) is shown in the File Size in MB text box in
   the lower right of the dialog box. If you would like this file to contain approximately 24 hours
   of data, leave this value at the default.
6. In the Dynamic Attributes box, there will be a Reserve Space of 20% allocated by default.
   This will accommodate changes to the trace file’s datablock.
7. Choose either the Append or Create option in the Trace File Initialization section at the
   bottom of the dialog box.
   a. Append accesses the trace files on the corresponding managed system and
      automatically appends to the PC trace file any data from the available times
      in the host trace file. Essentially, the PC trace file catches itself up with any data
      it might have missed due to a loss in the connection. The Define Trace File
      Times to Download dialog box will appear once, when the trace file is first
      initialized. From that point on, SightLine will use the initial host metrics for all
      future downloads and automatically update the trace file with any data that it
      has not yet downloaded from the Agent machine. Be aware that considerable
      host resources can be consumed during this “catch-up” process.
   b. Create forces the Agent machine to send down a new set of host metrics
      each time a download is requested. The Define Trace File Times to
      Download dialog box will appear for each download. Additionally, the trace
      file will be reinitialized and overwritten each time. Create should be used
      each time there is a change on the Managed Agent machine, such as the
      addition of new peripherals, a change in the file systems, or the addition of a
      new processor.
8. Click Save to exit the Trace File to Capture Data Into dialog box.
Now that you have configured where to store the data for this connection, open the
connection to the Power Agent by right-clicking on the connection and selecting Connect
Now! Once the systems have established communications, a dialog box will appear. The
Define Trace File Times to Download dialog box will display a time period that covers
the available times in the host trace file for the download system. If you want to download
data that has already been collected, drag the slider bar left until the desired start time is
displayed. If you want to only download a specific time period, uncheck the Continue
into live session check box and position the start and end sliders so they include the
time period you want to download.
Chapter 4
Agentmgr
The Agent Manager (agentmgr) program module is the coordinator for all the SightLine Agents. It
is intended to run continuously in the background and should be started at boot time. All
performance data is reported through agentmgr by Interface Agents loaded from the
FRTLHOME>lib>interface directory during startup. At each interval, agentmgr makes calls to
each Interface Agent for updated data. It is important to remember that agentmgr will attempt to
load every single file in this directory. In order to deactivate a given Interface Agent, it must be
moved to another directory.
The agentmgr’s configuration file contains the definitions for process filtering, defining logical
workloads, and defining new metrics. Each of these items is discussed in more detail in the
following sections.
By default, the agentmgr uses port 8700. If there is a conflict in your environment and this port is
already in use, you need to edit the slagent.cm command macro in FRTLHOME>bin to start the
agentmgr with an alternate port number. Edit the agentmgr_port variable located near the top
of the macro to reflect the new port number. You will also need to change the port number in
datamgr.conf, as described in Chapter 5 of this manual.
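Before choosing an alternate port, it can help to see whether anything answers on the candidate port of the VOS host. The following is a minimal sketch, assuming a workstation with Python and TCP/IP access to the host; the function name is hypothetical, and a refused connection only shows that nothing is listening at that moment, not that the port is free for every purpose.

```python
import socket


def port_is_listening(host, port=8700, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Note that a True result on port 8700 may simply mean that agentmgr itself is already running on that host.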
The Class Hierarchy
Before describing how the agentmgr’s configuration file can be used, it is helpful to understand
how the various performance metrics are organized by the agentmgr. In addition, some terms will
be defined that are used throughout this document.
Internally, the agentmgr organizes metrics (also called metric variables, or simply variables) into a
hierarchy of metric classes. Each class can have its own set of metrics as well as a collection of
zero or more subclasses. When specifying the names of metrics, the full class membership is
specified using a dot notation (.).
For example, the following metric is defined for the node VOS System:
VOS System.CPU.Idle
The following specifies the metric Cpu Queue Busy Time for the class CPU, which is a
subclass of VOS System:
VOS System.CPU.Cpu Queue Busy Time
In addition, some classes are defined as array classes. An example of a metric for an array class
is:
VOS System.Path.Read Queue Completions
For these classes, array names can be used to specify individual members of the array. If this
Power Agent is configured to monitor three paths (files), there will be three members in the array
class.
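The dot notation can be illustrated with a short Python sketch that splits a fully qualified metric name into its class path and metric name. The function name is hypothetical, and the sketch assumes that the metric name itself contains no period, which holds for the names shown in this chapter.

```python
def split_metric_name(full_name):
    """Split 'Class.Subclass.Metric' into (list of class names, metric name)."""
    class_path, _, metric = full_name.rpartition(".")
    if not class_path or not metric:
        raise ValueError("expected at least one class and a metric: %r" % full_name)
    return class_path.split("."), metric


classes, metric = split_metric_name("VOS System.Path.Read Queue Completions")
# classes is ['VOS System', 'Path'] and metric is 'Read Queue Completions'
```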
The default agentmgr.conf file is shown in Figure 4-1 and described in the following sections.
Comments can be embedded in the configuration files by using the “#” character. Any text that
follows the “#” on the same line will be ignored.
CONFIG key XXXX-YYY-ZZZZZ;
FILTER { VOS System.Processes.Reads }       >= 0.01;
FILTER { VOS System.Processes.Writes }      >= 0.01;
FILTER { VOS System.Processes.Page Faults } >= 0.01;
FILTER { VOS System.Processes.% Cpu }       >= 1.0;
COMPUTATIONS
CLASS { CPU Extra } = { VOS System.CPU }
CRITERIA CPUExtra = { Empty Idle } >= 0;
INCLUSIVE
VARIABLE U_INT { CPU Logical Cpus } GROUPNAME { Module CPU Utilization }
PCNAME { CPU Logical Cpus } POSITION { 1 }
= ({ Cpu Seconds } / { Number Seconds });
VARIABLE FLOAT { CPU Wait Secs } GROUPNAME { Module CPU Utilization }
PCNAME { CPU Wait Time Secs } POSITION { 30 }
= ({ Cpu Queue Wait Time } - { Cpu Queue Busy Time });
VARIABLE FLOAT { CPU % Residence } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Residence } POSITION { 31 }
= (100 * { Cpu Queue Wait Time } / { Cpu Seconds } );
VARIABLE FLOAT { CPU % Busy } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Busy } POSITION { 32 }
= (100 * { Cpu Queue Busy Time } / { Cpu Seconds } );
VARIABLE FLOAT { CPU % Wait } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Wait } POSITION { 33 }
= (100 * ({ Cpu Queue Wait Time } - { Cpu Queue Busy Time })
/ { Cpu Seconds } );
VARIABLE FLOAT { CPU Other Secs } GROUPNAME { Module CPU Utilization }
PCNAME { CPU Other Time Secs } POSITION { 20 }
= { Cpu Seconds } - ({ System } +
{ User } +
{ Server } +
{ Interrupt } +
{ Empty Idle } +
{ User Page Fault Time } +
{ System Page Fault Time } +
{ Server Page Fault Time });
VARIABLE FLOAT { CPU % System } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % System } POSITION { 4 }
= 100 * { System } / { Cpu Seconds };
VARIABLE FLOAT { CPU % User } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % User } POSITION { 5 }
= 100 * { User } / { Cpu Seconds };
VARIABLE FLOAT { CPU % Server } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Server } POSITION { 6 }
= 100 * { Server } / { Cpu Seconds };
VARIABLE FLOAT { CPU % Interrupt } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Interrupts } POSITION { 7 }
= 100 * { Interrupt } / { Cpu Seconds };
VARIABLE FLOAT { CPU % Empty Idle } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Idle } POSITION { 12 }
= 100 * { Empty Idle } / { Cpu Seconds };
VARIABLE FLOAT { CPU % User PF } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % User PF } POSITION { 8 }
= 100 * { User Page Fault Time } / { Cpu Seconds };
VARIABLE FLOAT { CPU % System PF } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % System PF } POSITION { 9 }
= 100 * { System Page Fault Time } / { Cpu Seconds };
VARIABLE FLOAT { CPU % Server PF } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Server PF } POSITION { 10 }
= 100 * { Server Page Fault Time } / { Cpu Seconds };
VARIABLE FLOAT { CPU % Other } GROUPNAME { Module CPU Utilization }
PCNAME { CPU % Other } POSITION { 11 }
= 100 * ({ Cpu Seconds } - ({ System } +
{ User } +
{ Server } +
{ Interrupt } +
{ Empty Idle } +
{ User Page Fault Time } +
{ System Page Fault Time } +
{ Server Page Fault Time }))
/ { Cpu Seconds };
VARSET { Name } = "CPUExtra" criteria CPUExtra;
end CLASS
CLASS { DiskQueue } = { VOS System.IOPs.Busses.Controllers.Disks }
COMPUTE
VARIABLE FLOAT { Read Queue Wait Secs } GROUPNAME { Module Disk Units }
PCNAME { Disk Rd Wait Time Secs } POSITION { 15 }
= { Read Queue Wait Time } - { Read Queue Busy Time };
VARIABLE FLOAT { Write Queue Wait Secs } GROUPNAME { Module Disk Units }
PCNAME { Disk Wr Wait Time Secs } POSITION { 16 }
= { Write Queue Wait Time } - { Write Queue Busy Time };
VARIABLE FLOAT { Disk % Busy } GROUPNAME { Module Disk Units }
PCNAME { Disk % Busy } POSITION { 1 }
= 100 * ({ Read Queue Busy Time } + { Write Queue Busy Time })
/ { Read Queue Time };
VARIABLE FLOAT { Disk I/Os/Sec } GROUPNAME { Module Disk Units }
PCNAME { Disk I/Os/Sec } POSITION { 2 }
= ({ Read Queue Completions } + { Write Queue Completions })
VARIABLE FLOAT { Disk Avg Res Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Res Time } POSITION { 3 }
= 1000 * ({ Read Queue Wait Time } + { Write Queue Wait Time })
/ ({ Read Queue Completions } + { Write Queue Completions });
VARIABLE FLOAT { Disk Avg Serv Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Serv Time } POSITION { 4 }
VARIABLE FLOAT { Disk Avg Queue Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Queue Time } POSITION { 5 }
= 1000 * (({ Read Queue Wait Time } - { Read Queue Busy Time })
+ ({ Write Queue Wait Time } - { Write Queue Busy Time }))
VARIABLE FLOAT { Disk Avg Queue Length } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Queue Length } POSITION { 6 }
= (({ Read Queue Wait Time } - { Read Queue Busy Time })
/ ({ Read Queue Busy Time } + { Write Queue Busy Time });
VARIABLE FLOAT { Disk Degradation } GROUPNAME { Module Disk Units }
PCNAME { Disk Degradation } POSITION { 7 }
= ({ Read Queue Wait Time } + { Write Queue Wait Time })
VARIABLE FLOAT { Disk Concurrency } GROUPNAME { Module Disk Units }
PCNAME { Disk Concurrency } POSITION { 8 }
VARIABLE FLOAT { Disk Avg Rd Res Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Rd Res Time } POSITION { 9 }
= 1000 * { Read Queue Wait Time } / { Read Queue Completions };
VARIABLE FLOAT { Disk Avg Rd Serv Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Rd Serv Time } POSITION { 10 }
= 1000 * { Read Queue Busy Time } / { Read Queue Completions };
VARIABLE FLOAT { Disk Avg Rd Queue Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Rd Queue Time } POSITION { 11 }
= 1000 * ({ Read Queue Wait Time } - { Read Queue Busy Time })
/ { Read Queue Completions };
VARIABLE FLOAT { Disk Avg Wr Res Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Wr Res Time } POSITION { 12 }
= 1000 * { Write Queue Wait Time } / { Write Queue Completions };
VARIABLE FLOAT { Disk Avg Wr Serv Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Wr Serv Time } POSITION { 13 }
= 1000 * { Write Queue Busy Time } / { Write Queue Completions };
VARIABLE FLOAT { Disk Avg Wr Queue Time } GROUPNAME { Module Disk Units }
PCNAME { Disk Avg Wr Queue Time } POSITION { 14 }
= 1000 * ({ Write Queue Wait Time } - { Write Queue Busy Time })
/ { Write Queue Completions };
end CLASS
CLASS { PathQueue } = { VOS System.Path }
COMPUTE
VARIABLE FLOAT { Path Read Wait Secs } GROUPNAME { Module Path Meters }
PCNAME { Path Rd Wait Time Secs } POSITION { 15 }
= { Read Queue Wait Time } - { Read Queue Busy Time };
VARIABLE FLOAT { Path Write Wait Secs } GROUPNAME { Module Path Meters }
PCNAME { Path Wr Wait Time Secs } POSITION { 16 }
= { Write Queue Wait Time } - { Write Queue Busy Time };
VARIABLE FLOAT { Path % Busy } GROUPNAME { Module Path Meters }
PCNAME { Path % Busy } POSITION { 1 }
VARIABLE FLOAT { Path I/Os/Sec } GROUPNAME { Module Path Meters }
PCNAME { Path I/Os/Sec } POSITION { 2 }
VARIABLE FLOAT { Path Avg Res Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Res Time } POSITION { 3 }
= 1000 * ({ Read Queue Wait Time } + {Write Queue Wait Time })
VARIABLE FLOAT { Path Avg Serv Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Serv Time } POSITION { 4 }
VARIABLE FLOAT { Path Avg Queue Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Queue Time } POSITION { 5 }
= 1000 * (({ Read Queue Wait Time } - { Read Queue Busy Time })
VARIABLE FLOAT { Path Avg Queue Length } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Queue Length } POSITION { 6 }
= (({ Read Queue Wait Time } - { Read Queue Busy Time })
VARIABLE FLOAT { Path Degradation } GROUPNAME { Module Path Meters }
PCNAME { Path Degradation } POSITION { 7 }
VARIABLE FLOAT { Path Concurrency } GROUPNAME { Module Path Meters }
PCNAME { Path Concurrency } POSITION { 8 }
VARIABLE FLOAT { Path Avg Rd Res Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Rd Res Time } POSITION { 9 }
= 1000 * { Read Queue Wait Time } / { Read Queue Completions };
VARIABLE FLOAT { Path Avg Rd Serv Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Rd Serv Time } POSITION { 10 }
= 1000 * { Read Queue Busy Time } / { Read Queue Completions };
VARIABLE FLOAT { Path Avg Rd Queue Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Rd Queue Time } POSITION { 11 }
= 1000 * ({ Read Queue Wait Time } - { Read Queue Busy Time })
/ { Read Queue Completions };
VARIABLE FLOAT { Path Avg Wr Res Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Wr Res Time } POSITION { 12 }
= 1000 * { Write Queue Wait Time } / { Write Queue Completions };
VARIABLE FLOAT { Path Avg Wr Serv Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Wr Serv Time } POSITION { 13 }
= 1000 * { Write Queue Busy Time } / { Write Queue Completions };
VARIABLE FLOAT { Path Avg Wr Queue Time } GROUPNAME { Module Path Meters }
PCNAME { Path Avg Wr Queue Time } POSITION { 14 }
= 1000 * ({ Write Queue Wait Time } - { Write Queue Busy Time })
/ { Write Queue Completions };
end CLASS
CLASS { Process Info } = { VOS System.Processes }
CRITERIA Stop
= { State } = /Stopped/;
CRITERIA Ready
= { State } = /Rdy/;
CRITERIA Frozen
= { State } = /Frozen/;
CRITERIA WaitShort
= { State } = /WaitShrt/;
EXCLUSIVE
VARIABLE U_INT { TotalS } GROUPNAME { Module Processes }
PCNAME { Procs Count } POSITION { 1 } = 1;
VARSET { Name } = "Stopped" criteria Stop;
VARSET { Name } = "Rdy" criteria Ready;
VARSET { Name } = "Frozen" criteria Frozen;
VARSET { Name } = "WaitShrt" criteria WaitShort;
end CLASS
CLASS { Workloads } = { VOS System.Processes }
CRITERIA System
= { Process Name } = /TheOverseer/ OR
{ Process Name } = /BatchOverseer/ OR
{ Process Name } = /mail_handler/ OR
{ Process Name } = /rsn/ OR
{ Process Name } = /TPOverseer/ OR
{ Process Name } = /Cache_Manager/;
CRITERIA LinkServer
= { Process Name } = /LinkServer/ OR
{ Process Name } = /osl_server/ ;
CRITERIA OtherServer
= { Process Name } = /network_client/ OR
{ Process Name } = /network_server/ OR
{ Process Name } = /open_client/ OR
{ Process Name } = /open_server/;
CRITERIA SightLine
= { Process Name } = /agentmgr/ OR
{ Process Name } = /servd/ OR
{ Process Name } = /datamgr/ OR
{ Process Name } = /protomgr/ OR
{ Process Name } = /threshd/;
CRITERIA FtpD
= { Program Name } = /ftpd.pm/;
CRITERIA InetD
= { Program Name } = /inetd.pm/;
CRITERIA PagingD
= { Person Name } = /Paging_Daemon/;
CRITERIA QrunD
= { Process Name } = /Qrun_Daemon/;
CRITERIA MiscUtil
= { Process Name } = /Maintenance_Utility/ OR
{ Process Name } = /Diagnostic_Utility/ OR
{ Process Name } = /Kernel_Utility/;
CRITERIA Other
= 1;
EXCLUSIVE
VARIABLE FLOAT { % Cpu } GROUPNAME { Module Workloads }
PCNAME { Wkld % Cpu } POSITION { 1 }
= { % Cpu };
VARIABLE FLOAT { Page Faults } GROUPNAME { Module Workloads }
PCNAME { Wkld PgFlt } POSITION { 2 }
= { Page Faults };
VARIABLE FLOAT { % Page Fault Time } GROUPNAME { Module Workloads }
PCNAME { Wkld % PgFlt Time } POSITION { 3 }
= { % Page Fault };
VARIABLE FLOAT { Reads } GROUPNAME { Module Workloads }
PCNAME { Wkld Reads } POSITION { 4 }
= { Reads };
VARIABLE FLOAT { Writes } GROUPNAME { Module Workloads }
PCNAME { Wkld Writes } POSITION { 5 }
= { Writes };
VARIABLE FLOAT { CPU Completions/Sec } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Completes/Sec } POSITION { 7 }
= { Cpu Queue Completions/Sec };
VARIABLE FLOAT { CPU Queue Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Queue ET } POSITION { 8 }
= { Cpu Queue Time };
VARIABLE FLOAT { CPU Residence Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Residence Time } POSITION { 9 }
= { Cpu Queue Wait Time };
VARIABLE FLOAT { CPU Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Busy Time } POSITION { 10 }
= { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU Wait Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Wait Time } POSITION { 11 }
= { Cpu Queue Wait Time } - { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU % Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU % Busy Time } POSITION { 12 }
= { % Cpu Queue Busy Time };
VARIABLE FLOAT { Disk Read Completions/Sec } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Completes/Sec } POSITION { 13 }
= { Disk Read Queue Completions/Sec };
VARIABLE FLOAT { Disk Read Queue Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Queue ET } POSITION { 14 }
= { Disk Read Queue Time };
VARIABLE FLOAT { Disk Read Residence Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Res Time } POSITION { 15 }
= { Disk Read Queue Wait Time };
VARIABLE FLOAT { Disk Read Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Busy Time } POSITION { 16 }
= { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read Wait Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Wait Time } POSITION { 17 }
= { Disk Read Queue Wait Time } - { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read % Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd % Busy Time } POSITION { 18 }
= { % Disk Read Queue Busy Time };
VARIABLE FLOAT { Cache Read Hits } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Read Hits } POSITION { 19 }
= { Cache Read Hits };
VARIABLE FLOAT { Cache Read Misses } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Read Misses } POSITION { 20 }
= { Cache Read Misses };
VARIABLE FLOAT { Cache Soiled } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Soiled } POSITION { 21 }
= { Cache Soiled };
VARIABLE FLOAT { Interrupts Pluses } GROUPNAME { Module Workloads }
PCNAME { Wkld Interrupts } POSITION { 22 }
= { Interrupts Pluses };
VARIABLE FLOAT { New Shared Memory } GROUPNAME { Module Workloads }
PCNAME { Wkld Shared Memory } POSITION { 23 }
= { New Shared Memory };
VARIABLE FLOAT { New Unshared Memory } GROUPNAME { Module Workloads }
PCNAME { Wkld Unshared Memory } POSITION { 24 }
= { New Unshared Memory };
VARIABLE U_INT { Total } GROUPNAME { Module Workloads }
PCNAME { Wkld Total } POSITION { 6 }
= 1;
VARSET { Name } = "System" criteria System;
VARSET { Name } = "LinkServer" criteria LinkServer;
VARSET { Name } = "OtherServer" criteria OtherServer;
VARSET { Name } = "SightLine" criteria SightLine;
VARSET { Name } = "FtpD" criteria FtpD;
VARSET { Name } = "InetD" criteria InetD;
VARSET { Name } = "PagingD" criteria PagingD;
VARSET { Name } = "MiscUtil" criteria MiscUtil;
VARSET { Name } = "QrunD" criteria QrunD;
VARSET { Name } = "Other" criteria Other;
end CLASS
end COMPUTATIONS
Figure 4-1. Default agentmgr.conf File
The CONFIG Statement
The first section of the agentmgr.conf file allows users to configure two important parameters:
the AccessKey and the maximum number of processes to be managed (the nproc parameter).
Each of these parameters is configured with the CONFIG keyword and is described in detail in the
following sections.
AccessKey Configuration
In order to run the SightLine Power Agent software, a valid AccessKey must be entered in the
agentmgr.conf file. This key should be provided by the vendor from whom you received the
software, and is entered during installation. This key controls the expiration of the software. To
configure the AccessKey, edit the CONFIG key line of the agentmgr.conf file as shown in the
example below. Be sure to keep the semicolon at the end of the line.
Example:
CONFIG key XXXX-YYY-ZZZZZ;
Nproc Configuration
The SightLine Power Agent monitors the performance of a system at the detailed level of the
processes that are consuming system resources. In other words, users not only have the ability to
view overall system metrics such as CPU utilization, system cache statistics, and paging activity,
but with SightLine they also are able to see the processes that actually account for these
resources.
To control the overhead of performance management, the agentmgr has the ability to limit the
number of processes it collects at each interval. This limits the space required to store the
performance data. The agentmgr limits the number of processes stored using the nproc
parameter. Although the agentmgr looks at all the processes running on the system, it only stores
the top ones. The most active processes are determined using an algorithm based on the
FILTER statements. (See the next section.) Note that the nproc limit is only applied if there is at
least one relevant FILTER statement. To configure the nproc limit, add a CONFIG nproc line
to the agentmgr.conf file as shown in the example below.
Example:
CONFIG nproc 30;
In this example, the nproc limit is configured to be the top 30 processes.
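To make the shape of the CONFIG statements concrete, the following Python sketch reads them from configuration text. It is not the parser used by agentmgr. The function name is hypothetical, and it assumes only what this chapter states: a CONFIG statement consists of the keyword, a parameter name, and a value, it ends with a semicolon, and text after “#” on a line is a comment.

```python
def parse_config_statements(text):
    """Return {parameter: value} for every 'CONFIG <parameter> <value>;' statement."""
    # Remove comments: everything from '#' to the end of each line.
    stripped = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    settings = {}
    # Statements end with ';' and may be spread over several lines.
    for statement in stripped.split(";"):
        words = statement.split()
        if len(words) == 3 and words[0] == "CONFIG":
            settings[words[1]] = words[2]
    return settings


sample = """
CONFIG key XXXX-YYY-ZZZZZ;   # AccessKey supplied by the vendor
CONFIG nproc 30;             # keep the top 30 processes
"""
settings = parse_config_statements(sample)
# settings is {'key': 'XXXX-YYY-ZZZZZ', 'nproc': '30'}
```

Values are returned as strings; converting nproc to an integer is left to the caller.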
FILTER Statements
The agentmgr has the ability to filter out “uninteresting” or unwanted members of array classes.
This helps decrease the overhead of performance management, because less space is required
to store performance data after extraneous members have been filtered out. For example, given
the array set of processes, it is rarely the idle process that causes performance problems.
Process filtering is accomplished with two methods. The first is with the nproc parameter, which
sets the limit of the maximum number of processes saved. The second method is by specifying
FILTER statements in the agentmgr’s configuration file (agentmgr.conf). Note that the FILTER
statements can be used to filter members of any array class, and nproc applies to all classes
that have filters.
There can be zero or more FILTER statements; all will be included in the agentmgr’s
consideration of which class members to keep. If no FILTER statements are used, then the
agentmgr will process all the members of all array classes. Once a member has met the criteria
for any one of the FILTER statements, it is included in the collection.
The syntax of the FILTER statement is as follows:
FILTER { <array class>.<metric name> } >= value;
The <array class> is any fully specified array class. The <metric name> is any metric
name that is defined for the array class.
The FILTER statements in the default agentmgr.conf (Figure 4-1) are:
FILTER { VOS System.Processes.Reads }       >= 0.01;
FILTER { VOS System.Processes.Writes }      >= 0.01;
FILTER { VOS System.Processes.Page Faults } >= 0.01;
FILTER { VOS System.Processes.% Cpu }       >= 1.0;
These statements specify that a process will be filtered out unless it meets at least one of the
following criteria:
• Consumes 1% or more of the CPU, or
• Generates 0.01 or more page faults per second, or
• Performs 0.01 or more reads or writes per second
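The "keep a member if it satisfies any FILTER" rule can be sketched in Python as follows. This illustrates the rule as described in this section and is not the agentmgr implementation; the function names and the dictionary representation of a process are assumptions made for the example.

```python
import re

# Matches: FILTER { <array class>.<metric name> } >= value;
FILTER_RE = re.compile(r"FILTER\s*\{\s*(.+?)\s*\}\s*>=\s*([0-9.]+)\s*;")


def parse_filters(text):
    """Return a list of (full metric name, threshold) pairs from FILTER statements."""
    return [(name, float(value)) for name, value in FILTER_RE.findall(text)]


def is_kept(member, filters):
    """Keep a member if there are no filters, or if any one filter threshold is met."""
    if not filters:
        return True
    return any(member.get(name, 0.0) >= threshold for name, threshold in filters)


DEFAULT_FILTERS = parse_filters("""
FILTER { VOS System.Processes.Reads }       >= 0.01;
FILTER { VOS System.Processes.Writes }      >= 0.01;
FILTER { VOS System.Processes.Page Faults } >= 0.01;
FILTER { VOS System.Processes.% Cpu }       >= 1.0;
""")

busy = {"VOS System.Processes.% Cpu": 2.5}   # hypothetical process sample
idle = {"VOS System.Processes.% Cpu": 0.0}   # hypothetical process sample
```

With the default filters, is_kept(busy, DEFAULT_FILTERS) is True and is_kept(idle, DEFAULT_FILTERS) is False.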
The COMPUTATIONS Section
The COMPUTATIONS section of the agentmgr.conf file is used to define user-specific metrics. It
accomplishes this by creating new array classes using the basic array classes that are delivered
with the software.
The syntax of the COMPUTATIONS section is as follows:
COMPUTATIONS
CLASS <New Array Class> = <Existing Array Class>
CRITERIA <Criteria_Name> = <Criteria Boolean>
.
.
INCLUSIVE | EXCLUSIVE | COMPUTE
VARIABLE U_INT | INT | FLOAT
{ New Metric Name }
GROUPNAME { EA/V Group Name }
PCNAME { EA/V Metric Name }
POSITION { Position in Metric list }
= <Metric Expression>;
.
.
VARSET { Name } = "<VarSetName>" criteria <Criteria_Name>;
.
.
VARSET { Name } = "Others" criteria <System_others>;
end CLASS
end COMPUTATIONS
The COMPUTATIONS section consists of one or more CLASS definitions. These definitions are
used to define new array classes. A new array class is specified by first identifying an existing
array class. At each sampling interval, all members of the existing array class are examined and
the values of its metrics are evaluated. If specified criteria are met, metrics for the newly specified
class are computed.
CLASS
A CLASS specification is used to define a new array class. It uses CRITERIA, VARIABLE,
VARSET, and COMPUTE statements for its definition.
The syntax of the CLASS statement is as follows:
CLASS <New Array Class> = <Existing Array Class>
<New Array Class> is the name of the array class being created, and <Existing Array
Class> is the name of the existing array class upon which the new array class will be based.
CRITERIA
A CRITERIA statement defines which members of the existing array class belong to a member
of the new array class. Membership is determined by referencing metrics in the existing class.
The syntax of the CRITERIA statement is as follows:
CRITERIA <Criteria_Name> = <Criteria Boolean>
<Criteria Boolean> evaluates to a boolean value. The boolean expression can include =, >,
<=, and >= when comparing metrics to values. Also, a number of comparisons can be joined
using logical operators, such as OR and AND. An example would be:
{ metric1 } = /S/ OR { metric2 } >= 3
The pattern matching used for CRITERIA evaluation utilizes regular expressions. Regular
expressions are presented in Figure 4-2, which explains the symbols used.
Regular Expressions

Basic regular expressions (string-matching regexps):

Metacharacter  Matches                                 Example matches
*              Zero or more occurrences of preceding   Pattern: Th*omas
               character or regexp                     Matches: Thomas, Tomas, Thhomas
. (period)     Any single character                    Pattern: string1.
                                                       Matches: string12, string13, etc.
[...]          Any character enclosed in the           Pattern: string[12]
               brackets                                Matches: string1 and string2 only
[^...]         Any character not enclosed in the       Pattern: string[^12]
               brackets                                Matches: string3, string4, etc.
^              Start of line                           Pattern: ^Tom
                                                       Matches: “Tom is here”
$              End of line                             Pattern: Tom$
                                                       Matches: “Here is Tom”

Extended regular expressions (string-matching regexps): all BRE constructions plus:

Metacharacter  Matches                                 Example matches
+              One or more occurrences of preceding    Pattern: A+
               character or regexp                     Matches: A, AA, AAA, etc.
?              Zero or one occurrence of preceding     Pattern: BA?
               character or regexp                     Matches: B and BA
Figure 4-2. Regular Expressions
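The metacharacters in Figure 4-2 can be tried out with Python's re module, which supports all of them. This is only a convenient way to experiment: the regular expression engine used by agentmgr is not Python's, so edge cases may differ. In this sketch the period example is written as string1. and the helper name is hypothetical.

```python
import re


def all_match(pattern, candidates):
    """True if every candidate string is matched in full by the pattern."""
    return all(re.fullmatch(pattern, text) is not None for text in candidates)


# Patterns from Figure 4-2, each with strings it is expected to match in full.
EXAMPLES = [
    ("Th*omas", ["Thomas", "Tomas", "Thhomas"]),
    ("string1.", ["string12", "string13"]),
    ("string[12]", ["string1", "string2"]),
    ("string[^12]", ["string3", "string4"]),
    ("A+", ["A", "AA", "AAA"]),
    ("BA?", ["B", "BA"]),
]

results = {pattern: all_match(pattern, texts) for pattern, texts in EXAMPLES}

# Anchors are easier to see with a search inside a longer line.
starts_with_tom = re.search("^Tom", "Tom is here") is not None
ends_with_tom = re.search("Tom$", "Here is Tom") is not None
```

Every entry in results is True, as are both anchor checks.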
NOTE: To use any of the metacharacters literally, place a “\” in front of
any character that has a metacharacter interpretation.
Example:
\$STRING will be read literally as $STRING instead of a shell variable called STRING.
EXCLUSIVE | INCLUSIVE
The keyword EXCLUSIVE specifies that, once a CRITERIA specification is met, no other criteria
are considered. The VARSETS are mutually exclusive in this respect.
The keyword INCLUSIVE allows a member of the existing array class to be included in more than
one member of the new array class. The VARSETS are inclusive and there can be overlapping
data within the array class.
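The difference between the two keywords can be sketched in Python. The workload names and patterns below are a shortened version of the Workloads class in Figure 4-1; the function names and data layout are hypothetical, and the sketch models only membership, not the metric values that agentmgr computes.

```python
import re

# (VARSET name, patterns tested against the process name).
# None stands for a criterion that is always true, such as "CRITERIA Other = 1".
WORKLOADS = [
    ("System", ["TheOverseer", "BatchOverseer"]),
    ("SightLine", ["agentmgr", "servd", "datamgr"]),
    ("Other", None),
]


def meets(patterns, process_name):
    """True if the criterion is always true or any pattern is found in the name."""
    if patterns is None:
        return True
    return any(re.search(p, process_name) for p in patterns)


def assign(process_names, workloads, exclusive):
    """Map each VARSET name to the processes that fall into it."""
    members = {name: [] for name, _ in workloads}
    for proc in process_names:
        for name, patterns in workloads:  # criteria are tried in order
            if meets(patterns, proc):
                members[name].append(proc)
                if exclusive:
                    break  # EXCLUSIVE: stop at the first criterion met
    return members


procs = ["TheOverseer", "agentmgr", "my_app"]
exclusive_sets = assign(procs, WORKLOADS, exclusive=True)
inclusive_sets = assign(procs, WORKLOADS, exclusive=False)
```

With EXCLUSIVE, Other receives only my_app. With INCLUSIVE, the always-true Other criterion collects all three processes, so the sets overlap.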
COMPUTE
The keyword COMPUTE establishes another way to define a new array class: computing values
based on existing metrics. In the following example, we would like to have some new metrics for
each disk that is a member of the VOS System.IOPs.Busses.Controllers.Disks array
class. This is accomplished by creating a new array class, DiskQueue. Among many new
metrics are: Disk % Busy, Disk I/Os/Sec, and Disk Avg Res Time. The members of the
new array class will have the same names as the members of the existing array class.
CLASS { DiskQueue } = { VOS System.IOPs.Busses.Controllers.Disks }
COMPUTE
VARIABLE FLOAT { Disk % Busy }
GROUPNAME { Module Disk Units }
PCNAME { Disk % Busy }
POSITION { 1 }
VARIABLE FLOAT { Disk I/Os/Sec }
PCNAME { Disk I/Os/Sec }
POSITION { 2 }
VARIABLE FLOAT { Disk Avg Res Time }
PCNAME { Disk Avg Res Time }
POSITION { 3 }
= 1000 * ({ Read Queue Wait Time } + { Write Queue Wait Time })
end CLASS
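As a worked example of a computed metric, the following Python sketch applies the Disk Avg Res Time expression from Figure 4-1, 1000 * (Read Queue Wait Time + Write Queue Wait Time) / (Read Queue Completions + Write Queue Completions), to each member of a disk array. The function names, the sample values, and the choice to return 0 when there are no completions are assumptions made for the example; how agentmgr handles a zero divisor is not stated in this guide.

```python
def disk_avg_res_time(read_wait, write_wait, read_done, write_done):
    """Disk Avg Res Time as defined in the DiskQueue class of Figure 4-1."""
    completions = read_done + write_done
    if completions == 0:
        return 0.0  # assumption: report 0 when no I/O completed in the interval
    return 1000 * (read_wait + write_wait) / completions


def compute_disk_queue(disks):
    """Return the new metric per disk; members keep the names of the existing class."""
    return {
        name: disk_avg_res_time(
            m["Read Queue Wait Time"],
            m["Write Queue Wait Time"],
            m["Read Queue Completions"],
            m["Write Queue Completions"],
        )
        for name, m in disks.items()
    }


# Hypothetical interval sample for two disks.
sample_disks = {
    "disk_a": {"Read Queue Wait Time": 0.3, "Write Queue Wait Time": 0.2,
               "Read Queue Completions": 40, "Write Queue Completions": 10},
    "disk_b": {"Read Queue Wait Time": 0.0, "Write Queue Wait Time": 0.0,
               "Read Queue Completions": 0, "Write Queue Completions": 0},
}
avg_res = compute_disk_queue(sample_disks)
```

For disk_a the result is 1000 * 0.5 / 50, that is, 10.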
VARIABLE
Metrics for the new array class are defined with VARIABLE statements. The syntax of the
VARIABLE statement is as follows:
VARIABLE U_INT | INT | FLOAT
{ New Metric Name }
GROUPNAME { EA/V Group Name }
PCNAME { EA/V Metric Name }
POSITION { Position in Metric list }
= <Metric Expression>;
VARIABLEs can have one of three types: unsigned integer (U_INT), integer (INT), or float
(FLOAT). All names must be unique throughout all class definitions.
GROUPNAME { EA/V Group Name } specifies the group in the SightLine Expert Advisor/Vision
(EA/V) metric list under which the defined metric will appear (specifically, in the Edit Plot
Variable List dialog box).
PCNAME { EA/V Metric Name } is the name that will appear in the EA/V metric list.
POSITION { Position in Metric list } defines the order in which the metric will appear
in the group.
<Metric Expression> is a simple numeric expression that can be formed with metric names
from the existing array class and the numeric operators *, +, -, and /. Note that these expressions
are evaluated strictly from left to right, with no operator precedence; use parentheses to force a
different order. For example, the following evaluates to 26 (not 16):
10 + 3 * 2
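The result of 26 quoted above is what one obtains when each operator is applied in the order written, with no multiply-before-add rule. The following Python sketch illustrates that arithmetic only; it is not agentmgr's parser, the function name eval_in_order is hypothetical, and it handles only flat, space-separated numeric expressions without parentheses or metric names.

```python
import operator

# Hypothetical helper: applies each operator as it is encountered,
# ignoring the conventional "multiply before add" precedence.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_in_order(expr: str) -> float:
    tokens = expr.split()                 # e.g. ["10", "+", "3", "*", "2"]
    result = float(tokens[0])
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], float(tokens[i + 1])
        result = OPS[op](result, operand)
    return result

print(eval_in_order("10 + 3 * 2"))        # (10 + 3) * 2 -> 26.0
print(10 + 3 * 2)                         # conventional precedence -> 16
```

When the intended grouping differs from the written order, parentheses (as used in the DiskQueue example) make the intent explicit.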
VARSET
The VARSET statement is used to define the name for referencing a member of the new array
class. It associates the name with the members that met the specific criteria defined for the
array class.
Example of a CLASS Specification
The following example will help explain the syntax and semantics of the CLASS definition. This
particular example may not apply to your system, but it provides a simple way to gain insights into
the CLASS definition. In this example, we want to evaluate the % Busy metric for each member of
the Disks array class. Using this metric for the CRITERIA definition, we will create values for the
I/Os and Total metrics for a new array class called Disk Activity. The two members of this
new array class are referenced by the names Hot Disks and Cool Disks.
CLASS { Disk Activity } = { VOS System.IOPs.Busses.Controllers.Disks }
CRITERIA IsHotDisk = { % Busy } >= 50;
CRITERIA IsCoolDisk = { % Busy } >= 25;
EXCLUSIVE
VARIABLE FLOAT { I/Os } GROUPNAME { Disk Activity }
PCNAME { Disk Act I/Os }
POSITION { 1 }
= { I/Os };
VARIABLE U_INT { TotalD } GROUPNAME { Disk Activity }
PCNAME { Disk Act Total }
POSITION { 2 }
= 1;
VARSET { Name } = "Hot Disks" criteria IsHotDisk;
VARSET { Name } = "Cool Disks" criteria IsCoolDisk;
end CLASS
The CLASS statement is used to establish the new array class and the existing array class that
will be used to create the values for the metrics.
The CRITERIA statements are evaluated in order. Note that the keyword EXCLUSIVE is
included. This means that if a member meets the first CRITERIA statement, it will not be checked
to see if it meets any other criteria. In this way, disks that are 50% busy or greater will not be
included in the criterion for 25% busy or greater.
Two metrics are defined for the new array class: I/Os and TotalD. These are the internal
registry names, which will appear in the registry.csv file under the FRTLHOME>data>Local
directory. (Please refer to the section Registry.csv Metric Description File for more information
about the registry.csv file.) The PCNAME settings, Disk Act I/Os and Disk Act
Total, are the names of the metrics that will appear in EA/V.
When a member of the existing array class matches the proper criterion, the value of its metric is
added to the newly defined metric. Notice that, by adding one (1) to the TotalD metric, we will
have the total number of disks that met the specified criterion.
The new array class will have two members that can be referenced with the names Hot Disks
and Cool Disks. Membership in each array entry is based on the criteria defined above.
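The behavior described above (criteria tested in order, EXCLUSIVE stopping at the first match, and matching members' values being summed) can be sketched in Python. This is an illustration only, not agentmgr code: the disk names and sample values are hypothetical, and the non-exclusive branch assumes that, without EXCLUSIVE, a member is added to every criterion it satisfies.

```python
# Hypothetical sample data: (disk name, % Busy, I/Os) for one interval.
disks = [("disk1", 72.0, 140.0), ("disk2", 31.0, 55.0),
         ("disk3", 55.0, 90.0), ("disk4", 10.0, 12.0)]

# Criteria in the order they appear in the CLASS definition.
criteria = [
    ("Hot Disks", lambda busy: busy >= 50),    # IsHotDisk
    ("Cool Disks", lambda busy: busy >= 25),   # IsCoolDisk
]

def aggregate(disks, criteria, exclusive=True):
    totals = {name: {"I/Os": 0.0, "TotalD": 0} for name, _ in criteria}
    for _, busy, ios in disks:
        for name, test in criteria:
            if test(busy):
                totals[name]["I/Os"] += ios     # = { I/Os };
                totals[name]["TotalD"] += 1     # = 1;
                if exclusive:
                    break                       # first matching criterion wins
    return totals

result = aggregate(disks, criteria)
print(result["Hot Disks"])    # {'I/Os': 230.0, 'TotalD': 2}
print(result["Cool Disks"])   # {'I/Os': 55.0, 'TotalD': 1}
```

Note that disk4, at 10% busy, satisfies neither criterion and so contributes to neither member.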
In order to make this new array class visible in EA/V, we will have to add specifications to the
protomgr.conf file. The following shows how we enable the Disk Activity array class.
In the ENABLE section, include:
ENABLE
{ Disk Activity }
Using SightLine EA/V, you would be able to monitor these four new metrics:
Disk Act I/Os for Hot Disks
Disk Act Total for Hot Disks
Disk Act I/Os for Cool Disks
Disk Act Total for Cool Disks
The Workloads Class
One very important class defined in the configuration file is the Workloads class. From the
standard set of VOS performance data that SightLine delivers, it is relatively simple to track
resource utilization on a system-wide basis. However, to effectively manage your system, you
also need to track specific users, groups of users, or applications. Only then will you be able to
answer questions like:
• How much of my CPU are the programmers using during prime time?
• How much memory does the production application really need?
• How much I/O is my application really doing?
The semantics and syntax of workload definitions follow the description above for CLASS, but
there are some important issues to address before you start to define this class for your
organization. You should first consider exactly what your organization does, how it is organized,
and how the VOS system provides services for the organization within the context of that overall
scheme. Ideally, the workloads you define will equate either to functional areas within your
organization, specific applications or sets of application images, or some other distinguishable
“grouping” for processes doing related work on your system. That way, you will gain more insight
about how your organization and your system fit together, how one workload affects the
performance of other workloads, and how to keep the system running well in your unique
environment.
The SightLine Power Agent allows you to define logical workloads, which group the many
processes that make up your total processing load into manageable, functionally related sets
of metrics. The agent then collects and delivers data that indicates how each of these workloads
is behaving in terms of activity, resource utilization, and system impact. Inspect the Workloads class in the
default configuration file. Then, use the various metrics from the existing array class of processes
to create meaningful members of the Workloads array class.
The important thing to remember is that capturing and reporting workload measurements is
essentially a two-step process. The first step is to decide which processes fit into which logical
bucket or workload. To accomplish step one, agentmgr uses CRITERIA. Within each CRITERIA,
users can test for several attributes that a process might have, such as the Person Name,
Process Name, Program Name, Group Name, or Terminal Name. Once the CRITERIA have
been established, a name is assigned using VARSET specifications. In the second step, you must
decide what metrics are to be collected, and specify these metrics using VARIABLE statements.
The Processes Class
In order to understand this example, it is important to know the metrics associated with the array
class VOS System.Processes. The following table (Figure 4-3) lists the Process Class
metrics.
Metric                            Type              Description
Pid                               Integer           Process ID
Process Name                      String            Process Name
Person Name                       String            Person Name
Group Name                        String            Group Name
Program Name                      String            Program Name
Terminal Name                     String            Terminal
Priority                          Integer           Process priority
Login Time                        String            Login time/date stamp
% Cpu                             Float             CPU Utilization
% Page Fault                      Float             % CPU time processing page faults
Reads                             Float             Reads/Second
Writes                            Float             Writes/Sec
Page Faults                       Float             Page Faults/Sec
State                             String            Processor ready state
Cpu Time Limit                    String
Invoking Process Identity         Integer
Clone Level                       Integer
Subprocesses                      Integer
Time Last Run                     String
Memory Pool                       Integer
% Cpu Queue Time                  Float             Elapsed Time of the sample, expressed as a %
Cpu Queue Time                    Float             Elapsed Time of the sample
% Cpu Queue Wait Time             Float             Residence Time, expressed as a %
Cpu Queue Wait Time               Float             Residence Time
Cpu Queue Completions             Float             Processor visits
Cpu Queue Completions/Sec         Float             Processor visits, expressed as a rate per second
Cpu Queue Busy Time               Float             Busy time, actually using the processor
% Cpu Queue Busy Time             Float             Busy time, expressed as a %
% Disk Read Queue Time            Float             Elapsed Time of the sample, expressed as a %
Disk Read Queue Time              Float             Elapsed Time of the sample
% Disk Read Queue Wait Time       Float             Residence Time, expressed as a %
Disk Read Queue Wait Time         Float             Residence Time
Disk Read Queue Completions       Float             Disk visits
Disk Read Queue Completions/Sec   Float             Disk visits, expressed as a rate per second
Disk Read Queue Busy Time         Float             Busy time, actually using the disk(s)
% Disk Read Queue Busy Time       Float             Busy time, expressed as a %
Cache Read Hits                   Float             Read Hits/second
Cache Read Misses                 Float             Read Misses/second
Cache Soiled                      Float             Cache pages soiled/second
Interrupts Pluses                 Float             Interrupt starts
Interrupts Minuses                Float             Interrupt completions
Page Fault Pluses                 Float             Page fault starts
Page Fault Minuses                Float             Page fault completions
Shared Memory Pluses              Float             Shared memory pages gained
Shared Memory Minuses             Float             Shared memory pages released
Unshared Memory Pluses            Float             Unshared memory pages gained
Unshared Memory Minuses           Float             Unshared memory pages released
New Page Faults                   Float             Page Fault plusses – minuses
New Interrupts                    Float             Interrupt plusses – minuses
New Shared Memory                 Unsigned Integer  Shared pages plusses – minuses
New Unshared Memory               Unsigned Integer  Unshared pages plusses – minuses
Figure 4-3. Metrics for the Process Class Stratus VOS
Class Definition
The Workload CLASS definition from the default agentmgr.conf configuration file is shown in
Figure 4-4. This class will be discussed in detail.
CLASS { Workloads } = { VOS System.Processes }
CRITERIA System
= { Process Name } = /TheOverseer/ OR
{ Process Name } = /BatchOverseer/ OR
{ Process Name } = /mail_handler/ OR
{ Process Name } = /rsn/ OR
{ Process Name } = /TPOverseer/ OR
{ Process Name } = /Cache_Manager/;
CRITERIA LinkServer
= { Process Name } = /LinkServer/ OR
{ Process Name } = /osl_server/ ;
CRITERIA OtherServer
= { Process Name } = /network_client/ OR
{ Process Name } = /network_server/ OR
{ Process Name } = /open_client/ OR
{ Process Name } = /open_server/;
CRITERIA SightLine
= { Process Name } = /agentmgr/ OR
{ Process Name } = /servd/ OR
{ Process Name } = /datamgr/ OR
{ Process Name } = /protomgr/ OR
{ Process Name } = /threshd/;
CRITERIA FtpD
= { Program Name } = /ftpd.pm/;
CRITERIA InetD
= { Program Name } = /inetd.pm/;
CRITERIA PagingD
= { Person Name } = /Paging_Daemon/;
CRITERIA QrunD
= { Process Name } = /Qrun_Daemon/;
CRITERIA MiscUtil
= { Process Name } = /Maintenance_Utility/ OR
{ Process Name } = /Diagnostic_Utility/ OR
{ Process Name } = /Kernel_Utility/;
CRITERIA Other
= 1;
EXCLUSIVE
VARIABLE FLOAT { % Cpu } GROUPNAME { Module Workloads }
PCNAME { Wkld % Cpu } POSITION { 1 }
= { % Cpu };
VARIABLE FLOAT { Page Faults } GROUPNAME { Module Workloads }
PCNAME { Wkld PgFlt } POSITION { 2 }
= { Page Faults };
VARIABLE FLOAT { % Page Fault Time } GROUPNAME { Module Workloads }
PCNAME { Wkld % PgFlt Time } POSITION { 3 }
= { % Page Fault };
VARIABLE FLOAT { Reads } GROUPNAME { Module Workloads }
PCNAME { Wkld Reads } POSITION { 4 }
= { Reads };
VARIABLE FLOAT { Writes } GROUPNAME { Module Workloads }
PCNAME { Wkld Writes } POSITION { 5 }
= { Writes };
VARIABLE FLOAT { CPU Completions/Sec } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Completes/Sec } POSITION { 7 }
= { Cpu Queue Completions/Sec };
VARIABLE FLOAT { CPU Queue Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Queue ET } POSITION { 8 }
= { Cpu Queue Time };
VARIABLE FLOAT { CPU Residence Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Residence Time } POSITION { 9 }
= { Cpu Queue Wait Time };
VARIABLE FLOAT { CPU Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Busy Time } POSITION { 10 }
= { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU Wait Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU Wait Time } POSITION { 11 }
= { Cpu Queue Wait Time } - { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU % Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld CPU % Busy Time } POSITION { 12 }
= { % Cpu Queue Busy Time };
VARIABLE FLOAT { Disk Read Completions/Sec } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Completes/Sec } POSITION { 13 }
= { Disk Read Queue Completions/Sec };
VARIABLE FLOAT { Disk Read Queue Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Queue ET } POSITION { 14 }
= { Disk Read Queue Time };
VARIABLE FLOAT { Disk Read Residence Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Res Time } POSITION { 15 }
= { Disk Read Queue Wait Time };
VARIABLE FLOAT { Disk Read Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Busy Time } POSITION { 16 }
= { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read Wait Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd Wait Time } POSITION { 17 }
= { Disk Read Queue Wait Time } - { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read % Busy Time } GROUPNAME { Module Workloads }
PCNAME { Wkld Disk Rd % Busy Time } POSITION { 18 }
= { % Disk Read Queue Busy Time };
VARIABLE FLOAT { Cache Read Hits } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Read Hits } POSITION { 19 }
= { Cache Read Hits };
VARIABLE FLOAT { Cache Read Misses } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Read Misses } POSITION { 20 }
= { Cache Read Misses };
VARIABLE FLOAT { Cache Soiled } GROUPNAME { Module Workloads }
PCNAME { Wkld Cache Soiled } POSITION { 21 }
= { Cache Soiled };
VARIABLE FLOAT { Interrupts Pluses } GROUPNAME { Module Workloads }
PCNAME { Wkld Interrupts } POSITION { 22 }
= { Interrupts Pluses };
VARIABLE FLOAT { New Shared Memory } GROUPNAME { Module Workloads }
PCNAME { Wkld Shared Memory } POSITION { 23 }
= { New Shared Memory };
VARIABLE FLOAT { New Unshared Memory } GROUPNAME { Module Workloads }
PCNAME { Wkld Unshared Memory } POSITION { 24 }
= { New Unshared Memory };
VARIABLE U_INT { Total } GROUPNAME { Module Workloads }
PCNAME { Wkld Total } POSITION { 6 }
= 1;
VARSET { Name } = "System" criteria System;
VARSET { Name } = "LinkServer" criteria LinkServer;
VARSET { Name } = "OtherServer" criteria OtherServer;
VARSET { Name } = "SightLine" criteria SightLine;
VARSET { Name } = "FtpD" criteria FtpD;
VARSET { Name } = "InetD" criteria InetD;
VARSET { Name } = "PagingD" criteria PagingD;
VARSET { Name } = "MiscUtil" criteria MiscUtil;
VARSET { Name } = "QrunD" criteria QrunD;
VARSET { Name } = "Other" criteria Other;
end CLASS
Figure 4-4. Default Workload Class Definition
CRITERIA
In this example, most CRITERIA are defined using the Process Name metric; FtpD and InetD test
Program Name, and PagingD tests Person Name. All processes whose Process Name contains the string
agentmgr, servd, datamgr, protomgr, or threshd satisfy the SightLine CRITERIA statement.
Similar groups of processes have been combined to define the other standard workloads. A final
CRITERIA statement, defining Other, is included to capture all processes that do not fit into the
previous criteria. For all processes that evaluated to FALSE in the previous statements, this
statement will always evaluate to TRUE.
Note that the keyword EXCLUSIVE is used. Once a CRITERIA statement is satisfied, the process
cannot satisfy any following statements. Note that order may be important in defining CRITERIA
statements, and the CRITERIA statement for Other should be the last CRITERIA statement.
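The effect of ordering under EXCLUSIVE, and the reason the catch-all Other criterion belongs last, can be sketched in Python. This is a simplified illustration, not agentmgr's implementation: only three of the criteria from Figure 4-4 are modeled, the OR'd tests are folded into a single regular-expression alternation, and the process samples are hypothetical.

```python
import re

# A subset of the criteria from Figure 4-4, in definition order.
# Each entry is (workload name, metric tested, regular expression).
CRITERIA = [
    ("SightLine", "Process Name", r"agentmgr|servd|datamgr|protomgr|threshd"),
    ("FtpD", "Program Name", r"ftpd.pm"),
    ("Other", None, None),   # "= 1": always true, so it must come last
]

def classify(process):
    """Return the first workload whose criterion the process satisfies."""
    for workload, metric, pattern in CRITERIA:
        if metric is None or re.search(pattern, process[metric]):
            return workload

# Hypothetical process samples for one interval.
processes = [
    {"Process Name": "agentmgr", "Program Name": "agentmgr.pm", "% Cpu": 1.5},
    {"Process Name": "ftp_session", "Program Name": "ftpd.pm", "% Cpu": 0.5},
    {"Process Name": "payroll_batch", "Program Name": "payroll.pm", "% Cpu": 12.0},
    {"Process Name": "datamgr", "Program Name": "datamgr.pm", "% Cpu": 2.0},
]

wkld_cpu, wkld_total = {}, {}
for proc in processes:
    name = classify(proc)
    wkld_cpu[name] = wkld_cpu.get(name, 0.0) + proc["% Cpu"]   # values are summed
    wkld_total[name] = wkld_total.get(name, 0) + 1             # "= 1" counts members

print(wkld_cpu)
print(wkld_total)
```

If the Other entry were moved to the top of the list, every process would match it first and the remaining workloads would stay empty.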
The following example explores regular expressions as presented in Figure 4-2 and their use
when defining workloads:
Person Name1 = jeff
Person Name2 = rjeff
Person Name3 = jeffe
1. {Person Name} = /jeff/
This specifies that the Person Name must contain the string “jeff.” Therefore, all three
persons will be included.
2. {Person Name} = /^jeff/
This specifies that the Person Name must begin with the string “jeff.” Therefore, Person
Name1 and Person Name3 will be included.
3. {Person Name} = /jeff$/
This specifies that the Person Name must end with the string “jeff.” Therefore, Person
Name1 and Person Name2 will be included.
4. {Person Name} = /^jeff$/
This is a compound definition that specifies that the Person Name must begin with the string “jeff”
and end with the string “jeff.” Therefore, Person Name1 would be the only one included.
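The four cases above can be checked with Python's re module. The sketch assumes that the unanchored, ^, and $ forms behave the same way in agentmgr as they do with re.search; the dictionary of names simply restates the three sample persons.

```python
import re

names = {"Person Name1": "jeff", "Person Name2": "rjeff", "Person Name3": "jeffe"}
patterns = ["jeff", "^jeff", "jeff$", "^jeff$"]

def matches(pattern):
    """Return the labels of the sample persons that satisfy the pattern."""
    return [label for label, value in names.items() if re.search(pattern, value)]

for p in patterns:
    print(f"/{p}/ ->", matches(p))
```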
VARIABLE
In this example, the metrics are defined to equal the same metrics (or a combination of metrics)
defined for the existing array class. Note that metrics are always added. For example, for a
specific VARSET, % Cpu is the sum of all processes that met the criterion to be in that VARSET.
VARSET
VARSET statements define the names for the members of the Workloads array class. The
names associate these members to the members of the existing array class that met the
specified criteria.
Path Meters Class
The SightLine Power Agent can collect detailed information about individual paths or files. You
must specify the full path(s) in a file named control.harvest, located in the FRTLHOME>etc
directory. The object specified by the path does not need to exist prior to starting the agentmgr
process. However, if none of the paths specified in the control.harvest file are open when
the agentmgr is started, the entire Path class will be disabled and no Path meters statistics will be
gathered.
You can specify a star (*) name for path_name. However, the name expansion only occurs
within a given directory. It does not apply to any subdirectories. If only a directory is specified (no
filenames), the agentmgr will attempt to collect path statistics for all the files in the directory.
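The rule that a star name expands only within its own directory can be sketched as follows. This is an approximation under stated assumptions: Python's fnmatch stands in for VOS star-name matching, the function name expand is hypothetical, and the example paths are invented for illustration.

```python
from fnmatch import fnmatchcase

def expand(star_path, candidate_paths, sep=">"):
    """Match a star name against files in its own directory only."""
    directory, _, pattern = star_path.rpartition(sep)
    hits = []
    for path in candidate_paths:
        parent, _, name = path.rpartition(sep)
        # Files in subdirectories have a different parent and are skipped.
        if parent == directory and fnmatchcase(name, pattern):
            hits.append(path)
    return hits

# Hypothetical paths
paths = [
    "%es#d02>sightline>log>agentmgr.log",
    "%es#d02>sightline>log>old>agentmgr.log",
    "%es#d02>sightline>log>notes.txt",
]
print(expand("%es#d02>sightline>log>*.log", paths))
```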
The control.harvest file delivered with the software is shown in Figure 4-5. It is delivered as
a sample file, and must be modified for your system. Note that all the lines are commented out
(with the number sign [ # ] preceding each line) so, by default, no paths are metered and the Path
Meters class will not appear in EA/V. File metering does result in some overhead, so you should
be judicious in deciding which files to monitor.
# harvest file. specify only path_name entries
# path_name:%es#enet*
# path_name:%es#tcp*
# path_name:%es#d02>sightline>log
Figure 4-5. Default control.harvest File
Registry.csv Metric Description File
The datamgr process generates a metric description file when it connects to the agentmgr and
also when the metric names change. The file is called registry.csv, and is normally stored in
the data/Local directory.
The registry.csv file is intended to assist you in customizing the software and for technical
support. This file lists the internal name of each metric (registry name), and specifies how it
appears in EA/V. This file can be read with a text editor or with a spreadsheet program, such as
Microsoft® Excel.
The primary reason for this file is to assist users in customizing the protomgr.conf and
threshd.conf files. However, this file can also help with debugging.
The following columns are generated for each metric:
Group Name
The group name in EA/V, or the event class for EventList metrics
Metric Name
The metric name in EA/V, or the column name for EventList metrics
Event Flag
0 for conventional metrics, 1 for EventList metrics
Position
The position of the metric in the group or in the event class
Type
The metric type (for example, FLOAT64)
Form
The metric form (for example, PCTDUR)
Scale
The scale factor (for example, 1024)
Registry Path
The dot-separated registry path name for the metric.
Type, Form, and Scale are the values from the registry. The equivalent values from the
metric data are not displayed. These registry values indicate the default presentation of
the metrics in EA/V on the PC. These are the settings that can be overridden by
customizing the protomgr.conf file, for example, to change a metric from KBytes to
MBytes, or a rate into an absolute value.
If a metric has multiple PC definitions then it is listed multiple times (for example, Pid, which
appears in all the event classes).
The file is updated whenever datamgr writes a registry block to the HTF file. This should occur
every time the datamgr starts, and whenever the registry changes.
If datamgr is running in maximum debug (-ddd), then it generates a history of the registry
changes, by renaming the registry.csv file before rewriting it. After three registry changes
you would have registry0.csv, registry1.csv, registry2.csv, and the latest values in
registry.csv.
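When customizing protomgr.conf or threshd.conf, it can help to look up a metric's registry path from its EA/V name. The Python sketch below assumes that each row of registry.csv holds the eight columns in the order listed in this section and that the file has no header row; neither assumption is confirmed here, so check an actual file first. The sample rows and registry paths are hypothetical.

```python
import csv
import io

COLUMNS = ["Group Name", "Metric Name", "Event Flag", "Position",
           "Type", "Form", "Scale", "Registry Path"]

# Hypothetical sample rows; a real registry.csv may differ in layout.
SAMPLE = """\
Module Workloads,Wkld % Cpu,0,1,FLOAT64,PCTDUR,1,Workloads.% Cpu
Module Workloads,Wkld % PgFlt Time,0,3,FLOAT64,PCTDUR,1,Workloads.% Page Fault Time
"""

def load_registry(stream):
    """Map (EA/V group, EA/V metric name) to the internal registry path."""
    reader = csv.DictReader(stream, fieldnames=COLUMNS)
    return {(row["Group Name"], row["Metric Name"]): row["Registry Path"]
            for row in reader}

registry = load_registry(io.StringIO(SAMPLE))
print(registry[("Module Workloads", "Wkld % Cpu")])
```

To read a real file, pass an open file object instead of the StringIO sample.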
agentmgr Command Line Options
To view all the available command line options at any time, issue the following command:
agentmgr -h
The “-h” option returns the possible options available with this executable. The output is shown
below.
Usage: agentmgr [-dfhuv] [-n name] [-p port] [-z level]
-d         show more messages in log file (may be repeated)
-f         run in foreground
-h         display this message
-u         do not compress data (same as -z 0)
-v         display version information
-n name    specify alternate conf and log file name
-p port    specify TCP port to listen on
-z level   specify compression level (0=off, 9=max)
Chapter 5
Datamgr
The Database Manager program, datamgr, is the process that records performance metrics from
the agentmgr in a data file and reads out historical data when requested by SightLine Expert
Advisor/Vision (EA/V). It connects to the agentmgr and records blocks of metrics into a circular
file that is usually, but not necessarily, stored locally. This file is called the Host Trace File, or
HTF. The datamgr program also has the capability to connect to multiple agentmgrs and store
separate HTFs for each managed node. This feature is discussed later in this chapter in
Centralized Database Management.
The datamgr program can also act as a data server to other FORTEL client processes. For
example, datamgr reads out historical data when it is requested by EA/V. There are really two
types of data that EA/V might request: real-time data and historical data. EA/V will connect to
datamgr to get historical data and connect to agentmgr for real-time data. The parent datamgr
spawns a child process datamgr. The parent then transmits all requested data to EA/V and
terminates, while the child takes over the database management responsibilities. Usually EA/V
will ask to switch to live data when all historical data has been downloaded. This transfer of
connections from datamgr to agentmgr happens seamlessly.
The capacity to store agentmgr output locally provides a temporary store of data that facilitates
flexible strategies for downloading data to EA/V on the PC, as well as recovery of data that was
collected on the host but not transferred to EA/V. The datamgr program is designed primarily to
save data for recovery purposes. By design, the performance data should be archived on the PC
client by EA/V. The data stored locally on the managed node acts as a safeguard against any
downtime for the data transfer to the client. With this in mind, determining exactly how much data
to store should take into account the stability of your environment. A very stable network may
enable storing just a single day of data on the host. On the other hand, some environments might
want to consider several days of storage to secure against extended network downtime.
This chapter describes the capabilities of datamgr and the configuration steps required to achieve
various operational objectives. After completing this chapter, you should understand the tasks
performed by datamgr. You should also have a command of the configuration file syntax to
program those tasks in your own environment.
Configuration
The operation of datamgr is controlled through its configuration file, datamgr.conf, in the
FRTLHOME>etc directory. The datamgr.conf file is quite brief and fairly simple. All the work in
this sample file is performed in the last line. See the sample line below.
# Host Name        Port Number  DB Name  Interval  Expire
#
DB system.us.com   8700         Local    30        36h
{ VOS System }, { General },
{ CPU Extra }, { DiskQueue }, { PathQueue }, { Process Info }, { Workloads };
Figure 5-1. Default Line from datamgr.conf File
A line is required for each system for which datamgr is to store data. The datamgr program can
store data for more than a single system. By default, each datamgr.conf file lists the local
system. A centralized database management scheme can also be employed where the datamgr
program is instructed to collect and maintain data from multiple agentmgrs. (See Figure 5-2.)
There are seven essential entries in an operative line of the configuration file:
1. The declaration “DB.”
2. The Agent name:
We recommend that you use the system name plus the full domain for this entry. Whatever
string is used here, it should resolve to the IP address of the machine.
3. The port on which agentmgr is listening:
By default, the port number is 8700. If the collector is configured to listen on an alternate
port, this entry must be modified to reflect the change.
4. The directory name where host trace files will be stored:
This directory will be located under the FRTLHOME>data directory. A unique name should
be used for each directory when configuring a centralized database manager. In other
words, when storing data for two systems in addition to the local machine, there would be
three subdirectories in the data directory: Local, which holds the local data, and one
directory for each of the other two systems. We recommend naming the subdirectories with
the name of the system for which they hold performance data.
5. The sample interval at which datamgr will archive the data:
By default, the sample interval is configured to be 30 seconds. The collection interval
defined in the datamgr.conf must agree with the collection interval defined in the
protomgr.conf file. A future release of the host software will permit multiple intervals.
Changing this interval affects the amount of space needed to store the data. The degree of
impact is dependent on the rate of compression of the data, but it is essentially a linear
relationship; more samples per hour (shorter interval) means more storage and processing
resources required.
6. The amount of data to store (in hours (h), days (d), or MBs (m)):
By default, the software will store the last 24 hours of data. Note that 3d is equivalent to
72h, that is, the last 72 hours from the current time, not the last 3 full days. To make sure
that the last three full days are stored, use 4d or 96h.
7. The list of Metric Classes:
A list of the metric classes you want to store must be specified. This list should also include
the General metric class, and any computed classes defined in agentmgr.conf. For
instance, in Figure 5-1, the metric classes { General } and { VOS System } are
stored, as well as the computed classes based on metrics in { VOS System }:
{ CPU Extra }, { DiskQueue }, { PathQueue }, { Process Info }, and
{ Workloads }.
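To make the fields of a DB entry concrete, the following Python sketch splits the sample entry from Figure 5-1 into its parts and converts the expire value to hours. It is an illustration only: the names DbEntry, parse_db_entry, and expire_hours are hypothetical, and the parsing rules are inferred from the sample rather than taken from datamgr itself.

```python
import re
from dataclasses import dataclass

@dataclass
class DbEntry:
    host: str
    port: int
    db_name: str
    interval_s: int
    expire: str
    classes: list

def parse_db_entry(text: str) -> DbEntry:
    # Everything before the first "{" holds the six space-separated fields.
    head, _, rest = text.partition("{")
    keyword, host, port, db_name, interval, expire = head.split()
    if keyword != "DB":
        raise ValueError("entry must start with DB")
    classes = [c.strip() for c in re.findall(r"\{([^}]*)\}", "{" + rest)]
    return DbEntry(host, int(port), db_name, int(interval), expire, classes)

def expire_hours(expire: str):
    """Convert an h or d value to hours; an m value is a size, so return None."""
    value, unit = int(expire[:-1]), expire[-1]
    if unit == "h":
        return value
    if unit == "d":
        return value * 24
    return None

ENTRY = """DB system.us.com 8700 Local 30 36h
{ VOS System }, { General },
{ CPU Extra }, { DiskQueue }, { PathQueue }, { Process Info }, { Workloads };"""

entry = parse_db_entry(ENTRY)
hours = expire_hours(entry.expire)
samples_retained = hours * 3600 // entry.interval_s   # 36 h at 30 s -> 4320
print(entry.host, entry.port, entry.db_name, hours, samples_retained)
```

The last calculation shows the linear relationship between interval and storage: halving the interval to 15 seconds would double the number of samples retained for the same period.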
By default, datamgr listens on port 8800. If there is a conflict in your environment and this port is
already in use, you need to edit the slagent.cm command macro under FRTLHOME>bin to start
the datamgr with an alternate port number. Simply edit the variable named DATAMGR_PORT located
near the top of the macro to reflect the new port number, as shown below:
&set datamgr_port 8800
Communication
[Figure: on the VOS server, servd (1645), agentmgr (8700), protomgr, and datamgr (8800); on the
EA/V workstation, SightLine Expert Advisor/Vision. The connections between them are numbered in sequence.]
Figure 5-2. SightLine EA/V and the Power Agent Components
The Database Manager communicates with data sources (agentmgrs), as well as with data
clients (protomgrs) via TCP/IP. On startup, it will attempt to connect to all configured data
sources. Should an attempt to connect fail, a new attempt will occur every 60 seconds. The
typical course of events in a normal connection is to download some historical data, then carry on
into live data, as depicted in the numbered sequence in Figure 5-2. The default TCP/IP ports are
in parentheses.
1. SightLine Expert Advisor/Vision (EA/V) sends a connection request that is received by servd,
the Service Manager, on a remote VOS host.
2. Servd will launch a protomgr process, the Real-Time Agent, to manage the transfer of data
between the host and the EA/V client.
3. Having received a request for historical data, protomgr first connects to datamgr to receive
the recorded performance metrics.
4. Protomgr transfers the data to EA/V.
5. When all the requested historical data has been received from datamgr, protomgr will drop its
connection to datamgr and open one to the agentmgr for real-time performance metrics.
6. Protomgr will transfer data from the collector to EA/V until the connection is stopped
manually.
When EA/V wants to connect to the Database Manager Daemon, the client will send a message
requesting data for a particular host to the Service Manager, servd, on the host (step 1 in Figure 5-2).
The name is passed from the Host field in the Configure Network Host Session dialog box in
EA/V. This host name is translated to an IP address using DNS or the hosts file on the local
machine. The datamgr program will perform an IP address match to verify that it stores data for
the host passed from EA/V. If the IP address match fails, no historical data will be sent.
The connection will pass directly to the agentmgr for a live download (step 5 in Figure 5-2). If this
happens, a message will appear in the datamgr log to the effect that the database cannot be
found. If you attempt to connect for historical data, but only receive live data, then you should
ensure that:
1. Datamgr is running (lu).
2. There is an HTF in FRTLHOME>data>Local.
3. The host name sent from EA/V resolves to the same IP Address as the host entry in the DB
line of datamgr.conf.
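The third check in the list above, that two host names resolve to the same IP address, can be tried outside the product with a short Python sketch. The function same_host is hypothetical and is not how datamgr performs its match; the fake resolver and its addresses exist only so the example runs without a network.

```python
import socket

def same_host(name_from_eav, name_in_conf, resolve=socket.gethostbyname):
    """Return True when both names resolve to the same IPv4 address."""
    try:
        return resolve(name_from_eav) == resolve(name_in_conf)
    except OSError:          # socket.gaierror is a subclass of OSError
        return False

# Hypothetical name table standing in for DNS or the hosts file.
FAKE_HOSTS = {"moduleA": "10.0.0.5", "moduleA.domain.com": "10.0.0.5",
              "moduleB.domain.com": "10.0.0.6"}

def fake_resolve(name):
    if name not in FAKE_HOSTS:
        raise OSError(f"cannot resolve {name}")
    return FAKE_HOSTS[name]

print(same_host("moduleA", "moduleA.domain.com", fake_resolve))             # True
print(same_host("moduleB.domain.com", "moduleA.domain.com", fake_resolve))  # False
```

Calling same_host with only the two names uses the system resolver, which is closer to what a real check on the workstation or host would exercise.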
File Structure
The temporary store of data is maintained as a set of directories and files corresponding to each
host for which datamgr is archiving data. The set of host files will include a host index file and a
host trace file. The host trace file is circular. New data will be wrapped over the oldest data when
either the configured size is filled, or the configured period has elapsed.
Directory Structure
FRTLHOME>data>DB Name>
    >DB Name.htf
    >DB Name.idx
    >registry.csv
where:
DB Name         “DB Name” from the datamgr.conf file
DB Name.htf     Host Trace File (data)
DB Name.idx     Index to the host trace file
registry.csv    Metric description file
Example:
>data>Local
    >Local.htf
    >Local.idx
    >registry.csv
>data>develB
    >devel.htf
    >devel.idx
    >registry.csv
The database files are located in a directory named FRTLHOME>data>DB Name, where
DB Name is the database name declared in the datamgr configuration file. The database
directory contains three files:
DB Name.htf
This file contains the performance data. It is the largest file, and grows until the defined maximum
is reached, then wraps new data over the oldest data. The maximum can be expressed in absolute
size, as in 25MB, or can be expressed in time, as in 24h (hours).
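The wrap-around behavior of the host trace file can be pictured with a fixed-size buffer. The Python sketch below is only an analogy using collections.deque; it says nothing about the actual HTF format, and the capacity of five samples is chosen purely for readability.

```python
from collections import deque

def record(samples, capacity):
    """Keep only the most recent `capacity` samples, discarding the oldest."""
    trace = deque(maxlen=capacity)   # oldest entries drop off when full
    for sample in samples:
        trace.append(sample)
    return list(trace)

print(record(range(1, 8), capacity=5))   # [3, 4, 5, 6, 7]

# A 24h limit at the default 30-second interval corresponds to this many samples:
print(24 * 3600 // 30)                   # 2880
```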
DB Name.idx
This file contains the index for the DB Name.htf file.
Registry.csv
The registry.csv file is intended to assist you in customizing the software and for technical
support. This file lists the internal name of each metric (registry name), and specifies how it
appears in EA/V. This file can be read with a text editor or with a spreadsheet program, such as
Microsoft® Excel. If Oracle support is installed, then multiple registry.csv files can be
generated (depending on the configuration selected). These are stored in the data/*/ directories.
The file is documented in Chapter 4 of this guide.
The datamgr and agentmgr do not have to run on the same module. Therefore, one datamgr can
manage the host trace files for all your systems. This allows multiple host trace files to be stored
on a single host, which can be very advantageous if storage resources are scarce.
Now that you have seen a simple model of the relationship between EA/V and a single VOS host,
refer to the more complex environment shown in Figure 5-3.
[Figure: System A and System B each run servd (1645), agentmgr (8700), protomgr, and threshd;
a single datamgr (8800) on System A stores the data for both systems.]
Figure 5-3. Centralized Database Management
The example in Figure 5-3 shows two systems, A and B, where system A is the central database
repository. Notice that the datamgr on system B is not running. System A’s datamgr is recording
the data for both. It has two TCP/IP connections open. The first is to the local agentmgr, and the
second is to system B’s agentmgr. System B no longer needs to run datamgr. It could be started
any time should the connection be lost to the central datamgr on system A. It is possible to merge
the data from the two separate datamgrs in SightLine so that long-term history files can be
uninterrupted.
The datamgr program will manage the two databases separately. For example, datamgr might be
configured to store data from the local agentmgr for 3 days, while keeping 100 MB of
performance data for system B. To configure this environment, edit the datamgr.conf file in the
FRTLHOME>etc directory on system A. Enter one line for each system. Notice that the syntax of
the datamgr.conf file includes which host and port it should get its data from for each trace file
directory, the directory where the data should be stored, the interval at which it is being collected,
and the amount of data to store. Revisit the previous section, Configuration, for more details.
The following datamgr.conf entries would configure the environment in Figure 5-3:
#    Host Name            Port  DB Name  Interval  Expire
#
DB   moduleA.domain.com   8700  Local    30        3d    { VOS System }, { Process Info },
     { CPU Extra }, { DiskQueue }, { PathQueue }, { General }, { Workloads};
DB   moduleB.domain.com   8700  HostBDB  30        100m  { VOS System }, { Process Info },
     { CPU Extra }, { DiskQueue }, { PathQueue }, { General }, { Workloads};
NOTE: Intervals for all host entries must be equal to the collection interval set in the
protomgr.conf file in the current release of the host software. A future release will allow
multiple differing intervals.
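The equal-interval rule in the note can be checked mechanically. The following Python sketch is not part of SightLine; it assumes each entry is a single `DB host port name interval expire { class }, ...;` statement as in the example entries, and the function names are hypothetical.

```python
import re
from typing import NamedTuple


class DbEntry(NamedTuple):
    host: str
    port: int
    db_name: str
    interval: int   # collection interval in seconds
    expire: str     # kept as written, for example "3d" or "100m"
    classes: list   # metric class names found between braces


def parse_db_entries(text):
    """Parse DB statements from datamgr.conf-style text (illustrative only)."""
    # Drop comment lines, then split what is left into ';'-terminated statements.
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    entries = []
    for stmt in body.split(";"):
        fields = stmt.split(None, 6)
        if len(fields) < 6 or fields[0] != "DB":
            continue
        classes = [c.strip() for c in re.findall(r"\{([^}]*)\}", stmt)]
        entries.append(
            DbEntry(fields[1], int(fields[2]), fields[3],
                    int(fields[4]), fields[5], classes)
        )
    return entries


def mismatched_intervals(entries, protomgr_interval):
    """Return the DB names whose interval differs from the protomgr interval."""
    return [e.db_name for e in entries if e.interval != protomgr_interval]


SAMPLE = """
# Host Name  Port  DB Name  Interval  Expire
DB moduleA.domain.com 8700 Local 30 3d { VOS System }, { General },
   { Workloads};
DB moduleB.domain.com 8700 HostBDB 30 100m { VOS System }, { General },
   { Workloads};
"""

entries = parse_db_entries(SAMPLE)
print(mismatched_intervals(entries, 30))
```

An empty result means every DB entry agrees with the interval passed in, which would be the CINTERVAL value from protomgr.conf.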
Now consider the communications model shown in Figure 5-2. Essentially, the sequence of
events is the same for a central database, except that the protomgr process may have to go to
two machines for all the data requested. The protomgr process on system B must go to system A
for historical data before switching to the local agentmgr. It is possible to pass options from the
Advanced Settings in EA/V’s Network Host Sessions dialog box to instruct the protomgr
process to connect to an alternate location to retrieve historical data, but connect to the local host
agentmgr for real-time data. Rather than add these settings to each SightLine machine, it is far
more efficient to configure them on the command line of protomgr.
Remember that protomgr is launched by the Service Manager, servd. The servd process’s sole
responsibility is to listen for connection requests and launch protomgr when one is received. The
protomgr process accepts several options on the command line. Let’s look more closely at how
servd calls protomgr. The call to open protomgr is defined in servd.conf below:
Calls protomgr for local data
#
# service   hostname    command
#
SERVICE vpdata localhost protomgr -P %p -V %v -K %k -C %c -E %e -c %h -x1;
In a centralized data storage environment, the historical performance data is not kept locally.
There are now two places to connect to retrieve data: the local agentmgr for real-time data, and
the central repository for historical data. You need an intelligent agent that knows to go to
moduleA to retrieve historical data, and then connect back to the local agentmgr for real-time
data. The protomgr program can be configured through command line options to do this.
You should configure the servd process on the machine not storing the historical data (moduleB
in our example) to access the datamgr process of the central database machine (moduleA). To
accomplish this, you must edit the source.conf file on system B to pass the “-m
<central_host_name>” option when calling the protomgr process. This option instructs
protomgr to use the datamgr process on system A when downloading the historical data. The
parameters contained in source.conf are appended as a protomgr option for the vpdata
service in servd.conf.
EA/V can declare which protomgr process to use, with which options, by adding a new data
source definition in the source.conf file. By default, EA/V’s Discover Data Sources feature
will display all connections defined in the source.conf file. The default connection for System
data is named system. Any string can be placed in the source field. When EA/V passes a string
that matches a data source entry, then the associated options to protomgr will be executed. To
set up a new data source called Bdata that will call a protomgr pre-configured to transfer system
B data from the repository on system A, the source.conf file should look like this:
#   Source Name   Description   Options
SOURCE system "System";
SOURCE Bdata "System A" -m systemA.domain.com;
System A is the centralized database manager. Therefore, the source.conf files for additional
nodes whose data is stored on system A should also be edited to reflect the above changes.
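To make the relationship between source.conf and the protomgr command line concrete, the following Python sketch models how the options of a named source might be appended to the vpdata command. It is an illustration under stated assumptions, not the servd implementation: the function names are hypothetical, straight double quotes are assumed in the file, and moduleA.domain.com stands in for the central host.

```python
import shlex


def parse_sources(text):
    """Map each source name to (description, option list)."""
    sources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("SOURCE"):
            continue  # skip comments and blank lines
        tokens = shlex.split(line.rstrip(";").strip())
        # tokens: SOURCE, name, description, then zero or more options
        sources[tokens[1]] = (tokens[2], tokens[3:])
    return sources


def protomgr_command(base_command, sources, requested):
    """Append the options of the requested source to the base command."""
    _description, options = sources[requested]
    return base_command + options


SOURCE_CONF = """
#   Source Name   Description   Options
SOURCE system "System";
SOURCE Bdata "System A" -m moduleA.domain.com;
"""

BASE = ["protomgr", "-P", "%p", "-V", "%v"]
sources = parse_sources(SOURCE_CONF)
print(protomgr_command(BASE, sources, "Bdata"))
```

In this model, requesting the Bdata source adds the -m option, while requesting system leaves the base command unchanged.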
Datamgr Command Line Options
To view all the available command line options at any time, issue the following command after
first exporting the path to the dynamic libraries (see the slagent script for the path options):
datamgr -h
The resulting output is shown below.
Usage: datamgr [-dfhv] [-q port] [-n name] [-O output_file_number]
  -d          turn on debugging
  -dd         turn on an additional level of debugging
  -f          run in foreground
  -h
  -v
  -q port     specify alternate datamgr port
  -n name     specify alternate name for conf and log files
  -O number   suffix to the datamgr.out file
Chapter 6
Threshd
SightLine’s MultiAction Threshold Agent, which is implemented in the threshd program, provides
a robust threshold and reactive action management service for the SightLine Power Agent
software. When user-specified performance thresholds are exceeded, the MultiAction Threshold
Agent can be configured to automatically interface with various software components outside the
Power Agent software. Three types of threshold reactive actions can be implemented:
• Automatic e-mail
• Send SNMP traps and create an SNMP MIB
• Invoke a batch script
Each one of these actions can be invoked using a robust set of criteria to evaluate any of the
agentmgr metrics and then take action on their results. This chapter provides the overall syntax
and semantics for using the MultiAction Threshold Agent, along with some simple, but useful,
examples.
The MultiAction Threshold Agent can connect to multiple agentmgrs. All agentmgrs connected to
a single threshd share the same thresholds. The agentmgrs and thresholds are all defined in a
single configuration file. This configuration file is discussed in detail in this chapter.
The MultiAction Threshold Agent enables the user to define multiple thresholds on the same
metric. This functionality permits an escalation of action as the metric’s value becomes more
critical. For example, a MultiAction Threshold Agent could be configured to delete temporary files
from a directory if the space used is over 50% but less than 75%. If this doesn't solve the
problem, mail can be sent at 60% used, and an SNMP trap can be sent at 80% used.
Configuration
By default, the configuration file for the MultiAction Threshold Agent, threshd.conf, is located
in the FRTLHOME>etc directory. The file is used to configure the condition-action thresholds, alert
actions, and agentmgrs to be monitored. Since each of these parameters must be tailored for
individual computer systems, the default file is essentially empty, consisting of sample lines that
are commented out. These lines provide some guidance and examples to help get you started.
General Structure
The following describes the overall structure of the threshd.conf file:
MAILHOST domain_name;
AGENTMANAGERS
host.fortel.com;
host.fortel.com:8700;
host.fortel.com INTERVAL 30 SECONDS;
host.fortel.com INTERVAL 1 MINUTE;
host.fortel.com INTERVAL 10 MINUTES VPNAMES "protomgr";
END AGENTMANAGERS
SNMPVARS
snmpvariable_definition;
.
.
snmpvariable_definition;
END SNMPVARS
SNMPTRAPS
snmptrap_definition;
.
.
snmptrap_definition;
END SNMPTRAPS
THRESHOLDS
threshold_definition;
.
.
threshold_definition;
END THRESHOLDS
Each part is discussed in the following sections.
Metric Names
By default, the registry.csv file located in FRTLHOME>data>Local for each VOS system
should be considered the definitive list of metrics for that server. When specifying metric names
for threshd, you may use either the metric name managed by agentmgr or the metric name that is
used by SightLine Expert Advisor/Vision (EA/V) on the PC, which is often simpler.
String Substitution
Various agentmgr variables are available when forming expressions and messages. The
following string substitutions can be used to provide additional information over and above the
metrics’ names:
$ADDRESSLIST     Address list of threshold
$COUNT           The number of consecutive times that the action has been taken
                 since the expression has evaluated to TRUE
$DURATION        Duration of threshold
$EXPRESSION      Description of expression for threshold
$HOSTIPADDRESS   agentmgr IP address
$HOSTNAME        agentmgr name
$INSTANCE        Instance or metric
$RESULT          Description of the result
$SEVERITY        Severity of threshold
$SLEEP           Sleep of threshold (output in seconds)
$TIME            Time of action (local)
$TIMEGMT         Time of action (GMT)
$TITLE           Title of threshold
$VARNAME         Name of agentmgr metric (including the instance)
$VPVARNAME       Name of EA/V metric (including the instance)
Whenever these keywords are used, they are replaced by their value.
Additional String Features
Any string enclosed in braces ( { } ) will be “evaluated.” For example, the string “{ VOS
System.Users Logged On }” would be replaced by the number of users logged on at the time
of the threshold violation.
You can use the following escape characters:
\n    replaced with an ASCII 10 (newline)
\r    replaced with an ASCII 13 (return)
\t    replaced with an ASCII 9 (tab)
\"    replaced with a double quotation mark
\\    replaced with a single \
To put a left brace ( { ) or a dollar sign ( $ ) in your string, use:
\\{    replaced with {
\\$    replaced with $
MAILHOST Definition
The MAILHOST subset of threshd.conf is optional. It is used to define the IP address or
network name of the system on your network acting as a mail server, running the SMTP (Simple
Mail Transfer Protocol) service, that threshd should use when sending mail alerts. The syntax for
this is described below:
Syntax:
MAILHOST domain_name;
The domain_name specifies the name of the system that will service the sent mail. The default is
the host executing threshd.
Example:
MAILHOST mailserver.fortel.com;
AGENTMANAGERS Definition
The AGENTMANAGERS subset of threshd.conf is used to list each agentmgr to be monitored
by threshd. Every collection agent listed in this section is monitored with the same set of
condition-action thresholds. The syntax and parameters needed for each collector definition are
described below:
Syntax:
hostname[:port] [INTERVAL duration] [VPNAMES "conf_file"]
The hostname specifies the hostname of the system where the agentmgr resides.
[:port] specifies the port where the agentmgr is running. The default port is 8700.
[INTERVAL duration] specifies the time interval for data being collected. The default is 30
seconds. The duration may be specified as days, hours, minutes, or seconds; if it is not
specified, it is assumed to be seconds.
[VPNAMES "conf_file"] specifies the protomgr configuration file containing the metric
redefinitions that may have been optionally configured. The conf_file specified will be
prepended with the value of FRTLHOME>etc and appended with .conf. The default is
protomgr.
Example:
AGENTMANAGERS
host.fortel.com;
host.fortel.com:8700;
host.fortel.com INTERVAL 30 SECONDS;
host.fortel.com INTERVAL 1 MINUTE;
host.fortel.com:8700 INTERVAL 30 VPNAMES "protomgr";
END AGENTMANAGERS
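The defaults described for each entry (port 8700, a 30-second interval, seconds when no unit is given, and protomgr for VPNAMES) can be illustrated with a small parser. This Python sketch is hypothetical and only models the documented syntax; it assumes straight double quotes around the VPNAMES value.

```python
import re

_UNITS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}

_ENTRY = re.compile(
    r'^(?P<host>[^\s:;]+)(?::(?P<port>\d+))?'
    r'(?:\s+INTERVAL\s+(?P<count>\d+)(?:\s+(?P<unit>(?!VPNAMES)[A-Za-z]+))?)?'
    r'(?:\s+VPNAMES\s+"(?P<vpnames>[^"]*)")?\s*;?\s*$'
)


def parse_agentmgr(line):
    """Parse one AGENTMANAGERS entry and apply the documented defaults."""
    match = _ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    seconds = 30  # documented default interval
    if match.group("count"):
        # Accept singular or plural unit names; no unit means seconds.
        unit = (match.group("unit") or "SECONDS").upper().rstrip("S")
        seconds = int(match.group("count")) * _UNITS[unit]
    return {
        "host": match.group("host"),
        "port": int(match.group("port") or 8700),       # documented default port
        "interval_seconds": seconds,
        "vpnames": match.group("vpnames") or "protomgr",  # documented default
    }


print(parse_agentmgr("host.fortel.com INTERVAL 10 MINUTES;"))
```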
SNMPVARS Definition
The SNMPVARS subset of threshd.conf is optional. It is used to specify the metrics that will be
passed in any SNMP traps. The syntax and parameters needed for each metric definition are
described below:
Syntax:
name: OID integer VALUE string [ TYPE type ] [ DESCRIPTION string ];
name specifies the name of the SNMP metric. This name is used in the SNMPTRAPS section to
reference a defined SNMP variable. Valid characters include a-z, A-Z, and 0-9.
OID specifies the Object ID of the metric. The metric will have the object identifier of
1.3.6.1.4.1.1130.1.xx.
VALUE specifies the string to evaluate to generate the metric’s value at the time the trap is
generated. For example, a string of $VPVARNAME would generate the string that represents the
metric that triggered the trap. Also, {$VPVARNAME} would generate the value of the metric.
TYPE specifies the output type of the metric. The TYPE can be one of the following:
INTEGER
STRING (default)
DESCRIPTION specifies the description of the metric as it will appear in the MIB.
SNMPTRAPS Definition
The SNMPTRAPS subset of the threshd.conf file is optional. This section is used in conjunction
with the SENDTRAP..TO action to define a customized SNMP trap. The syntax and parameters
needed for each trap definition are described below:
Syntax:
name: SPECIFIC integer [VARIABLES var-list]
[DESCRIPTION string] [MIBTEXT string];
name specifies the name of the SNMP trap. The name must be specified in the THRESHOLDS
section with the action SENDTRAP..TO in order for the defined trap to be used in the send SNMP
Trap action. Valid characters include a-z, A-Z, and 0-9.
SPECIFIC specifies the specific id, which should be a unique number for each defined SNMP
trap. The value of the SPECIFIC integer must be greater than or equal to 6. Generic traps sent with
the SENDTRAP..TO action are assigned the specific id of 6. SNMP traps defined in the
SNMPTRAPS section will have the Enterprise ID of 1.3.6.1.4.1.1130.
VARIABLES is followed by a comma-separated list of names that specify the variables defined in
the SNMPVARS section to include in the trap.
DESCRIPTION specifies the description of the trap as it will appear in the MIB.
MIBTEXT specifies any additional text to be included in the MIB.
THRESHOLDS Definition
The THRESHOLDS subset of threshd.conf is used to configure the condition-action thresholds.
Each threshold that is defined in this section will apply to each agentmgr defined in the
AGENTMANAGERS subset. This is the most complex section of the file. Take care to read
thoroughly through this section to fully understand each parameter. Example thresholds are
provided in following sections of this chapter. The syntax and parameters needed to define a
threshold are described below:
Syntax:
title: IF expression [FOR duration] THEN action [parms]
title specifies the description of the threshold. Valid characters include:
a-z
A-Z
Space
_ (underscore)
-!@#$&
0-9
expression specifies a boolean expression comprised of variable names and operators. This
expression must be true for the action to occur. An expression can be as simple or complex as
needed. Expressions are described in detail in the section, Expressions.
duration specifies the amount of time expression must be true before action is taken.
duration may be further specified as:
nn days
nn hours
nn minutes
nn seconds
If not specified, the duration is assumed to be in seconds. If no duration is specified, it is
assumed to have a value of zero. This means the action will be taken during the first interval
where the expression evaluates as TRUE.
action specifies one of the following:
SENDMAILTO address_list
SENDMPEVENTTO address_list
EXECSHELL command_line
SENDTRAPTO address_list
SENDTRAP snmptrap TO address_list
address_list allows mail and/or traps to be sent to multiple addressees. Multiple addresses
are separated by commas. Remember that both string substitution and evaluation are supported
within address lists. For SENDMAILTO, the address_list consists of a list of e-mail addresses.
For SENDMPEVENTTO, SENDTRAPTO, and SENDTRAP..TO, the address_list consists of
domain names or IP addresses.
command_line specifies the full command path to a shell script designed for corrective actions.
snmptrap specifies the name of the SNMP trap as defined in the SNMPTRAPS section.
Each of the actions is described later in the section, Actions.
parms can be one or more of the following:
SLEEP duration
duration specifies the amount of time before the action will be reissued. duration
may be further specified as:
nn days
nn hours
nn minutes
nn seconds
If not specified, the duration is assumed to be in seconds. If no duration is specified, it is
assumed to have a value of zero.
SEVERITY integer
integer specifies the severity. The default severity is 5. The specification of the severity
must come before the use of the $SEVERITY variable in the MESSAGE string.
MESSAGE string
string specifies the actual message to be sent with each action. This string message is
surrounded by double quotes and supports string substitutions.
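The interaction between FOR and SLEEP can be worked through with a small simulation. The Python sketch below is a model of the behavior described above, not threshd itself. It makes two assumptions that the text does not confirm: a FALSE result resets both timers, and a SLEEP of zero reissues the action at every interval in which the expression remains TRUE.

```python
def action_times(samples, interval, for_seconds=0, sleep_seconds=0):
    """Return the times (in seconds) at which the action would be taken.

    samples holds one expression result (True or False) per collection interval.
    """
    fired = []
    true_since = None   # time at which the expression first became TRUE
    last_fired = None   # time of the most recent action
    for index, is_true in enumerate(samples):
        now = index * interval
        if not is_true:
            # Assumption: a FALSE result resets the FOR and SLEEP timers.
            true_since = None
            last_fired = None
            continue
        if true_since is None:
            true_since = now
        held_long_enough = now - true_since >= for_seconds
        slept_long_enough = last_fired is None or now - last_fired >= sleep_seconds
        if held_long_enough and slept_long_enough:
            fired.append(now)
            last_fired = now
    return fired


samples = [False, True, True, True, True, True, False, True]
print(action_times(samples, interval=30, for_seconds=60, sleep_seconds=60))
```

With a 30-second interval, FOR 60 SECONDS, and SLEEP 60 SECONDS, the model takes the action at 90 and 150 seconds. With no FOR and no SLEEP it takes the action in every interval where the expression is TRUE.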
Expressions
The MultiAction Threshold Agent can evaluate very complex expressions for each defined
threshold. An expression is composed of variable names, integers, and operators. Each of these
parameters is discussed below.
Operators for Complex Expressions
Complex expressions are supported with the following operators:
AND    logical and
OR     logical or
<      less than
<=     less than or equal
=      equal
!=     not equal
>=     greater than or equal
>      greater than
+      addition
-      subtraction
*      multiplication
/      division
(      left parenthesis
)      right parenthesis
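As an illustration of how such an expression combines metric values and operators, the following Python sketch substitutes sample values for the braced metric names and evaluates the result. It is not the threshd evaluator. Operator precedence here is Python's (arithmetic, then comparison, then AND, then OR), which is an assumption about threshd's behavior, and the metric values are invented.

```python
import re

# Recognized tokens: {metric}, numbers, AND, OR, comparisons, arithmetic, parentheses.
_TOKEN = re.compile(r"\s*(\{[^}]*\}|\d+(?:\.\d+)?|AND|OR|<=|>=|!=|[<>=+\-*/()])")
_PYTHON_OP = {"AND": "and", "OR": "or", "=": "=="}


def evaluate(expression, metrics):
    """Evaluate a threshold-style expression against a dict of metric values."""
    text = expression.strip()
    position, parts = 0, []
    while position < len(text):
        match = _TOKEN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected text: {text[position:]!r}")
        token = match.group(1)
        if token.startswith("{"):
            # Replace the braced metric name with its current value.
            parts.append(repr(float(metrics[token[1:-1].strip()])))
        else:
            parts.append(_PYTHON_OP.get(token, token))
        position = match.end()
    # Only whitelisted tokens reach eval, and no builtins are exposed.
    return bool(eval(" ".join(parts), {"__builtins__": {}}, {}))


metrics = {"CPU % System": 40, "CPU % User": 55, "Disk % Busy": 10}
print(evaluate("( { CPU % System } + { CPU % User } ) > 90", metrics))
```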
Actions
The MultiAction Threshold Agent has the ability to take action to alert the user to a specific event
or problem. These actions include sending email, sending an SNMP trap, executing a batch script
for corrective actions, and sending a message.
Send E-mail
The keyword SENDMAILTO is used to specify an action where e-mail is to be sent. The
MAILHOST that is specified in the configuration file is used to specify the domain name for the
mail server that will service the e-mail sent by threshd.
IF { Disk % Busy } > 90 FOR 5 MINUTES THEN SENDMAILTO
[email protected], [email protected]
MESSAGE "Disk $INSTANCE has been busy for more than $DURATION minutes";
In this example, e-mail will be sent to [email protected] and [email protected] when a
disk is over 90% Busy for 5 minutes or more. Note that a string substitution is used to include the
instance name of the disk and the duration in the message.
Send an SNMP Trap
The keywords SENDTRAPTO and SENDTRAP..TO are used to specify an action where an SNMP
trap is sent to an IP address. SENDTRAPTO will send a generic SNMP trap, whereas
SENDTRAP..TO will send a customized SNMP trap defined in the SNMPTRAPS section.
The SNMP trap message conforms to SNMP standards, where the following fields are provided:
SNMP Field        Type     Description
Trap Message      String   This is the string provided by the MESSAGE parameter of the threshold
Value             Integer  This is the value of the variable specified in the threshold
Alarm Occurrence  Integer  $COUNT
IP Address        Integer  $HOSTIPADDRESS
Unused Field 1    Integer  Constant with value 123
Unused Field 2    Integer
VarName           String   $VARNAME
Unused Field 3    String   Constant with value “”
Time              String   $TIME
Duration          Integer  $DURATION
Unused Field 4    Integer
Severity          Integer  This is the string provided by the SEVERITY parameter of the threshold
Figure 6-1. Default Line from datamgr.conf File
In the table above only the Trap Message and Severity fields can be set by the user in the
SENDTRAPTO and SENDTRAP..TO actions (refer to the example below).
IF( { CPU % System } + { CPU % User } ) > 90 FOR 60 seconds THEN
SENDTRAPTO 127.0.0.1
SEVERITY 7
MESSAGE "CPU utilization above 90%";
This example will set Severity to 7, and the Trap Message to “CPU utilization above 90%”.
In addition, the SNMP trap contains the following identification:
Enterprise Name        FORTEL
Enterprise Id          1.3.6.1.4.1.1130
Event Name             VOS_Host_Alarm
Generic Trap           6
Specific Trap Number   2
Execute a Script
The keyword EXECSHELL is used to specify an action where a command line script is to be
executed.
In general, the EXECSHELL command can have as many arguments as necessary.
Messages
The MultiAction Threshold Agent can send customized messages to alert users to a specific
event. These messages are sent using a string surrounded by double quotation marks. Multiple
strings in a row will be appended. For example, the following are equivalent:
“this is a string”
“this “
“is” “ a string”
The messages can contain pertinent information about the exact event that triggered the alert. For
example, the threshold value, the actual value, the duration and the severity can be inserted into
the message using string substitutions.
Threshd Command Line Options
The user-configurable command line options for threshd can be displayed by issuing the following
command:
threshd -h
The resulting output is:
Usage: threshd [-bdfhvM] [-n name]
  -d
  -f          run in foreground
  -h
  -v          show version information
  -b          swap IP address byte order in SNMP trap header
  -n name
  -M          generate SNMP MIB
Chapter 7
Servd
The Service Daemon, servd, is the host listening process that detects requests for service from a
PC client running SightLine Expert Advisor/Vision (EA/V). It uses a registered port, 1645. If there
is a conflict in your environment and this port is already in use, you need to edit the slagent
script under FRTLHOME>bin to start servd with an alternate port number. Edit the variable named
servd_port located near the top of the script to reflect the new port number. Additionally, edit
the port number that EA/V uses to request the service for data. You can find this port by
right-clicking the connection in Enterprise View, selecting Edit Connection, and then selecting
Advanced Settings.
When a service request is received, servd starts the corresponding process. For example, EA/V
starting a download requests the service vpdata. Receiving a vpdata request causes servd to
execute a protomgr process. The protomgr process then manages the connection and data
transmission between EA/V and the Power Agent. To establish a connection from EA/V, servd
must be running on the managed node.
In most cases, there will never be a need to modify servd. If you need to run any of the processes
on ports other than the defaults, then you may need to modify some entries in the servd.conf
file.
Configuration
The servd process employs several macros to provide arguments to the protomgr command line
options. The default servd.conf file is shown below.
# Macros:
#    %c    SightLine supplied compression version
#    %e    SightLine supplied encryption version
#    %p    SightLine supplied callback port number
#    %h    SightLine supplied collection host
#    %v    SightLine supplied callback host
#    %k    SightLine supplied key
#
#    service    hostname    command
#
# The '-x1' option is for temporary backward compatibility; it allows metric
# name + array subscript to be longer than SightLine/PC can deal with in several
# cases, but the truncated subscripts are awkward with Oracle data.
#
# To truncate subscripts (and thus be able to save environments in SightLine),
# remove the '-x1' option.
#
SERVICE vpdata localhost protomgr -P %p -V %v -K %k -C %c -E %e -c %h;
To properly set up and manage the connection from the agentmgr to EA/V on the client PC,
several communication parameters are passed from EA/V to the host communication process,
protomgr. Notice the parameters listed in the default configuration file above. These configuration
settings should not be modified except in response to a request from FORTEL Customer Support
or unless instructed to do so in the documentation.
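The macro mechanism can be pictured as a simple word-for-word substitution on the command template. The Python sketch below is illustrative only: the `expand_command` function is hypothetical, and the sample values (callback host, port, key, and so on) are invented rather than taken from a real session.

```python
def expand_command(template, values):
    """Replace each macro word in the template with its supplied value."""
    words = template.rstrip(";").split()
    return [str(values[word]) if word in values else word for word in words]


TEMPLATE = "protomgr -P %p -V %v -K %k -C %c -E %e -c %h;"

# Hypothetical values that a client might supply with a vpdata request.
VALUES = {
    "%p": 4321,
    "%v": "pc01.example.com",
    "%k": "abc123",
    "%c": 1,
    "%e": 0,
    "%h": "localhost",
}

print(expand_command(TEMPLATE, VALUES))
```

Words that are not macros, such as the option letters, pass through unchanged.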
Subscript name truncation
SightLine provides a configurable option to truncate names (such as file system names) that
guarantees full compatibility with EA/V.
This option is configured with the following command line options to the protomgr:
-x1    allows long subscripts
-x2    truncate long subscripts
These should be set in the servd.conf configuration file.
SightLine has a 128-character maximum for metric names.
Servd Command Line Options
The user-configurable command line options for servd can be displayed by issuing the following
command:
servd -h
The resulting output is:
Usage: servd [-dfhv] [-p port]
  -d          turn on debugging (d can be repeated)
  -f          run in foreground
  -h
  -v
  -p port     specify alternate TCP/UDP port
Chapter 8
Protomgr
The SightLine communications process, protomgr, is the process that manages the data
transmission from the SightLine Power Agent on the host to SightLine Expert Advisor/Vision
(EA/V) on the PC client. Each time an EA/V client sends a vpdata service request to a system,
servd will start a protomgr process to manage the download for that client. There is a one-to-one
relationship between the number of EA/V PCs downloading data and the number of protomgr
processes running on the system.
In addition to managing the data transmission to EA/V on the PC, protomgr defines
user-configurable items that can be modified to suit user requirements. These eight items are:
1. Data source selection: multiple data sources, such as Interface Agents, are defined in
   source.conf, and these definitions are passed to the protomgr.
2. Short ID definition: a one- to three-character string that is prepended to the trace file and
   metric names inside EA/V when there is more than a single system connected. This is used
   to identify which Power Agent a particular metric comes from.
3. Collection interval definition: the sample interval in an integer number of seconds.
   (NOTE: This interval must match the interval configured in the datamgr.conf file.)
4. Network Address Translation (NAT) and Firewall support: specify port ranges and
   peer IP address to allow EA/V to connect to the SightLine Power Agent through a firewall.
5. Download Throttling: limit the rate at which data is sent from protomgr to EA/V.
6. Enable/Disable metric selection: select which metric classes to enable or disable from
   collection. This section controls which metric classes are requested from agentmgr and
   datamgr.
7. Exclusion section: select metrics to exclude from sending to EA/V. This does not disable
   the metric from collection by agentmgr or datamgr.
8. Data redefinition: specify redefinitions of metrics to appear in EA/V.
Each item is discussed in the following sections.
Default protomgr.conf
The protomgr.conf file is used to control which metrics are collected, exclude metrics from
appearing in EA/V, and reconfigure default metric definitions that originate from Interface Agents
loaded by agentmgr. This includes all metrics from the System Interface Agent as well as any
additional Interface Agents available. This feature provides backward compatibility for users of
previous SightLine versions.
The default protomgr.conf file is quite large and consequently will not be listed here. In the
discussions below, appropriate portions of the file will be shown and referenced. Consult the
actual file for the definitive answer for any configuration questions.
Data Source Selection
Multiple data sources, such as different Interface Agents, or different configurations, can be
defined in the source.conf file. To discover data sources in EA/V, right-click on the connection
in the Enterprise View, and select Discover Data Sources. EA/V will automatically create new
entries for all data sources defined in the source.conf file. The source name defined in the
source.conf file will be passed to protomgr at the command line with the -S option.
Short ID Definition
SightLine uses a Short ID to identify each system when monitoring more than one host. The
Short ID is declared on the protomgr command line with the -s option.
Unless specified otherwise, SightLine automatically truncates the system name to its first three
characters for the Short ID. There may be times when this is undesirable, such as when two or
more systems have the same first three characters in their names. For example, a system named
TIGER1 and another named TIGER2 will both be truncated to TIG as a Short ID. In such a case,
SightLine will increment the last character of the Short ID of the system that is opened second
(for example, TIG and TIH). Because it is sometimes difficult to determine which connection was
established first, confusion may arise as to which system is actually being monitored. In our
example, TIG could refer to TIGER1 or TIGER2 depending on which connection was made first,
which could change each time a connection is established.
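The order dependence can be reproduced with a short model of the default behavior. This Python sketch is an illustration of the rule as described, not SightLine code, and the function name is hypothetical.

```python
def assign_short_ids(system_names):
    """Assign default Short IDs in the order the connections are opened."""
    assigned = {}
    used = set()
    for name in system_names:
        short = name[:3]
        # On a collision, increment the last character until the ID is unused.
        while short in used:
            short = short[:-1] + chr(ord(short[-1]) + 1)
        used.add(short)
        assigned[name] = short
    return assigned


print(assign_short_ids(["TIGER1", "TIGER2"]))
print(assign_short_ids(["TIGER2", "TIGER1"]))
```

Reversing the connection order swaps which system receives TIG and which receives TIH, which is the ambiguity described above.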
To circumvent this possibility, a user-defined Short ID can be inserted on the command line. The
option can be entered in two ways.
1. In EA/V in the Options field of the Advanced Session Settings dialog box, as
   shown in the example below (see also the SightLine Expert Advisor/Vision User’s
   Guide).
2. In the source.conf file on the command line for protomgr.
If the entry is made in the Advanced Session Settings dialog box, then it is only effective for a
session launched with this specific Enterprise View. If the entry is made in the source.conf
file, then it is effective for every session, regardless of which PC client initiates it.
Examples:
1. Advanced Session Settings dialog box:
   Figure 8-1. Advanced Session Settings Dialog Box
2. Add the -s argument to the entry in the source.conf file:
   #   Source Name   Description   Options
   #
   SOURCE system "System" -s m9 ;
In both of these examples, the Short ID is configured to be “m9.” Other options are discussed in
Option Flags later in this chapter.
Collection Interval Definition
The collection interval is the period between data samples requested by the protomgr process.
The default interval is 30 seconds. You can change this parameter in the CINTERVAL statement
of the protomgr.conf file. The CINTERVAL statement defines the interval in an integer number
of seconds. The agentmgr will report values to the protomgr process at this interval, and
ultimately to EA/V as well.
To modify this parameter, simply change the value 30 to the integer number of seconds desired.
Example:
CINTERVAL 60
In this example, the interval is configured to be 60 seconds. Notice that no semicolon is required
after the statement.
NOTE: The collection interval defined in the protomgr.conf must agree with the collection
interval defined in the datamgr.conf file. A future release of the host software will permit
multiple differing intervals.
Network Address Translation (NAT) and Firewall Support
Firewall support is provided by the -F and -f options to protomgr. These options can be used to
specify the name and port numbers that protomgr uses to connect to EA/V.
In a NAT network configuration, the locally defined IP address of a machine may not be the same
when accessing the machine from outside the network. In this case, the -F option can be used to
specify the callback IP address of the host to EA/V.
-F host        specify outbound hostname
In a Firewall network configuration, you can control access to a network by limiting the number of
open ports. The -f option can be used to specify a range of ports open for use by SightLine.
-f port,port   specify outbound port range
These options can be specified in the source.conf file, or through EA/V’s Edit Connection |
Advanced Settings | Options.
Refer to the Option Flags section later in this chapter for a complete list of protomgr options.
Download Throttling
The rate at which data is sent from protomgr to EA/V can be limited by using the -T option
(throttle option) to protomgr, and by specifying the maximum number of characters per second to
be sent. The -T option also has the effect of limiting the CPU and disk activity during this
operation (with a corresponding increase in download time).
The -T option can be specified in the source.conf file, or through EA/V’s Edit Connection |
Advanced Settings | Options.
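The trade-off between the rate limit and download time can be estimated with simple arithmetic. The Python sketch below is only a model of rate limiting by characters per second; it does not describe how protomgr schedules its transfers, and the chunk size is an arbitrary assumption.

```python
def send_schedule(total_chars, max_chars_per_second, chunk_size=1024):
    """Return (start_offset_seconds, chunk_length) pairs within the rate limit."""
    schedule, sent = [], 0
    while sent < total_chars:
        length = min(chunk_size, total_chars - sent)
        # A chunk may start only once all earlier characters fit under the limit.
        schedule.append((sent / max_chars_per_second, length))
        sent += length
    return schedule


def minimum_download_seconds(total_chars, max_chars_per_second):
    """Lower bound on transfer time imposed by the rate limit alone."""
    return total_chars / max_chars_per_second


print(send_schedule(2500, 1000, chunk_size=1000))
print(minimum_download_seconds(2500, 1000))
```

Halving the limit doubles the minimum download time, which is the increase in download time mentioned above.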
Enable/Disable Metric Selection
The ENABLE and DISABLE sections in protomgr.conf allow you to control the metrics that are
collected by agentmgr. The ENABLE section specifies a list of metric classes to be requested by
protomgr from agentmgr and datamgr. The DISABLE section omits classes from the request. In
both sections, you can specify subclasses to enable or disable subsections of a metric class.
In the example below, the subclasses “Disks” and “IOPs” of “VOS System.IOPs” have been
disabled. The internal names can be found in the registry.csv file in the
FRTLHOME>data>Local directory.
#
# The ENABLE section specifies what data gets requested from the agentmgr.
#
# ENABLE { General }, { VOS System }, { Workloads }
# ENABLE ALL
#
ENABLE { VOS System }, { General }, { CPU Extra }, { DiskQueue }
ENABLE { PathQueue }, { Process Info }, { Workloads }
#
# The DISABLE section specifies what data is not requested from the agentmgr.
# (Evaluated after ENABLE section.)
#
DISABLE { VOS System.IOPs.Busses.Controllers.Disks }
DISABLE { VOS System.IOPs }
Exclusion Section
The EXCLUDE section specifies a list of metrics to be excluded from sending to EA/V.
Complete Data Groups and Event Classes may be excluded, as well as individual metrics. The
names used in this section are the metric names as they appear in EA/V.
Data Groups and Event Classes can be turned on and off, giving you the ability to control exactly
which metrics are being reported and analyzed by EA/V. The ability to control the metrics that
protomgr sends to EA/V allows you to configure the software specifically to your environment.
The syntax for EXCLUDE is as follows:
#
# The EXCLUDE section specifies what data is suppressed from being sent to
# Sightline Expert Adviser.
#
# Data groups may be suppressed with:
#     DATAGROUP "Module CPU Utilization"
#
# Individual metrics may be suppressed with:
#     DATAGROUP "Module CPU Utilization" DATAVARIABLE "CPU % Idle"
#
# Likewise, eventscope classes and columns may be suppressed with:
#     EVENTCLASS "Summary"
#     EVENTCLASS "Summary" EVENTCOLUMN " %Cpu"
#
# The EXCLUDE section takes precedence over all definitions in DATA and
# EVENT sections. (Evaluated after DISABLE section.)
#
# EventScope column names must match exactly as listed in the
# $FRTLHOME>data>*>registry.csv file(s).
#
EXCLUDE
    DATAGROUP "Module Processes"
    DATAGROUP "Module Memory" DATAVARIABLE "Mem Sys Page Faults/Sec"
    EVENTCLASS "Identification"
    EVENTCLASS "I/O" EVENTCOLUMN " Reads"
end EXCLUDE
Data Redefinition
The DATA and EVENT sections of the protomgr.conf file specify redefinition of metrics, by
either renaming, rescaling, or repositioning existing metrics prior to sending them to EA/V.
DATA section metrics are organized in groups. Array metrics in the DATA section require
specifying the source of the subscript names.
#
# GROUP "Module Disk Info"
#     ARRAY NAME = { VOS System.IOPs.Busses.Controllers.Disks.Name }
#         VARIABLE { Disk Size MB } POSITION { 4 } =
#             { VOS System.IOPs.Busses.Controllers.Disks.Size, mbytes, delta }
#     end ARRAY
# end GROUP
#
The DATA section recognizes tokens GROUP, ARRAY, NAME, VARIABLE, and POSITION. The
EVENT section recognizes tokens CLASS, COLUMN, and POSITION.
The left side of each VARIABLE and COLUMN line defines the metric name as it will appear
in EA/V, followed by its position in the EA/V Variable List or EventList.
#
# CLASS "Identification"
#     COLUMN " Pid" POSITION { 1 } = { VOS System.Processes.Pid }
# end CLASS
#
The variable (metric) position defines the ordered location within the Group or Event Class.
If the POSITION setting is omitted or set to zero, the metric is added to the end of the list.
It is not necessary to redefine the entire GROUP or CLASS when modifying or adding metrics.
The internal metric names (on the right side of VARIABLE and COLUMN lines) can be found in the
file(s) $FRTLHOME>data>*>registry.csv. The internal metric name can be followed by one
or more of the following to adjust the way the metric is presented in EA/V. (These words are not
case sensitive.)
Forms:
count - Display the value of a cumulative counter. (If the metric indicates a system activity,
        this value will continually increase.)
delta - Display the difference in the value of a counter from the preceding sample. (The metric
        will show 'operations per interval'.)
rate -  Display the difference in the value of a counter from the preceding sample, scaled by
        the time since the preceding sample. (The metric will show 'operations per second'.)
raw -   Display the raw value, as collected by the agentmgr, without modification.
Scaling:
blocks gb gbytes kb kbytes mb mbytes milli noscale (or just a number: 1024, -30, +50)
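The difference between the forms can be illustrated with two successive samples of a counter. The Python sketch below is not part of SightLine; the function name, the sample tuples, and the treatment of a numeric scale as a divisor (suggested by the use of 60000 to turn milliseconds into minutes in the sample DATA section) are assumptions made for illustration.

```python
def apply_form(form, prev, curr, divisor=1.0):
    """Compute a displayed value from two (time_seconds, counter_value) samples.

    form    -- one of 'count', 'delta', 'rate', 'raw' (not case sensitive)
    divisor -- assumed numeric scale; the collected value is divided by it
    """
    form = form.lower()
    (t0, v0), (t1, v1) = prev, curr
    if form == "raw":
        return v1                      # as collected, without modification
    if form == "count":
        value = v1                     # cumulative counter, keeps increasing
    elif form == "delta":
        value = v1 - v0                # operations per interval
    elif form == "rate":
        value = (v1 - v0) / (t1 - t0)  # operations per second
    else:
        raise ValueError("unknown form: " + form)
    return value / divisor


# Hypothetical samples taken 10 seconds apart.
earlier, later = (0, 100), (10, 160)
print(apply_form("delta", earlier, later))
print(apply_form("rate", earlier, later))
```

With these samples, delta reports the 60 operations that occurred in the interval, while rate divides that difference by the 10 elapsed seconds.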
The DATA section specifies redefinitions of what will appear in EA/V.
#DATA
# # Report the change per interval in free memory in megabytes
#
# GROUP "Module Memory"
#     VARIABLE { Mem Free Pages } POSITION { 3 } =
#         { VOS System.CPU.Free Memory, mbytes, delta }
# end GROUP
#
# # Use 60000 because the base time unit is milliseconds and use
# # 'count' to get an ever-increasing value
#
# GROUP "Module CPU Utilization"
#     VARIABLE { CPU Idle Minutes } POSITION { 2 } =
#         { VOS System.CPU.Empty Idle, 60000, count }
# end GROUP
#
# # Computed metrics may be combined with internal metrics, and multiple
# # arrays may be added to the same group in the DATA section.
#
# GROUP "Module Disk Units"
#     ARRAY NAME = { VOS System.IOPs.Busses.Controllers.Disks.Name }
#         VARIABLE { Disk Rd % Busy } POSITION { 1 } =
#             { VOS System.IOPs.Busses.Controllers.Disks.% Read Queue Busy Time }
#     end ARRAY
#     ARRAY NAME = { DiskQueue.Name }
#         VARIABLE { Disk % Busy } POSITION { 1 } =
#             { DiskQueue.Disk % Busy }
#     end ARRAY
# end GROUP
#
#end DATA
The EVENT section specifies redefinitions of what will appear in the EventList windows in EA/V. It
is important that the number and placement of the spaces in the COLUMN name matches exactly
the COLUMN names as listed in the file $FRTLHOME>data>Local>registry.csv. Duplicate
entries may result if the placement of the spaces differs.
All column metrics must originate from the same internal class in the EVENT section. Computed
metrics may not be combined with internal metrics within Event Classes.
#EVENT
#
# CLASS "Summary"
#     COLUMN " Reads" POSITION { 4 } = { VOS System.Processes.Reads, 10000 }
# end CLASS
#
#end EVENT
Event Data
The EventList Window section of the SightLine Expert Advisor/Vision User’s Guide describes
EA/V’s EventList window. For VOS systems, Summary, CPU, Memory, I/O, and
Identification Event Classes are delivered by default. Each Event Class displays
information about a select group of processes on the system for each interval. Figure 8-2 shows
an example of the Summary EventList display.
Figure 8-2. Summary Event Class Data
Passing protomgr Command Options
As mentioned earlier, command options can be passed from EA/V on the PC in the Options field
of the Advanced Session Settings dialog box (Figure 8-1).
The protomgr command options allow users to configure more complex working environments,
such as user-defined Short IDs. See the section Option Flags for a complete list of protomgr
options.
Option Flags
The user-configurable command line options for protomgr can be displayed by issuing the
following command:
protomgr -h
The options are listed below.
usage: protomgr [-dhv] -c host -P port -V host -K key [-C version]
[-E version] [-m host] [-n name] [-p port] [-q port]
[-D database] [-s shortId] [-x1] [-x2] [-x3] [-x4]
[-S source] [-i interval] [-e maxEvent]
[-F host] [-f port,port ]
[livewanted={t|f|y|n}]
-d                  turn on debugging
-h                  display this message
-v                  display version information
-C version          version of SightLine compression to use
-c host             specify alternate agentmgr host
-D database         name of datamgr database to read
-E version          version of SightLine encryption to use
-e maxEvent         specify maximum limit to EventList in bytes
-F host             specify outbound hostname
-f port,port        specify outbound port range
-K key              specify SightLine key
-i interval         specify collection interval (bypass config file)
-m host             specify alternate datamgr host
-n name
-P port             specify SightLine 'answer' port
-p port             specify alternate agentmgr port
-q port
-r                  allow non-updated variables
-S source           specify data source (as defined in source.conf)
-s shortId          specify short ID for host
-T                  throttling algorithm
-t                  throttling algorithm on reading side
-V host             specify SightLine 'answer' host
-x1, -x2, -x3, -x4  allow long subscripts / restart on normal exit /
                    terminate without verifying done
livewanted=         control whether or not protomgr switches to live data
                    after sending historical data
Chapter 9
Command Syntax — Quick Reference
The following sections show the help output from each SightLine Power Agent process.
Agentmgr
Usage: agentmgr [-dfhuv] [-n name] [-p port] [-z level]
-d
-f          run in foreground
-h
-u          do not compress data (same as -z 0)
-v
-n name     specify alternate conf and log file name
-p port     specify TCP port to listen on
-z level    specify compression level (0=off, 9=max)
Datamgr
Usage: datamgr [-dfhv] [-q port] [-n name] [-O output_file_number]
-d          turn on debugging
-dd         turn on an additional level of debugging
-f          run in foreground
-h
-v
-q port
-n name
-O number   suffix to the datamgr.out file
Threshd
Usage: threshd [-bdfhvM] [-n name]
-d
-f          run in foreground
-h
-v          show version information
-b          swap IP address byte order in SNMP trap header
-n name
-M          generate SNMP MIB
Servd
Usage: servd [-dfhv] [-p port]
-d          turn on debugging (d can be repeated)
-f          run in foreground
-h
-v
-p port     specify alternate TCP/UDP port
Protomgr
usage: protomgr [-dhv] -c host -P port -V host -K key [-C version]
[-E version] [-m host] [-n name] [-p port] [-q port]
[-D database] [-s shortId] [-x1] [-x2] [-x3] [-x4]
[-S source] [-i interval] [-e maxEvent]
[-F host] [-f port,port ]
[livewanted={t|f|y|n}]
-d                  turn on debugging
-h                  display this message
-v                  display version information
-C version          version of SightLine compression to use
-c host             specify alternate agentmgr host
-D database         name of datamgr database to read
-E version          version of SightLine encryption to use
-e maxEvent         specify maximum limit to EventList in bytes
-F host             specify outbound hostname
-f port,port        specify outbound port range
-K key              specify SightLine key
-i interval         specify collection interval (bypass config file)
-m host             specify alternate datamgr host
-n name
-P port             specify SightLine 'answer' port
-p port             specify alternate agentmgr port
-q port
-r                  allow non-updated variables
-S source           specify data source (as defined in source.conf)
-s shortId          specify short ID for host
-T                  throttling algorithm
-t                  throttling algorithm on reading side
-V host             specify SightLine 'answer' host
-x1, -x2, -x3, -x4  allow long subscripts / restart on normal exit /
                    terminate without verifying done
livewanted=         control whether or not protomgr switches to live data
                    after sending historical data
slagent Command Macro
Usage: slagent action_string [-da] [-dd] [-ds] [-dt]
       { start | stop | restart | status }
-da         starts agentmgr in debug mode
-ds         starts
-dd         starts datamgr in debug mode
-dt         starts
start       starts all SightLine Power Agent processes
stop        stops all SightLine Power Agent processes
restart     stops and then restarts all SightLine Power Agent processes
status      reports the status of all SightLine Power Agent processes
Chapter 10
Troubleshooting
This chapter provides help in troubleshooting problems with the SightLine Power Agent software.
The table below is a list of common problems and suggested cures.
If these suggestions do not apply to your situation, you might be able to gain more insight by
inspecting the various log files. The following are the procedures that are used to determine the
cause of an error.
If you start up the SightLine Expert Advisor/Vision (EA/V) client software and cannot connect to
the host, inspect the protomgr log, which is located in the following directory:
FRTLHOME>log>protomgr.log
Based on messages in the log, inspect the logs for datamgr and agentmgr, located in the
following directory:
FRTLHOME>log>datamgr.log
FRTLHOME>log>agentmgr.log
Further debug messages can be supplied if the software components are run in debug mode. To
restart agentmgr and datamgr in debug mode, issue the following command:
slagent -da -dd start
The first parameter invokes the agentmgr in debug mode, while the second parameter invokes
datamgr. The easiest method for starting protomgr in debug mode is to set the option in EA/V,
under Advanced Session Settings Options.
Contact SightLine Technical Support for additional help.
Symptom: Agentmgr starts, then stops a few minutes later
Possible Cures:
• Check the agentmgr.log file; the key may have expired or may be invalid.
• The agentmgr socket may not have timed out; run netstat -na and see if port 8700 is in use.

Symptom: No data files are being created
Possible Cures:
• Check to make sure datamgr is running.
• Check the datamgr.log file.
• Restart datamgr.

Symptom: “Connection refused by host” in EA/V log file
Possible Cures:
• Ensure servd is running on the host.
• Check the servd.log file.
• Restart servd.

Symptom: “Remote closed session (received FD_CLOSE)” in EA/V log file
Possible Cures:
• Check the protomgr.log file.

Symptom: EA/V PC download stops at “Host is Version…”
Possible Cures:
• Agentmgr is not running, or there is a string mismatch between the HostName used in
  EA/V on the PC and in datamgr.conf.
• Attempt to reconnect.
• Started download before software was initialized; restart download.
• Check the log files for errors/warnings.
• Restart all processes.

Symptom: No Event data in EventList window
Possible Cures:
• No .vev file on PC.

Symptom: Not all processes starting
Possible Cures:
• Make sure the .pms exist in FRTLHOME>bin.
• Check the slagent.cm command macro in FRTLHOME>bin to ensure that frtldbm is set to 1.

Symptom: Log files contain the message “unable to bind address to socket...”
Possible Cures:
• Some ports that the software is trying to use are already occupied. Run the command
  netstat -na and see if the port is already being used. If the socket is in use by another
  application, the SightLine processes can be assigned to another port. See the chapter
  pertaining to the problem process for a description of how to change the port.

Symptom: Error message reads “No data available for # second interval” in datamgr.conf or EA/V
Possible Cures:
• The collection intervals in datamgr.conf and protomgr.conf do not match. Edit the
  protomgr.conf file to reflect the appropriate interval.

Symptom: New Workloads are not showing up in EA/V on PC
Possible Cures:
• Check the agentmgr.log file to see if there are any syntax errors in agentmgr.conf.
• Make sure you have stopped and restarted the software to recognize the changes.
• Make sure you have reinitialized the trace file on the PC in Create mode to force down
  the new symbol table.
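Several of the cures above involve checking whether a TCP port is already occupied. Where netstat output is awkward to read, the same question can be asked programmatically. The Python sketch below is an assumption-laden illustration, not a SightLine tool: it presumes a machine with Python available, the function name is hypothetical, and 8700 is used only because it is the port mentioned in the agentmgr entry.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. something holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
        return False


# 8700 is the agentmgr port mentioned in the troubleshooting entries.
print(port_in_use(8700))
```

A True result corresponds to the "unable to bind address to socket" condition; it does not identify which application holds the port, so netstat -na remains the more informative check.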
Appendix A
Analyze_system Interface
The SightLine Power Agent for VOS Systems has an interface to analyze_system that is installed
by default with the Power Agent software. This interface is initiated automatically when the
agentmgr process is started.
To configure the analyze_system interface, update the analyze_system.conf file in the
FRTLHOME>etc> directory.
If you do not want this interface, edit FRTLHOME>bin>slagent.cm and set frtlasi to 0.
Configuration
The analyze_system.conf file must be updated if you want to use the analyze_system
interface. This file contains many comments to aid you with your configuration. The default
analyze_system.conf file is shown here.
# ident "@(#)$Id: analyze_system.conf,v 1.1.2.1 2001/07/04 11:58:45 dsimmond Exp $
#
# PROGRAM is used to specify the program to run
# Full path name needed if cannot be found via start_up.cm
# of the user who started agentmgr
PROGRAM "analyze_system"
# PROMPT is the string that is sent back when the program is ready
# for more input
PROMPT "as: "
# When outputting to a pipe file, analyse_system does not flush
# the as: prompt until it has some more output. To overcome this dummy
# requests are sent. If this is fixed or you are using a different program
# which DOES flush the prompt, you can switch on NOFLUSH
#NOFLUSH
# ERRMATCH is used to check for errors. If the output from a request matches
# one of the ERRMATCH lines, the request has failed and will not be attempted
# at any further intervals.
ERRMATCH "Entry point name not found."
ERRMATCH "No program is currently loaded."
# There are two types of requests possible.
# Ones for array classes, and ones for scalar classes.
# The format of an array class is as follows:
# CLASS { Class Name } QUERY "query string" [MODULE "module name"]
# [MATCH "match string"] [NOTMATCH "notmatch string" ... ]
# [STARTLINE { start line }] [ENDLINE { end line }]
# ARRAY NAME COLUMN { array name column number }
#     VARIABLE variable_type { variable name } [GROUPNAME { group name }]
#         [PCNAME { pc name }] [POSITION { position }]
#         [INPFORM input_form] [INPSCALE input_scale] [OUTFORM output_form]
#         [OUTSCALE output_scale] [EVENT]
#         COLUMN { column number };
#     ...
# end ARRAY
# end CLASS
# Anything in square braces is optional.
# ... indicates more than one of the preceding is possible.
# The idea is that "query string" is sent to analyze_system and the output
# read back.
# Before that any "module name" or "match string"'s are sent.
# Any lines matching any "notmatch string"'s are ignored.
# if { start line } is specified, any lines before this number are skipped.
# e.g. if { start line } is { 2 }, the first line is skipped.
# if { end line } is specified, any lines after this number are skipped.
# if { end line } is negative, it is taken as the number of lines to skip at
# the end.
# e.g. STARTLINE { 2 } ENDLINE { 4 } will process lines 2,3 and 4.
# e.g. STARTLINE { 3 } ENDLINE { -1 } will skip the first two lines and the
# last one.
# Each line that is read in is broken into columns. Separators are space, tab
# and slash.
# { array name column number } specifies which column is the name of the array
# element, for example if you were listing disks, this would be the disk name.
# variable_type can be any one of: bool, float, float64, int, int32, int64,
# string, str8, uint, u_int, uint32, u_int32, uint64, u_int64.
# { variable name } is the internal name of the variable.
# { group name } is the group that the variable will appear under on the PC.
# { pc name } is the name of the variable on the PC.
# { position } is for ordering the variables on the PC.
# input_form and input_scale are the form and scale of the variable as it
# appears in the line read from analyze_system.
# output_form and output_scale are the form and scale of the variable as it
# should appear on the PC.
# Form can be any one of: count, zcount, delta, rate, raw, pctdur.
# Scale can be any one of: base, byte, blocks, pblocks, gb, gbytes,
# kb, kbytes, mb, mbytes, kilo, mega, giga, milli, second, seconds, minute,
# minutes, hour, hours, day, days, noscale.
# Specify EVENT if the variable is to appear in the event scope.
# { column number } specifies which column the variable appears in.
# Here is an example array class.
#CLASS { Channels } QUERY "dump_channels -meter" MATCH "term"
#    NOTMATCH "not asynchronous."
#    STARTLINE { 2 } ENDLINE { -1 }
#    ARRAY NAME COLUMN { 1 }
#        VARIABLE FLOAT { Ochars } GROUPNAME { Module Channels }
#            PCNAME { Ochars } POSITION { 2 } INPFORM count OUTFORM delta
#            COLUMN { 3 };
#        VARIABLE FLOAT { Ichars } GROUPNAME { Module Channels }
#            PCNAME { Ichars } POSITION { 1 } INPFORM count OUTFORM delta
#            COLUMN { 2 };
#    end ARRAY
#end CLASS
# The format of a scalar class is as follows:
# CLASS { Class Name } QUERY "query string" [MODULE "module name"]
# [MATCH "match string"] [NOTMATCH "notmatch string" ... ]
#     VARIABLE variable_type { variable name } [GROUPNAME { group name }]
#         [PCNAME { pc name }] [POSITION { position }]
#         [INPFORM input_form] [INPSCALE input_scale] [OUTFORM output_form]
#         [OUTSCALE output_scale] [EVENT]
#         LINE { line number } COLUMN { column number };
#     ...
# end CLASS
# Anything in square braces is optional.
# ... indicates more than one of the preceding is possible.
# The idea is that "query string" is sent to analyze_system and the output
# read back.
# Before that any "module name" or "match string"'s are sent.
# Any lines matching any "notmatch string"'s are ignored.
# Each line that is read in is broken into columns. Separators are space, tab
# and slash.
# variable_type can be any one of: bool, float, float64, int, int32, int64,
# string, str8, uint, u_int, uint32, u_int32, uint64, u_int64.
# { variable name } is the internal name of the variable.
# { group name } is the group that the variable will appear under on the PC.
# { pc name } is the name of the variable on the PC.
# { position } is for ordering the variables on the PC.
# input_form and input_scale are the form and scale of the variable as it
# appears in the line read from analyze_system.
# output_form and output_scale are the form and scale of the variable as it
# should appear on the PC.
# Form can be any one of: count, zcount, delta, rate, raw, pctdur.
# Scale can be any one of: base, byte, blocks, pblocks, gb, gbytes,
# kb, kbytes, mb, mbytes, kilo, mega, giga, milli, second, seconds, minute,
# minutes, hour, hours, day, days, noscale.
# Specify EVENT if the variable is to appear in the event scope.
# { line number } specifies which line the variable appears in.
# { column number } specifies which column the variable appears in.
# Here is an example scalar class.
#CLASS { Cache Meters } QUERY "cache_meters" MODULE "%es#m18"
#    VARIABLE FLOAT { File Hits } GROUPNAME { Module Cache }
#        PCNAME { File Hits/Sec } POSITION { 1 } INPFORM count OUTFORM rate
#        LINE { 3 } COLUMN { 3 };
#    VARIABLE FLOAT { Directory Hits } GROUPNAME { Module Cache }
#        PCNAME { Directory Hits/Sec } POSITION { 2 } INPFORM count OUTFORM rate
#
#    VARIABLE FLOAT { Directory Misses } GROUPNAME { Module Cache }
#        PCNAME { Directory Misses/Sec } POSITION { 3 } INPFORM count OUTFORM rate
#
#end CLASS
CLASS { IOP Meters } QUERY "use_iop 10 –file
(master_disk)>system>prom_code>K6000fw18.0rom;dump_iop_meters"
VARIABLE FLOAT { CmdsSent2 } GROUPNAME { Module AS }
PCNAME { CmdsSentRate} POSITION { 2 } INPFORM count OUTFORM rate
VARIABLE FLOAT { IdleSecsDel } GROUPNAME { Module AS }
PCNAME { IdleSecsDel } POSITION { 2 } INPFORM count OUTFORM delta
end CLASS
CLASS { Cache Meters } QUERY "use_module;cache_meters"
VARIABLE FLOAT { File Hits } GROUPNAME { Module ASCache }
PCNAME { File Hits/Sec } POSITION { 1 } INPFORM count OUTFORM rate
VARIABLE FLOAT { Directory Hits } GROUPNAME { Module ASCache }
PCNAME { Directory Hits/Sec } POSITION { 2 } INPFORM count OUTFORM rate
VARIABLE FLOAT { Directory Misses } GROUPNAME { Module ASCache }
PCNAME { Directory Misses/Sec } POSITION { 3 } INPFORM count OUTFORM rate
end CLASS
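The STARTLINE/ENDLINE and column rules in the file comments can be hard to picture. The Python sketch below mimics them on a list of output lines; it is an illustration only and not the interface's actual parser. The function names and the sample text are hypothetical, the positive ENDLINE behavior follows the "STARTLINE { 2 } ENDLINE { 4 }" example in the comments, and collapsing runs of separators into one is an assumption.

```python
import re


def select_lines(lines, start_line=None, end_line=None):
    """Keep the lines chosen by STARTLINE/ENDLINE (line numbers are 1-based)."""
    first = (start_line - 1) if start_line else 0
    if end_line is None:
        last = len(lines)
    elif end_line < 0:
        last = len(lines) + end_line   # skip that many lines at the end
    else:
        last = end_line                # keep lines up to and including end_line
    return lines[first:last]


def split_columns(line):
    """Break a line into columns; separators are space, tab and slash."""
    return [col for col in re.split(r"[ \t/]+", line) if col]


def column(line, number):
    """Return the 1-based column of a line, as COLUMN { number } would."""
    return split_columns(line)[number - 1]


# Hypothetical program output, one entry per line.
output = ["header", "term.1 120/45", "term.2 300/10", "term.3 7/2", "trailer"]
print(select_lines(output, 2, 4))
print(select_lines(output, 3, -1))
print(column("term.1 120/45", 3))
```

In the example array class, ARRAY NAME COLUMN { 1 } would correspond to column(line, 1) of each selected line.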
Analysis
version 6.1
Contents
Introduction
Sample Environment: VOS.VEN
    Main Page
        CPU Utilization Plot
        Memory Pages in Use and Free Plot
        I/O Rate Plot
        CPU Usage by Workload Plot
        Process States TopList
    CPU Page
        Avg CPU Response Plot
        Queue Meter Seconds Plot
        CPU Completions and Interrupts Plot
        CPU % Busy and % Wait Plot
        Active CPUs Plot
    Memory Page
        Total Memory and Free Pages Plot
        Paging File Usage Plot
        Page Faults by Type Plot
        Cache Hit Rate Plot
        Wired/Unwired Pages Plot
        Workload Memory Usage TopList
    I/O Page
        Disk I/O Plot
        Disk % Busy Plot
        Disk I/O TopList
        File I/O Plot
        Disk Free Space (MB) Plot
        Wkld Disk I/O TopList
        Member Count Plot
        % Free Space Plot
        Avg Q Length by Disk Plot
    Cache Page
        Cache Activity and Hit Rate Plot
        Cache Soils Plot
        Workload Cache Usage TopList
    Workloads Page
        Task Count by Workload Plot
        Workloads TopList
        Wkld Memory Usage Plot
        Reads by Workload Plot
        Writes by Workload Plot
Sample AutoAlert System: VOS.VTH
    Memory % Free
    Page File % Free
    Cache Read Hit %
    Page Fault Rate
    CPU % Other
    CPU % Interrupts
    Total CPU Utilization
    CPU Wait-Busy Ratio
AutoAnalyze Rules and Reports
    Disk-Busy
    CPU-Waiting Exceeds Running
    Disk-Error(s) Detected
    CPU-Busy
    CPU-Too many Interrupts
    CPU-High Scheduler Overhead
    Memory-High Page Fault Rate
    Cache-Low Hit Rate
    Disk-File Sys Free Space Low
    Memory-Pagefile Space Low
    Memory-Low Free Space
Analysis
This section of the SightLine Power Agent for VOS Systems User’s Guide describes the sample
files provided on your installation media. By using these examples to “jump start” your SightLine
sessions, you’ll spend less time learning how SightLine works and more time learning how
SightLine works for you.
Introduction
The sample file collection on your distribution package includes several types of SightLine Expert
Advisor/Vision (EA/V) files that you can use on your system:
• Environments (file extension .VEN): A set of plots and TopLists on one or many pages.
• Threshold Systems (file extension .VTH): A set of conditions that, when violated, cause
  EA/V to display a message and, optionally, take other action.
• AutoAnalyze rules and reports: A set of rules, implemented using expressions (file
  extension .VEX), and reports (Word .RTF files with linked EA/V plots (.VPL)) that
  summarize the activity and performance of your VOS system.
Each of these is described in the remaining sections of this chapter.
Sample Environment: VOS.VEN
The sample environment provided with the EA/V installation kit, VOS.VEN, gives you a good
starting point for performance analysis. Use it as delivered, or modify it to build a library of
analytical views that fits the specific needs of your own computing environment.
By default, a copy of this environment is loaded for each live connection. To manually load an
environment, click File | Open | Environment to bring up the Open Environment dialog box.
Then, select the filename of the environment you want to load. At the bottom of the dialog box,
choose the system to which you want the environment to apply (Force into Trace File System)
and the environment load options.
The following sections describe the sample environments shipped with your EA/V installation kit.
The descriptions follow this format:
Page Name: The name of the page and a brief description of its intended purpose and the plots
it contains.
Plot Title: If there are multiple plots on the page, each will have its own descriptive paragraph.
This is usually the text from the plot title bar and a brief description of the issue that it addresses,
the metrics it contains, how it is formatted, and how to use it to interpret your system’s condition.
The VOS.VEN environment uses plots to give you a high-level overview of the system resources
CPU, Memory, I/O, and Workload. The VOS environment uses EA/V’s drill-down feature to help
you quickly pinpoint system problems. It consists of six pages, each with multiple plots and
TopLists. Use it as a launch point for analysis or as a general purpose screen to keep up at all
times. You can customize it to suit your site’s needs.
The drill-down links exist in each plot on the Main Page, and on selected plots on the CPU and
Memory pages, as described in the diagram below. To activate the drill-down links, double-click
on the active plot when the cursor looks like a little hand. If you double-click anywhere else, you
will return to the Main page of the environment.
[Diagram: drill-down links among the Main, CPU, I/O, Workloads, Memory, and Cache pages]
Main Page
The Main Page of VOS.VEN contains four plots and a minimized TopList window. This is the top
level page of the VOS environment. Each plot contains drill-down links to pages detailing that
plot’s contents. Double-click on a plot when the cursor looks like a small hand, and EA/V will
present the page that contains the detailed display you need to identify potential problems before
they seriously affect your system. For more information on EA/V’s drill-down feature, see the
topic, Fixed Section, Page, in the SightLine Expert Advisor/Vision User’s Guide.
The CPU Utilization plot (Figure 1) shows how much of your total CPU resource is being
utilized. The CPU resource is broken down into the components displayed by the
display_system_usage –long VOS command.
Figure 1. CPU Utilization Plot
CPU % System tells you what percentage of the CPU resource all the system processes are
using.
CPU % User tells you what percentage of the CPU resource all the user processes are using.
CPU % Server tells you what percentage of the CPU resource all the server processes are using.
CPU % Interrupts tells you what percentage of the CPU resource was used to handle the
interrupts on a module.
CPU % User PF tells you what percentage of the CPU resource the page faulting activity done by
all the user processes is consuming.
CPU % System PF tells you what percentage of the CPU resource the page faulting activity done
by all the system processes is consuming.
CPU % Server PF tells you what percentage of the CPU resource the page faulting activity done
by all the server processes is consuming.
CPU % Other tells you what percentage of the CPU resource was spent in the scheduler.
The Memory Pages in Use and Free plot (Figure 2) shows how much of your total memory
resource is being utilized. The memory resource is broken down into the components described
below.
Figure 2. Memory Pages in Use and Free Plot
Mem System-Cache Pages is an expression metric (“Mem System Pages” – “Mem Cache Phys
Pages”) that tells you how many pages the VOS operating system is using, not including the
pages in use by the Cache Manager.
Mem Cache Phys Pages tells you how many pages of memory the cache manager is using.
Mem User Pages is an expression metric (“Mem Total Pages” – “Mem Free Pages” – “Mem
System Pages”) that tells you how many pages of memory user processes are using.
Mem Free Pages tells you how many pages of memory are currently free (unassigned to the OS
or to any process).
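The two expression metrics in this plot are simple differences of collected metrics. The Python sketch below restates them; the function name and the page counts are hypothetical and serve only to show that the four plotted components add up to the total.

```python
def memory_breakdown(total_pages, free_pages, system_pages, cache_phys_pages):
    """Return the four components shown in the Memory Pages in Use and Free plot."""
    return {
        # "Mem System Pages" - "Mem Cache Phys Pages"
        "Mem System-Cache Pages": system_pages - cache_phys_pages,
        "Mem Cache Phys Pages": cache_phys_pages,
        # "Mem Total Pages" - "Mem Free Pages" - "Mem System Pages"
        "Mem User Pages": total_pages - free_pages - system_pages,
        "Mem Free Pages": free_pages,
    }


# Hypothetical module: 1000 pages total, 200 free, 300 system (100 of them cache).
parts = memory_breakdown(1000, 200, 300, 100)
print(parts)
print(sum(parts.values()))
```

Because the cache pages are subtracted from the system pages before plotting, the stacked components account for every page exactly once.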
I/O Rate Plot
The I/O Rate plot (Figure 3) shows the overall I/O activity on the system, broken down into the
components described below.
Figure 3. I/O Rate Plot
Disk User Reads/Sec tells you the number of reads per second that all the user processes were
charged with. A process is charged for a read when it requests a block of disk that is not in the
cache manager and, thus, a physical I/O is required.
Disk User Writes/Sec tells you the number of writes per second that all the user processes were
charged with. A process is charged for a write when it is the first process to modify (dirty) a block
in the cache manager and, thus, it needs to be written out sometime in the future.
Disk Sys Reads/Sec tells you the number of reads per second that all the system processes
were charged with. A process is charged for a read when it requests a block of disk that is not in
the cache manager and, thus, a physical I/O is required.
Disk Sys Writes/Sec tells you the number of writes per second that all the system processes
were charged with. A process is charged for a write when it is the first process to modify (dirty) a
block in the cache manager and, thus, it needs to be written out sometime in the future.
Disk Svr Reads/Sec tells you the number of reads per second that all the server processes
(StrataLink, StrataNet, and OSL) were charged with. A process is charged for a read when it
requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.
Disk Svr Writes/Sec tells you the number of writes per second that all the server processes
(StrataLink, StrataNet, and OSL) were charged with. A process is charged for a write when it is
the first process to modify (dirty) a block in the cache manager and thus it needs to be written out
sometime in the future.
The CPU Usage by Workload plot (Figure 4) shows the overall CPU usage, broken down by the
workloads as defined in the FRTLHOME>etc>agentmgr.conf file. (You should modify
agentmgr.conf to capture the work done by your applications. See Chapter 4 of the Power
Agent section of this User’s Guide for a description of defining workloads.) Process and workload
metrics are all on a per CPU basis, so this plot can scale from 0 to 100 * Number of CPUs.
Figure 4. CPU Usage by Workload Plot
Wkld % CPU for a workload tells you what percentage of the total available CPU resource the
workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.
The Process States TopList (Figure 5) shows the total number of processes active, broken down
by the components described below.
Figure 5. Process States TopList
Procs Count for WaitShrt tells you how many processes are in a short wait state. A process in
the wait state (state = 4, also called short wait) is waiting for some action to occur; typically, it is
waiting for some I/O.
Procs Count for Rdy tells you how many processes are in a ready state. A process in the ready
state (state = 1) is either executing or waiting to execute in the CPU queue.
Procs Count for Frozen tells you how many processes are in a frozen state. A process is put in
a frozen state (state = 2) when a privileged user performs a freeze_process command for that
process. The thaw_process command restores a process to the ready or waiting state.
Procs Count for Stopped tells you how many processes are in a stopped state. A process in the
stopped state (state = 0) has terminated but is waiting for VOS to destroy it.
CPU Page
The CPU Page is intended to provide a detailed look at the activity and performance of the
processors. It contains six plots.
The Avg CPU Response plot (Figure 6) reports the average number of milliseconds per visit to
the CPU. The total response time (CPU Residence Msec/Completion) and its components (CPU
Wait and CPU Busy) are shown.
Figure 6. Avg CPU Response Plot
CPU Resident Msec/Completion is an expression metric (1000 * “CPU Residence Time Secs” /
(“CPU Completes/Sec” * “Interval”)) that reports the total number of milliseconds an average visit
to the CPU took. This includes processing and waiting time.
CPU Busy Msec/Completion is an expression metric (1000 * “CPU Busy Time Secs” / (“CPU
Completes/Sec” * “Interval”)) that reports the average number of milliseconds of processing time
per visit to the CPU, excluding time spent waiting.
CPU Wait Msec/Completion is an expression metric (1000 * “CPU Wait Time Secs” / (“CPU
Completes/Sec” * “Interval”)) that reports the average number of milliseconds the process waited
for a visit to the CPU.
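As a rough illustration, the three expression metrics above can be reproduced from raw samples. The following Python sketch is not SightLine code: the function name and the sample values are hypothetical, and it assumes that “Interval” is the length of the sample in seconds and that residence time is the sum of busy time and wait time.

```python
def msec_per_completion(time_secs, completes_per_sec, interval_secs):
    """Apply 1000 * time / (completes/sec * interval) to one sample."""
    completions = completes_per_sec * interval_secs
    if completions == 0:
        return 0.0  # assumption: report 0 when nothing completed
    return 1000.0 * time_secs / completions


# Hypothetical 10-second sample: 400 completions/sec,
# 6 seconds of CPU busy time and 2 seconds of CPU wait time.
busy_ms = msec_per_completion(6.0, 400.0, 10.0)
wait_ms = msec_per_completion(2.0, 400.0, 10.0)
residence_ms = msec_per_completion(6.0 + 2.0, 400.0, 10.0)

print(busy_ms, wait_ms, residence_ms)  # 1.5 0.5 2.0
```

In this sample, an average visit to the CPU took 2 milliseconds, of which 0.5 milliseconds was spent waiting.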
The Queue Meter Seconds plot (Figure 7) reports the number of seconds during the interval that
the CPU was busy and the number of seconds that processes spent waiting for the processor.
Figure 7. Queue Meter Seconds Plot
CPU Wait Time Secs tells you how many seconds all the processes on the module were waiting
just for the CPU. This meter shows the delay a lack of CPU resources is imposing on the active
processes. There is no upper bound for this meter as (within certain limits) you can keep adding
processes to an overwhelmed 100% CPU busy system and they too will sit in the queue for the
CPU and push this meter higher.
CPU Busy Time Secs tells you how many seconds all the processes on the module were
executing on a CPU. The largest possible value here is the sample period (CPU Queue
Meter_ET) times the number of logical CPUs (CPU Logical Cpus). At that point, the module
would be 100% busy.
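The upper bound described above is simple arithmetic. The sketch below uses hypothetical function names and sample values. Turning the bound into a busy fraction is an assumption made for illustration, not the documented definition of any SightLine metric.

```python
def max_busy_secs(sample_period_secs, logical_cpus):
    """Largest possible CPU Busy Time Secs for one sample."""
    return sample_period_secs * logical_cpus


def busy_fraction(busy_secs, sample_period_secs, logical_cpus):
    """Busy time as a fraction of the largest possible value (assumption)."""
    return busy_secs / max_busy_secs(sample_period_secs, logical_cpus)


# Hypothetical module: 4 logical CPUs, 10-second sample, 30 busy seconds.
print(max_busy_secs(10.0, 4))        # 40.0
print(busy_fraction(30.0, 10.0, 4))  # 0.75
```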
The CPU Completions and Interrupts plot (Figure 8) reports the rate of CPU completions (visits)
and the rate of CPU interrupts.
Figure 8. CPU Completions and Interrupts Plot
CPU Completes/Sec tells you how often a process on this module loses the CPU it was running
on. A process can lose a CPU because it terminates, decides to wait for some event, waits for an
IO, or because some other process with a better priority takes the CPU away.
CPU Interrupts/Sec tells you the number of interrupts per second on the module. Interrupts
generally happen when a comm or a disk controller has completed some operation the system
requested or is waiting for. There are some time-based interrupts, but for the most part, the
number of interrupts is determined by how much work you are pushing through the system.
The interrupt rate and the amount of work you are pushing through the module should correlate
well. If you notice a sudden jump in interrupts for a given workload, you may have a problem with
a comm line or some process is doing things in an inefficient manner (no wait IO, sending small
rather than large packets, etc.).
The CPU % Busy and % Wait plot (Figure 9) reports the percentage of time the CPU was busy
processing and the percentage of time that processes were waiting for the CPU.
Figure 9. CPU % Busy and % Wait Plot
CPU % Busy tells you what percentage of the time the CPU(s) on the module are busy. This meter
should vary from ~0% for an idle machine to ~100% for a machine that is consuming all of its
CPU power. All processes running on this module contribute to this total.
CPU % Wait tells you how much time all the processes spent waiting for the CPU, expressed as
a percentage. This value can grow above 100%, as more and more jobs waiting in the queue
contribute to overall wait time.
The CPU Usage by Workload plot (Figure 10) shows the overall CPU usage, broken down by the
workloads as defined in the FRTLHOME>etc>agentmgr.conf file. (You should modify
agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power
Agent section of this User’s Guide for a description of defining workloads.) Process and workload
metrics are all on a per CPU basis, so this plot can scale from 0 to 100 * Number of CPUs.
Wkld % CPU for a workload tells you what percentage of the total available CPU resource the
workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.
Active CPUs Plot
The Active CPUs plot (Figure 11) shows the average number of CPUs active during the interval.
Figure 11. Active CPUs Plot
CPU Logical CPUs tells you the number of logical CPUs the system has. It tells you how many
processes can be simultaneously executing. On a Stratus computer, there are multiple physical
CPU chips running in lockstep as one logical CPU to provide the hardware fault tolerance, thus
the distinction between logical and physical CPUs.
Memory Page
The Memory Page is designed to provide details about the activity and performance of memory. It
contains five plots and one TopList.
The Total Memory and Free Pages plot (Figure 12) shows the total number of 4096 byte pages
configured on the system, and the number of those which are free or unused.
Figure 12. Total Memory and Free Pages Plot
Mem Total Pages tells you how many pages of memory are installed on the module.
Mem Free Pages tells you how many pages of memory are currently free (unassigned to the OS
or to any process). Under normal circumstances, the system never runs out of free pages. When
the number of free pages gets down to about 2% of main memory, the paging daemon starts
creating new free pages by tossing old pages out of memory.
The Paging File Usage plot (Figure 13) shows the free and used pages for the Paging file.
Figure 13. Paging File Usage Plot
Paging Used Pages tells you the total number of pages in use in your paging partition and
paging file(s). For every page of memory in a user process there is allocated one page (block) of
paging partition space as insurance so VOS knows it has someplace to put the page if memory
gets full. Even if there is lots of room in memory this one to one allocation continues. Pages are
not written to disk unless there is a memory shortage.
Paging Free Pages tells you the total number of pages of free space in your paging partition and
paging file(s). For every page of memory in a user process there is allocated one page (block) of
paging partition space as insurance so VOS knows it has someplace to put the page if memory
gets full. Running out of paging space is bad because processes can die.
The Page Faults by Type plot (Figure 14) shows the rate at which page faults were generated,
broken down by the components described below.
Figure 14. Page Faults by Type Plot
Mem User Page Faults/Sec tells you the number of page faults the user processes (not VOS
system or server processes) were charged for during the sample. Specifically, this meter tells you
how many times a process discovered that a page it needs is not there. Many types of page faults
require no disk IO and can happen when there is plenty of free memory, so don’t automatically
assume that a high page fault rate means the module is low on memory.
Mem Sys Page Faults/Sec tells you the number of page faults the system processes (overseer,
rsn, tp_overseer, mail_handler, batch_overseer, cache_manager) were charged for during the
sample. Specifically, this metric tells you how many times a system process discovered that a
page it needs is not there. Many types of page faults require no disk IO and can happen when
there is plenty of free memory, so don’t automatically assume that a high page fault rate means
the module is low on memory. Since these processes don’t tend to page fault at all until there is a
real lack of memory, any page faulting here should get your attention focused on the memory
resource.
Mem Svr Page Faults/Sec tells you the number of page faults the server processes (link_server,
network_client, StrataNet, network_server, open_client, open_server, osl_server) were charged
for during the sample. Specifically, this metric tells you how many times a server process
discovered that a page it needs is not there. Many types of page faults require no disk IO and can
happen when there is plenty of free memory, so don’t automatically assume that a high page fault
rate means the module is low on memory. Since these processes don’t tend to page fault at all
until there is a real lack of memory, any page faulting here should get your attention focused on
the memory resource.
Cache Hit Rate Plot
The Cache Hit Rate plot (Figure 15) shows the percentage of reads that were satisfied by data
already in the cache.
Figure 15. Cache Hit Rate Plot
Cache Read Hit % is an expression metric (“Mem Cache Read Hits/Sec” / (“Mem Cache Read
Hits/Sec” + “Mem Cache Read Misses/Sec”) * 100). The disk cache is a part of the module main
memory that is set aside to provide a buffer between the physical disks and the applications.
When an application goes to read a record from a block(s) in cache, it either finds it there (a hit)
or it has to wait for the IO subsystem to bring it into memory (a miss). This metric reports the
percentage of requests that did not require an I/O (hits).
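The hit-rate expression can be checked by hand with a few lines of Python. This is a minimal sketch with a hypothetical function name and made-up rates. Returning None when there were no reads is an assumption, because the expression is undefined in that case.

```python
def cache_read_hit_pct(hits_per_sec, misses_per_sec):
    """Hits / (Hits + Misses) * 100, as in the expression above."""
    total = hits_per_sec + misses_per_sec
    if total == 0:
        return None  # assumption: no reads means no meaningful hit rate
    return hits_per_sec / total * 100


# Hypothetical sample: 375 hits/sec and 125 misses/sec.
print(cache_read_hit_pct(375.0, 125.0))  # 75.0
```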
The Wired/Unwired Pages plot (Figure 16) reports the number of 4096 byte pages that are wired,
unwired, and free.
Figure 16. Wired/Unwired Pages Plot
Mem Wired Pages tells you how many pages of memory are wired. A wired page is a page that
cannot be paged out. Critical sections of VOS (the paging code) are in wired memory. User code
cannot wire memory. This count does not include memory used by the cache manager (see
“Mem Cache Phys Pages”) or the pageable parts of VOS (see “Mem System Pages”).
Mem Unwired Used Pages is an expression metric (“Mem Total Pages” – “Mem Wired Pages” –
“Mem Free Pages”) that tells you how many pages of memory are unwired and in use. (Free
pages are also considered unwired.) An unwired page is the opposite of a wired page in that it is
a page that can be paged out.
The Workload Memory Usage TopList (Figure 17) reports memory statistics for the workloads
defined in FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture
the work done by your applications. See Chapter 4 of the Power Agent section of this User’s
Guide for a description of defining workloads.)
Figure 17. Workload Memory Usage TopList
Shared mem tells you the shared memory pages in use by the workload. When two or more
processes run the same .pm, they share one copy of each code page. Also, this meter counts the
pages of memory explicitly shared when processes use SVM (Shared Virtual Memory).
Unshared mem tells you the number of unshared memory pages in use by the workload. These
are typically pages of memory that hold the variables, arrays and data structures of the program.
NOTE
If two processes are running the same .pm, they share the code
pages unless the .pm is physically located on another module.
When that happens, each process stores a private copy of the
entire .pm in the paging area. This slows down startup and
wastes a lot of memory and paging space.
Flts/Sec tells you the page faults generated by the workload, expressed as a rate per second.
Page faults can be a symptom of insufficient memory, an application memory leak, or they can be
caused by an application design where memory is used for a short period and then returned, thus
causing page faults on each repeated allocate. Typically, if the page faulting is widespread
among most processes, there is a shortage of memory. If only one process is page faulting, that
can be fixed with a code change.
I/O Page
The I/O Page is designed to provide details about the activity and performance of the I/O
subsystem. It contains seven plots and two TopLists.
Disk I/O Plot
The Disk I/O plot (Figure 18) shows the overall I/O activity on the system.
Figure 18. Disk I/O Plot
Disk IOs/Sec tells you the total I/O rate: the sum of all the process reads and writes plus all the
I/Os generated by paging activity. A process is charged for a read when it requests a block of disk
that is not in the cache manager. A process is charged for a write when it is the first process to
modify (dirty) a block in the cache manager. This value is the same as what you get in the VOS
command display_system_usage.
Disk % Busy Plot
NOTE
This plot will probably need to be adjusted for your system. If
there are more than 32 disks on your module, the metrics will not
resolve. The scales may also need to be adjusted.
The Disk % Busy plot (Figure 19) shows the percentage of time during the interval that each disk
was busy.
Figure 19. Disk % Busy Plot
Disk % Busy tells you how busy individual disks are. Disks should not be run higher than 50%
busy for acceptable performance.
Disk I/O TopList
The Disk I/O TopList (Figure 20) reports various statistics for each disk drive on the system.
Figure 20. Disk I/O TopList
% Busy tells you how busy individual disks are. Disks should not be run higher than 50% busy for
acceptable performance.
I/O Rate is the rate of the total number of read and write operations to a given disk. This number
includes file and paging IO. This number does not include the extra IOs needed to do the verify
operation on the disks.
Avg Resp is the amount of time in milliseconds that a disk request took to complete for a given
disk. This time is composed of the average time it took to actually perform the disk IO (service
time) and the average time a given request had to wait because the disk was busy doing other
I/Os that were previously queued.
Avg Svc is the average amount of time in milliseconds that a disk took to perform a given IO.
Regardless of how busy a disk is, this number should be fairly constant as it ignores the time
spent waiting as previously queued requests are processed. Some small change might be seen if
application IOs happen to find the read/write heads on cylinder more often.
Avg Wait is the amount of time in milliseconds that the average IO request had to wait as
previously queued requests are processed. The busier the disk, the more likely this number will
increase (assuming multiple sources of requests). This meter shows you the “price” you pay for
running that disk that busy.
Q Length is the average number of IO requests in the queue for a given disk.
Degradation gives you an idea of the performance penalty you are paying as multiple processes
compete for the disk. It normalizes queue time with this basic formula:
Degradation = ((serv time + queue time) / serv time)
It will never be less than 1 (it equals 1 if queue time = 0), and will be 2 when queuing time equals
service time (50% busy in theory). An average queue time of 5 msec may be okay if serv
time = 30 msec (degradation = 1.17) but bad if serv time = 10 msec (degradation = 1.5). Numbers
above 1.5 are a flag of potential problems.
Concurrency is a number similar to utilization, but is based on the overall response time, not just
the service time ((serv time + queue time) / interval). The name implies it should give some
measure of “how many users are visiting the service center.” The number can grow above 1, as
queuing gets really bad. If the server is 100% busy (10 seconds busy time in a 10-second
interval), and there is an additional 5 seconds of queuing, the concurrency is 15/10 = 1.5, or on
average, there were 1.5 I/Os at the disk.
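The Degradation and Concurrency formulas can be tried out with the numbers used in the descriptions above. This is a minimal Python sketch; the function names are hypothetical and are not part of SightLine.

```python
def degradation(serv_time, queue_time):
    """(serv time + queue time) / serv time."""
    return (serv_time + queue_time) / serv_time


def concurrency(serv_time, queue_time, interval):
    """(serv time + queue time) / interval."""
    return (serv_time + queue_time) / interval


# Degradation: no queuing, queuing equal to service time,
# and a 5 msec queue against a 10 msec service time.
print(degradation(10, 0))   # 1.0
print(degradation(10, 10))  # 2.0
print(degradation(10, 5))   # 1.5

# Concurrency: 10 seconds busy plus 5 seconds queued in a 10-second interval.
print(concurrency(10, 5, 10))  # 1.5
```

The same 5 msec of queuing gives a much lower degradation against a 30 msec service time, which is why the metric is normalized by service time.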
File I/O Plot
The File I/O plot (Figure 21) shows the overall I/O activity on the system, broken down into the
components described below.
Figure 21. File I/O Plot
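Disk User Reads/Sec tells you the number of reads per second that all user processes were
charged with. A process is charged for a read when it requests a block of disk that is not in the
cache manager and thus a physical I/O is required.
Disk User Writes/Sec tells you the number of writes per second that all user processes were
charged with. A process is charged for a write when it is the first process to modify (dirty) a block
in the cache manager and thus it needs to be written out sometime in the future.
Disk Sys Reads/Sec tells you the number of reads per second that all system processes were
charged with. A process is charged for a read when it requests a block of disk that is not in the
cache manager and thus a physical I/O is required.
Disk Sys Writes/Sec tells you the number of writes per second that all system processes were
charged with. A process is charged for a write when it is the first process to modify (dirty) a block
in the cache manager and thus it needs to be written out sometime in the future.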
Disk Svr Reads/Sec tells you the number of reads per second that all server processes
(StrataLink, StrataNet, and OSL) were charged with. A process is charged for a read when it
requests a block of disk that is not in the cache manager and thus a physical I/O is required.
Disk Svr Writes/Sec tells you the number of writes per second that all server processes
(StrataLink, StrataNet, and OSL) were charged with. A process is charged for a write when it is
the first process to modify (dirty) a block in the cache manager and, thus, it needs to be written
out sometime in the future.
NOTE
The Disk Free Space (MB) plot (Figure 22) shows the number of megabytes available on each
disk.
Figure 22. Disk Free Space (MB) Plot
Disk Free Space (MB) is an expression metric (“Disk FSize MB[]” - “Disk FUsed MB[]”) that tells
you how much space (in megabytes) is available on each disk. Running out of free disk space
crashes applications. A sufficient reserve should be maintained.
The Wkld Disk I/O TopList (Figure 23) reports I/O statistics by the workloads defined in
FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work
done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a
description of defining workloads.)
Figure 23. Wkld Disk I/O TopList
% Busy shows the percentage of time all the processes that make up this workload were reading
from disk. It takes the Wkld Disk Rd Busy Time and expresses it as a percentage of the sample
interval.
Reads shows the total number of reads to disk for the file system during the interval for the entire
workload. A read is charged to a process when it does not find the record it is looking for in cache
and has to go all the way to disk to get the block(s) that contain the record.
Writes shows the total number of writes to disk for the file system during the interval for the entire
workload. A process is charged for a write when it is the first process to modify a block in the
cache manager. Until that modified block is written to disk, all subsequent writers to the file are
not charged for the write. So, if your application does 50 writes in one second to the same record
in a single disk block the application is only charged for one write. If lots of processes are writing
to that same record, only one will get charged for doing the first modification to that block.
Eventually, the cache manager will write that block out to disk and then the next process to write
to that block gets charged for a write.
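The charging rule for writes can be pictured with a toy model. The Python sketch below is only an illustration of the rule as described above. The class, the process names, and the block number are hypothetical, and it does not reflect how the VOS cache manager is actually implemented.

```python
class WriteChargeModel:
    """Toy model: only the first writer to a clean block is charged."""

    def __init__(self):
        self.dirty_blocks = set()  # blocks modified but not yet written out
        self.charges = {}          # process name -> number of charged writes

    def write(self, process, block):
        # A write is charged only if the block is not already dirty.
        if block not in self.dirty_blocks:
            self.dirty_blocks.add(block)
            self.charges[process] = self.charges.get(process, 0) + 1

    def flush(self, block):
        # The cache manager writes the block to disk; it is clean again.
        self.dirty_blocks.discard(block)


model = WriteChargeModel()
for _ in range(50):
    model.write("app_a", block=7)  # 50 writes to one block: charged once
model.write("app_b", block=7)      # block already dirty: not charged
model.flush(7)
model.write("app_b", block=7)      # first write after the flush: charged

print(model.charges)  # {'app_a': 1, 'app_b': 1}
```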
Res Time shows the total amount of time during the sample that all processes in the workload
were doing reads. This time includes the time waiting to read and the time spent actually doing
the reads.
Busy Time shows the total amount of time during the sample that all the processes in the
workload were actually reading from disk. This does not include the time they spent waiting for
their reads to start because I/Os were ahead of them in the queue.
Wait Time shows the total amount of time during the sample that all processes in the workload
were waiting to access the disk because other I/Os were ahead of them in the queue. This time
does not include the time spent actually doing the reads.
Member Count Plot
NOTE
The Member Count plot (Figure 24) is a minimized plot that shows the number of members
associated with each disk drive.
Figure 24. Member Count Plot
Disk Members reports the number of logical members on the disk.
% Free Space Plot
NOTE
The % Free Space plot (Figure 25) shows the percentage of space available on each disk.
Figure 25. % Free Space Plot
% Free Space is an expression metric (100 – ((“Disk FUsed MB[]” / “Disk FSize MB[]”) * 100)) that
tells you what percentage of space is available on each disk. Running out of free disk space
crashes applications. A sufficient reserve should be maintained.
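The two free-space expressions, Disk Free Space (MB) and % Free Space, can be worked through for a couple of disks. In this Python sketch the function names, disk names, and sizes are hypothetical; only the arithmetic comes from the expressions in this guide.

```python
def free_space_mb(fsize_mb, fused_mb):
    """Disk FSize MB - Disk FUsed MB."""
    return fsize_mb - fused_mb


def pct_free_space(fsize_mb, fused_mb):
    """100 - ((Disk FUsed MB / Disk FSize MB) * 100)."""
    return 100 - (fused_mb / fsize_mb) * 100


# Hypothetical disks: (size in MB, used in MB).
disks = {"disk_1": (8000, 6000), "disk_2": (4000, 1000)}
for name, (size, used) in disks.items():
    print(name, free_space_mb(size, used), pct_free_space(size, used))
```

Here disk_1 has 2000 MB (25%) free and disk_2 has 3000 MB (75%) free.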
NOTE
The Avg Q Length by Disk plot (Figure 26) shows the average number of items queued for each
disk drive.
Figure 26. Avg Q Length by Disk Plot
Disk Avg Queue Length tells you the average number of IO requests in the queue for a given
disk.
Cache Page
The Cache Page displays detailed information about the VOS cache memory. It consists of two
plots and a TopList window.
The Cache Activity and Hit Rate plot (Figure 27) shows the overall activity to the system cache,
and the effectiveness of the cache.
Figure 27. Cache Activity and Hit Rate Plot
Mem Cache Read Hits/Sec tells you how many times per second during the interval a process
on the module asked for some part of the file structure and found it in cache memory. This
could be a file, an index, or a directory block. Hits are good for performance; misses (see “Mem
Cache Read Misses/Sec”) are bad.
Mem Cache Read Misses/Sec tells you how many times per second during the interval a
process on the module asked for some part of the file structure and did NOT find it in cache
memory. This could be a file, an index, or a directory block. Hits (see “Mem Cache Read
Hits/Sec”) are good for performance, misses are bad.
Cache Read Hit % is an expression metric (“Mem Cache Read Hits/Sec” / (“Mem Cache Read
Hits/Sec” + “Mem Cache Read Misses/Sec”) * 100). The disk cache is a part of the module main
memory that is set aside to provide a buffer between the physical disks and the applications.
When an application goes to read a record from a block(s) in cache it either finds them there (a
hit) or has to wait for the IO subsystem to bring them into memory (a miss). This metric reports
the percentage of requests that did not require an I/O (hits).
Cache Soils Plot
The Cache Soils plot (Figure 28) shows the rate at which cache pages are updated (written to),
as described below.
Figure 28. Cache Soils Plot
Mem Cache Soiled/Sec tells you how many times per second during the interval a process on
the module was the first process to write into a file system block. Because the cache manager
buffers writes to disk, it is perfectly possible that many other processes will write to a given block
before it is flushed to disk. But, only the first write is counted here. Once the cache manager
writes that block out to disk, the next process that writes to that block will be “charged” with a
cache soiled and the process starts all over again. This could be a file, an index, or a directory
block. This number is interesting because it meters the blocks that will eventually have to be
written to disk, regardless of how many times the processes wrote data into those blocks.
Workload Cache Usage TopList
The Workload Cache Usage TopList (Figure 29) reports cache memory usage and performance
by the workloads defined in FRTLHOME>etc>agentmgr.conf. (You should modify
agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power
Agent section of this User’s Guide for a description of defining workloads.)
Figure 29. Wkld Cache Usage TopList
Read Hit % is an expression metric ((“Wkld Cache Read Hits[]”) / (“Wkld Cache Read
Misses[]” + “Wkld Cache Read Hits[]”) * 100) that shows the percentage of read requests that were
satisfied with data already in the cache for all of the processes that make up the workload.
Soiled/Sec tells you how many times per second during the interval all the processes in the
workload were charged for a cache soil, or write. Only the first process to write into a file system
block is charged with the I/O. Once the cache manager writes that block out to disk the next
process that writes to that block will be charged with a cache soiled and the process starts all
over again. This could be a file, an index, or a directory block. This number is interesting because
it meters the blocks that will eventually have to be written to disk, regardless of how many times
the processes wrote data into those blocks.
Hits/Sec tells you how many times per second during the interval all the processes in the
workload asked for some part of the file structure and found it in cache memory. This could be a
file, an index, or a directory block.
Misses/Sec tells you how many times per second during the interval all the processes in the
workload asked for some part of the file structure and did NOT find it in cache memory. This
could be a file, an index, or a directory block.
Workloads Page
The Workloads Page provides statistics on your applications, as defined in
FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work
done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a
description of defining workloads.)
The CPU Usage by Workload plot (Figure 30) shows the overall CPU usage for all the processes
that make up the workload. Process and workload metrics are all on a per CPU basis, so this plot
can scale from 0 to 100 * Number of CPUs.
Wkld % CPU tells you what percentage of the total available CPU resource the workload is
consuming. It is reported on a scale from 0 to 100 * Number of CPUs.
The Task Count by Workload plot (Figure 31) reports the total number of processes active in the
workload.
Figure 31. Task Count by Workload Plot
Wkld Total tells you the total number of processes in the workload. This value is only sampled once
per interval, so short-lived processes might get missed.
Workloads TopList
The Workloads TopList (Figure 32) reports the following statistics for the workload:
Figure 32. Workloads TopList
% CPU tells you what percentage of the total available CPU resource the workload is consuming.
It is reported on a scale from 0 to 100 * Number of CPUs.
Flts/Sec tells you the page faults generated by the workload, expressed as a rate per second.
Page faults can be a symptom of insufficient memory, an application memory leak or they can be
caused by an application design where memory is used for a short period and then returned, thus
causing page faults on each repeated allocate. Typically, if the page faulting is widespread
among most processes there is a shortage of memory. If only one process is page faulting, that
can be fixed with a code change.
Rds/Sec shows the total number of reads to disk for the file system during the interval for the
entire workload. A read is charged to a process when it does not find the record it is looking for in
cache and has to go all the way to disk to get the block(s) that contain the record.
Wrts/Sec shows the total number of writes to disk for the file system during the interval for the
entire workload. A process is charged for a write when it is the first process to modify a block in
the cache manager. Until that modified block is written to disk, all subsequent writers to the file
are not charged for the write. So, if your application does 50 writes in one second to the same
record in a single disk block the application is only charged for one write. If lots of processes are
writing to that same record, only one will get charged for doing the first modification to that block.
Procs shows the total number of processes in the workload. This value is only sampled once per
interval, so short-lived processes might get missed.
% CPU Busy reports the CPU busy time for this workload, expressed as a percentage. It is
reported on a scale from 0 to 100 * Number of CPUs.
The Wkld Memory Usage plot (Figure 33) reports the total number of memory pages in use by the
workload.
Figure 33. Wkld Memory Usage Plot
Wkld Mem Usage is an expression metric (“Wkld Shared Memory[]” + “Wkld Unshared
Memory[]”) that reports the total number of pages in use for all processes that make up the
workload.
The Reads by Workload plot (Figure 34) reports the rate of read I/Os generated by the workload.
Figure 34. Reads by Workload Plot
Wkld Reads reports the total number of reads to disk for the file system during the interval for all
the processes that make up the workload. A read is charged to a process when it does not find
the record it is looking for in cache and has to go all the way to disk to get the block(s) that
contain the record.
The Writes by Workload plot (Figure 35) reports the rate of write I/Os generated by the workload.
Figure 35. Writes by Workload Plot
Wkld Writes reports the total number of writes to disk for the file system during the interval for
the entire workload. A process is charged for a write when it is the first process to modify a block
in the cache manager. Until that modified block is written to disk, all subsequent writers to the file
are not charged for the write. So, if your application does 50 writes in one second to the same
record in a single disk block, the application is only charged for one write. If lots of processes are
writing to that same record, only one will get charged for doing the first modification to that block.
Sample AutoAlert System: VOS.VTH
To help you quickly get started using SightLine’s AutoAlert System, also known as threshold
alarms, we provide you a pre-configured threshold system. It is in the C:\Program
Files\FORTEL SightLine\Expert Advisor Vision\VOS directory. AutoAlert
configuration files are stored in files with a .VTH extension.
The sample threshold system is by necessity very basic, because there are few “rules of thumb”
that can be applied to all VOS systems. Use the threshold system we provide as a starting point.
You will probably need to modify it (or create your own) to set thresholds tailored to your unique
environment.
The descriptions we provide address key elements of the EA/V menus and dialogs. See the
SightLine Expert Advisor/Vision User’s Guide for a complete discussion of thresholds, threshold
systems, and time systems.
VOS.VTH consists of eight thresholds. The description for this sample threshold system follows
this format:
These items are repeated for each metric in the threshold system.
Metric: The metric to which the threshold is assigned.
Value: The value that the metric must cross (exceed for a High threshold, fall below for a Low
threshold) in order to constitute a violation. To change the value, click inside the edit box, and
change the number to the value you want the threshold to be.
Priority: From 0 (lowest) to 99 (highest), the priority of this threshold with regard to other
thresholds. To change a priority, click in the edit box, and change the number to reflect the priority
you want for this threshold.
Direction: Low or High — For metrics that must exceed their threshold to be in violation (such as
CPU Busy), this should be set to high. For metrics that must go below their threshold to be in
violation (Memory % Free), it should be set to low.
Trigger after [n] secs of violation: The value of n specifies how long a violation must persist
before EA/V will trigger its alarm. To change this, click inside the edit box and change n.
Violation Message: The text that EA/V will display on screen and write to the Threshold
Violation Log when a violation of this threshold occurs. To change this, click inside the Violation
Message edit box, and change the message to suit your situation.
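The way these fields interact can be sketched in a few lines of Python. This is a minimal sketch, not EA/V's implementation: the `Threshold` class, the function names, and the assumption that a violation's duration is measured from the first violating sample are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    value: float
    priority: int            # 0 (lowest) to 99 (highest)
    direction: str           # "High" or "Low"
    trigger_after_secs: int
    message: str


def in_violation(t, sample):
    """High: the sample must exceed the value. Low: it must go below it."""
    if t.direction == "High":
        return sample > t.value
    return sample < t.value


def first_alarm_time(t, samples, interval_secs):
    """Return the elapsed seconds at which the alarm would trigger, or None.

    samples are metric values taken every interval_secs seconds, starting
    at time 0. Assumption: the violation clock starts at the first
    violating sample and resets whenever a sample is back within bounds.
    """
    start = None
    for i, sample in enumerate(samples):
        if in_violation(t, sample):
            if start is None:
                start = i
            if (i - start) * interval_secs >= t.trigger_after_secs:
                return i * interval_secs
        else:
            start = None
    return None


# Memory % Free from the sample system: Low direction, value 10, 60 seconds.
mem_free = Threshold("Memory % Free", 10, 99, "Low", 60,
                     "There is less than 10% free memory.")
alarm_at = first_alarm_time(mem_free, [50, 50] + [5] * 7, interval_secs=10)
```

With a trigger time of 0 seconds, this sketch raises the alarm at the first violating sample.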
Memory % Free
Value: 10
Priority: 99
Direction: Low
Trigger After: 60 seconds
Violation Message: There is less than 10% free memory. If usage can not be reduced, more
memory may be needed.
Page File % Free
Value: 20
Priority: 50
Direction: Low
Trigger After: 0 seconds
Violation Message: Paging space is running low. Additional swap space may be needed.
Cache Read Hit %
Value: 90
Priority: 50
Direction: Low
Trigger After: 60 seconds
Violation Message: The read cache hit rate is low. Additional system cache may be needed.
Page Fault Rate
Value: 10
Priority: 77
Direction: High
Trigger After: 60 seconds
Violation Message: The page fault rate is high. Check the EventList for offending processes.
CPU % Other
Value: 5
Priority: 50
Direction: High
Trigger After: 60 seconds
Violation Message: CPU % Other is running high. Check for no-wait I/O and excessive
interprocess communication.
CPU % Interrupts
Value: 20
Priority: 66
Direction: High
Trigger After: 60 seconds
Violation Message: Device interrupt processing is quite high. Check configuration of network
and communication devices.
Total CPU Utilization
Value: 80
Priority: 99
Direction: High
Trigger After: 60 seconds
Violation Message: CPU usage is very high. Check the EventList for offending processes.
CPU Wait-Busy Ratio
Value: 1
Priority: 77
Direction: High
Trigger After: 60 seconds
Violation Message: More time waiting for CPU than using it. Use the EventList to see which
processes are waiting.
AutoAnalyze Rules and Reports
This section describes the rules and reports used when AutoAnalyze is invoked. AutoAnalyze is
an analysis and reporting tool that automatically looks for exception conditions and then
generates recommendations and summary reports.
Each exception has a common set of attributes:
Condition: The metric(s) and value that cause the exception.
Persistence: How long the condition must persist before it is treated as an exception.
Report: An .RTF file located in the c:\Program Files\FORTEL SightLine\Expert Advisor
Vision\AANALYZE directory.
Plot(s): OLE-linked plot(s) (.VPL files) located in the c:\Program Files\FORTEL SightLine\Expert
Advisor Vision\AANALYZE directory.
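A persistence value such as "5 out of 10 intervals" can be read as a sliding-window test. The sketch below assumes that reading; this guide does not define how AutoAnalyze counts intervals, and the function name `persists` is hypothetical.

```python
from collections import deque


def persists(flags, required, window):
    """Return True if at least `required` of some `window` consecutive
    intervals (or fewer, near the start of the data) met the condition.

    flags holds one boolean per collection interval, in time order.
    """
    recent = deque(maxlen=window)   # keeps only the latest `window` flags
    for flag in flags:
        recent.append(bool(flag))
        if sum(recent) >= required:
            return True
    return False


# Disk busy percentages for ten intervals, checked against the
# "greater than 50% busy" condition with 5-out-of-10 persistence.
disk_busy = [60, 40, 55, 70, 30, 52, 51, 20, 10, 80]
disk_exception = persists([b > 50 for b in disk_busy], required=5, window=10)
```

An "Immediate" persistence corresponds to `required=1, window=1` in this model.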
Disk-Busy
Condition: A disk is greater than 50% busy
Persistence: 5 out of 10 intervals
Report: shdskbsy.rtf
Plot(s): shdskbsy.vpl
CPU-Waiting Exceeds Running
Condition: CPU Wait time is greater than CPU Busy time
Persistence:
Report: scpuwait.rtf
Plot(s): scpuwait.vpl
Disk-Error(s) Detected
Condition: Any fatal or data errors detected on a disk
Persistence: Immediate
Report: shdskerr.rtf
Plot(s): shdskerr.vpl
CPU-Busy
Condition: The CPU is greater than 80% busy
Persistence: 7 out of 10 intervals
Report: shtotcpu.rtf
Plot(s): shtotcpu.rtf
CPU-Too many Interrupts
Condition: CPU % Interrupts is greater than 20%
Persistence:
Report: shintcpu.rtf
Plot(s): shintcpu.rtf
CPU-High Scheduler Overhead
Condition: CPU % Other is greater than 5%
Persistence:
Report: shothcpu.rtf
Plot(s): shothcpu.vpl
Memory-High Page Fault Rate
Condition: Total Page Fault Rate is greater than 5 per second
Persistence:
Report: shpagflt.rtf
Plot(s): shpagflt.rtf
Cache-Low Hit Rate
Condition: Cache Read Hit Rate is less than 90%
Persistence:
Report: slcachit.rtf
Plot(s): slcachit.rtf
Disk-File Sys Free Space Low
Condition: A disk’s file space is less than 20% free
Persistence: Immediate
Report: slfilspc.rtf
Plot(s): slfilspc.vpl
Memory-Pagefile Space Low
Condition: The page file has less than 20% free space
Persistence:
Report: slpagspc.rtf
Plot(s): slpagspc.rtf
Memory-Low Free Space
Condition: Memory % Free is less than 10%
Persistence:
Report: slfremem.rtf
Plot(s): slfremem.vpl