SightLine Power Agent for VOS Systems User’s Guide
version 6.1
Stratus Computer
R464-01

Notice

The information contained in this document is subject to change without notice.

UNLESS EXPRESSLY SET FORTH IN A WRITTEN AGREEMENT SIGNED BY AN AUTHORIZED REPRESENTATIVE OF STRATUS COMPUTER (DE), INC., STRATUS MAKES NO WARRANTY OR REPRESENTATION OF ANY KIND WITH RESPECT TO THE INFORMATION CONTAINED HEREIN, INCLUDING WARRANTY OF MERCHANTABILITY AND FITNESS FOR A SPECIFIC PURPOSE. Stratus Computer (DE), Inc., assumes no responsibility or obligation of any kind for any errors contained herein or in connection with the furnishing, performance, or use of this document.

The software described in Stratus documents (a) is the property of Stratus Computer Systems, S.a.r.l., Luxembourg or the third party, (b) is furnished only under license, and (c) may be copied or used only as expressly permitted under the terms of the license.

This document is protected by copyright. All rights are reserved. No part of this document may be copied, reproduced, or translated, either mechanically or electronically, without the prior written consent of Stratus Computer.

Stratus, the Stratus logo, Continuum, XA, Continuous Processing, StrataLINK, and StrataNET are registered trademarks of Stratus Computer, Inc. SightLine and Power Agent are trademarks of FORTEL Inc. All other trademarks are registered to their respective owners.

Manual Name: SightLine Power Agent for VOS Systems User’s Guide
Part Number: R464
Revision Number: 01
SightLine Version Number: 6.1
VOS Release Number: 13.5
Printing Date: March 2002

Stratus Computer (DE), Inc.
111 Powdermill Road
Maynard, Massachusetts 01754-3409

© 2001–2002 by Stratus Computer Systems, S.a.r.l., Luxembourg. All rights reserved.
Preface

The SightLine Power Agent for VOS Systems User’s Guide describes how to install and use SightLine, a Stratus client/server software product that monitors, analyzes, and reports on the performance of computer systems, both historically and in real time. SightLine runs on all Stratus Continuum models running VOS release 13.5 or later.

This documentation is made up of three parts:

1. Getting Started, which provides quick instructions for host installation, post-installation configuration, and installation on the PC, plus a brief overview of how to begin performance monitoring.
2. Power Agent, which describes in detail how to install and configure the SightLine host software on computer systems running the Stratus VOS operating system.
3. Analysis, which describes the sample files provided on your installation CD.

Document Conventions

This document uses the following typographical conventions.

Item | Convention | Example
Acronyms | All uppercase | CUA, FIFO
a.m., p.m. | Lowercase, separated by periods | 9:00 a.m., 12:22 p.m.
Book and guide titles | Title caps, italic type | See the PC User’s Guide for details.
Chapter titles | Title caps, in quotation marks | See Chapter 1, “Introduction.”
Code samples, including keywords and variables within text and as separate paragraphs, and user-defined program elements within text | Monospace | #include <iostream.h>
Command-line commands and options (switches) | All lowercase, bold | copy command; /a option
Commands on menus and buttons | Bold; capitalization follows interface (usually title caps) | Date and Time; Apply; New Query button
Device names | All uppercase | LPT1, COM1
Dialog box titles | Bold; title caps | Protect Document dialog box; Import/Export Setup dialog box
Dialog box options | Bold; capitalization follows interface (usually initial caps) | Close all programs and log on as a different user?; Find Entire Cells Only check box
Key names, key combinations, and key sequences | All uppercase | CTRL, TAB; CTRL+ALT+DEL; SHIFT, F7; ALT, F, O
Logical key names | Title caps, bold | Backspace key
Logical operators | All uppercase, bold | AND, XOR
Macros | Bold (if predefined); usually all uppercase | LOWORD
Menu names | Bold; title caps | File menu
New terms or emphasis | Italic | You can look up entries in the online index.
Programs and applications | Usually title caps for application program names; italic type for internal program names | SightLine; Microsoft Word; the datamgr program
User input | Generally lowercase, monospace, unless case-sensitive or to match standard capitalization conventions | Type -ppassword
Windows, named | Title caps | Help window
Windows, unnamed | All lowercase | document window

Related Manuals

See also the following guide:

• SightLine Expert Advisor/Vision User’s Guide (available on the CD-ROM that contains the EA/V software)

Ordering Manuals

You can order manuals in the following ways:

• If your system is connected to the Remote Service Network (RSN™), issue the maint_request command at the system prompt. Complete the on-screen form with all of the information necessary to process your manual order.
• Customers in North America can call the Stratus Customer Assistance Center (CAC) at (800) 221-6588 or (800) 828-8513, 24 hours a day, 7 days a week. All other customers can contact their nearest Stratus sales office, CAC office, or distributor; see the file cac_phones.doc in the directory >system>doc for CAC phone numbers outside the U.S.

Manual orders will be forwarded to Order Administration.

SightLine Power Agent for VOS Systems
Getting Started
version 6.1

Contents

Installation Steps
VOS Product Tape
FTP Bundled Image
Post-Installation Configuration
Expert Advisor/Vision Software Installation
Beginning Performance Monitoring

Figures

Figure 1. SightLine Modules

Getting Started

Before you install the SightLine Power Agent software, make sure you can satisfy the following requirements:

• You have the correct kit for your VOS hardware platform and TCP “flavor,” and the necessary AccessKeys provided by your software provider.
• You have about 15 MB of disk space to hold the program modules, plus additional (configurable) space to hold the collected performance data. Allow at least 25 MB of total disk space.
• You have a Windows NT® or Windows 2000® Workstation or Server with at least 50 MB of free disk space and a TCP/IP connection to the VOS system.

The software is easy to install. If you are familiar with your platform’s installation facility, you can install SightLine by following the steps in the next three sections:

• The first section details the VOS host system installation.
• The second section details the SightLine Expert Advisor/Vision software installation on the workstation.
• The third section gets you started monitoring and analyzing your VOS system(s).

Installation Steps

You can install the host kits in one of two ways: from a VOS product tape using install_new_release, or from an FTP bundled image.

VOS Product Tape

To install the software directly from the tape, follow these steps:

1. Load the SightLine Power Agent for VOS tape and run install_new_release, as described in the Stratus manual VOS Installation Guide (R386-02).
2. Change to the SightLine directory and run the post-installation command macro (install_sl.cm). Follow the instructions described in the Post-Installation Configuration section.

FTP Bundled Image

1. Use FTP to transfer the kit for your VOS hardware platform from the appropriate FTP site. Transfer it in binary mode (type = binary). Place the kit in any directory you choose; SightLine will be installed in a subdirectory.
2.
Create the SightLine directory:

   create_dir SightLine

3. Unbundle the package. You must have the necessary command macro (unbundle.cm) and program modules (decode_vos_file.pm and gzip.pm) installed, and their location must be defined in your command library path using the add_library_path command. A description of bundle/unbundle and all the necessary files can be found on the Stratus public FTP site, ftp://ftp.stratus.com/pub/vos/utility/README.txt.

   unbundle kit-name SightLine

4. Run the post-installation command macro (install_sl.cm), and follow the instructions described in the next section.

   change_current_dir SightLine
   install_sl

Post-Installation Configuration

SightLine is now installed in the SightLine directory (referred to in this guide as FRTLHOME), which contains the following subdirectories:

   bin       Program modules directory
   data      HostTraceFile directory
   etc       Configuration directory
   log       Log files directory
   lib       Library files directory
   install   Base files used during the installation process

The post-installation configuration macro, install_sl.cm, completes the configuration portion of the installation. It prompts you for the following information:

• the AccessKey
• the collection interval
• the amount of data to be stored on the host system
• the IP address or name of the host system
• a 3-character short ID that uniquely identifies this host

1. If you are upgrading from ViewPoint, the following prompt appears:

   There is an existing installation of ViewPoint in FRTLHOME

2. Enter the Access Key string:

   Type the AccessKey string after this prompt.

3. Enter the data collection interval [default = 30]:

   Enter an integer number of seconds for the collection interval.

4. Enter the data retention period for the host trace file (example formats: 24h, 10m, or 3d) [default = 10m]:

   Specify the maximum amount of data to be stored locally.
   This can be defined in hours (h), days (d), or megabytes (m). The default is 10m, or 10 megabytes.

5. You must specify the system_name or IP address of this system
   The system name you use must resolve to the IP address of this system
   Enter the system name or the IP address:

   Specify an IP address, or a system name that resolves to the IP address of the host system. The macro runs a PING to verify that the name you entered resolves correctly.

6. When monitoring data from multiple machines, the ViewPoint/PC client uses a 3 character identifier to uniquely identify a particular machine. You MUST choose a 1 to 3 character string that will make it obvious which machine a particular metric is coming from.
   Enter a new short-id or accept [default = system_name]:

   Enter up to 3 characters to be used as the short ID.

When you have entered these configuration parameters, the macro asks whether you want to start the software now. When you answer this prompt, the post-installation configuration is complete.

NOTE: Do not start the software at this point if you want to configure the analyze_system interface. Configure this interface by editing the FRTLHOME>etc>analyze_system.conf file. If you do not want this interface, edit FRTLHOME>bin>slagent.cm and set frtlasi to 0. (Refer to Appendix A for information about the analyze_system interface.)

To start the software manually, enter the following:

   change_current_dir FRTLHOME>bin
   slagent start

NOTE: Replace FRTLHOME with the full path to your SightLine directory.

To stop the software manually, pass stop to the slagent macro instead of start.
The command looks like this (substituting the path for FRTLHOME):

   change_current_dir FRTLHOME>bin
   slagent stop

The slagent macro can also start the SightLine Power Agent processes in diagnostic mode. Diagnostic mode causes additional information to be written to the log files in the FRTLHOME>log directory, which can be useful in diagnosing problems. Use this syntax:

   slagent [-da] [-ds] [-dd] [-dt] {start | stop | restart | status}

where:

   -da   starts agentmgr in diagnostic mode
   -ds   starts servd in diagnostic mode
   -dd   starts datamgr in diagnostic mode
   -dt   starts threshd in diagnostic mode

There are other configuration parameters that can be modified. See the remaining chapters in this User’s Guide for detailed information about configuring the host software to achieve your operational objectives.

IMPORTANT: Review the Release Notes accompanying this software for any information that became available too late to be included in this guide.

Expert Advisor/Vision Software Installation

Microsoft® Windows NT™ or Windows 2000 or higher is required to run Expert Advisor/Vision (EA/V) on the PC. Use the SightLine installation CD-ROM or the image transferred using FTP to install EA/V as follows:

1. Select Start | Settings | Control Panel | Add/Remove Programs.
2. Select Add New Programs.
3. Browse to the CD-ROM, or to the location to which the installation image was transferred using FTP, and select Setup.exe.
4. Respond to the InstallShield dialog boxes to complete the installation.

If you have any questions about the installation process, more detail is provided in Section 1.3 of the Expert Advisor/Vision User’s Guide.

You are now ready to connect to your system and begin monitoring its performance.

Beginning Performance Monitoring

The steps required to begin performance monitoring are outlined below.
To transfer data from the host system, a TCP/IP connection is needed between the host system and the PC. More complete details are provided in Chapter 3 of the Power Agent section of this User’s Guide.

1. The first time you run EA/V on the PC, and after you enter an AccessKey, the agents on the same subnet as the EA/V workstation are displayed in the Enterprise View (the left pane of the EA/V application window). If nothing is shown in the Enterprise View, right-click the Enterprise icon and select AutoDiscover. Right-click a hostname to examine and modify the settings for a particular host. If there is no entry for a particular host, you must manually add a Network Host Session as follows:
   • Right-click the Enterprise icon and select New Host.
   • Enter the hostname in the Name field, and specify the host name or IP address in the Host field.
   • Click Edit to specify the size and location of the PC trace file.
   • Click OK to exit the Configure Network Host Session dialog box.
2. When connecting for the first time, right-click the hostname in the Enterprise View and select Connect Now! If data collection was already started on the host, the Define Times to Download dialog box appears, indicating that there is already historical data to download. Select the times you want to download and click OK. If you do not want to proceed to live data, clear the Proceed to Live Data box and choose an end time for your download. If the box is checked, then after all requested historical data has been downloaded, you will receive a new, near-real-time data block every collection interval (default = 30 seconds).
3. If you encounter an error, double-click the Status line next to the connection to view a communications log.
4. The standard VOS environment (a set of plots and other objects on multiple pages) should load as soon as data transmission begins. If it does not, load it by choosing File | Open | Environment from the menu.
   Each platform’s sample environment files are in a <platform> directory under \Expert Advisor Vision (for example, \FORTEL SightLine\Expert Advisor Vision\VOS). When the Open Environment dialog box appears, follow these steps:
   • Browse to the proper directory.
   • Highlight <filename>.VEN by clicking it.
   • Choose the system you want to analyze in the Force Into Trace System field, and click OK.
5. A set of standard plots appears on the screen. The plots are updated with new information every interval.
   • Use the standard Windows controls to maximize and restore each plot.
6. You are now ready to begin exploring the activity and performance of your system.
   • Create new plots and environments specific to your system and save them for future use.
   • Set thresholds to detect performance problems and generate alerts.
   • Use AutoAnalyze to produce reports on host activity, exception reports based on predefined thresholds, and a list of recommendations for each exception.
   • Use AutoCorrelate™ to determine the causes of poor performance.
   • Capture interesting plots and import them into a spreadsheet or word-processing program for annotation or reporting.

To stop the data transfer from the VOS host to the PC, right-click the hostname in the Enterprise View and select Disconnect.

Figure 1 illustrates the relationship between the SightLine EA/V workstation and the SightLine Power Agent on a VOS host:

[Figure 1. SightLine Modules (diagram labels: Power Agent, datamgr, servd, protomgr, threshd, SightLine Expert Advisor/Vision, e-mail, script)]

Each part is described briefly here; later sections describe each component in more detail.

   agentmgr   Real-time agent that gathers data and provides it to clients.
   datamgr    Receives data from agentmgr and manages a performance database.
   servd      Listens for EA/V PC download requests and starts protomgr.
   threshd    Receives live data from agentmgr and produces alerts.
   protomgr   Gets data from datamgr or agentmgr and transmits it to EA/V on the PC.
   EA/V       Management application for alerting, metric display, performance analysis, and reporting.

SightLine Power Agent for VOS Systems
Power Agent
version 6.1

Contents

Chapter 1 Introduction
Chapter 2 Step-by-Step Host Installation
   Installation Steps
   VOS Product Tape
   FTP Bundled Image
   Post-Installation Configuration
   Default Directory Structure
   The bin Directory
   The data Directory
   The etc Directory
   The log Directory
   Associated Programs and Scripts
   The datadump.pm Program
   The slagent.cm Macro
   The db2vtx.pm Program
   The as_iface.pm Program
   The cvtag43to61.pm and cvtpr43to61.pm Programs
Chapter 3 PC-to-VOS Host Configuration Parameters
Chapter 4 Agentmgr
   The Class Hierarchy
   Default agentmgr.conf File
   The CONFIG Statement
   AccessKey Configuration
   Nproc Configuration
   FILTER Statements
   The COMPUTATIONS Section
   CLASS
   CRITERIA
   EXCLUSIVE | INCLUSIVE
   COMPUTE
   VARIABLE
   VARSET
   Example of a CLASS Specification
   The Workloads Class
   The Processes Class
   Class Definition
   CRITERIA
   VARIABLE
   VARSET
   Path Meters Class
   Registry.csv Metric Description File
   agentmgr Command Line Options
Chapter 5 Datamgr
   Configuration
   Communication
   File Structure
   Directory Structure
   Centralized Database Management
   Datamgr Command Line Options
Chapter 6 Threshd
   Configuration
   General Structure
   Metric Names
   String Substitution
   Additional String Features
   MAILHOST Definition
   AGENTMANAGERS Definition
   SNMPVARS Definition
   SNMPTRAPS Definition
   THRESHOLDS Definition
   Expressions
   Operators for Complex Expressions
   Actions
   Send E-mail
   Send an SNMP Trap
   Execute a Script
   Messages
   Threshd Command Line Options
Chapter 7 Servd
   Configuration
   Subscript name truncation
   Servd Command Line Options
Chapter 8 Protomgr
   Default protomgr.conf
   Data Source Selection
   Short ID Definition
   Collection Interval Definition
   Network Address Translation (NAT) and Firewall Support
   Download Throttling
   Enable/Disable Metric Selection
   Exclusion Section
   Data Redefinition
   Event Data Passing
   protomgr Command Options
   Option Flags
Chapter 9 Command Syntax — Quick Reference
   Agentmgr
   Datamgr
   Threshd
   Servd
   Protomgr
   slagent Command Macro
Chapter 10 Troubleshooting
Appendix A Analyze_system Interface
   Configuration

Figures

Figure 4-1. Default agentmgr.conf File
Figure 4-2. Regular Expressions
Figure 4-3. Metrics for the Process Class
Figure 4-4. Stratus VOS Default Workload Class Definition
Figure 4-5. Default control.harvest File
Figure 5-1. Default Line from datamgr.conf File
Figure 5-2. SightLine EA/V and the Power Agent Components
Figure 5-3. Centralized Database Management
Figure 6-1. Default Line from datamgr.conf File
Figure 8-1. Advanced Session Settings Dialog Box
Figure 8-2. Summary EventClass Data

Chapter 1 Introduction

SightLine is the Stratus client/server software product that monitors, analyzes, and reports on the performance of computer systems, both historically and in real time. SightLine consists of two parts:

1. The SightLine Power Agents, which run on the computers to be monitored (the hosts).
2. The SightLine Expert Advisor/Vision (EA/V) application, which runs on a workstation running Microsoft® Windows NT® or Windows 2000® Server or Workstation.

The Power Agent software reduces the raw performance data and sends it to EA/V on the PC to be analyzed and displayed.
This design minimizes SightLine’s impact on system resources and gives it a powerful graphical environment for displaying and analyzing system performance.

This guide describes how to install and configure the SightLine Power Agent software on computer systems running the Stratus VOS operating system. The EA/V software is described in the SightLine Expert Advisor/Vision User’s Guide, which is available on the CD-ROM that contains the software. The intended audience for this material is the VOS system administrator.

The SightLine Power Agent runs on all Stratus Continuum models running VOS release 13.5 or later.

Version 6.1 of the SightLine Power Agent software for Stratus VOS introduces Dynamic Symbol Table Changes, a fundamental improvement in design and functionality. In previous releases of the software, the introduction of new symbols, such as a mounted file system or a new metric, required overwriting the trace file on the PC. With Dynamic Symbol Table Changes, new symbols are merged into the existing trace file.

In addition, the default metric names have been moved out of the configuration file and are now contained in the Interface Agents. This greatly improves the modular design of the Interface Agents: in the previous version, a metric appeared in EA/V only if a metric name mapping had been added to the vpcom.conf file, and that step is no longer needed.

Chapter 2 Step-by-Step Host Installation

Before you install the SightLine Power Agent software, make sure you can satisfy the following requirements:

• You have the correct Power Agent software for your VOS hardware platform and TCP “flavor,” and the necessary AccessKey provided by your software provider.
• You have about 15 MB of disk space to hold the program modules, plus additional (configurable) space to hold the collected performance data. Allow at least 25 MB of total disk space.
• You control how much disk space is used on your VOS system for storing performance data. You can configure the allocation of space by days, hours, or megabytes. If you configure by days or hours, expect approximately 20 MB of disk space to be required for each day of data. The size of the Power Agent trace files depends directly on the size and configuration of the system being monitored: the Interface Agents loaded and the number of processes and disks on the system can significantly affect the size. The number of seconds between samples also directly affects trace file size. The resources required, both CPU cycles and disk space, scale roughly linearly with the sampling rate; for example, halving the sample interval roughly doubles the resources required.
• You have a PC or workstation with at least 50 MB of free disk space and a TCP/IP connection to the VOS system.

The software is easy to install. If you are familiar with your platform’s installation facility, you can install SightLine by following the steps in the next sections.

Installation Steps

You can install the host kits in one of two ways: from a VOS product tape using install_new_release, or from an FTP bundled image.

VOS Product Tape

To install the software directly from the tape, follow these steps:

1. Load the SightLine for VOS tape and run install_new_release, as described in the Stratus manual VOS Installation Guide (R386-02).
2. Change to the SightLine directory and run the post-installation command macro (install_sl.cm). Follow the instructions described in the Post-Installation Configuration section.

FTP Bundled Image

1. Use FTP to transfer the kit for your VOS hardware platform from the appropriate FTP site. Transfer it in binary mode (type = binary). Place the kit in any directory you choose; SightLine will be installed in a subdirectory.
2.
Create the SightLine directory:

   create_dir SightLine

3. Unbundle the package. You must have the necessary command macro (unbundle.cm) and program modules (decode_vos_file.pm and gzip.pm) installed, and their location must be defined in your command library path using the add_library_path command. A description of bundle/unbundle and all the necessary files can be found on the Stratus public FTP site, ftp://ftp.stratus.com/pub/vos/utility/README.txt.

   unbundle kit-name SightLine

4. Run the post-installation command macro (install_sl.cm), and follow the instructions described in the next section.

   change_current_dir SightLine
   install_sl

Post-Installation Configuration

The post-installation configuration macro, install_sl.cm, completes the configuration portion of the installation. It prompts you for the following information:

• the AccessKey
• the collection interval
• the amount of data to be stored on the host system
• the IP address or name of the host system
• a 3-character short ID that uniquely identifies this host

1. If you are upgrading from ViewPoint, the following prompt appears:

   There is an existing installation of ViewPoint in FRTLHOME
   Do you want to upgrade from this? Enter 'n' to change directory

2. Enter the Access Key string:

   Type the AccessKey string following this prompt.

3. Enter the data collection interval [default = 30]:

   Enter an integer number of seconds for the collection interval.

4. Enter the data retention period for the host trace file (example formats: 24h, 10m, or 3d) [default = 10m]:

   Specify the maximum amount of data to be stored locally, in hours (h), days (d), or megabytes (m). The default is 10m, or 10 megabytes.

5. You must specify the system_name or IP address of this system
   The system name you use must resolve to the IP address of this system
   Enter the system name or the IP address:

   Specify an IP address, or a system name that resolves to the IP address of the host system. The macro runs a PING to verify that the name you entered resolves correctly.

6. When monitoring data from multiple machines, the SightLine/PC client uses a 3 character identifier to uniquely identify a particular machine. You MUST choose a 1 to 3 character string that will make it obvious which machine a particular metric is coming from.
   Enter a new short-id or accept [default = system_name]:

   Enter up to 3 characters to be used as the short ID.

When you have entered these configuration parameters, the macro asks whether you want to start the software now. When you answer this prompt, the post-installation configuration is complete.

NOTE: Do not start the software at this point if you want to configure the analyze_system interface. Configure this interface by editing the FRTLHOME>etc>analyze_system.conf file. If you do not want this interface, edit FRTLHOME>bin>slagent.cm and set frtlasi to 0. (Refer to Appendix A for information about the analyze_system interface.)

To start the software manually, enter the following:

   change_current_dir FRTLHOME>bin
   slagent start

NOTE: Replace FRTLHOME with the full path to your SightLine directory.

To stop the software manually, pass stop to the slagent macro instead of start. The command looks like this (substituting the path for FRTLHOME):

   change_current_dir FRTLHOME>bin
   slagent stop

The slagent macro can also start the SightLine Power Agent processes in diagnostic mode.
Diagnostic mode causes additional information to be written to the log files in the FRTLHOME>log directory, which can be useful in diagnosing problems. Use this syntax:

   slagent [-da] [-ds] [-dd] [-dt] {start | stop | restart | status}

where:

   -da   starts agentmgr in diagnostic mode
   -ds   starts servd in diagnostic mode
   -dd   starts datamgr in diagnostic mode
   -dt   starts threshd in diagnostic mode

There are other configuration parameters that can be modified. See the remaining chapters in this Power Agent section of the SightLine for VOS Systems User’s Guide for detailed information about configuring the host software to achieve your operational objectives.

IMPORTANT: Review the Release Notes accompanying this software for any information that became available too late to be included in this guide.

Default Directory Structure

SightLine is now installed, and the FRTLHOME directory contains the following subdirectories:

   bin       Program modules directory
   data      HostTraceFile directory
   etc       Configuration directory
   lib       Library files directory
   log       Log files directory
   install   Base files used during the installation process

Each of these directories is described in more detail below.

The bin Directory

The bin directory contains the executable program files and start/stop scripts. The initial contents of the bin directory are:

Main SightLine program files:

   agentmgr.pm
   datamgr.pm
   servd.pm
   protomgr.pm
   as_iface.pm

Associated programs and scripts:

   slagent.cm
   datadump.pm
   db2vtx.pm
   cvtag43to61.pm
   cvtpr43to61.pm

See the section Associated Programs and Scripts for more details.

The data Directory

The data directory contains the host trace file that stores the performance data. This file is circular and, by default, resides in the Local subdirectory. The data directory is initially empty. The first time datamgr.pm is executed, the Local subdirectory is created and contains the following files:
The first time datamgr.pm is executed, the Local subdirectory is created and contains the following files: registry Local.htf Local.idx performance metrics registry host trace file index file The etc Directory The etc directory contains the configuration files. There is one configuration file for each of the main program modules. Each configuration file has the same name as the corresponding program file, with a .conf suffix. The initial contents of the etc directory are: agentmgr.conf datamgr.conf servd.conf protomgr.conf threshd.conf analyze_system.conf control.harvest The log Directory The log directory contains diagnostic log files written by the SightLine programs. The log directory is initially empty. The following files are generated by the corresponding SightLine programs: agentmgr.log datamgr.log Step-by-Step Host Installation 2-5 Step-by-Step Host Installation servd.log protomgr.log Associated Programs and Scripts The datadump.pm Program The datadump program is a diagnostic tool for inspecting the contents of the performance database files. You should only use it when you are requested to do so by Technical Support personnel. The slagent.cm Macro The slagent command macro is used to manually start and stop the SightLine VOS processes. It can be edited to select the command line options for these processes. The db2vtx.pm Macro The db2vtx program extracts data from the VOS performance database file and converts it into VTX and VEV format. It provides facilities to select a section of the trace file, using fixed rules or rules based on the content of the file. db2vtx obtains its data using the datamgr database manager agent. This is part of the SightLine Power Agent software. The datamgr agent can be running on the local system, or on a remote system connected with TCP/IP. Also, the database can contain performance data on the local system or a remote system. 
The Power Agent software sends the symbol table to Expert Advisor/Vision (EA/V) on the PC, reads all of the data a second time, and then sends the symbol table to EA/V again. This procedure avoids sending unnecessary symbol table changes to EA/V. If the initial data scan is expected to take a significant amount of time, dialog boxes are displayed in EA/V at the start and end of the scan. These dialog boxes time out automatically.

The as_iface.pm Macro

The as_iface program is called by the slagent command macro to start the interface with analyze_system. (Refer to Appendix A for more information about the analyze_system interface.)

The cvtag43to61.pm and cvtpr43to61.pm Macros

The cvtag43to61 and cvtpr43to61 programs are used to update ViewPoint version 4.3 programs and command macros to SightLine version 6.1 programs and command macros.

Chapter 3 PC-to-VOS Host Configuration Parameters

The first time you run SightLine Expert Advisor/Vision (EA/V) on the PC, and after you enter an AccessKey, the agents that are on the same subnet as the EA/V workstation are displayed in the Enterprise View (the left pane of the EA/V application window). If no connections appear in the Enterprise View, right-click the Enterprise icon and select AutoDiscover. Right-click a hostname and select Edit Connection to examine and modify the settings for a particular host.

If an entry for a particular host does not exist, you need to add a connection manually as follows:

1. Right-click the Enterprise icon and select New Host.
2. Enter the hostname in the Name field, and specify the host name or IP address in the Host field. If the host name you’ve entered does not resolve to the managed node’s IP address, check the hosts file on the local machine or the local Domain Name Server (DNS) set up in your environment to ensure that an entry exists for the managed node.
   If DNS is not set up in your environment, use the PC’s hosts file to map machine names to IP addresses.

To configure your PC trace file, right-click the host connection, select Edit Connection, and click the Edit button next to the Trace File field. The Trace File to Capture Data Into dialog box appears. Complete this dialog box as follows:

1. The Save in box in the middle of the dialog box specifies the location of the trace file. Navigate to the directory where the trace file is to be stored.
2. Enter the name of the PC trace file in the File name text box. The default name is the first eight characters of the hostname, unless it contains non-alphanumeric characters.
3. The Host ID field contains the default host ID, which identifies the trace file each metric comes from. You can change the host ID to make it longer or more host-specific.
4. Optionally, fill the File Info text box with a phrase describing the trace file.
5. The default size of the PC trace file (20 MB) is shown in the File Size in MB text box in the lower right of the dialog box. If you want this file to hold approximately 24 hours of data, leave this value at the default.
6. In the Dynamic Attributes box, a Reserve Space of 20% is allocated by default. This space accommodates changes to the trace file’s datablock.
7. Choose either the Append or the Create option in the Trace File Initialization area at the bottom of the dialog box.
   a. Append accesses the host trace files on the corresponding managed system and automatically appends to the PC trace file any data available in the host file. In effect, the PC trace file catches up on any data it might have missed because of a lost connection. The Define Trace File Times to Download dialog box appears once, when the trace file is first initialized.
   From that point on, SightLine uses the initial host metrics for all future downloads and automatically updates the trace file with any data that it has not yet downloaded from the Agent machine. Be aware that this “catch-up” process can consume considerable host resources.
   b. Create forces the Agent machine to send a new set of host metrics each time a download is requested. The Define Trace File Times to Download dialog box appears for each download, and the trace file is reinitialized and overwritten each time. Use Create whenever the configuration of the managed Agent machine changes, for example when new peripherals are added, the file systems change, or a new processor is added.

Click Save to exit the Trace File to Capture Data Into dialog box.

Now that you have configured where to store the data for this connection, open the connection to the Power Agent by right-clicking the connection and selecting Connect Now! Once the systems have established communications, the Define Trace File Times to Download dialog box appears. It displays a time period that covers the available times in the host trace file for the download system. If you want to download data that has already been collected, drag the slider bar left until the desired start time is displayed. If you want to download only a specific time period, clear the Continue into live session check box and position the start and end sliders so that they include the time period you want to download.

Chapter 4 Agentmgr

The Agent Manager (agentmgr) program module is the coordinator for all the SightLine Agents. It is intended to run continuously in the background and should be started at boot time. All performance data is reported through agentmgr by Interface Agents, which are loaded from the FRTLHOME>lib>interface directory during startup.
At each interval, agentmgr calls each Interface Agent for updated data. Remember that agentmgr attempts to load every file in this directory; to deactivate a given Interface Agent, you must move it to another directory.

The agentmgr’s configuration file contains the definitions for process filtering, logical workloads, and new metrics. Each of these items is discussed in more detail in the following sections.

By default, the agentmgr uses port 8700. If this port is already in use in your environment, edit the slagent.cm command macro in FRTLHOME>bin to start the agentmgr with an alternate port number: change the agentmgr_port variable near the top of the macro to the new port number. You must also change the port number in datamgr.conf, as described in Chapter 5 of this manual.

The Class Hierarchy

Before describing how the agentmgr’s configuration file can be used, it is helpful to understand how the agentmgr organizes the various performance metrics, and to define some terms that are used throughout this document.

Internally, the agentmgr organizes metrics (also called metric variables, or simply variables) into a hierarchy of metric classes. Each class can have its own set of metrics as well as zero or more subclasses. When specifying the name of a metric, the full class membership is given using dot notation (.). For example, the following metric is defined for the node VOS System:

VOS System.CPU.Idle

The following specifies the metric Cpu Queue Busy Time for the class CPU, which is a subclass of VOS System:

VOS System.CPU.Cpu Queue Busy Time

In addition, some classes are defined as array classes. An example of a metric for an array class is:

VOS System.Path.Read Queue Completions

For these classes, array names can be used to specify individual members of the array.
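The dot notation can be illustrated with a small Python sketch that splits a fully qualified metric name into its class path and metric name. The helper split_metric_name is hypothetical and is not part of SightLine; it assumes that the last dot-separated component is the metric and that no class or metric name itself contains a dot (names may contain spaces and “%”).

```python
# Hypothetical helper (not part of SightLine): split a fully qualified
# metric name, as written in agentmgr.conf, into class path and metric name.
# Assumption: the last dot-separated component is the metric name, and no
# class or metric name contains a dot itself.
from typing import List, Tuple


def split_metric_name(full_name: str) -> Tuple[List[str], str]:
    parts = [p.strip() for p in full_name.split(".")]
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a fully qualified metric name: {full_name!r}")
    # Everything before the last component is the node/class path.
    return parts[:-1], parts[-1]


if __name__ == "__main__":
    for name in (
        "VOS System.CPU.Idle",
        "VOS System.CPU.Cpu Queue Busy Time",
        "VOS System.Path.Read Queue Completions",
    ):
        classes, metric = split_metric_name(name)
        print(" > ".join(classes), "::", metric)
```

Deeper hierarchies work the same way: for VOS System.IOPs.Busses.Controllers.Disks, every component up to Disks is part of the class path.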
For example, if this Power Agent is configured to monitor three paths (files), the VOS System.Path array class has three members.

Default agentmgr.conf File

The default agentmgr.conf file is shown in Figure 4-1 and described in the following sections. Comments can be embedded in the configuration files by using the “#” character. Any text that follows the “#” on the same line is ignored.

CONFIG key XXXX-YYY-ZZZZZ;
FILTER { VOS System.Processes.Reads } >= 0.01;
FILTER { VOS System.Processes.Writes } >= 0.01;
FILTER { VOS System.Processes.Page Faults } >= 0.01;
FILTER { VOS System.Processes.% Cpu } >= 1.0;

COMPUTATIONS
CLASS { CPU Extra } = { VOS System.CPU }
    CRITERIA CPUExtra = { Empty Idle } >= 0;
    INCLUSIVE
    VARIABLE U_INT { CPU Logical Cpus } GROUPNAME { Module CPU Utilization } PCNAME { CPU Logical Cpus } POSITION { 1 } = ({ Cpu Seconds } / { Number Seconds });
    VARIABLE FLOAT { CPU Wait Secs } GROUPNAME { Module CPU Utilization } PCNAME { CPU Wait Time Secs } POSITION { 30 } = ({ Cpu Queue Wait Time } - { Cpu Queue Busy Time });
    VARIABLE FLOAT { CPU % Residence } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Residence } POSITION { 31 } = (100 * { Cpu Queue Wait Time } / { Cpu Seconds } );
    VARIABLE FLOAT { CPU % Busy } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Busy } POSITION { 32 } = (100 * { Cpu Queue Busy Time } / { Cpu Seconds } );
    VARIABLE FLOAT { CPU % Wait } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Wait } POSITION { 33 } = (100 * ({ Cpu Queue Wait Time } - { Cpu Queue Busy Time }) / { Cpu Seconds } );
    VARIABLE FLOAT { CPU Other Secs } GROUPNAME { Module CPU Utilization } PCNAME { CPU Other Time Secs } POSITION { 20 } = { Cpu Seconds } - ({ System } + { User } + { Server } + { Interrupt } + { Empty Idle } + { User Page Fault Time } + { System Page Fault Time } + { Server Page Fault Time });
    VARIABLE FLOAT { CPU % System } GROUPNAME { Module CPU Utilization } PCNAME { CPU % System } POSITION { 4 } = 100 * { System } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % User } GROUPNAME { Module CPU Utilization } PCNAME { CPU % User } POSITION { 5 } = 100 * { User } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % Server } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Server } POSITION { 6 } = 100 * { Server } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % Interrupt } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Interrupts } POSITION { 7 } = 100 * { Interrupt } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % Empty Idle } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Idle } POSITION { 12 } = 100 * { Empty Idle } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % User PF } GROUPNAME { Module CPU Utilization } PCNAME { CPU % User PF } POSITION { 8 } = 100 * { User Page Fault Time } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % System PF } GROUPNAME { Module CPU Utilization } PCNAME { CPU % System PF } POSITION { 9 } = 100 * { System Page Fault Time } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % Server PF } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Server PF } POSITION { 10 } = 100 * { Server Page Fault Time } / { Cpu Seconds };
    VARIABLE FLOAT { CPU % Other } GROUPNAME { Module CPU Utilization } PCNAME { CPU % Other } POSITION { 11 } = 100 * ({ Cpu Seconds } - ({ System } + { User } + { Server } + { Interrupt } + { Empty Idle } + { User Page Fault Time } + { System Page Fault Time } + { Server Page Fault Time })) / { Cpu Seconds };
    VARSET { Name } = "CPUExtra" criteria CPUExtra;
end CLASS

CLASS { DiskQueue } = { VOS System.IOPs.Busses.Controllers.Disks }
    COMPUTE
    VARIABLE FLOAT { Read Queue Wait Secs } GROUPNAME { Module Disk Units } PCNAME { Disk Rd Wait Time Secs } POSITION { 15 } = { Read Queue Wait Time } - { Read Queue Busy Time };
    VARIABLE FLOAT { Write Queue Wait Secs } GROUPNAME { Module Disk Units } PCNAME { Disk Wr Wait Time Secs } POSITION { 16 } = { Write Queue Wait Time } - { Write Queue Busy Time };
    VARIABLE FLOAT { Disk % Busy } GROUPNAME { Module Disk Units } PCNAME { Disk % Busy } POSITION { 1 } = 100 * ({ Read Queue Busy Time } + { Write Queue Busy Time }) / { Read Queue Time };
    VARIABLE FLOAT { Disk I/Os/Sec } GROUPNAME { Module Disk Units } PCNAME { Disk I/Os/Sec } POSITION { 2 } = ({ Read Queue Completions } + { Write Queue Completions }) / { Read Queue Time };
    VARIABLE FLOAT { Disk Avg Res Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Res Time } POSITION { 3 } = 1000 * ({ Read Queue Wait Time } + { Write Queue Wait Time }) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Disk Avg Serv Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Serv Time } POSITION { 4 } = 1000 * ({ Read Queue Busy Time } + { Write Queue Busy Time }) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Disk Avg Queue Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Queue Time } POSITION { 5 } = 1000 * (({ Read Queue Wait Time } - { Read Queue Busy Time }) + ({ Write Queue Wait Time } - { Write Queue Busy Time })) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Disk Avg Queue Length } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Queue Length } POSITION { 6 } = (({ Read Queue Wait Time } - { Read Queue Busy Time }) + ({ Write Queue Wait Time } - { Write Queue Busy Time })) / ({ Read Queue Busy Time } + { Write Queue Busy Time });
    VARIABLE FLOAT { Disk Degradation } GROUPNAME { Module Disk Units } PCNAME { Disk Degradation } POSITION { 7 } = ({ Read Queue Wait Time } + { Write Queue Wait Time }) / ({ Read Queue Busy Time } + { Write Queue Busy Time });
    VARIABLE FLOAT { Disk Concurrency } GROUPNAME { Module Disk Units } PCNAME { Disk Concurrency } POSITION { 8 } = ({ Read Queue Wait Time } + { Write Queue Wait Time }) / { Read Queue Time };
    VARIABLE FLOAT { Disk Avg Rd Res Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Rd Res Time } POSITION { 9 } = 1000 * { Read Queue Wait Time } / { Read Queue Completions };
    VARIABLE FLOAT { Disk Avg Rd Serv Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Rd Serv Time } POSITION { 10 } = 1000 * { Read Queue Busy Time } / { Read Queue Completions };
    VARIABLE FLOAT { Disk Avg Rd Queue Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Rd Queue Time } POSITION { 11 } = 1000 * ({ Read Queue Wait Time } - { Read Queue Busy Time }) / { Read Queue Completions };
    VARIABLE FLOAT { Disk Avg Wr Res Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Wr Res Time } POSITION { 12 } = 1000 * { Write Queue Wait Time } / { Write Queue Completions };
    VARIABLE FLOAT { Disk Avg Wr Serv Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Wr Serv Time } POSITION { 13 } = 1000 * { Write Queue Busy Time } / { Write Queue Completions };
    VARIABLE FLOAT { Disk Avg Wr Queue Time } GROUPNAME { Module Disk Units } PCNAME { Disk Avg Wr Queue Time } POSITION { 14 } = 1000 * ({ Write Queue Wait Time } - { Write Queue Busy Time }) / { Write Queue Completions };
end CLASS

CLASS { PathQueue } = { VOS System.Path }
    COMPUTE
    VARIABLE FLOAT { Path Read Wait Secs } GROUPNAME { Module Path Meters } PCNAME { Path Rd Wait Time Secs } POSITION { 15 } = { Read Queue Wait Time } - { Read Queue Busy Time };
    VARIABLE FLOAT { Path Write Wait Secs } GROUPNAME { Module Path Meters } PCNAME { Path Wr Wait Time Secs } POSITION { 16 } = { Write Queue Wait Time } - { Write Queue Busy Time };
    VARIABLE FLOAT { Path % Busy } GROUPNAME { Module Path Meters } PCNAME { Path % Busy } POSITION { 1 } = 100 * ({ Read Queue Busy Time } + { Write Queue Busy Time }) / { Read Queue Time };
    VARIABLE FLOAT { Path I/Os/Sec } GROUPNAME { Module Path Meters } PCNAME { Path I/Os/Sec } POSITION { 2 } = ({ Read Queue Completions } + { Write Queue Completions }) / { Read Queue Time };
    VARIABLE FLOAT { Path Avg Res Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Res Time } POSITION { 3 } = 1000 * ({ Read Queue Wait Time } + { Write Queue Wait Time }) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Path Avg Serv Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Serv Time } POSITION { 4 } = 1000 * ({ Read Queue Busy Time } + { Write Queue Busy Time }) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Path Avg Queue Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Queue Time } POSITION { 5 } = 1000 * (({ Read Queue Wait Time } - { Read Queue Busy Time }) + ({ Write Queue Wait Time } - { Write Queue Busy Time })) / ({ Read Queue Completions } + { Write Queue Completions });
    VARIABLE FLOAT { Path Avg Queue Length } GROUPNAME { Module Path Meters } PCNAME { Path Avg Queue Length } POSITION { 6 } = (({ Read Queue Wait Time } - { Read Queue Busy Time }) + ({ Write Queue Wait Time } - { Write Queue Busy Time })) / ({ Read Queue Busy Time } + { Write Queue Busy Time });
    VARIABLE FLOAT { Path Degradation } GROUPNAME { Module Path Meters } PCNAME { Path Degradation } POSITION { 7 } = ({ Read Queue Wait Time } + { Write Queue Wait Time }) / ({ Read Queue Busy Time } + { Write Queue Busy Time });
    VARIABLE FLOAT { Path Concurrency } GROUPNAME { Module Path Meters } PCNAME { Path Concurrency } POSITION { 8 } = ({ Read Queue Wait Time } + { Write Queue Wait Time }) / { Read Queue Time };
    VARIABLE FLOAT { Path Avg Rd Res Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Rd Res Time } POSITION { 9 } = 1000 * { Read Queue Wait Time } / { Read Queue Completions };
    VARIABLE FLOAT { Path Avg Rd Serv Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Rd Serv Time } POSITION { 10 } = 1000 * { Read Queue Busy Time } / { Read Queue Completions };
    VARIABLE FLOAT { Path Avg Rd Queue Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Rd Queue Time } POSITION { 11 } = 1000 * ({ Read Queue Wait Time } - { Read Queue Busy Time }) / { Read Queue Completions };
    VARIABLE FLOAT { Path Avg Wr Res Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Wr Res Time } POSITION { 12 } = 1000 * { Write Queue Wait Time } / { Write Queue Completions };
    VARIABLE FLOAT { Path Avg Wr Serv Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Wr Serv Time } POSITION { 13 } = 1000 * { Write Queue Busy Time } / { Write Queue Completions };
    VARIABLE FLOAT { Path Avg Wr Queue Time } GROUPNAME { Module Path Meters } PCNAME { Path Avg Wr Queue Time } POSITION { 14 } = 1000 * ({ Write Queue Wait Time } - { Write Queue Busy Time }) / { Write Queue Completions };
end CLASS

CLASS { Process Info } = { VOS System.Processes }
    CRITERIA Stop = { State } = /Stopped/;
    CRITERIA Ready = { State } = /Rdy/;
    CRITERIA Frozen = { State } = /Frozen/;
    CRITERIA WaitShort = { State } = /WaitShrt/;
    EXCLUSIVE
    VARIABLE U_INT { TotalS } GROUPNAME { Module Processes } PCNAME { Procs Count } POSITION { 1 } = 1;
    VARSET { Name } = "Stopped" criteria Stop;
    VARSET { Name } = "Rdy" criteria Ready;
    VARSET { Name } = "Frozen" criteria Frozen;
    VARSET { Name } = "WaitShrt" criteria WaitShort;
end CLASS

CLASS { Workloads } = { VOS System.Processes }
    CRITERIA System = { Process Name } = /TheOverseer/ OR { Process Name } = /BatchOverseer/ OR { Process Name } = /mail_handler/ OR { Process Name } = /rsn/ OR { Process Name } = /TPOverseer/ OR { Process Name } = /Cache_Manager/;
    CRITERIA LinkServer = { Process Name } = /LinkServer/ OR { Process Name } = /osl_server/ ;
    CRITERIA OtherServer = { Process Name } = /network_client/ OR { Process Name } = /network_server/ OR { Process Name } = /open_client/ OR { Process Name } = /open_server/;
    CRITERIA SightLine = { Process Name } = /agentmgr/ OR { Process Name } = /servd/ OR { Process Name } = /datamgr/ OR { Process Name } = /protomgr/ OR { Process Name } = /threshd/;
    CRITERIA FtpD = { Program Name } = /ftpd.pm/;
    CRITERIA InetD = { Program Name } = /inetd.pm/;
    CRITERIA PagingD = { Person Name } = /Paging_Daemon/;
    CRITERIA QrunD = { Process Name } = /Qrun_Daemon/;
    CRITERIA MiscUtil = { Process Name } = /Maintenance_Utility/ OR { Process Name } = /Diagnostic_Utility/ OR { Process Name } = /Kernel_Utility/;
    CRITERIA Other = 1;
    EXCLUSIVE
    VARIABLE FLOAT { % Cpu } GROUPNAME { Module Workloads } PCNAME { Wkld % Cpu } POSITION { 1 } = { % Cpu };
    VARIABLE FLOAT { Page Faults } GROUPNAME { Module Workloads } PCNAME { Wkld PgFlt } POSITION { 2 } = { Page Faults };
    VARIABLE FLOAT { % Page Fault Time } GROUPNAME { Module Workloads } PCNAME { Wkld % PgFlt Time } POSITION { 3 } = { % Page Fault };
    VARIABLE FLOAT { Reads } GROUPNAME { Module Workloads } PCNAME { Wkld Reads } POSITION { 4 } = { Reads };
    VARIABLE FLOAT { Writes } GROUPNAME { Module Workloads } PCNAME { Wkld Writes } POSITION { 5 } = { Writes };
    VARIABLE FLOAT { CPU Completions/Sec } GROUPNAME { Module Workloads } PCNAME { Wkld CPU Completes/Sec } POSITION { 7 } = { Cpu Queue Completions/Sec };
    VARIABLE FLOAT { CPU Queue Time } GROUPNAME { Module Workloads } PCNAME { Wkld CPU Queue ET } POSITION { 8 } = { Cpu Queue Time };
    VARIABLE FLOAT { CPU Residence Time } GROUPNAME { Module Workloads } PCNAME { Wkld CPU Residence Time } POSITION { 9 } = { Cpu Queue Wait Time };
    VARIABLE FLOAT { CPU Busy Time } GROUPNAME { Module Workloads } PCNAME { Wkld CPU Busy Time } POSITION { 10 } = { Cpu Queue Busy Time };
    VARIABLE FLOAT { CPU Wait Time } GROUPNAME { Module Workloads } PCNAME { Wkld CPU Wait Time } POSITION { 11 } = { Cpu Queue Wait Time } - { Cpu Queue Busy Time };
    VARIABLE FLOAT { CPU % Busy Time } GROUPNAME { Module Workloads } PCNAME { Wkld CPU % Busy Time } POSITION { 12 } = { % Cpu Queue Busy Time };
    VARIABLE FLOAT { Disk Read Completions/Sec } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd Completes/Sec } POSITION { 13 } = { Disk Read Queue Completions/Sec };
    VARIABLE FLOAT { Disk Read Queue Time } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd Queue ET } POSITION { 14 } = { Disk Read Queue Time };
    VARIABLE FLOAT { Disk Read Residence Time } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd Res Time } POSITION { 15 } = { Disk Read Queue Wait Time };
    VARIABLE FLOAT { Disk Read Busy Time } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd Busy Time } POSITION { 16 } = { Disk Read Queue Busy Time };
    VARIABLE FLOAT { Disk Read Wait Time } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd Wait Time } POSITION { 17 } = { Disk Read Queue Wait Time } - { Disk Read Queue Busy Time };
    VARIABLE FLOAT { Disk Read % Busy Time } GROUPNAME { Module Workloads } PCNAME { Wkld Disk Rd % Busy Time } POSITION { 18 } = { % Disk Read Queue Busy Time };
    VARIABLE FLOAT { Cache Read Hits } GROUPNAME { Module Workloads } PCNAME { Wkld Cache Read Hits } POSITION { 19 } = { Cache Read Hits };
    VARIABLE FLOAT { Cache Read Misses } GROUPNAME { Module Workloads } PCNAME { Wkld Cache Read Misses } POSITION { 20 } = { Cache Read Misses };
    VARIABLE FLOAT { Cache Soiled } GROUPNAME { Module Workloads } PCNAME { Wkld Cache Soiled } POSITION { 21 } = { Cache Soiled };
    VARIABLE FLOAT { Interrupts Pluses } GROUPNAME { Module Workloads } PCNAME { Wkld Interrupts } POSITION { 22 } = { Interrupts Pluses };
    VARIABLE FLOAT { New Shared Memory } GROUPNAME { Module Workloads } PCNAME { Wkld Shared Memory } POSITION { 23 } = { New Shared Memory };
    VARIABLE FLOAT { New Unshared Memory } GROUPNAME { Module Workloads } PCNAME { Wkld Unshared Memory } POSITION { 24 } = { New Unshared Memory };
    VARIABLE U_INT { Total } GROUPNAME { Module Workloads } PCNAME { Wkld Total } POSITION { 6 } = 1;
    VARSET { Name } = "System" criteria System;
    VARSET { Name } = "LinkServer" criteria LinkServer;
    VARSET { Name } = "OtherServer" criteria OtherServer;
    VARSET { Name } = "SightLine" criteria SightLine;
    VARSET { Name } = "FtpD" criteria FtpD;
    VARSET { Name } = "InetD" criteria InetD;
    VARSET { Name } = "PagingD" criteria PagingD;
    VARSET { Name } = "MiscUtil" criteria MiscUtil;
    VARSET { Name } = "QrunD" criteria QrunD;
    VARSET { Name } = "Other" criteria Other;
end CLASS
end COMPUTATIONS

Figure 4-1. Default agentmgr.conf File

The CONFIG Statement

The first section of the agentmgr.conf file allows you to configure two important parameters: the AccessKey and the maximum number of processes to be managed (the nproc parameter). Each of these parameters is configured with the CONFIG keyword and is described in detail in the following sections.

AccessKey Configuration

To run the SightLine Power Agent software, a valid AccessKey must be entered in the agentmgr.conf file. This key is provided by the vendor from whom you received the software and is entered during installation. The key controls the expiration of the software.

To configure the AccessKey, edit the CONFIG key line of the agentmgr.conf file as shown in the example below. Be sure to keep the semicolon at the end of the line.

Example:

CONFIG key XXXX-YYY-ZZZZZ;

Nproc Configuration

The SightLine Power Agent monitors the performance of a system at the detailed level of the processes that are consuming system resources. In other words, you can view not only overall system metrics such as CPU utilization, system cache statistics, and paging activity, but also the processes that actually account for the use of these resources.

To control the overhead of performance management, the agentmgr can limit the number of processes it collects at each interval, which also limits the space required to store the performance data. The agentmgr limits the number of processes stored by means of the nproc parameter. Although the agentmgr looks at all the processes running on the system, it stores only the top ones. The most active processes are determined using an algorithm based on the FILTER statements. (See the next section.)
Note that the nproc limit is applied only if there is at least one relevant FILTER statement. To configure the nproc limit, add a CONFIG nproc line to the agentmgr.conf file as shown in the example below.

Example:

CONFIG nproc 30;

In this example, the nproc limit is set to the top 30 processes.

FILTER Statements

The agentmgr can filter out “uninteresting” or unwanted members of array classes. This reduces the overhead of performance management, because less space is required to store performance data once extraneous members have been filtered out. For example, among the members of the process array class, the idle process is rarely the cause of performance problems.

Process filtering is accomplished with two methods. The first is the nproc parameter, which sets the maximum number of processes saved. The second is FILTER statements in the agentmgr’s configuration file (agentmgr.conf). Note that FILTER statements can be used to filter members of any array class, and nproc applies to all classes that have filters.

There can be zero or more FILTER statements; the agentmgr considers all of them when deciding which class members to keep. If no FILTER statements are used, the agentmgr processes all the members of all array classes. Once a member meets the criterion of any one of the FILTER statements, it is included in the collection. The syntax of the FILTER statement is as follows:

FILTER { <array class>.<metric name> } >= value;

<array class> is any fully specified array class, and <metric name> is any metric name that is defined for that array class.
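The filtering rules above (keep a member once it meets any FILTER, keep everything when there are no FILTER statements, and apply nproc only when at least one FILTER exists) can be sketched in Python. This is an illustration under stated assumptions, not the agentmgr implementation: the ranking used to pick the “top” members (the score function) is a hypothetical stand-in, because the actual algorithm is not documented here, and the process names and values are invented.

```python
# Minimal sketch of how FILTER statements and the nproc limit combine.
# HYPOTHETICAL: score() is an assumed ranking (largest value-to-threshold
# ratio); agentmgr's actual "most active" algorithm is not documented here.
from typing import Dict, List, Optional

Member = Dict[str, float]    # metric name -> value for one class member
Filters = Dict[str, float]   # metric name -> threshold, i.e. FILTER {...} >= value

# The four FILTER statements from the default agentmgr.conf (Figure 4-1).
DEFAULT_FILTERS: Filters = {"Reads": 0.01, "Writes": 0.01,
                            "Page Faults": 0.01, "% Cpu": 1.0}

# Hypothetical per-interval process metrics, for illustration only.
example_processes: Dict[str, Member] = {
    "idle_proc": {"Reads": 0.0, "Writes": 0.0, "Page Faults": 0.0, "% Cpu": 0.2},
    "busy_db": {"Reads": 120.0, "Writes": 40.0, "Page Faults": 3.0, "% Cpu": 35.0},
    "batch_job": {"% Cpu": 4.0},
}


def passes_any_filter(member: Member, filters: Filters) -> bool:
    # A member is kept once it meets any one FILTER statement.
    return any(member.get(metric, 0.0) >= threshold
               for metric, threshold in filters.items())


def score(member: Member, filters: Filters) -> float:
    # Assumed ranking: largest value-to-threshold ratio over all filters.
    return max((member.get(m, 0.0) / t for m, t in filters.items() if t > 0),
               default=0.0)


def select_members(members: Dict[str, Member], filters: Filters,
                   nproc: Optional[int] = None) -> List[str]:
    if not filters:
        # No FILTER statements: keep every member; nproc is not applied.
        return list(members)
    kept = [name for name, m in members.items() if passes_any_filter(m, filters)]
    kept.sort(key=lambda name: score(members[name], filters), reverse=True)
    return kept if nproc is None else kept[:nproc]


if __name__ == "__main__":
    print(select_members(example_processes, DEFAULT_FILTERS, nproc=30))
```

With the default filters, idle_proc meets none of the thresholds and is dropped, while busy_db and batch_job are kept; a smaller nproc would then truncate the ranked list.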
The FILTER statements in the default agentmgr.conf file (Figure 4-1) are:

FILTER { VOS System.Processes.Reads } >= 0.01;
FILTER { VOS System.Processes.Writes } >= 0.01;
FILTER { VOS System.Processes.Page Faults } >= 0.01;
FILTER { VOS System.Processes.% Cpu } >= 1.0;

These statements specify that a process is filtered out unless it meets at least one of the following criteria:

• Consumes 1% or more of the CPU, or
• Generates 0.01 or more page faults per second, or
• Performs 0.01 or more reads, or 0.01 or more writes, per second

The COMPUTATIONS Section

The COMPUTATIONS section of the agentmgr.conf file is used to define user-specific metrics. It does this by creating new array classes based on the basic array classes that are delivered with the software. The syntax of the COMPUTATIONS section is as follows:

COMPUTATIONS
CLASS <New Array Class> = <Existing Array Class>
    CRITERIA <Criteria_Name> = <Criteria Boolean>
    .
    .
    INCLUSIVE | EXCLUSIVE | COMPUTE
    VARIABLE U_INT | INT | FLOAT { New Metric Name } GROUPNAME { EA/V Group Name } PCNAME { EA/V Metric Name } POSITION { Position in Metric list } = <Metric Expression>;
    .
    .
    VARSET { Name } = "<VarSetName>" criteria <Criteria_Name>;
    .
    .
    VARSET { Name } = "Others" criteria <System_others>;
end CLASS
end COMPUTATIONS

The COMPUTATIONS section consists of one or more CLASS definitions, which define new array classes. A new array class is specified by first identifying an existing array class. At each sampling interval, all members of the existing array class are examined and the values of their metrics are evaluated. If the specified criteria are met, the metrics for the new class are computed.

CLASS

A CLASS specification defines a new array class. It uses CRITERIA, VARIABLE, VARSET, and COMPUTE statements for its definition.
The syntax of the CLASS statement is as follows:

CLASS <New Array Class> = <Existing Array Class>

<New Array Class> is the name of the array class being created, and <Existing Array Class> is the name of the existing array class upon which the new array class is based.

CRITERIA

A CRITERIA statement defines which members of the existing array class belong to an array element of the new array class. Membership is determined by referencing metrics in the existing class. The syntax of the CRITERIA statement is as follows:

CRITERIA <Criteria_Name> = <Criteria Boolean>

<Criteria Boolean> evaluates to a boolean value. The boolean expression can include =, >, <=, and >= when comparing metrics to values. In addition, several comparisons can be joined using logical operators such as OR and AND. An example would be:

{ metric1 } = /S/ OR { metric2 } >= 3

Pattern matching in CRITERIA evaluation uses regular expressions. Figure 4-2 lists the regular expression metacharacters and explains how each is used.

Metacharacter   Matches

Basic regular expressions (BRE):
*               Zero or more occurrences of the preceding character or regexp
                Pattern: Th*omas        Matches: Thomas, Tomas, Thhomas
. (period)      Any single character
                Pattern: string1.       Matches: string12, string13, etc.
[...]           Any single character enclosed in the brackets
                Pattern: string[12]     Matches: string1 and string2 only
[^...]          Any single character not enclosed in the brackets
                Pattern: string[^12]    Matches: string3, string4, etc.
^               Start of line
                Pattern: ^Tom           Matches: “Tom is here”
$               End of line
                Pattern: Tom$           Matches: “Here is Tom”

Extended regular expressions (ERE): all BRE constructions, plus:
+               One or more occurrences of the preceding character or regexp
                Pattern: A+             Matches: A, AA, AAA, etc.
?               Zero or one occurrence of the preceding character or regexp
                Pattern: BA?            Matches: B, BA

Figure 4-2. Regular Expressions

NOTE: To use any of the metacharacters literally, place a “\” in front of it. For example, \$STRING matches the literal text $STRING instead of treating $ as the end-of-line anchor.

EXCLUSIVE | INCLUSIVE

The keyword EXCLUSIVE specifies that, once a member meets a CRITERIA specification, no other criteria are considered for that member. In this respect, the VARSETs are mutually exclusive.

The keyword INCLUSIVE allows a member of the existing class to be included in more than one member of the new array class. The VARSETs are inclusive, so data can overlap within the array class.

COMPUTE

The keyword COMPUTE provides another way to define a new array class: by computing values from existing metrics. In the following example, we want some new metrics for each disk that is a member of the VOS System.IOPs.Busses.Controllers.Disks array class. This is accomplished by creating a new array class, DiskQueue. Among the new metrics are Disk % Busy, Disk I/Os/Sec, and Disk Avg Res Time. The members of the new array class have the same names as the members of the existing array class.
CLASS { DiskQueue } = { VOS System.IOPs.Busses.Controllers.Disks }
COMPUTE
VARIABLE FLOAT { Disk % Busy } GROUPNAME { Module Disk Units }
    PCNAME { Disk % Busy } POSITION { 1 }
    = 100 * ({ Read Queue Busy Time } + { Write Queue Busy Time }) / { Read Queue Time };
VARIABLE FLOAT { Disk I/Os/Sec } GROUPNAME { Module Disk Units }
    PCNAME { Disk I/Os/Sec } POSITION { 2 }
    = ({ Read Queue Completions } + { Write Queue Completions }) / { Read Queue Time };
VARIABLE FLOAT { Disk Avg Res Time } GROUPNAME { Module Disk Units }
    PCNAME { Disk Avg Res Time } POSITION { 3 }
    = 1000 * ({ Read Queue Wait Time } + { Write Queue Wait Time }) / ({ Read Queue Completions } + { Write Queue Completions });
end CLASS

VARIABLE

Metrics for the new array class are defined with VARIABLE statements. The syntax of the VARIABLE statement is as follows:

VARIABLE U_INT | INT | FLOAT { New Metric Name } GROUPNAME { EA/V Group Name } PCNAME { EA/V Metric Name } POSITION { Position in Metric list } = <Metric Expression>;

VARIABLEs can have one of three types: unsigned integer (U_INT), integer (INT), or floating point (FLOAT). All names must be unique across all class definitions.

GROUPNAME { EA/V Group Name } specifies the group in the SightLine Expert Advisor/Vision (EA/V) metric list under which the defined metric appears (specifically, in the Edit Plot Variable List dialog box).

PCNAME { EA/V Metric Name } is the name that appears in the EA/V metric list.

POSITION { Position in Metric list } defines the order in which the metric appears in the group.

<Metric Expression> is a simple numeric expression formed from metric names in the existing array class and the numeric operators *, +, -, and /. Note that these expressions are not evaluated with conventional operator precedence (multiplication and division before addition and subtraction), so use parentheses to make the intended grouping explicit.
For example, the following evaluates to 26, that is, (10 + 3) * 2, rather than 16:

10 + 3 * 2

VARSET

The VARSET statement defines the name used to reference a member of the new array class. It associates that name with the members of the existing array class that meet the specific criteria defined for the array class.

Example of a CLASS Specification

The following example illustrates the syntax and semantics of the CLASS definition. It may not apply to your system, but it offers a simple way to understand the CLASS definition.

In this example, we evaluate the % Busy metric for each member of the Disks array class. Using this metric in the CRITERIA definitions, we create values for the I/Os and TotalD metrics of a new array class called Disk Activity. The two members of this new array class are referenced by the names Hot Disks and Cool Disks.

CLASS { Disk Activity } = { VOS System.IOPs.Busses.Controllers.Disks }
CRITERIA IsHotDisk = { % Busy } >= 50;
CRITERIA IsCoolDisk = { % Busy } >= 25;
EXCLUSIVE
VARIABLE FLOAT { I/Os } GROUPNAME { Disk Activity }
    PCNAME { Disk Act I/Os } POSITION { 1 } = { I/Os };
VARIABLE U_INT { TotalD } GROUPNAME { Disk Activity }
    PCNAME { Disk Act Total } POSITION { 2 } = 1;
VARSET { Name } = "Hot Disks" criteria IsHotDisk;
VARSET { Name } = "Cool Disks" criteria IsCoolDisk;
end CLASS

The CLASS statement establishes the new array class and the existing array class whose members supply the values for the metrics. The CRITERIA statements are evaluated in order. Note that the keyword EXCLUSIVE is included: if a member meets the first CRITERIA statement, it is not checked against any other criteria. In this way, disks that are 50% busy or more are counted only as Hot Disks and are not also included in the criterion for 25% busy or more.

Two metrics are defined for the new array class: I/Os and TotalD.
These are the internal registry names, which appear in the registry.csv file in the FRTLHOME>data>Local directory. (See the section Registry.csv Metric Description File for more information about the registry.csv file.) The PCNAME settings, Disk Act I/Os and Disk Act Total, are the names under which the metrics appear in EA/V.

When a member of the existing array class matches a criterion, the value of its metric is added to the newly defined metric. Because each matching disk adds one (1) to TotalD, TotalD gives the number of disks that met the criterion.

The new array class has two members, referenced by the names Hot Disks and Cool Disks. Membership in each is based on the criteria defined above.

To make this new array class visible in EA/V, you must add specifications to the protomgr.conf file. To enable the Disk Activity array class, include the following in the ENABLE section:

ENABLE { Disk Activity }

Using SightLine EA/V, you can then monitor these four new metrics:

Disk Act I/Os for Hot Disks
Disk Act Total for Hot Disks
Disk Act I/Os for Cool Disks
Disk Act Total for Cool Disks

The Workloads Class

One very important class defined in the configuration file is the Workloads class. From the standard set of VOS performance data that SightLine delivers, it is relatively simple to track resource utilization on a system-wide basis. However, to manage your system effectively, you also need to track specific users, groups of users, or applications. Only then can you answer questions such as:

• How much of my CPU are the programmers using during prime time?
• How much memory does the production application really need?
• How much I/O is my application really doing?
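Returning to the Disk Activity example, the following Python sketch simulates its aggregation rules for a handful of disks. It is an illustration of the documented semantics, not agentmgr's implementation: each disk is assumed to be a dictionary holding the existing class's % Busy and I/Os metrics, and the names CRITERIA and build_disk_activity are hypothetical. Criteria are tested in declaration order, EXCLUSIVE stops at the first criterion met, and each matching disk adds its I/Os value, plus 1 for TotalD, to the member it joins.

```python
# Hypothetical simulation of the Disk Activity CLASS example.
# Disks are plain dicts; names and layout are illustrative assumptions.

# CRITERIA in declaration order, each paired with the VARSET name it feeds.
CRITERIA = [
    ("Hot Disks", lambda d: d["% Busy"] >= 50),   # CRITERIA IsHotDisk
    ("Cool Disks", lambda d: d["% Busy"] >= 25),  # CRITERIA IsCoolDisk
]


def build_disk_activity(disks, exclusive=True):
    """Return {member name: {"I/Os": float, "TotalD": int}}."""
    members = {name: {"I/Os": 0.0, "TotalD": 0} for name, _ in CRITERIA}
    for disk in disks:
        for name, test in CRITERIA:
            if test(disk):
                members[name]["I/Os"] += disk["I/Os"]  # VARIABLE ... = { I/Os }
                members[name]["TotalD"] += 1           # VARIABLE ... = 1
                if exclusive:
                    break  # EXCLUSIVE: ignore the remaining criteria
    return members


if __name__ == "__main__":
    disks = [
        {"name": "d01", "% Busy": 72.0, "I/Os": 40.0},
        {"name": "d02", "% Busy": 30.0, "I/Os": 12.5},
        {"name": "d03", "% Busy": 10.0, "I/Os": 3.0},  # meets no criterion
    ]
    print(build_disk_activity(disks))
    print(build_disk_activity(disks, exclusive=False))
```

With EXCLUSIVE, d01 is counted only in Hot Disks. With exclusive=False, which mirrors INCLUSIVE, d01 is also counted in Cool Disks, so Cool Disks reports 52.5 I/Os across 2 disks. A disk that meets no criterion, such as d03, contributes to neither member.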
The semantics and syntax of workload definitions follow the description of CLASS above, but there are some important issues to address before you define this class for your organization. First consider exactly what your organization does, how it is organized, and how the VOS system provides services to the organization within that overall scheme. Ideally, the workloads you define will correspond to functional areas within your organization, to specific applications or sets of application images, or to some other distinguishable “grouping” of processes doing related work on your system. That way, you will gain more insight into how your organization and your system fit together, how one workload affects the performance of other workloads, and how to keep the system running well in your unique environment.

The SightLine Power Agent allows you to define logical workloads, which group the many processes that make up your total processing load into manageable, functionally related units. It also collects and delivers data that indicates how each of these workloads is behaving in terms of activity, resource utilization, and system impact.

Inspect the Workloads class in the default configuration file. Then use the metrics from the existing array class of processes to create meaningful members of the Workloads array class.

The important thing to remember is that capturing and reporting workload measurements is essentially a two-step process. The first step is to decide which processes belong in which logical bucket, or workload. To accomplish this step, agentmgr uses CRITERIA. Within each CRITERIA, you can test for several attributes that a process might have, such as the Person Name, Process Name, Program Name, Group Name, or Terminal Name. Once the CRITERIA have been established, a name is assigned to each workload using VARSET specifications.
In the second step, you decide which metrics are to be collected and specify them using VARIABLE statements.

The Processes Class

To understand this example, you need to know the metrics associated with the array class VOS System.Processes. Figure 4-3 lists the Process Class metrics.

Metric                             Type              Description
Pid                                Integer           Process ID
Process Name                       String            Process Name
Person Name                        String            Person Name
Group Name                         String            Group Name
Program Name                       String            Program Name
Terminal Name                      String            Terminal
Priority                           Integer           Process priority
Login Time                         String            Login time/date stamp
% Cpu                              Float             CPU Utilization
% Page Fault                       Float             % CPU time processing page faults
Reads                              Float             Reads/Second
Writes                             Float             Writes/Sec
Page Faults                        Float             Page Faults/Sec
State                              String            Processor ready state
Cpu Time Limit                     String
Invoking Process Identity          Integer
Clone Level                        Integer
Subprocesses                       Integer
Time Last Run                      String
Memory Pool                        Integer
% Cpu Queue Time                   Float             Elapsed Time of the sample, expressed as a %
Cpu Queue Time                     Float             Elapsed Time of the sample
% Cpu Queue Wait Time              Float             Residence Time, expressed as a %
Cpu Queue Wait Time                Float             Residence Time
Cpu Queue Completions              Float             Processor visits
Cpu Queue Completions/Sec          Float             Processor visits, expressed as a rate per second
Cpu Queue Busy Time                Float             Busy time, actually using the processor
% Cpu Queue Busy Time              Float             Busy time, expressed as a %
% Disk Read Queue Time             Float             Elapsed Time of the sample, expressed as a %
Disk Read Queue Time               Float             Elapsed Time of the sample
% Disk Read Queue Wait Time        Float             Residence Time, expressed as a %
Disk Read Queue Wait Time          Float             Residence Time
Disk Read Queue Completions        Float             Disk visits
Disk Read Queue Completions/Sec    Float             Disk visits, expressed as a rate per second
Disk Read Queue Busy Time          Float             Busy time, actually using the disk(s)
% Disk Read Queue Busy Time        Float             Busy time, expressed as a %
Cache Read Hits                    Float             Read Hits/second
Cache Read Misses                  Float             Read Misses/second
Cache Soiled                       Float             Cache pages soiled/second
Interrupts Pluses                  Float             Interrupt starts
Interrupts Minuses                 Float             Interrupt completions
Page Fault Pluses                  Float             Page fault starts
Page Fault Minuses                 Float             Page fault completions
Shared Memory Pluses               Float             Shared memory pages gained
Shared Memory Minuses              Float             Shared memory pages released
Unshared Memory Pluses             Float             Unshared memory pages gained
Unshared Memory Minuses            Float             Unshared memory pages released
New Page Faults                    Float             Page Fault Pluses - Page Fault Minuses
New Interrupts                     Float             Interrupts Pluses - Interrupts Minuses
New Shared Memory                  Unsigned Integer  Shared Memory Pluses - Shared Memory Minuses
New Unshared Memory                Unsigned Integer  Unshared Memory Pluses - Unshared Memory Minuses

Figure 4-3. Metrics for the Process Class

Stratus VOS Class Definition

The Workloads CLASS definition from the default agentmgr.conf configuration file is shown in Figure 4-4 and is discussed in detail below.
CLASS { Workloads } = { VOS System.Processes }
CRITERIA System = { Process Name } = /TheOverseer/
    OR { Process Name } = /BatchOverseer/
    OR { Process Name } = /mail_handler/
    OR { Process Name } = /rsn/
    OR { Process Name } = /TPOverseer/
    OR { Process Name } = /Cache_Manager/;
CRITERIA LinkServer = { Process Name } = /LinkServer/
    OR { Process Name } = /osl_server/;
CRITERIA OtherServer = { Process Name } = /network_client/
    OR { Process Name } = /network_server/
    OR { Process Name } = /open_client/
    OR { Process Name } = /open_server/;
CRITERIA SightLine = { Process Name } = /agentmgr/
    OR { Process Name } = /servd/
    OR { Process Name } = /datamgr/
    OR { Process Name } = /protomgr/
    OR { Process Name } = /threshd/;
CRITERIA FtpD = { Program Name } = /ftpd.pm/;
CRITERIA InetD = { Program Name } = /inetd.pm/;
CRITERIA PagingD = { Person Name } = /Paging_Daemon/;
CRITERIA QrunD = { Process Name } = /Qrun_Daemon/;
CRITERIA MiscUtil = { Process Name } = /Maintenance_Utility/
    OR { Process Name } = /Diagnostic_Utility/
    OR { Process Name } = /Kernel_Utility/;
CRITERIA Other = 1;
EXCLUSIVE
VARIABLE FLOAT { % Cpu } GROUPNAME { Module Workloads }
    PCNAME { Wkld % Cpu } POSITION { 1 } = { % Cpu };
VARIABLE FLOAT { Page Faults } GROUPNAME { Module Workloads }
    PCNAME { Wkld PgFlt } POSITION { 2 } = { Page Faults };
VARIABLE FLOAT { % Page Fault Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld % PgFlt Time } POSITION { 3 } = { % Page Fault };
VARIABLE FLOAT { Reads } GROUPNAME { Module Workloads }
    PCNAME { Wkld Reads } POSITION { 4 } = { Reads };
VARIABLE FLOAT { Writes } GROUPNAME { Module Workloads }
    PCNAME { Wkld Writes } POSITION { 5 } = { Writes };
VARIABLE FLOAT { CPU Completions/Sec } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU Completes/Sec } POSITION { 7 } = { Cpu Queue Completions/Sec };
VARIABLE FLOAT { CPU Queue Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU Queue ET } POSITION { 8 } = { Cpu Queue Time };
VARIABLE FLOAT { CPU Residence Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU Residence Time } POSITION { 9 } = { Cpu Queue Wait Time };
VARIABLE FLOAT { CPU Busy Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU Busy Time } POSITION { 10 } = { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU Wait Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU Wait Time } POSITION { 11 } = { Cpu Queue Wait Time } - { Cpu Queue Busy Time };
VARIABLE FLOAT { CPU % Busy Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld CPU % Busy Time } POSITION { 12 } = { % Cpu Queue Busy Time };
VARIABLE FLOAT { Disk Read Completions/Sec } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd Completes/Sec } POSITION { 13 } = { Disk Read Queue Completions/Sec };
VARIABLE FLOAT { Disk Read Queue Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd Queue ET } POSITION { 14 } = { Disk Read Queue Time };
VARIABLE FLOAT { Disk Read Residence Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd Res Time } POSITION { 15 } = { Disk Read Queue Wait Time };
VARIABLE FLOAT { Disk Read Busy Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd Busy Time } POSITION { 16 } = { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read Wait Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd Wait Time } POSITION { 17 } = { Disk Read Queue Wait Time } - { Disk Read Queue Busy Time };
VARIABLE FLOAT { Disk Read % Busy Time } GROUPNAME { Module Workloads }
    PCNAME { Wkld Disk Rd % Busy Time } POSITION { 18 } = { % Disk Read Queue Busy Time };
VARIABLE FLOAT { Cache Read Hits } GROUPNAME { Module Workloads }
    PCNAME { Wkld Cache Read Hits } POSITION { 19 } = { Cache Read Hits };
VARIABLE FLOAT { Cache Read Misses } GROUPNAME { Module Workloads }
    PCNAME { Wkld Cache Read Misses } POSITION { 20 } = { Cache Read Misses };
VARIABLE FLOAT { Cache Soiled } GROUPNAME { Module Workloads }
    PCNAME { Wkld Cache Soiled } POSITION { 21 } = { Cache Soiled };
VARIABLE FLOAT { Interrupts Pluses } GROUPNAME { Module Workloads }
    PCNAME { Wkld Interrupts } POSITION { 22 } = { Interrupts Pluses };
VARIABLE FLOAT { New Shared Memory } GROUPNAME { Module Workloads }
    PCNAME { Wkld Shared Memory } POSITION { 23 } = { New Shared Memory };
VARIABLE FLOAT { New Unshared Memory } GROUPNAME { Module Workloads }
    PCNAME { Wkld Unshared Memory } POSITION { 24 } = { New Unshared Memory };
VARIABLE U_INT { Total } GROUPNAME { Module Workloads }
    PCNAME { Wkld Total } POSITION { 6 } = 1;
VARSET { Name } = "System" criteria System;
VARSET { Name } = "LinkServer" criteria LinkServer;
VARSET { Name } = "OtherServer" criteria OtherServer;
VARSET { Name } = "SightLine" criteria SightLine;
VARSET { Name } = "FtpD" criteria FtpD;
VARSET { Name } = "InetD" criteria InetD;
VARSET { Name } = "PagingD" criteria PagingD;
VARSET { Name } = "MiscUtil" criteria MiscUtil;
VARSET { Name } = "QrunD" criteria QrunD;
VARSET { Name } = "Other" criteria Other;
end CLASS

Figure 4-4. Default Workload Class Definition

CRITERIA

In this example, most CRITERIA are defined using the Process Name metric; the FtpD and InetD statements test Program Name, and the PagingD statement tests Person Name. All processes whose Process Name contains the string agentmgr, servd, datamgr, protomgr, or threshd satisfy the SightLine CRITERIA statement. Similar groups of processes have been combined to define the other standard workloads.

A final CRITERIA statement, defining Other, is included to capture all processes that do not fit the previous criteria. Its expression is the constant 1, so it always evaluates to TRUE for any process that evaluated to FALSE in all the previous statements.

Note that the keyword EXCLUSIVE is used: once a process satisfies a CRITERIA statement, it cannot satisfy any following statements. For this reason, the order of CRITERIA statements may be important, and the CRITERIA statement for Other should be the last CRITERIA statement.
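The EXCLUSIVE, first-match behavior described above, together with the /pattern/ matching from Figure 4-2, can be sketched in Python. The sketch assumes that a /pattern/ test behaves like Python's re.search, which accepts a match anywhere in the value, and it encodes only a small, hypothetical subset of the default criteria. The names WORKLOAD_CRITERIA, classify, summarize, and matching_names are illustrative, not part of SightLine, and Python's regular expression syntax is close to, but not guaranteed to be identical to, what agentmgr accepts.

```python
import re

# Hypothetical subset of the default Workloads CRITERIA (Figure 4-4),
# in declaration order: (VARSET name, metric tested, patterns OR-ed together).
# A metric of None models "CRITERIA Other = 1", which is always TRUE.
WORKLOAD_CRITERIA = [
    ("SightLine", "Process Name",
     ["agentmgr", "servd", "datamgr", "protomgr", "threshd"]),
    ("FtpD", "Program Name", ["ftpd.pm"]),
    ("PagingD", "Person Name", ["Paging_Daemon"]),
    ("Other", None, []),
]


def classify(process):
    """Return the workload for one process; the first satisfied CRITERIA wins."""
    for name, metric, patterns in WORKLOAD_CRITERIA:
        if metric is None:
            return name  # catch-all, so it must come last
        # /pattern/ is assumed to match anywhere in the value, like re.search.
        if any(re.search(p, process[metric]) for p in patterns):
            return name
    return None


def summarize(processes):
    """Sum % Cpu and count processes (Total = 1 each) per workload."""
    totals = {}
    for proc in processes:
        entry = totals.setdefault(classify(proc), {"% Cpu": 0.0, "Total": 0})
        entry["% Cpu"] += proc["% Cpu"]
        entry["Total"] += 1
    return totals


def matching_names(pattern, names):
    """Return the names that a /pattern/ test would accept."""
    return [n for n in names if re.search(pattern, n)]


if __name__ == "__main__":
    people = ["jeff", "rjeff", "jeffe"]
    for pat in ["jeff", "^jeff", "jeff$", "^jeff$"]:
        print(pat, matching_names(pat, people))
```

Run as a script, the loop reproduces the results of the four Person Name examples that follow: /jeff/ accepts all three names, /^jeff/ accepts jeff and jeffe, /jeff$/ accepts jeff and rjeff, and /^jeff$/ accepts only jeff.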
The following example explores the regular expressions presented in Figure 4-2 and their use in defining workloads. Suppose that three processes have the following Person Names:

Person Name1 = jeff
Person Name2 = rjeff
Person Name3 = jeffe

1. {Person Name} = /jeff/
   This specifies that the Person Name must contain the string “jeff.” Therefore, all three persons are included.
2. {Person Name} = /^jeff/
   This specifies that the Person Name must begin with the string “jeff.” Therefore, Person Name1 and Person Name3 are included.
3. {Person Name} = /jeff$/
   This specifies that the Person Name must end with the string “jeff.” Therefore, Person Name1 and Person Name2 are included.
4. {Person Name} = /^jeff$/
   This compound definition specifies that the Person Name must both begin and end with the string “jeff,” that is, it must be exactly “jeff.” Therefore, Person Name1 is the only one included.

VARIABLE

In this example, each metric is defined to equal a metric (or a combination of metrics) defined for the existing array class. Note that metric values are always added. For example, for a specific VARSET, % Cpu is the sum of the % Cpu values of all processes that met the criterion for that VARSET.

VARSET

VARSET statements define the names of the members of the Workloads array class. The names associate these members with the members of the existing array class that met the specified criteria.

Path Meters Class

The SightLine Power Agent can collect detailed information about individual paths or files. You must specify the full path(s) in a file named control.harvest, located in the FRTLHOME>etc directory. The object specified by a path does not need to exist before the agentmgr process is started. However, if none of the paths specified in the control.harvest file are open when agentmgr is started, the entire Path class is disabled and no Path Meters statistics are gathered.

You can specify a star (*) name for path_name. However, the name expansion only occurs within a given directory.
It does not apply to any subdirectories. If only a directory is specified (no file names), agentmgr attempts to collect path statistics for all the files in the directory.

The control.harvest file delivered with the software is shown in Figure 4-5. It is a sample file and must be modified for your system. Note that all the lines are commented out (each line is preceded by a number sign [ # ]), so by default no paths are metered and the Path Meters class does not appear in EA/V. File metering does incur some overhead, so be judicious in deciding which files to monitor.

# harvest file. specify only path_name entries
# path_name:%es#enet*
# path_name:%es#tcp*
# path_name:%es#d02>sightline>log

Figure 4-5. Default control.harvest File

Registry.csv Metric Description File

The datamgr process generates a metric description file when it connects to agentmgr and again whenever the metric names change. The file is called registry.csv and is normally stored in the FRTLHOME>data>Local directory.

The registry.csv file is intended to help you customize the software and to assist technical support. It lists the internal name of each metric (registry name) and specifies how the metric appears in EA/V. You can read the file with a text editor or with a spreadsheet program such as Microsoft® Excel. Its primary purpose is to help users customize the protomgr.conf and threshd.conf files, but it can also help with debugging.
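If you want to search registry.csv programmatically rather than in a spreadsheet, a short Python sketch such as the following can load it. It assumes the file is ordinary comma-separated text whose columns appear in the order given in the column list in this section. The guide does not say whether a header row is present, so the sketch skips any row that repeats the column names. The names COLUMNS, load_registry, and pc_names_for are hypothetical.

```python
import csv
import io

# Column order as listed in this section of the guide (assumed file order).
COLUMNS = ["Group Name", "Metric Name", "Event Flag", "Position",
           "Type", "Form", "Scale", "Registry Path"]


def load_registry(text):
    """Parse registry.csv text into a list of dicts keyed by column name."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields or fields[0] == COLUMNS[0]:
            continue  # skip blank lines and a possible header row
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def pc_names_for(rows, registry_path):
    """List every (Group Name, Metric Name) under which a registry path appears."""
    return [(r["Group Name"], r["Metric Name"])
            for r in rows if r["Registry Path"] == registry_path]
```

To use it, read the file's contents (for example, with open("registry.csv", newline="").read()) and pass the text to load_registry. Because a metric with several PC definitions is listed once per definition, pc_names_for returns every group and PC name under which one registry path is presented, which is the information you need when overriding a presentation in protomgr.conf.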
The following columns are generated for each metric:

Group Name      The group name in EA/V, or the event class for EventList metrics
Metric Name     The metric name in EA/V, or the column name for EventList metrics
Event Flag      0 for conventional metrics, 1 for EventList metrics
Position        The position of the metric in the group or in the event class
Type            The metric type (for example, FLOAT64)
Form            The metric form (for example, PCTDUR)
Scale           The scale factor (for example, 1024)
Registry Path   The dot-separated registry path name for the metric

Type, Form, and Scale are the values from the registry; the equivalent values from the metric data are not displayed. These registry values indicate the default presentation of the metrics in EA/V on the PC, and they are the settings that can be overridden by customizing the protomgr.conf file, for example, to change a metric from KBytes to MBytes, or from a rate to an absolute value. If a metric has multiple PC definitions, it is listed multiple times (for example, Pid, which appears in all the event classes).

The file is updated whenever datamgr writes a registry block to the HTF file. This should occur every time datamgr starts and whenever the registry changes. If datamgr is running in maximum debug mode (-ddd), it keeps a history of the registry changes by renaming the existing registry.csv file before rewriting it. After three registry changes, you would have registry0.csv, registry1.csv, registry2.csv, and the latest values in registry.csv.

agentmgr Command Line Options

To view all the available command line options at any time, issue the following command:

agentmgr -h

The -h option lists the options available with this executable. The output is shown below.
Usage: agentmgr [-dfhuv] [-n name] [-p port] [-z level]
  -d        show more messages in log file (may be repeated)
  -f        run in foreground
  -h        display this message
  -u        do not compress data (same as -z 0)
  -v        display version information
  -n name   specify alternate conf and log file name
  -p port   specify TCP port to listen on
  -z level  specify compression level (0=off, 9=max)

Chapter 5 Datamgr

The Database Manager program, datamgr, is the process that records performance metrics from the agentmgr in a data file and reads out historical data when requested by SightLine Expert Advisor/Vision (EA/V). It connects to the agentmgr and records blocks of metrics into a circular file that is usually, but not necessarily, stored locally. This file is called the Host Trace File, or HTF.

The datamgr program can also connect to multiple agentmgrs and store a separate HTF for each managed node. This feature is discussed later in this chapter, in Centralized Database Management.

The datamgr program can also act as a data server to other FORTEL client processes. For example, datamgr reads out historical data when it is requested by EA/V. EA/V can request two types of data: real-time data and historical data. EA/V connects to datamgr to get historical data and to agentmgr to get real-time data. When historical data is requested, the parent datamgr spawns a child datamgr process. The parent then transmits all requested data to EA/V and terminates, while the child takes over the database management responsibilities. Usually, EA/V asks to switch to live data once all historical data has been downloaded; this transfer of connections from datamgr to agentmgr happens seamlessly.

The capacity to store agentmgr output locally provides a temporary store of data that allows flexible strategies for downloading data to EA/V on the PC, as well as recovery of data that was collected on the host but not transferred to EA/V.
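The hand-off from historical to live data described above can be pictured as a two-phase stream. The following Python sketch is purely illustrative: it models each sample as a hypothetical (timestamp, payload) pair and assumes that samples already delivered from history are skipped when live data begins, which is one way to make the switch appear seamless. The guide does not describe how SightLine actually avoids gaps or duplicates, and the function name history_then_live is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A sample is modeled as (timestamp in seconds, payload); purely hypothetical.
Sample = Tuple[int, str]


def history_then_live(history: Iterable[Sample],
                      live: Iterable[Sample]) -> Iterator[Sample]:
    """Yield recorded samples first, then live samples newer than the last one."""
    last_ts: Optional[int] = None
    for ts, payload in history:  # phase 1: recorded data (datamgr)
        last_ts = ts
        yield ts, payload
    for ts, payload in live:     # phase 2: real-time data (agentmgr)
        if last_ts is not None and ts <= last_ts:
            continue  # already delivered from history; skip the overlap
        yield ts, payload
```

A generator keeps the two phases sequential, so the consumer sees one continuous stream and never needs to know which source a sample came from.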
The datamgr program is designed primarily to save data for recovery purposes. By design, the performance data should be archived on the PC client by EA/V; the data stored locally on the managed node acts as a safeguard against any interruption in data transfer to the client. With this in mind, the amount of data you store should reflect the stability of your environment. A very stable network may allow you to store just a single day of data on the host, whereas other environments might store several days of data to guard against extended network downtime.

This chapter describes the capabilities of datamgr and the configuration steps required to achieve various operational objectives. After completing this chapter, you should understand the tasks performed by datamgr and be able to use the configuration file syntax to program those tasks in your own environment.

Configuration

The operation of datamgr is controlled through its configuration file, datamgr.conf, in the FRTLHOME>etc directory. The datamgr.conf file is brief and fairly simple: all the work in the sample file is performed in the last line, shown below.

# Host Name       Port Number  DB Name  Interval  Expire
#
DB system.us.com 8700 Local 30 36h { VOS System }, { General }, { CPU Extra }, { DiskQueue }, { PathQueue }, { Process Info }, { Workloads };

Figure 5-1. Default Line from datamgr.conf File

A line is required for each system for which datamgr is to store data; the datamgr program can store data for more than one system. By default, each datamgr.conf file lists the local system. A centralized database management scheme can also be employed, in which datamgr collects and maintains data from multiple agentmgrs. (See Figure 5-3.)

There are seven essential entries in an operative line of the configuration file:

1. The declaration “DB.”
2. The Agent name: We recommend that you use the system name plus the full domain for this entry. Whatever string is used here should resolve to the IP address of the machine.
3. The port on which agentmgr is listening: By default, the port number is 8700. If agentmgr is configured to listen on an alternate port, this entry must be changed to match.
4. The directory name where host trace files will be stored: This directory is located under the FRTLHOME>data directory. Use a unique name for each directory when configuring a centralized database manager. For example, when storing data for two systems in addition to the local machine, there would be three subdirectories in the data directory: Local for the local data, plus one directory for each of the other two systems. We recommend naming each subdirectory after the system whose performance data it holds.
5. The sample interval at which datamgr archives the data: By default, the sample interval is 30 seconds. The collection interval defined in datamgr.conf must agree with the collection interval defined in the protomgr.conf file. A future release of the host software will permit multiple intervals. Changing this interval affects the amount of space needed to store the data. The degree of impact depends on the compression rate of the data, but the relationship is essentially linear: more samples per hour (a shorter interval) require more storage and processing resources.
6. The amount of data to store (in hours (h), days (d), or MBs (m)): By default, the software stores the last 24 hours of data. Note that 3d equals 72h, that is, the last 72 hours from the current time, not the last three full days. To make sure that the last three full days are stored, use 4d or 96h.
7. The list of metric classes: You must specify a list of the metric classes you want to store. This list should also include the General metric class and any computed classes defined in agentmgr.conf. For instance, in Figure 5-1, the metric classes { General } and { VOS System } are stored, as well as the computed classes based on metrics in { VOS System }: { CPU Extra }, { DiskQueue }, { PathQueue }, { Process Info }, and { Workloads }.

By default, datamgr listens on port 8800. If this port is already in use in your environment, edit the slagent.cm command macro under FRTLHOME>bin to start datamgr with an alternate port number. Simply change the value of the datamgr_port variable near the top of the macro to the new port number, as shown below:

&set datamgr_port 8800

[Figure: the EA/V workstation (SightLine Expert Advisor/Vision) communicating with servd (1645), protomgr, agentmgr (8700), and datamgr (8800) on the VOS server; the numbered arrows correspond to steps 1 through 6 below.]

Figure 5-2. SightLine EA/V and the Power Agent Components

The Database Manager communicates with data sources (agentmgrs) and with data clients (protomgrs) via TCP/IP. On startup, it attempts to connect to all configured data sources. If a connection attempt fails, it retries every 60 seconds.

In a normal connection, the typical course of events is to download some historical data and then continue with live data, as depicted in the numbered sequence in Figure 5-2. The default TCP/IP ports are shown in parentheses.

1. SightLine Expert Advisor/Vision (EA/V) sends a connection request that is received by servd, the Service Manager, on a remote VOS host.
2. Servd launches a protomgr process, the Real-Time Agent, to manage the transfer of data between the host and the EA/V client.
3. Having received a request for historical data, protomgr first connects to datamgr to receive the recorded performance metrics.
4. Protomgr transfers the data to EA/V.
5. When all the requested historical data has been received from datamgr, protomgr drops its connection to datamgr and opens one to agentmgr for real-time performance metrics.
6. Protomgr transfers data from the collector to EA/V until the connection is stopped manually.

When EA/V wants to connect to the Database Manager Daemon, the client sends a message requesting data for a particular host to the Service Manager, servd, on the host (step 1 in Figure 5-2). The name is passed from the Host field in the Configure Network Host Session dialog box in EA/V and is translated to an IP address using DNS or the hosts file on the local machine. The datamgr program performs an IP address match to verify that it stores data for the host passed from EA/V. If the IP address match fails, no historical data is sent, and the connection passes directly to agentmgr for a live download (step 5 in Figure 5-2). If this happens, a message appears in the datamgr log stating that the database cannot be found.

If you attempt to connect for historical data but receive only live data, ensure that:

1. Datamgr is running (lu).
2. There is an HTF in FRTLHOME>data>Local.
3. The host name sent from EA/V resolves to the same IP address as the host entry in the DB line of datamgr.conf.

File Structure

The temporary store of data is maintained as a set of directories and files, one set for each host for which datamgr is archiving data. Each set includes a host index file and a host trace file. The host trace file is circular: new data wraps over the oldest data once the configured size is filled or the configured period has elapsed.
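The relationship between the Interval and Expire entries and the circular host trace file can be approximated in Python. This sketch rests on stated assumptions: one fixed-size sample per interval, no compression, and only the time-based Expire forms (h and d); the size-based m form is not modeled. A collections.deque with maxlen stands in for the circular HTF, because appending to a full deque discards the oldest entry. The names expire_to_samples and make_trace_buffer are hypothetical.

```python
from collections import deque

HOURS_PER_UNIT = {"h": 1, "d": 24}  # time-based Expire units only


def expire_to_samples(expire, interval_s):
    """Number of samples retained for an Expire value such as "36h" or "3d"."""
    unit, amount = expire[-1], int(expire[:-1])
    if unit not in HOURS_PER_UNIT:
        raise ValueError(f"not a time-based Expire value: {expire!r}")
    return amount * HOURS_PER_UNIT[unit] * 3600 // interval_s


def make_trace_buffer(expire, interval_s):
    """A bounded deque that, like the HTF, overwrites the oldest data when full."""
    return deque(maxlen=expire_to_samples(expire, interval_s))
```

With the default 30-second interval, expire_to_samples("36h", 30) is 4320 samples and expire_to_samples("3d", 30) is 8640, the same as "72h". This is why 3d keeps the most recent 72 hours rather than three full calendar days.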
Directory Structure

FRTLHOME>data>DB Name
    >DB Name.htf
    >DB Name.idx
    >registry.csv

where:

DB Name        The “DB Name” from the datamgr.conf file
DB Name.htf    Host Trace File (data)
DB Name.idx    Index to the host trace file
registry.csv   Metric description file

Example:

>data>Local
    >Local.htf
    >Local.idx
    >registry.csv
>data>develB
    >devel.htf
    >devel.idx
    >registry.csv

The database directory files are located in a directory named FRTLHOME>data>DB Name, where DB Name is the database name declared in the datamgr configuration file. The database directory contains three files:

DB Name.htf    This file contains the performance data. It is the largest file, and it grows until the defined maximum is reached, after which new data wraps over the oldest data. The maximum can be expressed as an absolute size, as in 25MB, or as a time span, as in 24h (hours).

DB Name.idx    This file contains the index for the DB Name.htf file.

registry.csv   This file is intended to help you customize the software and to assist technical support. It lists the internal name of each metric (registry name) and specifies how the metric appears in EA/V. You can read it with a text editor or with a spreadsheet program such as Microsoft® Excel. If Oracle support is installed, multiple registry.csv files can be generated (depending on the configuration selected); these are stored in the FRTLHOME>data>* directories. The file is documented in Chapter 4 of this guide.

Centralized Database Management

The datamgr and agentmgr do not have to run on the same module, so one datamgr can manage the host trace files for all your systems. This allows multiple host trace files to be stored on a single host, which can be very advantageous if storage resources are scarce. Now that you have seen a simple model of the relationship between EA/V and a single VOS host, refer to the more complex environment shown in Figure 5-3.
Figure 5-3. Centralized Database Management. (Diagram: System A and System B each run servd (1645), agentmgr, protomgr, and threshd; datamgr (8800) runs on System A and holds the databases for both A and B.)

The example in Figure 5-3 shows two systems, A and B, where system A is the central database repository. Notice that datamgr on system B is not running; system A's datamgr records the data for both systems. It has two TCP/IP connections open: the first to the local agentmgr, and the second to system B's agentmgr.

System B no longer needs to run datamgr, although datamgr could be started there at any time should the connection to the central datamgr on system A be lost. It is possible to merge the data from the two separate datamgrs in SightLine so that long-term history files are uninterrupted.

The datamgr program manages the two databases separately. For example, datamgr might be configured to store data from the local agentmgr for 3 days, while keeping 100 MB of performance data for system B.

To configure this environment, edit the datamgr.conf file in the FRTLHOME>etc directory on system A and enter one line for each system. Notice that each line of datamgr.conf specifies the host and port from which to get the data for a trace file directory, the directory where the data is stored, the interval at which the data is collected, and the amount of data to store. Refer to the previous section, Configuration, for more details.
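The two retention styles in this example, a time span (3 days) and a size (100 MB), appear as the Expire values 3d and 100m in the datamgr.conf entries for Figure 5-3; the same two styles appear as 24h and 25MB in the description of the host trace file. The Python sketch below classifies such values. The function name parse_expire and its unit table are hypothetical and cover only the suffixes that appear in this guide; reading "m" as megabytes is an assumption based on the 100m = 100 MB example.

```python
import re
from typing import Tuple

# Units inferred from the examples in this guide: "25MB", "24h", "3d", "100m".
# Reading "m" as megabytes follows the Figure 5-3 example (100m = 100 MB).
# This table is an assumption, not the full set of units datamgr accepts.
SIZE_UNITS_MB = {"mb": 1, "m": 1}
TIME_UNITS_S = {"h": 3600, "d": 86400}


def parse_expire(spec: str) -> Tuple[str, int]:
    """Return ("megabytes", n) or ("seconds", n) for an expiry value."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized expiry value: {spec!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    if unit in SIZE_UNITS_MB:
        return "megabytes", amount * SIZE_UNITS_MB[unit]
    if unit in TIME_UNITS_S:
        return "seconds", amount * TIME_UNITS_S[unit]
    raise ValueError(f"unknown unit {unit!r} in {spec!r}")


if __name__ == "__main__":
    for spec in ("3d", "100m", "25MB", "24h"):
        print(spec, parse_expire(spec))
```

With the 30-second interval used in the Figure 5-3 entries, a 3d expiry spans 259200 / 30 = 8640 samples, whereas a 100m expiry retains however many samples fit in 100 MB.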
The following datamgr.conf entries would configure the environment in Figure 5-3:

    #    Host Name            Port   DB Name   Interval   Expire
    #
    DB   moduleA.domain.com   8700   Local     30         3d     { CPU Extra }, { DiskQueue }, { PathQueue },
                                                                 { VOS System }, { Process Info },
                                                                 { General }, { Workloads};
    DB   moduleB.domain.com   8700   HostBDB   30         100m   { CPU Extra }, { DiskQueue }, { PathQueue },
                                                                 { VOS System }, { Process Info },
                                                                 { General }, { Workloads};

NOTE: In the current release of the host software, the interval for every host entry must equal the collection interval set in the protomgr.conf file. A future release will allow multiple differing intervals.

Now consider the communications model shown in Figure 5-2. The sequence of events is essentially the same for a central database, except that the protomgr process may have to go to two machines for all the data requested: the protomgr process on system B must go to system A for historical data before switching to the local agentmgr.

You can pass options from the Advanced Settings in EA/V's Network Host Sessions dialog box to instruct protomgr to retrieve historical data from an alternate location while connecting to the local agentmgr for real-time data. Rather than adding these settings on each EA/V client, it is far more efficient to configure them on the protomgr command line on the host. Remember that protomgr is launched by the Service Manager, servd, whose sole responsibility is to listen for connection requests and launch protomgr when one is received. The protomgr process accepts several command-line options. Let's look more closely at how servd calls protomgr.
The call that opens protomgr for local data is defined in servd.conf:

    # service   hostname    command
    #
    SERVICE vpdata localhost protomgr -P %p -V %v -K %k -C %c -E %e -c %h -x1;

In a centralized data storage environment, the historical performance data is not kept locally. There are now two places to connect to retrieve data: the local agentmgr for real-time data, and the central repository for historical data. You need an intelligent agent that knows to go to moduleA to retrieve historical data and then connect back to the local agentmgr for real-time data. The protomgr program can be configured through command-line options to do this.

Configure the servd process on the machine that does not store the historical data (moduleB in our example) to access the datamgr process of the central database machine (moduleA). To accomplish this, edit the source.conf file on system B to pass the "-m <central_host_name>" option when calling the protomgr process. This option instructs protomgr to use the datamgr process on system A when downloading the historical data. The parameters contained in source.conf are appended as protomgr options for the vpdata service in servd.conf.

EA/V can declare which protomgr process to use, and with which options, through a data source definition in the source.conf file. By default, EA/V's Discover Data Sources feature displays all connections defined in the source.conf file. The default connection for system data is named system. Any string can be placed in the source field; when EA/V passes a string that matches a data source entry, protomgr is executed with the associated options.
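The lookup just described, matching the source name sent by EA/V and appending that entry's options to the protomgr command, can be modeled as follows. This Python sketch is illustrative only: the functions parse_source_conf and protomgr_argv are hypothetical, the base command stands in for the expanded vpdata line from servd.conf (the port value 5001 is made up), and it assumes one SOURCE statement per line with straight double quotes.

```python
import shlex
from typing import Dict, List, Tuple


def parse_source_conf(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map each source name to (description, protomgr options).

    Assumes one 'SOURCE name "description" [options];' statement per line,
    with straight double quotes, as in the examples in this guide.
    """
    sources: Dict[str, Tuple[str, List[str]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        tokens = shlex.split(line.rstrip(";"))
        if len(tokens) < 3 or tokens[0] != "SOURCE":
            continue  # not a SOURCE statement this sketch understands
        sources[tokens[1]] = (tokens[2], tokens[3:])
    return sources


def protomgr_argv(base: List[str], sources: Dict[str, Tuple[str, List[str]]],
                  requested: str) -> List[str]:
    """Append the requested source's options to the base vpdata command."""
    if requested not in sources:
        raise KeyError(f"no data source named {requested!r}")
    return base + sources[requested][1]


if __name__ == "__main__":
    conf = '''# Source Name   Description   Options
SOURCE system "System";
SOURCE Bdata  "System A"    -m moduleA.domain.com;
'''
    table = parse_source_conf(conf)
    base = ["protomgr", "-P", "5001", "-c", "moduleB.domain.com"]
    print(protomgr_argv(base, table, "Bdata"))
```

Keeping the options on the host side means every EA/V client that asks for the Bdata source gets the same -m redirection without per-client configuration.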
To set up a new data source called Bdata that calls a protomgr preconfigured to transfer system B data from the repository on system A, the source.conf file should look like this:

    # Source Name   Description   Options
    SOURCE system   "System";
    SOURCE Bdata    "System A"    -m systemA.domain.com;

Because system A is the centralized database manager, the source.conf files on any additional nodes whose data is stored on system A should be edited in the same way.

Datamgr Command Line Options

To view all available command-line options at any time, first export the path to the dynamic libraries (see the slagent script for the path options), then issue the following command:

    datamgr -h

The -h option lists the options available with this executable. The output is shown below.

    Usage: datamgr [-dfhv] [-q port] [-n name] [-O output_file_number]

    -d          turn on debugging
    -dd         turn on an additional level of debugging
    -f          run in foreground
    -h          display this message
    -v          display version information
    -q port     specify alternate datamgr port
    -n name     specify alternate name for conf and log files
    -O number   suffix to the datamgr.out file

Chapter 6 Threshd

SightLine's MultiAction Threshold Agent, implemented in the threshd program, provides a robust threshold and reactive-action management service for the SightLine Power Agent software. When user-specified performance thresholds are exceeded, the MultiAction Threshold Agent can be configured to interface automatically with various software components outside the Power Agent software. Three types of reactive actions can be implemented:

• Automatic e-mail
• Send SNMP traps and create an SNMP MIB
• Invoke a batch script

Each of these actions can be triggered by a flexible set of criteria that evaluate any of the agentmgr metrics.
This chapter provides the overall syntax and semantics for using the MultiAction Threshold Agent, along with some simple but useful examples.

The MultiAction Threshold Agent can connect to multiple agentmgrs. All agentmgrs connected to a single threshd share the same thresholds. The agentmgrs and thresholds are all defined in a single configuration file, which is discussed in detail in this chapter.

The MultiAction Threshold Agent lets you define multiple thresholds on the same metric, which permits an escalation of action as the metric's value becomes more critical. For example, a MultiAction Threshold Agent could be configured to delete temporary files from a directory if the space used is over 50% but less than 75%. If this doesn't solve the problem, mail can be sent at 60% used, and an SNMP trap can be sent at 80% used.

Configuration

By default, the configuration file for the MultiAction Threshold Agent, threshd.conf, is located in the FRTLHOME>etc directory. The file is used to configure the condition-action thresholds, alert actions, and agentmgrs to be monitored. Because these parameters must be tailored to individual computer systems, the default file is essentially empty: it consists of commented-out sample lines that provide some guidance and examples to help get you started.

General Structure

The overall structure of the threshd.conf file is:

    MAILHOST domain_name;
    AGENTMANAGERS
        host.fortel.com;
        host.fortel.com:8700;
        host.fortel.com INTERVAL 30 SECONDS;
        host.fortel.com INTERVAL 1 MINUTE;
        host.fortel.com INTERVAL 10 MINUTES VPNAMES "protomgr";
    END AGENTMANAGERS
    SNMPVARS
        snmpvariable_definition;
        .
        .
        snmpvariable_definition;
    END SNMPVARS
    SNMPTRAPS
        snmptrap_definition;
        .
        .
        snmptrap_definition;
    END SNMPTRAPS
    THRESHOLDS
        threshold_definition;
        .
        .
        threshold_definition;
    END THRESHOLDS

Each part is discussed in the following sections.
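Because each block in this layout ends with a matching END keyword and each statement ends with a semicolon, the file can be split into its parts mechanically. The following Python sketch is a hypothetical reader, not part of SightLine: it assumes comment lines begin with #, quoted strings use straight double quotes (with backslash escapes), and MAILHOST appears as a single statement outside the blocks. It does not validate the statements themselves.

```python
import re
from typing import Dict, List

BLOCKS = ("AGENTMANAGERS", "SNMPVARS", "SNMPTRAPS", "THRESHOLDS")


def split_statements(body: str) -> List[str]:
    """Split on ';' except inside double-quoted strings; a backslash escapes the next character."""
    statements: List[str] = []
    current: List[str] = []
    in_quotes = escaped = False
    for ch in body:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            statements.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]


def read_threshd_conf(text: str) -> Dict[str, List[str]]:
    """Return the statements of each threshd.conf section, keyed by section name."""
    # Drop comment lines (assumed to start with '#', like the shipped samples).
    text = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    sections: Dict[str, List[str]] = {}
    for name in BLOCKS:
        match = re.search(rf"\b{name}\b(.*?)\bEND\s+{name}\b", text, flags=re.DOTALL)
        if match:
            sections[name] = split_statements(match.group(1))
            text = text[: match.start()] + text[match.end():]
    # Whatever remains outside the blocks may hold the optional MAILHOST line.
    mailhost = re.search(r"\bMAILHOST\s+([^;\s]+)\s*;", text)
    if mailhost:
        sections["MAILHOST"] = [mailhost.group(1)]
    return sections
```

Splitting on semicolons only outside quotation marks matters because MESSAGE strings in the THRESHOLDS section may themselves contain semicolons.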
Metric Names

By default, the registry.csv file located in FRTLHOME>data>Local on each VOS system should be considered the definitive list of metrics for that server. When specifying metric names for threshd, you may use either the metric names managed by agentmgr or the metric names used by SightLine Expert Advisor/Vision (EA/V) on the PC, which are often simpler.

String Substitution

Various agentmgr variables are available when forming expressions and messages. The following string substitutions can be used to provide information beyond the metric names:

    $ADDRESSLIST     Address list of the threshold
    $COUNT           The number of consecutive times the action has been taken since the expression evaluated to TRUE
    $DURATION        Duration of the threshold
    $EXPRESSION      Description of the expression for the threshold
    $HOSTIPADDRESS   agentmgr IP address
    $HOSTNAME        agentmgr name
    $INSTANCE        Instance of the metric
    $RESULT          Description of the result
    $SEVERITY        Severity of the threshold
    $SLEEP           Sleep of the threshold (output in seconds)
    $TIME            Time of action (local)
    $TIMEGMT         Time of action (GMT)
    $TITLE           Title of the threshold
    $VARNAME         Name of the agentmgr metric (including the instance)
    $VPVARNAME       Name of the EA/V metric (including the instance)

Whenever these keywords are used, they are replaced by their values.

Additional String Features

Any string enclosed in braces ( { } ) is evaluated. For example, the string "{ VOS System.Users Logged On }" is replaced by the number of users logged on at the time of the threshold violation.

You can use the following escape characters:

    \n    replaced with an ASCII 10 (newline)
    \r    replaced with an ASCII 13 (return)
    \t    replaced with an ASCII 9 (tab)
    \"    replaced with a double quotation mark
    \\    replaced with a single \

To put a left brace ( { ) or a dollar sign ( $ ) in your string, use:

    \\{   replaced with {
    \\$   replaced with $

MAILHOST Definition

The MAILHOST subset of threshd.conf is optional.
It is used to define the IP address or network name of the system on your network that acts as a mail server, running the SMTP (Simple Mail Transfer Protocol) service, that threshd should use when sending mail alerts.

Syntax:

    MAILHOST domain_name;

The domain_name specifies the name of the system that will service the mail sent. The default is the host executing threshd.

Example:

    MAILHOST mailserver.fortel.com;

AGENTMANAGERS Definition

The AGENTMANAGERS subset of threshd.conf lists each agentmgr to be monitored by threshd. Every collection agent listed in this section is monitored with the same set of condition-action thresholds. The syntax and parameters for each collector definition are described below:

Syntax:

    hostname[:port] [INTERVAL duration] [VPNAMES "conf_file"]

hostname specifies the host name of the system where the agentmgr resides.

[:port] specifies the port where the agentmgr is running. The default port is 8700.

[INTERVAL duration] specifies the time interval for data being collected. The default is 30 seconds. The duration may be specified in days, hours, minutes, or seconds; if no unit is specified, seconds are assumed.

[VPNAMES "conf_file"] specifies the protomgr configuration file containing any metric redefinitions that may have been configured. The conf_file specified is prefixed with the value of FRTLHOME>etc and suffixed with .conf. The default is protomgr.

Example:

    AGENTMANAGERS
        host.fortel.com;
        host.fortel.com:8700;
        host.fortel.com INTERVAL 30 SECONDS;
        host.fortel.com INTERVAL 1 MINUTE;
        host.fortel.com:8700 INTERVAL 30 VPNAMES "protomgr";
    END AGENTMANAGERS

SNMPVARS Definition

The SNMPVARS subset of threshd.conf is optional. It is used to specify the metrics that will be passed in any SNMP traps.
The syntax and parameters for each metric definition are described below:

Syntax:

    name: OID integer VALUE string [ TYPE type ] [ DESCRIPTION string ];

name specifies the name of the SNMP metric. This name is used in the SNMPTRAPS section to reference a defined SNMP variable. Valid characters include a-z, A-Z, and 0-9.

OID specifies the Object ID of the metric. The metric will have the object identifier 1.3.6.1.4.1.1130.1.xx, where xx is the integer given after OID.

VALUE specifies the string to evaluate to generate the metric's value at the time the trap is generated. For example, a string of $VPVARNAME would generate the string that represents the metric that triggered the trap, while {$VPVARNAME} would generate the value of that metric.

TYPE specifies the output type of the metric. The TYPE can be one of the following:

    INTEGER
    STRING (default)

DESCRIPTION specifies the description of the metric as it will appear in the MIB.

SNMPTRAPS Definition

The SNMPTRAPS subset of the threshd.conf file is optional. This section is used in conjunction with the SENDTRAP..TO action to define a customized SNMP trap. The syntax and parameters for each trap definition are described below:

Syntax:

    name: SPECIFIC integer [VARIABLES var-list] [DESCRIPTION string] [MIBTEXT string];

name specifies the name of the SNMP trap. The name must be specified in the THRESHOLDS section with the SENDTRAP..TO action in order for the defined trap to be used in the send SNMP trap action. Valid characters include a-z, A-Z, and 0-9.

SPECIFIC specifies the specific ID, which should be a unique number for each defined SNMP trap. The value of the SPECIFIC integer must be greater than or equal to 6. Generic traps sent with the SENDTRAP..TO action are assigned the specific ID of 6. SNMP traps defined in the SNMPTRAPS section will have the Enterprise ID 1.3.6.1.4.1.1130.
VARIABLES is followed by a comma-separated list of names that specify the variables defined in the SNMPVARS section to include in the trap.

DESCRIPTION specifies the description of the trap as it will appear in the MIB.

MIBTEXT specifies any additional text to be included in the MIB.

THRESHOLDS Definition

The THRESHOLDS subset of threshd.conf is used to configure the condition-action thresholds. Each threshold defined in this section applies to each agentmgr defined in the AGENTMANAGERS subset. This is the most complex section of the file; read it thoroughly to understand each parameter. Example thresholds are provided in later sections of this chapter. The syntax and parameters needed to define a threshold are described below:

Syntax:

    title: IF expression [FOR duration] THEN action [parms]

title specifies the description of the threshold. Valid characters include a-z, A-Z, 0-9, space, _ (underscore), and - ! @ # $ &.

expression specifies a Boolean expression composed of variable names and operators. This expression must be true for the action to occur. An expression can be as simple or as complex as needed. Expressions are described in detail in the section Expressions.

duration specifies the amount of time the expression must be true before the action is taken. duration may be further specified as:

    nn days
    nn hours
    nn minutes
    nn seconds

If no unit is given, the duration is assumed to be in seconds. If no duration is specified, it is assumed to be zero, which means the action is taken during the first interval in which the expression evaluates as TRUE.

action specifies one of the following:

    SENDMAILTO address_list
    SENDMPEVENTTO address_list
    EXECSHELL command_line
    SENDTRAPTO address_list
    SENDTRAP snmptrap TO address_list

address_list allows mail and/or traps to be sent to multiple addressees. Multiple addresses are separated by commas.
Remember that both string substitution and evaluation are supported within address lists. For SENDMAILTO, the address_list consists of a list of e-mail addresses. For SENDMPEVENTTO, SENDTRAPTO, and SENDTRAP..TO, the address_list consists of domain names or IP addresses.

command_line specifies the full command path to a shell script designed for corrective actions.

snmptrap specifies the name of the SNMP trap as defined in the SNMPTRAPS section.

Each of the actions is described later in the section Actions.

parms can be one or more of the following:

SLEEP duration
duration specifies the amount of time before the action will be reissued. duration may be further specified as:

    nn days
    nn hours
    nn minutes
    nn seconds

If no unit is given, the duration is assumed to be in seconds. If no duration is specified, it is assumed to be zero.

SEVERITY integer
integer specifies the severity. The default severity is 5. The severity must be specified before the $SEVERITY variable is used in the MESSAGE string.

MESSAGE string
string specifies the actual message to be sent with each action. The message is enclosed in double quotation marks and supports string substitutions.

Expressions

The MultiAction Threshold Agent can evaluate very complex expressions for each defined threshold. An expression is composed of variable names, integers, and operators. Each of these parameters is discussed below.

Operators for Complex Expressions

Complex expressions are supported with the following operators:

    AND   logical and
    OR    logical or
    <     less than
    <=    less than or equal
    =     equal
    !=    not equal
    >=    greater than or equal
    >     greater than
    +     addition
    -     subtraction
    *     multiplication
    /     division
    (     left parenthesis
    )     right parenthesis

Actions

The MultiAction Threshold Agent has the ability to take action to alert the user to a specific event or problem.
These actions include sending e-mail, sending an SNMP trap, executing a batch script for corrective actions, and sending a message.

Send E-mail

The keyword SENDMAILTO specifies an action in which e-mail is sent. The MAILHOST specified in the configuration file gives the domain name of the mail server that will service the e-mail sent by threshd.

    IF { Disk % Busy } > 90 FOR 5 MINUTES THEN SENDMAILTO [email protected], [email protected] MESSAGE "Disk $INSTANCE has been busy for more than $DURATION minutes";

In this example, e-mail is sent to [email protected] and [email protected] when a disk is over 90% busy for 5 minutes or more. Note that string substitutions are used to include the instance name of the disk and the duration in the message.

Send an SNMP Trap

The keywords SENDTRAPTO and SENDTRAP..TO specify an action in which an SNMP trap is sent to an IP address. SENDTRAPTO sends a generic SNMP trap, whereas SENDTRAP..TO sends a customized SNMP trap defined in the SNMPTRAPS section.

The SNMP trap message conforms to SNMP standards and provides the following fields:

    SNMP Field         Type      Description
    Trap Message       String    The string provided by the MESSAGE parameter of the threshold
    Value              Integer   The value of the variable specified in the threshold
    Alarm Occurrence   Integer   $COUNT
    IP Address         Integer   $HOSTIPADDRESS
    Unused Field 1     Integer   Constant with value 123
    Unused Field 2     Integer   Constant with value 0
    VarName            String    $VARNAME
    Unused Field 3     String    Constant with value ""
    Time               String    $TIME
    Duration           Integer   $DURATION
    Unused Field 4     Integer   Constant with value 0
    Severity           Integer   The value provided by the SEVERITY parameter of the threshold

Figure 6-1. Default Line from datamgr.conf File

In the table above, only the Trap Message and Severity fields can be set by the user in the SENDTRAPTO and SENDTRAP..TO actions (refer to the example below).
    IF ( { CPU % System } + { CPU % User } ) > 90 FOR 60 seconds THEN SENDTRAPTO 127.0.0.1 SEVERITY 7 MESSAGE "CPU utilization above 90%";

This example sets Severity to 7 and the Trap Message to "CPU utilization above 90%".

In addition, the SNMP trap contains the following identification:

    Enterprise Name        FORTEL
    Enterprise Id          1.3.6.1.4.1.1130
    Event Name             VOS_Host_Alarm
    Generic Trap           6
    Specific Trap Number   2

Execute a Script

The keyword EXECSHELL specifies an action in which a command-line script is executed. In general, the EXECSHELL command can have as many arguments as necessary.

Messages

The MultiAction Threshold Agent can send customized messages to alert users to a specific event. These messages are specified as strings enclosed in double quotation marks. Consecutive strings are concatenated. For example, the following are equivalent:

    "this is a string"
    "this " "is" " a string"

The messages can contain pertinent information about the exact event that triggered the alert. For example, the threshold value, the actual value, the duration, and the severity can be inserted into the message using string substitutions.

Threshd Command Line Options

The user-configurable command-line options for threshd can be displayed by issuing the following command:

    threshd -h

The resulting output is:

    Usage: threshd [-bdfhvM] [-n name]

    -d        show more messages in log file (may be repeated)
    -f        run in foreground
    -h        display this message
    -v        show version information
    -b        swap IP address byte order in SNMP trap header
    -n name   specify alternate name for conf and log files
    -M        generate SNMP MIB

Chapter 7 Servd

The Service Daemon, servd, is the host listening process that detects requests for service from a PC client running SightLine Expert Advisor/Vision (EA/V). It uses a registered port, 1645.
If there is a conflict in your environment and this port is already in use, edit the slagent script under FRTLHOME>bin to start servd with an alternate port number: change the variable named servd_port near the top of the script to the new port number. You must also change the port number that EA/V uses to request the service for data. You can find this setting by right-clicking the connection in Enterprise View, selecting Edit Connection, and then selecting Advanced Settings.

When a service request is received, servd starts the corresponding process. For example, EA/V starting a download requests the service vpdata. Receiving a vpdata request causes servd to execute a protomgr process, which then manages the connection and data transmission between EA/V and the Power Agent.

To establish a connection from EA/V, servd must be running on the managed node. In most cases, there is no need to modify servd. If you need to run any of the processes on ports other than the defaults, you may need to modify some entries in the servd.conf file.

Configuration

The servd process employs several macros to provide arguments to the protomgr command-line options. The default servd.conf file is shown below.

    # Macros:
    # %c SightLine supplied compression version
    # %e SightLine supplied encryption version
    # %p SightLine supplied callback port number
    # %h SightLine supplied collection host
    # %v SightLine supplied callback host
    # %k SightLine supplied key
    #
    # service hostname command
    #
    # The '-x1' option is for temporary backward compatibility; it allows metric
    # name + array subscript to be longer than SightLine/PC can deal with in several
    # cases, but the truncated subscripts are awkward with Oracle data.
    #
    # To truncate subscripts (and thus be able to save environments in SightLine),
    # remove the '-x1' option.
    #
    SERVICE vpdata localhost protomgr -P %p -V %v -K %k -C %c -E %e -c %h;

To set up and manage the connection from agentmgr to EA/V on the client PC, several communication parameters are passed from EA/V to the host communication process, protomgr; these are the macro parameters listed in the default configuration file above. Do not modify these settings except at the request of FORTEL Customer Support or when instructed to do so in the documentation.

Subscript name truncation

SightLine provides a configurable option to truncate names (such as file system names) that guarantees full compatibility with EA/V. This option is configured with the following protomgr command-line options:

    -x1   allows long subscripts
    -x2   truncate long subscripts

These should be set in the servd.conf configuration file. SightLine has a 128-character maximum for metric names.

Servd Command Line Options

The user-configurable command-line options for servd can be displayed by issuing the following command:

    servd -h

The resulting output is:

    Usage: servd [-dfhv] [-p port]

    -d        turn on debugging (d can be repeated)
    -f        run in foreground
    -h        display this message
    -v        display version information
    -p port   specify alternate TCP/UDP port

Chapter 8 Protomgr

The SightLine communications process, protomgr, manages the data transmission from the SightLine Power Agent on the host to SightLine Expert Advisor/Vision (EA/V) on the PC client. Each time an EA/V client sends a vpdata service request to a system, servd starts a protomgr process to manage the download for that client. There is a one-to-one relationship between the number of EA/V PCs downloading data and the number of protomgr processes running on the system.

In addition to managing the data transmission to EA/V on the PC, protomgr defines user-configurable items that can be modified to suit user requirements.
These eight items are:

1. Data source selection: multiple data sources, such as Interface Agents, are defined in source.conf, and these definitions are passed to protomgr.
2. Short ID definition: a one- to three-character string that is prepended to the trace file and metric names inside EA/V when more than one system is connected. It identifies which Power Agent a particular metric comes from.
3. Collection interval definition: the sample interval, in an integer number of seconds. (NOTE: This interval must match the interval configured in the datamgr.conf file.)
4. Network Address Translation (NAT) and firewall support: specify port ranges and a peer IP address to allow EA/V to connect to the SightLine Power Agent through a firewall.
5. Download throttling: limit the rate at which data is sent from protomgr to EA/V.
6. Enable/disable metric selection: select which metric classes to enable or disable for collection. This section controls which metric classes are requested from agentmgr and datamgr.
7. Exclusion section: select metrics to exclude from sending to EA/V. This does not disable collection of the metric by agentmgr or datamgr.
8. Data redefinition: specify redefinitions of metrics to appear in EA/V.

Each item is discussed in the following sections.

Default protomgr.conf

The protomgr.conf file is used to control which metrics are collected, to exclude metrics from appearing in EA/V, and to reconfigure default metric definitions that originate from Interface Agents loaded by agentmgr. This includes all metrics from the System Interface Agent as well as from any additional Interface Agents available. This feature provides backward compatibility for users of previous SightLine versions.

The default protomgr.conf file is quite large and is therefore not listed here. In the discussions below, appropriate portions of the file are shown and referenced.
Consult the actual file for the definitive answer to any configuration questions.

Data Source Selection

Multiple data sources, such as different Interface Agents or different configurations, can be defined in the source.conf file. To discover data sources in EA/V, right-click the connection in the Enterprise View and select Discover Data Sources. EA/V automatically creates new entries for all data sources defined in the source.conf file. The source name defined in the source.conf file is passed to protomgr on the command line with the -S option.

Short ID Definition

SightLine uses a Short ID to identify each system when monitoring more than one host. The Short ID is declared on the protomgr command line with the -s option.

Unless specified otherwise, SightLine automatically truncates the system name to its first three characters for the Short ID. This may be undesirable when two or more systems have the same first three characters in their names. For example, systems named TIGER1 and TIGER2 will both be truncated to TIG. In that case, SightLine increments the last character of the Short ID of the system that is opened second (for example, TIG and TIH). Because it is sometimes difficult to determine which connection was established first, confusion may arise as to which system is actually being monitored: in our example, TIG could refer to TIGER1 or TIGER2 depending on which connection was made first, and this could change each time a connection is established.

To avoid this, a user-defined Short ID can be specified on the command line. The option can be entered in two ways:

1. In EA/V, in the Options field of the Advanced Session Settings dialog box, as shown in the example below (see also the SightLine Expert Advisor/Vision User's Guide).
2. In the source.conf file, on the command line for protomgr.
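The default Short ID rule described above (truncate to three characters, then increment the last character on a clash) can be modeled in a few lines, which makes the order dependence easy to see. This Python sketch is a hypothetical illustration, not SightLine code; in particular, stepping to the next character code is an assumption about how the increment behaves beyond the TIG and TIH example.

```python
from typing import Dict, Iterable


def assign_short_ids(names_in_connect_order: Iterable[str]) -> Dict[str, str]:
    """Model of the default Short ID rule: first three characters of the
    system name, incrementing the last character while the ID is taken."""
    taken = set()
    ids: Dict[str, str] = {}
    for name in names_in_connect_order:
        if not name:
            raise ValueError("system names must be non-empty")
        candidate = name[:3]
        while candidate in taken:
            # e.g. "TIG" -> "TIH"; this sketch simply takes the next character code
            candidate = candidate[:-1] + chr(ord(candidate[-1]) + 1)
        taken.add(candidate)
        ids[name] = candidate
    return ids


if __name__ == "__main__":
    print(assign_short_ids(["TIGER1", "TIGER2"]))  # TIGER1 -> TIG, TIGER2 -> TIH
    print(assign_short_ids(["TIGER2", "TIGER1"]))  # TIGER2 -> TIG, TIGER1 -> TIH
```

Because the result depends only on connection order, fixing the Short ID explicitly with -s, by either of the two methods just listed, removes the ambiguity.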
If the entry is made in the Advanced Session Settings dialog box, it is effective only for sessions launched with that specific Enterprise View. If the entry is made in the source.conf file, it is effective for every session, regardless of which PC client initiates it.

Examples:

1. Advanced Session Settings dialog box:

Figure 8-1. Advanced Session Settings Dialog Box

2. Add an argument like the one circled in the source.conf file:

    # Source Name   Description   Options
    #
    SOURCE system   "System"      -s m9 ;

In both of these examples, the Short ID is configured to be "m9." Other options are discussed in Option Flags later in this chapter.

Collection Interval Definition

The collection interval is the period between data samples requested by the protomgr process. The default interval is 30 seconds. You can change this parameter in the CINTERVAL statement of the protomgr.conf file, which defines the interval as an integer number of seconds. The agentmgr reports values to the protomgr process at this interval, and ultimately to EA/V as well.

To modify this parameter, change the value 30 to the desired integer number of seconds.

Example:

    CINTERVAL 60

In this example, the interval is configured to be 60 seconds. Notice that no semicolon is required after the statement.

NOTE: The collection interval defined in protomgr.conf must agree with the collection interval defined in the datamgr.conf file. A future release of the host software will permit multiple differing intervals.

Network Address Translation (NAT) and Firewall Support

Firewall support is provided by the -F and -f options to protomgr. These options can be used to specify the host name and port numbers that protomgr uses to connect to EA/V.
In a NAT network configuration, the locally defined IP address of a machine may differ from the address used to reach the machine from outside the network. In this case, the -F option can be used to specify the callback IP address of the host to EA/V.

    -F host         specify outbound hostname

In a firewall network configuration, you can control access to a network by limiting the number of open ports. The -f option can be used to specify a range of ports open for use by SightLine.

    -f port,port    specify outbound port range

These options can be specified in the source.conf file, or through EA/V's Edit Connection | Advanced Settings | Options. Refer to the Option Flags section later in this chapter for a complete list of protomgr options.

Download Throttling

The rate at which data is sent from protomgr to EA/V can be limited by using the -T (throttle) option to protomgr and specifying the maximum number of characters per second to be sent. The -T option also limits CPU and disk activity during the download, with a corresponding increase in download time. The -T option can be specified in the source.conf file, or through EA/V's Edit Connection | Advanced Settings | Options.

Enable/Disable Metric Selection

The ENABLE and DISABLE sections in protomgr.conf allow you to control the metrics that are collected by agentmgr. The ENABLE section specifies a list of metric classes to be requested by protomgr from agentmgr and datamgr. The DISABLE section omits classes from the request. In both sections, you can specify subclasses to enable or disable subsections of a metric class. In the example below, the subclasses "Disks" and "IOPs" of "VOS System.IOPs" have been disabled. The internal names can be found in the registry.csv file in the FRTLHOME>data>Local directory.

    #
    # The ENABLE section specifies what data gets requested from the agentmgr.
    #
    # ENABLE { General }, { VOS System }, { Workloads }
    # ENABLE ALL
    #
    ENABLE { VOS System }, { General }, { CPU Extra }, { DiskQueue }
    ENABLE { PathQueue }, { Process Info }, { Workloads }

    #
    # The DISABLE section specifies what data is not requested from the
    # agentmgr. (Evaluated after ENABLE section.)
    #
    DISABLE { VOS System.IOPs.Busses.Controllers.Disks }
    DISABLE { VOS System.IOPs }

Exclusion Section

The EXCLUDE section lists metrics that are not to be sent to EA/V. Complete Data Groups and Event Classes may be excluded, as well as individual metrics. The names used in this section are the metric names as they appear in EA/V. Because Data Groups and Event Classes can be turned on and off, you can control exactly which metrics are reported to and analyzed by EA/V, and so configure the software specifically for your environment.

The syntax for EXCLUDE is as follows:

    #
    # The EXCLUDE section specifies what data is suppressed from being sent to
    # SightLine Expert Advisor.
    #
    # Data groups may be suppressed with:
    #     DATAGROUP "Module CPU Utilization"
    #
    # Individual metrics may be suppressed with:
    #     DATAGROUP "Module CPU Utilization" DATAVARIABLE "CPU % Idle"
    #
    # Likewise, eventscope classes and columns may be suppressed with:
    #     EVENTCLASS "Summary"
    #     EVENTCLASS "Summary" EVENTCOLUMN " %Cpu"
    #
    # The EXCLUDE section takes precedence over all definitions in DATA and
    # EVENT sections. (Evaluated after DISABLE section.)
    #
    # EventScope column names must match exactly as listed in the
    # $FRTLHOME>data>*>registry.csv file(s).
    #
    EXCLUDE
        DATAGROUP "Module Processes"
        DATAGROUP "Module Memory" DATAVARIABLE "Mem Sys Page Faults/Sec"
        EVENTCLASS "Identification"
        EVENTCLASS "I/O" EVENTCOLUMN " Reads"
    end EXCLUDE

Data Redefinition

The DATA and EVENT sections of the protomgr.conf file redefine metrics by renaming, rescaling, or repositioning existing metrics before they are sent to EA/V. DATA section metrics are organized in groups. Array metrics in the DATA section require you to specify the source of the subscript names.

    #
    # GROUP "Module Disk Info"
    #     ARRAY NAME = { VOS System.IOPs.Busses.Controllers.Disks.Name }
    #         VARIABLE { Disk Size MB } POSITION { 4 } =
    #             { VOS System.IOPs.Busses.Controllers.Disks.Size, mbytes, delta }
    #     end ARRAY
    # end GROUP
    #

The DATA section recognizes the tokens GROUP, ARRAY, NAME, VARIABLE, and POSITION. The EVENT section recognizes the tokens CLASS, COLUMN, and POSITION. The left side of each VARIABLE and COLUMN line defines the metric name as it is to appear in EA/V, followed by the position at which it will appear in the EA/V Variable List or EventList.

    #
    # CLASS "Identification"
    #     COLUMN " Pid" POSITION { 1 } = { VOS System.Processes.Pid }
    # end CLASS
    #

The variable (metric) position defines the ordered location within the Group or Event Class. If you omit the POSITION setting, or set the position to zero, the metric is added to the end of the list. It is not necessary to redefine the entire GROUP or CLASS when modifying or adding metrics.

The internal metric names (on the right side of VARIABLE and COLUMN lines) can be found in the file(s) $FRTLHOME>data>*>registry.csv. The internal metric name can be followed by one or more of the following keywords to adjust the way the metric is presented in EA/V. (These keywords are not case sensitive.)

Forms:

    count - Display the value of a cumulative counter. (If the metric indicates a system activity, this value will continually increase.)
    delta - Display the difference in the value of a counter from the preceding sample. (The metric will show 'operations per interval'.)
    rate  - Display the difference in the value of a counter from the preceding sample, scaled by the time since the preceding sample. (The metric will show 'operations per second'.)
    raw   - Display the raw value, as collected by the agentmgr, without modification.

Scaling:

    blocks, gb, gbytes, kb, kbytes, mb, mbytes, milli, noscale (or just a number: 1024, -30, +50)

The DATA section specifies redefinitions of what will appear in EA/V.

    #DATA
    #
    # # Report the change per interval in free memory in megabytes
    #
    # GROUP "Module Memory"
    #     VARIABLE { Mem Free Pages } POSITION { 3 } =
    #         { VOS System.CPU.Free Memory, mbytes, delta }
    # end GROUP
    #
    # # Use 60000 because the base time unit is milliseconds and use
    # # 'count' to get an ever-increasing value
    #
    # GROUP "Module CPU Utilization"
    #     VARIABLE { CPU Idle Minutes } POSITION { 2 } =
    #         { VOS System.CPU.Empty Idle, 60000, count }
    # end GROUP
    #
    # # Computed metrics may be combined with internal metrics, and multiple
    # # arrays may be added to the same group in the DATA section.
    #
    # GROUP "Module Disk Units"
    #     ARRAY NAME = { VOS System.IOPs.Busses.Controllers.Disks.Name }
    #         VARIABLE { Disk Rd % Busy } POSITION { 1 } =
    #             { VOS System.IOPs.Busses.Controllers.Disks.% Read Queue Busy Time }
    #     end ARRAY
    #     ARRAY NAME = { DiskQueue.Name }
    #         VARIABLE { Disk % Busy } POSITION { 1 } =
    #             { DiskQueue.Disk % Busy }
    #     end ARRAY
    # end GROUP
    #
    #end DATA

The EVENT section specifies redefinitions of what will appear in the EventList windows in EA/V. The number and placement of the spaces in each COLUMN name must exactly match the COLUMN names as listed in the file $FRTLHOME>data>Local>registry.csv; duplicate entries may result if the placement of the spaces differs. Within the EVENT section, all column metrics must originate from the same internal class.
Computed metrics may not be combined with internal metrics within Event Classes.

    #EVENT
    #
    # CLASS "Summary"
    #     COLUMN " Reads" POSITION { 4 } = { VOS System.Processes.Reads, 10000 }
    # end CLASS
    #
    #end EVENT

Event Data

The EventList Window section of the SightLine Expert Advisor/Vision User's Guide describes EA/V's EventList window. For VOS systems, the Summary, CPU, Memory, I/O, and Identification Event Classes are delivered by default. Each Event Class displays information about a select group of processes on the system for each interval. Figure 8-2 shows an example of the Summary EventList display.

Figure 8-2. Summary Event Class Data

Passing protomgr Command Options

As mentioned earlier, command options can be passed from EA/V on the PC in the Options field of the Advanced Session Settings dialog box (Figure 8-1). The protomgr command options allow users to configure more complex working environments, such as user-defined Short IDs. See the section Option Flags for a complete list of protomgr options.

Option Flags

The user-configurable command line options for protomgr can be displayed by issuing the following command:

    protomgr -h

The -h option lists the options available with this executable. The output is shown below.
    usage: protomgr [-dhv] -c host -P port -V host -K key [-C version]
           [-E version] [-m host] [-n name] [-p port] [-q port] [-D database]
           [-s shortId] [-x1] [-x2] [-x3] [-x4] [-S source] [-i interval]
           [-e maxEvent] [-F host] [-f port,port] [livewanted={t|f|y|n}]

      -d               turn on debugging
      -h               display this message
      -v               display version information
      -C version       version of SightLine compression to use
      -c host          specify alternate agentmgr host
      -D database      name of datamgr database to read
      -E version       version of SightLine encryption to use
      -e maxEvent      specify maximum limit to EventList in bytes
      -F host          specify outbound hostname
      -f port,port     specify outbound port range
      -K key           specify SightLine key
      -i interval      specify collection interval (bypass config file)
      -m host          specify alternate datamgr host
      -n name          specify alternate name for conf and log files
      -P port          specify SightLine 'answer' port
      -p port          specify alternate agentmgr port
      -q port          specify alternate datamgr port
      -r               allow non-updated variables
      -S source        specify data source (as defined in source.conf)
      -s shortId       specify short ID for host
      -T               throttling algorithm
      -t               throttling algorithm on reading side
      -V host          specify SightLine 'answer' host
      -x1              allow long subscripts
      -x2              truncate long subscripts
      -x3              restart on normal exit
      -x4              terminate without verifying done
      livewanted=      control whether or not protomgr switches to live data
                       or not after sending historical data

Chapter 9 Command Syntax — Quick Reference

The following sections show the help output from each SightLine Power Agent process.
Agentmgr

    Usage: agentmgr [-dfhuv] [-n name] [-p port] [-z level]

      -d          show more messages in log file (may be repeated)
      -f          run in foreground
      -h          display this message
      -u          do not compress data (same as -z 0)
      -v          display version information
      -n name     specify alternate conf and log file name
      -p port     specify TCP port to listen on
      -z level    specify compression level (0=off, 9=max)

Datamgr

    Usage: datamgr [-dfhv] [-q port] [-n name] [-O output_file_number]

      -d          turn on debugging
      -dd         turn on an additional level of debugging
      -f          run in foreground
      -h          display this message
      -v          display version information
      -q port     specify alternate datamgr port
      -n name     specify alternate name for conf and log files
      -O number   suffix to the datamgr.out file

Threshd

    Usage: threshd [-bdfhvM] [-n name]

      -d          show more messages in log file (may be repeated)
      -f          run in foreground
      -h          display this message
      -v          show version information
      -b          swap IP address byte order in SNMP trap header
      -n name     specify alternate name for conf and log files
      -M          generate SNMP MIB

Servd

    Usage: servd [-dfhv] [-p port]

      -d          turn on debugging (d can be repeated)
      -f          run in foreground
      -h          display this message
      -v          display version information
      -p port     specify alternate TCP/UDP port

Protomgr

    usage: protomgr [-dhv] -c host -P port -V host -K key [-C version]
           [-E version] [-m host] [-n name] [-p port] [-q port] [-D database]
           [-s shortId] [-x1] [-x2] [-x3] [-x4] [-S source] [-i interval]
           [-e maxEvent] [-F host] [-f port,port] [livewanted={t|f|y|n}]

      -d               turn on debugging
      -h               display this message
      -v               display version information
      -C version       version of SightLine compression to use
      -c host          specify alternate agentmgr host
      -D database      name of datamgr database to read
      -E version       version of SightLine encryption to use
      -e maxEvent      specify maximum limit to EventList in bytes
      -F host          specify outbound hostname
      -f port,port     specify outbound port range
      -K key           specify SightLine key
      -i interval      specify collection interval (bypass config file)
      -m host          specify alternate datamgr host
      -n name          specify alternate name for conf and log files
      -P port          specify SightLine 'answer' port
      -p port          specify alternate agentmgr port
      -q port          specify alternate datamgr port
      -r               allow non-updated variables
      -S source        specify data source (as defined in source.conf)
      -s shortId       specify short ID for host
      -T               throttling algorithm
      -t               throttling algorithm on reading side
      -V host          specify SightLine 'answer' host
      -x1              allow long subscripts
      -x2              truncate long subscripts
      -x3              restart on normal exit
      -x4              terminate without verifying done
      livewanted=      control whether or not protomgr switches to live data
                       or not after sending historical data

slagent Command Macro

    Usage: slagent action_string [-da] [-dd] [-ds] [-dt]
           { start | stop | restart | status }

      -da         starts agentmgr in diagnostic mode
      -ds         starts servd in diagnostic mode
      -dd         starts datamgr in diagnostic mode
      -dt         starts threshd in diagnostic mode
      start       starts all SightLine Power Agent processes
      stop        stops all SightLine Power Agent processes
      restart     stops and then restarts all SightLine Power Agent processes
      status      reports the status of all SightLine Power Agent processes

Chapter 10 Troubleshooting

This chapter provides help in troubleshooting problems with the SightLine Power Agent software. The table below lists common problems and suggested cures. If these suggestions do not apply to your situation, you may gain more insight by inspecting the various log files. The following procedures help determine the cause of an error.
If you start the SightLine Expert Advisor/Vision (EA/V) client software and cannot connect to the host, inspect the protomgr log file:

    FRTLHOME>log>protomgr.log

Depending on the messages in that log, inspect the datamgr and agentmgr log files:

    FRTLHOME>log>datamgr.log
    FRTLHOME>log>agentmgr.log

Further debug messages can be obtained by running the software components in debug mode. To restart agentmgr and datamgr in debug mode, issue the following command:

    slagent -da -dd start

The first parameter invokes agentmgr in debug mode; the second invokes datamgr in debug mode. The easiest way to start protomgr in debug mode is to set the option in EA/V, under Advanced Session Settings Options. Contact SightLine Technical Support for additional help.

Symptom: Agentmgr starts, then stops a few minutes later
Possible cures:
• Check the agentmgr.log file; the key may have expired or may be invalid.
• The agentmgr socket may not have timed out; run netstat -na and see whether port 8700 is in use.

Symptom: No data files are being created
Possible cures:
• Make sure datamgr is running.
• Check the datamgr.log file.
• Restart datamgr.

Symptom: "Connection refused by host" in EA/V log file
Possible cures:
• Ensure servd is running on the host.
• Check the servd.log file.
• Restart servd.

Symptom: "Remote closed session (received FD_CLOSE)" in EA/V log file
Possible cures:
• Check the protomgr.log file.

Symptom: EA/V PC download stops at "Host is Version…"
Possible cures:
• Agentmgr is not running, or the HostName used in EA/V on the PC does not match the one in datamgr.conf.
• Attempt to reconnect.
• The download was started before the software was initialized; restart the download.
• Check the log files for errors or warnings.
• Restart all processes.

Symptom: No Event data in EventList window
Possible cures:
• No .vev file on the PC.

Symptom: Not all processes starting
Possible cures:
• Make sure the .pm files exist in FRTLHOME>bin.
• Check the slagent.cm command macro in FRTLHOME>bin to ensure that frtldbm is set to 1.

Symptom: Log files contain the message "unable to bind address to socket..."
Possible cures:
• Some ports that the software is trying to use are already occupied. Run the command netstat -na and see whether the port is already in use. If the socket is in use by another application, the SightLine processes can be assigned to another port. See the chapter for the affected process for a description of how to change its port.

Symptom: Error message reads "No data available for # second interval" in datamgr.conf or EA/V
Possible cures:
• The collection intervals in datamgr.conf and protomgr.conf do not match. Edit the protomgr.conf file to reflect the appropriate interval.

Symptom: New Workloads are not showing up in EA/V on the PC
Possible cures:
• Check the agentmgr.log file for syntax errors in agentmgr.conf.
• Make sure you have stopped and restarted the software so that it recognizes the changes.
• Make sure you have reinitialized the trace file on the PC in Create mode to force down the new symbol table.

Appendix A Analyze_system Interface

The SightLine Power Agent for VOS Systems includes an interface to analyze_system that is installed by default with the Power Agent software. This interface is started automatically when the agentmgr process is started. To configure the analyze_system interface, update the analyze_system.conf file in the FRTLHOME>etc directory. If you do not want this interface, edit FRTLHOME>bin>slagent.cm and set frtlasi to 0.

Configuration

The analyze_system.conf file must be updated if you want to use the analyze_system interface. The file contains many comments to help you with your configuration. The default analyze_system.conf file is shown here.
    # ident "@(#)$Id: analyze_system.conf,v 1.1.2.1 2001/07/04 11:58:45 dsimmond Exp $
    #
    # PROGRAM is used to specify the program to run
    # Full path name needed if cannot be found via start_up.cm
    # of the user who started agentmgr
    PROGRAM "analyze_system"

    # PROMPT is the string that is sent back when the program is ready
    # for more input
    PROMPT "as: "

    # When outputting to a pipe file, analyze_system does not flush
    # the as: prompt until it has some more output. To overcome this, dummy
    # requests are sent. If this is fixed or you are using a different program
    # which DOES flush the prompt, you can switch on NOFLUSH
    #NOFLUSH

    # ERRMATCH is used to check for errors. If the output from a request matches
    # one of the ERRMATCH lines, the request has failed and will not be attempted
    # at any further intervals.
    ERRMATCH "Entry point name not found."
    ERRMATCH "No program is currently loaded."

    # There are two types of requests possible.
    # Ones for array classes, and ones for scalar classes.
    # The format of an array class is as follows:
    # CLASS { Class Name } QUERY "query string" [MODULE "module name"]
    #     [MATCH "match string"] [NOTMATCH "notmatch string" ... ]
    #     [STARTLINE { start line }] [ENDLINE { end line }]
    #     ARRAY NAME COLUMN { array name column number }
    #         VARIABLE variable_type { variable name } [GROUPNAME { group name }]
    #             [PCNAME { pc name }] [POSITION { position }]
    #             [INPFORM input_form] [INPSCALE input_scale] [OUTFORM output_form]
    #             [OUTSCALE output_scale] [EVENT]
    #             COLUMN { column number };
    #         ...
    #     end ARRAY
    # end CLASS
    #
    # Anything in square braces is optional.
    # ... indicates more than one of the preceding is possible.
    # The idea is that "query string" is sent to analyze_system and the output
    # read back. Before that any "module name" or "match string"'s are sent.
    # Any lines matching any "notmatch string"'s are ignored.
    # If { start line } is specified, any lines before this number are skipped.
    #     e.g. if { start line } is { 2 }, the first line is skipped.
    # If { end line } is specified and positive, it is the last line processed.
    # If { end line } is negative, it is taken as the number of lines to skip
    # at the end.
    #     e.g. STARTLINE { 2 } ENDLINE { 4 } will process lines 2, 3 and 4.
    #     e.g. STARTLINE { 3 } ENDLINE { -1 } will skip the first two lines
    #          and the last one.
    #
    # Each line that is read in is broken into columns.
    # Separators are space, tab and slash.
    # { array name column number } specifies which column is the name of the
    # array element; for example, if you were listing disks, this would be the
    # disk name.
    # variable_type can be any one of: bool, float, float64, int, int32, int64,
    # string, str8, uint, u_int, uint32, u_int32, uint64, u_int64.
    # { variable name } is the internal name of the variable.
    # { group name } is the group that the variable will appear under on the PC.
    # { pc name } is the name of the variable on the PC.
    # { position } is for ordering the variables on the PC.
    # input_form and input_scale are the form and scale of the variable as it
    # appears in the line read from analyze_system.
    # output_form and output_scale are the form and scale of the variable as it
    # should appear on the PC.
    # Form can be any one of: count, zcount, delta, rate, raw, pctdur.
    # Scale can be any one of: base, byte, blocks, pblocks, gb, gbytes,
    # kb, kbytes, mb, mbytes, kilo, mega, giga, milli, second, seconds, minute,
    # minutes, hour, hours, day, days, noscale.
    # Specify EVENT if the variable is to appear in the event scope.
    # { column number } specifies which column the variable appears in.

    # Here is an example array class.
    #CLASS { Channels } QUERY "dump_channels -meter" MATCH "term"
    #    NOTMATCH "not asynchronous."
    #    STARTLINE { 2 } ENDLINE { -1 }
    #    ARRAY NAME COLUMN { 1 }
    #        VARIABLE FLOAT { Ochars } GROUPNAME { Module Channels }
    #            PCNAME { Ochars } POSITION { 2 } INPFORM count OUTFORM delta
    #            COLUMN { 3 };
    #        VARIABLE FLOAT { Ichars } GROUPNAME { Module Channels }
    #            PCNAME { Ichars } POSITION { 1 } INPFORM count OUTFORM delta
    #            COLUMN { 2 };
    #    end ARRAY
    #end CLASS

    # The format of a scalar class is as follows:
    # CLASS { Class Name } QUERY "query string" [MODULE "module name"]
    #     [MATCH "match string"] [NOTMATCH "notmatch string" ... ]
    #     VARIABLE variable_type { variable name } [GROUPNAME { group name }]
    #         [PCNAME { pc name }] [POSITION { position }]
    #         [INPFORM input_form] [INPSCALE input_scale] [OUTFORM output_form]
    #         [OUTSCALE output_scale] [EVENT]
    #         LINE { line number } COLUMN { column number };
    #     ...
    # end CLASS
    #
    # Anything in square braces is optional.
    # ... indicates more than one of the preceding is possible.
    # The idea is that "query string" is sent to analyze_system and the output
    # read back. Before that any "module name" or "match string"'s are sent.
    # Any lines matching any "notmatch string"'s are ignored.
    # Each line that is read in is broken into columns.
    # Separators are space, tab and slash.
    # variable_type can be any one of: bool, float, float64, int, int32, int64,
    # string, str8, uint, u_int, uint32, u_int32, uint64, u_int64.
    # { variable name } is the internal name of the variable.
    # { group name } is the group that the variable will appear under on the PC.
    # { pc name } is the name of the variable on the PC.
    # { position } is for ordering the variables on the PC.
    # input_form and input_scale are the form and scale of the variable as it
    # appears in the line read from analyze_system.
    # output_form and output_scale are the form and scale of the variable as it
    # should appear on the PC.
    # Form can be any one of: count, zcount, delta, rate, raw, pctdur.
    # Scale can be any one of: base, byte, blocks, pblocks, gb, gbytes,
    # kb, kbytes, mb, mbytes, kilo, mega, giga, milli, second, seconds, minute,
    # minutes, hour, hours, day, days, noscale.
    # Specify EVENT if the variable is to appear in the event scope.
    # { line number } specifies which line the variable appears in.
    # { column number } specifies which column the variable appears in.

    # Here is an example scalar class.
    #CLASS { Cache Meters } QUERY "cache_meters" MODULE "%es#m18"
    #    VARIABLE FLOAT { File Hits } GROUPNAME { Module Cache }
    #        PCNAME { File Hits/Sec } POSITION { 1 } INPFORM count OUTFORM rate
    #        LINE { 3 } COLUMN { 3 };
    #    VARIABLE FLOAT { Directory Hits } GROUPNAME { Module Cache }
    #        PCNAME { Directory Hits/Sec } POSITION { 2 } INPFORM count OUTFORM rate
    #        LINE { 7 } COLUMN { 3 };
    #    VARIABLE FLOAT { Directory Misses } GROUPNAME { Module Cache }
    #        PCNAME { Directory Misses/Sec } POSITION { 3 } INPFORM count OUTFORM rate
    #        LINE { 7 } COLUMN { 5 };
    #end CLASS

    CLASS { IOP Meters } QUERY "use_iop 10 -file (master_disk)>system>prom_code>K6000fw18.0rom;dump_iop_meters"
        VARIABLE FLOAT { CmdsSent2 } GROUPNAME { Module AS }
            PCNAME { CmdsSentRate } POSITION { 2 } INPFORM count OUTFORM rate
            LINE { 5 } COLUMN { 2 };
        VARIABLE FLOAT { IdleSecsDel } GROUPNAME { Module AS }
            PCNAME { IdleSecsDel } POSITION { 2 } INPFORM count OUTFORM delta
            LINE { 6 } COLUMN { 2 };
    end CLASS

    CLASS { Cache Meters } QUERY "use_module;cache_meters"
        VARIABLE FLOAT { File Hits } GROUPNAME { Module ASCache }
            PCNAME { File Hits/Sec } POSITION { 1 } INPFORM count OUTFORM rate
            LINE { 6 } COLUMN { 3 };
        VARIABLE FLOAT { Directory Hits } GROUPNAME { Module ASCache }
            PCNAME { Directory Hits/Sec } POSITION { 2 } INPFORM count OUTFORM rate
            LINE { 10 } COLUMN { 3 };
        VARIABLE FLOAT { Directory Misses } GROUPNAME { Module ASCache }
            PCNAME { Directory Misses/Sec } POSITION { 3 } INPFORM count OUTFORM rate
            LINE { 10 } COLUMN { 5 };
    end CLASS
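The line-selection and column rules described in the comments of analyze_system.conf can be modeled in a few lines. The Python sketch below is illustrative only: it is not how agentmgr is implemented, and the function names are hypothetical. It assumes that a run of consecutive separators counts as one, and it follows the ENDLINE examples in the file (a positive value is the last line processed; a negative value is the number of trailing lines skipped). It also shows the arithmetic implied by INPFORM count OUTFORM rate.

```python
import re

# analyze_system.conf lists space, tab and slash as column separators.
# Assumption: a run of separators counts as one, so no empty columns appear.
_SEPARATORS = re.compile(r"[ \t/]+")


def select_lines(lines, start_line=None, end_line=None):
    """Apply STARTLINE/ENDLINE to captured output lines (1-based numbering).

    Follows the conf file examples: a positive end_line is the last line
    processed; a negative end_line is the number of lines skipped at the end.
    """
    first = (start_line or 1) - 1          # index of the first kept line
    if end_line is None:
        last = len(lines)                  # no ENDLINE: keep to the end
    elif end_line < 0:
        last = len(lines) + end_line       # drop abs(end_line) trailing lines
    else:
        last = end_line                    # 1-based inclusive == 0-based exclusive
    return lines[first:last]


def split_columns(line):
    """Break one line into columns; COLUMN { 1 } is element 0 of the result."""
    return [col for col in _SEPARATORS.split(line) if col]


def count_to_rate(previous, current, elapsed_seconds):
    """INPFORM count -> OUTFORM rate: counter increase per second."""
    return (current - previous) / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical output with a header line and a trailing total line;
    # real dump_channels -meter output will look different.
    output = [
        "Channel Ichars Ochars",
        "term1 1200 5400",
        "term2 300 900",
        "Total 1500 6300",
    ]
    # Equivalent of STARTLINE { 2 } ENDLINE { -1 } with ARRAY NAME COLUMN { 1 }.
    for line in select_lines(output, 2, -1):
        cols = split_columns(line)
        print(cols[0], "Ichars:", cols[1], "Ochars:", cols[2])
```

Counting lines and columns this way on a saved copy of the real command output is a quick way to check LINE, COLUMN, STARTLINE, and ENDLINE values before editing analyze_system.conf and restarting agentmgr.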
SightLine Power Agent for VOS Systems
Analysis
version 6.1

Contents

Introduction
Sample Environment: VOS.VEN
    Main Page
        CPU Utilization Plot
        Memory Pages in Use and Free Plot
        I/O Rate Plot
        CPU Usage by Workload Plot
        Process States TopList
    CPU Page
        Avg CPU Response Plot
        Queue Meter Seconds Plot
        CPU Completions and Interrupts Plot
        CPU % Busy and % Wait Plot
        CPU Usage by Workload Plot
        Active CPUs Plot
    Memory Page
        Total Memory and Free Pages Plot
        Paging File Usage Plot
        Page Faults by Type Plot
        Cache Hit Rate Plot
        Wired/Unwired Pages Plot
        Workload Memory Usage TopList
    I/O Page
        Disk I/O Plot
        Disk % Busy Plot
        Disk I/O TopList
        File I/O Plot
        Disk Free Space (MB) Plot
        Wkld Disk I/O TopList
        Member Count Plot
        % Free Space Plot
        Avg Q Length by Disk Plot
    Cache Page
        Cache Activity and Hit Rate Plot
        Cache Soils Plot
        Workload Cache Usage TopList
    Workloads Page
        CPU Usage by Workload Plot
        Task Count by Workload Plot
        Workloads TopList
        Wkld Memory Usage Plot
        Reads by Workload Plot
        Writes by Workload Plot
Sample AutoAlert System: VOS.VTH
    Memory % Free
    Page File % Free
    Cache Read Hit %
    Page Fault Rate
    CPU % Other
    CPU % Interrupts
    Total CPU Utilization
    CPU Wait-Busy Ratio
AutoAnalyze Rules and Reports
    Disk-Busy
    CPU-Waiting Exceeds Running
    Disk-Error(s) Detected
    CPU-Busy
    CPU-Too many Interrupts
    CPU-High Scheduler Overhead
    Memory-High Page Fault Rate
    Cache-Low Hit Rate
    Disk-File Sys Free Space Low
    Memory-Pagefile Space Low
    Memory-Low Free Space

Figures

Figure 1. CPU Utilization Plot
Figure 2. Memory Pages in Use and Free Plot
Figure 3. I/O Rate Plot
Figure 4. CPU Usage by Workload Plot
Figure 5. Process States TopList
Figure 6. Avg CPU Response Plot
Figure 7. Queue Meter Seconds Plot
Figure 8. CPU Completions and Interrupts Plot
Figure 9. CPU % Busy and % Wait Plot
Figure 10. CPU Usage by Workload Plot
Figure 11. Active CPUs Plot
Figure 12. Total Memory and Free Pages Plot
Figure 13. Paging File Usage Plot
Figure 14. Page Faults by Type Plot
Figure 15. Cache Hit Rate Plot
Figure 16. Wired/Unwired Pages Plot
Figure 17. Workload Memory Usage TopList
Figure 18. Disk I/O Plot
Figure 19. Disk % Busy Plot
Figure 20. Disk I/O TopList
Figure 21. File I/O Plot
Figure 22. Disk Free Space (MB) Plot
Figure 23. Wkld Disk I/O TopList
Figure 24. Disk % Busy Plot
Figure 25. % Free Space Plot
Figure 26. Avg Q Length by Disk Plot
Figure 27. Cache Activity and Hit Rate Plot
Figure 28. Cache Soils Plot
Figure 29. Wkld Cache Usage TopList
Figure 30. CPU Usage by Workload Plot
Figure 31. Task Count by Workload Plot
Figure 32. Task Count by Workload Plot
Figure 33. Wkld Memory Usage Plot
Figure 34. Reads by Workload Plot
Figure 35. Writes by Workload Plot

Analysis

This section of the SightLine Power Agent for VOS Systems User's Guide describes the sample files provided on your installation media. By using these examples to "jump start" your SightLine sessions, you'll spend less time learning how SightLine works and more time learning how SightLine works for you.

Introduction

The sample file collection on your distribution package includes several types of SightLine Expert Advisor/Vision (EA/V) files that you can use on your system:

• Environments (file extension .VEN): a set of plots and TopLists on one or many pages.
• Threshold Systems (file extension .VTH): a set of conditions that, when violated, cause EA/V to display a message and, optionally, take other action.
• AutoAnalyze rules and reports: a set of rules, implemented using expressions (file extension .VEX), and reports (Word .RTF files with linked EA/V plots (.VPL)) that summarize the activity and performance of your VOS system.

Each of these is described in the remaining sections of this chapter.

Sample Environment: VOS.VEN

The sample environment provided with the EA/V installation kit, VOS.VEN, gives you a good starting point for performance analysis. Use it as delivered, or modify it to build a library of analytical views that fits the specific needs of your own computing environment.

By default, a copy of this environment is loaded for each live connection. To load an environment manually, click File | Open | Environment to bring up the Open Environment dialog box, then select the filename of the environment you want to load. At the bottom of the dialog box, choose the system to which you want the environment to apply (Force into Trace File System) and the environment load options.

The following sections describe the sample environments shipped with your EA/V installation kit. The descriptions follow this format:

Page Name: The name of the page and a brief description of its intended purpose and the plots it contains.

Plot Title: If there are multiple plots on the page, each has its own descriptive paragraph. This is usually the text from the plot title bar and a brief description of the issue that the plot addresses, the metrics it contains, how it is formatted, and how to use it to interpret your system's condition.

The VOS.VEN environment uses plots to give you a high-level overview of the system resources CPU, Memory, I/O, and Workload. It uses EA/V's drill-down feature to help you quickly pinpoint system problems. It consists of six pages, each with multiple plots and TopLists. Use it as a launch point for analysis or as a general-purpose screen to keep open at all times. You can customize it to suit your site's needs.
The drill-down links exist in each plot on the Main Page, and on selected plots on the CPU and Memory pages, as described in the diagram below. To activate a drill-down link, double-click the active plot when the cursor looks like a small hand. If you double-click anywhere else, you return to the Main page of the environment.

[Diagram: drill-down links between the Main, CPU, I/O, Workloads, Memory, and Cache pages]

Main Page

The Main Page of VOS.VEN contains four plots and a minimized TopList window. This is the top-level page of the VOS environment. Each plot contains drill-down links to pages detailing that plot's contents. Double-click a plot when the cursor looks like a small hand, and EA/V presents the page with the detailed display you need to identify potential problems before they seriously affect your system. For more information on EA/V's drill-down feature, see the topic Fixed Section, Page, in the SightLine Expert Advisor/Vision User's Guide.

CPU Utilization Plot

The CPU Utilization plot (Figure 1) describes how much of your total CPU resource is being used. The CPU resources are broken down into the components displayed by the VOS command display_system_usage -long.

Figure 1. CPU Utilization Plot

CPU % System tells you what percentage of the CPU resource all the system processes are using.

CPU % User tells you what percentage of the CPU resource all the user processes are using.

CPU % Server tells you what percentage of the CPU resource all the server processes are using.

CPU % Interrupts tells you what percentage of the CPU resource was used to handle the interrupts on a module.

CPU % User PF tells you what percentage of the CPU resource is consumed by the page faulting activity of all the user processes.

CPU % System PF tells you what percentage of the CPU resource is consumed by the page faulting activity of all the system processes.
Analysis 3 Analysis CPU % Server PF tells you what percentage of the CPU resource the page faulting activity done by all the server processes is consuming. CPU % Other tells you what percentage of the CPU resource was spent in the scheduler. Memory Pages in Use and Free Plot The Memory Pages in Use and Free plot (Figure 2) describes how much of your total memory resource is being utilized. The memory resources are broken up in the components as described below. Figure 2. Memory Pages in Use and Free Plot Mem System-Cache Pages is an expression metric (“Mem System Pages” – “Mem Cache Phys Pages”) that tells you how many pages the VOS operating system is using, not including the pages in use by the Cache Manager. Mem Cache Phys Pages tells you how many pages of memory the cache manager is using. Mem User Pages is an expression metric (“Mem Total Pages” – “Mem Free Pages” –”Mem System Pages”) that tells you how many pages of memory user processes are using. Mem Free Pages tells you how many pages of memory are currently free (unassigned to the OS or to any process). 4 SightLine Power Agent for VOS Systems: Analysis (R464) Analysis I/O Rate Plot The I/O Rate plot (Figure 3) shows the overall I/O activity on the system, broken down into the components described below. Figure 3. I/O Rate Plot Disk User Reads/Sec tells you the number of reads per second that all the user processes were charged with. A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required. Disk User Writes/Sec tells you the number of writes per second that all the user processes were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, it needs to be written out sometime in the future. Disk Sys Reads/Sec tells you the number of reads per second that all the system processes were charged with. 
A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.

Disk Sys Writes/Sec tells you the number of writes per second that all the system processes were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, the block needs to be written out sometime in the future.

Disk Svr Reads/Sec tells you the number of reads per second that all the server processes (StrataLink, StrataNet, and OSL) were charged with. A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.

Disk Svr Writes/Sec tells you the number of writes per second that all the server processes (StrataLink, StrataNet, and OSL) were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, the block needs to be written out sometime in the future.

CPU Usage by Workload Plot

The CPU Usage by Workload plot (Figure 4) shows the overall CPU usage, broken down by the workloads defined in the FRTLHOME>etc>agentmgr.conf file. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 of the Power Agent section of this User’s Guide for a description of defining workloads.) Process and workload metrics are all on a per-CPU basis, so this plot can scale from 0 to 100 * Number of CPUs.

Figure 4. CPU Usage by Workload Plot

Wkld % CPU for a workload tells you what percentage of the total available CPU resource the workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.

Process States TopList

The Process States TopList (Figure 5) shows the total number of active processes, broken down by the components described below.

Figure 5. Process States TopList

Procs Count for WaitShrt tells you how many processes are in a short wait state. A process in the wait state (state = 4, also called short wait) is waiting for some action to occur; typically, it is waiting for some I/O.

Procs Count for Rdy tells you how many processes are in a ready state. A process in the ready state (state = 1) is either executing or waiting to execute in the CPU queue.

Procs Count for Frozen tells you how many processes are in a frozen state. A process is put in a frozen state (state = 2) when a privileged user issues the freeze_process command for that process. The thaw_process command restores a process to the ready or waiting state.

Procs Count for Stopped tells you how many processes are in a stopped state. A process in the stopped state (state = 0) has terminated but is waiting for VOS to destroy it.

CPU Page

The CPU Page provides a detailed look at the activity and performance of the processors. It contains six plots.

Avg CPU Response Plot

The Avg CPU Response plot (Figure 6) reports the average number of milliseconds per visit to the CPU. The total response time (CPU Residence Msec/Completion) and its components (CPU Wait and CPU Busy) are shown.

Figure 6. Avg CPU Response Plot

CPU Residence Msec/Completion is an expression metric (1000 * “CPU Residence Time Secs” / (“CPU Completes/Sec” * “Interval”)) that reports the total number of milliseconds an average visit to the CPU took, including both processing and waiting time.

CPU Busy Msec/Completion is an expression metric (1000 * “CPU Busy Time Secs” / (“CPU Completes/Sec” * “Interval”)) that reports the average number of milliseconds of processing each visit to the CPU took.

CPU Wait Msec/Completion is an expression metric (1000 * “CPU Wait Time Secs” / (“CPU Completes/Sec” * “Interval”)) that reports the average number of milliseconds a process waited for a visit to the CPU.
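The three Avg CPU Response expression metrics share one shape: a total time in seconds divided by the number of CPU visits in the interval, then scaled to milliseconds. The following Python sketch reproduces that arithmetic so you can check a plotted value by hand. The function name is hypothetical. The inputs are assumed to be the raw metric values for a single sample. Returning 0.0 when there were no visits is an assumption, not documented EA/V behavior.

```python
def msec_per_completion(time_secs: float, completes_per_sec: float, interval_secs: float) -> float:
    """Compute 1000 * time_secs / (completes_per_sec * interval_secs).

    time_secs is "CPU Residence Time Secs", "CPU Busy Time Secs", or
    "CPU Wait Time Secs"; completes_per_sec is "CPU Completes/Sec";
    interval_secs is "Interval".
    """
    completions = completes_per_sec * interval_secs  # CPU visits during the interval
    if completions == 0:
        return 0.0  # assumption: report 0 when no process visited the CPU
    return 1000.0 * time_secs / completions


# Hypothetical 60-second sample with 500 completions per second (30,000 visits).
busy_ms = msec_per_completion(45.0, 500.0, 60.0)       # 1.5 ms of processing per visit
wait_ms = msec_per_completion(15.0, 500.0, 60.0)       # 0.5 ms of waiting per visit
residence_ms = msec_per_completion(60.0, 500.0, 60.0)  # 2.0 ms in total per visit
print(busy_ms, wait_ms, residence_ms)
```

The sample inputs were chosen so that residence time equals busy time plus wait time. This mirrors the plot, which presents CPU Wait and CPU Busy as the components of the total.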
Queue Meter Seconds Plot

The Queue Meter Seconds plot (Figure 7) reports the number of seconds during the interval that the CPU was busy and the number of seconds that processes spent waiting for the processor.

Figure 7. Queue Meter Seconds Plot

CPU Wait Time Secs tells you how many seconds all the processes on the module spent waiting just for the CPU. This meter shows the delay that a lack of CPU resources is imposing on the active processes. There is no upper bound for this meter. Within certain limits, you can keep adding processes to an overwhelmed, 100% busy system, and each of them will sit in the CPU queue and push this meter higher.

CPU Busy Time Secs tells you how many seconds all the processes on the module were executing on a CPU. The largest possible value is the sample period (CPU Queue Meter_ET) times the number of logical CPUs (CPU Logical CPUs). At that value, the module is 100% busy.

CPU Completions and Interrupts Plot

The CPU Completions and Interrupts plot (Figure 8) reports the rate of CPU completions (visits) and the rate of CPU interrupts.

Figure 8. CPU Completions and Interrupts Plot

CPU Completes/Sec tells you how often a process on this module loses the CPU it was running on. A process can lose a CPU because it terminates, decides to wait for some event, or waits for an I/O, or because some other process with a better priority takes the CPU away.

CPU Interrupts/Sec tells you the number of interrupts per second on the module. Interrupts generally happen when a comm or disk controller has completed some operation that the system requested or is waiting for. There are some time-based interrupts, but for the most part, the number of interrupts is determined by how much work you are pushing through the system. The interrupt rate and the amount of work you are pushing through the module should correlate well.
If you notice a sudden jump in interrupts for a given workload, you may have a problem with a comm line, or some process may be working inefficiently (for example, using no-wait I/O or sending small rather than large packets).

CPU % Busy and % Wait Plot

The CPU % Busy and % Wait plot (Figure 9) reports the percentage of time the CPU was busy processing and the percentage of time that processes were waiting for the CPU.

Figure 9. CPU % Busy and % Wait Plot

CPU % Busy tells you what percentage of the time the CPU(s) on the module are busy. This meter should vary from about 0% for an idle machine to about 100% for a machine that is consuming all of its CPU power. All processes running on this module contribute to this total.

CPU % Wait tells you how much time all the processes spent waiting for the CPU, expressed as a percentage. This value can grow above 100%, because each additional job waiting in the queue adds to the overall wait time.

CPU Usage by Workload Plot

The CPU Usage by Workload plot (Figure 10) shows the overall CPU usage, broken down by the workloads defined in the FRTLHOME>etc>agentmgr.conf file. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a description of defining workloads.) Process and workload metrics are all on a per-CPU basis, so this plot can scale from 0 to 100 * Number of CPUs.

Figure 10. CPU Usage by Workload Plot

Wkld % CPU for a workload tells you what percentage of the total available CPU resource the workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.

Active CPUs Plot

The Active CPUs plot (Figure 11) shows the average number of CPUs active during the interval.

Figure 11. Active CPUs Plot

CPU Logical CPUs tells you the number of logical CPUs the system has.
It tells you how many processes can execute simultaneously. On a Stratus computer, multiple physical CPU chips run in lockstep as one logical CPU to provide hardware fault tolerance, which is why logical and physical CPUs are distinguished.

Memory Page

The Memory Page provides details about the activity and performance of memory. It contains five plots and one TopList.

Total Memory and Free Pages Plot

The Total Memory and Free Pages plot (Figure 12) shows the total number of 4096-byte pages configured on the system and the number of those pages that are free (unused).

Figure 12. Total Memory and Free Pages Plot

Mem Total Pages tells you how many pages of memory are installed on the module.

Mem Free Pages tells you how many pages of memory are currently free (assigned neither to the OS nor to any process). Under normal circumstances, the system never runs out of free pages. When the number of free pages drops to about 2% of main memory, the paging daemon starts creating new free pages by paging old pages out of memory.

Paging File Usage Plot

The Paging File Usage plot (Figure 13) shows the free and used pages in the paging file.

Figure 13. Paging File Usage Plot

Paging Used Pages tells you the total number of pages in use in your paging partition and paging file(s). For every page of memory in a user process, one page (block) of paging partition space is allocated as insurance, so that VOS has somewhere to put the page if memory fills up. This one-to-one allocation continues even when there is plenty of room in memory. Pages are not written to disk unless there is a memory shortage.

Paging Free Pages tells you the total number of pages of free space in your paging partition and paging file(s).
For every page of memory in a user process, one page (block) of paging partition space is allocated as insurance, so that VOS has somewhere to put the page if memory fills up. Running out of paging space is serious, because processes can die.

Page Faults by Type Plot

The Page Faults by Type plot (Figure 14) shows the rate at which page faults were generated, broken down into the components described below.

Figure 14. Page Faults by Type Plot

Mem User Page Faults/Sec tells you the number of page faults the user processes (not VOS system or server processes) were charged for during the sample. Specifically, this meter tells you how many times a process discovered that a page it needs is not in memory. Many types of page faults require no disk I/O and can happen when there is plenty of free memory, so don’t automatically assume that a high page fault rate means the module is low on memory.

Mem Sys Page Faults/Sec tells you the number of page faults the system processes (overseer, rsn, tp_overseer, mail_handler, batch_overseer, cache_manager) were charged for during the sample. Specifically, this metric tells you how many times a system process discovered that a page it needs is not in memory. Many types of page faults require no disk I/O and can happen when there is plenty of free memory, so don’t automatically assume that a high page fault rate means the module is low on memory. Because these processes tend not to page fault at all until there is a real lack of memory, any page faulting here should focus your attention on the memory resource.

Mem Svr Page Faults/Sec tells you the number of page faults the server processes (link_server, network_client, StrataNet, network_server, open_client, open_server, osl_server) were charged for during the sample. Specifically, this metric tells you how many times a server process discovered that a page it needs is not in memory.
Many types of page faults require no disk I/O and can happen when there is plenty of free memory, so don’t automatically assume that a high page fault rate means the module is low on memory. Because these processes tend not to page fault at all until there is a real lack of memory, any page faulting here should focus your attention on the memory resource.

Cache Hit Rate Plot

The Cache Hit Rate plot (Figure 15) shows the percentage of reads that were satisfied by data already in the cache.

Figure 15. Cache Hit Rate Plot

Cache Read Hit % is an expression metric (“Mem Cache Read Hits/Sec” / (“Mem Cache Read Hits/Sec” + “Mem Cache Read Misses/Sec”) * 100). The disk cache is a part of the module’s main memory that is set aside to provide a buffer between the physical disks and the applications. When an application reads a record, it either finds the block(s) containing the record in the cache (a hit) or has to wait for the I/O subsystem to bring them into memory (a miss). This metric reports the percentage of requests that did not require an I/O (hits).

Wired/Unwired Pages Plot

The Wired/Unwired Pages plot (Figure 16) reports the number of 4096-byte pages that are wired, unwired, and free.

Figure 16. Wired/Unwired Pages Plot

Mem Wired Pages tells you how many pages of memory are wired. A wired page is a page that cannot be paged out. Critical sections of VOS (for example, the paging code) are in wired memory. User code cannot wire memory. This count does not include memory used by the cache manager (see “Mem Cache Phys Pages”) or the pageable parts of VOS (see “Mem System Pages”).

Mem Unwired Used Pages is an expression metric (“Mem Total Pages” - “Mem Wired Pages” - “Mem Free Pages”) that tells you how many pages of memory are unwired and in use. (Free pages are also considered unwired, which is why they are subtracted here.) An unwired page is the opposite of a wired page: it can be paged out.
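You can check the page arithmetic on the Main and Memory pages with a few lines of code. The Python sketch below is illustrative only. The function names are hypothetical, and the inputs are assumed to be raw page counts from a single sample. The 2% figure is the approximate paging-daemon trigger mentioned for Mem Free Pages, not an exact VOS constant.

```python
PAGE_SIZE_BYTES = 4096  # page size stated for the Total Memory and Wired/Unwired plots


def pages_to_mb(pages: int) -> float:
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


def mem_user_pages(total: int, free: int, system: int) -> int:
    """Compute "Mem Total Pages" - "Mem Free Pages" - "Mem System Pages"."""
    return total - free - system


def mem_unwired_used_pages(total: int, wired: int, free: int) -> int:
    """Compute "Mem Total Pages" - "Mem Wired Pages" - "Mem Free Pages"."""
    return total - wired - free


def near_paging_threshold(free: int, total: int, fraction: float = 0.02) -> bool:
    """True when free pages are at or below roughly 2% of main memory (approximate)."""
    return free <= fraction * total


# Hypothetical module with 262,144 pages (1 GB) of memory.
total, free, system, wired = 262_144, 5_000, 60_000, 30_000
print(pages_to_mb(total))                          # 1024.0
print(mem_user_pages(total, free, system))         # 197144
print(mem_unwired_used_pages(total, wired, free))  # 227144
print(near_paging_threshold(free, total))          # True: 2% of 262,144 is about 5,243
```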
Workload Memory Usage TopList

The Workload Memory Usage TopList (Figure 17) reports memory statistics for the workloads defined in FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 of the Power Agent section of this User’s Guide for a description of defining workloads.)

Figure 17. Workload Memory Usage TopList

Shared mem tells you the number of shared memory pages in use by the workload. When two or more processes run the same .pm, they share one copy of each code page. This meter also counts the pages of memory explicitly shared when processes use SVM (Shared Virtual Memory).

Unshared mem tells you the number of unshared memory pages in use by the workload. These are typically pages of memory that hold the variables, arrays, and data structures of the program.

NOTE If two processes are running the same .pm, they share the code pages unless the .pm is physically located on another module. When that happens, each process stores a private copy of the entire .pm in the paging area. This slows down startup and wastes a lot of memory and paging space.

Flts/Sec tells you the page faults generated by the workload, expressed as a rate per second. Page faults can be a symptom of insufficient memory or an application memory leak, or they can be caused by an application design in which memory is used for a short period and then returned, causing page faults on each repeated allocation. Typically, if page faulting is widespread among most processes, there is a shortage of memory. If only one process is page faulting, the problem can be fixed with a code change.

I/O Page

The I/O Page provides details about the activity and performance of the I/O subsystem. It contains seven plots and two TopLists.

Disk I/O Plot

The Disk I/O plot (Figure 18) shows the overall I/O activity on the system.

Figure 18. Disk I/O Plot

Disk IOs/Sec tells you the total rate of disk I/Os: the sum of all the process reads and writes plus all the I/Os generated by paging activity. A process is charged for a read when it requests a block of disk that is not in the cache manager. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager. This value is the same as the one reported by the VOS command display_system_usage.

Disk % Busy Plot

NOTE This plot will probably need to be adjusted for your system. If there are more than 32 disks on your module, the metrics will not resolve. The scales may also need to be adjusted.

The Disk % Busy plot (Figure 19) shows the percentage of time during the interval that each disk was busy.

Figure 19. Disk % Busy Plot

Disk % Busy tells you how busy individual disks are. For acceptable performance, disks should not be run higher than 50% busy.

Disk I/O TopList

The Disk I/O TopList (Figure 20) reports various statistics for each disk drive on the system.

Figure 20. Disk I/O TopList

% Busy tells you how busy individual disks are. For acceptable performance, disks should not be run higher than 50% busy.

I/O Rate is the rate of read and write operations to a given disk. This number includes file and paging I/O. It does not include the extra I/Os needed to perform the verify operation on the disks.

Avg Resp is the average time in milliseconds that a request to a given disk took to complete. This time is the sum of the average time it took to actually perform the disk I/O (service time) and the average time a request had to wait because the disk was busy with other, previously queued I/Os.

Avg Svc is the average time in milliseconds that a disk took to perform a given I/O.
Regardless of how busy a disk is, this number should be fairly constant, because it ignores the time spent waiting while previously queued requests are processed. A small change might be seen if application I/Os happen to find the read/write heads on cylinder more often.

Avg Wait is the average time in milliseconds that an I/O request had to wait while previously queued requests were processed. The busier the disk, the more likely this number is to increase (assuming multiple sources of requests). This meter shows you the “price” you pay for running that disk that busy.

Q Length is the average number of I/O requests in the queue for a given disk.

Degradation gives you an idea of the performance penalty you are paying as multiple processes compete for the disk. It normalizes queue time with this basic formula:

Degradation = (serv time + queue time) / serv time

It is never less than 1 (it equals 1 when queue time is 0), and it is 2 when queue time equals service time (50% busy in theory). However, an average queue time of 5 msec may be acceptable if serv time = 30 msec (degradation = 35/30, about 1.17) but bad if serv time = 10 msec (degradation = 15/10 = 1.5). Values above 1.5 flag potential problems.

Concurrency is a number similar to utilization, but it is based on the overall response time rather than just the service time:

Concurrency = (serv time + queue time) / interval

The name implies that it measures “how many users are visiting the service center.” The number can grow above 1 as queuing gets really bad. If the server is 100% busy (10 seconds of busy time in a 10-second interval) and there are an additional 5 seconds of queuing, the concurrency is 15/10 = 1.5; that is, on average there were 1.5 I/Os at the disk.

File I/O Plot

The File I/O plot (Figure 21) shows the overall I/O activity on the system, broken down into the components described below.

Figure 21. File I/O Plot

Disk User Reads/Sec tells you the number of reads per second that all user processes were charged with. A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.

Disk User Writes/Sec tells you the number of writes per second that all user processes were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, the block needs to be written out sometime in the future.

Disk Sys Reads/Sec tells you the number of reads per second that all system processes were charged with. A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.

Disk Sys Writes/Sec tells you the number of writes per second that all system processes were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, the block needs to be written out sometime in the future.

Disk Svr Reads/Sec tells you the number of reads per second that all server processes (StrataLink, StrataNet, and OSL) were charged with. A process is charged for a read when it requests a block of disk that is not in the cache manager and, thus, a physical I/O is required.

Disk Svr Writes/Sec tells you the number of writes per second that all server processes (StrataLink, StrataNet, and OSL) were charged with. A process is charged for a write when it is the first process to modify (dirty) a block in the cache manager and, thus, the block needs to be written out sometime in the future.

Disk Free Space (MB) Plot

NOTE This plot will probably need to be adjusted for your system. If there are more than 32 disks on your module, the metrics will not resolve. The scales may also need to be adjusted.

The Disk Free Space (MB) plot (Figure 22) shows the number of megabytes available on each disk.

Figure 22. Disk Free Space (MB) Plot

Disk Free Space (MB) is an expression metric (“Disk FSize MB[]” - “Disk FUsed MB[]”) that tells you how much space (in megabytes) is available on each disk. Running out of free disk space crashes applications, so maintain a sufficient reserve.

Wkld Disk I/O TopList

The Wkld Disk I/O TopList (Figure 23) reports I/O statistics for the workloads defined in FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a description of defining workloads.)

Figure 23. Wkld Disk I/O TopList

% Busy shows the percentage of time that all the processes making up this workload were reading from disk. It takes the Wkld Disk Rd Busy Time and expresses it as a percentage of the sample interval.

Reads shows the total number of reads to disk for the file system during the interval for the entire workload. A read is charged to a process when it does not find the record it is looking for in cache and has to go all the way to disk to get the block(s) that contain the record.

Writes shows the total number of writes to disk for the file system during the interval for the entire workload. A process is charged for a write when it is the first process to modify a block in the cache manager. Until that modified block is written to disk, subsequent writers to the block are not charged for the write. So, if your application does 50 writes in one second to the same record in a single disk block, the application is charged for only one write. If many processes are writing to that same record, only the one that made the first modification to the block is charged. Eventually, the cache manager writes the block out to disk, and the next process to write to that block is charged for a write.
Res Time shows the total amount of time during the sample that all processes in the workload were doing reads. This time includes both the time spent waiting to read and the time spent actually doing the reads.

Busy Time shows the total amount of time during the sample that all the processes in the workload were actually reading from disk. It does not include the time they spent waiting for their reads to start because other I/Os were ahead of them in the queue.

Wait Time shows the total amount of time during the sample that all processes in the workload were waiting to access the disk because other I/Os were ahead of them in the queue. This time does not include the time spent actually doing the reads.

Member Count Plot

NOTE This plot will probably need to be adjusted for your system. If there are more than 32 disks on your module, the metrics will not resolve. The scales may also need to be adjusted.

The Member Count plot (Figure 24) is a minimized plot that shows the number of members associated with each disk drive.

Figure 24. Member Count Plot

Disk Members reports the number of logical members on the disk.

% Free Space Plot

NOTE This plot will probably need to be adjusted for your system. If there are more than 32 disks on your module, the metrics will not resolve. The scales may also need to be adjusted.

The % Free Space plot (Figure 25) shows the percentage of space available on each disk.

Figure 25. % Free Space Plot

% Free Space is an expression metric (100 - ((“Disk FUsed MB[]” / “Disk FSize MB[]”) * 100)) that tells you what percentage of space is available on each disk. Running out of free disk space crashes applications, so maintain a sufficient reserve.

Avg Q Length by Disk Plot

NOTE This plot will probably need to be adjusted for your system. If there are more than 32 disks on your module, the metrics will not resolve.
The scales may also need to be adjusted.

The Avg Q Length by Disk plot (Figure 26) shows the average number of items queued for each disk drive.

Figure 26. Avg Q Length by Disk Plot

Disk Avg Queue Length tells you the average number of I/O requests in the queue for a given disk.

Cache Page

The Cache Page displays detailed information about the VOS cache memory. It consists of two plots and a TopList window.

Cache Activity and Hit Rate Plot

The Cache Activity and Hit Rate plot (Figure 27) shows the overall activity to the system cache and the effectiveness of the cache.

Figure 27. Cache Activity and Hit Rate Plot

Mem Cache Read Hits/Sec tells you how many times per second during the interval a process on the module asked for some part of the file structure and found it in cache memory. This could be a file, an index, or a directory block. Hits are good for performance; misses (see “Mem Cache Read Misses/Sec”) are bad.

Mem Cache Read Misses/Sec tells you how many times per second during the interval a process on the module asked for some part of the file structure and did NOT find it in cache memory. This could be a file, an index, or a directory block. Hits (see “Mem Cache Read Hits/Sec”) are good for performance; misses are bad.

Cache Read Hit % is an expression metric (“Mem Cache Read Hits/Sec” / (“Mem Cache Read Hits/Sec” + “Mem Cache Read Misses/Sec”) * 100). The disk cache is a part of the module’s main memory that is set aside to provide a buffer between the physical disks and the applications. When an application reads a record, it either finds the block(s) containing the record in the cache (a hit) or has to wait for the I/O subsystem to bring them into memory (a miss). This metric reports the percentage of requests that did not require an I/O (hits).
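Several of the I/O Page and Cache Page statistics described above are simple ratios. The Python sketch below restates the Degradation, Concurrency, and Cache Read Hit % formulas so you can check a value by hand. The function names are hypothetical. Returning None for a zero denominator is an assumption, not documented SightLine behavior.

```python
from typing import Optional


def degradation(serv_ms: float, queue_ms: float) -> Optional[float]:
    """(serv time + queue time) / serv time; 1.0 means no queuing."""
    if serv_ms <= 0:
        return None  # assumption: undefined when there was no service time
    return (serv_ms + queue_ms) / serv_ms


def concurrency(busy_secs: float, queue_secs: float, interval_secs: float) -> float:
    """(serv time + queue time) / interval; can exceed 1 when queuing is heavy."""
    return (busy_secs + queue_secs) / interval_secs


def cache_read_hit_pct(hits_per_sec: float, misses_per_sec: float) -> Optional[float]:
    """Hits / (hits + misses) * 100, as in the Cache Read Hit % expression metric."""
    lookups = hits_per_sec + misses_per_sec
    if lookups == 0:
        return None  # assumption: no read activity, so the ratio is undefined
    return 100.0 * hits_per_sec / lookups


# The worked examples from the Disk I/O TopList description:
print(round(degradation(30, 5), 2))  # 1.17: 5 ms of queuing on a 30 ms service time
print(degradation(10, 5))            # 1.5: the same queuing on a 10 ms service time
print(concurrency(10, 5, 10))        # 1.5: 100% busy plus 5 s of queuing in 10 s
# A hypothetical cache with 950 hits and 50 misses per second:
print(cache_read_hit_pct(950, 50))   # 95.0
```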
Cache Soils Plot

The Cache Soils plot (Figure 28) shows the rate at which cache pages are updated (written to), as described below.

Figure 28. Cache Soils Plot

Mem Cache Soiled/Sec tells you how many times per second during the interval a process on the module was the first process to write into a file system block. Because the cache manager buffers writes to disk, many other processes may write to a given block before it is flushed to disk, but only the first write is counted here. Once the cache manager writes that block out to disk, the next process that writes to that block is “charged” with a cache soil, and the cycle starts over. This could be a file, an index, or a directory block. This number is interesting because it meters the blocks that will eventually have to be written to disk, regardless of how many times the processes wrote data into those blocks.

Workload Cache Usage TopList

The Workload Cache Usage TopList (Figure 29) reports cache memory usage and performance for the workloads defined in FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a description of defining workloads.)

Figure 29. Wkld Cache Usage TopList

Read Hit % is an expression metric (“Wkld Cache Read Hits[]” / (“Wkld Cache Read Misses[]” + “Wkld Cache Read Hits[]”) * 100) that shows the percentage of read requests that were satisfied with data already in the cache, for all of the processes that make up the workload.

Soiled/Sec tells you how many times per second during the interval all the processes in the workload were charged for a cache soil, or write. Only the first process to write into a file system block is charged with the I/O.
Once the cache manager writes that block out to disk, the next process that writes to that block is charged with a cache soil, and the cycle starts over. This could be a file, an index, or a directory block. This number is interesting because it meters the blocks that will eventually have to be written to disk, regardless of how many times the processes wrote data into those blocks.

Hits/Sec tells you how many times per second during the interval all the processes in the workload asked for some part of the file structure and found it in cache memory. This could be a file, an index, or a directory block.

Misses/Sec tells you how many times per second during the interval all the processes in the workload asked for some part of the file structure and did NOT find it in cache memory. This could be a file, an index, or a directory block.

Workloads Page

The Workloads Page provides statistics on your applications, as defined in FRTLHOME>etc>agentmgr.conf. (You should modify agentmgr.conf to capture the work done by your applications. See Chapter 4 in the Power Agent section of this User’s Guide for a description of defining workloads.)

CPU Usage by Workload Plot

The CPU Usage by Workload plot (Figure 30) shows the overall CPU usage for all the processes that make up each workload. Process and workload metrics are all on a per-CPU basis, so this plot can scale from 0 to 100 * Number of CPUs.

Figure 30. CPU Usage by Workload Plot

Wkld % CPU tells you what percentage of the total available CPU resource the workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.

Task Count by Workload Plot

The Task Count by Workload plot (Figure 31) reports the total number of processes active in the workload.

Figure 31. Task Count by Workload Plot

Wkld Total tells you the total number of processes in the workload.
This value is sampled only once per interval, so short-lived processes might be missed.

Workloads TopList

The Workloads TopList (Figure 32) reports the following statistics for the workload:

Figure 32. Workloads TopList

% CPU tells you what percentage of the total available CPU resource the workload is consuming. It is reported on a scale from 0 to 100 * Number of CPUs.

Flts/Sec tells you the page faults generated by the workload, expressed as a rate per second. Page faults can be a symptom of insufficient memory or an application memory leak, or they can be caused by an application design in which memory is used for a short period and then returned, causing page faults on each repeated allocation. Typically, if page faulting is widespread among most processes, there is a shortage of memory. If only one process is page faulting, the problem can be fixed with a code change.

Rds/Sec shows the total number of reads to disk for the file system during the interval for the entire workload. A read is charged to a process when it does not find the record it is looking for in cache and has to go all the way to disk to get the block(s) that contain the record.

Wrts/Sec shows the total number of writes to disk for the file system during the interval for the entire workload. A process is charged for a write when it is the first process to modify a block in the cache manager. Until that modified block is written to disk, subsequent writers to the block are not charged for the write. So, if your application does 50 writes in one second to the same record in a single disk block, the application is charged for only one write. If many processes are writing to that same record, only the one that made the first modification to the block is charged. Eventually, the cache manager writes the block out to disk, and the next process to write to that block is charged for a write.

Procs shows the total number of processes in the workload.
This value is only sampled once per interval, so short-lived processes might be missed.

% CPU Busy reports the CPU busy time for this workload, expressed as a percentage. It is reported on a scale from 0 to 100 * Number of CPUs.

Wkld Memory Usage Plot

The Wkld Memory Usage plot (Figure 33) reports the total number of memory pages in use by the workload.

Figure 33. Wkld Memory Usage Plot

Wkld Mem Usage is an expression metric (“Wkld Shared Memory[]” + “Wkld Unshared Memory[]”) that reports the total number of pages in use for all processes that make up the workload.

Reads by Workload Plot

The Reads by Workload plot (Figure 34) reports the rate of read I/Os generated by the workload.

Figure 34. Reads by Workload Plot

Wkld Reads reports the total number of reads to disk for the file system during the interval for all the processes that make up the workload. A read is charged to a process when it does not find the record it is looking for in cache and has to go all the way to disk to get the block(s) that contain the record.

Writes by Workload Plot

The Writes by Workload plot (Figure 35) reports the rate of write I/Os generated by the workload.

Figure 35. Writes by Workload Plot

Wkld Writes reports the total number of writes to disk for the file system during the interval for the entire workload. A process is charged for a write when it is the first process to modify a block in the cache manager. Until that modified block is written to disk, subsequent writers to that block are not charged for the write. So, if your application does 50 writes in one second to the same record in a single disk block, the application is charged for only one write. If many processes write to that same record, only the one that first modifies the block is charged.
Eventually, the cache manager writes that block out to disk, and the next process to write to that block is charged for a write.

Sample AutoAlert System: VOS.VTH

To help you get started quickly with SightLine’s AutoAlert System, also known as threshold alarms, we provide a preconfigured threshold system in the C:\Program Files\FORTEL SightLine\Expert Advisor Vision\VOS directory. AutoAlert configuration files are stored in files with a .VTH extension.

The sample threshold system is necessarily very basic, because few “rules of thumb” apply to all VOS systems. Use it as a starting point. You will probably need to modify it (or create your own) to set thresholds tailored to your environment. The descriptions we provide address key elements of the EA/V menus and dialogs. See the SightLine Expert Advisor/Vision User’s Guide for a complete discussion of thresholds, threshold systems, and time systems.

VOS.VTH consists of eight thresholds. Each threshold in this sample system is described with the following items:

Metric: The metric to which the threshold is assigned.
Value: The value that the metric must exceed (or, for a Low threshold, fall below) to constitute a violation. To change the value, click inside the edit box and change the number to the value you want the threshold to be.
Priority: From 0 (lowest) to 99 (highest), the priority of this threshold relative to other thresholds. To change a priority, click in the edit box and change the number to reflect the priority you want for this threshold.
Direction: Low or High. For metrics that must exceed their threshold to be in violation (such as CPU Busy), set this to High. For metrics that must go below their threshold to be in violation (such as Memory % Free), set it to Low.
Trigger after [n] secs of violation: The value of n specifies how long a violation must persist before EA/V triggers its alarm. To change this, click inside the edit box and change n.
Violation Message: The text that EA/V displays on screen and writes to the Threshold Violation Log when a violation of this threshold occurs. To change this, click inside the Violation Message edit box and change the message to suit your situation.

Memory % Free
Value: 10
Priority: 99
Direction: Low
Trigger After: 60 seconds
Violation Message: There is less than 10% free memory. If usage can not be reduced, more memory may be needed.

Page File % Free
Value: 20
Priority: 50
Direction: Low
Trigger After: 0 seconds
Violation Message: Paging space is running low. Additional swap space may be needed.

Cache Read Hit %
Value: 90
Priority: 50
Direction: Low
Trigger After: 60 seconds
Violation Message: The read cache hit rate is low. Additional system cache may be needed.

Page Fault Rate
Value: 10
Priority: 77
Direction: High
Trigger After: 60 seconds
Violation Message: The page fault rate is high. Check the EventList for offending processes.

CPU % Other
Value: 5
Priority: 50
Direction: High
Trigger After: 60 seconds
Violation Message: CPU % Other is running high. Check for no-wait I/O and excessive interprocess communication.

CPU % Interrupts
Value: 20
Priority: 66
Direction: High
Trigger After: 60 seconds
Violation Message: Device interrupt processing is quite high. Check configuration of network and communication devices.

Total CPU Utilization
Value: 80
Priority: 99
Direction: High
Trigger After: 60 seconds
Violation Message: CPU usage is very high. Check the EventList for offending processes.

CPU Wait-Busy Ratio
Value: 1
Priority: 77
Direction: High
Trigger After: 60 seconds
Violation Message: More time waiting for CPU than using it. Use the EventList to see which processes are waiting.
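The interaction of Value, Direction, Priority, and the trigger delay can be illustrated with a short Python sketch. It is not EA/V code and does not read .VTH files. The `Threshold` and `ThresholdMonitor` classes, and the idea of feeding the monitor one timestamped set of samples at a time, are assumptions made for illustration. Following the field descriptions above, a High threshold is violated when the metric exceeds Value, a Low threshold when the metric goes below Value, and the alarm is due only once the violation has persisted for the trigger time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """One threshold, mirroring the fields described above (hypothetical class)."""
    metric: str
    value: float
    priority: int         # 0 (lowest) to 99 (highest)
    direction: str        # "High" or "Low"
    trigger_after: float  # seconds the violation must persist
    message: str

    def __post_init__(self) -> None:
        if self.direction not in ("High", "Low"):
            raise ValueError(f"direction must be 'High' or 'Low', not {self.direction!r}")

    def violated(self, sample: float) -> bool:
        # High: violated above Value. Low: violated below Value.
        if self.direction == "High":
            return sample > self.value
        return sample < self.value


class ThresholdMonitor:
    """Hypothetical evaluator that remembers when each violation began."""

    def __init__(self, thresholds: List[Threshold]) -> None:
        self.thresholds = thresholds
        self.started: List[Optional[float]] = [None] * len(thresholds)

    def update(self, now: float, samples: Dict[str, float]) -> List[Threshold]:
        """Return thresholds whose alarm is due at time `now`, highest priority first.

        A violation that persists is reported on every update; whether EA/V
        re-alarms in that case is not modeled here.
        """
        due = []
        for i, t in enumerate(self.thresholds):
            sample = samples.get(t.metric)
            if sample is None or not t.violated(sample):
                self.started[i] = None  # no violation (or no data): reset the timer
                continue
            if self.started[i] is None:
                self.started[i] = now   # violation begins now
            if now - self.started[i] >= t.trigger_after:
                due.append(t)
        return sorted(due, key=lambda x: x.priority, reverse=True)


# Three of the eight VOS.VTH thresholds, as listed above.
VOS_SAMPLE = [
    Threshold("Memory % Free", 10, 99, "Low", 60,
              "There is less than 10% free memory. "
              "If usage can not be reduced, more memory may be needed."),
    Threshold("Page File % Free", 20, 50, "Low", 0,
              "Paging space is running low. Additional swap space may be needed."),
    Threshold("Total CPU Utilization", 80, 99, "High", 60,
              "CPU usage is very high. Check the EventList for offending processes."),
]

if __name__ == "__main__":
    monitor = ThresholdMonitor(VOS_SAMPLE)
    sample = {"Memory % Free": 35.0, "Page File % Free": 15.0,
              "Total CPU Utilization": 92.0}
    for now in (0, 30, 60):
        print(now, [t.metric for t in monitor.update(now, sample)])
```

With a constant sample of 92% CPU and 15% free page file, Page File % Free (Trigger After: 0 seconds) is due from the first update. Total CPU Utilization joins it only at 60 seconds, and is listed first because its priority (99) is higher.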
AutoAnalyze Rules and Reports

This section describes the rules and reports used when AutoAnalyze is invoked. AutoAnalyze is an analysis and reporting tool that automatically looks for exception conditions and then generates recommendations and summary reports. Each exception has a common set of attributes:

Condition: The metric(s) and value that cause the exception.
Persistence: How long the condition must exist for it to be considered an exception.
Report: .RTF file located in the c:\Program Files\FORTEL SightLine\Expert Advisor Vision\AANALYZE directory.
Plot(s): OLE-linked plot(s) (.VPL file) located in the c:\Program Files\FORTEL SightLine\Expert Advisor Vision\AANALYZE directory.

Disk-Busy
Condition: A disk is more than 50% busy
Persistence: 5 out of 10 intervals
Report: shdskbsy.rtf
Plot(s): shdskbsy.vpl

CPU-Waiting Exceeds Running
Condition: CPU Wait time is greater than CPU Busy time
Persistence: 5 out of 10 intervals
Report: scpuwait.rtf
Plot(s): scpuwait.vpl

Disk-Error(s) Detected
Condition: Any fatal or data errors detected on a disk
Persistence: Immediate
Report: shdskerr.rtf
Plot(s): shdskerr.vpl

CPU-Busy
Condition: The CPU is more than 80% busy
Persistence: 7 out of 10 intervals
Report: shtotcpu.rtf
Plot(s): shtotcpu.vpl

CPU-Too many Interrupts
Condition: CPU % Interrupts is greater than 20%
Persistence: 5 out of 5 intervals
Report: shintcpu.rtf
Plot(s): shintcpu.vpl

CPU-High Scheduler Overhead
Condition: CPU % Other is greater than 5%
Persistence: 5 out of 5 intervals
Report: shothcpu.rtf
Plot(s): shothcpu.vpl

Memory-High Page Fault Rate
Condition: Total Page Fault Rate is greater than 5 per second
Persistence: 5 out of 5 intervals
Report: shpagflt.rtf
Plot(s): shpagflt.vpl

Cache-Low Hit Rate
Condition: Cache Read Hit Rate is less than 90%
Persistence: 10 out of 10 intervals
Report: slcachit.rtf
Plot(s): slcachit.vpl

Disk-File Sys Free Space Low
Condition: A disk’s file space is less than 20% free
Persistence: Immediate
Report: slfilspc.rtf
Plot(s): slfilspc.vpl

Memory-Pagefile Space Low
Condition: The page file has less than 20% free space
Persistence: 10 out of 10 intervals
Report: slpagspc.rtf
Plot(s): slpagspc.vpl

Memory-Low Free Space
Condition: Memory % Free is less than 10%
Persistence: 10 out of 10 intervals
Report: slfremem.rtf
Plot(s): slfremem.vpl
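The Persistence attribute can be read as a sliding window over sampling intervals. Under that reading, a rule such as CPU-Busy becomes an exception when its condition held in at least 7 of the 10 most recent intervals, and Immediate behaves like 1 out of 1. The Python sketch below encodes this interpretation. It is an illustration only, not AutoAnalyze itself. The `Rule` class, the lowercase metric keys (`cpu_busy_pct`, `disk_errors`, `mem_free_pct`), and the assumption that the window slides one interval at a time (rather than being evaluated in fixed blocks) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

# Hypothetical: one interval's metric values, keyed by made-up metric names.
Sample = Dict[str, float]


@dataclass
class Rule:
    """An AutoAnalyze-style rule with 'k out of n intervals' persistence (sketch)."""
    name: str
    condition: Callable[[Sample], bool]
    k: int       # intervals that must meet the condition...
    n: int       # ...within the most recent n intervals
    report: str  # report file named in the rule table
    history: Deque[bool] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not 1 <= self.k <= self.n:
            raise ValueError("persistence must satisfy 1 <= k <= n")
        self.history = deque(maxlen=self.n)  # oldest interval drops off automatically

    def observe(self, sample: Sample) -> bool:
        """Record one interval and report whether the rule is now an exception."""
        self.history.append(self.condition(sample))
        return sum(self.history) >= self.k


# A few rows of the rule table; "Immediate" is modeled as 1 out of 1.
RULES = [
    Rule("CPU-Busy", lambda s: s["cpu_busy_pct"] > 80, 7, 10, "shtotcpu.rtf"),
    Rule("Disk-Error(s) Detected", lambda s: s["disk_errors"] > 0, 1, 1, "shdskerr.rtf"),
    Rule("Memory-Low Free Space", lambda s: s["mem_free_pct"] < 10, 10, 10, "slfremem.rtf"),
]

if __name__ == "__main__":
    busy = RULES[0]
    readings = [85, 90, 70, 88, 95, 82, 60, 91, 87, 93]
    for i, pct in enumerate(readings, start=1):
        if busy.observe({"cpu_busy_pct": pct}):
            print(f"interval {i}: {busy.name} -> see {busy.report}")
```

In the demo, seven of the first nine readings are above 80%, so CPU-Busy fires at intervals 9 and 10. A `deque` with `maxlen=n` keeps the window bounded, so intervals older than n drop out of the count without extra bookkeeping.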