System Health Check
PureData System for Analytics (PDA)
System Health Check
© 2014 IBM Corporation

But first . . . some general “How am I doing?” utilities
- What versions are running?
  - NPS: nzrev
  - HPF: cat /nzlocal/scripts/version.txt
  - FDT: cat /opt/nz/fdt/version.txt
- Are you collecting query history?
  - Did you know that you can configure the ODBC drivers to submit user data?
- Are you using IBM Netezza Performance Portal?
  - With the exception of configuring the Event Manager, it does everything nzAdmin does, and more.
- Review the Event Manager configuration with every major release change.
- nzhostbackup, nzraidcheck, nz_check_disk_scan_speeds
- nz_health, nzhealthcheck, nzhw show -issues
- nz_best_practices, nz_sysutil_history
- Call Home

Agenda topics
✔ NPS and System Health Check versions
✔ Rules
✔ Running System Health Check

NPS and System Health Check versions
- It is possible to run a later version of nzhealthcheck than the one installed by the NPS version upgrade.
- nzhealthcheck and its rules are contained in the Software (S/W) Support Tools package.
- See the following Technote for information on the S/W Support Tools and the NPS versions they support:
  https://www-304.ibm.com/support/entdocview.wss?uid=swg21668047
- See the IBM Netezza System Administrator’s Guide, version 7.1 and later:
  http://www-01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.adm.doc/c_shc_overview.html
  http://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.adm.doc/c_shc_overview.html
- Visit Fix Central for PDF versions of the manuals referenced above.

Level 1 rules
- Basic checks, usually with a platform-independent resolution
- Checking reported component states (e.g. AMM goes offline, fan turns off, DIMM reports an error): amm, pwr, chassis fan, blower, switch fan, host cpu, disk, dimm, host fan, sas controller state, host power supply, spu, hba ...
- Checking basic counts and presence:
  - SPU and CPU DIMMs
  - Number of active FPGAs
- Check host error counters for disks and SAS controllers
- Check cluster and DRBD state

Level 2 rules
- Complex rules, specifying advice and a problem description
- Complex logic, such as identifying the correct disk/SPU/enclosure balance
- Model-specific logic
- Rule dependencies, e.g. don't report enclosure PHYs as turned off when you know that the blade on the other side of the SAS connection is down

Running System Health Check
- The daemon that runs nzhealthcheck is started automatically by nzinit.
- You can check the status of the daemon, or start and stop it, with the Linux service command:
  service nzhealthcheck {start|stop|status|restart}
- Changes to the admin password require a restart.
- If started manually, you will be prompted for the root password.
- nzhealthcheck prompts for the root password upon initial execution. It will run without the root password, but some checks will not be performed, because root access is a requirement for some of the host device managers.
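The version checks on the “How am I doing?” slide (nzrev for NPS, the two version.txt files for HPF and FDT) can be bundled into one script. This is a minimal Python sketch, not an IBM-supplied tool. It assumes it runs on the host as a user whose PATH includes nzrev, and it treats the version.txt files as opaque text because their format is not described here. The names VERSION_FILES, read_version_file, nps_version, and collect_versions are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Version sources from the "How am I doing?" slide.
# NPS comes from the nzrev command; HPF and FDT come from plain-text files.
VERSION_FILES = {
    "HPF": Path("/nzlocal/scripts/version.txt"),
    "FDT": Path("/opt/nz/fdt/version.txt"),
}


def read_version_file(path):
    """Return the stripped contents of a version file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def nps_version():
    """Return nzrev's output, or None if nzrev is not on PATH or fails.

    ASSUMPTION: nzrev is runnable by the current user (typically the nz user).
    """
    if shutil.which("nzrev") is None:
        return None
    try:
        result = subprocess.run(["nzrev"], capture_output=True, text=True)
    except OSError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_versions():
    """Gather all three versions into one dict; missing sources map to None."""
    versions = {"NPS": nps_version()}
    for component, path in VERSION_FILES.items():
        versions[component] = read_version_file(path)
    return versions


if __name__ == "__main__":
    for component, value in collect_versions().items():
        print(f"{component:4} {value if value is not None else '(not found)'}")
```

Returning None instead of raising keeps the report usable on a host where one of the components is not installed.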
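Because the nzhealthcheck daemon is controlled through the Linux service command, its status can also be polled from a script, for example from a site monitoring job. This sketch assumes the nzhealthcheck init script follows the common LSB convention of exiting with status 0 when the daemon is running. That convention is not confirmed for nzhealthcheck, so verify the exit codes on your own system. The function name nzhealthcheck_daemon_running is hypothetical.

```python
import subprocess

# Status command from the "Running System Health Check" slide.
STATUS_CMD = ["service", "nzhealthcheck", "status"]


def nzhealthcheck_daemon_running(cmd=STATUS_CMD):
    """Return True if the status command exits with 0, otherwise False.

    ASSUMPTION: the init script follows the LSB convention (exit 0 = running).
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The command itself (for example `service`) is not installed here.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if nzhealthcheck_daemon_running():
        print("nzhealthcheck daemon: running")
    else:
        print("nzhealthcheck daemon: not running (or status unknown)")
```

The command is a parameter so the check can be exercised without a PDA host, for example by passing ["true"] or ["false"].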
Two modes of operation
- Diagnostic (default mode)
  - No data gathering
  - No event generation
  - No nzCallHome events
- Monitoring
  - Automatic data collection
  - Automatic rules evaluation
  - Event generation
  - nzCallHome events

Call Home
- nzhealthcheck replaces the disk_monitor script.
- nzOpenPMR can be activated for some events related to disk issues:
  - predictive disk failure notification based on LogPage 0x15 reporting
  - predictive disk failure notification based on grown defects
- For 7.1, it is initiated by adding the following line to the crontab (note the -p option):
  /nz/kit/bin/adm/nzhealthcheck -p
- For 7.2 and later, configuration is automatic.

Additional information gathered by nzhealthcheck
- If you use the sysinfo option, additional data is gathered; this report is recommended for inclusion in PMRs.
- Disk non-media error rate check
- SCSI error count check
- Transfer & error correction statistics (sysinfo report only)
- Firmware & hardware revisions (from sys_rev_check)
- Various host checks (from pts-check)
- Network connections (from concheck)
- Cluster check (from hpf_health)
- Regen status (from nzds)
- Verification that transaction IDs are available in the database
- et cetera

Example: nzhealthcheck sysinfo

Frontend Hosts Utilization and Statistics
************************************************************************
Host
  vendor       : IBM
  model        : -[7947AC1]
  uuid         : 2EC092FB-9E52-3B99-86E3-F2578EB58039
  serial_number: KQWWGHM
  hw_version   : 00
  role         : primary
  rack_id      : 1
  slot         : 2

Host Fans
count  dev_status  name   rpm   slot  status
---------------------------------------------------------
2      Present     Fan 1  2234  1     ok
2      Present     Fan 2  1823  2     ok
2      Present     Fan 3  2652  3     ok

Host Power
avg  name        p12    p3    p5    pBAT  unit
------------------------------------------------------------
26   Power Unit  12.15  3.34  5.04  3.09  ok

Host SAS Controllers
. . .

File locations
- nzhealthcheck is part of the Software Support Tools.
- Installed in /nz/kit/bin/adm (before NPS 7.2)
- The generated reports are in /nz/kit/log/nzhealthcheck
- Rules
  documented in /nz/kit/share/nzhealthcheck/rules-doc.pdf
- The Event Manager can be configured by editing the configuration file /nz/kit/share/nzhealthcheck/nzhealthcheck.cfg
- Device manager configuration is in /nz/kit/share/nzhealthcheck/devmgrs/devmgrs.ini
- Example of a small segment of the configuration:
  [nz@nz80409-h2 devmgrs]$ grep -A8 scsi_ls devmgr.ini
  [scsi_ls]
  triggeredOn = bom
  path = devmgrs/bin/adm/nz_query_logsense
  . . .
  npsStates = online,paused

Trademarks, disclaimer, and copyright information
IBM, the IBM logo, ibm.com, Current, and PureSystems are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of other IBM trademarks is available on the web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS OR SOFTWARE.

© Copyright International Business Machines Corporation 2014.
All rights reserved.
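The nzhealthcheck sysinfo report shown on the “Additional information gathered by nzhealthcheck” slide is plain columnar text, so individual sections can be checked by script. The sketch below parses the Host Fans rows from that sample output. It assumes the column order shown (count, dev_status, name, rpm, slot, status) and that only the name column contains spaces. The min_rpm threshold of 1000 is a hypothetical value for illustration, not an IBM limit, and parse_host_fans and fans_needing_attention are hypothetical names.

```python
# Rows copied from the Host Fans section of the sample sysinfo output.
SAMPLE_HOST_FANS = """\
count dev_status name rpm slot status
---------------------------------------------------------
2 Present Fan 1 2234 1 ok
2 Present Fan 2 1823 2 ok
2 Present Fan 3 2652 3 ok
"""


def parse_host_fans(text):
    """Turn Host Fans rows into dicts.

    ASSUMPTION: columns are count, dev_status, name, rpm, slot, status,
    and only the name (e.g. "Fan 1") contains spaces.
    """
    fans = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the dashed separator (one token) and the header row.
        if len(tokens) < 6 or tokens[0] == "count":
            continue
        fans.append({
            "count": int(tokens[0]),
            "dev_status": tokens[1],
            "name": " ".join(tokens[2:-3]),
            "rpm": int(tokens[-3]),
            "slot": int(tokens[-2]),
            "status": tokens[-1],
        })
    return fans


def fans_needing_attention(fans, min_rpm=1000):
    """Names of fans not reporting "ok" or spinning below min_rpm (hypothetical threshold)."""
    return [fan["name"] for fan in fans if fan["status"] != "ok" or fan["rpm"] < min_rpm]


if __name__ == "__main__":
    fans = parse_host_fans(SAMPLE_HOST_FANS)
    for fan in fans:
        print(f'{fan["name"]}: {fan["rpm"]} rpm, {fan["status"]}')
    print("attention:", fans_needing_attention(fans))
```

Reading the name as "everything between the second token and the last three" keeps multi-word names such as "Fan 1" intact without relying on fixed character positions.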
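The devmgrs.ini segment on the File locations slide uses INI-style [section] headers and key = value pairs, so Python's configparser can read it. This sketch assumes the rest of the file follows the same syntax, which the slide does not confirm. It preserves key case (triggeredOn, npsStates) by overriding optionxform, and it splits npsStates on commas. The helper names load_devmgrs and nps_states are hypothetical, and the meaning of each key is not documented here.

```python
import configparser

# The segment shown on the File locations slide (elided lines omitted).
SEGMENT = """\
[scsi_ls]
triggeredOn = bom
path = devmgrs/bin/adm/nz_query_logsense
npsStates = online,paused
"""


def load_devmgrs(text):
    """Parse devmgrs.ini-style text, preserving key case."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # the default would lower-case triggeredOn to triggeredon
    parser.read_string(text)
    return parser


def nps_states(parser, section):
    """Return the comma-separated npsStates value as a list ([] if absent)."""
    raw = parser.get(section, "npsStates", fallback="")
    return [state.strip() for state in raw.split(",") if state.strip()]


if __name__ == "__main__":
    config = load_devmgrs(SEGMENT)
    for name in config.sections():
        print(name, config.get(name, "path"), nps_states(config, name))
```

On a host, parser.read() with the devmgrs.ini path could replace read_string(), assuming the full file parses cleanly as INI.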