Trap handling with snmptt, Nagios and wrapper scripts

SNMP trap handling for Nagios
with snmptt and shell scripts
Martin Fürstenau
September 22, 2006
SNMP Trap Handling for Nagios
The "Nagios" method
■ Prerequisite: snmptrapd must be installed
■ For every trap, a handler must be defined in /etc/snmptrapd.conf using the following syntax:
traphandle OID Eventhandler/Program
■ The handlers pass their result to NSCA
■ A service check must be submitted for every handler
■ Conclusion
  ■ Laborious
  ■ Hard to maintain
  ■ Inflexible
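To illustrate what such a per-trap handler has to do, here is a minimal sketch of a traphandle program. It relies on the snmptrapd convention that the handler receives the trap on standard input: the sender's host name on the first line, the transport address on the second, and one "OID value" pair per following line. The function name parse_trap is hypothetical and not part of the slides.

```shell
#!/bin/sh
# Hypothetical minimal traphandle helper (a sketch, not from the slides).
# snmptrapd writes to stdin: line 1 = host name, line 2 = transport address,
# remaining lines = "OID value" pairs (one varbind per line).
parse_trap() {
    read -r trap_host
    read -r trap_addr
    echo "host=$trap_host"
    echo "addr=$trap_addr"
    # The first token of each remaining line is the OID, the rest is its value.
    while read -r oid value; do
        [ -n "$oid" ] && echo "varbind $oid => $value"
    done
    return 0
}
```

A handler script would end with a call to parse_trap and then decide which service check to submit. Doing this separately for every OID is what makes the method laborious.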
Better with snmptt (http://www.snmptt.org)
■ Prerequisites:
  ■ snmptrapd must be installed
  ■ snmptt must be installed
  ■ The configuration files for snmptt are generated from MIBs with snmpttconvertmib.
  ■ The event handler submit_check_result should be installed
■ Message flow:
  ■ snmptrapd hands the trap over to snmptt
  ■ snmptt processes the trap and calls
    ■ an event handler (submit_check_result)
    ■ or a script that in turn calls an event handler (submit_check_result)
  ■ the event handler submits the result to Nagios as a passive check
submit_check_result?? (1)
#!/bin/sh
# SUBMIT_CHECK_RESULT
# Written by Ethan Galstad
# Last Modified: 02-18-2002
#
# This script will write a command to the Nagios command
# file to cause Nagios to process a passive service check
# result.  Note: This script is intended to be run on the
# same host that is running Nagios.  If you want to
# submit passive check results from a remote machine, look
# at using the nsca addon.
#
# Arguments:
#  $1 = host_name (Short name of host that the service is
#       associated with)
#  $2 = svc_description (Description of the service)
#  $3 = return_code (An integer that determines the state
#       of the service check, 0=OK, 1=WARNING, 2=CRITICAL,
#       3=UNKNOWN).
#  $4 = plugin_output (A text string that should be used
#       as the plugin output for the service check)
submit_check_result?? (2)
echocmd="/bin/echo"
CommandFile="/var/spool/nagios/nagios.cmd"
# get the current date/time in seconds since UNIX epoch
datetime=`date +%s`
# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
# append the command to the end of the command file
# (quoted so that whitespace in the plugin output is preserved)
$echocmd "$cmdline" >> "$CommandFile"
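The line that ends up in the command file has the form "[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return code;output". The following sketch builds the same line with printf and writes it to a file passed as an argument, so it can be tried against a scratch file instead of the real Nagios command pipe. The function names build_check_result and submit_to are hypothetical.

```shell
#!/bin/sh
# Sketch: build the external-command line that submit_check_result appends.
# Both helper names are assumptions made for this example.
build_check_result() {
    # $1 = host_name, $2 = svc_description, $3 = return_code, $4 = plugin_output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

submit_to() {
    # $1 = command file; the remaining arguments are passed to build_check_result
    cmd_file=$1
    shift
    build_check_result "$@" >> "$cmd_file"
}
```

For example, submit_to /tmp/test.cmd ops-apx01 APX_Heartbeat 0 "APX is running" appends one line to /tmp/test.cmd that can be inspected before pointing the script at the real command file.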
Configuration
■ snmptrapd.conf:
  ■ traphandle default /usr/sbin/snmptthandler
■ snmptt.ini as described in the manual
  ■ The last section is the essential one:
[TrapFiles]
# A list of snmptt.conf files (this is NOT the snmptrapd.conf file).  The COMPLETE path
# and filename.  Ex: '/etc/snmp/snmptt.conf'
snmptt_conf_files = <<END
/etc/snmptt/snmptt.conf.apx
END
Example scenario: APX
■ 1 W2K server (1 x production, 1 x ice-cold standby), Poing
■ 1 test server, Venlo, NL
■ 4 traps
  ■ Start
  ■ Stop
  ■ Heartbeat
  ■ All cancelled jobs and their problems
■ Only traps from the production server are to be accepted
■ Details such as abort code, severity etc. are contained in the message string of the trap
Converting the required MIBs with snmpttconvertmib
■ Example result /etc/snmptt/snmptt.conf.apx (1):
#
# MIB: APX-MIB (file:./apmAPX.mib) converted on Fri Nov 4 16:00:13 2005 using snmpttconvertmib v1.0
#
EVENT apxControlStart .1.3.6.1.4.1.1024.2.24.11.10.0.1081 "Status Events" Normal
FORMAT Control started $*
EXEC /usr/lib/nagios/ops_plugins/apx_start_trap_wrapper $r
SDESC
Control started
Variables:
  1: apxControlState
  2: apxTrapID
EDESC
#
Converting the required MIBs with snmpttconvertmib
■ Example result /etc/snmptt/snmptt.conf.apx (2):
#
EVENT apxControlShutdown .1.3.6.1.4.1.1024.2.24.11.10.0.1082 "Status Events" Normal
FORMAT Control shutdown $*
EXEC /usr/lib/nagios/ops_plugins/apx_stop_trap_wrapper $r
SDESC
Control shutdown
Variables:
  1: apxControlState
  2: apxTrapID
EDESC
#
Converting the required MIBs with snmpttconvertmib
■ Example result /etc/snmptt/snmptt.conf.apx (3):
#
EVENT apxHeartbeat .1.3.6.1.4.1.1024.2.24.11.10.0.1088 "Status Events" Normal
FORMAT Control heartbeat $*
EXEC /usr/lib/nagios/ops_plugins/apx_heartbeat_trap_wrapper $r
SDESC
Control heartbeat
Variables:
  1: apxControlState
  2: apxLastAction
  3: apxTrapID
EDESC
#
Converting the required MIBs with snmpttconvertmib
■ Example result /etc/snmptt/snmptt.conf.apx (4):
#
EVENT apxJobCancelled .1.3.6.1.4.1.1024.2.24.11.11.0.2004 "Status Events" Normal
FORMAT Job cancelled $*
EXEC /usr/lib/nagios/ops_plugins/apx_service_trap_wrapper $r $3 $5 $2 $7 $1
SDESC
Job cancelled
Variables:
  1: apxTrapID
  2: apxJobkey
  3: apxJobname
  4: apxJobOID
  5: apxCondCode
  6: apxJobstate
  7: apxGroup
EDESC
The log files (1)
■ snmptrapd.log
2006-09-03 08:32:50 ops-apx01.ops.oce.net [10.53.11.121] (via UDP: [10.53.11.121]:1048) TRAP, SNMP v1, community OCE_PO .1.3.6.1.4.1.1024.2.24.11.10 Enterprise Specific Trap (1081) Uptime: 450 days, 4:48:25.32
.1.3.6.1.4.1.1024.2.24.11.1.1 = STRING: "1081"
.1.3.6.1.4.1.1024.2.24.11.1.201 = STRING: "1,5,0,141"
.1.3.6.1.4.1.1024.2.24.11.1.202 = STRING: "RUN"
.1.3.6.1.4.1.1024.2.24.11.1.203 = STRING: "03.09.2006 08:32:52"
■ snmptt.log
Sun Sep 3 08:32:50 2006 .1.3.6.1.4.1.1024.2.24.11.10.0.1081 Normal "Status Events" ops-apx01 - Control started 1081 1,5,0,141 RUN 03.09.2006 08:32:52
The log files (2)
■ snmptrapd.log
2006-09-03 08:32:57 ops-apx01.ops.oce.net [10.53.11.121] (via UDP: [10.53.11.121]:1067) TRAP, SNMP v1, community OCE_PO .1.3.6.1.4.1.1024.2.24.11.10 Enterprise Specific Trap (1088) Uptime: 450 days, 4:48:32.93
.1.3.6.1.4.1.1024.2.24.11.1.1 = STRING: "1088"
.1.3.6.1.4.1.1024.2.24.11.1.201 = STRING: "1,5,0,141"
.1.3.6.1.4.1.1024.2.24.11.1.202 = STRING: "RUN"
.1.3.6.1.4.1.1024.2.24.11.1.203 = STRING: "03.09.2006 08:32:59"
■ snmptt.log
Sun Sep 3 08:32:58 2006 .1.3.6.1.4.1.1024.2.24.11.10.0.1088 Normal "Status Events" ops-apx01 - Control heartbeat 1088 1,5,0,141 RUN 03.09.2006 08:32:59
The log files (3)
■ snmptrapd.log
2006-09-06 12:15:33 ops-apx01.ops.oce.net [10.53.11.121] (via UDP: [10.53.11.121]:3344) TRAP, SNMP v1, community OCE_PO .1.3.6.1.4.1.1024.2.24.11.11 Enterprise Specific Trap (2004) Uptime: 453 days, 8:31:06.25
.1.3.6.1.4.1.1024.2.24.11.1.1 = STRING: "2004"
.1.3.6.1.4.1.1024.2.24.11.1.101 = STRING: "1858194"
.1.3.6.1.4.1.1024.2.24.11.1.102 = STRING: "RP651700_1"
.1.3.6.1.4.1.1024.2.24.11.1.103 = STRING: "0x081190973FFC4642B68B026E2402423D"
.1.3.6.1.4.1.1024.2.24.11.1.104 = STRING: "ABAP"
.1.3.6.1.4.1.1024.2.24.11.1.106 = STRING: "?"
.1.3.6.1.4.1.1024.2.24.11.1.107 = STRING: "MNLsev1"
■ snmptt.log
Wed Sep 6 12:15:33 2006 .1.3.6.1.4.1.1024.2.24.11.11.0.2004 Normal "Status Events" ops-apx01 - Job cancelled 2004 1858194 RP651700_1 0x081190973FFC4642B68B026E2402423D ABAP ? MNLsev1
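The EXEC line of the apxJobCancelled event passes "$r $3 $5 $2 $7 $1" to the wrapper, that is, the trap host name followed by a reordered selection of the seven varbind values. The sketch below emulates that selection for the trap logged here. The helper expand_exec_args is hypothetical and only imitates the substitution; it is not part of snmptt.

```shell
#!/bin/sh
# Sketch: emulate how "$r $3 $5 $2 $7 $1" picks values out of a trap.
# expand_exec_args is a hypothetical helper, not an snmptt function.
expand_exec_args() {
    host=$1      # stands in for $r, the host name of the trap sender
    shift
    # The remaining positional parameters are the varbind values 1..7:
    # 1=apxTrapID 2=apxJobkey 3=apxJobname 4=apxJobOID
    # 5=apxCondCode 6=apxJobstate 7=apxGroup
    printf '%s %s %s %s %s %s\n' "$host" "$3" "$5" "$2" "$7" "$1"
}
```

Called with the values from the log entry, the helper yields host, job name, condition code, job key, group and trap ID, which is the six-argument order the apx_service_trap_wrapper script expects.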
Define service (see the snmptt documentation)
define service{
    host_name               server01          ; Name of host
    service_description     TRAP              ; Name of service. What you use here must match the
                                              ; same value for the <submit_check_result> script
    is_volatile             1                 ; Enables volatile services
    check_command           check-host-alive  ; Used to reset the status to OK when "Schedule an
                                              ; immediate check of this service" is selected
    max_check_attempts      1                 ; Leave as 1
    normal_check_interval   1                 ; Leave as 1
    retry_check_interval    1                 ; Leave as 1
    passive_checks_enabled  1                 ; Enables passive checks
    check_period            none              ; When this service can be checked. Because it is a
                                              ; passive service, it never needs to be
                                              ; automatically checked
    notification_interval   31536000          ; Notification interval. Set to a very high number to
                                              ; prevent you from getting pages of previously
                                              ; received traps (1 year - restart Nagios at least
                                              ; once a year! - do not set to 0!)
    notification_period     24x7              ; When you can be notified. Can be changed
    notification_options    w,u,c,r           ; Notify on warning, unknown, critical and recovery
    notifications_enabled   1                 ; Enable notifications
    contact_groups          cg_core           ; Name of contact group to notify
}
Service template definition (APX example)
define service{
    name                    APX-SERVER
    register                0
    host_name               ops-apx01
    passive_checks_enabled  1
    notifications_enabled   1
    is_volatile             1
    check_period            none
    check_freshness         1
    max_check_attempts      1
    normal_check_interval   1
    flap_detection_enabled  0
    retry_check_interval    1
    contact_groups          apx-admins,apx-admins_sms
    notification_interval   31536000
    notification_period     24x7
}
define service{
    name                    APX-TRAP
    use                     APX-SERVER
    register                0
    # After 5 hours (18000 seconds) the trap service checks are reset
    freshness_threshold     18000
    check_command           apx_service_trap_reset
    notification_options    c,w
}
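The reset relies on freshness checking: with check_freshness enabled, Nagios runs the check_command once the last passive result is older than freshness_threshold. The following sketch shows only that comparison, in simplified form. Nagios performs the check internally and takes further factors into account, and the helper name is_stale is hypothetical.

```shell
#!/bin/sh
# Simplified sketch of the freshness rule; is_stale is a hypothetical helper.
is_stale() {
    # $1 = time of the last check result, $2 = current time,
    # $3 = freshness threshold (all values in seconds)
    [ $(( $2 - $1 )) -gt "$3" ]
}
```

With the threshold of 18000 seconds, a cancelled-job trap received at 12:15 stays in its WARNING or CRITICAL state until shortly after 17:15, when apx_service_trap_reset sets the service back to OK.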
Service definition (APX example)
define service{
    use                     APX-TRAP
    service_description     RP818100
}
Example wrapper: apx_heartbeat_trap_wrapper (1)
#!/bin/bash
# Author: M.Fuerstenau, OPS, Poing
# Purpose:
# Will be started from snmptt (config file /etc/snmptt/snmptt.conf.apx)
# Instead of being started directly from the mentioned config file
# the use of this wrapper was necessary due to the fact that the
# service name for the Nagios service has to be generated and
# that strings like "APXsev1" must be mapped to the appropriate
# Nagios return code
HOST=$1
NAGIOS_DIR=/etc/nagios
# Check whether the sending host exists at all. Otherwise exit.
grep ^$HOST$ $NAGIOS_DIR/hosts/* > /dev/null 2>&1
RETCO=$?
if [ $RETCO -ne 0 ]
then
    exit 1
fi
Example wrapper: apx_heartbeat_trap_wrapper (2)
# Determine the IP of the production host
PROD_HOST=$(grep host_name $NAGIOS_DIR/services/APX-TRAP-template.cfg | awk '{print $2}')
HOST_PROD_CFG=$(grep ^$PROD_HOST$ $NAGIOS_DIR/hosts/* | grep host_name | awk '{print $1}' | sed 's/:$//')
PROD_IP=$(grep address $HOST_PROD_CFG | grep -v '#' | awk '{print $2}')
# Determine the IP of the requesting host (it shows up sometimes with its
# host name (Poing) and sometimes with its IP (Venlo))
STR_2_GREP=$(grep ^$HOST$ $NAGIOS_DIR/hosts/* | grep -v alias | awk '{print $2}')
HOST_ASK_CFG=$(grep ^$HOST$ $NAGIOS_DIR/hosts/* | grep $STR_2_GREP | awk '{print $1}' | sed 's/:$//')
ASK_IP=$(grep address $HOST_ASK_CFG | grep -v '#' | awk '{print $2}')
if [ "$PROD_IP" = "$ASK_IP" ]
then
    /usr/lib/nagios/eventhandlers/submit_check_result $HOST APX_Heartbeat 0 "APX is running"
    exit 0
else
    exit 1
fi
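The chain of grep, awk and sed calls resolves a host name or an IP address to the address directive of the matching Nagios host definition. As a more compact sketch of the same idea, the lookup can be done in a single awk pass. It assumes that every host is declared in a "define host{ ... }" block with one directive per line and the closing brace on its own line. The function name lookup_address is hypothetical.

```shell
#!/bin/sh
# Sketch: find the "address" of a host in Nagios host definition files.
# lookup_address is a hypothetical helper, not the author's code.
lookup_address() {
    # $1 = host name or IP to look for, remaining arguments = host config files
    wanted=$1
    shift
    awk -v wanted="$wanted" '
        $1 ~ /^#/                       { next }                  # skip comment lines
        $1 == "define" && $2 ~ /^host/  { name = ""; addr = "" }  # new block starts
        $1 == "host_name"               { name = $2 }
        $1 == "address"                 { addr = $2 }
        # At the end of a block, print the address if name or address matched.
        $1 == "}" && (name == wanted || addr == wanted) { print addr; exit }
    ' "$@"
}
```

A wrapper could then compare "$(lookup_address "$PROD_HOST" $NAGIOS_DIR/hosts/*)" with "$(lookup_address "$HOST" $NAGIOS_DIR/hosts/*)". An empty result means the sender is unknown.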
Example (re)set: apx_heartbeat_alarm_set
#!/bin/sh
HOST=$1
/usr/lib/nagios/eventhandlers/submit_check_result $HOST APX_Heartbeat 2 "Heartbeat signal from APX is missing. Maybe system is not running!"
exit 2
Example wrapper: apx_service_trap_wrapper (1)
#!/bin/bash
# Author: M.Fuerstenau, OPS, Poing
# 18.11.2005
#
# Purpose:
# Will be started from snmptt (config file /etc/snmptt/snmptt.conf.apx)
# Instead of being started directly from the mentioned config file
# the use of this wrapper was necessary due to the fact that the
# service name for the Nagios service has to be generated and
# that strings like "APXsev1" must be mapped to the appropriate
# Nagios return code
# Check the number of arguments passed
if [ $# -ne 6 ]
then
    echo $*
    echo "Wrong number of arguments - exiting"
    exit 1
fi
Example wrapper: apx_service_trap_wrapper (2)
# Assign the passed arguments to variables
HOST=$1
APX_JOB_NAME=$2
APX_COND_CODE=$3
APX_JOB_KEY=$4
APX_GROUP=$5
APX_TRAP=$6
# Set global variables
NAGIOS_DIR=/etc/nagios
HAMLET=0
# Check whether the sending host exists at all. Otherwise exit.
grep ^$HOST$ $NAGIOS_DIR/hosts/* > /dev/null 2>&1
RETCO=$?
Example wrapper: apx_service_trap_wrapper (3)
if [ $RETCO -ne 0 ]
then
    exit 1
fi
# Determine the IP of the production host
PROD_HOST=$(grep host_name $NAGIOS_DIR/services/APX-TRAP-template.cfg | awk '{print $2}')
HOST_PROD_CFG=$(grep ^$PROD_HOST$ $NAGIOS_DIR/hosts/* | grep host_name | awk '{print $1}' | sed 's/:$//')
PROD_IP=$(grep address $HOST_PROD_CFG | grep -v '#' | awk '{print $2}')
# Determine the IP of the requesting host (it shows up sometimes with its
# host name (Poing) and sometimes with its IP (Venlo))
STR_2_GREP=$(grep ^$HOST$ $NAGIOS_DIR/hosts/* | grep -v alias | awk '{print $2}')
HOST_ASK_CFG=$(grep ^$HOST$ $NAGIOS_DIR/hosts/* | grep $STR_2_GREP | awk '{print $1}' | sed 's/:$//')
ASK_IP=$(grep address $HOST_ASK_CFG | grep -v '#' | awk '{print $2}')
Example wrapper: apx_service_trap_wrapper (5)
# Catastrophe or warning - that is the question
case "$APX_GROUP" in
    APXsev1) HAMLET=2 ;;
    APXsev2) HAMLET=1 ;;
    MNLsev1) HAMLET=2 ;;
    MNLsev2) HAMLET=1 ;;
    EUROCEsev1) HAMLET=2 ;;
    EUROCEsev2) HAMLET=1 ;;
    SMSTESTsev1) HAMLET=2 ;;
    SMSTESTsev2) HAMLET=1 ;;
    *) echo "Unknown groupcode - babe, it is impossible" ;;
esac
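All listed groups follow one rule: names ending in sev1 map to CRITICAL (2) and names ending in sev2 map to WARNING (1). A sketch that uses shell patterns for the suffix avoids extending the list for every new group. Two points are assumptions of this sketch and not the author's code: the function name severity_for_group, and mapping unknown groups to UNKNOWN (3), whereas the wrapper leaves HAMLET at 0.

```shell
#!/bin/sh
# Sketch: derive the Nagios return code from the suffix of the APX group name.
# severity_for_group is a hypothetical helper.
severity_for_group() {
    case "$1" in
        *sev1) echo 2 ;;   # CRITICAL
        *sev2) echo 1 ;;   # WARNING
        *)     echo 3 ;;   # UNKNOWN (assumption; the original wrapper keeps 0)
    esac
}
```

In the wrapper this would replace the case statement with HAMLET=$(severity_for_group "$APX_GROUP").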
Example wrapper: apx_service_trap_wrapper (6)
#
# Check whether the host sending the trap is the monitored one.
# If so, continue; otherwise exit.
if [ "$PROD_IP" = "$ASK_IP" ]
then
    /usr/lib/nagios/eventhandlers/submit_check_result $HOST $APX_JOB_NAME $HAMLET "Job $APX_JOB_NAME cancelled - Condition Code: $APX_COND_CODE APX JobName $APX_JOB_KEY - APX Group $APX_GROUP - APX Trap ID $APX_TRAP"
    exit 0
else
    exit 1
fi
Example reset: apx_service_trap_reset
#!/bin/sh
echo "Ok. No error. No warning."
exit 0
Example: start and stop
■ Excerpt from apx_start_trap_wrapper:
if [ "$PROD_IP" = "$ASK_IP" ]
then
    /usr/lib/nagios/eventhandlers/submit_check_result $HOST APX_Start_Stop 0 "APX started"
    exit 0
else
    exit 1
fi
■ Excerpt from apx_stop_trap_wrapper:
if [ "$PROD_IP" = "$ASK_IP" ]
then
    /usr/lib/nagios/eventhandlers/submit_check_result $HOST APX_Start_Stop 2 "APX shutdown"
    exit 0
else
    exit 1
fi