AARNet - QUESTnet

Questnet Workshop on
Network Monitoring
Network Operations
Mike Groeneweg
[email protected]
Copyright AARNet Pty Ltd 2009
Network Monitoring
• Network monitoring - from Wikipedia, the free encyclopedia
The term network monitoring describes the use of a system that
constantly monitors a computer network for slow or failing
components and that notifies the network administrator in case
of outages via email, pager or other alarms. It is a subset of the
functions involved in network management.
Network Monitoring
• 26 core routers (xxx-a/b-bb1) Juniper
• 12 core switches (Cisco 7609)
• 14 small core switches (Foundry FESX-424)
• >120 CPE routers (Cisco 7304, Cisco 7604)
• >120 CPE servers (Acer 1RU 300/310/510/520)
• >70 optical circuits
Optical Monitoring – outsourced to Cisco (-> SOUL)
AARNet Tools
AARNet Monitoring Tools
• Nagios
• MRTG & Cricket
– Weathermap
• Netflow
– Nullarbor
• IP Routing - RADB & Quagga
• Smokeping, Traceroute, iPerf
• Beacon/Dbeacon
• Logging - Splunk
• SOUL & Cisco
1 Portal to rule them all…
The other one - Corvu - Dashboard
Customer portal for Traffic Data
Based on Classified Netflow Data
Basis for billing
• http://dashboard.aarnet.edu.au/
NAGIOS
NAGIOS
Nagios Ain't Gonna Insist On Sainthood
• 2 primary monitoring servers (Perth, Sydney)
• 2 secondary monitoring servers (Melbourne, Sydney)
642 hosts
1678 services
• AARNet running Nagios 1.4.x, 2.x, 3.2.x
Forked recently – ICINGA.
NAGIOS Plugins
• Use the usual suite provided by nagios-plugins 1.4.x
• Custom plugins:
– to check BGP sessions (ipv4)
– to check environmental conditions (effectively an SNMP check)
– to check on netflow (another SNMP check with some smarts)
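The environmental check above is described as effectively an SNMP read compared against limits. A minimal sketch of that plugin logic follows, assuming the temperature has already been fetched (the SNMP query itself is left out). The function name and the warn/crit values are hypothetical; the 0/1/2/3 return codes are the standard Nagios plugin states.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(temp_c, warn=45, crit=55, sensor="Routing Engine 0"):
    """Return (exit_code, status_line) for one temperature reading.

    temp_c is assumed to come from an SNMP GET done elsewhere; the
    warn/crit defaults are illustrative, not AARNet's real thresholds.
    """
    if temp_c is None:
        return UNKNOWN, f"UNKNOWN: no temperature reading for {sensor}"
    if temp_c >= crit:
        return CRITICAL, f"CRITICAL: Temp {temp_c} degrees on {sensor}"
    if temp_c >= warn:
        return WARNING, f"WARNING: Temp {temp_c} degrees on {sensor}"
    return OK, f"OK: Temp {temp_c} degrees on {sensor}"
```

A real plugin would print the status line to stdout and exit with the returned code, which is how Nagios learns the service state.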
NAGIOS Notifications
• E-Mail to [email protected]
• SMS notifications to Ops Engineers
– Modified format for 160 characters
Nagios Text        SMS Substitute
PROBLEM            -
RECOVERY           +
ACKNOWLEDGEMENT    ACK

Example:
- lax-a-bb1 (202.158.192.20) TEMP 'CRITICAL: Temp 59 degrees on Routing Engine 0' at 28-08-2009 06:36:39 Los Angeles (Telehouse) POP A Backbone Router
• Falcomm Tango GSM modem (www.esis.com.au)
• Smstools (http://smstools.meinemullemaus.de/)
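The 160-character SMS format can be sketched as a small formatting function. This is an illustration only: the field order is inferred from the example message on this slide, and the function name and parameters are hypothetical rather than the actual AARNet notification command.

```python
SMS_LIMIT = 160

# Substitutions from the slide: long Nagios notification types are
# replaced by short markers to save characters.
SUBSTITUTES = {"PROBLEM": "-", "RECOVERY": "+", "ACKNOWLEDGEMENT": "ACK"}


def format_sms(notification_type, host, address, service, output, when, alias):
    """Build a compact SMS body and truncate it to 160 characters."""
    marker = SUBSTITUTES.get(notification_type, notification_type)
    text = f"{marker} {host} ({address}) {service} '{output}' at {when} {alias}"
    return text[:SMS_LIMIT]


msg = format_sms(
    "PROBLEM", "lax-a-bb1", "202.158.192.20", "TEMP",
    "CRITICAL: Temp 59 degrees on Routing Engine 0",
    "28-08-2009 06:36:39",
    "Los Angeles (Telehouse) POP A Backbone Router",
)
```

Putting the marker, host and service first means the most useful part of the alarm survives if the tail is truncated.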
AARNet POP Setup
AARNet Core Monitoring – per region POPs
• Loopback addresses of Backbone Routers
• Loopback addresses of Distribution Switches
• 10G Trunk interfaces on intra-capital circuit
• 10G interfaces between Backbone Router & Switch at each POP
• All WAN interfaces (SDH to next State or Overseas)
• All servers
• Access router
• Any peering routers
AARNet CPE Basics – single connection
AARNet CPE Monitoring – single connection
• For a single connection:
1. Gig1 – Customer Interface
2. Gig 2 – edge server interface
3. Loopback interface of CPE router
4. BGP Status of CPE router
5. IPv6 interface of CPE router
6. Gig port of Core Switch
7. (optional) GHIP/Mux port
8. (optional) Port Based Service interface
AARNET CPE Basics – dual connection
AARNet CPE Monitoring – dual connections
• For a dual connection:
1. Gig1 – Customer Interface
2. Gig 2 – edge server interface
3. Loopback interface of CPE routers
4. BGP Status of CPE router
5. IPv6 interface of CPE router
6. Gig port of Backbone Router
7. (optional) GHIP/Mux port
8. (optional) Port Based Service interface
Nagios Configuration
• Now controlled by cfengine
• Each router, switch, target is a separate config file
#
# $Id: cpe-aarnet-er3.cfg 2815 2008-05-14 14:51:31Z mkg $
#
define host {
    use                     critical-host
    host_name               cpe-aarnet-er3
    alias                   WA APL Edge Router
    address                 202.158.192.217
    parents                 per-a-bb1, per-b-bb1
    hostgroups              apl-office-routers,edge-routerscisco7300
}
define service{
    use                     critical-service
    host_name               cpe-aarnet-er3
    service_description     BGP
    contact_groups          network-eng
    check_command           bgp_peer!public
}
define service{
    use                     critical-service
    host_name               cpe-aarnet-er3
    service_description     Gi1-APL-Perth
    contact_groups          network-eng
    notification_options    c,r
    check_command           ifOperStatus!public!4
}
define host {
    use                     critical-ipv6-host
    host_name               cpe-aarnet-er3-IPv6
    hostgroups              router-v6
    alias                   cpe-aarnet-er3 IPv6
    address                 2001:388:1::d9
    parents                 cpe-aarnet-er3
}
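Because each router, switch or target lives in its own config file, the host stanza lends itself to templating. The sketch below shows one way this could be done; it is not the cfengine setup described on the slide, and the template and function names are hypothetical. The directive names themselves are the ones used in the example file above.

```python
from string import Template

# Hypothetical template mirroring the host stanza shown above.
HOST_TEMPLATE = Template("""\
define host {
    use         $use
    host_name   $host_name
    alias       $alias
    address     $address
    parents     $parents
    hostgroups  $hostgroups
}
""")


def render_host(use, host_name, alias, address, parents, hostgroups):
    """Render one Nagios host definition; lists are joined with commas."""
    return HOST_TEMPLATE.substitute(
        use=use,
        host_name=host_name,
        alias=alias,
        address=address,
        parents=",".join(parents),
        hostgroups=",".join(hostgroups),
    )


stanza = render_host(
    "critical-host", "cpe-aarnet-er3", "WA APL Edge Router",
    "202.158.192.217", ["per-a-bb1", "per-b-bb1"], ["apl-office-routers"],
)
```

Generating the stanza from structured data avoids hand-edited separators, which is one common source of config errors.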
Looking in from the outside
AARNet runs a Nagios instance on a server outside of the
network.
Is routing working?
Nagios 2.x
to be upgraded to 3.x
not synced for configuration.
Is Nagios the best?
http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
Contact management for AARNet Ops is an issue.
blasts of SMS alarms, going to all engineers
should use service hierarchy, not implemented.
Looking to regionalise alarms – may need to write backend plugin.
ICINGA?
Nagios is well known and commercial support is available.
2009 Conference: http://www.netways.de/en/osmc/y2009/uebersicht/
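Regionalising alarms, as raised on this slide, could start from the POP naming scheme (xxx-a/b-bb1). The sketch below derives a region code from the host name and maps it to a contact group. The regional group names are hypothetical; only network-eng appears in the configuration shown earlier, and it is used here as the fallback.

```python
import re

# Hypothetical region -> contact group mapping.
REGION_CONTACTS = {"per": "network-eng-per", "syd": "network-eng-syd"}

# POP hosts look like per-a-bb1 or syd-a-noc1: three letters, then a or b.
POP_HOST = re.compile(r"^([a-z]{3})-[ab]-")


def contact_group(host_name, default="network-eng"):
    """Pick a regional contact group, falling back to the default group."""
    match = POP_HOST.match(host_name)
    if match:
        return REGION_CONTACTS.get(match.group(1), default)
    return default
```

Hosts that do not follow the POP pattern, such as cpe-aarnet-er3, fall through to the default group, so no alarm is left without a recipient.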
Current issues with Nagios
• HTTP checks for syd-a-noc1 failing
• Regionalised notifications
• Configuration files in CFEngine lead to config errors
• Nagios 1.x group of groups – fixed in 3.x
• SNMP Traps
• Interface monitoring to be improved
MRTG & Cricket
MRTG Stats
• Collect stats from all router & switch interfaces
• Auto-generate interface list nightly, at 00:00 UTC
• SNMP polling for IPv4, needed script for IPv6
Weathermap IPv4
Doing v6 stats…
Current SNMP MIBs for Junos do not support v6 collection
Nor do Cisco MIBs
… have to use an expect script.
Gather data
Store in text files
Process into RRD
… produces MRTG Graphs, and a v6 Weathermap
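The gather, store, process steps above can be sketched for the text-file-to-RRD stage. The line format used here is an assumption (the slide does not show what the expect script writes), and the function names are hypothetical. The command built is the usual rrdtool update form with a timestamp followed by colon-separated values; it is returned, not executed.

```python
def parse_sample(line):
    """Parse one line of the assumed text format:
    '<unix_ts> <interface> <in_octets> <out_octets>'."""
    ts, iface, in_octets, out_octets = line.split()
    return int(ts), iface, int(in_octets), int(out_octets)


def rrd_update_command(rrd_path, ts, in_octets, out_octets):
    """Build the argv for 'rrdtool update' with one timestamped sample
    carrying two data sources (in, out)."""
    return ["rrdtool", "update", rrd_path, f"{ts}:{in_octets}:{out_octets}"]


ts, iface, v6_in, v6_out = parse_sample("1251441399 ge-1-0-6 1000 2000")
cmd = rrd_update_command(f"{iface}-v6.rrd", ts, v6_in, v6_out)
```

The resulting list could be passed to subprocess.run once the RRD file exists with matching data sources.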
Weathermap IPv6
Netflow
Netflow
AARNet Netflow for customers is collected on CPE Server
Logs processed to give billing data.
Available on CPE portal for customers, and via Corvu for
managers.
Allows AARNet to do analysis – for traffic management.
200Gb of binary flow data per day
Nullarbor
In-house tool, written to analyse network traffic flows.
Netflow Traffic
CPE server classifies the netflow data
Classified Data uploaded into MySQL database
Corvu collection script pulls data from Federated MySQL setup
Customer billing from Corvu information
SNMP stats check.
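The classify-then-aggregate flow above can be illustrated with a toy classifier. Everything specific here is an assumption: the slide does not give AARNet's classification rules, so the two classes (on-net, off-net), the documentation prefix standing in for on-net address space, and the record layout are all hypothetical.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Placeholder on-net prefix (a documentation range), not a real policy.
ON_NET = [ip_network("192.0.2.0/24")]


def classify(dst, on_net=ON_NET):
    """Label a flow by whether its destination is inside an on-net prefix."""
    addr = ip_address(dst)
    return "on-net" if any(addr in net for net in on_net) else "off-net"


def summarise(flows):
    """Sum bytes per (customer, class) from (customer, dst_ip, bytes) records."""
    totals = defaultdict(int)
    for customer, dst, nbytes in flows:
        totals[(customer, classify(dst))] += nbytes
    return dict(totals)


totals = summarise([
    ("uwa", "192.0.2.10", 500),
    ("uwa", "198.51.100.7", 1500),
    ("uwa", "192.0.2.99", 250),
])
```

Per-class totals like these are the kind of rows that would be loaded into a database for a billing report to read.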
IP Routing Monitoring
Quagga & RADB
Monitoring Routing
Monitoring a dynamic routing system is difficult…
• Use logs from Quagga
• All CPE servers run a copy of Quagga
• BGP logging occurs on the CPE
• Provides Looking Glass facility to customers
Looking Glass – server software
Looking Glass in action…
BGP Events
• BGP Play http://www.ibgplay.org/
• http://bgplay.routeviews.org/bgplay/
• http://www.ris.ripe.net/bgpviz/
• Cyclops
http://cyclops.cs.ucla.edu/
– Routing anomalies – bogus ASNs
RPSL
Routing Policy Specification Language
• The Routing Policy Specification Language (RPSL) is a language commonly used by
ISPs to describe their routing policies.
• The routing policies are stored at various whois databases including RIPE, RADB and
APNIC. ISPs (using automated tools) then generate router configuration files that match
their business and technical policies.
• RFC 2622 describes RPSL, and replaced RIPE-181. ( http://www.irr.net/docs/rfc2622.txt )
• RFC 2650 provides a reference tutorial to using RPSL in the real world.
• RPSL has been extended with RPSL-NG (RPSL-Next Generation) effort to support IPv6
routing policies and multicast routing policies. RPSL-NG is defined in RFC 4012.
From Wikipedia: http://en.wikipedia.org/wiki/RPSL
RIPE NCC
Good information (the source) on RPSL at RIPE NCC pages.
http://www.ripe.net/ripencc/pub-services/db/rpsl/
Excellent training guides:
http://www.ripe.net/training/rr/material/
RIPE Additional Resources (from training page)
• RIPE Database:
– RIPE Database Reference Manual
– RIPE Database User Manual: Getting Started
• RPSL:
– Routing Policy Specification Language (RFC-2622)
– Routing Policy System Security (RFC-2725)
– Using RPSL in Practice (RFC-2650)
• IRRToolset:
– IRRToolset project page
– Download tools (FTP Site Maintained by ISC)
• Routing Information Service Project
• Routing Registry Consistency Project
RADB – Merit Networks
http://www.ra.net/
AARNet uses RADB.
Fee per annum, world wide access to all operators.
More tutorial info here:
http://www.ra.net/tutorials.html
RADB & monitoring?
Ops have developed a script that probes all routers, looks at
advertisements & routes
Compares results against AS7575 policy
Produces a report… looking for anomalies.
Last sweep early August 2009, found 5 anomalies….
- affects routing for the client
- MSDP, mBGP filters
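The comparison step of such a sweep can be sketched as a set difference between what routers advertise and what the registered policy lists. This is not the Ops script itself: the function name is hypothetical, the prefixes in the usage lines are documentation ranges, and fetching routes from routers or RADB is left out.

```python
from ipaddress import ip_network


def find_anomalies(advertised, registered):
    """Compare advertised prefixes with registered policy prefixes.

    Returns (unregistered, unadvertised) as sorted lists of strings.
    Inputs are assumed to be IPv4 prefixes only, since IPv4 and IPv6
    networks cannot be sorted together.
    """
    adv = {ip_network(p) for p in advertised}
    reg = {ip_network(p) for p in registered}
    unregistered = [str(n) for n in sorted(adv - reg)]
    unadvertised = [str(n) for n in sorted(reg - adv)]
    return unregistered, unadvertised


unregistered, unadvertised = find_anomalies(
    ["192.0.2.0/24", "198.51.100.0/24"],
    ["192.0.2.0/24", "203.0.113.0/24"],
)
```

Both lists matter: an unregistered advertisement may be filtered by peers, while a registered but unadvertised prefix may point to a stale policy entry.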
Network Performance
Smokeping, Traceroute,
iPerf
Traceroute
• Tells you the path…
• But it’s only ever half the story. How do they get back to us?

traceroute to 203.25.27.226 (203.25.27.226), 30 hops max, 60 byte packets
 1 villa.net.uwa.edu.au (130.95.128.254) 1.266 ms 1.664 ms 1.655 ms
 2 gigabitethernet1.er1.uwa.cpe.aarnet.net.au (202.158.198.249) 2.081 ms 2.369 ms 2.993 ms
 3 ge-1-0-6.bb1.a.per.aarnet.net.au (202.158.198.241) 2.985 ms 3.384 ms 3.376 ms
 4 so-0-1-0.bb1.a.adl.aarnet.net.au (202.158.194.6) 29.006 ms 29.063 ms 29.438 ms
 5 so-0-1-0.bb1.a.mel.aarnet.net.au (202.158.194.18) 38.073 ms 38.091 ms 38.331 ms
 6 so-0-1-0.bb1.b.syd.aarnet.net.au (202.158.194.34) 49.553 ms 49.701 ms 50.011 ms
 7 tenge-2-1.pe2.c.syd.aarnet.net.au (202.158.194.196) 50.004 ms 50.331 ms 51.070 ms
 8 101.ge-0-0-0.br1.syd6.alter.net (203.103.244.193) 50.354 ms 50.604 ms 51.046 ms
 9 0.ge-6-1-0.XT3.SYD2.ALTER.NET (210.80.32.197) 50.314 ms 50.196 ms 50.439 ms
10 0.ge-7-1-0.XT3.SYD4.Alter.Net (210.80.34.57) 50.427 ms 50.703 ms 50.986 ms
11 so-6-0-0.GW4.SYD4.ALTER.NET (210.80.33.242) 50.676 ms 50.263 ms 50.159 ms
12 nextgen-syd4-gw.aspac.customer.alter.net (210.80.189.42) 51.619 ms 52.032 ms 52.406 ms
13 s-br1-sydn-1.pr3.wllt.nxg.net.au (121.200.224.248) 51.030 ms 50.592 ms 51.005 ms
14 121.200.227.190 (121.200.227.190) 49.816 ms 50.412 ms 51.217 ms
15 Gig5-1.cor-1.per.brightonline.com.au (203.161.15.2) 50.829 ms 51.190 ms 51.857 ms
16 Gig3-1.esr.coll.brightonline.com.au (203.161.15.33) 51.544 ms 51.830 ms 52.236 ms
17 203.161.14.29 (203.161.14.29) 52.227 ms 52.705 ms 53.266 ms
18 ge0-24.core.per.highway1.net.au (203.34.31.2) 52.681 ms 53.243 ms 53.786 ms
19 203.25.27.226 (203.25.27.226) 50.664 ms 50.356 ms *

Tracing route to 130.95.26.1 over a maximum of 30 hops
 1   <1 ms   <1 ms   <1 ms  10.0.70.1
 2   <1 ms   <1 ms   <1 ms  10.0.64.1
 3   <1 ms   <1 ms   <1 ms  perisa1.zettaserve.com [10.0.65.2]
 4   <1 ms    1 ms   <1 ms  gi0-12.147.core.per.highway1.net.au [203.32.127.1]
 5    1 ms    1 ms    1 ms  ge0-2.border.per.highway1.net.au [203.34.31.1]
 6    1 ms    1 ms    1 ms  aarnet.ix.waia.asn.au [198.32.212.7]
 7   49 ms   49 ms   49 ms  ge-1-0-3.bb1.a.per.aarnet.net.au [202.158.198.1]
 8  266 ms  211 ms  223 ms  gigabitethernet0.er1.uwa.cpe.aarnet.net.au [202.158.198.242]
 9   50 ms   50 ms   50 ms  gw1.er1.uwa.cpe.aarnet.net.au [202.158.198.250]
Smokeping
Smokeping running on all CPE EdgeServers
Measuring latency across network to EXT POP Targets
Latency not monitored actively, but will be done one day.
Smokeping
iPerf
• Network Performance Tool
• Not used for active monitoring, used for debugging and
verification of network.
• Intend to run iPerf up on permanently configured 10G servers –
but could lead to a Denial of Service
Test Kit: details
• Last year:
– 1RU Acer 2.0GHz Intel Xeon servers with CentOS5
• 3 Gbps with standard kernel (2.6.18)
• 6 Gbps with custom kernel 2.6.24
• Updated myri10ge drivers improved performance & stopped hangs
• 3.16GHz desktop E8500 approached 10 Gbps
• Got fastest available Xeon at 3.16 GHz – gave 9.8 Gbps without
being cpu limited.
• Servers were expensive, and not very portable
– Exclusively used the myricom cards with LR singlemode XFP optics
Test kit: Last year Acer R520 & DIY system
Test Kit: cards – myricom XFP & intel dual SFP+
Test Kit: New server details
• Tried new intel core-i7 processors when newly released
– Best performance yet with the slower 2.67 GHz.
• $500 processor getting wirespeed – faster version was 4 x price
– System price around $1600, plus card at $695 + $900 US
• Buy as small a system as possible with same processor
– System ordered
– Delays expected
– “If you change to processor … we can deliver”
– In the end delivered with new technology 2.13 GHz Xeon
– Client results disappointed a little. – 8, 7.5, 6.5 Gbps as sender
– Performed well as server – up to wirespeed
Test Kit: New servers: core i7 & small 1RU
Measurement tool: iperf
• Memory to memory transfers
• Lots of options – option order is important.
• Command line clients/servers for range of operating systems
– Linux, windows, OSX & other unix.
• Supports TCP and UDP
• Don’t set manual TCP windows – disables window auto-scaling
• Conventional wisdom: use UDP
– Bypasses any TCP tuning/buffering issues. – congestion risk.
– Need undocumented –l option to set jumbo datagram size
– Don’t do it! – tune your TCP stack for better results & side-step
congestion.
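The advice to tune the TCP stack comes down to the bandwidth-delay product: the amount of data that must be in flight to keep a path full. A worked sketch follows; the function name is hypothetical, and the 50 ms figure is roughly the Perth to Sydney round trip seen in the traceroute earlier in this deck.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes.

    bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes,
    hence the single division by 8000.
    """
    return bandwidth_bps * rtt_ms / 8000


# 10 Gbps path with a 50 ms round-trip time.
window = bdp_bytes(10_000_000_000, 50)
```

Here window is 62.5 million bytes, so socket buffer limits well above common operating system defaults are needed before a single TCP stream can fill a 10G path at that latency.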
10Gbps Test Results
Iperf examples: TCP
Iperf Example: UDP
Links
• IPERF – http://iperf.sourceforge.net/
• NDT - http://e2epi.internet2.edu/ndt/
• Internet2 Performance Workshops
– Materials http://www.internet2.edu/workshops/npw/materials.html
• Search “TCP Performance Tuning” for your favorite OS
Logs
Splunk
SPLUNK
Logs collected on one server
Use GREP/AWK/SED & Perl sometimes.
Also have SPLUNK.
but not integrated with Nagios as yet.
Multicast Monitoring
-- Beacon/Dbeacon
Multicast Monitoring?
Beacon – http://beacon.dat.nlanr.net/ (233.4.200.18)
AARNet Beacon http://beacon2.aarnet.net.au/ (233.70.142.1)
Customer complaints…. ie GRID not working, Evo broken.
No nagios monitoring to date.
Beacon or Dbeacon
• Original beacon
• http://sourceforge.net/projects/multicastbeacon/
Newer Dbeacon (C++)
• http://freshmeat.net/projects/dbeacon/
Eduroam Monitoring System
Eduroam Monitoring
Eduroam Technical Working group – consists of University
Wireless & Network Administrators and AARNet staff.
Started working on developing a monitoring solution.
1. Monitoring from the core -> towards an Eduroam member.
2. Monitoring from within the member -> toward the internet.
Eduroam Monitoring CORE->Member
• Using NAGIOS to send test authentication to National Server,
pretending to be a real institution – and using the credentials
of the target Eduroam Member.
Specific plugin “rad_eap_test”
Script developed for Eduroam.CZ
http://www.eduroam.cz/rad_eap_test/rad_eap_test.html
Eduroam Monitoring CORE->Member
Eduroam Monitoring inside -> Internet
Needs to be a client of the Eduroam Network
– Laptop?
– Cheap mini-PC with 802.11a/b/g/n?
– PDA/iPhone/iPod?
– Opensource WAP?
Eduroam Monitoring inside -> Internet
Currently evaluating flashing Linksys WRT54G
• http://wiki.openwrt.org/oldwiki/openwrtdocs/hardware/linksys/wrt54g
Optical Networks
Optical Circuits
• Outsourced Monitoring to Cisco
• Contract is currently with SOUL Telecommunications
• Limited visibility for AARNet NOC
• Limited visibility on dark fibre services
1300 APL NOC
User input…
Sometimes we can’t see it, nor predict it.
• Dark Fibre paths
• Perceived performance issues
• WOW, Gamers, Students, Users…
• 3rd party notifications (Telstra, Nextgen, Powertel etc)
Automation
202.158.192.0/24
All of our routers have loopback addresses on 1 netblock.
Allows for discovery of new hosts and devices.
In theory, with RADB mapping out all IPv4 routing, NMAP of the
network devices, OSPF probe…
… monitoring could be completely automated.
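With every router loopback inside one netblock, discovery can be reduced to a comparison: which addresses in the block answer a probe but are not yet monitored. The sketch below covers only that comparison; the probe itself (NMAP, OSPF) is assumed to have produced the list of responding addresses, and the function name is hypothetical.

```python
from ipaddress import ip_address, ip_network

# Loopback netblock from the slide.
LOOPBACK_BLOCK = ip_network("202.158.192.0/24")


def unmonitored(responding, monitored, block=LOOPBACK_BLOCK):
    """Addresses inside the block that responded but are not monitored."""
    known = {ip_address(a) for a in monitored}
    found = {ip_address(a) for a in responding if ip_address(a) in block}
    return [str(a) for a in sorted(found - known)]


new_hosts = unmonitored(
    ["202.158.192.20", "202.158.192.217", "10.0.0.1"],
    ["202.158.192.20"],
)
```

Each address returned is a candidate for a new host definition, which is the step that would close the loop toward fully automated monitoring.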
Auto Configuration?
Google (as presented at AusNOG02) can autogenerate every
router on their network.
Should AARNet?
- Network Configurations?
- Network Monitoring?