AARNet - QUESTnet
Questnet Workshop on Network Monitoring
Network Operations
Mike Groeneweg
[email protected]
Copyright AARNet Pty Ltd 2009

Network Monitoring
• Network monitoring - from Wikipedia, the free encyclopedia
  The term network monitoring describes the use of a system that constantly monitors a computer network for slow or failing components and that notifies the network administrator in case of outages via email, pager or other alarms. It is a subset of the functions involved in network management.

Network Monitoring
• 26 core routers (xxx-a/b-bb1) Juniper
• 12 core switches (Cisco 7609)
• 14 small core switches (Foundry FESX-424)
• >120 CPE routers (Cisco 7304, Cisco 7604)
• >120 CPE servers (Acer 1RU 300/310/510/520)
• >70 optical circuits
  – Optical Monitoring – outsourced to Cisco (-> SOUL)

AARNet Tools

AARNet Monitoring Tools
• Nagios
• MRTG & Cricket – Weathermap
• Netflow – Nullarbor
• IP Routing - RADB & Quagga
• Smokeping, Traceroute, iPerf
• Beacon/Dbeacon
• Logging - Splunk
• SOUL & Cisco

1 Portal to rule them all…

The other one - Corvu - Dashboard
Customer portal for Traffic Data
Based on Classified Netflow Data
Basis for billing
• http://dashboard.aarnet.edu.au/

NAGIOS

NAGIOS
Nagios Ain't Gonna Insist On Sainthood
• 2 primary monitoring servers (Perth, Sydney)
• 2 secondary monitoring servers (Melbourne, Sydney)
  642 hosts
  1678 services
• AARNet running Nagios 1.4.x, 2.x, 3.2.x
Forked recently – ICINGA.

NAGIOS Plugins
• Use the usual suite provided by nagios-plugins 1.4.x
• Custom plugins:
  – to check BGP sessions (ipv4)
  – to check environmental conditions (effectively an SNMP check)
  – to check on netflow (another SNMP check with some smarts)

NAGIOS Notifications
• E-Mail to [email protected]
• SMS notifications to Ops Engineers
  – Modified format for 160 characters

    Nagios Text        SMS Substitute
    PROBLEM            -
    RECOVERY           +
    ACKNOWLEDGEMENT    ACK

    - lax-a-bb1 (202.158.192.20) TEMP 'CRITICAL: Temp 59 degrees on Routing Engine 0' at 28-08-2009 06:36:39 Los Angeles (Telehouse) POP A Backbone Router

• Falcomm Tango GSM modem (www.esis.com.au)
• Smstools (http://smstools.meinemullemaus.de/)

AARNet POP Setup

AARNet Core Monitoring – per region POPs
• Loopback addresses of Backbone Routers
• Loopback addresses of Distribution Switches
• 10G Trunk interfaces on intra-capital circuit
• 10G interfaces between Backbone Router & Switch at each POP
• All WAN interfaces (SDH to next State or Overseas)
• All servers
• Access router
• Any peering routers

AARNet CPE Basics – single connection

AARNet CPE Monitoring – single connection
• For a single connection:
  1. Gig1 – Customer Interface
  2. Gig 2 – edge server interface
  3. Loopback interface of CPE router
  4. BGP Status of CPE router
  5. IPv6 interface of CPE router
  6. Gig port of Core Switch (optional)
  7. GHIP/Mux port (optional)
  8. Port Based Service interface

AARNet CPE Basics – dual connection

AARNet CPE Monitoring – dual connections
• For a dual connection:
  1. Gig1 – Customer Interface
  2. Gig 2 – edge server interface
  3. Loopback interface of CPE routers
  4. BGP Status of CPE router
  5. IPv6 interface of CPE router
  6. Gig port of Backbone Router (optional)
  7. GHIP/Mux port (optional)
  8. Port Based Service interface

Nagios Configuration
• Now controlled by cfengine
• Each router, switch, target is a separate config file

    #
    # $Id: cpe-aarnet-er3.cfg 2815 2008-05-14 14:51:31Z mkg $
    #
    define host {
        use                     critical-host
        host_name               cpe-aarnet-er3
        alias                   WA APL Edge Router
        address                 202.158.192.217
        parents                 per-a-bb1, per-b-bb1
        hostgroups              apl-office-routers,edge-routerscisco7300
    }

    define service{
        use                     critical-service
        host_name               cpe-aarnet-er3
        service_description     BGP
        contact_groups          network-eng
        check_command           bgp_peer!public
    }

    define service{
        use                     critical-service
        host_name               cpe-aarnet-er3
        service_description     Gi1-APL-Perth
        contact_groups          network-eng
        notification_options    c,r
        check_command           ifOperStatus!public!4
    }

    define host {
        use                     critical-ipv6-host
        host_name               cpe-aarnet-er3-IPv6
        hostgroups              router-v6
        alias                   cpe-aarnet-er3 IPv6
        address                 2001:388:1::d9
        parents                 cpe-aarnet-er3
    }

Looking in from the outside
AARNet runs a Nagios instance on a server outside of the network.
Is routing working?
Nagios 2.x, to be upgraded to 3.x; not synced for configuration.

Is Nagios the best?
http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
Contact management for AARNet Ops is an issue: blasts of SMS alarms go to all engineers. Should use service hierarchy, which is not implemented.
Looking to regionalise alarms – may need to write backend plugin.
ICINGA?
Nagios is well known and commercial support is available.
2009 Conference: http://www.netways.de/en/osmc/y2009/uebersicht/

Current issues with Nagios
• HTTP checks for syd-a-noc1 failing
• Regionalised notifications
• Configuration files in CFEngine lead to config errors
• Nagios 1.x group of groups – fixed in 3.x
• SNMP Traps
• Interface monitoring to be improved

MRTG & Cricket

MRTG Stats
• Collect stats from all router & switch interfaces
• Auto-generate interface list nightly, at 00:00 UTC
• SNMP polling for IPv4; a script was needed for IPv6

Weathermap IPv4

Doing v6 stats…
Current SNMP MIBs for Junos do not support v6 collection
Nor do Cisco MIBs
… have to use an expect script.
Gather data
Store in text files
Process into RRD
… produces MRTG Graphs, and a v6 Weathermap

Weathermap IPv6

Netflow

Netflow
AARNet Netflow for customers is collected on CPE Server
Logs processed to give billing data.
Available on CPE portal for customers, and via Corvu for managers.
Allows AARNet to do analysis – for traffic management.
200Gb of binary flow data per day

Nullarbor
In-house tool, written to analyse network traffic flows.

Netflow Traffic
CPE server classifies the netflow data
Classified Data uploaded into MySQL database
Corvu collection script pulls data from federated MySQL setup
Customer billing from Corvu information
SNMP stats check.
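
The classify-then-bill flow on the Netflow Traffic slide can be sketched roughly as below. This is only an illustration of summing classified flow records into per-customer totals; the record layout, the customer names and the traffic class names ("on-net", "off-net") are assumptions, not AARNet's actual schema or classification.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class ClassifiedFlow(NamedTuple):
    """One already-classified flow record (hypothetical layout)."""
    customer: str
    traffic_class: str
    octets: int


def summarise(flows: Iterable[ClassifiedFlow]) -> Dict[str, Dict[str, int]]:
    """Sum octets per customer and traffic class, as a billing input."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        totals[flow.customer][flow.traffic_class] += flow.octets
    # Convert nested defaultdicts to plain dicts for storage or comparison.
    return {cust: dict(classes) for cust, classes in totals.items()}


# Made-up sample records; names and classes are placeholders.
SAMPLE = [
    ClassifiedFlow("uni-a", "on-net", 1000),
    ClassifiedFlow("uni-a", "off-net", 4000),
    ClassifiedFlow("uni-a", "off-net", 500),
    ClassifiedFlow("uni-b", "on-net", 250),
]

if __name__ == "__main__":
    for customer, classes in summarise(SAMPLE).items():
        print(customer, classes)
```

In a setup like the one described, the per-customer totals would be what gets uploaded to the database and compared against SNMP interface counters as a sanity check.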

IP Routing Monitoring
Quagga & RADB

Monitoring Routing
Monitoring a dynamic routing system is difficult…
• Use logs from Quagga
• All CPE servers run a copy of Quagga
• BGP logging occurs on the CPE
• Provides Looking Glass facility to customers

Looking Glass – server software

Looking Glass in action…

BGP Events
• BGP Play http://www.ibgplay.org/
• http://bgplay.routeviews.org/bgplay/
• http://www.ris.ripe.net/bgpviz/
• Cyclops http://cyclops.cs.ucla.edu/
  – Routing anomalies
  – bogus ASNs

RPSL
Routing Policy Specification Language
• The Routing Policy Specification Language (RPSL) is a language commonly used by ISPs to describe their routing policies.
• The routing policies are stored at various whois databases including RIPE, RADB and APNIC. ISPs (using automated tools) then generate router configuration files that match their business and technical policies.
• RFC 2622 describes RPSL, and replaced RIPE-181. ( http://www.irr.net/docs/rfc2622.txt )
• RFC 2650 provides a reference tutorial to using RPSL in the real world.
• RPSL has been extended with RPSL-NG (RPSL-Next Generation) effort to support IPv6 routing policies and multicast routing policies. RPSL-NG is defined in RFC 4012.
From Wikipedia: http://en.wikipedia.org/wiki/RPSL

RIPE NCC
Good information (the source) on RPSL at RIPE NCC pages.
http://www.ripe.net/ripencc/pub-services/db/rpsl/
Excellent training guides: http://www.ripe.net/training/rr/material/

RIPE Additional Resources (from training page)
• RIPE Database:
  – RIPE Database Reference Manual
  – RIPE Database User Manual: Getting Started
• RPSL:
  – Routing Policy Specification Language (RFC-2622)
  – Routing Policy System Security (RFC-2725)
  – Using RPSL in Practice (RFC-2650)
• IRRToolset:
  – IRRToolset project page
  – Download tools (FTP Site Maintained by ISC)
• Routing Information Service Project
• Routing Registry Consistency Project

RADB – Merit Networks
http://www.ra.net/
AARNet uses RADB.
Fee per annum, worldwide access to all operators.
More tutorial info here: http://www.ra.net/tutorials.html

RADB & monitoring?
Ops have developed a script that probes all routers, looks at advertisements & routes
Compares results against AS7575 policy
Produces a report… looking for anomalies.
Last sweep early August 2009, found 5 anomalies….
- affects routing for the client
- MSDP, mBGP filters

Network Performance
Smokeping, Traceroute, iPerf

Traceroute
• Tells you the path… but it's only ever half the story. How do they get back to us?

    traceroute to 203.25.27.226 (203.25.27.226), 30 hops max, 60 byte packets
     1  villa.net.uwa.edu.au (130.95.128.254)  1.266 ms  1.664 ms  1.655 ms
     2  gigabitethernet1.er1.uwa.cpe.aarnet.net.au (202.158.198.249)  2.081 ms  2.369 ms  2.993 ms
     3  ge-1-0-6.bb1.a.per.aarnet.net.au (202.158.198.241)  2.985 ms  3.384 ms  3.376 ms
     4  so-0-1-0.bb1.a.adl.aarnet.net.au (202.158.194.6)  29.006 ms  29.063 ms  29.438 ms
     5  so-0-1-0.bb1.a.mel.aarnet.net.au (202.158.194.18)  38.073 ms  38.091 ms  38.331 ms
     6  so-0-1-0.bb1.b.syd.aarnet.net.au (202.158.194.34)  49.553 ms  49.701 ms  50.011 ms
     7  tenge-2-1.pe2.c.syd.aarnet.net.au (202.158.194.196)  50.004 ms  50.331 ms  51.070 ms
     8  101.ge-0-0-0.br1.syd6.alter.net (203.103.244.193)  50.354 ms  50.604 ms  51.046 ms
     9  0.ge-6-1-0.XT3.SYD2.ALTER.NET (210.80.32.197)  50.314 ms  50.196 ms  50.439 ms
    10  0.ge-7-1-0.XT3.SYD4.Alter.Net (210.80.34.57)  50.427 ms  50.703 ms  50.986 ms
    11  so-6-0-0.GW4.SYD4.ALTER.NET (210.80.33.242)  50.676 ms  50.263 ms  50.159 ms
    12  nextgen-syd4-gw.aspac.customer.alter.net (210.80.189.42)  51.619 ms  52.032 ms  52.406 ms
    13  s-br1-sydn-1.pr3.wllt.nxg.net.au (121.200.224.248)  51.030 ms  50.592 ms  51.005 ms
    14  121.200.227.190 (121.200.227.190)  49.816 ms  50.412 ms  51.217 ms
    15  Gig5-1.cor-1.per.brightonline.com.au (203.161.15.2)  50.829 ms  51.190 ms  51.857 ms
    16  Gig3-1.esr.coll.brightonline.com.au (203.161.15.33)  51.544 ms  51.830 ms  52.236 ms
    17  203.161.14.29 (203.161.14.29)  52.227 ms  52.705 ms  53.266 ms
    18  ge0-24.core.per.highway1.net.au (203.34.31.2)  52.681 ms  53.243 ms  53.786 ms
    19  203.25.27.226 (203.25.27.226)  50.664 ms  50.356 ms  *

    Tracing route to 130.95.26.1 over a maximum of 30 hops
     1    <1 ms    <1 ms    <1 ms  10.0.70.1
     2    <1 ms    <1 ms    <1 ms  10.0.64.1
     3    <1 ms    <1 ms    <1 ms  perisa1.zettaserve.com [10.0.65.2]
     4    <1 ms     1 ms    <1 ms  gi0-12.147.core.per.highway1.net.au [203.32.127.1]
     5     1 ms     1 ms     1 ms  ge0-2.border.per.highway1.net.au [203.34.31.1]
     6     1 ms     1 ms     1 ms  aarnet.ix.waia.asn.au [198.32.212.7]
     7    49 ms    49 ms    49 ms  ge-1-0-3.bb1.a.per.aarnet.net.au [202.158.198.1]
     8   266 ms   211 ms   223 ms  gigabitethernet0.er1.uwa.cpe.aarnet.net.au [202.158.198.242]
     9    50 ms    50 ms    50 ms  gw1.er1.uwa.cpe.aarnet.net.au [202.158.198.250]

Smokeping
Smokeping running on all CPE EdgeServers
Measuring latency across network to EXT POP Targets
Latency not monitored actively, but will be done one day.

Smokeping

iPerf
• Network Performance Tool
• Not used for active monitoring, used for debugging and verification of network.
• Intend to run iPerf up on permanently configured 10G servers – but could lead to a Denial of Service

Test Kit: details
• Last year:
  – 1RU Acer 2.0GHz Intel Xeon servers with CentOS5
    • 3 Gbps with standard kernel (2.6.18)
    • 6 Gbps with custom kernel 2.6.24
    • Updated myri10ge drivers improved performance & stopped hangs
    • 3.16GHz desktop E8500 approached 10 Gbps
    • Got fastest available Xeon at 3.16 GHz – gave 9.8 Gbps without being CPU limited.
    • Servers were expensive, and not very portable
  – Exclusively used the myricom cards with LR singlemode XFP optics

Test kit: Last year
Acer R520 & DIY system

Test Kit: cards – myricom XFP & intel dual SFP+

Test Kit: New server details
• Tried new intel core-i7 processors when newly released
  – Best performance yet with the slower 2.67 GHz.
• $500 processor getting wirespeed
  – faster version was 4 x price
  – System price around $1600, plus card at $695 + $900 US
• Buy as small a system as possible with same processor
  – System ordered
  – Delays expected – “If you change to processor … we can deliver”
  – In the end delivered with new technology 2.13 GHz Xeon
  – Client results disappointed a little.
    – 8, 7.5, 6.5 Gbps as sender
  – Performed well as server – up to wirespeed

Test Kit: New servers: core i7 & small 1RU

Measurement tool: iperf
• Memory to memory transfers
• Lots of options – option order is important.
• Command line clients/servers for range of operating systems
  – Linux, Windows, OSX & other unix.
• Supports TCP and UDP
• Don’t set manual TCP windows – disables window auto-scaling
• Conventional wisdom: use UDP
  – Bypasses any TCP tuning/buffering issues.
  – congestion risk.
  – Need undocumented -l option to set jumbo datagram size
  – Don’t do it! – tune your TCP stack for better results & side-step congestion.

10Gbps Test Results

Iperf examples: TCP

Iperf Example: UDP

Links
• IPERF – http://iperf.sourceforge.net/
• NDT - http://e2epi.internet2.edu/ndt/
• Internet2 Performance Workshops – Materials http://www.internet2.edu/workshops/npw/materials.html
• Search “TCP Performance Tuning” for your favorite OS

Logs
Splunk

SPLUNK
Logs collected on one server
Use GREP/AWK/SED & Perl sometimes.
Also have SPLUNK, but not integrated with Nagios as yet.

Multicast Monitoring -- Beacon/Dbeacon

Multicast Monitoring?
Beacon – http://beacon.dat.nlanr.net/ (233.4.200.18)
AARNet Beacon http://beacon2.aarnet.net.au/ (233.70.142.1)
Customer complaints…. ie GRID not working, Evo broken.
No nagios monitoring to date.
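
As a rough idea of what a Nagios check for beacon traffic might look like, the sketch below joins a multicast group and reports CRITICAL if nothing arrives within a timeout. It is not an existing AARNet plugin. The group address is the AARNet beacon group listed above; the UDP port (10000) and the timeout are placeholders that would have to match the real beacon configuration.

```python
import socket
import struct
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def open_group_listener(group: str, port: int, timeout: float) -> socket.socket:
    """Join an IPv4 multicast group; the returned socket times out on receive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # ip_mreq: group address, then local interface (0.0.0.0 = any).
        mreq = struct.pack("4s4s", socket.inet_aton(group),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.settimeout(timeout)
    except OSError:
        sock.close()
        raise
    return sock


def check_beacon(group: str, port: int, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (exit_code, message) in the style of a Nagios plugin."""
    try:
        sock = open_group_listener(group, port, timeout)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot join {group}: {exc}"
    try:
        _, sender = sock.recvfrom(2048)
        return OK, f"OK: traffic on {group} from {sender[0]}"
    except socket.timeout:
        return CRITICAL, f"CRITICAL: no traffic on {group} within {timeout:g}s"
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: receive failed on {group}: {exc}"
    finally:
        sock.close()


# Example call (port 10000 is a placeholder, not a known beacon port):
#   code, message = check_beacon("233.70.142.1", 10000)
#   print(message); raise SystemExit(code)
```

A check like this only shows that some multicast traffic reaches the monitoring host; it says nothing about loss or jitter, which is what the beacon matrix itself reports.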

Beacon or Dbeacon
• Original beacon
  http://sourceforge.net/projects/multicastbeacon/
• Newer Dbeacon (C++)
  http://freshmeat.net/projects/dbeacon/

Eduroam Monitoring System

Eduroam Monitoring
Eduroam Technical Working group – consists of University Wireless & Network Administrators and AARNet staff.
Started working on developing a monitoring solution.
1. Monitoring from the core -> towards an Eduroam member.
2. Monitoring from within the member -> toward the internet.

Eduroam Monitoring CORE->Member
• Using NAGIOS to send test authentication to National Server, pretending to be a real institution – and using the credentials of the target Eduroam Member.
Specific plugin “rad_eap_test”
Script developed for Eduroam.CZ
http://www.eduroam.cz/rad_eap_test/rad_eap_test.html

Eduroam Monitoring CORE->Member

Eduroam Monitoring inside -> Internet
Needs to be a client of the Eduroam Network
– Laptop?
– Cheap mini-PC with 802.11a/b/g/n?
– PDA/iPhone/iPod?
– Opensource WAP?

Eduroam Monitoring inside -> Internet
Currently evaluating flashing Linksys WRT54G
• http://wiki.openwrt.org/oldwiki/openwrtdocs/hardware/linksys/wrt54g

Optical Networks

Optical Circuits
• Outsourced Monitoring to Cisco
• Contract is currently with SOUL Telecommunications
• Limited visibility for AARNet NOC
• Limited visibility on dark fibre services

1300 APL NOC

User input…
Sometimes we can’t see it, nor predict it.
• Dark Fibre paths
• Perceived performance issues
• WOW, Gamers, Students, Users…
• 3rd party notifications (Telstra, Nextgen, Powertel etc)

Automation

202.158.192.0/24
All of our routers have loopback addresses on 1 netblock.
Allows for discovery of new hosts and devices.
In theory, with RADB mapping out all IPv4 routing, NMAP of the network devices, OSPF probe…
… monitoring could be completely automated.

Auto Configuration?
Google (as presented at AusNOG02) can autogenerate the configuration for every router on their network.
Should AARNet?
- Network Configurations?
- Network Monitoring?
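
The loopback-netblock idea above could be sketched as follows: take the addresses that answered a probe (for example from an NMAP sweep), keep those inside 202.158.192.0/24, and list the ones with no Nagios host definition yet. This is only an outline under stated assumptions; the function name and sample lists are hypothetical, and 202.158.192.99 is an invented address standing in for a newly found device.

```python
import ipaddress
from typing import Iterable, List

# Loopback netblock named on the slide.
LOOPBACK_BLOCK = ipaddress.ip_network("202.158.192.0/24")


def unmonitored(responding: Iterable[str],
                monitored: Iterable[str]) -> List[ipaddress.IPv4Address]:
    """Addresses in the loopback block that responded but are not monitored."""
    seen = {ipaddress.ip_address(a) for a in responding}
    known = {ipaddress.ip_address(a) for a in monitored}
    return sorted(a for a in seen - known if a in LOOPBACK_BLOCK)


# Hypothetical inputs: probe results and addresses already defined in Nagios.
RESPONDING = ["202.158.192.20", "202.158.192.217", "202.158.192.99", "10.0.0.1"]
MONITORED = ["202.158.192.20", "202.158.192.217"]

if __name__ == "__main__":
    for addr in unmonitored(RESPONDING, MONITORED):
        print(f"no Nagios host defined for {addr}")
```

Each address reported this way would be a candidate for a generated per-device config file of the kind shown on the Nagios Configuration slide.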