Rocky K. C. Chang, Edmond Chan, Waiting Fok, and
Weichao Li
The Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
APRICOT 2010
1
Problem statement
Measurement system
Measurement methodology
Interesting findings
Conclusions
2
Source: http://www.jucc.edu.hk/jucc/harnet.html
3
Wide area network linking up eight tertiary institutions in HK
Managed by the Joint Universities Computer Centre (JUCC)
− Coordinate IT services of mutual interest
Provide high-speed optical backbone network and Internet connectivity
− Bulk tendering and selection of Internet service provider – (PCCW Wharf)
4
Collect reliable performance data for
operation and planning purposes.
Justifications for service upgrade
Evaluate the fairness of resource sharing
among the eight institutions.
Achieve some kind of “fairness”.
Improve the quality of network services.
Less optimal routes
Fault locations
5
Problem statement
Measurement system
Measurement methodology
Interesting findings
Conclusions
6
Operating since 1 Jan 2009
Measurement side
OneProbe: provides around-the-clock path-quality monitoring
Planetopus: a measurement management
platform
User side
Web-based report on measurement results
Ad hoc performance diagnosis
7
[Figure: system architecture]
Measurement side: OneProbe instances at HKU, CUHK, PolyU, CityU, BU, HKUST, LU, and HKIED; Planetopus, database, etc.; 40+ web servers selected by the JUCC
User side: HKU, CUHK, PolyU, CityU, BU, HKUST, LU, HKIED
8
9
10
11
12
Problem statement
Measurement system
Measurement methodology
Interesting findings
Conclusions
13
Continuous monitoring
Configurable sampling rate and pattern
Low overhead
User-chosen websites
TCP data-path measurement
Middlebox friendly
Multi-metric measurement
[Figure: OneProbe metrics: RTT, jitter, forward loss, reverse loss, forward re-ordering, reverse re-ordering, round-trip capacity]
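As a rough illustration of multi-metric measurement, the sketch below summarizes one window of probe results into a loss rate, a mean RTT, and a jitter value. It is not OneProbe's method: OneProbe separates forward-path from reverse-path loss, which this sketch does not attempt. The function name `summarize_probes`, the use of `None` for an unanswered probe, and the definition of jitter as the mean absolute difference between consecutive RTTs are all assumptions made for the example.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one measurement window.

    Each entry is the RTT of one probe in milliseconds, or None if the
    probe got no response (hypothetical encoding chosen for this sketch).
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")

    answered = [r for r in rtts_ms if r is not None]

    # Round-trip loss only: this sketch cannot tell forward from reverse loss.
    loss_rate = (len(rtts_ms) - len(answered)) / len(rtts_ms)

    # Jitter assumed here to be the mean absolute difference between
    # consecutive answered probes.
    diffs = [abs(b - a) for a, b in zip(answered, answered[1:])]

    return {
        "loss_rate": loss_rate,
        "mean_rtt_ms": mean(answered) if answered else None,
        "jitter_ms": mean(diffs) if diffs else None,
    }


# Made-up sample window: five probes, one unanswered.
summary = summarize_probes([40.0, 42.0, None, 41.0, 45.0])
```

Keeping the per-window summary separate from the probing itself means the same function could be applied to any sampling rate or pattern.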
14
Deploying measurement tasks
Monitoring resource usage
Secure measurement data collection
Measurement data management
15
16
Problem statement
Measurement system
Measurement methodology
Interesting findings
Conclusions
17
18
• Strong and diurnal correlation between RTT and reverse-path packet loss
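One way to quantify such a correlation is the Pearson coefficient between two time-aligned series, for example hourly RTT and hourly reverse-path loss. The sketch below is an assumption about how this could be checked, not the analysis used in the talk; the function name `pearson` and the sample values are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")

    return cov / sqrt(var_x * var_y)


# Hypothetical hourly samples: RTT in ms and reverse-path loss as a fraction.
rtt_ms = [200, 210, 400, 800, 1200, 600]
rv_loss = [0.0, 0.01, 0.1, 0.3, 0.5, 0.2]
r = pearson(rtt_ms, rv_loss)
```

A value of `r` close to 1 would match the "strong correlation" case, while a value near 0 would match a path where RTT and reverse-path loss move independently.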
19
• No correlation between RTT and reverse-path loss
20
• Good effect of a forward-route change
21
The three fault events according to public
information:
9 Aug 1:37am(HKT) and 12 hours later
−
12 Aug 10:50am(HKT)
−
EAC
APCN2
17 Aug 2:20pm(HKT)
−
FNAL/RNAL
22
Path | 9 Aug | 12 Aug | 17 Aug
Australasia - NLA | Diurnal RTT burst – 1200ms, up to 12 Aug; Loss burst – 50%, 8 hrs | X | X
Japan - Nissan | X | Rv loss – 30%, 17 hrs | RTT burst – 1800ms, 7 hrs
Taiwan - TANET | RTT increase; Fw loss increase | RTT increase 60ms; Diurnal Rv loss – 10~50%, 22 hrs | Diurnal Rv loss burst – 10~50%, 17+ hrs
US - Citibank | X | X | RTT burst – 1800ms, 7 hrs; Rv loss – 30%, 13 hrs
Finland - Nokia | X | X | Connectivity lost 12 hrs; Rv loss – 50%, 1.5 days
Korea - KREONET | X | X | RTT increase to 400ms
23
24
25
Affected by the 9 Aug fault:
RTT peaks of 1200ms up to 12 Aug
50%+ burst of losses at 2pm-10pm on 9 Aug
PCCW → Pacnet → TransactSDN(AU) → NLA
9 Aug 13:37(HKT)
26
Affected by the 12 & 17 Aug faults:
Burst of Rv Loss (30%) from 12 Aug 10am to 13 Aug 3am
RTT burst of 1800ms on 17 Aug 2-9pm
PCCW → Equinix → NTT(US/JP) → OCN(JP)
12 Aug 10:50(HKT)
17 Aug 14:20(HKT)
27
Affected by the 12 & 17 Aug faults:
RTT increased by 60ms since 12 Aug 15:00
Diurnal Rv Loss (10~50%) for 22 hrs from 12 Aug 16:20 and for 17+ hrs from 17 Aug 21:40
HKIX → ChungHwaTel → TANET
12 Aug 10:50(HKT)
17 Aug 14:20(HKT)
28
Affected by the 17 Aug fault:
RTT burst of 1800ms
Reverse-path loss up to 40%
From 17 Aug 2pm to 18 Aug 3am
PCCW → BNA → AT&T
17 Aug 14:20(HKT)
29
Affected by the 17 Aug fault:
Connectivity lost (OneProbe, TCPTraceroute)
From 17 Aug 2pm to 18 Aug 2am
Rv Loss burst up to 50% until 20 Aug 4pm
PCCW → BNA → GBLX(US) → Nokia(Finland)
17 Aug 14:20(HKT)
Connection lost
30
Affected by the 17 Aug fault:
RTT increased from 40ms to 400ms since 17 Aug 14:20
RTT burst of 400ms around 12 Aug 22:00 to 22:30
HARNET → ASGC (TW) → KREONET
12 Aug 10:50(HKT)
17 Aug 14:20(HKT)
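RTT bursts and level shifts like those reported for the paths above could be flagged automatically by comparing each sample with a baseline. The sketch below is one simple way to do that and is not the detection method used in the talk. The function name `find_rtt_bursts`, the baseline taken as the median of the first few samples, the threshold factor, and the minimum run length are all assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def find_rtt_bursts(
    rtts_ms: Sequence[float],
    baseline_n: int = 4,
    factor: float = 3.0,
    min_run: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of sustained RTT bursts.

    A sample counts as elevated when it exceeds `factor` times the median of
    the first `baseline_n` samples. Runs shorter than `min_run` are ignored.
    """
    if len(rtts_ms) < baseline_n:
        raise ValueError("not enough samples to form a baseline")

    threshold = factor * median(rtts_ms[:baseline_n])
    bursts: List[Tuple[int, int]] = []
    start = None

    for i, rtt in enumerate(rtts_ms):
        if rtt > threshold:
            if start is None:
                start = i  # a new elevated run begins here
        else:
            if start is not None and i - start >= min_run:
                bursts.append((start, i))
            start = None

    # Close a run that is still open at the end of the series.
    if start is not None and len(rtts_ms) - start >= min_run:
        bursts.append((start, len(rtts_ms)))

    return bursts


# Made-up samples: a baseline near 40ms, a sustained jump, and one isolated spike.
samples = [40, 41, 39, 40, 42, 400, 410, 395, 41, 900, 40]
bursts = find_rtt_bursts(samples)  # [(5, 8)]
```

The minimum run length keeps a single outlier from being reported as a fault, at the cost of missing very short events.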
31
Deploying and managing a distributed
measurement system is very challenging.
A reliable, non-cooperative measurement method
A measurement management platform
But such a system, if deployed and managed
correctly, is very useful.
Contrasting measurements yields more information for performance and fault diagnosis
Currently monitoring the impact of switching to a
new provider
32
33