Traffic at the Network Edge: Sniff, Analyze, Act.

Transcription

Traffic at the Network Edge: Sniff, Analyze, Act.
POLITECNICO DI TORINO
Doctoral School
PhD Program in Electronics and Communications Engineering – XVII Cycle
Doctoral Thesis
Traffic at the Network Edge:
Sniff, Analyze, Act.
Dario Rossi
Advisor
prof. Marco Ajmone Marsan
PhD Program Coordinator
prof. Ivo Montrosset
January 2005
Contents
1 A Primer on Network Measurement
  1.1 Motivations and Methodology
  1.2 Software Classification
      1.2.1 Software List
  1.3 Sniffing Tools
      1.3.1 Tcpdump
      1.3.2 Other Tools
  1.4 Tstat Overview
      1.4.1 The TCPTrace Tool
      1.4.2 Collected Statistics
      1.4.3 Output Overview
      1.4.4 Usage Overview
  1.5 The Network Scenario
      1.5.1 The GARR-B Architecture
      1.5.2 The GARR-B Statistics
      1.5.3 The GARR-G Project
2 The Measurement Setup
  2.1 Traffic Measures in the Internet
  2.2 The Tool: Tstat
  2.3 Trace analysis
  2.4 IP Level Measures
  2.5 TCP Level Measures
      2.5.1 TCP flow level analysis
      2.5.2 Inferring TCP Dynamics from Measured Data
  2.6 Conclusions
3 User Patience and the World Wide Wait
  3.1 Background
  3.2 Interrupted Flows: a definition
      3.2.1 Methodology
      3.2.2 Interruption Criterion
  3.3 Results
      3.3.1 Impact of the User Throughput
      3.3.2 Impact of Flow Size
      3.3.3 Completion and Interruption Times
  3.4 Conclusions
4 The Zoo of Elephant and Mice
  4.1 Introduction
  4.2 Related Works
  4.3 Problem Definition
      4.3.1 Preliminary Definitions
      4.3.2 Input Data
      4.3.3 Properties of the Aggregation Criterion
      4.3.4 Trace Partitioning Model and Algorithm
  4.4 Results
      4.4.1 Traffic Aggregate Bytewise Properties
      4.4.2 Inspecting TCP Interarrival Time Properties within TAs
  4.5 Conclusions
5 Feeding a Switch with Real Traffic
  5.1 Introduction
  5.2 Internet Traffic Synthesis
      5.2.1 Preliminary Definitions
      5.2.2 Traffic Matrix Generation
      5.2.3 Greedy Partitioning Algorithm
  5.3 Performance study
      5.3.1 Measurement setup
      5.3.2 The switching architectures under study
      5.3.3 Traffic scenarios
      5.3.4 Simulation results
  5.4 Conclusion
6 Data Inspection: the Analysis of Nonsense
  6.1 Introduction and Background
  6.2 Architecture Overview
      6.2.1 In the Beginning Was Perl
      6.2.2 Input Files and Formats
      6.2.3 Formats and Expressions Interaction
      6.2.4 The d.tools Core
      6.2.5 The d.tools Flexibility
      6.2.6 The DiaNa GUI
  6.3 Performance Evaluation and Benchmarking
      6.3.1 Related Works
      6.3.2 The Benchmarking Setup
      6.3.3 Perl IO
      6.3.4 Perl Implicit Split Construct
      6.3.5 Perl Operators and Expressions
      6.3.6 DiaNa Startup Overhead
      6.3.7 Fields Number and Memory Depth
  6.4 Practical Examples
  6.5 Conclusions
7 Final Thoughts
  7.1 Measurement Tool
      7.1.1 Cope with Traffic Shifts
      7.1.2 Cope with Bandwidth Increase
      7.1.3 Distributed Measurement Points
  7.2 User Patience and the World Wide Wait
      7.2.1 Practical Applications
  7.3 TCP Aggregates Analysis
      7.3.1 Future Aggregations
  7.4 Switching Performance Evaluation
      7.4.1 Modified Optimization Problem
      7.4.2 Responsive Sources
  7.5 Post Processing with DiaNa
      7.5.1 DiaNa Makeup and Restyle
List of Figures
1.1  Example of the tcpdump Command Output
1.2  Evolution of the GARR-B Network: from April 1999 to May 2004
1.3  Logical Map of the Italian GARR-B Network, with a Zoomed View of the Torino MAN
1.4  Input versus Output Load Statistics over Different Time-Scales: Yearly to Monthly and Weekly to Daily
2.1  IP payload traffic balance - Period (B)
2.2  Distribution of the incoming traffic versus the source IP address
2.3  Distribution of the TTL field value for outgoing and incoming packets - Period (B)
2.4  Incoming and outgoing flows size distribution; tail distribution in log-log scale (lower plot); zoom in linear and log-log scale of the portion near the origin (upper plots) - Period (B)
2.5  Asymmetry distribution of connections expressed in bytes (upper plot) and segments (lower plot) - Period (B)
2.6  Distribution of the connections completion time - Period (B)
2.7  Distribution of "rwnd" as advertised during handshake
2.8  TCP congestion window estimated from the TCP header
3.1  Completed and Interrupted TCP Flow
3.2  [...] Probability and Cumulative Distribution
3.3  Normalized [...] Probability Distribution
3.4  Temporal Gap Reduction
3.5  Interrupted vs Completed Flows Size CDF
3.6  Sensitivity of the Interruption Criterion to the [...] and [...] Parameters
3.7  Interrupted vs Completed vs [...] Flows Amount and Ratio
3.8  [...]: Server on the Top, Client on the Bottom
3.9  [...]: Server on the Top, Client on the Bottom
3.10 [...]: Server case only
3.11 [...]: server case only
4.1  Aggregation Level From Packet Level to TA Level
4.2  The Measure Setup
4.3  Flow Size and Arrival Times for Different TAs
4.4  TR Size Distribution
4.5  TR Flow Number Distribution
4.6  Trace Partitioning: Algorithmic Behavior
4.7  Trace Partitioning: Samples for Different Aggregated Classes K
4.8  Number of Traffic Relations [...] within each Traffic Aggregate
4.9  Number of TCP Flows within each Traffic Aggregate
4.10 Mean Size of TCP Flows within each Traffic Aggregate
4.11 Elephant TCP Flows within each Traffic Aggregate
4.12 TCP Flows Size Distribution of TAs (Class [...])
4.13 Interarrival Time Mean of TCP Flows within TAs
4.14 Interarrival Time Variance of TCP Flows within TAs
4.15 Interarrival Time Hurst Parameter of TCP Flows within TAs
4.16 Interarrival Time Hurst Parameter of TCP Flows within TAs
5.1  Internet traffic abstraction model (on the top) and measure setup (on the bottom)
5.2  Internet traffic at different levels of aggregation
5.3  The optimization problem
5.4  The greedy algorithm
5.5  Mean packet delay under PT and P3 scenarios for cell mode policies
5.6  Mean packet delay under PT and P3 scenarios for packet mode policies
5.7  Throughput under PT and P3 scenarios for cell mode policies
6.1  DiaNa Framework Conceptual Layering
6.2  Input Data from the DiaNa Perspective
6.3  Parallel Expressions and Default Expansion
6.4  Architecture of a Generic d.tools
6.5  The d.loop Synoptic
6.6  The DiaNa Graphical User Interface
6.7  Input/Output Performance on Linux-2.4
6.8  Linux vs. Solaris Normalized Throughput Performance
6.9  Explicit split and Performance Loss
6.10 Normalized Cost for Floating Point, Integer and String Operations
6.11 Modules Loading Performance
6.12 Time Cost of Fields Number vs. Memory Depth
7.1  TstatRRD: Different Temporal Resolution for IP Packet Length Statistics
7.2  TstatRRD Web Interface: Round Trip Time Example
7.3  Flow "Delocalization" as Consequence of the Aggregation Process
7.4  Flows Regrouping through Logical Masks
7.5  Responsive Traffic Generation Process
List of Tables
1.1 Network Monitoring and Measuring Software Tools: Classification by Functional Role
1.2 Tstat Output: Log Field Description
1.3 Input versus Output Bandwidth and Utilization Statistics over Different Time-Scales: Yearly to Monthly and Weekly to Daily
2.1 Summary of the analyzed traces
2.2 Host name of the 10 most contacted hosts on a flow basis
2.3 TCP options negotiated
2.4 Percentage of TCP traffic generated by common applications in number of flows, segments and transferred bytes - Period (B)
2.5 Average data per flow sent by common applications, in segments and bytes - Period (B)
2.6 OutB or DupB events rate, computed with respect to the number of packets and flows
3.1 Three Most Active server and client statistics: total flows [...] and interrupted [...]
3.2 Average Values of the Inspected Parameters
4.1 Trace Informations
6.1 Serial Expression Tokens
6.2 Sampling Option Synopsis
6.3 Benchmarking Setup Architectures
Abstract
Since the dawn of science, the experimental method has been considered an essential investigation tool for the study of natural phenomena. Similarly, traffic measurement is an indispensable and valuable tool for the analysis of today's telecommunication networks: describing the key characteristics of real communication processes is a necessary step toward a deeper understanding of the complex dynamics of network traffic.

However, despite the importance gained by computer networks in everyday life, as testified by the deep penetration of the Internet, several aspects of current traffic knowledge remain limited or unsatisfactory. The pace at which technology evolves continuously enables new services and applications: as a consequence, the traffic streams flowing in current networks are very different from the traffic patterns of the recent past. While a few years ago the Internet was synonymous with Web browsing, the pervasive diffusion of wide-band access has shifted the application spectrum, of which peer-to-peer file sharing and audio/video streaming are the most representative examples; moreover, this evolutionary process is likely to repeat itself, although foreseeing its direction may be hard from our current viewpoint.

Therefore, it is evident that traffic measurement and analysis are among the core tools enabling efficient Internet performance monitoring, evaluation and management. The knowledge of real traffic is indispensable for the development of synthetic traffic models, which are very useful for the study, the analysis and the dimensioning of both real networks and network apparatuses. Besides, efficient traffic engineering cannot leave real traffic properties out of consideration. Finally, the characterization of current traffic is a key factor in enabling the design and deployment of future generation networks.

This thesis hopefully brings some original contributions to academic research in the field of network traffic measurement, and is organized as follows: after a necessary and complete introduction, the presented research focuses on the observation of traffic from different viewpoints, considering the network, transport and application levels, analyzing the haziest aspects of TCP/IP networking and uncovering some unanticipated phenomena.

In more detail, Chapter 1 digs into the details of the traffic collection methodology: after discussing the motivations behind field measurement, an introduction is given to the available measurement techniques; the software tools are then discussed at length, devoting special attention to the sniffing tools used throughout this research. Finally, in order to fully describe the measurement setup, the chapter exhaustively describes the network scenario, considering both its architecture and its evolution.

Chapter 2 provides a thorough analysis, at both the network and transport levels, of the typical traffic patterns that can be measured in the network just described. In order to characterize the most relevant aspects of the measured traffic, the study considers a huge data set collected during more than two months on the Politecnico's access link. It is important to stress that while standard performance measures (such as flow dimensions, traffic distribution, etc.) remain at the base of traffic evaluation, more sophisticated indices (obtained through data correlation between the incoming and outgoing traffic) are taken into account as well, in order to give reliable estimates of network performance from the user perspective too.

The complex relationship between the network and its users is analyzed in Chapter 3, which introduces the notion of "user patience", an application-level metric apt to quantify the perceived Quality of Service (QoS). More in detail, the chapter presents a study of Web user behavior when the degradation of network performance causes page transfer times to increase: a criterion to infer the application-level impatience metric from the information available at the transport level is presented and validated. In order to assess whether worsening network conditions translate into greater user impatience, more than two million flows were analyzed: surprisingly, several of the insights gained in the study are counter-intuitive, and contribute to refining the complex picture of the interactions between the user perception of the Web and network-level events.

Chapter 4 then sheds light on some interesting aspects at the root of the Long Range Dependence (LRD) properties exhibited by aggregates of TCP flows, extending to the transport level results that were previously known only at the packet level. More in detail, starting from traffic measurements at the TCP-flow level, a simple aggregation criterion is used to induce a partition of heavy and light flows, known in the literature as elephants and mice respectively, into different traffic aggregates. Then, the statistical properties of the TCP flow inter-arrival process within each aggregate are analyzed in depth: the interpretation of the results suggests that TCP elephants are the cause of LRD phenomena at the TCP-flow level.

In Chapter 5, traffic measurements are used to evaluate the performance of different switching architectures. The chapter describes a novel methodology, which can be expressed in terms of an optimization problem, for the synthesis of realistic traffic starting from a single real packet-level trace. Using simulation, several scheduling algorithms were stressed by both traditional traffic models and the novel synthetic traffic: the comparison of the results allowed us to gather unexpected findings, which traditional traffic models were unable to bring to light, including a strong degradation of the achievable performance.

Chapter 6 introduces DiaNa, a novel software tool primarily designed to process huge amounts of data, possibly several orders of magnitude bigger than the workstation's random access memory, in an efficient, spreadsheet-like, batch fashion. The tool was developed primarily to perform the analyses described in the other research studies presented in this thesis; however, one of the primary design goals was to offer extreme flexibility from the user perspective: as a consequence, DiaNa is written in Perl and its use is not restricted to traffic traces. The DiaNa syntax is a very small and orthogonal superset of the underlying Perl syntax, which allows, e.g., comfortably addressing file tokens and profitably using file formats throughout the data processing; the DiaNa software also includes an interactive Perl/Tk graphical user interface, layered on top of several batch shell scripts. The chapter introduces the software, dissects its architecture, evaluates its performance through benchmarking and presents some examples of use in a networking context.

Finally, some conclusive considerations are drawn in Chapter 7: after a brief summary of each of the research studies described earlier, the chapter reports some possible research directions, as well as sketching some of the candidate's ongoing activities in the measurement field.
Acknowledgements
I’ve always tried to be very concise:
following this choice, I just want to
thank everybody I know: if we are
still in contact, then we both care.
Chapter 1
A Primer on Network Measurement
Traffic measurement is an applied networking research methodology aimed at understanding packet traffic on the Internet. From its humble beginnings in LAN-based measurement of network applications and protocols, network measurement research has grown in scope and magnitude, and has helped provide insight into fundamental behavioural properties of the Internet, its protocols, and its users. For example, Internet traffic measurement and monitoring serves as the basis for a wide range of IP network operations, management and engineering tasks, such as troubleshooting, accounting and usage profiling, routing weight configuration, load balancing, capacity planning, and so forth.
From a very high-level perspective, there are three core methodologies that, with very different properties, enable networking studies: namely, analytical models, simulation and measurement. To validate research results and the researcher's intuition, it is normally useful to take several approaches in parallel. For example, useful analytical models can be derived from traffic measurement; these models may describe traffic behavior, or specify the traffic generation pattern, in which case they can be used as simulation input. Besides, the statistical properties of the simulation results can be compared to those of the real observed traffic; or, in other setups, the simulation input may be driven directly by real traces.
In this study, though we combine the aforementioned approaches, the starting point is always the collection and analysis of real traffic traces: therefore, we devote this introductory chapter to a brief overview of the computer tools necessary for trace collection and analysis. Adopting again a very high-level perspective, there are two main traffic measurement "philosophies": measurement can be either active or passive. Tools belonging to the first class inject ad-hoc traffic into the network, aiming at measuring the reaction of the network itself; conversely, tools belonging to the second class passively observe and collect the traffic flowing through the network. Though the field of active measurement is very interesting, in the following we will focus mainly on passive measurement tools and on the analysis of the gathered data.
The rest of this chapter is organized as follows: first, Section 1.1 provides a high-level description of the existing data collection methods, introducing the problem of packet sniffing; then, a more detailed classification of the available traffic measurement tools is presented in Section 1.2, where a partial taxonomy of such tools, without any pretense of completeness, is also given. While Section 1.3 briefly describes some of the most widely deployed tools, Section 1.4 focuses on Tstat, the software tool created by our research group, to whose development I have contributed. Finally, the setup on which measurements are taken is thoroughly described in Section 1.5.
1.1 Motivations and Methodology
Network traffic measurement provides a means to go “under the hood”, much like an Internet mechanic, to understand what is or is not working properly on a local-area or wide-area network.
Using specialized network measurement hardware or software, a networking researcher can collect detailed information about the transmission of packets on the network, including their timing
structure and contents. With detailed packet-level measurements, and some knowledge of the Internet protocol stack, it is possible to “reverse engineer” significant information about the structure
of an Internet application, or the behaviour of an Internet user. There are four main reasons why
network traffic measurement is a useful methodology:
Network Troubleshooting
Computer networks are not infallible; often, a single malfunctioning piece of equipment
can disrupt the operation of an entire network, or at least degrade performance significantly.
Examples of such scenarios include “broadcast storms”, illegal packet sizes, incorrect addresses, and security attacks. In such scenarios, detailed measurements from the operational
network can often provide a network administrator with the information required to pinpoint
and solve the problem.
Protocol Debugging
Developers often want to test out “new, improved” versions of network applications and
protocols: in this context, network traffic measurement provides a means to ensure the correct operation of the new protocol or application, its conformance to required standards, and
(if necessary) its backward-compatibility with previous versions, prior to unleashing it on a
production network.
Workload Characterization
Network traffic measurements can also be used as input to the workload characterization
process, which analyzes empirical data (often using statistical techniques) to extract salient
and representative properties describing a network application or protocol; as briefly mentioned earlier, the knowledge of the workload characteristics can then lead to the design of
better protocols and networks for supporting the application.
Performance Evaluation
Finally, network traffic measurements can be used to determine how well a given protocol or
application is performing in the Internet, and a detailed analysis of network measurements
can help identify performance bottlenecks; once these performance problems are addressed,
new versions of the protocols can provide better (i.e., faster) performance for the end users
of Internet applications.
The "tools of the trade" for network traffic measurement research, and hence the corresponding methodologies, can be classified in several different ways:
Hardware-based vs Software-based Measurement Tools
The primary categorization among network measurement tools is hardware-based versus software-based: hardware-based platforms, offered by many vendors, are often referred to as network traffic analyzers, i.e., special-purpose equipment designed expressly for the collection and analysis of network data. This equipment is often very expensive, with a cost that depends on the number of network interfaces, the types of network cards, the storage capacity, and the protocol analysis capabilities.
Software-based measurement tools typically rely on kernel-level modifications to the network interfaces of commodity workstations to turn them into machines with packet capture capability. In general, the software-based approach is much less expensive than the hardware-based approach, but may not offer the same functionality and performance as a dedicated network traffic analyzer. In some cases, the software-based approach is very specialized, as in the case of Web server and proxy workload analysis, which relies on the access logs recorded by Internet servers and proxies: these logs record each client request for Web site content, including the time of day, client IP address, URL requested, and document size, and thus provide useful insight into Web server workloads without the need to collect detailed network-level packet traces.
Passive vs Active Measurement Approaches
A passive network monitor is used to observe and record the packet traffic on an operational
network, without injecting any traffic of its own onto the network – that is, the measurement
device is non-intrusive; most network measurement tools fall into this category. An active
network measurement approach uses packets generated by a measurement device to probe
the Internet and measure its characteristics. Simple examples of the latter approach include
the ping utility for estimating network latency to a particular destination on the Internet,
the traceroute utility for determining Internet routing paths, and the pathchar tool
for estimating link capacities and latencies along an Internet path.
On-line vs Off-line Traffic Analysis
Some network traffic analyzers support real-time collection and analysis of network data, often with graphical displays for on-line visualization of live traffic data; most hardware-based network analyzers support this feature. Other network measurement devices are intended only for real-time collection and storage of traffic data, while analysis is postponed to an off-line stage, which is typically the case for the more complex analysis features: once the traffic data is collected and stored, a researcher can perform as many analyses as desired in the post-processing phase. Clearly, performing as much on-line analysis as possible brings several benefits, such as easing and speeding up the post-processing phase.
LAN vs WAN Measurement
The early network traffic measurement research in the literature was undertaken in Local Area Network (LAN) environments, such as Ethernet LANs. LANs are easier to measure, for several reasons. First, a LAN is typically administered by a single well-known organization, meaning that obtaining security clearance for traffic analysis is a relatively straightforward process. Second, the broadcast nature of an Ethernet LAN means that all packets transmitted on the LAN are seen by all hosts. Network measurement can be done in this context simply by configuring a network interface in promiscuous mode, which means that the interface will receive and record (rather than ignore) the packets destined to other hosts on the network (a minimal capture example is sketched right after this list). Later measurement work extended traffic collection and analysis to Wide Area Network (WAN) environments. The greater challenges here include administrative control of the network, and security and privacy issues; although, with time and coordination between these measurements, a more complete picture of end-to-end network performance will be possible. For organizations with a single access point to the external Internet, such as ours, measurement devices can be put in-line on an Internet link near the default router of the organization. However, it is worth pointing out that one of our current on-going concerns is to enable the deployment of a Wide-Area Network measurement infrastructure that can collect simultaneous measurements of client, server, and network behaviours from different measurement points: please refer to Section 7.1.3 for further details on this topic.
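As a purely illustrative aside, and not part of any of the tools discussed in this thesis, the promiscuous-mode capture mentioned above is what the libpcap library provides to software-based sniffers; a minimal sketch, assuming libpcap is installed and that the interface is called eth0, could look as follows:

/* Minimal promiscuous-mode capture sketch (illustrative only).
 * The interface name "eth0" and the snapshot length are arbitrary choices. */
#include <pcap.h>
#include <stdio.h>

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    /* snaplen = 96 bytes is enough for IP and TCP headers;
     * promisc = 1 asks the interface to keep packets addressed to other hosts */
    pcap_t *handle = pcap_open_live("eth0", 96, 1, 1000, errbuf);
    if (handle == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }
    struct pcap_pkthdr hdr;
    const u_char *pkt = pcap_next(handle, &hdr);   /* grab a single packet */
    if (pkt != NULL)
        printf("captured %u bytes (of %u on the wire)\n", hdr.caplen, hdr.len);
    pcap_close(handle);
    return 0;
}

Real sniffers obviously do much more (filtering, protocol dissection, state tracking), but the promiscuous flag passed to pcap_open_live() is all it takes to also receive frames destined to other hosts on a shared medium.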
Following the classification introduced so far, the methodology followed in this thesis can be identified as software-based passive sniffing; measurements are taken on the unique egress router of our wide campus LAN, and the choice between on-line and off-line processing depends on the specific task.
1.2 Software Classification
In this section, we identify four main criteria according to which network monitoring and measurement software can be classified: namely, the functional role, the resources managed, the underlying mechanism and the software environment/licence.
With respect to the general management area or functional role of a tool, we can identify several possible classes of software. The following keywords may be used to describe such classes:
- Alarm: a reporting/logging tool that can trigger on specific events within a network.
- Analyzer: a traffic monitor that reconstructs and interprets protocol messages that span several packets.
- Benchmark: a tool used to evaluate the performance of network components.
- Control: a tool that can change the state or status of a remote network resource.
- Debugger: a tool that, by generating arbitrary packets and monitoring traffic, can drive a remote network component to various states and record its responses.
- Generator: a traffic generation tool.
- Manager: a distributed network management system or system component.
- Map: a tool that can discover and report a system's topology or configuration.
- Reference: a tool for documenting MIB structure or system configuration.
- Routing: a packet route discovery tool.
- Security: a tool for analyzing or reducing threats to security.
- Status: a tool that remotely tracks the status of network components.
- Traffic: a tool that monitors packet flow.
Similarly, one can classify network monitoring software depending on the managed resources; without aiming at completeness, we may identify the following self-explanatory categories: Bridge, CHAOS, DECnet, DNS, Ethernet, FDDI, IP, OSI, NFS, Ring, SMTP. Besides, another possible classification policy involves the mechanism used by the tool, as in the following list:
- CMIS: a network management system or component based on CMIS/CMIP, the Common Management Information System and Protocol.
- Curses: a tool that uses the "curses" tty interface package.
- Eavesdrop: a tool that silently monitors communications media (e.g., by putting an Ethernet interface into "promiscuous" mode).
- NMS: the tool is a component of, or queries, a Network Management System.
- Ping: a tool that sends packet probes such as ICMP echo messages; to help distinguish tools, we do not consider NMS queries or protocol spoofing (see below) as probes.
- Proprietary: a distributed tool that uses proprietary communications techniques to link its components.
- RMON: a tool which employs the RMON extensions to SNMP.
- SNMP: a network management system or component based on SNMP, the Simple Network Management Protocol.
- Spoof: a tool that tests the operation of remote protocol modules by peer-level message exchange.
- X: a tool that uses X-Windows.
Finally, we may discriminate among the tools' operating environments and licences. The environments include DOS, HP, Macintosh, OS/2, Sun, UNIX (as well as other *NIX systems, such as FreeBSD or Linux), VMS, and Standalone (i.e., an integrated hardware/software tool that requires only a network interface for operation). As for the licence, the main classes are:
- Free: the tool is available at no charge, though other restrictions may apply (tools that are part of an OS distribution but not otherwise available are not listed as "free").
- Library: a tool packaged with either an Application Programming Interface (API) or object-level subroutines that may be loaded with programs.
- Sourcelib: a collection of source code (subroutines) upon which developers may construct other tools.
In the following, we will restrict our attention to software tools that:
- are open source;
- run on *NIX systems;
- use eavesdropping (passive capture);
- capture IP and TCP headers.
1.2.1 Software List
In Table 1.1 we report a practical example of tool classification by functional role, listing the most relevant software tools surveyed in [2, 3]. It is evident that the list is far from complete, and is also partly out-of-date: indeed, the task of keeping such a list up-to-date has grown too big, as is fairly well known in the scientific community. As a side, but important, note, it should be pointed out that in many cases the tools are not purely passive, simply because they try to resolve host names using reverse DNS lookups. Notice that it may be possible to perform a similar classification with respect to the other three criteria listed in the previous section. Besides, we must point out that a single tool may fall into several categories, depending on its purpose and extent; in the next section we will review some of the tools from the following list, specifically those listed in bold from the last category.
A few words must be devoted to some important missing items; however, as a disclaimer, it should be pointed out that a complete bibliography of measurement projects and tools could easily have been a research project per se. Moreover, the compilation of a complete survey of measurement tools, although very interesting, is nevertheless very cumbersome and, worst of all, would become out-of-date very quickly, nullifying the tremendous effort. Nevertheless, we cannot forget to cite projects such as PlanetLab [4, 5], currently consisting of 550 nodes over 261 sites scattered across the globe, which has in turn enabled a long series of independent measurement projects and tools. Another good starting point is CAIDA [6], the Cooperative Association for Internet Data Analysis, which provides tools and analyses promoting the engineering and maintenance of a robust, scalable global Internet infrastructure: about 40 software tools and more than 100 related technical papers are hosted on its website.
Table 1.1: Network Monitoring and Measuring Software Tools: Classification by Functional Role. (Among the tools listed: CMIP Library, EMANATE, MONET, SNMP Lib., Dual Manager, EtherMeter, NetMetrix, snmpd, Eagle, LanProbe, NETMON, NETscout, SpiderMonitor, LANVista, LANWatch, PacketView, Sniffer, hammer & anvil, spray, iozone, ttcp, nhfsstone, Beholder, arp, ping, traceroute, netstat, fping, DiG, NNStat, NOCOL, CMU SNMP, etherfind, EtherApe, EtherView, Getethers, IPTraf, nfswatch, tcplogger, tcptrace, Tcpdump and Tstat.)
1.3 Sniffing Tools
This section contains brief descriptions of some network traffic tools, commonly known as packet sniffers or protocol analyzers. Apart from Tstat, which we will review thoroughly in the next section, for each tool we report a terse description of its purpose or attributes. To summarize, however, let us say that the tcpdump utility can be used for basic manual analysis (or by other, more automated tools via the libpcap library): these are, to date, the most widely used solutions, and a number of utilities are built on top of them.
1.3.1 Tcpdump
Tcpdump [10] is a command line tool for collecting network packets from a network interface. Normally it prints out the headers of the packets that match a specific boolean expression, but it can also be instructed to display the entire packet in different output formats. Tcpdump can read and write data from files (in tcpdump format) in addition to reading data from the network interface. This is especially useful for logging data for later analysis, possibly with some other tool that can read tcpdump files. Libpcap [11], the library component of tcpdump, has become very popular: the vast majority of the packet sniffers mentioned in this thesis make use of this library. An example of the tcpdump command output is provided in Figure 1.1.
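As a rough sketch of the workflow just described, and without any claim of reflecting tcpdump's actual source code, a libpcap-based program can read a previously recorded trace, keep only the packets matching a boolean filter expression, and hand each surviving packet to a callback; the file name trace.dmp and the filter "tcp port 80" below are arbitrary examples:

/* Illustrative libpcap sketch: offline trace reading plus a BPF filter.
 * Not taken from tcpdump; names are placeholders. */
#include <pcap.h>
#include <stdio.h>

static void per_packet(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    (void)user; (void)bytes;
    /* print capture timestamp and on-the-wire length of each matching packet */
    printf("%ld.%06ld  %u bytes\n", (long)h->ts.tv_sec, (long)h->ts.tv_usec, h->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *in = pcap_open_offline("trace.dmp", errbuf);  /* tcpdump-format trace */
    if (in == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

    struct bpf_program prog;
    /* PCAP_NETMASK_UNKNOWN requires a recent libpcap; 0 works with older versions */
    if (pcap_compile(in, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
        pcap_setfilter(in, &prog) == -1) {
        fprintf(stderr, "filter error: %s\n", pcap_geterr(in));
        return 1;
    }
    pcap_loop(in, -1, per_packet, NULL);                  /* -1: until end of trace */
    pcap_close(in);
    return 0;
}

Replacing pcap_open_offline() with pcap_open_live() turns the same skeleton into a live sniffer, which is essentially how the utilities built on top of libpcap operate.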
The Tcpdump/libpcap duo has been ported to the Windows platforms by another research group
of Politecnico di Torino: WinPcap [13, 14] is an open source library for packet capture and network
analysis for the Win32 platforms, including a kernel-level packet filter, a low-level dynamic link
library and a high-level and system-independent library. WinPcap, which is available at [12],
features a device driver that adds to Windows (95, 98, ME, NT, 2000, XP and 2003) the ability to
capture and send raw data from a network card, with the possibility to filter the captured packets and store them in a buffer.
1.3.2 Other Tools
ENTM
ENTM [7] is a screen-oriented utility that runs under VAX/VMS. It monitors local Ethernet traffic and displays either a real-time or a cumulative histogram showing a percent breakdown of traffic by Ethernet protocol type. The information in the display can be reported based on packet count or byte count. The percentage of broadcast, multicast and approximate lost packets is reported as well. The screen display is updated every three seconds. Additionally, a real-time, sliding history window may be displayed showing Ethernet traffic patterns for the last five minutes. ENTM can also report IP traffic statistics by packet count or byte count. The IP histograms reflect information collected at the TCP and UDP port level, including ICMP type/code combinations.
17:06:02.613909 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc:
S 900316031:900316031(0) win 5840 <mss 1460,sackOK,timestamp 20220003
0,nop,wscale 0> (DF)
17:06:02.614144 serverlipar.polito.it.sunrpc > nonsns.polito.it.675: S
2978051708:2978051708(0) ack 900316032 win 5792 <mss 1460,sackOK,timestamp
203083456 20220003,nop,wscale 2> (DF)
17:06:02.614169 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc: . ack
1 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614197 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc: P
1:61(60) ack 1 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614432 serverlipar.polito.it.sunrpc > nonsns.polito.it.675: . ack
61 win 1448 <nop,nop,timestamp 203083456 20220003> (DF)
17:06:02.614631 serverlipar.polito.it.sunrpc > nonsns.polito.it.675: P
1:33(32) ack 61 win 1448 <nop,nop,timestamp 203083456 20220003> (DF)
17:06:02.614641 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc: . ack
33 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614667 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc: F
61:61(0) ack 33 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614689 nonsns.polito.it.676 > serverlipar.polito.it.731: S
910139082:910139082(0) win 5840 <mss 1460,sackOK,timestamp 20220003
0,nop,wscale 0> (DF)
17:06:02.614896 serverlipar.polito.it.731 > nonsns.polito.it.676: S
2977556316:2977556316(0) ack 910139083 win 5792 <mss 1460,sackOK,timestamp
203083456 20220003,nop,wscale 2> (DF)
17:06:02.614905 nonsns.polito.it.676 > serverlipar.polito.it.731: . ack 1
win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614918 serverlipar.polito.it.sunrpc > nonsns.polito.it.675: F
33:33(0) ack 62 win 1448 <nop,nop,timestamp 203083456 20220003> (DF)
17:06:02.614928 nonsns.polito.it.675 > serverlipar.polito.it.sunrpc: . ack
34 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.614957 nonsns.polito.it.676 > serverlipar.polito.it.731: P 1:77(76)
ack 1 win 5840 <nop,nop,timestamp 20220003 203083456> (DF)
17:06:02.615133 serverlipar.polito.it.731 > nonsns.polito.it.676: . ack 77
win 1448 <nop,nop,timestamp 203083457 20220003> (DF)
Figure 1.1: Example of the tcpdump Command Output
Both the Ethernet and IP histograms may be sorted by ASCII protocol/port name or by percent value. All screen displays can be saved in a file for printing later.
EtherApe
EtherApe [8] passively collects data from one or several network interfaces. It makes simple traffic calculations and prints a graphical display of the network traffic in real time. The graphical display uses different colors to make it easier to distinguish between protocols. It also displays, in a very visible way, how much traffic of each protocol has been transmitted over the network: more traffic gives a thicker line. The configuration of which network traffic it should listen to is very flexible: it is for example possible to specify that it should listen to TCP traffic from some hosts and IP traffic from other hosts. You can also define how long it should remember traffic from hosts, in order to get a good view of the current and past network traffic. EtherApe can be useful for the network administrator to get an overview of the traffic. The drawback is the graphical display, which requires a lot of libraries to be installed on the node; the graphical output also interferes with the measured traffic if the display is not locally connected to the node.
Getethers
Getethers [9] runs through all addresses on an Ethernet segment (a.b.c.1 to a.b.c.254), pings each address, and then determines the Ethernet address of that host. It produces a list of hostname/Ethernet-address pairs for all hosts on the local network, in either plain ASCII, the file format of the Excelan Lanalyzer, or the file format of the Network General Sniffer. The plain ASCII list optionally includes the vendor name of the Ethernet card in each system, to aid in the identification of unknown systems. Getethers uses a raw IP socket to generate ICMP echo requests and receive ICMP echo replies, and then examines the kernel ARP table to determine the Ethernet address of each responding system.
IPtraf
IPtraf [15] is a console-based traffic sniffer. It can gather a variety of reports, such as TCP connection packet and byte counts, interface statistics and activity indicators, TCP/UDP traffic breakdowns, and LAN station packet and byte counts, all in real time.
1.4 Tstat Overview
The lack of automatic tools able to produce statistical data from collected network traces was a major motivation to develop Tstat, a tool which, starting from standard software libraries, offers network managers and researchers important information about classic and novel performance indexes and statistical data about Internet traffic. Tstat started as an evolution of TCPtrace [49], from which it inherits most of its features: therefore, a detailed description of the measurement mechanism of TCPtrace is reported in Section 1.4.1; here, we briefly report the most relevant Tstat mechanisms, referring the reader to the information available with the Tstat distribution for further details. Tstat is able to analyze either traces captured in real time, using common PC hardware, or previously recorded traces, supporting various dump formats, such as the one supported by the libpcap library. The software assumes that its input is a trace collected on an edge node, in such a way that both data segments and ACK segments can be analyzed.

Besides the more common IP statistics, derived from the analysis of the IP header, Tstat is also able to rebuild the status of each TCP connection by looking at the TCP header in the forward and backward packet flows: incoming and outgoing packets/flows are thus identified. If the connection opening and closing are observed, the flow is marked as a complete flow, and then analyzed. To free the memory holding the status of TCP flows that are inactive, a 30-minute timer is used as a garbage collector. TCP segments that belong to flows whose opening was not recorded (either because they started before the trace, or because they were declared closed too early by the garbage collection procedure) are discarded and marked as "trash". The bidirectional TCP flow analysis allows the derivation of novel statistics (such as, for example, the congestion window size, out-of-sequence segments, duplicated segments, etc.), which are collected distinguishing between clients and servers (i.e., hosts that actively open a connection and hosts that reply to the connection request) and also between internal and external hosts (i.e., hosts located inside or outside the edge node used as measurement point). Instead of dumping each single measured datum, Tstat builds a histogram for each measured quantity, dumping every collected distribution on a periodic basis (four times per hour by default), thus performing a fair amount of statistical analysis on-line. These data sets can be used to produce either time plots, or plots aggregated over different time spans: the Web interface available at [32] allows the user to browse all the collected data, selecting the desired performance indexes and directly producing the graphs, as well as retrieving the raw data for further analysis. A total of more than 80 different histogram types are available, including both IP and TCP statistics. They range from classic measures directly available from packet headers (e.g., percentage of TCP or UDP packets, packet length distribution, TCP port distribution, ...) to advanced measures related to TCP (e.g., average congestion window, RTT estimates, out-of-sequence data, duplicated data, ...). A complete flow-level log, useful for post-processing purposes, keeps track of all the TCP flows analyzed, including all the performance indexes described above.
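To give a feeling of the kind of bookkeeping described above, and without any claim of reflecting Tstat's actual implementation, the following simplified sketch keys per-flow state on the usual address/port 4-tuple, recycles state that has been idle for more than 30 minutes, and accumulates one measured quantity into histogram bins rather than logging every sample; all names, sizes and the hashing scheme are arbitrary:

/* Simplified per-flow bookkeeping with histogram output (illustrative only,
 * far simpler than Tstat: a real tool tracks the full TCP state machine). */
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#define MAX_FLOWS  4096
#define GC_TIMEOUT (30 * 60)           /* free state after 30 minutes of inactivity */
#define NBINS      64

struct flow {
    int      in_use;
    uint32_t src_ip, dst_ip;           /* flow key: the classic 4-tuple */
    uint16_t src_port, dst_port;
    time_t   last_seen;
    uint64_t pkts, bytes;
};

static struct flow table[MAX_FLOWS];
static uint64_t pkt_len_hist[NBINS];   /* one of the many per-metric histograms */

static struct flow *lookup(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
{
    unsigned h = (sip ^ dip ^ sp ^ dp) % MAX_FLOWS;      /* naive hash, linear probing */
    for (unsigned i = 0; i < MAX_FLOWS; i++) {
        struct flow *f = &table[(h + i) % MAX_FLOWS];
        if (!f->in_use || (f->src_ip == sip && f->dst_ip == dip &&
                           f->src_port == sp && f->dst_port == dp))
            return f;
    }
    return NULL;                                         /* table full */
}

void on_segment(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp,
                unsigned len, time_t now)
{
    struct flow *f = lookup(sip, dip, sp, dp);
    if (f == NULL)
        return;
    if (f->in_use && now - f->last_seen > GC_TIMEOUT)    /* stale state: garbage-collect it */
        memset(f, 0, sizeof(*f));
    f->in_use = 1;
    f->src_ip = sip; f->dst_ip = dip; f->src_port = sp; f->dst_port = dp;
    f->last_seen = now;
    f->pkts++;
    f->bytes += len;
    /* bin the packet length instead of logging it; clamp to the last bin */
    pkt_len_hist[len >= 65536 ? NBINS - 1 : len * NBINS / 65536]++;
}

void dump_histograms(FILE *out)        /* invoked periodically, e.g. four times per hour */
{
    for (int i = 0; i < NBINS; i++)
        fprintf(out, "%d %llu\n", i, (unsigned long long)pkt_len_hist[i]);
}

Accumulating histograms on-line and dumping them periodically, rather than storing every datum, is what keeps the volume of data produced on a loaded link manageable.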
1.4.1 The TCPTrace Tool
TCPtrace is a tool written by Shawn Ostermann at Ohio University for the analysis of TCP dump files, and is maintained these days by his students and members of the Internetworking Research Group (IRG) at Ohio University. It can take as input the files produced by several popular packet-capture programs, including tcpdump, snoop, etherpeek, HP Net Metrix, and WinDump. tcptrace can produce several different types of output containing information on each connection seen, such as elapsed time, bytes and segments sent and received, retransmissions, round trip times, window advertisements, throughput, and more. It can also produce a number of graphs for further analysis. In the following, we give an overview of the tool, referring the reader to [16] for further information.
Interface Overview
When tcptrace is run trivially on a dumpfile, it generates output similar to the following:
Beluga:/Users/mani> tcptrace tigris.dmp
1 arg remaining, starting with ’tigris.dmp’
Ostermann’s tcptrace -- version 6.4.5 -- Fri Jun 13, 2003
87 packets seen, 87 TCP packets traced
elapsed wallclock time: 0:00:00.037900, 2295 pkts/sec analyzed
trace file elapsed time: 0:00:12.180796
TCP connection info:
1: pride.cs.ohiou.edu:54735 - elephus.cs.ohiou.edu:ssh (a2b)        30>   30<  (complete)
2: pride.cs.ohiou.edu:54736 - a17-112-152-32.apple.com:http (c2d)   12>   15<  (complete)
In the above example, tcptrace is run on the dumpfile tigris.dmp. The initial lines report the file that tcptrace is currently processing (tigris.dmp), the version of tcptrace being run, and when it was compiled. The next line tells that a total of 87 packets were seen in the dumpfile and that all of the 87 TCP packets (in this case) were traced. The next line reports the elapsed wallclock time, i.e., the time tcptrace took to process the dumpfile, and the average processing speed in packets per second. The following line indicates the trace file elapsed time, i.e., the duration of the packet capture, calculated as the time between the capture of the first and last packets.

The subsequent lines indicate the two TCP connections traced from the dumpfile. The first connection was seen between machine pride.cs.ohiou.edu at TCP port 54735 and elephus.cs.ohiou.edu at TCP port ssh (22). Similarly, the second connection was seen between pride.cs.ohiou.edu at TCP port 54736 and a17-112-152-32.apple.com at TCP port http (80). tcptrace uses a labeling scheme to refer to the individual connections traced: in the above example the two connections are labeled a2b and c2d respectively. For the first connection, 30 packets were seen in the a2b direction (pride.cs.ohiou.edu ==> elephus.cs.ohiou.edu) and 30 packets were seen in the b2a direction (elephus.cs.ohiou.edu ==> pride.cs.ohiou.edu). The two connections are reported as complete, indicating that the entire TCP connection was traced, i.e., the SYN and FIN segments opening and closing the connection were observed. TCP connections may also be reported as reset if the connection was closed with an RST segment, or unidirectional if traffic was seen flowing in only one direction.

The brief output shown above can also be requested explicitly with the -b option. In the above example, tcptrace looked up host names (elephus.cs.ohiou.edu, for example) and service names (http, for example), involving DNS name lookup operations. Such name and service lookups can be turned off with the -n option to make tcptrace processing faster. If you need name lookups but would rather have the short names of machines (elephus instead of elephus.cs.ohiou.edu, for example), use the -s option.
1.4.2 Collected Statistics
Output Example
tcptrace can produce detailed statistics of TCP connections from dumpfiles when given the -l (long output) option, which produces output similar to the one shown in the following example.
Beluga:/Users/mani> tcptrace -l malus.dmp.gz
1 arg remaining, starting with 'malus.dmp.gz'
Ostermann's tcptrace -- version 6.4.6 -- Tue Jul 1, 2003
32 packets seen, 32 TCP packets traced
elapsed wallclock time: 0:00:00.037948, 843 pkts/sec analyzed
trace file elapsed time: 0:00:00.404427
TCP connection info:
1 TCP connection traced:
TCP connection 1:
     host a:        elephus.cs.ohiou.edu:59518
     host b:        a17-112-152-32.apple.com:http
     complete conn: yes
     first packet:  Thu Jul 10 19:12:54.914101 2003
     last packet:   Thu Jul 10 19:12:55.318528 2003
     elapsed time:  0:00:00.404427
     total packets: 32
     filename:      malus.dmp.gz
   a->b:                                  b->a:
     total packets:        16               total packets:        16
     ack pkts sent:        15               ack pkts sent:        16
     pure acks sent:       13               pure acks sent:        2
     sack pkts sent:        0               sack pkts sent:        0
     dsack pkts sent:       0               dsack pkts sent:       0
     max sack blks/ack:     0               max sack blks/ack:     0
     unique bytes sent:   450               unique bytes sent: 18182
     actual data pkts:      1               actual data pkts:     13
     actual data bytes:   450               actual data bytes: 18182
     rexmt data pkts:       0               rexmt data pkts:       0
     rexmt data bytes:      0               rexmt data bytes:      0
     zwnd probe pkts:       0               zwnd probe pkts:       0
     zwnd probe bytes:      0               zwnd probe bytes:      0
     outoforder pkts:       0               outoforder pkts:       0
     pushed data pkts:      1               pushed data pkts:      1
     SYN/FIN pkts sent:   1/1               SYN/FIN pkts sent:   1/1
     req 1323 ws/ts:      Y/Y               req 1323 ws/ts:      Y/Y
     adv wind scale:        0               adv wind scale:        0
     req sack:              Y               req sack:              N
     sacks sent:            0               sacks sent:            0
     urgent data pkts:      0 pkts          urgent data pkts:      0 pkts
     urgent data bytes:     0 bytes         urgent data bytes:     0 bytes
     mss requested:      1460 bytes         mss requested:      1460 bytes
     max segm size:       450 bytes         max segm size:      1448 bytes
     min segm size:       450 bytes         min segm size:       806 bytes
     avg segm size:       449 bytes         avg segm size:      1398 bytes
     max win adv:       40544 bytes         max win adv:       33304 bytes
     min win adv:        5840 bytes         min win adv:       33304 bytes
     zero win adv:          0 times         zero win adv:          0 times
     avg win adv:       23174 bytes         avg win adv:       33304 bytes
     initial window:      450 bytes         initial window:     1448 bytes
     initial window:        1 pkts          initial window:        1 pkts
     ttl stream length:   450 bytes         ttl stream length: 18182 bytes
     missed data:           0 bytes         missed data:           0 bytes
     truncated data:      420 bytes         truncated data:    17792 bytes
     truncated packets:     1 pkts          truncated packets:    13 pkts
     data xmit time:    0.000 secs          data xmit time:    0.149 secs
     idletime max:      103.7 ms            idletime max:       99.9 ms
     throughput:         1113 Bps           throughput:        44957 Bps
The initial lines of the output are similar to the brief output explained earlier. The following lines indicate the hosts involved in the connection and their TCP port numbers:

     host a:        elephus.cs.ohiou.edu:59518
     host b:        a17-112-152-32.apple.com:http
The following lines indicate that the connection was seen to be complete, i.e., the connection was traced in its entirety, with the SYN and FIN segments of the connection observed in the dumpfile.
The time at which the first and last packets of the connection were captured is reported, followed
by the lifetime of the connection, and the number of packets seen. Then, the filename currently
being processed is listed, followed by the multiple TCP statistics for the forward (a2b) and the
reverse (b2a) directions.
Output Statistics
In the following we explain the TCP parameter statistics for the a2b direction; a similar explanation holds for the b2a direction.
- total packets: The total number of packets seen.
- ack pkts sent: The total number of ack packets seen (TCP segments seen with the ACK bit set).
- pure acks sent: The total number of ack packets seen that were not piggy-backed with data (just the TCP header and no TCP data payload) and did not have any of the SYN/FIN/RST flags set.
- sack pkts sent: The total number of ack packets seen carrying TCP SACK [20] blocks.
- dsack pkts sent: The total number of sack packets seen that carried duplicate SACK (D-SACK) [22] blocks.
- max sack blks/ack: The maximum number of sack blocks seen in any sack packet.
- unique bytes sent: The number of unique bytes sent, i.e., the total bytes of data sent excluding retransmitted bytes and any bytes sent doing window probing.
- actual data pkts: The count of all the packets with at least a byte of TCP data payload.
- actual data bytes: The total bytes of data seen. Note that this includes bytes from retransmissions / window probe packets, if any.
- rexmt data pkts: The count of all the packets found to be retransmissions.
- rexmt data bytes: The total bytes of data found in the retransmitted packets.
- zwnd probe pkts: The count of all the window probe packets seen. (Window probe packets are typically sent by a sender when the receiver last advertised a zero receive window, to see if the window has opened up now.)
- zwnd probe bytes: The total bytes of data sent in the window probe packets.
- outoforder pkts: The count of all the packets that were seen to arrive out of order.
- pushed data pkts: The count of all the packets seen with the PUSH bit set in the TCP header.
- SYN/FIN pkts sent: The count of all the packets seen with the SYN/FIN bits set in the TCP header, respectively.
- req 1323 ws/ts: If the endpoint requested the Window Scaling/Time Stamp options as specified in RFC 1323 [25], a 'Y' is printed in the respective field; if the option was not requested, an 'N' is printed. For example, an "N/Y" in this field means that the window-scaling option was not specified, while the Time-stamp option was specified in the SYN segment. Note that since the Window Scaling option is sent only in SYN packets, this field is meaningful only if the connection was captured fully in the dumpfile, so as to include the SYN packets.
- adv wind scale: The window scaling factor used. Again, this field is valid only if the connection was captured fully to include the SYN packets. Since the connection uses window scaling if and only if both sides requested window scaling [25], this field is reset to 0 (even if a window scale was requested in the SYN packet for this direction) if the SYN packet in the reverse direction did not carry the window scale option.
- req sack: If the end-point sent a SACK-permitted option in the SYN packet opening the connection, a 'Y' is printed; otherwise 'N' is printed.
- sacks sent: The total number of ACK packets seen carrying SACK information.
- urgent data pkts: The total number of packets with the URG bit turned on in the TCP header.
- urgent data bytes: The total bytes of urgent data sent. This field is calculated by summing the urgent pointer offset values found in packets having the URG bit set in the TCP header.
- mss requested: The Maximum Segment Size (MSS) requested as a TCP option in the SYN packet opening the connection.
- max segm size: The maximum segment size observed during the lifetime of the connection.
- min segm size: The minimum segment size observed during the lifetime of the connection.
- avg segm size: The average segment size observed during the lifetime of the connection, calculated as the value reported in the actual data bytes field divided by the actual data pkts reported.
- max win adv: The maximum window advertisement seen. If the connection is using window scaling (both sides negotiated window scaling during the opening of the connection), this is the maximum window-scaled advertisement seen in the connection. For a connection using window scaling, both SYN segments opening the connection have to be captured in the dumpfile for this and the following window statistics to be accurate.
- min win adv: The minimum window advertisement seen. This is the minimum window-scaled advertisement seen if both sides negotiated window scaling.
- zero win adv: The number of times a zero receive window was advertised.
- avg win adv: The average window advertisement seen, calculated as the sum of all window advertisements divided by the total number of packets seen. If the connection endpoints negotiated window scaling, this average is calculated as the sum of all window-scaled advertisements divided by the number of window-scaled packets seen. Note that in the window-scaled case the window advertisements in the SYN packets are excluded, since the SYN packets themselves cannot have their window advertisements scaled, as per RFC 1323 [25].
- initial window: The total number of bytes sent in the initial window, i.e., the number of bytes seen in the initial flight of data before receiving the first ack packet from the other endpoint. Note that the ack packet from the other endpoint is the first ack acknowledging some data (the ACKs that are part of the 3-way handshake do not count), and any retransmitted packets in this stage are excluded.
- initial window: The total number of segments (packets) sent in the initial window, as explained above.
- ttl stream length: The Theoretical Stream Length. This is calculated as the difference between the sequence numbers of the SYN and FIN packets, giving the length of the data stream seen. Note that this calculation is aware of sequence space wrap-arounds, and is printed only if the connection was complete (both the SYN and FIN packets were seen).
- missed data: The missed data, calculated as the difference between the ttl stream length and the unique bytes sent. If the connection was not complete, this calculation is invalid and an "NA" (Not Available) is printed.
- truncated data: The truncated data, calculated as the total bytes of data truncated during packet capture. For example, with tcpdump the snaplen can be set to 64 (with the -s option) so that just the headers of the packet (assuming there are no options) are captured, truncating most of the packet data. In an Ethernet with maximum segment size of 1500 bytes, this would amount to truncated data of 1500 - 64 = 1436 bytes for a packet.
- truncated packets: The total number of packets truncated as explained above.
- data xmit time: Total data transmit time, calculated as the difference between the times of capture of the first and last packets carrying non-zero TCP data payload.
- idletime max: Maximum idle time, calculated as the maximum time between consecutive packets seen in the direction.
- throughput: The average throughput, calculated as the unique bytes sent divided by the elapsed time, i.e., the value reported in the unique bytes sent field divided by the elapsed time (the time difference between the capture of the first and last packets in the direction).
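To make the relation among some of these counters concrete, the short Python sketch below (illustrative only, not part of tcptrace; the record format and field names are assumptions) derives actual data pkts, actual data bytes, avg segm size and throughput for one direction of a connection from a list of (timestamp, payload length) records.

# Illustrative sketch: recompute a few per-direction tcptrace-style counters.
# Input: (timestamp_in_seconds, payload_bytes) records for one direction;
# for simplicity, retransmitted bytes are assumed to be already excluded.

def direction_stats(packets):
    data = [(t, n) for (t, n) in packets if n > 0]           # actual data pkts only
    actual_data_pkts = len(data)
    actual_data_bytes = sum(n for _, n in data)
    avg_segm_size = actual_data_bytes // actual_data_pkts if data else 0
    elapsed = packets[-1][0] - packets[0][0]                  # first to last packet
    throughput = actual_data_bytes / elapsed if elapsed > 0 else 0.0   # bytes/s
    return {"actual data pkts": actual_data_pkts,
            "actual data bytes": actual_data_bytes,
            "avg segm size": avg_segm_size,
            "throughput (Bps)": round(throughput)}

if __name__ == "__main__":
    # One 450-byte data segment followed by a pure ACK, about 0.4 s later (toy numbers).
    print(direction_stats([(0.0, 450), (0.404, 0)]))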
RTT Stats
RTT (Round-Trip Time) statistics are generated when the -r option is specified along with the
-l option. The following fields of output are produced along with the output generated by the -l
option.
surya:/home/mani/tcptrace-manual> tcptrace -lr indica.dmp.gz
1 arg remaining, starting with 'indica.dmp.gz'
Ostermann's tcptrace -- version 6.4.5 -- Fri Jun 13, 2003
153 packets seen, 153 TCP packets traced
elapsed wallclock time: 0:00:00.128422, 1191 pkts/sec analyzed
trace file elapsed time: 0:00:19.092645
TCP connection info:
1 TCP connection traced:
TCP connection 1:
     host a:        192.168.0.70:32791
     host b:        webco.ent.ohiou.edu:23
     complete conn: yes
     first packet:  Thu Aug 29 18:54:54.782937 2002
     last packet:   Thu Aug 29 18:55:13.875583 2002
     elapsed time:  0:00:19.092645
     total packets: 153
     filename:      indica.dmp.gz
   a->b:                                  b->a:
     total packets:        91               total packets:        62
     ...                                    ...
     throughput:           10 Bps           throughput:           94 Bps

     RTT samples:          48               RTT samples:          47
     RTT min:            74.1 ms            RTT min:             0.1 ms
     RTT max:           204.0 ms            RTT max:            38.8 ms
     RTT avg:           108.6 ms            RTT avg:             8.1 ms
     RTT stdev:          44.2 ms            RTT stdev:          14.7 ms

     RTT from 3WHS:      75.0 ms            RTT from 3WHS:       0.1 ms

     RTT full_sz smpls:     1               RTT full_sz smpls:     1
     RTT full_sz min:    79.5 ms            RTT full_sz min:     0.1 ms
     RTT full_sz max:    79.5 ms            RTT full_sz max:     0.1 ms
     RTT full_sz avg:    79.5 ms            RTT full_sz avg:     0.1 ms
     RTT full_sz stdev:   0.0 ms            RTT full_sz stdev:   0.0 ms

     post-loss acks:        0               post-loss acks:        0

     For the following 5 RTT statistics, only ACKs for
     multiply-transmitted segments (ambiguous ACKs) were
     considered. Times are taken from the last instance
     of a segment.

     ambiguous acks:        1               ambiguous acks:        0
     RTT min (last):     76.3 ms            RTT min (last):      0.0 ms
     RTT max (last):     76.3 ms            RTT max (last):      0.0 ms
     RTT avg (last):     76.3 ms            RTT avg (last):      0.0 ms
     RTT sdv (last):      0.0 ms            RTT sdv (last):      0.0 ms
     segs cum acked:        0               segs cum acked:        0
     duplicate acks:        0               duplicate acks:        0
     triple dupacks:        0               triple dupacks:        0
     max # retrans:         1               max # retrans:         0
     min retr time:     380.2 ms            min retr time:       0.0 ms
     max retr time:     380.2 ms            max retr time:       0.0 ms
     avg retr time:     380.2 ms            avg retr time:       0.0 ms
     sdv retr time:       0.0 ms            sdv retr time:       0.0 ms
- RTT samples: The total number of Round-Trip Time (RTT) samples found. tcptrace is pretty smart about choosing only valid RTT samples. An RTT sample is found only if an ack packet is received from the other endpoint for a previously transmitted packet such that the acknowledgment value is 1 greater than the last sequence number of the packet. Further, it is required that the packet being acknowledged was not retransmitted, and that no packets that came before it in the sequence space were retransmitted after the packet was transmitted. Note: the former condition invalidates RTT samples due to the retransmission ambiguity problem, and the latter condition invalidates RTT samples because the ack packet could be cumulatively acknowledging the retransmitted packet, and not necessarily ack-ing the packet in question.
- RTT min: The minimum RTT sample seen.
- RTT max: The maximum RTT sample seen.
- RTT avg: The average value of RTT found, calculated straightforwardly as the sum of all the RTT values found divided by the total number of RTT samples.
- RTT stdev: The standard deviation of the RTT samples.
- RTT from 3WHS: The RTT value calculated from the TCP 3-Way Hand-Shake (connection opening) [18], assuming that the SYN packets of the connection were captured.
- RTT full_sz smpls: The total number of full-size RTT samples, calculated from the RTT samples of full-size segments. Full-size segments are defined to be the segments of the largest size seen in the connection.
- RTT full_sz min: The minimum full-size RTT sample.
- RTT full_sz max: The maximum full-size RTT sample.
- RTT full_sz avg: The average full-size RTT sample.
- RTT full_sz stdev: The standard deviation of the full-size RTT samples.
- post-loss acks: The total number of ack packets received after losses were detected and a retransmission occurred. More precisely, a post-loss ack is found to occur when an ack packet acknowledges a packet sent (the acknowledgment value in the ack packet is 1 greater than the packet's last sequence number), and at least one packet occurring before the acknowledged packet was retransmitted later. In other words, the ack packet is received after we observed a (perceived) loss event and are recovering from it.
- ambiguous acks, RTT min/max/avg/sdv (last): These fields are printed only if there was at least one ack received that was ambiguous due to the retransmission ambiguity problem, i.e., the segment being ack-ed was retransmitted and it is impossible to determine whether the ack is for the original or the retransmitted packet. Note that these samples are not considered in the RTT samples explained above. These statistics are calculated from the time of capture of the last transmitted instance of the segment: ambiguous acks is the total number of such ambiguous acks seen, and the following RTT min, RTT max, RTT avg, RTT sdv fields report the minimum, maximum, average, and standard deviation, respectively, of the RTT samples calculated from ambiguous acks.
- segs cum acked: The count of the number of segments that were cumulatively acknowledged and not directly acknowledged.
- duplicate acks: The total number of duplicate acknowledgments received. An ack packet is found to be a duplicate ack based on the definition used by the 4.4 BSD Lite TCP stack [26]:
  - the ack packet has the biggest ACK (acknowledgment number) ever seen;
  - the ack is pure (carries zero TCP data payload);
  - the advertised window carried in the ack packet does not change from the last window advertisement;
  - there is some outstanding data.
- triple dupacks: The total number of triple duplicate acknowledgments received (three duplicate acknowledgments acknowledging the same segment), a condition commonly used to trigger the fast-retransmit/fast-recovery phase of TCP.
- max # retrans: The maximum number of retransmissions seen for any segment during the lifetime of the connection.
- min retr time: The minimum time seen between any two (re)transmissions of a segment amongst all the retransmissions seen.
- max retr time: The maximum time seen between any two (re)transmissions of a segment.
- avg retr time: The average time seen between any two (re)transmissions of a segment, calculated from all the retransmissions.
- sdv retr time: The standard deviation of the retransmission-time samples obtained from all the retransmissions.
The raw RTT samples found can also be dumped into data files with the -Z option, as in

tcptrace -Z file.dmp

This generates files of the form a2b_rttraw.dat and b2a_rttraw.dat (for both directions of the first TCP connection traced), c2d_rttraw.dat and d2c_rttraw.dat (for the second TCP connection traced), etc., in the working directory. Each of the data files contains lines of the form

seq# rtt

where seq# is the sequence number of the first byte of the segment being acknowledged (by the ack packet that contributed this RTT sample) and rtt is the RTT value in milliseconds of the sample. Note that only valid RTT samples (as counted in the RTT samples field listed above) are dumped.
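Such a dump lends itself to simple post-processing; a minimal Python sketch (the file name follows the pattern described above; the two-column format is assumed as stated) is:

# Summarize an a2b_rttraw.dat-style file: one "seq# rtt" pair per line,
# with the RTT expressed in milliseconds.
import statistics
import sys

def summarize_rttraw(path):
    rtts = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2:
                rtts.append(float(fields[1]))
    if not rtts:
        return None
    return {"samples": len(rtts),
            "min": min(rtts),
            "max": max(rtts),
            "avg": statistics.mean(rtts),
            "stdev": statistics.pstdev(rtts)}

if __name__ == "__main__":
    print(summarize_rttraw(sys.argv[1]))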
CWND Stats
tcptrace reports statistics on the estimated congestion window with the -W option when used in
conjunction with the -l option. Since there is no direct way to determine the congestion window at
the TCP sender, the outstanding unacknowledged data is used to estimate the congestion window.
The 4 new statistics produced by the -W option in addition to the detailed statistics reported due to
the -l option, are explained below.
surya:/home/mani/tcptrace-manual> tcptrace -lW malus.dmp.gz
1 arg remaining, starting with 'malus.dmp.gz'
Ostermann's tcptrace -- version 6.4.6 -- Tue Jul 1, 2003
32 packets seen, 32 TCP packets traced
elapsed wallclock time: 0:00:00.026658, 1200 pkts/sec analyzed
trace file elapsed time: 0:00:00.404427
TCP connection info:
1 TCP connection traced:
TCP connection 1:
     host a:        elephus.cs.ohiou.edu:59518
     host b:        A17-112-152-32.apple.com:80
     complete conn: yes
     first packet:  Thu Jul 10 19:12:54.914101 2003
     last packet:   Thu Jul 10 19:12:55.318528 2003
     elapsed time:  0:00:00.404427
     total packets: 32
     filename:      malus.dmp.gz
   a->b:                                  b->a:
     total packets:        16               total packets:        16
     ...                                    ...
     avg win adv:       22091 bytes         avg win adv:       33304 bytes
     max owin:            451 bytes         max owin:           1449 bytes
     min non-zero owin:     1 bytes         min non-zero owin:     1 bytes
     avg owin:             31 bytes         avg owin:           1213 bytes
     wavg owin:           113 bytes         wavg owin:           682 bytes
     initial window:      450 bytes         initial window:     1448 bytes
     ...                                    ...
     throughput:         1113 Bps           throughput:        44957 Bps
- max owin: The maximum outstanding unacknowledged data (in bytes) seen at any point in time in the lifetime of the connection.
- min non-zero owin: The minimum (non-zero) outstanding unacknowledged data (in bytes) seen.
- avg owin: The average outstanding unacknowledged data (in bytes), calculated from the sum of all the outstanding data byte samples divided by the total number of samples.
- wavg owin: The weighted average of the outstanding unacknowledged data seen. For example, if the outstanding data (odata) was 500 bytes for the first 0.1 seconds, 1000 bytes for the next 1 second, and 2000 bytes for the last 0.1 seconds of a connection that lasted 1.2 seconds, wavg owin = ((500 x 0.1) + (1000 x 1) + (2000 x 0.1)) / 1.2 = 1041.67 bytes, an estimate closer to 1000 bytes, which was the outstanding data for most of the lifetime of the connection. Note that the straightforward average reported in avg owin would have been (500 + 1000 + 2000) / 1.2 = 2916.67 bytes, a value less indicative of the outstanding data observed during most of the connection's lifetime.
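The time-weighted average is easy to reproduce in a few lines; the Python sketch below recomputes the numerical example above from (duration, outstanding bytes) intervals (an illustrative calculation, not tcptrace code).

# Time-weighted average of the outstanding data, as in the wavg owin example:
# 500 B for 0.1 s, 1000 B for 1 s and 2000 B for 0.1 s of a 1.2 s connection.

def wavg_owin(intervals):
    """intervals: list of (duration_seconds, outstanding_bytes) pairs."""
    total_time = sum(d for d, _ in intervals)
    return sum(d * owin for d, owin in intervals) / total_time

if __name__ == "__main__":
    intervals = [(0.1, 500), (1.0, 1000), (0.1, 2000)]
    print(round(wavg_owin(intervals), 2))        # 1041.67 bytes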
1.4.3 Output Overview
This section provides a very brief overview of the Tstat output; since a number of the reported quantities have already been explained in the previous sections, being inherited from TCPtrace, this section only reports the list of items known to Tstat.
Col  Label             Description
1    Client IP addr    IP address of the client
2    Client TCP port   TCP port of the client
3    packets           Total number of packets observed from the client
4    RST sent          0 if no RST segment has been sent by the client, 1 otherwise
5    ACK sent          Number of segments with the ACK field set to 1
6    PURE ACK sent     Number of segments with the ACK field set to 1 and no data
7    unique bytes      Number of bytes sent in the payload
8    data pkts         Number of segments with payload
9    data bytes        Number of bytes transmitted in the payload, including retransmissions
10   rexmit pkts       Number of retransmitted segments
11   rexmit bytes      Number of retransmitted bytes
12   out seq pkts      Number of segments observed out of sequence
13   SYN count         Number of SYN segments observed (including rtx)
14   FIN count         Number of FIN segments observed (including rtx)
15   RFC1323 ws        Window scale option sent [boolean]
16   RFC1323 ts        Timestamp option sent [boolean]
17   window scale      Scaling values negotiated [scale factor]
18   SACK req          SACK option set [boolean]
19   SACK sent         Number of SACK messages sent
20   MSS               MSS declared [bytes]
21   max seg size      Maximum segment size observed [bytes]
22   min seg size      Minimum segment size observed [bytes]
23   win max           Maximum receiver window announced (already scaled by the window scale factor) [bytes]
24   win min           Minimum receiver window announced (already scaled by the window scale factor) [bytes]
25   win zero          Total number of segments declaring zero as receiver window
26   cwin max          Maximum in-flight size (= largest sequence number - corresponding last ACK) [bytes]
27   cwin min          Minimum in-flight size [bytes]
28   initial cwin      First in-flight size (total unacked bytes sent before receiving the first ACK) [bytes]
29   Average rtt       Average RTT (time elapsed between the data segment and the corresponding ACK) [ms]
30   rtt min           Minimum RTT observed during connection lifetime [ms]
31   rtt max           Maximum RTT observed during connection lifetime [ms]
32   Stdev rtt         Standard deviation of the RTT [ms]
33   rtt count         Number of valid RTT observations
34   rtx RTO           Number of retransmitted segments due to timeout expiration
35   rtx FR            Number of retransmitted segments due to Fast Retransmit (three dup-ack)
36   reordering        Number of packet reorderings observed
37   net dup           Number of network duplicates observed
38   unknown           Number of segments not in sequence (or duplicates not classified as specific events)
39   flow control      Number of retransmitted segments to probe the receiver window
40   unnece rtx RTO    Number of unnecessary transmissions following a timeout expiration
41   unnece rtx FR     Number of unnecessary transmissions following a fast retransmit
42   != SYN seqno      Set to 1 if the eventual retransmitted SYN segments have different initial seqno
43   Server IP addr    IP address of the server
44   Server TCP port   TCP port of the server
45   packets           Total number of packets observed from the server
46   RST sent          0 if no RST segment has been sent by the server, 1 otherwise
47   ACK sent          Number of segments with the ACK field set to 1
48   PURE ACK sent     Number of segments with the ACK field set to 1 and no data
49   unique bytes      Number of bytes sent in the payload
50   data pkts         Number of segments with payload
51   data bytes        Number of bytes transmitted in the payload, including retransmissions
52   rexmit pkts       Number of retransmitted segments
53   rexmit bytes      Number of retransmitted bytes
54   out seq pkts      Number of segments observed out of sequence
55   SYN count         Number of SYN segments observed (including rtx)
56   FIN count         Number of FIN segments observed (including rtx)
57   RFC1323 ws        Window scale option sent [boolean]
58   RFC1323 ts        Timestamp option sent [boolean]
59   window scale      Scaling values negotiated [scale factor]
60   SACK req          SACK option set [boolean]
61   SACK sent         Number of SACK messages sent
62   MSS               MSS declared [bytes]
63   max seg size      Maximum segment size observed [bytes]
64   min seg size      Minimum segment size observed [bytes]
65   win max           Maximum receiver window announced (already scaled by the window scale factor) [bytes]
66   win min           Minimum receiver window announced (already scaled by the window scale factor) [bytes]
67   win zero          Total number of segments declaring zero as receiver window
68   cwin max          Maximum in-flight size (= largest sequence number - corresponding last ACK) [bytes]
69   cwin min          Minimum in-flight size [bytes]
70   initial cwin      First in-flight size, or total number of unack-ed bytes sent before receiving the first ACK segment [bytes]
71   Average rtt       Average RTT (time elapsed between the data segment and the corresponding ACK) [ms]
72   rtt min           Minimum RTT observed during connection lifetime [ms]
73   rtt max           Maximum RTT observed during connection lifetime [ms]
74   Stdev rtt         Standard deviation of the RTT [ms]
75   rtt count         Number of valid RTT observations
76   rtx RTO           Number of retransmitted segments due to timeout expiration
77   rtx FR            Number of retransmitted segments due to Fast Retransmit (three dup-ack)
78   reordering        Number of packet reorderings observed
79   net dup           Number of network duplicates observed
80   unknown           Number of segments not in sequence (or duplicates not classified as specific events)
81   flow control      Number of retransmitted segments to probe the receiver window
82   unnece rtx RTO    Number of unnecessary transmissions following a timeout expiration
83   unnece rtx FR     Number of unnecessary transmissions following a fast retransmit
84   != SYN seqno      Set to 1 if the eventual retransmitted SYN segments have different initial seqno
85   Completion time   Flow duration from first packet to last packet [ms]
86   First time        Flow first packet since first segment ever [ms]
87   Last time         Flow last segment since first segment ever [ms]
88   C first payload   Client first segment with payload since the first flow segment [ms]
89   S first payload   Server first segment with payload since the first flow segment [ms]
90   C last payload    Client last segment with payload since the first flow segment [ms]
91   S last payload    Server last segment with payload since the first flow segment [ms]
92   Internal          Bool set to 1 if the client has internal IP

Table 1.2: Tstat Output: Log Field Description
Basically, there are two currently supported kinds of output (for on-going extensions we refer the reader to Section 7.1.2). The first is a flow-level log, which tracks the statistics reported in Table 1.2; the table is partitioned into three macro-blocks, reflecting the flow direction: client to server, server to client, and common to both client and server. Notice that, since the log is created on-line, the flows are sorted by their finish time. The second kind of output is a periodical histogram dump, in which some of the aforementioned statistics are tracked in greater detail for all the aggregated flows; the configurable dump interval is, by default, 15 minutes.
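As an example of post-processing of the flow-level log, the Python sketch below reads a whitespace-separated log laid out as in Table 1.2 (the file path, the separator and the use of a plain text file are assumptions) and extracts the completion time of every flow (column 85).

# Minimal reader for a Tstat-style flow log: one flow per line, columns
# numbered as in Table 1.2 (1-based). Column 85 is the completion time [ms].
import sys

COMPLETION_TIME_COL = 85        # 1-based index, see Table 1.2

def completion_times(path):
    times_ms = []
    with open(path) as log:
        for line in log:
            cols = line.split()
            if len(cols) >= COMPLETION_TIME_COL:
                times_ms.append(float(cols[COMPLETION_TIME_COL - 1]))
    return times_ms

if __name__ == "__main__":
    t = sorted(completion_times(sys.argv[1]))
    if t:
        print(f"{len(t)} flows, median completion time {t[len(t) // 2]:.1f} ms")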
1.4.4 Usage Overview
For completeness, we report here a simple and brief overview of the calling syntax of Tstat; for a more detailed description we refer the reader to the manual available with the standard distribution, while the next chapter provides an exhaustive example of the analyses allowed by the tool.
Usage:
   tstat [-Nfile] [-d n] [-lmhtuvw] [-sdir]
         [-iinterface] [-ffilterfile] <file1 file2>
         [-dag device_name device_name ...]

Where:
   -Nfile:  specify the file name which contains the description of the
            internal networks. This file must contain the subnets that will
            be considered as 'internal' during the analysis. Each subnet
            must be specified using the IP address on the first line and the
            netmask on the second one:
               130.192.0.0
               255.255.0.0
               193.204.134.0
               255.255.255.0
   -d:      increase debug level (repeat to increase debug level)
   -h:      print this help and exit
   -H:      print internal histogram names and definitions
   -t:      print ticks showing the trace analysis progress
   -u:      do not trace UDP packets
   -v:      print version and exit
   -w:      print *lots* of warnings
   -m:      enable multi-threaded engine (useful for live capture)
   -l:      enable live capture using libpcap
   -iifc:   specify the interface to be used to capture traffic
   -ffile:  specify the libpcap filter file (see tcpdump)
   -dag:    enable live capture using Endace DAG cards; the default
            device for capture is /dev/dag0
   -S:      pure rrd-engine for everlasting scalability (do not create
            histograms and log files)
   -Rconf:  specify the configuration file for integration with
            RRDTools (see README.rrdtools for further information)
   -rpath:  path to use to create/update the RRDTool database: this
            should better be outside the directory tree and should
            be accessible from the Web server
   -sdir:   put the trace analysis into the specified directory
   file:    trace file to be analyzed; use 'stdin' to read from
            standard input (e.g., when piped)
1.5 The Network Scenario
The Politecnico di Torino LAN is connected to the Internet through the Italian Academic and Research Network (GARR) [27]. This section briefly overviews this network in its current setup (Section 1.5.1), proposes some simple statistics (Section 1.5.2), and anticipates its future evolution (Section 1.5.3).
Before entering into further details, let us point out that, although the networking scenario is common to all the research works described in the following chapters, it would be cumbersome to introduce a common notation; along the same line, although the data sets used for the different works are somewhat similar, there are both minor and critical differences among them. Therefore, while we provide here a comprehensive statistical study of the typical traffic pattern observed on our network, we will reintroduce the notation and describe the setup of each work separately, at the risk of some redundancy.
1.5.1 The GARR-B Architecture
The currently active network service is named "GARR-B Phase 4a: GARR BroadBand Network" and it deploys the GARR-B Project [28]. GARR-B supports the TCP/IP transport service with Managed Bandwidth Service (MBS), which is also available for the experimentation of new services, on top of a backbone running at 155 Mbps and 2.5 Gbps. GARR-B replaces the previous GARR-2 service, which was discontinued in late April 1999. Figure 1.2 depicts the evolution of the GARR-B network from April 1999 to May 2004, and will be used as a reference in the following.
The GARR-B network is widely interconnected with the other European and worldwide research networks (including Internet2) via a 2.5 Gbps link to the GEANT [30] backbone, and with the commercial Internet via a 2.5 Gbps link (Global Crossing - GX) and a 622 Mbps link (KPNQwest - KQ).
The route from the Politecnico di Torino toward an over-ocean Internet machine is as follows:
- from the Politecnico LAN to the Torino (TO) access POP, over an ATM AAL-5 OC5 link at 34 Mbps (not shown on the map);
- from the TO access POP to the Milano (MI) router (of the fully meshed backbone among MI, BO, RM, NA), over a 155 Mbps channel (red on the map);
- from the MI backbone router to:
  - MI-GEANT, 2.5 Gbps (violet on the map)
  - MI-GX, 2.5 Gbps (violet)
  - any other backbone router, either at 2.5 Gbps or 155 Mbps (respectively, blue and red)
Figure 1.2: Evolution of the GARR-B Network: from April 1999 to May 2004
1.5.2 The GARR-B Statistics
Abstracting from the network geography, Figure 1.3 represents the interconnection of functional network elements, where edges are drawn using different colors depending on the traffic volume exchanged. Let us now consider the Torino Metropolitan Area Network (MAN) portion of the whole GARR-B network, shown in the zoomed inset of Figure 1.3: it can be seen that our institution is the most active network element.
Focusing on Politecnico di Torino, we can observe input versus output load statistics over different timescales. Since the measures are taken at the GARR side, the input direction is to be interpreted from the POP perspective. Figure 1.4 reports yearly to monthly and weekly to daily timescales.
The graphs shown in Figure 1.4 were created by the GARR using the Multi Router Traffic Grapher (MRTG) [29]. MRTG is a tool to monitor the traffic load on network links, which generates HTML pages containing graphical images that provide a live visual representation of this traffic. As can be gathered, the output traffic represents the predominant traffic portion; considering the POP perspective, the traffic streams entering the Politecnico di Torino have to be considered as output streams. Therefore, the Politecnico LAN is principally a client network, where clients contact external servers (e.g., for file downloads or other services). Similarly, Table 1.3 reports the numerical results referring to Figure 1.4; instantaneous values were sampled on a Tuesday of October 2004, nearly at tea time (probably, today's 5 o'clock tea will be an excellent Fortnum and Mason's Earl Grey).
Figure 1.3: Logical Map of the Italian GARR-B Network, with a Zoomed View of the Torino MAN
                                                Bandwidth, Mbps (Utilization %)
Graph                          Direction    Max             Average         Instantaneous
'Daily' (5 Minute Average)     In           17.8 (63.7%)    4.75 (17.0%)    9.71 (34.7%)
                               Out          28.0 (100.0%)   11.0 (39.3%)    27.7 (98.8%)
'Weekly' (30 Minute Average)   In           18.0 (64.4%)    5.27 (18.8%)    8.94 (31.9%)
                               Out          27.7 (98.8%)    9.56 (34.2%)    27.3 (97.6%)
'Monthly' (2 Hour Average)     In           16.8 (60.0%)    4.87 (17.4%)    11.4 (40.6%)
                               Out          27.5 (98.0%)    8.25 (29.5%)    27.0 (96.3%)
'Yearly' (1 Day Average)       In           15.6 (55.8%)    5.14 (18.4%)    4.95 (17.7%)
                               Out          16.1 (57.3%)    7.75 (27.7%)    12.9 (46.2%)

Table 1.3: Input versus Output Bandwidth and Utilization Statistics over Different Time-Scales: Yearly to Monthly and Weekly to Daily
Figure 1.4: Input versus Output Load Statistics over Different Time-Scales: Yearly to Monthly and Weekly to Daily

1.5.3 The GARR-G Project
The GARR-G project defines a program of activities to develop and provision the next generation of the data transmission network infrastructure for the Italian academic and research community, named GARR-Giganet.
GARR-Giganet will be the evolution of the present GARR-B network. GARR-Giganet will continue to provide network connectivity among all the Italian academic and research institutions and with the rest of the world, and will have an active role in the development of the information society.
The GARR-Giganet network will provide connectivity and services while staying at the state of the art in transmission technology, supporting IPv6 addressing and advanced Quality of Service (QoS) policies. The network will provide highly distributed access. The backbone will be based on a high speed optical transport, using point-to-point "lambdas" with a link speed not lower than 2.5 Gbit/s. The development of metropolitan networks connected to GARR-GigaPoPs will bring high speed advanced services to GARR users connected to MANs.
The international connections to European networks (GEANT [30]) and to other international networks (research networks like Internet2 [31] as well as general-interest ones) are part of the basic infrastructure and will also be connected at high speed. Meanwhile, a pilot network, named GARR-G Pilot, has been installed and has been operational since the beginning of 2001. It is based on optical "lambda" links at 2.5 Gb/s and spans well over half of the Italian territory: the Pilot provides solid ground for the engineering of GARR-Giganet.
Chapter 2
The Measurement Setup

THIS chapter introduces Tstat [34], a new tool for the collection and statistical analysis of TCP/IP traffic, able to infer TCP connection status from traces. Discussing its use, we present some of the performance figures that can be obtained and the insight that such figures can give on TCP/IP protocols and the Internet.
While field measures have always been the starting point for network planning and dimensioning, their statistical analysis beyond simple traffic volume estimation is not so common. Analyzing Internet traffic is difficult not only because of the large number of performance figures that can be devised in TCP/IP networks, but also because many performance figures can be derived only if bidirectional traffic is jointly considered. Tstat automatically correlates incoming and outgoing flows and derives about 80 different performance indices at both the IP and the TCP level, allowing a very deep insight into the network performance.
Moreover, while standard performance measures, such as flow dimensions, traffic distribution, etc., remain at the base of traffic evaluation, more sophisticated indices, like the out-of-order probability and the gap dimension in TCP connections, obtained through data correlation between the incoming and outgoing traffic, give reliable estimates of the network performance also from the user perspective. Several of these indices are discussed on traffic measures performed for more than 2 months on the access link of our institution.
2.1 Traffic Measures in the Internet
Planning and dimensioning of telecom networks is traditionally based on traffic measurements, upon which estimates and mathematical models are built. While this process proved to be reasonably simple in traditional, circuit switched networks, it seems to be much harder in packet switched IP based networks, where the TCP/IP client-server communication paradigm introduces correlation both in space and time.
While a large part of this difficulty lies in the failure of traditional modeling paradigms [35, 43], there are also several key points to be solved in performing the measures themselves and, most of all, in organizing the enormous amount of data that is collected through measures.
First of all, the client-server communication paradigm implies that the traffic behavior is meaningful only when the forward and backward traffic are jointly analyzed; otherwise half of the story goes unwritten, and must be wearyingly inferred (with a high risk of making mistakes!).
This problem makes measuring inherently difficult; it can be solved if measures are taken at the network edge, where the outgoing and incoming flows are necessarily coupled, but it can prove impossible in the backbone, where the peering contracts among providers often disjoin the forward and backward routes [44].
Second, data traffic must be characterized to a higher level of detail than voice traffic, since the 'always-on' characteristics of most sources and the very nature of packet switching require the collection of data at the session, flow, and packet level, while circuit switched traffic is well characterized at the connection level alone. This is due to the source model of the traffic, which is well characterized and relatively simple in the case of voice traffic, but more complex and variable in the case of data networks, where different application models can coexist and interact. In the absence of CAC (Connection Admission Control) functions and with a connectionless communication model, the notion of connection itself becomes quite fuzzy in the Internet.
Summarizing, the layered structure of the TCP/IP protocol suite requires the analysis of traffic at least at the IP, TCP/UDP, and application level in order to have a picture of the traffic clear enough to allow the interpretation of data.
Starting from the pioneering work of Danzig [57, 58, 59] and of Paxson and Floyd [35, 60], in which the authors characterized the traffic of the "first Internet" via measures, there has been an increasing interest in data collection, measurement and analysis, to characterize either the network protocols or the user behavior. After the birth of the Web, a lot of effort has been devoted to the study of caching and content delivery architectures, which are intrinsically based on a deep knowledge of traffic and user behavior. Thus many works analyze traces at the application level, typically log files of web servers or proxy servers [36, 37, 38]. These are very helpful to understand user behavior, but less interesting from the network point of view.
Many projects are instead using real traffic traces, captured from large campus networks, like the work in [39], where the authors characterize the HTTP protocol by using large traces collected at the university campus at Berkeley. Similarly, in [40] the authors present data collected from a large Olympic server in 1996, with very useful findings that help understand TCP behavior, like loss recovery efficiency and ACK compression. In [41], the authors analyzed more than 23 million HTTP connections, and derived a model for the connection interarrival time. More recently, the authors of [42] analyze and derive models for Web traffic, starting from the TCP/IP header analysis. None of these works, however, characterizes the traffic at the network level, rebuilding the status of single TCP connections independently from the application level.
We are interested in passive tools, which analyze traces, rather than active tools, which derive performance indices by injecting traffic into the network, like for example the classic ping or traceroute utilities. Among passive tools, many are based on the libpcap library developed with the tcpdump tool [11, 10], which allows different protocol level analyses.
For example, tcpanaly is a tool for automatically analyzing a TCP implementation's behavior by inspecting packet traces of the TCP's activity. Another interesting tool is tcptrace [49], which is able to rebuild a TCP connection status from traces, matching data segments and ACKs. For each connection, it keeps track of elapsed time, bytes/segments sent and received, retransmissions, round trip times, window advertisements, throughput, etc. At the IP level, ntop [45] is able to collect statistics, enabling users to track relevant network activities, including traffic characterization, network utilization, and network protocol usage.
However, none of these tools performs statistical data collection and post-elaboration.
Thus, to the best of our knowledge, this chapter presents two different contributions in the field of Internet traffic measures.
- A new tool, briefly described in Sect. 2.2, for gathering and elaborating Internet measurements, has been developed and made available to the community.
- The most interesting results of the traffic analysis performed with the above tool are described, discussing their implications on the network, both at the IP level in Sect. 2.4 and at the TCP level in Sect. 2.5.
In the remainder of the chapter we assume that the reader is familiar with Internet terminology, which can be found for example in [46, 47, 48].
2.2 The Tool: Tstat
The lack of automatic tools able to produce statistical data from collected network traces was a major motivation to develop a new tool, called Tstat [34], which, starting from standard software libraries, is able to offer network managers and researchers important information about classic and novel performance indices and statistical data about Internet traffic.
Started as an evolution of TCPtrace [49], Tstat is able to analyze traces in real time (Tstat processes a 6 hour long trace from a 16 Mbit/s link in about 15 minutes) using common PC hardware, or to start from previously recorded traces in various dump formats, such as the one supported by the libpcap library [11]. The software assumes as input a trace collected on an edge node, such that both data segments and ACK segments can be analyzed. Besides common IP statistics, derived from the analysis of the IP header, Tstat is also able to rebuild each TCP connection status by looking at the TCP header in the forward and backward packet flows. If the connection opening and closing are both observed, the flow is marked as a complete flow, and then analyzed. To free the memory related to the status of TCP flows that are inactive, a timer is used as garbage collector. TCP segments that belong to flows whose opening was not recorded (because they either started earlier, or were early declared closed by the garbage procedure) are discarded and marked as "garbage", and are put into the corresponding bin (or else into the toilet and then flushed).
The TCP flow analysis allows the derivation of novel statistics, such as, for example, the congestion window size, out-of-sequence segments, duplicated segments, etc. Some of the data analyses are described in deeper detail in Sects. 2.3-2.5, along with the measurement campaign conducted on our campus access node.
Statistics are collected distinguishing between clients and servers, i.e., hosts that actively open a connection and hosts that reply to the connection request, but also identifying internal and external hosts, i.e., hosts located in our campus LAN or outside it with respect to the edge node where measures are collected. Thus incoming and outgoing packets/flows are identified.
Instead of dumping single measured data, for each measured quantity Tstat builds a histogram, collecting the distribution of that given quantity. Every 15 minutes, it produces a dump of all the histograms it collected. A set of companion tools is available to produce either time plots or plots aggregated over different time spans. Moreover, a Web interface is available [34], which allows the user to browse all the collected data, select the desired quantity, and directly produce graphs, as well as retrieve the raw data that can later be used for further analysis.
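To give a feeling of the flow-tracking logic just described, the following Python sketch (an illustrative model, not the actual C implementation of Tstat; the timeout value is arbitrary) keeps per-flow state keyed on the usual 4-tuple, marks a flow as complete when both the opening and the closing have been observed, and uses an idle timer as garbage collector.

# Sketch of edge-node flow tracking: per-4-tuple state, completeness check
# (both SYN and FIN observed) and timer-based garbage collection.
IDLE_TIMEOUT = 300.0            # seconds of inactivity before garbage collection

class FlowState:
    def __init__(self, now):
        self.first_seen = self.last_seen = now
        self.syn_seen = False
        self.fin_seen = False
        self.packets = 0

flows = {}

def on_packet(now, src, sport, dst, dport, syn=False, fin=False):
    # Use the same key for both directions so forward and backward data are correlated.
    key = tuple(sorted([(src, sport), (dst, dport)]))
    state = flows.setdefault(key, FlowState(now))
    state.last_seen = now
    state.packets += 1
    state.syn_seen = state.syn_seen or syn
    state.fin_seen = state.fin_seen or fin

def garbage_collect(now):
    """Drop idle flows; return the ones that were traced completely."""
    complete = []
    for key, state in list(flows.items()):
        if now - state.last_seen > IDLE_TIMEOUT:
            if state.syn_seen and state.fin_seen:
                complete.append((key, state))
            del flows[key]      # incomplete flows are simply discarded ("garbage")
    return complete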
Period     Pkts [x10^6]    Protocol share [%]            Flows [x10^6]    Trash [%]
                           other    UDP     TCP
Jun. 00    242.7           0.75     5.96    93.29         4.48             5.72
Jan. 01    401.8           0.59     4.31    95.10         7.06             6.77

Table 2.1: Summary of the analyzed traces
A total of 79 different histogram types are available, comprising both IP and TCP statistics.
They range from classic measures directly available from the packet headers (e.g., percentage of
TCP or UDP packets, packet length distribution, TCP port distribution, . . . ), to advanced measures,
related to TCP (e.g., average congestion window, RTT estimates, out-of-sequence data, duplicated
data, . . . ). A complete log also keeps track of all the TCP flows analyzed, and is useful for postprocessing purposes.
2.3 Trace analysis
We performed trace elaboration using Tstat on data collected on the Internet access link of Politecnico di Torino, i.e., between the border router of Politecnico and the access router of GARR/B-TEN [27], the Italian and European research network. Data were collected in files, each storing a 6 hour long trace (to avoid incurring in file system size limitations), for a total of more than 100 Gbytes of compressed data. Within the Politecnico campus LAN there are approximately 7,000 access points; most of them are clients, but several servers are regularly accessed from outside institutions.
Figure 2.1: IP payload traffic balance - Period (B)
The data was collected during different time periods, in which the network topology evolved. Among them, we selected the earliest one and the latest one.
- Period (A) - June 2000: from 6/1/2000 to 6/11/2000, when the bandwidth of the access link was 4 Mbit/s, and the link between the GARR and the corresponding US peering was 45 Mbit/s;
- Period (B) - January 2001: from 1/19/2001 to 2/11/2001, when the bandwidth of the access link was 16 Mbit/s, and the link between the GARR and the corresponding US peering was 622 Mbit/s.
The two periods are characterized by a significant upgrade in network capacity. In particular, the link between GARR and the US was a bottleneck during June 2000, but not during January 2001. Other data collections are freely available through the web interface [34]. Every time we observed a non negligible difference in the measures during different periods, we report both of them. Otherwise, we report only the most recent one.
Table 2.1 summarizes the traces that are analyzed in the next two sections. It shows that, not surprisingly, the largest part of the traffic is transported using the TCP protocol, with UDP traffic accounting for about 5% and other protocols being practically negligible. The number of complete TCP flows is larger than 4 and 7 million in the two periods, respectively, and only about 6% of the TCP packets were trashed from the flow analysis, the majority of them belonging to the beginning of each trace.
Fig. 2.1 plots the IP payload type evolution normalized to the link capacity during a week in period (B). The alternating effect between days and nights, and between working days and weekend days, is clearly visible. This periodic behavior allows us to define a busy period, which we selected from 8 AM to 6 PM, Monday to Friday. Thus, in the remainder of the chapter, we report results averaged only over busy periods.
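The busy-period selection is easy to reproduce; a possible Python filter (cut-off hours as stated above) is:

# Keep only measurements falling in the busy period: 8 AM - 6 PM, Monday to Friday.
from datetime import datetime

def in_busy_period(ts):
    """ts: POSIX timestamp in seconds."""
    t = datetime.fromtimestamp(ts)
    return t.weekday() < 5 and 8 <= t.hour < 18

def busy_only(samples):
    """samples: iterable of (timestamp, value); returns the busy-period values."""
    return [value for ts, value in samples if in_busy_period(ts)]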
Figure 2.2: Distribution of the incoming traffic versus the source IP address
2.4 IP Level Measures
Most measurement campaigns in data networks have concentrated on traffic volumes, packet interarrival times and similar measures. We avoid reporting similar results because they do not differ much from previously published ones, and because we think that from the data elaboration tool presented, other and more interesting performance figures can be derived, which allow a deeper insight into the Internet. Thus, we report only the most interesting statistics that can be gathered by looking at the IP protocol header, referring the reader to [34] to explore all the figures one might be interested in.
Fig. 2.2 plots the distribution of the hit count for incoming traffic, i.e., the relative number of times the same IP address was seen at the IP level. The log/log scale plot in the inset box draws the whole distribution, while the larger, linear scale plot magnifies the first 100 positions of the distribution. More than 200,000 different hosts were contacted, with the top 10 sending about 5% of the packets. It is interesting to note that the distribution of the traffic is very similar during the two different periods, but, looking at the top 100 IP addresses, little correlation can be found: the most contacted hosts are different, but the relative quantity of traffic they send is, surprisingly, the same. This confirms the difficulty of predicting the traffic pattern in the Internet.
A further, very interesting feature is the similarity of the TCP flow and the IP packet distributions. The reason probably lies in the dominance of short, web-browsing flows in the overall traffic.
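A hit-count ranking like the one in Fig. 2.2 can be obtained with a few lines of Python (the list of per-packet source addresses is a hypothetical input):

# Rank the source IP addresses of incoming packets by hit count and report
# the share of packets due to the top n sources, as discussed for Fig. 2.2.
from collections import Counter

def top_sources(src_addresses, n=10):
    hits = Counter(src_addresses)
    total = sum(hits.values())
    top = hits.most_common(n)
    share = 100.0 * sum(count for _, count in top) / total if total else 0.0
    return top, share

if __name__ == "__main__":
    addrs = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "192.0.2.7"]
    top, share = top_sources(addrs)
    print(top, f"top-{len(top)} share: {share:.1f}%")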
       June 2000                           Jan. 2001
Host name           Flows          Host name          Flows
.cineca.it          244206         .kataweb.it        193508
.kataweb.it          91880         .cineca.it         184770
.ilsole24ore.it      79039         .e-urope.it         93989
.ilsole24ore.it      68194         .iol.it             87946
.kataweb.it          64202         .matrix.it          76917
.heartland.net       53539         .kataweb.it         75611
.edu.tr              51556         .kataweb.it         74990
.ilsole24ore.it      50721         .iol.it             61526
.banki.hu            47092         .supereva.it        61237
.iol.it              42837         .matrix.it          57550

Table 2.2: Host name of the 10 most contacted hosts on a flow basis
Looking instead at the distance (number of hops) between the client and the server, Fig. 2.3 reports the distribution of the Time To Live (TTL) field, distinguishing between incoming and outgoing packets. For the outgoing traffic, the TTL distribution is concentrated on the initial values set by the different operating systems: 128 (Win 98/NT/2000), 64 (Linux, SunOS 5.8), 32 (Win 95) and 60 (Digital OSF/1) being the most common. For the incoming traffic, instead, we can see very similar distributions to the left of each peak, reflecting the number of routers traversed by packets before arriving at the measurement point. The zoomed plot in the box shows that, supposing that the outside hosts set the TTL to 128, the number of hops traversed by packets is between 5 and 25 (the minimum is intrinsically due to the topological position of the Politecnico gateway [27]).
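The hop-count estimate used above can be written as a tiny Python function (the set of candidate initial TTL values is an assumption based on the common OS defaults listed earlier):

# Estimate the number of hops traversed by an incoming packet from its TTL,
# assuming the sender used one of the common initial values.
INITIAL_TTLS = (32, 64, 128, 255)

def estimated_hops(observed_ttl):
    # Smallest common initial TTL not lower than the observed value.
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

if __name__ == "__main__":
    print(estimated_hops(103))   # 128 - 103 = 25 hops
    print(estimated_hops(123))   # 128 - 123 = 5 hops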
Figure 2.3: Distribution of the TTL field value for outgoing and incoming packets - Period (B)
Table 2.3: TCP options negotiated

Option         succ.    client    server    unset
June 2000
SACK            5.0      29.5       0.1      70.4
WinScale       10.9      19.2       1.3      79.5
TimeStamp       3.1       3.1       0.1      96.9
January 2001
SACK           11.6      32.9       0.3      66.7
WinScale        5.1      10.9       1.2      87.9
TimeStamp       4.5       4.5       0.0      95.5
2.5 TCP Level Measures
We now concentrate on TCP traffic, which represents the vast majority of the collected traffic. Tstat offers the most interesting (and novel) performance figures at this level.
The first figure we look at is the set of TCP options [50, 51] negotiated during the three way handshake. Table 2.3 shows the percentage of clients that requested an option in the "client" column; if the server replies positively, then the option is successfully negotiated and accounted for in the "succ" column, otherwise it will not be used. The "unset" percentage counts connections where no option was present from either side. Finally, the "server" column reports the percentage of servers that, although they did not receive the option request, sent an option acknowledgment anyway. For example, looking at the SACK option, we see that about 30% of clients declared SACK capabilities, which were accepted by servers in only 5% of connections in June 2000 and 11.6% in January 2001. Note the "strange" behavior of some servers: 0.1% and 0.3% of replies contain a positive acknowledgment to clients that did not request the option.
In general we can state that TCP options are rarely used and, while the SACK option is increasingly used, the use of the Window Scale and Timestamp options is either constant or decreasing.
2.5.1 TCP flow level analysis
We now consider flow-level figures, i.e., those that require correlating both directions of a flow for their derivation.
Figure 2.4: Incoming and outgoing flow size distribution; tail distribution in log-log scale (lower plot); zoom in linear and log-log scale of the portion near the origin (upper plots) - Period (B)
Fig. 2.4 reports 3 plots. The lower one shows the tail of the distribution in log-log scale, showing a clear linear trend, typical of heavy tailed distributions. The linear plot (upper, large one) shows a magnification near the origin, with the characteristic peak of very short connections, typically of a few bytes. The inset in log-log scale shows the portion of the distribution where the mass is concentrated, and where the linear decay begins. In particular, incoming flows shorter than 10 kbytes are 83% of the total analyzed flows, while outgoing flows shorter than 10 kbytes are 98.5%. Notice also that the size of incoming flows is consistently larger than that of outgoing flows, except very close to the origin.
Table 2.4: Percentage of TCP traffic generated by common applications in number of flows, segments and transferred bytes - Period (B)

Service         Port     flow %     segm. %     bytes %
HTTP              80      81.48       62.78       61.27
SMTP              25       2.98        2.51        2.04
HTTPS            443       1.66        0.87        0.52
POP              110       1.26        0.93        0.42
FTP control       21       0.54        0.54        0.50
GNUTELLA        6346       0.53        2.44        1.58
FTP data          20       0.51        6.04        9.46
The TCP port number distribution, which directly translates into the traffic generated by the most popular applications, is reported in Table 2.4, sorted by decreasing percentage. Results are reported for each application as a percentage of flows, segments (including signalling and control ones), and bytes (considering the payload only). The average flow size of each application, in both directions of the TCP flows, is instead reported in Table 2.5. These measures take a different look at the problem: that the largest portion of Internet traffic is web browsing is no news at all, and that FTP amounts to roughly 10% of the byte traffic, though the number of FTP flows is marginal, is again well known. The different amount of data transferred by the applications in the client-server and server-client directions is instead not as well known, though not surprising. The asymmetry is much more evident when expressed in bytes than in segments, hinting at a large number of control segments (acknowledgments) sent without data to piggyback them on. For example, an HTTP client sends to the server about 1 kbyte of data, and receives about 16 kbytes as reply; but more than 15 segments go from the client to the server, while more than 19 segments from the server to the client are needed to transport the data.
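The per-application shares of Table 2.4 boil down to a simple classification of flows by server port. The following sketch (illustrative field names, not the thesis' tool) shows one way to derive such shares from a flow-level log.

from collections import defaultdict

SERVICES = {80: "HTTP", 25: "SMTP", 443: "HTTPS", 110: "POP",
            21: "FTP control", 6346: "GNUTELLA", 20: "FTP data"}

def application_shares(flows):
    """flows: iterable of (server_port, n_segments, n_payload_bytes) tuples."""
    totals = {"flows": 0, "segm": 0, "bytes": 0}
    per_app = defaultdict(lambda: {"flows": 0, "segm": 0, "bytes": 0})
    for port, segments, payload in flows:
        app = SERVICES.get(port, "other")           # unknown ports fall into "other"
        for key, value in (("flows", 1), ("segm", segments), ("bytes", payload)):
            per_app[app][key] += value
            totals[key] += value
    # Express each counter as a percentage of the corresponding total.
    return {app: {k: 100.0 * v / totals[k] for k, v in c.items()}
            for app, c in per_app.items()}

# Toy example: three flows on ports 80, 25 and 20.
print(application_shares([(80, 20, 16000), (25, 30, 15000), (20, 300, 340000)]))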
Table 2.5: Average data per flow sent by common applications, in segments and bytes - Period (B)
Service        client to server         server to client
               segment     byte         segment     byte
HTTP           15.2        1189.9       19.5        15998.4
SMTP           21.2        15034.4      16.7        624.3
HTTPS          11.5        936.7        12.3        6255.8
POP            14.9        91.1         18.5        7489.0
FTP control    23.7        11931.1      21.9        9254.3
GNUTELLA       101.5       23806.9      105.7       44393.9
FTP data       314.0       343921.3     223.5       82873.2
Figure 2.5: Asymmetry distribution of connections expressed in bytes (upper plot) and segments (lower plot) - Period (B)

Fig. 2.5 confirms the intuition given by Table 2.5. The figure reports the asymmetry index ξ of the connections, obtained as the ratio between the client-server amount of transferred data and the sum of the client-server and server-client amounts, i.e.:

\xi = \frac{D_{c \to s}}{D_{c \to s} + D_{s \to c}}
measured as either the byte-wise net payload (upper plot) or segment-wise (lower plot), which again includes all the signalling/control segments. The upper plot shows a clear trend toward asymmetric connections, with many more bytes transferred from the server to the client. If we consider the number of segments, instead, connections are almost perfectly symmetrical, as highlighted by the inset magnifying the central portion of the distribution: 25% of the connections are perfectly symmetrical, and no effect due to the delayed ACK implementation is observed. This observation can have non-negligible effects on the design of routers, which, regardless of the asymmetry of the information in bytes, must always route and switch an almost equal number of packets in both directions.
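A minimal sketch of how the per-connection asymmetry index can be computed and binned into a histogram such as the one of Fig. 2.5; field names and the bin count are illustrative, and the same code works byte-wise or segment-wise.

def asymmetry(bytes_client_to_server: float, bytes_server_to_client: float) -> float:
    """xi = D(c->s) / (D(c->s) + D(s->c)); 0.5 corresponds to a symmetric connection."""
    total = bytes_client_to_server + bytes_server_to_client
    return bytes_client_to_server / total if total > 0 else 0.5

def histogram(values, n_bins=10):
    """Percentage of connections falling in each bin of the [0, 1] interval."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [100.0 * c / len(values) for c in counts]

# Toy example: (bytes c->s, bytes s->c) pairs for a few connections.
flows = [(1189, 15998), (15034, 624), (936, 6255)]
print(histogram([asymmetry(cs, sc) for cs, sc in flows]))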
Fig. 2.6 reports the distribution of the connection completion time, i.e., the time elapsed between the first segment of the three-way handshake and the last segment that closes the connection in both directions. Obviously, this measure depends upon the application, data size, path characteristics, network congestion, possible packet drops, etc. There are however several characteristic features in the plot. First of all, most of the connections complete in just a few seconds: indeed, about 52% last less than 1 second and 78% less than 10 seconds, while only 5% of the connections are still alive after 70 s, where the histogram collects all the remaining flows. Moreover, the completion time tends to be heavy tailed, which can be related to the heavy-tailed flow size. Finally, spikes in the plot are due to application-level timeouts (e.g., 15 seconds corresponds to a timer in the AUTH protocol, 30 seconds to cache and web server timers), which are generally not considered at all in traffic analysis. Interestingly, in the inset it is also possible to observe the retransmission timeout suffered by TCP flows whose first segment is dropped, which is set by default to 3 seconds.
Figure 2.6: Distribution of the connections completion time - Period (B)
2.5.2 Inferring TCP Dynamics from Measured Data
In this last section, we show how some more sophisticated elaborations of the raw measured data can be used to gain insight into TCP behavior and dynamics.
Fig. 2.7 plots the receiver window (rwnd) advertised in the TCP header during the handshake. Looking at the plots for period (A), we note that about 50% of clients advertise an rwnd around 8 kbytes, 16 kbytes is used by about 9% of the connections, and about 30% use roughly 32 kbytes. These values are obtained by summing together all the bins around 8, 16 and 32 kbytes. During period (B), we observe a general increase in the initial rwnd, the respective percentages now being 44%, 19% and 24%.
Figure 2.7: Distribution of "rwnd" as advertised during handshake
Note that an 8 kbyte rwnd can be a strong limitation on the maximum throughput a connection can reach: with a 200 ms Round Trip Time, it corresponds to about 40 kbytes/s.
In order to complete the picture, Fig. 2.8 plots the estimated in-flight data for the outgoing flows, i.e., the bytes already sent by a source inside our LAN whose ACK has not yet been received, evaluated by looking at the sequence and acknowledgement numbers in the two directions. Given that the measures are collected very close to the sender, and that the rwnd is not a constraint, this is an estimate of the sender congestion window. The discrete-like result clearly shows the effect of the segmentation used by TCP. Moreover, since flows are very short (see Fig. 2.4), the flight size is always concentrated on small values: more than 83% of samples correspond to flight sizes smaller than 4 kbytes. Finally, note that the increased network capacity in period (B) does not apparently affect the in-flight data, and hence the congestion windows observed. This suggests that in the current Internet scenario, where most of the flows are very short and the main limitation to the performance of the others seems to be the receiver buffer, the dynamic, sliding window implementation of TCP rarely comes into play. The only effect of TCP on the performance is to delay the data transfer, both with the three-way handshake and with unnecessary, very long timeouts when one of the first packets of the flow is dropped, an event that is due to traffic fluctuations and not to congestion induced by the short flow itself.
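A minimal sketch of the in-flight estimate of Fig. 2.8, under assumed field names: for an outgoing flow, the unacknowledged bytes are approximated by the highest sequence number seen leaving the LAN minus the highest acknowledgment number seen entering it.

def in_flight(highest_seq_out: int, highest_ack_in: int) -> int:
    """Bytes sent by the internal host and not yet acknowledged."""
    return max(0, highest_seq_out - highest_ack_in)

# Toy trace for one outgoing flow: (direction, seq, payload_len, ack).
trace = [("out", 1, 1460, 0), ("out", 1461, 1460, 0), ("in", 1, 0, 1461)]
hi_seq, hi_ack = 0, 0
for direction, seq, payload_len, ack in trace:
    if direction == "out":                      # data segment leaving the LAN
        hi_seq = max(hi_seq, seq + payload_len)
    else:                                       # ACK entering the LAN
        hi_ack = max(hi_ack, ack)
    print(in_flight(hi_seq, hi_ack))            # one sample of the estimated window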
The out-of-sequence burst size (OutB), i.e., the byte-wise length of data received out of sequence, and, similarly, the duplicated data burst size (DupB), i.e., the byte-wise length of contiguous duplicated data received, are other interesting performance figures related to TCP and the Internet. In particular, an OutB can be observed if either packet reordering occurred in the network or packets were dropped along the path, but only if the flow is longer than one segment. A DupB can instead be observed if either a packet is replicated in the network or, after a packet drop, the recovery phase performed by the sender covers already received segments.
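A minimal sketch of how OutB and DupB events can be detected while tracking one direction of a flow; it is deliberately simplified (a real tracker such as Tstat keeps a full reassembly map and burst lengths), and the per-segment fields are assumptions.

def classify_segment(next_expected: int, seg_seq: int, seg_len: int):
    """Return the updated next-expected sequence number and an event label."""
    if seg_seq == next_expected:
        return seg_seq + seg_len, "in-sequence"
    if seg_seq > next_expected:
        return next_expected, "OutB"                 # a hole precedes this data
    if seg_seq + seg_len <= next_expected:
        return next_expected, "DupB"                 # data entirely already received
    return seg_seq + seg_len, "partial duplicate"    # retransmission overlapping new data

# Toy trace: the second segment is lost, so the third arrives out of sequence
# (OutB); later, the first segment is spuriously retransmitted (DupB).
nxt = 0
for seq, length in [(0, 1460), (2920, 1460), (0, 1460)]:
    nxt, event = classify_segment(nxt, seq, length)
    print(seq, event)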
Table 2.6 reports the probability of observing OutB (or DupB) events, evaluated with respect to the number of segments and flows observed, i.e., the ratio between the total OutB (or DupB) events recorded and the number of packets or flows observed during the same period of time.
Figure 2.8: TCP congestion window estimated from the TCP header
Table 2.6: OutB or DupB events rate, computed with respect to the number of packets and flows
             P{OutB}                          P{DupB}
             vs Pkt           vs flow         vs Pkt           vs flow
Period       in %    out %    in %    out %   in %    out %    in %    out %
Jun. 00      3.44    0.07     43.41   0.43    1.45    1.47     18.27   18.59
Jan. 01      1.70    0.03     23.03   0.99    1.31    1.09     17.65   14.81
Starting from OutB, we see that practically no OutB events are recorded on the outgoing flows, thanks to the simple LAN topology, which, being a 100 Mbit/s switched Ethernet, rarely drops packets (recall that the access link capacity is either 4 Mbit/s or 16 Mbit/s). On the contrary, the probability of observing an OutB is rather large for the incoming data: 3.4% of packets are received out of sequence in period (A), corresponding to a 43% probability when related to flows. Looking at the measures referring to period (B), we observe a halved chance of OutB, 1.7% and 23% respectively. This is mainly due to the increased capacity of the access and US peering links, which reduced the dropping probability; however, the loss probability remains very high, especially considering that these are average values over the whole working day.
Looking at the DupB probabilities, and recalling that the internal LAN can be considered a sequential, drop-free environment, the duplicated bursts recorded on the outgoing data can be ascribed to dropping events recovered by the servers; a measure of the dropping probability is thus derived. Incoming DupB events are due to retransmissions from external hosts of already received data, as the probability that a packet is recorded on the trace and then later dropped on the LAN is negligible.
2.6 Conclusions
This chapter presented a novel tool for Internet traffic data collection and statistical elaboration, which offers roughly 80 different types of plots and measurement figures, ranging from the simple amount of observed traffic to complex reconstructions of the flow evolution.

The major novelty of the tool is its capability of correlating the outgoing and incoming directions of each flow at a single edge router, thus inferring the performance and behavior of any single flow observed during the measurement period.

Exploiting this capability, we have presented and discussed some statistical analyses performed on data collected at the ingress router of our institution. The results presented offer a deep insight into the behavior of both the IP and the TCP protocols, highlighting several characteristics of the traffic that, to the best of our knowledge, were never observed on 'normal' traffic, but only on flows injected ad hoc into the Internet or in simulations.
Chapter 3
User Patience and the World Wide Wait
This chapter, whose results have been published in [52], presents a study of web user behavior when network performance decreases, causing page transfer times to increase. Real traffic measurements are analyzed to infer whether worsening network conditions translate into greater impatience by the user, which in turn translates into the early interruption of TCP connections. Several parameters are studied to gauge their impact on the interruption probability of web transfers: time of day, file size, throughput and time elapsed since the beginning of the download. The results presented try to paint a picture of the complex interactions between the user perception of the Web and network-level events.
3.1 Background
Despite the growing amount of peer-to-peer traffic exchanged, web browsing remains one of the
most popular activities on the Internet. Web users, at a rather unconscious level, usually define
their browsing experience through the page latency (or response time), defined as the time between
the user request for a specific web page and the complete transfer of every object in the web page.
With the improvement in server and router technology, the availability of high-speed network
access and larger capacity pipes, the web browsing experience is currently improving. However,
congestion may still arise, causing the TCP congestion control to kick in and leading to higher
page latencies. In such cases, users can become impatient, as testified by the popularization of
the World Wide Wait acronym [53]. The user behavior radically changes: the current transfer is aborted, and a new one is possibly started right away, e.g., by hitting the 'stop' and 'reload' buttons of the Web browser.
This behavior can affect the network performance, since the network spends effort transferring information which may turn out to be useless. Furthermore, resources devoted to aborted connections are unnecessarily taken away from other connections.
In this chapter, we do not focus on the causes that affect the web browsing performance, but,
rather, on the measurement of the impact of the user behavior when dealing with poorly performing
web transfers. Using almost two months of real traffic analysis, we study the effect of early transfer
interruptions on TCP connections, and the correlation between connection parameters (such as
throughput, file size, etc.) and the probability of early transfer interruption.
The rest of this chapter is organized as follows: Section 3.2 defines and validates the interruption measuring criterion; Section 3.3 analyzes the interruptions observed in real traffic traces, reporting the most interesting results; conclusive considerations are the object of Section 3.4.
3.2 Interrupted Flows: a definition
By interruption event we indicate the early termination of an ongoing Web transfer by the client, before the server has finished sending data.
From the browser perspective, such an event can be generated by several interactions between
the user and the application: aborting the transfer by pressing the stop button, leaving the page
being downloaded by following a link or a bookmark, or closing the application.
From the TCP perspective, the events described above cause the early termination of all TCP
connections1 that are being used to transfer the web page objects. While it is impossible to distinguish among them, they can all be identified by looking at the evolution of the connection itself,
as detailed in the following section. Though it would seem natural to consider the interruption as
a “session” metric rather than a “flow” metric, session aggregation is extremely difficult and critical [54]. Therefore, due also to the hazy definition of “Web session”, we will restrict our attention
to individual TCP flows, attempting to infer the end of ongoing TCP connections, rather than the
termination of ongoing Web sessions.
3.2.1 Methodology
In order to define a heuristic criterion discriminating between interrupted and completed TCP
flows, we first inspected several packet-level traces corresponding to either artificially interrupted
or regularly terminated Web transfers. We considered the most common operating systems and
web browsers: Windows 9x, Me, 2k, Xp and Linux 2.2.x, 2.4.x were checked, in combination with
MSIE 4.x, 5.x, 6.x, Netscape 4.7x, 6.x or Mozilla 1.x.
Figure 3.1 sketches the evolution of a single TCP connection used in an interrupted (right) versus a completed (left) HTTP transaction. In the latter case, after the connection set-up, the client performs a GET request, which causes DATA to be transmitted by the server. If persistent connections
are used, several GET-DATA phases can follow. At the end, the connection tear-down is usually
observed from the server side through FIN or reset (RST) messages. Conversely, user-interrupted
transfers cause the client to abruptly signal the server the TCP connection interruption. The actual chain of events depends on the OS used by clients, i.e., Microsoft clients immediately send
an RST segment, while Netscape/Mozilla clients gently close the connection by sending a FIN
message first. From then on, the client replies with RST segments upon the reception of server
segments that were in flight when the interruption happened (indicated by thicker arrows in the
figure). In all cases, any user interruption action generates an event which is asynchronous with
respect to the self-clocked TCP window mechanism.
In Figure 3.1, several time instants are also identified:
- t_FS and t_FE, identifying the time of the TCP Flow Start and End, respectively;

1 In this chapter we interchangeably use the terms connection and flow.
Figure 3.1: Completed and Interrupted TCP Flow
- t_CS and t_CE, identifying the time of the client request Start and End, corresponding to the first and last segment carrying data from the client side;
- t_SS and t_SE, identifying the time of the server reply Start and End, corresponding to the first and last segment carrying data from the server side.
Timestamps are recorded by Tstat, which passively analyzes traffic between the client and server hosts (its location being represented by the vertical dotted line in the figure); therefore, the time reference is neither that of the client nor that of the server.2

2 In the measurement setup we used, the packet monitor is close to the client (or server) side, and therefore the reference error is small, since the delay introduced by our campus LAN is small compared to the RTT.
3.2.2 Interruption Criterion
From the single-flow traffic analysis, we can define a heuristic discriminating between client-interrupted and completed connections. We preliminarily introduce a necessary condition for the interrupted-flow property, which we call eligibility, derived from the observation of Figure 3.1. TCP connections in which the server sent DATA but did not send a FIN (or RST) segment, and the client sent an RST segment, are said to be eligible. Thus:
\text{Eligible} \iff \lnot(\text{FIN}_s \lor \text{RST}_s) \land \text{DATA}_s \land \text{RST}_c \qquad (3.1)
where the index (s or c) refers to the sender of the segment (server or client). The FIN asynchronously sent at the time of the interruption by Netscape/Mozilla browsers can be neglected, because RSTs are sent anyway upon the reception of the subsequent incoming server packets.
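A minimal sketch of the eligibility test of Eq. (3.1); the per-flow booleans are assumed to come from a flow tracker such as Tstat, and the field names are illustrative.

from dataclasses import dataclass

@dataclass
class FlowFlags:
    server_sent_data: bool
    server_sent_fin: bool
    server_sent_rst: bool
    client_sent_rst: bool

def is_eligible(f: FlowFlags) -> bool:
    """Eq. (3.1): not (FIN_s or RST_s) and DATA_s and RST_c."""
    return (not (f.server_sent_fin or f.server_sent_rst)
            and f.server_sent_data and f.client_sent_rst)

# A transfer aborted by the user typically matches; a normal one does not.
print(is_eligible(FlowFlags(True, False, False, True)))   # True
print(is_eligible(FlowFlags(True, True, False, False)))   # False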
Figure 3.2: t_gap Probability and Cumulative Distribution
However, this criterion by itself is not sufficient to distinguish between interrupted and completed connections. Indeed, there are a number of cases in which we can still observe an RST segment
from clients before the connection tear-down by servers. In particular, due to HTTP protocol
settings [55], servers may wait for a timer to expire (usually set to 15 seconds after the last data
segment has been sent) before closing the connection; moreover, HTTP 1.1 and Persistent-HTTP
1.0 protocols use a longer timer, set to a multiple of 60 seconds. Connections abruptly closed
during this idle time would be classified as interrupted, even if the data transfers were already
completed.
To gauge this, let us define t_gap as the time elapsed between the last data segment from the server and the actual flow end, i.e., t_gap = t_FE − t_SE. In Figure 3.2 we plot both the pdf (in the inset) and the CDF of t_gap for all HTTP connections (solid line) and for the eligible ones (dotted lines). As can be observed, the majority of connections are closed within a few seconds after the reception of the last data segment. The server-timer expiration is reflected by the pdf peak after 15 s, which is clearly absent for the eligible flow class. But the presence of a timer at the client side, triggered
Figure 3.3: Normalized t_gap Probability Distribution, α = 1, β = 1
about 60s after the last segment is received, causes the client to send an RST segment before the
server connection tear-down, as shown by the CDF plot for eligible flows.
Unfortunately, all flows terminated by the timer expiration match the eligibility criterion: we need an additional time constraint in order to uniquely distinguish the interrupted flows from the subset of the eligible ones. Recalling that user interruptions are asynchronous with respect to the TCP self-clocking based on the RTT, we expect the t_gap of an interrupted flow to be roughly independent of TCP timings and upper-bounded by a function of the flow's measured RTT. Let us define the normalized t_gap as:

\hat{t}_{gap} = \frac{t_{gap}}{\alpha \, \mu_{RTT} + \beta \, \sigma_{RTT}} \qquad (3.2)

where μ_RTT and σ_RTT are the average and the standard deviation of the connection RTT, respectively.3

Figure 3.3 plots the \hat{t}_{gap} pdf for both the eligible and non-eligible flows when α = 1 and β = 1. For non-eligible flows, the pdf shows that \hat{t}_{gap} can be either:
- close to 0, when the server FIN is piggybacked on the last server data segments and the client has already closed its half-connection, or closes its half-open connection by means of an RST segment;
- roughly 1 RTT, when the server FIN is piggybacked on the last server data segments and the client sends a FIN-ACK segment, causing the last server-side ACK segment to be received 1 RTT later by the server;
- much larger than 1 RTT, for connections which remain open and are then closed by an application timer expiration.

3 The μ_RTT and σ_RTT estimation used by Tstat is the same one the TCP sender uses. The lack of accuracy of the algorithm, the variability of the RTT itself and the few samples per flow make this measurement not very accurate, affecting the \hat{t}_{gap} distribution.
Instead, considering eligible flows, we observe that t_gap is no longer correlated with the RTT. Moreover, we would expect the asynchronous interruption events to be, in this case, uniformly distributed over one RTT. This is almost confirmed by Figure 3.3, except that the pdf exhibits a peak close to 0. This is explained by considering the impact of the TCP window size: the transmission of several packets within the same window, and therefore during the same RTT, shifts the t_SE measurement point, reducing t_gap toward values smaller than the RTT, as sketched in Figure 3.4.
Figure 3.4: Temporal Gap Reduction
Therefore, from the former observations, we define the flow interruption criterion as:

\text{Interrupted} \iff \text{Eligible} \land (\hat{t}_{gap} \le 1) \qquad (3.3)
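A minimal sketch of the interruption heuristic of Eqs. (3.2)-(3.3), under the reconstruction given above: a flow is flagged as interrupted when it is eligible and the gap between the last server data segment and the flow end does not exceed α·μ_RTT + β·σ_RTT. All per-flow inputs are assumed to come from a tracker such as Tstat; names are illustrative.

def normalized_gap(t_gap: float, rtt_mean: float, rtt_std: float,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Eq. (3.2): t_gap normalized by an RTT-based upper bound."""
    return t_gap / (alpha * rtt_mean + beta * rtt_std)

def is_interrupted(eligible: bool, t_gap: float,
                   rtt_mean: float, rtt_std: float,
                   alpha: float = 1.0, beta: float = 1.0) -> bool:
    """Eq. (3.3): eligible and normalized gap not larger than 1."""
    return eligible and normalized_gap(t_gap, rtt_mean, rtt_std, alpha, beta) <= 1.0

# Toy flow: RTT about 100 ms +/- 20 ms, client RST 50 ms after the last data.
print(is_interrupted(True, t_gap=0.05, rtt_mean=0.10, rtt_std=0.02))   # True
# Same flow, but the RST arrives 60 s later (client-side persistent-HTTP timer).
print(is_interrupted(True, t_gap=60.0, rtt_mean=0.10, rtt_std=0.02))   # False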
As a further validation of the criterion, Figure 3.5 plots the CDF of the server data size transmitted on a connection, for both completed and interrupted flows. Looking at the inset, which reports a zoom of the CDF curve, it can be noted that the size of interrupted flows is essentially a multiple of the maximum segment size (which usually corresponds to the Ethernet MTU of 1500 bytes). For normal connections, instead, the data size carried by flows is independent of the segmentation imposed by TCP. This further confirms that in the former case not all the server packets reached the client before the interruption happened.
In order to test the sensitivity of the interruption heuristic, we analyzed the interruption probability, i.e., the ratio of the number of interrupted connections to the total number of traced connections, both for different values of α and β and with respect to a simplified interruption criterion that uses a fixed threshold (i.e., t_gap ≤ T_thresh). Results are plotted in Figure 3.6, adding in the inset the relative error percentage with respect to the (α = 1, β = 1) curve, as a function of the time of day, considering a 10-minute observation window. It can be seen that different (α, β) values do not largely affect the RTT-dependent results (the error is within a few percentage points). On the contrary, a fixed-threshold approach deeply alters the interruption
Figure 3.5: Interrupted vs Completed Flows Size CDF
Figure 3.6: Sensitivity of the Interruption Criterion to the α and β Parameters
ratio, compromising the criterion validity. For example, when T_thresh includes the client 60-second timer of persistent connections, the error grows to over 200%: recalling the results of Figure 3.2, this would qualify almost all eligible flows as interrupted. Therefore, we can confirm that the interruption criterion we defined is affected by a relative error which is small enough to be neglected.
Table 3.1: Three most active internal and external server statistics: total flows N_T and interrupted flows N_I

                    rank    N_T       N_I
Internal server     1       186400    15969
                    2       131907    10024
                    3        86189     7320
External server     1        29300      539
                    2        25637      659
                    3        18448      231
3.3 Results
In this section we study how the interruption probability is affected by the most relevant connection
properties, such as the flow size, throughput and completion time. Also, we discriminate flows as
client or server (respectively when the server is external or internal to our LAN) and as mice or
elephants (depending on whether their size is shorter than or longer than 100 KB).
Figure 3.7: Interrupted vs Completed Flows Amount and Ratio
Figure 3.7 plots the number of interrupted versus totally traced flows (left y-axis scale), together with their ratio (right y-axis), as a function of the time of day. Client flows only are considered4. As
expected, the total number of tracked flows is higher during working hours, and the same happens
to interrupted flows, leading to an almost constant interruption probability.
Given this behavior, in the following we will restrict our analysis to the 10:00–16:00 interval,
where we consider both the traffic and the interruption ratio to be stationary. It must be pointed
4 Server flows yielded the same behavior.
out that our campus is mainly a client network toward external servers, i.e., only a small fraction of the tracked connections have servers inside our campus LAN. Therefore, both to have a statistically meaningful data set and to compare the client versus server results on approximately the same number of connections, we used traces with different temporal extension. The client traces refer to the work-week from Monday 7 to Friday 12 November 2002, from 10:00 to 16:00, where
we observed >?Ã?DžÃ unique clients contacting Ä8Å¯Æ—Ç—Ç—Ç unique external server for a total of more than
$?È flows. Instead, servers data refer to a two-month-long trace (Monday to Friday, October to
November 2002, 10:00–16:00), where >($É$B unique external clients contacted 118 unique internal
servers generating Ä ÇÊ ÆË Å¯Ç—Ì connections. For the same reasons, the elephants data-set refers to the
same period of the server trace.
Considering the selected dataset, the average percentage of interrupted flows over all logged servers is 9.18%, while for all logged clients it is 4.20%. This shows that a significant percentage of TCP flows are interrupted; this quantity was measured on our campus network, which offers a generally good browsing experience, and therefore we expect the ratio to be much higher in worse-performing scenarios.
Table 3.1 details the interruption statistics for the three most contacted internal and external servers. N_T and N_I represent, respectively, the total and the interrupted number of observed flows. Apart from noticing that the number of contacted external servers is higher, and therefore the traffic is more spread out than for the internal servers, it is worth noticing that the interruption probability of the three most contacted internal servers is roughly the same for each server (about 8%). Considering the external server statistics, the interruption ratio is smaller (roughly between 1% and 2.5%), and also smaller than the average interruption probability, which is larger than 4%. This suggests that the three most contacted servers offer a good browsing experience to our clients.
In order to better understand the motivations that drive user impatience, in the following subsections we inspect how the interruption probability varies when conditioned on different parameters Φ. In particular, we define

P_{|\Phi}(\varphi) = \frac{\text{Prob}\{\text{flow is Interrupted} \land \text{flow} \in \Phi_\varphi\}}{\text{Prob}\{\text{flow} \in \Phi_\varphi\}}

as the ratio of the number of interrupted connections over the total number of connections, conditioned on a general parameter Φ; with some algebra and applying Bayes' formula, it can be seen that P_{|\Phi} expresses Prob{ flow is Interrupted | flow ∈ Φ_φ }, i.e., the probability that a flow is interrupted given that its parameter Φ falls in class φ. Intuitively, when P_{|\Phi} is constant over any interval of Φ values, the interruption is not correlated with the parameter Φ.
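A minimal sketch of how such a conditional interruption probability can be estimated from a flow log: connections are binned by a generic parameter (e.g., throughput or flow size) and, in each bin, the fraction of interrupted flows is computed. Field names and bin edges are illustrative.

from bisect import bisect_right
from collections import defaultdict

def conditional_interruption(flows, bin_edges):
    """flows: iterable of (parameter_value, interrupted_bool).
    Returns {bin_index: interrupted / total} for the non-empty bins."""
    total = defaultdict(int)
    interrupted = defaultdict(int)
    for value, is_int in flows:
        b = bisect_right(bin_edges, value)     # index of the bin containing value
        total[b] += 1
        interrupted[b] += int(is_int)
    return {b: interrupted[b] / total[b] for b in total}

# Toy example: throughput in kbps, with 10 and 100 kbps class boundaries.
sample = [(5, True), (8, False), (50, False), (60, True), (200, False)]
print(conditional_interruption(sample, bin_edges=[10, 100]))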
We introduce in Table 3.2 the average values of the parameters studied in the following, for both elephant (E) and mice (M), client and server flows.
3.3.1 Impact of the User Throughput
Let the average user throughput be the amount of data transferred by the server over the time elapsed between the connection setup and the last server data packet; referring to Figure 3.1, we may write5:

\text{Throughput} = \frac{\text{DATA}_s}{t_{SE} - t_{CS}}
Figure 3.8 reports P_|Throughput, as well as the number of total and interrupted flow samples, for both server (top) and client (bottom) flows. The number of samples can be read on the left y-axis, while the corresponding probability can be read on the right y-axis. It can be noted that, in the server case, P_|Throughput slightly decreases when the user transfer rate increases, while a general increase of P_|Throughput is observed for client connections, which is quite counterintuitive. However, this is explained by considering mice and elephant flows separately. Indeed, in the mice case, the throughput of interrupted flows is roughly 1.5 times higher than that of completed flows. This suggests that the early termination is due to a link-follow behavior (i.e., the user clicking on a link to reach a new page). On the contrary, interrupted elephant flows have a throughput 1.5 times smaller than that of completed flows, confirming the intuition that a smaller throughput leads to a higher interruption probability.
3.3.2 Impact of Flow Size
In Figure 3.9 the interruption probability is conditioned to the flow size 6 , i.e., "$#&%' . Considering
client flows (on the bottom), we observe that there is a peak of short transfers that are aborted:
this is due to the interruption of parallel TCP connections opened by a single HTTP session. In the
server case (top plot), the "0#&%' is higher, on average, than the previous case. In both cases, against
expectations, users do not tend to wait longer when transferring longer flows, as the increasing
interruption probability suggests.
3.3.3 Completion and Interruption Times
Figure 3.10 shows the dependence of completed and interrupted server flows on the time elapsed
since flow start until its end, i.e., &#Ï)è' . It can be gathered from the figure that users mainly
5 This performance parameter does not include the time elapsed during connection tear-down, since it does not affect the user perception of transfer time.
6 In the case of interrupted connections, the size has to be interpreted as the amount of data transferred until the interruption occurred.
                     Interrupted             Completed
                     client      server      client      server
Mice (M)
  T [s]              4.81        28.91       10.12       29.58
  Size [Kb]          10.01       19.47       6.34        8.67
  Thr [Kbps]         115.93      118.72      79.15       90.57
Elephants (E)
  T [s]              108.42      82.86       122.12      123.78
  Size [Kb]          1081.00     394.34      638.46      454.71
  Thr [Kbps]         192.62      255.07      274.29      344.58

Table 3.2: Average Values of the Inspected Parameters
Figure 3.8: P_|Throughput: Server on the Top, Client on the Bottom
abort the transfer in the first 20 seconds: during this time, users take the most ‘critical’ decisions,
while, after that time, they tend to wait longer before interrupting the transfer. The slow rise in the
interruption ratio after the 20 seconds mark, though, shows that users are still willing to interrupt
the transfer if they think it takes too much time.
Finally, Figure 3.11 considers server flows within the 0-20 s interval only. The P_|Size probability is further conditioned on different classes of users according to their throughput, i.e., P_|Size,Throughput. Three throughput classes are considered: Fast (> 100 Kbps), Slow (< 10 Kbps) and Medium speed (between 10 Kbps and 100 Kbps). Looking at the figure, it can be noticed that the three different
Figure 3.9: P_|Size: Server on the Top, Client on the Bottom
classes suffer very different interruption probabilities: higher for slow flows, and much smaller for fast flows. A linear interpolation of the data (dotted lines) is used to highlight this trend. Indeed, slow connections massively increase the interruption probability, while faster connections are likely to be left alone. This shows that the throughput is indeed one of the main performance indexes driving the interruption probability.
3.4 Conclusions
The research presented in this chapter inspected a phenomenon intrinsically rooted in the current
use of the Internet, caused by user impatience at waiting too long for web downloads to complete.
Figure 3.10: P_|Time: Server case only
Figure 3.11: P_|Size,Throughput: Server case only
We defined a methodology to infer the interruption of TCP flows, and presented an extended set of results gathered from real traffic analysis. Several parameters have been considered, showing that the interruption probability is affected mainly by the user-perceived throughput. The presented interruption metric could be profitably used to define user satisfaction with Web performance, as well as to derive traffic models that include the early interruption of connections.
Chapter 4
The Zoo of Elephant and Mice
This chapter, whose results have been published in [56], studies the TCP flow arrival process, starting from aggregated measurements at the TCP flow level taken from our campus network. After introducing the tools used to collect and process TCP flow level statistics, we analyze the statistical properties of the TCP flow inter-arrival process.
We create different traffic aggregates by splitting the original trace, such that i) each traffic aggregate has, bytewise, the same amount of traffic, and ii) it is constituted by all the TCP flows with the same source/destination IP addresses (i.e., belonging to the same traffic relation). In addition, the splitting algorithm packs the largest traffic relations into the first traffic aggregates; therefore, subsequently generated aggregates are constituted by an increasing number of smaller traffic relations. This induces a division of TCP-elephants and TCP-mice into different traffic aggregates.
The statistical characteristics of each aggregate are presented, showing that the TCP flow arrival process exhibits long range dependence, which tends to vanish on traffic aggregates composed of many traffic relations made up mainly of TCP-mice.
4.1 Introduction
Since the pioneering work of Danzig [57, 58, 59] and Paxson [60, 61], the interest in data collection, measurement and analysis to characterize either the network or the user behavior has increased steadily, also because it was clear from the very beginning that "measuring" the Internet was not an easy job. The lack of a simple yet satisfactory model, like the traditional Erlang teletraffic theory for circuit-switched networks, still makes this research field a central topic for the research community. Moreover, the well-known Long Range Dependence (LRD) behavior shown by Internet traffic makes traffic measurement and modeling even more interesting. Indeed, after the two seminal papers [61, 62], in which the authors showed that traffic traces captured on both LANs and WANs exhibit LRD properties, many works focused on studying the behavior of data traffic in packet networks, with the intent of both finding a physical explanation of the properties displayed by the traffic and finding accurate stochastic processes that can be used for traffic description in analytical models.
Considering the design of the Internet, it is possible to devise three different layers at which to study Internet traffic: Application, Transport and Network, to which user sessions, TCP or UDP
flows, and IP packets respectively correspond.
Indeed, a simple "click" on a web link causes the generation of a request at the application level (i.e., an HTTP request), which is translated into many transport-level connections (TCP flows); each connection, in turn, generates a sequence of data messages that are transported by the network (IP packets). In this chapter, we concentrate our attention on the flow level, and on the TCP flow level in particular, given that the majority of today's traffic is transported using the TCP protocol. The motivation behind this choice is that, while it was shown (see for example [61, 63]) that the arrival processes of both packets and flows exhibit LRD properties, most researchers concentrated their attention on the packet level, while flow-level traffic characteristics are relatively less studied. Moreover, even if the packet level is of great interest to support router design, e.g., for buffer dimensioning, the study of the TCP flow level is becoming more and more important, since the flow arrival process plays a direct role in the dimensioning of web servers and proxies. Another strong motivation to support the study of the flow arrival process is represented by the increasing diffusion of network apparatuses which operate at the flow level, e.g., Network Address Translators or Load Balancers; indeed, their design and scalability mainly depend on the number of flows they have to keep track of to perform packet manipulations between incoming and outgoing data.
Going back to the packet level, the prevailing justification for the presence of LRD at this level is the heavy-tailed distribution of file sizes [64]: the presence of long-lived flows, called "elephants" in the literature, induces correlation in the packet-level traffic, even if the majority of the traffic is built of short-lived flows, or "mice". The question we try to answer in this chapter is whether the presence of mice and elephants has an influence on the LRD characteristics at the flow level as well. To face this topic, we collected several days of live traffic from our campus network at the Politecnico di Torino, which consists of more than 7000 hosts, the majority of which are clients. Instead of considering the packet-level trace, we performed a live collection of data directly at the TCP flow level, using Tstat [65, 33], a tool able to keep track of single TCP flows by looking at both the data and the acknowledgment segments. The flow-level trace was then post-processed by DiaNa [95], a novel tool which allows several simple as well as very complex measurement indexes to be derived easily and efficiently. Both tools are under development and are made available to the research community as open source.
To gauge the impact of elephants and mice on the TCP flow arrival process, we follow an approach similar to [66, 68]: we create a number of artificial scenarios, deriving from the original trace a number of sub-traces, mimicking the splitting/aggregation process that traffic aggregates experience when following different paths inside the network. We then study the statistical properties of the flow arrival process of the different sub-traces, showing that the LRD tends to vanish on traffic aggregates composed mostly of TCP-mice.
The rest of the chapter is organized as follows. Section 4.2 provides a survey of related works and results; the problem definition and the input data analysis are the object of Section 4.3, in which the algorithm adopted to derive traffic aggregates is briefly described, highlighting its features. The results of the analysis on traffic aggregates are presented in Section 4.4, and conclusions are drawn in Section 4.5.
4.2 Related Works
In the last years a lot of effort has been devoted to traffic analysis in the Internet, at both the IP and the TCP level. It is well known that the IP arrival process is characterized by long-range dependence, and how important it is to consider this property for network planning, buffer dimensioning in primis. The study of long-range dependence in the network domain has been pioneered by many works, such as [62, 61, 64, 72], all agreeing on the possible causes of the LRD of IP traffic. Indeed, they identified the heavy-tailed file size distribution (and, consequently, TCP flow size distribution) as the main cause of long-term correlation at the IP level, which holds true for both WAN and LAN traffic.
IP traffic is also characterized by more complicated scaling phenomena, which are not our concern here and have been well discussed in [68], where the authors also consider small-scale phenomena, which have less direct impact on engineering problems. The authors of [68] used an interesting manipulation of data at the TCP level, which allowed them to study the relation between the IP and TCP scaling behavior at both small and large scales. The results also confirmed that the LRD properties of IP traffic are partly inherited as a consequence of TCP-level properties (e.g., the distribution of the connection duration and flow size), while other scaling properties seem to depend on packet arrivals within flows.
The statistical analysis of real measured traffic, thanks to the significant amount of collected data and research effort, gave new impulse to traffic modeling as well. Here, we briefly summarize the different approaches followed in the last years, in which a number of attempts were made to develop models for LRD data traffic, mainly at the packet level. Among the different approaches, Fractional Brownian Motion (FBM) received a lot of attention thanks to the pioneering work [73]; while providing good approximations, FBM models, however, are not able to capture both the short- and the long-term correlation structure of network traffic.
The poor scaling properties of FBM therefore drove many research efforts toward multifractal models, whose attractiveness is due to their scale-invariance properties. [74, 75] mainly focus on the physical explanation of network dynamics, showing multifractal properties for the first time. Other works, such as [76, 71], also suggest multifractal models as possibly being the best fit to measured data. For example, in [76], the authors tried to explain the scaling phenomena, from small to large scales, by direct inference of the network dynamics. In particular, they identify the heavy-tailed nature of the number of TCP flows per Web session as the cause of the LRD effects clearly visible at large scales in TCP traffic. This effect, analogous to an M/G/∞ effect with an infinite-variance service time distribution, was pointed out in [71] and was already known in the literature. Finally, [77, 78] have been more interested in measurement-based traffic modeling, identifying a multifractal class of processes able to describe the multiscaling properties of TCP/IP traffic, from small to large scales; however, the real relevance of the small-scale traffic properties still remains questionable.
All these traffic characterization works deviate considerably from the classical Markovian models, which continue to be widely used for performance evaluation purposes with good results [79, 80, 81, 82]; in all the above works, for example, the Markov Modulated Poisson Process (MMPP) is considered one of the best Markov processes to emulate traffic behavior. However, in [81, 82] the authors also point out that MMPPs do not bring long-term correlation; they therefore define a local Hurst parameter using an approximated LRD definition valid on a limited range of time scales.
Finally, another approach to modeling Internet traffic involves the emulation of the real hierarchical nature of network dynamics, e.g., considering user sessions, TCP flows and IP packets; in [83], for example, each of the model's components was fitted to an empirical pdf, such as the size distribution of both TCP flows and web pages, and the arrival distribution of pages and flows.
All of these different approaches reach similar conclusions using different techniques. The common point of view has always been to take the real traffic behavior into account, in order to be able either i) to use more reasonable tools for network planning or ii) to explain the links between causes and effects of network traffic phenomena. This analysis brought the networking community to an awareness of the real traffic which has to be taken into account for a correct performance evaluation of the systems.
In this chapter, we are mainly interested in the large scales, as explained in the following section, where we describe a new criterion to classify traffic starting from the TCP level, without considering the packet level at all. We define rules to aggregate TCP flows and define a higher-level entity with good engineering properties to split homogeneous traffic. By then studying the different traffic aggregates, we try to gain an insight into the TCP flow inter-arrival behavior.
4.3 Problem Definition
In this section, we first introduce the different aggregation levels and the notation that will be used in the remainder of the chapter. We then describe the measurement setup and the input trace characteristics that are relevant to understand the significance of the presented results. Finally, we describe the splitting/aggregation criterion that will be used to derive the different traffic aggregates, briefly showing its properties and its technical aspects.
4.3.1 Preliminary Definitions
When performing trace analysis, it is possible to focus the attention on different levels of aggregation. Considering the Internet architecture, it is common to devise three different levels, i.e., the IP-packet, TCP/UDP flow, and user session level. While the definition of the first two layers is commonly accepted, the user session one is fuzzier, as it derives from the user behavior. In this chapter we therefore decided to follow a different approach in the definition of aggregates. In particular, to mimic the splitting/aggregation process that data experience when following different paths in the network, we define four levels of aggregation, sketched in Figure 4.1: IP packets, TCP flows, Traffic Relations (TR), and Traffic Aggregates (TA). Being interested in the flow arrival process, we will neglect the packet level, and also the UDP traffic, because of its connectionless nature and because it constitutes a small portion of the current Internet traffic. Let us first introduce the formal definition of the different aggregation levels considered in this chapter, as well as the related notation:
- TCP Flow Level
A single TCP connection1 is constituted, as usual, by several packets exchanged between the same client c (i.e., the host that performed the active open) and the same server s (i.e., the host that performed the passive open), having in addition the same (source, destination) TCP port pair. We will consider only successfully opened TCP flows, i.e., those whose three-way handshake was successful, and we will take the flow arrival time as the observation time of the first client SYN segment. We denote with f_i(c,s) the bytewise size of the i-th TCP flow tracked during the observation interval; the flow size considers the amount of bytes flowing from the server s toward the client c, which is usually the most relevant part of the connection.

1 In this chapter we use the terms "flow" and "connection" interchangeably.

- Traffic Relation Level
A Traffic Relation (TR) (c,s) aggregates all the |F(c,s)| TCP flows having c as source and s as destination; we indicate its size, expressed in bytes, with T(c,s) = \sum_i f_i(c,s). The intuition behind this aggregation criterion is that all the packets within TCP flows belonging to the same TR usually follow the same path(s) in the network, thus yielding the same statistical properties on the links along the path(s).

- Traffic Aggregate Level
Considering that several TRs can cross the same links along their paths, we define a higher level of aggregation, which we call Traffic Aggregate (TA). The J-th TA is denoted by τ_J and its bytewise size is T(τ_J) = \sum_{(c,s) \in τ_J} T(c,s); the number of traffic relations belonging to a traffic aggregate is indicated with |τ_J|.

In the following, rather than using the bytewise size of flows, TRs and TAs, we will refer to their weight, that is, their size in bytes normalized over the total amount of bytes observed in the whole trace, D = \sum_{(c,s)} \sum_i f_i(c,s); thus

w_i(c,s) = f_i(c,s) / D      (TCP Flow Level)
w(c,s) = T(c,s) / D          (Traffic Relation Level)
w(τ_J) = T(τ_J) / D          (Traffic Aggregate Level)

Figure 4.1: Aggregation Level From Packet Level to TA Level
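A minimal sketch of the three aggregation levels just defined: starting from a flow-level log (client, server, bytes), flows are grouped into traffic relations (same client/server pair) and their normalized weights are computed. Field names are illustrative; Tstat and DiaNa keep much richer per-flow state.

from collections import defaultdict

flows = [("c1", "s1", 5_000), ("c1", "s1", 120_000),   # two flows of the same TR
         ("c2", "s1", 2_000), ("c3", "s2", 873_000)]

total_bytes = sum(size for _, _, size in flows)

# TCP Flow level: weight of each flow, w_i(c,s) = f_i(c,s) / D.
flow_weights = [size / total_bytes for _, _, size in flows]

# Traffic Relation level: one entry per (client, server) pair, w(c,s) = T(c,s) / D.
tr_bytes = defaultdict(int)
for client, server, size in flows:
    tr_bytes[(client, server)] += size
tr_weights = {pair: size / total_bytes for pair, size in tr_bytes.items()}

# Traffic Aggregate level: the weight of a TA is the sum of the weights of its TRs.
ta = [("c1", "s1"), ("c2", "s1")]                      # an example aggregate
ta_weight = sum(tr_weights[pair] for pair in ta)

print(flow_weights, tr_weights, ta_weight, sep="\n")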
Figure 4.2: The Measure Setup
4.3.2 Input Data
The analysis was conducted over different traces collected over several days on our Institution's ISP link during October 2002. Our campus network is built upon a large 100 Mbps Ethernet LAN, which collects traffic from more than 7,000 hosts. The LAN is connected to the Internet by a single router, whose WAN link has a capacity of 28 Mbps2.
We used Tstat to perform the live analysis of the incoming and outgoing packets sniffed on the router WAN link, as sketched in Figure 4.2, obtaining several statistics at both the packet and the flow level. Moreover, Tstat dumped a flow-level trace, which was then split into several traces, each of which refers to a period of time equal to an entire day of real traffic. Besides, since our campus network is mainly populated by clients, we consider in this analysis only the flows originated by clients internal to our Institution's LAN. Each trace was then separately analyzed, preliminarily eliminating the non-stationary time intervals (i.e., the night/day effect) and then considering a busy period from 8:00 to 18:00. Given that the qualitative results observed on several traces do not change, in this chapter we present results derived from a single working-day trace, whose properties are briefly reported in Table 4.1. During the 10 hours of observation, 2,380 clients contacted about 36,000 different servers, generating more than 172,000 TRs, for a total of more than 2.19 million TCP flows, or 71.76 million packets, or nearly 80 GBytes of data.
In the considered traffic mixture, the TCP protocol represents 94% of the total packets, which allows us to neglect the influence of the other protocols (e.g., UDP); considering the application services, TCP connections are mainly constituted by HTTP flows, which represent 86% of the total services and more than half of the total exchanged bytes. More in detail, there are 10664 different identified TCP server ports, 99 of which are well-known: these account for 95% of the flows and for 57% of the traffic volume.
Considering the different traffic aggregation levels previously defined, Figure 4.3 shows examples of the flow arrival time sequence. Each vertical line represents a single TCP flow, which started at the corresponding time instant on the x-axis, and whose weight w_i(c,s) is reported on the y-axis. The upper plot shows trace τ_A, whose weight is w = 1: it represents the largest possible TA, built considering all connections among all the possible source-destination pairs. Trace sub-portions τ_B and τ_C, while being constituted by a rather different number of TRs (|τ_B| = 5986 and |τ_C| = 59),

2 The data-link level is based on an AAL-5 ATM virtual circuit at 34 Mbps (OC1).
Figure 4.3: Flow Size and Arrival Times for Different TAs (τ_A: w = 1, |τ_A| = 172332; τ_B: w = 1/50, |τ_B| = 5986; τ_C: w = 1/50, |τ_C| = 59; τ_D: w = 0.06 ≈ 1/16, |τ_D| = 1)
have indeed the same weight w(τ_B) = w(τ_C) = 1/50. Observing Figure 4.3, it can be gathered that τ_B aggregates a larger number of flows than τ_C; furthermore, the weight of τ_B flows is smaller (i.e., TCP flows tend to be "mice"), while τ_C is built by a much smaller number of heavier (i.e., "elephant") TCP flows. This intuition will be confirmed by the data analysis presented in Section 4.4. Finally, the TCP flows shown in τ_D constitute a unique traffic relation; this TR is built by a small number of TCP flows whose weight is very large, so that they amount to 1/16 of the total traffic.
To give the reader more details on the statistical properties of the different TRs, Figure 4.4 shows the distribution of w(c,s) for all the TRs (using a lin/log plot). It can be noticed that, except for the largest and smallest TRs, the distribution can be approximated by a linear function, i.e., the amount of bytes exchanged by different client/server couples follows an exponential distribution. More interesting is instead the distribution of the number of TCP flows per traffic relation, shown in Figure 4.5 using a log/log plot. Indeed, the almost linear shape of the pdf shows that it exhibits a heavy tail, which could be at the basis of the LRD properties of the TCP-flow arrival process. The parallel with the heavy-tailed pdf of the flow size, which induces LRD properties at the packet level, is straightforward.
Table 4.1: Trace Information

Internal Clients      2,380
External Servers      35,988
Traffic Relations     172,574
Flows Number          2.19 x 10^6
Packets Number        71.76 x 10^6
Total Trace Size      79.9 GB
Figure 4.4: TR Size Distribution
Figure 4.5: TR Flow Number Distribution
4.3.3 Properties of the Aggregation Criterion
We designed the aggregation criterion in order to satisfy some properties that help the analysis and interpretation of the results.
The key point is that the original trace is split into K different TAs, such that i) each TA carries, bytewise, the same amount of traffic, i.e., the K-th portion of the total traffic, and ii) each TA aggregates together one or more TRs. Indeed, we considered TR aggregation a natural choice, since it preserves the characteristics of packets within TCP flows following the same network path,
and therefore having similar properties.
Since it is possible to find more than one solution to the previous problem, the splitting algorithm we implemented packs the largest TRs into the first TAs; besides, by virtue of the bytewise traffic constraint, subsequently generated aggregates are constituted by an increasing number of smaller TRs. Therefore the TAs, while composed of several TRs related to heterogeneous network paths, are ordered by the number |τ_J| of TRs constituting them.
To formalize the problem, and to introduce the notation that will also be used to present the results, let us define:

- Class K: the number of TAs into which we split the trace;
- Slot J: a specific TA of class K, namely τ_J(K), J ∈ [1, K];
- Weight w_J(K): the weight of slot J of class K;
- Target Weight \bar{w}(K) = 1/K: the ideal portion of the traffic that should be present in each TA of class K.
å 3
[Figure 4.6: Trace Partitioning: Algorithmic Behavior; for K=1 a single TA with |τ1(1)|=172,574; for K=2, |τ1(2)|=242 and |τ2(2)|=172,332; for K=3, |τ1(3)|=55, |τ2(3)|=1,110 and |τ3(3)|=171,409; slots can be undersized (wJ(K) < 1/K), exact (wJ(K) = 1/K) or oversized (wJ(K) > 1/K), with examples of the number of fixed slots NF(K) = 1, 2 and 9 for classes such as K = 16, 42 and 100]
Figure 4.6 sketches the splitting procedure. When considering class K = 1, we have a single TA of weight w1(1) = 1, derived by aggregating all the TRs, which corresponds to the original trace. Considering K = 2, we have two TAs, namely τ1(2) and τ2(2); the former is built by 242 TRs, which account for w1(2) = 1/2 of the relative traffic, while the latter contains all the remaining TRs. This procedure can be repeated for increasing values of K, until the weight of a single traffic relation becomes larger than the target weight. Being impossible to split a single TR into smaller aggregates, we are forced to consider TAs having a weight wJ(K) > w(K).
The weight w(K) has therefore to be interpreted as an ideal target, in the sense that it is possible that one or more TRs will have a weight larger than w(K) as the number of slots grows. In such cases, there will be NF(K) fixed slots, i.e., TAs constituted by a single TR of weight wJ(K) > 1/K; the remaining weight will be distributed over the K - NF(K) non-fixed slots;
therefore, the definition of the per-slot target weight becomes:

\[
\bar{w}_J(K) =
\begin{cases}
w_J(K) & \text{if } J \le N_F(K) \quad \text{(fixed slots)}\\
\dfrac{1 - \sum_{i=1}^{N_F(K)} w_i(K)}{K - N_F(K)} & \text{if } N_F(K) < J \le K \quad \text{(non-fixed slots)}
\end{cases}
\]
In the dataset considered in this chapter, for example, the TR τD shown in Figure 4.3 is the largest of the whole trace, having wD ≈ 0.06 ≈ 1/16. Therefore, from class K = 16 on, the slot J = 1 will always be occupied by this aggregate, i.e., τ1(K) = τD for all K ≥ 16, as evidenced in Figure 4.6.
4.3.4 Trace Partitioning Model and Algorithm
More formally, the problem can be reduced to the well-known P||Cmax optimization problem of job scheduling over identical parallel machines [69], which is known to be strongly NP-hard. Traffic relations (TRs) are the jobs that have to be scheduled on a fixed number K of machines (i.e., TAs), minimizing the maximum completion time (i.e., the TA weight).
The previously introduced ideal target w(K) = 1/K is the optimum solution in the case of preemptive scheduling. Since we preserve the TR identities, preemption is not allowed; however, it is straightforward that minimizing the maximum deviation of the completion times from w(K) is equivalent to the objective function that minimizes the maximum completion time.
We define x as the 1 x |τ(1)| vector of the job lengths (i.e., the TR weights wi(1)) and y as the 1 x K vector of the target machine completion times (i.e., the target TA weights w̄J(K)). Denoting with A the mapping matrix (i.e., AiJ = 1 means that the i-th job is assigned to the J-th machine), and with AJ its J-th column, we have:

\[
\min \max_{J \in [1,K]} e_J
\quad \text{s.t.} \quad
\sum_{J} A_{iJ} = 1 \;\; \forall i, \qquad
e_J \ge A_J^{T} x - y_J \;\; \forall J, \qquad
e_J \ge y_J - A_J^{T} x \;\; \forall J
\]
The adopted greedy solution, which has the advantage over, e.g., an LPT [69] solution of the clustering properties discussed earlier, implies the preliminary bytewise sorting of the traffic relations, and three simple rules (a sketch in code is given right after the list):

- allow a machine load to exceed 1/K if the machine has no previously scheduled job;
- keep scheduling the biggest unscheduled job onto the same machine while its load is still below 1/K;
- remove the scheduled job from the unscheduled jobs list as soon as the job has been scheduled.
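The rules above translate into a few lines of code. The following is a minimal Python sketch written for illustration only; function and variable names are ours, not the implementation actually used for the analysis.

    def split_into_TAs(tr_weights, K):
        """Greedy packing of traffic relations (TRs) into K traffic aggregates (TAs).

        tr_weights: list of normalized TR weights (expected to sum to 1).
        Returns the per-TA lists of TR weights and the resulting TA loads.
        """
        target = 1.0 / K                          # ideal per-slot weight w(K)
        jobs = sorted(tr_weights, reverse=True)   # preliminary bytewise sorting, biggest first
        slots = [[] for _ in range(K)]
        loads = [0.0] * K
        j = 0
        for w in jobs:
            # open the next slot once the current one has reached the target;
            # an empty slot always accepts its first job, even if w > 1/K (fixed slot)
            if loads[j] >= target and j < K - 1:
                j += 1
            slots[j].append(w)                    # schedule the job and drop it from the list
            loads[j] += w
        return slots, loads

Fed with weights shaped like those of Figure 4.3 (one TR weighing about 1/16 of the traffic plus a long tail of light ones), such a routine would reproduce the qualitative behavior sketched in Figure 4.6: few heavy TRs in the first slots, and the bulk of light TRs collected by the last one.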
A representation of the algorithm output applied to the TA creation problem is provided in Figure 4.7; the x-axis represents the number of TRs within each TA and the y-axis the weight wJ(K) of each generated TA.
[Figure 4.7: Trace Partitioning: Samples for Different Aggregated Classes K; normalized traffic aggregate weight (0.01 to 1, log scale) vs. number of traffic relations |τ| within each traffic aggregate (10 to 100,000, log scale), for all K ∈ [3,100], with K = 10, 50, 100 highlighted]
For the ease of the reader, results for classes K = 10, 50, 100 are highlighted, whereas neither the smallest classes nor the fixed slots are reported in the picture. Observing the figure, and recalling the distribution of the TR weights plotted in Figure 4.4, we notice that, given a class K, the number |τJ(K)| of TRs inside the J-th slot increases as J increases. For example, the last slot (the one on the rightmost part of the plot) always exhibits a number of TRs larger than 100,000. On the contrary, looking at the first classes, we observe that the number of TRs tends to decrease for increasing K, showing the "packing" enforced by the selected algorithm.
4.4 Results
In this section we investigate the most interesting properties of the artificially built traffic aggregates; the analysis will be conducted mainly in terms of aggregate size and TCP flow interarrival times, coupling their study with the knowledge of the underlying layers, i.e., the traffic relation and flow levels.
To help the presentation and discussion of the measurement results, we first propose a visual representation, alternative to the one of Figure 4.7, of the dataset obtained by applying the splitting algorithm. Indeed, the aggregation process induces a non-linear sampling of both the number |τJ(K)| of TRs within each TA and the TA weight wJ(K), complicating the interpretation of the results. However, the same qualitative information can be immediately gathered if we plot the data as a function of the class K and slot J indexes, using different gray-scale intensities to represent the measured quantity we are interested in.
As a first example of the new representation, Figure 4.8 depicts the number |τJ(K)| of traffic relations mapped into each traffic aggregate τJ(K). Looking at the plot and choosing a particular class K, every point of the vertical line represents therefore the number of TRs within each of the K possible TAs, obtained by partitioning the trace into K TAs having approximately a 1/K weight.
[Figure 4.8: Number of Traffic Relations |τJ(K)| within each Traffic Aggregate]
Given the partitioning algorithm used, it is straightforward to understand why the higher the slot considered, the larger the number of TRs within the same TA, as the gray gradient clearly shows. To better appreciate this partitioning effect, contour lines are shown for |τJ(K)| ∈ {10, 100, 1,000, 10,000, 100,000} as reference values; it can be gathered that the bottom white-colored zone of the plot is constituted by fixed slots (i.e., |τ| = 1), whereas the last slot J = K always has |τK(K)| ≥ 100,000 for all K, as already observed from Figure 4.7. This further confirms the validity of our simple heuristic in providing the previously described aggregation properties.
4.4.1 Traffic Aggregate Bytewise Properties
Having shown that the number of TRs within the generated TAs spreads smoothly over a very wide range for any TA weight, we now investigate whether these effects are reflected also by the number of TCP flows within each TA, shown in Figure 4.9. Quite surprisingly, we observe that the number of TCP connections within a TA shows an almost similar spreading behavior: the larger the number of TRs within a TA, the larger the number of TCP flows within the same TA. Indeed, the smoothed upper part of the plot (i.e., roughly, slots J ≥ K/2) is represented by TAs with a large number of TCP flows, larger than about 1,000; instead, TAs composed of few TRs contain a much smaller number of TCP connections: that is, the number of TCP flows keeps relatively small, while not as regular as in the previous case. Indeed, it must be pointed out that there are exceptions to this trend, as shown by the "darker" diagonal lines visible, e.g., for class K = 100. Probably, within these TAs there is one (or possibly more) TR which is built by a large number of short TCP flows.
Coupling this result with the bytewise constraint imposed on TAs within the same class, we can state that the bottom region is constituted by a small number of flows accounting for the same traffic volume generated by a huge number of lighter flows.
[Figure 4.9: Number of TCP Flows within each Traffic Aggregate; number of TCP flows within slot J for different classes K (gray scale, 10 to 1,000,000), with slot J and class K ranging from 0 to 100]
This intuition is also confirmed by Figure 4.10, which shows the mean TCP flow size in the different aggregates: the higher the slot considered, the larger both the number of TRs and of TCP flows, and the shorter the TCP flows. While this might be surprising at first, the intuition behind this clustering is that the largest TRs are built by heavier TCP connections than the smaller TRs, i.e., TCP-elephants play a big role also in defining the TR weight. Therefore the splitting algorithm, by packing together larger TRs, tends also to pack together TCP-elephants.
The TCP flow size variance, not shown to avoid cluttering the figures, further confirms the clustering of TCP-elephant flows in the bottom side of the plot. To better highlight this trend, we adopted a threshold criterion: the mean values of the TCP flow size are compared against fixed quantization thresholds, which allows us to better appreciate the mean flow size distribution across TAs. The results for threshold values set to 100, 250, 500 kB and 1 MB are shown in Figure 4.11, where higher threshold values correspond to darker colors. The resulting plot underlines that the mean flow size grows toward TAs constituted by a smaller number of larger TRs, which are in their turn constituted by a small number of TCP-elephants. Similarly, the higher-order slots are mostly constituted by mice.
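As a purely illustrative companion to this quantization step (the data layout and names below are hypothetical, not those of the original post-processing scripts), the classification could be written as:

    THRESHOLDS_KB = (100, 250, 500, 1000)   # quantization thresholds of Figure 4.11

    def elephant_level(flow_sizes_bytes):
        """Map the mean TCP flow size of one TA to the index of the highest threshold
        exceeded: 0 means 'below 100 kB', i.e., a mice-dominated aggregate."""
        mean_kb = sum(flow_sizes_bytes) / len(flow_sizes_bytes) / 1e3
        level = 0
        for i, threshold in enumerate(THRESHOLDS_KB, start=1):
            if mean_kb > threshold:
                level = i
        return level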
However, the above results do not exclude the presence of TCP-mice in the bottom TAs, nor the presence of TCP-elephants in the top TAs; therefore, to further ensure that the clustering property of TCP-elephants in the first slots holds, we investigated how the TCP flow size distributions in the different aggregates vary as a function of the TA index. We thus plot in Figure 4.12 the empirical flow size distribution for each TA, showing only the results of class K = 3 for the ease of the reader. As expected, i) TCP-elephants are evidently concentrated in the lower slots, as shown by the heavier tail of the distribution; ii) the TCP-elephant presence decreases as the slot increases, i.e., when moving toward higher slots. The complementary cumulative distribution function P{X > x}, shown in the inset of Figure 4.12, clearly confirms this trend.
[Figure 4.10: Mean Size of TCP Flows within each Traffic Aggregate; mean TCP flow size within slot J for different classes K (gray scale, 10 KB to 100 MB)]
[Figure 4.11: Elephant TCP Flows within each Traffic Aggregate; TCP-elephant estimate based on mean TCP flow size thresholds: < 100 KB, > 100 KB, > 250 KB, > 500 KB, > 1 MB]
This finally confirms that the adopted aggregation criterion induces a division of TCP-elephants and TCP-mice into different TAs: the TRs containing the highest number of TCP-elephants tend to be packed into the first TAs. Therefore, in the following we will for convenience use the terms TA-mice and TA-elephants to indicate TAs constituted (mostly) by TCP-mice and TCP-elephant flows, respectively.
[Figure 4.12: TCP Flow Size Distribution of TAs (Class K = 3); CDF of the TCP flow size (bytes, roughly 10 K to 1 G) for slots J = 1, 2, 3; inset: complementary distribution 1-CDF(X)]
4.4.2 Inspecting TCP Interarrival Time Properties within TAs
The results gained in the previous section clearly have important consequences when studying the TCP flow arrival process of the different aggregates.
Let us first consider the mean interarrival time of TCP flows within each TA, shown in Figure 4.13. Intuitively, TA-mice have a large number of both TRs and TCP flows, and therefore the TCP flow mean interarrival time is fairly small (less than 100 ms). This is no longer true for TA-elephants, since a smaller number of flows has to carry the same amount of data over the same temporal window. Therefore, the mean interarrival time is much larger (up to hours).
We recognize, however, that a possible problem might arise, affecting the statistical quality of the results: the high interarrival time may be due to non-stationarity in the TA-elephants traffic, where TCP flows may be separated by long silence gaps. This effect becomes more visible as the class index K, and thus the aggregation level |τJ(K)|, decreases. We will try to underline whether the presented results are affected by this problem in the remaining part of the analysis.
The previous assertion is also confirmed by Figure 4.14, which shows, within each TA, the TCP flow interarrival time variance. It can be noticed that the TCP flow interarrival time variance is several orders of magnitude smaller for TA-mice than for TA-elephants.
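For concreteness, the per-TA statistics discussed here only require the flow arrival instants; a plain Python sketch (hypothetical names, not the original tools) is:

    def interarrival_stats(arrival_times):
        """Mean and variance of the TCP flow interarrival times within one TA.

        arrival_times: flow arrival instants in seconds (any order).
        """
        t = sorted(arrival_times)
        gaps = [b - a for a, b in zip(t, t[1:])]
        mean = sum(gaps) / len(gaps)
        variance = sum((g - mean) ** 2 for g in gaps) / len(gaps)
        return mean, variance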
Let us now consider the Hurst parameter HJ(K) measured on the interarrival times of TCP flows within TAs. For each TA, we performed the calculation of HJ(K) using the wavelet-based approach developed in [70]. We adopted the tools developed there, usually referred to as the AV estimator. Other approaches can be pursued to analyze traffic traces, but the wavelet framework has emerged as one of the best estimators, as it offers a very versatile environment, as well as fast and efficient algorithms.
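The wavelet AV estimator of [70] is the one actually used in this analysis. As a rough, self-contained alternative for a quick sanity check, the classic variance-time method can be sketched as follows; note that this is not the AV estimator and it is known to be less robust:

    import numpy as np

    def hurst_variance_time(x, scales=(1, 2, 4, 8, 16, 32, 64, 128)):
        """Crude Hurst estimate via the variance-time plot.

        For an LRD series, the variance of the block means decays as m**(2H-2);
        H is recovered from the slope of log Var versus log m.
        """
        x = np.asarray(x, dtype=float)
        log_m, log_v = [], []
        for m in scales:
            n_blocks = len(x) // m
            if n_blocks < 2:
                continue
            block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
            log_m.append(np.log10(m))
            log_v.append(np.log10(block_means.var()))
        slope = np.polyfit(log_m, log_v, 1)[0]
        return 1.0 + slope / 2.0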
[Figure 4.13: Interarrival Time Mean of TCP Flows within TAs; mean TCP flow interarrival times (ms) within slot J for different classes K]
[Figure 4.14: Interarrival Time Variance of TCP Flows within TAs]
The results are shown in Figure 4.15, which clearly shows that the Hurst parameter tends to decrease for increasing slot, i.e., the TA-mice show HJ(K) values smaller than those of the TA-elephants.
[Figure 4.15: Interarrival Time Hurst Parameter of TCP Flows within TAs; TCP flow interarrival time Hurst parameter within slot J for different classes K]
Recalling the problem of the stationarity of the dataset obtained by the splitting algorithm, not all the HJ(K) estimates are significant; in particular, those whose confidence interval is too large, or for which the series is not stationary enough to give correct estimations, were discarded and are not reported in the plot. Still, the increase of the Hurst parameter for TA-elephants is visible.
To better show this property, Figure 4.16 presents detailed plots of HJ(K) for K ∈ {10, 50, 100} (in the top-left, top-right and bottom-left plots respectively) and for all the slots of each class. It can be observed that the Hurst parameter always tends to decrease when considering the TA-mice slots, while it becomes unreliable for TAs with few TRs, as testified by the larger confidence intervals. This is particularly visible when considering the K = 100 class. In the bottom-right plot, finally, we report a detail of the decaying feature of the HJ(K) value for large J; on the x-axis, the normalized value J/K is used, so as to allow a direct comparison among the three different classes. Notice that the last slot, the one composed of many small TRs which are aggregations of small TCP flows, always exhibits the same Hurst parameter.
Therefore, the most important observation, trustworthy thanks to the good confidence intervals, is that we are authorized to say that TA-mice behavior is driven by TCP-mice. Similarly, TA-elephants are driven by TCP-elephants. Moreover, the interarrival process dynamics in TA-elephants and TA-mice are completely different in nature, because light TRs tend to contain a relatively small amount of data carried over many small TCP flows, which do not clearly exhibit LRD properties. On the contrary, TCP-elephants seem to introduce clearer LRD effects in the interarrival times of flows within TA-elephants.
A possible justification of this effect might reside in the different behavior users have when generating connections: indeed, considering that TCP-mice typically belong to Web browsing, the correlation generated by Web sessions tends to vanish when a large number of (small) TRs are aggregated together. On the other side, the TCP-elephants, which are rare but not negligible, seem to be generated with a higher degree of correlation, so that i) TRs are larger and ii) when aggregating them, the number of users involved is still small.
For example, consider the behavior of a user who is downloading large files, e.g., MP3s; one can suppose that the user starts downloading the first file, then, immediately after having finished the download, he starts downloading a second one, and so on until, e.g., the entire LP has been downloaded. The effect on the TCP flow arrival times is similar to that of an ON-OFF source whose ON period is heavy-tailed and roughly equal to the file download period, which in turn follows a heavy-tailed distribution; the OFF period, instead, is not relevant here. This is clearly one of the possible causes of LRD properties at the flow level: see, for an example, the τD aggregate in Figure 4.3.
[Figure 4.16: Interarrival Time Hurst Parameter of TCP Flows within TAs; detailed plots of HJ(K) for K = 10, K = 50 and K = 100, plus a detail versus the normalized slot J/K comparing the three classes]
Besides, consider again the heavy-tailed distribution of the number of TCP flows within TRs, shown earlier in Figure 4.5. If we further consider TRs as a superset of Web sessions, we obtain the same result stated in [71], that is, an M/G/∞ effect with infinite-variance service time distribution. Finally, TAs tend to aggregate several TRs, separating short-term correlated connections (TA-mice) from low-rate ON-OFF connections with infinite-variance ON time (TA-elephants). This clearly recalls the well-known phenomenon described in [64]; that is, the superposition of ON-OFF sources (as well as "packet trains") exhibiting the Noah effect (infinite variance in the ON or OFF period) produces the so-called "Joseph effect": the resulting aggregated process exhibits self-similarity and in particular LRD properties.
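To make the mechanism tangible, the sketch below superposes a handful of ON-OFF sources whose ON periods are Pareto distributed with infinite variance (1 < alpha < 2), which is precisely the Noah-effect ingredient invoked above; all names and parameter values are illustrative assumptions, not measured quantities.

    import random

    def on_off_flow_arrivals(n_sources=20, horizon=3600.0, alpha=1.5,
                             on_scale=10.0, off_mean=60.0, flow_gap=1.0, seed=1):
        """Superpose ON-OFF sources with Pareto (infinite-variance) ON periods.

        During an ON period a source opens back-to-back flows every flow_gap seconds
        (the 'album download' behavior); OFF periods are exponential. Returns the
        flow arrival instants of the aggregate, on which LRD tests can be run."""
        rng = random.Random(seed)
        arrivals = []
        for _ in range(n_sources):
            t = rng.uniform(0.0, off_mean)                      # random initial phase
            while t < horizon:
                u = 1.0 - rng.random()                          # uniform in (0, 1]
                on_end = min(t + on_scale * u ** (-1.0 / alpha), horizon)
                while t < on_end:                               # flows emitted during ON
                    arrivals.append(t)
                    t += flow_gap
                t += rng.expovariate(1.0 / off_mean)            # exponential OFF period
        return sorted(arrivals)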
4.5 Conclusions
In this chapter we have studied the TCP flow arrival process, starting from aggregated measurements taken from our campus network; specifically, we performed a live collection of data directly at the TCP flow level, neglecting therefore the underlying IP-packet level. Traces were both obtained and post-processed through software tools developed by our research group, publicly available to the community as open source.
We focused our attention beyond the TCP level, defining two layered high-level traffic entities. At a first level beyond TCP, we identified traffic relations, which are constituted by TCP flows with the same path properties. At the highest level, we considered traffic relation aggregates having homogeneous bytewise weight; most importantly, the followed approach enabled us to divide the whole traffic into aggregates mainly made of either TCP-elephants or TCP-mice.
This permitted us to gain some interesting insights on the TCP flow arrival process. First, we have observed, as already known, that long range dependence at the TCP level can be caused by the fact that the number of flows within a traffic aggregate is heavy-tailed. In addition, the traffic aggregate properties allowed us to see that TCP-elephant aggregates behave like ON-OFF sources characterized by a heavy-tailed activity period. Besides, we were able to observe that LRD at the TCP level vanishes for TCP-mice aggregates: this strongly suggests that such ON-OFF behavior is responsible for the LRD at the TCP level.
Chapter 5
Feeding a Switch with Real Traffic
THIS chapter, whose results have been published in [84], proposes a novel methodology to generate realistic traffic traces to be used for performance evaluation of high performance switches. It is fairly well known that real Internet traffic shows long and short range dependency characteristics, which are difficult to capture with flexible, yet simple, synthetic models. One option is to use real traffic traces, which however are difficult to obtain, as this requires capturing traffic in different places, with synchronization and management problems.
We therefore present a methodology to generate several synthetic traffic traces from a single real packet trace, carefully grouping packets belonging to the same flow so as to preserve the statistical properties of the original trace. After formalizing the problem, we solve it and apply the results to assess the performance of scheduling algorithms in high performance switches, comparing the results to other, simpler traffic models traditionally adopted in the switching community. Our results show that realistic traffic degrades the performance of the switch by more than one order of magnitude with respect to the traditional traffic models.
5.1 Introduction
In recent years, many different studies have pointed out how Internet traffic behaves, focusing their analysis on the statistical properties of IP packets and traffic flows. The whole network community is now more conscious that the traffic arriving at an IP router is considerably different from the traditional models (Bernoulli, on/off and many others). The seminal paper of Leland [62] gave new impulse to traffic modeling, leading to a huge number of papers deeply investigating the problem from different points of view. A group of studies focused on statistical analysis and data fitting, e.g. [85]. These works highlighted traffic properties such as Long Range Dependence (LRD) at large time scales and also multi-fractal properties. LRD is probably the most relevant cause of degradation in system performance, because it heavily influences the buffer behavior which, being characterized by a Weibull tail [86], is considerably different from the exponential tail of conventional Markovian models. However, no commonly accepted model of Internet traffic has yet emerged, because the proposed models are either too simple (e.g., Markovian models), or very complex and difficult to understand and tune (e.g., multi-fractal models).
Therefore, Internet traffic is intrinsically different from the random processes commonly used in performance evaluation of networking systems, where trace-driven simulations are the most commonly used approach. This applies in particular to the performance evaluation of high speed switches/routers, since the overall complexity of switching systems cannot be fully captured by analytical models. Hence, switch designers use simulation to validate their architectures by stressing their performance under critical situations.
How to generate the traffic to feed the simulation model is still an open question, because: i) traffic models (like Bernoulli, on/off, etc.) are indeed flexible and easy to tune, but they are not good models of real Internet traffic; ii) real traffic traces are more difficult to tune, and are not flexible since they refer to a particular network topology/configuration. In this chapter, we propose a novel approach to generate synthetic traffic traces to be used for performance evaluation. It stems from the generation of synthetic traffic from a real trace, and adds the capability of building different scenarios (e.g., number of input/output ports and traffic pattern), keeping real traffic characteristics while providing the flexibility of synthetic traffic. The main idea is to generate traffic which follows the time correlation of packets at the IP flow level and, at the same time, satisfies some given traffic pattern relations. As an interesting example of application, the performance of basic switching architectures is discussed and compared to traditional benchmarking models.
5.2 Internet Traffic Synthesis
We consider an N x N switch, where each switch port is associated with an input/output link toward external networks, as shown in the top plot of Figure 5.1. One possible approach to feed this switch with real packet traces requires sampling the traffic at each of the N links; unfortunately, this approach is not easily viable, since it requires either having a real N x N switching architecture or managing and synchronizing several distributed packet sniffers.
In a more realistic situation, only one trace referring to one link is available: for example, in this work, traffic traces have been sniffed at Politecnico's egress router, as shown in the bottom part of Figure 5.1. Once the trace is available, a methodology to create different traffic scenarios is required, for example by imposing specific traffic relations among the input/output ports. The output of the methodology will be a set of N traces satisfying the constraints imposed by the selected scenario. Our approach tries to establish the best mapping among source-destination IP addresses (s, d) and input-output ports (i, j) of the switch (i.e., s to i and d to j), in order to generate a total of N synthetic traces that can be "replayed", i.e., fed to the N input links of the switch under analysis.
The traffic relations are described by a traffic matrix T, of size N x N, expressing the normalized average offered load between any input and output port.
Additional constraints must be met in order to keep the statistical behavior of the original traffic trace, e.g., to keep the packet time correlation among the IP packets having the same source and destination addresses.
This problem is intuitively not trivial, given the number of constraints that must be satisfied. To better discuss the synthesis problem, we introduce the notation used throughout the rest of the chapter, then formalize the problem, and solve it using a greedy heuristic.
[Figure 5.1: Internet traffic abstraction model (on the top) and measure setup (on the bottom); top: an N x N switch whose input port i and output port j carry traffic between source s and destination d; bottom: a packet sniffer placed at the edge router between the internal clients and the external servers]
5.2.1 Preliminary Definitions
When traffic is routed on a network, it is possible to focus the attention on different levels of aggregation, namely the IP packet, IP flow, and flow aggregate levels. In particular, we define:

- IP Flow: an IP flow aggregates all IP packets having the same IP source address s ∈ S and IP destination address d ∈ D. We denote its size expressed in bytes with φsd, and its normalized load with ρsd = φsd / Σx∈S Σy∈D φxy. This is a natural aggregation level, which entails that IP packets routed from s to d will follow the same route, closely mimicking Internet behavior.
- Flow Aggregate: we define a flow aggregate Fij as the aggregation of all packets having source address s ∈ Si ⊆ S and destination address d ∈ Dj ⊆ D. We will choose the address sets such that {Si} and {Dj} are partitions of S and D respectively. The flow aggregate Fij represents the traffic crossing the switch from input i to output j.

Let us denote the address-to-port mapping function with A2P(·); then, for any source address tied to a specific input i we have A2P(s) = i, for all s ∈ Si, and similarly, for a specific output port j, it holds A2P(d) = j, for all d ∈ Dj.

Figure 5.2 reports an example of the previous classification: at the bottom there is the original traffic trace, which is composed of four IP flows (labeled A, B, C, D). Then two flow aggregates are generated, considering the union of {A, B} and {C, D} respectively.
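These definitions map directly onto a couple of dictionary passes over the trace; the sketch below is an illustration only, assuming packets are available as (src, dst, bytes) tuples and that the address partitions are given as address-to-port dictionaries.

    from collections import defaultdict

    def ip_flow_loads(packets):
        """packets: iterable of (src, dst, size_bytes) tuples.
        Returns {(src, dst): normalized load rho_sd}, with loads summing to 1."""
        size = defaultdict(int)
        for s, d, nbytes in packets:
            size[(s, d)] += nbytes
        total = float(sum(size.values()))
        return {flow: nbytes / total for flow, nbytes in size.items()}

    def flow_aggregates(loads, src_port, dst_port):
        """src_port / dst_port: {address: port index}, i.e., the partitions of S and D.
        Returns {(i, j): aggregate load} crossing the switch from input i to output j."""
        agg = defaultdict(float)
        for (s, d), rho in loads.items():
            agg[(src_port[s], dst_port[d])] += rho
        return agg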
5.2.2 Traffic Matrix Generation
The mapping of the IP flows onto a given traffic matrix T establishes a binding among IP addresses and switch ports. More formally, this binding can be thought of as a generalization of the P||Cmax optimization problem of scheduling jobs over identical parallel machines [69], and its formalization is provided in Figure 5.3, in which element (i, j) of a matrix X is denoted by Xij, the i-th row by Xi and the j-th column by Xj, with X^T being the transpose of X.
The normalized IP flow load matrix Λ ∈ [0,1]^(|S| x |D|) is the input of the optimization problem, in which each IP flow represents a job of size ρsd which has to be scheduled without deadline.
[Figure 5.2: Internet traffic at different levels of aggregation; the measured aggregated traffic over time t, decomposed into the IP flows A, B, C, D and the flow aggregates A+B and C+D]
A fixed number N² of machines is available, each corresponding to an input-output couple of the switch; each machine is assigned a target completion time, i.e., the target (normalized) traffic matrix T ∈ [0,1]^(N x N), which can be exceeded without penalty. The matrices A ∈ {0,1}^(|S| x N) and B ∈ {0,1}^(|D| x N) are the output of the problem, i.e., the mapping of jobs to machines or, in our case, the mapping of source IP addresses to switch input ports and of destination IP addresses to switch output ports, respectively.
The objective is to minimize the maximum error committed in the approximation, i.e., the maximum deviation from the target traffic matrix. The first two system constraints lower bound the error eij with the absolute difference between the approximated load (A^T Λ B)ij and the target completion time Tij.
\[
\min \max_{i,j} e_{ij}
\qquad \text{s.t.} \qquad
\begin{cases}
e_{ij} \ge (A^{T}\Lambda B)_{ij} - T_{ij} & \forall i,j \in \{1,\dots,N\}\\
e_{ij} \ge T_{ij} - (A^{T}\Lambda B)_{ij} & \forall i,j \in \{1,\dots,N\}\\
\sum_{j} A_{sj} = 1 & \forall s \in S\\
\sum_{j} B_{dj} = 1 & \forall d \in D
\end{cases}
\]

Figure 5.3: The optimization problem
With respect to the classic P||Cmax formulation, two additional constraints are present: once an IP destination address has been mapped to a particular switch output port, all IP flows with the same destination IP address must be mapped to the same output port, as we assume that only one path is used to route packets. The same applies to IP source addresses and switch input ports.
[Figure 5.4: The greedy algorithm (pseudo-code); while unmapped IP flows remain: select the freest port pair and the heaviest unmapped IP flow, set the address-to-port mappings if still unset, and update the approximated traffic matrix for the flows involving already mapped destinations and sources]
Therefore, each flow source will use just one row of the machine grid and each flow destination will use just one column, as enforced by the last two constraints.
Since the well-known P||Cmax optimization problem [69] is strongly NP-hard, its bi-dimensional extension cannot have a polynomial time solution. Moreover, due to the size of our problem, we look for a simple and fast approximation of the optimal solution: among all the possible strategies, a greedy approach has been selected due to its extreme simplicity.
5.2.3 Greedy Partitioning Algorithm
In this section, we briefly highlight some of the main features of the adopted greedy strategy, whose pseudo-code is shown in Figure 5.4. The intuition at the base of the algorithm is to map the heaviest (un-mapped) IP flow (s, d) to the freest port pair (i, j), and then force all flows having the same IP source address to enter from the same switch input port, while flows having the same destination address will be forced to exit from the same switch output port. This is done by updating the approximated traffic matrix T̃, whose elements account for the size of the IP flows assigned to that particular port pair. This is repeated until all the IP flows have been mapped.
In the context of a greedy solution, the choice to accommodate at each step the heaviest remaining IP flow, which simply amounts to processing IP flows in reverse sorted order, is quite intuitive. However, this is not the case for the port pair selection; indeed, we tried several policies and tested their performance under different traffic scenarios. For example:

- the port pair corresponding to the globally largest element in T - T̃, i.e., (i, j) = argmax_{l,m} (T_lm - T̃_lm);
- the row-wise largest element, i.e., the largest element of the largest row: i = argmax_l Σ_m (T_lm - T̃_lm), j = argmax_m (T_im - T̃_im); or, symmetrically, the column-wise largest;
- the coupled (row,column)-wise largest element, that is, the element that lies at the intersection of the largest row and the largest column, i.e., i = argmax_l Σ_m (T_lm - T̃_lm), j = argmax_m Σ_l (T_lm - T̃_lm).

All the former approaches gave similar results only with a uniform target matrix T; in the other cases the global approach gave the best results, even when compared to the coupled (row,column)-wise one. Indeed, the global approach tries to minimize the maximum error between the target T and the approximated traffic matrix T̃ at a local level, i.e., for a specific input/output pair. The other strategies try to minimize, respectively, the error on the input ports (row-wise strategy) or the average error (coupled strategy), which explains their relatively worse performance.
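A compact sketch of this greedy binding, using the global policy for the port-pair choice, is given below. It is a plain-Python illustration under the constraints just described (one input port per source address, one output port per destination address), not the authors' implementation; all names are ours.

    def greedy_port_mapping(loads, target):
        """loads: {(src, dst): normalized IP flow load}; target: NxN target matrix T.
        Returns (source->input port, destination->output port, approximated matrix)."""
        n = len(target)
        approx = [[0.0] * n for _ in range(n)]
        src_port, dst_port = {}, {}
        for (s, d), rho in sorted(loads.items(), key=lambda kv: -kv[1]):   # heaviest first
            # candidate ports: free choice unless the address is already bound
            rows = [src_port[s]] if s in src_port else range(n)
            cols = [dst_port[d]] if d in dst_port else range(n)
            # globally largest residual (target - approx) among the allowed pairs
            i, j = max(((r, c) for r in rows for c in cols),
                       key=lambda rc: target[rc[0]][rc[1]] - approx[rc[0]][rc[1]])
            src_port.setdefault(s, i)
            dst_port.setdefault(d, j)
            approx[i][j] += rho
        return src_port, dst_port, approx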
5.3 Performance study
5.3.1 Measurement setup
In order to collect traffic traces, we observed the data flow on the Internet access link of our institution, i.e., we focused on the data flow between the edge router of our campus LAN and the access router of GARR/B-TEN, the Italian and European research network. Since our university hosts mainly act as clients, we recorded only the traffic flows originated by external servers and reaching internal clients (i.e., the direction highlighted in Figure 5.1).
The trace has been sampled during a busy period of six hours, collecting 28 million packets and 42,400 IP flows. The time window has been chosen such that the overall traffic is tested to be stationary in both the first and second order statistics. The property of real traffic that we mainly take into account is the long range dependence, which is well known to be responsible for buffer performance degradation. It is not the topic of this chapter to provide a statistical analysis of the traffic measured at our institution's router, but it is relevant to highlight that the measured traffic exhibits LRD [87] properties from the scale of hundreds of milliseconds up to the entire length of the data trace, with a Hurst parameter in the range of 0.7-0.8 [88].
5.3.2 The switching architectures under study
An IP switch/router is a very complex system [89], composed of several functional blocks: here we focus our attention only on the performance of the switching system. We consider a simple model of the switching architecture, based on the one described in [90].
[Figure 5.5: Mean packet delay under PT and P3 scenarios for cell mode policies; mean delay (time slots, log scale from 10 to 100,000) vs. normalized load (0 to 0.9) for OQ, MWM and iSLIP]
The incoming, variable size, IP packets are chopped into fixed size cells which are sent to the internal switch, where they are transferred to the output port and then reassembled into the original packets before being sent across the output link. The internal switch, which operates in a time slotted fashion, can be input queued (IQ), output queued (OQ) or a combined solution, depending on the bandwidth available inside the switching fabric.
IQ switches are usually considered to scale better than OQ switches with the line speed, and for this reason they are considered in practical implementations of high speed switches. Input queues are organized into the well known virtual output queue (VOQ) structure, necessary to maximize the throughput. One disadvantage of IQ switches is that they require a scheduling algorithm to coordinate the transfer of the packets across the switching fabric; the performance of an IQ switch, in terms of delays and throughput, is very sensitive to the adopted scheduling algorithm and depends also on the traffic matrix considered. Scheduling algorithms can work either in cell mode or in packet mode [90]. In cell mode, cells are transferred individually. In packet mode, cells belonging to the same IP packet are transferred as a train of cells, in subsequent time slots; hence, the scheduling decision is correlated to the packet size.
In the past, the performance of several scheduling algorithms has been compared [90, 91] under Bernoulli or correlated on/off traffic. Here we compare the performance of the maximum weight matching (MWM) [92] and iSLIP [93] scheduling algorithms under different traffic models. We selected MWM as an example of a theoretically optimal algorithm which is too complex to be implemented, whereas iSLIP was chosen as an example of a practical implementation with suboptimal performance.
We consider an 8 x 8 switch, with an internal cell format of 64 bytes. In the IQ switch, buffers are set equal to 5,000 cells per VOQ, i.e., about 320 KBytes per VOQ and about 2.5 MBytes per input port, which is a reasonable amount of high-speed memory available today in an input card. In the OQ switch, buffers are set equal to 40,000 cells (= 5,000 x 8) to compare fairly with the IQ switch.
5.3.3 Traffic scenarios
For the sake of space, we present our results only for uniform traffic, i.e., Tij = ρ/N, where ρ (between 0.1 and 0.9) is the average input load, normalized to the link speed. We consider two scenarios, depending on the process of packet generation.
[Figure 5.6: Mean packet delay under PT and P3 scenarios for packet mode policies; mean delay (time slots, log scale from 10 to 100,000) vs. normalized load (0 to 0.9) for OQ, MWM and iSLIP]
- Packet trace (PT). Packets are generated according to the trace, following the methodology of traffic synthesis presented in Section 5.2.2.
- Packet trimodal (P3). Packet generation is modulated by an on/off process satisfying the traffic matrix T. Packet lengths are generated according to a trimodal distribution, which approximates the distribution observed in our trace (a toy sampler of such a distribution is sketched right after this list). This can be considered a good traditional synthetic model, tuned according to the features of the real trace.
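As a hedged illustration of the P3 packet-length model, the sketch below draws lengths from a trimodal distribution; the three modes (40, 576 and 1500 bytes) and their probabilities are common textbook values assumed here purely for the example, not the parameters fitted on our trace.

    import random

    # assumed modes and probabilities, for illustration only
    TRIMODAL = ((40, 0.5), (576, 0.2), (1500, 0.3))

    def trimodal_length(rng=random):
        """Draw one packet length (bytes) from the assumed trimodal distribution."""
        u, acc = rng.random(), 0.0
        for length, prob in TRIMODAL:
            acc += prob
            if u < acc:
                return length
        return TRIMODAL[-1][0]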
5.3.4 Simulation results
Figs. 5.5 and 5.6 plot the average delay as a function of the normalized load for three configurations: OQ switch, IQ switch with MWM scheduler, and IQ switch with iSLIP scheduler. The first graph refers to cell mode (CM) schedulers for IQ, the second to packet mode (PM) schedulers for IQ. In all cases, the delays experienced under the Packet Trace model are much larger than in the case of trimodal traffic. This holds true not only at high load: even at low load the LRD traffic causes much higher delays. Note also that the performance of CM and PM is almost the same in both scenarios; this is reasonable, since the estimated coefficient of variation of the packet length distribution is 1.05 and, according to the approximate model in [90], CM and PM should behave the same.
Fig. 5.7 shows the throughput achieved in both scenarios considering CM policies (PM behaves the same). Because of the LRD property of the input traffic, the queue occupation under PT is much larger than under P3, causing a higher loss probability and a reduced throughput.
The most surprising result is that the relative behavior of the three schedulers changes from P3 to PT. Indeed, OQ, IQ-MWM and IQ-iSLIP give almost the same performance with the traditional traffic model, while under the Packet Trace model a degradation of the throughput curves (up to a 10% reduction) and an increase in delays are present. Moreover, IQ-MWM behaves the worst considering the delay metric, which depends mainly on a queue-length-based metric [90]; iSLIP on the contrary shows shorter delays than OQ: one partial explanation of this is that, at high loads, iSLIP experiences larger losses than OQ under PT (larger than IQ-MWM, for instance), as Fig. 5.7 shows.
[Figure 5.7: Throughput under PT and P3 scenarios for cell mode policies; throughput per port (0 to 1) vs. normalized load (0 to 0.9) for OQ, MWM and iSLIP]
Finally, the most important fact is that OQ is penalized in terms of average delay: the shared buffer at the output queue allows much longer queues to build up, therefore degrading the delay performance because of the Weibull tail [94]. These results underline that the traffic models traditionally adopted to assess switching performance are not capable of showing real-world figures.
5.4 Conclusion
This work proposed a novel and flexible methodology to synthesize realistic traffic traces to evaluate the performance of switches and, in general, of controlled queuing networks. Packets are generated from a single packet trace, from which different synthetic traffic traces are obtained fulfilling a desired scenario, e.g., a traffic matrix. Additional constraints are imposed to maintain the original traffic characteristics, mimicking the behavior imposed by Internet routing.
We compared the performance of a switch adopting different queuing and scheduling strategies under two scenarios: the synthetic traffic of our methodology and traditional traffic models. We observed that not only the absolute values of throughput and delay can change considerably from one scenario to the other, but also their relative behaviors. This fact highlights the importance of some design aspects (e.g., buffer management) which are traditionally treated separately. These results show new behavioral aspects of queuing and scheduling in switches, which require more insight in the future.
Chapter 6
Data Inspection: the Analysis of Nonsense
DESPITE the availability of a rather large number of scientific software tools, to the best of the author's knowledge they all fail, unfortunately, in one scenario: when the data cannot be fitted into the workstation's random access memory.
This chapter introduces DiaNa, a novel software tool primarily designed to process huge amounts of data in an efficient, spreadsheet-like, batch fashion. One of the primary design goals was to offer extreme flexibility from the user perspective: as a consequence, DiaNa is written in Perl. The DiaNa syntax is a very small and orthogonal superset of the underlying Perl syntax, which allows, e.g., comfortably addressing file tokens and profitably using file formats throughout the data processing.
The DiaNa software also includes an interactive Perl/Tk graphical user interface, layered on top of several batch shell scripts. While we shall only briefly review the former, we will focus on the latter, thoroughly describing the tools' architecture in order to better explain the offered possibilities. Besides, the achievable performance will be extensively inspected; we finally present some examples of the results gathered through the use of the presented tool in a networking context.
6.1 Introduction and Background
The vast landscape of software is characterized by a massive redundancy, of which the number of existing programming languages, from general-purpose to very specialized ones, is a manifest example. The choice of the "right" tool for a given task is not univocal, given the number of existing tradeoffs and interrelationships among the different possible comparison criteria.
Nevertheless, despite the wide variety of already available software, there are cases where the need for programs such as the Data Inspector aka Nonsense Analyzer (DiaNa) arises. Specifically, to the best of the author's knowledge, all the existing publicly available solutions fail to perform when the amount of data to be processed is much greater than, and cannot therefore be fitted into, the workstation's random access memory.
The DiaNa software has been designed by the Networking Group (TNG) at the Politecnico di Torino during research works dealing with massive amounts of real traffic measurements. Obviously many researchers before have dealt with similar networking problems, and a large amount of software has been written to facilitate their analysis; however, in the author's opinion, the lack of generality of these programs entailed the need for a significant effort in order to do anything
out of the ordinary. In the past years we therefore developed, extensively tested and profitably used the tool, which is made freely available [95] to the research community; the package is constituted by a set of Perl [96] scripts and a Perl/Tk Graphical User Interface (GUI). In the following, we will refer to the shell tools as d.tools and to both the GUI and the entire package as DiaNa, covering their essential features.
It must be pointed out that the presented framework does not intend to be a replacement for any existing analysis tool, and a key point of its design is to allow extreme interoperability with other existing tools. Also, although the software has been designed for rather lengthy batch computations on huge volumes of data, its use is nevertheless not restricted to that scenario, as the featured GUI can profitably be used in an interactive fashion. DiaNa and the d.tools can automate tedious, repetitive and otherwise error-prone activities, and can assist the user in activities ranging from data collection and preparation to the typesetting and dissemination of the final results.
More specifically, the software is able to parse text files holding information structurally representable as matrices, performing spreadsheet-like operations on the data. However, even within a single application domain the range of tasks that have to be performed over the same data set can change dramatically: therefore, the flexibility to perform the widest possible range of tasks has been one of the most critical design features. The choice of Perl as the base language is extremely helpful in this direction, allowing a natural cooperation with other tools, besides the use of a wide number of available libraries and the on-the-fly parsing of user-defined Perl code, as we will detail in the following. Therefore the d.tools, besides offering a small number of specialized tasks, have to be intended as a customizable framework with a number of built-in functionalities especially suited to process a certain kind of input data.
The rest of this chapter is organized as follows: Section 6.2 contains a description of the architecture and syntax; a benchmarking analysis is proposed in Section 6.3, and finally some examples of use in a networking context are presented in Section 6.4.
6.2 Architecture Overview
In this section, starting from a wide-angle perspective, we outline the interaction and the interdependence of the various pieces composing the DiaNa software. As will become clear later, there are basically three different ways of interacting with the framework: namely, i) the interactive GUI approach, ii) the shell-oriented usage and iii) the API level, ordered by increasing level of complexity.
The relationship among the DiaNa GUI, the d.tools and Perl is sketched in Figure 6.1. From a top-down point of view, the graphical Perl/Tk GUI acts as a centralized information collector and coordinator of the lower-level shell tools; each tool performs an atomic task on the input data, offering the possibility of defining and exploiting custom file formats, upon which useful algorithmic expressions may be built.
More in detail, the GUI offers an Integrated Development Environment (IDE) to manage the DiaNa syntactical items; furthermore, it automates the interaction among the tools, covering a wide range of tasks, such as:
- discriminating and splitting the input according to a given condition;
[Figure 6.1: DiaNa Framework Conceptual Layering; the DiaNa GUI (extendible via plugins/templates, with a syntactical items manager, a Gnuplot frontend and centralized coordination) sits on top of the d.tools, which perform specialized low-level text file processing; both rely on the DiaNa syntax items (expressions, formats, ranges), themselves a superset of the Perl syntax and shell]
- performing arbitrary numerical and textual operations, in a serial or parallel fashion, on the input data;
- ranking the input data in a database-like manner;
- evaluating, in an on-line fashion, the first and second order statistics, possibly subsampling the data;
- computing empirical probability distributions, possibly conditioned.
The DiaNa items can basically be divided into the following categories: formats define columnar field labels and comments, allowing to easily identify and describe the different file tokens; expressions allow using the previously defined format labels and tokens for, e.g., spreadsheet-like computations (as discussed in more detail later, the expression syntax represents a small and orthogonal superset of the already full-blown Perl one); ranges represent a convenient way to define partitions of the real number field R. In the following sections we dissect, adopting different perspectives, the architecture common to the d.tools and, devoting special attention to their syntax, we describe how the items are defined and interact together.
First of all, Section 6.2.1 motivates the choice of the base language. Although the DiaNa syntax provides a fairly small increment to the Perl one, it would nevertheless be cumbersome to provide here its exhaustive reference material. Therefore, rather than covering all the gory details of the added items (i.e., formats, ranges and expressions), we will just give, respectively in Section 6.2.2 and Section 6.2.3, an adequate description of their use by illustrating the interaction between formats and expressions. After detailing the common core architecture in Section 6.2.4, we consider the most basic tool, d.loop, describing its usage in Section 6.2.5, while we quickly review the main GUI features in Section 6.2.6.
6.2.1 In the Beginning Was Perl
Although languages are interesting and intrinsically worthy of study, their real purpose is as tools in problem solving: therefore, we focus here on the practical reasons behind the choice of Perl as DiaNa's parent framework. Nevertheless, we may say that the philosophy of the language designer shapes the language, its libraries, its ecological niche, and its community: if the language is successful, it will be used in domains that the original designer may not have intended, as well as being often extended in ways the original designer may not have intended. This happened with many languages, but it is especially true for Perl: a real culture grew up around the language, to such an extent that there exists even Perl poetry, as well as book chapters on Perl poetry [98].
Technically, there are two main manifest disadvantages in this choice, the first being the a priori limit on the performance achievable with respect to, e.g., a specialized C program; however, should performance become a serious problem, it shall not be forgotten that it is possible to use external C code from Perl¹. A more serious drawback, as admitted by the authors themselves in [99], is that "There's no correct way to write Perl. A Perl script is correct if it's halfway readable and gets the job done before your boss fires you.". Although this is stated in an amusing and winning fashion, the truth is that the Perl syntax normally allows producing really illegible code, even outside the context of the Obfuscated Perl Contest [104], which clearly has a cost in terms of readability and maintainability.
On the other hand, Perl dramatically reduces the writability cost, and its guiding design principle "There's More Than One Way to Do It" translates directly into unbeatable flexibility and extensibility. The latter are indispensable properties since, even within an application domain, the requirements of two distinct projects may vary widely. Concerning this, Perl has the singular advantage of a huge collection of well-organized modules, publicly distributed worldwide on the CPAN [96] network of Internet sites; these obviously include an incredible amount of scientific data-manipulation tools, such as the Perl Data Language (PDL) [102]. Moreover, Perl offers a full suite of modern features beyond the numerical and graphics ones (e.g., from database access to network connectivity and Object-Oriented programming), as well as a wide range of ways to communicate with other tools, e.g., through the mechanisms of i) shell piping or ii) embedding of other applications. The former approach, though powerful, is a limited form of "one-way" communication. The latter, more advanced form of communication is also supported, as exemplified in [106]; its main disadvantage is that the implementation is often platform dependent.
Finally, we believe that implementation in a true general-purpose, full-featured language gives access to a wealth of useful features, and Perl filled all these constraints most admirably, whereas specialist systems, whether commercial or free, are hampered in their access to these features by their proprietary nature and specialist syntax. Besides, it is undoubtedly easier to add features to a robust and complete language than to go in the opposite direction. Therefore, we may say that whoever already uses Perl for day-to-day programming tasks will find the DiaNa extension extremely productive. Conversely, the advanced use of DiaNa can be achieved through an expressive knowledge of Perl, and therefore the DiaNa learning cost is absorbed by the large number of other possible uses of Perl.
¹ XS allows interfacing Perl with C code or a C library, creating either dynamically loaded or statically linked Perl libraries; it must be said that external C code is also supported in an inline fashion through the Inline Perl module.
6.2.2 Input Files and Formats
The d.tools matrix abstraction of a generic input data file is sketched in Figure 6.2. Input files may contain comments, that is, lines beginning with a configurable prefix (by default a # sign). Comment rows are usually discarded, but can optionally be processed; moreover, special information can be included in the file header, defined as the first block of comment lines at the beginning of the input file. Different rows are separated by a configurable Record Separator (RS), a newline by default, whereas an Input Field Separator (IFS), which can be a character or a regular expression [103], is responsible for the column partitioning.
[Figure 6.2: Input Data from the DiaNa Perspective; a text file seen as a matrix, with rows delimited by the Record Separator (RS) and columns by the Input Field Separator (IFS); the buffered rows fade toward the past]
As mentioned earlier, the assumption behind the DiaNa development is that not all the data to be processed can be loaded at once into the workstation memory. Therefore, the adopted strategy is to perform on-line processing over a window of data rows, doubly limiting the information available at any point in time in both the "future" and the "past" with respect to the "present" point. That is, as soon as data rows have been read from the input, they become available for processing and are possibly buffered; all the columns of the buffered rows represent the sub-portion of the input available to the running algorithm. This crucial point is emphasized in Figure 6.2, where the buffered rows are represented with colors fading toward the past.
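The row-window strategy can be rendered in a few lines; DiaNa itself is written in Perl, so the Python below is only a language-neutral sketch of the buffering idea, with hypothetical names.

    from collections import deque

    def process_stream(path, handler, memory=3, ifs=None, comment_prefix="#"):
        """Read `path` line by line, keep only the last `memory` parsed rows and call
        handler(window) on each new row; window[-1] is the current row."""
        window = deque(maxlen=memory)           # bounded buffer: constant memory usage
        with open(path) as stream:
            for line in stream:
                if line.startswith(comment_prefix):
                    continue                    # comment rows are skipped by default
                window.append(line.split(ifs))  # IFS-driven column partitioning
                handler(window)

    # example: print the increment of column 2 between consecutive rows
    # process_stream("trace.txt",
    #                lambda w: print(float(w[-1][1]) - float(w[-2][1])) if len(w) > 1 else None)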
Columnar fields play a privileged role over file rows: therefore, a format allows a (label, comment) pair to be attributed to each file column. Each format definition is usually stored in an external file, where each row contains the label and comment string pair separated by a tabulation character. However, columnar labels may also be stored directly in the input file header, which is intended as a quick-and-dirty alternative to the proper definition of a format; indeed, although very useful, this option is nevertheless discouraged, since it is more error prone, less expressive and less maintainable than the centralized definition of a stable format.
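As an illustration, a format file of this kind could contain rows such as the following, one (label, comment) pair per line separated by a tab; the labels and comments below are purely illustrative, loosely modeled on the Tstat labels used in Section 6.4, and do not reproduce the actual shipped format:

    S.port    server-side TCP port
    S.data    payload bytes sent by the server
    C.rst     RST segment observed from the client
    S.fin     FIN segment observed from the server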
6.2.3 Formats and Expressions Interaction
DiaNa expressions are the glue that ties input files to formats, allowing the file tokens to be quickly referenced as matrix cells in algebraic idioms or algorithmic chunks of code.
As will be outlined later, input files can be read either serially or in parallel; therefore, expressions are interpreted in a serial or parallel fashion according to the input context, which is
Token     Meaning
#j        the j-th columnar field of the current row
#i:j      the j-th columnar field of the abs(i)-th previously read row
#lab      the columnar field, whose label is lab, of the current row
#i:lab    the columnar field, whose label is lab, of the abs(i)-th previously read row

Table 6.1: Serial Expression Tokens
usually uniquely set by each tool. Let us focus first on the former kind, i.e., on serial expressions: although the syntax of parallel expressions is formally identical to the serial one, their interpretation is radically different.
Serial Expressions
As mentioned earlier, the DiaNa syntax consists of a small and orthogonal superset of the Perl syntax: more precisely, the novelty is represented by the introduction of a few tokens. Tokens are strings beginning with the pound sign #, which is interpreted by Perl as a comment delimiter character: that way, DiaNa tokens do not clash with the underlying Perl syntax. Tokens allow different columnar fields to be easily addressed among the several buffered rows within the current file “window”: i.e., they embed a mechanism to profit from this sort of memory. Moreover, since formats define a 1-to-1 correspondence between file columns and labels, the latter can be used in expression tokens.
Serial tokens mainly have the form reported in Table 6.1; however, the list bears additional discussion, and there are a number of special convenient expansions. First of all, it should be noted that DiaNa tokens more closely resemble Awk’s tokens (i.e., $1,$2,. . . ,$NF) than Perl’s (i.e., $_[0],$_[1],. . . ,$_[-1]). Secondly, the two notations #i:j and #-i:j being perfectly equivalent, the latter may be used for further clarity. Besides, it is possible to reference fields starting from the end of the line through tokens of the form #-j: #-1 represents the last field, #-2 the second to last, and so on; however, this feature is valid on the current row only, i.e., it cannot be used in conjunction with memory.
Other useful expansions are supported: for example, #i is expanded to the i-th column when i matches the regular expression /^\d+$/, i.e., when it consists only of digits. Also, #. expands to the line number of the current input files seen as one unique stream: this differs from the Perl internal variable $., since the latter counter is restarted on any new input file.
Several tokens can then be combined together through the powerful Perl syntax, including any possible algebraic, textual or subroutine-based manipulation. Indeed, all the tokens are parsed once at startup, in order to speed up their evaluation during processing, so that the DiaNa expression becomes a plain Perl expression. This design choice follows from the pursuit of the maximum possible flexibility, in order for the application to be as general-purpose as possible, suited to a range of contexts as well as to different problem classes within the same domain.
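For instance, assuming a plain two-column numeric input file (the file name and field meanings are hypothetical), the following d.loop one-liner combines the tokens and the OUT() routine described in this chapter to print the ratio between the second field of the current row and its value on the previously read row; the leading #-1:2 && guard simply skips the first row, for which no “past” value has been buffered yet:

    d.loop '#-1:2 && OUT(sprintf("%g\n", #2 / #-1:2))' samples.dat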
Parallel Expressions
Parallel expression tokens, though formally identical to their serial counterparts, have a totally different interpretation: #i:j identifies the j-th column of the i-th file (where both column and file numbering starts from 1), whereas #i expands to #1:i , . . . , #N:i (where N is the number of input files).
Figure 6.3: Parallel Expressions and Default Expansion
Readers familiar with UNIX-like environments will immediately appreciate the ability to easily write a tool orthogonal to paste, as illustrated in Figure 6.3. Indeed, suppose we work in parallel on three files a, b, c; assume further that each file simply has two columns, containing respectively lower- and upper-case characters. Because of the default token expansion, any expression concatenating the tokens #1 #2 will naturally interleave the files’ columns; conversely, an output like that of paste can be achieved only by explicit reference, as in #1:1 #2:1 #3:1 #1:2 #2:2 #3:2. It is not hard to see that parallel expressions can be tremendously useful for direct comparison: in the simplest form, each token can be algebraically compared (e.g., through differences, ratios, averages) to its counterparts, stored in the same columnar positions of the other files.
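To make the default expansion concrete, the following lines spell out, purely as an illustration of the rules stated above (not as actual tool output), how tokens are resolved in a parallel context over the three two-column files of Figure 6.3:

    #2      expands to   #1:2 #2:2 #3:2    (the second column of every input file)
    #1:2    addresses    the second column of the first file only

so that the default expansion "#1 #2" interleaves the files’ columns (a b c A B C), whereas the fully explicit form "#1:1 #2:1 #3:1 #1:2 #2:2 #3:2" reproduces the row-wise layout of paste (a A b B c C).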
Ranges

For completeness, we overview the syntax used to partition a real interval into different sub-intervals (or bins). To uniformly partition a range, one can specify either the bin size or the number of bins; alternatively, to take into account just the magnitude of the variable, one can specify a logarithmic bin size through the LOG keyword (future releases plan to support an arbitrary logarithm base); finally, arbitrary ranges can be specified as explicit lists of bin edges.
6.2.4 The d.tools Core
The d.tools are a set of tools, each designed to perform a well-defined task, developed around a common core, sketched in Figure 6.4. The framework provides a set of useful built-in functionalities, which we review in the current section, to simplify the implementation of new and currently unsupported tools. At the leftmost side of the scheme, notice that tools may work in either a parallel or a serial fashion over the input files: in the former case, several buffered rows for each separate file will be available at the same time; in the latter, the files will be concatenated into a unique stream. Each tool will usually operate in a single context, since the syntactical expressions,
although formally identical, have a totally different interpretation. The core transparently handles the most common archiving and compression schemes (i.e., tar, gzip, bz2 as well as their combinations): the needed decompression/dearchiving filters are automatically pipelined at the input based on the file extension. Comment lines, discarded by default, can be processed if needed.
At tool bootstrap, the engine loads any required Perl module and possibly evaluates custom startup code on the fly, in the form of external scripts or inline code. The core module also tries to load the format corresponding to the input files, again based on the file extension (note that format recognition is compatible and coordinated with compression recognition); if the file format has been defined (either globally or locally in the input file header), textual labels can be used in tokens. Expressions are parsed once, and the actual memory requirements are automatically determined in both the vertical depth (i.e., how many input rows have to be buffered) and the horizontal extension (i.e., which fields have to be buffered), so that the DiaNa memory can be represented as a sparse matrix. Besides, memory is set up only for the fields that are really required: this, rather than for memory-economy purposes, results in better performance through partial optimization of the memory update loop, as we shall discuss in more detail in Section 6.3.7. Besides, the configurable uniform or Montecarlo sub-sampling routine is possibly set up; among the other startup tasks, if verbose mode has been requested, the core sets up the progress meter heuristic (provided that either the /dev/stderr or the /dev/stdout channel, in that order, is unused).
Figure 6.4: Architecture of a Generic d.tools
At each step in the loop over the input rows, a sampling decision is taken, and data are either discarded or read and possibly buffered, thus becoming available for online processing; the progress meter, if used, is refreshed in one-percent increments. The specific action performed clearly depends on the tool purpose: for example, the simplest tool (d.loop) performs no action of its own but rather offers the possibility of executing user-specific code, possibly defined directly on the command line. Rather than detailing the possibilities offered by the already existing tools, it is important to stress that user code can easily be plugged in at this point of the process, using the DiaNa engine as a flexible and smart input-output framework which relieves the user of a wide series of tasks, allowing him to concentrate on the purely algorithmic part. Interaction with DiaNa is possible either through the existing tools, by on-the-fly evaluation of user-defined code, or, with an orthogonal approach, by embedding the framework into the
custom application as a plain Perl module. Finally, the rightmost side of the synopsis shows that up to ten output streams (optionally compressed through the same mechanisms accepted at the input) can be used simultaneously, each of which can be fed into a different custom shell pipe; this is clearly useful to parallelize algorithmic work on the same input data, sharing and therefore amortizing an intensive IO workload over multiple processing tasks.
The data processing can benefit from Perl’s natural support for a wide variety of structures, such as hashes and arrays, which can grow on demand without requiring explicit memory management. While we will provide some practical examples in Section 6.4, let us anticipate that nested combinations of the above structures are allowed without restriction – which can be very useful i) to post-process the parsed data, ii) for processes layered over several stages, or finally iii) for a common pre-processing phase. Besides, it should be said that complex data structures can naturally be stored and re-loaded: the internal Perl structures can be dumped, through standard Perl modules, into a text script that may simply be sourced by the tools on demand.
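A minimal self-contained sketch of this dump-and-reload pattern (data and file names are invented for illustration; DiaNa automates most of these steps through its output channels) is the following:

    use strict;
    use Data::Dumper;

    # A hash of arrays, grown on demand: flow sizes observed per server address.
    my %flows;
    push @{ $flows{'10.0.0.1'} }, 1460, 5_000_000;
    push @{ $flows{'10.0.0.2'} }, 512;

    # Dump the whole nested structure into a text script ...
    open my $out, '>', 'flows.pl' or die "flows.pl: $!";
    print {$out} Data::Dumper->Dump([ \%flows ], ['flows']);
    close $out;

    # ... which can later be re-loaded simply by sourcing it:
    #   my $flows = do 'flows.pl';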
6.2.5 The d.tools Flexibility
d.loop \
[−cvVzZfPW] \
[−d lib1,lib2,..libN ] \
[−B code ] [−C code ] [−E code ] \
[−? ’key1=>val1,...keyN=>valN’] \
[−! private ] \
[−H hdrchar ] \
[−I IFS] [−O OFS] \
[−F format ] \
[−S sampling ]\
[−0 out0 .. −9 out9] \
[(−e expression | ’expr’)] \
in0 .. inN
Figure 6.5: The d.loop Synoptic
In this section we quickly describe how the core functionalities described so far can be accessed at the shell level, avoiding as far as possible turning this into a reference manual, whose scope would be out of context. This is done with the intent of showing, among the three aforementioned kinds of interaction, the one representing the best tradeoff between complexity of the approach and offered flexibility.
As we previously mentioned, d.loop is a basic template for executing user-supplied code, and thus for building many useful tools that have access to all the functionalities provided by the DiaNa core module. This tool also serves as a reference for the subset of common options, whose synopsis is presented in Figure 6.5.
d.loop Specific Options
There is only one option (specifically, -C) among those listed in Figure 6.5 that is specific to d.loop: it allows specifying the Perl code that will be executed on the fly (by default, the d.loop action is to translate the IFS into the OFS).
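For instance, assuming the -I and -O switches behave as just described (file names are hypothetical), the default IFS-to-OFS translation alone suffices to rewrite a comma-separated file as a whitespace-separated one:

    d.loop -I ',' -O ' ' measures.csv > measures.txt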
Output Control
DiaNa features a redirection mechanism which we will call explicit, to distinguish it from the default implicit shell redirection syntax. The explicit redirection currently offers up to ten simultaneous output channels (addressable through -0 out0 . . . -9 out9), but future releases plan to raise the limit up to the operating system capabilities (addressable through any numerical-only extended option, starting from --10). However, the channel -0 should be considered reserved, as it is currently used for debugging (dumping a copy of the Perl script created on the fly that is going to be executed).
Unless otherwise specified, the first two channels are directed to /dev/stdout and /dev/stderr respectively, whereas the remaining channels default to /dev/null. The colon character :, when used as the parameter of a redirection option, expands to the null device /dev/null (this can be useful to discard part of the output in some processing, such as the part not matching the condition specified to d.match). By default, existing output files are protected from being overwritten, unless explicitly specified otherwise (through -f).
Custom per-channel piping is supported through the Perl open syntax (e.g., -3 '| command > file' will pipe the output to command before redirecting it to file), where clearly there is no restriction on the number of piped commands. Output compression can be requested through the appropriate option (specifically, -z to use gzip and -Z for bzip2); the core module will handle the specified custom output pipes, and the compressed output file extensions .gz, .bz2 will be added if necessary.
Channels are directly available in an expression through the OUT() subroutine (e.g., OUT($n,"string") will put the string in the $n-th output file, and OUT("string") in the first output channel) or by direct access to the corresponding FileHandle (e.g., $OUT[$n]->print("string")).
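As a purely illustrative combination of these options (channel numbering, compression behavior and file names are assumptions based on the description above, not an excerpt from the DiaNa manual), one could split rows between two channels, compressing the first and post-processing the second through a custom shell pipe:

    d.loop -z -1 large_flows -2 '| sort -n > small_flows.txt' \
           '(#2 > 100_000) ? OUT(1,$_) : OUT(2,$_)' flows.log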
Input Control
Input data can either be piped or specified as file names on the command line; in the latter case, the core transparently handles the most common compression/archiving formats (i.e., gzip, bzip2, tar and their combinations). The comment character defaults to # (but can be specified through -H), and the comment lines, which are skipped by default, can optionally be processed (the -P flag). The input field separator regular expression can be configured (through -I) to a value other than the default /\s+/ (i.e., multiple spaces are treated as a single separator).
Format recognition enables token resolution, allowing the expanded alphanumerical labels to be used in expressions. Though the format can be specified in several ways, here we consider for brevity the most common behavior (i.e., the colon expansion through -F:). In this case the core will look for a format name matching the file extension (taking care of compressed extensions); if a valid format definition cannot be found, the core will try to load the format from the last header line of the files being processed; otherwise, alphanumerical label expansion will be turned off. Besides, it should be said that the search for the format definition file can be affected in several ways, the simplest of which is either the use of the DIANADIR environment variable or the use of local format definitions (i.e., in the current working directory).
Finally, the input subsampling, accepting the comma-separated tokens listed in Table 6.2, can be tuned through the -S option; the colon character : expands to the environment variable DIANASAMPLE or to the hard-coded defaults.
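For example, under the assumption that the sampling tokens of Table 6.2 combine as described (the file name, the label and the 5% figure are illustrative), a gzip-compressed log could be scanned with automatic format resolution and uniform sub-sampling as follows:

    d.loop -F: -S u,5 'OUT(#S.data."\n")' flows.tstat.gz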
Code Control
There are several points at which custom code can be “injected” into a DiaNa tool; for example, different kinds of Perl code can be loaded at startup through the -d switch, and basic code checking is also performed. Note that it is also possible to load with the same option both Perl modules and code fragments, such as subroutine definitions as well as stored variables (e.g., -d My::Module,myscript.pl,myv).
Pre- and post-processing code, similar to Awk’s (but not Perl’s) BEGIN and END blocks, can be specified through -B and -E respectively; remarkably, the loaded code is available at any point of the processing. Although this may be misleading, these special code blocks are unrelated to the homonymous Perl syntactical blocks: rather, “begin” and “end” are relative to the main loop over the data. As mentioned earlier, variables can easily be stored into external files for later use: the easiest way is to “dump” them to a specific channel in an end block (e.g., -E ’OUT($n,Dumper($var))’, the Data::Dumper module being loaded by default) and to redirect the channel to a file (e.g., -n myvar.pl).
Finally, the code that has to be executed at each step can either be specified on the command line (’expr’) or through a keyword (i.e., -e expr). In the latter case, loop code can be stored in a file, and the keyword is subject to an expansion mechanism similar to the format resolution mentioned earlier. Expressions are critical items, so strict syntax checking is always performed; furthermore, it is possible to compile only the expressions (-c), printing additional information about the DiaNa token expansion.
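Putting the pieces together, an illustrative invocation (assembled only from the switches described above; variable, channel and file names are invented) accumulates a per-trace total in a begin/end pair and dumps it, via Data::Dumper, to a channel redirected to a Perl file for later reuse:

    d.loop -B '$tot = 0' -E 'OUT(3, Dumper($tot))' -3 total.pl '$tot += #2' flows.log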
6.2.6 The DiaNa GUI
The DiaNa framework also features a Tk user interface, visually organized as a notebook, as shown in Figure 6.6. Following the paradigm that motivated the development of DiaNa, the GUI has been developed to allow for the greatest possible flexibility, while offering at the same time the ease and immediacy of use typical of graphical interfaces. Therefore, besides the point-and-click approach (useful to manage the expression and format items), the first notebook page is an evolved Perl Shell (featuring auto-completion, interactive history, . . . ). Following the same approach, the Gnuplot page is a gnuplot [105] front-end that couples a drag-and-drop approach with a template mechanism. Template authors need only define a couple of well-known functions and, accessing DiaNa internal data, can provide advanced visualization features; besides, the gnuplot code is always available for direct editing, and multiple picture formats are handled transparently to the user (based on the output file name). A plugin mechanism offers another comfortable entry point to customize the GUI, providing reusable building blocks for (possibly graphical and interactive) functionalities that are not implemented in the factory interface. For lack of space, for further details about the framework, its usage and its development, we refer the reader to the DiaNa
u, m, h, t    uniform, Montecarlo (default), head, tail sampling
p             use p percent of the file (default 10)
>min          use at least min lines of input
<max          use at most max lines of input

Table 6.2: Sampling Option Synopsis
Tutorial and man-pages, both included in the standard distribution; finally, a valid source of information is the roughly 16k lines of source code itself, more than half of which are comments.
Figure 6.6: The DiaNa Graphical User Interface
6.3 Performance Evaluation and Benchmarking
The choice of Perl as the base language of the DiaNa software allows for great flexibility at the expense of the achievable performance: in this section we quantitatively inspect this tradeoff, further gathering some relevant qualitative insight.
Techniques for computer performance analysis are divided into three broad classes: analytic modeling, simulation modeling and benchmarking; the latter technique, which is the object of the current section, is the measurement of performance through direct observation of the system of interest.
In this context, a benchmark is said to be representative of a user’s workload if it accurately predicts the performance of the user’s workload over a range of configurations. Clearly, a benchmark that measures general compute performance will not be very representative for a user with a workload that is intensively floating-point, memory-bandwidth or I/O oriented. Due to the difficulty of defining general, significant workloads representative of a broad class of tasks, which evidently depend on the nature of the problem to be solved, in the following we will try to isolate the impact of each single component affecting the performance of the proposed software.
6.3.1 Related Works
We must stress that our focus is neither to present a complete picture of the suitability of Perl as a scientific tool, nor to say the final word on Perl benchmarking, nor to contrast Perl with the performance achieved by other languages – since these topics have already been more thoroughly explored by a number of relevant studies.
For example, [106] explored in detail how Perl can be used to process data for statistics and statistical computing, either by using modules or by embedding specialized statistical applications, also investigating the numerical and statistical reliability of various Perl statistical modules. The thesis of that work is that Perl’s distinguishing features make it particularly well suited to perform labor-intensive (e.g., complex statistical computations involving matrix algebra and numerical optimization) and sophisticated tasks (ranging from the preparation of data to the writing of statistical reports).
The authors of [107] performed some basic experiments to see how fast various popular scripting languages (like Awk, Perl, and Tcl) and user-interface languages (like Tcl/Tk, Java, and Visual Basic) run on a spectrum of representative tasks. They observed enormous variation in performance, depending on many factors, some of which are uncontrollable or even unknowable. As they point out, the comparison turned out to be more challenging, with more difficulties and fewer clear-cut results, than expected. The results seem to indicate little hope of predicting performance in other than the most general way. In the authors’ own words: “if there is a single clear conclusion, it is that no benchmark result should ever be taken at face value”.
Finally, [108] took a rather different approach, carrying out a thorough empirical comparison of four scripting languages (Perl, Python, Rexx, and Tcl) and three conventional system programming languages (C, C++, Java). In a very large scale effort, about 80 realizations of the same requirement set, coded by different programmers, were compared with each other in terms of program length, programming effort, runtime efficiency, memory consumption, and reliability. Interestingly, it was found that programs written in conventional languages were typically two to three times longer than scripts, and that consequently productivity, in terms of lines of code per hour, was halved with respect to scripting languages. Programs written in a conventional language (excluding Java) consumed about half the memory of a typical script program and ran twice as fast; also, in terms of efficiency, Java was consistently outperformed by the scripting languages. Scripting languages were found to be, on the whole, more reliable than conventional languages.
6.3.2 The Benchmarking Setup
The DiaNa benchmarking setup, as schematized in Table 6.3, considered different hardware architectures (HW), as well as different operating systems (OS) and different flavors of the Perl
interpreter (SW).
The data reported in the table bear additional discussion; regarding the SW, it should be noted that we simply indicate by Perl the most common perl interpreter distribution (available at [96] as well as in almost any GNU+Linux distribution), whereas we denote by ActivePerl the ActiveState [109] solution. Besides, it should be noted that although the two interpreters share both the revision and version numbers, they differ in the subversion number.
However, let us recall, at the risk of being tedious, that our purpose is not to provide an exhaustive quantitative picture of DiaNa performance, nor to rank hardware architectures, operating systems
SL   HW   UltraSPARC IIe @ 0.5 GHz + 1 GB RAM
     OS   Solaris 5.8
     SW   Perl 5.8.0
LX   HW   Pentium IV @ 2.4 GHz + 1 GB RAM
     OS   Linux-2.4
     SW   Perl 5.8.0 + ActivePerl 5.8.4
WN   HW   Pentium IV @ 2.4 GHz + 1 GB RAM
     OS   Windows 9X + Cygwin 1.5.11
     SW   Perl 5.8.0 + ActivePerl 5.8.4

Table 6.3: Benchmarking Setup Architectures
or Perl interpreters; rather, our purpose is to gather meaningful qualitative insight into DiaNa performance, valid over a range of different architectures. Therefore, the performance achieved by each program instance is to be read on a relative rather than an absolute scale. This remark also justifies the significant difference between the SL and LX or WN clock speeds.
In the following, where the performance pictures are similar across the different systems and unless otherwise stated, we will report the performance results referring to LX (i.e., Perl 5.8.0 on a Linux 2.4 kernel running on an Intel Pentium IV platform clocked at 2.4 GHz and equipped with 1 GB RAM).
On all of the above systems, the performance metrics were measured using the GNU time utility. Besides, a number of standard techniques have been used to reduce caching effects, such as discarding the first measurement and using files long enough, roughly twice the workstation main memory size.
6.3.3 Perl IO
As a preliminary step, we report some results on a benchmark whose workload is purely input/output oriented. The setup is very simple: a very long file is transferred using different tools, and the transfer mode is either binary or textual, depending on the tool.
Specifically, we performed a textual-mode file copy with Perl, Awk and cat, and a binary-mode copy with cp, dd and a C program. Although our interest is mainly driven by textual-mode performance, we nevertheless tested the achievable performance using different sizes of the input and output buffers of both dd and the specialized C program. Figure 6.7 reports, with the exception of the dd output, the IO performance expressed as the CPU load required (on the x-axis) to achieve a given throughput (on the y-axis) on the Linux platform. In terms of throughput, cp achieves the best performance, although the least CPU load is generated by cat. The performance of the specialized C program can be tuned between these two different behaviors by varying the IO buffer size: small buffers allow for high throughput at the expense of higher CPU load (cp-like), while both the throughput and the CPU load decrease as the buffer size increases (cat-like).
Interestingly, under the Linux platform, Perl falls into the cp-like throughput class, whereas the throughput achieved by Awk belongs to the lower cat class; however, the Perl throughput performance comes at the cost of a higher load, roughly twice as much CPU as required by cp: this entails that less computational power will be available to sustain such a level of throughput performance.
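The commands involved are of the following kind (an illustrative reconstruction; file names and buffer sizes are arbitrary, and the specialized C program is not shown):

    /usr/bin/time perl -pe ''     big.dat > copy.dat   # textual-mode copy
    /usr/bin/time awk '{ print }' big.dat > copy.dat   # textual-mode copy
    /usr/bin/time cat             big.dat > copy.dat   # textual-mode copy
    /usr/bin/time cp              big.dat   copy.dat   # binary-mode copy
    /usr/bin/time dd if=big.dat of=copy.dat bs=4096    # binary-mode copy, tunable buffer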
Figure 6.7: Input/Output Performance on Linux-2.4
Figure 6.8: Linux vs. Solaris Normalized Throughput Performance
Let us now investigate whether these remarks hold for different systems, specifically the Linux and Solaris platforms, considering in Figure 6.8 the behavior of a subset of tools (namely cat, cp, Awk and Perl), again evidencing the tradeoff between CPU load and achievable throughput. In order to allow a qualitative comparison across systems, each tool’s throughput has been normalized over the average throughput achieved within each system; similarly, the normalized CPU load is to be read as rescaled over the average load within each system.
A first observation is that although the scaled cp performance is very similar on both systems, cat behaves in a totally orthogonal way. More precisely, under LX, cat requires less CPU but achieves a low throughput (lower than both cp and the average of the other tools); conversely, under SL, cat achieves the highest throughput among all tools, requiring even more CPU power than cp. A final interesting remark is that the performance of Awk and Perl appears far more similar under SL than under LX.
6.3.4 Perl Implicit Split Construct
The Perl interpreters offer a number of switches and flags to modify and finely tune their runtime behavior; while these options are designed with Perl programmers’ Laziness in mind (i.e., to quickly write powerful inline scripts), from a performance standpoint the use of these switches does not provide any significant improvement. Rather, the use of the implicit split (i.e., splitting the current line of text, loaded into the default scalar variable $_, on the basis of the IFS into several tokens stored in the default array @_) dramatically boosts performance, practically more than halving the execution times. Therefore the implicit construct, though deprecated since it clobbers subroutine arguments, is nevertheless used by DiaNa because of the huge performance benefit.
To better highlight this phenomenon, we measured the execution times of two Perl one-liners, respectively implementing an implicit (perl -ne "split") and an explicit (perl -ne "@_=split") split, on the LX and WN machines. We repeated the tests considering both ActivePerl and Perl and, as before, we normalized the results over the mean per-system execution time. Results are plotted in Figure 6.9: the first important observation is that the performance gap between the explicit and the implicit solution persists across systems and interpreters. Besides, looking at the WN machine it is evident that ActivePerl literally outperforms the Cygwin Perl interpreter: it is tempting to conjecture that ActivePerl is highly optimized for Windows platforms, although this statement would require further investigation.
Figure 6.9: Explicit split and Performance Loss
6.3.5 Perl Operators and Expressions
Perl’s rich set of operators, data types, and control constructs is not necessarily intuitive when it comes to speed and space optimization. As admitted by the Perl authors, many trade-offs were made during its design, and such decisions are hidden in the code internals.
A number of practical pieces of advice are given in [99], the de facto Perl reference, which features an entire section devoted to programming efficiency – where it is stated that, in general, the shorter and simpler the code is, the faster it runs, but there are exceptions. Some of the advice follows common programming practice: it is well known that accessing a scalar variable is experimentally faster than accessing an array; similarly, arrays and scalars are in turn faster than a hash table. Conversely, a counterintuitive statement is that maximizing the length of any non-optional literal string in regular expressions may result in a performance gain; indeed, longer patterns often match faster than shorter patterns because the optimizer performs a Boyer-Moore search, which benefits from longer strings.
Since all the operations contained in a DiaNa expression are ultimately performed by the Perl interpreter, it can be useful to investigate its per-operator performance: indeed, such knowledge may suggest “rephrasing” an expression, when possible, in a more convenient fashion.
An interesting plot of the unary, binary and ternary Perl operators (for the LX architecture only) is provided in Figure 6.10, depicting the relative time cost of each operator for both integer and floating point operations; for completeness, string operations are reported in the small inset plot of Figure 6.10. The cost is measured as the ratio of the time elapsed to perform the generic operation to the time elapsed to perform the sum (+) operation on the same data set. The top and bottom x-axes in Figure 6.10 refer to, respectively, the floating point and integer operations, whose operators are sorted in increasing cost order.
Figure 6.10: Normalized Cost for Floating Point, Integer and String Operations
First of all, it must be said that the absolute cost scale of the integer operations is 1.5 times smaller than that of the corresponding floating point operations (averaged over all the observations). Then, it should be noticed that the floating point cost scale is flattened with respect to its integer counterpart: the longest integer operation takes noticeably more time than the shortest one, whereas the spread is smaller in a floating point context. Besides, it should be said that Perl allows real operands to be combined with integer-only operators, such as the integer remainder %, since the interpreter performs a preliminary conversion of the operands – which partly explains the changes in the operator cost order between the integer and floating point cases.
As a side note, although it could be tempting to use the measured operator costs to predict, on a single platform, the cost of any arbitrary expression, this approach would nevertheless remain of limited interest (indeed, the relatively small difference between the longest and the shortest operation easily allows a coarse but satisfying upper bound to be estimated).
Finally, it should be noted that, in general, the “contracted” assignment form of one of the four arithmetic operations (i.e., +=,-=,*=,/=) is slightly faster than the corresponding operation itself (i.e., +,-,*,/): this suggests breaking long expressions into smaller chunks, especially if the intermediate values have to be reused at a later step. We also tested the effect of parenthesization on expression cost, finding that even with several levels of redundant and unnecessary parenthesization (i.e., using more than 20,000 parentheses, combined in different ways on a 100-element sum), the parenthesization overhead is negligible.
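Micro-benchmarks of this kind are easy to reproduce with the standard Benchmark module; the following minimal sketch (not taken from the DiaNa benchmark suite; names and durations are illustrative) compares the relative cost of scalar, array and hash access mentioned above:

    use strict;
    use Benchmark qw(cmpthese);

    my $s = 1;
    my @a = (1);
    my %h = (k => 1);

    # Run each snippet for at least 2 CPU seconds and print the relative rates.
    cmpthese(-2, {
        scalar => sub { my $x = $s    + 1 },
        array  => sub { my $x = $a[0] + 1 },
        hash   => sub { my $x = $h{k} + 1 },
    });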
6.3.6 DiaNa Startup Overhead
We now shift the focus from Perl to DiaNa, which introduces different kinds of overhead with respect to the parent language: module loading, file startup, progress bar startup, sampling expression parsing, and the memory functionality, to mention a few. However, since we expect DiaNa to work on large amounts of data, we can consider the startup overhead associated with the shell scripts small enough to be neglected. Nevertheless, it is interesting to notice that the startup time, although depending on the desired features, always remains bounded by a small constant, which makes the tools suitable also for interactive data processing. Indeed, it should be noted that much of the startup overhead is clearly negligible per se: e.g., the computations required to set up the correct parameters for deterministic, uniform or Monte-Carlo sampling are offset by the dramatic reduction of the file size; the parsing of DiaNa expressions into Perl syntax occurs once at startup, so that no matter how complex the expression is, it will affect only the evaluation performed by the Perl interpreter.
An exact count of the number of lines in a file can instead be a very time consuming task, especially since we expect to process large files; therefore, we decided to estimate the number of lines in a very simple manner. First we gather the actual, i.e., non-compressed, file size S expressed in bytes; then we compute the average line length L, also expressed in bytes, considering only a small subset of the file – which, by default, is constituted by the first hundred lines. The underlying assumption is that, although the values measured in the files can change significantly, the number of bytes needed to store them will be roughly the same (e.g., both 1 and 10^9 may be stored in a text file as 1e+00 and 1e+09, respectively). Finally, we consider the file to be S/L lines long, yielding a very quick estimate with a satisfying degree of approximation. For instance, the estimation of the number of lines of the whole data set, requiring one access for each of the 12 files, took 250 milliseconds on average for the non-compressed 1 GB set and 430 ms for the gzip-compressed 67 MB set.
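A minimal Perl sketch of this estimation heuristic (a hypothetical helper written for illustration, not the actual DiaNa code) could read:

    sub estimate_lines {
        my ($file) = @_;
        my $size = -s $file or return 0;              # non-compressed file size S, in bytes
        open my $fh, '<', $file or die "$file: $!";
        my ($bytes, $n) = (0, 0);
        while (<$fh>) {                               # sample only the first hundred lines
            $bytes += length;
            last if ++$n >= 100;
        }
        close $fh;
        return $n ? int($size / ($bytes / $n)) : 0;   # S divided by the average line length L
    }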
Figure 6.11: Modules Loading Performance
More interesting are the performance results for the module loading time, which measure the time required by perl to load external modules through the use pragma. We varied the number of loaded modules from a single module up to more than 200 modules included in the standard GNU distribution, whose sizes we measured in Lines of Code (LoC). We repeated the test varying the order of the modules. The benchmarks reported in Figure 6.11 consider the cases where the modules are loaded, based on their LoC length, in either ascending or descending order. The points in the figure represent, on the left y-axis, the load time expressed in seconds as a function of the number of loaded modules on the x-axis; the two lines on the right y-axis represent the cumulative distribution function of the LoC loaded as a function of the number of loaded modules. It can be gathered that the absolute time remains very small, i.e., about 1.5 seconds, even in the insane case where all the system’s modules are loaded. Considering a more reasonable and practical number of modules, the time required to load the 5 biggest modules is on average equal to 150 ms, whereas the time required to load the 5 smallest modules is about three times smaller; an interesting observation is that this difference vanishes as the number of biggest or smallest modules considered grows, the loading time being roughly 300 ms in both cases.
6.3.7 Fields Number and Memory Depth
To conclude the benchmarking analysis, we investigated the effect of the number of fields and of the memory depth; in order to avoid negatively biasing the test toward expressions with longer output, we simply evaluated the expression and then discarded the result value, so that the amount of IO workload does not depend on the expression under examination.
The considered expressions consist of mere sums of different fields, all of which belong to the same “past” point: in other words, each expression evaluates a sum of the form Σ_i #M:i, where both the number of fields and the memory depth M vary between 1 and 50.
Figure 6.12: Time Cost of Fields Number vs. Memory Depth
Recalling Figure 6.2, the number of considered fields defines the horizontal extension of the used memory, whereas the vertical dimension defines its extent (or depth) toward the past. When the vertical dimension is not null, DiaNa implicitly sets up a loop to update the fields that make use of the buffering feature. Furthermore, to simplify the updating process, DiaNa overdimensions the buffers to the maximum of the memory depth over all expression tokens, thus avoiding keeping and accessing per-field state. In other words, the buffer maintenance requires M memory cells to be updated for each of the F memory-active fields at each advance in the input file; therefore, the required per-step updates amount to a total of M·F.
The cost of the expressions, expressed in this benchmark as the seconds elapsed to process 100 MB on LX, is shown in Figure 6.12 as a function of both the number of fields (on the x-axis) and the memory depth (on the y-axis). The hyperbolic shape of the contour lines, shown for execution times that are multiples of 50 seconds, confirms our expectations; indeed, the nesting of the memory and field loops entails a buffer update cost proportional to the M·F product.
6.4 Practical Examples
The presented tool has been developed for the analysis of network traces: in the following we explore how the DiaNa framework has been used for the research activities of our group. Although the networking context is common to all the considered projects, each of them follows a rather different path. Referring to the results discussed in the previous chapters, we briefly recall each problem and highlight the specific DiaNa features used for its solution.
The Web and User Patience
The development of the DiaNa tools started for [52], which studies web users’ behavior when decreasing network performance causes page transfer times to increase. Based on two months of real traffic measurements at the flow level, we analyzed whether worsening network conditions translate into greater user impatience, with the intent of refining the picture of the complex interactions between user perception of the Web and network-level events.
The chapter introduced a TCP-level heuristic criterion able to discriminate between completed and user-interrupted HTTP connections, involving about a dozen TCP connection parameters. Let us consider a simpler property (called eligibility in [52]), which is a necessary condition for any interrupted flow and can be stated by the following expression, valid for Tstat files: (#47==80) && (#4 && !#59) && #53.
The definition of a Tstat format allowed us to rephrase the former expression into the following more readable equivalent: (#S.port==80) && #S.data && #C.rst && !(#S.fin || #S.rst), where the S and C prefixes stand for server and client. This is clearly convenient in terms of readability and maintainability, especially with more complex expressions. Besides, inferring the flow-level heuristic, which involved processing two months of real traces, would hardly have been possible without the core assumptions of the DiaNa framework.
The Zoo of Elephants and Mice
In [56], we analyzed the statistical properties of the TCP flow arrival process, and its Long Range Dependency (LRD) characteristics in particular, identifying as possible causes of the LRD of the TCP flow arrival process i) the heavy-tailed distribution of the number of flows in a traffic aggregate, and ii) the presence of TCP-elephants within it.
Methodologically, we aggregated several TRs within the same TA in such a way that each TA carries, bytewise, the same amount of traffic. To induce a division of TCP-elephants and TCP-mice into different traffic aggregates, the algorithm used packs the largest TRs into the first TA, so that subsequently generated aggregates are constituted by an increasing number of smaller traffic relations.
To perform such an aggregation, we computed once, per traffic relation, the amount of bytes exchanged during the observed trace (i.e., $TR{"#S.addr/#C.addr"} += #S.data), post-processing the hash by sorting its keys bytewise (i.e., @TR = sort { $TR{$a} <=> $TR{$b} } keys %TR;). A greedy algorithm has then been used to consecutively aggregate the sorted TRs, so that each TA accounts for 1/N of the total exchanged bytes, for different values of N. Then, we applied the resulting mapping to the original trace, thus defining the traffic aggregates; the tremendously useful features, in this case, are Perl’s ability to easily define mappings through hash tables, coupled with its powerful sorting semantics.
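A minimal sketch of this post-processing step (written in plain Perl for illustration; the number of aggregates and the packing details are assumptions, not the exact code used in [56]) could look as follows:

    my %TR;                        # per-TR byte counts, filled while reading the trace,
                                   # e.g. $TR{"$server/$client"} += $bytes;
    my $N = 10;                    # hypothetical number of traffic aggregates

    my $total = 0;
    $total += $_ for values %TR;
    my $target = $total / $N;      # bytewise share targeted by each aggregate

    my %TA_of;                     # resulting mapping: traffic relation -> aggregate index
    my ($ta, $acc) = (0, 0);
    for my $tr (sort { $TR{$b} <=> $TR{$a} } keys %TR) {   # largest TRs first
        $TA_of{$tr} = $ta;
        $acc += $TR{$tr};
        if ($acc >= $target && $ta < $N - 1) {             # move on to the next aggregate
            $ta++;
            $acc = 0;
        }
    }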
Finally, to study the TCP arrival process within each TA, and specifically its LRD properties, we used the wavelet-based AV [67] estimator, which has many advantages, as it is fast, robust in the presence of non-stationarities and can be used to generate on-line estimates; in this case, the natural integration between Perl and the shell allowed the parallel execution of already existing (and fast) C programs through a simple piping mechanism.
Synthesizing Realistic Traffic
In [84] we present a methodology to generate several synthetic traffic traces from a single real trace of packets captured with tcpdump [10]. The synthetic traffic has then been used to assess the performance of scheduling algorithms in high performance switches. Our results show that realistic traffic degrades the performance of the switch by more than one order of magnitude with respect to traditional traffic models, suggesting that the latter may not be able to show real-world behavior.
Formally, the synthesis problem in [84] can be stated as an optimization problem of job scheduling over parallel machines. However, due to the size of the problem, and since a solution to this problem, which is well known to be NP-hard, cannot be found in polynomial time, we opted for a greedy approximate solution; nevertheless, future work plans to optimally solve, using the Perl Data Language, a modified version of the problem.
Methodologically, the first step of the synthesis problem consists of gathering the per-TR amount of exchanged bytes, similarly to the previous section; unlike before, however, we are now considering a packet-level trace. The gathered information is then used to define a near-optimal mapping between logical switch ports and real IP addresses where, again, the mapping functions are expressed through Perl hashes. The second step of the synthesis problem consists of applying the mapping to the original trace, in order to obtain the synthetic traffic and to feed an external switch simulator engine.
6.5 Conclusions
This chapter presented DiaNa, a versatile and flexible Perl framework designed especially to process huge amounts of data; however, as suggested by the benchmarking results, it can profitably be used for interactive analysis as well.
After overviewing its architecture and syntax, we devoted special attention to the simplest possible way of getting the most out of it, describing its usage and interaction at the shell level.
Finally, we gave some intuition of possible uses of the tool, describing the context that brought us to its development as well as giving some practical examples.
Chapter 7
Final Thoughts
This thesis covered several aspects related to the network measurement field: this final chapter discusses some interesting future directions and possible extensions of the research presented so far.
7.1 Measurement Tool
To summarize, in Chapter 1 we provided a general introduction to the network measurement field, describing how real traffic can actually be gathered, as well as Politecnico’s network. In Chapter 2, we presented Tstat, a tool for Internet traffic data collection and its statistical elaboration. Exploiting its capabilities, we presented and discussed some statistical analyses performed on data collected at the Politecnico ingress router. The results offer deep insight into the behavior of both the IP and the TCP protocols, highlighting several characteristics of the traffic that had hardly been observed before through passive measurement – but rather i) by injecting ad-hoc flows into the Internet or ii) in simulations.
Having access to relevant and up-to-date measurement data is a key issue for network analysis, in order to allow for efficient Internet performance monitoring, evaluation and management. New applications keep appearing; user and protocol behavior keeps evolving; traffic mixes and characteristics are continuously changing, which implies that traffic traces may have a short span of relevance and new traces have to be collected quite regularly. There are several ongoing efforts dealing with Tstat: indeed, based on the experience gained in IP network traffic measurement, and especially with TCP traffic, we are currently extending Tstat capabilities in several directions, which we briefly describe here.
7.1.1 Cope with Traffic Shifts
As mentioned earlier, the pace at which technology is evolving continuously enables different services and applications: as a consequence, the traffic streams flowing in current networks are very different from the traffic patterns of the very recent past. Indeed, while a few years ago the Internet was synonymous with Web browsing, the pervasive diffusion of wide-band access has entailed a shift of the application spectrum – of which peer-2-peer file-sharing and audio/video streaming are the most representative examples.
Although TCP traffic still represents the majority of Internet traffic, the volume of UDP and RTP traffic is increasing. A significant effort should be devoted to enabling the monitoring of traffic types other than those currently supported. We have already extended Tstat to support simple UDP and RTP accounting, although much remains to be done in order to use Tstat as profitably as with TCP traffic. Finally, the traffic shift is still in progress: for example, no media streaming solution has definitively taken over the others, and as a result HTTP-based streaming is still very popular beside RTP-based applications. Also, the popularity of peer-2-peer applications changes very frequently; furthermore, since the underlying protocols keep evolving, the TCP ports carrying peer-2-peer control traffic change as well – which complicates, e.g., even the basic task of tracking the control traffic volume of peer-2-peer applications1.
7.1.2 Cope with Bandwidth Increase
In order to be useful, network traffic measures should evidently be continuous and persistent: these joint requirements allow changes in the traffic pattern to be tracked, as well as a statistically significant traffic “population” to be gathered. In this light, another issue arising from the widespread and massive increase of access link capacity is that the quantity of data that has to be monitored increases as well; although this may seem obvious, it nevertheless generates two potentially very critical issues:

- computational complexity, due to the continuous monitoring requirement;
- scalability and storage problems, due to the data persistence requirement.

While at present the computational complexity does not constitute a problem, the scalability of the measurement and the storage capacity begin to be troublesome. We have adopted a simple solution to allow for a complete and “flavored” analysis of the network traffic that satisfies both the persistence and continuity requirements. The solution involves the use of a sophisticated and efficient Round-Robin Database (RRD) [112] structure: in this way, scalability is achieved through the use of different levels of temporal granularity (see the sketch below). In very simple terms, the adopted methodology allows the maximum amount of state that has to be maintained to be determined a priori, tuned as a function of the available storage. At the same time, the detail of the information will depend on the resolution of the temporal scale under examination: in other words, the higher the detail level (i.e., the finer the temporal resolution), the smaller the statistical population sample size (i.e., the shorter the temporal observation window). Such a system allows, for example:

- to know exactly the characteristics of the last hour of traffic, keeping packet-level granularity;
- to know the traffic gathered in the last day, week, month and year using decreasing levels of detail, as shown for the IP packet length statistics in Figure 7.1.
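As an illustration of the idea (the data source, step and retention values below are invented and do not reflect the actual Tstat configuration), such decreasing levels of temporal detail can be declared through the RRDs Perl bindings of RRDtool:

    use RRDs;

    # One round-robin archive per temporal resolution: finer detail, shorter history.
    RRDs::create(
        'ip_pktlen.rrd',                 # hypothetical database name
        '--step', '60',                  # base resolution: one sample per minute
        'DS:pktlen:GAUGE:120:0:U',       # hypothetical data source: mean IP packet length
        'RRA:AVERAGE:0.5:1:1440',        # 1-minute averages, kept for one day
        'RRA:AVERAGE:0.5:60:720',        # 1-hour averages, kept for one month
        'RRA:AVERAGE:0.5:1440:365',      # 1-day averages, kept for one year
    );
    my $err = RRDs::error;
    die "RRD creation failed: $err\n" if $err;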
Finally, Figure 7.2 shows an example of the Web interface provided to browse the statistical properties of the different traces collected in the persistent monitoring setup.
1
I worked on P2P while I was a Visiting Researcher at the Computer Science Division, University of California, Berkeley, where I worked with Prof. Ion Stoica on an enhancement [110] of the Chord [111] Distributed Hash Table.
Figure 7.1: TstatRRD: Different Temporal Resolution for IP Packet Length Statistics
Figure 7.2: TstatRRD Web Interface: Round Trip Time Example
7.1.3 Distributed Measurement Points
In order to give a holistic view of what is going on in the network, passive measurements have to be carried out at different places simultaneously: on this basis, we are currently integrating Tstat into a passive measurement infrastructure, consisting of coordinated measurement points, arranged in measurement areas.
Specifically, the infrastructure identified is the Distributed Passive Measurement Infrastructure (DPMI) [113] developed at the Blekinge Institute of Technology (Bth). The DPMI structure allows for an efficient use of passive monitoring equipment in order to supply researchers and network managers with up-to-date and relevant data. The infrastructure is generic with regard to the capturing equipment, ranging from simple PCAP-based devices to high-end DAG cards and dedicated ASICs, in order to promote a large-scale deployment of measurement points. The key requirements of DPMI are:
- Cost: access to measurement equipment should be shared among users, primarily for two reasons: first, as measurements get longer (for instance for detecting LRD behavior), a single measurement can tie up a resource for many days; second, high quality measurement equipment is expensive and should hence have a high rate of utilization.
- Ease of use: the setup and control of measurements should be easy from the user’s point of view; as the complexity of measurements grows, we should hide this complexity from the users as far as possible.
- Modularity: the system should be modular, to allow independent development of separate modules and to separately handle security, privacy and scalability (w.r.t. different link speeds as well as locations); since one cannot predict all possible uses of the system, the system should be flexible enough to support different measurements as well as different measurement equipment.
The DPMI architecture consists of three main components, Measurement Point (MP), Consumer and Measurement Area (MAr), as depicted in Figure 7.1.3a. The task of the MP is to deal with packet capturing, packet filtering and measurement data distribution, while the Consumer is fed with the MP data streams. A complex policy of filtering and optimization can be configured through the MArC which, hiding the complexity from the consumers, basically handles the communication between MPs and Consumers. The actual distribution from MP to Consumers is delayed for efficiency reasons: captured data are stored in a buffer pending transmission, and the frame distribution and data duplication are handled by the MArN.
From this perspective, Tstat is simply a Consumer of the DPMI architecture, and is fed a stream of encapsulated DPMI packets, whose header is shown in Figure 7.1.3b. The header stores a Capture Identifier (CI), a Measurement Point Identifier (MP), a capture timestamp (supporting an accuracy of picoseconds), the original packet length, the number of bytes that were actually captured, and the length of the packet fraction that is actually being distributed (typically, the header). Therefore, the capture header enables us to exactly pinpoint by which MP and on what link the frame was captured, which is vital information when trying to obtain spatial information about the network behavior.
Distributed Passive Measurement Infrastructure: The Architecture (a) and the Packet Header (b)
7.2 User Patience and the World Wide Wait
Then, in Chapter 3, we inspected a phenomenon intrinsically rooted in the current use of the Internet, caused by user impatience at waiting too long for web downloads to complete. We defined a methodology to infer TCP flow interruption, and presented an extended set of results gathered from real traffic analysis. Considering several parameters, we showed that the interruption probability is mainly affected by the user-perceived throughput, as expected. Nevertheless, we found that, contrary to what one may think, users do not tend to wait longer when transferring long files, as testified by the unexpected increase of the interruption probability with the file size. Moreover, the results showed that user intervention is mostly limited to the very beginning (around 20 seconds) of the life of the elephant flows.
7.2.1 Practical Applications
A possible practical application of the presented work would be to use the presented interruption metric to measure user satisfaction with Web performance: this objective and quantifiable metric could be used, e.g., by system administrators to assess the validity of network link upgrades, or the effectiveness of Web caching.
Another possible use would be to include a model of the early interruption of connections in traffic generators for simulation purposes, such as WETMO, a WEb Traffic MOdule for the Network Simulator ns-2 available at [114]. WETMO was developed during my Master Thesis, A Simulation Study of Web Traffic over DiffServ Networks, available at [115], which has been published in the International Archives and was developed in collaboration with NEC Europe Ltd. – Network Laboratories, Heidelberg (Germany). Besides, it is worth pointing out that WETMO has been used to gather the results published in [116]. Let me briefly describe the tool: WETMO is a
complete environment for the performance evaluation of short-lived traffic over DiffServ networks, featuring primarily a sophisticated, light and efficient Web traffic generator, which can also automatically handle long-lived TCP or UDP background traffic – and the Web traffic could incorporate the notion of early interruption. WETMO is designed to relieve the user of a wide series of tasks, from traffic generation to statistics collection, allowing them to concentrate on the description of the scenario (network parameters, topology and QoS settings). Worth mentioning is the fact that WETMO implements a new flow-level monitoring method, allowing the collection of both per-flow and aggregated statistics of TCP connections, which has proved to be very lightweight in terms of both computational power and storage requirements.
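To sketch how WETMO (or any web traffic generator) could incorporate the notion of early interruption, the fragment below draws a per-transfer "patience" deadline concentrated in the first tens of seconds of the flow lifetime and aborts the transfer when the completion time at the offered throughput exceeds it; the exponential distribution and the 20 s mean are illustrative placeholders, and would of course have to be fitted on the measured interruption statistics of Chapter 3.

import random

def simulate_transfer(size_bytes: float, throughput_Bps: float,
                      mean_patience_s: float = 20.0) -> dict:
    """Toy early-interruption model for a synthetic web transfer.

    The user's patience is drawn from an exponential distribution whose
    mean (20 s here) mimics the observation that interventions cluster
    in the very beginning of a flow's life; the actual distribution and
    parameters would have to be fitted on measured traces.
    """
    completion_time = size_bytes / throughput_Bps
    patience = random.expovariate(1.0 / mean_patience_s)
    interrupted = completion_time > patience
    return {
        "interrupted": interrupted,
        # If interrupted, only the bytes sent before the user gave up count.
        "duration": min(completion_time, patience),
        "bytes_sent": min(size_bytes, throughput_Bps * patience),
    }

# Example: a 5 MB download over a 50 kB/s bottleneck is very likely abandoned.
print(simulate_transfer(5e6, 50e3))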
7.3 TCP Aggregates Analysis
Chapter 4 studied the TCP flow arrival process, starting from aggregated measurements taken from our campus network; specifically, we performed a live collection of data directly at the TCP flow level, therefore neglecting the underlying IP-packet level. Traces were both obtained and post-processed through software tools developed by our research group, publicly available to the community as open source. We focused our attention beyond the TCP level, defining two layered high-level traffic entities. At a first level beyond TCP, we identified traffic relations, which are constituted by TCP flows with the same path properties. At the highest level, we considered traffic relation aggregates having homogeneous byte-wise weight; most importantly, this approach enabled us to divide the whole traffic into aggregates mainly made of either TCP elephants or TCP mice. This permitted us to gain some interesting insights into the TCP flow arrival process. First, we observed, as already known, that long range dependence at the TCP level can be caused by the fact that the number of flows within a traffic aggregate is heavy-tailed. In addition, the traffic aggregate properties allowed us to see that TCP-elephant aggregates behave like ON-OFF sources characterized by a heavy-tailed activity period. Besides, we were able to observe that LRD at the TCP level vanishes for TCP-mice aggregates: this strongly suggests that this very ON-OFF behavior is responsible for the LRD at the TCP level.
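As a side note, a quick and informal way to revisit these heavy-tail observations on new traces is to look at the empirical complementary CDF of, e.g., the number of flows per traffic aggregate on log-log axes, where an approximately linear decay hints at a Pareto-like tail; the snippet below is a generic sketch of such a check (with synthetic data standing in for real measurements), not the estimation procedure actually used in Chapter 4.

import numpy as np
import matplotlib.pyplot as plt

def loglog_ccdf(samples, label):
    """Plot the empirical CCDF P(X > x) on log-log axes."""
    x = np.sort(np.asarray(samples, dtype=float))
    ccdf = 1.0 - np.arange(len(x)) / len(x)
    plt.loglog(x, ccdf, drawstyle="steps-post", label=label)

# flows_per_aggregate would come from the flow-level post-processing;
# here a synthetic Pareto sample stands in for real data.
flows_per_aggregate = (np.random.pareto(a=1.2, size=10_000) + 1) * 10
loglog_ccdf(flows_per_aggregate, "flows per aggregate")
plt.xlabel("x"); plt.ylabel("P(X > x)"); plt.legend(); plt.show()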
7.3.1 Future Aggregations
The simple methodology described in Chapter 4 allowed us to create comparable traffic aggregates, i.e., traffic aggregates having the same volume (or weight) but internally constituted by an extremely different number of flows, thus having a very different density (or weight density). The aggregation process allowed, in other words, to naturally partition the so-called Elephants from the Mice, as the heavy and light flows are commonly referred to in the literature. However, several other aggregation criteria could be explored as well, in order to gain a deeper knowledge of the traffic patterns currently populating the Internet.
Notation
The past analysis considered a growing number of homogeneous partitions of the original trace; recalling the notation adopted in Section 4.3.1, we have:
- TCP Level: $s_{sc}(f)$ is the size, expressed in bytes, of the $f$-th TCP flow exchanged between server $s$ and client $c$ (considering only the data originated from the server and directed to the client).
- Traffic Relation (TR) Level: $T_{sc}$ represents all the TCP flows exchanged among $s$ and $c$ during the whole trace, having size $S(T_{sc}) = \sum_f s_{sc}(f)$.
- Traffic Aggregate (TA) Level: $A_k$ is the $k$-th traffic aggregate, having size $S(A_k) = \sum_{T_{sc} \in A_k} S(T_{sc})$.
- Trace Level: $W$ represents all the observed flows, and thus the whole trace, having size $S(W) = \sum_{s,c} \sum_f s_{sc}(f)$.

In other words, the aggregated traffic, having total weight $S(W)$, has been divided into $G$ groups $A_k$: it is important to stress that this aggregation is a partition of the trace $W$, in the sense that:

$\bigcup_{k=1}^{G} A_k = W$   (7.1)

$A_i \cap A_j = \emptyset \quad \forall i \neq j$   (7.2)

Moreover, each of the groups contains the same absolute traffic quantity $S(W)/G$; nevertheless, inside each group there is a significantly different number of flows (or, in other words, a different cardinality $\mathrm{card}(A_k)$ of the aggregates), depending on the size $S(T_{sc})$ of the traffic relations they contain:

$S(A_k) = \sum_{T_{sc} \in A_k} S(T_{sc})$   (7.3)

$S(A_k) = S(W)/G \quad \forall k$   (7.4)
Moreover, the number of Traffic Relations within each Traffic Aggregate satisfies the following relationship:

$\mathrm{card}(A_i) \leq \mathrm{card}(A_j) \quad \forall i < j$   (7.5)

in other words, by construction, the aggregates having a smaller index are constituted by a smaller number of the bigger flows observed during the whole trace.
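For reference, the construction of such equal-weight aggregates can be sketched by a simple greedy pass that scans the TRs in decreasing size order and fills the $G$ groups until each reaches roughly $S(W)/G$ bytes; the snippet below is only an illustration of the idea, not the exact procedure of Section 4.3.1.

def partition_equal_weight(tr_sizes, n_groups):
    """Split traffic relations (id -> byte size) into n_groups aggregates
    of (approximately) equal total weight, biggest TRs first, so that
    low-index aggregates contain few, large flows (TCP elephants) and
    high-index ones many small flows (TCP mice)."""
    total = sum(tr_sizes.values())
    target = total / n_groups
    aggregates = [[] for _ in range(n_groups)]
    k, weight = 0, 0.0
    for tr, size in sorted(tr_sizes.items(), key=lambda kv: kv[1], reverse=True):
        aggregates[k].append(tr)
        weight += size
        # Move to the next aggregate once the cumulative per-group quota is filled.
        if weight >= target * (k + 1) and k < n_groups - 1:
            k += 1
    return aggregates

# Example: 6 TRs split into 2 aggregates of comparable byte-wise weight.
print(partition_equal_weight({"a": 900, "b": 500, "c": 300, "d": 200,
                              "e": 60, "f": 40}, 2))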
Aggregation Criteria
A possible extension of the past work could consider a complementary approach: rather than partitioning the whole trace into different TAs comprised of a variable number of TRs, one could decide to aggregate a constant number of traffic relations adopting several different aggregation criteria; clearly, this entails that the TAs would no longer be dimensionally homogeneous. The problem formulation is therefore rather simplified: the elimination of the partitioning constraint releases the researcher from the burden of solving an optimization problem.
Let us indicate the aforementioned criterion with $C$ and define the criterion cardinality, indicated with $n_C$, as the number of traffic relations aggregated by $C$; let us further define $A_C$ as the traffic aggregate generated by $C$; finally, whenever it is necessary to explicitly indicate at the same time both the cardinality and the criterion, the notation $C(n)$ will be used, where evidently $n = n_C$. As previously stated, the set of aggregates does not constitute a partition of the whole trace unless the complementary aggregate $\bar{A}_C$ is considered, defined as $\bar{A}_C = W \setminus A_C$.
The aggregation upon $C$ of a number $n_C$ of TRs can be performed in several ways; for example:
- Largest $C_L$: select the $n_C$ biggest TRs
- Smallest $C_S$: select the $n_C$ smallest TRs
- Random $C_R$: randomly select $n_C$ TRs
- Nearest $C_N$: select the $n_C$ nearest traffic relations, where the distance function in the IP space could be tuned on the basis of a given reference address (AND-bitmask)
- Farthest $C_F$: select the $n_C$ farthest traffic relations, where the distance function in the IP space could be tuned on the basis of a given reference address (XOR-bitmask)
Without entering into further details, let us briefly state that $C_R$ represents an extremely simple though significant subsampling of the trace, which can be implemented in several ways (e.g., uniformly or Monte Carlo, on the number of flows within TRs, on the IP-space distance, or on the flow size, . . . ).
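The criteria listed above also lend themselves to a very compact implementation; the following sketch shows one possible rendering, where traffic relations are keyed (as a simplification) by the server address only, the function names are arbitrary, and the IP-space distance of the Nearest/Farthest criteria is rendered with a plain XOR metric.

import random

def ip_to_int(ip: str) -> int:
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def select_trs(trs, criterion, n, ref_ip="130.192.0.0"):
    """Return the n traffic relations (server_ip -> size) picked by `criterion`.

    'largest'/'smallest' use the TR byte-wise size, 'random' a uniform pick,
    'nearest'/'farthest' an IP-space distance from a reference address
    (here the XOR of the 32-bit addresses, so small values mean long
    shared prefixes with the reference).
    """
    ref = ip_to_int(ref_ip)
    keys = {
        "largest": lambda kv: -kv[1],
        "smallest": lambda kv: kv[1],
        "nearest": lambda kv: ip_to_int(kv[0]) ^ ref,
        "farthest": lambda kv: -(ip_to_int(kv[0]) ^ ref),
    }
    items = list(trs.items())
    if criterion == "random":
        return dict(random.sample(items, n))
    return dict(sorted(items, key=keys[criterion])[:n])

trs = {"10.0.0.1": 5_000, "10.0.0.2": 120, "192.168.1.7": 900, "130.192.4.2": 40}
print(select_trs(trs, "largest", 2))
print(select_trs(trs, "nearest", 2))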
In practice, one could consider a number of aggregates of increasing cardinality, including e.g. up to half of the total number $N$ of TRs (i.e., adopting base 2 or base 10, we would have $n_C = 2^i$ with $i = 1, \ldots, \lfloor \log_2 N/2 \rfloor$, or $n_C = 10^i$ with $i = 1, \ldots, \lfloor \log_{10} N/2 \rfloor$). Intuitively, when the number of TRs increases, the observed properties of the traffic aggregate should become, at the same time, i) statistically more significant and ii) decreasingly evident. This is better explained through an example: let us consider the normalized TR weight within TAs (i.e., the ratio of the TA size over the whole trace size, divided further by the number of TRs within that TA) induced by $C_L$ and $C_S$: initially, $C_L(1)$ individuates the biggest traffic relation (having normalized weight $\max S(T_{sc})/S(W)$), while $C_S(1)$ individuates the smallest one (normalized weight $\min S(T_{sc})/S(W)$). If we now extend the TAs considering the second biggest (smallest) traffic relation, the normalized weight of $C_L(2)$ ($C_S(2)$) will clearly decrease (increase): when finally all the TRs are considered, thus considering $C_S(N)$ and $C_L(N)$, both TAs are comprised of the same TRs and cannot be distinguished.
A simple approach could be to directly contrast the characteristics of $C$ and $\bar{C}$ when $n_C = N/2$, in other words when the two aggregates split the whole trace into a two-set partition. For example, one could consider the heaviest half $C_L(N/2)$ and the complementary one $\bar{C}_L(N/2)$, which is therefore constituted by the lightest half of the trace (notice furthermore that $\bar{C}_L(N/2) = C_S(N/2)$, since $C_L$ is the complementary criterion with respect to $C_S$). Recalling the results shown in Section 4.4, this would prove a successful method to uncover remarkably different properties of the aggregates.
The following is a brief list of the metrics whose distribution, first and second order statistics, and Hurst parameter could be studied. A first group of metrics is independent of the aggregation criteria, in the sense that they are global, i.e., related to the whole trace; for example:
- number of:
  - TRs per trace
  - TCP flows per TR
  - IP packets per TR
  - IP packets per TCP flow
- size of:
  - TRs
  - TCP flows
- interarrival time:
  - of IP packets within TCP flows
  - of TCP flows within TRs
These metrics could be easily extended to the local case, or in other words be studied along with the specific aggregation criteria; for example, it could be interesting to study, for different criteria and cardinalities, the behavior of:
- number of TRs, TCP flows and IP packets per TA
- interarrival time properties of TRs, TCP flows and IP packets within TAs
- TA size
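As shown below, once a TA has been built most of these local metrics can be extracted with a few lines of post-processing from a per-flow log; the (server, client, start time, bytes) record format is an illustrative assumption.

from collections import defaultdict

def ta_metrics(flow_log, ta_servers):
    """Per-TA statistics from a flow log restricted to the servers in the TA.

    flow_log: iterable of (server, client, start_time, size_bytes);
    ta_servers: the set of server addresses forming the traffic aggregate.
    """
    flows = [f for f in flow_log if f[0] in ta_servers]
    per_tr = defaultdict(list)            # (server, client) -> flow start times
    for srv, cli, t0, size in flows:
        per_tr[(srv, cli)].append(t0)
    starts = sorted(t0 for _, _, t0, _ in flows)
    interarrivals = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "n_trs": len(per_tr),
        "n_flows": len(flows),
        "ta_size_bytes": sum(size for *_, size in flows),
        "flow_interarrivals": interarrivals,  # input for LRD/Hurst analysis
    }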
7.4 Switching Performance Evaluation
In Chapter 5 we proposed a novel and flexible methodology to synthesize realistic traffic traces
to evaluate the performance of switches and, in general, of controlled queuing networks. Packets
are generated from a single packet trace from which different synthetic traffic traces are obtained
fulfilling a desired scenario, e.g., a traffic matrix. Additional constraints are imposed to maintain
the original traffic characteristics, mimicking the behavior imposed by Internet routing. We compared the performance of a switch adopting different queuing and scheduling strategies, under two
scenarios: the synthetic traffic of our methodology and traditional traffic models. We observed that
not only absolute values of throughput and delays can change considerably from one scenario to
the other, but also their relative behaviors. This fact highlights the importance of some design aspects (e.g., the buffer management) which are traditionally treated separately. These results show
new behavioral aspects of queuing and scheduling in switches, which will likely require more insight in the future.
The work presented in Chapter 5 could be extended in two ways, which I will briefly describe here. The first path would require solving a modified version of the optimization problem, considering additional constraints related to the network topology; a second path that could possibly be
followed, in order to fill the gap between the realism of the traffic sources and the current traffic models, could be to use responsive traffic sources. Both approaches could then be contrasted and compared with the results achievable under traditional traffic models, much as was done in Section 5.3.4.
7.4.1 Modified Optimization Problem
One of the main limitations of the current model is that, though it saves the identity of the single packet streams at the IP level, the process still allows the aggregation of different TCP flows that, possibly traveling completely disjoint paths in the network, experience rather different network conditions. Similarly, but in a mirrored fashion, flows that travel along the same path may be assigned to different input-output aggregates.
The Problem
[Figure 7.3: Flow “Delocalization” as a Consequence of the Aggregation Process. The figure sketches the external servers s1, s2, s3 on one side and the internal Politecnico clients d and j on the other, reached through the ingress ports i1 and i2 of the edge router.]
The diagram sketched in Figure 7.3 may be helpful to illustrate the problem. Let us assume that a user, incidentally named Dario², browses the Web from the Politecnico di Torino with a PC having IP address d. In order to correctly visualize the current page, the browser will send several HTTP requests in parallel: since P-HTTP/1.0 or HTTP/1.1, one for each object in the current page³. Moreover, due to the increased diffusion of load balancing techniques, it is possible that some of these requests will be redirected to two different servers belonging to the same ISP, having IP addresses s1 and s2. Similarly, another work session may generate traffic from s3 directed to d.
Currently, the aggregation process can lead to the situation sketched in the figure, where the paths s1→d and s2→d enter the Politecnico ingress router through two different ports i1 and i2: this amounts to logically disjoining two physically totally overlapping paths. The complementary effect, described earlier and not shown in the picture, corresponds to the logical regrouping (through the ingress port i1) of physically disjoint paths (s1→d and s3→d).
Therefore, the packet interarrival process within each input-output aggregate may be partly “artificial”, in the sense that the real correlations among different traffic sources are potentially scrambled during the aggregation process.
² Any resemblance or reference to actual persons, living or dead, is unintentional and purely coincidental.
³ Any reference to cookies in the website and any resemblance to real cookies or biscuits is purely coincidental.
The Solution
A possible solution to this problem would consist in devising a simple criterion to logically regroup the servers (e.g., s1 and s2); moreover, this should be done a priori, or in other words prior to the application of the greedy aggregation algorithm described in Section 5.2.3. To ease the notation and for the sake of clarity, in the following we will refer to the a priori process, which involves uniquely the IP sources, as regrouping, whereas the final process will be indicated, as usual, as aggregation.
[Figure 7.4: Flows Regrouping through Logical Masks. The figure shows five source addresses s1 = 10.10.10.1, s2 = 10.10.10.2, s3 = 10.10.20.3, s4 = 10.20.20.4, s5 = 20.20.20.5 regrouped under three masks of decreasing depth: FF.FF.FF.0, FF.FF.00.0 and FF.00.00.0.]
Let us focus first on the regrouping process only: we will analyze its impact on the aggregation process at a later step. Figure 7.4 reports an extreme example of the logical regrouping process through logical masks.
Since IP addresses can be represented on 32 bits, we indicate with logical mask a sequence of $n$ ‘1’ symbols followed by $32-n$ ‘0’ symbols; if we further define $n$ as the mask depth, in the following we will indicate the $n$-deep mask with $M_n$. The $n$-th regrouping criterion can be expressed as a simple logical function between the server IP addresses and the mask $M_n$: two addresses $s_i$ and $s_j$ belong to the same group $G_k$ if and only if their first $n$ bits are the same, i.e.:

$s_i, s_j \in G_k \iff s_i \wedge M_n = s_j \wedge M_n$   (7.6)

Intuitively, the $n$-th grouping criterion will generate a number $g(n)$, unknown a priori, of address groups that constitute a partition of the IP address space $\mathcal{A}$, thus:

$\bigcup_{k=1}^{g(n)} G_k = \mathcal{A}$   (7.7)

$G_i \cap G_j = \emptyset \quad \forall i \neq j$   (7.8)
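Implementing the regrouping of (7.6)–(7.8) amounts to bucketing the server addresses by their first $n$ bits; the following minimal sketch does exactly that, with the mask depth $n$ as the only parameter, and reproduces the extreme cases of Figure 7.4.

from collections import defaultdict
from ipaddress import IPv4Address

def regroup(servers, depth):
    """Group IPv4 server addresses whose first `depth` bits coincide.

    Returns a dict mapping the common n-bit prefix (as an int) to the
    list of addresses in that group; the groups form a partition of
    the input, as in (7.7)-(7.8).
    """
    mask = ((1 << depth) - 1) << (32 - depth)
    groups = defaultdict(list)
    for ip in servers:
        groups[int(IPv4Address(ip)) & mask].append(ip)
    return dict(groups)

servers = ["10.10.10.1", "10.10.10.2", "10.10.20.3", "10.20.20.4", "20.20.20.5"]
print(len(regroup(servers, 24)))   # 4 groups: only the first two addresses merge
print(len(regroup(servers, 8)))    # 2 groups: everything in 10/8 merges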
This process brings several advantages at the price of limited disadvantages:
- though the regrouping process does not guarantee that two sources close in the IP space correspond to two sources physically close in the real network interconnections, it can nevertheless be considered a valid approximation;
- the improved model therefore represents a further step toward a realistic traffic generation methodology, offering an arrival pattern significantly more robust than the earlier model (as a consequence of the elimination of the false positive aggregations of “far” sources and of the false negative missed aggregations of “near” sources);
- the regrouping of source addresses does not affect the freedom to map destination addresses to the switch output ports; therefore, though the number of constraints increases, the increase is “unilateral”, in the sense that the flexibility of the synthesis is shifted toward the ability of efficiently associating destination addresses to the switch output ports;
- moreover, it should be considered that, given the real network position of the destination addresses with respect to the trace collection point (i.e., the edge router of Politecnico di Torino), the model is largely less affected by the mapping between destination addresses and output ports; indeed, the statistical properties of the flows entering the Politecnico LAN have already been determined: the impact of the LAN is clearly negligible with respect to the multi-hop route that flows have traveled from the source up to this last router – in other words, the Politecnico router does not act as a bottleneck and does not bias, or modifies only infinitesimally in the worst case, the flow performance;
- on the other hand, the average load will be harder to “balance” and the instantaneous load will be less “balanced” over the whole trace: potentially, the offered traffic will be disjoint on the temporal support;
- finally, the reduced flexibility of the aggregation process leads to a smaller number of potential traffic scenarios; however, the most interesting traffic matrices (such as uniform or diagonal matrices) should not be much harder to generate than in the previous case.
7.4.2 Responsive Sources
As we previously outlined, in the field of switching and routing architectures a severe problem is constituted by the absence of traffic models apt to describe the complex interactions happening in packet networks: a possible solution, still unexplored, would require adopting a feedback mechanism to implement reactive sources, i.e., sources that react to the actual network conditions, mimicking the real TCP behavior.
The Problem
A simple choice would be to consider a generic Additive Increase Multiplicative Decrease (AIMD) source, which is an extreme simplification of the real TCP behavior but would be quite valuable for a first-grade analysis of feedback sources; without entering into the full details of the TCP algorithm and behavior, we illustrate here the appropriateness of AIMD responsive sources.
Indeed, it is well known that the amount of data transmitted by TCP during a Round Trip Time (RTT) is proportional to the congestion window (cwnd), which is tuned by an acknowledgment mechanism. When the data sent is acknowledged, the cwnd is slowly increased (thus, the Additive Increase), and so is the amount of data injected into the network: this is done to slowly probe the network reaction to the increase of the transmission rate. Conversely, when the data fails to be acknowledged the TCP source has an indication of packet loss, in which case the cwnd is abruptly diminished (thus, the Multiplicative Decrease): specifically, it is either halved (on Fast Recovery) or reset (on Timeout), depending on the “gravity” of the received congestion signal; indeed, the loss of a packet is a symptom of network congestion and therefore, in order to avoid network saturation, responsive sources react by decreasing their transmission rate.
Although AIMD sources are not able to capture all the different flavors of the TCP protocol suite and its many variations (e.g., Reno, NewReno, Tahoe, Sack, Vegas, Westwood and other more or less stable algorithms), they nevertheless constitute a first valuable approximation of a windowed protocol – featuring in any case some important differences from the classical Markovian traffic models, where everything is independent, memoryless and uncorrelated, and thus the network feedback is completely neglected.
The Solution
Contextualizing an AIMD traffic model to the performance evaluation of switching architectures is straightforward if we allow some compromises. With the help of Figure 7.5, we introduce in the following the details of AIMD traffic generation.
Normally, the traffic offered to the switch is specified through a matrix D whose elements D_{ij} quantify the average traffic entering at input i and directed to output j. Clearly, if we introduce a feedback model, it will no longer be possible to impose the average load a priori: the load will rather be measured at the ingress ports of the device. The target matrix D shall therefore be substituted by an adjacency matrix E used to control the AIMD traffic, whose semantic meaning is the following: every element E_{ij} is an integer number representing the number of superimposed AIMD sources generating responsive traffic directed from i to j. Initially, it could be a good compromise to use a uniform unitary adjacency matrix (i.e., only one source per ingress-egress pair).
[Figure 7.5: Responsive Traffic Generation Process. The figure depicts the feedback loop formed by the AIMD Source, the Switch (offered load T′, delay ∆, throughput T′′), the AIMD Sink and ACK generator, and the backward path made of an M/M/1/B queue (fed also by background traffic of intensity β) followed by a D/D/∞ stage.]
Referring to Figure 7.5, the adjacency matrix E is only the first control mechanism for the AIMD traffic, the second being the feedback mechanism. Let us continue the analysis of the picture in clockwise sense, focusing only on the metrics that may be interesting for a future analysis: namely, the offered load T′, the delay ∆ and the throughput T′′, as well as the packet loss statistics, which can be gathered as P_loss = (T′ − T′′)/T′. At the output of the switching fabric there is an ACK generator, which injects acknowledgment packets into the feedback ring abstracting the network path traveled by acknowledgment packets (or acks). Acks are first queued into an M/M/1/B queue, where, possibly, some background traffic can be injected in order to tune the network congestion. Before completing the backward path and being notified to the AIMD generator, every ack can also be delayed by a constant quantity: the D/D/∞ queue emulates the portion of the RTT due to the link propagation; thus, as summarized below, acks are subject to two different kinds of delay:
- the queuing delay, a variable component due to the M/M/1/B queue;
- the propagation delay, a fixed component due to the D/D/∞ queue.
The algorithm of the AIMD-ACK pair is very simple; schematically, the AIMD generator:
- checks for timeout expiration, in which case:
  - it reduces the transmission window by a factor α (e.g., the window is halved when α = 2);
  - it retransmits the lost packet, starting from the last frame positively acknowledged (avoiding overloading the network if the in-flight size is bigger than the decreased window size);
- otherwise:
  - it generates new packets (the amount is such as to fill up the cwnd);
  - it sets the timeout corresponding to each packet acknowledgment;
  - it starts the actual packet transmission;
  - upon reception of a positive acknowledgment, it increments the transmission window to min(cwnd + 1, W_max), where the window may saturate to a finite value W_max or grow unbounded (W_max = ∞).
It should be noted that there are two points in the system where packet loss can happen: either in the backward bottleneck queue or in the switch. The second case bears additional discussion; indeed, there may be cases in which an AIMD packet will not even reach the ACK generator: specifically, this can happen when the scheduling algorithm is not stable for any admissible ingress traffic (e.g., iSLIP). In either of the two above situations, the timer expiration due to the missed reception of the acknowledgment packet for the corresponding lost packet will trigger the reduction of the transmission window.
Conversely, the ACK generator works in a way similar to a “reflector”: upon reception of a packet directed from i to j and having sequence number k, it sends a packet from j to i with sequence number k (or, to mimic the TCP behavior more closely, we could set this value to the next sequence number the sender of the segment is expecting to receive, thus k + 1). Acknowledgment packets (or acks) then enter a queuing network, which constitutes the backward part of the path, and they will be responsible for either a decrease (on ack loss) or an increase (on successful ack reception) of the window at the transmitter side, therefore closing the feedback loop. As a conclusive observation, notice that the feedback loop can also contribute, besides packet loss, to tuning the AIMD transmission rate: indeed, a simple ack delay may be sufficient to trigger the expiration of the timeout and consequently yield a decrease of the congestion window, and thus of the transmission rate.
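The following toy sketch renders the essence of the loop just described: a slotted simulation in which the source offers cwnd packets per round, every packet (or its ack on the backward path) is lost with some probability that stands in for both the switch and the M/M/1/B bottleneck, and the window reacts additively on success and multiplicatively on loss; all parameters are illustrative and the queuing and propagation delays are deliberately collapsed into the single loss probability.

import random

def aimd_feedback(rounds=1000, loss_prob=0.01, w_max=64):
    """Slotted AIMD source closed by an ideal ACK 'reflector'.

    Every round the source sends `cwnd` packets; if any packet (or its
    ack on the backward path) is lost, the window is halved, otherwise
    it grows by one segment, saturating at w_max.  Returns the average
    window, a rough proxy of the throughput offered to the switch.
    """
    cwnd, total = 1.0, 0.0
    for _ in range(rounds):
        total += cwnd
        lost = any(random.random() < loss_prob for _ in range(int(cwnd)))
        if lost:
            cwnd = max(1.0, cwnd / 2.0)    # Multiplicative Decrease
        else:
            cwnd = min(w_max, cwnd + 1.0)  # Additive Increase
    return total / rounds

print(aimd_feedback(loss_prob=0.02))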
This network model has several parameters that can be tuned: without any aim of completeness, we finally report some interesting scenarios that could be studied. First of all, the architectures and switching algorithms that could be considered are the same that I have considered in Section 5.3.4: mainly, input queuing (iSLIP, MWM, . . . ) or output queuing solutions. The new parameters that can be tuned are the M/M/1/B buffer size B, as well as the queuing discipline that can be adopted (e.g., FIFO DropTail, RED, Choke, . . . ); moreover, one can consider several classes of propagation delay, or even non-deterministic delays (e.g., M/M/∞ rather than D/D/∞). Finally, AIMD traffic allows specifying the number of sources per input-output port pair through the adjacency matrix E, whereas the background traffic (which can be Markovian, bursty, Bernoulli ON/OFF, . . . ) can be controlled in volume through β.
7.5 Post Processing with DiaNa
Finally, in Chapter 6 we presented DiaNa, the versatile and flexible Perl framework, designed especially to process huge amounts of data, that has been used to carry out the analysis described in the previous chapters. After overviewing its architecture and syntax, we devoted special attention to the simplest possible way of getting the most out of it, describing its usage and interaction at the shell level. Finally, we gave some intuition of possible uses of the tool, describing the context that brought us to its development as well as giving some practical examples, which are ultimately the “backstage” of the presented research activity.
7.5.1 DiaNa Makeup and Restyle
As with any computer program, a lot has been done but nevertheless a lot remains to be done. For example, several extensions have been implemented, such as the P-Square algorithm [97] for the on-line estimation of quantiles, which requires constant state and no a priori knowledge or assumption about the data; however, the implementation has been carried out in Perl, which has a number of features that speed up the development to the detriment of the running speed: therefore, a lot of optimization would be possible, such as re-implementing the most stable parts of the tool in a faster and more efficient language (as well as a more stubborn and annoying one⁴).
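For completeness, the P-Square algorithm [97] mentioned above fits in a few dozen lines; the following rendition (in Python for illustration, DiaNa's own being in Perl) tracks five markers whose middle height approximates the desired quantile without storing the observations.

import random

class P2Quantile:
    """On-line p-quantile estimator (the P-Square algorithm of Jain & Chlamtac [97])."""

    def __init__(self, p):
        self.p = p
        self.q = []                                   # marker heights
        self.n = [0, 1, 2, 3, 4]                      # actual marker positions
        self.np = [0, 2 * p, 4 * p, 2 + 2 * p, 4]     # desired marker positions
        self.dn = [0, p / 2, p, (1 + p) / 2, 1]       # desired position increments

    def add(self, x):
        if len(self.q) < 5:                           # bootstrap on the first 5 samples
            self.q.append(x)
            self.q.sort()
            return
        if x < self.q[0]:                             # locate the cell containing x,
            self.q[0], k = x, 0                       # stretching the extremes if needed
        elif x >= self.q[4]:
            self.q[4], k = x, 3
        else:
            k = next(i for i in range(4) if self.q[i] <= x < self.q[i + 1])
        for i in range(k + 1, 5):                     # shift the markers above the cell
            self.n[i] += 1
        for i in range(5):
            self.np[i] += self.dn[i]
        for i in (1, 2, 3):                           # adjust the three middle markers
            d = self.np[i] - self.n[i]
            if (d >= 1 and self.n[i + 1] - self.n[i] > 1) or \
               (d <= -1 and self.n[i - 1] - self.n[i] < -1):
                d = 1 if d > 0 else -1
                qp = self.q[i] + d / (self.n[i + 1] - self.n[i - 1]) * (
                    (self.n[i] - self.n[i - 1] + d) * (self.q[i + 1] - self.q[i])
                    / (self.n[i + 1] - self.n[i])
                    + (self.n[i + 1] - self.n[i] - d) * (self.q[i] - self.q[i - 1])
                    / (self.n[i] - self.n[i - 1]))
                if not self.q[i - 1] < qp < self.q[i + 1]:    # parabolic step failed,
                    qp = self.q[i] + d * (self.q[i + d] - self.q[i]) \
                                       / (self.n[i + d] - self.n[i])  # use the linear one
                self.q[i], self.n[i] = qp, self.n[i] + d

    def value(self):
        if len(self.q) < 5:                           # too few samples: plain median
            s = sorted(self.q)
            return s[len(s) // 2] if s else None
        return self.q[2]

est = P2Quantile(0.95)
for _ in range(100_000):
    est.add(random.random())
print(est.value())        # close to 0.95 for uniform samples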
However, we refer the reader to the DiaNa website [95] for other technical issues: indeed, to thank the readers for their attention up to this point, I have decided to report (and sometimes invent) some Murphy's laws regarding computer topics, which ironically highlight some of the real problems that occurred during the development of DiaNa (and during the analysis of the results described in this thesis as well).
- Complexity Axiom: The complexity of any program grows until it exceeds the capability of the programmer who must maintain it.
- Obsolescence Law: Any given program, when running, is obsolete.
- Bugs Invariant: Software bugs are impossible to detect by anybody except the end user.
  - Corollary 1: A working program is one that has only unobserved bugs.
  - Corollary 2: If a program has not crashed yet, it is waiting for a critical moment before crashing.
  - Corollary 3: Bugs will appear in one part of a working program when another “unrelated” part is modified.
  - Corollary 4: A program generator creates programs that are more buggy than the program generator itself⁵.
- Triviality Theorem: Every non-trivial program has at least one bug.
  - Corollary 1: A sufficient condition for program triviality is that it has no bugs.
  - Corollary 2: At least one bug will be observed after the author finishes his PhD.
- Axiom of File-dynamics: The number of files produced by each simulation run tends to overcome any effort to keep them ranked in any order.
- Saturation Lemma: Disks are always full⁶.
  - Corollary: It is futile to try to get more disk space: data expands to fill any void.
- Rules of Thumb: The value of a program is inversely proportional to the weight of its output.
  - Corollary 1: If a program is useful, it will have to be changed.
  - Corollary 2: If a program is useless, it will have to be documented.
- Indetermination Principle: Constants aren't and variables won't.
  - Corollary: In any multi-threaded program the value (or the state) of any given resource cannot be punctually determined, even if the resource is not supposed to be shared across agents⁷.

⁴ As you have probably guessed, I'm talking about Kernighan's & Ritchie's C, with which many programmers have a love-hate relationship. About the optimization of DiaNa through reimplementation, however, I would go along with the Perl manual pages, which suggest that “the implementation in C is left as an exercise for the reader”.
⁵ Except the DiaNa core, of course.
⁶ Except disks holding traffic measurements when the output of tcpdump is redirected to a single file named /dev/null.
⁷ Tstat obviously being an exception.
Bibliography
[1] J. D. Sloan, “Network Troubleshooting Tools”, O’Reilly Ed., Aug. 2001.
[2] R. Stine Ed., “FYI on a Network Management Tool Catalog”, Network Working Group
RFC1147, Apr. 1990
[3] “Tools for Monitoring and Debugging TCP/IP Internets and Interconnected Devices”, Network Working Group RFC1470, Jun. 1993
[4] PlanetLab website http://www.planet-lab.org/
[5] L. Peterson, T. Anderson, D. Culler, and T. Roscoe. In Proceedings of First Workshop on Hot
Topics in Networking (HotNets-I), October 2002.
[6] CAIDA, the Cooperative Association for Internet Data Analysis, website http://www.caida.org
[7] ENTM, available at ftp://ccc.nmfecc.gov
[8] EtherApe homepage, http://etherape.sourceforge.net/
[9] Getethers, available at ftp://harbor.ecn.purdue.edu/
[10] S. McCanne, C. Leres, and V. Jacobson, tcpdump homepage, http://www.tcpdump.org/
[11] S. McCanne, C. Leres, and V. Jacobson, libpcap homepage, http://sourceforge.net/projects/libpcap/
[12] WinPcap homepage, http://winpcap.polito.it
[13] L. Degioanni, M. Baldi, F. Risso and G. Varenni, “Profiling and Optimization of Software-Based Network-Analysis Applications”, In Proceedings of the 15th IEEE Symposium on
Computer Architecture and High Performance Computing (SBAC-PAD 2003), Sao Paulo,
Brasil, November 2003
[14] F. Risso, L. Degioanni, “An Architecture for High Performance Network Analysis”, In Proceedings of the 6th IEEE Symposium on Computers and Communications (ISCC 2001), Hammamet, Tunisia, July 2001
[15] IPTraf homepage, http://cebu.mozcom.com/riker/iptraf/
[16] Manikantan Ramadas, “TCPTrace Manual”, available at http://jarok.cs.ohiou.edu/software/tcptrace/manual/
[17] Information Sciences Institute, U. o. S. C. User Datagram Protocol, August 1980. RFC 768.
[18] Information Sciences Institute, U. o. S. C. Transmission Control Protocol, September 1981.
RFC 793.
[19] K.Ramakrishnan, S.Floyd and D.Black, The Addition of Explicit Congestion Notification
(ECN) to IP, September 2001. RFC 3168.
[20] M.Mathis, J.Mahdavi, S.Floyd and A.Romanow, TCP Selective Acknowledgement Options,
October 1996. RFC 2018.
[21] R.Fielding, J.Gettys, J.Mogul, H.Frystyk, L.Masinter, P.Leach and T.Berners-Lee, Hypertext
Transfer Protocol – HTTP/1.1, June 1999. RFC 2616.
[22] S. Floyd, J. Mahdavi, M. Mathis and M. Podolsky, “An Extension to the Selective Acknowledgement (SACK) Option for TCP”, July 2000. RFC 2883.
[23] W.R. Stevens, “TCP/IP Illustrated Volume I: The Protocols”. Addison-Wesley, 1994.
[24] T.Berners-Lee, R.Fielding and H.Frystyk, “Hypertext Transfer Protocol – HTTP/1.0”, May
1996. RFC 1945.
[25] V.Jacobson, R.Braden and D.Borman, “ TCP Extensions for High Performance”, May 1992.
RFC 1323.
[26] G.R. Wright, and W.R. Stevens,
Addison-Wesley, 1996.
“TCP/IP Illustrated Volume II The Implementation”,
[27] GARR, “GARR - The Italian Academic and Research Network,” http://www.garr.it/garr-b-home-engl.shtml, 2001.
[28] Commissione Reti e Calcolo Scientifico del MURST, “Progetto di Rete a Larga Banda per le
Universitá e la Ricerca Scientifica Italiana”, Tecnical Document CRCS-97/11, 1997, available
at http://www.garr.it/docs/crcs1.shtml
[29] MRTG homepage, http://people.ee.ethz.ch/ oetiker/webtools/mrtg/
[30] GEANT, Pan-European Gigabit Network homepage, cfr. http://www.dante.net/geant/
[31] Internet2 homepage, cfr. http://www.internet2.edu/
[32] Tstat’s Homepage, http://tstat.tlc.polito.it/
[33] M. Mellia, A. Carpani and R. Lo Cigno, “Measuring IP and TCP behavior on Edge Nodes”,
IEEE Globecom 2002, Taipei, Taiwan, November 2002
[34] M. Mellia, A. Carpani, and R. Lo Cigno, “Tstat web page,” http://tstat.tlc.polito.it/, 2001.
[35] V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of Poisson Modeling,” IEEE/ACM
Transactions on Networking, Vol. 3, no. 3, pp. 226–244, Jun. 1995.
[36] B. M. Duska, D. Marwood and M. J. Feeley, “The Measured Access Characteristics of World-Wide-Web Client Proxy Caches”, USENIX Symposium on Internet Technologies and Systems, pp. 23–36, Dec. 1997.
[37] L. Fan, P. Cao, J. Almeida and A. Broder, “Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol,” ACM SIGCOMM ’98, pp. 254–265, 1998.
[38] A. Feldmann, R. Caceres, F. Douglis, G. Glass and M. Rabinovich, “Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments”, IEEE INFOCOM ’99, pp. 107–116, 1999.
[39] B. Mah, “An Empirical Model of HTTP Network Traffic,” IEEE INFOCOM ’97, Apr. 1997.
[40] H. Balakrishnan, M. Stemm, S. Seshan, V. Padmanabhan and R. H. Katz, “TCP Behavior of
a Busy Internet Server: Analysis and Solutions”, IEEE INFOCOM ’98, pp. 252–262, Mar.
1998.
[41] W. S. Cleveland, D. Lin and X. Sun, “IP Packet Generation: Statistical Models for TCP Start Times Based on Connection-Rate Superposition”, ACM SIGMETRICS 2000, pp. 166–177, Jun. 2000.
[42] F. Donelson Smith, F. Hernandez, K. Jeffay and D. Ott, “What TCP/IP Protocol Headers Can Tell Us about the Web,” ACM SIGMETRICS ’01, pp. 245–256, Jun. 2001.
[43] M. E. Crovella and A. Bestavros, “Self Similarity in World Wide Web Traffic: Evidence and
Possible Causes,” IEEE/ACM Transactions on Networking, Vol. 5, no. 6, pp. 835–846, 1997.
[44] V. Paxson, “End-to-end routing behavior in the Internet,” IEEE/ACM Transactions on Networking, Vol. 5, no. 5, pp. 601–615, 1997.
[45] L. Deri and S. Suin, “Effective traffic measurement using ntop”, IEEE Communications Magazine, Vol. 38, pp. 138–143, May 2000.
[46] J. Postel, “Internet Protocol”, RFC 791, Sept. 1981.
[47] J. Postel, “Transmission control protocol”, RFC 793, Sept. 1981.
[48] M. Allman, V. Paxson, and W. Stevens, “TCP Congestion Control”, RFC 2581, 1999
[49] S. Ostermann, tcptrace, 2001, Version 5.2.
[50] V. Jacobson, R. Braden, and D. Borman, “TCP Extensions For High Performance”, RFC
1323, May 1992.
[51] M. Mathis, J. Madhavi, S. Floyd, and A. Romanow, “TCP Selective Acknowledgment Options”, RFC 2018, Oct. 1996.
[52] D. Rossi, C. Casetti, M. Mellia, User Patience and the Web: a Hands-on Investigation, In
Proceedings of IEEE Globecom, San Francisco, CA, Dec. 2003.
[53] R. Khare and I. Jacobs,
http://www.w3.org/Protocols/NL-PerfNote.html
[54] M. Molina et al., Web Traffic Modeling Exploiting TCP Connections’ Temporal Clustering
through HTML-REDUCE, IEEE Network, May 2000
[55] R. Fielding et al., Hypertext Transfer Protocol HTTP/1.1, RFC2616, June 1999
[56] D. Rossi, L. Muscariello, M. Mellia, On the properties of TCP Flow Arrival Process, In
Proceedings of ICC, QoS and Performance Modeling Symposium, Paris, Jun. 2004.
[57] R. Caceres, P. Danzig, S. Jamin and D. Mitzel, Characteristics of Wide-Area TCP/IP Conversations, ACM SIGCOMM, 1991
[58] P. Danzig and S. Jamin, tcplib: A library of TCP Internetwork Traffic Characteristics,
USC Technical report, 1991
[59] P. Danzig, S. Jamin, R. Caceres, D. Mitzel, and D. Estrin, An Empirical Workload Model
for Driving Wide-Area TCP/IP Network Simulations, Internetworking: Research and Experience, Vol.3, No.1, pp.1–26, 1992
[60] V. Paxson, Empirically Derived Analytic Models of Wide-Area TCP Connections, IEEE/ACM
Transactions on Networking, Vol.2, pp.316–336, Aug. 1994
[61] V. Paxson and S. Floyd, Wide-area Traffic: The Failure of Poisson Modeling, IEEE/ACM
Transactions on Networking, Vol.3, No.3, pp.226–244, Jun. 1995
[62] W.E. Leland, M.S. Taqqu, W. Willinger and V. Wilson, On the Self-Similar Nature of Ethernet
Traffic (Extended version), IEEE/ACM Transaction on Networking, Vol.2, No.1, pp.1–15,
Jan. 1994
[63] A. Feldmann, Characteristics of TCP Connection Arrivals, Park and Willinger (editors)
Self-Similar Network Traffic and Performance Evaluation, Wiley-Interscience, 2000
[64] W. Willinger, M.S. Taqqu, R. Sherman and D.V. Wilson, Self-Similarity through High Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level, IEEE/ACM Transaction on Networking, Vol.5, No.1, pp.71–86, Jan. 1997
[65] M. Mellia, A. Carpani and R. Lo Cigno, Measuring IP and TCP behavior on Edge Nodes,
IEEE Globecom, Taipei (TW), Nov 2002
[66] A. Erramilli, O. Narayan and W. Willinger, Experimental Queueing Analysis with LongRange Dependent Packet Traffic, IEEE/ACM Transactions on Networking, Vol.4, No.2,
pp.209–223, 1996
[67] P. Abry and D. Veitch, “Wavelet Analysis of Long Range Dependent Traffic”, Transactions
on Information Theory, Vol.44, No.1 pp.2–15, Jan. 1998
[68] N. Hohn, D. Veitch and P. Abry, , Does Fractal Scaling at the IP Level depend on TCP Flow
Arrival Processes ?, 2nd Internet Measurement Workshop, Marseille, Nov. 2002
[69] S.C. Graves, A.H.G. Rinnooy Kan and P.H. Zipkin, Logistic of Production and Inventory,
Nemhauser and Rinnoy Kan (editors), Handbooks in Operation Research and Management
Science, Vol.4, North-Holland, 1993.
[70] P. Abry, P. Flandrin, M.S. Taqqu and D.Veitch, Self-similarity and Long-Range Dependence
Through the Wavelet Lens, In Long Range Dependence: Theory and Applications, Doukhan,
Oppenheim, 2000
[71] A. Feldmann, A. Gilbert, W. Willinger and T. Kurtz, The Changing Nature of Network Traffic:
Scaling Phenomena, Computer Communication Review 28, No.2, Ap. 1998
[72] M.E. Crovella and A. Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and
Possible Causes, IEEE/ACM Transactions on Networking, Vol.5, No.6, pp.835–846, 1997
[73] I. Norros, On the Use of Fractional Brownian Motion in the Theory of Connectionless Networks, IEEE Journal on Selected Areas in Communications, Vol.13, pp.953–962, Aug. 1995
[74] M. Taqqu and V. Teverosky, Is Network Traffic Self-Similar or Multifractal ?, Fractals, Vol.5,
No.1, pp.63–73, 1997
[75] R.H. Riedi and J. Lévy Véhel, Multifractal Properties of TCP Traffic: a Numerical Study,
IEEE Transactions on Networking, Nov. 1997 (Extended version appeared as INRIA research
report 3129, Mar. 1997)
[76] A. Gilbert, W. Willinger and A. Feldmann, Scaling Analysis of Conservative Cascades, with
Applications to Network Traffic, IEEE Transactions on Information Theory, Vol.45, No.3,
pp.971–992, Apr. 1999
[77] D. Veitch, P. Abry, P. Flandrin and P. Chainais, Infinitely Divisible Cascade Analysis of
Network Traffic Data, IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP’00), 2000.
[78] S.Roux, D.Veitch, P.Abry, L.Huang, P.Flandrin and J.Micheel, Statistical Scaling Analysis of
TCP/IP Data, IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP’01), Special session: Network Inference and Traffic Modeling, May 2001
[79] A. Horváth and M. Telek, A Markovian Point Process Exhibiting Multifractal Behavior
and its Application to Traffic Modeling, 4th International Conference on Matrix-Analytic
Methods in Stochastic Models, Adelaide (Australia), Jul. 2002
[80] A.T. Andersen and B.F. Nielsen, A Markovian Approach for Modeling Packet Traffic with
Long-Range Dependence IEEE Journal on Selected Areas on Communication, Vol.16, No.5,
pp.719–732, 1998.
[81] S. Robert and J.Y. Le Boudec, New Models for Pseudo Self-Similar Traffic, Performance
Evaluation, Vol.30, No.1-2, pp.57–68, 1997.
[82] S. Robert and J.Y. Le Boudec, Can Self-Similar Traffic be Modeled by Markovian Process ?,
International Zurich Seminar on Digital Communication, Feb. 1996
[83] A. Reyes Lecuona, E. González Parada, E. Casilari, J.C. Casasola and A. Díaz Estrella, A
Page-Oriented WWW Traffic Model for Wireless System Simulations, ITC16, Edinburgh, Jun.
1999
[84] P. Giaccone, L. Muscariello, M. Mellia, D. Rossi, The performance of Switch under Real
Traffic, In Proceedings of HPSR04, Phoenix, AZ, April 2004.
[85] A. Feldmann, A.C. Gilbert and W. Willinger, “Data Networks as Cascades: Investigating the
Multifractal Nature of Internet WAN Traffic”, ACM SIGCOMM’98, Boston, Ma, pp. 42–55,
Sep. 1998.
[86] I. Norros, “ A storage model with self-similar input”, Queueing Systems, Vol. 16, pp. 387–
396, 1994.
[87] M.S. Taqqu, “Fractional Brownian Motion and Long Range Dependence”, Theory and
Application of Long-Range Dependence P.Doukhan, G. Oppenheim, M.S. Taqqu Editors,
2002.
[88] L. Muscariello, M. Mellia, M. Meo, R. Lo Cigno, M.Ajmone Marsan, “A Simple Markovian Approach to Model Internet Traffic at Edge Routers”, COST279, Technical Document,
TD(03)032, May 2003
[89] H. J. Chao, “Next generation routers”, Proceedings of the IEEE, Vol. 90, No. 9, pp. 1518–
1558, Sep. 2002,
[90] M. Ajmone Marsan, A. Bianco, P. Giaccone, E. Leonardi and F. Neri, “Packet-mode scheduling in input-queued cell-based switches”, IEEE/ACM Transactions on Networking, Vol. 10,
No. 5, pp. 666–678, Oct. 2002
[91] N. McKeown and T. E. Anderson, “A Quantitative Comparison of Scheduling Algorithms
for Input-Queued Switches”, Computer Networks and ISDN Systems, Vol. 30, No. 24, pp.
2309–2326, Dec. 1998.
[92] N. McKeown, A. Mekkittikul, V. Anantharam and J.Walrand, “Achieving 100% Throughput
in an Input-Queued Switch”, IEEE Transactions on Communications, Vol. 47, No. 8, pp.
1260–1272, Aug. 1999,
[93] N.McKeown,“The iSLIP scheduling algorithm for input-queued switches”, IEEE/ACM
Transactions on Networking, Vol. 7, No. 2, Aug.1999, pp.188-201
[94] B. V. Rao, K. R. Krishnan and D. P. Heyman, “Performance of Finite Buffer Queues under
Traffic with Long-Range Dependence,” Proc. IEEE Globecom, Vol. 1, pp. 607–611, Nov.
1996.
[95] DiaNa, http://www.tlc-networks.polito.it/diana
[96] Perl, http://www.perl.com/
[97] R. Jain and I. Chlamtac, The P-Square Algorithm for Dynamic Calculation of Percentiles and
Histograms without Storing Observations, Communications of the ACM, Oct. 1985,
[98] S. Hopkins, Camels and Needles: Computer Poetry Meets the Perl Programming Language,
Usenix, 1992
[99] L. Wall, T. Christiansen and J. Orwant, Programming Perl, O’Reilly Ed., 3rd Edition, Jul. 2000
[100] D. M. Beazley, D. Flecther and D. Dumont, Perl Extension Building with SWIG, USENIX,
1998
[101] Comprehensive Perl Archive Network, cfr. http://www.cpan.org
[102] Perl Data Language, cfr. http://pdl.perl.org
[103] J. E. Friedl, Mastering Regular Expressions, O’Reilly Ed., 2nd Edition, Jul. 2002
[104] Obfuscated Perl Contest, The Perl Journal, cfr http://www.tpj.com/
[105] Gnuplot, cfr. http://www.gnuplot.info
[106] G. Baiocchi, Using Perl for Statistics: Data Processing and Statistical Computing, Journal
of Statistical Software, Vol. 11, 2004.
[107] B. W. Kernighan and C. J. Van Wyk, Timing Trials, or, the Trials of Timing: Experiments
with Scripting and User-Interface Languages, Technical Report, Nov. 97
[108] L. Prechelt, An Empirical Comparison of Seven Programming Languages, IEEE Computer,
Oct. 2000.
[109] ActiveState, http://www.activestate.com
[110] D. Rossi and I. Stoica, Gambling Heuristics and Chord Routing, Submitted to IEEE Globecom’05, St. Louis, MO, November 2005.
[111] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek and H.
Balakrishnan, “Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications,”
IEEE/ACM Transactions on Networking, Vol. 11, No. 1, pp. 17-32, Feb. 2003.
[112] T. Oetiker, RRDtools website http://people.ee.ethz.ch/ oetiker/webtools/rrdtoo
[113] P. Arlos, M. Fiedler and A. A. Nilsson, “A Distributed Passive Measurement Infrastructure”,
Passive and Active Measurements Workshop, Boston, USA 2005.
[114] WETMO, a WEb Traffic MOdule for the Network Simulator ns-2 is available at
http://www.tlc-networks.polito.it/wetmo/
[115] D. Rossi, “A Simulation Study of Web Traffic over DiffServ Networks”, Master Thesis, International Archives, available at http://www.tlc-networks.polito.it/rossi/DRossi-Thesis.ps.gz
[116] D. Rossi, C. Casetti, M. Mellia, “A Simulation Study of Web Traffic over DiffServ Networks”, In Proceedings IEEE Globecom 2002, Taipei, TW, November 2002.