PHD THESIS

AGH
University of Science and Technology in Krakow
Faculty of Electrical Engineering, Automatics, Computer Science
and Biomedical Engineering
DEPARTMENT OF APPLIED COMPUTER SCIENCE
PHD THESIS
PRZEMYSŁAW BEREZIŃSKI, M.Sc. Eng.
ENTROPY-BASED NETWORK ANOMALY DETECTION
SUPERVISOR:
Marcin Szpyrka, Ph.D., D.Sc.
AUXILIARY SUPERVISOR:
Bartosz Jasiul, Ph.D., Lt. Col.
Krakow 2015
Akademia Górniczo-Hutnicza
im. Stanisława Staszica w Krakowie
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
DEPARTMENT OF APPLIED COMPUTER SCIENCE
DOCTORAL DISSERTATION
PRZEMYSŁAW BEREZIŃSKI, M.Sc. Eng.
DETECTION OF ANOMALIES IN NETWORK TRAFFIC USING ENTROPY MEASURES
(Polish title: Detekcja anomalii w ruchu sieciowym z wykorzystaniem miar entropijnych)
SUPERVISOR:
Marcin Szpyrka, Ph.D., D.Sc., AGH Professor
AUXILIARY SUPERVISOR:
Lt. Col. Bartosz Jasiul, Ph.D., Eng.
Kraków 2015
Working on the Ph.D. has been a wonderful but sometimes overwhelming
experience. I would like to express my sincere gratitude to all those who
gave me the opportunity to complete this Thesis.
First and foremost, I would like to thank my supervisors, prof. Marcin
Szpyrka and dr Bartosz Jasiul, for enabling and supporting the preparation of this
Dissertation and for ensuring freedom of work. Their guidance helped me
throughout the research and the writing of this Thesis.
Besides my supervisors, I would like to thank dr Joanna Śliwa and dr Rafał
Piotrowski for the opportunity to work on many interesting cyber security
projects. Special thanks go to my labmates: dr Marek Małowidzki, Tomasz
Dalecki, Michał Mazur and Robert Goniacz for their contribution to the
software implemented during this research and for inspiring discussions, not
only about cyber security.
Last but not least, I would like to thank my family, my wife Marzena and my
sons for all their love and encouragement. My sincere thanks also go to my
Mother for motivating me throughout my life.
Przemysław Bereziński
Abstract
This Dissertation focuses on the application of anomaly detection in the field of network intrusion
detection. This is a very important issue, as the number of cyber-attacks is alarmingly high and, to
make things worse, it increases each year. Partially, this is due to the fact that widely used security
solutions are ineffective against modern malicious software (malware). Damage from malware,
especially malware that operates in botnets, can take many serious forms, including loss of important
data, reputation or money. Typically, a botnet is a group of infected hosts (bots) operated by cybercriminals who are focused on making money. Recently, botnets have also been used in cyber warfare to
conduct sabotage and espionage.
Network anomaly detection is a very broad and heavily explored area. The first methods were
proposed almost 40 years ago, but the problem of finding a generic network anomaly detection
method still remains unsolved. Dedicated methods for different types of network anomalies caused
by malware can be found in the literature. Recently, entropy-based methods for detection of various
types of anomalies have gained a lot of attention. The use of entropy to detect botnet-like malware
has not been investigated so far.
The main goal of this Dissertation is to prove that the entropy-based approach is suitable for detection
of modern botnet-like malware in local networks and thus can be used to complement existing
signature-based solutions. In order to reach this goal and prove the claim of the Thesis, the Dissertation makes several original contributions. A comparison of different entropy measures for use
in network anomaly detection is provided. An original network anomaly detection method based on
parameterized entropies and supervised machine learning is proposed, implemented and verified
with a representative semi-synthetic dataset prepared for this purpose due to the lack of realistic,
complete and up-to-date datasets. Moreover, an analysis of proper parameters, suitable network features and the right classifier to use with the method is conducted. Results of the verification
show that the proposed method, with parameterized Renyi or Tsallis entropy acting together with
a classifier based on logistic regression, makes it possible to detect botnet-like malware with a satisfactory
detection rate while keeping a low rate of false alarms. Comparable detection based on Shannon
entropy or volume counters (number of flows, packets and bytes) turns out to be ineffective.
Streszczenie (Summary)
This doctoral dissertation concerns anomaly detection in the area of network intrusion detection.
The topic is very important, since the number of cyber-attacks carried out is alarmingly high and,
worse still, grows from year to year. This is partly because commonly used cyber protection solutions
are ineffective at detecting current malicious software. Damage caused by such software, especially
software operating within botnets, includes loss of data, reputation or money. Typically, a botnet is
a group of infected hosts (bots) controlled by cyber-criminals for financial gain. Currently, botnets
are also used in cyber warfare for sabotage and espionage.
Network anomaly detection is a broad and heavily explored topic. The first methods appeared almost
40 years ago, but the problem of finding generic methods has not been solved so far. There are
methods dedicated to specific types of anomalies related to malicious software, including methods
based on entropy measures, which have recently become very popular. However, no one has so far
applied them to the detection of botnet-like malware.
The main goal of this dissertation is to prove that the use of entropy measures allows botnet-like
malware to be detected in local networks and that this approach can be used to complement the
signature-based methods currently in use. To confirm this thesis, the dissertation presents original
contributions to the current state of knowledge. Several entropy measures were compared with regard
to their use in network anomaly detection. An original method based on parameterized entropies and
supervised machine learning was proposed, implemented and verified. The verification was performed
on the author's own representative dataset, since the available datasets turned out to be unrealistic,
incomplete and outdated. Additionally, analyses were carried out regarding proper parameter values,
suitable network traffic features and an appropriate classifier for the proposed method. The
effectiveness study showed that the method using parameterized Renyi or Tsallis entropy together with
a classifier based on logistic regression allows effective detection of anomalies related to botnet-like
malware while keeping a low level of false alarms. Corresponding detection based on Shannon entropy
or on a volume-based approach relying on simple counters such as the number of flows, packets and
bytes turns out to be ineffective.
Contents
Abstract
Streszczenie
1. Introduction
1.1. Motivation, Scope and Research Problem
1.2. Goal and Plan of the Work
1.3. Original contribution
1.4. Exclusions
2. Related work
2.1. General overview of network anomaly techniques
2.2. Closely related work
2.2.1. Detection via network volume counters
2.2.2. Detection via network feature distributions
2.3. Existing Datasets
2.4. Summary
3. Entropy-based network anomaly detector – preface
3.1. Main features
3.2. Classification of the approach
4. Entropy
4.1. Shannon entropy
4.2. Parameterized entropy
4.3. Comparison
4.3.1. Binomial distribution
4.3.2. Uniform distribution
4.3.3. Impact of frequent and rare events
4.3.4. Entropy of exemplary distributions
5. Network flows
5.1. Flows vs. packets
5.2. Flow export
5.2.1. Operating principle
5.2.2. Problems and difficulties
5.3. NetFlow export setup
6. Entropy-based network anomaly detector
6.1. Architecture
6.2. Implementation
7. Dataset
7.1. Origin of the idea
7.2. Legitimate traffic
7.3. Scenario 1
7.4. Scenario 2
7.5. Scenario 3
7.6. Anomaly generator
8. Verification of the approach
8.1. Correlation
8.2. Performance evaluation
8.3. Conclusions
9. Conclusions and further work
9.1. Conclusions
9.2. Further work
9.2.1. On-line analysis in a real environment
9.2.2. Multi-classifier
9.2.3. Multi-label approach
9.2.4. Dataset
9.3. Publications
P. Bereziński Entropy-based Network Anomaly Detection
List of Abbreviations
ACC – Accuracy
AUC – Area Under a Curve
BDR – Bayesian Detection Rate
CEP – Complex Event Processing
CybOX – Cyber Observable Expression
DDoS – Distributed Denial of Service
DNS – Domain Name System
DoS – Denial of Service
FDR – False Discovery Rate
FN – False Negative
FNR – False Negative Rate
FP – False Positive
FPR – False Positive Rate
HIDS – Host-based Intrusion Detection System
ICMP – Internet Control Message Protocol
IDS – Intrusion Detection System
IP – Internet Protocol
IPFIX – IP Flow Information Export
IRC – Internet Relay Chat
NIDS – Network-based Intrusion Detection System
NPV – Negative Predictive Value
NTP – Network Time Protocol
P2P – Peer-to-Peer
PCA – Principal Component Analysis
PPV – Positive Predictive Value
PR – Precision-Recall
RDP – Remote Desktop Protocol
ROC – Receiver Operating Characteristic
RPC – Remote Procedure Call
SNMP – Simple Network Management Protocol
SQL – Structured Query Language
STIX – Structured Threat Information Expression
TCP – Transmission Control Protocol
TN – True Negative
TNR – True Negative Rate
TP – True Positive
TPR – True Positive Rate
UDP – User Datagram Protocol
1. Introduction
This chapter introduces the reader to the subject of the Thesis. It is divided into four sections.
Section 1.1 presents motivation, scope and briefly describes the research problem. It shows why it is
an important issue in the field of Computer Science. Section 1.2 specifies the main goal of the research
and presents the steps taken to reach it. It familiarizes the reader with the outline of this
Dissertation and presents contents of subsequent chapters. Section 1.3 emphasizes those results of the
Thesis that are considered as the original contribution. Section 1.4 discusses issues that are deliberately
not addressed in this research.
1.1. Motivation, Scope and Research Problem
Data mining is an interdisciplinary subfield of Computer Science involving methods at the
intersection of artificial intelligence, machine learning and statistics [HTF09]. One of the data mining
tasks is anomaly detection, which is the analysis of large quantities of data to identify items, events
or observations which do not conform to an expected pattern. Anomaly detection is applicable in
a variety of domains, e.g. fraud detection [PLSG10], fault detection [Nai09] and system health monitoring [MSOS07], but this Dissertation focuses on the application of anomaly detection in the field of
network intrusion detection. The first anomaly detection method for intrusion detection was proposed almost 40 years ago by Denning [Den87]. Today network anomaly detection is a very broad
and heavily explored subject, but the problem of finding a generic method for a wide range of network anomalies is still unsolved. Anomaly detectors face several challenges that have to
be addressed, chiefly high false alarm rates, long computation time, tuning and
calibration, and root-cause identification [Bra10]. Because of that, anomaly detection techniques are
rarely implemented in commercial Intrusion Detection Systems (IDS). Such systems mostly make
use of the common signature-based (or misuse-based) technique. This approach is known for its
shortcomings [LDZ05], [CLLL12], [GOB11], [JSl14a], [JSl14b]. Signatures describe only illegal patterns in network traffic, so prior knowledge is required [LDZ05]. Signature-based solutions do not cope
with evasion techniques or with attacks that are not yet known (0-days) [CLLL12], [JSl14a], [JSl14b]. Moreover,
they are unable to detect a specific attack until a rule for the corresponding vulnerability is created,
tested, released and deployed, which usually takes some time [GOB11]. As the widely used intrusion
detection systems are often ineffective against modern malicious software (malware), proper network anomaly detection, as one of the possible solutions to complement the signature-based approach, is
essential. Recently, entropy-based methods which rely on network feature distributions have been of
great interest [Eim08], [WP05], [NSA+08], [Tel12], [YKW11], [KBHJ08]. It is crucial to check whether
the entropy-based approach can successfully detect anomalous network activity caused by modern
botnet-like malware [HP14]. This is an important issue, as the amount of such malware, as well
as the level of its sophistication, increases each year [Sop14]. A botnet is a group of infected hosts (bots)
controlled by Command and Control (C&C) servers operated by cyber-criminals. According to recent
reports provided by cyber security organizations [Ver14], [Sym14], [Cer13], [Sop14], botnets are one of the
most sophisticated and popular types of cybercrime today. Damage from such malware can take many
serious forms, including loss of important data, reputation or money. Moreover, botnets are nowadays also
used in cyber warfare to conduct sabotage and espionage [SK14]. The entropy-based approach to detecting
anomalies caused by botnet-like malware in local networks has not been investigated so far. Some entropy-based
methods proposed in the past, e.g. [TBSM09], [YKW11], [NSA+08], deal with massive spreads
of rather old, non-botnet-like worms and different types of Distributed Denial of Service (DDoS) attacks
in high-speed backbone networks controlled by Internet Service Providers (ISP). In the work presented
in this Dissertation we have tried to find the best way of using entropy in order to properly detect and
categorize network anomalies which indicate the existence of botnet-like malware in local networks. Anomalies
of this type are often very small and hidden in the network traffic volume expressed by the number
of flows, packets or bytes, so their detection via popular solutions and methods which rely mostly on
traffic volume changes, e.g. [NfS], [BKPR02], [MSHJSC+04], [Nto], is highly difficult.
1.2. Goal and Plan of the Work
The main goal of this Dissertation is to prove that the entropy-based approach is suitable for detection
of modern botnet-like malware in local networks based on network anomalies characteristic of such
malware. We will try to answer the following questions:
– Are entropy measures useful in the context of network anomaly detection?
– Is it possible to effectively detect and classify small and low-rate anomalies connected with botnet-like malware activity in local networks by means of entropy?
– Is the entropy-based approach better than the traditional volume-based approach?
– Do parameterized entropies help to improve results obtained for Shannon entropy?
– What is the proper set of parameters for entropies to successfully detect network anomalies?
– Which network features should be taken into consideration in order to detect a broad spectrum of
anomalies connected with botnet-like malware?
– Which popular classifiers work well with the entropy-based approach?
It is assumed that the goal of this work can be reached in the following steps:
1. Preparation of a concept of original entropy-based network anomaly detection method.
2. Implementation of the method.
3. Preparation of original dataset (due to the lack of appropriate benchmarking data available).
4. Evaluation of the method.
These steps are discussed in detail in the remainder of the Thesis, which is organized as follows:
– Chapter 2 reviews related work in the area of network anomaly detection. A general overview of the
latest advances in this broad subject as well as a detailed review of anomaly detection techniques
that are closely related to the approach proposed in this Dissertation are presented. Additionally,
some comments on existing datasets for evaluating network anomaly detection systems are included.
– Chapter 3 provides a brief overview of the approach taken to prove the Thesis and introduces the
reader to the proposed method. The main features as well as a general classification of the method
are presented.
– Chapter 4 introduces the definition of Shannon entropy and describes the Renyi and Tsallis generalizations. A brief overview as well as a simulation-based comparison of entropy measures is provided.
– Chapter 5 describes the concept of network flows and compares this technique with the
widely used packet-based approach. Additionally, the NetFlow [Cla04] export setup prepared to
interact with the proposed method is presented.
– Chapter 6 presents the architecture of the proposed method. A detailed specification as well as the results
of the implementation are given.
– Chapter 7 describes the dataset developed to evaluate the performance of the proposed method.
– Chapter 8 presents the results of the verification of the method.
– Chapter 9 closes this Dissertation with conclusions and a short summary. It also outlines
future work.
1.3. Original contribution
The approach proposed in this Dissertation is superior to the state of the art in several aspects. The
following issues are considered to be the original contribution of the Thesis:
1. The use of the entropy-based approach to detect botnet-like malware in local networks.
2. Concept and implementation of an original entropy-based network anomaly detection method.
3. Comparison of different entropy measures for use in entropy-based network anomaly detection.
4. Selection of a proper set of α-values for parameterized entropies and a proper set of network features
to successfully detect various network anomalies.
5. Comparison of performance of different classifiers to work with the proposed method.
6. Comparison of entropy measures with volume-based counters to use in network anomaly detection.
7. Preparation of the original dataset which includes anomalies specific for network activity of modern botnet-like malware.
8. Detailed performance evaluation of the method by means of both standard and novel (introduced
for the purpose of this Thesis) metrics.
1.4. Exclusions
Network anomaly detection is a broad topic. Some of the issues that are deliberately not addressed
in this Thesis are presented below.
1. This Thesis does not cover the detection of anomalies or attacks visible in IP packets
and their payloads. This is mainly because such anomalies are easily detectable with the
signature-based approach, provided that the attack is known and the network traffic is not encrypted.
2. There is no empirical evaluation of the proposed method working on-line in a real environment, since
it is planned as future work.
3. There is no comparison of the method with other summarization techniques such as histograms
or sketches in this Thesis. The main reason is the lack of publicly available implementations of these
methods. Moreover, such a comparison would be difficult and the results could be inaccurate, since the
performance of these methods strongly depends on proper tuning.
4. There is no evaluation of the proposed method with a publicly available dataset, as at the time of preparing
this Thesis none of them met all necessary requirements such as completeness, timeliness and
correctness.
This Thesis has been partially supported by the Polish National Centre for Research and Development, under the project no. PBS1/A3/14/2012 SECOR and the project no. 01.01.02-00-062/09 CybSecLab and by the European Regional Development Fund the Innovative Economy Operational Programme, under the project no. 01.01.02-00-062/09 INSIGMA.
2. Related work
This chapter reviews related work in the area of network anomaly detection. The chapter starts with
a general overview of the latest advances in this broad subject. Then, more details on anomaly detection
techniques that are closely related to the approach proposed in this Dissertation are presented and comments are provided. Finally, some remarks on existing datasets for evaluating network anomaly detection
systems are given.
2.1. General overview of network anomaly techniques
The problem of anomaly detection in network traffic has been extensively studied.
There are many surveys, review articles, as well as books on this broad subject.
A great deal of research on anomaly detection techniques can be found in several books,
e.g. [WFH11], [BK13], [Agg13], [HTF09]. In surveys such as [CBK09], [HA04], the authors discuss
anomaly detection in general and cover the network intrusion detection domain only briefly. In several
review papers [ETGTDV04], [PP07], [Cal09], [CKS+09], [GTDVMFV09], various network anomaly
detection methods have been summarized. A recent, well-structured and comprehensive survey of
anomaly-based network intrusion detection, covering a general overview, techniques, systems, tools and
datasets with a discussion of challenges and recommendations, is presented by Bhuyan et al. [BBK13].
The review of network intrusion detection by Sperotto et al. [SSS+10], which provides a valuable comparison of
the packet-based and flow-based approaches, is another paper worth mentioning.
From the aforementioned surveys it follows that the most effective methods of network anomaly
detection include Principal Component Analysis, Wavelets, Markovian models, Clustering, Histograms,
Sketches, and Entropies. To familiarize the reader with these techniques and to facilitate understanding
of Section 2.2, a short description of each of them is presented below.
Principal Component Analysis (PCA) is a popular dimension reduction technique in machine learning [HNG+07], [SCSC03], [LYW13]. PCA transforms a set of correlated random variables to a new
coordinate system that is given by the principal components. Simply speaking, PCA is a technique where
a set of correlated random variables is transformed into a smaller set of uncorrelated ones. The uncorrelated
variables are linear combinations of the original ones and can be used to express the data in a reduced
form.
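The decorrelating effect of PCA can be illustrated with a minimal sketch for the special case of two variables, where the principal axes have a closed form. The function names (pca_2d, covariance) and the synthetic flow and packet counters are assumptions made for illustration only; they are not taken from the cited works.

```python
import math
import random

def pca_2d(points):
    """Rotate 2-D points onto their principal axes (closed form for two variables)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    c, s = math.cos(theta), math.sin(theta)
    # Each new coordinate is a linear combination of the original variables.
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

# Hypothetical, strongly correlated counters: flows and packets per time window.
random.seed(0)
flows = [random.gauss(100, 20) for _ in range(200)]
packets = [10 * f + random.gauss(0, 15) for f in flows]

rotated = pca_2d(list(zip(flows, packets)))
first = [u for u, _ in rotated]    # first principal component
second = [v for _, v in rotated]   # second principal component
```

Keeping only the first component expresses the two correlated counters in a reduced, one-dimensional form; the second component carries the remaining, much smaller variance.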
Wavelet transformation is one of the techniques of time-frequency transformations [LG09], [LTG08], [LWK10]. It is used for analyzing localized variations of power within
a time series. By decomposing a time series into time–frequency space, one is able to find the dominant
modes of variability and determine how those modes vary in time. There are some important differences
between the well-known Fourier analysis [YZX+04] and wavelets. Fourier functions are localized in frequency but not in time. Small frequency changes in the Fourier transform will produce changes everywhere
in the time domain. Wavelets are local in both frequency and time. This localization is an advantage in
many cases.
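The localization property can be made concrete with a minimal sketch of a single level of the Haar wavelet transform, the simplest wavelet. The choice of the Haar wavelet, the function name haar_step and the sample signal are assumptions made for illustration; the cited works may use other wavelet families.

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform of an even-length signal.

    Returns (approximation, detail): pairwise scaled sums and differences.
    """
    scale = math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approximation = [(a + b) / scale for a, b in pairs]
    detail = [(a - b) / scale for a, b in pairs]
    return approximation, detail

# Hypothetical traffic volume per time slot with a single short spike at slot 5.
signal = [10.0] * 16
signal[5] = 50.0

approximation, detail = haar_step(signal)
# Positions of non-zero detail coefficients show where the abrupt change happened.
spike_positions = [i for i, d in enumerate(detail) if abs(d) > 1e-9]
```

Only the detail coefficient covering slots 4 and 5 is non-zero, so the transform shows both that an abrupt change occurred and where in time it occurred.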
Markov models are very useful for modeling sequences [YZB04], [SZH+13]. For a given system,
a Markov model consists of a list of possible states, possible transition paths between those states and
rate parameters of those transitions. The simplest Markov model is a Markov chain. It models the state
of a system with a random variable that changes through time. The distribution for this variable depends
only on the distribution of the previous state. A hidden Markov model [JP05] is a Markov chain for which
the state is only partially observable. In other words, observations are related to the state of the system,
but they are typically insufficient to precisely determine the state.
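As an illustration of how a Markov chain can model sequences, the following minimal sketch estimates transition probabilities from an observed state sequence and scores new sequences against them. The state names and the helper functions fit_transitions and sequence_probability are hypothetical and serve only as an example.

```python
from collections import Counter, defaultdict

def fit_transitions(sequence):
    """Estimate first-order transition probabilities from an observed state sequence."""
    counts = defaultdict(Counter)
    for previous, current in zip(sequence, sequence[1:]):
        counts[previous][current] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

def sequence_probability(model, sequence):
    """Probability of the transitions in `sequence`; unseen transitions get probability 0."""
    probability = 1.0
    for previous, current in zip(sequence, sequence[1:]):
        probability *= model.get(previous, {}).get(current, 0.0)
    return probability

# Hypothetical training sequence of observed packet types.
training = ["SYN", "ACK", "DATA", "DATA", "FIN", "SYN", "ACK", "DATA", "FIN"]
model = fit_transitions(training)

p_normal = sequence_probability(model, ["SYN", "ACK", "DATA", "FIN"])
p_odd = sequence_probability(model, ["SYN", "SYN", "SYN"])
```

A sequence that contains transitions never seen in training, such as repeated SYN states, receives probability zero and can be flagged as anomalous.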
Cluster analysis (or clustering) is a technique used to group objects of a similar kind into respective
categories [SPBW12], [REHA13], [BSS+14]. This technique is based on unlabeled data. In machine
learning, methods that use labeled samples are said to be supervised and methods which rely on unlabeled
samples are said to be unsupervised [Alp10]. Clustering can be achieved by various algorithms that
differ significantly in their notion of what constitutes a cluster and how clusters are identified. Usually,
clustering-based techniques require distance computation between a pair of objects.
Histograms, sketches and entropy-based approaches are methods that summarize random variable distributions, e.g. the distribution of addresses or ports in the domain of network anomaly detection.
Histogram-based methods divide the entire range of values of a distribution into a series of small intervals called bins [KSD09], [SST+04]. The sketch-based approach relies on a set of histograms where the
elements are assigned to bins using a set of different hash functions [SLBK08], [BDWS09]. Entropy is
a measure of the uncertainty connected with a random variable [Sha48]. In general, the more random the
variable, the higher the entropy. Entropy summarizes a probability distribution with a single value, which
can be conveniently used to compare certain qualitative differences of probability distributions. Entropy
fits network anomaly detection well, because some attacks or anomalies result in concentrating or
dispersing probability distributions of network features [NSA+08], [TBSM09].
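A minimal sketch of how entropy summarizes a feature distribution with a single value is given below. It assumes Shannon entropy with a base-2 logarithm computed over the empirical distribution of destination ports; the port values and the function name shannon_entropy are illustrative only.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical destination ports seen in one time window of ordinary traffic.
normal_ports = [80] * 6 + [443] * 3 + [53]
# Hypothetical port scan: every flow targets a different port (dispersed distribution).
scan_ports = list(range(1000, 1010))

h_normal = shannon_entropy(normal_ports)
h_scan = shannon_entropy(scan_ports)
```

The dispersed distribution produced by the scan has a higher entropy than the ordinary traffic, while a distribution concentrated on a single value has entropy zero.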
2.2. Closely related work
In this section a closer look is taken at works strictly related to the approach proposed in this Dissertation.
A review of detection methods based on summarizing network feature distributions via entropy, histograms and sketches is provided. Special attention is devoted to the methods employing different forms
of entropy. Some comments on the gaps noticed are given. The section starts with a comparison
of the network feature distributions approach to the older but still more popular detection via network
volume counters.
2.2.1. Detection via network volume counters
In the past, network anomalies were treated as deviations in the traffic volume. Simple counters such
as number of flows, packets (total, forwarded, fragmented, discarded) and bytes (per packet, per second) were used. These counters can be derived from network devices via Simple Network Management
Protocol (SNMP) [HPW02] or NetFlow [Cla04], [SBCQ09].
Barford et al. [BKPR02] presented wavelet analysis to distinguish between predictable and anomalous traffic volume changes using a very basic set of counters from NetFlow and SNMP data. They used
the advanced signal analysis technique combined with very simple metrics, i.e. number of flows, packets
and bytes. The authors reported some positive results in detection of high-volume anomalies such as
network failure, bandwidth flood and flash crowd.
Kim et al. [MSHJSC+ 04] proposed a method where many different Distributed Denial of Service
(DDoS) attacks are described in terms of traffic patterns in flow characteristics. In particular, the authors
focused on counters such as the number of flows, packets and bytes, the flow and packet sizes, the average flow size
and the number of packets per flow. In the presented TCP SYN flood example, the following pattern was
applied: a large number of flows, yet a small number of small packets and no constraints on the
bandwidth and the total number of packets. This pattern differs significantly from the one generated for
an ICMP/UDP flooding attack, where high bandwidth consumption and a large number of packets are
involved. Although the authors reported some good results, they also mentioned that common legitimate
peer-to-peer (P2P) traffic may result in some false alarms in their approach.
A threshold-based detector measuring the deviation from a mean value present in a traffic collection algorithm for frequent collection of SNMP data was proposed by Lee et al. [LPKL09]. To assess
the algorithm, the authors examined how it impacts detection of volume anomalies. Only some minor
differences were reported in comparison to the original traffic collection algorithm.
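A threshold-based volume detector of the kind discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sliding window length, the factor k = 3 and the counter values are hypothetical, and the code is not the algorithm of Lee et al.

```python
import statistics


def volume_alarms(samples, window=10, k=3.0):
    """Return indices whose value deviates by more than k standard
    deviations from the mean of the preceding `window` samples."""
    alarms = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        # Guard against a zero deviation in a perfectly flat history.
        if abs(samples[i] - mean) > k * max(std, 1e-9):
            alarms.append(i)
    return alarms


# Hypothetical bytes-per-interval counter with one burst at index 12.
counter = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 102, 500, 100]
alarms = volume_alarms(counter)
```

Such a detector reacts to the abrupt burst, but an anomaly that leaves the counter within its usual range produces no alarm, which is the limitation of volume counters discussed in this subsection.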
Casas et al. [CFVN09] introduced an anomaly detection algorithm based on SNMP data which
deals with abrupt and large traffic changes. The authors proposed a novel linear parsimonious model
for anomaly-free network flows. This model makes it possible to treat the legitimate traffic as a nuisance
parameter, to remove it from the detection problem and to detect the anomalies in the residuals. Authors
reported that with this approach they slightly improved the previously introduced approach based on
PCA in terms of false alarms.
Many commercial and open source solutions that rely on SNMP or NetFlow counters are available
on the market, e.g. NfSen [NfS], NtopNg [Nto], Plixer Scrutinizer [Scr], Paessler PRTG [Prt], and SolarWinds Network Traffic Analyzer [Sol]. All of them provide more or less the same functionality:
– browsing and filtering network data;
– statistics overview, e.g. top-talkers, i.e. hosts or services that exchanged most traffic;
– reporting, e.g. bandwidth reports, i.e. which user exchanged how much traffic;
– alerting when traffic thresholds are exceeded or some rules describing anomalous behavior are
matched.
Several solutions available on the market, e.g. Invea-Tech FlowMon [Floc] or AKMA Labs FlowMatrix [Floa], offer some anomaly detection methods which mostly rely on a predefined set of rules for
detecting undesirable behavior patterns, and on some simple long-term network behavior profiles in
terms of services, traffic volume and communication sides. Although vendors classify their solutions
as anomaly detection, the use of rule-based heuristics describing well-known patterns corresponds more to
the signature-based approach.
To conclude this subsection, although there are many methods that rely on counters,
their capabilities are limited. The main problem with the counter-based approach is that it mostly relies on traffic volume. Nowadays, many network attacks or anomalies such as low-rate DDoS, stealth scanning
or botnet-like worm propagation and communication do not result in a substantial traffic volume change.
The presented counter-based methods handle large and abrupt traffic changes well, such as bandwidth
flooding attacks or flash crowds, but a large group of anomalies which do not cause changes of volume remains undetected. Moreover, there is also a practical issue connected with counters, reported by
Brauckhoff et al. [BTW+ 06], who stated that packet sampling, used by many routers to save resources
when collecting data, can influence counter-based anomaly detection metrics, but does not significantly
affect the distribution of network features.
2.2.2. Detection via network feature distributions
Network anomaly detection via network feature distributions is becoming more and more popular.
Several feature distributions, i.e. header-based (addresses, ports, flags), volume-based (host or service
specific percentage of flows, packets and bytes) and behavior-based (in/out connections for particular
host) have been suggested in the past [LCD05], [NSA+ 08], [TBSM09]. However, it is unclear which
network feature distributions perform best. Nychis et al. [NSA+ 08], based on their results of pairwise correlation, reported dependencies between addresses and ports and recommended the use of volume-based and
behavior-based feature distributions. In contrast, Tellenbach et al. [TBSM09] found no correlation among
header-based features. In this Dissertation, original results of network feature correlation are presented and some interesting conclusions are given.
Shannon Entropy
Entropy as the measure of uncertainty can be used to summarize feature distributions in a compact
form, i.e. single number. Many forms of entropy exist, but only a few have been applied to network
anomaly detection. The most popular is the well-known Shannon entropy [Sha48]. The application of Shannon measures such as relative entropy and conditional entropy to network anomaly detection
was proposed by Lee and Xiang [LX01]. Also, Lakhina et al. [LCD05] made use of Shannon entropy
to sum up feature distributions of network flows. By using unsupervised learning, the authors showed
that anomalies can be successfully clustered. Wagner and Plattner [WP05] made use of the Kolmogorov
Complexity, which is related to Shannon entropy [GV03], [TMSA11], in order to detect worms in network traffic. Their work mostly focuses on implementation aspects and scalability and does not propose
any specific analysis techniques. The authors reported that the method is able to detect worm outbreaks
and massive scanning activities in near real time. Ranjan et al. [RSN+ 07] suggested another worm detection algorithm which measures Shannon entropy ratios for traffic feature pairs and issues an alarm on
sudden changes. Gu et al. [GMT05] made use of Shannon maximum entropy estimation to estimate the
network baseline distribution and to give a multi-dimensional view of network traffic. The authors claim
that with their approach they were able to distinguish anomalies that change the traffic either abruptly
or slowly. Iglesias et al. [IZ14] proposed a fast, lightweight method to distinguish different attack types
observed in the IP darkspace monitor. The method is based on Shannon entropy measures of network features and machine learning techniques. The explored data belongs to a portion of the Internet background
radiation from a large IP darkspace.
Generalized entropy
Besides Shannon entropy, several generalizations of entropy have been recently introduced in the
context of network anomaly detection. Eimann in [SEB07], [ESB05], [Eim08] reported some positive
results of using T-entropy [TNS+ 05] for intrusion detection based on analysis of packets. T-entropy can
be estimated from a string complexity measure called T-complexity [TNS+ 05]. String complexity is
the minimum number of steps required to construct a given string. In contrast to entropy, where probabilities (estimated from frequencies) can be permuted, in a complexity-based approach the order matters.
A string is compressed with an algorithm and the output length is used to estimate the complexity. Finally, the complexity becomes an estimate for the entropy. Because the sequence of events
is crucial in this approach, it fits fine-grained methods of network data analysis such as full packet or packet header
inspection. The problem is that this type of inspection does not scale with network speed.
Some details about T-entropy are presented in our paper [PBPC12]. A parameterized generalization of
entropy has also been recently reported as very promising. The Shannon entropy assumes a tradeoff
between contributions from the main mass of the distribution and the tail. With the parameterized Tsallis [Tsa88] or Renyi [Ren70] entropy, one can control this tradeoff. In general, if the parameter denoted
as α has a positive value, it exposes the main mass, if the value is negative – it refers to the tail. Ziviani
et al. [ZGMR07] investigated Tsallis entropy in the context of the best value of α parameter for DoS
attacks detection. They found that α-value around 0.9 is the best for detecting such attacks. Shafiq et
al. [SKF08] did the same for port scan anomalies caused by malware. They reported that an α-value around
0.5 is the best choice to detect scan anomalies. A comparative study of the use of the Shannon, Renyi and
Tsallis entropy for attribute selection, aimed at obtaining an optimal attribute subset which increases the detection
capability of decision tree and k-means classifiers, was presented by Lima et al. [LAS12]. The experimental results demonstrate that the performance of the models built with smaller subsets of attributes is
comparable and sometimes better than that associated with the complete set of attributes for DoS and
scan attack categories. The authors found that, for the DoS category, Renyi entropy with an α-value around
0.5 and Tsallis entropy with an α-value around 1.2 are the best for the decision tree classifier. We believe that
the proper choice of the α-value depends on the anomaly, on the legitimate traffic used as a baseline, or on both, since none of the authors mentioned above reported similar results. Thus, goals such
as finding the single proper value of the entropy parameter in order to improve detection of a particular group of
anomalies will remain unachieved. Some authors, e.g. Tellenbach et al. [TBSM09], [TBS+ 11], [Tel12],
employed a set of α-values in their methods. The authors proposed the Traffic Entropy Telescope prototype, based on Tsallis entropy and capable of detecting a broad spectrum of anomalies in backbone traffic,
including fast-spreading worms (not so common nowadays), scans and different forms of DoS/DDoS attacks. Although Tsallis entropy seems to be more popular than Renyi entropy in the context of network
anomaly detection, the latter was also successfully applied in detection of different anomalies. Examples are the work by Yang et al. [YKW11], who employed Renyi entropy for early detection of low-rate
DDoS attacks, and Kopylova et al. [KBHJ08], who reported positive results of using Renyi conditional
entropy in detection of selected worms. We believe that with parameterized entropy some limitations of
Shannon entropy, caused by its small descriptive capability [Tel12], which results in a limited ability to detect
typical small or low-rate anomalies, can be overcome. Moreover, we think that with a properly chosen set
of α-values this detection will be accurate in terms of a low number of false alarms and a high detection rate.
In this Thesis we present original results of our research on the proper set of α-values as well as original
research on the most suitable entropy type.
Other techniques
Apart from entropy, some other feature distributions summarization techniques are successfully used
in the context of network anomaly detection, namely sketches and histograms. Soule et al. [SST+ 04] proposed a flow classification method based on modeling network flow histograms using Dirichlet Mixture
Processes for random distributions. The authors validated their model against three synthetic test cases
and achieved almost 100% accuracy. In [SLBK08], Stoecklin et al. introduced a two-layered sketch
anomaly detection technique. The first layer models typical values of different feature components, e.g.
typical number of flows connecting to a specific port while the second layer evaluates the differences
between an observed feature distribution and a corresponding model. The authors claim that the main
strength of their method is the construction of fine-grained models that capture the details of feature
distributions, instead of summarizing it into an entropy value. A more general approach was presented
by Kind et al. [MSHJSC+ 04]. In their method, histogram-based baselines were constructed from some
essential network feature distributions such as addresses and ports. This work was augmented by Brauckhoff et al. in [BDWS09], who applied association rule mining, in order to identify flows representing
anomalous network traffic. Although the non-entropic feature distributions summarization techniques
seem to work fine, proper tuning is the main problem with them [Tel12]. The performance of detection
depends, to a great extent, on the accuracy of a bin size. This may be difficult to set and control while
network traffic changes.
2.3. Existing Datasets
One of the main problems in network anomaly detection is the lack of good and publicly available datasets for evaluation purposes. Researchers in this area have noticed this situation [CRKM11], [Owe10], [EDD+ 13], [GGSZ14]. Some of the research works employ "what is available", that is, datasets that are outdated (from the point of view of both legitimate traffic and anomalies they contain); some works are based on the authors' own datasets, prepared for the sole purpose of evaluating
a proposed method "somehow" – as the dataset creation was not the goal in itself, its quality is usually limited. In our paper [MBM15] a detailed review of the existing datasets is presented, requirements
are defined and dataset preparation methods are described. Real network traces are the most valuable
but because of privacy issues they are rarely published. One possible solution for privacy is anonymization [CMRB09], [KAA+ 06], [FAAM07]. The goal of anonymization is to preserve the structure
of the data while at the same time complying with privacy policies. Finding the right balance may sometimes
be a difficult task [SSTG12]. Another problem with real traces is proper labeling, which in many
cases has to be done manually. Real traffic traces can be found in some publicly available repositories, such as Internet Traffic Archive [ITA], LBNL/ICSI Enterprise Tracing [LBN], SimpleWeb [Sim],
Caida [Cai], MOME [MoM], WITS [WIT], UMASS [UMa]. Unfortunately, these traces are usually
old, unlabeled and not dedicated to anomaly detection. Alternative approaches cover synthetic or semi-synthetic datasets. To build such a dataset, deep domain knowledge and appropriate methods and tools
are required in order to get realistic data. According to Brauckhoff et al. [BWM08], a realistic simulation
of legitimate traffic is largely an unsolved problem today and combining synthetic anomalies with real,
background traffic traces is one of the solutions. In [BWM08] and then in [Bra10] she introduced the
FLAME tool which allows injection of hand-crafted anomalies into a given legitimate traffic flow trace.
This tool is freely available but the current distribution does not include any models reflecting anomalies.
Another interesting concept was introduced by Shiravi et al. [SSTG12]. The authors proposed to describe
network traffic (not only flows) by a set of so-called α and β profiles which can subsequently be used to
generate a dataset. The α-profiles consist of actions which should be executed to generate a given event
in the network (such as attack) while in β-profiles certain entities (packet sizes, number of packets per
flow) are represented by a statistical model. Regrettably, this solution is not freely available.
The lack of traces of botnet-like malware behavior in available network datasets calls their timeliness into question. Traces of this type should be included in contemporary datasets, and researchers should address
anomalies typical for botnet-like malware in their methods, as nowadays they are one of the main threats.
The number of datasets containing botnet-like malware anomalies is limited. Worth mentioning are those
prepared by Shiravi et al. [STG+ 11] and Garcia et al. [GGSZ14]. The first one is a mixture of malicious
and non-malicious datasets. Unfortunately, only one host in this dataset is infected with botnet-like
malware. The second dataset, which has been made public recently, is much richer and consists of traces
of 13 different scenarios of running bots from 7 different families. It is obtained by running real (mostly
unmodified) malware on a subnetwork of infected hosts in a lab environment. This traffic has been mixed
with background traffic coming from real network. A controversial (but beneficial from the point of view
of the resulting dataset) decision was not to restrict botnet communication with the Internet in any way.
For privacy reasons, the dataset contains NetFlow data; additionally, full packet capture of botnet activity
is included. The dataset is carefully labeled, although the whole traffic from infected hosts was marked
as hostile. Unfortunately this dataset was unavailable while preparing this Thesis. An interesting dataset
has been also prepared by Sperotto et al. [SSVP09]. This dataset is based on data collected from a real
honeypot (an isolated and monitored trap) which was running for several days. The honeypot featured
common network services such as HTTP, SSH and FTP. The authors gathered about 14 million malicious
network flows and most of them referred to activity of web and network scanners. Some details about
particular anomalies in this dataset are also presented in our paper [BPMP14]. Even though some valuable datasets are emerging, many researchers still make use of the very old and criticized DARPA [HLF+ 01]
dataset and its modified versions, namely KDD99 [KDD] and NSL-KDD [TBLG09]. Besides the strong
criticism by McHugh [McH00], Mahoney et al. [MC03] or Thomas [TSB08] for being unrealistic and
not balanced, nowadays DARPA datasets are simply out of date in the context of network services and
attacks.
2.4. Summary
As one can see, network anomaly detection is a very broad and heavily explored area. The problem of a generic detection method for network anomalies is still unsolved. The widely used
security solutions are ineffective against modern botnet-like malware. The feature distribution approach is
very promising, and entropy seems to be the best choice for summarizing feature distributions.
Entropy fits network anomaly detection well, because some network attacks or anomalies result in
concentrating or dispersing probability distributions of network features but do not result in a significant
traffic volume change. It seems that with parameterized entropy some limitations of Shannon entropy
caused by its small descriptive capability, which results in a limited ability to detect typical small or low-rate
anomalies, can be overcome. The use of a broad spectrum of α-values seems to be crucial because, unlike
Ziviani, Shafiq or Lima, we do not believe that it is possible to find a single α-value that fits a particular
anomaly type. None of the cited authors adopted entropy to detect anomalies indicating botnet-like malware.
Current methods are dedicated to detecting massive worm spreads (not popular nowadays) and DDoS
attacks in high-speed networks. The problem of finding a proper set of α-values, a proper set of network
features and a proper classification (not just detection) method in order to find not only massive but also
small and low-rate anomalies, such as those typical of botnet-like behavior in local networks, remains
open. Solving it may contribute to the current state of the art in botnet detection, which is limited to some
non-entropic methods, e.g. the machine learning technique of Livadas et al. [LWLS06] for identifying the C&C traffic of IRC-based botnets, the system of Francois et al. [FWB+ 11], which
uses the PageRank algorithm to detect different families of peer-to-peer botnets via network
flows, and the advanced knowledge-based botnet hunting system proposed by Bilge et al. [BBR+ 12], named
DISCLOSURE. The feasibility of using parameterized entropies for detection of anomalies connected
with botnet-like malware is confirmed in the following chapters. Because of the lack of realistic,
up-to-date and representative datasets, an additional effort had to be made to develop labeled traces based on real legitimate
traffic and synthetic anomalies [BSJM14] reflecting botnet-like activity in a local network.
3. Entropy-based network anomaly detector – preface
In order to prove the claim of the Thesis, an entropy-based network anomaly detection module named
Anode has been proposed. It is developed to cooperate with the existing signature-based or known
pattern-based security solutions such as the popular Intrusion Detection Systems, e.g. Snort [Roe99],
Bro [Pax99] as well as Flow-based Network Traffic Analyzers, e.g. NfSen [NfS], NtopNg [Nto]. We
used such solutions in the SOPAS system [CKP+ 11], [BlPJ12], [JPB+ 12] developed to protect a set of
connected heterogeneous systems which are not centrally managed. Currently, Anode is a component of
the anomaly detection and security event data correlation system developed in SECOR [JSl14a] project
which is SOPAS’ successor. In SECOR, Anode is expected to detect network anomalies with acceptable
False Positive Rate [Faw06] and high True Positive Rate [Faw06], categorize anomalies and report some
details (timestamps, related addresses and ports) to the correlation engine which correlates events coming from different anomaly detection modules and external sensors, such as the aforementioned Snort,
in order to improve detection and limit false alarms. SECOR anomaly detectors are not limited to
the network. For example, one of the components, named PRONTO [JSl14a], [JSl14b], detects obfuscated
malware at infected hosts. The general operating principle of Anode is presented in Fig. 3.1.
Figure 3.1: Anode – Entropy-based network anomaly detection module
Anode analyzes network flows. Various network feature distributions based on flows, e.g. addresses,
ports, are summarized by means of entropy. There are two phases: training and detection. In the training
phase, a profile of legitimate traffic is built and a model for classification is prepared. In the detection
phase, current observations are compared with the model. An abnormal dispersion or concentration of
different network feature distributions indicates an anomaly. Extraction of anomaly details is also assumed –
related ports and addresses are obtained by looking into the top contributors to the entropy value. A much
more detailed description of the architecture is provided in Chapter 6.
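The two phases described above can be illustrated with a minimal Python sketch. It is not the Anode implementation (which is described in Chapter 6): the min/max entropy profile, the `margin` tolerance, the single α-value, the function names and the port data are all assumptions made only for this illustration.

```python
from collections import Counter


def tsallis_entropy(values, alpha):
    """Tsallis entropy of the empirical distribution of values (alpha != 1)."""
    counts = Counter(values)
    total = sum(counts.values())
    power_sum = sum((c / total) ** alpha for c in counts.values())
    return (power_sum - 1.0) / (1.0 - alpha)


def train_profile(windows, alpha):
    """Training phase: record the entropy range seen in legitimate windows."""
    entropies = [tsallis_entropy(w, alpha) for w in windows]
    return min(entropies), max(entropies)


def detect(window, profile, alpha, margin=0.1, top=3):
    """Detection phase: flag a window whose entropy leaves the trained range
    and report the most frequent feature values as anomaly evidence."""
    low, high = profile
    entropy = tsallis_entropy(window, alpha)
    if entropy < low - margin:
        kind = "concentration"
    elif entropy > high + margin:
        kind = "dispersion"
    else:
        return None
    return {"kind": kind, "entropy": entropy,
            "top_contributors": Counter(window).most_common(top)}


# Hypothetical destination ports of flows in legitimate time windows.
legit = [[80, 443, 80, 53, 443, 80], [443, 80, 53, 80, 443, 22]]
profile = train_profile(legit, alpha=2.0)

# A port scan touches many distinct ports, dispersing the distribution.
scan = list(range(1000, 1030)) + [80, 443]
report = detect(scan, profile, alpha=2.0)
```

In this toy setting a flood towards a single port would be reported as a concentration, with that port as the top contributor. A real profile would be built per feature and per α-value over many time intervals.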
3.1. Main features
The main features of Anode are presented below:
– off-line and on-line analysis of network flows within fixed time intervals;
– supervised machine learning with training and detection phases;
– multi-class classification;
– summarization of network feature distributions with parameterized Tsallis or Renyi entropy;
– use of a selected range of α-values for entropy instead of a single best-fitting value;
– use of a selected set of network features in order to detect a broad spectrum of anomalies;
– use of a fine-grained legitimate network traffic profile;
– anomaly evidence extraction by reporting IP addresses and ports of attackers and victims.
3.2. Classification of the approach
On the basis of the main features, according to Figure 3.2, one can classify our approach as:
– anomaly detection;
– Network-based Intrusion Detection System (NIDS);
– having a centralized architecture;
– with a detection module fed by the network traffic data;
– analyzing incidents off-line and on-line.
Figure 3.2: Features of detection methods (based on [Ren11])
4. Entropy
This chapter presents an introduction to the theoretic fundamentals of entropy. It starts with a brief
overview of Shannon entropy. Next, the parameterized generalizations are presented – this part is especially important as we decided to use this form of entropy in the approach presented in this Dissertation.
Finally, a comparison of entropy measures based on simulations is provided.
4.1. Shannon entropy
Definition of entropy as a measure of disorder comes from thermodynamics and was proposed in
the early 1850s by Clausius [CH67]. In 1948 Shannon [Sha48] adapted entropy to information theory.
In information theory, entropy is a measure of the uncertainty associated with a random variable. The
more random the variable, the greater the entropy and, in contrast, the greater the certainty about the variable,
the smaller the entropy. For a probability distribution p(X = xi ) of a discrete random variable X, the
Shannon entropy is defined as:
\[ H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.1} \]
Here X is the feature that can take values $\{x_1, \ldots, x_n\}$ and $p(x_i)$ is the probability mass function of outcome
$x_i$. The entropy of X can also be interpreted as the expected value of $\log_a \frac{1}{p(X)}$, where X is drawn according to the probability mass function p(x). Depending on the base of the logarithm, different units can
be used: bits (a = 2), nats (a = e) or hartleys (a = 10). For the purpose of network anomaly detection,
sampled probabilities estimated from the number of occurrences of $x_i$ in a time window t are typically
used. The value of entropy depends on randomness (it attains its maximum when the probability $p(x_i)$ is equal for every $x_i$) but also on the value of n. In order to measure randomness only, normalized forms have
to be employed. For example, an entropy value can be divided by n or by the maximum entropy, defined as
$\log_a(n)$. Some important properties of Shannon entropy are listed below. More properties can be found
in [Kar03] and [Csi08].
– Nonnegativity: $\forall\, p(x_i) \in [0, 1]:\ H_S(X) \ge 0$
– Symmetry: $H_S(p(x_1), p(x_2), \ldots) = H_S(p(x_2), p(x_1), \ldots)$
– Maximality: $H_S(p(x_1), \ldots, p(x_n)) \le H_S(\tfrac{1}{n}, \ldots, \tfrac{1}{n}) = \log_a(n)$
– Additivity: $H_S(X, Y) = H_S(X) + H_S(Y)$ if X and Y are independent variables
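As a minimal illustration of Eq. 4.1 and of normalization by the maximum entropy, the following Python sketch estimates probabilities from occurrence counts in one time window. The function names and the address data are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def shannon_entropy(counts, base=2.0):
    """Shannon entropy (Eq. 4.1) from occurrence counts of each outcome."""
    total = sum(counts)
    return sum((c / total) * math.log(total / c, base) for c in counts if c > 0)


def normalized_entropy(counts, base=2.0):
    """Entropy divided by the maximum log_a(n); the result lies in [0, 1]."""
    n = sum(1 for c in counts if c > 0)
    if n < 2:
        return 0.0
    return shannon_entropy(counts, base) / math.log(n, base)


# Hypothetical source addresses observed in one time window t.
window = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2 + ["10.0.0.3", "10.0.0.4"]
counts = list(Counter(window).values())

h_bits = shannon_entropy(counts)            # entropy in bits (a = 2)
h_norm = normalized_entropy(counts)         # measures randomness only
h_uniform = shannon_entropy([5, 5, 5, 5])   # equal probabilities, n = 4
```

For the uniform case the entropy reaches the maximum $\log_2(4) = 2$ bits, in line with the maximality property, while the skewed window yields a lower value.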
If not only the degree of uncertainty is important but also the extent of changes between assumed and
observed distributions, denoted as q and p respectively, the relative entropy, also known as the Kullback-Leibler divergence [Kul59], [Csi08], can be used:
\[ D_{KL}(p\|q) = \sum_{i=1}^{n} p(i) \log_a \frac{p(i)}{q(i)} \tag{4.2} \]
This definition is not symmetric, i.e. in general $D_{KL}(p\|q) \neq D_{KL}(q\|p)$.
To measure how much uncertainty remains in X after observing Y, the conditional entropy (or
equivocation) [CT06] may be employed:
\[ H_S(X|Y) = -\sum_{i=1}^{m} \sum_{j=1}^{n} p(x_i, y_j) \log_a p(x_i|y_j) \tag{4.3} \]
4.2. Parameterized entropy
The Shannon entropy assumes a tradeoff between contributions from the main mass of the distribution and the tail [MD08]. To control this tradeoff, two parameterized Shannon entropy generalizations
were proposed by Renyi (1970s) [Ren70] and Tsallis (late 1980s) [Tsa88] respectively. In general, if the
parameter denoted as α has a positive value, it exposes the main mass (the concentration of events that
occur often); if the value is negative, it refers to the tail (the dispersion caused by rare events).
Both parameterized entropies (Renyi and Tsallis) are derived from the Kolmogorov-Nagumo generalization of an average [Mar05], [W˛e12]:
\[ \langle X \rangle_{\phi} = \phi^{-1}\left( \sum_{i=1}^{n} p(x_i)\,\phi(x_i) \right), \tag{4.4} \]
where φ is a function which satisfies the postulate of additivity (only affine or exponential functions
satisfy this) and $\phi^{-1}$ is the inverse function. Under an affine transformation $\phi(x_i) \to \gamma(x_i) = a\phi(x_i) + b$
(where a and b are numbers), the inverse function of γ is expressed as $\gamma^{-1}(x_i) = \phi^{-1}\left(\frac{x_i - b}{a}\right)$.
Renyi proposed the following function φ:
\[ \phi(x_i) = 2^{(1-\alpha)x_i} \tag{4.5} \]
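Renyi entropy can be obtained from the Shannon entropy with the following transformations:
\[ H_{R\alpha}(X) = \phi^{-1}\left( \sum_{i=1}^{n} p(x_i)\,\phi(-\log_2 p(x_i)) \right) \tag{4.6} \]
Given $\phi(x_i) = 2^{(1-\alpha)x_i}$ and $\phi^{-1}(x_i) = \frac{1}{1-\alpha}\log_2 x_i$:
\[
\begin{aligned}
H_{R\alpha}(X) &= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, 2^{-(1-\alpha)\log_2 p(x_i)} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, 2^{\log_2 p(x_i)^{(\alpha-1)}} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, p(x_i)^{(\alpha-1)} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right)
\end{aligned}
\tag{4.7}
\]
After transformation, a well-known form of Renyi entropy is obtained:
\[ H_{R\alpha}(X) = \frac{1}{1-\alpha} \log_a \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right) \tag{4.8} \]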
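The Renyi entropy satisfies the same postulates as the Shannon entropy and there are the following
relations between these two:
\[ H_{R\alpha_1}(X) \ge H_S(X) \ge H_{R\alpha_2}(X), \quad \text{where } \alpha_1 < 1 \text{ and } \alpha_2 > 1 \tag{4.9} \]
\[ \lim_{\alpha \to 1} \frac{1}{1-\alpha} \log_a \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right) = H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.10} \]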
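The limit in Eq. 4.10 is of the form 0/0 and can be checked with L'Hôpital's rule. The short derivation below is added only as an illustration and uses the symbols already defined in this section.

```latex
\begin{align*}
\lim_{\alpha\to 1} H_{R\alpha}(X)
 &= \lim_{\alpha\to 1}\frac{\log_a \sum_{i=1}^{n} p(x_i)^{\alpha}}{1-\alpha}
 && \text{(form } 0/0 \text{, since } \textstyle\sum_{i} p(x_i) = 1\text{)} \\
 &= \lim_{\alpha\to 1}
    \frac{\dfrac{\sum_{i=1}^{n} p(x_i)^{\alpha}\ln p(x_i)}
                {\ln a \,\sum_{i=1}^{n} p(x_i)^{\alpha}}}{-1}
 && \text{(derivatives with respect to } \alpha\text{)} \\
 &= -\sum_{i=1}^{n} p(x_i)\log_a p(x_i)
  = \sum_{i=1}^{n} p(x_i)\log_a\frac{1}{p(x_i)} = H_S(X).
\end{align*}
```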
Tsallis proposed the following function φ:
\[ \phi(x_i) = \frac{2^{(1-\alpha)x_i} - 1}{1-\alpha} \tag{4.11} \]
After transformation, a well-known form of Tsallis entropy is as follows:
\[ H_{T\alpha}(X) = \frac{1}{1-\alpha} \left( \sum_{i=1}^{n} p(x_i)^{\alpha} - 1 \right) \tag{4.12} \]
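As can be seen, this entropy is non-logarithmic. There are the following relations between the
Shannon and the Tsallis entropy:
\[ H_{T\alpha_1}(X) \ge H_S(X) \ge H_{T\alpha_2}(X), \quad \text{where } \alpha_1 < 1 \text{ and } \alpha_2 > 1 \tag{4.13} \]
\[ \lim_{\alpha \to 1} \frac{1}{1-\alpha} \left( \sum_{i=1}^{n} p(x_i)^{\alpha} - 1 \right) = \ln 2 \cdot H_S(X) = \ln 2 \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.14} \]
where ln 2 is the natural logarithm of 2 and the Shannon entropy is measured in bits (a = 2).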
Moreover, the Tsallis entropy is nonextensive, i.e. it satisfies only the pseudo-additivity criterion. For
independent discrete random variables X and Y:
\[ H_{T\alpha}(X, Y) = H_{T\alpha}(X) + H_{T\alpha}(Y) + (1-\alpha)\, H_{T\alpha}(X)\, H_{T\alpha}(Y) \tag{4.15} \]
It means that:
\[ H_{T\alpha}(X, Y) > H_{T\alpha}(X) + H_{T\alpha}(Y) \quad \text{for } \alpha \in (-\infty, 1) \]
\[ H_{T\alpha}(X, Y) < H_{T\alpha}(X) + H_{T\alpha}(Y) \quad \text{for } \alpha \in (1, \infty) \]
To summarize, both parameterized entropies (Renyi and Tsallis):
– expose concentration for α > 1 and dispersion for α < 1;
– converge to the Shannon entropy for α → 1.
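The relations stated in this section can be checked numerically. The following Python sketch implements Eq. 4.8 and Eq. 4.12 and verifies the convergence for α → 1, the ordering of Eq. 4.9 and the pseudo-additivity of the Tsallis entropy. The example distributions p and q and the function names are arbitrary assumptions chosen for illustration.

```python
import math


def shannon(p, base=2.0):
    """Shannon entropy (Eq. 4.1)."""
    return sum(pi * math.log(1.0 / pi, base) for pi in p if pi > 0)


def renyi(p, alpha, base=2.0):
    """Renyi entropy (Eq. 4.8); alpha must differ from 1."""
    return math.log(sum(pi ** alpha for pi in p if pi > 0), base) / (1.0 - alpha)


def tsallis(p, alpha):
    """Tsallis entropy (Eq. 4.12); alpha must differ from 1."""
    return (sum(pi ** alpha for pi in p if pi > 0) - 1.0) / (1.0 - alpha)


p = [0.7, 0.2, 0.1]

# Convergence for alpha -> 1: Renyi tends to Shannon (in bits), while
# Tsallis tends to ln(2) times the Shannon entropy in bits.
near_one = 1.0 + 1e-6
renyi_gap = abs(renyi(p, near_one) - shannon(p))
tsallis_gap = abs(tsallis(p, near_one) - math.log(2) * shannon(p))

# Ordering around alpha = 1 (Eq. 4.9).
ordered = renyi(p, 0.5) >= shannon(p) >= renyi(p, 2.0)

# Pseudo-additivity of Tsallis entropy for independent X and Y.
q = [0.6, 0.4]
joint = [pi * qj for pi in p for qj in q]
alpha = 2.0
lhs = tsallis(joint, alpha)
rhs = (tsallis(p, alpha) + tsallis(q, alpha)
       + (1.0 - alpha) * tsallis(p, alpha) * tsallis(q, alpha))
```

Both gaps are close to zero and `lhs` equals `rhs` up to floating-point error, which is consistent with the product form of the pseudo-additivity term.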
4.3. Comparison
In order to understand, compare and successfully apply parameterized entropies in our approach,
some simulation experiments were conducted.
Firstly, the Shannon, Renyi and Tsallis entropies of a binomial probability distribution
were compared. Then, the entropies calculated for a uniform distribution were compared to check how they
depend on the number of equal probabilities and on the α-values. Next, the impact of rare and frequent events
on the entropy for different α-values was examined. Finally, we looked at exemplary network feature
distributions of addresses and ports in order to summarize them with Renyi and Tsallis entropy.
4.3.1. Binomial distribution
Shannon, Renyi and Tsallis entropy for a binomial probability distribution, where the probability of
success is p and the probability of failure is 1 − p, is depicted in Fig. 4.1, Fig. 4.2 and Fig. 4.3 respectively.
Figure 4.1: Shannon entropy – binomial distribution (x-axis: P, y-axis: H_S)
It is noticeable that the maximum Shannon entropy is obtained when p = 1 − p, i.e. p = 0.5. Renyi and Tsallis
entropies converge to the Shannon entropy for α → 1. Note: according to Eq. 4.14, values of Tsallis entropy need
to be multiplied by 1/ln 2 to get a curve similar to the Shannon one for α → 1. For positive α-values Renyi and Tsallis
entropy behave similarly to Shannon, as both reach their maximum for p = 1 − p, although the Tsallis maximum
entropy changes with α, while the Renyi maximum entropy is always equal to 1. For negative α-values the Tsallis and
Renyi entropy curves are convex, with a minimum at p = 1 − p, as in this case low probabilities are exposed.
Figure 4.2: Renyi entropy of several α-values – binomial distribution (x-axis: P, y-axis: H_Rα; curves for α = −2, −1, 0, 1, 2)
Figure 4.3: Tsallis entropy of several α-values – binomial distribution (x-axis: P, y-axis: H_Tα; curves for α = −0.5, −0.1, 0, 1, 2)
Figure 4.4: Shannon, Renyi and Tsallis entropy – uniform distribution (x-axis: n, y-axis: H(X); curves: Shannon = Renyi for α ∈ (−∞, ∞), Tsallis α = −0.1, Tsallis α = 2)
4.3.2. Uniform distribution
The Shannon, Renyi and Tsallis entropies of a uniform probability distribution are depicted in Fig. 4.4.
For this distribution the maximum entropy (the case when all probabilities are equal) is calculated for
different n, where n is the number of equal probabilities. As can be seen, entropy always grows with n.
Renyi entropy grows in the same way as Shannon entropy, no matter which α-value is used. Tsallis entropy
behaves differently, as it depends not only on n but also on α.
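The behaviour described above can be reproduced with a minimal sketch. It assumes the standard definitions of Renyi entropy (base-2 logarithm) and Tsallis entropy, which we take to correspond to Eq. 4.8 and Eq. 4.12; the function names are illustrative only.

```python
import math

def renyi_uniform(n, alpha):
    """Renyi entropy (base 2) of n equal probabilities; assumed standard definition."""
    if alpha == 1:
        return math.log2(n)  # Shannon limit
    total = n * (1.0 / n) ** alpha
    return math.log2(total) / (1 - alpha)

def tsallis_uniform(n, alpha):
    """Tsallis entropy of n equal probabilities; assumed standard definition."""
    if alpha == 1:
        return math.log(n)  # Shannon limit, in nats
    total = n * (1.0 / n) ** alpha
    return (total - 1) / (1 - alpha)

# Renyi equals log2(n) for every alpha, while Tsallis depends on both n and alpha.
for n in range(2, 11):
    print(n,
          round(math.log2(n), 3),
          round(renyi_uniform(n, -2), 3),
          round(renyi_uniform(n, 2), 3),
          round(tsallis_uniform(n, -0.1), 3),
          round(tsallis_uniform(n, 2), 3))
```

For a uniform distribution the sum of p^α equals n^(1−α), so the Renyi value reduces to log2 n regardless of α, which is why the Shannon and Renyi curves coincide in Fig. 4.4.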
4.3.3. Impact of frequent and rare events
Example
Let us assume a discrete random variable X = addresses observed in the network within the last 1 min,
X = {“10.1.0.1”, “10.1.0.2”, “10.1.0.3”, “10.1.0.4”, “10.1.0.5”}, and the following numbers of occurrences
of the subsequent addresses: Freq = {96, 1, 1, 1, 1}. Based on these frequencies, let us estimate the
probability distribution of X (see Table 4.1).
Let us examine the impact of a frequent event p(X = “10.1.0.1”) = 0.96 and a rare event
p(X = “10.1.0.2”) = 0.01 on the Renyi and Tsallis entropy when α = −2 and α = 2 are used.
To measure the impact of these events, we can check the value of the exponential expression p(xi)^α
present in both the Renyi and Tsallis formulas [Eq. 4.8, Eq. 4.12]. The results are presented in Table 4.2.
Table 4.1: Probability distribution of X
X             p(X = x)
“10.1.0.1”    0.96
“10.1.0.2”    0.01
“10.1.0.3”    0.01
“10.1.0.4”    0.01
“10.1.0.5”    0.01
Table 4.2: Impact of frequent and rare events on the value of parameterized entropy
p(xi)    α = −2    α = 2
0.96     1.08      0.92
0.01     10000     0.0001
As can be seen, the impact of frequent events (expressed by p(xi) = 0.96) on the entropy is greater
than the impact of rare events (expressed by p(xi) = 0.01) when positive α-values are used. In contrast,
the impact of rare events is greater than that of frequent events when negative α-values are used.
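The values in Table 4.2 can be checked with a few lines of code. The sketch below only evaluates the term p(xi)^α for the example frequencies; the helper name is illustrative.

```python
# Occurrence counts from the example above.
freq = {"10.1.0.1": 96, "10.1.0.2": 1, "10.1.0.3": 1, "10.1.0.4": 1, "10.1.0.5": 1}
total = sum(freq.values())
p = {addr: count / total for addr, count in freq.items()}

def term(prob, alpha):
    """Contribution p(x_i)^alpha of a single event to the Renyi/Tsallis sum."""
    return prob ** alpha

for alpha in (-2, 2):
    frequent = term(p["10.1.0.1"], alpha)
    rare = term(p["10.1.0.2"], alpha)
    print(f"alpha={alpha}: frequent={frequent:.4f}, rare={rare:.4f}")
```

For α = −2 a single rare event contributes roughly four orders of magnitude more than the frequent one, whereas for α = 2 the relation is reversed.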
4.3.4. Entropy of exemplary distributions
In this section, the entropy values of sample distributions reflecting both legitimate and anomalous
network traffic are analyzed. The aim of these experiments is to show how entropies highlight
concentration (frequent events forming the main mass) and dispersion (rare events forming the
tail) caused by typical network anomalies such as port and network scans. These types of anomalies are
specific to botnet-like malware. More details about scan anomalies can be found in [BI08] and [MFF14].
The experiments help to understand how parameterized entropies differ from Shannon entropy. Moreover,
they show how Renyi, Tsallis and Shannon entropies differ in terms of sensitivity.
Before we start analyzing distributions characteristic of anomalies, let us start with a very basic
example with even, concentrated and dispersed distributions, as presented in Fig. 4.5. The Y axis shows
the number of occurrences of certain instances, e.g. addresses or ports, which appear on the X axis.
Now let us calculate Tsallis, Renyi and Shannon entropy for each distribution. The results of this
calculation are presented in Table 4.3. The change of the entropy value for the concentrated and dispersed
distributions relative to the even distribution is presented in Table 4.4. As can be seen, Shannon and
parameterized entropies behave similarly when positive α-values are used for the parameterized entropies.
Higher concentration results in a decrease of the entropy value, while higher dispersion results in an
increase. In this case Renyi entropy seems to be the most sensitive. For negative α-values the situation
is slightly different. Parameterized entropies differ from Shannon because the value of entropy increases
for both concentration and dispersion. The higher entropy value for the more concentrated distribution
is due to the fact that in this case the probabilities in the tail (estimated from the number of occurrences)
are lower and more exposed by the negative α-value. In general, for negative α-values Tsallis entropy is
far more sensitive than Renyi and Shannon.
Figure 4.5: Even, concentrated and dispersed distribution (three panels: Even, Concentrated, Dispersed; x axis: Instances; y axis: Occurrences)
Table 4.3: Entropy values for even, concentrated and dispersed distributions
               Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
even           3.78      3.64          4.07           0.92            1581
concentrated   2.79      1.82          4.67           0.72            5508
dispersed      4.82      4.7           5              0.96            10968
Table 4.4: Entropy value change in reference to even distribution
               Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
concentrated   −26%      −50%          +14%           −22%            +248%
dispersed      +27.5%    +29%          +22%           +4%             +594%
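The calculation behind Tables 4.3 and 4.4 can be sketched as follows. The occurrence counts are hypothetical (the exact counts behind Fig. 4.5 are not reproduced here), and the entropy functions assume the standard Renyi (base-2) and Tsallis definitions, so the printed numbers are illustrative rather than a reproduction of the tables.

```python
import math

def probabilities(counts):
    """Estimate probabilities from occurrence counts."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(p):
    return -sum(x * math.log2(x) for x in p)

def renyi(p, alpha):
    if alpha == 1:
        return shannon(p)
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

def tsallis(p, alpha):
    if alpha == 1:
        return -sum(x * math.log(x) for x in p)
    return (sum(x ** alpha for x in p) - 1) / (1 - alpha)

def relative_change(value, reference):
    """Change of an entropy value relative to a reference, in percent."""
    return (value - reference) / reference * 100.0

# Hypothetical occurrence counts, chosen only to illustrate the three shapes.
distributions = {
    "even": [2] * 16,
    "concentrated": [30] + [1] * 15,
    "dispersed": [1] * 32,
}

reference = probabilities(distributions["even"])
for name, counts in distributions.items():
    p = probabilities(counts)
    change = relative_change(tsallis(p, -2), tsallis(reference, -2))
    print(name, round(shannon(p), 2), round(renyi(p, 2), 2), round(renyi(p, -2), 2),
          round(tsallis(p, 2), 2), round(tsallis(p, -2)), f"{change:+.0f}%")
```

Even with these made-up counts the qualitative pattern is the same: for α = −2 both the concentrated and the dispersed distribution yield a higher entropy than the even one.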
Now, suppose we have the following distributions of source and destination addresses as well as
destination ports for 1 minute of legitimate network traffic (Fig. 4.6). Again, the Y axis shows the number
of occurrences of particular addresses or ports, which appear on the X axis. As we see, all distributions are
quite even. Let us summarize these distributions by calculating Tsallis, Renyi and Shannon entropy
(Table 4.5).
Table 4.5: Entropy value for addresses and ports distributions – legitimate traffic
                   Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
src IP addresses   4.79      4.57          5.44           0.96            27437
dst IP addresses   4.65      4.18          5.48           0.94            29925
dst ports          3.75      2.85          5.42           0.86            26164
Now let us simulate two different types of anomalies in this traffic. In order to do it, we have to inject
some characteristic concentration or dispersion to particular distributions.
Port scan
Typically, during a port scan, concentration in addresses and dispersion in ports are observable. Let
us modify our distributions to simulate this type of anomaly. Suppose that a single host was scanned and
the number of scanned ports was equal to 50. The modified distributions are depicted in Fig. 4.7. Now let
us recalculate the entropy (Table 4.6) and compare the new results with those for the legitimate traffic
(Table 4.7). As can be seen, each entropy properly reported (as a value change) concentration in source
and destination addresses and dispersion in destination ports, although the sensitivity of each entropy was
different. For the concentration, the most significant change was obtained for Renyi with a positive
α-value (about 50% decrease) and Tsallis with a negative α-value (more than 200% increase). Dispersion in
destination ports was most distinctly exposed by Tsallis entropy with a negative α-value (more than
100% increase).
Figure 4.6: Addresses and ports distributions – legitimate traffic (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: Instances; y axis: Occurrences)
Table 4.6: Entropy value for addresses and ports distributions – port scan
                           Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
src IP addresses (conc.)   4.79      4.57          5.44           0.96            27437
dst IP addresses (conc.)   4.65      4.18          5.48           0.94            29925
dst ports (disp.)          3.75      2.85          5.42           0.86            26164
Table 4.7: Entropy value change in reference to legitimate traffic distributions – port scan
                           Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
src IP addresses (conc.)   −23%      −50%          +10%           −17%            +217%
dst IP addresses (conc.)   −22%      −46%          +10%           −16%            +217%
dst ports (disp.)          +48%      +54%          +28%           +10%            +104%
Network scan
Typically, during a network scan, concentration in source addresses and destination ports as well as
dispersion in destination addresses are observable. Let us modify our distributions to simulate this type
of anomaly. Suppose that a single host scanned 100 hosts to check whether a particular service is running
on these hosts. The modified distributions are depicted in Fig. 4.8. Now let us recalculate the entropy
(Table 4.8) and compare the new results with those for the legitimate traffic (Table 4.9). As can be seen,
each entropy properly reported (as a value change) concentration in source addresses and destination ports
as well as dispersion in destination addresses, although, as in the previous example, the sensitivity of each
entropy was different. For both concentration and dispersion the most significant change was obtained
for Tsallis with a negative α-value (more than 500% increase and more than 3500% increase respectively).
Table 4.8: Entropy value for addresses and ports distributions – network scan
                           Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
src IP addresses (conc.)   2.83      1.4           6.35           0.62            180164
dst IP addresses (disp.)   6.83      6.37          7.21           0.99            1093036
dst ports (conc.)          2.43      1.35          6.33           0.61            171809
Figure 4.7: Addresses and ports distributions – port scan (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: Instances; y axis: Occurrences)
Figure 4.8: Addresses and ports distributions – network scan (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: Instances; y axis: Occurrences)
Table 4.9: Entropy value change in reference to legitimate traffic distributions – network scan
                           Shannon   Renyi α = 2   Renyi α = −2   Tsallis α = 2   Tsallis α = −2
src IP addresses (conc.)   −41%      −69%          +16%           −34%            +556%
dst IP addresses (disp.)   +47%      +52%          +31%           +5%             +3555%
dst ports (conc.)          −35%      −52%          +16%           −29%            +556%
Different network anomalies cause concentration or dispersion in different network feature distributions.
Not only the aforementioned addresses and ports can be used; they should be augmented by other features,
e.g. flow duration, flow size or host degree (number of in/out connections) distributions. The whole set
of network feature distributions employed in Anode is presented in Chapter 6.
5. Network flows
In this chapter we describe the network flow export technique, as it was chosen as the data source for
the proposed method. The chapter starts with a general description and comparison of the two main network
traffic capture and analysis techniques, namely flow export and packet inspection based on packet capture.
Then, some details related to network flows as well as some known problems and difficulties are
presented. Finally, the flow export setup prepared to interact with Anode is described.
5.1. Flows vs. packets
There are two popular methods of network traffic capture and analysis, namely packet inspection which is based on packet capture [KS12], [BDKC10], [Roe99], [Pax99], [Par13] and flow
export [PSS+ 09], [SSSP12], [NfS], [Nto]. Detailed comparison of these methods can be found
in [HCT+ 14], [SSS+ 10], [GHK14], [SF02]. Packet-based approach refers to the process of capturing
individual packets and analyzing their headers and payloads, while flow-based approach is based on the
ability of network devices to aggregate packets in flows. Packet inspection is found in widely used network intrusion detection systems, e.g. Snort [Roe99] and Bro [Pax99]. In an attempt to find known attacks
or unusual behavior, such systems inspect the contents (header and payload) of every packet. This may
consume a lot of resources when network speed is high. In addition, the spread of encrypted protocols
poses a new challenge to this approach. In a flow export, aggregated information is captured and analyzed in order to find communication patterns within the network. Statistics on flows provide information
about who communicates with whom, when, how long, how often, using what protocol and service and
also how much data was transferred. Exporting of network flows was originally intended for accounting
and network profiling, but recently it has become a popular source of data for network anomaly detection.
One of the reasons for its popularity is its scalability in the context of network speed. Moreover, flow
export provides several other advantages compared to packet inspection. Firstly, the flow export mechanism
is widely deployed in popular network devices. Secondly, significant data reduction can be achieved –
in the order of 1/2000 of the original volume, as was shown by Hofstede et al. in [HCT+ 14]. Lastly,
flow export is usually less privacy-sensitive than packet capture, since typically only packet header data
is considered. Some researchers, e.g. Schaffrath [SS08], are more skeptical and claim that flow-based
anomaly detection is still immature and can be used only as a complement to packet inspection, because
many attacks or their symptoms can be hidden in the content of packets. Nowadays, this claim is not
entirely true, because with the modern, flexible approach to network flows proposed in the IPFIX
standard [SBCQ09] any data from packets may also be included in flows. In this context, the gap
between the two techniques is much smaller than before [SSHB14].
5.2. Flow export
The concept of network flows was introduced by Cisco as the NetFlow [Cla04] technology. The first
open version of NetFlow was version 9, which was then standardized by the Internet Engineering
Task Force (IETF) under the name IP Flow Information Export (IPFIX) [SBCQ09]. Although several
definitions of an IP flow exist, we follow the one proposed by the IETF:
“A flow is defined as a set of IP packets passing an observation point in the network during a certain
time interval. All packets belonging to a particular flow have a set of common properties.”
In the simplest form, these properties are source and destination addresses and ports.
5.2.1. Operating principle
Flow export is a quite complex process. It includes a real-time aggregation of packets into flows and
periodic export of reports to collectors. Some details are presented in Fig. 5.1.
Figure 5.1: Flow export architecture
Flows are created and exported to collectors by an accounting module placed in network routers or
dedicated probes. This module is responsible for the metering process, i.e. creating flow records from
the observed traffic. It extracts the header from each packet seen on the monitored interface. Then each
header is marked with a timestamp and triggers an update of the matching entry in the flow cache. If
there is no flow matching the packet header, a new entry is created. A flow is considered ready to be
exported to collectors when:
– the flow was idle (no packets have been detected in the flow) for a time longer than a given threshold (known as inactive timeout) which is 15 s by default;
– the flow reaches the maximum allowed lifetime (by default 30 min); when this happens, the flow
record is exported to the collector and a new one is created;
– the FIN or RST flags have been seen in a TCP connection;
– the flow-cache gets full; in this case, certain flow records are marked as expired and exported to
the collector.
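The expiration conditions listed above can be illustrated with a minimal sketch of a flow cache. This is not the code of any particular exporter: the class and field names are hypothetical, the timeouts are the defaults quoted above, and the policy used when the cache is full (evicting the least recently updated flow) is an assumption, since the text only states that certain records are expired.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT = 15.0       # seconds without packets (default quoted above)
ACTIVE_TIMEOUT = 30 * 60.0    # maximum flow lifetime in seconds (default quoted above)
TCP_FIN, TCP_RST = 0x01, 0x04

@dataclass
class FlowRecord:
    key: tuple
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0

class FlowCache:
    def __init__(self, max_entries=4096):
        self.max_entries = max_entries
        self.entries = {}
        self.exported = []    # stands in for "sent to the collector"

    def update(self, key, timestamp, size, tcp_flags=0):
        """Account one packet header to its flow, creating the entry if needed."""
        self.expire(timestamp)
        record = self.entries.get(key)
        if record is None:
            if len(self.entries) >= self.max_entries:
                # Cache full: assumed policy, expire the least recently updated flow.
                oldest = min(self.entries.values(), key=lambda r: r.last_seen)
                self._export(oldest.key)
            record = FlowRecord(key, timestamp, timestamp)
            self.entries[key] = record
        record.last_seen = timestamp
        record.packets += 1
        record.octets += size
        record.tcp_flags |= tcp_flags
        if tcp_flags & (TCP_FIN | TCP_RST):
            self._export(key)  # end of a TCP connection

    def expire(self, now):
        """Export flows that were idle too long or reached the maximum lifetime."""
        for key, record in list(self.entries.items()):
            idle = now - record.last_seen > INACTIVE_TIMEOUT
            too_old = now - record.first_seen >= ACTIVE_TIMEOUT
            if idle or too_old:
                self._export(key)

    def _export(self, key):
        self.exported.append(self.entries.pop(key))
```

In this sketch the flow key would typically be a tuple of source and destination addresses, ports and the protocol number; a flow exported because of the lifetime limit is simply recreated by the next matching packet.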
It is important to emphasize that there is a difference between flows and TCP connections. A flow can
also be defined for connectionless protocols such as UDP or ICMP when a set of packets has been sent
between two communicating sides. Moreover, a flow does not have size restrictions, i.e. each communication
between source and destination hosts will generate a flow, even if only a single packet has been exchanged.
A flow export protocol defines how flow records are exported to a collector. Typically such a record
contains source and destination addresses and ports, start and end timestamps, type of service, protocol,
flags, next hop router address, input and output SNMP interfaces, source and destination autonomous
system numbers and network masks. Additionally, each flow carries aggregated information about the
number of packets and bytes exchanged. Details regarding the NetFlow protocol are presented in Fig. 5.2.
IPFIX proposes a flexible protocol in which flow record formats can be defined by using templates. It
allows a larger set of parameters to be used. An IPFIX packet is logically divided into sections known as
sets. A message can normally consist of three kinds of sets, namely template (format description), data
and options. For more information regarding IPFIX message format, see [SBCQ09].
Figure 5.2: Flow record in NetFlow protocol (based on [Plo00])
The aim of the collector is to retrieve the flows created by the exporter and to store them in a form suitable for further monitoring or analysis. Typically collectors store this data in relational databases [Nto],
stream databases [CJSS03], column-oriented databases [GM10] or binary files [NfS].
A flow is typically defined as a unidirectional sequence of packets, which means that there are two
entries for each connection between two endpoints: one from the server to the client and one from the
client to the server. Recently, bidirectional flows [TB08], which define one record for each session between
two endpoints, have also been supported by vendors. The differences between unidirectional and bidirectional
flows are presented in Fig. 5.3.
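The relation between the two flow types can be sketched by merging unidirectional records into bidirectional ones. The record layout and the rule that the endpoint of the earliest record is treated as the initiator are assumptions made for this illustration; they are not taken from a specific exporter or collector.

```python
from dataclasses import dataclass

@dataclass
class UniFlow:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int
    start: float
    end: float
    packets: int
    octets: int

@dataclass
class BiFlow:
    initiator: tuple
    responder: tuple
    proto: int
    start: float
    end: float
    fwd_packets: int = 0
    fwd_octets: int = 0
    rev_packets: int = 0
    rev_octets: int = 0

def merge_to_biflows(flows):
    """Pair unidirectional flows that share protocol and endpoints."""
    biflows = {}
    for f in sorted(flows, key=lambda f: f.start):
        src = (f.src_ip, f.src_port)
        dst = (f.dst_ip, f.dst_port)
        key = (f.proto, *sorted((src, dst)))   # direction-independent key
        bf = biflows.get(key)
        if bf is None:
            # Assumption: the earliest record determines the initiator.
            bf = biflows[key] = BiFlow(src, dst, f.proto, f.start, f.end)
        if src == bf.initiator:
            bf.fwd_packets += f.packets
            bf.fwd_octets += f.octets
        else:
            bf.rev_packets += f.packets
            bf.rev_octets += f.octets
        bf.end = max(bf.end, f.end)
    return list(biflows.values())
```

A client request and the corresponding server response thus end up in a single record with separate forward and reverse counters.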
Figure 5.3: Unidirectional vs bidirectional flows
5.2.2. Problems and difficulties
The modern approach assumes the use of dedicated probes transparently connected as a passive
appliance via Switch Port Analyzer (SPAN) ports [Tap12] or network Test Access Points (TAP) [Tap12]
rather than the use of routers to export flows. Routers often have limited resources and can be
overloaded when network traffic is high, so the extra processing connected with NetFlow is undesirable. By
applying dedicated probes (as presented in Fig. 5.4) the problem of router performance limitations
can be easily overcome.
There are some cases where the use of flows can be problematic, for example when a flow is created
for each packet passing through the monitoring device as a consequence of a DDoS attack [SSP12]. In
such a case, the number of flows increases dramatically and extra load is put on the monitoring
and analysis system. To mitigate this problem or, in general, to improve the performance of routers,
dedicated probes and sampling or aggregation techniques may be used. With sampling, every n-th packet
is inspected instead of all packets [BTW+ 06]. With aggregation, flows with similar characteristics
are merged [TWC13].
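Systematic sampling as described above can be sketched in a few lines. The function names are illustrative, and scaling the sampled count by n is only the simplest possible estimate of the original volume.

```python
def sample_every_nth(packets, n):
    """Keep every n-th packet (the n-th, 2n-th, ...) of the observed sequence."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [pkt for i, pkt in enumerate(packets, start=1) if i % n == 0]

def estimate_total(sampled_count, n):
    """Rough estimate of the original packet count from the sampled count."""
    return sampled_count * n

observed = list(range(1, 11))          # ten packets, numbered 1..10
sampled = sample_every_nth(observed, 3)
print(sampled, estimate_total(len(sampled), 3))
```

The estimate (9 packets here, against 10 observed) shows the loss of accuracy that is traded for the lower processing load.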
Another shortcoming of the flow-based approach is the inaccuracy of metering modules applied in popular
network routers or dedicated probes. It is known that flow export may introduce artifacts in the
exported data, such as imprecise flow timestamps, a lack of flags in TCP flows or invalid byte
counters [CSO+ 09], [Kog11], [TTSB11]. Moreover, these artifacts are widespread among different
devices from various vendors, as was reported by Hofstede et al. [HDS+ 13].
As can be seen, there are some problems and difficulties with flow export that may interfere with the
anomaly detection process. The aforementioned shortcomings should be taken into consideration and addressed
while planning a flow export setup for network anomaly detection systems.
Figure 5.4: Modern approach to flow exporting
5.3. NetFlow export setup
In order to employ NetFlow to work with Anode, an appropriate flow export setup was prepared
and launched first. In our research we decided to rely on open-source software, although we believe that
commercial solutions which consist of dedicated hardware, e.g. those proposed by Invea-Tech [Floc], are
much more efficient and reliable and less error-prone. The open source community provides a wide range of
software to generate and collect NetFlow data. The most popular are NfDump [NfS], Flow-tools [Flob],
YAF [IT10], Argus [Arg], Softflowd [Sof], Fprobe [Fpr] and NtopNg [Nto]. Based on such criteria as
popularity, maturity, simplicity and support, we decided to choose Softflowd and NfDump. Softflowd is a
software probe capable of generating and exporting NetFlow records. NfDump is a NetFlow collector
able to collect, filter and dump collected data. The flow export setup prepared for Anode is presented in
Fig. 5.5.
The Softflowd probe runs in promiscuous mode (all traffic is captured, not only the traffic addressed to
the host) on a host connected directly to the SPAN port of the central network switch. In this way the whole
traffic present in the network is mirrored to the probe. Flows generated by Softflowd are periodically
(every 5 min by default) exported to the NfDump collector via the NetFlow 9 protocol. Statistics from
Softflowd are passed to NfDump via the NfCapd daemon, which is part of the NfDump package. NfCapd reads
data sent by the probe and stores it in binary files. NfDump reads the NetFlow data from the files and
displays it in a chosen format (bidirectional flows in our case). Results from NfDump are dumped to a text
file and converted to SQL queries in order to feed the relational database (MySQL).
Figure 5.5: NetFlow capture setup for Anode
6. Entropy-based network anomaly detector
This chapter is focused on detailed specification of the proposed network anomaly detector named
Anode. Firstly, a detailed architecture is given. Then, results of implementation are presented.
6.1. Architecture
The architecture of Anode is presented in Fig. 6.1.
Figure 6.1: Anode – the architecture
Anode analyzes network data captured by NetFlow probes. Typical probes such as routers can be
utilized but for this research dedicated probes connected to the SPAN ports (as presented in Chapter 5)
have been used. Flows are analyzed within fixed time intervals (every 5 min by default). Bidirectional
flows [TWC13] are chosen since, according to some works, e.g. [NSA+ 08], unidirectional flows may
entail biased results. Collected data is stored in the relational database and then analyzed. In order to
limit the area of search for anomalies, filters per direction, protocol and subnet are provided. Next,
depending on the mode, Tsallis or Renyi entropy of positive and negative α-values is calculated for
the set of network feature distributions which is presented in Table 6.1. Note: the Shannon version of the
method uses Renyi entropy with α set to 1.
Initially, during the training phase, a dynamic profile is built using min and max entropy values
within a sliding time window for every ⟨feature, α⟩ pair, i.e. ⟨src ip, α = −2⟩, ⟨src ip, α = −1⟩, . . .
⟨dst ip, α = −2⟩, . . . ⟨src port, α = −2⟩, . . . and so on. The way of building a profile is presented in
Fig. 6.2. By using a sliding time window, traffic changes during the day can be reflected while, at the same
time, a margin for some minor differences, e.g. small delays between the profile and the current traffic, is
provided.
Table 6.1: Selected network feature distributions
Feature              Probability mass function
src(dst) ip(port)    (number of xi as src(dst) address(port)) / (total number of src(dst) addresses(ports))
flows duration       (number of flows with xi as duration) / (total number of flows)
packets, bytes       (number of pkts(bytes) with xi as src(dst) addr(port)) / (total number of pkts(bytes))
in(out)-degree       (number of hosts with xi as in(out)-degree) / (total number of hosts)
Figure 6.2: A way of building a profile
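One possible reading of the profile building step is sketched below. It assumes that the training day is split into consecutive fixed intervals and that the min and max for interval i are taken over the neighbouring intervals i − w, ..., i + w; the window width w and the data layout are hypothetical, as the exact procedure is defined only by Fig. 6.2.

```python
def build_profile(training, window=1):
    """Build min/max entropy bounds per (feature, alpha) pair and time interval.

    training: dict mapping (feature, alpha) to a list of entropy values,
              one value per fixed time interval of the training day.
    window:   number of neighbouring intervals on each side (assumed parameter).
    """
    profile = {}
    for pair, series in training.items():
        bounds = []
        for i in range(len(series)):
            lo = max(0, i - window)
            hi = min(len(series), i + window + 1)
            chunk = series[lo:hi]
            bounds.append((min(chunk), max(chunk)))
        profile[pair] = bounds
    return profile

# Made-up entropy values for four consecutive intervals.
training = {("src ip", -2): [1.0, 3.0, 2.0, 5.0]}
print(build_profile(training, window=1))
```

A wider window tolerates larger time shifts between the profile and the current traffic at the cost of looser bounds.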
In the detection phase, the observed entropy is compared with the min and max values stored in the
profile according to the following rule:
$$ r_\alpha(x_i) = \frac{H_\alpha(x_i) - k \cdot \min_\alpha}{k \cdot (\max_\alpha - \min_\alpha)}, \quad k \in \langle 1..2 \rangle \qquad (6.1) $$
With this rule, the anomaly threshold is defined. Detection is based on the relative value of entropy with
respect to the distance between min and max. Values rα(xi) < 0 or rα(xi) > 1 indicate abnormal
concentration or dispersion. Such abnormal dispersion or concentration in different feature distributions is
characteristic of anomalies. For example, during a port scan, a high dispersion in port numbers and a high
concentration in addresses are observed. The coefficient k in the formula determines a margin for the min and
max boundaries and may be used for tuning purposes. A high value of k, e.g. k = 2, limits the number
of false alarms (alarms where no anomaly has taken place), while a low value (k = 1) increases the
detection rate (the percentage of anomalies correctly detected). Some other approaches to thresholding, based
on the standard deviation (mean ± 2sdev) and the median absolute deviation (median ± 2mad) [RFG05], were
also taken into consideration, but empirical results showed that the proposed rule is the best choice.
Classification is based on popular methods (decision trees, Bayes nets, rules and functions) employed in
Weka [HFH+ 09]. Extraction of anomaly details is also assumed – related ports and addresses of attackers
and victims are obtained by looking into the top contributors to the entropy value.
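The detection rule can be written down directly. The sketch below is a literal transcription of Eq. 6.1 with illustrative function names and made-up entropy values; it is not the code of Anode.

```python
def relative_entropy(h, min_h, max_h, k=1.0):
    """Relative entropy value r from Eq. 6.1 for one (feature, alpha) pair."""
    if not 1.0 <= k <= 2.0:
        raise ValueError("k is expected to lie in the range 1..2")
    if max_h == min_h:
        raise ValueError("profile bounds must differ")
    return (h - k * min_h) / (k * (max_h - min_h))

def is_anomalous(r):
    """Values below 0 or above 1 indicate abnormal concentration or dispersion."""
    return r < 0 or r > 1

# Made-up profile bounds (min = 3.0, max = 5.0) and observed entropies.
for observed in (2.0, 4.0, 6.0):
    r = relative_entropy(observed, 3.0, 5.0, k=1.0)
    print(observed, r, is_anomalous(r))
```

With k = 1 an observation inside the profile bounds gives a value between 0 and 1, so only the first and the last observation are reported.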
6.2. Implementation
A proof of concept implementation of Anode has been developed in the Microsoft .NET environment in
the C# language. All experiments presented in this Thesis have been conducted with it. This implementation
detects anomalies only in an off-line mode. A large number of options for testing purposes is
provided. The software produces Weka arff files based on entropy calculations for each network feature
distribution. Recorded NetFlow data (e.g. whole day traffic) has to be captured and labeled in advance.
Classification performance is evaluated with Weka (ten-fold cross-validation mode) based on the provided
arff files. The user interface of the proof of concept implementation is presented in Fig. 6.3.
Currently Anode is also a module of the anomaly detection and security event data correlation system
developed in the SECOR project [CKP+ 11]. The implementation in SECOR has been developed in the Java
WSO2 [WSO] environment. WSO2 is a Service Oriented Architecture (SOA) middleware platform built
on the Open Service Gateway initiative (OSGi) [OSG]. The WSO2 environment contains, among other elements,
an Application Server, an Enterprise Service Bus, a Complex Event Processing (CEP) engine and a Web Service
framework. The SECOR implementation of Anode allows on-line detection and classification of anomalies
based on NetFlow reports coming in real time from probes deployed in the network. Each time an
anomaly is detected, Anode sends an alert to the CEP engine. These alerts conform to the STIX/CybOX
format [Bar13], a new and promising standard proposed by MITRE [Cor15]. The implementation of Anode
in SECOR as a WSO2 feature is depicted in Fig. 6.4. An exemplary STIX/CybOX alert indicating a port scan
anomaly detected by Anode is presented in Listing 6.1. It is noticeable that besides information about
the timestamp and anomaly type, additional data regarding suspected attackers and victims as well as the
level of confidence are provided.
One of the features of Anode is the visualization of timeseries of volume-based counters and of the entropy
of network features. Such a visualization is presented in Fig. 6.5. The border of the legitimate area is
marked with red and blue lines. Everything below or above it can be treated as anomalous. The
anomalous areas in the presented timeseries are marked with red ovals. It can be seen that there are anomalies
which are invisible in the traffic volume expressed by flows, packets and octets but visible via the entropy of
network features such as source and destination addresses and destination ports.
Figure 6.3: Anode – proof of concept implementation
Figure 6.4: Anode – the implementation in SECOR
Listing 6.1: Exemplary STIX/CybOX alert generated by Anode
<stix:STIX_Header>
  <stix:Title>Port Scan</stix:Title>
  <stix:Information_Source>
    <stixCommon:Identity>
      <stixCommon:Name>Anode</stixCommon:Name>
    </stixCommon:Identity>
    <stixCommon:Time>
      <cyboxCommon:Produced_Time>2015-02-0T15:42:24Z</cyboxCommon:Produced_Time>
    </stixCommon:Time>
  </stix:Information_Source>
</stix:STIX_Header>
<stix:Indicators>
  <stix:Indicator timestamp="2015-02-0T15:39:24Z" id="7c3885fe" xsi:type="IndicatorType">
    <indicator:Title>Attackers</indicator:Title>
    <indicator:Type xsi:type="IndicatorTypeVocab">IP Watchlist</indicator:Type>
    <indicator:Description>Potential attackers</indicator:Description>
    <indicator:Observable id="1c798262">
      <cybox:Object id="1980ce43">
        <cybox:Properties xsi:type="AddressObject:AddressObjectType">
          <Address_Value condition="Equals">10.10.0.155</Address_Value>
        </cybox:Properties>
      </cybox:Object>
    </indicator:Observable>
    <indicator:Confidence>
      <stixCommon:Value xsi:type="HighMediumLowVocab">Medium</stixCommon:Value>
    </indicator:Confidence>
  </stix:Indicator>
  <stix:Indicator timestamp="2015-02-0T15:39:24Z" id="404d122c" xsi:type="IndicatorType">
    <indicator:Title>Victims</indicator:Title>
    <indicator:Type xsi:type="IndicatorTypeVocab">IP Watchlist</indicator:Type>
    <indicator:Description>Potential victims</indicator:Description>
    <indicator:Observable id="1c798262">
      <cybox:Object id="1980ce43">
        <cybox:Properties xsi:type="AddressObjectType">
          <Address_Value condition="Equals" apply_condition="Any">1.1.0.7,1.1.0.9</Address_Value>
        </cybox:Properties>
      </cybox:Object>
      <cybox:Object id="Port numbers:obj-1980ce43-8e03">
        <cybox:Properties xsi:type="PortObjectType">
          <Port_Value condition="Equals" apply_condition="Any">21,22,80,443</Port_Value>
        </cybox:Properties>
      </cybox:Object>
    </indicator:Observable>
    <indicator:Confidence>
      <stixCommon:Value xsi:type="HighMediumLowVocab">High</stixCommon:Value>
    </indicator:Confidence>
  </stix:Indicator>
</stix:Indicators>
Figure 6.5: Timeseries visualisation in Anode
7. Dataset
This chapter presents the dataset developed to evaluate the proposed method. This dataset is based
on real legitimate traffic and synthetic anomalies. The chapter starts with the origin of the idea. Next,
some details concerning legitimate and anomalous traffic are presented. Finally, an explanation of the
anomaly generation process is given.
7.1. Origin of the idea
The effort to build our own dataset was undertaken due to:
– limited availability of datasets for network anomaly detection;
– the lack of proper labeling in shared datasets;
– the fact that most of the available datasets are obsolete in terms of legitimate traffic and anomalies;
– the absence of realistic data in datasets;
– the small number of datasets with flows (conversion from packets is necessary, which results in loss of
labels);
– incompleteness of data (narrow range of anomalies, lack of anomalies related to botnet-like malware).
More details about the existing datasets are presented in Chapter 2 and in our paper [MBM15]. Due
to the aforementioned reasons, labeled traffic traces based on real legitimate traffic
and synthetic anomalies were developed.
7.2. Legitimate traffic
Firstly, one week of legitimate traffic from a medium-size local network connected to the Internet was
captured. This was accomplished using open source software, Softflowd [Sof] and NfDump [NfS],
as described in Chapter 5. The captured data in the form of labeled bidirectional flows was exported to
the relational database. Because the daily profile of each working day in the captured traffic was similar
(except for some minor differences on Monday morning and Friday afternoon), a one-day profiling approach
was chosen. So, from the whole data it was enough to extract two days (Tuesday, Wednesday) in order
to build the dataset. The first day is designated for training (only legitimate traffic) and the second day
for detection (legitimate traffic + injected anomalies). The profile expressed by the number of flows of
this 2-day traffic (before any injection of anomalies) is depicted in Fig. 7.1.
Figure 7.1: Legitimate traffic profile by number of flows
Figure 7.2: Legitimate traffic profile by number of packets
The x axis shows time t (5-minute fixed time window) and the y axis (log scale) shows the number of flows.
The working day starts around 7 a.m. and finishes around 4 p.m. The volume of the traffic expressed
by the number of flows is similar for both days, but looking at the volume expressed by the number of
packets (Fig. 7.2), this similarity is a bit lower. Some global characteristics of the traffic are presented in
Table 7.1.
Table 7.1: Global characteristics of legitimate traffic
Feature             Total count
Flows               767 498
Distinct src ip     733
Distinct dst ip     12 977
Distinct dst port   23 140
Packets             57 239 939
Bytes               46 216 539 894
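The flow-count profile of Fig. 7.1 is a count of flows per fixed 5-minute window. A minimal sketch of this aggregation is given below; the input format (a list of flow start timestamps in seconds) and the function name are assumptions made for illustration and are not taken from the tooling described in Chapter 5.

```python
from collections import Counter

WINDOW_SECONDS = 300  # fixed 5-minute time window, as in Fig. 7.1


def flows_per_window(flow_start_times, window=WINDOW_SECONDS):
    """Count flows per fixed time window.

    flow_start_times: iterable of flow start timestamps in seconds
    (hypothetical input; in the thesis the flows are stored in a database).
    Returns a dict {window_start_second: number_of_flows}, sorted by time.
    """
    counts = Counter((int(t) // window) * window for t in flow_start_times)
    return dict(sorted(counts.items()))


# Toy example: five flows falling into three windows.
profile = flows_per_window([0, 12, 299, 300, 610])
print(profile)  # {0: 3, 300: 1, 600: 1}
```

The same routine applied to packet counts instead of flow counts would give the profile of Fig. 7.2.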
Flow breakdown according to the transport protocol is depicted in Fig. 7.3. As it can be seen utilization of ICMP protocol is negligible.
Figure 7.3: Flow breakdown according to the transport protocol
Flow breakdown according to the services is depicted in Fig. 7.4. It is noticeable that most of the
traffic is connected with web browsing (HTTP, HTTPS and significant part of DNS). Flow breakdown
according to the activity of hosts is depicted in Fig. 7.5. One can see, that this distribution is rather even
and there are no distinctly active hosts. Flow breakdown according to the activity of servers is depicted
in Fig. 7.6. It is observable that the most utilized servers are primary and secondary DNS.
In the next step implementation of different scenarios of malicious network activities was prepared.
Synthetic anomalies typical for botnet-like network behavior were generated and then injected into the
legitimate traffic. More details concerning anomaly generation process are presented in Section 7.6.
Figure 7.4: Flow breakdown according to the services
Figure 7.5: Flow breakdown according to the hosts
Figure 7.6: Flow breakdown according to the servers
7.3. Scenario 1
For this scenario, small and low-rate SSH brute force, port scan, SSH network scan and TCP SYN
flood DDoS anomalies were generated in different variants. These anomalies do not form realistic
traces of any particular malware, but detection and proper classification of such a set of anomalies is
crucial because they are typical for the behavior of botnet-like malware. The main characteristics of the
generated anomalies are presented in Table 7.2.
Table 7.2: Characteristics of anomalies in Scenario 1
Type                      Kind   No. of flows   Duration [sec]   No. of victims   No. of attackers
SSH brute force (bf)      1      1K             300              1                1
                          2      1K             100              1                1
                          3      2K             300              1                1
TCP SYN flood DDoS (dd)   1      2K             200              1                50
                          2      2K             200              1                250
                          3      3K             300              1                50
                          4      3K             300              1                250
                          5      4K             400              1                50
                          6      4K             400              1                250
SSH network scan (ns)     1      6K             60               6K               1
                          2      6K             300              6K               1
                          3      8K             80               8K               1
                          4      8K             400              8K               1
Port scan (ps)            1      1K             50               1                1
                          2      1K             100              1                1
                          3      2K             100              1                1
                          4      2K             200              1                1
The generated anomalies were mixed with the legitimate traffic from Day2 (Wednesday) in the way
presented in Fig. 7.7. Anomalies are not injected into Day1 (Tuesday) as it is intended for the profiling
of a legitimate traffic.
As it can be seen, each anomaly is injected every 15 minutes mainly during working time. After
injection only a few anomalies are visible in the volume expressed by a number of flows or a number of
packets as depicted respectively in Fig. 7.8 and Fig. 7.9.
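The injection schedule described above can be sketched as follows. This is only an illustration under stated assumptions: flows are reduced to hypothetical (timestamp, label) tuples, the working time is taken as 7 a.m. to 4 p.m., and the merging routine is not the procedure actually used to build the dataset.

```python
def injection_offsets(start_hour, end_hour, step_minutes):
    """Return injection start times as seconds from midnight."""
    return list(range(start_hour * 3600, end_hour * 3600, step_minutes * 60))


def inject(legit_flows, anomaly_flows, offset):
    """Shift anomaly timestamps by offset and merge them with legitimate flows.

    Flows are (timestamp_seconds, label) tuples; the result is time-sorted.
    """
    shifted = [(ts + offset, label) for ts, label in anomaly_flows]
    return sorted(legit_flows + shifted)


# One injection slot every 15 minutes between 7 a.m. and 4 p.m. (assumed hours).
offsets = injection_offsets(7, 16, 15)

# Toy Day2 traffic and a two-flow port scan injected into the first slot.
legit = [(25190, "legit"), (25210, "legit")]
port_scan = [(0, "ps"), (5, "ps")]
merged = inject(legit, port_scan, offsets[0])
print(len(offsets), merged)
```

Keeping the label next to each flow is what allows the injected anomalies to remain identifiable after merging.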
Figure 7.7: Distribution of anomalies in time in Scenario 1
Figure 7.8: Legitimate and anomalous traffic by number of flows in Scenario 1
Figure 7.9: Legitimate and anomalous traffic by number of packets in Scenario 1
7.4. Scenario 2
For this scenario, a much more realistic sequence of modern botnet-like malware behavior was
generated. The subsequent stages are as follows:
1. One of the hosts in the local network gets infected with botnet-like malware. In order to propagate
via the network it starts scanning its neighbors. The malware is looking for hosts running Remote Desktop
Protocol (RDP) services. RDP is a proprietary protocol developed by Microsoft, which provides
a user with a graphical interface to connect to another computer over a network. RDP servers are
built into Windows operating systems. By default, the servers listen on TCP/UDP port 3389.
2. Hosts serving Remote Desktop services are attacked with a dictionary attack (similarly to the
technique found in the MORTO worm [Bit11]).
3. After a successful dictionary attack, vulnerable machines get infected and become members of the
botnet.
4. Peer-to-peer communication based on the UDP transport protocol is established among the infected
hosts.
5. On a command from the C&C server, botnet members start a low-rate DDoS attack called Slowloris [DDL+12] on an external HTTP server. After a few minutes the server is blocked.
The whole scenario is presented in Fig. 7.10. The main characteristics of the generated anomalies are presented in Table 7.3.
Figure 7.10: Scenario 2
Table 7.3: Characteristics of anomalies in Scenario 2
Type                   No. of flows   Duration [sec]   No. of victims   No. of attackers
Network scan (ns)      252            200              252              1
RDP brute force (bf)   720            550              53               1
Botnet p2p (p2p)       150            185              15               15
Slowloris DDoS (dd)    1124           117              15               1
Anomalies generated for the scenario were mixed with the legitimate traffic from Day2 (Wednesday)
in the way presented in Fig. 7.11.
It is noticeable that the whole scenario, which consists of four anomalies, is injected every hour during
working time. Anomalies in this scenario are small and slow. They represent only a small fraction of
total traffic, so after injection none of them is visible in the volume expressed by the number of flows or
the number of packets, as depicted in Fig. 7.12 and Fig. 7.13 respectively.
7.5. Scenario 3
For this scenario, another realistic sequence of modern botnet-like malware behavior was generated.
The subsequent stages are as follows:
1. One of the hosts in the local network, infected with modern botnet malware, starts scanning its neighbors in order to propagate. It uses a network propagation mechanism similar to the one employed in the
Stuxnet worm [Stu11], [Den12], [BPBF12]. The malware is looking for hosts with open TCP and
UDP ports reserved for Remote Procedure Call (RPC). In Windows, RPC is an inter-process communication mechanism that enables data exchange and invocation of functionality residing in a
different process, locally or via the network. The list of ports used to initiate a connection with RPC is
as follows: UDP – 135, 137, 138, 445, TCP – 135, 139, 445, 593.
2. Hosts with open RPC ports are attacked with specially crafted RPC requests.
3. After successful exploitation, vulnerable machines get infected and become members of the botnet.
4. Direct communication with a single C&C server is established by each infected host.
5. On a command from the C&C server, botnet members start a DDoS amplification attack based on the Network
Time Protocol (NTP). This attack is targeted at an external server. Botnet members send packets
with a forged source IP address (set to the address of the victim). Thus, replies from the NTP server
are sent to the victim instead of to the attackers. Moreover, this attack is amplified, i.e. the attackers
send a small (234 bytes) packet with a command to get a list of interacting machines and the NTP
server sends a large (up to 200 times bigger) reply to the victim. As a result, the attackers turn
a small amount of bandwidth coming from a small number of machines into a significant traffic
load hitting the victim. More details regarding NTP amplification DDoS attacks can be found
in [KHRH14].
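The amplification effect of the last stage can be illustrated with simple arithmetic. The request size (234 bytes) and the upper bound on amplification (200 times) come from the description above and the number of bots (63) from Table 7.4; the number of requests per bot is a purely hypothetical value chosen for the example.

```python
REQUEST_BYTES = 234       # size of the spoofed request, from the text
MAX_AMPLIFICATION = 200   # reply is "up to 200 times bigger", from the text
ATTACKERS = 63            # number of bots, from Table 7.4


def victim_load_bytes(requests_per_bot, attackers=ATTACKERS,
                      request_bytes=REQUEST_BYTES,
                      amplification=MAX_AMPLIFICATION):
    """Return (bytes sent by the bots, upper bound on bytes hitting the victim)."""
    sent = attackers * requests_per_bot * request_bytes
    return sent, sent * amplification


# 10 requests per bot is an assumed figure, not a value from the dataset.
sent, received = victim_load_bytes(requests_per_bot=10)
print(sent, received)  # 147420 29484000
```

Under these assumptions, about 147 KB of requests translates into at most about 29.5 MB of replies directed at the victim.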
The whole scenario is presented in Fig. 7.14. The main characteristics of the generated anomalies are presented in Table 7.4.
Figure 7.14: Scenario 3
Table 7.4: Characteristics of anomalies in Scenario 3
Type                              No. of flows   Duration [sec]   No. of victims   No. of attackers
Block scan (bs)                   1.5K           80               168              1
RPC attack (rpc)                  650            200              90               1
Botnet C&C communication (c&c)    125            190              63               1
NTP DDoS (dd)                     2.9K           580              1                63 (spoofed to 1)
Anomalies generated for the scenario were mixed with the legitimate traffic from Day2 (Wednesday)
in the way presented in Fig. 7.15.
As can be seen, the whole scenario, which consists of four anomalies, is injected every hour during
working time. Similarly to Scenario 2, the anomalies here are small and slow and they represent only a small
fraction of total traffic. After injection none of them is visible in the volume expressed by the number of
flows or the number of packets, as depicted in Fig. 7.16 and Fig. 7.17 respectively.
7.6. Anomaly generator
In order to produce flows that can mimic anomalous behavior, a dedicated tool in the Python language
was developed [BPMP14]. With this tool one can generate flows according to a predefined policy. The
policy assigns a certain type of generation method to each field of the flow record. In consequence, a set of
flows which meets a given statistical profile can be obtained.
Listing 7.1: Default generator group
[testgroup]
protocol = con[TCP]
srcIP = con[10.5.0.77]
dstIP = ran[10.1.0.1; (["0.0.0.1", "0.0.0.2", "-0.0.0.1"],[0.97,0.15,0.15]);
(10.1.0.1, 10.1.0.253)]
srcPort = ran[uniform(300, 500)]
dstPort = con[22]
fromSrcPkts = con[1]
fromSrcOctets = con[60]
fromDstPkts = con[1]
fromDstOctets = con[60]
#duration
dur = con[1]
#inter arrival time
iar = per[300:ran[uniform(10, 50)]; 800:con[500];
ran[([10, 11, 12, 13], [0.20, 0.30, 0.40, 0.10])]]
flags = con[SYN|ACK|RST]
Internally, the tool operates on integer values which are manipulated by generation methods introduced in [BWM08]: con (constant), ran (random) and per (periodical). The con generator is straightforward and needs no further explanation; the others are described below.
Ran generators are used to obtain random values. There are two types of such generators: absolute (e.g. srcPort in Listing 7.1) and relative (e.g. dstIP in Listing 7.1). The value produced by a relative generator is added
to the previously generated one. This feature can be used to sweep across a certain range of values. Both
types can be initialized with either a uniform or an arbitrary distribution. An arbitrary distribution consists
of two lists: values and probabilities of these values. A relative generator additionally needs a start value
and a range.
Per generators are used to match a certain generation method with the sequence number of
the currently generated flow. They are initialized with a list of key-value pairs, in which the key
is the flow number and the value is the generator definition. The default generator is placed in the last
position. For example, the iar definition in Listing 7.1 means that a uniform(10, 50) generator is applied to
every 300th flow and a constant generator returning 500 to every 800th flow. In all other cases the
default generator is applied.
The set of generators shown in Listing 7.1 is called a generator group.
A policy may consist of multiple groups. In such a case the probability of using a certain generator group
must be defined by means of a volume declaration, as presented in Listing 7.4. Only one generator group
in a policy (considered the default) has a generator for each field of the flow. The additional groups may
override all or selected definitions of the default one. The concept of a generator group was introduced to
ensure that the fields of a flow are consistent with each other, for example to disallow flows which are
too short compared with the number of bytes they carry.
There are phenomena on the network that
can only be modeled with sequences of flows. Our tool provides this functionality through indexing of
group names. In such indexed groups, one can use mechanisms which allow sharing
state between subsequent flows. For example, in Listing 7.2 we enforced that the value of dstIP does not
change throughout the whole sequence.
Listing 7.2: Flows sequence modelling
[testgroup.1]
dstIP = args[usePrevValue]
dur = con[100]
[testgroup.2]
dur = con[1000]
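The three generation methods described in this section (con, ran in its absolute and relative variants, and per) can be sketched in a few lines of Python. This is an illustrative reconstruction and not the code of the tool from [BPMP14]: the function names, the (flow number, previous value) calling convention and the wrap-around at the end of the range are all assumptions.

```python
import random


def con(value):
    """Constant generator: always returns the same value."""
    return lambda flow_no, prev: value


def ran_uniform(low, high, rng=random):
    """Absolute random generator with a uniform integer distribution."""
    return lambda flow_no, prev: rng.randint(low, high)


def ran_relative(start, steps, probs, low, high, rng=random):
    """Relative random generator: adds a randomly drawn step to the previous
    value; wrapping inside [low, high] is an assumption of this sketch."""
    def gen(flow_no, prev):
        if prev is None:
            return start
        nxt = prev + rng.choices(steps, weights=probs)[0]
        return low + (nxt - low) % (high - low + 1)
    return gen


def per(rules, default):
    """Periodical generator: rules is a list of (n, generator) pairs; the
    first rule whose n divides the flow number is used, else the default."""
    def gen(flow_no, prev):
        for n, g in rules:
            if flow_no % n == 0:
                return g(flow_no, prev)
        return default(flow_no, prev)
    return gen


# Sweep over host numbers 1..253 with a fixed step of 1 (cf. dstIP in Listing 7.4).
dst_ip = ran_relative(1, [1], [1.0], 1, 253)
# Inter-arrival time in the spirit of Listing 7.1, with a constant default.
iar = per([(300, ran_uniform(10, 50)), (800, con(500))], con(10))

prev = None
sweep = []
for flow_no in range(1, 6):
    prev = dst_ip(flow_no, prev)
    sweep.append(prev)
print(sweep)  # [1, 2, 3, 4, 5]
```

Representing every generator as a function of the flow number and the previous value keeps the three methods interchangeable, which is what a policy file needs when it assigns a method to each flow field.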
To model more advanced scenarios, where a sequence of anomalies is generated and state is shared
not only between subsequent flows but also between subsequent generation groups representing particular anomalies, a top-level policy file can be used. This mechanism is presented in Listing 7.3. In this model, the SSH
scan anomaly is generated first and then (after 5 s, according to offset) an SSH brute force attack is mimicked.
Only hosts found to be running the SSH service in the first step are passed to the second step, so the brute force
attack is performed only on vulnerable hosts. Generators for the SSH scan and the SSH brute force are presented
in Listing 7.4 and Listing 7.5 respectively.
An example of a similar generator is FLAME [BWM08], [Bra10]. There are, however, some
significant differences. FLAME comes with very basic support for generating flows, forcing users to
implement all the generation logic by themselves, while our tool supports policy files. On the other hand,
FLAME has fairly sophisticated functionality for inserting generated flows into the base traffic, which
our tool does not support at all. Another interesting approach was introduced by Shiravi et al. [SSTG12]. The
authors proposed to describe network traffic (not only flows) by a set of so-called α- and β-profiles
which can subsequently be used to generate a dataset. The α-profiles consist of actions which should
be executed to generate a given event in the network (such as an attack), while β-profiles are more similar
to our policy files, where the behavior of certain entities (packet sizes, number of packets per flow) is
represented by a statistical model. On the whole this concept is similar to ours but far more complex and
thus more difficult to use.
Listing 7.3: Anomaly sequence modelling – top level policy file
[step1]
#network scan (SSH service)
offset = 0
generator_config_file = ssh_scan.config
[step2]
offset = 5000
generator_config_file = ssh_brute.config
#filtering hosts running SSH services
filter = SSHScan-open
#list of filtered hosts
template = {'__DST_IP_LIST__':{'list': { 'source': 'DST_IP' }}}
Listing 7.4: Anomaly sequence modelling – ssh_scan.config
[SSHScan-noResponse]
maxflows = 252
protocol = con[TCP]
srcIP = con[10.1.0.2]
dstIP = ran[10.1.0.3; (["0.0.0.1"],[1.0]); (10.1.0.1, 10.1.0.253)]
dstPort = con[22]
dur = con[0]
iar = ran[uniform(0, 3000) ]
flags = con[SYN]
[SSHScan-reset]
volume = 0.34
flags = con[SYN|RST|ACK]
dur = ran[([0, 1, 2, 15],[0.70,0.25,0.03,0.02])]
[SSHScan-open]
volume = 0.05
fromDstOctets = ran[uniform(180,210)]
dur = ran[uniform(40, 300)]
flags = con[FIN|SYN|RST|PSH|ACK]
iar = con[2]
[SSHScan-open.1]
srcPort = args[incrementPrevValue]
fromSrcPkts = ran[uniform(19,21)]
fromSrcOctets = ran[uniform(2500,2700)]
fromDstPkts = ran[uniform(14,18)]
fromDstOctets = ran[uniform(2850,2900)]
dur = ran[uniform(6000, 11000)]
flags = con[FIN|SYN|PSH|ACK]
Listing 7.5: Anomaly sequence modelling – ssh_brute.config
[bruteSSH]
maxflows = 1000
protocol = con[TCP]
srcIP = con[10.1.0.2]
dstIP = lis[__DST_IP_LIST__]
srcPort = ran[ 60310; (["1","2"],[0.9, 0.1]); (60300, 60400) ]
dstPort = con[22]
fromSrcPkts = ran[uniform(14, 15) ]
fromSrcOctets = ran[uniform(1400, 1500) ]
fromDstPkts = ran[uniform(9, 11) ]
fromDstOctets = ran[uniform(1200, 2400) ]
dur = ran[uniform(800, 1200) ]
iar = ran[uniform(1000, 3000) ]
flags = con[FIN|SYN|PSH|ACK]
[bruteSSH-success-1]
volume = 0.2
[bruteSSH-success-1.1]
[bruteSSH-success-2]
volume = 0.1
[bruteSSH-failure]
volume = 0.7
[bruteSSH-failure.1]
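The role of the volume declarations in Listing 7.4 and Listing 7.5 can be illustrated with a short sketch of group selection. It is written under the assumption that the default group receives whatever probability mass the additional groups leave unassigned; the function name and this convention are hypothetical and are not confirmed by the description of the tool.

```python
import random


def pick_group(volumes, default, rng=random):
    """Pick a generator group name according to its declared volume.

    volumes: dict {group_name: probability}; the default group receives the
    remaining probability mass (an assumption of this sketch).
    """
    remainder = max(0.0, 1.0 - sum(volumes.values()))
    names = list(volumes) + [default]
    weights = list(volumes.values()) + [remainder]
    return rng.choices(names, weights=weights)[0]


# Volumes taken from Listing 7.4; the seed only makes the example repeatable.
rng = random.Random(0)
volumes = {"SSHScan-reset": 0.34, "SSHScan-open": 0.05}
draws = [pick_group(volumes, "SSHScan-noResponse", rng) for _ in range(10000)]
share_open = draws.count("SSHScan-open") / len(draws)
print(round(share_open, 2))
```

With these volumes roughly one flow in twenty would belong to the SSHScan-open group, which is the group later passed to the brute force step by the filter in Listing 7.3.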
8. Verification of the approach
This chapter presents verification of the proposed method. The aim of the verification is to check if
the method is able to detect network anomalies and categorize them. Firstly, results of correlation tests
performed in order to find the proper range of α-values and proper set of network features to use in the
method are presented. Next, the performance of the method is evaluated. Finally, some conclusions are
given.
8.1. Correlation
Firstly, correlation tests for various α-values and for various network features were performed. These
tests were important as strong correlation suggests that some results are closely related to each other and
thus it may be sufficient to restrict the scope of α-values and network features without impairing validity
of the method.
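In the experiments, Pearson [HK11] and Spearman [HK11] correlation coefficients were used. For
a sample of discrete random variables X, Y the Pearson coefficient is defined as:
\[ r_{X,Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y} \tag{8.1} \]
where
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \]
with \bar{Y} and s_y defined analogously.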
The Spearman coefficient for a sample of discrete random variables X, Y is defined as:
\[ r_{X,Y} = \mathrm{corr}(RX, RY) \tag{8.2} \]
where
corr – Pearson correlation coefficient for the sample,
RX – ranks of X,
RY – ranks of Y.
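Both coefficients can be computed directly from their definitions. The sketch below is a plain-Python illustration and not the implementation used in the experiments; assigning averaged ranks to tied values is an assumption, as the treatment of ties is not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotonic but non-linear relation: rank correlation is perfect,
# linear correlation is slightly lower.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

The example shows why both coefficients are reported: Spearman captures any monotonic dependence between two entropy timeseries, while Pearson measures only the linear part.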
The results of correlation between entropy timeseries for different α-values are presented in Table 8.1. The table shows the pairwise Tsallis α correlation scores from the range ⟨−1..1⟩, where absolute values in the ranges
|1 − 0.9|, |0.9 − 0.7|, |0.7 − 0.5| and |0.5 − 0| denote, respectively, strong, medium, weak, and no correlation. The sign determines whether the correlation is positive (no sign or +) or negative (−). The presented
values (see Table 8.1) are average scores from 15 different network features. Only results based on Tsallis
entropy are presented, as those obtained for Renyi entropy were similar.
Table 8.1: Results of correlation of α
Pearson
          α=−3    α=−2    α=−1    α=0     α=1     α=2     α=3
α=−3      1       0.99    0.96    0.66    0.12    −0.06   −0.09
α=−2      -       1       0.98    0.69    0.13    −0.06   −0.09
α=−1      -       -       1       0.75    0.16    −0.05   −0.08
α=0       -       -       -       1       0.44    0.18    0.12
α=1       -       -       -       -       1       0.88    0.82
α=2       -       -       -       -       -       1       0.97
α=3       -       -       -       -       -       -       1
Spearman
          α=−3    α=−2    α=−1    α=0     α=1     α=2     α=3
α=−3      1       0.97    0.837   0.46    0.06    −0.09   −0.11
α=−2      -       1       0.94    0.57    0.1     −0.07   −0.1
α=−1      -       -       1       0.72    0.15    −0.06   −0.09
α=0       -       -       -       1       0.49    0.2     0.15
α=1       -       -       -       -       1       0.87    0.79
α=2       -       -       -       -       -       1       0.9
α=3       -       -       -       -       -       -       1
It should be noticed that there is a strong positive linear (Pearson) and rank (Spearman) correlation
for negative α-values and a strong positive correlation between α-values which are higher than 1. For
α = 0 there is a small positive correlation with negative values. For α = 1 (Shannon) there is a medium
correlation with α = 2 and α = 3. These results suggest that it is sufficient to use α-values from the range
⟨−2..2⟩ to obtain different and distinctive sensitivity levels of entropy.
Some interesting results of pairwise correlation between Tsallis entropy timeseries of different network features are presented in Table 8.2 and Table 8.3. The results obtained for Renyi are not presented
as they closely resemble those obtained for Tsallis.
Results for one positive and one negative value of α are presented because these results differ significantly. Averaging (based on results from the whole range of α-values) would hide an essential property.
It is noticeable that there is a strong positive correlation of addresses and ports for negative α-values and
no correlation for positive α-values.
8.2. Performance evaluation
Experiments were performed for the Tsallis, Renyi and Shannon versions of our method as well as for the traditional volume-based approach with flow, packet and byte counters. Final evaluation was performed with
Weka [HFH+09]. Experiments were performed with the dataset presented in Chapter 7. Some exemplary
results of entropies for selected network feature distributions are presented below. Abnormally high dispersion in the destination address distribution for network scan anomalies, exposed by negative values of the
α parameter, is depicted in Fig. 8.1. Time t is on the x axis (5-minute time windows), the result r
on the y axis and α-values on the z axis. The r value corresponds to the thresholding rule applied in the method
[Eq. (6.1)]. Values of r outside the (0..1) threshold are considered anomalous. Anomalies are marked with
A on the time axis. Values of Shannon entropy are denoted as S. Abnormal concentration of flow durations for network scans is depicted in Fig. 8.2. This concentration is typical for anomalies with a fixed
data stream, i.e. anomalies where all flows have similar size. Fig. 8.3 shows ambiguous detection (no
distinctive pattern in exceeding the 0–1 threshold) of a port scan anomaly with flow, packet and byte counters. While experimenting, we noticed that measurements for all network features as a group work better
than single ones. In our experiments, address, port and duration feature distributions seemed to be the
most deterministic, although we believe that the proper set of network features is specific to particular
anomalies.
Table 8.2: Results of correlation of features for α = −3
Pearson
             src ip   dst ip   src port   dst port   in-degree   out-degree
src ip       1        0.89     0.89       0.91       0.37        0.35
dst ip       -        1        0.98       0.89       0.27        0.55
src port     -        -        1          0.86       0.15        0.5
dst port     -        -        -          1          0.41        0.53
in-degree    -        -        -          -          1           0.27
out-degree   -        -        -          -          -           1
Spearman
             src ip   dst ip   src port   dst port   in-degree   out-degree
src ip       1        0.9      0.85       0.87       0.47        0.69
dst ip       -        1        0.96       0.89       0.43        0.83
src port     -        -        1          0.83       0.3         0.69
dst port     -        -        -          1          0.53        0.12
in-degree    -        -        -          -          1           0.48
out-degree   -        -        -          -          -           1
Table 8.3: Results of correlation of features for α = 3
Pearson
             src ip   dst ip   src port   dst port   in-degree   out-degree
src ip       1        −0.07    −0.34      −0.02      −0.07       0.44
dst ip       -        1        −0.29      0.05       0.08        −0.28
src port     -        -        1          −0.42      0.59        −0.04
dst port     -        -        -          1          −0.39       0.01
in-degree    -        -        -          -          1           0.03
out-degree   -        -        -          -          -           1
Spearman
             src ip   dst ip   src port   dst port   in-degree   out-degree
src ip       1        0.03     −0.21      0.07       0.21        0.37
dst ip       -        1        −0.31      0.07       0.08        −0.35
src port     -        -        1          −0.55      0.64        0.23
dst port     -        -        -          1          0.52        0.76
in-degree    -        -        -          -          1           0.18
out-degree   -        -        -          -          -           1
Figure 8.1: Abnormally high dispersion in destination addresses for network scan anomalies
(Renyi/Shannon)
Figure 8.2: Abnormally high concentration in flows duration for network scan anomalies
(Tsallis/Shannon)
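The different sensitivity of parameterized entropy for negative and positive α-values can be illustrated numerically. The sketch below uses the standard textbook forms of Tsallis and Renyi entropy with a natural logarithm; the exact form and normalization used in the method are defined in Chapter 6 and may differ, so the functions should be read as an assumption made for illustration.

```python
from math import log


def shannon(probs):
    """Shannon entropy (natural logarithm), the limit of both forms for alpha -> 1."""
    return -sum(p * log(p) for p in probs if p > 0)


def tsallis(probs, alpha):
    """Tsallis entropy of a probability distribution (textbook form)."""
    if alpha == 1:
        return shannon(probs)
    return (sum(p ** alpha for p in probs if p > 0) - 1) / (1 - alpha)


def renyi(probs, alpha):
    """Renyi entropy of a probability distribution (textbook form)."""
    if alpha == 1:
        return shannon(probs)
    return log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


# A uniform distribution and one with three rare values (hypothetical data).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]

for alpha in (-2, 2):
    print(alpha, round(tsallis(uniform, alpha), 4), round(tsallis(skewed, alpha), 4))
```

For α = −2 the rare values dominate the sum and the entropy of the skewed distribution is far higher than that of the uniform one, while for α = 2 the dominant value decides and the order is reversed.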
Overall (whole data set, all network features) multi-class classification was performed with Weka.
Classes for each anomaly type plus one class for the legitimate traffic were defined. To correctly evaluate
predictive performance ten-fold cross-validation method [HFH+ 09] was used. From the performance
point of view, every classification attempt can produce one of four outcomes presented in Fig. 8.4.
Figure 8.3: Ambiguous detection of port scan anomaly with a volume-based approach
              predicted p            predicted n            total
actual p'     True Positive (TP)     False Negative (FN)    P'
actual n'     False Positive (FP)    True Negative (TN)     N'
total         P                      N
Figure 8.4: Possible results of classification
An ideal classifier should not produce False Positive (FP) and False Negative (FN) statistical errors.
To evaluate non-ideal classifiers, one can measure the proportion of correct assessments to all assessments
– Accuracy (ACC), the share of benign activities reported as anomalous – False Positive Rate (FPR), and
the share of anomalies missed by the detector – False Negative Rate (FNR). Another option is to use
Precision, the proportion of reported anomalies that are correct, and Recall, the share of correctly reported
anomalies in the total number of anomalies. Based on these measures, Receiver Operating Characteristics
(ROC) and Precision vs Recall (PR) plots are typically used [DG06]. Formulas for the mentioned metrics,
as well as some additional measures which can also be used to evaluate the performance of a classifier, are
presented in Table 8.4.
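All of these metrics follow from the four counts TP, FN, FP and TN. A minimal sketch is given below; the function name is hypothetical and the counts in the example are made up for illustration, they are not results of the experiments.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation metrics from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)   # True Positive Rate (Recall, Sensitivity)
    tnr = tn / (fp + tn)   # True Negative Rate (Specificity)
    ppv = tp / (tp + fp)   # Positive Predictive Value (Precision)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "FPR": 1 - tnr,
        "FDR": 1 - ppv,
        "FNR": fn / (fn + tp),
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Made-up counts: 100 anomalous and 900 benign time windows.
m = classification_metrics(tp=90, fn=10, fp=30, tn=870)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this example Accuracy is 0.96 although one in four alarms is false (Precision 0.75), which illustrates why Accuracy alone is not sufficient when anomalies are rare.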
Table 8.4: Metrics used to evaluate performance of classification
Name                                                      Formula
True Positive Rate (TPR) eqv. with Recall, Sensitivity    TPR = TP / (TP + FN)
True Negative Rate (TNR) eqv. with Specificity            TNR = TN / (FP + TN)
Positive Predictive Value (PPV) eqv. with Precision       PPV = TP / (TP + FP)
Negative Predictive Value (NPV)                           NPV = TN / (TN + FN)
False Positive Rate (FPR) eqv. with Fall-out              FPR = FP / (FP + TN) = 1 − TNR
False Discovery Rate (FDR)                                FDR = FP / (FP + TP) = 1 − PPV
False Negative Rate (FNR)                                 FNR = FN / (FN + TP)
Accuracy (ACC)                                            ACC = (TP + TN) / (TP + FN + FP + TN)
F1 score – harmonic mean of Precision and Recall          F1 = 2TP / (2TP + FP + FN)
To work effectively, an anomaly detector should detect a substantial percentage of anomalies in the
supervised system while still keeping the False Positive Rate at an acceptable level. We now present a
realistic example which shows that, for a reasonable set of assumptions, the FPR is the limiting factor for
the performance of anomaly detection. This is due to the base-rate fallacy, first pointed out in this
context by Axelsson [Axe99]: in order to achieve a reasonable realistic detection rate, known as the
Bayesian Detection Rate (BDR), we have to achieve a very low FPR. Following Axelsson, let us assume
that:
– I: intrusive (anomalous) event in a system;
– ∼ I: non-intrusive event;
– A: alarm signaled;
– ∼ A: no alarm fired;
– TPR = P (A|I);
– FPR = P (A| ∼ I).
Goal is to maximize both:
– Bayesian Detection Rate (BDR), P (I|A);
– Bayesian True Negative Rate P (∼ I| ∼ A).
The fallacy stems directly from Bayes' theorem [Bay63], which relates prior and posterior
probabilities of events. According to Bayes' theorem:
\[ P(I|A) = \frac{P(A|I)\,P(I)}{P(A|I)\,P(I) + P(A|{\sim}I)\,P({\sim}I)} \tag{8.3} \]
Our anomaly detector tests for the presence of anomalies in 12 NetFlow reports per hour, as each report
includes the last 5 minutes of network traffic. During a day this gives 12 ∗ 24 = 288 reports to test. Let us
make a realistic assumption that most of the network traffic is not malicious and that an anomaly is present
in one report in every hundred. This allows us to calculate the following a priori probabilities:
– P (I) = 0.01
– P (∼ I) = 1 − P (I) = 0.99
Now let us assume that our detector has the following characteristic:
– TPR = P (A|I) = 0.9
– FPR = P (A| ∼ I) = 0.01
Taking all these values, let us calculate the BDR:
\[ P(I|A) = \frac{P(A|I)\,P(I)}{P(A|I)\,P(I) + P(A|{\sim}I)\,P({\sim}I)} = \frac{0.9 \cdot 0.01}{(0.9 \cdot 0.01) + (0.01 \cdot 0.99)} \approx 0.48 \tag{8.4} \]
If the characteristic of our detector is 5 times weaker in terms of false alarms (FPR = 0.05), then the
BDR drops to ≈ 0.15, and if it is 10 times weaker (FPR = 0.1), then the BDR drops to ≈ 0.08,
which is unacceptable.
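The influence of the FPR on the Bayesian Detection Rate can be checked with a few lines of Python. The function below is a direct transcription of Bayes' theorem for this case; its name is hypothetical, and the probabilities are the ones assumed in the example above.

```python
def bayesian_detection_rate(tpr, fpr, p_intrusion):
    """P(I|A) computed from Bayes' theorem."""
    true_alarms = tpr * p_intrusion          # P(A|I) P(I)
    false_alarms = fpr * (1 - p_intrusion)   # P(A|~I) P(~I)
    return true_alarms / (true_alarms + false_alarms)


# TPR = 0.9 and P(I) = 0.01 as assumed in the text, for three FPR levels.
results = {}
for fpr in (0.01, 0.05, 0.1):
    results[fpr] = bayesian_detection_rate(tpr=0.9, fpr=fpr, p_intrusion=0.01)
    print(f"FPR={fpr}: BDR={results[fpr]:.3f}")
```

The computed values are approximately 0.476, 0.154 and 0.083, so even with a TPR of 0.9 more than half of the alarms are false as soon as the FPR reaches the level of the a priori probability of an anomaly.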
In our approach we deal with a multi-class classification problem, where more than two classes are taken
into consideration, in contrast to binary classification (or detection), where only two classes, i.e.
anomalous and not anomalous, are used. We classify instances into one of many classes such as port
scan, network scan, brute force, etc. We use classifiers from Weka which internally transform a multi-class problem into multiple binary ones. To handle this, Weka uses the One-vs-All approach [Rif08]. The idea
behind this approach is:
– take n binary classifiers (one for each class);
– for the ith classifier, let the positive examples be all the points in class i, and let the negative
examples be all the points not in class i;
– let f_i be the ith classifier, then classify with the following rule:
\[ f(x) = \arg\max_{i} f_i(x) \tag{8.5} \]
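The decision rule of Eq. (8.5) can be sketched as follows. The three scoring functions are hypothetical linear scorers invented for the example; they stand in for the binary classifiers that Weka trains internally and do not reproduce them.

```python
def one_vs_all_predict(classifiers, x):
    """Return the class whose binary classifier gives the highest score.

    classifiers: dict {class_name: scoring function f_i(x)}.
    """
    return max(classifiers, key=lambda name: classifiers[name](x))


# Hypothetical scorers over a 2-element feature vector.
classifiers = {
    "port scan": lambda x: 2.0 * x[0] - 1.0 * x[1],
    "network scan": lambda x: -1.0 * x[0] + 2.0 * x[1],
    "not anomalous": lambda x: 0.5 - x[0] - x[1],
}

print(one_vs_all_predict(classifiers, (0.9, 0.1)))  # port scan
print(one_vs_all_predict(classifiers, (0.1, 0.9)))  # network scan
print(one_vs_all_predict(classifiers, (0.0, 0.0)))  # not anomalous
```

Each instance is scored by every binary classifier and assigned to the class with the highest score, so exactly one class is always returned, including the not anomalous one.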
Averaged Accuracy and averaged FPR are the standard measures used to assess the performance of multi-class classifiers. We also propose our own measurement method, namely weighted ROC curves, which
is presented later in this section. Evaluation results based on Scenario 1 are presented in Table 8.5,
which lists the results for several popular classifiers. ZeroR is a trivial algorithm which
classifies the whole traffic as not anomalous. We included it here as a reference, as it
is expected that other classifiers perform better than one which does not detect anomalies at all. Evaluation
results for Scenario 2 and Scenario 3 are presented in Table 8.6 and Table 8.7 respectively.
It is noticeable that the best performance in each scenario was obtained by applying SimpleLogistic.
In Weka, SimpleLogistic is a classifier for building linear logistic regression models [SFH05]. Logistic
regression comes from the fact that linear regression [SL12] can be used to perform classification. The
idea of logistic regression is to make linear regression produce probabilities, thus instead of class prediction, there is a prediction of class probabilities. More details on SimpleLogistic can be found in [LHF05]
and [SFH05]. If we look at the detailed results of SimpleLogistic (Renyi entropy case) for all scenarios (Table 8.8), it can be seen that different classes are characterized by rather different performance of
recognition. For example, models for network scan and not anomalous are very strong, whereas that for
p2p is much weaker.
Table 8.5: Averaged performance of classification – Scenario 1
Accuracy
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.66    0.89            0.90                0.93            0.93
Renyi          0.66    0.88            0.89                0.90            0.93
Shannon        0.66    0.84            0.86                0.90            0.92
volume-based   0.66    0.72            0.77                0.76            0.80
FPR
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.66    0.07            0.08                0.07            0.06
Renyi          0.66    0.08            0.09                0.11            0.09
Shannon        0.66    0.08            0.11                0.12            0.08
volume-based   0.66    0.21            0.15                0.22            0.20
Table 8.6: Averaged performance of classification – Scenario 2
Accuracy
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.68    0.82            0.84                0.85            0.91
Renyi          0.68    0.83            0.88                0.89            0.92
Shannon        0.68    0.77            0.80                0.84            0.89
volume-based   0.68    0.68            0.73                0.78            0.80
FPR
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.68    0.22            0.14                0.27            0.11
Renyi          0.68    0.15            0.12                0.20            0.11
Shannon        0.68    0.29            0.21                0.28            0.15
volume-based   0.68    0.68            0.20                0.15            0.28
Table 8.7: Averaged performance of classification – Scenario 3
Accuracy
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.68    0.83            0.83                0.87            0.93
Renyi          0.68    0.83            0.83                0.85            0.94
Shannon        0.68    0.76            0.80                0.85            0.90
volume-based   0.68    0.68            0.62                0.65            0.66
FPR
               ZeroR   Bayes Network   Decision Tree J48   Random Forest   Simple Logistic
Tsallis        0.68    0.13            0.17                0.22            0.10
Renyi          0.68    0.13            0.16                0.22            0.06
Shannon        0.68    0.23            0.16                0.22            0.13
volume-based   0.68    0.68            0.57                0.45            0.67
Table 8.8: Detailed performance of SimpleLogistic classifier (Renyi entropy case)
TPR/FPR            Scenario1    Scenario2    Scenario3
brute force        0.78/0       1/0.01       −
network scan       0.92/0.02    0.9/0        −
port scan          0.92/0.01    −            −
block scan         −            −            0.9/0.01
DDoS               0.67/0.01    0.9/0        0.9/0.01
p2p                −            0.3/0.02     −
c&c                −            −            0.9/0.01
RPC exploitation   −            −            0.7/0.01
not anomalous      0.98/0.13    0.97/0.16    0.97/0.08
As mentioned before, ROC plots can also be used to evaluate the performance of a classifier. They
present a more detailed characteristic than Accuracy. The ROC curve is obtained for a classifier by plotting
FPR on the x axis and TPR on the y axis. The Area Under the Curve (AUC) is a scalar measure
connected with a ROC. While evaluating the classifier, the ROC plot considers all possible operating
points (thresholds) in the classifier's prediction in order to identify the operating point at which the best
performance is achieved. A ROC curve does not directly present the optimal value; instead it shows
the tradeoff between TPR and FPR. Depending on the goals, one can change the operating point in
order to limit FPR or to increase TPR. Exemplary ROC curves for a perfect (a), partially overlapped (b) and
random (c) classifier are presented in Fig. 8.5.
Figure 8.5: Exemplary ROC curves
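The construction of a ROC curve and of the AUC for the binary case can be sketched as below. This is a generic illustration with made-up scores and labels; it does not reproduce the way Weka computes its threshold curves, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    scores: classifier outputs, higher means "more likely anomalous";
    labels: 1 for anomalous, 0 for not anomalous.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= th and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= th and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve computed with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up scores: one benign instance is ranked above one anomaly.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve[-1], round(auc(curve), 4))
```

A perfect classifier would give an AUC of 1 and a random one about 0.5; the partially overlapped example above falls in between.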
ROC is only applicable to the binary classification case. As more than two classes are considered in our
approach, we can analyze individual ROC curves for each of the classes separately, as presented in
Fig. 8.6. This way of analysis is supported in Weka. Based on such analysis we can find the performance
of a particular classifier for each class. This may be useful to find the best classifier for a specific anomaly,
but in our case we are looking for the classifiers which are on average the best for all classes. This is
typically measured by averaged Accuracy, but this measure hides some important characteristics. Thus,
we propose a method of calculating a multi-class ROC based on weighted results of the binary ROC for each
individual class. Weka can generate and save to files individual ROC curves for each
of the classes of a multi-class classifier. A Weka ROC file consists of operating points (threshold
values) and confusion matrices containing the relevant TP, FN, TN, FP values for binary classification of
a particular class. In our approach we take the ROC files generated by Weka (one file for each class in the
dataset) and process them in order to average the results. The pseudo-code in Python
explaining in detail how the weighted ROC is calculated is presented in Listing 8.1. The main idea
behind it is that the corresponding ROC for each class is averaged with respect to the number of class
instances. As a result we get one weighted multi-class ROC based on all binary ROC results.
(a) brute force
(b) network scan
(c) ddos
(d) port scan
(e) not anomalous
Figure 8.6: ROC curves (one per class) for SimpleLogistic classifier (Renyi entropy case) based on
Scenario 1
Listing 8.1: Weighted ROC calculation
from bisect import bisect_right


def confusion_matrix(roc, threshold):
    """Finds a confusion matrix for the specified threshold value.
    Args: roc - an object representing a ROC curve, threshold - a threshold value
    Returns: A confusion matrix (tp, fn, tn, fp) for the specified threshold value."""
    # Each point is assumed to be a named tuple (threshold, confusion_matrix).
    points = roc.points_sorted_by_threshold
    thresholds = [point[0] for point in points]
    # Index of the exact match or, failing that, of the closest smaller threshold;
    # clamped to 0 when the requested threshold lies below the first point.
    index = max(bisect_right(thresholds, threshold) - 1, 0)
    return points[index].confusion_matrix


def all_thresholds(rocs):
    """Returns a list of all unique threshold values for the ROC curves.
    Args: rocs - an iterable of ROC curves
    Returns: A sorted list of all unique threshold values for supplied ROC curves."""
    return sorted({th for roc in rocs for th in roc.all_thresholds})


def weighted_roc_gen(rocs, weights):
    """Generates weighted ROC points for the supplied ROC curves and weights.
    Args: rocs - an iterable of ROC curves, weights - a list/tuple of weights such that:
        - 0 <= weights[i] <= 1 for all i, sum(weights) = 1,
        - weights[i] corresponds to rocs[i],
        - weights are proportional to the number of instances of each class in the dataset
    Returns: Yields subsequent weighted ROC points as (fpr, tpr) pairs."""
    rocs = list(rocs)  # the curves are iterated more than once below
    for th in all_thresholds(rocs):
        weighted_cm = [0.0, 0.0, 0.0, 0.0]  # a list, because it is updated in place
        for roc, weight in zip(rocs, weights):
            cm = confusion_matrix(roc, th)
            for j in range(len(weighted_cm)):
                weighted_cm[j] += weight * cm[j]
        tp, fn, tn, fp = weighted_cm
        fpr = fp / (fp + tn)
        tpr = tp / (tp + fn)
        yield fpr, tpr


def print_roc(rocs, weights):
    """Prints the weighted ROC as (FPR, TPR) pairs, one per line.
    Args: rocs - an iterable of ROC curves, weights - a list/tuple of weights"""
    for fpr, tpr in weighted_roc_gen(rocs, weights):
        print("%f %f" % (fpr, tpr))
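The weighting idea can also be tried on toy data with a self-contained sketch. It is an illustration under stated assumptions, not the implementation used in this Thesis: each binary ROC is represented here as a plain list of (threshold, (tp, fn, tn, fp)) pairs, the numbers are hypothetical (a dataset of 10 instances, 4 of class A and 6 of class B), and the helper names `cm_at` and `weighted_roc` are introduced only for this example.

```python
from bisect import bisect_right


def cm_at(points, threshold):
    """Confusion matrix at the given threshold, or at the closest smaller one."""
    thresholds = [t for t, _ in points]
    i = max(bisect_right(thresholds, threshold) - 1, 0)
    return points[i][1]


def weighted_roc(rocs, weights):
    """Weighted (fpr, tpr) points over the union of all thresholds."""
    all_th = sorted({t for points in rocs for t, _ in points})
    curve = []
    for th in all_th:
        tp = fn = tn = fp = 0.0
        for points, w in zip(rocs, weights):
            c_tp, c_fn, c_tn, c_fp = cm_at(points, th)
            tp += w * c_tp
            fn += w * c_fn
            tn += w * c_tn
            fp += w * c_fp
        curve.append((fp / (fp + tn), tp / (tp + fn)))
    return curve


# Hypothetical binary ROC data: (threshold, (tp, fn, tn, fp)), sorted by threshold.
roc_a = [(0.0, (4, 0, 0, 6)), (0.5, (3, 1, 5, 1)), (1.0, (0, 4, 6, 0))]
roc_b = [(0.0, (6, 0, 0, 4)), (0.7, (5, 1, 3, 1)), (1.0, (0, 6, 4, 0))]
weights = [0.4, 0.6]  # proportional to the class sizes 4 and 6

curve = weighted_roc([roc_a, roc_b], weights)
for fpr, tpr in curve:
    print("%f %f" % (fpr, tpr))
```

The curve starts at (1, 1) for the lowest threshold and ends at (0, 0) for the highest one; the intermediate points mix the per-class confusion matrices according to the class weights.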
Weighted ROC curves for the SimpleLogistic classifier for all scenarios are depicted in Fig. 8.7, Fig. 8.8
and Fig. 8.9, respectively. It is noticeable that the results for Tsallis and Renyi entropy are better than those
for Shannon. The ROC curves for the volume-based case show that this approach performs poorly.
8.3. Conclusions
Summarizing the results of the evaluation, we can observe that, for our scenarios:
– the Tsallis and Renyi entropy performed best;
– the Shannon entropy turned out to be a bit worse in Accuracy, in False Positive Rate and in
weighted ROC curves;
– the volume-based approach performed poorly;
– using a broad spectrum of network features is essential to successfully detect and classify different
types of network anomalies; this was confirmed both by the results of feature correlation and by the good results
of classification of different anomalies in the tested scenarios;
– using α-values from the set {−2, −1, 0, 1, 2} is a proper choice; this was confirmed both by the results of
α-value correlation and by the good results of classification of different anomalies in the tested scenarios;
using a larger set of α-values is redundant; using one value is not enough to recognize different
types of anomalies;
– the most suitable classifier (among the popular methods available in Weka) for our approach is
SimpleLogistic, which builds linear logistic regression models.
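How the α parameter changes the value of a parameterized entropy can be seen in a short sketch. It assumes the standard textbook definitions of Renyi and Tsallis entropy (base-2 logarithm for Renyi, the Shannon limit at α = 1), which may differ in normalization from the definitions given in Chapter 4; the flow counts are hypothetical.

```python
import math


def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi(p, alpha):
    """Renyi entropy of order alpha (bits); alpha = 1 is the Shannon limit."""
    if alpha == 1:
        return shannon(p)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis(p, alpha):
    """Tsallis entropy of order alpha; alpha = 1 is the Shannon limit in nats."""
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1)


# Hypothetical number of flows per destination port in one time window.
counts = [90, 5, 3, 1, 1]
p = [c / sum(counts) for c in counts]

for alpha in (-2, -1, 0, 1, 2):
    print(alpha, renyi(p, alpha), tsallis(p, alpha))
```

Negative α-values are dominated by the rare elements of the distribution and positive ones by the frequent elements, so a set of several α-values describes different parts of the same distribution.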
Our experiments were limited to a small number of cases. However, these cases are representative.
Although only a one-day legitimate traffic profile was built, we observed that this profile suits
each regular working day in the network we monitored, so there was no need to prepare whole-week
profiles in our case. The weak performance of the Shannon entropy and the poor performance of volume-based
counters call into question whether they are the right approach to detecting anomalies caused by
botnet-like malware.
Figure 8.7: Weighted ROC curves for SimpleLogistic classifier – Scenario 1. Panels: (a) Tsallis, (b) Renyi,
(c) Shannon, (d) volume-based.
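Figure 8.8: Weighted ROC curves for SimpleLogistic classifier – Scenario 2. Panels: (a) Tsallis, (b) Renyi,
(c) Shannon, (d) volume-based.
Figure 8.9: Weighted ROC curves for SimpleLogistic classifier – Scenario 3. Panels: (a) Tsallis, (b) Renyi,
(c) Shannon, (d) volume-based.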
9. Conclusions and further work
This chapter summarizes the results achieved in this Thesis and outlines further work related to the subject of network anomaly detection. It also provides a list of publications completed during this research.
9.1. Conclusions
Finding an effective method for network anomaly detection is the general problem of this Dissertation. The scope of this work is limited to the detection of anomalies indicating the presence of modern
botnet-like malware in local networks. From among many anomaly detection techniques, an entropy-based approach was chosen and examined in depth. The goal of the Dissertation was accomplished by
answering the following questions:
– Are entropy measures useful in the context of network anomaly detection?
– Is it possible to effectively detect and classify small and low-rate anomalies connected with botnet-like malware activity in local networks by means of entropy?
– Is the entropy-based approach better than the traditional volume-based approach?
– Do parameterized entropies help to improve the results obtained for Shannon entropy?
– What is the proper set of parameters for entropies to successfully detect network anomalies?
– Which network features should be taken into consideration in order to detect a broad spectrum of
anomalies connected with botnet-like malware?
– Which popular classifiers work well with the entropy-based approach?
A thorough analysis of the state of the art, provided in Chapter 2, shows that the entropy-based approach
seems to be promising in detecting different types of network anomalies, while the traditional volume-based
approach is limited to anomalies that result in a significant and abrupt change in network volume. Furthermore, the use of parameterized entropies allows overcoming some limitations of Shannon entropy
caused by its small descriptive power. Although entropy-based network anomaly detection is a deeply
investigated area, the following gaps have been found:
– entropy-based detection of botnet-like anomalies in local networks remains unexplored;
– the problem of finding proper α-values for parameterized entropies in order to successfully detect
small and low-rate network anomalies is still open;
– there are some contradictory results regarding the network feature distributions to use with entropy-based
network anomaly detection;
– it is unknown which multi-class classification method works well with the entropy-based approach, as
most authors focus mainly on detection;
– there is a lack of available datasets to evaluate the method proposed in this Dissertation.
In order to answer the questions and to prove the claim of this Thesis, an original method based
on entropy measures was proposed (Chapter 5, Chapter 6) and then verified (Chapter 8). To make
verification possible, a proper semi-synthetic dataset was prepared (Chapter 7). Additionally, a theoretical
background as well as an in-depth comparison of different entropy measures is presented in Chapter 4, and some
network data capture issues are presented in Chapter 5.
The general conclusion of the presented studies is that it is possible to detect modern botnet-like
malware in local networks on the basis of anomalies found with the entropy-based approach. Based on the
detailed results presented in Chapter 8 we claim that:
– the Tsallis or Renyi entropy should be used to achieve satisfactory effectiveness, as in our experiments
Shannon entropy turned out to be worse in Accuracy, in False Positive Rate and in
weighted ROC curves;
– the volume-based approach performs poorly, so the popular methods based on simple counters like
the number of flows, packets or bytes are completely ineffective for detection of botnet-like malware
based on the observed anomalies;
– using a broad spectrum of network features is essential to successfully detect and classify different
types of network anomalies; this was confirmed both by the results of feature correlation and by the good
results of classification of different anomalies in the tested scenarios;
– using α-values for Tsallis or Renyi entropy from the set {−2, −1, 0, 1, 2} is a proper choice; this was
confirmed both by the results of α-value correlation and by the good results of classification of different anomalies
in the tested scenarios; using a larger set of α-values is redundant; using one value is not enough
to recognize different types of anomalies;
– the most suitable classifier (among the popular methods available in Weka) for our approach is
SimpleLogistic, which builds linear logistic regression models.
These claims are based on experiments which were limited to a small number of cases. However, these
cases are representative.
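The difference between a volume counter and an entropy measure can be shown on a hypothetical example: two time windows with the same number of flows, one of which contains a small port scan. The port numbers and counts below are invented for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of the values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Destination ports of 100 flows in a regular window (hypothetical).
normal = [80] * 60 + [443] * 30 + [53] * 10
# Also 100 flows, but 20 of them probe 20 different ports (a small scan).
scan = [80] * 50 + [443] * 25 + [53] * 5 + list(range(1000, 1020))

print(len(normal), len(scan))                          # the flow counter is identical
print(shannon_entropy(normal), shannon_entropy(scan))  # the entropy differs clearly
```

The flow counter cannot distinguish the two windows, while the entropy of the destination port distribution roughly doubles in the window with the scan.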
9.2. Further work
9.2.1. On-line analysis in a real environment
As mentioned in Chapter 6, the final implementation of the proposed method allows on-line
classification of anomalies based on NetFlow reports coming in real time from probes deployed in the network. So far, Anode has not been tested in a real environment. A pilot implementation of Anode at the Military
Communication Institute is planned for the near future.
9.2.2. Multi-classifier
During our experiments we could observe that some classifiers were especially effective in detecting
a single, selected anomaly type. In particular, the false positive rate was small. We thus believe that
it would be possible to combine a number of such "dedicated" classifiers in a way that covers the whole
anomaly spectrum. There are a number of proposed architecture variants for a multi-classifier, such as stacking,
bagging and boosting [SPBW12]; however, we think this case may require a dedicated approach. We are
going to design such a multi-classifier and compare its performance against possible competitors.
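One simple way to combine "dedicated" classifiers is sketched below. It is only an illustration of the idea, not the multi-classifier planned here and not stacking, bagging or boosting: each detector is assumed to return a score in [0, 1] for its own anomaly type, and the feature names, score functions and threshold are hypothetical.

```python
def combine(detectors, features, threshold=0.5, default="not anomalous"):
    """Return the class of the most confident dedicated detector.

    detectors: dict mapping a class name to a callable that returns a score
    in [0, 1]; if no score reaches the threshold, the default class is used.
    """
    scores = {name: detect(features) for name, detect in detectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else default


# Hypothetical dedicated detectors built on hypothetical features.
detectors = {
    "port scan": lambda f: min(1.0, f["dst_port_entropy"] / 4.0),
    "brute force": lambda f: min(1.0, f["failed_logins"] / 10.0),
}

print(combine(detectors, {"dst_port_entropy": 3.6, "failed_logins": 1}))
print(combine(detectors, {"dst_port_entropy": 0.4, "failed_logins": 0}))
```

The first call selects the port scan detector because its score is the highest and exceeds the threshold; the second falls back to the default class because no detector is confident enough.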
9.2.3. Multi-label approach
A multi-class classification usually means classifying a data point into only one of the many (more
than two) possible classes. It is much more advanced and sophisticated than simple detection, where
only two classes, i.e. anomalous and not anomalous, are taken into account. However, a multi-class
approach does not solve the problem when more than one class should be assigned to one data point.
For example, a data point may belong to the port scan and brute force classes simultaneously because
both anomalies appeared at the same time. With multi-label classification [MKGD12], [Mek14] one
can classify a data point into more than one of the possible classes. This Dissertation does not cover
the multi-label problem; however, this is one of the main directions for further work.
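The difference between the two label representations can be sketched as follows. The class names are those used for the anomaly types in Chapter 8; the indicator-vector encoding and the helper name `to_indicator` are assumptions made for this illustration only.

```python
# Anomaly classes; "not anomalous" corresponds to an empty label set.
CLASSES = ["brute force", "network scan", "ddos", "port scan"]


def to_indicator(labels):
    """Encode a set of anomaly labels as a 0/1 vector, one position per class."""
    return [1 if c in labels else 0 for c in CLASSES]


# Multi-class: exactly one label per data point.
multi_class = {"port scan"}
# Multi-label: several anomalies observed in the same time window.
multi_label = {"port scan", "brute force"}

print(to_indicator(multi_class))  # [0, 0, 0, 1]
print(to_indicator(multi_label))  # [1, 0, 0, 1]
print(to_indicator(set()))        # [0, 0, 0, 0]
```

With such an encoding a multi-label classifier predicts the whole vector, for instance with one binary decision per position, instead of choosing a single class.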
9.2.4. Dataset
The dataset presented in this Thesis consists of synthetic anomalies mixed with real legitimate network traffic. In the future we are planning to extend the range of anomalies in our dataset by adding
new models of network behavior typical of botnet-like malware. Moreover, we are planning to capture
legitimate traffic from a somewhat larger network and publish the dataset (after anonymization) in order to
enable comparison with other methods. As research in network anomaly detection suffers
from a lack of such up-to-date datasets, such a contribution is desirable. The dataset will be developed and
published under the CybSecLab project sponsored by the Polish National Centre for Research and Development. We are going to pay particular attention to the sanitization process, as weak anonymization of network traces
may disclose some classified data [PAPL06].
9.3. Publications
The work presented in this Thesis is based on several publications. The concept of a multi-sensor
cyber defence system named SOPAS, designed for a federated environment, is presented in:
P. Bereziński, J. Śliwa, R. Piotrowski, and B. Jasiul. Detection of multistage attack in federation
of systems environment. In NATO Science and Technology Organization MP-IST-111 - Information
Assurance and Cyber Defence, 2012.
Detection of the original multistage attack in the SOPAS system with the use of signature-based and anomaly-based techniques and tools is described in:
B. Jasiul, R. Piotrowski, P. Bereziński, M. Choraś, R. Kozik, and J. Brzostek. Federated cyber
defence system - applied methods and techniques. In Communications and Information Systems
Conference (MCC), pages 1–6, Oct 2012.
A survey on modern entropy-based measures to use in network anomaly detection is presented in:
J. Pawelec, P. Bereziński, R. Piotrowski, and W. Chamela. Entropy measures for internet traffic
anomaly detection. In TransComp conference on Computer Systems, Industry and Transport, pages
309–318, 2012.
The problem of lack of good, recent datasets that could be employed for evaluation of network
anomaly detection methods is pointed out in:
M. Małowidzki, P. Bereziński, and M. Mazur. Network intrusion detection: Half a kingdom for a
good dataset. In Proceedings of NATO STO SAS-139 Workshop, Portugal, 2015.
The concept of entropy-based anomaly detection method to use in SECOR system and preliminary
results based on a case study are presented in:
P. Bereziński, J. Pawelec, M. Małowidzki, and R. Piotrowski. Entropy-based internet traffic anomaly
detection: A case study. In W. Zamojski, J. Mazurkiewicz, J. Sugier, T. Walkowiak, and J. Kacprzyk,
editors, Proceedings of the Ninth International Conference on Dependability and Complex Systems
DepCoS-RELCOMEX, volume 286 of Advances in Intelligent Systems and Computing, pages 47–58.
Springer International Publishing, 2014.
The dataset generation process and performance results of the method based on one of the available
datasets are shown in:
P. Bereziński, M. Szpyrka, B. Jasiul, and M. Mazur. Network anomaly detection using parameterized
entropy. In K. Saeed and V. Snášel, editors, Computer Information Systems and Industrial
Management, volume 8838 of Lecture Notes in Computer Science, pages 465–478. Springer Berlin
Heidelberg, 2014, Best Paper Award.
The final implementation and performance results based on 3 botnet-like worm scenarios included in
the self-crafted dataset are presented in:
P. Bereziński, B. Jasiul, and M. Szpyrka. An entropy-based network anomaly detection method.
Entropy, 17(4):2367–2408, 2015, IF = 1.56, List A of the MNiSW (Polish Ministry of Science and Higher
Education) journal ranking, 30 points.
Bibliography
[Agg13]
C.C. Aggarwal. Outlier Analysis. Springer New York, 2013.
[Alp10]
E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2nd edition, 2010.
[Arg]
Argus – Audit Record Generation and Utilization System. http://qosient.com/argus.
[Axe99]
S. Axelsson. The base-rate fallacy and its implications for the difficulty of intrusion detection. In Proceedings of the 6th ACM Conference on Computer and Communications
Security, CCS ’99, pages 1–7, New York, NY, USA, 1999. ACM.
[Bar13]
S. Barnum. Standardizing Cyber Threat Intelligence Information with the Structured Threat
Information eXpression (STIX™).
http://stix.mitre.org/about/documents/STIX_Whitepaper_v1.0.pdf, 2013.
[Bay63]
T. Bayes. An essay towards solving a problem in the doctrine of chances. by the late
rev. Mr. Bayes, f. r. s. communicated by Mr. Price, in a letter to John Canton, m. a. and
f. r. s. Philosophical Transactions of the Royal Society, 53:370–418, 1763.
[BBK13]
M. Bhuyan, D.K. Bhattacharyya, and J. Kalita. Network anomaly detection: methods,
systems and tools. IEEE Communication Surveys and Tutorials, 16(1):1–34, 2013.
[BBR+ 12]
L. Bilge, D. Balzarotti, W. Robertson, E. Kirda, and C. Kruegel. Disclosure: Detecting
botnet command and control servers through large-scale NetFlow analysis. In Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC ’12,
pages 129–138, New York, NY, USA, 2012. ACM.
[BDKC10]
L. Braun, A. Didebulidze, N. Kammenhuber, and G. Carle. Comparing and improving
current packet capturing solutions based on commodity hardware. In Proceedings of
the 10th ACM SIGCOMM Conference on Internet Measurement, IMC ’10, pages 206–
217, New York, NY, USA, 2010. ACM.
[BDWS09]
D. Brauckhoff, X. Dimitropoulos, A. Wagner, and K. Salamatian. Anomaly extraction in backbone networks using association rules. In Proceedings of the 9th ACM
SIGCOMM Conference on Internet Measurement Conference, IMC ’09, pages 28–34,
New York, NY, USA, 2009. ACM.
[BI08]
R.J. Barnett and B. Irwin. Towards a taxonomy of network scanning techniques. In
Proceedings of the 2008 Annual Research Conference of the South African Institute
of Computer Scientists and Information Technologists on IT Research in Developing
Countries: Riding the Wave of Technology, SAICSIT ’08, pages 1–7, New York, NY,
USA, 2008. ACM.
[Bit11]
T. Bitton. Imperva – Morto post mortem: Dissecting a worm. Technical report, 2011.
[BJS15]
P. Bereziński, B. Jasiul, and M. Szpyrka. An entropy-based network anomaly detection
method. Entropy, 17(4):2367–2408, 2015.
[BK13]
D.K. Bhattacharyya and J.K. Kalita. Network Anomaly Detection: A Machine Learning
Perspective. Chapman & Hall/CRC, 2013.
[BKPR02]
P. Barford, J. Kline, D. Plonka, and A. Ron. A signal analysis of network traffic anomalies. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement,
IMW ’02, pages 71–82, New York, NY, USA, 2002. ACM.
[BlPJ12]
P. Bereziński, J. Śliwa, R. Piotrowski, and B. Jasiul. Detection of multistage attack
in federation of systems environment. In NATO Science and Technology Organization
MP-IST-111 - Information Assurance and Cyber Defence, 2012.
[BPBF12]
B. Bencsáth, G. Pék, L. Buttyán, and M. Félegyházi. The cousins of stuxnet: Duqu,
flame, and gauss. Future Internet, 4(4):971–1003, 2012.
[BPMP14]
P. Bereziński, J. Pawelec, M. Małowidzki, and R. Piotrowski. Entropy-based internet
traffic anomaly detection: A case study. In W. Zamojski, J. Mazurkiewicz, J. Sugier,
T. Walkowiak, and J. Kacprzyk, editors, Proceedings of the Ninth International Conference on Dependability and Complex Systems DepCoS-RELCOMEX, volume 286 of
Advances in Intelligent Systems and Computing, pages 47–58. Springer International
Publishing, 2014.
[Bra10]
D. Brauckhoff. Network Traffic Anomaly Detection and Evaluation. PhD thesis, ETH
Zürich, 2010.
[BSJM14]
P. Bereziński, M. Szpyrka, B. Jasiul, and M. Mazur. Network anomaly detection using
parameterized entropy. In K. Saeed and V. Snášel, editors, Computer Information Systems and Industrial Management, volume 8838 of Lecture Notes in Computer Science,
pages 465–478. Springer Berlin Heidelberg, 2014.
[BSS+ 14]
J.G. Bazan, M. Szpyrka, A. Szczur, Ł. Dydo, and H. Wojtowicz. Classifiers for behavioral patterns identification induced from huge temporal data. In Proceedings of
the Concurrency Specification and Programming Workshop (CSP 2014), volume 1269
of CEUR Workshop Proceedings, pages 22–33, Chemnitz, Germany, September 29-October 1, 2014.
[BTW+ 06]
D. Brauckhoff, B. Tellenbach, A. Wagner, M. May, and A. Lakhina. Impact of packet
sampling on anomaly detection metrics. In Proceedings of the 6th ACM SIGCOMM
Conference on Internet Measurement, IMC ’06, pages 159–164, New York, NY, USA,
2006. ACM.
[BWM08]
D. Brauckhoff, A. Wagner, and M. May. Flame: A flow-level anomaly modeling engine. In Proceedings of the Conference on Cyber Security Experimentation and Test,
CSET’08, pages 1–6, Berkeley, CA, USA, 2008. USENIX Association.
[Cai]
Center for Applied Internet Data Analysis (CAIDA). http://www.caida.org/data/overview.
[Cal09]
C. Callegari. Statistical approaches for network anomaly detection. In Proceedings of
the 4th International Conference on Internet Monitoring and Protection, 2009.
[CBK09]
V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput.
Surv., 41(3):15:1–15:58, July 2009.
[Cer13]
CERT Poland Report. http://www.cert.pl/PDF/Report_CP_2013.pdf, 2013.
[CFVN09]
P. Casas, L. Fillatre, S. Vaton, and I. Nikiforov. Volume anomaly detection in data
networks: An optimal detection algorithm vs. the PCA approach. In R. Valadas and
P. Salvador, editors, Traffic Management and Traffic Engineering for the Future Internet, volume 5464 of Lecture Notes in Computer Science, pages 96–113. Springer
Berlin Heidelberg, 2009.
[CH67]
R. Clausius and T.A. Hirst. The mechanical theory of heat: With its applications to the
steam-engine and to the physical properties of bodies. J. van Voorst, London, 1867.
[CJSS03]
C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. Gigascope: A stream
database for network applications. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD’03, pages 647–651, New
York, NY, USA, 2003. ACM.
[CKP+ 11]
M. Choraś, R. Kozik, R. Piotrowski, J. Brzostek, and W. Hołubowicz. Network events
correlation for federated networks protection system. In Towards a Service-Based Internet, volume 6994 of Lecture Notes in Computer Science, pages 100–111. Springer,
2011.
[CKS+ 09]
A. Callado, C. Kamienski, G. Szabo, B. Gero, J. Kelner, S. Fernandes, and D. Sadok.
A survey on internet traffic identification. Commun. Surveys Tuts., 11(3):37–52, July
2009.
[Cla04]
B. Claise. Cisco Systems NetFlow Services Export Version 9. RFC 3954, IETF, 2004.
[CLLL12]
T.H. Cheng, Y.D. Lin, Y.C Lai, and P.C Lin. Evasion techniques: Sneaking through
your intrusion detection/prevention systems. Communications Surveys Tutorials, IEEE,
14(4):1011–1020, Fourth Quarter 2012.
[CMRB09]
S.E. Coull, F. Monrose, M.K. Reiter, and M. Bailey. The challenges of effectively
anonymizing network data. In Proceedings of the 2009 Cybersecurity Applications &
Technology Conference for Homeland Security, CATCH ’09, pages 230–236, Washington, DC, USA, 2009. IEEE Computer Society.
[Cor15]
MITRE Corp. The MITRE Corporation Research Overview.
http://www.mitre.org/research/overview, 2015.
[CRKM11]
Z.B. Celik, J. Raghuram, G. Kesidis, and D.J. Miller. Salting public traces with attack
traffic to test flow classifiers. In 4th Workshop on Cyber Security Experimentation and
Test, CSET ’11, San Francisco, CA, USA, August 8, 2011, 2011.
[Csi08]
I. Csiszár. Axiomatic characterizations of information measures. Entropy, 10(3):261–
273, 2008.
[CSO+ 09]
I. Cunha, F. Silveira, R. Oliveira, R. Teixeira, and C. Diot. Uncovering artifacts of
flow measurement tools. In S. Moon, R. Teixeira, and S. Uhlig, editors, Passive and
Active Network Measurement, volume 5448 of Lecture Notes in Computer Science,
pages 187–196. Springer Berlin Heidelberg, 2009.
[CT06]
T.M. Cover and J.A. Thomas. Elements of Information Theory. A Wiley-Interscience
publication. Wiley, 2006.
[DDL+ 12]
E. Damon, J. Dale, E. Laron, J. Mache, N. Land, and R. Weiss. Hands-on Denial of
Service lab exercises using Slowloris and Rudy. In Proceedings of the 2012 Information Security Curriculum Development Conference, InfoSecCD ’12, pages 21–29, New
York, NY, USA, 2012. ACM.
[Den87]
D.E. Denningm. An intrusion-detection model. IEEE Transactions on Software Engineering, 13(2):222–232, 1987.
[Den12]
D.E. Denning. Stuxnet: What has changed? Future Internet, 4(3):672–687, 2012.
[DG06]
J. Davis and M. Goadrich. The relationship between Precision-Recall and ROC curves.
In Proc. of the 23rd Int. Conference on Machine Learning, ICML’06, pages 233–240.
ACM, 2006.
[EDD+ 13]
A.F. Emmott, S. Das, T. Dietterich, A. Fern, and W. Wong. Systematic construction of
anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD
Workshop on Outlier Detection and Description, ODD ’13, pages 16–21, New York,
NY, USA, 2013. ACM.
[Eim08]
R. Eimann. Network Event Detection with Entropy Measures. PhD thesis, University
of Auckland, 2008.
[ESB05]
R. Eimann, U. Speidel, and J.N. Brownlee. A T-entropy analysis of the slammer worm
outbreak. In Proceedings of Asia-Pacific Network Operations and Management Symposium (APNOMS), pages 434–445, 2005.
[ETGTDV04]
J.M. Estevez-Tapiador, P. Garcia-Teodoro, and J.E. Diaz-Verdejo. Anomaly detection
methods in wired networks: A survey and taxonomy. Comput. Commun., 27(16):1569–
1584, October 2004.
[FAAM07]
M. Foukarakis, D. Antoniades, S. Antonatos, and E.P. Markatos. Flexible and high-performance anonymization of NetFlow records using anontool. In Third International
Conference on Security and Privacy in Communication Networks and the Workshops,
SecureComm 2007, Nice, France, 17-21 September, 2007, pages 33–38, 2007.
[Faw06]
T. Fawcett. An introduction to ROC analysis. Pattern Recogn. Lett., 27(8):861–874,
2006.
[Floa]
AKMA Labs FlowMatrix. http://www.akmalabs.com.
[Flob]
Flow-tools – Tool set for working with NetFlow data. http://code.google.com/p/flow-tools.
[Floc]
Invea-Tech FlowMon. https://www.invea.com.
[Fpr]
Fprobe – NetFlow probe. http://fprobe.sourceforge.net.
[FWB+ 11]
J. Francois, S. Wang, W. Bronzi, R. State, and T. Engel. BotCloud: Detecting botnets
using MapReduce. In Proceedings of the 2011 IEEE International Workshop on Information Forensics and Security, WIFS ’11, pages 1–6, Washington, DC, USA, 2011.
IEEE Computer Society.
[GGSZ14]
S. García, M. Grill, J. Stiborek, and A. Zunino. An empirical comparison of botnet
detection methods. Comput. Secur., 45:100–123, September 2014.
[GHK14]
M. Golling, R. Hofstede, and R. Koch. Towards multi-layered intrusion detection in
high-speed networks. In Cyber Conflict (CyCon 2014), 2014 6th International Conference On, pages 191–206, June 2014.
[GM10]
P. Giura and N. Memon. Netstore: An efficient storage infrastructure for network forensics and monitoring. In S. Jha, R. Sommer, and Ch. Kreibich, editors, Recent Advances
in Intrusion Detection, volume 6307 of Lecture Notes in Computer Science, pages 277–
296. Springer Berlin Heidelberg, 2010.
[GMT05]
Y. Gu, A. McCallum, and D. Towsley. Detecting anomalies in network traffic using
maximum entropy estimation. In Proceedings of the 5th ACM SIGCOMM Conference
on Internet Measurement, IMC ’05, pages 32–32, Berkeley, CA, USA, 2005. USENIX
Association.
[GOB11]
H. Gascon, A. Orfila, and J. Blasco. Analysis of update delays in signature-based
network intrusion detection systems. Computers & Security, 30(8):613–624, 2011.
[GTDVMFV09]
P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers &
Security, 28(1–2):18–28, 2009.
[GV03]
P.D. Grünwald and P.M.B. Vitányi. Kolmogorov complexity and information theory
with an interpretation in terms of questions and answers. J. of Logic, Lang. and Inf.,
12(4):497–529, 2003.
[HA04]
V.J. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial
Intelligence Review, 22(2):85–126, 2004.
[HCT+ 14]
R. Hofstede, P. Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, and A. Pras. Flow
monitoring explained: From packet capture to data analysis with NetFlow and IPFIX.
Communications Surveys Tutorials, IEEE, 16(4):2037–2064, 2014.
[HDS+ 13]
R. Hofstede, I. Drago, A. Sperotto, R. Sadre, and A. Pras. Measurement artifacts in
NetFlow data. In M. Roughan and R. Chang, editors, Passive and Active Measurement, volume 7799 of Lecture Notes in Computer Science, pages 1–10. Springer Berlin
Heidelberg, 2013.
[HFH+ 09]
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The
WEKA data mining software: An update. SIGKDD Explor. Newsl., 11(1):10–18, 2009.
[HK11]
J. Hauke and T. Kossowski. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae, 30(2):87–93,
2011.
[HLF+ 01]
J.W. Haines, R.P. Lippmann, D.J. Fried, M.A. Zissman, E. Tran, and S.B. Boswell.
1999 DARPA intrusion detection evaluation: Design and procedures. Technical report,
https://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/files/TR-1062.pdf,
2001. MIT Lincoln Laboratory, Technical Report 1062.
[HNG+ 07]
L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A.D. Joseph, and N. Taft. In-network
PCA and anomaly detection. Technical Report UCB/EECS-2007-10, EECS Department, University of California, Berkeley, Jan 2007.
[HP14]
HP - The bot threat. http://www.bitpipe.com/detail/RES/1384218191_706.html, 2014.
[HPW02]
D. Harrington, R. Presuhn, and B. Wijnen. An Architecture for Describing Simple
Network Management Protocol (SNMP) Management Frameworks. RFC 3411, IETF,
2002.
[HTF09]
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning: data
mining, inference and prediction. Springer, 2nd edition, 2009.
[IT10]
C.M. Inacio and B. Trammell. Yaf: Yet another flowmeter. In Proceedings of the 24th
International Conference on Large Installation System Administration, LISA’10, pages
1–16, Berkeley, CA, USA, 2010. USENIX Association.
[ITA]
ACM Sigcomm Internet Traffic Archive. http://www.sigcomm.org/ITA.
[IZ14]
F. Iglesias and T. Zseby. Entropy-based characterization of internet background radiation. Entropy, 17(1):74–101, 2014.
[JP05]
S.S. Joshi and V.V. Phoha. Investigating Hidden Markov Models capabilities in
anomaly detection. In Proceedings of the 43rd Annual Southeast Regional Conference
- Volume 1, ACM-SE 43, pages 98–103, New York, NY, USA, 2005. ACM.
[JPB+ 12]
B. Jasiul, R. Piotrowski, P. Bereziński, M. Choraś, R. Kozik, and J. Brzostek. Federated cyber defence system - applied methods and techniques. In Communications and
Information Systems Conference (MCC), pages 1–6, Oct 2012.
[JSl14a]
B. Jasiul, M. Szpyrka, and J. Śliwa. Detection and modeling of cyber attacks with Petri
Nets. Entropy, 16(12):6602–6623, 2014.
[JSl14b]
B. Jasiul, M. Szpyrka, and J. Śliwa. Malware behavior modeling with Colored Petri
Nets. In Computer Information Systems and Industrial Management - 13th IFIP TC8
International Conference, CISIM 2014, Ho Chi Minh City, Vietnam, November 5-7,
2014. Proceedings, pages 667–679, 2014.
[KAA+ 06]
D. Koukis, S. Antonatos, D. Antoniades, E.P. Markatos, and P. Trimintzios. A generic
anonymization framework for network traffic. In Communications, 2006. ICC ’06.
IEEE International Conference on, volume 5, pages 2302–2309, June 2006.
[Kar03]
J. Karmeshu. Entropy Measures, Maximum Entropy Principle and Emerging Applications. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2003.
[KBHJ08]
Y. Kopylova, D.A. Buell, C Huang, and J. Janies. Mutual information applied to
anomaly detection. pages 89–97, 2008.
[KDD]
The Third International Knowledge Discovery and Data Mining Tools (KDD) Cup
1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[KHRH14]
M. Kührer, T. Hupperich, C. Rossow, and T. Holz. Exit from hell? Reducing the impact
of amplification DDoS attacks. In Proceedings of the 23rd USENIX Security Symposium, August 2014.
[Kog11]
J. Kogel. One-way delay measurement based on flow data: Quantification and compensation of errors by exporter profiling. In International Conference on Information
Networking (ICOIN), pages 25–30, Jan 2011.
[KS12]
P. Kumar and S.A. Senthil. Establishing a valuable method of packet capture and packet
analyzer tools in firewall. International Journal of Research Studies in Computing,
1(1):11–20, 2012.
[KSD09]
A. Kind, M.P. Stoecklin, and X. Dimitropoulos. Histogram-based traffic anomaly detection. IEEE Trans. on Netw. and Serv. Manag., 6(2):110–121, June 2009.
[Kul59]
S. Kullback. Information Theory and Statistics. John Wiley & Sons, New York, 1959.
[LAS12]
C.F.L. Lima, F.M. Assis, and C.P. Souza. A comparative study of use of Shannon,
Renyi and Tsallis entropy for attribute selecting in network intrusion detection. In
Proceedings of the 13th International Conference on Intelligent Data Engineering and
Automated Learning, IDEAL’12, pages 492–501, Berlin, Heidelberg, 2012. Springer-Verlag.
[LBN]
Lawrence Berkeley National Laboratory/International Computer Science Institute Enterprise Tracing. http://www.icir.org/enterprise-tracing/Overview.html.
[LCD05]
A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’05, pages 217–228,
2005.
[LDZ05]
Z. Li, A. Das, and J. Zhou. Usaid: Unifying signature-based and anomaly-based intrusion detection. In T. Ho, D. Cheung, and H. Liu, editors, Advances in Knowledge
Discovery and Data Mining, volume 3518 of Lecture Notes in Computer Science, pages
702–712. Springer Berlin Heidelberg, 2005.
[LG09]
W. Lu and A.A. Ghorbani. Network anomaly detection based on wavelet analysis.
EURASIP J. Adv. Sig. Proc., 2009, 2009.
[LHF05]
N. Landwehr, M. Hall, and E. Frank. Logistic model trees. 95(1-2):161–205, 2005.
[LPKL09]
D.C. Lee, B. Park, K.E. Kim, and J.J. Lee. Fast traffic anomalies detection using SNMP
MIB correlation analysis. In Proceedings of the 11th International Conference on Advanced Communication Technology - Volume 1, ICACT’09, pages 166–170, Piscataway, NJ, USA, 2009. IEEE Press.
[LTG08]
W. Lu, M. Tavallaee, and A.A. Ghorbani. Detecting network anomalies using different
wavelet basis functions. In Sixth Annual Conference on Communication Networks and
Services Research (CNSR 2008), 5-8 May 2008, Halifax, Nova Scotia, Canada, pages
149–156, 2008.
[LWK10]
K. Limthong, P. Watanapongse, and F. Kensuke. A wavelet-based anomaly detection
for outbound network traffic. In Information and Telecommunication Technologies
(APSITT), 2010 8th Asia-Pacific Symposium on, pages 1–6, 2010.
[LWLS06]
C. Livadas, R. Walsh, D. Lapsley, and W.T. Strayer. Using machine learning techniques
to identify botnet traffic. In 2nd IEEE LCN Workshop on Network Security (WoNS
2006), pages 967–974, 2006.
[LX01]
W. Lee and D. Xiang. Information-theoretic measures for anomaly detection. In Proceedings of the 2001 IEEE Symposium on Security and Privacy, SP ’01, pages 130–
143, Washington, DC, USA, 2001. IEEE Computer Society.
[LYW13]
Y.J. Lee, Y.R. Yeh, and Y.C.F. Wang. Anomaly detection via online oversampling
principal component analysis. IEEE Trans. on Knowl. and Data Eng., 25(7):1460–
1470, 2013.
[Mar05]
M. Marco. A step beyond Tsallis and Renyi entropies. Physics Letters A, 338(3–
5):217–224, 2005.
[MBM15]
M. Małowidzki, P. Bereziński, and M. Mazur. Network intrusion detection: Half a kingdom for a good dataset. In Proceedings of NATO STO SAS-139 Workshop, Portugal,
2015.
[MC03]
M.V. Mahoney and P.K. Chan. An analysis of the 1999 DARPA/Lincoln laboratory
evaluation data for network anomaly detection. In G. Vigna, C. Kruegel, and E. Jonsson, editors, Recent Advances in Intrusion Detection, volume 2820 of Lecture Notes in
Computer Science, pages 220–237. Springer Berlin Heidelberg, 2003.
[McH00]
J. McHugh. Testing intrusion detection systems: A critique of the 1998 and 1999
DARPA intrusion detection system evaluations as performed by Lincoln laboratory.
ACM Trans. Inf. Syst. Secur., 3(4):262–294, November 2000.
[MD08]
T. Maszczyk and W. Duch. Comparison of Shannon, Renyi and Tsallis entropy used
in decision trees. In L. Rutkowski, R. Tadeusiewicz, L.A. Zadeh, and J.M. Zurada,
editors, Artificial Intelligence and Soft Computing – ICAISC 2008, volume 5097 of
Lecture Notes in Computer Science, pages 643–651. Springer Berlin Heidelberg, 2008.
[Mek14]
MEKA: A Multi-label Extension to WEKA. http://meka.sourceforge.net/, 2014.
[MFF14]
J. Mazel, R. Fontugne, and K. Fukuda. A taxonomy of anomalies in backbone network
traffic. In Wireless Communications and Mobile Computing Conference (IWCMC),
2014 International, pages 30–36, Aug 2014.
[MKGD12]
G. Madjarov, D. Kocev, D. Gjorgjevikj, and S. Džeroski. An extensive experimental
comparison of methods for multi-label learning. Pattern Recogn., 45(9):3084–3104,
2012.
[MoM]
Cluster of European Projects aimed at Monitoring and Measurement (MoMe).
http://www.ist-mome.org/database/MeasurementData.
[MSHJSC+ 04]
K. Myung-Sup, K. Hun-Jeong, H. Seong-Cheol, C. Seung-Hwa, and J.W. Hong. A
flow-based method for abnormal network traffic detection. In Network Operations and
Management Symposium, 2004. NOMS 2004. IEEE/IFIP, volume 1, pages 599–612,
April 2004.
[MSOS07]
R.A. Martin, M. Schwabacher, N. Oza, and A. Srivastava. Comparison of unsupervised
anomaly detection methods for systems health management using space shuttle
main engine data. In Proceedings of the Joint Army Navy NASA Air Force Conference
on Propulsion, 2007.
[Nai09]
S. Nair. Finding Fault: Anomaly Detection for Embedded Networked Sensing. PhD
thesis, University of California, 2009.
[NfS]
NfSen – NetFlow Sensor. http://nfsen.sourceforge.net.
[NSA+ 08]
G. Nychis, V. Sekar, D. Andersen, H. Kim, and H. Zhang. An empirical evaluation of
entropy-based traffic anomaly detection. In Proceedings of the 8th ACM SIGCOMM
Conference on Internet Measurement, IMC ’08, pages 151–156, New York, NY, USA,
2008. ACM.
[Nto]
NtopNg – High-Speed Web-based Traffic Analysis and Flow Collection.
http://www.ntop.org.
[OSG]
OSGi – Open Service Gateway initiative. http://www.osgi.org.
[Owe10]
P. Owezarski. A database of anomalous traffic for assessing profile based IDS. In
F. Ricciato, M. Mellia, and E.W. Biersack, editors, TMA, volume 6003 of Lecture Notes
in Computer Science, pages 59–72. Springer, 2010.
[PAPL06]
R. Pang, M. Allman, V. Paxson, and J. Lee. The devil and packet trace anonymization.
Computer Communication Review, 36(1):29–38, 2006.
[Par13]
C. Parsons. Deep packet inspection and its predecessors. Technical report, 2013.
[Pax99]
V. Paxson. Bro: A system for detecting network intruders in real-time. Comput. Netw.,
31(23-24):2435–2463, 1999.
[PBPC12]
J. Pawelec, P. Bereziński, R. Piotrowski, and W. Chamela. Entropy measures for internet traffic anomaly detection. In TransComp conference on Computer Systems, Industry
and Transport, pages 309–318, 2012.
[Plo00]
D. Plonka. Flowscan: A network traffic flow reporting and visualization tool. In
USENIX LISA, pages 305–317, 2000.
[PLSG10]
C. Phua, V.C.S. Lee, K. Smith-Miles, and R.W. Gayler. A comprehensive survey of
data mining-based fraud detection research. CoRR, abs/1009.6119, 2010.
[PP07]
A. Patcha and J.M. Park. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw., 51(12):3448–3470, August 2007.
[Prt]
Paessler PRTG – Network Monitor. http://www.paessler.com.
[PSS+ 09]
A. Pras, R. Sadre, A. Sperotto, T. Fioreze, D. Hausheer, and J. Schönwälder. Using NetFlow/IPFIX for network management. Journal of Network and Systems Management,
17(4):482–487, November 2009.
[REHA13]
A.M. Riad, I. Elhenawy, A. Hassan, and N. Awadallah. Visualize network anomaly
detection by using k-means clustering algorithm. International Journal of Computer
Networks & Communications (IJCNC), 5(5), 2013.
[Ren70]
A. Renyi. Probability Theory. [Enlarged version of Wahrscheinlichkeitsrechnung, Valoszinusegszamitas and Calcul des probabilites. English translation by
L. Vekerdi]. North-Holland Pub. Co., Amsterdam, 1970.
[Ren11]
R. Renk. Modyfikacja metody opartej o słownik funkcji bazowych do wykrywania
anomalii w ruchu sieciowym w sieciach IP. PhD thesis, Uniwersytet Technologiczno-Przyrodniczy im. Jana i Jędrzeja Śniadeckich w Bydgoszczy, Wydział Telekomunikacji
i Elektrotechniki, Bydgoszcz, 2011.
[RFG05]
C. Reimann, P. Filzmoser, and R.G. Garrett. Background and threshold: critical comparison of methods of determination. Science of The Total Environment, 346(1–3):1–16, 2005.
[Rif08]
R. Rifkin. MIT - Multiclass Classification.
http://www.mit.edu/~9.520/spring09/Classes/multiclass.pdf, 2008.
[Roe99]
M. Roesch. Snort - lightweight intrusion detection for networks. In Proceedings of
the 13th USENIX Conference on System Administration, LISA ’99, pages 229–238,
Berkeley, CA, USA, 1999. USENIX Association.
[RSN+ 07]
S. Ranjan, S. Shah, A. Nucci, M. Munafo, R. Cruz, and S. Muthukrishnan. Dowitcher: Effective worm detection and containment in the internet core. In INFOCOM
2007. 26th IEEE International Conference on Computer Communications. IEEE, pages
2541–2545, May 2007.
[SBCQ09]
G. Sadasivan, N. Brownlee, B. Claise, and J. Quittek. Architecture for IP Flow Information Export. RFC 5470, IETF, 2009.
[Scr]
Plixer Scrutinizer – Incident Response System. http://www.plixer.com.
[SCSC03]
M.L. Shyu, S.C. Chen, K. Sarinnapakorn, and L. Chang. A novel anomaly detection
scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third
IEEE International Conference on Data Mining (ICDM’03), pages 172–179, 2003.
[SEB07]
U. Speidel, R. Eimann, and N. Brownlee. Detecting network events via T-entropy. In
Information, Communications Signal Processing, 2007 6th International Conference
on, pages 1–5, Dec 2007.
[SF02]
R. Sommer and A. Feldmann. NetFlow: Information loss or win? In Proceedings of
the 2nd ACM SIGCOMM Workshop on Internet Measurement, IMW’02, pages 173–
174, New York, NY, USA, 2002. ACM.
[SFH05]
M. Sumner, E. Frank, and M. Hall. Speeding up logistic model tree induction. In
9th European Conference on Principles and Practice of Knowledge Discovery in
Databases, pages 675–683. Springer, 2005.
[Sha48]
C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[Sim]
SimpleWeb. http://www.simpleweb.org/wiki/Traces.
[SK14]
M. Scanlon and M. Kechadi. The case for a collaborative universal peer-to-peer botnet investigation framework. In Proceedings of the 9th International Conference on
Cyber Warfare and Security (ICCWS 2014), pages 287–293, Purdue University, West
Lafayette, Indiana, USA, March 2014. Academic Conferences Limited.
[SKF08]
M.Z. Shafiq, S.A. Khayam, and M. Farooq. Improving accuracy of immune-inspired
malware detectors by using intelligent features. In Proceedings of the 10th Annual
Conference on Genetic and Evolutionary Computation, GECCO ’08, pages 119–126,
New York, NY, USA, 2008. ACM.
[SL12]
G.A.F. Seber and A.J. Lee. Linear Regression Analysis. Wiley Series in Probability
and Statistics. Wiley, 2012.
[SLBK08]
M.P. Stoecklin, J.Y. Le Boudec, and A. Kind. A two-layered anomaly detection technique based on multi-modal flow behavior models. In Proceedings of the 9th International Conference on Passive and Active Network Measurement, PAM’08, pages 212–
221, Berlin, Heidelberg, 2008. Springer-Verlag.
[Sof]
Softflowd – Flow-based Network Traffic Analyser. http://code.google.com/p/softflowd/.
[Sol]
Solarwinds – Network Traffic Analyzer. http://www.solarwinds.com.
[Sop14]
Sophos – Security Threat Report Smarter, Shadier, Stealthier Malware.
http://www.sophos.com/en-us/threat-center/medialibrary/PDFs/other/sophos-securitythreat-report-2014.pdf, 2014.
[SPBW12]
I. Syarif, A. Prugel-Bennett, and G. Wills. Unsupervised clustering approach for network anomaly detection. In R. Benlamri, editor, Networked Digital Technologies, volume 293 of Communications in Computer and Information Science, pages 135–145.
Springer Berlin Heidelberg, 2012.
[SS08]
G. Schaffrath and B. Stiller. Conceptual integration of flow-based and packet-based
network intrusion detection. In D. Hausheer and J. Schönwälder, editors, Resilient
Networks and Services, volume 5127 of Lecture Notes in Computer Science, pages
190–194. Springer Berlin Heidelberg, 2008.
[SSHB14]
R. Sadre, A. Sperotto, R. Hofstede, and N. Brownlee. Flow-based approaches in network management: Recent advances and future trends. International Journal of Network Management, 24(4):219–220, 2014.
[SSP12]
R. Sadre, A. Sperotto, and A. Pras. The effects of DDoS attacks on flow monitoring
applications. In NOMS, pages 269–277, 2012.
[SSS+ 10]
A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller. An overview of
IP flow-based intrusion detection. Commun. Surveys Tuts., 12(3):343–356, July 2010.
[SSSP12]
R.O. Schmidt, A. Sperotto, R. Sadre, and A. Pras. Towards bandwidth estimation
using flow-level measurements. In R. Sadre, J. Novotný, P. Čeleda, M. Waldburger,
and B. Stiller, editors, Dependable Networks and Services, volume 7279 of Lecture
Notes in Computer Science, pages 127–138. Springer Berlin Heidelberg, 2012.
[SST+ 04]
A. Soule, K. Salamatian, N. Taft, R. Emilion, and K. Papagiannaki. Flow classification
by histograms: Or how to go on safari in the internet. In Proceedings of the Joint
International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’04/Performance ’04, pages 49–60, New York, NY, USA, 2004. ACM.
[SSTG12]
A. Shiravi, H. Shiravi, M. Tavallaee, and A.A. Ghorbani. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur.,
31(3):357–374, May 2012.
[SSVP09]
A. Sperotto, R. Sadre, F. Vliet, and A. Pras. A labeled data set for flow-based intrusion
detection. In Proceedings of the 9th IEEE International Workshop on IP Operations
and Management, IPOM ’09, pages 39–50, Berlin, Heidelberg, 2009. Springer-Verlag.
[STG+ 11]
S. Saad, I. Traoré, A.A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian.
Detecting p2p botnets through network behavior analysis and machine learning. In
PST, pages 174–180. IEEE, 2011.
[Stu11]
ESET - Stuxnet Under the Microscope.
http://www.eset.com/us/resources/white-papers/Stuxnet_Under_the_Microscope.pdf, 2011.
[Sym14]
2014 Internet Security Threat Report, Volume 19.
http://www.symantec.com/security_response/publications/threatreport.jsp, 2014.
[SZH+ 13]
W. Sha, Y. Zhu, T. Huang, M. Qiu, Y. Zhu, and Q. Zhang. A multi-order Markov chain
based scheme for anomaly detection. In IEEE 37th Annual Computer Software and
Applications Conference, COMPSAC Workshops 2013, pages 83–88, 2013.
[Tap12]
Gigamon – SPAN Port Or TAP? White Paper.
https://www.netdescribe.com/downloads/span_port_or_tap_web.pdf, 2012.
[TB08]
B. Trammell and E. Boschi. Bidirectional Flow Export Using IP Flow Information
Export (IPFIX). RFC 5103, IETF, 2008.
[TBLG09]
M. Tavallaee, E. Bagheri, W. Lu, and A.A. Ghorbani. A detailed analysis of the KDD
Cup 99 data set. In Proceedings of the Second IEEE International Conference on
Computational Intelligence for Security and Defense Applications, CISDA’09, pages
53–58, Piscataway, NJ, USA, 2009. IEEE Press.
[TBS+ 11]
B. Tellenbach, M. Burkhart, D. Schatzmann, D. Gugelmann, and D. Sornette. Accurate network anomaly classification with generalized entropy metrics. Comput. Netw.,
55(15):3485–3502, October 2011.
[TBSM09]
B. Tellenbach, M. Burkhart, D. Sornette, and T. Maillart. Beyond Shannon: Characterizing internet traffic with generalized entropy metrics. In Proceedings of the 10th International Conference on Passive and Active Network Measurement, PAM ’09, pages
239–248, Berlin, Heidelberg, 2009. Springer-Verlag.
[Tel12]
B. Tellenbach. Detection, Classification and Visualization of Anomalies using Generalized Entropy Metrics. PhD thesis, ETH Zürich, 2012.
[TMSA11]
A. Teixeira, A. Matos, A. Souto, and L. Antunes. Entropy measures vs. Kolmogorov
complexity. Entropy, 13(3):595–611, 2011.
[TNS+ 05]
M. Titchener, R. Nicolescu, L. Staiger, T. Gulliver, and U. Speidel. Deterministic complexity and entropy. Fundam. Inform., 64(1-4):443–461, 2005.
[Tsa88]
C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical
Physics, 52(1-2):479–487, 1988.
[TSB08]
C. Thomas, V. Sharma, and N. Balakrishnan. Usefulness of DARPA dataset for intrusion detection system evaluation. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 6973 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, March 2008.
[TTSB11]
B. Trammell, B. Tellenbach, D. Schatzmann, and M. Burkhart. Peeling away timing
error in NetFlow data. In Neil Spring and George F. Riley, editors, Passive and Active
Measurement, volume 6579 of Lecture Notes in Computer Science, pages 194–203.
Springer Berlin Heidelberg, 2011.
[TWC13]
B. Trammell, A. Wagner, and B. Claise. Flow Aggregation for the IP Flow Information
Export (IPFIX) Protocol. RFC 7015, IETF, 2013.
[UMa]
UMass Trace Repository (UMass). http://traces.cs.umass.edu.
[Ver14]
Verizon Data Breach Investigations Report. http://www.verizonenterprise.com/dbir/2014,
2014.
[Wę12]
E. Wędrowska. Miary entropii i dywergencji w analizie struktur. A Wiley-Interscience
publication. Wydawnictwo Uniwersytetu Warmińsko-Mazurskiego, 2012.
[WFH11]
I.H. Witten, E. Frank, and M.A. Hall. Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd
edition, 2011.
[WIT]
Waikato Internet Traffic Storage (WITS). http://wand.net.nz/wits.
[WP05]
A. Wagner and B. Plattner. Entropy based worm and anomaly detection in fast IP networks. In Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, WETICE ’05, pages 172–177,
Washington, DC, USA, 2005. IEEE Computer Society.
[WSO]
WSO2 – SOA middleware platform. http://wso2.com.
[YKW11]
X. Yang, L. Ke, and Z. Wanlei. Low-rate DDoS attacks detection and traceback by
using new information metrics. Trans. Info. For. Sec., 6(2):426–437, June 2011.
[YZB04]
N. Ye, Y. Zhang, and C.M. Borror. Robustness of the Markov-chain model for cyberattack detection. IEEE Transactions on Reliability, 53(1):116–123, 2004.
[YZX+ 04]
B. Yue, Y. Zhao, Z. Xu, H. Fu, and F. Ma. An anomaly intrusion detection method
using Fourier transform. Journal of Electronics (China), 21(2):135–139, 2004.
[ZGMR07]
A. Ziviani, A.T.A. Gomes, M.L. Monsores, and P.S.S. Rodrigues. Network anomaly
detection using nonextensive entropy. Communications Letters, IEEE, 11(12):1034–
1036, December 2007.
