PHD THESIS
AGH University of Science and Technology in Krakow
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
Department of Applied Computer Science

PhD Thesis

Przemysław Bereziński, M.Sc. Eng.

Entropy-based Network Anomaly Detection

Supervisor: Marcin Szpyrka, Ph.D., D.Sc.
Auxiliary supervisor: Bartosz Jasiul, Ph.D., Lt. Col.

Krakow 2015

Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering
Department of Applied Computer Science

Doctoral Dissertation

Przemysław Bereziński, M.Sc. Eng.

Detekcja anomalii w ruchu sieciowym z wykorzystaniem miar entropijnych (Network Anomaly Detection Using Entropy Measures)

Supervisor: Marcin Szpyrka, Ph.D., D.Sc., AGH Professor
Auxiliary supervisor: Lt. Col. Bartosz Jasiul, Ph.D., Eng.

Kraków 2015

Working on the Ph.D. has been a wonderful but sometimes overwhelming experience. I would like to express my sincere gratitude to all those who made it possible for me to complete this Thesis. First and foremost, I would like to thank my supervisors, prof. Marcin Szpyrka and dr Bartosz Jasiul, for enabling and supporting the preparation of this Dissertation and for ensuring the freedom of work. Their guidance helped me throughout the research and the writing of this Thesis. Besides my supervisors, I would like to thank dr Joanna Śliwa and dr Rafał Piotrowski for the opportunity to work on many interesting cyber security projects. Special thanks go to my labmates: dr Marek Małowidzki, Tomasz Dalecki, Michał Mazur and Robert Goniacz for their contribution to the software implemented during this research and for inspiring discussions, not only about cyber security. Last but not least, I would like to thank my family, my wife Marzena and my sons, for all their love and encouragement. My sincere thanks also go to my Mother for motivating me throughout my life.
Przemysław Bereziński

Abstract

This Dissertation focuses on the application of anomaly detection in the field of network intrusion detection. This is a very important issue, as the number of cyber-attacks is alarmingly high and, to make things worse, it increases each year. This is partly because widely used security solutions are ineffective against modern malicious software (malware). Damage from malware, especially malware that operates in botnets, can take many serious forms, including loss of important data, reputation or money. Typically, a botnet is a group of infected hosts (bots) operated by cybercriminals who are focused on making money. Recently, botnets have also been used in cyber warfare to conduct sabotage and espionage.

Network anomaly detection is a very broad and heavily explored area. The first methods were proposed almost 40 years ago, but the problem of finding a generic network anomaly detection method remains unsolved. Dedicated methods for different types of network anomalies caused by malware can be found in the literature. Recently, entropy-based methods for detecting various types of anomalies have gained a lot of attention. The use of entropy to detect botnet-like malware has not been investigated so far.

The main goal of this Dissertation is to prove that an entropy-based approach is suitable for the detection of modern botnet-like malware in local networks and thus can be used to complement existing signature-based solutions. In order to reach this goal and prove the claim of the Thesis, the Dissertation makes several original contributions. A comparison of different entropy measures for use in network anomaly detection is provided. An original network anomaly detection method based on parameterized entropies and supervised machine learning is proposed, implemented and verified with a representative semi-synthetic dataset, prepared for this purpose because no realistic, complete and up-to-date datasets were available.
Moreover, an analysis of the proper parameters, suitable network features and the right classifier to use with the method is conducted. The results of the verification show that the proposed method, using parameterized Renyi or Tsallis entropy together with a classifier based on logistic regression, makes it possible to detect botnet-like malware with a satisfactory detection rate while keeping the rate of false alarms low. Comparable detection based on Shannon entropy or on volume counters (number of flows, packets and bytes) turns out to be ineffective.

Streszczenie

This doctoral dissertation concerns anomaly detection in the area of network intrusion detection. The topic is very important, since the number of cyber-attacks carried out is alarmingly high and, worse still, grows from year to year. This is partly because commonly used cyber security solutions are ineffective at detecting current malicious software. The damage caused by such software, especially software operating within botnets, includes loss of data, reputation or money. Typically, a botnet is a group of infected hosts (bots) controlled by cybercriminals for financial gain. Nowadays botnets are also used in cyber warfare for sabotage and espionage.

Network anomaly detection is a broad and heavily explored topic. The first methods appeared almost 40 years ago, but the problem of finding generic methods has not been solved so far. There are methods dedicated to particular types of anomalies related to malicious software, including methods based on entropy measures, which have recently become very popular. However, no one has so far applied them to the detection of botnet-type malware.
The main goal of this dissertation is to prove that the use of entropy measures makes it possible to detect botnet-type malware in local networks and that this approach can be used to complement the signature-based methods in use today. To confirm the thesis, the dissertation presents an original contribution to the current state of knowledge. Several entropy measures were compared with regard to their use in network anomaly detection. An original method based on parameterized entropies and supervised machine learning was proposed, implemented and verified. The verification was carried out on the author's own representative dataset, as the available datasets turned out to be unrealistic, incomplete and outdated. Additionally, analyses were carried out to determine the proper parameter values, suitable network traffic features and an appropriate classifier for the proposed method. The effectiveness study showed that the method using parameterized Renyi or Tsallis entropy together with a classifier based on logistic regression makes it possible to effectively detect anomalies related to botnet-type malware while keeping the level of false alarms low. Corresponding detection based on Shannon entropy, or on a volume-based approach relying on simple counters such as the number of flows, packets and bytes, turns out to be ineffective.

Contents

Abstract
Streszczenie
1. Introduction
  1.1. Motivation, Scope and Research Problem
  1.2. Goal and Plan of the Work
  1.3. Original contribution
  1.4. Exclusions
2. Related work
  2.1. General overview of network anomaly techniques
  2.2. Closely related work
    2.2.1. Detection via network volume counters
    2.2.2. Detection via network feature distributions
  2.3. Existing Datasets
  2.4. Summary
3. Entropy-based network anomaly detector – preface
  3.1. Main features
  3.2. Classification of the approach
4. Entropy
  4.1. Shannon entropy
  4.2. Parameterized entropy
  4.3. Comparison
    4.3.1. Binomial distribution
    4.3.2. Uniform distribution
    4.3.3. Impact of frequent and rare events
    4.3.4. Entropy of exemplary distributions
5. Network flows
  5.1. Flows vs. packets
  5.2. Flow export
    5.2.1. Operating principle
    5.2.2. Problems and difficulties
  5.3. NetFlow export setup
6. Entropy-based network anomaly detector
  6.1. Architecture
  6.2. Implementation
7. Dataset
  7.1. Origin of the idea
  7.2. Legitimate traffic
  7.3. Scenario 1
  7.4. Scenario 2
  7.5. Scenario 3
  7.6. Anomaly generator
8. Verification of the approach
  8.1. Correlation
  8.2. Performance evaluation
  8.3. Conclusions
9. Conclusions and further work
  9.1. Conclusions
  9.2. Further work
    9.2.1. On-line analysis in a real environment
    9.2.2. Multi-classifier
    9.2.3. Multi-label approach
    9.2.4. Dataset
  9.3. Publications

P. Bereziński Entropy-based Network Anomaly Detection

List of Abbreviations

ACC – Accuracy
AUC – Area Under a Curve
BDR – Bayesian Detection Rate
CEP – Complex Event Processing
CybOX – Cyber Observable Expression
DDoS – Distributed Denial of Service
DNS – Domain Name System
DoS – Denial of Service
FDR – False Discovery Rate
FN – False Negative
FNR – False Negative Rate
FP – False Positive
FPR – False Positive Rate
HIDS – Host-based Intrusion Detection System
ICMP – Internet Control Message Protocol
IDS – Intrusion Detection System
IP – Internet Protocol
IPFIX – IP Flow Information Export
IRC – Internet Relay Chat
NIDS – Network-based Intrusion Detection System
NPV – Negative Predictive Value
NTP – Network Time Protocol
P2P – Peer-to-Peer
PCA – Principal Component Analysis
PPV – Positive Predictive Value
PR – Precision-Recall
RDP – Remote Desktop Protocol
ROC – Receiver Operating Characteristic
RPC – Remote Procedure Call
SNMP – Simple Network Management Protocol
SQL – Structured Query Language
STIX – Structured Threat Information Expression
TCP – Transmission Control Protocol
TN – True Negative
TNR – True Negative Rate
TP – True Positive
TPR – True Positive Rate
UDP – User Datagram Protocol

1. Introduction

This chapter introduces the reader to the subject of the Thesis. It is divided into four sections. Section 1.1 presents the motivation and scope and briefly describes the research problem. It shows why this is an important issue in the field of Computer Science. Section 1.2 specifies the main goal of the research and presents the steps taken to reach it. It familiarizes the reader with the outline of this Dissertation and presents the contents of the subsequent chapters. Section 1.3 emphasizes those results of the Thesis that are considered its original contribution. Section 1.4 discusses issues that are deliberately not addressed in this research.

1.1. Motivation, Scope and Research Problem

Data mining is an interdisciplinary subfield of Computer Science involving methods at the intersection of artificial intelligence, machine learning and statistics [HTF09]. One of the data mining tasks is anomaly detection, which is the analysis of large quantities of data to identify items, events or observations that do not conform to an expected pattern. Anomaly detection is applicable in a variety of domains, e.g. fraud detection [PLSG10], fault detection [Nai09] and system health monitoring [MSOS07], but this Dissertation focuses on the application of anomaly detection in the field of network intrusion detection.

The first anomaly detection method for intrusion detection was proposed almost 40 years ago by Denning [Den87]. Today network anomaly detection is a very broad and heavily explored subject, but the problem of finding a generic method for a wide range of network anomalies is still unsolved. Anomaly detectors have several problems that must be addressed. The main challenges are high false alarm rates, long computation time, tuning and calibration, and root-cause identification [Bra10]. Because of that, anomaly detection techniques are rarely implemented in commercial Intrusion Detection Systems (IDS). Such systems mostly make use of the common signature-based (or misuse-based) technique. This approach is known for its shortcomings [LDZ05], [CLLL12], [GOB11], [JSl14a], [JSl14b]. Signatures describe only illegal patterns in network traffic, so prior knowledge is required [LDZ05]. Signature-based solutions do not cope with evasion techniques or with attacks that are not yet known (0-days) [CLLL12], [JSl14a], [JSl14b]. Moreover, they are unable to detect a specific attack until a rule for the corresponding vulnerability is created, tested, released and deployed, which usually takes some time [GOB11].
As the widely used intrusion detection systems are often ineffective against modern malicious software (malware), proper network anomaly detection is essential as one of the possible solutions to complement the signature-based approach. Recently, entropy-based methods which rely on network feature distributions have been of great interest [Eim08], [WP05], [NSA+08], [Tel12], [YKW11], [KBHJ08]. It is crucial to check whether an entropy-based approach can successfully detect anomalous network activity caused by modern botnet-like malware [HP14]. This is a really important issue, as the amount of such malware as well as the level of its sophistication increases each year [Sop14]. A botnet is a group of infected hosts (bots) controlled by Command and Control (C&C) servers operated by cyber-criminals, and according to recent reports provided by cyber security organizations [Ver14], [Sym14], [Cer13], [Sop14] botnets are one of the most sophisticated and popular types of cybercrime today. Damage from such malware can take many serious forms, including loss of important data, reputation or money. Moreover, nowadays botnets are also used in cyber warfare to conduct sabotage and espionage [SK14].

The entropy-based approach to detecting anomalies caused by botnet-like malware in local networks has not yet been investigated. Some entropy-based methods proposed in the past, e.g. [TBSM09], [YKW11], [NSA+08], deal with massive spreads of rather old worms that are not botnet-like and with different types of Distributed Denial of Service (DDoS) attacks in high-speed backbone networks controlled by Internet Service Providers (ISP). In the work presented in this Dissertation we have tried to find the best way of using entropy in order to properly detect and categorize network anomalies which indicate the existence of botnet-like malware in local networks.
Anomalies of this type are often very small and hidden in the network traffic volume expressed by the number of flows, packets or bytes, so detecting them with popular solutions and methods which rely mostly on traffic volume changes, e.g. [NfS], [BKPR02], [MSHJSC+04], [Nto], is highly difficult.

1.2. Goal and Plan of the Work

The main goal of this Dissertation is to prove that:

Entropy-based approach is suitable for detection of modern botnet-like malware in local networks based on network anomalies characteristic for such a malware.

We will try to answer the following questions:
– Are entropy measures useful in the context of network anomaly detection?
– Is it possible to effectively detect and classify small and low-rate anomalies connected with botnet-like malware activity in local networks by means of entropy?
– Is the entropy-based approach better than the traditional volume-based approach?
– Do parameterized entropies help to improve the results obtained for Shannon entropy?
– What is the proper set of parameters for entropies to successfully detect network anomalies?
– Which network features should be taken into consideration in order to detect a broad spectrum of anomalies connected with botnet-like malware?
– Which popular classifiers work well with the entropy-based approach?

It is assumed that the goal of this work can be reached in the following steps:
1. Preparation of a concept of an original entropy-based network anomaly detection method.
2. Implementation of the method.
3. Preparation of an original dataset (due to the lack of appropriate benchmarking data available).
4. Evaluation of the method.

These steps are discussed in detail in the remainder of the Thesis, which is organized as follows:
– Chapter 2 reviews related work in the area of network anomaly detection.
A general overview of the latest advances in this broad subject is presented, as well as a detailed review of anomaly detection techniques that are closely related to the approach proposed in this Dissertation. Additionally, some comments on existing datasets for evaluating network anomaly detection systems are included.
– Chapter 3 provides a brief overview of the approach taken to prove the Thesis and introduces the reader to the proposed method. The main features as well as a general classification of the method are presented.
– Chapter 4 introduces the definition of Shannon entropy and describes the Renyi and Tsallis generalizations. A brief overview as well as a simulation-based comparison of entropy measures is provided.
– Chapter 5 describes the concept of network flows and compares this technique with the widely used packet-based approach. Additionally, the NetFlow [Cla04] export setup prepared to interact with the proposed method is presented.
– Chapter 6 presents the architecture of the proposed method. A detailed specification as well as the results of the implementation are given.
– Chapter 7 describes the dataset developed to evaluate the performance of the proposed method.
– Chapter 8 presents the results of the verification of the method.
– Chapter 9 closes this Dissertation with conclusions and a short summary. It also outlines future work.

1.3. Original contribution

The approach proposed in this Dissertation is superior to the state of the art in several aspects. The following issues are considered to be the original contribution of the Thesis:
1. The use of an entropy-based approach to detect botnet-like malware in local networks.
2. Concept and implementation of an original entropy-based network anomaly detection method.
3. Comparison of different entropy measures for use in entropy-based network anomaly detection.
4. Selection of a proper set of α-values for parameterized entropies and a proper set of network features to successfully detect various network anomalies.
5. Comparison of the performance of different classifiers working with the proposed method.
6. Comparison of entropy measures with volume-based counters for use in network anomaly detection.
7. Preparation of an original dataset which includes anomalies specific to the network activity of modern botnet-like malware.
8. Detailed performance evaluation of the method by means of both standard and novel (introduced for the purpose of this Thesis) metrics.

1.4. Exclusions

Network anomaly detection is a broad topic. Some of the issues that are deliberately not addressed in this Thesis are presented below.
1. This Thesis does not cover the detection of anomalies or attacks visible in IP packets and their payloads. This is mainly because such anomalies are easily detectable with a signature-based approach, provided that the attack is known and the network traffic is not encrypted.
2. There is no empirical evaluation of the proposed method working on-line in a real environment, since this is planned as future work.
3. There is no comparison of the method with other summarization techniques such as histograms or sketches in this Thesis. The main reason is the lack of publicly available implementations of these methods. Moreover, such a comparison would be difficult and the results could be inaccurate, since the performance of these methods strongly depends on proper tuning.
4. There is no evaluation of the proposed method with a publicly available dataset, as at the time of preparing this Thesis none of them met all the necessary requirements such as completeness, timeliness and correctness.

This Thesis has been partially supported by the Polish National Centre for Research and Development, under the project no. PBS1/A3/14/2012 SECOR and the project no. 01.01.02-00-062/09 CybSecLab, and by the European Regional Development Fund, the Innovative Economy Operational Programme, under the project no. 01.01.02-00-062/09 INSIGMA.

2. Related work

This chapter reviews related work in the area of network anomaly detection. The chapter starts with a general overview of the latest advances in this broad subject. Then, anomaly detection techniques that are closely related to the approach proposed in this Dissertation are presented in more detail and commented on. Finally, some remarks on existing datasets for evaluating network anomaly detection systems are given.

2.1. General overview of network anomaly techniques

The problem of anomaly detection in network traffic has been extensively studied. There are many surveys, review articles and books on this broad subject. A large body of research on anomaly detection techniques can be found in several books, e.g. [WFH11], [BK13], [Agg13], [HTF09]. In surveys such as [CBK09], [HA04], the authors discuss anomaly detection in general and cover the network intrusion detection domain only briefly. Several review papers [ETGTDV04], [PP07], [Cal09], [CKS+09], [GTDVMFV09] summarize various network anomaly detection methods. A recent, well-structured and comprehensive survey on anomaly-based network intrusion detection, covering a general overview, techniques, systems, tools and datasets together with a discussion of challenges and recommendations, is presented by Bhuyan et al. [BBK13]. Another paper worth mentioning is the review of network intrusion detection by Sperotto et al. [SSS+10], which provides a valuable comparison of the packet-based and flow-based approaches.

From the aforementioned surveys it follows that the most effective methods of network anomaly detection include Principal Component Analysis, Wavelets, Markovian models, Clustering, Histograms, Sketches, and Entropies. To familiarize the reader with these techniques and to facilitate understanding of Section 2.2, a short description of each of them is presented below.
Principal Component Analysis (PCA) is a popular dimension reduction technique in machine learning [HNG+07], [SCSC03], [LYW13]. PCA transforms a set of correlated random variables to a new coordinate system that is given by the principal components. Simply speaking, PCA is a technique where a set of correlated random variables is transformed into a smaller set of uncorrelated ones. The uncorrelated variables are linear combinations of the original ones and can be used to express the data in a reduced form.

Wavelet transformation is one of the techniques of time-frequency transformation [LG09], [LTG08], [LWK10]. It is used for analyzing localized variations of power within a time series. By decomposing a time series into time-frequency space, one is able to find the dominant modes of variability and determine how those modes vary in time. There are some important differences between the well-known Fourier analysis [YZX+04] and wavelets. Fourier functions are localized in frequency but not in time. Small frequency changes in the Fourier transform will produce changes everywhere in the time domain. Wavelets are local in both frequency and time. This localization is an advantage in many cases.

Markov models are very useful for modeling sequences [YZB04], [SZH+13]. For a given system, a Markov model consists of a list of possible states, the possible transition paths between those states and the rate parameters of those transitions. The simplest Markov model is a Markov chain. It models the state of a system with a random variable that changes through time. The distribution of this variable depends only on the distribution of the previous state. A hidden Markov model [JP05] is a Markov chain for which the state is only partially observable. In other words, observations are related to the state of the system, but they are typically insufficient to precisely determine the state.
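To make the Markov chain description above more concrete, the following minimal sketch estimates transition probabilities from a training sequence and scores new sequences by their average log-likelihood. It is an illustration only and is not taken from the cited works: the state names (SYN, ACK, DATA, FIN), the function names and the probability floor for unseen transitions are all assumptions introduced here.

```python
import math
from collections import defaultdict


def fit_markov_chain(sequence):
    """Estimate first-order transition probabilities from one state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {s: n / total for s, n in followers.items()}
    return model


def avg_log_likelihood(model, sequence, floor=1e-6):
    """Average log-probability of the transitions in `sequence`.

    Transitions never seen in training get the (hypothetical) probability
    `floor`, so unusual sequences receive a strongly negative score.
    """
    pairs = list(zip(sequence, sequence[1:]))
    total = sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs)
    return total / len(pairs)


# Hypothetical training data: a repeated, well-behaved connection life cycle.
training = ["SYN", "ACK", "DATA", "DATA", "FIN"] * 20
model = fit_markov_chain(training)

usual = ["SYN", "ACK", "DATA", "FIN"]
unusual = ["SYN", "SYN", "SYN", "FIN"]  # transitions absent from training

print(avg_log_likelihood(model, usual))
print(avg_log_likelihood(model, unusual))
```

A sequence whose score falls far below the scores of typical sequences would be flagged as anomalous. How the threshold is chosen is left open here.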
Cluster analysis (or clustering) is a technique used to group objects of a similar kind into respective categories [SPBW12], [REHA13], [BSS+14]. This technique is based on unlabeled data. In machine learning, methods that use labeled samples are said to be supervised and methods which rely on unlabeled samples are said to be unsupervised [Alp10]. Clustering can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how clusters are identified. Usually, clustering-based techniques require the computation of a distance between pairs of objects.

Histograms, sketches and entropy-based approaches are methods that summarize random variable distributions, e.g. the distribution of addresses or ports in the domain of network anomaly detection. Histogram-based methods divide the entire range of values of a distribution into a series of small intervals called bins [KSD09], [SST+04]. The sketch-based approach relies on a set of histograms where the elements are assigned to bins using a set of different hash functions [SLBK08], [BDWS09]. Entropy is a measure of the uncertainty connected with a random variable [Sha48]. In general, the more random the variable, the higher the entropy. Entropy summarizes a probability distribution with a single value, which can be conveniently used to compare certain qualitative differences between probability distributions. Entropy fits network anomaly detection well, because some attacks or anomalies result in concentrating or dispersing the probability distributions of network features [NSA+08], [TBSM09].

2.2. Closely related work

This section takes a closer look at works strictly related to the approach proposed in this Dissertation. A review of detection methods based on summarizing network feature distributions via entropy, histograms and sketches is provided. Special attention is devoted to the methods employing different forms of entropy. Some comments on the gaps noticed are given.
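As a minimal illustration of how entropy summarizes a feature distribution with a single value (Section 2.1), the sketch below computes Shannon entropy together with the Renyi and Tsallis parameterized forms for two windows of destination ports. The formulas are the standard textbook definitions with base-2 logarithms for Shannon and Renyi. The exact conventions used later in this Thesis may differ, and the port samples are invented for illustration. Both windows contain the same number of flows, so a flow counter alone would not tell them apart.

```python
import math
from collections import Counter


def probabilities(values):
    """Empirical probability of each distinct value in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi(p, alpha):
    """Renyi entropy of order alpha (bits); tends to Shannon as alpha -> 1."""
    if alpha == 1:
        return shannon(p)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis(p, alpha):
    """Tsallis entropy of order alpha; tends to Shannon (in nats) as alpha -> 1."""
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1)


# Hypothetical destination ports seen in two windows of 100 flows each.
normal_ports = [80] * 50 + [443] * 40 + [53] * 10
scan_ports = [80] * 40 + [443] * 30 + [53] * 10 + list(range(1024, 1044))

p_normal = probabilities(normal_ports)
p_scan = probabilities(scan_ports)

for name, p in (("normal", p_normal), ("scan", p_scan)):
    print(name, shannon(p), renyi(p, 2), tsallis(p, 2))
```

In this toy example the dispersed port distribution yields a higher value for all three measures, while the flow count stays at 100 in both windows.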
The section starts with a comparison of the approach based on network feature distributions with the older but still more popular detection via network volume counters.

2.2.1. Detection via network volume counters

In the past, network anomalies were treated as deviations in the traffic volume. Simple counters such as the number of flows, packets (total, forwarded, fragmented, discarded) and bytes (per packet, per second) were used. These counters can be derived from network devices via the Simple Network Management Protocol (SNMP) [HPW02] or NetFlow [Cla04], [SBCQ09].

Barford et al. [BKPR02] presented wavelet analysis to distinguish between predictable and anomalous traffic volume changes using a very basic set of counters from NetFlow and SNMP data. They combined an advanced signal analysis technique with very simple metrics, i.e. the number of flows, packets and bytes. The authors reported some positive results in the detection of high-volume anomalies such as network failures, bandwidth floods and flash crowds.

Kim et al. [MSHJSC+04] proposed a method where many different Distributed Denial of Service (DDoS) attacks are described in terms of traffic patterns in flow characteristics. In particular, the authors focused on counters such as the number of flows, packets and bytes, the flow and packet sizes, the average flow size and the number of packets per flow. In the TCP SYN flood example they present, the following pattern is applied: a large number of flows, yet a small number of small packets, and no constraints on the bandwidth or the total number of packets. This pattern differs significantly from the one generated for an ICMP/UDP flooding attack, which involves high bandwidth consumption and a large number of packets. Although the authors reported some good results, they also mentioned that common legitimate peer-to-peer (P2P) traffic may result in some false alarms in their approach.
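The idea of describing attacks as patterns over flow counters can be sketched as follows. This is not the method of Kim et al.: the record fields, pattern labels and all threshold values below are hypothetical and chosen only to mirror the two patterns described above (many flows with few small packets versus many packets with high bandwidth consumption).

```python
from dataclasses import dataclass


@dataclass
class Flow:
    protocol: str
    packet_count: int
    byte_count: int


def counters(flows):
    """Aggregate simple volume counters over one observation window."""
    n_flows = len(flows)
    n_packets = sum(f.packet_count for f in flows)
    n_bytes = sum(f.byte_count for f in flows)
    return {
        "flows": n_flows,
        "packets": n_packets,
        "bytes": n_bytes,
        "packets_per_flow": n_packets / n_flows if n_flows else 0.0,
        "bytes_per_packet": n_bytes / n_packets if n_packets else 0.0,
    }


def match_pattern(c, many_flows=1000, few_packets_per_flow=3,
                  small_packet=100, many_packets=100_000,
                  many_bytes=50_000_000):
    """Match counters against two hand-written patterns.

    All thresholds are hypothetical defaults, not values from the literature.
    """
    if (c["flows"] >= many_flows
            and c["packets_per_flow"] <= few_packets_per_flow
            and c["bytes_per_packet"] <= small_packet):
        return "syn-flood-like"
    if c["packets"] >= many_packets and c["bytes"] >= many_bytes:
        return "bandwidth-flood-like"
    return "no-match"


# Hypothetical windows of flow records.
syn_window = [Flow("TCP", 1, 60) for _ in range(2000)]
udp_window = [Flow("UDP", 5000, 5_000_000) for _ in range(50)]
web_window = [Flow("TCP", 20, 15_000) for _ in range(100)]

for window in (syn_window, udp_window, web_window):
    print(match_pattern(counters(window)))
```

Fixed thresholds of this kind also make the false-alarm problem visible: legitimate traffic with many short flows, such as the P2P traffic mentioned above, could satisfy the first pattern.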
A threshold-based detector measuring the deviation from a mean value, present in a traffic collection algorithm for frequent collection of SNMP data, was proposed by Lee et al. [LPKL09]. To assess the algorithm, the authors examined how it impacts detection of volume anomalies. Only some minor differences were reported in comparison to the original traffic collection algorithm. Casas et al. [CFVN09] introduced an anomaly detection algorithm based on SNMP data which deals with abrupt and large traffic changes. The authors proposed a novel linear parsimonious model for anomaly-free network flows. This model makes it possible to treat the legitimate traffic as a nuisance parameter, to remove it from the detection problem and to detect the anomalies in the residuals. The authors reported that with this approach they slightly improved on the previously introduced PCA-based approach in terms of false alarms.

Many commercial and open source solutions that rely on SNMP or NetFlow counters are available on the market, e.g. NfSen [NfS], NtopNg [Nto], Plixer Scrutinizer [Scr], Paessler PRTG [Prt], and Solarwinds Network Traffic Analyzer [Sol]. All of them provide more or less the same functionality:
– browsing and filtering network data;
– statistics overview, e.g. top-talkers, i.e. hosts or services that exchanged most traffic;
– reporting, e.g. bandwidth reports, i.e. which user exchanged how much traffic;
– alerting when traffic thresholds are exceeded or some rules describing anomalous behavior are matched.

Several solutions available on the market, e.g. Invea-Tech FlowMon [Floc] or AKMA Labs FlowMatrix [Floa], offer some anomaly detection methods which mostly rely on a predefined set of rules for detection of undesirable behavior patterns, and on some simple long-term network behavior profiles in terms of services, traffic volume and communication sides.
Although vendors classify their solutions as anomaly detection, the use of rule-based heuristics describing well-known patterns corresponds more to the signature-based approach.

Concluding this subsection, we noticed that although there are many methods that rely on counters, their capabilities are limited. The main problem with a counter-based approach is that it relies mostly on traffic volume. Nowadays, many network attacks or anomalies such as low-rate DDoS, stealth scanning or botnet-like worm propagation and communication do not result in a substantial traffic volume change. The presented counter-based methods handle large and abrupt traffic changes well, such as bandwidth flooding attacks or flash crowds, but a large group of anomalies which do not cause changes of volume remains undetected. Moreover, there is also a practical issue connected with counters, reported by Brauckhoff et al. [BTW+06], who stated that packet sampling, used by many routers to save resources when collecting data, can influence counter-based anomaly detection metrics, but does not significantly affect the distribution of network features.

2.2.2. Detection via network feature distributions

Network anomaly detection via network feature distributions is becoming more and more popular. Several feature distributions, i.e. header-based (addresses, ports, flags), volume-based (host or service specific percentage of flows, packets and bytes) and behavior-based (in/out connections for a particular host) have been suggested in the past [LCD05], [NSA+08], [TBSM09]. However, it is unclear which network feature distributions perform best. Nychis et al. [NSA+08], based on their results of pairwise correlation, reported dependencies between addresses and ports and recommended the use of volume-based and behavior-based feature distributions. In contrast, Tellenbach et al. [TBSM09] found no correlation among header-based features.
In this Dissertation, original results of network feature correlation are presented and some interesting conclusions are given.

Shannon entropy

Entropy as the measure of uncertainty can be used to summarize feature distributions in a compact form, i.e. a single number. Many forms of entropy exist, but only a few have been applied to network anomaly detection. The most popular is the well-known Shannon entropy [Sha48]. The application of Shannon measures such as relative entropy and conditional entropy to network anomaly detection was proposed by Lee and Xiang [LX01]. Also, Lakhina et al. [LCD05] made use of Shannon entropy to sum up feature distributions of network flows. By using unsupervised learning, the authors showed that anomalies can be successfully clustered. Wagner and Plattner [WP05] made use of the Kolmogorov Complexity, which is related to Shannon entropy [GV03], [TMSA11], in order to detect worms in network traffic. Their work mostly focuses on implementation aspects and scalability and does not propose any specific analysis techniques. The authors reported that the method is able to detect worm outbreaks and massive scanning activities in near real time. Ranjan et al. [RSN+07] suggested another worm detection algorithm which measures Shannon entropy ratios for traffic feature pairs and issues an alarm on sudden changes. Gu et al. [GMT05] made use of Shannon maximum entropy estimation to estimate the network baseline distribution and to give a multi-dimensional view of network traffic. The authors claim that with their approach they were able to distinguish anomalies that change the traffic either abruptly or slowly. Iglesias et al. [IZ14] proposed a fast, lightweight method to distinguish different attack types observed in the IP darkspace monitor. The method is based on Shannon entropy measures of network features and machine learning techniques.
The explored data belongs to a portion of the Internet background radiation from a large IP darkspace.

Generalized entropy

Besides Shannon entropy, several generalizations of entropy have been recently introduced in the context of network anomaly detection. Eimann in [SEB07], [ESB05], [Eim08] reported some positive results of using T-entropy [TNS+05] for intrusion detection based on analysis of packets. T-entropy can be estimated from a string complexity measure called T-complexity [TNS+05]. String complexity is the minimum number of steps required to construct a given string. In contrast to entropy, where probabilities (estimated from frequencies) can be permuted, in a complexity-based approach the order matters. A string is compressed with an algorithm and the output length is used to estimate the complexity. Finally, the complexity becomes an estimate for the entropy. Because the sequence of events is crucial in this approach, it fits fine-grained methods of network data analysis such as full packet or packet header inspection. The problem is that this type of inspection is not scalable in the context of network speed. Some details about T-entropy are presented in our paper [PBPC12].

A parameterized generalization of entropy has also been recently reported as very promising. The Shannon entropy assumes a tradeoff between contributions from the main mass of the distribution and the tail. With the parameterized Tsallis [Tsa88] or Renyi [Ren70] entropy, one can control this tradeoff. In general, if the parameter denoted as α has a positive value, it exposes the main mass; if the value is negative, it refers to the tail. Ziviani et al. [ZGMR07] investigated Tsallis entropy in the context of the best value of the α parameter for detection of DoS attacks. They found that an α-value around 0.9 is the best for detecting such attacks. Shafiq et al. [SKF08] did the same for port scan anomalies caused by malware.
They reported that an α-value around 0.5 is the best choice to detect scan anomalies. A comparative study of the use of the Shannon, Renyi and Tsallis entropy for attribute selection, aimed at obtaining an optimal attribute subset which increases the detection capability of decision tree and k-means classifiers, was presented by Lima et al. [LAS12]. The experimental results demonstrate that the performance of the models built with smaller subsets of attributes is comparable to, and sometimes better than, that associated with the complete set of attributes for the DoS and scan attack categories. The authors found that for the DoS category, Renyi entropy with an α-value around 0.5 and Tsallis entropy with an α-value around 1.2 are the best for the decision tree classifier. We believe that the proper choice of the α-value depends on the anomaly, on the legitimate traffic used as a baseline, or on both, since none of the authors mentioned above reported similar results. Thus, goals such as finding the single proper value of the entropy parameter in order to improve detection of a particular group of anomalies are likely to remain unachieved.

Some authors, e.g. Tellenbach et al. [TBSM09], [TBS+11], [Tel12], employed a set of α-values in their methods. The authors proposed the Traffic Entropy Telescope prototype based on Tsallis entropy, capable of detecting a broad spectrum of anomalies in backbone traffic, including fast-spreading worms (not so common nowadays), scans and different forms of DoS/DDoS attacks. Although Tsallis entropy seems to be more popular than Renyi entropy in the context of network anomaly detection, the latter was also successfully applied in detection of different anomalies. Examples are the work by Yang et al. [YKW11], who employed Renyi entropy for early detection of low-rate DDoS attacks, and Kopylova et al. [KBHJ08], who reported positive results of using Renyi conditional entropy in detection of selected worms.
We believe that with parameterized entropy some limitations of Shannon entropy can be overcome. These limitations are caused by its small descriptive capability [Tel12], which results in a limited ability to detect typical small or low-rate anomalies. Moreover, we think that with a properly chosen set of α-values this detection will be accurate in terms of a low number of false alarms and a high detection rate. In this Thesis we present original results of our research on the proper set of α-values as well as original research on the most suitable entropy type.

Other techniques

Apart from entropy, some other techniques for summarizing feature distributions are successfully used in the context of network anomaly detection, namely sketches and histograms. Soule et al. [SST+04] proposed a flow classification method based on modeling network flow histograms using Dirichlet Mixture Processes for random distributions. The authors validated their model against three synthetic test cases and achieved almost 100% accuracy. In [SLBK08], Stoecklin et al. introduced a two-layered sketch anomaly detection technique. The first layer models typical values of different feature components, e.g. the typical number of flows connecting to a specific port, while the second layer evaluates the differences between an observed feature distribution and a corresponding model. The authors claim that the main strength of their method is the construction of fine-grained models that capture the details of feature distributions, instead of summarizing them into an entropy value. A more general approach was presented by Kind et al. [MSHJSC+04]. In their method, histogram-based baselines were constructed from some essential network feature distributions such as addresses and ports. This work was augmented by Brauckhoff et al. in [BDWS09], who applied association rule mining in order to identify flows representing anomalous network traffic.
Although the non-entropic techniques for summarizing feature distributions seem to work fine, proper tuning is the main problem with them [Tel12]. The performance of detection depends, to a great extent, on the choice of the bin size. This may be difficult to set and control while network traffic changes.

2.3. Existing Datasets

One of the main problems in network anomaly detection is the lack of good and publicly available datasets for evaluation purposes. The authors of research in this area have noticed this situation [CRKM11], [Owe10], [EDD+13], [GGSZ14]. Some of the research works employ "what is available", that is, datasets that are outdated (from the point of view of both the legitimate traffic and the anomalies they contain); some works are based on their own datasets, prepared for the sole purpose of evaluating a proposed method "somehow". As the dataset creation was not the goal in itself, the quality of such datasets is usually limited. In our paper [MBM15] a detailed review of the existing datasets is presented, requirements are defined and dataset preparation methods are described.

Real network traces are the most valuable, but because of privacy issues they are rarely published. One possible solution for privacy is anonymization [CMRB09], [KAA+06], [FAAM07]. The goal of anonymization is to preserve the structure of the data while at the same time respecting privacy policies. Finding the right balance may sometimes be a difficult task [SSTG12]. Another problem with real traces is proper labeling, which in many cases has to be done manually. Real traffic traces can be found in some publicly available repositories, such as Internet Traffic Archive [ITA], LBNL/ICSI Enterprise Tracing [LBN], SimpleWeb [Sim], Caida [Cai], MOME [MoM], WITS [WIT], UMASS [UMa]. Unfortunately, these traces are usually old, unlabeled and not dedicated to anomaly detection.
Alternative approaches cover synthetic or semi-synthetic datasets. To build such a dataset, deep domain knowledge and appropriate methods and tools are required in order to get realistic data. According to Brauckhoff et al. [BWM08], a realistic simulation of legitimate traffic is largely an unsolved problem today, and combining synthetic anomalies with real background traffic traces is one of the solutions. In [BWM08] and then in [Bra10] Brauckhoff introduced the FLAME tool, which allows injection of hand-crafted anomalies into a given legitimate traffic flow trace. This tool is freely available but the current distribution does not include any models reflecting anomalies. Another interesting concept was introduced by Shiravi et al. [SSTG12]. The authors proposed to describe network traffic (not only flows) by a set of so-called α and β profiles which can subsequently be used to generate a dataset. The α-profiles consist of actions which should be executed to generate a given event in the network (such as an attack), while in β-profiles certain entities (packet sizes, number of packets per flow) are represented by a statistical model. Regrettably, this solution is not freely available.

The lack of traces of botnet-like malware behavior in available network datasets calls their timeliness into question. This type of trace should be included in contemporary datasets, and researchers should address anomalies typical for botnet-like malware in their methods, as nowadays they are one of the main threats. The number of datasets containing botnet-like malware anomalies is limited. Worth mentioning are those prepared by Shiravi et al. [STG+11] and Garcia et al. [GGSZ14]. The first one is a mixture of malicious and non-malicious datasets. Unfortunately, only one host in this dataset is infected with botnet-like malware. The second dataset, which has been made public recently, is much richer and consists of traces of 13 different scenarios of running bots from 7 different families.
It is obtained by running real (mostly unmodified) malware on a subnetwork of infected hosts in a lab environment. This traffic has been mixed with background traffic coming from a real network. A controversial (but beneficial from the point of view of the resulting dataset) decision was not to restrict botnet communication with the Internet in any way. For privacy reasons, the dataset contains NetFlow data; additionally, a full packet capture of botnet activity is included. The dataset is carefully labeled, although the whole traffic from infected hosts was marked as hostile. Unfortunately, this dataset was unavailable while preparing this Thesis.

An interesting dataset has also been prepared by Sperotto et al. [SSVP09]. This dataset is based on data collected from a real honeypot (an isolated and monitored trap) which was running for several days. The honeypot featured common network services such as HTTP, SSH and FTP. The authors gathered about 14 million malicious network flows, and most of them referred to the activity of web and network scanners. Some details about particular anomalies in this dataset are also presented in our paper [BPMP14].

Even though some valuable datasets are emerging, many researchers still make use of the very old and criticized DARPA [HLF+01] dataset and its modified versions, namely KDD99 [KDD] and NSL-KDD [TBLG09]. Besides the strong criticism by McHugh [McH00], Mahoney et al. [MC03] or Thomas [TSB08] for being unrealistic and not balanced, nowadays the DARPA datasets are simply out of date in the context of network services and attacks.

2.4. Summary

As one can see, network anomaly detection is a very broad and heavily explored area. The problem of a generic anomaly detection method for network anomalies is still unsolved. The widely used security solutions are ineffective against modern botnet-like malware. The feature distribution approach is very promising.
Entropy seems to be the best choice for summarizing feature distributions. Entropy is well suited to network anomaly detection, because some network attacks or anomalies result in concentrating or dispersing probability distributions of network features but do not result in a significant traffic volume change. It seems that with parameterized entropy some limitations of Shannon entropy, caused by its small descriptive capability and resulting in a limited ability to detect typical small or low-rate anomalies, can be overcome. The use of a broad spectrum of α-values seems to be crucial because, unlike Ziviani, Shafiq or Lima, we do not believe that it is possible to find a single α-value that fits a particular anomaly type. None of the authors adopts entropy to detect anomalies indicating botnet-like malware. Current methods are dedicated to detecting massive worm spreads (not popular nowadays) and DDoS attacks in high-speed networks. The problem of finding a proper set of α-values, a proper set of network features and a proper classification (not just detection) method in order to find not only massive but also small and low-rate anomalies, such as those typical of botnet-like behavior in local networks, remains open. Solving it may contribute to the current state of the art in botnet detection, which is limited to some non-entropic methods, e.g. the machine learning technique proposed by Livadas et al. [LWLS06] to identify the C&C traffic of IRC-based botnets, the system presented by Francois et al. [FWB+11] that uses the PageRank algorithm to detect different families of peer-to-peer botnets via network flows, and the advanced knowledge-based botnet hunting system named DISCLOSURE proposed by Bilge et al. [BBR+12]. The possibility of using parameterized entropies for detection of anomalies connected with botnet-like malware is confirmed in the following chapters.
Because of the lack of realistic, up-to-date and representative datasets, additional effort also had to be taken to develop labeled traces based on real legitimate traffic and synthetic anomalies [BSJM14] reflecting botnet-like activity in a local network.

3. Entropy-based network anomaly detector – preface

In order to prove the claim of the Thesis, an entropy-based network anomaly detection module named Anode has been proposed. It is developed to cooperate with the existing signature-based or known pattern-based security solutions such as the popular Intrusion Detection Systems, e.g. Snort [Roe99], Bro [Pax99], as well as Flow-based Network Traffic Analyzers, e.g. NfSen [NfS], NtopNg [Nto]. We used such solutions in the SOPAS system [CKP+11], [BlPJ12], [JPB+12], developed to protect a set of connected heterogeneous systems which are not centrally managed. Currently, Anode is a component of the anomaly detection and security event data correlation system developed in the SECOR [JSl14a] project, which is the successor of SOPAS. In SECOR, Anode is expected to detect network anomalies with an acceptable False Positive Rate [Faw06] and a high True Positive Rate [Faw06], categorize anomalies and report some details (timestamps, related addresses and ports) to the correlation engine, which correlates events coming from different anomaly detection modules and external sensors, such as the aforementioned Snort, in order to improve detection and limit false alarms. SECOR anomaly detectors are not limited to the network. For example, one of the components, named PRONTO [JSl14a], [JSl14b], detects obfuscated malware at infected hosts. The general operating principle of Anode is presented in Fig. 3.1.

Figure 3.1: Anode – Entropy-based network anomaly detection module

Anode analyzes network flows. Various network feature distributions based on flows, e.g. addresses, ports, are summarized by means of entropy.
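As a rough illustration of what summarizing flow-based feature distributions by means of entropy can look like, the sketch below counts feature values within one time window and computes Tsallis entropy for two α-values. The flow records, the FEATURES mapping and the function names are hypothetical and are not taken from Anode; α = 1 is rejected because the Tsallis formula is undefined there.

```python
from collections import Counter

# Hypothetical flow records seen in one time window:
# (source address, destination address, destination port).
FLOWS = [
    ("10.1.0.1", "10.1.0.9", 80),
    ("10.1.0.2", "10.1.0.9", 80),
    ("10.1.0.3", "10.1.0.9", 443),
    ("10.1.0.1", "10.1.0.7", 22),
    ("10.1.0.1", "10.1.0.9", 80),
]

# Hypothetical mapping: feature name -> position in the flow record.
FEATURES = {"src_addr": 0, "dst_addr": 1, "dst_port": 2}


def tsallis_entropy(counts, alpha):
    """Tsallis entropy of a distribution estimated from occurrence counts."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use another value")
    total = sum(counts)
    return (sum((c / total) ** alpha for c in counts) - 1) / (1 - alpha)


def summarize_window(flows, alphas):
    """Return {(feature, alpha): entropy} for every feature and alpha."""
    summary = {}
    for name, index in FEATURES.items():
        counts = list(Counter(flow[index] for flow in flows).values())
        for alpha in alphas:
            summary[(name, alpha)] = tsallis_entropy(counts, alpha)
    return summary


summary = summarize_window(FLOWS, alphas=(-2, 2))
for key, value in sorted(summary.items()):
    print(key, round(value, 3))
```

Each (feature, α) pair yields one number per time window, so a window is reduced to a short vector of entropy values regardless of how many distinct addresses or ports it contains.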
There are two phases: training and detection. In the training phase, a profile of legitimate traffic is built and a model for classification is prepared. In the detection phase, current observations are compared with the model. An abnormal dispersion or concentration for different network feature distributions indicates an anomaly. Extraction of anomaly details is also assumed: related ports and addresses are obtained by looking into the top contributors to the entropy value. A much more detailed description of the architecture is provided in Chapter 6.

3.1. Main features

The main features of Anode are presented below:
– off-line and on-line analysis of network flows within fixed time intervals;
– supervised machine learning with training and detection phases;
– multi-class classification;
– summarization of network feature distributions with parameterized Tsallis or Renyi entropy;
– use of a selected range of α-values for entropy instead of a single value which fits well;
– use of a selected set of network features in order to detect a broad spectrum of anomalies;
– use of a fine-grained legitimate network traffic profile;
– anomaly evidence extraction by reporting IP addresses and ports of attackers and victims.

3.2. Classification of the approach

On the basis of the main features, according to Figure 3.2, one can classify our approach as:
– anomaly detection;
– Network-based Intrusion Detection System (NIDS);
– having a centralized architecture;
– with a detection module fed by the network traffic data;
– analyzing incidents off-line and on-line.

Figure 3.2: Features of detection methods (based on [Ren11])

4. Entropy

This chapter presents an introduction to the theoretical fundamentals of entropy. It starts with a brief overview of Shannon entropy.
Next, the parameterized generalizations are presented. This part is especially important, as we decided to use this form of entropy in the approach presented in this Dissertation. Finally, a comparison of entropy measures based on simulations is provided.

4.1. Shannon entropy

The definition of entropy as a measure of disorder comes from thermodynamics and was proposed in the early 1850s by Clausius [CH67]. In 1948 Shannon [Sha48] adopted entropy to information theory. In information theory, entropy is a measure of the uncertainty associated with a random variable. The more random the variable, the bigger the entropy; conversely, the greater the certainty of the variable, the smaller the entropy. For a probability distribution $p(X = x_i)$ of a discrete random variable $X$, the Shannon entropy is defined as:

\[ H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.1} \]

$X$ is the feature that can take values $\{x_1, \ldots, x_n\}$ and $p(x_i)$ is the probability mass function of outcome $x_i$. The entropy of $X$ can also be interpreted as the expected value of $\log_a \frac{1}{p(X)}$, where $X$ is drawn according to the probability mass function $p(x)$. Depending on the base of the logarithm, different units can be used: bits ($a = 2$), nats ($a = e$) or hartleys ($a = 10$). For the purpose of network anomaly detection, sampled probabilities estimated from the number of occurrences of $x_i$ in a time window $t$ are typically used. The value of entropy depends on randomness (it attains its maximum when the probability $p(x_i)$ is equal for every $x_i$) but also on the value of $n$. In order to measure randomness only, normalized forms have to be employed. For example, an entropy value can be divided by $n$ or by the maximum entropy defined as $\log_a(n)$.

Some important properties of Shannon entropy are listed below. More properties can be found in [Kar03] and [Csi08].
– Nonnegativity: $\forall\, p(x_i) \in [0, 1]:\ H_S(X) \geq 0$
– Symmetry: $H_S(p(x_1), p(x_2), \ldots) = H_S(p(x_2), p(x_1), \ldots)$
– Maximality: $H_S(p(x_1), \ldots, p(x_n)) \leq H_S(\frac{1}{n}, \ldots, \frac{1}{n}) = \log_a(n)$
– Additivity: $H_S(X, Y) = H_S(X) + H_S(Y)$ if $X$ and $Y$ are independent variables

If not only the degree of uncertainty is important but also the extent of changes between the assumed and observed distributions, denoted as $q$ and $p$ respectively, a relative entropy, also known as the Kullback-Leibler divergence [Kul59], [Csi08], can be used:

\[ D_{KL}(p\|q) = \sum_{i=1}^{n} p(i) \log_a \frac{p(i)}{q(i)} \tag{4.2} \]

This definition is not symmetric, i.e. $D_{KL}(p\|q) \neq D_{KL}(q\|p)$ unless $p = q$. To measure how much uncertainty remains in $X$ after observing $Y$, the conditional entropy (or equivocation) [CT06] may be employed:

\[ H_S(X|Y) = -\sum_{i=1}^{m} \sum_{j=1}^{n} p(x_i, y_j) \log_a p(x_i|y_j) \tag{4.3} \]

4.2. Parameterized entropy

The Shannon entropy assumes a tradeoff between contributions from the main mass of the distribution and the tail [MD08]. To control this tradeoff, two parameterized generalizations of Shannon entropy were proposed by Renyi (1970s) [Ren70] and Tsallis (late 1980s) [Tsa88] respectively. In general, if the parameter denoted as α has a positive value, it exposes the main mass (the concentration of events that occur often); if the value is negative, it refers to the tail (the dispersion caused by seldom events). Both parameterized entropies (Renyi and Tsallis) are derived from the Kolmogorov-Nagumo generalization of an average [Mar05], [Wę12]:

\[ \langle X \rangle_\phi = \phi^{-1}\left( \sum_{i=1}^{n} p(x_i)\, \phi(x_i) \right), \tag{4.4} \]

where $\phi$ is a function which satisfies the postulate of additivity (only affine or exponential functions satisfy this) and $\phi^{-1}$ is the inverse function. Due to affine transformations $\phi(x_i) \to \gamma(x_i) = a\phi(x_i) + b$ (where $a$ and $b$ are numbers), the inverse function is expressed as $\gamma^{-1}(x_i) = \phi^{-1}\left(\frac{x_i - b}{a}\right)$.

Renyi proposed the following function $\phi$:

\[ \phi(x_i) = 2^{(1-\alpha) x_i} \tag{4.5} \]

Renyi entropy can be obtained from the Shannon entropy with the following transformations:

\[ H_{R\alpha}(X) = \phi^{-1}\left( \sum_{i=1}^{n} p(x_i)\, \phi(-\log_2 p(x_i)) \right) \tag{4.6} \]

Given $\phi(x_i) = 2^{(1-\alpha) x_i}$ and $\phi^{-1}(x_i) = \frac{1}{1-\alpha} \log_2 x_i$:

\[
\begin{aligned}
H_{R\alpha}(X) &= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, 2^{-(1-\alpha) \log_2 p(x_i)} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, 2^{\log_2 p(x_i)^{(\alpha-1)}} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)\, p(x_i)^{(\alpha-1)} \right) \\
&= \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right)
\end{aligned}
\tag{4.7}
\]

After transformation, the well-known form of Renyi entropy is obtained:

\[ H_{R\alpha}(X) = \frac{1}{1-\alpha} \log_a \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right) \tag{4.8} \]

The Renyi entropy satisfies the same postulates as the Shannon entropy and there are the following relations between these two:

\[ H_{R\alpha_1}(X) \geq H_S(X) \geq H_{R\alpha_2}(X) \quad \text{where } \alpha_1 < 1 \text{ and } \alpha_2 > 1 \tag{4.9} \]

\[ \lim_{\alpha \to 1} \frac{1}{1-\alpha} \log_a \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right) = H_S(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.10} \]

Tsallis proposed the following function $\phi$:

\[ \phi(x_i) = \frac{2^{(1-\alpha) x_i} - 1}{1-\alpha} \tag{4.11} \]

After transformation, the well-known form of Tsallis entropy is as follows:

\[ H_{T\alpha}(X) = \frac{1}{1-\alpha} \left( \sum_{i=1}^{n} p(x_i)^{\alpha} - 1 \right) \tag{4.12} \]

As can be seen, this entropy is non-logarithmic. There are the following relations between the Shannon and the Tsallis entropy:

\[ H_{T\alpha_1}(X) \geq H_S(X) \geq H_{T\alpha_2}(X) \quad \text{where } \alpha_1 < 1 \text{ and } \alpha_2 > 1 \tag{4.13} \]

\[ \lim_{\alpha \to 1} \frac{1}{1-\alpha} \left( \sum_{i=1}^{n} p(x_i)^{\alpha} - 1 \right) = \log 2 \cdot H_S(X) = \log 2 \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)} \tag{4.14} \]

Moreover, the Tsallis entropy is nonextensive, i.e. it satisfies only the pseudo-additivity criterion. For independent discrete random variables $X$, $Y$:

\[ H_{T\alpha}(X, Y) = H_{T\alpha}(X) + H_{T\alpha}(Y) + (1-\alpha)\, H_{T\alpha}(X)\, H_{T\alpha}(Y). \tag{4.15} \]

It means that $H_{T\alpha}(X, Y) > H_{T\alpha}(X) + H_{T\alpha}(Y)$ for $\alpha \in (-\infty, 1)$ and $H_{T\alpha}(X, Y) < H_{T\alpha}(X) + H_{T\alpha}(Y)$ for $\alpha \in (1, \infty)$.

To summarize, both parameterized entropies (Renyi and Tsallis):
– expose concentration for α > 1 and dispersion for α < 1;
– converge to the Shannon entropy for α → 1.

4.3. Comparison

In order to understand, compare and successfully apply parameterized entropies in our approach, some simulation experiments were conducted.
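The three definitions (Eq. 4.1, Eq. 4.8 and Eq. 4.12) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names and the occurrence counts are assumptions, probabilities are estimated from counts, and α = 1 is handled by falling back to the Shannon limit.

```python
import math


def probabilities(counts):
    """Estimate probabilities from occurrence counts (zero counts are skipped)."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon(counts, base=2):
    """Shannon entropy, Eq. 4.1."""
    return sum(p * math.log(1 / p, base) for p in probabilities(counts))


def renyi(counts, alpha, base=2):
    """Renyi entropy, Eq. 4.8; alpha = 1 falls back to the Shannon limit."""
    if alpha == 1:
        return shannon(counts, base)
    power_sum = sum(p ** alpha for p in probabilities(counts))
    return math.log(power_sum, base) / (1 - alpha)


def tsallis(counts, alpha):
    """Tsallis entropy, Eq. 4.12; alpha = 1 gives the Shannon entropy in nats."""
    if alpha == 1:
        return shannon(counts, math.e)
    power_sum = sum(p ** alpha for p in probabilities(counts))
    return (power_sum - 1) / (1 - alpha)


counts = [5, 3, 1, 1]  # hypothetical occurrence counts
h_s = shannon(counts)

print("Shannon (bits):          ", round(h_s, 4))
print("Renyi, alpha = 0.999:    ", round(renyi(counts, 0.999), 4))
print("Tsallis / ln 2, 0.999:   ", round(tsallis(counts, 0.999) / math.log(2), 4))
print("Renyi, alpha = 0.5 and 2:", round(renyi(counts, 0.5), 4), round(renyi(counts, 2), 4))
```

For these counts the values computed near α = 1 stay close to the Shannon entropy, and the Renyi values for α = 0.5 and α = 2 lie above and below it respectively, in line with Eq. 4.9 and Eq. 4.10. The Tsallis value is divided by ln 2 because of the factor in Eq. 4.14.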
Firstly, a comparison of Shannon, Renyi and Tsallis entropy of a binomial probability distribution was performed. Then, entropies calculated for a uniform distribution were compared to check how they depend on the number of equal probabilities and on α-values. Next, the impact of rare and frequent events on the entropy for different α-values was examined. Finally, we looked at exemplary network feature distributions of addresses and ports in order to summarize them with Renyi and Tsallis entropy.

4.3.1. Binomial distribution

Shannon, Renyi and Tsallis entropy for a binomial probability distribution, where the probability of success is p and the probability of failure is 1 − p, is depicted in Fig. 4.1, Fig. 4.2 and Fig. 4.3 respectively.

Figure 4.1: Shannon entropy – binomial distribution (H_S plotted against p)

Figure 4.2: Renyi entropy of several α-values – binomial distribution (H_Rα plotted against p for α = −2, −1, 0, 1, 2)

Figure 4.3: Tsallis entropy of several α-values – binomial distribution (H_Tα plotted against p for α = −0.5, −0.1, 0, 1, 2)

Figure 4.4: Shannon, Renyi and Tsallis entropy – uniform distribution (H(X) plotted against n; curves: Shannon = Renyi for α ∈ (−∞, ∞), Tsallis α = −0.1, Tsallis α = 2)

It is noticeable that the maximum Shannon entropy is obtained when p = 1 − p. Renyi and Tsallis converge to the Shannon entropy for α → 1. Note: according to Eq. 4.14, values of Tsallis entropy need to be multiplied by 1/log 2 to obtain a curve similar to the Shannon one for α → 1.
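The behavior of the curves for a two-outcome distribution (p, 1 − p) can be checked numerically with a short sketch. The grid of p values follows the ticks of the figure axes; the function names and the selection of α-values are assumptions made for illustration only.

```python
import math

# Values of p taken from the axis of the figures.
GRID = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]


def shannon(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


def renyi(probs, alpha):
    """Renyi entropy in bits (alpha != 1)."""
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def tsallis(probs, alpha):
    """Tsallis entropy (alpha != 1)."""
    return (sum(p ** alpha for p in probs if p > 0) - 1) / (1 - alpha)


def curve(entropy):
    """Evaluate an entropy function on the distribution (p, 1 - p) for each p."""
    return {p: entropy([p, 1 - p]) for p in GRID}


shannon_curve = curve(shannon)
renyi_pos = curve(lambda d: renyi(d, 2))
renyi_neg = curve(lambda d: renyi(d, -2))
tsallis_pos = curve(lambda d: tsallis(d, 2))
# Tsallis close to alpha = 1, rescaled by 1 / ln 2 as noted for Eq. 4.14.
tsallis_near_one = curve(lambda d: tsallis(d, 0.999) / math.log(2))

for p in GRID:
    print(p, round(shannon_curve[p], 3), round(renyi_pos[p], 3),
          round(renyi_neg[p], 3), round(tsallis_pos[p], 3),
          round(tsallis_near_one[p], 3))
```

On this grid the Shannon, Renyi (α = 2) and Tsallis (α = 2) curves peak at p = 0.5, the Renyi curve for α = −2 has its lowest value there, and the rescaled Tsallis curve for α = 0.999 stays close to the Shannon one.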
For α ≥ 1 Renyi and Tsallis entropies behave similarly to Shannon entropy, as both reach their maximum for p = 1 − p, although the Tsallis maximum changes with α, while the Renyi maximum is always equal to 1. For negative α-values the Tsallis and Renyi entropy curves are U-shaped (with a minimum rather than a maximum at p = 1 − p), as in this case low probabilities are exposed.

4.3.2. Uniform distribution

Shannon, Renyi and Tsallis entropy for a uniform probability distribution is depicted in Fig. 4.4. For this distribution the maximum entropy (the case when all probabilities are equal) is calculated for different n, where n is the number of equal probabilities. As can be seen, the entropy always grows with n. Renyi entropy coincides with Shannon entropy no matter which α-value is used. Tsallis entropy behaves differently, as it depends not only on n but also on α.

4.3.3. Impact of frequent and rare events

Example. Let us assume a discrete random variable X representing the addresses observed in the network within the last 1 min, X = {“10.1.0.1”, “10.1.0.2”, “10.1.0.3”, “10.1.0.4”, “10.1.0.5”}, and the following numbers of occurrences of the subsequent addresses: Freq = {96, 1, 1, 1, 1}. Based on these frequencies, let us estimate the probability distribution of X (see Table 4.1). Let us examine the impact of a frequent event, p(X = “10.1.0.1”) = 0.96, and of a rare event, p(X = “10.1.0.2”) = 0.01, on the Renyi and Tsallis entropy when α = −2 and α = 2 are used. To measure the impact of these events, we can check the value of the exponential expression p(x_i)^α, which appears in both the Renyi and Tsallis formulas [Eq. 4.8, Eq. 4.12]. The results are presented in Table 4.2.

Table 4.1: Probability distribution of X

| X | “10.1.0.1” | “10.1.0.2” | “10.1.0.3” | “10.1.0.4” | “10.1.0.5” |
|---|---|---|---|---|---|
| p(X = x) | 0.96 | 0.01 | 0.01 | 0.01 | 0.01 |

Table 4.2: Impact of frequent and rare events on the value of parameterized entropy

| p(x_i) | α = −2 | α = 2 |
|---|---|---|
| 0.96 | 1.08 | 0.92 |
| 0.01 | 10000 | 0.0001 |

As can be seen, the impact of frequent events (expressed by p(x_i) = 0.96) on the entropy is greater than the impact of rare events (expressed by p(x_i) = 0.01) when positive α-values are used. In contrast, the impact of rare events is greater than that of frequent events when negative α-values are used.

4.3.4. Entropy of exemplary distributions

In this section, the entropy values of sample distributions reflecting both legitimate and anomalous network traffic are analyzed. The aim of these experiments is to show how entropies highlight concentration (frequent events forming the main mass) and dispersion (rare events forming the tail) caused by typical network anomalies such as port and network scans. Anomalies of this type are characteristic of botnet-like malware. More details about scan anomalies can be found in [BI08] and [MFF14]. The experiments help to understand how parameterized entropies differ from Shannon entropy. Moreover, they show how Renyi, Tsallis and Shannon entropies differ in terms of sensitivity.

Before we start analyzing distributions characteristic of anomalies, let us begin with a very basic example with even, concentrated and dispersed distributions, as presented in Fig. 4.5. The Y axis shows the number of occurrences of certain instances, e.g. addresses or ports, which appear on the X axis. Now let us calculate Tsallis, Renyi and Shannon entropy for each distribution. The results of this calculation are presented in Table 4.3. The change of the entropy value with reference to the even distribution for the concentrated and dispersed distributions is presented in Table 4.4.
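The calculation described above can be sketched as follows. The occurrence counts are hypothetical and chosen only to mimic an even, a concentrated and a dispersed distribution; they are not the data behind Fig. 4.5, so the printed numbers will differ from Tables 4.3 and 4.4. The entropy formulas are the commonly used Renyi and Tsallis forms, assumed here to match Eq. 4.8 and Eq. 4.12.

```python
import math


def shannon(p):
    return -sum(q * math.log2(q) for q in p)


def renyi(p, alpha):
    # Assumed form: log2(sum p^alpha) / (1 - alpha), alpha != 1.
    return math.log2(sum(q ** alpha for q in p)) / (1 - alpha)


def tsallis(p, alpha):
    # Assumed form: (sum p^alpha - 1) / (1 - alpha), alpha != 1.
    return (sum(q ** alpha for q in p) - 1) / (1 - alpha)


def to_probabilities(counts):
    """Estimate probabilities from numbers of occurrences."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical occurrence counts per instance (address or port).
distributions = {
    "even": [2] * 10,
    "concentrated": [20] + [2] * 9,      # one dominant instance
    "dispersed": [2] * 10 + [1] * 10,    # additional tail of rare instances
}

measures = {
    "Shannon": shannon,
    "Renyi a=2": lambda p: renyi(p, 2),
    "Renyi a=-2": lambda p: renyi(p, -2),
    "Tsallis a=2": lambda p: tsallis(p, 2),
    "Tsallis a=-2": lambda p: tsallis(p, -2),
}


def entropy_table(dists):
    """Entropy of every distribution under every measure."""
    return {name: {m: f(to_probabilities(counts)) for m, f in measures.items()}
            for name, counts in dists.items()}


def relative_change(table, baseline="even"):
    """Percentage change of each entropy with reference to the baseline."""
    base = table[baseline]
    return {name: {m: 100 * (value - base[m]) / base[m]
                   for m, value in row.items()}
            for name, row in table.items() if name != baseline}


table = entropy_table(distributions)
changes = relative_change(table)

if __name__ == "__main__":
    for name, row in changes.items():
        print(name, {m: f"{v:+.0f}%" for m, v in row.items()})
```

The two helper functions correspond to the two steps in the text: `entropy_table` plays the role of Table 4.3 and `relative_change` the role of Table 4.4.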
Figure 4.5: Even, concentrated and dispersed distribution (three panels: Even, Concentrated, Dispersed; x axis: instances; y axis: occurrences)

Table 4.3: Entropy values for even, concentrated and dispersed distributions

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| even | 3.78 | 3.64 | 4.07 | 0.92 | 1581 |
| concentrated | 2.79 | 1.82 | 4.67 | 0.72 | 5508 |
| dispersed | 4.82 | 4.7 | 5 | 0.96 | 10968 |

Table 4.4: Entropy value change in reference to even distribution

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| concentrated | −26% | −50% | +14% | −22% | +248% |
| dispersed | +27.5% | +29% | +22% | +4% | +594% |

As can be seen, Shannon and parameterized entropies behave similarly when positive α-values are used for the parameterized entropies. Higher concentration is reflected by a decrease in the entropy, while higher dispersion is reflected by an increase in the entropy value. In this case Renyi entropy seems to be the most sensitive. For negative α-values the situation is slightly different. Parameterized entropies differ from Shannon entropy because the value of entropy increases for both concentration and dispersion. The higher entropy value for the more concentrated distribution is due to the fact that in this case the probabilities in the tail (estimated from the number of occurrences) are lower and therefore more exposed by a negative α-value. In general, for negative α-values Tsallis entropy is far more sensitive than Renyi and Shannon entropy.

Now, suppose we have the following distributions of source and destination addresses as well as destination ports for 1 minute of legitimate network traffic (Fig. 4.6). Again, the Y axis shows the number of occurrences of particular addresses or ports, which appear on the X axis. As we can see, all distributions are quite even. Let us summarize these distributions by calculating Tsallis, Renyi and Shannon entropy – Table 4.5.
Figure 4.6: Addresses and ports distributions – legitimate traffic (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: instances; y axis: occurrences)

Table 4.5: Entropy value for addresses and ports distributions – legitimate traffic

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses | 4.79 | 4.57 | 5.44 | 0.96 | 27437 |
| dst IP addresses | 4.65 | 4.18 | 5.48 | 0.94 | 29925 |
| dst ports | 3.75 | 2.85 | 5.42 | 0.86 | 26164 |

Now let us simulate two different types of anomalies in this traffic. In order to do so, we have to inject some characteristic concentration or dispersion into particular distributions.

Port scan

Typically, during a port scan, concentration in addresses and dispersion in ports is observable. Let us modify our distributions to simulate this type of anomaly. Suppose that a single host was scanned and the number of scanned ports was equal to 50. The modified distributions are depicted in Fig. 4.7. Now let us recalculate the entropy (Table 4.6) and compare the new results with those for the legitimate traffic (Table 4.7). As can be seen, each entropy properly reported (as a value change) the concentration in source and destination addresses and the dispersion in destination ports, although the sensitivity of each entropy was different. For the concentration, the most significant change was obtained for Renyi with a positive α-value (about 50% decrease) and Tsallis with a negative α-value (more than 200% increase). Dispersion in destination ports was most distinctly exposed by Tsallis entropy with a negative α-value (more than 100% increase).

Figure 4.7: Addresses and ports distributions – port scan (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: instances; y axis: occurrences)

Table 4.6: Entropy value for addresses and ports distributions – port scan

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | 4.79 | 4.57 | 5.44 | 0.96 | 27437 |
| dst IP addresses (conc.) | 4.65 | 4.18 | 5.48 | 0.94 | 29925 |
| dst ports (disp.) | 3.75 | 2.85 | 5.42 | 0.86 | 26164 |

Table 4.7: Entropy value change in reference to legitimate traffic distributions – port scan

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | −23% | −50% | +10% | −17% | +217% |
| dst IP addresses (conc.) | −22% | −46% | +10% | −16% | +217% |
| dst ports (disp.) | +48% | +54% | +28% | +10% | +104% |

Network scan

Typically, during a network scan, concentration in source addresses and destination ports as well as dispersion in destination addresses is observable. Let us modify our distributions to simulate this type of anomaly. Suppose that a single host scanned 100 hosts to check whether a particular service is running on these hosts. The modified distributions are depicted in Fig. 4.8. Now let us recalculate the entropy (Table 4.8) and compare the new results with those for the legitimate traffic (Table 4.9). As can be seen, each entropy properly reported (as a value change) the concentration in source addresses and destination ports as well as the dispersion in destination addresses although, as in the previous example, the sensitivity of each entropy was different. For both concentration and dispersion the most significant change was obtained for Tsallis with a negative α-value (more than 500% increase and more than 3500% increase respectively).

Figure 4.8: Addresses and ports distributions – network scan (three panels: Source IP addresses, Destination IP addresses, Destination ports; x axis: instances; y axis: occurrences)

Table 4.8: Entropy value for addresses and ports distributions – network scan

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | 2.83 | 1.4 | 6.35 | 0.62 | 180164 |
| dst IP addresses (disp.) | 6.83 | 6.37 | 7.21 | 0.99 | 1093036 |
| dst ports (conc.) | 2.43 | 1.35 | 6.33 | 0.61 | 171809 |

Table 4.9: Entropy value change in reference to legitimate traffic distributions – network scan

| Distribution | Shannon | Renyi α = 2 | Renyi α = −2 | Tsallis α = 2 | Tsallis α = −2 |
|---|---|---|---|---|---|
| src IP addresses (conc.) | −41% | −69% | +16% | −34% | +556% |
| dst IP addresses (disp.) | +47% | +52% | +31% | +5% | +3555% |
| dst ports (conc.) | −35% | −52% | +16% | −29% | +556% |

Different network anomalies cause concentration or dispersion in different network feature distributions. Not only the aforementioned addresses and ports can be used. They should be augmented by others, e.g. flow duration, flow size or host degree (number of in/out connections) distributions. The whole set of network feature distributions employed in Anode is presented in Chapter 6.

5. Network flows

In this chapter we describe the network flow export technique, as it was chosen as the data source for the proposed method. The chapter starts with a general description and comparison of the two main network traffic capture and analysis techniques, namely flow export and packet inspection based on packet capture. Then, some details related to network flows as well as some known problems and difficulties are presented. Finally, the flow export setup prepared to interact with Anode is described.

5.1. Flows vs. packets

There are two popular methods of network traffic capture and analysis, namely packet inspection, which is based on packet capture [KS12], [BDKC10], [Roe99], [Pax99], [Par13], and flow export [PSS+09], [SSSP12], [NfS], [Nto]. A detailed comparison of these methods can be found in [HCT+14], [SSS+10], [GHK14], [SF02].
The packet-based approach refers to the process of capturing individual packets and analyzing their headers and payloads, while the flow-based approach relies on the ability of network devices to aggregate packets into flows.

Packet inspection is found in widely used network intrusion detection systems, e.g. Snort [Roe99] and Bro [Pax99]. In an attempt to find known attacks or unusual behavior, such systems inspect the contents (header and payload) of every packet. This may consume a lot of resources when the network speed is high. In addition, the spread of encrypted protocols poses a new challenge to this approach.

In flow export, aggregated information is captured and analyzed in order to find communication patterns within the network. Statistics on flows provide information about who communicates with whom, when, for how long, how often, using what protocol and service, and also how much data was transferred. Exporting of network flows was originally intended for accounting and network profiling, but recently it has become a popular source of data for network anomaly detection. One of the reasons for its popularity is its scalability with respect to network speed. Moreover, flow export provides several other advantages compared to packet inspection. Firstly, the flow export mechanism is widely deployed in popular network devices. Secondly, significant data reduction can be achieved, in the order of 1/2000 of the original volume, as was shown by Hofstede et al. in [HCT+14]. Lastly, flow export is usually less privacy-sensitive than packet capture, since typically only packet header data is considered.

Some researchers, e.g. Schaffrath [SS08], are more skeptical and claim that flow-based anomaly detection is still immature and can be used only as a complement to packet inspection, because many attacks or their symptoms can be hidden in the content of packets.
Nowadays, this claim is not entirely true because, with the modern, flexible approach to network flows proposed in the IPFIX standard [SBCQ09], any data from packets may also be included in flows. So, in this context the gap between those two techniques is much smaller than before [SSHB14].

5.2. Flow export

The concept of network flows was introduced by Cisco as the NetFlow [Cla04] technology. The first open version of NetFlow was version 9, which was then standardized by the Internet Engineering Task Force (IETF) under the name IP Flow Information Export (IPFIX) [SBCQ09]. Although several definitions of an IP flow exist, we follow the one proposed by the IETF:

“A flow is defined as a set of IP packets passing an observation point in the network during a certain time interval. All packets belonging to a particular flow have a set of common properties.”

In the simplest form, these properties are source and destination addresses and ports.

5.2.1. Operating principle

Flow export is quite a complex process. It includes real-time aggregation of packets into flows and periodic export of reports to collectors. Some details are presented in Fig. 5.1.

Figure 5.1: Flow export architecture

Flows are created and exported to collectors by an accounting module placed in network routers or dedicated probes. This module is responsible for the metering process, i.e. creating flow records from the observed traffic. It extracts the header from each packet seen on the monitored interface. Then each header is marked with a timestamp and triggers an update of the corresponding entry in the flow cache. If there is no flow matching the packet header, a new entry is created. A flow is considered ready to be exported to collectors when:

– the flow was idle (no packets have been detected in the flow) for a time longer than a given threshold (known as the inactive timeout), which is 15 s by default;
– the flow reaches the maximum allowed lifetime (by default 30 min); when this happens, the flow record is exported to the collector and a new one is created;
– the FIN or RST flags have been seen in a TCP connection;
– the flow cache gets full; in this case, certain flow records are marked as expired and exported to the collector.

It is important to emphasize that there is a difference between flows and TCP connections. A flow can also be defined for the connectionless UDP or ICMP protocols when a set of packets has been sent between two communicating sides. Moreover, a flow does not have size restrictions, i.e. each communication between source and destination hosts will generate a flow, even if only a single packet has been exchanged.

A flow export protocol defines how flow records are exported to a collector. Typically such a record contains source and destination addresses and ports, start and end timestamps, type of service, protocol, flags, next hop router address, input and output SNMP interfaces, source and destination autonomous system numbers and network masks. Additionally, each flow carries aggregated information about the number of packets and bytes exchanged. Details regarding the NetFlow protocol are presented in Fig. 5.2.

IPFIX proposes a flexible protocol in which flow record formats can be defined by using templates. It allows a larger set of parameters to be used. An IPFIX message is logically divided into sections known as sets. A message can normally consist of three kinds of sets, namely template (format description), data and options. For more information regarding the IPFIX message format, see [SBCQ09].
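The metering process and the four expiry conditions listed earlier in this section can be illustrated with a small flow cache. This is a simplified sketch, not the implementation used by any particular exporter: the class and field names are hypothetical, the timeouts are the defaults quoted in the text, and the eviction policy for a full cache (export the least recently updated flow) is an assumption.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT = 15.0       # seconds, default quoted in the text
ACTIVE_TIMEOUT = 30 * 60.0    # seconds, default quoted in the text
TCP_FIN, TCP_RST = 0x01, 0x04


@dataclass
class FlowRecord:
    key: tuple                # (src ip, dst ip, src port, dst port, protocol)
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0


class FlowCache:
    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self.flows = {}       # key -> FlowRecord
        self.exported = []    # stands in for the export to a collector

    def _export(self, key):
        self.exported.append(self.flows.pop(key))

    def expire(self, now):
        """Export flows that exceeded the inactive or the active timeout."""
        for key, rec in list(self.flows.items()):
            idle = now - rec.last_seen > INACTIVE_TIMEOUT
            too_old = now - rec.first_seen > ACTIVE_TIMEOUT
            if idle or too_old:
                self._export(key)

    def update(self, key, timestamp, size, tcp_flags=0):
        """Account one packet header to its flow, creating the flow if needed."""
        self.expire(timestamp)
        rec = self.flows.get(key)
        if rec is None:
            if len(self.flows) >= self.max_entries:
                # Cache full: evict the least recently updated flow (assumed policy).
                oldest = min(self.flows, key=lambda k: self.flows[k].last_seen)
                self._export(oldest)
            rec = self.flows[key] = FlowRecord(key, timestamp, timestamp)
        rec.last_seen = timestamp
        rec.packets += 1
        rec.octets += size
        rec.tcp_flags |= tcp_flags
        if tcp_flags & (TCP_FIN | TCP_RST):
            self._export(key)  # end of a TCP connection has been seen
```

Calling `update` once per observed packet header and reading `exported` afterwards mimics, in a very reduced form, the exporter side of Fig. 5.1.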
Figure 5.2: Flow record in NetFlow protocol (based on [Plo00])

The aim of the collector is to retrieve the flows created by the exporter and to store them in a form suitable for further monitoring or analysis. Typically, collectors store this data in relational databases [Nto], stream databases [CJSS03], column-oriented databases [GM10] or binary files [NfS].

A flow is typically defined as a unidirectional sequence of packets, which means that there are two entries for each connection between two endpoints: one from the server to the client and one from the client to the server. Recently, bidirectional flows [TB08], which define one record for each session between two endpoints, have also been supported by vendors. The differences between unidirectional and bidirectional flows are presented in Fig. 5.3.

Figure 5.3: Unidirectional vs bidirectional flows

5.2.2. Problems and difficulties

The modern approach assumes the use of dedicated probes transparently connected as passive appliances via Switch Port Analyzer (SPAN) ports [Tap12] or network Test Access Points (TAP) [Tap12], rather than the use of routers to export flows. Routers often have limited resources and can be overloaded when network traffic is high, so extra processing connected with NetFlow is undesirable. By applying dedicated probes (as presented in Fig. 5.4), the problem of the performance limitations of routers can be easily overcome.

There are some cases where the use of flows can be problematic, for example when a flow is created for each packet passing through the monitoring device as a consequence of a DDoS attack [SSP12]. In such a case, the number of flows increases dramatically and extra load is put on the monitoring and analysis system. To mitigate this problem or, in general, to improve the performance of routers, dedicated probes and sampling or aggregation techniques may be used. With sampling, every n-th
packet is inspected instead of all packets [BTW+06]. With the aggregation technique, flows with similar characteristics are merged [TWC13].

Figure 5.4: Modern approach to flow exporting

Another shortcoming of the flow-based approach is the inaccuracy of the metering modules used in popular network routers or dedicated probes. It is known that flow export may introduce artifacts in the exported data, such as imprecise flow timestamps, missing flags in TCP flows or invalid byte counters [CSO+09], [Kog11], [TTSB11]. Moreover, these artifacts are widespread among different devices from various vendors, as was reported by Hofstede et al. [HDS+13].

As can be seen, there are some problems and difficulties with flow export that may interfere with the anomaly detection process. The aforementioned shortcomings should be taken into consideration and addressed while planning a flow export setup for network anomaly detection systems.

5.3. NetFlow export setup

In order to employ NetFlow with Anode, an appropriate flow export setup had to be prepared and launched first. In our research we decided to rely on open-source software, although we believe that commercial solutions which consist of dedicated hardware, e.g. those proposed by Invea-Tech [Floc], are much more efficient, more reliable and less error-prone. The open-source community provides a wide range of software to generate and collect NetFlow data. The most popular are NfDump [NfS], Flow-tools [Flob], YAF [IT10], Argus [Arg], Softflowd [Sof], Fprobe [Fpr] and NtopNg [Nto]. Based on such criteria as popularity, maturity, simplicity and support, we decided to choose Softflowd and NfDump. Softflowd is a software probe capable of generating and exporting NetFlow records. NfDump is a NetFlow collector able to collect, filter and dump the collected data. The flow export setup prepared for Anode is presented in Fig. 5.5.
The Softflowd probe runs in promiscuous mode (all traffic, not only the traffic addressed to the host, is captured) on a host connected directly to the SPAN port of the central network switch. In this way the whole traffic present in the network is mirrored to the probe. Flows generated by Softflowd are periodically (every 5 min by default) exported to the NfDump collector via the NetFlow 9 protocol. Statistics from Softflowd are passed to NfDump via the NfCapd daemon, which is part of the NfDump package. NfCapd reads the data sent by the probe and stores it in binary files. NfDump reads the NetFlow data from the files and displays it in a chosen format (bidirectional flows in our case). Results from NfDump are dumped to a text file and converted to SQL queries in order to feed the relational database (MySQL).

Figure 5.5: NetFlow capture setup for Anode

6. Entropy-based network anomaly detector

This chapter focuses on a detailed specification of the proposed network anomaly detector named Anode. Firstly, a detailed architecture is given. Then, the results of the implementation are presented.

6.1. Architecture

The architecture of Anode is presented in Fig. 6.1.

Figure 6.1: Anode – the architecture

Anode analyzes network data captured by NetFlow probes. Typical probes such as routers can be utilized, but for this research dedicated probes connected to the SPAN ports (as presented in Chapter 5) have been used. Flows are analyzed within fixed time intervals (every 5 min by default). Bidirectional flows [TWC13] are chosen since, according to some works, e.g. [NSA+08], unidirectional flows may entail biased results. Collected data is stored in the relational database and then analyzed. In order to limit the area of search for anomalies, filters per direction, protocol and subnet are provided.
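How such filters could narrow down the set of analyzed flows is sketched below. This is an illustration only: the flow representation (dictionaries with `src_ip`, `dst_ip` and `proto` keys), the function name and the meaning of direction (relative to the selected subnet) are assumptions, not details taken from Anode.

```python
import ipaddress


def select_flows(flows, protocol=None, subnet=None, direction=None):
    """Return the flows matching the given protocol, subnet and direction.

    direction is interpreted relative to `subnet` (an assumption):
    "out" keeps flows leaving the subnet, "in" keeps flows entering it,
    None keeps every flow that touches the subnet.
    """
    if direction is not None and subnet is None:
        raise ValueError("direction filtering requires a subnet")
    net = ipaddress.ip_network(subnet) if subnet else None
    selected = []
    for flow in flows:
        if protocol is not None and flow["proto"] != protocol:
            continue
        if net is not None:
            src_in = ipaddress.ip_address(flow["src_ip"]) in net
            dst_in = ipaddress.ip_address(flow["dst_ip"]) in net
            if direction == "out":
                keep = src_in and not dst_in
            elif direction == "in":
                keep = dst_in and not src_in
            else:
                keep = src_in or dst_in
            if not keep:
                continue
        selected.append(flow)
    return selected


# Hypothetical flow records used only to exercise the filter.
example_flows = [
    {"src_ip": "10.1.0.5", "dst_ip": "8.8.8.8", "proto": "UDP"},
    {"src_ip": "8.8.8.8", "dst_ip": "10.1.0.5", "proto": "UDP"},
    {"src_ip": "10.1.0.5", "dst_ip": "10.1.0.9", "proto": "TCP"},
    {"src_ip": "192.168.1.1", "dst_ip": "8.8.4.4", "proto": "TCP"},
]
```

For example, `select_flows(example_flows, protocol="TCP", subnet="10.1.0.0/24")` keeps only the TCP flow exchanged inside the 10.1.0.0/24 subnet.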
Next, depending on the mode, Tsallis or Renyi entropy of positive and negative α-values is calculated for the set of network feature distributions presented in Table 6.1. Note: the Shannon version of the method uses Renyi entropy with α set to 1.

Table 6.1: Selected network feature distributions

| Feature | Probability mass function |
|---|---|
| src (dst) ip (port) | (number of x_i as src (dst) address (port)) / (total number of src (dst) addresses (ports)) |
| flows duration | (number of flows with x_i as duration) / (total number of flows) |
| packets, bytes | (number of pkts (bytes) with x_i as src (dst) addr (port)) / (total number of pkts (bytes)) |
| in (out)-degree | (number of hosts with x_i as in (out)-degree) / (total number of hosts) |

Initially, during the training phase, a dynamic profile is built using the min and max entropy values within a sliding time window for every ⟨feature, α⟩ pair, i.e. ⟨src ip, α = −2⟩, ⟨src ip, α = −1⟩, …, ⟨dst ip, α = −2⟩, …, ⟨src port, α = −2⟩, … and so on. The way of building a profile is presented in Fig. 6.2. By using a sliding time window, traffic changes during the day can be reflected while at the same time a margin is provided for some minor differences, e.g. small delays between the profile and the current traffic.

Figure 6.2: A way of building a profile

In the detection phase, the observed entropy is compared with the min and max values stored in the profile according to the following rule:

$$ r_\alpha(x_i) = \frac{H_\alpha(x_i) - k \cdot \min_\alpha}{k \cdot (\max_\alpha - \min_\alpha)}, \qquad k \in \langle 1..2 \rangle \qquad (6.1) $$

This rule defines the anomaly threshold. Detection is based on the relative value of entropy with respect to the distance between min and max. Values r_α(x_i) < 0 or r_α(x_i) > 1 indicate abnormal concentration or dispersion. Such abnormal dispersion or concentration in different feature distributions is characteristic of anomalies. For example, during a port scan, a high dispersion in port numbers and a high concentration in addresses are observed.
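The training and detection steps described above can be sketched for a single ⟨feature, α⟩ pair as follows. This is a simplified illustration rather than the Anode implementation: the profile is reduced to one global min and max over all training windows (no sliding time window), the Renyi formula is the commonly used form, the port lists are hypothetical, and all function names are assumptions. Only k = 1 is exercised.

```python
import math
from collections import Counter


def pmf(values):
    """Probability mass function estimated from observed feature values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def renyi(p, alpha):
    # Assumed form: log2(sum p^alpha) / (1 - alpha), alpha != 1.
    return math.log2(sum(q ** alpha for q in p)) / (1 - alpha)


def build_profile(training_windows, alpha):
    """Min and max entropy observed in legitimate training windows."""
    entropies = [renyi(pmf(w), alpha) for w in training_windows]
    return min(entropies), max(entropies)


def relative_entropy(h, h_min, h_max, k=1.0):
    """Eq. 6.1: r = (H - k * min) / (k * (max - min))."""
    return (h - k * h_min) / (k * (h_max - h_min))


def is_anomalous(r):
    return r < 0 or r > 1


# Hypothetical destination ports seen in three legitimate 5-minute windows.
training = [
    [80] * 50 + [443] * 30 + [53] * 20,
    [80] * 40 + [443] * 40 + [53] * 20,
    [80] * 60 + [443] * 25 + [53] * 15,
]
h_min, h_max = build_profile(training, alpha=2)

# Detection windows: ordinary traffic, a port scan (dispersion in ports)
# and traffic concentrated on a single port.
windows = {
    "normal": [80] * 45 + [443] * 35 + [53] * 20,
    "port scan": training[0] + list(range(1000, 1050)),
    "concentrated": [80] * 95 + [443] * 3 + [53] * 2,
}
results = {name: relative_entropy(renyi(pmf(w), 2), h_min, h_max)
           for name, w in windows.items()}

if __name__ == "__main__":
    for name, r in results.items():
        print(name, round(r, 2), "anomalous" if is_anomalous(r) else "ok")
```

In this toy setting the port scan window yields r above 1 (dispersion) and the concentrated window yields r below 0, while the ordinary window stays between 0 and 1.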
The coefficient k in the formula determines a margin for the min and max boundaries and may be used for tuning purposes. A high value of k, e.g. k = 2, limits the number of false alarms (alarms where no anomaly has taken place), while a low value (k = 1) increases the detection rate (the percentage of anomalies correctly detected). Some other approaches to thresholding, based on standard deviation (mean ± 2sdev) and median absolute deviation (median ± 2mad) [RFG05], have also been taken into consideration, but empirical results showed that the proposed rule performed best.

Classification is based on popular methods (decision trees, Bayes nets, rules and functions) available in Weka [HFH+09]. Extraction of anomaly details is also provided: the related ports and addresses of attackers and victims are obtained by looking into the top contributors to the entropy value.

6.2. Implementation

A proof of concept implementation of Anode has been developed in the Microsoft .NET environment in the C# language. All experiments presented in this Thesis have been conducted with it. This implementation detects anomalies only in an off-line mode. A large number of options for testing purposes is provided. The software produces Weka arff files based on entropy calculations for each network feature distribution. Recorded NetFlow data (e.g. whole-day traffic) has to be captured and labeled in advance. Classification performance is evaluated with Weka (ten-fold cross-validation mode) based on the provided arff files. The user interface of the proof of concept implementation is presented in Fig. 6.3.

Currently Anode is also a module of the anomaly detection and security event data correlation system developed in the SECOR project [CKP+11]. The implementation in SECOR has been developed in the JAVA WSO2 [WSO] environment.
WSO2 is a Service Oriented Architecture (SOA) middleware platform built on the Open Service Gateway initiative (OSGi) [OSG]. The WSO2 environment contains, among other elements, an Application Server, an Enterprise Service Bus, a Complex Event Processing (CEP) engine and a Web Service framework. The SECOR implementation of Anode allows on-line detection and classification of anomalies based on NetFlow reports coming in real time from probes deployed in the network. Each time an anomaly is detected, Anode sends an alert to the CEP engine. These alerts conform to the STIX/CybOX format [Bar13], a new and promising standard proposed by MITRE [Cor15]. The implementation of Anode in SECOR as a WSO2 feature is depicted in Fig. 6.4. An exemplary STIX/CybOX alert indicating a port scan anomaly detected by Anode is presented in Listing 6.1. It is noticeable that, besides information about the timestamp and anomaly type, additional data regarding suspected attackers and victims as well as the level of confidence are provided.

One of the features of Anode is the visualization of timeseries of volume-based counters and of the entropy of network features. Such a visualization is presented in Fig. 6.5. The border of the legitimate area is marked with red and blue lines. Everything below or above it can be treated as anomalous. Anomalous areas in the presented timeseries are marked with a red oval. It can be seen that there are anomalies which are invisible in the traffic volume expressed by flows, packets and octets, but visible via the entropy of network features such as source and destination addresses and destination port.

Figure 6.3: Anode – proof of concept implementation

Figure 6.4: Anode – the implementation in SECOR

Listing 6.1: Exemplary STIX/CybOX alert generated by Anode

<stix:STIX_Header>
  <stix:Title>Port Scan</stix:Title>
  <stix:Information_Source>
    <stixCommon:Identity>
      <stixCommon:Name>Anode</stixCommon:Name>
    </stixCommon:Identity>
    <stixCommon:Time>
      <cyboxCommon:Produced_Time>2015-02-0T15:42:24Z</cyboxCommon:Produced_Time>
    </stixCommon:Time>
  </stix:Information_Source>
</stix:STIX_Header>
<stix:Indicators>
  <stix:Indicator timestamp="2015-02-0T15:39:24Z" id="7c3885fe" xsi:type="IndicatorType">
    <indicator:Title>Attackers</indicator:Title>
    <indicator:Type xsi:type="IndicatorTypeVocab">IP Watchlist</indicator:Type>
    <indicator:Description>Potential attackers</indicator:Description>
    <indicator:Observable id="1c798262">
      <cybox:Object id="1980ce43">
        <cybox:Properties xsi:type="AddressObject:AddressObjectType">
          <Address_Value condition="Equals">10.10.0.155</Address_Value>
        </cybox:Properties>
      </cybox:Object>
    </indicator:Observable>
    <indicator:Confidence>
      <stixCommon:Value xsi:type="HighMediumLowVocab">Medium</stixCommon:Value>
    </indicator:Confidence>
  </stix:Indicator>
  <stix:Indicator timestamp="2015-02-0T15:39:24Z" id="404d122c" xsi:type="IndicatorType">
    <indicator:Title>Victims</indicator:Title>
    <indicator:Type xsi:type="IndicatorTypeVocab">IP Watchlist</indicator:Type>
    <indicator:Description>Potential victims</indicator:Description>
    <indicator:Observable id="1c798262">
      <cybox:Object id="1980ce43">
        <cybox:Properties xsi:type="AddressObjectType">
          <Address_Value condition="Equals" apply_condition="Any">1.1.0.7,1.1.0.9</Address_Value>
        </cybox:Properties>
      </cybox:Object>
      <cybox:Object id="Port numbers:obj-1980ce43-8e03">
        <cybox:Properties xsi:type="PortObjectType">
          <Port_Value condition="Equals" apply_condition="Any">21,22,80,443</Port_Value>
        </cybox:Properties>
      </cybox:Object>
    </indicator:Observable>
    <indicator:Confidence>
      <stixCommon:Value xsi:type="HighMediumLowVocab">High</stixCommon:Value>
    </indicator:Confidence>
  </stix:Indicator>
</stix:Indicators>

Figure 6.5: Timeseries visualisation in Anode

7. Dataset

This chapter presents the dataset developed to evaluate the proposed method. The dataset is based on real legitimate traffic and synthetic anomalies. The chapter starts with the origin of the idea. Next, some details concerning the legitimate and anomalous traffic are presented. Finally, the anomaly generation process is explained.

7.1. Origin of the idea

The effort to build our own dataset was undertaken due to:

– the limited availability of datasets for network anomaly detection;
– the lack of proper labeling in shared datasets;
– the fact that most of the available datasets are obsolete in terms of legitimate traffic and anomalies;
– the absence of realistic data in datasets;
– the small number of datasets with flows (conversion from packets is necessary, which results in the loss of labels);
– the incompleteness of data (narrow range of anomalies, lack of anomalies related to botnet-like malware).

More details about the existing datasets are presented in Chapter 2 and in our paper [MBM15]. For the aforementioned reasons, labeled traffic traces based on real legitimate traffic and synthetic anomalies were developed.

7.2. Legitimate traffic

Firstly, one week of legitimate traffic from a medium-size local network connected to the Internet was captured. This was accomplished using open-source software, Softflowd [Sof] and NfDump [NfS], as described in Chapter 5. The captured data, in the form of labeled bidirectional flows, was exported to the relational database. Because the daily profile of each working day in the captured traffic was similar (except for some minor differences on Monday morning and Friday afternoon), a one-day profiling approach was chosen.
So, from the whole data it was enough to extract two days (Tuesday, Wednesday) in order to build the dataset. The first day is designated for training (only legitimate traffic) and the second day for detection (legitimate traffic + injected anomalies). The profile of this 2-day traffic (before any injection of anomalies), expressed by the number of flows, is depicted in Fig. 7.1.

Figure 7.1: Legitimate traffic profile by number of flows

Figure 7.2: Legitimate traffic profile by number of packets

The x axis shows time t (5-minute fixed time window) and the y axis (log scale) shows the number of flows. The working day starts around 7 a.m. and finishes around 4 p.m. The volume of the traffic expressed by the number of flows is similar for both days, but when looking at the volume expressed by the number of packets (Fig. 7.2), this similarity is somewhat lower. Some global characteristics of the traffic are presented in Table 7.1.

Table 7.1: Global characteristics of legitimate traffic

| Feature | Total count |
|---|---|
| Flows | 767 498 |
| Distinct src ip | 733 |
| Distinct dst ip | 12 977 |
| Distinct dst port | 23 140 |
| Packets | 57 239 939 |
| Bytes | 46 216 539 894 |

The flow breakdown according to the transport protocol is depicted in Fig. 7.3. As can be seen, the utilization of the ICMP protocol is negligible.

Figure 7.3: Flow breakdown according to the transport protocol

The flow breakdown according to the services is depicted in Fig. 7.4. It is noticeable that most of the traffic is connected with web browsing (HTTP, HTTPS and a significant part of DNS). The flow breakdown according to the activity of hosts is depicted in Fig. 7.5. One can see that this distribution is rather even and there are no distinctly active hosts. The flow breakdown according to the activity of servers is depicted in Fig. 7.6. It is observable that the most utilized servers are the primary and secondary DNS servers.
In the next step, different scenarios of malicious network activity were implemented. Synthetic anomalies typical for botnet-like network behavior were generated and then injected into the legitimate traffic. More details concerning the anomaly generation process are presented in Section 7.6.

Figure 7.4: Flow breakdown according to the services

Figure 7.5: Flow breakdown according to the hosts

Figure 7.6: Flow breakdown according to the servers

7.3. Scenario 1

For this scenario, small and low-rate SSH brute force, port scan, SSH network scan and TCP SYN flood DDoS anomalies were generated in different variants. These anomalies do not form any realistic traces of malware, but detection and proper classification of such a set of anomalies is crucial because they are typical for the behavior of botnet-like malware. The main characteristics of the generated anomalies are presented in Table 7.2.

Table 7.2: Characteristics of anomalies in Scenario 1
| Type | Kind | No. of flows | Duration [sec] | No. of victims | No. of attackers |
| SSH brute force (bf) | 1 | 1K | 300 | 1 | 1 |
| SSH brute force (bf) | 2 | 1K | 100 | 1 | 1 |
| SSH brute force (bf) | 3 | 2K | 300 | 1 | 1 |
| TCP SYN flood DDoS (dd) | 1 | 2K | 200 | 1 | 50 |
| TCP SYN flood DDoS (dd) | 2 | 2K | 200 | 1 | 250 |
| TCP SYN flood DDoS (dd) | 3 | 3K | 300 | 1 | 50 |
| TCP SYN flood DDoS (dd) | 4 | 3K | 300 | 1 | 250 |
| TCP SYN flood DDoS (dd) | 5 | 4K | 400 | 1 | 50 |
| TCP SYN flood DDoS (dd) | 6 | 4K | 400 | 1 | 250 |
| SSH network scan (ns) | 1 | 6K | 60 | 6K | 1 |
| SSH network scan (ns) | 2 | 6K | 300 | 6K | 1 |
| SSH network scan (ns) | 3 | 8K | 80 | 8K | 1 |
| SSH network scan (ns) | 4 | 8K | 400 | 8K | 1 |
| Port scan (ps) | 1 | 1K | 50 | 1 | 1 |
| Port scan (ps) | 2 | 1K | 100 | 1 | 1 |
| Port scan (ps) | 3 | 2K | 100 | 1 | 1 |
| Port scan (ps) | 4 | 2K | 200 | 1 | 1 |

The generated anomalies were mixed with the legitimate traffic from Day2 (Wednesday) in the way presented in Fig. 7.7. Anomalies are not injected into Day1 (Tuesday) as it is intended for the profiling of legitimate traffic. As can be seen, an anomaly is injected every 15 minutes, mainly during working time. After injection only a few anomalies are visible in the volume expressed by the number of flows or the number of packets, as depicted in Fig. 7.8 and Fig. 7.9, respectively.

Figure 7.7: Distribution of anomalies in time in Scenario 1

Figure 7.8: Legitimate and anomalous traffic by number of flows in Scenario 1

Figure 7.9: Legitimate and anomalous traffic by number of packets in Scenario 1

7.4. Scenario 2

For this scenario, a much more realistic sequence of modern botnet-like malware behavior was generated. The subsequent stages are as follows:
1. One of the hosts in the local network gets infected with botnet-like malware. In order to propagate via the network, it starts scanning its neighbors. The malware is looking for hosts running Remote Desktop Protocol (RDP) services. RDP is a proprietary protocol developed by Microsoft, which provides a user with a graphical interface to connect to another computer over a network. RDP servers are built into Windows operating systems. By default, the servers listen on TCP/UDP port 3389.
2. Hosts serving Remote Desktop services are attacked with a dictionary attack (similarly to the technique found in the MORTO worm [Bit11]).
3. After a successful dictionary attack, vulnerable machines get infected and become members of the botnet.
4. Peer-to-peer communication based on the UDP transport protocol is established among the infected hosts.
5. On a command from the C&C server, botnet members start a low-rate DDoS attack called Slowloris [DDL+12] on an external HTTP server. After a few minutes the server is blocked.

The whole scenario is presented in Fig. 7.10. The main characteristics of the generated anomalies are presented in Table 7.3.

Figure 7.10: Scenario 2

Table 7.3: Characteristics of anomalies in Scenario 2
| Type | No. of flows | Duration [sec] | No. of victims | No. of attackers |
| Network scan (ns) | 252 | 200 | 252 | 1 |
| RDP brute force (bf) | 720 | 550 | 53 | 1 |
| Botnet p2p (p2p) | 150 | 185 | 15 | 15 |
| Slowloris DDoS (dd) | 1124 | 117 | 15 | 1 |

Anomalies generated for the scenario were mixed with the legitimate traffic from Day2 (Wednesday) in the way presented in Fig. 7.11.

Figure 7.11: Distribution of anomalies in time in Scenario 2

The whole scenario, which consists of four anomalies, is injected every hour during working time. Anomalies in this scenario are small and slow. They represent only a small fraction of the total traffic, so after injection none of them is visible in the volume expressed by the number of flows or the number of packets, as depicted in Fig. 7.12 and Fig. 7.13, respectively.

Figure 7.12: Legitimate and anomalous traffic by number of flows in Scenario 2

Figure 7.13: Legitimate and anomalous traffic by number of packets in Scenario 2

7.5. Scenario 3

For this scenario, another realistic sequence of modern botnet-like malware behavior was generated. The subsequent stages are as follows:
1. One of the hosts in the local network, infected with modern botnet malware, starts scanning its neighbors in order to propagate. It uses a network propagation mechanism similar to the one employed by the Stuxnet worm [Stu11], [Den12], [BPBF12]. The malware is looking for hosts with open TCP and UDP ports reserved for Remote Procedure Call (RPC). In Windows, RPC is an inter-process communication mechanism that enables data exchange and invocation of functionality residing in a different process, locally or via the network. The list of ports used to initiate a connection with RPC is as follows: UDP – 135, 137, 138, 445, TCP – 135, 139, 445, 593.
2. Hosts with open RPC ports are attacked with specially crafted RPC requests.
3. After successful exploitation, vulnerable machines get infected and become members of the botnet.
4. Direct communication with a single C&C server is established by each infected host.
5. On a command from the C&C server, botnet members start a DDoS amplification attack based on the Network Time Protocol (NTP). This attack targets an external server. Botnet members send packets with a forged source IP address (set to the one used by the victim). Thus, replies from the NTP server are sent to the victim instead of to the attackers. Moreover, this attack is amplified, i.e. the attackers send a small (234 bytes) packet with a command to get a list of interacting machines and the NTP server sends a large (up to 200 times bigger) reply to the victim. As a result, the attackers turn a small amount of bandwidth coming from a small number of machines into a significant traffic load hitting the victim. More details regarding NTP amplification DDoS attacks can be found in [KHRH14].

The whole scenario is presented in Fig. 7.14. The main characteristics of the generated anomalies are presented in Table 7.4.

Figure 7.14: Scenario 3

Table 7.4: Characteristics of anomalies in Scenario 3
| Type | No. of flows | Duration [sec] | No. of victims | No. of attackers |
| Block scan (bs) | 1.5K | 80 | 168 | 1 |
| RPC attack (rpc) | 650 | 200 | 90 | 1 |
| Botnet C&C communication (c&c) | 125 | 190 | 63 | 1 |
| NTP DDoS (dd) | 2.9K | 580 | 1 | 63 (spoofed to 1) |

Anomalies generated for the scenario were mixed with the legitimate traffic from Day2 (Wednesday) in the way presented in Fig. 7.15. As can be seen, the whole scenario, which consists of four anomalies, is injected every hour during working time.

Figure 7.15: Distribution of anomalies in time in Scenario 3

Similarly to Scenario 2, anomalies here are small and slow and they represent only a small fraction of the total traffic. After injection none of them is visible in the volume expressed by the number of flows or the number of packets, as depicted, respectively, in Fig.
7.16 and Fig. 7.17.

Figure 7.16: Legitimate and anomalous traffic by number of flows in Scenario 3

Figure 7.17: Legitimate and anomalous traffic by number of packets in Scenario 3

7.6. Anomaly generator

In order to produce flows that mimic anomalous behavior, a dedicated tool was developed in Python [BPMP14]. With this tool one can generate flows according to a predefined policy. The policy assigns a certain type of generation method to each field of the flow record. As a consequence, a set of flows which meets a given statistical profile can be obtained.

Listing 7.1: Default generator group

    [testgroup]
    protocol = con[TCP]
    srcIP = con[10.5.0.77]
    dstIP = ran[10.1.0.1; (["0.0.0.1", "0.0.0.2", "-0.0.0.1"],[0.97,0.15,0.15]); (10.1.0.1, 10.1.0.253)]
    srcPort = ran[uniform(300, 500)]
    dstPort = con[22]
    fromSrcPkts = con[1]
    fromSrcOctets = con[60]
    fromDstPkts = con[1]
    fromDstOctets = con[60]
    #duration
    dur = con[1]
    #inter arrival time
    iar = per[300:ran[uniform(10, 50)]; 800:con[500]; ran[([10, 11, 12, 13], [0.20, 0.30, 0.40, 0.10])]]
    flags = con[SYN|ACK|RST]

Internally, the tool operates on integer values which are manipulated by the generation methods introduced in [BWM08]. They are as follows: con (constant), ran (random) and per (periodical). The con generator is straightforward and does not need further explanation; the others are described below.

Ran generators are used to obtain random values. There are two types of such generators: absolute (e.g. srcPort in Listing 7.1) and relative (e.g. dstIP in Listing 7.1). The value produced by a relative generator is added to the previously generated value. This feature can be used to sweep across a certain range of values. Both generators can be initiated with either a uniform or an arbitrary distribution. An arbitrary distribution consists of two lists: values and the probabilities of these values. A relative generator additionally needs a start value and a range.

Per generators are used to match a certain generating method with the sequence number of the currently generated flow. They are initiated with a list of key-value pairs, in which the key represents the flow number and the value represents the generator definition. The default generator is placed in the last position. For example, the iar definition in Listing 7.1 means that a uniform(10, 50) generator is applied to every 300th flow and the value 500 is returned for every 800th flow. In all other cases, the default generator is applied.

The set of generators shown in Listing 7.1 is called a generator group. A policy may consist of multiple groups. In such a case the probability of using a certain generator group must be defined by means of a volume declaration, as presented in Listing 7.4. Only one generator group in a policy (considered the default) has a generator for each field of the flow. The additional groups may override all or selected definitions of the default one. The concept of a generator group was introduced to ensure that the fields of a flow are consistent with each other, for example to disallow flows which are too short compared with the number of bytes they carry.

There are phenomena in the network that can only be modeled with sequences of flows. Our tool provides such functionality through indexing of group names. In such indexed groups, one can use mechanisms which allow sharing state between subsequent flows. For example, in Listing 7.2 the value of dstIP is forced to remain unchanged throughout the whole sequence.
Listing 7.2: Flows sequence modelling

    [testgroup.1]
    dstIP = args[usePrevValue]
    dur = con[100]
    [testgroup.2]
    dstIP = args[usePrevValue]
    dur = con[1000]

To model more advanced scenarios, where a sequence of anomalies is generated and state is shared not only between subsequent flows but also between subsequent generation groups representing particular anomalies, a top-level policy file can be used. This mechanism is presented in Listing 7.3. In this model, the SSH scan anomaly is generated first and then (after 5 s, according to the offset) an SSH brute force attack is mimicked. Only hosts which are found to be running the SSH service in the first step are passed to the second step, so the brute force attack is performed only on vulnerable hosts. Generators for the SSH scan and the SSH brute force are presented in Listing 7.4 and Listing 7.5, respectively.

An example of a similar generator is FLAME [BWM08], [Bra10]. There are, however, some significant differences. FLAME comes with very basic support for generating flows, forcing users to implement all the generation logic by themselves, while our tool supports policy files. On the other hand, FLAME has fairly sophisticated functionality for inserting generated flows into the base traffic, which our tool does not support at all. Another interesting approach was introduced by Shiravi et al. [SSTG12]. The authors proposed to describe network traffic (not only flows) by a set of so-called α- and β-profiles which can subsequently be used to generate a dataset. The α-profiles consist of actions which should be executed to generate a given event in the network (such as an attack), while the β-profiles are more similar to our policy files, where the behavior of certain entities (packet sizes, number of packets per flow) is represented by a statistical model. On the whole, this concept is similar to ours but far more complex and thus more difficult to use.
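The con, ran and per generation methods described in this section can be illustrated with a minimal Python sketch. It is not the code of our tool: the function names, the (flow_no, prev) calling convention, the wrap-around at the end of the range and the rule that the first matching period wins are all assumptions made only for this illustration.

```python
import random


def con(value):
    """Constant generator: always returns the same value."""
    return lambda flow_no, prev: value


def ran_uniform(low, high, rng):
    """Absolute random generator with a uniform distribution."""
    return lambda flow_no, prev: rng.randint(low, high)


def ran_relative(start, steps, probs, low, high, rng):
    """Relative random generator: a randomly drawn step is added to the
    previously generated value (assumption: values wrap inside [low, high])."""
    def gen(flow_no, prev):
        if prev is None:  # first flow: use the start value
            return start
        step = rng.choices(steps, weights=probs)[0]
        return low + (prev + step - low) % (high - low + 1)
    return gen


def per(periods, default):
    """Periodical generator: selects a generator by flow sequence number
    (assumption: the first matching period wins, flows are numbered from 1)."""
    def gen(flow_no, prev):
        for period, generator in periods:
            if flow_no % period == 0:
                return generator(flow_no, prev)
        return default(flow_no, prev)
    return gen


rng = random.Random(7)
# Sweep over host numbers 1..253, one step at a time (compare dstIP in Listing 7.4).
dst_host = ran_relative(1, [1], [1.0], 1, 253, rng)
# Every 300th flow: uniform(10, 50); every 800th flow: 500; otherwise: 12.
iar = per([(300, ran_uniform(10, 50, rng)), (800, con(500))], con(12))

prev = None
for flow_no in range(1, 6):
    prev = dst_host(flow_no, prev)
    print(flow_no, prev, iar(flow_no, None))
```

Representing every generator as a function of the flow number and the previous value keeps the three methods interchangeable, which is what allows a per generator to wrap con and ran generators.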
Listing 7.3: Anomaly sequence modelling – top level policy file

    [step1]
    #network scan (SSH service)
    offset = 0
    generator_config_file = ssh_scan.config
    [step2]
    offset = 5000
    generator_config_file = ssh_brute.config
    #filtering hosts running SSH services
    filter = SSHScan-open
    #list of filtered hosts
    template = {'__DST_IP_LIST__':{'list': { 'source': 'DST_IP' }}}

Listing 7.4: Anomaly sequence modelling – ssh_scan.config

    [SSHScan-noResponse]
    maxflows = 252
    protocol = con[TCP]
    srcIP = con[10.1.0.2]
    dstIP = ran[10.1.0.3; (["0.0.0.1"],[1.0]); (10.1.0.1, 10.1.0.253)]
    srcPort = ran[uniform(33800, 61100)]
    dstPort = con[22]
    fromSrcPkts = con[1]
    fromSrcOctets = con[60]
    fromDstPkts = con[0]
    fromDstOctets = con[0]
    dur = con[0]
    iar = ran[uniform(0, 3000) ]
    flags = con[SYN]
    [SSHScan-reset]
    volume = 0.34
    fromDstPkts = con[1]
    fromDstOctets = con[46]
    flags = con[SYN|RST|ACK]
    dur = ran[([0, 1, 2, 15],[0.70,0.25,0.03,0.02])]
    [SSHScan-open]
    volume = 0.05
    fromSrcPkts = con[4]
    fromSrcOctets = con[204]
    fromDstPkts = con[3]
    fromDstOctets = ran[uniform(180,210)]
    dur = ran[uniform(40, 300)]
    flags = con[FIN|SYN|RST|PSH|ACK]
    iar = con[2]
    [SSHScan-open.1]
    dstIP = args[usePrevValue]
    srcPort = args[incrementPrevValue]
    fromSrcPkts = ran[uniform(19,21)]
    fromSrcOctets = ran[uniform(2500,2700)]
    fromDstPkts = ran[uniform(14,18)]
    fromDstOctets = ran[uniform(2850,2900)]
    dur = ran[uniform(6000, 11000)]
    flags = con[FIN|SYN|PSH|ACK]

Listing 7.5: Anomaly sequence modelling – ssh_brute.config

    [bruteSSH]
    maxflows = 1000
    protocol = con[TCP]
    srcIP = con[10.1.0.2]
    dstIP = lis[__DST_IP_LIST__]
    srcPort = ran[uniform(33800, 61100)]
    srcPort = ran[ 60310; (["1","2"],[0.9, 0.1]); (60300, 60400) ]
    dstPort = con[22]
    fromSrcPkts = ran[uniform(14, 15) ]
    fromSrcOctets = ran[uniform(1400, 1500) ]
    fromDstPkts = ran[uniform(9, 11) ]
    fromDstOctets = ran[uniform(1200, 2400) ]
    dur = ran[uniform(800, 1200) ]
    iar = ran[uniform(1000, 3000) ]
    flags = con[FIN|SYN|PSH|ACK]
    [bruteSSH-success-1]
    volume = 0.2
    [bruteSSH-success-1.1]
    dstIP = args[usePrevValue]
    [bruteSSH-success-1.2]
    dstIP = args[usePrevValue]
    [bruteSSH-success-1.3]
    dstIP = args[usePrevValue]
    [bruteSSH-success-2]
    volume = 0.1
    [bruteSSH-success-2.1]
    dstIP = args[usePrevValue]
    [bruteSSH-failure]
    volume = 0.7
    [bruteSSH-failure.1]
    dstIP = args[usePrevValue]
    [bruteSSH-failure.2]
    dstIP = args[usePrevValue]
    [bruteSSH-failure.3]
    dstIP = args[usePrevValue]
    [bruteSSH-failure.4]
    dstIP = args[usePrevValue]

8. Verification of the approach

This chapter presents verification of the proposed method. The aim of the verification is to check if the method is able to detect network anomalies and categorize them. Firstly, results of correlation tests performed in order to find the proper range of α-values and the proper set of network features to use in the method are presented. Next, the performance of the method is evaluated. Finally, some conclusions are given.

8.1. Correlation

Firstly, correlation tests for various α-values and for various network features were performed.
These tests were important because strong correlation suggests that some results are closely related to each other, and thus it may be sufficient to restrict the scope of α-values and network features without impairing the validity of the method. In the experiments, Pearson [HK11] and Spearman [HK11] correlation coefficients were used. For a sample of discrete random variables X, Y the Pearson coefficient is defined as:

\[ r_{X,Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y} \tag{8.1} \]

where

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \]

with \bar{Y} and s_y defined analogously. The Spearman coefficient for a sample of discrete random variables X, Y is defined as:

\[ r_{X,Y} = \mathrm{corr}(R_X, R_Y) \tag{8.2} \]

where
corr – Pearson correlation coefficient for a sample,
R_X – ranks of X,
R_Y – ranks of Y.

The results of correlation between entropy timeseries for different α-values are presented in Table 8.1. The table shows the pairwise Tsallis α correlation scores from the range ⟨−1..1⟩, where absolute values in the intervals 0.9–1, 0.7–0.9, 0.5–0.7 and 0–0.5 denote, respectively, strong, medium, weak, and no correlation. The sign determines whether the correlation is positive (no sign or +) or negative (−). The presented values (see Table 8.1) are average scores from 15 different network features. Only results based on Tsallis entropy are presented, as those obtained for Renyi entropy were similar.
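The two coefficients of Eq. (8.1) and Eq. (8.2) can be sketched in plain Python as follows. This is only an illustration, not the code used in the experiments; the function names are hypothetical, the sample covariance and standard deviations use the 1/(n − 1) normalization, and tied values are assumed to receive average ranks.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations with the 1/(n - 1) normalization.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (s_x * s_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# A monotonic but non-linear relation: perfect rank correlation,
# slightly weaker linear correlation.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

The example shows why both coefficients are reported: Spearman captures any monotonic dependence between two entropy timeseries, whereas Pearson measures only the linear part.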
Table 8.1: Results of correlation of α

Pearson
| | α = −3 | α = −2 | α = −1 | α = 0 | α = 1 | α = 2 | α = 3 |
| α = −3 | 1 | 0.99 | 0.96 | 0.66 | 0.12 | −0.06 | −0.09 |
| α = −2 | - | 1 | 0.98 | 0.69 | 0.13 | −0.06 | −0.09 |
| α = −1 | - | - | 1 | 0.75 | 0.16 | −0.05 | −0.08 |
| α = 0 | - | - | - | 1 | 0.44 | 0.18 | 0.12 |
| α = 1 | - | - | - | - | 1 | 0.88 | 0.82 |
| α = 2 | - | - | - | - | - | 1 | 0.97 |
| α = 3 | - | - | - | - | - | - | 1 |

Spearman
| | α = −3 | α = −2 | α = −1 | α = 0 | α = 1 | α = 2 | α = 3 |
| α = −3 | 1 | 0.97 | 0.837 | 0.46 | 0.06 | −0.09 | −0.11 |
| α = −2 | - | 1 | 0.94 | 0.57 | 0.1 | −0.07 | −0.1 |
| α = −1 | - | - | 1 | 0.72 | 0.15 | −0.06 | −0.09 |
| α = 0 | - | - | - | 1 | 0.49 | 0.2 | 0.15 |
| α = 1 | - | - | - | - | 1 | 0.87 | 0.79 |
| α = 2 | - | - | - | - | - | 1 | 0.9 |
| α = 3 | - | - | - | - | - | - | 1 |

It should be noted that there is a strong positive linear (Pearson) and rank (Spearman) correlation among negative α-values and a strong positive correlation between α-values which are higher than 1. For α = 0 there is a small positive correlation with negative values. For α = 1 (Shannon) there is a medium correlation with α = 2 and α = 3. These results suggest that it is sufficient to use α-values from the range ⟨−2..2⟩ to obtain different and distinctive sensitivity levels of entropy.

Some interesting results of pairwise correlation between Tsallis entropy timeseries of different network features are presented in Table 8.2 and Table 8.3. The results obtained for Renyi are not presented as they closely resemble those obtained for Tsallis. Results for one positive and one negative value of α are presented because these results differ significantly. Averaging (based on results from the whole range of α-values) would hide an essential property. There is a strong positive correlation of addresses and ports for negative α-values and no correlation for positive α-values.

8.2. Performance evaluation

Experiments were performed for the Tsallis, Renyi and Shannon versions of our method as well as for the traditional volume-based approach with flow, packet and byte counters. The final evaluation was performed with Weka [HFH+09]. Experiments were performed with the dataset presented in Chapter 7.
Table 8.2: Results of correlation of features for α = −3

Pearson
| | src ip | dst ip | src port | dst port | in-degree | out-degree |
| src ip | 1 | 0.89 | 0.89 | 0.91 | 0.37 | 0.35 |
| dst ip | - | 1 | 0.98 | 0.89 | 0.27 | 0.55 |
| src port | - | - | 1 | 0.86 | 0.15 | 0.5 |
| dst port | - | - | - | 1 | 0.41 | 0.53 |
| in-degree | - | - | - | - | 1 | 0.27 |
| out-degree | - | - | - | - | - | 1 |

Spearman
| | src ip | dst ip | src port | dst port | in-degree | out-degree |
| src ip | 1 | 0.9 | 0.85 | 0.87 | 0.47 | 0.69 |
| dst ip | - | 1 | 0.96 | 0.89 | 0.43 | 0.83 |
| src port | - | - | 1 | 0.83 | 0.3 | 0.69 |
| dst port | - | - | - | 1 | 0.53 | 0.12 |
| in-degree | - | - | - | - | 1 | 0.48 |
| out-degree | - | - | - | - | - | 1 |

Table 8.3: Results of correlation of features for α = 3

Pearson
| | src ip | dst ip | src port | dst port | in-degree | out-degree |
| src ip | 1 | −0.07 | −0.34 | −0.02 | −0.07 | 0.44 |
| dst ip | - | 1 | −0.29 | 0.05 | 0.08 | −0.28 |
| src port | - | - | 1 | −0.42 | 0.59 | −0.04 |
| dst port | - | - | - | 1 | −0.39 | 0.01 |
| in-degree | - | - | - | - | 1 | 0.03 |
| out-degree | - | - | - | - | - | 1 |

Spearman
| | src ip | dst ip | src port | dst port | in-degree | out-degree |
| src ip | 1 | 0.03 | −0.21 | 0.07 | 0.21 | 0.37 |
| dst ip | - | 1 | −0.31 | 0.07 | 0.08 | −0.35 |
| src port | - | - | 1 | −0.55 | 0.64 | 0.23 |
| dst port | - | - | - | 1 | 0.52 | 0.76 |
| in-degree | - | - | - | - | 1 | 0.18 |
| out-degree | - | - | - | - | - | 1 |

Some exemplary entropy results for selected network feature distributions are presented below. The abnormally high dispersion in the distribution of destination addresses for network scan anomalies, exposed by negative values of the α parameter, is depicted in Fig. 8.1. Time t is on the x axis (5-minute time windows), the result r on the y axis and the α-values on the z axis. The r value corresponds to the thresholding rule applied in the method [Eq. (6.1)]. Values of r outside the (0..1) range are considered anomalous. Anomalies are marked with A on the time axis. Values of Shannon entropy are denoted as S. The abnormal concentration of flow durations for network scans is depicted in Fig. 8.2. This concentration is typical for anomalies with a fixed data stream, i.e. anomalies where all flows have a similar size. Fig. 8.3 shows ambiguous detection (no distinctive pattern in exceeding the 0–1 threshold) of a port scan anomaly with flow, packet and byte counters.
While experimenting, we noticed that measurements for all network features as a group work better than single ones. In our experiments the address, port and duration feature distributions seemed to be the most deterministic, although we believe that the proper set of network features is specific to particular anomalies.

Figure 8.1: Abnormally high dispersion in destination addresses for network scan anomalies (Renyi/Shannon)

Figure 8.2: Abnormally high concentration in flows duration for network scan anomalies (Tsallis/Shannon)

Figure 8.3: Ambiguous detection of port scan anomaly with a volume-based approach

Overall (whole dataset, all network features) multi-class classification was performed with Weka. Classes for each anomaly type plus one class for the legitimate traffic were defined. To correctly evaluate predictive performance, the ten-fold cross-validation method [HFH+09] was used. From the performance point of view, every classification attempt can produce one of the four outcomes presented in Fig. 8.4.

Figure 8.4: Possible results of classification (actual versus predicted class: True Positive (TP), False Negative (FN), False Positive (FP), True Negative (TN))

An ideal classifier should not produce False Positive (FP) and False Negative (FN) statistical errors. To evaluate non-ideal classifiers, one could measure Accuracy (ACC), i.e. the proportion of correct assessments to all assessments, the False Positive Rate (FPR), i.e. the share of benign activities reported as anomalous, and the False Negative Rate (FNR), i.e. the share of anomalies missed by the detector. Another option is to use Precision, i.e. the proportion of correctly reported anomalies among all reported anomalies, and Recall, i.e. the share of correctly reported anomalies in the total number of anomalies.
Based on these measures, Receiver Operating Characteristics (ROC) and Precision vs Recall (PR) plots are typically used [DG06]. Formulas for the mentioned metrics, as well as some additional measures which can also be used to evaluate the performance of a classifier, are presented in Table 8.4.

Table 8.4: Metrics used to evaluate performance of classification
| Name | Formula |
| True Positive Rate (TPR), eqv. with Recall, Sensitivity | TPR = TP / (TP + FN) |
| True Negative Rate (TNR), eqv. with Specificity | TNR = TN / (FP + TN) |
| Positive Predictive Value (PPV), eqv. with Precision | PPV = TP / (TP + FP) |
| Negative Predictive Value (NPV) | NPV = TN / (TN + FN) |
| False Positive Rate (FPR), eqv. with Fall-out | FPR = FP / (FP + TN) = 1 − TNR |
| False Discovery Rate (FDR) | FDR = FP / (FP + TP) = 1 − PPV |
| False Negative Rate (FNR) | FNR = FN / (FN + TP) |
| Accuracy (ACC) | ACC = (TP + TN) / (TP + FN + FP + TN) |
| F1 score, harmonic mean of Precision and Recall | F1 = 2TP / (2TP + FP + FN) |

To work effectively, an anomaly detector should detect a substantial percentage of anomalies in the supervised system, while still keeping the False Positive Rate at an acceptable level. Now, we present a realistic example which shows that, for a reasonable set of assumptions, the FPR is the limiting factor for the performance of anomaly detection. This is due to the base-rate fallacy phenomenon, first pointed out by Axelsson [Axe99]: in order to achieve a reasonable realistic detection rate, known as the Bayesian Detection Rate (BDR), we have to achieve a very low FPR. Following Axelsson, let us assume that:
– I: intrusive (anomalous) event in a system;
– ∼I: non-intrusive event;
– A: alarm signaled;
– ∼A: no alarm fired;
– TPR = P(A|I);
– FPR = P(A|∼I).

The goal is to maximize both:
– Bayesian Detection Rate (BDR), P(I|A);
– Bayesian True Negative Rate, P(∼I|∼A).

The fallacy stems directly from the Bayes theorem [Bay63], which relates prior and posterior probabilities of the events.
According to the Bayes theorem:

\[ P(I|A) = \frac{P(A|I)P(I)}{P(A|I)P(I) + P(A|\sim I)P(\sim I)} \tag{8.3} \]

Our anomaly detector tests 12 NetFlow reports per hour for the presence of an anomaly, as each report covers the last 5 minutes of network traffic. Over a day this gives 12 ∗ 24 = 288 reports to test. Let us make a realistic assumption that most of the network traffic is not malicious and that an anomaly is present in every hundredth report. This allows us to calculate the following a priori probabilities:
– P(I) = 0.01
– P(∼I) = 1 − P(I) = 0.99

Now let us assume that our detector has the following characteristic:
– TPR = P(A|I) = 0.9
– FPR = P(A|∼I) = 0.01

Taking all these values, let us calculate the BDR:

\[ P(I|A) = \frac{P(A|I)P(I)}{P(A|I)P(I) + P(A|\sim I)P(\sim I)} = \frac{0.9 \cdot 0.01}{(0.9 \cdot 0.01) + (0.01 \cdot 0.99)} \approx 0.48 \tag{8.4} \]

If the characteristic of our detector is 5 times weaker in terms of false alarms (FPR = 0.05), then the BDR drops to ≈ 0.15, and if it is 10 times weaker (FPR = 0.1), then the BDR drops to ≈ 0.08, which is unacceptable.

In our approach we deal with a multi-class classification problem where more than two classes are taken into consideration, in contrast to single binary classification (or detection) where only two classes, i.e. anomalous and not anomalous, are used. We classify instances into one of many classes such as port scan, network scan, brute force, etc. We use classifiers from Weka which internally transform a multi-class problem into multiple binary ones. To handle it, Weka uses the One-vs-All approach [Rif08].
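Returning to the base-rate example, the Bayesian Detection Rate of Eq. (8.3) can be reproduced with a few lines of Python. This is a minimal sketch; the function name and its parameters are hypothetical and introduced only for this illustration.

```python
def bayesian_detection_rate(tpr, fpr, p_intrusion):
    """P(I|A) from the Bayes theorem.

    tpr         - P(A|I), probability of an alarm given an intrusion
    fpr         - P(A|~I), probability of an alarm given no intrusion
    p_intrusion - P(I), prior probability of an intrusion
    """
    p_alarm = tpr * p_intrusion + fpr * (1.0 - p_intrusion)
    return tpr * p_intrusion / p_alarm


# Detector with TPR = 0.9 and an anomaly in every hundredth report.
for fpr in (0.01, 0.05, 0.1):
    bdr = bayesian_detection_rate(0.9, fpr, 0.01)
    print("FPR = %.2f  BDR = %.3f" % (fpr, bdr))
```

For the three false positive rates the sketch gives approximately 0.476, 0.154 and 0.083, which shows how quickly the share of true alarms among all alarms collapses when anomalies are rare.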
The idea behind the One-vs-All approach is:
– take n binary classifiers (one for each class);
– for the ith classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i;
– let f_i be the ith classifier, then classify with the following rule:

\[ f(x) = \arg\max_{i} f_i(x) \tag{8.5} \]

Averaged Accuracy and averaged FPR are the standard measures used to assess the performance of multi-class classifiers. We also propose our own measurement method, namely weighted ROC curves, which is presented later in this section.

Evaluation results based on Scenario 1 are presented in Table 8.5. As can be seen, the results for several popular classifiers are presented. ZeroR is a trivial algorithm which classifies the whole traffic as not anomalous. We included it here as a reference point, as other classifiers are expected to perform better than one which does not detect any anomalies. Evaluation results for Scenario 2 and Scenario 3 are presented in Table 8.6 and Table 8.7, respectively.

The best performance in each scenario was obtained by applying SimpleLogistic. In Weka, SimpleLogistic is a classifier for building linear logistic regression models [SFH05]. Logistic regression builds on the fact that linear regression [SL12] can be used to perform classification. The idea of logistic regression is to make linear regression produce probabilities; thus, instead of a class prediction, there is a prediction of class probabilities. More details on SimpleLogistic can be found in [LHF05] and [SFH05].

If we look at the detailed results of SimpleLogistic (Renyi entropy case) for all scenarios (Table 8.8), it can be seen that different classes are characterized by rather different recognition performance. For example, the models for network scan and not anomalous are very strong, whereas that for p2p is much weaker.
Table 8.5: Averaged performance of classification – Scenario 1
| Measure | Method | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
| Accuracy | Tsallis | 0.66 | 0.89 | 0.90 | 0.93 | 0.93 |
| Accuracy | Renyi | 0.66 | 0.88 | 0.89 | 0.90 | 0.93 |
| Accuracy | Shannon | 0.66 | 0.84 | 0.86 | 0.90 | 0.92 |
| Accuracy | volume-based | 0.66 | 0.72 | 0.77 | 0.76 | 0.80 |
| FPR | Tsallis | 0.66 | 0.07 | 0.08 | 0.07 | 0.06 |
| FPR | Renyi | 0.66 | 0.08 | 0.09 | 0.11 | 0.09 |
| FPR | Shannon | 0.66 | 0.08 | 0.11 | 0.12 | 0.08 |
| FPR | volume-based | 0.66 | 0.21 | 0.15 | 0.22 | 0.20 |

Table 8.6: Averaged performance of classification – Scenario 2
| Measure | Method | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
| Accuracy | Tsallis | 0.68 | 0.82 | 0.84 | 0.85 | 0.91 |
| Accuracy | Renyi | 0.68 | 0.83 | 0.88 | 0.89 | 0.92 |
| Accuracy | Shannon | 0.68 | 0.77 | 0.80 | 0.84 | 0.89 |
| Accuracy | volume-based | 0.68 | 0.68 | 0.73 | 0.78 | 0.80 |
| FPR | Tsallis | 0.68 | 0.22 | 0.14 | 0.27 | 0.11 |
| FPR | Renyi | 0.68 | 0.15 | 0.12 | 0.20 | 0.11 |
| FPR | Shannon | 0.68 | 0.29 | 0.21 | 0.28 | 0.15 |
| FPR | volume-based | 0.68 | 0.68 | 0.20 | 0.15 | 0.28 |

Table 8.7: Averaged performance of classification – Scenario 3
| Measure | Method | ZeroR | Bayes Network | Decision Tree J48 | Random Forest | Simple Logistic |
| Accuracy | Tsallis | 0.68 | 0.83 | 0.83 | 0.87 | 0.93 |
| Accuracy | Renyi | 0.68 | 0.83 | 0.83 | 0.85 | 0.94 |
| Accuracy | Shannon | 0.68 | 0.76 | 0.80 | 0.85 | 0.90 |
| Accuracy | volume-based | 0.68 | 0.68 | 0.62 | 0.65 | 0.66 |
| FPR | Tsallis | 0.68 | 0.13 | 0.17 | 0.22 | 0.10 |
| FPR | Renyi | 0.68 | 0.13 | 0.16 | 0.22 | 0.06 |
| FPR | Shannon | 0.68 | 0.23 | 0.16 | 0.22 | 0.13 |
| FPR | volume-based | 0.68 | 0.68 | 0.57 | 0.45 | 0.67 |

Table 8.8: Detailed performance of SimpleLogistic classifier (Renyi entropy case)
| TPR/FPR | Scenario1 | Scenario2 | Scenario3 |
| brute force | 0.78/0 | 1/0.01 | − |
| network scan | 0.92/0.02 | 0.9/0 | − |
| port scan | 0.92/0.01 | − | − |
| block scan | − | − | 0.9/0.01 |
| DDoS | 0.67/0.01 | 0.9/0 | 0.9/0.01 |
| p2p | − | 0.3/0.02 | − |
| c&c | − | − | 0.9/0.01 |
| RPC exploitation | − | − | 0.7/0.01 |
| not anomalous | 0.98/0.13 | 0.97/0.16 | 0.97/0.08 |

As mentioned before, ROC plots can also be used to evaluate the performance of a classifier. A ROC plot presents a more detailed characteristic than Accuracy. The ROC curve is obtained for a classifier by plotting FPR on the x axis and TPR on the y axis.
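How a ROC curve arises from the scores of a binary classifier can be shown with a short Python sketch. It is an illustration only and not part of the evaluation performed with Weka; the function names and the example scores are hypothetical. Each distinct score is used as a threshold, a sample is reported as anomalous when its score is not lower than the threshold, and the area under the resulting curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs obtained by sweeping a threshold over scores.

    labels: 1 = anomalous, 0 = legitimate.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is reported
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under a ROC curve given as (fpr, tpr) points, trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores for three anomalous and three legitimate samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print("%.2f %.2f" % (fpr, tpr))
print("area under the curve: %.3f" % area_under_curve(points))
```

Every printed pair is one operating point of the classifier, so moving along the curve corresponds to trading a lower FPR for a lower TPR or the other way round.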
The Area Under the Curve (AUC) is a scalar measure derived from a ROC curve. While evaluating the classifier, the ROC plot considers all possible operating points (thresholds) in the classifier's prediction in order to identify the operating point at which the best performance is achieved. A ROC curve does not directly present the optimal value; instead it shows a tradeoff between TPR and FPR. Depending on the goals, one can change the operating point in order to limit FPR or to increase TPR. Exemplary ROC curves for a perfect (a), partially overlapped (b) and random (c) classifier are presented in Fig. 8.5.

Figure 8.5: Exemplary ROC curves

ROC is only applicable to the binary classification case. As more than two classes are considered in our approach, we can analyze individual ROC curves for each of the classes separately, as presented in Fig. 8.6. This way of analysis is supported in Weka. Based on such analysis we can find the performance of a particular classifier for each class. This may be useful to find the best classifier for a specific anomaly, but in our case we are looking for the classifiers which are on average the best for all classes. This is typically measured by averaged Accuracy, but this measure hides some important characteristics. Thus, we propose a method of calculating a multi-class ROC based on weighted results of the binary ROC for each individual class.

In Weka there is a feature to generate and save to files individual ROC curves for each of the classes of a multi-class classifier separately. A Weka ROC file consists of operating points (threshold values) and confusion matrices containing the relevant TP, FN, TN, FP values for binary classification of a particular class. In our approach we take the ROC files generated by Weka (one file for each class in the dataset) and process them in order to average the results.
The Python code explaining in detail how the weighted ROC is calculated is presented in Listing 8.1. The main idea behind it is that the corresponding ROC for each class is averaged with respect to the number of class instances. As a result we get one weighted multi-class ROC based on all binary ROC results.

Figure 8.6: ROC curves (one per class) for SimpleLogistic classifier (Renyi entropy case) based on Scenario 1: (a) brute force, (b) network scan, (c) DDoS, (d) port scan, (e) not anomalous

Listing 8.1: Weighted ROC calculation

from bisect import bisect_right
from collections import namedtuple

# Data model assumed by this listing: a point pairs a threshold with its
# (TP, FN, TN, FP) confusion matrix; a ROC curve keeps its points sorted
# by ascending threshold.
RocPoint = namedtuple("RocPoint", "threshold confusion_matrix")
Roc = namedtuple("Roc", "points_sorted_by_threshold")


def confusion_matrix(roc, threshold):
    """Finds a confusion matrix for the specified threshold value.

    Args:
        roc - an object representing a ROC curve,
        threshold - a threshold value
    Returns:
        A confusion matrix for the specified threshold value."""
    points = roc.points_sorted_by_threshold
    keys = [point.threshold for point in points]
    # Exact match, or else the point with the closest smaller threshold;
    # the first point is used when no smaller threshold exists.
    index = max(bisect_right(keys, threshold) - 1, 0)
    return points[index].confusion_matrix


def all_thresholds(rocs):
    """Returns a list of all unique threshold values for the ROC curves.

    Args:
        rocs - a sequence of ROC curves
    Returns:
        A sorted list of all unique threshold values for supplied ROC curves."""
    return sorted({point.threshold
                   for roc in rocs
                   for point in roc.points_sorted_by_threshold})


def weighted_roc_gen(rocs, weights):
    """Generates weighted ROC points for the supplied ROC curves and weights.

    Args:
        rocs - a sequence of ROC curves,
        weights - a list/tuple of weights such that:
            - 0 <= weights[i] <= 1 for all i, sum(weights) = 1,
            - weights[i] corresponds to rocs[i],
            - weights are proportional to the number of class instances
              in the dataset
    Returns:
        Yields subsequent (FPR, TPR) points of the weighted ROC."""
    for th in all_thresholds(rocs):
        weighted_cm = [0.0, 0.0, 0.0, 0.0]  # a list, so it can be updated
        for roc, weight in zip(rocs, weights):
            cm = confusion_matrix(roc, th)
            for j in range(len(weighted_cm)):
                weighted_cm[j] += weight * cm[j]
        tp, fn, tn, fp = weighted_cm
        fpr = fp / (fp + tn)
        tpr = tp / (tp + fn)
        yield fpr, tpr


def print_roc(rocs, weights):
    """Prints a weighted ROC as a function.

    Args:
        rocs - a sequence of ROC curves,
        weights - a list/tuple of weights"""
    for fpr, tpr in weighted_roc_gen(rocs, weights):
        print("%f %f" % (fpr, tpr))

Weighted ROC curves for the SimpleLogistic classifier for all scenarios are depicted in Fig. 8.7, Fig. 8.8 and Fig. 8.9, respectively. It is noticeable that the results for the Tsallis and Renyi entropy are better than those for Shannon. The ROC curves for the volume-based case show that this approach performs really poorly.

8.3. Conclusions

Concluding the results of the evaluation, we can observe that, for our scenarios:
– the Tsallis and Renyi entropy performed best;
– the Shannon entropy turned out to be a bit worse in Accuracy, in False Positive Rate and in the weighted ROC curves;
– the volume-based approach performed poorly;
– using a broad spectrum of network features is essential to successfully detect and classify different types of network anomalies; this was shown both by the results of the feature correlation and by the good results of classification of different anomalies in the tested scenarios;
– using α-values from the set {−2, −1, 0, 1, 2} is a proper choice; this was shown both by the results of the α-values correlation and by the good results of classification of different anomalies in the tested scenarios; using a larger set of α-values is redundant; using one value is not enough to recognize different types of anomalies;
– the most suitable classifier for our approach (among the popular methods available in Weka) is SimpleLogistic, which relies on linear logistic regression.

Our experiments were limited to a small number of cases. However, these cases are representative. Although only a one-day legitimate traffic profile was built, we have observed that this profile suits each regular working day in the network we monitored, so there was no need to prepare whole-week profiles in our case. The weaker performance of the Shannon entropy and the poor performance of volume-based counters call into question whether they are the right approach to detecting anomalies caused by botnet-like malware.

Figure 8.7: Weighted ROC curves for SimpleLogistic classifier – Scenario 1: (a) Tsallis, (b) Renyi, (c) Shannon, (d) volume-based

Figure 8.8: Weighted ROC curves for SimpleLogistic classifier – Scenario 2: (a) Tsallis, (b) Renyi, (c) Shannon, (d) volume-based

Figure 8.9: Weighted ROC curves for SimpleLogistic classifier – Scenario 3: (a) Tsallis, (b) Renyi, (c) Shannon, (d) volume-based

9. Conclusions and further work

This chapter summarizes the results achieved in this Thesis and outlines further work related to the subject of network anomaly detection. It also provides a list of publications completed during this research.

9.1. Conclusions

The general problem of this Dissertation is the search for an effective method for network anomaly detection. The scope of this work is limited to the detection of anomalies indicating the presence of modern botnet-like malware in local networks. Out of many anomaly detection techniques, an entropy-based approach was chosen and examined in depth. The goal of the Dissertation was accomplished by finding answers to the following questions:
– Are entropy measures useful in the context of network anomaly detection?
– Is it possible to effectively detect and classify small and low-rate anomalies connected with botnet-like malware activity in local networks by means of entropy?
– Is the entropy-based approach better than the traditional volume-based approach?
– Do parameterized entropies help to improve the results obtained for Shannon entropy?
– What is the proper set of parameters for entropies to successfully detect network anomalies?
– Which network features should be taken into consideration in order to detect a broad spectrum of anomalies connected with botnet-like malware?
– Which popular classifiers work well with the entropy-based approach?

A thorough analysis of the state of the art provided in Chapter 2 shows that the entropy-based approach seems to be promising in detecting different types of network anomalies, while the traditional volume-based approach is limited to anomalies which result in a significant and abrupt change of network volume.
Furthermore, the use of parameterized entropies allows overcoming some limitations of Shannon entropy caused by its small descriptive power. Although entropy-based network anomaly detection is a deeply investigated area, the following gaps have been found:
– entropy-based detection of botnet-like anomalies in local networks remains unexplored;
– the problem of finding proper α-values for parameterized entropies in order to successfully detect small and low-rate network anomalies is still open;
– there are some contradictory results regarding the network feature distributions to use with entropy-based network anomaly detection;
– it is unknown which multi-class classification method works well with the entropy-based approach, as most authors focus mainly on detection;
– there is a lack of available datasets to evaluate the method proposed in this Dissertation.

In order to answer the questions and to prove the claim of this Thesis, an original method based on entropy measures was proposed (Chapter 5, Chapter 6) and then verified (Chapter 8). To make verification possible, a proper semi-synthetic dataset was prepared (Chapter 7). Additionally, a theoretical background as well as a deep comparison of different entropy measures is presented in Chapter 4, and some network data capture issues are presented in Chapter 5. The general conclusion of the presented studies is that it is possible to detect modern botnet-like malware in local networks, based on the detected anomalies, with the entropy-based approach.
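To make the notion of a parameterized entropy concrete, the following minimal sketch computes the Shannon, Renyi and Tsallis entropy of an empirical feature distribution for the α-values {−2, −1, 0, 1, 2} considered in this work. It uses the common textbook definitions with natural logarithms, which may differ from the (possibly normalized) variants defined in Chapter 4. The function names and the sample destination-port values are illustrative assumptions, not part of the implemented method.

```python
import math
from collections import Counter

ALPHAS = (-2, -1, 0, 1, 2)  # alpha-values considered in this work


def probabilities(values):
    """Empirical distribution of the observed feature values (all p > 0)."""
    counts = Counter(values)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def shannon(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs)


def renyi(probs, alpha):
    """Renyi entropy; alpha = 1 is handled as the Shannon limit."""
    if alpha == 1:
        return shannon(probs)
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)


def tsallis(probs, alpha):
    """Tsallis entropy; alpha = 1 is handled as the Shannon limit."""
    if alpha == 1:
        return shannon(probs)
    return (1 - sum(p ** alpha for p in probs)) / (alpha - 1)


if __name__ == "__main__":
    # Hypothetical destination ports seen in one time window: mostly web
    # traffic, then the same window with a small port scan mixed in.
    normal = [80] * 90 + [443] * 9 + [22]
    scanned = normal + list(range(1000, 1020))
    for name, window in (("normal", normal), ("with scan", scanned)):
        probs = probabilities(window)
        for alpha in ALPHAS:
            print(name, alpha, renyi(probs, alpha), tsallis(probs, alpha))
```

Negative α-values put more weight on rare feature values and positive ones on frequent values, which is why a set of several α-values can expose anomalies that a single measure would miss.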
Based on the detailed results presented in Chapter 8 we claim that:
– the Tsallis or Renyi entropy should be used to achieve satisfactory effectiveness, as in our experiments the Shannon entropy turned out to be worse in Accuracy, in False Positive Rate and in the weighted ROC curves;
– the volume-based approach performs poorly, so the popular methods based on simple counters like the number of flows, packets or bytes are completely ineffective for detection of botnet-like malware based on the observed anomalies;
– using a broad spectrum of network features is essential to successfully detect and classify different types of network anomalies; this was shown both by the results of the feature correlation and by the good results of classification of different anomalies in the tested scenarios;
– using α-values for the Tsallis or Renyi entropy from the set {−2, −1, 0, 1, 2} is a proper choice; this was shown by the results of the α-values correlation and by the good results of classification of different anomalies in the tested scenarios; using a larger set of α-values is redundant; using one value is not enough to recognize different types of anomalies;
– the most suitable classifier for our approach (among the popular methods available in Weka) is SimpleLogistic, which relies on linear logistic regression.

These claims are based on experiments which were limited to a small number of cases. However, these cases are representative.

9.2. Further work

9.2.1. On-line analysis in a real environment

As mentioned in Chapter 6, the final implementation of the proposed method allows on-line classification of anomalies based on NetFlow reports coming in real time from probes deployed in the network. So far, Anode has not been tested in a real environment. A pilot implementation of Anode in the Military Communication Institute is planned for the near future.

9.2.2. Multi-classifier

During our experiments we could observe that some classifiers were especially effective in detecting a single, selected anomaly type. In particular, the false positive rate was small. We thus believe that it would be possible to combine a number of such "dedicated" classifiers in a way that covers the whole anomaly spectrum. There are a number of proposed architecture variants for a multi-classifier, like stacking, bagging and boosting [SPBW12]; however, we think this case may require a dedicated approach. We are going to design such a multi-classifier and compare its performance against possible competitors.

9.2.3. Multi-label approach

Multi-class classification usually means classifying a data point into only one of many (more than two) possible classes. It is much more advanced and sophisticated than simple detection, where only two classes, i.e. anomalous and not anomalous, are taken into account. However, a multi-class approach does not solve the problem when more than one class should be assigned to one data point. For example, a data point may belong to the port scan and brute force classes simultaneously because both anomalies appeared at the same time. With multi-label classification [MKGD12], [Mek14] one can classify a data point into more than one of the possible classes. This Dissertation does not cover the multi-label problem; however, this is one of the main directions for further work.

9.2.4. Dataset

The dataset presented in this Thesis consists of synthetic anomalies mixed with real legitimate network traffic. In the future we are planning to extend the range of anomalies in our dataset by adding new models of network behavior typical for botnet-like malware. Moreover, we are planning to capture legitimate traffic from a somewhat larger network and publish the dataset (after anonymization) in order to make comparison with other methods possible.
As research in network anomaly detection suffers from a lack of such up-to-date datasets, such a contribution is desirable. The dataset will be developed and published under the CybSecLab project sponsored by the Polish National Centre for Research and Development. We are going to pay attention to the sanitization process, as weak anonymization of network traces may disclose some classified data [PAPL06].

9.3. Publications

The work presented in this Thesis is based on several publications. The concept of a multi-sensor cyber defence system named SOPAS, designed for a federated environment, is presented in:

P. Bereziński, J. Śliwa, R. Piotrowski, and B. Jasiul. Detection of multistage attack in federation of systems environment. In NATO Science and Technology Organization MP-IST-111 - Information Assurance and Cyber Defence, 2012.

Detection of the original multistage attack in the SOPAS system with the use of signature-based and anomaly-based techniques and tools is described in:

B. Jasiul, R. Piotrowski, P. Bereziński, M. Choraś, R. Kozik, and J. Brzostek. Federated cyber defence system - applied methods and techniques. In Communications and Information Systems Conference (MCC), pages 1–6, Oct 2012.

A survey on modern entropy-based measures to use in network anomaly detection is presented in:

J. Pawelec, P. Bereziński, R. Piotrowski, and W. Chamela. Entropy measures for internet traffic anomaly detection. In TransComp conference on Computer Systems, Industry and Transport, pages 309–318, 2012.

The problem of the lack of good, recent datasets that could be employed for evaluation of network anomaly detection methods is pointed out in:

M. Małowidzki, P. Bereziński, and M. Mazur. Network intrusion detection: Half a kingdom for a good dataset. In Proceedings of NATO STO SAS-139 Workshop, Portugal, 2015.
The concept of an entropy-based anomaly detection method for use in the SECOR system and preliminary results based on a case study are presented in:

P. Bereziński, J. Pawelec, M. Małowidzki, and R. Piotrowski. Entropy-based internet traffic anomaly detection: A case study. In W. Zamojski, J. Mazurkiewicz, J. Sugier, T. Walkowiak, and J. Kacprzyk, editors, Proceedings of the Ninth International Conference on Dependability and Complex Systems DepCoS-RELCOMEX, volume 286 of Advances in Intelligent Systems and Computing, pages 47–58. Springer International Publishing, 2014.

The dataset generation process and performance results of the method based on one of the available datasets are shown in:

P. Bereziński, M. Szpyrka, B. Jasiul, and M. Mazur. Network anomaly detection using parameterized entropy. In K. Saeed and V. Snášel, editors, Computer Information Systems and Industrial Management, volume 8838 of Lecture Notes in Computer Science, pages 465–478. Springer Berlin Heidelberg, 2014, Best Paper Award.

The final implementation and performance results based on 3 botnet-like worm scenarios included in the self-crafted dataset are presented in:

P. Bereziński, B. Jasiul, and M. Szpyrka. An entropy-based network anomaly detection method. Entropy, 17(4):2367–2408, 2015, IF = 1.56, List A of the MNiSW journal ranking, 30 points.

Bibliography

[Agg13] C.C. Aggarwal. Outlier Analysis. Springer New York, 2013.
[Alp10] E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2nd edition, 2010.
[Arg] Argus – Audit Record Generation and Utilization System. http://qosient.com/argus.
[Axe99] S. Axelsson. The base-rate fallacy and its implications for the difficulty of intrusion detection. In Proceedings of the 6th ACM Conference on Computer and Communications Security, CCS ’99, pages 1–7, New York, NY, USA, 1999. ACM.
[Bar13] S. Barnum. Standardizing Cyber Threat Intelligence Information with the Structured Threat Information eXpression (STIX™). http://stix.mitre.org/about/documents/STIX_Whitepaper_v1.0.pdf, 2013.
[Bay63] T. Bayes. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S. Philosophical Transactions of the Royal Society, 53:370–418, 1763.
[BBK13] M. Bhuyan, D.K. Bhattacharyya, and J. Kalita. Network anomaly detection: methods, systems and tools. IEEE Communication Surveys and Tutorials, 16(1):1–34, 2013.
[BBR+12] L. Bilge, D. Balzarotti, W. Robertson, E. Kirda, and C. Kruegel. Disclosure: Detecting botnet command and control servers through large-scale NetFlow analysis. In Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC ’12, pages 129–138, New York, NY, USA, 2012. ACM.
[BDKC10] L. Braun, A. Didebulidze, N. Kammenhuber, and G. Carle. Comparing and improving current packet capturing solutions based on commodity hardware. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC ’10, pages 206–217, New York, NY, USA, 2010. ACM.
[BDWS09] D. Brauckhoff, X. Dimitropoulos, A. Wagner, and K. Salamatian. Anomaly extraction in backbone networks using association rules. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, IMC ’09, pages 28–34, New York, NY, USA, 2009. ACM.
[BI08] R.J. Barnett and B. Irwin. Towards a taxonomy of network scanning techniques. In Proceedings of the 2008 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries: Riding the Wave of Technology, SAICSIT ’08, pages 1–7, New York, NY, USA, 2008. ACM.
[Bit11] T. Bitton. Imperva – Morto post mortem: Dissecting a worm. Technical report, 2011.
[BJS15] P. Bereziński, B. Jasiul, and M. Szpyrka.
An entropy-based network anomaly detection method. Entropy, 17(4):2367–2408, 2015.
[BK13] D.K. Bhattacharyya and J.K. Kalita. Network Anomaly Detection: A Machine Learning Perspective. Chapman & Hall/CRC, 2013.
[BKPR02] P. Barford, J. Kline, D. Plonka, and A. Ron. A signal analysis of network traffic anomalies. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, IMW ’02, pages 71–82, New York, NY, USA, 2002. ACM.
[BlPJ12] P. Bereziński, J. Śliwa, R. Piotrowski, and B. Jasiul. Detection of multistage attack in federation of systems environment. In NATO Science and Technology Organization MP-IST-111 - Information Assurance and Cyber Defence, 2012.
[BPBF12] B. Bencsáth, G. Pék, L. Buttyán, and M. Félegyházi. The cousins of Stuxnet: Duqu, Flame, and Gauss. Future Internet, 4(4):971–1003, 2012.
[BPMP14] P. Bereziński, J. Pawelec, M. Małowidzki, and R. Piotrowski. Entropy-based internet traffic anomaly detection: A case study. In W. Zamojski, J. Mazurkiewicz, J. Sugier, T. Walkowiak, and J. Kacprzyk, editors, Proceedings of the Ninth International Conference on Dependability and Complex Systems DepCoS-RELCOMEX, volume 286 of Advances in Intelligent Systems and Computing, pages 47–58. Springer International Publishing, 2014.
[Bra10] D. Brauckhoff. Network Traffic Anomaly Detection and Evaluation. PhD thesis, ETH Zürich, 2010.
[BSJM14] P. Bereziński, M. Szpyrka, B. Jasiul, and M. Mazur. Network anomaly detection using parameterized entropy. In K. Saeed and V. Snášel, editors, Computer Information Systems and Industrial Management, volume 8838 of Lecture Notes in Computer Science, pages 465–478. Springer Berlin Heidelberg, 2014.
[BSS+14] J.G. Bazan, M. Szpyrka, A. Szczur, Ł. Dydo, and H. Wojtowicz. Classifiers for behavioral patterns identification induced from huge temporal data. In Proceedings of the Concurrency Specification and Programming Workshop (CSP 2014), volume 1269 of CEUR Workshop Proceedings, pages 22–33, Chemnitz, Germany, September 29 – October 1, 2014.
[BTW+06] D. Brauckhoff, B. Tellenbach, A. Wagner, M. May, and A. Lakhina. Impact of packet sampling on anomaly detection metrics. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, IMC ’06, pages 159–164, New York, NY, USA, 2006. ACM.
[BWM08] D. Brauckhoff, A. Wagner, and M. May. Flame: A flow-level anomaly modeling engine. In Proceedings of the Conference on Cyber Security Experimentation and Test, CSET’08, pages 1–6, Berkeley, CA, USA, 2008. USENIX Association.
[Cai] Center for Applied Internet Data Analysis (CAIDA). http://www.caida.org/data/overview.
[Cal09] C. Callegari. Statistical approaches for network anomaly detection. In Proceedings of the 4th International Conference on Internet Monitoring and Protection, 2009.
[CBK09] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, July 2009.
[Cer13] CERT Poland Report. http://www.cert.pl/PDF/Report_CP_2013.pdf, 2013.
[CFVN09] P. Casas, L. Fillatre, S. Vaton, and I. Nikiforov. Volume anomaly detection in data networks: An optimal detection algorithm vs. the PCA approach. In R. Valadas and P. Salvador, editors, Traffic Management and Traffic Engineering for the Future Internet, volume 5464 of Lecture Notes in Computer Science, pages 96–113. Springer Berlin Heidelberg, 2009.
[CH67] R. Clausius and T.A. Hirst. The mechanical theory of heat: With its applications to the steam-engine and to the physical properties of bodies. J. van Voorst, London, 1867.
[CJSS03] C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. Gigascope: A stream database for network applications. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD’03, pages 647–651, New York, NY, USA, 2003. ACM.
[CKP+11] M. Choraś, R. Kozik, R. Piotrowski, J. Brzostek, and W. Hołubowicz. Network events correlation for federated networks protection system. In Towards a Service-Based Internet, volume 6994 of Lecture Notes in Computer Science, pages 100–111. Springer, 2011.
[CKS+09] A. Callado, C. Kamienski, G. Szabo, B. Gero, J. Kelner, S. Fernandes, and D. Sadok. A survey on internet traffic identification. Commun. Surveys Tuts., 11(3):37–52, July 2009.
[Cla04] B. Claise. Cisco Systems NetFlow Services Export Version 9. RFC 3954, IETF, 2004.
[CLLL12] T.H. Cheng, Y.D. Lin, Y.C. Lai, and P.C. Lin. Evasion techniques: Sneaking through your intrusion detection/prevention systems. Communications Surveys Tutorials, IEEE, 14(4):1011–1020, Fourth Quarter 2012.
[CMRB09] S.E. Coull, F. Monrose, M.K. Reiter, and M. Bailey. The challenges of effectively anonymizing network data. In Proceedings of the 2009 Cybersecurity Applications & Technology Conference for Homeland Security, CATCH ’09, pages 230–236, Washington, DC, USA, 2009. IEEE Computer Society.
[Cor15] MITRE Corp. The MITRE Corporation Research Overview. http://www.mitre.org/research/overview, 2015.
[CRKM11] Z.B. Celik, J. Raghuram, G. Kesidis, and D.J. Miller. Salting public traces with attack traffic to test flow classifiers. In 4th Workshop on Cyber Security Experimentation and Test, CSET ’11, San Francisco, CA, USA, August 8, 2011, 2011.
[Csi08] I. Csiszár. Axiomatic characterizations of information measures. Entropy, 10(3):261–273, 2008.
[CSO+09] I. Cunha, F. Silveira, R. Oliveira, R. Teixeira, and C. Diot. Uncovering artifacts of flow measurement tools. In S. Moon, R. Teixeira, and S. Uhlig, editors, Passive and Active Network Measurement, volume 5448 of Lecture Notes in Computer Science, pages 187–196. Springer Berlin Heidelberg, 2009.
[CT06] T.M. Cover and J.A. Thomas. Elements of Information Theory. A Wiley-Interscience publication. Wiley, 2006.
[DDL+12] E. Damon, J. Dale, E. Laron, J. Mache, N. Land, and R. Weiss. Hands-on Denial of Service lab exercises using Slowloris and Rudy. In Proceedings of the 2012 Information Security Curriculum Development Conference, InfoSecCD ’12, pages 21–29, New York, NY, USA, 2012. ACM.
[Den87] D.E. Denning. An intrusion-detection model. IEEE Transactions on Software Engineering, 13(2):222–232, 1987.
[Den12] D.E. Denning. Stuxnet: What has changed? Future Internet, 4(3):672–687, 2012.
[DG06] J. Davis and M. Goadrich. The relationship between Precision-Recall and ROC curves. In Proc. of the 23rd Int. Conference on Machine Learning, ICML’06, pages 233–240. ACM, 2006.
[EDD+13] A.F. Emmott, S. Das, T. Dietterich, A. Fern, and W. Wong. Systematic construction of anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, ODD ’13, pages 16–21, New York, NY, USA, 2013. ACM.
[Eim08] R. Eimann. Network Event Detection with Entropy Measures. PhD thesis, University of Auckland, 2008.
[ESB05] R. Eimann, U. Speidel, and J.N. Brownlee. A T-entropy analysis of the slammer worm outbreak. In Proceedings of Asia-Pacific Network Operations and Management Symposium (APNOMS), pages 434–445, 2005.
[ETGTDV04] J.M. Estevez-Tapiador, P. Garcia-Teodoro, and J.E. Diaz-Verdejo. Anomaly detection methods in wired networks: A survey and taxonomy. Comput. Commun., 27(16):1569–1584, October 2004.
[FAAM07] M. Foukarakis, D. Antoniades, S. Antonatos, and E.P. Markatos. Flexible and high-performance anonymization of NetFlow records using anontool. In Third International Conference on Security and Privacy in Communication Networks and the Workshops, SecureComm 2007, Nice, France, 17-21 September, 2007, pages 33–38, 2007.
[Faw06] T. Fawcett. An introduction to ROC analysis. Pattern Recogn. Lett., 27(8):861–874, 2006.
[Floa] AKMA Labs FlowMatrix. http://www.akmalabs.com.
[Flob] Flow-tools – Tool set for working with NetFlow data. http://code.google.com/p/flowtools.
[Floc] Invea-Tech FlowMon. https://www.invea.com.
[Fpr] Fprobe – NetFlow probe. http://fprobe.sourceforge.net.
[FWB+11] J. Francois, S. Wang, W. Bronzi, R. State, and T. Engel. BotCloud: Detecting botnets using MapReduce. In Proceedings of the 2011 IEEE International Workshop on Information Forensics and Security, WIFS ’11, pages 1–6, Washington, DC, USA, 2011. IEEE Computer Society.
[GGSZ14] S. García, M. Grill, J. Stiborek, and A. Zunino. An empirical comparison of botnet detection methods. Comput. Secur., 45:100–123, September 2014.
[GHK14] M. Golling, R. Hofstede, and R. Koch. Towards multi-layered intrusion detection in high-speed networks. In Cyber Conflict (CyCon 2014), 2014 6th International Conference On, pages 191–206, June 2014.
[GM10] P. Giura and N. Memon. Netstore: An efficient storage infrastructure for network forensics and monitoring. In S. Jha, R. Sommer, and Ch. Kreibich, editors, Recent Advances in Intrusion Detection, volume 6307 of Lecture Notes in Computer Science, pages 277–296. Springer Berlin Heidelberg, 2010.
[GMT05] Y. Gu, A. McCallum, and D. Towsley. Detecting anomalies in network traffic using maximum entropy estimation. In Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, IMC ’05, pages 32–32, Berkeley, CA, USA, 2005. USENIX Association.
[GOB11] H. Gascon, A. Orfila, and J. Blasco. Analysis of update delays in signature-based network intrusion detection systems. Computers & Security, 30(8):613–624, 2011.
[GTDVMFV09] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1–2):18–28, 2009.
[GV03] P.D. Grünwald and P.M.B. Vitányi. Kolmogorov complexity and information theory with an interpretation in terms of questions and answers. J. of Logic, Lang. and Inf., 12(4):497–529, 2003.
[HA04] V.J. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.
[HCT+14] R. Hofstede, P. Celeda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, and A. Pras. Flow monitoring explained: From packet capture to data analysis with NetFlow and IPFIX. Communications Surveys Tutorials, IEEE, 16(4):2037–2064, 2014.
[HDS+13] R. Hofstede, I. Drago, A. Sperotto, R. Sadre, and A. Pras. Measurement artifacts in NetFlow data. In M. Roughan and R. Chang, editors, Passive and Active Measurement, volume 7799 of Lecture Notes in Computer Science, pages 1–10. Springer Berlin Heidelberg, 2013.
[HFH+09] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: An update. SIGKDD Explor. Newsl., 11(1):10–18, 2009.
[HK11] J. Hauke and T. Kossowski. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae, 30(2):87–93, 2011.
[HLF+01] J.W. Haines, R.P. Lippmann, D.J. Fried, M.A. Zissman, E. Tran, and S.B. Boswell. 1999 DARPA intrusion detection evaluation: Design and procedures. Technical report, https://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/files/TR-1062.pdf, 2001. MIT Lincoln Laboratory, Technical Report 1062.
[HNG+07] L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A.D. Joseph, and N. Taft. In-network PCA and anomaly detection. Technical Report UCB/EECS-2007-10, EECS Department, University of California, Berkeley, Jan 2007.
[HP14] HP - The bot threat. http://www.bitpipe.com/detail/RES/1384218191_706.html, 2014.
[HPW02] D. Harrington, R. Presuhn, and B. Wijnen. An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks. RFC 3411, IETF, 2002.
[HTF09] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning: data mining, inference and prediction. Springer, 2nd edition, 2009.
[IT10] C.M. Inacio and B. Trammell. Yaf: Yet another flowmeter. In Proceedings of the 24th International Conference on Large Installation System Administration, LISA’10, pages 1–16, Berkeley, CA, USA, 2010. USENIX Association.
[ITA] ACM Sigcomm Internet Traffic Archive. http://www.sigcomm.org/ITA.
[IZ14] F. Iglesias and T. Zseby. Entropy-based characterization of internet background radiation. Entropy, 17(1):74–101, 2014.
[JP05] S.S. Joshi and V.V. Phoha. Investigating Hidden Markov Models capabilities in anomaly detection. In Proceedings of the 43rd Annual Southeast Regional Conference - Volume 1, ACM-SE 43, pages 98–103, New York, NY, USA, 2005. ACM.
[JPB+12] B. Jasiul, R. Piotrowski, P. Bereziński, M. Choraś, R. Kozik, and J. Brzostek. Federated cyber defence system - applied methods and techniques. In Communications and Information Systems Conference (MCC), pages 1–6, Oct 2012.
[JSl14a] B. Jasiul, M. Szpyrka, and J. Śliwa. Detection and modeling of cyber attacks with Petri Nets. Entropy, 16(12):6602–6623, 2014.
[JSl14b] B. Jasiul, M. Szpyrka, and J. Śliwa. Malware behavior modeling with Colored Petri Nets. In Computer Information Systems and Industrial Management - 13th IFIP TC8 International Conference, CISIM 2014, Ho Chi Minh City, Vietnam, November 5-7, 2014. Proceedings, pages 667–679, 2014.
[KAA+06] D. Koukis, S. Antonatos, D. Antoniades, E.P. Markatos, and P. Trimintzios. A generic anonymization framework for network traffic. In Communications, 2006. ICC ’06. IEEE International Conference on, volume 5, pages 2302–2309, June 2006.
[Kar03] J. Karmeshu. Entropy Measures, Maximum Entropy Principle and Emerging Applications. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2003.
[KBHJ08] Y. Kopylova, D.A. Buell, C Huang, and J. Janies. Mutual information applied to anomaly detection. pages 89–97, 2008. [KDD] The Third International Knowledge Discovery and Data Mining Tools (KDD) Cup 1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. P. Bereziński Entropy-based Network Anomaly Detection 92 BIBLIOGRAPHY [KHRH14] M. Kührer, T. Hupperich, C. Rossow, and T. Holz. Exit from hell? Reducing the impact of amplification DDoS attacks. In Proceedings of the 23rd USENIX Security Symposium, August 2014. [Kog11] J. Kogel. One-way delay measurement based on flow data: Quantification and compensation of errors by exporter profiling. In International Conference on Information Networking (ICOIN), pages 25–30, Jan 2011. [KS12] P. Kumar and S.A. Senthil. Establishing a valuable method of packet capture and packet analyzer tools in firewal. International Journal of Research Studies in Computing, 1(1):11–20, 2012. [KSD09] A. Kind, M.P. Stoecklin, and X. Dimitropoulos. Histogram-based traffic anomaly detection. IEEE Trans. on Netw. and Serv. Manag., 6(2):110–121, June 2009. [Kul59] S. Kullback. Information Theory and Statistics. John Wiley & Sons, New York, 1959. [LAS12] C.F. L. Lima, F.M. Assis, and C.P. Souza. A comparative study of use of Shannon, Renyi and Tsallis entropy for attribute selecting in network intrusion detection. In Proceedings of the 13th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL’12, pages 492–501, Berlin, Heidelberg, 2012. SpringerVerlag. [LBN] Lawrence Berkeley National Laboratory/International Computer Science Institute Enterprise Tracing. http://www.icir.org/enterprise-tracing/Overview.html. [LCD05] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’05, pages 217–228, 2005. [LDZ05] Z.i Li, A. Das, and J. 
Zhou. Usaid: Unifying signature-based and anomaly-based intrusion detection. In T. Ho, D. Cheung, and H. Liu, editors, Advances in Knowledge Discovery and Data Mining, volume 3518 of Lecture Notes in Computer Science, pages 702–712. Springer Berlin Heidelberg, 2005.
[LG09] W. Lu and A.A. Ghorbani. Network anomaly detection based on wavelet analysis. EURASIP J. Adv. Sig. Proc., 2009, 2009.
[LHF05] N. Landwehr, M. Hall, and E. Frank. Logistic model trees. 95(1-2):161–205, 2005.
[LPKL09] D.C. Lee, B. Park, K.E. Kim, and J.J. Lee. Fast traffic anomalies detection using SNMP MIB correlation analysis. In Proceedings of the 11th International Conference on Advanced Communication Technology - Volume 1, ICACT’09, pages 166–170, Piscataway, NJ, USA, 2009. IEEE Press.
[LTG08] W. Lu, M. Tavallaee, and A.A. Ghorbani. Detecting network anomalies using different wavelet basis functions. In Sixth Annual Conference on Communication Networks and Services Research (CNSR 2008), 5-8 May 2008, Halifax, Nova Scotia, Canada, pages 149–156, 2008.
[LWK10] K. Limthong, P. Watanapongse, and F. Kensuke. A wavelet-based anomaly detection for outbound network traffic. In Information and Telecommunication Technologies (APSITT), 2010 8th Asia-Pacific Symposium on, pages 1–6, 2010.
[LWLS06] C. Livadas, R. Walsh, D. Lapsley, and W.T. Strayer. Using machine learning techniques to identify botnet traffic. In 2nd IEEE LCN Workshop on Network Security (WoNS 2006), pages 967–974, 2006.
[LX01] W. Lee and D. Xiang. Information-theoretic measures for anomaly detection. In Proceedings of the 2001 IEEE Symposium on Security and Privacy, SP ’01, pages 130–143, Washington, DC, USA, 2001. IEEE Computer Society.
[LYW13] Y.J. Lee, Y.R. Yeh, and Y.C.F. Wang. Anomaly detection via online oversampling principal component analysis. IEEE Trans. on Knowl. and Data Eng., 25(7):1460–1470, 2013.
[Mar05] M. Marco.
A step beyond Tsallis and Renyi entropies. Physics Letters A, 338(3–5):217–224, 2005.
[MBM15] M. Małowidzki, P. Bereziński, and M. Mazur. Network intrusion detection: Half a kingdom for a good dataset. In Proceedings of NATO STO SAS-139 Workshop, Portugal, 2015.
[MC03] M.V. Mahoney and P.K. Chan. An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection. In G. Vigna, C. Kruegel, and E. Jonsson, editors, Recent Advances in Intrusion Detection, volume 2820 of Lecture Notes in Computer Science, pages 220–237. Springer Berlin Heidelberg, 2003.
[McH00] J. McHugh. Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratory. ACM Trans. Inf. Syst. Secur., 3(4):262–294, November 2000.
[MD08] T. Maszczyk and W. Duch. Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In L. Rutkowski, R. Tadeusiewicz, L.A. Zadeh, and J.M. Zurada, editors, Artificial Intelligence and Soft Computing – ICAISC 2008, volume 5097 of Lecture Notes in Computer Science, pages 643–651. Springer Berlin Heidelberg, 2008.
[Mek14] MEKA: A Multi-label Extension to WEKA. http://meka.sourceforge.net/, 2014.
[MFF14] J. Mazel, R. Fontugne, and K. Fukuda. A taxonomy of anomalies in backbone network traffic. In Wireless Communications and Mobile Computing Conference (IWCMC), 2014 International, pages 30–36, Aug 2014.
[MKGD12] G. Madjarov, D. Kocev, D. Gjorgjevikj, and S. Džeroski. An extensive experimental comparison of methods for multi-label learning. Pattern Recogn., 45(9):3084–3104, 2012.
[MoM] Cluster of European Projects aimed at Monitoring and Measurement (MoMe). http://www.ist-mome.org/database/MeasurementData.
[MSHJSC+ 04] K. Myung-Sup, K. Hun-Jeong, H. Seong-Cheol, C. Seung-Hwa, and J.W. Hong. A flow-based method for abnormal network traffic detection.
In Network Operations and Management Symposium, 2004. NOMS 2004. IEEE/IFIP, volume 1, pages 599–612, April 2004.
[MSOS07] R.A. Martin, M. Schwabacher, N. Oza, and A. Srivastava. Comparison of unsupervised anomaly detection methods for systems health management using space shuttle main engine data. In Proceedings of the Joint Army Navy NASA Air Force Conference on Propulsion, 2007.
[Nai09] S. Nair. Finding Fault: Anomaly Detection for Embedded Networked Sensing. PhD thesis, University of California, 2009.
[NfS] NfSen – NetFlow Sensor. http://nfsen.sourceforge.net.
[NSA+ 08] G. Nychis, V. Sekar, D. Andersen, H. Kim, and H. Zhang. An empirical evaluation of entropy-based traffic anomaly detection. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement, IMC ’08, pages 151–156, New York, NY, USA, 2008. ACM.
[Nto] NtopNg – High-Speed Web-based Traffic Analysis and Flow Collection. http://www.ntop.org.
[OSG] OSGi – Open Service Gateway initiative. http://www.osgi.org.
[Owe10] P. Owezarski. A database of anomalous traffic for assessing profile based IDS. In F. Ricciato, M. Mellia, and E.W. Biersack, editors, TMA, volume 6003 of Lecture Notes in Computer Science, pages 59–72. Springer, 2010.
[PAPL06] R. Pang, M. Allman, V. Paxson, and J. Lee. The devil and packet trace anonymization. Computer Communication Review, 36(1):29–38, 2006.
[Par13] C. Parsons. Deep packet inspection and its predecessors. Technical report, 2013.
[Pax99] V. Paxson. Bro: A system for detecting network intruders in real-time. Comput. Netw., 31(23-24):2435–2463, 1999.
[PBPC12] J. Pawelec, P. Bereziński, R. Piotrowski, and W. Chamela. Entropy measures for internet traffic anomaly detection. In TransComp conference on Computer Systems, Industry and Transport, pages 309–318, 2012.
[Plo00] D. Plonka. Flowscan: A network traffic flow reporting and visualization tool. In USENIX LISA, pages 305–317, 2000.
[PLSG10] C. Phua, V.C.S. Lee, K. Smith-Miles, and R.W. Gayler. A comprehensive survey of data mining-based fraud detection research. CoRR, abs/1009.6119, 2010.
[PP07] A. Patcha and J.M. Park. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw., 51(12):3448–3470, August 2007.
[Prt] Paessler PRTG – Network Monitor. http://www.paessler.com.
[PSS+ 09] A. Pras, R. Sadre, A. Sperotto, T. Fioreze, D. Hausheer, and J. Schönwälder. Using NetFlow/IPFIX for network management. Journal of Network and Systems Management, 17(4):482–487, November 2009.
[REHA13] A.M. Riad, I. Elhenawy, A. Hassan, and N. Awadallah. Visualize network anomaly detection by using k-means clustering algorithm. International Journal of Computer Networks & Communications (IJCNC), 5(5), 2013.
[Ren70] A. Renyi. Probability theory. By A. Renyi. [Enlarged version of Wahrscheinlichkeitsrechnung, Valoszinusegszamitas and Calcul des probabilites. English translation by L. Vekerdi]. North-Holland Pub. Co Amsterdam, 1970.
[Ren11] R. Renk. Modyfikacja metody opartej o słownik funkcji bazowych do wykrywania anomalii w ruchu sieciowym w sieciach IP. PhD thesis, Uniwersytet Technologiczno-Przyrodniczy im. Jana i Jędrzeja Śniadeckich w Bydgoszczy, Wydział Telekomunikacji i Elektrotechniki, Bydgoszcz, 2011.
[RFG05] C. Reimann, P. Filzmoser, and R.G. Garrett. Background and threshold: critical comparison of methods of determination. Science of The Total Environment, 346(1–3):1–16, 2005.
[Rif08] R. Rifkin. MIT - Multiclass Classification. http://www.mit.edu/~9.520/spring09/Classes/multiclass.pdf, 2008.
[Roe99] M. Roesch. Snort - lightweight intrusion detection for networks. In Proceedings of the 13th USENIX Conference on System Administration, LISA ’99, pages 229–238, Berkeley, CA, USA, 1999. USENIX Association.
[RSN+ 07] S. Ranjan, S. Shah, A. Nucci, M. Munafo, R. Cruz, and S. Muthukrishnan.
Dowitcher: Effective worm detection and containment in the internet core. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 2541–2545, May 2007.
[SBCQ09] G. Sadasivan, N. Brownlee, B. Claise, and J. Quittek. Architecture for IP Flow Information Export. RFC 5470, IETF, 2009.
[Scr] Plixer Scrutinizer – Incident Response System. http://www.plixer.com.
[SCSC03] M.L. Shyu, S.C. Chen, K. Sarinnapakorn, and L. Chang. A novel anomaly detection scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM’03), pages 172–179, 2003.
[SEB07] U. Speidel, R. Eimann, and N. Brownlee. Detecting network events via T-entropy. In Information, Communications Signal Processing, 2007 6th International Conference on, pages 1–5, Dec 2007.
[SF02] R. Sommer and A. Feldmann. NetFlow: Information loss or win? In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW’02, pages 173–174, New York, NY, USA, 2002. ACM.
[SFH05] M. Sumner, E. Frank, and M. Hall. Speeding up logistic model tree induction. In 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 675–683. Springer, 2005.
[Sha48] C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[Sim] SimpleWeb. http://www.simpleweb.org/wiki/Traces.
[SK14] M. Scanlon and M. Kechadi. The case for a collaborative universal peer-to-peer botnet investigation framework. In Proceedings of the 9th International Conference on Cyber Warfare and Security (ICCWS 2014), pages 287–293, Purdue University, West Lafayette, Indiana, USA, March 2014. Academic Conferences Limited.
[SKF08] M.Z. Shafiq, S.A. Khayam, and M. Farooq.
Improving accuracy of immune-inspired malware detectors by using intelligent features. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO ’08, pages 119–126, New York, NY, USA, 2008. ACM.
[SL12] G.A.F. Seber and A.J. Lee. Linear Regression Analysis. Wiley Series in Probability and Statistics. Wiley, 2012.
[SLBK08] M.P. Stoecklin, J.Y. Le Boudec, and A. Kind. A two-layered anomaly detection technique based on multi-modal flow behavior models. In Proceedings of the 9th International Conference on Passive and Active Network Measurement, PAM’08, pages 212–221, Berlin, Heidelberg, 2008. Springer-Verlag.
[Sof] Softflowd – Flow-based Network Traffic Analyser. http://code.google.com/p/softflowd/.
[Sol] Solarwinds – Network Traffic Analyzer. http://www.solarwinds.com.
[Sop14] Sophos – Security Threat Report Smarter, Shadier, Stealthier Malware. http://www.sophos.com/en-us/threat-center/medialibrary/PDFs/other/sophos-securitythreat-report-2014.pdf, 2014.
[SPBW12] I. Syarif, A. Prugel-Bennett, and G. Wills. Unsupervised clustering approach for network anomaly detection. In R. Benlamri, editor, Networked Digital Technologies, volume 293 of Communications in Computer and Information Science, pages 135–145. Springer Berlin Heidelberg, 2012.
[SS08] G. Schaffrath and B. Stiller. Conceptual integration of flow-based and packet-based network intrusion detection. In D. Hausheer and J. Schönwälder, editors, Resilient Networks and Services, volume 5127 of Lecture Notes in Computer Science, pages 190–194. Springer Berlin Heidelberg, 2008.
[SSHB14] R. Sadre, A. Sperotto, R. Hofstede, and N. Brownlee. Flow-based approaches in network management: Recent advances and future trends. International Journal of Network Management, 24(4):219–220, 2014.
[SSP12] R. Sadre, A. Sperotto, and A. Pras. The effects of DDoS attacks on flow monitoring applications. In NOMS, pages 269–277, 2012.
[SSS+ 10] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller. An overview of IP flow-based intrusion detection. Commun. Surveys Tuts., 12(3):343–356, July 2010.
[SSSP12] R.O. Schmidt, A. Sperotto, R. Sadre, and A. Pras. Towards bandwidth estimation using flow-level measurements. In R. Sadre, J. Novotný, P. Čeleda, M. Waldburger, and B. Stiller, editors, Dependable Networks and Services, volume 7279 of Lecture Notes in Computer Science, pages 127–138. Springer Berlin Heidelberg, 2012.
[SST+ 04] A. Soule, K. Salamatia, N. Taft, R. Emilion, and K. Papagiannaki. Flow classification by histograms: Or how to go on safari in the internet. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’04/Performance ’04, pages 49–60, New York, NY, USA, 2004. ACM.
[SSTG12] A. Shiravi, H. Shiravi, M. Tavallaee, and A.A. Ghorbani. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur., 31(3):357–374, May 2012.
[SSVP09] A. Sperotto, R. Sadre, F. Vliet, and A. Pras. A labeled data set for flow-based intrusion detection. In Proceedings of the 9th IEEE International Workshop on IP Operations and Management, IPOM ’09, pages 39–50, Berlin, Heidelberg, 2009. Springer-Verlag.
[STG+ 11] S. Saad, I. Traoré, A.A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting p2p botnets through network behavior analysis and machine learning. In PST, pages 174–180. IEEE, 2011.
[Stu11] ESET - Stuxnet Under the Microscope. http://www.eset.com/us/resources/white-papers/Stuxnet_Under_the_Microscope.pdf, 2011.
[Sym14] 2014 Internet Security Threat Report, Volume 19. http://www.symantec.com/security_response/publications/threatreport.jsp, 2014.
[SZH+ 13] W. Sha, Y. Zhu, T. Huang, M. Qiu, Y. Zhu, and Q. Zhang. A multi-order Markov chain based scheme for anomaly detection.
In IEEE 37th Annual Computer Software and Applications Conference, COMPSAC Workshops 2013, pages 83–88, 2013.
[Tap12] Gigamon – SPAN Port Or TAP? White Paper. https://www.netdescribe.com/downloads/span_port_or_tap_web.pdf, 2012.
[TB08] B. Trammell and E. Boschi. Bidirectional Flow Export Using IP Flow Information Export (IPFIX). RFC 5103, IETF, 2008.
[TBLG09] M. Tavallaee, E. Bagheri, W. Lu, and A.A. Ghorbani. A detailed analysis of the KDD Cup 99 data set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA’09, pages 53–58, Piscataway, NJ, USA, 2009. IEEE Press.
[TBS+ 11] B. Tellenbach, M. Burkhart, D. Schatzmann, D. Gugelmann, and D. Sornette. Accurate network anomaly classification with generalized entropy metrics. Comput. Netw., 55(15):3485–3502, October 2011.
[TBSM09] B. Tellenbach, M. Burkhart, D. Sornette, and T. Maillart. Beyond Shannon: Characterizing internet traffic with generalized entropy metrics. In Proceedings of the 10th International Conference on Passive and Active Network Measurement, PAM ’09, pages 239–248, Berlin, Heidelberg, 2009. Springer-Verlag.
[Tel12] B. Tellenbach. Detection, Classification and Visualization of Anomalies using Generalized Entropy Metrics. PhD thesis, ETH Zürich, 2012.
[TMSA11] A. Teixeira, A. Matos, A. Souto, and L. Antunes. Entropy measures vs. Kolmogorov complexity. Entropy, 13(3):595–611, 2011.
[TNS+ 05] M. Titchener, R. Nicolescu, L. Staiger, T. Gulliver, and U. Speidel. Deterministic complexity and entropy. Fundam. Inform., 64(1-4):443–461, 2005.
[Tsa88] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1-2):479–487, 1988.
[TSB08] C. Thomas, V. Sharma, and N. Balakrishnan. Usefulness of DARPA dataset for intrusion detection system evaluation.
In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 6973 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, March 2008.
[TTSB11] B. Trammell, B. Tellenbach, D. Schatzmann, and M. Burkhart. Peeling away timing error in NetFlow data. In Neil Spring and George F. Riley, editors, Passive and Active Measurement, volume 6579 of Lecture Notes in Computer Science, pages 194–203. Springer Berlin Heidelberg, 2011.
[TWC13] B. Trammell, A. Wagner, and B. Claise. Flow Aggregation for the IP Flow Information Export (IPFIX) Protocol. RFC 7015, IETF, 2013.
[UMa] UMass Trace Repository (UMass). http://traces.cs.umass.edu.
[Ver14] Verizon Data Breach Investigations Report. http://www.verizonenterprise.com/dbir/2014, 2014.
[Wę12] E. Wędrowska. Miary entropii i dywergencji w analizie struktur. A Wiley-Interscience publication. Wydawnictwo Uniwersytetu Warmińsko-Mazurskiego, 2012.
[WFH11] I.H. Witten, E. Frank, and M.A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
[WIT] Waikato Internet Traffic Storage (WITS). http://wand.net.nz/wits.
[WP05] A. Wagner and B. Plattner. Entropy based worm and anomaly detection in fast IP networks. In Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, WETICE ’05, pages 172–177, Washington, DC, USA, 2005. IEEE Computer Society.
[WSO] WSO2 – SOA middleware platform. http://wso2.com.
[YKW11] X. Yang, L. Ke, and Z. Wanlei. Low-rate DDoS attacks detection and traceback by using new information metrics. Trans. Info. For. Sec., 6(2):426–437, June 2011.
[YZB04] N. Ye, Y. Zhang, and C.M. Borror. Robustness of the Markov-chain model for cyberattack detection. pages 116–123, 2004.
[YZX+ 04] B. Yue, Y. Zhao, Z. Xu, H. Fu, and F. Ma. An anomaly intrusion detection method using Fourier transform.
Journal of Electronics (China), 21(2):135–139, 2004.
[ZGMR07] A. Ziviani, A.T.A. Gomes, M.L. Monsores, and P.S.S. Rodrigues. Network anomaly detection using nonextensive entropy. Communications Letters, IEEE, 11(12):1034–1036, December 2007.