Modeling the propagation of computer viruses
Dr. Stefano Zanero, PhD
Assistant Professor (Ricercatore)
Dipartimento di Elettronica e Informazione
Politecnico di Milano

Lecture Outline
- Introduction: computer viruses
- History of viruses
- Existing models
- Graph-based models, file exchange viruses
- Propagation of e-mail viruses
- Continuous and discrete models for worms
- Advanced models and issues
- Open research questions and conclusions

What is a computer virus?
- First named by Cohen in [1] and [2]
- Research areas in computer virology:
  - theoretical study of the properties of self-replicating code
  - creation of new viruses and viral vectors
  - development of new techniques for detection and containment
  - study of in-the-wild samples and automation thereof
  - modeling of code replication and propagation behavior

A few definitions
- Malware: also known as "malicious code", code that is intentionally written to violate a security policy
- Virus: a piece of code that self-propagates (i.e., copies itself) by infecting other files
- Worm: a self-propagating program that copies itself, often by exploiting host vulnerabilities or by social engineering (e.g., mail worms)
- Trojan horse: a program with malicious (e.g., backdoor) capabilities, sometimes masquerading as benign software
- Rootkit: a combination of trojans and techniques to hide them

A brief history of viruses
- 1966: von Neumann publishes "Theory of self-reproducing automata"
- 1971: Creeper, a worm infecting PDP-10 machines via ARPANET
- 1974: Wabbit, a fork-bomb program; also ANIMAL, a self-copying trojan program
- 1981: first reported virus outbreak; first reported boot sector virus: Elk Cloner (Apple II)
- 1983: virus definition by Cohen; demonstration of file infection
- 1986: first PC (MS-DOS) virus, Brain (boot sector); VirDem demonstrates infection of .com files
- 1987: first self-encrypting virus (Cascade); first purposefully destructive virus (Jerusalem, the "Friday 13th virus"); SCA, a boot sector virus for the Amiga

A brief history of viruses: the worm age
- 1988: first self-propagating worm: the Morris worm [3]
- 1989: first multipartite virus: Ghostball
- 1990: first polymorphic virus: Chameleon
- 1992: Michelangelo: mass-media hysteria over a time bomb
- 1995: "Concept", the first macro virus
- 1998: Back Orifice; CIH, a destructive virus
- 1999: Melissa virus (Word + Outlook mass-mailer); Knark rootkit; the concept of "zombie"; Happy99 worm, spread as an e-mail attachment
- 2000: Loveletter worm
- 2001: Sadmind worm (Sun + MS IIS); Code Red worm; Nimda and Sircam worms (multihole)
- 2003: SQL Slammer worm (UDP!); Blaster worm

A brief history of viruses: 2004, annus horribilis
- 2004: MyDoom, record mass-mailing worm; Witty worm (multihole, exploits a security product, fastest disclosure-to-worm, first destructive, limited impact); Sasser worm, last huge UDP worm; Santy, the first web worm
- 2006: first Mac OS X trojan
- 2007: Storm, a mass-mailer, creates the Storm botnet; on June 30, 1.7M computers were part of it
- 2008: Mocmex trojan found in a digital photo frame; Torpig (trojan, anti-antivirus) creates a botnet which is later taken over and destroyed by researchers; Conficker worm infects millions of machines, creating a botnet; Koobface, a Facebook and MySpace worm
- 2009: success of scareware and fake antivirus programs
- 2010: Stuxnet, a SCADA trojan; likely targeted

Viral code propagation
- Viruses need a way to infect a host program, a way to execute themselves when the program is run, and a way to propagate
  - Common method of propagation in the '80s: floppy disk exchange
  - Cavity infection, appending, prepending; boot code infection
  - Macro viruses work in a very similar way
- Worms need only a way to propagate and a way to execute themselves automatically
  - Propagation: social engineering (in particular mass-mailing) or exploitation
  - Execution through persistent infection of the system

Antivirus: mission impossible
- Virus scanners basically detect signatures of files (or of memory-resident viruses)
- New viruses, or even modified ones, escape detection
- Polymorphism/metamorphism has long been a challenge
- More generally, it is not possible to build a perfect virus/malware detector (Cohen)
- Diagonal argument:
  - let P be a perfect detection program
  - let V be a piece of viral code that can call P on itself
  - if P(V) = true, then V halts (it does not spread)
  - if P(V) = false, then V spreads
  - either way P misclassifies V, so a perfect P cannot exist

Motivation for creating viral propagation models
- Creating reliable models is beneficial for many reasons:
  - it helps us better understand the threat posed by new attack vectors and new propagation techniques
  - models of worm propagation have allowed researchers to predict the behavior of later malware with stunning precision [4]
  - models allow us to develop and test improved containment and disinfection strategies [5]
  - combined with load modeling, such models help to predict failures of the global network infrastructure
  - such models can be used to develop early detection mechanisms by describing characteristic symptoms of worm activity (e.g., a symptomatic propagation curve)
- For a review (which misses some of the newest developments) see [6]; for an older review of models see [7].

Viral propagation models in biology
- Typical simplifications and assumptions:
  - epidemiological models abstract from individuals and treat them as units of a population
  - each unit can belong to only a limited number of states (see Table 1)
  - usually the chain of states gives the model its name, e.g., SIR model, SIS model...
  - models usually avoid dealing with the transmission mechanism, translating it into model parameters that are computed by fitting the model to observed propagation data
- An excellent analysis of the mathematics of infectious diseases in the biological world is available in [8].

Table 1: Typical states for an epidemiological model
  M   Passive immunity
  S   Susceptible
  E   Exposed to infection
  I   Infective
  R   Recovered

Modeling file infectors
- First model, developed in [9]
- It tries to overcome two limitations of typical biological models:
  - being homogeneous, i.e., an infected individual is equally likely to infect any other individual
  - being symmetric, i.e., there is no privileged direction of transmission
- Typically, file exchange happened within cliques and with a specific direction of flow
- Both shortcomings are addressed by transferring an SIS model onto a directed random graph
- The topology of the graph has important effects on propagation:
  - a sparse graph (each node has a small, constant average degree) allows for conditions where the infection dies out
  - a local graph (where the probability of an edge between nodes B and C is significantly higher if both are connected to the same node A) shows a higher propagation rate
- An SIR model would have been interesting to study for file infectors, but the appearance of the Internet changed the system...

Modeling e-mail based worms
- Best model by Zou et al. in [10]
- Internet e-mail is modeled as an undirected graph of relationships between people
- Node degree is assumed to be power-law distributed, based on the analysis of "Yahoo!" discussion group sizes; small-world topology
  - This is not a very solid assumption. Also, the small-world topology they use ignores the existence of interest groups among people
- Each user "opens" an incoming virus attachment with a fixed probability $P_i$, a function of the user but constant in time
- The e-mail checking time $T_i$ is modeled as either an exponentially or an Erlang distributed random variable
- $T_i$ and $P_i$ are assumed to be independent and Gaussian-distributed across users, with means $T = E[T_i]$ and $P = E[P_i]$
- Interesting observations:
  - since a user's e-mail checking time is much larger than the average e-mail transmission time, the latter can be disregarded
  - the overall spread rate of viruses increases as the variability of users' e-mail checking times increases, and depends mostly on $T = E[T_i]$
- Botched assumptions: reinfection (vs. startup time) and independent opening probability
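
To make the ingredients of the e-mail model concrete, here is a minimal event-driven sketch in Python. It is not the simulator of [10]: the contact graph is a uniform random graph rather than a power-law topology, each newly infected user mails the worm once to all contacts (no reinfection), transmission delay is ignored, and the function names and all parameter values are hypothetical. Each user checks e-mail at exponentially distributed intervals with mean $T_i$ and, at each check, opens each unread viral attachment with probability $P_i$.

```python
import heapq
import random


def random_contact_graph(n, avg_degree, rng):
    """Undirected G(n, p) contact graph; [10] assumes a power-law graph instead."""
    p = avg_degree / (n - 1)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def simulate_email_worm(adjacency, mean_check, open_prob, seed_user, horizon, rng):
    """Return the infection times (sorted) of one event-driven outbreak.

    adjacency[i]  -- contacts in user i's address book
    mean_check[i] -- mean T_i of user i's exponential e-mail checking interval
    open_prob[i]  -- probability P_i that user i opens a viral attachment
    """
    n = len(adjacency)
    infected = [False] * n
    pending = [0] * n  # unread viral e-mails waiting in each inbox
    times = []

    def infect(user, t):
        infected[user] = True
        times.append(t)
        # Assumption: transmission delay is neglected, copies reach every
        # inbox at once, and each user sends the worm only once.
        for contact in adjacency[user]:
            if not infected[contact]:
                pending[contact] += 1

    infect(seed_user, 0.0)
    # Priority queue of (next checking time, user).
    events = [(rng.expovariate(1.0 / mean_check[u]), u) for u in range(n)]
    heapq.heapify(events)
    while events[0][0] <= horizon:
        t, user = heapq.heappop(events)
        if not infected[user]:
            # Each unread viral e-mail is opened independently with prob. P_i.
            if any(rng.random() < open_prob[user] for _ in range(pending[user])):
                infect(user, t)
        pending[user] = 0  # the whole inbox is read at every check
        heapq.heappush(events, (t + rng.expovariate(1.0 / mean_check[user]), user))
    return times


# Hypothetical parameters: 300 users, 8 contacts on average, times in minutes.
rng = random.Random(1)
n = 300
graph = random_contact_graph(n, 8, rng)
mean_check = [max(1.0, rng.gauss(40.0, 20.0)) for _ in range(n)]        # T_i
open_prob = [min(1.0, max(0.0, rng.gauss(0.5, 0.3))) for _ in range(n)]  # P_i
times = simulate_email_worm(graph, mean_check, open_prob, 0, 24 * 60.0, rng)
print(len(times), "of", n, "users infected within 24 hours")
```

Changing the standard deviation of `mean_check` while keeping its mean fixed is one way to explore, under these simplified assumptions, how the variability of checking times affects the spread.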

Modeling scanning worms
- Random Constant Spread (RCS) model [4]
- Developed using empirical data from the outbreak of the Code Red v2 worm
- Version 1 was released on 13 July 2001 and immediately analyzed [11, 12]
- It propagates using the .ida vulnerability discovered by eEye on 18 June 2001 [13], thus infecting vulnerable web servers running Microsoft IIS 4.0 and 5.0
- On the infected host it launches 99 threads, which randomly generate IP addresses (excluding the subnets 127.0.0.0/8, loopback, and 224.0.0.0/8, multicast) and try to compromise the hosts at those addresses
- Version 1 had a flawed random number generator; version 2 fixes this and adds a subroutine that launches a DDoS attack against www1.whitehouse.gov between the 20th and the 28th of each month, then reactivates on the 1st of the following month
- No resident infection: a simple reboot eliminates it, but leaves the host open to reinfection. Patching, instead, makes the machine invulnerable to reinfection.

The RCS model
- Let N be the total number of vulnerable servers that can potentially be compromised from the Internet
  - this ignores that systems can be patched, powered up and shut down, deployed or disconnected
  - it ignores target networks behind NAT devices
  - it ignores that, according to recent research, as much as 5% of the routed (and used) address space is not reachable from various portions of the network [14]
- Let K be the average compromise rate, i.e., the number of vulnerable hosts that an infected host can compromise per unit of time
  - the model assumes that K is constant
  - it assumes that a machine cannot be compromised multiple times
- Let a(t) be the share of vulnerable machines that have been compromised at instant t
- Then the number n of machines that will be compromised in the interval dt is:

$$ n = (Na) \cdot K (1 - a)\, dt $$
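
Read as a recurrence, the increment n = (Na)·K(1 − a)dt can be applied one small time step at a time. The Python sketch below does exactly that (a forward-Euler integration, chosen here only for illustration); the function name `rcs_euler` and the parameter values are hypothetical, with N picked to be of the same order as the Code Red host counts discussed in this lecture.

```python
def rcs_euler(N, K, infected0, t_end, dt):
    """Forward-Euler integration of the RCS increment n = (N a) K (1 - a) dt.

    N: vulnerable hosts, K: compromise rate per unit time,
    infected0: hosts compromised at t = 0. Returns [(t, N * a(t)), ...].
    """
    a = infected0 / N
    trajectory = [(0.0, N * a)]
    for step in range(1, round(t_end / dt) + 1):
        n_new = (N * a) * K * (1.0 - a) * dt  # hosts compromised in this dt
        a += n_new / N                        # N is constant, so d(Na) = N da
        trajectory.append((step * dt, N * a))
    return trajectory


# Hypothetical parameters: N = 360,000 hosts, K = 1.8 per hour, one seed host.
curve = rcs_euler(N=360_000, K=1.8, infected0=1, t_end=20.0, dt=0.001)
for t, hosts in curve[::2000]:
    print(f"t = {t:4.1f} h   compromised = {hosts:9.0f}")
```

The step size matters: Euler slightly underestimates the growth rate in the early exponential phase, so dt should be small compared with 1/K.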

The RCS model (2)
Under the hypothesis that N is constant, $n = d(Na) = N\,da$, so we can also write:

$$ N\, da = (Na) \cdot K (1 - a)\, dt $$

From this, it follows that:

$$ \frac{da}{dt} = K a (1 - a) $$

The solution of this equation is a logistic curve:

$$ a(t) = \frac{e^{K(t - T)}}{1 + e^{K(t - T)}} $$

where T is a time parameter representing the point of maximum increase in the growth.

Fitting RCS against Code Red
In [4] the model is fitted to the "scan rate" (the total number of scans seen at a single site) instead of the number of distinct attacker IP addresses. The logistic curve in the figure has parameters K = 1.6 and T = 11.9. The number of distinct IPs is instead skewed, since each worm copy takes some random amount of time before it scans a particular site; the smaller the site, the higher the skew.

Fitting RCS against Code Red (2)
Data from CAIDA [15] come from a "network telescope" [16], i.e., a large block of address space that is routed but has no actual hosts connected. Here the "distortion" is less evident (left). On the right, the cumulative total of attacker IPs is fitted, on a log-log plot, against a logistic with parameters K = 1.8 and T = 16 (T differs because of the different time zone).

Fitting RCS against Code Red (3)
CAIDA further showed the programmed deactivation of Code Red at midnight UTC on July 20 (left). At that time the worm was approaching saturation, with a total of about 359,000 hosts infected in 14 hours of activity. On the right, the reactivation on August 1, 2001; CAIDA observes that 275,000 hosts were infected at the peak.
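
Because the logistic solution satisfies logit(a) = ln(a/(1 − a)) = K(t − T), K and T can be estimated with an ordinary least-squares line through the logit-transformed data. The Python sketch below is only one simple way to do this and is not necessarily the fitting procedure used in [4] or [15]; the function names are hypothetical, and the self-check uses noiseless synthetic data generated with the K = 1.6, T = 11.9 values quoted above.

```python
import math


def logistic(t, K, T):
    """RCS solution a(t) = e^{K(t-T)} / (1 + e^{K(t-T)})."""
    return 1.0 / (1.0 + math.exp(-K * (t - T)))


def fit_logistic(times, shares):
    """Least-squares line through logit(a) = K t - K T; returns (K, T).

    Points with a share of exactly 0 or 1 are skipped (their logit is infinite).
    """
    pts = [(t, math.log(a / (1.0 - a))) for t, a in zip(times, shares) if 0.0 < a < 1.0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    K = sxy / sxx              # slope of the line
    T = mean_t - mean_y / K    # the intercept equals -K T
    return K, T


# Noiseless synthetic check with the parameters quoted for the fit in [4].
times = [0.5 * i for i in range(49)]  # 0 to 24 hours
shares = [logistic(t, K=1.6, T=11.9) for t in times]
K_hat, T_hat = fit_logistic(times, shares)
print(f"K = {K_hat:.3f}, T = {T_hat:.3f}")  # prints K = 1.600, T = 11.900
```

On real measurements, the logit strongly amplifies noise in points close to 0 or 1, so a direct nonlinear fit of a(t) may be more robust.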

RCS failure: UDP worms
- January 25, 2003, slightly before 05:30 UTC: Slammer is released
- It exploited a buffer overflow in Microsoft SQL Server or MSDE 2000
- The vulnerability was discovered in July 2002, and a patch had been available since then
- Doubling time of 8.5 (±1) seconds, infecting more than 90 percent of vulnerable hosts within the first 10 minutes
- UDP, not TCP: bandwidth-limited, not round-trip-limited
- For comparison, Code Red had a doubling time of about 37 minutes
- Same propagation strategy as Code Red, so RCS should work. But it actually fails after a few minutes. Why?

Compartment models and understanding the data
We must understand that after 3 minutes the worm achieved a rate of 55 million scans per second. This affected the Internet:
- by slowing down and throttling the scans through bottleneck links
- by throttling the observation link
We can build a compartment-based model [6]:
- densely connected regions where the worm propagates unhindered, following RCS
- inter-region propagation through a bottleneck
Denoting with $N_i$ and $a_i$ the parameters of a single region, and supposing K to be constant across all of the n regions:

$$ \frac{da_i}{dt} = \left( a_i K \frac{N_i}{N} + \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{N_j}{N_i}\, a_j K \frac{N_i}{N} \right) (1 - a_i), \qquad 1 \le i \le n $$

Think of the result of integrating each equation as a logistic function somehow "forced" in its growth by the second additive term, which represents the attacks coming from outside the region.

Compartment models and understanding the data (2)
We can reduce the equations to:

$$ \frac{da_i}{dt} = \frac{K}{N} \sum_{j=1}^{n} N_j a_j \, (1 - a_i) \qquad (1) $$

We can calculate analytically the bandwidth on the link(s) of region i (supposing it is a leaf region). Let s be the size of the worm, and $r_j$ the number of attacks generated in a time unit by $AS_j$. Let T be the total number of systems present on the Internet (not to be confused with the time parameter T of the logistic curve), and $T_i$ the number of systems in $AS_i$.
The incoming bandwidth $b_{i,\mathrm{incoming}}$ is therefore:

$$ b_{i,\mathrm{incoming}} = s \underbrace{T_i \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{N_j}{N}\, a_j K}_{\text{incoming attacks}} \qquad (2) $$

Compartment models and understanding the data (3)
Similarly, the outgoing bandwidth is:

$$ b_{i,\mathrm{outgoing}} = s \underbrace{(T - T_i) \frac{N_i}{N}\, a_i K}_{\text{outgoing attacks}} \qquad (3) $$

The sum is:

$$ b_i = b_{i,\mathrm{incoming}} + b_{i,\mathrm{outgoing}} = \frac{sK}{N} \left( T_i \sum_{j=1}^{n} N_j a_j + (T - 2T_i)\, N_i a_i \right) \qquad (4) $$

We can "shape" this by forcing a bottleneck and going back to the equations (not shown; see [6]). We can also use this model to predict what the outbreak will look like from a telescope.

Compartment models and understanding the data (4)
Figure: on the left, a comparison between the unrestricted growth predicted by an RCS model and the growth restricted by bandwidth constraints. On the right, the attack rate seen by a global network telescope, under the hypothesis that some links fail and saturate during the outbreak.

Modeling countermeasures
In [17], RCS is extended by considering K = K(t), because of network saturation and router collapse, and by taking into account the immunization and healing of hosts:

$$ \frac{da}{dt} = K(t)\, a\, (1 - a - q - r) - \frac{dr}{dt} \qquad (5) $$

where q(t) is the proportion of susceptible hosts that are immunized at time t, and r(t) is the proportion of infected hosts that are cured and immunized at time t. The assumptions are that $\frac{dr}{dt} = \gamma a$ and $\frac{dq}{dt} = \mu (1 - a - q - r)(a + r)$; in other words, patching is a diffusive process similar to a worm.
- A model in [18] shows the interdependence between the timing parameters of propagation and removal, and their influence on worm propagation.
- [19] discusses the effect of selective immunization of computers on a network for two network topologies (tree and cluster).
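
The countermeasure model in (5) can be explored numerically with a forward-Euler scheme. In the Python sketch below, K(t) is a caller-supplied function; the constant K(t) used in the example and the values of γ, μ and a(0) are hypothetical choices made only to keep the sketch short, not the parameters or the K(t) of [17]. The function name is also hypothetical.

```python
def simulate_countermeasures(K, gamma, mu, a0, t_end, dt):
    """Forward-Euler sketch of eq. (5) with the stated assumptions
    dr/dt = gamma * a and dq/dt = mu * (1 - a - q - r) * (a + r).

    K is a callable returning the compromise rate K(t).
    Returns a list of (t, a, q, r) tuples.
    """
    a, q, r = a0, 0.0, 0.0
    out = [(0.0, a, q, r)]
    for step in range(round(t_end / dt)):
        t = step * dt
        s = 1.0 - a - q - r          # hosts still vulnerable and uninfected
        dr = gamma * a               # infected hosts cured and immunized
        dq = mu * s * (a + r)        # patching diffuses like a worm
        da = K(t) * a * s - dr
        a, q, r = a + da * dt, q + dq * dt, r + dr * dt
        out.append(((step + 1) * dt, a, q, r))
    return out


# Hypothetical parameters; time unit is hours.
run = simulate_countermeasures(K=lambda t: 1.8, gamma=0.05, mu=0.06,
                               a0=1e-5, t_end=100.0, dt=0.01)
peak_t, peak_a, _, _ = max(run, key=lambda row: row[1])
print(f"peak infected share {peak_a:.2f} at t = {peak_t:.1f} h; "
      f"final share {run[-1][1]:.3f}")
```

Setting `gamma` and `mu` to zero removes both countermeasures and recovers plain RCS growth, which is a quick sanity check on the integration.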

Developing new countermeasures from models
In [20] a monitoring and alerting system is proposed, based on distributed ingress and egress sensors for worm activity. The authors also propose using a Kalman filter to estimate parameters such as K, N and a from the observations, and thus obtain a detailed understanding of how much damage the spreading worm could cause. In addition, thanks to some properties of the filter, it can be used to generate an early warning of worm activity as early as when 1% ≤ a ≤ 2%.
Quarantine [21] and self-quarantine [22] have been extensively discussed and modeled.

Modeling Bluetooth worms
- The Bluetooth standard enables close-range transmission of files
  - worm propagation problem
  - potential for attacks
- A huge number of people crying wolf (and selling wolf-protection systems, which makes them less credible...)
- Modeling and assessing the threat is difficult because of locality issues
- Despite claims, no one has performed serious studies on this (hint to reviewers: please, don't just google it and look at the number of hits...)

A Bluetooth primer
- Bluetooth is a short-range radio communication protocol
- An alternative to IrDA, with no line of sight required
  - this creates a potential for worm transmission. Woo-hoo!
- Range between 1 and 100 m, most devices 10 m
- Robust security and crypto mechanisms
  - also, you cannot sniff traffic with common hardware because of pseudorandom frequency hopping
- Unfortunately, a plethora of implementation bugs, leading to DoS, command execution, etc.
- Worms reportedly exist:
  - propagation through an OBEX push connection (e.g., Cabir [23])
  - multidropper worms targeting both PCs and cellphones
  - ... do they, actually?

First research effort: BlueBag

Worm propagation modeling
- Used CMMTool to emulate movements
- Omitted layer-1 aspects and shielding (this was a bad assumption, as we will see)
- Used real environment characteristics and data collected during our survey to create scenarios
- Gave scary estimates of propagation speed and probability [24]
- We are now creating BlueBat, an experimental honeypot for Bluetooth attacks

The workbench
Tested several types of antennas: 12.5 dBi directional patch, 19 and 20.5 dBi directional parabolic, 3 and 9 dBi omnidirectional.

Ranges, ranges...
Range tests [25]:
- two class 2 phones, open space: approximately 20 m
- class 1 dongle (without an antenna) and phone, open space: approximately 60 m
- Linksys dongle with external antenna and phone, open space: approximately 90 m
- AIRcable dongle, open space: 110 m (3 dBi omnidirectional antenna), 175 m (9 dBi omnidirectional antenna), 400 m (12.5 dBi patch antenna), 1.48 km (20.5 dBi parabolic antenna)

In-the-wild results
- Days of observation in crowded places in Milan, plus 6 months of continuous operation of 2 portable devices
- Hundreds of visible devices passed by
- A total of 3 files were received:
  - sarah.jpeg (do I actually need to explain what this is?)
  - a commercial for a leading brand of footwear
  - an unknown .sis file (485zp6x6.sis, an executable for the Symbian platform; corrupt...)
- When we tried to push an innocuous file, 6%–8% of individuals carelessly accepted unknown file transfers from unknown sources. This did not change from BlueBag in 2006 to BlueBat in 2008.

Models and open questions
- A number of models have been proposed for Bluetooth worm propagation, almost invariably showing great propagation potential [26, 24, 27, 28].
- This potential failed to materialize, in our opinion because of the difficulty of casual transmission.
  - This was actually predicted in [29], which went against the common perception that mobility helped spread such worms [27].
  - "Human shield" effect!
- A low-level description of transmission may not be the best approach; we are thinking of using a model based on scale-free networks...
- 30 years later, we are back to propagation over graphs!

References
[1] Fred Cohen. Computer Viruses. PhD thesis, University of Southern California, 1985.
[2] Fred Cohen. Computer viruses – theory and experiments. Computers & Security, 6(1):22–35, 1987.
[3] E. H. Spafford. Crisis and aftermath. Communications of the ACM, 32(6):678–687, 1989.
[4] Stuart Staniford, Vern Paxson, and Nicholas Weaver. How to 0wn the Internet in your spare time. In Proceedings of the 11th USENIX Security Symposium (Security '02), 2002.
[5] Ian Whalley, Bill Arnold, David Chess, John Morar, Alla Segal, and Morton Swimmer. An environment for controlled worm replication and analysis. In Proceedings of the Virus Bulletin Conference, September 2000.
[6] Giuseppe Serazzi and Stefano Zanero. Computer virus propagation models. In Mariacarla Calzarossa and Erol Gelenbe, editors, Performance Tools and Applications to Networked Systems, Revised Tutorial Lectures [from MASCOTS 2003], volume 2965 of Lecture Notes in Computer Science, pages 26–50. Springer, 2004.
[7] Steve R. White. Open problems in computer virus research. In Proceedings of the Virus Bulletin Conference, October 1998.
[8] Herbert W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000.
[9] J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. In IEEE Symposium on Security and Privacy, pages 343–361, 1991.
[10] Cliff Changchun Zou, Don Towsley, and Weibo Gong. Email virus propagation modeling and analysis. Technical Report TR-CSE-03-04, University of Massachusetts, Amherst.
[11] Ryan Permeh and Marc Maiffret. .ida "Code Red" worm. Advisory AL20010717, July 2001.
[12] Ryan Permeh and Marc Maiffret. Code Red disassembly. Assembly code and research paper, July 2001.
[13] Ryan Permeh and Riley Hassell. Microsoft I.I.S. remote buffer overflow. Advisory AD20010618, June 2001.
[14] Craig Labovitz, Abha Ahuja, and Michael Bailey. Shining light on dark address space. Technical report, Arbor Networks, November 2001.
[15] David Moore, Colleen Shannon, and Jeffery Brown. Code-Red: a case study on the spread and victims of an Internet worm. In Proceedings of the ACM SIGCOMM/USENIX Internet Measurement Workshop, November 2002.
[16] David Moore. Network telescopes: Observing small or distant security events. In Proceedings of the 11th USENIX Security Symposium, August 2002.
[17] Cliff Changchun Zou, Weibo Gong, and Don Towsley. Code Red worm propagation modeling and analysis. In Proceedings of the 9th ACM Conference on Computer and Communications Security, pages 138–147. ACM Press, 2002.
[18] Yang Wang and Chenxi Wang. Modelling the effects of timing parameters on virus propagation. In Proceedings of the ACM CCS Workshop on Rapid Malcode (WORM'03), October 2003.
[19] Chenxi Wang, J. C. Knight, and M. C. Elder. On computer viral infection and the effect of immunization. In ACSAC, pages 246–256, 2000.
[20] Cliff Changchun Zou, Lixin Gao, Weibo Gong, and Don Towsley. Monitoring and early warning for Internet worms. In Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 190–199. ACM Press, 2003.
[21] David Moore, Colleen Shannon, Geoffrey M. Voelker, and Stefan Savage. Internet quarantine: Requirements for containing self-propagating code. In INFOCOM, 2003.
[22] Cliff Changchun Zou, Weibo Gong, and Don Towsley. Worm propagation modeling and analysis under dynamic quarantine defense. In Proceedings of the ACM CCS Workshop on Rapid Malcode (WORM'03), October 2003.
[23] Cabir. Analysis available online at http://www.symantec.com/security_response/writeup.jsp?docid=2004-061419-4412-99.
[24] Luca Carettoni, Claudio Merloni, and Stefano Zanero. Studying Bluetooth malware propagation: The BlueBag project. IEEE Security and Privacy, 5(2):17–25, 2007.
[25] A. Galante, A. Kokos, and S. Zanero. BlueBat: Towards practical Bluetooth honeypots. In 2009 IEEE International Conference on Communications, Dresden, Germany, June 2009.
[26] Jing Su, Kelvin K. W. Chan, Andrew G. Miklas, Kenneth Po, Ali Akhavan, Stefan Saroiu, Eyal de Lara, and Ashvin Goel. A preliminary investigation of worm infections in a Bluetooth environment. In WORM '06: Proceedings of the 4th ACM Workshop on Recurring Malcode, pages 9–16, New York, NY, USA, 2006. ACM.
[27] James W. Mickens and Brian D. Noble. Modeling epidemic spreading in mobile environments. In WiSe '05: Proceedings of the 4th ACM Workshop on Wireless Security, pages 77–86, New York, NY, USA, 2005. ACM.
[28] Guanhua Yan and Stephan Eidenbenz. Modeling propagation dynamics of Bluetooth worms (extended version). IEEE Transactions on Mobile Computing, 8(3):353–368, 2009.
[29] Guanhua Yan and Stephan Eidenbenz. Bluetooth worms: Models, dynamics, and defense implications. In ACSAC '06: Proceedings of the 22nd Annual Computer Security Applications Conference, pages 245–256, Washington, DC, USA, 2006. IEEE Computer Society.