Modeling the propagation of computer viruses

Dr. Stefano Zanero, PhD
Assistant Professor (Ricercatore)
Dipartimento di Elettronica e Informazione
Politecnico di Milano
[email protected]
S. Zanero (DEI)
1 / 41
Lecture Outline
Introduction: computer viruses
History of viruses
Existing models
Graph-based models, file exchange viruses
Propagation of e-mail viruses
Continuous and discrete models for worms
Advanced models and issues
Open research questions and conclusions
What is a computer virus?
First named by Cohen in [1] and [2]
Research areas in computer virology :
theoretical study of the properties of self-replicating code
creation of new viruses and viral vectors
development of new techniques for detection and containment
study of in-the-wild samples and automation thereof
modeling code replication and propagation behavior
A few definitions
Malware: also known as “malicious code”, is code that is intentionally
written to violate a security policy
Virus: piece of code that self-propagates (i.e. copies itself) by
infecting other files
Worm: self-propagating program which copies itself, often by
exploiting host vulnerabilities, or by social engineering (e.g.
mail worms)
Trojan horse: program with malicious (e.g. backdoor) capabilities,
sometimes masqueraded as benign software
Rootkits: combination of trojans and techniques to hide them
A brief history of viruses
1966 von Neumann publishes “Theory of self-reproducing
automata”
1971 Creeper, worm infecting PDP-10 via ARPANET
1974 Wabbit, fork-bomb program; also ANIMAL, a trojan,
self-copying program
1981 First reported virus outbreak; first reported boot sector virus:
Elk Cloner (Apple II)
1983 Virus definition by Cohen; demonstration of file infection
1986 First PC virus (MS-DOS), Brain (boot sector); VirDem
demoes infection of .com files
1987 First self-encrypting virus (Cascade); first purposefully
destructive virus (Jerusalem, “Friday 13 virus”); SCA, boot
sector virus for Amiga
A brief history of viruses: the worm age
1988 First self-propagating worm: Morris worm [3]
1989 First multipartite virus: Ghostball
1990 First polymorphic virus: Chameleon
1992 Michelangelo: mass-media hysteria on timebomb
1995 “Concept”, first macro virus
1998 Back orifice; CIH destructive massmailer
1999 Melissa virus (Word+Outlook MM); Knark rootkit; concept
of “zombie”; Happy99 worm email attachment
2000 Loveletter worm
2001 Sadmind worm (Sun+MS IIS); Code Red Worm; Nimda and
Sircam Worm (multihole)
2003 SQL Slammer worm (UDP!); Blaster worm
A brief history of viruses: 2004, annus horribilis
2004 MyDoom, record massmailing worm; Witty worm (multihole,
exploits a security product, fastest disclosure-to-worm, first
destructive, limited impact); Sasser worm, last huge UDP
worm; Santy, the first webworm
2006 First Mac OS X trojan
2007 Storm, massmailer, creates the Storm botnet; as of June 30,
1.7M computers are part of it
2008 Mocmex trojan found in digital photo frames; Torpig (trojan,
anti-antivirus) creates a botnet which is later taken over and
destroyed by researchers; Conficker worm infects millions of
machines, creating a botnet; Koobface, Facebook and
MySpace worm
2009 Scareware and fake AV success
2010 Stuxnet: SCADA trojan; likely targeted
Viral code propagation
Viruses need a way to infect a host program, a way to execute
themselves when the program is run, and a way to propagate
Common method of propagation in the 80s: floppy disk exchange
Cavity infection, appending, prepending; boot code infection
Macro viruses extremely similar
Worms need only a way to propagate and a way to execute
themselves automatically
Propagation: social engineering (in particular massmailing) or
exploitation
Execution through persistent infection of the system
Antivirus: mission impossible
Virus scanners basically detect signatures of files (or memory-resident
viruses)
New viruses, or even modified ones, escape detection
Polymorphism/metamorphism has long been a challenge
More generally: it is not possible to build a perfect virus/malware
detector (Cohen)
Diagonal argument
let P be a perfect detection program
let V be a piece of viral code
V can call P
if P(V ) = true then halt
if P(V ) = false then spread
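The diagonal argument can be sketched in a few lines of Python. This is only an illustration under a strong simplification: a "detector" is any function that takes a program and returns a verdict without running it, and "spreading" is reduced to returning a string. The names make_contrarian and detector_is_wrong_on are hypothetical, introduced for this sketch.

```python
from typing import Callable

Program = Callable[[], str]
Detector = Callable[[Program], bool]


def make_contrarian(detector: Detector) -> Program:
    """Build the program V that asks the detector P about itself."""
    def v() -> str:
        if detector(v):       # P(V) = true  -> V halts, so V was not viral
            return "halt"
        return "spread"       # P(V) = false -> V spreads, so V was viral
    return v


def detector_is_wrong_on(detector: Detector) -> bool:
    """True if the detector misclassifies its own contrarian program."""
    v = make_contrarian(detector)
    verdict = detector(v)               # what P claims about V
    actually_viral = v() == "spread"    # what V really does
    return verdict != actually_viral


if __name__ == "__main__":
    # Two toy detectors (assumptions): one flags everything, one flags nothing.
    print(detector_is_wrong_on(lambda program: True))
    print(detector_is_wrong_on(lambda program: False))
```

Whatever deterministic verdict the detector gives, V does the opposite, so no candidate P can be right on every program.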
Motivation for creating viral propagation models
Creating reliable models is beneficial for many reasons
They help to better understand the threat posed by new attack vectors
and new propagation techniques. Models of worm propagation have
made it possible to predict with stunning precision the behavior of
later malware [4]
They make it possible to develop and test improved containment and
disinfection strategies [5]
Combined with load modelling, such models help to predict failures of
the global network infrastructure
Such models can be used to develop early detection mechanisms by
describing characteristic symptoms of worm activity (e.g. a
symptomatic propagation curve)
For a review (missing out some of the newest developments) see [6]
For an older review of models see [7].
Viral propagation models in biology
Typical simplifications and assumptions:
Epidemiological models abstract from the individuals, and consider
them units of a population
Each unit can only be in one of a limited number of states, see Table 1
Usually, the chain of states gives the name to the model, e.g., SIR
model, SIS model...
Usually, they avoid dealing with the transmission mechanism, translating
it into parameters of the model, computed by fitting the model to
observed propagation data
An excellent analysis of mathematics for infectious diseases in the
biological world is available in [8].
State   Meaning
M       Passive immunity
S       Susceptible state
E       Exposed to infection
I       Infective
R       Recovered
Table 1: Typical states for an epidemiological model
Modeling file infectors
First model, developed in [9]
Tries to overcome two limitations of typical biological models:
being homogeneous, i.e. an infected individual is equally likely to infect
any other individual
being symmetric, which means that there is no privileged direction of
transmission
Typically, file exchange happened in cliques and with a specific
direction of flow
Both shortcomings are addressed by transferring a SIS model onto a
directed random graph
The topology of the graph has important effects on propagation
A sparse graph (nodes with a small, constant average degree) allows for
conditions where the infection dies out
A local graph (where the probability of having an edge between nodes B
and C is significantly higher if both have an edge to the
same node A) shows a higher propagation rate
SIR model would have been interesting to study for file infectors; but
the appearance of the Internet changed the system. . .
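A minimal simulation sketch of a SIS process on a directed random graph, to make the setup concrete. It is not the analysis of [9]: time is discrete, updates are synchronous, and all parameter names and values (avg_out_degree, infect_prob, cure_prob) are assumptions chosen for illustration.

```python
import random


def random_directed_graph(n, avg_out_degree, rng):
    """Each ordered pair (u, v), u != v, is an edge with probability avg_out_degree / (n - 1)."""
    p = avg_out_degree / (n - 1)
    return {u: [v for v in range(n) if v != u and rng.random() < p]
            for u in range(n)}


def simulate_sis(graph, infect_prob, cure_prob, steps, rng, initial_infected=1):
    """Discrete-time SIS; returns the number of infected nodes after each step."""
    infected = set(rng.sample(sorted(graph), initial_infected))
    history = [len(infected)]
    for _ in range(steps):
        # Infection only travels along the direction of the edges (u -> v).
        newly_infected = {
            v for u in infected for v in graph[u]
            if v not in infected and rng.random() < infect_prob
        }
        # Cured nodes become susceptible again (SIS, no immunity).
        cured = {u for u in infected if rng.random() < cure_prob}
        infected = (infected - cured) | newly_infected
        history.append(len(infected))
    return history


if __name__ == "__main__":
    rng = random.Random(42)
    sparse = random_directed_graph(500, avg_out_degree=1.5, rng=rng)
    print(simulate_sis(sparse, infect_prob=0.2, cure_prob=0.3, steps=50, rng=rng))
```

Varying avg_out_degree while keeping the other parameters fixed is a quick way to explore when the infection dies out and when it persists.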
Modeling e-mail based worms
Best model by Zou et al. in [10]
Internet e-mail modeled as an undirected graph of relationships between
people
Node degree assumed to be power-law distributed, from the analysis of
“Yahoo!” discussion group sizes; small-world topology
This is not a very solid assumption. Also, the small-world topology they
use ignores the existence of interest groups among people
Each user “opens” an incoming virus attachment with a fixed
probability Pi, which depends on the user but is constant in time.
E-mail checking time Ti is modeled as either an exponentially or Erlang
distributed random variable.
T = E[Ti] and P = E[Pi] are assumed to be independent and Gaussian distributed.
Interesting observations:
since user e-mail checking time is much larger than the average e-mail
transmission time, the latter can be disregarded
the overall spread rate of viruses gets higher as the variability of users’
e-mail checking times increases, and depends mostly on T = E[Ti]
Botched assumptions: reinfection (vs. startup time) and independent
opening probability
Modeling scanning worms
Random Constant Spread (RCS) model [4]
Developed using empirical data derived from the outbreak of the
Code Red v2 worm
Released in version 1 on 13 Jul 2001 and immediately analyzed [11, 12]
Propagates using the .ida vulnerability discovered by eEye on June
18th 2001 [13], thus infecting vulnerable web servers running Microsoft
IIS version 4.0 and 5.0.
On the infected host it launches 99 threads, which randomly generate
IP addresses (excluding subnets 127.0.0.0/8, loopback, and
224.0.0.0/8, multicast) and try to compromise the hosts at those
addresses
Version 1 had a flawed random number generator; version 2 fixes this and
adds a subroutine for a DDoS attack against www1.whitehouse.gov on
the days between the 20th and the 28th of each month, then
reactivates on the 1st of the following month.
No resident infection: a simple reboot eliminates it, but allows
reinfection. Patching instead makes the machine invulnerable to
reinfection.
The RCS model
Let N be the total number of vulnerable servers which can be
potentially compromised from the Internet
ignores that systems can be patched, powered up and shut down, deployed
or disconnected
ignores target networks behind NAT devices
ignores that, according to recent research, as much as 5% of the routed
(and used) address space is not reachable from various portions of the
network [14]
Let K be the average compromise rate, i.e. the number of vulnerable
hosts that an infected host can compromise per unit of time
The model assumes that K is constant
Assumes that a machine cannot be compromised multiple times
Let a(t) be the share of vulnerable machines which have been
compromised at the instant t
Then it follows that the number n of machines that will be
compromised in the interval dt is:
n = (Na) · K (1 − a)dt
The RCS model(2)
Under the hypothesis that N is constant, n = d(Na) = Nda, we can
also write:
Nda = (Na) · K (1 − a)dt
From this, it follows:
\[ \frac{da}{dt} = K a (1 - a) \]
The solution of this equation is a logistic curve:
\[ a = \frac{e^{K(t-T)}}{1 + e^{K(t-T)}} \]
where T is a time parameter representing the point of maximum
increase in the growth.
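As a sanity check, the closed-form logistic can be compared with a direct numerical integration of da/dt = K a (1 − a). The sketch below uses a plain explicit Euler scheme; the function names and the step count are assumptions, and the parameter values in the demo are just the K and T quoted for the Code Red fit.

```python
import math


def rcs_logistic(t, K, T):
    """Closed-form RCS solution a(t) = e^{K(t-T)} / (1 + e^{K(t-T)})."""
    return 1.0 / (1.0 + math.exp(-K * (t - T)))


def rcs_euler(a0, K, t0, t1, steps=100_000):
    """Integrate da/dt = K a (1 - a) from t0 to t1 with explicit Euler steps."""
    a = a0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        a += K * a * (1.0 - a) * dt
    return a


if __name__ == "__main__":
    K, T = 1.6, 11.9
    a0 = rcs_logistic(0.0, K, T)                # start on the closed-form curve
    for t in (5.0, 11.9, 20.0):
        print(t, rcs_logistic(t, K, T), rcs_euler(a0, K, 0.0, t))
```

At t = T the share is exactly one half, which is where the growth rate K a (1 − a) peaks.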
Fitting RCS against CodeRed
In [4] the model is fitted to the “scan rate” (total number of scans at a single site), instead of
the number of distinct attacker IP addresses. The logistic curve in the figure has parameters
K = 1.6 and T = 11.9. The number of distinct IPs is instead skewed, since each worm copy takes
some random amount of time before it scans a particular site. The smaller the site, the higher
the skew.
Fitting RCS against CodeRed (2)
Data from CAIDA [15] uses a “network telescope” [16], i.e. a large
address-space block, routed but with no actual hosts connected. Here the
“distortion” is less evident (on the left). On the right, the cumulative
total of attacker IPs is fitted on a log-log plot against a logistic
with parameters K = 1.8 and T = 16 (because of the different time zone).
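One simple way to fit a logistic to such data, sketched here as an illustration and not necessarily the procedure used in [4] or [15]: since a = e^{K(t−T)} / (1 + e^{K(t−T)}) implies ln(a / (1 − a)) = K (t − T), an ordinary least-squares line through the logit of the infected share gives K as the slope and T from the intercept. The data in the demo is synthetic, and turning attacker counts into shares assumes N is known.

```python
import math


def fit_rcs(times, shares):
    """Least-squares fit of logit(a) = K * (t - T); returns (K, T)."""
    logits = [math.log(a / (1.0 - a)) for a in shares]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logits) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logits))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_y - slope * mean_t
    # logit(a) = K*t - K*T, so K is the slope and T = -intercept / K.
    return slope, -intercept / slope


if __name__ == "__main__":
    # Synthetic observations generated from a logistic with K = 1.8, T = 16.
    times = [10 + 0.5 * k for k in range(25)]
    shares = [1.0 / (1.0 + math.exp(-1.8 * (t - 16.0))) for t in times]
    print(fit_rcs(times, shares))
```

With real measurements, points very close to 0 or 1 dominate the logit and are usually better excluded before fitting.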
Fitting RCS against CodeRed (3)
CAIDA further showed the programmed deactivation of Code Red at
midnight of July 20, UTC (left). At that time the worm was
approaching saturation, with a total of about 359,000 hosts infected in 14
hours of activity. On the right, the reactivation on August 1, 2001.
CAIDA observes that at peak 275,000 hosts were infected.
RCS failure: UDP worms
January 25th 2003, slightly before 05:30 UTC: Slammer released
Exploited a buffer overflow in SQL Server or MSDE 2000
Vulnerability discovered in July 2002, and a patch available since then
Doubling time of 8.5(±1) seconds, infecting more than 90 percent of
vulnerable hosts within the first 10 minutes.
UDP, not TCP. Bandwidth limited, not roundtrip limited
For comparison, Code Red had a doubling time of about 37 minutes.
Same propagation strategy as CR, so RCS should work. But it
actually fails after a few minutes. Why?
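To relate the quoted doubling times to the RCS parameter: while a is much smaller than 1, da/dt ≈ K a, so the infected share grows as e^{Kt} and doubles every ln(2) / K. The small sketch below inverts that relation; the function name is hypothetical and the result only holds for the early, unsaturated phase.

```python
import math


def rate_from_doubling_time(doubling_time):
    """Early-phase RCS rate K implied by a doubling time (same time unit)."""
    return math.log(2) / doubling_time


# Doubling times quoted above, both converted to seconds.
slammer_K = rate_from_doubling_time(8.5)         # per second
code_red_K = rate_from_doubling_time(37 * 60)    # per second
speedup = slammer_K / code_red_K

if __name__ == "__main__":
    print(f"Slammer K  = {slammer_K:.4f} per second")
    print(f"Code Red K = {code_red_K:.6f} per second")
    print(f"ratio      = {speedup:.0f}")
```

The ratio of the two rates is simply the ratio of the doubling times, roughly 260.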
Compartment models and understanding the data
We must understand that after 3 minutes, the worm achieved a rate
of 55 million scans per second. This affected the Internet:
by slowing down and throttling the scans through bottleneck links
by throttling the observation link
We can build a compartment-based model [6]
densely connected regions where the worm propagates unhindered,
following RCS
inter-region propagation with a bottleneck
Denoting with Ni and ai the parameters of a single region, and
supposing K to be constant across all of the n regions:
\[ N_i \frac{da_i}{dt} = \left[ N_i a_i K \frac{N_i}{N} + \sum_{\substack{j=1 \\ j \neq i}}^{n} N_j a_j K \frac{N_i}{N} \right] (1 - a_i), \qquad 1 \le i \le n \]
Think of the result of the integration of each equation as a logistic
function somehow “forced” in its growth by the second additive term
(which represents the attacks incoming from outside the region)
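A small numerical sketch of the unconstrained compartment model. Dividing the per-region equation by Ni gives da_i/dt = (K/N) · Σ_j N_j a_j · (1 − a_i), which is what the code integrates with explicit Euler steps. Region sizes, initial shares and step count are made-up illustration values, and no bottleneck is modeled here.

```python
import math


def simulate_regions(sizes, shares0, K, t_end, steps=20_000):
    """Euler integration of da_i/dt = (K/N) * sum_j(N_j a_j) * (1 - a_i)."""
    N = sum(sizes)
    a = list(shares0)
    dt = t_end / steps
    for _ in range(steps):
        # Attack pressure is the same for every region: local plus incoming attacks.
        pressure = K / N * sum(n_j * a_j for n_j, a_j in zip(sizes, a))
        a = [a_i + pressure * (1.0 - a_i) * dt for a_i in a]
    return a


def global_share(sizes, shares):
    """Infected share of the whole vulnerable population."""
    return sum(n * a for n, a in zip(sizes, shares)) / sum(sizes)


def rcs_share(a0, K, t):
    """Closed-form RCS share starting from a(0) = a0."""
    growth = a0 * math.exp(K * t)
    return growth / (1.0 - a0 + growth)


if __name__ == "__main__":
    sizes = [1000, 5000, 200]        # N_i (hypothetical)
    start = [0.01, 0.0, 0.0]         # outbreak starts in region 0 only
    final = simulate_regions(sizes, start, K=1.8, t_end=4.0)
    print(final, global_share(sizes, final))
    print(rcs_share(global_share(sizes, start), 1.8, 4.0))
```

Without a bottleneck the size-weighted average of the regional shares follows the plain RCS logistic, so any deviation observed in practice has to come from the bandwidth constraints.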
Compartment models and understanding the data (2)
We can reduce the equations to:
\[ \frac{da_i}{dt} = \left[ \frac{K}{N} \sum_{j=1}^{n} N_j a_j \right] (1 - a_i) \qquad (1) \]
We can calculate analytically the bandwidth on the link(s) of region i
(supposing a leaf region)
Let s be the size of the worm, rj the number of attacks generated in a
time unit by ASj . Let T describe the total number of systems present
on the Internet, and Ti the number of systems in ASi .
The incoming bandwidth bi,incoming is therefore:
\[ b_{i,\mathrm{incoming}} = s \underbrace{T_i \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{N_j}{N} a_j K}_{\text{incoming attacks}} \qquad (2) \]
Similarly, the outgoing bandwidth is:
\[ b_{i,\mathrm{outgoing}} = s \underbrace{(T - T_i) \frac{N_i}{N} a_i K}_{\text{outgoing attacks}} \qquad (3) \]
The sum is:
\[ b_i = \frac{sK}{N} \left[ T_i \sum_{\substack{j=1 \\ j \neq i}}^{n} N_j a_j + (T - T_i) N_i a_i \right] \qquad (4) \]
We can “shape” this by forcing a bottleneck and going back to the
equations (not shown, see [6]). We can also use this model to predict
what the outbreak will look like from a telescope.
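The link load of a leaf region can be computed directly from the regional state. The sketch below reads the incoming traffic as s · Ti · Σ_{j≠i} (Nj/N) aj K and the outgoing traffic as s · (T − Ti) · (Ni/N) ai K; the function name, the list-based arguments and the numbers in the demo are assumptions made for illustration.

```python
def link_bandwidth(i, s, K, hosts, vulnerable, shares):
    """Return (incoming, outgoing) worm traffic on the link of leaf region i.

    hosts[j]      -- T_j, systems present in region j
    vulnerable[j] -- N_j, vulnerable systems in region j
    shares[j]     -- a_j, infected share of the vulnerable systems in region j
    """
    T, N = sum(hosts), sum(vulnerable)
    # Attacks generated in every other region that land on addresses of region i.
    incoming = s * hosts[i] * sum(
        vulnerable[j] / N * shares[j] * K
        for j in range(len(hosts)) if j != i
    )
    # Attacks generated in region i that target addresses outside of it.
    outgoing = s * (T - hosts[i]) * vulnerable[i] / N * shares[i] * K
    return incoming, outgoing


if __name__ == "__main__":
    b_in, b_out = link_bandwidth(
        0, s=2.0, K=4.0,
        hosts=[100, 300], vulnerable=[10, 30], shares=[0.5, 0.2],
    )
    print(b_in, b_out, b_in + b_out)
```

The result is in units of worm size per time unit, e.g. bytes per second if s is in bytes and K is per second.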
Figure: A comparison between the unrestricted growth predicted by an RCS
model and the growth restricted by bandwidth constraints, on the left. On the
right, the attack rate seen by a global network telescope, under the
hypothesis that some links fail and saturate during the outbreak
Modeling countermeasures
In [17], RCS is extended, proposing to consider K = K (t), because of
network saturation and router collapse, and taking into account
immunization and healing of hosts:
\[ \frac{da}{dt} = K(t)\, a\, (1 - a - q - r) - \frac{dr}{dt} \qquad (5) \]
where q(t) is the proportion of susceptible hosts that are immunized
at time t, and r(t) is the proportion of infected hosts that are cured
and immunized at time t. The assumptions are that dr/dt = γa, and
dq/dt = µ(1 − a − q − r)(a + r) (in other words, patching is a diffusive
process similar to a worm)
A model [18] shows the interdependence between the timing
parameters of propagation and removal, and their influence on worm
propagation.
[19] discusses the effect of selective immunization of computers on a
network for two network topologies (tree and cluster).
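A numerical sketch of the extended model with immunization and healing, i.e. da/dt = K(t) a (1 − a − q − r) − dr/dt with dr/dt = γa and dq/dt = µ(1 − a − q − r)(a + r). The integration scheme (explicit Euler) and every parameter value in the demo are assumptions, not values taken from [17].

```python
def simulate_countermeasures(K, gamma, mu, a0, t_end, steps=50_000):
    """Euler integration of the extended model; returns (a, q, r) at t_end.

    K may be a constant or a function of time K(t).
    """
    rate = K if callable(K) else (lambda t: K)
    a, q, r = a0, 0.0, 0.0
    dt = t_end / steps
    for step in range(steps):
        t = step * dt
        susceptible = 1.0 - a - q - r          # still vulnerable, not infected
        dr = gamma * a                          # infected hosts cured and immunized
        dq = mu * susceptible * (a + r)         # susceptible hosts patched
        da = rate(t) * a * susceptible - dr     # new infections minus removals
        a, q, r = a + da * dt, q + dq * dt, r + dr * dt
    return a, q, r


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the qualitative effect.
    print(simulate_countermeasures(1.8, 0.0, 0.0, 1e-3, 10.0))     # plain RCS
    print(simulate_countermeasures(1.8, 0.05, 0.06, 1e-3, 10.0))   # with patching
```

Setting gamma and mu to zero recovers the plain RCS model, which makes it easy to compare the two outcomes side by side.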
Developing new countermeasures from models
In [20] a monitoring and alerting system is proposed, based on
distributed ingress and egress sensors for worm activity. The authors
also propose the use of a Kalman filter for estimating parameters such
as K, N and a from the observations, and thus obtaining a detailed
understanding of how much damage the spreading worm could
generate. In addition, using some properties of the filter, it can be
used to generate an early warning of worm activity as early as when
1% ≤ a ≤ 2%.
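To give a feeling for how a Kalman filter can track the growth rate from early observations, here is a deliberately simplified sketch. It is not the filter design of [20]: it estimates only K, assumes the early phase where a(t + Δ) ≈ (1 + KΔ) a(t), and treats g = 1 + KΔ as a constant state observed through noisy measurements. The noise variance, prior and synthetic data are assumptions.

```python
def estimate_rate(observed_shares, dt, meas_var=1e-8, prior_var=1.0):
    """Scalar Kalman filter for the constant state g = 1 + K*dt.

    Measurement model: a[k+1] = g * a[k] + noise, with noise variance meas_var.
    Returns the K estimate after each new observation.
    """
    g, p = 1.0, prior_var            # prior: no growth, large uncertainty
    estimates = []
    for prev, cur in zip(observed_shares, observed_shares[1:]):
        gain = p * prev / (prev * prev * p + meas_var)   # Kalman gain
        g += gain * (cur - prev * g)                     # correct with the innovation
        p *= 1.0 - gain * prev                           # posterior variance
        estimates.append((g - 1.0) / dt)
    return estimates


if __name__ == "__main__":
    # Synthetic early-phase data: K = 1.8, dt = 0.1, starting share 1e-4.
    shares = [1e-4 * 1.18 ** k for k in range(41)]
    for k, K_hat in enumerate(estimate_rate(shares, dt=0.1)):
        if k % 10 == 9:
            print(k + 1, round(K_hat, 4))
```

A stable estimate of K that stays positive over several consecutive observations is the kind of signal an early warning could be based on.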
Quarantine [21] and self-quarantine [22] have been extensively
discussed and modeled.
Modeling Bluetooth worms
Bluetooth standard enables close range transmission of files
Worm propagation problem
Potential for attacks
A huge number of people crying wolf (and selling
wolf-protection-systems, which makes them less credible. . . )
Modeling and assessing the threat is difficult, because of locality
issues
Despite claims, no one has performed serious studies on this (hint to
reviewers: please, don’t just google it up and see the number of the
hits. . . )
A Bluetooth primer
Bluetooth is a short range short-wave radio communication protocol
Alternative to IrDA, no line of sight required
This creates a potential for worm transmission. Woo-hoo!
Range between 1-100m, most devices 10m
Robust security and crypto mechanisms
Also, you cannot sniff on common hw because of pseudorandom
hopping
Unluckily, plethora of implementation bugs, leading to DoS,
command execution, etc.
Worms reportedly exist:
Propagation through an OBEX push connection (e.g. Cabir [23])
Multidropper worms targeting both PC and cellphone
. . . do they, actually?
First research effort: BlueBag
Worm propagation modeling
Used CMMTool to emulate movements
Omitted layer-1 aspects and shielding (this was a bad assumption as
we will see)
Used real environment characteristics and data collected during our
survey to create scenarios
Gave scary estimates of propagation speed and probability [24]
We are now creating BlueBat, an experimental honeypot for
Bluetooth attacks
The workbench
Tested several types of antennas: 12.5 dBi directional patch, 19 and
20.5 dBi directional parabolic, 3 and 9 dBi omnidirectional
Ranges, ranges . . .
Range tests [25]
Two class 2 phones, open space, range approximately 20m
Class 1 dongle (without an antenna) and phone, open space, approx
60m
Linksys dongle with external antenna and phone, open space, approx
90m
Aircable dongle, open space, 110m (with a 3 dBi omnidirectional
antenna), 175m (9 dBi omnidirectional antenna), 400m (12.5 dBi
patch antenna), 1.48 km (20.5 dBi parabolic antenna).
In the wild results
Days of observation in crowded places in Milan, plus 6 months of
continuous operation of 2 portable devices
Hundreds of visible devices passed by
A total of 3 files were received:
sarah.jpeg (do I actually need to explain what this is?)
Leading brand of footwear commercial
Unknown .sis file (485zp6x6.sis), an executable for the Symbian
platform (corrupt...)
Trying to push an innocuous file: 6%–8% of individuals carelessly
accept unknown file transfers from unknown sources. This didn’t
change from BlueBag in 2006 to BlueBat in 2008.
Models and open questions
A number of models have been proposed for Bluetooth worm
propagation, almost invariably showing great propagation potentials
[26, 24, 27, 28].
This potential failed to materialize, in our opinion because of the
difficulty of casual transmission. This was actually predicted in [29],
which went against the common perception that mobility helped
spreading such worms [27].
“Human shield” effect!
Low-level description of transmission may not be the best approach;
we are thinking of using a model based on scale-free networks
. . . 30 years later, we are back to propagations over graphs!
References
[1] Fred Cohen.
Computer Viruses.
PhD thesis, University of Southern California, 1985.
[2] Fred Cohen.
Computer viruses – theory and experiments.
Computers & Security, 6(1):22–35, 1987.
[3] E. H. Spafford.
Crisis and aftermath.
Communications of the ACM, 32(6):678–687, 1989.
[4] Stuart Staniford, Vern Paxson, and Nicholas Weaver.
How to 0wn the internet in your spare time.
In Proceedings of the 11th USENIX Security Symposium (Security ’02), 2002.
[5] Ian Whalley, Bill Arnold, David Chess, John Morar, Alla Segal, and Morton
Swimmer.
An environment for controlled worm replication and analysis.
In Proceedings of the Virus Bulletin Conference, September 2000.
[6] Giuseppe Serazzi and Stefano Zanero.
Computer virus propagation models.
In Mariacarla Calzarossa and Erol Gelenbe, editors, Performance Tools and
Applications to Networked Systems, Revised Tutorial Lectures [from MASCOTS
2003], volume 2965 of Lecture Notes in Computer Science, pages 26–50. Springer,
2004.
[7] Steve R. White.
Open problems in computer virus research.
In Proceedings of the Virus Bulletin Conference, Oct 1998.
[8] Herbert W. Hethcote.
The mathematics of infectious diseases.
SIAM Review, 42(4):599–653, 2000.
[9] J. O. Kephart and S. R. White.
Directed-graph epidemiological models of computer viruses.
In IEEE Symposium on Security and Privacy, pages 343–361, 1991.
[10] Cliff Changchun Zou, Don Towsley, and Weibo Gong.
Email virus propagation modeling and analysis.
Technical Report TR-CSE-03-04, University of Massachusetts, Amherst.
[11] Ryan Permeh and Marc Maiffret.
.ida ’code red’ worm.
Advisory AL20010717, July 2001.
[12] Ryan Permeh and Marc Maiffret.
Code red disassembly.
Assembly code and research paper, July 2001.
[13] Ryan Permeh and Riley Hassell.
Microsoft i.i.s. remote buffer overflow.
Advisory AD20010618, June 2001.
[14] Craig Labovitz, Abha Ahuja, and Michael Bailey.
Shining light on dark address space.
Technical report, Arbor networks, Nov 2001.
[15] David Moore, Colleen Shannon, and Jeffery Brown.
Code-red: a case study on the spread and victims of an internet worm.
In Proceedings of the ACM SIGCOMM/USENIX Internet Measurement Workshop,
Nov 2002.
[16] David Moore.
Network telescopes: Observing small or distant security events.
In Proceedings of the 11th USENIX Security Symposium, Aug 2002.
[17] Cliff Changchun Zou, Weibo Gong, and Don Towsley.
Code red worm propagation modeling and analysis.
In Proceedings of the 9th ACM conference on Computer and communications
security, pages 138–147. ACM Press, 2002.
[18] Yang Wang and Chenxi Wang.
Modelling the effects of timing parameters on virus propagation.
In Proceedings of the ACM CCS Workshop on Rapid Malcode (WORM’03), Oct
2003.
[19] Chenxi Wang, J. C. Knight, and M. C. Elder.
On computer viral infection and the effect of immunization.
In ACSAC, pages 246–256, 2000.
[20] Cliff Changchun Zou, Lixin Gao, Weibo Gong, and Don Towsley.
Monitoring and early warning for internet worms.
In Proceedings of the 10th ACM conference on Computer and communication
security, pages 190–199. ACM Press, 2003.
[21] David Moore, Colleen Shannon, Geoffrey M. Voelker, and Stefan Savage.
Internet quarantine: Requirements for containing self-propagating code.
In INFOCOM, 2003.
[22] Cliff Changchun Zou, Weibo Gong, and Don Towsley.
Worm propagation modeling and analysis under dynamic quarantine defense.
In Proceedings of the ACM CCS Workshop on Rapid Malcode (WORM’03), Oct
2003.
[23] Cabir.
Analysis available online at http://www.symantec.com/security_response/
writeup.jsp?docid=2004-061419-4412-99.
[24] Luca Carettoni, Claudio Merloni, and Stefano Zanero.
Studying bluetooth malware propagation: The bluebag project.
IEEE Security and Privacy, 5(2):17–25, 2007.
[25] A. Galante, A. Kokos, and S. Zanero.
Bluebat: Towards practical bluetooth honeypots.
In 2009 IEEE International Conference on Communications, Dresden, Germany,
June 2009.
[26] Jing Su, Kelvin K. W. Chan, Andrew G. Miklas, Kenneth Po, Ali Akhavan, Stefan
Saroiu, Eyal de Lara, and Ashvin Goel.
A preliminary investigation of worm infections in a bluetooth environment.
In WORM ’06: Proceedings of the 4th ACM workshop on Recurring malcode,
pages 9–16, New York, NY, USA, 2006. ACM.
[27] James W. Mickens and Brian D. Noble.
Modeling epidemic spreading in mobile environments.
In WiSe ’05: Proceedings of the 4th ACM workshop on Wireless security, pages
77–86, New York, NY, USA, 2005. ACM.
[28] Guanhua Yan and Stephan Eidenbenz.
Modeling propagation dynamics of bluetooth worms (extended version).
IEEE Transactions on Mobile Computing, 8(3):353–368, 2009.
[29] Guanhua Yan and Stephan Eidenbenz.
Bluetooth worms: Models, dynamics, and defense implications.
In ACSAC ’06: Proceedings of the 22nd Annual Computer Security Applications
Conference, pages 245–256, Washington, DC, USA, 2006. IEEE Computer Society.