UNIVERSITÀ DEGLI STUDI DI ROMA
"SAPIENZA"
Technical Report n.
Proceedings of the Second Italian Workshop on
PRIvacy and SEcurity
– PRISE –
Wednesday, 6 June 2007
Sheraton Roma Hotel
Viale Del Pattinaggio
Roma, Italia
PREFACE
Second Italian Workshop on PRivacy and SEcurity
Data and network security, by virtue of its impact on the country as a whole, and in particular on the economy and the safety of citizens, has by now become a central theme of the modern Information and Communication Society. Against this background, initiatives aimed at stimulating research, development and innovation in the field of information security have multiplied all over the world. The actors involved in these initiatives are not only universities and research institutes, but also private organizations and public administrations interested in building devices and applications that not only innovate production processes but also take the necessary security requirements into account.
In our country too, recent years have seen a proliferation of initiatives of the most varied kinds in this field. Several research groups have begun to work on specific topics in the area, university Master's programmes and degree courses on the subject have been launched, and many companies are engaged in research projects on topics that are central, or closely related, to information security.
After the success of the 2006 edition, the Second Italian Workshop on Privacy and Security (PRISE 2007) opens in Rome on 6 June 2007, under the patronage of the Master in Information Security and the Master in Information Security Management of the Computer Science Department of the Università di Roma "Sapienza", as well as of CLUSIT and Infosecurity.
Thanks to the many submitted papers, it has been possible to put together a programme that covers the main research themes of information security. In particular, the topics addressed include:
Anonymity and Pseudonymity
Applied Cryptography
Denial of Service
Digital Forensics
Electronic Privacy
Intrusion Detection Systems
Privacy-enhancing technology
Peer-to-Peer Security
Secure Hardware and Smartcards
Wireless network security
I could not be more pleased with the positive reception of PRISE 2007 by the Italian community, and my most sincere thanks go to all those who devoted their precious time and work to it. I would also like to express my deep gratitude to the members of the programme committee for their valuable and timely help.
Workshop chair
Prof. Luigi V. Mancini - Università di Roma “Sapienza”
COMMITTEES
Second Italian Workshop on PRivacy and SEcurity

WORKSHOP CHAIR
• Luigi V. Mancini - Università di Roma "Sapienza"

PROGRAMME COMMITTEE
• Maurizio Aiello - IEIIT CNR Genova
• Cosimo Anglano - Università Piemonte Orientale
• Massimo Bernaschi - IAC CNR Roma
• Claudio Bettini - Università di Milano
• Danilo Bruschi - Università di Milano
• Giuseppe Corasaniti - Magistrate
• Roberto Di Pietro - Università di Roma Tre
• Roberto Gorrieri - Università di Bologna
• Pino Italiano - Università di Roma "Tor Vergata"
• Pino Persiano - Università di Salerno
CONTENTS
Second Italian Workshop on PRivacy and SEcurity

Gene Tsudik – University of California, Irvine, USA and Università di Roma "La Sapienza"
On Privacy in Critical Internet Services (p. 1)

M. Esposito, C. Mazzariello, C. Sansone – Università di Napoli
A Network Traffic Anonymizer (p. 4)

S.P. Romano, L. Peluso, F. Oliviero – Università di Napoli
REFACING: an Autonomic approach to Network Security based on Multidimensional Trustworthiness (p. 8)

A. Dainotti, A. Pescape', G. Ventre – Università di Napoli
Wavelet-based Detection of DoS Attacks (p. 15)

A. Botta, A. Dainotti, A. Pescape', G. Ventre – Università di Napoli
Experimental Analysis of Attacks Against Intradomain Routing Protocols (p. 20)

E. De Cristofaro, C. Blundo, C. Galdi, G. Persiano – Università di Salerno and Università di Napoli
Validating Orchestration of Web Services with BPEL and Aggregate Signatures (p. 25)

Ivan Visconti – Dipartimento di Informatica e Applicazioni, Università di Salerno
PassePartout Certificates (p. 35)

L. Catuogno, C. Galdi – Università di Salerno and Università di Napoli
A Graphical PIN Authentication Mechanism for Smart Cards and Low-Cost Devices (p. 46)

M. Aiello, D. Avanzini, D. Chiarella, G. Papaleo – CNR IEIIT Genova
SMTP sniffing for intrusion detection purposes (p. 53)

C. Bettini, S. Mascetti, L. Pareschi – Università di Milano
The general problem of privacy in location-based services and some interesting research directions (p. 59)

C.A. Visaggio, G. Canfora, E. Costante, I. Pennino – Università del Sannio
Bottom-up approach to manage data privacy policy through the front end filter paradigm (p. 65)

D. Ariu, I. Corona, G. Giacinto, R. Perdisci, F. Roli – Università di Cagliari
Intrusion Detection Systems based on Anomaly Detection techniques (p. 73)

D. Adami, C. Callegari, S. Giordano, M. Pagano – Università di Pisa
A Statistical Network Intrusion Detection System (p. 78)

C. Caruso, D. Malerba, G. Camporeale – Università di Bari
Aggregation of network sensors messages by alarm clustering method: choosing the parameters (p. 82)

A. Savoldi, P. Gubian – Università di Brescia
Embedded Forensics: An Ongoing Research about SIM/USIM Cards (p. 90)

S. Aterno – Studio Legale Aterno and Università di Roma "La Sapienza"
Lavoro: le linee guida del Garante per posta elettronica e internet (p. 98)

A. Pasquinucci – UCCI.IT
A Practical Web-Voting System (p. 101)
On Privacy in Critical Internet Services
Professor Gene Tsudik
Computer Science Department
University of California Irvine
and
Dipartimento di Informatica
Universita’ degli Studi di Roma, “La Sapienza”
EMAIL: gts(AT)ics.uci.edu
Recent advances in network security and cryptography have enabled private
communication over the public Internet to some extent. For example, public key
cryptography allows entities to establish secure communications without pre-shared
secrets. Many applications build upon this capability. For example, ssh allows a user to
log in to a remote host from anywhere without leaking its secrets (e.g., a password) to
intermediate routers. (Similarly, SSL/TLS protects web client/server communication by
creating – via public key techniques – secure session-layer tunnels.) The IP Security
extension (IPsec [KA98]) takes this a step further by allowing the actual end-points of IP
packets to be concealed, via its tunnel mode.
More advanced anonymization techniques, such as onion routing and TOR [DMS04],
allow network-layer (IP) addresses of communicating hosts to be hidden from any
adversary with non-global observation powers. Such techniques clearly have their merits
in preserving secrecy of communication and privacy of communicating end-points (with
respect to intervening network elements, such as routers). However, they side-step
privacy issues in the bootstrapping process that precedes actual communication between
two hosts. Bootstrapping typically involves a domain name (DNS) query to resolve the
hostname of the target. Furthermore, it increasingly includes revocation checking of the
target host’s credentials (Public Key Certificate or PKC), which is particularly the case
with web-based communication.
For example, to communicate with the web site http://www.whitehouse.gov, a web client
(e.g., at a host my.freedom.nk) first sends a query to its DNS [MD88] resolver, asking for
the IP address of www.whitehouse.gov. This resolver, in turn, queries upper-level name
servers until the query is eventually resolved by the name server responsible for keeping
the record for www.whitehouse.gov. In this process, the privacy of both the source and
the destination host (my.freedom.nk and www.whitehouse.gov) is compromised: the source
host reveals to its resolver the target host it intends to communicate with; and the name
query reveals to remote name server(s) that someone is interested in the targeted
destination www.whitehouse.gov.
Public key certificate (PKC) revocation checking faces the same privacy issue. An
Internet user needs to verify that the target host’s credentials (e.g., a web server’s PKC)
are still valid before sending important information, such as credit card numbers. (Note
that this is different from establishing PKC authenticity, which is essentially a binding
between a public key and some claimed identity; it is attained by verifying a CA’s
signature on the PKC.) Despite appearing authentic, a PKC may be revoked prematurely
for a number of reasons, such as the loss or compromise of a private key, a change of
affiliation or job function, algorithm compromise, or a change in security policy.
Therefore, a user must check the revocation status of a PKC before accepting it as valid
[MA+99]. Similar to a DNS query, a revocation check reveals both the source and the
target of communication to the (potentially un-trusted) components of the revocation
checking system.
In a modern society preoccupied with the gradual erosion of electronic privacy, information
leakage of this magnitude is a cause for concern. Consider, for example, certain countries
with less-than-stellar human rights records where mere intent to communicate (indicated
by revocation checking or a name query) with an unsanctioned or dissident host or
website may be grounds for arrest or worse. In the same vein, a sharp increase in the
popularity of a website (deduced from it being a frequent target of revocation checking or
name queries) may lead unscrupulous authorities to conclude that something subversive is
going on. The problem can also manifest itself in other, less sinister, settings.
For example, many Internet service providers already keep detailed statistics and build
elaborate profiles based on their clients’ communication patterns. Current name service
and revocation checking methods – by revealing sources and targets of name or
revocation queries – represent yet another set of sources of easily exploitable and
potentially misused personal information. We therefore need to examine a clean-slate
solution towards a privacy-preserving Internet name service as well as more realistic
near-term solutions that lend themselves to DNS coexistence and gradual migration.
Since hiding sources of name or revocation queries can be achieved with modern
anonymization techniques, such as TOR, our research focuses on hiding the target of the
query from the third party service that answers the query. This is a challenging task
because of the conflicting goals. On the one hand, the service must have sufficient
information to resolve a query; but, on the other hand, it must not know the target of a
query so as to preserve the privacy of both source and target. Having already made some
progress in privacy-preserving revocation checking [ST06,NT07], our first step is to apply
the lessons learned in that domain to the challenge of preserving privacy in the Internet
name service. The latter, however, is a more formidable task because of its greater scale:
a revocation checking system only keeps records for certificates that have been revoked,
whereas an Internet-wide name service must keep a record for every hostname. Some recent
proposals to modify or re-design DNS offer some hope, e.g., [HG05,DCW05].
We are currently exploring novel data structures and state-of-the-art cryptographic
primitives to architect a scalable privacy-preserving name service. We are investigating
applications of various techniques that include range queries, verifiable secret sharing
(VSS) [GMW98] and private information retrieval (PIR) [CK+98].
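To make the last of these building blocks concrete, the following Python sketch implements the classic two-server XOR-based PIR scheme in the spirit of [CK+98]. It is a textbook variant written for illustration, not the authors' actual construction, and all function names are ours.

```python
# Hypothetical sketch of 2-server private information retrieval (PIR):
# neither server alone learns which record index the client wants.
import secrets

def pir_query(n, i):
    """Client: build two random query vectors whose XOR is the unit vector e_i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1  # q1 XOR q2 selects exactly position i
    return q1, q2

def pir_answer(db, q):
    """Server: XOR together the records selected by the query vector."""
    acc = 0
    for rec, bit in zip(db, q):
        if bit:
            acc ^= rec
    return acc

def pir_reconstruct(a1, a2):
    """Client: combine the two answers to recover record i."""
    return a1 ^ a2
```

Each server sees a uniformly random bit vector, so neither learns the queried index on its own; XORing the two answers cancels every record except the one at position i.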
References:
[CK+98] B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan. Private Information
Retrieval, J. ACM 45(6), pp. 965–981, 1998.
[DCW05] T. Deegan, J. Crowcroft, and A. Warfield. The main name system: An exercise
in centralized computing. ACM SIGCOMM CCR, 35(5):5–13, Oct. 2005.
[DMS04] R. Dingledine, N. Mathewson, and P. Syverson. ”Tor: The second-generation
onion router”, the 13th USENIX Security Symposium, Aug. 2004.
[HG05] M. Handley and A. Greenhalgh. The case for pushing DNS. HotNets IV,
November 2005.
[KA98] S. Kent and R. Atkinson. Security architecture for the internet protocol. Internet
Request for Comments: RFC 2401, November 1998.
[MD88] P. Mockapetris and K. Dunlap. Development of the Domain Name System.
ACM SIGCOMM 1988.
[MA+99] M. Myers, R. Ankney, A.Malpani, S. Galperin, and C. Adams. Internet public
key infrastructure online certificate status protocol - OCSP. Internet Request for
Comments: RFC 2560, 1999.
[ST06] J. Solis and G. Tsudik. Simple and flexible private revocation checking. In 2006
Workshop on Privacy Enhancing Technologies (PET’06), June 2006.
[NT07] M. Narasimha and G. Tsudik. Privacy-Preserving Revocation Checking. In 2007
EuroPKI Workshop, June 2007.
[GMW98] R. Gennaro, M. Rabin and T. Rabin, Simplified VSS and fast-track multiparty
computations with applications to threshold cryptography, ACM PODC’98.
A Network Traffic Anonymizer
M. Esposito¹,², C. Mazzariello¹, C. Sansone¹
¹ Dipartimento di Informatica e Sistemistica
Università degli Studi di Napoli Federico II, Napoli (Italy)
{mesposit,cmazzari,carlosan}@unina.it
² Università Campus Bio-Medico, Roma (Italy)
[email protected]
Abstract. Research in networking often relies on the availability of huge archives of traffic.
Unfortunately, due to the presence of sensitive information and to privacy issues, such archives
cannot always be distributed. Hence, tests and results obtained by using them cannot be
reproduced and validated. To this end, it is useful to have tools which remove sensitive
information from network traffic, making traffic traces freely distributable.
In this paper we present an approach to network traffic anonymization by means of a tool
which is flexible, easy to use, and multiplatform. We show that, even though it introduces
additional options with respect to other well-known anonymization tools, it keeps resource
usage within reasonable limits.
Keywords: Anonymity, Privacy, Intrusion Detection Systems.
1. Introduction
By processing information contained in network traffic archives, it is possible to
perform repeated experiments to gain knowledge about the network traffic properties,
and to develop techniques for exploiting such knowledge for several research
purposes.
However, due to laws protecting users' right to privacy, such archives cannot be
distributed as-is. To address this problem, some approaches propose methods aimed at
avoiding the distribution of the traces altogether, totally preventing information leakage [1].
These approaches, however, cannot be followed if traffic traces have to be used to
simulate and evaluate the performance of security systems, such as Intrusion
Detection Systems (IDSs). In this field, traffic traces are typically used to estimate and
validate models of either normal or anomalous traffic for a particular network
environment. In this case a different approach must be taken, bearing in mind that a
preliminary phase, aimed at deleting all private information from the network traffic,
is required by law in many countries. Network traffic archives, in fact, make it possible
to reconstruct the activities of each user and to harvest passwords, bank account and
credit card numbers, thus exposing users to many risks. Hence, tools for deleting or
disguising such sensitive information are needed.
In this paper we present our proposal for such a tool. In particular, we implemented
the possibility of anonymizing header fields at the Link, Network and Transport layers.
Furthermore, our tool also deletes sensitive information at the Application layer, by
obfuscating the whole payload while keeping the packet size intact. To this purpose,
we substitute the whole payload content with meaningless random bytes, and
recompute the packet checksum in order to obtain a traffic trace
containing valid packets. What other tools usually do, in fact, is truncate the payload,
thus producing malformed packets which cannot always be analyzed by traffic sniffers.
The anonymization strategy to adopt is tightly related to the application at hand. In
Sections 2 and 3 we describe the foreseen application context for our tool and, according
to its requirements, introduce the techniques used to anonymize each packet accordingly.
Finally, in Section 4 we show the tool's functionalities and compare its performance with
that of two other well-known anonymization tools [2, 3].
2. Anonymization in practice
2.1 Header Anonymization
In order for two applications to exchange data correctly, and for packets to be
well-formed, header fields must be correctly formatted. In the context of privacy
enforcement, we want to prevent the reconstruction of any activity performed by each
host. We can choose to hide information about the type of hardware used, the position
in the network, the subnet a host belongs to, and the type of service the communication
is bound to. According to each of the aforementioned requirements, an anonymization
tool must be able to modify, respectively: the MAC address, the IP address, and the
ports. For MAC addresses, as well as for port numbers, the process is very simple: we
only need to define an injective function for the transformation. For IP addresses,
instead, our approach makes it possible to use either a random injective function, a
class-preserving transformation, or a prefix-preserving transformation, as will be
shown in Section 3.
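The injective transformation used for MAC addresses and port numbers can be sketched in Python as follows. This is an illustration under our own naming, not the tool's actual code: each value seen in a trace is assigned a fresh random pseudonym exactly once, which makes the mapping consistent across packets and injective by construction.

```python
# Minimal sketch (hypothetical names) of an injective pseudonymization map
# for fixed-size identifier spaces such as ports (2^16) or MAC addresses (2^48).
import random

class InjectiveAnonymizer:
    def __init__(self, domain_size, seed=None):
        self.rng = random.Random(seed)  # seeded for reproducible traces
        self.domain_size = domain_size
        self.mapping = {}               # original value -> pseudonym
        self.used = set()               # pseudonyms already assigned

    def anonymize(self, value):
        if value not in self.mapping:
            pseudonym = self.rng.randrange(self.domain_size)
            while pseudonym in self.used:   # retry to guarantee injectivity
                pseudonym = self.rng.randrange(self.domain_size)
            self.mapping[value] = pseudonym
            self.used.add(pseudonym)
        return self.mapping[value]
```

Because each pseudonym is drawn only once, two distinct original values can never collide, and repeated occurrences of the same port or MAC address map to the same pseudonym throughout the trace.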
2.2 Payload Anonymization
While the header contains information about the sending and receiving hosts of each
packet, the payload contains the data produced by the applications communicating
through the network. Private data are commonly exchanged, including unencrypted
passwords, bank account and credit card numbers. The simplest approach to
anonymization consists in completely deleting such information. Alternatively, the
payload can be replaced by random symbols. In the first case malformed packets are
generated, as the size field in the header no longer corresponds to the actual size of the
packet, while in the second case the packet checksum has to be recomputed, as its
value depends on the actual content of the payload. Our tool implements the latter
strategy, using a technique for recomputing the checksum described in the next
Section.
3. Implementation Details
In this section we discuss some implementation details, motivating the choices we
made while developing our anonymization tool. We present two strategies for
anonymizing IP addresses, and some issues related to payload anonymization. For the
sake of brevity, we do not describe the live anonymization functionality in detail. The
developed software, called Anonymizer, is available on Sourceforge¹ for download.
3.1 Class Preserving
Class Preserving anonymization is a strategy mainly developed for IP address
anonymization. Its aim is to implement an injective transformation between original
and anonymized /8, /16 and /24 subnets. This means that, for example, if two
addresses belong to the same original /16 subnet, they will still belong to the same
anonymized /16 subnet after the transformation. Furthermore, the class of an IP
address is preserved by the anonymization: the first bits of the address are left
unchanged in order to maintain the address class.
3.2 Prefix Preserving
This anonymization strategy allows us to keep IP addresses grouped according to
the longest prefix they have in common. In other words, if two IP addresses share an
M-bit common prefix, they will still share an M-bit common prefix after
anonymization. This automatically ensures that IPs belonging to the same subnet are
mapped to IPs belonging to the same anonymized subnet.
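This property can be obtained, for instance, with a keyed bit-by-bit scheme in the style of Crypto-PAn: whether bit i is flipped depends only on the first i bits of the address, so equal prefixes are transformed equally. The Python sketch below is our own illustration of the idea, not the tool's implementation.

```python
# Sketch of a keyed, deterministic prefix-preserving IPv4 transformation.
import hashlib

def prefix_preserving(ip_int, key, bits=32):
    out = 0
    for i in range(bits):
        prefix = ip_int >> (bits - i)  # the i most significant bits seen so far
        h = hashlib.sha256(key + prefix.to_bytes(4, "big") + bytes([i])).digest()
        flip = h[0] & 1                # pseudorandom bit derived from the prefix
        orig_bit = (ip_int >> (bits - 1 - i)) & 1
        out = (out << 1) | (orig_bit ^ flip)
    return out
```

Because the flip decision for bit i depends only on the preceding bits, two addresses that share an M-bit prefix yield anonymized addresses that share exactly an M-bit prefix, and two distinct addresses never collide.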
3.3 Signature Preserving
Since we want our anonymization tool to be usable for network security problems,
we also want transformed traffic traces to remain usable for the evaluation of
signature-based IDSs. Hence, given a database of known signatures (such as the ones
used by Snort™ [4]), our software can obfuscate the whole payload except the desired
signatures. When a predefined string is found within the payload, it is not overwritten,
but copied unchanged to the same position in the anonymized payload.
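A minimal sketch of this behavior in Python (illustrative only; function and variable names are ours): the payload is replaced byte-for-byte with random data, then every occurrence of a known signature is written back at its original offset.

```python
# Hypothetical sketch of signature-preserving payload obfuscation.
import os

def obfuscate_payload(payload, signatures):
    out = bytearray(os.urandom(len(payload)))  # same length, random content
    for sig in signatures:
        start = payload.find(sig)
        while start != -1:
            out[start:start + len(sig)] = sig  # keep the signature intact
            start = payload.find(sig, start + 1)
    return bytes(out)
```

The packet size is unchanged, so the header length fields remain valid, and a signature-based IDS replayed on the anonymized trace still matches the preserved patterns.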
3.4 Checksum Correction
As stated before, we want our anonymizer to keep the packet size unchanged by
obfuscating the information contained in the payload instead of deleting it. In this
case, though, it is necessary to recompute the checksum of each packet. In order to
keep the anonymization time of each trace as low as possible, we implemented a
strategy for incremental checksum update, described in [5-7] and commonly used
when the TTL value in the packet header is changed.
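The incremental update of [7] (RFC 1624) can be sketched in Python as follows; `full_checksum` recomputes the 16-bit one's-complement checksum from scratch and is included only to show that the two approaches agree (function names are ours).

```python
def full_checksum(words):
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    s = sum(words)
    while s >> 16:                 # fold carries back into the low 16 bits
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def incr_checksum(old_cksum, old_word, new_word):
    """RFC 1624, Eqn. 3: HC' = ~(~HC + ~m + m'), where one 16-bit word
    of the packet changes from old_word (m) to new_word (m')."""
    c = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while c >> 16:
        c = (c & 0xFFFF) + (c >> 16)
    return ~c & 0xFFFF
```

Only the modified words contribute to the update, so the cost of fixing the checksum after payload obfuscation is proportional to the number of changed words rather than to the packet size.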
4. Experimental Results and Conclusions
In order to prove the effectiveness and usability of our tool, we compared it with
two other well-known tools, namely tcpdpriv [2] and AnonTool [3]. To this aim, a
brief summary of the functionalities of these tools is presented in Table 1, while Table
2 shows the execution time of each anonymization tool on different traffic traces.
¹ http://anonymizer.sourceforge.net/
TABLE 1. Functionalities of different anonymization tools (tcpdpriv [2], AnonTool [3], Anonymizer). The features compared are: Live Anonymization, Port Anonymization, Checksum Correction, Pattern Matching, Payload Darkening, MAC Address Anonymization, Prefix Preserving, Class Preserving, and FreeBSD/Linux compatibility. (The individual entries of the feature matrix could not be recovered from the transcription.)
TABLE 2. Anonymization time using different strategies and operating systems.

Linux - prefix preserving strategy
Tool          0.5 GB    1 GB
Anonymizer    46s       1m 46s
tcpdpriv      46s       1m 43s

FreeBSD - class preserving strategy
Tool          0.5 GB    1 GB
Anonymizer    26s       1m 53s
AnonTool      26s       1m 52s
Considering the data reported in both tables, it is possible to conclude that we
were able to develop a tool that extends the functionalities provided by [2] and [3]
while keeping execution times practically unchanged.
Acknowledgments
This work has been partially supported by the Ministero dell'Università e della
Ricerca (MiUR) in the framework of the PRIN Project “Robust and Efficient traffic
Classification in IP nEtworks” (RECIPE).
References
[1]. Mogul J. C., Arlitt M., “SC2D: An Alternative to Trace Anonymization” in Proceedings of the
2006 SIGCOMM Workshop on Mining Network Data, Pisa, Italy, 2006, pp. 323 - 328.
[2]. TcpdPriv: A program for eliminating confidential information from traces.
http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html
[3]. Koukis D., Antonatos S., Antoniades D., Trimintzios P., Markatos E.P., “A Generic
Anonymization Framework for Network Traffic” in Proceedings of IEEE International
Conference on Communications 2006 (ICC 2006), vol. 5, June, Istanbul, Turkey, 2006, pp.
2302-2309.
[4]. Beale J., Foster J.C., Snort 2.0 Intrusion Detection, Rockland, MA: Syngress Publishing, Inc.,
2003.
[5]. Braden R., Borman D., Partridge C., RFC 1071 - Computing the Internet Checksum, 1988.
[6]. Mallory T., Kullberg A., RFC 1141 - Incremental Updating of the Internet Checksum, 1990.
[7]. Rijsinghani A., RFC 1624 - Computation of the Internet Checksum via Incremental Update,
1994.
REFACING: an Autonomic approach to Network
Security based on Multidimensional
Trustworthiness
F. Oliviero, L. Peluso, S.P. Romano
University of Napoli “Federico II”
Abstract
Several research efforts have recently focused on achieving distributed
anomaly detection in an effective way. As a result, new information fusion
algorithms and models have been defined and applied in order to correlate information from multiple intrusion detection sensors distributed inside the network. In this field, an approach which is gaining momentum
in the international research community relies on the exploitation of the
Dempster-Shafer (D-S) theory. Dempster and Shafer have conceived a
mathematical theory of evidence based on belief functions and plausible
reasoning, which is used to combine separate pieces of information (evidence) to compute the probability of an event. However, the adoption of
the D-S theory to improve distributed anomaly detection efficiency generally involves facing some important issues. The most important challenge
definitely consists in sorting the uncertainties in the problem into a priori
independent items of evidence. We believe that this can be effectively carried out by looking at some of the principles of autonomic computing in
a self-adaptive fashion, i.e. by introducing support for self-management,
self-configuration and self-optimization functionality.
In this paper we intend to tackle some of the above mentioned issues
by proposing the application of the D-S theory to network information
fusion. This will be done by proposing a model for a self-management
supervising layer exploiting the innovative concept of multidimensional
reputation, which we have called REFACING (RElationship-FAmiliarity-Confidence-INteGrity).
1 Introduction
As computer attacks become more and more sophisticated, the need for effective
intrusion detection methods increases. Current best practices for protecting networks
from malicious attacks rely on the deployment of an infrastructure that includes
network intrusion detection systems. However, most such practices suffer from
several deficiencies, such as the inability to detect distributed or coordinated attacks
and high false alarm rates. Indeed, detecting
intrusions becomes a hard task in any networked environment, since a network
naturally lends itself to a distributed exploitation of its resources. In such a
scenario, the identification of a potential attack requires that information is
gathered from many different sources and in many different places, since no
locality principle (neither spatial nor temporal) can be fruitfully applied in the
most general case.
The classical approaches to distributed protection of a network rely on the
effective dissemination of probes and classifiers/analyzers across the infrastructure.
We claim that current solutions to the above mentioned issues lack two
fundamental features, namely dynamicity and trustworthiness. Indeed, in our
view a network should be capable of self-protecting against attacks by means of
an autonomic approach which strongly depends on the effective exploitation, in
each node, of on-line information coming both from local analysis of traffic and
from synthetic information delivered by neighboring nodes. Self-organization
demands an uncoordinated capability to appropriately orchestrate the behavior of
a number of distributed components. Besides this, the second challenge we identify
resides in the need for an agreed-upon means of deciding whether or not information
coming from the outside world can be assumed to be reliable.
In this paper we discuss the main issues related to improving network security through manipulating and combining data coming from multiple sources.
We present a model for a self-management supervising layer exploiting the innovative concept of multidimensional reputation.
2 Detection from multiple sources
As soon as one starts spreading detection components across a network, the issue
arises of how to appropriately orchestrate their operation. In fact, the information
retrieved from a single sensor is usually limited and sometimes of low accuracy.
The use of multiple sensors is a valid alternative for inferring additional information
about the environment in which the sensors operate. To this aim, many research
efforts have been conducted with the goal of defining effective approaches for
combining information coming from multiple sources. Data fusion deals with the
combination of information produced by different sensors, with the final aim of
improving both the accuracy of the classification process and the reliability of the
decision-making process.
Clearly, any approach relying on information fusion brings in some contrasting
points. If, on one hand, the data fusion process can greatly improve the reliability of
detection, on the other hand it also makes a strong hypothesis on the reliability of the
information that is subject to the analysis. Stated in different terms, as soon as a node
starts relying on data coming from the outside world, it has to make sure that such
data can be considered as reliable as its local information, in order to prevent the
fusion process from performing even worse than it would in the absence of
cooperation. This adds a further level of complexity to the overall intrusion detection
system. Ideally, local and foreign decisions should be associated with a corresponding
weight, representing the current level of trustworthiness assigned to the originating
source. In this scenario, each decision in a single node would be taken by
appropriately measuring a weighted combination of local and foreign data, with
weights that vary in time as a function of the reliability of all participating nodes
along their past history.
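To make the fusion step concrete, the following Python sketch applies Dempster's rule of combination to the basic belief assignments of two sensors over the frame {normal, attack}. This is the textbook rule from the D-S theory cited above, not the REFACING weighting scheme itself, and all names are ours.

```python
# Illustrative sketch of Dempster's rule of combination: the basic belief
# assignments (mass functions) of two sensors are fused, and mass falling
# on conflicting (empty) intersections is discarded and renormalized away.
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions whose focal elements are frozensets."""
    fused = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + x * y
        else:
            conflict += x * y   # total conflicting mass K
    norm = 1.0 - conflict       # renormalization factor 1 - K
    return {k: v / norm for k, v in fused.items()}

# Example: sensor 1 strongly suspects an attack; sensor 2 is less certain.
NORMAL = frozenset({"normal"})
ATTACK = frozenset({"attack"})
EITHER = NORMAL | ATTACK        # total ignorance
m1 = {ATTACK: 0.6, EITHER: 0.4}
m2 = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = combine(m1, m2)
```

In the example, the fused mass on ATTACK exceeds what either sensor assigned on its own, while the mass assigned to contradictory combinations (one sensor says attack, the other says normal) is renormalized away.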
While simple in its formulation, the above scenario is indeed ideal, in the sense
that it is not at all easy to dynamically set weights in an ever-changing environment
such as a network crossed by a variegated portfolio of potential traffic profiles (with
each profile subject to unpredictable changes in space and time). Hence, our research
aims to provide some insights specifically suited to tackling the above mentioned
issue. To this
aim, we propose to exploit the concept of weighted information fusion in a
highly dynamic fashion. The key issue we are addressing is that of dynamically
changing the values of the weights assigned to information sources in such a
way as to let them concretely follow the current level of reliability of the sources
themselves. The system we devise can be compared to a dynamic controller
which appropriately sets the values of the parameters of a control function in
which the variables to be tuned represent the decisions taken at different points
of the network.
By summarizing the above considerations, we can easily identify one major
challenge, concerning the need to effectively measure the level of trustworthiness
to be assigned to both local and foreign decisions.
In the remainder of this paper we touch upon the above issue. We introduce a new
model for determining the degree of loyalty of a node, based on a multidimensional
framework (REFACING – RElationship-FAmiliarity-Confidence-INteGrity) that
envisages a thorough analysis of the relations between each pair of interacting nodes.
3 Autonomic Communications
In recent years, we have witnessed many radical changes in the way computer
networks are conceived. The on-going convergence of networked infrastructures and
services, in fact, has changed the traditional view of the network from the simple
wired interconnection of a few manually administered homogeneous nodes to a
complex infrastructure encompassing a multitude of different technologies,
heterogeneous nodes, and diverse services. This situation has challenged the research
community to engineer systems and architectures that increase the robustness of
current and future internetworks while alleviating both management costs and
operational complexity. The autonomic communications research community has
been formed to respond to this challenge.
From this perspective, Autonomic Communication (AC) represents an emerging
paradigm for today's networked cooperation. Many efforts have been devoted to
proposing its most appropriate definition and its application in different real-world
scenarios. Building on interdisciplinary grounds, AC tackles the problem by
developing architectures and models of networked systems that can manage
themselves in a reliable way while always fulfilling their service mission. In fact, the
essence of autonomic computing systems lies in their self-management requirements,
the intent of which is to free system administrators from the details of system
operation and maintenance, and to allow systems to manage themselves given
high-level objectives.
Independently of networked systems’ behaviors and purposes, the following properties should be exhibited by any autonomic computing system in order to fulfill self-management needs:
• Automatic: this essentially means being able to self-control its internal functions and operations. As such, an autonomic system must be self-contained and able to operate without any external intervention;
• Adaptive: an autonomic system must be able to change its operation. This
will allow the system to cope with temporal and spatial changes in its operational context either long term (environment customization/optimization)
or short term (exceptional conditions such as faults, attacks);
• Aware: an autonomic system must be able to monitor its operational
context as well as its internal state in order to be able to assess if its
current operation serves its purpose. Awareness will control adaptation
of its operational behavior in response to situation or state changes.
The sequence of the above-mentioned properties highlights the basic principle of the Autonomic Computing paradigm. Any autonomic system must have a sensing capability in order to enable the overall system to observe its external operational context and to self-adapt its behavior to fit any environmental change.
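The three properties can be read as a single sense-adapt loop. The toy sketch below (class, method and threshold names are our own illustrative assumptions, not taken from the paper) shows how awareness drives adaptation without external intervention:

```python
# Minimal sketch of an autonomic sense/adapt loop combining the three
# properties above (all names and thresholds are illustrative).
class AutonomicNode:
    def __init__(self, normal_load=0.5):
        self.mode = "normal"
        self.normal_load = normal_load

    def sense(self, load):           # Aware: monitor context and state
        return "overloaded" if load > self.normal_load else "normal"

    def adapt(self, situation):      # Adaptive: change its operation
        self.mode = "degraded" if situation == "overloaded" else "normal"

    def step(self, load):            # Automatic: no external intervention
        self.adapt(self.sense(load))
        return self.mode

node = AutonomicNode()
```

Each `step` closes the loop: the node observes its load, classifies the situation, and switches its operating mode accordingly.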
3.1 Applying AC principles to Reputation Assessment
Autonomic Communication systems support dynamic coalitions of users or entities sharing common interests. In this context, self-management approaches become fundamental to enforce “law and order” through distributed and loosely coupled schemes based on democratic rules, thereby avoiding the complexity and rigidity of centralized control at one extreme, and the complete anarchy leading to irrelevant information, malicious or free behavior at the other extreme.
Therefore, the need arises to reach the following objectives: (i) to distribute
community control into the community itself in order to allow self-management;
(ii) to detect, remove and isolate malicious and malfunctioning components; (iii)
to identify components that are overloaded or prone to failure or simply have
lower capabilities.
4 Dynamically renewing network nodes’ reputation
The model we propose to assess the reputation of network components taking part in the distributed detection process is called REFACING (RElationship-FAmiliarity-Confidence-INteGrity) and is based on a multi-layered approach, as depicted in Fig. 1.
The lowermost layer provides information about the existence of some form of connection among detection components (probes, detection engines, decision engines, etc.). The absence of a connection indicates the actual impossibility of carrying out any form of social relationship with the other nodes of the network. Otherwise, the second layer in the stack can prove useful to quantitatively measure the level of interaction existing between each pair of network nodes. The more we interact, the more familiar we become with each other. However, this does not necessarily imply that we trust each other: I can know you quite well, but (or even better, just because of this) I can hardly trust you if our past interactions showed me that you are not that reliable. This is the reason why we introduce the third layer of the trustworthiness stack, which deals
Fig. 1: The REFACING model (a four-layer stack: Relationship, Familiarity, Confidence, Integrity, from bottom to top)
with confidence. If I have relations with others, and if I am familiar with the
others as well, I can much more objectively determine their level of trustworthiness with respect to our social interactions. This said, to further foster the
capability of assessing someone else’s loyalty level related to his/her interactions
in the network, one more dimension should be taken into account to somehow
reflect the variability in the behavioral interaction patterns of each node. To make things clearer, the fact that some node has shown a blameless behavior in one single interaction does not necessarily mean that such a node will be irreproachable also in its subsequent interactions. Some form of estimation of the line of conduct over time is definitely needed for all nodes: the more coherent my behavior has been in the past, the less probable it is that I will behave badly in the near future. This is dealt with at the uppermost layer, which
provides information about the level of integrity of network nodes.
We do believe that the adoption of such a multi-layered model helps add
objectivity to the assessment of network nodes’ reputation, since it takes into
account a number of complementary, though highly correlated, facets.
In our view, the REFACING methodology is implemented at the level of management of the overall infrastructure, as depicted in Fig. 2. The management layer has a global view of the physical topology of the network and is thus capable of determining whether or not there exists some form of relationship (layer 1 in the trustworthiness stack) between the network nodes. Furthermore, thanks to monitoring, it can also determine the frequency of the interactions among the network elements (layer 2 of the stack). Information pertaining to the third layer can be retrieved through a comparison between each evaluation
provided by a single node and the global opinion of the system (e.g. my confidence level gets higher if my personal evaluation was found in accordance with the final decision taken by the distributed detection system after analyzing all the single decisions coming from the network nodes).

Fig. 2: The REFACING methodology (REFACING peers exchanging events and labels through the REFACING Management Layer, in a scenario comprising several attackers and a target)

Finally, data at the fourth
layer can be computed by statistically analyzing the information related to all
past interactions among all underlying nodes (e.g. my integrity level gets higher
if my confidence level has kept on growing over the past interactions).
After each evaluation turn, the management layer can compute a set of
labels (one for each network node involved in the detection process), which
are assigned to the nodes through, for example, a policy-based approach. The
label computation process can be as general as possible and will normally be
influenced by information belonging to all of the layers in the trustworthiness
stack (in a simplistic scenario, it might for example be a simple weighted sum
of the values computed at each of the four layers). The labels are then used by all nodes whenever they start a new interaction. Each label acts like a business card for the node involved in the interaction and can be used by the other nodes to assign a weight to the information received from that partner.
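In the simplistic weighted-sum scenario mentioned above, label computation and label-based weighting could be sketched as follows (the weights, value ranges and function names are our own illustrative assumptions, not taken from the paper):

```python
# Hypothetical sketch: a REFACING label as a weighted sum of the four
# trustworthiness layers (weights are illustrative, not from the paper).
WEIGHTS = {"relationship": 0.1, "familiarity": 0.2,
           "confidence": 0.4, "integrity": 0.3}

def refacing_label(layers):
    """layers: dict mapping layer name -> value in [0, 1]."""
    return sum(WEIGHTS[name] * layers[name] for name in WEIGHTS)

def weighted_opinion(value, label):
    """Weigh a peer's reported value by its current label."""
    return label * value

node = {"relationship": 1.0, "familiarity": 0.8,
        "confidence": 0.6, "integrity": 0.9}
label = refacing_label(node)   # 0.1 + 0.16 + 0.24 + 0.27 = 0.77
```

Any opinion reported by this node would then enter the fusion process scaled by its label, so that low-reputation peers contribute less.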
5 Conclusions
In this short paper we presented a novel approach to the distributed detection of network threats. The core of our contribution resides in having designed a self-management layer exploiting the concept of trustworthiness in order to make the detection process more reliable. The idea of dynamically tuning the currently estimated level of trust of each peer in the community proves fundamental during the information fusion process, which in our architecture is based on the application of an enhanced version of the well-known Dempster-Shafer theory of evidence. This enhanced version of the D-S formula appropriately weighs the various inputs to the information fusion process on the basis of their estimated impact on the final merged information.
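The paper does not spell out its weighted D-S variant; as a hedged illustration of the general idea, the sketch below combines two sensors' mass functions over the frame {attack, normal} with Dempster's rule, after discounting each mass function by a reliability weight (Shafer's classical discounting), which is one standard way of weighing evidence by trust:

```python
# Sketch: evidence discounting + Dempster's rule over {attack, normal}.
# The paper's exact weighted variant is not given here, so classical
# Shafer discounting is used as an assumption.
THETA = frozenset({"attack", "normal"})

def discount(m, alpha):
    """Discount a mass function m (dict: frozenset -> mass) by a
    reliability factor alpha in [0, 1] (Shafer's discounting)."""
    out = {a: alpha * v for a, v in m.items()}
    out[THETA] = out.get(THETA, 0.0) + (1.0 - alpha)
    return out

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    raw, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb
    norm = 1.0 - conflict
    return {a: v / norm for a, v in raw.items()}

A = frozenset({"attack"})
m1 = discount({A: 0.9, THETA: 0.1}, alpha=0.8)  # trusted sensor
m2 = discount({A: 0.5, THETA: 0.5}, alpha=0.3)  # low-reputation sensor
fused = combine(m1, m2)
```

Here the low-reputation sensor's mass is mostly shifted to the whole frame before combination, so its report barely moves the fused belief.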
The REFACING approach has been tested through extensive measurements based on simulation (see http://sourceforge.net/refacing). The experimental campaign (which is not illustrated here for the sake of conciseness) has shown that our solution helps dramatically improve the overall performance of the detection process in a number of real-world operational scenarios. On the other hand, it has also helped us identify the limits of our approach when applied to situations envisaging the presence of a high number of unreliable sensors whose responses can negatively bias the output of the information fusion process towards a faulty decision.
Wavelet-based Detection of DoS Attacks
Alberto Dainotti, Antonio Pescapé, and Giorgio Ventre
University of Napoli “Federico II” (Italy), {alberto,pescape,giorgio}@unina.it
I. INTRODUCTION
Accurate detection and classification of anomalies in IP networks is still an open issue due to the intrinsic
complex nature of network traffic. Several anomaly detection systems (ADS) based on very different approaches
and techniques have been proposed in the literature. Please refer to [1] for a list of related works. In this work we propose an approach to anomaly detection, based on the wavelet transform, which we tested against several types of DoS attacks. This approach presents several differences with respect to past works. First, we make use of the Continuous Wavelet Transform (CWT), exploiting its interpretation as the cross-correlation function between the input signal and wavelets, and its redundancy in terms of available scales and coefficients. All previous works, instead, are based on the use of the Discrete Wavelet Transform (DWT), which is more oriented to the decomposition of the signal over a finite set of scales, each one with a reduced number of coefficients, in order to make the original signal reconstructable from them; this is typically done in a way that avoids redundancy. Second, our detection approach explicitly takes into account, besides hits and false alarms, the accuracy of the estimation of the time interval during which the anomalous event happens and the resolution (in terms of the ability to distinguish between subsequent anomalies). In the context of security incidents, these aspects can be crucially important, for example to trace back the source of an attack or during forensic analysis. Third, we propose a cascade architecture made of two different systems (the first one based on classical ADS techniques for time series, the second one based on the analysis of wavelet coefficients) which allows more flexibility and performance improvements as regards the hits/false alarms trade-off. Fourth, we present an experimental analysis of the performance of the system under an extensive set of attack/traffic-trace combinations (≈ 15000).
II. AN ANALYTICAL BASIS
The Continuous Wavelet Transform (CWT) is defined as:
f_{CWT}(a, b) = \int_{-\infty}^{+\infty} f(t)\,\psi^{*}_{ab}(t)\,dt = \langle f(t) \,|\, \psi_{ab}(t) \rangle, \quad \text{with} \quad \psi_{ab}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad (1)
f(·) is the signal under analysis, ψ(·) is a function of finite energy whose integral over R is 0, called the mother wavelet, and a and b are the scaling and translation factors, respectively. Each (a, b) pair furnishes a wavelet coefficient, which can also be seen as the cross-correlation at lag b between f(t) and the mother wavelet function
dilated to scaling factor a. An important difference between the CWT and the DWT is that the former calculates
such correlation for each lag at every possible scale, whereas the DWT calculates a number of coefficients that
decreases with the scaling factor. The scale of the coefficients’ global maximum is where the input signal is most similar to the mother wavelet. This function is chosen to be oscillating but with a fast decay from the center to its sides, in order to have good scale (frequency) and time localization properties. This makes the CWT a good tool for analyzing transient signals such as network traffic time series. In the context of the study of wavelets and image processing, it has been proved that the local maxima of a wavelet transform can detect the location of irregular structures in the input signal [4]. Please refer to [1] and [4] for analytical details. Here we just summarize that, by using the derivative of a smoothing function as a mother wavelet (e.g. derivatives of the Gaussian function), the zero-crossings or the local extrema of the wavelet transform applied to a signal indicate the locations of its sharp variation points and singularities. The redundancy of the CWT coefficients allows these points to be identified at every scale with the same time resolution as the input signal.
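As an illustration of these properties, the following sketch (our own minimal implementation, not the Wavelab code used by the authors) cross-correlates a signal with a first-derivative-of-Gaussian wavelet dilated at several scales; the coefficient magnitudes peak at a sharp variation point at every scale, with one coefficient per input sample:

```python
import numpy as np

def gauss_deriv(t):
    # First derivative of a Gaussian: a smoothing-function derivative
    # usable as a mother wavelet for singularity detection.
    return -t * np.exp(-t ** 2 / 2.0)

def cwt(signal, scales, support=4.0):
    """Naive CWT: cross-correlate the signal with the dilated wavelet,
    keeping one coefficient per sample at every scale (CWT redundancy)."""
    coeffs = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        n = int(np.ceil(support * a))
        t = np.arange(-n, n + 1, dtype=float)
        w = gauss_deriv(t / a) / np.sqrt(a)
        coeffs[i] = np.correlate(signal, w, mode="same")
    return coeffs

# A step-like rate change is localized by the coefficient extrema.
x = np.concatenate([np.ones(200), 3.0 * np.ones(200)])
x = x - x.mean()                     # remove mean to limit border effects
C = cwt(x, scales=[2, 4, 8, 16])     # shape (4, 400): one row per scale
```

In each row of `C`, the largest-magnitude coefficient sits at the step near sample 200, illustrating the same time resolution at every scale.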
The content of this extended abstract is part of the paper: A. Dainotti, A. Pescapè, G. Ventre, “Wavelet-based Detection of DoS Attacks”, 2006 IEEE GLOBECOM, Nov. 2006, San Francisco (CA, USA).
Fig. 1. Anomaly Detection System: Proposed Architecture (Rough Detection stage: Normal Behavior Model Construction and Detection-R operating on the input signal; Fine Detection stage: CWT Computing, Signal Analysis and Threshold Calculation, supported by a Library of the Anomalies, feeding Detection-F, which outputs the detection signal).
III. ARCHITECTURE
In Fig. 1 a block diagram representing the two-stage architecture of the proposed ADS is shown. The ADS
takes as input a time series of samples representing the packet rate and outputs an ON-OFF signal reporting the
presence of an anomaly for each sample. The first stage, which we call Rough Detection, can be implemented using statistical anomaly detection techniques previously presented in the literature, and is responsible only for detecting any suspicious change in the traffic trend and reporting an alarm to the second stage. Its output is equal to 0 or 1 for each input sample. Here we impose a high sensitivity, aiming at catching as many anomalies as possible, whereas the second stage, which we call Fine Detection, is designed to reduce the number of false alarms. For each detected anomaly, this stage also estimates the time interval during which it is present.
As for the Rough Detection module, we adopted the two techniques proposed in [2] to detect SYN flooding
attacks (an adaptive threshold algorithm and the CUSUM algorithm) and we applied them to generic traffic traces.
A similar implementation of the CUSUM algorithm has also been proposed in [3] to detect different DoS attacks.
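The two rough-detection algorithms can be sketched as follows (parameter names and values are illustrative assumptions on our part; the actual implementations, described in [1] and [2], differ in their details):

```python
def adaptive_threshold(x, alpha=0.5, beta=0.98, k=3):
    """Alarm when k consecutive samples exceed (1 + alpha) times an
    EWMA estimate of the mean rate (illustrative parameters)."""
    mu, run, out = x[0], 0, []
    for s in x:
        run = run + 1 if s > (1.0 + alpha) * mu else 0
        out.append(1 if run >= k else 0)
        mu = beta * mu + (1.0 - beta) * s  # update the mean estimate
    return out

def cusum(x, drift=1.0, h=5.0, beta=0.98):
    """One-sided CUSUM on the deviation from an EWMA mean estimate;
    alarm when the cumulative statistic exceeds threshold h."""
    mu, g, out = x[0], 0.0, []
    for s in x:
        g = max(0.0, g + (s - mu - drift))
        out.append(1 if g > h else 0)
        mu = beta * mu + (1.0 - beta) * s
    return out

trace = [10.0] * 50 + [30.0] * 20 + [10.0] * 50  # synthetic flood
```

On this toy trace both detectors stay silent over the normal segment and raise alarms shortly after the rate jump at sample 50, mirroring the high-sensitivity role of the rough detection stage.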
More details on our implementation of these well-known algorithms are given in [1]. The CWT computing block
computes the continuous wavelet transform of the whole input signal. We used the Wavelab [5] set of routines
under the Matlab environment. The block output is a matrix W of M rows and N columns, where N is the number
of samples of the input trace. Each row reports the wavelet coefficients at a different scale. The number of available scales M is given by the number of octaves, J = ⌊log2 N⌋ − 1, times the number of voices per octave. The CWT function implemented in Wavelab allowed us to work with 12 voices per octave. This matrix is fed as an input
to the Detection-F block, which receives as inputs also a threshold level (that will be explained in the following)
and the Rough Detection Signal. For each alert reported in the Rough Detection Signal, the Detection-F block
basically operates by looking for the scale at which the coefficients reach the maximum variation. The use of the
CWT guarantees that we have a coefficient for each input sample at every scale - differently from the DWT, where
typically the number of coefficients decreases as the scale grows. This way, if an anomaly is recognized, we can
identify with good precision the zero-crossing points of the wavelet coefficients at the scale where the anomaly is
present. The choice of the threshold level for the wavelet coefficients (Threshold Calculation block) is based on
the mean and standard deviation of the traffic trace, computed in the Signal Analysis block, and on the Library of
Anomalies, which is a collection of signals representing some traffic anomalies (see Section IV).
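With the values above, the size of the coefficient matrix works out as follows (a simple arithmetic check of the formula; reading [log2 N] as the floor function is our assumption):

```python
import math

# Number of CWT scales M = J * voices, with J = floor(log2 N) - 1
# octaves, as stated in the text.
def num_scales(n_samples, voices_per_octave=12):
    octaves = math.floor(math.log2(n_samples)) - 1
    return octaves * voices_per_octave

# A 3600-sample trace: floor(log2 3600) = 11, so J = 10 octaves and
# M = 120 scales (rows of the coefficient matrix W).
m = num_scales(3600)
```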
IV. TRAFFIC TRACES AND ANOMALIES
To study and develop our ADS, we made several experiments under a broad range of situations. Our approach
was to generate traffic signals superimposing anomaly profiles to real traffic traces in which no anomalies were
present. This choice is partly due to the scarce availability of traffic traces containing classified anomalies along
with all the necessary details. For example, the lack of information on the exact beginning and end of each anomaly
would not allow us to evaluate the temporal precision of the detection system. We considered real traffic traces that
were known not to contain any anomalies, obtaining a large and heterogeneous set of traces. In Table I the data sets
we used are summarized. The first three groups of traces in Table I were derived from the DARPA/MIT Lincoln
Laboratory off-line intrusion detection evaluation data set [6], which has been widely used for testing intrusion detection systems and has been referred to in many papers (e.g. [7], [8]). We used only traces from the weeks in which
no attacks were present. The dataset marked in Table I as UCLA refers to packet traces collected during August
2001 at the border router of Computer Science Department, University of California Los Angeles [9]. They have
16
been collected in the context of the D-WARD project [10]. Finally, the UNINA data set refers to traffic traces we
captured by passively monitoring ingoing traffic at the WAN access router at University of Napoli “Federico II”.
Table I contains details about the data sets, such as the number of traces in each group and the sampling period Ts used to calculate the packet rate time series. Indicative values of the mean and standard deviation (std) are also shown.
All traces are composed of 3600 samples.
Several kinds of anomaly profiles related to DoS attacks have been synthetically generated. We assigned labels
to each anomaly we used (see Table II). Some anomaly profiles were obtained by generating traffic with real DDoS
attack tools, TFN2K [11] and Stacheldraht [12]. We launched such tools with several different options and we
captured the traffic that was generated by them. The anomaly profiles obtained were stored and labeled depending
on the adopted attacking technique. Another group of anomalies has been obtained by synthetically generating the corresponding time series with Matlab, according to known profiles that have been considered in [13]. We considered ‘Constant Rate’, ‘Increasing Rate’, and ‘Decreasing Rate’ anomalies.
TABLE I: TRAFFIC TRACES.

Data Set | Year | Ts | # Traces | Mean        | Std
Darpa 1  | 1999 | 2s | 5        | 80 pkt      | 90 pkt
Darpa 2  | 1999 | 5s | 5        | 20 pkt      | 40 pkt
Darpa 3  | 1999 | 5s | 5        | 12 pkt      | 30 pkt
UCLA     | 2001 | 2s | 4        | 20 pkt      | 15 pkt
UNINA    | 2004 | 2s | 3        | 8·10^3 pkt  | 1.3·10^3 pkt

TABLE II: TESTED ANOMALIES.

Tools        | Anomalies
Matlab       | Constant rate, Increasing rate, Decreasing rate
TFN2K        | ICMP Ping flood, TCP SYN flood, UDP flood, Mix flood, Targa3 flood
Stacheldraht | TCP ACK flood, TCP ACK NUL flood, TCP random header attack, Mstream (TCP DUP ACK), DOS flood, mass ICMP bombing, IP header attack, SYN flood, UDP flood
V. EXPERIMENTAL RESULTS
The experimental results shown have been obtained by performing a large set of automated tests. The results have been summarized and the following performance metrics have been calculated: (i) the Hit Rate, HR = (number of test hits / number of tests) × 100; (ii) the False Alarms Ratio, FAR = (number of false alarms / total number of alarms) × 100; (iii) the estimation errors in the identification of the beginning and the end of the anomaly; (iv) the number of fragments when a single anomaly is recognized as several ones. Our scripts generated traces containing anomalies with various combinations
of parameters and ran the ADS on each of them. In order to test the ADS under more complicated situations (i.e.
obfuscating the anomalies in the traces), when a trace and an anomaly profile are selected, the amplitude and the
duration of the signal representing the anomaly are modified. Then the signal is superimposed on the traffic trace at a randomly selected point (at 1/4, 1/2, or 3/4 of the trace) and the detection system is executed. For a specific trace, the amplitude of an anomaly was scaled in order to make its maximum peak proportional to the root mean square of the original traffic trace. The proportionality factor varies from 0.5 to 2.00 with a step of 0.25. Anomaly durations range from 50 to 300 samples with a step of 50. The anomaly profiles were resampled (by interpolation or decimation) to expand or shorten them. Thus we performed a number of tests given by the product (traces × anomalies × intensities × durations). With 22 traces and 16 anomalies, we performed about 15000 tests each time we tested a system configuration (i.e. with CUSUM, with AT, etc.).
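The trace-generation step described above can be sketched as follows (function and parameter names are ours; the authors' actual scripts are not published here):

```python
import numpy as np

# Hypothetical sketch of the test-generation step: resample an anomaly
# profile to the desired duration, scale its peak proportionally to the
# trace RMS, and superimpose it at 1/4, 1/2 or 3/4 of the trace.
def superimpose(trace, profile, rel_amplitude, duration, position):
    trace = np.asarray(trace, dtype=float)
    rms = np.sqrt(np.mean(trace ** 2))
    # resample the profile to 'duration' samples by interpolation
    t_new = np.linspace(0, len(profile) - 1, duration)
    prof = np.interp(t_new, np.arange(len(profile)), profile)
    prof *= rel_amplitude * rms / prof.max()  # peak = rel_amp * RMS
    start = int(position * len(trace))
    out = trace.copy()
    out[start:start + duration] += prof
    return out

trace = np.full(3600, 20.0)            # flat toy trace (20 pkt/sample)
profile = np.ones(100)                 # constant-rate anomaly shape
attacked = superimpose(trace, profile, rel_amplitude=1.0,
                       duration=200, position=0.5)
```

Sweeping `rel_amplitude` over 0.5-2.0 and `duration` over 50-300 would reproduce the combinatorial test plan described in the text.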
In Table III we show the system performance, in terms of HR and FAR, when the rough detection block is implemented with the AT and CUSUM algorithms. We report results obtained separately for each of the 5 trace data sets and, in the last row, we show global results obtained working with all the traces. The columns labeled FD(AT) and FD(CUSUM) report performance indicators derived from the output of the fine detection stage when the rough detection stage is AT and CUSUM, respectively. The performance results related just to the output of the rough detection stages are instead reported in the columns labeled RD(AT) and RD(CUSUM). This is to show how we tuned the rough detection stage with a very high sensitivity in order to catch as many anomalies as possible at the expense of a high FAR. Indeed, passing from the rough detection output to the fine detection output, while HR remains almost the same, FAR decreases dramatically. This happens for all the sets of traces, and for both AT and CUSUM, and it represents one of the most important features of the proposed ADS.
In order to sketch a comparison between the proposed two-stage ADS and AT or CUSUM used as standalone algorithms, in the columns labeled AT-sa and CUSUM-sa we show how they perform in terms of HR when tuned with approximately the same FAR as the proposed ADS. We see that, in the case of AT, the introduction of the second stage improves HR by about 10% for 3 out of 5 trace sets, while for CUSUM the improvements range from about 12%, for the fifth trace set, to almost 50%, for the first one.
TABLE III: HR/FAR TRADE-OFF RESULTS.

Dataset  | RD(AT)      | FD(AT)      | RD(CUSUM)   | FD(CUSUM)   | AT-sa       | CUSUM-sa
         | HR    FAR   | HR    FAR   | HR    FAR   | HR    FAR   | HR    FAR   | HR    FAR
Darpa 1  | 95.9  72.8  | 89.5  34.9  | 84.0  68.6  | 82.4  1.56  | 79.0  35.3  | 35.1  6.7
Darpa 2  | 93.7  68.2  | 84.9  38.0  | 85.7  83.6  | 84.8  38.9  | 74.1  36.4  | 49.4  32.6
Darpa 3  | 92.1  81.1  | 83.8  50.1  | 88.3  77.9  | 84.7  28.1  | 71.6  51.0  | 62.7  25.0
UCLA     | 90.9  17.7  | 86.0  14.0  | 91.5  89.6  | 86.2  39.8  | 85.7  15.8  | 56.3  44.4
UNINA    | 99.6  69.7  | 98.0  7.4   | 99.6  77.3  | 98.0  12.1  | 86.4  7.0   | 78.6  13.1
All      | 94.2  70.9  | 87.7  34.1  | 83.7  86.2  | 86.3  27.2  | 79.4  33.1  | 49.2  33.9
In Fig. 2 we show how HR and FAR are influenced by the relative amplitude (left figures) and the duration (right figures) of the anomalies. Top and bottom figures refer to the system with AT and CUSUM rough detection, respectively. We evaluated performance separately for each anomaly profile. It can be observed that the increasing-rate and decreasing-rate anomalies (red and green lines, respectively) are more difficult to detect than the other anomalies. However, it is interesting to note that the curves related to all the anomaly profiles follow approximately the same trends. The relative amplitude has more influence on HR and FAR than the anomaly duration. However, when the anomaly amplitude is tuned for peak values greater than the RMS of the trace (relative amplitude ≥ 1), HR does not increase anymore. A similar behavior occurs for FAR in the AT case, while in the CUSUM implementation FAR tends to slowly decrease even after the relative amplitude exceeds 1. As regards the anomaly duration, while FAR always decreases when the anomaly lasts longer, HR inverts this trend after a certain duration. This behavior is accentuated in the CUSUM case.
Fig. 2. HR and FAR as functions of attacks’ relative amplitude and duration (FAR and HR plotted versus relative amplitude and versus duration, for the Adaptive Threshold and Cumulative SUM rough detection variants).
The diagrams in Fig. 3 show the percentage of correct estimates of the start and the end time of the anomalies,
when the width of the confidence interval (expressed in number of samples) increases. We consider the estimate
to be correct when the start/end time falls within the confidence interval.

Fig. 3. ADS accuracy (percentage of correct estimates of the anomaly start and end times as a function of the confidence interval width, for the Adaptive Threshold and Cumulative SUM rough detection variants).

For a confidence interval of 30 samples,
70% of the start and end times are correctly identified. In general, we note a slightly better performance in the
estimation of the start time compared to the end time. We also evaluated when the system did not correctly estimate
the anomaly duration because the anomaly was recognized as several different anomalous events. This occurred
rarely: for only 4.62% of the detections with the AT rough detection block, and 1.62% with CUSUM.
VI. CONCLUSION AND ISSUES FOR FUTURE RESEARCH
This paper proposed a cascade architecture based on the Continuous Wavelet Transform to detect volume-based network anomalies caused by DoS attacks. We showed how the proposed scheme is able to improve the trade-off existing between HR and FAR and, at the same time, to provide insights on anomaly duration (defining starting and ending time intervals) and on the identification of subsequent close anomalies. Our current work is focused on modifying the proposed system to work in a real-time (or on-line) fashion.
REFERENCES
[1] A. Dainotti, A. Pescapè, G. Ventre, “Wavelet-based Detection of DoS Attacks” 2006 IEEE GLOBECOM - Nov 2006, San Francisco
(CA, USA)
[2] V. A. Siris, F. Papagalou, “Application of Anomaly Detection Algorithms for Detecting SYN Flooding Attacks”, IEEE GLOBECOM
2004, Nov. 2004, pp. 2050-2054.
[3] R. B. Blazek, H. Kim, B. Rozovskii, A. Tartakovsky, “A Novel Approach to Detection of Denial-of-Service Attacks via Adaptive
Sequential and Batch-Sequential Change-Point Detection Methods”, IEEE Workshop Information Assurance and Security, 2001, pp.
220-226.
[4] S. Mallat, W. L. Hwang, “Singularity Detection and Processing with Wavelets”, IEEE Trans. on information theory, vol. 38, No.2, Mar.
1992.
[5] http://www-stat.stanford.edu/~wavelab/.
[6] R. Lippmann, et al., “The 1999 DARPA Off-Line Intrusion Detection Evaluation”, Computer Networks 34(4) 579-595, 2000. Data is
available at http://www.ll.mit.edu/IST/ideval/
[7] G. Vigna, R. Kemmerer, “NetSTAT: A Network-based Intrusion Detection System”, Journal of Computer Security, 7(1), IOS Press,
1999.
[8] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, S. Zhou, A. Tiwari and H. Yang, “Specification Based Anomaly Detection: A New Approach
for Detecting Network Intrusions”, ACM CCS, 2002.
[9] http://lever.cs.ucla.edu/ddos/traces
[10] J. Mirkovic, G. Prier, P. Reiher, “Attacking DDoS at the Source”, ICNP 2002, pp. 312-321, Nov. 2002.
[11] CERT Coordination Center. Denial-of-service tools - Advisory CA-1999-17, http://www.cert.org/advisories/CA-1999-17.html , Dec.
1999.
[12] CERT Coordination Center. DoS Developments - Advisory CA-2000-01, http://www.cert.org/advisories/CA-2000-01.html , Jan. 2000.
[13] J. Yuan, K. Mills, “Monitoring the macroscopic effect of DDoS flooding attacks”, IEEE Trans. on Dependable and Secure Computing, vol. 2, no. 4, 2005.
Experimental Analysis of Attacks Against
Intradomain Routing Protocols
Alessio Botta, Alberto Dainotti, Antonio Pescapè, Giorgio Ventre,
Dipartimento di Informatica e Sistemistica – University of Napoli “Federico II”
{a.botta,alberto,pescape,giorgio}@unina.it
Abstract— Nowadays attacks against the routing infrastructure are gaining impressive importance. In this work we present a framework to conduct experimental analysis of routing attacks and, to prove its usefulness, we study three attacks against routing protocols: route flapping on RIP, Denial of Service on OSPF by means of the Max Age attack and, finally, route forcing on RIP. We present a qualitative analysis and a performance analysis that aims to quantify the effects of routing protocol attacks with respect to router resources and network traffic over controlled test beds.
I. INTRODUCTION
Routing protocols implement mechanisms to discover the optimal route between end points, describing peering relationships, methods of exchanging information, and other kinds of policies. Since network connectivity depends on
proper routing, it follows that routing security is a critical issue for the entire network infrastructure. In spite of this,
while other aspects of computer and communication security, such as network applications and system security, are the subjects of many studies and of widespread interest, attacks against routing protocols are less well known. As Bellovin
states in [1], this is probably due to two fundamental reasons: effective protection of the routing infrastructure is a
really hard problem and it is outside the scope of traditional communications security communities. Moreover, most
communications security failures happen because of buggy code or broken protocols, whereas routing security
failures happen despite good code and functioning protocols. For instance, in the case of the routing infrastructure
one or more dishonest or compromised routers can alter the routing process, and a hop-by-hop authentication is not
sufficient. In this abstract, however, we are not focused on particular countermeasure mechanisms, but rather, on the
effects of some routing protocols attacks. This work presents an approach to conduct experimental studies of attacks
against the routing infrastructure observing the effects and quantifying their impact with respect to network and
devices resources. More precisely, we focus our attention on an experimental study of three types of attacks against
Interior Gateway Protocols (IGP). In order to have an evaluation of these attacks, we used a controlled and fully
configurable open test bed. In this way we were able to control as many variables as possible, as well as to configure several network topologies. This also allowed repeatability of experiments, obtaining numerical results, which show
the degradation of the network configurations under test, with high confidence intervals. Focusing our attention on router resources, network traffic, convergence time and, in general, on network behavior before, during and after the attacks, we found interesting results for all of the above attack scenarios.
II. RELATED WORKS AND BACKGROUND
For a careful analysis of related work refer to [2]. RIP and OSPF are the most commonly deployed intra-domain
routing protocols. Both these protocols describe methods for exchanging routing information (network topology and
routing tables) between routers of an Autonomous System. Both RIP and OSPF are mainly affected by the lack of
mechanisms to guarantee integrity and authentication of the information exchanged.
In the RIP scenario, the term Route Flapping denotes an attack consisting of the advertisement and
withdrawal of a route (or withdrawal and re-advertisement) alternating in rapid succession, thereby causing a route
oscillation. With Route Forcing, instead, we mean forcing traffic onto a path different from the optimal one
indicated by the routing protocol, causing service degradation. In the OSPF scenario, when an attacker continually
interjects legitimate advertisement packets of a given routing entity with spoofed packets in which the age is set to
the maximum value, he causes network confusion and may contribute to a DoS condition. Such an attack not only
consumes network bandwidth, but also makes the routing information database inconsistent, disrupting correct
routing. In this case we have the Max Age attack.
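The RIPv2 messages forged in attacks of this kind have a simple, fixed layout (RFC 2453) that a packet forger fills with attacker-chosen values. As a rough illustration (not the spoof tool used in this work; the helper names and values are ours), the following Python sketch builds the UDP payload of a forged Response advertising 192.168.2.0/24 with metric 16 (i.e., unreachable):

```python
import struct

def forge_ripv2_response(routes):
    """Build the UDP payload of a forged RIPv2 Response (RFC 2453).

    routes: list of (network, mask, next_hop, metric) tuples, with
    dotted-quad strings and an integer metric (16 = unreachable).
    """
    def ip_bytes(dotted):
        return bytes(int(octet) for octet in dotted.split("."))

    # Header: command=2 (Response), version=2, two zero bytes.
    payload = struct.pack("!BBH", 2, 2, 0)
    for net, mask, nxt, metric in routes:
        # Each route entry is 20 bytes: AFI=2 (IP), route tag,
        # network, mask, next hop, metric.
        payload += struct.pack("!HH", 2, 0)
        payload += ip_bytes(net) + ip_bytes(mask) + ip_bytes(nxt)
        payload += struct.pack("!I", metric)
    return payload

# Announce 192.168.2.0/24 as unreachable (metric 16).
pkt = forge_ripv2_response([("192.168.2.0", "255.255.255.0",
                             "0.0.0.0", 16)])
assert len(pkt) == 4 + 20           # header + one route entry
assert pkt[0] == 2 and pkt[1] == 2  # Response, RIPv2
```

In a real attack, the forged payload would then be sent in a spoofed UDP datagram to port 520 of the victim router.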
The selected attacks are quite different in nature, which clearly poses several limitations on fixing
them in a unified fashion. This work provides a methodology to qualitatively and quantitatively analyze the
impact that routing attacks have on the network infrastructure, leaving room for innovative mitigation
strategies and restoring policies.
The content of this abstract is part of the following published paper: Antonio Pescapè, Giorgio Ventre, “Experimental analysis of attacks against intradomain
routing protocols”, Journal of Computer Security, Volume 13, Number 6, 2005, pp. 877–903.
III. EXPERIMENTAL SCENARIO
In order to evaluate the above-mentioned attacks, we used as a proof-of-concept a controlled and
fully configurable open test bed. In this way we were able to control as many variables as possible, as well as to
configure several network topologies on our own. We preferred to use a controlled test bed also to have full
control of network devices and network traffic. This aspect is important when several tests must be performed in
order to obtain numerical results. Indeed, if we want to draw a general behavior and we want to guarantee the
experiment repeatability we have to know and control the dependent variables. Our objective, indeed, is to
demonstrate how it is possible to show and analyze what happens to a network under attack and in particular how to
evaluate the quantitative impact on network resources. Thus, while the presented methodology is of general validity,
the numerical results obtained in the following experiments are important to show the degradation of the specific
network configurations under test.
In Fig. 1 the experimental test bed used in our practical analysis is depicted, whereas in Table 1 a complete
description of hardware and software characteristics is shown. The experimental test bed is composed of eight
networks and seven routers (with back-to-back connections). We called the routers Aphrodite, Cronus, Poseidon,
Gaia, Zeus, Helios, and Calvin. In Figure 1, the numbers in the bullets represent the last field of the IP address
related to the indicated network address (placed on the link between two routers). It is worth noting that, by
changing the position of the links between each pair of routers and by consequently modifying (when needed) the
addressing plan, we are able to produce a large number of network topologies. We used GNU Zebra [3] to
implement routing protocols on our Linux routers. As regards traffic generation, we used D-ITG [4,5]. To passively
capture transferred packets, without influencing the systems constituting the network configurations under test, we
worked with the Ethereal [6] network sniffer. By packet forgers, instead, we mean tools
able to create and send packets of known protocols, filling the various fields with information chosen by the user,
usually an attacker. Such software also allows the user to set IP source and destination addresses in a totally arbitrary
way and takes care of computing the control fields built from the data inserted in the packet. To perform
“routing packet forgery” and conduct our simulated attacks, among the several available tools, such as Spoof [7], Nemesis
[8], IRPAS [9], and srip [10], we chose Spoof because of its focus on routing protocols, and because it was
previously used in other works reported in literature that are related to security of routing protocols [11].
Using common PCs as well as open source and freeware tools guarantees that our experiments can easily be
repeated by other practitioners or researchers in the field of computer network security.
Figure 1: Network test bed – Route Flapping on RIP
Figure 2: Route Flapping on RIP: average time of a stable path
TABLE I - HARDWARE AND OS DESCRIPTION
CPU: Intel PII 850 MHz
Memory: RAM 128 MB – Cache 256 KB
OS: Linux Red Hat 7.1 – Kernel 2.4.2-2
Network cards: Ethernet 10/100 Mbps
Figure 3: Flapping on RIP: number of route changes during tests
IV. EXPERIMENTAL ANALYSIS
It is worth mentioning that we repeated each test several times. In the following subsections, in the case of
numerical values, we present the average value over several experiment repetitions. Thanks to the use of a controlled
test bed we had a confidence level ≥ 96%. Therefore, for each experiment, the numerical results found in this
work can be considered very reliable in representing how an attack impacts the network configuration under test.
This is an important aspect of the proposed methodology.
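The averaging step can be sketched as follows; the 95% Student-t interval and the sample values are illustrative assumptions of ours, not the actual data or confidence procedure used in this work:

```python
import math

# t critical values for a two-sided 95% interval (0.975 quantile),
# indexed by degrees of freedom; values from standard t tables.
T_975 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def mean_and_ci(samples):
    """Sample mean and a two-sided 95% confidence interval."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half = T_975[n - 1] * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

# E.g., ten (made-up) repetitions of a convergence-time measurement.
runs = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7, 12.0, 12.1]
m, (lo, hi) = mean_and_ci(runs)
assert lo < m < hi
```

Repeating each test until the interval is narrow enough is what makes the reported averages representative of the configuration under test.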
A. Route Flapping on RIPv2
In Fig. 1 the network scenario and the actors (subnets and routers) of this attack are depicted. In this test, by
sending false information from Poseidon to Cronus, we forced RIPv2 to use different routes at different time
intervals. Using spoof, we announced a longer distance for the 192.168.2.0 subnet via Aphrodite with respect to the
real path; this was repeated at regular intervals. Therefore, the effect of the attack was to make Cronus believe that
the shortest path to reach subnet 192.168.2.0 had Poseidon as first hop instead of Aphrodite. RIPv2 reacted with a
path oscillation (route flapping) between Cronus and subnet 192.168.2.0: one time the best path had Poseidon as
first hop, the other time it had Aphrodite. We carried out a high number of trials by varying the time interval
between two malicious packets. Also, all trials have been repeated in two different situations. In the first one, the
false information (malicious packets) sent from Poseidon to Cronus specified a distance between Aphrodite and
192.168.2.0 subnet equal to 10 hops. In the second one, the false information reported that the 192.168.2.0 subnet
was unreachable from Aphrodite. The effect of the attack can be monitored also by looking at Cronus routing table
before and during the attack, or by issuing the traceroute command between Cronus and Calvin (192.168.2.2).
Before analyzing experimental results we would like to underline that in a normal situation RIP indicates a path
between Cronus and 192.168.2.0 subnet made by 3 hops, whereas during route flapping the path length oscillates
between 3 and 6 hops. Our test lasted for 200 seconds and we repeated it for different time intervals between two
successive malicious packets containing false information (3s, 6s, 9s, … up to 30s). For an interval of 200s we
evaluated the number of route flaps and the duration of the stable paths. In Fig. 3, the number of route changes
(route flapping) induced is reported. The first cycle is related to the attack that sends information on the distance
between Aphrodite and 192.168.2.0 subnet equal to 10 hops. The second cycle is related to the attack that sends
information on the distance between Aphrodite and 192.168.2.0 subnet equal to 16 hops (unreachable). We noticed
that in the second cycle there are less route changes than in the first cycle. In addition, we observed that in both
cycles, the maximum number of route changes is obtained far from the interval bounds (as you can see the highest
value is obtained in the experiment with a time interval, between two successive packets, of around 12s). Close to
the interval bounds a smaller number of oscillations is experienced. In Fig. 2, the average duration of the stable
path with 6 hops (measured in seconds), as a function of the malicious packets' inter-departure time (IDT), is depicted.
Looking also at the diagram of the stable path made of 3 hops, we noticed that, in both cycles, the
average duration of stable paths was independent of the malicious packets' IDT.
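The oscillation count can be reproduced qualitatively with a toy model of ours in which each malicious update moves the best route to the long path and each legitimate periodic RIP update (every 30 s) moves it back; the two-route abstraction and all timings are simplifying assumptions, not the paper's measurement procedure:

```python
def count_route_changes(duration=200, attack_interval=12,
                        legit_interval=30):
    """Toy model: a malicious update flips the route to the long
    path; the next legitimate periodic update flips it back."""
    events = []  # (time, route announced at that time)
    t = 0
    while t <= duration:
        events.append((t, "long"))      # malicious advertisement
        t += attack_interval
    t = 0
    while t <= duration:
        events.append((t, "short"))     # legitimate periodic update
        t += legit_interval
    events.sort()
    changes, current = 0, "short"
    for _, route in events:
        if route != current:
            changes += 1
            current = route
    return changes

# Frequent malicious updates induce more oscillations than rare ones.
assert count_route_changes(attack_interval=12) > \
       count_route_changes(attack_interval=60)
```

Even this crude model shows the dependence of the flap count on the spacing between malicious packets, which is the quantity varied in the experiments above.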
Figure 4: Route Flapping on RIP: OUT Traffic (Second cycle)
Figure 5: Denial of Service on OSPF: Experimental test-bed
Finally, using SNMP, for each router in the test bed and for each experiment, the total amount of input and output
traffic was calculated. We plotted the traffic amount, indicated as a percentage of the minimum number of bytes for
each interface of each router (in both input and output directions). The figures contain a single line for each router.
In all diagrams it is possible to observe that there is a similar envelope for Calvin, Helios, Aphrodite, and Cronus
plots. Here, for lack of space we show only the diagram related to output traffic of the 2nd cycle.
B. Denial of Service on OSPF
The experimental test bed is depicted in Fig. 5. The target of the OSPF Max Age Attack is to cause a DoS. Using
spoof we sent false LSAs from Poseidon to Calvin. These LSAs contained information on Zeus and Cronus and had
the Age field set to the maximum value. The result of these actions was the loss of contact between Zeus/Cronus and
Calvin/Helios. Indeed, because of the forged information, the routers named Calvin
and Helios deleted the entries for Zeus and Cronus in their LSA databases. The attack consequences are very simple to
understand. We verified the success of the attack using the ping command from Calvin and Helios, directed to
subnets 192.168.1.4, 192.168.1.8, and 192.168.1.64 (reporting a network unreachable error during the attack).
Figure 6: Denial of Service on OSPF: Traffic analysis
Figure 7 Route Forcing on RIP: TCP throughput
An analysis of the Helios LSA database before and during the attack confirmed the depicted behavior. During this
attack, by using OspfSpfRuns SNMP variable we counted the number of executions of the Dijkstra algorithm (in a
single specific area) for each router. By taking into account the number of recalculations during the attack we could
evaluate the computational load for each router. It is worth noting that in our test we use a PC with 128 MB RAM.
In a real router we may find a smaller amount of memory, hence controlling the number of recalculations – with a
large number of routing entries - is important, especially with a large network and a large number of routers. It is
important to underline that during our first experiment (50s) there were six executions of the Dijkstra algorithm for
each router. This means that, besides the depicted DoS effect, we were able to increase also the computational load
of each router in the testbed. Finally, it is interesting to note that if we increase the duration of the experiment (100s,
150s, ...) the number of recalculations per second is basically constant. After this first measurement we repeated the
experiment eliminating all other traffic sources (ping, …) from our experimental network. In this way we were able
to measure the precise amount of traffic on the network during the attack. In order to have a reference value, during a
time interval equal to 60s, we measured the traffic load with and without the Max Age Attack on OSPF. Fig. 6
depicts the network load when the systems were under attack, expressed as percentages with
respect to the values found without the attack. The most important result is that the maximum increase
of traffic happens on the interfaces of the routers attached to the subnet that was unreachable during the attack.
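The forged LSAs at the heart of the Max Age attack only need a standard 20-byte LSA header (RFC 2328) with the LS Age field set to MaxAge (3600 s). A minimal sketch of ours follows, with illustrative field values and the Fletcher checksum deliberately left uncomputed (a real attack tool must compute it over the LSA):

```python
import struct

MAX_AGE = 3600  # OSPF LSA MaxAge, in seconds (RFC 2328)

def forge_maxage_lsa_header(ls_type, link_state_id, adv_router,
                            seq=0x80000001, length=20):
    """Build a 20-byte OSPF LSA header with LS Age = MaxAge.
    The checksum is left at zero in this sketch."""
    def ip_u32(dotted):
        a, b, c, d = (int(x) for x in dotted.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d

    return struct.pack("!HBBIIIHH",
                       MAX_AGE,              # LS Age
                       0x02,                 # Options (E-bit)
                       ls_type,              # 1 = Router-LSA
                       ip_u32(link_state_id),
                       ip_u32(adv_router),
                       seq,                  # LS Sequence Number
                       0,                    # LS Checksum (omitted)
                       length)

hdr = forge_maxage_lsa_header(1, "192.168.1.5", "192.168.1.5")
assert len(hdr) == 20
```

Flooding such headers (spoofed as originating from the victim routers) is what drives the premature flushing of legitimate LSAs described above.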
C. Route Forcing on RIP
Between Aphrodite and Cronus there is a single hop. Using the Route Forcing attack, we forced the traffic to
follow a longer path from Aphrodite to Cronus, to finally reach subnet 192.168.2.0. In order to obtain such a result we
used spoof on both Zeus and Poseidon. By repeating this packet-sending activity every 3s we forced the artificial route
for the entire experimental time interval. Moreover, because of the Route Forcing on RIPv2, we moved from a path
with a link bandwidth equal to 100 Mbps to a path made of links of 10 Mbps. Using D-ITG we measured the
throughput on both paths. During the attack, TCP and UDP flows experienced a saturated path, whereas after the
attack both flows relied on a non-saturated path (Fig. 7).
V. CONCLUSION
In this work we presented a methodology to conduct experimental analysis of attacks against routing protocols.
We showed how an attack can be performed and emulated, with which tools it is possible to carry out the emulation
and analysis, and what the impacts on the networks under attack are. As for the effects, we focused on
router resources, network traffic, convergence time, and network behavior before, during, and after the attacks.
REFERENCES
[1] S. M. Bellovin, Routing Security, Talk at British Columbia Institute of Technology, June 2003.
[2] Antonio Pescapè, Giorgio Ventre, “Experimental analysis of attacks against intradomain routing protocols”, Journal of Computer Security, Volume 13,
Number 6, 2005, pp. 877–903.
[3] http://www.zebra.org/ (as of April 2007)
[4] http://www.grid.unina.it/software/ITG (as of April 2007)
[5] S. Avallone, D. Emma, A. Pescapè, and G. Ventre. Performance evaluation of an open distributed platform for realistic traffic generation. Performance
Evaluation: An International Journal (Elsevier Journal), Volume 60, Issues 1-4, pp. 359-392 (May 2005) - ISSN: 0166-5316
[6] http://www.ethereal.com (as of April 2007)
[7] http://www.ouah.org/protocol_level.htm (as of April 2007)
[8] The nemesis packet injection tool-suite. http://nemesis.sourceforge.net/ (as of April 2007)
[9] FX. “IRPAS – Internetwork Routing Protocol Attack Suite”. http://www.phenoelit.de/irpas/ (as of April 2007)
[10] Reliable Software Group Website (University of California Santa Barbara), http://www.cs.ucsb.edu/~rsg/ (as of April 2007)
[11] V. Mittal, G. Vigna. Sensor-Based Intrusion Detection for Intra-Domain Distance-Vector Routing. In Proceedings of CCS 2002, 9th ACM Conference on
Computer and Communications Security. November 17-21, 2002, Washington, DC.
Validating Orchestration of Web Services with
BPEL and Aggregate Signatures ∗
Carlo Blundo†  Emiliano De Cristofaro‡  Clemente Galdi§  Giuseppe Persiano¶

May 17, 2007
Abstract
In this paper, we present a framework providing integrity and authentication for secure workflow computation based on BPEL Web
Service orchestration. We adopt a recent cryptographic tool, aggregate signatures, to validate the orchestration by requiring all partners
to sign the result of the computation. Security operations are performed
during the orchestration and require no change in the services' implementation.
1 Introduction
Web Services technology [12] provides software developers with a wide range
of tools and models to produce innovative distributed applications. Interoperability, cross-platform communication, and language independence are only
a part of the appealing characteristics of Web Services. The standardization
and the flexibility introduced by Web services in the development of new
applications translate into increased productivity and gained efficiency.
∗ This work has been partially supported by the European Commission through the IST program under contracts FP6-1596 (AEOLUS).
† Dipartimento di Informatica ed Applicazioni - Università di Salerno - [email protected]
‡ Dipartimento di Informatica ed Applicazioni - Università di Salerno - [email protected]
§ Dipartimento di Scienze Fisiche - Università di Napoli Federico II - [email protected]
¶ Dipartimento di Informatica ed Applicazioni - Università di Salerno - [email protected]

Furthermore, the challenges of fast-growing and fast-evolving business
processes lead to a higher level of interaction and to the need for application
integration across organizational boundaries. Services are no longer designed
as isolated processes, but are meant to be invoked by other services and to themselves invoke other services. This paradigm is often referred to as Service-Oriented
Computing (SOC) [27]. Two different approaches can be considered: service
orchestrations, i.e., there is one particular service that directs the logical order of all other services, and service choreographies, i.e., individual services
work together in a loosely coupled network [24].
Interactions should be driven by explicit process models. Therefore, the
need for languages to model business processes arises, in particular for processes implemented by Web Services. Recently, many languages have been
proposed, such as BPML [17], XLANG [11], WSCI [10], WS-BPEL [22], and WS-CDL [9]. Among these, the Business Process Execution Language for Web
Services (WS-BPEL) has emerged as the de-facto standard for “enabling
users to describe business process activities as Web services and define how
they can be connected to accomplish specific tasks” [22]. However, whereas
security and access control policies for normal Web Services are well studied,
Web Services composition still lacks a standard tool to validate the orchestration of processes, in terms of providing integrity and authenticity of the
computation.
In this paper, we refer to the standard tool for providing integrity and authentication in network interactions, i.e., cryptographic signatures. However,
two different issues should be considered when adopting this tool to provide
security within Web Services interactions. First, the number of signatures
would grow linearly with the number of users and processes, raising issues related to the size of certificate chains and bandwidth overhead. Second, there
could be applications requiring processes to be performed in a specific order.
To this aim, we adopt a recent cryptographic tool, aggregate signatures,
first presented by Boneh et al. [15]. Aggregate signatures allow multiple signatures on distinct messages to be aggregated into a single short signature. Furthermore, we consider a variant of this primitive, in which the set of signers
is ordered. This scheme is referred to as sequential aggregate signatures and
was presented by Lysyanskaya et al. [25].
In this paper, we present a framework which uses these two schemes to
validate Web Services orchestration, by requiring each partner to sign the
result of its computation. In particular, since Web Services interactions can
either be performed in parallel or sequentially, we need both schemes: when
the computation is carried out in parallel, we use the “standard” aggregate
signature scheme presented in [15] and its sequential variant presented in [25]
to validate sequential workflow computation.
The rest of the paper is structured as follows. In Section 2, we give
an overview of the adopted technologies, while in Section 3 we present our
framework.
2 Background

2.1 Workflow and WS-BPEL
A workflow describes the automation of a business process. During the process, documents, information, or roles are exchanged among actors in order to
complete a task as specified by a well-defined set of rules. In other words, a
workflow is composed of a set of tasks, related to each other through different
types of relationships [20].
A workflow management system allows one to define, create, and manage
the execution of a workflow through software executing on one or more
workflow engines. A workflow manager interprets the formal definition of
a process in order to interact with several actors by managing state and
task coordination.
Tasks, actors, and processes within the workflow can be designed as desired. In this paper, we focus on the Web Service technology and on WS-BPEL, the Business Process Execution Language for Web Services. For a
thorough overview of Web Services, we refer to [21]. WS-BPEL is an XML-based language for describing the behavior of business processes based on Web
Services [22]. It provides activities, which can be either basic or structured.
A basic activity can communicate with the partners by messages (invoke,
receive, reply), manipulate data (assign), wait for some time (wait), do
nothing (empty), signal faults (throw), or end the entire process (terminate).
A structured activity defines a causal order on the basic activities and can be
nested in another structured activity itself. The structured activities include
sequential execution (sequence), parallel execution (flow), data-dependent
branching (switch), timeout- or message-dependent branching (pick), and
repeated execution (while). The most important structured activity is a
scope. It links an activity to transaction management and provides fault,
compensation, and event handling. A process is the outermost scope of the
described business process. For more details about BPEL, we refer to [28],
[19], and [23].
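The difference between the two main structured activities can be mimicked in a few lines of Python (a toy model of ours, not a BPEL engine): sequence fixes the invocation order, while flow only guarantees that all concurrent invocations complete:

```python
import threading

def invoke(partner, log, lock):
    # Stand-in for a BPEL <invoke> of a partner Web Service.
    with lock:
        log.append(partner)

def sequence(activities, log, lock):
    # <sequence>: activities run one after another, in order.
    for act in activities:
        act(log, lock)

def flow(activities, log, lock):
    # <flow>: activities run concurrently; completion order varies.
    threads = [threading.Thread(target=act, args=(log, lock))
               for act in activities]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

log, lock = [], threading.Lock()
sequence([lambda l, k: invoke("A", l, k),
          lambda l, k: invoke("B", l, k)], log, lock)
flow([lambda l, k: invoke("C", l, k),
      lambda l, k: invoke("D", l, k)], log, lock)
assert log[:2] == ["A", "B"]        # sequence preserves order
assert set(log[2:]) == {"C", "D"}   # flow guarantees only completion
```

This distinction between ordered and unordered execution is exactly what later motivates the use of two signature schemes, one sequential and one not.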
WS-BPEL is a specification; therefore it must be implemented.
Nowadays, many implementations are available, such as Bexee [3], Oracle
BPEL Process Manager [6], BPEL Maestro [7], or iGrafx [5]. Among these,
the ActiveBPEL implementation [2] appears to be one of the most interesting
solutions, being open source and freely available. ActiveBPEL is deployed
as a servlet in Apache's Jakarta Tomcat container. It has extensive documentation and has been released by Active EndPoints
under the GNU Lesser General Public License. It provides a full implementation of the
BPEL 1.1 specification. Moreover, Active EndPoints released two useful
and powerful tools: ActiveBPEL Designer and ActiveBPEL Enterprise. The
former is a comprehensive visual tool for creating, testing, and deploying
composite applications based on BPEL standard, and it is a plug-in to the
Eclipse development environment [4]. The latter is a complete BPEL engine,
which provides many features to administrate BPEL processes.
2.2 Aggregate Signatures
An aggregate signature scheme is a digital signature scheme that supports aggregation. Given n signatures on n distinct messages from n distinct users, it is
possible to aggregate all these signatures into a single short signature.
For the rest of the paper, we will use the acronym AS to refer to the
signature scheme presented in [15]. Let U be the set of possible users. Each
user u ∈ U holds a keypair (P Ku , SKu ). Consider a subset U ⊆ U. Each user
u ∈ U signs a message Mu obtaining σu . All the signatures are combined by
an aggregating party into a single aggregate σ, whose length is equal to any
of the σu. The aggregating party need not be one of the users in U, nor does
it have to be trusted by them. It only needs to know the
users' public keys, the messages, and the signatures on them, but no
private keys. This scheme allows a verifier, given σ, the identities of the users,
and the messages, to check whether each user signed his respective message.
The AS scheme is based on Co-gap Diffie-Hellman signatures (for details, see
[16]) and bilinear maps and it is proved to be secure in a model that gives
the adversary the choice of public keys and messages to forge. The authors added
the constraint that an aggregate signature is valid only if it is an aggregation
of signatures on distinct messages. However, in [13] a new analysis and proof
is provided to overcome this limitation, yielding the first truly unrestricted
aggregate signature scheme.
Moreover, a sequential aggregate signature scheme has been proposed in
[25]. For the rest of the paper, we refer to it as SAS. In this scheme, the set
of signers is ordered. The aggregate signature is computed by having each
signer, in turn, add his signature to it. The construction proposed in [25]
is based on families of certified trapdoor permutations. In the same paper
the authors explicitly instantiate the proposed scheme based on RSA. This
scheme is based on the full-domain hash signature scheme introduced in
[14]. Such a scheme works with any trapdoor one-way permutation family.
However, this scheme too is affected by the restriction that distinct signers have to sign distinct messages. Once again, in [13] a new analysis and
proof shows that the restriction can be dropped, yielding an
unrestricted sequential aggregate signature scheme.
The main difference of sequential aggregate signatures is that each
signer transforms the sequential aggregate into a new one that includes a signature on a message of his choice. Signing and aggregation are a single
operation; sequential aggregates are built in layers: the first signature in the
aggregate is the innermost. As with non-sequential aggregate signatures, the
resulting sequential aggregate has the same length as an ordinary signature.
Such a scheme is proved secure in the random oracle model.
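The layered structure of sequential aggregation can be illustrated with a toy RSA-based sketch in the spirit of [25] (textbook-sized keys, no padding, a simplified hash into Z_n; insecure and for intuition only, not the actual scheme): each signer folds the previous aggregate into its own RSA inversion, and verification peels the layers off in reverse order:

```python
import hashlib

# Toy RSA keys (tiny primes; utterly insecure, for illustration
# only). Moduli must be increasing along the signing chain.
def keygen(p, q, e=17):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e, pow(e, -1, phi))   # (modulus, public, private)

def H(msg, n):
    # Hash a message into Z_n (toy full-domain hash).
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def seq_sign(sk, msg, prev_sig):
    n, _, d = sk
    # Fold the previous aggregate into this signer's layer.
    x = (prev_sig + H(msg, n)) % n
    return pow(x, d, n)

def seq_verify(pks, msgs, sig):
    # Peel layers off in reverse signer order; must end at 0.
    for (n, e), msg in zip(reversed(pks), reversed(msgs)):
        x = pow(sig, e, n)
        sig = (x - H(msg, n)) % n
    return sig == 0

k1 = keygen(61, 53)     # n = 3233
k2 = keygen(89, 97)     # n = 8633 > 3233
sig = seq_sign(k1, b"step-1 output", 0)
sig = seq_sign(k2, b"step-2 output", sig)
pks = [(k1[0], k1[1]), (k2[0], k2[1])]
assert seq_verify(pks, [b"step-1 output", b"step-2 output"], sig)
assert not seq_verify(pks, [b"tampered", b"step-2 output"], sig)
```

Note that the aggregate stays the size of a single signature no matter how many signers contribute, which is the property exploited by the framework below.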
2.3 WS-Security
In the context of Web Services, many supplemental standards guaranteeing security features have been released in recent years and collected under
the WS-Security specification [26]. This specification (whose latest version was
released at the beginning of 2006) is the result of the OASIS Technical Committee's work [1]. It proposes a standard set of SOAP extensions
that can be used when building secure Web services to provide confidentiality
(messages should be read only by sender and receiver), integrity and authentication (the receiver should be guaranteed that the message is the one sent
by the sender and has not been altered), non repudiation (the sender should
not be able to deny having sent the message), and compatibility (messages
should be processed according to the same rules by any node on the message
path). Such mechanisms provide a building block that can be used in
conjunction with other Web service extensions and higher-level applications
to accommodate a wide variety of security models and security technologies.
Before the release of the WS-Security standard, such features had to be
implemented by adding and managing customized ad-hoc headers in SOAP
messages exchanged during Web Services interactions. Within WS-Security, instead, both these headers and the mechanisms to manage
them become standard. In fact, WS-Security implementations provide application developers
with dedicated handlers to process security information in a transparent way.
3 Our framework
Our goal is to build a framework which provides integrity and authentication
for secure workflow computation based on BPEL Web Service orchestration.
In our scenario, the orchestrator defines the workflow and describes Web
Services composition through BPEL. We want to ensure that:
1. The workflow has been correctly followed by the defined partners.
2. The workflow has been correctly followed in the defined order.
3. The partners cannot repudiate their computation.
4. The partners can verify that their computation has been correctly inserted into the workflow.
Figure 1: Combination of SAS and AS schemes
Figure 2: Framework Architecture
To this aim, we rely on the aggregate signature schemes presented
in Section 2, referred to as AS and SAS. We hereby propose a novel
aggregate signature scheme as the combination of the AS scheme in [15] and
the SAS scheme in [25]. Indeed, we need both schemes since composite Web
Services execution can be performed either in a parallel or in a sequential
way. Therefore, we use SAS to sign messages sequentially produced by the
partners. Instead, whenever a parallel execution is performed, we use AS to
sign and combine produced messages.
As depicted in Figure 1, when an orchestration includes both parallel and
sequential executions, we need to “map” SAS to AS, since the two signature
schemes work on different fields.
At the end of the computation, we expect the Orchestrator to perform
signature verification to check the correctness of the computation. However, since AS and SAS
allow every user to perform aggregate signature verification, every entity involved in the composition may verify that its computation has been correctly
inserted in the workflow.
To implement our scheme, it is necessary to modify the ActiveBPEL
engine to support signature operations. We adopt the WS-Security standard to represent information in a standard way. In particular, we rely on
the possibility of exchanging security data within the BinarySecurityToken
field in the message header. In fact, WS-Security allows developers
to define new security tokens and store them as binary data in the
BinarySecurityToken field. In particular, it is possible to force the value
type of the token to respect an XML schema [8].
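A sketch of how an aggregate signature could be carried in such a token, using Python's standard XML library; the wsse namespace below is the WS-Security 1.0 secext namespace, while the ValueType and EncodingType values are illustrative placeholders of ours rather than registered identifiers:

```python
import base64
import xml.etree.ElementTree as ET

WSSE = ("http://docs.oasis-open.org/wss/2004/01/"
        "oasis-200401-wss-wssecurity-secext-1.0.xsd")

def add_binary_security_token(header, token_bytes, value_type):
    """Attach an aggregate signature to a SOAP header as a
    wsse:BinarySecurityToken (base64-encoded)."""
    security = ET.SubElement(header, "{%s}Security" % WSSE)
    token = ET.SubElement(security, "{%s}BinarySecurityToken" % WSSE)
    token.set("ValueType", value_type)          # placeholder value
    token.set("EncodingType", "wsse:Base64Binary")
    token.text = base64.b64encode(token_bytes).decode("ascii")
    return token

header = ET.Element("Header")
tok = add_binary_security_token(header, b"\x01\x02aggregate-sig",
                                "myns:AggregateSignature")
assert base64.b64decode(tok.text) == b"\x01\x02aggregate-sig"
```

A dedicated WS-Security handler on the receiving side would then extract and decode the token transparently, without touching the service logic.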
In order to add this security layer, we need to modify ActiveBPEL to
allow intermediate operations to be performed between services’ invocations.
In fact, to the best of our knowledge there is no BPEL engine supporting
security features in the service composition. Therefore, we need to modify the
process deployment operation by introducing a sign service to be invoked.
Figure 2 shows the overall architecture of the framework. The core components are the BPEL process, the BPEL engine, and the security service
[18]. The BPEL engine has been modified to add security services: aggSign for
the AS feature, seqAggSign for the SAS feature, and addToken for adding
the security information in the SOAP header.
References
[1] Organization for the Advancement of the Structured Information Standards (OASIS). http://www.oasis-open.org/home/index.php.
[2] ActiveEndPoints ActiveBPEL Open Source Engine, BPEL Standard.
http://www.activebpel.org/, Last visit, April 2007.
[3] Bexee: BPEL EXecution Engine. http://bexee.sourceforge.net/, Last
visit, April 2007.
[4] Eclipse SDK. http://www.eclipse.org, Last visit, April 2007.
[5] iGrafx BPEL. http://www.igrafx.com/products/bpel/, Last visit, April
2007.
[6] Oracle BPEL Process Manager. http://www.oracle.com/technology/bpel/index.html,
Last visit, April 2007.
[7] Parasoft BPEL Maestro. http://www.parasoft.com/BPELMaestro, Last
visit, April 2007.
[8] W3C XML Schema. http://www.w3.org/XML/Schema, Last visit, April
2007.
[9] Web Service Choreography Description Language (WSCDL) 1.0.
http://www.w3.org/TR/ws-cdl-10/, Last visit, April 2007.
[10] Web Service Choreography Interface (WSCI) 1.0. http://www.w3.org/TR/wsci/, Last visit, April 2007.
[11] XLANG: XML-based extension of WSDL. http://www.ebpml.org/xlang.htm, Last visit, April 2007.
[12] Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay Machiraju.
Web Services: Concepts, Architecture and Applications. Springer Verlag,
2004.
[13] Mihir Bellare, Chanathip Namprempre, and Gregory Neven. Unrestricted Aggregate Signatures. Cryptology ePrint Archive, Report
2006/285, 2006. http://eprint.iacr.org/.
[14] Mihir Bellare and Phillip Rogaway. Random oracles are practical: a
paradigm for designing efficient protocols. In CCS ’93: Proceedings of
the 1st ACM Conference on Computer and Communications Security,
pages 62–73, New York, NY, USA, 1993. ACM Press.
[15] Dan Boneh, Craig Gentry, Ben Lynn, and Hovav Shacham. Aggregate
and verifiably encrypted signatures from bilinear maps. In Proceedings
of Advances in Cryptology - Eurocrypt 2003, Lecture Notes in Computer
Science, volume 2656, pages 416–432. Springer-Verlag, Berlin, 2003.
[16] Dan Boneh, Ben Lynn, and Hovav Shacham. Short signatures from
the weil pairing. In Proceedings of Advances in Cryptology - Asiacrypt
2001, Lecture Notes in Computer Science, volume 2248, pages 514–532.
Springer-Verlag, Berlin, 2001.
[17] Business Process Modeling Language. http://www.bpmi.org. Last visit,
April 2007.
[18] Anis Charfi and Mira Mezini. Using aspects for security engineering of
web service compositions. In ICWS ’05: Proceedings of the IEEE International Conference on Web Services (ICWS’05), pages 59–66, Washington, DC, USA, 2005. IEEE Computer Society.
[19] Xiang Fu, Tevfik Bultan, and Jianwen Su. Analysis of interacting bpel
web services. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 621–630, New York, NY, USA, 2004.
ACM Press.
[20] Dimitrios Georgakopoulos, Mark Hornick, and Amit Sheth. An overview
of workflow management: from process modeling to workflow automation infrastructure. Distrib. Parallel Databases, 3(2):119–153, 1995.
[21] Steve Graham, Simeon Simeonov, Toufic Boubez, Glen Daniels, Doug
Davis, Yuichi Nakamura, and Ryo Neyama. Building Web Services with
Java: Making sense of XML, SOAP, WSDL, and UDDI. 2001.
[22] IBM, Microsoft, and BEA Systems. Business process execution language
for web services. August 2002.
http://www.ibm.com/developerworks/library/ws-bpel.
[23] R. Khalaf, A. Keller, and F. Leymann. Business processes for Web
Services: principles and applications. IBM Syst. J., 45(2):425–446, 2006.
[24] Niels Lohmann, Peter Massuthe, Christian Stahl, and Daniela Weinberg. Analyzing Interacting BPEL Processes. In Business Process Management, 4th International Conference, BPM 2006, Vienna, Austria,
September 5-7, 2006, Proceedings, volume 4102 of Lecture Notes in Computer Science, pages 17–32. Springer-Verlag, September 2006.
[25] Anna Lysyanskaya, Silvio Micali, Leonid Reyzin, and Hovav Shacham.
Sequential aggregate signatures from trapdoor permutations. In Proceedings of Advances in Cryptology - Eurocrypt 2004, Lecture Notes in
Computer Science, volume 3027, pages 74–90. Springer-Verlag, Berlin,
2004.
[26] Anthony Nadalin, Chris Kaler, Phillip Hallam-Baker, and Ronald
Monzillo. Web Services Security: SOAP Message Security 1.1, OASIS.
2006. http://www.oasis-open.org/committees/download.php/16790/wssv1.1-spec-os-SOAPMessageSecurity.pdf.
[27] Mike P. Papazoglou. Agent-oriented technology in support of e-business.
Communications of the ACM, 44(4):71–77, 2001.
[28] Petia Wohed, Wil M. P. van der Aalst, Marlon Dumas, and Arthur H. M.
ter Hofstede. Analysis of Web Services Composition Languages: The
Case of BPEL4WS. In Proceedings of the 22nd International Conference
on Conceptual Modeling (ER), pages 200–215, 2003.
PassePartout Certificates
Ivan Visconti
Dipartimento di Informatica ed Applicazioni
Università di Salerno
via Ponte Don Melillo
84084 Fisciano (SA) - ITALY
E-mail: [email protected]
Abstract
The invention of public-key cryptography revolutionized the design of secure systems,
with a tremendous impact on the development of a cyberspace where digital transactions
replace activities of the physical world. The use of public-key cryptography made evident
the need for public-key infrastructures and digital certificates. Such tools were developed
and integrated into the de-facto standard security architectures on the Internet.
After the invention of public-key cryptography, the crypto world continued to develop
cryptographic gadgets that, when security, privacy and usability are considered as major
requirements, can have a remarkable impact on current technologies.
The goal of this work is to show how to plug recent results of the crypto world into
some of the current standards used by the information security world. In particular, we
consider the use of “trapdoor commitments” and of “zero-knowledge sets” and show
how to extend the features of current standard implementations of access control-based
systems using these two crypto primitives, while preserving (and in some cases
improving) the efficiency and usability of current systems.
Keywords: PKIX, attribute certificates, commitment schemes, ZK sets.
The work of the author is supported in part by the European Commission through
the IST program under Contract IST-2002-507932 ECRYPT, and in part by the European
Commission through the FP6 program under contract FP6-1596 AEOLUS.
1 Introduction
The design of secure systems includes the use of cryptographic tools and their combination in
a sophisticated process. Since cryptography evolves and new powerful gadgets are provided,
security experts should reconsider the design of their systems as they could benefit from
such novel results. Unfortunately this is not always the case: the crypto world and the
information security world do not seem to interact very strongly, and thus more
attention should be paid to the blending of these two worlds. It is crucial that security
experts ask cryptographers for new tools with specific properties, and that cryptographers
inform security experts about recent and potentially interesting results.
Access control by means of digital certificates. In this work we focus on the problem
of designing secure systems that need an access control functionality, i.e., the possibility of
personalizing the execution of a service on the basis of the privileges of the parties. This is a
typical security problem that is already widespread in the cyberspace, and it has thus
already received central attention in the past.
We observe that these crucial and popular features seem to suffer from the previously discussed
gap between the crypto and the information security worlds. The widely used public-key
infrastructure based on X509v3 certificates [7] and its integration by means of so-called
attribute certificates [5] represents a major combination of cryptographic tools (i.e., signature
schemes and collision-resistant hash functions) for realizing secure transactions. However,
these crypto tools were already known a long time ago¹, and one should wonder whether new
crypto tools could allow the design of better systems.
X509 and attribute certificates. An X509 certificate is a digital certificate that contains some mandatory information identifying the owner (e.g., the issuer, the subject, the
expiration date, the signature algorithm), along with a digital signature that “certifies” the
binding between such data and a public key. Moreover, it can contain extensions that
add new fields. Such certificates are currently the standard on the Web [4, 6] and for secure
e-mail [14].
The possibility of adding extensions is a first method to perform access control. Indeed, the certification authority can verify possession of some credentials and add
corresponding fields to the certificate. Then, the owner of the certificate can simply send
his certificate to a service provider and, by proving (in general by means of a signature)
ownership of the certificate, obtain all privileges that correspond to the encoded credentials. This mechanism is very efficient but unfortunately it is not flexible. The set of
valid credentials of a user is dynamic: credentials expire and new credentials could
be obtained in the future. The previous solution is not only inflexible but also not
privacy-aware. Indeed, adding all credentials to the digital certificate exposes user data to
privacy theft. Digital certificates are used for many purposes, and giving all credentials to
a service provider that “should” not be interested in many of them can be dangerous.
A solution that is both flexible and privacy-aware combines X509 certificates (without
extensions) and attribute certificates. A user obtains an X509 certificate from a certification
authority; then, when he needs a “certified” credential, he asks an attribute authority for an
attribute certificate. This authority verifies that the user owns such credentials and that the
user owns the digital certificate, and it then releases a specific certificate for those credentials
that is linked to the X509 certificate. Such a link guarantees that the attribute certificate
cannot be used by other users. This solution is flexible because attribute certificates can
be short-lived while X509 certificates can still be long-lived. Moreover, the user will send to
a service provider only the attribute certificates that are needed to obtain the appropriate
privileges.
¹ There are always updates regarding the specific implementations of these tools, which in turn
trigger updates of the implementations of security architectures, but the design of such systems does not change.
The combined use of X509 and attribute certificates is thus a practical proposal that is
used with success, as it seems to guarantee satisfactory security, privacy and flexibility.
Our contribution. Given this state of the art, we first discuss the power of such standard techniques for access control-based transactions, showing some weaknesses in concrete
applications. Then, we discuss some useful additional tools produced by the crypto world,
namely trapdoor commitment schemes and zero-knowledge sets. We show that these tools
can be integrated into current standard technologies, thus making current systems more robust with respect to many concrete applications where security, privacy and usability are
currently not satisfied.
2 Weaknesses of Current Standards
Assume an access-control policy is based on the nationality² of a user. This is a concrete
example, as many access-control policies include such rules or variations of them. A citizen
can get from his government (which acts as an attribute authority) an attribute certificate
and use it along with his X509 (identity) certificate to obtain the required privileges.
We need wildcards. There are special cases in which some specific users have strong
privileges. A diplomatic representative at the United Nations would need, for his missions,
a wildcard that allows him to have the privileges of all nationalities. This can be managed
using current standards in two ways. The first way is to give him an attribute certificate
where the nationality is a special string that corresponds to all nationalities. The second
way is to give him an attribute certificate for each nation in the United Nations. Obviously
the former solution generates a privacy theft: the diplomatic representative announces this
special credential each time he uses the certificate, and this could be used against him. The
latter solution is impractical as it requires a large batch of certificates, and in other
applications (e.g., day and city of birth) it would correspond to millions of certificates.
In Section 3.1 we show a solution based on a “trapdoor commitment”. This crypto gadget
allows the attribute authority to assign to a party a special value for the field corresponding
to the nationality. The party can then only reveal his nationality in order to successfully complete
the transaction. Instead, if a diplomatic representative receives from the attribute authority
a specific trapdoor, then he will be able to use any nationality to successfully complete
the transaction; moreover, the “verifier” will not notice any difference with respect to a
transaction of a normal party.
We need to deal with sets. There are more sophisticated, but still concrete and
practically relevant, examples where both current standard technologies and solutions based on
trapdoor commitments fail.
² This does not necessarily correspond to the “country” field of X509 certificates, since that value can be
the nation in which the user lives, or the one in which he works, or the one in which he was born, etc. It is
therefore always possible to give a specific example where the “country” field does not concern the appropriate
credential.
Consider again the nationality issue discussed so far. There actually are in the world
citizens with more than one nationality (i.e., multiple passports). Again, having all nationalities
in the same attribute certificate would represent a privacy threat (e.g., if access to a service
needs Italian nationality, communicating the Italian+X nationality could be used against
the user if the service provider does not like citizens from X). Moreover, having an attribute
certificate for each nationality could be impractical. It is possible to use trapdoor
commitments again to solve this problem. In this case the authority will not give the trapdoor
to the user, but will only give him the information required to open the committed value
to each nationality that belongs to the user.
However, while in general a user has some nationalities, it could be the case that access
to a system requires showing that the nationality is not a specific one (e.g., services from
the U.S. could be restricted to people that are not from countries considered dangerous for
U.S. security). Having, for each other country, an attribute certificate that says that
a user is not a citizen of that country would be really impractical³. We finally note that
the solution based on wildcards does not work here, as we do not want to give a party the
power of claiming any nation, but only of affirmatively claiming some of them and negatively
claiming the (exponentially many) other ones.
In Section 3.2 we show a solution based on “zero-knowledge sets”. This crypto gadget
allows an attribute authority to assign a special value for the field corresponding to the
nationality. Later, the party can only reveal his nationalities and his non-nationalities in order to
successfully complete the transactions. Moreover, this can be done even when the size
of the set of nationalities is exponential! Additionally, each time a nationality or non-nationality is shown, no additional information is learned by the service provider about the
other nationalities/non-nationalities.
This primitive, recently introduced in [10] and later studied in [3] and [2], is quite
powerful and can be of interest in other important secure systems.
The advances of the crypto world. X509 and attribute certificates simply use digital
signatures and hash functions, crypto tools that were known a long time ago. Stronger results have been achieved more recently, in particular the so-called “anonymous credential
systems” [9, 1]. These systems guarantee strong privacy, as a user can prove possession
of credentials to many service providers while still preserving unlinkability. Such a property is
not enjoyed by standard technologies, nor even by the ones proposed in this work. Indeed, we
discuss the standard systems where the “same” certificate is potentially sent to different
service providers. Another recent result that is more compatible with standard systems is
that of “oblivious attribute certificates” [8], where only qualified users obtain the services
but service providers do not distinguish qualified users from unqualified ones.
However, while anonymous credential systems and oblivious attribute certificates provide strong security guarantees, their efficiency and usability are still controversial when access-control
policies are complex and the space of possible credentials is large. We also stress that these
systems would have a stronger impact on the current standard technologies compared to
our work, which instead focuses on both integrating and improving current standards. We
finally cite the notion of a “crypto certificate” introduced in [12, 13], where an encryption of
an attribute is stored in the certificate. The functionalities provided by crypto certificates
are, however, properly included in the setting where trapdoor commitments are used instead.
³ In some cases the set can have an exponential number of elements of which only a few are credentials
owned by a user (e.g., for a k-bit string credential there are 2^k possibilities, but a user will own
only poly(k) of them).
3 PassePartout Certificates
In this section we present our extensions for current X509 and attribute certificates in
order to strengthen their usability, privacy and security. We use the term “PassePartout” to
denote a certificate that contains the special values that we discuss and that allows for much
more powerful access control-based systems. The first extension is that of wildcards and is
discussed in Section 3.1, while the second extension is that of zero-knowledge sets and is
discussed in Section 3.2.
3.1 Adding Wildcards to Digital Certificates
Here we show that an attribute certificate can contain a field whose value is a string
that does not immediately correspond to the credential, but that nevertheless binds the owner of
the certificate to only one value (i.e., the credential). Such a value will be sent by the owner
of the certificate in order to successfully complete the transaction. This game corresponds
to a “commitment scheme”, which we briefly discuss below.
Commitment schemes. Intuitively, a commitment scheme can be seen as the digital
equivalent of a sealed envelope. If a party A wants to commit to some message m, she just
puts it into the sealed envelope. Whenever A wants to reveal the message, she simply opens
the envelope. In order for such a mechanism to be useful, some basic requirements need to
be met. The digital envelope should hide the message: no party other than A should be
able to learn m from the commitment (this is often referred to in the literature as the hiding
property). Moreover, the digital envelope should be binding, in the sense that A cannot
change her mind about m, and, when checking the opening of the commitment, one can
verify that the obtained value is actually the one A had in mind originally (this is often
referred to as the binding property).
A trapdoor commitment scheme is a commitment scheme with an associated pair of public
and private keys (the latter also called the trapdoor). Knowledge of the trapdoor allows the
sender to open the commitment to any message of her choice (this is often referred to as the
equivocality property). On the other hand, without knowledge of the trapdoor, equivocation
remains computationally infeasible.
It is known how to construct an efficient trapdoor commitment scheme on top of the standard “discrete logarithm assumption” [11]. For additional details on commitment schemes,
see Appendix A.
Implementing wildcards with trapdoor commitments. We propose to extend the
possible values of a field of an attribute certificate to also include trapdoor commitments.
The public parameters of the scheme are chosen by the attribute authority, which also adds
to the certificate a trapdoor commitment. Moreover, the authority gives the user the
information to “open” such a commitment to the credential he owns. A commitment can
also be opened to different values; in this case the authority gives the user additional
data that allow him to open the commitment to those values. Finally, the attribute
authority can give the trapdoor to the user; in this case the user will be able to open the
commitment to any value he wishes. The efficiency of the scheme proposed in [11] guarantees
that such features can be used in practice without penalizing the efficiency of the system.
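To make the mechanism concrete, the following sketch implements a Pedersen-style trapdoor commitment (in the spirit of [11]) and its equivocation. The tiny group parameters and all function names are our own illustrative assumptions, not part of any certificate standard; a real system would use a large prime-order group.

```python
import secrets

# Pedersen-style trapdoor commitment: a minimal sketch. The toy group
# parameters below are for illustration only.
p, q = 1019, 509   # toy safe prime p = 2q + 1, with q prime
g = 4              # generator of the order-q subgroup of Z_p^*

def keygen() -> tuple[int, int]:
    """The attribute authority picks a trapdoor t and publishes h = g^t mod p."""
    t = secrets.randbelow(q - 1) + 1
    return pow(g, t, p), t

def commit(h: int, v: int) -> tuple[int, int]:
    """Com(v; r) = g^v * h^r mod p with fresh randomness r (hiding and binding)."""
    r = secrets.randbelow(q)
    return (pow(g, v, p) * pow(h, r, p)) % p, r

def verify(h: int, com: int, v: int, r: int) -> bool:
    return com == (pow(g, v, p) * pow(h, r, p)) % p

def equivocate(t: int, v: int, r: int, v_new: int) -> int:
    """With trapdoor t, reopen the same commitment to v_new by solving
    v + t*r = v_new + t*r_new (mod q) for r_new."""
    return (r + (v - v_new) * pow(t, -1, q)) % q

h, t = keygen()
com, r = commit(h, 7)          # authority commits to credential code 7
assert verify(h, com, 7, r)    # a normal user can open only to 7
r2 = equivocate(t, 7, r, 13)   # a wildcard holder opens the same com to 13
assert verify(h, com, 13, r2)
```

The verifier runs the same check in both cases, so an opening produced with the trapdoor (the "wildcard") is indistinguishable from a normal one.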
3.2 Adding Sets to Digital Certificates
Zero-knowledge sets. In [10], Micali et al. introduced the concept of a zero-knowledge
set. There, a prover P commits to an arbitrary set S so that for any string x he can later
prove to a verifier V that x ∈ S or x ∉ S. Such a proof is required to be both “sound” and
“zero knowledge”. The former requirement preserves the security of the verifier, since he
cannot be convinced by a false proof given by an adversarial prover. The latter requirement
preserves the security of the prover, since no adversarial verifier can learn more information
than the mere truthfulness of the proved statements.
In [2], building on the work of [3], a general paradigm to construct efficient
schemes implementing zero-knowledge sets is shown. For additional details, see Appendix B.
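To give a feeling for the data structure involved, the sketch below implements only the authenticated-dictionary skeleton underlying zero-knowledge sets: a sparse Merkle tree whose root commits to the set and which supports proofs of both membership and non-membership. It is deliberately not zero-knowledge (proofs leak hashes related to other elements); the construction of [3] obtains the zero-knowledge property by replacing each hash with a mercurial commitment. Key size, names and encodings are our own illustrative assumptions.

```python
import hashlib

K = 8  # toy key length in bits (2^K leaves); real schemes use much larger keys

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(x: int, member: bool) -> bytes:
    # Distinct leaf encodings for "x is in S" and "x is not in S".
    return H(b"in:" + bytes([x])) if member else H(b"out")

def build(S: set[int]) -> list[list[bytes]]:
    """Hash the full tree bottom-up; the root commits to the whole set."""
    level = [leaf(x, x in S) for x in range(2 ** K)]
    tree = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree

def prove(tree: list[list[bytes]], x: int) -> list[bytes]:
    """Authentication path: the sibling hash at every level."""
    path, idx = [], x
    for level in tree[:-1]:
        path.append(level[idx ^ 1])
        idx //= 2
    return path

def check(root: bytes, x: int, member: bool, path: list[bytes]) -> bool:
    """Recompute the root from the claimed leaf and the authentication path."""
    node, idx = leaf(x, member), x
    for sibling in path:
        node = H(node + sibling) if idx % 2 == 0 else H(sibling + node)
        idx //= 2
    return node == root

tree = build({3, 17, 42})
root = tree[-1][0]
assert check(root, 42, True, prove(tree, 42))   # membership proof
assert check(root, 7, False, prove(tree, 7))    # non-membership proof
```

A proof consists of only K sibling hashes, so non-membership of any absent key is provable without enumerating the (possibly huge) complement of S.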
Implementing sets with zero-knowledge sets. We propose to extend the possible
values of a field of an attribute certificate to also include zero-knowledge sets. The crucial point
is that we want to give a user a certificate that certifies both the credentials he owns and
the ones he does not own, giving him the possibility of showing both possession and non-possession. The public parameters of the scheme are chosen by the attribute authority, which
also adds to the certificate the commitment to a zero-knowledge set. The user can therefore
send his certificate, showing that it actually contains some given credentials and also showing
that it does not contain some other credentials. We stress that a given attribute authority
gives only one attribute certificate to a user for a given type of credentials. This is the only
way to make meaningful a proof that a user does not possess a credential certified by a given
authority, namely that it is not encoded in the one attribute certificate released by that
authority.
References
[1] J. Camenisch and A. Lysyanskaya. An Efficient Non-Transferable Anonymous Multi-Show Credential System with Optional Anonymity Revocation. In Advances in Cryptology – Eurocrypt ’01, volume 2045 of Lecture Notes in Computer Science, pages 93–118.
Springer-Verlag, 2001.
[2] D. Catalano, Y. Dodis, and I. Visconti. Mercurial commitments: Minimal assumptions
and efficient constructions. In 3rd Theory of Cryptography Conference (TCC ’06),
Lecture Notes in Computer Science. Springer-Verlag, 2006.
[3] M. Chase, A. Healy, A. Lysyanskaya, T. Malkin, and L. Reyzin. Mercurial commitments
with applications to zero-knowledge sets. In Advances in Cryptology – Eurocrypt ’05,
volume 3494 of Lecture Notes in Computer Science, pages 422–439. Springer-Verlag,
2005.
[4] T. Dierks and C. Allen. The TLS Protocol Version 1.0. Network Working Group RFC
2246, 1999.
[5] S. Farrell and R. Housley. An Internet Attribute Certificate Profile for Authorization.
Network Working Group, RFC 3281, April 2002.
[6] A. O. Freier, P. Karlton, and P. C. Kocher. The SSL Protocol Version 3.0.
Transport Layer Security Working Group, Internet Draft, 1996.
http://home.netscape.com/eng/ssl3.
[7] R. Housley, W. Polk, W. Ford, and D. Solo. Internet X509 Public Key Infrastructure:
Certificate and Certificate Revocation List (CRL) Profile. Network Working Group,
RFC 3280, April 2002.
[8] Jiangtao Li and Ninghui Li. OACerts: Oblivious attribute certificates. In Proceedings of
3rd Conference on Applied Cryptography and Network Security (ACNS), volume 3531
of Lecture Notes in Computer Science, pages 301–317. Springer Verlag, 2005.
[9] A. Lysyanskaya, R. Rivest, A. Sahai, and S. Wolf. Pseudonym Systems. In Selected
Areas in Cryptography (SAC ’99), volume 1758 of Lecture Notes in Computer Science.
Springer-Verlag, 1999.
[10] S. Micali, M. Rabin, and J. Kilian. Zero-knowledge sets. In 44th IEEE Symposium on
Foundations of Computer Science (FOCS ’03), pages 80–91, 2003.
[11] T. Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing.
In Advances in Cryptology – Crypto ’91, volume 576 of Lecture Notes in Computer
Science, pages 129–140. Springer-Verlag, 1992.
[12] P. Persiano and I. Visconti. User Privacy Issues Regarding Certificates and the TLS
Protocol (The Design and Implementation of the SPSL Protocol). In 7th ACM Conference on Computer and Communications Security (CCS ’00), pages 53–62. ACM,
2000.
[13] Pino Persiano and Ivan Visconti. A secure and private system for subscription-based
remote services. ACM Transactions on Information and System Security, 6(4):472–500,
2003.
[14] B. Ramsdell. S/MIME Version 3 Certificate Handling.
http://www.ietf.org/rfc/rfc2632.txt, 1999.

A Commitment Schemes
A commitment scheme is a primitive to generate and open commitments. More precisely,
a commitment scheme is a two-phase protocol between two probabilistic polynomial-time
algorithms sender and receiver. In the first phase (the commitment phase) sender commits
to a bit b using some appropriate function Com, which takes as input b and some auxiliary
value r and produces as output a value y. The value y is sent to receiver as a commitment
on b. In the second phase (called the decommitment phase) sender “convinces” receiver
that y is actually a valid commitment on b. The requirements that we make on a commitment
scheme are the following. First, if both sender and receiver behave honestly, then
at the end of the decommitment phase receiver is convinced that sender had committed
to bit b with probability 1. This is often referred to as the correctness requirement. Second, a
dishonest receiver cannot guess b with probability significantly better than 1/2. This is the
so-called hiding property. Finally, a cheating sender should be able to open a commitment
(i.e., to decommit) with both b and 1 − b only with very small (i.e., negligible) probability
(this is the binding property).
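As a concrete toy instance of this two-phase game, the sketch below commits to a bit with a hash function, modelling SHA-256 as a random oracle; this heuristic illustration is our own and is not the construction used elsewhere in the paper.

```python
import hashlib
import secrets

# Hash-based bit commitment: hiding and binding hold heuristically when
# SHA-256 is modelled as a random oracle (an assumption of this sketch).

def commit(b: int) -> tuple[bytes, bytes]:
    """Commitment phase: sender outputs y = H(b || r), keeping (b, r) secret."""
    r = secrets.token_bytes(32)                  # auxiliary value r
    y = hashlib.sha256(bytes([b]) + r).digest()
    return y, r

def decommit(y: bytes, b: int, r: bytes) -> bool:
    """Decommitment phase: receiver recomputes the hash and compares."""
    return y == hashlib.sha256(bytes([b]) + r).digest()

y, r = commit(1)
assert decommit(y, 1, r)        # correctness: honest opening is accepted
assert not decommit(y, 0, r)    # binding: the same (y, r) does not open to 0
```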
For readability we will use “for all x” to mean any possible string x of length polynomial in the security parameter. We start with the standard notion of commitment scheme
with its two main variants (i.e., unconditionally binding and unconditionally hiding). Note
that all definitions will use a commitment generator function that outputs the commitment
parameters. Therefore, such commitments have a straightforward implementation in the
common reference string model where a trusted third party generates a reference string that
is later received as common input by all parties. In some cases the commitment parameters
can be deterministically extracted from a random string; in such cases the corresponding
commitments can be implemented in the shared random string model which is a set-up
assumption weaker than the common reference string model.
For the sake of simplicity, in the following definitions we consider the case in which
the commitment parameters are used for computing a single commitment. However, all the
definitions can be extended (e.g., by strengthening the computational indistinguishability so
that it holds even when the distinguisher receives the trapdoor information as input), and then
the same commitment parameters can be used for any polynomial number of commitments
(and indeed all our results hold in this stronger setting).
Definition A.1 (Gen, Com, Ver) is a commitment scheme if:

- efficiency: Gen, Com and Ver are polynomial-time algorithms;

- correctness: for all v it holds that

  Prob[ crs ← Gen(1^k); (com, dec) ← Com(crs, v) : Ver(crs, com, dec, v) = 1 ] = 1;

- binding: for any polynomial-time algorithm sender there is a negligible function ν such that

  Prob[ crs ← Gen(1^k); (com, v0, v1, dec0, dec1) ← sender(crs) :
        Ver(crs, com, dec0, v0) = Ver(crs, com, dec1, v1) = 1 ∧ v0 ≠ v1 ] ≤ ν(k);

- hiding: for all crs generated with non-zero probability by Gen(1^k), for all v0, v1 where
  |v0| = |v1|, the probability distributions

  {(com0, dec0) ← Com(crs, v0) : com0}   and   {(com1, dec1) ← Com(crs, v1) : com1}

  are computationally indistinguishable.
If the binding property holds with respect to any computationally unbounded algorithm
sender, the commitment scheme is said to be unconditionally binding; if instead the hiding
property holds with respect to any computationally unbounded distinguisher algorithm receiver,
the commitment scheme is said to be unconditionally hiding.
We now give the definition of a trapdoor commitment scheme.
Definition A.2 (Gen, Com, TCom, TDec, Ver) is a trapdoor commitment scheme if Gen(1^k)
outputs a pair (crs, aux), Gencrs is the related algorithm that restricts the output of Gen to
the first element crs, (Gencrs, Com, Ver) is a commitment scheme, and TCom and TDec are
polynomial-time algorithms such that:

- trapdoorness: for all v the probability distributions

  {(crs, aux) ← Gen(1^k); (com, dec) ← Com(crs, v) : (crs, com, dec, v)}

  and

  {(crs, aux) ← Gen(1^k); (com′, aux_com) ← TCom(aux); dec′ ← TDec(aux_com, v) : (crs, com′, dec′, v)}

  are computationally indistinguishable.
Other Notions of Commitments. In [3], Chase et al. considered two different ways
of computing and opening commitments, introducing the notion of mercurial commitment
schemes. In such schemes, the sender is allowed to compute hard and soft commitments. A
hard commitment is a classical unconditionally binding commitment. A soft commitment,
on the other hand, can be teased (i.e., partially opened) to any value by the sender, but cannot
be (fully) opened. In this sense, soft commitments are quite different from trapdoor
commitments, as they can be teased to any value but cannot actually be opened to any of
them. The sender can also tease a hard commitment to the same value to which he can open it. An
important property of mercurial commitment schemes is that, by looking at a commitment,
it is computationally infeasible to decide whether it is a hard or a soft commitment. More
precisely, a mercurial commitment is secure if there exists a simulator that can produce
commitments that it can later open or tease to any value and whose distribution remains
indistinguishable from the distribution of the commitments produced by the
legitimate sender. Non-interactive mercurial commitments have been constructed, in the
shared random string model, under the assumption that non-interactive zero-knowledge
proof systems exist [3]. Mercurial commitments can be used to construct zero-knowledge
sets by only adding the assumption that collision-resistant hash functions exist (see below).
By looking at the properties of mercurial commitments, it may seem that they are actually a more powerful primitive than hybrid commitments. This intuition may explain
the current gap between the complexity-based assumptions used to construct non-interactive mercurial commitments (i.e., NIZK proofs) and non-interactive hybrid trapdoor
commitments (i.e., one-way functions) in the shared random string model. In [2] it is shown
that this intuition is wrong: non-interactive hybrid trapdoor commitments suffice for constructing non-interactive mercurial commitments in the shared random
string model.
Finally we define mercurial commitments.
Definition A.3 (Mercurial Commitments) (Setup, Hard, Soft, Tease, Open, VerTease,
VerOpen) is a mercurial commitment scheme if:
- efficiency: The algorithms Setup, Hard, Soft, Tease, Open, VerTease and VerOpen run
in polynomial-time.
- correctness: Let crs be the output of Setup on input the security parameter k. For all
messages v it holds that
Hard Correctness

  Pr[ (com, dec) ← Hard(crs, v); y ← Tease(v, dec); z ← Open(dec) :
      VerTease(crs, com, v, y) = 1 ∧ VerOpen(crs, com, v, z) = 1 ] = 1

Soft Correctness

  Pr[ (com, dec) ← Soft(crs); y ← Tease(v, dec) : VerTease(crs, com, v, y) = 1 ] = 1

- binding: For any polynomial-time algorithm sender there is a negligible function ν
such that

  Pr[ (com, dec, v0, v1, y, z) ← sender(crs) : VerOpen(crs, com, v0, z) = 1 ∧
      (VerOpen(crs, com, v1, y) = 1 ∨ VerTease(crs, com, v1, y) = 1) ∧ (v0 ≠ v1) ] ≤ ν(k)
- hiding: There exist four polynomial-time algorithms (Sim-Setup, Sim-Com, Sim-Open,
Sim-Tease) described as follows:
Sim-Setup - This algorithm takes as input a security parameter k and produces as
output the common parameters crs and some auxiliary information aux.
Sim-Com - On input aux, it computes a simulated commitment C = (com, dec).
Sim-Open - On input a message m and dec, it outputs a simulated decommitment key
π.
Sim-Tease - On input a message m and dec, it outputs the simulated teasing τ .
We require that for all polynomially bounded receiver there exists a negligible
function ν such that
| Pr[ crs0 ← Setup(1^k) : receiver^{O0}(crs0) = 1 ] −
  Pr[ (crs1, aux1) ← Sim-Setup(1^k) : receiver^{O1}(crs1) = 1 ] | ≤ ν(k)
where Ob operates as follows.
1. O0 initializes L as the empty list and answers hard commit, soft commit, tease
and open queries as follows.
On input (Hard, v) it computes (com, dec) = Hard(crs0, v), stores (Hard, com, dec,
v) in L and outputs com.
On input (Soft, v) it computes (com, dec) = Soft(crs0), stores (Soft, com, dec,
v) in L and outputs com.
On input (Tease, com, v′) it checks if com ∈ L. If not it answers fail, otherwise
it retrieves from L the corresponding information. If com was a hard commitment on v′, or if com was a soft commitment, O0 outputs y ← Tease(v′, dec).
Otherwise O0 outputs fail.
On input (Open, com, v′) it checks if com ∈ L. If not it answers fail, otherwise it
retrieves from L the corresponding information. If com was a hard commitment
on v′, O0 outputs z ← Open(dec). Otherwise O0 outputs fail.
2. O1 initializes L as the empty list and answers the queries above as follows.
On input (Hard, v) it computes (com, dec) = Sim-Com(aux1 ), stores (Hard, com, dec,
v) in L and outputs com.
On input (Soft, v) it computes (com, dec) = Sim-Com(aux1), stores (Soft, com, dec, v)
in L and outputs com.
On input (Tease, com, v) it checks if com ∈ L. If not, it answers fail; otherwise it
retrieves from L the corresponding information. If com was a hard commitment
on v, or if com was a soft commitment, O1 outputs y ← Sim-Tease(v, dec).
Otherwise O1 outputs fail.
On input (Open, com, v) it checks if com ∈ L. If not, it answers fail; otherwise it
retrieves from L the corresponding information. If com was a hard commitment
on v, O1 outputs z ← Sim-Open(v, dec). Otherwise O1 outputs fail.
B Zero-Knowledge Sets
In [10], Micali et al. introduced the concept of a zero-knowledge set. There, a prover P
commits to an arbitrary set S so that for any string x he can later prove to a verifier V that
x ∈ S or x ∉ S. Such a proof is required to be both sound and zero knowledge. The former
requirement protects the verifier, since he cannot be convinced of a false
statement by an adversarial prover. The latter requirement protects the
prover, since no adversarial verifier can learn more than the mere truthfulness
of the proved statements.
A slight variation (actually, an extension) of zero-knowledge sets is that of zero-knowledge
elementary databases, where x is considered a key and v(x) is the corresponding datum. In
this case, for any key x the prover either proves that x ∉ S or proves that x ∈ S ∧ v(x) = u,
still preserving soundness and zero knowledge. We will focus on zero-knowledge sets, but all
the discussions and results extend to zero-knowledge elementary databases as well.
Mercurial commitments for zero-knowledge sets. In [3], Chase et al. introduced the
concept of a mercurial commitment along with its application to the construction of zero-knowledge sets. More specifically, zero-knowledge sets are constructed by
using collision-resistant hash functions and mercurial commitments. The concept of mercurial commitments was later investigated in [2], where a general paradigm for constructing
efficient mercurial commitments (and thus efficient zero-knowledge sets) is presented.
A Graphical PIN Authentication Mechanism for
Smart Cards and Low-Cost Devices∗
Luigi Catuogno
Clemente Galdi
Dipartimento di Informatica ed Applicazioni
Università di Salerno - ITALY
Dipartimento di Scienze Fisiche
Università di Napoli “Federico II” - ITALY
[[email protected]]
[[email protected]]
Abstract
Passwords and PINs are still the most widely deployed authentication mechanisms, and their
protection is a classical branch of research in computer security. Several password schemes,
as well as more sophisticated tokens, algorithms, and protocols, have been proposed in recent years. Some proposals require dedicated devices, such as biometric sensors,
whereas others have high computational requirements. Graphical passwords are a
promising research branch, but the implementation of many proposed schemes often requires
considerable resources (e.g., data storage, high-quality displays), making their usage difficult on small devices, like old-fashioned ATM terminals, smart cards, and many low-priced
cellular phones.
In this paper we present a graphical mechanism that handles authentication by means
of a numerical PIN, which users have to type on the basis of a secret sequence of objects and
a graphical challenge. The proposed scheme can be instantiated in a way that requires low
computational capabilities, making it also suitable for small devices with limited resources.
We prove that our scheme is effective against “shoulder surfing” attacks.
Introduction
Passwords and PINs are still the most widely deployed authentication mechanisms, although they suffer from relevant and well-known weaknesses [1]. The protection of passwords is a classical branch
of research in computer security. Several important improvements to old-fashioned alphanumeric passwords, tailored to the context of different applications, have been proposed
in recent years. Indeed, the literature on authentication and passwords is huge; here we just
cite Kerberos [13] and S/Key [6].
Two important aspects in dealing with passwords are the following:
1. Passwords should be easy enough to be remembered but strong enough in order to avoid
guessing attacks;
2. The authentication mechanism should be resilient against classical threats, like shoulder
surfing attacks, i.e., the recording of the interaction between the user and the
terminal; moreover, it should be light enough to be used also on small computers.
∗
This work was partially supported by the European Union under IST FET Integrated Project AEOLUS
(IST-015964).
Consider for example the following scenario. To access ATM services, a user needs
a magnetic stripe card. In order to be authenticated, the user inserts her card (which carries
only her identification data) into the ATM reader and types her four-digit PIN; afterwards,
the ATM sends the user’s credentials to the remote authentication server through a PSTN
network. This approach is really weak. Magnetic stripe cards can be easily cloned, and PINs
can be collected in many ways. For example, an adversary could have placed a
hidden micro-camera pointing at the ATM panel somewhere in the neighborhood. A recent
tampering technique relies on a skimmer, i.e., a reader equipped with
an EPROM memory that is glued onto the ATM reader, so that the stripes of passing cards can
be dumped to the EPROM. A fake spotlight is also placed above the keyboard in order to
record the insertion of the PIN. The skimmer allows adversaries to collect a number of
user sessions, obtaining all the information needed to clone user cards. Such information, coupled
with the images taken by the camera, allows the attacker to correctly authenticate to the
ATM. Such an attack is known in the literature as a “shoulder surfing” attack.
Graphical passwords [2, 10, 3, 7, 8, 12, 16, 9, 11, 15] are a promising authentication
mechanism that addresses many drawbacks of old-style password/PIN-based schemes. The basic
idea is to ask the user to click on some predefined parts of an image displayed on the screen
by the system, according to a certain sequence. Such methods have been improved over
the years, in order to obtain schemes offering enhanced security and usability. Despite
its importance, little attention has been devoted to graphical password schemes resilient to
shoulder surfing attacks. In particular, [11] first addressed this problem under restricted
conditions. Subsequently, [15] presented a graphical password scheme that was claimed to
be secure against shoulder surfing attacks; however, this scheme was proved insecure
in [5]. For a wider overview of research on graphical passwords, we refer the
reader to the survey by Suo et al. [14] and to the web site of the “Graphical
Passwords Project” [4] at Rutgers.
The majority of the proposed schemes require costly hardware (e.g., medium- or high-resolution displays and graphics adapters, touch screens, data storage, high computational resources,
etc.). This makes some of the proposed schemes unsuitable for implementation on low-cost
equipment (e.g., current ATM terminals, which are still the overwhelming majority).
In this paper we propose a graphical PIN scheme based on the challenge-response paradigm
that can be instantiated in a way that requires low computational capabilities, making it also
suitable for small devices with limited resources.
The design of the scheme follows three important guidelines:
• The scheme should be independent of the specific set of objects that are used for the
graphical challenge. In particular, our scheme can be deployed both on terminals that
are equipped with small or cheap displays, like the ones of cellular phones, and
through the classical 10-inch CRT monitors (either color or monochrome) that still equip
thousands of ATM terminals.
Moreover, it should be possible to compose user responses both with a sophisticated
pointing device and with a simple keypad.
• The generation of challenges and the verification of the user’s responses should be affordable
also for computers with limited computational resources (e.g., as in the “smart card
scenario” described above).
• The user is simply required to recognize the position of some objects on the screen. She
is not required to compute any function.
We present a strategy that can withstand shoulder surfing attacks. This strategy is
independent of the specific set of objects that are used to construct the challenge.
1 Our Proposal
In this paper we assume that the terminal used by the user cannot be tampered with. In other
words, an adversary is allowed to record the challenges displayed by the terminal and the
activity of the user, but she is not allowed to alter in any way the behaviour of the parties.
The protocols described in this paper belong to the family of challenge-response
authentication schemes, where the system issues a random challenge to the user, who is
required to compute a response according to the challenge and to a secret shared between
the user and the system.
More precisely, a challenge consists of a picture depicting a random arrangement of some
objects (e.g., colored geometrical shapes) in a matrix. The challenge is displayed on the
screen. We denote by O the set of all distinct objects and by q its cardinality. A challenge is
represented as a sequence α = (o1, ..., o|α|), where oi is an object drawn from O.
During her authentication session, the user is required to type as PIN the positions of a
sequence of secret objects in the challenge matrix. Clearly, the PIN typed by the user
changes in each session as the challenge changes, since it is simply the proof that the user
knows the secret sequence of objects and thus can correctly reply to the current challenge.
To be more precise, the secret is a sequence of m questions, called queries. Each query is a
question of the following type: “On which row of the screen do you see the object o?”. Since
the questions are chosen independently, the set of possible secrets has size |O|^m.
Upon receiving a challenge, the user is required to compute a response according to
the secret queries shared with the system. A response is a vector β = (β1, ..., βm), where
each βi is a number drawn from the set A = {0, 1, ..., a − 1}, representing the answer to the
i-th query under the challenge. A session transcript is a pair τ = (α, β), where α is
a challenge and β is the user’s response to α.
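The challenge-response mechanism just described can be sketched in a few lines of Python. This is our own illustrative sketch, not the authors' implementation: the names make_challenge, respond and verify, and the policy of assigning objects to rows round-robin after shuffling, are assumptions made for the example.

```python
import random

def make_challenge(objects, rows, rng):
    """Randomly arrange the objects into the rows of the challenge
    matrix; the challenge is returned as a map object -> row index."""
    shuffled = objects[:]
    rng.shuffle(shuffled)
    return {obj: i % rows for i, obj in enumerate(shuffled)}

def respond(challenge, secret):
    """Honest response beta: the row of each secret object."""
    return [challenge[obj] for obj in secret]

def verify(challenge, secret, response):
    """The terminal accepts iff every typed digit matches the row of
    the corresponding secret object in the displayed challenge."""
    return response == respond(challenge, secret)

rng = random.Random(0)
objects = list(range(36))            # q = 36 objects (abstract labels)
secret = rng.sample(objects, 15)     # m = 15 secret queries
challenge = make_challenge(objects, rows=6, rng=rng)   # a = 6 answers
beta = respond(challenge, secret)    # session PIN typed by the user
assert verify(challenge, secret, beta)
```

Note that verification only needs a table lookup per query, which is consistent with the low-resource requirement stated above.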
We stress that the set of objects used to construct the challenges has an impact on the
usability of the scheme. For example, it is easier to remember a sequence of pictures like
“home, dog, cat” than a sequence of geometrical shapes, like “blue triangle, green circle,
yellow square”. On the other hand, complex objects cannot be displayed/managed on low-cost devices. Our scheme is independent of the specific set of objects. This makes it
suitable for deployment both on complex and on simple devices.
1.1 Different authentication strategies
Given the above authentication scheme, we have analyzed three different authentication strategies. In the first strategy, the user is required to correctly answer all the questions in her
secret. A second strategy is to allow the user to correctly answer only a subset of her secret
questions. We have considered the case in which the user correctly answers at least k out of m
questions of her choice, while she is allowed to give random answers to the remaining queries.
The last strategy we have analyzed consists in requiring the user to correctly answer exactly
k out of m queries while giving wrong answers to the remaining ones.
Notice that the last two strategies differ in that wrong answers do give information about the user’s
secret, in contrast to random answers, which give no information about it.
For the above strategies we have evaluated the probability with which an adversary can
extract the user’s secret as a function of the number of recorded sessions. Notice that the goal of
the adversary may not be secret extraction but, more simply, a one-time authentication.
We notice that, typically, in the scenario we consider the adversary cannot use a “brute
force” attack since, for example, the card would be disabled after three unsuccessful
authentications. For this reason the adversary should recover either the whole secret, or
“almost” the whole secret, before attempting authentication.
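The secret-extraction attack against the first (always-correct) strategy can be sketched as a small Monte Carlo simulation: after each recorded transcript, the adversary keeps for every query position only the objects lying in the row the user answered, and stops once every position is pinned down. This is our own hedged sketch, not the authors' simulation code; the uniform round-robin row assignment is an assumption.

```python
import random

def sessions_to_extract(q, a, m, rng):
    """Run the intersection attack against the always-correct strategy
    and return the number of sessions until the secret is determined."""
    objects = list(range(q))
    secret = rng.sample(objects, m)
    candidates = [set(objects) for _ in range(m)]
    sessions = 0
    while any(len(c) > 1 for c in candidates):
        # random challenge: assign each object to one of the a rows
        shuffled = objects[:]
        rng.shuffle(shuffled)
        row = {obj: i % a for i, obj in enumerate(shuffled)}
        beta = [row[o] for o in secret]          # honest response
        for i in range(m):
            # keep only objects consistent with the typed answer
            candidates[i] &= {o for o in objects if row[o] == beta[i]}
        sessions += 1
    return sessions

rng = random.Random(1)
trials = [sessions_to_extract(36, 2, 15, rng) for _ in range(20)]
```

On such runs the sample sizes come out in the same order of magnitude as the figures reported in Table 1 for q = 36, a = 2; the exact numbers depend on the random seed.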
2 Experimental Evaluation
In this section we give an experimental evaluation of the performance of the strategies presented above. For each strategy we report the number of session transcripts that the adversary
needs to intercept in order to extract the user secret with probability either 0.95 or 0.99. In order to
present concrete examples, we fix the number of objects to be either 36 or 80. The value
36 has been chosen so that all the objects can be displayed on a low-resolution display, e.g.,
in the ATM case. The value 80 could be used in case the device used for displaying the objects
is a more advanced one.
Furthermore, we fix the number m of queries the user should answer to 15. This choice is
due to the fact that (a) it should not be hard for a human to remember 15 objects, and (b)
the probability of a blind attack is negligible.
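Claim (b) can be checked directly for the first strategy: a blind adversary must guess each of the m answers uniformly among the a possible rows, so the success probability is a^(-m). The following back-of-the-envelope computation is ours, not taken from the paper:

```python
# Success probability of a blind attack under the always-correct
# strategy: all m answers must be guessed, each among a possibilities.
m = 15
for a in (2, 6, 8):
    print(f"a = {a}: blind success probability = {a ** -m:.2e}")
```

Even in the weakest setting (a = 2), the blind-guess probability is 2^(-15), about 3.1e-05, and it drops steeply as a grows.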
Table 1 summarizes the performance of the first strategy, in which the user correctly
answers all the questions in her secret. In particular, we report the number of sessions the
adversary needs in order to successfully compute the user’s secret with probability either 0.95
or 0.99.
Always Correct   p = 0.95   p = 0.99
q=36, a=2           14         16
q=36, a=6            6          7
q=80, a=6           15         17
q=80, a=8            5          6

Table 1: Number of sessions needed to extract the user secret with probability at least p in
the m = 15 query case.
As for the second strategy, the results reported in Table 2 refer to the single
query case. This means that the adversary needs to collect at least x correct answers from
the user. If we extend this to the multiple query case, we need to consider that, in each session,
the user answers correctly only a fraction of the queries. The value of this fraction of
correct answers depends, for some technicalities, on the size of the answer set A. For the
multiple query case, Table 3 reports the expected number of sessions that the adversary needs
to collect in order to extract the user’s secret. The last column indicates the probability with
which the user correctly answers a query. The multiple query case is strictly related to the
Group Coupon Collector’s Problem. Since we are not aware of any result on this problem,
we have obtained these results by simulation.
Correct & Random   p = 0.95   p = 0.99
q=36, a=2             10         12
q=36, a=6              4          5
q=80, a=2             11         13
q=80, a=8              4          5

Table 2: Number of correct sessions needed to extract the user secret with probability at least
p in the single query case.
Correct & Random   p = 0.95   p = 0.99    c
q=36, a=2             15         18      3/2
q=36, a=6              8         11       3
q=80, a=2             17         20      3/2
q=80, a=8             11         15       4

Table 3: Number of sessions needed to extract the user secret with probability at least p in
the m = 15 query case.
As for the last strategy, let c be the number of questions the user correctly answers in each
authentication. Table 4 reports the number of sessions an adversary needs to collect in order
to extract the user secret with probability at least 0.95 or 0.99.
Correct & Wrong   p = 0.95   p = 0.99    c
q=36, a=2            16         20      m/2
q=36, a=2            24         36      m/4
q=36, a=6            10         12      m/2
q=36, a=6            16         16      m/4
q=80, a=2            16         24      m/2
q=80, a=2            24         36      m/4
q=80, a=8            10         10      m/2
q=80, a=8            16         16      m/4

Table 4: Number of sessions needed to extract the user secret with probability at least p in
the m = 15 query case.
3 Conclusion
In this paper we have presented a simple graphical PIN authentication mechanism that is
resilient against shoulder surfing. Our scheme is independent of the specific set of objects
used to construct the challenges. Depending on the specific strategy, the adversary may fail
to impersonate the user even if she manages to obtain as many as 36 transcripts. The
scheme may be implemented on low-cost devices and does not require any special training
for the users. The user only needs to remember a small sequence of objects. Finally, the
authentication requires a single round of interaction between the user and the terminal. We
have also discussed a prototype implementation.
The analysis of the scheme considers the probability of extracting the user’s secret rather
than that of a successful “one-time” authentication. Since the number of attempts the adversary
can make before the card is disabled is limited to three, we believe that the number of sessions
needed by the adversary in the latter case does not differ significantly from the one needed
for the former goal.
References
[1] Ross J. Anderson. Why cryptosystems fail. Commun. ACM, 37(11):32–40, 1994.
[2] G. E. Blonder. Graphical passwords. Lucent Technologies Inc, Murray Hill, NJ (US),
US Patent no. 5559961, 1996.
[3] R. Dhamija and A. Perrig. Déjà Vu: a user study using images for authentication. In
IX USENIX UNIX Security Symposium, Denver, Colorado (USA), August 14-17, 2000.
[4] J. C. Birget et al. Graphical password project. http://clam.rutgers.edu/~birget/grPssw,
2002.
[5] Philippe Golle and David Wagner. Cryptanalysis of a cognitive authentication scheme
(short paper). In 2007 IEEE Symposium on Security and Privacy, to appear.
[6] Neil M. Haller. The S/KEY one-time password system. In Proceedings of the Symposium
on Network and Distributed System Security, pages 151–157, 1994.
[7] W. Jansen, S. Gavrila, V. Korolev, R. Ayers, and R. Swanstrom. Picture password:
a visual login technique for mobile devices. National Institute of Standards and
Technology Interagency Report NISTIR 7030, 2003.
[8] I. Jermyn, A. Mayer, F. Monrose, M. K. Reiter, and A. D. Rubin. The design and
analysis of graphical passwords. In Proceedings of the 8th USENIX Security Symposium,
Washington D.C. (US), August 23-26, 1999.
[9] Shushuang Man, Dawei Hong, and Manton M. Matthews. A shoulder-surfing resistant
graphical password scheme - wiw. In Proceedings of the International Conference on
Security and Management, SAM ’03, June 23 - 26, 2003, Las Vegas, Nevada(US), volume 1, pages 105–111, June 2003.
[10] A. Perrig and D. Song. Hash visualization: a new technique to improve real-world security. In Proceedings of the 1999 International Workshop on Cryptographic Techniques
and E-Commerce, 1999.
[11] Volker Roth, Kai Richter, and Rene Freidinger. A PIN-entry method resilient against
shoulder surfing. In CCS ’04: Proceedings of the 11th ACM Conference on Computer and
Communications Security, pages 236–245, New York, NY, USA, 2004. ACM Press.
[12] L. Sobrado and J. C. Birget. Graphical password. The Rutgers Scholar, an Electronic
Bulletin for Undergraduate Research, 4, 2002.
[13] Jennifer G. Steiner, B. Clifford Neuman, and Jeffrey I. Schiller. Kerberos: An authentication service for open network systems. In USENIX Winter, pages 191–202, 1988.
[14] Xiaoyuan Suo, Ying Zhu, and G. Scott Owen. Graphical passwords: a survey. In
Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC 2005),
December 5-9, Tucson, AZ (US), pages 463–472, December 2005.
[15] Daphna Weinshall. Cognitive authentication schemes safe against spyware (short paper).
In IEEE Symposium on Security and Privacy, pages 295–300. IEEE Computer Society,
2006.
[16] S. Wiedenbeck, J. Waters, L. Sobrado, and J. C. Birget. Design and evaluation of a
shoulder-surfing resistant graphical password scheme. In Proceedings of Advanced Visual
Interfaces (AVI 2006), Venice, Italy, May 23-26, 2006.
SMTP sniffing for intrusion detection purposes
Maurizio Aiello*, David Avanzini*,
Davide Chiarella†*, Gianluca Papaleo†*
*National Research Council, IEIIT, Genoa
†University of Genoa, Department of Computer and Information Sciences, Italy
Abstract. Internet e-mail has become one of the most important ways for people and
enterprises to communicate with each other. However, this system is in some cases used
for malicious purposes, a major problem being worm and spam spreading. A smart e-mail
content checking system can help to detect these kinds of threats. We propose a way to
capture, store and display e-mail transactions through SMTP packet sniffing. We worked
on pcap files dumped by a packet sniffer containing SMTP traffic packets of a real
network. After reassembling the TCP streams and SMTP commands, we store the
captured e-mails in a database; for privacy reasons, only e-mail headers are stored.
Having a tool for clearly understanding and monitoring SMTP transactions may help in
security management tasks.
Keywords: SMTP, E-mail, Intrusion Detection, Network Security, Worm Detection,
Sniffing.
1 Introduction
Nowadays e-mail has thoroughly changed people’s life and work habits: in
fact the majority of people use the e-mail system to share any type of information,
ranging from business purposes to illegal ones.
A pestering and energy-draining problem for mail domain administrators is
answering users’ requests regarding sent e-mails that never arrived at their destination, or
messages that they should have received but didn’t. However, the main and most
important problem related to e-mail utilization is virus and spam diffusion: software
like SpamAssassin [1] performs e-mail content checking in order to limit spam, while
Amavis [2] scans e-mail attachments for viruses using third-party virus scanners; both,
however, have to be installed on every mail server in order to work. In all these cases
the use of a packet sniffer can be a good solution to the problem; in fact it centralizes
the work, enabling administrators to monitor all the e-mail traffic of several servers with
just one installation, making it possible to build a gateway defense system and permitting
the identification of all the SMTP traffic, including e-mails sent to unknown servers or
from infected hosts (trojans, viruses, etc.). In fact, in some cases the misconfiguration of
firewalls permits a normal user to install and use an SMTP server on his own machine
without administrator authorization, or to use an SMTP server to send mail that is not
internal to the company. The disadvantage of our solution is that the use of ESMTP-TLS
and PGP makes the packet sniffer useless. While PGP is a server-independent solution,
ESMTP is bound to both servers: source and destination have to use ESMTP with TLS
enabled to permit encrypted e-mail traffic; if one of the two servers doesn’t have TLS
enabled, the communication is not encrypted.
The purpose of this paper is to present a smart solution to check e-mail for intrusion
detection purposes. We propose a program based on SMTP flux
reassembling [3][9][10] that can be useful for e-mail auditing and intrusion detection
purposes.
2 Packet sniffing
To allow our system to work, we need information about the packets within the network
we are interested in monitoring. We implemented a packet sniffing program to analyze
the SMTP traffic using the Perl library Net::Pcap [4]. The sniffing process begins by
determining which interface to sniff on. The function lookupdev() is used to get the
network interface and the function open_live() to set the interface to promiscuous
mode for sniffing. Then, the functions compile() and setfilter() are used to set a filter on
the packets to capture; we filter the packet sniffing on port 25, the standard port
on which mail servers listen for connections. If we have multiple networks, we
can further filter our traffic, picking the network we are interested in. The
function dump_open() allows the sniffer to dump the captured packets into a pcap file.
Finally, the function loop() allows the sniffer to start.
The pcap files are then passed to the reassembler: the reassembler can be located on the
same machine as the sniffer or, thanks to the SMTPSniffer modularity, on another host.
To allow the reassembler to read pcap files in on-line mode (almost in real time), the
sniffer creates a pcap file every n packets sniffed on the network. The pcap files
overlap, to ensure that no packets are lost between the different slices. The dimension
of the pcap files is set by the user. The architecture of the system is shown in Figure 1.
Figure 1. System architecture
3 Reassembler
The next phase of our system takes the pcap files produced by the sniffer and
reconstructs the whole e-mail sent by a client. To do this operation we use the Perl
library Net::Analysis [5]. To allow SMTPSniffer to work in on-line mode, a scheduler
checks every minute whether there are new pcap files and then passes the new pcap
files to the reassembler. The scheduler has been built using the Schedule::Cron [6]
library. First of all, for every packet captured we store in a file the source IP and port,
the destination IP and port, the timestamp and the data part of the packet. The timestamp
represents the number of seconds between the present date and the Unix Epoch
(January 1st, 1970).
Figure 2. Flux hash table
Once we have obtained all the information of interest about the TCP packets, we have to
reconstruct every SMTP session. To do this, we use two hash tables: the first hash
table is called flux (Figure 2) and represents all the SMTP sessions.
Figure 3. Timeflux hash table
The second hash table is called timeflux (Figure 3); it contains a list of ordered
timestamps referring to every SMTP session. The second hash table is useful to reorder
all the SMTP fluxes. For better readability, the timestamp format of the timeflux hash table in
Figure 3 is YYYY/MM/DD hh:mm:ss.
During normal operations, if we find a TCP packet related to SMTP port 25, we
check whether its key already exists in flux: if so, we append the data part of
the packet to the value in the flux hash table, otherwise we create a new entry in both the flux
hash table and the timeflux hash table.
Once we have the information of all the e-mails sent, the next step is to reorder all the
fluxes according to their timestamps (this information is taken from the timeflux hash table). For
privacy reasons, all the message body parts are cut.
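The flux/timeflux bookkeeping described above can be sketched in Python as follows. This is a simplified illustration of our own: it ignores TCP sequence-number reordering and retransmissions, which the actual system delegates to Net::Analysis, and the session key is simply the connection 4-tuple.

```python
from collections import defaultdict

# The two hash tables described above (names follow the paper): flux
# maps a session key to the concatenated SMTP payload, while timeflux
# keeps the timestamps seen for that session, used later to reorder.
flux = defaultdict(bytes)
timeflux = defaultdict(list)

def handle_packet(src_ip, src_port, dst_ip, dst_port, ts, data):
    """Append the payload of a port-25 TCP packet to its SMTP session."""
    if 25 not in (src_port, dst_port):
        return
    key = (src_ip, src_port, dst_ip, dst_port)
    flux[key] += data              # new keys are created by defaultdict
    timeflux[key].append(ts)

# Two interleaved sessions reaching the same mail server
handle_packet("10.0.0.5", 40000, "10.0.0.1", 25, 100.0, b"MAIL FROM:<a@x>\r\n")
handle_packet("10.0.0.6", 40001, "10.0.0.1", 25, 100.5, b"MAIL FROM:<b@y>\r\n")
handle_packet("10.0.0.5", 40000, "10.0.0.1", 25, 101.0, b"RCPT TO:<c@z>\r\n")

# Reorder the fluxes by their first timestamp, as done with timeflux
ordered = sorted(flux, key=lambda k: timeflux[k][0])
```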
4 Database features
Once we have reassembled the flux streams of our e-mails, the system dumps the
information about the e-mails into a MySQL database. The information about the e-mails stored in the
database is: timestamp (converted into a human-readable format), mail client name
(obtained by a reverse lookup), mail client IP, mail server IP, sender address, receiver
address, SMTP code (e.g., 250, 450) and e-mail size.
The usefulness of having all the mail stored in a database is the capability of searching
for the information we are interested in at any time through DB queries.
For example, one can find how and which e-mails have been sent by a user or to a
particular user. Another option is to see the whole e-mail traffic in a certain period of
time, or to list all the e-mails rejected by the e-mail server.
In Figure 4 we can see an example of e-mails captured and stored in our database.
Figure 4. Database example
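As an illustration of such DB queries, the sketch below uses an in-memory SQLite table with a hypothetical layout mirroring the fields listed above; the actual system uses MySQL, and the real table and column names may differ.

```python
import sqlite3

# Hypothetical schema: one row per captured SMTP transaction
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE emails (
    ts TEXT, client TEXT, ip_client TEXT, ip_server TEXT,
    sender TEXT, receiver TEXT, smtp_code INTEGER, size INTEGER)""")
db.executemany("INSERT INTO emails VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("2007-06-06 10:00:00", "pc1", "10.0.0.5", "10.0.0.1", "a@x", "c@z", 250, 1200),
    ("2007-06-06 10:01:00", "pc2", "10.0.0.6", "10.0.0.1", "b@y", "d@z", 550,  900),
    ("2007-06-06 10:02:00", "pc2", "10.0.0.6", "10.0.0.1", "b@y", "e@z", 450,  900),
])

# All rejected e-mails (temporary/permanent failure codes 450 and 550)
rejected = db.execute(
    "SELECT sender, receiver, smtp_code FROM emails "
    "WHERE smtp_code IN (450, 550) ORDER BY ts").fetchall()
```

The same pattern covers the other queries mentioned above (per-user traffic, traffic in a time window) by changing the WHERE clause.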
5 Test scenario
We tested SMTPSniffer on a network segmented into several subnets with a total of
about 400 hosts. We compared the results of reassembling the SMTP sessions by
analyzing a single big pcap file (off-line mode) and by analyzing several overlapping
pcap file slices (on-line mode), and in both cases we were able to
reconstruct all the e-mails exactly. In Figure 5 we can see a scheme of our test
scenario. The SMTPSniffer PC is attached to our network through two interfaces; one
interface is connected to the hub to allow SMTPSniffer to capture all the traffic
outgoing from the network (this interface is set in promiscuous mode), while the other interface is
connected to the switch to communicate with the hosts in our network. The purpose of
having an interface connected to the switch is to control the SMTPSniffer PC remotely.
The hub is not strictly required; in fact, the promiscuous interface can be connected
directly to a switch mirror port.
Figure 5. Scenario schema
6 Security purposes
SMTPSniffer is, as already said, a powerful tool for a system administrator,
because it allows a fast check of the e-mail traffic in real time. This opens up many
possibilities.
SMTPSniffer can be used for intrusion detection purposes, and in particular for
indirect worm detection [7]. At present, in fact, worms continue to improve in
terms of their sophistication and detrimental effect, and by exploiting the benefits of the e-mail system they spread very widely and very fast, exhausting network resources.
When a worm infects a host, it tries to send the greatest amount of e-mail in the
shortest time interval: this behaviour cannot pass unnoticed, because SMTPSniffer lists
the e-mail activity, so one can try to discover why a host is sending a lot of e-mails.
It is possible to analyze the data stored in the database to detect anomalies using
statistical methods. Through DB queries the data can be filtered before being analyzed,
yielding a more efficient result and making an instant raw anomaly detection possible.
Indeed, the hurry of worm spreading produces a lot of e-mails rejected by the mail server;
if the traffic is filtered through a query that considers only rejected e-mails (in the
database, the status field must be 450 or 550) and the various peaks are then analyzed,
there is a good chance of identifying worm activity on the network.
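The peak analysis suggested above can be sketched as a simple statistical threshold. This is an illustrative toy of our own: the hourly binning and the threshold factor k are assumptions, not choices taken from the paper.

```python
from statistics import mean, stdev

def rejected_peaks(counts, k=2.0):
    """Return the indices of time intervals whose rejected-mail count
    exceeds the mean by more than k standard deviations."""
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if c > mu + k * sigma]

# Hourly counts of rejected e-mails with a worm-like burst at index 6
hourly_rejected = [2, 1, 3, 2, 0, 2, 180, 2, 1]
peaks = rejected_peaks(hourly_rejected)
```

A single massive burst inflates the sample standard deviation, so in practice a robust baseline (e.g., computed over known-clean periods) would flag peaks more reliably than this naive global threshold.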
Moreover, all these features allow one to anticipate the antivirus patch: in fact, viruses
spread faster than virus signatures for anti-virus protection can be developed and
distributed.
7 Wormpoacher integration
In [7] we described a worm detection technique and system. The program we are
developing using those techniques is called Wormpoacher.
Wormpoacher uses a tool for mail log analysis called LMA [8]. LMA is bound to
the specific mail server: if we install a mail server not supported by LMA, Wormpoacher
cannot work properly. Moreover, many worms use their own SMTP
engine to propagate, and in this case Wormpoacher is not able to analyze this type of
traffic. To overcome the LMA limits, we can add the SMTPSniffer features to
Wormpoacher. This operation is very easy to achieve, because the Wormpoacher
architecture is completely modular. Clearly, if we want to analyze communications
between our hosts (internal communications), we have to place the sensor inside our
internal network; the same concept holds if we want LMA to work on different servers.
8 Conclusion
In this paper we discussed a program useful to store and display all the e-mails
sent in a local network. It can be used to prevent attack actions within the local network or
to investigate mail server misconfigurations or mail traffic firewall settings.
All the e-mails can be viewed almost in real time through the database feature and,
through various kinds of queries, it is possible to perform different mail analyses.
SMTPSniffer can be integrated into Wormpoacher in order to perform worm
detection, and it can overcome the LMA constraints in order to give a total view of the
network mail configuration and activity.
Acknowledgments
This work was partially supported by National Research Council of Italy, University
of Genoa and PRAI-FESR Programme, Innovative Actions of Liguria.
REFERENCES
[1]. http://spamassassin.apache.org/
[2]. http://www.amavis.org/
[3]. Wang Zhimin, Jia Xiaolin. Restoration and audit of Internet e-mail based on TCP stream reassembling.
In Communication Technology Proceedings (ICCT 2003), International Conference on, Volume 1, 9-11 April 2003, pages 368-371.
[4]. Net::Pcap - Interface to the pcap LBL packet capture library. http://search.cpan.org/dist/Net-Pcap/Pcap.pm
[5]. Net::Analysis - Modules for analysing network traffic. http://search.cpan.org/~worrall/Net-Analysis-0.06/lib/Net/Analysis.pm
[6]. Schedule::Cron - cron-like scheduler for Perl subroutines. http://search.cpan.org/~roland/Schedule-Cron-0.97/Cron.pm
[7]. Maurizio Aiello, David Avanzini, Davide Chiarella, Gianluca Papaleo. Worm detection using e-mail
data mining. In PRISE 2006, Primo Workshop Italiano su PRIvacy e SEcurity, 2006.
[8]. Maurizio Aiello, David Avanzini, Davide Chiarella, Gianluca Papaleo. Log Mail Analyzer: architecture
and practical utilization, 2006.
[9]. Shishi Liu, Jizhou Sun, Xiaoling Zhao, Zunce Wei. A general purpose application layer IDS.
In Electrical and Computer Engineering (IEEE CCECE 2003), Canadian Conference on, Volume 2, 4-7 May 2003, pages 927-930.
[10]. Xiaoling Zhao, Jizhou Sun, Shishi Liu, Zunce Wei. A parallel algorithm for protocol reassembling.
In Electrical and Computer Engineering (IEEE CCECE 2003), Canadian Conference on, Volume 2, 4-7 May 2003, pages 901-904.
The general problem of privacy in location-based
services and some interesting research directions
Claudio Bettini Sergio Mascetti Linda Pareschi
DICo, Università di Milano
(Extended Abstract)
The proliferation of location-aware devices will soon result in a diffusion
of location-based services (LBS). Privacy preservation is a challenging research issue for this kind of service. In general, there is a privacy threat
when an attacker is able to associate the identity of a user with information
that the user considers sensitive. In the case of LBS, both the identity of
a user and her sensitive information can possibly be derived from requests
issued to service providers. More precisely, the identity and the sensitive
information of a single user can be derived from requests issued by a group
of users. Figure 1 shows a graphical representation of this general view of
privacy threats in LBS. In order to prevent an attacker from associating a
user's identity with her sensitive information, the ongoing research in this field
is tackling two main subproblems: preventing the attacker from inferring the
user's identity and preventing the attacker from inferring the user's sensitive
information.
Since the general privacy threat is the association of a user's identity
with her sensitive information, in order to protect the user's privacy it is
sufficient to prevent the attacker from inferring either the identity or the
sensitive information. Hence, although the solution of one of the two subproblems is sufficient to guarantee the user's privacy, we argue that the solution
of both subproblems could enable better techniques for privacy protection. However, the quality of the provided service could be affected by the
introduction of stronger mechanisms for privacy preservation. Indeed,
the obfuscation of request parameters usually performed by privacy protection techniques implies a degradation of the quality of service. A location-based privacy-preserving system that implements solutions for both subproblems can combine them in order to optimize the quality of service while
preserving privacy. Most of the approaches proposed in the literature to
protect LBS privacy consider scenarios that can be easily mapped to the
one depicted in Figure 2. It involves three main entities:
• The User invokes or subscribes to location-based remote services that
are going to be provided to her mobile device.
Figure 1: General privacy threat in LBS
• The Location-aware Trusted Server (LTS) stores precise location
data of all its users, using data directly provided by users’ devices
and/or acquired from the infrastructure.
• The Service Provider (SP) fulfills user requests and communicates
with the user through the LTS.
Figure 2: The reference scenario.
In our model each request r is processed by the LTS, resulting in a
request r′ with the same logical components but appropriately generalized
to guarantee the user's privacy. Requests, once forwarded by the LTS, may
be acquired by potential attackers in different ways: they may be stolen
from the SP storage, voluntarily published by the trusted parties, or
acquired by eavesdropping on the communication lines. In contrast,
the communication between the user and the LTS is considered trusted,
and the data stored at the LTS is not considered accessible by the attacker.
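As an illustration (not part of the original abstract), the generalization step performed by the LTS can be sketched as a minimal grid-based spatial cloaking in the spirit of [4]; the names (Request, generalize) and the cell-doubling strategy are our own assumptions:

```python
from dataclasses import dataclass

@dataclass
class Request:
    issuer: str   # user identity: known to the LTS, dropped in r'
    x: float      # precise position of the issuer
    y: float
    query: str    # service-specific content of the request

def generalize(r: Request, positions: list[tuple[float, float]],
               k: int = 2, cell: float = 1.0) -> dict:
    """Return r': the issuer's identity is removed and the precise
    position is enlarged to a square grid cell containing at least
    k user positions, so the issuer hides in a crowd of size >= k."""
    while True:
        x0 = (r.x // cell) * cell          # lower-left corner of the
        y0 = (r.y // cell) * cell          # grid cell containing (x, y)
        inside = sum(1 for (px, py) in positions
                     if x0 <= px < x0 + cell and y0 <= py < y0 + cell)
        if inside >= k:
            return {"area": (x0, y0, x0 + cell, y0 + cell),
                    "query": r.query}      # no issuer field in r'
        cell *= 2                          # too few users: coarsen and retry
```

A request is forwarded only once its cell hides at least k candidate issuers; real systems use more refined generalization algorithms, such as those compared in [8].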
Research directions. Our current research effort is focused on the problem of preventing the inference of the user's identity; we call this problem
the LBS identity privacy problem. A different research direction focuses on
the problem of preventing the inference of the user's sensitive information;
when the sensitive information is the specific user location, or can be inferred
from that information, this problem is called the LBS location privacy problem
and has been addressed, among others, in [5, 7].
Figure 3: The static case. (a) single-issuer; (b) multiple-issuers.
In order to solve the identity privacy problem, several contributions
have proposed different techniques [4, 9, 6, 2]. The common idea is to force
the issuer of a request to remain anonymous. This means that an attacker must
not be able to associate any request with its issuer with likelihood greater than
a threshold value.
A particular case studied in these papers is when the attacker can acquire
a single request issued by the user. More specifically, this case assumes that:
(i) the attacker is not able to link a set of requests, i.e., to understand that
the requests are issued by the same (anonymous) user; (ii) the attacker is
not able to reason over requests issued by different users.
In general, we can distinguish privacy threats according to two orthogonal dimensions: a) threats in the static versus the dynamic case; b) threats involving
requests from a single user (single-issuer case) versus threats involving
requests from different users (multiple-issuer case).
Figure 3(a) shows a graphical representation of the privacy threat in
the static, single-issuer case. In all single-issuer cases, the LTS must ensure
that an attacker cannot associate a request with its issuer with likelihood
greater than a threshold value. Different papers addressed the LBS identity
privacy problem in this case, proposing different technical solutions. In [2]
we presented a formal framework to model this problem; moreover, in [8]
we have compared and empirically evaluated the solutions proposed in the
literature as well as new algorithms that we have devised.
Example 1 shows that, in the multiple-issuer cases, users’ anonymity
may not be sufficient to guarantee their privacy.
Example 1 Suppose a user u issues a request generalized into r′ by the
LTS. Assume that, considering r′, an attacker is not able to identify u,
within the set S of potential issuers, with likelihood greater than a value h. However,
if many of the users in S issue requests from which the attacker can infer the
same sensitive information inferred from r′, then the attacker can associate
that sensitive information with u with likelihood greater than h.
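The threat of Example 1 can be made concrete with a small numeric sketch (our own illustration, with hypothetical names and data; the fraction below plays the role of the attacker's confidence):

```python
def association_likelihood(anonymity_set: set[str],
                           revealed: dict[str, str],
                           value: str) -> float:
    """Likelihood the attacker assigns to 'the issuer of r' is linked
    to `value`': the fraction of candidate issuers whose own requests
    reveal that same sensitive value."""
    linked = sum(1 for u in anonymity_set if revealed.get(u) == value)
    return linked / len(anonymity_set)
```

With 3 out of 4 candidate issuers revealing the same diagnosis, the attacker links u to it with confidence 0.75, well above the bound h = 0.25 that anonymity alone would guarantee.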
In the area of databases, the analogous privacy issue is known as the l-diversity problem. In the area of LBS, the problem is depicted in Figure 3(b). In the multiple-issuer case the attacker can gain information from
the requests issued by different users, while the static case imposes that a
single request is considered for each user. We were probably the first to
study this problem in the area of LBS; our preliminary results can be found
in [1].
In contrast with the static case, in the dynamic case it is assumed that
the attacker is able to recognize that a set of requests has been issued by the
same (anonymous) user. Several techniques exist to link different requests
to the same user, the most trivial one being the observation of the
same identity or pseudo-id in the requests. We call a request trace a set of
requests that the attacker can correctly associate with a single user.
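The trivial pseudo-id linking strategy mentioned above can be sketched as follows (illustrative code, not from the abstract; the dictionary representation of a request is assumed):

```python
from collections import defaultdict

def build_traces(requests: list[dict]) -> dict[str, list[dict]]:
    """Link requests that carry the same pseudo-id; each value of the
    returned map is a request trace in the sense defined above."""
    traces: dict[str, list[dict]] = defaultdict(list)
    for req in requests:
        traces[req["pseudo_id"]].append(req)
    return dict(traces)
```

The longer a trace grows, the more external knowledge (e.g. user positions over time) the attacker can match against it, which is why the linking problem matters.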
Figure 4 shows a graphical representation of the dynamic case. The corresponding techniques to preserve privacy face two problems. The first is
preventing the attacker from linking the requests (the linking problem);
indeed, the longer a trace is, the higher the probability that the issuer loses
her anonymity. The second is preventing the attacker from inferring the
identity of the issuer (based, for example, on external knowledge about the
position of users at different times) with likelihood greater than a threshold
value. In [3] we introduced the notion of historical k-anonymity to formally
model the dynamic, single-issuer case and we investigated how different techniques for solving the linking problem and the identity privacy problem can
be combined to protect the user's privacy. From those preliminary results we are working towards a general formal model and privacy protection
techniques for the dynamic case, eventually covering also the multiple-issuer
case.
References
[1] Claudio Bettini, Sushil Jajodia, and Linda Pareschi. Anonymity and
diversity in lbs: a preliminary investigation. In Proc. of the 5th International Conference on Pervasive Computing and Communication (PerCom). IEEE Computer Society, 2007.
[2] Claudio Bettini, Sergio Mascetti, X. Sean Wang, and Sushil Jajodia.
Anonymity in location-based services: towards a general framework. In
Proc. of the 8th International Conference on Mobile Data Management
(MDM). IEEE Computer Society, 2007.
62
Figure 4: The dynamic case
[3] Claudio Bettini, X. Sean Wang, and Sushil Jajodia. Protecting privacy
against location-based personal identification. In Proc. of the 2nd workshop on Secure Data Management (SDM), volume 3674 of LNCS, pages
185–199. Springer, 2005.
[4] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based
services through spatial and temporal cloaking. In Proc. of the 1st International Conference on Mobile Systems, Applications and Services
(MobiSys). The USENIX Association, 2003.
[5] Marco Gruteser and Xuan Liu. Protecting privacy in continuous
location-tracking applications. IEEE Security & Privacy, 2(2):28–34,
2004.
[6] Panos Kalnis, Gabriel Ghinita, Kyriakos Mouratidis, and Dimitris Papadias. Preserving anonymity in location based services. Technical Report
B6/06, National University of Singapore, 2006.
[7] Hidetoshi Kido, Yutaka Yanagisawa, and Tetsuji Satoh. An anonymous
communication technique using dummies for location-based services. In
Proc. of the International Conference on Pervasive Services (ICPS),
pages 88–97. IEEE Computer Society, 2005.
[8] Sergio Mascetti and Claudio Bettini. A comparison of spatial generalization algorithms for lbs privacy preservation. In Proc. of the 1st International Workshop on Privacy-Aware Location-based Mobile Services
(PALMS). IEEE Computer Society, 2007.
[9] Mohamed F. Mokbel, Chi-Yin Chow, and Walid G. Aref. The new
casper: query processing for location services without compromising privacy. In Proc. of the 32nd International Conference on Very Large Data
Bases (VLDB), pages 763–774. VLDB Endowment, 2006.
Bottom up approach to manage data privacy policy
through the front end filter paradigm.
Gerardo Canfora, Elisa Costante, Igino Pennino, Corrado Aaron Visaggio
Research Centre on Software Technology
University of Sannio – 82100 Benevento
Abstract
An increasing number of business services for private companies and citizens are accomplished through the web
and mobile devices. Such a scenario is characterized by high dynamism and untrustworthiness, as a large
number of applications exchange different kinds of data. This poses an urgent need for effective means of
preserving data privacy. This paper proposes an approach, inspired by the front-end trust filter paradigm, to
manage data privacy in a very flexible way. Preliminary experimentation suggests that the solution could be a
promising path to follow for the web-based transactions which will be very widespread in the near future.
Keywords: data privacy, front end trust filter.
Introduction
The number and the complexity of the processes accomplished through the web are
increasing. Confidential data are increasingly exposed to unlawful collection by humans, devices or
software. The actors involved are often autonomous systems with a high degree of dynamism [15];
negotiations are performed among multiple actors and cross the boundaries of a single organization
[10]. As a consequence, the privacy of personal and confidential data is exposed to several threats [13].
Different technologies have been devised in order to face this problem, such as anonymization
[16], fine grain access control (FGAC) [2], and data randomization and perturbation [9].
These solutions show some limitations when applied in contexts characterized by high dynamism
and few opportunities to control data exchange: they are scarcely scalable, they cannot be used in
untrustworthy transactions, or they propose too invasive data access mechanisms, which hinder
flexibility. This discussion is developed at length in the related work section.
The realization and the adaptation of a data privacy policy is a process of transformation, which
spans from the definition of strategies to properly protect data up to the design of a supporting
technology which implements the established policies. Such a process includes three main stages. At
the first stage, a data privacy policy is described in natural language in a document which contains the
rules for disclosing sensitive data. At the second stage, the general policy must be refined into specific
strategies, in order to understand which kinds of actions could be performed on certain categories of
data by certain categories of users, and under which conditions. Finally, the established strategies
need to be implemented with a suitable technology ensuring that accesses to the data repository
happen in accordance with the strategies.
This paper proposes a three-layered approach which aims at facilitating the management of data
privacy in such a scenario.
The main purpose is to provide the data manager with the capabilities of:
1. translating the privacy policies, expressed in natural language, into low-level protection
rules, directly defined on database fields;
2. providing the database with an adaptive protection, which is able to change according
to: (i) the current state of the database, and (ii) the knowledge that the user acquires by
aggregating the information obtained through the queries submitted over time.
The paper proceeds as follows: in the next section related work is discussed; in the following
section the solution is presented. Then, the results of the experimentation are provided; finally,
conclusions are drawn.
Related Work
Different technologies have been proposed to preserve data privacy, but some of them, although
properly adoptable in many contexts, could be scarcely effective in highly dynamic systems.
The W3C Consortium developed P3P [17]. It provides a method that permits a web site to
codify within an XML file the purposes for which data are collected. It is based on comparing the
privacy preferences of the information provider and of the requester. P3P is used by different web
browsers and lets web sites express their privacy policy with a standard structure: according
to this structure, the server can choose whether to deliver data or not. P3P synthesizes the purposes, treatment
modes and retention period for data, but it does not guarantee that data are used in accordance with the
declared policies. Consequently it may be successful only in trusted environments.
Researchers at IBM proposed the model of the Hippocratic database [1]: it supports the management
of information sharing with third parties. It establishes ten rules for exchanging data; relying on
these rules, queries are rewritten, data are obfuscated and cryptography is put in place when needed.
Hippocratic databases use metadata to design an automatic model for privacy policy
enforcement, named the Privacy Metadata Schema. This technique degrades performance, as
purposes and user authorizations must be checked at each transaction. Memory occupation is a
further issue, as the metadata can grow quickly.
Fine grain access control (FGAC) [2] is a mechanism designed for complete integration
with the overall system infrastructure. Constructs which implement this method must: (i) ensure
that access strategies are hidden from users; (ii) minimize the complexity of policies; and (iii)
guarantee access to table rows, columns, or fields. Traditional implementations of FGAC use
static views. This kind of solution can be used only when the constraints on data are few.
Further solutions, like EPAL [3] and the one proposed in [14], allow the actors of a transaction to
exchange services and information within a trusted context. Trust is verified through the
exchange of credentials or the verification of permissions to perform a certain action.
Anonymization techniques let organizations retain sensitive information by changing the values of
specific table fields. The underlying idea is to make data indistinguishable, as happens in the k-anonymity algorithm [16], through the perturbation of values within records. Other techniques
require making data less specific, as happens with generalization [5]. This technique seriously
affects data quality and may leave the released data set in vulnerable states.
Further mechanisms of data randomization and perturbation [9] hinder the retrieval of
information at the individual level. These techniques are difficult to implement, as they are based on
complex mathematics, and they are in any case invasive both for data and for applications.
Cryptography is the most widespread technique for securing data exchange [8], even if it shows
some limitations: high costs for governing the distribution of keys, and low performance in complex,
multi-user transactions.
Definitions
For a better understanding of this work it is necessary to give the following definitions:
• A privacy policy defines the sensitive data whose access must be denied; it is captured as a
set of purposes.
• A protection rule (pr) defines whether a result set can be disclosed (legal rule) or not (illegal
rule). For example, consider the following rules:
a. NO SELECT Fiscal_Code, Surname FROM Person;
b. SELECT Age, Zip_Code FROM Person.
Rule (a) is illegal and establishes that the pair Fiscal_Code – Surname cannot be
disclosed; however, it does not explicitly deny access to the single attributes. Vice versa, rule
(b) makes the attributes Age and Zip_Code of the table Person accessible either together or singly.
The state of the database is time dependent and is defined by the informative content of the
database. It can be modified by means of insert, delete and alter operations. Depending on the
database state, the privacy policy could be enforced or made less restrictive, as vulnerabilities and
threats to privacy preservation could arise or disappear.
Approach
Two complementary approaches could be followed in order to meet the goal:
• Top-down, which derives a set of protection rules from the privacy strategy.
• Bottom-up, which allows rules to be defined from the analysis of vulnerabilities and of
aggregation inference.
The system acts like a filter between the user applications interrogating the database to be protected
and the database itself; it captures the submitted queries, compares them with the protection rules and
decides whether they are to be allowed or blocked. The top-down approach suffers from a major weakness: the rules
can be eluded by exploiting specific vulnerabilities of the database or, more simply, by taking
advantage of the flexibility of SQL, which allows a single query to be written in many ways. Moreover,
the growth of the user's knowledge can entail the generation of new protection rules. The goal of
the bottom-up approach is to solve these problems.
Query Filtering
The goal of the filtering is to establish whether a query is:
• Legal (to allow), when it does not disclose sensitive information;
• Illegal (to block), when it tries to access protected data.
To make this possible it is necessary to evaluate whether the submitted query (q, from here
on) matches a protection rule (pr, from here on).
Figure 1 – Filtering algorithm
As shown in Figure 1, the filtering process can be divided into three steps:
1. Query submission;
2. Search for a pr belonging to the illegal catalogue which matches q; if a correspondence is
found, the query is blocked, otherwise the process proceeds with step 3;
3. Search for a match between q and a pr belonging to the legal catalogue; if a correspondence
occurs, the query is forwarded to the database.
In order to recognize the correspondence between a query and a rule, the Result
Matching algorithm has been formulated. The comparison is based on the result of the interrogation rather than on
the syntax used to write it. In this way it is guaranteed that queries expressed in different ways
but disclosing the same data are considered equivalent, and thus equally blocked. When a user
submits a query, the system evaluates whether at least one rule involving the same tables as the query
exists; it then forwards the found rules and the query to the database, captures the result sets of the
rules and of the query and, finally, compares them.
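A minimal sketch of this result-based comparison is given below (our own illustration, not the paper's implementation: SQLite stands in for the target DBMS, rules are stored as plain SELECT statements, and the paper's NO SELECT syntax is represented by placing a rule in the illegal catalogue):

```python
import sqlite3

def result_sets_match(db: sqlite3.Connection, query: str, rule: str) -> bool:
    """Two statements are compared on what they disclose, not on how
    they are written: both are executed and their result sets are
    compared as multisets of rows."""
    return sorted(db.execute(query).fetchall()) == \
           sorted(db.execute(rule).fetchall())

def filter_query(db: sqlite3.Connection, query: str,
                 illegal: list[str], legal: list[str]) -> str:
    """Steps 2 and 3 of the filtering process: the illegal catalogue
    is searched first, then the legal one."""
    if any(result_sets_match(db, query, r) for r in illegal):
        return "blocked"
    if any(result_sets_match(db, query, r) for r in legal):
        return "allowed"
    return "no match"   # handed over to the aggregation analysis
```

Under this scheme a query such as SELECT Fiscal_Code, Surname FROM Person WHERE 1=1 is blocked exactly like its plainly written equivalent, because the two produce the same result set.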
Analysis of Acquired knowledge
If there is no matching between the query and the set of rules, the system must establish whether the
obtained result set can be disclosed on the basis of the information already released. In order to
make this decision, the system must estimate whether the aggregation of the information that the user has
acquired through the previous queries and the information released with the last query violates the
established privacy policy. As a matter of fact, a sensitive piece of information can often be composed of
several pieces of information with a lower sensitivity degree. For instance, consider the following illegal
rule, which denies the disclosure of information about which patients are affected by Aids or
Tuberculosis:
- NO SELECT Diagnosis, Patient FROM Illness WHERE Diagnosis = ‘Aids’ OR Diagnosis = ‘Tuberculosis’; (r)
and the submission of these two different query combinations:
• { (q1) SELECT Diagnosis FROM Illness; (q2) SELECT Patient FROM Illness; }
• { (q3) SELECT Diagnosis FROM Illness ORDER BY Diagnosis; (q4) SELECT Patient FROM Illness ORDER BY Diagnosis; }
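To see why the second combination is dangerous, note that the two ORDER BY clauses impose the same row order on both result sets, so the separately released columns can be re-associated positionally (illustrative sketch with hypothetical data):

```python
def recombine(patients: list[str], diagnoses: list[str]) -> list[tuple[str, str]]:
    """Both lists come from result sets sorted by the same key
    (Diagnosis), so the i-th patient lines up with the i-th diagnosis:
    exactly the association forbidden by the illegal rule r."""
    return list(zip(patients, diagnoses))
```

With q1 and q2 no such alignment is guaranteed, which is why their combination leaks nothing about the patient-diagnosis pairing.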
As shown in Figure 2, the combination of q3 and q4 is more dangerous than the combination of
q1 and q2. The former, in fact, allows the patient's id to be matched to his illness, because the result
sets are ordered by the same criterion. Conversely, q1 and q2 do not expose any sorting rationale.
Figure 2 – Possible Resultset Aggregation
Log & Quarantine
The knowledge given by q4 is harmful only if the information released by q3 has already been
obtained, and vice versa. This means that it is not compulsory to block both queries to avoid the
violation of rule r; it is enough to block only the last submitted one. It is necessary to track
the history of the user's interrogations over time in order to get a complete picture of the overall
knowledge acquired by the user. To this end, all the queries forwarded to the database are logged
in a file together with the corresponding information about whether they succeeded or not.
When a query is submitted, if it does not match any rule, i.e. it does not belong to the illegal
catalogue, the system evaluates whether it can disclose sensitive information and alerts the
administrator. To do this, the system combines the current query with the previously allowed
ones (described in the log file), formulating a new query that represents the aggregation. If this
query does not match an illegal rule, the current user query is allowed; otherwise it is suspended
in a quarantine status. The whole filtering algorithm is described in Figure 3.
Figure 3 – Complete filtering algorithm
The administrator can decide, for each suspended query, whether it is to be blocked or allowed in
the future, generating a new protection rule. An “in-vitro” experimentation has been carried out in
order to validate the approach; its outcomes are encouraging and have stimulated new directions for
future research: the next steps consist of realizing a system for modeling the data domain from a
privacy preservation perspective and a system to capture the knowledge acquired by each user over
time, in order to limit exploits based on inference.
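The interplay between the log and the quarantine can be sketched at the level of disclosed attribute sets (a deliberate simplification of the result-set comparison actually used by the system; all names below are our own):

```python
def check_with_history(current: frozenset, log: list[frozenset],
                       illegal: list[frozenset]) -> str:
    """Simplified aggregation check: queries and rules are reduced to
    the sets of attributes they disclose. A query is quarantined when
    it becomes harmful only in combination with previously allowed
    queries recorded in the log."""
    if any(rule <= current for rule in illegal):
        return "blocked"              # direct match with an illegal rule
    for past in log:
        if any(rule <= current | past for rule in illegal):
            return "quarantine"       # harmful together with the history
    return "allowed"
```

In the q3/q4 example above, once a Diagnosis-only query is in the log, a later Patient-only query is quarantined rather than allowed, because together they cover the attribute pair forbidden by rule r.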
Experimentation
The experimentation aims at evaluating the approach's effectiveness, in order to estimate the
robustness of the data protection offered, since the semantic flexibility of SQL could allow the
adopted privacy preservation mechanisms to be cheated; moreover, the experimentation is aimed at
estimating the performance degradation of the system, in terms of response time, as the set of
catalogued rules grows.
Figure 4 shows the databases used as the experimental vitro.
In order to test the effectiveness of the Result Matching algorithm, an experiment has been
carried out which consisted of evaluating the percentage of blocked queries (expected to be 100%)
within a set of queries forwarded to the target database.
Figure 4 – Experimental Vitro
For each database the following have been formulated:
• 4 rules on 1 attribute of 1 table;
• 4 rules on 2 attributes of 1 table;
• 4 rules on 4 attributes of 1 table;
• 4 rules on 2 attributes of 2 tables.
For each rule, 4 equivalent queries have been written and their effects observed. The
matching algorithm proved to be robust and particularly effective in facing up to SQL flexibility:
as a matter of fact, the algorithm successfully blocked the overall set of queries.
The second part of the experiment helped analyze how the performance of the solution changed,
in terms of response time, as both the number of rules and the Resultset size increased.
It is important to recall that, to make the result matching possible, it is necessary to submit to the
database both the query to analyze and the set of rules against which the query is compared, since not
all the rules in the catalogue are involved when filtering a query.
In order to carry out a more consistent experiment, rules involving the same tables as the query
have been formulated and catalogued.
The following queries, with a growing number of attributes (and thus Resultset size), were analyzed:
• Query1: SELECT fiscal_code FROM person
• Query2: SELECT fiscal_code, name FROM person
• Query3: SELECT fiscal_code, name, surname FROM person
• Query4: SELECT fiscal_code, name, surname, birth_place FROM person
• Query5: SELECT fiscal_code, name, surname, birth_place, nationality FROM person
All the protection rules refer to the table PERSON, which has the following schema:
• PERSON(fiscal_code, name, surname, sex, birth_place, nationality)
For each query, the response times have been measured with catalogues containing,
respectively, 10 (5 legal and 5 illegal), 50 (25 legal and 25 illegal) and 80 (40 legal and 40 illegal)
rules. Note that all the queries were allowed, with the exception of Query4 and Query5, which were
blocked when the catalogues containing 50 and 80 protection rules were used.
The following graphs show the obtained results. It is possible to observe that Query1, Query2 and
Query3 have the same trend, that is: the response time increases with the growth of the number of
catalogued rules, because they produced the same outcome, namely they were always allowed.
As expected, performance seems to decrease proportionally with the growth of the catalogue's
size, but the proportionality factor might not be equal to one. In fact, corresponding to a 500%
growth of the number of rules, a 25% increase of the response time was recorded; moreover,
corresponding to an 800% growth of the number of rules, a 40% increase of the response time was
observed.
Figure 4 – Response time (in ms) corresponding to an increase of the rules' set
Concerning Query4 and Query5, it is possible to observe a different behavior: their response times
are smaller than the previous ones, because they match an illegal rule. This means that there is
a smaller number of comparisons to accomplish.
As observed in Figure 5, when the catalogue counted 80 rules, only 15 were actually used for
comparison, which means that in the worst case less than 20% of the rules in the catalogue are
effectively considered in the analysis.
Figure 5 – Effectively compared rules (number of submitted rules versus number of catalogued rules)
Conclusion
With the growing migration of services towards the net, privacy should be managed within
environments characterized by high dynamism: multiple applications are able to access different
data sources, without trust-based mechanisms in place.
As such scenarios foresee high scalability and loose control, existing solutions for data
privacy management could be unfeasible, too costly or scarcely successful.
This work introduces a novel approach to data privacy, inspired by the paradigm of front-end trust
filtering. According to this approach, data privacy is managed in a way that aims at reducing the
control over transactions exchanging data sets, while keeping a high level of robustness in preserving
data privacy.
The proposed solution implements a bottom-up approach, which relies on the comparison between the
result set produced by the forwarded query and the one containing the information which should be
banned according to the established privacy policy.
Furthermore, through the quarantine management policy, this solution helps discover new queries
which could menace the privacy of data but are not included in the catalogue's rules.
A preliminary experimentation was carried out in order to prove the effectiveness and the
efficiency of the approach. It emerged that the system is able to successfully face the semantic
flexibility of SQL, and that the degradation of performance with the growth of the number of rules is
limited, in the worst case, to 20%.
As future work we are planning a larger experimentation in order to detect further weak
points of the solution and identify improvement opportunities.
References
[1]. Agrawal R., Kiernan,J., Srikant R., and Xu Y., 2002, Hippocratic databases. In VLDB, the 28th Int’l Conference on Very
Large Database.
[2]. Agrawal R., Bird P., Grandison T., Kiernan J., Logan S., Rjaibt W., 2005 Extending Relational Database Systems to
Automatically Enforce Privacy Policies. In ICDE’05 Int’l Conference on Data Engineering, IEEE Computer Society.
[3]. Ashley P., Hada S., Karjoth G., Powers C., Schunter M., 2003. Enterprise Privacy Authorization Language (EPAL 1.1).
IBM Research Report. (available at: http://www.zurich.ibm.com/security/enterprice-privacy/epal – last access on 19.02.07)
[4]. Bayardo R.J., and Srikant R., 2003. Technology Solutions for Protecting Privacy. In Computer. IEEE Computer Society.
[5]. Fung C.M., Wang K., and Yu S.P., 2005. Top-Down Specialization for Information and Privacy Preservation. In ICDE’05,
21st International Conference on Data Engineering. IEEE Computer Society.
[6]. Langheinrich M., 2005. Personal Privacy in Ubiquitous Computing – Tools and System Support. PhD Dissertation, ETH
Zurich.
[7]. Machanavajjhala A., Gehrke J., and Kifer D., 2006. l-Diversity: Privacy Beyond k-Anonymity. In ICDE’06 22nd Int’l
Conference on Data Engineering . IEEE Computer Society.
[8]. Maurer U., 2004. The role of Cryptography in Database Security. In SIGMOD, int’l conference on Management of Data.
ACM.
[9]. Muralidhar, K., Parsa, R., and Sarathy R. 1999. A General Additive Data Perturbation Method for Database Security. In
Management Science, Vol. 45, No. 10.
[10]. Northrop L., 2006. Ultra-Large-Scale Systems: The Software Challenge of the Future. SEI Carnegie Mellon University
Report (available at http://www.sei.cmu.edu/uls/ – last access on 19.02.07).
[11].Oberholzer H.J.G., and Olivier M.S., 2005, Privacy Contracts as an Extension of Privacy Policy. In ICDE’05, 21st Int’l
Conference on Data Engineering. IEEE Computer Society.
[12].Pfleeger C.R., and Pfleeger S.L., 2002. Security in Computing. Prentice Hall.
[13].Sackman S., Struker J., and Accorsi R., 2006. Personalization in Privacy-Aware Highly dynamic Systems.
Communications of the ACM, Vol. 49 No.9.ACM.
[14].Squicciarini A., Bertino E., Ferrari E., Ray I., 2006 Achieving Privacy in Trust Negotiations with an Ontology-Based
Approach. In IEEE Transactions on Dependable and Secure Computing, IEEE CS.
[15].Subirana B., and Bain M., 2006. Legal Programming. In Communications of the ACM, Vol. 49 No.9. ACM.
[16].Sweeney L., 2002. k-Anonymity: A model for Protecting Privacy. In International Journal on Uncertainty, Fuzziness and
Knowledge Based Systems, 10..
[17].Platform for Privacy Preferences (P3P) Project, W3C, http://www.w3.org/P3P/ (last access on January 2007).
72
Intrusion Detection Systems based on
Anomaly Detection techniques
Davide Ariu, Igino Corona, Giorgio Giacinto,
Roberto Perdisci and Fabio Roli
University of Cagliari
DIEE Electrical and Electronic Engineering Department
Piazza d’Armi, 09123 Cagliari (Italy)
{davide.ariu, igino.corona, giacinto,
roberto.perdisci, roli}@diee.unica.it
1 Introduction
Statistical Pattern Recognition approaches are currently being investigated to provide effective tools
for Intrusion Detection Systems (IDS) based on Anomaly Detection. In particular, our activities
are mainly aimed at the study and development of statistical Pattern Recognition approaches
and Multiple Classifier Systems (MCS) for devising advanced techniques for detecting anomalies
(i.e., potential intrusions) in the traffic over a TCP/IP network [1, 2]. These techniques have
also shown the ability to harden Anomaly Detection Systems in the presence of malicious errors [3].
New methodologies for clustering alarm messages from various IDSs have also been proposed
[4].
Recently, anomaly detection techniques based on Hidden Markov Models (HMM) have been
proposed for detecting intrusions by analysing the commands exchanged between hosts for a
given application (e.g., FTP, SMTP, etc.) [6]. In partnership with Tiscali S.p.A., this
research activity produced a module for Snort (the most widely used open-source
IDS, http://www.snort.org) that implements an anomaly detector based on Hidden
Markov Models. In particular, the module has been developed for the FTP (File Transfer
Protocol) and SMTP (Simple Mail Transfer Protocol) services.
2 State of the art
Hidden Markov Models [5] have been successfully applied in a number of pattern recognition
applications, and only recently have they also been applied to Intrusion Detection problems.
HMM have been used for Intrusion Detection thanks to their ability to model time series using
a stateful approach, where the role and meaning of the internal states are “hidden”. The
vast majority of studies that proposed HMM to implement IDS are related to Host-Based
systems, i.e., IDS that analyze the actions performed on a single host to detect intrusion
attempts. Only a few works have proposed HMM for analysing network traffic, by representing
the traffic at the packet level. The application of Hidden Markov Models to structural
inference in sequences of commands exchanged between hosts at the application level appears
very interesting and is still unexplored.
3 The Proposed HMM-based IDS
In order to perform anomaly detection by HMM, we propose to infer the structure of the
sequences of legitimate commands for application protocols, e.g. the FTP and SMTP protocols.
The basic assumption is that sophisticated attacks realized using these services may exhibit
“anomalous” sequences of commands exchanged between a client host and a server host. We
use the term anomalous for those sequences that are structurally different from those that can
be considered legitimate or normal. These anomalies may be caused either by the attacker
trying to perform a number of exploits, or by the characteristics of the steps completed during
an attack.
It is therefore necessary to define the criterion by which a sequence of commands
should be considered legitimate. In the following, two different techniques for the definition
of legitimate sequences are proposed. The technique described in section 3.1 is implemented in
the module developed for Snort (and used in the reported experiments on SMTP traffic), while
section 3.2 describes the technique used in the reported experiments on FTP traffic.
Both techniques produce valid results and can be used interchangeably.
3.1 Legitimate sequences are those that are accepted by the server
Definition 1 (accepted sequence) A sequence of n commands SEQ = {c1, c2, c3, . . . , cn}
from one specific client to one specific server (in a connection) is considered accepted only
if for every command ci in the sequence there is a corresponding positive server response(1)
(i = 1, . . . , n).
Following Definition 1, we require that a legitimate sequence of commands must necessarily be accepted by the server. If the server successfully replied to all the commands in
the sequence, then the sequence is in agreement with the protocol rules, and
therefore (with the exception of implementation bugs) it is considered to be legitimate. For each service, e.g., FTP and SMTP, an HMM is trained on a training set of accepted
sequences using the Baum-Welch algorithm. In the operational phase, each HMM assigns a
probability of normality to each of the analyzed sequences of commands, thus “rating”
the structural likeness between the observed sequence and those supplied during the training
phase. An alarm is then generated if the probability assigned to the sequence is smaller
than a fixed decision threshold. Such a threshold is estimated on the training set and depends
on the confidence in the hypothesis that all the training sequences are legitimate.
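As an illustration of this scoring step, the following sketch (ours, not the authors' implementation) evaluates the log-probability of a command sequence under a trained discrete-observation HMM with the scaled forward algorithm, and raises an alarm when the score falls below a threshold estimated on training data; the toy model parameters and symbol mapping are invented:

```python
import math

def log_likelihood(seq, pi, A, B):
    """Scaled forward algorithm: returns log P(seq | model) for a
    discrete-observation HMM with initial probabilities pi, transition
    matrix A and emission matrix B (rows = hidden states, cols = symbols)."""
    n = len(pi)
    alpha = [pi[s] * B[s][seq[0]] for s in range(n)]
    log_p = 0.0
    for obs in seq[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")          # sequence impossible under the model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[s] * A[s][j] for s in range(n)) * B[j][obs]
                 for j in range(n)]
    scale = sum(alpha)
    return log_p + math.log(scale) if scale > 0.0 else float("-inf")

def is_anomalous(seq, model, threshold):
    """Alarm when the probability of normality falls below the threshold."""
    return log_likelihood(seq, *model) < threshold

# Toy 2-state model over 3 command symbols (e.g. USER=0, PASS=1, RETR=2).
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
# In practice the threshold would be a low percentile of the training-set
# log-likelihoods; here it is just a fixed value.
print(is_anomalous([0, 1, 2], (pi, A, B), threshold=-10.0))  # → False
```

In the real module one HMM per service (FTP, SMTP) would be trained on accepted sequences only, with the threshold calibrated on the same training set.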
3.2 Legitimate sequences are those that are not filtered by a signature-based IDS
In order to identify legitimate sequences of commands, the training traffic is first analysed by
a signature-based IDS, e.g. Snort. Then, the set of legitimate sequences is made up of all the
sequences that have not raised alarms. This training set is then used to train an HMM (or an
ensemble of HMMs) according to the same technique outlined in section 3.1.
(1) R(ci, statei) is the function that determines the response ri of the server, having as inputs the i-th command
ci and the term statei, i.e. the state of the connection at time i, which depends on the initial state and on
all the past i − 1 commands.
Table 1: Results attained on the FTP dataset using a set of HMM with 20 hidden states.
Performance is evaluated in terms of the Area Under the Curve (AUC), the False Alarm rate
(FA), and the Detection Rate (DR).
4 Experimental Results
In this section we present a summary of the experimental results on a set of traffic data provided
by Tiscali S.p.A. The traffic data is related to two protocols, namely FTP and SMTP.
In particular, the FTP data are related to sequences of commands generated by users
uploading/downloading their personal Web pages, while the SMTP sequences are generated
by users sending/receiving email messages.
4.1 FTP traffic
In the training phase, we chose to build the dictionary of symbols (i.e., the set of commands)
by including all the symbols in the training set. We then added a special symbol called NaS
(Not a Symbol) to account for symbols in the test data that were not present in the training set
(further details can be found in [6]). The traffic in the test set is made up of normal sequences,
as well as 20 attacks created by automatic tools (e.g., IDS Informer). The training set
is made up of 32,000 sequences (subdivided into ten smaller subsets of 3,200 sequences), while
the test set is made up of 8,000 sequences.
As the performance of an HMM is sensitive to the training set and to the initial values of the
parameters (usually randomly chosen), in this work we explored the performance attained by
combining an ensemble of HMM, each one trained on a different portion of the training set. To
this end, we used three techniques for combining the outputs of the HMM, namely the Arithmetic
Mean, the Geometric Mean, and the Decision Templates. This solution allowed us to attain low
false alarm rates and high detection rates [6].
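The two mean-based combination rules are straightforward to sketch; the functions below are our illustration (the Decision Templates rule is omitted) and combine the per-HMM probabilities of normality before thresholding:

```python
import math

def arithmetic_mean(probs):
    return sum(probs) / len(probs)

def geometric_mean(probs):
    # Computed in the log domain to avoid underflow when many HMMs
    # assign very small probabilities to a long sequence.
    if any(p == 0.0 for p in probs):
        return 0.0
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def ensemble_alarm(probs, threshold, combiner=geometric_mean):
    """Alarm when the combined probability of normality of the ensemble
    falls below the decision threshold."""
    return combiner(probs) < threshold

print(ensemble_alarm([0.01, 0.02, 0.04], threshold=0.1))  # → True
```

Note that the geometric mean is the harsher combiner: a single HMM assigning a near-zero probability drags the combined score toward zero, whereas the arithmetic mean lets confident models outvote it.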
Table 1 shows the performance of the best solution in a number of trials, where 100 HMM
are trained, each with 20 hidden states. Results are evaluated in terms of the AUC
(i.e., the area under the ROC curve), the false alarm rate, and the detection rate. The reported
results clearly show that high values of AUC can be attained by combining the HMM with the
Decision Template technique.
If the decision threshold is set to the value that produces a 1% false alarm rate on the
training set, the false alarm rate on the test set is always smaller than 1%. Thus, the threshold
estimated on the training set produces similar results on the test set. With a 1% false alarm
rate on the training set, the combination of HMM by the geometric mean provides the highest
performance.
4.2 SMTP traffic
The SMTP traffic provided by Tiscali has been divided into a training set and two validation
sets. Then, two test sets have been created by using each validation set and the traffic generated
by the Nessus vulnerability scanner against an SMTP victim.

                         Anomaly Detection Alarms
             Validation set                     Attack set
  Test set   Unknown Command   Command Order    Unknown Command   Command Order
  I          67                95               6                 1
  II         111               182              6                 1

Table 2: Results attained on the SMTP dataset using an HMM with 10 hidden states. Performance is related to test sets I and II. The number of alarms generated by the signature-based
IDS has been approximately 1/3 of those generated by the HMM-based module.
The total traffic used for training is made up of 5,500 SMTP sessions, while the test traffic
is made up of 5,500 legitimate sessions and 22 attack sessions.
Table 2 shows that the ratio between the number of alarms raised and the number of related
sessions is higher for the attack set (7/22) than for legitimate traffic (162/5,500 and
293/5,500). Thus, it can be claimed that the proposed anomaly-based module is able to
discriminate between normal traffic and attacks. It is worth noting, however, that a number
of alarms were also raised by a signature-based IDS on legitimate sessions in the validation
set. The HMM-based module allowed detecting attacks that involved the use of commands not
in the training set (6 unknown commands out of 7 alarms). In addition, the HMM raised an
alarm whenever the order of commands was suspicious. Further analysis is needed to test this
module with more sophisticated attack sequences, as Nessus does not actually complete an
attack sequence, but just aims at uncovering a vulnerability in the tested service.
5 Conclusions
We showed how anomaly detection can be performed at the network level by stateful techniques
based on HMM. This detection mechanism is very promising, and showed a good tradeoff between
detection rate and false positive rate. Further improvements are expected from the content
analysis of SMTP/FTP commands.
References
[1] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detection in
computer networks, Pattern Recognition Letters, 24(12), 2003, pp. 1795-1803.
[2] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer networks by
a modular ensemble of one-class classifiers, Information Fusion (in press).
[3] R. Perdisci, G. Gu, W. Lee, Using an Ensemble of One-Class SVM Classifiers to Harden
Payload-based Anomaly Detection Systems, ICDM 2006.
[4] R. Perdisci, G. Giacinto, F. Roli, Alarm clustering for intrusion detection systems
in computer networks, Engineering Applications of Artificial Intelligence, 19 (2006), pp. 429-438.
[5] L.R. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech
recognition, Proc. of the IEEE, vol. 77(2), pp. 257-286, February 1989.
[6] D. Ariu, G. Giacinto, R. Perdisci, Sensing attacks in Computers Network with Hidden
Markov Models, Proc. of MLDM 2007, Leipzig (D), July 18-20, 2007 (in press).
A Statistical Network Intrusion Detection System
D. Adami, C. Callegari, S. Giordano, and M. Pagano
Dept. of Information Engineering, University of Pisa
{d.adami,christian.callegari,m.pagano,s.giordano}@iet.unipi.it
1 Introduction
In this paper we present the design, the implementation, and the validation of a network Intrusion Detection System (IDS) [1] based on anomaly detection techniques. The system, designed
as an original modification of well-tested approaches, relies on supervised
learning techniques. Given a training dataset, the IDS is able to build a model of the normal
behavior of the network. Mainly for this reason, the system is named Self-Learning Intrusion
Detection System (SLIDS) [2].
2 SLIDS

SLIDS is a software anomaly-based IDS composed of several modules. The modular implementation has been chosen because it guarantees scalability and extensibility.
The major feature of the SLIDS architecture (see Figure 1) is that, unlike most current
state-of-the-art IDSs, it uses more than one approach for detecting anomalies.
In the following subsections we provide a description of the most important system modules.
2.1 TCP/Markov Module

The TCP/Markov module is based on the idea that TCP connections can be modeled by Markov
chains [3]. SLIDS calculates one distinct matrix for each application (identified on the basis of
the destination port number).
The module only considers a few fields of the packet headers, more precisely the IP destination address and the destination port number (to identify a connection), and the TCP flags
(to identify the chain transitions).
During the training phase, the module reconstructs TCP connections and associates a value
Sp = syn + 2 · ack + 4 · psh + 8 · rst + 16 · urg + 32 · f in to each packet.
Figure 1: SLIDS Architecture
Thus, each connection is represented by a sequence of symbols S_i. These symbols can
be considered as the states of a Markov chain. Hence, the module calculates the transition
probability matrix A, where

    a_ij = P[q_{t+1} = j | q_t = i] = P[q_t = i, q_{t+1} = j] / P[q_t = i]
In the detection phase the TCP/Markov module uses a sliding window (of dimension T) to
process the packets. When processing the packets, the module computes T symbols
S = {S_{R+1}, S_{R+2}, · · · , S_{R+T}} and estimates the probability P[S|A], where A is the matrix
obtained in the training phase. In practice, the system calculates the logarithm of the Likelihood
Function (LogLF) and its “temporal derivative” D_W (where the default value for the parameter
W is 3):

    LogLF(t) = Σ_{t=R+1}^{T+R} Log(a_{S_t S_{t+1}})

    D_W(t) = LogLF(t) − (1/W) Σ_{i=1}^{W} LogLF(t − i)
A sequence of unexpected symbols produces a low probability: an anomaly determines a
rapid decrease in the LogLF and a peak in the D_W. If the D_W is bigger than a threshold (set
by means of Monte Carlo simulation), a security event is generated. The security event has an
anomaly score, which is calculated as Log(D_W).
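The module's logic can be sketched as follows (our illustration, not the SLIDS code; in particular, the floor probability assigned to transitions never seen in training is our assumption, since the paper does not say how unseen transitions are handled):

```python
import math
from collections import defaultdict

def flag_symbol(syn, ack, psh, rst, urg, fin):
    # S_p = syn + 2*ack + 4*psh + 8*rst + 16*urg + 32*fin
    return syn + 2*ack + 4*psh + 8*rst + 16*urg + 32*fin

def train_transition_matrix(connections):
    """Estimate a_ij = P[q_{t+1} = j | q_t = i] from the symbol sequences
    of the training connections."""
    pairs, states = defaultdict(int), defaultdict(int)
    for seq in connections:
        for i, j in zip(seq, seq[1:]):
            pairs[(i, j)] += 1
            states[i] += 1
    return {(i, j): c / states[i] for (i, j), c in pairs.items()}

def log_lf(window, A, floor=1e-6):
    """LogLF over a sliding window of symbols; unseen transitions get a
    small floor probability instead of -inf (our assumption)."""
    return sum(math.log(A.get((i, j), floor)) for i, j in zip(window, window[1:]))

def d_w(loglf_history, W=3):
    """Temporal derivative D_W(t): the current LogLF minus the mean of
    the previous W values."""
    return loglf_history[-1] - sum(loglf_history[-1 - W:-1]) / W

# Train on two toy connections: SYN (1), SYN+ACK (3), ACK (2), ACK (2),
# FIN+ACK (34); symbol values follow the S_p encoding above.
A = train_transition_matrix([[1, 3, 2, 2, 34], [1, 3, 2, 2, 34]])
```

An alarm would then be raised whenever `d_w` of the running LogLF history exceeds the Monte Carlo threshold.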
This module is very efficient in terms of memory and processing requirements. Given the
nature of TCP, the number of actual flag configurations identified during the training phase is
usually less than ten, which implies the storage of a matrix composed of less than one hundred
elements (400 B) for each application. During the detection phase it is then necessary to store T
bytes, corresponding to the T flag values inside the sliding window (simulations have shown that
a small T, around 30, is enough to reveal an intrusion quickly). The ease of computation
of the Likelihood Function clearly shows that this method can be applied to on-line detection.
After this phase the packets are forwarded to the TCP/SLR module.
2.2 TCP/SLR and ICMP/SLR Modules
The SLR modules construct a protocol-specific rule set by analysing the first 64 bytes of each
training packet [4], i.e. the IP and TCP (or ICMP) headers plus some bytes of the payload.
The first 64 bytes of the training packets, considered 2 by 2, are called attributes. An
attribute is denoted A_i and its value v_i. The training phase of these modules consists
of constructing a random set of conditional rules of the form

    if A_1 = v_1, A_2 = v_2, . . . , A_k = v_k then A_{k+1} ∈ V = {v_{k+1}, v_{k+2}, . . . , v_{k+r}}

where A_1 = v_1, A_2 = v_2, . . . , A_k = v_k is the antecedent, while A_{k+1} is the consequent.
During the detection phase, the SLR modules analyze the first 64 bytes of each packet and
check whether they break some rules. If so, the modules calculate an anomaly score for each broken
rule. The total anomaly score is the sum of the scores of all broken rules.
As for the previous module, this approach is very efficient in terms of memory and processing
requirements. Experimental results have shown that, after the pruning procedure, a total of
about one hundred rules is generated, which means that the memory occupation is of the
order of a few KBs. Moreover, once the rule set has been constructed, the system works exactly
as a rule-based IDS, which is suitable for on-line detection.
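The rule check can be sketched as follows (our illustration; the rule encoding and the per-rule score are assumptions, as the paper does not specify the scoring function):

```python
def packet_attributes(first_64_bytes):
    """Split the first 64 bytes of a packet into the 32 two-byte
    attributes A_0 .. A_31."""
    return [int.from_bytes(first_64_bytes[i:i + 2], "big")
            for i in range(0, 64, 2)]

def anomaly_score(attrs, rules):
    """Sum a score for every rule whose antecedent matches the packet but
    whose consequent attribute falls outside the allowed value set V."""
    score = 0.0
    for rule in rules:
        antecedent_holds = all(attrs[i] == v
                               for i, v in rule["antecedent"].items())
        if antecedent_holds and attrs[rule["consequent"]] not in rule["allowed"]:
            score += rule["score"]
    return score

# Hypothetical rule: if A_0 = 0x4500 (IPv4 header, no options, default TOS)
# then the total-length attribute A_1 must take one of two seen values.
rules = [{"antecedent": {0: 0x4500}, "consequent": 1,
          "allowed": {0x0028, 0x003C}, "score": 1.5}]
```

A packet whose header matches the antecedent but carries an unseen total length would accumulate the rule's score; summing over all broken rules gives the packet's total anomaly score.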
Figure 2: Experimental Results. (a) Different modules; (b) SLIDS.
3 Performance Analysis
In this section we discuss the tests carried out to evaluate the performance of SLIDS.
The experimental results, obtained using the 1999 DARPA evaluation data set [5], highlight
that the combined use of different modules detects more anomalies than classic IDSs. In
particular, for TCP packets, the TCP/SLR and TCP/Markov approaches detect different
anomalies, so their combined use leads to a significant improvement in IDS performance.
The alarm threshold in SLIDS has been set, by means of Monte Carlo simulation, so as to
obtain a false alarm rate of about 10%. Figures 2(a) and 2(b) show the results of our tests; the
values inside the graph bars represent the number of detected/not detected attacks. The first
figure shows the attacks detected by the different modules, while the second shows the attacks
revealed by SLIDS as a whole. As can be seen, the use of several modules working in parallel
on the packets allows us to obtain much better performance than using only a single approach.
As an example, let us consider the Data attacks: the use of a single module allows detecting at
most half of the attacks, while the complete system is able to detect all of them.
The same consideration holds for the detection rate over all the attacks: the maximum
achievable with a single module (TCP/SLR) is 52.3%, while using the whole system it is about
78%.
To evaluate the performance we used a Receiver Operating Characteristic (ROC) curve, which
plots the true positive rate vs. the false positive rate. Figure 3 shows that the system can achieve
really good results, detecting 80% of the attacks with only a 10-15% false alarm rate.
Another test session has been carried out using actual traffic traces, captured within our
testbed network in November 2005. This analysis was performed to evaluate the capability of
SLIDS to cope with current security attacks.
The ROC curve obtained in this test session is analogous to the one shown in Figure 3.
The test session carried out on real traffic has thus confirmed the good performance obtained
on the DARPA dataset: the system detected 80% of the attacks with a false alarm rate
of 15%.
Figure 3: ROC Curve

4 Conclusions

In this paper we have presented the design and the implementation of SLIDS, an anomaly-based NIDS. The IDS has been realized taking into account extensibility and flexibility. The
modularity of the system allows the network administrator to customize SLIDS according to
the behavior of the network. Moreover, the parallel use of several modules makes it possible to
identify a wide range of attacks.
Acknowledgments
This work was partially supported by the RECIPE project funded by MIUR.
References
[1] Denning D.E., An Intrusion-Detection Model, IEEE Transactions on Software Engineering,
February 1987.
[2] Adami D., Callegari C., Giordano S., Pagano M., Design, Implementation, and Validation
of a Self-Learning Intrusion Detection System, IEEE/IST MonAM 2006.
[3] Nong Ye et al., Robustness of the Markov-Chain for Cyber-Attack Detection, IEEE
Transactions on Reliability, March 2004.
[4] Mahoney M., A Machine Learning Approach to Detecting Attacks by Identifying Anomalies
in Network Traffic,
[5] MIT Lincoln Laboratory, DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/
Embedded Forensics: An Ongoing Research about
SIM/USIM Cards
Antonio Savoldi and Paolo Gubian
Department of Electronics for Automation
University of Brescia
Via Branze 38, I25121 Brescia, Italy
{antonio.savoldi,paolo.gubian}@ing.unibs.it
Abstract
The main purpose of this paper is to describe the real filesystem of SIM and USIM cards,
pinpointing what the official standard reference does not say. By analyzing the full filesystem of
such embedded devices, it is possible to find many undocumented files usable to conceal
sensitive and arbitrary information, unrecoverable with the standard tools normally used
in the forensic field. In order to understand how it is possible to use SIMs/USIMs for data hiding
purposes, the paper presents a tool capable of extracting the entire observable memory of these
devices together with the effective filesystem structure.
Keywords: filesystem, SIM/USIM cards, imaging tool, data hiding
1 Introduction
There are many commercial tools used to extract and decode the raw data contained in
SIM/USIM cards, although none of them is capable of revealing the real structure of the filesystem and,
consequently, of discovering the multitude of non-standard files hidden in such devices.
This is what can be done with SIMBrush(1) [7] [8], a new open source tool, developed in ANSI C
for Linux and Windows platforms, aimed at extracting the observable portion of the filesystem of a
SIM/USIM card. The real novelty for digital forensics research is knowing what is really concealed,
and potentially concealable, in a standard SIM/USIM card's filesystem, thus demonstrating that data
hiding is possible in such devices.
In the open source arena there are, to the authors' knowledge, only two examples of such a
tool. The first, TULP2G [11] [16], is a framework developed by the NFI (the Netherlands Forensic
Institute), implemented in C#, whose purpose is the recovery of data from electronic equipment
such as cellular phones and SIM cards. The second, BitPim [4], is a program that allows manipulating, at the logical level, many CDMA phones branded LG, Samsung, Sanyo and other manufacturers.
More information about mobile forensics tools can be found in [14].
Throughout the rest of the paper we briefly describe what is notable about this tool and how it is
possible to reconstruct the entire SIM and USIM filesystem. After that, some user scenarios are
shown regarding the notable data present in such devices. Finally, after having understood the real
structure of the SIM/USIM filesystem, it will be shown how it is possible to use non-standard
locations to conceal arbitrary data, giving a practical demonstration of the effectiveness
of the method.

(1) The tool can be requested by emailing the author at [email protected]. More information is available at
http://www.ing.unibs.it/~antonio.savoldi
2 SIM/USIM Filesystem
SIM stands for Subscriber Identity Module; together with the Mobile Equipment, that is the user's
cellular phone, it constitutes the Mobile Station, which defines, in the GSM (Global System for Mobile
Communications) system, the so-called “end user part”. The evolution of this pervasive system,
which counts more than one billion users in the world, is UMTS (Universal Mobile Telecommunications System), which increases, with respect to its GSM counterpart, the bandwidth for data
exchange. In this case we must consider, in the end user part of the network, the USIM, that is the Universal Subscriber Identity Module, together with the Mobile Equipment. Substantially, there are no big
differences between SIM and USIM from the filesystem structure point of view, although the USIM
contains more data, defined by the ETSI standard reference.
Every SIM/USIM card is a smart card standardized by ISO: in particular, SIMs are contact (as
opposed to contactless) smart cards, specified by the ISO 7816 standard [12]. They contain
a microprocessor, three types of memory (RAM, ROM and EEPROM) and, finally, some
integrated logic to manage security issues. The SIM can be considered as a black box interfacing
with the Mobile Equipment through a standard API (Application Program Interface). The filesystem is stored in
an internal EEPROM and has a hierarchical structure with a root called the Master File (MF). Basically,
there are two types of files: directories, called Dedicated Files (DF), and files, called Elementary
Files (EF). The main difference between these two types is that a DF contains only a header,
whereas an EF contains a header and a body. The header contains all the meta-information that relates
the file to the structure of the filesystem (available space under a DF, number of EFs and DFs which
are direct children, length of a record, etc.) and security information (access conditions to a file),
whereas the body contains information related to the application for which the card has been issued.
In an ordinary SIM/USIM card three types of EF are possible, namely transparent, linear fixed and
cyclic [15]. As said previously, there are a lot of files in an ordinary SIM/USIM card, which can be
subdivided according to their content: the subscriber, acquaintances, SMS traffic,
calls, the provider, and the cellular system [15].
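The hierarchy just described can be modelled with a few types (a minimal sketch of ours; the field names are illustrative and not taken from the standard):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Header:
    file_id: str                 # e.g. "3F00" for the Master File
    access: dict                 # access condition per command (read, update, ...)

@dataclass
class EF:
    """Elementary File: a header plus a body; the body layout depends on
    the EF kind (transparent, linear fixed or cyclic)."""
    header: Header
    kind: str
    body: bytes

@dataclass
class DF:
    """Dedicated File (directory): a header only, plus its children."""
    header: Header
    children: List[Union["DF", "EF"]] = field(default_factory=list)

# The MF is simply the root DF with ID 3F00; DF 7F10 (Telecom) holds
# EF 6F3C (SMS), whose body is readable only after CHV1 verification.
mf = DF(Header("3F00", {}),
        [DF(Header("7F10", {}),
            [EF(Header("6F3C", {"read": "CHV1"}), "linear fixed", b"")])])
```

The DF/EF asymmetry (header only vs. header plus body) is exactly what makes the tree traversal in Section 3 possible: a DF is explored further, an EF is read and stored.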
The operations allowed on the filesystem are coded into a set of commands [15] that the interface device (IFD), i.e. the physical device capable of interfacing with a SIM and setting up the
communication session, issues to the SIM, then waiting for the responses; among these, only a few
are really important for the SIMBrush tool. Some of these fundamental commands are SELECT, GET
RESPONSE, READ BINARY, READ RECORD, and others related to the management of the CHV1/CHV2
codes [15].
Thus, in SIMBrush's core algorithm, only these commands are used, thereby preserving the
integrity of the data in the filesystem, as all data are extracted in read-only access mode. It is
interesting to note the access conditions indicated in the file headers: these act as constraints on
the execution of the commands, protecting files from unauthorized manipulation for the duration
of their authorization. In particular these constraints, specified by some bytes in the header of each
elementary file, are related to a set of specific commands issuable to a card, namely update, read,
increase, rehabilitate and invalidate.
As will be clear in the last section of this paper, these access conditions, for non-standard files,
play a key role in concealing arbitrary data in such files. It is also worth mentioning that a necessary
condition to extract all the sensitive content from a SIM/USIM card is to have the PIN1 (CHV1) code;
indeed, only standard methods have been used to extract the digital data. If this condition is not
satisfied, it is possible to recover only the meta-content of the filesystem.
3 The Filesystem Extraction
Standard programs, like the one developed by Shenoi [5], can extract only the standard elementary files,
starting from the selection rules defined in the reference standard [15]. First, the standard says that the
file ID is univocal and, for example, “3F00” identifies the root of the filesystem, which is the master
file of a SIM/USIM card. Second, the SELECT command may be executed with any file specified
by its relative ID, with no restrictions. This leads to the opportunity to “brush” the logical ID space
by issuing a SELECT command for each possible ID, from “0000” to “FFFF”, obtaining a warning
from the SIM when the ID does not exist in the filesystem, or the header of the file when it does. In a
SIM's filesystem there are the concepts of current file and current directory.
The SIM's files are hierarchically selectable with certain constraints, as specified in the reference
standard [15]. According to these rules, it is possible to select, for example, the file “6F3C” (SMS)
by issuing two SELECT commands: the first to select the DF 7F10, which is the parent of this EF,
and the second to select the file, that is, in this case, the SMS elementary file. As is clear from
the standard reference, the SIM filesystem has an n-ary structure and it is easy to extract the standard
part of every SIM/USIM with only a few commands, by directly reading all the elementary
files declared in the standard reference. This is, to the authors' knowledge, the approach used in all
commercial and open-source tools. With the objective of acquiring all the observable memory of a SIM,
that is the data accessible with standard methods, we must define the general selection rule by which
it is possible to “brush” the entire logical address space of the EEPROM. By following the reference
standard rules, it is possible to reconstruct the entire filesystem tree.
Presently, SIMBrush is able to extract the body of those files whose access conditions are ALW
or CHV1/CHV2, the latter case being possible only if the appropriate code is provided, that is when
PIN1 (CHV1) or PIN2 (CHV2) is available. The main algorithm is based on the construction of a
binary tree, which is a suitable data structure for SIM card data, this structure being equivalent to an
n-ary tree. We can explain the main elements of this pseudo-code:
Procedure Build_Tree
    Expand_DF(PARENT_SET = 0, CURRENT_SET = {MF}, DF_SIBLINGS_SET = 0);
End

Procedure Expand_DF(PARENT_SET: NODE, CURRENT_SET: NODE, DF_SIBLINGS_SET: NODE)
    Select(CURRENT_SET);
    SELECTABLE_SET = Brush(CURRENT_SET);
    SONS_SET = SELECTABLE_SET \ (MF_SET U CURRENT_SET U PARENT_SET U DF_SIBLINGS_SET);
    For each node N belonging to SONS_SET:
        Place_in_tree(N);
        If N equal DF Then
            Expand_DF(PARENT_SET = CURRENT_SET, CURRENT_SET = N,
                      DF_SIBLINGS_SET = DF_SIBLINGS_SET \ {N});
End
• Build_Tree: this procedure initializes the parameters of the recursive function Expand_DF.
• Expand_DF: the recursive function that, starting from the filesystem root, brushes the ID
space, searching for all existing EFs and DFs and finding all sons of the current node, which are
placed, dynamically, in a binary tree data structure. For each son, if it is an EF it is simply placed
in the data structure; if it is a DF, the Expand_DF function acts recursively,
updating all the sets involved.
• NODE: the main data structure used to store all the filesystem's data.
• Select: sends a SELECT command to the SIM card.
• Brush: selects a Dedicated File, passed as argument, which becomes the current
DF, and brushes the entire logical ID space, obtaining as a result the SELECTABLE set related to
that DF.
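As an illustration, the following Python sketch mirrors Build_Tree/Expand_DF on a hypothetical miniature filesystem. Here brush() returns, per the standard's selection rules, the set of IDs selectable while a given DF is current (the MF, the current DF, its parent, its sibling DFs and its children); the sons are then obtained by set difference exactly as in the pseudo-code.

```python
MF = 0x3F00
# Hypothetical miniature filesystem: DF -> (sub-DFs, EFs). IDs are illustrative.
FS = {
    MF:     ({0x7F10, 0x7F20}, {0x2FE2}),
    0x7F10: ({0x5F3A}, {0x6F3C}),
    0x7F20: (set(), {0x6F07}),
    0x5F3A: (set(), {0x4F30}),
}

def brush(df, parent):
    """All IDs selectable while df is the current directory (GSM 11.11 rules)."""
    dfs, efs = FS[df]
    siblings = FS[parent][0] if parent is not None else set()
    return {MF, df} | ({parent} if parent else set()) | siblings | dfs | efs

def expand_df(df, parent, tree):
    """Subtract MF, current, parent and sibling DFs to isolate the sons,
    then recurse into every son that is itself a DF."""
    selectable = brush(df, parent)
    siblings = FS[parent][0] if parent is not None else set()
    sons = selectable - ({MF, df} | ({parent} if parent else set()) | siblings)
    for node in sorted(sons):
        tree[df].append(node)
        if node in FS:                     # node is a DF: recurse into it
            tree[node] = []
            expand_df(node, df, tree)

tree = {MF: []}
expand_df(MF, None, tree)
print({f"{k:04X}": [f"{v:04X}" for v in vs] for k, vs in tree.items()})
```

The resulting dictionary is the adjacency list of the reconstructed n-ary tree, with each DF mapped to its discovered children.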
The SIMBrush tool produces as output an XML (eXtensible Markup Language) file containing the raw data; the next step is to decode these data into a comprehensible form, suitable for
forensics practitioners to derive useful data that could become digital evidence after the analysis. This
part is done by a second tool, written in Perl, whose function is to translate the raw data into
a readable form, again as an XML file.
SIMBrush and the interpretation tool have been tested on several SIM and USIM cards of different
providers, with sizes ranging between 16 Kbytes and 128 Kbytes. It is possible to extract every kind
of data among those defined in the reference standard [15].
4 Conclusions
In this paper we have surveyed the actual features of the SIM/USIM card filesystem. In the
appendix it is possible to have a look at the hidden part of the filesystem of such an embedded device,
showing how this non-standard part can be used for data hiding purposes.
References
[1] Autopsy Forensic Browser. Software available at: http://www.sleuthkit.org/autopsy/.
[2] Document Object Model. Available at: http://www.w3.org/DOM/.
[3] GSM Phone Card Viewer. Software available at: http://www.linuxnet.com/applications/files/gsmcard_0.9.1.tar.gz.
[4] R. Binns. BitPim. Software available at: http://bitpim.sourceforge.net/.
[5] C. Swenson, G. Manes, and S. Shenoi. Imaging and analysis of GSM SIM cards. In IFIP International Federation for Information Processing, Springer Boston, pages 205–216, 2006.
[6] Paraben Corporation. SIM Card Seizure. Software available at: http://www.parabenforensics.com/catalog/.
[7] F. Casadei, A. Savoldi, and P. Gubian. SIMbrush: an open source tool for GSM and UMTS forensics analysis. In Proceedings of the First International Workshop on Systematic Approaches to Digital Forensic Engineering, IEEE, pages 105–119, 2005.
[8] F. Casadei, A. Savoldi, and P. Gubian. Forensics and SIM cards: an overview. International Journal of Digital Evidence, 5, 2006.
[9] Susteen Inc. DataPilot. Software available at: http://www.susteen.com/.
[10] Netherlands Forensic Institute. Card4Labs. Software available at: http://www.forensischinstituut.nl/NFI/nl.
[11] Netherlands Forensic Institute. TULP2G, Forensic Framework for Extracting and Decoding Data. Software available at: http://tulp2g.sourceforge.net/.
[12] ISO. Identification Cards - Integrated Circuit Cards with Contacts. Available at: http://www.cardwerk.com/smartcards/smartcard_standard_ISO7816.aspx.
[13] A. Savoldi and P. Gubian. A methodology to improve the detection accuracy in digital steganalysis. In Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IEEE, pages 373–377, 2006.
[14] e-evidence.info, The Electronic Evidence Information Center. BitPim. Software available at: http://www.e-evidence.info/index.html.
[15] ETSI TS 100 977 v8.3.0. Specification of the Subscriber Identity Module - Mobile Equipment (SIM - ME) interface. Available at: http://www.id2.cz/normy/gsm1111v830.pdf.
[16] J. van den Bos and R. van der Knijff. TULP2G, an open source forensic software framework for acquiring and decoding data stored in electronic devices. International Journal of Digital Evidence, 4, 2005.
Appendix 1
In the following it is possible to have a look at the raw as well as the translated data related to the
contents of one SMS record. With the help of the reference standard [15] it is easy to extract all the
information about the SMS message, such as the time (date and hour) when the message was sent, the
length of the SMS, the number of the sender, the number of the service center and, finally, the text of
the message.
<content>
01 07 91 93 33 85 28 02 00 04 04 85
94 61 00 00 50 70 40 90 60 25 80 A0
D4 64 13 14 B6 DB D3 F3 37 E8 2C 0F
D3 EB 69 FA 1B 44 0C 83 E2 F5 F2 9C
FE 06 B5 DF ED B2 9B FE 06 35 C3 78
7C 1A 74 0D 42 41 F4 34 48 5E 3E 87
D9 61 90 0C 34 AF BF DD 65 79 BA 0C
22 87 41 F3 71 58 9E 1E 87 E5 65 50
D9 4D 97 BF 41 69 36 68 16 7B C1 70
2F 58 0D 44 0E 83 A6 F5 B7 BB 2C 4F
97 41 CD 30 1E 9F BE 06 A1 20 54 DA
0D BA 06 A1 20 72 1A 44 4D 36 41 2F
A8 FC DD 7E EB D3 6F 77 3A 05 4A BA
CD 6F 50 98 0D 8A C5 72 FF FF FF FF
</content>
<content>
<Date>04 Jul 05</Date>
<Hour>09.06.52</Hour>
<Length_SMS>160</Length_SMS>
<Number>4916</Number>
<Number_Service_Center>
393358822000
</Number_Service_Center>
<Status>01</Status>
<Text>
TIM avviso gratuito Da questo momento
Maxi WAP ti regala 2 suonerie da scaricare entro il 31/08/05 da SuonerieMaxiWAP (in WAP di TIM/Promozioni).
</Text>
</content>
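A minimal sketch of this decoding step, assuming the swapped-nibble BCD coding of the reference standard for the service-centre number and the time stamp (the actual interpretation tool is written in Perl; this Python fragment only illustrates the nibble swap on the relevant bytes of the record):

```python
def swap_bcd(octets):
    """Decode swapped-nibble BCD: each octet stores two digits, low nibble
    first; 0xF is the filler nibble and is stripped from the result."""
    digits = ""
    for b in octets:
        digits += f"{b & 0x0F:X}" + f"{b >> 4:X}"
    return digits.rstrip("F")

# Service-centre number bytes of the record: 93 33 85 28 02 00
smsc = bytes.fromhex("933385280200")
print(swap_bcd(smsc))                        # 393358822000, as in the XML above

# Service-centre time stamp bytes: YY MM DD hh mm ss, each swapped BCD
scts = bytes.fromhex("507040906025")
yy, mm, dd, h, m, s = [int(swap_bcd([b])) for b in scts]
print(f"{dd:02d}/{mm:02d}/{yy:02d} {h:02d}.{m:02d}.{s:02d}")  # 04/07/05 09.06.52
```

The same swap yields the sender number "4916" from the originating-address digits of the record.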
Appendix 2
The Hidden Part of the Filesystem
The non-standard part of the SIM/USIM filesystem was discovered by the authors by using
SIMbrush [7] [8], created with the main purpose of acquiring the entire content of a smart card memory.
An example of a partial filesystem present in a 128 Kbyte SIM card can be seen in table 2. Each row
of the table corresponds to a node of the n-ary tree of the SIM/USIM card filesystem. In this way we
can manage the huge quantity of meta-data in a compact form. The table has seven fields, which refer
to: the ID; the standard name of an EF or DF; the file type (MF, DF, EF); the privileges, which are
related to the constraints on the execution of a set of commands, as already said; the structure of the
file (transparent, linear fixed or cyclic); the father of the node, important for seeing the real
structure of the n-ary tree; and, finally, the size of the elementary files.
By analyzing, for example, the non-standard elementary files under the DF "5FFF", namely the EFs
ranging from "1F0C" to "1F3F", it is easy to see that these files are modifiable with the Update
command, because the privilege for this command is CHV1. This means that anyone who holds the
PIN1 of the card is authorized to store arbitrary data by replacing the contents of the existing files.
Clearly, this is the worst-case scenario; indeed, if the card is not protected with the CHV1 code, it is
always possible to modify the contents of these files.
In the present case a SIM/USIM card can act as a covert storage channel, because data hiding
is possible by using concealed storage locations in the filesystem. The stego-object is the SIM/USIM
with the concealed message, which can be allocated in the non-standard part of the filesystem by
using different strategies. Once a message is hidden in the SIM/USIM, it can be sent from Alice to
Bob together with a stego-key, thus acting as a covert channel. A representative diagram of what we
have explained is shown in figure 1. We are now ready to see a possible framework usable to
demonstrate that data hiding is possible on this kind of device.
Figure 1: Transmission of information by using an ordinary SIM/USIM card as cover-object.
A Framework for Data Hiding
As already explained, hiding data in SIM/USIM cards relies on the presence of a non-declared part
of the filesystem that can be used to store arbitrary data if the privileges permit it. We will see a
possible methodology to perform the data hiding and, subsequently, we will discuss best practices
usable to recover the hidden message.
A Possible Data Hiding Procedure
In order to create the stego-object we need to embed the message in the cover-object, namely the
SIM/USIM, by using a portion of the non-standard part of the filesystem. Here we present a possible
scheme for this purpose.
• Extraction of the binary image by using SIMBrush: at this stage we need to deal with the
important task of acquiring all the observable content of a SIM/USIM card. This is possible,
for example, by using the mentioned tool, which is able to analyze the entire logical
space of the EEPROM, discovering the non-standard part.
• Creation of the File Allocation Table (FAT): having the complete set of headers related to the
SIM/USIM filesystem, it is quite trivial to obtain the FAT, as shown in table 2.
• Selection of the Writable Non-Standard Part (WNSP): by inspecting the privileges regarding the
Update command, it is possible to discover all non-standard files that are arbitrarily modifiable,
in the worst case with the user's privileges.
• Allocation of the message in the WNSP: the message to be concealed needs to be broken into
many chunks, according to the sizes of the non-standard files that will be rewritten. At this stage,
many strategies can be used. The selected non-standard files will constitute the steganographic
key, usable to recover the hidden message.
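The WNSP selection and the chunking step can be sketched as follows, using a few entries in the style of table 2 (the FAT literal below is an illustrative excerpt, not the full table):

```python
# (id, name, update_privilege, size); "NS" marks a non-standard file.
fat = [
    ("6F3C", "SMS", "CHV1", 176),   # standard file: excluded from the WNSP
    ("1F0C", "NS",  "CHV1", 34),
    ("1F1E", "NS",  "CHV1", 70),
    ("1F20", "NS",  "ADM",  128),   # needs ADM for Update: not user-writable
]

# WNSP: non-standard files whose Update privilege is at most CHV1.
wnsp = [(fid, size) for fid, name, upd, size in fat
        if name == "NS" and upd in ("ALW", "CHV1")]
print(sum(size for _, size in wnsp))          # 104 bytes available

def allocate(message, wnsp):
    """Greedy split of the message over the writable files; the chosen
    (file id, chunk) pairs act as the steganographic key."""
    key, pos = [], 0
    for fid, size in wnsp:
        chunk = message[pos:pos + size]
        if not chunk:
            break
        key.append((fid, chunk))
        pos += len(chunk)
    return key

for fid, chunk in allocate(b"covert payload", wnsp):
    print(fid, chunk)
```

Other allocation strategies (random file order, interleaving, padding with plausible content) would only change how the key is built, not the principle.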
In order to understand this procedure, we can analyze an example by considering the FAT presented
in table 2. In this case, by adding up all file sizes, the total space occupied is 56887 bytes, whereas
the non-standard part is 42859 bytes. The effective Writable Non-Standard Part (WNSP) is 16549
bytes, about 29.1% of the total engaged space. Some of the SIMs/USIMs analyzed are reported in
table 1.
Table 1: WNSP regarding some of the analyzed SIM/USIMs. WNSP, NSP (non-standard part) and TES (total engaged space) are expressed in bytes.

#  Provider  Country  EEPROM  Phase  Services  WNSP   NSP    TES
1  TIM       Italy    16KB    2      GSM       0      151    6997
2  Vodafone  Italy    32KB    2      GSM       0      531    8743
3  BLU       Italy    64KB    2+     GSM       0      21122  31087
4  Omnitel   Italy    64KB    2+     GSM       0      17427  25689
5  Wind      Italy    64KB    2+     GPRS      96     4737   22651
6  TIM       Italy    128KB   2+     GPRS      16549  42859  56887
7  TIM       Italy    128KB   2+     GPRS      12478  25112  45729
8  H3G       Italy    128KB   3      UMTS      107    21290  30826
Guidelines for Recovering the Hidden Message
Having demonstrated that data hiding is possible on such devices, it is mandatory to trace some
guidelines about the best practices the forensics practitioner can use to deal with this problem.
Undoubtedly, the first thing to understand is that the current tools in the field of cellular forensics,
whose aim is to extract the standard part, have a fundamental drawback: they are not able to acquire
the whole memory content. Having said this, in the authors' opinion it is important to alert the
forensics community in order to fill this gap.
If we assume that we have the complete SIM/USIM memory image, we can see how one can deal
with the problem of extracting sensitive data from the device.
• Extraction of the non-standard part from the image: this task is necessary in order to isolate all
the potentially valuable data.
• Application of steganalysis methods: this is the most challenging step, because it is unknown
whether there are any concealed data in the non-standard part, or which coding has been used
for the hiding.
The latter step can be really time-consuming and is very similar to the problem of detecting a
hidden message in an ordinary digital image [13]. A possible solution is to apply a brute-force
translation method, decoding the various chunks of non-standard content and trying to see something
intelligible.
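One plausible decoding to try first is the GSM 7-bit default alphabet used for SMS text. The sketch below is a rough intelligibility test, not the authors' actual procedure: it unpacks septets from octets and keeps chunks that come out mostly printable (for plain Latin letters and digits the GSM default alphabet largely coincides with ASCII, so the ASCII range is used as a heuristic).

```python
def unpack_gsm7(data):
    """Unpack 7-bit septets from 8-bit octets (GSM 7-bit packing)."""
    septets, acc, nbits = [], 0, 0
    for octet in data:
        acc |= octet << nbits
        nbits += 8
        while nbits >= 7:
            septets.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return bytes(septets)

def looks_like_text(chunk, threshold=0.9):
    """Return the decoded bytes if they look mostly printable, else None."""
    decoded = unpack_gsm7(chunk)
    printable = sum(0x20 <= c < 0x7B for c in decoded)
    return decoded if decoded and printable / len(decoded) >= threshold else None

# Known GSM 7-bit test vector: "hellohello" packed into 9 octets.
packed = bytes.fromhex("E8329BFD4697D9EC37")
print(looks_like_text(packed))               # b'hellohello'
print(looks_like_text(bytes([0xFF] * 8)))    # None: not intelligible
```

In a brute-force setting the same test would be repeated over every chunk and over other candidate codings (UCS2, plain 8-bit, compressed formats).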
Table 2: A partial list of standard and non-standard files extracted from a TIM 128 Kbytes SIM card.

ID    Name       File Type  Privileges                 Structure     Father  Size [byte]
3F00  MF         MF         —                          —             —       —
2F00  NS         EF         ALW²,ALW,ADM,NEV,NEV,NEV   linear fixed  3F00    46
2F05  ELP        EF         ALW,CHV1,NEV,NEV,NEV       transparent   3F00    4
2F06  NS         EF         ALW,NEV,NEV,NEV,NEV        linear fixed  3F00    330
2FE2  ICCID      EF         ALW,NEV,NEV,NEV,NEV        transparent   3F00    10
2FE4  NS         EF         ALW,NEV,NEV,NEV,NEV        transparent   3F00    35
2FE5  NS         EF         ALW,NEV,NEV,NEV,NEV        transparent   3F00    6
2FFE  NS         EF         CHV1,ADM,NEV,NEV,NEV       transparent   3F00    8
7F10  DFTELECOM  DF         —                          —             3F00    —
5F3A  NS         DF         —                          —             7F10    —
4F21  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    500
4F22  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5F3A    4
4F23  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5F3A    2
4F24  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5F3A    2
4F25  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    500
4F26  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    1250
4F30  SAI        EF         CHV1,ADM,NEV,ADM,ADM       linear fixed  5F3A    128
4F3A  NS         EF         CHV1,CHV1,NEV,CHV2,CHV2    linear fixed  5F3A    7000
4F3D  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    75
4F4A  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    39
4F4B  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    70
4F4C  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    70
4F50  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    1280
4F61  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    340
4F69  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5F3A    500
5FFF  NS         DF         —                          —             7F10    —
1F00  NS         EF         ADM,ADM,NEV,ADM,ADM        transparent   5FFF    105
1F01  NS         EF         ADM,ADM,NEV,ADM,ADM        transparent   5FFF    175
1F02  NS         EF         CHV1,CHV1,NEV,NEV,NEV      transparent   5FFF    11
1F03  NS         EF         ALW,ADM,NEV,NEV,NEV        linear fixed  5FFF    40
1F04  NS         EF         ALW,CHV1,NEV,NEV,NEV       transparent   5FFF    4
1F05  NS         EF         ADM,ADM,NEV,ADM,ADM        linear fixed  5FFF    640
1F06  NS         EF         ADM,ADM,NEV,ADM,ADM        linear fixed  5FFF    420
1F07  NS         EF         CHV1,ADM,NEV,ADM,ADM       transparent   5FFF    20
1F08  NS         EF         CHV1,CHV1,NEV,NEV,NEV      transparent   5FFF    175
1F09  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5FFF    100
1F0A  NS         EF         ADM,ADM,NEV,ADM,ADM        linear fixed  5FFF    16
1F0B  NS         EF         ADM,ADM,NEV,ADM,ADM        transparent   5FFF    16
1F0C  NS         EF         CHV1,CHV1,NEV,CHV1,CHV1    linear fixed  5FFF    34
1F1E  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    70
1F1F  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    70
1F20  NS         EF         CHV1,ADM,NEV,ADM,ADM       linear fixed  5FFF    128
1F21  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    1280
1F22  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    340
1F23  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    500
1F24  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    1250
1F34  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    500
1F38  NS         EF         CHV1,CHV1,NEV,ADM,ADM      linear fixed  5FFF    500
1F3D  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5FFF    4
1F3E  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5FFF    2
1F3F  NS         EF         CHV1,CHV1,NEV,ADM,ADM      transparent   5FFF    2
1F40  NS         EF         ADM,ADM,NEV,ADM,ADM        transparent   5FFF    700

² The sequence of privileges refers, as explained in the paper, to the execution of a defined set of commands issuable to a SIM card, namely Read, Update, Increase, Rehabilitate and, finally, Invalidate.
EMPLOYMENT: THE GARANTE'S GUIDELINES
ON E-MAIL AND INTERNET USE
Stefano Aterno
Studio Legale Aterno
Lecturer in Legal Informatics
La Sapienza
Rome
Abstract. The boundary between lawful and unlawful in the computer and network monitoring of
workers' activity. How much can the employer monitor today? What may the employer do, and what
must it communicate to the workers? The monitoring of e-mail. The difficult balance between
information security and the protection of the employee in the workplace. The prohibition on
monitoring the employee's work activity and the use of such data for disciplinary purposes.
Keywords: employee; employer; monitoring of work activity; e-mail monitoring; Internet browsing
monitoring; personal data; breach of correspondence; Workers' Statute; company policies.
On 1 March of this year, the Garante per la protezione dei dati personali (the Italian Data
Protection Authority) issued guidelines on the monitoring of employees by their employer.
Did the Garante lay down anything new, or was everything already written in statutes, case
law and consolidated acts?
First of all, it must be said that the Garante did not simply issue guidelines and nothing more.
The text released by the Authority is a true measure which, in certain respects, imposes
prohibitions and rules of conduct through the adoption of technical safeguards.
For the rest, more extensively, it issues the so-called guidelines (which are nothing more than
mere recommendations).
Public and private employers may not monitor their employees' e-mail and Internet browsing,
except in exceptional cases. This principle, stated by the Garante, was already present in many
rules and laws of the State, not least those laid down by the Workers' Statute and by the
criminal code.
The guidelines restate a by now settled principle, namely that it is for the employer to define
the rules for the use of such tools, taking into account workers' rights and the rules on labour
relations (see Article 4 of the Workers' Statute and several judgments of the Court of Cassation
in criminal and labour matters).
The needs of information and network security have forced companies to monitor traffic and
information flows punctually and meticulously. This has enabled the monitoring of a large
amount of data (web connections, e-mail, log files) directly attributable to individual workers.
The use of such data, or access to workers' computer workstations, has led to several reports to
the Garante and to some criminal complaints. We will see how some of these trials ended.
The use of (company) e-mail for private or recreational (non-business) purposes, and the
connection to (non-institutional and/or non-work-related) websites, have caused a proliferation
of personal and, above all, sensitive data.
From the analysis of the websites visited it is possible to derive even sensitive information
about employees, and e-mail messages may have private content.
Starting from the premise that these are company assets, and as such instrumental to the
achievement of work purposes, a solution is being sought to the problem of de facto monitoring
of the employee's work activity. One of the solutions is prevention and the adoption of rigorous
internal policies.
Communicating the rules and the policy to all employees, and informing them about these,
should lead to a correct and balanced solution of the problem.
"Arbitrary uses of company IT tools and infringements of workers' privacy must be prevented"
(Garante per la protezione dei dati personali).
The Authority first of all requires employers to inform workers clearly and in detail about the
rules for using the Internet and e-mail and about the possibility that checks may be carried out.
The Garante then forbids the systematic reading and recording of e-mail messages, as well as
the systematic monitoring of the web pages viewed by the worker, because this would amount
to the remote monitoring of work activity prohibited by the Workers' Statute.
A whole series of technological and organisational measures is also indicated, aimed at
preventing the possibility, allowed only in very limited cases, of analysing the content of
Internet browsing and of opening certain e-mail messages containing data needed by the
company.
The measure recommends that companies adopt an internal code of conduct, defined with the
involvement of the trade union representatives as well, in which the rules for the use of the
Internet and e-mail are clearly set out.
The employer is also called upon to adopt every measure capable of preventing the risk of
improper use, so as to reduce subsequent checks on workers.
But can we consider sufficient what the Garante says about the Internet, namely that it is
advisable, for example, to identify in advance the sites considered related or unrelated to the
work activity, and to use filters that prevent certain operations, such as access to sites on a
sort of black list or the download of music or multimedia files? Do we not risk limiting yet
another freedom, the freedom to browse the Internet anywhere?
Is it possible that there are no other solutions, perhaps stricter and more responsibility-oriented,
but certainly not restrictive of fundamental freedoms?
One of the problems that arises most frequently is the need to check e-mail in the employee's
absence. Some solutions already adopted by companies have been taken up in the Guidelines.
Are they sufficient? Some are certainly important and fundamental; others are a source of
further problems. We will see which, together.
Should these preventive measures prove insufficient to avoid anomalous behaviour, any checks
by the employer must be carried out gradually. As a first step, checks should be made at the
level of a department, an office or a work group, so as to identify the area to be called back
to compliance with the rules.
Only afterwards, if the anomaly recurs, could one move on to checks on an individual basis.
One of the main points of the Guidelines is the reference to some principles already set out in
the Privacy Code:
- the principle of necessity;
- the principle of relevance and non-excessiveness;
- the principle of fairness and information.
Beyond so many rules and principles, what are the true, concrete and real risks run by workers,
and what equally feared risks are run by companies and individual entrepreneurs?
A Practical Web-Voting System
Andrea Pasquinucci
UCCI.IT, via Olmo 26, I-23888 Rovagnate (LC), Italy
Abstract. We present a reasonably simple web-voting system. Simplicity, modularity,
the voter's trust and the requirement of compatibility with current web browsers lead to
a protocol which satisfies a set of security requirements for a ballot system that is not
complete but is sufficient in many cases. We discuss the requirements, the usability problem
and the threats of this protocol (a demo is on-line at [6]).
1 Introduction
Recently, E-voting has been one of the hottest topics in security and cryptography. In
practice, E-voting is traditional voting at a voting booth aided by a voting machine. Here,
instead, we consider web-based voting, that is, voting using a web-browser and posting
the vote on a web-site. Web-voting has some intrinsic features which make it inherently
different from E-voting, due to the delocalization of the voter. But if, on one side, the
delocalization of the voter can be seen as a weakness of web-voting, on the other hand it is
also its strength, since with web-voting it is possible to cast a vote practically anywhere
and at any moment, potentially broadening people's participation in decision processes.
Here we present a simple, modular protocol [14, 13, 12], based on sound and common
cryptography and on previous results in the literature [1, 9, 20, 3], which can be implemented
as a web service compatible with current web browsers, without requiring any user education.
Most of the cryptographic operations required by the protocol can be implemented with
common tools like PGP or gnupg [15, 8, 11].
Before describing the protocol, its implementation and the threat analysis, it is important
to consider two other aspects of web-voting: the requirements that a web-voting system
should satisfy, and the problems of user trust, user interface and usability.
2 Web-voting requirements
Among the most important requirements often used in defining an electronic voting system
[1, 9, 20, 7], our protocol satisfies the following:
1. Unreusability (prevent double voting): no voter can vote twice
2. Privacy: nobody can get any information about the voter’s vote
3. Completeness: all valid votes should be counted correctly
4. Soundness: any invalid vote should not be counted
5. Individual verifiability: each eligible voter can verify that her vote was counted
correctly
6. Weak Eligibility: only eligible voters can get voting credentials from trusted authorities
7. User-friendliness: the system is easy (intuitive) to use
Our protocol satisfies neither Eligibility (only Weak Eligibility) nor Receipt-freeness;
in both cases the consequence is that a voter may be able to prove to someone else how she
has voted, and thus to sell her vote. Our protocol also does not satisfy Incoercibility.
Indeed, any voting system in which voters do not express their vote in a controlled environment
(the voting booth) cannot prevent coercion and vote-selling. In the case of web-voting, but
the same is true for voting by ordinary mail, the voter can always take a picture, make a
movie or vote in the presence of the person to whom the vote is sold. The voter can also sell
the voting credentials, unless they are of a biometric type.
User-friendliness, instead, impacts the other requirements, because obtaining higher security
often requires the adoption of advanced cryptographic algorithms and protocols. Unfortunately,
these cryptographic protocols often require the intervention of the voter, which means that
the voter must understand something, albeit little, of the protocol itself. This is often
impossible if the voter does not have a background in cryptography, even if she is a computer
expert (see for example the recommendations in [19] in the case of E-voting).
3 The human factor
Voting is not only a technical fact. It is most important that the voters trust the process
and the result of the vote. In traditional voting, the electoral committee and the verification
of the procedures by the representatives of the competing parties provide the controls
necessary for the voter to trust the final result.
In E-voting and web-voting, the voters, the electoral committee and the representatives
of the competing parties must trust the software and hardware which implement the
electronic voting system. As we all know well, the security of current electronic systems
is weak, and it is certainly perceived as such by everybody. We cannot expect voters to trust
unconditionally a web-site to protect their anonymity and count the votes correctly.
To make voters trust an electronic voting protocol we need to give the voters themselves
some way of checking the correctness of the process, for example by giving them a receipt
with which they can verify that their vote has been counted correctly in the final tally.
This, of course, can make vote-selling easier, but in web-voting this would be possible anyway.
We believe that today it is somewhat more important to give up some properties of a
web-voting protocol in exchange for making the voter understand and trust the process.
This can limit the applicability of a web-voting protocol, but it makes the protocol easier
to implement and its associated risks easier to manage.
4 The modular protocol
Following the traditional voting procedure, and also to reduce the risk of having both
authentication data and votes on the same system, it has been decided to separate the
authentication and vote phases and to realize them on different machines (for a similar
approach see also [3]). In this way, only through the collusion of the managers of the two
servers is it possible to associate a vote with a voter.
The voter first connects with her web-browser to the authentication web-server (all web
connections are encrypted with SSL/TLS) and is asked for her voting credentials, a
username-password pair or a client digital certificate. These credentials allow her to access
the system. The voter then presents a Secret Token, which is unique per voter and per
vote and can be used only once. The authentication server then creates a Vote
Authentication, which is based on a random number and is encrypted with the public key
of the vote server and digitally signed.
Alternatively, blind signatures can be used [5, 3]. With blind signatures, the random
number of the Vote Authentication is created by the voter, and the authentication server
does not know its value. This reduces the risk of collusion between the authentication and
vote server managers, but it requires a custom-built web-browser, since current web-browsers
do not support blind signatures, and it creates a (very small) possibility that two users
create the same Vote Authentication, in which case only one will be able to vote.
The voter then connects to the vote web-server through an anonymizer service, which
hides the IP addresses of the voters [4, 10]. As an extra anonymizing measure, a voter can
also connect to the vote web-server using Tor [21, 17]. Notice that, in any case, all current
browsers in their default configuration leak some information about the voter, for example
in the USER-AGENT field.
The voter sends the Vote Authentication to the vote server, and the vote server verifies
that the digital signature was made by the authentication server and that the Vote
Authentication has not already been used. If so, the vote is registered and the Vote
Authentication is marked as used. (All data written on the authentication and vote servers
is encrypted with the public key of the authentication and vote committee, respectively.)
The vote server then sends the voter a cryptographic Vote Receipt which allows the
voter to verify that her vote has been counted correctly in the final results.
At the end of the voting time, all encrypted votes are sent to the vote committee, which
decrypts and counts the votes and posts the results along with each vote and its vote receipt.
Analogously, the authentication committee decrypts all used Secret Tokens and publishes
their list.
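The flow above can be sketched as follows. This is only an illustration of the message flow, not the actual implementation: an HMAC with a shared key stands in for the authentication server's public-key signature, the stored token hashes use SHA-256, and no public-key encryption of the stored data is shown (the real system relies on PGP/gnupg-style tools; all names below are hypothetical).

```python
import hashlib
import hmac
import secrets

SIGN_KEY = secrets.token_bytes(32)        # stand-in for the auth server's signing key
token_hashes = {hashlib.sha256(b"voter-token-1").hexdigest()}  # only hashes are stored
used_auths = set()                        # vote server's replay list

def authenticate(secret_token):
    """Auth server: check the token hash, emit a signed Vote Authentication."""
    h = hashlib.sha256(secret_token).hexdigest()
    if h not in token_hashes:
        raise ValueError("unknown or already used Secret Token")
    token_hashes.remove(h)                # one vote per token
    rnd = secrets.token_hex(16)           # the random number of the Vote Authentication
    sig = hmac.new(SIGN_KEY, rnd.encode(), hashlib.sha256).hexdigest()
    return rnd, sig

def cast_vote(rnd, sig, ballot):
    """Vote server: verify the signature, refuse reuse, return a Vote Receipt."""
    expect = hmac.new(SIGN_KEY, rnd.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expect) or rnd in used_auths:
        raise ValueError("invalid or reused Vote Authentication")
    used_auths.add(rnd)
    return hashlib.sha256(rnd.encode() + ballot).hexdigest()   # Vote Receipt

rnd, sig = authenticate(b"voter-token-1")
receipt = cast_vote(rnd, sig, b"yes")
print(receipt[:16], "...")                # the voter keeps this receipt
```

In the real protocol the vote server checks the signature with the authentication server's public key, so the two servers share no secret; the receipt is also built with a precise time-stamp, which matters for the timing attacks discussed below.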
5 Threats
Besides the possibility of software bugs (the current implementation uses standard software,
such as a Linux distribution with a kernel hardened with RSBAC [18], apache [2], php [16]
and standard cryptographic libraries [8, 11]) and the possibility of collusion between the
managers of the two servers, among the most difficult threats to counter are timing attacks
and impersonation.
In general, we can divide the attacks into two classes: attacks which try to modify or
add votes, and attacks against anonymity.
The modification of a vote expressed by a voter will be discovered by the voter herself
using the receipt. It is more difficult to discover whether someone has voted in place of some
absentees. Indeed, it is improbable that absentees will check that their Secret Token has
not been used, so it will be difficult to discover whether someone who has learned unused
Secret Tokens has used them. Notice that only the hashes of the Secret Tokens appear on
the authentication server, so the authentication manager cannot mount this attack. This
is therefore a purely human attack, which cannot be prevented by electronic measures.
Timing attacks are more subtle but also more difficult to mount. First of all, the protocol
does not require the voter to vote immediately after receiving her Vote Authentication,
although this is what most people will do. On the vote server no submission times are
recorded, so all votes appear as if cast at the same moment. On the other hand, times
are present elsewhere in the protocol: the creation time of the Vote Authentication is
recorded, and the vote receipt is built by a cryptographic routine that uses a precise
time-stamp as one of its ingredients. In practice, the only way to violate the anonymity
of a vote is to match the information known to the authentication manager with that
potentially known to the vote manager: the vote manager must record the time of each
vote (the system does not record it, so the vote server must be modified to log this
information) and then match these times against those recorded by the authentication
manager. This attack requires the collusion of both managers and the modification of
one server.
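The timing attack described above amounts to joining the two managers' logs on timestamps. A minimal Python illustration follows; the record formats and the 30-second correlation window are assumptions, since the unmodified vote server records no times at all, and that absence is exactly what defeats the attack:

```python
from datetime import datetime, timedelta

# What the authentication manager legitimately records:
# (hash of Secret Token, creation time of the Vote Authentication).
auth_log = [
    ("hash_A", datetime(2007, 6, 6, 10, 0, 5)),
    ("hash_B", datetime(2007, 6, 6, 10, 3, 40)),
]

# What a *modified* vote server would have to record: (vote, submission time).
vote_log = [
    ("candidate_1", datetime(2007, 6, 6, 10, 0, 9)),
    ("candidate_2", datetime(2007, 6, 6, 10, 3, 44)),
]

def correlate(auth_log, vote_log, window=timedelta(seconds=30)):
    """Link each vote to the authentication closest in time within a window.

    This only works if both managers collude and the vote server is modified
    to log submission times; with either log missing, votes are unlinkable.
    """
    links = []
    for vote, t_vote in vote_log:
        nearby = [(abs(t_vote - t_auth), h) for h, t_auth in auth_log
                  if abs(t_vote - t_auth) <= window]
        if nearby:
            links.append((min(nearby)[1], vote))   # closest authentication wins
    return links

print(correlate(auth_log, vote_log))
```

Here each vote is linked back to a token hash, breaking anonymity, which is why the design deliberately keeps all timestamps off the vote server.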
Acknowledgments
We thank D. Bruschi and A. Lanzi for discussions.
References
[1] B. Adida. Advances in cryptographic voting systems. PhD thesis, MIT, 2006.
[2] Apache web server. http://httpd.apache.org/.
[3] A protocol for anonymous and accurate e-polling. In Proceedings of E-Government: Towards Electronic Democracy, International Conference TCGOV 2005, Bolzano, Italy, volume 3416 of Lecture Notes in Computer Science. Springer, 2005.
[4] D. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Comm. ACM 24(2):84, 1981.
[5] D. Chaum. Security without identification: transaction systems to make big brother obsolete. Comm. ACM 28(10):1030, 1985.
[6] Ucci.it eballot. http://eballot.ucci.it.
[7] E. Gerck. Voting system requirements. The Bell, 2001.
[8] GnuPG. http://www.gnupg.org/.
[9] D. Gritzalis, editor. Secure electronic voting. Kluwer Academic Publishers, 2002.
[10] M.K. Reiter and A.D. Rubin. Anonymous web transactions with Crowds. Comm. ACM 42(2):32, 1999.
[11] OpenSSL. http://www.openssl.org/.
[12] A. Pasquinucci. Implementing the modular eballot system. http://eballot.ucci.it/, http://arxiv.org/abs/cs.CR/0611067, 2006.
[13] A. Pasquinucci. A modular eballot system v1.0. http://eballot.ucci.it/, http://arxiv.org/abs/cs.CR/0611066, 2006.
[14] A. Pasquinucci. Some considerations on trust in voting. http://eballot.ucci.it, 2006.
[15] PGP. See e.g. RFC 1991, RFC 2440, RFC 3156; http://www.pgp.com/.
[16] PHP. http://www.php.net/.
[17] R. Dingledine, N. Mathewson, and P. Syverson. Tor: the second-generation onion router. http://tor.eff.org/, 2004.
[18] RSBAC: rule set based access control. http://www.rsbac.org/.
[19] R.G. Saltman. Independent verification: essential actions to assure integrity in the voting process. Preliminary review for NIST, 2006.
[20] B. Schneier. Applied Cryptography. Wiley, 1996.
[21] The onion router. http://tor.eff.org/.