Università degli Studi di Roma “Tor Vergata”
Faculty of Engineering
PhD Program in Computer Science and Automation Engineering
Cycle XXIII
Open BAR
A New Approach to Mobile Backup And Restore
Vittorio Ottaviani
Academic Year 2010/2011
Advisor/Tutor: Prof. Giuseppe F. Italiano
Coordinator: Prof. Daniel P. Bovet
to my parents
because an example is worth a thousand words
Abstract
Smartphone owners store an ever-growing amount of information, and increasingly important data, in the internal memory of their devices. Mobile devices are prone to being lost, stolen or broken, which causes the loss of all the information they contain unless that data has been backed up. While many solutions for backing up and restoring data are known for servers and desktops, mobile devices pose several challenges, mainly due to the plethora of devices, vendors, operating systems and versions available in the mobile market. In this thesis, we propose a new backup and restore approach for mobile devices, which helps to reduce the effort of saving and restoring personal data and of migrating from one device to another. Our approach is platform independent: in particular, we present prototypes based on different mobile operating systems: Google Android, Windows Mobile 5 and 6, and Symbian S60. The approach protects the information backed up and restored by using novel cryptographic techniques optimized for mobile devices. Another feature of our approach is the capability of offering additional services to the end user or to the administrator of the system. As an example, for users, we provide a service that enables sharing the information stored on mobile devices among a group of selected people. This can be useful in many situations, e.g., in creating a mobile business network among a group of people. For administrators, we offer a social network extractor which, starting from the information contained in the smartphone and from data publicly available on the web, generates a social graph of the backup network. This can be useful in situations such as creating teams within an enterprise.
Acknowledgements
During the years of my PhD many people have passed through my life; some of them have left a mark that will never be erased.
First of all I want to thank Pino: your way of approaching things, always searching for the best, inspires me every day; I learned some of the most important lessons of my life thanks to you.
I want to thank all the colleagues and friends who believed in me during hard times and who celebrated successes with me; Emanuele, Cristina, Danilo and Paolo, thank you guys for the support and for sharing your experience with me.
Special thanks go to Fabio and to Ermanno... I will not write another thesis to explain these thanks: each one of you knows...
Thanks to my family for your unconditional love and trust in me. Words cannot fully express how important you are to me.
Finally, thank you Ramona: you are my love, my best friend and the reason why every morning I wake up and do my best to be a better person...
Table of Contents
1 Introduction
  1.1 Motivation
    1.1.1 How much does data loss cost?
    1.1.2 Focusing on mobile
  1.2 Our solution
  1.3 Contributions
  1.4 Thesis Outline
2 Backup & restore in the third millennium
  2.1 Backup features
    2.1.1 Full backup
    2.1.2 Incremental backup
    2.1.3 Differential backup
    2.1.4 File-based vs. device-based
    2.1.5 Scheduled backup vs. continuous data protection
    2.1.6 Local backup vs. remote backup
  2.2 Mobile
  2.3 Local backup for mobile device
  2.4 Remote backup for mobile device
3 Our approach to backup
  3.1 A new approach to backup & restore
    3.1.1 Server
    3.1.2 Client
  3.2 Sharing backup data
  3.3 Social network analysis
  3.4 Security
4 Data extraction
  4.1 Forensic Style Approach
    4.1.1 Our methodology
    4.1.2 Symbian implementation
    4.1.3 Windows Mobile implementation
    4.1.4 Some remarks on this approach
  4.2 Selection of interesting data
    4.2.1 Symbian
    4.2.2 Android
  4.3 Performances
  4.4 Concluding remarks
5 Data elaboration
  5.1 Remote elaboration
  5.2 Our step-by-step Methodology
    5.2.1 Stage 0: Choice of the objective
    5.2.2 Stage 1: Files of interest identification
    5.2.3 Stage 2: Data hypotheses and entities injection
    5.2.4 Stage 3: Sequences similarity discovery
    5.2.5 Stage 4: Data interpretation
    5.2.6 Stage 5: Meta-format building
    5.2.7 Stage 6: Error correction
    5.2.8 Stage 7: Parser building
    5.2.9 Stage 8: Testing and debugging
  5.3 Remote elaboration results
  5.4 Local elaboration
6 Protecting saved data
  6.1 Key agreement algorithm
    6.1.1 Mathematical setting: key agreement protocol
    6.1.2 J2ME implementation
    6.1.3 Performance testing methodology
    6.1.4 Performance evaluation
    6.1.5 Experimental results
    6.1.6 Concluding remarks
  6.2 Encryption algorithm
    6.2.1 Performances
    6.2.2 Statistically testing QP-DYN and RC4
  6.3 Protecting inter process communication
    6.3.1 State of the art
    6.3.2 The framework
    6.3.3 The framework implementation
    6.3.4 On a real device
7 Value added services on backup data
  7.1 Sharing backup data with closed groups
    7.1.1 Social backup in business environment
    7.1.2 Sharing conference data
    7.1.3 Shared backup for smartphone
    7.1.4 Running the application
  7.2 Extracting social network
    7.2.1 Introduction
    7.2.2 Related work
    7.2.3 Smartphone Data Analysis (SDA)
    7.2.4 Web Data Analysis (WDA)
    7.2.5 Clustering Analysis (CA)
    7.2.6 The Final Result: The Social Network
  7.3 Conclusions
8 Conclusions and Future Work
A The Symbian S60 format
  A.1 Address book
  A.2 Calendar
  A.3 Events log
  A.4 SMS
B The Backup communication protocol
  B.1 Backup item
  B.2 Contact item
  B.3 Calendar item
  B.4 Message item
  B.5 Generic file item
  B.6 Setting item
  B.7 List methods
  B.8 Restore
    B.8.1 Listing items on the server
    B.8.2 Choosing data to be restored
C The Sharing communication protocol
  C.1 Sharing methods
    C.1.1 Item listing
    C.1.2 Share an item
    C.1.3 Location based sharing
    C.1.4 Listing shared data
  C.2 Groups methods
    C.2.1 Creating group
    C.2.2 Listing groups
    C.2.3 Handling invitations
Bibliography
List of Figures
1.1 Costs of data loss per industry sector (values are in million $ per year)
1.2 Smartphone and PC sales forecast in millions of units
1.3 2007 - 2010 trend mobile operating systems market share.
1.4 Mobile cellular, subscriptions per 100 people, 2009.
3.1 Backup and Restore system architecture.
3.2 Example of data model for a contact.
3.3 Example of a request of a contact.
3.4 Example of client server interactions.
4.1 Data collection workflow
4.2 Windows Mobile 5.0 memory architecture.
4.3 (a) Symbian S60 tool’s screenshot, (b) Windows Mobile tool’s screenshot.
5.1 The methodology flow
5.2 The format of the Ω operations sequence. This figure shows an example with contacts discovery as objective
5.3 These figures show an example of a DBMS binary file before and after Stage 3. In (a) the sample file after making pairs of calls of the same duration (Stage 2). In (b) equal sequences highlighted. In (c) the formatted file Φ̂0
5.4 These three figures depict an example of the application of Stage 5 on a file containing the phone’s address book.
5.5 The architecture of the backup server
6.1 Key Agreement process using conjugate.
6.2 Public data and Key Agreement generation time: all tests
6.3 Public data and Key Agreement generation time: results with an upper bound of 1 sec.
6.4 Overall encryption and decryption time comparison between (sizes in bytes) (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6.
6.5 AES CFB 256-bit and QP3 (sizes in bytes).
6.6 AES 256-bit and QP3 (sizes in bytes).
6.7 Mutual Authentication phase.
6.8 Session Authentication phase.
6.9 Session Encryption phase.
6.10 SAVED framework main packages.
7.1 Use case of meeting backup and share.
7.2 Android Backup and Restore client.
7.3 Android Backup and Restore client.
7.4 The graph representation of contacts (a) and their relationships with the phone’s owner (b), which are revealed by the number of calls and number of SMS/MMS. In (c) is shown the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).
7.5 Frequency distribution of URLs (domains) providing relationships.
7.6 Contact-to-cluster assignment.
7.7 Clustering metrics trends. The profile graph, used in the example, has 218 contacts and 1242 Web edges; the black vertical line is relative to k = 10, the chosen value for the input parameter k.
7.8 The final result of the whole process: the social network clusters.
B.1 Example of XML payload for a backup item.
B.2 Example of XML payload for a contact item.
B.3 Example of XML payload for a calendar item.
B.4 Example of XML payload for a message item.
B.5 Example of XML payload for a generic file item.
B.6 Example of XML payload for a contact list response.
B.7 Example of XML payload for a setting item.
B.8 Restore method response.
C.1 Example of XML payload for a list of items.
C.2 Example of XML payload to share an item with a group.
C.3 Example of XML payload to share an item with a group using location.
C.4 Example of XML payload for a list of items.
C.5 Example of XML payload to create a group.
C.6 Example of XML payload of a list of groups.
C.7 Example of XML payload to invite users to a group.
C.8 Example of XML payload of invitations received by the user.
List of Tables
1.1 Cost and causes of data loss
2.1 Comparison of backup approaches
4.1 Files generated during the Extraction Process
4.2 Windows Mobile 5.0 relevant files
4.3 Extraction tool consistency analysis
4.4 Time overhead of the backup operation per data type
5.1 Symbian files of interest
6.1 Time used from algorithms to generate the secret to agree a SSK.
6.2 Time overhead for the framework phases.
A.1 Possible values for the rows of table “DATA TYPE TABLE”. They describe the type of attributes present in the “DATA BLOCK”. (Symbian S60 v2)
A.2 This table lists all contact’s data which can be found in the Contacts.cdb. Since data are located in three logical file areas, the table is split in three parts.
A.3 This table lists all calendar entries such as Notes, Meetings and Anniversaries stored in the Calendar file.
A.4 This table lists all event entries such as SMS, MMS, voice and data calls, SIM change.
A.5 This table lists all fields characterizing an SMS.
Serva me, servabo te
Save me and I will save you
Petronius Arbiter
1 Introduction
1.1 Motivation
Backup is a crucial task, since hardware failures and software or human errors can lead to the loss of important information. Backups are even more important for devices such as laptops and smartphones, which are also prone to loss or theft. Smartphones are currently used more as handheld computers than as mobile phones, and consequently a lot of data is stored on them, which makes protecting that data from loss more critical. In addition, the rapid technological evolution of mobile devices makes it more difficult to restore data saved from old devices to new ones. Thus, mobile devices pose new challenges for the backup and restore problem. Backing up data to external storage, such as Secure Digital (SD) cards or laptop disks, does not remove the problem, since that storage suffers from the same risks of failure or loss.
Moreover, the growth of cloud services, and the capability of modern smartphones to be always online without consuming too much power, are pushing backup systems to save data online using cloud services. Unfortunately, the plethora of devices, operating systems and vendors available on the market causes interoperability problems, and users often lose their information when a device fails, is lost or stolen, and even when migrating to a new device.
Cause                        Percent   Cost
hardware or system failure   78%       $9.36 billion
human errors                 11%       $1.32 billion
software corruptions         7%        $0.84 billion
natural disasters            1%        $0.12 billion
other                        3%        $0.36 billion
Table 1.1: Cost and causes of data loss
1.1.1 How much does data loss cost?
Some studies report that a company that experiences a computer outage lasting more than 10 days will never fully recover financially, and that 50 percent of companies suffering such an incident will be out of business within 5 years [1], [2]. Other studies by the National Archives & Records Administration in Washington show that 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster; 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately.
Statistics about data recovery [3], [4], [5] say that U.S. businesses lose over $12 billion per year because of data loss. This loss is due primarily to hardware or system failure, which accounts for 78% of all data loss; human error accounts for 11%, software corruption for 7% and natural disasters for only 1%. Table 1.1 summarizes the economic impact of each factor.
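As a quick consistency check on Table 1.1, the per-cause costs can be reproduced by applying each percentage to the yearly total. The sketch below assumes a total of exactly $12 billion (the text only says "over $12 billion"), and the variable and function names are illustrative.

```python
# Share of data loss per cause, as reported in Table 1.1 (percent).
CAUSE_SHARE_PERCENT = {
    "hardware or system failure": 78,
    "human errors": 11,
    "software corruptions": 7,
    "natural disasters": 1,
    "other": 3,
}

# Assumption: the yearly total is taken as exactly $12 billion.
TOTAL_LOSS_BILLION_USD = 12.0


def cost_per_cause(total_billion, shares_percent):
    """Return the yearly cost (in billion $) attributed to each cause."""
    return {
        cause: round(total_billion * percent / 100, 2)
        for cause, percent in shares_percent.items()
    }


costs = cost_per_cause(TOTAL_LOSS_BILLION_USD, CAUSE_SHARE_PERCENT)
for cause, cost in costs.items():
    print(f"{cause}: ${cost:.2f} billion")
```

Under this assumption the computed values match the cost column of Table 1.1, and the percentages add up to 100.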
Moreover, disaster prevention and recovery plans are often overlooked or outdated, and are frequently considered a boring and time-wasting activity, partly because users have the perception that backup tools and techniques are not 100% reliable. The 7th Annual ICSA Labs Virus Prevalence Survey [6] says that file corruption and data loss are becoming much more common as users usually cooperate on shared documents or resources, although loss of productivity continues to be the major cost associated with a virus disaster. Ontrack statistics estimate how much data loss costs each industry sector; the chart in Figure 1.1 summarizes these costs [7].
Figure 1.1: Costs of data loss per industry sector (values are in million $ per year). [Chart; vertical axis from 0.0 M$ to 3.0 M$; sectors shown: Banking, Retail, Pharmaceuticals, Insurance, Information Technology, Financial Institutions, Manufacturing, Telecommunications, Energy.]
The cost of losing data depends on the type of data. If an enterprise loses historical data about room cleaning, the loss does not represent a huge problem for the business. On the other hand, if archives containing contract and invoice data, architectural drawings, or the source code of mission-critical software that would have to be rewritten by highly skilled developers are lost, then the loss is huge. In the first case the institution will have to face legal problems, due to the laws that regulate official data management; in the second the architect will have to inspect all the areas covered by the lost drawings and redo the work; in the third case the enterprise will have to spend time and money to reimplement the software, or will have to face a migration to similar software.
As far as mobile environments are concerned, if a manager’s smartphone fails and she loses her family pictures, this is not a huge problem; if instead she loses the address book containing all her business contacts, recovering even part of these data takes days of work. She will probably never recover all the lost information, and for a manager this is a great loss.
Moreover, in recent years users have been storing ever more important data, such as PIN codes or bank account numbers, on their mobile phones or laptops, as they trust the reliability of such devices. Storing private information on a mobile device is convenient for day-to-day business, as it gives instant access to the information. Unfortunately these devices are prone to being lost or stolen. Cpp Fonesafe states that, in Italy, a mobile phone is lost or stolen every four minutes; a report by AXA insurance states that the majority of stolen devices are smartphones [8]. The phenomenon is even bigger in other countries; in the UK, for example, 228 mobile phones are reported stolen every hour [9]. When a device is lost, the information it contains is usually of no interest to the person who finds it, who will just reset the device and use it. On the other hand, if the device is stolen, the information can be used by the thief, who may know the owner and can therefore exploit such information more easily. A security layer must protect these data.
In any case, when somebody loses his/her device, the most valuable thing he/she loses is the information within the device, so there is a need for a reliable mobile backup system.
Figure 1.2: Smartphone and PC sales forecast in millions of units. [Chart; series: Smartphones, PC; vertical axis from 0 to 400; horizontal axis: years 2005 to 2011, with 2005 to 2009 marked A and 2010 to 2011 marked E.]
1.1.2 Focusing on mobile
According to RBC analysts [10], 2011 shipments of smartphone devices will approach 400 M units, matching PC sales. Figure 1.2 illustrates the trend for 2005–2011; “A” indicates actual values, “E” indicates estimated values.
Nokia is still the mobile device market leader, probably thanks to its policy on low-cost devices. Devices running Apple iOS and Android are gaining market share over Windows Mobile, Palm and Linux OS, while RIM Blackberry is quite stable, probably because of its focus on business customers.
The distribution of smartphone operating systems changed over the last year. In 2009 Symbian was leading the market with 52%, followed by RIM with 17%, Windows Mobile with 12%, iPhone with 8%, Palm with 2%, Android with 1% and others with 9% [11]. In Q4 of 2010 Android gained a huge part of the market, growing 886% year-over-year [12]. The 2010 OS market is still led by Symbian with 38% of the market share; RIM has grown, reaching 16%; Apple iOS, after the iPhone 4 launch, gained 5 points and holds 16% of the market; but the fastest growing OS is Android, with 23% of the whole market. Android’s growth is driven by key products from HTC, Motorola, Samsung, Sony Ericsson and LG, among others, which provide smartphones running Android as their operating system [13]. Figure 1.3 shows the trend in the distribution of smartphone operating systems over the last 4 years; the figure also reflects the trend of sales for vendors.
Figure 1.3: 2007 - 2010 trend mobile operating systems market share. [Chart; vertical axis: share from 0% to 100%; series: Symbian, RIM Blackberry, Apple iPhone, Microsoft Windows Mobile, Google Android, Other (Palm, Linux); periods: Q3 2007, Q3 2008, Q3 2009, Q4 2010.]
The map in Figure 1.4 shows the spread of mobile devices in the world at the end of 2009 [14], when more or less every person had a mobile device. In some countries, such as the United Arab Emirates, a person uses two or more mobile devices in everyday life. In most cases, using more than one device forces the user to switch continuously from one vendor, operating system or version of the same operating system to another.
Figure 1.4: Mobile cellular, subscriptions per 100 people, 2009.
Moreover, using more than one device spreads personal data across all the devices, making it more difficult for the user to find information. The solution is to keep devices synchronized, but this is really hard to do. It is even harder, if not impossible, if the devices are from different vendors and run different operating systems or different versions of the same one. Currently, in some cases, the easiest way to synchronize two devices is to manually copy data from one to the other.
For example, if a user wants to switch from a Symbian-equipped Nokia smartphone to an Android device, she can synchronize her device with her Gmail account (if she has one) in order to have her address book copied to the new device. One way to copy messages (SMS, MMS) is to use a migration tool such as SPB Migration Tool, available on the market for $9.95. Unfortunately, judging from users’ comments, the application does not work properly on every source device, and not all Android OS versions are supported (only 2.0 and higher). Moreover, the application migrates address book, SMS, MMS and gallery data to Android only; it does not work if the user wants to migrate from Android to another operating system.
Another way to move SMS from Symbian to Android is to install Nokia OVI on a laptop, synchronize messages from the smartphone with OVI, then download and run Nokia2AndroidSMS.exe, which should automatically find all datastores created by Nokia OVI, select the first one and generate an XML file. Then the user should install SMS Backup & Restore on the Android device, connect the phone to the PC and select “Disk drive” as connection type. The user should then copy the XML file into the SMSBackupRestore folder on the phone and run SMS Backup & Restore to import the messages.
Even such a “straightforward” procedure is one-way: it works only from Symbian to Android.
The examples above show how to migrate from one device to another. To keep different devices synchronized, Microsoft Exchange can be used, but the devices are only partially synchronized: SMSs are not updated.
It is clear that saving personal information and restoring it to a new device is not as simple as it should be. In some cases it is impossible to save some kinds of data. In this introduction we did not mention application settings, but restoring those settings to a new device and having all applications, if available on the new platform, already installed and configured would save a great deal of time and pain.
Currently there is no solution that allows the user to back up data from a device and restore it to a new device, having all contacts, calendars, email, SMS, MMS and even application settings available on the new device, without wasting too much time and with a painless procedure.
1.2 Our solution
The solution proposed in this thesis is to provide a common interface for exchanging data among the plethora of devices present in the market. This common interface is based on the structure of the data to be exchanged. The information to be backed up is obtained in one of two ways: either the mobile phone extracts it itself, using the API provided by the mobile operating system, or the whole content of the device is saved and the useful information is extracted by an ad-hoc server application.
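To make the idea of a common interface based on the structure of the data more concrete, the sketch below models a contact as a platform-independent record and serializes it to XML. It is only an illustration: the class name, field names and XML element names are assumptions introduced here, not the format defined in this thesis.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Contact:
    """Hypothetical platform-independent contact record."""
    name: str
    phones: list = field(default_factory=list)
    emails: list = field(default_factory=list)


def contact_to_xml(contact):
    """Serialize a contact to an XML string (element names are illustrative)."""
    root = ET.Element("contact")
    ET.SubElement(root, "name").text = contact.name
    for phone in contact.phones:
        ET.SubElement(root, "phone").text = phone
    for email in contact.emails:
        ET.SubElement(root, "email").text = email
    return ET.tostring(root, encoding="unicode")


def contact_from_xml(payload):
    """Rebuild a contact from the XML produced by contact_to_xml."""
    root = ET.fromstring(payload)
    return Contact(
        name=root.findtext("name", default=""),
        phones=[e.text for e in root.findall("phone")],
        emails=[e.text for e in root.findall("email")],
    )


# A client on one platform would fill the record from its native API;
# a client on another platform would read the same payload back.
original = Contact("Ada Rossi", phones=["+39 06 0000000"], emails=["ada@example.org"])
payload = contact_to_xml(original)
restored = contact_from_xml(payload)
print(payload)
print(restored == original)
```

The point of the design is that neither function depends on the operating system that produced the record: only the structure of the data type is shared.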
As smartphones tend to be always connected to the Internet, it seems natural to move the information online and to provide backup and restore services based on the cloud computing paradigm, which end users consider more reliable and less expensive [15], [16], [17]. This approach also reduces the risk of data loss and decouples the data from a specific device.
Once backup information moves online, it can be used in several ways, for example in a shared application, or to extract social networks and profile users. In an enterprise scenario, for example, it can be useful for users to share business or personal data contained in their mobile backups, such as calendars or business cards, with selected contacts of their choice. At the same time, the management could be interested in analyzing the social relations which naturally grow between employees, and in exploiting these relationships to build workgroups.
In such a scenario, it is easy to imagine a community of people willing to share some of their data within their mobile network. A backup that allows data sharing, however, can suffer from the same security and privacy issues present in social networks [18]; such issues can be approached in different ways depending on the environment where the system is used. In an enterprise scenario, data sharing can be monitored by administrators, who can enforce the company privacy policies. In a general-purpose environment, like a mobile social network, ownership of data must be verified and sharing must be allowed only by the data owner.
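The ownership rule for the general-purpose case can be sketched as a check performed before any share is recorded. Everything below (class, method and identifier names) is hypothetical and only illustrates the rule that sharing is allowed only by the data owner; it is not the server implementation.

```python
class SharingError(Exception):
    """Raised when a user tries to share an item they do not own."""


class BackupStore:
    """Hypothetical in-memory store mapping backup items to owners and groups."""

    def __init__(self):
        self._owners = {}   # item id -> owner user id
        self._shares = {}   # item id -> set of group ids

    def add_item(self, item_id, owner_id):
        self._owners[item_id] = owner_id
        self._shares[item_id] = set()

    def share(self, item_id, requester_id, group_id):
        """Share an item with a group, but only if the requester owns it."""
        if self._owners.get(item_id) != requester_id:
            raise SharingError(f"{requester_id} does not own {item_id}")
        self._shares[item_id].add(group_id)

    def groups_for(self, item_id):
        """Return a copy of the groups the item is shared with."""
        return set(self._shares.get(item_id, set()))


store = BackupStore()
store.add_item("contact-1", owner_id="alice")
store.share("contact-1", requester_id="alice", group_id="team")

try:
    # A non-owner request is rejected and leaves the shares unchanged.
    store.share("contact-1", requester_id="bob", group_id="friends")
except SharingError as err:
    print("rejected:", err)

print(store.groups_for("contact-1"))
```

In an enterprise deployment the same check could be complemented by an administrator-defined policy, as described above; that part is left out of the sketch.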
1.3 Contributions
The goal of this thesis is to present a backup system for smartphones that allows users to share part of their personal backup data with a selected set of contacts. In order to be platform independent, our approach is based on a novel way of managing data, and hinges on a data model which abstracts from the underlying platform and focuses on the data type. The same backup and restore method can be applied on mobile devices as well as on desktop or interconnected TV platforms. With such a system, users can manage different devices, using different operating systems, and keep data synchronized across different platforms. In order to assess the feasibility and impact of our approach in a real scenario, we built three prototypes of our backup and restore system, for Android, Windows Mobile OS (versions 5 and 6) and Symbian OS, and tested them on actual mobile devices. Our contribution to this project covers the following areas:
Smartphone data extraction : we proposed two different approaches to extract internal data from a mobile device and send these data to a remote
server using a common format, based on the structure of the data type to
10
1.4. THESIS OUTLINE
be exchanged between the mobile client and the server.
Smartphone data elaboration : we designed a methodology to reverse engineer raw data, coming from mobile devices, implement specific parsers
able to extract personal information and elaborate such information to
make it compatible with other devices.
Securing the system : we proposed a brand new key agreement algorithm
based on a matrix conjugation method, a new model to implement secure
inter-process communication into the Android OS, and we verified the
usability of new encryption algorithms compared to standard ones in
mobile environments.
Services on backup data : we proposed some services using stored data to be
provided to users or to administrators of the backup system. Such services are just a starting point for other possible uses of data. The services
implemented are a shared backup and a social network extractor.
1.4 Thesis Outline
This thesis is organized into three parts. The first part introduces the problem and describes the proposed solution.
The first part is composed of Chapter 2 and Chapter 3. Chapter 2 is a survey
of backup techniques in both desktop and mobile environments; Chapter
3 summarizes how we approach the backup problem, showing the proposed
idea. In this chapter we show the components of the system implemented to
allow users to back up and restore data, granting interoperability between vendors, operating systems and versions.
The second part deals with the operations on data. This part is composed of
Chapter 4 and Chapter 5. Chapter 4 details the two methods proposed to extract data from the device,
a forensic style extraction and a selection of interesting data, and
shows how these tasks are performed and integrated. Chapter 5 explains the
data reverse engineering methodology proposed to extract personal information from raw backup data, and how such data are managed to be made interoperable between vendors, operating systems and versions. This chapter also
describes the architecture of the server side and how the server stores backup data
coming directly from a device or from a raw backup.
Chapter 6 and Chapter 7 compose the third part. This part proposes some services to be provided to the users of the system. Chapter 6 describes the security
services deployed to secure the information. In Chapter 6 we explain the new
key agreement algorithm, the approach proposed to protect inter-process communication in Android and some considerations on the opportunity to use new
encryption algorithms or the standard ones on mobile environments. Chapter
7 shows some possible Value Added Services on backup data like the opportunity to share part of the backup with some selected contacts and the possibility
of extracting users’ social network using data from the backup and information
available on the web.
The thesis ends with a chapter which summarises the findings of the thesis
and considers directions for future work.
2 Backup & restore in the third millennium
Introduction
According to [19], backups can be classified along several dimensions: the data
repository model can be a full, an incremental or a differential backup; data
can be stored in a file-based or a device-based style; and the data repository
management can be online or offline. These approaches can be combined in different ways, according to accessibility, security and cost needs. In case of failure, a full backup is able to restore
the entire content of a device: this process is slow in the backup phase, introduces a huge overhead in the data stored, but allows for faster restores. On
the other hand, incremental backups reduce backup times and sizes but imply higher restore times. Backups can operate on files (file-based approach)
or on data physically saved on the disk (device-based approach): although a
file-based approach tends to be slower than a device-based backup, it allows
for more flexibility and it is easier to manage. Online backups allow data to be saved
and restored while the system is running, while offline backups require the
system to be idle: online backups are more convenient, as they do not interfere
with the users’ work, but are more complex to handle, as the system needs to
deal with updates carried out during the backup. In all cases, backups can be
stored locally, e.g., on an external device, or remotely, e.g., on a remote server.
Backups for mobile devices can be stored locally on an SD card, or on a personal
computer, or remotely on a server accessible via network connectivity. Several
synchronization protocols have been proposed for mobile devices, including
Microsoft’s ActiveSync, HotSync for Palm OS devices, Pumatech’s Intellisync,
SyncML and CPISync. We refer the interested reader to [20] for a detailed
analysis of these protocols. Google’s Android Sync, Google Sync and Apple’s
MobileMe are examples of applications enforcing data synchronization among
different devices through cloud services. The main problem in the existing mobile backup solutions is that they are usually bound to specific platforms and
vendors. Even SyncML [21], which was launched to provide an open standard
to synchronize devices with different OS, is confined inside the Open Mobile
Alliance companies’ products. Data sharing, like business contacts or calendar
events, among different users is spreading, but currently available solutions
(e.g., VCard [22] via SMS or Bluetooth) are still too complicated to use, as they
require physical proximity and suffer from lack of portability across different
platforms.
2.1 Backup features
In the following subsections we will describe in more detail the main features
of backups; i.e., full vs. incremental backups; file-based vs. device-based
schemes; support for online backups; the use of snapshots and copy-on-write
mechanisms; local and remote storage. All these features will be analyzed both
for the mobile and for the desktop/server environments.
2.1.1 Full backup
The simplest way to protect a file system against disk failures or file corruption
is to copy the entire contents of the file system to a backup device. The resulting
archive is called a full backup. If a file system is later lost due to a disk failure, it
can be reconstructed from the full backup onto a replacement disk. Individual
lost files can also be retrieved. Full backups have two disadvantages: reading
and writing the entire file system is slow, and storing a copy of the file system
consumes significant capacity on the backup medium. Full backup is designed
to allow the entire device to be recovered without any separate installation of the operating
system, application software and data. This approach spares the user
the time needed for a full system recovery, that is, the hours spent rebuilding
the device up to the point where the last data backup can be restored. A full system
backup makes a complete image of the device so that, if needed, it can be copied
back to the device. Restoring the system in such cases requires
specific software, such as Ghost.
2.1.2 Incremental backup
Faster and smaller backups can be achieved using an incremental backup scheme,
which copies only those files that have been created or modified since a previous backup. Incremental schemes reduce the size of backups, since only a
small percentage of files change on a given day. A typical incremental scheme
performs occasional full backups supplemented by frequent incremental backups. Restoring a deleted file or an entire file system is slower in an incremental
backup system; recovery may require consulting a chain of backup files, beginning with the last full backup and applying changes recorded in one or more
incremental backups.
                     MORE <------------------------> LESS
Backup speed         Incremental    Differential    Full
Restore speed        Full           Differential    Incremental
Information saved    Full           Differential    Incremental
Table 2.1: Comparison of backup approaches
2.1.3 Differential backup
A third scheme, between incremental and full backup, is the differential backup:
it performs a full backup and later saves all files
modified since the last full backup. The main difference between the incremental and differential styles is that an incremental backup saves all files that
have been changed since the last backup, whether it was a full, an incremental
or a differential backup, while a differential backup saves all files modified since the last full backup. When restoring a
compromised device, this style is faster than incremental backup but slower than full backup. In the backup phase, the differential approach is faster than full
but slower than incremental. The storage needed to save backup data is
less than for full backup and more than for incremental backup. Table 2.1 summarizes the
comparison between backup and restore techniques. Incremental and differential backups can be considered reverse delta approaches; in these schemes the
backup system stores only the differences between current and previous versions. Such backups start with a full backup and periodically synchronize data with the live copy; data between the live copy and the full backup can be
archived or erased, depending on whether the system wants to allow recovery of intermediate versions. Backup systems using such an approach are rdiff-backup and
Time Machine.
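The selection rules of the three schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_files, the file names and the dates are hypothetical, and modification times are assumed to be reliable.

```python
from datetime import datetime


def select_files(mtimes, scheme, last_full, last_backup):
    """Return the names of the files a backup run would copy.

    mtimes      -- dict mapping file name to last modification time
    scheme      -- "full", "incremental" or "differential"
    last_full   -- time of the last full backup
    last_backup -- time of the last backup of any kind
    """
    if scheme == "full":
        # A full backup copies everything.
        return sorted(mtimes)
    if scheme == "incremental":
        # Changes since the last backup, whatever its type.
        reference = last_backup
    elif scheme == "differential":
        # Changes since the last full backup.
        reference = last_full
    else:
        raise ValueError("unknown scheme: " + scheme)
    return sorted(name for name, mtime in mtimes.items() if mtime > reference)


# Hypothetical example data.
last_full = datetime(2010, 7, 4)
last_backup = datetime(2010, 7, 6)
mtimes = {
    "contacts.db": datetime(2010, 7, 5),
    "calendar.db": datetime(2010, 7, 7),
    "photo.jpg": datetime(2010, 7, 1),
}
for scheme in ("full", "incremental", "differential"):
    print(scheme, select_files(mtimes, scheme, last_full, last_backup))
```

In this example the incremental run copies only the file changed after the last backup, while the differential run also copies the file changed between the last full backup and the last backup, which matches the ordering of Table 2.1.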
2.1.4 File-based vs. device-based
Files are saved on disk in logical blocks, which usually all have the
same size (e.g., 8 KiloBytes). A file in a working system will usually be saved in
blocks which are not contiguous. Backup software can operate either on files
or on physical disk blocks. File-based backup systems understand the structure of files and copy entire files and directories to the storage media. Such an
approach is very effective when one wants to back up or recover a single
file; unfortunately, in huge backup operations on hard disks it is
slowed down by the seek times needed to reach file parts contained in non-contiguous
blocks. A file-based backup scheme also suffers from the problem that even a small
change to a file requires the entire file to be backed up. For small files the problem is negligible, but for multimedia files performance is strongly affected.
On the other hand, device-based backup systems make a low-level copy of the
content of the drive block by block; this improves backup performance on hard
disks, since the backup software performs fewer seek operations. Device-based
backup, if performed with a reverse delta approach, also performs better on
bigger files, since a small modification, even to a big file, costs at most the size of the
modification plus 7 KiloBytes. Unfortunately, this approach complicates and
slows file restores, since files may not be stored contiguously on the backup
medium. Moreover, to allow file recovery, backups must include information
on how files and directories are organized on disk, in order to correlate blocks on the
backup medium with particular files. As a consequence, device-based programs
are usually specific to a particular file system and not easily portable.
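The difference in cost between the two schemes for a small change to a big file can be illustrated with a short Python sketch. It is not a real device-based tool: it works on byte strings rather than on a disk, the block size is the 8 KiloBytes example value used above, and detecting changes by comparing SHA-256 digests is our assumption.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KiloBytes, the example block size used in the text


def block_digests(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one digest per block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return the indexes of the blocks of `new` that differ from `old`.

    Blocks that exist only in `new` count as changed; truncation of the
    data is not handled in this sketch.
    """
    old_digests = block_digests(old, block_size)
    changed = []
    for index, digest in enumerate(block_digests(new, block_size)):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed


# Hypothetical 64 KiloBytes file (8 blocks) in which a single byte changes.
old = bytes(64 * 1024)
new = bytearray(old)
new[20000] = 1  # offset 20000 falls in block 2
new = bytes(new)

file_based_cost = len(new)                                   # whole file is saved again
block_based_cost = len(changed_blocks(old, new)) * BLOCK_SIZE  # only the changed blocks
print(changed_blocks(old, new), file_based_cost, block_based_cost)
```

A file-based scheme saves the whole file again, whereas a block-based reverse delta only saves the blocks touched by the modification.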
2.1.5 Scheduled backup vs. continuous data protection
Backup software can require the file system to be quiescent during backups;
such systems usually perform scheduled backups. Online, or active, backup systems allow users to continue accessing files during backup. These
systems can perform both scheduled backups and continuous data protection. In
continuous data protection, data are continuously saved from the device to the
backup medium in a way that is transparent to the user. Online backup systems offer
higher availability but introduce consistency problems; another problem introduced by such systems is that device resources are consumed by the
backup system, which continuously performs operations in the background. In a server
or desktop environment the resources consumed by the backup system do not affect the usability of the system but, for example, on a mobile device with limited
capability it is really important to save resources for user interaction, and
battery to preserve the device autonomy. By contrast, with scheduled backups, save
operations are performed at given moments (e.g., once a week). This approach
does not guarantee that data are continuously protected, but has the advantage of
being less resource consuming, as operations are performed rarely with respect
to continuous data protection. Another advantage is that operations do not
interfere with the user’s activity, if the backups are scheduled in a smart way. On
mobile devices, for example, operations can be performed when the device is
idle, such as during the night. In the described cases the backup is
performed while the device can run programs, so the file system can be modified
during the backup; this can lead the backup to save files in an inconsistent state.
A possible solution to this problem is to take a “snapshot” of the
filesystem at a consistent point in time and back up the snapshot. A snapshot may need
to be created when the approach is full backup style or on the
first execution of a reverse delta approach; in later executions of an incremental or
differential backup a copy-on-write scheme can be followed. In this scheme,
each time a file is modified the snapshot is updated and kept consistent with
the live copy.
2.1.6 Local backup vs. remote backup
Backup data can be stored in several locations; historically backups were saved
on magnetic tapes labeled both internally and externally to avoid losing backup
data. Unfortunately magnetic tapes are not very reliable, as they are prone to
wear and to loss of magnetic capacity. Currently backup data are stored on hard
disks or other media. A backup can be considered local when the media where
data are saved is locally connected to the device backed up (e.g., a second hard
disk mounted on the same computer where the data to be saved reside). A remote backup is the case when data are saved on another computer; this remote
storage can be an FTP server inside the LAN or a server accessible through the
Internet. When data are saved locally, the backup and restore processes are
faster than with a remote resource, as transmission times, the real bottleneck
in this kind of operation, are avoided. On the other hand, performing backup
operations remotely grants a better level of safety in case of theft: if, for
example, the data of a PC are saved locally on a second hard disk installed in the device, the second hard disk will be stolen with the device. The same problem
can happen with an external hard disk, for example for laptops: if the laptop
is stolen or lost, it is quite likely that the external hard disk is in
the laptop’s bag. If data are saved remotely, it is very unlikely that both
the backed up device and the data container are lost or stolen at the same time.
The remote approach grants a better level of reliability for the user saving the data.
Cloud backup systems [23], [24] are the latest frontier of remote backup; such
systems grant the best reliability, even if backup and restore times are
slowed down by connectivity, which is the bottleneck of remote backup systems.
Unfortunately, with these approaches the final user is locked in to a particular provider; the available cloud solutions need specific software installed
both on the client and on the server. The backup software optimizes backup
operations but reduces portability.
In [25] the authors propose a cloud backup approach based on simple operations
available in every remote storage system:
Get: Given a pathname, retrieve the contents of a file from the server.
Put: Store a complete file on the server with the given pathname.
List: Get the names of files stored on the server.
Delete: Remove the given file from the server, reclaiming its space.
The proposed method moves all critical operations to the clients; the server
must provide just the interfaces to perform the four operations listed above.
Such an approach should ease migration to new, cheaper and more powerful solutions; moreover, the backups could be stored across several providers located
in different geographic areas to increase fault tolerance even in case of natural
disasters. A similar approach can be used to back up data stored on mobile
devices.
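The division of work between a "thin" server and a client holding all the logic can be sketched as follows. The class InMemoryStore is a hypothetical stand-in for a remote provider exposing only the four operations; the functions backup and restore are our illustration of client-side logic and are not the algorithm of [25].

```python
class InMemoryStore:
    """Hypothetical stand-in for a remote provider offering only four operations."""

    def __init__(self):
        self._files = {}

    def get(self, pathname):
        return self._files[pathname]

    def put(self, pathname, content):
        self._files[pathname] = content

    def list(self):
        return sorted(self._files)

    def delete(self, pathname):
        del self._files[pathname]


def backup(store, local_files):
    """Client-side logic: mirror local_files (dict path -> bytes) onto the store."""
    remote = set(store.list())
    for path, content in local_files.items():
        # A real client would compare checksums kept locally instead of
        # downloading the remote copy; get() keeps the sketch short.
        if path not in remote or store.get(path) != content:
            store.put(path, content)
    for path in remote - set(local_files):
        store.delete(path)


def restore(store):
    """Client-side logic: download every file held by the store."""
    return {path: store.get(path) for path in store.list()}


# Storing the same backup on two providers increases fault tolerance.
providers = [InMemoryStore(), InMemoryStore()]
local_files = {"contacts.xml": b"<contacts/>", "sms.xml": b"<sms/>"}
for provider in providers:
    backup(provider, local_files)
print(restore(providers[1]) == local_files)
```

Since the server side needs nothing beyond the four operations, switching provider only requires a different implementation of the store class.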
2.2 Mobile
Data stored on a mobile device are usually critical for the device’s user: when
data are lost, the effort required to recover all the information saved on the device is very high, and sometimes it is impossible to recover all of it.
Moreover mobile devices are liable to be lost or stolen,
even more than laptops and desktop devices; furthermore they suffer from storage
and performance limitations. For these reasons backups are usually performed
on remote devices such as the device owner’s laptop.
2.3 Local backup for mobile device
Following the desktop idea, a local backup should be performed on a storage
medium directly connected to the device, e.g., a memory card. For a mobile device,
storing backups on the user’s laptop can also be considered a local backup; this
kind of storage suffers, more or less, from the same problems as local backups in
laptop environments. Saving data on the device memory card offers a good
level of usability, since the backup can be done transparently for the user: the backup
process can run in the background, saving data when they are modified on the device.
Saving data on the device memory card can be useful in case of migration,
but unfortunately gives no reliability if the device is lost or stolen.
Saving backup data on a laptop increases the reliability of the backup system, but
usability is affected by the need to connect the device to the laptop. Usually
backup from a mobile device to a laptop is performed over Bluetooth or a USB cable connection, which requires the mobile device to be near or connected
to the laptop. For some classes of devices (basically old
devices) connecting to a laptop is not a habitual operation for users, so backups are performed rarely, with the consequence
that the saved data are out of date when a restore is performed. For other
classes of devices, such as Android devices or iPhones, connecting the device to the laptop can
propagate malware infections [26].
2.4 Remote backup for mobile device
Improved 3G and 4G connectivity for mobile devices has
opened up, in recent years, the possibility of performing backups remotely over the
Internet. Such an approach is characterized by the issues and advantages described in
Section 2.1.6 for desktop devices. Due to the reduced hardware capability typical of mobile devices, performance problems are amplified. Furthermore, mobile devices suffer from limited battery autonomy and, for a mobile device, using
connectivity features significantly increases battery consumption. On the
other hand, saving backup data remotely over the network allows
the backup system to run as a background application and transparently keep
backup data up to date with device data. Moreover, saving data remotely can increase the usability of the system: there is no tedious need to connect the device to
a laptop using cables or Bluetooth. Since there is no need for proximity
between device and storage media, the system can perform backups
more frequently when the device is idle, freeing resources when they are needed by the
user. If the backup system storage is based on a cloud architecture, the reliability of the whole system is increased; cloud backup systems allow users
to access their personal data in a secure way from different platforms. A cloud
based approach also frees users from managing their personal backup
files, avoiding potential data loss due to user errors.
[27] proposes a collaborative mobile backup approach based on a peer-to-peer architecture; the approach is interesting but raises several security and
privacy problems. The information stored on mobile devices is usually very
personal. Owners tend to store PIN codes, private messages and other
privacy-sensitive information on their mobile devices. Backups
stored in another peer’s memory could be analyzed or modified by the owner of
that peer, making these data useless or unavailable when
needed.
3 Our approach to backup
Introduction
Several solutions to the backup and restore problem for desktop and
server systems are available. The heterogeneity of mobile environments makes
the backup problem harder. New vendors, operating systems, versions and
devices frequently appear in a changing market, and new solutions are
proposed continuously to solve the problem vertically for each new device/platform launched.
In this chapter we show the new approach we propose to handle backups from heterogeneous mobile devices and grant interoperability with new
devices. The main results obtained by applying the proposed approach have been
recently presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].
Some of the ideas presented in this chapter, and more generally in this
thesis, are being applied in the Telecom Italia CuboVision backup project.
3.1 A new approach to backup & restore
Our approach tries to overcome the limitations in saving and restoring data
from mobile devices by using online remote backups as a uniform interface
for sharing data among different users and multiple platforms. In particular,
we present an online remote backup system based on a Service Oriented Architecture (SOA): the services offered by our solution make it possible to back up and restore
not only files but also more structured data such as contacts, calendar events,
and text messages (SMS). In order to access those services, a mobile device must be equipped with a client capable of retrieving internal data
from the device and sending them to the server via a common interface. This
interface is designed to exploit the common features of mobile data models: e.g., independently of the platform used, a contact in an address book is
always identified by fields such as first name, last name, address, phone numbers, etc.
All communication between the client and the server is based
on an extensible standard language (i.e., XML). The communication format for
each kind of data is detailed in Appendix B.
Thanks to the common data interface, data saved on the server are available to all types of devices, mobile or not, equipped with the client application. This grants interoperability between vendors, platforms, and operating
systems. Using our general data model, backup data can be shared among different users: part of a backup can be shared and is transparently kept
up to date on all the devices that can access the information.
In our architecture (see Figure 3.1), the server provides its services using a
representational state transfer (REST) architecture [29], [30]. For each type of
platform, a different client is implemented: each client connects to the server
using the HTTP protocol to exchange information in XML format. In the following, we describe in more detail the main functionalities implemented by
the server and the clients.
Figure 3.1: Backup and Restore system architecture. Server layers: Persistence (DBMS), Application (Business Logic), Web Services (REST API). Clients reach the server over the Internet through a secure connection, using a common XML format.
<contact>
  <email>[email protected]</email>
  <given_name>NAME</given_name>
  <phone_number_list>
    <phone_number>
      <number>*********</number>
      <type>2</type>
    </phone_number>
    <phone_number>
      <number>********</number>
      <type>1</type>
    </phone_number>
  </phone_number_list>
  <backupItem>
    <timestamp>2010-07-07 12:20:12.997</timestamp>
  </backupItem>
  <status>new</status>
</contact>
Figure 3.2: Example of data model for a contact.
3.1.1 Server
The server has been designed as RESTful: in a REST architecture requests and
responses are built around the transfer of representations of resources. In our
case, a resource is the XML representation of its state, for example a contact
(e.g., in Figure 3.2) or a contact list.
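As an illustration of how a client could produce and read the representation of Figure 3.2, the following Python sketch uses the standard xml.etree.ElementTree module. The element names are those of Figure 3.2; the function names, the e-mail address and the phone numbers are hypothetical, and the meaning of the numeric type codes is not defined here.

```python
import xml.etree.ElementTree as ET


def contact_to_xml(email, given_name, phone_numbers, timestamp, status):
    """Serialize a contact using the element names of Figure 3.2.

    phone_numbers is a list of (number, type_code) pairs.
    """
    contact = ET.Element("contact")
    ET.SubElement(contact, "email").text = email
    ET.SubElement(contact, "given_name").text = given_name
    number_list = ET.SubElement(contact, "phone_number_list")
    for number, type_code in phone_numbers:
        phone = ET.SubElement(number_list, "phone_number")
        ET.SubElement(phone, "number").text = number
        ET.SubElement(phone, "type").text = str(type_code)
    item = ET.SubElement(contact, "backupItem")
    ET.SubElement(item, "timestamp").text = timestamp
    ET.SubElement(contact, "status").text = status
    return ET.tostring(contact, encoding="unicode")


def xml_to_contact(xml_text):
    """Parse the XML back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "email": root.findtext("email"),
        "given_name": root.findtext("given_name"),
        "phone_numbers": [(p.findtext("number"), int(p.findtext("type")))
                          for p in root.iter("phone_number")],
        "timestamp": root.findtext("backupItem/timestamp"),
        "status": root.findtext("status"),
    }


# Hypothetical contact data.
xml_text = contact_to_xml("user@example.com", "NAME",
                          [("0000000001", 2), ("0000000002", 1)],
                          "2010-07-07 12:20:12.997", "new")
print(xml_to_contact(xml_text))
```

Each platform-specific client only has to map its native address book entries to and from such a dictionary.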
A REST architecture is based on the HTTP protocol and uses all the HTTP
facilities, such as the security layer provided by HTTPS, in a transparent way.
The server allows mobile clients to perform full backups and incremental backups. When a user performs a backup, all the user’s data previously stored on
the server are still accessible from the mobile client; old data are kept on the
server, and made accessible to the mobile client, to allow the user to revert to
old backups in case of loss or failure.
https://someserver.com/backup/{backupType}
/device/{imei}/contacts/{contactItemName}
Figure 3.3: Example of a request of a contact.
Our server implementation offers two REST methods: PUT, used to insert
new entries in the server’s database, and GET, which allows the mobile client to
perform queries for a single entry or for entry lists. Figure 3.2 shows a typical
body of a PUT request to the server: the body contains the XML representation of the serialized object (in this specific case the entity saved is a contact
item). Figure 3.3 shows a typical URI of a PUT/GET request (in this specific
case, a request for a contact).
When receiving a GET request at a URI like the one shown in Figure 3.3, the server
answers with the “contactItemName” item of the “imei” device from the “backupType” resource, using the XML shown in Figure 3.2; if instead a PUT request is received, the server expects the HTTP(S) request to carry the contact details
to be processed.
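The two requests can be sketched with the Python standard library as follows. The URI template is the one of Figure 3.3; the function names, the value "full" used for backupType, the IMEI and the Content-Type header are assumptions made for the example. The requests are only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "https://someserver.com/backup"  # host taken from Figure 3.3


def contact_uri(backup_type, imei, contact_item_name):
    """Fill the URI template of Figure 3.3, escaping each path segment."""
    parts = [quote(p, safe="") for p in (backup_type, imei, contact_item_name)]
    return "{}/{}/device/{}/contacts/{}".format(BASE, *parts)


def build_get(backup_type, imei, contact_item_name):
    """Request asking the server for one contact."""
    return Request(contact_uri(backup_type, imei, contact_item_name), method="GET")


def build_put(backup_type, imei, contact_item_name, xml_body):
    """Request sending the XML representation of one contact."""
    return Request(contact_uri(backup_type, imei, contact_item_name),
                   data=xml_body.encode("utf-8"),
                   headers={"Content-Type": "application/xml"},  # assumed header
                   method="PUT")


# Hypothetical values for the three template parameters.
request = build_put("full", "000000000000000", "John Doe", "<contact/>")
print(request.get_method(), request.full_url)
```

Passing such a Request object to urllib.request.urlopen would actually perform the call against a real server.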
3.1.2 Client
The client can be implemented for different types of devices (mobile, desktop,
game console, Internet TV, etc.). The software must be able to access private data residing on the device and to send such data to a remote
server which will store them. Clients must be able to handle HTTP message bodies, get data sent by the server and store them on the device, for
example in the address book, in the device specific format.
Figure 3.4: Example of client server interactions. The messages exchanged are, in order: GET(id_list), id_list, PUT(id_x1), ..., PUT(id_xn).
Usually devices need to be built on purpose to interact with a backup server;
in some cases they need to handle dirty flags in order to manage the status of
the resources to be saved. In our approach, in order to interact with the server,
clients need only to be able to read and write resources to be saved and to
implement just some basic HTTP methods.
To improve performance in incremental backup operations, the client
may handle the list of items to be sent to the server. Figure 3.4 shows a typical
interaction for an incremental backup. First, the client asks for the list of identifiers of the items in the last backup; the server sends the list of identifiers
to the client in XML format, together with the last backup date. At this point, the client
computes its internal list of identifiers and compares the two lists: now the
client knows all the data that have been updated on the device, and can build
the list of modified contents. If the last modification date in the client’s list is more
recent than the one on the server, the client adds the item to the list of data
to back up. Note that our approach ensures compatibility with all old devices
that can run third party applications able to access private data. The restore
process gets the list of items on the server and saves all contents on the client.
If some contents appear on the client but not on the server, such
contents are preserved on the client. In case of migration to a new device, or of a restore after a hard reset, the device is empty, so the device contents after the restore
will be those contained in the last backup.
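The client-side comparison and the restore rule described above can be sketched in Python as follows. This is one possible reading of the procedure: we assume that the server returns a set of identifiers plus a single last backup date, and the function and variable names are hypothetical.

```python
from datetime import datetime


def items_to_backup(client_items, server_ids, last_backup_date):
    """Return the identifiers the client has to PUT in an incremental backup.

    client_items     -- dict mapping item id to its last modification date
    server_ids       -- identifiers returned by the server (GET id_list)
    last_backup_date -- date of the last backup, sent with the id list
    """
    to_send = []
    for item_id, modified in client_items.items():
        # New on the device, or modified after the last backup.
        if item_id not in server_ids or modified > last_backup_date:
            to_send.append(item_id)
    return sorted(to_send)


def restore(client_contents, server_contents):
    """Save all server contents on the client, preserving client-only items."""
    merged = dict(client_contents)
    merged.update(server_contents)
    return merged


# Hypothetical example: c2 was modified after the backup, c3 was never saved.
client_items = {
    "c1": datetime(2010, 7, 1),
    "c2": datetime(2010, 7, 8),
    "c3": datetime(2010, 7, 2),
}
print(items_to_backup(client_items, {"c1", "c2"}, datetime(2010, 7, 7)))
print(restore({"a": "local only", "b": "stale"}, {"b": "saved", "c": "saved"}))
```

On an empty device, as after a migration or a hard reset, restore simply returns the contents of the last backup.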
3.2 Sharing backup data
Following a user-centric idea of the collaborative Web, the proposed approach
for sharing data among different users and different devices can often be useful. It is easy to imagine a community of people willing to share some of their
data within their mobile network. In a closed group of people, such as friends
or work colleagues, some classes of data stored in mobile devices are usually
the same for the entire group. Collaborating people usually share each other’s
mobile phone numbers, emails, calendars, addresses, documents and so on. In
an enterprise scenario, for example, it can be useful for people to share business cards or calendar events contained in their mobile backups with some
selected contacts of their working group. At the same time, in such a collaborative backup it is easier to recover lost data even if these data were not
saved in a personal backup; in fact, in a closed group that collaborates, it is
easy to ask one of the members for data that one member lost and
another still owns. Members of the same social community tend to share more
or less the same data [31, 32]: if a member of the group changes his/her mobile phone number, he/she will have to spread the new number to the whole
network; in the same way, if a new member joins the group, the other members will
have to save his/her contacts in their mobile devices. The approach proposed
here aims at speeding up the sharing of updated information among project
teams, study groups or, more generally, social communities.
3.3 Social network analysis
As a side effect of building a shared backup system, we have data allowing us
to analyze the social network of the users participating in the shared backup. The
available information is not limited to the backup data; we can
access a lot of information that users spread on the web both consciously,
using services where they want to give information about themselves (e.g.,
LinkedIn, MySpace, Flickr), and unconsciously, by signing up to other services,
mailing lists, and so on. Even college web sites sometimes give information
about students, such as matriculation number or even notes. In such a context
we are able to access a lot of information which is meaningless or of little use if not filtered.
We can use shared backup data to filter this information and profile the
user, building his/her social network and crossing it with the
social networks of other users. Once we have built the user’s social network, we
can use it to build workgroups, or use this information for marketing
purposes or Customer Relationship Management. Section 7.2 details how we
can build a social network using data from a mobile device backup and from
the web.
3.4 Security
A backup that allows data sharing, however, can suffer from the same security and
privacy issues present in social networks [18]. As personal data are more affected by privacy issues than common data, both for the interest they arouse and
for the problems a data theft can cause to the user, some additional measures
to grant privacy should be applied. Depending on the size of the sharing
group, privacy issues can be approached differently. In small and medium
groups an administrator can handle permissions and grant users access to data.
For example, in a medium scenario like an enterprise, data sharing can
be monitored by administrators, who can enforce the company privacy policies. For bigger groups, like widespread social communities, privacy cannot
be delegated to security managers or privileged users; each user must prove
his/her ownership of the data he/she wants to share. For example, if a user wants
to share an email contact, the system will send a verification code to this email
address and the user will have to prove ownership by replying to the challenge.
In such an approach privacy issues are more challenging than security ones; a security
layer is provided by deploying secure connections and data encryption. Communication security is provided using HTTP over TLS connections, while data encryption can be transparently done via common DBMS encryption functions.
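The ownership check mentioned above (a verification code sent to the address the user wants to share) can be sketched as follows. The class OwnershipVerifier and its methods are hypothetical; delivery of the code by email is outside the sketch, so start simply returns the code that would be sent.

```python
import hmac
import secrets


class OwnershipVerifier:
    """Hypothetical challenge/response check for a shared email address."""

    def __init__(self):
        self._pending = {}  # email address -> verification code

    def start(self, email):
        """Create a random code; a real system would email it to `email`."""
        code = secrets.token_hex(4)
        self._pending[email] = code
        return code

    def confirm(self, email, reply):
        """Return True only if `reply` matches the code issued for `email`."""
        expected = self._pending.get(email)
        if expected is None:
            return False
        # Constant-time comparison of the two codes.
        if not hmac.compare_digest(expected.encode("utf-8"), reply.encode("utf-8")):
            return False
        del self._pending[email]  # a code can be used only once
        return True


verifier = OwnershipVerifier()
code = verifier.start("user@example.com")
print(verifier.confirm("user@example.com", "wrong-code"))
print(verifier.confirm("user@example.com", code))
```

Only after a successful confirmation would the system allow the contact to be shared.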
For some classes of information (e.g., calendars) time-limited sharing could improve privacy for users who are interested in sharing data with someone just for a limited period. Mobile devices also introduce a new feature applicable to backup and sharing, the geographic position: location-limited sharing could solve the problem of a user who wants to share data just with people in a certain geographic area. Obviously all these solutions should be combined to grant different levels and configurations of privacy settings. Chapter 6 details some results that can be applied to grant a higher level of security to the whole system.
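As an illustration of how time-limited and location-limited sharing could be combined, the following Python sketch checks a sharing rule against the current time and the position of the requesting device. The SharingRule structure and its field names are assumptions made for this example; they are not part of the data model of Chapter 3.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class SharingRule:
    """Hypothetical rule attached to a shared item."""
    not_before: Optional[datetime] = None
    not_after: Optional[datetime] = None
    center: Optional[Tuple[float, float]] = None  # (latitude, longitude), degrees
    radius_km: Optional[float] = None


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def may_access(rule, now, position=None):
    """Grant access only if every limit configured in the rule holds."""
    if rule.not_before is not None and now < rule.not_before:
        return False
    if rule.not_after is not None and now > rule.not_after:
        return False
    if rule.center is not None and rule.radius_km is not None:
        if position is None:
            return False  # a location is required but is unknown
        return distance_km(rule.center, position) <= rule.radius_km
    return True
```

A rule with no limits set grants access unconditionally, so the same check covers plain sharing as well as the restricted cases.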
4 Data extraction
Introduction
The backup process is basically divided into three steps: the first is to get the information from the memory of the device, the second is to save such information in a different store and the third is to restore the information from the backup into the device. Extraction, for mobile devices, is one of the most challenging problems due to the differences between devices and operating systems. In this Chapter we describe how we solved the extraction problem.
We approached the problem in two different ways: a forensic style extraction and a smarter approach based on an extraction performed using the underlying OS’s APIs.
The main results of the application of the forensic approach were presented at the 2008 High Performance Computing & Simulation Conference [33]; the improvements to the extraction methodology were later published in the International Journal of Electronic Security and Digital Forensics [34]. The generalization of the extraction methodology and the testing results on Windows Mobile and Symbian S60 operating systems were published as Chapter 19 of the Handbook of Electronic Security and Digital Forensics [35]. Currently the Italian Carabinieri are experimenting with the MIAT tool (a tool based on the extraction approach presented in this chapter) to forensically extract information from seized devices.
The extraction approach which exploits the operating system’s API has been used to extract information from more powerful devices; this approach was presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].
Different ways for information extraction
Given the plethora of devices, vendors and operating systems present in the mobile market, and as new devices implementing new technologies and giving more and more capabilities to the user are continuously deployed, getting data from the device internal memory and restoring these data should be approached in a different way for each case. This is due to limitations specific to each platform, operating system and even version of the operating system.
In such a scenario the easiest way to approach the device backup is to implement a differential (see Section 2.1.3), file based (see Section 2.1.4), scheduled backup (see Section 2.1.5) with a snapshot approach. Such an approach is really powerful when the user wants to restore the whole device, replacing even configurations and system files as they were at the backup moment. A detailed description of how such an approach can be performed is given in Section 4.1. Its drawback is that the backups are not easily portable from one model to another, and it is impossible to restore a backup performed with this approach to a different vendor’s device.
Another approach is to extract information just from some selected files, those containing personal information such as contacts, calendars, text messages (SMS), multimedia messages (MMS), etc. Such an approach needs to put more logic on the mobile device but allows the server side to manage the data coming from the devices more easily. Filtering and pre-formatting information on the smartphone also allows the server to manage data in a more flexible way: the server can handle backup data independently of the client on which the data were backed up. This approach to backup grants higher interoperability and allows the user to migrate easily from one device, vendor or operating system to another. Moreover, applications can use the facilities given by modern operating systems to access personal information inside the device. This approach is described in further detail in Section 4.2.
4.1 Forensic Style Approach
This section describes how the backup can be performed by iterating recursively over the filesystem and taking a snapshot of the state of the device. Such an approach is close to the forensic technique described in [33], [34], [35], [36]: given the limited capacity of the smartphone internal memory, its content can be copied to the external memory (i.e., the SD card or the MMC). After the internal data, the most critical, have been saved to the external memory, they can be sent to a remote server (described in Section 5.4) via WIFI or 3G connectivity and later elaborated (see Section 5.1), so that the data contained in the snapshot can be used for the restore.
We developed a forensic tool, available for Windows Mobile 5 and 6 and Symbian, to extract data from the memory of a smartphone while guaranteeing that the content of the files is not modified. The tool is able to create a logical dump of the internal memory of the device into the external memory. In some cases the tool modified some files inside the device memory or was not able to save all system files from the internal memory. Luckily these files are not key files in the backup process we propose. Moreover, to perform the extraction forensically the device must in some cases be restarted to allow the application to gain the privileges needed to unlock system files. In the backup case there is no need to access these files, so the device does not need to be restarted and the tool can run in the background on the device.
The external memory does not contain locked files, so its contents can be sent to the server without considerable problems and without the need to perform a snapshot.
4.1.1 Our methodology
The approach we propose focuses on acquiring data from a mobile device’s internal storage memory and copying them to an external removable memory (such as SD, mini SD, etc.). This task is performed without the need to connect the device to a PC. Thanks to this the backup process is really easy for the user: when the device is idle it performs the logical dump, and when the dump is complete it can be sent to a remote server. The complete data extraction process is shown in Figure 4.1.
Figure 4.1: Data collection workflow (flowchart: while more files remain, compute the MD5 hash of the file and try to open it; if it can be opened perform a normal chunked copy, otherwise copy it using specific OS APIs; then check integrity with a second MD5 hash).
The extraction tool spiders the whole mobile device filesystem recursively and hashes each file before and after the copy, to ensure the integrity of the acquired information. The report containing the file hashes is saved in a log file (checksum.xml). The extraction tool also compiles a log file named info.xml with all remarkable events and another log file, errors.xml, summarizing the errors encountered (Table 4.1 shows the log files produced by the extraction tool). Log files are saved in XML format.
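The hash, chunked copy and hash sequence, together with the logging of the result, can be sketched in Python as follows. This is an illustration only: the tool itself is written in C++, and the element and attribute names used for the log (file, md5, intact) are assumptions, since the actual layout of checksum.xml is not reproduced here.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

CHUNK = 64 * 1024  # chunk size chosen arbitrarily for the example


def md5_of(path):
    """MD5 of a file, read in chunks to keep memory usage low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_with_check(src, dst, log_root):
    """Hash the source, copy it in chunks, hash the copy and log the outcome."""
    before = md5_of(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for block in iter(lambda: fin.read(CHUNK), b""):
            fout.write(block)
    after = md5_of(dst)
    entry = ET.SubElement(log_root, "file", name=src,
                          size=str(os.path.getsize(src)))
    ET.SubElement(entry, "md5").text = before
    entry.set("intact", "true" if before == after else "false")
    return before == after


def write_log(log_root, path):
    """Save the collected records as an XML log file."""
    ET.ElementTree(log_root).write(path, encoding="utf-8", xml_declaration=True)
```

A caller would create the root element once, call copy_with_check for every file and finally call write_log to produce the log.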
Data stored in the original memory card can also be acquired using an MMC or SD reader (USB or integrated): binary data are read from the source and stored as an image file representing every single byte, including the file system’s metadata. After that, it is possible to analyse the file allocation table to recover, in some cases, even deleted data.
File            Contents
checksum.xml    File size, file typology, file name, MD5 hash, extraction duration, and creation, access and modification time.
info.xml        Information about the device (IMEI, device ID, platform type, model, manufacturer) and about the extraction process (duration, battery consumption, date of extraction).
errors.xml      Information about errors that may happen during the process.
Table 4.1: Files generated during the Extraction Process
4.1.2 Symbian implementation
An extraction tool for Symbian was developed to support and test the methodology described above. Symbian is an operating system derived from the Epoc operating system; Symbian OS supports a wide range of device categories with several user interfaces, including Nokia S60, UIQ and the NTT DoCoMo common software platform for 3G FOMA™ handsets. The commonality of Symbian OS APIs enables development that targets all of these phone platforms and categories. In order to produce executable code which does not need any other software layer (e.g., a JVM to interpret the bytecode), the application was originally developed in C++, the native language of Symbian OS.
Most relevant files are locked by system processes: many files on the system are always open and locked. For example the file Contacts.cdb, which contains the database of contacts, is locked by PhoneBook, the address book process. In the past ([36]) we made use of the OS Backup service to perform the seizure of locked data. This service is a utility allowing the backup of the memory contents even if they are locked. An application or a service can register itself and the files it locks. The Backup Server notifies a backup request to registered applications, so that they can release the lock temporarily. Once the file has been saved, the application can notify this to the Backup Server and the system process can then re-acquire the lock. In a more recent work ([37]), we adopted an alternative way to get access to locked files, based on the Symbian RFs API method ReadFileSection, which allows a file to be read without opening it. With this method it is possible to seize the entire file system tree, including files which have a persistent lock on them; furthermore this strategy preserves integrity because the access is established in read-only mode, guaranteed by the OS.
Some files and folders are more relevant for the backup case; on Symbian S60 these are:
• Calendar, containing memos, day notes, meetings and anniversaries;
• Contacts.cdb, containing the contacts available from the address book;
• Mail, a folder containing all SMS/MMS/Email files with sender, receiver and body;
• Images, a folder containing all pictures taken by the user or available in the gallery application;
• Video clips, a folder with the user’s video recordings and downloaded or received videos;
• Sound clips, a folder where the system saves the user’s audio recordings, ringtones and received audio files.
4.1.3 Windows Mobile implementation
The tool implementing the methodology described above has also been realized and tested for Windows Mobile 5 and 6. Due to the differences between the two environments, the tool for Windows Mobile is not a port of the Symbian version. Implementing the Windows Mobile version required a design phase of its own, as the problems to be faced were different from those faced while implementing the Symbian version.
PocketPC internal memory and storage architecture
In Windows Mobile 2003 PocketPC and earlier, the device’s memory was split into two sections: a ROM section, containing all operating system core files, and a RAM section holding the user storage (Storage Memory) and the memory space for running applications and their data (Program Memory). The user could choose how much memory to reserve for the Storage Memory and how much for the Program Memory. The RAM chip was built on a volatile memory scheme, so a backup battery was required to keep the RAM circuitry powered up, even if the device was just suspended. If the battery power supply went down, all the user’s data were lost. This scenario forced the user to recharge the battery within a time limit of 72 hours (as mandated by Microsoft to device manufacturers).
Figure 4.2: Windows Mobile 5.0 memory architecture (diagram showing a 64M RAM and a 64M ROM, with Core OS Stuff and User Storage blocks of 32M each). Memory sizes reported could change among different PPC models.
Since Windows Mobile 5, the memory architecture has been redesigned to implement a non-volatile user storage. Currently, the memory is split into two sections (see Figure 4.2): the RAM holds the data of running processes, whereas the ROM keeps core OS code and libraries (called modules), the registry, databases and the user’s files. This memory, also called Persistent Storage and contained within a flash memory chip, can be built using many different technologies [38]:
• XIP model, based on NOR memory and volatile memory: this technology enables the device to store modules and executables in XIP (execute-in-place) format and allows the operating system to run applications directly from ROM, without first copying them into the RAM section. NOR memory has poor write performance.
• Shadow model, which boots the system from NOR and uses NAND for storage. This model is power-expensive, because the volatile memory must be constantly powered on.
• NAND store and download model, which reduces costs by replacing NOR with OTP (one-time programmable) memory.
• Hybrid store and download model, which mixes SRAM and NAND, covering them with a NOR-like access interface (to support the XIP model).
Windows Mobile 5 and above place most of the applications and system data in the Persistent Storage. Core OS files, user’s files, databases and registry are seen by applications and users in the same file system tree, which is held and controlled by the FileSys.exe process. This process is also responsible for handling the Object Store, which maps objects like databases, registry and user’s files into a contiguous heap space. The Object Store’s role is to manage the stack and the heap memory, to compress and expand files, and to integrate ROM-based applications and RAM-based data. For a comprehensive explanation of how Windows Mobile uses the Object Store and manages linear flash memory, see [39] and [40].
The strategy for storing data is based on a transactional model, which ensures that the store is never corrupted by a power down while data are being written. Finally, the Storage Manager manages storage devices and their file systems, offering a high-level layer over storage drivers, partition drivers, file system drivers and file system filters.
Algorithm 1 Extraction
Input: A path p.
Output: none.
for all objects obj (files and directories) in p do
    if obj is a directory then
        Create a directory named p/obj on the SD Card
        Recursively call Extraction(p/obj)
    else if obj is a file then
        Compute MD5 hash of obj
        Copy obj to path p on the SD Card
        if obj has not been copied then
            Access obj with CEDB APIs
            if obj could be accessed then
                Recreate a similar database in path p on the SD Card
            end if
        end if
        Compute MD5 hash of the copied obj on the SD Card
    end if
end for
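For readers who prefer runnable code, Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch and not the C++ implementation of the tool: the fallback argument is a hypothetical callback standing in for the recreation of locked databases through the CEDB APIs, which have no equivalent in a desktop environment.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the MD5 hex digest of the file at `path`."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def extraction(src_dir, dst_dir, fallback=None, report=None):
    """Recursively copy src_dir into dst_dir, hashing every file twice.

    `fallback(src, dst)` is a hypothetical stand-in for the CEDB-based
    recreation of files that a plain copy cannot read; it must return
    True on success. The result maps each source path to the pair
    (hash before the copy, hash of the copy).
    """
    report = {} if report is None else report
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isdir(src):
            extraction(src, dst, fallback, report)
        elif os.path.isfile(src):
            before, copied = None, False
            try:
                before = md5_of(src)
                shutil.copyfile(src, dst)
                copied = True
            except OSError:
                # Plain access failed: try the alternative access path.
                copied = bool(fallback and fallback(src, dst))
            after = md5_of(dst) if copied else None
            report[src] = (before, after)
    return report
```

Entries whose two hashes differ, or whose second hash is missing, correspond to files that were modified or could not be saved.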
Implementation details
We chose to develop the application using a native C++ approach, fulfilling the requirement of having a tool that can be launched from an external memory card, without the need for a pre-installed runtime environment (such as a Java virtual machine) or for installing the tool on the device. The application runs in stand-alone mode and does not require any third-party DLL. Since the tool uses the standard Windows Mobile APIs to access the file system (like Open, Read and Write, CopyFile), we can reasonably expect that these APIs will not change in future versions of the OS, so forward compatibility should be preserved. Algorithm 1 shows the pseudo-code of the seizure process, which starts after the main application has killed all the other non-vital running processes.
The algorithm performs two main tasks:
• the copy task, which copies all the files in the internal memory of the mobile device to the memory card;
• the hash task, which ensures the integrity of the copied files and makes it possible to discover which files have been modified during the seizure process.
The Extraction algorithm works using APIs like CopyFile, Open, Close, and it recursively copies every internal file system entry to the memory card. This task preserves the directory structure, copying files according to their original position. The hash task computes the MD5 hash of each file found in the device internal memory. Hashes are written to a log file saved in a separate directory. The hash task can also be launched as a separate function, in which case it walks the whole filesystem to compute the hash of every file.
The Extraction algorithm invokes the hash function before and after the copy of every single file, making it possible to detect whether changes happened during the copy from the internal filesystem to the Storage Card.
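When the hash task is run as a separate function, its output for the internal memory and for the memory card can be compared to find the files that changed. The following Python sketch illustrates the idea; the function names and the three result labels are chosen for this example only.

```python
import hashlib
import os


def hash_tree(root):
    """Walk `root` and return {relative path: MD5 hex digest} for every file."""
    hashes = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            hashes[os.path.relpath(path, root)] = digest
    return hashes


def compare(before, after):
    """Classify each original file as matching, differing or missing."""
    result = {"match": [], "differs": [], "missing": []}
    for path, digest in sorted(before.items()):
        if path not in after:
            result["missing"].append(path)
        elif after[path] == digest:
            result["match"].append(path)
        else:
            result["differs"].append(path)
    return result
```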
As reported in Section 4.1.3 when discussing the internal memory and storage architecture, Windows Mobile places OS components in many file-like objects in the same file system seen by the user (under the /Windows directory). Most of these files are inaccessible through the standard file system APIs because they are objects in XIP format: most of the headers are removed and the addresses are fixed up so that the programs can run with no need to be loaded into RAM first. The binary has been stripped down and customized for that particular device [41]. Such files are also flagged with file attributes like FILE_ATTRIBUTE_INROM and FILE_ATTRIBUTE_ROMMODULE. Our application skips these files: there is no reason to look for a method to access them because they are firmware modules and could be replaced with new ones only by an advanced user (using the ROM flashing technique, e.g., if she is willing to upgrade her firmware with a new version of the operating system or wants to modify things like the boot splash). Moreover, there is another set of files that cannot be accessed through standard APIs: these files are database objects locked by operating system processes which cannot be killed. We manage to access their data using CEDB APIs and are able to recreate such files on the external memory card. Table 4.2 shows where the most relevant data about user and system are stored in the file system.
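Skipping the ROM files amounts to a bitmask test on the attributes of each entry, as in the following Python sketch. The numeric values of the two flags are assumptions based on the values commonly listed for the Windows CE headers and should be checked against the SDK of the target platform; the function names are hypothetical.

```python
# Assumed flag values; verify them against the platform SDK headers.
FILE_ATTRIBUTE_INROM = 0x0040
FILE_ATTRIBUTE_ROMMODULE = 0x2000
ROM_MASK = FILE_ATTRIBUTE_INROM | FILE_ATTRIBUTE_ROMMODULE


def is_rom_file(attributes):
    """True if the attribute bitmask marks a file stored in ROM."""
    return bool(attributes & ROM_MASK)


def files_to_copy(entries):
    """Keep only the paths whose attributes do not mark them as ROM files.

    `entries` is an iterable of (path, attribute bitmask) pairs.
    """
    return [path for path, attrs in entries if not is_rom_file(attrs)]
```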
Filename           Location                                  Description
System.hv          /Documents And Settings/system.hv         System registry hive.
User.hv            /Documents And Settings/default/user.hv   User registry hive for default user.
Default.vol        /Documents And Settings/default.vol       Object store replacement volume for persistent CEDB databases. This file contains MSN contacts.
Mxip_system.vol,   /                                         Metabase volumes, including language-specific data and storage for notifications.
Mxip_lang.vol,
Mxip_notify.vol,
Mxip_initdb.vol
Cemail.vol         /                                         Default SMS and e-mail storage.
Pim.vol            /                                         Personal Information Manager (PIM) data, such as address book, schedules, SIM entries, call logs.
Table 4.2: Windows Mobile 5.0 relevant files
Experimental results
The Windows Mobile extraction tool has been tested on a physical HTC device and on an emulated one (on a Windows XP computer). The extraction tool saves all the files containing the user’s information to be backed up. We noticed from the hashes that some files had been modified; this is because for some files it was necessary to create a new file and refill it with the original file contents using CEDB APIs. Table 4.3 shows the files that encountered problems in the saving phase; the right column indicates whether the file was copied with a matching hash (√), was copied but differs from the original (?) or was not copied (−). As previously described, the OS’s core files were not saved because these files are just virtual files.
The testing phase was performed on an AMD Athlon64 X2 Dual PC with 1GB of RAM and a QTEK9000 PDA (HTC Universal), equipped with a 2GB Kingston SD card.
File                                       Consistency
/Documents And Settings/default.vol        ?
/Documents And Settings/system.hv          −
/Documents And Settings/default/user.hv    −
/Windows/*.dll                             −
/mxip_notify.vol                           ?
/cemail.vol                                ?
/mxip_system.vol                           √
/mxip_lang.vol                             √
/pim.vol                                   √
− file not copied
? file copied but its hash does not match
√ file copied and hash matches
Table 4.3: Extraction tool consistency analysis
4.1.4 Some remarks on this approach
This section has discussed a methodology to extract data from a smartphone by recursively copying the content of the internal memory filesystem to the external memory. To prove the effectiveness of the solution two prototypes have been implemented, one for Symbian S60 (Figure 4.3 (a)) and another for Windows Mobile 5 and 6 (Figure 4.3 (b)).
Figure 4.3: (a) Symbian S60 tool’s screenshot, (b) Windows Mobile tool’s screenshot.
The prototypes have been tested on a set of real devices, and the results of the testing show that the solution is able to extract the internal device files containing the user’s personal information and settings. The application could certainly be improved to support more recent devices, such as the brand new Windows Mobile 7.
Unfortunately this approach is not sufficient to obtain an interoperable backup and restore system; the logical dump can be, and has been, used to restore devices of the same vendor and model as the one from which the dump was extracted. The logical dump can be analyzed using the methodology proposed in Chapter 5 to extract interesting data, which makes it possible to abstract from the specific device and focus on data.
4.2 Selection of interesting data
Better interoperability between devices from different vendors can be granted by delegating the extraction and part of the analysis of data to the mobile client. In our approach the application focuses on how data are structured in the device memory rather than on the internal system structure. The application is installed on the mobile device and acts as a client that filters the personal data and configurations present on the device, formats them following the common format proposed in Chapter 3 and sends that information to a remote server, which interprets the format and saves the data into a common database.
All smartphone operating systems provide APIs to access the internal databases containing personal data such as address books, calendars, notes and messages (SMS, MMS, emails); such APIs can be used to collect the data to be sent to a server to perform a remote backup (see Section 2.1.6) of the mobile device.
Unfortunately these APIs are usually full featured only when the application is developed in the operating system’s native programming language (i.e., for iOS, Objective-C; for Android, the Dalvik Virtual Machine Java interface; for Symbian, Symbian C++; for Windows Mobile, Visual C++ or .NET). Portable source code such as J2ME cannot access some contents or has writing limits for some others. Moreover, a Java virtual machine is not available for all operating systems (e.g., iOS) or has a different implementation (e.g., Android’s Dalvik), and J2ME code is not fully portable. Another problem of implementing mobile applications in non-native languages is the execution speed and resource consumption due to the virtual machine overhead.
We therefore had to weigh the limitations of developing a more or less portable client in J2ME against the difficulty of implementing a client for each operating system in its specific native programming language. The better approach is the second, which with a little more development effort offers a more stable, better performing and more powerful backup client application.
The implemented applications retrieve data from the internal databases of the mobile device. These data are sent, using the proposed data model, to the REST web services provided by our backup server.
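The client-side step of formatting the extracted items and addressing them to a REST service can be sketched in Python as follows. Everything specific in this example is an assumption made for illustration: the JSON layout, the field names and the URL scheme are hypothetical and do not reproduce the data model of Chapter 3 or the interface of our backup server.

```python
import json
import urllib.request


def build_backup_request(base_url, device_id, data_type, items):
    """Build (without sending) an HTTPS POST carrying one batch of items.

    The payload layout and the URL scheme are hypothetical.
    """
    payload = {"device": device_id, "type": data_type, "items": items}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{data_type}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request object can then be passed to urllib.request.urlopen to perform the upload over the TLS connection.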
Our data model allows different OSs to communicate; in particular we describe how it is possible to back up data on a Symbian S60 device, store them in a remote server and then restore them on an Android 2.1 device. We chose to implement our clients first on Symbian and Android to show how older and newer devices can easily cooperate with our approach.
4.2.1 Symbian
We realized the backup and restore client for Symbian with a basic user interface, just to show that collaboration was possible. The Symbian Socket framework was used to establish a TLS connection with the server. Symbian’s CActive class allows long-running tasks to be performed in the background and asynchronous communication with the server to be realized; this behaviour is similar to that of Android’s AsyncTask class. Asynchronous communication between client and server ensures that the application can run in the background while the user continues using the device.
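The pattern shared by CActive and AsyncTask, a long-running task executed in the background with a callback invoked on completion, can be illustrated in Python with a thread pool. This is only an analogy under our own naming (run_in_background is a hypothetical helper), not a description of either framework’s API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_background(task, on_done, executor):
    """Submit `task` to the executor and call `on_done(result)` when it ends."""
    future = executor.submit(task)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future
```

The caller regains control immediately after the submission, which is what lets the user keep using the device while the backup runs.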
To access data it was necessary, for each data type, to open a session with the respective server, which manages the communication with the underlying databases or files:
• to extract address book data from Contacts.cdb, the CContactDatabase class has been used. This class gives access to all the contacts databases;
• to handle calendar data a client-server session is necessary: a CCalSession object must be used;
• to get access to messages (SMS, MMS and emails) it is necessary to establish a communication channel with the Message Server through the CMsvSession::OpenSyncL() method;
• all the other files present in the multimedia folders (see Section 4.1.2), such as pictures or videos, can be accessed directly as files using the approach described in Section 4.1 and sent to the server.
4.2.2 Android
On Android we developed a complete prototype, with a user interface that allows the user to choose the type of data to back up (i.e., contacts, calendars, files, SMS, application settings) and the type of backup (full or incremental). Before restoring, it is possible to select the backup to restore. Backup and restore tasks run as asynchronous background tasks using the AsyncTask class provided by the framework itself, which makes it possible to notify the UI thread of results without the need for specific handlers. HTTP requests and responses are managed using the well known HTTP Client of Apache’s Jakarta Commons project. To extract data it was necessary to bypass Android’s access policies. Each Android application has its own sandbox, which other applications cannot enter unless they explicitly declare some permissions. It is possible to access an application’s private data only if the application provides a Content Provider, which gives uniform access to the private data of applications. So the private data of the contact, calendar, SMS and media file applications have been accessed through the respective content providers. Calendar data have been accessed directly from the SQLite database. Each application stores its own persistent settings in an XML file contained in the shared_prefs private directory; there is no way to access such information if the application does not implement a Content Provider. This limitation was overcome by elevating the access permissions of the backup application to root (http://www.koushikdutta.com).
Data Type         msec      units   msec/unit
contact           81315     150     542
SMS               80877     170     476
calendar event    5544      14      396
file              278980    3       92993
Table 4.4: Time overhead of the backup operation per data type
4.3 Performances
The system developed has been tested on an HTC Legend device connected to a 54Mbps WIFI network over a secure HTTPS channel. We aimed at measuring the time overhead introduced by our system, and thus we measured the time needed to execute single backup functions.
Table 4.4 shows the times needed to back up a commonly used smartphone, with 150 contacts, 170 text messages, 14 calendar events and 3 files of size 104.796 KB, 5.659 KB and 161.166 KB. Clearly, the most expensive operations are those on files; to save the 3 files, the application needs about 279 seconds, which is about 62% of the total time needed for the backup. The total overhead to perform a full backup of the device amounts to 447 seconds (about 7 minutes), preserving the usability for real use cases. The most common operations are incremental backups; hence the last column of Table 4.4 reports the time needed per unit.
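The derived figures can be recomputed directly from the measurements of Table 4.4, as in the short Python sketch below. The variable names are ours; the input values are those of the table.

```python
# Data type: (total milliseconds, number of units), as in Table 4.4.
TABLE_4_4 = {
    "contact": (81315, 150),
    "SMS": (80877, 170),
    "calendar event": (5544, 14),
    "file": (278980, 3),
}

# Time per unit, rounded to the nearest millisecond (last column of the table).
per_unit = {kind: round(ms / n) for kind, (ms, n) in TABLE_4_4.items()}

# Total time of a full backup and fraction of it spent on files.
total_ms = sum(ms for ms, _ in TABLE_4_4.values())
file_share = TABLE_4_4["file"][0] / total_ms
```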
4.4 Concluding remarks
In this chapter two different approaches to, and implementations of, backup systems for mobile devices have been described. These approaches must be combined to realize a powerful backup system implementing a differential, resource based, scheduled backup with a snapshot approach.
With mobile devices as the backup target, for some classes of data the backup can be performed online, monitoring frequently updated resources such as the address book and the calendar, and keeping the data most important to the device’s user up to date.
By combining the two approaches described with the proposed data model, the resulting backup system takes advantage of the first approach to maintain a snapshot of the system’s status for a fast restore; moreover the first approach can be used to handle resources such as multimedia files and non structured databases. The second approach is really powerful for handling structured data. This kind of information is sent over the network using the proposed data model. The server takes advantage of data “formatted” using the data model to build an interoperable data structure accessible from all classes of mobile devices and mobile operating systems.
Even if we use the first approach to save the data residing on the device to the external memory and send them to the backup server, personal information is contained in the backup in raw format. In the first part of Chapter 5 the reader will find a methodology for extracting personal data from raw backup files.
5 Data elaboration
Introduction
The second step to be performed in a backup process is to save data to a location different from the one being backed up. This phase can be performed in several ways and, obviously, the results obtained will differ. The most useful approach is not to back up the device as is, with all the system files, but to get only the information useful to the user. Mobile device operating systems can usually be restored using some key combination or specific commands. For example on Symbian devices typing *#7780# resets the device without erasing the user’s data, while *#7370# deep resets the device, also erasing user data. Saving system data is completely useless; on the other hand the user is interested in restoring his/her personal data, i.e., contacts, messages, calendars, files, etc.
The first part of this Chapter shows how personal data can be extracted from the raw backup of a mobile device; the extraction is performed using the methodology we proposed at the International Conference on Ultra Modern Telecommunications 2009 [42].
In the second part of the chapter we describe how data can be managed both on the device and on a server in a smarter way, using the approach published at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].
Personal data are extracted directly on the device, using the second approach described in Chapter 4, or from a raw backup, using the approach described in Section 5.1, and are saved in a common database to grant interoperability between different devices.
Data are contained inside smartphones as files. These files can be accessed in several ways: the first approach described in Chapter 4 handles each file in the same way, whether it contains a picture or the address book. The second approach, exploiting the operating system’s API whenever possible, focuses on the data contained in the databases rather than on the files containing the data. These two methodologies approach the problem in very different ways, and obviously the second approach cannot be used to extract logical data, such as contacts, from a backup stored using the first approach. Information contained in some files backed up using the first approach, such as Contacts for Symbian, is useless unless the files are processed to extract the interesting data.
Files containing data stored in a smartphone can be divided mainly into two classes:
Non structured files are files such as multimedia files, text files or PDF documents saved on the device during its usage; these files are accessible and can be restored on other devices without any further processing;
Structured files are usually databases containing information used by device applications such as the address book, calendar or text messaging. Saving the information contained within a structured file is more important than saving the file itself, as it enables the backup system to store such information in a common data structure, allowing different devices to interoperate (see Section 4.4).
5.1 Remote elaboration
After extracting the file system's logical dump from a smartphone (see the first approach described in Section 4.1), the dump is sent to the backup server described in Section 5.4. When the server receives the logical dump from the client, it needs a method to decode the personal data stored within several mobile DBMS files and to make them available to other applications. Such DBMS files contain both current and obsolete data, i.e., old or deleted entities; this occurs because the mobile OS, for performance reasons, defers deletion as long as possible, e.g., until the free space available in the file system is no longer sufficient. It could be useful for a backup system to recover erased information even if it has never been backed up. Unfortunately, deleted information is not accessible using the DBMS APIs provided by manufacturers (when available). Therefore we chose a Data Reverse Engineering (DRE) approach to retrieve and decode the storage format. In traditional architectures (PCs and mainframes), DRE was studied as a business solution either for controlling data handled by legacy applications or for reconstructing deteriorated data. The models developed are too generic for mobile environments [43], or they aim mainly at discovering the data model [44], [45], [46], or they have been designed to address vertical problems such as extracting data from COBOL, DB/2 [47] or Access. For our purposes, we are not interested in discovering the data model, because we know a priori which data we are looking for (e.g., all the user-controllable data attributes such as a contact's name and surname or the SMS text), and we do not care about the relational structure. Moreover, a great advantage of a methodological DRE application is that, when file formats change, re-applying the methodology lets us update our knowledge about how data are stored.
In this section we propose a methodology that allows smartphone DRE operators to be more flexible in acquiring knowledge of mobile file formats. As a matter of fact, the mobile phone environment is composed of a plethora of manufacturers and operating systems, each of which is released in several versions that store data in different formats. Handling such heterogeneity through a methodological approach is an important asset, as it allows the system to decode the databases of different platforms.
The DRE methodology was originally proposed to solve mobile forensic problems caused by the lack of standards; we applied it to the backup case with success. As a case study we applied these methods to the Symbian OS and obtained several results, including the mapping between a given piece of data and its location in the file system, the recovery of obsolete data, and the reverse engineering of the format of the Symbian personal databases. The results obtained (see Section 5.3) show that our methodology can be successfully applied to environments other than the forensic one it started from. The methodology helps to decode database files and to develop ad-hoc parsers; data extracted by such parsers can be easily converted and used to perform tasks such as backup, user profiling, device syncing and data recovery.
A flow chart of the methodology is shown in Figure 5.1.
[Figure 5.1: The methodology flow. Flow chart connecting Stage 0 (Choice of the objective), Stage 1 (Files of interest identification), Stage 2 (Data hypothesis and entities injection), Stage 3 (Sequences similarity discovery), Stage 4 (Data interpretation), Stage 5 (Meta-format building), Stage 6 (Error correction), Stage 7 (Parser building) and Stage 8 (Testing & debugging), with the yes/no decision nodes "Goal reached?", "Is it sufficient to modify the hypothesis?" and "Objective reached?".]
5.2 Our step-by-step Methodology
Smartphone operating systems save personal data in many DBMS tables, which are stored in binary files. Often the format of such files is not public, and the tools available to read them rely on a native operating system API (if they run on the device) or on a port of its code (if they run on a PC); when such tools are available, they cannot retrieve deleted or modified data. Therefore, a solution is to interpret the binary file directly in order to give a structure to the internal data. Initially the problem was addressed by comparing multiple files of the same type, relying on the analyst's ability to interpret the data content intuitively. In this way the analysis of data was often confused and led to redundant operations without any result. Therefore, in order to preserve obsolete data, we chose to design a methodology for binary file interpretation that is able to decode the required information without performing redundant operations. Furthermore, the methodology helps to retrieve data alterations and deletions.
Our main contribution is a wisdom-driven DRE methodological approach to decoding a smartphone's personal data, which are stored in several DBMS-managed files. With the contribution of this chapter we provide the tools to reach the following targets:
• understand where information is stored in the mobile device's file system;
• retrieve and decode current and obsolete personal data;
• develop a suitable parser.
Stage 0 aims at choosing which kind of information (the objective) we want to find and decode. An objective is composed of one or more goals. We may think of an objective as an entity (e.g., a contact, a call log, or an SMS) composed of one or more fields (e.g., for a contact, the first name, the last name, the phone number, etc.), which are the goals of our objective. Stage 1 aims at identifying which files (files of interest) could contain the data (our goal) we wish to decode. With Stage 1, the methodology enters an iterative process that allows us to understand the binary format of the data by comparing different versions of it.
In Stage 2 some assumptions about the data type are made. Such assumptions guide the choice of sample instances of entities to be inserted into the device's databases. Instances are stored as records, which are contained in one or more binary files. If required, the hypotheses made in Stage 2 will be refined in Stage 6 and the instances may change. The number of instances inserted determines the number of comparisons among binary records, which affects the precision of the next Stage. Stage 3 deals with formatting the content of the binary files, in order to make the data instances inserted in Stage 2 identifiable and comparable. Usually, we try to group similar zones within the same sample binary file, and among different sample binary files, and then proceed to the interpretation.
Formatting must take into account the data interpreted successfully in previous iterations, in order to exclude them (i.e., data already analyzed) from the study of a new format. Stage 4 comprises two sub-tasks: the first identifies candidate byte sequences, and the second decodes them. The identification of candidate byte sequences is performed by removing all the sequences that do not match the hypotheses of Stage 2. The second task tries to find the connection between the data inserted in Stage 2 (the instances) and their binary representation. As depicted in Figure 5.1, the methodology iterates through Stages 1, 2, 3, 4, and 6 (error correction) until a goal is reached, i.e., the information about the format of an entity's field is exhaustive and a mapping between the field and its binary storage format has been found. Stage 5 simply records all the mapping information found in a meta-format. If joining all the meta-formats found allows the decoding of the entire objective (the information needed) identified in Stage 0, the methodology goes to Stage 7. At this Stage a piece of software able to automatically decode the now-exposed file format is designed and implemented: all the knowledge collected about the format turns into a set of software requirements. This process must be repeated for each file marked as a file of interest. The resulting piece of software is tested in Stage 8.
In the following sections we describe each Stage of the methodology.
5.2.1 Stage 0: Choice of the objective
Before starting, we must choose the data from which to start the decoding process. We define as the objective the type of personal data (e.g., contacts, SMS, email, calendar, events log, etc.) that we want to find in the device's file system and whose binary format we want to decode. An objective can be seen as the set of “atomic” goals that must be completed in order to reach it. For instance, in order to decode the contacts (the objective), after having detected in which file (or files) they are stored, we have to find how the contact's data elements (goals) are encoded. Such goals are attributes such as name, surname, mobile phone number, e-mail, street address, etc.
Definition 1 Let an objective Γ be the set containing the goals we want to reach:
$\Gamma = \{\gamma_1, \ldots, \gamma_n\}$
In this Stage we can only define a rough approximation of Γ: thanks to the information about the objective's data format that we will learn progressively in the next Stages, we will be able to refine Γ with more accurate goals.
5.2.2 Stage 1: Files of interest identification
Given the objective chosen in the previous Stage, this step aims at identifying the files to be analyzed and decoded in the next Stages. Mobile devices save personal data in database files stored persistently in the file system. To identify the files containing the information we are looking for, we first need to cause many changes inside these files in order to make them identifiable. These changes are objective-dependent: if we are looking for contacts, we will generate activity such as contact insertion; if we are looking for the events log, we will make calls, change the SIM, and send and receive SMS. Each of these operations generates an entity ($E$), which will be stored as one or more records in the file system. Each entity $E$ is a set composed of $m \in \mathbb{N}$ attributes ($\varepsilon$).
Definition 2 For each goal $\gamma_i \in \Gamma$ there is a set of attributes $\varepsilon_j \in E$ such that, after discovering the encoding of each $\varepsilon_j$ in the set, the goal $\gamma_i$ will be reached.
Definition 3 We define Ω as the sequence $\{E_1, \ldots, E_n\}$ of entities we have to insert in the device in order to modify all the files involved in the given objective.
The value of $n$ depends on the objective's type and on how its entities are stored. Therefore $n$ can only be estimated when the process starts, and it can be refined over the methodology's iterations if needed.
For instance, let $E$ be a contact's card: each $\varepsilon_i \in E$ will be an attribute such as name, surname, date of birth, phone number, email address, and so on.
As a best practice, every attribute $\varepsilon_i \in E$ should be filled in, in order to modify all possible files involved in Γ.
Definition 4 Let A be the fileset (in our case, the whole device's file system) before performing the Ω operation set on the device. Let B be the fileset after performing the Ω operation set. The application T of the operation set Ω on the device is:
$T_\Omega : A \to B$
[Figure 5.2: The format of the Ω operations sequence, in an example with contacts discovery as the objective. $\Omega = \{E_1, E_2, \ldots, E_n\}$; each entity lists its attributes $\varepsilon_1, \ldots, \varepsilon_m$, e.g., $E_1$ = (John, Brown, +123423456, [email protected], ..., some_info) and $E_2$ = (Peter, White, +19280023, [email protected], ..., some_info).]
Definition 5 Let diff denote the function which computes the differences between two filesets. The fileset C, which contains only the files modified by the application T, is:
$C = \mathrm{diff}(B, A)$
C may contain garbage data, since other operations may occur while the user performs T. We must therefore “clean” C, searching for and deleting all irrelevant data.
Definition 6 Let clean denote the function which cleans a fileset of garbage data. The fileset Φ is:
$\Phi = \mathrm{clean}(C)$
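Definitions 4 to 6 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a fileset is modelled as a mapping from file paths to content digests, the helper names `snapshot`, `diff` and `clean` mirror the definitions but are not taken from any implementation, and the garbage predicate is hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Model a fileset: map each file path (relative to root) to a SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff(b, a):
    """Return C: the files that are new in B or whose content differs from A."""
    return {path for path, digest in b.items() if a.get(path) != digest}


def clean(c, is_garbage):
    """Return Phi: C without the files that the predicate flags as garbage."""
    return {path for path in c if not is_garbage(path)}


# Filesets before (A) and after (B) inserting the entities; digests are made up.
A = {"Contacts.cdb": "h1", "Logdbu.dat": "h2"}
B = {"Contacts.cdb": "h3", "Logdbu.dat": "h2", "cache.tmp": "h4"}
C = diff(B, A)
PHI = clean(C, lambda path: path.endswith(".tmp"))  # hypothetical garbage rule
```

In practice A and B would be obtained by calling `snapshot` on two extracted dumps, and the garbage rule would have to be chosen by the operator for each platform.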
5.2.3 Stage 2: Data hypotheses and entities injection
After the insertion of the Ω entities, the Φ set tells us which files have been modified, but it still gives us no information about how the $\varepsilon_i$ are encoded in the storage.
In Stage 2 we will perform three tasks:
1. We make assumptions about the possible format of $\varepsilon_i$. The Λ set represents the collection of assumptions made at this Stage. Λ is composed of assumptions about data type, size and predictability. The latter indicates whether we can control the value of $\varepsilon_i$. The possible values of predictability are the following:
• controllable: the attribute corresponds to an input field and the user can fully control its value. An important property, for the application of the methodology, is that controllable attributes can be stored more than once in the device, and the corresponding byte sequence is always the same. In the contacts case, controllable attributes are input fields such as name, surname, phone number, etc. If we guess the right type and size, we will be able to predict the binary (hexadecimal) version of the data.
• uncontrollable: the attribute does not correspond to any input field and the user is prevented from handling its value; there is no way to predict the binary version of the data. In the contacts case, the contact's ID is an uncontrollable attribute, because it is transparently assigned by the system.
• pseudo-controllable: the attribute does not correspond to any input field and the user is prevented from handling its value, but its binary version is partially predictable. For instance, if we store two contacts on the same day, the year/month/day part of the insertion date (the 6 most significant bytes, for an 8-byte date format) will be the same for both of them.
2. Once the assumptions at the previous point have been made, we generate a set $\Omega'$ of sample entities which have all attributes but the i-th set to NULL:
$$\Omega' = \left\{ \begin{pmatrix} \varepsilon_1 = \text{NULL} \\ \vdots \\ \varepsilon_i = v_1 \\ \vdots \\ \varepsilon_m = \text{NULL} \end{pmatrix}, \ldots, \begin{pmatrix} \varepsilon_1 = \text{NULL} \\ \vdots \\ \varepsilon_i = v_k \\ \vdots \\ \varepsilon_m = \text{NULL} \end{pmatrix} \right\}$$
where $|\Omega'| = k$, $\varepsilon_i = v_a \in \{v_1, \ldots, v_k\}$, and $\varepsilon_j = \text{NULL}\ \forall j \neq i$. The values $v_a$ are chosen so that they can be easily identified among all the file bytes in the next Stages. A good choice of the $v_a$ values critically influences the subsequent steps; in the early iterations of the methodology, the $v_a$ should be chosen so that, arranged in the $\Omega'$ entity sequence, they follow a periodic repetitive pattern (e.g., AABB, ABAB, AAAA, etc.). Thanks to this approach, in the next Stages we will be able to retrieve them through pattern similarity matching, avoiding ambiguities in the modified file zones caused by insertion side effects.
3. Finally, we insert the $\Omega'$ entities into the device through an application $T_{\Omega'}$, and then perform a new file system dump in order to analyze the files generated by the insertion performed in the previous task.
The output of this Stage is Λ and $\Phi'$, the set composed of all the files containing $\Omega'$ entities.
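As an illustration of the second task, the following Python sketch builds a sample entity sequence in which only one attribute is set and its values follow a repetitive pattern such as AABB. The function name, the attribute names and the sample values are our own assumptions, and `None` stands in for NULL.

```python
def build_sample_entities(attributes, target, values, pattern):
    """Build the sample sequence: one entity per pattern symbol, with every
    attribute set to None (NULL) except `target`, which takes the value
    associated with that symbol."""
    if target not in attributes:
        raise ValueError(f"unknown attribute: {target}")
    entities = []
    for symbol in pattern:
        entity = {name: None for name in attributes}
        entity[target] = values[symbol]
        entities.append(entity)
    return entities


# Hypothetical contact attributes; only "name" is filled, following AABB.
attributes = ["name", "surname", "phone", "email"]
omega_prime = build_sample_entities(
    attributes, target="name", values={"A": "John", "B": "Peter"}, pattern="AABB"
)
```

Inserting these four contacts should leave two pairs of identical byte sequences in the files of interest, which is what the next Stage looks for.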
5.2.4 Stage 3: Sequences similarity discovery
The goal of this Stage is to take the $\Phi'$ fileset, which contains the sample entities, and to find all the byte sequences that present the same similarities as the attributes of the $\Omega'$ entity set inserted in Stage 2. In the previous Stage, we injected entities that shared one or more attributes. The attributes of the entities were injected following a pattern, such as pairs of calls with the same duration or contacts with the same fields. In this Stage we have to highlight the byte sequences of the file that are equal to one another. In the call duration example, if we made c pairs of calls with the same duration, we will find c pairs of equal byte sequences in the events log file. Therefore, if the assumptions of the previous Stage were correct, the current step simplifies the interpretation tasks of the next Stages by reducing the file's complexity.
The Stage 3 process iterates through the following steps:
1. Discard the file zones which are not directly affected by the operations in Stage 2;
2. Identify attribute separation flags;
3. Identify, highlight and separate similar byte sequences.
Figure 5.3 shows an example in which we format the event log file to detect the storage format of the voice call duration. In the previous Stage we made pairs of calls of the same duration (Figure 5.3a). All the useless information (metadata, indexes and tables) was discarded and similar zones were looked for in accordance with the methodology described.
Once similar zones are identified (Figure 5.3b), they have to be formatted in the same way to enable the next Stage to refine the identification and to understand which file parts were changed after the insertion of the $\Omega'$ entities. We must separate the user-added information from the other file data (Figure 5.3c). A good file formatting is obtained by isolating different file zones from similar ones, and then by isolating flags.
[Figure 5.3 (hexadecimal dumps): An example of a DBMS binary file before and after Stage 3. (a) The sample file after making pairs of calls of the same duration (Stage 2). (b) Equal sequences highlighted. (c) The formatted file $\hat{\Phi}'$.]
The output of this Stage is $\hat{\Phi}'$, the formatted fileset.
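The third step of this Stage (identifying similar byte sequences) can be approximated with a sliding window. The sketch below is a simplification of our own: it only finds exact repetitions of a fixed size, whereas the operator also relies on flags and on visual formatting. The function name and the sample bytes are illustrative and do not come from a real event log.

```python
from collections import defaultdict


def repeated_sequences(data, size, min_count=2):
    """Return {sequence: [offsets]} for every `size`-byte window that occurs
    at least `min_count` times in `data` (overlapping windows included)."""
    offsets = defaultdict(list)
    for i in range(len(data) - size + 1):
        offsets[data[i:i + size]].append(i)
    return {seq: pos for seq, pos in offsets.items() if len(pos) >= min_count}


# Made-up record fragment in which two 2-byte values each appear twice.
sample = bytes.fromhex("1b39aa1b39bb4639cc4639")
for seq, pos in repeated_sequences(sample, size=2).items():
    print(seq.hex(" "), "at offsets", pos)
```

The window size plays the role of the assumed data size in Λ: a size that is too small produces many accidental matches, while one that is too large may produce none.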
5.2.5 Stage 4: Data interpretation
Stage 4 is composed of two steps: the candidate sequence identification and the candidate sequence interpretation.
Definition 7 The candidate sequences are sequences of bytes, stored in the $\hat{\Phi}'$ fileset, in which we are likely to find the data we are looking for. $\Sigma_{\Gamma,\Lambda}$ is the set of candidate sequences for a given objective Γ under a given assumption Λ.
The candidate sequence identification relies on the hypotheses about the attribute data properties made in Stage 2, and it simplifies the sequence by deleting all non-relevant data. In particular:
• If the data is constant, it is always stored in the same format, so the formatted files containing the data can be simplified by removing all the bytes that differ; if the data's size is equal to the size in the assumptions made, such data is added to $\Sigma_{\Gamma,\Lambda}$;
• If the data is variable, the storage format will probably always be different, so all the equal parts of the formatted files can be removed to simplify them. If the data size is equal to the size in the assumptions, such data is added to $\Sigma_{\Gamma,\Lambda}$;
• If the data is pseudo-variable, the storage format will be partially constant and partially variable; we have to look for the constant parts of the file and then examine the neighbourhood of the constant zone, in an area whose size equals that of the hypothesis. The sequence is then added to $\Sigma_{\Gamma,\Lambda}$.
If $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is large (an unmanageable quantity), in order to reduce the number of resulting candidate sequences we have to analyse the results and understand how to change the Λ assumptions made in Stage 2 (through Stage 6). Once the assumptions are modified and the new $\Omega'$ entities are inserted in the device (reiteration through Stages 2, 3 and 4), the precision of this Stage will improve.
When $|\Sigma_{\Gamma,\Lambda}|$ reaches a manageable size, the candidate sequence interpretation task can start. In this step we consider Λ to better understand which part of the candidate sequence represents the data we are interested in. We look at the $\Omega'$ sequence of operations and check whether it matches in the candidate sequence set. If the sequence of expected attribute values in $\Omega'$ is the same as in $\Sigma_{\Gamma,\Lambda}$, the sequence is ready to be interpreted. As the database files are usually in hexadecimal format and the target data are in a different format (e.g., string, decimal), it is necessary to transform the data into a common format (e.g., decimal).
The last step is to compare the data contained in the database with the data inserted in the $\Omega'$ entity sequence; if they match, the storage format is saved and the next Stage starts.
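For controllable attributes, the comparison between inserted values and stored bytes can be partly automated by encoding each inserted value under several type, size and byte-order hypotheses and searching for the result. The following Python sketch is an assumption-laden illustration: the list of hypotheses is ours and far from exhaustive, and the helper names are hypothetical.

```python
import struct


def candidate_encodings(value):
    """Yield (hypothesis, bytes) pairs for a controllable value."""
    if isinstance(value, str):
        for codec in ("ascii", "utf-8", "utf-16-le", "utf-16-be"):
            try:
                yield codec, value.encode(codec)
            except UnicodeEncodeError:
                continue
    elif isinstance(value, int):
        formats = (("uint8", "B"), ("uint16-le", "<H"), ("uint16-be", ">H"),
                   ("uint32-le", "<I"), ("uint32-be", ">I"))
        for label, fmt in formats:
            try:
                yield label, struct.pack(fmt, value)
            except struct.error:  # value does not fit in this size
                continue


def locate(data, value):
    """Return [(hypothesis, offset)] for every encoding of `value` found in `data`."""
    hits = []
    for hypothesis, encoded in candidate_encodings(value):
        start = data.find(encoded)
        while start != -1:
            hits.append((hypothesis, start))
            start = data.find(encoded, start + 1)
    return hits
```

A value that is found under several hypotheses at once is exactly the kind of ambiguity that requires a further iteration with different sample values.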
5.2.6 Stage 5: Meta-format building
After the data decoding in Stage 4, we need to store the information collected in an intermediate format. This Stage should be seen as a “methodology intermediate status saving”, which helps the operator to choose the next goal γ to process, and to refine it if required. Before compiling the meta-format, this Stage requires the compilation of a “formats table”. Such a table lists the data discovered at Stage 4, and for each piece of data the following metadata are shown:
Field Name A text placeholder associated with the data. This label replaces the data value in $\hat{\Phi}'$, in order to make its retrieval easier.
Size The size of the data, expressed in bytes.
Description Other information useful to the parser building Stage, such as which information is held by the field, the type of data, the endianness, suggestions for the automatic data localization, etc.
Example An example value of the field.
Each piece of discovered data needs a row in the table. An example of a formats table is shown in Figure 5.4a.
Field Name   Size            Description      Example
ID           4               Int, Bigend      B6 03 00 00
NAME LEN     1               Int, Littleend   0E
NAME         (NAME LEN)/2    String           43 6C 61 75 64
...          ...             ...              ...
(a) A table with pseudo data type, got as output by Stage 4.

B6 03 00 00
0E
43 6C 61 75 64 69 61
0A
44 72 61 67 6F
10
55 6E 69 72 6F 6D 61 32
09 13 00 10
(b) The meta-format file before Stage 5.

ID
NAME LENGTH
NAME
SURNAME LENGTH
SURNAME
COMPANY NAME LENGTH
COMPANY NAME
CXF1
(c) The meta-format file after Stage 5.
Figure 5.4: These three figures depict an example of the application of Stage 5 to a file containing the phone's address book.
After compiling the formats table, the meta-format file will be equivalent to the sample binary file purged of non-relevant bytes. Data such as headers, indexes, etc., can be deleted if they are not relevant to the objective. The first step is to identify, for each entry in the table, the values with which the data appears in the meta-format file (Figure 5.4b) and to replace them with the related labels in the table (Figure 5.4c). In this way all relevant data in the meta-format file will be replaced by placeholders that will be easily detected at the parser building Stage.
The example shown in Figure 5.4 takes into account a contacts file containing two records with the following fields: name, surname and company.
After this Stage, the given binary file can be automatically interpreted if all the following conditions are satisfied:
1. The meta-format's data and values not yet identified have a static size, so they can be ignored: in this case the parser is able to skip them automatically;
2. All required meta-format's data and values are identified;
3. The meta-format file and the formats table are stable, i.e., the zones identified in the meta-format did not change at all after different hypotheses of $\Omega'$ were tried.
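The label replacement of Stage 5 can be sketched as follows, using the bytes of Figure 5.4b. We assume, from the example values, that each length byte is twice the number of stored bytes (0E, i.e. 14, precedes the 7 bytes of the name); the table encoding, the function name and the label spellings are our own choices for illustration.

```python
RECORD = bytes.fromhex(
    "B6030000" "0E" "436C6175646961" "0A" "447261676F"
    "10" "556E69726F6D6132" "09130010"
)

# (label, size): an int is a fixed size in bytes; a string names a previously
# read length field, whose value is halved to obtain the size (assumption).
FORMATS_TABLE = [
    ("ID", 4),
    ("NAME LEN", 1), ("NAME", "NAME LEN"),
    ("SURNAME LEN", 1), ("SURNAME", "SURNAME LEN"),
    ("COMPANY NAME LEN", 1), ("COMPANY NAME", "COMPANY NAME LEN"),
    ("CXF1", 4),
]


def label_record(record, table):
    """Split `record` into (label, bytes) pairs following the formats table."""
    fields, offset, lengths = [], 0, {}
    for label, size in table:
        if isinstance(size, str):
            size = lengths[size] // 2
        chunk = record[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"record too short for field {label}")
        if label.endswith("LEN"):
            lengths[label] = chunk[0]
        fields.append((label, chunk))
        offset += size
    return fields


for label, chunk in label_record(RECORD, FORMATS_TABLE):
    print(f"{label:17} {chunk.hex(' ').upper()}")
```

Keeping the table as data rather than code makes it easy to update when a further iteration of the methodology changes a size or adds a field.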
5.2.7 Stage 6: Error correction
This Stage is performed if the current $\gamma_i$ was not reached (e.g., Stage 4 was unable to find a correct interpretation of the representation of $\varepsilon_i$), in which case it is mandatory to re-iterate the methodology. The error leading to this Stage can have two causes. In the following list we show, for each of them, the actions to be performed in the next iteration:
1. $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is high (an unmanageable quantity): if there are no candidate sequences or there are too many, some backtracking needs to be performed to obtain a manageable number of candidate sequences. The following actions may be useful:
(a) Changing the assumed data size. This implies reformatting $\Phi'$, building up a new $\hat{\Phi}'$. If $\Sigma_{\Gamma,\Lambda} = \emptyset$ and we are looking for matching sequences, we need to decrease the size, since two different big sequences might contain two matching smaller sequences. On the other hand, if we are looking for non-matching sequences, the size needs to be increased. When $|\Sigma_{\Gamma,\Lambda}|$ is high, we need to increase the size if we are looking for matching sequences, and decrease it for non-matching sequences.
(b) Modifying $\Omega'$ by adding or deleting entities, or by changing the $\varepsilon_i$ values. A new $\Omega'$ could give more accurate results. The changes should be made according to the operator's intuition; this is the hardest part of the whole process, and the operator's skills play the leading role.
(c) Verifying the correctness of $\Phi'$. Verify that the file we are looking into is the right one (the required information may reside in another file).
2. The interpretation of the candidate sequences did not decode any information about the storage format:
(a) Changing the assumed data size.
(b) Modifying $\Omega'$. If there was an ambiguity among different candidate sequences, modify $\Omega'$ in order to restrict the change to fewer bytes;
(c) Changing the data type. Changing the data type might help the decoding from hex.
If none of the above cases applies, or the suggested changes did not lead to a correct data interpretation, we need to review the current $\gamma_i$ goal in Stage 1. Each γ reached reduces the space of assumptions we are free to choose from to build Λ (and $\Omega'$ as a consequence) for the other γ.
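The size-adjustment rules of case 1(a) form a small decision table, which can be written down as a function. This is only a restatement of the rules above; the threshold that separates a manageable from an unmanageable number of candidates is a hypothetical parameter, since the text leaves it to the operator.

```python
def suggest_size_change(candidate_count, looking_for_matching, too_many=50):
    """Return 'decrease', 'increase' or 'keep' for the assumed data size,
    following the rules of Stage 6, case 1(a).

    `too_many` is an illustrative threshold, not a value from the methodology.
    """
    if candidate_count == 0:
        return "decrease" if looking_for_matching else "increase"
    if candidate_count > too_many:
        return "increase" if looking_for_matching else "decrease"
    return "keep"
```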
5.2.8 Stage 7: Parser building
This Stage takes as input all the knowledge collected about the given binary file format. The operator should be able to write a program that reads data from the logical dump of the smartphone and converts them into an XML format. It is mandatory to implement a quality monitor that measures the number of entries in which the parser encounters problems. The ratio $r = F/T$ between the number of failures ($F$) and the total number of entries ($T$) indicates whether additional iterations of the methodology are needed. The threshold below which $r$ is acceptable depends on the required accuracy.
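A minimal sketch of such a quality monitor is shown below. It assumes that the parser signals a problem by raising `ValueError` on the offending entry; this convention, the function name and the sample entries are ours.

```python
def parse_with_quality(entries, parse_entry):
    """Parse every entry and count failures; return (results, r) with r = F / T."""
    results, failures = [], 0
    for entry in entries:
        try:
            results.append(parse_entry(entry))
        except ValueError:  # includes UnicodeDecodeError
            failures += 1
    total = len(entries)
    ratio = failures / total if total else 0.0
    return results, ratio


# Hypothetical entries: the second one is not valid ASCII and counts as a failure.
entries = [b"John", b"\xff\xfe", b"Anna", b"Lee"]
names, r = parse_with_quality(entries, lambda raw: raw.decode("ascii"))
```

Comparing `r` with the threshold chosen for the required accuracy then tells the operator whether another iteration is needed.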
5.2.9 Stage 8: Testing and debugging
In this phase the parser produced in the previous Stage is applied to several logical dumps, in order to test and debug it on real cases. In this Stage the $r$ values of the current parser are verified, and it is established whether the precision of the implementation is sufficient.
Case Study     Information     Detailed Information
Logdbu.dat     Event Log       SMS previews, MMSs, e-mails, calls, video calls, PRSConnection, SIM/MC change.
Calendar       Memo            Daynotes, meetings, anniversaries
Contacts.cdb   Contacts        Contacts information
Mail folder    SMS/MMS/Email   Sender, receiver and body
Table 5.1: Symbian files of interest
5.3 Remote elaboration results
In order to verify and refine the Stages of the methodology, we took the Symbian S60 operating system as a case study. Applying the methodology produced the results shown in this section.
Files of interest - Stage 1 helped us to find the list of files containing SMS, MMS, contacts, and all the user's personal data, which are shown in Table 5.1.
Symbian personal data file format - Thanks to the methodology we have been able to reverse engineer the Symbian S60 DBMS file format. We applied the methodology to the contacts list, to the calendar, to the text/multimedia messages and to the phone's event log (which contains calls, sent and received SMS/MMS previews and SD card and SIM changes). The complete format is explained in Appendix A.
Obsolete data - Among the information identified and retrieved in the case study, we were able to find obsolete data which had not been purged from the file system. The DBMS resource optimization strategy, in fact, reduces the high cost of the DB's modify/delete operations by flagging the affected records as “obsolete”: the actual modify/delete operations are scheduled as late as possible, and the circumstances in which they are performed vary depending on the kind of file. For instance, in the Symbian case, in Contacts.cdb the deleting operations are performed when the Compress() syscall is invoked. Operating system tasks and third-party software alike can invoke this function, and they can find out whether or not to perform the compression by invoking CompressRequired() (see [48]). Let S be the total disk space, F the free disk space, and W the amount of disk space wasted; the boolean function returns true if:
$$(W > 64K) \lor (W > 16K \land W > \tfrac{1}{2}S) \lor (W > 16K \land F < \tfrac{1}{20}S) \lor (W > 16K \land F < 16K)$$
After a compression is performed, the contacts are rearranged, the space wasted by obsolete records is recovered and there is no way to recover obsolete data. If the extraction operation occurs before the compression is invoked, we will find a database file that contains all the data since the last compression. For the case studies related to the contacts, calendar and event log, enough information was decoded to reconstruct the owner's communication history. In the case study of messages (SMS, MMS and emails, stored in the /System/Mail folder) we were not able to find erased data, because the OS purges deleted messages immediately to optimize the available storage.
Unexpected information - Some data attributes are not controllable by the user, i.e., she cannot insert them into the system explicitly, so we were not aware of their presence. During Stage 4, the nature of our methodology helped us to retrieve such “hidden” information, such as the record's ID and its creation date. This important result reinforces the effectiveness of the methodology, since it is able to detect more goals than those identified in Stage 0. In our case study, some unexpected information helped us to better understand the data model, and thus the application's behaviour.
We applied the methodology to more than 50 device dumps. The first dumps we studied came from Nokia N70 devices¹, but we realized that our knowledge of the S60 format was still incomplete, since the parser was unable to decode the dump of an older phone (Nokia 7610). After applying a few iterations of the methodology, we built a parser able to interpret the new format.
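The CompressRequired() condition reported in the obsolete data results can be transcribed as a small Python predicate, which is convenient for estimating whether a given dump is likely to still contain obsolete records. The sketch reads the two fractions in the condition as S/2 and S/20 and takes K to be 1024 bytes; both readings are our assumptions, and the function is a transcription of the formula, not Symbian code.

```python
K = 1024  # assumption: "K" in the condition means 1024 bytes


def compress_required(total, free, wasted):
    """Evaluate the condition for S=total, F=free and W=wasted, all in bytes."""
    return (
        wasted > 64 * K
        or (wasted > 16 * K and wasted > total / 2)
        or (wasted > 16 * K and free < total / 20)
        or (wasted > 16 * K and free < 16 * K)
    )


# Example: a 10 MB disk with 100 KB free and 20 KB wasted triggers the third clause.
needs_compression = compress_required(total=10 * 1024 * K, free=100 * K, wasted=20 * K)
```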
5.4 Local elaboration
Local elaboration requires less work from the server part of the system. Unlike in the remote elaboration case, most of the local elaboration is performed on the mobile client (see Section 4.2).
In this case the server side of the system must provide the clients with the API to communicate, save and restore backup data. Following the cloud paradigm, these APIs are provided as web services. A set of REST web services has been implemented; we chose the REST architectural style because we wanted to exploit the facilities of the HTTP protocol. HTTP makes the system scalable and easy to maintain, and provides a secure transmission level (HTTPS) without implementation effort. Above all, the REST architectural style suits our purpose because the server's tasks can be performed using PUT and GET requests.
Figure 5.5 shows the server architecture. We designed a three-level architecture following the Model-View-Controller (MVC) pattern [49]. The top (view) layer contains the interface with clients. The interface has been realized using the RESTlet framework [50], [51]. REST web services are exposed through the Apache Tomcat [52] application server and accessed via standard HTTP(S) GET/PUT/POST/DELETE methods. Data are sent over the network as XML; object representations are serialized and deserialized via the XStream library.
¹ Equipped with Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3
[Figure 5.5: The architecture of the backup server. View: Restlet with Apache Tomcat integration, XStream XML (de)serialization, HTTP GET/POST/PUT/DELETE. Control: business logic. Model: ORM (Active Objects), DAOs (Java POJO), MySQL.]
The control layer, shown in the center of the figure, implements the business logic of the backup system. The business logic does not only contain functions to handle the data to be saved into the database; this layer also contains the parsers implemented as a result of applying the methodology proposed in Section 5.2.
The model level has been developed using the Active Objects ORM [53], which allowed us to interact with the MySQL database directly using standard Java objects (POJO) [54].
The server provides a REST API to perform full and incremental backups, with an interface to back up and restore contacts, calendars, text messages, multimedia messages, emails, and application and system settings. To access the backup and restore services the client must authenticate with a username and password.
On the first interaction between client and server, the server expects a full backup and stores all the data sent by the client in the database. In every later interaction, during the bootstrap phase of the communication, the server sends the list of files saved on the server, which represents the last version of the information. The client then creates the list of files to be sent and, using the proper methods, stores, updates or deletes files with respect to the last version of the backup.
To create the list of files to be backed up, the client uses the MD5 hash and the last modification date of the files, processed using the first method described in Chapter 4. For the contents of the databases, the client uses the list given by the server, which contains the modification date of each entry, extracts the new, updated and deleted data, and creates the lists to be processed to synchronize client and server.
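The file comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout of the server list and the rule "a changed date triggers a hash check" are hypothetical and are not taken from the thesis or from the protocol in Appendix B.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 digest of a file's content, as a hex string."""
    return hashlib.md5(data).hexdigest()


def plan_sync(server_index, local_files):
    """Split files into store/update/delete lists.

    server_index: path -> (md5_hex, last_modified), as sent by the server
                  (hypothetical layout).
    local_files:  path -> (content_bytes, last_modified) on the client.
    """
    to_store, to_update, to_delete = [], [], []
    for path, (content, mtime) in local_files.items():
        if path not in server_index:
            to_store.append(path)            # never backed up before
            continue
        server_md5, server_mtime = server_index[path]
        # Assumed rule: hash only when the modification date changed.
        if mtime != server_mtime and md5_hex(content) != server_md5:
            to_update.append(path)
    for path in server_index:
        if path not in local_files:
            to_delete.append(path)           # removed on the device
    return {
        "store": sorted(to_store),
        "update": sorted(to_update),
        "delete": sorted(to_delete),
    }
```

Checking the date first avoids hashing files that were not touched, which matters on a device with limited CPU and battery; whether the real client combines the two values this way is not stated in the text.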
A typical interaction between client and server starts with the insertion on the server, through the /backup/{backupType}/device/{imei} method, of a backup item. In this way the server can identify the type of backup performed from the backupType parameter and the device from the imei parameter. Identifying the device is fundamental when the user holds more than one device and uses the system to synchronize them. The server answers with the list of resources composing the last backup.
Subsequently the client sends all data using the proper methods and the server updates the date of the last backup performed. The operation is straightforward and similar for all kinds of data. The full XML communication protocol is detailed in Appendix B.
6 Protecting saved data
Introduction
Personal data are probably the most valuable asset of a user in today’s world; it has been said that “data is the new oil” [55]. This information needs to be kept safe and accessible only to people explicitly authorized by the data owner. This can be achieved using authentication, security and privacy techniques, which are usually based on cryptography. Unfortunately cryptography adds considerable overhead to the operations performed on data and, even if mobile devices are becoming more powerful, they still suffer from performance and battery life problems. Cryptographic operations affect both, because they require the device to execute more operations to achieve the same task.
In this chapter we show a novel key agreement algorithm based on the matrix conjugation method, which we presented at the 2010 SECRYPT International Conference on Security and Cryptography [56]. The algorithm has been implemented in J2ME and tested on real mobile devices. We also show the results of some performance tests, presented in [57], that compare a new encryption algorithm with standard ones.
At the end of the chapter we present a framework to securely manage inter-process communication under Android. The framework is detailed in Grillo’s PhD thesis [58] and was presented at the 2nd International ICST Conference on Mobile Computing, Applications, and Services [59].
6.1 Key agreement algorithm
In many cases a key agreement is needed before private data can be exchanged under a specific encryption algorithm. Examples of cryptography on mobile devices are [60], in which elliptic curves are used efficiently, and [61], [62], concerning trusted text messaging. All these works focus more on the encryption and signing part than on key agreement, but a key agreement phase is of course needed before encrypting or signing. In this section we present a JavaME implementation of a new key agreement protocol (a particular case of a class recently proposed in [63]) and compare the performance of our implementation [57] against the standard and Elliptic Curve Diffie-Hellman protocols [64].
In Section 6.1.1 we explain the mathematical problem underlying the key agreement, together with some considerations on possible attacks and on why these attacks are not effective against this algorithm. In Section 6.1.2 the implementation choices are presented, analyzing why they optimize performance without affecting security. In Section 6.1.3 we describe the testing methodology, explaining each step of the testing phase. Section 6.1.4 shows the results of the testing phase, and Section 6.1.5 analyzes them in more detail. Section 6.1.6 summarizes all results, proposing possible improvements and applications of the algorithm.
6.1.1 Mathematical setting: key agreement protocol
We consider $\mathcal{M} = GL(d, \mathbb{Z}_p)$, where $p$ is a prime number. Fix $G \in \mathcal{M}$ and let $\varphi_G$ be the conjugation isomorphism associated to $G$:
$$\varphi_G : \mathcal{M} \ni M \mapsto \varphi_G(M) = G M G^{-1} \in \mathcal{M}$$
The following public key agreement between Alice (A) and Bob (B) (see [63] for a more general setting) exploits the property $[\varphi_G(A)]^n = \varphi_G(A^n)$.
1. A and B share $Q, S \in \mathcal{M}$, with $SQ \neq QS$ and $\det(Q) = |Q| = 1$.
2. A chooses two numbers $x_A, n_A \in \mathbb{N}$.
3. A computes $M_A = S^{n_A} Q^{x_A} S^{-n_A}$ and sends it to B.
4. B receives the matrix $M_A$ from A.
5. B chooses two numbers $x_B, n_B \in \mathbb{N}$, computes $M_B = S^{n_B} Q^{x_B} S^{-n_B}$ and sends $M_B$ to A.
6. A computes $M_{AB} = S^{n_A} M_B^{x_A} S^{-n_A} = S^{n_A} (S^{n_B} Q^{x_B x_A} S^{-n_B}) S^{-n_A}$.
7. B computes $M_{BA} = S^{n_B} M_A^{x_B} S^{-n_B} = S^{n_B} (S^{n_A} Q^{x_A x_B} S^{-n_A}) S^{-n_B}$.
At the end A and B share the common matrix $M_{AB} = M_{BA}$, which represents the Secret Shared Key (SSK). In fact,
$$M_{AB} = S^{n_A + n_B} Q^{x_B x_A} S^{-(n_A + n_B)} = S^{n_B + n_A} Q^{x_A x_B} S^{-(n_B + n_A)} = S^{n_B} M_A^{x_B} S^{-n_B} = M_{BA}$$
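The seven steps can be replayed on toy parameters. The sketch below is not the J2ME implementation discussed later; the helper names, the 2x2 matrices, the prime 101 and the private exponents are all illustrative assumptions, far too small for real use.

```python
def mat_mul(A, B, p):
    """Product of two square matrices over Z_p."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]


def mat_pow(A, e, p):
    """A**e over Z_p by square-and-multiply (e >= 0)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    base = [[v % p for v in row] for row in A]
    while e > 0:
        if e & 1:
            result = mat_mul(result, base, p)
        base = mat_mul(base, base, p)
        e >>= 1
    return result


def mat_inv(A, p):
    """Inverse over Z_p by Gauss-Jordan elimination (p prime)."""
    n = len(A)
    aug = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular modulo p")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)          # modular inverse, Python 3.8+
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conjugate(S, M, n, x, p):
    """Return S^n M^x S^-n over Z_p (used in steps 3, 5, 6 and 7)."""
    Sn = mat_pow(S, n, p)
    return mat_mul(mat_mul(Sn, mat_pow(M, x, p), p), mat_inv(Sn, p), p)


P = 101                      # toy prime
Q = [[1, 1], [0, 1]]         # det(Q) = 1
S = [[2, 0], [1, 3]]         # does not commute with Q
N_A, X_A = 5, 7              # Alice's private pair
N_B, X_B = 11, 3             # Bob's private pair

M_A = conjugate(S, Q, N_A, X_A, P)        # step 3
M_B = conjugate(S, Q, N_B, X_B, P)        # step 5
K_ALICE = conjugate(S, M_B, N_A, X_A, P)  # step 6
K_BOB = conjugate(S, M_A, N_B, X_B, P)    # step 7
```

Both parties end with the same matrix because powers of $S$ commute with each other, so the two conjugations can be applied in either order.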
[Figure 6.1: Key Agreement process using conjugate. Alice holds the private pair $(n_A, x_A)$ and Bob the private pair $(n_B, x_B)$; the public data $(d, p, Q, S)$ and the exchanged matrices $M_A$ and $M_B$ travel over an insecure channel observed by Eve. Both sides obtain $M_{AB} = S^{n_A+n_B} Q^{x_B x_A} S^{-(n_B+n_A)} = S^{n_B+n_A} Q^{x_A x_B} S^{-(n_A+n_B)} = M_{BA}$.]
Note that if $|Q| \neq 1$, a possible eavesdropper Eve (E) could set up a discrete logarithm problem by considering the determinantal equation [65]
$$|M_A| = |S^{n_A} Q^{x_A} S^{-n_A}| = |S^{n_A}|\,|Q^{x_A}|\,|S^{-n_A}| = |S|^{n_A} |Q|^{x_A} |S|^{-n_A} = |Q|^{x_A}$$
with $\det(Q)$ known. If E can solve this scalar discrete logarithm problem, thus recovering $x_A$, then she can easily find, by solving a linear problem and adjusting the free parameters entering the solution, a polynomial $X$ in the matrix $S$ of degree $\leq d$, with coefficients in $\mathbb{Z}_p$, such that $M_A X = X Q^{x_A}$.
Using this, E can compute
$$X M_B^{x_A} X^{-1} = (X S^{n_B} X^{-1})(X Q^{x_A} X^{-1})^{x_B}(X S^{-n_B} X^{-1}) = S^{n_B} M_A^{x_B} S^{-n_B} = M_{AB}$$
because $X$ commutes with $S$. In conclusion: if $\det(Q) \neq 1$, then the breaking complexity of the algorithm is essentially equivalent to the breaking complexity of a (discrete) logarithm in $\mathbb{Z}_p$, i.e., to that of (scalar) Diffie-Hellman. With $\det(Q) = 1$ (see step 1 of the agreement process), this “attack” cannot be performed.
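The determinantal equation can be checked numerically on a toy instance. The sketch below only illustrates the first stage of the attack (recovering $x_A$ from the determinant); the 2x2 helpers, the prime 101 and the matrices are hypothetical choices made for this example.

```python
P = 101  # toy prime, for illustration only


def mul2(A, B):
    """2x2 matrix product over Z_P."""
    return [[(A[i][0] * B[0][j] + A[i][1] * B[1][j]) % P for j in range(2)]
            for i in range(2)]


def pow2(A, e):
    """2x2 matrix power over Z_P by square-and-multiply."""
    result = [[1, 0], [0, 1]]
    while e > 0:
        if e & 1:
            result = mul2(result, A)
        A = mul2(A, A)
        e >>= 1
    return result


def det2(A):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % P


def inv2(A):
    """Inverse of a 2x2 matrix over Z_P via the adjugate."""
    d_inv = pow(det2(A), -1, P)
    return [[(A[1][1] * d_inv) % P, (-A[0][1] * d_inv) % P],
            [(-A[1][0] * d_inv) % P, (A[0][0] * d_inv) % P]]


def public_value(S, Q, n, x):
    """M = S^n Q^x S^-n, the matrix sent over the channel."""
    Sn = pow2(S, n)
    return mul2(mul2(Sn, pow2(Q, x)), inv2(Sn))


def scalar_dlog(g, target):
    """Brute-force discrete logarithm in Z_P (feasible only for toy P)."""
    for x in range(1, P):
        if pow(g, x, P) == target:
            return x
    return None


S = [[2, 0], [1, 3]]
Q_WEAK = [[2, 1], [0, 3]]    # det = 6, violates the requirement det(Q) = 1
Q_SAFE = [[1, 1], [0, 1]]    # det = 1
N_A, X_A = 5, 7              # Alice's private pair

leaked = det2(public_value(S, Q_WEAK, N_A, X_A))   # equals det(Q)^x_A
recovered = scalar_dlog(det2(Q_WEAK), leaked)      # Eve's guess for x_A
safe_det = det2(public_value(S, Q_SAFE, N_A, X_A)) # always 1, leaks nothing
```

With `Q_WEAK` the determinant of the public matrix depends only on $x_A$, not on $n_A$, which is what lets Eve reduce the problem to a scalar discrete logarithm; with `Q_SAFE` the determinant is constant.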
Figure 6.1 shows the agreement process performed by the algorithm. E could intercept $S$, $Q$, $d$, $p$, $M_A$ and $M_B$. In order to recover the private keys (e.g., $n_A$ and $x_A$), she could set up the following equation
$$M_A = S^{n_A} Q^{x_A} S^{-n_A} = (S^{n_A} Q S^{-n_A})^{x_A}$$
but this is much more difficult than a usual matrix discrete logarithm problem (DLP), as the base matrix is unknown. Other identities, such as
$$M_A S^{n_A} = S^{n_A} Q^{x_A}$$
are difficult to exploit because $S^{n_A}$ and $Q^{x_A}$ are not known separately.
We have that $\#\mathcal{M} = \prod_{i=0}^{d-1} (p^d - p^i)$. Let $o(M)$ be the order of a matrix $M \in \mathcal{M}$, i.e., the smallest positive integer such that $M^{o(M)} = 1$. In order to avoid useless computations, it is sufficient to choose $n_A, n_B < o(S)$ (resp. $x_A, x_B < o(Q)$).
The order of a matrix $M \in \mathcal{M}$ is in general difficult to compute, but an upper bound for it can be found as follows. For each $M \in \mathcal{M}$ let $p_M(x) = \prod_{i=1}^{k} f_i(x)^{d_i}$ be its characteristic polynomial factorized in $\mathbb{Z}[x]$, with $\alpha = \max\{d_i \mid i = 1, \ldots, k\}$. An upper bound (multiple) $m(M)$ for its multiplicative order $o(M)$ is given by the following formula [66]
$$m(M) = \operatorname{lcm}(p^{d_1} - 1, \ldots, p^{d_k} - 1) \cdot p^{\lceil \log_p(\alpha) \rceil}$$
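For very small parameters both quantities can be computed directly. The sketch below evaluates the group size formula and finds the order of a matrix by brute force; it does not implement the bound $m(M)$ of [66], and the function names and the example matrix are assumptions made for illustration.

```python
def gl_order(d, p):
    """Number of invertible d x d matrices over Z_p: prod_{i<d} (p^d - p^i)."""
    order = 1
    for i in range(d):
        order *= p**d - p**i
    return order


def matrix_order(M, p):
    """Smallest k >= 1 with M^k = identity over Z_p.

    Brute force: only usable for toy sizes, and M must be invertible
    modulo p, otherwise the loop never terminates.
    """
    n = len(M)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    current = [[v % p for v in row] for row in M]
    k = 1
    while current != identity:
        current = [[sum(current[i][t] * M[t][j] for t in range(n)) % p
                    for j in range(n)] for i in range(n)]
        k += 1
    return k


Q = [[1, 1], [0, 1]]
ORDER_Q = matrix_order(Q, 5)     # o(Q) in GL(2, Z_5)
GROUP_SIZE = gl_order(2, 5)      # #GL(2, Z_5)
```

Since the order of an element divides the size of the group, `GROUP_SIZE % ORDER_Q` is zero; exponents at or above `ORDER_Q` only repeat earlier powers, which is why the text restricts $x_A, x_B < o(Q)$.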
6.1.2 J2ME implementation
The operations described above to perform the key agreement have been implemented in Java Micro Edition (J2ME). We chose this programming language because we need a suite that can run on different hardware platforms and operating systems. Moreover, a good performance evaluation can be obtained by comparing our implementation of the key agreement algorithm with Bouncy Castle’s implementation of the Elliptic Curve and standard Diffie-Hellman key agreement algorithms.
Bouncy Castle provides a plethora of APIs performing different cryptographic operations, implemented in Java, J2ME and C#; we used its J2ME implementations of the Elliptic Curve Diffie-Hellman (ECDH) and the standard Diffie-Hellman (DH) key agreement to perform the comparison. The first step in implementing the algorithm described in Section 6.1.1 is to implement the modular operations on matrices (e.g., modular matrix multiplication, power, inversion, conjugation and other ancillary operations).
In a mobile environment it is very important to optimize every step of every operation with respect to resource consumption: on small capacity devices every waste of resources implies a delay larger than the one the same waste would cause on more powerful devices. Because of the shortage of RAM, CPU and storage capacity, operations need to be optimized as much as possible.
To perform the operations described in Section 6.1.1 we would use a 32-bit unsigned integer data type. Unfortunately, Java and J2ME provide no unsigned integer data type; to solve this problem there are two possible approaches:
1. use a bigger data type, such as the 64-bit signed long integer, simulating a 32-bit size by applying the modulus when the value exceeds $2^{32}$;
2. use the available 32-bit signed integer, combining it with arithmetical operations modulo $2^{31}$.
We have chosen the latter solution, i.e., to develop the modular matrix as an integer array (int[ ]) with modulo $2^{31}$. This data structure is, in our opinion, the best compromise between RAM consumption and the CPU usage due to the operations needed to perform a task. The security of the key agreement is not affected by using 31-bit integers, while performance is compromised if one uses 64-bit signed integers to simulate 32-bit unsigned integers. Using long integers, the RAM consumption doubles and the system’s performance, in our opinion, degrades too much to justify the slight improvement in security.
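The effect of this choice on key size and memory can be worked out with simple arithmetic. The sketch below assumes that each matrix entry carries 31 bits and that the matrix is stored as a flat array of 4-byte (int) or 8-byte (long) elements; the function names are hypothetical and array object overhead is ignored.

```python
BITS_PER_ENTRY = 31          # entries are reduced to 31 bits
INT_BYTES, LONG_BYTES = 4, 8 # sizes of Java int and long elements


def key_bits(d):
    """Bit size of a d x d matrix whose entries hold 31 bits each."""
    return BITS_PER_ENTRY * d * d


def storage_bytes(d, bytes_per_entry):
    """Raw array payload for a d x d matrix (object overhead ignored)."""
    return d * d * bytes_per_entry


KEY_SIZES = {d: key_bits(d) for d in range(3, 13)}
```

For example, a 5x5 matrix gives 775 bits and takes 100 bytes as an int array against 200 bytes as a long array. The values in `KEY_SIZES` coincide with the bit sizes quoted for the matrix conjugation instances in Section 6.1.5.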
6.1.3 Performance testing methodology
In this section we report our performance tests of Matrix Conjugation Based
Key Agreement versus Elliptic Curve and standard Diffie-Hellman on a Nokia
N70 platform.
The Nokia N70 is a multimedia smartphone launched in Q3 2005. In 2007, it was the second most popular cellular phone, with 8% of all sales at Rampal Cellular Stockmarket [67]. Our experiments show similar results with other mobile devices. The Nokia N70 is equipped with:
• CPU : Texas Instruments OMAP 1710 (ARM architecture 926TEJ v5), 220 MHz processor
• RAM : 55 MB
• FLASH : 19.9 MB
• MMC : 2 GB
• SCREEN : 176×208 TFT Matrix, 256K colours
• BATTERY : BL-5C (970 mAh)
• OS : BB5 / Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3
• JAVA : MIDP 2.0 midlets
On a mobile device in general, and using J2ME in particular, there are several problems in measuring the time required for a given task, because the accuracy of the System.currentTimeMillis() function is not sufficient.
We will use, as an estimate of the duration of a given task, the average of the durations measured over several repetitions of the same task. More precisely:
Definition 8 Let $n$ be the number of iterations of one task, and let $\theta_i$ denote the time needed to perform the $i$-th iteration of the task, measured using System.currentTimeMillis(). The actual time that the device needs to perform such a task will be measured as follows:
$$\Theta_n = \frac{1}{n} \sum_{i=1}^{n} \theta_i \qquad (6.1)$$
It is an empirical fact that $\Theta_n$ becomes approximately independent of $n$ for “large” $n$. The required size of $n$ depends on the task and is usually smaller for longer tasks (i.e., larger $\Theta_n$); see Section 6.1.5 below.
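The averaging in Definition 8 can be sketched as follows. The simulated clock with a 16 ms tick is a hypothetical stand-in for a coarse millisecond timer; the actual granularity of System.currentTimeMillis() on the test device is not stated in the text, and the names below are illustrative.

```python
import time


def mean_task_time_ms(task, n, clock_ms=None):
    """Theta_n: average of n per-iteration durations theta_i, in ms."""
    if clock_ms is None:
        clock_ms = lambda: int(time.time() * 1000)   # wall clock, whole ms
    total = 0
    for _ in range(n):
        start = clock_ms()
        task()
        total += clock_ms() - start                  # theta_i
    return total / n


class CoarseClock:
    """Simulated timer that only advances in steps of tick_ms (assumed)."""

    def __init__(self, tick_ms):
        self.now = 0            # true elapsed time, ms
        self.tick = tick_ms

    def advance(self, ms):
        self.now += ms

    def __call__(self):
        return (self.now // self.tick) * self.tick   # rounded-down reading


# A task that really lasts 5 ms, observed through a 16 ms timer.
clock = CoarseClock(tick_ms=16)
single = mean_task_time_ms(lambda: clock.advance(5), 1, clock)

clock = CoarseClock(tick_ms=16)
averaged = mean_task_time_ms(lambda: clock.advance(5), 16, clock)
```

In this simulation a single measurement reads 0 ms, while the average over 16 iterations recovers the true 5 ms, which illustrates why short tasks need more repetitions than long ones.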
For each algorithm tested, we performed the operation described above for the most used instances of the algorithm; e.g., for the ECDH case we tested all the curves recommended by NIST [68]. For the standard Diffie-Hellman and Matrix Conjugation Based Key Agreement analysis, we considered instances with comparable private key lengths, in order to relate the complexity of a brute force attack to performance.
[Figure 6.2: Public data and Key Agreement generation time: all tests. Bar chart with the series Public Data Generation, Key Agreement and TOT for every tested instance; vertical axis from 0 to 6000.
EC . . . bit: Elliptic Curve Diffie-Hellman with a . . . bit key
EC . . . bitK: Koblitz Elliptic Curve Diffie-Hellman with a . . . bit key
MC d (. . . ): Matrix Conjugation at dimension d with a . . . bit key
DH . . . : Diffie-Hellman with a . . . bit key]
The next section shows the experimental results of the performance comparison among the different key agreement algorithms.
6.1.4 Performance evaluation
Here we show the results of all the tests performed on standard key agreement algorithms and protocols and on the Matrix Conjugation Based Key Agreement.
We compared the performance of the Matrix Conjugation Based Key Agreement with other reference algorithms, namely the Diffie-Hellman key agreement (DH) [69] and the Elliptic Curve Diffie-Hellman key agreement (ECDH) [70]. We remark that these algorithms are the most used to perform key agreement operations in desktop and mobile environments. Among the NIST suggested Elliptic Curves [71], we selected both Koblitz curves (ending with a K in Figure 6.2 and Figure 6.3) and pseudo-random curves over GF(p).
[Figure 6.3: Public data and Key Agreement generation time: results with an upper bound of 1 sec. Bar chart with the series Public Data Generation, Key Agreement and TOT; vertical axis from 0 to 1200.
EC . . . bit: Elliptic Curve Diffie-Hellman with a . . . bit key
EC . . . bitK: Koblitz Elliptic Curve Diffie-Hellman with a . . . bit key
MC d (. . . ): Matrix Conjugation at dimension d with a . . . bit key
DH . . . : Diffie-Hellman with a . . . bit key]
Figure 6.2 shows the time comparison between the Matrix Conjugation Based Key Agreement (MC in Figure 6.2 and Figure 6.3) and the standard and Elliptic Curve Diffie-Hellman. Conjugation based key agreement generates the public data and the SSK faster than the other algorithms. Since in Figure 6.2 the differences among the generation times of the public data and of the key agreement are hard to see, Figure 6.3 gives a closer look to show the differences in time better.
While a key agreement using Elliptic Curve with a 571-bit key takes 5706.3 milliseconds, a conjugation based key agreement with a 5 × 5 matrix (775-bit key) takes only 20.63 milliseconds. This difference is significant even considering that the SSK generated by the Matrix Conjugation Based Key Agreement is 50% larger than the Elliptic Curve SSK. Even in the case of standard Diffie-Hellman, the differences in a mobile environment look quite impressive; for example, a Diffie-Hellman 768-bit SSK is agreed in 343.44 milliseconds, while a Matrix Conjugation Based Key Agreement 775-bit SSK takes only 20.63 milliseconds. These differences are illustrated in Figure 6.3.
6.1.5 Experimental results
Table 6.1 summarizes all the results obtained in the performance testing for the
different classes of algorithms. Parameters field indicates:
• In the ECDH case, the type of curve that is used to generate the agreement
(K indicates a Koblitz curve) and the size of the generated SSK;
• In the DH case, the size of the generated SSK;
• In the Matrix Conjugation Based Key Agreement, the matrix dimension
and the bit size of the matrix generated as key.
The Public Data Generation (Pub. Data) field indicates the time needed to generate the data exchanged to agree on an SSK. The Key Agreement (Key Agr.) field shows the time needed to generate the SSK from the exchanged and private data. The Total field shows the sum of the times used to generate the exchanged data and the SSK. The last field, Iterations (Iter.), indicates how many times the agreement has been performed; it is useful to understand the accuracy of the values in the Public Data Generation, Key Agreement and Total fields. In all cases but ECDH we did 100 iterations; in the ECDH cases we decided to use just 10 iterations because times were more than one order of magnitude bigger than in the other cases, so that keeping the same accuracy was not necessary.

Param.        Pub. Data    Key Agr.      Total    Iter.
Elliptic Curve Diffie-Hellman
163bit K         110.90      100.00     210.90       10
192bit           185.90      195.30     381.20       10
224bit           298.50      281.20     579.70       10
233bit K         696.90      759.30    1456.20       10
239bit          1684.40     1626.60    3311.00       10
256bit           312.50      262.50     575.00       10
283bit K         407.80      442.20     850.00       10
384bit           493.70      415.70     909.40       10
409bit K         561.00      560.90    1121.90       10
521bit          1404.60     1342.20    2746.80       10
571bit          2845.30     2861.00    5706.30       10
Diffie-Hellman
512               37.51       68.58     106.09      100
768              116.25      227.19     343.44      100
1024             282.98      539.83     822.81      100
Matrix Conjugation Based Key Agreement
3 (279)            3.27        3.13       6.40      100
4 (496)            6.72        5.94      12.66      100
5 (775)           10.32       10.31      20.63      100
6 (1116)          16.72       15.47      32.19      100
7 (1519)          23.76       22.96      46.72      100
8 (1984)          33.90       31.41      65.31      100
9 (2511)          44.35       44.85      89.20      100
10 (3100)         57.97       56.87     114.84      100
11 (3751)         74.21       71.26     145.47      100
12 (4464)         93.91       89.53     183.44      100
Table 6.1: Time used by the algorithms to generate the public data and to agree on an SSK (times in milliseconds).
6.1.6 Concluding remarks
In this section we compared a custom key agreement algorithm based on matrix conjugation with the standard Diffie-Hellman and Elliptic Curve Diffie-Hellman key agreements. Our experiments have been performed using one of the most popular smartphones in the world. Experimental results showed that the key agreement based on matrix conjugation is from 8 to 450 times faster than the two Diffie-Hellman variants.
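The quoted range can be related to the totals of Table 6.1 with a short calculation. The pairing of instances below is an assumption: the text does not say which rows were compared, and this is only one reading that reproduces the two extremes.

```python
# Total times in milliseconds, copied from Table 6.1.
TOTAL_MS = {
    "ECDH 571bit": 5706.30,
    "DH 512": 106.09,
    "DH 768": 343.44,
    "MC 4 (496)": 12.66,
    "MC 5 (775)": 20.63,
}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`, by total time."""
    return TOTAL_MS[slow] / TOTAL_MS[fast]


LOW = speedup("DH 512", "MC 4 (496)")           # roughly 8
HIGH = speedup("ECDH 571bit", "MC 4 (496)")     # roughly 450
SAME_SIZE = speedup("DH 768", "MC 5 (775)")     # comparable key sizes
```

Under this reading the lower end comes from Diffie-Hellman at 512 bits and the upper end from ECDH at 571 bits, both against the 4x4 matrix instance; at comparable key sizes (768 against 775 bits) the ratio is between 16 and 17.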
Providing users with new services on their mobile devices increases the need for security to protect the information exchanged; such information can contain data about bank accounts, credit card numbers, PINs or simply passwords.
Currently existing cryptographic methods affect the usability of applications too much, loading the system with the resource consumption due to cryptographic operations. Considering the growing business opportunity around the mobile world and, at the same time, the need for new, better performing applications that can run on small capacity devices such as smartphones or netbooks, the results of this section open the possibility of applying this cryptographic methodology to many scenarios of mobile device use.
6.2 Encryption algorithm
QP-DYN is an encryption algorithm, based on some ideas coming from [72], used for the encryption/decryption phase of the communication. We are not authorized to disclose information about how the algorithm works; we can only provide information on the performance and statistical testing performed in comparison with other stream cipher algorithms.
The security of QP-DYN has been statistically tested and the results are available in Section 6.2.2. These results do not prove that QP-DYN is unbreakable; however, they show that QP-DYN not only satisfies the NIST requirements for classified information but also passes tighter and more robust tests, such as the Rabbit, Alphabit, Pseudo-DIEHARD, FIPS-140-2 and Crush test batteries.
6.2.1 Performances
Performance tests have been executed on a Nokia N70 (see Section 6.1.3 for the device’s details).
We compare QP-DYN with RC4 [73] and with the AES CFB [74] stream cipher because both perform stream cipher operations, as QP-DYN does.
Figure 6.4 (a), Figure 6.4 (b) and Figure 6.4 (c) show the results of performance testing between QP-DYN and RC4 for different key sizes. The times shown in the figures are the sum of the encryption and decryption times.
The key sizes tested are:
• 512-bit for RC4 compared to QP-DYN with a 4x4 matrix, for a total of 496 bits (Figure 6.4 (a));
• 768-bit for RC4 compared to QP-DYN with a 5x5 matrix, for a total of 775 bits (Figure 6.4 (b));
• 1024-bit for RC4 compared to QP-DYN with a 6x6 matrix, for a total of 1116 bits (Figure 6.4 (c)).
[Figure 6.4: Overall encryption and decryption time comparison between (sizes in bytes) (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6. Each panel plots time in milliseconds against plaintext size from 32 to 608 bytes.]
We observe that the time differences in the above figures are in the following ranges:
• From 15 up to 52 milliseconds for Figure 6.4 (a);
• From 18 up to 45 milliseconds for Figure 6.4 (b);
• From 37 up to 65 milliseconds for Figure 6.4 (c).
The time differences are within the range 15-65 milliseconds and thus they do not substantially affect the usability of QP-DYN compared to RC4. Furthermore, it is useful to remember that RC4 is not considered secure (see also the results shown in Section 6.2.2, “Statistically testing QP-DYN and RC4”).
We also compared the performance of QP-DYN in mobile environments with an AES implementation working as a stream cipher (AES CFB Stream Cipher). In particular, Figure 6.5 illustrates the results of a comparison between an AES CFB Stream Cipher implementation using a 256-bit key and QP-DYN with 3x3 matrices (279-bit key).
[Figure 6.5: Overall encryption and decryption time comparison between AES CFB 256-bit and QP3 (sizes in bytes). The plot shows time in milliseconds against plaintext size from 32 to 608 bytes for the series AES - Strm 256 and QP 3 (279).]
In our experiments, the plaintext size at which QP-DYN and AES take roughly the same time to encrypt/decrypt was about 256 bytes. As can be seen from the figure, the time differences between AES CFB and QP-DYN are not dramatic:
• To encrypt/decrypt 32 bytes of plaintext AES CFB takes 22 milliseconds
less than QP-DYN;
• To encrypt/decrypt 256 bytes of plaintext AES CFB and QP-DYN take
the same time;
• To encrypt/decrypt 512 bytes of plaintext AES CFB takes 24 milliseconds
more than QP-DYN.
QP-DYN can also be used to perform block cipher operations, so we compared it with AES in its standard block cipher implementation. As AES has been designed to perform block cipher operations, its encryption/decryption times are better than those of AES CFB. Figure 6.6 shows the results of our experiments for a standard implementation of AES using a 256-bit key and QP-DYN with a 3x3 matrix (279-bit).
[Figure 6.6: Overall encryption and decryption time comparison between AES 256-bit and QP3 (sizes in bytes). The plot shows time in milliseconds against plaintext size from 32 to 608 bytes for the series AES - Block 256 and QP 3 (279).]
In particular:
• to encrypt and decrypt 32 bytes of plaintext AES takes 0.3 milliseconds
while QP-DYN 28.25 milliseconds;
• to encrypt and decrypt 512 bytes of plaintext AES takes 4 milliseconds
while QP-DYN 63 milliseconds.
We remark again that those time differences are very small (of the order of 60 milliseconds), and thus they should not have any impact on the practical usability of QP-DYN.
6.2.2 Statistically testing QP-DYN and RC4
QP-DYN performs encryption and decryption in a stream cipher mode; in particular, it generates a key-stream of the same size as the plaintext to be ciphered. This key-stream is XOR-ed with the plaintext, generating the ciphertext. These operations are the same as those performed by other stream cipher algorithms, e.g., RC4 [75]. In 2005, Andreas Klein presented an analysis of the RC4 stream cipher showing correlations between the RC4 key-stream and the key, and in 2008 Klein presented a successful attack on the RC4 key-stream based on his 2005 work [76]. These works show that if there are correlations in the key-stream generated by a stream cipher, the stream cipher itself is reversible and can be statistically attacked to recover the key.
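The generic stream cipher construction mentioned above (a key-stream as long as the plaintext, XOR-ed with it) can be sketched as follows. The key-stream generator here is a placeholder built from SHA-256 in counter mode; it is not QP-DYN, whose internals are not disclosed, nor RC4, and the function names are assumptions.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Placeholder key-stream: SHA-256 over key || counter (NOT QP-DYN)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR data with a key-stream of the same size."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


message = b"backup entry: contact #42"
ciphertext = xor_stream(b"demo key", message)
roundtrip = xor_stream(b"demo key", ciphertext)
```

Because XOR is its own inverse, the same function both encrypts and decrypts. It also shows why key-stream quality matters: any statistical regularity in the key-stream is carried directly into the ciphertext.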
The National Institute of Standards and Technology (NIST) sets the guidelines to verify a stream cipher algorithm based on pseudo-random number generators (PRNG) [77]. A PRNG should successfully “pass” some statistical tests in order to be usable to cipher classified information¹. These tests are a subset of other sets of tests used to discover correlations in bit sequences generated by a PRNG. NIST gives some documentation about these tests in [78].
As there is a lot of work on the cryptanalysis of the RC4 stream cipher and on the correlations noticeable in the generated key-stream, we decided to start by analyzing the differences, observed after performing the NIST tests, between RC4 and QP-DYN. We tested RC4 and QP-DYN using the TestU01 [79] C library for statistical testing; this library provides more tests than those required by NIST, and some of these tests are harder to pass. The tests in the TestU01 library are divided into batteries: SmallCrush, BigCrush, Rabbit, Alphabit, FIPS-140-2 and pseudo DIEHARD. There is no battery performing all the tests required by NIST, but the tests are available in the library [80]; we implemented a battery performing all the required tests.
The results of running our NIST test battery on the RC4 and QP-DYN algorithms are shown in [57].
RC4 does not pass some tests, while QP-DYN passes all the tests required by NIST. Moreover, QP-DYN does not show correlations in the generated key-stream, while RC4 does; on the other hand, RC4 completes the tests in a much shorter time than QP-DYN (total CPU time 00:32:13.93 for RC4 versus 04:00:07.85 for QP-DYN). In every run performed, RC4 always failed the same tests, while QP-DYN always passed all the tests performed.
The results of the tests performed give a clear indication that QP-DYN, when used properly with strong keys, is a strong and robust stream cipher, even though the effort it requires is higher than that required by other algorithms.
¹ Classified information is sensitive information to which access is restricted by law or regulation to particular classes of persons. A formal security clearance is required to handle classified documents or access classified data. The clearance process requires a satisfactory background investigation. There are typically several levels of sensitivity, with differing clearance requirements. This sort of hierarchical system of secrecy is used by virtually every national government. The act of assigning the level of sensitivity to data is called data classification.
6.3 Protecting inter process communication
Smartphone applications are commonly installed and stored in memory, and in modern devices the OS keeps each application’s data safe by using a sandbox approach. This approach prevents other applications from accessing unauthorized data by isolating each application from the others [81], [82], [83].
In many cases applications installed on the same device may interoperate in their working environment using mechanisms, similar to inter-process communication (IPC), made available by the mobile operating system. Unfortunately, mobile devices lack flexible solutions for making these communications secure.
In this section we present a framework that secures the message exchange with the services installed on Google Android mobile devices. Value added services (VASs) realized by different providers are discovered, used and composed by an Application Frame designed for realizing complex goals. We implemented a prototype of our proposed framework on a real device and we performed extensive testing to measure the overhead introduced by the cryptographic operations required to protect the inter-process communication.
We named this framework SAVED (Secure Android Value addED services). SAVED enables secure communication between services and the applications using such services via Inter Process Communication (IPC)/Remote Procedure Call (RPC). Each VAS is realized through an Android Service. Access to such a service requires the execution of an authentication and authorization phase among the involved parties. Once this initial phase is completed, the application sets up a secure communication with the service using a symmetric encryption scheme.
6.3.1 State of the art
Android is a multi-process system, in which each application (and parts of the
system) runs in its own process. Most security between applications and the
system is enforced at the process level through standard Linux facilities, such
as user and group IDs that are assigned to applications [84]. The Android system requires that all installed applications be digitally signed with a certificate
whose private key is held by the application’s developer. The Android system uses the certificate as a means of identifying the author of an application
and establishing trust relationships between applications. The Android approach protects each application's data and prevents access to services developed by others. Every service publishes in its own manifest file the permissions required to use it. One of the permission settings in the manifest file is the Protection level, which configures the security policy required by the service; if the level is set to signature, the service will communicate only with those applications with which it shares the same developer certificate.
The main advantage of the approach followed in the Android design is that developers can focus their attention on the application alone, while the OS guarantees that applications not allowed to access a service are prevented from doing so. This simplification comes at a cost: only developers who share certificates and private keys can reuse already developed services in new applications. This is a severe limitation given the growing size of the mobile market and the number of organizations and developers involved in publishing applications and services.
The Android approach prevents third parties from using the framework's VASs. Developers can use each other's services by sharing certificates and credentials: in this case the applications can interact, but the security of the whole framework rests on a single digital signature; if the developer's signing key is stolen, an attacker could sign his/her own applications and thus gain complete access to all data of the framework.
Our approach aims to promote the scalability of the framework and to grant secure access to services developed by other parties without the need to share private data. We propose to insert a new layer that handles the security of inter-process communications; in this layer, trust is granted directly by the security policy of the framework, and each application can request access to services and publish services, interacting with the framework as in a PKI environment. The SAVED framework makes it possible to counter different kinds of threats:
• Service Spoofing: an application refers to a service simply through an interface that establishes the name, the package and the method signatures; if the original service is replaced on the mobile device, applications that use that service are unaware of the substitution.
• Memory Dump: starting from Android 1.5, a new API has been introduced to generate a memory dump programmatically. The static method dumpHprofData(String fileName) of the Debug class generates a dump file that can be converted with the hprof-conv tool of the Android SDK and subsequently analyzed with different memory analysis tools (e.g., Eclipse MAT, JProfiler, etc.). If a malicious application executes the dump periodically and exports the dump data over a connection (e.g., an HTTP connection), it is possible to steal the data exchanged among applications and services.
6.3.2 The framework
SAVED (Secure Android Value addED services) is a framework that grants secure communication between services without requiring private data to be shared. Our intent is to improve interoperability between applications and services, overcoming the limits of Android's native approach. The purpose of SAVED is to allow applications to use services developed by others, to add new VASs to the framework, or even to create new applications using already existing VASs. All interactions carried out through the proposed framework are performed in a secure way. SAVED adds supplementary security at the process communication level: each application is accredited to the framework, which grants it the privileges needed to access shared services and facilities securely. The single-process security that Android provides through sandboxes is preserved in SAVED. In our framework we defined two main entities:
• Application, which provides the graphical user interface and all the logic implementing the task to be realized. Applications are implemented by extending the android.app.Activity class.
• Value Added Service (VAS), which provides certified services to the applications developed using the framework. VASs are implemented as remote services extending the android.app.Service class. The ProxyCA and the ProxyTSA are two special VASs in the framework; they allow communication with a Certification Authority and a Timestamping Authority, respectively.
In order to realize Applications that participate in the framework, developers have to extend specific interfaces and include particular resource packages. When a new VAS is realized, its class package must be exported. Such class packages are imported by the Applications that use the services provided by the VAS, and are used to perform inter-process communication. Including these packages and extending the interfaces provides the supplementary security layer that grants secure communication between entities and prevents applications that are not allowed from accessing the services.
Moreover, we identified some best practices for creating components that participate in the framework while meeting the required security needs. Some examples follow:
• Activation code: when the Application/VAS is installed on the device, the user should be asked for an unlock code; the Application/VAS remains locked (preventing all interactions) until the user enters the proper activation code for that entity;
• Use of standard certificates: each component should have its own X.509 digital certificate signed by a valid Certification Authority (CA); the certificate is saved in a keystore inside the component's memory area, and the component is responsible for managing the keystore correctly so that the certificates of other entities are stored securely;
• Model View Controller pattern: VASs and Applications independently implement the graphical user interfaces shown to the end user;
• Mutual Authentication: each entity needs to implement a mechanism to grant mutual authentication. Mutual authentication should be ensured by exchanging and verifying digital certificates. Using a handshake scheme (e.g., the TLS handshake), the involved entities exchange their digital certificates, check the validity of the certificates through the ProxyCA, and mutually authenticate themselves (Figure 6.7).
Figure 6.7: Mutual Authentication phase.
• Session Authentication: once the entities are mutually authenticated, a session key (SK) is shared. In our approach the SK is generated by both the Application and the VAS using parameters defined by the two parties (i.e., CTRL A and CTRL B). By adopting a key agreement protocol (e.g., the Diffie-Hellman protocol), the involved entities agree on a secret SK that will be used to encrypt subsequent communications (Figure 6.8).
Figure 6.8: Session Authentication phase.
• Session Encryption: every VAS allows access to its functionalities only to “trusted” Applications, i.e., Applications that have successfully completed the Mutual Authentication and Session Authentication phases. To enforce the uniqueness of each interaction with a VAS, a random value (i.e., Nonce A) is used; confidentiality is granted by encrypting the exchanged data with the SK. The Application composes the results of different VASs in order to realize a complex goal. At the end of this phase, the Application interacts with a timestamping authority through the ProxyTSA in order to securely keep track of the creation time of the realized goal (Figure 6.9). The sensitive data of the operation are summarized by applying a hash function (i.e., Op Hash) and these data are sent to the timestamping service.
Figure 6.9: Session Encryption phase.
Mutual Authentication, Session Authentication and Session Encryption represent the security core of the SAVED framework, and every entity that joins the framework must perform them carefully.
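As an illustration of the Session Authentication and Session Encryption phases, the following Python sketch shows how two parties could agree on a session key with a Diffie-Hellman exchange, reject a reused nonce, and compute a digest of the operation data for the timestamping service. It is a minimal sketch under stated assumptions, not the SAVED implementation: the group parameters P and G are toy values chosen for readability, the names make_keypair, derive_session_key, NonceRegistry and op_hash are hypothetical, and SHA-256 is an assumption (the text does not name the hash function). Encryption of the payload with the SK is deliberately left out.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption, for illustration only):
# a 127-bit Mersenne prime is far too small for real deployments.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private exponent, public value) for one party."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private, peer_public):
    """Derive a 256-bit session key SK from the shared DH secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class NonceRegistry:
    """Remembers the nonces already seen so that a replayed request is refused."""

    def __init__(self):
        self._seen = set()

    def accept(self, nonce):
        if nonce in self._seen:
            return False
        self._seen.add(nonce)
        return True


def op_hash(operation_data):
    """Digest summarizing the sensitive data of an operation (hypothetical Op Hash)."""
    return hashlib.sha256(operation_data).hexdigest()


# Application and VAS each generate a key pair and exchange the public values.
app_private, app_public = make_keypair()
vas_private, vas_public = make_keypair()
sk_app = derive_session_key(app_private, vas_public)
sk_vas = derive_session_key(vas_private, app_public)

registry = NonceRegistry()
nonce_a = secrets.token_bytes(16)
first_use = registry.accept(nonce_a)   # a fresh nonce is accepted
replay = registry.accept(nonce_a)      # the same nonce is refused
```

How CTRL A and CTRL B map onto the exchanged values is not specified in the text; here the two public values simply play the role of the per-party parameters.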
6.3.3 The framework implementation
We developed a prototype of the SAVED framework on the Android 1.5 platform. The main features of the proposed framework are encapsulated in jar files that contain two kinds of files (i.e., .aidl and .Stub) for inter-process communication. AIDL (Android Interface Definition Language) is an IDL [85, 86] from which it is possible to generate automatically the source code that allows two Android applications to exchange information using IPC. The AIDL/IPC interface-based mechanism is similar to the Component Object Model (COM) or the Common Object Request Broker Architecture (CORBA). Implementing an AIDL/IPC service requires the following steps:
• Create an .aidl file to define the interface (YourInterface.aidl). The interface defines the access methods and the fields available to a client.
• Add the .aidl file to the makefile and implement the methods of the interface by creating a class that extends YourInterface.Stub (the .Stub file is generated automatically by the tool) and implements the methods declared in the .aidl description file.
• Publish the interface to clients by overriding the Service.onBind(Intent) method; this method returns an instance of the class implementing the interface.
Figure 6.10: SAVED framework main packages.
This IPC mechanism needs a way to share complex information, such as non-primitive types, between two entities. To achieve this goal Android provides the Parcelable interface, which allows complex types to be serialized and deserialized. Figure 6.10 shows a simplified package diagram of SAVED. The top of the picture shows the following core .jar files:
• pkgApp.jar contains the interface InterfaceApplication, which must be implemented by every class that wants to participate in SAVED as an Application;
• pkgServ.jar contains the interface InterfaceService, which needs to be implemented by every class that wants to be a VAS in the framework;
• pkgCA.jar carries IProxyCA.aidl with its corresponding .Stub file; these files allow the communication between the entities of SAVED and the ProxyCA. Moreover, the jar file contains the parcelable class ReqX509, which is mandatory for the communication;
• pkgTSA.jar packages IProxyTime.aidl with its corresponding .Stub file to allow communication with the ProxyTSA;
• pkgCommBase.jar contains the three base parcelable files that allow the communication between the Application and the VASs, namely CertificatePack.java, KeyPack.java and ResourcePack.java.
To allow an Application to contact and receive services from all the VASs inside the framework, and thus to assemble the services offered by the VASs into complex applications, the ProxyTSA and ProxyCA Android packages (apk) must be installed; these entities are shown in the lower left half of Figure 6.10. ProxyCA is one of the underlying VASs of the framework. All entities must submit to the ProxyCA the digital certificates they receive from their communication partners. The service contacts a web service that works as an online Certification Authority, inserts the certificate in an XML file and, through a secure HTTP connection (i.e., HTTPS), asks for the certificate to be verified. The web service checks the validity of the certificate and answers with an XML response.
ProxyTSA is another basic VAS of SAVED. Like the ProxyCA, the ProxyTSA takes care of the communication with an external partner, the timestamping web service. All the communications between the proxy and the timestamping web service are managed through XML messages over HTTPS.
The lower right half of Figure 6.10 illustrates a VAS and an Application participating in SAVED. Third parties that want to contribute to the framework may easily create and add new Applications or VASs. In the remainder of this section we sketch how a developer can realize VASs and Applications.
Building a value added service
1. Create a new Android project with a class that extends the native Service class; import pkgServ.jar, pkgCommBase.jar and pkgCA.jar into the project;
2. The main class of the project must implement the InterfaceService interface and consequently all its methods;
3. Create the graphical user interface;
4. Create IServiceX.aidl in the project as described previously;
5. Create and export pkgXVAS.jar containing IServiceX.aidl and the corresponding automatically generated .Stub file;
6. The Service class must implement all the standard methods of the native Android Service class, and the .aidl interface with all the methods defined through the description language;
7. Release the service as an .apk file for installation on the device.
Building an application
1. Create a new Android project containing a class that extends the native Android Activity class; import pkgApp.jar, pkgCommBase.jar, pkgCA.jar and pkgTSA.jar into the project;
2. The main class of the project must implement the InterfaceApplication interface with all its methods;
3. Create a graphical user interface to allow the user to interact with the Application;
4. Import the corresponding jar file (i.e., pkgXVAS.jar) of each VAS you want to use in the Application;
5. Use each service properly, taking care to manage and release the connection with the involved VAS correctly. Note that early versions of the Android platform serialize access to services;
6. Release the Application as an .apk file for installation on the device.
Consider a scenario with one Application and one VAS, each with its own digital certificate signed by a different CA. Note that in this scenario neither entity “knows” the public key or the certificate of its counterpart. If the two entities wish to cooperate, they need to authenticate each other. After contacting the ProxyCA to verify the trustworthiness of the communication partner (cf. the Mutual Authentication phase), an asymmetric cryptography session can be started to exchange the session key (cf. the Session Authentication phase). Finally, the session between the involved parties is encrypted using symmetric cryptography (cf. the Session Encryption phase). The switch from asymmetric to symmetric cryptography is motivated by the performance overhead of asymmetric cryptography: it improves the performance of the whole framework by reducing the effort spent on encryption/decryption operations.
6.3.4 On a real device
The framework has been tested on an Android HTC Magic device. The device was equipped with Android 1.5 OS, a 3.2-megapixel camera, an integrated GPS antenna and IEEE 802.11 b/g Wi-Fi. Using the Android ADB tool, different .apk files, created using the Eclipse IDE, were installed on the HTC Magic. The testing phase highlighted a slower response of the Applications due to security operations, inter-process communications via AIDL interfaces and parcelable classes. We executed some performance tests using our prototype. We aimed at measuring the computational time overhead introduced by the use of SAVED, and thus we measured the time needed to execute the security functions. In particular, we considered the overhead related to each of the phases described in Section 6.3.2.
Phase                         Time (ms)
1. Mutual Authentication*     1197
1. Mutual Authentication      446
2. Session Authentication     257
3. Session Encryption         795
Total Framework Overhead      1498
Table 6.2: Time overhead for the framework phases.
Table 6.2 shows the time overhead introduced by SAVED. The first row of the table refers to the first execution of the Mutual Authentication phase, while the second row refers to subsequent executions. In the first case the additional time is due to the need to update the keystore with the new digital certificates; this delay is paid only once. The total framework overhead amounts to about 1.5 seconds, which preserves usability for real use cases.
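The totals can be checked directly from the per-phase figures of Table 6.2. The short Python sketch below reproduces the reported total for subsequent executions and, as a derived value that the table does not report, the total for the very first execution; the variable names are illustrative only.

```python
# Per-phase times in milliseconds, as reported in Table 6.2.
MUTUAL_AUTH_FIRST_MS = 1197   # first execution (keystore update included)
MUTUAL_AUTH_MS = 446          # subsequent executions
SESSION_AUTH_MS = 257
SESSION_ENCRYPTION_MS = 795

# Total overhead once the keystore already holds the certificates.
steady_total_ms = MUTUAL_AUTH_MS + SESSION_AUTH_MS + SESSION_ENCRYPTION_MS

# Derived values (not reported in the table): total for the first execution
# and the one-off extra cost attributed to the keystore update.
first_total_ms = MUTUAL_AUTH_FIRST_MS + SESSION_AUTH_MS + SESSION_ENCRYPTION_MS
one_off_extra_ms = MUTUAL_AUTH_FIRST_MS - MUTUAL_AUTH_MS

print(steady_total_ms, first_total_ms, one_off_extra_ms)
```

The total of 1498 ms reported in the table therefore corresponds to the subsequent executions; under the same figures the first execution would take 2249 ms, of which 751 ms is the one-off delay.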
People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people - and that social norm is just something that has evolved over time.
Mark Zuckerberg
7 Value added services on backup data
Introduction
In this chapter we present some possible use cases in which our backup approach can improve the interaction among people.
In the first part of the chapter we show a system that allows the user to share part of his/her backup data with some selected contacts; a shared backup can ease communication within an enterprise environment, among friends or university colleagues and, with some constraints such as geographic location and time, in other situations such as meetings or conferences. Some results of the experimentation with the proposed shared backup have recently been presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28].
In the second part we show a methodology to extract a social network from backup data. The proposed methodology helps to build the network by extracting the connections found in backups and by searching the web for publicly available information that can be found using standard search engines. The social network can serve several purposes; for example, in an enterprise it can be used to place people who are “friends” outside the office in the same workgroups, which may improve productivity by avoiding conflicts between collaborators. Such an approach exploits dynamics already present in groups of people [87].
Some ideas in the second part of this chapter were proposed in Dellutri's PhD thesis [31]. An extension of Dellutri's thesis, showing some results of this chapter, has been published in the First IEEE International Workshop on Information Forensics and Security [32].
7.1 Sharing backup data with closed groups
The common interface introduced in Chapter 3 and used for backup can also serve as a base to offer sharing services to the users of the system. Usually, in closed groups of people, users share, often without noticing, each other's contacts, parts of their calendars, work files or even, in personal interactions, pictures and videos.
The general idea is to give users the possibility of sharing part of their backup, as a common synchronization interface, with selected contacts or groups of contacts of their choice in their personal or business network.
7.1.1 Social backup in business environment
In an enterprise where people collaborate daily, it could be important for employees to share commonly useful information, e.g., calendars, part of the address book, templates for presentations or documents, etc. Moreover, if a new employee joins the team, his/her contacts are added to the common address book and shared with selected users of his/her new team; his/her new business device is added to a specific closed group, and the most recent data are retrieved from the shared backup and saved on it. If somebody's device is lost or stolen, or the employee leaves the company, the group administrator can disconnect it from the social backup, so that the privacy of the group members is preserved. Using our approach, all these updates are directly exchanged and notified on employees' smartphones.
Figure 7.1: Use case of meeting backup and share.
7.1.2 Sharing conference data
With some restrictions (i.e., time and location), our approach can be useful for particular kinds of events, such as meetings, conventions or conferences. In such events, the interest in some information (e.g., organizer contacts, event schedule) is temporary; participants are interested in such information only while they are at the meeting location. The organizing committee can inform participants by sharing documents and other related information only in the event area and for the duration of the event. In this way participants have just the information they need directly on their mobile devices, e.g., the conference schedule is shared via calendar, the venue address via maps, and committee contacts via the address book; this avoids a plethora of useless data for the attendees and a lot of noisy requests for information to the committee.
Figure 7.2 (panels a, b, c): Android Backup and Restore client.
7.1.3 Shared backup for smartphone
To allow users to share part of their backup with closed groups, we deployed a set of REST web services (see Appendix C for the services implemented) on the backup server described in Section 5.4. In the control layer we implemented the business logic that handles the sharing services; using this REST API, the user can manage his/her groups and shares from the client, allowing or denying other users access to a resource. Before granting access to any content, the server checks the owner's settings to verify whether the requesting user may access the resource.
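The server-side check can be pictured with a small Python sketch. The data model is an assumption made for illustration (the actual services are listed in Appendix C): a resource records its owner plus the users and groups the owner has allowed, and the names Resource, can_access and memberships are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A shared backup item together with the owner's sharing settings."""
    owner: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def can_access(user, resource, memberships):
    """Return True if the owner's settings allow `user` to read `resource`.

    `memberships` maps a group name to the set of users belonging to it.
    """
    if user == resource.owner:
        return True
    if user in resource.allowed_users:
        return True
    # Access through any group the owner has shared the resource with.
    return any(user in memberships.get(group, set())
               for group in resource.allowed_groups)


# Hypothetical example: a calendar shared with one colleague and one group.
memberships = {"team-a": {"bob", "carol"}}
calendar = Resource(owner="alice", allowed_users={"dave"}, allowed_groups={"team-a"})
```

With these settings, alice, dave, bob and carol can read the calendar, while any other user is denied.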
The proposed sharing approach can be generalized, under some conditions, to an open community of users willing to share their data. In small groups, where all participants know each other directly, the information can be shared freely; such a system does not introduce privacy or security problems. If someone wants to share a piece of information with a friend, the system just eases the task by keeping the information up to date on the devices of both “sharers”.
Figure 7.3 (panels a, b, c, d): Android Backup and Restore client.
In an open community, the information to be shared must be authorized by its owner. In this case, when a user tries to share a piece of information, its ownership must be verified, for example through a code sent to the owner by means of the information itself. For example, if a user wants to share a mobile phone number, the system sends a text message with a verification code to that number; the code must then be entered into the system to prove ownership of the information.
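The ownership check for a phone number can be sketched in Python as follows. This is only an illustration under stated assumptions: the class OwnershipVerifier, the six-digit code format and the send_sms callback (which stands in for whatever SMS gateway the server would use) are hypothetical and not taken from the implementation.

```python
import hmac
import secrets


class OwnershipVerifier:
    """Issues one-time codes and checks them to prove ownership of a number."""

    def __init__(self, send_sms):
        self._send_sms = send_sms   # callback: (phone_number, text) -> None
        self._pending = {}          # phone_number -> expected code

    def start(self, phone_number):
        """Generate a random six-digit code and send it to the number."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[phone_number] = code
        self._send_sms(phone_number, code)

    def confirm(self, phone_number, code):
        """Return True only if `code` matches the one sent to `phone_number`."""
        expected = self._pending.get(phone_number)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if not hmac.compare_digest(expected.encode(), code.encode()):
            return False
        del self._pending[phone_number]   # each code can be used only once
        return True


# Usage with a fake gateway that just records outgoing messages.
outbox = []
verifier = OwnershipVerifier(lambda number, text: outbox.append((number, text)))
verifier.start("+390000000000")
```

Only the holder of the phone can read the code from the text message, so a successful confirm call is evidence that the sharing user controls the number.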
We equipped the Android backup client (see Section 4.2) to access the web
services deployed on the server.
7.1.4 Running the application
We ran our application on real devices. Figure 7.2 and Figure 7.3 show some screenshots of the Android client's GUI that explain how the system works; the Symbian client and the server-side implementations are omitted. Figure 7.2(a) illustrates the backup setup features, where the user can choose which data to back up and the type (full or incremental) of the backup to be performed. Figure 7.2(b) shows the interface where the user can select a backup to be restored: note that this backup may have been performed on another device. Figure 7.2(c) depicts a granular view of a backup: in this interface the user can choose to restore just a part of the backup, or to keep only part of his/her data updated.
Figure 7.3 illustrates how to share information based on geographic coordinates. In Figure 7.3(a) the user is presented with the actions used to manage the groups with which he/she shares information; Figure 7.3(b) shows all the possible actions that can be performed by the client. If the user chooses to share data, Figure 7.3(c) is presented and he/she can share data with his/her friends or with the groups in which he/she participates. Figure 7.3(c) and (d) show the interfaces used to share data geographically: by tapping on the map the user specifies the area where a resource is visible and shares a resource from a backup. When another user of his/her group enters the area in which the shared information resides, that user is notified and the information is made available.
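The location and time restrictions can be illustrated with a short Python sketch. The shape of the shared area is not specified in the text, so a circle around the tapped point is assumed here; the dictionary keys, the function names and the coordinates are hypothetical. Distances are computed with the haversine formula on a spherical Earth.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_visible(share, lat, lon, now):
    """True if a group member at (lat, lon) at time `now` may see the resource."""
    inside_area = haversine_m(share["lat"], share["lon"], lat, lon) <= share["radius_m"]
    inside_window = share["start"] <= now <= share["end"]
    return inside_area and inside_window


# Hypothetical conference schedule, visible within 500 m of the venue
# for the three days of the event.
schedule_share = {
    "lat": 41.8554, "lon": 12.6230, "radius_m": 500,
    "start": datetime(2011, 2, 7, 8, 0), "end": datetime(2011, 2, 9, 20, 0),
}
```

A client would evaluate is_visible whenever its position changes and notify the user the first time the result becomes true.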
7.2 Extracting social network
In this section we propose an approach that allows one to obtain information about the social network of an individual by complementing the information provided by his/her (smart)phone with the data publicly available on the net.
Our approach is based on a profile graph, whose nodes are the people involved and whose (weighted) edges represent their mutual links. In a first phase, a preliminary version of the graph is built using all the information available in the backed up smartphone; later, the graph is refined by mining publicly available data from the Web. Finally, the graph is clustered to generate cliques of people.
All the phases of the process described above are performed by an integrated and interactive software tool that allows the user to rapidly recover the social network of a smartphone's owner. Merging the information coming from the Web with the information stored on the mobile device yields “clearer” results, avoiding homonymy problems and improving the precision of the link weighting.
7.2.1 Introduction
The ever-increasing spread of (Web) social networks such as Facebook, LinkedIn, Flickr, Twitter, MySpace, etc. provides an invaluable amount of publicly available personal data, but it is often difficult, if not impossible, to distinguish real friends from Web ones if the WWW is our only source.
However, things change considerably if we can access an individual's mobile device data: the two sources, the phone and the Web, together provide a precise picture of his/her social network. In some cases, if the smartphone is used to access some social networks, and therefore stores the related passwords, the picture provided can be really sharp.
The social network generated can be used to profile users. User profiles can become a key element in the creation of workgroups; productivity should improve when people already know each other and share interests outside work. Besides group generation in an enterprise, the user profile might prove helpful in other fields, including marketing, the bootstrapping of new social networking services, and Customer Relationship Management.
The added value of this approach is that intersecting the information provided by the smartphone with the information freely available on the Web makes it possible to: i) effectively filter the often too many Web contacts; ii) discover the mutual relations between the phone contacts; iii) reduce ambiguities (e.g., Facebook friends you do not really know are either filtered out, or the weight of their connection in the social network graph is very low compared with real friends present in both the smartphone and the Web data); iv) provide a “closeness” score.
The above approach, aimed at performing Mobile Identity Profiling (MIP), i.e., reconstructing a user's profile by combining the analysis of the smartphone's data with social relationship data found on the Web, is split into three stages:
1. the Smartphone Data Analysis (SDA) (Section 7.2.3);
2. the Web Data Analysis (WDA) (Section 7.2.4);
3. the Clustering Analysis (CA) (Section 7.2.5).
The goal of the process is to build the smartphone owner’s social network,
namely the profile graph, and to find all sub-graphs (clusters) which represent
the social groups within the graph.
The purpose of this section is to give the reader an idea about the effectiveness of our approach. We will discuss how the process is performed using an
example to lead the reader through all the stages.
7.2.2 Related work
To the best of our knowledge, our approach of combining three different techniques in order to reconstruct an individual's social network is novel. In this section we briefly discuss related work on the three distinct processes. The leitmotif connecting these processes is the concept of identity; throughout this section, by identity we mean “that part of the self by which we are known to others” [88]. A remarkable work on identity construction on social networks is given by Zhao et al. in [89], where the authors study identity construction on Facebook (http://www.facebook.com).
The first phase of the approach described in this section, Smartphone Data Analysis, is based on some previous works, in which we extracted information residing in mobile devices [33], [34] and analyzed this information to trace the smartphone user's activity for forensic purposes.
Focusing on Web Data Analysis, interesting results are presented by Mika et al. in [90], [91]; the authors, dealing with the problem of “bootstrapping” a Friend-Of-A-Friend (FOAF) based social network, proposed “the traditional Web as source of information about the social networks in a community”. They introduced a system for collecting social network data that fetches data from the traditional Web by mining the index of Google. As social networks spread, many “common” users put themselves on the Web and, in particular, entered information about who their friends are. We are able to extend Mika's experiment to common users thanks to the part of social network data that is publicly available on the Web and periodically crawled by search engines.
Dealing with Clustering Analysis, we focused on the identification of locally dense subgraphs that are sparsely inter-connected, also known as the paradigm of intra-cluster density versus inter-cluster sparsity (see [92], which provides an excellent overview of graph clustering). In Section 7.2.5 we provide details about the algorithms used.
7.2.3 Smartphone Data Analysis (SDA)
Smartphone data analysis aims to decode the content of a smartphone and analyse it, generating a graph that represents the interactions between the user and his/her mobile device contacts.
The decoding phase aims at generating parsers that export data in XML format or that can be integrated directly into the analysis application. The decoded data (contacts list, SMS list, event log and calendar entries list) can hardly be analysed manually by a human operator, because she has to correlate their unique identifiers in order to reconstruct situations, conversations and relationships between a device's owner and her contacts.
The Smartphone Data Analysis is composed of four sub-phases¹:
The File Analysis, which analyses the files contained in the device filesystem, organizes them by MIME type and runs the decoding tool over personal data files.
The Contact Analysis, which merges duplicate contact information and highlights those contacts that may represent a potential source of noise for the subsequent Web analysis.
The Event Analysis, which mines the phone's log to reconstruct the user's activity. Events generated by a mobile device always belong to one of the following macro-classes: voice calls, data calls, SMS/MMS sent or received, SIM change, SD change. Voice call and SMS/MMS logs are useful to reconstruct the phone owner's social activity, and are used to determine the strength of the relation between the owner and each contact.
The Messages Analysis completes the event analysis by extending it to all SMS/MMS that have been deleted from the event log but could still persist in the saved SMS/MMS list.
¹ In this section we do not deal with the calendar analysis, because it is not directly correlated with the social network discovery.
Figure 7.4: The graph representation of contacts (a) and their relationships with the phone's owner (b), which are revealed by the number of calls and number of SMS/MMS. In (c) is shown the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).
After these analysis sub-phases have been completed, the profile graph is built and the information collected is organized and stored inside it. This data structure allows us to represent the social network given by the phone interactions between the owner and the contacts. The graph generated is an undirected² graph; the weight of each edge connecting two vertices represents the strength of the connection between the user (central vertex) and a contact (other vertex).
In our graph representation the phone owner is in the center of a circle
composed by the contacts we found in her smartphone (Figure 7.4a). After the
SDA, the graph is augmented with edges from the phone’s owner and the contacts with whom she has communicated (via SMS/MMS or call). The weight of
2 An
undirected graph is a graph in which the vertices are connected by undirected edges. An
undirected edge is an edge that has no orientation.
125
these links is computed trivially as the sum between the number of calls (sent
or received) and SMS/MMS (sent and received) between the owner and the
contacts. The value of this sum is used to compute the edges length, in order
to put the most frequently contacted people closer to the owner (Figure 7.4b).
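The weighting rule described above can be sketched in a few lines of Python. This is only an illustration: the event tuples, the function names and the linear weight-to-length mapping are assumptions made for the example and are not taken from the tool itself, which only states that the sum is used to place frequently contacted people closer to the owner.

```python
from collections import Counter

def owner_edge_weights(events):
    """Return {contact: weight}, where weight = calls + SMS/MMS exchanged.

    `events` is a hypothetical list of (contact, kind) tuples; other event
    kinds (e.g. SIM or SD changes) do not contribute to the weight.
    """
    weights = Counter()
    for contact, kind in events:
        if kind in ("call", "sms", "mms"):
            weights[contact] += 1
    return dict(weights)

def edge_length(weight, max_weight, min_len=1.0, max_len=10.0):
    """Assumed linear mapping: the heavier the edge, the shorter it is drawn."""
    return max_len - (max_len - min_len) * weight / max_weight

events = [
    ("alice", "call"), ("alice", "sms"), ("bob", "mms"),
    ("alice", "call"), ("carol", "sim_change"),
]
weights = owner_edge_weights(events)
heaviest = max(weights.values())
lengths = {c: edge_length(w, heaviest) for c, w in weights.items()}
```

With this mapping the most frequently contacted person receives the shortest edge, which matches the layout shown in Figure 7.4b.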
7.2.4 Web Data Analysis (WDA)
The goal of the Web Data Analysis component is to find the social network between the phone owner and her contacts, and among the contacts themselves, by retrieving people's public information on the World Wide Web. As mentioned before, we follow the approach of Mika et al. [90] to retrieve relationships from search engine records. In this section we detail the relationship-retrieving algorithm and the techniques used to estimate the Web edge weights.
SESORR Algorithm
In order to reconstruct relationships among a phone's contacts, we used the huge amount of data collected by search engines over the years to obtain relational network data. Our approach is to submit all possible pairs of contacts, each identified by name and surname, to the search engine and to retrieve the results, i.e., the pages where the two pairs $\langle name, surname \rangle_i$ and $\langle name, surname \rangle_j$ occur simultaneously. For each query we count the number of pages found (hits) and, for each page, we save the title and the short description returned by the search engine. Moreover, the non-stopwords³ contained in titles and descriptions are counted for further analysis. To accomplish this task, we designed the SESORR (Search Engine SOcial Relationships-Retrieving) algorithm. As a preliminary examination, SESORR submits the query

\[ \langle name, surname \rangle \lor \langle surname, name \rangle \]

for each contact and stores the results in the node data structures of the graph G. In this way it is able to discard from subsequent queries the contacts which are not present on the Web (i.e., those for which the query returned a result set R = ∅). Finally, for each pair of contacts i, j, SESORR submits the following query:

\[ (\langle name_i, surname_i \rangle \lor \langle surname_i, name_i \rangle) \land (\langle name_j, surname_j \rangle \lor \langle surname_j, name_j \rangle) \]

and stores the results. Name and surname pairs are sent to the search engine enclosed within quotation marks: in this way the search engine is forced to retrieve only pages in which the search terms are adjacent. The software which implements SESORR is able to query both Google and Yahoo. After the SESORR execution, the profile graph is enriched by Web edges between the owner and her contacts, and among contacts. An example is reported in Figure 7.4c.

³ In a natural language, stopwords are function words or connectives such as articles and prepositions that do not provide useful information for our purposes.
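The two query phases of SESORR can be sketched as follows. This is a minimal illustration under stated assumptions: the `OR`/`AND` operator syntax, the function names and the `hit_count` callable (a stand-in for whatever search engine interface is used) are hypothetical and are not part of the original implementation.

```python
from itertools import combinations

def name_query(name, surname):
    """Quoted name in both orders, so that the two terms must be adjacent."""
    return f'("{name} {surname}" OR "{surname} {name}")'

def pair_query(contact_a, contact_b):
    """Conjunction of the two single-contact queries."""
    return f"{name_query(*contact_a)} AND {name_query(*contact_b)}"

def sesorr_queries(contacts, hit_count):
    """Build the pairwise queries, skipping contacts with no Web presence.

    `hit_count` is a hypothetical callable mapping a query string to the
    number of hits reported by a search engine.
    """
    on_web = [c for c in contacts if hit_count(name_query(*c)) > 0]
    return [pair_query(a, b) for a, b in combinations(on_web, 2)]

# Fake search engine for the example: one contact has no Web presence.
def fake_hit_count(query):
    return 0 if "Verdi" in query else 42

contacts = [("Ada", "Rossi"), ("Bruno", "Bianchi"), ("Carla", "Verdi")]
queries = sesorr_queries(contacts, fake_hit_count)
```

Filtering in the preliminary phase matters because the number of pairwise queries grows quadratically with the number of contacts that remain.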
For each Web edge, SESORR merges the titles and the descriptions of all result set entries into a single string and computes the frequency of occurrence of each non-stopword. These keywords and their frequencies are stored in the Web edge and are displayed to the operator when she clicks on the edge. Given a Web edge, the list of keywords and their frequencies provides a kind of "semantic vision" of the relationship, and the user is able to figure out the meaning of the relationship at a glance.
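The keyword extraction step can be illustrated with a short sketch. The stopword list below is a tiny placeholder and the tokenization rule is an assumption; the original tool may use a different list and a different tokenizer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not the one used by the tool).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "at", "to", "for", "with", "on"}

def keyword_frequencies(results):
    """Merge titles and descriptions, then count every non-stopword.

    `results` is a list of (title, description) pairs as returned by the
    search engine for one Web edge.
    """
    text = " ".join(f"{title} {description}" for title, description in results)
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)

results = [
    ("Rock band live in Rome", "The band plays at the festival"),
    ("Band of the year", "Rome festival"),
]
freq = keyword_frequencies(results)
```

Calling `freq.most_common()` then yields the keywords in decreasing order of frequency, which is the kind of list an operator would see when clicking on an edge.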
Moreover, besides title and description, SESORR stores each URL in the result set. By calculating the frequency with which each URL occurs over all relationships of a single profile, SESORR also provides a frequency distribution of the domains related to the profile and its contacts (see Figure 7.5).

Figure 7.5: Frequency distribution of URLs (domains) providing relationships.
Web-edge weight estimation
In order to measure the weight of a Web edge e = (u, v), i.e., how similar the two contacts u and v connected by the edge are, we define a function σ(e) ∈ [0, 1] which measures the similarity between the individuals u and v. In the semantic Web area, the similarity between two classes is assessed by observing the number of instances that these classes share, their individual number of instances, and the total number of instances they contain. The most frequently used metrics are the following:

Jaccard index [93]: between two sets X and Y, it is defined as the ratio between the size of the intersection and the size of the union of the two sets being compared:

\[ \sigma(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \]

Normalized Google Distance (NGD) [94]: it takes advantage of the number of hits returned by Google to compute the semantic distance between concepts. Given two search terms x and y, the normalized Google distance between x and y, NGD(x, y), can be obtained as follows:

\[ \mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}} \]

where f(x) is the number of Google hits for the search term x, f(y) is the number of Google hits for the search term y, f(x, y) is the number of Google hits for the two search terms x and y together, and M is the number of Web pages indexed by Google (approximately ten billion pages).
In our preliminary experiments, we measured Pearson's correlation between the Jaccard index and the NGD; the results were (approximately) in the range 0.3 to 0.4, thus exhibiting a small-to-medium correlation. The software allows the user either to choose between the metrics, or to combine them by providing relative weights.
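Both metrics can be computed directly from hit counts, as the following sketch shows. Several points are assumptions made for the example: the Jaccard index is evaluated on hit counts by taking f(x, y) as the intersection and f(x) + f(y) − f(x, y) as the union; the NGD, which is a distance, is turned into a similarity as max(0, 1 − NGD); and the weighted combination is a plain normalized average. The text does not specify how the tool performs these last two steps.

```python
import math

def jaccard_from_hits(fx, fy, fxy):
    """Jaccard index with |X ∩ Y| = f(x, y) and |X ∪ Y| = f(x) + f(y) - f(x, y)."""
    union = fx + fy - fxy
    return fxy / union if union > 0 else 0.0

def ngd(fx, fy, fxy, m=1e10):
    """Normalized Google Distance; m is the assumed number of indexed pages."""
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # the terms never co-occur: maximal distance
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(m) - min(lx, ly))

def combined_weight(fx, fy, fxy, w_jaccard=0.5, w_ngd=0.5, m=1e10):
    """Assumed combination: weighted mean of Jaccard and (1 - NGD) clipped at 0."""
    total = w_jaccard + w_ngd
    if total <= 0:
        raise ValueError("at least one relative weight must be positive")
    ngd_similarity = max(0.0, 1.0 - ngd(fx, fy, fxy, m))
    jaccard = jaccard_from_hits(fx, fy, fxy)
    return (w_jaccard * jaccard + w_ngd * ngd_similarity) / total
```

For instance, with f(x) = f(y) = 100, f(x, y) = 10 and M = 10^6, the NGD is ln 10 / (4 ln 10) = 0.25.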
7.2.5 Clustering Analysis (CA)
At the final analysis stage, we want to identify subgroups of contacts sharing
similarities. Generally speaking, the goal of clustering is to group together
similar elements and thereby to identify the skeleton structure of the input
data.
Figure 7.6: Contact-to-cluster assignment.
In this section we employ clustering techniques to split the phone owner's social graph into small subgraphs (clusters). We chose spectral algorithms, i.e., algorithms based on spectral properties of the matrices associated with the input graph, because i) they are general and versatile, and ii) they have proved effective in identifying locally dense subgraphs that are sparsely interconnected. In particular, we used the Spectral [95] and FullSVD [96] algorithms; both are based on the Singular Value Decomposition (SVD) of the adjacency matrix of the input graph.
Spectral. This algorithm, introduced in [95], where it was called Spectral Algorithm I, is essentially a projection onto the first k right singular vectors. The intuition behind this technique is that the matrix A describes the location of m points in an n-dimensional space. The projection onto the subspace defined by the top k right singular vectors gives the best rank-k approximation of A.
FullSVD. Drineas et al. studied in [96] k-means and its continuous version. While the discrete version is known to be NP-hard, the continuous one can be solved efficiently using a projection onto the top k left singular vectors. Similarly to the Spectral algorithm, the cluster assignment is a discretization of the continuous solution. We refer to this method as FullSVD in order to avoid ambiguities with the SVD computation, which is the core of all these algorithms.
Both algorithms output a matrix C whose rows are indexed by the nodes (contacts) and whose columns are indexed by the clusters. Intuitively, each cell represents the weight of the "closeness" between a contact and a cluster. We assign a node to the cluster whose cell has the maximum absolute value in the node's row. Figure 7.6 shows a screenshot with the details of the C matrix and, for each contact, the chosen cluster.
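The assignment rule can be written in one line. The sketch below assumes that C is available as a list of rows (one per contact); it does not compute the SVD itself, and the tie-breaking choice (lowest cluster index) is an assumption.

```python
def assign_clusters(C):
    """For each row (contact) return the column (cluster) index whose entry
    has the largest absolute value; ties go to the lowest index."""
    return [max(range(len(row)), key=lambda j: abs(row[j])) for row in C]

C = [
    [0.90, -0.10, 0.00],
    [0.20, -0.70, 0.10],
    [0.05, 0.30, -0.60],
]
assignment = assign_clusters(C)
```

Note that the absolute value matters: the second contact is assigned to the second cluster even though the corresponding entry is negative.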
It is important to mention that, in both clustering algorithms used, k, i.e., the number of clusters, is an input parameter; the quality of the results heavily depends on a good choice of its value. In the literature there are several measures to assess the quality of a clustering [92], and our tool can compute many of them, thus providing feedback to the operator.
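As an example of such a measure, coverage is commonly defined as the fraction of the total edge weight that falls inside clusters. The sketch below follows this common definition; whether the tool computes coverage in exactly this way is an assumption, and the data structures are hypothetical.

```python
def coverage(edges, cluster_of):
    """Fraction of total edge weight carried by intra-cluster edges.

    `edges` is a list of (u, v, weight) triples and `cluster_of` maps each
    node to its cluster index.
    """
    total = sum(weight for _, _, weight in edges)
    intra = sum(weight for u, v, weight in edges if cluster_of[u] == cluster_of[v])
    return intra / total if total else 0.0

edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 3.0)]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
cov = coverage(edges, cluster_of)
```

A value close to 1 means that almost all of the edge weight lies within clusters; putting all nodes in a single cluster trivially gives coverage 1, which is why coverage is usually read together with other indexes.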
Figure 7.7 reports some snapshots depicting, for each algorithm and for each Web edge weight metric, the behavior of the clustering quality indexes as k varies.

[Figure 7.7 panels: spectral (unweighted), spectral (Jaccard), spectral (Google Similarity), SVD (unweighted), SVD (Jaccard), SVD (Google Similarity). Horizontal axis: k (0 to 70). Vertical axis: 0.0 to 0.8. Plotted indexes: coverage, performance, inter-cluster conductance, intra-cluster conductance.]
Figure 7.7: Clustering metrics trends. The profile graph, used in the example, has 218 contacts and 1242 Web edges; the black vertical line is relative to k = 10, the chosen value for the input parameter k.

[Figure 7.8 cluster labels: University colleagues, Musician friends, Friends (from Facebook), Work colleagues, Family members.]
Figure 7.8: The final result of the whole process: the social network clusters.
7.2.6 The Final Result: The Social Network
After the three stages described in the previous sections, the tool is able to produce a graphical view of the clusters, shown in Figure 7.8. Here, in each cluster, the phone's owner is represented by a black node.
It is interesting to notice that, by looking at the Web edge keywords of each cluster, we have been able to infer the area of interest shared by the individuals in that cluster.
Furthermore, the graph structure of each cluster may provide an intuition about the mutual relationships of the people involved. For example, from the "work colleagues" cluster, which is far from being a complete graph, it is possible to see "who works with whom"; even more interestingly, inside the "musician friends" cluster we see a complete subgraph of five nodes that corresponds to the members of a rock band, only one of whom actually plays with the phone's owner (together with the other people/nodes shown in the cluster).
It is important to emphasize that all the above information, together with everything shown in Figure 7.8, has been obtained from a smartphone and our tool, which is able to mine Web data, with no additional information available.
7.3 Conclusions
Our profiling method relies on information stored in the smartphone and its
precision depends on the quantity and quality of such data.
Since the method we used to find a person on the Web, and her relationships with other phone contacts, relies on her first name and last name, precision strongly depends on the care taken by the owner when she entered each first name and last name. Sometimes only the first name or the nickname of a contact is entered (e.g., for the most intimate contacts); submitting such a weak identity to a search engine produces no results or useless ones. To deal with this aspect, the framework pre-processes all contact entries (e.g., it highlights entries whose name or surname is missing) and suggests that the operator identify the contacts (where possible) and enter their correct names and surnames.
An obvious limitation concerns the time frame that we can reconstruct. The event log stored in the device is limited to a fixed size, which restricts the view of the user activity to the most recent events. The limit on the number of stored sent and received SMS/MMS, if set, also reduces precision.
We were not able to perform name-surname/number matching, and we can only access the log of the operations performed by the user in the most recent period, since we have no access to the mobile operators' customer data. Even with this limited amount of information the results look interesting, as we were able to generate clusters that map the real-life relationships between the user and her friends.
8 Conclusions and Future Work
Conclusions
In this thesis we proposed a fully integrated solution which aims at solving the backup and synchronization problem in mobile environments. We propose to focus the management of the information on how the information is logically structured (e.g., a contact has a name, a surname, phone numbers, ...). Our solution delegates to backup client applications installed on mobile devices the task of extracting data from the internal data stores of the device and sending these data to a remote server in a common format. The proposed approach is based on three main parts:
The first part of the system is a client application installed on the mobile device. The client extracts the information from the internal storage of the device and sends it to the backup server using an extensible common format (i.e., XML). Moreover, the client is responsible for getting information from the server and restoring it into the device.
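To make the idea of a common XML format concrete, the following sketch serializes a contact and parses it back. The element and attribute names (`contact`, `name`, `surname`, `phones`, `phone`, `type`) are hypothetical and chosen only for illustration; they are not the schema defined in this thesis.

```python
import xml.etree.ElementTree as ET

def contact_to_xml(contact):
    """Serialize a contact dictionary into a (hypothetical) XML representation."""
    root = ET.Element("contact")
    ET.SubElement(root, "name").text = contact["name"]
    ET.SubElement(root, "surname").text = contact["surname"]
    phones = ET.SubElement(root, "phones")
    for kind, number in contact.get("phones", []):
        ET.SubElement(phones, "phone", {"type": kind}).text = number
    return ET.tostring(root, encoding="unicode")

def xml_to_contact(xml_text):
    """Inverse operation, as a restore step on the client would need."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "surname": root.findtext("surname"),
        "phones": [(p.get("type"), p.text) for p in root.iter("phone")],
    }

contact = {"name": "Ada", "surname": "Rossi", "phones": [("mobile", "+390000000000")]}
payload = contact_to_xml(contact)
restored = xml_to_contact(payload)
```

Because the format describes the logical structure of a contact rather than a platform-specific record layout, the same payload could in principle be restored on a device running a different operating system.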
Client applications have been implemented differently depending on the availability of APIs on the specific platform. On Android and newer Symbian devices we have been able to access the data stores of the device via standard APIs. Moreover, on Android under some conditions (rooted device) it has been possible to access even application settings. Unfortunately, accessing data directly in the data stores (in particular with write permissions) was not possible for the client applications implemented for older versions of Symbian OS and for Microsoft Windows Mobile 5 and 6. These applications back up the full system, and the backup is later analyzed on the backup server.
All the applications developed interact with the server using the proposed common format, in order to grant interoperability between vendors and operating systems.
The second part of the system is the backup server. The server implements the functionalities of receiving data from the client application, handling these data and storing the information into a common database. In case of restore, the backup server provides the clients with access to the internally stored information; access is given while granting privacy and security to users.
We implemented the backup server to be as standard, scalable and extensible as possible. The basic idea is that our backup server should use a standard communication protocol that can be exploited by every class of device. Experimental results have shown that mobile clients running on different architectures/operating systems can interact with the proposed server via HTTP/HTTPS, accessing all the features provided.
The backup server has also been enabled to extract personal data from old Symbian raw backups. We proposed a methodology to reverse engineer the data stores where personal data are contained and to implement the parsers that extract these data.
The third component of the system consists of the services on backup data. These services can be provided by the same provider of the backup, or by other authorized service providers.
In this thesis we have proposed two kinds of services: one focused on end users and another on businesses and administrators of the system. The first service gives users the capability of sharing part of the data in their personal backup with selected contacts of their choice. The second service implements a social network extractor which, starting from backup data and data publicly available on the Web, generates a social network and the cliques of contacts in the backup; this is done by clustering the various groups of interconnected contacts.
The proposed approach has been considered by Telecom Italia for use in the cubovision project, to implement the set-top box backup operations. Part of the information stored by the user in his/her set-top box device is saved on a remote backup server. The set-top box device mainly contains video/audio files, but there are some other contents, such as applications installed and configured by the user, that are backed up using some of the ideas presented in this thesis.
Future work
Currently we are implementing Apple iPhone and RIM BlackBerry client applications able to interact with the system and implementing all the features of the Android application. Moreover, we are extending the Android application to improve its usability and performance.
We are also participating in the Ericsson Application Awards 2011¹ with the shared backup idea and with the improved Android application equipped with augmented reality.
The social network extraction tool described in Section 7.2 is being improved to generate the social network of the whole backup system; when used by a sufficient number of people, this will solve the name/surname problem. In fact, the mobile phone number can be considered a unique identifier and allows the system to disambiguate homonyms or to merge contacts that are named differently in different backups.
The merge could be done considering sets of names and surnames (data sets containing common names and surnames can be found on the Web); matching the name and surname fields of a contact against commonly used ones will help to ignore nicknames and will make the system more precise.
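The idea of using the phone number as a unique identifier can be sketched as follows. The normalization rule (stripping separators, turning a leading "00" into "+", and assuming the Italian prefix "+39" for national numbers) and all names and numbers are assumptions made for the example; a production system would need a proper phone number parser.

```python
from collections import defaultdict

def normalize_number(number, default_prefix="+39"):
    """Reduce a phone number to a canonical international form (simplified)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if number.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    return default_prefix + digits  # assumed national number

def merge_by_number(backups):
    """Group the labels used in different backups under the same number.

    `backups` is a list of address books, each a list of (label, number).
    """
    merged = defaultdict(set)
    for address_book in backups:
        for label, number in address_book:
            merged[normalize_number(number)].add(label)
    return dict(merged)

backups = [
    [("Mum", "333 1234567")],
    [("Maria Rossi", "+39 333 1234567")],
    [("M. Rossi", "0039-333-1234567"), ("Bob", "+44 20 7946 0000")],
]
merged = merge_by_number(backups)
```

Once the labels are grouped, matching them against a list of common names and surnames, as suggested above, would allow the system to prefer "Maria Rossi" over the nickname "Mum" as the identity submitted to the search engine.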
New services on backup data can be provided to users and administrators, such as the integration with other systems like connected TVs, set-top boxes, laptops and tablet devices.
We designed the server part of the system to be highly extensible; a large number of services using backup data can be provided. This opens the project to a plethora of novel ideas; in this thesis we described two use cases just to show how such an extensible backup system can be exploited by service providers.
¹ http://www.ericssonapplicationawards.com/
A The Symbian S60 format
A.1 Address book
Type flag    Meaning
04024008     fax
2402C003     job
24028004     mobile (work)
0402000D     video
0C02C00D     wv user ID
04024009     po.box
0402400A     city
14024002     extension (home)
14024003     state (home)
24024006     street (work)
24024007     country (work)
14020001     H fax
04140280     home
1C02C00C     nickname
1402400D     video (home)
0402C008     url
04028009     extension
0402800A     state
14028002     street (home)
14028003     country (home)
24028006     postal code (work)
3C02000B     DTMF
2402C004     W fax
0402C007     mobile
04028008     ?
2402800D     video (work)
14028001     url (home)
0402C009     street
0402C00A     country
1402C002     postal code (home)
2402C005     po.box (work)
2402C006     city (work)
3402800B     note
04028007     general
1402C000     mobile (home)
04020008     pager
24020004     work
24024005     url (work)
0402000A     postal code
14020002     po.box (home)
14020003     city (home)
24020006     extension (work)
24020007     state (work)

Table A.1: Possible values for the rows of table "DATA TYPE TABLE". They describe the type of attributes present in the "DATA BLOCK". (Symbian S60 v2)
Contacts and their data are stored in the Contacts.cdb file (located under C:\System\Data). During the iterations of the methodology, we found that contact data were fragmented and spread across the entire file. In fact, after a contact update, Symbian preserves the old contact entry and appends a new one at the end of the file with the same ID but fresh data. When the system performs a DB compression, obsolete entries are purged. After a first analysis, we found that the data could be grouped in three macro-areas (parts, see Table A.2). For each contact, the three parts are connected because they share the same contact ID. The first part stores metadata about each contact and a block containing attributes like phone/fax/mobile numbers, snail mail address and notes:
D2
10
FF
20
30
64
76
76
04
00
1D
04
14
04
04
1A
20
33
64
00
09
30
30
31
12
12
00
00
00
13
30
66
32
9D
9D
00
1F
00
00
65
66
37
FA
FA
00
10
31
61
36
0F
0F
00
00
02
02
00
00
00
C0
00
00
00
07
00
04
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00
FF FF FF FF
32
39
20 E1 00
20 E1 00
00
00
33 38 38 37 36 35 34 32 33
VD
ID
CXF1
20 30 30 65 31 32
30 30 66 66 61 39
64 31 32 37 36
EDIT DATE
CREATION DATE
04 00 00 00 00 00
00 00 1F
TYPE TABLE LEN
|
TYPE TABLE
|
04 00 00 00 00
DATA BLOCK LEN
FIELD FLAG
DATA BLOCK
The second part stores the contact's name, surname and company:

10 00 00 00                   ID
12                            NAME LENGTH
50 61 70 E0 20 43 65 6C 6C    NAME
09 13 00 10                   CXF1
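As a minimal sketch, the second-part record shown above can be decoded as follows. The function name is hypothetical, only the fields visible in this example are handled (no surname or company), and the Latin-1 text encoding is an assumption suggested by the byte E0 in the name; the length byte is interpreted in nibbles, as described in Table A.2.

```python
import struct

def parse_name_record(data):
    """Decode ID, name and end flag from a second-part contact record.

    Layout assumed from the example: 4-byte little-endian ID, 1-byte name
    length expressed in nibbles, the name itself, then a 4-byte end flag.
    """
    contact_id = struct.unpack_from("<I", data, 0)[0]
    name_len = data[4] // 2              # nibbles -> bytes
    name = data[5:5 + name_len].decode("latin-1")  # encoding is an assumption
    end_flag = data[5 + name_len:5 + name_len + 4]
    return contact_id, name, end_flag

record = bytes.fromhex("10 00 00 00 12 50 61 70 E0 20 43 65 6C 6C 09 13 00 10")
contact_id, name, end_flag = parse_name_record(record)
```

Here the length byte 12 (hexadecimal) corresponds to 18 nibbles, i.e. the 9 bytes of the name, and the ID 10 00 00 00 read as little-endian is 16.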
The third part stores email addresses:

1C                                                       EMAIL LENGTH
32                                                       EMAIL ID
00 00 00 03                                              EMAIL FLAG
10 00 00 00                                              ID
24                                                       EMAIL ADDRESS LEN
6F 64 6F 6D 65 6E 69 63 40 6C 69 62 65 72 6F 2E 69 74    EMAIL ADDRESS
A.2 Calendar
Calendar entries are stored in the Calendar file (located under C:\System\Data). A calendar entry belongs to one of the following categories: anniversary, meeting or note. A sample calendar entry, an anniversary without alarm, is shown below:
03
0F
0A
52
02
01
A4
05
AA
01
08
20
41
61
0E
AA
AA
00 00 00
00 00 00
00 00
28 52 03
28 A2 AC
00 00 01 00
00 00
6E 6E 69 76 65 72 73
72 79 41 6F 66 66
20 29
28
28
FT
VS
AF
BL
ID
Flag1
CD (GG MM)
05
AD A2 AC
Flag3
Flag2
TL
Text
|
ET
SD
ED
An example of an anniversary with the alarm set to on is shown below:
03
0F
1A
52
02
01
A4
05
AA
01
3C
43
72
09
08
1E
41
61
0E
AA
AA
00 00 00
00 00 00
00 00
28 5B 03
28 A2 AC
00 00 01 00
61
6D
01
00
6C 65 6E 41 6C 61
53 6F 75 6E 64 32
00
00
6E 6E 69 76 65 72 73
72 79 41 6F 6E
20 29
28
28
FT
VS
AF
BL
ID
Flag1
CD (GG MM)
05
AD AT
VS2
ANL
ATXT
|
09 01 00
Flag2
TL
Text
|
ET
SD
ED
An example of a meeting with the alarm set to off is shown below:
00
0F
0A
50
02
01
A4
01
A6
A8
01
08
1A
4D
0E
A6
A6
00 00 00
00 00 00
00 00
28 C6 02
28
28
00 00 00 00
00 00
65
20
28
28
65 74 41 6F 66 66 52 64 61 79
29
64 05
91 05
FT
SF 00 00 00
AF
50
ID
Flag1
CD (GGMM)
RF
ERD
A8 28
01 00 00 00 00
Flag2
TL
Text
ET
SD
ED
An example of a meeting with the alarm set to on is shown below:
00
0F
1A
50
02
01
A4
01
A5
A6
01
3C
43
72
05
08
18
4D
0E
A5
A5
00 00 00
00 00 00
00 00
28 34 03
28
28
00 00 00 00
61
6D
00
00
6C 65 6E 41 6C 61
53 6F 75 6E 64 AF
00
00
65
20
28
28
65 74 41 6F 6E 52 64 61 79
29
EC 01
90 03
FT
SF 00 00 00
AF
50
ID
Flag1
CD (GGMM)
RF
ERD
A6 28
01 00 00 00 00
RTN LEN
RTN
|
05 00 00
Flag2
TL
Text
ET
SD
ED
Some meetings are saved in a different way; we call this kind of entry a special meeting. A special meeting with the alarm set to off and without repetition is shown below.
00
0F
08
50
02
01
A4
08
1C
53
66
0E
A6
A6
00 00 00
00
00
28
00
00 00
00
7A 03
00
6D
66
20
28
28
65
52
29
64
91
65 74 41 6F
6F 66 66
05
05
FT
SF 00 00 00
AF
50
ID
Flag1
CD (GGMM)
Flag2
TL
Text
|
ET
SD
ED
An example of a special meeting with the alarm set to on and repetition set to off is shown below:
00
0F
18
50
02
01
A4
3C
43
72
05
08
1A
53
6E
0E
A6
A6
00 00 00
00 00 00
00 00
28 9C 03
61
6D
00
00
6C 65 6E 41 6C 61
53 6F 75 6E 64 AF
00
00
4D
52
20
28
28
65
6F
29
94
C1
65 74 41 6F
66 66
02
02
FT
SF 00 00 00
AF
50
ID
Flag1
CD (GGMM)
RTN LEN
RTN
|
05 00 00
Flag2
TL
Text
|
ET
SD
ED
Notes are stored in a format similar to that of special meetings. An example of a note is shown below:
02
0F
08
52
02
01
A4
08
10
44
0E
A5
A5
00 00 00
00
00
28
00
00 00
00
6F 03
00
61 79 4E 6F 74 65
20 29
28
28
FT
VS
AF
BL
ID
Flag1
CD {ID2}
Flag2
NL
Text
ET
SD
ED
A.3 Events log
Events and status changes are stored in the Logdbu.dat file (located under C:\System\Data), and can belong to the following categories: SMS, MMS, voice and data calls, SIM changes. In the last case, the event is stored as an SMS, so we will not examine it. Details about the fields are reported in Table A.4.
An example of an SMS is shown in the following:
03
BA
FF
B2
60
1C
52
6F
04
80
44
54
20
45
4E
65
1A
2B
XX
18
00
00
00
00
00
A3 EB EB B6 2C E1 00
16 00 00
61 6D 6F 6E 61 20 4D
72 65 74 74 69
00 05 00 33
4F
49
55
4D
41
20
4D
20
53
45
2C
6F
41
49
43
20
6F
67
4E
4E
49
58
76
49
56
52
20
76
20
49
45
55
69
53
54
20
4E
61
45
4F
49
41
6D
52
20
4E
20
65
41
41
53
43
6E
33 39 XX XX XX XX
XX XX XX XX XX
00
00
00
01
00
00
00
00
01
00
00
00
00
00
00
00
00
00
00
A4
A voice call example is shown below:
20
44
49
45
74
03
DATE
FF
NAME FLAG
|
60 16 00 00
NAME LENGTH
NAME
04 00 05 00 33
MESS LENGTH
|
|
MESS
|
|
|
NUMBER LENGHT
NUMBER
|
18
DIRECTION
00 00 00 01 00
00 00 00 00 00
00 00 00 00 00
00 01 00 00 00
00
00
00
A4
00
1F
01
70
63
1C
52
6F
01
00
00
67
1A
2B
XX
20
17
66 4A EF C8 2C E1 00
16 00 00
61 6D 6F 6E 61 20 4D
72 65 74 74 69
00 00 00
02 36
33
XX
03
00
39
XX
00
00
XX XX XX XX
XX XX XX
00 02 A6
A4
00
DATE
01
NAME FLAG
63 16 00 00
NAME LENGHT
NAME
|
01
DIRECTION
CALL TIME
67 02 36
NUMBER LENGHT
NUMBER
|
20 03 00 00 02 A6
17 00 00 A4
An MMS example is reported below:
05
41
01
F0
00
0E
54
00
02
30
36
5A
47 86 12 05 2A E1 00
00 00 00
69 6D 20 6D 6D 73
00 07 00 00 00
00
08
34 31 38 2C 37 32 37
05
DATA
01
F0
00 00 00 00
0E
PROVIDER
00 00 07 00 00 00
02 00
NS
NUMBER
5A
A data call, or data traffic log entry, may belong to one of two categories, depending on the storage format used: mms-type and sms-type. The text body of the SMS or MMS is used to store the content of a single packet. An example of an sms-type data call is reported below:
03
C1
FF
A2
00
7E
61
7A
69
4D
69
0A
34
18
36 5F 21 08 2A E1 00
03
04
50
72
69
76
41
61
00
00
65
65
6F
61
49
72
00
33
72
20
20
72
4C
65
00 03
20
69
64
65
20
75
6C
65
20
65
74
20
76
41
20
69
73
69
6C
61
39 30 30 31
00 00
6C
65
20
69
73
69
72
61
63
73
7A
76
74
65
6F
7A
69
74
20
63
03
DATA
FF
A2 03 00 00 00 03
00 04 00 33
|
|
PACKET CONTENT
|
|
|
SEP MES NUM
34 39 30 30 31
FLAG END
A.4 SMS
SMS are stored in the first folder (assuming that the folders are ordered alphabetically) of the /System/Mail folder. An example of a received message is reported below:
68 3C 00
...
00 10 00
25 3A 00
0C 52 69
0E
20 29 34
00 10 45
01 00 00
01
00 00 00
00 01 00
FA 54 17
28
33 34 39
44
69 73 74
00 00 00
00 00 00
00 00 01
00 00 00
F1 5A 15
02 91
34
2B 33 39
38 35 38
15 00 81
28
33 34 39
10 68 3C
00 00 00
10
63 65 76
18
04 01 00
00 02 00
00 00 00
00 00
46 F2 29 E1 00
34 36 37 37 31 34 36 34
65
00
02
00
00
41
66
00
03
00
61 6E 6F 20 41 6C 65
02
00
00
F2 29 E1 00
33 32 30 35
35 30 30
34 36 37 37 31 34 36
68 3C 00 10 68 3C
...
00 10 00 00 00 00
flag1
Text
ET
Received Flag Mar
00 10 45 04 01 00
01 00 00 00 02 00
Received Flag
00 00 00 00 00 00
00 01 00 00 00
Date
SNL
Number
NL
Name
00 00 00 00 00 02
00 00 00 02 03 00
00 00 01 00 00 00
00 00 00 00
SCRD
SCF
SCNL
SCN
|
ESNF
ESNL
ESN
An example of a sent message is reported below:
68 3C 00
...
00 10 00
25 3A 00
10
49 6E 76
0E
20 29 34
00 10 F5
00 00 00
00
00 02 03
01 00 00
00 00
00 0C 09
00 91
34
2B 33 39
30 30 30
04 91
34
2B 33 39
36 37 37
00 00
00 93 2D
10 68 3C
00 00 00
10
69 61 74 6F
18
02 01 00
00 00 00
00 00 00
00 01 00
22 F2 1C 32 E1 00
33 34 39 32
38 39 38
33 34 39 34
31 34 36
4B F2 29 E1 00
68 3C 00 10 68 3C
...
00 10 00 00 00 00
flag1
TL
Text
ET
Received Flag Mar
00 10 F5 02 01 00
00 00 00 00 00 00
Received Flag
00 02 03 00 00 00
01 00 00 00 01 00
00 00
Date
flag2
UNL
UN
|
RNF
RNL
RN
|
00 00
SCRD
Field Name
VD
ID
CXF1
Size (Bytes)
Description
Example
D2
10
FF
FF
76
E1
76
E1
1D
64
00
09
FF
12
00
12
00
00
02
00
02
00
00
Stores the size in bytes of DATA BLOCK
04
14
00
04
00
04
1A
EDIT DATE (ED)
8
CREATION DATE (CD)
8
(Uncertain) Used by DB indexing.
Contact identifier (stored as little-endian).
Flag. The first byte has to be equal to the last four. The second byte depends on the Symbian version; values can be 09, 0B, 10. "13 00 10" is constant.
Located at a 17-byte offset from CXF1. Represents the number of microseconds from year zero.
Represents the number of microseconds from year zero.
TYPE TABLE LEN
(T T L)
TYPE TABLE (T T )
1
Located at a 41-byte offset from CXF1. Stores the length (in bytes) of the TYPE TABLE.
DATA BLOCK LEN
(DBL)
FIELD FLAG (F F )
DATA BLOCK (DB)
ID
NAME LENGTH (NL)
NAME
SURNAME LENGTH
(SL)
SURNAME
COMPANY NAME LENGTH
(CNL)
COMPANY NAME
(CN)
CONTACT END
EMAIL LENGTH (EL)
EMAIL ID
EMAIL FLAG
ID
EMAIL ADDRESS LENGTH
(EFL)
EMAIL ADDRESS
2
4
9
TTL
1
2
DBL−2F F −1
2
4
1
NL
2
Table of 12-byte rows describing the types of the corresponding data in the DATA BLOCK. The first 5 bytes are the table start flag. The last 5 bytes are the table end flag. The first 12-byte row does not contain useful information. For further information about data types, see Table A.1.
00 00
13 00 10 FF
FF
9D FA 0F 20
9D FA 0F 20
00
00
00
C0
00
00
00
00
00
07
00
00
04
00
00
00
00
00
00
00
00
00
Flag which is repeated as many times as the number of fields in DATA BLOCK.
20 00
Stores contact’s information, according to fields type described in
TYPE TABLE. Each field is separated by 00.
33 33 38 38 37 36
35 34 32 33
Contact identifier (stored as little-endian) - The same as above.
The size in nibbles of the name field.
The contact’s name
1
The size in nibbles of the surname field.
10 00 00 00
0E
43 6C 61 75 64 69
6F
08
SL
2
1
The contact’s surname
The size in nibbles of the company field.
43 65 71 61
0C
1
The contact’s company
44 72 2E 77 68 79
4
Flag. Denotes the end of a contact’s details. The first byte depends on Symbian
version (09, 0B, 10, as in CXF1 field). The other bytes are constant.
09 13 00 10
The size in nibbles of the email address block.
The ID of email address
Flag.
Contact identifier (stored as little-endian) - The same as above.
The size in nibbles of the email address string.
1C
32
00 00 00 03
10 00 00 00
24
The email address string.
6F 64 6F 6D 65 6E
69 63 40 6C 69 62
65 72 6F 2E 69 74
1
EL
2
4
4
1
EF L
2
Table A.2: This table lists all contact’s data which can be found in the Contacts.cdb.
Since data are located in three logical file areas, the table is split in three parts.
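The EDIT DATE and CREATION DATE fields are described above as a number of microseconds from year zero. The sketch below converts such a value to a calendar date. It rests on two assumptions that are not stated in the text: the eight example bytes are read as 76 12 9D FA 0F 20 E1 00 and interpreted as a little-endian integer, and the Unix epoch is taken to correspond to the value 62168256000000000, the constant commonly quoted for Symbian timestamps. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed Symbian timestamp (microseconds from year zero) of 1970-01-01 00:00:00.
UNIX_EPOCH_IN_SYMBIAN_US = 62_168_256_000_000_000

def symbian_time_to_datetime(raw):
    """Convert an 8-byte little-endian microsecond counter to a datetime (UTC)."""
    microseconds = int.from_bytes(raw, "little")
    return datetime(1970, 1, 1) + timedelta(
        microseconds=microseconds - UNIX_EPOCH_IN_SYMBIAN_US
    )

edit_date = symbian_time_to_datetime(bytes.fromhex("76 12 9D FA 0F 20 E1 00"))
```

Under these assumptions the example value falls in late December 2007, which is a plausible date for a contact stored on a Symbian S60 v2 device; a different epoch convention would shift the result by a few days.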
Field Name
Size (Bytes)
FT
1
VARIABLE SEQUENCE(V S)
4
ALARM FLAG (AF )
1
BODY LENGTH (BL)
ID
Flag1
CREATION DATE (CD)
DAY (GG)
MONTH (M M )
REP FLAG (RF )
1
4
3
4
2
2
1
REPEAT UNTIL
2
ANNIVER DATE (AD)
2
ALARM TIME (AT )
2
VAR SEQ
5
AL NAME LEN (AN L)
AL NAME (AN )
Flag2
TEXT LENGTH (T L)
TEXT
1
AN L
4
3
1
TL − 1
2
END TEXT (ET )
START DATE (SD)
3
2
START DATE M
(SDM )
END DATE (ED)
4
END DATE M (EDM )
4
2
Description
Example
Indicates the event type: 00 is a meeting, 02 is a day note, 03 is an anniversary.
A four-byte variable sequence. If the entry represents a meeting and the first byte is 10, the meeting needs to be processed in a different way.
This byte indicates whether the alarm is set to ON (1A for normal events, 18 for special meetings) or OFF (0A for normal events, 08 for special meetings).
Represents the length, in nibbles, of the following part of the entry.
Calendar entry identifier (stored as little-endian).
Indicates a calendar entry in this area.
Stores the creation date of the calendar entry; it is composed of GG and MM.
Represents the day part of the CD field.
Represents the month part of the CD field.
Appears only for meeting entries; indicates whether the repetition of the meeting is daily (value 01), weekly (value 02) or monthly (value 03).
Appears only for meeting entries; indicates the date until which the event has to be repeated.
Stores the date of the event as an integer counting the number of days since 1-1-1980. This field appears only if the entry type is anniversary.
If the alarm is set to on, stores the alarm time; otherwise it is unused. For day notes this field does not appear.
A variable sequence. In the case of an anniversary the first 3 bytes are 01 00 00; if the anniversary's alarm is set to off the last two are 01 00; they may vary, but their value is always less than 32.
Indicates the size in nibbles of the ALARM NAME field.
Stores a text field indicating the ringtone name for the alarm.
00
A flag characterizing a calendar event.
Indicates the size in nibbles of the TEXT field.
Stores the text field of the calendar entry.
The end flag of the TEXT field; the value is always 0E 20 29.
Stores the starting date of the entry if it is a note or an anniversary; otherwise it does not appear.
Stores the starting date of the entry if it is a meeting; otherwise it does not appear.
Stores the ending date of the entry if it is a note or an anniversary; otherwise it does not appear.
Stores the ending date of the entry if it is a meeting; otherwise it does not appear.
0F 00 00 00
0A
52
02
10
A4
A4
52
01
00 00 00
00 00
28 52 03
28
03
A5 28
AA 28
A2 AC
01 00 00 01 00
3C
43
6C
75
08
20
41
72
6F
0E
A5
61
61
6E
00
6C 65 6E 41
72 6D 53 6F
64 32
00
6E
73
66
20
28
6E 69 76 65
61 72 79 41
66
29
A5 28 EC 01
A5 28
A5 28 90 03
Table A.3: This table lists the fields of all calendar entries (notes, meetings and anniversaries)
stored in the Calendar file.
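As a rough illustration of how a few of the fields in Table A.3 might be interpreted, the following Python sketch maps the FT byte to an event type, converts a length expressed in nibbles into bytes, and turns a day count since 1-1-1980 into a calendar date. The function names are hypothetical and not taken from the prototypes; the helpers take already-decoded integers, so no byte order is assumed for the multi-byte fields, and rounding an odd nibble count down is an assumption.

```python
from datetime import date, timedelta

# Event types for the FT byte, as listed in Table A.3.
EVENT_TYPES = {0x00: "meeting", 0x02: "daynote", 0x03: "anniversary"}

# Day 0 of the counter described for the ANNIVER DATE field.
EPOCH = date(1980, 1, 1)


def event_type(ft: int) -> str:
    """Return the event type for an FT byte, or 'unknown' if not listed."""
    return EVENT_TYPES.get(ft, "unknown")


def nibbles_to_bytes(length_in_nibbles: int) -> int:
    """Convert a length in nibbles (half-bytes) to bytes (assumption: round down)."""
    return length_in_nibbles // 2


def days_to_date(days: int) -> date:
    """Convert a number of days since 1-1-1980 into a calendar date."""
    return EPOCH + timedelta(days=days)
```

For instance, a length byte of 1C (28 nibbles) would correspond to a 14-byte field under these assumptions.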
Field Name
Size (Bytes)
Description
Example
COMMON PART
START DATA FLAG
(SDF )
1
DATE
8
END DATA FLAG
(EDF )
NAME FLAG (N F )
1
NAME LENGTH
(N L)
NAME
1
1
NL
2
Indicates a date starting at the next byte. This flag, combined with the EDF, indicates what kind of data are stored in the session (03 DATE FF indicates an
SMS, GPRS traffic or a 'DATAMESSAGE' MMS; 05 DATE 01 indicates an MMS received from
the operator or GPRS traffic to the operator; 00 DATE 01 indicates incoming
and outgoing calls).
Stores the date on which the operation was performed. The date is stored
in big-endian format.
Is located with an offset of 8 after the SDF and indicates that the date ends
here.
Is located with an offset of 1 after the EDF and indicates whether the entry refers to a contact present in the address book (for SMS the value is B2 if the message is
to/from a contact in the address book; for calls the value is 70 if the contact is present,
otherwise 60).
If the contact is present in the address book, is located with an offset of 4 after
the NF and indicates the length in nibbles of the subsequent field NAME.
the NL and stores the name of the contact stored in the address book.
03
BA A3 EB EB B6 2C E1 00
FF
B2
1C
52 61 6D 6F 6E 61 20 4D 6F
72 65 74 74 69
SMS PART
MESS LENGTH
(M L)
MESS
NUMBER LENGTH
(N U L)
NUMBER
1
ML
2
1
NUL
2
DIR
1
DIRECTION
1
the NAME field and indicates the length in nibbles of the subsequent field
MESS; otherwise it is stored with an offset of 5 after NF.
Is located with an offset of 1 after the ML and stores the message sent/received.
Is located with an offset of 1 after the MESS and indicates the length in
nibbles of the subsequent field NUMBER.
Is located with an offset of 1 after the NUL and stores the number of the
sender/recipient of the message.
Is located with an offset of 2 after the NUMBER and stores the information about the direction of the data stored in the section (value 00 indicates a
sent message; otherwise the value is 02).
80
44
52
49
43
45
20
69
67
1A
4F
41
54
49
4D
43
61
1A
4D
20
4F
52
45
45
6D
41
54
20
45
20
4E
65
4E
49
41
20
58
41
6E
49
20
44
49
20
2C
74
20
49
20
4E
55
6F
65
53
4E
55
53
4E
76
20
45
56
53
49
41
76
6F
2B 33 39 XX XX XX XX XX XX
XX XX XX XX
02
CALL PART
CALL TIME (CT )
4
NUMBER LENGTH
(N U L)
NUMBER
1
NUL
2
the NAME and stores the information about the direction of the data stored
in the section (value 00 indicates an outgoing call; otherwise the value is 02).
Is located with an offset of 2 after the DIR field and stores the information
about the duration of the call; the data is stored in big-endian format.
Is located with an offset of 4 after the CT and indicates the length in nibbles
of the subsequent field NUMBER.
02
00 00 00 00
1A
2B 33 39 XX XX XX XX XX XX
XX XX XX XX
MMS PART
PROV LEN (P L)
PROVIDER
NUM START (N S)
NUMBER
1
PL
2
2
NUL
2
Is located with an offset of 5 after the EDF and indicates the length in nibbles
of the subsequent field PROVIDER.
Is located with an offset of 1 after the PL field and stores the information
about the MMS service provider's name.
Is located with an offset of 4 after the CT and indicates the length in nibbles
of the subsequent field NUMBER.
0E
54 69 6D 20 6D 6D 73
30 08
36 34 31 38 2C 37 32 37
Table A.4: This table lists all event entries such as SMS, MMS, voice and data calls, SIM
change.
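The way the SDF/EDF flag pair and the nibble-based lengths of Table A.4 could be used by a parser is sketched below in Python. This is only an illustration: the function names are hypothetical, the NAME field is assumed to be ASCII text, and the absolute offsets between fields are deliberately left to the caller because they depend on the entry type.

```python
# (SDF, EDF) pairs and the kind of entry they mark, following Table A.4.
ENTRY_KINDS = {
    (0x03, 0xFF): "SMS, GPRS traffic or data-message MMS",
    (0x05, 0x01): "MMS from the operator or GPRS traffic to the operator",
    (0x00, 0x01): "incoming or outgoing call",
}


def entry_kind(sdf: int, edf: int) -> str:
    """Classify an entry from its start and end date flags."""
    return ENTRY_KINDS.get((sdf, edf), "unknown")


def decode_name(name_length_nibbles: int, data: bytes) -> str:
    """Decode the NAME field.

    name_length_nibbles is the NL byte (a length in nibbles); data must
    start at the first byte of NAME. ASCII text is an assumption.
    """
    size = name_length_nibbles // 2
    return data[:size].decode("ascii")
```

With the example values of the table, NL = 1C gives 14 bytes, and the bytes 52 61 6D 6F 6E 61 20 4D 6F 72 65 74 74 69 decode to "Ramona Moretti".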
Field Name
Size (Bytes)
Description
Example
COMMON PART
flag 1
4
REC FLAG MARK
(RF M )
REC FLAG (RF M )
4
1
If this flag is in the starting part of the file or at offset 5, the file does not contain
SMS, so there is no need to parse it.
Received Flag Marker: indicates a received message.
25 3A 00 10
Starts at byte 13 after the RFM. If its value is 01 the message is a received message;
if the value is 00 the message is a sent message.
Generally located just after flag 1; indicates the length of the message text.
If it appears after TEXT LEN, it indicates a message from a special number.
Stores the text of the SMS message.
10
TEXT LEN (T L)
SPEC MES (SM )
TEXT
1
1
TL − 1
2
END TEXT (ET )
1
Indicates the end of the message text.
DATE
8
This field starts 12 bytes after the received flag (REC FLAG).
SEND NUM LEN
(SN L)
NUMBER
1
This is an optional field: it appears only if the sender's number is stored in the
address book. Indicates the length of the sender NUMBER field.
Stores the number of the sender if the sender appears in the address book.
20 29 34 18
10
02
49 6E 76 69 61 74
6F
0E
RECEIVED MESSAGE
SN L
4
NAME LENGTH (N L)
NL
4
NAME
SERV CENT
(SCRD)
SERV CENT
(SCF )
SERV CENT
(SCN L)
SERV CENT
(SCN )
1
This is an optional field: it appears only if the sender's number is stored in the
address book. Indicates the length of the following NAME field.
Stores the name of the sender if the sender appears in the address book.
FA 54 17 46 F2 29
E1 00
28
33 34 39 34 36 37
37 31 34 36 34
44
REC DATE
8
Is stored with an offset of 23 bytes after the name field.
FLAG
2
Indicates that the message service center’s number starts here.
69
6E
F1
E1
02
NUM LEN
1
Indicates the length of the following SERV CENT NUM field.
34
Stores the number of the message service provider.
SCN L
4
NUM
EFF SERV CENT FLAG
(ESCF )
EFF SERV NUM LEN
(ESN L)
EFF SERV NUM (ESN )
73 74 65 66 61
6F 20 41 6C 65
5A 15 41 F2 29
00
91
3
Indicates that the effective message service center’s number starts here.
2B 33 39 33 32 30
35 38 35 38 35 30
30
15 00 81
1
Indicates the length of the following EFF SERV NUM field.
28
Stores the effective number of the SMS message service provider.
33 34 39 34 36 37
37 31 34 36
ESN L
4
SENT MESSAGE
DATE
8
Is stored with 14 bytes offset from the end of REC FLAG.
Flag 2
UNDEF NUMB LEN
(U N L)
UNDEFINED NUMBER
(U N )
2
1
Is a flag indicating the presence of a received message.
Indicates the length of the following UNDEFINED NUMBER field.
RECIVER NUMB FLAG
(RN F )
RECIVER NUMBER LEN
(RN L)
RECIVER NUMBER
(RN )
SERV CEN REC DATE
(SCRD)
RN L
4
2
1
RN L
4
8
It is not clear which number this field stores; it may be the number of the
sender's message service provider.
This flag indicates the presence of the sender’s number in the next bytes.
2B 33 39 33 34 39
32 30 30 30 38 39
38
04 91
Indicates the length of the following RECIVER NUMBER field.
34
Stores the receiver's number.
2B
34
36
00
E1
Stores the date on which the message service provider received the message; it is stored with an
offset of 2 bytes from the end of RECIVER NUMBER.
Table A.5: This table lists all fields characterizing an SMS.
00 0C 09 22 F2 1C
32 E1 00
00 91
34
33 39 33 34 39
36 37 37 31 34
93 2D 4B F2 29
00
B The Backup communication protocol
B.1 Backup item
/backup/{backupType}/device/{imei}/
HTTP method: PUT
Attributes: backupType indicates the type of backup performed; possible values are full or diff. imei identifies the backed-up device via its
IMEI number.
<backupItem>
<backupType>full</backupType>
<calendar_bak>false</calendar_bak>
<contact_bak>false</contact_bak>
<file_bak>false</file_bak>
<sms_bak>true</sms_bak>
<app_settings_bak>false</app_settings_bak>
</backupItem>
Figure B.1: Example of XML payload for a backup item.
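A client could assemble the request of Section B.1 roughly as in the following Python sketch, which builds the resource path and the backupItem payload of Figure B.1. The function names and the way the selected data types are passed are hypothetical; only the path template, the element names and the allowed backupType values come from the description above.

```python
import xml.etree.ElementTree as ET

# Boolean elements of the backupItem payload, in the order of Figure B.1.
DATA_FLAGS = ("calendar_bak", "contact_bak", "file_bak", "sms_bak", "app_settings_bak")


def backup_url(backup_type: str, imei: str) -> str:
    """Build the resource path targeted by the PUT request."""
    if backup_type not in ("full", "diff"):
        raise ValueError("backupType must be 'full' or 'diff'")
    return f"/backup/{backup_type}/device/{imei}/"


def backup_item_xml(backup_type: str, selected: set) -> str:
    """Build the backupItem payload; 'selected' holds the flags set to true."""
    root = ET.Element("backupItem")
    ET.SubElement(root, "backupType").text = backup_type
    for flag in DATA_FLAGS:
        ET.SubElement(root, flag).text = "true" if flag in selected else "false"
    return ET.tostring(root, encoding="unicode")
```

Calling backup_item_xml("full", {"sms_bak"}) yields a payload equivalent to the one in Figure B.1, without the indentation.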
B.2 Contact item
/backup/{backupType}/device/{imei}/contacts/{contactItemName}
HTTP method: PUT and GET
Attributes: contactItemName is the unique identifier used by the client for
a contact resource; backupType indicates the type of backup performed; possible values are full or diff in case of PUT request and restore in case of
GET request.
<contact>
<detail__list>
<detail>
<label>label0</label>
<value>value0</value>
</detail>
</detail__list>
<given__name>name</given__name>
<phone__number__list>
<phone__number>
<number>+123456789054</number>
<type>2</type>
</phone__number>
<phone__number>
<number>+12309876543</number>
<type>1</type>
</phone__number>
</phone__number__list>
<backupItem>
</backupItem>
</contact>
Figure B.2: Example of XML payload for a contact item.
B.3 Calendar item
/backup/{backupType}/device/{imei}/calendar/{calendarItemName}
Attributes: calendarItemName is the unique identifier used by the client for the
calendar items.
<calendar>
<alarmOffset>0</alarmOffset>
<allDay>1</allDay>
<cal_id>6</cal_id>
<calendarName>meeting name</calendarName>
<description>meeting description</description>
<endTime>1277769600000</endTime>
<location>somelocation</location>
<startTime>1277683200000</startTime>
<summary>Meeting</summary>
<type>event</type>
<backupItem>
</backupItem>
</calendar>
Figure B.3: Example of XML payload for a calendar item.
B.4 Message item
/backup/{backupType}/device/{imei}/sms/{smsItemName}
Attributes: smsItemName is the unique identifier used by the client for the SMS
resources.
<sms>
<body>text</body>
<sender>+123456789000</sender>
<type>2</type>
<backupItem>
</backupItem>
</sms>
Figure B.4: Example of XML payload for a message item.
B.5 Generic file item
Files are sent in Base-64 encoding. If a file is too big for a single packet, it
is split into several chunks and sent chunk by chunk to the server, which
keeps track of the chunks received and assembles the file after all chunks have
been received.
/backup/{backupType}/device/{imei}/files/{fileItemName}
/init_byte/{init_byte}/final_byte/{final_byte}
Attributes: fileItemName is the unique identifier used by the client for files
(e.g., the path); init_byte is the first byte of the file chunk sent; final_byte
is the last byte of the file chunk sent.
<file>
<content>UBy/eQhuUlasfiUe/bocsDM3TbRsHPAfASGQj4fc1
+eRu2vnsuab0z6kYYlmo1BWtKbU/wBrGmkxtMLctJLwHjTiRSn
h06ZAhwskO9kcVyaUFDUUFelcgQ4U4Jgjc3qx5fDTc9/
.......
/hiZsZZEQkoILIo6kCm30/TlRk0SktinpQ==</content>
<file_type>file</file_type>
<final_byte>169999</final_byte>
<init_byte>160000</init_byte>
<name>09.jpg</name>
<backupItem>
</backupItem>
</file>
Figure B.5: Example of XML payload for a generic file item.
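The chunked transfer described in this section can be sketched in Python as follows. The sketch assumes that final_byte is inclusive, which is consistent with the values 160000 and 169999 in Figure B.5 describing a 10000-byte chunk; the function names and the in-memory reassembly are hypothetical and do not reproduce the actual client or server code.

```python
import base64


def split_into_chunks(data: bytes, chunk_size: int):
    """Yield (init_byte, final_byte, content) for each chunk of the file.

    final_byte is inclusive (assumption); content is Base-64 text.
    """
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        content = base64.b64encode(piece).decode("ascii")
        yield start, start + len(piece) - 1, content


def reassemble(chunks, total_size: int) -> bytes:
    """Rebuild the file from chunks received in any order."""
    buffer = bytearray(total_size)
    received = 0
    for init_byte, final_byte, content in chunks:
        piece = base64.b64decode(content)
        buffer[init_byte:final_byte + 1] = piece
        received += len(piece)
    if received != total_size:
        raise ValueError("some chunks are missing or duplicated")
    return bytes(buffer)
```

Each tuple produced by split_into_chunks maps onto one PUT request, with init_byte and final_byte placed both in the URL and in the payload.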
B.6 Setting item
/backup/{backupType}/device/{imei}/app_settings/{fileItemName}
/init_byte/{init_byte}/final_byte/{final_byte}
Settings are managed as files: on Android they are usually Shared Preferences
files, on iPhone plist files. These files can be analyzed on the server to extract
data and make these data interoperable.
B.7 List methods
HTTP method: GET
These methods are used to obtain the lists of resources present in the last
backup. List methods have been implemented for contacts, files, sms,
calendar and settings.
/backup/diff/device/{imei}/contactsIdList
/backup/diff/device/{imei}/filesIdList
/backup/diff/device/{imei}/smsIdList
/backup/diff/device/{imei}/calendarIdList
/backup/diff/device/{imei}/appSettingsIdList
Figure B.6 shows the XML response produced by the server for a list of items
requested using the contactsIdList method. Each dataItem contains two pieces of
information: the itemName, which is the unique identifier used by the client, and the
timestamp of the last backup. These data are used when performing the differential backup to understand which contents should be updated.
<itemIdList>
<idList>
<dataItem>
<itemName>480</itemName>
<backupItem>
</backupItem>
</dataItem>
<dataItem>
<backupItem reference="../../dataItem/backupItem"/>
</dataItem>
.......
</idList>
</itemIdList>
Figure B.6: Example of XML payload for a contact list response.
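How a client might use the list response to drive a differential backup is sketched below in Python. The comparison rule is an assumption made for illustration: the client is supposed to know the last modification time of each local item, expressed in the same units as the timestamp returned by the server, and it uploads items that are new or newer. The function name is hypothetical.

```python
def items_to_upload(local_items: dict, server_items: dict) -> list:
    """Select the items that a differential backup should send.

    local_items maps itemName -> last modification time on the device;
    server_items maps itemName -> timestamp of the last backup.
    An item is selected when the server does not know it or holds an
    older copy (assumed rule).
    """
    return sorted(
        name
        for name, modified in local_items.items()
        if name not in server_items or modified > server_items[name]
    )
```

For example, with local items {"480": 100, "481": 250, "482": 50} and server items {"480": 200, "481": 200}, only "481" and "482" would be uploaded.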
B.8 Restore
B.8.1 Listing items on the server
/backup/device/{imei}/backup_item_list
HTTP method: GET
This method provides the list of all the backups present on the server for the
device identified by the IMEI and for all the devices owned by the authenticated user. When a user decides to restore from a backup, he/she chooses the
backup to restore from the list given by this method. Figure B.7 shows a typical
list of backups.
B.8.2 Choosing data to be restored
/backup_restore/device/{imei}/{data_type}/{backup_id}
HTTP method: GET
Attributes: data_type indicates the type of data to be restored; possible values
are contact, calendar, file, SMS or app, depending on what is to be restored;
backup_id identifies the backup on the server. Figure B.8 shows the response
from the server to a restore request.
The data to be restored can also be chosen individually, by identifying a single
item on the server. The response to such a request is like the one shown
in Figure B.8.
/restore/device/{imei}/{data_type}/{item_id}
<backupItemList>
<backupList>
<backupItem>
<backupType>full</backupType>
<deviceItem>
<imei>00000001</imei>
</deviceItem>
<file_bak>true</file_bak>
<sms_bak>true</sms_bak>
<calendar_bak>true</calendar_bak>
<contact_bak>true</contact_bak>
<backup_id>1</backup_id>
</backupItem>
.......
<backupItem>
<backupType>diff</backupType>
<deviceItem>
<imei>00000000</imei>
</deviceItem>
<file_bak>false</file_bak>
<sms_bak>false</sms_bak>
<calendar_bak>false</calendar_bak>
<contact_bak>true</contact_bak>
<backup_id>27</backup_id>
</backupItem>
</backupList>
</backupItemList>
Figure B.7: Example of XML payload for a list of backups.
<restoreItemList>
<restoreList>
<restoreItem>
<item_id>49</item_id>
<description>item description</description>
</restoreItem>
.......
<restoreItem>
<item_id>29</item_id>
</restoreItem>
</restoreList>
</restoreItemList>
Figure B.8: Restore method response.
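One possible way for a client to turn the response of Figure B.8 into per-item restore requests is sketched below in Python. It assumes that the item_id values returned in the list are the ones expected by the /restore/ resource of Section B.8.2; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def restore_urls(imei: str, data_type: str, response_xml: str) -> list:
    """Build one restore path per restoreItem found in the response."""
    root = ET.fromstring(response_xml)
    return [
        f"/restore/device/{imei}/{data_type}/{item.findtext('item_id')}"
        for item in root.iter("restoreItem")
    ]
```

Each resulting path can then be requested with GET to retrieve the corresponding item.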
C The Sharing communication protocol
C.1 Sharing methods
C.1.1 Item listing
Returns the list of the sharable items present in the last backup.
/sharing/device/{imei}/contactsIdList
HTTP method: GET
<itemIdList>
<idList>
<dataItem>
<backupItem>
</backupItem>
</dataItem>
<dataItem>
<backupItem reference="../../dataItem/backupItem"/>
</dataItem>
</idList>
</itemIdList>
Figure C.1: Example of XML payload for a list of items.
Similar methods are available for files (filesIdList) and calendars (calendarIdList).
C.1.2 Share an item
Method used to share an item.
/sharing/device/{imei}/{data_type}/sharing_item/{id}
Attributes: data_type can be contact, calendar or file in case of PUT; in
case of GET, file cannot be used. Files are managed by
/sharing/device/{imei}/file/init_byte/{init_byte}/final_byte/
{final_byte}/sharing_item/{id}
id represents the itemName when the request method is PUT; when the method
is GET, id is the identifier in the server's database.
<sharingItem>
<group>
<groupId>7</groupId>
</group>
<isLB>false</isLB>
</sharingItem>
Figure C.2: Example of XML payload to share an item with a group.
The payload contains one sharingItem for each piece of data to be shared and one group for each group with which
the user wants to share the information. isLB indicates whether the information
is geotagged; in this case the value should be false in PUT requests, as the
method does not handle location-based data.
C.1.3 Location based sharing
Method used to share an item using location-based attributes.
/sharing/device/{imei}/{data_type}/lb_sharing_item/{id}
Attributes: the method works like the non-location-based method; in this
case too, there is a dedicated method to handle files in case of a GET request:
/sharing/device/{imei}/file/init_byte/{init_byte}/
final_byte/{final_byte}/LB_SharingItem/{id}
In this method, sharingItem and group are used as in the non-location-based
case; latitude, longitude and radius (see Figure C.3) can be defined to
set the area where the information is available. isLB indicates whether the
information is geotagged.
<sharingItem>
<group>
<groupId>x</groupId>
</group>
<latitude>41.963706</latitude>
<longitude>12.501572</longitude>
<radius>5000</radius>
<isLB>true</isLB>
</sharingItem>
Figure C.3: Example of XML payload to share an item with a group using location.
When the request is a GET and the isLB field is true, latitude, longitude and
radius are used to locate the item on the map.
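The availability area defined by latitude, longitude and radius can be checked with a great-circle distance, as in the Python sketch below. The haversine formula is a standard choice and is not stated in the protocol; the radius is assumed to be expressed in metres, which the protocol description does not specify, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_available(item_lat, item_lon, radius_m, user_lat, user_lon) -> bool:
    """True if the user is inside the area where the item is shared."""
    return distance_m(item_lat, item_lon, user_lat, user_lon) <= radius_m
```

With the values of Figure C.3, a user standing about one kilometre north of the shared item falls inside the 5000-unit radius under the metres assumption, while a user several hundred kilometres away does not.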
C.1.4 Listing shared data
This method is used to list all information shared by user’s groups; the method
can be location based or not.
/sharing/device/{imei}/{data_type}/sharing_item_list
/sharing/device/{imei}/{data_type}/lb_sharing_item_list
HTTP method: GET
Attributes: data_type can be contact, calendar or file, depending on the kind of data to be retrieved.
<sharingItemList>
<sharingList>
<sharingItem>
<group>
<groupName>University</groupName>
</group>
<sharing_id>2</sharing_id>
<description>Johnn Doe</description>
<isLB>false</isLB>
</sharingItem>
<sharingItem>
<group>
</group>
<sharing_id>3</sharing_id>
<description>Mike Black</description>
<isLB>false</isLB>
</sharingItem>
.................
</sharingList>
</sharingItemList>
Figure C.4: Example of XML payload for a list of items.
The result does not contain data, only the metadata visible in sharingItem: group
indicates the group with which the information is shared, sharing_id is the
identifier of the data on the server, and description contains a human-readable description of the shared content.
Results can be filtered by group using the following method, with the identifier of the group in the group_id field.
/sharing/device/{imei}/{data_type}/sharing_item_list/group/
{group_id}
C.2 Groups methods
C.2.1 Creating group
Using this method the user can create a new group.
/sharing/device/{imei}/group
HTTP method: PUT
<group>
<memberList>
<username>[email protected]</username>
</memberList>
</group>
Figure C.5: Example of XML payload to create a group.
This method uses the groupName field to set the name of the group, and all the
usernames in the memberList to set the users in the group.
C.2.2 Listing groups
Method used to get the list of groups available to the user, and the users participating in each group.
/sharing/device/{imei}/group_list
HTTP method: GET
<groupItemList>
<groupList>
<groupItem>
<admin>
<nickname>johnn</nickname>
</admin>
<memberList>
<nickname>bill</nickname>
</memberList>
</groupItem>
<groupItem>
<groupName>Work</groupName>
<admin>
<nickname>mike</nickname>
</admin>
<memberList>
<nickname>bill</nickname>
</memberList>
</groupItem>
</groupList>
</groupItemList>
Figure C.6: Example of XML payload of a list of groups.
C.2.3 Handling invitations
Using this method the user can invite other users to a group or handle his/her
invitations to groups.
/sharing/device/{imei}/invitations
<groupItemList>
<groupList>
<groupItem>
<memberList>
</memberList>
</groupItem>
<groupItem>
<memberList reference="../../groupItem/memberList"/>
</groupItem>
</groupList>
</groupItemList>
Figure C.7: Example of XML payload to invite users to a group.
In the PUT case, the user sends the XML in Figure C.7, with the username fields set
to the usernames of the users to be invited to the group groupId.
<groupItemList>
<groupList>
<groupItem>
<groupName>Lavoro</groupName>
<admin>
</admin>
</groupItem>
................................
<groupItem>
<admin>
<nickname>jack</nickname>
</admin>
</groupItem>
</groupList>
</groupItemList>
Figure C.8: Example of XML payload of invitations received by the user.
In the GET case, the user receives the XML in Figure C.8, listing all the groups to which
he/she has been invited.
Using the invitations_response method with a PUT request, the user can
decide whether to ignore, accept or refuse the invitation by setting the status to IGNORED, ACCEPTED or REFUSED.
/sharing/device/{imei}/invitations_response/group/{group_id}/
{status}
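A client-side helper for answering an invitation could look like the following Python sketch, which validates the status and fills in the path template given above. The function name is hypothetical; only the path and the three status values are taken from the protocol description.

```python
# Status values accepted by the invitations_response method.
VALID_STATUSES = ("IGNORED", "ACCEPTED", "REFUSED")


def invitation_response_url(imei: str, group_id: int, status: str) -> str:
    """Build the path of the PUT request that answers an invitation."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")
    return f"/sharing/device/{imei}/invitations_response/group/{group_id}/{status}"
```

Rejecting unknown status values on the client avoids sending a request that the server could not interpret.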
Bibliography
[1] Jon Toigo. Disaster recovery planning : managing risk and catastrophe in information systems Yourdon Press, Englewood Cliffs N.J., 1989.
[2] Jon Toigo. Disaster recovery planning : preparing for the unthinkable. Prentice
Hall, Upper Saddle River NJ, 3rd ed. edition, 2003.
[3] ADR Data Recovery.
Data loss facts, 2008.
http://www.
adrdatarecovery.com/content/adr_loss_stat.html.
[4] Inc.
loss,
ONTRACK
2001.
Data
International.
Understanding
data
http://www.ontrackdatarecovery.com/
understanding-data-loss/.
[5] DATAMATE.
Microsoft data loss findings, 2001.
http://www.
datamate.com.au/content/view/14/.
[6] Lawrence M. Bridwel and Peter Tippet. ICSA Labs 7th Annual Computer
Virus Prevalence Survey 2001. ICSA Lab, Upper Saddle River NJ, 7th ed.
edition, 2001.
[7] Meta Group.
It performance engineering & measurement strategies:
Quantifying performance loss. Technical report, Meta Group, 2000.
[8] Winterthur.
Un telefono cellulare rubato su due è un iphone.
municato stampa, sep 2010.
Co-
http://www.axa-winterthur.ch/
It/chi-siamo/media/comunicati-stampa-2010/Documents/
20100926-axawin-iphone_it.pdf.
all the URLs reported in this bibliography have been last viewed in December 2010.
177
BIBLIOGRAPHY
[9] Rory Cellan-Jones. Government calls for action on mobile phone crime,
feb 2010. The government has called on the mobile phone industry to do
more to protect handset owners against theft.
[10] Lexton Snol.
More smartphones than PCs by 2011.
PC Ad-
visor, August 2009 http://www.pcadvisor.co.uk/news/index.
cfm?NewsID=3200338.
[11] Larry Dignan. Smartphone operating systems: The market share, usage
disconnect, may 2009. http://blogs.zdnet.com/BTL/?p=18730.
[12] Paul Miller.
Canalys:
Android takes q2 smartphone market
share lead in us with 886 percent year-over-year growth, aug
2010.
http://www.engadget.com/2010/08/02/canalys-android-takes-
q2-smartphone-market-share-lead-in-us-wit/.
[13] Christy Pettey and Laurence Goasduff. Gartner says worldwide mobile
device sales to end users reached 1.6 billion units in 2010; smartphone
sales grew 72 percent in 2010. Gartner press release, Gartner Inc., February
2011.
[14] International Telecommunication Union. Mobile cellular, subscriptions
per 100 people, 2009. http://www.itu.int/en/pages/default.
aspx. viewed 28th January, 2010.
[15] George Reese. Database Programming with JDBC and Java, Second Edition,
chapter Chapter 7: Distributed Application Architecture. O’Reilly & Associates, nov 2000.
178
BIBLIOGRAPHY
[16] Rajkumar Buyya, Chee S. Yeo, and Srikumar Venugopal. Market-Oriented
Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as
Computing Utilities. Aug 2008.
[17] Eric Knorr and Galen Gruman. What cloud computing really means. web,
09 2008. The next big trend sounds nebulous, but it’s not so fuzzy when
you view the value proposition from the perspective of IT professionals.
[18] Alessandro Acquisti, Elisabetta Carrara, Fred Stutzman, Jon Callas, Klaus
Schimmer, Maz Nadjm, Mathieu Gorge, Nicole Ellison, Paul King, Ralph
Gross, and Scott Golder. ENISA position paper no.1 ”Security issues
and recommendations for online social networks”.
ENISA, November 2007.
Technical report,
http://www.enisa.europa.eu/act/res/other-
areas/social-networks/security-issues-and-recommendations-for-onlinesocial-networks/at download/fullReport.
[19] Ann Chervenak, Vivekanand Vellanki, and Zachary Kurmas. Protecting
file systems: A survey of backup techniques. In Joint NASA and IEEE Mass
Storage Conference, 1998.
[20] S. Agarwal, D. Starobinski, and A. Trachtenberg. On the scalability of data
synchronization protocols for PDAs and mobile devices. IEEE Network, 16,
2002. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=
10.1.1.17.427.
[21] Open Mobile Alliance. SyncML specifications, version 1.1, April 2002.
http://www.openmobilealliance.org/tech/affiliates/
syncml/syncmlindex.html#V11.
179
BIBLIOGRAPHY
[22] F. Dawson and T. Howes.
RFC 2426 - vCard MIME Directory Profile.
Netscape Communications, September 1998. http://www.ietf.org/
rfc/rfc2426.txt.
[23] Marc Staimer. Why cloud backup & restore (bur) now! Technical report,
Dragon Slayer Consulting, apr 2010. And How Procrastination Only Increases Risk.
[24] Microsoft. Zmanda, software company enriches cloud-based backup solution with structured data storage. Technical report, Microsoft, gen 2009.
[25] Michael Vrable, Stefan Savage, and Geoffrey M. Voelker.
Cumulus:
Filesystem Backup to the Cloud. ACM Transactions on Storage (TOS), 5(4),
dec 2009.
[26] Zhaohui Wang and Angelos Stavrou. Exploiting smart-phone usb connectivity for fun and profit. In Proceedings of the 26th Annual Computer Security
Applications Conference, Austin, Texas, USA, 2010. ACM.
[27] Marc-Olivier Killijian, David Powell, Michel Banâtre, Paul Couderc, and
Yves Roudier. Collaborative backup for dependable mobile applications
(extended abstract). In In Proceedings of 2nd International Workshop on Middleware for Pervasive and Ad-Hoc Computing (Middleware 2004, pages 146–
149. ACM Press, 2004.
[28] V. Ottaviani, A. Lentini, A. Grillo, S. Di Cesare, and G. F. Italiano. Shared
backup & restore, save, recover and share personal information into
closed groups of smartphones. In 4th IFIP International Conference on New
Technologies, Mobility and Security. IEEE, feb. 2011.
180
BIBLIOGRAPHY
[29] Roy Thomas Fielding. Architectural Styles and the Design of Network-based
Software Architectures. PhD thesis, University of California, Irvine, 2000.
[30] Roy Thomas Fielding and Richard N. Taylor.
Principled design of
the modern web architecture. ACM Transactions on Internet Technology,
2(2):115–150, may 2002.
[31] Fabio Dellutri. Profiling Mobile Identities. PhD thesis, University of Rome
”Tor Vergata”, 2009.
[32] F. Dellutri, L. Laura, V. Ottaviani, and G.F. Italiano. Extracting social networks from seized smartphones and web data. In Information Forensics and
Security, 2009. WIFS 2009. First IEEE International Workshop on, pages 101
–105, dec 2009.
[33] Fabio Dellutri, Vittorio Ottaviani, and Gianluigi Me. Forensic acquisition
for windows mobile pocketpc. In Proceedings of the Workshop on Security
and High Performance Computing Systems, HPCS 2008, Nicosia, Cyprus June
3-6, 2008, pages 200–205, 2008.
[34] Rosamaria Bertè, Fabio Dellutri, Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Vittorio Ottaviani. Fast smartphones forensic analysis
results through miat and forensic farm. International Journal of Electronic
Security and Digital Forensics (IJESDF), Inderscience, 2008.
[35] Rosamaria Bertè, Fabio Dellutri, Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Vittorio Ottaviani. Handbook of Electronic Security and Digital Forensics, chapter A Methodology for Smartphones Internal Memory
Acquisition, Decoding and Analysis. Worldscience, 2008.
181
BIBLIOGRAPHY
[36] Gianluigi Me and Maurizio Rossi. Internal forensic acquisition for mobile
equipments. In IEEE Computer Society Press, editor, 4th Int’l Workshop on
Security in Systems and Networks (SSN2008), Proceedings of the International
Parallel and Distributed Processing Symposium (IPDPS), 2008.
[37] Alessandro Distefano and Gianluigi Me. An overall assessment of mobile
internal acquisition tool. Digital Investigation, 5(Supplement 1):S121–S127,
2008.
[38] Michael Santarini. Nand versus nor. EDN, October 2005.
[39] Microsoft.
ce.
Linear flash memory devices on microsoft windows
http://www.microsoft.com/technet/archive/wce/plan/
flashce.mspx.
[40] Microsoft.
The windows ce 5.0 object store.
http://msdn2.
microsoft.com/en-us/library/ms885891.aspx.
[41] Yost Scott.
Why can’t i copy programs out of windows?, 2007.
http://blogs.msdn.com/windowsmobile/archive/2007/12\
/29/why-can-t-i-copy-programs-out-of-windows.aspx.
[42] F. Dellutri, V. Ottaviani, D. Bocci, G.F. Italiano, and G. Me. Data reverse
engineering on a smartphone. In Ultra Modern Telecommunications Workshops, 2009. ICUMT ’09. International Conference on, pages 1 –8, oct 2009.
[43] P. H. Aiken. Reverse engineering of data. IBM Systems Journal, 37(2):246–
269, 1998.
[44] Chen and Associates. Reverse-DBMS (Access. 2.0) for Windows Reference
Manual Version 3.0, 1994.
182
BIBLIOGRAPHY
[45] Roger H. L. Chiang. A knowledge-based system for performing reverse
engineering of relational databases. Decis. Support Syst., 13(3-4):295–312,
1995.
[46] Kathi Hogshead Davis. August-ii: a tool for step-by-step data model reverse engineering. Reverse Engineering, 1995., Proceedings of 2nd Working
Conference on, pages 146–154, Jul 1995.
[47] Jean Henrard, Didier Roland, Anthony Cleve, and Jean-Luc Hainaut.
Large-scale data reengineering: Return from experience. In WCRE ’08:
Proceedings of the 2008 15th Working Conference on Reverse Engineering,
pages 305–308, Washington, DC, USA, 2008. IEEE Computer Society.
[48] Contacts Database (CContactDatabase).
Symbian developer library.
http://www.symbian.com/Developer/techlib/v70docs/
SDL_v7.0/doc_source/reference/cpp/ContactsModel/
CContactDatabaseClass.html.
[49] Glenn E. Krasner and Stephen T. Pope. A cookbook for using the modelview controller user interface paradigm in Smalltalk-80. J. Object Oriented
Program., 1(3):26–49, 1988.
[50] Jerome Louvel and Thierry Boileau. Restlet in Action. Manning Early Access Program, 2011.
[51] Noelios Technologies. Restlet, 2010. http://www.restlet.org/.
[52] The Apache Software Foundation. Apache tomcat, 2010.
[53] Douglas Schmidt. Pattern-oriented software architecture. Wiley, Chichester
[England] ;;New York, 2000.
183
BIBLIOGRAPHY
[54] Martin Fowler. Pojo, 2000. http://www.martinfowler.com/bliki/
POJO.html.
[55] Richard D. Titus.
Data is the new oil.
Presentation: http://www.
slideshare.net/rxdxt/data-is-the-new-oil, June 2010.
[56] Vittorio Ottaviani, Alberto Zanoni, and Massimo Regoli. Conjugation as
public key agreement protocol in mobile cryptography. In Sokratis K.
Katsikas and Pierangela Samarati, editors, SECRYPT, pages 411–416.
SciTePress, 2010.
[57] Vittorio Ottaviani, Giuseppe F. Italiano, Antonio Grillo, and Alessandro
Lentini. Benchmarking for the qp cryptographic suite. Technical report,
University of Rome “Tor Vergata”, dept. of Informatics, Systems and Production, August 2009.
[58] Antonio Grillo. TIMiD: Trasferring Identities on Mobile Devices. PhD thesis,
University of Rome Tor Vergata, 2011.
[59] Antonio Grillo, Alessandro Lentini, Vittorio Ottaviani, Giuseppe F. Italiano, and Fabrizio Battisti. Saved: Secure android value added services.
In Proceedings of MOBICASE 2010 Conference, International Workshop on Mobile Security, 2010.
[60] Tohari Ahmad, Jiankun Hu, and Song Han. An efficient mobile voting
system security scheme based on elliptic curve cryptography. Network
and System Security, International Conference on, 0:474–479, 2009.
[61] Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Giuseppe F. Italiano. Transaction oriented text messaging with trusted-sms. In ACSAC,
pages 485–494. IEEE Computer Society, 2008.
184
BIBLIOGRAPHY
[62] Antonio Grillo, Alessandro Lentini, Gianluigi Me, and Giuliano Rulli.
Trusted sms - a novel framework for non-repudiable sms-based processes.
In Luı́s Azevedo and Ana Rita Londral, editors, HEALTHINF (1), pages
43–50. INSTICC - Institute for Systems and Technologies of Information,
Control and Communication, 2008.
[63] Eligijus Sakalauskas, Povilas Tvarijonas, and Andrius Raulynaitis. Key
agreement protocol (kap) using conjugacy and discrete logarithm problems in group representation level. Informatica, 18(1):115–124, 2007.
[64] Whitfield Diffie and Martin E. Hellman. New directions in cryptography.
IEEE Transactions on Information Theory, IT-22(6):644–654, 1976.
[65] Marco Bodrato. Personal communication, 2009.
[66] Frank Celler and C. R. Leedham-Green. Calculating the order of an invertible matrix. In Groups and Computation II, pages 55–60. American
Mathematical Society, 1995.
[67] PRNewswire. Rcs announces 2007 January-June trading data for the global cellular phone open market. Note, jul 2007.
[68] NIST. Recommended elliptic curves for federal government use. Technical
report, NIST, July 1999.
[69] Eric Rescorla. Rfc 2631 - Diffie-Hellman key agreement method. Technical
report, RTFM Inc., June 1999.
[70] Elaine Barker, Don Johnson, and Miles Smid. NIST SP 800-56A - Recommendation for Pair-Wise Key Establishment Schemes Using Discrete Logarithm
Cryptography. NIST, March 2007.
[71] Certicom Research. Standards for efficient cryptography - SEC 1: Elliptic curve cryptography. Technical Report 20, Certicom Corp., September 2000.
[72] M. Abundo, L. Accardi, and A. Auricchio. Hyperbolic automorphisms of tori and pseudo-random sequences. Calcolo, 29:213–240, 1992. 10.1007/BF02576183.
[73] E. Rescorla. Diffie-hellman key agreement method. RFC 2631, 1999.
[74] FIPS. the official aes standard. FIPS PUB 197, 2001.
[75] Kalle Kaukonen and Rodney Thayer. A Stream Cipher Encryption Algorithm “Arcfour”. 1999.
[76] Andreas Klein. Attacks on the rc4 stream cipher. Designs, Codes and Cryptography, 48:269–286, 2008. 10.1007/s10623-008-9206-6.
[77] NIST. Random number generation, dec 2000. http://csrc.nist.gov/groups/ST/toolkit/rng/index.html.
[78] NIST. Guide to the statistical tests, apr 2008. http://csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html.
[79] Pierre L’Ecuyer and Richard Simard. Testu01, oct 2009. http://www.iro.umontreal.ca/~simardr/testu01/tu01.html.
[80] Pierre L’Ecuyer and Richard Simard. TestU01. A Software Library in ANSI C for Empirical Testing of Random Number Generators. Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, aug 2009. User’s guide, compact version.
[81] Jesse Burns. Developing secure mobile applications for android. Technical
report, iSEC Partners, 2008.
[82] Jesse Burns. Mobile application security on android. Technical report,
Black Hat, 2009. Context on Android security.
[83] Android developers. Security and permission, jun 2010. http://developer.android.com/guide/topics/security/security.html.
[84] Li Gong, Marianne Mueller, Hemma Prafullchandra, and Roland
Schemers. Going beyond the sandbox: An overview of the new security
architecture in the java development kit 1.2. In Proceedings of the USENIX
Symposium on Internet Technologies and Systems, Monterey, California, dec
1997.
[85] David Alex Lamb. Sharing intermediate representations: the interface description language. PhD thesis, Carnegie-Mellon University, Department
of Computer Science, 1983.
[86] F. Bachmann et al. Documenting software architecture: Documenting interfaces. Technical report, Software Engineering Institute, Carnegie Mellon, 2002.
[87] Robert Kail. Human development: a life-span view. Wadsworth Cengage Learning, Australia; Belmont, CA, 5th edition, 2010.
[88] David L. Altheide. Identity and the definition of the situation in a mass-mediated context. Symbolic Interaction, 23(1):1–27, 2000.
[89] Shanyang Zhao, Sherri Grasmuck, and Jason Martin. Identity construction on facebook: Digital empowerment in anchored relationships. Comput. Hum. Behav., 24(5):1816–1836, 2008.
[90] Peter Mika. Bootstrapping the foaf-web: An experiment in social network mining, 2004. http://www.cs.vu.nl/~pmika/research/foaf-ws/mining.html.
[91] Peter Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Web Semantics: Science, Services and Agents on the
World Wide Web, 3(2-3):211–223, October 2005.
[92] Marco Gaertler. Clustering. In Ulrik Brandes and Thomas Erlebach, editors, Network Analysis: Methodological Foundations, volume 3418 of Lecture
Notes in Computer Science, pages 178–215. Springer, February 2005. http://springerlink.metapress.com/content/19b5r48lqx3nx7gc.
[93] Paul Jaccard. Etude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat, 37:547–579, 1901.
[94] Rudi L. Cilibrasi and Paul M. B. Vitanyi. The google similarity distance.
IEEE Transactions on Knowledge and Data Engineering, 19(3):370–383, March
2007. http://dx.doi.org/10.1109/TKDE.2007.48.
[95] Ravi Kannan, Santosh Vempala, and Adrian Vetta. On Clusterings: Good,
Bad, Spectral. Journal of the ACM, 51(3):497–515, May 2004.
[96] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering
Large Graphs via the Singular Value Decomposition. Machine Learning,
56(1-3):9–33, 2004.
[97] E. Casalicchio, E. Galli, and V. Ottaviani. MobileOnRealEnvironment-GIS:
A federated mobile network simulator of mobile nodes on real geographic
data. In Distributed Simulation and Real Time Applications, 2009. DS-RT ’09.
13th IEEE/ACM International Symposium on, pages 255–258, oct 2009.