Open BAR - CE group DISP
Università degli Studi di Roma “Tor Vergata”
Faculty of Engineering
PhD Program in Computer Science and Automation Engineering, Cycle XXIII

Open BAR
A New Approach to Mobile Backup And Restore

Vittorio Ottaviani

Academic Year 2010/2011

Advisor: Prof. Giuseppe F. Italiano
Coordinator: Prof. Daniel P. Bovet

to my parents
because an example is worth a thousand words

Abstract

Smartphone owners store ever more information, and ever more important data, in the internal memory of their devices. Mobile devices are prone to being lost, stolen or broken; unless the data are backed up, this causes the loss of all the information they contain. While many solutions for backing up and restoring data are known for servers and desktops, mobile devices pose several challenges, mainly due to the plethora of devices, vendors, operating systems and versions available in the mobile market. In this thesis, we propose a new backup and restore approach for mobile devices, which reduces the effort needed to save and restore personal data and to migrate from one device to another. Our approach is platform independent: in particular, we present prototypes based on different mobile operating systems: Google Android, Windows Mobile 5 and 6, and Symbian S60. The approach protects the information backed up and restored by using novel cryptographic techniques optimized for mobile devices. Another feature of our approach is its ability to offer additional services to the end user or to the administrator of the system. As an example, for users we provide a service that enables sharing information stored on mobile devices among a group of selected people. This can be useful in many situations, e.g., in creating a mobile business network among a group of people. For administrators we offer a social network extractor which, starting from the information contained in the smartphone and from data publicly available on the web, generates a social graph of the backup network.
This can be useful in situations such as building teams within an enterprise.

Acknowledgements

During the years of my PhD many people have passed through my life; some of them have left a mark that will never fade. First of all I want to thank Pino: your way of approaching things, always searching for the best, inspires me every day; I learned some of the most important lessons of my life thanks to you. I want to thank all the colleagues and friends who believed in me during hard times and who celebrated successes with me; Emanuele, Cristina, Danilo and Paolo, thank you for the support and for sharing your experience with me. Special thanks go to Fabio and to Ermanno... I will not write another thesis to explain these thanks: each one of you knows... Thanks to my family for your unconditional love and trust in me. Words cannot fully express how important you are to me. Finally, thank you Ramona: you are my love, my best friend and the reason why every morning I wake up and do my best to be a better person...

Table of Contents

1 Introduction
  1.1 Motivation
    1.1.1 How much does data loss cost?
    1.1.2 Focusing on mobile
  1.2 Our solution
  1.3 Contributions
  1.4 Thesis Outline
2 Backup & restore in the third millennium
  2.1 Backup features
    2.1.1 Full backup
    2.1.2 Incremental backup
    2.1.3 Differential backup
    2.1.4 File-based vs. device-based
    2.1.5 Scheduled backup vs. continuous data protection
    2.1.6 Local backup vs. remote backup
  2.2 Mobile
  2.3 Local backup for mobile device
  2.4 Remote backup for mobile device
3 Our approach to backup
  3.1 A new approach to backup & restore
    3.1.1 Server
    3.1.2 Client
  3.2 Sharing backup data
  3.3 Social network analysis
  3.4 Security
4 Data extraction
  4.1 Forensic Style Approach
    4.1.1 Our methodology
    4.1.2 Symbian implementation
    4.1.3 Windows Mobile implementation
    4.1.4 Some remarks on this approach
  4.2 Selection of interesting data
    4.2.1 Symbian
    4.2.2 Android
  4.3 Performances
  4.4 Concluding remarks
5 Data elaboration
  5.1 Remote elaboration
  5.2 Our step-by-step Methodology
    5.2.1 Stage 0: Choice of the objective
    5.2.2 Stage 1: Files of interest identification
    5.2.3 Stage 2: Data hypotheses and entities injection
    5.2.4 Stage 3: Sequences similarity discovery
    5.2.5 Stage 4: Data interpretation
    5.2.6 Stage 5: Meta-format building
    5.2.7 Stage 6: Error correction
    5.2.8 Stage 7: Parser building
    5.2.9 Stage 8: Testing and debugging
  5.3 Remote elaboration results
  5.4 Local elaboration
6 Protecting saved data
  6.1 Key agreement algorithm
    6.1.1 Mathematical setting: key agreement protocol
    6.1.2 J2ME implementation
    6.1.3 Performance testing methodology
    6.1.4 Performance evaluation
    6.1.5 Experimental results
    6.1.6 Concluding remarks
  6.2 Encryption algorithm
    6.2.1 Performances
    6.2.2 Statistically testing QP-DYN and RC4
  6.3 Protecting inter process communication
    6.3.1 State of the art
    6.3.2 The framework
    6.3.3 The framework implementation
    6.3.4 On a real device
7 Value added services on backup data
  7.1 Sharing backup data with closed groups
    7.1.1 Social backup in business environment
    7.1.2 Sharing conference data
    7.1.3 Shared backup for smartphone
    7.1.4 Running the application
  7.2 Extracting social network
    7.2.1 Introduction
    7.2.2 Related work
    7.2.3 Smartphone Data Analysis (SDA)
    7.2.4 Web Data Analysis (WDA)
    7.2.5 Clustering Analysis (CA)
    7.2.6 The Final Result: The Social Network
  7.3 Conclusions
8 Conclusions and Future Work
A The Symbian S60 format
  A.1 Address book
  A.2 Calendar
  A.3 Events log
  A.4 SMS
B The Backup communication protocol
  B.1 Backup item
  B.2 Contact item
  B.3 Calendar item
  B.4 Message item
  B.5 Generic file item
  B.6 Setting item
  B.7 List methods
  B.8 Restore
    B.8.1 Listing items on the server
    B.8.2 Choosing data to be restored
C The Sharing communication protocol
  C.1 Sharing methods
    C.1.1 Item listing
    C.1.2 Share an item
    C.1.3 Location based sharing
    C.1.4 Listing shared data
  C.2 Groups methods
    C.2.1 Creating group
    C.2.2 Listing groups
    C.2.3 Handling invitations
Bibliography

List of Figures

1.1 Costs of data loss per industry sector (values are in million $ per year)
1.2 Smartphone and PC sales forecast, in millions of units
1.3 2007 - 2010 trend mobile operating systems market share.
1.4 Mobile cellular, subscriptions per 100 people, 2009.
3.1 Backup and Restore system architecture.
3.2 Example of data model for a contact.
3.3 Example of a request of a contact.
3.4 Example of client server interactions.
4.1 Data collection workflow
4.2 Windows Mobile 5.0 memory architecture.
4.3 (a) Symbian S60 tool’s screenshot, (b) Windows Mobile tool’s screenshot.
5.1 The methodology flow
5.2 The format of the Ω operations sequence. The figure shows an example with contacts discovery as objective.
5.3 These figures show an example of a DBMS binary file before and after Stage 3. In (a) the sample file after making pairs of calls of the same duration (Stage 2). In (b) equal sequences highlighted. In (c) the formatted file Φ̂0.
5.4 These three figures depict an example of the application of Stage 5 on a file containing the phone’s address book.
5.5 The architecture of the backup server
6.1 Key Agreement process using conjugate.
6.2 Public data and Key Agreement generation time: all tests
6.3 Public data and Key Agreement generation time: results with an upper bound of 1 sec.
6.4 Overall encryption and decryption time comparison between (sizes in bytes) (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6.
6.5 Overall encryption and decryption time comparison between AES CFB 256-bit and QP3 (sizes in bytes).
6.6 Overall encryption and decryption time comparison between AES 256-bit and QP3 (sizes in bytes).
6.7 Mutual Authentication phase.
6.8 Session Authentication phase.
6.9 Session Encryption phase.
6.10 SAVED framework main packages.
7.1 Use case of meeting backup and share.
7.2 Android Backup and Restore client.
7.3 Android Backup and Restore client.
7.4 The graph representation of contacts (a) and their relationships with the phone’s owner (b), which are revealed by the number of calls and number of SMS/MMS.
In (c) is shown the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).
7.5 Frequency distribution of URLs (domains) providing relationships.
7.6 Contact-to-cluster assignment.
7.7 Clustering metrics trends. The profile graph, used in the example, has 218 contacts and 1242 Web edges; the black vertical line is relative to k = 10, the chosen value for the input parameter k.
7.8 The final result of the whole process: the social network clusters.
B.1 Example of XML payload for a backup item.
B.2 Example of XML payload for a contact item.
B.3 Example of XML payload for a calendar item.
B.4 Example of XML payload for a message item.
B.5 Example of XML payload for a generic file item.
B.6 Example of XML payload for a contact list response.
B.7 Example of XML payload for a setting item.
B.8 Restore method response.
C.1 Example of XML payload for a list of items.
C.2 Example of XML payload to share an item with a group.
C.3 Example of XML payload to share an item with a group using location.
C.4 Example of XML payload for a list of items.
C.5 Example of XML payload to create a group.
C.6 Example of XML payload of a list of groups.
C.7 Example of XML payload to invite users to a group.
C.8 Example of XML payload of invitations received by the user.

List of Tables

1.1 Cost and causes of data loss
2.1 Comparison of backup approaches
4.1 Files generated during the Extraction Process
4.2 Windows Mobile 5.0 relevant files
4.3 Extraction tool consistency analysis
4.4 Time overhead of the backup operation per data type
5.1 Symbian files of interest
6.1 Time used by the algorithms to generate the secret to agree on an SSK.
6.2 Time overhead for the framework phases.
A.1 Possible values for the rows of table “DATA TYPE TABLE”. They describe the type of attributes present in the “DATA BLOCK”. (Symbian S60 v2)
A.2 This table lists all contact data which can be found in Contacts.cdb. Since data are located in three logical file areas, the table is split in three parts.
A.3 This table lists all calendar entries, such as notes, meetings and anniversaries, stored in the Calendar file.
A.4 This table lists all event entries such as SMS, MMS, voice and data calls, SIM change.
A.5 This table lists all fields characterizing an SMS.

Serva me, servabo te
Save me and I will save you
Petronius Arbiter

1 Introduction

1.1 Motivation

Backup is a crucial task, since hardware failures and software or human errors can lead to the loss of important information. Backups are even more important for devices such as laptops and smartphones, which, besides failing, are more prone to loss or theft. Smartphones are currently used more as handheld computers than as mobile phones, and consequently a lot of data is stored on them.
This makes it more critical to protect the data stored on those devices from loss. In addition, the rapid technological evolution of mobile devices makes it more difficult to restore data saved from old devices to new ones. Thus, mobile devices pose new challenges for the backup and restore problem. Backing up data on external memory devices, such as Secure Digital (SD) cards or laptop disks, suffers from the same risks of failure or loss. Moreover, the growth of cloud services, and the capability of modern smartphones to be always online without consuming too much power, is pushing backup systems to save data online using cloud services. Unfortunately, the plethora of devices, operating systems and vendors available on the market causes interoperability problems, and users often lose their information when the device fails, is lost or stolen, and even when migrating to a new device.

Cause                        Percent   Cost
Hardware or system failure   78%       $9.36 billion
Human errors                 11%       $1.32 billion
Software corruptions         7%        $0.84 billion
Natural disasters            1%        $0.12 billion
Other                        3%        $0.36 billion

Table 1.1: Cost and causes of data loss

1.1.1 How much does data loss cost?

Some studies report that a company that experiences a computer outage lasting more than 10 days will never fully recover financially, and that 50 percent of companies suffering such an incident will be out of business within 5 years [1], [2]. Other studies by the National Archives & Records Administration in Washington show that 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster; 50% of businesses that found themselves without data management for this same time period filed for bankruptcy immediately. Statistics about data recovery [3], [4], [5] say that U.S. businesses lose over $12 billion per year because of data loss.
This loss is due primarily to hardware or system failure, which accounts for 78% of all data loss; human error accounts for 11%, software corruption for 7% and natural disasters for only 1%. Table 1.1 summarizes the economic impact of each factor. Moreover, disaster prevention and recovery plans are often overlooked or outdated, and even more often are considered a boring and time-wasting activity, partly because users perceive backup tools and techniques as not 100% reliable. The 7th Annual ICSA Labs Virus Prevalence Survey [6] says that file corruption and data loss are becoming much more common as users increasingly cooperate on shared documents or resources, although loss of productivity continues to be the major cost associated with a virus disaster. Ontrack statistics estimate how much data loss costs each industry sector; the chart in Figure 1.1 summarizes these costs [7].

Figure 1.1: Costs of data loss per industry sector (values are in million $ per year)
[Bar chart, vertical axis from 0.0 M$ to 3.0 M$; sectors: Banking, Retail, Pharmaceuticals, Insurance, Information Technology, Financial Institutions, Manufacturing, Telecommunications, Energy.]

The cost of losing data depends on the type of data. If an enterprise loses historical data about room cleaning, the loss is not a huge problem for the business. On the other hand, if archives containing contract and invoice data, architectural drawings, or the source code of mission-critical software that would have to be rewritten by highly skilled developers are lost, then the loss is huge.
In the first case the institution will have to face legal problems, because official data management is regulated by law; in the second the architect will have to inspect all the areas covered by the lost drawings and redo the work; in the third the enterprise will have to spend time and money reimplementing the software, or face a migration to similar software. As far as mobile environments are concerned, if a manager’s smartphone fails and she loses her family pictures, this is not a huge problem; if in the failure she loses the address book containing all her business contacts, however, it takes days of work to recover part of these data. She will probably never recover all the lost information, and for a manager this is a great loss. Moreover, in recent years users have been storing ever more important data, such as PIN codes or bank account numbers, on their mobile phones or laptops, because they trust the reliability of such devices. Storing private information on a mobile device is very convenient for day-to-day business, as it gives instant access to the information. Unfortunately, these devices are liable to be lost or stolen. Cpp Fonesafe states that, in Italy, a mobile phone is lost or stolen every four minutes; a report by AXA insurance states that the majority of stolen devices are smartphones [8]. The phenomenon is even bigger in other countries; in the UK, for example, 228 mobile phones are reported stolen every hour [9]. When a device is lost, the information it contains is usually of no interest to whoever finds it; he or she will just reset the device and use it. On the other hand, if the device is stolen, the thief may know the owner and can therefore exploit the information more easily. A security layer must protect these data.
In any case, when people lose a device, the most valuable thing they lose is the information within it, so there is a need for a reliable mobile backup system.

1.1.2 Focusing on mobile

According to RBC analysts [10], shipments of smartphone devices in 2011 will approach 400 million units, equalling PC sales. Figure 1.2 illustrates the trend for 2005–2011; “A” indicates actual values, “E” indicates estimated values.

Figure 1.2: Smartphone and PC sales forecast, in millions of units
[Chart, vertical axis from 0 to 400; series: Smartphones, PC; years 2005 A to 2011 E.]

Nokia is still the mobile device market leader, probably thanks to its policy on low-cost devices. Devices running Apple iOS and Android are gaining market share over Windows Mobile, Palm and Linux OS, while RIM Blackberry is quite stable, probably because of its focus on business customers. The diffusion of smartphone operating systems changed in the last year: in 2009 Symbian was leading the market with 52%, followed by RIM with 17%, Windows Mobile with 12%, iPhone with 8%, Palm with 2%, Android with 1% and others with 9% [11]. In Q4 of 2010 Android gained a huge part of the market, growing 886% year-over-year [12]. The 2010 OS market is still led by Symbian with 38% of the market share; RIM has grown, reaching 16%; Apple iOS gained 5 points after the iPhone 4 launch and holds 16% of the market; but the fastest growing OS is Android, with 23% of the whole market. Android’s growth is driven by key products from HTC, Motorola, Samsung, Sony Ericsson and LG, among others, as they provide smartphones running Android as their operating system [13].

Figure 1.3: 2007 - 2010 trend mobile operating systems market share.
[Chart of market share, 0% to 100%, for Q3 2007, Q3 2008, Q3 2009 and Q4 2010; series: Symbian, RIM Blackberry, Apple iPhone, Microsoft Windows Mobile, Google Android, Other (Palm, Linux).]
Figure 1.3 shows the trend in the distribution of smartphone operating systems over the last 4 years; the figure also reflects the trend of sales for vendors. The map in Figure 1.4 shows the spread of mobile devices in the world at the end of 2009 [14], when more or less every person has a mobile device. In some countries, such as the United Arab Emirates, a person uses two or more mobile devices in everyday life. In most cases, using more than one device forces the user to switch continuously from one vendor, operating system or version of the same operating system to another.

Figure 1.4: Mobile cellular, subscriptions per 100 people, 2009.

Moreover, using more than one device spreads personal data across all of them, making it more difficult for users to find a given piece of information. The solution is to keep the devices synchronized, but this is really hard to do. It is even harder, if not impossible, if the devices come from different vendors and run different operating systems or different versions of the same one. Currently, in some cases, the easiest way to synchronize two devices is to manually copy data from one to the other. For example, if a user wants to switch from a Symbian-equipped Nokia smartphone to an Android device, she can synchronize her device with her Gmail account (if she has one) in order to have her address book copied to the new device. One way to copy messages (SMS, MMS) is to use a migration tool such as SPB Migration Tool, available on the market for $9.95. Unfortunately, judging from users’ comments, the application does not work properly on every source device; not even all Android OS versions are supported (only 2.0 and higher). Moreover, the application migrates address book, SMS, MMS and gallery data to Android only, and does not work if the user wants to migrate from Android to another operating system.
Another way to move SMS from Symbian to Android is to install Nokia OVI on a laptop, synchronize the messages from the smartphone with OVI, then download and run Nokia2AndroidSMS.exe, which should automatically find all the datastores created by Nokia OVI, select the first one and generate an XML file. The user should then install SMS Backup & Restore on the Android device, connect the phone to the PC and select “Disk drive” as connection type. Finally, the user should copy the XML file into the SMSBackupRestore folder on the phone and run SMS Backup & Restore to import the messages. Even such a “straightforward” procedure is one-way: it works only from Symbian to Android. These are just some examples of how to migrate from one device to another. To keep different devices synchronized, Microsoft Exchange can be used, but the devices are only partially synchronized: SMSs are not updated. It is clear that saving personal information and restoring it to a new device is not as simple as it should be. In some cases it is impossible to save some kinds of data. In this introduction we did not mention application settings, but restoring those settings to a new device, and having all applications that are available on the new platform already installed and configured, would save a great deal of time and pain.

Currently there is no solution which allows users to back up data from a device and restore it to a new one, with all contacts, calendars, email, SMS, MMS and even application settings available on the new device, without wasting too much time and with a painless procedure.

1.2 Our solution

The solution proposed in this thesis is to provide a common interface to exchange data between the plethora of devices present in the market. This common interface is based on the structure of the data to be exchanged.
The mobile phone extracts the information to be backed up by itself, using the API provided by the mobile operating system; alternatively, the whole content of the device is saved and the useful information is extracted by an ad hoc server application. As smartphones tend to be always connected to the Internet, it seems natural to move the information online and to provide backup and restore services based on the cloud computing paradigm, which end users consider more reliable and less expensive [15], [16], [17]. This approach also reduces the risk of data loss and decouples the data from a specific device. Once backup information moves online, it can be used in several ways, for example in a shared application or to extract social networks and profile users. In an enterprise scenario, for example, it can be useful for users to share business or personal data contained in their mobile backups, such as calendars or business cards, with selected contacts of their choice. At the same time, the management could be interested in analyzing the social relations which naturally grow between employees, and in exploiting these relationships to build workgroups. In such a scenario, it is easy to imagine a community of people willing to share some of their data within their mobile network. A backup that allows data sharing, however, can suffer from the same security and privacy issues present in social networks [18]; such limitations can be approached in different ways depending on the environment where the system is used. In an enterprise scenario, data sharing can be monitored by administrators, who can enforce the company privacy policies. In a general-purpose environment, such as a mobile social network, ownership of data must be verified and sharing must be allowed only by the data owner.
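As an illustration of a common interface based on the structure of the data, the following is a minimal sketch of a platform-neutral contact record exchanged as XML. It is not the thesis's actual data model or protocol: the `Contact` class, its fields and the XML element names (`contact`, `name`, `phone`, `email`) are all hypothetical, chosen only to show how a record can be detached from the device that produced it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Contact:
    """Platform-neutral contact record (hypothetical field names)."""
    name: str
    phones: list = field(default_factory=list)
    emails: list = field(default_factory=list)


def contact_to_xml(contact: Contact) -> str:
    """Serialize a contact to an XML string (element names are assumptions)."""
    root = ET.Element("contact")
    ET.SubElement(root, "name").text = contact.name
    for number in contact.phones:
        ET.SubElement(root, "phone").text = number
    for address in contact.emails:
        ET.SubElement(root, "email").text = address
    return ET.tostring(root, encoding="unicode")


def contact_from_xml(payload: str) -> Contact:
    """Rebuild a contact from the XML produced by contact_to_xml."""
    root = ET.fromstring(payload)
    return Contact(
        name=root.findtext("name", default=""),
        phones=[e.text or "" for e in root.findall("phone")],
        emails=[e.text or "" for e in root.findall("email")],
    )
```

In this sketch, a client on any platform would only need to map its native address-book entries to and from `Contact`; the server never has to know which operating system produced the payload.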

1.3 Contributions

The goal of this thesis is to present a backup system for smartphones that allows users to share part of their personal backup data with a selected set of contacts. In order to be platform independent, our approach is based on a novel way of managing data, and hinges on a data model which abstracts from the underlying platform and focuses on the data type. The same backup and restore method can be applied both on mobile and on desktop or interconnected TV platforms. With such a system, users can manage different devices, using different operating systems, and keep data synchronized across different platforms. In order to assess the feasibility and impact of our approach in a real scenario, we built three prototypes of our backup and restore system, for Android, Windows Mobile OS (versions 5 and 6) and Symbian OS, and tested them on actual mobile devices. Our contribution to this project covers the following areas:

Smartphone data extraction: we proposed two different approaches to extract internal data from a mobile device and send these data to a remote server using a common format, based on the structure of the data type to be exchanged between the mobile client and the server.

Smartphone data elaboration: we designed a methodology to reverse engineer raw data coming from mobile devices, implement specific parsers able to extract personal information, and elaborate such information to make it compatible with other devices.

Securing the system: we proposed a brand new key agreement algorithm based on the matrix conjugation method and a new model to implement secure inter-process communication in the Android OS, and we verified the usability of new encryption algorithms compared to standard ones in mobile environments.

Services on backup data: we proposed some services, built on stored data, to be provided to users or to administrators of the backup system.
Such services are just a starting point for other possible uses of the data. The services implemented are a shared backup and a social network extractor.

1.4 Thesis Outline

This thesis is organized into three parts.

The first part introduces the problem and describes the proposed solution. It is composed of Chapter 2 and Chapter 3. Chapter 2 is a survey of backup techniques in both desktop and mobile environments; Chapter 3 summarizes how we approach the backup problem and presents the proposed idea. In that chapter we show the components of the system implemented to allow users to back up and restore data while granting interoperability between vendors, operating systems and versions.

The second part deals with the operations on data. It is composed of Chapter 4 and Chapter 5. Chapter 4 details the two methods proposed to extract data from the device, the forensic-style one and the one based on the selection of interesting data, and shows how these tasks are performed and integrated. Chapter 5 explains the data reverse engineering methodology proposed to extract personal information from raw backup data, and how such data are managed to be made interoperable between vendors, operating systems and versions. That chapter also describes the architecture of the server side and how the server stores backup data coming directly from a device or from a raw backup.

Chapter 6 and Chapter 7 compose the third part, which proposes some services to be provided to the users of the system. Chapter 6 describes the security services deployed to secure the information: the new key agreement algorithm, the approach proposed to protect inter-process communication in Android, and some considerations on whether to use new encryption algorithms or the standard ones in mobile environments.
Chapter 7 shows some possible value-added services on backup data, such as the opportunity to share part of the backup with selected contacts and the possibility of extracting users’ social networks using data from the backup and information available on the web. The thesis ends with a chapter which summarises the findings of the thesis and considers directions for future work.

2 Backup & restore in the third millennium

Introduction

According to [19], backups can be classified along several dimensions: the data repository model distinguishes full, incremental and differential backups; data can be stored in a file-based or a device-based style; and the data repository management can be classified as online vs. offline. These approaches can be combined in different ways, according to accessibility, security and cost needs. In case of failure, a full backup is able to restore the entire content of a device: this process is slow in the backup phase and introduces a huge overhead in the data stored, but allows for faster restores. On the other hand, incremental backups reduce backup times and sizes but imply higher restore times. Backups can operate on files (file-based approach) or on data physically saved on the disk (device-based approach): although a file-based approach tends to be slower than a device-based backup, it allows for more flexibility and is easier to manage. Online backups make it possible to save and restore data while the system is running, while offline backups require the system to be idle: online backups are more convenient, as they do not interfere with the users’ work, but are more complex to handle, as the system needs to deal with updates carried out during the backup. In all cases, backups can be stored locally, e.g., on an external device, or remotely, e.g., on a remote server.
Backups for mobile devices can be stored locally on an SD card or on a personal computer, or remotely on a server accessible via network connectivity. Several synchronization protocols have been proposed for mobile devices, including Microsoft’s ActiveSync, HotSync for Palm OS devices, Pumatech’s Intellisync, SyncML and CPISync. We refer the interested reader to [20] for a detailed analysis of these protocols. Google’s Android Sync, Google Sync and Apple’s MobileMe are examples of applications enforcing data synchronization among different devices through cloud services.

The main problem with the existing mobile backup solutions is that they are usually bound to specific platforms and vendors. Even SyncML [21], which was launched to provide an open standard to synchronize devices with different operating systems, is confined to the products of the Open Mobile Alliance companies. Sharing data such as business contacts or calendar events among different users is spreading, but currently available solutions (e.g., VCard [22] via SMS or Bluetooth) are still too complicated to use, as they require physical proximity and suffer from a lack of portability across different platforms.

2.1 Backup features

In the following subsections we describe in more detail the main features of backups, i.e., full vs. incremental backups; file-based vs. device-based schemes; support for online backups; the use of snapshots and copy-on-write mechanisms; local and remote storage. All these features are analyzed for both the mobile and the desktop/server environments.

2.1.1 Full backup

The simplest way to protect a file system against disk failures or file corruption is to copy the entire contents of the file system to a backup device. The resulting archive is called a full backup. If a file system is later lost due to a disk failure, it can be reconstructed from the full backup onto a replacement disk. Individual lost files can also be retrieved.
Full backups have two disadvantages: reading and writing the entire file system is slow, and storing a copy of the file system consumes significant capacity on the backup medium. A full backup is designed to allow the entire device to be recovered without any separate installation of operating system, application software and data. This spares the user the time expense of a full system recovery, i.e., the hours needed to rebuild the device up to the point where the last data backup can be restored. A full system backup makes a complete image of the device so that, if needed, it can be copied back to the device. Restoring the system in such cases requires specific software, such as Ghost.

2.1.2 Incremental backup

Faster and smaller backups can be achieved using an incremental backup scheme, which copies only those files that have been created or modified since a previous backup. Incremental schemes reduce the size of backups, since only a small percentage of files change on a given day. A typical incremental scheme performs occasional full backups supplemented by frequent incremental backups. Restoring a deleted file or an entire file system is slower in an incremental backup system; recovery may require consulting a chain of backup files, beginning with the last full backup and applying changes recorded in one or more incremental backups.

                     MORE <-> LESS
Backup speed         Incremental    Differential    Full
Restore speed        Full           Differential    Incremental
Information saved    Full           Differential    Incremental

Table 2.1: Comparison of backup approaches

2.1.3 Differential backup

A third scheme, between incremental and full backup, is the differential backup: it performs a full backup and later saves all files modified since the last full backup.
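The incremental and differential selection rules defined above can be contrasted in a few lines. The following is a minimal Python sketch, not part of the thesis prototypes: `FileEntry`, `select_incremental` and `select_differential` are hypothetical names, and modification times are plain numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record: a file path and its last-modification time."""
    path: str
    modified: float


def select_incremental(files, last_backup_time):
    """Incremental rule: files changed since the most recent backup of any kind."""
    return [f for f in files if f.modified > last_backup_time]


def select_differential(files, last_full_backup_time):
    """Differential rule: files changed since the most recent full backup."""
    return [f for f in files if f.modified > last_full_backup_time]


# Illustrative timeline: full backup at t=200, an incremental backup at t=300.
files = [
    FileEntry("contacts.db", modified=100.0),
    FileEntry("calendar.db", modified=250.0),
    FileEntry("sms.db", modified=400.0),
]
incremental = [f.path for f in select_incremental(files, last_backup_time=300.0)]
differential = [f.path for f in select_differential(files, last_full_backup_time=200.0)]
print(incremental)   # ['sms.db']
print(differential)  # ['calendar.db', 'sms.db']
```

In this example the differential set also contains `calendar.db`, which had already been saved by the intermediate incremental backup; this is why a differential backup stores more data but needs a shorter chain of archives at restore time.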
The main difference between incremental and differential backup is that an incremental backup saves all files that have changed since the last backup, whether that was a full, an incremental or a differential backup, while a differential backup saves all files modified since the last full backup. When restoring a compromised device, the differential style is faster than an incremental backup but slower than a full backup. In the backup phase, the differential approach is faster than full but slower than incremental. The storage needed to save the backup data is less than for a full backup and more than for an incremental backup. Table 2.1 summarizes the comparison between backup and restore techniques.

Incremental and differential backups can be considered reverse delta approaches; in these schemes the backup system stores only the differences between current and previous versions. Such backups start with a full backup and periodically synchronize data with the live copy; the data between the live copy and the full backup can be archived or erased, depending on whether the system wants to allow recovery to intermediate versions. Backup systems using such an approach are rdiff-backup and Time Machine.

2.1.4 File-based vs. device-based

Files are saved on disk in logical blocks, which usually all have the same size (e.g., 8 KiloBytes). A file in a working system will usually be saved in blocks which are not contiguous. Backup software can operate either on files or on physical disk blocks. File-based backup systems understand the structure of files and copy entire files and directories to the storage media. Such an approach is really powerful when one wants to recover or back up a single file; unfortunately, in huge backup operations on hard disks it is slowed down by the seek times needed to reach file parts contained in non-contiguous blocks.
A file-based backup scheme also suffers from the problem that even a small change to a file requires the entire file to be backed up. For small files the problem is negligible, but for multimedia files performance is strongly affected. On the other hand, device-based backup systems make a low-level copy of the content of the drive block by block; this improves backup performance on hard disks, since the backup software performs fewer seek operations. Device-based backup, if performed with a reverse delta approach, performs better even on bigger files, as a small modification, even to a big file, costs at most the size of the modification plus 7 KiloBytes. Unfortunately, this approach complicates and slows file restores, since files may not be stored contiguously on the backup medium. Moreover, to allow file recovery, backups must include information on how files and directories are organized on disk, in order to correlate blocks on the backup medium with particular files. As a consequence, device-based programs are usually specific to a particular file system and not easily portable.

2.1.5 Scheduled backup vs continuous data protection

Backup software can require the file system to be quiescent during backups; such systems usually perform scheduled backups. Online (or active) backup systems instead allow users to continue accessing files during the backup. These systems can perform both scheduled backups and continuous data protection. In continuous data protection, data are continuously saved from the device to the backup medium in a way that is transparent to the user. Online backup systems offer higher availability but introduce consistency problems; another problem is that device resources are consumed by the backup system continuously performing operations in the background.
In a server or desktop environment, the resources consumed by the backup system do not affect the usability of the system but, for example, on a mobile device with limited capabilities it is really important to save resources for user interaction, and battery to preserve the device's autonomy. By contrast, with scheduled backups the save operations are performed at given moments (e.g., once a week). This approach does not guarantee that data are continuously protected, but has the advantage of being less resource consuming, as operations are performed rarely compared with continuous data protection. Another advantage is that operations do not interfere with the user’s activity, if the backups are scheduled in a smart way. On mobile devices, for example, operations can be performed when the device is idle, such as during the night.

In the cases described, the backup is performed while the device can run programs, so the file system can be modified during the backup, and this can lead to inconsistencies in the files saved. A possible solution to this problem is to take a “snapshot” of the file system at a consistent point in time and back up the snapshot. A snapshot may be needed when the approach is a full backup or on the first execution of a reverse delta approach; in later executions of incremental or differential backups a copy-on-write scheme can be followed, in which each time a file is modified the snapshot is updated and kept consistent with the live copy.

2.1.6 Local backup vs. remote backup

Backup data can be stored in several locations. Historically, backups were saved on magnetic tapes, labeled both internally and externally to avoid losing backup data. Unfortunately, magnetic tapes are not very reliable, as they are prone to wear and to loss of magnetic capacity. Currently, backup data are stored on hard disks or other media.
A backup can be considered local when the media where data are saved is locally connected to the device being backed up (e.g., a second hard disk mounted on the same computer where the data to be saved reside). A backup is remote when data are saved on another computer; this remote storage can be an FTP server inside the LAN or a server accessible through the Internet. When data are saved locally, the backup and restore processes are faster than with a remote resource, as transmission times, the real bottleneck in such operations, are avoided. On the other hand, performing backup operations remotely grants a better level of safety in case of theft: if the data of a PC are saved locally, on a second hard disk installed in the device, the second hard disk will be stolen with the device. The same problem can happen with an external hard disk: if a laptop is stolen or lost, it is quite probable that the external hard disk is in the laptop’s bag. If data are saved remotely, it is very improbable that both the backed up device and the data container are lost or stolen at the same time. The remote approach therefore grants a better level of reliability for the user saving the data.

Cloud backup systems [23], [24] are the latest frontier of remote backup; such systems grant the best reliability, even if backup and restore times are slowed down by connectivity, which is the bottleneck of remote backup systems. Unfortunately, with these approaches the final user is locked to a particular provider: the cloud solutions available need specific software installed both on the client and on the server. The backup software optimizes backup operations but reduces portability. In [25] the authors propose a cloud backup approach based on simple operations available in every remote storage system:

Get: Given a pathname, retrieve the contents of a file from the server.
Put: Store a complete file on the server with the given pathname.

List: Get the names of files stored on the server.

Delete: Remove the given file from the server, reclaiming its space.

The proposed method moves all critical operations to the clients; the server must provide just the interfaces to perform the four operations listed above. Such an approach should ease migration to new, less costly and more powerful solutions; moreover, the backups could be stored across several providers located in different geographic areas to increase fault tolerance, even in case of natural disasters. A similar approach can be used to back up data stored on mobile devices.

2.2 Mobile

Data stored on a mobile device are usually critical for the device’s user: when data are lost, the effort required to recover all the information saved on the device is really high, and sometimes it is impossible to recover all of it. Moreover, mobile devices are liable to be lost or stolen, even more than laptops and desktop devices; furthermore, they suffer from storage and performance problems. For these reasons backups are usually performed on remote devices such as the device owner’s laptop.

2.3 Local backup for mobile device

Following the desktop idea, a local backup should be performed on a storage medium directly connected to the device, e.g., a memory card. For a mobile device, storing backups on the user’s laptop can also be considered a local backup; this kind of storage suffers, more or less, from the same problems as local backups in laptop environments. Saving data on the device's memory card offers a good level of usability: the backup can be done transparently for the user, with the backup process running in the background and saving data when they are modified on the device. Saving data on the device's memory card can be useful in case of migration, but unfortunately gives no reliability if the device is lost or stolen.
Saving backup data on a laptop increases the reliability of the backup system, but usability is affected by the need to connect the device to the laptop. Backups from a mobile device to a laptop are usually performed over Bluetooth or a USB cable, which requires the mobile device to be near or connected to the laptop. For some classes of devices (basically old devices) users rarely perform such connection operations, so backups are performed rarely, with the consequence that the saved data are out of date when a restore is performed. For other classes of devices, such as Android devices or iPhones, connecting the device to the laptop can propagate malware infections [26].

2.4 Remote backup for mobile device

The improved 3G and 4G connectivity of mobile devices has opened, in recent years, the possibility of performing backups remotely over the Internet. Such an approach is characterized by the issues and advantages described in Section 2.1.6 for desktop devices. Due to the reduced hardware capabilities typical of mobile devices, performance problems are increased. Furthermore, mobile devices suffer from limited battery autonomy, and using connectivity features significantly increases battery consumption. On the other hand, saving backup data remotely over the network allows the backup system to run as a background application and transparently keep the backup data up to date with the device data. Moreover, saving data remotely can increase the usability of the system: there is no tedious need to connect the device to a laptop using cables or Bluetooth. Since no proximity between device and storage media is needed, the system can perform backups more frequently, when the device is idle, freeing resources when they are needed by the user.
If the storage of the backup system is based on a cloud architecture, the reliability of the whole system is increased; cloud backup systems allow users to access their personal data in a secure way from different platforms. A cloud-based approach also frees users from managing their personal backup files, avoiding potential errors or data loss due to user mistakes.

The authors of [27] propose a collaborative mobile backup approach based on a peer-to-peer architecture; the approach is interesting but opens several security and privacy problems. The information stored on mobile devices is usually very personal. Owners tend to store PIN codes, private messages and other privacy-sensitive information on their mobile devices. Backups stored in another peer's memory could be analyzed or modified by the owner of that peer, making these data useless or unavailable when needed.

3 Our approach to backup

Introduction

Several solutions to the backup and restore problem are available for desktop and server systems. The heterogeneity of mobile environments makes the backup problem harder. New vendors, operating systems, versions and devices frequently appear in a changing market, and new solutions are continuously proposed to solve the problem vertically for each new device or platform launched. In this chapter we show the new approach we proposed to handle backups from heterogeneous mobile devices and grant interoperability with new devices. The main results obtained by applying the proposed approach have recently been presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28]. Some of the ideas presented in this chapter, and more generally in this thesis, are being applied in the Telecom Italia CuboVision backup project.
3.1 A new approach to backup & restore

Our approach tries to overcome the limitations in saving and restoring data from mobile devices, by using online remote backups as a uniform interface for sharing data among different users and multiple platforms. In particular, we present an online remote backup system based on a Service Oriented Architecture (SOA): the services offered by our solution make it possible to back up and restore not only files but also more structured data such as contacts, calendar events, and text messages (SMS). In order to access those services, a mobile device must be equipped with a client capable of retrieving internal data from the device and sending them to the server via a common interface. This interface is designed to exploit the common features of mobile data models: e.g., independently of the platform used, a contact in an address book is always identified by fields such as first name, last name, address, phone numbers, etc. All the communication exchanged between the client and the server is based on an extensible standard language (i.e., XML). The communication format for each kind of data is detailed in Appendix B.

Thanks to the common data interface, data saved on the server are available to all types of devices, mobile or not, equipped with the client application. This grants interoperability between vendors, platforms, and operating systems. Using our general data model, backup data can be shared among different users: part of a backup can be shared and is transparently kept up to date on all the devices that can access the information.

In our architecture (see Figure 3.1), the server provides its services using a representational state transfer (REST) architecture [29], [30]. For each type of
platform, a different client is implemented: each client connects to the server using the HTTP protocol to exchange information in XML format. In the following, we describe in more detail the main functionalities implemented by the server and the clients.

[Figure 3.1: Backup and Restore system architecture. Labels: Persistence: DBMS; Application: Business Logic; Web Services: REST API; Internet; Secure connection; Common Format: XML.]

  <contact>
    <email>[email protected]</email>
    <given_name>NAME</given_name>
    <phone_number_list>
      <phone_number>
        <number>*********</number>
        <type>2</type>
      </phone_number>
      <phone_number>
        <number>********</number>
        <type>1</type>
      </phone_number>
    </phone_number_list>
    <backupItem>
      <timestamp>2010-07-07 12:20:12.997</timestamp>
    </backupItem>
    <status>new</status>
  </contact>

Figure 3.2: Example of data model for a contact.

3.1.1 Server

The server has been designed as RESTful: in a REST architecture requests and responses are built around the transfer of representations of resources. In our case, the representation of a resource is the XML serialization of its state, for example a contact (as in Figure 3.2) or a contact list. A REST architecture is based on the HTTP protocol and uses all the HTTP facilities, such as the security layer provided transparently by HTTPS. The server allows mobile clients to perform full backups and incremental backups. When a user performs a backup, all the user’s data previously stored on the server are still accessible from the mobile client; old data are kept on the server, and made accessible to the mobile client, to allow the user to revert to old backups in case of loss or failure.

  https://someserver.com/backup/{backupType}/device/{imei}/contacts/{contactItemName}

Figure 3.3: Example of a request of a contact.
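To make Figures 3.2 and 3.3 concrete, the sketch below builds a request URI following the template of Figure 3.3 and serializes a contact into the XML layout of Figure 3.2. It is an illustrative Python sketch under stated assumptions, not the code of the prototypes: the function names `contact_uri` and `contact_to_xml`, the backup type value `"full"`, and the sample IMEI, email and phone numbers are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

BASE_URL = "https://someserver.com/backup"  # placeholder host taken from Figure 3.3


def contact_uri(backup_type, imei, contact_item_name):
    """Fill the URI template of Figure 3.3, percent-encoding each path segment."""
    parts = [backup_type, "device", imei, "contacts", contact_item_name]
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def contact_to_xml(email, given_name, phone_numbers, timestamp, status="new"):
    """Serialize a contact using the element names shown in Figure 3.2."""
    contact = ET.Element("contact")
    ET.SubElement(contact, "email").text = email
    ET.SubElement(contact, "given_name").text = given_name
    number_list = ET.SubElement(contact, "phone_number_list")
    for number, number_type in phone_numbers:
        entry = ET.SubElement(number_list, "phone_number")
        ET.SubElement(entry, "number").text = number
        ET.SubElement(entry, "type").text = str(number_type)
    backup_item = ET.SubElement(contact, "backupItem")
    ET.SubElement(backup_item, "timestamp").text = timestamp
    ET.SubElement(contact, "status").text = status
    return ET.tostring(contact, encoding="unicode")


# Hypothetical sample values, for illustration only.
uri = contact_uri("full", "123456789012345", "John Doe")
body = contact_to_xml(
    "user@example.com", "NAME",
    [("5550100", 2), ("5550101", 1)],
    "2010-07-07 12:20:12.997",
)
print(uri)
print(body)
```

A client would send `body` to `uri` over HTTPS; the HTTP call itself is left out so that the sketch stays self-contained.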
Our server implementation offers two REST methods: PUT, used to insert new entries into the server’s database, and GET, which allows the mobile client to query for a single entry or for entry lists. Figure 3.2 shows a typical body of a PUT request to the server: the body contains the XML representation of the serialized object (in this specific case the entity saved is a contact item). Figure 3.3 shows a typical URI of a PUT/GET request (in this specific case a request for a contact). When receiving a GET request at a URI like the one shown in Figure 3.3, the server answers with the “contactItemName” entry for the “imei” device from the “backupType” resource, using the XML shown in Figure 3.2; if a PUT request is received instead, the server expects the HTTP(S) request to carry the contact details to be processed.

3.1.2 Client

The client can be implemented for different types of devices (mobile, desktop, game console, Internet TV, etc.). The software must be able to access private data residing on the device and to send such data to a remote server, which will store them. Clients must also be able to handle HTTP message bodies, get data sent by the server and store them on the device, for example in the address book, in the device-specific format.

[Figure 3.4: Example of client server interactions. Messages shown: GET(id_list), id_list, PUT(id_x1), PUT(id_xn).]

Usually devices need to be built on purpose to interact with a backup server; in some cases they need to handle dirty flags in order to manage the status of the resources to be saved. In our approach, in order to interact with the server, clients only need to be able to read and write the resources to be saved and to implement some basic HTTP methods. To improve the performance of incremental backup operations, the client may handle the list of items to be sent to the server. Figure 3.4 shows a typical interaction for an incremental backup.
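The exchange of Figure 3.4, one GET for the identifier list followed by one PUT per changed item, can be simulated without a network. The Python sketch below is only an illustration: `InMemoryServer` is a hypothetical stand-in for the REST server, and it assumes that the identifier list returned by the server carries a last-modification date for each item.

```python
from datetime import datetime


class InMemoryServer:
    """Hypothetical stand-in for the REST server: item id -> (modified, payload)."""

    def __init__(self):
        self.items = {}

    def get_id_list(self):
        # Plays the role of GET(id_list): ids with their last-modification dates.
        return {item_id: modified for item_id, (modified, _) in self.items.items()}

    def put(self, item_id, modified, payload):
        # Plays the role of PUT(id_x): insert or replace a single entry.
        self.items[item_id] = (modified, payload)


def incremental_backup(device_items, server):
    """Send only items that are new or newer than the server copy; return their ids."""
    server_ids = server.get_id_list()
    sent = []
    for item_id, (modified, payload) in sorted(device_items.items()):
        known = server_ids.get(item_id)
        if known is None or modified > known:
            server.put(item_id, modified, payload)
            sent.append(item_id)
    return sent


server = InMemoryServer()
server.put("c1", datetime(2010, 7, 1), "<contact>old</contact>")
server.put("c2", datetime(2010, 7, 1), "<contact>same</contact>")

device = {
    "c1": (datetime(2010, 7, 7), "<contact>edited</contact>"),  # modified on the device
    "c2": (datetime(2010, 7, 1), "<contact>same</contact>"),    # unchanged
    "c3": (datetime(2010, 7, 5), "<contact>added</contact>"),   # new on the device
}
first_run = incremental_backup(device, server)
second_run = incremental_backup(device, server)
print(first_run)   # ['c1', 'c3']
print(second_run)  # []
```

The second run sends nothing because the server copies are now as recent as the device copies, which is the saving an incremental backup is meant to provide.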
First, the client asks for the list of identifiers of the items in the last backup; the server sends the list of identifiers to the client in XML format, together with the last backup date. At this point, the client computes its internal list of identifiers and compares the two lists: now the client knows all the data that have been updated on the device, and can build the list of modified contents. If the last modification date of an item in the client’s list is more recent than the one on the server, the client adds the item to the list of data to back up. Note that our approach ensures compatibility with all old devices that can run third-party applications able to access private data.

The restore process gets the list of items on the server and saves all contents on the client. If some contents appear on the client but not on the server, they are preserved on the client. In case of migration to a new device, or of a restore after a hard reset, the device is empty, so the device contents after the restore will be those contained in the last backup.

3.2 Sharing backup data

Following a user-centric idea of the collaborative Web, the proposed approach can often be useful for sharing data among different users and different devices. It is easy to imagine a community of people willing to share some of their data within their mobile network. In a closed group of people, such as friends or work colleagues, some classes of data stored on mobile devices are usually the same for the entire group. Collaborating people usually share each other's mobile phone numbers, emails, calendars, addresses, documents and so on. In an enterprise scenario, for example, it can be useful for people to share business cards or calendar events contained in their mobile backups with some selected contacts of their working group.
At the same time, with such a collaborative backup it will be easier to recover from data loss even if the data were not saved in a personal backup; in a closed group that collaborates, it is easy to ask one of the members for data that one member lost and another still owns. Members of the same social community tend to share more or less the same data [31, 32]: if a member of the group changes his/her mobile phone number, he/she will have to spread the new number to the whole network; in the same way, if a new member joins the group, the other members will have to save his/her contacts on their mobile devices. The approach proposed here aims at speeding up the sharing of updated information within project teams, study groups or, more generally, social communities.

3.3 Social network analysis

As a side effect of building a shared backup system, we have data allowing us to analyze the social network of the users participating in the shared backup. The available information is not limited to the backup data; we can access a lot of information that users spread on the web both consciously, using services where they want to give information about themselves (e.g., LinkedIn, MySpace, Flickr), and unconsciously, by signing up to other services, mailing lists, and so on. Even college web sites sometimes give information about students, such as matriculation numbers or even notes. In such a context we are able to access a lot of information which is meaningless or quite useless if not filtered. We can use shared backup data to filter this information and profile the user, building his/her social network and crossing it with other users' social networks. Once we have built the user’s social network, we can use it to build workgroups, or for marketing or Customer Relationship Management purposes.
Section 7.2 details how we can build a social network from the data of a mobile device backup and from the web.

3.4 Security

A backup that allows data sharing, however, can suffer from the same security and privacy issues present in social networks [18]. As personal data are more affected by privacy issues than common data, both because of the interest they arouse and because of the problems a data theft can cause to the user, some additional measures to grant privacy should be applied. Depending on the size of the sharing group, privacy issues can be approached differently. In small and medium groups, an administrator can handle permissions and grant users access to data. For example, in a medium-sized scenario like an enterprise, data sharing can be monitored by administrators, who can enforce the company privacy policies. For bigger groups, like a widespread social community, privacy cannot be delegated to security managers or privileged users; each user must prove ownership of the data he/she wants to share. For example, if a user wants to share an email contact, the system will send a verification code to this email address and the user will have to prove ownership by replying to the challenge. In such an approach privacy issues are more challenging than security ones; a security layer is provided by deploying secure connections and data encryption. Communication security is provided using HTTP over TLS connections, while data encryption can be done transparently via common DBMS encryption functions. For some classes of information (e.g., calendars), time-limited sharing could improve privacy for users who are interested in sharing data with someone just for a limited period. Since mobile devices introduce a new feature applicable to backup and sharing, the geographic position, location-limited sharing could solve the problem of a user who wants to share data just with people in a certain geographic area. Obviously, all these solutions should
OUR APPROACH TO BACKUP be combined to grant different levels and configuration of privacy settings. In Chapter 6 are detailed some results that can be applied to grant an higher level of security to the whole system. 34 4 Data extraction Introduction The backup process is basically divided into three steps; the first is to get the information from the memory of the device, the second is to save such information in a different store and the third is to restore the information from the backup into the device. Extraction, for mobile devices, is one of the most challenging problems due to the differences between devices and operating systems. In this Chapter we describe how we solved the extraction problem. We approached the problem in two different ways; a forensic style extraction and a smarter approach based on an extraction performed using the underlying OS’s APIs. The main results of the application of the forensic approach have been presented in the 2008 High Performance Computing & Simulation Conference [33]; the improvements of the extraction methodology have been later published on the International Journal of Electronic Security and Digital Forensics [34]. The generalization of the extraction methodology and the testing results 35 CHAPTER 4. DATA EXTRACTION on Windows Mobile and Symbian S60 operating systems have been published as Chapter 19 of the Handbook of Electronic Security and Digital Forensics [35]. Currently Italian Carabinieri are experimenting the MIAT tool (a tool based on the extraction approach presented in this chapter) to forensically extract information from seized devices. The extraction approach which exploits the operating system’s API has been used to extract information from more powerful devices, such approach has been presented in the 4th IFIP International Conference on New Technologies, Mobility and Security [28]. 
Different ways for information extraction

Given the plethora of devices, vendors and operating systems present in the mobile market, and as new devices implementing new technologies and giving more and more capabilities to the user are continuously deployed, getting data from the device internal memory and restoring these data should be approached in a different way for each case. This is due to limitations specific to each platform, operating system and even version of the operating system.

In such a scenario the easiest way to approach the device backup is to implement a differential (see Section 2.1.3), file-based (see Section 2.1.4), scheduled backup (see Section 2.1.5) with a snapshot approach. Such an approach is really powerful when the user wants to restore the whole device, replacing even configurations and system files as they were at backup time. A detailed description of how such an approach can be performed is given in Section 4.1. Its drawback is that the backups are not easily portable from one model to another, and it is impossible to restore a backup performed in this way to a different vendor's device.

Another approach is to extract information just from some selected files, those containing personal information such as contacts, calendars, text messages (SMS), multimedia messages (MMS), etc. Such an approach needs to put more logic on the mobile device but allows the server side to manage data coming from the devices more easily. Filtering and pre-formatting information on the smartphone also allows the server to manage data in a more flexible way: the server is enabled to handle backup data independently of the client where the data are backed up. Such an approach to backup grants higher interoperability and allows the user to migrate easily from one device/vendor/operating system to another.
Moreover, applications can use the facilities given by modern operating systems to access personal information inside the device. This approach is described in further detail in Section 4.2.

4.1 Forensic Style Approach

This section describes how the backup can be performed by iterating recursively over the filesystem and taking a snapshot of the state of the device. Such an approach is close to the forensic technique described in [33], [34], [35], [36]: due to the reduced capacity of the smartphone internal memory, such memory can be copied to the external memory (i.e., the SD card or the MMC). After the internal data, the most critical, have been saved to the external memory, they can be sent to the remote server described in Section 5.4 via WiFi or 3G connectivity and later elaborated (see Section 5.1), as the data contained in the snapshot can be used for the restore.

We developed a forensic tool, available for Windows Mobile 5 and 6 and Symbian, to extract data from the memory of a smartphone while guaranteeing that the content of the files is not modified. Such a tool is able to create a logical dump of the internal memory of the device into the external memory. In some cases the tool modified some files inside the device memory or was not able to save all system files from the internal memory. Luckily these files are not key files in the backup process we propose. Moreover, to perform the extraction forensically, the device in some cases must be restarted to allow the application to gain the privileges needed to unlock system files. In the backup case there is no need to access these files, so the device does not need to be restarted and the tool can run in the background on the device. The external memory does not contain locked files, so its contents can be sent to the server without considerable problems and without the need of performing a snapshot.
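As an illustration of the differential, file-based snapshot idea discussed above, the following is a minimal desktop sketch in Python, not the code of the tool presented in this thesis. It assumes that a snapshot can be summarised as a map from relative file paths to MD5 digests; the function names `md5_of`, `snapshot` and `differential` are hypothetical and introduced only for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> Dict[str, str]:
    """Walk `root` recursively and map each relative file path to its MD5."""
    return {
        path.relative_to(root).as_posix(): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differential(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots (hypothetical helper, assumed for this sketch)."""
    return {
        # Files present now but absent from the previous snapshot.
        "added": sorted(k for k in current if k not in previous),
        # Files present in both snapshots whose digest differs.
        "changed": sorted(
            k for k in current if k in previous and current[k] != previous[k]
        ),
        # Files that existed before and are gone now.
        "removed": sorted(k for k in previous if k not in current),
    }
```

Under these assumptions, only the files listed under `added` and `changed` would need to be sent to the server after the first full dump.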
4.1.1 Our methodology

The approach we propose focuses on acquiring data from a mobile device's internal storage memory and copying them to an external removable memory (such as SD, miniSD, etc.). This task is performed without the need to connect the device to a PC. Thanks to this, the backup process is really easy for the user: when the device is idle it performs the logical dump, and when the dump is complete it can be sent to a remote server. The complete data extraction process is shown in Figure 4.1.

Figure 4.1: Data collection workflow (while there are more files: compute MD5 and open the file; if it cannot be opened, copy it using specific OS APIs, otherwise perform a normal chunked copy; then check integrity with MD5).

The extraction tool spiders the whole mobile device filesystem recursively and hashes each file before and after the copy, to ensure the integrity of the acquired information. The report containing the file hashes is saved in a log file (checksum.xml). The extraction tool also compiles a log file named info.xml with all remarkable events, and another log file, named errors.xml, summarizing the errors encountered (Table 4.1 shows the log files produced by the extraction tool). Log files are saved in XML format.

File            Contents
checksum.xml    File size, file typology, file name, MD5 hash, extraction duration, and creation, access and modification time.
info.xml        Information about the device (IMEI, device ID, platform type, model, manufacturer) and about the extraction process (duration, battery consumption, date of extraction).
errors.xml      Information about errors that may happen during the process.

Table 4.1: Files generated during the Extraction Process

Data stored in the original memory card can also be acquired using an MMC or SD reader (USB or integrated): binary data are read from the source and then stored as an image file representing every single byte, including the file system's metadata. After that, it is possible to analyse the file allocation table to recover, in some cases, even deleted data.

4.1.2 Symbian implementation

The extraction tool for Symbian was developed to support and test the methodology described above. Symbian is an operating system derived from the Epoc operating system; Symbian OS supports a wide range of device categories with several user interfaces, including Nokia S60, UIQ and the NTT DoCoMo common software platform for 3G FOMA™ handsets. The commonality of Symbian OS APIs enables development that targets all of these phone platforms and categories. In order to produce executable code that does not need any other software layer (e.g., a JVM to interpret the bytecode), the application was originally developed in C++, the native language of the Symbian OS.

Most relevant files are always open and locked by system processes. For example, the file Contacts.cdb, which contains the database of contacts, is locked by PhoneBook, the address book process. In the past ([36]) we made use of the OS Backup service to perform the seizure of locked data. This service is a utility that allows the backup of memory contents even if they are locked. An application or a service can register itself together with the files it locks. The Backup Server notifies a backup request to the registered applications, so that they can release their locks temporarily. Once the file had been saved, the application could notify this to the Backup Server and the system process could then re-acquire the lock. In a more recent work ([37]), we adopted an alternative way to get access to locked files, based on the ReadFileSection method of the Symbian RFs API, which allows a file to be read without opening it.
By this method it is possible to seize the entire file system tree, including files which have a persistent lock on them; furthermore this strategy preserves integrity because the access is established in read-only mode, guaranteed by the OS.

Some files and folders are more relevant for the specific backup case; in Symbian S60 these are:

• Calendar, containing memos, day notes, meetings and anniversaries;
• Contacts.cdb, containing the contacts available from the address book;
• Mail, a folder containing all SMS/MMS/email files with sender, receiver and body;
• Images, a folder containing all pictures taken by the user or available in the gallery application;
• Video clips, a folder with the user's video recordings and downloaded or received videos;
• Sound clips, a folder where the system saves the user's audio recordings, ringtones and received audio files.

4.1.3 Windows Mobile implementation

The tool implementing the methodology described above has also been realized and tested for Windows Mobile 5 and 6. Due to the differences between the two environments, the Windows Mobile tool is not a port of the Symbian version. Implementing the Windows Mobile version required a design phase, as the problems to be faced were different from those faced when implementing the Symbian version.

PocketPC internal memory and storage architecture

In Windows Mobile 2003 PocketPC and earlier, the device's memory was split into two sections: a ROM section, containing all operating system core files, and a RAM section, holding the user storage (Storage Memory) and the memory space for running applications and their data (Program Memory). The user could choose the amount of memory to be reserved for Storage Memory and, consequently, for Program Memory. The RAM chip was built on a volatile memory scheme, so a backup battery was required to keep the RAM circuitry powered up even if the device was just suspended.
In case the battery power supply went down, all the user's data were lost. Such a scenario forced the user to recharge the battery within a time limit of 72 hours (as mandated by Microsoft to device manufacturers).

Figure 4.2: Windows Mobile 5.0 memory architecture (labels: RAM 64M; ROM 64M; Core OS Stuff 32M; User Storage 32M. Memory sizes reported could change among different PPC models).

Since Windows Mobile 5, the memory architecture has been redesigned to implement a non-volatile user storage. Currently, the memory is split into two sections (see Figure 4.2): the RAM is meant to hold the data of running processes, whereas the ROM keeps core OS code and libraries (called modules), the registry, databases and user's files. Such memory, also called Persistent Storage and contained within a flash memory chip, can be built using many different technologies [38]:

• XIP model, based on NOR memory and volatile memory. This technology enables the device to store modules and executables in XIP (execute-in-place) format and allows the operating system to run applications directly from ROM, without copying them to the RAM section first. NOR memory has poor write performance.
• Shadow model, which boots the system from NOR and uses NAND for the storage. This model is power-expensive, because the volatile memory needs to be constantly powered on.
• NAND store and download model, which reduces costs by replacing NOR with an OTP (one-time programmable) memory model.
• Hybrid store and download model, which mixes SRAM and NAND, covering them with a NOR-like access interface (to support the XIP model).

Windows Mobile 5 and above place most of the applications and system data in the Persistent Storage. Core OS files, user's files, databases and registry are seen by applications and users in the same file system tree, which is held and controlled by the FileSys.exe process.
Such a process is also responsible for handling the Object Store, which maps objects like databases, registry and user's files in a contiguous heap space. The Object Store's role is to manage the stack and the heap memory, to compress and expand files, and to integrate ROM-based applications and RAM-based data. For a comprehensive explanation of how Windows Mobile uses the Object Store and manages linear flash memory, see [39] and [40]. The strategy for storing data is based on a transactional model, which ensures that the store is never corrupted by a power down while data are being written. Finally, the Storage Manager manages storage devices and their file systems, offering a high-level layer over storage drivers, partition drivers, file system drivers and file system filters.

Algorithm 1 Extraction
Input: A path p.
Output: none.
for all objects obj (files and directories) in p do
    if obj is a directory then
        Create a directory named p/obj in the SD Card
        Recursively call Extraction(p/obj)
    else if obj is a file then
        Compute MD5 hash of obj
        Copy obj in path p on the SD Card
        if obj has not been copied then
            Access to obj with CEDB APIs
            if obj could be accessed then
                recreate a similar database in path p on the SD Card
            end if
        end if
        Compute MD5 hash of the copied obj on the SD Card
    end if
end for

Implementation details

We have chosen to develop the application using a native C++ approach, fulfilling the requirement of having a tool that can be launched from an external memory card, without the need for a pre-installed runtime environment (such as a Java virtual machine) or for installing the tool on the device. The application runs in stand-alone mode and does not require any third-party DLL.
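The control flow of Algorithm 1 can be illustrated with a short desktop sketch in Python. This is not the native C++ tool described here: `shutil.copyfile` stands in for the platform copy call, and the optional `fallback` callable is a hypothetical placeholder for the CEDB branch, which has no desktop equivalent. The names `safe_md5`, `extraction` and `Report` are assumptions made for this example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Maps a source path to (hash before copy, hash of the copy or None).
Report = Dict[str, Tuple[Optional[str], Optional[str]]]


def safe_md5(path: Path) -> Optional[str]:
    """Return the hex MD5 of a file, or None if it cannot be read."""
    digest = hashlib.md5()
    try:
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def extraction(src: Path, dst: Path,
               fallback: Optional[Callable[[Path, Path], bool]] = None,
               report: Optional[Report] = None) -> Report:
    """Recursively copy src into dst, hashing each file before and after."""
    report = {} if report is None else report
    dst.mkdir(parents=True, exist_ok=True)
    for obj in sorted(src.iterdir()):
        target = dst / obj.name
        if obj.is_dir():
            # Directory: recreate it on the destination and recurse.
            extraction(obj, target, fallback, report)
        elif obj.is_file():
            before = safe_md5(obj)
            try:
                shutil.copyfile(obj, target)
                copied = True
            except OSError:
                # Hypothetical stand-in for the CEDB branch of Algorithm 1:
                # the caller may supply a function that recreates the file.
                copied = fallback(obj, target) if fallback else False
            after = safe_md5(target) if copied else None
            report[str(obj)] = (before, after)
    return report
```

In the returned report, an entry whose second digest is `None` corresponds to a file that was not copied, while an entry whose two digests differ corresponds to a file that changed or was recreated during the copy.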
Since the tool uses the standard Windows Mobile APIs to access the file system (like Open, Read and Write, CopyFile), we can reasonably expect that these APIs will not change in future versions of the OS, so forward compatibility should be preserved. Algorithm 1 depicts the pseudo-code of the seizure process, which starts after the main application has killed all the other non-vital running processes. The algorithm performs two main tasks:

• the copy task, which copies all the files in the internal memory of the mobile device to the memory card;
• the hash task, which ensures the integrity of the copied files and allows us to discover which files have been modified during the seizure process.

The Extraction algorithm works using APIs like CopyFile, Open and Close, and recursively copies every internal file system entry to the memory card. This task preserves the directory structure, copying files according to their original position. The hash task computes the MD5 hash of each file found in the device internal memory. Hashes are written to a log file saved in a separate directory. The hash task can be launched as a separate function, and it walks the whole filesystem to compute the hash of every file. The Extraction algorithm invokes the hash function before and after the copy of every single file, which makes it possible to detect whether changes happen during the copy from the internal filesystem to the Storage Card.

As reported in Section 4.1.3 when discussing the internal memory and storage architecture, Windows Mobile places the OS's stuff in a lot of file-like objects in the same file system seen by the user (under the /Windows directory). Most of these files are inaccessible through the standard file system APIs because they are objects in XIP format: most of the headers are removed and the addresses are fixed up so that the programs are able to run with no need to be loaded into RAM first. The binary has been stripped down and customized for that particular device [41].
Such files are also flagged with file attributes like FILE_ATTRIBUTE_INROM and FILE_ATTRIBUTE_ROMMODULE. Our application skips these files: there is no reason to look for a method to access them, because they are firmware modules and could be replaced with new ones only by an advanced user (using the ROM flashing technique, e.g., if she is willing to upgrade her firmware to a new version of the operating system or she wants to modify things like the boot splash). Moreover, there is another set of files that cannot be accessed by standard APIs: these files are database objects locked by operating system processes which cannot be killed. We manage to access their data using CEDB APIs and we are able to recreate such files on the external memory card. Table 4.2 shows where the most relevant data about user and system are stored in the file system.

Filename: System.hv
Location: /Documents And Settings/system.hv
Description: System registry hive.

Filename: User.hv
Location: /Documents And Settings/default/user.hv
Description: User registry hive for default user.

Filename: Default.vol
Location: /Documents And Settings/default.vol
Description: Object store replacement volume for persistent CEDB databases. This file contains MSN contacts.

Filename: Mxip_system.vol, Mxip_lang.vol, Mxip_notify.vol, Mxip_initdb.vol
Location: /
Description: Metabase volumes, including language-specific data and storage for notifications.

Filename: Cemail.vol
Location: /
Description: Default SMS and e-mail storage.

Filename: Pim.vol
Location: /
Description: Personal Information Manager (PIM) data, such as address book, schedules, SIM entries, call logs.

Table 4.2: Windows Mobile 5.0 relevant files

Experimental results

The Windows Mobile extraction tool has been tested on a physical HTC device and on an emulated one (on a Windows XP computer). The extraction tool saves all the files containing the user's information to be backed up. We noticed from the hashes that some files have been modified; this is due to the fact that for some files it was necessary to create a new file and refill it with the original file contents using CEDB APIs. Table 4.3 shows the files that encountered problems in the saving phase; the right column reports whether the final file has not been saved (−), differs (?) or matches (√). As previously described, the OS's core files were not saved because these files are just virtual files. The testing phase was performed on an AMD Athlon64 X2 Dual PC with 1GB RAM and a QTEK9000 PDA (HTC Universal), equipped with a Kingston 2GB SD card.

File                                       Consistency
/Documents And Settings/default.vol        ?
/Documents And Settings/system.hv          −
/Documents And Settings/default/user.hv    −
/Windows/*.dll                             −
/mxip_notify.vol                           ?
/cemail.vol                                ?
/mxip_system.vol                           √
/mxip_lang.vol                             √
/pim.vol                                   √

−  file not copied
?  file copied but its hash does not match
√  file copied and hash matches

Table 4.3: Extraction tool consistency analysis

4.1.4 Some remarks on this approach

This section discussed a methodology to extract data from a smartphone by recursively copying the content of the internal memory filesystem to the external memory. To prove the effectiveness of the solution, two prototypes have been implemented, one for Symbian S60 (Figure 4.3 (a)) and another for Windows Mobile 5 and 6 (Figure 4.3 (b)).

Figure 4.3: (a) Symbian S60 tool's screenshot, (b) Windows Mobile tool's screenshot.

The prototypes have been tested on a set of real devices, and the results of the testing show that the solution is able to extract the internal device's files containing the user's personal information and settings. The application could certainly be improved to support more recent devices, such as the brand new Windows Mobile 7. Unfortunately this approach is not sufficient to obtain an interoperable backup and restore system: the logical dump can be, and has been, used to restore devices of the same vendor and model as the one from which the dump was extracted.
The logical dump can be analyzed using the methodology proposed in Chapter 5 to extract interesting data, which allows us to abstract from the specific device and focus on data.

4.2 Selection of interesting data

Better interoperability between devices from different vendors can be granted by delegating the extraction and part of the analysis of data to the mobile client. In our approach the application focuses on how data are structured in the device memory rather than on the internal system structure. Such an application is installed on the mobile device and acts as a client that filters the personal data and configurations present on the device, formats them following the common format proposed in Chapter 3 and sends that information to a remote server, which interprets the format and saves the data into a common database.

All smartphone operating systems provide APIs to access internal databases containing personal data such as address books, calendars, notes and messages (SMS, MMS, emails); such APIs can be used to collect data to be sent to a server to perform a remote backup (see Section 2.1.6) of the mobile device. Unfortunately these APIs are usually full-featured only when the application is developed in the operating system's native programming language (i.e., for iOS, Objective-C; for Android, the Dalvik Virtual Machine Java interface; for Symbian, Symbian C++; for Windows Mobile, Visual C++ or .NET). Portable source code such as J2ME cannot access some contents and has write limitations for others. Moreover, a Java virtual machine is not available for all operating systems (e.g., iOS) or has a different implementation (e.g., Android's Dalvik), so J2ME code is not fully portable. Another problem of implementing mobile applications in non-native languages is the execution speed and resource consumption due to the virtual machine overhead.
Considering the limitations of developing a more or less portable client in J2ME, and the difficulties of implementing a client for each operating system in its specific native programming language, the better approach is the second one, which with a little more development effort offers a more stable, better-performing and more powerful backup client application. The implemented applications retrieve data from the internal databases of the mobile device. These data are sent through the REST web services provided by our backup server, using the proposed data model. Our data model allows different OSs to communicate; in particular, we describe how it is possible to back up data on a Symbian S60 device, store them on a remote server and then restore them on an Android 2.1 device. We chose to implement our clients on Symbian and Android first, to show how older and newer devices can easily cooperate with our approach.

4.2.1 Symbian

We realized the backup and restore client for Symbian with a basic user interface, just to show that collaboration was possible. The Symbian Socket framework was used to establish a TLS connection with the server. Symbian's CActive allows long-running tasks to be performed in the background and asynchronous communication with the server to be realized; this behaviour is similar to Android's AsyncTask class. Asynchronous communication between client and server lets the application run in the background while the user continues using the device. To access data it was necessary, for each data type, to open a session with the respective server, which manages the communication with the underlying databases or files:

• to extract address book data from Contacts.cdb, the CContactDatabase class has been used. This class gives access to all the contacts databases;
• to handle Calendar data a client-server session is necessary: to get access to calendar data a CCalSession object must be used;
• to get access to messages (SMS, MMS and emails) it is necessary to establish a communication channel with the Message Server through the CMsvSession::OpenSyncL() method;
• all the other files present in the multimedia folders (see Section 4.1.2), such as pictures or videos, can be accessed directly as files using the approach described in Section 4.1, and sent to the server.

4.2.2 Android

On Android we developed a complete prototype, with a user interface that allows the user to choose the type of data to back up (i.e., contacts, calendars, files, SMS, application settings) and the type of backup (full or incremental). Before restoring, it is possible to select the backup to restore. Backup and restore tasks have been realized as asynchronous background tasks using the AsyncTask class provided by the framework itself, which allows results to be notified to the UI thread without the need for specific handlers. HTTP requests/responses have been managed using the well-known HTTP Client of Apache's Jakarta Commons project. To extract data it was necessary to bypass Android's access policies. Each Android application has its own sandbox, which other applications cannot invade unless some permissions are explicitly declared. It is possible to access an application's private data only if the application provides a Content Provider, which makes it possible to access the private data of applications in a uniform manner. So the private data of the contact, calendar, SMS and media file applications have been accessed through the respective content providers. Calendar data have been accessed directly from the SQLite database.
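To give an idea of what reading calendar data directly from an SQLite database involves, here is a minimal Python sketch. It is only an illustration and not the Android prototype itself. The table name `Events` and the columns `title`, `dtstart` and `dtend` (milliseconds since the Unix epoch) are assumptions made for the example, and the hypothetical helper `demo_database` builds a throwaway in-memory database instead of opening a real device file.

```python
import sqlite3
from datetime import datetime, timezone


def to_iso(milliseconds):
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc).isoformat()


def read_events(connection):
    """Return calendar rows as plain, platform-neutral dictionaries."""
    rows = connection.execute(
        "SELECT title, dtstart, dtend FROM Events ORDER BY dtstart"
    )
    return [
        {"title": title, "start": to_iso(start_ms), "end": to_iso(end_ms)}
        for title, start_ms, end_ms in rows
    ]


def demo_database():
    """Build an in-memory database with the assumed (hypothetical) schema."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE Events (title TEXT, dtstart INTEGER, dtend INTEGER)"
    )
    connection.execute(
        "INSERT INTO Events VALUES (?, ?, ?)", ("Thesis defence", 0, 3600000)
    )
    connection.commit()
    return connection


if __name__ == "__main__":
    for event in read_events(demo_database()):
        print(event)
```

Converting each row to a plain dictionary keeps the extracted data independent of the source database layout, which is the property the common data model relies on.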
Each application stores its own persistent settings in an XML file contained in the shared_prefs private directory; there is no way to access such information if the application does not implement a Content Provider. This limitation was overcome by elevating the access permissions of the backup application to root (http://www.koushikdutta.com).

4.3 Performances

The system developed has been tested on an HTC Legend device connected to a 54 Mbps WiFi network over a secure HTTPS channel. We aimed at measuring the time overhead introduced by our system, and thus we measured the time needed to execute the single backup functions. Table 4.4 shows the times needed to back up a commonly used smartphone, with 150 contacts, 170 text messages, 14 calendar events and 3 files of size 104.796 KB, 5.659 KB and 161.166 KB.

Data Type         msec      units    msec/unit
contact           81315     150      542
SMS               80877     170      476
calendar event    5544      14       396
file              278980    3        92993

Table 4.4: Time overhead of the backup operation per data type

Clearly, the most expensive operations are those on files: to save the 3 files the application needs 278 seconds, which is about 62% of the total time needed for the backup (278980 msec out of 446716 msec). The total overhead to perform a full backup of the device amounts to 447 seconds (about 7 minutes), preserving the usability for real use cases. The most common operations are on incremental backup; hence, the last column of Table 4.4 reports the time needed per single unit of data.

4.4 Concluding remarks

In this chapter two different approaches to, and implementations of, backup systems for mobile devices have been described. These approaches must be combined to realize a powerful backup system implementing a differential, resource-based, scheduled backup with a snapshot approach. Focusing the backup target on mobile devices, for some classes of data the backup can be performed online, monitoring resources such as the address book, the calendar and other frequently updated resources, and keeping the data most important to the device's user up to date.
Combining the two approaches described with the proposed data model, the resulting backup system takes advantage of the first approach to maintain a snapshot of the system's status for a fast restore; moreover, the first approach can be used to handle resources such as multimedia files and non-structured databases. The second approach is really powerful for handling structured data. Such information is sent over the network exploiting the proposed data model. The server takes advantage of data “formatted” using the data model to build an interoperable data structure accessible from all classes of mobile devices and mobile operating systems. Even if we use the first approach to save the data residing on the device to the external memory and send them to the backup server, personal information is contained in the backup in raw format. The reader will find in the first part of Chapter 5 a methodology to extract personal data from raw backup files.

5 Data elaboration

Introduction

The second step to be performed in a backup process is to save data to a location different from the one being backed up. This phase can be performed in several ways and, obviously, the results obtained will differ. The most useful approach is not to back up the device as is, with all the system files, but to get only the information useful to the user. A mobile device's operating system can usually be restored using some key combination or specific commands. For example, on Symbian devices typing *#7780# resets the device without erasing the user's data, while *#7370# deep-resets the device, erasing user data as well. Saving system data is therefore of little use; on the other hand, the user is interested in restoring his/her personal data, i.e., contacts, messages, calendars, files, etc.
The first part of this Chapter shows how personal data can be extracted from the raw backup of a mobile device; the extraction is performed using the methodology we proposed in the International Conference on Ultra Modern Telecommunications 2009 [42]. 55 CHAPTER 5. DATA ELABORATION In the second part of the chapter we describe how data can be managed both on device and in a server in a smarter way, using the approach published in the 4th IFIP International Conference on New Technologies, Mobility and Security [28]. Personal data is extracted directly on the device, using the second approach described in Chapter 4, or from a raw backup, using the approach described in Section 5.1, and it’s saved in a common database to grant interoperability between different devices. Data are contained inside smartphones as files. These files can be accessed in several ways, the first approach described in Chapter 4 handles each file in the same way, whether it contains a picture or the address book. The second approach focuses its interest, exploiting operating system’s API whenever possible, into data contained into the databases, and not into files containing data. These two methodologies approach the problem in a too much different way, and obviously the second approach cannot be used to extract logical data, such as contacts, from a backup stored using the first approach. Information contained in some files, such as Contacts for Symbian, backed up using the first approach, is useless if not processed to extract interesting data from these files. Files containing data stored in a smartphone can be divided mainly into two classes: Non structured files are files such as multimedia files, text files or PDF documents saved on the device during its usage, these files are accessible and can be restored on other devices without any further processing; Structured files these files usually are databases containing information used 56 5.1. 
by device applications such as address book, calendar or text messaging. Saving the information contained within a structured file is more important than saving the file itself, as it enables the backup system to store such information in a common data structure and allows different devices to interoperate (see Section 4.4).

5.1 Remote elaboration

After extracting the logical dump of the file system from a smartphone (see the first approach described in Section 4.1), the dump is sent to the backup server described in Section 5.4. When the server receives the logical dump from the client, it needs a method to decode the personal data stored within several mobile DBMS files and to make them available to other applications. Such DBMS files contain both current and obsolete data, i.e., old or deleted entities; this occurs because the mobile OS, for performance reasons, defers deletion as long as possible, e.g., until the free space available in the file system is no longer sufficient. It can be useful for a backup system to recover erased information even if it was never backed up. Unfortunately, deleted information is not accessible through the DBMS APIs provided by manufacturers (when available). We therefore chose a Data Reverse Engineering (DRE) approach to retrieve and decode the storage format.

In traditional architectures (PCs and mainframes) DRE was studied as a business solution either for controlling data handled via legacy applications or for reconstructing deteriorated data. The models developed are too generic for mobile environments [43], or they aim mainly at discovering the data model [44], [45], [46], or they were designed to address vertical problems such as extracting data from COBOL, DB/2 [47] or Access. For our purposes, we are not interested in discovering the data model
because we know a priori which data we are looking for (e.g., all the user-controllable data attributes such as a contact's name and surname or the text of an SMS), and we do not care about the relational structure. Moreover, a major advantage of a methodological application of DRE is that, when file formats change, reapplying the methodology lets us update our knowledge of how data are stored.

In this section we propose a methodology that allows smartphone DRE operators to be more flexible in their knowledge of mobile file formats. The mobile phone environment is composed of a plethora of manufacturers and operating systems, each released in several versions that store data in different formats. Handling such heterogeneity through a methodological approach is an important asset that allows the system to decode the databases of different platforms. The DRE methodology was originally proposed to solve mobile forensic problems due to the lack of standards; we applied it successfully to the backup case. As a case study we applied these methods to Symbian OS and obtained several results, including the mapping between a given piece of data and its location in the file system, the recovery of obsolete data, and the reverse engineering of the Symbian personal database formats. The results obtained (see Section 5.3) show that our methodology can be successfully applied to environments other than the forensic one it started from. The methodology helps to decode database files and to develop ad hoc parsers; the data extracted by such parsers can be easily converted and used to perform tasks such as backup, user profiling, device syncing and data recovery. A flow chart of the methodology is shown in Figure 5.1.
[Figure 5.1: The methodology flow. Flow chart linking Stage 0 (Choice of the objective), Stage 1 (Files of interest identification), Stage 2 (Data hypothesis and entities injection), Stage 3 (Sequences similarity discovery), Stage 4 (Data interpretation), Stage 5 (Meta-format building), Stage 6 (Error correction), Stage 7 (Parser building) and Stage 8 (Testing & debugging) through the decisions "Goal reached?", "Is it sufficient to modify the hypothesis?" and "Objective reached?".]

5.2 Our step-by-step Methodology

Smartphone operating systems save personal data in many DBMS tables, which are stored in binary files. The format of such files is often not public, and the tools available to read them rely on a native API of the operating system (if they run on the device) or on a port of its code (if they run on a PC); when such tools are available, they cannot retrieve deleted or modified data. A solution is therefore to interpret the binary file directly in order to give a structure to the internal data. Initially the problem was addressed by comparing multiple files of the same type, relying on the analyst's ability to interpret the data content intuitively. In this way the analysis of the data was often confused and led to redundant operations without any result. Therefore, in order to preserve obsolete data, we chose to design a methodology for binary file interpretation that is able to decode the required information without performing redundant operations. Furthermore, the methodology helps to retrieve data alterations and deletions.
Our main contribution is a wisdom-driven DRE methodological approach to decode a smartphone's personal data, which are stored in several DBMS-managed files. With the contribution of this chapter we provide the tools to reach the following targets:
• understand where information is stored in the mobile device's file system;
• retrieve and decode current and obsolete personal data;
• develop a suitable parser.

Stage 0 aims at choosing which kind of information (the objective) we want to find and how it can be decoded. An objective is composed of one or more goals. We may think of an objective as an entity (e.g., a contact, a call log, or an SMS) composed of one or more fields (e.g., for a contact, the first name, the last name, the phone number, etc.), which are the goals of our objective. Stage 1 aims at identifying which files (files of interest) could contain the data (our goal) we wish to decode. With Stage 1, the methodology enters an iterative process that makes it possible to understand the binary format of the data by comparing different versions of it.

In Stage 2 some assumptions about the data type are made. These assumptions guide the choice of sample instances of entities to be inserted into the device's databases. Instances are stored as records contained in one or more binary files. If required, the hypotheses made in Stage 2 are refined in Stage 6 and the instances may change. The number of instances inserted determines the number of comparisons among binary records, which affects the precision of the next Stage.

Stage 3 deals with formatting the content of the binary files, in order to make the data instances inserted in Stage 2 identifiable and comparable. Usually, we try to group similar zones within the same sample binary file, and among different sample binary files, and then proceed to the interpretation.
Formatting must take into account the data interpreted successfully in previous iterations, in order to exclude them (i.e., data already analyzed) from the study of a new format.

Stage 4 comprises two sub-tasks: the first identifies candidate byte sequences, and the second decodes them. Candidate byte sequences are identified by removing all the sequences that do not match the hypotheses of Stage 2. The second task tries to find the connection between the data inserted in Stage 2 (the instances) and their binary representation. As depicted in Figure 5.1, the methodology iterates through Stages 1, 2, 3, 4 and 6 (error correction) until a goal is reached, i.e., the information about the format of an entity's field is exhaustive and a mapping between the field and its binary storage format is found.

Stage 5 simply records all the mapping information found in a meta-format. If joining all the meta-formats found allows the decoding of the entire objective (the information needed) identified in Stage 0, the methodology moves on to Stage 7. At this Stage a piece of software able to decode the now-exposed file format automatically is designed and implemented. All the knowledge collected about the format turns into a set of software requirements. This process must be repeated for each file marked as a file of interest. The resulting software is tested at Stage 8. In the following sections we describe each Stage of the methodology.

5.2.1 Stage 0: Choice of the objective

Before starting, we must choose the data from which we want to start the decoding process. We define as objective the type of personal data (e.g., contacts, SMS, email, calendar, events log, etc.) that we want to find in the device's file system and whose binary format we want to decode. An objective can be seen as the set of "atomic" goals that must be completed in order to reach the objective.
For instance, in order to decode the contacts (the objective), after having detected in which file (or files) they are stored, we have to find how the contact's data elements (goals) are encoded. Such goals are attributes such as name, surname, mobile phone number, e-mail, street address, etc.

Definition 1 Let an objective $\Gamma$ be the set containing the goals we want to reach:

$$\Gamma = \{\gamma_1, \dots, \gamma_n\}$$

In this Stage we can only define a rough approximation of $\Gamma$: thanks to the information about the objective's data format that we learn progressively in the next Stages, we will be able to refine $\Gamma$ with more accurate goals.

5.2.2 Stage 1: Files of interest identification

Given the objective chosen in the previous Stage, this step aims at identifying the files to be analyzed and decoded in the next Stages. Mobile devices save personal data in database files stored persistently in the file system. To identify the files containing the information we are looking for, we first need to cause many changes inside these files in order to make them identifiable. These changes are objective-dependent: if we are looking for contacts, we generate activity such as contact insertion; if we are looking for the events log, we make calls, change the SIM, and send and receive SMS. Each of these operations generates an entity ($E$), which is stored as one or more records in the file system. Each entity $E$ is a set composed of $m \in \mathbb{N}$ attributes ($\varepsilon$).

Definition 2 For each goal $\gamma_i \in \Gamma$ there is a set of attributes $\varepsilon_j \in E$ such that, after discovering the encoding of each $\varepsilon_j$ in the set, the goal $\gamma_i$ is reached.

Definition 3 We define $\Omega$ as the sequence $\{E_1, \dots, E_n\}$ of entities we have to insert in the device in order to modify all the files involved in the given objective.

The value of $n$ depends on the objective's type and on how its entities are stored.
Thus, $n$ can only be estimated when the process starts, but it can be refined over the iterations of the methodology if needed. For instance, let $E$ be a contact's card: each $\varepsilon_i \in E$ is an attribute such as name, surname, date of birth, phone number, email address, and so on. As a best practice, every attribute $\varepsilon_i \in E$ should be filled in order to modify all the possible files involved in $\Gamma$.

Definition 4 Let $A$ be the fileset (in our case, the whole file system of the device) before performing the $\Omega$ operation set on the device. Let $B$ be the fileset after performing the $\Omega$ operation set. The application $T$ of the operation set $\Omega$ on the device is:

$$T_\Omega : A \to B$$

Definition 5 Let diff denote the function which computes the differences between two filesets. The fileset $C$, which contains only the files modified by the application $T$, is:

$$C = \mathrm{diff}(B, A)$$

[Figure 5.2: The format of the $\Omega$ operations sequence. The figure shows an example with contacts discovery as objective: each entity $E_1, \dots, E_n$ of $\Omega$ lists sample values for its attributes $\varepsilon_1, \dots, \varepsilon_m$ (first name, surname, phone number, e-mail address, other information).]

$C$ may contain garbage data, since other operations may occur while the user performs $T$. We must therefore "clean" $C$ by searching for and deleting all irrelevant data.

Definition 6 Let clean denote the function which cleans a fileset of garbage data. The fileset $\Phi$ is:

$$\Phi = \mathrm{clean}(C)$$

5.2.3 Stage 2: Data hypotheses and entities injection

After the insertion of the $\Omega$ entities, the set $\Phi$ tells us which files have been modified, but it still gives us no information about how the $\varepsilon_i$ are encoded in the storage. In Stage 2 we perform three tasks:

1. We make assumptions about the possible format of $\varepsilon_i$. The set $\Lambda$ represents the collection of assumptions made at this Stage. $\Lambda$ is composed of assumptions about data type, size and predictability. The latter indicates whether we can control the value of $\varepsilon_i$.
Possible values of predictability are the following:
• controllable: the attribute corresponds to an input field and the user can fully control its value. An important property for the application of the methodology is that controllable attributes can be stored more than once in the device, and the corresponding byte sequence is always the same. In the contacts case, controllable attributes are input fields such as name, surname, phone number, etc. If we guess the right type and size, we are able to predict the binary (hexadecimal) version of the data.
• uncontrollable: the attribute does not correspond to any input field and the user is prevented from handling its value; there is no way to predict the binary version of the data. In the contacts case, the contact's ID is an uncontrollable attribute, because it is transparently assigned by the system.
• pseudo-controllable: the attribute does not correspond to any input field and the user is prevented from handling its value, but its binary version is partially predictable. For instance, if we store two contacts on the same day, the year/month/day part of the insertion date (the 6 most significant bytes, for an 8-byte date format) is the same for both of them.

2. Once the assumptions of the previous point have been made, we generate a set $\Omega'$ of sample entities which have all attributes but the $i$-th set to NULL:

$$\Omega' = \left\{ \begin{pmatrix} \varepsilon_1 = \mathrm{NULL} \\ \vdots \\ \varepsilon_i = v_1 \\ \vdots \\ \varepsilon_m = \mathrm{NULL} \end{pmatrix}, \dots, \begin{pmatrix} \varepsilon_1 = \mathrm{NULL} \\ \vdots \\ \varepsilon_i = v_k \\ \vdots \\ \varepsilon_m = \mathrm{NULL} \end{pmatrix} \right\}$$

where $|\Omega'| = k$, $\varepsilon_i = v_a \in \{v_1, \dots, v_k\}$, $\varepsilon_j = \mathrm{NULL}$, $\forall j \neq i$. The values $v_a$ are chosen so that they can be easily identified among all the file bytes in the next Stages. A good choice of the $v_a$ values critically influences the subsequent steps; in the early iterations of the methodology, the $v_a$ should be chosen so that, arranged in the $\Omega'$ entity sequence, they follow a periodic repetitive pattern (e.g., AABB, ABAB, AAAA, etc.).
Thanks to this approach, in the next Stages we are able to retrieve them through pattern similarity matching, avoiding ambiguities in the modified zones of the file caused by insertion side effects.

3. Finally, we insert the $\Omega'$ entities into the device through an application $T_{\Omega'}$, and then we perform a new file system dump in order to analyze the files generated by the insertion.

The output of this Stage is $\Lambda$ and $\Phi'$, the set composed of all the files containing $\Omega'$ entities.

5.2.4 Stage 3: Sequences similarity discovery

The goal of this Stage is to take the $\Phi'$ fileset, containing the sample entities, and to find all the byte sequences that present the same similarities as the attributes of the $\Omega'$ entity set inserted in Stage 2. In the previous Stage, we injected entities that shared one or more attributes. The attributes of the entities were injected following a pattern, such as pairs of calls with the same duration or contacts with the same fields. In this Stage we have to highlight the byte sequences of the file that are equal to each other. In the call duration example, if we made $c$ pairs of calls with the same duration, we will find $c$ pairs of equal byte sequences in the events log file. Therefore, if the assumptions of the previous Stage were correct, the current step simplifies the interpretation tasks of the next Stages by reducing the complexity of the file. The Stage 3 process iterates through the following steps:

1. Discard the file zones which are not directly affected by the operations of Stage 2;
2. Identify attribute separation flags;
3. Identify, highlight and separate similar byte sequences.

Figure 5.3 shows an example in which we format the event log file to detect the storage format of the voice call duration. In the previous Stage we made pairs of calls of the same duration (Figure 5.3a).
All the useless information (metadata, index and tables) was discarded, and similar zones were searched for according to the methodology described. Once similar zones are identified (Figure 5.3b), they have to be formatted in the same way to enable the next Stage to refine the identification and to understand which parts of the file were changed after the insertion of the $\Omega'$ entities.

[Figure 5.3: An example of a DBMS binary file before and after Stage 3. (a) The sample file after making pairs of calls of the same duration (Stage 2). (b) Equal sequences highlighted. (c) The formatted file $\hat{\Phi}'$. The hexadecimal dumps of the three panels are not reproduced here.]
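The search for equal byte sequences described for Stage 3 can be illustrated with a minimal sketch. It is not the tool used in this work: the function name `repeated_sequences`, the fixed window size and the sample bytes are assumptions made only for illustration, and a real analysis would first discard the zones not affected by Stage 2.

```python
from collections import defaultdict


def repeated_sequences(data: bytes, size: int, min_count: int = 2) -> dict:
    """Map every `size`-byte window that occurs at least `min_count` times
    in `data` to the list of offsets where it starts.

    Windows are taken at every offset, so overlapping occurrences
    (e.g. long runs of zero bytes) are all reported.
    """
    offsets = defaultdict(list)
    for pos in range(len(data) - size + 1):
        offsets[data[pos:pos + size]].append(pos)
    return {seq: found for seq, found in offsets.items() if len(found) >= min_count}


# Hypothetical 4-byte records: an ID byte, a 2-byte duration and a flag byte.
# The first two records share the same duration (2A 00), the third does not.
sample = bytes.fromhex("01 2A 00 FF 02 2A 00 FF 03 58 00 FF")
for seq, found in repeated_sequences(sample, 2).items():
    print(seq.hex(" "), "at offsets", found)
```

Sequences that repeat only because of flags or padding also show up in the result, which is why the operator still has to separate the injected values from separation flags by hand or with a narrower size hypothesis.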
We must separate the information added by the user from the other data in the file (Figure 5.3c). A good file formatting is obtained by isolating the zones of the file that differ from the similar ones, and then by isolating flags. The output of this Stage is $\hat{\Phi}'$, the formatted fileset.

5.2.5 Stage 4: Data interpretation

Stage 4 is composed of two steps: candidate sequence identification and candidate sequence interpretation.

Definition 7 The candidate sequences are sequences of bytes, stored in the $\hat{\Phi}'$ fileset, in which we are likely to find the data we are looking for. $\Sigma_{\Gamma,\Lambda}$ is the set of candidate sequences for a given objective $\Gamma$ under a given assumption $\Lambda$.

The candidate sequence identification relies on the hypotheses about the properties of the attribute data made in Stage 2, and it simplifies the sequence by deleting all non-relevant data. In particular:
• If the data is constant, it is always stored in the same format, so the formatted files containing the data can be simplified by removing all the bytes that differ; if the size of the data is equal to the size in the assumptions made, the data is added to $\Sigma_{\Gamma,\Lambda}$;
• If the data is variable, the storage format will probably always be different, so all the equal parts of the formatted files can be removed to simplify them. If the data size is equal to the size in the assumptions, the data is added to $\Sigma_{\Gamma,\Lambda}$;
• If the data is pseudo-variable, the storage format is partially constant and partially variable; we have to look for the constant parts of the file and then examine, in the proximity of the constant zone, an area whose size is equal to the hypothesis. The sequence is then added to $\Sigma_{\Gamma,\Lambda}$.

If $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is large (an unmanageable quantity), then, in order to reduce the number of resulting candidate sequences, we have to analyse the results and understand how to change the $\Lambda$ assumptions made in Stage 2 (through Stage 6).
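The identification rules for constant and variable data listed above can be sketched as follows. This is only an illustration under simplifying assumptions: the records are assumed to be already aligned by the Stage 3 formatting, "variable" is read as "not all records agree on the byte", and the name `candidate_sequences` is hypothetical.

```python
def candidate_sequences(records: list, size: int, constant: bool = True) -> list:
    """Return (offset, slices) for every maximal run of exactly `size` positions
    where the aligned records all agree (constant=True) or do not all agree
    (constant=False). `slices` holds the run as it appears in each record.
    """
    length = min(len(record) for record in records)
    candidates = []
    run_start = None
    for pos in range(length + 1):
        if pos < length:
            same = len({record[pos] for record in records}) == 1
            in_run = same if constant else not same
        else:
            in_run = False  # sentinel position: closes a run reaching the end
        if in_run and run_start is None:
            run_start = pos
        elif not in_run and run_start is not None:
            # Keep the run only if it matches the size assumed in Stage 2.
            if pos - run_start == size:
                candidates.append((run_start, [r[run_start:pos] for r in records]))
            run_start = None
    return candidates


# Hypothetical aligned records: ID byte, 2-byte constant value, 1 variable byte.
records = [bytes.fromhex(h) for h in ("012A00AA", "022A00BB", "032A00CC")]
print(candidate_sequences(records, size=2, constant=True))
print(candidate_sequences(records, size=1, constant=False))
```

An empty result, or a very long one, corresponds to the situation in which the size or type hypothesis has to be revised.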
Once the assumptions are modified and the new $\Omega'$ entities are inserted in the device (reiteration through Stages 2, 3 and 4), the precision of this Stage improves. When $|\Sigma_{\Gamma,\Lambda}|$ reaches a manageable size, the candidate sequence interpretation task can start. In this step we consider $\Lambda$ to better understand which part of the candidate sequence represents the data we are interested in. We look at the $\Omega'$ sequence of operations and check whether it matches in the candidate sequence set. If the sequence of expected attribute values in $\Omega'$ is the same in $\Sigma_{\Gamma,\Lambda}$, the sequence is ready to be interpreted. As the database files are usually in hexadecimal format and the target data are in a different format (e.g., string, decimal), it is necessary to transform the data into a common format (e.g., decimal). The last step is to compare the data contained in the database with the data inserted in the $\Omega'$ entity sequence; if they match, the storage format is saved and the next Stage starts.

5.2.6 Stage 5: Meta-format building

After decoding the data in Stage 4, we need to store the information collected in an intermediate format. This Stage should be seen as a "methodology intermediate status saving", which helps the operator to choose the next goal $\gamma$ to process, and to refine it if required. Before compiling the meta-format, this Stage requires the compilation of a "formats table". This table lists the data discovered at Stage 4, and for each item the following metadata are shown:

Field Name: a text placeholder associated with the data. This label replaces the data value in $\hat{\Phi}'$, in order to make its retrieval easier.

Size: the size of the data, expressed in bytes.

Description: other information useful to the parser building Stage, such as which information is held by the field, type of data, endianness, suggestions for the automatic localization of the data, etc.
Example: an example value of the field.

Each item of data discovered needs a row in the table. An example of a formats table is shown in Figure 5.4a.

Field Name    Size            Description       Example
ID            4               Int, Bigend       B6 03 00 00
NAME LEN      1               Int, Littleend    0E
NAME          (NAME LEN)/2    String            43 6C 61 75 64 ...
...           ...             ...               ...
(a) A table with pseudo data types, obtained as output of Stage 4.

(b) The meta-format file before Stage 5 (hexadecimal bytes, in the order in which they were extracted from the figure): B6 0E 43 0A 44 10 55 09 03 00 00 6C 61 75 64 69 61 72 61 67 6F 6E 69 72 6F 6D 61 32 13 00 10

(c) The meta-format file after Stage 5: ID, NAME LENGTH, NAME, SURNAME LENGTH, SURNAME, COMPANY NAME LENGTH, COMPANY NAME, CXF1.

Figure 5.4: These three figures depict an example of the application of Stage 5 to a file containing the phone's address book.

After compiling the formats table, the meta-format file is equivalent to the sample binary file purged of non-relevant bytes. Data such as headers, indexes, etc. can be deleted if they are not relevant to the purposes of the objective. The first step is to identify, for each entry in the table, the values with which the data appears in the meta-format file (Figure 5.4b) and to replace them with the related labels of the table (Figure 5.4c). In this way all the relevant data in the meta-format file are replaced by placeholders that are easily detected at the parser building Stage. The example shown in Figure 5.4 takes into account a contacts file containing two records with the following fields: name, surname and company.

After this Stage, the given binary file can be interpreted automatically if all the following conditions are satisfied:

1. The data and values of the meta-format that are not yet identified have a static size, so they can be ignored. In this case the parser is able to skip them automatically;
2. All the required data and values of the meta-format are identified;
3.
The meta-format file and the formats table are stable, i.e., after trying different hypotheses of $\Omega'$ the identified zones in the meta-format did not change at all.

5.2.7 Stage 6: Error correction

This Stage is performed if the current $\gamma_i$ was not reached (e.g., Stage 4 was unable to find a correct interpretation of the representation of $\varepsilon_i$) and it is mandatory to re-iterate the methodology. The error leading to this Stage can have two causes. The following list shows the actions to be performed in the next iteration:

1. $\Sigma_{\Gamma,\Lambda} = \emptyset$ or $|\Sigma_{\Gamma,\Lambda}|$ is high (an unmanageable quantity): if there are no candidate sequences or there are too many, some backtracking is needed to obtain a manageable number of candidate sequences. The following actions may be useful:
(a) Changing the assumed data size. This implies reformatting $\Phi'$, building a new $\hat{\Phi}'$. If $\Sigma_{\Gamma,\Lambda} = \emptyset$ and we are looking for matching sequences, we need to decrease the size: two different long sequences might contain two matching shorter sequences. On the other hand, if we are looking for non-matching sequences, the size needs to be increased. When $|\Sigma_{\Gamma,\Lambda}|$ is high, we need to increase the size if we are looking for matching sequences, and decrease it for non-matching sequences.
(b) Modifying $\Omega'$, adding or deleting entities, or changing the $\varepsilon_i$ values. A new $\Omega'$ may give more accurate results. The changes are left to the judgment of the operator; this is the hardest part of the whole process, and the operator's skills play the leading role.
(c) Verifying the correctness of $\Phi'$. Verify that the file we are looking into is the right one (the required information may reside in another file).

2. The interpretation of the candidate sequences did not decode any information about the storage format:
(a) Changing the assumed data size.
(b) Modifying $\Omega'$.
If an ambiguity among different candidate sequences occurred, modify $\Omega'$ in order to restrict the change to fewer bytes;
(c) Changing the data type. Changing the data type might help the decoding from hex.

If none of the above cases applies, or the suggested changes did not lead to a correct data interpretation, we need to review the current goal $\gamma_i$ in Stage 1. Each $\gamma$ reached reduces the space of assumptions we are free to choose from to build $\Lambda$ (and $\Omega'$ as a consequence) for the other $\gamma$.

5.2.8 Stage 7: Parser building

This Stage takes as input all the knowledge collected about the given binary file format. The operator should be able to write a program that reads data from the logical dump of the smartphone and converts them into an XML format. It is mandatory to implement a quality monitor that measures the number of entries in which the parser encounters problems. The ratio

$$r = \frac{F}{T}$$

between the number of failures ($F$) and the total number of entries ($T$) indicates whether additional iterations of the methodology are needed. The threshold below which $r$ is acceptable depends on the required accuracy.

5.2.9 Stage 8: Testing and debugging

In this phase the parser produced in the previous Stage is applied to several logical dumps, in order to test and debug it on real cases. In this Stage the $r$ values of the current parser are verified, and it is established whether the precision of the implementation is sufficient.

Case Study      Information      Detailed Information
Logdbu.dat      Event Log        SMS previews, MMSs, e-mails, calls, video calls, PRSConnection, SIM/MC change.
Calendar        Memo             Daynotes, meetings, anniversaries
Contacts.cdb    Contacts         Contacts information
Mail folder     SMS/MMS/Email    Sender, receiver and body
Table 5.1: Symbian files of interest

5.3 Remote elaboration results

In order to verify and refine the Stages of the methodology, we took the Symbian S60 operating system as a case study.
Applying the methodology produced the results shown in this section.

Files of interest - Stage 1 helped us to find a list of files containing SMS, MMS, contacts and all the user's personal data, which are shown in Table 5.1.

Symbian personal data files format - Thanks to the methodology we were able to reverse engineer the Symbian S60 DBMS file format. We applied the methodology to the contacts list, the calendar, the text/multimedia messages and the phone's event log (which contains calls, previews of sent and received SMS/MMS, and SD card and SIM changes). The complete format is explained in Appendix A.

Obsolete data - Among the information identified and retrieved in the case study, we were able to find obsolete data which had not been purged from the file system. The resource optimization strategy of the DBMS, in fact, reduces the high cost of the modify/delete operations of the DB by flagging records as "obsolete": for this reason the modify/delete operations are scheduled as late as possible, and the circumstances in which they are performed vary depending on the kind of file. For instance, in the Symbian case, the delete operations in Contacts.cdb are performed when the Compress() function is invoked. Operating system tasks and third-party software alike can invoke this function, and they can find out whether compression is needed by invoking CompressRequired() (see [48]). Let $S$ be the total disk space, $F$ the free disk space, and $W$ the amount of disk space wasted; the boolean function returns true if:

$$(W > 64\mathrm{K}) \lor \left(W > 16\mathrm{K} \land W > \tfrac{1}{2}S\right) \lor \left(W > 16\mathrm{K} \land F < \tfrac{1}{20}S\right) \lor (W > 16\mathrm{K} \land F < 16\mathrm{K})$$

After a compression is performed, the contacts are rearranged, the space wasted by obsolete records is recovered and there is no way to recover obsolete data. If the extraction occurs before the compression is invoked, we find a database file that contains all the data since the last compression.
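The compression condition can be written as a small predicate, which makes it easy to estimate whether obsolete records are still likely to be present in a dump. The sketch below is not the Symbian implementation: the name `compress_required` is hypothetical, K is assumed to mean 1024 bytes, and the two fractional thresholds are read as S/2 and S/20, which is an interpretation of the formula as it survives in the text.

```python
KIB = 1024  # assumption: "K" in the condition stands for 1024 bytes


def compress_required(total: int, free: int, wasted: int) -> bool:
    """Evaluate the compression condition for total disk space S (`total`),
    free disk space F (`free`) and wasted space W (`wasted`), all in bytes.
    """
    if wasted > 64 * KIB:
        return True
    if wasted <= 16 * KIB:
        return False
    # From here on W > 16K holds, so only the second operand of each
    # remaining conjunction has to be checked.
    return wasted > total / 2 or free < total / 20 or free < 16 * KIB


# Hypothetical device with 10 MiB of storage.
total = 10 * 1024 * KIB
print(compress_required(total, free=5 * 1024 * KIB, wasted=20 * KIB))
print(compress_required(total, free=100 * KIB, wasted=20 * KIB))
```

Under this reading, a dump taken while the predicate is still false is more likely to contain recoverable obsolete records.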
For the case studies related to contacts, calendar and event log, enough information was decoded to reconstruct the owner's communication history. In the case study of messages (SMS, MMS and emails, stored in the /System/Mail folder) we were not able to find erased data, because the OS purges deleted messages immediately to optimize the available storage.

Unexpected information - Some data attributes are not controllable by the user, i.e., he/she cannot insert them into the system explicitly, so we were not aware of their presence. During Stage 4, the nature of our methodology helped us to retrieve such "hidden" information, such as the record's ID and its creation date. This important result reinforces the effectiveness of the methodology, since it is able to detect more goals than those identified in Stage 0. In our case study, some unexpected information helped us to better understand the data model, and thus the application's behaviour.

We applied the methodology to more than 50 device dumps. The first dumps we studied came from Nokia N70 devices¹, but we realized that our knowledge of the S60 format was still incomplete, since the parser was unable to decode the dump of an older phone (Nokia 7610). After applying a few iterations of the methodology, we built a parser able to interpret the new format.

5.4 Local elaboration

Local elaboration requires less work from the server part of the system. Unlike the remote elaboration case, most of the local elaboration is performed on the mobile client (see Section 4.2). In this case the server side of the system must provide the clients with the API to communicate, save and restore backup data. Following the cloud paradigm, these APIs are provided as web services. A set of REST web services has been implemented; we chose the REST architectural style because we wanted to exploit the facilities of the HTTP protocol.
HTTP makes the system scalable and easy to maintain, and provides a secure transmission level (HTTPS) without implementation effort. Above all, the REST architectural style suits our purpose because the server’s tasks can be performed using PUT and GET requests.

¹ Equipped with Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3

Figure 5.5 shows the server architecture. We designed a three-level architecture following the Model-View-Controller (MVC) pattern [49]; the top (view) layer contains the interface with clients. The interface has been realized using the RESTlet framework [50], [51]. REST web services are exposed using the Apache Tomcat [52] application server and accessed via standard HTTP(S) GET/PUT/POST/DELETE methods. Data are sent over the network as XML; object representations are serialized and deserialized via the XStream library.

[Figure 5.5: The architecture of the backup server. Labels in the figure: Apache Tomcat; Integration; View (Restlet, XStream XML (de)serialization, HTTP GET/POST/PUT/DELETE); Control (business logic); Model (ORM (Active Objects), DAOs as Java POJOs, MySQL).]

The control layer, shown in the center of the figure, implements the business logic of the backup system. The business logic does not only contain functions to handle the data to be saved into the database; this layer also contains the parsers implemented as a result of applying the methodology proposed in Section 5.2. The model layer has been developed using the Active Objects ORM [53], which allowed us to interact with the MySQL database directly through standard Java objects (POJO) [54].

The server provides a REST API to perform full and incremental backups, with an interface to back up and restore contacts, calendars, text messages, multimedia messages, emails, and application and system settings. To access the backup and restore services the client must authenticate with a username and password.
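An incremental backup only needs to transfer what changed since the previous one. The following minimal sketch shows how a client might derive that change set by comparing content digests; it assumes the client holds a map from file path to MD5 digest for both its local state and the server's last backup. The class and method names are hypothetical, standard Java is used rather than J2ME, and MD5 serves only as a change detector, not as a security measure.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that computes what an incremental backup must send. */
final class IncrementalBackupSketch {

    /** Hex-encoded MD5 digest of a file's content. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Paths that are new on the client or whose digest differs from the server's copy. */
    static Set<String> toUpload(Map<String, String> local, Map<String, String> server) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : local.entrySet()) {
            if (!e.getValue().equals(server.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Paths present in the last backup but no longer on the client. */
    static Set<String> toDelete(Map<String, String> local, Map<String, String> server) {
        Set<String> result = new TreeSet<>(server.keySet());
        result.removeAll(local.keySet());
        return result;
    }
}
```

A real client would also take the last modification date into account, which this sketch leaves out for brevity.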
On the first interaction between client and server, the server expects a full backup and stores all the data sent by the client in the database. On every later interaction, during the bootstrap phase of the communication, the server sends the list of files saved on the server that represent the last version of the information; the client creates the list of files to be sent and then, using the proper methods, stores, updates or deletes files with respect to the last version of the backup. To create the list of files to be backed up, the client uses the MD5 hash and the last modification date of the files processed using the first method described in Chapter 4. For the contents of the databases, it uses the list, given by the server, containing the modification date of each entry, extracts the new/updated/deleted data and creates the lists to be processed to synchronize client and server.

A typical interaction between client and server starts with the insertion on the server, through the /backup/{backupType}/device/{imei} method, of a backup item; in this way the server can identify the type of backup from the backupType parameter and the device from the imei parameter. Identifying the device is fundamental when the user holds more than one device and uses the system to synchronize them. The server answers with the list of resources composing the last backup. Subsequently the client sends all the data using the proper methods and the server updates the date of the last backup performed. The operation is straightforward, and similar for all kinds of data. The full XML communication protocol is detailed in Appendix B.

6 Protecting saved data

Introduction

Personal data are probably the most valuable asset a user has in today’s world. Some say that “data is the new oil” [55]. This information needs to be kept safe and accessible only to people explicitly authorized by the data owner.
This can be achieved using authentication, security and privacy techniques, which are usually based on cryptography. Unfortunately cryptography adds considerable overhead to the operations performed on data and, even though mobile devices are becoming more powerful, they still suffer from performance and battery life problems. Cryptographic operations affect both, because they require the device to execute more operations to achieve the same task.

In this chapter we show a novel key agreement algorithm based on the matrix conjugation method, which we presented at the 2010 SECRYPT International Conference on Security and Cryptography [56]. The algorithm has been implemented in J2ME and tested on real mobile devices. We also show the results of performance tests executed on a new encryption algorithm compared to standard ones, presented in [57]. At the end of the chapter we present a framework to securely manage inter-process communication under Android. The framework is detailed in Grillo’s PhD thesis [58] and was presented at the 2nd International ICST Conference on Mobile Computing, Applications, and Services [59].

6.1 Key agreement algorithm

In many cases a key agreement is needed to exchange private data by coding them with a specific algorithm. Examples of cryptography on mobile devices are [60], in which elliptic curves are used efficiently, and [61], [62], concerning trusted text messaging. All these works focus more on the coding/signing part than on key agreement, but of course a key agreement phase is needed before encrypting or signing. In this section we present a JavaME implementation of a new key agreement protocol (a particular case of a class recently proposed in [63]) and compare the performance of our implementation [57] against the standard and Elliptic Curve Diffie-Hellman protocols [64].
The next section explains the mathematical problem underlying the key agreement, together with some considerations on possible attacks and on why they are not effective against this algorithm. Section 6.1.2 presents the implementation choices and analyzes why they optimize performance without affecting security. Section 6.1.3 describes the testing methodology, explaining each step of the testing phase. Section 6.1.4 shows the results of the testing phase. Section 6.1.5 analyzes the data of Section 6.1.4 in more detail. Section 6.1.6 summarizes all the results and proposes possible improvements and applications of the algorithm.

6.1.1 Mathematical setting: key agreement protocol

We consider $\mathcal{M} = GL(d, \mathbb{Z}_p)$, where $p$ is a prime number. Fix $G \in \mathcal{M}$ and let $\varphi_G$ be the conjugation isomorphism associated to $G$:

\[ \varphi_G : \mathcal{M} \ni M \mapsto \varphi_G(M) = G M G^{-1} \in \mathcal{M} \]

The following public key agreement between Alice (A) and Bob (B) (see [63] for a more general setting) exploits the property $[\varphi_G(A)]^n = \varphi_G(A^n)$.

1. A and B share $Q, S \in \mathcal{M}$, with $SQ \neq QS$ and $\det(Q) = |Q| = 1$.
2. A chooses two numbers $x_A, n_A \in \mathbb{N}$.
3. A computes $M_A = S^{n_A} Q^{x_A} S^{-n_A}$ and sends it to B.
4. B receives the matrix $M_A$ from A.
5. B chooses two numbers $x_B, n_B \in \mathbb{N}$, computes $M_B = S^{n_B} Q^{x_B} S^{-n_B}$ and sends $M_B$ to A.
6. A computes $M_{AB} = S^{n_A} M_B^{x_A} S^{-n_A} = S^{n_A} (S^{n_B} Q^{x_B x_A} S^{-n_B}) S^{-n_A}$.
7. B computes $M_{BA} = S^{n_B} M_A^{x_B} S^{-n_B} = S^{n_B} (S^{n_A} Q^{x_A x_B} S^{-n_A}) S^{-n_B}$.

At the end A and B share the common matrix $M_{AB} = M_{BA}$, which represents the Secret Shared Key (SSK). In fact,

\[ M_{AB} = S^{n_A + n_B} Q^{x_B x_A} S^{-(n_A + n_B)} = S^{n_B + n_A} Q^{x_A x_B} S^{-(n_B + n_A)} = S^{n_B} M_A^{x_B} S^{-n_B} = M_{BA} \]
[Figure 6.1: Key agreement process using conjugation. Alice (private values $n_A, x_A$) and Bob (private values $n_B, x_B$) share the public data $(d, p, Q, S)$ and exchange $M_A$ and $M_B$ over an insecure channel that Eve can observe. Both obtain $M_{AB} = S^{(n_A+n_B)} Q^{x_B x_A} S^{-(n_B+n_A)} = S^{(n_B+n_A)} Q^{x_A x_B} S^{-(n_A+n_B)} = M_{BA}$.]

Note that if $|Q| \neq 1$, a possible eavesdropper Eve (E) could set up a discrete logarithm problem by considering the determinantal equation [65]

\[ |M_A| = |S^{n_A} Q^{x_A} S^{-n_A}| = |S^{n_A}|\,|Q^{x_A}|\,|S^{-n_A}| = |S|^{n_A} |Q|^{x_A} |S|^{-n_A} = |Q|^{x_A} \]

with $\det(Q)$ known. If E can solve this scalar discrete logarithm problem, thus recovering $x_A$, then she can easily find, by solving a linear problem and adjusting the free parameters entering in the solution, a polynomial $X$ in the matrix $S$ of degree $\leq d$, with coefficients in $\mathbb{Z}_p$, such that $M_A X = X Q^{x_A}$. Using this, E can compute

\[ X M_B^{x_A} X^{-1} = (X S^{n_B} X^{-1})(X Q^{x_A} X^{-1})^{x_B}(X S^{-n_B} X^{-1}) = S^{n_B} M_A^{x_B} S^{-n_B} = M_{AB} \]

because $X$ commutes with $S$. In conclusion: if $\det(Q) \neq 1$, then the breaking complexity of the algorithm is essentially equivalent to the breaking complexity of a (discrete) logarithm in $\mathbb{Z}_p$, i.e., to that of (scalar) Diffie-Hellman.

With $\det(Q) = 1$ (see step 1 of the agreement process), this “attack” cannot be performed. Figure 6.1 shows the agreement process performed by the algorithm. E could intercept $S$, $Q$, $d$, $p$, $M_A$ and $M_B$. In order to recover the private keys (e.g., $n_A$ and $x_A$), she could set up the following equation

\[ M_A = S^{n_A} Q^{x_A} S^{-n_A} = (S^{n_A} Q S^{-n_A})^{x_A} \]

but this is much more difficult than a usual matrix discrete logarithm problem (DLP), as the base matrix is unknown.
Other identities, such as $M_A S^{n_A} = S^{n_A} Q^{x_A}$, are difficult to exploit because $S^{n_A}$ and $Q^{x_A}$ are not known separately.

We have that $\#\mathcal{M} = \prod_{i=0}^{d-1} (p^d - p^i)$. Let $o(M)$ be the order of a matrix $M \in \mathcal{M}$, i.e., the smallest positive integer such that $M^{o(M)} = 1$. In order to avoid useless computations, it is sufficient to choose $n_A, n_B < o(S)$ (resp. $x_A, x_B < o(Q)$). The order of a matrix $M \in \mathcal{M}$ is in general difficult to compute, but an upper bound for it can be found as follows. For each $M \in \mathcal{M}$ let $p_M(x) = \prod_{i=1}^{k} f_i^{d_i}(x)$ be its characteristic polynomial factorized in $\mathbb{Z}_p[x]$, with $\alpha = \max\{d_i \mid i = 1, \dots, k\}$. An upper bound (multiple) $m(M)$ for its multiplicative order $o(M)$ is given by the following formula [66]

\[ m(M) = \mathrm{lcm}(p^{d_1} - 1, \dots, p^{d_k} - 1) \cdot p^{\lceil \log_p(\alpha) \rceil} \]

6.1.2 J2ME implementation

The operations described above to perform the key agreement have been developed in Java Micro Edition (J2ME). We chose this programming language because we need a suite that can run on different hardware platforms and operating systems. Moreover, it allows a sound performance evaluation, obtained by comparing our implementation of the key agreement algorithm with Bouncy Castle’s implementations of the Elliptic Curve and standard Diffie-Hellman key agreement algorithms. Bouncy Castle provides a plethora of APIs performing different cryptographic operations, implemented in Java, J2ME and C#; we used the J2ME implementations of the Elliptic Curve Diffie-Hellman (ECDH) and the standard Diffie-Hellman (DH) key agreement to perform the comparison.

The first step in implementing the algorithm described in Section 6.1.1 is to implement the modular operations on matrices (e.g., modular matrix multiplication, power, inversion, conjugation and other ancillary operations).
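The modular matrix operations and the agreement steps of Section 6.1.1 can be sketched as follows. This is a toy illustration, not the thesis implementation: it uses long[][] matrices and standard Java rather than J2ME, a prime modulus p = 2^31 - 1 chosen for the example, 2x2 matrices so that inversion reduces to the adjugate formula, and small hypothetical private values. All class and method names are assumptions.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Toy sketch of the conjugation-based key agreement; names are hypothetical. */
final class ConjugationKeyAgreementSketch {

    /** Matrix product a*b with entries reduced modulo p (requires p < 2^31). */
    static long[][] mul(long[][] a, long[][] b, long p) {
        int d = a.length;
        long[][] c = new long[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                long sum = 0;
                for (int k = 0; k < d; k++) {
                    sum = (sum + a[i][k] * b[k][j]) % p;
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    /** m^e modulo p by square-and-multiply, for e >= 0. */
    static long[][] pow(long[][] m, long e, long p) {
        int d = m.length;
        long[][] result = new long[d][d];
        for (int i = 0; i < d; i++) {
            result[i][i] = 1 % p; // identity matrix
        }
        long[][] base = m;
        while (e > 0) {
            if ((e & 1) == 1) {
                result = mul(result, base, p);
            }
            base = mul(base, base, p);
            e >>= 1;
        }
        return result;
    }

    /** Inverse of a 2x2 matrix modulo p via the adjugate (entries must lie in [0, p)). */
    static long[][] inverse2x2(long[][] m, long p) {
        long det = ((m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p + p) % p;
        long detInv = BigInteger.valueOf(det).modInverse(BigInteger.valueOf(p)).longValue();
        return new long[][] {
            { m[1][1] * detInv % p, (p - m[0][1]) % p * detInv % p },
            { (p - m[1][0]) % p * detInv % p, m[0][0] * detInv % p }
        };
    }

    /** S^n * base^x * S^(-n) mod p: steps 3 and 5 with base = Q, steps 6 and 7 with the peer's matrix. */
    static long[][] conjugatedPower(long[][] s, long[][] sInv, long[][] base,
                                    long n, long x, long p) {
        return mul(mul(pow(s, n, p), pow(base, x, p), p), pow(sInv, n, p), p);
    }

    public static void main(String[] args) {
        long p = 2147483647L;              // 2^31 - 1, a prime
        long[][] q = { {2, 1}, {1, 1} };   // det(Q) = 1
        long[][] s = { {1, 2}, {3, 4} };   // invertible mod p, SQ != QS
        long[][] sInv = inverse2x2(s, p);
        long nA = 12345, xA = 678, nB = 54321, xB = 910; // toy private values

        long[][] mA = conjugatedPower(s, sInv, q, nA, xA, p);   // Alice sends M_A
        long[][] mB = conjugatedPower(s, sInv, q, nB, xB, p);   // Bob sends M_B
        long[][] mAB = conjugatedPower(s, sInv, mB, nA, xA, p); // Alice's key
        long[][] mBA = conjugatedPower(s, sInv, mA, nB, xB, p); // Bob's key
        System.out.println(Arrays.deepEquals(mAB, mBA));        // prints true
    }
}
```

Both parties obtain the same matrix because powers of S commute with each other. For general dimension d the inverse would need, for instance, Gauss-Jordan elimination modulo p, which is omitted here.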
In a mobile environment it is very important to optimize every step of every operation with respect to resource consumption: on small-capacity devices any waste of resources causes a larger delay than the same waste would cause on more powerful devices. Because of the shortage of RAM, CPU and storage capacity, operations need to be optimized as much as possible. To perform the operations described in Section 6.1.1 we use a 32 bit unsigned integer data structure. Unfortunately, in Java and J2ME there is no unsigned integer data type; to solve this problem there are two possible approaches:

1. use bigger data structures, such as the 64 bit signed long integer, simulating a 32 bit size by applying the modulus when the value exceeds $2^{32}$;
2. use the available 32 bit signed integer, combining it with arithmetical operations modulo $2^{31}$.

We have chosen the latter solution, i.e., to develop the modular matrix as an integer array (int[ ]) with modulus $2^{31}$. This data structure is, in our opinion, the best compromise between RAM usage and the CPU usage due to the operations needed to perform a task. The security of the key agreement is not significantly affected by using 31 bit integers, while performance is compromised if one uses the 64 bit signed integer to simulate a 32 bit unsigned integer. Using long integers, the RAM consumption doubles and the system’s performance, in our opinion, degrades too much to justify the slight improvement in security.

6.1.3 Performance testing methodology

In this section we report our performance tests of the Matrix Conjugation Based Key Agreement versus Elliptic Curve and standard Diffie-Hellman on a Nokia N70 platform. The Nokia N70 is a multimedia smartphone launched in Q3 2005. In 2007, it was the second most popular cellular phone, with 8% of all sales at Rampal Cellular Stockmarket [67]. Our experiments show similar results with other mobile devices.
The Nokia N70 is equipped with:

• CPU: Texas Instruments OMAP 1710 (ARM architecture 926TEJ v5), 220 MHz processor
• RAM: 55 MB
• FLASH: 19.9 MB
• MMC: 2 GB
• SCREEN: 176×208 TFT Matrix, 256K colours
• BATTERY: BL-5C (970 mAh)
• OS: BB5 / Symbian OS v8.1a, S60 Platform Second Edition, Feature Pack 3 operating system
• JAVA: MIDP 2.0 midlets

On a mobile device in general, and using J2ME in particular, measuring the time required for a given task is problematic, because the accuracy of the System.currentTimeMillis() function is not sufficient. As an estimate of the duration of a given task we therefore use the average of the durations measured over several repetitions of the same task. More precisely:

Definition 8 Let $n$ be the number of iterations of one task, and let $\theta_i$ denote the time needed to perform the $i$-th iteration, measured using System.currentTimeMillis(). The actual time that the device needs to perform such a task is measured as follows:

\[ \Theta_n = \frac{1}{n} \sum_{i=1}^{n} \theta_i \qquad (6.1) \]

It is an empirical fact that $\Theta_n$ becomes approximately independent of $n$ for “large” $n$. The required size of $n$ depends on the task and is usually smaller for longer tasks (i.e., larger $\theta_i$), see Section 6.1.5 below.

For each algorithm tested, we performed the operation described above for the most used instances of the algorithm; e.g., for the ECDH case we tested all the curves recommended by NIST [68]. For the analysis of standard Diffie-Hellman and of the Matrix Conjugation Based Key Agreement, we considered instances with comparable private key length, in order to relate the complexity of a brute force attack to performance.
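The averaging of Definition 8 can be sketched in a few lines. The class and method names below are hypothetical; the sketch simply times n runs of a task with System.currentTimeMillis() and returns their mean.

```java
/** Sketch of the averaged timing of Definition 8; names are hypothetical. */
final class AverageTimer {

    /** Mean duration in milliseconds over n runs of the task (the average Theta_n). */
    static double averageMillis(Runnable task, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            long start = System.currentTimeMillis();
            task.run();
            total += System.currentTimeMillis() - start; // theta_i
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        // Hypothetical workload: a short busy loop standing in for a key agreement.
        Runnable work = new Runnable() {
            public void run() {
                long x = 0;
                for (int i = 0; i < 1000000; i++) {
                    x += i;
                }
                if (x < 0) {
                    System.out.println(x); // never true; uses x so the loop has an effect
                }
            }
        };
        System.out.println("average ms: " + averageMillis(work, 100));
    }
}
```

With a coarse clock a single short run may read as 0 ms, which is why the mean over many runs is the more useful figure.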
[Figure 6.2: Public data and Key Agreement generation time: all tests. Series: Public Data Generation, Key Agreement, TOT. Legend: EC ... bit: Elliptic Curve Diffie-Hellman with a ... bit key; EC ... bitK: Koblitz Elliptic Curve Diffie-Hellman with a ... bit key; MC d (...): Matrix Conjugation at dimension d with a ... bit key; DH ...: Diffie-Hellman with a ... bit key.]

The next section shows the experimental results of the performance comparison among the different key agreement algorithms.

6.1.4 Performance evaluation

Here we show the results of all the tests performed on standard key agreement algorithms and protocols and on the Matrix Conjugation Based Key Agreement. We compared the performance of the Matrix Conjugation Based Key Agreement to other reference algorithms, such as Diffie-Hellman key agreement (DH) [69] and Elliptic Curve Diffie-Hellman key agreement (ECDH) [70]. We remark that these algorithms are the most used to perform key agreement operations in desktop and mobile environments. Among the elliptic curves suggested by NIST [71], we selected both Koblitz curves (ending with a K in Figure 6.2 and Figure 6.3) and pseudo-random curves over GF(p).

[Figure 6.3: Public data and Key Agreement generation time: results with an upper bound of 1 sec. Series and legend as in Figure 6.2.]

Figure 6.2 shows the time comparison between the Matrix Conjugation Based Key Agreement (MC in Figure 6.2 and Figure 6.3), standard Diffie-Hellman and Elliptic Curve Diffie-Hellman. We can note that the conjugation based key agreement generates the public data and the SSK faster than the other algorithms. Since the differences among the faster algorithms are hard to appreciate at the scale of Figure 6.2, Figure 6.3 offers a closer look. While a key agreement using Elliptic Curve with a 571-bit key takes 5706.3 milliseconds, a key agreement using the conjugation based key agreement with a 5 × 5 matrix (775-bit key) takes only 20.63 milliseconds. This difference is significant, even more so considering that the SSK generated by the Matrix Conjugation Based Key Agreement is about 36% larger than the Elliptic Curve SSK (775 versus 571 bits). Even in the case of standard Diffie-Hellman, the differences in the mobile environment look quite impressive; for example, a Diffie-Hellman 768-bit SSK is agreed in 343.44 milliseconds, while a Matrix Conjugation Based Key Agreement 775-bit SSK takes only 20.63 milliseconds. These differences are illustrated in Figure 6.3.

6.1.5 Experimental results

Table 6.1 summarizes all the results obtained in the performance testing for the different classes of algorithms. The Parameters field indicates:

• In the ECDH case, the type of curve that is used to generate the agreement (K indicates a Koblitz curve) and the size of the generated SSK;
• In the DH case, the size of the generated SSK;
• In the Matrix Conjugation Based Key Agreement, the matrix dimension and the bit size of the matrix generated as key.

Public Data Generation (Pub.
Data) field indicates the time needed to generate the data exchanged to agree on an SSK. The Key Agreement (Key Agr.) field shows the time needed to generate the SSK from the exchanged and private data. The Total field shows the sum of the times used to generate the exchanged data and the SSK. The last field, Iterations (Iter.), indicates how many times the agreement has been performed. This field is useful to understand the accuracy of the values in the Public Data Generation, Key Agreement and Total fields. In all cases but ECDH we did 100 iterations; in the ECDH cases we decided to use just 10 iterations, because times were more than one order of magnitude bigger than in the other cases, so that keeping the same accuracy was not necessary.

Param.        Pub. Data    Key Agr.      Total    Iter.
Elliptic Curve Diffie-Hellman
163bit K         110.90      100.00     210.90       10
192bit           185.90      195.30     381.20       10
224bit           298.50      281.20     579.70       10
233bit K         696.90      759.30    1456.20       10
239bit          1684.40     1626.60    3311.00       10
256bit           312.50      262.50     575.00       10
283bit K         407.80      442.20     850.00       10
384bit           493.70      415.70     909.40       10
409bit K         561.00      560.90    1121.90       10
521bit          1404.60     1342.20    2746.80       10
571bit          2845.30     2861.00    5706.30       10
Diffie-Hellman
512               37.51       68.58     106.09      100
768              116.25      227.19     343.44      100
1024             282.98      539.83     822.81      100
Matrix Conjugation Based Key Agreement
3 (279)            3.27        3.13       6.40      100
4 (496)            6.72        5.94      12.66      100
5 (775)           10.32       10.31      20.63      100
6 (1116)          16.72       15.47      32.19      100
7 (1519)          23.76       22.96      46.72      100
8 (1984)          33.90       31.41      65.31      100
9 (2511)          44.35       44.85      89.20      100
10 (3100)         57.97       56.87     114.84      100
11 (3751)         74.21       71.26     145.47      100
12 (4464)         93.91       89.53     183.44      100

Table 6.1: Time (in milliseconds) taken by the algorithms to generate the public data and to agree on an SSK.

6.1.6 Concluding remarks

In this section we compared a custom key agreement algorithm based on matrix conjugation with standard Diffie-Hellman and Elliptic Curve Diffie-Hellman key agreement.
Our experiments have been performed using one of the most popular smartphones in the world. The experimental results showed that the key agreement based on matrix conjugation is between 8 and 450 times faster than the two Diffie-Hellman variants. Providing users with new services on their mobile devices increases the need for security to protect the information exchanged; such information can contain data about bank accounts, credit card numbers, PINs or simply passwords. Currently existing cryptographic methods affect the usability of applications too much, burdening the system with the resource consumption due to cryptographic operations. Considering the growing business opportunity around the mobile world and, at the same time, the need for new, better performing applications that can run on small-capacity devices such as smartphones or netbooks, the results of this section open the possibility of applying this cryptographic methodology to many scenarios in the use of mobile devices.

6.2 Encryption algorithm

QP-DYN is an encryption algorithm based on some ideas coming from [72], used for the encryption/decryption phase of the communication. We are not authorized to disclose information about how the algorithm works; we can only provide information on the performance and statistical testing performed in comparison with other stream cipher algorithms. The security of QP-DYN has been statistically tested and the results are available in Section 6.2.2. These results do not prove that QP-DYN is unbreakable; however, they show that QP-DYN not only satisfies the NIST requirements for classified information but also passes tighter and more robust tests, such as the Rabbit, Alphabit, pseudo-DIEHARD, FIPS-140-2 and Crush test batteries.

6.2.1 Performances

Performance testing has been executed on a Nokia N70 (see Section 6.1.3 for the device’s details). We compare QP-DYN with RC4 [73] and the AES CFB [74] Stream Cipher because both perform stream cipher operations, as QP-DYN does.
Figure 6.4 (a), Figure 6.4 (b) and Figure 6.4 (c) show the results of the performance testing between QP-DYN and RC4 for different key sizes. The times shown in the figures are the sum of the encryption and decryption times. The key sizes tested are:

• 512-bit for RC4 compared to QP-DYN with a 4x4 matrix for a total of 496 bits (Figure 6.4 (a));
• 768-bit for RC4 compared to QP-DYN with a 5x5 matrix for a total of 775 bits (Figure 6.4 (b));
• 1024-bit for RC4 compared to QP-DYN with a 6x6 matrix for a total of 1116 bits (Figure 6.4 (c)).

[Figure 6.4: Overall encryption and decryption time comparison (time in milliseconds versus plaintext size in bytes, from 32 to 608) between (a) RC4 512-bit and QP4, (b) RC4 768-bit and QP5, (c) RC4 1024-bit and QP6.]

We observe that the time differences in the above figures are in the following ranges:

• From 15 up to 52 milliseconds for Figure 6.4 (a);
• From 18 up to 45 milliseconds for Figure 6.4 (b);
• From 37 up to 65 milliseconds for Figure 6.4 (c).

The time differences are within the range 15-65 milliseconds and thus they do not substantially affect the usability of QP-DYN compared to RC4. Furthermore, it is useful to remember that RC4 is not considered secure (see also the results shown in Section 6.2.2, “Statistically testing QP-DYN and RC4”).

We also compared the performance of QP-DYN in mobile environments with an AES implementation operating as a stream cipher (AES CFB Stream Cipher). In particular, Figure 6.5 illustrates the results of a comparison between an AES CFB Stream Cipher implementation using a 256-bit key and QP-DYN with 3x3 matrices (279-bit key).
[Figure 6.5: Overall encryption and decryption time comparison between AES CFB 256-bit and QP3 (time in milliseconds versus plaintext size in bytes, from 32 to 608).]

In our experiments, the plaintext size at which QP-DYN and AES take roughly the same time to encrypt/decrypt was about 256 bytes. As can be seen from the figure, the time differences between AES CFB and QP-DYN are not dramatic:

• To encrypt/decrypt 32 bytes of plaintext AES CFB takes 22 milliseconds less than QP-DYN;
• To encrypt/decrypt 256 bytes of plaintext AES CFB and QP-DYN take the same time;
• To encrypt/decrypt 512 bytes of plaintext AES CFB takes 24 milliseconds more than QP-DYN.

QP-DYN can also be used to perform Block Cipher operations, so we compared it with AES in its standard Block Cipher implementation. As AES has been designed to perform Block Cipher operations, its encryption/decryption times are better than those of AES CFB. Figure 6.6 shows the results of our experiments for a standard implementation of AES using a 256-bit key and QP-DYN with a 3x3 matrix (279-bit). In particular:

• to encrypt and decrypt 32 bytes of plaintext AES takes 0.3 milliseconds while QP-DYN takes 28.25 milliseconds;
• to encrypt and decrypt 512 bytes of plaintext AES takes 4 milliseconds while QP-DYN takes 63 milliseconds.

[Figure 6.6: Overall encryption and decryption time comparison between AES 256-bit and QP3 (time in milliseconds versus plaintext size in bytes, from 32 to 608).]

We remark again that those time differences are very small (of the order of 60 milliseconds), and thus they should not have any impact on the practical usability of QP-DYN.

6.2.2 Statistically testing QP-DYN and RC4

QP-DYN performs encryption and decryption in a stream cipher mode; in particular, it generates a key-stream of the same size as the plaintext to be ciphered.
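In a stream cipher of this kind, the key-stream is combined with the plaintext byte by byte using XOR, so the same routine both encrypts and decrypts. Since the internals of QP-DYN are not disclosed, the sketch below uses the publicly known RC4 algorithm as the key-stream generator, purely as an illustration. The class and method names are hypothetical, and RC4 should not be used to protect real data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustration of a key-stream XOR cipher, with RC4 as the key-stream generator. */
final class Rc4Sketch {

    /** Encrypts or decrypts data: XOR with the key-stream is its own inverse. */
    static byte[] apply(byte[] key, byte[] data) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        // Key scheduling: permute the state according to the key.
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
        }
        // Key-stream generation, one byte per plaintext byte.
        byte[] out = new byte[data.length];
        int i = 0;
        int j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
            int keyStreamByte = s[(s[i] + s[j]) & 0xFF];
            out[n] = (byte) (data[n] ^ keyStreamByte);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "Key".getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "Plaintext".getBytes(StandardCharsets.US_ASCII);
        byte[] cipher = apply(key, plain);
        byte[] back = apply(key, cipher); // applying the cipher again decrypts
        System.out.println(Arrays.equals(plain, back)); // prints true
    }
}
```

Because the key-stream has the same length as the plaintext, the ciphertext is never longer than the input. Any statistical regularity in the key-stream carries over directly to the ciphertext, which is why key-stream quality matters.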
The key-stream is XOR-ed with the plaintext to produce the ciphertext. These operations are the same as those performed by other stream cipher algorithms, e.g., RC4 [75]. In 2005, Andreas Klein presented an analysis of the RC4 stream cipher showing correlations between the RC4 key-stream and the key, and in 2008 he presented a successful attack on the RC4 key-stream based on his 2005 work [76]. These works show that if there are correlations in the key-stream generated by a stream cipher, the stream cipher itself is reversible and can be attacked statistically to recover the key.

The National Institute of Standards and Technology (NIST) sets the guidelines to verify a stream cipher algorithm based on pseudo-random number generators (PRNG) [77]. A PRNG should successfully “pass” some statistical tests in order to be usable to cipher classified information¹. These tests are a subset of other sets of tests used to discover correlations in bit sequences generated by a PRNG. NIST gives some documentation about these tests in [78]. As there is a lot of work on RC4 stream cipher cryptanalysis and on the correlations observable in the generated key-stream, we decided to start by analyzing the differences between RC4 and QP-DYN that emerge from the NIST tests.

We tested RC4 and QP-DYN using the TestU01 [79] C library for statistical testing; this library provides more tests than those required by NIST, and some of these tests are harder to pass. The tests in the TestU01 library are divided into batteries: SmallCrush, BigCrush, Rabbit, Alphabit, FIPS-140-2, pseudo DIEHARD. There is no battery performing all the tests required by NIST, but the tests are available in the library [80]; we therefore implemented a battery performing all the required tests. The results of running our NIST test battery on the RC4 and QP-DYN algorithms are shown in [57].
RC4 does not pass some tests, while QP-DYN passes all the tests required by NIST. Moreover, while QP-DYN does not show correlations in the generated key-stream, RC4 reveals them in a very short time compared with the QP-DYN test times (total CPU time for RC4 is 00:32:13.93; total CPU time for QP-DYN amounts to 04:00:07.85). In every run performed, RC4 always failed the same tests, while QP-DYN always passed all the tests performed. The results of the tests give a clear indication that QP-DYN, when used properly with strong keys, is a strong and robust stream cipher, even though the effort it requires is higher than that required by other algorithms.

¹ Classified information is sensitive information to which access is restricted by law or regulation to particular classes of persons. A formal security clearance is required to handle classified documents or access classified data. The clearance process requires a satisfactory background investigation. There are typically several levels of sensitivity, with differing clearance requirements. This sort of hierarchical system of secrecy is used by virtually every national government. The act of assigning the level of sensitivity to data is called data classification.

6.3 Protecting inter process communication

Smartphone applications are commonly installed and stored in memory, and in modern devices all of an application’s data are kept safe by the OS using a sandbox approach. This approach prevents other applications from accessing unauthorized data by isolating each application from the others [81], [82], [83]. In many cases applications installed on the same device may interoperate in their working environment, using mechanisms similar to inter-process communication (IPC) made available by the mobile operating system. Unfortunately, mobile devices lack flexible solutions for making these communications secure.
This section presents a framework proposed to secure the message exchange with the services installed on Google Android mobile devices. Value added services (VASs) realized by different providers are discovered, used and composed by an Application Frame designed for realizing complex goals. We implemented a prototype of the proposed framework on a real device and performed extensive testing to measure the overhead introduced by the cryptographic operations required to protect the inter-process communication. We named this framework SAVED (Secure Android Value addED services). SAVED enables secure communication between services and the applications using such services via Inter Process Communication (IPC)/Remote Procedure Call (RPC). Each VAS is realized through an Android Service. Access to such a service requires the execution of an authentication and authorization phase among the involved parties. Once this initial phase is completed, the application sets up a secure communication with the service using a symmetric encryption scheme.

6.3.1 State of the art

Android is a multi-process system, in which each application (and parts of the system) runs in its own process. Most security between applications and the system is enforced at the process level through standard Linux facilities, such as user and group IDs that are assigned to applications [84]. The Android system requires that all installed applications be digitally signed with a certificate whose private key is held by the application’s developer. The Android system uses the certificate as a means of identifying the author of an application and establishing trust relationships between applications. The Android approach grants the security of an application’s data, and prevents access to all services developed by others. Every service publishes in its own manifest file the permissions required to use the service.
One of the permission settings in the manifest file is the Protection level. The Protection level field configures the security policy required by the service; if the level is set to signature, the service will communicate only with applications signed with the same developer certificate. The main advantage of the approach followed in the Android design is that developers only have to focus on the application, while the OS guarantees that applications not allowed to access a service are prevented from doing so. This simplification comes at an extra cost: only developers sharing certificates and private keys can reuse already developed services in new applications. This is a huge limitation compared to the growing size of the mobile market and the number of organizations and developers engaged in publishing applications and services. The Android approach prevents third parties from using the framework's VASs. Developers can use each other's services by sharing certificates and credentials: in this case the applications can interact, but the security of the whole framework rests on a single digital signature; if the developer's digital signature is stolen, an attacker could sign his/her own applications, thus getting complete access to all the data of the framework. Our approach aims to promote the scalability of the framework and to grant secure access to services developed by other users without the need to share private data. We propose to insert a new layer that handles the security of inter-process communication; in this layer, trust is granted directly by the security policy of the framework, and each application can request access and publish services by interacting with the framework as in a PKI environment.
The SAVED framework addresses different kinds of threats:

• Service Spoofing: the application refers to a service simply through an interface that establishes the name, the package and the method signatures; if the original service is replaced on the mobile device, applications that use that service are unaware of the substitution.

• Memory Dump: starting from Android 1.5, a new API has been introduced to generate a memory dump programmatically. The static method dumpHprofData(String fileName) of the Debug class generates a dump file that can be converted with the hprof-conv tool of the Android SDK and subsequently analyzed with different memory analysis tools (e.g., Eclipse MAT, JProfiler, etc.). If a malicious application executes the dump periodically and exports the dump data over a connection (e.g., an HTTP connection), it can steal the data exchanged among applications and services.

6.3.2 The framework

SAVED (Secure Android Value addED services) is a framework that grants secure communication between services without requiring private data sharing. Our intent is to improve interoperability between applications and services by overcoming the limits of Android's native approach. The purpose of SAVED is to allow applications to use services developed by others, to add new VASs to the framework, or even to create new applications using already existing VASs. All the interactions performed through the proposed framework are carried out in a secure way. SAVED adds supplementary security at the process communication level: each application is accredited to the framework, which grants it the privileges to access shared services and facilities securely. The single-process security provided by Android's sandboxes is also preserved in SAVED.
In our framework we defined two main entities:

• Application, which provides the graphical user interface and all the logic implementing the task to be realized. Applications are implemented by extending the Android Activity class (android.app.Activity).

• Value Added Service (VAS), which provides all the certified services to the applications developed using the framework. VASs are implemented as remote services extending the Android Service class (android.app.Service). The ProxyCA and the ProxyTSA are two special VASs in the framework; they allow communication with a Certification Authority and a Timestamping Authority, respectively.

In order to realize Applications participating in the framework, developers have to implement specific interfaces and include particular resource packages. When a new VAS is realized, its class package must be exported. These class packages are then imported by the Applications that use the services provided by the VAS, and are used to perform inter-process communication. Including these packages and implementing the interfaces provides the supplementary security layer that grants secure communication between entities and denies access to the services to applications that are not allowed. Moreover, we identified some best practices for creating components that participate in the framework while enforcing the required security needs.
Some examples follow:

• Activation code: when the Application/VAS is installed on the device, the user should be asked for an unlock code; the Application/VAS remains locked (preventing all interactions) until the user enters the proper activation code for each entity;

• Use of standard certificate: each component should have a proper X.509 digital certificate signed by a valid Certification Authority (CA); the certificate is saved in a keystore inside the component's memory area, and the component is responsible for managing the keystore correctly so that the certificates of other entities are stored securely;

• Model View Controller pattern: VASs and Applications independently implement the graphical user interfaces shown to the end user;

• Mutual Authentication: each entity needs to implement a mechanism to grant mutual authentication. Mutual authentication should be ensured by mutually exchanging and verifying digital certificates. Using a handshake scheme (e.g., the TLS handshake), the involved entities exchange their digital certificates, check the validity of the certificates through the ProxyCA, and mutually authenticate themselves (Figure 6.7).

Figure 6.7: Mutual Authentication phase.

• Session Authentication: once the entities are mutually authenticated, a session key (i.e., SK) is shared. In our approach the SK is generated by both the Application and the VAS using parameters defined by the two parties (i.e., CTRL A and CTRL B). By adopting a key agreement protocol (e.g., the Diffie-Hellman protocol), the involved entities agree on a secret SK that will be used to encrypt subsequent communications (Figure 6.8).

Figure 6.8: Session Authentication phase.
• Session Encryption: every VAS allows access to its functionalities only to “trusted” Applications, that is, Applications that have successfully completed the Mutual Authentication and the Session Authentication phases. In order to enforce the uniqueness of each interaction with a VAS, a random value (i.e., Nonce A) is used; confidentiality is granted by encrypting the exchanged data with the SK. The Application composes the results of different VASs in order to realize a complex goal. At the end of this phase, the Application interacts with a timestamping authority through the ProxyTSA in order to securely keep track of the creation time of the realized goal (Figure 6.9). The sensitive data of the operation are summarized by applying a hash function (i.e., Op Hash), and these data are sent to the Timestamping service.

Figure 6.9: Session Encryption phase.

Mutual Authentication, Session Authentication and Session Encryption represent the secure core of the SAVED framework and must be carefully performed in order to join the framework.

6.3.3 The framework implementation

We developed a prototype of the SAVED framework on an Android 1.5 platform. The main features of the proposed framework are encapsulated in jar files that contain two kinds of files (i.e., .aidl and .Stub) for inter-process communication. AIDL (Android Interface Definition Language) is an IDL [85, 86] that makes it possible to automatically generate the source code that allows two Android applications to exchange information using IPC. The AIDL/IPC interface-based mechanism is similar to the Component Object Model (COM) or the Common Object Request Broker Architecture (CORBA). Implementing an AIDL/IPC service requires the following steps:

• Create an .aidl file to define the interface (YourInterface.aidl). The interface defines the access methods and the fields available to a client.
• Add the .aidl file to the makefile and implement the methods of the interface by creating a class that extends YourInterface.Stub (the .Stub file is automatically generated by the tool) and implements the methods declared in the .aidl description file.

• Publish the interface to clients by overriding the Service.onBind(Intent) method; this method returns an instance of the class implementing the interface.

This IPC mechanism needs a way to share complex information, such as non-primitive types, between two entities. To achieve this goal Android provides the Parcelable interface, which makes it possible to serialize and deserialize complex types. Figure 6.10 shows a simplified package diagram of SAVED.

Figure 6.10: SAVED framework main packages.

The top of the picture shows the following core .jar files:

• pkgApp.jar contains the interface InterfaceApplication, which must be implemented by every class that wants to participate in SAVED as an Application;

• pkgServ.jar contains the interface InterfaceService, which must be implemented by every class that wants to be a VAS in the framework;

• pkgCA.jar carries IProxyCA.aidl with its associated .Stub file; these files allow communication between the entities of SAVED and the ProxyCA. Moreover, the jar file contains the parcelable class ReqX509, which is mandatory for the communication;

• pkgTSA.jar packages IProxyTime.aidl with its associated .Stub file to allow communication with the ProxyTSA;

• pkgCommBase.jar contains the three base parcelable files that allow communication between the Application and the VASs, namely CertificatePack.java, KeyPack.java and ResourcePack.java.
To allow an Application to contact and receive services from all the VASs inside the framework, and thus to assemble the services offered by the VASs into complex applications, the ProxyTSA and the ProxyCA Android packages (apk) must be installed; these entities are shown in the lower left half of Figure 6.10. ProxyCA is one of the underlying VASs of the framework. All entities must submit to the ProxyCA the digital certificates they receive from their communication partners. The service contacts a web service that works as an online Certification Authority, inserts the certificate in an XML file and, through a secure HTTP connection (i.e., HTTPS), asks for the certificate to be verified. The web service checks the validity of the certificate and answers with an XML response. ProxyTSA is another basic VAS of SAVED. Like the ProxyCA, the ProxyTSA takes care of the communication with an external partner, in this case the timestamping web service. All the communications between the proxy and the timestamping web service are managed through XML messages over HTTPS. The lower right half of Figure 6.10 illustrates a VAS and an Application participating in SAVED. Third parties that want to contribute to the framework may easily create and add new Applications or VASs. In the remainder of this section we sketch how a developer can realize VASs and Applications.

Building a value added service

1. Create a new Android project with a class that extends the native Service class; import pkgServ.jar, pkgCommBase.jar and pkgCA.jar into the project;
2. The main class of the project must implement the InterfaceService interface and consequently all its methods;
3. Create the graphical user interface;
4. Create IServiceX.aidl in the project as described previously;
5. Create and export pkgXVAS.jar containing IServiceX.aidl and the corresponding automatically generated .Stub file;
6. The Service class must implement all the standard methods of the native Android Service class, as well as the .aidl interface with all the methods defined through the description language;
7. Release the service as an .apk file for installation on the device.

Building an application

1. Create a new Android project containing a class that extends the native Android Activity class; import pkgApp.jar, pkgCommBase.jar, pkgCA.jar and pkgTSA.jar into the project;
2. The main class of the project must implement the InterfaceApplication interface with all its methods;
3. Create a graphical user interface to allow the user to interact with the Application;
4. For each VAS to be used in the Application, import the corresponding jar file (i.e., pkgXVAS.jar);
5. Use each service properly, taking care to manage and release the connection with the involved VAS correctly. Note that early versions of the Android platform serialize access to the services;
6. Release the Application as an .apk file for installation on the device.

Assume a scenario with one Application and one VAS, each with its own digital certificate signed by a different CA. Note that in this scenario neither entity “knows” the public key or the certificate of its counterpart. If the two entities wish to cooperate, they need to authenticate each other. After contacting the ProxyCA to verify the trustworthiness of the communication partner (cf. the Mutual Authentication phase), an asymmetric cryptography session can be started to exchange the session key (cf. the Session Authentication phase). Finally, the session between the involved parties is encrypted using symmetric cryptography (cf. the Session Encryption phase).
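The Session Authentication and Session Encryption phases described above can be illustrated with a minimal Python sketch. It is not the prototype's code: the Diffie-Hellman group, the use of SHA-256 to derive SK and to compute Op Hash, and the names dh_keypair, session_key, Vas and op_hash are all assumptions made for illustration. The public values exchanged here play a role comparable to CTRL A and CTRL B. Symmetric encryption with SK is left out because Python's standard library does not ship a cipher.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group, for illustration only (assumption): 2**127 - 1 is
# a Mersenne prime, but it is far too small for real-world use.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return a (private, public) Diffie-Hellman pair for one party."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def session_key(own_private, peer_public):
    """Derive a 32-byte session key SK from the shared secret.

    Hashing the shared secret with SHA-256 is an assumed derivation step.
    """
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class Vas:
    """Hypothetical VAS side that rejects a nonce it has already seen."""

    def __init__(self):
        self.seen_nonces = set()

    def accept(self, nonce):
        if nonce in self.seen_nonces:
            return False  # replayed interaction
        self.seen_nonces.add(nonce)
        return True


def op_hash(operation_data):
    """Digest of the operation data sent to the timestamping service."""
    return hashlib.sha256(operation_data).hexdigest()


# Application and VAS each generate a pair and exchange the public values.
app_private, app_public = dh_keypair()
vas_private, vas_public = dh_keypair()
sk_app = session_key(app_private, vas_public)
sk_vas = session_key(vas_private, app_public)
print(sk_app == sk_vas)  # both sides hold the same SK
```

Both parties compute the same SK without ever sending it, which is the property the Session Authentication phase relies on; the nonce check shows how a repeated interaction can be told apart from a fresh one.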
The switch from asymmetric to symmetric cryptography is motivated by the performance overhead of asymmetric cryptography: using symmetric cryptography for the session improves the performance of the whole framework by reducing the effort spent on encryption and decryption operations.

6.3.4 On a real device

The framework has been tested on an Android HTC Magic device. The device was equipped with the Android 1.5 OS, a 3.2 megapixel camera, an integrated GPS antenna and IEEE 802.11 b/g Wi-Fi. Using the Android ADB tool, different .apk files created with the Eclipse IDE were installed on the HTC Magic. The testing phase highlighted a slower response of the Applications due to security operations and to inter-process communication via AIDL interfaces and parcelable classes. We executed some performance tests using our prototype. We aimed at measuring the computational time overhead introduced by the use of SAVED, and thus we measured the time needed to execute the security functions. In particular, we considered the overhead related to each of the phases described in Section 6.3.2. Table 6.2 shows the time overhead introduced by SAVED.

Table 6.2: Time overhead for the framework phases.

Phase                          Time (ms)
1. Mutual Authentication*      1197
1. Mutual Authentication       446
2. Session Authentication      257
3. Session Encryption          795
Total Framework Overhead       1498

The first row of the table refers to the first execution of the Mutual Authentication phase, while the second row refers to subsequent executions. In the first case the extra time is due to the need to update the keystore with the new digital certificates; this delay is paid only once. The total framework overhead amounts to about 1.5 seconds, which preserves usability in real use cases.
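As a quick check of the figures reported in Table 6.2, the short Python sketch below recomputes the total overhead from the per-phase timings. The total for the first execution is derived here from the table rows and is not a figure stated in the text; the constant and function names are illustrative.

```python
# Per-phase timings in milliseconds, as reported in Table 6.2.
MUTUAL_AUTH_FIRST_MS = 1197  # first execution (keystore update included)
MUTUAL_AUTH_MS = 446         # subsequent executions
SESSION_AUTH_MS = 257
SESSION_ENC_MS = 795


def total_overhead_ms(first_run=False):
    """Sum the three phases, using the slower first-run value if requested."""
    mutual = MUTUAL_AUTH_FIRST_MS if first_run else MUTUAL_AUTH_MS
    return mutual + SESSION_AUTH_MS + SESSION_ENC_MS


print(total_overhead_ms())                # 1498, the total given in Table 6.2
print(total_overhead_ms(first_run=True))  # 2249, derived for the first execution
```

The reported total of 1498 ms therefore corresponds to the steady-state case, in which the keystore already contains the partner's certificate.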
People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people - and that social norm is just something that has evolved over time.
Mark Zuckerberg

7 Value added services on backup data

Introduction

In this chapter we present some possible use cases where our backup approach can improve the interaction among people. In the first part of the chapter we show a system that allows users to share part of their backup data with some selected contacts; a shared backup can ease communication within an enterprise environment, among friends or university colleagues and, with some constraints such as geographic location and time, in other situations such as meetings or conferences. Some results of the experimentation with the proposed shared backup have recently been presented at the 4th IFIP International Conference on New Technologies, Mobility and Security [28]. In the second part we show a methodology to extract a social network from backup data. The proposed methodology helps to build the network by extracting connections from backups, and supports searches on the web for information that is publicly available and can be found using standard search engines. The social network can be useful for several purposes; for example, an enterprise can use it to place people who are “friends” outside the office in the same workgroups, which may improve productivity by avoiding conflicts between collaborators. Such an approach exploits dynamics already present within groups of people [87]. Some ideas in the second part of this chapter were proposed in Dellutri's PhD thesis [31]. An extension of Dellutri's thesis, showing some results of this chapter, has been published in the First IEEE International Workshop on Information Forensics and Security [32].
7.1 Sharing backup data with closed groups

The common interface used for backup, introduced in Chapter 3, can also serve as a base for offering sharing services to the users of the system. Usually, in closed groups of people, users share, often without noticing, each other's contacts, parts of their calendars, work files or even, in personal interactions, pictures and videos. The general idea is to give users the possibility of sharing part of their backup, as a common synchronization interface, with selected contacts or groups of contacts of their choice in their personal or business network.

7.1.1 Social backup in business environment

In an enterprise where people collaborate daily, it can be important for employees to share commonly useful information, e.g., calendars, part of the address book, or templates for presentations and documents. Moreover, if a new employee joins the team, his/her contacts are added to the common address book and shared with selected users of the new team; his/her new business device is added to a specific closed group, and all data, up to date with the latest changes, are taken from the shared backup and saved on it. If somebody's device is lost or stolen, or the employee leaves the company, the group administrator can disconnect it from the social backup, and the privacy of the group members is preserved. Using our approach, all these updates are directly exchanged and notified on the employees' smartphones.

Figure 7.1: Use case of meeting backup and share.

7.1.2 Sharing conference data

With some restrictions (i.e., time and location), our approach can be useful in particular kinds of events, such as meetings, conventions or conferences. In these events the interest in some information (e.g., organizer contacts, event schedule) is temporary; participants are interested in such information only for the time they are at the meeting location.
The organizing committee can inform participants by sharing documents and other related information only within the event area and for the duration of the event. In this way participants have just the information they need directly on their mobile devices, e.g., the conference schedule is shared via the calendar, the venue address via maps, and the committee contacts via the address book; this spares attendees a plethora of useless data and spares the committee many repetitive requests for information.

Figure 7.2: Android Backup and Restore client, panels (a), (b), (c).

7.1.3 Shared backup for smartphone

To allow users to share part of their backup with closed groups, we deployed a set of REST web services (see Appendix C for the services implemented) on the backup server described in Section 5.4. In the control layer we implemented the business logic that handles the sharing services; using these REST APIs, users can manage their groups and shares from the client, allowing or denying other users access to a resource. Before granting access to any content, the server checks the owner's settings to verify whether the user can access that resource. The proposed sharing approach can be generalized, under some conditions, to an open community whose members are willing to share their data. In small groups, where all participants know each other directly, information can be shared freely; such a system does not introduce privacy or security problems. If someone wants to share a piece of information with a friend, the system just eases the task by keeping the information up to date on both “sharers'” devices. In an open community, the information to be shared must be authorized by its owner. In that case, when a user tries to share a piece of information, the information must be verified, for example through a code sent to the shared information itself.

Figure 7.3: Android Backup and Restore client, panels (a), (b), (c), (d).
For example, if a user wants to share a mobile phone number, the system sends a text message to that number with a verification code, which must be entered into the system to prove ownership of the information. We equipped the Android backup client (see Section 4.2) to access the web services deployed on the server.

7.1.4 Running the application

We ran our application on real devices. Figure 7.2 and Figure 7.3 show some snapshots of the Android client's GUI that explain how the system works; the Symbian client and the server-side implementations are omitted. Figure 7.2(a) illustrates the backup setup features, where the user can choose which data to back up and the type (full or incremental) of the backup to be performed. Figure 7.2(b) shows the interface where the user can select a backup to be restored; note that this backup may have been performed on another device. Figure 7.2(c) depicts a granular view of a backup: in this interface the user can choose to restore just a part of the backup, or to keep only part of his/her data updated. Figure 7.3 illustrates how to share information based on geographic coordinates. In Figure 7.3(a) the user is presented with the actions used to manage the groups with which he/she shares information; Figure 7.3(b) shows all the possible actions that can be performed by the client. If the user chooses to share data, Figure 7.3(c) is presented and he/she can share data with his/her friends or with the groups he/she belongs to. Figures 7.3(c) and (d) show the interfaces for sharing data geographically: by tapping on the map, the user specifies the area where a resource is visible and shares a resource from a backup. When another user of the group enters the area in which the shared information resides, that user is notified and the information is made available.
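The geographic and temporal restrictions on a shared resource can be sketched as a simple access check. The following Python fragment is only an illustration under stated assumptions: the visibility area is taken to be a circle around the tapped point, the time window is expressed in epoch seconds, and ShareRule, distance_m and can_access are hypothetical names that do not come from the implemented REST services.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class ShareRule:
    """Hypothetical sharing rule attached to one backup resource."""
    group: str         # closed group allowed to see the resource
    lat: float         # centre of the visibility area, in degrees
    lon: float
    radius_m: float    # assumed circular area around the centre
    not_before: float  # start of the validity window, epoch seconds
    not_after: float   # end of the validity window, epoch seconds


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def can_access(rule, user_groups, lat, lon, now):
    """True only if the user is in the group, in the area and in the window."""
    in_group = rule.group in user_groups
    in_area = distance_m(rule.lat, rule.lon, lat, lon) <= rule.radius_m
    in_time = rule.not_before <= now <= rule.not_after
    return in_group and in_area and in_time


# A resource visible within 500 m of the venue while the event is running.
rule = ShareRule("conference", 0.0, 0.0, 500.0, 100.0, 200.0)
print(can_access(rule, {"conference"}, 0.0, 0.001, 150.0))  # about 111 m away
```

A check of this kind would run on the server before a resource is returned, alongside the verification of the owner's sharing settings.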
7.2 Extracting social network

In this section we propose an approach for obtaining information about the social network of an individual by complementing the information provided by his/her (smart)phone with the data publicly available on the net. Our approach is based on a profile graph, whose nodes are the people involved and whose (weighted) edges represent their mutual links. In a first phase, a preliminary version of the graph is built using all the information available in the backed up smartphone; later, the graph is refined by mining publicly available data from the Web. Finally, the graph is clustered to generate cliques of people. All the phases of the process described above are performed by an integrated and interactive software tool, which allows the user to rapidly recover the social network of a smartphone's owner. Merging the information coming from the Web with the information stored on the mobile device yields “clearer” results, avoiding homonymy problems and improving the precision of the link weighting.

7.2.1 Introduction

The ever-increasing spread of (Web) social networks, such as Facebook, LinkedIn, Flickr, Twitter, MySpace, etc., makes an invaluable amount of personal data publicly available, but it is often difficult, if not impossible, to distinguish real friends from Web ones if the WWW is our only source. However, things change considerably if we can access the data on an individual's mobile device: the two sources, the phone and the Web, together provide a precise picture of his/her social network. In some cases, if the smartphone is used to access social networks, and therefore stores the corresponding passwords, the picture provided can be really sharp. The generated social network can be used to profile users.
User profiles can become a key element in the creation of workgroups; productivity should improve when people already know each other and share interests outside work. Besides group generation in an enterprise, user profiles might prove helpful in other fields, including marketing, the bootstrapping of new social networking services, and Customer Relationship Management. The added value of this approach is that intersecting the information provided by the smartphone with the information freely available on the Web makes it possible to: i) effectively filter the often too many Web contacts; ii) discover the mutual relations between the phone contacts; iii) reduce ambiguities (e.g., Facebook friends you do not really know are either filtered out, or the weight of their connection in the social network graph is very low compared with that of real friends who appear in both the smartphone and the Web data); iv) provide a “closeness” score. This approach, aimed at performing Mobile Identity Profiling (MIP), i.e., reconstructing a user's profile by combining the analysis of the smartphone's data with social relationship data found on the Web, is split into three stages:

1. the Smartphone Data Analysis (SDA) (Section 7.2.3);
2. the Web Data Analysis (WDA) (Section 7.2.4);
3. the Clustering Analysis (CA) (Section 7.2.5).

The goal of the process is to build the social network of the smartphone's owner, namely the profile graph, and to find all the sub-graphs (clusters) that represent the social groups within the graph. The purpose of this section is to give the reader an idea of the effectiveness of our approach. We discuss how the process is performed, using an example to lead the reader through all the stages.

7.2.2 Related work

To the best of our knowledge, our approach of combining three different techniques in order to reconstruct an individual's social network is novel.
In this section we briefly discuss related work on the three distinct processes. The leitmotif connecting these processes is the concept of identity; throughout this section, by identity we mean “that part of the self by which we are known to others” [88]. A remarkable work on identity construction in social networks is that of Zhao et al. [89], where the authors study identity construction on Facebook (http://www.facebook.com). The first phase of the approach described in this section, Smartphone Data Analysis, is based on previous work in which we extracted information residing in mobile devices [33], [34] and analyzed it to trace the smartphone user's activity for forensic purposes. Focusing on Web Data Analysis, interesting results are presented by Mika et al. in [90], [91]; the authors, dealing with the problem of “bootstrapping” a Friend-Of-A-Friend (FOAF) based social network, proposed “the traditional Web as source of information about the social networks in a community”. They introduced a system for collecting social network data which fetches data from the traditional Web by mining the index of Google. Since social networks have spread, many “common” users have put themselves on the Web and, in particular, have entered information about who their friends are. We are able to extend Mika's experiment to common users thanks to the part of social network data that is publicly available on the Web and periodically crawled by search engines. Dealing with Clustering Analysis, we focused on the identification of locally dense subgraphs that are sparsely inter-connected, also known as the paradigm of intra-cluster density versus inter-cluster sparsity (see [92], which provides an excellent overview of graph clustering). In Section 7.2.5 we provide details about the algorithms used.

7.2.3 Smartphone Data Analysis (SDA)

Smartphone Data Analysis aims to decode the content of a smartphone and analyse it, generating a graph that represents the interactions between the user and his/her mobile device contacts. The decoding phase aims at generating parsers that export data in XML format, or that can be integrated directly into the analysis application. The decoded data (contact list, SMS list, event log and calendar entry list) can hardly be analysed manually by a human operator, because she would have to correlate their unique identifiers in order to reconstruct situations, conversations and relationships between a device's owner and her contacts. The Smartphone Data Analysis is composed of four sub-phases¹:

• The File Analysis, which analyses the files contained in the device filesystem, organizing them by MIME type, and runs the decoding tool over personal data files.

• The Contact Analysis, which merges duplicate contact information and highlights the contacts that may be a potential source of noise for the subsequent Web analysis.

• The Event Analysis, which mines the phone's log to reconstruct the user's activity. Events generated by a mobile device always belong to the following macro-classes: voice calls, data calls, SMS/MMS sent or received, SIM change, SD change. Voice call and SMS/MMS logs are useful for reconstructing the phone owner's social activity, and are used to determine the strength of the relation between the owner and each contact.

• The Messages Analysis, which completes the event analysis by extending it to all the SMS/MMS that have been deleted from the event log but could still persist in the saved SMS/MMS list.

¹ In this section we do not deal with the calendar analysis, because it is not directly correlated with the social network discovery.

Figure 7.4: The graph representation of contacts (a) and their relationships with the phone's owner (b), which are revealed by the number of calls and the number of SMS/MMS. Panel (c) shows the graph after the execution of SESORR; the edges represent the relationships extracted from the Web (web-edges).

After these analysis sub-phases have been completed, the profile graph is built and the information collected is organized and stored inside it. This data structure allows us to represent the social network given by the phone interactions between the owner and the contacts. The generated graph is an undirected² graph; the weight of each edge connecting two vertices represents the strength of the connection between the user (central vertex) and a contact (other vertex). In our graph representation the phone owner is at the center of a circle composed of the contacts we found in her smartphone (Figure 7.4a). After the SDA, the graph is augmented with edges between the phone's owner and the contacts with whom she has communicated (via SMS/MMS or call). The weight of these links is computed simply as the sum of the number of calls (made or received) and the number of SMS/MMS (sent or received) between the owner and the contact. The value of this sum is used to compute the edge length, in order to place the most frequently contacted people closer to the owner (Figure 7.4b).

² An undirected graph is a graph in which the vertices are connected by undirected edges. An undirected edge is an edge that has no orientation.

7.2.4 Web Data Analysis (WDA)

The goal of the Web Data Analysis component is to find the social network between the phone owner and her contacts, and among the contacts themselves, by retrieving people's public information from the World Wide Web. As mentioned before, we follow the approach of Mika et al. [90] to retrieve relationships from search engine records. In this section we detail the relationship-retrieving algorithm and the techniques used to estimate the weight of the Web edges.
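Before turning to the Web analysis, the phone-based edge weighting of Section 7.2.3 can be sketched in a few lines of Python. The weight follows the rule stated there (calls plus SMS/MMS exchanged with a contact); the event tuple format and the inverse relation between weight and edge length are assumptions made only for this illustration, since the text states only that frequently contacted people are drawn closer to the owner.

```python
from collections import Counter

# Event kinds that count towards the strength of a relationship.
SOCIAL_EVENTS = {"call", "sms", "mms"}


def sda_edge_weights(events):
    """Count calls and SMS/MMS per contact.

    `events` is an iterable of (contact, kind) pairs; this log format is an
    assumption. Other event kinds (e.g. SIM change) are ignored.
    """
    weights = Counter()
    for contact, kind in events:
        if kind in SOCIAL_EVENTS:
            weights[contact] += 1
    return dict(weights)


def edge_lengths(weights, base=1.0):
    """Map each weight to a drawing length, shorter for stronger links.

    The inverse proportionality is an assumed choice for illustration.
    """
    return {contact: base / weight for contact, weight in weights.items()}


log = [
    ("alice", "call"), ("alice", "sms"), ("alice", "sms"),
    ("bob", "call"), ("owner", "sim_change"),
]
weights = sda_edge_weights(log)
print(weights)                # {'alice': 3, 'bob': 1}
print(edge_lengths(weights))  # alice is drawn closer to the owner than bob
```

With these weights in place, the Web Data Analysis described next adds edges among the contacts themselves.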
SESORR Algorithm

In order to reconstruct relationships among a phone’s contacts, we used the huge amount of data collected by search engines over the years to obtain relational network data. Our approach is to submit all possible pairs of contacts, each identified by name and surname, to the search engine and to retrieve the results, i.e., the pages where the two pairs $\langle \mathit{name}, \mathit{surname} \rangle_i$ and $\langle \mathit{name}, \mathit{surname} \rangle_j$ occur simultaneously. For each pair of contacts we count the number of pages found (hits) and, for each page, we save the title and the short description returned by the search engine. Moreover, the non-stopwords³ contained in titles and descriptions are counted for further analysis. To accomplish this task, we designed the SESORR (Search Engine SOcial Relationships Retrieving) algorithm.

³ In a natural language, stopwords are function words or connectives such as articles and prepositions that do not provide useful information for our scope.

As a preliminary examination, SESORR submits the query

\[ \langle \mathit{name}, \mathit{surname} \rangle \lor \langle \mathit{surname}, \mathit{name} \rangle \]

for each contact and stores the results in the node data structures of G. In this way it is able to discard from subsequent queries the contacts which are not present on the Web (i.e., those for which the query returned a result set $R = \emptyset$). Finally, for each pair of contacts i, j, SESORR submits the following query:

\[ (\langle \mathit{name}_i, \mathit{surname}_i \rangle \lor \langle \mathit{surname}_i, \mathit{name}_i \rangle) \land (\langle \mathit{name}_j, \mathit{surname}_j \rangle \lor \langle \mathit{surname}_j, \mathit{name}_j \rangle) \]

and stores the results. Name and surname pairs are sent to the search engine enclosed within quotation marks: in this way the search engine is forced to retrieve only pages in which the search terms appear adjacent to each other. The software which implements SESORR is able to query both Google and Yahoo.

After the SESORR execution, the profile graph is enriched by Web edges between the owner and her contacts, and among contacts. An example is reported in Figure 7.4c. For each Web edge, SESORR merges the titles and the descriptions of each result set entry into a single string.
It then computes the frequency of occurrence of each non-stopword. These keywords and their frequencies are stored in the Web edge and are displayed to the operator when she clicks on the edge. Given a Web edge, the list of keywords and their frequencies provides a kind of “semantic vision” of the relationship, and the user is able to grasp the meaning of the relationship at a glance.

Figure 7.5: Frequency distribution of URLs (domains) providing relationships.

Moreover, besides title and description, SESORR stores each URL in the result set. By calculating the frequency with which each URL occurs over all relationships in a single profile, SESORR also provides a frequency distribution of the domains related to the profile and its contacts (see Figure 7.5).

Web-edge weight estimation

In order to measure the weight of a Web edge e, i.e., how similar the two contacts u and v connected by the edge are, we define a function σ(e) ∈ [0, 1] which measures the similarity between the individuals u and v. In the semantic Web area, the similarity between two classes is assessed by observing the number of instances that these classes share, their individual number of instances, and the total number of instances they contain. The most frequently used metrics are the following:

Jaccard index [93]: between two sets X and Y, it is defined as the ratio between the size of the intersection and the size of the union of the two sets being compared:

\[ \sigma(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \]

Normalized Google Distance (NGD) [94]: it takes advantage of the number of hits returned by Google to compute the semantic distance between concepts.
Given two search terms x and y, the normalised Google distance between x and y, NGD(x, y), can be obtained as follows:

\[ \mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}} \]

where f(x) is the number of Google hits for the search term x, f(y) is the number of Google hits for the search term y, f(x, y) is the number of Google hits for the tuple of search terms xy, and M is the number of Web pages indexed by Google (approximately ten billion pages).

In our preliminary experiments, we measured the Pearson correlation between the Jaccard index and the NGD; the results were approximately in the range 0.3 to 0.4, thus exhibiting a small-to-medium correlation. The software allows the user either to choose between the metrics, or to combine them by providing relative weights.

7.2.5 Clustering Analysis (CA)

At the final analysis stage, we want to identify subgroups of contacts sharing similarities. Generally speaking, the goal of clustering is to group together similar elements and thereby to identify the skeleton structure of the input data.

Figure 7.6: Contact-to-cluster assignment.

In this section we have employed clustering techniques to split the phone owner’s social graph into small subgraphs (clusters). We chose spectral algorithms, i.e., algorithms based on spectral properties of the matrices associated with the input graph, because i) they are general and versatile, and ii) they have proved effective in identifying locally dense subgraphs that are sparsely inter-connected. In particular, we used the Spectral [95] and FullSVD [96] algorithms; both are based on the Singular Value Decomposition (SVD) performed on the adjacency matrix of the input graph.

Spectral. This algorithm, introduced in [95], where it was called Spectral Algorithm I, is essentially a projection onto the first k right singular vectors.
The intuition of this technique is that the matrix A describes the location of m points in an n-dimensional space. The projection onto the subspace defined by the top k right singular vectors gives the best rank-k approximation of A.

FullSVD. Drineas et al. studied in [96] k-means and its continuous version. While the discrete version is known to be NP-hard, the latter can be solved efficiently using a projection onto the top k left singular vectors. Similar to the Spectral algorithm, the cluster assignment is a discretization of the continuous solution. We refer to this method as FullSVD in order to avoid ambiguities with the SVD computation which is the core of all these algorithms.

Both algorithms output a matrix C whose rows are indexed by the nodes (contacts) and whose columns are indexed by the clusters. Intuitively, each cell of the matrix represents the weight of the “closeness” between a contact and a cluster. We assign each node to the cluster whose entry in the node’s row has the maximum absolute value. Figure 7.6 shows a screenshot with the details of the C matrix and, for each contact, the chosen cluster.

It is important to mention that, in both the clustering algorithms used, k, i.e., the number of clusters, is an input parameter; the quality of the results heavily depends on a good choice of its value. In the literature there are several measures to assess the quality of a cluster [92], and our tool can measure many of them, thus providing feedback to the operator.
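The contact-to-cluster assignment rule described above (pick, for each row of C, the column with the largest absolute value) can be sketched in a few lines. The sketch assumes that C is already available as a list of rows, one per contact, with one entry per cluster; the function names and the sample values are hypothetical, and ties are resolved in favour of the lowest cluster index, a choice the text does not specify.

```python
def assign_clusters(C):
    """For each row (contact) of C, return the index of the column (cluster)
    holding the entry with the largest absolute value."""
    assignment = []
    for row in C:
        # max() returns the first maximal index, so ties go to the lowest cluster.
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        assignment.append(best)
    return assignment


def group_by_cluster(assignment):
    """Invert the assignment: {cluster index: [contact indices]}."""
    groups = {}
    for contact, cluster in enumerate(assignment):
        groups.setdefault(cluster, []).append(contact)
    return groups


# Hypothetical 3-contact, 3-cluster matrix.
C = [
    [0.10, -0.90, 0.20],
    [0.70, 0.10, -0.30],
    [-0.20, 0.10, 0.05],
]
assignment = assign_clusters(C)
print(assignment)
print(group_by_cluster(assignment))
```

Using the absolute value matters because the entries of singular vectors can be negative: in the first row, the entry -0.90 wins over the positive entries.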
Figure 7.7 reports some snapshots depicting, for each algorithm and for each Web edge weight metric, the behaviour of the clustering quality indexes as k varies.

Figure 7.7: Clustering metrics trends. The profile graph used in the example has 218 contacts and 1242 Web edges; the black vertical line marks k = 10, the value chosen for the input parameter k. [Six panels: spectral (unweighted), spectral (Jaccard), spectral (Google Similarity), SVD (unweighted), SVD (Jaccard), SVD (Google Similarity). Horizontal axis: k, from 0 to 70; vertical axis: from 0.0 to 0.8. Plotted series: coverage, performance, inter-cluster conductance, intra-cluster conductance.]

Figure 7.8: The final result of the whole process: the social network clusters. [Cluster labels: University colleagues, Musician friends, Friends (from Facebook), Work colleagues, Family members.]

7.2.6 The Final Result: The Social Network

After the three stages described in the previous sections, the tool is able to produce a graphical view of the clusters, shown in Figure 7.8. Here, for each cluster, the phone’s owner is represented by a black node.

It is interesting to notice that, by looking at each cluster’s Web edge keywords, we have been able to infer the area of interest shared by the individuals in each cluster. Furthermore, the graph structure in each cluster may provide an intuition about the mutual relationships of the people involved. For example, from the “work colleagues cluster”, which is far from being a complete graph, it is possible to see “who works with whom”; even more interesting, inside the “musician friends cluster” we see a complete subgraph, made of five nodes, that corresponds to the members of a rock band, and only one of them actually plays with the phone’s owner (together with the other people/nodes shown in the cluster).

It is important to emphasize that all the above information, together with everything shown in Figure 7.8, has been obtained from a smartphone and our tool, which mines the Web data, with no additional information available.

7.3 Conclusions

Our profiling method relies on information stored in the smartphone and its precision depends on the quantity and quality of such data.
Since the method we used to find a person on the Web, and her relationships with other phone contacts, relies on her first name and last name, precision strongly depends on the care taken by the owner when entering each first name and last name. Sometimes only the first name or a nickname of a contact is entered (e.g., for the most intimate contacts); submitting such a weak identity to a search engine produces no results or useless ones. To deal with this aspect, the framework pre-processes all contact entries (e.g., it highlights entries with a missing name or surname) and suggests that the operator identify the contacts (where possible) and enter their correct names and surnames.

An obvious limitation concerns the time frame that we can reconstruct. The event log stored in the device is limited to a fixed size, which restricts the view of user activity to the most recent events. The size limit on stored sent and received SMS/MMS, if set, will also limit precision. We were not able to perform name-surname/number matching, and we can only access the log of the operations performed by the user in the last period, as we have no access to the mobile company’s customer data. Even with this limited amount of information the results look interesting, as we were able to generate clusters mapping real-life relationships between the user and her friends.

8 Conclusions and Future Work

Conclusions

In this thesis we proposed a fully integrated solution to the backup and synchronization problem in mobile environments. We propose to focus the management of the information on how the information is logically structured (e.g., a contact will have name, surname, phone numbers, ...). Our solution delegates to backup client applications installed on mobile devices the task of extracting data from the internal data stores of the device and sending these data in a common format to a remote server.
The approach proposed is based on three main parts:

The first part of the system is a client application installed on the mobile device. The client extracts the information from the internal storage of the device and sends it to the backup server using an extensible common format (i.e., XML). Moreover, the client is responsible for retrieving information from the server and restoring it to the device. Client applications have been implemented differently depending on the availability of APIs on the specific platform. On Android and newer Symbian devices we have been able to access the data stores of the backed-up device via standard APIs. Moreover, on Android, under some conditions (rooted device), it has been possible to access even application settings. Unfortunately, accessing data directly in the data stores (in particular with write permissions) was not possible for the client applications implemented for older versions of Symbian OS and for Microsoft Windows Mobile 5 and 6. These applications back up the full system, and the backup is later analyzed on the backup server. All the applications developed interact with the server using the proposed common format, to grant interoperability between vendors and operating systems.

The second part of the system is the backup server. The server receives data from the client application, handles these data and stores the information in a common database. In case of restore, the backup server gives clients access to the internally stored information; access is given while granting privacy and security to users. We implemented the backup server to be as standard, scalable and extensible as possible. The basic idea is that our backup server should use a standard communication protocol that can be exploited by every class of device.
Experimental results have shown that mobile clients running on different architectures/operating systems can interact with the proposed server via HTTP/HTTPS, accessing all the features provided. The backup server has also been enabled to extract personal data from old Symbian raw backups. We proposed a methodology to reverse engineer the data stores where personal data are contained and to implement the parsers that extract these data.

The third component of the system is the set of services on backup data. These services can be provided by the same provider of the backup, or by other authorized service providers. In this thesis we have proposed two kinds of services: one focused on end users and another on businesses and administrators of the system. The first service gives users the capability of sharing part of the data in their personal backup with some selected contacts of their choice. The second service implements a social network extractor which, starting from backup data and data publicly available on the Web, generates a social network and the cliques of contacts within the backup; this is done by clustering the various groups of interconnected contacts.

The approach proposed has been considered by Telecom Italia for use in the cubovision project to implement the set-top box backup operations. Part of the information stored by the user in his/her set-top box device is saved on a remote backup server. The set-top box device mainly contains video/audio files, but there are some other contents, such as applications installed and configured by the user, that are backed up using some ideas presented in this thesis.

Future work

Currently we are implementing an Apple iPhone and a RIM BlackBerry client application able to interact with the system, implementing all the features of the Android application. Moreover, we are extending the Android application to improve usability and performance.

We are also participating in the Ericsson Application Awards 2011¹ with the shared backup idea and with the improved Android application equipped with augmented reality.

The social network extraction tool described in Section 7.2 is being improved to generate the social network of the whole backup system; this, when used by a sufficient number of persons, will solve the name/surname problem. In fact, the mobile phone number can be considered a unique identifier and allows the system to disambiguate homonyms or merge contacts named differently in different backups. The merge could be done considering sets of names and surnames (data sets containing common names and surnames can be found on the Web); matching the name and surname fields of a contact with commonly used ones will help to ignore nicknames and will make the system more precise.

New services on backup data can be provided to users and to administrators, such as integration with other systems like interconnected TVs, set-top boxes, laptops and tablet devices.

We designed the server part of the system to be highly extensible; a large number of services using backup data can be provided. This opens the project to a plethora of novel ideas; in this thesis we described two use cases just to show how such an extensible backup system can be exploited by service providers.
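The idea of using the phone number as a unique identifier across backups can be illustrated with a small sketch. Everything in it is hypothetical: the layout of the backups (a mapping from owner to a list of `(name, number)` pairs), the function names, and the normalization rule, which assumes Italian numbers (default prefix `+39`) purely for the sake of the example. It is not part of the implemented system.

```python
from collections import defaultdict


def normalize_number(raw, default_prefix="+39"):
    """Reduce a phone number to the form +<digits>.
    ASSUMPTION: numbers without an international prefix belong to default_prefix."""
    raw = raw.strip()
    digits = "".join(ch for ch in raw if ch.isdigit())
    if raw.startswith("+"):
        return "+" + digits
    if raw.startswith("00"):
        return "+" + digits[2:]
    return default_prefix + digits


def merge_by_number(backups):
    """Group the names used in different backups under the same phone number."""
    merged = defaultdict(set)
    for owner, contacts in backups.items():
        for name, number in contacts:
            merged[normalize_number(number)].add(name)
    return {number: sorted(names) for number, names in merged.items()}


# Hypothetical backups of two users.
backups = {
    "user1": [("Mum", "+39 349 0000001"), ("Mario Rossi", "3490000002")],
    "user2": [("Anna Bianchi", "0039 349 0000001"), ("Mario Rossi", "+393490000003")],
}
for number, names in merge_by_number(backups).items():
    print(number, names)
```

In this example the nickname "Mum" and the full name "Anna Bianchi" end up under the same number, while the two homonymous "Mario Rossi" entries stay separate because their numbers differ.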
¹ http://www.ericssonapplicationawards.com/

A The Symbian S60 format

A.1 Address book

Type flag   Meaning             Type flag   Meaning
04024008    fax                 14020001    H fax
2402C003    job                 04140280    home
24028004    mobile (work)       1C02C00C    nickname
0402000D    video               1402400D    video (home)
0C02C00D    wv user ID          0402C008    url
04024009    po.box              04028009    extension
0402400A    city                0402800A    state
14024002    extension (home)    14028002    street (home)
14024003    state (home)        14028003    country (home)
24024006    street (work)       24028006    postal code (work)
24024007    country (work)      3C02000B    DTMF

Type flag   Meaning             Type flag   Meaning
2402C004    W fax               04028007    general
0402C007    mobile              1402C000    mobile (home)
04028008    ?                   04020008    pager
2402800D    video (work)        24020004    work
14028001    url (home)          24024005    url (work)
0402C009    street              0402000A    postal code
0402C00A    country             14020002    po.box (home)
1402C002    postal code (home)  14020003    city (home)
2402C005    po.box (work)       24020006    extension (work)
2402C006    city (work)         24020007    state (work)
3402800B    note

Table A.1: Possible values for the rows of table “DATA TYPE TABLE”. They describe the type of attributes present in the “DATA BLOCK”. (Symbian S60 v2)

Contacts and their data are stored in the Contacts.cdb file (located under C:\System\Data). During the methodology iterations, we found that contact data were fragmented and spread across the entire file. In fact, after a contact update, Symbian preserves the old contact entry and appends a new one at the end of the file with the same ID but fresh data. When the system performs a DB compression, obsolete entries are purged. After a first analysis, we found that the data could be grouped in three macro-areas (parts, see Table A.2). For each contact, the three parts are connected because each of them shares the same contact ID.
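Because updated contacts are appended with the same ID rather than overwritten, a parser has to keep only the most recent entry for each ID. The sketch below illustrates this rule under stated assumptions: the entries are assumed to be already extracted as `(contact_id, payload)` pairs in file order, the function names are hypothetical, and the 4-byte width of the identifier is inferred from example values such as `10 00 00 00`; Table A.2 states only that identifiers are stored little-endian.

```python
def decode_contact_id(raw):
    """Decode a contact identifier stored little-endian.
    ASSUMPTION: identifiers are 4 bytes wide (e.g. 10 00 00 00)."""
    if len(raw) != 4:
        raise ValueError("expected a 4-byte identifier")
    return int.from_bytes(raw, "little")


def latest_entries(entries):
    """Keep, for every contact ID, the last entry found in file order.
    Older entries with the same ID are the obsolete copies that a DB
    compression would purge."""
    latest = {}
    for contact_id, payload in entries:
        latest[contact_id] = payload  # later entries override earlier ones
    return latest


# Hypothetical sequence of entries as they would appear in the file.
entries = [
    (decode_contact_id(bytes.fromhex("10000000")), "old data"),
    (decode_contact_id(bytes.fromhex("11000000")), "other contact"),
    (decode_contact_id(bytes.fromhex("10000000")), "fresh data"),
]
print(latest_entries(entries))
```

The same ID can then be used to join the three parts of a contact described above.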
The first part stores metadata about each contact and a block containing attributes like phone/fax/mobile numbers, snail mail address and notes: D2 10 FF 20 30 64 76 76 04 00 1D 04 14 04 04 1A 20 33 64 00 09 30 30 31 12 12 00 00 00 13 30 66 32 9D 9D 00 1F 00 00 65 66 37 FA FA 00 10 31 61 36 0F 0F 00 00 02 02 00 00 00 C0 00 00 00 07 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF 32 39 20 E1 00 20 E1 00 00 00 33 38 38 37 36 35 34 32 33 VD ID CXF1 20 30 30 65 31 32 30 30 66 66 61 39 64 31 32 37 36 EDIT DATE CREATION DATE 04 00 00 00 00 00 00 00 1F TYPE TABLE LEN | TYPE TABLE | 04 00 00 00 00 DATA BLOCK LEN FIELD FLAG DATA BLOCK The second part stores contact’s name, surname and company: 140 A.2. CALENDAR 10 00 00 00 12 50 61 70 E0 20 43 65 6C 6C 09 13 00 10 ID NAME LENGHT NAME CXF1 The third part stores email addresses: 1C 32 00 10 24 6F 6C 00 00 03 00 00 00 64 6F 6D 65 6E 69 63 40 69 62 65 72 6F 2E 69 74 EMAIL LENGHT EMAIL ID EMAIL FLAG ID EMAIL ADDRESS LEN EMAIL ADDRESS | A.2 Calendar Calendar entries are stored in Calendar file (located under C:\System\Data). A calendar’s entry belongs to one of the following categories: anniversary, meeting or note. A sample of calendar’s entry, an anniversary without alarm, is shown below: 141 APPENDIX A. THE SYMBIAN S60 FORMAT 03 0F 0A 52 02 01 A4 05 AA 01 08 20 41 61 0E AA AA 00 00 00 00 00 00 00 00 28 52 03 28 A2 AC 00 00 01 00 00 00 6E 6E 69 76 65 72 73 72 79 41 6F 66 66 20 29 28 28 FT VS AF BL ID Flag1 CD (GG MM) 05 AD A2 AC Flag3 Flag2 TL Text | ET SD ED An example of anniversary with the alarm setted to on is shown here below: 142 A.2. 
CALENDAR 03 0F 1A 52 02 01 A4 05 AA 01 3C 43 72 09 08 1E 41 61 0E AA AA 00 00 00 00 00 00 00 00 28 5B 03 28 A2 AC 00 00 01 00 61 6D 01 00 6C 65 6E 41 6C 61 53 6F 75 6E 64 32 00 00 6E 6E 69 76 65 72 73 72 79 41 6F 6E 20 29 28 28 FT VS AF BL ID Flag1 CD (GG MM) 05 AD AT VS2 ANL ATXT | 09 01 00 Flag2 TL Text | ET SD ED An example of meeting with the alarm set to off is shown below: 143 APPENDIX A. THE SYMBIAN S60 FORMAT 00 0F 0A 50 02 01 A4 01 A6 A8 01 08 1A 4D 0E A6 A6 00 00 00 00 00 00 00 00 28 C6 02 28 28 00 00 00 00 00 00 65 20 28 28 65 74 41 6F 66 66 52 64 61 79 29 64 05 91 05 FT SF 00 00 00 AF 50 ID Flag1 CD (GGMM) RF ERD A8 28 01 00 00 00 00 Flag2 TL Text ET SD ED An example of meeting with the alarm set to on is shown below: 144 A.2. CALENDAR 00 0F 1A 50 02 01 A4 01 A5 A6 01 3C 43 72 05 08 18 4D 0E A5 A5 00 00 00 00 00 00 00 00 28 34 03 28 28 00 00 00 00 61 6D 00 00 6C 65 6E 41 6C 61 53 6F 75 6E 64 AF 00 00 65 20 28 28 65 74 41 6F 6E 52 64 61 79 29 EC 01 90 03 FT SF 00 00 00 AF 50 ID Flag1 CD (GGMM) RF ERD A6 28 01 00 00 00 00 RTN LEN RTN | 05 00 00 Flag2 TL Text ET SD ED Some meetings are saved in a different way, we call this kind of entries special meetings; here below is shown a special meeting with the alarm set to off and without repetition. 145 APPENDIX A. THE SYMBIAN S60 FORMAT 00 0F 08 50 02 01 A4 08 1C 53 66 0E A6 A6 00 00 00 00 00 28 00 00 00 00 7A 03 00 6D 66 20 28 28 65 52 29 64 91 65 74 41 6F 6F 66 66 05 05 FT SF 00 00 00 AF 50 ID Flag1 CD (GGMM) Flag2 TL Text | ET SD ED An example of special meeting with the alarm set to on and repetition set to off is shown below: 00 0F 18 50 02 01 A4 3C 43 72 05 08 1A 53 6E 0E A6 A6 146 00 00 00 00 00 00 00 00 28 9C 03 61 6D 00 00 6C 65 6E 41 6C 61 53 6F 75 6E 64 AF 00 00 4D 52 20 28 28 65 6F 29 94 C1 65 74 41 6F 66 66 02 02 FT SF 00 00 00 AF 50 ID Flag1 CD (GGMM) RTN LEN RTN | 05 00 00 Flag2 TL Text | ET SD ED A.3. EVENTS LOG Notes are stored in a similar format as special meeting. 
An example of note is shown below: 02 0F 08 52 02 01 A4 08 10 44 0E A5 A5 00 00 00 00 00 28 00 00 00 00 6F 03 00 61 79 4E 6F 74 65 20 29 28 28 FT VS AF BL ID Flag1 CD {ID2} Flag2 NL Text ET SD ED A.3 Events log Events and status changes are stored in Logdbu.dat file (located under C: \System\Data), and can belong to the following categories: sms , mms, voice and data calls, SIM changes. In the last case, the event is stored as an sms, so we will not examine it. Details about the fields are reported in Table A.4. An example of SMS is shown in the following: 147 APPENDIX A. THE SYMBIAN S60 FORMAT 03 BA FF B2 60 1C 52 6F 04 80 44 54 20 45 4E 65 1A 2B XX 18 00 00 00 00 00 A3 EB EB B6 2C E1 00 16 00 00 61 6D 6F 6E 61 20 4D 72 65 74 74 69 00 05 00 33 4F 49 55 4D 41 20 4D 20 53 45 2C 6F 41 49 43 20 6F 67 4E 4E 49 58 76 49 56 52 20 76 20 49 45 55 69 53 54 20 4E 61 45 4F 49 41 6D 52 20 4E 20 65 41 41 53 43 6E 33 39 XX XX XX XX XX XX XX XX XX 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 A4 A voice call example is shown below: 148 20 44 49 45 74 03 DATE FF NAME FLAG | 60 16 00 00 NAME LENGTH NAME 04 00 05 00 33 MESS LENGTH | | MESS | | | NUMBER LENGHT NUMBER | 18 DIRECTION 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 A4 A.3. EVENTS LOG 00 1F 01 70 63 1C 52 6F 01 00 00 67 1A 2B XX 20 17 66 4A EF C8 2C E1 00 16 00 00 61 6D 6F 6E 61 20 4D 72 65 74 74 69 00 00 00 02 36 33 XX 03 00 39 XX 00 00 XX XX XX XX XX XX XX 00 02 A6 A4 00 DATE 01 NAME FLAG 63 16 00 00 NAME LENGHT NAME | 01 DIRECTION CALL TIME 67 02 36 NUMBER LENGHT NUMBER | 20 03 00 00 02 A6 17 00 00 A4 An MMS example is reported below: 05 41 01 F0 00 0E 54 00 02 30 36 5A 47 86 12 05 2A E1 00 00 00 00 69 6D 20 6D 6D 73 00 07 00 00 00 00 08 34 31 38 2C 37 32 37 05 DATA 01 F0 00 00 00 00 0E PROVIDER 00 00 07 00 00 00 02 00 NS NUMBER 5A A data call, or data traffic log entry, may belong to two different categories which are related to the type of storage format used: mms-type and sms-type. 
149 APPENDIX A. THE SYMBIAN S60 FORMAT The text body of sms or mms is used to store a single packet content. An example of sms-type data call is reported below: 03 C1 FF A2 00 7E 61 7A 69 4D 69 0A 34 18 36 5F 21 08 2A E1 00 03 04 50 72 69 76 41 61 00 00 65 65 6F 61 49 72 00 33 72 20 20 72 4C 65 00 03 20 69 64 65 20 75 6C 65 20 65 74 20 76 41 20 69 73 69 6C 61 39 30 30 31 00 00 6C 65 20 69 73 69 72 61 63 73 7A 76 74 65 6F 7A 69 74 20 63 03 DATA FF A2 03 00 00 00 03 00 04 00 33 | | PACKET CONTENT | | | SEP MES NUM 34 39 30 30 31 FLAG END A.4 SMS SMS are stored in the first folder (assuming that the folders are ordered alphabetically) in /System/Mail folder. An example of received message is reported in the following table: 150 A.4. SMS 68 3C 00 ... 00 10 00 25 3A 00 0C 52 69 0E 20 29 34 00 10 45 01 00 00 01 00 00 00 00 01 00 FA 54 17 28 33 34 39 44 69 73 74 00 00 00 00 00 00 00 00 01 00 00 00 F1 5A 15 02 91 34 2B 33 39 38 35 38 15 00 81 28 33 34 39 10 68 3C 00 00 00 10 63 65 76 18 04 01 00 00 02 00 00 00 00 00 00 46 F2 29 E1 00 34 36 37 37 31 34 36 34 65 00 02 00 00 41 66 00 03 00 61 6E 6F 20 41 6C 65 02 00 00 F2 29 E1 00 33 32 30 35 35 30 30 34 36 37 37 31 34 36 68 3C 00 10 68 3C ... 00 10 00 00 00 00 flag1 Text ET Received Flag Mar 00 10 45 04 01 00 01 00 00 00 02 00 Received Flag 00 00 00 00 00 00 00 01 00 00 00 Date SNL Number NL Name 00 00 00 00 00 02 00 00 00 02 03 00 00 00 01 00 00 00 00 00 00 00 SCRD SCF SCNL SCN | ESNF ESNL ESN An example of sent message is reported in the following table: 151 APPENDIX A. THE SYMBIAN S60 FORMAT 68 3C 00 ... 00 10 00 25 3A 00 10 49 6E 76 0E 20 29 34 00 10 F5 00 00 00 00 00 02 03 01 00 00 00 00 00 0C 09 00 91 34 2B 33 39 30 30 30 04 91 34 2B 33 39 36 37 37 00 00 00 93 2D 152 10 68 3C 00 00 00 10 69 61 74 6F 18 02 01 00 00 00 00 00 00 00 00 01 00 22 F2 1C 32 E1 00 33 34 39 32 38 39 38 33 34 39 34 31 34 36 4B F2 29 E1 00 68 3C 00 10 68 3C ... 
00 10 00 00 00 00 flag1 TL Text ET Received Flag Mar 00 10 F5 02 01 00 00 00 00 00 00 00 Received Flag 00 02 03 00 00 00 01 00 00 00 01 00 00 00 Date flag2 UNL UN | RNF RNL RN | 00 00 SCRD A.4. SMS Field Name VD ID CXF1 Size (Bytes) Description Example D2 10 FF FF 76 E1 76 E1 1D 64 00 09 FF 12 00 12 00 00 02 00 02 00 00 Stores the size in bytes of DATA BLOCK 04 14 00 04 00 04 1A EDIT DATE (ED) 8 CREATION DATE (CD) 8 (Uncertain) Used by DB indexing Contact identifier (stored as little-endian) Flag. The first byte have to be equal to the last four. The second byte depends on Symbian version; values can be 09, 0B, 10. ”13 00 10” is constant. 17 bytes-offset from CXF1. Represents the number of microseconds from year zero. Represents the number of microseconds from year zero. TYPE TABLE LEN (T T L) TYPE TABLE (T T ) 1 41 bytes offset from CXF1. Stores the lenght (in bytes) of the TYPE TABLE DATA BLOCK LEN (DBL) FIELD FLAG (F F ) DATA BLOCK (DB) ID NAME LENGHT (N L) NAME SURNAME LENGHT (SL) SURNAME COMPANY NAME LENGHT (CN L) COMPANY NAME (CN ) CONTACT END EMAIL LENGHT (EL) EMAIL ID EMAIL FLAG ID EMAIL ADDRESS LENGHT (EF L) EMAIL ADDRESS 2 4 9 TTL 1 2 DBL−2F F −1 2 4 1 NL 2 Table of 12-bytes lenght rows, describing the types of the corrisponding data in the DATA BLOCK. The first 5 bytes are the table start flag. The last 5 bytes indicate the end table flag. The first 12-bytes row does not contain useful information. For further information about data types, see Table A.1. 00 00 13 00 10 FF FF 9D FA 0F 20 9D FA 0F 20 00 00 00 C0 00 00 00 00 00 07 00 00 04 00 00 00 00 00 00 00 00 00 Flag which is repeated as many times as the number of fields in DATA BLOCK. 20 00 Stores contact’s information, according to fields type described in TYPE TABLE. Each field is separated by 00. 33 33 38 38 37 36 35 34 32 33 Contact identifier (stored as little-endian) - The same as above. The size in nibbles of the name field. The contact’s name 1 The size in nibbles of the surname field. 
10 00 00 00 0E 43 6C 61 75 64 69 6F 08 SL 2 1 The contact’s surname The size in nibbles of the company field. 43 65 71 61 0C 1 The contact’s company 44 72 2E 77 68 79 4 Flag. Denotes the end of a contact’s details. The first byte depends on Symbian version (09, 0B, 10, as in CXF1 field). The other bytes are constant. 09 13 00 10 The size in nibbles of the email address block. The ID of email address Flag. Contact identifier (stored as little-endian) - The same as above. The size in nibbles of the email address string. 1C 32 00 00 00 03 10 00 00 00 24 The email address string. 6F 64 6F 6D 65 6E 69 63 40 6C 69 62 65 72 6F 2E 69 74 1 EL 2 4 4 1 EF L 2 Table A.2: This table lists all contact’s data which can be found in the Contacts.cdb. Since data are located in three logical file areas, the table is split in three parts. 153 APPENDIX A. THE SYMBIAN S60 FORMAT Field Name Size (Bytes) FT 1 VARIABLE SEQUENCE(V S) 4 ALARM FLAG (AF ) 1 BODY LENGTH (BL) ID Flag1 CREATION DATE (CD) DAY (GG) MONTH (M M ) REP FLAG (RF ) 1 4 3 4 2 2 1 REPEAT UNTIL 2 ANNIVER DATE (AD) 2 ALARM TIME (AT ) 2 VAR SEQ 5 AL NAME LEN (AN L) AL NAME (AN ) Flag2 TEXT LENGTH (T L) TEXT 1 AN L 4 3 1 TL − 1 2 END TEXT (ET ) START DATE (SD) 3 2 START DATE M (SDM ) END DATE (ED) 4 END DATE M (EDM ) 4 2 Description Example Indicates the event type: if 00 is a Meeting if 02 is a daynote if 03 is an anniversary. A four bytes variable secuence if the entry represents a Meeting and the first byte is 10 this indicates the meeting needs to be processed in a different way This byte indicates if the alarm is set to ON (1A for normal events 18 for Special Meetings) or OFF (0A for normal events 08 for Special Meetings). Represents the length of the in nebbles of the following part of the entry. Calendar entry identifier (stored as little-endian) Indicates a calendar entry in this area Stores the creation date of the Calendar entry, is composer by GG and MM. Represents the day part of the CD field. 
Represents the month part of the CD field. Appears only for meeting type entryes; indicates if the repetition of the meeting is daily (value 01), weekly (value 02), montly (value 03). Appears only for meeting type entryes; indicates the date until the event has to be repeated. Stores the date of the event, is an integer counting the number of days since 1-1-1980. This field appears only if the entry type is Anniversary. If the alarm is set to on stores the information about the alarm time, else is unused. For the day note this field does not appear. is a variable secuence, in case of Anniversary the first 3 bytes are 01 00 00 if the anniversary’s alarm is set to off the lasttwo are 01 00 may vary but their value is always lass than 32., Indicates the size in nibbles of the ALARM NAME field Stores a text field indicating the ringtone name for the alarm 00 is a flag characterizing a calendar event Indicates the size in nibbles of the TEXT field Stores the text field of the calendar entry Is the end flag of the TEXT field the value is always 0E 20 29 Stores the starting date of the entry if is a note or an anniversary else it does not appear Stores the starting date of the entry if is a meeting else it does not appear Stores the ending date of the entry if is a note or an anniversary else it does not appear Stores the ending date of the entry if is a meeting else it does not appear 0F OO OO OO 0A 52 02 10 A4 A4 52 01 00 00 00 00 00 28 52 03 28 03 A5 28 AA 28 A2 AC 01 00 00 01 00 3C 43 6C 75 08 20 41 72 6F 0E A5 61 61 6E 00 6C 65 6E 41 72 6D 53 6F 64 32 00 6E 73 66 20 28 6E 69 76 65 61 72 79 41 66 29 A5 28 EC 01 A5 28 A5 28 90 03 Table A.3: This table lists all calendar entries such as Notes Meetings Anniversaries stored in the Calendar file. 154 A.4. 
SMS
Field Name Size (Bytes) Description Example
COMMON PART
START DATA FLAG (SDF) 1 DATE 8 END DATA FLAG (EDF) NAME FLAG (NF) 1 NAME LENGTH (NL) NAME 1 1 NL 2
Indicates a date starting at the next byte; this flag combined with the EDF indicates what kind of data are stored in the session (03 DATA FF indicates an SMS, GPRS traffic or ‘DATAMESSAGE’ MMS; 05 DATA 01 indicates an MMS received from the operator or GPRS traffic to the operator; 00 DATE 01 indicates incoming and outgoing calls).
Stores the date on which the operation was performed. The date is stored in big-endian format.
Is located with an offset of 8 after the SDF and indicates that a date finishes here.
Is located with an offset of 1 after the EDF and indicates whether the entry refers to a contact present in the address book (for SMS the value is B2 if the message is to/from a contact in the address book; for calls the value is 70 if the contact is present, else 60).
If the contact is present in the address book, is located with an offset of 4 after the NF and indicates the length in nibbles of the subsequent NAME field.
If the contact is present in the address book, is located with an offset of 1 after the NL and stores the name of the contact stored in the address book.
03 BA A3 EB EB B6 2C E1 00 FF B2 1C 52 61 6D 6F 6E 61 20 4D 6F 72 65 74 74 69
SMS PART
MESS LENGTH (ML) MESS NUMBER LENGTH (NUL) NUMBER 1 ML 2 1 NUL 2 DIR 1 DIRECTION 1
If the contact is present in the address book, is located with an offset of 5 after the NAME field and indicates the length in nibbles of the subsequent MESS field; otherwise it is located with an offset of 5 after the NF.
Is located with an offset of 1 after the ML and stores the message sent/received.
Is located with an offset of 1 after the MESS and indicates the length in nibbles of the subsequent NUMBER field.
Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message.
Is located with an offset of 2 after the NUMBER and stores the information about the direction of the data stored in the section (value 00 indicates a sent message, else the value will be 02).
80 44 52 49 43 45 20 69 67 1A 4F 41 54 49 4D 43 61 1A 4D 20 4F 52 45 45 6D 41 54 20 45 20 4E 65 4E 49 41 20 58 41 6E 49 20 44 49 20 2C 74 20 49 20 4E 55 6F 65 53 4E 55 53 4E 76 20 45 56 53 49 41 76 6F 2B 33 39 XX XX XX XX XX XX XX XX XX XX 02
CALL PART
CALL TIME (CT) 4 NUMBER LENGTH (NUL) NUMBER 1 NUL 2
If the contact is present in the address book, is located with an offset of 2 after the NAME and stores the information about the direction of the data stored in the section (value 00 indicates an outgoing call, else the value will be 02).
Is located with an offset of 2 after the DIR field and stores the duration of the call; the data is stored in big-endian format.
Is located with an offset of 4 after the CT and indicates the length in nibbles of the subsequent NUMBER field.
Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message.
02 00 00 00 00 1A 2B 33 39 XX XX XX XX XX XX XX XX XX XX
MMS PART
PROV LEN (PL) PROVIDER NUM START (NS) NUMBER 1 PL 2 2 NUL 2
Is located with an offset of 5 after the EDF and indicates the length in nibbles of the subsequent PROVIDER field.
Is located with an offset of 1 after the PL field and stores the name of the MMS service provider.
Is located with an offset of 4 after the CT and indicates the length in nibbles of the subsequent NUMBER field.
Is located with an offset of 1 after the NUL and stores the number of the sender/recipient of the message.
0E 54 69 6D 20 6D 6D 73 30 08 36 34 31 38 2C 37 32 37
Table A.4: This table lists all event entries, such as SMS, MMS, voice and data calls, and SIM changes.
Field Name Size (Bytes) Description Example
COMMON PART
flag 1 4 REC FLAG MARK (RFM) REC FLAG (RFM) 4 1
If this flag is in the starting part of the file or at offset 5, the file does not contain SMS, so there is no need to parse it.
Received Flag Marker: indicates a received message. 25 3A 00 10
Starts at byte 13 after the RFM. If its value is 01 the message is a received message; if the value is 00 the message is a sent message.
Generally located just after flag 1; indicates the message’s text length.
If it appears after TEXT LEN, indicates a message from a special number.
Stores the text of the SMS message. 10
TEXT LEN (TL) SPEC MES (SM) TEXT 1 1 TL − 1 2 END TEXT (ET) 1
Indicates the end of the message text.
DATE 8
This field starts 12 bytes after the received flag (REC FLAG).
SEND NUM LEN (SNL) NUMBER 1
This is an optional field: it appears only if the sender’s number is stored in the address book. Indicates the length of the sender NUMBER field.
Stores the number of the sender if the sender appears in the address book.
20 29 34 18 10 02 49 6E 76 69 61 74 6F 0E
RECEIVED MESSAGE
SNL 4 NAME LENGTH (NL) NL 4 NAME SERV CENT (SCRD) SERV CENT (SCF) SERV CENT (SCNL) SERV CENT (SCN) 1
This is an optional field: it appears only if the sender’s number is stored in the address book. Indicates the length of the following NAME field.
Stores the name of the sender if the sender appears in the address book.
FA 54 17 46 F2 29 E1 00 28 33 34 39 34 36 37 37 31 34 36 34 44
REC DATE 8
Is stored with an offset of 23 bytes after the name field.
FLAG 2
Indicates that the message service center’s number starts here. 69 6E F1 E1 02
NUM LEN 1
Indicates the length of the following SERV CENT NUM field. 34
Stores the number of the message service provider.
SCNL 4 NUM EFF SERV CENT FLAG (ESCF) EFF SERV NUM LEN (ESNL) EFF SERV NUM (ESN)
73 74 65 66 61 6F 20 41 6C 65 5A 15 41 F2 29 00 91 3
Indicates that the effective message service center’s number starts here.
2B 33 39 33 32 30 35 38 35 38 35 30 30 15 00 81 1
Indicates the length of the following EFF SERV NUM field. 28
Stores the effective number of the SMS message service provider.
33 34 39 34 36 37 37 31 34 36 ESNL 4
SENT MESSAGE
DATE 8
Is stored with an offset of 14 bytes from the end of REC FLAG.
Flag 2 UNDEF NUMB LEN (UNL) UNDEFINED NUMBER (UN) 2 1
Is a flag indicating the presence of a received message.
Indicates the length of the following UNDEFINED NUMBER field.
RECEIVER NUMB FLAG (RNF) RECEIVER NUMBER LEN (RNL) RECEIVER NUMBER (RN) SERV CEN REC DATE (SCRD) RNL 4 2 1 RNL 4 8
It is not clear which number this field stores; it may be the number of the sender’s message service provider.
This flag indicates the presence of the sender’s number in the next bytes.
2B 33 39 33 34 39 32 30 30 30 38 39 38 04 91
Indicates the length of the following RECEIVER NUMBER field. 34
Stores the receiver’s number. 2B 34 36 00 E1
Stores the date on which the message service provider received the message; it is stored with an offset of 2 bytes from the end of RECEIVER NUMBER.
Table A.5: This table lists all fields characterizing an SMS.
00 0C 09 22 F2 1C 32 E1 00 00 91 34 33 39 33 34 39 36 37 37 31 34 93 2D 4B F2 29 00

B The Backup communication protocol

B.1 Backup item

/backup/{backupType}/device/{imei}/
HTTP method: PUT
Attributes: backupType indicates the type of backup performed; possible values are full or diff. imei identifies the backed-up device via its IMEI number.

<backupItem>
  <timestamp>2010-07-04 20:01:21.902</timestamp>
  <backupType>full</backupType>
  <calendar_bak>false</calendar_bak>
  <contact_bak>false</contact_bak>
  <file_bak>false</file_bak>
  <sms_bak>true</sms_bak>
  <app_settings_bak>false</app_settings_bak>
</backupItem>

Figure B.1: Example of XML payload for a backup item.

B.2 Contact item

/backup/{backupType}/device/{imei}/contacts/{contactItemName}
HTTP method: PUT and GET
Attributes: contactItemName is the unique identifier used by the client for a contact resource; backupType indicates the type of backup performed; possible values are full or diff in the case of a PUT request and restore in the case of a GET request.

<contact>
  <detail__list>
    <detail>
      <label>label0</label>
      <value>value0</value>
    </detail>
  </detail__list>
  <email>[email protected]</email>
  <email>[email protected]</email>
  <given__name>name</given__name>
  <phone__number__list>
    <phone__number>
      <number>+123456789054</number>
      <type>2</type>
    </phone__number>
    <phone__number>
      <number>+12309876543</number>
      <type>1</type>
    </phone__number>
  </phone__number__list>
  <backupItem>
    <timestamp>2010-07-07 12:20:12.997</timestamp>
  </backupItem>
  <status>new</status>
</contact>

Figure B.2: Example of XML payload for a contact item.

B.3 Calendar item

/backup/{backupType}/device/{imei}/calendar/{calendarItemName}
HTTP method: PUT and GET
Attributes: calendarItemName is the unique identifier used by the client for the calendar items.

<calendar>
  <alarmOffset>0</alarmOffset>
  <allDay>1</allDay>
  <cal_id>6</cal_id>
  <calendarName>meeting name</calendarName>
  <description>meeting description</description>
  <endTime>1277769600000</endTime>
  <location>somelocation</location>
  <startTime>1277683200000</startTime>
  <summary>Meeting</summary>
  <type>event</type>
  <backupItem>
    <timestamp>2010-07-04 20:50:47.119</timestamp>
  </backupItem>
  <status>new</status>
</calendar>

Figure B.3: Example of XML payload for a calendar item.
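
The startTime and endTime values in Figure B.3 look like milliseconds since the Unix epoch. The thesis does not state the unit, so this reading is an assumption. Under that assumption, a minimal Python sketch for decoding them (the helper name ms_to_utc is hypothetical) could be:

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    """Convert a millisecond timestamp (assumed Unix epoch, UTC) to a datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

start = ms_to_utc(1277683200000)  # startTime from Figure B.3
end = ms_to_utc(1277769600000)    # endTime from Figure B.3
print(start.isoformat(), end.isoformat(), (end - start).days)
```

With this reading the example event spans exactly one day, which is consistent with allDay being set to 1.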

B.4 Message item

/backup/{backupType}/device/{imei}/sms/{smsItemName}
HTTP method: PUT and GET
Attributes: smsItemName is the unique identifier used by the client for the SMS resources.

<sms>
  <body>text</body>
  <sender>+123456789000</sender>
  <type>2</type>
  <backupItem>
    <timestamp>2010-07-04 20:57:33.669</timestamp>
  </backupItem>
</sms>

Figure B.4: Example of XML payload for a message item.

B.5 Generic file item

Files are sent in Base64 encoding. If a file is too big for a single package, it is split into several chunks and sent chunk by chunk to the server, which keeps track of the chunks received and assembles the file after all chunks have been received.

/backup/{backupType}/device/{imei}/files/{fileItemName}/init_byte/{init_byte}/final_byte/{final_byte}
HTTP method: PUT and GET
Attributes: fileItemName is the unique identifier used by the client for files (e.g., the path); init_byte is the first byte of the file chunk sent; final_byte is the last byte of the file chunk sent.

<file>
  <content>UBy/eQhuUlasfiUe/bocsDM3TbRsHPAfASGQj4fc1
+eRu2vnsuab0z6kYYlmo1BWtKbU/wBrGmkxtMLctJLwHjTiRSn
h06ZAhwskO9kcVyaUFDUUFelcgQ4U4Jgjc3qx5fDTc9/
.......
/hiZsZZEQkoILIo6kCm30/TlRk0SktinpQ==</content>
  <file_type>file</file_type>
  <final_byte>169999</final_byte>
  <init_byte>160000</init_byte>
  <name>09.jpg</name>
  <backupItem>
    <timestamp>2010-07-04 21:02:39.532</timestamp>
  </backupItem>
</file>

Figure B.5: Example of XML payload for a generic file item.

B.6 Setting item

/backup/{backupType}/device/{imei}/app_settings/{fileItemName}/init_byte/{init_byte}/final_byte/{final_byte}
HTTP method: PUT and GET

Settings are managed as files: on Android they are usually Shared Preferences files, on iPhone plist files. These files can be analyzed on the server to extract data and make these data interoperable.
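
The chunked transfer used for files and settings can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the 10000-byte default chunk size (suggested by the 160000 to 169999 range in Figure B.5) and the percent-encoding of the item name are not taken from the thesis. Both init_byte and final_byte are treated as inclusive offsets, as the attribute descriptions indicate.

```python
import base64
from urllib.parse import quote

def file_chunks(imei, backup_type, item_name, data, chunk_size=10000):
    """Yield (path, base64_content) per chunk; byte offsets are inclusive."""
    # Assumption: item names (e.g. file paths) are percent-encoded in the URL.
    name = quote(item_name, safe="")
    for init_byte in range(0, len(data), chunk_size):
        final_byte = min(init_byte + chunk_size, len(data)) - 1
        path = (f"/backup/{backup_type}/device/{imei}/files/{name}"
                f"/init_byte/{init_byte}/final_byte/{final_byte}")
        content = base64.b64encode(data[init_byte:final_byte + 1]).decode("ascii")
        yield path, content

def reassemble(chunks):
    """Hypothetical server-side step: chunks maps init_byte -> Base64 content."""
    return b"".join(base64.b64decode(chunks[k]) for k in sorted(chunks))

data = bytes(range(256)) * 100  # 25600 bytes of sample data
sent = list(file_chunks("00000001", "full", "09.jpg", data))
received = {i * 10000: content for i, (_, content) in enumerate(sent)}
print(len(sent), reassemble(received) == data)
```

Keying the received chunks by init_byte lets the server reassemble the file even if chunks arrive out of order.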

B.7 List methods

HTTP method: GET

These methods are used to obtain the lists of resources present in the last backup. List methods have been implemented for contacts, files, SMS, calendar and settings.

/backup/diff/device/{imei}/contactsIdList
/backup/diff/device/{imei}/filesIdList
/backup/diff/device/{imei}/smsIdList
/backup/diff/device/{imei}/calendarIdList
/backup/diff/device/{imei}/appSettingsIdList

Figure B.6 shows the XML response produced by the server for a list of items requested using the contactsIdList method. Each dataItem contains two pieces of information: the itemName, which is the unique identifier used by the client, and the timestamp of the last backup. These data are used when performing a differential backup to understand which contents should be updated.

<itemIdList>
  <idList>
    <dataItem>
      <itemName>480</itemName>
      <backupItem>
        <timestamp>2010-07-04 20:25:40.0</timestamp>
      </backupItem>
    </dataItem>
    <dataItem>
      <itemName>481</itemName>
      <backupItem reference="../../dataItem/backupItem"/>
    </dataItem>
    .......
  </idList>
</itemIdList>

Figure B.6: Example of XML payload for a contact list response.

B.8 Restore

B.8.1 Listing items on the server

/backup/device/{imei}/backup_item_list
HTTP method: GET

This method provides the list of all the backups present on the server for the device identified by the IMEI and for all the devices owned by the authenticated user. When a user decides to restore from a backup, he/she chooses the backup to restore from the list given by this method. Figure B.7 shows a typical list of backups.

B.8.2 Choosing data to be restored

/backup_restore/device/{imei}/{data_type}/{backup_id}
HTTP method: GET
Attributes: data_type indicates the type of data to be restored; possible values are contact, calendar, file, SMS or app, depending on what is to be restored; backup_id identifies the backup on the server.

Figure B.8 shows the response from the server to a restore request.
The data to be restored can also be selected precisely by identifying a single item on the server. The response to such a request is like the one shown in Figure B.8.

/restore/device/{imei}/{data_type}/{item_id}

<backupItemList>
  <backupList>
    <backupItem>
      <timestamp>2010-06-23 01:53:05.0</timestamp>
      <backupType>full</backupType>
      <deviceItem>
        <imei>00000001</imei>
      </deviceItem>
      <file_bak>true</file_bak>
      <sms_bak>true</sms_bak>
      <calendar_bak>true</calendar_bak>
      <contact_bak>true</contact_bak>
      <app_settings_bak>false</app_settings_bak>
      <backup_id>1</backup_id>
    </backupItem>
    .......
    <backupItem>
      <timestamp>2010-07-05 11:38:07.0</timestamp>
      <backupType>diff</backupType>
      <deviceItem>
        <imei>00000000</imei>
      </deviceItem>
      <file_bak>false</file_bak>
      <sms_bak>false</sms_bak>
      <calendar_bak>false</calendar_bak>
      <contact_bak>true</contact_bak>
      <app_settings_bak>false</app_settings_bak>
      <backup_id>27</backup_id>
    </backupItem>
  </backupList>
</backupItemList>

Figure B.7: Example of XML payload for a list of backups.

<restoreItemList>
  <restoreList>
    <restoreItem>
      <item_id>49</item_id>
      <description>item description</description>
    </restoreItem>
    .......
    <restoreItem>
      <item_id>29</item_id>
    </restoreItem>
  </restoreList>
</restoreItemList>

Figure B.8: Restore method response.

C The Sharing communication protocol

C.1 Sharing methods

C.1.1 Item listing

Returns the list of the sharable items present in the last backup.

/sharing/device/{imei}/contactsIdList
HTTP method: GET

<itemIdList>
  <idList>
    <dataItem>
      <itemName>480</itemName>
      <backupItem>
        <timestamp>2010-07-04 20:25:40.0</timestamp>
      </backupItem>
    </dataItem>
    <dataItem>
      <itemName>481</itemName>
      <backupItem reference="../../dataItem/backupItem"/>
    </dataItem>
  </idList>
</itemIdList>

Figure C.1: Example of XML payload for a list of items.
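
A client could consume the list format of Figure C.1 (the same format as the backup list response in Figure B.6) roughly as sketched below. Two points are assumptions rather than facts from the thesis: the reference attribute is read as a relative path to another element, where ".." moves to the parent and a tag name selects the first child with that tag, and the rule in items_to_upload (upload what is missing on the server or modified after its last backup) is only one plausible way to use the timestamps for a differential backup. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SAMPLE = """
<itemIdList><idList>
  <dataItem><itemName>480</itemName>
    <backupItem><timestamp>2010-07-04 20:25:40.0</timestamp></backupItem>
  </dataItem>
  <dataItem><itemName>481</itemName>
    <backupItem reference="../../dataItem/backupItem"/>
  </dataItem>
</idList></itemIdList>
"""

def parse_item_id_list(xml_text):
    """Return {itemName: datetime of last backup} from an itemIdList response."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    result = {}
    for item in root.iter("dataItem"):
        backup = item.find("backupItem")
        # Assumed semantics: follow the relative path step by step.
        while backup.get("reference") is not None:
            node = backup
            for step in backup.get("reference").split("/"):
                node = parents[node] if step == ".." else node.find(step)
            backup = node
        stamp = backup.findtext("timestamp")
        result[item.findtext("itemName")] = datetime.strptime(
            stamp, "%Y-%m-%d %H:%M:%S.%f")
    return result

def items_to_upload(local, server):
    """Names missing on the server or modified locally after the last backup."""
    return sorted(name for name, modified in local.items()
                  if name not in server or modified > server[name])

server = parse_item_id_list(SAMPLE)
local = {"480": datetime(2010, 7, 5), "481": datetime(2010, 7, 1),
         "482": datetime(2010, 7, 1)}
print(items_to_upload(local, server))
```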
Similar methods are available for files (filesIdList) and calendars (calendarIdList).

C.1.2 Share an item

Method used to share an item.

/sharing/device/{imei}/{data_type}/sharing_item/{id}
HTTP method: PUT and GET
Attributes: data_type can be contact, calendar or file in the case of PUT; in the case of GET, file cannot be used. Files are managed by

/sharing/device/{imei}/file/init_byte/{init_byte}/final_byte/{final_byte}/sharing_item/{id}

id represents the itemName when the request method is PUT; when the method is GET, id is the identifier in the server’s database.

<sharingItem>
  <group>
    <groupId>7</groupId>
  </group>
  <isLB>false</isLB>
</sharingItem>

Figure C.2: Example of XML payload to share an item with a group.

There is one sharingItem for each item to be shared and one group element for each group with which the user wants to share the information. isLB indicates whether the information is geotagged or not; in this case the value should be false in PUT requests, as the method does not handle location-based data.

C.1.3 Location based sharing

Method used to share an item using location-based attributes.

/sharing/device/{imei}/{data_type}/lb_sharing_item/{id}
HTTP method: PUT and GET
Attributes: the method works like the non-location-based method; in this case too there is a method to handle files in the case of a GET request:

/sharing/device/{imei}/file/init_byte/{init_byte}/final_byte/{final_byte}/LB_SharingItem/{id}

sharingItem and group are used in this method as in the non-location-based case; latitude, longitude and radius (see Figure C.3) can be defined to set the area where the information is available. isLB indicates whether the information is geotagged or not.

<sharingItem>
  <group>
    <groupId>x</groupId>
  </group>
  <latitude>41.963706</latitude>
  <longitude>12.501572</longitude>
  <radius>5000</radius>
  <isLB>true</isLB>
</sharingItem>

Figure C.3: Example of XML payload to share an item with a group using location.
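
To illustrate how latitude, longitude and radius can delimit the area where a shared item is available, the following Python sketch checks whether a device position falls inside the circle of Figure C.3. The thesis does not give the unit of radius or the distance formula used by the server; metres and the haversine great-circle distance are assumptions here, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_available(item, lat, lon):
    """True if (lat, lon) lies inside the item's circle; radius assumed in metres."""
    d = distance_m(item["latitude"], item["longitude"], lat, lon)
    return d <= item["radius"]

# Values taken from Figure C.3.
item = {"latitude": 41.963706, "longitude": 12.501572, "radius": 5000}
print(is_available(item, 41.9727, 12.501572))  # about 1 km north of the centre
print(is_available(item, 45.4642, 9.19))       # a position in another city
```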
When the request is a GET and the isLB field is true, latitude, longitude and radius are used to locate the item on the map.

C.1.4 Listing shared data

This method is used to list all the information shared by the user’s groups; the method can be location based or not.

/sharing/device/{imei}/{data_type}/sharing_item_list
/sharing/device/{imei}/{data_type}/lb_sharing_item_list
HTTP method: GET
Attributes: data_type can be contact, calendar or file, depending on the kind of data to be retrieved.

<sharingItemList>
  <sharingList>
    <sharingItem>
      <group>
        <groupName>University</groupName>
        <groupId>7</groupId>
      </group>
      <sharing_id>2</sharing_id>
      <description>Johnn Doe</description>
      <isLB>false</isLB>
    </sharingItem>
    <sharingItem>
      <group>
        <groupName>University</groupName>
        <groupId>7</groupId>
      </group>
      <sharing_id>3</sharing_id>
      <description>Mike Black</description>
      <isLB>false</isLB>
    </sharingItem>
    .................
  </sharingList>
</sharingItemList>

Figure C.4: Example of XML payload for a list of items.

The result does not contain the data itself, only the metadata visible in sharingItem: group indicates the group with which the information is shared, sharing_id is the identifier of the data on the server and description contains a human-readable description of the shared content. Results can be filtered by group using the following method, with the identifier of the group in the group_id field.

/sharing/device/{imei}/{data_type}/sharing_item_list/group/{group_id}

C.2 Groups methods

C.2.1 Creating group

Using this method the user can create a new group.

/sharing/device/{imei}/group
HTTP method: PUT

<group>
  <memberList>
    <username>[email protected]</username>
  </memberList>
  <groupName>University</groupName>
</group>

Figure C.5: Example of XML payload to create a group.
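
The group creation payload of Figure C.5 can be built programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the function name and the example.org addresses are placeholders introduced for illustration, and sending the payload with an authenticated PUT request is left out.

```python
import xml.etree.ElementTree as ET

def build_group_xml(group_name, usernames):
    """Serialize a <group> payload with the element order shown in Figure C.5."""
    group = ET.Element("group")
    members = ET.SubElement(group, "memberList")
    for name in usernames:
        ET.SubElement(members, "username").text = name
    ET.SubElement(group, "groupName").text = group_name
    return ET.tostring(group, encoding="unicode")

payload = build_group_xml("University", ["alice@example.org", "bob@example.org"])
print(payload)
```

Building the document through an XML library rather than string concatenation ensures that special characters in group names or usernames are escaped correctly.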
The method uses the groupName field to set the name of the group and the usernames in memberList to set the members of the group.

C.2.2 Listing groups

Method used to get the list of groups available to the user and the users participating in each group.

/sharing/device/{imei}/group_list
HTTP method: GET

<groupItemList>
  <groupList>
    <groupItem>
      <groupName>University</groupName>
      <admin>
        <nickname>johnn</nickname>
      </admin>
      <memberList>
        <nickname>johnn</nickname>
        <nickname>bill</nickname>
      </memberList>
      <groupId>7</groupId>
    </groupItem>
    <groupItem>
      <groupName>Work</groupName>
      <admin>
        <nickname>mike</nickname>
      </admin>
      <memberList>
        <nickname>johnn</nickname>
        <nickname>bill</nickname>
      </memberList>
      <groupId>12</groupId>
    </groupItem>
  </groupList>
</groupItemList>

Figure C.6: Example of XML payload of a list of groups.

C.2.3 Handling invitations

Using this method the user can invite other users to a group or handle his/her own invitations to groups.

/sharing/device/{imei}/invitations
HTTP method: PUT and GET

<groupItemList>
  <groupList>
    <groupItem>
      <memberList>
        <username>[email protected]</username>
        <username>[email protected]</username>
      </memberList>
      <groupId>5</groupId>
    </groupItem>
    <groupItem>
      <memberList reference="../../groupItem/memberList"/>
      <groupId>7</groupId>
    </groupItem>
  </groupList>
</groupItemList>

Figure C.7: Example of XML payload to invite users to a group.

In the PUT case the user sends the XML in Figure C.7, with the username fields set to the usernames of the users to be invited to the group identified by groupId.

<groupItemList>
  <groupList>
    <groupItem>
      <groupName>Lavoro</groupName>
      <admin>
        <username>[email protected]</username>
        <nickname>johnn</nickname>
      </admin>
      <groupId>4</groupId>
    </groupItem>
    ................................
    <groupItem>
      <groupName>University</groupName>
      <admin>
        <username>[email protected]</username>
        <nickname>jack</nickname>
      </admin>
      <groupId>8</groupId>
    </groupItem>
  </groupList>
</groupItemList>

Figure C.8: Example of XML payload of invitations received by the user.

In the GET case the user receives the XML in Figure C.8 with all the groups to which he/she has been invited. With a PUT request to the invitations response method, the user can decide whether to ignore, accept or refuse an invitation by setting the status to IGNORED, ACCEPTED or REFUSED.

/sharing/device/{imei}/invitations_response/group/{group_id}/{status}
Identity and the definition of the situation in a massmediated context. Symbolic Interaction, 23(1):1–27, 2000. 187 BIBLIOGRAPHY [89] Shanyang Zhao, Sherri Grasmuck, and Jason Martin. Identity construction on facebook: Digital empowerment in anchored relationships. Comput. Hum. Behav., 24(5):1816–1836, 2008. [90] Peter Mika. Bootstrapping the foaf-web: An experiment in social network mining, 2004. http://www.cs.vu.nl/˜pmika/research/ foaf-ws/mining.html. [91] Peter Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3):211–223, October 2005. [92] Marco Gaertler. Clustering. In Ulrik Brandes and Thomas Erlebach, editors, Network Analysis: Methodological Foundations, volume 3418 of Lecture Notes in Computer Science, pages 178–215. Springer, February 2005. http: //springerlink.metapress.com/content/19b5r48lqx3nx7gc. [93] Paul Jaccard. Etude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat, 37:547–579, 1901. [94] Rudi L. Cilibrasi and Paul M. B. Vitanyi. The google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370–383, March 2007. http://dx.doi.org/10.1109/TKDE.2007.48. [95] Ravi Kannan, Santosh Vempala, and Adrian Vetta. On Clusterings: Good, Bad, Spectral. Journal of the ACM, 51(3):497–515, May 2004. [96] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering Large Graphs via the Singular Value Decomposition. Machine Learning, 56(1-3):9–33, 2004. 188 BIBLIOGRAPHY [97] E. Casalicchio, E. Galli, and V. Ottaviani. MobileOnRealEnvironment-GIS: A federated mobile network simulator of mobile nodes on real geographic data. In Distributed Simulation and Real Time Applications, 2009. DS-RT ’09. 13th IEEE/ACM International Symposium on, pages 255 –258, oct 2009. 189