UNIVERSITY OF CALGARY
The Virtual Faraday Cage
by
James King
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
CALGARY, ALBERTA
AUGUST, 2013
© James King 2013
Abstract
The primary contribution of this thesis is a new architecture for web application platforms and their extensions, entitled “The Virtual Faraday Cage”. This new architecture addresses some of the privacy- and security-related problems associated with third-party extensions running within web application platforms. A proof-of-concept showing how the Virtual Faraday Cage could be implemented is described.

This new architecture aims to help solve some of the key security and privacy concerns for end-users in web applications by creating a mechanism by which a third-party could create an extension that works with end-user data, but which could never leak such information back to the third-party. To facilitate this, the thesis also incorporates a basic privacy-aware access control mechanism. This architecture could be used for centralized web application platforms (such as Facebook) as well as decentralized platforms. Ideally, the Virtual Faraday Cage should be incorporated into the development of new web application platforms, but it could also be implemented via wrappers around existing application platform Application Programming Interfaces (APIs) with minimal changes to existing platform code or workflows.
Acknowledgments
I would first like to thank my supervisors Ken Barker and Jalal Kawash. Without their
guidance and mentorship, and their patience and support – I would not have completed
my program or produced this work. I have the utmost respect for both Dr. Barker and
Dr. Kawash as professors and supervisors, and I believe that anyone would be fortunate
to have their instruction and guidance.
While I did not complete my degree under them, special mention is deserved for my
original supervisors, Rei Safavi-Naini and John Aycock – who both gave me the initial
opportunity to come and study at the University of Calgary, and also the flexibility to
change my area of research afterwards.
I’d also like to thank my committee members, Dr. Gregory Hagen and Dr. Payman
Mohassel, as well as my neutral chair Dr. Peter Høyer. Both Dr. Hagen and Dr. Mohassel were very approachable during the final leg of my journey, and I appreciated their
examination of my work. Dr. Hagen’s feedback regarding Canadian privacy law was especially welcome, and I am happy to have expanded my thesis to address that specifically.
More generally, I’d like to thank the University of Calgary’s Department of Computer
Science – their other faculty members, their IT staff, the department chair Dr. Carey
Williams, as well as their office staff.
Acknowledgments are also deserved for all the support and training I received at
Florida Atlantic University and especially their Department of Mathematics and Center
for Cryptology and Information Security. Without the numerous people there that helped
shape and prepare me for graduate school, I would have never come to the University
of Calgary or pursued the path that I took. In particular, exceptional thanks should be
reserved for Dr. Rainer Steinwandt, Dr. Ronald Mullin, Dr. Spyros Magliveras, and Dr.
Michal Šramka.
It’s impossible for me to name everyone who has helped me along, but final thanks
should go to all my friends and family who have given me their support during my studies.
Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Figures
1 Introduction
   1.1 Premise
   1.2 Organization of this Thesis
   1.3 Background & Motivations
      1.3.1 Web Applications
      1.3.2 Online Social Networks as a Specific Web Application Platform
   1.4 Privacy
      1.4.1 Defining and Describing Privacy
      1.4.2 Laws, Business, and the Value of Privacy
   1.5 Social Networks
      1.5.1 The Value of Social Network Data
      1.5.2 Innate Risks, Threats, and Concerns
   1.6 Security
      1.6.1 Access Control and Information Flow Control
      1.6.2 Sandboxing
   1.7 Summary
2 Related Work
   2.1 Overview
   2.2 Software and Web Applications
      2.2.1 P3P and Privacy Policies
      2.2.2 Better Developer Tools
      2.2.3 Empowering the End-User
   2.3 Social Networks
      2.3.1 Hiding End-User Data
      2.3.2 Third-Party Extensions
   2.4 Browser Extensions
   2.5 Summary
3 Theoretical Model
   3.1 Basics
   3.2 Formal Model
      3.2.1 Foundations
      3.2.2 Information leakage
   3.3 Summary
4 Architecture
   4.1 Preamble
   4.2 Features
      4.2.1 Data URIs
      4.2.2 Hashed IDs and Opaque IDs
      4.2.3 Callbacks
      4.2.4 Seamless Remote Procedure Calls and Interface Reconstruction
   4.3 Information Flow Control
   4.4 URIs
      4.4.1 Domains
      4.4.2 Paths
   4.5 Application Programming Interfaces
      4.5.1 Web Application Platform API
      4.5.2 Third-Party Extension API
      4.5.3 Shared Methods
      4.5.4 Relationship with the Theoretical Model
   4.6 High-Level Protocol
      4.6.1 Accessing a Third-Party Extension
      4.6.2 Mutual Authentication
      4.6.3 Privacy by Proxy
   4.7 Remote Procedure Calls
      4.7.1 Overview
      4.7.2 Protocol Requirements
      4.7.3 Requirement Fulfillment
      4.7.4 Protocol
      4.7.5 Messages
      4.7.6 Serialized Data
      4.7.7 Responses
      4.7.8 Security
   4.8 Sandboxing
   4.9 Inter-extension Communication
   4.10 Methodology and Proof-of-Concept
      4.10.1 Methodology
      4.10.2 Development
      4.10.3 Proof-of-Concept
      4.10.4 Formal Model
      4.10.5 Example Third-Party
      4.10.6 Facebook Wrapper
   4.11 Effects and Examples
      4.11.1 A more connected web
      4.11.2 Examples
   4.12 Summary
5 Analysis & Conclusion
   5.1 Comparisons and Contrast
      5.1.1 PIPEDA Compliance
      5.1.2 Comparisons with Other Works
   5.2 Time & Space Complexity
      5.2.1 Hashed IDs
      5.2.2 Opaque IDs
      5.2.3 Views
      5.2.4 Access Control
      5.2.5 Subscriptions
      5.2.6 Sandboxing
      5.2.7 Protocol
      5.2.8 Summary
   5.3 Shortcomings
      5.3.1 Personal Information Protection and Electronic Documents Act
      5.3.2 Inter-extension Communication
      5.3.3 Proof-of-Concept
      5.3.4 Hash Functions
      5.3.5 High-Level Protocol
      5.3.6 Time & Space Complexity Analysis
   5.4 Future Work
      5.4.1 Purpose
      5.4.2 Enhanced Support for Legal Compliance
      5.4.3 Callbacks
      5.4.4 Inter-extension Communication
      5.4.5 Time & Space Complexity and Benchmarking
      5.4.6 URI Ontology
   5.5 Summary
Bibliography
List of Figures

3.1  A web application platform.
3.2  An example of the generalization hierarchy for data s = ⟨“December”, 14, 1974⟩.
4.1  Internal and external realms of a platform.
4.2  Comparison of traditional extension and Virtual Faraday Cage extension architectures. Lines and arrows indicate information flow; dotted lines are implicit.
4.3  The Virtual Faraday Cage modeled using Decentralized Information Flow Control.
4.4  Information flow within the Virtual Faraday Cage. Dotted lines indicate the possibility of flow.
4.5  Process and steps for answering a read request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.6  Process and steps for answering a write request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.7  Process and steps for answering a create request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.8  Process and steps for answering a delete request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.9  Process and steps for answering a subscribe request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.10 Process and steps for notifying subscribed principals when data has been altered. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.11 Process and steps for answering an unsubscribe request from a principal. Start and end positions have bold borders; ‘no’ paths in decisions are dotted.
4.12 Steps required for authorizing and accessing a third-party extension.
4.13 The EMDB extension specifications.
4.14 Authenticating an incoming connection from a VFC platform.
4.15 Authenticating an incoming connection from a third-party extension.
4.16 The Lightweight XML RPC Protocol.
4.17 Hypothetical prompt and example extension asking for permission to share end-user data.
4.18 Creating a datastore and some data-items in an interactive Python environment.
4.19 Applying projections on data-items in an interactive Python environment.
4.20 Applying a transform on data-items in an interactive Python environment.
4.21 Composing projections and transforms together in an interactive Python environment.
4.22 Composing projections and transforms in an invalid way, in an interactive Python environment.
4.23 Creating a view in an interactive Python environment.
4.24 Creating and accessing privacy and write policies in an interactive Python environment.
4.25 Graph showing the potential connectivity between categories of web application platforms based on making extensions available from one platform to another.
Chapter 1
Introduction
1.1 Premise
This thesis proposes the Virtual Faraday Cage (VFC), a novel architecture for web application platforms that support third-party extensions. The Virtual Faraday Cage advocates the idea that end-user privacy can be completely preserved while still gaining some utility from third-party extensions. The Virtual Faraday Cage allows for fine-grained end-user control over their own data, forcing third-parties to disclose exactly what data they want and allowing end-users to decide to what extent, if any, that data is to be released. The Virtual Faraday Cage also allows third-party extensions to view and process private end-user data while ensuring that these extensions are not capable of revealing that information back to the third-party itself.
To date, no known proposal exists that provides an architecture with this capability. The Virtual Faraday Cage achieves it by combining traditional access control mechanisms, information flow control, and privacy-policy meta-data into the interactions between the platform and third-parties. The Virtual Faraday Cage permits web application platforms to incorporate privacy-preservation into their systems, allowing for richer third-party extensions without necessarily making large sacrifices of end-user privacy.
The Virtual Faraday Cage can be applied to both centralized and distributed web application platforms (e.g., peer-to-peer), as well as hybrid systems that use a combination of both. While modern third-party extensions are treated as remote entities with few or no local components, the Virtual Faraday Cage restructures these extensions into hybrid systems: extensions with both remote and local code that can still provide a seamless experience for the end-user. Ideally, the Virtual Faraday Cage should be incorporated into the development of new web application platforms, but it could also be implemented via wrappers around existing application platform APIs with minimal changes to existing platform code or work-flows.
This thesis provides a theoretical model describing how web application platforms
would implement the Virtual Faraday Cage, and specifies an API for third-party extensions. A proof-of-concept implementation validating the Virtual Faraday Cage architecture is also described.
With the Virtual Faraday Cage, third-parties can develop extensions to web applications that add new functionality while still protecting end-users from privacy violations. For example, a third-party might be able to collect the necessary operating data from its users, but be unable to view the social network graph even if its extension provides features that utilize that graph. In the Virtual Faraday Cage, users decide how much of their personal information, if any, is revealed to third-parties.
The Virtual Faraday Cage enforces a strict information flow policy for an end-user’s data when interacting with third-party extensions. Extensions are split into two components: a remote component and a local one. The remote component of a third-party extension is the third-party’s primary interface with the web application platform, and the only mechanism through which end-user data may be obtained by the third-party. The local component of an extension runs within a sandboxed environment under the web application platform’s control. This allows for composite third-party extensions that can perform sensitive operations on private data, while ensuring that knowledge of such data can never be passed back ‘upstream’ to the third-party itself.
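As an illustration of this split, the following sketch (hypothetical Python, not the thesis’ API; the names Platform, LocalComponent, and RemoteComponent are invented for this example) shows how a platform could mediate the two components so that only explicitly released data ever reaches the remote side:

    # Hypothetical sketch of the remote/local split; not the actual VFC API.

    class RemoteComponent:
        """Runs on the third-party's servers; it only ever sees released data."""
        def __init__(self):
            self.received = []

        def receive(self, data):
            self.received.append(data)

    class LocalComponent:
        """Runs in the platform's sandbox; it may read private data, but its
        output is rendered to the end-user and never sent upstream."""
        def render(self, private_data):
            return f"You have {len(private_data['friends'])} friends using this extension."

    class Platform:
        """Mediates all information flow between the two components."""
        def __init__(self, remote, local):
            self.remote, self.local = remote, local

        def run(self, private_data, released_data):
            self.remote.receive(released_data)       # only user-authorized data goes upstream
            return self.local.render(private_data)   # sandboxed output stays within the platform

    platform = Platform(RemoteComponent(), LocalComponent())
    print(platform.run({"friends": ["bob", "carol"]}, {"age": 30}))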
When a third-party extension is first granted access to an end-user’s data, and during all subsequent interaction, the platform ensures that all private data the extension obtains is either explicitly authorized by the end-user or authorized by an explicit policy created by the end-user. End-user data would have explicit granularity settings and conform to a subset of the privacy ontology [1] developed by the Privacy and Security group (a subset of the Advanced Database Systems and Application (ADSA) Laboratory [2]) at the University of Calgary. This allows the end-user, or an agent acting on their behalf, to weigh the costs and benefits of revealing certain information to the application provider, and also forces the extension provider to specify what end-user data it uses and for what purposes that data is accessed. For example, an extension that processes photos on a social network application may have to obtain permission from the user for each access to that user’s photos.
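A minimal sketch of this rule, assuming a hypothetical request_data interface (the policy format and function names below are illustrative, not the Virtual Faraday Cage API), might look as follows:

    # Hypothetical sketch: release data only under an explicit end-user policy
    # or an explicit per-access authorization.

    def request_data(extension, item, purpose, policies, ask_user):
        for policy in policies:
            if (policy["extension"], policy["item"], policy["purpose"]) == (extension, item, purpose):
                return True                        # covered by an explicit end-user policy
        return ask_user(extension, item, purpose)  # otherwise prompt for this specific access

    # The end-user has created one standing policy for a photo-processing extension.
    policies = [{"extension": "photo-fx", "item": "photos", "purpose": "red-eye removal"}]
    deny_all = lambda *args: False

    print(request_data("photo-fx", "photos", "red-eye removal", policies, deny_all))   # True
    print(request_data("photo-fx", "friends", "red-eye removal", policies, deny_all))  # False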
Validation of the Virtual Faraday Cage is aided by a Python-based proof-of-concept implementation. The proof-of-concept involves a social-network web application platform that allows third-party extensions to be installed by end-users. The extension implemented is a movie rating and comparison tool, allowing users to save their movie ratings with a third-party and compare their movie lists with their friends’ to determine ‘compatibility ratings’. To better mirror reality, the third-party requests some demographic information from end-users (age, gender, location), but it can neither request nor otherwise obtain further information (such as the end-user’s friends).
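The following hypothetical declaration sketches how the proof-of-concept extension’s data requirements could be expressed; the manifest structure and field names are invented for illustration, and Chapter 4 specifies the actual API:

    # Hypothetical manifest for the movie-rating extension described above.
    MOVIE_EXTENSION = {
        "name": "Movie ratings and compatibility",
        # Demographic fields the remote third-party may request from consenting users.
        "remote_may_request": {"age", "gender", "location"},
        # Data only the local, sandboxed component may process; it is never
        # transmitted upstream (e.g., the end-user's friends and their ratings).
        "local_only": {"friends", "friend_ratings"},
    }

    def may_release_upstream(field, manifest=MOVIE_EXTENSION):
        return field in manifest["remote_may_request"]

    print(may_release_upstream("age"))      # True
    print(may_release_upstream("friends"))  # False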
1.2 Organization of this Thesis
The rest of this thesis is organized as follows. The remainder of this chapter provides the motivation for this thesis, as well as background on privacy, social networks, and other aspects of computer science that are utilized by the Virtual Faraday Cage. Chapter 2 describes related work: privacy work being done in online social networks, as well as more specific work on protecting end-user data from third-parties. Chapter 3 describes the theoretical model which the Virtual Faraday Cage uses and operates within. Chapter 4 describes the architecture, API, proof-of-concept, and implementation-specific details. Finally, Chapter 5 concludes this thesis and discusses future work.
1.3 Background & Motivations
1.3.1 Web Applications
Web applications are a key part of our daily lives. They range from applications that
allow end-users to do online banking, to applications that facilitate social networking.
These applications run on a variety of web servers, and the vast majority of them are
located remotely from the end-users that utilize their services.
In the past, when end-users demanded a new feature for the application they were using, developers had to implement the feature themselves and roll out a new version of their application. As this is a resource-intensive task, many developers instead allowed their applications to be extended using third-party extensions. These third-party extensions, typically installed locally at the end-user’s own risk, would then be able to interface directly with the original application and extend its features beyond those it was originally given. For many web applications, these extensions were installed by the host/provider either as patches to the original web application’s source code or as modules that would run with the same rights and capabilities as the application itself.
For applications hosted in a distributed environment, where there can be many instances of the application running on many different servers (e.g., web forums), the risks associated with such an extension architecture were not significant enough to warrant radical changes. If a few providers of that application were compromised because of malicious or poorly-written extensions, it would not affect the overall experience for all providers and all end-users. However, with the need for extensions for centralized web applications, where there is a single web site for all those accessing it, this model could no longer work without modification. Here, developers began to market their web applications as web application platforms, allowing third-parties to write extensions that would run on top of them. While some require extensions to their web application platform to go through a vetting process [3], other platforms have a more open model [4].

This new model allowed web application platforms to permit the use of third-party extensions while limiting the direct security risks and resource demands of these extensions. Instead, extensions are run remotely, hosted by third-parties, and simply interface with the web application platform’s API to operate. Despite the security advantages for the platform itself, this leads to increased risks to end-user privacy and data confidentiality, as end-user data must be transmitted to (or accessed by) remote third-parties that cannot always be held to the same levels of trust, integrity, or accountability as the original web application provider.
1.3.2 Online Social Networks as a Specific Web Application Platform
Online social networks (hereafter referred to as “social networks”) are a recent and immensely popular phenomenon, and they represent a specific class of web applications. Boyd and Ellison first defined them as ‘social network sites’ in 2007 [5], characterizing social networks by rich user-to-user interaction, typically including the sharing of media (photos, videos) and the incorporation of relationships between users on the site (e.g., friends, or group memberships). According to Boyd and Ellison, such sites are defined to be “web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system.” In particular, a social network typically provides web-space for a user to create their own ‘profile’ (e.g., username, hobbies, photos, etc.) and tools to keep in touch with ‘friends’ (other users) through the social network. While the definitive social networks might be services such as Facebook, MySpace, or Google+, many other web applications incorporate varying levels of social network capabilities. For example, Amazon.com has limited support for user profiles and interaction through reviews and forums, but it may not be what typically comes to mind when someone refers to a social network.
A conservative estimate of the total number of registered users of some of the top social networks that operate in North America would put the number at over 579.4 million accounts, with over 285 million active users [6, 7, 8, 9, 10, 11]. In China alone, the social network and instant-messenger provider Téngxùn QQ [12] boasts over a billion registered users, of which more than 500 million are considered active accounts. Beyond allowing users to keep in touch with current friends through messages, many social networks are used to reconnect with old friends or classmates, maintain professional connections, and keep up with the life events of friends through status updates and planned events. Popular social networks range from those themed for professional use to blogging platforms and anything in between. While “a significant amount of member overlap [exists] in popular social networks” [13], the sheer number of user accounts and active accounts is a good indicator of their prevalence and popularity.
Social networks are an important class of web applications with regard to privacy, as they can have millions of end-users and proportional amounts of sensitive and personal data about those users. Because social networks often also provide capabilities for third-party extensions, they are web application platforms as well. While there are other types of web applications that have both sensitive end-user data and potential risks to that data through third-party extensions, the prevalence of social networking makes social networks an ideal candidate for the application of the Virtual Faraday Cage. Thus, social networks are considered a motivating example of a web application platform that would benefit from a system like the Virtual Faraday Cage.
1.4 Privacy
It is infeasible to provide a comprehensive description of privacy within a reasonable space in this thesis when accounting for historical, political, and legal contexts. Consequently, this section provides only a brief overview of some of these aspects of privacy, with a focus on how privacy affects individuals in the context of technology and software.
Over the past decade, much work has been dedicated to the definition, preservation, and issue of privacy in software systems and on the web. The earliest definitions of privacy, its laws, and its expectations trace back to sources including religious texts. The United Nations’ Universal Declaration of Human Rights [14] states in Article 12 that: “No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.” The UNESCO (United Nations Educational, Scientific, and Cultural Organization) Privacy Chair states [15] that: “Data privacy technologies are about technically enforcing the above right in the information society. [...] Unless direct action is taken, the spread of the information society might result in serious privacy loss, especially in rapidly developing countries. The lack of privacy undermines most of all other fundamental rights (freedom of speech, democracy, etc.).”
As UNESCO points out, privacy helps protect individuals and their rights, shielding them from discrimination or persecution based on information about them that would not, or should not, have been obtained otherwise. While laws can be put in place to punish offenders, it is better to avoid the loss of privacy in the first place, as it is impossible to regain privacy once it is lost.
1.4.1 Defining and Describing Privacy
The Privacy and Security (PSEC) group at the University of Calgary (a part of the Advanced Database Systems and Application (ADSA) Laboratory [2]) provides “A Data Privacy Taxonomy” [1] to frame the technological definition of privacy. The work presented by the group aims to address the definition of privacy and identify its major characteristics in a manner which can be implemented within policies, software, and database systems. The Virtual Faraday Cage abides by this taxonomy.

They identify four distinct actors within the realm of privacy: data providers (end-users), data collectors (the web application platform), the data repository (the web application platform), and data users (third-party extensions). They also identify four distinct ‘dimensions’ of privacy: visibility (who can see the data), granularity (how specific or general it is), retention (how long it can be seen for), and purpose (what it can be used for).

In their paper, they propose five main ‘points’ along the visibility axis within their privacy definition: none, owner (end-user), house (platform), third-party, and all (everyone). Similarly, they propose four main ‘points’ along the granularity axis: none, existential, partial, and specific. They also have an explicitly defined purpose axis, as well as a retention dimension that specifies the duration of the data storage and/or under what conditions it expires.
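As a concrete illustration, the four dimensions and their points could be encoded as simple Python data; this is a sketch only, and the names below are not taken from [1] or from the Virtual Faraday Cage implementation:

    # Sketch of the taxonomy's four dimensions as plain Python data.
    from dataclasses import dataclass

    VISIBILITY = ("none", "owner", "house", "third-party", "all")   # who may see the data
    GRANULARITY = ("none", "existential", "partial", "specific")    # how precisely it is seen

    @dataclass
    class PrivacyTag:
        visibility: str    # one of VISIBILITY
        granularity: str   # one of GRANULARITY
        retention: str     # duration or expiry condition, e.g. "until purpose fulfilled"
        purpose: str       # what the data may be used for

    email_tag = PrivacyTag(visibility="house",
                           granularity="specific",
                           retention="while the account is active",
                           purpose="order-status notifications")
    print(email_tag)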
In privacy, purpose signifies what a particular piece of data is used for. For example, an end-user’s email address could be used as their account name, it could be used to send them notifications, or it could even be shared with other parties looking to contact that user. Within this taxonomy, purpose represents what a particular piece of data is allowed to be used for. By specifying the purposes data can be used for, web application platform providers and third-parties can be held accountable for how they use end-user data. Furthermore, it may also be possible to explicitly enforce certain purposes: for example, if certain data from a patient database is only for use in a particular treatment, a violator could be detected if that treatment was not followed through after the data was accessed. Typically, purpose is represented as a string, although other structures (e.g., hierarchical structures) can be used as well.
Guarda and Zannone [16] define purpose to be the “rationale of the processing [of end-user data], on the basis of which all the actions and treatments have to be performed. [...] The purpose specifies the reason for which data can be collected and processed. Essentially, the purpose establishes the actual boundaries of data processing”.
They define ‘consent’ as “[the] unilateral action producing effects upon receipt that
manifests the data subject’s volition to allow the data controller to process [their] data”
[16]. In the context of the Virtual Faraday Cage, consent means that the end-user has
agreed to release their data to be stored and processed by the web application platform.
Consent also usually implies that the end-user has agreed to do this for a given purpose.
Guarda and Zannone also note that under privacy legislation, consent can be withdrawn,
and systems implementing privacy should account for this.
Obligations are requirements that the receiving party must abide by in order to store or process end-user data. In their paper, Guarda and Zannone define obligations as “[conditions] or [actions] that [are] to be performed before or after a decision is made.” More concretely, if an end-user has consented to an online storefront web application platform storing their email address for the purpose of notifying them of changes in their orders or updates on order delivery, then there is an obligation for that web application platform to do so. Guarda and Zannone’s work suggests that obligations are difficult to define specifically, as they may be described in both quantitative and qualitative terms: an obligation could be based on time, money, or an order of operations.
Retention, as also noted in Barker et al.’s taxonomy [1], refers to how long (or under what conditions) end-user data may be stored. According to Guarda and Zannone, retention is explicitly time-based [16]; however, it could also manifest in a way more similar to an obligation: after the purpose for using a particular data item has been satisfied, that data could be removed.
The Virtual Faraday Cage borrows the notions of granularity and visibility from the PSEC taxonomy [1]. Further, it addresses all the points along the granularity axis by using projections and transforms, and by using the properties of the Virtual Faraday Cage API, which explicitly prohibits testing for the existence of data as a way of revealing information. Visibility is addressed within the Virtual Faraday Cage, although no distinction is made between ‘third-party’ and ‘all/world’ due to security and collaborative-adversary concerns. Limited support for the purpose axis exists within the Virtual Faraday Cage; however, it is considered unenforceable and exists purely to help inform the end-user. Retention is omitted and considered unenforceable because the Virtual Faraday Cage cannot police a third-party to ensure that all copies of end-user data are deleted when they are promised to be, although storing retention requirements could easily be done within the Virtual Faraday Cage. Both purpose and retention are considered unenforceable by the Virtual Faraday Cage because of the assumption that a third-party cannot be trusted: there is no software or algorithmic method of simultaneously providing a third-party machine with end-user data, ensuring that the data is only used for the specific purposes claimed, and ensuring that the data is properly erased after the retention period ends.
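To illustrate how projections and transforms cover the granularity axis, the sketch below coarsens the birthdate ⟨“December”, 14, 1974⟩ used in Figure 3.2; the function names are hypothetical, and Chapter 4 shows the actual interactive examples:

    # Sketch: reducing granularity before release, from specific to existential.
    birthdate = ("December", 14, 1974)            # specific

    def project_month_year(date):                 # partial: drop the day
        month, _day, year = date
        return (month, year)

    def transform_decade(date):                   # partial: coarsen the year to a decade
        _month, _day, year = date
        return (year // 10) * 10

    def existential(date):                        # existential: reveal only that a value exists
        return date is not None

    print(project_month_year(birthdate))   # ('December', 1974)
    print(transform_decade(birthdate))     # 1970
    print(existential(birthdate))          # True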
1.4.2 Laws, Business, and the Value of Privacy
The value of protecting privacy has long been recognized by the privacy and security community; however, while the value of using private data to further business interests is obvious, the business community has evidently not regarded protecting privacy as equally valuable. One of the first papers to make a direct and articulated argument for the protection of privacy within software from a monetary perspective was by Kenny and Borking [17], who articulate the value of privacy engineering at the Dutch Data Protection Agency. In their paper, the authors begin by arguing why privacy makes sense as a bottom-line investment in software products. Among those arguments are customer trust, as well as legal and compliance requirements, both those that currently exist and those that are likely to develop in the future.
Kiyavitskaya et al. [18] highlighted the importance of legal compliance requirements, as well as the difficulties associated with developing systems and applications that are compliant. Taken in the wider context of designing privacy-aware systems, their work suggests that flexible privacy-aware systems are desirable, in that they could be adapted to fit current as well as potential future privacy and legal requirements. Thus, developing such systems would also reduce the amount of work needed when future privacy legislation is enacted.
In the current environment of cloud computing, the growing significance of various national and international privacy laws is relevant to businesses seeking to grow or connect with customers internationally [19]. Fundamentally, it is easier and more cost-effective to build privacy and security into software and systems during their development than to attempt to adjust or change things afterward. By using privacy engineering in the design and development process, a corporation can avoid potential future costs and expand its legal capability to operate internationally.
Privacy violations can both destroy customer trust and result in significant legal damages and fees, making proactive privacy engineering even more attractive. In 2011, the first financial penalty for a violation of the Health Insurance Portability and Accountability Act (HIPAA) was imposed on a Maryland healthcare provider, amounting to $4.3 million in damages [20]. This further illustrates the importance of privacy law compliance and of a comprehensive understanding of privacy when building a system in the first place.
In Canada, the Personal Information Protection and Electronic Documents Act (PIPEDA)
[21] was enacted into law in 2000. PIPEDA’s purpose was to “establish [...] rules to govern the collection, use and disclosure of personal information in a manner that recognizes
the right of privacy of individuals with respect to their personal information and the need
of organizations to collect, use or disclose personal information for purposes that a reasonable person would consider appropriate in the circumstances”. PIPEDA covers both
medical and health-related information, as well as personal information. Health-related
information about an individual (whether living or deceased) is defined as:
“(a) information concerning the physical or mental health of the individual;
(b) information concerning any health service provided to the individual;
(c) information concerning the donation by the individual of any body part
or any bodily substance of the individual or information derived from the
testing or examination of a body part or bodily substance of the individual;
(d) information that is collected in the course of providing health services to
the individual; or
(e) information that is collected incidentally to the provision of health services
to the individual.”
Personal information is defined as “information about an identifiable individual”,
though this specifically excludes a person’s name, title, address, and phone number at a
business.
PIPEDA establishes that organizations must have valid and reasonable, situationally-dependent purposes for collecting personal information, and that the collection of personal information must be done with an individual’s consent. The only exceptions to this
would be situations where the collection is “clearly in the interests of the individual and
consent cannot be obtained in a timely way”, situations where the collection is “solely
for journalistic, artistic, or literary purposes”, situations where the information is public,
or situations where the collection is otherwise mandated by law.
PIPEDA also establishes limited situations when an organization can utilize personal
information without the consent of the individual – in particular, during emergencies
that threaten the individual in question, or if the organization has reasonable grounds to
believe that the individual may be breaking domestic or foreign laws (and the information
is used for the purpose of investigation), or if the information is used for statistical or
scholarly purposes where anonymity is ensured and the Privacy Commissioner is notified
before the information is used. Additionally, organizations can utilize personal information without explicit consent where this information is publicly available, or if the
information was collected as required by law or is clearly in the interests of the individual
and consent cannot be obtained in a timely fashion. Otherwise, PIPEDA compliance
requires that individuals give consent to the use of their personal information.
PIPEDA mandates that, with few exceptions (e.g., information made to a notary in
the province of Quebec), an organization may only disclose personal information without
the consent of the individual if: 1) such disclosure is required by law, 2) if the disclosure is
made to a government entity because the organization has reasonable grounds to suspect
that the individual in question may be in breach of laws or that the information may
relate to national security, 3) if the disclosure is made to protect the individual in the
event of an emergency, 4) if the disclosure is made for scholarly or statistical studies
that would otherwise be unachievable without the disclosure, 5) if the disclosure is made
to a historical or archival organization and the purpose of the disclosure was for the
preservation of the information, 6) if the disclosure is made after the earlier of 100 years after the record was created or 20 years after the individual’s death, or 7) if the information was publicly available.
PIPEDA compliance also requires that organizations make an individual’s personal information records available to them, so long as the individual’s health or safety would be threatened by withholding the information, the information is severable from third-party information, or the third-party consents to such disclosure. Individuals must make these requests in writing, and PIPEDA requires that organizations follow through with these requests and provide assistance (if necessary) in a timely fashion, either making these records available to the individual or notifying them of the need for a time extension within thirty days of the date of the request. Additionally, organizations must be able to provide alternative formats of personal information records, e.g., to those with sensory disabilities. Finally, an individual also has the right to request that organizations inform them about any disclosures of their information to government and law enforcement entities (e.g., under a subpoena, request for information, etc.), but the organization must notify the government entities immediately and in writing before any response is made to the individual in question, and the organization must obtain authorization from the government entity first.
PIPEDA also grants exceptions to the requirement of granting any party access to an individual’s information in situations where “the information is protected by solicitor-client privilege, or where doing so would reveal confidential commercial information, or
where doing so could reasonably be expected to threaten the life or security of another
individual, or where the information was generated in the course of a formal dispute
resolution process, or where the information was created for the purpose of making or
investigating a disclosure under the Public Servants Disclosure Protection Act.”
Should an individual feel that an organization is violating or ignoring a recommended course of action defined in one or more parts of PIPEDA, that individual can file a written complaint with the Privacy Commissioner. A complaint that results from the organization refusing to grant an individual access to their personal information must be filed within six months after the refusal or after the time-limit for responding to the request. Afterward, if the Commissioner believes there are reasonable grounds to investigate an organization, the Commissioner may escalate the complaint and begin an investigation. Alternatively, assuming that the complaint was filed in a timely manner, if the Commissioner believes that the individual should first exhaust other reasonable procedures, or that the complaint should be dealt with under more suitable laws, then the Commissioner will not conduct an investigation.
In the course of an investigation, PIPEDA allows the Privacy Commissioner to audit the personal information management policies of organizations attempting to comply with the law. This may be done by the Commissioner directly or, under certain circumstances, by officers and employees to whom the auditing is delegated. The Commissioner can also summon and enforce the appearance of organization representatives and compel these representatives to give and produce evidence and testimony under oath on matters that the Commissioner considers necessary to investigate the complaint. The Commissioner can also, at any reasonable time, enter any commercial premises of the organization under investigation, converse in private with any person on the premises, and examine or obtain copies of records and evidence found on the premises in relation to the ongoing investigation.
Should the Privacy Commissioner determine, during the investigation, that there is insufficient evidence, or that the complaints are trivial, frivolous, vexatious, or made in bad faith, the Commissioner may choose to suspend the investigation. Similarly, should the organization in question provide a fair and reasonable response to the complaint, or if the matter is already under investigation or has already been reported on, the current investigation may be suspended.
Following an investigation, complaints may be escalated to dispute resolution mechanisms, and the Commissioner will issue a public report within a year after the complaint is filed. This report will include the Commissioner’s findings and recommendations, any settlements that were reached by the parties, and, where appropriate, a request that the organization in question notify the Commissioner (within a specified time frame) of any actions that it will take to address the Commissioner’s findings and recommendations, or a rationale as to why it will not. Following the report’s publication, the individual may apply for a court hearing regarding any matter about which the complaint was made. Additionally, the Commissioner may, with the individual’s consent, appear on behalf of the individual in a court hearing. The court may then order the organization to alter its practices to comply with the Privacy Commissioner’s recommendations, publish a notice of any action(s) taken or to be taken, and award damages to the individual, though the court is not limited to these actions.
While PIPEDA violations do not inherently carry administrative monetary penalties directed at the organization in question, there are individual fines of up to $100,000 CAD for any individual who obstructs the office of the Privacy Commissioner in the course of an investigation and audit. Additionally, public scrutiny, potential disruptions to business practices (as investigations may be carried out on premises), and potential criticism and a poor public image are all negative effects that many businesses will benefit from avoiding. As PIPEDA also makes assurances to whistle-blowers and employees of the organization in question, relying on the secrecy of business practices is a poor way to avert a potential PIPEDA investigation. Furthermore, future changes to the law may bring about monetary penalties for organizations that violate PIPEDA or equivalent laws, so proactive compliance may be an economical option.
PIPEDA places health and safety as paramount, above all other compliance requirements, with the potential exception of an individual’s request for information. PIPEDA also makes provisions for the use of electronic written documents rather than necessitating physical documents, and it establishes standards and guidelines for doing so. It also establishes that an organization must be responsible for personal information under its control, and that it should designate one or more individuals who are responsible for that organization’s compliance with PIPEDA’s principles, the core of which are: 1) Accountability, 2) Identifying Purposes, 3) Consent, 4) Limiting Collection, 5) Limiting Use, Disclosure, and Retention, 6) Accuracy, 7) Safeguards, 8) Openness, 9) Individual Access, and 10) Challenging Compliance.
Recently, the Office of the Privacy Commissioner of Canada published a report [22] regarding an ongoing complaint [23], filed initially in 2008 by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) under PIPEDA, against Facebook. Within the findings, the report stated:

“The complaint against Facebook by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) comprised 24 allegations ranging over 11 distinct subjects. These included default privacy settings, collection and use of users’ personal information for advertising purposes, disclosure of users’ personal information to third-party application developers, and collection and use of non-users’ personal information.

[...]

The central issue in CIPPIC’s allegations was knowledge and consent. Our Office focused its investigation on whether Facebook was providing a sufficient knowledge basis for meaningful consent by documenting purposes for collecting, using, or disclosing personal information and bringing such purposes to individuals’ attention in a reasonably direct and transparent way. Retention of personal information was an issue that surfaced specifically in the allegations relating to account deactivation and deletion and non-users’ personal information. Security safeguards figured prominently in the allegations about third-party applications and Facebook Mobile.

[...]

On four subjects (e.g., deception and misrepresentation, Facebook Mobile), the Assistant Commissioner found no evidence of any contravention of the Personal Information Protection and Electronic Documents Act (the Act) and concluded that the allegations were not well-founded. On another four subjects (e.g., default privacy settings, advertising), the Assistant Commissioner found Facebook to be in contravention of the Act, but concluded that the allegations were well-founded and resolved on the basis of corrective measures proposed by Facebook in response to her recommendations.

On the remaining subjects of third-party applications, account deactivation and deletion, accounts of deceased users, and non-users’ personal information, the Assistant Commissioner likewise found Facebook to be in contravention of the Act and concluded that the allegations were well-founded. In these four cases, there remain unresolved issues where Facebook has not yet agreed to adopt her recommendations. Most notably, regarding third-party applications, the Assistant Commissioner determined that Facebook did not have adequate safeguards in place to prevent unauthorized access by application developers to users’ personal information, and furthermore was not doing enough to ensure that meaningful consent was obtained from individuals for the disclosure of their personal information to application developers.” ([22], p. 3)
Since then, Facebook has made changes to its privacy policies and practices, including better default privacy settings and a “privacy tour” feature. Despite these changes, CIPPIC filed another complaint [24] in 2010, expressing dissatisfaction with Facebook’s response and indicating that many of its core concerns were not addressed by these changes, including the lack of support for fine-grained, granular control over end-user data when shared with third-parties.
PIPEDA is only one example of privacy legislation: Europe’s Data Protection Directive [25], the Directive on Privacy and Electronic Communications [26], and the proposed General Data Protection Regulation [27], along with the United Kingdom’s Data Protection Act [28], all exist as examples of important privacy regulation. While the United States has the Privacy Act of 1974 (and its amendment, the Computer Matching and Privacy Protection Act of 1988) [29], as well as the Electronic Communications Privacy Act of 1986 [30], both laws can be criticized for being “out-of-date” and lax in comparison to existing and new laws in Europe and Canada. Despite this, future US laws may change this situation, and organizations and companies seeking to do business with Canada and Europe will need to earn their trust by providing a higher standard of end-user privacy. Consequently, there is an existing, and growing, body of legal directives and regulations that should motivate parties to ensure that privacy protection is built into their systems and practices.
Building privacy into applications and systems is advantageous from a sales perspective as well. A product with built-in privacy can be more attractive to end-users or customers, and could potentially be applicable to more business use-cases than the same product without privacy features. Additionally, a corporation that establishes a brand that consistently uses privacy engineering in its products and cares about customer (or end-user) privacy becomes a ‘trusted brand’. A trusted brand gains additional customer loyalty and the ability to charge more for its products, and it is more capable of protecting itself against gaining a negative reputation. All of this ultimately means that protecting privacy is good for the bottom line.
Kenny and Borking also cover more aspects of privacy and business in their paper [17], discussing privacy-related legislation around the world before describing a framework for software engineering (for software agents) that embeds privacy risk management into the actual development process.
Guarda and Zannone [16] also address privacy laws and regulations, and how they are affecting (and should affect) software. Their paper was briefly discussed earlier in this chapter with regard to defining and describing privacy. First and foremost, their work is an in-depth examination of the legal landscape in Europe and the United States with regard to privacy legislation. They also define what privacy-aware systems should incorporate, for example purpose, consent, and obligations. In their paper, Guarda and Zannone mirror Kenny and Borking [17], pointing out some of the benefits of abiding by, and the penalties of breaking, privacy regulations. The remainder of Guarda and Zannone’s work describes privacy engineering, privacy policies, privacy-aware access control, and other aspects of privacy technology and legislation as they relate to software and systems design and development [16]. While the Virtual Faraday Cage was not designed explicitly to fulfill privacy legislation requirements, using it may assist web application platform providers in conforming to such legislation because of the ability to fine-tune privacy settings and restrict a third-party’s access to end-user information.
Gellman’s [19] work on privacy and confidentiality considers risks in the “clouds”. He defines cloud computing as the “sharing or storage by [end-]users of their own information on remote servers owned or operated by others.” He goes on to define a ‘cloud service provider’, which “may be an individual, a corporation or other business, a non-profit organization, a government agency or any other entity. A cloud service provider is one type of third-party that maintains information about, or on behalf of, another entity”.
Gellman’s report primarily considers a cloud provider’s terms of service and privacy
policy as well as applicable laws (primarily in the United States). From his point of view,
all parties are considered to be honest, that is to say, they abide by their own terms of
service and privacy policies. Even with this assumption, many risks to end-users still
exist; for example, legal protections for end-users may cease to apply when they utilize cloud services – alternatively, an end-user may also inadvertently break the law by using a cloud provider.
The Virtual Faraday Cage assumes that the use of a given web application platform is
legal and that the platform is well-behaved. While Gellman defines all cloud providers as third-parties, the Virtual Faraday Cage does not necessarily consider all cloud providers to be third-parties in all situations. Additionally, for the purposes of this thesis, third-parties are not necessarily considered honest, and the threats to privacy are considered only from third-parties providing extensions to the web application platform. Gellman’s work reinforces the viewpoint that the less a third-party needs to know to perform its functions, the better.
1.5 Social Networks
As stated earlier in this chapter, social networks are a recent and popular phenomenon
with millions of users. Social networks also contain meaningful and accurate data regarding their end-users, and there are many motivated parties that are or would be extremely
interested in acquiring this data. Unfortunately, social networks also have inherent risks
and dangers – both from within and outside the social network. Because of the immense popularity, value, and dangers of social networks, they are excellent and motivating examples of web application platforms where the stakes are high, both for third-parties and for the personal and private information of millions of end-users. This is why the Virtual Faraday Cage
utilizes a social network web application as its implementation proof-of-concept. This
section examines the evidence and arguments for these claims.
1.5.1 The Value of Social Network Data
Social networks are full of meaningful and accurate data [31]. Google+, Facebook, MySpace, LinkedIn, Bebo, LiveJournal, etc. all support profile pictures, if not personal photo
albums - where photos can usually be “tagged” to specify who else is in a picture and
what they look like. MySpace profiles can contain such potentially sensitive information
as sexual orientation, body type and height, while Facebook profiles will typically contain at least a user’s real name. Google+, Facebook, MySpace, and LinkedIn profiles can
contain information about past and ongoing education and work experience - as well as
location information. Because the very purpose of a social network is so that users can
keep up with their friends and also keep their friends up to date about themselves, the
data kept on them can be assumed to be current. Consequently, this makes the potential
incentives for obtaining end-user data very high, and the potential consequences of data
leakage very severe.
Many social network providers make no secret that they are generating revenue from
advertising through their systems. Both Facebook [32] and MySpace [33] allow advertisers
to target their audience by attributes such as geographic location, age, gender, education,
relationship status, and so on. Targeted advertising is not a new phenomenon and has
existed for decades. It is also one of the more effective marketing measures available. In
the past, individuals (or corporations) known as “list brokers” would produce a list of
people of interest for a price so that a salesperson or company would be able to market a
product or service to a more interested audience. While such a list could be something as ‘innocuous’ as the people living within a certain area, the lists could include
such things as individuals suffering from a certain disease (e.g., asthma). Considering the
richness of data and the relative ease of availability, one must presume that advertisers
are very interested in using social networks for targeted ads.
The first paper to identify social networks as an extremely effective venue for marketing was by Domingos and Richardson [34] in 2001. They point out that the knowledge
of the underlying social network graph is of great value and usefulness in marketing. By
utilizing knowledge of the underlying network structure, marketers could become more
effective – for example, spending more money on ‘well-connected’ individuals and less
on others. They postulated the theory that well-connected customers have a stronger
influence on their peers, which in turn leads to higher sales. In essence, these authors formally stated the common-sense notion that if a celebrity endorses a particular product, others will buy it too. Domingos and Richardson essentially set up the premise that
social networks, and their underlying graph structure in particular, are valuable.
In addition to targeted advertising, viral marketing (or “network marketing”) may
also be of strong interest to advertisers [34, 35], since this allows them to reach a wide
audience while targeting only a small number of individuals. If advertisers were able to
send targeted ads to ‘influential’ consumers (e.g., those with a large number of friends),
they might be able to reach a wider audience for a much lower cost when those influential
consumers re-share or post the advertisement. As such, the structure of a social network
would likely be of significant interest to advertisers as well. Malicious parties are also
interested in social networks: evident through the number of worms, phishing scams,
and account hijackings that have taken place on sites such as Facebook and MySpace
[36, 37, 38, 39, 40]. Additionally, social predators have also been shown to be interested in social networks and have used them in the past for their own purposes, such as stalking [41, 42, 43, 44, 45]. Jagatic et al. [46] reaffirm this by demonstrating a very
effective attack on end-user authentication credentials by using social network topology
information. ‘Phishing’[47] is a type of security attack that manipulates an end-user into
believing that the attacker is actually someone else (e.g., a bank or email provider), and
lures the end-user into ‘logging-in’ with their account credentials. In this paper, Jagatic
et al. demonstrated that the use of a social network’s topology greatly increased the
effectiveness of phishing scams. In their experiment, the authors ran two tests – one
with random individuals contacting others to lure them to a fake website, and another
where the individuals were asked to go to that website by people they knew. The authors
found that ‘social phishing’ increased the effectiveness of phishing by up to 350% over a
traditional phishing scam.
Insurance companies also would likely find social network data to be very useful for
their current and future needs. Individuals who have photos of rowdy drunkenness, update their social network status via mobile device while claiming to be driving, or are a
member of an “alternative lifestyle” may be of interest to these companies when deciding on what rates to set for health, auto, and other insurance policies. In particular, if insurance companies become as prevalent as suggested by the Discovery Channel’s documentary “2057” [48], then they may very well want to know everything about you,
something that a social network can help provide. For example, one woman has apparently lost her insurance benefits as a result of photos posted on Facebook [49]. The
founder and editor-in-chief of Psych Central [50] wrote a 2011 article cautioning against
anyone discussing their health or well-being on social networks because such information
could be used against them, possibly wrongly, to deny them their health or insurance
benefits. In a Los Angeles Times [51] article, a senior analyst for an insurance consulting
firm was interviewed:
“Mike Fitzgerald, a Celent senior analyst, said life insurance companies could
find social media especially valuable for comparing what people will admit
about lifestyle choices and medical histories in applications, and what they
reveal online. That could range from ‘liking’ a cancer support group online
to signs of high-risk behavior. ‘If someone claims they don’t go sky diving
often, but it clearly indicates on their online profile that they do it every
weekend they can get away,’ Fitzgerald said, ‘that would raise a red flag for
insurers.’ Social media is ‘part of a new and emerging risk to the insurance
sector’ that could affect pricing and rating of policies in the future, said Gary
Pickering, sales and marketing director for British insurer Legal & General
Group. But many insurance lawyers decry such practices and warn of a future
when insurance companies could monitor online profiles for reasons to raise
premiums or deny claims.”
Policy enforcers, such as school officials, an employer, or law enforcement, are also
interested in social network usage and data. Photographs showing individuals doing
illicit activities (speeding [52] or drugs [53], for example) could be used as evidence to
aid in prosecution by law enforcement - photographs of a party involving alcohol on
a dry campus could be used as evidence for disciplinary action by university officials
[54, 55, 56, 57] - and similarly, leaks regarding confidential work-related topics could be
used for disciplinary action by one’s employers.
Gross and Acquisti’s [58] work examines the behavior of people within social networks. Specifically, the authors examined the profiles of 4,000 Carnegie-Mellon students
to determine the types of information they disclosed to Facebook and their usage of privacy settings. More than 50% of the profiles revealed personal information in almost all
categories examined by their study (profile image, birthday, home town, address, etc.).
Furthermore, when examining the validity of profile names and the identifiability of profile
images, they found that 89% of the profiles examined appeared to have real names, and
55% appeared to have identifiable images. Since then, public backlashes have followed
Google’s [59, 60] and Facebook’s [61, 62, 63] respective decisions to publicize data that
users had expected to be private, providing evidence that privacy is becoming a more
important issue for the general public. However, despite this evidence that the general
population’s awareness of privacy issues has increased, the study serves to reinforce the
idea that social networks continue to be a rich source of accurate and detailed information
on the majority of their end-users.
Gross and Acquisti also published a second paper [31] that conducted a survey of
students at their university regarding the students’ use of Facebook and their attitudes
towards privacy. The authors then compared the survey results with data collected from
the social network before and after the survey. In the survey, questions such as “How
important do you consider the following issues in the public debate?” and “How do
you personally value the importance of the following issues for your own life on a day-to-day basis?” were asked, with respondents filling in their responses to subjects such
as “Privacy policy” on a 7-point scale. Privacy policies ranked 5.411 and 5.09 on the
scales for public debate and personal life, respectively, which was ranked higher than
‘terrorism’ in both cases. Note that in all categories, subjects ranked the importance of
public debate of privacy higher than personal importance on average.
It should also be noted that over 45% of those surveyed gave the highest level of worry to “A stranger knew where you live and the location and schedule of the classes you take”, and 36% did so for “Five years from now, complete strangers would be able to
find out easily your sexual orientation, the name of your current partner, and your current
political views”. For both scenarios, the average ratings were above 5 (5.78 and 5.55,
respectively). For non-members of Facebook, the survey revealed that the importance of
‘Privacy policy’ was higher: 5.67 for non-members versus 5.3 for members. After using
statistical regression on their data and accounting for age, gender, and academic status
(undergrad, grad, staff, faculty), the authors discovered that an individual’s privacy
attitude can be a factor in determining membership status on Facebook. Specifically,
this factor is statistically significant within non-undergraduate respondents, but does
not appear to be statistically significant among undergraduates – even those with high
privacy concerns.
Furthermore, their study found that the information provided to Facebook, when provided, was overwhelmingly accurate for every category of personal information (birthdays, home number, address, etc.): over 87% of the survey takers who provided such information stated that it was accurate and complete. The
authors also state that, “... the majority of [Facebook] members seem to be aware of
the true visibility of their profile - but a significant minority is vastly underestimating
the reach and openness of their own profile.” The authors also discovered that “33%
of [the] respondents believe that it is either impossible or quite difficult for individuals
not affiliated with an university to access [the Facebook] network of that university...
But considering the number of attacks ... or any recent media report on the usage of
[Facebook] by police, employers, and parents, it seems in fact that for a significant fraction of users the [Facebook] is only an imagined community.” Finally, examining profiles
post-survey revealed no statistically significant change between the control group and
the experimental group, suggesting that the survey had little effect on an individual
respondent’s privacy practices on Facebook.
Acquisti and Gross’ studies, along with that of Jagatic et al., and Domingos and
Richardson, all serve to reinforce the motivations of third-parties to use social network
information, as well as the importance of keeping such information on end-users private.
The use of private information by third-parties can reap immense benefits – both for
‘legitimate’ businesses as well as those purely malicious. On the other hand, end-users
do value privacy immensely, even if in practice many may choose a less privacy-conscious
service in exchange for certain features unavailable otherwise. Additionally, services that
incorporate greater control of end-user privacy may benefit from increased adoption (or
less rejection) by more privacy-conscious end-users, a point that was also argued in the
previous section.
1.5.2 Innate Risks, Threats, and Concerns
Risks and threats to end-users of social networks range in scope and severity, as well as in
the vectors through which they can affect end-users. A large body of work [58, 31, 46,
64, 65, 66] has been published discussing the various risks and dangers associated with
social networks. While some threats are innate to the nature of online social networks,
or risks associated with security breaches of such sites – other threats come from other
users or third-party extensions that end-users utilize. While social networks such as
Facebook may require these third-parties to provide privacy policies [67], there is no
way to ensure compliance or to verify to what standard private data is protected. Publicized
instances of risks that end-users have faced from the revelation of their data to other
parties range from job loss to criminal prosecution, and in more extreme cases, stalking
and death. Additionally, other reports have indicated that end-users could be denied a
job application or a school admission on the basis of information obtained from social
networks. Some of the end-user data can be obtained from publicly viewable pages for the
members of many social networks, which in many cases may contain more information
than their members would prefer to have listed [66]. Some private data may also be
accessible through alternative channels [66] or be leaked through friend (or friends-of-friends) connections, or by individuals who have had their accounts hijacked [38, 39, 40].
Data can also be leaked through third-party extensions [68, 69, 65], and potentially even
the social network itself [70]. In their study of a sample of Carnegie-Mellon University students on Facebook, Gross and Acquisti showed that 7% of the sampled women and 14.5% of the sampled men were vulnerable to real-world stalking,
while 77.7% of the students were vulnerable to online stalking. Other threats that have
been alluded to in previous research work [58] include the potential for social networks
to be used as aids for identity theft. Numerous stories have hit the news of banks and employees losing laptops with unencrypted ‘data’ – given the richness of the data on social networks, such data could easily aid in performing identity theft: social networks may include such data as a date of birth and hint at a person’s birth city, parents’ names,
or other information. While the number of publicized incidents remains in the extreme
minority when compared to the massive number of users, their severity should motivate
further safeguarding of end-user data. These risks and concerns help illustrate why
social networks are an excellent and motivating example of web application platforms
with third-party extensions that need better privacy protection for end-users.
Rosenblum [64] presents an overview of the various threats to end-users of social networking sites. Rosenblum calls social networking sites ‘digital popularity
contests’ where “success [is] measured in the number of unsolicited hits to one’s page”,
but contends that social networks such as Facebook are “much truer to the model of
[real-life] social networking”. In one social networking site, MySpace, a user reportedly
stated to the New York Times that she would accept any and all friend requests that she
receives. Rosenblum notes that ‘stealth advertisers’ are also utilizing social networks as a
form of advertising, creating profiles for self-promotion and connecting with end-users to
further that goal. In fact, and more disturbingly, MySpace users have utilized automated
friend-requesting scripts to try and maximize their number of friends [71].
Rosenblum also highlights that end-users of such social networks often have a presumption of privacy that does not really exist. Consequently, what they say or post on
their profiles (images of underage drinking or drug usage are ‘commonplace’) can drastically affect their future employment or even academic careers. Furthermore, this is not
limited to just illegal behavior or activities, but extends to other aspects of an individual’s presumed private life – for example, making unflattering comments about one’s employer or customers [72], or having a personal profile with ‘unacceptable content’ while employed
as a teacher [73]. There are numerous other examples ranging from likely poor judgment
on the part of the end-user [74] to seemingly overly-harsh responses by employers [75].
Two security and privacy risks are identified by Rosenblum: ‘net speech and broad
dissemination’ and the ‘unauthorized usage [of end-user data] by third-parties’. He notes
that unlike real-world situations (e.g., with friends at a bar), speech on a social network is
stored, harvested, and analyzed – and is more akin to “taking a megaphone into Madison
Square Garden each time [you] typed in a message.” Rosenblum continues, stating that:
“[Users] are posturing, role playing, being ironic, test-driving their new-found
cynicism in instantaneously transmitted typed communications or instant
messages. And all this on a medium that does not record irony [...]. The
valence of language that allows tone to control meaning is lost. [...] [And] as
the media has learned with sound bites, limiting the context of an utterance
can radically distort its meaning. [...] What these social networks encourage
is a culture of ambiguous signifiers which the reader is left to interpret. If a
reader happens to be a hiring officer, this can have disastrous results.”
Because the records of our online interactions are often permanent, the safeguarding of such data is all the more important.
Rosenblum also notes that corporations routinely utilize search engines to do background checks on prospective employees and often review online social networks to see
what these users post online. Beyond employers, Rosenblum also acknowledges the threat
of marketing firms that seek to gain access to social networking sites and their data. An
online social network purchased by another company (e.g., MySpace purchased by News
Corp.), or a telecommunications carrier that alters its privacy policy to state that it owns
the digital content of email traffic could result in disastrous ramifications for end-user
privacy. As Rosenblum states, “[News Corp.] could claim ownership of and exploit the
content of MySpace, either using personal information in any way it saw fit or selling the
right to use it to others.” Worse yet, there have been many documented cases of sexual predators (as Rosenblum states) or even murders committed where the victims were
picked, discovered, or otherwise stalked with the aid of social networking sites [41, 45].
Wu et al. [76] analyzed the privacy policies of several social networks (Facebook,
LinkedIn, MySpace, Orkut, Twitter, and YouTube) and linked them back to the PSEC Privacy Taxonomy [1] developed earlier by Barker et al. The authors also extended the category of visibility to ‘friends’ and ‘friends-of-friends’, differentiating between third-parties outside of the social network and other users inside the social network. The
authors also separated end-user data into different categories: ‘registration’ (personally
identifiable and unique across the entire social network), ‘networking’ (friends or contacts), ‘content’ (end-user data, profile information, etc.), and ‘activity’ (web server logs,
cookies, and other information).
All of the social networks analyzed would reserve the right to use the collected data
for any purpose. Visibility for registration data was confined to the social network itself,
however network and content data was visible at least to ‘friends’ and at most to the
entire world (or anyone registered on the social network). Activity data, on the other
hand, was visible both to the social networks as well as, in four out of the six examined
social networks, third-parties. The granularity for all data categories was almost always
specific, the exceptions being LinkedIn and Twitter, which would use aggregate activity
data. Finally, retention is not mentioned in any of the examined privacy policies and can
be assumed to be indefinite, with the exception of legal compliance issues (e.g., an end-user
under the age of 13).
Bonneau et al. [66] examined how private or sensitive data could be obtained from an
online social network without end-users’ knowledge. This paper primarily examines how
data can be obtained from social networking sites such as Facebook, and how such data
leaks out in the first place. The authors demonstrate how such data can be obtained:
through public listings, creation of false profiles (spies) on the network, profile compromise and phishing, and malicious extensions to the web application platform. Additionally, limitations of the Facebook Query Language (or more generally: web application
provider APIs) can also leak sensitive information. In particular, they demonstrated that having the Facebook Query Language return the number of records matching a query is itself a form of information leakage.
Proofpoint Inc. highlighted concerns of businesses and corporations with regards to
loss of private or sensitive information through email and other means, including social
networks [77]. Their report was based on a survey they conducted across 220 US-based corporations, each with over 1000 employees. The report showed that over 38% of them employed people to monitor outbound email content, and over 32% had employees whose
exclusive duty was to monitor outbound email content. Nearly half of the corporations
surveyed with over 20,000 employees had people who monitored outbound email content.
While these numbers are for outbound email, the survey also discovered that between
40-46% of the firms were concerned with blogs, social networks, and similar activities.
They found that over 34% of corporations said that their business was affected by information released through some means (email, social networking, etc.), and that over
45% of businesses had concerns over the potential leakage of information through social
networking sites in particular. At least two specialized companies, Reputation.com and Zululex [78, 79], operate on social networks in efforts to combat ‘bad PR’ for their clients – no doubt such businesses
are interested in data stored on social networks as well. While [confidential] data loss
prevention is outside the scope of this thesis, this report further highlights the very real
necessity of improving end-user privacy to prevent the inadvertent leakage of sensitive
information.
1.6 Security
While privacy and security are interdependent fields, this section will consider aspects of
security unrelated to privacy. Because of the vast nature of the field, this section will
only provide a limited overview of selected topics in security as they pertain to the Virtual
Faraday Cage. The Virtual Faraday Cage makes use of topics in security such as access
control, information flow control, and sandboxing so this section covers the necessary
background.
1.6.1 Access Control and Information Flow Control
Access control is an aspect of computer and information security that deals with models
and systems for controlling access to resources. Information flow control is an area of
information theoretical research that is concerned with the leaking or transmission of
information from one subject to another within a system. Denning [80] first defined it
as: “‘Secure information flow,’ or simply ‘security,’ means here that no unauthorized flow of information is possible.”
The “principle of least privilege” is a well known and established guiding principle in
the design of secure systems. First proposed by Saltzer [81] in 1974, it is described as a
rule where, “Every program and every privileged user of the system should operate using
the least amount of privilege necessary to complete the job.” The Principle of Least
Privilege ensures that the impact of mistakes or the misuse of privileges is confined.
The Virtual Faraday Cage conforms to these principles and avoids the pitfalls of granting
third-party extensions access to overly-powerful APIs. Instead, third-party extensions are
only capable of being granted access to an intentionally simple API, one with fine-grained
access controls that facilitate conforming with the principle of least privilege.
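To make this concrete, consider the following minimal sketch, written in Python with entirely hypothetical names, of a deliberately small extension-facing API that grants access per extension and per field and denies everything else by default. It illustrates the least-privilege idea only; it is not the Virtual Faraday Cage's actual interface.

class PermissionDenied(Exception):
    pass

class ExtensionAPI:
    """Intentionally small extension-facing API with per-field grants."""

    def __init__(self, grants):
        # grants maps (extension_id, field) -> set of allowed operations
        self.grants = grants

    def read_field(self, extension_id, user_profile, field):
        allowed = self.grants.get((extension_id, field), set())
        if "read" not in allowed:
            # Deny by default: the extension was never granted this field.
            raise PermissionDenied(f"{extension_id} may not read '{field}'")
        return user_profile.get(field)

# The hypothetical extension was granted the first name and nothing else.
api = ExtensionAPI({("ext.birthday-card", "first_name"): {"read"}})
profile = {"first_name": "Alice", "email": "alice@example.org"}
print(api.read_field("ext.birthday-card", profile, "first_name"))  # Alice
try:
    api.read_field("ext.birthday-card", profile, "email")
except PermissionDenied as err:
    print(err)  # denied: 'email' was never granted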
One of the most famous access control models that implements a rudimentary form
of information flow control was the Bell-LaPadula model [82]. The original model was
intended for use in government systems where classification and declassification of information, and the access to such information needed to be regulated. In their model,
there existed four levels of classification: unclassified, confidential, secret, and top-secret.
Subjects at a given classification level could not ‘write down’ – that is, write to data
that was at a lower classification – nor could they ‘read up’ – that is, read data from a
higher classification. Their model also had support for labels in the form of categories
– for example a security level X could represent ‘top-secret: NASA, USAF’, indicating
that only someone with an equivalent or ‘dominating’ clearance (e.g., ‘top-secret: NASA,
NATO, USAF’) could access such documents.
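The dominance check at the heart of these rules is simple enough to sketch directly. The following Python fragment, with made-up clearances and categories, illustrates 'no read up' and 'no write down' as described above; it depicts the model itself rather than any deployed implementation.

LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top-secret": 3}

def dominates(subject, obj):
    # A subject dominates an object if its level is at least as high and
    # its category set is a superset of the object's categories.
    s_level, s_cats = subject
    o_level, o_cats = obj
    return LEVELS[s_level] >= LEVELS[o_level] and s_cats >= o_cats

def can_read(subject, obj):   # simple-security property: no read up
    return dominates(subject, obj)

def can_write(subject, obj):  # *-property: no write down
    return dominates(obj, subject)

clearance = ("top-secret", {"NASA", "NATO", "USAF"})
document  = ("top-secret", {"NASA", "USAF"})
print(can_read(clearance, document))   # True: the clearance dominates the label
print(can_write(clearance, document))  # False: writing down is forbidden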
Decentralized Information Flow Control, first proposed by Myers and Liskov [83], is
an information flow control model that makes use of multiple principals and information
flow labels to control how information is used and disseminated within an application.
In their model, each owner of a particular data item can choose what other principals
can access that data, and only the owners can ‘declassify’ information by either adding
more principals to read access or by removing themselves as an owner. Principals that
can read data can relabel the data, but only if that relabeling makes the data access
more restrictive.
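The following Python sketch, using invented owner and reader names, captures the flavour of such labels: each owner lists the readers it permits, the effective reader set is the intersection of all owners' policies, and a relabeling by a non-owner is legal only if it is at least as restrictive. This is a simplified illustration, not the authors' actual label calculus.

class Label:
    def __init__(self, policies):
        # policies maps each owner to the set of readers that owner permits
        self.policies = dict(policies)

    def readers(self):
        # Effective readers are those permitted by every owner's policy.
        sets = list(self.policies.values())
        if not sets:
            return set()
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result

    def can_relabel_to(self, new):
        # A non-owner may relabel only if every old policy is kept and
        # no owner's reader set grows, i.e. the label becomes stricter.
        for owner, readers in self.policies.items():
            if owner not in new.policies:
                return False
            if not new.policies[owner] <= readers:
                return False
        return True

old = Label({"alice": {"alice", "bob", "extension"}})
new = Label({"alice": {"alice", "bob"}})
print(sorted(old.readers()))    # ['alice', 'bob', 'extension']
print(old.can_relabel_to(new))  # True: the new label is more restrictive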
Decentralized Information Flow Control is a natural model to choose when considering
the enforcement of privacy and private data from an end-user point-of-view. The ability
for each owner to specify the information flow policy for their data is a concept that
is readily applied in an environment where each end-user may have differing privacy
preferences for their data.
Papagiannis et al. [84] demonstrate how information flow control can be used in
software engineering as a new paradigm for enforcing security in a collaborative setting.
They describe the usage of Decentralized Information Flow Control to accomplish this,
building a system called DEFCon (Decentralized Event Flow Control). In their system,
events consist of a set of ‘parts’ protected by a DEFCon label. Any unit that accesses
confidential data will have to abide by the labels associated with that data, restricting its
ability to communicate with ineligible units that do not already have access to that data.
This allows software engineers to enforce privacy in an environment which incorporates
untrusted components.
Futoransky and Waissbein [85] implemented the ability to attach metadata tags relating to privacy scopes to PHP variables. Their system helped keep data private
by allowing developers to restrict information flow based on the tags corresponding to
variables, and also by allowing for end-users with an appropriate Firefox extension to see
which forms on a web page correspond to what privacy scopes.
In a similar and more recent development, the National Security Agency published an Apache Incubator project proposal, now a full Apache project [86], entitled “Accumulo”, which is a database management system built on Apache Hadoop and modeled after Google BigTable. Accumulo incorporates cell-level labels that can be enforced in database query calls. While the obvious applications are to the military and to other environments that need such fine-grained access control, using such labels for privacy specifications is a natural application of this ability.
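As a rough illustration of cell-level labels enforced at query time, the short Python sketch below (with invented rows and labels) returns a cell only when the caller's authorizations cover every label attached to it. Accumulo itself supports richer boolean visibility expressions, so this is a deliberate simplification.

def visible_cells(cells, authorizations):
    # cells is a list of (key, value, labels); a cell is returned only if
    # the caller holds every label attached to it.
    return [(key, value) for (key, value, labels) in cells
            if labels <= authorizations]

table = [
    ("alice:email", "alice@example.org", {"friends"}),
    ("alice:location", "Calgary", {"friends", "location"}),
]
print(visible_cells(table, {"friends"}))              # only the email cell
print(visible_cells(table, {"friends", "location"}))  # both cells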
1.6.2 Sandboxing
Sandboxing was first introduced as a term by Wahbe et al. [87] in 1993. They used the
term to describe a method by which they could isolate untrusted software modules to a
separate memory address space, such that if a fault occurred it could be detected and kept
isolated from the rest of the software. Later, Goldberg et al. [88] defined sandboxing as
“the concept of confining a helper application to a restricted environment, within which
it has free reign.” As a security mechanism, sandboxing allows a host system to confine
untrusted code to a ‘sandbox’ where it would be unable to damage the host system even if
it contained malicious code. The Virtual Faraday Cage requires a sandboxing mechanism
to operate, and this section covers some of the related work specific to web applications
and sandboxing.
Maffeis and Taly [89] provided theoretical proofs that ‘safe’ [and useful] subsets of
JavaScript exist and can be used for third-party content. Currently, embedding third-party content within a web page can lead to all types of security issues, even if an
effort is made to filter third-party content for malicious JavaScript code. The authors
examined Yahoo!’s ADsafe and Facebook’s FBJS as motivating examples of
why current methods for filtering and rewriting JavaScript code may not be sufficient to
ensure security in this setting.
The authors identified ‘sub-languages’ of JavaScript with certain desirable properties, called ‘Secure JavaScript Subsets’: code written in
these subsets would be restricted from using certain JavaScript objects with privileged
abilities or functionalities, or belonging to a different JavaScript application. The authors
presented three examples of practical JavaScript sub-languages which restrict usage of
property names outside of the code but can still be used in meaningful ways.
Another project that seems to be a parallel effort to Maffeis and Taly’s research is the
Google Caja [90] project. The Google Caja project takes as input JavaScript code written
in a subset of the JavaScript language and rewrites it. The new code is then restricted to
local objects only, with the exception of any other objects it is explicitly granted access to at
run-time. In this way, Google Caja provides security through an Object-Capability access
control mechanism, allowing websites the ability to decide what capabilities third-party
JavaScript code can have access to.
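The object-capability idea itself can be sketched independently of Caja's JavaScript rewriting: untrusted code is handed only the specific capability objects the host chooses to expose, and nothing else. The Python fragment below uses hypothetical names and relies on interface discipline alone, whereas Caja enforces the restriction at the language level by rewriting the code; it is meant only to illustrate the pattern.

class GreetingWidget:
    # A capability that can write text into exactly one page element.
    def __init__(self, page, element_id):
        self._page = page
        self._element_id = element_id

    def set_text(self, text):
        self._page[self._element_id] = text

def untrusted_extension(widget):
    # The extension sees only the widget it was handed, not the whole page.
    widget.set_text("Hello from the extension!")

page = {"greeting": "", "private_notes": "do not share"}
untrusted_extension(GreetingWidget(page, "greeting"))
print(page)  # only the 'greeting' element has changed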
1.7 Summary
Web application platforms are ubiquitous and often contain valuable and sensitive end-user data. Web application platforms also allow for third-parties to add new features and
functionality to these platforms by creating extensions. Unfortunately, current practices,
architectures, and methodology require that end-user data be shared with third-parties
in order for third-party extensions to be capable of interacting with and processing end-user data. While this may not necessarily raise concerns for all types of web application
platforms and all types of end-users, there can be situations where it would. Social
networks are a specific class of web application platform that contain high amounts of
personally-identifiable, specific, and valuable information. As explained in Section 1.5,
this data is highly sought after by many diverse parties for many different reasons –
and the impact of end-user data being made available to the wrong parties can result
in consequences as extreme as job loss [72, 75] or even death [45]. As a result, social
networks are a compelling type of web application platform to scrutinize with regards to
end-user privacy.
To begin to introduce this thesis’ contributions, this chapter has provided an overview
of the underlying aspects of privacy and security necessary to discuss them. This overview
is provided in Sections 1.4 and 1.6, respectively. The Virtual Faraday Cage borrows from the data privacy taxonomy [1] presented by Barker et al., using much of their vocabulary to define privacy and to determine if and how sensitive end-user data can be shared with third-parties. The basics of access control, and an introduction to information flow control,
were introduced in Section 1.6.1. Finally, sandboxing – or “confining [an] application to
a restricted environment” – is introduced in Section 1.6.2. The Virtual Faraday Cage
requires sandboxing in order to function, and it is important that sandboxing is both
theoretically possible as well as practical and widely available. Fortunately, both are
true.
The next chapter moves beyond fundamentals in privacy, security, and social networks; it describes the current research landscape and the existing work that attempts to address privacy issues with regard to web application platforms and social networks.
Specifically, while this chapter focused on setting the basis for what social networks are
and what key challenges exist in research, the next chapter will focus on proposed solutions and approaches to addressing privacy in this area as well as highlighting the
current gap in research that this thesis aims to fill.
Chapter 2
Related Work
This chapter presents related research work and publications that address the problems
of privacy in web application platforms, in addition to what was discussed in Chapter 1.
2.1 Overview
The vast majority of related work addressing privacy on the web and within web applications has considered privacy policies. These works examine privacy policy agreements,
negotiations, and consider the use of them as an enforcement mechanism. Better tools
and mechanisms that incorporate privacy policies have also been made available to developers. Another approach has been to empower end-users by allowing them to make more informed decisions about what information they reveal to a website based on the
privacy policy for that site. While the research in this area is both valuable and informative, it is insufficient to address the problems posed by misbehaving, malfunctioning,
or dishonest parties.
Specific to social networks, work has been done on hiding end-user information from
the social network provider, making such information available only to other end-users who have been given access to the information and who use a browser extension. However,
such solutions prohibit the interaction of end-users with third-party extensions in the first
place. Other work has addressed social network extensions specifically, and provided a
useful foundation for the Virtual Faraday Cage. Similarly, research on browser extension
security can also be applied to web application extensions.
The previous chapter has already provided an overview of some related research in
the context of introducing and motivating the Virtual Faraday Cage. In this chapter,
the majority of the related work is examined in more detail.
2.2 Software and Web Applications
2.2.1 P3P and Privacy Policies
The classic approach to confronting privacy problems on the web has been to attack the
problem from a privacy policy standpoint, where a privacy policy is a document stating the types of data collected and the internal practices of the organization collecting
the data. The majority of research that has been done in this area has dealt specifically with issues relating to Web Services, which are services designed explicitly to be
interoperable with each other and typically are not ‘front-end’ applications with which
end-users interact. A recurring theme in much of the research is the idea of comparing
privacy-policies to each other or to customer privacy-preferences, or negotiating a new
set of policies/preferences between the parties.
The World Wide Web Consortium (W3C) first published the Platform for Privacy
Preferences Project (P3P) in 1998 [91]. P3P was the first concerted effort from a standardization body that was also supported by key players in industry such as IBM and
Microsoft. P3P is a protocol that allows websites to express their privacy policies to
end-users and their web browsers, and for users to express their privacy preferences to
their web browser. When an end-user using a P3P-compliant web browser connects to a
P3P-compliant website, they are notified if the website’s P3P policy conflicts with their
own privacy preferences. Microsoft quickly adopted P3P in 2001 [92] in Internet Explorer
6.
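The comparison that a P3P-aware browser performs can be sketched roughly as follows. The vocabulary, data items, and preference fields below are invented for illustration; real P3P policies are XML documents with a standardized vocabulary, and real user preferences are expressed in a companion language rather than Python dictionaries.

site_policy = {
    "email":    {"purposes": {"contact", "marketing"}, "retention": "indefinite"},
    "location": {"purposes": {"service"},              "retention": "session"},
}

user_preferences = {
    "email":    {"forbidden_purposes": {"marketing"}, "accept_indefinite_retention": False},
    "location": {"forbidden_purposes": set(),         "accept_indefinite_retention": False},
}

def conflicts(policy, prefs):
    # Report every data item whose stated practices violate the preferences.
    found = []
    for item, practice in policy.items():
        pref = prefs.get(item)
        if pref is None:
            continue
        bad_purposes = practice["purposes"] & pref["forbidden_purposes"]
        if bad_purposes:
            found.append(f"{item}: disallowed purpose(s) {sorted(bad_purposes)}")
        if practice["retention"] == "indefinite" and not pref["accept_indefinite_retention"]:
            found.append(f"{item}: retention is longer than preferred")
    return found

for warning in conflicts(site_policy, user_preferences):
    print(warning)  # the browser would surface these conflicts to the end-user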
While P3P was an important step forward in bringing better awareness of privacy concerns to both end-users and businesses, it relies on the assumptions that privacy promises
made by web sites are enforceable and will be kept. However, this may not always be
the case: the promising website may misbehave, or even be hijacked by malicious parties, and new owners may disregard the privacy policies they previously had established.
Karjoth et al. [93] identified this problem in P3P, noting that “internal privacy policies
should guarantee and enforce the promises made to customers.” In other words, a P3P
policy is worthless unless it accurately reflects the internal practices of the organization
in the first place. In their paper they proposed a system to translate on-going business
practices that were written down in E-P3P (Platform for Enterprise Privacy Practices
[94], an access control language for privacy) into a P3P policy that could be delivered to
end-users. This would allow for honest parties to translate their real business practices
into P3P policies that they could keep.
Ghazinour and Barker [95] develop a lattice-based hierarchy for purposes in P3P.
Ghazinour and Barker argue for the use of lattices as the logical choice for establishing
a ‘purpose hierarchy’ as there is always a more general and a more specific purpose with
the obvious extremes being ‘Any’ and ‘None’. Beyond representing purpose or other privacy-related hierarchies, lattices are a natural choice for many access-control-related hierarchies (e.g., groups, and so on).
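A minimal sketch of such a purpose hierarchy, using invented purposes with 'Any' at the top and 'None' at the bottom, is shown below: a purpose stated by a data collector could be checked against a purpose the end-user consented to by asking whether it sits at or below the consented purpose. This is only an illustration of the ordering, not Ghazinour and Barker's construction.

PARENTS = {
    "Any": set(),
    "Marketing": {"Any"},
    "Email-Marketing": {"Marketing"},
    "Service": {"Any"},
    "Billing": {"Service"},
    "None": {"Email-Marketing", "Billing"},  # bottom element of this toy hierarchy
}

def at_or_below(purpose, other):
    # True if `purpose` equals `other` or is a more specific purpose beneath it.
    frontier = {purpose}
    while frontier:
        if other in frontier:
            return True
        frontier = set().union(*(PARENTS[p] for p in frontier))
    return False

print(at_or_below("Email-Marketing", "Marketing"))  # True
print(at_or_below("Billing", "Marketing"))          # False
print(at_or_below("None", "Any"))                   # True: 'None' sits below everything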
Rezgui et al. [96] identified and defined many aspects of privacy within Web Services,
such as: ‘user privacy’, ‘data privacy’, and ‘service privacy’ – and the concepts of ‘service
composition’, where one Web Service is combined with another to form a ‘new’ Web
Service from the perspective of an end-user. In the paper, they propose a system which is
supposed to enforce privacy at what is essentially a third-party end-point, but their design
does not address hostile environments or ‘composite Web Services’ – which could be considered a more general case of web applications with third-party
extensions. Despite this, the definitions established within their paper are applicable in
both unaddressed scenarios.
They define ‘user privacy’ as a user’s privacy profile, which consists of their privacy
preferences with regard to their personal information per ‘information receiver’ and per
‘information usage’ (the purpose of using that data). They define ‘service privacy’ as a
comprehensive privacy policy that specifies a Web Service’s usage, storage, and disclosure
policies. Finally, ‘data privacy’ is defined as the ability for data to “expose different views
to different Web Services”.
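Their notion of 'data privacy', the same record exposing different views to different Web Services, can be sketched very simply; the record, fields, and receivers below are hypothetical.

RECORD = {"name": "Alice", "email": "alice@example.org", "age": 29}

VIEWS = {
    "shipping-service": {"name"},
    "payment-service": {"name", "email"},
}

def view_for(receiver, record=RECORD, views=VIEWS):
    # Each receiver sees only the projection of the record it is entitled to.
    allowed = views.get(receiver, set())
    return {field: value for field, value in record.items() if field in allowed}

print(view_for("shipping-service"))  # {'name': 'Alice'}
print(view_for("payment-service"))   # name and email only
print(view_for("unknown-service"))   # {}: nothing is exposed by default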
Another approach to dealing with privacy problems was suggested by Dekker et al.
[97] in the form of a project proposal that explores the concept of using licenses to manage
the dissemination and ‘policing’ of private data. In their approach, licenses are written in
a formal language and can be automatically generated from privacy requirements. Sub-licenses can also be derived from a parent license, and actions can be evaluated relative
to a specific license. In this way, the users of private data can utilize these licenses
to enforce privacy policies themselves – or alternatively the end-users can enforce their
privacy through legal means.
Mohamed Bagga’s [98] thesis proposes a framework for privacy-[preserving] Web Services by incorporating the Enterprise Privacy Authorization Language (EPAL) as the
enforcement mechanism for privacy. Bagga also considers the problem of comparing
privacy policies, and provides an algorithm for doing so.
In his introduction, Bagga further elaborates on the types of use-cases for private data
exchange. Bagga describes business-to-customer (B2C) and business-to-business (B2B)
scenarios for private data usage. In a “data submission” B2C scenario, the business
requests sensitive data from the customer. This means that the customer should verify
that the privacy practices of the business are compatible with their privacy preferences
before submitting their data. In a “data disclosure” B2C scenario, a customer requests
sensitive data from the business. In this scenario, the business must evaluate the request
against their privacy policy to determine whether or not that data will be released. In a
“data request” B2B scenario, a business asks another business for sensitive data. In that
scenario, the disclosing business will compare privacy policies before deciding whether or
not to release the data. The last scenario described is the “data delivery” B2B scenario,
where one business wants to deliver sensitive data to another business. In essence, this is
simply the reverse case of the “data request” scenario, where the delivering party initiates
the transaction and also must verify compatible privacy policies. As the Virtual Faraday
Cage is not primarily concerned with privacy policies or privacy-aware access control, the
rest of Bagga’s work is not considered further here.
Xu et al. [99] follow a similar approach. In their paper, the authors examine how
a composite web service could be made privacy-preserving through comparing privacy
policies to an end-user’s privacy preferences. They give several examples of composite
Web Services (e.g., travel agency, comparison shopping portal, and medical services) and
suggest how certain aspects of a composite Web Service could be made unavailable (in
an automated way) to certain customers to protect those customers’ privacy preferences.
Another trend is the notion of negotiating privacy requirements between entities. For
example, a customer at an online store could reveal their date of birth in exchange for
special deals on their birthday or other discounts. Khalil El-Khatib [100] considers the
negotiation of privacy requirements between end-users and Web Services. In this paper,
the end-users either reveal additional information or expand the allowed usage of their private data in exchange for promised perks. Benbernau et al. [101] build on the idea of
negotiating privacy requirements between entities by defining a protocol where this can
be accomplished in a dynamic way and potentially change the privacy agreements over
time – a term they coined as ‘on-going privacy negotiations’. Daniel Le Metayer [102]
advocates using software agents to represent an end-user’s privacy preferences and the
management of their personal data.
Luo and Lee [103] propose a ‘privacy monitor’, but this requires a trusted central
authority which would store everyone’s private data, thus simply moving the privacy problem to another entity. However, they do identify additional risks and
concerns - namely, the ability for an adversary to aggregate data over a single or multiple
social networks (or other websites) in an effort to breach the privacy of an individual
who has tried to be careful about their data revelation.
All of the work in P3P and privacy policies assumes that the promising party is honest,
which can be a troublesome assumption, as the only recourse against dishonest parties is a lawsuit.
2.2.2 Better Developer Tools
Another approach to addressing privacy concerns in software and web applications is
to provide developers with better tools. To better facilitate the management of privacy
policies and the protection of private data, Hamadi et al. [104] propose that the underlying Web Service protocol should be designed with privacy capabilities built into it. In
their paper, they identify key aspects of privacy policies that should be ‘encoded’ into
the Web Service model itself. To do so, they specify what aspects of privacy should be
considered when designing a web service protocol (e.g., data, retention period, purpose,
and third party specifications), specify a formal model for a ‘privacy aware Web Service
protocol’, and they describe the tool that they created to help developers describe their
Web Service protocol with privacy capabilities in mind.
Another paper that was similar to Futoransky et al.’s [85] research mentioned in the
previous chapter was Levy and Gutwin’s [105] work. Levy and Gutwin anchored P3P
policy specifications to specific form fields on a website. This would allow for end-users
(or software agents) to better understand and easily identify which P3P clauses applied
to which fields.
2.2.3 Empowering the End-User
Another avenue of research has been to empower the end-user to make better decisions
for themselves regarding information disclosure and software or service usage. Tian et al.
[106] attempt to assign privacy threat values to websites, allowing end-users to decide if
they are willing to risk their privacy to use a particular site. They propose a framework
through which privacy threats can be assigned values by end-users and evaluated so that
end-users can decide if they want to proceed with the dissemination of their information
on a given web site or application. Overall, their approach is interesting, but it does not
solve the underlying problem of how to gain utility from a third-party website without
necessarily disclosing private information in the first place. This paper views privacy from a cost/benefit perspective.
2.3 Social Networks
2.3.1 Hiding End-User Data
Guha et al. [107] advocate a mechanism called “none of your business” (NOYB) by
which end-users do not store their real data on a social network, instead storing random
plausible (but in reality, encrypted) data on the site and then using a web browser
extension to decrypt/encrypt values.
NOYB is a privacy-preservation mechanism that takes a novel approach to providing end-users with anonymity with respect to third-parties, other end-users, and even the web application platform itself. In essence, NOYB allows users to create their own ‘subnet’ of the service, where the real interaction takes place. This is done by utilizing
encryption which substitutes real values of user-data with that of other plausible data.
NOYB implemented a novel idea which involved partitioning a user’s profile into
‘atoms’ such as {(First Name, Gender ), (Last Name), (Religion)}, and swapping them
pseudo-randomly with other atoms from databases called “dictionaries”. These dictionaries would also be capable of growing over time as new entries are added to them.
Users with the right symmetric key would be able to re-substitute back the original data
and in effect decrypt a profile. However, some substituted atoms could produce combinations of user information that would not otherwise make sense, and the steganographic storage of additional key information may defeat the stealthy approach. For example, combining atoms into a profile of “Mohammed Mironova”, “Male”, and “Hindu” could signal false information, as it is unlikely that any such individual exists. Furthermore, only user attributes are protected, not communication.
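A toy sketch of this keyed substitution idea is given below, with a made-up dictionary, atom label, and key; NOYB's actual encoding, dictionary handling, and key management are considerably more involved, so this only illustrates the swap-and-recover mechanism.

import hmac, hashlib

DICTIONARY = ["Alice Smith", "Bob Jones", "Carol Wu", "Dave Mironov", "Eve Patel"]

def keyed_offset(key, atom_label):
    # Derive a pseudo-random offset into the dictionary from the secret key.
    digest = hmac.new(key, atom_label.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % len(DICTIONARY)

def encode(key, atom_label, real_value):
    offset = keyed_offset(key, atom_label)
    return DICTIONARY[(DICTIONARY.index(real_value) + offset) % len(DICTIONARY)]

def decode(key, atom_label, stored_value):
    offset = keyed_offset(key, atom_label)
    return DICTIONARY[(DICTIONARY.index(stored_value) - offset) % len(DICTIONARY)]

key = b"secret shared among friends"
stored = encode(key, "(First Name, Last Name)", "Alice Smith")
print(stored)                                          # a different but plausible name
print(decode(key, "(First Name, Last Name)", stored))  # "Alice Smith" again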
One of the limitations of NOYB is that it does not account for how communication
between end-users can be achieved, or how legitimate end-user data can be shared with
third-parties (or even the application provider) for legitimate and useful features and
capabilities.
Additional limitations may include the ability to detect steganographic hidden data,
as well as the challenges associated with using NOYB to send messages or wall posts in
an encrypted way. Furthermore, key-revocation and data updates with a new key are
not explained in detail in the paper, and re-encrypting (and thereby changing) attributes such as a user’s gender and name – while possible in the real world – would likely defeat the purpose of being stealthy, in particular if this happens more than once.
Additionally, while NOYB and similar projects may be able to bypass detection by
web application providers, the use of NOYB may still constitute a violation of the platform’s terms of service – as NOYB essentially piggybacks on Facebook to provide its own
social network experience.
Similar to NOYB, Luo et al. [13] propose FaceCloak, which seems to build on and
improve NOYB’s idea. Here, the web application platform has no access to encrypted
information as such information is completely stored on a third-party website. Consequently, the concerns about detection that the previous related work identified are no
longer applicable to FaceCloak. However, the benefits and other criticisms remain the
same.
Lucas and Borisov [108] propose flyByNight, where they incorporated a web application extension that could encrypt/decrypt messages using a public key cryptosystem. flyByNight is an interesting exercise in creating a public-key message encryption/decryption
scheme within Facebook. Unfortunately, as the authors admit, their system is not applicable in a hostile environment. That is to say, if Facebook really wanted to decrypt the messages that end-users send through the third-party application, then Facebook could easily eavesdrop and do so. Alternatively, if Facebook were compromised, this could
also happen. The authors contend that some mitigation may be found by establishing a
legal case against Facebook for decrypting messages; however, this approach sidesteps the
key challenge by making it someone else’s responsibility. However, if one does not trust
the web application provider to begin with, why should one trust the web application
provider not to tamper with flyByNight?
Baden et al. [109] propose Persona as an approach to protecting privacy in social networks. In this paper, the authors propose using an Attribute-Based Encryption (ABE)
approach to protecting end-user data in a social network. In their introduction, they
correctly point out some of the issues with social networks (such as Facebook), in particular the legal agreement, which has statements such as: “[users] grant [Facebook] a
non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any [Intellectual Property] that [users] post on or in connection with Facebook.” Persona uses a
distributed data-storage model, and requires end-users have browser extensions that can
encrypt and decrypt page contents.
2.3.2 Third-Party Extensions
Felt and Evans’ [65] work is the work most directly related to the Virtual Faraday Cage.
They correctly identify that one of the biggest threats to end-user privacy within web
application platforms – in particular social platforms like Facebook and the OpenSocial
API – is third-party extensions.
Felt and Evans discover that the vast majority of Facebook extensions (over 90%)
have access to private data that the extensions do not even claim to need. They also
identify the types of applications by category, without explicitly labeling any as ‘junk’
or ‘spam’ applications. Despite this, essentially 107 of 150 applications do not provide
explicitly new or novel features to Facebook and the authors are right to conclude that
they do not and should not have access to private data. They go further, stating that
extensions should not be given access to private data at all, and that private data can be
isolated by the use of placeholders, essentially achieving a ‘privacy-by-proxy’. For example, “<uval id='[$self]' field='name'/>” could be a substitute for displaying
the end-user’s name: the HTML code would be replaced by “John Doe” or whatever
the name field should be. The authors concede that while this method does not suffice
for more sophisticated extensions, it is sufficient for the ‘majority’ of extensions. The
authors state that if a more sophisticated extension needs end-user data, it should simply
prompt the user – in effect, forcing the end-user to fill out another profile on a third-party
website.
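A minimal sketch of this placeholder substitution follows, written in Python with a hypothetical profile and a tag syntax modelled loosely on the example above. The platform performs the replacement at render time, so the third-party that produced the markup never sees the actual values.

import re

PROFILE = {"self": {"name": "John Doe", "hometown": "Calgary"}}

TAG = re.compile(r"<uval id='\[\$(\w+)\]' field='(\w+)'\s*/>")

def render(extension_markup):
    # Replace each placeholder tag with the corresponding profile value.
    def substitute(match):
        user, field = match.group(1), match.group(2)
        return PROFILE.get(user, {}).get(field, "")
    return TAG.sub(substitute, extension_markup)

# The extension's output contains placeholders only, never end-user data.
markup = "<p>Happy birthday, <uval id='[$self]' field='name'/>!</p>"
print(render(markup))  # <p>Happy birthday, John Doe!</p>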
In a separate work, Felt et al. [69] address the underlying mechanics and problems of
embedding third-party content (in particular, extensions) into a web application platform.
They point out that the existing Same-Origin Policy [110] for code execution is insufficient for practical use because it is too restrictive: the parent page has no access to an
inline frame’s content. On the other hand, directly embedding a third-party’s scripts
presents a significant risk for exploits, as non-standardized code-rewriting can lead to
costly oversights. In the paper, the authors propose that browser vendors (and, indirectly,
standards bodies like the W3C) provide a new standard for isolating untrusted code from
the rest of the Document Object Model (DOM). This isolation would allow one-way
interaction from a parent to a child.
Recent developments seem to have taken into account the authors’ concerns. The
Google Caja [90] project rewrites and prohibits untrusted JavaScript code from accessing
the DOM or any other objects without explicit capabilities granted to it from the parent
page. Essentially, this allows Caja to act as a sandboxing mechanism.
2.4 Browser Extensions
Barth et al.’s [111] work primarily addresses browser extensions, but many of the lessons
can be extrapolated to apply to web application platform extensions as well. The authors
examined 25 Firefox extensions and concluded that 88% of them needed fewer privileges
than they were granted, and that 76% of them had access to APIs with more capabilities
than they needed – compounding the difficulty of reducing the extensions’ privileges.
Instead of focusing on the potential for malicious third-party extensions to exist, the
authors focus on the potential for other malicious parties to exploit bugs or flaws in
third-party extension design – and leverage the APIs and capabilities to which those
third-party extensions have access. Their research concludes that Firefox should build a
new extension platform addressing these issues and concerns.
Felt et al. [112] extend their previous work [111] by expanding their study of third-party ‘extensions’ to the Google Chrome browser and the Android operating system.
Here, the authors focus on three benefits that install-time permissions have: limited
extension privileges (by conforming to the ‘principle of least privilege’), user consent, and
the potential for benefits when it comes to reviewing extensions. For example, the social
aspect of reviews and end-user feedback can minimize the usage of malicious extensions.
Furthermore, the listing of install-time capabilities also allows for security researchers to
concentrate on extensions with more dangerous capabilities. In the context of Firefox
and Chrome, this allows (and would allow for) official reviewers to speed up the process of
examining extensions before they are published to official repositories. To facilitate this,
the authors propose a permission ‘danger’ hierarchy, where more ‘dangerous’ permissions
are emphasized over less ‘dangerous’ ones.
Fredrikson and Livshits [113] argue that the browser should be the data-miner of
personal information, and that it should be in control of the release of such information
to websites and third-parties. Furthermore, it can support third-party miners whose
source code is statically verified to be unable to leak information back to those third-parties.
2.5 Summary
There are many approaches to addressing the issue of privacy in web applications. One
of the common approaches to addressing privacy is through the use of privacy policies
and technologies that support them such as the Platform for Privacy Preferences Project
(P3P) [91]. In this area, work has been done to allow businesses to express their internal
workflows into P3P policies [94] and better express purposes through the use of a lattice
hierarchy [95]. Another policy-based approach considered licenses as a way to manage the dissemination of end-user data
[97]. Beyond protocols, other research examined the possibility of providing better tools
for developers to create privacy-aware web services, for instance, adding P3P privacy
specifications automatically to form fields [105]. Finally, providing a tool for end-users
to utilize to better gauge the risk to their privacy associated with a specific website was
also proposed by Tian et al. [106].
This chapter also surveyed several works [13, 107, 108, 109] specific to social networks
that examine the potential of hiding end-user data from the social network provider itself.
Two other works [65, 69], also social network specific, deal with third-party extensions to
these social networks. In one [65], the authors propose a “privacy-by-proxy” mechanism
for protecting end-user data from third-parties. Their method restricts third-parties from
obtaining any end-user data from the social network platform directly, and they propose
that extensions requiring more detailed information should obtain that information external to the social network. In the other [69], the authors propose that new
standards for isolating untrusted JavaScript code should be implemented by browsers. To
a similar end, recent projects such as Google Caja [90] seem to have taken their concerns
into account.
Research related to browser extensions [111, 112, 113] was also examined for any
potential applications to web application platforms. In two of these works [111, 112], the
authors argued that install-time permissions for extensions allowed end-users to make
better and more educated decisions about whether to install a browser extension or not.
In the second of these [112], the authors also argue for a “danger hierarchy” for permissions, which would allow end-users to clearly see which permissions are more risky than others. In the other paper [113], the authors argue that the browser itself should be responsible for mining personal information and for controlling its release to websites and third-parties.
While the on-going research concerning privacy policies is important and fundamental
to data privacy, such works assume or require that the promising party is honest. Thus,
using privacy policies as the sole enforcement mechanism becomes troublesome when the
promising party cannot be guaranteed to behave honestly. Furthermore, few of the works
examined addressed the problem of third-parties and end-user privacy, instead focusing directly on privacy issues between end-users and the web application platform they are using. Of the works that did, the authors proposed a “privacy-by-proxy” mechanism [65], and ultimately, better sandboxing mechanisms [69] for embedded third-party code.
Consequently, there exists a gap in current research that has not been addressed: can
third-party extensions to web application platforms work with end-user data in a way
that prohibits end-user privacy from being violated? This is the gap that this thesis aims
to address and fill.
The next chapter introduces the theoretical model for this thesis. The theoretical
model will define the vocabulary used by this thesis to present its contribution, the
Virtual Faraday Cage. Additionally, observations and propositions will be provided that
make privacy guarantees for systems that comply with the model.
Chapter 3
Theoretical Model
This chapter presents the theoretical model used by the Virtual Faraday Cage, starting
with definitions and then continuing on to describe abstract objects and operations that
can be used to define privacy violations and protect against them.
3.1 Basics
A web application is a service accessible over the internet through a number of interfaces.
These interfaces can be implemented through web browsers or custom software, and may
run on different ports or use different protocols. A web application platform (hereafter
referred to as a platform) is a web application which provides an API so other developers
can write extensions for the platform to provide new or alternative functionality. These
developers are referred to as third-parties. Traditionally, extensions to web application
platforms are web applications themselves: they are hosted by third-parties and can be
accessed by the web application platform (and vice versa) through API interfaces, and
often by end-users through a web browser or other software. End-users are the users of
the web application platform whose privacy the Virtual Faraday Cage aims to protect.
Figure 3.1 shows this model.
End-user data considered sensitive by the platform represents all data that the end-user can specify the access control policies for with respect to other principals such as
third-party extensions. Private data represents a subset of sensitive data which is strictly
prohibited from being accessed by any remote extension component.
Figure 3.1: A web application platform.
3.2 Formal Model
The formal model for The Virtual Faraday Cage is presented here. These definitions and
observations allow the Virtual Faraday Cage to establish privacy guarantees.
3.2.1 Foundations
Definition 3.1. Data
Let D be the set of all data contained within a web application platform. Let Du ⊆ D
be the subset of D that contains all data pertaining to an end-user u, as specified by a
particular platform. The Virtual Faraday Cage defines data to be representable in vector
form, thus ∀di ∈ Du, di = ⟨x0, ..., xn−1⟩ where n > 0. Furthermore, each xi is considered to
be either a string, a number, or some other atomic type within the context of the web
application platform. The special data value NULL can be represented as a 0-dimensional
vector ⟨⟩.
By convention, data is represented in “monospaced font” with quotes (except for
NULL and numbers), and classes of data or “attributes” are capitalized and without
quotes, for example: Age, Gender, Date, Occupation.
Definition 3.2. Sensitive data
Let Su ⊆ Du be a subset of all data pertaining to u that is considered “sensitive”.
Sensitive data represents all data pertaining to an end-user that the end-user can specify
access control policies for.
Depending on the specific web application platform, sensitive data may consist of
different types of end-user data. A web application platform that requires that all users
have a publicly viewable profile picture, for instance, would force that data, by definition,
to not fall into the category of sensitive data because end-users have no control over the
access control policies for that profile picture. Consequently, data can only be classified
as sensitive if the end-user has control over whether or not that data is disseminated to
other users or third-parties.
Traditionally, determining whether or not data is “sensitive” has been left to the
end-user to decide. This thesis departs from this trend, instead leaving that distinction
to the web application platform to decide. This is done for definitional purposes rather
than philosophical: an end-user may still have reservations about being forced to have
a public profile picture, but if the end-user has no control over how that data is shared
with other end-users or third-parties, then it is not considered sensitive data with respect
to this model.
The next two definitions will introduce third-parties and extensions.
Definition 3.3. Third-parties
A third-party θ, represents an entity in the external realm. Third-parties can control
and interact with extensions belonging to them, and they can collude or otherwise share
data with any other third-parties. All data visible to an extension’s remote component
(Definition 3.4) is visible to the third-party owning it as well. Additionally, no assumptions are made regarding the computational resources of a third-party. The set of all
third-parties is T.
The definition of third-parties also departs from traditional definitions. In Barker
et al. [1], a third-party is any party that accesses and makes use of end-user (“data-provider”) data. A house, in their terminology, is a neutral repository for storing that
data. Consequently, a web application platform that both stores and utilizes end-user
data has roles both as a house as well as a third-party.
In this model however, web application platforms (or “service-providers”) are not
defined as third-parties. Instead, this definition is strictly reserved for other parties that
obtain end-user data from the web application platform. In this way, third-parties are
similar to the concept of an adversary from other security literature. In the Virtual
Faraday Cage’s threat model, it is third-parties that the VFC aims to defend against.
Next, extensions must be defined:
Definition 3.4. Extensions
An extension, denoted by E, is a program that is designed to extend the functionality
of a given web application platform by using that platform’s API. All extensions consist of a remote extension component (denoted as Ē), a local extension component (denoted as E′), or both. The set of all extensions is denoted by E.
The ownership relation o : {E, Ē, E′} −→ T is a many-to-one mapping between extensions and/or extension components and the third-parties that “own” them. Every extension E is “owned” by some third-party θ: ∀e ∈ {E, Ē, E′}, ∃θ ∈ T such that o(e) = θ.
Here, extensions are envisioned as other web applications that interact with the web
application platform.
The splitting of extensions into local and remote components is necessary for the
operation of the Virtual Faraday Cage. Local extension components run within a sandboxed environment controlled by the web application platform, while remote extension
components run on third-party web servers. The distinction between local and remote
components is covered in detail in the next chapter.
Now that data, third-parties, and third-party extensions have been defined, projections and transformations will be introduced. These allow for data to be modified before
revelation to a third-party, facilitating fine-grained control over end-user data.
Definition 3.5. Projections and transformations
Suppose that a certain class of data has a fixed dimension n and is of the form
⟨x0, ..., xn−1⟩. Fix ⟨y0, ..., ym−1⟩ as a vector where 0 ≤ yi ≤ n − 1. Then, a projection Py0,...,ym−1 : ⟨x0, ..., xn−1⟩ −→ ⟨xy0, ..., xym−1⟩ is a function that maps a data vector
⟨x0, ..., xn−1⟩ to a new vector that is the result of the original vector “projected” onto
dimensions y0, ..., ym−1 in that order.
A transform τ is an arbitrary mapping of a vector ⟨x0, ..., xn−1⟩ to another vector
⟨y0, ..., ym−1⟩. The transform may output a vector of a different dimension, and may or
may not commute with other transforms or projections. Consequently, every projection
can be considered a type of transform, but not every type of transform is a projection.
Together, projections and transforms can be used as a method through which data
can be made more general and less specific. This allows the Virtual Faraday Cage to
give end-users the ability to control the granularity and specificity of information they
choose to release to third-parties.
Projections essentially allow the selective display (typically, reduction) of different
dimensions of a particular data item. For example, P0,3,2(⟨x0, ..., xn−1⟩) = ⟨x0, x3, x2⟩ and P0,0,0,0(⟨x0, ..., xn−1⟩) = ⟨x0, x0, x0, x0⟩. If si = ⟨“December”, “2nd”, 1980⟩
represents a sensitive date (e.g., birth-date) held within a particular platform, then
P0,2(si) = ⟨“December”, 1980⟩ represents a projection of that date onto the month and
year dimensions.
On the other hand, transforms are arbitrary functions that operate on the vector
space of data. Suppose a transform τ could map a date to another date where the month
has been replaced by the season. Thus, in this example, P0,2(τ(si)) = τ(P0,2(si)) =
⟨“Winter”, 1980⟩; this composition is also called a view.
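To make the notation concrete, the following minimal Python sketch (an illustration, not part of the formal model) represents data items as tuples, projections as index selection, and uses a hypothetical season-mapping function as the transform τ from the example.

from typing import Callable, Tuple

Data = Tuple  # a data item is a vector, e.g. ("December", "2nd", 1980)

def projection(*dims: int) -> Callable[[Data], Data]:
    """Return P_{dims}: select the given dimensions, in the given order."""
    return lambda d: tuple(d[i] for i in dims)

SEASONS = {"December": "Winter", "January": "Winter", "June": "Summer"}

def season_transform(d: Data) -> Data:
    """The hypothetical transform tau: replace the month by its season."""
    return (SEASONS.get(d[0], d[0]),) + tuple(d[1:])

s = ("December", "2nd", 1980)
P02 = projection(0, 2)
print(P02(s))                      # ('December', 1980)
print(P02(season_transform(s)))    # ('Winter', 1980) -- the composed view
print(season_transform(P02(s)))    # ('Winter', 1980) -- commutes in this case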
Definition 3.6. Views
A view is a particular composition of one or more projections and/or transforms and
is a mapping from a vector ⟨x0, ..., xn−1⟩ to another vector ⟨y0, ..., ym−1⟩.
Depending on the type of data, a context-specific generalization hierarchy or concept
hierarchy, can exist for views when applied to data. A domain generalization hierarchy,
as defined by Sweeney et al. [114], is a set of functions that impose a linear ordering on
a set of generalizations of an attribute within a table (e.g., “Postal Codes”, “Age”, etc.),
where the minimal element is the original range of attribute values, and the maximal
value is a completely suppressed element equivalent to a NULL value. This allows for
selective revelation of end-user data (e.g., the value “Adult” instead of “25”), giving the
end-user more control over their personal information and keeping data more private.
The Virtual Faraday Cage makes use of this concept when defining granularity and generalization; see Section 4.10.4 for examples.
Definition 3.7. Granularity
Granularity refers to how specific the view is of a particular data item. For each
particular data item, a different generalization hierarchy may exist. A generalization
hierarchy is a lattice with the most specific data value at one end, and the least specific
data value (e.g., NULL) at the other. Each directed edge of the lattice graph represents a
level of generalization from the previous data value.
Depending on the generalization hierarchy, one view v′ of data may be a derivative of another view v: if a path exists from v to v′, then v′ is a derivative (or generalization) of v. Otherwise, v′ is not a generalization of v. If v′ is considered to be a derivative of v, this can be represented as v′ ← v. For definitional purposes, this relationship is considered reflexive: for every view v, v ← v.
As an example, the view v(s) = s is specific and exact, whereas v(s) = NULL is
not. Figure 3.2 shows an example generalization hierarchy for a Date data item. In the
example, ⟨1974⟩ is a generalization of ⟨“December”, 1974⟩, but it is not a generalization
of ⟨“December”, 14⟩.
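A generalization hierarchy can be represented as a directed graph over views; the edge set below is a hypothetical fragment of the Date hierarchy of Figure 3.2, and the reachability test implements the derivative relation v′ ← v (including reflexivity).

# Hypothetical fragment of a Date generalization lattice: edges go from a view
# to its immediate generalizations.
EDGES = {
    ("December", 14, 1974): [("December", 1974), ("December", 14)],
    ("December", 1974): [(1974,), ("December",)],
    ("December", 14): [("December",), (14,)],
    (1974,): [()],
    ("December",): [()],
    (14,): [()],
    (): [],
}

def is_derivative(candidate, view) -> bool:
    """True if `candidate` is reachable from `view`, i.e. candidate <- view."""
    stack, seen = [view], set()
    while stack:
        v = stack.pop()
        if v == candidate:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(EDGES.get(v, []))
    return False

print(is_derivative((1974,), ("December", 1974)))   # True
print(is_derivative((1974,), ("December", 14)))     # False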
This thesis now introduces the concept of a “principal” – an entity which can interact
with (potentially reading or writing to) sensitive end-user data, constrained by these
policies. The following definition defines what a principal is within this model.
Figure 3.2: An example of the generalization hierarchy for data s = ⟨“December”, 14, 1974⟩
Definition 3.8. Principals
A principal is an entity that can potentially access sensitive end-user data, and for
which the end-user can create data access policies for their data. The set of
all principals is denoted as P.
Principals within the Virtual Faraday Cage include extension components (Ē and E′),
and potentially other end-users or other objects or “actors”, depending on the particular
web application platform.
The next definitions will introduce the Virtual Faraday Cage’s access control model:
reading data will be handled by the associated privacy policies for that data, and writing
data will be handled by the associated write-policies specific to that data.
Definition 3.9. Privacy policies
Let a privacy-policy be defined as a tuple of the form ⟨type, view⟩ associated with a
particular data item si ∈ Su . Let type represent a value in {single-access,
request-on-demand, always}, and let view represent a particular view of the data (a
sequence of projections and transforms).
Let pA (si ) be defined as the mapping between a given data-item si ∈ Su and the
privacy-policies that are associated between it and a principal A ∈ P. Then pA (si )
represents the privacy-policies of a given data-item relative to a given principal.
As there can be more than one type of privacy policy associated with a given data-item (single-access, request-on-demand, or always), pA(si) will return a set of tuples
of the form {⟨type, view⟩, ...}. Because there are only three possible access types, this
ensures 0 ≤ |pA (si )| ≤ 3.
In the Virtual Faraday Cage’s model, a principal cannot read or obtain the existence
of a data item unless a privacy policy for that data item is specified permitting this, and
then the principal can only obtain the view of the data item as specified by the privacy
policy.
The single-access policy would allow an accessing party one-time access to the
data item, which can be enforced by the web application platform checking a list of
past-accesses to the data item. The request-on-demand policy would require that the
end-user authorize each access request for the data in question. Finally, the always
policy always allows access to the data.
For example, let si be of the form ⟨Country, Province, City⟩, representing some geo-locational data. Suppose that an extension, when first activated or used for the first time,
should know the country, and province or state that the user is located in at that moment
– but otherwise be restricted to only seeing the current country. A privacy-policy reflecting that would be one where: pA(si) = {⟨single-access, P0,1(si)⟩, ⟨always, P0(si)⟩}.
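As a minimal sketch (not the thesis' implementation), privacy policies could be stored per (principal, data URI) pair; the store, principal name, and view functions below are illustrative only, and the example mirrors the geo-location policy just described.

from typing import Callable, Dict, Set, Tuple

Policy = Tuple[str, Callable]            # (type, view) as in Definition 3.9

# Illustrative policy store keyed by (principal, data URI).
POLICIES: Dict[Tuple[str, str], Set[Policy]] = {}

def p(principal: str, uri: str) -> Set[Policy]:
    """Return the set of policy tuples for this data item and principal."""
    return POLICIES.get((principal, uri), set())

# Example from the text: country+province once, and the country always;
# s_i = (country, province, city).
project_country_province = lambda s: (s[0], s[1])
project_country = lambda s: (s[0],)

POLICIES[("extension-A", "data://johndoe/profile/location")] = {
    ("single-access", project_country_province),
    ("always", project_country),
}

s_i = ("Canada", "Alberta", "Calgary")
for policy_type, view in sorted(p("extension-A", "data://johndoe/profile/location")):
    print(policy_type, view(s_i))
# always ('Canada',)
# single-access ('Canada', 'Alberta')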
Definition 3.10. Write-policy
Let wA(si) be a function that returns the write-access ability for a particular si ∈ Su for a given principal A. In particular:
wA(si) = −1 if A cannot write/alter si
wA(si) = 0 if A can write/alter si with explicit end-user permission
wA(si) = 1 if A can always write/alter si
Furthermore, unless an end-user has explicitly set the write-policy for a data-item si to be a specific value, wA(si) = −1.
In order to facilitate “social” extensions, or extensions that can work between multiple
end-users by sharing some information, local extension cache spaces are also introduced:
Definition 3.11. Local extension cache space
The local extension cache space is a data item that is controlled by the web application
platform, and is denoted as CE,u for a given extension E and user u. The local extension
cache space serves as an area for a local extension component to store data and information required for the extension to operate. While the web application platform can
always choose to arbitrarily deny any read and write operations, by default the following
hold: wĒ(CE,u) = 1, wE′(CE,u) = 1, pĒ(CE,u) = ∅, and pE′(CE,u) = {⟨always, CE,u⟩}.
These default settings allow both extension components to write to the shared extension cache, but prohibit the remote extension component from being able to read from it.
The local extension component can always read from the cache with full visibility. Furthermore, provisions are made so that the web application platform can always choose to
deny a read or write request. This allows the model to take into consideration situations
that may arise when the extension uses up all its allocated space in its local extension
cache, or if the web application platform chooses to not grant any local storage space to
extensions.
Consequently, the local extension cache space serves as a strictly inbound information
flow area, allowing for the remote component of a third-party extension to write to it but
prohibiting it from reading from it. Additionally as there is no requirement for unique
and separate data storage, a third-party could, for example, cache the same data globally
across all instances of their extensions.
Finally, before the threat of information leakage can be introduced and countermeasures discussed, the sets of all visible and writable data must be introduced:
Definition 3.12. The set of all visible data
Let VA,u ⊆ Su represent the set of all visible data (to some extent) for a particular
end-user u and to a particular principal A. Then, VA,u = {s | s ∈ Su, pA(s) ≠ ∅}.
Definition 3.13. The set of all writable data
Let WA,u ⊆ Su represent the set of all writable or alterable data for a particular
end-user u and to a particular principal A. Then, WA,u = {s | s ∈ Su, wA(s) ≥ 0}.
Note that W is not necessarily a subset of V (for any given A or u), and depending on
the particulars, this may allow for situations where a Bell-LaPadula [82] or similar model
for access control could be followed.
3.2.2 Information leakage
Enforcing read and write access through privacy-policies and write-policies as defined in
the previous definitions is sufficient for a “naïve” implementation of the Virtual Faraday
Cage: principals can only read or write data if an explicit policy exists allowing them to
do so, or if explicit end-user permission is obtained in the process. However, what if an
end-user grants permissions such that there is an overlap of readable and writable data
between a principal outside of a sandbox (e.g., Ē) and a principal that is within it (e.g.,
E′)?
As it turns out, if a strict separation of visibility and writability is not enforced between remote and local extension components, it is possible to reveal data that privacy
policies do not permit, thus constituting information leakage and a privacy violation.
Observation 3.1. Simply enforcing privacy and write-access policies is insufficient for
preventing information leakage.
Explanation: It is sufficient to demonstrate that it is possible for an extension to
exist such that an end-user prohibits a third-party’s access to some confidential data, but
for that third-party to gain access to it through the use of a local extension component.
Suppose an end-user u installs an extension E which has remote (Ē) and local (E′)
components. Since the end-user has full control over how they choose to share or limit
their sensitive information revelation to third-parties, the end-user prohibits Ē from
accessing some data s1, but allows it to access some data s2. The end-user also
allows E′ to access s1 and write to s2. At this point, pĒ(s1) = ∅, meaning that Ē
should be unable to view s1, and consequently the third-party would not be privy to that
information either.
However, |pE′(s1)| > 0, meaning that E′ can read s1. Because it can also write to
s2, it can now leak some information about s1 to Ē by encoding it and writing to s2,
consequently leaking information back to the third-party.
While it may be tempting to argue that the example above is simply a case of poor
policy choices, the fact that it can occur despite the lack of an explicit privacy-policy
allowing for it indicates that the system as defined is insufficient for preventing information leakage. Aside from the trivial case where E′ has the same access restrictions as Ē
(and thus, access to the exact same data and capabilities), if E′ is granted any additional
read capabilities then it must be ensured that E′ can never leak this data back to Ē and
thus back to the third-party. The next observation examines the requirements to prevent
this.
Observation 3.2. In order for a single extension to be prohibited from leaking information back to the third-party, its local component must be prohibited from writing to any
data that its remote component can read.
Explanation: If ∃s ∈ Su such that pĒ(s) = ∅ and |pE′(s)| > 0 (the non-trivial
case), then there exists a possibility for information to leak from E′ to Ē, as shown in
Observation 3.1. To prevent this, E′ must be prohibited from writing to any data that
Ē can read. More formally, the following should hold:
∀s ∈ Su: |pĒ(s)| > 0 =⇒ wE′(s) = −1
Another way of representing this is to think of it as prohibiting communication
overlaps between Ē and E′. Specifically, VĒ,u ∩ WE′,u = ∅ must hold. Thus, any
data that is visible to a remote component cannot be written to by a local component,
preventing information leakage from local components. By ensuring this, for any given
extension E, E′ would be unable to leak information back to the third-party.
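As an illustration of how a platform might enforce this condition, the following sketch checks that the set of data visible to the remote component and the set writable by the local component do not overlap; representing the sets as collections of data URIs is an assumption of this sketch, not part of the model.

from typing import Set

def violates_no_overlap(visible_to_remote: Set[str],
                        writable_by_local: Set[str]) -> bool:
    """True if the remote component could read anything the local one can write,
    i.e. if the sets V and W for the two components overlap (Observation 3.2)."""
    return bool(visible_to_remote & writable_by_local)

V_remote = {"data://u/profile/age", "data://u/posts"}
W_local = {"data://u/private/notes"}
print(violates_no_overlap(V_remote, W_local))                       # False: safe
print(violates_no_overlap(V_remote, W_local | {"data://u/posts"}))  # True: possible leak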
However, while necessary, this condition alone is insufficient for preventing all possible ways for information leakage to occur. Colluding third-parties and their multiple
extensions with differing and overlapping Vs and Ws for a given end-user u could bypass
this requirement, as it only requires that this hold for a single extension at a given point in
time.
This could be fixed by keeping track of which s has ever been written to by a given
extension Ei ’s local component and then denying a read-request from any other extension
Ej’s remote component (where i ≠ j), but this approach is analogous to moving s to a
subset of Su that is unreadable for remote extension components. Because of this, the
Virtual Faraday Cage defines a subset of sensitive data that is specifically reserved for
local extension components to write to, and explicitly prohibits remote components from
reading such data. Similarly, local extension components are restricted from writing to
data outside of this subset.
Definition 3.14. Private data
Let Xu represent the set of private data belonging to an end-user u, where Xu ⊆ Su .
Then ∀s ∈ Xu and ∀Ē, pĒ(s) = ∅. In other words, no remote extension component can
access any data in Xu .
This thesis makes a distinction between private and sensitive data – for both architectural reasons, as well as functional: certain types of information you are willing to
reveal to certain parties under certain circumstances, but other types of information you
may choose to keep private to all parties. For instance, your name and gender may be
considered sensitive information, and you may be willing to reveal one or both to third-parties in certain circumstances – but your sexual orientation or religious affiliation may
be something you consider more private.
Definition 3.15. Revised Local Extension Components
In Definition 3.4 extensions were defined and local extension components were introduced. Now a new constraint is introduced to all local extension components: No
local extension component can write to data outside of a user’s private data set. This is
represented by: ∀si ∈ Su \ Xu, ∀E′, wE′(si) = −1.
Defining private data that is accessible only to local extension components allows for
a strict separation of data; a locally running extension component E′ can never taint data visible
to some remote extension component Ē, as E′ should be strictly restricted to writing only to
data in Xu . Furthermore, if data in Xu can never be removed from Xu , then it will never
be possible for any third-party to learn anything about data within Xu . This is because
Definitions 3.14 and 3.15 prevent that data from being revealed to remote extension
components, which are the only components of an extension capable of communicating
back to third-parties.
By default, private data can never be declassified. Despite this, the Virtual Faraday
Cage model does not explicitly prohibit an end-user from declassifying private data, and
this does not break any of the previous observations so long as the assumption is that
the end-user is the only entity that can know if data should be kept private or not.
Definition 3.16. Visible and writable private data
Let V′A,u ⊆ Xu represent the set of all visible private data (to some extent) for a
particular end-user u and to a particular principal A. Then, V′A,u = {s | s ∈ Xu, pA(s) ≠
∅}. In other words, V′A,u is the subset of private data where there exists a privacy-policy
relative to A.
Let W′A,u ⊆ Xu represent the set of all writable or alterable private data for a particular end-user u and to a particular principal A. Then, W′A,u = {s | s ∈ Xu, wA(s) ≥ 0}.
In other words, W′A,u is the subset of private data where there exists a write-policy that
may allow for A to write to that data.
Definition 3.17. Basic privacy violations
A basic privacy violation occurs when a third-party θ obtains data that it should not
have access to. More formally: Given {Ea1 , ..., Eak } where o(Eai ) = θ for all 1 ≤ i ≤ k,
and θ obtains any knowledge about s ∈ Su: if ∄Eg ∈ {Ea1, ..., Eak} such that pEg(s) ≠ ∅,
then a basic privacy violation has occurred.
This definition is sufficient to cover obvious examples of privacy violations: if an end-user u did not grant authorization to view some data si ∈ Su to any extensions owned
by a third-party θ, but that third-party somehow got access to it – then a basic privacy
violation has occurred.
However, the definition of a basic privacy violation could be considered “weak” in
that it may be possible for a third-party to obtain full knowledge about s even though it
is only authorized for partial knowledge. As a consequence, generalized privacy violations
are also defined.
Definition 3.18. Generalized privacy violations
A generalized privacy violation occurs when a third-party θ obtains a view of data
that it does not have access to, or otherwise is unable to derive from the data it has
access to. More formally: Given {Ea1 , ..., Eak } where o(Eai ) = θ for all 1 ≤ i ≤ k, and θ
obtains a view v′ of s ∈ Su: if ∄Eg ∈ {Ea1, ..., Eak} such that ∃⟨access-type, v⟩ ∈ pEg(s)
where v′ ← v, then a generalized privacy violation has occurred.
Note that transforms are defined as deterministic mappings for data, and that the
definition of generalized privacy violations only covers deterministic transforms. This
means that if non-deterministic transforms are used, it may be possible for a privacy violation to go undetected. Furthermore, the verb “obtains” refers explicitly to obtaining
information from the web application platform. If a third-party θ guesses the correct
value of data, it is not a violation of privacy by Definitions 3.17 and 3.18. However, if
non-deterministic transforms were allowed, this would require alteration of the definition
of privacy violations so that they would encompass the potential for a third party θ to
obtain an unauthorized view of data through the platform with a probability better than
what is granted through the end-user’s privacy policies.
Observation 3.3. Non-private data (s ∈ Su, but s ∉ Xu) cannot be protected against
privacy violations.
Explanation: Let θ1 and θ2 be two third-parties with their own sets of extensions,
E1 = {Ea1 , ..., Eak } and E2 = {Eb1 , ..., Ebk }, where all extensions in E1 are owned by θ1
and all extensions in E2 are owned by θ2 . Let s ∈ Su be data that is not shared with
any extensions in E1 (that is, pEai(s) = ∅ for all Eai ∈ E1), but s is shared with some
extension in E2 . Thus, θ2 knows the value of s, and by definition, can share this value
with θ1 . This type of attack is called a collusion attack, and serves to emphasize that for
security purposes one third-party is indistinguishable from another.
Consequently, any data that is not explicitly kept private from all third-parties and
their remote extension components can be shared with other third-parties, violating end-user privacy policies and preferences.
Because Observation 3.3 has demonstrated that non-private data can never be fully
protected from privacy violations, it is important to define a more specific type of privacy
violation that relates only to private data:
Definition 3.19. Critical privacy violations
A critical privacy violation is a privacy violation that occurs when the data item in
question is also a member of the private data set for that user.
Given {Ea1, ..., Eak} where o(Eai) = θ for all 1 ≤ i ≤ k: if θ obtains knowledge about
s ∈ Xu , then a critical privacy violation has occurred.
Proposition 3.1. Abiding by privacy-policies and write-policies prevents critical privacy
violations
Proof: Suppose a third-party θ has obtained information about s ∈ Xu . Because s
can only be read by local extension components, this implies that at least one local extension component has accessed s. However, these local extension components can only
write to other data in Xu , or to the local extension cache space, which is not readable
by remote extension components. Thus, either a remote extension component can read
s ∈ Xu , or a remote extension component can read the local extension cache space. This
is a contradiction of the specifications and definitions of both private data (Definition
3.14) as well as local extension cache spaces (Definition 3.11). Consequently, critical
privacy violations are not possible within the Virtual Faraday Cage.
Note that enabling non-deterministic transforms has no effect on the prevention of
critical privacy violations.
3.3 Summary
This chapter presented the theoretical model used by the Virtual Faraday Cage. In
Section 3.1 the vocabulary used by the model and the Virtual Faraday Cage is introduced
and defined, and in Section 3.2 the formal model is introduced.
Besides introducing the vocabulary used in the remainder of this thesis, this chapter
also provided a fundamental security guarantee in Proposition 3.1. Specifically, by marking a subset of end-user data as private, and by abiding by the established rules such
as privacy-policies and write-policies, one can prevent critical privacy violations within a
given system. This result is the basis for the Virtual Faraday Cage’s privacy guarantees,
and is built on in the next chapter.
The next chapter introduces the Virtual Faraday Cage, the main contribution of this
thesis.
Chapter 4
Architecture
This chapter presents the Virtual Faraday Cage, a new architecture for extending web
application platforms with third-party extensions. It presents an overview of the architecture, its features, as well as high-level and low-level protocol information. A proof-of-concept implementation of the Virtual Faraday Cage is also described in this chapter.
4.1 Preamble
In electrical engineering, a Faraday Cage is a structure that inhibits radio communication
and other forms of electromagnetic transmissions between devices within the cage and
devices outside. Consequently, a Faraday Cage can be thought of as restricting information flow between the interior and exterior of the cage. Faraday Cages are used for
device testing, and for hardening military, government, and corporate facilities against
both electromagnetic-based attacks as well as information leakage. The Virtual Faraday
Cage simulates this by placing untrusted extension code within a sandbox and inhibiting
its communication with any entities outside of the sandbox.
The most significant difference between the traditional architecture for web application platforms and their third-party extensions, and the Virtual Faraday Cage’s architecture,
is that the latter applies information flow control to how information is transmitted between the platform and third-parties. In particular, by utilizing a sandboxing mechanism,
it becomes possible to run third-party code that can be guaranteed to run ‘safely’: the untrusted code is limited in computational capabilities and can only access certain method
calls and programming language capabilities.
To help clarify the specific role of the Virtual Faraday Cage, it helps to separate the
architectural components and actors into two different areas: the external and internal
realms. The external realm (relative to a given web application platform and a single
end-user) consists of all third-parties and their extensions. The internal realm consists
of infrastructure associated with running the web application platform, along with the
end-user’s own system (see Figure 4.1).
Figure 4.1: Internal and external realms of a platform.
The Virtual Faraday Cage focuses primarily on countering potential privacy violations
from the external realm, and consequently makes the assumption that all principals in the
internal realm are ‘well-behaved’ by conforming to the model. In other words, the web
application platform can enforce the privacy-policies and write-policies of all principals
acting within the internal realm. On the other hand, third-parties are free to collude with
each other and lie to end-users and the web application platform. Despite this however,
the Virtual Faraday Cage can still provide some security and privacy guarantees.
The Virtual Faraday Cage splits traditionally remotely hosted extensions into two
components, one that is located remotely and outside of the platform’s control, and
one that is hosted locally by the platform (See Figure 4.2). While all extensions to the
platform require at least one of these components, they are not required to have both.
This means that existing extensions do not have to change their architecture significantly
to work with a Virtual Faraday Cage based platform. Furthermore, developers without
their own server infrastructure could write extensions that only run locally on the web
application platform itself.
Figure 4.2: Comparison of traditional extension and Virtual Faraday Cage extension
architectures. Lines and arrows indicate information flow, dotted lines are implicit.
4.2 Features
Apart from security and privacy properties, the Virtual Faraday Cage, and its implementation, offers several advantageous and distinct features: hashed IDs, opaque IDs,
callbacks, seamless remote procedure calls, and interface reconstruction.
4.2.1 Data URIs
While neither a novel nor new feature, the Virtual Faraday Cage utilizes Uniform Resource Identifiers (URIs) to capture data hierarchy and organization. For example,
data://johndoe/profile/age could be a URI representing an “age” data value for a particular end-user, “John Doe”. Facebook utilizes a similar URI structure for data, for example: https://graph.facebook.com/<user-id> would result in returned data-items
as keyed entries in a JSON dictionary [115], such as “first_name” and “last_name”.
While the Virtual Faraday Cage does not specify a global URI scheme or ontology
for data, it is conceivable that a universal scheme could be constructed. On the other
hand, allowing individual web application platforms (or even categories of platforms) to
decide on their own URI scheme allows for these platforms to easily adapt the Virtual
Faraday Cage to fit their existing systems.
Data URIs are covered more thoroughly in Section 4.4.
4.2.2 Hashed IDs and Opaque IDs
All IDs used within the implementation are hashed IDs: they are represented in a way
that does not give out any information about either the number of records in the database,
nor the sequential order of a particular record. Specifically, hashed IDs (h ∈ Z_{2^256}) are
outputs from SHA-256, though any suitable cryptographic hash function can be used (as
discussed in Chapter 5). In more generic terms, a user-id no longer conveys information
about when that user registered relative to another user, nor does it convey information
about how many users there might be within an entire web application. Similarly, when
applied to things such as posts or comments, ascertaining how active or inactive a user
may be is no longer possible through a “side channel” like ID numbers.
Opaque IDs are extension-specific IDs for other objects within a web application
platform. Within this implementation, opaque IDs exist only for end-users, however a
more complete system would apply the same technique for all other objects. In the proof-of-concept, this is a computation- and storage-intensive task, and is omitted. Opaque
IDs help inhibit one extension from matching user IDs with another extension because a
given user has a different opaque ID for two different extensions. Consequently, opaque
IDs may be considered as an aid to privacy-preservation.
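A sketch of both ID types using Python's hashlib; the exact inputs to the hashes and the use of a per-platform secret to keep opaque IDs unlinkable across extensions are assumptions of this sketch, not the thesis' specification.

import hashlib
import os

PLATFORM_SECRET = os.urandom(32)  # assumed per-platform secret salt

def hashed_id(internal_record_key: str) -> str:
    """A SHA-256 based ID that conveys neither record count nor ordering.
    (A real deployment would also mix in a secret or random value.)"""
    return hashlib.sha256(internal_record_key.encode("utf-8")).hexdigest()

def opaque_id(user_id: str, extension_id: str) -> str:
    """An extension-specific ID for a user: the same user maps to different
    IDs for different extensions, so extensions cannot join their records."""
    material = PLATFORM_SECRET + user_id.encode() + b"|" + extension_id.encode()
    return hashlib.sha256(material).hexdigest()

print(opaque_id("johndoe", "ext-1") == opaque_id("johndoe", "ext-2"))  # False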
4.2.3 Callbacks
For all API function calls, callback information can be passed as an optional parameter
so that the Virtual Faraday Cage can send a [potentially] delayed response to the third-party extension. This provides two benefits: 1) third-party extensions do not have to
wait for processing to occur on the back-end, or wait for end-user input, and 2) callbacks
can be delayed or dropped, making it unclear to the third-party when the end-user was
online and denied a request. In the latter case, this helps prevent leakage of information
regarding when a user was online or not.
Callbacks also allow for innovative application of privacy policies and data-views: for
a given set of privacy-policies, a less-specific view of data could be provided automatically,
and then later updated if the end-user chooses to allow the extension to see a more specific
view of that data.
As an example, suppose that an end-user authorizes a third-party extension to always
be able to see the current city that they are located in, but requires that third-party
extension to obtain explicit on-request consent for more specific data. The extension
could then always keep track of the user’s general area, but whenever the user wants
to let the extension know exactly where he is, the extension is granted that through a
callback with the specific value for that end-user’s location. In the context of a map-like
extension, or geo-locational social networking – the third-party always knows the general
area that the user is in, but can only know exactly where the user is when the user chooses
to, by actively using the extension. Because the third-party would always be able to get
broad geo-locational information, a hypothetical map-view could easily be kept open –
and would be able to update itself when (and if) the end-user allows it.
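A sketch of the callback pattern: the extension supplies a callback ID with its request, the platform answers later (or never), and the possible delay hides whether the end-user was online. The class and names below are hypothetical and do not reflect the LXRP wire format.

import uuid
from typing import Any, Callable, Dict

class CallbackRegistry:
    """Pending requests keyed by callback ID (an illustrative sketch)."""
    def __init__(self) -> None:
        self._pending: Dict[str, Callable[[Any], None]] = {}

    def register(self, handler: Callable[[Any], None]) -> str:
        callback_id = uuid.uuid4().hex
        self._pending[callback_id] = handler
        return callback_id

    def deliver(self, callback_id: str, value: Any) -> None:
        # The platform may call this much later, or drop the callback entirely.
        handler = self._pending.pop(callback_id, None)
        if handler is not None:
            handler(value)

registry = CallbackRegistry()
cb_id = registry.register(lambda view: print("updated location:", view))
# ... cb_id is sent along with a read-request; later, if the end-user consents:
registry.deliver(cb_id, ("Canada", "Alberta", "Calgary"))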
4.2.4 Seamless Remote Procedure Calls and Interface Reconstruction
As an architecture, the Virtual Faraday Cage can operate with any compatible remote-procedure-call protocol. However, in the development of the Virtual Faraday Cage, the
Lightweight XML Remote-procedure-call Protocol (LXRP) was developed and implemented in Python. Consequently, the following features are specific to implementations
of the Virtual Faraday Cage that are built on LXRP.
Remote Procedure Calls are as seamless as possible within the Virtual Faraday Cage
because LXRP allows for any type of function or method to be exposed to remote clients.
Furthermore, custom objects can be serialized and passed between client and server,
allowing for rich functionality beyond passing native data-types. Finally, exceptions
raised on the remote end can be passed directly back to the client, allowing for developers
to write code that utilizes RPC functionality as though it were local. Future work could
allow the direct exposure of objects to clients rather than manually specifying methods,
but this would largely be a stylistic improvement rather than a functional one.
Interface reconstruction allows for any web application platform, or agent acting on
their behalf, to immediately access remote functions locally upon connection to the extension without needing to necessarily access external resources to obtain documentation
on what particular functionality is available beforehand. Specifically, interface reconstruction brings the remote objects and functionality to the local scope of either the
developer or the application using the interface.
This differs from existing [Python] RPC libraries in that the exact function interface is reconstructed locally. This allows for syntax errors to be caught immediately,
and documentation to be accessed locally. Future work could also allow for local type-verification, so many exceptions and errors can be caught locally without needing to
query the remote server. Aside from these benefits, interface reconstruction also aids
developers and testers, as they can easily see what functions are available to them and
directly call functions through more natural syntax such as “api.myfunc()” rather than
something like “call("myfunc", [])” – as these functions have been reconstructed in
the local scope.
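LXRP itself is not reproduced here; the following generic Python sketch only illustrates the idea of interface reconstruction, building local stubs from a served interface description. The description format and the call_remote transport are assumptions of this sketch.

from typing import Any, Callable, Dict, List

def reconstruct_interface(description: Dict[str, List[str]],
                          call_remote: Callable[..., Any]) -> Any:
    """Build a local proxy whose methods mirror a remote interface.

    `description` maps remote function names to their parameter names;
    `call_remote(name, *args)` is whatever transport performs the actual RPC.
    """
    class Proxy:
        pass

    proxy = Proxy()
    for name, params in description.items():
        def make_stub(fname: str, fparams: List[str]) -> Callable[..., Any]:
            def stub(*args: Any) -> Any:
                if len(args) != len(fparams):   # caught locally, before any RPC
                    raise TypeError(f"{fname}() expects {len(fparams)} argument(s)")
                return call_remote(fname, *args)
            stub.__name__ = fname
            stub.__doc__ = f"Remote function {fname}({', '.join(fparams)})"
            return stub
        setattr(proxy, name, make_stub(name, params))
    return proxy

api = reconstruct_interface({"myfunc": ["x"]},
                            call_remote=lambda name, *a: f"called {name}{a}")
print(api.myfunc(42))   # natural syntax instead of call("myfunc", [42])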
4.3 Information Flow Control
In Myers and Liskov’s Decentralized Information Flow Control model [83], the Virtual
Faraday Cage can best be represented by Figure 4.3. In their model, information flow
policies are represented by {O1 : R1 ; ...; On : Rn }, where Ri is the set of principals that
can read the data as set by the owner Oi . The owners of information can independently
specify the information flow policies for their information, and the effective reader set is
the intersection of each Ri . This means that a reader r can only access a particular piece
of information if all owners authorize it. Trusted agents (represented by a double-border)
can act on behalf of other principals within the system and can declassify policies.
While the Virtual Faraday Cage makes use of some similar concepts in information
flow control, the level of sophistication in the Virtual Faraday Cage is far less than a
complete and broadly applicable model such as the one presented by Myers and Liskov.
The Virtual Faraday Cage assumes that the only untrusted components in a web application platform are third-party extensions, and information flow is controlled in the same
general manner for each extension. Extension components running locally that examine
private data can never send any information back to the third-party. As the Virtual
Faraday Cage is relatively simple in its use of information flow control, the Decentralized
Information Flow Control model is not used to describe the Virtual Faraday Cage any
further in this chapter.
Figure 4.3: The Virtual Faraday Cage modeled using Decentralized Information Flow
Control.
In the Virtual Faraday Cage, local extension components could have unrestricted read
access to all sensitive data, but would be unable to communicate that knowledge back
to the third-party. Local extension components are allowed to write only to their cache
space and to the end-user’s private data, as enforced by being run inside of a sandbox,
both of which are unreadable by the remote extension component. Write operations by
principals other than the owner of the data require explicit owner approval.
Proposition 4.1. The Virtual Faraday Cage permits only inbound information flow to
Local Extension Components
Proof: As Proposition 3.1 states, third-parties can never obtain information from
local extension components because there is no capability for them to write to anything
other than private data or local extension cache spaces. On the other hand, remote extension components may have write or alter capabilities on a particular data-item s ∈ Su ,
which in turn could be read by a local extension component. Additionally, each extension
has a local extension cache space which is writable (but not readable) by the remote extension component. Consequently, information flow to the Local Extension Component
is exclusively inbound.
Another diagram outlining the information flow in the Virtual Faraday Cage is provided in Figure 4.4.
Figure 4.4: Information flow within the Virtual Faraday Cage. Dotted lines indicate the
possibility of flow.
4.4 URIs
Uniform Resource Identifiers (URIs) provide a richer context for data indices, and allow
for hierarchies to be plainly visible when referencing data. While a data-item referenced at
ID 92131 might be the same as the data referenced by a URI at data://johndoe/location,
the latter method clearly shows that the data-item belongs to an end-user with an ID of
johndoe.
The URI referencing scheme within the site-specific implementation allows for one
data-item to be referenced at multiple URIs – for instance, allowing data://johndoe/friends/janedoe to be the same as data://janedoe. Similarly, if a data-item is owned
by multiple owners, this allows for that data-item to be referenced at two different URIs
that both demonstrate an ownership relationship from within the URI.
4.4.1 Domains
Domains in a URI represent principals within the Virtual Faraday Cage, for example
end-users or extensions. Domains are the hexadecimal representation of the hashed and
opaque IDs of each principal. Consequently, instead of accessing data://johndoe/ you
would be accessing something like data://33036efd85d83e9b59496088a0745dca7a6cd69774c7df62af503063fa20c89a/ instead.
4.4.2 Paths
URI paths should reflect the data hierarchy for a given web application platform. Ideally,
the less variation there is in paths across different web application platforms, the easier
it becomes to develop cross-platform extensions. For example, while paths such as /name
may suffice for many platforms, moving an end-user’s ‘name’ to /profile/name may
allow for less risk of conflict between different web application platforms – both specific
implementations, as well as different categories.
While this thesis does not present a strict hierarchy or ontology as a proposal for all
implementations of the Virtual Faraday Cage, the proof-of-concept implementation used
the following paths to represent end-user data:
Data Paths on a Social Network
• /profile/first-name - First name of the end-user
• /profile/last-name - Last name of the end-user
• /profile/gender - Gender
• /profile/age - Returns the end-user’s age
• /profile/location - Returns the end-user’s location
• /posts - Returns a list of post IDs
• /posts/<id> - Returns the content of the post
• /friends - Returns a list of friend IDs
• /friends/<id> - Returns the friend’s name (as seen by the end-user)
The proof-of-concept implementation was also able to perform URI translation – allowing it to switch between a fake social network for testing, and Facebook. In order
to interface with the latter, URIs are “translated” from the Facebook URI scheme into
a scheme matching the other. If this technique is applied to other sites, it would be
possible for the same URI to be used for Facebook, MySpace, Google+, or any other
social networking platform without requiring major changes for an extension running
on top of the VFC API. For example, an extension could request to read the URI
profile://736046350/name, and this URI would then be translated into https://graph.facebook.com/736046350/?fields=name and the appropriate data type returned. By
keeping the structure of URIs the same across all web application platforms, this permits
easier development and deployment of extensions across multiple platforms. Similarly,
this allowed for the same extension to run on both an example social network as well as
on Facebook.
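A sketch of such URI translation; the mapping table and helper below are illustrative only, loosely following the attribute translation listed further below, and are not the proof-of-concept's actual code.

# Hypothetical translation table from VFC data paths to Facebook Graph API fields.
PATH_TO_FB_FIELD = {
    "/profile/first-name": "first_name",
    "/profile/last-name": "last_name",
    "/profile/gender": "gender",
    "/profile/location": "location",
}

def translate_to_graph_url(vfc_uri: str) -> str:
    """Turn a VFC-style data URI into a Facebook Graph API URL."""
    scheme_stripped = vfc_uri.split("://", 1)[1]      # "<user-id>/profile/..."
    user_id, _, path = scheme_stripped.partition("/")
    field = PATH_TO_FB_FIELD["/" + path]
    return f"https://graph.facebook.com/{user_id}/?fields={field}"

print(translate_to_graph_url("data://736046350/profile/first-name"))
# -> https://graph.facebook.com/736046350/?fields=first_name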
In the proof-of-concept, a Facebook wrapper was constructed which allowed access to
the Facebook Graph API [116], and the following attributes were made available from
Facebook data (as per User object properties [115]):
Facebook URI and Attribute Translation
• id → null - Facebook user ID¹
• first_name → /profile/first-name - First name of the end-user
• last_name → /profile/last-name - Last name of the end-user
• middle_name → /profile/middle-name - End-user’s middle name
• gender → /profile/gender - Gender
• locale → /profile/locale - Returns the end-user’s locale (e.g., “en_US”)
• birthday → /profile/birthday - Returns the end-user’s birthday (note that
/profile/age can be generated from this)
• location → /profile/location - Returns the end-user’s location
• /friends → /friends - Returns a list of the end-user’s friends
• /friends/<id> → /friends/<id> - Returns the friend’s name
• /statuses → /posts - Returns a list of status IDs
• /statuses/<id> → /posts/<id> - Returns the status content
¹ In the Virtual Faraday Cage, “true” IDs are not revealed, and consequently this was omitted from URI translation.
4.5 Application Programming Interfaces
The Application Programming Interfaces (APIs) of the Virtual Faraday Cage represent
the interfaces with which third-party extensions can interact with the underlying web
application platform and end-user data and vice versa. The API embodies an access-controlled asynchronous remote procedure call interface between third-parties and the
underlying web application platform – and end-user data. Specifically, the API exists to
support Propositions 3.1 and 4.1.
4.5.1 Web Application Platform API
The Platform API consists of six methods. In practice, it is possible to build a higher-order API on top of this ‘base API’; however, it should suffice to use only these six
methods.
Read-Request
A read-request is a function r : URI × Z_15 × ({0, 1}^256 ∪ {NULL}) −→ Data ∪ {NULL} that
can return end-user data if this request is allowed by an explicit end-user policy. Read-requests are passed the URI of the data to be read, a “priority code”, and an optional
callback ID. The priority-code represents an integer that maps to a specific ordering of
privacy-policy types, allowing a read-request to specify priorities for data views. For
instance, a read-request could prioritize request-on-demand over always, meaning that
the data-view for always would be returned if the data view for request-on-demand
fails. In total, there are 15 possible priority-codes corresponding to all possible orderings
of length ≤ 3 of the set {request-on-demand, always, single-access}. Figure 4.5 illustrates the detailed steps of how a read request is handled within the Virtual Faraday
Cage.
Figure 4.5: Process and steps for answering a read request from a principal. Start and
end positions have bold borders, ‘no’ paths in decisions are dotted.
If the request is authorized immediately, the data will be returned immediately; otherwise the only way to receive the data is through a callback. Alternatively, if one privacy-policy type can return data immediately, that view is returned to the requester, and the
other view may be returned later if it becomes authorized (e.g., request-on-demand).
The reason for requiring a callback is so that third-party extensions are not blocked
while waiting for an end-user to authorize a request. Furthermore, by having callbacks,
it becomes possible for some level of ambiguity regarding whether or not a particular
end-user is online. While callback IDs are not required, not using them would prohibit
updates from end-users with potentially more detailed or more up-to-date information.
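A sketch of how a platform might dispatch a read-request across a priority ordering of policy types, answering immediately where an always policy exists and deferring to a callback otherwise. The priority table, the policies_for and fetch parameters, and the ask_end_user_later helper are assumptions of this sketch, not the thesis' API.

from typing import Callable, Optional, Sequence, Tuple

# A priority code is assumed to index into a table of orderings over the three
# policy types; only two illustrative orderings of the 15 are shown here.
PRIORITY_TABLE: Sequence[Tuple[str, ...]] = (
    ("request-on-demand", "always", "single-access"),
    ("always", "request-on-demand"),
    # ... 15 orderings in total
)

def ask_end_user_later(uri, view, callback_id):
    """Stub: queue a consent prompt; any answer arrives via sendCallback."""
    pass

def read_request(uri: str,
                 priority_code: int,
                 policies_for,                       # p_A(uri): {type: view-fn}
                 fetch: Callable[[str], tuple],
                 callback_id: Optional[str] = None):
    """Return a view immediately if some policy type allows it; otherwise the
    answer (if any) arrives later through the supplied callback ID."""
    ordering = PRIORITY_TABLE[priority_code]
    available = policies_for(uri)
    for policy_type in ordering:
        view = available.get(policy_type)
        if view is None:
            continue
        if policy_type == "always":
            return view(fetch(uri))                  # immediate answer
        if policy_type == "request-on-demand" and callback_id is not None:
            ask_end_user_later(uri, view, callback_id)   # may never fire
        # single-access handling (checking past accesses) omitted in this sketch
    return None

result = read_request(
    "data://johndoe/profile/location", 1,
    policies_for=lambda uri: {"always": lambda s: (s[0],)},
    fetch=lambda uri: ("Canada", "Alberta", "Calgary"))
print(result)   # ('Canada',)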
Write-Request
A write-request is a function w : URI × Data × ({0, 1}^256 ∪ {NULL}) −→ {0, 1, NULL}
that returns True, False, or None when called and passed data to write at a given URI.
An optional callback ID is also available, which will allow for delayed responses from end-users (e.g., if the end-user set write-access to request-on-write). An illustration showing
the process of a principal requesting write access is shown in Figure 4.6.
Figure 4.6: Process and steps for answering a write request from a principal. Start and
end positions have bold borders, ‘no’ paths in decisions are dotted.
Create-Request
A create-request is a function c : URI × Data × ({0, 1}^256 ∪ {NULL}) −→ {0, NULL} ∪ URI
that returns a new URI location, False, or None. It functions in a similar way to a
Write-Request, except that it will return a new URI corresponding to the location of
the newly-created data if the call is successful. An illustration showing the process of a
principal requesting create access is shown in Figure 4.7.
Figure 4.7: Process and steps for answering a create request from a principal. Start and
end positions have bold borders, ‘no’ paths in decisions are dotted.
Delete-Request
A delete-request is a function d : URI × ({0, 1}^256 ∪ {NULL}) −→ {1, NULL} that
returns True, or None when called. A delete-request is passed a URI representing data to
be deleted, and if the request is authorized, a True is returned. An illustration showing
the process of a principal requesting delete access is shown in Figure 4.8.
Figure 4.8: Process and steps for answering a delete request from a principal. Start and
end positions have bold borders, ‘no’ paths in decisions are dotted.
Subscribe-Request
A subscribe-request is a function $s : URI \times (\{0,1\}^{256} \cup \{NULL\}) \longrightarrow \{1, NULL\}$ that
returns True or None when called. Subscribing to a URI allows for the subscribing entity
to be notified when there are changes made to the data at the given URI. Later, when
data is changed, the platform will check the complete list of subscribers for that data
URI and all its parents, and then notify the subscribed principals only if their view of
the data has been altered. An illustration showing the process of a principal requesting
subscription access is shown in Figure 4.9. Another illustration showing the process of
notifying subscribers when data has been altered is shown in Figure 4.10.
Figure 4.9: Process and steps for answering a subscribe request from a principal. Start
and end positions have bold borders, ‘no’ paths in decisions are dotted.
Unsubscribe
An unsubscribe is a function $u : URI \times (\{0,1\}^{256} \cup \{NULL\}) \longrightarrow \{1, NULL\}$ that returns True or None when called. Unsubscribing from a URI removes future notifications for
updates or changes to the data at the given URI. An illustration showing the process of
a principal performing an unsubscribe is shown in Figure 4.11.
Figure 4.10: Process and steps for notifying subscribed principals when data has been
altered. Start and end positions have bold borders, ‘no’ paths in decisions are dotted.
4.5.2 Third-Party Extension API
As each third-party extension is unique, little can be mandated about which functions third-party extension APIs should provide. However, the following function
is considered to be a minimum requirement for all third-party extensions:
Get-Interface
getInterface is a function $i : (\{NULL\} \cup Any) \times (\{0,1\}^{256} \cup \{NULL\}) \times \ldots \longrightarrow Any$ that returns a “user interface” when called, and has a parameter for an optional callback ID.
Additional parameters can also be passed along, if necessary or available. Depending on
the type of web application platform, the interface returned may be an XML document,
a web page, or some other data. If no value is passed to the method, a “default” interface
should be returned.
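A minimal extension-side implementation of Get-Interface might look like the sketch below; the XML layout returned here is hypothetical and is not a format mandated by the Virtual Faraday Cage.

def get_interface(selector=None, callback_id=None, *extra):
    # No value passed: return a "default" interface.
    if selector is None:
        return "<interface><title>EMDB</title><text>Rate and compare movies.</text></interface>"
    if selector == "ratings":
        return "<interface><list source='ratings'/></interface>"
    return "<interface><text>Unknown view requested.</text></interface>"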
Figure 4.11: Process and steps for answering an unsubscribe from a principal. Start and
end positions have bold borders, ‘no’ paths in decisions are dotted.
4.5.3 Shared Methods
In addition to the above APIs, there exist two shared methods that are required by both the web application platform and all valid third-party extensions:
Send-Callback
sendCallback is a function $b : \{0,1\}^{256} \times (\{NULL\} \cup Data) \longrightarrow NULL$ that one party can
use on another party’s API to return the value of a callback if and when it is completed.
This method is implemented in the underlying Remote Procedure Call library.
Verify-Nonce
verifyNonce is a function $v : \{0,1\}^{256} \longrightarrow \{0,1\}$ that returns true or false when called, depending on whether or not the supplied nonce is correct. This is used in the High-Level Protocol for mutual authentication. See Section 4.6 for details.
4.5.4 Relationship with the Theoretical Model
Read-Requests and Subscribe-Requests, and the data that they return, abide by the Privacy-Policies in Definition 3.9. Similarly, Write-Requests, Create-Requests, and Delete-Requests all abide by the Write-Policies in Definition 3.10. As a consequence, all operations utilizing this API must abide by the underlying privacy and write policies as defined
in the theoretical model in Chapter 3. This means that if principals within a system are
forced to use only this API to interact with end-user data, then Proposition 3.1 and
Proposition 4.1 hold.
4.6 High-Level Protocol
The Virtual Faraday Cage’s “High-Level Protocol” (VFC-HLP) specifies how third-party
extensions are interacted with, and how third-party extensions interact with the web
application platform. The VFC-HLP can work with any underlying remote procedure
call mechanism (not just LXRP), as it abstracts the process and only requires an RPC library that can provide both security and callbacks.
This section presents the VFC-HLP specification.
4.6.1 Accessing a Third-Party Extension
Accessing an extension is performed by specifying a URL from which a third-party extension is to be installed, whereupon any remote and/or local components are loaded into
the web application platform and made accessible to the end-user. By default, extensions must be installed over HTTPS connections where certificate verification can be
performed; this ensures that mutual authentication between extensions and the web application platform can be reliably performed.
When a user wishes to install a simple web application extension, the process would not be much different from that of current architectures. As such an extension does not need to perform sensitive operations, there is no need for it to have any local component
whatsoever. When a user wishes to install a more complicated application extension, the
process may become more involved. In particular, that extension’s local component must
be downloaded and run within a sandbox by the web application platform. Upon authorization, the end-user must also specify what sensitive data, if any, the remote extension
component has access to, and similarly, what private data, if any, the local extension
component can access. Figure 4.12 shows an overview of the procedure for authorizing
and accessing a third-party extension for a particular end-user.
Figure 4.12: Steps required for authorizing and accessing a third-party extension.
Specifically, when accessing and authorizing a third-party extension, a VFC-compatible
web application platform acting on behalf of an end-user would query the URL for the extension while sending an X-VFC-Client header identifying itself as interested in obtaining the specifications for the third-party extension.
The third-party extension would then send back a 200 OK response containing an LZMA-compressed, JSON-encoded keyed array representing the extension specifications. These specifications would include both the URL for RPC access to the third-party extension’s remote component API and the URL for downloading the local extension component. Additionally, the types of end-user data requested on install of
the extension are also specified. See Figure 4.13 for an example.
Figure 4.13: The EMDB extension specifications
Third-Party Extension Specifications
The extension specifications returned by a third-party should consist of the following entries (a hypothetical example is sketched after the list):
• name - This is the full name of the third-party extension.
• canonical-url - This is the URL for accessing the third-party extension specifications.
• display-version - This is the version of the third-party extension, as displayed to
end-users.
• privacy-policy - This specifies the URL to the privacy-policy for the third-party
extension.
• owner - This is a keyed sub-array that describes the third-party, consisting of three
parameters: name, short-name (optional), and url.
• local-component - This represents a keyed-sub-array consisting of a url parameter
(the URL to the local-component), as well as a type parameter, which specifies
the local component’s format (e.g., ’plain’, ’lzma’, ’gzip’, etc.)
• remote-component - This is a keyed sub-array consisting of a type parameter and
a url parameter. The type specifies the type of remote-component (e.g., ‘lxrp’),
and the URL specifies its location.
• request - This is a sequence of keyed-arrays, where each keyed-array consists of
a data URI path (path), as well as a data-specific ‘reason’ (reason) for requesting
that specific data.
• purpose - This is a plain-text explanation of what the requested data will be used
for, and why it is being requested.
• notes - (Optional) This entry represents plain-text notes that the third-party extension can choose to pass along.
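Putting these entries together, a specification for the EMDB extension of Section 4.10.5 might look roughly like the following Python keyed array (dictionary) prior to JSON encoding and LZMA compression. Every URL and value shown here is hypothetical.

emdb_spec = {
    "name": "Electronic Movie DataBase",
    "canonical-url": "https://emdb.example.com/app/",
    "display-version": "1.0",
    "privacy-policy": "https://emdb.example.com/privacy/",
    "owner": {"name": "Electronic Movie DataBase Inc.",
              "short-name": "EMDB",
              "url": "https://emdb.example.com/"},
    "local-component": {"url": "https://emdb.example.com/app/local.py",
                        "type": "plain"},
    "remote-component": {"type": "lxrp",
                         "url": "https://emdb.example.com/app/lxrp/"},
    "request": [{"path": "/profile/age",
                 "reason": "Used to tailor movie suggestions."},
                {"path": "/profile/gender",
                 "reason": "Used for aggregate rating statistics."}],
    "purpose": "Store movie ratings and provide personalized comparisons.",
    "notes": "Optional free-form notes from the third-party.",
}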
4.6.2 Mutual Authentication
In order for the Virtual Faraday Cage to operate in a secure manner, mutual authentication and access control are required between the Virtual Faraday Cage and third-party extensions. When a third-party extension receives a connection request, it must
verify that the requesting party really represents an authorized web application platform.
Similarly, when a web application platform receives a request to perform some actions,
it must verify which third-party extension is actually performing the requests.
To facilitate this, when a connection is first received by any party (the ‘receiver’),
that party will then make a connection back to whomever the connecting party (the
‘requester’) claims to be to verify the connection. Because outgoing connections can be
made over SSL, the receiver can be assured that they are connecting to the true originator
of the incoming connection. Once connected to the true originator, a cryptographic
“nonce” (number that is used only once) supplied by the requester will be verified by the
true originator. This will allow the receiver to be assured that the incoming connection
from the requester is authorized. See Figures 4.14 and 4.15 for diagrams showing the
mutual authentication process.
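The verify-back step can be sketched as follows. The HTTPS call, endpoint path, and helper names are assumptions made for illustration; in the actual protocol the check is carried out through the RPC layer's Verify-Nonce method.

import requests   # third-party HTTP library, used here only for brevity

def authenticate_incoming(claimed_url, supplied_nonce, ca_bundle):
    # Connect back to whoever the requester claims to be; verifying the SSL
    # certificate guarantees we are talking to the true originator.
    response = requests.post(claimed_url + "/verify-nonce",
                             data={"nonce": supplied_nonce},
                             verify=ca_bundle, timeout=10)
    # The incoming connection is authorized only if the true originator
    # recognizes the nonce it supposedly supplied.
    return response.status_code == 200 and response.json().get("valid") is True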
If LXRP is the remote-procedure-call protocol of choice, it may be possible to efficiently disregard invalid or unauthorized requests from malicious or misbehaving clients
depending on how the Reference Monitor is implemented. As the authentication method
for VFC-Compliant LXRP servers requires mutual authentication through URLs, invalid
authentication requests can lead to connection attempts to web servers that consume
bandwidth and computational time. It may be possible to mitigate this attack by incorporating secret keys for authorized third-party extensions, or by moving to a user-id/key model instead. For other RPC protocols, there may be other methods available as well.
Figure 4.14: Authenticating an incoming connection from a VFC platform
Figure 4.15: Authenticating an incoming connection from a third-party extension
4.6.3 Privacy by Proxy
“Privacy by Proxy” refers to the ability of the web application platform to act as a proxy
for third-party extensions, allowing for personalization of extensions without needing to
reveal end-user data to third-parties [65]. Within the Virtual Faraday Cage, extensions
that display prompts or require data input from end-users can display data that they
otherwise would not have access to by utilizing a special XML element <value>. For
example, <value from="data://91281/name"/> would be substituted with user 91281’s
name, such as “John Doe”. This would be handled by altering the interfaces returned
from Get-Interface calls.
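On the platform side, this substitution could be performed with a pass such as the following sketch; the regular expression and the resolve helper are illustrative assumptions.

import re

VALUE_ELEMENT = re.compile(r'<value\s+from="([^"]+)"\s*/>')

def apply_privacy_by_proxy(interface_xml, resolve):
    # resolve(uri) is assumed to return the platform-held value for a data URI,
    # subject to the end-user's policies; the extension never sees the result.
    return VALUE_ELEMENT.sub(lambda match: resolve(match.group(1)), interface_xml)

# apply_privacy_by_proxy('<p>Hello, <value from="data://91281/name"/></p>',
#                        lambda uri: "John Doe")  ->  '<p>Hello, John Doe</p>'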
4.7 Remote Procedure Calls
4.7.1 Overview
The Virtual Faraday Cage uses the Lightweight XML RPC Protocol (LXRP) to handle all RPC calls between third-parties and the web application platform, or vice versa.
LXRP is a relatively simple protocol that allows for RPC clients to easily interface with a
given RPC server, and automatically discover the available methods and API documentation on connect. Similarly, LXRP allows for developers to easily expose methods and
functionality to clients while restricting access by using a “Reference Monitor” object.
Reference monitors were described by Anderson [117] as a supervisory system that
“mediates each reference made by each program in execution by checking the proposed
access against a list of accesses authorized for that user.” Anderson states that in order
for a Reference Monitor to work, the reference validation mechanism must be tamper-proof, must always be invoked, and must be small enough to be tested.
LXRP operates over the HTTP protocol, and sends messages in XML format via
the HTTP POST method.
Clients will also, by default, send an additional header,
X-LXRP-Client, allowing servers to identify them as legitimate LXRP clients. This allows LXRP servers to use the same URI for a web browser and an LXRP client.
Within the context of the Virtual Faraday Cage, this allows for third-party extensions
to use the same URL for an informational page as the URL for adding the extension
to a web application platform. For example, a request to http://www.imdb.com/app/
could be handled differently depending on whether or not the client was a web browser
or an LXRP client. A web browser client could be given information on how to add the
extension to a web application platform such as Facebook, and an LXRP client would
interface directly with the LXRP server.
4.7.2 Protocol Requirements
In developing the RPC protocol for use in the Virtual Faraday Cage, a few key attributes
were identified:
• Encryption – All calls to the API had to be capable of being performed over SSL
between servers with valid certificates.
• Authorization – All API calls had to be authorized through a cryptographic
token.
• Authentication – There had to be a way to authenticate third-parties to ensure
that they were who they claimed to be, and vice versa through mutual authentication.
• Resilience – The protocol had to be resilient against malicious users and, ideally,
denial-of-service attacks.
• Extensibility – The protocol had to be extensible enough to add new API functions as they became necessary.
4.7.3 Requirement Fulfillment
The Lightweight XML RPC Protocol (LXRP) was designed to incorporate these
attributes. LXRP provides two main objects to developers: the API Interface and the
Reference Monitor. To create an API instance, a developer must pass a list of allowed functions
which can be called remotely, and a Reference Monitor instance which will act as the
decision maker behind access to the API.
LXRP operates in a secure mode by default, forcing HTTPS connections and verifying SSL certificates according to a certificate chain passed to it on initialization. Upon
initializing a connection to an LXRP API interface, a client passes a keyed array of connection credentials to the LXRP server, which then queries the Reference Monitor before
issuing a cryptographic token to the client. These connection credentials can be anything
from a username and password, to other data such as biometric signatures.
Authorization for function calls is verified by the Reference Monitor for all RPC calls; this allows for partial exposure of functionality to “lower clearance” clients, as well as other access control possibilities. Authorization is based on a cryptographic token $t$ chosen uniformly at random such that $t \in \{0,1\}^{256}$. Consequently, the difficulty of forging a function call request can be made comparable to, or better than, what is expected from
current security practices on the web.
Extensibility within LXRP is inherent, as there are no restrictions on exposing new methods through it. LXRP exposes a set of functions to clients, which access them through a Resource object. These functions can be public or “private” methods, allowing for the easy addition of “support-specific” methods such as cryptographic nonce verification and callback support. Thus, the primary functionality of an API exposed through LXRP can be given through public methods, and any additional LXRP- or VFC-specific functionality can be added through private methods.
LXRP servers also support asynchronous remote procedure calls between LXRP servers, whereby the result of a method call is later sent back to the requesting server as a callback.
If the client to the first server requests a callback and passes along information about
the second server and any authentication tokens needed to perform the callback, then
the first server will dispatch a client to perform the callback if and when that function
evaluation becomes available. Because a callback may not be guaranteed, LXRP clients
should specify a maximum time-to-live (TTL) for a given method request, and consider
all callbacks that take longer than a certain duration to be lost, or specifically in the
Virtual Faraday Cage’s situation, denied.
4.7.4 Protocol
When an LXRP client attempts to connect to a server, it first sends an auth request along
with any supplied credentials, and then the server will respond with an authentication
token for the client to use for all future requests. Alternatively, pre-established tokens can
be used, if available. Once an authentication token has been obtained, an LXRP client can
then include it with all subsequent requests. Afterward, a query request can be sent to
obtain the list of available functions that can be called by clients. Consequently, if a client already has its authentication token and knows which functions it can call, both the auth request and the query request are optional. See Figure 4.16 for
details.
4.7.5 Messages
There are three message types: auths, queries and calls. “Auths” acquire authentication
tokens for LXRP clients, “queries” simply get the list of available functions that a given
authentication token can call, and “calls” are remote procedure calls that return the
result of function evaluation.
Auth
An auth message takes the form “<auth>[...]</auth>”, where the contents are the
serialized Python data values of credentials supplied, if any.
Query
A query message takes the form “<query from="[...]"/>”, where the attribute from
is optional and represents the authentication token.
Figure 4.16: The Lightweight XML RPC Protocol
Call
A call message takes the form “<call func="[...]" from="[...]" callback-uri="[...]" callback-type="[...]" callback-id="[...]" callback-token="[...]">[...]</call>”, where the contents of the call message can include zero or more parameters of the form “<param name="[...]">[...]</param>”. The contents of parameters
are the serialized Python data values. Like the query message, calls do not require the
from attribute. Similarly, the callback-* attributes are not required either – they are
only supplied if the requesting party wants to receive the response as a callback. If a
function is called with parameters, each parameter is included in the body of the call.
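For example, a call to a hypothetical rateMovie function with two parameters and a requested callback might be encoded roughly as follows; whitespace is added for readability, the bracketed values stand for tokens and identifiers, and the function and parameter names are illustrative only.

<call func="rateMovie" from="[...]" callback-uri="https://platform.example.com/lxrp/"
      callback-type="lxrp" callback-id="[...]" callback-token="[...]">
  <param name="title"><value type="string" encoding="plain">Metropolis</value></param>
  <param name="rating"><value type="integer" encoding="plain">5</value></param>
</call>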
4.7.6 Serialized Data
Data is serialized recursively, allowing more complicated data-structures to be transmitted between the client and server within LXRP. All serialized data takes the form “<value
type="[...]" encoding="[...]">[...]</value>”, where type represents the value
type (e.g., “string”, “integer”, etc.) and encoding represents the encoding – currently
only “plain” (default) and “base64”. The contents of the value tag would then be the
string value of the data. If the data is a more complicated data-structure, then the
value tag might contain other value tags, adding additional structural information. An
optional sub-element, key, can also be present within a value element – allowing for the
value to be assigned a “name”, for example, in the context of a Python dictionary. Custom classes and objects, along with exceptions, can be serialized as well – however custom
objects will need to implement their own deconstruction and reconstruction mechanisms
to be passed successfully through LXRP.
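As an illustration, a Python dictionary such as {"name": "John Doe", "age": 34} might be serialized along the following lines; the exact type names and the placement of the key sub-element are assumptions based on the description above.

<value type="dict" encoding="plain">
  <value type="string" encoding="plain"><key>name</key>John Doe</value>
  <value type="integer" encoding="plain"><key>age</key>34</value>
</value>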
4.7.7 Responses
There are three response types in LXRP: auth-responses, query-responses, and call-responses. Auth-responses consist of a single auth element whose content is the serialized authentication token that the client should use. Query-responses contain
a list of available methods along with their descriptions (both in human and computer
formats), and also pass along any server-side flags and attributes for LXRP clients to
interpret. Call-responses simply return the data-values from a call request, though they
can also return errors and exceptions to any request.
4.7.8 Security
LXRP relies on operating over the HTTPS protocol, allowing LXRP servers to leverage
SSL and existing public-key infrastructure to provide encryption, confidentiality, and authentication of server end-points. Servers then implement their own Reference Monitor, which can issue authentication tokens to clients based on the authentication credentials they present, as well as the client’s IP address. The Reference Monitor provides access
control over methods exposed over LXRP to clients, as all call-requests are first sent to
the Reference Monitor for confirmation before they are executed. Authentication tokens
are 256-bit strings, ideally supplied by a true-random source, or a strong pseudo-random
number generator. In the current implementation, authentication tokens are derived
from SHA-256 hashes of /dev/urandom for Unix-based systems, or the equivalent for
Windows-based systems.
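A minimal sketch of this token generation, using Python's portable interface to the operating system's entropy source, is:

import hashlib, os

def new_auth_token():
    # 256-bit token: SHA-256 over 32 bytes from the OS entropy source
    # (/dev/urandom on Unix-like systems; os.urandom covers Windows as well).
    return hashlib.sha256(os.urandom(32)).hexdigest()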
4.8 Sandboxing
Sandboxing can be accomplished in a number of ways for different languages, and the
Virtual Faraday Cage does not specify which particular mechanisms should be used.
However, for the Virtual Faraday Cage to function properly, a robust sandboxing mechanism must be available and capable of running local extension components within it.
This sandboxing mechanism must be able to guarantee that any code running within it
is incapable of interacting or communicating with any system or software components
outside of the sandbox. Additionally, the sandbox must be capable of allowing a limited
selection of functions, specifically the Virtual Faraday Cage API, to be exposed to the
code running within it. By ensuring that sandboxed code can only interact with the Virtual Faraday Cage API, we can continue to ensure that Theorem 3.1 and Theorem 4.1
remain valid within the system. Otherwise, third-party code would have unrestricted
access to the platform’s systems, and ultimately, to end-user data.
Sandboxing was implemented in the Virtual Faraday Cage’s proof-of-concept implementation. While this implementation was developed in Python, the available authoritative references on sandboxing in Python were limited. As of Python 2.7, it seems that the statement “exec code in scope”, where code is untrusted code and scope is a keyed
array (“dictionary”) is sufficient for protecting against code using unwanted functionality
from within Python, so long as built-in functions have been removed from the scope.
Other possibilities for implementing sandboxing included using the heavier pysandbox
[118] library, or code from the Seattle Project[119]. Pysandbox is a Python sandboxing
library that allows for extensive customization and control over sandboxed code. The
Seattle Project, on the other hand, is a distributed computing platform that utilizes
sandboxing to enable untrusted code to run on machines donating their computational
resources. The Seattle Project’s sandboxing mechanism relies on static code analysis
followed by a restricted scope for the exec command. As the proof-of-concept is only
intended to demonstrate the feasibility of the Virtual Faraday Cage, the simpler and
“built-in” approach of just using “exec code in scope” was undertaken instead.
Using simply “exec code in scope”, however, exposes one critical flaw: passing any object to code ‘sandboxed’ in this manner exposes the object’s parent class, and the code
could conceivably find attributes or functions that could be called to leak information (or
worse). However, if individual object methods are passed, the sandboxed code would be
unable to retrieve or access the parent object. Additionally, any methods called from the
sandboxed code cannot access anything from the parent scope – so the scope restrictions
on sandboxed code apply at all times during execution.
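A minimal sketch of this approach, written in Python 2.7 syntax to match the proof-of-concept and using illustrative names, is:

def run_sandboxed(code, exposed_methods):
    # Strip built-in functions from the scope so the untrusted code cannot
    # reach open(), __import__(), and so on; expose only individual bound
    # methods (such as the Virtual Faraday Cage API), never whole objects.
    scope = {'__builtins__': {}}
    scope.update(exposed_methods)
    exec code in scope          # Python 2 'exec ... in ...' statement
    return scope                # whatever names the untrusted code defined

# Hypothetical usage, assuming 'platform' exposes the Platform API:
# run_sandboxed("age_view = read_request('data://91281/age', 1, None)",
#               {'read_request': platform.read_request})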
While “exec code in scope” works in Python versions 2.x, the sandbox safeguards
have been removed in Python 3.x. This means that any attempt at performing sandboxing for Python 3 will require the use of tools such as pysandbox instead of built-in functionality. Furthermore, “exec code in scope” does not prevent untrusted code
from overusing CPU or memory resources, nor does it force execution within a set time
interval. To provide a comprehensive sandbox mechanism accounting for those problems, “exec code in scope” would have to be run in a separate and monitored Python
thread. For the purposes of demonstrating the Virtual Faraday Cage, it was sufficient
to implement basic sandbox protection using only “exec code in scope” without using
separate threads – but this would not be a tenable solution for a production service.
4.9 Inter-extension Communication
At this point, the Virtual Faraday Cage already presents a meaningful framework within
which privacy-preserving extensions can be built. However, it lacks a certain capability
that extensions running on popular platforms (such as social networking platforms) have,
namely the ability to share and process data between different users of the same extension.
To accomplish this, more than one approach must be considered (and these approaches
may not necessarily be mutually exclusive) as carelessly adding this functionality may
lead to information leakage and the loss of privacy by end-users.
One approach to this problem is to prohibit the local extension components from having any write capability. This means that, given end-users $u_1$ and $u_2$, if a local extension component reads some private data $s \in X_{u_2}$, it cannot then write the value of $s$ to the set of private data $X_{u_1}$ of any other user $u_1$. This policy can be enforced by ensuring that $w_{E'}(s) = -1 \; \forall s \in X_{u_1}, X_{u_2}$. However, while this method may
be useful in conjunction with other methods, by itself it would serve to greatly restrict
the functionalities of local extension components, while simultaneously not addressing
other potential privacy issues such as private data revelation to other end-users. While
the thrust and focus of the Virtual Faraday Cage is to address privacy concerns with
third-parties specifically, this proposed approach leaves much to be desired in practice.
Figure 4.17: Hypothetical prompt and example extension asking for permission to share
end-user data.
Another approach is for the extension to ask the end-user for permission and authorization to share or update shared data in the local cache. This could be done by
presenting a prompt to the end-user and showing the full representation of the data to
be shared. Figure 4.17 shows how this might be implemented in practice. As long as
explicit and strong declassification from private data to sensitive data is prohibited, this
could be considered a low level of declassification. An extension’s local component could
copy the shared data to another end-user’s private data, which would still be restricted
from third-party access. Additionally, as the end-user in question already authorized the
viewing of that private data by the other end-user, there is no privacy violation in this
context.
4.10 Methodology and Proof-of-Concept
This section covers the methodology behind taking the theoretical model and turning it
into the Virtual Faraday Cage’s architecture, and implementing the proof of concept.
4.10.1 Methodology
Taking the Virtual Faraday Cage from its formal model to an implementable one consisted of several steps: 1) determining how data would be structured and accessed, 2)
defining Application Programming Interfaces, 3) choosing an existing, or creating a new
remote procedure call protocol to access these APIs, 4) determining how sandboxing
untrusted third-party code could be performed, 5) developing a high-level protocol for
VFC-compliant web application platforms and third-party extensions, and 6) implementing a basic proof-of-concept demonstrating the feasibility of realizing the Virtual Faraday
Cage.
Determining how data would be structured and accessed is fairly straightforward.
Using URIs to reference data-items is both natural and intuitive, as well as easy to
implement. In the formal model, data was abstracted as a vector composed of atomic types within a global set of all end-user data, but there was no innate way to access
data-items or reference them. Binding data-items to URIs, on the other hand, provides a
means to easily reference data-items, as well as a means through which a data hierarchy
can be expressed.
Developing the API for the Virtual Faraday Cage required taking the Formal Model
and building a set of operational primitives on top of it. The formal model provides
both both read and write access, and the Virtual Faraday Cage extends these along with
the hierarchical data structure (represented by data-item URIs) to make available an
additional six specific data-manipulation methods: read, write, create, delete, subscribe,
and unsubscribe.
In determining how remote procedure calls would work in the Virtual Faraday Cage,
three competing technologies were each examined as the potential protocol for communication between third-party clients and the Virtual Faraday Cage API server. These
technologies were SOAP [120], REST [121], and ProtoRPC [122]. The possibility of a
VFC-specific protocol was also considered. As Google AppEngine [123] was being used
at the time to host the Facebook App and VFC-Wrapper, it was important to pick a
technology that had libraries that could be used within AppEngine, because Google AppEngine does not support third-party C/C++ libraries [124]. Consequently, this limited
the available choices for implementing the Virtual Faraday Cage.
For SOAP, the libraries available included Ladon [125], Rpclib [126], and pysimplesoap
[127]. Other libraries were not considered due to either their additional requirements,
lack of maintenance, or other factors. While only pysimplesoap included a SOAP client,
a library such as SUDS [128] was available as a Python SOAP client. For REST, the
server libraries available were appengine-rest-server [129], web.py [130], Flask [131], and
Bottle [132], and a REST client could then be implemented fairly easily [133]. Aside
from appengine-rest-server, the remaining REST libraries were similar to each other and
consequently only appengine-rest-server and web.py were examined in detail. ProtoRPC,
on the other hand, was Google’s own web services framework, which passes messages in
a JSON-encoded format.
While SOAP was the most attractive option due to its widespread support² and its
lack of dependence on the HTTP protocol, there were no available SOAP libraries that
worked on Google App Engine without needing modifications. REST, on the other hand,
limited the available methods to GET, PUT, POST, and DELETE. While overall, the Virtual
Faraday Cage’s data model could easily be seen within the context of a REST-like system,
this would limit the capabilities of the API. Thus, utilizing REST was not considered
an optimal solution. Finally, Google’s ProtoRPC was unusable in Python 2.7+ at the
time. Consequently, a VFC-specific protocol had to be written, the Lightweight XML
RPC Protocol (LXRP).
4.10.2 Development
The Virtual Faraday Cage was implemented via a Python-based proof-of-concept consisting of a single third-party entity providing a movie-comparison and rating extension,
as well as a social network site that worked as a wrapper around Facebook data.
While the implementation is sufficient to demonstrate the primary capabilities and
flexibility of the Virtual Faraday Cage, it is neither all-encompassing nor production-ready. The prototype described is also not intended to be either an efficient implementation or a comprehensive framework for web application platforms.
The proof-of-concept was developed entirely in Python 2.7. First, the “canonical”
theoretical model was implemented (abstract data-items, privacy policies, projections,
transforms, and views). Next, this model was extended into a “site-specific” model
that incorporated URIs. Then, this model was extended to provide support for a local
database and simulated social network. After this, third-party extension support was
added to the implementation.
² SOAP is a formal W3C specification [120], and has libraries in C/C++ [134], Java [135], and Python [125, 126, 127].
Finally, a Facebook wrapper was implemented so that data made available to third-parties came from a “real” social network. This was done by creating a Facebook Application [4] and using Facebook’s Graph API [116] to access data, and rewriting the
URIs to conform with specifications set by the Virtual Faraday Cage.
4.10.3 Proof-of-Concept
This proof-of-concept implements the core aspects of the Virtual Faraday Cage, specifically: information flow control, sandboxing, and granular control over data revelation to
third-parties. Information flow control is enforced through access control built into the
Virtual Faraday Cage’s API, described in the following section. Sandboxing is enforced
through CPython’s Restricted Execution Mode, which is discussed earlier. Granular
control over data revelation is managed through the construction of Views, in a method
analogous to what was described in the previous chapter.
To implement the Virtual Faraday Cage, an original client/server architecture for
APIs was created: the Lightweight XML RPC Protocol (LXRP). The rationale and
reasons for this are described in Section 4.7.2, and an overview of LXRP is provided in
Section 4.7.4.
4.10.4 Formal Model
Here, we will cover examples of working with the Virtual Faraday Cage’s implementation,
specific to the abstract formal model.
Datastore and Data-items
In this demonstration, we will create an AbstractDatastore “ds” and populate it with
some data and principals. In this example, b is created as a member of $X_p$. See
Figure 4.18.
Figure 4.18: Creating a datastore and some data-items in an interactive Python environment
Introducing Projections
Here we demonstrate the use of projections on date and location data-items. A projection
$p = P_{0,2}$ is created and tested on both value types. Another projection $p_2 = P_{0,1}$ is created
and tested on the data as well. Note that passing multiple data-items to a projection
results in outputting a list of the projections applied to each data item. Compositions of
both projections are demonstrated, in the first case by “unpacking” the output via the
$* : [v_1, v_2] \longrightarrow v_1, v_2$ modifier. See Figure 4.19.
Figure 4.19: Applying projections on data-items in an interactive Python environment
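Although the interactive session in Figure 4.19 is not reproduced here, a projection of this kind can be pictured as simple component selection over a data vector. The class below is an illustrative stand-in and does not reflect the proof-of-concept's actual interfaces; the sample date and location values are likewise invented.

class Projection(object):
    # Illustrative stand-in for P_{i,j,...}: keep only the selected components.
    def __init__(self, *indices):
        self.indices = indices
    def __call__(self, *items):
        projected = [tuple(item[i] for i in self.indices) for item in items]
        return projected[0] if len(projected) == 1 else projected

p = Projection(0, 2)                              # P_{0,2}
p(("1984", "06", "17"))                           # -> ("1984", "17")
p(("Calgary", "AB", "Canada"), ("Boca Raton", "FL", "USA"))
#  -> [("Calgary", "Canada"), ("Boca Raton", "USA")]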
Introducing Transforms
In Figure 4.20, a transform is loaded that partitions integers (assumed to be ages) into
different age brackets. As transforms are specially written functions, their output and
mode of operation are completely determined by the developers who implement them.
Figure 4.20: Applying a transform on data-items in an interactive Python environment
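As an illustration of such a transform, an age-partitioning function might be written as follows; the bracket boundary used here is an arbitrary assumption.

def age_partition(age):
    # Illustrative transform: label an integer age as "Adult" or "Child".
    return "Adult" if age >= 18 else "Child"

# age_partition(34) -> "Adult";  age_partition(9) -> "Child"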
Projections and Transforms
In Figure 4.21, two different transforms are loaded and composed with a projection. One
transform partitions ages into brackets, and another parses text (assumed to be names)
and produces initials.
Figure 4.21: Composing projections and transforms together in an interactive Python
environment
Invalid Projection-Transform Composition
Similar to the previous example, this example also composes transforms with projections. However, this time, we demonstrate how composition can result in errors as these
compositions are not necessarily symmetric. In particular, applying AgeCalculation
to AgePartition makes no sense: AgePartition simply labels numbers (assumed to
be ages) as either “Adult” or “Child” – and AgeCalculation expects a date vector.
Similarly, applying AgeCalculation to the projection p results in an error as well. See
Figure 4.22.
Views
In this example, a view is created as a composition of AgePartition with AgeCalculation.
This view can then be applied to data items as a composition of both transforms, as shown
in Figure 4.23.
Privacy Policies and Access Control
In this example, data is first created before creating views that obscure data by the
Initials and AgePartition transforms. Then privacy policies are created and assigned
to those data items, retrieved, and applied. Similarly, write policies are created and
retrieved. See Figure 4.24.
4.10.5 Example Third-Party
The “Electronic Movie DataBase” (EMDB) was created as a hypothetical third-party
that would have movie information and global ratings, and be able to store user-specific
saved ratings and favorite movies. The EMDB would then supply a VFC-based extension
which would do two things: 1) store movie ratings and preferences for end-users of the
extension, and 2) obtain some demographic information (“Age”, “Gender”, “Location”)
from the end-user through the social network that they are using. To this end, the
EMDB was given a minimal web presence consisting of a landing page and a VFC-capable extension URI that acted as a REST-like web application.
Figure 4.22: Composing projections and transforms in an invalid way, in an interactive Python environment
Upon an end-user installing the EMDB extension, they would be presented with the
EMDB extension’s privacy policy and terms of service, along with the requested data
to be shared with EMDB. The end-user can then apply any number of applicable and
supplied projections and transforms to adjust the view of their data that EMDB would
be granted.
Figure 4.23: Creating a view in an interactive Python environment
The EMDB extension would then consist of two components: a remote component
that would allow for movie searches and saving of movie ratings, and a local component
that would permit the comparison and sharing of ratings with other end-users (e.g.,
“Friends”). On EMDB’s end, whenever a movie is rated or a rating is updated, the
changes are pushed back to the social network’s local extension cache – ensuring that the
local extension component always has the latest data to work with.
4.10.6 Facebook Wrapper
To properly test the Virtual Faraday Cage in a “real” environment, a “Facebook App”
was created. To adequately simulate the Virtual Faraday Cage, a “VFC-Wrapper” was
created which acted as a buffer between a Facebook user’s data and a third-party extension. The VFC-Wrapper allowed Facebook users to have more control over both which data would be revealed to a third-party and the granularity of such data.
Third-party extensions would then be installed from within the VFC-Wrapper, and would
be interacted with through it.
Figure 4.24: Creating and accessing privacy and write policies in an interactive Python
environment
4.11 Effects and Examples
While the primary goal and motivation for developing and using the Virtual Faraday
Cage has always been to facilitate better control over end-user privacy while still gaining
benefits from third-party extensions within web application platforms, other benefits may also follow from wider adoption of the Virtual Faraday Cage. This section
explores those benefits and showcases some examples of how the Virtual Faraday Cage
might be used in ways differing from conventional extension APIs.
4.11.1 A more connected web
Previously, King and Kawash [136] proposed a protocol for sharing data between different
web applications on different servers with the ultimate goal of better facilitating the
“meshing” of online communities. It was argued that sharing data, while letting each community administer its own site completely, could be a useful form of island-bridging
for all parties involved.
With the already-existing practices of developing extensions and APIs for web application platforms, a more connected web may also be a consequence of wider adoption of
the Virtual Faraday Cage. While APIs and extensions already exist for web application
platforms, there are no widely accepted or adopted privacy-aware layers for these systems. Additionally, while privacy concerns arise in social networking platforms, they are largely overlooked in other categories of web applications. Finally, the Virtual Faraday
Cage allows for a third-party extension that can perform tasks on private end-user data
“blindly” without being able to relay that information back to the third-party: no known
existing proposal addresses this. Consequently, the Virtual Faraday Cage may be an ideal
architecture for sensitive web application platforms such as finance and healthcare.
Figure 4.25 shows how different categories of web application platforms may become
interconnected through extensions. Each arrow represents an embedding of an extension
from one platform into another, with the presumption that potentially sensitive information will largely flow in the direction of the arrow, and private information never leaves
any one platform for another.
Figure 4.25: Graph showing the potential connectivity between categories of web application platforms based on making extensions available from one platform to another.
4.11.2 Examples
This section provides examples showing how the Virtual Faraday Cage can be useful in addressing particular situations. In all these examples, the economics of
privacy must be taken into account: an extension provider will likely want access to some
data for their own benefit as well!
Example 4.1. Movie comparison extension for a social network
Alice and Bob are both end-users of a particular social networking site. Alice would
like to compare her movie ratings with Bob’s and obtain some meaningful interpretations
from that data, but does not wish to let a third party know about her friendship or the
resulting compatibility ratings.
In this example, the extension’s remote component can provide access to movie titles
as well as global and personal movie ratings and so on, in exchange for some limited
demographic information such as Age and Gender. The extension’s local component, on
the other hand, would be able to access data provided by that end-user’s friends who
also use that extension. However, the local component would be unable to relay any
information learned back to the third party. Consequently, both end-users are able to
compare their movie ‘compatibility’, without having to worry that information such as
their friendship would otherwise leak back to the third party.
Specifically, let the set of all of Alice’s data visible to that extension be $V_{E,u} = \{\text{Age}, \text{Gender}\}$, and the set of all of Alice’s private data visible to that extension’s local component be $V'_{E',u} = \{\text{Friends}, \text{Age}\}$. Certain data could also be made less specific,
for example, an age range instead of a specific age. Depending on how inter-extension
communication is handled, each user of the extension would have to authorize the sharing
of their movie rating information with other users of the extension – such as through
the prompt shown in Figure 4.17.
Example 4.2. Date finder
Bob is an end-user on a social networking site, and he would like to find someone
to date from among his friends, friends-of-friends, or other social circles. He does not
mind searching for potential romantic interests on an online dating site, but he would
prefer that his matches know people that he knows. Furthermore, Bob would prefer not
to announce to everyone else on his social network that he is using an online dating service.
Most social networking sites are not geared explicitly towards the facilitation of online
dating or online personals, and it remains a possibility that specialized services can do a
better job of matchmaking. In this example, Bob would like to preserve the privacy of
information such as who his friends are and perhaps limit the extent of what personal
data is visible to the dating service. To facilitate this, the dating service could request
information such as his age (or age range), gender, and personal interests. It would then
combine that data with additional information supplied directly to the dating service
such as his sexual orientation, relationship status, and so on. A list of Bob-specific
hashed friend-IDs could be stored in the local cache for Bob, allowing a local extension
component to iteratively check his friends to see if their hashed friend-ID matches and
that they are users of the dating service as well. At the same time, other matches from
the online dating service could be presented to Bob based on the information he chose
to share with the third-party.
Here, the set of Bob’s visible data to the remote extension component is $V_{E,u} = \{\text{Age}, \text{Gender}, \text{Location}, \text{Interests}\}$, and his set of visible private data to the local extension component is $V'_{E',u} = \{\text{Friends}, \text{Friends'-Profiles}\}$.
Example 4.3. Extending a map provider with a road trip planner
Alice is looking to plan a road trip using a map provider of her choice and a road
trip planner web service. She would like to plan out her route and stops as well as know
the weather along her route, but she would like to keep specific location information and
travel dates as hidden as possible from the planning service.
In this example, Alice is planning her road trip route through a map provider that
picks the best route for her. A road trip planner web service combines a calendar/event-planning service with weather information, and it provides an extension to map
providers and their users. For the extension to work, it requests information about which
cities are along the routes Alice plans to take; the more cities shared, the more capable the
extension is at assisting planning with regards to weather. The rest of the trip planning
is done through the local extension component, serving to prevent the third-party from
knowing anything more than the waypoints along her route.
This could be accomplished by ensuring that Alice’s set of visible data to the road
trip planner is $V_{E,u} = \{\text{Cities-Along-Route}\}$ and her set of private data accessible by the planner’s local component is $V'_{E',u} = \{\text{Travel-times}\}$. For this to work, the local extension component $E'$ would then need to receive the maximum weather information (typically around two weeks) for all the cities along the route, and then the trip planning could be done within $E'$, with details saved to the local cache.
It is also possible not to trust the map provider with storing the event-planning information; however, this would require that this information be kept at the road trip planner’s
web service instead. This would mean that the only information that would be truly
unavailable to them would be the specific addresses along the end-user’s route. Ultimately, either scenario is valid and the “better” scenario depends on which web service
an end-user wants to trust for which data.
Example 4.4. Integrated dictionary, thesaurus, and writing helper
Bob utilizes a web office suite to compose his documents, which range from essays
and papers to personal letters. He would like to use an integrated dictionary/thesaurus,
especially one that could catch things such as overused words or phrases, but he would
not like to share his documents with any third-parties.
While using the web to look up words is a relatively small inconvenience for Bob, it
would be nice to have an integrated system available within his web office suite. Furthermore, a histogram of word frequencies, as well as alerts to common grammar mistakes or
overused phrases could be useful as well – but this cannot be directly accomplished without revealing the contents of the document to a third party. By using the Virtual Faraday
Cage, however, it becomes possible for a dictionary/encyclopedic provider to create an
extension that uses a local component to analyze a document for overused or repetitious words or phrases, while keeping word-lookups on the remote site. Consequently,
only small excerpts of the document (individual words) are shared with the third-party
ephemerally, and only by the explicit authorization of the end-user. In this case, Bob’s
set of visible data to the remote extension component is $V_{E,u} = \{\text{Excerpts}\}$, and his set of private data visible to the local extension component is $V'_{E',u} = \{\text{Document}\}$.
Example 4.5. Automated bidding extension for an online auction
Alice would like to install an extension to a web application that facilitates online
auctions. This extension allows Alice to automate her bidding process to improve her
chances of obtaining an item during an auction. However, she would like to prohibit the
extension provider from learning about her shopping habits or financial information.
Here, an automatic bid helper could run within the protected environment as a local
extension component, and utilize external third-party supplied statistics or databases
to help with its decision-making. It could combine the third-party-supplied information with private data, such as how much the end-user is willing to pay, to make intelligent bids. Thus, Alice’s visible data to the third-party’s remote extension component is $V_{E,u} = NULL$, and her visible data to the local extension component is $V'_{E',u} = \{\text{Item}, \text{Finances}, \text{Other-Parties-Involved-In-Bid}, \ldots\}$.
Example 4.6. Augmented online shopping with trusted reviewers
Bob shops for many items online, however sometimes it can be challenging for him
to know which reviews of a product he can trust. Because disgruntled customers and
online marketing firms can skew product reviews both ways, he would like to know which
reviews to trust, or to see reviews specifically written by friends or friends-of-friends from
his social network.
In this example, a social network third-party could provide an extension that automatically acquires a list of hashed IDs of Bob’s friends or friends-of-friends (similar to Example 3.2, thus proactively protecting the privacy of social network users), downloading them into his local extension cache. The local extension component would then check
the current items that Bob is viewing, and then emphasize reviews written by friends or
friends-of-friends according to the data in the cache. This way, the social network also
has no information on the types of products that Bob is viewing. Here, Bob’s visible
data to the remote extension component is $V_{E,u} = NULL$, and his visible data to the local extension component is $V'_{E',u} = \{\text{Currently-Viewing}\}$.
Example 4.7. Extending a healthcare site with fitness and dietary extensions
Alice uses a web service to manage her medical records and keep track of her visits, diagnoses, and checkups. She also uses a diet and fitness web service to keep track
of her workouts and her nutritional and dietary needs. She would like to compose these
different web services in a way that preserves her privacy – as she does not need or want
the fitness service knowing what her ailments are or her medical records.
Because of the extreme sensitivity of medical data, as well as the accordingly strict and detailed legislation pertaining to it, the Virtual Faraday Cage may be an ideal architecture through which medical web services can extend their capabilities via third-party extensions. In this example, the fitness/dietary third-party could provide an extension that analyzes doctor recommendations and provides appropriate fitness regimens or dietary suggestions without relaying any information about the diagnosis back to the third-party. This would be accomplished by storing a database of fitness information in the local extension cache. In exchange, the end-user might be asked to share some basic information back to the third-party (e.g., Gender, Weight, Height, Age) at some level of granularity (e.g., ranges). For this example, Alice’s visible data to the remote extension component is $V_{E,u} = \{\text{Gender}, \text{Weight}, \text{Height}, \text{Age}\}$, and her visible private data to the local extension component is $V'_{E',u} = \{\text{Medical-History}, \text{Ailments}, \text{Recommendations}\}$.
4.12 Summary
This chapter has introduced the Virtual Faraday Cage. The Virtual Faraday Cage enables web application platforms to integrate third-party extensions into their platforms, while simultaneously enabling the complete protection of subsets of end-user data (“private data”). Furthermore, any data disseminated to third-parties can be disseminated at a granular, user-controlled level. The Virtual Faraday Cage also comes with privacy
guarantees backed by a theoretical framework.
The Virtual Faraday Cage’s API was introduced in Section 4.5, and its high-level
protocol was introduced in Section 4.6. A proof-of-concept for the Virtual Faraday Cage and the methodology used in developing it are covered in Section 4.10. Finally, a discussion of
the effects of the Virtual Faraday Cage, as well as potential examples of its application
to other types of web application platforms is given in Section 4.11.
The next chapter will provide an in-depth analysis of this thesis’ contributions and
conclude this work.
Chapter 5
Analysis & Conclusion
This chapter concludes the discussion of the Virtual Faraday Cage, covering comparisons
to existing work, as well as shortcomings and criticisms of the approaches in this thesis.
This chapter also discusses future work.
5.1 Comparisons and Contrast
5.1.1 PIPEDA Compliance
Section 1.4.2 introduces the Personal Information Protection and Electronic Documents
Act (PIPEDA) [21], a Canadian privacy law that dictates how organizations can collect,
use, and disclose personal information.
PIPEDA also establishes ten principles that an organization must uphold: 1) Accountability, 2) Identifying Purposes, 3) Consent, 4) Limiting Collection, 5) Limiting
Use, Disclosure, and Retention, 6) Accuracy, 7) Safeguards, 8) Openness, 9) Individual Access, and 10) Challenging Compliance. The Virtual Faraday Cage supports most of these principles, as discussed below:
Principle 1 - Accountability (PIPEDA Section 4.1)
“Organizations shall implement policies and practices to give effect to the principles,
including (a) implementing procedures to protect personal information [...]” (PIPEDA
Section 4.1.4)
The Virtual Faraday Cage supports this by providing a fine-grained and strict access
control mechanism that prevents unauthorized data dissemination.
Principle 2 - Identifying Purpose (PIPEDA Section 4.2)
“The organization shall document the purposes for which personal information is collected in order to comply with the Openness principle (Clause 4.8) and the Individual
Access principle (Clause 4.9)” (PIPEDA Section 4.2.1)
The Virtual Faraday Cage’s model does not enforce purpose, but purposes are collected from third-parties when they request access to end-user data within a given web
application platform.
“The identified purposes should be specified at or before the time of collection to the
individual from whom the personal information is collected. Depending upon the way in
which the information is collected, this can be done orally or in writing. An application
form, for example, may give notice of the purposes.” (PIPEDA Section 4.2.3)
Purposes are collected and specified when a third-party requests access to an end-user’s data.
Principle 3 - Consent (PIPEDA Section 4.3)
“Consent is required for the collection of personal information and the subsequent use or
disclosure of this information. Typically, an organization will seek consent for the use or
disclosure of the information at the time of collection. In certain circumstances, consent
with respect to use or disclosure may be sought after the information has been collected
but before use (for example, when an organization wants to use information for a purpose
not previously identified).” (PIPEDA Section 4.3.1)
The consent of end-users is required before any data is shared with third-parties, and
end-users dictate whether or not this data is shared once, always, or if they must be
asked per-use.
“The principle requires ‘knowledge and consent’. Organizations shall make a reasonable effort to ensure that the individual is advised of the purposes for which the
information will be used. To make the consent meaningful, the purposes must be stated
in such a manner that the individual can reasonably understand how the information
will be used or disclosed.” (PIPEDA Section 4.3.2)
When consent is requested from end-users, the purposes are provided at that moment
of time.
“An organization shall not, as a condition of the supply of a product or service, require
an individual to consent to the collection, use, or disclosure of information beyond that
required to fulfill the explicitly specified, and legitimate purposes.” (PIPEDA Section
4.3.3)
The Virtual Faraday Cage specifically facilitates the ability of end-users to choose a “view” of their data at a granularity such that their data is not revealed at all to third-parties.
Consequently, the Virtual Faraday Cage architecture implicitly supports this.
“Individuals can give consent in many ways. For example: (a) an application form
may be used to seek consent, collect information, and inform the individual of the use
that will be made of the information. By completing and signing the form, the individual
is giving consent to the collection and the specified uses; (b) a checkoff box may be
used to allow individuals to request that their names and addresses not be given to
other organizations. Individuals who do not check the box are assumed to consent to
the transfer of this information to third-parties; (c) consent may be given orally when
information is collected over the telephone; or (d) consent may be given at the time that
individuals use a product or service” (PIPEDA Section 4.3.7)
Consent is obtained on a per-data-item basis, and can manifest in multiple displayed forms, all of which can comply with (a) and/or (b), and (d).
“An individual may withdraw consent at any time, subject to legal or contractual
restrictions and reasonable notice. The organization shall inform the individual of the
implications of such withdrawal.” (PIPEDA Section 4.3.8)
The Virtual Faraday Cage can notify third-parties that an end-user has withdrawn
their consent for the use of an extension.
Principle 4 - Limiting Collection (PIPEDA Section 4.4)
“Organizations shall not collect personal information indiscriminately. Both the amount
and the type of information collected shall be limited to that which is necessary to fulfill
the purposes identified. Organizations shall specify the type of information collected as
part of their information-handling policies and practices, in accordance with the Openness
principle.” (PIPEDA Section 4.4.1)
The Virtual Faraday Cage supports this by requiring third-parties to specify exactly
what data is being collected and how it is revealed. The Virtual Faraday Cage also
supports end-users in determining if revealing that data to a third-party extension is
appropriate for the specified purposes.
Principle 5 - Limiting Use, Disclosure, and Retention (PIPEDA Section 4.5)
The Virtual Faraday Cage does not specifically address or aid in upholding this principle.
Principle 6 - Accuracy (PIPEDA Section 4.6)
“Personal information that is used on an ongoing basis, including information that is
disclosed to third parties, should generally be accurate and up-to-date, unless limits to
the requirement for accuracy are clearly set out.” (PIPEDA Section 4.6.3)
The Virtual Faraday Cage allows third-parties to maintain current information on
end-users, as permitted by end-users. In particular, end-users decide if, when, and how
often a third-party may request the “latest” information. Consequently, the accuracy
principle is facilitated through the Virtual Faraday Cage, as permitted by end-users.
Principle 7 - Safeguards (PIPEDA Section 4.7)
“The security safeguards shall protect personal information against loss or theft, as well
as unauthorized access, disclosure, copying, use, or modification. Organizations shall
protect personal information regardless of the format in which it is held.” (PIPEDA
Section 4.7.1)
The Virtual Faraday Cage employs an access control system that protects against
unauthorized actions on an end-user’s sensitive data – no such data can be read or
written unless explicitly authorized by the end-user.
Principle 8 - Openness (PIPEDA Section 4.8)
The Virtual Faraday Cage does not specifically address or aid in upholding this principle.
Principle 9 - Individual Access (PIPEDA Section 4.9)
“Upon request, an organization shall inform an individual whether or not the organization holds personal information about the individual. Organizations are encouraged to
indicate the source of this information. The organization shall allow the individual access
to this information. However, the organization may choose to make sensitive medical information available through a medical practitioner. In addition, the organization shall
provide an account of the use that has been made or is being made of this information
and an account of the third parties to which it has been disclosed.” (PIPEDA Section
4.9.1)
Within the Virtual Faraday Cage, an end-user can “see” all of their sensitive data, as
part of the requirements for ensuring that end-users can set appropriate access control
policies on them. Consequently, end-users can, at all times, know what sensitive data
the web application platform has, as well as what sensitive data has been revealed to
third-parties.
“In providing an account of third parties to which it has disclosed personal information
about an individual, an organization should attempt to be as specific as possible. When
it is not possible to provide a list of the organizations to which it has actually disclosed
information about an individual, the organization shall provide a list of organizations to
which it may have disclosed information about the individual.” (PIPEDA Section 4.9.3)
As stated under PIPEDA Section 4.9.1, an end-user can see which third-parties have
had access to sensitive data.
“An organization shall respond to an individual’s request within a reasonable time
and at minimal or no cost to the individual. The requested information shall be provided or made available in a form that is generally understandable. For example, if the
organization uses abbreviations or codes to record information, an explanation shall be
provided.” (PIPEDA Section 4.9.4)
With the Virtual Faraday Cage, it is technologically feasible to make the list of
sensitive data immediately accessible to any end-user through a privacy settings portal.
“When an individual successfully demonstrates the inaccuracy or incompleteness of
personal information, the organization shall amend the information as required. Depending upon the nature of the information challenged, amendment involves the correction,
deletion, or addition of information. Where appropriate, the amended information shall
be transmitted to third parties having access to the information in question.” (PIPEDA
Section 4.9.5)
While not required to do so, third-parties with access to a specific data-item can subscribe to
changes in that data – should an end-user alter that data’s value, the third-parties will
be notified in real-time.
Principle 10 - Challenging Compliance
The Virtual Faraday Cage does not specifically address or aid in upholding this principle.
While the Virtual Faraday Cage is not a complete solution for PIPEDA compliance, it
does assist in allowing for a web application platform to better maintain such compliance
despite allowing third-parties access to end-user data. Specific to the Canadian Privacy
Commissioner’s findings on Facebook [22], the Virtual Faraday Cage can be used as a
system to help alleviate concerns over third-party access to end-user data. In particular,
in the findings, the Commissioner wrote that the original CIPPIC complaints regarding
Facebook and third-party “applications” (extensions) were that Facebook:
1. “was not informing users of the purpose for disclosing personal information to third-party application developers, in contravention of Principles
4.2.2 and 4.2.5;”
2. “was providing third-party application developers with access to personal information beyond what was necessary for the purposes of the
application, in contravention of Principle 4.4.1;”
3. “was requiring users to consent to the disclosure of personal information
beyond what was necessary to run an application, in contravention of
Principle 4.3.3;”
4. “was not notifying users of the implications of withdrawing consent to
sharing personal information with third-party application developers, in
contravention of Principle 4.3.8;”
5. “was allowing third-party application developers to retain a user's personal information after the user deleted the application, in contravention
of Principle 4.5.3;”
6. “was allowing third-party developers access to the personal information
of users when their friends or fellow network members added applications
without adequate notice, in contravention of Principle 4.3.2;”
7. “was not adequately safeguarding personal information in that it was
not monitoring the quality or legitimacy of third-party applications or
taking adequate steps against inherent vulnerabilities in many programs
on the Facebook Platform, in contravention of Principle 4.7;”
8. “was not effectively notifying users of the extent of personal information
that is disclosed to third-party application developers and was providing
users with misleading and unclear information about sharing with third-party application developers, in contravention of Principles 4.3 and 4.8;”
9. “was not taking responsibility for the personal information transferred
to third-party developers for processing, in contravention of Principle
4.1.3; and”
10. “was not permitting users to opt out of sharing their name, networks,
and friend lists when their friends add applications, in contravention of
Principle 4.3 and subsection 5(3).”
The Virtual Faraday Cage helps address 1, 3, 6, 8, and 10 of the CIPPIC’s allegations
– and can help alleviate allegations 2, 4, and 7. In the Privacy Commissioner’s report,
Facebook was found to be in violation of Principles 2 and 3 (“I am concerned that
users are not informed of what personal information developers are accessing and are not
adequately informed of the purposes for which their personal information is to be used
or disclosed.”), as well as Principle 7 (“[...] given the vast potential for unauthorized
access, use, and disclosure in such circumstances, I am not satisfied that contractual
arrangements in themselves with the developers constitute adequate safeguards for the
users' personal information in the Facebook context.”).
One of the privacy challenges Facebook faces is that it has essentially pushed the responsibility for ensuring privacy protection onto the third-party; at the same time, that third-party cannot be held to the same level of trust and adherence to policies as Facebook itself. While Facebook has attempted to mitigate this by providing a “Facebook Verified” badge for third-party extensions that pay a fee and can explain the data that they collect, this does not address either the revelation of “basic information” or the reuse of data for purposes other than those stated, and it is an optional audit. Furthermore, when addressing the issue of third-parties having access to data about end-users who did not install their extensions (e.g., friends of end-users who did), Facebook's solution was to have anyone concerned about this opt out of using any third-party extensions.
The Privacy Commissioner stated that Facebook should develop a means by which
to monitor third-party extensions so as to ensure that third-parties are complying with
consent requirements – and that Facebook should consider providing third-party developers a template they can use to explain their data needs and purposes. Another issue
was that third-parties did not need to obtain consent from other users on a network
(or on an end-user’s friend list) when a third-party extension is installed. The Privacy
Commissioner also found Facebook's use of contractual agreements to be insufficient with regard to Principle 7 – especially as the principle explicitly states the need
for technological measures.
As stated in Section 1.4.2, despite the changes Facebook made to address the original
PIPEDA complaint filed against it, CIPPIC filed another complaint [24] in 2010 expressing its dissatisfaction with Facebook's response and indicating that many of its core concerns were not addressed by these changes, including the lack of support for fine-grained and granular control over end-user data when shared with
third-parties. In this context, the Virtual Faraday Cage may be a prime framework to
assist with addressing some of those complaints.
The Virtual Faraday Cage helps web application platforms like Facebook provide
end-users with third-party extensions, while simultaneously protecting end-users from
unintended or unauthorized data leakage and helping reduce the necessary trust placed
in these third-parties. By using the Virtual Faraday Cage, platforms like Facebook
can reduce the need to rely on third-parties policing themselves, as well as the need for an “all-in” or “none-at-all” approach to using third-party extensions and
sharing data with them. For instance, friend list contents would be revealed as opaque
IDs unless consent was obtained from the individual end-users on that list. The Virtual
Faraday Cage thus explicitly helps address the Privacy Commissioner’s recommendations
regarding “Third-Party Applications.”
5.1.2 Comparisons with Other Works
Section 1.5.2 presents Bonneau et al.’s work [66]. They examined how private or sensitive
data could be obtained from an online social network without end-users’ knowledge,
and demonstrated how this could be accomplished through interacting with Facebook’s
API using the Facebook Query Language (FQL). As FQL queries return the number of
records in a query, this can be exploited to leak some information about end-users within
Facebook. Consequently, as a precaution, the Virtual Faraday Cage API specifically
returns no information in cases where the principal does not have access to the data being
queried. For instance, a read-request or write-request that requires end-user permission
per request is designed to limit the opportunity for third-parties to know if and when the
user is online. Similarly, existence tests are effectively impossible because the result of performing an API method on non-existent data is identical to the result of performing it on data that one does not have permission to access.
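To make this precaution concrete, the following minimal sketch (illustrative only; the names and data are hypothetical and not part of the proof-of-concept's API) shows a read request returning the same empty result whether a data item does not exist or the caller simply lacks permission:

    # Illustration: the platform returns an identical, uninformative result whether
    # a data item is missing or the caller lacks read permission, so a third-party
    # cannot use the API as an existence (or "is the user online?") oracle.
    RECORDS = {"item-1": {"owner": "alice", "value": "Calgary"}}
    PERMITTED = {("bob-ext", "item-1"): False}   # no read access granted

    def read_item(principal: str, item_id: str):
        record = RECORDS.get(item_id)
        allowed = PERMITTED.get((principal, item_id), False)
        if record is None or not allowed:
            # Identical response for "missing" and "forbidden": no record count,
            # no error detail that would reveal whether the item exists.
            return None
        return record["value"]

    print(read_item("bob-ext", "item-1"))   # None (exists, but not authorized)
    print(read_item("bob-ext", "item-2"))   # None (does not exist), indistinguishable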
Section 2.2.1 introduced Rezgui et al.'s [96] and Dekker et al.'s [97] works. Rezgui et al. identified and defined many aspects of privacy within Web Services, and their work is now compared with the Virtual Faraday Cage. Dekker et al. propose using formal languages to write licenses for data dissemination; however, enforcement of privacy would need to be accomplished through legal means.
Rezgui et al. define ‘user privacy’ as a user’s ‘privacy profile’. This profile consists
of the user’s privacy preferences for their data, specific to a particular ‘information receiver’ and purpose. They define ‘service privacy’ as a comprehensive privacy policy that
specifies a Web Service’s usage, storage, and disclosure policies. Finally, ‘data privacy’
is defined as the ability for data to “expose different views to different Web Services”,
similar to the notion of adjustable granularity, or that of projections and transforms, in
the Virtual Faraday Cage.
While Dekker et al.'s [97] approach may yield numerous advantages over an end-user agreeing to a plain-English privacy policy (or a P3P policy), such a system is still
addressing a fundamentally different aspect of privacy than the Virtual Faraday Cage.
In their approach, the authors attempt to examine a way for legally-binding license
agreements for end-user data to be processed and analyzed by machines for the purpose
of enforcement checking and forming derivative licenses. While a well-behaved entity
could abide by a license, a malicious one would have to be taken to court – a solution
that the Virtual Faraday Cage does not consider ideal as no legal remedy would be
capable of restoring that end-user’s privacy once it was lost. The Virtual Faraday Cage
attempts to avoid this scenario altogether by preventing [private] data dissemination in
the first place.
Section 2.3.1 presents Guha et al.’s [107] “None Of Your Business” (NOYB). However,
one of the limitations of NOYB is that it does not account for how communication
between end-users (where one is not using NOYB) can be achieved. Furthermore, it does not account for how end-user data can be shared with third-parties (or even the application provider) to enable legitimate and useful features and capabilities. This is in
fact not a focus of NOYB. This is in contrast with the Virtual Faraday Cage, which seeks
to facilitate meaningful [sensitive] data-sharing between end-users and third-parties, as
well as meaningful use of private data by third-party extensions that are unable to leak
that data back to the third-parties.
Additionally, while NOYB and similar projects may be able to bypass detection by
web application providers, the use of NOYB may still constitute a violation of the platform’s terms of service – as NOYB essentially piggybacks on Facebook to provide its own
social network experience. Ultimately, NOYB provides something different from what the Virtual Faraday Cage provides.
Baden et al. [109] propose Persona, a new architecture for privacy-preserving online social networks. Persona uses a distributed data-storage model, and requires end-users to have browser extensions that can encrypt and decrypt page contents. Their idea is to provide finer-grained access control with the stipulation that access to
the decrypted data is not possible without permission granted from the end-user in
the first place. While their model provides significant security guarantees, it requires
special browser extensions for all end-users, and does not incorporate a system by which
untrusted third-party extensions can work with end-user data without being capable of
revealing that information back to the third-parties.
Section 2.3.2 describes Felt and Evans' [65] proposal for a privacy-by-proxy API for
Facebook and other social networks. However, their proposal prevents all data disclosure
through the web application platform, on the basis that most Facebook apps (extensions)
do not need to access such data in the first place. Instead, they argue that data disclosure should happen separately, outside of the platform, and directly to the third-party.
However, this would do nothing to address granular data disclosure, or the ability to
hide data from the third-party while still allowing the third-party to work with that data.
This thesis argues that catering to the ‘lowest common denominator' [of third-party extensions] is inappropriate. Just because many current extensions within Facebook are “junk” does not mean that web application platforms should prevent more meaningful extensions from functioning through hard-coded prohibition of end-user data usage.
Ultimately, combining aspects of Felt and Evans' work with the Virtual Faraday Cage might offer the greatest benefit in practice. As these two approaches are not mutually exclusive, doing so is straightforward: all extensions retain the capability to use personalization without requiring end-user data access, but extensions that need such access will have to
go through the Virtual Faraday Cage. A proposal for how this can be done is presented
in Section 4.6.3.
Felt et al.’s [112] study of Google Chrome extensions proposed a permission ‘danger’
hierarchy, where more ‘dangerous’ permissions are emphasized over less ‘dangerous’ ones
(See Section 2.4). However, this observation is not obviously applicable to the Virtual Faraday Cage, as different types of data stored on different types of web application platforms may have similar URIs but totally different contents and privacy risks associated with revealing them. On the other hand, because the Virtual Faraday Cage supports both install-time and run-time permission requests, and because it is designed to be built into a parent platform, it is possible to imagine how a review or vetting process could be added to further reduce the presence of malicious extensions that end-users might
install despite warnings generated by permission and capability requests. However, at
the same time, because the dangerous components of extensions are effectively limited to
the remote components, a review and vetting process may be of limited use within the
Virtual Faraday Cage.
Fredrikson and Livshits [113] argue that the browser should be the data-miner of
personal information, and that the browser could contain third-party miner ‘extensions’
within it. In other words, all data operations must go through the browser, which is
considered trusted. Similarly, the Virtual Faraday Cage requires that all operations on
private data (that must not be relayed back to the third-party) be performed within the
web application platform and not performed remotely.
5.2 Time & Space Complexity
As the Virtual Faraday Cage introduces several new architectural components, including new metadata as well as new workflows for accomplishing tasks within a web application platform, it is important to consider the potential impacts of the Virtual
Faraday Cage on performance and data storage. This section covers the time and space
complexity analysis for the Virtual Faraday Cage, including the complexity costs associated with components such as hashed and opaque IDs, granular views, sandboxing, and
message protocols.
5.2.1 Hashed IDs
Hashed IDs, as implemented by means of a cryptographic hash function, would require O(|D| + |P|) additional storage, where D represents all the sensitive data (for all principals) within a system, and P represents the set of principals. Using SHA-256, this would require an additional 256 bits of storage per object. As hashed IDs would be stored along with the records of the actual objects, the additional computational cost of looking up the true object should be constant. While the use of a hash function may introduce the possibility of collisions, with a suitably chosen hash function these should not occur in practice. An
alternative to using a hash function would be to encrypt the real IDs of objects using a
fast symmetric cipher (e.g., Rijndael/AES) which would also eliminate the concern for
collisions.
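As a rough illustration of this bookkeeping (a sketch only; the object store and identifiers below are hypothetical, not the proof-of-concept's schema), a hashed ID is simply a SHA-256 digest stored and indexed alongside the real record:

    import hashlib

    def hashed_id(real_id: str) -> str:
        # 256-bit digest (32 bytes raw; shown hex-encoded) stored with the record.
        return hashlib.sha256(real_id.encode("utf-8")).hexdigest()

    # Hypothetical object store: the hashed ID acts as another indexed column,
    # so resolving it back to the record is a constant-time lookup.
    objects = {"user/42/email": {"value": "alice@example.com"}}
    index_by_hash = {hashed_id(real_id): real_id for real_id in objects}

    h = hashed_id("user/42/email")
    print(objects[index_by_hash[h]]["value"])   # alice@example.com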
5.2.2 Opaque IDs
If opaque IDs are implemented by means of a cryptographic hash function, then a lookup table is needed to perform a reverse-lookup from an opaque ID to a real ID. This would require storage of up to O((|D| + |P|)²), i.e., O(|D|² + |D| · |P| + |P|²), where P again represents the set of all principals. As the number of principals and/or sensitive data items grows large, the
storage space that must be allocated grows to unmaintainable levels.
On the other hand, if opaque IDs are implemented by means of symmetric-key encryption, where each user has a different opaque ID “key”, then storage space can be
reduced to simply O(|P|), or on the order of the number of entities in a given system.
Using a cipher such as AES, this would require adding an extra 256 bits to each entity record in a database, which is far more realistic. Because opaque IDs would reveal the true
ID of a given object when decrypted, and because they have a fixed size for a given web
application platform, the additional computational costs of using opaque IDs would be
constant.
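A minimal sketch of this encryption-based approach follows, assuming the third-party cryptography package and hypothetical helper names; it encrypts a fixed-size numeric ID as a single AES block, so decryption recovers the real ID without any reverse-lookup table (a production design would need to select the exact construction more carefully):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def new_principal_key() -> bytes:
        return os.urandom(32)   # one 256-bit key stored per principal: O(|P|) storage

    def to_opaque(real_id: int, key: bytes) -> bytes:
        block = real_id.to_bytes(16, "big")              # fixed-size single AES block
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return enc.update(block) + enc.finalize()        # fixed-size opaque ID

    def from_opaque(opaque: bytes, key: bytes) -> int:
        dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
        return int.from_bytes(dec.update(opaque) + dec.finalize(), "big")

    key = new_principal_key()
    opaque = to_opaque(123456789, key)
    assert from_opaque(opaque, key) == 123456789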
5.2.3 Views
Estimating the time and space complexity of views is more challenging than for other components of the Virtual Faraday Cage. Without fixing the maximum dimension of projections, or prohibiting the repetition of a dimension in a given projection, it becomes impossible to fix an upper bound for storage costs. However, if we assume a fixed constant d to represent the maximum dimension of any projection, and we assume that the maximum dimension of any data item that may be put into a view will be a fixed constant d′ (where d′ ≥ d), then the upper bound for the storage of a given projection will be d · log₂(d′) bits, or O(1).
If the total number of transforms that a given web application platform supports is bounded by a fixed constant t, then the storage space that a single transform will occupy will be log₂(t) bits or, again, O(1).
Finally, if the total number of projections and transforms that can be chained together is fixed as c, then the total storage space of a given view is bounded by max(d · log₂(d′) · c, log₂(t) · c) bits, which again reduces to O(1).
For example, if d is 32, d′ is 2³², t is 2⁸, and the maximum chain length c is a generous 32, then the maximum storage size for any given view will be 4 kilobytes. By further constraining the maximum dimension of data (e.g., d′ = 16), the maximum storage size for a given view will be at most 1 kilobyte.
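The arithmetic behind these figures can be checked directly; the short calculation below (illustrative only) reproduces the bound:

    import math

    # With d = 32 projection dimensions, data dimension d' = 2**32, t = 2**8
    # transforms, and chain length c = 32, the per-view bound
    # max(d * log2(d') * c, log2(t) * c) is 32768 bits = 4 kilobytes.
    def view_bound_bits(d: int, d_prime: int, t: int, c: int) -> float:
        return max(d * math.log2(d_prime) * c, math.log2(t) * c)

    print(view_bound_bits(32, 2**32, 2**8, 32) / 8 / 1024)   # 4.0 kilobytes
    # Constraining the data dimension further (d' = 16) keeps a view under 1 kilobyte.
    print(view_bound_bits(32, 16, 2**8, 32) / 8 / 1024)      # 0.5 kilobytes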
The computational time needed to apply a given view on a particular data-item will
vary depending on the computational time of given transforms, and on the particular
data-item size s. For projections, it should be possible to implement a solution requiring
O(s) operations. If the computational time needed to display or send a particular data-item from the platform to an end-user or third-party is considered to be an O(s) operation
as well, then barring computationally-intensive transforms (beyond O(s)), the additional
computational resources should be negligible. Consequently, the total computational
time for a view should be bounded by O(cs), which reduces to O(s) because c is a
constant.
Similarly, for storage of a view, it can be presumed that because a view will never add
additional information, the storage space for the view’s result should never be greater than
s. Consequently, the total storage complexity of a web application platform supporting
views is O(s). As views can be ephemeral, this space can be further reduced in practice: once a view is generated and distributed, the space it occupied can be freed for future operations.
5.2.4 Access Control
Storing the read/write access control rules for end-users will require at most O((3|D| +
|D|) · |P|), which reduces to O(|D| · |P|) storage space. The maximum number of entries
for read views for a single data item is limited to a constant number (at most the number
of access types – in our case, 3), and the lookup table for views would be comprised
of a column for principal IDs, a column for data item IDs, a column for the requesting
principal’s ID, a column for the access type (which has three types), and a column for
the corresponding view. Because each access control rule can apply to different third-parties, the size will ultimately scale with the number of third-parties multiplied by the amount of data. On average, the actual storage size may be much lower (e.g., the average end-user may never grant access to more than 100 third-parties), but the worst
case is still very large.
Using the example numbers for a view’s size (1 kilobyte), assuming that IDs occupy
64 bits of space, and allocating 4 bits for the view type, the read-rules would be just
slightly over the view size (e.g., 1049 bytes). For write rules, the maximum number of
entries is exactly |D| · |P|, consequently the storage complexity is O(|D| · |P|), and the
individual entry size would be much lower (e.g., 25 bytes).
As for the computational time needed to query the access control rules and apply
them, this would depend on the particular implementation and performance of database
queries. Assuming that a composite primary key index comprised of the data owner's principal ID, the data item ID, the requesting principal's ID, and the access type could be created, and assuming lookups based on primary keys could be completed within O(|D| log |D|)
time, then the total time complexity would be the same. This is because the additional
steps required (looking up hashed IDs and opaque IDs) are constant time, and resolving
a data item's URI would ideally just be another O(|D| log |D|) operation.
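The following sketch illustrates the kind of composite-key lookup described above (the rule contents and access-type names are hypothetical, not the actual schema):

    from typing import Optional

    # Read rules keyed by (owner principal, data item, requesting principal,
    # access type); the stored value is the view to apply before anything is revealed.
    READ_RULES = {
        ("alice", "alice/location", "weather-ext", "read-always"): "city-only",
    }

    def view_for(owner: str, item: str, requester: str, access_type: str) -> Optional[str]:
        # A missing rule means no access; the platform then returns nothing,
        # exactly as it would for a non-existent item.
        return READ_RULES.get((owner, item, requester, access_type))

    print(view_for("alice", "alice/location", "weather-ext", "read-always"))   # city-only
    print(view_for("alice", "alice/location", "weather-ext", "read-once"))     # None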
5.2.5 Subscriptions
Storing subscription information for data items will require at most O(|D| · |P|) storage
space. Individual records in such a table would consist of a data ID and a subscriber ID, which may reduce to 64 bytes per record. As with the access control records, in
practice the average data item may have few subscribers, reducing the average case storage complexity to something that scales linearly with the amount of data in the system.
Like access control, the time complexity of accessing the subscriber list would depend on
the implementation and performance of database queries, and dispatching messages to
third-party subscribers would also incur a computational cost – ideally bounded by the average number of subscribers per data item.
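A minimal sketch of such a subscription table and its notification dispatch (hypothetical names; the platform would dispatch real messages rather than printing):

    from collections import defaultdict

    subscribers = defaultdict(set)   # data ID -> set of subscriber (principal) IDs

    def subscribe(data_id: str, principal_id: str) -> None:
        subscribers[data_id].add(principal_id)

    def notify_change(data_id: str, new_value: str) -> None:
        # One message per subscriber; the cost scales with the subscriber count.
        for principal_id in subscribers[data_id]:
            print(f"notify {principal_id}: {data_id} changed to {new_value!r}")

    subscribe("alice/location", "weather-ext")
    notify_change("alice/location", "Calgary")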
5.2.6 Sandboxing
Estimating the time and space complexity for sandboxing is a challenging task. First,
no time or space constraints were specified when sandboxing was described, but without
assuming constraints, there is no way to provide estimates. Consequently, let us assume
that for a particular platform, limits on the storage space allocated for third-party extension local components are fixed, as are the computational resources that such a component can utilize. These limits may be expressed as both “global” and “per-user” limits – either way,
we may be able to assume that the maximum storage space for a single extension’s local
component is a fixed size.
Borrowing from the W3C recommendations for HTML5 Web Storage [137], a limit of
5,242,880 bytes of storage per domain origin may be a reasonable starting point, but this
would cause problems for multiple extensions hosted on the same primary domain. There
may also be a variety of ways that storage and computational resources could be made available to third-parties: for instance, are long-lived processes allowed? Can additional
storage space be “purchased” by a third-party? Does the amount of storage space scale
with the number of end-users utilizing that third-party’s extension? Alternatively, is the
third-party charged for every user that utilizes their extension?
As a result of these questions, the storage space required for sandboxing may scale
with the number of third-party extensions (that have local components), or it may scale
with the number of users of an extension (with a local component), or both – across all
such extensions. Similarly, computational requirements are also difficult to examine: even
if we assume that local components must execute within a fixed time t or otherwise be
terminated – resulting in O(|E|) (where E represents the set of all extensions) maximum
computational resources – such estimates are both incomplete and impractical as they
do not take into account the amount of resources likely to be in use at any given point in time, and similarly fail to take into account the fact that end-users expect near-instantaneous results when interacting with a web application.
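For concreteness, the kind of per-invocation limits assumed above could be imposed on a Unix host roughly as follows (an illustrative sketch; the specific limit values are arbitrary assumptions rather than platform recommendations, and would typically be applied in the child process, e.g., via a subprocess pre-exec hook, before the extension code runs):

    import resource
    import signal

    MEMORY_LIMIT = 64 * 1024 * 1024   # assumed 64 MiB address-space cap
    CPU_SECONDS = 2                   # assumed hard CPU-time cap
    WALL_SECONDS = 5                  # assumed wall-clock deadline

    def apply_sandbox_limits() -> None:
        # Cap memory and CPU time for this process; SIGALRM's default action
        # terminates the process if the wall-clock deadline is exceeded.
        resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT, MEMORY_LIMIT))
        resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
        signal.alarm(WALL_SECONDS)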
5.2.7 Protocol
There are two protocols presented within the Virtual Faraday Cage – the “High-Level
Protocol” (VFC-HLP), and the protocol used to execute remote procedure calls, which
is LXRP. The time complexity of executing VFC-HLP's operations (authorizing an extension, and mutual authentication) is constant: a fixed number of messages is passed
to authorize an extension or ascertain that a party is who they claim to be. Similarly,
for LXRP, the time complexity for executing any of its operations is also fixed: initial
authentication is a fixed two-message process, querying for the available methods is also
a fixed two-message process, and method calls also result in a fixed two-message, or at
most four-message (e.g., with a callback) process. As for the space complexity, the size of communications depends strictly on the amount of data being passed. Platform-wide, the total number of messages passed will depend on the total number of method calls
between third-parties and the platform, which is difficult to estimate.
5.2.8 Summary
In general, apart from sandboxing and the protocol it uses, the additional overhead generated by the Virtual Faraday Cage scales linearly with the number of principals and/or
data items within the system. With reasonable constraints (e.g., limits on dimensionality,
view chain length, etc.), the additional load due to the Virtual Faraday Cage will likely
be manageable: new operations are completed in constant-time, and new data storage
scales linearly with the total amount of data stored. In the context of social networking platforms, where individual object content can be very large (e.g., photographs and
videos), the amount of additional storage overhead needed to incorporate access control
rules, hashed IDs, and opaque IDs will be minimal in comparison to the actual data item size. As the additional storage needed per item is bounded by a fixed constant, this size difference scales linearly in practice.
5.3 Shortcomings
5.3.1 Personal Information Protection and Electronic Documents Act
While the Virtual Faraday Cage does help support many of PIPEDA’s principles, it does
fall short in a few key areas.
For example, no support for the changing of purposes is provided inherently by the
Virtual Faraday Cage, something that PIPEDA's Principles 2 and 5 call for. PIPEDA's eighth principle, “Openness”, is partially supported through the Virtual Faraday Cage – users can see what data the web application platform has on them; however, no support is provided for granting end-users a view of what third-parties have
on them. The Virtual Faraday Cage also does not assist in Principle 10, “Challenging
Compliance”.
Consequently, the Virtual Faraday Cage is not a panacea for addressing PIPEDA
compliance, but instead may be viewed as one component of a multi-faceted approach to
achieving compliance.
5.3.2 Inter-extension Communication
The Virtual Faraday Cage proposes that inter-extension communication should be handled by prompting the end-user to decide whether or not to permit it. This would be
accomplished by requiring the end-user to decide whether or not to share the data with
other users of the same extension. While the Virtual Faraday Cage can ensure that
any data revealed through this mechanism remains exclusively in the private data sets
within the web application platform (assuming that data declassification is not permitted), there is no obvious way to prohibit information leakage to other end-users unless
the data owner can verify that the contents of the data being shared are not private.
However, this approach is vulnerable to a different problem: end-users may simply agree to share whatever data they are presented with. This may result in a phenomenon similar to “click fatigue”, where a user no longer spends the necessary time
to read a security dialog before choosing their answer [138]. Additionally, as the amount
of data grows, a ‘raw’ view into the data to be shared may become impractical. While
a user can easily determine whether or not they would like to “share their movie ratings
with their friends”, it is less clear whether or not they would like to share a long list
of raw data (e.g., Figure 4.12). Consequently, in the long-term, an alternative approach
to obtaining user consent is likely necessary. One such approach is discussed in Section
5.3.3.
5.3.3 Proof-of-Concept
As the implementation of the Virtual Faraday Cage was intended to be a proof-of-concept and not a “production-ready” framework, there exist many areas where the implementation could be improved:
• Efficiency - The current implementation does not take into account efficiency:
many objects are sub-classed multiple times, special vector objects are used to
represent data, and the storage, retrieval, and execution of privacy policies and their corresponding views could be made both more memory-efficient and more computationally efficient. Additionally, communication efficiency can be improved greatly by switching from a raw XML-based protocol to something that uses compression and/or binary data, such as EXI (Efficient XML Interchange) [139].
• Sandboxing - Sandboxed local extensions need to be confined to a set amount of
memory, limited in their CPU consumption, and be terminated if their execution
time exceeds certain margins. This should be configured by the web application
platform, potentially on a per-extension basis. Additionally, the execution of sandboxed code needs to continue to be fast even with these additional changes. This
may mean that a production-ready architecture would necessitate sandboxed extensions running on a separate dedicated server for the sake of performance.
• Third-Parties - Currently, third-party extension identities are bound to their
URIs. If a third-party were to change the URI of their extension, they would lose
all access rights to resources on a web application platform. This can be rectified
either through a per-application mechanism for changing third-party extension IDs,
or by having extension IDs be something other than values directly derived from (and fixed to) the third-party extension URLs.
• Callbacks - Callback delays are not implemented; however, in practice, some level of delay may be preferable. This may need to be configurable at the end-user level, so that
for certain applications, artificial delays are not forced (e.g., for a real-time mapping
extension). Alternatively, end-users might be able to set a flag or permission that
allows for extensions to receive immediate responses from them at risk of revealing
when they are online.
• Seamless Remote Procedure Calls - Currently, only methods can be exposed
through the LXRP system. Future work could allow the direct exposure of objects
to clients rather than manually specifying methods, however this would largely be
a stylistic improvement rather than a functional one.
• Selective API Revelation - LXRP’s architecture does not allow for selective
function revelation to clients depending on their credentials; however, this should
be implemented in the future.
5.3.4 Hash Functions
Throughout this thesis, the Virtual Faraday Cage makes extensive use of the SHA-256
hash function. While the SHA-2 family of hash functions (SHA-256, SHA-384, and SHA-512) has wide adoption and has been studied extensively for security vulnerabilities, there is no
guarantee that vulnerabilities will not be discovered in the future. However, implementations of the Virtual Faraday Cage can utilize any strong cryptographic hash function
as better methods are discovered.
5.3.5 High-Level Protocol
The Virtual Faraday Cage’s High-Level Protocol (VFC-HLP) can verify that an incoming
connection is coming from the URL it claims to come from, but is this sufficient? Will users be able to differentiate between two similar-looking extension URLs? And,
what if someone were able to put a malformed extension on the same host or top-level
domain?
While preventing phishing attacks is not one of the Virtual Faraday Cage’s design
goals, it is important to acknowledge that such attacks may become an eventuality. One
potential solution to mitigate such attacks is to mandate a fixed format for all extension
URLs, for example “https://extension.yourdomain.com/”. However, this approach
suffers from inflexibility, and only addresses a very specific set of impersonation attacks
(namely, attacks from the same top-level domain).
Apart from informing the user clearly about the third-party when they attempt to
grant data-access to a third-party extension, there is no obvious remedy to this problem.
Improvements to the user interface may allow users to check whether a given URL for an extension is valid, and similarly, the display of SSL certificate information may allow users to be more confident about those extensions.
Despite this, even if end-users accidentally allow a malicious or impersonating extension access to their data, their private data remains private and cannot leave the web
application platform. This remains one of the strongest benefits of the Virtual Faraday
Cage’s unique architecture.
5.3.6 Time & Space Complexity Analysis
While some aspects of the time and space complexity for the Virtual Faraday Cage have
been discussed, and seem promising, a full and in-depth analysis of the additional overhead required by the Virtual Faraday Cage has not been performed. Additionally, the analysis of the time and space complexity for sandboxing is incomplete, as
is the complexity of the Virtual Faraday Cage’s protocol when considered platform-wide.
5.4 Future Work
5.4.1 Purpose
While Purpose is considered unenforceable by the Virtual Faraday Cage, allowing third-parties to claim a purpose can still serve a useful role within the architecture.
Currently, third-parties must supply data-specific purposes for the initial data that they
request (if any) from end-users when end-users first grant these third-party extensions
access to their data.
The Virtual Faraday Cage can easily be extended to allow for purpose specification
when a third-party attempts to perform any requested operation, allowing an end-user to better gauge whether or not to allow that operation. While third-parties could easily lie about their purposes, having the ability to both log and track data sharing for specified purposes may improve the end-user and third-party developer experience, as well as provide some form of accountability and tracking when data is
shared with third-parties.
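As a sketch of what such purpose logging might look like (a hypothetical structure, not part of the current model):

    import datetime

    # Each entry records who requested what, for which stated purpose, and
    # whether the end-user allowed it: a basic audit trail even though the
    # stated purpose itself remains unenforceable.
    purpose_log = []

    def log_request(third_party: str, data_item: str, purpose: str, allowed: bool) -> None:
        purpose_log.append({
            "when": datetime.datetime.utcnow().isoformat(),
            "third_party": third_party,
            "data_item": data_item,
            "stated_purpose": purpose,
            "allowed": allowed,
        })

    log_request("weather-ext", "alice/location", "show local forecast", True)
    print(purpose_log[0]["stated_purpose"])   # show local forecast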
As mentioned in Section 5.3.1, the Virtual Faraday Cage should support purpose to
a greater extent, for example, by allowing a third-party to re-specify a new purpose for
the use of data.
5.4.2 Enhanced Support for Legal Compliance
As indicated in Section 5.3.1, the Virtual Faraday Cage falls short as a complete system
for PIPEDA compliance. Additionally, apart from PIPEDA, there are several other
important privacy laws that exist worldwide (see Section 1.4.2). Future work with the
Virtual Faraday Cage should further address the issue of privacy law compliance, and enhance the Virtual Faraday Cage as a tool for achieving such compliance.
5.4.3 Callbacks
For many API function calls, a callback ID can be passed as an optional parameter so
that the Virtual Faraday Cage can send a potentially delayed response to the third-party
extension. This provides two benefits: 1) third-party extensions do not have to wait
for processing to occur on the back-end, or wait for end-user input, and 2) callbacks
can be delayed or dropped, making it unclear to the third-party when the end-user was
online and denied a request. In the latter case, this helps prevent leakage of information
regarding when a user was online or not.
For example, in the context of a third-party extension asking for geo-locational data,
if an end-user had authorized that third-party extension to always see the city or country that the end-user resides in, the initial map view could be of that locale; if the user then accepts sharing their fine-grained details, the more detailed locational information could be sent to the extension through the callback, and the map view updated. If the end-user decides not to accept the request, then the callback request may
never be sent.
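A simplified sketch of this callback pattern is shown below (hypothetical function names and an arbitrary delay policy; as noted next, the proof-of-concept does not implement delays):

    import random
    import threading

    def send_callback(callback_id: str, payload: str) -> None:
        # Stand-in for the platform dispatching the delayed response to the extension.
        print(f"callback {callback_id}: {payload}")

    def handle_location_request(callback_id: str, user_consented: bool) -> None:
        if not user_consented:
            return   # the callback is simply never sent; denial stays indistinguishable
        delay = random.uniform(1.0, 30.0)   # artificial delay hides when the user acted
        threading.Timer(delay, send_callback, args=(callback_id, "51.05,-114.07")).start()

    handle_location_request("cb-42", user_consented=True)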
Callback delays were not implemented in the proof-of-concept for the Virtual Faraday
Cage, and there are no set guidelines for how the delays should ideally be implemented.
Future work in this area could explore callback delays and their impact on privacy, as
well as how “ideal” delays should be implemented.
5.4.4 Inter-extension Communication
As indicated in Section 5.3.2, Inter-extension Communication remains a challenge in the
Virtual Faraday Cage. Not only is communication of end-user data between different users' instances of the same extension difficult, but there is no proposal for how different extensions may communicate,
if at all.
One way to address this problem might be to split the local extension component’s
cache space from the channel for inbound communication from third-parties. Instead of
a unified local extension cache space, there would be a remote cache and a local cache.
The remote cache would allow for the third-party to write directly to it, however no other
entities would be able to write to data in that location. The local cache would grant full
read and write capabilities to the local extension component, but no other entities would
be able to read or write to the local cache. In this scenario, the web application platform
would then decide which other end-users’ extensions can read from the remote cache.
This may be done automatically, or by prompting the end-user to choose other end-users
specifically. On a social network, this may be dictated by friendship relations between
end-users. As the remote cache is read-only for local extension components, it prohibits
the leakage of one end-user’s private data to another end-user’s caches; the only data
that can be shared with other users of that extension is data that the third-party already
had, and upstream communication is still prohibited.
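The intended access rules for the two caches can be summarized in a short sketch (hypothetical class and method names):

    class ExtensionCaches:
        def __init__(self):
            self.remote_cache = {}   # writable by the third-party only
            self.local_cache = {}    # readable/writable by the local component only

        def third_party_write(self, key, value):
            self.remote_cache[key] = value

        def local_component_read(self, key):
            # The local component sees both caches but can never push data upstream.
            return self.local_cache.get(key, self.remote_cache.get(key))

        def local_component_write(self, key, value):
            self.local_cache[key] = value            # never visible to the third-party

        def friend_instance_read(self, key, platform_approved: bool):
            # Another end-user's instance may read only the remote cache, and only
            # when the platform (or the end-user) permits it.
            return self.remote_cache.get(key) if platform_approved else None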
Future work should explore this area, as one of the major benefits of social-network-based extensions is that they can leverage an end-user's social network. Without the ability to communicate effectively, and in a privacy-preserving manner, between extension instances running on the same platform, this advantage is lost.
5.4.5 Time & Space Complexity and Benchmarking
As stated in Sections 5.2.6, 5.2.7, and 5.3.6, the time and space complexity analyses for sandboxing and the protocol are currently lacking.
First, while limits on computational and storage resources for local extension components should be decided on a per-platform basis, guidelines should be proposed, along
with the rationale behind them. Ideally, limits should be expandable on a per-extension
and/or per-user basis – this could be accomplished through manual means (e.g., the
platform reviews and decides), or through charging end-users and/or third-parties.
Additionally, an in-depth study is needed of how local extension components can be structured to both provide maximum features (e.g., supporting long-lived tasks) and minimize resource costs. As indicated in Section 5.2.6, one potential avenue to addressing computational and storage resource usage would be to pass the costs to either third-parties or
end-users, or both. A future study should ascertain how such a system would work, and
if such a system would be economically viable and attractive to all involved parties.
Another option may be to consider offloading storage and computational resources to
end-users themselves, e.g., by requiring third-party local extension components to
be either written in JavaScript or converted to such – and then running them within a
sandbox inside the end-users’ web browser.
Finally, benchmarks should also be established – both for sandboxing as well as for
the messages passed within the protocol. Without concrete data regarding both of these
aspects of the Virtual Faraday Cage, it remains unclear how scalable the
Virtual Faraday Cage is in practice.
5.4.6 URI Ontology
Structuring data across web application platforms in a uniform way may also be highly
advantageous; however, it is not clear how one might accomplish this. With a unified
URI ontology for data across these different web platforms, it may be possible to write
extensions that can operate across these platforms because the underlying end-user data
is still structured in the same way and located at the same URIs. Future work exploring
this area may overlap with work on the Semantic Web [140], as the latter is an attempt
to standardize the categorization and structure of diverse data content.
5.5 Summary
This thesis has presented a novel architecture for web application platforms that seek to facilitate the interaction between end-users and third-party extensions in a privacy-aware manner. Not only can information be shared in a granular way, but information can be withheld completely from third-parties while still being made usable by third-party
extensions. The Virtual Faraday Cage advocates a paradigm shift from the idea that
privacy must be sacrificed for utility, to the idea that privacy can be preserved while
still gaining some utility. In the process, this thesis has presented an overview of privacy
in the area of web applications, and highlighted many of the challenges associated with
existing approaches.
This thesis has also presented a theoretical model, within which security and privacy
issues were examined. By using this model and by splitting end-user data into two disjoint
sets (private data and sensitive data) while enforcing a strict information flow control
policy, private data can be unconditionally protected from disclosure to third-parties. On
the other hand, it was also shown that any sensitive data that is ever disclosed to any
third-party cannot be protected from disclosure to unauthorized parties.
Following that, the Virtual Faraday Cage was presented and described in detail. The
Virtual Faraday Cage architecture abides by the theoretical model, and consequently is
capable of providing the same unconditional guarantee towards the protection of end-user
private-data. The Virtual Faraday Cage is applicable towards a broad category of web
applications, and is capable of supporting both “current-style” third-party extensions
as well as new hybrid extensions and locally-hosted (sandboxed) extensions. Finally, a
proof-of-concept for the Virtual Faraday Cage was also constructed, demonstrating that the Virtual Faraday Cage's core functionalities are feasible.
In conclusion, privacy continues to be a significant challenge for web applications, and the Virtual Faraday Cage is just one step towards addressing some of these
problems.
Bibliography
[1] K. Barker, M. Askari, M. Banerjee, K. Ghazinour, B. Mackas, M. Majedi, S. Pun,
and A. Williams, “A data privacy taxonomy,” in Proceedings of the 26th British
National Conference on Databases: Dataspace: The Final Frontier, ser. BNCOD
26.
Berlin, Heidelberg: Springer-Verlag, 2009, pp. 42–54. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-02843-4 7
[2] Department of Computer Science. (2012) The advanced database systems
and application (ADSA) laboratory. University of Calgary. [Online]. Available:
http://www.adsa.cpsc.ucalgary.ca/
[3] MySpace Developer Platform. (2010) Applications FAQs. Retrieved 4/20/2012; Last modified (according to site) on 9/9/2010. [Online]. Available: http://wiki.developer.myspace.com/index.php?title=Applications FAQs#I just published my app. How long will the approval process take.3F
[4] Facebook. (2012) Facebook developers. [Online]. Available: http://developers.
facebook.com/
[5] D. M. Boyd and N. B. Ellison, “Social network sites: Definition, history, and scholarship.” Journal of Computer-Mediated Communication, vol. 13, no. 1, pp. 210–230, 2007. [Online]. Available: http://ezproxy.lib.ucalgary.ca:2048/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=ufh&AN=27940595&site=ehost-live
[6] Unknown. (2007) 10 facts you should know about bebo. Sociable Blog. [Online]. Available: http://www.sociableblog.com/2007/11/14/10-facts-you-should-know-about-bebo/
[7] J. Owyang. (2008) Social network stats:
Facebook, myspace, reunion. Web
Strategy by Jeremiah Owyang. [Online]. Available: http://www.web-strategist.
com/blog/2008/01/09/social-network-stats-facebook-myspace-reunion-jan-2008/
[8] J. Smith. (2009, February) Facebook surpasses 175 million users, continuing to grow by 600k users/day. Inside Facebook. [Online]. Available: http://www.insidefacebook.com/2009/02/14/facebook-surpasses-175-million-users-continuing-to-grow-by-600k-usersday/
[9] Friendster Inc. (2009) About friendster. Friendster, Inc. Retrieved on December
22nd, 2009. [Online]. Available: http://www.friendster.com/info/index.php
[10] P. Perez. (2009) Orkut — stimulate your social life. Social Networks 10. Retrieved
on December 22nd, 2009. [Online]. Available: http://www.socialnetworks10.com/
orkut
[11] LiveJournal, Inc. (2009) About LiveJournal. LiveJournal, Inc. Retrieved on
December 22nd, 2009. [Online]. Available: http://www.livejournal.com/
[12] Tencent Inc. (2009) What is qq? I’M QQ - QQ Official Site. Retrieved December
22nd 2009. [Online]. Available: http://www.imqq.com/
[13] W. Luo, Q. Xie, and U. Hengartner, “FaceCloak: An architecture for user privacy on social networking sites,” Computational Science and Engineering, IEEE
International Conference on, vol. 3, pp. 26–33, 2009.
[14] United Nations. (1948) The universal declaration of human rights. Retrieved on
4/20/2012. [Online]. Available: http://www.un.org/en/documents/udhr/
[15] United Nations Educational Scientific and Cultural Organization. (2012)
UNESCO privacy chair - motivation. Retrieved 4/20/2012. [Online]. Available:
http://unescoprivacychair.urv.cat/motivacio.php
[16] P. Guarda and N. Zannone, “Towards the development of privacy-aware
systems,” Information & Software Technology, vol. 51, pp. 337–350, 2009. [Online]. Available:
http://academic.research.microsoft.com/Publication/5881826/
towards-the-development-of-privacy-aware-systems
[17] S. Kenny and J. J. Borking, “The value of privacy engineering,” Journal of Information, Law and Technology, vol. 2002, 2002. [Online]. Available: http://academic.
research.microsoft.com/Publication/794202/the-value-of-privacy-engineering
[18] N. Kiyavitskaya, A. Krausova, and N. Zannone, “Why eliciting and managing legal
requirements is hard,” in Requirements Engineering and Law, 2008. RELAW ’08.,
sept. 2008, pp. 26 –30.
[19] R. Gellman, “Privacy in the clouds: Risks to privacy and confidentiality from cloud
computing,” World Privacy Forum, Tech. Rep., 2009.
[20] P. Roberts. (2011) HIPAA bares its teeth:
violation. threatpost:
$4.3m fine for privacy
The Kapersky Lab Security News Service. Re-
trieved 4/20/2012. [Online]. Available:
http://threatpost.com/en us/blogs/
hipaa-bares-its-teeth-43m-fine-privacy-violation-022311
[21] G. of Canada, “Personal information protection and electronic documents act,”
2000.
[22] E. Denham (Assistant Privacy Commissioner of Canada), “Report of findings into
the complaint filed by the canadian internet policy and public interest clinic (cippic)
against facebook inc. under the personal information protection and electronic doc-
uments act,” Findings under the Personal Information Protection and Electronic
Documents Act (PIPEDA), 2009.
[23] P. Lawson, “Pipeda complaint: Facebook,” Canadian Internet Policy and Public
Interest Clinic, 2008.
[24] T. Israel, “Statement of concern re: Facebooks new privacy approach,” Canadian
Internet Policy and Public Interest Clinic (CIPPIC), 2010.
[25] European Parliament and Council, “Directive 95/46/ec of the european parliament
and of the council on the protection of individuals with regard to the processing
of personal data and on the free movement of such data,” Official Journal of the
European Communities, 1995.
[26] ——, “Directive 2002/58/ec of the european parliament and of the council concerning the processing of personal data and the protection of privacy in the electronic
communications sector (directive on privacy and electronic communications),” Official Journal of the European Communities, 2002.
[27] E. Commission, “Regulation of the european parliament and of the council on the
protection of individuals with regard to the processing of personal data and on
the free movement of such data (general data protection regulation),” European
Commission, 2012.
[28] United Kingdom. (1998) Data protection act 1998. [Online]. Available:
http://www.legislation.gov.uk/ukpga/1998/29/contents
[29] U. S. D. of Justice. (1988) Overview of the privacy act of 1974, 2012 edition.
[Online]. Available: http://www.justice.gov/opcl/1974privacyact-overview.htm
[30] O. o. J. P. J. I. S. U.S. Department of Justice. (1986) Electronic communications
privacy act of 1986. [Online]. Available: https://it.ojp.gov/default.aspx?area=
privacy&page=1285
[31] R. Gross and A. Acquisti, Privacy Enhancing Technologies.
/ Heidelberg, 2006, ch. Imagined Communities:
Springer Berlin
Awareness, Information
Sharing, and Privacy on the Facebook, pp. 36 – 58. [Online]. Available:
http://www.springerlink.com/content/gx00n8nh88252822
[32] Facebook. (2012) Facebook ads. [Online]. Available: http://www.facebook.com/
advertising/
[33] J. Quittner. (2008) MySpace to businesses: Kiss myads. Retrieved on 4/20/2012.
[Online]. Available: http://www.time.com/time/business/article/0,8599,1849458,
00.html
[34] P. Domingos and M. Richardson, “Mining the network value of customers,” in
KDD ’01: Proceedings of the seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. New York, NY, USA: ACM Press, 2001,
pp. 57–66. [Online]. Available: http://dx.doi.org/10.1145/502512.502525
[35] S. Staab, P. Domingos, P. Mike, J. Golbeck, L. Ding, T. Finin, A. Joshi, A. Nowak,
and R. R. Vallacher, “Social networks applied,” Intelligent Systems, IEEE [see also
IEEE Intelligent Systems and Their Applications], vol. 20, no. 1, pp. 80–93, 2005.
[Online]. Available: http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=1392679
[36] N. Mook. (2005) Cross-site scripting worm hits MySpace. betanews. Retrieved 4/20/2012. [Online]. Available:
cross-site-scripting-worm-hits-myspace/
http://betanews.com/2005/10/13/
[37] K. J. Higgins. (2012) Worm siphons 45,000 facebook accounts. Dark Reading (UBM TechWeb). Retrieved 4/21/2012. [Online]. Available: http://www.darkreading.com/insider-threat/167801100/security/attacks-breaches/232301379/worm-siphons-45-000-facebook-accounts.html?itc=edit_stub
[38] S. Machlis. (2010) How to hijack facebook using firesheep. Computerworld (PCWorld). Retrieved on 4/21/2012. [Online]. Available: http://www.pcworld.com/article/209333/how_to_hijack_facebook_using_firesheep.html
[39] J. Gold. (2012) Social predators still gaming the system on facebook. NetworkWorld (PCWorld). Retrieved 4/21/2012. [Online]. Available: http://www.pcworld.com/article/254185/social_predators_still_gaming_the_system_on_facebook.html
[40] D. Danchev. (2012) Facebook phishing attack targets syrian activists. ZDNet
(CBS Interactive). Retrieved 4/21/2012. [Online]. Available: http://www.zdnet.
com/blog/security/facebook-phishing-attack-targets-syrian-activists/11217
[41] S. Mukhopadhyay. (2009) Black woman murdered by stalker who had been using Youtube and Facebook to threaten her. Feministing. Retrieved 4/20/2012. [Online]. Available: http://feministing.com/2009/04/14/black_woman_murdered_by_facebo/
[42] A. Moya. (2009) An online tragedy. CBS News. Retrieved 4/20/2012. [Online].
Available:
http://www.cbsnews.com/stories/2000/03/23/48hours/main175556.
shtml
[43] Metro.co.uk. (2009) Facebook stalker is given a life ban for bombarding student with 30 messages a day. Associated Newspapers Limited. Retrieved 4/20/2012. [Online]. Available: http://www.metro.co.uk/news/765760-facebook-stalker-is-given-a-life-ban-for-bombarding-student-with-30-messages-a-day
[44] D. Thompson. (2011) Facebook stalker won’t ’fit in’ to prison, says lawyer. Associated Press (via MSNBC). Retrieved 4/20/2012. [Online]. Available: http://www.msnbc.msn.com/id/42019388/ns/technology_and_science-security/
[45] The Telegraph. (2012) Facebook stalker ’murdered’ ex-girlfriend. Telegraph Media Group Limited. Retrieved 4/20/2012. [Online]. Available: http://www.telegraph.co.uk/news/uknews/crime/9041971/Facebook-stalker-murdered-ex-girlfriend.html
[46] T. N. Jagatic, N. A. Johnson, M. Jakobsson, and F. Menczer, “Social phishing,”
Communications of the ACM, vol. 50, no. 10, pp. 94–100, 2007.
[47] Microsoft Corporation. (2013) Phishing: Frequently asked questions. [Online].
Available: http://www.microsoft.com/security/online-privacy/phishing-faq.aspx
[48] Discovery Channel. (2007) 2057. Discovery Communications LLC. Retrieved on
4/20/2012. [Online]. Available: http://dsc.discovery.com/convergence/2057/2057.
html
[49] CBC. (2009) Depressed woman loses benefits over Facebook photos. CBC News
(Montreal). [Online]. Available: http://www.cbc.ca/canada/montreal/story/2009/
11/19/quebec-facebook-sick-leave-benefits.html
[50] J. M. Grohol. (2011) Posting about health concerns on facebook, twitter. Psych
Central. [Online]. Available: http://psychcentral.com/blog/archives/2011/02/01/
posting-about-health-concerns-on-facebook-twitter/
[51] S. Li. (2011) Insurers are scouring social media for evidence of fraud. Los Angeles
Times. [Online]. Available:
http://articles.latimes.com/2011/jan/25/business/
la-fi-facebook-evidence-20110125
[52] Agence France-Presse. (2010) Police charge driver over Facebook speeding clip: report. ABS-CBN Interactive. [Online]. Available: http://www.abs-cbnnews.com/lifestyle/classified-odd/10/13/10/police-charge-driver-over-facebook-speeding-clip-report
[53] G. Laasby. (2011) Greenfield teen’s facebook post for drugs leads to arrest. Journal Sentinel Online (Milwaukee Wisconsin). Retrieved 4/20/2012. [Online]. Available: http://www.jsonline.com/news/crime/greenfield-teens-facebook-post-for-drugs-leads-to-arrest-131826718.html
[54] WRAL Raleigh-Durham Fayetteville, N.C. (2005) NCSU students face underage
drinking charges due to online photos. Capitol Broadcasting Company Inc.
(Mirrored by the Internet Archive Wayback Machine). Retrieved 4/20/2012.
[Online]. Available:
http://web.archive.org/web/20051031084848/http://www.
wral.com/news/5204275/detail.html
[55] D. Chalfant. (2005) Facebook postings, photos incriminate dorm party-goers. The Northerner Online (Mirrored by the Internet Archive’s Wayback Machine). Retrieved 4/20/2012. [Online]. Available: http://web.archive.org/web/20051125003232/http://www.thenortherner.com/media/paper527/news/2005/11/02/News/Facebook.Postings.Photos.Incriminate.Dorm.PartyGoers-1042037.shtml
[56] TMCnews. (2006) Officials at institutions nationwide using facebook site.
Technology Marketing Corporation. Retrieved 4/20/2012. [Online]. Available:
http://www.tmcnet.com/usubmit/2006/03/29/1518943.htm
[57] N. Hass. (2006) In your facebook.com. The New York Times. Retrieved
4/20/2012. [Online]. Available: http://www.nytimes.com/2006/01/08/education/
edlife/facebooks.html
[58] R. Gross, A. Acquisti, and H. J. Heinz, III, “Information revelation and privacy in
online social networks,” in WPES ’05: Proceedings of the 2005 ACM Workshop on
Privacy in the Electronic Society. New York, NY, USA: ACM, 2005, pp. 71–80.
[59] T. Krazit. (2010) What google needs to learn from buzz backlash. CNN
Tech (Mirrored by the Internet Archive’s Wayback Machine). Retrieved
4/20/2012. [Online]. Available: http://web.archive.org/web/20100220075121/http:
//www.cnn.com/2010/TECH/02/18/cnet.google.buzz/index.html
[60] H. Tsukayama. (2012) Google faces backlash over privacy changes. The Washington Post. Retrieved 4/20/2012. [Online]. Available: http://www.washingtonpost.com/business/technology/google-faces-backlash-over-privacy-changes/2012/01/25/gIQAVQnMQQ_story.html
[61] H. Blodget. (2007) Facebook’s ”beacon” infuriate users, moveon. Business Insider. [Online]. Available: http://articles.businessinsider.com/2007-11-21/tech/30022354_1_facebook-users-web-sites-ads
[62] R. Singel. (2009) Facebook loosens privacy controls, sparks a backlash. WIRED. [Online]. Available: http://www.wired.com/epicenter/2009/12/facebook-privacy-backlash/
[63] BBC. (2010) Facebook mulls u-turn on privacy. BBC News. [Online]. Available:
http://news.bbc.co.uk/2/hi/technology/10125260.stm
[64] D. Rosenblum, “What anyone can know: The privacy risks of social networking
sites,” IEEE Security and Privacy, vol. 5, pp. 40–49, 2007.
[65] A. Felt and D. Evans, “Privacy protection for social networking platforms,” in Web 2.0 Security and Privacy 2008 (Workshop), 2008 IEEE Symposium on Security and Privacy, 2008.
[66] J. Bonneau, J. Anderson, and G. Danezis, “Prying data out of a social network,”
Social Network Analysis and Mining, International Conference on Advances in,
vol. 0, pp. 249–254, 2009.
[67] Facebook. (2012) Facebook platform policies. Retrieved 4/21/2012. [Online].
Available: http://developers.facebook.com/policy/
[68] S. Kelly. (2008) Identity ’at risk’ on facebook. BBC (BBC Click). [Online]. Available: http://news.bbc.co.uk/2/hi/programmes/click_online/7375772.stm
[69] A. Felt, P. Hooimeijer, D. Evans, and W. Weimer, “Talking to strangers without
taking their candy: isolating proxied content,” in SocialNets ’08: Proceedings of
the 1st Workshop on Social Network Systems. New York, NY, USA: ACM, 2008,
pp. 25–30.
[70] S. Buchegger, “Ubiquitous social networks,” in Ubicomp Grand Challenge, Ubiquitous Computing at a Crossroads Workshop, London, U.K., January 6-7, 2009.
[71] N. Jag. (2007) MySpace friend adder bots exposed!
[Online]. Available:
http://www.nickjag.com/2007/08/23/myspace-friend-adder-bots-exposed/
[72] L. Conway. (2008, November) Virgin atlantic sacks 13 staff for calling its flyers ’chavs’. The Independent (UK). [Online]. Available: http://www.independent.co.uk/news/uk/home-news/virgin-atlantic-sacks-13-staff-for-calling-its-flyers-chavs-982192.html
[73] WKMG. (2007) Teacher fired over MySpace page. ClickOrlando.com (WKMG
Local 6) - Mirrored by the Internet Archive’s Wayback Machine. [Online].
Available: http://web.archive.org/web/20090217150126/http://clickorlando.com/
education/10838194/detail.html
[74] WESH. (2006) Local sheriff’s deputy fired over myspace profile. MSNBC (WESH
Orlando). [Online]. Available: http://www.wesh.com/news/9400560/detail.html
[75] A. Judd. (2009, November) Ashley Payne, former teacher fired for Facebook pictures. NowPublic - Crowd Powered Media. [Online]. Available: http://www.nowpublic.com/strange/ashley-payne-former-teacher-fired-facebook-pictures-2515440.html
[76] L. Wu, M. Majedi, K. Ghazinour, and K. Barker, “Analysis of social networking
privacy policies,” in International Conference on Extending Database Technology,
2010, pp. 1–5. [Online]. Available:
http://academic.research.microsoft.com/
Publication/13267317/analysis-of-social-networking-privacy-policies
[77] Proofpoint, Inc., “Outbound email and data loss prevention in today’s enterprise,” Proofpoint, Inc., Tech. Rep., 2009.
[78] Reputation.com. (2012) Online reputation management. Retrieved 4/21/2012.
[Online]. Available: http://www.reputation.com/
[79] Zululex LLC. (2012) ZuluLex reputation management company. Retrieved
4/21/2012. [Online]. Available: http://reputation.zululex.com/
[80] D. E. Denning, “A lattice model of secure information flow,” Communications
of the ACM, vol. 19, no. 5, pp. 236–243, May 1976. [Online]. Available:
http://doi.acm.org/10.1145/360051.360056
[81] J. H. Saltzer, “Protection and the control of information sharing in multics,”
Communications of the ACM, vol. 17, no. 7, pp. 388–402, Jul. 1974. [Online].
Available: http://doi.acm.org/10.1145/361011.361067
[82] D. E. Bell and L. J. LaPadula, “Secure computer systems: A mathematical model.
volume ii.” MITRE Corporation Bedford, MA, 1998. [Online]. Available: http://oai.
dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0770768
[83] A. C. Myers and B. Liskov, “A decentralized model for information flow control,”
in Proceedings of the sixteenth ACM Symposium on Operating Systems Principles,
ser. SOSP ’97.
New York, NY, USA: ACM, 1997, pp. 129–142. [Online].
Available: http://doi.acm.org/10.1145/268998.266669
[84] I. Papagiannis, M. Migliavacca, P. Pietzuch, B. Shand, D. Eyers, and J. Bacon,
“Privateflow: decentralised information flow control in event based middleware,”
in Proceedings of the Third ACM International Conference on Distributed
Event-Based Systems, ser. DEBS ’09.
New York, NY, USA: ACM, 2009, pp.
38:1–38:2. [Online]. Available: http://doi.acm.org/10.1145/1619258.1619306
[85] A. Futoransky and A. Waissbein, “Enforcing privacy in web applications,” in Conference on Privacy, Security and Trust, 2005. [Online]. Available: http://academic.research.microsoft.com/Publication/1906067/enforcing-privacy-in-web-applications
[86] Apache Software Foundation. (2011) Apache accumulo. Retrieved 4/21/2012.
[Online]. Available: http://accumulo.apache.org/
[87] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham, “Efficient software-based
fault isolation,” SIGOPS Operating Systems Principles Review, vol. 27, no. 5,
pp. 203–216, Dec. 1993. [Online]. Available: http://doi.acm.org/10.1145/173668.
168635
[88] I. Goldberg,
D. Wagner,
R. Thomas,
and E. A. Brewer,
“A secure
environment for untrusted helper applications confining the wily hacker,”
in Proceedings of the 6th conference on USENIX Security Symposium,
Focusing on Applications of Cryptography - Volume 6,
ser. SSYM’96.
Berkeley, CA, USA: USENIX Association, 1996, pp. 1–1. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1267569.1267570
[89] S. Maffeis and A. Taly, “Language-based isolation of untrusted javascript,” in
Computer Security Foundations Workshop, 2009, pp. 77–91. [Online]. Available:
http://academic.research.microsoft.com/Paper/5113449.aspx
[90] Google et al. (2007) google-caja: Compiler for making third-party HTML, CSS
and JavaScript
safe for embedding. Retrieved 4/21/2012. [Online]. Available:
http://code.google.com/p/google-caja/
[91] W3C. (1998) Platform for privacy preferences (P3P) project. World Wide
Web Consortium (W3C). Retrieved 4/21/2012. [Online]. Available:
http:
//www.w3.org/P3P/
[92] Microsoft Corporation. (2012) A history of internet explorer. Retrieved 4/21/2012.
[Online].
Available:
http://windows.microsoft.com/en-us/internet-explorer/
products/history
[93] G. Karjoth, M. Schunter, and E. V. Herreweghen, “Translating privacy
practices into privacy promises -how to promise what you can keep,”
in Policies for Distributed Systems and Networks,
2003,
pp. 135–146.
[Online]. Available: http://academic.research.microsoft.com/Publication/664496/
translating-privacy-practices-into-privacy-promises-how-to-promise-what-you-can-keep
[94] P. Ashley, S. Hada, G. Karjoth, and M. Schunter, “E-P3P privacy policies and
privacy authorization,” in Proceedings of the 2002 ACM Workshop on Privacy in
the Electronic Society, ser. WPES ’02.
New York, NY, USA: ACM, 2002, pp.
103–109. [Online]. Available: http://doi.acm.org/10.1145/644527.644538
[95] K. Ghazinour and K. Barker, “Capturing P3P semantics using an enforceable lattice-based structure,” pp. 1–6, 2011. [Online]. Available: http://academic.research.microsoft.com/Publication/39237918/capturing-p3p-semantics-using-an-enforceable-lattice-based-structure
[96] A. Rezgui, M. Ouzzani, A. Bouguettaya, and B. Medjahed, “Preserving privacy
in web services,” in Proceedings of the 4th International Workshop on Web
Information and Data Management, ser. WIDM ’02. New York, NY, USA: ACM,
2002, pp. 56–62. [Online]. Available: http://doi.acm.org/10.1145/584931.584944
[97] M. A. C. Dekker, S. Etalle, and J. den Hartog, “Privacy in an ambient
world,” Enschede, April 2006, imported from CTIT. [Online]. Available:
http://doc.utwente.nl/66174/
[98] M. W. Bagga,
“Privacy-enabled application scenarios for web services,”
2003. [Online]. Available: http://academic.research.microsoft.com/Publication/
4655282/privacy-enabled-application-scenarios-for-web-services
[99] W. Xu, V. N. Venkatakrishnan, R. Sekar, and I. V. Ramakrishnan, “A framework for building privacy-conscious composite web services,” in International Conference on Web Services, 2006, pp. 655–662. [Online]. Available: http://academic.research.microsoft.com/Publication/2362859/a-framework-for-building-privacy-conscious-composite-web-services
[100] K. El-Khatib, “A privacy negotiation protocol for web services,” in Workshop on Collaboration Agents: Autonomous Agents for Collaborative Environments. Halifax, Nova Scotia, Canada: NRC Institute for Information Technology; National Research Council Canada, 2003. [Online]. Available: http://academic.research.microsoft.com/Publication/4507596/a-privacy-negotiation-protocol-for-web-services
[101] S. Benbernou, H. Meziane, Y. H. Li, and M.-S. Hacid, “A privacy agreement model for web services,” in Services Computing, 2007. SCC 2007. IEEE International Conference on, July 2007, pp. 196–203.
[102] D. Métayer, “Formal aspects in security and trust,” P. Degano, J. Guttman,
and F. Martinelli, Eds.
Berlin, Heidelberg:
Springer-Verlag, 2009, ch. A
Formal Privacy Management Framework, pp. 162–176. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-01465-9 11
[103] B. Luo and D. Lee, “On protecting private information in social networks: A proposal,” in Data Engineering, 2009. ICDE ’09. IEEE 25th International Conference on, March 29 – April 2, 2009, pp. 1603–1606.
[104] R. Hamadi, H.-Y. Paik, and B. Benatallah, “Conceptual modeling of privacy-aware web service protocols,” in Conference on Advanced Information Systems Engineering, 2007, pp. 233–248. [Online]. Available: http://academic.research.microsoft.com/Publication/2419338/conceptual-modeling-of-privacy-aware-web-service-protocols
[105] S. E. Levy and C. Gutwin, “Improving understanding of website privacy policies with fine-grained policy anchors,” in World Wide Web Conference Series, 2005, pp. 480–488. [Online]. Available: http://academic.research.microsoft.com/Publication/1242366/improving-understanding-of-website-privacy-policies-with-fine-grained-policy-anchors
[106] Y. Tian, B. Song, and E.-N. Huh, “A threat-based privacy preservation system in untrusted environment,” in International Conference on Hybrid Information Technology, 2009, pp. 102–107. [Online]. Available: http://academic.research.microsoft.com/Publication/6054821/a-threat-based-privacy-preservation-system-in-untrusted-environment
[107] S. Guha, K. Tang, and P. Francis, “NOYB: privacy in online social networks,” in
WOSP ’08: Proceedings of the First Workshop on Online Social Networks.
New
York, NY, USA: ACM, 2008, pp. 49–54.
[108] M. M. Lucas and N. Borisov, “FlyByNight: mitigating the privacy risks of social
networking,” in WPES ’08: Proceedings of the 7th ACM Workshop on Privacy in
the Electronic Society. New York, NY, USA: ACM, 2008, pp. 1–8.
[109] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin, “Persona: an
online social network with user-defined privacy,” SIGCOMM Computer Communication Review, vol. 39, no. 4, pp. 135–146, 2009.
[110] World Wide Web Consortium (W3C). (2013) Cross-origin resource sharing.
[Online]. Available: http://www.w3.org/TR/cors/
[111] A. Barth, A. P. Felt, P. Saxena, and A. Boodman, “Protecting browsers from
extension vulnerabilities,” in Network and Distributed System Security Symposium,
2009. [Online]. Available: http://academic.research.microsoft.com/Publication/
6357389/protecting-browsers-from-extension-vulnerabilities
[112] A. P. Felt, K. Greenwood, and D. Wagner, “The effectiveness of install-time permission systems for third-party applications,” UC Berkeley, Tech. Rep. EECS-2010-143, 2010.
[113] M. Fredrikson and B. Livshits, “REPRIV: Re-envisioning in-browser privacy,”
Microsoft Research, Tech. Rep. MSR-TR-2010-116, 2010. [Online]. Available:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.13
[114] L. Sweeney, “Achieving k-anonymity privacy protection using generalization and
suppression,” International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems, vol. 10, no. 5, pp. 571–588, Oct. 2002. [Online]. Available:
http://dx.doi.org/10.1142/S021848850200165X
[115] Facebook. (2012) User - facebook developers (graph API). [Online]. Available:
https://developers.facebook.com/docs/reference/api/user/
[116] ——. (2012) Graph API. [Online]. Available: https://developers.facebook.com/
docs/reference/api/#insights
[117] J. Anderson, “Computer security technology planning study,” Command and Management Systems HQ Electronic Systems Divisions (AFSC), Tech. Rep., 1972.
[118] V. Stinner. (2012) pysandbox 1.5: Python package index. [Online]. Available:
http://pypi.python.org/pypi/pysandbox
[119] M. Muhammad and J. Cappos. (2012) Seattle: Open peer-to-peer computing.
Computer Science & Engineering at the University of Washington. [Online].
Available: https://seattle.cs.washington.edu/html/
[120] World Wide Web Consortium (W3C). (2000) SOAP (simple object access
protocol). [Online]. Available: http://www.w3.org/TR/soap/
[121] R. T. Fielding, “Representational state transfer (REST), from architectural styles and the design of network-based software architectures,” Ph.D. dissertation, University of California, Irvine, 2000, retrieved on 4/23/2012. [Online]. Available: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
[122] Google. (2011) Introducing protorpc for writing app engine web services
in python. [Online]. Available:
http://googleappengine.blogspot.ca/2011/04/
introducing-protorpc-for-writing-app.html
[123] ——. (2012) Google app engine. [Online]. Available: https://developers.google.
com/appengine/
[124] ——. (2012) Google app engine:
Pure python. [Online]. Available:
https://developers.google.com/appengine/docs/python/runtime#Pure_Python
[125] J. Simon-Gaarde. (2011) Ladon webservice. [Online]. Available: http://packages.
python.org/ladon/
[126] Arskom Bilgisayar Danışmanlık ve Tic. Ltd Şti. (2011) Rpclib. [Online]. Available:
https://github.com/arskom/rpclib
[127] Unknown. (2011) pysimplesoap: Python simple soap library. [Online]. Available:
http://code.google.com/p/pysimplesoap/
[128] J. Ortel, J. Noehr, and N. V. Gheem. (2010) Suds. [Online]. Available:
https://fedorahosted.org/suds/
[129] Boomi Inc. (2008) appengine-rest-server: REST server for google app engine applications. [Online]. Available: http://code.google.com/p/appengine-rest-server/
[130] Aaron Swartz et al. (2012) Welcome to web.py. [Online]. Available:
http:
//webpy.org/
[131] Armin Ronacher et al. (2012) Flask - web development, one drop at a time.
Pocoo. [Online]. Available: http://flask.pocoo.org/
[132] Marcel Hellkamp et al. (2012) Bottle: Python web framework. [Online]. Available:
http://bottlepy.org/docs/dev/
[133] Yahoo! Inc. (2012) Make yahoo! web service rest calls with python. Retrieved on
4/24/2012. [Online]. Available: http://developer.yahoo.com/python/python-rest.
html
[134] Genivia Inc. (2012) The gSOAP toolkit for SOAP web services and XML-based applications. [Online]. Available: http://www.cs.fsu.edu/~engelen/soap.html
[135] Apache Software Foundation. (2012) Web services project @ Apache. [Online].
Available: http://ws.apache.org/soap/
[136] J. King and J. Kawash, “A real-time XML protocol for bridging virtual
communities,” International Journal of Networking and Virtual Organisations,
vol. 9, no. 3, pp. 248–264, September 2011. [Online]. Available:
http:
//dx.doi.org/10.1504/IJNVO.2011.042482
[137] W. W. W. Consortium. (2013) Web storage. [Online]. Available:
http:
//dev.w3.org/html5/webstorage/#disk-space
[138] G. Keizer. (2009) Microsoft cites ’click fatigue’ for windows 7 security change. Computerworld. [Online]. Available: http://www.computerworld.com/s/article/9127458/Microsoft_cites_click_fatigue_for_Windows_7_security_change?taxonomyId=17&pageNumber=2
[139] World Wide Web Consortium. (2009) Efficient XML interchange evaluation.
[Online]. Available: http://www.w3.org/TR/2009/WD-exi-evaluation-20090407/
[140] World Wide Web Consortium (W3C). (2001) W3C Semantic Web Activity.
[Online]. Available: http://www.w3.org/2001/sw/