SOSP 2015 slides

Transcription

SOSP 2015 slides
Vuvuzela
a scalable private messaging system
David Lazar
Jelle van den Hooff, Matei Zaharia, Nickolai Zeldovich
Motivation
Alice
Bob (Oncologist)
Encryption
Z28gUGF0cmlvdHMhCg
c2VhaGF3a3Mgc3Vjawo
Alice
Bob (Oncologist)
Problem: metadata
Z28gUGF0cmlvdHMhCg
Ex-boyfriend
Pfizer
Lawyer
c2VhaGF3a3Mgc3Vjawo
Alice
Bob (Oncologist)
Hospital Lawyer
Snowden
NY Times
AA
Guardian
White House
Goal: hide metadata
Vuvuzela
Ex-boyfriend
Pfizer
Lawyer
AA
Alice
Bob (Oncologist)
Hospital Lawyer
Snowden
NY Times
Guardian
White House
Goal: hide metadata
Vuvuzela
Ex-boyfriend
Pfizer
Lawyer
AA
Alice
Bob (Oncologist)
Hospital Lawyer
Snowden
NY Times
Guardian
White House
Goal: scalability
Vuvuzela
Ex-boyfriend
Pfizer
Lawyer
AA
Alice
Bob (Oncologist)
Hospital Lawyer
Snowden
NY Times
Guardian
White House
Tor is scalable
Alice
Bob
Tor network
Tor is insecure
Alice
Bob
Tor network
Tor is insecure
Low-Cost Traffic Analysis of Tor
Steven J. Murdoch and George Danezis
University of Cambridge, Computer Laboratory
15 JJ Thomson Avenue, Cambridge CB3 0FD
United Kingdom
{Steven.Murdoch,George.Danezis}@cl.cam.ac.uk
Users Get Routed:
Traffic Correlation on Tor by Realistic Adversaries
Abstract
Other systems, based on the idea of a mix, were developed to Aaron
carry low
latency traffic.
ISDNWacek
mixes [33]
1
2
Johnson
Chris
Rob Jansen1
Micah Sherr2
Paul Syverson1
Tor is the second generation Onion Router, supporting
propose a design that allows phone conversations to be
1 web-mixes [6] follow the same design pat2
the anonymous transport of TCP streams over the Interanonymised, and
U.S. Naval Research Laboratory, Washington DC
Georgetown University, Washington DC
net. Its low latency makes it very suitable for common
terns to{aaron.m.johnson,
anonymise web traffic.
A service paul.syverson}@nrl.navy.mil
based on these
rob.g.jansen,
{cwacek, msherr}@cs.georgetown.edu
tasks, such as web browsing, but insecure against trafficideas, the Java Anon Proxy (JAP)1 has been implemented
and is running at the University of Dresden. These apanalysis attacks by a global passive adversary. We present
proaches
work in a synchronous fashion, which is not well
new traffic-analysis techniques that allow adversaries with
ABSTRACT
The traffic correlation problem in Tor has seen much attention
adapted for the asynchronous nature of widely deployed
only a partial view of the network to infer which nodes are
in the literature. Prior Tor security analyses often consider entropy
We present the first analysis of the popular Tor anonymity network
or similar statistical measures as metrics of the security provided
TCP/IPthat
networks
being used to relay the anonymous streams and therefore
Circuit
Fingerprinting
Attacks:
indicates[8].
the security of typical users
against reasonably realisby the system at a static point in time. In addition, while prior
greatly reduce the anonymity provided by Tor. Furthermore,
The
Onion
Routing
project
has
been
working
on
streamtic
adversaries
in
the
Tor
network
or
in
the
underlying
Internet.
Our
Passive Deanonymization
of Tor
Hiddento compromise
Services metrics of security may provide useful information about overall
we show that otherwise unrelated streams can be linked
show that
Tor users are faranonymous
more susceptible
level, results
low-latency,
high-bandwidth
communiusage, they typically do not tell users how secure a type of behavindicated
by prior
Specific
of the paper
back to the same initiator. Our attack is feasible for the
cationsthan
[35].
Their
latestwork.
design
and contributions
implementation,
ior is. Further, similar previous work has thus far only considered
include
(1)
a
model
of
various
typical
kinds
of
users,
(2)
an
adveradversary anticipated by the Tor designers. Our theoretiTor [18], has many
attractive features,†including forward
seadversaries†that control either a subset of the members of the Tor
†
‡§†⇤
‡
model that ,includes
Tor
network
relays,Dacier
autonomous
systems
Albert Kwon
AlSabah
Lazar
, Marc
, and
Srinivas network,
Devadas
cal attacks are backed up by experiments performed
on the , Mashael
curity sary
and
support for David
anonymous
servers.
These
features,
a single autonomous system (AS), or a single Internet ex(ASes), Internet exchange points (IXPs), and groups of IXPs drawn
deployed, albeit experimental, Tor network. Our techniques
and
its
ease
of
use,
have
already
made
it
very
popular,
and
change
point
(IXP). These analyses have missed important charfrom empirical
study, (3) metrics
that indicate how secure users are
† Massachusetts Institute
of Technology,
{kwonal,lazard,devadas}@mit.edu
acteristics of the network, such as that a single organization often
should also be applicable to any low latency anonymous
a testing
network,
available
for
public
use, already
has 50model to
over
a
period
of
time,
(4)
the
most
accurate
topological
controls several geographically diverse ASes or IXPs. That organiComputing
Research
network. These attacks highlight the relationship between ‡ Qatar
nodes
acting
as onion
routers
(as of
November
2004).
date
of ASes
and IXPs
as Institute,
they
relate
[email protected]
Tor usage
and network conzation may have malicious intent or undergo coercion, threatening
§figuration,
the field of traffic-analysis and more traditional computer
(5)
a
novel
realistic
Tor
path
simulator
(TorPS),
and
(6)
Qatar
University,
[email protected]
Tor aims to protect the anonymity of its users from nonusers of all network components under its control.
of security
all the
above. To
show
security issues, such as covert channel analysis. Our reglobalanalyses
adversaries.
Thismaking
meansuse
thatofthe
adversary
has
thethat our
Given the severity of the traffic correlation problem and its seapproach
is
useful
to
explore
alternatives
and
not
just
Tor
as
cursearch also highlights that the inability to directly observe
ability to observe and control some part of the network, but
curity
implications, we develop an analysis framework for evaluatrently
deployed, we
analyze
published
alternative
pathservices
senetwork links does not prevent an attacker
from
performing
This
paper
sheds light on
weaknesses
in also
As aa result,
many
are
accessi-of various user behaviors on the live Tor network
notcrucial
its lection
totality.
Similarly,
the adversary
isTor.
assumed
tosensitive
beancaingonly
the security
algorithm,
Congestion-Aware
We create
empirical
traffic-analysis: the adversary can use design
the anonymising
of hiddennetservices pable
that allow
us to break
the
ble
through
Tor.By Prominent
examplesand
include
human
show
how
to concretely apply this framework by performing a
of
controlling
some
fraction
of
Tor
nodes.
making
model of Tor congestion, identify novel attack vectors, and show
comprehensive
evaluation of the security of the Tor network [41]
work as an oracle to infer the traffic load
on remote
nodes service
anonymity
of hidden
clients
and
operators
pasrights
and
whistleblowing
organizations
such
as
Wikthese
assumptions,
the
designers
of
Tor
believe
it
is
safe
to
that it too is more vulnerable than previously indicated.
against the
threat of complete deanonymization. To enable such an
thatonly
theminimal
circuits,mixing
paths of the
ileaks
andcells
Globalleaks,
tools for anonymous
messagin order to perform traffic-analysis. sively. In particular, we show
employ
stream
that are reanalysis,
we develop a detailed model of a network adversary that
established through the Torlayed,
network,
used lowering
to commuing such
as TorChat
Bitmessage, and
black markets
therefore
the
latency
overhead
of theand
comCategories
andbeSubject
includes
(i) many
the largest and most accurate system for AS path innicate with hidden servicesmunication.
exhibit
a very different
like Descriptors
Silkroad and Black Market Reloaded.
Even
ference
yet
applied
to Tor and (ii) a thorough analysis of the threat
C.2.0 [Computer-Communication
Networks]:
General—Secuhavior compared to a general This
circuit.
Weofpropose
two with
non-hidden
services,
1 Introduction
choice
threat model,
its limitation
of thelike
ad-Facebook andofDuckDuckGo,
Internet exchange points and IXP coalitions. We also develop
rity
and
protection
attacks, under two slightly versaries’
different threat
models,
that a subject
recently
have started in
providing
hidden versions
of theirthat inform this analysis, considering the network
realistic metrics
powers,
has been
of controversy
the
could identify a hidden service client or operator using
websites to provide stronger anonymity guarantees.
topology
as it evolves over time, for example, as new relays are
anonymity community, yet most of the discussion has foAnonymous communication networks were first introthese weaknesses. We found that
we can identify the
Keywords
introduced
and others go offline.
cused on assessing whether these restrictions
of attackers’
duced by David Chaum in his seminal paper [10] describing
That said, over
the past few years, hidden
services
users’ involvement with hidden services
withmetrics;
more than
Our analysis
shows that 80% of all types of users may be deAnonymity;
onion
routing
capabilities are ‘realistic’ or not.have
We witnessed
leave thisvarious
discussion
the mix as a fundamental building block for anonymity. A
active attacks in the
wild [12,by
28],
anonymized
a relatively moderate Tor-relay adversary within six
98% true positive rate and less than 0.1% false positive
aside and instead show that traffic-analysis
can be
mix acts as a store-and-forward relay that hides the correresulting in attacks
several takedowns
[28]. To months.
examineOur
the results
se- also show that against a single AS adversary
rate with the first attack, and 99% true positive rate and
successfully mounted against Tor even within this very re-
Alice
Bob
Tor network
Related work
Pond
Scalability
Tor
Riposte [Oakland 2015]
Dissent [OSDI 2012]
Privacy
Contribution
Tor
Pond
Scalability
Vuvuzela
Riposte [Oakland 2015]
Dissent [OSDI 2012]
Privacy
Contribution
•
Vuvuzela: the first private messaging system that hides
metadata from powerful adversaries for millions of users
•
Vuvuzela scales linearly with the number of users
•
Differential privacy for millions of messages per user
for one million users
•
37s end-to-end message latency
•
60,000 messages / second throughput
•
Good match for private text-based messaging
Vuvuzela overview
•
Handful of servers arranged in a chain
•
Users send/receive messages through
the first server
Alice
Bob
Charlie
Server 1
•
Server 2
Server 3
Last server decides who gets
what messages and sends
them back down the chain
Vuvuzela’s two protocols
Dialing protocol:
Alice
Initiate conversation session between two users
Bob
Conversation protocol:
Charlie
Exchange messages between two users
Threat model
•
All but one server are compromised
•
Adversary is active (can knock users
offline, tamper with messages, etc)
•
All users might be malicious
(besides you and your friends)
•
PKI: users know each other’s keys
Alice
Bob
Charlie
Metadata privacy
Scenario 1
Alice Bob Charlie
Vuvuzela
Scenario 2
Scenario 3
Alice Bob Charlie
Alice Bob Charlie
Vuvuzela
Vuvuzela
Metadata privacy
Scenario 1
Alice Bob Charlie
Scenario 2
Scenario 3
Alice Bob Charlie
Alice Bob Charlie
Vuvuzela
Vuvuzela
Vuvuzela
?
?
?
47D1FC9A…
traffic analysis
hacked servers
Approach to scalable privacy
•
Use efficient cryptography to encrypt as much
metadata as possible.
•
Add noise to metadata that we can’t “encrypt.”
•
Use differential privacy to reason about how much
privacy the noise gives us.
Dead drops prevent users
from talking directly
Alice
Bob
Charlie
Dead drop: a place to
leave a message that
another user can pick up
Talking via dead drops
Alice
Bob
Charlie
De
ad
Me
dro
ssa
p: z
ge:
zp8
“Hi
ns0
Bo
b! nrxt3
Ho
w’s g9efb
it g
6c
oin
g?”
Dead
Mess drop: zzp
8ns0
age:
nrxt3
“”
g9ef
b6c
Conversation protocol
Alice
Bob
Charlie
De
ad
Me
dro
ssa
p: z
ge:
zp8
“Hi
ns0
Bo
b! nrxt3
Ho
w’s g9efb
it g
6c
oin
g?”
Dead
Mess drop: zzp
8ns0
age:
nrxt3
“”
g9ef
b6c
Round 1
Conversation protocol
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: “”
Alice
Bob
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
Dead drop
”
!
s
k
n
a
h
t
,
d
o
o
g
m
’
I
“
:
e
g
a
s
s
Me
Charlie
Round 2
Conversation protocol
Alice
Bob
Charlie
Round 3
Conversation protocol
Alice
Bob
Charlie
Round 4
Messages are encrypted
Alice
Bob
Charlie
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: W
CzdjL5wBNp
JUtt9tE7…
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
j…
Dead drop
E
g
6
P
u
4
W
q
8
k
V
s
W
Q
1
T
j
y
:
e
g
Messa
Idle clients send cover traffic
Alice
Bob
Charlie
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: W
CzdjL5wBNp
JUtt9tE7…
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
j…
Dead drop
E
g
6
P
u
4
W
q
8
k
V
s
W
Q
1
T
j
y
:
e
g
Messa
Idle clients send cover traffic
Alice
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: W
CzdjL5wBNp
JUtt9tE7…
Bob
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
j…
Dead drop
E
g
6
P
u
4
W
q
8
k
V
s
W
Q
1
T
j
y
:
e
g
Messa
Charlie
h
C
r
7
U
R
E
r
v
T
T
u
O
Z
6
0
y
u
:
p
o
r
d
d
Dea
…
0
s
O
K
7
2
6
B
e
r
5
H
G
D
p
X
w
J
:
e
g
a
s
Mes
Dead drops give privacy
Alice
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: W
CzdjL5wBNp
JUtt9tE7…
Bob
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
j…
Dead drop
E
g
6
P
u
4
W
q
8
k
V
s
W
Q
1
T
j
y
:
e
g
Messa
Charlie
h
C
r
7
U
R
E
r
v
T
T
u
O
Z
6
0
y
u
:
p
o
r
d
d
Dea
…
0
s
O
K
7
2
6
B
e
r
5
H
G
D
p
X
w
J
:
e
g
a
s
Mes
Dead drops give privacy
Alice
Dead drop:
Fsdd5vPML
H3KARqE2a
Message: W
CzdjL5wBNp
JUtt9tE7…
Bob
a
2
E
q
R
A
K
3
H
L
M
P
v
5
d
d
s
F
:
j…
Dead drop
E
g
6
P
u
4
W
q
8
k
V
s
W
Q
1
T
j
y
:
e
g
Messa
Charlie
h
C
r
7
U
R
E
r
v
T
T
u
O
Z
6
0
y
u
:
p
o
r
d
d
Dea
…
0
s
O
K
7
2
6
B
e
r
5
H
G
D
p
X
w
J
:
e
g
a
s
Mes
Mixnet hides origin of
messages
Alice
Bob
Charlie
Mixnet hides origin of
messages
A
Alice
B
Bob
C
Charlie
Mixnet hides origin of
messages
Alice
Bob
B
C
A
Charlie
Mixnet hides origin of
messages
Alice
Bob
B
C
A
Charlie
Are we done yet?
Alice
Bob
Charlie
2
1
Are we done yet?
Alice
2
Bob
Charlie
1
Challenge: dead drop counts
reveal access patterns
Demo!
Let’s see why access counts are a problem.
Solution: Each server adds
noise
Alice
Bob
Charlie
Fake exchanges (noise)
1
2
1
1
2
What is noise?
Fake singles
Fake doubles
Dead drop: RY9VjW4XROtTcbnZPaJ
Message: Bzizd2loCIeXdIfHU33mds…
Dead drop: 3nPki8GbZWfXRyw61wk
Message: nE7yvLJLeiCvcD1Cu62…
Dead drop: t53c81TtFdmBCzFLQ7Q
Message: rCCnMCttJ8C8JMthLxN8…
Dead drop: pavnHQmuegSmvXz6Y5
Message: IuA94shFx7okpZdBacjBg…
Dead drop: 3nPki8GbZWfXRyw61wk
Message: 4QjdRfoB7GoEEb0vtMjf…
Dead drop: kt2JnceRb7ieU3M1k5Oj
Message: mb4ZgDABTLTtm9rUZzV…
Dead drop: kt2JnceRb7ieU3M1k5Oj
Message: wYNxuyoOiP9Ffjr4LKtv38…
Dead drop: LWnyE3AB2TTmUcCGL
Message: k1bVsoTVlJQTEy92Vxd1o…
Dead drop: LWnyE3AB2TTmUcCGL
Message: mTLa2cdkKgzADt0oJm8s…
Demo!
Vuvuzela with noise is effective!
Formalizing privacy guarantee
Pr[ i | Alice talked to Bob]
Alice Bob
Vuvuzela
≈𝜺 × Pr[ i | not Alice talked to Bob]
Alice Bob
Vuvuzela
(𝜀,𝛿) differential privacy, simplified
Pr[ i | Alice talked to Bob]
Alice Bob
Vuvuzela
≤ 𝜺 × Pr[ i | not Alice talked to Bob]
Alice Bob
Vuvuzela
Noise achieves DP
•
Let d be the number of dead drops with two
accesses in a single round.
•
To make d differentially private, we need to make
these distributions very close (indistinguishable):
Pr[ d=x | not Alice talked to Bob]
Probability
Probability
Pr[ d=x | Alice talked to Bob]
0 1
Dead drops with two messages
250
Dead drops with two messages
Generating this distribution
Pr[ d=x | not Alice talked to Bob]
Probability
Pr[ d=x | Alice talked to Bob]
250
Dead drops with two messages
Constraints:
•
Can’t have negative dead drops
•
Distributions have to be close enough for
differential privacy
Generating this distribution
Pr[ d=x | Alice talked to Bob]
Pr[ d=x | not Alice talked to Bob]
Probability
Average noise is
hundreds of fake
messages
250
Dead drops with two messages
Constraints:
•
Can’t have negative dead drops
•
Distributions have to be close enough for
differential privacy
Privacy degrades every round
•
Each round leaks metadata
•
We want differential privacy after sending many messages
•
This means adding more noise to support more messages.
Vuvuzela’s approach to noise
•
More noise means privacy for more messages.
•
Add as much noise as possible, while still keeping the
system practical.
•
Use differential privacy to compute how much privacy
users get.
•
•
Using composition theorem [Dwork & Roth 2014]
We picked: 300,000 fake singles and 300,000 fake
doubles per server per round.
Privacy with 300,000 noise
Pr[ i | Alice talked to Bob]
≤ 𝜺 × Pr[ i | not Alice talked to Bob]
9
8
7
6
� 5
4
3
2
1
10,000
100,000
1M
2M
Messages Alice wants to keep private
Eve is very evil
•
Alice sees previous graph and sends Eve many
messages through Vuvuzela.
•
Will NSA arrest Alice for talking to Eve?
•
•
Probably: using Vuvuzela is already suspicious
Will a fair jury convict Alice of talking to Eve?
•
Unlikely: Vuvuzela observations are not damning
evidence!
Alice gets a fair trial
•
Jury is already 50% certain Alice did the crime
(NSA is intimidating, other evidence, etc)
•
Beyond unreasonable doubt = 90% certainty
Alice is innocent for millions
of messages
90
Jury certainty %
89
88
86
83
80
75
67
50
10,000
100,000
1M
2M
Messages Alice wants to keep private
Implementation
•
3,000 lines of Go
•
Untrusted entry server manages user connections
•
Entry server notifies clients when a new round
starts
•
Available soon on Github:
•
github.com/davidlazar/vuvuzela
Evaluation
•
Can Vuvuzela servers support a large number of
users and messages?
•
Does Vuvuzela provide acceptable performance?
Asymptotic performance
•
Noise is independent of number of users.
•
Performance is linear in number of users
•
Bandwidth, latency, CPU
Setup
Entry
server
Server 1
Server 2
36 cores per VM
10 Gbps links
Client VMs
Server 3
Acceptable end-to-end
latency for text messaging
End-to-end latency for
conversation messages
60 s
50 s
40 s
30 s
20 s
10 s
0s
10
500,000
1M
1.5M
Number of online users
2M
Performance bottlenecks
•
CPU bound
•
•
Dominated by mixnet operations
High bandwidth cost
•
166 MB/s for servers, 12 KB/s for clients
•
Can lower bandwidth by increasing latency
linearly
Conclusion
• Problem:
hide metadata in a secure and scalable way.
• Approach:
• Encrypt
• Add
as much metadata as possible.
noise to obscure remaining metadata.
• Formalized
• Vuvuzela:
• Scales
privacy guarantee with differential privacy
scalable private messaging without metadata
linearly with number of users
• Privacy
for millions of messages per user → 37s latency
• 60,000
messages / second of throughput
What happens after 2M?
•
Privacy for lifetime of messages is unrealistic under this configuration
•
User’s should change their expectation to just expect privacy for a subset
of messages
•
•
Example: privacy just for important messages.
•
Example: privacy just for recent messages.
User does not need to specify which subset of messages to keep private
•
Vuvuzela’s guarantee holds for any (small) subset of messages that
the adversary cares about