4 The Economics of Peer-to-Peer File Sharing
Imagine you need to distribute a software patch to 10 million users. What is an efficient way to do so? As a user of a peer-to-peer file-sharing system, what is the optimal strategy to maximize your download speed? What are the incentives at play in file-sharing systems like Gnutella or BitTorrent?
For many applications that require the distribution of files to a large number of users, peer-to-peer (P2P) file-sharing networks are an attractive alternative to server-based solutions. If
the community of file sharers cooperates appropriately, high download rates can be achieved
at virtually no cost to the injector of the content. However, different P2P file-sharing
protocols can give rise to very different P2P file-sharing games, and whether users of a
P2P file-sharing system have an incentive to cooperate largely depends on the design of
the system. For this reason, game theory has proven particularly useful for the analysis of
existing P2P file-sharing protocols as well as for the design of new ones.
We begin this chapter with a brief introduction to the P2P file-sharing paradigm. In
Section 4.2, we discuss the rise and fall of the Gnutella network, and explain why many file-sharing networks suffer from free riding. We then focus on BitTorrent, the most successful file-sharing network, with more than 150 million active users per month. In Section 4.3, we describe the BitTorrent protocol in detail, and explain how BitTorrent changed the file-sharing game and improved incentives for cooperation. In Section 4.4, we explore a number of different attacks on BitTorrent, which we can think of as strategies an individual user can use to increase his own performance. Finally, in Section 4.5, we conclude with a discussion of user behavior observed in practice, private BitTorrent communities, altruism in BitTorrent, and a brief history of P2P file sharing.
4.1 Introduction to P2P File Sharing
Some of the most popular internet-based services relate to media content, including downloading MP3s from iTunes, buying e-books from Amazon, and streaming videos over YouTube
or Netflix. These services are based on the client-server model, which means that the required tasks are clearly separated: the service provider is associated with the server machines, and the user’s device is the client machine.
Because millions of MP3s and videos are downloaded or streamed every day, these services
require thousands of servers around the world, and also cache content on machines close
to users in order to reduce latency. All in all, it is costly to deliver lots of content in this
client-server paradigm and only a few companies have the scale to compete.
In the late 1990s, a very different information-sharing paradigm emerged, namely peer-to-peer (P2P) file-sharing networks, also known as P2P file-sharing systems. In a P2P network,
the separation between servers and clients is removed and each computer acts as both a
server and a client, and is simply called a peer. With the appropriate protocol in place,
users can exchange files with each other and this can take place with little or no centralized
system infrastructure or control.
This decentralization gives P2P systems a number of advantages over server-based systems. Most importantly, P2P systems scale very cheaply. In particular, large files can be distributed to a large number of users at very low cost to the initial uploader of those files. Additionally, P2P systems are very robust because they avoid a single point of failure; achieving a similar degree of robustness with a traditional client-server model (by using thousands of servers) is significantly more costly. Finally, P2P systems don’t require their users to reveal their real name or register with a credit card, and thus they provide a certain degree of anonymity and privacy for their users, in contrast to many media services like iTunes, Netflix, etc. Of course, these advantages also come at a certain cost. For example, the injector of the content has essentially no control over who will download the files, how long the files will remain available in the P2P network, or at which download speed. Obviously, a P2P network is not suitable for all applications.
P2P file-sharing networks sometimes have a bad reputation because the exchanged content can contain copyrighted material, and exchanging such material with other users is illegal in most countries. While it is undeniable that the rise of file-sharing networks was primarily due to the availability of popular copyrighted material in these networks, there are also many legal uses of P2P file-sharing networks.1
For example, free software (e.g., Linux distributions) is made available via P2P file-sharing
networks. The gaming company Blizzard Entertainment distributes the game installer package as well as update patches for World of Warcraft via the BitTorrent P2P file-sharing
network. Given that such files can easily be 500 MB in size, and considering the millions
of subscribers of World of Warcraft (many of whom will update simultaneously), it is easy
to see how using a P2P file-sharing system is significantly cheaper for Blizzard compared to
using a server-based distribution approach.
Blizzard has also used BitTorrent to distribute trailers for their games Starcraft II and
Diablo III, and Internet TV services such as Zattoo are streaming video data to their users
via P2P networks. When used for content distribution, the originator incurs minimal cost
for injecting the content: the cost of an initial upload. From this point on, users can
download the content at high speeds, in particular when millions of users are downloading
the content at the same time and are also sharing with each other.
Figure 4.1 displays the cumulative distribution function (CDF) of average download speeds in the P2P file-sharing network BitTorrent, based on measurements of more than 500,000 peers in 2009 (note the logarithmic scale on the x-axis). We see that the average download speeds are very high. For example, the median download speed in the public PirateBay community was 333kbps, the average download speed was around 1Mbps, and the fastest 10% obtained download speeds of more than 2Mbps. In the private communities (TVTorrent, TorrentLeech, and PolishTracker), the average download speeds ranged between 3.6Mbps and 8.6Mbps (high enough to stream HD movies). Thus, in contrast to server-based solutions, P2P systems can provide many users with very high download rates without using expensive infrastructure. We briefly return to the variation in performance between the public and private BitTorrent communities in Section 4.5.2.

Figure 4.1: The CDF of average download speed for five different BitTorrent communities, based on measurements of over 500,000 peers in 2009 (Meulpolder et al., 2010).

1 We discourage readers from downloading copyrighted material via P2P file-sharing networks. Downloading copyrighted material is illegal in many countries.

Copyright © 2013 D.C. Parkes & S. Seuken. Do not distribute without permission.
4.1.1 P2P File Sharing in the Language of Game Theory
Many different P2P file-sharing networks have emerged, each with properties that vary
according to the following four factors:
1. Protocol: A P2P file-sharing network requires a network protocol such as Gnutella or
BitTorrent. The protocol refers to the messages and actions that are supported by the
system. In the language of game theory, the protocol defines the rules of the game.
2. Reference client: One way to introduce a new protocol is to develop a file-sharing client, i.e., a software application compatible with the protocol. This reference client implements a default behavior in the file-sharing game. In the language of game theory, the reference client implements a default strategy.
3. Other clients: It is hard to enforce the usage of a particular reference client. Thus,
alternative clients with different behaviors, but still compatible with the protocol, can
be developed. In the language of game theory, each new client implements a different
strategy.
4. User behavior: Finally, end users decide which network to join and which client application to use on their machine. Users can also configure the client application,
allowing it to access certain files or not, limit the bandwidth usage, and so forth.
Thus, the ultimate behavior of a user in the file-sharing game is determined both by
the design of the client application and a user’s decisions.
Whereas in the discussion of the client-server architecture we used the term client to refer to the user’s machine, in the context of P2P networks the term client refers to the software application that runs on a peer’s machine. Note: For many P2P networks, the same term is used interchangeably to refer to the network, the protocol, and a client. For example, we refer to the BitTorrent protocol, the BitTorrent network, and different BitTorrent clients; in addition, the reference client application is also called BitTorrent.
In evaluating a P2P file-sharing network, we need to consider all these factors: the protocol
that defines the rules of the game; given a particular protocol, users will choose a client that
implements a particular strategy (for example, the client that maximizes the individual
user’s performance); and given the choice of client, a user will configure the client according
to his preferences. The distribution of client strategies that are adopted in the network and
the configurations that users choose ultimately determine the outcome of the underlying
P2P file-sharing game.
When comparing different file-sharing systems, it is useful to consider the following three
properties:
• Social Welfare: Social welfare can be defined as the average download speed of participating users. Thus, a social welfare-maximizing system minimizes average download
times.
• Incentive Properties: The social welfare of a P2P file-sharing network depends on the willingness of users to share with other users. Networks vary according to the degree to which they align incentives with sharing, so that it is in a user’s self-interest to allow other peers to download from him.
• Fairness: In order to sustain P2P file-sharing communities it can be useful to achieve
a distribution of upload and download resources that is seen to be fair by most users;
e.g., one possible criterion for fairness is that a user’s download speed is roughly
proportional to his upload speed.
Other terms that are used in place of the social welfare of a P2P file-sharing network are
the efficiency, or simply performance, of the network.
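To make the first and third properties concrete, the following Python sketch computes social welfare as the average download speed and, as one possible fairness measure, each user's ratio of download speed to upload speed. The peer names and speeds are made up for illustration, and summarizing fairness as the spread between the largest and smallest ratio is our own assumption, not a standard metric.

```python
from statistics import mean

def social_welfare(download_kbps):
    """Social welfare as defined above: the average download speed of all users."""
    return mean(download_kbps.values())

def fairness_ratios(download_kbps, upload_kbps):
    """Download/upload ratio per user; users who upload nothing are skipped."""
    return {
        peer: download_kbps[peer] / upload_kbps[peer]
        for peer in download_kbps
        if upload_kbps.get(peer, 0) > 0
    }

def fairness_spread(ratios):
    """Hypothetical summary: 1.0 means download is exactly proportional to upload for everyone."""
    return max(ratios.values()) / min(ratios.values())

# Hypothetical measurements (kbps) for three peers.
down = {"alice": 800, "bob": 400, "carol": 1200}
up = {"alice": 200, "bob": 100, "carol": 300}

print(social_welfare(down))                        # 800
print(fairness_spread(fairness_ratios(down, up)))  # 1.0
```

Here every peer downloads four times as fast as it uploads, so under this particular criterion the allocation is perfectly fair.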
4.1.2 Napster: How Everything Began
The rise of P2P file-sharing networks started in 1999 with the release of the Napster P2P
file-sharing client, which became widely popular in early 2000, and reached its peak in
February 2001 with 26.5 million users world-wide. Before Napster, users had already shared
files via other networks like IRC and USENET, but Napster was unique in providing a user-friendly interface, adopting a centralized directory to make it easy to search for content,
and in specializing in music files. Users who wanted to share MP3s could connect to the
Napster server and add descriptions of their files to Napster’s database. A user wanting to
download files could then easily query the server for content. The actual exchange of files
happened directly between the clients, i.e., the peers, thus the term “peer-to-peer.” Due to
legal difficulties, Napster was shut down in July 2001, but a number of other networks had
already emerged and quickly took its place.
In fact, around the turn of the century, with broadband Internet access becoming available
around the world, file sharing became more and more popular. In 2002, P2P file sharing was
responsible for approximately 50% of world-wide Internet traffic. Between 2000 and 2010,
many P2P file-sharing protocols were introduced. In the first few years after Napster’s shut
down, the three networks FastTrack, eDonkey, and Gnutella were responsible for the vast
majority of P2P traffic world-wide.
Users moved quickly from one network to the other as better alternatives were introduced.
In contrast to Napster, where users could only share music, FastTrack, eDonkey and Gnutella
also allowed users to share movies, which over time became the primary content shared in
these networks. Because these later P2P networks were more decentralized than Napster,
they were essentially impossible to shut down, despite many different lawsuits. For a more
detailed history of P2P file-sharing, please see Section 4.5.4.
4.2 Free riding in P2P Networks
The Gnutella network, introduced in 2000, was the first truly decentralized P2P file-sharing
network. To join the network, a peer connects to one of several peers that are known and
almost always online, but do not generally share files. Rather, these peers share a list of
IP and port addresses of other peers. Communication between peers then proceeds via
broadcast messages, sent to the list of peers known by the peer. This can include the
re-broadcast of messages received from another peer.
To find a desired file, a peer uses a query message to describe the desired content. Such a
message is re-broadcast peer-to-peer until a peer with the desired content is found, or until
some maximum number of re-broadcasts has occurred. A peer with the desired file replies
with a query response, a message that contains the peer’s IP and port address, a unique
client ID, as well as other information necessary to download the file. These query response
messages are propagated backwards along the path that the original query message took,
until reaching the original requester who can then contact the peer who has the file to start
the download.
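The query flooding just described can be sketched in Python as a breadth-first search over a toy overlay network. The overlay, the file sets, and the `max_hops` parameter (standing in for the maximum number of re-broadcasts) are hypothetical. Each peer forwards a query only the first time it sees it and remembers which neighbor it came from, which is what lets a query response travel back along the original path. Real Gnutella clients exchange binary messages over TCP connections, which this sketch does not attempt to model.

```python
from collections import deque

def flood_query(neighbors, shared_files, origin, wanted, max_hops):
    """Flood a query from `origin`; return (responder, response path) pairs.

    neighbors:    peer -> list of neighboring peers (hypothetical overlay)
    shared_files: peer -> set of file names that peer shares
    max_hops:     maximum number of re-broadcasts before the query dies
    """
    came_from = {origin: None}   # neighbor each peer first heard the query from
    queue = deque([(origin, 0)])
    hits = []
    while queue:
        peer, hops = queue.popleft()
        if peer != origin and wanted in shared_files.get(peer, set()):
            # The query response retraces the query's path back to the origin.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = came_from[node]
            hits.append((peer, path))
        if hops == max_hops:
            continue             # hop limit reached: do not re-broadcast
        for nb in neighbors.get(peer, []):
            if nb not in came_from:   # duplicate copies of the query are dropped
                came_from[nb] = peer
                queue.append((nb, hops + 1))
    return hits

overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"patch.bin"}}
print(flood_query(overlay, files, "A", "patch.bin", max_hops=3))
# [('D', ['D', 'C', 'B', 'A'])]
print(flood_query(overlay, files, "A", "patch.bin", max_hops=2))
# []
```

With a hop limit of 2, the query dies at peer C and the file held by D is never found, which illustrates why flooding trades search coverage against network load.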
4.2.1 Free Riding on Gnutella
In contrast to Napster, Gnutella has no central server and no statistics about individual user
behavior are maintained, in either a centralized or a decentralized way. Thus, for two peers
interacting with each other, every interaction looks just like any other one: a simultaneous-move game with anonymous players. This is in contrast to the way users shared files before
the existence of P2P file sharing networks. For example, users on bulletin boards often knew
each other from previous interactions, forming a close-knit community.
Unfortunately, the design features of Gnutella very quickly led to a problem: the majority
of users were free riding, i.e., consuming resources (downloading files) without contributing
back to the community (uploading files). In Figure 4.2, we present a simplified version of
the P2P File-Sharing Game to illustrate why free riding is beneficial in Gnutella.
In this game, each player can either share files with other users in the network, or free ride, i.e., only download files without ever uploading any files. The numbers in the payoff table are just illustrative, but are meant to convey the main features of file sharing in Gnutella: peers obtain positive utility from downloading a file (here 3), whether they share or not has no effect on their download experience, and users incur a small cost for uploading a file (here −1). In particular, when both agents share, then each agent receives payoff 3 from downloading and −1 from uploading, and thus a total payoff of 2.

                              Player 2
                         Share         Free Ride
Player 1   Share         2, 2          −1, 3
           Free Ride     3, −1         0, 0

Figure 4.2: The P2P File-Sharing Game: A Prisoner’s Dilemma.

Figure 4.3: Ordering of Gnutella peers by contribution (Adar and Huberman, 2000).
The cost for uploading may arise for many reasons, including: increasing the bandwidth
payments a user must make, precluding some other use for upload bandwidth such as Voice-over-IP, a disutility from leaving a computer on while files are uploaded (electricity, noise,
etc.), or concerns about the legal implications of uploading copyrighted material.
As long as 1) a user obtains positive value from receiving a file, 2) his actions do not
influence his download experience, and 3) he incurs a small cost for providing a file, the
game has the structure of the Prisoner’s Dilemma game (see Chapter 2). Thus, it is a
dominant strategy to free ride, and both players free riding is the only Nash equilibrium of
the game.
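The dominance argument can be verified mechanically. This Python sketch encodes the payoff table of Figure 4.2 and checks, by brute force over the four action profiles, which actions are strictly dominant and which profiles are pure-strategy Nash equilibria.

```python
ACTIONS = ("Share", "Free Ride")

# PAYOFFS[(action of player 1, action of player 2)] = (payoff 1, payoff 2)
PAYOFFS = {
    ("Share", "Share"): (2, 2),
    ("Share", "Free Ride"): (-1, 3),
    ("Free Ride", "Share"): (3, -1),
    ("Free Ride", "Free Ride"): (0, 0),
}

def payoff(player, own, other):
    """Payoff to `player` (0 or 1) when it plays `own` and the opponent plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]

def strictly_dominant(player, action):
    """True if `action` beats every alternative against every opponent action."""
    return all(
        payoff(player, action, other) > payoff(player, alt, other)
        for other in ACTIONS
        for alt in ACTIONS
        if alt != action
    )

def pure_nash_equilibria():
    """Profiles in which neither player gains from a unilateral deviation."""
    return [
        (a1, a2)
        for a1 in ACTIONS
        for a2 in ACTIONS
        if all(payoff(0, a1, a2) >= payoff(0, d, a2) for d in ACTIONS)
        and all(payoff(1, a2, a1) >= payoff(1, d, a1) for d in ACTIONS)
    ]

print([strictly_dominant(p, "Free Ride") for p in (0, 1)])  # [True, True]
print(pure_nash_equilibria())  # [('Free Ride', 'Free Ride')]
```

Changing the numbers while keeping conditions 1) to 3) intact leaves both results unchanged, which is the point of the argument above.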
Given this incentive structure, it is surprising that anyone ever shared any files in Gnutella
and that the network was relatively successful, at least in the beginning. After all, it was
one of the four most popular file-sharing networks in the years after Napster was shut down.
Some of the sharing activity can be explained by users who left the default settings of a software client in place. Indeed, many client applications were configured to automatically make downloaded files available for upload to other peers. Other Gnutella users
may have simply enjoyed sharing files with other people; i.e., the joy from uploading files
to others outweighed the costs they incurred.
Still, a study of Gnutella confirmed that 66% of the peers in 2000 shared no files with other
peers; see Figure 4.3. In fact, the top 1% of sharers provided 37% of the total files shared
and of the peers that shared files, the top 1% provided almost 47% of all query responses,
while approximately 63% never provided a query response. Thus, the contributions made
by peers on Gnutella were heavily skewed, and the large majority of peers were free riding.
In fact, the amount of free riding in Gnutella increased significantly over the years. A
study conducted in 2005 found that 85% of peers shared no files. The authors of the study
argue that the developers of the various Gnutella clients did not have an incentive to build
mechanisms into their software that would prevent free riding, because users would switch
to other client applications without such restrictions. Thus, free riding remained a problem
and there was a steady decline in performance and market share. In 2013, Gnutella was
responsible for less than 1% of global P2P traffic.
4.2.2 Kazaa and Participation Statistics
The developers of clients for other P2P file-sharing networks tried to fix the free-riding
problem, but initially with little success. For example, Kazaa, a popular client using the
FastTrack protocol, kept track of the uploads and downloads performed by a peer, thereby
measuring the participation level of the peer in the network. The client application shared
this information with other peers, and was designed so that a peer would give priority
to peers with high participation levels. However, because this information was stored
locally by peers and self-reported to others, it could easily be spoofed! Very quickly, other
programmers developed new client applications such as Kazaa Lite K++ and K-Lite, that
simply set the reported participation level to the maximum, thus circumventing this simple
incentive mechanism.
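The weakness of this approach is easy to see in a toy model. In the following Python sketch, which is our own simplification rather than Kazaa's actual logic, an uploader serves requesters in order of their self-reported participation level; the level formula and the cap `MAX_LEVEL` are hypothetical. A modified client that always reports the cap jumps to the front of the queue without uploading anything.

```python
from dataclasses import dataclass

MAX_LEVEL = 1000  # hypothetical upper bound on the reported participation level

@dataclass
class Peer:
    name: str
    uploaded_mb: float
    downloaded_mb: float
    spoofs: bool = False  # True for a modified client that lies about its level

    def reported_level(self) -> int:
        """Self-reported participation level; no other peer can verify it."""
        if self.spoofs:
            return MAX_LEVEL
        # Hypothetical formula: upload/download ratio scaled by 100, capped.
        ratio = self.uploaded_mb / max(self.downloaded_mb, 1.0)
        return min(MAX_LEVEL, round(100 * ratio))

def service_order(requesters):
    """The uploader gives priority to the highest reported participation level."""
    return sorted(requesters, key=lambda p: p.reported_level(), reverse=True)

peers = [
    Peer("honest_sharer", uploaded_mb=900, downloaded_mb=600),
    Peer("free_rider", uploaded_mb=0, downloaded_mb=600),
    Peer("spoofer", uploaded_mb=0, downloaded_mb=600, spoofs=True),
]
print([p.name for p in service_order(peers)])
# ['spoofer', 'honest_sharer', 'free_rider']
```

Any mechanism that relies only on information a peer reports about itself is vulnerable in the same way.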
4.2.3 Repeated Games in Gnutella?
So far, we have described the interactions between two file-sharing peers as a one-shot,
simultaneous-move game. However, in practice, most file-sharing users download many files,
possibly thousands, over their lifetime. Thus, we might think to model P2P file sharing as
a repeated game instead. Remember from Chapter 3 that in a repeated game, the same
simultaneous-move game is played by the same players over and over again, with every
player having perfect information about the history of actions in all previous periods.
In a repeated game, the observations of past actions allow for new strategies that do
not exist in one-shot games. In a two-player game, player 1 can now condition his actions
on the past actions of player 2, and vice versa. In the Prisoner’s Dilemma, for example,
player 1 could reward “good behavior” by player 2 and punish “bad behavior,” hoping that
this changes the incentives for player 2 in such a way that player 2 will always cooperate.
Under certain conditions it is indeed possible to sustain cooperation in a repeated Prisoner’s
Dilemma game, i.e., for both players to play (C,C) in every period, by threatening to punish
the other player should he not cooperate.
A particularly well-known strategy for playing an infinitely-repeated Prisoner’s Dilemma
game is the so-called “Tit for Tat” (TfT) strategy. We have already seen one version of
TfT in Chapter 3, but we will now present a slightly different version. In this version of
TfT, each player starts out cooperating. If the other player ever defects, then the player
punishes this in the next period by also defecting, and then goes back to cooperating in
the following period. It can be shown formally, with the techniques we have studied in
Chapter 3, that under certain conditions, this TfT strategy sustains cooperation in an
infinitely-repeated Prisoner’s Dilemma game. In fact, one can show that this strategy
constitutes a sub-game perfect Nash equilibrium (see Chapter 3 for details on these kinds of
folk theorem results). Given these results, we might hope that suitably-designed file-sharing
clients could implement strategies that lead to cooperative equilibria in the P2P file-sharing
game, providing agents with an incentive to share.
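The argument can be checked numerically. The following Python sketch plays the payoffs of Figure 4.2 against a Tit-for-Tat opponent and compares discounted payoffs (discount factor `delta`) for three strategies: always sharing, free riding once, and always free riding. We implement TfT in its common form (repeat the opponent's previous action), which coincides with the version above whenever the opponent returns to cooperating. The infinite sum is truncated after a fixed number of periods, and only these two deviations are examined, so this is an illustration rather than a proof of subgame perfection.

```python
# Payoffs from Figure 4.2, keyed by (my action, opponent's action).
# "S" = share (cooperate), "F" = free ride (defect).
PAYOFF = {("S", "S"): 2, ("S", "F"): -1, ("F", "S"): 3, ("F", "F"): 0}

def tit_for_tat(opponent_history):
    """Share in the first period, then repeat the opponent's previous action."""
    return opponent_history[-1] if opponent_history else "S"

def always_share(opponent_history):
    return "S"

def always_free_ride(opponent_history):
    return "F"

def free_ride_once(opponent_history):
    """Free ride in the first period only, then share forever."""
    return "S" if opponent_history else "F"

def discounted_payoff(strategy, delta, periods=2000):
    """Discounted payoff of `strategy` against Tit for Tat (sum truncated at `periods`)."""
    mine, theirs, total = [], [], 0.0
    for t in range(periods):
        a = strategy(theirs)    # our move, given TfT's past moves
        b = tit_for_tat(mine)   # TfT's move, given our past moves
        total += delta ** t * PAYOFF[(a, b)]
        mine.append(a)
        theirs.append(b)
    return total

for delta in (0.2, 0.5):
    print(delta, [round(discounted_payoff(s, delta), 3)
                  for s in (always_share, free_ride_once, always_free_ride)])
```

With these payoffs, always sharing yields 2/(1 − δ), always free riding yields 3, and free riding once yields 3 − δ + 2δ²/(1 − δ); both deviations stop paying off once δ ≥ 1/3. For δ = 0.2 free riding wins (2.5 versus 2.9 and 3.0), while for δ = 0.5 sharing earns 4.0 against 3.5 and 3.0. One way to read the rendezvous problem discussed next is that rarely meeting the same peer again acts like a small δ.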
However, in many P2P file-sharing systems, the same two peers only interact once or a few
times over their lifetime; i.e., the rendezvous probability is extremely low. And even if two
peers see each other multiple times, it is unlikely that one peer has a file that the other wants
at the time of rendezvous, which makes the theory of repeated games inapplicable. This
problem was neatly solved through the BitTorrent file-sharing protocol, which introduced a
completely new paradigm.
4.3 BitTorrent: Taking Incentives Seriously
One of the key differences between BitTorrent and Gnutella is that in BitTorrent, a peer
that is downloading a file is also simultaneously uploading pieces of that same file to other
peers. While this may look like an unimportant detail at first, it is actually an integral design feature of the BitTorrent protocol because it solves the problem caused by the low rendezvous probability.
Because of this change, the peers that are concurrently downloading the same file are
exchanging lots of pieces with each other and are thus playing a “repeated game on the piece
level,” which allows for new kinds of cooperative strategies. Indeed, the BitTorrent client
implements a reciprocation policy that resembles TfT. The client is designed to promote
the phenomenon that the more upload a peer provides to other peers, the faster it will be
able to download pieces of the same file. This provides users with an incentive to upload to
others, and discourages free riding.
This incentive alignment, designed into BitTorrent, is one of the primary reasons for
BitTorrent’s success. In 2013, BitTorrent was by far the most popular file-sharing protocol,
responsible for more than 80% of world-wide P2P file-sharing traffic (see Section 4.5.4 for
details on the evolution of BitTorrent’s market share over the last 10 years).
4.3.1 The BitTorrent Protocol
To participate in BitTorrent, a user must download a client that is compatible with the
BitTorrent protocol. When Bram Cohen introduced BitTorrent in 2001, he released a
reference client which he also called BitTorrent. In some sense, introducing this reference
client is the same as introducing the first version of the BitTorrent protocol. In this section,
we describe the details of that client as specified by Cohen in a 2003 paper. Of course,
since then the BitTorrent client has been continuously improved, but the main design has
remained the same; otherwise the newer clients would not be backwards compatible.
The content that users download on the BitTorrent network can be a single file (e.g., a movie), or a large aggregated collection of files (e.g., all MP3s from a music album). For simplicity, we will refer to “a file” going forward. To find content, a user generally goes to a website that maintains a searchable directory of torrents, linking to so-called .torrent files. The .torrent files are generally not hosted by the same website that provides the directory service, but only linked to from that site. There also exist BitTorrent client applications with a completely decentralized content discovery protocol (e.g., Tribler), and many users subscribe to RSS web feeds to automatically download new content when it becomes available. However, centralized websites remain the most common way to find content.

Figure 4.4: Starting a download process in the BitTorrent protocol: 1) A user goes to a searchable directory (e.g., a website) to find a link to a .torrent file corresponding to the desired content; 2) the .torrent file contains metadata about the content, in particular the IP address of a tracker; 3) the tracker provides a list of peers participating in the swarm for the content; 4) the user’s BitTorrent client can now contact all these peers and download content.
Once a user downloads the .torrent file, he opens it with a BitTorrent client application.
The .torrent file contains all the relevant metadata, including the name and size of the file
that contains the desired content. A file in BitTorrent is divided into pieces (typically 64-512KB per piece), which are further divided into smaller blocks. The .torrent file also contains a 160-bit SHA-1 digital fingerprint of each piece, such that the client can later verify that each piece has been correctly downloaded. The metadata also includes
the IP address of a tracker, which is a computer that acts as a server for the torrent, and is
responsible for coordinating the peers who are interested in the file.2
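The following Python sketch shows how piece fingerprints let a client check downloaded data, using the standard library's `hashlib.sha1`, which produces a 160-bit (20-byte) digest. The piece length is a per-torrent choice; the value below is only an example, and a real .torrent file stores the concatenated digests in an encoded metadata dictionary that this sketch does not build.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list:
    """Split `data` into fixed-size pieces (the last may be shorter) and SHA-1 each one."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, hashes: list) -> bool:
    """Check a downloaded piece against the fingerprint from the .torrent metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]

PIECE_LENGTH = 256 * 1024             # example value; each torrent chooses its own
content = bytes(range(256)) * 4096    # 1 MiB of dummy content
hashes = piece_hashes(content, PIECE_LENGTH)
print(len(hashes), len(hashes[0]) * 8)   # 4 160

good = content[PIECE_LENGTH:2 * PIECE_LENGTH]
bad = bytes([good[0] ^ 0xFF]) + good[1:]   # flip the bits of one byte
print(verify_piece(1, good, hashes), verify_piece(1, bad, hashes))   # True False
```

Because verification happens per piece, a client that receives a corrupted piece only has to discard and re-request that piece rather than the whole file.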
All peers that are uploading or downloading the same file form a swarm. A peer announces
itself to the tracker to obtain a list of peers that are part of the swarm, and the tracker
returns a random subset (typically 50) of peers that are currently active in this swarm (i.e.,
their IP addresses). Peers re-announce themselves periodically to the tracker, usually every
15 to 30 minutes, at which point the tracker returns a new list of peers currently active
in the swarm. Finally, the peer also tells the tracker when it is leaving the swarm. See
Figure 4.4 for a schematic view of the process necessary to start downloading content via
BitTorrent.
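To illustrate the tracker's role, here is a toy in-memory tracker in Python. It is hypothetical: real trackers are contacted over the network and exchange more information (e.g., upload and download statistics), whereas this sketch only keeps one set of peer addresses per torrent and answers each announce with a random subset of at most 50 other peers, following the description above.

```python
import random

class ToyTracker:
    """In-memory stand-in for a tracker; one swarm per torrent hash."""

    def __init__(self, max_peers=50, rng=None):
        self.swarms = {}            # torrent hash -> set of "ip:port" strings
        self.max_peers = max_peers  # size of the random subset returned
        self.rng = rng or random.Random()

    def announce(self, torrent_hash, peer):
        """Register `peer` (first announce or periodic re-announce); return other peers."""
        swarm = self.swarms.setdefault(torrent_hash, set())
        others = sorted(swarm - {peer})   # sorted so a seeded rng is reproducible
        swarm.add(peer)
        return self.rng.sample(others, min(self.max_peers, len(others)))

    def leave(self, torrent_hash, peer):
        """The peer tells the tracker it is leaving the swarm."""
        self.swarms.get(torrent_hash, set()).discard(peer)

tracker = ToyTracker(rng=random.Random(0))
for i in range(1, 101):
    tracker.announce("example-hash", f"10.0.0.{i}:6881")
reply = tracker.announce("example-hash", "192.168.1.5:6881")
print(len(reply))   # 50
```

Because each re-announce returns a fresh random subset, a peer that stays in the swarm gradually learns about more and more of the other participants.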
2 The term “.torrent file” always refers to the file containing the metadata information. However, the term “torrent” can be used both to refer to the .torrent file and also to the file (e.g., a movie) a user wants to download.

Today, all standard BitTorrent clients also have an option for decentralized tracking, realized via a distributed hash table (DHT). This method is also often called tracker-less
tracking because instead of using a centralized tracker, the peer obtains the IP addresses
of the other peers in the swarm from an ad-hoc network of BitTorrent peers that are exchanging information about torrents in a decentralized way. Decentralized trackers are more
robust and also more difficult to shut down in case of legal disputes. Note that even using
DHTs, the torrent directory websites containing the .torrent files (which contain the tracker
information) are still required. However, these websites now often also provide magnet links
instead of (or in addition to) linking to .torrent files. If a user clicks on a magnet link, he does not download the whole .torrent file; instead, the link itself only carries a (much smaller) torrent hash, a cryptographic hash value that identifies the desired content. Using the torrent hash, a peer can find the corresponding peers in the DHT network and download the .torrent file from them before downloading the content file.
The peers that are currently downloading pieces of a file are called leechers, and the peers
who already have the complete file (and not just pieces of it) and are still uploading the file
are called seeders. A user who wants to inject new content into the BitTorrent system can
use a BitTorrent client to create a .torrent file corresponding to the file to be injected. The
software takes care of splitting the file into pieces and pieces into blocks and computing the
SHA-1 fingerprint. The user must only provide a tracker (there are lots of public trackers
available) or use the decentralized tracking option. The user can then upload the .torrent
file to a directory website, or send the .torrent file directly to interested users. Finally, the
user must act as the first seeder of the new content, at least until each piece of the file has
successfully been downloaded by at least one other user.
Leechers, i.e., peers who are still downloading the file, need to connect to other peers in
the swarm (seeders or other leechers) to download from them. A peer can either initiate
a connection or respond to a connection request. When two peers connect they become
neighbors and exchange a bitfield, which is a bit array informing each other about which
pieces of the file they have. Each peer maintains open connections to all neighbors, forming
its local neighborhood, and only closes a connection when it or the other peer leaves the
swarm. Thus, the size of a peer’s neighborhood within a swarm generally keeps growing
over time, until the download is complete.
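The bitfield exchanged between neighbors packs one bit per piece. A minimal Python sketch of encoding and decoding, assuming the convention of the BitTorrent wire protocol that the most significant bit of the first byte stands for piece 0 and that unused trailing bits are zero:

```python
def encode_bitfield(have, num_pieces):
    """Pack piece indices into bytes; piece 0 is the high bit of byte 0."""
    out = bytearray((num_pieces + 7) // 8)
    for i in have:
        if not 0 <= i < num_pieces:
            raise ValueError(f"piece index {i} out of range")
        out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

def decode_bitfield(data, num_pieces):
    """Return the set of piece indices whose bit is set."""
    return {i for i in range(num_pieces) if data[i // 8] & (0x80 >> (i % 8))}

bits = encode_bitfield({0, 3, 9}, num_pieces=10)
print(bits.hex())                          # 9040
print(sorted(decode_bitfield(bits, 10)))   # [0, 3, 9]
```

A file with 10 pieces needs only 2 bytes, so the bitfield stays small even for files with thousands of pieces.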
Whenever a peer finishes downloading a piece, it sends per-piece have messages to all
peers in its neighborhood. Each peer maintains an estimate of the availability of each piece
of the file by counting how many of its neighbors have the piece. When a peer i starts
uploading to a leecher j, then j informs i about which blocks j wishes to receive next. The
standard strategy is rarest-first, where a leecher tries to download blocks from those pieces
that currently have the lowest availability. Rarest-first is designed to distribute rare pieces
quickly, to reduce the likelihood of many peers waiting for the same pieces, and to minimize
the risk that some pieces will not be available in the swarm at all.
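Rarest-first selection can be sketched in a few lines of Python. Availability is the count of neighbors holding each piece, built from their bitfields and updated by have messages; among the pieces the uploader has and we lack, we pick one with the lowest count. Breaking ties at random is our assumption, and the neighbor names are hypothetical. The sketch chooses pieces only, not individual blocks within them.

```python
import random
from collections import Counter

def availability(neighbor_pieces):
    """Count how many neighbors have each piece (from their bitfields)."""
    counts = Counter()
    for pieces in neighbor_pieces.values():
        counts.update(pieces)
    return counts

def on_have(counts, piece):
    """A neighbor announced, via a have message, that it finished `piece`."""
    counts[piece] += 1

def rarest_first(my_pieces, uploader_pieces, counts, rng=random):
    """Pick a piece the uploader has and we lack, with the lowest availability."""
    candidates = [p for p in uploader_pieces if p not in my_pieces]
    if not candidates:
        return None
    lowest = min(counts[p] for p in candidates)
    return rng.choice([p for p in candidates if counts[p] == lowest])

neighbors = {"n1": {0, 1, 2}, "n2": {0, 1}, "n3": {0, 2, 3}}
counts = availability(neighbors)   # piece 0: 3, pieces 1 and 2: 2, piece 3: 1
print(rarest_first({0}, neighbors["n3"], counts))   # 3

on_have(counts, 3)
on_have(counts, 3)                 # piece 3 is now held by 3 neighbors
print(rarest_first({0}, neighbors["n3"], counts))   # 2
```

As soon as other peers announce a piece, it stops being rare and the next request shifts to a piece with fewer copies.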
A peer generally only uploads to a small subset of peers in its neighborhood. All peers
that it doesn’t upload to are called choked, and the few that receive some data are called
unchoked or the active set. The number of unchoke slots S is variable, and can be set
by the client software. Most BitTorrent clients originally used a fixed number of unchoke
slots, often 4, while today many clients set the number of upload slots proportionally to the
total available upload bandwidth. A key aspect of the BitTorrent protocol is determining
which peers to unchoke and this aspect is also of strategic importance, as we will see in the
following sections.
Figure 4.5: File exchange in the BitTorrent reference client, showing four unchoke slots: three are allocated to the peers that reciprocate the most bandwidth, and one is used for optimistic unchoking, i.e., allocated to some other, random peer.
4.3.2 The Unchoking Algorithm: Who to upload to?
The reference client always tries to download pieces from every peer in its neighborhood,
but is very selective regarding which peers it will allow to download from its machine. In
the reference client, a peer’s decision regarding which peers to unchoke is primarily based
on its recent download rate from other peers. Given S unchoke slots in total, a peer makes a
new decision every 10 seconds, unchoking the S − 1 peers from which it received the highest
average download rate during the last 20 seconds. Additionally, every 30 seconds, a peer
allocates its optimistic unchoking slot to some other random peer from its neighborhood.
Finally, the peer splits its upload bandwidth equally among all S slots at the so-called
equal-split rate.
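The unchoking rule above can be sketched as a single function that is called once every 10 seconds. This is a hedged illustration rather than the reference client's code. The function and parameter names are hypothetical, the rates are made-up numbers, and re-drawing the optimistic peer when it has already moved into a regular slot is an assumption of this sketch.

```python
import random


def unchoke_decision(t, avg_rates, neighbors, slots, optimistic, rng):
    """One 10-second unchoke decision of a reference-style client (sketch).

    t: current time in seconds (a multiple of 10).
    avg_rates: peer -> average download rate from that peer over the last 20 s.
    neighbors: all peers in the local neighborhood.
    slots: total number of unchoke slots S.
    optimistic: the current optimistic unchoke (or None).
    Returns (regular_unchokes, optimistic_unchoke).
    """
    # Regular slots: the S - 1 peers with the highest recent download rate.
    ranked = sorted(avg_rates, key=lambda p: (-avg_rates[p], p))
    regular = ranked[: slots - 1]
    # Optimistic slot: redrawn every 30 s (and, in this sketch, whenever the
    # current optimistic peer has moved into a regular slot).
    if t % 30 == 0 or optimistic is None or optimistic in regular:
        candidates = sorted(p for p in neighbors if p not in regular)
        optimistic = rng.choice(candidates) if candidates else None
    return regular, optimistic


def equal_split_rate(upload_capacity, slots):
    """Each of the S slots receives the same share of upload bandwidth."""
    return upload_capacity / slots


rng = random.Random(1)
rates = {"a": 30.0, "b": 12.0, "c": 25.0, "d": 0.0, "e": 5.0}
regular, opt = unchoke_decision(0, rates, rates.keys(), 4, None, rng)
print(regular, opt in {"d", "e"})   # ['a', 'c', 'b'] True
print(equal_split_rate(100.0, 4))   # 25.0
```

With S = 4, the three fastest reciprocators fill the regular slots, and the optimistic slot goes to one of the remaining peers until the next 30-second boundary.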
Optimistic unchoking serves two goals. First, it helps a client to explore its neighborhood
and find peers that reciprocate with high upload speeds. Second, it helps peers that have
just joined the swarm to obtain their first pieces before they have something they can
reciprocate with, which is good from a social welfare perspective. For a schematic view of
how a peer’s upload slots are allocated in the reference client application, see Figure 4.5.
4.3.3 A Tit-for-Tat Analysis of BitTorrent’s Unchoking Algorithm
BitTorrent’s unchoking strategy is often compared to the famous Tit-for-Tat (TfT) strategy that
can sustain cooperation in a repeated Prisoner’s Dilemma game. In this section, we will
explain to what degree this comparison is justified, and where the similarities end.
Peers in a BitTorrent swarm may have thousands of repeated interactions, even during the
download of a single file. For this reason, it is reasonable to model BitTorrent as a repeated
game. In fact, BitTorrent’s game is played simultaneously by hundreds or thousands of peers
connected to each other in a swarm through random neighborhoods. However, instead of
modeling this detail, we take the perspective that a peer is simultaneously playing many
two-player repeated games with many different peers, and focus on a simple repeated Prisoner’s
Dilemma model. Indeed, while two peers are each still missing pieces that the other peer
has, they could choose to continue their bilateral exchange.
In the reference client, the unchoking algorithm makes a new decision every 10 seconds
based on its average download rate from each peer over the past 20 seconds. We can model
the effect of peer i’s unchoking strategy on peer j in the context of a repeated game between
the two peers. For this, assume S = 4 unchoke slots, so that three slots are allocated by
reciprocation, and let u(3) denote the 3rd-highest speed that peer i received from any of
its neighbors in the most recent period. Given this, the unchoking strategy of the reference
client can be stated as:
• If peer j uploaded to i with a speed ≥ u(3): unchoke j in the next period.
• If peer j uploaded to i with a speed < u(3): do not unchoke j in the next period.
• Every three time periods, unchoke a random peer from the neighborhood who is
currently choked, and leave that peer unchoked for three time periods.
Remember that peer i might be downloading from more peers than it has unchoke slots,
because it can also be optimistically unchoked by other peers. As a result, a large number
of peers may be contending for i’s unchoke slots.
We can now see the similarity between BitTorrent’s unchoking strategy and TfT: if peer
j uploaded a lot to i in the most recent period (i.e., played “Cooperate” in the Prisoner’s
Dilemma), then i rewards j in the next period by unchoking and also uploading to j (i.e., i
will also play “Cooperate”). If, however, j did not upload a lot to i (i.e., played “Defect” in
the Prisoner’s Dilemma), then i will not upload to j in the next period (i.e., i will also play
“Defect”). Furthermore, the optimistic unchoking slot can be interpreted as corresponding
to playing “Cooperate” in the first time period of the repeated Prisoner’s Dilemma game.
Thus, BitTorrent’s unchoking strategy creates an incentive for peers to upload. If client
j does not upload to client i then client i will not unchoke client j, except via optimistic
unchoking, which happens rarely. On the other hand, by uploading to i with high speeds, j
gets unchoked by i, and also receives high speeds in return. Thus, in contrast to Gnutella,
uploading is rewarded in BitTorrent, just as “Cooperating” is rewarded in the repeated
Prisoner’s Dilemma game under the TfT strategy.
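The TfT-like incentive can be illustrated with a toy version of the period-by-period rule stated earlier. Peer i unchokes j in the next period exactly when j’s upload rate reaches u(3). This is a hypothetical sketch under strong simplifications: S = 4, i’s other neighbors upload at fixed made-up rates, the optimistic unchoke appears only as the first period, and j is assumed to always have pieces i wants.

```python
def unchoked_periods(j_uploads, other_rates, first_period_unchoked=True):
    """Periods in which peer i unchokes peer j in the toy repeated game.

    j_uploads[t]: rate at which j uploads to i in period t.
    other_rates: rates i receives from its other neighbors (held constant,
        at least two of them so that a 3rd-highest rate exists).
    i unchokes j in period t + 1 iff j's rate in period t is >= u(3), the
    3rd-highest rate among all of i's neighbors (S = 4, three regular slots).
    Period 0 models the optimistic unchoke ("Cooperate" first).
    """
    unchoked = [first_period_unchoked]
    for rate in j_uploads[:-1]:
        u3 = sorted(other_rates + [rate], reverse=True)[2]
        unchoked.append(rate >= u3)
    return unchoked


others = [40.0, 30.0, 20.0, 10.0]
print(unchoked_periods([35.0] * 5, others))  # [True, True, True, True, True]
print(unchoked_periods([0.0] * 5, others))   # [True, False, False, False, False]
```

A peer that “cooperates” by uploading above the threshold stays unchoked, while a “defecting” peer is cut off after the first period.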
But this is also where the similarity between BitTorrent’s unchoking strategy and the TfT
strategy ends. Unlike in the Prisoner’s Dilemma game, BitTorrent peers have more than just
two actions. Not only can they decide between sharing and not sharing, but they can also
vary how much upload bandwidth to make available to each individual peer. Furthermore,
whether peer j receives any bandwidth from peer i depends not only on peer j’s actions,
but also on the actions of other peers in i’s neighborhood. In addition, a peer j may get to
a point where it no longer has any pieces that i wants, in which case it is no longer able to
upload anything useful to i.
It turns out that these differences matter a lot. In the next section, we will see that
there are possible “attacks” on BitTorrent through alternative clients. In the language of
game theory, these represent “better strategies.” Thus, despite the similarities with TfT,
the strategy profile in which everyone uses the BitTorrent reference client is not an equilibrium.
4.4 “Attacks” on BitTorrent
A BitTorrent peer has a large degree of freedom in making decisions regarding how to
interact with other peers, and there is a large design space for client applications, including
the following decisions:
1. How often to contact the tracker to receive a list of peers?
2. Which pieces to reveal to which peers?
3. How many upload slots to use?
4. Which peers to unchoke, how much upload speed to give to each unchoked peer, and
how often to make this decision?
5. What data to upload to each unchoked peer?
Different choices will result in different performance for a peer participating in a swarm.
Of course, users’ preferences may vary, and where some prefer to minimize their download
time, others may prefer to minimize usage of upload bandwidth or to strike a good balance
between the two.
In the following sections, we will present a number of different BitTorrent strategies, each
addressing one or both of these two goals, and often making different trade-offs. Many of
these strategies have been called “attacks” on BitTorrent, because they lead to a different
play of the BitTorrent game than was intended by its inventor, and because some of these
strategies would be quite harmful to the overall BitTorrent network if everyone adopted
them. On the other hand, introducing new “strategic” clients also revealed certain shortcomings of the original protocol, and often led to improvements. Thus, it is largely a matter
of perspective whether you call these strategies an “attack” or a “rational deviation” from
the reference client’s strategy.
4.4.1 Uploading Garbage Data
When a BitTorrent client joins a swarm, it has no blocks to upload to other peers and must
rely on optimistic unchoking by others. Thus, getting started in BitTorrent can be slow, in
particular when there are many leechers and few seeders. A simple attack that addresses
this issue is to upload garbage data. A peer could falsely announce that it has all pieces of
the file, and then upload random data when the pieces are requested.
However, the SHA-1 hashes in a .torrent file allow clients to detect incorrect data at the
piece level, and thus to quickly identify a peer that uploads garbage data. Most client
applications are now designed to stop interacting with a peer once they detect that this
peer has uploaded garbage data, so the attack is no longer viable in practice.
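The defense can be sketched with Python’s standard hashlib module. The .torrent file carries one SHA-1 digest per piece, and a completed piece that does not match its digest is rejected. The helper names and the “ban the sender” policy are hypothetical. The sketch also assumes each piece came from a single sender, whereas real clients must attribute a bad piece whose blocks came from several peers.

```python
import hashlib


def piece_hashes(content: bytes, piece_length: int) -> list:
    """SHA-1 digest (20 bytes) of each fixed-size piece of the content."""
    return [
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    ]


def accept_piece(index, data, sender, expected, banned):
    """Verify a completed piece; on mismatch, stop dealing with the sender."""
    if hashlib.sha1(data).digest() == expected[index]:
        return True
    banned.add(sender)  # hypothetical policy: ban the peer that sent bad data
    return False


expected = piece_hashes(b"abcdefgh", 4)
banned = set()
print(accept_piece(0, b"abcd", "alice", expected, banned))    # True
print(accept_piece(1, b"zzzz", "mallory", expected, banned))  # False
print(banned)                                                 # {'mallory'}
```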
4.4.2 BitThief: Exploiting Optimistic Unchoking
The second attack we consider was implemented and tested via a real-world BitTorrent client
called BitThief. This client exploits the optimistic unchoking strategy that most BitTorrent
clients follow (including the reference client). The goal of BitThief is to download files via
BitTorrent without uploading anything in return. Thus, BitThief relies exclusively on the
optimistic unchoking slots of other peers. BitThief’s goal may make sense for users who have
little upload bandwidth available and need it for other applications like VoIP. Furthermore,
for users who are interested in downloading copyrighted material, there may be legal reasons
to prefer a client that downloads but does not upload.
Figure 4.6: The BitThief client opens connections to new peers much faster than the reference client (Locher et al., 2006).
BitThief does two things differently from the reference client. First, it initially asks the
tracker for a list of 200 peers instead of the standard 50. Second, it re-announces
itself to the tracker much more frequently than the reference client does (the standard
interval is between 15 and 30 minutes). The goal of both these techniques is to grow a client’s
neighborhood as fast as possible. This is beneficial, because every additional peer in a
client’s local neighborhood means an additional chance of being optimistically unchoked in
the next period.
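Why a larger neighborhood helps can be shown with a back-of-the-envelope calculation. Assume each neighbor runs one optimistic unchoke per 30-second round and picks uniformly among its choked candidates. By linearity of expectation, the expected number of optimistic unchokes received is then the sum of 1/(number of candidates) over all neighbors. The candidate count of 40 and the neighborhood sizes below are hypothetical.

```python
def expected_optimistic_unchokes(candidate_counts):
    """Expected optimistic unchokes we receive per 30-second round.

    candidate_counts[k]: how many choked peers neighbor k chooses among
    (including us); each neighbor optimistically unchokes one of them
    uniformly at random, so it picks us with probability 1 / count.
    """
    return sum(1.0 / c for c in candidate_counts)


print(expected_optimistic_unchokes([40] * 50))   # about 1.25
print(expected_optimistic_unchokes([40] * 200))  # about 5.0
```

Under these assumptions, quadrupling the neighborhood quadruples the expected number of free optimistic unchokes.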
Consider Figure 4.6, which compares the number of connections opened by BitThief and by
the reference client over time. BitThief opens new connections much faster than the reference
client, with a particularly large advantage at the beginning, and thus receives many more
optimistic unchokes, which improves its download performance. On the other hand, because
BitThief does not upload any content, optimistic unchokes are the only way it can download,
and this negatively affects its download performance.
For typical torrents with a mix of seeders and leechers, BitThief can generally be used
to download the torrent successfully. Given that BitThief does not upload at all, it is not
surprising that the completion time is on average between 2 and 4 times longer than with
the reference client. However, for some torrents, in particular those with small files and a
large number of seeders, BitThief can even be slightly faster than the reference client. This
is because BitThief’s strategy of quickly opening many connections is particularly powerful
in the first few minutes of a download process (see Figure 4.6). For small files, the additional
optimistic unchoke slots obtained in the first few minutes may be enough to outweigh the
lost reciprocation due to not uploading, such that the total download time is reduced.
Thus, BitThief points towards an obvious vulnerability of the optimistic unchoking strategy
of the reference client. However, the exploit relies on trackers not being very careful
about this kind of behavior. In fact, nowadays, most trackers prevent multiple requests
from the same IP address within a 30-minute window, which makes this attack moot unless
the attacker has a way to obtain multiple IP addresses, which most users cannot easily do.
Furthermore, the attack would primarily be of interest to users who really care about minimizing their upload bandwidth. For large files, using BitThief leads to significantly longer
download times.
Figure 4.7: Strategic Piece Revelation: Peer i prefers to remain as interesting as possible (Levin et al., 2008).
4.4.3 Strategic Piece Revelation
The third attack we consider seeks to keep other peers interested in exchanging pieces for
as long as possible. If peer i has a piece that j does not yet have, we say that j is interested in i.
Once j has all of the pieces that i has, it loses interest in i. In particular, j will quickly stop
uploading to i (except via the optimistic unchoking slot) because the unchoking strategy
will not select i to be unchoked when j receives no more useful pieces from i. Consequently,
a peer would like to maximize the amount of time that other peers find it interesting.
Remember that the reference client truthfully reports the information about the pieces
it has, and follows the “rarest-piece first” policy when downloading. At first sight, it seems
optimal for a peer to reveal all pieces it has, in order to maximize interest from others.
However, this view is myopic, and it turns out that by under-reporting its pieces, a peer
can benefit.
Now consider Figure 4.7, where various scenarios from peer i’s perspective are shown,
and $\succ_i$ indicates i’s preferences (i.e., scenarios (a) through (f) are ordered in decreasing
preference order). Peer i prefers having more peers interested in i rather than fewer, and
thus (a) $\succ_i$ (c) $\succ_i$ (e), as well as (b) $\succ_i$ (d) $\succ_i$ (f). Next, it is disadvantageous for peer
i if other peers trade pieces among each other, because this potentially reduces the other
peers’ interest in i in the future. Thus, (a) $\succ_i$ (b), (c) $\succ_i$ (d), and (e) $\succ_i$ (f). Lastly, it
is always better to have one additional peer interested in peer i, even if that peer is
also interested in another peer in the swarm. Thus, (b) $\succ_i$ (c) and (d) $\succ_i$ (e).
Based on the ordering of scenarios presented in Figure 4.7, one can design a piece revelation
strategy for agent i that under-reports pieces with the implicit goal of maintaining the
scenario in Figure 4.7 (a). Let $b_i$ represent i’s true bit-field, with $b_i(p) = 1$ if peer i has
piece p and 0 otherwise. While the reference client always reports its true bit-field to new
neighbors and also sends truthful updates to existing neighbors whenever it completes a new
piece (via have-messages), we now consider a strategy where agent i may possibly represent
a different bit-field to each of its neighbors. Thus, a peer j may receive a report $\hat{b}_i$ from
peer i about i’s bit-field, which may be untruthful, i.e., it may be that $\hat{b}_i \neq b_i$. Algorithm 4.1
describes the piece revelation algorithm, where peer i is considering which pieces to reveal
to peer j.
Algorithm 4.1: Strategic Piece Revelation Algorithm (Levin et al., 2008).
1. Let $b_i$ represent i’s true bitfield, and $\hat{b}_j$ denote j’s bitfield as j has announced it to i.
For each peer k, peer i maintains a list of pieces that i has already revealed to k,
denoted $L_i(k)$.
2. If there does not exist any piece p such that $\hat{b}_j(p) = 0$ and $b_i(p) = 1$, then quit; peer i
cannot truthfully gain j’s interest.
3. Find the piece p with $\hat{b}_j(p) = 0$ and $b_i(p) = 1$ that maximizes the number of other
neighbors k which (a) also have the piece p, or (b) to whom i has revealed piece p
before.
4. Send a have-message to peer j, revealing that i has piece p, and add p to $L_i(j)$.
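A possible Python rendering of Algorithm 4.1 follows, as a sketch rather than the authors’ implementation. Bitfields are lists of 0/1, `announced` maps each neighbor to its announced bitfield $\hat{b}_k$, and `revealed` plays the role of $L_i(k)$. The helper `needs_reveal` is a hypothetical addition that encodes the timing described in the prose (reveal only when j would otherwise lose interest). Ties are broken by the lowest piece index.

```python
def needs_reveal(j_bitfield, revealed_to_j):
    """True once j holds every piece i has revealed to it (j would lose interest)."""
    return all(j_bitfield[p] for p in revealed_to_j)


def strategic_reveal(j, my_bitfield, announced, revealed):
    """Steps 2-4 of Algorithm 4.1: choose one piece to reveal to peer j.

    my_bitfield: i's true bitfield b_i (list of 0/1).
    announced: peer -> bitfield that peer has announced to i.
    revealed: peer -> set of pieces i has revealed to that peer (L_i(k)).
    """
    b_j = announced[j]
    candidates = [p for p, have in enumerate(my_bitfield) if have and not b_j[p]]
    if not candidates:
        return None  # step 2: i cannot truthfully gain j's interest

    def commonness(p):
        # Step 3: other neighbors that have p, or to whom i already revealed p.
        return sum(
            1 for k, b_k in announced.items()
            if k != j and (b_k[p] or p in revealed.get(k, set()))
        )

    piece = max(candidates, key=lambda p: (commonness(p), -p))
    revealed.setdefault(j, set()).add(piece)  # step 4: send have-message
    return piece


announced = {"j": [0, 0, 0, 0], "k1": [1, 0, 0, 0], "k2": [1, 1, 0, 0]}
revealed = {}
print(strategic_reveal("j", [1, 1, 1, 0], announced, revealed))  # 0
announced["j"] = [1, 0, 0, 0]                     # j downloaded piece 0
print(needs_reveal(announced["j"], revealed["j"]))               # True
print(strategic_reveal("j", [1, 1, 1, 0], announced, revealed))  # 1
```

Piece 0 is revealed first because both other neighbors already have it, so handing it to j does not make j more attractive to them.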
This piece-revelation strategy differs from the default one in two important ways. First,
peer i is non-truthful about its bit-field, only revealing a new piece to peer j whenever it
is necessary, namely just at the point in time when j would otherwise lose interest in i.
Second, even when i reveals a new piece to j, it reveals the most common piece it has and
that j does not have, rather than the rarest piece. The idea is that providing j with a rare
piece could increase other peers’ interest in peer j, which could reduce their interest in peer
i. Thus, by following this algorithm, i tries to maintain the scenario from Figure 4.7 (a) as
opposed to the other scenarios.
Figure 4.8 shows an example run from an experiment designed to compare the standard
BitTorrent client with one that uses strategic piece revelation (in two separate runs). In
this experiment, the two clients join a swarm of peers with a 20-second delay, to test how
many other peers (who already have more pieces than they do) they can keep interested in
them, and for how long. As we can see, the strategic peer is successful in attracting more
interest from other peers over a long period of time, and ultimately completes its download
about 30% earlier than the reference client.
Note that which pieces a peer has is its private knowledge, and consequently no
single other peer can detect whether a peer under-reports pieces. Thus, in
contrast to the attacks that involve uploading garbage data or contacting the tracker more
frequently than the default client, the strategic piece revelation strategy cannot easily be
defended against.
Even though using strategic piece revelation is beneficial from an individual peer’s point of
view, a remaining question is its effect on overall social welfare. Unfortunately, experimental
studies have shown that if all peers use strategic piece revelation, then the average download
time for the whole population increases by 12%. This negative effect on overall performance
is due to the fact that all peers withhold information from each other, thereby leading to
suboptimal allocations of bandwidth. It is perhaps surprising that the increase in download
times is not higher. However, peers need to reveal more and more pieces to each other
over time to keep each other’s interest, which limits the negative impact of strategic piece
revelation.
Figure 4.8: Example run, comparing a standard client with a strategic piece revealer. The
strategic client achieves higher interest from others and finishes downloading a
file faster (Levin et al., 2008).
Yet, from a social welfare perspective, it is unfortunate that the BitTorrent protocol
allows clients to benefit in this way. An interesting and open research question is whether
a protocol could be designed that incentivizes truthful piece revelation.
4.4.4 BitTyrant: Strategic Unchoking
The fourth and last strategic attack we consider is also the most powerful one in terms of
decreasing a peer’s download time. The strategy was implemented in a BitTorrent client
called BitTyrant in 2006; its efficacy has been tested experimentally, and the client is freely
available for download.
The main motivation for developing BitTyrant was the observation that the reference
client’s unchoking strategy may be sub-optimal. Remember that most clients use a fixed
number (often 4) of unchoking slots, upload primarily to those peers that reciprocate with
the fastest download rate, allocate one optimistic unchoking slot randomly, and split their
upload bandwidth equally among all of the unchoked peers (the equal-split policy). The
question is whether there is a better unchoking strategy than the default one.
The first observation is that BitTorrent’s default unchoking strategy does not really provide the property of “the more you give the more you get.” Instead, if peer i does not
upload enough to a peer j to be among j’s top 3 peers (assuming 4 upload slots), then i will
not be unchoked by j except optimistically. Moreover, once peer i is among the top 3 peers of j,
there is no benefit to i from further increasing its upload rate to j. Thus, when i decides
how much it should upload to j, what really matters is j’s reciprocation probability given a
particular upload bandwidth u that i provides to j, which is the likelihood that peer j will
unchoke i given upload bandwidth u.
Consider Figure 4.9, which plots the probability that another peer will reciprocate (i.e.,
unchoke peer i) as a function of peer i’s: 1) raw upload capacity, as well as 2) its upload
capacity available in a single slot (assuming the equal-split policy). We see a clear discontinuity in the probability of reciprocation as a function of either measure of upload capacity.
For example, looking at a single slot’s upload capacity (equal-split), if peer i provides 11-12
KB/s, peer j is much more likely to reciprocate than with 10 KB/s, but increasing the upload bandwidth above approximately 14 KB/s provides little additional benefit to i. This
suggests that the equal-split policy is suboptimal from an individual peer’s perspective, since
upload bandwidth above the threshold in one slot would be better spent securing reciprocation
in an additional slot.
Figure 4.9: The probability that another peer will reciprocate and unchoke peer i as a
function of peer i’s raw upload capacity as well as its upload capacity per slot
(assuming the equal-split policy) (Piatek et al., 2007).
To examine this aspect further, consider Figure 4.10, which plots the expected download
performance of a peer against its total upload capacity. Performance increases sublinearly,
with high-capacity peers getting less than their “fair share” of download bandwidth in return
for upload bandwidth. Especially for a high-capacity peer, it seems likely to be better to
split its upload capacity into more than 4 slots.
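A short calculation shows why a high-capacity peer wastes bandwidth under a fixed 4-slot equal split. It assumes, as a simplification, that reciprocation behaves like a step function at roughly 14 KB/s per slot (the value read off Figure 4.9 above). The 200 KB/s capacity is a hypothetical example.

```python
import math


def per_slot_rate(capacity, slots):
    """Equal-split rate each unchoked peer receives (same units as capacity)."""
    return capacity / slots


def max_slots_at_threshold(capacity, threshold):
    """Most slots that still give every slot at least `threshold`."""
    return max(1, math.floor(capacity / threshold))


capacity = 200.0   # KB/s, hypothetical high-capacity peer
threshold = 14.0   # KB/s, roughly where Figure 4.9 flattens out
print(per_slot_rate(capacity, 4))                   # 50.0
print(max_slots_at_threshold(capacity, threshold))  # 14
```

With 4 slots, each unchoked peer receives about 50 KB/s, far above what is needed for reciprocation. The same capacity could instead cover about 14 slots at the threshold rate.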
The BitTyrant client exploits these insights by deviating from the reference client in three
ways:
• How many upload slots to use? Instead of using a fixed number of upload slots,
BitTyrant dynamically adjusts the number of upload slots, choosing the number that
maximizes performance.
• Who to unchoke? Instead of unchoking those peers with the fastest download speed,
BitTyrant unchokes those peers where the ratio of received download speed to provided
upload speed is best.
• How fast to upload to unchoked peers? Instead of using the equal-split strategy, BitTyrant dynamically adjusts the upload bandwidth provided to every unchoked peer,
with the goal of uploading at the minimum rate at which that peer is willing to reciprocate.
Essentially, BitTyrant takes a “return-on-investment” perspective. It “invests” upload
capacity to maximize its download rate. To perform this strategy, the BitTyrant client
needs to do more bookkeeping than the reference client.
Figure 4.10: Expectation of download performance as a function of upload capacity (Piatek
et al., 2007).
Bookkeeping: estimating $d_j$ and $u_j$. For each neighbor j, peer i maintains an estimate
of $d_j$, the current download rate that j provides its unchoked peers, and $u_j$, the upload rate
a peer must allocate to peer j to become unchoked at j. If peer i is currently unchoked
by j, then $d_j$ is simply the actual download bandwidth i receives from j. Otherwise,
i must estimate $d_j$ based on secondary information, and BitTyrant does this in a very
smart, yet roundabout way. Remember that peers send have-messages to all peers in their
neighborhood every time they complete a new piece, even to peers that are currently not
unchoked. Based on the frequency with which such have-messages arrive from peer j, peer
i can estimate the total download rate received by peer j.
Now consider Figure 4.10 again, and notice the roughly linear relationship between a
peer’s upload capacity and its expected download rate. For this reason, BitTyrant uses the
estimated total download rate as an estimate for a peer’s total upload capacity. Assuming
an equal-split upload rate for peer j, the total estimated upload capacity is divided by the
total number of upload slots j is expected to use at this speed, to obtain an estimate of
$d_j$, i.e., the download bandwidth that i can expect to receive from j if i is unchoked. Peer
i does not know the number of upload slots used by j, and instead estimates this number
as the number of upload slots that popular file-sharing clients would typically use at that
upload capacity.
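The indirect estimate of $d_j$ can be sketched as follows. Count the have-messages received from j, multiply by the piece size, and divide by the elapsed time to get j’s download rate. Use that rate as a proxy for j’s upload capacity, then divide by the number of slots j is assumed to use. The slot rule here is a made-up placeholder, not what any real client does. The timestamps and the 256 KB piece size are also hypothetical.

```python
def estimate_total_rate(have_times, piece_size_kb):
    """Estimate j's total download rate (KB/s) from its have-message timestamps."""
    if len(have_times) < 2:
        return 0.0
    elapsed = have_times[-1] - have_times[0]
    if elapsed <= 0:
        return 0.0
    # Pieces completed after the first observed have-message, per second.
    return (len(have_times) - 1) * piece_size_kb / elapsed


def hypothetical_slot_rule(capacity_kb):
    """Made-up stand-in for 'slots a popular client uses at this capacity'."""
    return max(4, int(capacity_kb // 10))


def estimate_d_j(have_times, piece_size_kb, slot_rule=hypothetical_slot_rule):
    # Download rate is used as a proxy for j's upload capacity,
    # which is then split equally across j's (estimated) slots.
    capacity = estimate_total_rate(have_times, piece_size_kb)
    return capacity / slot_rule(capacity)


print(estimate_total_rate([0.0, 10.0, 20.0, 30.0], 256))  # about 25.6 KB/s
print(estimate_d_j([0.0, 10.0, 20.0, 30.0], 256))         # about 6.4 KB/s
```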
Estimating $u_j$, the upload rate required to become unchoked at j, is even harder than
estimating $d_j$, because it cannot be estimated based on messages that j sends to its neighbors. The rate $u_j$ depends on the upload capacities of other peers in the swarm, which peers
have each other in their respective neighborhoods, which peers currently have which pieces
and are thus interested in each other, and so on. For this reason, BitTyrant simply uses
the equal-split upload capacities observed in prior measurements as the initial estimate for
$u_j$, and then adjusts these estimates over time. For example, based on the measurements
shown in Figure 4.9, BitTyrant would estimate $u_j$ to be somewhere around 14 KB/s.
The BitTyrant Algorithm: Step by Step. Now consider Algorithm 4.2, which describes the
complete BitTyrant unchoking strategy. In step 1, the parameters $u_j$ and $d_j$ are initialized,
as described above. In step 2, the variable $\mathit{cap}_i$ is set to the maximum upload capacity
the client shall use in this swarm. Going forward, we consider two different versions of
the BitTyrant client: #1) the default BitTyrant client, which we also call altruistic or
uncapped, sets $\mathit{cap}_i$ equal to its total available upload capacity (i.e., no artificial cap); and
#2) the capped BitTyrant client, which caps its upload capacity at some point below its
total available upload capacity. Using version #2 may make sense if the user can make
better use of the additional upload capacity in some other way, e.g., by using it in another
BitTorrent swarm or for a non-BitTorrent application (e.g., VoIP). In step 3 of
the algorithm, the client i ranks peers j in decreasing order of their ratio $d_j/u_j$, and then keeps
adding new upload slots, filling them with the most valuable peers, until the upload budget
$\mathit{cap}_i$ is reached. If $\mathit{cap}_i > \sum_j u_j$, then all peers in i’s neighborhood are unchoked and the
excess upload capacity remains unused.
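Step 3 of Algorithm 4.2 amounts to a greedy selection by return on investment. Sort peers by $d_j/u_j$ and unchoke the longest prefix whose total $u_j$ fits within $\mathit{cap}_i$. The sketch below is a hypothetical rendering, not BitTyrant’s source. The optional `eps` cut-off models the variant discussed later in this section, where the client stops once $d_j/u_j$ drops below a small ε. All rates are made-up numbers, and every $u_j$ is assumed positive.

```python
def bittyrant_unchoke(d, u, cap, eps=0.0):
    """Step 3 of Algorithm 4.2 (sketch): greedily unchoke by d_j / u_j.

    d[j]: estimated download rate from j if unchoked.
    u[j]: estimated upload rate needed for j to reciprocate (must be > 0).
    cap: upload budget cap_i.
    eps: optional cut-off; stop once d_j / u_j < eps.
    Returns (unchoked peers in rank order, upload budget spent).
    """
    ranked = sorted(d, key=lambda j: (-d[j] / u[j], j))
    chosen, spent = [], 0.0
    for j in ranked:
        if d[j] / u[j] < eps or spent + u[j] > cap:
            break  # longest prefix whose total u_j fits within cap
        chosen.append(j)
        spent += u[j]
    return chosen, spent


d = {"a": 20.0, "b": 15.0, "c": 5.0, "e": 0.0}
u = {"a": 10.0, "b": 10.0, "c": 10.0, "e": 10.0}
print(bittyrant_unchoke(d, u, cap=25.0))            # (['a', 'b'], 20.0)
print(bittyrant_unchoke(d, u, cap=100.0))           # (['a', 'b', 'c', 'e'], 40.0)
print(bittyrant_unchoke(d, u, cap=100.0, eps=0.1))  # (['a', 'b', 'c'], 30.0)
```

With a large budget and no cut-off, even peer e with $d_e = 0$ is unchoked, which mirrors the uncapped (altruistic) behavior. Setting `eps` withholds that bandwidth, as the capped client does.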
Finally, in step 4 of the algorithm, the parameters $u_j$ and $d_j$ are updated based on the
observations from the current period. Good values for the parameters α, γ and r used in
this last step can be determined experimentally. In their experiments, Piatek et al. (2007)
decreased $u_j$ by γ = 10% if a peer reciprocated for r = 3 periods, and increased $u_j$ by
α = 20% if a peer failed to reciprocate after being unchoked during the previous period.
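The update rule of step 4 can be sketched with the parameter values reported above (α = 0.2, γ = 0.1, r = 3). Tracking reciprocation with a per-peer streak counter, and shrinking $u_j$ in every period once the streak has reached r, is one reading of step 4c and an assumption of this sketch. The starting values are hypothetical.

```python
def end_of_period_update(j, unchoked_me, observed_rate, d, u, streak,
                         alpha=0.20, gamma=0.10, r=3):
    """Step 4 of Algorithm 4.2 for one peer j that i unchoked this period.

    streak[j] counts consecutive periods in which j has unchoked i.
    """
    if not unchoked_me:
        u[j] *= 1 + alpha          # 4a: j did not reciprocate, so offer more
        streak[j] = 0
        return
    d[j] = observed_rate           # 4b: record the observed download rate
    streak[j] = streak.get(j, 0) + 1
    if streak[j] >= r:             # 4c: reciprocated for the last r periods
        u[j] *= 1 - gamma          # probe whether a lower rate still suffices


d, u, streak = {"a": 5.0}, {"a": 14.0}, {}
end_of_period_update("a", False, 0.0, d, u, streak)   # u_a: 14.0 -> about 16.8
for _ in range(3):
    end_of_period_update("a", True, 8.0, d, u, streak)
print(round(u["a"], 2), d["a"])  # 15.12 8.0
```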
Empirical Evaluation of BitTyrant. For the empirical evaluation of BitTyrant described
next, the upload budget $\mathit{cap}_i$ is set manually to fixed levels, such that the two versions of
BitTyrant coincide. Figure 4.11 shows a comparison of the download times for a single
BitTyrant client and the download times of a single unmodified BitTorrent client, when all
other peers in the swarm are also using an unmodified BitTorrent client. The peer’s upload
cap is plotted on the x-axis, and the achieved download time on the y-axis.
We see that the BitTyrant client consistently outperforms the BitTorrent client. While
the exact size of the performance difference depends on the client’s upload capacity, the
BitTyrant client is roughly twice as fast as the standard BitTorrent client, except for very
low upload caps where the difference is smaller. Furthermore, while the performance of
the BitTorrent client saturates at around an upload cap of 100 KB/s, the performance of
BitTyrant continues to improve.
The fact that download times do not decrease much further for BitTyrant as upload
capacities go beyond 500 KB/s is due to the limited swarm size used in the simulations.
A BitTyrant peer i with a large upload bandwidth will be able to unchoke so many peers
that at some point it will reach a peer j where the amount of download bandwidth $d_j$ it
gets in return for $u_j$ is very close to 0, or even equal to 0. Of course, allocating bandwidth
to peers with $d_j = 0$ does not hurt the client, but it also doesn’t help. This is where the
behavior of the two BitTyrant versions differs. The uncapped (altruistic) BitTyrant client
keeps allocating bandwidth to new peers until all of the total available upload bandwidth
is used up. In contrast, the capped BitTyrant client withholds some of its available upload
bandwidth if $\mathit{cap}_i$ is smaller than the total available upload bandwidth, and thus, some
peers with small values of $d_j$ may not be unchoked. The optimal value for $\mathit{cap}_i$ could
be determined dynamically. For example, we could modify the algorithm such that the
client never allocates to any peer with $d_j = 0$. Alternatively, the client might already stop
allocating slots to new peers once the ratio $d_j/u_j$ drops below some small value ε, at which
point the user might prefer to use the extra bandwidth for other applications. Thus, the
BitTyrant client can in fact discover this “point of diminishing returns,” and then cap the
upload bandwidth at exactly that point.

Algorithm 4.2: The BitTyrant Unchoking Algorithm (Piatek et al., 2007).
1. For each peer j, peer i maintains estimates of expected download rate $d_j$ and
expected upload rate $u_j$ required for reciprocation by peer j:
a) If peer i is currently unchoked by j, then $d_j$ is the observed download
bandwidth. Otherwise, $d_j$ is inferred indirectly from j’s block announcement
rate.
b) Initialize $u_j$ using the distribution of equal-split capacities observed in prior
measurements (e.g., Figure 4.9).
2. Set $\mathit{cap}_i$ to the maximum upload capacity that peer i shall use in this swarm.
3. Each period, order peers by decreasing ratio $d_j/u_j$ and unchoke those of top rank until
$\mathit{cap}_i$ is reached:
$$\underbrace{\frac{d_0}{u_0}, \frac{d_1}{u_1}, \frac{d_2}{u_2}, \frac{d_3}{u_3}, \frac{d_4}{u_4}}_{\max n \ \text{s.t.} \ \sum_{j=0}^{n} u_j \,\le\, \mathit{cap}_i}, \ldots, \frac{d_n}{u_n}$$
4. At the end of each period, for each unchoked peer j:
a) If peer j does not unchoke i: $u_j \leftarrow (1 + \alpha)\, u_j$
b) If peer j unchokes i: $d_j \leftarrow$ observed rate.
c) If peer j has unchoked i for the last r periods: $u_j \leftarrow (1 - \gamma)\, u_j$
We now take a look at what happens if all peers in a swarm use BitTyrant, and we
analyze the performance difference between the two BitTyrant versions. While we have seen
that BitTyrant improves the performance of a single peer when playing against standard
BitTorrent peers, we are now interested in BitTyrant’s effect on social welfare, i.e., the
download performance averaged over all peers in the swarm. For this, consider Figure 4.12,
where the cumulative distribution function of completion times is shown, comparing the two
versions of the BitTyrant client with the standard BitTorrent client. For this experiment, the
capped BitTyrant client was modified such that the upload capacity of even the high-capacity
peers was limited to 100 KB/s. All three lines in Figure 4.12 are based on experiments where
all peers in the swarm use the same client. We see that the altruistic/uncapped BitTyrant
client provides the highest performance, the standard BitTorrent client (which is also not
capped) lies in the middle, and the capped BitTyrant client achieves the lowest performance.
Figure 4.11: The BitTyrant client consistently outperforms a standard BitTorrent client in
terms of average download time (Piatek et al., 2007).
The altruistic/uncapped BitTyrant client is able to provide the highest overall performance because the distribution of upload capacities in the swarm is skewed. Using BitTyrant, high capacity peers obtain more file pieces faster, and can then reciprocate with
more pieces, thus effectively increasing the swarm’s capacity. This is a very positive result:
by modifying the unchoking strategy, the download performance has increased, not only for
the individual but also for the community as a whole. On the other hand, if BitTyrant peers
cap their upload bandwidth at the point of diminishing returns, then overall performance
decreases. This happens because every BitTyrant peer that limits its upload bandwidth
effectively reduces the total amount of upload bandwidth available in the whole swarm,
which necessarily means that other peers get lower download speeds. Thus, whether or not
BitTyrant provides a social welfare benefit depends on whether users only seek to maximize
their individual download rate or whether they also seek to minimize their upload rate as
a secondary objective (for example because they are participating in multiple swarms, or
because they want to use their upload bandwidth for other applications).
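The welfare argument above rests on a simple accounting identity. In a swarm without seeds, every byte one peer downloads is a byte some other peer uploaded. The aggregate download rate can therefore never exceed the aggregate upload rate. The Python sketch below illustrates this under strong simplifying assumptions: there are no seeds, all upload bandwidth is fully used, and download capacity is never the bottleneck. The upload capacities are hypothetical, and the fixed 100 KB/s cap mirrors the experiment described above rather than BitTyrant’s own point-of-diminishing-returns cap.

```python
def mean_download_rate(upload_capacities, cap=None):
    """Mean per-peer download rate (KB/s) in a seedless swarm.

    Assumes every peer's (possibly capped) upload bandwidth is fully used.
    Each byte downloaded by one peer is uploaded by another, so the
    aggregate download rate equals the aggregate upload rate.
    """
    if not upload_capacities:
        raise ValueError("swarm must contain at least one peer")
    if cap is None:
        effective = list(upload_capacities)
    else:
        # A capped client never uploads faster than `cap`, even if it could.
        effective = [min(c, cap) for c in upload_capacities]
    return sum(effective) / len(effective)


# Hypothetical, skewed upload capacities in KB/s: a few fast peers, several slow ones.
capacities = [20, 50, 100, 400, 1000]

uncapped = mean_download_rate(capacities)
capped = mean_download_rate(capacities, cap=100)
print(f"uncapped: {uncapped:.1f} KB/s, capped at 100 KB/s: {capped:.1f} KB/s")
# prints "uncapped: 314.0 KB/s, capped at 100 KB/s: 74.0 KB/s"
```

In this toy model, capping removes only the high-capacity peers’ surplus bandwidth. That surplus, however, is exactly what a skewed swarm relies on, so the average download rate falls sharply.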
Another drawback of using BitTyrant is that new users may experience an extended
bootstrapping phase. Unlike the reference client, which uses one upload slot for optimistic
unchoking, the BitTyrant client only unchokes peers that send at a fast rate. A BitTyrant
client will only unchoke a newcomer that is not uploading if there is no other peer from whom
the BitTyrant peer would at least get something in return, and if furthermore BitTyrant is
not using an upload cap. Introducing optimistic unchoking into BitTyrant would increase
the swarm’s performance, but this would not be in the interest of self-interested users. To
summarize: while BitTyrant certainly improves an individual peer’s download performance
compared to the reference client, it lowers the overall swarm’s performance if everyone
adopts the capped BitTyrant client.
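The newcomer problem can be seen in a simplified sketch of BitTyrant-style peer selection. Following Piatek et al. (2007), the sketch ranks peers by the ratio of the expected download rate d to the upload rate u needed to earn reciprocation, and unchokes peers in that order until upload capacity runs out. This is a sketch under stated assumptions, not the actual client. The estimates d and u are given as inputs rather than measured online, and "capped" is modeled only as refusing to spend bandwidth on peers that return nothing. The class, function and peer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerEstimate:
    name: str
    d: float  # estimated download rate (KB/s) this peer would give us if unchoked
    u: float  # estimated upload rate (KB/s) we must give it to earn reciprocation

    def __post_init__(self):
        if self.u <= 0:
            raise ValueError("u must be positive")


def bittyrant_style_unchoke(peers, capacity, capped):
    """Simplified BitTyrant-style unchoke selection.

    Peers are ranked by return on investment d/u and unchoked in that order
    until the next peer's required upload rate no longer fits into the
    remaining capacity. A capped client also refuses to spend bandwidth on
    peers expected to return nothing (d == 0), such as newcomers without pieces.
    """
    ranked = sorted(peers, key=lambda p: p.d / p.u, reverse=True)
    unchoked, used = [], 0.0
    for p in ranked:
        if capped and p.d == 0:
            break  # ratios are sorted descending, so all remaining peers also have d == 0
        if used + p.u > capacity:
            break  # upload capacity exhausted
        unchoked.append(p.name)
        used += p.u
    return unchoked


peers = [
    PeerEstimate("A", d=50, u=10),
    PeerEstimate("B", d=30, u=15),
    PeerEstimate("C", d=20, u=20),
    PeerEstimate("newcomer", d=0, u=10),
]
print(bittyrant_style_unchoke(peers, capacity=60, capped=False))  # ['A', 'B', 'C', 'newcomer']
print(bittyrant_style_unchoke(peers, capacity=60, capped=True))   # ['A', 'B', 'C']
print(bittyrant_style_unchoke(peers, capacity=40, capped=False))  # ['A', 'B']
```

Unlike the reference client, nothing here reserves a slot for optimistic unchoking. The newcomer is served only when every peer that reciprocates already fits within the capacity and the client is uncapped.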
Figure 4.12: CDFs of completion times, comparing the uncapped (altruistic) BitTyrant, the
original BitTorrent, and the capped BitTyrant (Piatek et al., 2007).
4.5 Discussion
4.5.1 User Behavior in Practice
Despite all of the possible ways in which BitTorrent can be attacked, the BitTorrent network
has been incredibly robust and successful. A number of BitTorrent clients that can improve
a user’s download performance are available for download (e.g., BitTyrant), but almost
nobody uses them. Rather, the most popular clients implement strategies that are similar
to the reference client. One explanation for this user behavior is that the overall user
experience with these clients may be better; e.g., with graphically appealing user interfaces,
integrated search for content, and so forth. Given this, a rational user might prefer a client
with a slower download speed (or one that requires more upload bandwidth), if the client
provides other useful functionality that is missing from more “strategic” clients. In fact,
the BitTorrent clients that are popular today compete for new users by advertising low
CPU and memory usage instead of focusing on download speed.
This also indicates that users choose their favorite clients based on a number of different
factors, and download speed is no longer the differentiating factor.
4.5.2 Private BitTorrent Communities
An interesting development that has changed the nature of the BitTorrent network in recent
years is the rise of private BitTorrent communities. While a public BitTorrent community
uses a public tracker or a DHT to coordinate the peers in a swarm, a private community uses
a private tracker that can only be used with special credentials, i.e., users must register an
account. In many private BitTorrent communities, a user account can only be obtained by
invitation from current members, and these invitations can be difficult to get. Thus, it is not
surprising that these private communities are much smaller than their public counterparts.
A study from 2009 reported that the largest public BitTorrent community, The Pirate Bay,
had 4 million members, while TorrentLeech, one of the largest private communities, had only
178,000 members. However, the members of private BitTorrent communities are usually
power users with high-speed internet connections, and thus the member numbers do not
necessarily reflect the number of files or the total bandwidth available in these communities.
Private trackers generally collect statistics about each user’s upload and download behavior and enforce sharing ratios, i.e., a minimum ratio of upload to download traffic per
user. For example, a typical policy may require each user to maintain a sharing ratio of at
least 0.25 and ban any user who violates this requirement for an extended period of
time. The empirical evidence we have shown in Figure 4.1 suggests that sharing-ratio enforcement is working reasonably well: the typical download speeds that can be obtained
in private BitTorrent communities are about 3 to 5 times higher than in public communities. Note that the use of higher-level incentive mechanisms like sharing-ratio enforcement
also reduces the importance of the incentives on the lower levels, e.g., with regard to the
unchoking algorithm.
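A tracker-side check of the kind described above might look like the following Python sketch. The 0.25 minimum ratio comes from the example policy in the text. The 14-day grace period, the data layout and the function names are hypothetical, because real private trackers differ in how they define and enforce their rules. Note that the byte counts are whatever the user’s client reports.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RATIO = 0.25  # example policy from the text
GRACE_DAYS = 14   # hypothetical length of the "extended period" before a ban


@dataclass
class UserStats:
    uploaded: int              # total bytes uploaded, as reported by the user's client
    downloaded: int            # total bytes downloaded, as reported by the user's client
    days_below_ratio: int = 0  # consecutive daily checks with ratio < MIN_RATIO


def sharing_ratio(stats: UserStats) -> Optional[float]:
    """Upload-to-download ratio, or None if the user has not downloaded anything yet."""
    if stats.downloaded == 0:
        return None
    return stats.uploaded / stats.downloaded


def daily_check(stats: UserStats) -> bool:
    """Run once per day per user; returns True if the user should now be banned."""
    ratio = sharing_ratio(stats)
    if ratio is None or ratio >= MIN_RATIO:
        stats.days_below_ratio = 0  # back in good standing: reset the counter
        return False
    stats.days_below_ratio += 1
    return stats.days_below_ratio >= GRACE_DAYS


leecher = UserStats(uploaded=10 * 2**30, downloaded=100 * 2**30)  # ratio 0.1
verdicts = [daily_check(leecher) for _ in range(GRACE_DAYS)]
print(verdicts[-2:])  # [False, True]
```

The grace period lets a user who temporarily falls below the ratio, for example right after a large download, catch up by seeding before being banned.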
While some recent measurement studies have shown how well private BitTorrent communities are working, understanding the dynamics present in these communities is still
an active area of research. Some questions in regard to the design of optimal incentive
mechanisms for these communities include:
1. Are sharing ratios a good mechanism, and if so, which exact ratio should be enforced?
2. Should a credit-based system be used in place of sharing-ratio enforcement, with each
type of action corresponding to a different amount of credit?
3. Should there be a minimum seeding amount by each user?
There are also obvious vulnerabilities in the way in which existing private tracker communities operate. They all rely on BitTorrent clients making truthful reports regarding the
total amount of bytes downloaded and uploaded. To achieve this, they usually only allow
clients from a small list of “trusted” clients. However, the client’s user-id, which is used to
identify the particular BitTorrent client, is also self-reported by the client to the tracker,
and could be spoofed. Thus, it would be relatively easy to develop a client that pretends to
be one of the trusted ones, but then misreports the total amount of bytes downloaded and
uploaded to gain an advantage. To date, we are not aware of any private tracker that
uses a more sophisticated security mechanism that would prevent such an attack.
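The vulnerability can be made concrete with a short Python sketch of a hypothetical tracker-side check. As the text describes, the sketch assumes that trust is decided from a self-reported client identifier. Here that identifier is modeled as the client prefix of a BitTorrent peer ID, using the common Azureus-style convention (for example `-TR2940-`). The sketch also uses the uploaded/downloaded byte counts that a client includes in its tracker announce. The whitelist entries and function names are illustrative and are not taken from any real tracker.

```python
import secrets
from typing import Optional

# Hypothetical whitelist of "trusted" client prefixes, using the Azureus-style
# convention: "-" + two-letter client code + four version digits + "-".
TRUSTED_PREFIXES = {"-TR2940-", "-qB4250-"}


def make_peer_id(prefix: str) -> str:
    """Build a 20-character peer ID: an 8-character client prefix plus 12 random hex characters.

    Real peer IDs are 20 raw bytes; a str is used here for readability.
    """
    return prefix + secrets.token_hex(6)


def credited_ratio(peer_id: str, uploaded: int, downloaded: int) -> Optional[float]:
    """Ratio the tracker would credit for an announce, or None if rejected or undefined.

    Nothing here verifies either the peer ID or the byte counters:
    both are simply whatever the client chose to report.
    """
    if len(peer_id) != 20 or peer_id[:8] not in TRUSTED_PREFIXES:
        return None
    if downloaded == 0:
        return None
    return uploaded / downloaded


honest = credited_ratio(make_peer_id("-TR2940-"), uploaded=100, downloaded=1000)
# A modified client copies a trusted prefix and inflates its upload count.
cheater = credited_ratio(make_peer_id("-TR2940-"), uploaded=5000, downloaded=1000)
print(honest, cheater)  # 0.1 5.0
```

From the tracker’s point of view, the two announces are indistinguishable. Only the reported numbers differ.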
Given this, it is a bit surprising that private BitTorrent communities are flourishing and
not suffering from instability due to cheating. One possible reason is that users fear being
banned from the community if found to be cheating by the tracker, and don’t want to give
up the 3-5x download speed improvements over public communities. A second reason is
that some participants in private communities seem to exhibit altruism.
4.5.3 Altruism in BitTorrent
So far, we have been able to reconcile most of the observed behavior in BitTorrent with a
selfish-rational user model. However, one behavior that seems to contradict this standard
user model is that some users (in particular of private BitTorrent communities) purposefully
select very “altruistic” settings for their clients. It seems that many BitTorrent users have
adopted a certain sharing norm. It is not uncommon for users to configure their clients to
upload at least as much as (or even twice as much as) they have downloaded before going
offline. Thus, there remain interesting questions regarding what motivates users to share
in P2P file-sharing communities, even when they don’t benefit directly. These questions lie
at the intersection of behavioral economics and computer systems design and give rise to a
number of challenging research problems. We will come back to some of these considerations
involving behavioral economics models in Chapter 27. There, we also look at systems like
Linux and Wikipedia, where some users display similarly altruistic behavior.

Figure 4.13: Traffic proportions due to different P2P protocols in the US between 2002 and
2004 (?).
4.5.4 A Short History of P2P File Sharing
P2P file sharing was initially marked by rapid changes, with users migrating quickly from
one network to the next. While BitTorrent has now emerged as the clear winner of the P2P
file-sharing protocol battle, this was far from obvious when BitTorrent was first introduced
in 2001. Remember that Napster had just been developed two years earlier, in 1999. In
2000, the two protocols eDonkey and Gnutella had been introduced, and in 2001, FastTrack
joined the ranks of popular protocols. These three were responsible for the majority of P2P
file-sharing traffic in the first two years after Napster was shut down in 2001 (see Figure
4.13).
After Napster was gone, the FastTrack network became the dominant player worldwide,
reaching its peak in 2003 with more than 3 million users simultaneously connected to the
network, sharing over 5,000 terabytes of content. For many reasons, among them a proprietary protocol and client software that came bundled with malware, FastTrack started
losing its market share to other networks towards the end of 2003. In parallel with FastTrack’s decline, the popularity of BitTorrent started to increase. In 2005, BitTorrent had
between 2 and 3 million simultaneous users worldwide, while eDonkey had between 3 and
4 million, FastTrack had around 2 million, and Gnutella had around 1.8 million. However,
user adoption of the various P2P networks differed significantly around the world. In the
US, BitTorrent quickly emerged as the most popular protocol. In 2005, it was already
responsible for 48% of the P2P traffic in the US, and its market share continued to grow
over the next 8 years, reaching 86% in 2012 and 88% in 2013.

Figure 4.14: P2P file-sharing traffic due to different P2P protocols (BitTorrent, eDonkey,
FastTrack, Other) in Europe/Germany between 2003 and 2013.³
In the rest of the world, however, the development was a little different, and the adoption
of BitTorrent was slower than in the US. In Europe, for example, the eDonkey protocol was
much more popular than in the US. It was the most popular protocol in 2003, responsible for
over 50% of the P2P traffic, and continued to be the most popular protocol until BitTorrent
finally overtook it in 2007. This trend continued, and in 2013, BitTorrent was responsible
for 81% of the P2P file sharing traffic in Europe, but eDonkey is still making its mark,
responsible for roughly 12% of the P2P traffic (see Figure 4.14).
While the absolute amount of Internet traffic due to P2P file sharing increased year after year from 1999 to 2013, its relative share has decreased significantly since reaching its peak
somewhere between 2004 and 2006 (see Figure 4.15). The increasingly strict enforcement of
copyright laws, with multi-million-dollar lawsuits, may have scared off some file-sharing
users. However, the most important reason for this relative decline in recent years is
the increased attractiveness of alternative means to obtain music or videos on the Internet.

Figure 4.15: Share of the total Internet traffic due to P2P file sharing (Europe, US, Worldwide)
from 1999 to 2013.⁴

³ Because of incomplete data, for the years 2003, 2005, 2007, and 2008, the graph is based on data for
Germany only. For the other years, the graph is based on data for all of Europe. However, the numbers
for Germany can be seen as a relatively good approximation of the European averages.

⁴ The numbers for worldwide traffic shares from 1999 till 2004 are based on ?, page 3 (our sources did not
provide separate traffic numbers for US vs. Europe for 1999-2003). The numbers for 2005 are estimated
average traffic shares based on data for upload and download bandwidth. The European numbers for
2007 and 2008 are based on data from Germany only. However, the numbers for Germany can be seen
as a relatively good approximation of the European averages.
In the early 2000s, P2P file sharing may have been the only (practical) way to get access
to music or videos (legally or illegally) directly via the Internet. But today, users have
many attractive options available to them. For example, they can now buy individual songs
or whole albums directly from Apple’s iTunes or other comparable services. Additionally,
music streaming services like Pandora, Last.fm, Spotify, SoundCloud, etc., have become very
popular in recent years, and are often available for free (supported via advertising). For
music videos and other short video clips, YouTube has become the go-to place since it
was founded in 2005. In 2007, the DVD-by-mail service Netflix started streaming some
of its movies to end-users, and since then has continuously improved its online streaming
service. The increased availability of such real-time entertainment options is the main reason
why the share of Internet traffic due to P2P file sharing decreased so much between 2005
and 2009. This also explains why the decline happened somewhat earlier in the US, where
such options became available earlier than in other parts of the world. In the US, real-time
entertainment was responsible for 62% of Internet traffic in 2013, with Netflix alone
responsible for 28.8% of Internet traffic. In contrast, in Europe only 35.7%
of Internet traffic was due to real-time entertainment, mainly because the market for real-time entertainment there is still relatively young. For the next few years, it is expected that the
relative importance of real-time entertainment will continue to increase around the world,
while the share of Internet traffic due to P2P file-sharing will continue to decrease further.
4.6 Notes
One of the first papers that studied P2P file-sharing was the paper “Free Riding on Gnutella”
by Adar and Huberman (2000), which showed convincingly how prevalent free riding was
already in 2000. The follow-up paper “Free Riding on Gnutella Revisited: The Bell Tolls?”
by Hughes et al. (2005) showed that the amount of free riding had increased significantly
from 2000 to 2005. For background reading on BitTorrent, which was first introduced by
Bram Cohen in 2001, the original paper “Incentives Build Robustness in BitTorrent” by
Cohen (2003) gives a short yet good introduction to the protocol design. A good reference
for the tit-for-tat strategy, an important element of Cohen’s original design, is the paper
“The Evolution of Cooperation” by Axelrod and Hamilton (1981). A detailed measurement
study illustrating the performance of BitTorrent as of 2005 can be found in the paper
“The BitTorrent P2P File-Sharing System: Measurements and Analysis” by Pouwelse et
al. (2005). For an updated study, including a comparison of public and private BitTorrent
communities, see the paper “Public and Private BitTorrent Communities: A Measurement Study”
by Meulpolder et al. (2010). For a measurement study analyzing the incentives at play in
private communities see the paper “Economics of BitTorrent Communities” by Kash et al.
(2012). For more details on how decentralized tracking is implemented in BitTorrent via a
distributed hash table (DHT), please see Falkner et al. (2007).
Many papers on BitTorrent have challenged the idea that BitTorrent’s incentives provide
robustness against strategic attacks. Shneidman et al. (2004) as well as Liogkas et al. (2006)
have described multiple simple attacks, including the exploit based on uploading garbage
data. The paper by Locher et al. (2006) introduced the BitThief client, and showed that
it is possible to use BitTorrent effectively without ever uploading. The “piece revelation
strategy” was described and shown to be effective by Levin et al. (2008). The BitTyrant
client was introduced by Piatek et al. (2007), who demonstrated the major download speed
improvements that can be obtained by using a better unchoking strategy. Due to space
constraints, we could not discuss the “Propshare Mechanism” by Levin et al. (2008), who
propose a different unchoking algorithm than BitTyrant, leading to slightly better individual
and system-wide performance.
The advances from the systems research community in exploiting the BitTorrent
protocol and developing better strategies have outpaced the theoretical understanding of
the incentives in BitTorrent. Examples of theoretical work include the paper on
coupon replication systems by Massoulié and Vojnović (2005), and the paper on anonymous
social networks by Immorlica et al. (2010).
It is difficult to give a consistent account of the history of P2P file sharing, because the
exact market shares, traffic ratios, etc., depend on the measurement technology used, the
particular network and user group sampled, and so forth. Furthermore, these numbers
varied significantly for different regions in the world, and nobody measured these numbers
consistently over the years. For this reason, the brief history of P2P file-sharing we provide at
the end of this chapter is based on a number of different sources, including the data provided
in (?) and (?), online news articles of the BBC⁵ and of Ars Technica⁶, news items from the
P2P file-sharing website Slyck⁷, press releases from BitTorrent Inc.⁸, a measurement study
by Liang et al. (2006), the Internet studies conducted by ipoque in 2006⁹, 2007¹⁰, and
2008/2009¹¹, the press releases from 2002 to 2013 by the networking equipment company
Sandvine¹², and the three most recent measurement studies conducted by Sandvine in 2012
and 2013¹³. Note that companies like Sandvine that produce traffic-shaping equipment
have an incentive to overstate the relative importance of P2P traffic. Thus, while the relative
trends depicted in Figures 4.14 and 4.15 are correct, the absolute numbers should be
interpreted with the original sources in mind.

⁵ http://news.bbc.co.uk/2/hi/business/1449127.stm
⁶ http://arstechnica.com/uncategorized/2007/06/the-youtube-effect-http-traffic-now-eclipses-p2p/
⁷ http://www.slyck.com/news.php?story=814
⁸ http://www.bittorrent.com/company/about/ces 2012 150m users
⁹ http://www.ipoque.com/sites/default/files/mediafiles/documents/p2p-survey-2006.pdf
¹⁰ http://www.ipoque.com/sites/default/files/mediafiles/documents/internet-study-2007.pdf
¹¹ http://www.ipoque.com/sites/default/files/mediafiles/documents/internet-study-2008-2009.pdf
4.7 Comprehension Questions and Exercises
4.7.1 Comprehension Questions
c4.1 Explain, in one sentence each, a) what the biggest problem of Gnutella was, and b)
why the theory of repeated games doesn’t apply.
c4.2 Explain, in one sentence, the main difference between Gnutella and BitTorrent with
regard to the resulting incentives.
c4.3 Describe three ways in which the file-sharing game corresponding to the BitTorrent
protocol is not just a repeated prisoner’s dilemma.
c4.4 Explain two ways in which the original design of the BitTorrent network remained
centralized.
c4.5 Is the strategic piece revelation strategy good or bad for social welfare? Explain!
c4.6 Describe three aspects in which the BitTyrant unchoking strategy differs from the
reference client.
c4.7 Does the BitTyrant client increase or decrease social welfare? Explain!
4.7.2 Exercises
4.1 The strategic-piece-revelation strategy in the BitTorrent protocol uses “under-reporting”
of pieces. Consider instead a strategy based on “over-reporting” pieces, i.e., a client
reporting to have pieces that it doesn’t actually have. Provide some intuition for
why such a strategy might make sense. Next, explain why such a strategy would be
unlikely to work in practice, i.e., identify a vulnerability.
4.2 The strategies in file-sharing games are provided by software, with new clients (=
strategies) such as BitThief released over time. Suppose that a client is universally
adopted, and even proved to be a Nash equilibrium with itself. Why might you still
worry this is insufficient to provide stability of the ecosystem?
4.3 Prove that for a suitably chosen discount factor, the Tit-for-Tat strategy described
in Section 4.2.3 indeed constitutes a subgame-perfect Nash equilibrium of the P2P
File-Sharing game.

¹² http://www.sandvine.com/news/press releases.asp
¹³ http://www.sandvine.com/news/global broadband trends.asp