4 The Economics of Peer-to-Peer File Sharing
Imagine you need to distribute a software patch to 10 million users. What is an efficient way to do so? As a user of a peer-to-peer file-sharing system, what is the optimal strategy to maximize your download speed? What are the incentives at play in file-sharing systems like Gnutella or BitTorrent?
For many applications that require the distribution of files to a large number of users, peer-to-peer (P2P) file-sharing networks are an attractive alternative to server-based solutions. If the community of file sharers cooperates appropriately, high download rates can be achieved at virtually no cost to the injector of the content. However, different P2P file-sharing protocols can give rise to very different P2P file-sharing games, and whether users of a P2P file-sharing system have an incentive to cooperate largely depends on the design of the system. For this reason, game theory has proven particularly useful both for analyzing existing P2P file-sharing protocols and for designing new ones.
We begin this chapter with a brief introduction to the P2P file-sharing paradigm. In Section 4.2, we discuss the rise and fall of the Gnutella network, and explain why many file-sharing networks suffer from free riding. We then focus on BitTorrent, the most successful file-sharing network, with more than 150 million active users per month. In Section 4.3, we describe the BitTorrent protocol in detail, and explain how BitTorrent changed the file-sharing game and improved incentives for cooperation. In Section 4.4, we explore a number of different attacks on BitTorrent, which we can think of as strategies an individual user can use to increase his own performance. Finally, in Section 4.5, we conclude with a brief discussion of user behavior observed in practice, private BitTorrent communities, altruism in BitTorrent, and a brief history of P2P file sharing.
4.1 Introduction to P2P File Sharing
Some of the most popular internet-based services relate to media content, including downloading MP3s from iTunes, buying e-books from Amazon, and streaming videos over YouTube or Netflix. These services are based on the client-server model, which means that the required tasks are clearly separated: the service provider operates the server machines, and the user’s device is the client machine. Because millions of MP3s and videos are downloaded or streamed every day, these services require thousands of servers around the world, and also cache content on machines close to users in order to reduce latency. All in all, it is costly to deliver lots of content in this client-server paradigm, and only a few companies have the scale to compete.
In the late 1990s, a very different information-sharing paradigm emerged, namely peer-to-peer (P2P) file-sharing networks, also known as P2P file-sharing systems. In a P2P network, the separation between servers and clients is removed: each computer acts as both a server and a client, and is simply called a peer. With the appropriate protocol in place, users can exchange files with each other, with little or no centralized system infrastructure or control.
This decentralization gives P2P systems a number of advantages over server-based systems. Most importantly, P2P systems scale very cheaply. In particular, large files can be distributed to a large number of users at very low cost to the initial uploader of those files. Additionally, P2P systems are very robust because they avoid a single point of failure; achieving a similar degree of robustness with a traditional client-server model (by using thousands of servers) is significantly more costly.
Finally, P2P systems do not require their users to reveal their real names or register with a credit card, and thus they provide a certain degree of anonymity and privacy for their users, in contrast to many media services like iTunes or Netflix.
Of course, these advantages also come at a certain cost. For example, the injector of the content has essentially no control over who will download the files, how long the files will remain available in the P2P network, or at what download speed. Obviously, a P2P network is not suitable for all applications.
P2P file-sharing networks sometimes have a bad reputation because the content exchanged can include copyrighted material, and exchanging such material with other users is illegal in most countries. While it is undeniable that the rise of file-sharing networks was primarily due to the availability of popular copyrighted material in these networks, there are also many legal uses of P2P file-sharing networks.1 For example, free software (e.g., Linux distributions) is made available via P2P file-sharing networks. The gaming company Blizzard Entertainment distributes the game installer package as well as update patches for World of Warcraft via the BitTorrent P2P file-sharing network. Given that such files can easily be 500 MB in size, and considering the millions of World of Warcraft subscribers (many of whom will update simultaneously), it is easy to see why using a P2P file-sharing system is significantly cheaper for Blizzard than a server-based distribution approach. Blizzard has also used BitTorrent to distribute trailers for its games Starcraft II and Diablo III, and Internet TV services such as Zattoo stream video data to their users via P2P networks. When used for content distribution, the originator incurs minimal cost for injecting the content: the cost of an initial upload.
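To make the cost argument concrete, here is a back-of-the-envelope calculation in Python. The numbers are illustrative assumptions that combine the 10 million users from the opening question with the 500 MB file size mentioned above; they are not Blizzard’s actual figures, and the function names are hypothetical.

```python
# Illustrative back-of-the-envelope comparison (assumed numbers, not measurements):
# a 500 MB patch delivered to 10 million users.
PATCH_BYTES = 500 * 10**6   # 500 MB, using decimal megabytes
NUM_USERS = 10 * 10**6      # 10 million users


def server_upload_bytes(patch_bytes: int, num_users: int) -> int:
    """Bytes the provider must upload if every user downloads from its servers."""
    return patch_bytes * num_users


def p2p_injector_lower_bound(patch_bytes: int) -> int:
    """Ideal lower bound on what the injector uploads in a P2P swarm:
    every piece must leave the injector at least once, i.e., one full copy."""
    return patch_bytes


server = server_upload_bytes(PATCH_BYTES, NUM_USERS)
p2p = p2p_injector_lower_bound(PATCH_BYTES)
print(f"server-based: {server / 10**15:.1f} PB uploaded by the provider")  # 5.0 PB
print(f"P2P (ideal):  {p2p / 10**6:.0f} MB uploaded by the injector")       # 500 MB
```

Note that P2P distribution does not make the total traffic disappear: roughly the same 5 PB still has to be uploaded by someone, but it is spread across the peers’ upload capacity instead of being paid for by the content provider.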
From this point on, users can download the content at high speeds, in particular when millions of users are downloading the content at the same time and are also sharing it with each other. Figure 4.1 displays the cumulative distribution function (CDF) of average download speeds in the P2P file-sharing network BitTorrent, based on measurements of more than 500,000 peers in 2009 (note the logarithmic scale on the x-axis). We see that average download speeds are very high. For example, in the public PirateBay community the median download speed was 333kbps, the average download speed was around 1Mbps, and the fastest 10% of peers obtained download speeds of more than 2Mbps. In the private communities (TVTorrent, TorrentLeech, and PolishTracker), the average download speeds ranged between 3.6Mbps and 8.6Mbps (high enough to stream HD movies). Thus, in contrast to server-based solutions, P2P systems can provide lots of users with very high download rates without using expensive infrastructure. We briefly return to the variation in performance between the public and private BitTorrent communities in Section 4.5.2.
Figure 4.1: The CDF of average download speed for five different BitTorrent communities, based on measurements of over 500,000 peers in 2009 (Meulpolder et al., 2010).
1 We discourage the reader from downloading copyrighted material via P2P file-sharing networks. Downloading copyrighted material is illegal in many countries.
Copyright © 2013 D.C. Parkes & S. Seuken. Do not distribute without permission.
4.1.1 P2P File Sharing in the Language of Game Theory
Many different P2P file-sharing networks have emerged, each with properties that vary according to the following four factors:
1. Protocol: A P2P file-sharing network requires a network protocol such as Gnutella or BitTorrent. The protocol refers to the messages and actions that are supported by the system. In the language of game theory, the protocol defines the rules of the game.
2. Reference client: One way to introduce a new protocol is to develop a file-sharing client, which is a software application compatible with the protocol. This reference client implements a default behavior in the file-sharing game. In the language of game theory, the reference client implements a default strategy.
3. Other clients: It is hard to enforce the use of a particular reference client. Thus, alternative clients with different behaviors, but still compatible with the protocol, can be developed. In the language of game theory, each new client implements a different strategy.
4. User behavior: Finally, end users decide which network to join and which client application to use on their machine. Users can also configure the client application, e.g., allowing it to access certain files or not, limiting its bandwidth usage, and so forth. Thus, the ultimate behavior of a user in the file-sharing game is determined both by the design of the client application and by the user’s decisions.
Whereas in the discussion of the client-server architecture we used the term client to refer to a user’s machine, in the context of P2P networks the term client refers to the software application that runs on a peer’s machine.
Note: For many P2P networks, the same term is used interchangeably to refer to the network, the protocol, and a client. For example, we refer to the BitTorrent protocol, the BitTorrent network, and different BitTorrent clients; in addition, the reference client application is also called BitTorrent.
In evaluating a P2P file-sharing network, we need to consider all these factors: the protocol defines the rules of the game; given a particular protocol, users choose a client that implements a particular strategy (for example, the client that maximizes the individual user’s performance); and given the choice of client, a user configures the client according to his preferences. The distribution of client strategies adopted in the network and the configurations that users choose ultimately determine the outcome of the underlying P2P file-sharing game. When comparing different file-sharing systems, it is useful to consider the following three properties:
• Social Welfare: Social welfare can be defined as the average download speed of participating users. Thus, higher social welfare corresponds to faster downloads for the users of the network.
• Incentive Properties: The social welfare of a P2P file-sharing network depends on the willingness of users to share with other users. Networks vary in the degree to which they align incentives with sharing, i.e., make it in a user’s self-interest to allow other peers to download from him.
• Fairness: In order to sustain P2P file-sharing communities, it can be useful to achieve a distribution of upload and download resources that most users perceive as fair; e.g., one possible fairness criterion is that a user’s download speed is roughly proportional to his upload speed.
Other terms used in place of the social welfare of a P2P file-sharing network are the efficiency, or simply the performance, of the network.
4.1.2 Napster: How Everything Began
The rise of P2P file-sharing networks started in 1999 with the release of the Napster P2P file-sharing client, which became widely popular in early 2000 and reached its peak in February 2001 with 26.5 million users world-wide.
Before Napster, users had already shared files via other networks like IRC and USENET, but Napster was unique in providing a user-friendly interface, in adopting a centralized directory that made it easy to search for content, and in specializing in music files. Users who wanted to share MP3s could connect to the Napster server and add descriptions of their files to Napster’s database. A user wanting to download files could then easily query the server for content. The actual exchange of files happened directly between the clients, i.e., the peers, hence the term “peer-to-peer.” Due to legal difficulties, Napster was shut down in July 2001, but a number of other networks had already emerged and quickly took its place. In fact, around the turn of the century, with broadband Internet access becoming available around the world, file sharing became more and more popular. In 2002, P2P file sharing was responsible for approximately 50% of world-wide Internet traffic.
Between 2000 and 2010, many P2P file-sharing protocols were introduced. In the first few years after Napster’s shutdown, the three networks FastTrack, eDonkey, and Gnutella were responsible for the vast majority of P2P traffic world-wide. Users moved quickly from one network to another as better alternatives were introduced. In contrast to Napster, where users could only share music, FastTrack, eDonkey, and Gnutella also allowed users to share movies, which over time became the primary content shared in these networks. Because these later P2P networks were more decentralized than Napster, they were essentially impossible to shut down, despite many different lawsuits. For a more detailed history of P2P file sharing, please see Section 4.5.4.
4.2 Free riding in P2P Networks
The Gnutella network, introduced in 2000, was the first truly decentralized P2P file-sharing network.
To join the network, a peer connects to one of several peers that are known and almost always online, but that generally do not share files. Rather, these peers share a list of IP addresses and ports of other peers. Communication between peers then proceeds via broadcast messages, which a peer sends to all the peers it knows; this includes re-broadcasting messages received from another peer. To find a desired file, a peer uses a query message to describe the desired content. Such a message is re-broadcast from peer to peer until a peer with the desired content is found, or until some maximum number of re-broadcasts has occurred. A peer with the desired file replies with a query response, a message that contains the peer’s IP address and port, a unique client ID, and other information necessary to download the file. These query response messages are propagated backwards along the path that the original query message took, until they reach the original requester, who can then contact the peer that has the file to start the download.
4.2.1 Free Riding on Gnutella
In contrast to Napster, Gnutella has no central server, and no statistics about individual user behavior are maintained, either centrally or in a decentralized way. Thus, for two peers interacting with each other, every interaction looks just like any other one: a simultaneous-move game with anonymous players. This is in contrast to the way users shared files before the existence of P2P file-sharing networks. For example, users on bulletin boards often knew each other from previous interactions, forming a close-knit community. Unfortunately, the design features of Gnutella very quickly led to a problem: the majority of users were free riding, i.e., consuming resources (downloading files) without contributing back to the community (uploading files). In Figure 4.2, we present a simplified version of the P2P File-Sharing Game to illustrate why free riding is beneficial in Gnutella.
In this game, each player can either share files with other users in the network, or free ride, i.e., only download files without ever uploading any. The numbers in the payoff table are just illustrative, but are meant to convey the main features of file sharing in Gnutella: peers obtain positive utility from downloading a file (here 3), whether or not they share has no effect on their download experience, and users incur a small cost for uploading a file (here −1). In particular, when both agents share, each agent receives payoff 3 from downloading and −1 from uploading, and thus a total payoff of 2.
Figure 4.2: The P2P File-Sharing Game: A Prisoner’s Dilemma.
                        Player 2: Share    Player 2: Free Ride
Player 1: Share         2, 2               −1, 3
Player 1: Free Ride     3, −1              0, 0
Figure 4.3: Ordering of Gnutella peers by contribution (Adar and Huberman, 2000).
The cost for uploading may arise for many reasons, including: increased bandwidth payments a user must make, precluding some other use of upload bandwidth such as Voice-over-IP, a disutility from leaving a computer on while files are uploaded (electricity, noise, etc.), or concerns about the legal implications of uploading copyrighted material. As long as 1) a user obtains positive value from receiving a file, 2) his actions do not influence his download experience, and 3) he incurs a small cost for providing a file, the game has the structure of the Prisoner’s Dilemma game (see Chapter 2). Thus, free riding is a dominant strategy, and both players free riding is the only Nash equilibrium of the game. Given this incentive structure, it is surprising that anyone ever shared any files in Gnutella and that the network was relatively successful, at least in the beginning. After all, it was one of the four most popular file-sharing networks in the years after Napster was shut down.
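The dominance argument can be checked mechanically. The following Python sketch encodes the illustrative payoffs of Figure 4.2 and enumerates strictly dominant actions and pure-strategy Nash equilibria by brute force; the helper names are our own, not part of any library.

```python
from itertools import product
from typing import List, Tuple

# Payoffs from Figure 4.2: PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2).
SHARE, FREE_RIDE = "Share", "Free Ride"
ACTIONS = (SHARE, FREE_RIDE)
PAYOFFS = {
    (SHARE, SHARE): (2, 2),
    (SHARE, FREE_RIDE): (-1, 3),
    (FREE_RIDE, SHARE): (3, -1),
    (FREE_RIDE, FREE_RIDE): (0, 0),
}


def payoff(player: int, own: str, other: str) -> int:
    """Payoff to `player` (0 = row, 1 = column) when it plays `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def strictly_dominant(player: int) -> List[str]:
    """Actions that beat every other action for `player`, whatever the opponent does."""
    return [
        a for a in ACTIONS
        if all(payoff(player, a, o) > payoff(player, b, o)
               for b in ACTIONS if b != a for o in ACTIONS)
    ]


def pure_nash_equilibria() -> List[Tuple[str, str]]:
    """Profiles where neither player gains from a unilateral deviation."""
    return [
        (a1, a2) for a1, a2 in product(ACTIONS, ACTIONS)
        if all(payoff(0, a1, a2) >= payoff(0, d, a2) for d in ACTIONS)
        and all(payoff(1, a2, a1) >= payoff(1, d, a1) for d in ACTIONS)
    ]


print(strictly_dominant(0), strictly_dominant(1))  # ['Free Ride'] ['Free Ride']
print(pure_nash_equilibria())                       # [('Free Ride', 'Free Ride')]
```

The check also makes the dilemma visible: mutual sharing gives each player 2, more than the 0 each obtains in the unique equilibrium.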
Some of the sharing activity can be explained by users who left the default settings of their client software in place. Indeed, many client applications were configured to automatically make downloaded files available for upload to other peers. Other Gnutella users may have simply enjoyed sharing files with other people; i.e., the joy from uploading files to others outweighed the costs they incurred.
Still, a study of Gnutella found that in 2000, 66% of peers shared no files with other peers; see Figure 4.3. In fact, the top 1% of sharers provided 37% of the total files shared, and of the peers that shared files, the top 1% provided almost 47% of all query responses, while approximately 63% never provided a query response. Thus, the contributions made by peers on Gnutella were heavily skewed, and the large majority of peers were free riding. Moreover, the amount of free riding in Gnutella increased significantly over the years: a study conducted in 2005 found that 85% of peers shared no files. The authors of that study argue that the developers of the various Gnutella clients did not have an incentive to build mechanisms into their software that would prevent free riding, because users would simply switch to other client applications without such restrictions. Thus, free riding remained a problem, and Gnutella’s performance and market share steadily declined. In 2013, Gnutella was responsible for less than 1% of global P2P traffic.
4.2.2 Kazaa and Participation Statistics
The developers of clients for other P2P file-sharing networks tried to fix the free-riding problem, but initially with little success. For example, Kazaa, a popular client using the FastTrack protocol, kept track of the uploads and downloads performed by a peer, thereby measuring the participation level of the peer in the network.
The client application shared this information with other peers, and was designed so that a peer would give priority to peers with high participation levels. However, because this information was stored locally by peers and self-reported to others, it could easily be spoofed! Very quickly, other programmers developed new client applications, such as Kazaa Lite K++ and K-Lite, that simply set the reported participation level to the maximum, thus circumventing this simple incentive mechanism.
4.2.3 Repeated Games in Gnutella?
So far, we have described the interactions between two file-sharing peers as a one-shot, simultaneous-move game. However, in practice, most file-sharing users download many files, possibly thousands, over their lifetime. Thus, we might think to model P2P file sharing as a repeated game instead. Remember from Chapter 3 that in a repeated game, the same simultaneous-move game is played by the same players over and over again, with every player having perfect information about the history of actions in all previous periods. In a repeated game, the observations of past actions allow for new strategies that do not exist in one-shot games. In a two-player game, player 1 can now condition his actions on the past actions of player 2, and vice versa. In the Prisoner’s Dilemma, for example, player 1 could reward “good behavior” by player 2 and punish “bad behavior,” hoping that this changes the incentives for player 2 in such a way that player 2 will always cooperate. Under certain conditions, it is indeed possible to sustain cooperation in a repeated Prisoner’s Dilemma game, i.e., for both players to play (C,C) in every period, by threatening to punish the other player should he not cooperate. A particularly well-known strategy for playing an infinitely-repeated Prisoner’s Dilemma game is the so-called “Tit for Tat” (TfT) strategy. We have already seen one version of TfT in Chapter 3, but we will now present a slightly different version.
In this version of TfT, each player starts out cooperating. If the other player ever defects, then the player punishes this in the next period by also defecting, and then goes back to cooperating in the following period. It can be shown formally, with the techniques we have studied in Chapter 3, that under certain conditions this TfT strategy sustains cooperation in an infinitely-repeated Prisoner’s Dilemma game. In fact, one can show that this strategy constitutes a subgame-perfect Nash equilibrium (see Chapter 3 for details on these kinds of folk theorem results).
Given these results, we might hope that suitably-designed file-sharing clients could implement strategies that lead to cooperative equilibria in the P2P file-sharing game, providing agents with an incentive to share. However, in many P2P file-sharing systems, the same two peers interact only once or a few times over their lifetime; i.e., the rendezvous probability is extremely low. And even if two peers meet multiple times, it is unlikely that one peer has a file that the other wants at the time of rendezvous, which makes the theory of repeated games inapplicable. This problem was neatly solved by the BitTorrent file-sharing protocol, which introduced a completely new paradigm.
4.3 BitTorrent: Taking Incentives Seriously
One of the key differences between BitTorrent and Gnutella is that in BitTorrent, a peer that is downloading a file is also simultaneously uploading pieces of that same file to other peers. While this may look like an unimportant detail at first, it is actually an integral design feature of the BitTorrent protocol, because it solves the problem of low rendezvous probability.
Because of this change, the peers that are concurrently downloading the same file exchange lots of pieces with each other and are thus playing a “repeated game on the piece level,” which allows for new kinds of cooperative strategies. Indeed, the BitTorrent client implements a reciprocation policy that resembles TfT. The client is designed so that the more upload a peer provides to other peers, the faster it will be able to download pieces of the same file. This provides users with an incentive to upload to others, and discourages free riding. This incentive alignment, designed into BitTorrent, is one of the primary reasons for BitTorrent’s success. In 2013, BitTorrent was by far the most popular file-sharing protocol, responsible for more than 80% of world-wide P2P file-sharing traffic (see Section 4.5.4 for details on the evolution of BitTorrent’s market share over the last 10 years).
4.3.1 The BitTorrent Protocol
To participate in BitTorrent, a user must download a client that is compatible with the BitTorrent protocol. When Bram Cohen introduced BitTorrent in 2001, he released a reference client, which he also called BitTorrent. In some sense, introducing this reference client was the same as introducing the first version of the BitTorrent protocol. In this section, we describe the details of that client as specified by Cohen in a 2003 paper. Of course, the BitTorrent client has been continuously improved since then, but the main design has remained the same; otherwise, newer clients would not be backwards compatible.
The content that users download on the BitTorrent network can be a single file (e.g., a movie), or a large aggregated collection of files (e.g., all MP3s from a music album). For simplicity, we will simply refer to “a file” going forward. To find content, a user generally goes to a website that maintains a searchable directory of torrents, linking to so-called .torrent files. The .torrent files are generally not hosted by the same website that provides the directory service, but are only linked to from that site. There also exist BitTorrent client applications with a completely decentralized content discovery protocol (e.g., Tribler), and many users subscribe to RSS web feeds to automatically download new content when it becomes available. However, centralized websites remain the most common way to find content.
Figure 4.4: Starting a download process in the BitTorrent protocol: 1) A user goes to a searchable directory (e.g., a website) to find a link to a .torrent file corresponding to the desired content; 2) the .torrent file contains metadata about the content, in particular the IP address of a tracker; 3) the tracker provides a list of peers participating in the swarm for the content; 4) the user’s BitTorrent client can now contact all these peers and download content.
Once a user downloads the .torrent file, he opens it with a BitTorrent client application. The .torrent file contains all the relevant metadata, including the name and size of the file that contains the desired content. A file in BitTorrent is divided into pieces (typically 64-512KB each), which are further divided into smaller blocks (typically 16KB) that are requested from peers. The .torrent file also contains a 160-bit SHA-1 hash (a digital fingerprint) of each piece, so that the client can later verify that each piece has been downloaded correctly. The metadata also includes the address of a tracker, which is a computer that acts as a server for the torrent and is responsible for coordinating the peers who are interested in the file.2 All peers that are uploading or downloading the same file form a swarm.
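The role of the SHA-1 fingerprints can be illustrated with Python’s standard hashlib module. This is a minimal sketch, assuming a piece size of 256 KB and 16 KB blocks; the bencoded .torrent format itself is omitted, and the function names are hypothetical.

```python
import hashlib
from typing import List

PIECE_LENGTH = 256 * 1024  # assumed piece size (256 KB), within the typical range
BLOCK_LENGTH = 16 * 1024   # assumed block size (16 KB)


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Split content into fixed-size pieces (the last one may be shorter) and
    return the 20-byte (160-bit) SHA-1 digest of each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check a downloaded piece against the digest stored in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]


def split_into_blocks(piece: bytes, block_length: int = BLOCK_LENGTH) -> List[bytes]:
    """Blocks are the smaller units in which a piece is requested from peers."""
    return [piece[i:i + block_length] for i in range(0, len(piece), block_length)]


data = bytes(1024 * 1024)  # 1 MB of zeros as stand-in content
hashes = piece_hashes(data)
print(len(hashes), len(hashes[0]))                    # 4 20
print(verify_piece(0, data[:PIECE_LENGTH], hashes))   # True
```

A client that receives a piece whose digest does not match simply discards it and requests it again, so a single corrupted or malicious block cannot silently end up in the final file.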
A peer announces itself to the tracker to obtain a list of peers that are part of the swarm, and the tracker returns a random subset (typically 50) of the peers that are currently active in this swarm (i.e., their IP addresses). Peers re-announce themselves to the tracker periodically, usually every 15 to 30 minutes, at which point the tracker returns a new list of peers currently active in the swarm. Finally, a peer also tells the tracker when it is leaving the swarm. See Figure 4.4 for a schematic view of the process necessary to start downloading content via BitTorrent.
2 The term “.torrent file” always refers to the file containing the metadata. However, the term “torrent” can be used to refer both to the .torrent file and to the file (e.g., a movie) a user wants to download.
Today, all standard BitTorrent clients also have an option for decentralized tracking, realized via a distributed hash table (DHT). This method is also often called tracker-less tracking because, instead of using a centralized tracker, the peer obtains the IP addresses of the other peers in the swarm from an ad-hoc network of BitTorrent peers that exchange information about torrents in a decentralized way. Decentralized trackers are more robust and also more difficult to shut down in case of legal disputes. Note that even when DHTs are used, the torrent directory websites containing the .torrent files (which contain the tracker information) are still required. However, these websites now often provide magnet links instead of (or in addition to) links to .torrent files. If a user clicks on a magnet link, he does not download the whole .torrent file; instead, he only obtains a (much smaller) torrent hash, a cryptographic hash value that identifies the desired content.
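The tracker’s announce logic described above can be sketched as a small in-memory data structure. This is a hypothetical toy, not the real tracker protocol: it returns a random subset of at most 50 peers, as in the text, and forgets peers that have not re-announced within an assumed 45-minute timeout (longer than the 15 to 30 minute re-announce interval). The class name, fields, and timeout are all assumptions.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTracker:
    """Hypothetical in-memory tracker for illustration only; real trackers
    answer announce requests over the network and return more metadata."""
    max_peers: int = 50           # size of the random peer subset returned
    timeout_s: float = 45 * 60    # assumed: forget peers that stop re-announcing
    swarms: Dict[str, Dict[str, float]] = field(default_factory=dict)  # hash -> {peer: last_seen}

    def announce(self, info_hash: str, peer_addr: str,
                 now: Optional[float] = None) -> List[str]:
        """Record the announcing peer and return a random subset of other active peers."""
        now = time.time() if now is None else now
        swarm = self.swarms.setdefault(info_hash, {})
        # Drop peers whose last announce is older than the timeout.
        for addr, seen in list(swarm.items()):
            if now - seen > self.timeout_s:
                del swarm[addr]
        swarm[peer_addr] = now
        others = [addr for addr in swarm if addr != peer_addr]
        return random.sample(others, min(self.max_peers, len(others)))

    def leave(self, info_hash: str, peer_addr: str) -> None:
        """Handle a peer telling the tracker that it is leaving the swarm."""
        self.swarms.get(info_hash, {}).pop(peer_addr, None)


tracker = ToyTracker()
for i in range(120):
    tracker.announce("abc", f"peer{i}", now=0.0)
print(len(tracker.announce("abc", "me", now=10.0)))  # 50
```

Returning a random subset rather than the full peer list keeps the response small and makes it likely that different peers end up with different, overlapping neighborhoods.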
Using the torrent hash, a peer can find the corresponding peers in the DHT network and download the .torrent file from them before downloading the content file. The peers that are currently downloading pieces of a file are called leechers, and the peers that already have the complete file (and not just pieces of it) and are still uploading it are called seeders. A user who wants to inject new content into the BitTorrent system can use a BitTorrent client to create a .torrent file corresponding to the file to be injected. The software takes care of splitting the file into pieces, and pieces into blocks, and of computing the SHA-1 fingerprints. The user must only specify a tracker (there are lots of public trackers available) or use the decentralized tracking option. The user can then upload the .torrent file to a directory website, or send the .torrent file directly to interested users. Finally, the user must act as the first seeder of the new content, at least until each piece of the file has been successfully downloaded by at least one other user.
Leechers, i.e., peers who are still downloading the file, need to connect to other peers in the swarm (seeders or other leechers) to download from them. A peer can either initiate a connection or respond to a connection request. When two peers connect, they become neighbors and exchange a bitfield, which is a bit array informing each other about which pieces of the file they have. Each peer maintains open connections to all neighbors, forming its local neighborhood, and only closes a connection when it or the other peer leaves the swarm. Thus, the size of a peer’s neighborhood within a swarm generally keeps growing over time, until the download is complete. Whenever a peer finishes downloading a piece, it sends a have message to all peers in its neighborhood. Each peer maintains an estimate of the availability of each piece of the file by counting how many of its neighbors have the piece.
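The neighborhood bookkeeping described above (a bitfield when a connection opens, a have message when a neighbor completes a piece, and a per-piece availability count) can be sketched in Python as follows. The sketch also includes the rarest-first selection rule that the text turns to next. Class and method names are hypothetical, and breaking ties by piece index is an assumption of this sketch rather than the reference client’s behavior.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class PieceAvailability:
    """Hypothetical sketch of a peer's local view of piece availability."""

    def __init__(self) -> None:
        self.neighbor_pieces: Dict[str, Set[int]] = {}  # neighbor -> pieces it has
        self.counts: Counter = Counter()                # piece -> number of neighbors having it

    def on_bitfield(self, neighbor: str, bitfield: List[bool]) -> None:
        """A newly connected neighbor reports which pieces it has."""
        pieces = {i for i, has in enumerate(bitfield) if has}
        self.neighbor_pieces[neighbor] = pieces
        self.counts.update(pieces)

    def on_have(self, neighbor: str, piece: int) -> None:
        """A neighbor announces that it finished downloading `piece`."""
        pieces = self.neighbor_pieces.setdefault(neighbor, set())
        if piece not in pieces:
            pieces.add(piece)
            self.counts[piece] += 1

    def on_disconnect(self, neighbor: str) -> None:
        """A neighbor left the swarm, so its pieces no longer count."""
        self.counts.subtract(self.neighbor_pieces.pop(neighbor, set()))

    def rarest_first(self, have: Set[int], uploader: str) -> Optional[int]:
        """Among the pieces `uploader` has and we still miss, pick the one with
        the lowest availability (ties broken by piece index in this sketch)."""
        candidates = self.neighbor_pieces.get(uploader, set()) - have
        if not candidates:
            return None
        return min(candidates, key=lambda p: (self.counts[p], p))


av = PieceAvailability()
av.on_bitfield("a", [True, True, False, False])
av.on_bitfield("b", [True, False, True, False])
av.on_have("b", 1)
print(av.rarest_first(have=set(), uploader="b"))  # 2
```

Piece 2 is chosen because only one neighbor has it, while pieces 0 and 1 are each held by two neighbors.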
When a peer i starts uploading to a leecher j, j informs i about which blocks j wishes to receive next. The standard strategy is rarest-first, where a leecher tries to download blocks from those pieces that currently have the lowest availability. Rarest-first is designed to distribute rare pieces quickly, to reduce the likelihood of many peers waiting for the same pieces, and to minimize the risk that some pieces will not be available in the swarm at all.
A peer generally only uploads to a small subset of the peers in its neighborhood. All peers that it does not upload to are called choked, and the few that receive some data are called unchoked, or the active set. The number of unchoke slots S is variable and can be set by the client software. Most BitTorrent clients originally used a fixed number of unchoke slots, often 4, while today many clients set the number of upload slots in proportion to the total available upload bandwidth. A key aspect of the BitTorrent protocol is determining which peers to unchoke, and this aspect is also of strategic importance, as we will see in the following sections.
Figure 4.5: File exchange in the BitTorrent reference client, showing four unchoke slots, three allocated based on the peers that reciprocate the most bandwidth and one used for optimistic unchoking, i.e., allocated to some other, random peer.
4.3.2 The Unchoking Algorithm: Who to upload to?
The reference client always tries to download pieces from every peer in its neighborhood, but is very selective regarding which peers it allows to download from its machine. In the reference client, a peer’s decision regarding which peers to unchoke is primarily based on its recent download rate from other peers.
Given S unchoke slots in total, a peer makes a new decision every 10 seconds, unchoking the S − 1 peers from which it received the highest average download rate during the last 20 seconds. Additionally, every 30 seconds, a peer allocates its optimistic unchoking slot to some other random peer from its neighborhood. Finally, the peer splits its upload bandwidth equally among all S slots, at the so-called equal-split rate. Optimistic unchoking serves two goals. First, it helps a client explore its neighborhood and find peers that reciprocate with high upload speeds. Second, it helps peers that have just joined the swarm obtain their first pieces before they have anything to reciprocate with, which is good from a social welfare perspective. For a schematic view of how a peer's upload slots are allocated in the reference client, see Figure 4.5.
4.3.3 A Tit-for-Tat Analysis of BitTorrent's Unchoking Algorithm
BitTorrent's unchoking strategy is often compared to the famous Tit-for-Tat (TfT) strategy, which can sustain cooperation in a repeated Prisoner's Dilemma game. In this section, we explain to what degree this comparison is justified, and where the similarities end. Peers in a BitTorrent swarm may have thousands of repeated interactions, even during the download of a single file. For this reason, it is reasonable to model BitTorrent as a repeated game. In fact, BitTorrent's game is played simultaneously by hundreds or thousands of peers connected to each other in a swarm through random neighborhoods. However, instead of modeling this detail, we take the perspective that a peer is simultaneously playing many two-player repeated games with many different peers, and focus on a simple repeated Prisoner's Dilemma model. Indeed, while two peers are still each missing pieces that the other peer
has, they could choose to continue their bilateral exchange. In the reference client, the unchoking algorithm makes a new decision every 10 seconds based on its average download rate from each peer over the past 20 seconds. We can model the effect of peer i's unchoking strategy on peer j in the context of a repeated game between the two peers. For this, assuming four unchoke slots (three regular and one optimistic), let $u_{(3)}$ denote the third-highest speed that peer i received from any of its neighbors in the most recent period. Given this, the unchoking strategy of the reference client can be stated as:
• If peer j uploaded to i with a speed $\geq u_{(3)}$: unchoke j in the next period.
• If peer j uploaded to i with a speed $< u_{(3)}$: do not unchoke j in the next period.
• Every three time periods, unchoke a random peer from the neighborhood who is currently choked, and leave that peer unchoked for three time periods.
Remember that peer i might be downloading from more peers than it has unchoke slots, because a peer can also be optimistically unchoked by other peers. Consequently, a large number of peers may be contending for i's unchoke slots. We can now see the similarity between BitTorrent's unchoking strategy and TfT: if peer j uploaded a lot to i in the most recent period (i.e., played “Cooperate” in the Prisoner's Dilemma), then i rewards j in the next period by unchoking and also uploading to j (i.e., i also plays “Cooperate”). If, however, j did not upload a lot to i (i.e., played “Defect”), then i will not upload to j in the next period (i.e., i also plays “Defect”). Furthermore, the optimistic unchoking slot can be interpreted as playing “Cooperate” in the first period of the repeated Prisoner's Dilemma game. Thus, BitTorrent's unchoking strategy creates an incentive for peers to upload.
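A minimal Python sketch of the reference client's unchoking rule, as described in Section 4.3.2 and restated above in Tit-for-Tat form. The names `ReferenceUnchoker` and `average_rate`, the representation of download history as (timestamp, bytes) samples, and the restriction to interested peers are illustrative assumptions. With `slots=4`, the three regular slots go exactly to the peers whose rate is at least $u_{(3)}$ (ties aside).

```python
import random


def average_rate(samples, now, window=20.0):
    """Mean download rate (bytes/s) over the last `window` seconds.

    `samples` is a list of (timestamp, bytes_received) pairs.
    """
    received = sum(b for t, b in samples if now - window < t <= now)
    return received / window


class ReferenceUnchoker:
    """Sketch of the reference client's unchoking rule (names are illustrative)."""

    def __init__(self, slots=4, upload_capacity=100.0, rng=None):
        self.slots = slots                      # S: total unchoke slots
        self.upload_capacity = upload_capacity  # bytes/s
        self.rng = rng if rng is not None else random.Random()
        self.optimistic = None                  # peer in the optimistic slot
        self.optimistic_since = None            # when that peer was chosen

    def decide(self, now, history, interested):
        """Run every 10 s; returns {peer: upload rate} for the unchoked peers."""
        rates = {p: average_rate(history.get(p, []), now) for p in interested}
        # Regular slots: the S - 1 peers that uploaded to us fastest recently.
        regular = sorted(rates, key=rates.get, reverse=True)[: self.slots - 1]
        # Optimistic slot: rotate to a random other peer every 30 s.
        stale = (self.optimistic_since is None
                 or now - self.optimistic_since >= 30.0
                 or self.optimistic not in interested
                 or self.optimistic in regular)
        if stale:
            others = [p for p in interested if p not in regular]
            self.optimistic = self.rng.choice(others) if others else None
            self.optimistic_since = now
        unchoked = regular + ([self.optimistic] if self.optimistic is not None else [])
        # Equal split: each of the S slots gets the same share of capacity.
        share = self.upload_capacity / self.slots
        return {p: share for p in unchoked}
```

Because `decide` is called every 10 seconds while the optimistic peer is only replaced after 30 seconds, the optimistic peer stays unchoked for three decision periods, matching the third rule above.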
If client j does not upload to client i, then client i will not unchoke client j, except via optimistic unchoking, which happens rarely. On the other hand, by uploading to i at high speeds, j gets unchoked by i and also receives high speeds in return. Thus, in contrast to Gnutella, uploading is rewarded in BitTorrent, just as “Cooperating” is rewarded in the repeated Prisoner's Dilemma game under the TfT strategy. But this is also where the similarity between BitTorrent's unchoking strategy and the TfT strategy ends. Unlike in the Prisoner's Dilemma game, BitTorrent peers have more than two actions. Not only can they decide between sharing and not sharing, but they can also vary how much upload bandwidth to make available to each individual peer. Furthermore, whether peer j receives any bandwidth from peer i depends not only on peer j's actions, but also on the actions of other peers in i's neighborhood. In addition, a peer j may reach a point where it no longer has any pieces that i wants, in which case it can no longer upload anything useful to i. It turns out that these differences matter a lot. In the next section, we will see that there are possible “attacks” on BitTorrent through alternative clients. In the language of game theory, these represent “better strategies.” Thus, despite the similarities with TfT, the strategy profile in which everyone uses the BitTorrent reference client is not an equilibrium.
4.4 “Attacks” on BitTorrent
A BitTorrent peer has a large degree of freedom in deciding how to interact with other peers, and there is a large design space for client applications, including the following decisions:
1. How often to contact the tracker to receive a list of peers?
2. Which pieces to reveal to which peers?
3. How many upload slots to use?
4. Which peers to unchoke, how much upload speed to give to each unchoked peer, and how often to make this decision?
5. What data to upload to each unchoked peer?
Different choices result in different performance for a peer participating in a swarm. Of course, users' preferences vary: some prefer to minimize their download time, others prefer to minimize their use of upload bandwidth, and still others seek a good balance between the two. In the following sections, we present a number of different BitTorrent strategies, each addressing one or both of these goals, and often making different trade-offs. Many of these strategies have been called “attacks” on BitTorrent, because they lead to a different play of the BitTorrent game than its inventor intended, and because some of these strategies would be quite harmful to the overall BitTorrent network if everyone adopted them. On the other hand, introducing new “strategic” clients also revealed certain shortcomings of the original protocol, and often led to improvements. Thus, it is largely a matter of perspective whether you call these strategies an “attack” or a “rational deviation” from the reference client's strategy.
4.4.1 Uploading Garbage Data
When a BitTorrent client joins a swarm, it has no blocks to upload to other peers and must rely on optimistic unchoking by others. Thus, getting started in BitTorrent can be slow, in particular when there are many leechers and few seeders. A simple attack that addresses this issue is to upload garbage data: a peer could falsely announce that it has all pieces of the file, and then upload random data when the pieces are requested. However, the SHA-1 fingerprints in a .torrent file allow clients to detect incorrect data as soon as a piece has been downloaded (the fingerprints are computed per piece), and thus to quickly identify a peer that uploads garbage data.
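To illustrate how the fingerprints expose garbage data, here is a minimal Python sketch using the standard `hashlib` module. It assumes, as in the original BitTorrent protocol, one 20-byte SHA-1 digest per piece, and it simplifies by assuming that each piece arrives whole from a single peer, so that a hash mismatch can be blamed on that peer. The class `PieceVerifier` and its immediate-ban policy are hypothetical.

```python
import hashlib


def split_pieces(data, piece_length):
    """Cut the content into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data, piece_length):
    """What the .torrent creator computes: one 20-byte SHA-1 digest per piece."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_length)]


class PieceVerifier:
    """Checks downloaded pieces and stops trusting peers that send bad data."""

    def __init__(self, expected_hashes):
        self.expected = expected_hashes
        self.banned = set()

    def receive(self, index, piece, sender):
        """Return True if the piece is valid; otherwise ban `sender`."""
        if sender in self.banned:
            return False  # ignore data from peers already caught cheating
        if hashlib.sha1(piece).digest() == self.expected[index]:
            return True
        self.banned.add(sender)
        return False
```

When the blocks of one piece come from several peers, a failed check does not by itself reveal which peer cheated; real clients need additional bookkeeping for that case, which this sketch leaves out.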
Most client applications are now designed to stop interacting with a peer once they detect that this peer has uploaded garbage data, so the attack is no longer viable in practice.
4.4.2 BitThief: Exploiting Optimistic Unchoking
The second attack we consider was implemented and tested via a real-world BitTorrent client called BitThief. This client exploits the optimistic unchoking strategy that most BitTorrent clients follow (including the reference client). The goal of BitThief is to download files via BitTorrent without uploading anything in return; thus, BitThief relies exclusively on the optimistic unchoking slots of other peers. This goal may make sense for users who have little upload bandwidth available and need it for other applications like VoIP. Furthermore, for users who are interested in downloading copyrighted material, there may be legal reasons to prefer a client that downloads but does not upload.
Figure 4.6: The BitThief client opens connections to new peers much faster than the reference client (Locher et al., 2006).
BitThief does two things differently from the reference client. First, it initially asks the tracker for a list of 200 peers instead of the standard 50. Second, it re-announces itself to the tracker much more frequently than the reference client does (the standard interval is 15 to 30 minutes). The goal of both techniques is to grow the client's neighborhood as fast as possible. This is beneficial because every additional peer in a client's local neighborhood means an additional chance of being optimistically unchoked in the next period. Consider Figure 4.6, which compares the number of connections opened by BitThief and by the reference client over time.
We see that BitThief opens new connections much faster than the reference client, with a particularly large advantage at the beginning, and thus receives many more optimistic unchokes, which improves its download performance. On the other hand, because BitThief does not upload any content, it only ever receives optimistic unchokes and never reciprocation, which hurts its download performance. For typical torrents with a mix of seeders and leechers, BitThief can generally be used to download the torrent successfully. Given that BitThief does not upload at all, it is not surprising that its completion time is on average between 2 and 4 times longer than with the reference client. However, for some torrents, in particular those with small files and a large number of seeders, BitThief can even be slightly faster than the reference client. This is because BitThief's strategy of quickly opening many connections is particularly powerful in the first few minutes of a download (see Figure 4.6). For small files, the additional optimistic unchokes obtained in the first few minutes may be enough to outweigh the lost reciprocation due to not uploading, such that the total download time is reduced. Thus, BitThief points to an obvious vulnerability of the optimistic unchoking strategy of the reference client. However, the exploit relies on trackers not being very careful about this kind of behavior. In fact, most trackers nowadays prevent multiple requests from the same IP address within a 30-minute window, which makes this attack moot unless the attacker has a way to obtain multiple IP addresses, which most users cannot easily do.
Figure 4.7: Strategic Piece Revelation: Peer i prefers to remain as interesting as possible (Levin et al., 2008).
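Returning to BitThief: why does a larger neighborhood help? Under a deliberately simple model (our assumption, not a measurement from the BitThief study), each neighbor k picks its optimistic unchoke uniformly at random among its $c_k$ currently choked neighbors. The expected number of optimistic unchokes BitThief receives per round is then $\sum_k 1/c_k$, and the probability of receiving at least one is $1 - \prod_k (1 - 1/c_k)$, so both grow with every added neighbor. The helper below simply evaluates these two expressions; the figure of 40 choked candidates per neighbor is made up for illustration.

```python
import math


def optimistic_unchoke_odds(choked_counts):
    """Expected number of optimistic unchokes per round, and the probability of
    at least one, if neighbor k picks uniformly among choked_counts[k] peers."""
    expected = sum(1.0 / c for c in choked_counts)
    p_none = math.prod(1.0 - 1.0 / c for c in choked_counts)
    return expected, 1.0 - p_none


for neighbors in (50, 200):  # standard peer list size vs. BitThief's request
    expected, p_any = optimistic_unchoke_odds([40] * neighbors)
    print(f"{neighbors} neighbors: {expected:.2f} expected, P(at least one) = {p_any:.3f}")
```

Under these assumptions, 50 neighbors yield 1.25 expected optimistic unchokes per round and 200 neighbors yield 5, which is the intuition behind requesting more peers and re-announcing often.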
Furthermore, the attack would primarily be of interest to users who really care about minimizing their upload bandwidth, since for large files using BitThief leads to significantly longer download times.
4.4.3 Strategic Piece Revelation
The third attack we consider seeks to keep other peers interested in exchanging pieces for as long as possible. If peer i has a piece that j does not yet have, we say that j is interested in i. Once j has all of the pieces that i has, it loses interest in i. In particular, j will quickly stop uploading to i (except via the optimistic unchoking slot), because the unchoking strategy will not select i to be unchoked when j no longer receives useful pieces from i. Consequently, a peer would like to maximize the amount of time that other peers find it interesting. Remember that the reference client truthfully reports which pieces it has, and follows the rarest-first policy when downloading. At first sight, it seems optimal for a peer to reveal all pieces it has, in order to maximize interest from others. However, this view is myopic, and it turns out that a peer can benefit by under-reporting its pieces. Consider Figure 4.7, where various scenarios from peer i's perspective are shown, and $\succ_i$ denotes i's preferences (i.e., scenarios (a) through (f) are listed in decreasing order of preference). Peer i prefers having more peers interested in i rather than fewer, and thus $(a) \succ_i (c) \succ_i (e)$, as well as $(b) \succ_i (d) \succ_i (f)$. Next, it is disadvantageous for peer i if other peers trade pieces among each other, because this potentially reduces the other peers' interest in i in the future. Thus, $(a) \succ_i (b)$, $(c) \succ_i (d)$, and $(e) \succ_i (f)$. Lastly, it is always better to have one additional peer interested in peer i, even if that peer is also interested in another peer in the swarm. Thus, $(b) \succ_i (c)$ and $(d) \succ_i (e)$.
Based on the ordering of scenarios presented in Figure 4.7, one can design a piece revelation strategy for peer i that under-reports pieces with the implicit goal of maintaining scenario (a) of Figure 4.7. Let $b_i$ represent i's true bitfield, with $b_i(p) = 1$ if peer i has piece p and 0 otherwise. While the reference client always reports its true bitfield to new neighbors and also sends truthful updates to existing neighbors whenever it completes a new piece (via have-messages), we now consider a strategy where peer i may report a different bitfield to each of its neighbors. Thus, a peer j may receive a report $\hat{b}_i$ from peer i about i's bitfield, which may be untruthful, i.e., it may be that $\hat{b}_i \neq b_i$. Algorithm 4.1 describes the piece revelation algorithm, where peer i is considering which pieces to reveal to peer j.
Algorithm 4.1: Strategic Piece Revelation Algorithm (Levin et al., 2008).
1. Let $b_i$ represent i's true bitfield, and $\hat{b}_j$ denote j's bitfield as j has announced it to i. For each peer k, peer i maintains a list of pieces that i has already revealed to k, denoted $L_i(k)$.
2. If there does not exist any piece p such that $\hat{b}_j(p) = 0$ and $b_i(p) = 1$, then quit; peer i cannot truthfully gain j's interest.
3. Find the piece p with $\hat{b}_j(p) = 0$ and $b_i(p) = 1$ that maximizes the number of other neighbors k which (a) also have piece p, or (b) to whom i has revealed piece p before.
4. Send a have-message to peer j, revealing that i has piece p, and add p to $L_i(j)$.
This piece revelation strategy differs from the default one in two important ways. First, peer i is non-truthful about its bitfield, revealing a new piece to peer j only when it is necessary, namely just at the point in time when j would otherwise lose interest in i.
Second, even when i reveals a new piece to j, it reveals the most common piece it has and that j does not have, rather than the rarest piece. The idea is that providing j with a rare piece could increase other peers’ interest in peer j, which could reduce their interest in peer i. Thus, by following this algorithm, i tries to maintain the scenario from Figure 4.7 (a) as opposed to the other scenarios. Figure 4.8 shows an example run from an experiment designed to compare the standard BitTorrent client with one that uses strategic piece revelation (in two separate runs). In this experiment, the two clients join a swarm of peers with a 20-second delay, to test how many other peers (who already have more pieces than they do) they can keep interested in them, and for how long. As we can see, the strategic peer is successful in attracting more interest from other peers over a long period of time, and ultimately completes its download about 30% earlier than the reference client. Note that it is private knowledge of each peer which pieces it has, and consequently a single other peer cannot detect whether a peer under-reports pieces or not. Thus, in contrast to the attacks that involve uploading garbage data or contacting the tracker more frequently than the default client, the strategic piece revelation strategy cannot easily be defended against. Even though using strategic piece revelation is beneficial from an individual peer’s point of view, a remaining question is its effect on overall social welfare. Unfortunately, experimental studies have shown that if all peers use strategic piece revelation, then the average download time for the whole population increases by 12%. This negative effect on overall performance is due to the fact that all peers withhold information from each other, thereby leading to suboptimal allocations of bandwidth. It is perhaps surprising that the increase in download times is not higher. 
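Algorithm 4.1 can be sketched in Python as follows, with bitfields represented as sets of piece indices. This is an illustrative reading rather than the authors' implementation: the function name `choose_reveal` is ours, and we assume the function may be called at any time, returning `None` while j still lacks some piece already revealed to it (j has not yet lost interest). Ties in step 3 go to the lowest piece index.

```python
def choose_reveal(true_pieces, announced_by_j, revealed, announced, j):
    """Sketch of Algorithm 4.1: which piece (if any) should i reveal to j?

    true_pieces:     set of pieces i really has (b_i)
    announced_by_j:  set of pieces j has announced to i (hat b_j)
    revealed:        dict peer -> set of pieces i has revealed to it (L_i(k))
    announced:       dict peer -> set of pieces that neighbor has announced
    """
    # Assumption: only act when j would otherwise lose interest, i.e. every
    # piece already revealed to j is something j now has as well.
    if revealed.get(j, set()) - announced_by_j:
        return None
    # Step 2: pieces i could truthfully reveal to regain j's interest.
    candidates = true_pieces - announced_by_j
    if not candidates:
        return None
    others = (set(announced) | set(revealed)) - {j}

    def commonness(p):
        # Step 3: count other neighbors that have p or already know i has p.
        return sum(1 for k in others
                   if p in announced.get(k, set()) or p in revealed.get(k, set()))

    best = max(sorted(candidates), key=commonness)  # ties: lowest index
    # Step 4: send have(best) to j and record it in L_i(j).
    revealed.setdefault(j, set()).add(best)
    return best
```

Choosing the most common candidate, rather than the rarest, is what keeps j from becoming attractive to i's other neighbors.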
The increase stays moderate because peers need to reveal more and more pieces to each other over time to keep each other's interest, which limits the negative impact of strategic piece revelation. Yet, from a social welfare perspective, it is unfortunate that the BitTorrent protocol allows clients to benefit in this way. An interesting open research question is whether a protocol could be designed that incentivizes truthful piece revelation.
Figure 4.8: Example run, comparing a standard client with a strategic piece revealer. The strategic client achieves higher interest from others and finishes downloading a file faster (Levin et al., 2008).
4.4.4 BitTyrant: Strategic Unchoking
The fourth and last strategic attack we consider is also the most powerful one in terms of decreasing a peer's download time. The strategy was implemented in a BitTorrent client called BitTyrant in 2006; its efficacy has been experimentally tested, and the client is freely available for download. The main motivation for developing BitTyrant was the observation that the reference client's unchoking strategy may be suboptimal. Remember that most clients use a fixed number (often 4) of unchoke slots, upload primarily to those peers that reciprocate with the fastest download rate, allocate one optimistic unchoking slot randomly, and split their upload bandwidth equally among all of the unchoked peers (the equal-split policy). The question is whether there is a better unchoking strategy than the default one. The first observation is that BitTorrent's default unchoking strategy does not really provide the property of “the more you give, the more you get.” Instead, if peer i does not upload enough to a peer j to be among j's top 3 peers (assuming 4 upload slots), then i will not be unchoked by j except optimistically.
Moreover, once peer i is among j's top 3 peers, there is no benefit to i in further increasing its upload rate to j. Thus, when i decides how much it should upload to j, what really matters is j's reciprocation probability given a particular upload bandwidth u that i provides to j, i.e., the likelihood that peer j will unchoke i given upload bandwidth u. Consider Figure 4.9, which plots the probability that another peer will reciprocate (i.e., unchoke peer i) as a function of 1) peer i's raw upload capacity, and 2) its upload capacity available in a single slot (assuming the equal-split policy).
Figure 4.9: The probability that another peer will reciprocate and unchoke peer i as a function of peer i's raw upload capacity as well as its upload capacity per slot (assuming the equal-split policy) (Piatek et al., 2007).
We see a clear discontinuity in the probability of reciprocation as a function of either measure of upload capacity. For example, looking at a single slot's upload capacity (equal-split), if peer i provides 11 to 12 KB/s, peer j is much more likely to reciprocate than with 10 KB/s, but increasing the upload bandwidth above approximately 14 KB/s provides little additional benefit to i. This suggests that the equal-split policy is suboptimal from an individual peer's perspective. To examine this aspect further, consider Figure 4.10, which plots the expected download performance of a peer against its total upload capacity. Performance increases sublinearly, with high-capacity peers getting less than their “fair share” of download bandwidth in return for upload bandwidth. For a high-capacity peer in particular, it is likely better to split its upload capacity across more than 4 slots. The BitTyrant client exploits these insights by deviating from the reference client in three ways:
• How many upload slots to use? Instead of using a fixed number of upload slots, BitTyrant dynamically adjusts the number of upload slots, choosing the number that maximizes performance.
• Who to unchoke? Instead of unchoking the peers with the fastest download speed, BitTyrant unchokes the peers for which the ratio of received download speed to provided upload speed is best.
• How fast to upload to unchoked peers? Instead of using the equal-split strategy, BitTyrant dynamically adjusts the upload bandwidth provided to each unchoked peer, with the goal of uploading at the minimum rate at which that peer is willing to reciprocate.
Essentially, BitTyrant takes a “return-on-investment” perspective: it “invests” upload capacity to maximize its download rate.
Figure 4.10: Expectation of download performance as a function of upload capacity (Piatek et al., 2007).
To implement this strategy, the BitTyrant client needs to do more bookkeeping than the reference client.
Bookkeeping: estimating $d_j$ and $u_j$. For each neighbor j, peer i maintains an estimate of $d_j$, the current download rate that j provides its unchoked peers, and $u_j$, the upload rate a peer must allocate to peer j to become unchoked at j. If peer i is currently unchoked by j, then $d_j$ is simply the actual download bandwidth i receives from j. Otherwise, i must estimate $d_j$ based on secondary information, and BitTyrant does this in a clever, if roundabout, way. Remember that peers send have-messages to all peers in their neighborhood every time they complete a new piece, even to peers they currently have choked. Based on the frequency with which such have-messages arrive from peer j, peer i can estimate the total download rate received by peer j. Now consider Figure 4.10 again, and notice the roughly linear relationship between a peer's upload capacity and its expected download rate.
For this reason, BitTyrant uses the estimated total download rate as an estimate of a peer's total upload capacity. Assuming that peer j uses an equal-split upload rate, the estimated total upload capacity is divided by the number of upload slots j is expected to use at that capacity, to obtain an estimate of $d_j$, i.e., the download bandwidth that i can expect to receive from j if i is unchoked. Peer i does not know the number of upload slots used by j, and instead estimates it as the number of upload slots that popular file-sharing clients typically use at that upload capacity. Estimating $u_j$, the upload rate required to become unchoked at j, is even harder than estimating $d_j$, because it cannot be inferred from messages that j sends to its neighbors. The rate $u_j$ depends on the upload capacities of other peers in the swarm, on which peers have each other in their respective neighborhoods, on which peers currently have which pieces and are thus interested in each other, and so on. For this reason, BitTyrant simply uses the equal-split upload capacities observed in prior measurements as the initial estimate for $u_j$, and then adjusts these estimates over time. For example, based on the measurements shown in Figure 4.9, BitTyrant would estimate $u_j$ to be somewhere around 14 KB/s.
The BitTyrant Algorithm: Step by Step. Now consider Algorithm 4.2, which describes the complete BitTyrant unchoking strategy. In step 1, the parameters $u_j$ and $d_j$ are initialized, as described above. In step 2, the variable $\mathrm{cap}_i$ is set to the maximum upload capacity the client shall use in this swarm.
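The step-1 estimate of $d_j$ described above can be sketched as follows. Several parts are hypothetical: the 60-second measurement window, the rule of thumb in `typical_slots` (a stand-in for "the number of slots popular clients typically use", which the text does not specify), and the use of bytes per second with 1 KB taken as 1,000 bytes. Only the chain of reasoning follows the text: have-message rate, then total download rate, used as a proxy for upload capacity, divided by the expected number of slots.

```python
INITIAL_U = 14_000.0  # bytes/s; roughly where Figure 4.9 levels off (~14 KB/s)


def total_download_rate(have_times, piece_size, now, window=60.0):
    """Estimate j's total download rate (bytes/s) from the have-messages j sent
    during the last `window` seconds: one have-message = one completed piece."""
    recent = [t for t in have_times if now - window < t <= now]
    return len(recent) * piece_size / window


def typical_slots(capacity):
    """Hypothetical rule of thumb for how many unchoke slots a popular client
    would use at this upload capacity: about one per 10 KB/s, at least 4."""
    return max(4, round(capacity / 10_000))


def estimate_d(have_times, piece_size, now, observed=None):
    """d_j: the rate i can expect from j once unchoked."""
    if observed is not None:
        return observed  # i is unchoked by j: use the measured rate directly
    # Download rate is used as a proxy for upload capacity (Figure 4.10),
    # which j is assumed to split equally among its slots.
    capacity = total_download_rate(have_times, piece_size, now)
    return capacity / typical_slots(capacity)
```

For example, three have-messages for 256,000-byte pieces within the last minute suggest a capacity of 12,800 bytes/s; split across the assumed minimum of 4 slots, this gives $d_j$ = 3,200 bytes/s.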
Going forward, we consider two different versions of the BitTyrant client: #1) the default BitTyrant client, which we also call altruistic or uncapped, sets $\mathrm{cap}_i$ equal to its total available upload capacity (i.e., no artificial cap); and #2) the capped BitTyrant client caps its upload capacity at some point below its total available upload capacity. Using version #2 may make sense if the user can make better use of the additional upload capacity in some other way, e.g., by using it in another BitTorrent swarm or for a non-BitTorrent application (e.g., VoIP). In step 3 of the algorithm, client i ranks the peers j in decreasing order of their ratio $d_j/u_j$, and then keeps adding new upload slots, filling them with the most valuable peers, until the upload budget $\mathrm{cap}_i$ is reached. If $\mathrm{cap}_i > \sum_j u_j$, then all peers in i's neighborhood are unchoked and the excess upload capacity remains unused. Finally, in step 4 of the algorithm, the parameters $u_j$ and $d_j$ are updated based on the observations from the current period. Good values for the parameters α, γ and r used in this last step can be determined experimentally. In their experiments, Piatek et al. (2007) decreased $u_j$ by γ = 10% if a peer reciprocated for r = 3 periods, and increased $u_j$ by α = 20% if a peer failed to reciprocate after being unchoked during the previous period.
Empirical Evaluation of BitTyrant. For the empirical evaluation of BitTyrant described next, the upload budget $\mathrm{cap}_i$ is set manually to fixed levels, such that the two versions of BitTyrant coincide. Figure 4.11 compares the download times of a single BitTyrant client with those of a single unmodified BitTorrent client, when all other peers in the swarm also use an unmodified BitTorrent client. The peer's upload cap is plotted on the x-axis, and the achieved download time on the y-axis. We see that the BitTyrant client consistently outperforms the BitTorrent client.
While the exact size of the performance difference depends on the client's upload capacity, the BitTyrant client is roughly twice as fast as the standard BitTorrent client, except at very low upload caps, where the difference is smaller. Furthermore, while the performance of the BitTorrent client saturates at an upload cap of around 100 KB/s, the performance of BitTyrant continues to improve. The fact that download times for BitTyrant do not decrease much further as upload capacities go beyond 500 KB/s is due to the limited swarm size used in the simulations. A BitTyrant peer i with a large upload bandwidth will be able to unchoke so many peers that at some point it will reach a peer j for which the download bandwidth $d_j$ it gets in return for $u_j$ is very close to 0, or even equal to 0. Of course, allocating bandwidth to peers with $d_j = 0$ does not hurt the client, but it also does not help. This is where the behavior of the two BitTyrant versions differs. The uncapped (altruistic) BitTyrant client keeps allocating bandwidth to new peers until all of its available upload bandwidth is used up. In contrast, the capped BitTyrant client withholds some of its available upload bandwidth if $\mathrm{cap}_i$ is smaller than the total available upload bandwidth, and thus some peers with small values of $d_j$ may not be unchoked. The optimal value for $\mathrm{cap}_i$ could be determined dynamically. For example, we could modify the algorithm such that the client never allocates bandwidth to any peer with $d_j = 0$. Alternatively, the client might stop allocating slots to new peers once the ratio $d_j/u_j$ drops below some small value ε, at which point the user might prefer to use the extra bandwidth for other applications. Thus, the BitTyrant client can in fact discover this “point of diminishing returns,” and then cap the upload bandwidth at exactly that point.
Algorithm 4.2: The BitTyrant Unchoking Algorithm (Piatek et al., 2007).
1. For each peer j, peer i maintains estimates of the expected download rate $d_j$ and the expected upload rate $u_j$ required for reciprocation by peer j:
a) If peer i is currently unchoked by j, then $d_j$ is the observed download bandwidth. Otherwise, $d_j$ is inferred indirectly from j's block announcement rate.
b) Initialize $u_j$ using the distribution of equal-split capacities observed in prior measurements (e.g., Figure 4.9).
2. Set $\mathrm{cap}_i$ to the maximum upload capacity that peer i shall use in this swarm.
3. Each period, order peers by decreasing ratio $d_j/u_j$ and unchoke those of top rank until $\mathrm{cap}_i$ is reached:
$$\underbrace{\frac{d_0}{u_0}, \frac{d_1}{u_1}, \frac{d_2}{u_2}, \frac{d_3}{u_3}, \frac{d_4}{u_4}}_{\max n \ \text{s.t.} \ \sum_{j=0}^{n} u_j \le \mathrm{cap}_i}, \ldots, \frac{d_n}{u_n}$$
4. At the end of each period, for each unchoked peer j:
a) If peer j does not unchoke i: $u_j \leftarrow (1 + \alpha)\, u_j$.
b) If peer j unchokes i: $d_j \leftarrow$ observed rate.
c) If peer j has unchoked i for the last r periods: $u_j \leftarrow (1 - \gamma)\, u_j$.
We now take a look at what happens if all peers in a swarm use BitTyrant, and we analyze the performance difference between the two BitTyrant versions. While we have seen that BitTyrant improves the performance of a single peer when playing against standard BitTorrent peers, we are now interested in BitTyrant's effect on social welfare, i.e., the download performance averaged over all peers in the swarm. For this, consider Figure 4.12, which shows the cumulative distribution function of completion times, comparing the two versions of the BitTyrant client with the standard BitTorrent client. For this experiment, the capped BitTyrant client was modified such that the upload capacity of even the high-capacity peers was limited to 100 KB/s. All three lines in Figure 4.12 are based on experiments where all peers in the swarm use the same client.
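Steps 3 and 4 of Algorithm 4.2 amount to a greedy ranking by $d_j/u_j$ under the budget $\mathrm{cap}_i$, followed by multiplicative updates of $u_j$. The minimal Python sketch below uses the parameter values reported by Piatek et al. (2007) as defaults (α = 0.2, γ = 0.1, r = 3). The `PeerEstimate` data structure and the streak counter used to track "the last r periods" are our own illustrative choices, and all $u_j$ are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class PeerEstimate:
    d: float         # expected download rate from j once j unchokes us
    u: float         # upload rate we believe j requires to reciprocate (> 0)
    streak: int = 0  # consecutive periods in which j has unchoked us


def select_unchokes(peers, cap):
    """Step 3: rank by d/u and unchoke the longest prefix whose u's fit in cap."""
    ranked = sorted(peers, key=lambda j: peers[j].d / peers[j].u, reverse=True)
    chosen, used = [], 0.0
    for j in ranked:
        if used + peers[j].u > cap:
            break
        chosen.append(j)
        used += peers[j].u
    return chosen


def end_of_period(peers, unchoked, reciprocated, observed,
                  alpha=0.2, gamma=0.1, r=3):
    """Step 4: update estimates for every peer we unchoked this period."""
    for j in unchoked:
        est = peers[j]
        if j not in reciprocated:
            est.u *= 1 + alpha      # j ignored us: offer more next time
            est.streak = 0
        else:
            est.d = observed[j]     # we were unchoked, so d_j is measured
            est.streak += 1
            if est.streak >= r:
                est.u *= 1 - gamma  # j keeps reciprocating: try offering less
```

Setting `cap` to the full upload capacity corresponds to the uncapped (altruistic) client; a smaller `cap` gives the capped variant discussed in the text.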
Figure 4.12 shows that the altruistic/uncapped BitTyrant client provides the highest performance, the standard BitTorrent client (which is also not capped) lies in the middle, and the capped BitTyrant client achieves the lowest performance.
Figure 4.11: The BitTyrant client consistently outperforms a standard BitTorrent client in terms of average download time (Piatek et al., 2007).
The altruistic/uncapped BitTyrant client is able to provide the highest overall performance because the distribution of upload capacities in the swarm is skewed. Using BitTyrant, high-capacity peers obtain more file pieces faster, and can then reciprocate with more pieces, thus effectively increasing the swarm's capacity. This is a very positive result: by modifying the unchoking strategy, download performance has increased, not only for the individual but also for the community as a whole. On the other hand, if BitTyrant peers cap their upload bandwidth at the point of diminishing returns, then overall performance decreases. This happens because every BitTyrant peer that limits its upload bandwidth effectively reduces the total amount of upload bandwidth available in the whole swarm, which necessarily means that other peers get lower download speeds. Thus, whether or not BitTyrant provides a social welfare benefit depends on whether users only seek to maximize their individual download rate or also seek to minimize their upload rate as a secondary objective (for example, because they are participating in multiple swarms, or because they want to use their upload bandwidth for other applications). Another drawback of BitTyrant is that new users may experience an extended bootstrapping phase. Unlike the reference client, which uses one upload slot for optimistic unchoking, the BitTyrant client only unchokes peers that send at a fast rate.
A BitTyrant client will only unchoke a newcomer that is not uploading if there is no other peer from whom it would get at least something in return, and only if it is not using an upload cap. Introducing optimistic unchoking into BitTyrant would increase the swarm's performance, but doing so would not be in the interest of self-interested users. To summarize: while BitTyrant certainly improves an individual peer's download performance compared to the reference client, it lowers the overall swarm's performance if everyone adopts the capped BitTyrant client.

Figure 4.12: CDFs of completion times, comparing the uncapped (altruistic) BitTyrant, the original BitTorrent, and the capped BitTyrant (Piatek et al., 2007).

4.5 Discussion

4.5.1 User Behavior in Practice

Despite all of the possible ways in which BitTorrent can be attacked, the BitTorrent network has been incredibly robust and successful. A number of BitTorrent clients that can improve a user's download performance are available for download (e.g., BitTyrant), but almost nobody uses them. Rather, the most popular clients implement strategies that are similar to the reference client. One explanation for this user behavior is that the overall user experience with these clients may be better, e.g., with graphically appealing user interfaces, integrated search for content, and so forth. Given this, a rational user might prefer a client with a slower download speed (or one that requires more upload bandwidth) if the client provides other useful functionality that is missing from more "strategic" clients. In fact, the BitTorrent clients that are popular today compete for new users by advertising low processor and memory usage rather than download speed.
This also indicates that users choose their favorite clients based on a number of different factors, and that download speed is no longer the differentiating factor.

4.5.2 Private BitTorrent Communities

An interesting development that has changed the nature of the BitTorrent network in recent years is the rise of private BitTorrent communities. While a public BitTorrent community uses a public tracker or a DHT to coordinate the peers in a swarm, a private community uses a private tracker that can only be used with special credentials, i.e., users must register an account. In many private BitTorrent communities, a user account can only be obtained by invitation from current members, and these invitations can be difficult to get. Thus, it is not surprising that these private communities are much smaller than their public counterparts. A study from 2009 reported that the largest public BitTorrent community, The Pirate Bay, had 4 million members, while TorrentLeech, one of the largest private communities, had only 178,000 members. However, the members of private BitTorrent communities are usually power users with high-speed internet connections, and thus the member numbers do not necessarily reflect the number of files or the total bandwidth available in these communities. Private trackers generally collect statistics about each user's upload and download behavior and enforce sharing ratios, i.e., a minimum ratio of upload to download traffic per user. For example, a typical policy may require a sharing ratio of at least 0.25 and ban any user who violates it for an extended period of time.
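A tracker-side sharing-ratio check of the kind described above might look like the following Python sketch. The 0.25 minimum ratio is the example policy from the text; the 14-day grace period, the rule that a user who has downloaded nothing counts as compliant, and all names (`UserStats`, `sharing_ratio`, `should_ban`) are hypothetical choices for illustration. The tracker in this sketch only sees the byte counts that clients report about themselves.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RATIO = 0.25        # example policy from the text
GRACE_PERIOD_DAYS = 14  # hypothetical length of "an extended period of time"


@dataclass
class UserStats:
    uploaded: int    # total bytes uploaded, as self-reported by the client
    downloaded: int  # total bytes downloaded, as self-reported by the client
    below_since: Optional[int] = None  # day the ratio first fell below MIN_RATIO


def sharing_ratio(stats: UserStats) -> float:
    """Upload-to-download ratio; a user who has downloaded nothing is compliant."""
    if stats.downloaded == 0:
        return float("inf")
    return stats.uploaded / stats.downloaded


def should_ban(stats: UserStats, today: int) -> bool:
    """Ban only if the ratio has stayed below MIN_RATIO for the whole grace period."""
    if sharing_ratio(stats) >= MIN_RATIO:
        stats.below_since = None  # back in good standing: reset the timer
        return False
    if stats.below_since is None:
        stats.below_since = today  # violation starts now
    return today - stats.below_since >= GRACE_PERIOD_DAYS


GiB = 2**30
alice = UserStats(uploaded=3 * GiB, downloaded=10 * GiB)  # ratio 0.3
bob = UserStats(uploaded=1 * GiB, downloaded=10 * GiB)    # ratio 0.1
print(should_ban(alice, today=0))  # False
print(should_ban(bob, today=0))    # False: grace period starts
print(should_ban(bob, today=14))   # True
```

Because both byte counts come from the client's own reports, a client that over-reports `uploaded` would pass this check unnoticed.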
The empirical evidence we have shown in Figure 4.1 suggests that sharing-ratio enforcement is working reasonably well: the typical download speeds that can be obtained in private BitTorrent communities are about 3 to 5 times higher than in public communities. Note that the use of higher-level incentive mechanisms like sharing-ratio enforcement also reduces the importance of the incentives on the lower levels, e.g., with regard to the unchoking algorithm. While some recent measurement studies have shown how well private BitTorrent communities are working, understanding the dynamics present in these communities is still an active area of research. Some questions regarding the design of optimal incentive mechanisms for these communities include:
1. Are sharing ratios a good mechanism, and if so, which exact ratio should be enforced?
2. Should a credit-based system be used in place of sharing-ratio enforcement, with each type of action corresponding to a different amount of credit?
3. Should there be a minimum seeding amount for each user?

There are also obvious vulnerabilities in the way in which existing private tracker communities operate. They all rely on BitTorrent clients truthfully reporting the total number of bytes downloaded and uploaded. To achieve this, they usually only accept clients from a small list of "trusted" clients. However, the client's user-id, which is used to identify the particular BitTorrent client, is also self-reported by the client to the tracker and could be spoofed. Thus, it would be relatively easy to develop a client that pretends to be one of the trusted ones, but then misreports the total number of bytes downloaded and uploaded to gain an advantage. To date, we are not aware of any private tracker that uses a more sophisticated security mechanism that would prevent such an attack.
Given this, it is somewhat surprising that private BitTorrent communities are flourishing and not suffering from instability due to cheating. One possible reason is that users fear being banned from the community if the tracker catches them cheating, and don't want to give up the 3-5x download speed improvement over public communities. A second reason is that some participants in private communities seem to exhibit altruism.

4.5.3 Altruism in BitTorrent

So far, we have been able to reconcile most of the observed behavior in BitTorrent with a selfish-rational user model. However, one behavior that seems to contradict this standard user model is that some users (in particular of private BitTorrent communities) deliberately select very "altruistic" settings for their clients. It seems that many BitTorrent users have adopted a certain sharing norm: it is not uncommon for users to configure their clients to upload at least as much (or even twice as much) as they have downloaded before going offline. Thus, there remain interesting questions regarding what motivates users to share in P2P file-sharing communities, even when they don't benefit directly. These questions lie at the intersection of behavioral economics and computer systems design and give rise to a number of challenging research problems. We will come back to some of these considerations involving behavioral economics models in Chapter 27. There, we also look at systems like Linux and Wikipedia, where some users display similarly altruistic behavior.

Figure 4.13: Traffic proportions due to different P2P protocols in the US between 2002 and 2004 (?).

4.5.4 A Short History of P2P File Sharing

P2P file sharing was initially marked by rapid changes, with users migrating quickly from one network to the next.
While BitTorrent has now emerged as the clear winner of the P2P file-sharing protocol battle, this was far from obvious when BitTorrent was first introduced in 2001. Recall that Napster had been developed only two years earlier, in 1999. In 2000, the eDonkey and Gnutella protocols had been introduced, and in 2001, FastTrack joined the ranks of popular protocols. These three were responsible for the majority of P2P file-sharing traffic in the first two years after Napster was shut down in 2001 (see Figure 4.13). After Napster was gone, the FastTrack network became the dominant player worldwide, reaching its peak in 2003 with more than 3 million users simultaneously connected to the network, sharing over 5,000 terabytes of content. For many reasons, among them a proprietary protocol and client software that came bundled with malware, FastTrack started losing market share to other networks towards the end of 2003. In parallel with FastTrack's decline, the popularity of BitTorrent started to increase. In 2005, BitTorrent had between 2 and 3 million simultaneous users worldwide, while eDonkey had between 3 and 4 million, FastTrack had around 2 million, and Gnutella had around 1.8 million.

Figure 4.14: P2P file-sharing traffic due to different P2P protocols (BitTorrent, eDonkey, FastTrack, and other protocols) in Europe/Germany between 2003 and 2013.³

However, user adoption of the various P2P networks differed significantly around the world. In the US, BitTorrent had quickly emerged as the most popular protocol. In 2005, it was already responsible for 48% of the P2P traffic in the US, and its market share continued to grow over the next 8 years, reaching 86% in 2012 and 88% in 2013.
In the rest of the world, however, developments were different, and BitTorrent was adopted more slowly than in the US. In Europe, for example, the eDonkey protocol was much more popular than in the US. It was the most popular protocol in 2003, responsible for over 50% of the P2P traffic, and remained the most popular protocol until BitTorrent finally overtook it in 2007. This trend continued, and in 2013, BitTorrent was responsible for 81% of the P2P file-sharing traffic in Europe, while eDonkey was still making its mark with roughly 12% of the P2P traffic (see Figure 4.14). While the absolute amount of Internet traffic due to P2P file sharing increased year after year from 1999 to 2013, its relative share has decreased significantly since reaching its peak somewhere between 2004 and 2006 (see Figure 4.15). While the increasingly strict enforcement of copyright laws with multi-million dollar lawsuits may have scared off some file-sharing users, the most important reason for this relative decline in recent years is the increased attractiveness of alternative means to obtain music or videos on the Internet. In the early 2000s, P2P file sharing may have been the only (practical) way to get access to music or videos (legally or illegally) directly via the Internet. But today, users have many attractive options available to them. For example, they can now buy individual songs or whole albums directly from Apple's iTunes or comparable services. Additionally, music streaming services like Pandora, Last.fm, Spotify, Soundcloud, etc., have become very popular in recent years and are often available for free (supported by advertising). For music videos and other short video clips, YouTube has become the go-to place since it was founded in 2005. In 2007, the DVD-by-mail service Netflix started streaming some of its movies to end users, and since then has continuously improved its online streaming service. The increased availability of such real-time entertainment options is the main reason why the share of Internet traffic due to P2P file sharing decreased so much between 2005 and 2009. This also explains why the decline happened somewhat earlier in the US, where such options became available earlier than in other parts of the world. In the US, real-time entertainment was responsible for 62% of Internet traffic in 2013, with Netflix as the major player, responsible for 28.8% of Internet traffic. In contrast, in Europe only 35.7% of Internet traffic was due to real-time entertainment, mainly because the market for real-time entertainment is still relatively young.

Figure 4.15: Share of the total Internet traffic due to P2P file sharing in Europe, the US, and worldwide, from 1999 to 2013.⁴

³ Because of incomplete data, for the years 2003, 2005, 2007, and 2008, the graph is based on data for Germany only. For the other years, the graph is based on data for all of Europe. However, the numbers for Germany can be seen as a relatively good approximation of the European averages.

⁴ The numbers for worldwide traffic shares from 1999 to 2004 are based on ?, page 3 (our sources did not provide separate traffic numbers for US vs. Europe for 1999-2003). The numbers for 2005 are estimated average traffic shares based on data for upload and download bandwidth. The European numbers for 2007 and 2008 are based on data from Germany only. However, the numbers for Germany can be seen as a relatively good approximation of the European averages.
For the next few years, the relative importance of real-time entertainment is expected to keep increasing around the world, while the share of Internet traffic due to P2P file sharing is expected to decrease further.

4.6 Notes

One of the first papers that studied P2P file sharing was "Free Riding on Gnutella" by Adar and Huberman (2000), which showed convincingly how prevalent free riding already was in 2000. The follow-up paper "Free Riding on Gnutella Revisited: The Bell Tolls?" by Hughes et al. (2005) showed that the amount of free riding had increased significantly from 2000 to 2005. For background reading on BitTorrent, which was first introduced by Bram Cohen in 2001, the original paper "Incentives Build Robustness in BitTorrent" by Cohen (2003) gives a short yet good introduction to the protocol design. A good reference for the tit-for-tat strategy, an important element of Cohen's original design, is the paper "The Evolution of Cooperation" by Axelrod and Hamilton (1981). A detailed measurement study illustrating the performance of BitTorrent as of 2005 can be found in the paper "The BitTorrent P2P File-Sharing System: Measurements and Analysis" by Pouwelse et al. (2005). For an updated study, including a comparison of public and private BitTorrent communities, see the paper "Public and Private BitTorrent Communities: A Measurement Study" by Meulpolder et al. (2010). For a measurement study analyzing the incentives at play in private communities, see the paper "Economics of BitTorrent Communities" by Kash et al. (2012). For more details on how decentralized tracking is implemented in BitTorrent via a distributed hash table (DHT), see Falkner et al. (2007). Many papers on BitTorrent have challenged the idea that BitTorrent's incentives provide robustness against strategic attacks. Shneidman et al.
(2004) as well as Liogkas et al. (2006) have described multiple simple attacks, including the exploit based on uploading garbage data. The paper by Locher et al. (2006) introduced the BitThief client and showed that it is possible to use BitTorrent effectively without ever uploading. The "piece revelation strategy" was described and shown to be effective by Levin et al. (2008). The BitTyrant client was introduced by Piatek et al. (2007), who demonstrated the major download speed improvements that can be obtained by using a better unchoking strategy. Due to space constraints, we could not discuss the "Propshare Mechanism" by Levin et al. (2008), who propose a different unchoking algorithm than BitTyrant, leading to slightly better individual and system-wide performance. The advances from the systems research community regarding exploiting the BitTorrent protocol and developing better strategies have outpaced the theoretical understanding of the incentives in BitTorrent. Examples of theoretical work include the paper on coupon replication systems by Massoulié and Vojnović (2005) and the paper on anonymous social networks by Immorlica et al. (2010). It is difficult to give a consistent account of the history of P2P file sharing, because the exact market shares, traffic ratios, etc., depend on the measurement technology used, the particular network and user group sampled, and so forth. Furthermore, these numbers varied significantly across regions of the world, and nobody measured them consistently over the years. For this reason, the brief history of P2P file sharing we provide at the end of this chapter is based on a number of different sources, including the data provided in (?) and (?), online news articles from the BBC⁵ and Ars Technica⁶, news items from the P2P file-sharing website Slyck⁷, press releases from BitTorrent Inc.⁸, a measurement study by Liang et al.
(2006), the Internet studies conducted by ipoque in 2006⁹, 2007¹⁰, and 2008/2009¹¹, the press releases from 2002 to 2013 by the networking equipment company Sandvine¹², and the three most recent measurement studies conducted by Sandvine in 2012 and 2013¹³. Note that companies like Sandvine that produce traffic-shaping equipment have an incentive to overstate the relative importance of P2P traffic. Thus, while the relative trends depicted in Figures 4.14 and 4.15 are correct, the absolute numbers should be interpreted with the original sources in mind.

⁵ http://news.bbc.co.uk/2/hi/business/1449127.stm
⁶ http://arstechnica.com/uncategorized/2007/06/the-youtube-effect-http-traffic-now-eclipses-p2p/
⁷ http://www.slyck.com/news.php?story=814
⁸ http://www.bittorrent.com/company/about/ces_2012_150m_users
⁹ http://www.ipoque.com/sites/default/files/mediafiles/documents/p2p-survey-2006.pdf
¹⁰ http://www.ipoque.com/sites/default/files/mediafiles/documents/internet-study-2007.pdf
¹¹ http://www.ipoque.com/sites/default/files/mediafiles/documents/internet-study-2008-2009.pdf

4.7 Comprehension Questions and Exercises

4.7.1 Comprehension Questions

c4.1 Explain in one sentence each a) what the biggest problem of Gnutella was, and b) why the theory of repeated games doesn't apply.
c4.2 Explain, in one sentence, the main difference between Gnutella and BitTorrent with regard to the resulting incentives.
c4.3 Describe three ways in which the file-sharing game corresponding to the BitTorrent protocol is not just a repeated prisoner's dilemma.
c4.4 Explain two ways in which the original design of the BitTorrent network remained centralized.
c4.5 Is the strategic piece revelation strategy good or bad for social welfare? Explain!
c4.6 Describe three aspects in which the BitTyrant unchoking strategy differs from the reference client.
c4.7 Does the BitTyrant client increase or decrease social welfare? Explain!

4.7.2 Exercises

4.1 The strategic-piece-revelation strategy in the BitTorrent protocol uses "under-reporting" of pieces. Consider instead a strategy based on "over-reporting" pieces, i.e., a client claiming to have pieces that it doesn't actually have. Provide some intuition for why such a strategy might make sense. Next, explain why such a strategy would be unlikely to work in practice, i.e., identify a vulnerability.
4.2 The strategies in file-sharing games are provided by software, with new clients (= strategies) such as BitThief released over time. Suppose that a client is universally adopted, and is even proved to be a Nash equilibrium with itself. Why might you still worry that this is insufficient to ensure the stability of the ecosystem?
4.3 Prove that for a suitably chosen discount factor, the Tit-for-Tat strategy described in Section 4.2.3 indeed constitutes a subgame-perfect Nash equilibrium of the P2P File-Sharing game.

¹² http://www.sandvine.com/news/press_releases.asp
¹³ http://www.sandvine.com/news/global_broadband_trends.asp