Unstructured P2P Overlay Networks
Peer-to-Peer Networks and Applications
Unstructured P2P Overlay Networks
Dr.-Ing. Kalman Graffi
Faculty for Electrical Engineering and Computer Science, University of Paderborn
This slide set is based on the lecture "Communication Networks 2" of Prof. Dr.-Ing. Ralf Steinmetz at TU Darmstadt

Room Issues
Problem:
Lecture time: Tuesday, 16:00 - 18:00; the lecture is in English
German course: Monday - Thursday, 16:00 - 18:00
Collision: a new time slot for the lecture is needed
Free slots in Fürstenallee:
• Tuesday 9:00 - 11:00: 23 - lecture
• Tuesday 11:00 - 13:00: 16 - exercise
• 1/3 (like now): 12
• 2/4 (change): 6
• Tuesday 18:00 - 20:00: 9
• Friday 11:00 - 13:00: 13

Overview
1 Unstructured Homogeneous P2P Overlays
1.1 Centralized P2P Networks
1.2 Distributed / Homogeneous P2P Overlays
1.3 Homogeneous P2P Overlay: Gnutella - Version 0.4
1.4 Rendezvous-based P2P Overlays
2 Unstructured Heterogeneous P2P Overlays
2.1 Decentralized File Sharing with Distributed Servers
2.2 Decentralized File Sharing with Super Nodes
2.3 Unstructured Hybrid Resource Sharing: Skype

1 Unstructured Homogeneous P2P Overlays
Principles
Centralized P2P networks: Napster
Homogeneous P2P systems: Gnutella protocol 0.4, BubbleStorm

Classification of P2P Overlays (from R. Schollmeier and J. Eberspächer, TU München)
Unstructured P2P:
• Centralized P2P: 1. all features of peer-to-peer included; 2. a central entity is necessary to provide the service; 3. the central entity is some kind of index/group database. Examples: Napster
• Homogeneous P2P: 1. all features of peer-to-peer included; 2. any terminal entity can be removed without loss of functionality; 3. no central entities. Examples: Gnutella 0.4, Freenet
• Heterogeneous P2P: 1. all features of peer-to-peer included; 2. any terminal entity can be removed without loss of functionality; 3. dynamic central entities. Examples: Gnutella 0.6, FastTrack, eDonkey
Structured P2P:
• DHT-based: 1. all features of peer-to-peer included; 2. any terminal entity can be removed without loss of functionality; 3. no central entities; 4. connections in the overlay are "fixed". Examples: Chord, CAN, Kademlia
• Heterogeneous P2P: 1. all features of peer-to-peer included; 2. peers are organized in a hierarchical manner; 3. any terminal entity can be removed without loss of functionality. Examples: AH-Chord, Globase.KOM

Terminology - Overlay Types
Structured:
• Objects are linked to peers by ID; clear responsibility function: peers are responsible for ID ranges
• Lookup function: find by ID
Unstructured:
• Objects stay where they were submitted; no special structure to find files is established
• Search: search by keyword
Homogeneous: all nodes are assumed equal and take the same roles
Heterogeneous: nodes vary in capacity; various roles are assigned to peers according to their capacity
Pure: all nodes are operated by users; only decentral components in the network
Hybrid: servers support the P2P overlay; centralized components are allowed
Flat: all nodes are in only one overlay; all functions are performed in that single overlay
Hierarchical: several overlays exist, each with a dedicated function; some nodes are in only one overlay, some in more

Principles
Principle of unstructured overlay networks:
• The location of a resource is known only to its submitter
• Objects have no special identifier (unstructured)
• Each peer is responsible only for the objects it submitted
• New resources can be introduced at any location
Main task: to search
• Find all peers storing objects that fit some criteria
• Communicate peer-to-peer once these peers have been identified

1.1 Centralized P2P Networks
Central index server, maintaining the index:
• What: object name, file name, criteria (ID3 tags), ...
• Where: (IP address, port)
• Search engine: combines object and location information
• Global view on the network
Normal peer, maintaining the objects:
• Each peer maintains only its own objects
• Decentralized storage (content at the edges)
• File transfer between clients (decentralized)
Issues:
• Unbalanced costs: the central server is a bottleneck
• Security: the server is a single point of failure

Centralized P2P Network - Example
Simple strategy: a central server stores information about object locations (see the sketch below)
• Node A (provider) tells server S that it stores item D ("A stores D")
• Node B (requester) asks server S for the location of D ("Where is D?")
• Server S tells B that node A stores item D
• Node B requests item D from node A (transmission of D directly between the peers)
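The index/lookup pattern of this slide is small enough to sketch in code. Below is a minimal Python sketch with hypothetical names (this is not Napster's actual wire protocol): the server keeps the global object-to-location index, peers register and look up, and the transfer then happens directly between the peers.

```python
# Minimal sketch of a central-index P2P service (hypothetical names,
# not an actual protocol).

class IndexServer:
    def __init__(self):
        self.index = {}  # object name -> set of (ip, port) locations

    def register(self, obj_name, peer_addr):
        # Provider tells the server "I store obj_name" ("A stores D")
        self.index.setdefault(obj_name, set()).add(peer_addr)

    def lookup(self, obj_name):
        # Requester asks "Where is obj_name?"; the server has the global view
        return set(self.index.get(obj_name, set()))

    def unregister(self, peer_addr):
        # When a peer leaves, its entries must be removed -- the server
        # must track all churn and is a single point of failure
        for locations in self.index.values():
            locations.discard(peer_addr)

# Usage mirroring the slide: A stores D, B asks S, B then contacts A directly
server = IndexServer()
server.register("D", ("10.0.0.1", 6699))   # node A
providers = server.lookup("D")             # node B asks server S
assert ("10.0.0.1", 6699) in providers     # B now downloads D from A
```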
Approach I: Central Server
Advantages:
• Search complexity of O(1) - "just ask the server"
• Complex and fuzzy queries are possible
• Simple and fast
Challenges:
• No intrinsic scalability: O(N) network and system load on the server
• Single point of failure or attack
• Non-linearly increasing implementation and maintenance cost, in particular for achieving high availability and scalability
• A central server is not suitable for systems with massive numbers of users
But overall: the best principle for small and simple applications

Centralized P2P Networks (like Napster, ICQ)
Napster - history:
1999-2001: first famous P2P file sharing tool for mp3 files
• free content access, violation of the copyright of artists
2001-2003: after the introduction of filters, users abandoned the service
April 2001: OpenNap
• approx. 45,000 users
• 80 (interconnected) directory servers
• more than 50 TB of data
2004: Napster 2 music store
• no P2P network anymore
• download of music tracks with a subscription model

Napster Phases (client-side sketch below)
1. Directory update: peers provide data to the central server
• stored files - mp3 tags, related metadata, own IP address
2. Search: the requesting peer queries the central server (locally); the central server delivers the related IP addresses
3. Test of quality: the requesting peer
• sends a ping to (and gets a pong from) all potential peers
• selects the best one
4. Download - data transfer: the requesting peer
• connects to the best peer
• retrieves the data
Service delivery (file transfer) is peer-to-peer
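The four phases can be summarized in a client-side Python sketch. All names are illustrative and the network I/O is simulated; phase 1 (the directory update) would upload the peer's own file list to the server beforehand.

```python
# Client-side sketch of the Napster phases (hypothetical, simulated I/O).
import random

index = {"song.mp3": [("10.0.0.1", 6699), ("10.0.0.2", 6699)]}  # server-side index

def ping(peer):
    # Phase 3 quality test: measure the round-trip time to a candidate.
    # Simulated here; a real client sends a ping and waits for the pong.
    return random.uniform(0.01, 0.5)

def download_from(peer, name):
    # Phase 4: direct peer-to-peer transfer (stubbed out)
    return f"<contents of {name} from {peer}>"

def napster_fetch(name):
    candidates = index.get(name, [])      # phase 2: ask the central server
    if not candidates:
        return None
    best = min(candidates, key=ping)      # phase 3: ping all, select the best
    return download_from(best, name)      # phase 4: retrieve the data

print(napster_fetch("song.mp3"))
```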
ICQ
Feature: P2P communication
History:
1996: first instant messaging tool
• wide deployment, participation is free; see e.g. http://www.icq.com/
• ca. 430 million registered accounts
• more than 700 million messages sent and received per day
Functionalities:
• A central server maintains the status of the users
• Communication runs either over a server or P2P (directly)
• The server can be used as a relay station: to store messages until the user gets online, and to circumvent NAT
• Direct communication offers more functions: file transfer, video messages, ...

Unstructured Homogeneous P2P Overlays
(Classification slide as before, from R. Schollmeier and J. Eberspächer, TU München. This section covers the homogeneous class: no central entities, and any terminal entity can be removed without loss of functionality. Examples: Gnutella 0.4, Freenet.)

1.2 Distributed / Homogeneous P2P Overlays
Characteristics:
• All peers are equal (in their roles)
• The search mechanism is provided by the cooperation of all peers
• Local view on the network
• Organic growth: new peers just append to the current network
• No special infrastructure element is needed
Motivation: robustness and self-organization

Tasks to Solve
1. To connect to the network
• No central index server, so joining strategies are needed
• To join, a peer must know at least one peer of the network
• Local view on the network; advertisements are needed
2. To search - search mechanisms in P2P overlays (detailed below):
• Broadcast (flooding)
• Expanding ring
• Random walk
• Rendezvous idea
3. To deliver the service
• Establish a connection to the other node(s)
• Peer-to-peer communication

Properties of Distributed / Homogeneous P2P Networks
Benefits:
• Robustness: every peer is dispensable - switching off a peer has no effect on the network
• Balanced costs: each peer contributes the same
• Self-organization
Drawbacks:
• Slow and expensive search (the network must be flooded)
• Finding all objects fitting the search criteria is not guaranteed: an object may be out of reach for a search query

Search Mechanisms: Flooding
Breadth-first search (BFS), as sketched below:
• Use a system-wide maximum TTL to control the communication overhead
• Send the message to all neighbors except the one that delivered the incoming message
• Store the message identifiers of routed messages (or use non-oblivious messages) to avoid retransmission cycles
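A minimal Python simulation of this mechanism, under the assumption that the overlay is given as an adjacency dict and that message delivery is synchronous: a GUID per query suppresses duplicates, and the TTL bounds the flood radius.

```python
# TTL-limited flooding with duplicate suppression (simulation sketch).
import uuid
from collections import deque

def flood(graph, source, ttl):
    msg_id = uuid.uuid4()            # GUID stored by peers to avoid re-forwarding
    seen = {source: msg_id}          # peers that already routed this message
    messages_sent = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining_ttl = queue.popleft()
        if remaining_ttl == 0:
            continue
        for neighbor in graph[node]:
            messages_sent += 1       # one message per overlay link crossed
            if neighbor in seen:     # duplicate -> drop, do not forward again
                continue
            seen[neighbor] = msg_id
            queue.append((neighbor, remaining_ttl - 1))
    return seen.keys(), messages_sent

# Example: a small overlay; all peers are reached, at a high message cost
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
reached, cost = flood(graph, source=0, ttl=3)
print(sorted(reached), cost)
```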
Flooding Example
[Figure: flooding from a source peer; the hop count increases ring by ring until the destination peer is reached.]
Overhead: large, here 43 messages sent
Length of the path: 5 hops

Flooding Search - Fully Distributed Approach
Central systems are vulnerable and do not scale
Unstructured P2P systems follow the opposite approach:
• No global information is available about the location of an item
• Content is stored only at the respective node providing it
Retrieval of data:
• No routing information for content exists
• Necessity to ask as many systems as possible / as necessary
Approaches:
• Flooding: high traffic load on the network, does not scale
• Highest effort for searching - quick search through large areas, but many messages are needed for unique identification

Search Mechanisms: Expanding Ring
Mechanism: successive floods with increasing TTL (see the sketch below)
• Start with a small TTL
• If there is no success, increase the TTL, etc.
Properties:
• Improved performance if object popularity follows a Zipf law distribution and objects are located accordingly
• Message overhead is still high
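A sketch of the expanding-ring mechanism, assuming the flood() helper from the previous sketch and a hypothetical has_object predicate that tells whether a peer stores a matching object. Note that each new ring re-floods the inner rings, which is where the remaining overhead comes from.

```python
# Expanding ring: flood with TTL 1, 2, 3, ... until the object is found.
# Assumes flood(graph, source, ttl) from the flooding sketch above.

def expanding_ring(graph, source, has_object, max_ttl=7):
    total_cost = 0
    for ttl in range(1, max_ttl + 1):        # start small, grow on failure
        reached, cost = flood(graph, source, ttl)
        total_cost += cost                   # inner rings are re-flooded each time
        hits = [p for p in reached if has_object(p)]
        if hits:
            return hits, total_cost          # cheap for popular (nearby) objects
    return [], total_cost                    # rare objects pay the full price
```

For Zipf-distributed popularity, most queries target popular objects that are replicated nearby, so most searches terminate with a small TTL; this is exactly the case the slide describes.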
Excursion: Zipf Distribution
Describes the probability of an element to occur:
• N is the number of elements (e.g. 1000 elements)
• k is the rank of an element (e.g. element i is the 623rd most frequent one)
• s is the exponent characterizing the distribution
Frequency of the element of rank k:
f(k; s, N) = (1/k^s) / (sum_{n=1}^{N} 1/n^s)
Simple example: N = 10000, s = 0.6, so P(x) ~ 1 / rank(x)^0.6
[Figure: request probability plotted over rank.]

Search Mechanisms: Random Walk
Random walks (sketched below):
• Forward the query to one randomly selected neighbor
• Message overhead is reduced significantly
• Latency is increased
Multiple random walks (k query messages):
• Reduce latency
• Generate more load
Termination mechanisms:
• TTL-based
• Periodically checking with the requester before the next submission

Random Walk Example
[Figure: random walk with n = 2, i.e. each incoming message is sent out twice; hop counts increase along the walks from the source peer to the destination peer.]
Overhead: smaller, here e.g. 30 messages sent until the destination is reached
Length of the path found: e.g. 6 hops
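A sketch of the k-walker variant in Python, with TTL-based termination; the periodic check-back with the requester is only indicated in a comment, since modeling it needs asynchronous walkers.

```python
# Random-walk search: k walkers, each forwarding the query to ONE randomly
# chosen neighbor per step, terminated by a TTL (simulation sketch).
import random

def random_walk_search(graph, source, has_object, k=2, ttl=32, seed=None):
    rng = random.Random(seed)
    messages = 0
    for _ in range(k):                      # k parallel walkers cut latency,
        node = source                       # but generate k times the load
        for _ in range(ttl):
            node = rng.choice(graph[node])  # forward to one random neighbor
            messages += 1
            if has_object(node):
                return node, messages       # hit: report back to the requester
            # A real walker would also periodically check back with the
            # requester and stop early if another walker already succeeded.
    return None, messages

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
hit, cost = random_walk_search(graph, 0, lambda p: p == 3, k=2, ttl=8, seed=1)
print(hit, cost)    # far fewer messages than flooding, but no guarantee of a hit
```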
Search Mechanisms: Rendezvous Point
Storing node (green/light grey on the right side):
• propagates its content to all nodes within a predefined range
Requesting node (blue/dark grey on the left side):
• propagates its query to all neighbors within a predefined range
A query hit is found at the rendezvous point (black)
Source: http://www.dvs.tu-darmstadt.de/research/bubblestorm/

1.3 Homogeneous P2P Overlay: Gnutella - Version 0.4
History:
• first decentralized P2P overlay
• developed by Nullsoft (owned by AOL)
• protocol version 0.4 - homogeneous roles
• protocol version 0.6 - heterogeneous roles / hierarchical structure
Some characteristics (of version 0.4):
• Message broadcast for node discovery and search requests
• Flooding (to all connected nodes) is used to distribute information
• Nodes recognize messages they have already forwarded by their GUID and do not forward them twice
• Hop limit by TTL, originally TTL = 7

Phases of Protocol 0.4
1. Connecting
• PING message: actively discover hosts on the network
• PONG message: answer to a PING message; includes information about one connected Gnutella servent
2. Search
• QUERY message: searching the distributed network
• QUERY HIT message: response to a QUERY message; can contain several matching files of one servent
3. Data transfer
• HTTP is used to transfer files (HTTP GET); not part of protocol v0.4
• PUSH message: to circumvent firewalls

Phase 1: Gnutella - Connecting
To connect to a Gnutella network, a peer must initially know (at least) one member node of the network and connect to it via TCP
• This first member node must be found by other means: using another medium (Web, telephone, ...); nowadays host caches are usually used
The new peer sends a PING; member nodes answer with PONG messages
The servent then connects to a number of nodes it got PONG messages from - thus it becomes part of the Gnutella network

Phase 2: Gnutella - Searching
Flooding: the QUERY message is distributed to all connected nodes
1. A node that receives a QUERY message increases the HOP count field of the message, and IF
• HOP <= TTL (time to live), and
• a QUERY message with this GUID was not received before,
THEN it forwards the message to all nodes except the one it received it from
2. The node also checks whether it can answer this QUERY with a QUERYHIT message, e.g. if available files match the search criteria
3. The QUERYHIT message contains
• the IP address of the sender
• information about one or more files that match the search criteria
The information is passed back along the same way the QUERY took
• no flooding
• no new links are established - this avoids the NAT problem
(A sketch of this forwarding rule follows after the protocol phases.)

Phase 3: Gnutella - Data Transfer
The peer sets up an HTTP connection
• the actual data transfer is not part of the Gnutella protocol
• HTTP GET is used
Special case: the peer with the file is located behind a firewall/NAT gateway
• The downloading peer cannot initiate a TCP/HTTP connection
• It can instead send the PUSH message, asking the other peer to initiate a TCP/HTTP connection to it and then transfer (push) the file via it
• This does not work if both peers are behind firewalls
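The phase-2 forwarding rule translates almost literally into code. A per-servent sketch in Python, with a simplified message format (guid, hops, ttl, criteria) instead of the real Gnutella header:

```python
# Gnutella 0.4 QUERY forwarding rule (sketch; simplified message format).

seen_guids = set()   # per-servent state: GUIDs already routed

def handle_query(msg, from_peer, neighbors, match_local):
    guid, hops, ttl, criteria = msg
    if guid in seen_guids:
        return []                      # already routed: drop (cycle avoidance)
    seen_guids.add(guid)
    out = []
    if match_local(criteria):
        # QUERYHIT travels back along the reverse path of the QUERY,
        # so no new overlay links are established (avoids the NAT problem)
        out.append(("QUERYHIT", guid, from_peer))
    if hops + 1 <= ttl:                # hop limit; originally TTL = 7
        for peer in neighbors:
            if peer != from_peer:      # never send back to the delivering peer
                out.append(("QUERY", (guid, hops + 1, ttl, criteria), peer))
    return out

msg = ("guid-123", 0, 7, "foo")
out = handle_query(msg, from_peer="p1", neighbors=["p1", "p2", "p3"],
                   match_local=lambda c: c == "foo")
print(out)   # one QUERYHIT back to p1, forwarded QUERYs to p2 and p3
```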
Gnutella 0.4: Scalability Issues
A TTL (time to live) of 4 hops for the PING messages leads to a known topology of roughly 8,000 nodes
• the TTL in the original Gnutella client was 7 (not 4)
Gnutella in its original version (v0.4) suffers from a range of scalability issues due to
• the fully decentralized approach
• the flooding of messages
Frequency distribution of Gnutella messages (e.g. Portmann et al.):
• QUERY 54.8%
• PONG 26.9%
• PING 14.8%
• QUERY HIT 2.8%
• PUSH 0.7%
41.7% of the messages are used just for network discovery

Gnutella 0.4: Scalability Issues (cont.)
Low-bandwidth peers easily use up all their bandwidth for passing on the PING and QUERY messages of other users
• this leaves no bandwidth for up- and downloads
• reason for the breakdown of the Gnutella network in August 2000

Gnutella 0.4: How the Scalability Issues Were Tackled
Mechanisms to overcome the performance problems:
• Optimization of connection parameters (number of hops etc.)
• PING/PONG optimization, e.g. C. Rohrs and V. Falco: Limewire: Ping Pong Scheme, April 2001, http://www.limewire.com/index.jsp/pingpong
• Dropping of low-bandwidth connections, to move low-bandwidth users to the edge of the Gnutella network
• Ultra-peers, similar to KaZaA super nodes
• File hashes to identify files, similar to eDonkey
Chosen solution: hierarchy (Gnutella v0.6) - superpeers (like in FastTrack) to cope with the load of low-bandwidth peers

Gnutella 0.4 Problem: Free Riders
Free riders exist in big anonymous communities:
• selfish individuals that opt out of a voluntary contribution to the community's social welfare
• i.e. they do not share any files but download from others
• ... and get away with it
Study results (since e.g. Adar/Huberman 2000):
• 70% of the Gnutella users share no files
• 90% answer no queries
Solutions:
• Incentives for sharing: peers only accept connections from / forward messages of peers that share content; but, e.g., how can this be verified, and what about the quality of the content?
• Micro-payment: see Mojo Nation

1.4 Rendezvous-based P2P Overlays
Rendezvous idea:
• Content is announced in a region (bubble) of the P2P network
• Queries flood just a region (bubble) of the P2P network
• Announcements and queries meet at a rendezvous point
• Replicate both queries and data: O(√n) copies each (with unequal hidden constants)
• Data and queries rendezvous in the network
Source: Terpstra, Leng, Lehn - Short Presentation BubbleStorm

Rendezvous-based P2P Overlay: BubbleStorm
Random topology:
• Peer neighbors are chosen randomly
• Allows efficient sampling of peers at random
Topology measurement:
• To calculate bubble sizes, the network size must be known
• Computes the network size and statistics through gossiping
Bubblecast:
• Replicates queries/data onto peers quickly
• Intelligent flooding to / in the bubble
Bubble maintainer:
• Preserves the correct number of replicas

Random Multigraph Topology
Random graphs support the birthday paradox:
• Exploring an edge leads to a randomly sampled peer
• Creating a random node subset (bubble) is cheap
Node degree is chosen proportional to bandwidth:
• As random walks (and bubblecasts) follow edges with equal probability, utilization is balanced despite heterogeneity

Bubblecast Motivation
Flooding: + low latency, + reliable; - imprecise node count, - unbalanced link load
Random walk: + precise length, + balanced link load; - high latency, - unreliable
Bubblecast: + low latency, + reliable, + precise node count, + balanced link load
• node counter (not hops), fixed branch factor, branching in every step
Source: Terpstra, Leng, Lehn - Short Presentation BubbleStorm

Example Bubblecast Execution
Bubblecast: announcement / query in the bubble
Procedure (sketched in code below):
• A counter specifies the number of replicas to create
• Decrement the counter for matching locally
• Split the remaining counter between two neighbors
• Counters are always integral
• The final routing depth differs by at most one hop
[Figure: a bubblecast starting with counter 17; each node consumes one replica and splits the rest between two neighbors, e.g. 17 -> 8 + 8, 8 -> 4 + 3, ...]
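A recursive Python sketch of the counter-splitting procedure, under the assumptions that every peer keeps at least two neighbors and that the toy topology stands in for BubbleStorm's random multigraph:

```python
# Bubblecast sketch: weight = number of replicas still to create on a branch;
# each node consumes one replica and splits the rest between two neighbors.
import random

def bubblecast(graph, node, weight, store, rng=random):
    if weight <= 0:
        return
    store(node)                    # consume one replica locally ("decrement")
    remaining = weight - 1
    left = remaining // 2          # integral split; the two halves differ
    right = remaining - left       # by at most one -> depth differs by <= 1 hop
    n1, n2 = rng.sample(graph[node], 2)   # assumes >= 2 neighbors per peer
    bubblecast(graph, n1, left, store, rng)
    bubblecast(graph, n2, right, store, rng)

# Toy degree-4 topology, and a bubble of size 17 as in the execution example
graph = {n: [(n + d) % 32 for d in (1, 5, 13, 27)] for n in range(32)}
replicas = []
bubblecast(graph, node=0, weight=17, store=replicas.append)
print(len(replicas))   # exactly 17 store operations; nodes may repeat
```

The counter (not a hop count) is what makes the replica count precise: the total number of store operations always equals the initial weight, regardless of the topology.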
BubbleStorm: Random Replication
• Place data replicas on random nodes
• Nodes evaluate query replicas on all stored data
• Where both data and query go, matches are found
• Collisions result from the birthday paradox
Source: Terpstra, Leng, Lehn - Short Presentation BubbleStorm

BubbleStorm: Exploiting Heterogeneity
• Peers have different capacities
• Faster peers receive more traffic
• This is beneficial: contribution is squared

Bubblecast Properties
• Used for query and data replication
• The fixed branch factor balances the load
• Same stationary distribution as a random walk
• The counter counts edges crossed, not hops
• Precisely controls the replica count
• Logarithmic routing depth, slightly deeper than flooding
• Message loss reduces replication only by log(size)
• Samples random nodes, due to the random topology

Complexity and Correctness
BubbleStorm costs roughly c·√n bandwidth per operation to provide exhaustive search with P(failure) < e^(-c²):
c           1        2        3        4
P(success)  63.21%   98.17%   99.99%   99.99999%
The full equation, which adds correction terms to the exponent, is complicated by:
• heterogeneous peer capacity (H)
• dependent sampling (due to repeated withdrawals)
• unequal query and post traffic (Υ; BubbleStorm optimizes this)
Full details in the paper
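The simplified bound is easy to check numerically. A short Python sketch (the network size n is an arbitrary example value) reproduces the success probabilities of the table from P(failure) < e^(-c²):

```python
# Numeric check of the BubbleStorm bound P(failure) < e^(-c^2)
# and the rough per-operation cost c * sqrt(n).
import math

n = 1_000_000                           # example network size
for c in (1, 2, 3, 4):
    p_success = 1 - math.exp(-c * c)    # complement of the failure bound
    cost = c * math.sqrt(n)             # rough bandwidth per operation
    print(f"c={c}: P(success) >= {p_success:.7%}, cost ~ {cost:,.0f} messages")
# c=1: 63.2120559%, c=2: 98.1684361%, c=3: 99.9876590%, c=4: 99.9999887%
# -> matches the slide's table after rounding
```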
2 Unstructured Heterogeneous P2P Overlays
Principles of heterogeneous overlays: Gnutella 0.6
Decentralized file sharing with distributed servers: eDonkey, eMule
Decentralized file sharing with super nodes: KaZaA, FastTrack
Unstructured heterogeneous resource sharing: Skype (super nodes, regular nodes, Skype login server; message exchange at login)

Unstructured Heterogeneous P2P Overlays
(Classification slide as before, from R. Schollmeier and J. Eberspächer, TU München. This section covers the heterogeneous class: dynamic central entities, and any terminal entity can be removed without loss of functionality. Examples: Gnutella 0.6, FastTrack, eDonkey.)

Principles - Hierarchical / Heterogeneous
Approach: combine the best of both worlds
• robustness by distributed indexing
• fast searches by server queries
Components:
• Supernodes: mini servers / super peers, used as servers for queries; they build a sub-network between the supernodes, and queries are distributed in this sub-network
• "Normal" peers: have overlay connections only to supernodes
++ Advantages: more robust than centralized solutions; faster searches than in pure P2P systems
-- Disadvantages: algorithms are needed to choose reliable supernodes
Picture from R. Schollmeier and J. Eberspächer, TU München

2.1 Decentralized File Sharing with Distributed Servers
For example: eDonkey, see e.g.
• http://www.overnet.org/
• http://www.emule-project.net/
• http://savannah.gnu.org/projects/mldonkey/
The eDonkey file-sharing protocol was the most successful/used file-sharing protocol in e.g. Germany & France in 2003 [see sandvine.org]:
• 52% of the generated P2P file sharing traffic in Germany
• KaZaA accounted for only 44%
Stopped by law: in February 2006 the largest server, "Razorback 2.0", was disconnected by the Belgian police
• http://www.heise.de/newsticker/eDonkey-Betreiber-wirft-endgueltigdas-Handtuch--/meldung/78093

The eDonkey Network - Principle
Distributed servers:
• set up and run by power users - nearly impossible to shut down all servers
• exchange their server lists with other servers, using UDP as transport protocol
• manage the file indices
Client application:
• connects to one random server and stays connected, using a TCP connection
• searches are directed to that server
• clients can also extend their search by sending UDP search messages to additional servers

The eDonkey Network - Procedure
[Figure: nodes connect to servers via TCP for search and download; servers exchange server lists and answer extended searches via UDP.]
Server list exchange:
• New servers send their port + IP to other servers (UDP)
• Servers send server lists (other servers they know) to the clients
• Server lists can also be downloaded on various websites
Files are identified by unique MD4 file hashes (Message-Digest Algorithm 4, RFC 1186; 16 bytes long), not by filenames (see the sketch below)
This helps in
• resuming a download from a different source
• downloading the same file from multiple sources at the same time
• verifying that the file has been correctly downloaded

The eDonkey Network - Search
The search consists of two steps:
1. Full text search
• at the connected server (TCP), or
• extended search with UDP at other known servers
• the search results are the hashes of matching files
2. Query sources
• query the servers for clients offering a file with a certain hash
Later: download from these sources
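The principle of content-based identity is easy to sketch. The real network uses MD4 hashes; in Python, MD4 is only available through hashlib if the underlying OpenSSL still ships it, so the sketch below falls back to SHA-256 purely to illustrate the idea (the function name and fallback are assumptions, not eDonkey's actual hashing scheme, which additionally hashes fixed-size chunks).

```python
# Files are identified by a hash of their content, not by their name.
import hashlib

def file_id(path, algo="md4", chunk_size=1 << 20):
    try:
        h = hashlib.new(algo)      # MD4 may be absent in modern OpenSSL builds
    except ValueError:
        h = hashlib.sha256()       # stand-in: the principle is the same
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):   # stream the file in chunks
            h.update(chunk)
    return h.hexdigest()

# Two peers sharing byte-identical files under different names produce the
# same ID: this enables multi-source download, resuming from a different
# source, and verification of the completed download.
```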
eDonkey: User Behavior
Average file size: 217 MB (an indication that many videos are shared)
• the distribution of file sizes follows a Zipf distribution
57.8 files are shared per user on average
• the number of shared files per user also follows a Zipf distribution
Heckmann et al.: The eDonkey File-Sharing Network. GI Informatik 2004

2.2 Decentralized File Sharing with Super Nodes
see www.kazaa.com, gift.sourceforge.net, http://www.my-k-lite.com/
System developer: FastTrack; clients: KaZaA
Properties:
• most successful P2P network in the USA in 2002/03
• architecture: neither completely central nor completely decentralized
• supernodes reduce the communication overhead
Numbers from October 2002:
P2P system   #users     #files     terabytes   #downloads (from download.com)
FastTrack    2.6 mio.   472 mio.   3550        4 mio.
eDonkey      230,000    13 mio.    650-2600    600,000
Gnutella     120,000    28 mio.    105         ca. 525,000

Decentralized File Sharing with Super Nodes
Peers are connected only to some super nodes:
• they send their IP address and file names only to super peers
Super nodes / super peers:
• peers with high-performance network connections
• take the role of the central server and act as a proxy for the simple peers
• answer search messages for all peers (reduction of communication load)
• one or more supernodes can be removed without problems
Additionally, the communication between nodes is encrypted
Search runs over the superpeer tier; service delivery (download) is peer-to-peer

Decentralized File Sharing with Complete Files
Example (from www.wtata.com):
• The "queen bee" has a 100 MB file and 50 KB/s upload rate in total
• Drone 1 receives 25% of the file at a 12.5 KB/s rate
• Drone 1 has a 50 KB/s upload rate that is not utilized until it has the whole file
At the beginning the queen bee serves everyone; only later can the drones contribute

Issues with KaZaA
Keyword-based search: you do not know what you get
• Pollution is a problem: music companies flooded the network with false files; the chance to get a "good" file is ~10%; a problem for "small" files
Full file download before uploading:
• Users go offline after their download is finished
• Only few uploaders are online
• A problem for "large" files

Google Trends for KaZaA, Limewire, Torrent, Emule
http://www.google.com/trends?q=kazaa,+limewire,+torrent,+emule

2.3 Unstructured Hybrid Resource Sharing: Skype
Offered services:
• IP telephony features
• File exchange
• Instant messaging
Features:
• KaZaA technology
• High media quality
• Encrypted media delivery
• Support for teleconferences
• Multi-platform
Further information:
• Very popular
• Low-cost IP telephony business model
• SkypeOut extension to call regular phone numbers (not free)
• Great business potential if combined with free WiFi
From www.skype.com

Skype vs. Call-by-Call
[Figure: Skype-to-Skype calls run as TCP/IP over the Internet; SkypeOut bridges from the Internet into the PSTN; call-by-call runs entirely over the (rented) PSTN.]

Skype Network Architecture
Formerly KaZaA-based: super nodes, regular nodes, and a central Skype login server; message exchange at login (the two-tier search pattern is sketched below)
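The two-tier pattern shared by FastTrack/KaZaA, Gnutella 0.6 ultra-peers, and Skype's super nodes can be sketched compactly. All class and method names below are hypothetical; the point is the structure: leaves upload their index to one super node, and queries are flooded only between super nodes.

```python
# Two-tier super-node indexing sketch (FastTrack/Gnutella 0.6 style).

class SuperNode:
    def __init__(self):
        self.local_index = {}      # filename -> set of leaf peer addresses
        self.neighbors = []        # other super nodes (the sub-network)

    def register_leaf(self, peer, filenames):
        for name in filenames:     # leaves send IP + file names only here
            self.local_index.setdefault(name, set()).add(peer)

    def search(self, name, ttl=3, seen=None):
        seen = seen if seen is not None else set()
        seen.add(self)
        results = set(self.local_index.get(name, set()))
        if ttl > 0:                # the query floods only the super-node tier
            for sn in self.neighbors:
                if sn not in seen:
                    results |= sn.search(name, ttl - 1, seen)
        return results

# Usage: leaves register with one super node; any super node can be asked
a, b = SuperNode(), SuperNode()
a.neighbors, b.neighbors = [b], [a]
a.register_leaf(("10.0.0.7", 1214), ["song.mp3"])
print(b.search("song.mp3"))       # found via the super-node sub-network
```

The download itself then happens directly between the peers, so the super nodes carry only the (small) search traffic; this is why removing low-bandwidth peers from the search path fixed Gnutella's scalability problem.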
Conclusion of this Section
Principles of unstructured P2P overlays
Centralized unstructured P2P overlays:
• Napster
Homogeneous unstructured P2P overlays:
• Gnutella
• BubbleStorm
Heterogeneous unstructured P2P overlays:
• Hybrid unstructured P2P networks: decentralized file sharing with distributed servers, like eDonkey/eMule/mlDonkey
• Hierarchical unstructured P2P networks: decentralized file sharing with super nodes, like KaZaA/FastTrack
• Skype