Unstructured P2P Overlay Networks

Transcription

Peer-to-Peer Networks and Applications
Unstructured P2P Overlay Networks
Dr.-Ing. Kalman Graffi
Institute of Computer Science, Research Group "Theory of Distributed Systems"
Faculty for Electrical Engineering, Computer Science and Mathematics, University of Paderborn

This slide set is based on the lecture "Communication Networks 2" of Prof. Dr.-Ing. Ralf Steinmetz at TU Darmstadt.
Room issues

Problem:
• Lecture time: Tuesday, 16:00-18:00
• Lecture is in English
• German course: Monday-Thursday, 16:00-18:00
• Collision: a new time slot for the lecture is needed

Free slots in Fürstenallee (number of votes):
• Tuesday 9:00-11:00: 23 - lecture
• Tuesday 11:00-13:00: 16 - exercise
  - 1/3 (like now): 12
  - 2/4 (change): 6
• Tuesday 18:00-20:00: 9
• Friday 11:00-13:00: 13
Overview
1 Unstructured Homogeneous P2P Overlays
1.1 Centralized P2P Networks
1.2 Distributed / Homogeneous P2P Overlays
1.3 Homogeneous P2P Overlay: Gnutella - Version 0.4
1.4 Rendezvous-based P2P Overlays
2 Unstructured Heterogeneous P2P Overlays
2.1 Decentralized File Sharing with Distributed Servers
2.2 Decentralized File Sharing with Super Nodes
2.3 Unstructured Hybrid Resource Sharing: Skype
1 Unstructured Homogeneous P2P Overlays

Principles
Centralized P2P Networks
• Napster
Homogeneous P2P Systems
• Gnutella Protocol 0.4
• BubbleStorm
Unstructured Centralized P2P Overlays

Classification of P2P overlays (from R. Schollmeier and J. Eberspächer, TU München):

Unstructured P2P
• Centralized P2P
  1. All features of Peer-to-Peer included
  2. Central entity is necessary to provide the service
  3. Central entity is some kind of index/group database
  Examples: Napster
• Homogeneous P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  Examples: Gnutella 0.4, Freenet
• Heterogeneous P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. Dynamic central entities
  Examples: Gnutella 0.6, Fasttrack, eDonkey

Structured P2P
• DHT-based
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  4. Connections in the overlay are "fixed"
  Examples: Chord, CAN, Kademlia
• Heterogeneous P2P
  1. All features of Peer-to-Peer included
  2. Peers are organized in a hierarchical manner
  3. Any terminal entity can be removed without loss of functionality
  Examples: AH-Chord, Globase.KOM
Terminology - Overlay Types

Structured
• Objects are linked to peers by ID
• Clear responsibility function: peers are responsible for ID ranges
• Lookup function: find by ID
Unstructured
• Objects stay where submitted
• No special structure to find files is established
• Search: search by keyword

Homogeneous
• All nodes are assumed equal
• All nodes take the same roles
Heterogeneous
• Nodes vary in capacity
• Various roles are assigned to peers according to their capacity

Pure
• All nodes are operated by users
• Only decentral components in the network
Hybrid
• Servers support the P2P overlays
• Centralized components are allowed

Flat
• All nodes are in only one overlay
• All functions are performed in that single overlay
Hierarchical
• Several overlays exist, each with a dedicated function
• Some nodes are in only one overlay, some in more
Principles
Principle of unstructured overlay networks
The location of a resource is only known to its submitter
Objects have no special identifier (unstructured)
Each peer is responsible only for the objects it submitted
Introduction of a new resource
• at any location
Main task: search
• To find all peers storing objects that fit some criteria
• To communicate peer-to-peer once these peers are identified
1.1 Centralized P2P Networks

Central index server, maintaining the index:
What:
• object name, file name, criteria (ID3 tags), ...
Where:
• (IP address, port)
Search engine:
• combining object and location information
Global view on the network

Normal peer, maintaining the objects:
• Each peer maintains only its own objects
• Decentralized storage (content at the edges)
• File transfer between clients (decentralized)

Issues:
• Unbalanced costs: the central server is a bottleneck
• Security: the server is a single point of failure
Centralized P2P Network

Simple strategy: the central server stores information about object locations.
1. Node A (provider) tells server S that it stores item D ("A stores D")
2. Node B (requester) asks server S for the location of D ("Where is D?")
3. Server S tells B that node A stores item D ("A stores D")
4. Node B requests item D from node A (transmission of D)
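
As a toy illustration of these four steps, here is a minimal sketch in Python, assuming an in-memory index and direct object access between peers (all names are illustrative, not part of any real protocol):

    # Minimal sketch of the central-index strategy.
    class IndexServer:
        def __init__(self):
            self.index = {}                      # item name -> set of providers

        def announce(self, item, peer):          # step 1: "A stores D"
            self.index.setdefault(item, set()).add(peer)

        def lookup(self, item):                  # steps 2+3: "Where is D?"
            return self.index.get(item, set())

    class Peer:
        def __init__(self, server):
            self.server, self.store = server, {}

        def share(self, item, data):
            self.store[item] = data
            self.server.announce(item, self)     # register with the index

        def fetch(self, item):
            for provider in self.server.lookup(item):
                return provider.store[item]      # step 4: direct P2P transfer
            return None

    s = IndexServer()
    a, b = Peer(s), Peer(s)
    a.share("D", b"payload")
    print(b.fetch("D"))                          # B downloads D directly from A

Note how the server only ever handles the small (item, provider) pairs; the payload itself travels between the peers, which is exactly the "content at the edges" point above.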
Approach I: Central Server

Advantages
• Search complexity of O(1) - "just ask the server"
• Complex and fuzzy queries are possible
• Simple and fast

Challenges
• No intrinsic scalability
  - O(N) network and system load on the server
• Single point of failure or attack
• Implementation and maintenance costs increase non-linearly
  - in particular for achieving high availability and scalability
• A central server is not suitable for systems with massive numbers of users

But overall: the best principle for small and simple applications
Centralized P2P Networks (like Napster, ICQ)

Napster - History:
1999-2001:
• first famous P2P file-sharing tool, for mp3 files
• free content access; violation of the artists' copyright
2001-2003:
• after the introduction of filters, users abandoned the service
• April 2001: OpenNap
  - approx. 45,000 users
  - 80 (interconnected) directory servers
  - more than 50 TB of data
2004: Napster 2
• music store, no P2P network anymore
• download music tracks with a subscription model
Napster

Phases:
1. Directory update
• peers provide data to the central server: stored files (mp3 tags, related metadata) and their own IP address
2. Search
• requesting peer queries the central server
• central server delivers the related IP addresses
3. Test of quality
• requesting peer sends a ping to (and gets a pong from) all potential peers
• selects the best one
4. Download - data transfer
• requesting peer connects to the best peer and retrieves the data
• service delivery (file transfer) happens peer-to-peer, outside the central server
ICQ Feature: P2P communication
History
1996: first instant messaging tool
• wide deployment, participation is free
see e.g. http://www.icq.com/
• ca. 430 million registered accounts
• > 700 million messages sent and received per day
Functionalities
Central server maintains status of the users
Communication either over a server or P2P (directly)
Server can be used as relay station
• to store messages until user gets online
• to circumvent NAT
Direct communication offers more functions
• file transfer, video messages, …
Unstructured Homogeneous P2P Overlays

(Classification figure repeated from above; the focus here is homogeneous P2P: no central entities. Examples: Gnutella 0.4, Freenet.)
1.2 Distributed / Homogeneous P2P Overlays

Characteristics
• All peers are equal (in their roles)
• The search mechanism is provided by the cooperation of all peers
• Local view on the network
• Organic growth: new peers simply attach to the current network
• No special infrastructure element is needed; service delivery runs peer-to-peer over the overlay network

Motivation:
• To provide robustness
• To have self-organization
Tasks to solve

1. To connect to the network
• No central index server; joining strategies are needed
• To join: know at least one peer of the network
• Local view on the network: advertisements are needed
2. To search - search mechanisms in P2P overlays (each detailed below):
• Broadcast / flooding
• Expanding ring
• Random walk
• Rendezvous idea
3. To deliver the service
• Establish a connection to the other node(s)
• Peer-to-peer communication
Properties of Distributed / Homogeneous P2P Networks

Benefits:
• Robustness: every peer is dispensable; switching off a peer has no effect on the network
• Balanced costs: each peer contributes the same
• Self-organization

Drawbacks:
• Slow and expensive search (the network is flooded)
• Finding all objects that fit the search criteria is not guaranteed: an object may be out of reach of the search query
Search Mechanisms: Flooding

Breadth-first search (BFS)
• Use a system-wide maximum TTL to control the communication overhead
• Send the message to all neighbors except the one that delivered the incoming message
• Store the message identifiers of routed messages (or use non-oblivious messages) to avoid retransmission cycles
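
A compact sketch of this rule, assuming hypothetical node objects with `neighbors`, `seen`, `matches` and `send_query_hit` members (recursion stands in for message delivery):

    # Sketch of TTL-limited flooding with duplicate suppression.
    import uuid

    def flood(node, query, ttl, msg_id=None, sender=None):
        # Forward `query` to all neighbors except `sender`, for at most `ttl` hops.
        if msg_id is None:
            msg_id = uuid.uuid4()
        if ttl <= 0 or msg_id in node.seen:      # expired or already routed: drop
            return
        node.seen.add(msg_id)
        if node.matches(query):                  # local hit: answer the requester
            node.send_query_hit(query)
        for neighbor in node.neighbors:
            if neighbor is not sender:           # all neighbors but the delivering one
                flood(neighbor, query, ttl - 1, msg_id, node)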
Example

[Figure: flooding from a source peer to a destination peer; the numbers on the nodes denote the increasing hop count.]

Overhead
• Large: here, 43 messages are sent
Length of the path: 5 hops
Flooding Search

Fully distributed approach
• Central systems are vulnerable and do not scale
• Unstructured P2P systems follow the opposite approach:
  - no global information available about the location of an item
  - content is stored only at the respective node providing it

Retrieval of data
• No routing information for content
• Necessity to ask as many systems as possible/necessary

Approaches
• Flooding: high traffic load on the network, does not scale
• Highest effort for searching: quick search through large areas, but many messages are needed for unique identification
Search Mechanisms: Expanding Ring

Mechanism
• Successive floods with increasing TTL: start with a small TTL; if there is no success, increase the TTL; etc. (see the sketch below)

Properties
• Improved performance if objects follow a Zipf popularity distribution and are located accordingly
• Message overhead is still high
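
A sketch of the retry loop, assuming a hypothetical `flood_and_collect()` variant of the flooding routine above that returns any QUERYHIT results:

    # Sketch of expanding-ring search: retry with a growing TTL.
    def expanding_ring(node, query, start_ttl=1, max_ttl=7):
        for ttl in range(start_ttl, max_ttl + 1):
            hits = flood_and_collect(node, query, ttl)   # assumed helper
            if hits:
                return hits     # popular objects are found with a small TTL
        return []               # no success up to max_ttl

Popular (Zipf-head) objects are widely replicated and thus found in the first, cheap rounds; only rare objects pay the full flooding cost, which is why the Zipf assumption matters here.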
Excursion: Zipf Distribution

Describes the probability of an element to occur:
• N is the number of elements (e.g. 1000 elements)
• k is the rank (e.g. element i is the 623rd most frequent one to occur)
• s is the value of the exponent characterizing the distribution

Frequency of the element of rank k:

    f(k; s, N) = \frac{1/k^s}{\sum_{n=1}^{N} 1/n^s}

Simple example: N = 10000, s = 0.6:

    P(x) \sim \left( \frac{1}{\mathrm{rank}(x)} \right)^{0.6}

[Figure: request probability vs. rank for this example.]
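
The normalized frequencies are straightforward to compute; a small sketch reproducing the slide's example:

    # Zipf request probabilities for N elements with exponent s.
    def zipf_pmf(N, s):
        weights = [1 / k**s for k in range(1, N + 1)]
        total = sum(weights)            # the normalizer: sum_{n=1}^{N} 1/n^s
        return [w / total for w in weights]

    pmf = zipf_pmf(10000, 0.6)          # the slide's example
    print(pmf[0], pmf[622])             # probabilities of rank 1 and rank 623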
Search Mechanisms: Random Walk

Random walks
• Forward the query to one randomly selected neighbor
  - message overhead is reduced significantly
  - latency is increased
• Multiple random walks (k query messages)
  - reduces latency
  - generates more load
• Termination mechanism (see the sketch below)
  - TTL-based, or
  - periodically check back with the requester before the next forwarding
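
A minimal sketch of one TTL-terminated walker (the k-walker variant simply starts this k times in parallel); `matches` and `neighbors` are assumed node members, as before:

    # Sketch of a TTL-terminated random walk.
    import random

    def random_walk(node, query, ttl):
        while ttl > 0:
            if node.matches(query):
                return node                        # hit: report back to the requester
            node = random.choice(node.neighbors)   # forward to ONE random neighbor
            ttl -= 1
        return None                                # walk ended without a hit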
Example

[Figure: random walk with n = 2 (each incoming message is forwarded twice); the numbers on the nodes denote the increasing hop count from the source peer to the destination peer.]

Overhead
• Smaller: here, e.g., 30 messages are sent until the destination is reached
Length of the path found: e.g. 6 hops
Search Mechanisms: Rendezvous Point

Storing node (green/light grey, on the right side)
• propagates its content to all nodes within a predefined range
Requesting node (blue/dark grey, on the left side)
• propagates its query to all neighbors within a predefined range
A query hit occurs at a rendezvous point (black), where announcement and query meet
Source: http://www.dvs.tu-darmstadt.de/research/bubblestorm/
1.3 Homogeneous P2P Overlay: Gnutella - Version 0.4

Gnutella Version 0.4
History: the first decentralized P2P overlay
• developed by Nullsoft (owned by AOL)
• protocol version 0.4 - homogeneous roles
• protocol version 0.6 - heterogeneous roles / hierarchical structure

Some characteristics (of version 0.4):
• Message broadcast for node discovery and search requests
• Flooding (to all connected nodes) is used to distribute information
• Nodes recognize messages they have already forwarded by their GUID and do not forward them twice
• Hop limit by TTL, originally TTL = 7
Phases of Protocol 0.4

1. Connecting
• PING message: actively discover hosts on the network
• PONG message: answer to a PING message; includes information about one connected Gnutella servent
2. Search
• QUERY message: searching the distributed network
• QUERY HIT message: response to a QUERY message; can contain several matching files of one servent
3. Data transfer
• HTTP is used to transfer files (HTTP GET); not part of the protocol v0.4
• PUSH message: to circumvent firewalls
Phase 1: Gnutella - Connecting (PING/PONG)

[Figure: a new peer opens a TCP connection to a known member, discovers further members via PING/PONG, and connects to them.]

To connect to a Gnutella network, a peer must initially know (at least) one member node of the network and connect to it via TCP
• this first member node must be found by other means
  - find the first member using another medium (Web, telephone, ...)
  - nowadays, host caches are usually used
The servent then connects to a number of nodes it got PONG messages from
• thus it becomes part of the Gnutella network
Phase 2: Gnutella - Searching

Flooding: the QUERY message is distributed to all connected nodes.
1. A node that receives a QUERY message increments the HOP count field of the message, and
• IF HOP <= TTL (Time To Live) and a QUERY message with this GUID was not received before,
• THEN it forwards the query to all neighbors, except the one it received it from (flooding; see the sketch below)
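
The forwarding rule in compact form; the message fields and node members (`guid`, `hops`, `ttl`, `seen_guids`, ...) are illustrative names, not the actual wire format:

    # Sketch of the Gnutella 0.4 QUERY forwarding rule.
    def on_query(node, msg, came_from):
        msg.hops += 1
        if msg.hops > msg.ttl or msg.guid in node.seen_guids:
            return                                  # expired or duplicate: drop
        node.seen_guids.add(msg.guid)
        if node.local_matches(msg.criteria):        # step 2 below: QUERYHIT,
            node.send_queryhit(msg, via=came_from)  # routed back along the query path
        for neighbor in node.neighbors:
            if neighbor is not came_from:
                neighbor.deliver(msg, came_from=node)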
Phase 2: Gnutella - Searching (continued)

2. The node also checks whether it can answer this QUERY with a QUERYHIT message, e.g. if locally available files match the search criteria.
3. The QUERYHIT message contains
• the IP address of the sender
• information about one or more files that match the search criteria
The information is passed back along the same path the QUERY took
• no flooding, no new links established
• this avoids the NAT problem
Phase 3: Gnutella - Data Transfer

The peer sets up an HTTP connection
• the actual data transfer is not part of the Gnutella protocol
• HTTP GET is used

Special case: the peer with the file is located behind a firewall/NAT gateway
The downloading peer
• cannot initiate a TCP/HTTP connection
• can instead send a PUSH message
  - asking the other peer to initiate a TCP/HTTP connection to it and
  - then transfer (push) the file over that connection
• this does not work if both peers are behind firewalls
Gnutella 0.4: Scalability Issues
A TTL (Time To Live) of 4 hops for the PING messages leads to a known topology of roughly 8000 nodes (a rough sanity check follows below)
• the TTL in the original Gnutella client was 7, not 4
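
Where a figure of this order can come from: a back-of-the-envelope count, assuming (hypothetically, this degree is not from the slide) that every servent keeps d = 10 open connections:

    # Nodes reachable within `ttl` hops, assuming every servent keeps
    # d open connections (d = 10 is an illustrative assumption).
    def reachable(d, ttl):
        total, frontier = 0, d      # hop 1: the d direct neighbors
        for _ in range(1, ttl):
            total += frontier
            frontier *= d - 1       # each frontier node contributes d-1 new peers
        return total + frontier

    print(reachable(10, 4))         # 8200 -- the order of the slide's ~8000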
Gnutella in its original version (v0.4) suffers from a range of scalability issues, due to
• the fully decentralized approach
• the flooding of messages
Frequency distribution of Gnutella messages (e.g. Portmann et al.):

    Message      Share
    QUERY        54.8%
    PONG         26.9%
    PING         14.8%
    QUERY HIT     2.8%
    PUSH          0.7%

41.7% of the messages (PING + PONG) are used just for network discovery
Gnutella 0.4: Scalability Issues (continued)

Low-bandwidth peers easily use up all their bandwidth
• for passing on the PING and QUERY messages of other users
• leaving no bandwidth for up- and downloads
This was the reason for the breakdown of the Gnutella network in August 2000
Gnutella 0.4: How the Scalability Issues Were Tackled

Mechanisms to overcome the performance problems:
• Optimization of connection parameters (number of hops etc.)
• PING/PONG optimization
  - e.g. C. Rohrs and V. Falco: LimeWire: Ping Pong Scheme, April 2001
  - http://www.limewire.com/index.jsp/pingpong
• Dropping of low-bandwidth connections
  - to move low-bandwidth users to the edge of the Gnutella network
• Ultrapeers (similar to KaZaA super nodes)
• File hashes to identify files (similar to eDonkey)

Chosen solution: hierarchy (Gnutella v0.6)
• SUPERPEERS (as in FastTrack, ...)
• to cope with the load of low-bandwidth peers
Gnutella 0.4 Problem: Free Riders

Free riders
• exist in big anonymous communities
• selfish individuals that opt out of a voluntary contribution to the community's social welfare
  - i.e. by not sharing any files while downloading from others
  - ... and get away with it

Study results (since e.g. Adar/Huberman 2000):
• 70% of the Gnutella users share no files
• 90% answer no queries

Solutions
• incentives for sharing: peers only accept connections from / forward messages of peers that share content
  - but: how to verify? what about the quality of the content?
• micro-payment (see Mojo Nation)
1.4 Rendezvous-based P2P Overlays

Rendezvous idea
• Content is announced in a region (bubble) of the P2P network
• Queries flood just a region (bubble) of the P2P network
• Announcements and queries meet at a rendezvous point
• Replicate both queries and data: O(\sqrt{n}) copies each (with unequal hidden constants)
• Data and queries rendezvous in the network
Source: Terpstra, Leng, Lehn – Short Presentation Bubblestorm
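
Why O(\sqrt{n}) copies suffice - a sketch of the birthday-paradox argument behind these numbers: if the data bubble covers d random nodes and the query bubble q random nodes out of n, then

    P(\text{miss}) = \left(1 - \frac{d}{n}\right)^{q} \approx e^{-dq/n}

so choosing d = q = c\sqrt{n} gives a miss probability of about e^{-c^2}, independent of the network size n. This matches the P(failure) < e^{-c^2} bound on the BubbleStorm complexity slide below.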
Rendezvous-based P2P Overlay: BubbleStorm

Random topology
• Peer neighbors are chosen randomly
• Allows efficient sampling of peers at random

Topology measurement
• To calculate bubble sizes, the network size must be known
• Computes network size and statistics through gossiping

Bubblecast
• Replicates queries/data onto peers quickly
• Intelligent flooding into the bubble

Bubble maintainer
• Preserves the correct number of replicas
Random Multigraph Topology

Random graphs support the birthday paradox
• Exploring an edge leads to a randomly sampled peer
• Creating a random node subset (bubble) is cheap

Node degree is chosen proportional to bandwidth
• As random walks (and bubblecasts) follow edges with equal probability, utilization stays balanced despite heterogeneity
Bubblecast Motivation

                     Flooding                 Bubblecast               Random Walk
    Latency          + low latency            + low latency            - high latency
    Reliability      + reliable               + reliable               - unreliable
    Count control    - imprecise node count   + precise node count     + precise length
    Link load        - unbalanced link load   + balanced link load     + balanced link load

Bubblecast: node counter (not hops), fixed branch factor, branching in every step
Source: Terpstra, Leng, Lehn – Short Presentation Bubblestorm
Example Bubblecast Execution

Bubblecast: announcement / query in the bubble
Procedure:
• A counter specifies the number of replicas to create
• Each node decrements the counter by one for itself (local match/store)
• The remaining counter is split between two neighbors
• Counters are always integral
• The final routing depth differs by at most one hop

[Figure: a bubblecast tree; an initial counter of 17 is decremented and split at each node, e.g. 17 -> 8 + 8, 8 -> 4 + 3, 4 -> 2 + 1, 3 -> 1 + 1.]
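
The splitting rule as a sketch (recursive for readability; `store` and `neighbors` are assumed node members, and each node is assumed to have at least two neighbors):

    # Sketch of bubblecast: place exactly `count` replicas by decrementing
    # locally and splitting the remainder between two random neighbors.
    import random

    def bubblecast(node, payload, count):
        if count <= 0:
            return
        node.store(payload)                       # this node consumes one replica
        remaining = count - 1
        if remaining == 0:
            return
        half = remaining // 2                     # integral counters: depths differ
        a, b = random.sample(node.neighbors, 2)   # by at most one hop
        bubblecast(a, payload, remaining - half)  # e.g. 17 -> 8 + 8, 8 -> 4 + 3
        bubblecast(b, payload, half)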
BubbleStorm: Random Replication

• Place data replicas on random nodes
• Nodes evaluate query replicas against all locally stored data
• Wherever both a data replica and a query replica land, matches are found
• Collisions result from the birthday paradox

[Figure: data bubble and query bubble overlapping at random nodes.]
Source: Terpstra, Leng, Lehn – Short Presentation Bubblestorm
BubbleStorm: Exploiting Heterogeneity

• Peers have different capacities
• Faster peers receive more traffic - this is beneficial!
• Contribution is squared: a peer that takes k times more traffic holds k times more data and k times more query replicas, and thus produces about k^2 times more rendezvous matches

[Figure: data and query bubbles concentrated on high-capacity peers.]
Bubblecast Properties

• Used for query and data replication
• Fixed branch factor balances load
• Same stationary distribution as a random walk
• Counts edges crossed, not hops: precisely controls the replica count
• Logarithmic routing depth: slightly deeper than flooding
• Message loss reduces replication only by log(size)
• Samples random nodes, due to the random topology
Complexity and Correctness
BubbleStorm costs roughly c\sqrt{n} bandwidth per operation to provide exhaustive search with P(\text{failure}) < e^{-c^2}:

    c             1         2         3         4
    P(success)    63.21%    98.17%    99.99%    99.99999%

The full equation (e^{-c^2 + c^3 H \Upsilon}) is complicated by
• heterogeneous peer capacity (H)
• dependent sampling (due to repeated withdrawals)
• unequal query and post traffic (\Upsilon; BubbleStorm optimizes this)
Full details in the paper
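
The P(success) row is just 1 - e^{-c^2}, which a two-liner confirms (the slide rounds the values):

    # Reproduce the P(success) = 1 - exp(-c^2) row of the table.
    from math import exp

    for c in (1, 2, 3, 4):
        print(c, f"{(1 - exp(-c * c)) * 100:.5f}%")
    # -> 63.21206%, 98.16844%, 99.98766%, 99.99999%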
2 Unstructured Heterogeneous P2P Overlays

Principles
• Heterogeneous overlays: Gnutella 0.6
Decentralized file sharing with distributed servers
• eDonkey, eMule
Decentralized file sharing with super nodes
• KaZaA, Fasttrack
Unstructured heterogeneous resource sharing
• Skype
Unstructured Heterogeneous P2P Overlays

(Classification figure repeated from above; the focus here is heterogeneous P2P: dynamic central entities. Examples: Gnutella 0.6, Fasttrack, eDonkey.)
Principles - Hierarchical / Heterogeneous

Approach: combine the best of both worlds
• Robustness by distributed indexing
• Fast searches by server queries

Components
• Supernodes
  - mini servers / super peers, used as servers for queries
  - build a sub-network among the supernodes
  - queries are distributed within this sub-network between the supernodes
• "Normal" peers
  - have overlay connections only to supernodes

++ Advantages
• More robust than centralized solutions
• Faster searches than in pure P2P systems

-- Disadvantages
• Need for algorithms to choose reliable supernodes

Picture from R. Schollmeier and J. Eberspächer, TU München
2.1 Decentralized File Sharing with Distributed Servers

For example: eDonkey; see e.g.
• http://www.overnet.org/
• http://www.emule-project.net/
• http://savannah.gnu.org/projects/mldonkey/

eDonkey file-sharing protocol
• the most successful/used file-sharing protocol in e.g. Germany & France in 2003 [see sandvine.org]
  - 52% of the generated P2P file-sharing traffic (KaZaA: only 44% in Germany)
• stopped by law: in February 2006 the largest server, "Razorback 2.0", was disconnected by the Belgian police
  - http://www.heise.de/newsticker/eDonkey-Betreiber-wirft-endgueltig-das-Handtuch--/meldung/78093
The eDonkey Network - Principle

Distributed servers
• set up and RUN BY POWER USERS
• nearly impossible to shut down all servers
• exchange their server lists with other servers (using UDP as transport protocol)
• manage the file indices

Client application
• connects to one random server and stays connected (using a TCP connection)
• searches are directed to that server
• clients can also extend their search by sending UDP search messages to additional servers
The eDonkey Network

[Figure: nodes connect to a server (supernode) via TCP for search and download mediation; servers exchange server lists via UDP; clients send extended searches via UDP to further servers.]
The eDonkey Network

Procedure
• New servers send their port + IP to other servers (UDP)
• Servers send server lists (other servers they know) to the clients
• Server lists can also be downloaded on various websites

Files are identified by
• unique MD4 file hashes (Message-Digest Algorithm 4, RFC 1186), 16 bytes long
• not by filenames
This helps in
• resuming a download from a different source
• downloading the same file from multiple sources at the same time
• verifying that the file has been downloaded correctly
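
A sketch of file identification by MD4 digest. Two caveats: on OpenSSL 3.x, MD4 lives in the "legacy" provider, so hashlib.new("md4") may raise an error on recent systems; and the real eDonkey protocol additionally hashes large files in parts, which this sketch ignores:

    # Identify a file by its 16-byte MD4 digest, roughly as eDonkey does.
    import hashlib

    def file_id(path):
        h = hashlib.new("md4")        # may require OpenSSL's legacy provider
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)       # stream the file in 1 MB blocks
        return h.hexdigest()          # same content -> same ID, for any filename

Because the ID depends only on the content, two differently named copies of a file yield the same hash, which is what enables multi-source downloads and resuming.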
The eDonkey Network

The SEARCH consists of two steps:
1. Full-text search
• sent to the connected server (TCP), or
• extended search via UDP to other known servers
• the search results are the hashes of the matching files
2. Query sources
• query the servers for clients offering a file with a certain hash
Later: download from these sources
eDonkey: User Behavior

• Average file size: 217 MB (an indication that many videos are shared)
• The distribution of file sizes follows a Zipf distribution

[Figure: distribution of the sizes of shared files.]
Heckmann et al.: The eDonkey File-Sharing Network. GI-Informatik 2004
eDonkey: User Behavior

• 57.8 files shared on average
• The number of shared files per user follows a Zipf distribution

[Figure: number of shared files per user.]
Heckmann et al.: The eDonkey File-Sharing Network. GI-Informatik 2004
2.2 Decentralized File Sharing with Super Nodes

see www.kazaa.com, gift.sourceforge.net, http://www.my-k-lite.com/

System
• Developer: Fasttrack
• Clients: KaZaA

Properties:
• most successful P2P network in the USA in 2002/03
• architecture: neither completely centralized nor decentralized
• supernodes reduce the communication overhead

    P2P system    #users     #files    terabytes    #downloads (from download.com)
    Fasttrack     2.6 M      472 M     3550         4 M
    eDonkey       230,000    13 M      650-2600     600,000
    Gnutella      120,000    28 M      105          ca. 525,000

Numbers from October 2002
Decentralized File Sharing with Super Nodes

Peers
• connected only to some super nodes
• send IP address and file names only to super peers

Super nodes (super peers)
• peers with high-performance network connections
• take the role of the central server and act as proxy for simple peers
• answer search messages for all peers (reduction of communication load)
• one or more supernodes can be removed without problems
Additionally, the communication between nodes is encrypted

[Figure: peers send searches to their superpeer; service delivery (download) is peer-to-peer.]
Decentralized File Sharing with Complete Files

[Figure, from www.wtata.com: the "queen bee" holds a 100 MB file and has 50 KB/s upload rate in total; "drone 1" receives 25% of the file at a 12.5 KB/s rate. Drone 1's own 50 KB/s upload rate stays unutilized until it has the whole file.]
Issues with KaZaA

Keyword-based search
• You do not know what you get
• Pollution is a problem
  - music companies flooded the network with false files
  - the chance to get a "good" file was ~10%
  - a problem especially for "small" files

Full file download before uploading
• Users go offline after their download has finished
• Only few uploaders are online
• A problem for "large" files
Google Trends for KaZaA, Limewire, Torrent, Emule
http://www.google.com/trends?q=kazaa,+limewire,+torrent,+emule
2.3 Unstructured Hybrid Resource Sharing: Skype

Offered services
• IP telephony features
• File exchange
• Instant messaging

Features
• KaZaA technology
• High media quality
• Encrypted media delivery
• Support for teleconferences
• Multi-platform

Further information
• Very popular
• Low-cost IP telephony business model
• SkypeOut extension to call regular phone numbers (not free)
• Great business potential if combined with free WiFi

From www.skype.com
Skype vs. Call-by-Call

[Figure: a Skype-to-Skype call runs over TCP/IP through the Internet; a SkypeOut call bridges from the Internet into the PSTN; a classic call-by-call connection runs over the (rented) PSTN.]
Skype

Network architecture: formerly KaZaA-based

[Figure: regular nodes connect to super nodes; at login, messages are exchanged with the Skype login server.]
Conclusion of this Section

Principles of unstructured P2P overlays
• Centralized unstructured P2P overlays: Napster
• Homogeneous unstructured P2P overlays: Gnutella, BubbleStorm
• Heterogeneous unstructured P2P overlays
  - Hybrid unstructured P2P networks: decentralized file sharing with distributed servers, like eDonkey/eMule/mlDonkey
  - Hierarchical unstructured P2P networks: decentralized file sharing with super nodes, like KaZaA/Fasttrack
  - Skype