Communication Systems Peer-To-Peer Networking

Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
l5_p2p_e.fm 1 6.August.02
Prof. Dr.-Ing. Lars Wolf
TU Braunschweig
Institut für Betriebssysteme und Rechnerverbund
Kommunikationssysteme: Peer-to-peer
IBR (Institut für Betriebssysteme und Rechnerverbund) – TU Braunschweig

Overview
1. Motivation
2. Peer To Peer Networking (Basics)
3. First Generation P2P Protocols/Applications
3.1 File Sharing with Central Server
3.2 Processing Sharing with Central Server
3.3 Decentral File Sharing
3.4 Anonymous File Sharing
4. Second Generation P2P Networks
4.1 Decentral File Sharing with Supernodes
4.2 File Sharing with Charging
4.3 Distributed Search Engine
4.4 P2P Virus Protection
5. GRID Computing
6. Problems in P2P Networking
7. Annex: Literature about P2P

Scope
[Figure: position of this lecture in the layer model. Layers: L1 Physical Layer, L2 Data Link Layer, L3 Network Layer, L4 Transport Layer, L5 Application Layer. Topics shown: WAN (ISDN & ATM); LAN, MAN; High-Speed LAN; Internet (IP, Mobile IP, TCP, UDP); Transitions & Addressing; Security; Mobile Communications; MM COM - QoS specific (Media Data Flow, RT(C)P); IP-Tel signalling (H.323, SIP); Applications (Web, Telnet, Files, Email, P2P)]
Other Lectures of “ET/IT” & Computer Science
Complementary Courses: Multimedia Systems, Distributed Systems, Mobile Communications, Security, Web, Mobile+UbiComp, QoS

1. Motivation
Introduction
One of the newest buzzwords in networking is Peer-to-Peer (P2P).
Is P2P a hype?
• 40 million Napster users in 2 years
• strong presence at international networking conferences
• strong support by industry (e.g. Intel, Sun, Deutsche Bank)
• major traffic source, e.g. 10/2001 at TU Munich: ~40% P2P, ~45% Web
2nd generation (since the 1990s):
• WWW & graphical browsers
• dynamic IP addresses / NAT* (network address translation) / roaming users
• heterogeneous applications
• asymmetric server-based services
• protocol: HTTP
⇒ World Wide Web
* NAT (network address translation):
• clients in a (company) network do not get a global IP address, only a local one: 192.168.XX.XX
• gateway has a global IP address and translates local addresses
⇒ impossible to open a connection from outside to a client in a NAT network
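The NAT footnote can be illustrated with a toy model. This is only a sketch: the class `NatGateway`, the address `203.0.113.1` and the starting port `40000` are invented for illustration, and real gateways are considerably more involved. It shows why a connection from outside fails unless the client has opened one first.

```python
from dataclasses import dataclass, field


@dataclass
class NatGateway:
    """Toy NAT gateway (hypothetical): one global address, a table of port mappings."""
    global_ip: str
    next_port: int = 40000  # assumed first free port on the gateway
    # global port -> (local ip, local port); filled only by outbound traffic
    table: dict = field(default_factory=dict)

    def outbound(self, local_ip: str, local_port: int) -> tuple:
        """A client opens a connection to the outside: the gateway creates a mapping."""
        port = self.next_port
        self.next_port += 1
        self.table[port] = (local_ip, local_port)
        return (self.global_ip, port)

    def inbound(self, global_port: int):
        """A packet arrives from outside: it is delivered only if a mapping exists."""
        return self.table.get(global_port)  # None means the packet is dropped


gw = NatGateway(global_ip="203.0.113.1")
print(gw.inbound(40000))                   # nobody connected out yet, so no target
public = gw.outbound("192.168.0.7", 5000)  # the client connects out first
print(public, gw.inbound(public[1]))       # replies now find their way back
```

The local address is never visible outside the gateway, which is why the P2P protocols discussed later need workarounds for such clients.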
2000 - the P2P revolution?
⇒ World Wide Access
Evolution of Internet Computing Paradigms
(1)
1st generation (since the beginning of the Internet):
• permanent IP addresses
• static domain name system (DNS) mapping
• always connected
• limited, specialized, centralized applications
• protocols: Telnet, FTP, Gopher, ...
Evolution of Internet Computing Paradigms
Litmus test for P2P:
1. does it treat variable connectivity as the norm?
e.g. does it support dial-up users with variable IP addresses?
2. does it give the nodes at the edges of the network significant autonomy?
e.g. is storage / processing done by autonomous end-systems?
⇒ if the answer to both is YES, then the application is P2P; otherwise it is not.
See Andy Oram: Peer-To-Peer / Harnessing the Power of Disruptive
Technologies, O’Reilly 2001
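The litmus test is a conjunction of two yes/no questions, which a short sketch makes explicit. The function name `is_p2p` and the two example applications are hypothetical and only serve as illustration.

```python
def is_p2p(variable_connectivity_is_norm: bool, edge_nodes_autonomous: bool) -> bool:
    """Litmus test: an application counts as P2P only if BOTH answers are yes."""
    return variable_connectivity_is_norm and edge_nodes_autonomous


# Hypothetical example applications (not taken from the slides):
# a tool that copes with dial-up users and stores data on the end systems
print(is_p2p(variable_connectivity_is_norm=True, edge_nodes_autonomous=True))
# a service that needs fixed server addresses and keeps all data centrally
print(is_p2p(variable_connectivity_is_norm=False, edge_nodes_autonomous=False))
```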
• nodes act both as clients and servers:
⇒ "SERVer + cliENT = SERVENT"
• P2P applications are easy to use and well integrated
Peer-to-peer (P2P) is a class of applications that takes advantage of resources (storage, cycles, human presence) available at the edges of the Internet.
Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers.
P2P Architecture Characteristics
C. Shirky:
2. Peer To Peer Networking (Basics)
DEFINITION OF P2P NETWORKING
3rd generation (since 2000):
• more collaboration and personalized applications
• powerful edge devices (peers)
• instant networking
• protocols/applications:
• Napster
• Gnutella
• Fasttrack
• Freenet
• ...
• P2P applications operate outside the domain name system (DNS)
• Napster/Fasttrack username
• dynamic IP address
• ...
• P2P applications operate in unstable environments
• e.g. dial-up connections, users disconnect and change their IP often
Cost effectiveness
• reduces centralized management resources
• optimizes computing, storage and communication resources
• rapid deployment
P2P applications/protocols tailored to users’ needs
• Napster’s success depended to a large extent on its ease of use
Highly attractive content
• users share their content with other users
⇒ attractive content
• copyrights are usually not respected
⇒ cheap content
Group collaboration superior for Business Processes
• grow organically, non-uniform and highly dynamic
• largely manual, ad-hoc, iterative and document-intensive work
• often distributed, not centralized
• no single person/organisation understands the entire process from beginning to end
Why P2P Networking?
New Services at the edge of the network
Why P2P Networking?
basic network node
⇒ P2P can learn a lot from the corresponding Internet solutions (routing algorithms etc.)
⇒ P2P and the Internet are based on very much the same principles (but on different layers)
overlay network node
3. First Generation P2P Protocols/Applications
P2P networks form an overlay network on
top of the Internet IP network
The Principles of P2P and the Internet
IP and the Internet, respectively
• form an overlay network (politically and technically) over the underlying telecom infrastructure
• introduced their own addressing scheme
• e.g. user names
• emphasized fault-tolerance
• are based on the end-to-end principle: as much intelligence as possible in the end nodes
Unused resources
• assume e.g. a large business with 2000 desktop computers:
• storage space: 2000 x 10 GB = 20 TB spare storage space
• processing power: 2000 x 600 MHz x 5 ops/cycle = 6 trillion ops/sec spare processing power
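The two estimates can be checked with a few lines of arithmetic. The figures are those of the slide; the variable names are made up for this sketch, and decimal units (1 TB = 1000 GB) are assumed.

```python
desktops = 2000

# storage: 10 GB spare per desktop
spare_storage_gb = desktops * 10
print(spare_storage_gb / 1000, "TB spare storage")

# processing: 600 MHz clock, 5 operations per cycle
clock_hz = 600_000_000
ops_per_cycle = 5
spare_ops_per_sec = desktops * clock_hz * ops_per_cycle
print(spare_ops_per_sec / 10**12, "trillion ops/sec")
```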
First approaches for
• Distributed Collaboration/Communication
• P2P groupware, P2P content generation
• P2P instant messaging
• Online games
• Distributed Storage
• P2P file sharing, online backups
• Distributed Computing
• P2P CPU cycle sharing
• Distributed simulation
• Distributed Search Engines, Intelligent Agents
3.1 File Sharing with Central Server
see e.g. http://www.napster.com/
• first famous P2P filesharing tool
• for downloading mp3 files only
• central P2P network
• decentral storage (content at the edges)
• central server (file index, search engine)
• file transfer between clients
• decentralization as a tool, not a goal
• users as providers and consumers
• Napster user namespace instead of
• fixed IP addresses
• domain name system (DNS)
3.2 Processing Sharing with Central Server
see e.g. http://setiathome.ssl.berkeley.edu/
SETI = Search for Extraterrestrial Intelligence
• first famous P2P network for sharing processing power for the massively parallel analysis of radio signals
• about 50 GB of data coming in from the Arecibo telescope per day
• distributed via central server to millions of processing clients
• client = screensaver (using idle CPU cycles)
• uses unused processing power of desktop computers
• architecture similar to Napster
• started 1998
• until October 2000, 4x10^20 floating point operations performed (largest computation ever performed)
• similar idea: GRID computing
3.3 Decentral File Sharing
see e.g.
http://dss.clip2.com/GnutellaProtocol04.pdf
http://gnutella.wego.com/
http://www.gnutella.co.uk/
• decentral filesharing tool
• main differences to Napster
• no central server
• all kinds of files are shared
• beyond filesharing (more flexibility)
Gnutella: Architecture
Characteristics
• each node keeps a database of known and connected nodes
• message broadcasting for node discovering and search requests
• each message can be identified by a globally unique ID (GUID)
• flooding (to all connected nodes) is used to distribute information
• nodes recognize messages they have already forwarded by their GUID and do not forward them twice
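The duplicate suppression described in the last bullet can be sketched as a set of already seen GUIDs. The class name `Servent` and the method `should_forward` are illustrative only, and a real servent would also expire old entries, which is omitted here.

```python
import uuid


class Servent:
    """Minimal sketch: remembers the GUIDs of messages it has already handled."""

    def __init__(self):
        self.seen = set()

    def should_forward(self, guid: bytes) -> bool:
        """Return True the first time a GUID is seen, False for every repeat."""
        if guid in self.seen:
            return False
        self.seen.add(guid)
        return True


node = Servent()
guid = uuid.uuid4().bytes          # 16 random bytes as a stand-in for a message GUID
print(node.should_forward(guid))   # first arrival: forward
print(node.should_forward(guid))   # same message via another neighbour: drop
```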
Functionality of the Protocol
• connecting
• PING message: Actively discover hosts on the network
• PONG message: Answer to the PING messages, includes information about one connected Gnutella servent
• search
• QUERY message: Searching the distributed network
• QUERY HIT message: Response to a QUERY message (can contain several matching files of one servent)
• data transfer
• HTTP is used to transfer files (HTTP GET)
• PUSH message: To circumvent firewalls
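The five message types share a common header. The sketch below packs such a header with Python's `struct` module. The layout (16-byte GUID, one byte each for type, TTL and hops, a 4-byte little-endian payload length) and the type codes reflect my reading of the Gnutella 0.4 specification cited in this lecture and should be verified against it before being relied on. The helper names are invented for this example.

```python
import struct
import uuid
from enum import IntEnum


class MessageType(IntEnum):
    # Type codes as assumed from the Gnutella 0.4 specification (verify before use)
    PING = 0x00
    PONG = 0x01
    PUSH = 0x40
    QUERY = 0x80
    QUERY_HIT = 0x81


# "<" = little-endian without padding: GUID, type, TTL, hops, payload length
HEADER = struct.Struct("<16sBBBI")


def pack_header(guid: bytes, msg_type: MessageType, ttl: int, hops: int, payload_len: int) -> bytes:
    """Serialize one message header."""
    return HEADER.pack(guid, int(msg_type), ttl, hops, payload_len)


def unpack_header(data: bytes) -> tuple:
    """Parse the header at the start of a received byte string."""
    guid, type_code, ttl, hops, payload_len = HEADER.unpack(data[:HEADER.size])
    return guid, MessageType(type_code), ttl, hops, payload_len


guid = uuid.uuid4().bytes
wire = pack_header(guid, MessageType.PING, ttl=7, hops=0, payload_len=0)
print(len(wire), unpack_header(wire)[1].name)
```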
[Figure: servent 1 sends a PING to servent 2; servent 2 answers with a PONG]
Gnutella: Connecting - The Gnutella Net
In order to connect to a Gnutella network the servent must initially know (at
least) one member node of the network and connect to it
• these first member nodes must be found by other means (IRC, Web, ...)
• nowadays host caches are usually used
Gnutella: Connecting
[Figure: a QUERY message travels from servent 1 to servent 2]
QUERY message is distributed to all connected nodes (flooding)
• a node that receives a QUERY message increases the HOP count field of the message and forwards it to all connected nodes (except the one it received it from) if
• HOP <= TTL (Time To Live)
• a QUERY message with this GUID was not received before
• the node also checks whether it can answer this QUERY with a QUERYHIT message (that is, whether it has files available that match the search criteria)
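The forwarding rule can be tried out on a small example network. The following is a simplified simulation, not an implementation of the protocol: the hop count is taken to be the number of links a query has traversed, a query is never forwarded over more than `ttl` links, and messages are processed in breadth-first order. The graph, the file names and the function `flood_query` are made up for illustration.

```python
from collections import deque


def flood_query(graph, files, origin, keyword, ttl):
    """Flood a query from `origin`; return {node: hops} for every node with a match."""
    seen = {origin}               # nodes that already handled this query (GUID check)
    hits = {}
    queue = deque([(origin, 0)])  # (node, number of links traversed so far)
    while queue:
        node, hops = queue.popleft()
        # every node except the originator checks its own files for a match
        if node != origin and any(keyword in name for name in files.get(node, [])):
            hits[node] = hops
        next_hops = hops + 1      # the hop count is increased before forwarding
        if next_hops > ttl:       # forward only while HOP <= TTL
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:  # a duplicate would be dropped by the receiver
                seen.add(neighbour)
                queue.append((neighbour, next_hops))
    return hits


# A chain of four servents: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
files = {"c": ["german_movie.avi"], "d": ["german_docu.avi"]}

print(flood_query(graph, files, "a", "german", ttl=2))  # d is out of reach
print(flood_query(graph, files, "a", "german", ttl=3))  # now d answers as well
```

The example also shows the cost of flooding: every reachable node handles the query, whether or not it has a matching file.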
Gnutella: Searching
Gnutella connection (TCP on specified Port)
The servent connects to a number of nodes from which it has received PONG messages and thus becomes part of the Gnutella net
Gnutella: Searching
[Figure: servent 2 returns a QUERY HIT to servent 1]
• the QUERYHIT message contains the IP address of the sender plus information about one or more files that match the search criteria. The information is not flooded but passed back along the path the QUERY took, in reverse.
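Passing a QUERYHIT back along the query path only requires each node to remember from which neighbour a query with a given GUID arrived. The sketch below models this with direct method calls instead of network messages. The class `Node` and its attribute names are hypothetical, node names stand in for IP addresses, and TTL handling is left out.

```python
class Node:
    """Sketch of reverse-path routing: remember where each QUERY came from."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = list(files)
        self.neighbours = []
        self.came_from = {}  # GUID -> neighbour that delivered the QUERY (None = we asked)
        self.results = []    # filled only at the node that started the search

    def on_query(self, guid, keyword, sender):
        if guid in self.came_from:  # duplicate: already handled
            return
        self.came_from[guid] = sender
        matches = [f for f in self.files if keyword in f]
        if matches and sender is not None:
            sender.on_query_hit(guid, self.name, matches)  # answer goes one hop back
        for n in self.neighbours:
            if n is not sender:
                n.on_query(guid, keyword, self)

    def on_query_hit(self, guid, owner, matches):
        if guid not in self.came_from:  # hit for a query we never saw: drop it
            return
        previous = self.came_from[guid]
        if previous is None:
            self.results.append((owner, matches))  # we are the originator
        else:
            previous.on_query_hit(guid, owner, matches)  # pass it one hop further back


# chain a - b - c, only c has a matching file
a, b, c = Node("a"), Node("b"), Node("c", files=["german_movie.avi"])
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]

a.on_query("guid-1", "german", sender=None)
print(a.results)
```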
• if the servent with the file is behind a firewall, then
• the downloading servent cannot initiate a connection
• it can instead send the PUSH message, asking the other servent to initiate an HTTP connection to it and push the file over that connection
• does not work if both servents are behind firewalls
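The three cases can be written down as a small decision function. This is a sketch of the logic in the bullets above only; the function name `plan_transfer` and the returned labels are invented for this example.

```python
def plan_transfer(downloader_firewalled: bool, provider_firewalled: bool) -> str:
    """Decide who opens the HTTP connection for a file transfer."""
    if not provider_firewalled:
        # normal case: the downloader connects and issues an HTTP GET
        return "downloader connects (HTTP GET)"
    if not downloader_firewalled:
        # the provider cannot be reached, so it is asked to connect outwards
        return "downloader sends PUSH, provider connects"
    # neither side can accept an incoming connection
    return "no direct transfer possible"


for downloader, provider in [(False, False), (False, True), (True, True)]:
    print(downloader, provider, "->", plan_transfer(downloader, provider))
```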
Message type    Share of all messages
QUERY           54.8%
PONG            26.9%
PING            14.8%
QUERY HIT        2.8%
PUSH             0.7%
⇒ 41.7% of the messages just for network discovery

Average bandwidth usage (kbps) per node for peer discovery:
PING rate     500 Nodes    4000 Nodes    8000 Nodes
1/min               4.8          68.2         194.9
1/sec             288          4090.4       11694.5

Average bandwidth usage (kbps) per node for search messages:
Search rate   500 Nodes    4000 Nodes    8000 Nodes
1/min               2.5          36.8         127.0
1/sec             151.0        2211.0        7617.3
⇒ Low-bandwidth clients easily use up all their available bandwidth for passing on PING and QUERY messages of other users, leaving no bandwidth for up- and downloads
Free Rider
• exist in big anonymous communities
• selfish individuals that opt out of a voluntary contribution to the community's social welfare (e.g. by not sharing any files but downloading from others)
• ... and get away with it
Average bandwidth usage (kbps) per node for peer discovery:
Gnutella Problem: Free Riders
A TTL (Time To Live) of 4 hops for the PING messages leads to a known topology of roughly 8000 nodes! The TTL in the original Gnutella client was 7!
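How quickly the reach of a flooded message grows with the TTL can be estimated with a simple tree model. The sketch below assumes that every node has the same number of connections and that no two paths lead to the same node, so it gives an upper bound. The value of 10 connections per node is an assumption chosen for illustration and is not taken from the slides; it happens to give a number of the same order as the 8000 nodes quoted above, which should not be read as a confirmation of that figure.

```python
def reachable_nodes(connections_per_node: int, ttl: int) -> int:
    """Upper bound on the nodes reached by a flooded message within `ttl` hops."""
    total = 0
    frontier = connections_per_node           # nodes exactly one hop away
    for _ in range(ttl):
        total += frontier
        frontier *= connections_per_node - 1  # each node forwards to all but the sender
    return total


print(reachable_nodes(10, 4))  # TTL of 4
print(reachable_nodes(10, 7))  # TTL of 7, as in the original client
```

Under these assumptions, three additional hops raise the bound from a few thousand to several million nodes, which illustrates why the TTL is such a sensitive parameter.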
Frequency Distribution of Gnutella Messages (Portmann et al.):
Gnutella Problem: Scalability Issues
Gnutella suffers from a range of scalability issues due to the decentral
approach and the flooding of messages.
• HTTP GET is used instead
Gnutella Problem: Scalability Issues
• data transfer is not part of the Gnutella protocol!
Gnutella: Data Transfer
Study results (Adar/Huberman):
• 70% of the Gnutella users share no files
• 90% answer no queries
Solutions
• incentives for sharing
• servents only accept connections / forward messages from servents that share
content
• but: how to verify? quality of the content?
• micropayment
• see Mojo Nation
Gnutella Problem: Overlay Topology Design
[Figure: a search for a German movie in the Gnutella overlay, drawn over the IP network. Legend: Gnutella node, Gnutella connection, IP node, IP route. Node groups: Japanese movies, British movies, German movies]
Gnutella Problem: Overlay Topology Design (2)
[Figure: the same search for a German movie in a differently arranged overlay with the node groups Japanese movies, British movies, German movies]
⇒ This overlay network has a better topology
Gnutella: Related Work
Scalability
• M. Portmann et al.: The Cost of Peer Discovery and Searching in the Gnutella Peer-to-peer File Sharing Protocol
• J. Guterman: Gnutella to the Rescue? Not so Fast, Napster fiends.
• K. Sripanidkulchai, Carnegie Mellon University: The popularity of Gnutella queries and its implications on scalability
• Clip2.com: Bandwidth Barriers to Gnutella Network Scalability
Free Riders
• E. Adar, B. Huberman: Free Riding on Gnutella
• P. Golle et al.: Incentives for Sharing in Peer-to-Peer Networks
Overlay Topology
• G. Pandurangan et al.: Building Low-Diameter P2P Networks
• G. Pandurangan et al.: Building P2P networks with good topological properties
3.4 Anonymous File Sharing
e.g. Freenet
see http://freenet.sourceforge.net/
Documents are encrypted, cut into several pieces and stored on several machines
• provides anonymity for users
• author of document cannot be identified
• prohibits censorship of documents
• as document pieces are copied on several machines
• removes any single point of failure or control
• decentral network
• provides plausible deniability for node operators
• as they cannot read the content on their discs
4. Second Generation P2P Networks
Currently the second generation of P2P networks is emerging
Performance Improvements
• decentral networks with supernodes
Charging for Content and Services
• battling free riders
Adaptation to Business Applications
• e.g. distributed search engine
• e.g. virus protection
4.1 Decentral File Sharing with Supernodes
Developer: Fasttrack
Clients: Morpheus, KaZaA, Grokster, GiFT/OpenFT
see
www.fasttrack.nu
www.musiccity.com, www.kazaa.com, www.grokster.com
gift.sourceforge.net
Properties:
• currently most successful P2P network
• neither completely central nor decentral:
• distributed supernodes (nodes with high-performance network connections) take the role of the central server
• supernodes answer search messages of the other nodes
• one or more supernodes can drop out without problems
• additionally the communication between nodes is encrypted
4.2 File Sharing with Charging
e.g. Mojo Nation
see http://www.mojonation.net/
• extends Freenet
• defines electronic money (mojos)
• mojos are earned by offering resources to the Mojo Nation network
• storage space
• processing power
• bandwidth
• mojos can be spent on
• searching
• downloading
4.3 Distributed Search Engine
example:
Time             t=1        t=2        t=3        t=4
Document         Rev 1.0    Rev 1.1    Rev 1.2    Rev 1.3
P2P Search       Rev 1.0    Rev 1.1    Rev 1.2    Rev 1.3
Server Search    Rev 1.0    Rev 1.0    Rev 1.0    Rev 1.3
(the figure marks two of the revisions as "posted to server")
⇒ P2P search yields better results
other business applications: see JXTA, Groove
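The revision example (Rev 1.0 to Rev 1.3 at t=1 to t=4) can be replayed in a few lines. The sketch assumes that the document is posted to the server at t=1 and t=4 only, which is inferred from the server search results (Rev 1.0 three times, then Rev 1.3) and is not stated explicitly. The function names are made up for illustration.

```python
revisions = ["Rev 1.0", "Rev 1.1", "Rev 1.2", "Rev 1.3"]  # revision at t=1..4
posted_at = {1, 4}  # assumption: the times at which the document is posted to the server


def server_search(t: int) -> str:
    """The server only knows the revision that was posted most recently."""
    last_post = max(p for p in posted_at if p <= t)
    return revisions[last_post - 1]


def p2p_search(t: int) -> str:
    """A P2P search asks the author's machine and sees the current revision."""
    return revisions[t - 1]


for t in range(1, 5):
    print(t, server_search(t), p2p_search(t))
```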
4.4 P2P Virus Protection
example:
[Figure: timeline (t) on which the strange file ILOVEYOU.vbs shows up on Peer 1, Peer 2, Peer 3 and Peer 4, together with a Virus Warning]
5. GRID Computing
Similar idea, similar concept as in P2P
For many scientific applications, high-performance data processing centers are needed. They are expensive to provide and often do not offer enough performance. Thus the idea was born to interconnect the existing data processing centers into the GRID distributed processing center.
                           P2P                                               GRID
History                    Sharing MP3 files & illegal content               Saving costs for data processing centers
Users                      Mainly private users                              Scientists
Typical Transfer Volume    Small (MP3) to medium (video)                     Huge (often terabytes)
Typical Service            File Sharing                                      Processing Sharing
Typical Problems           Huge number of users causes scalability issues    Transferring huge amounts of data
6. Problems in P2P Networking
Performance & Scalability
• see Gnutella example
• much more communication overhead than in client-server systems
• bandwidth is usually scarce
• if a peer is unreachable, TCP/IP can take up to several minutes to time out the connection
Free Riders
• see Gnutella example
• incentives for sharing content, for not misbehaving, ...
Interoperability
• there are a number of incompatible protocols
• there are no standards and there is no dominating protocol (yet)
Copyright Management
• how can I be sure that I am not downloading illegal stuff?
• how can the rights of the owners of intellectual property be enforced?
Problems in P2P Networking (2)
Trust
• how can I be sure that the document/software/movie... I am downloading is the one that it was announced as?
• how can I be sure that it is unchanged, does not contain viruses etc.?
• which community members can I trust?
• trust can be increased by
• decreasing the number of people that must be trusted
• reducing the risk
• components: message digest functions, digital certificates etc.
• see slides KNII: Security
⇒ Potential Solution: Reputation Mechanisms
• see eBay, PGP web of trust, ...
• eBay collects feedback about each eBay participant; users are encouraged to post feedback about the trade and rate their trading partner
• someone considering a trade can look into the trading partner’s eBay record
• difficult in a decentral network
• tradeoff with anonymity
Problems in P2P Networking (3)
Other security aspects
• authentication?
• protection against denial of service attacks?
• protection against tampering of files, search messages etc.?
• virus protection?
• P2P protocols are built to circumvent firewalls
• ...
⇒ There are a number of important unsolved problems in P2P networking
7. Annex: Literature about P2P
Andy Oram: Peer-To-Peer / Harnessing the Power of Disruptive Technologies, O’Reilly 2001
Kurt Tutschku: Management of Peer-To-Peer Networks
Gnutella Protocol Specification: http://dss.clip2.com/GnutellaProtocol04.pdf
Krishna Kant, Vijay Tewari: On the Potential of Peer-to-Peer Computing
c’t Computermagazin: P2P & Filesharing reports in 06/2001 & 26/2001
DFN Symposium presentation: www.dfn.de/projekte/symposium01/oertel_symposium2001_p2p/