The Architecture of the World Wide Web
Min Song
IS
NJIT
Internet Architecture
• Today’s Internet
  ◦ Thousands of networks
  ◦ Connected by legal agreements and commercial contracts
  ◦ Uses the TCP/IP protocol suite
  ◦ Internet service providers (ISPs)
    - Provide most individual users with access to the Internet
    - Dial-up connections
    - Modems and conventional phone lines
    - xDSL and cable modems provide broadband access
Packet Switching
• Most modern Wide Area Network (WAN) protocols, including TCP/IP, X.25, and Frame Relay, are based on packet switching
• Packet switching is more efficient and robust for data that can withstand some delays in transmission, such as e-mail messages and Web pages
• Circuit switching: normal telephone service is based on a circuit-switching technology
  ◦ A dedicated line is allocated for transmission between two parties
  ◦ Suited to data that must be transmitted quickly and must arrive in the same order in which it is sent
  ◦ Typical of real-time data, such as live audio and video
Use of Packets
Internet Protocols: TCP/IP
• Communications protocol suite
  ◦ Packet-switched protocol
    - No end-to-end connection is required
    - Each message is broken down into small pieces called packets
    - Packets may be routed to the destination over different paths
  ◦ Transmission Control Protocol (TCP)
    - Breaks messages into packets
    - Numbers packets in order
    - Reorders packets at the destination
  ◦ Internet Protocol (IP)
    - Routes packets to the proper destination
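The TCP steps listed above (break into packets, number them, reorder at the destination) can be sketched in a few lines. This is only an illustration of the idea, not how a real TCP stack works: the function names `packetize` and `reassemble`, the 8-byte packet size, and the shuffle that stands in for packets taking different paths are all assumptions made for this example.

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Each message is broken down into small pieces called packets."
packets = packetize(message, size=8)

# Simulate packets arriving out of order after taking different paths.
arrived = packets[:]
random.Random(0).shuffle(arrived)

restored = reassemble(arrived)
```

Because every packet carries its own sequence number, the receiver can rebuild the message no matter what order the packets arrive in.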
Domain Names
• Every computer connected to the Internet must have a unique IP address
  ◦ IP address format is xxx.xxx.xxx.xxx, where each xxx is a number between 0 and 255
• How do we know that 207.46.245.222 is Microsoft?
• Domain Name System (DNS)
  ◦ A database of Internet names
  ◦ DNS servers convert Internet names to IP addresses
  ◦ Top-level domains
• Ping: tests whether a particular host is reachable across an IP network
• Tcpdump: sniffs network packets so that the resulting dumps can be analyzed statistically
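The dotted-quad address format and the name-to-address lookup described above can be sketched as follows. `parse_ipv4` and `resolve` are hypothetical helper names chosen for this example. The first checks the "four numbers between 0 and 255" rule by hand, and the second simply asks the operating system's resolver, which in turn consults DNS.

```python
import ipaddress
import socket

def parse_ipv4(text: str) -> int:
    """Validate a dotted-quad address and return it as a 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    value = 0
    for part in parts:
        if not part.isdigit() or not 0 <= int(part) <= 255:
            raise ValueError(f"field out of range 0-255: {part!r}")
        value = (value << 8) | int(part)
    return value

def resolve(name: str) -> list[str]:
    """Ask the system resolver (and thus DNS) for a name's IPv4 addresses."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})

addr = parse_ipv4("207.46.245.222")
# The standard library's ipaddress module agrees with the hand-rolled parser.
assert addr == int(ipaddress.IPv4Address("207.46.245.222"))
```

Calling `resolve` with a host name needs network access, and the addresses it returns depend on the resolver and on when the query is made, so no output is shown here.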
The World Wide Web
• Collection of hyperlinked computer files on the Internet
• Client-server application
• Web servers
• Web browsers as clients
• WWW standards
  ◦ Hypertext Markup Language (HTML)
    - Current standard for writing Web pages
    - An application of SGML designed specifically for Web pages
    - Tags in HTML instruct the client browser how to format and display the Web page content
  ◦ Hypertext Transfer Protocol (HTTP)
    - Protocol that governs communication between Web server and client
  ◦ Extensible Markup Language (XML)
    - A meta-markup language
    - Gives meaning to the data enclosed within XML tags
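To illustrate how XML tags give meaning to the data they enclose, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The `order`, `customer`, and `item` tags and their attributes are invented for this example and do not come from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; the tag names are made up for this example.
document = """
<order id="1001">
  <customer>Ada Lovelace</customer>
  <item sku="A-17" quantity="2">Notebook</item>
  <item sku="B-02" quantity="1">Pen</item>
</order>
""".strip()

root = ET.fromstring(document)

# The tags, not position or styling, say what each value is.
customer = root.findtext("customer")
items = [(item.get("sku"), int(item.get("quantity")), item.text)
         for item in root.findall("item")]
```

A program can pick out the customer or the item quantities by name, which is not possible with HTML tags that only describe presentation.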
Static versus Dynamic Web Pages
• HTML and XML only display and exchange data
  ◦ No interactivity; no processing of data
• Scripting languages
  ◦ Provide basic interactivity
    - Rollovers
    - Crawling text
  ◦ JavaScript
  ◦ VBScript
• Full-featured Web programming
  ◦ Java
• Client-side scripting or browser-side scripting
  ◦ Applets
  ◦ J2EE
• Common Gateway Interface (CGI)
  ◦ Allows passing of data between a static HTML page and a computer program
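A minimal sketch of the CGI idea, assuming the usual convention that the Web server passes the URL's query string to the program in the `QUERY_STRING` environment variable and sends whatever the program produces back to the browser. The function name `handle_request` and the `name` parameter are hypothetical.

```python
import html
from urllib.parse import parse_qs

def handle_request(environ: dict[str, str]) -> str:
    """Build a CGI response (headers, blank line, body) from the environment."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = f"<TITLE>Hello</TITLE>\nHello, {html.escape(name)}!\n"
    # A CGI program writes headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body

# A Web server would set QUERY_STRING from a URL such as /cgi-bin/hello?name=Min
response = handle_request({"QUERY_STRING": "name=Min"})
```

Run as a real CGI script, the program would pass `os.environ` to the function and print the result to standard output. Escaping the value with `html.escape` keeps user input from being interpreted as markup.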
Searching the WWW
• Most data on the Internet is part of the WWW
• Search engines – large databases that index WWW content
• Building the search engine database
  ◦ Submit a site to the search engine administrator for listing
  ◦ Spiders
    - Google
    - Yahoo
  ◦ Metatags
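As a rough illustration of what a spider does with a page, the sketch below uses Python's standard `html.parser` module to collect metatags (for indexing) and links (for finding further pages). The class name `PageIndexer` and the sample page are assumptions for this example. Real search engines do far more than this.

```python
from html.parser import HTMLParser

class PageIndexer(HTMLParser):
    """Collect <meta name=... content=...> pairs and link targets from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attributes.get("content") or ""
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])

page = """
<html><head>
<title>Hello</title>
<meta name="keywords" content="web, architecture, http">
<meta name="description" content="Lecture notes on the Web">
</head><body>
<a href="/Test/hello.html">Hello</a> <a href="/about.html">About</a>
</body></html>
"""

indexer = PageIndexer()
indexer.feed(page)
```

After `feed`, `indexer.meta` holds the keywords and description a search engine could store, and `indexer.links` holds the pages the spider would visit next.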
Hypertext Transfer Protocol
• A protocol (syntax and semantics) for transferring representations of resources
  ◦ usually across the Internet using TCP
• Design goals
  ◦ speed (stateless, cacheable, few round trips)
  ◦ simplicity
  ◦ extensibility
  ◦ data (payload) independence
• A true network-based API
HTTP/0.9 (pre-1993)
• Absolute Simplicity
    Request:
      GET /url-path

    Response:
      <TITLE>Hello World</TITLE>
      Hello World
• No Extensibility
  ◦ only one method (GET)
  ◦ no request modifiers
  ◦ no response metadata
HTTP/1.0 (1993-present)
• Simple and (mostly) Extensible
    Request:
      GET /Test/hello.html HTTP/1.0
      Accept: text/html
      User-Agent: GET/5 libwww-perl/0.40

    Response:
      HTTP/1.0 200 OK
      Date: Fri, 12 Jan 1996 01:02:49 GMT
      Server: Apache/1.0.5
      Content-type: text/html
      Content-length: 38
      Last-modified: Wed, 10 Jan 1996 01:

      <TITLE>Hello</TITLE>
      Hello out there!
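The response above has three parts: a status line, header fields, and a body after a blank line. The sketch below splits a raw response into those parts. `parse_response` is a hypothetical helper and not a full HTTP parser. The truncated Last-modified header is left out of the sample, and the body is assumed to use bare line feeds, which makes it exactly the 38 bytes that Content-length announces.

```python
def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Split a raw HTTP response into version, status, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(status), reason, headers, body

raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Date: Fri, 12 Jan 1996 01:02:49 GMT\r\n"
    b"Server: Apache/1.0.5\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 38\r\n"
    b"\r\n"
    b"<TITLE>Hello</TITLE>\nHello out there!\n"
)

version, status, reason, headers, body = parse_response(raw)
```

The headers are the response metadata that HTTP/0.9 lacked: the client can learn the media type and length of the body before reading it.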
HTTP/1.0 Deficiencies
• No complete specification until end of '94
• No minimum standard for compliance
• Poor network behavior
  ◦ one request per connection
  ◦ no reliable transfer of dynamic content
  ◦ no control over response caching
  ◦ failed to anticipate proxies and gateways
  ◦ created huge demand for vanity addresses
  ◦ misuse/misunderstanding of MIME
HTTP/1.1
• Culmination of two years' work, RFC 2068
  ◦ with Henrik Frystyk, Jim Gettys, Jeff Mogul
  ◦ designed at UCI and W3C; expanded in IETF
• Improved Reliability
  ◦ chunked transfer of dynamic content
  ◦ recognition of proxy and gateway requirements
  ◦ explicit cacheability of responses
• Improved Network Behavior
  ◦ persistent connections
  ◦ virtual hosts (many names, one address)
HTTP/1.1 (1997-????)
• Less Simple, More Extensible, but Compatible
    Request:
      GET /Test/hello.html HTTP/1.1
      Host: kiwi.ics.uci.edu:8080
      User-Agent: GET/7 libwww-perl/5.40

    Response:
      HTTP/1.1 200 OK
      Date: Fri, 07 Jan 1997 15:40:09 GMT
      Server: Apache/1.2b6
      Content-type: text/html
      Transfer-Encoding: chunked
      Etag: "a797cd-465af"
      Cache-control: max-age=3600
      Vary: Accept-Language
      …
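With `Transfer-Encoding: chunked`, a server can start sending dynamic content before it knows the total length: the body arrives as a series of chunks, each preceded by its size in hexadecimal, and a zero-size chunk marks the end. A minimal decoder might look like the sketch below. `decode_chunked` is a hypothetical helper that assumes the whole body is already in memory, and it ignores any trailer fields after the last chunk.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked message body: hex size line, chunk data, CRLF, repeated."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:          # a zero-length chunk ends the body
            break
        body += data[pos:pos + size]
        pos += size + 2        # skip the chunk data and its trailing CRLF
    return bytes(body)

chunked = b"7\r\n<TITLE>\r\n5\r\nHello\r\n8\r\n</TITLE>\r\n0\r\n\r\n"
decoded = decode_chunked(chunked)
```

Here the three chunks of 7, 5, and 8 bytes are joined back into a single 20-byte body.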
HTTP/1.x Deficiencies
• MIME is too verbose (overhead per message)
• Control mixed with metadata
• Metadata restricted to header or trailer
• Fixed request/response ordering can block progress
• Incurs frequent round-trip delays due to connection establishment
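The cost of connection establishment can be estimated with a deliberately crude model. It assumes one round trip for the TCP handshake and one per request/response exchange, with requests made one after another. Slow start, server processing time, and parallel connections are ignored. The 100 ms round-trip time and the 10-resource page are made-up figures.

```python
def total_round_trips(requests: int, persistent: bool) -> int:
    """Round trips needed to fetch `requests` resources sequentially.

    Assumes 1 round trip for each TCP handshake and 1 per request/response.
    """
    handshakes = 1 if persistent else requests
    return handshakes + requests

rtt_ms = 100   # assumed round-trip time
page = 10      # assumed: one HTML page plus nine embedded images

per_request = total_round_trips(page, persistent=False) * rtt_ms
keep_alive = total_round_trips(page, persistent=True) * rtt_ms
```

Under these assumptions, opening a new connection for every request costs 2000 ms of round-trip delay, while one persistent connection costs 1100 ms.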
HTTP/2.x
• Tokenized transfer of common fields
  ◦ reducing bandwidth usage, latency
  ◦ removal of MIME syntax limitations
  ◦ self-descriptive for extensions
• Multiplexing control, data, metadata streams
  ◦ reducing desire for multiple connections
  ◦ enabling multi-protocol connections
  ◦ per-stream priority or credit mechanism
• Layered streams for meta-metadata, encryption...
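The multiplexing idea can be sketched as tagging each piece of data with a stream number, so that several streams share one connection and the receiver can sort them out again. The frame layout used here, a `(stream_id, payload)` tuple, is purely hypothetical and is not the wire format of any actual protocol. Priorities and credit mechanisms are left out.

```python
from collections import defaultdict
from itertools import zip_longest

def multiplex(streams: dict[int, bytes], frame_size: int) -> list[tuple[int, bytes]]:
    """Cut each stream into frames and interleave them round-robin."""
    per_stream = [
        [(sid, data[i:i + frame_size]) for i in range(0, len(data), frame_size)]
        for sid, data in streams.items()
    ]
    return [frame for group in zip_longest(*per_stream) for frame in group if frame]

def demultiplex(frames: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Rebuild each stream by concatenating its frames in arrival order."""
    out: dict[int, bytearray] = defaultdict(bytearray)
    for sid, payload in frames:
        out[sid] += payload
    return {sid: bytes(buf) for sid, buf in out.items()}

streams = {
    1: b"Content-type: text/html",                    # metadata stream
    2: b"<TITLE>Hello</TITLE>\nHello out there!\n",   # data stream
}
frames = multiplex(streams, frame_size=8)
```

Because frames from both streams alternate on the same connection, a long data stream does not hold up the metadata stream, and `demultiplex(frames)` recovers the original streams.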
XML to the rescue?
• “X” for extensible:
  ◦ self-descriptive syntax
  ◦ semantics by reference (doctype, namespaces)
  ◦ rendering by reference (style sheets)
• An XML representation is an object turned inside-out, with behavior-by-reference
• However, network application performance will demand standards for domain-specific doctypes and style sheets
Future Work
• Dynamic application architectures
• Architectural analysis and performance bounds
• Impact of future network architectures (ATM)
• Balancing secure transfer with firewall visibility
• Protocol for manipulating resource mappings
• HTTP-NG (W3C/Xerox PARC)
• rHTTP (UCI)