The Architecture of the World Wide Web Min Song IS


Internet Architecture
 Today’s Internet
Thousands of networks
Connected by legal agreements and commercial
Uses TCP/IP protocol
Internet service providers (ISPs)
Provide most individual users with access to the Internet
Dialup connections
Modems and conventional phone lines
xDSL and cable modems provide broadband access
Packet Switching
 Most modern Wide Area Network (WAN) protocols,
including TCP/IP, X.25, and Frame Relay, are based on
packet switching.
 Packet switching is more efficient and robust for
data that can withstand some delays in transmission,
such as e-mail messages and Web pages.
 Circuit switching: normal telephone service is based
on a circuit-switching technology
A dedicated line is allocated for transmission between
two parties.
Suited to data that must be transmitted quickly and must arrive in
the same order in which it is sent,
as with real-time data such as live audio and video.
Use of Packets
Internet Protocols: TCP/IP
 Communications protocol suite
Packet switched protocol
No end-to-end connection is required
Each message broken down into small pieces called packets
Packets possibly routed to destination over different paths
Transmission Control Protocol (TCP)
Breaks messages into packets
Numbers packets in order
Reorders packets at the destination
Internet Protocol (IP)
Routes packets to the proper destination
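The packet handling described above (break a message into packets, number them, reorder them at the destination) can be illustrated with a minimal sketch. This is not real TCP; the function names packetize and reassemble and the 8-byte packet size are assumptions chosen only for illustration.

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Break a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Reorder packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Packets may arrive out of order."
packets = packetize(message, size=8)   # hypothetical packet size
random.shuffle(packets)                # simulate packets taking different paths
restored = reassemble(packets)         # sequence numbers restore the order
```

Because every packet carries its own sequence number, the receiver does not depend on arrival order, which is what lets each packet be routed independently.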
Domain Names
 Every computer connected to the Internet must have
a unique IP address
IP address format is xxx.xxx.xxx.xxx, where each xxx is a
number between 0 and 255
 How do we know that a given IP address is Microsoft?
 Domain Name System (DNS)
A database of Internet names
DNS Servers convert Internet names to IP addresses
Top level domains
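As a rough sketch of the ideas on this slide, the Python below checks the dotted IPv4 format, extracts the top-level domain from a name, and asks the system resolver (which normally consults DNS) for an address. The helper names is_valid_ipv4, top_level_domain, and resolve are hypothetical; socket.gethostbyname is the standard-library call.

```python
import socket

def is_valid_ipv4(address: str) -> bool:
    """True if address has four dot-separated numbers, each 0 to 255."""
    parts = address.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and 0 <= int(p) <= 255 for p in parts
    )

def top_level_domain(name: str) -> str:
    """Return the last label of an Internet name, e.g. 'com' or 'edu'."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()

def resolve(name: str) -> str:
    """Convert an Internet name to an IPv4 address via the system resolver."""
    return socket.gethostbyname(name)
```

Calling resolve("www.example.com") needs network access, and the address it returns depends on the resolver, so no expected output is shown here.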
 Ping: to test whether a particular host is
reachable across an IP network.
 Tcpdump: to sniff network packets and run
some statistical analysis on the captured dumps
The World Wide Web
 Collection of hyperlinked computer files on the Internet
 Client-server application
 Web servers
 Web browsers as clients
 WWW standards
 Hypertext markup language (HTML)
Current standard for writing Web pages
Implementation of SGML specifically for Web pages
Tags in HTML instruct the client browser how to format and
display the Web page content
Hypertext transfer protocol (HTTP)
Protocol that establishes a connection between Web server and client
Extensible markup language (XML)
A meta-markup language
Gives meaning to the data enclosed within XML tags
Static versus Dynamic Web Pages
 HTML and XML only display and exchange data
 No interactivity; no processing of data
 Scripting languages
 Provide basic interactivity
Crawling text
 JavaScript
 VBScript
 Full-featured Web programming
 Java
 Client side scripting or browser side scripting
 Applets
 J2EE
 Common Gateway Interface (CGI)
 Allows passing of data between a static HTML page and a
computer program
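A minimal sketch of how a CGI program might receive data from an HTML form, assuming the Web server passes the form fields in the QUERY_STRING environment variable (as it does for GET requests). The function name handle_request and the form field "name" are hypothetical.

```python
import os
from html import escape
from urllib.parse import parse_qs

def handle_request(environ) -> str:
    """Build a CGI response (headers, blank line, body) from the query string."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]   # hypothetical form field
    # Escape user input before placing it in the generated page.
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # A Web server configured for CGI would run this script once per request.
    print(handle_request(os.environ), end="")
```

The program writes a header, a blank line, and then the generated page to standard output, and the server relays that output to the browser.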
Searching the WWW
 Most data on the Internet is part of the WWW
 Search engines – large databases that index WWW
 Building the search engine database
Submit a site to the search engine administrator for
Hypertext Transfer Protocol
 A protocol (syntax and semantics) for
transferring representations of resources
 usually across the Internet using TCP
 Design goals
 speed (stateless, cacheable, few roundtrips)
 simplicity
 extensibility
 data (payload) independence
 A true network-based API
HTTP/0.9 (pre-1993)
 Absolute Simplicity
GET /url-path
<TITLE>Hello World</TITLE>
Hello World
 No Extensibility
 only one method (GET)
 no request modifiers
 no response metadata
HTTP/1.0 (1993-present)
 Simple and (mostly) Extensible
GET /Test/hello.html HTTP/1.0
Accept: text/html
User-Agent: GET/5 libwww-perl/0.40
HTTP/1.0 200 OK
Date: Fri, 12 Jan 1996 01:02:49 GMT
Server: Apache/1.0.5
Content-type: text/html
Content-length: 38
Last-modified: Wed, 10 Jan 1996 01:
Hello out there!
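The request and response layout in the example above can be sketched in a few lines of Python. This is a simplified illustration, not a complete HTTP implementation; build_request and parse_response are hypothetical helper names, and the sketch assumes CRLF line endings with a blank line separating headers from the body.

```python
def build_request(path: str, headers: dict[str, str]) -> str:
    """Format an HTTP/1.0 GET request: request line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_response(raw: str):
    """Split a response into status code, reason phrase, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body
```

Header names are lowercased on parsing because HTTP header names are case-insensitive.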
HTTP/1.0 Deficiencies
 No complete specification until end of '94
 No minimum standard for compliance
 Poor network behavior
one request per connection
no reliable transfer of dynamic content
no control over response caching
failed to anticipate proxies and gateways
created huge demand for vanity addresses
misuse/misunderstanding of MIME
 HTTP/1.1: culmination of two years' work, RFC 2068
 with Henrik Frystyk, Jim Gettys, Jeff
 designed at UCI and W3C; expanded in
 Improved Reliability
 chunked transfer of dynamic content
 recognition of proxy and gateway
 explicit cacheability of responses
 Improved Network Behavior
 persistent connections
 virtual hosts (many names, one address)
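Virtual hosts (many names, one address) rely on the Host request header that HTTP/1.1 requires. The sketch below shows one way a server might map that header to a site; the SITES table, its paths, and the function document_root are invented for illustration.

```python
# Hypothetical mapping from host name to the directory serving that site.
SITES = {
    "www.example.com": "/srv/example",
    "blog.example.org": "/srv/blog",
}

def document_root(request_headers: dict[str, str],
                  default: str = "/srv/default") -> str:
    """Pick a site by the Host header, ignoring any port and letter case."""
    host = request_headers.get("Host", "").split(":")[0].lower()
    return SITES.get(host, default)
```

Without a Host header, a server that receives a request on a shared IP address cannot tell which name the client asked for, which is one reason HTTP/1.0 pushed sites toward dedicated addresses.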
HTTP/1.1 (1997-????)
 Less Simple, More Extensible, but Compatible
GET /Test/hello.html HTTP/1.1
User-Agent: GET/7 libwww-perl/5.40
HTTP/1.1 200 OK
Date: Fri, 07 Jan 1997 15:40:09 GMT
Server: Apache/1.2b6
Content-type: text/html
Transfer-Encoding: chunked
Etag: "a797cd-465af"
Cache-control: max-age=3600
Vary: Accept-Language
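The Transfer-Encoding: chunked header in the response above means the body is sent as a series of chunks, each prefixed by its size in hexadecimal and terminated by a zero-size chunk. The sketch below illustrates that framing under simplifying assumptions (no trailers, well-formed input); encode_chunked and decode_chunked are hypothetical names.

```python
def encode_chunked(pieces: list[bytes]) -> bytes:
    """Frame each piece as '<hex size>CRLF<data>CRLF', then a final 0 chunk."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would signal the end of the body
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked framing (trailers are not handled)."""
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Ignore any chunk extension that follows a ';' on the size line.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
```

This framing lets a server start sending dynamically generated content before it knows the total length, which is the reliability gap HTTP/1.0 left open.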
HTTP/1.x Deficiencies
 MIME is too verbose (overhead per message)
 Control mixed with metadata
 Metadata restricted to header or trailer
 Fixed request/response ordering can block
 Incurs frequent round-trip delays due to
connection establishment.
 Tokenized transfer of common fields
 reducing bandwidth usage, latency
 removal of MIME syntax limitations
 self-descriptive for extensions
 Multiplexing control, data, metadata streams
 reducing desire for multiple connections
 enabling multi-protocol connections
 per-stream priority or credit mechanism
 Layered streams for meta-metadata,
XML to the rescue?
 “X” for extensible:
 self-descriptive syntax
 semantics by reference (doctype,
 rendering by reference (style sheets)
 An XML representation is an object turned
inside-out, with behavior-by-reference
 However, network application performance
will demand standards for domain-specific
doctypes and style sheets
Future Work
 Dynamic application architectures
 Architectural analysis and performance
 Impact of future network architectures
 Balancing secure transfer with firewall
 Protocol for manipulating resource mappings
 HTTP-NG (W3C/Xerox PARC)