Measuring and Managing the Remote Client Perceived Response Time for Web Transactions using Server-side Techniques

David P. Olshefski

Submitted in partial fulfillment of the requirements for the degree of Doctor of Engineering Science in the Fu Foundation School of Engineering and Applied Science

COLUMBIA UNIVERSITY
2006

© 2006 David P. Olshefski
All Rights Reserved

ABSTRACT

As businesses continue to grow their dependence on the World Wide Web, it is increasingly vital for them to accurately measure and manage the response time of their Web services. This dissertation shows that it is possible to determine the remote client perceived response time for Web transactions using only server-side techniques, and that doing so is useful and essential for the management of latency-based service level agreements.

First, we present Certes, a novel modeling algorithm that accurately estimates connection establishment latencies as perceived by the remote clients, even in the presence of admission control drops. We present a non-linear optimization that models this effect and then we present an O(c) time and space online approximation algorithm. Second, we present ksniffer, an intelligent traffic monitor which accurately determines the pageview response times experienced by a remote client without any changes to existing systems or Web content. Novel algorithms for inferring the remote client perceived response time on a per pageview basis are presented which take into account network loss, RTT, and incomplete information. Third, we present Remote Latency-based Management (RLM), a system that controls the latencies experienced by the remote client by manipulating the packet traffic into and out of the Web server complex. RLM tracks the progress of each pageview download in real-time, as each embedded object is requested, making fine-grained decisions on the processing of each request as it pertains to the overall pageview latency. RLM introduces fast SYN and SYN/ACK retransmission and embedded object rewrite and removal techniques to control the latency perceived by the remote client.

We have implemented these mechanisms in Linux and demonstrate their effectiveness across a wide range of realistic workloads. Our experimental results show for the first time that server-side response time measurements can be done in real-time at gigabit traffic rates to within 5% of that perceived by the remote client. This is an order of magnitude better than common application-level techniques run at the Web server. Our results also demonstrate for the first time how both the mean and the shape of the per pageview client perceived response time distribution can be dynamically controlled at the server complex.

Contents

List of Figures
Acknowledgments
Chapter 1  Introduction
  1.1  Client Perceived Response Time
  1.2  Network Protocol Behavior
  1.3  Modeling/Managing Client Perceived Response Time Latencies
  1.4  Novel Contributions
Chapter 2  Modeling Latency of Admission Control Drops
  2.1  The Certes Model
  2.2  Mathematical Constructs of the Certes Model
  2.3  Fast Online Approximation of the Certes Model
    2.3.1  Packet Loss in the Network
    2.3.2  Client Frustration Time Out (FTO)
    2.3.3  SYN Flood Attacks
    2.3.4  Categorization
  2.4  Certes Linux Implementation
  2.5  Experimental Results
    2.5.1  Experimental Design
    2.5.2  Measurements and Results
  2.6  Certes Applied in Admission Control
  2.7  Shortcomings of the (strictly) Queuing Theoretic Approach
  2.8  Convergence
  2.9  Summary
Chapter 3  Modeling Client Perceived Response Time
  3.1  ksniffer Architecture
  3.2  ksniffer Pageview Response Time
    3.2.1  TCP Connection Setup
    3.2.2  HTTP Request
    3.2.3  HTTP Response
    3.2.4  Online Embedded Pattern Learning
    3.2.5  Embedded Object Processing
  3.3  Packet Loss
  3.4  Longest Prefix Matching
  3.5  Tracking the Most Active
  3.6  Experimental Results
  3.7  Summary
Chapter 4  Remote Latency-based Web Server Management
  4.1  RLM Architecture Overview
  4.2  RLM Pageview Event Node Model
  4.3  Connection Latency Management
  4.4  Transfer Latency Management
  4.5  Experimental Results
    4.5.1  Response Time Distribution
    4.5.2  Managing Connection Latency
    4.5.3  Managing Load and Admission Control
    4.5.4  Managing Transfer Latency
  4.6  Theoretical Analysis
  4.7  Alternative Approaches
  4.8  Summary
Chapter 5  Related Work
  5.1  Measuring Client Perceived Response Time
  5.2  Latency Management using Admission Control
  5.3  Web Server Based Approaches
  5.4  Content Adaptation
  5.5  TCP Level Mechanisms
  5.6  Packet Capture and Analysis Systems
  5.7  Services On-demand
  5.8  Stream-based Systems
  5.9  Internet Standards Activity
Chapter 6  Conclusion
  6.1  Future Work

List of Figures

1.1  Typical TCP client-server interaction.
1.2  Effect of SYN drops on client perceived response time.
1.3  Downloading a container page and embedded objects over multiple TCP connections.
1.4  Breakdown of client response time.
1.5  Pageview modeled as an event node graph.
2.1  Typical TCP client-server interaction.
2.2  Effect of SYN drops on client perceived response time.
2.3  Dropped SYN/ACK from server to client captured in SYN-to-END time.
2.4  Variance in RTT affects arrival time of retries.
2.5  Initial connection attempts that get dropped become retries three seconds later.
2.6  A second attempt at connection, that gets dropped, becomes a retry six seconds later.
2.7  After three connection attempts the client gives up.
2.8  Relationship between incoming, accepted, dropped, completed requests.
2.9  The smaller the interval, the more difficult to accurately discretize events.
2.10  Addition of network SYN drops to the model.
2.11  Certes implementation on a Linux Web server.
2.12  TCP/IP connection establishment on Linux.
2.13  TCP/IP outbound data transmission on Linux.
2.14  Experimental test bed.
2.15  Certes accuracy and stability in various environments.
2.16  Certes response time distribution approximates that of the client for Tests D and G.
2.17  Certes online tracking of the client response time in Tests A and G.
2.18  Certes online tracking of the client response time in Test J, in on-off mode.
2.19  Web server control manipulating the Apache accept queue limit.
2.20  Client response time increases as accept queue limit decreases.
2.21  Effect of SYN drop rate on client response time, as modeled as an M/M/1 queuing system.
2.22  Modeling as an M/M/1 queuing system fails to accurately track client perceived response time.
2.23  Using a sliding window of drop probabilities fails to capture all the dependences between time intervals.
2.24  Certes begins modeling at the 600th interval during a consistent load test.
2.25  Certes begins modeling at the 575th interval (in the middle of a peak) during a variable load test.
3.1  Downloading a container page and embedded objects over multiple TCP connections.
3.2  Multi-tiered server farm with a ksniffer monitor.
3.3  Typical libpcap based sniffer architecture (left) vs. the ksniffer architecture (right).
3.4  Objects used by ksniffer for tracking.
3.5  HTTP request/reply.
3.6  Downloading multiple container pages and embedded objects over multiple connections.
3.7  Client active pageviews.
3.8  Network and server dropped SYNs.
3.9  Algorithm for detecting network dropped SYNs from captured SYNs.
3.10  Experimental test bed.
3.11  Test F, pageviews per second.
3.12  Test F, mean pageview response time.
3.13  Client perceived response time on a per subnet basis.
3.14  Test V, mean pageview response time.
3.15  Test V, response time distribution.
3.16  Test X, mean pageview response time.
3.17  Live Internet Web site.
3.18  Apache measured response time, per URL.
3.19  Apache measured response time for loner pages.
4.1  RLM deployment.
4.2  Breakdown of client response time.
4.3  Pageview modeled as an event node graph.
4.4  SYN drops at the server.
4.5  Second connection in page download fails.
4.6  Fast SYN retransmission.
4.7  Fast SYN/ACK retransmission.
4.8  Cardwell et al. Transfer Latency Function f for 80 ms RTT and 2% loss rate.
4.9  Experimental test bed.
4.10  0.3 ms RTT, 0% loss.
4.11  80 ms RTT, 0% loss.
4.12  80 ms RTT, 4% loss.
4.13  Unmanaged heavy load.
4.14  MaxClients load shedding.
4.15  Low priority penalties.
4.16  Improvement from applying fast SYN retransmission.
4.17  Widening the think time gap.
4.18  Embedded object removal.
4.19  Embedded object rewrite.
4.20  Applying predicted elapsed time.
4.21  Full vs. Empty, mean pageview response time.
4.22  Steady state flow model.
4.23  Equation 4.27 for µ0 = λ0 = 1000.
4.24  Equation 4.27 for µ0 = λ0 = 1000, zoomed in on minima.
4.25  M/G/1 service latency 1/(µ0 − µ) overlayed onto Figure 4.24.
4.26  Capacity model.

Acknowledgments

My deepest gratitude goes toward Jason Nieh, my advisor, who led me through the dissertation. I will forever be amazed at his utterly relentless pursuit of excellence. He consistently overturned every stone, demanded thorough explanation and objective evaluation of every idea, and developed ideas well beyond that which I would reach on my own. He has developed and shaped my entire outlook on how to perform solid, meaningful research within the field of computer science. He led by example, not only as a researcher, but also as a person, whom I respect and admire.

I would like to thank my first advisor, Yechiam Yemini, who brought me into the Ph.D. program at Columbia University. I often think of YY and his insistence on developing ideas into something 'big'. His imagination and ability to see far down the road ahead has always struck me as unique and uncanny. Likewise, many thanks to Henning Schulzrinne and Vishal Misra for taking the time and energy to be on my thesis committee, making the event memorable (even for a 'hacker' such as myself).
There are many people at Columbia University who were a significant part of my graduate program. This includes my office mate Sushil DaSilva, who many times kept me laughing under the toils and stress of graduate life. Danilo Florissi was a mentor and friend, something I will always remember and treasure. Gong Su, Ioannis Stamos, Susan Tritto, Andreas Prodromidis, Maria Papadopouli (Bella Maria!), Alexandros Konstantinou, Martha Zadok, Ashutosh and Manu Dutta, Michael Grossberg, Ricardo Baratto, and Dan Phung have all helped me in numerous ways, leaving me with many fond memories as well.

Lastly, I owe a tremendous debt of gratitude to those individuals at IBM who made this possible. Michael Karasick, my manager, was the first to support me in my quest for higher education, both through encouragement and financial support. Dinesh Verma continued to support my degree program over the course of many years. John Tracey not only showed great patience and understanding by allowing me to spend time on thesis related work at the office, but also took the time to participate as a member of my thesis committee. Erich Nahum, coauthor and committee member, will always be a friend and mentor. My thanks to Li Zhang for his insights on modeling the steady state behavior of TCP connection establishment. Dakshi Agrawal, my friend and coauthor, could you burn me a few more of those CDs?

David P. Olshefski
Columbia University
May 2006

To 'Country Joe' and all those like him.

Chapter 1

Introduction

"A Web site that fails to deliver its content, either in a timely manner or not at all, causes visitors to quickly lose interest, wasting the time and money spent on the site's development." - Freshwater Software [147].

"Every Web usability study I have conducted since 1994 has shown the same thing: users beg us to speed up page downloads." - J. Nielsen, "The Need for Speed" [113].

"Some users and applications drive the revenue of the business.
If the system is slow, customers go elsewhere, and transactions or sales are lost forever." - P. Sevcik, Business Communications Review [144].

"Forty-one percent of consumers experiencing service failures at online retail sites say they will not shop at that site again." - Boston Consulting Group [70].

"Users perceive slow Web sites as being less secure and, as a result, are less likely to make a purchase due to fear of credit card theft." - Bhatti et al. [29].

These quotes and others like them indicate that Web sites need to manage their response times. The revenue generated by a Web site for a business depends on it: large amounts of revenue can be lost by a Web-based company if the response time of its Web site is too slow. Customers get frustrated and leave the site to conduct their business elsewhere. Worse yet, end users retain memories of such experiences and avoid slow Web sites when conducting future business. Once a customer is lost due to poor performance, he or she is difficult to regain. This translates into real dollars.

This need to manage Web response time has shifted the focus of Web server performance from throughput and utilization benchmarks [106, 24, 112] to guaranteeing delay bounds for different classes of clients [94, 159, 86, 125, 53, 8, 123, 42, 30]. Key to the effective management of response time is the accurate measurement of response time. Although an accurate measurement of response time is highly valued, it is difficult to obtain. Companies spend millions each year in the attempt, as is evident from the industry that has sprung up and flourished on the promise of obtaining true, accurate response times [69, 88, 98, 59, 149]. Unfortunately, Web-based companies are paying good money only to receive in return a poor substitute for the client perceived response time.

Real-time management of Web response time requires that client response time measurements be available in real-time.
Such real-time measurements can be an integral part of a closed-loop management system that manages resources or scheduling within the Web server complex. If the response time metrics are only available long after the transactions complete, then the control mechanism is obviously unable to affect the response time of those transactions. Likewise, both the measurement of response time and the corresponding control mechanisms must be able to function at high traffic rates so as to be applicable in today's Internet environment.

The term "response time" has itself become diluted, meaning a variety of different metrics to different people. To the system administrator it usually means the per-URL server response time. To the database administrator it means the per-query response time. To the network administrator it relates to the amount of available bandwidth. These definitions measure only a portion of the client perceived response time. In addition, no standard definition of "client perceived response time" for Web transactions exists, nor is there an RFC working group in the process of creating one. Existing research to date has misrepresented response time by incorrectly measuring or ignoring key latencies in the processing of a Web request. Management mechanisms are then based on and validated against these inaccurate latency measurements. We expose these long-standing shortcomings and present pragmatic solutions for addressing them.

From the remote client perspective, there is only one measure of response time that matters: how long it takes to download a Web page along with all its embedded objects. This definition includes latencies associated with network delays, server delays, retrieval of embedded objects, and the latencies experienced by the remote client due to his or her local machine.
It is this definition of response time we seek to measure and manage: the per-pageview response time, as perceived by the remote client.

Capturing and managing the per-pageview client perceived response time for a client located on the other side of the country (or planet) is non-trivial. This dissertation focuses on how to measure and manage the remote client perceived response time using only information available at the Web server. Our approach to tackling this problem is based on analyzing the packet streams into and out of the server complex. By reconstructing the activity across multiple network protocol layers, we are able to determine the per-pageview response time as perceived by the remote client, who may be physically located a great distance from the Web server. Our approach is non-invasive to existing systems: no modifications are required to the server complex or Web site content, making deployment fast and simple. Being a server-side approach, the resulting measurements are available in real-time and can be used to manage the response time as the Web page is being downloaded.

1.1 Client Perceived Response Time

Measuring and managing the remote client perceived response time ought to account for all latencies associated with a pageview download, from the perspective of the remote client. We first present an anatomical view of the client-server behavior that occurs when a Web client accesses a remote Internet Web site, considering the behaviors and interactions between the four key entities involved: client, Web browser, network, and server. Once a URL, such as http://www.cnn.com/index.html, is entered into a Web browser [99, 110], the following ten steps occur to download and display the Web page:

1. URL parsing. The client browser parses the URL to obtain the name of the remote host, www.cnn.com, from which to obtain the Web page, /index.html.
Web browsers maintain a cache of Web pages, so if the Web page is in cache and has not expired, processing can be performed locally and steps 2-7 below can be skipped.

2. DNS lookup. In order to contact the Web site (e.g., www.cnn.com), the browser must first obtain its IP address from DNS [100, 101]. Since the browser maintains a local cache containing the IP addresses of frequently accessed Web sites, contacting the DNS server for this information is only performed on a cache miss, which often implies that the Web site is being visited for the first time.

3. TCP connection setup. The client establishes a TCP connection with the remote Web server. Before a client can send the HTTP request to the Web server, a TCP connection must first be established via the TCP three-way handshake mechanism [8, 38]. First, the client sends a SYN packet to the server. Second, the server acknowledges the client's request for connection by sending a SYN/ACK back to the client. Third, the client responds by sending an ACK to the server, completing the process of establishing a connection. Note that if the client's Web browser already had an established TCP connection to the server and persistent HTTP connections [61, 28] are used, the browser may reuse this connection, skipping this step.

4. HTTP request sent. The browser requests the Web content, /index.html, from the remote site by sending an HTTP request over the established TCP connection.

5. HTTP request received. When the Web server machine receives an HTTP packet, the operating system determines which application should receive the message. The HTTP request is then passed to an HTTP server application, such as Apache, which is typically executing in user space.

6. HTTP request processed. The HTTP server application processes the request by obtaining the content from a disk file, CGI script, or other such program.

7. HTTP response sent.
The HTTP server application passes the content to the operating system, which, in turn, sends the content to the client. If the response is a disk file, the HTTP server may use the sendfile() system call to transfer the file to the client using only kernel-level mechanisms.

8. HTTP response processed. Upon receiving the response to the HTTP request, the client browser processes the Web content. If the content consists of an HTML page, the browser parses the HTML, identifies any embedded objects such as images, and begins rendering the Web page on the display.

9. Embedded objects retrieved. The browser opens additional connections to retrieve any embedded objects, allowing the browser to make multiple, simultaneous requests for the embedded objects. This parallelism helps to reduce overall latency. Depending on where the embedded objects are located, connections may be to the same server, other Web servers, or content delivery networks (CDNs). If the connections are persistent and the embedded objects are located on the same server, then several embedded objects will be obtained over each connection (HTTP 1.1 [61] or HTTP 1.0 with KeepAlive [28]). Otherwise, a new connection will be established for each embedded object (HTTP 1.0 without KeepAlive [28]).

10. Rendering. Once all the embedded objects have been obtained, the browser can fully render the Web page on the display, within the browser window.

As a remote client maneuvers through a Web site, this process repeats itself for each new pageview being downloaded. We define a pageview as the collection of objects used for displaying a Web page, usually consisting of the container page (an HTML file) and any embedded objects (i.e., GIFs, JPEGs, etc.) that may be associated with the container page. A pageview, of course, may not have embedded objects, as is the case for a pageview consisting of a single PostScript file.
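The parsing performed in step 8 can be illustrated with a short sketch using Python's standard html.parser module. This is not from the dissertation; the tag-to-attribute mapping below is a simplified assumption, not an exhaustive list of what real browsers scan for:

```python
from html.parser import HTMLParser

# Tags that typically reference embedded objects, mapped to the
# attribute holding the object's URL (an illustrative mapping only).
EMBED_ATTRS = {"img": "src", "script": "src", "link": "href"}

class EmbeddedObjectParser(HTMLParser):
    """Collects URLs of embedded objects found in a container page."""
    def __init__(self):
        super().__init__()
        self.objects = []

    def handle_starttag(self, tag, attrs):
        attr = EMBED_ATTRS.get(tag)
        if attr:
            for name, value in attrs:
                if name == attr and value:
                    self.objects.append(value)

page = ('<html><body><img src="obj1.gif"><img src="obj2.gif">'
        '<script src="s.js"></script></body></html>')
parser = EmbeddedObjectParser()
parser.feed(page)
print(parser.objects)   # -> ['obj1.gif', 'obj2.gif', 's.js']
```

After this parsing pass, the browser knows the set of embedded objects to request in step 9, typically in document order.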
A complete measure of the time to download and display a pageview would account for the time spent across all ten steps. The only way to completely and accurately measure the actual client perceived response time is to measure the response time on the client machine, within the Web browser itself. In addition, for such information to be of use to a Web site, the Web browser would need to support a mechanism for transmitting the response time measurements back to the Web site for use in verifying compliance with service-level objectives. Unfortunately, such browser functionality does not exist. As a result, several pragmatic approaches have been developed to determine client response time without requiring client browser modification. These approaches must be considered methods to estimate, rather than measure, client perceived response time, though some may be more accurate than others. For example, measuring response time within the Web server from the time the request arrives (step 5) to the time the response is transmitted (step 7) ignores the connection establishment and network transfer latencies. We show in Chapter 2 that this can be as much as an order of magnitude less than the response time perceived by the remote client. Likewise, the response time obtained via monitor machines, although covering steps 1 through 10, does not measure the response time for actual client transactions. The use of embedded JavaScript not only requires modifications to existing Web content but also fails to measure steps 1 through 8 (i.e., the download of the initial container page is not captured). In this dissertation, we present an approach for measuring all but the DNS lookup time and the final rendering step using server-side mechanisms that do not require modifications to existing systems. In this work, we assume that URL parsing and Web page rendering times are small and that DNS lookups are generally cached, reducing their impact on response time.
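The gap between a server-side measurement (steps 5-7) and the full client perceived time (steps 1-10) can be made concrete with a toy latency budget. All numbers below are illustrative assumptions, not measurements from this work:

```python
# Illustrative per-step latency budget for one pageview, in seconds.
rtt = 0.080                      # assumed client-server round-trip time
steps = {
    "url_parse": 0.001,          # step 1: local parsing
    "dns": 0.020,                # step 2: DNS lookup (on a cache miss)
    "connect": rtt,              # step 3: three-way handshake ~ 1 RTT
    "request_and_transfer": rtt, # steps 4, 7-8: request up, response back
    "server": 0.050,             # steps 5-6: server-side processing
    "render": 0.005,             # steps 8-10: local processing/rendering
}

client_perceived = sum(steps.values())
server_measured = steps["server"]            # only steps 5-7 are visible

print(round(client_perceived, 3))            # 0.236
print(round(client_perceived / server_measured, 1))  # ~4.7x larger
```

Even with no SYN drops at all, the server-side number understates what the client waits; with SYN retransmissions added (Section 1.2), the gap widens to multiple seconds.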
1.2 Network Protocol Behavior

Steps 1 through 10 describe the tasks performed by each key entity. Figure 1.1 illustrates how these tasks are performed by depicting the client and server interaction, at the packet level, that occurs in steps 3 through 8 over a single TCP connection.

[Figure 1.1: Typical TCP client-server interaction.]

Figure 1.1 depicts the download of a pageview with no embedded objects; shortly we address the case where embedded objects are present. First, the TCP connection is established via the TCP three-way handshake mechanism, corresponding to step 3. Step 4 is depicted when the client transmits a GET request to the server. The GET request makes its way through the Internet and arrives at the server (step 5). The Web server reads, parses, and processes the request (step 6). The Web server begins to transmit the response to the client in step 7. The response makes its way through the Internet and arrives at the remote client, where it is processed and rendered by the browser (step 8). The response is not fully processed or rendered until all segments of the response are received by the client.

The difficult challenge we address in this dissertation is how one can measure and manage the response time perceived by the remote client using events observed at the Web server. For example, the TCP three-way handshake that establishes the connection is processed entirely within the kernel, in the TCP stack. The HTTP GET request, on the other hand, is processed in user space by the HTTP server. A correlation between the activities in both kernel space and user space, across network protocol layers (IP, TCP and HTTP), is required.
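One way to picture such a correlation is a flow table keyed by the TCP connection's 4-tuple, associating kernel-level connection events with user-level HTTP events on the same connection. The following is a simplified illustration only; the event names and structure are assumptions, not the data structures actually used in this work:

```python
# Minimal server-side flow table: correlate kernel-level TCP events
# with HTTP-level events via the connection's 4-tuple.
# Times below are in milliseconds for exactness.
flows = {}

def on_syn(four_tuple, t):
    # The first SYN seen for this connection marks the client's start.
    flows.setdefault(four_tuple, {"syn": t})

def on_get(four_tuple, t):
    # The HTTP GET arriving on the same connection (user-space event).
    flows[four_tuple]["get"] = t

def on_last_data(four_tuple, t):
    # Last response byte sent; report (handshake, request/transfer) split.
    f = flows[four_tuple]
    f["end"] = t
    return (f["get"] - f["syn"], f["end"] - f["get"])

key = ("10.0.0.5", 3201, "192.0.2.1", 80)  # (client ip, port, server ip, port)
on_syn(key, 0)
on_get(key, 80)                   # GET arrives roughly one RTT later
print(on_last_data(key, 210))     # -> (80, 130)
```

The point of the sketch is the keying: both the in-kernel handshake and the user-space GET are attributed to the same logical connection, which is what makes cross-layer reconstruction at the server possible.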
In addition, events occurring at the Web server are ½ RTT (round trip time) removed in time from when the remote client experiences them. For example, the remote client perceives the arrival of the response to an HTTP GET request ½ RTT after the server transmits the response. Packet loss in the network, TCP retransmissions, variance in RTT, time spent waiting in kernel queues, etc. all have an effect on the response time latency, yet are invisible to the user space HTTP server. Figure 1.2 shows the same client-server interaction as in Figure 1.1, but in the presence of SYN drops at the server. SYN drops at the server commonly occur under server overload or due to admission control [148]. When the initial SYN is dropped, the server does not transmit the corresponding SYN/ACK packet. As a result, the client-side TCP waits 3 s, then retransmits the initial SYN to the server. If this retransmitted SYN is also dropped, the client-side TCP waits another 6 s, then retransmits the SYN. If that SYN is dropped, then the client-side TCP waits another 12 s before retransmitting the SYN. This is the well known TCP exponential back-off mechanism [33, 127], which causes the client-side TCP to double the wait time between SYN retransmissions. In Figure 1.2, the resulting effect is a 21 s connection establishment latency. Figure 1.2 clearly shows that dropping a SYN at the server does not represent a denial of access but rather a delay in establishing the connection. In other words, dropping a SYN at the server simply reschedules the connection establishment for a (near) future moment in time. The key observation is that the latency due to SYN drops/retransmissions is large relative to the time required to compose and transfer the HTTP response, and as such, will be the dominant factor in the overall client perceived response time. [Figure 1.2: Effect of SYN drops on client perceived response time.] This effect is exacerbated under HTTP 1.0 without KeepAlive, where each individual Web object requires the establishment of a separate TCP connection. Capturing this effect is crucial when measuring and managing response time, yet it has been ignored by other approaches [42, 43, 54, 89, 92, 91]. Server-side admission control mechanisms which perform load shedding by explicitly dropping connection requests ignore the effect that the dropped SYNs have on the response time perceived by the remote client. Only the response time for the accepted connections, once they are accepted, is collected and reported. The latencies associated with the SYN drops/retransmissions that occur prior to successful connection establishment are ignored. Any admission control mechanism which explicitly drops SYNs but then ignores these effects underestimates the client perceived response time and misrepresents the number of rejected requests. Modeling and quantifying the effect that SYN drops have on the client perceived response time is a major contribution of this dissertation. SYN drop latencies have significant implications not only for the individual client experiencing them but also for aggregate latency metrics. Often it is the mean, median or 95th percentile of the response time that is used in specifying the service level objective. For example, a service level objective may state that "the mean response time for high priority customers must be below 3 s" or that "the 95th percentile of the response time must be below 5 s".
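The backoff arithmetic behind the 21 s figure can be sketched as follows; this is a minimal illustration of the RFC 1122 retry schedule, and the function name is ours, not part of any system described here:

```python
def conn_fail_latency(drops, initial_timeout=3.0):
    """Time a client spends in TCP exponential backoff when its first
    `drops` SYNs are discarded by the server: 3 s before the 1st retry,
    6 s before the 2nd, 12 s before the 3rd, and so on."""
    return sum(initial_timeout * 2 ** j for j in range(drops))

# Three dropped SYNs, as in Figure 1.2: 3 + 6 + 12 = 21 s elapse before
# the fourth SYN is finally accepted.
latency = conn_fail_latency(3)
```

Note how quickly the penalty grows: a single drop adds 3 s, which already dwarfs a typical sub-second server response time.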
We will show in later chapters that the shape of the distribution of the client perceived response time is significantly impacted, in predictable ways, by SYN drop latencies. Prior work that ignores the shape of the response time distribution and validates an approach using only the mean response time misrepresents the effectiveness of its technique. Lastly, it is important not to lose sight of how Web browsers behave and what the client actually sees in the Web browser. Existing Web browsers open multiple connections to obtain embedded objects, in parallel, from the Web server. Whereas Figure 1.1 depicts the client-server interaction over a single TCP connection, Figure 1.3 depicts the pageview download of a container page plus embedded objects over multiple TCP connections. The beginning of the client perceived response time is the moment the initial SYN packet is transmitted from the client (t0), and the end of the response time is defined as the moment at which the client receives the last byte of the last embedded object within the page (te); the client perceived response time is therefore calculated as te − t0. [Figure 1.3: Downloading a container page and embedded objects over multiple TCP connections.] The connection establishment latencies depicted in Figure 1.2 are observed by the remote client differently depending on which connection is experiencing the SYN drops. If the first connection experiences SYN drops, the Web browser will display the message 'Connecting to server...' in the progress text message bar while the TCP exponential backoff mechanism is in effect. This is not the case when the SYN drops occur on the second connection.
Likewise, this is significantly different from the case when the connection is immediately established but the server then takes a significant amount of time to compose the response. In this case, the progress text message bar displays 'Waiting for response...' and, as such, the client may be willing to wait longer for a response, knowing that the server is busy working on a reply. Likewise, if portions of the pageview are slowly being obtained and rendered, a client may hit the stop button on the Web browser after enough of the pageview has been displayed. We refer to this as a partial page download. Facts such as these are important for a latency management system to understand: admission control drops ought to be coordinated given the current number of established connections the browser is maintaining. An effective server-side approach for measuring and managing the per pageview client perceived response time as depicted in Figure 1.3 must examine the activity between client and server at the packet level. Only by tracking activity at the packet level is it possible to capture the key latencies involved with each task in steps 3 through 10. This requires tracking, reconstructing and correlating the activity across multiple protocols (IP, TCP and HTTP), over multiple connections. Determining which embedded objects belong to which pageviews, and handling dropped SYNs and their impact on response time when they occur on either of the two connections depicted in Figure 1.3, is also required. An important novel contribution of this dissertation is to provide server-side mechanisms for solving these problems under high bandwidth rates, online, in real-time, in the presence of loss and incomplete information.

1.3 Modeling/Managing Client Perceived Response Time Latencies

A key observation is that whatever measure of latency a management approach relies on becomes the latency which gets managed.
Therefore, if a management approach simply tracks the server response time for a single URL (e.g. ty − tx in Figure 1.3), then this, of course, is what the management technique ends up actually controlling. In this dissertation we take a per pageview approach to response time management instead of a per URL approach. [Figure 1.4: Breakdown of client response time.] Figure 1.4 depicts the response time of te − t0 for the pageview download of index.html, which embeds obj3.gif, obj6.gif and obj8.gif. The figure is annotated with the following terms:

1. Tconn: TCP connection establishment latency, using the TCP 3-way handshake. Begins when the client sends the TCP SYN packet to the server.

2. Tserver: latency for the server complex to compose the response by opening a file, or calling a CGI program or servlet. Begins when the server receives an HTTP request from the client.

3. Ttransfer: time required to transfer the response from the server to the client. Begins when the server sends the HTTP response header to the client.

4. Trender: time required for the browser to process the response, such as parse the HTML or render the image. Begins when the client receives the last byte of the HTTP response.

These four latencies are serialized over each connection and delimited by specific events. As such, a pageview download can be viewed as a set of well defined activities required to complete the pageview. Figure 1.5 depicts the download of Figure 1.4 as an event node graph, where each node represents a state, and each link indicates a precedence relationship and is labeled with the transition activity.
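The event node view can be made concrete with a small sketch: the pageview is a directed acyclic graph whose edges are the activities above, and the client perceived response time is the completion time of the last node. The node names and millisecond weights below are illustrative only (loosely following the first connection of Figure 1.5), not measurements from the dissertation:

```python
from collections import defaultdict

def finish_times(edges, start):
    """edges: (src, dst, latency_ms) triples forming a DAG. Returns the
    earliest completion time of each node; activities on different
    connections overlap, so a node waits for its slowest predecessor."""
    adj, indeg = defaultdict(list), defaultdict(int)
    for s, d, w in edges:
        adj[s].append((d, w))
        indeg[d] += 1
    done, queue = {start: 0}, [start]
    while queue:                       # Kahn-style topological sweep
        n = queue.pop()
        for d, w in adj[n]:
            done[d] = max(done.get(d, 0), done[n] + w)
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return done

edges = [
    ("syn", "conn1", 75),        # Tconn, first connection
    ("conn1", "page", 1300),     # Tserver + Ttransfer for index.html
    ("page", "parsed", 10),      # Trender: browser parses the HTML
    ("parsed", "conn2", 75),     # Tconn, second (parallel) connection
    ("parsed", "obj_a", 700),    # embedded object on connection 1
    ("conn2", "obj_b", 900),     # embedded object on connection 2
]
t = finish_times(edges, "syn")
pageview_rt = max(t["obj_a"], t["obj_b"])   # te - t0, in ms
```

In this toy graph the second connection is the critical path (its object finishes last), so a management action that shortens only the first connection's transfer would not reduce te − t0 at all.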
The nodes in the graph are ordered by time and each node is annotated with the elapsed time from the start of the transaction. Each activity contributes to the overall response time; certain activities overlap in time, some activities have greater potential to add larger latencies than others, some activities are on the critical path, and some activities are more difficult to control than others. It is exactly this perspective which is missing in management systems to date. Existing systems fail to see the context in which an individual URL is being requested and its relationship to the overall pageview latency. Systems perform admission control on individual URL requests, or classify all URL requests from the same client into the same class, without regard to the current state of the pageview download. Managing the critical path high latency activities, in the context of the current state of the pageview download, is a novel approach presented in this dissertation. We present a system capable of tracking a pageview download, online as it occurs, and performing packet manipulations that manage the response time perceived by the remote client. These techniques are applied in the context of an individual page download to achieve a specified absolute response time goal, and can be triggered by an elapsed time threshold or a prediction of the latency associated with an embedded object. [Figure 1.5: Pageview modeled as an event node graph.] The rest of this dissertation is outlined as follows. In Chapter 2 we present Certes [116, 117], our initial research into understanding and quantifying the effects of SYN drops on TCP connection establishment latency. The result is a novel modeling algorithm capable of determining the effect of SYN drops on TCP connection establishment using only a set of simple counters. Certes was implemented within the Linux kernel on the Web server and shown to be highly accurate for a variety of workloads, under conditions commonly found in the Internet. In Chapter 3 we present ksniffer [118], a novel intelligent traffic monitor capable of determining the per pageview response times experienced by a remote client. Implemented as an appliance that is placed in front of the Web server, it uses novel algorithms for inferring the remote client perceived response time on a per pageview basis, online in real-time. In Chapter 4 we present Remote Latency-based Management (RLM) [115], which manages the latencies experienced by the remote client by manipulating the packet traffic into and out of the Web server complex.
RLM tracks the progress of each page download in real-time, as each embedded object is requested, allowing us to make fine grained decisions on the processing of each request as it pertains to the overall pageview latency. Related work is presented in Chapter 5. The last chapter of this dissertation consists of concluding remarks and directions for future work.

1.4 Novel Contributions

This dissertation addresses the challenge of measuring and managing per pageview response time, as perceived by the remote client, using only information observed at the Web server. We delve into the issues related to admission control drops and their effect on response time latency, as well as server-side packet-level manipulation techniques for managing client perceived response times. Many of our techniques generalize to the problem of determining transaction level latencies for protocols that consist of a set of implicitly correlated requests. Specifically, in the context of HTTP over TCP/IP, this dissertation presents:

1. An understanding and quantification of the effects of SYN drops on TCP connection establishment latency. We present a non-linear optimization model of the effect of the TCP exponential backoff mechanism on TCP connection establishment latency, then devise an O(c) online algorithm for approximating the non-linear model. We present a kernel level design and implementation of the fast online algorithm. We experimentally validate the fast online algorithm, showing results accurate to within 5% of the latencies measured at the client.

2. An architecture, design and implementation of a scalable, high speed traffic monitor capable of determining the remote client perceived response time on a per pageview basis. We present online algorithms for correlating HTTP requests over multiple TCP connections in the presence of ambiguity and incomplete information.
This includes an algorithm for online, incremental, embedded pattern learning in the presence of ambiguity and incomplete information, and an algorithm for inferring the existence of SYN or SYN/ACK packets that are dropped in the network and never captured by the traffic monitor. We experimentally validate our implementation of the high speed traffic monitor, showing that it is possible to determine the remote client perceived response time at near gigabit rates to within 5% error by only analyzing the packet stream into and out of the Web server complex.

3. The design and implementation of an inline traffic monitor capable of managing the client perceived response time through manipulation of the packet stream between the remote client and Web server. We present a novel model for tracking an individual pageview download, based on modeling the pageview activities as an event node graph. We present a study which examines how common Web browsers behave under certain failure conditions, such as admission control SYN drops, and how this affects the response time perceived by the client. This led us to two novel techniques, Fast SYN and Fast SYN/ACK retransmission, for managing the latencies associated with admission control drops. We experimentally validate the event node model, showing that it is possible to manage the shape of the response time distribution using the Fast SYN and Fast SYN/ACK retransmission techniques.

Chapter 2 Modeling Latency of Admission Control Drops

Certes (CliEnt Response Time Estimated by the Server) represents the results of our initial investigation into measuring TCP latencies, with a focus toward the behavior and effects of the TCP exponential backoff mechanism on TCP connection establishment.
The result was a novel, online mechanism that accurately estimates mean client perceived response time using only information available at the Web server. Certes combines a model of TCP retransmission and exponential back-off mechanisms with three simple server-side measurements: connection drop rate, connection accept rate, and connection completion rate. The model and measurements are used to quantify the time due to failed connection attempts and determine their effect on mean client perceived response time. Certes also measures both time spent waiting in kernel queues and time to retrieve requested Web data. It achieves this by going beyond application-level measurements, using a kernel-level measure of the time from the very beginning of a successful connection until it is completed. Existing admission control mechanisms which perform service differentiation based on connection throttling have failed to address these effects. Our approach does not require probing or third party sampling, and does not require modification of Web pages, HTTP servers, or clients. Certes uses a model that is inherently able to decompose response time into various server and network components, to help determine whether server or network providers are responsible for performance problems. Certes can be used to measure response times for any Web content, not just HTML. Certes runs online in constant time with very low overhead and can be used at Web sites and server farms to verify compliance with service level objectives. Figure 2.1 depicts the typical TCP interaction between the remote client and server, while Figure 2.2 depicts the same situation under conditions of server SYN drops (due to admission control or overload). Modeling the latency associated with the dropped SYNs, depicted as CONN-FAIL in Figure 2.2, is the main contribution of Certes.
The problem arises when a server drops a SYN: no information concerning the individual SYN drop is maintained. Therefore, when a server accepts a SYN and processes a connection, the server is unaware of how many failed connection attempts have been experienced by the client prior to this successful attempt. Maintaining state for each SYN drop is not an option, for scalability reasons. This would require the server to commit a portion of memory to track each SYN drop for a significant period of time (up to tens of seconds). Overload conditions or transient spikes could exhaust memory. In addition, this would make the server vulnerable to SYN flood attacks, where malicious remote clients send large numbers of SYN packets to the server without the intent of ever establishing a connection. In either case the server would be tying up large amounts of memory for unestablished connections at the exact point in time at which the server is already over-utilized. Certes models the CONN-FAIL latency without maintaining state for each individual SYN drop. We have implemented Certes and verified its response time measurements against those obtained via detailed client-side instrumentation. [Figure 2.1: Typical TCP client-server interaction.] Our results demonstrate that Certes provides accurate server-based measurements of mean client response times in HTTP 1.0/1.1 environments, even with rapidly changing workloads. Our results show that Certes is particularly useful under overloaded server conditions, when Web server application-level and kernel-level measurements can be grossly inaccurate.
We further demonstrate the need for Certes measurement accuracy in Web server control mechanisms that manipulate inbound kernel queuing or that perform admission control to achieve response time goals. This chapter is outlined as follows. Section 2.1 presents an overview of the Certes approach, followed by Section 2.2, which presents the detailed mathematical construction of the Certes model. Section 2.3 presents a fast online approximation of the non-linear maximization presented in Section 2.2. Section 2.4 describes our implementation of Certes within the Linux kernel. Section 2.5 presents experimental results demonstrating the effectiveness of Certes in estimating mean client perceived response time at the server with various dynamic workloads, for both HTTP 1.0/1.1. [Figure 2.2: Effect of SYN drops on client perceived response time.] Section 2.6 presents the problem associated with existing admission control mechanisms and how Certes effectively solves it. Section 2.7 compares Certes to the less effective queuing theoretic approach, and Section 2.8 shows that Certes converges rapidly. Before ending this chapter we summarize our findings.

2.1 The Certes Model

The novel contribution of Certes is to provide a server-side measure of mean client perceived response time that includes the impact of failed TCP connection attempts on Web server performance. To simplify our discussion and focus on the issue of failed TCP connection attempts, we make the following assumptions: 1.
We focus on measuring the response time due to TCP connection setup through retrieving embedded objects, steps 3 through 9 in Chapter 1. We do not consider steps 1, 2, and 10. We assume that URL parsing and Web page rendering times are small and that DNS lookups are generally cached, reducing their impact on response time. 2. We focus on determining the contribution to client perceived response time due to the performance of a given Web server. We do not quantify delays that may be due to Web objects residing on other servers or CDNs. 3. We limit our discussion to an estimate of response time based on the duration of a TCP connection. For non-persistent connections, where each HTTP request uses a separate TCP connection, this estimate corresponds to measuring the response time for individual HTTP requests. For persistent connections, where multiple HTTP requests may be served over a single connection, this estimate may include the time for multiple requests. Since a Web page with embedded objects requires multiple HTTP requests in order to be fully displayed, determining the response time for downloading a Web page requires correlating the response times of multiple HTTP requests, which we discuss further in Chapter 3. Given these assumptions, a measure of client-perceived response time should include the time starting from when the first SYN packet is sent from the client to the server until the last HTTP response data packet is received from the server by the client. For a given connection, we define CONN-FAIL as the time between when the first SYN packet is sent from the client and when the last SYN packet is sent from the client (Figure 2.2). This is the time due to failed TCP connection attempts. When there are no failed connection attempts, CONN-FAIL is zero.
For a given connection, we define SYN-to-END as the time from when the server receives the last SYN packet until the server sends the last data packet. This is essentially the server's perception of response time in the absence of SYN drops. The client perceived response time is the sum of CONN-FAIL and SYN-to-END plus one round trip time (RTT), to account for the time it takes to send the SYN packet from the client to the server plus the time it takes to send the last data packet from the server to the client. The client perceived response time over the connection is:

CLIENT_RT = CONN-FAIL + SYN-to-END + RTT    (2.1)

Determining client perceived response time then reduces to determining CONN-FAIL, SYN-to-END, and RTT. Note that any failure to complete the 3-way handshake after the SYN is accepted by the server is captured by SYN-to-END. For example, delays caused by dropped SYN/ACKs from the server to the client (the second part of the 3-way handshake) are accounted for in the SYN-to-END time (as shown in Figure 2.3). The equation also holds if the server terminates the connection before sending any data by sending a FIN or RST. [Figure 2.3: Dropped SYN/ACK from server to client captured in SYN-to-END time.] Determining the SYN-to-END component of the client perceived response time is relatively straightforward. The SYN-to-END time can be decomposed into two components: the time taken to establish the TCP connection after receiving the initial SYN, and the time taken to receive and process the HTTP request(s) from the client.
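As a minimal worked instance of Equation 2.1 (the numbers are illustrative; the closed form 3(2^d − 1) for CONN-FAIL assumes the RFC 1122 retry schedule of 3 s, 6 s, 12 s, ... and that exactly the first d SYNs are dropped):

```python
def client_rt(syn_drops, syn_to_end, rtt):
    """Equation 2.1: CLIENT_RT = CONN-FAIL + SYN-to-END + RTT.
    A connection whose first `syn_drops` SYNs were discarded accrues
    CONN-FAIL = 3 + 6 + ... + 3 * 2**(syn_drops - 1) = 3 * (2**syn_drops - 1)."""
    conn_fail = 3.0 * (2 ** syn_drops - 1)
    return conn_fail + syn_to_end + rtt

# No drops: the server-visible SYN-to-END plus one RTT (0.6 s here).
# Three drops: the same transaction now takes 21 s longer at the client,
# invisible to a purely server-side measurement.
fast = client_rt(0, syn_to_end=0.5, rtt=0.1)
slow = client_rt(3, syn_to_end=0.5, rtt=0.1)
```

The contrast between `fast` and `slow` is exactly the gap between the server's measured response time and what the remote client actually experiences.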
In certain circumstances, for example when the Web server is lightly loaded and the data transfer is large, the first component of the SYN-to-END time can be ignored, and the second component can be used as an approximation to the processing time spent in the application-level server. In such cases, measuring the processing time in the application-level server can provide a good estimate of the SYN-to-END time. In general, however, the processing time in the application-level server is not a good estimate of the SYN-to-END time. If the Web server is heavily loaded, it may delay sending the SYN/ACK back to the client, or it may delay delivering the HTTP request from the client to the application-level server. In such cases, the time to establish the TCP connection may constitute a significant component of the SYN-to-END time. Thus, to obtain an accurate measure of the SYN-to-END time, measurements must be done at the kernel level. A simple way to measure SYN-to-END is by timestamping in the kernel when the last SYN packet is received by the server and when the last data packet is sent from the server. If the kernel does not already provide such a packet timestamp mechanism, it can be added with minor modifications. Section 2.4 describes in further detail how we measured SYN-to-END for our Certes Linux implementation. Determining the RTT component of the client perceived response time is also relatively straightforward. RTT can be determined at the server by measuring the time from when the SYN/ACK is sent from the server to the time when the server receives the ACK back from the client. The RTT measured in this way includes the time spent by the client in processing the SYN/ACK and preparing its reply, as is the norm for TCP. Other approaches for estimating RTT can also be used [7]. For both SYN-to-END and RTT measurements, the kernel at the Web server must provide the respective timestamps.
As discussed in Section 2.4, these timestamps can be added with minor modifications. Determining CONN-FAIL, however, is a difficult problem. When a server accepts a SYN and processes the connection, the server is unaware of how many failed connection attempts have been made by the client prior to this successful attempt. The TCP header [127] and the data payload of a SYN packet do not provide any indication of which attempt the accepted SYN represents. As a result, the server cannot examine the accepted SYN to determine whether it is an initial attempt at connecting, a first retry, or an Nth retry. Even in the cases where the server is responsible for dropping the initial SYN and causing a retry, it is difficult for the server to remember the time the initial SYN was dropped and correlate it with the eventually accepted SYN for a given connection. For such a correlation, the server would be required to retain additional state for each dropped SYN at precisely the time when the server's input network queues are probably near capacity, which could result in performance scalability problems for the server. In Chapter 3 we present an approach that is able to maintain persistent state for each connection request by offloading this processing from the Web server to a separate appliance. Certes solves this problem by taking advantage of two properties of server mechanisms for handling SYNs. First, since the server cannot distinguish whether a SYN packet is an initial attempt or an Nth retry, it must treat them all equally. Second, it is easy for a server to simply count the number of SYNs that are dropped versus accepted, since this requires only a small amount of state.
As a result, Certes can compute the probability that a SYN is dropped and apply that probability equally to all SYNs during a given time period to estimate the number of SYN retries that occur. This information is then combined with an understanding of the TCP exponential backoff mechanism to correlate accepted SYNs with the number of SYN drops that occurred, to determine how many retries were needed before establishing a connection. Certes can then determine CONN-FAIL based on how many retries were needed and the amount of time necessary for those retries to occur. In particular, due to the TCP timeout and exponential backoff mechanisms specified in RFC 1122 [33], the first SYN retry occurs 3 seconds after the initial SYN, the second SYN retry occurs 6 seconds after the first retry, the third SYN retry occurs 12 seconds after the second retry, etc. Certes does assume that all clients adhere to this exact exponential behavior on SYN retries from RFC 1122. This is a reasonable assumption given that RFC 1122 is supported by all major operating systems, including Microsoft operating systems [99], Linux [133], FreeBSD [65], NetBSD 1.5 [108], AIX 5.x, and Solaris. OneStat.com [119] estimates that 97.46% of the Web server accesses on the Internet are from users running a Windows operating system. They attribute the rest to Macintosh and Linux users (1.43% and .26%, respectively). Section 2.2 presents a detailed step-by-step construction of the Certes model. In particular, we discuss the impact of the variance of RTT on when retries arrive at the server and how Certes accounts for this variability. Section 2.3 describes how the Certes model can be implemented efficiently, yielding good response time results.
2.2 Mathematical Constructs of the Certes Model

Certes determines the mean client perceived response time by accounting for CONN-FAIL using a statistical model that estimates the number of first, second, third, etc., retries that occur during a specified time interval. Certes divides time into discrete intervals for grouping connections by their temporal relationship. Without loss of generality, we will assume that time is divided into one second intervals, but in general any interval size less than the initial TCP retry timeout value of three seconds may be used. For ease of exposition, let m = 3 be the number of discrete time intervals that occur during the initial TCP retry timeout value of three seconds. Certes determines the number of retries that occurred before a SYN is accepted by using simple counters to take three aggregate server-side measurements for each time interval. The measurements are:

DROPPED_i: the total number of SYN packets that the server dropped during the ith interval.
ACCEPTED_i: the total number of SYN packets that the server did not drop during the ith interval.
COMPLETED_i: the total number of connections that completed during the ith interval.

Using these three measurements, we can compute for a given interval the offered load at the server, which is the number of SYN packets arriving at the server. The offered load in the ith interval is:

OFFERED_LOAD_i = ACCEPTED_i + DROPPED_i    (2.2)

Certes decomposes each of these measured quantities, OFFERED_LOAD_i, DROPPED_i, ACCEPTED_i, and COMPLETED_i, as a sum of terms associated with connection attempts. Let R_i^j be the number of SYNs that arrived at the server as a jth retry during the ith interval, starting with R_i^0 as the number of initial attempts to connect to the server during interval i. Let D_i^j be the number of SYNs that arrived at the server as a jth retry during the ith interval but were dropped by the server.
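The per-interval state Certes maintains amounts to a few integers; a sketch of the counters and Equation 2.2 (the class and field names are ours, for illustration):

```python
from dataclasses import dataclass

@dataclass
class IntervalCounters:
    dropped: int     # DROPPED_i: SYNs dropped in interval i
    accepted: int    # ACCEPTED_i: SYNs not dropped in interval i
    completed: int   # COMPLETED_i: connections completed in interval i

    @property
    def offered_load(self):
        """Equation 2.2: every SYN arriving in the interval was either
        accepted or dropped, so their sum is the offered load."""
        return self.accepted + self.dropped

    @property
    def drop_rate(self):
        """Per-interval SYN drop probability (Equation 2.6, used below)."""
        ol = self.offered_load
        return self.dropped / ol if ol else 0.0

iv = IntervalCounters(dropped=30, accepted=70, completed=55)
```

This constant-size record per interval is what allows the model to run online in O(c) time and space.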
Let A_i^j be the number of SYNs that arrived at the server as a jth retry during the ith interval and were accepted by the server. Let C_i^j be the number of connections completed during the ith interval that were accepted by the server as a jth retry. Let k be the maximum number of retries attempted by any client. For each interval i we have the following decomposition:

OFFERED_LOAD_i = Σ_{j=0}^{k} R_i^j
DROPPED_i = Σ_{j=0}^{k} D_i^j
ACCEPTED_i = Σ_{j=0}^{k} A_i^j
COMPLETED_i = Σ_{j=0}^{k} C_i^j    (2.3)

For each time interval i, Certes determines the mean client perceived response time for those Web transactions that are completed during the time interval. This includes both connections that are completed during the time interval as well as connections that give up during the interval after exceeding the maximum number of retries attempted by the client. COMPLETED_i is the number of transactions that completed during the ith interval and R_i^{k+1} is the number of clients that gave up during the interval. Applying Equation 2.1 to a time interval, Certes computes the mean client response time for the ith interval as:

CLIENT_RT_i = [ R_i^{k+1} · 3(2^{k+1} − 1) + Σ_{j=1}^{k} C_i^j · 3(2^j − 1) + Σ SYN-to-END + Σ RTT ] / [ COMPLETED_i + R_i^{k+1} ]    (2.4)

Equation 2.4 essentially divides the sum of the response times by the number of transactions to obtain the mean response time. In the denominator, Equation 2.4 sums the total number of transactions that completed and clients that gave up. In the numerator, there are four terms summed together. The first term, R_i^{k+1} · 3(2^{k+1} − 1), is the amount of time that clients waited before giving up, based on the TCP exponential backoff mechanism. The second term, Σ_{j=1}^{k} C_i^j · 3(2^j − 1), represents the total CONN-FAIL time experienced by those clients that completed in the ith interval. The third term, Σ SYN-to-END, is the sum of the measured SYN-to-END times for all transactions completed in the ith interval.
The fourth term, Σ RTT, is the sum of one round trip time for each transaction completed during the i-th interval. For example, if k = 2, then Equation 2.4 reduces to:

    CLIENT_RT_i = [ Σ SYN-to-END + Σ RTT + 21 R_i^{k+1} + 9 C_i^2 + 3 C_i^1 ] / [ COMPLETED_i + R_i^{k+1} ]    (2.5)

Here C_i^1 is the number of clients that waited an additional 3 seconds due to a SYN drop, C_i^2 is the number of clients that waited an additional 9 seconds due to two SYN drops, and R_i^{k+1} is the number of clients that gave up after waiting 21 seconds.

To compute the mean client perceived response time for each interval, Certes uses Equation 2.3 to derive the values of C_i^j and R_i^{k+1} from the measured quantities OFFERED_LOAD_i, DROPPED_i, ACCEPTED_i, and COMPLETED_i. We start from the observation that the TCP header [127] and the data payload of a SYN packet do not provide any indication of which connection attempt a dropped SYN represents. As a result, the server's TCP implementation cannot distinguish a SYN packet containing a j-th retry from a SYN packet containing a k-th retry. This implies that all classes of SYN packets are dropped or accepted with equal probability. The mean SYN drop rate at the server for the i-th interval can be computed from OFFERED_LOAD_i and DROPPED_i:

    DR_i = DROPPED_i / OFFERED_LOAD_i    (2.6)

A key hypothesis of Certes is that the drop rate must therefore be equal for all R_i^j in the i-th interval. This results in the following relations between R_i^j and D_i^j:

    D_i^j = DR_i · R_i^j,  j = 0, 1, ..., k    (2.7)

Each individual connection that completes during the i-th interval was accepted during the (i − SYN-to-END)-th interval. Because each connection may have a different SYN-to-END time, connections that complete during the i-th interval may have been accepted during different intervals.
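As a concrete illustration, the per-interval average of Equation 2.4 can be sketched in a few lines of Python; the function name and argument layout are ours, not part of Certes:

```python
def mean_client_rt(k, gave_up, C, sum_syn_to_end, sum_rtt, completed):
    """Mean client perceived response time for one interval (Equation 2.4).

    k              -- maximum number of retries a client will attempt
    gave_up        -- R_i^{k+1}, clients abandoning after the k-th retry
    C              -- C[j] = completions accepted as a j-th retry, j = 1..k
    sum_syn_to_end -- sum of server-measured SYN-to-END times (seconds)
    sum_rtt        -- one round trip time summed over completed transactions
    completed      -- COMPLETED_i
    """
    # 3(2^j - 1) seconds: cumulative TCP SYN retransmission backoff delay
    backoff = lambda j: 3 * (2 ** j - 1)
    total = gave_up * backoff(k + 1)                          # wait before giving up
    total += sum(C[j] * backoff(j) for j in range(1, k + 1))  # CONN-FAIL time
    total += sum_syn_to_end + sum_rtt                         # server time + one RTT
    return total / (completed + gave_up)
```

With k = 2, C = {1: 2, 2: 1}, one abandoned client, 10 seconds of total SYN-to-END time and 1 second of total RTT over 5 completions, this yields (21 + 6 + 9 + 10 + 1)/6 seconds, matching Equation 2.5.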
Let ACCEPTED_{p,i} be the number of connections that were accepted during the p-th interval and completed during the i-th interval. Therefore,

    COMPLETED_i = Σ_p ACCEPTED_{p,i}    (2.8)

Let

    ACCEPTED_{p,i} = Σ_{j=0}^{k} A_{p,i}^j    (2.9)

where A_{p,i}^j is the number of SYNs that were accepted during the p-th interval as a j-th retry and completed during the i-th interval. Therefore,

    C_i^j = Σ_p A_{p,i}^j    (2.10)

As mentioned above, when a server accepts a SYN and processes the connection, the server is unaware of how many failed connection attempts the client made prior to the successful attempt. Therefore, there is no direct method for determining the number of retries associated with a specific connection, and hence no direct method for obtaining A_{p,i}^j. We estimate the value of A_{p,i}^j from the ratio of A_p^j to ACCEPTED_p:

    A_{p,i}^j = (A_p^j / ACCEPTED_p) · ACCEPTED_{p,i}    (2.11)

Since the SYNs that do not get dropped get accepted, Equation 2.7 implies that A_i^j is:

    A_i^j = R_i^j − D_i^j = R_i^j − DR_i · R_i^j    (2.12)

Combining Equations 2.11 and 2.12 allows us to rewrite Equation 2.10 as:

    C_i^j = Σ_p [ (R_p^j − DR_p · R_p^j) / ACCEPTED_p ] · ACCEPTED_{p,i}    (2.13)

Equation 2.13 solves for C_i^j in terms of R_p^j, DR_p, and ACCEPTED_{p,i}. We can substitute Equation 2.13 into our equation for calculating CLIENT_RT_i, effectively removing C_i^j from Equation 2.4.

We now turn our attention to solving for R_i^j. Drops occurring during the i-th interval return as retries in future intervals. Based on the TCP exponential backoff mechanism, the timing of the return depends on whether it was an initial SYN, a 1st retry, a 2nd retry, etc. As a result, the number of retries arriving during the i-th interval is a function of the number of drops that occurred in prior intervals:

    R_i^1 = D_{i−m}^0
    R_i^2 = D_{i−2m}^1
    R_i^3 = D_{i−4m}^2    (2.14)
    ...
    R_i^{k+1} = D_{i−2^k m}^k
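The apportioning in Equations 2.11 through 2.13 amounts to splitting each accept interval's contribution to COMPLETED_i according to that interval's retry-class mix. A small sketch (function and variable names are ours):

```python
def completions_by_retry(A, accepted, accepted_to_i):
    """Estimate C_i^j per Equation 2.13.

    A             -- A[p][j]: SYNs accepted during interval p as j-th retries
    accepted      -- accepted[p] = ACCEPTED_p
    accepted_to_i -- accepted_to_i[p] = ACCEPTED_{p,i}: accepts from interval p
                     that complete during the interval of interest
    """
    C = {}
    for p, n in accepted_to_i.items():
        for j, a in A[p].items():
            # A^j_{p,i} = (A_p^j / ACCEPTED_p) * ACCEPTED_{p,i}  (Equation 2.11)
            C[j] = C.get(j, 0.0) + (a / accepted[p]) * n
    return C
```

For example, if interval 3 accepted 10 connections, 8 as initial SYNs and 2 as first retries, and 5 of those accepts complete in the interval of interest, the estimate splits them 4 : 1 across the two classes.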
[Figure 2.4: Variance in RTT affects the arrival time of retries. A client's initial SYN and its retries are dropped by the server while the network delay changes between attempts.]

Equation 2.14 assumes that retries arrive at the server exactly when expected based on the TCP specification (i.e., in 3 seconds, 6 seconds, etc.). Due to variance in RTT, this assumption may not hold in practice. Such a scenario is shown in Figure 2.4, where the network delay changes between connection attempts for a specific client. This has the effect of skewing the estimates for R_i^j, since retries may not always arrive at the server exactly when expected. Note that it is the variance in RTT for a specific client that affects the model, not the differences in RTT between clients. For example, the server will observe the 3 second, 6 second, 12 second, etc., retry delays for each client with a consistent RTT, regardless of the magnitude of that RTT.

This effect can be accounted for by treating R_i^j as a weighted distribution over the D_i^j of past intervals instead of just using a single interval. Let W_{p,i}^j be the portion of D_p^j that will return as part of R_i^{j+1}. The following holds:

    1 = Σ_i W_{p,i}^j    (2.15)

Using these weights, we can modify Equation 2.14 so that R_i^j is a combination of drops occurring in a small set of prior intervals, rather than the number of drops that occurred in one specific prior interval:

    R_i^1 = ... + W_{i−m−1,i}^0 · D_{i−m−1}^0 + W_{i−m,i}^0 · D_{i−m}^0 + W_{i−m+1,i}^0 · D_{i−m+1}^0 + ...
    R_i^2 = ... + W_{i−2m−1,i}^1 · D_{i−2m−1}^1 + W_{i−2m,i}^1 · D_{i−2m}^1 + W_{i−2m+1,i}^1 · D_{i−2m+1}^1 + ...
    R_i^3 = ... + W_{i−4m−1,i}^2 · D_{i−4m−1}^2 + W_{i−4m,i}^2 · D_{i−4m}^2 + W_{i−4m+1,i}^2 · D_{i−4m+1}^2 + ...
    ...
    R_i^k = ... + W_{i−2^{k−1}m−1,i}^{k−1} · D_{i−2^{k−1}m−1}^{k−1} + W_{i−2^{k−1}m,i}^{k−1} · D_{i−2^{k−1}m}^{k−1} + W_{i−2^{k−1}m+1,i}^{k−1} · D_{i−2^{k−1}m+1}^{k−1} + ...    (2.16)

Equation 2.7 allows us to rewrite Equation 2.16 in terms of DR_i, W_{p,i}^j, and R_i^j by substituting DR_i · R_i^j for D_i^j:

    R_i^1 = ... + W_{i−m−1,i}^0 · DR_{i−m−1} · R_{i−m−1}^0 + W_{i−m,i}^0 · DR_{i−m} · R_{i−m}^0 + W_{i−m+1,i}^0 · DR_{i−m+1} · R_{i−m+1}^0 + ...
    R_i^2 = ... + W_{i−2m−1,i}^1 · DR_{i−2m−1} · R_{i−2m−1}^1 + W_{i−2m,i}^1 · DR_{i−2m} · R_{i−2m}^1 + W_{i−2m+1,i}^1 · DR_{i−2m+1} · R_{i−2m+1}^1 + ...
    ...
    R_i^k = ... + W_{i−2^{k−1}m−1,i}^{k−1} · DR_{i−2^{k−1}m−1} · R_{i−2^{k−1}m−1}^{k−1} + W_{i−2^{k−1}m,i}^{k−1} · DR_{i−2^{k−1}m} · R_{i−2^{k−1}m}^{k−1} + ...    (2.17)

By recursive substitution of the R_i^j terms we can transform these k equations into terms of the unknowns R_i^0 and W_{p,i}^j. For k = 2 and m = 3 the result is:

    R_i^1 = W_{i−4,i}^0 · DR_{i−4} · R_{i−4}^0 + W_{i−3,i}^0 · DR_{i−3} · R_{i−3}^0 + W_{i−2,i}^0 · DR_{i−2} · R_{i−2}^0

    R_i^2 = W_{i−7,i}^1 · DR_{i−7} · [ W_{i−11,i−7}^0 · DR_{i−11} · R_{i−11}^0 + W_{i−10,i−7}^0 · DR_{i−10} · R_{i−10}^0 + W_{i−9,i−7}^0 · DR_{i−9} · R_{i−9}^0 ]
          + W_{i−6,i}^1 · DR_{i−6} · [ W_{i−10,i−6}^0 · DR_{i−10} · R_{i−10}^0 + W_{i−9,i−6}^0 · DR_{i−9} · R_{i−9}^0 + W_{i−8,i−6}^0 · DR_{i−8} · R_{i−8}^0 ]    (2.18)
          + W_{i−5,i}^1 · DR_{i−5} · [ W_{i−9,i−5}^0 · DR_{i−9} · R_{i−9}^0 + W_{i−8,i−5}^0 · DR_{i−8} · R_{i−8}^0 + W_{i−7,i−5}^0 · DR_{i−7} · R_{i−7}^0 ]

From Equation 2.3 we have:

    OFFERED_LOAD_i = R_i^0 + R_i^1 + R_i^2    (2.19)
By substituting Equations 2.18 into Equation 2.19 we get:

    OFFERED_LOAD_i = R_i^0
          + W_{i−4,i}^0 · DR_{i−4} · R_{i−4}^0 + W_{i−3,i}^0 · DR_{i−3} · R_{i−3}^0 + W_{i−2,i}^0 · DR_{i−2} · R_{i−2}^0
          + W_{i−7,i}^1 · DR_{i−7} · [ W_{i−11,i−7}^0 · DR_{i−11} · R_{i−11}^0 + W_{i−10,i−7}^0 · DR_{i−10} · R_{i−10}^0 + W_{i−9,i−7}^0 · DR_{i−9} · R_{i−9}^0 ]
          + W_{i−6,i}^1 · DR_{i−6} · [ W_{i−10,i−6}^0 · DR_{i−10} · R_{i−10}^0 + W_{i−9,i−6}^0 · DR_{i−9} · R_{i−9}^0 + W_{i−8,i−6}^0 · DR_{i−8} · R_{i−8}^0 ]    (2.20)
          + W_{i−5,i}^1 · DR_{i−5} · [ W_{i−9,i−5}^0 · DR_{i−9} · R_{i−9}^0 + W_{i−8,i−5}^0 · DR_{i−8} · R_{i−8}^0 + W_{i−7,i−5}^0 · DR_{i−7} · R_{i−7}^0 ]

Equation 2.20 provides one equation for each interval i, in terms of OFFERED_LOAD_i (which is measured), DR_i (which is measured), R_i^0 (which is unknown), and W_{p,i}^j (which is unknown). Once solutions for R_i^0 are found, they can be used to calculate R_i^j for all i, j. Additionally, the presence of W_{p,i}^j introduces nonlinearity. Each interval i contains 7 unknowns: R_i^0, W_{i,i+2}^0, W_{i,i+3}^0, W_{i,i+4}^0, W_{i,i+5}^1, W_{i,i+6}^1, and W_{i,i+7}^1. From Equation 2.15 we have the following equations for each interval i:

    1 = W_{i,i+2}^0 + W_{i,i+3}^0 + W_{i,i+4}^0
    1 = W_{i,i+5}^1 + W_{i,i+6}^1 + W_{i,i+7}^1    (2.21)

All values in Equation 2.20 must be non-negative, and hence we have the constraints:

    0 ≤ R_i^0, W_{p,i}^j    ∀ i, j, p    (2.22)

Of course, if the values for W_{p,i}^j were somehow known in advance, then Equation 2.20 could be solved directly, since it reduces to a linear system of N equations in N unknowns. In practice, however, the W_{p,i}^j are unknown and need to be estimated. We describe one approach to a solution whose general steps are as follows:

1. Determine an initial estimate for all W_{p,i}^j over a window of prior intervals. Errors in the estimates for W_{p,i}^j are directly related to the errors in R_i^0.
As such, determining the bounds for this error is a solved problem: bounding the error in the solution of a system of linear equations whose coefficients may contain experimental error [68].

2. Solve Equation 2.20 using these estimated W_{p,i}^j values.

3. If there is no solution in step 2 (i.e., Equation 2.22 is not satisfied), or there is a positive change in the optimization objective, then change the values for W_{p,i}^j and iterate.

Let W_I be the initial vector of estimated W_{p,i}^j values. The objective of the optimization may be to minimize ||W_I − W_S||, where W_S is the final solution vector of weights. In other words, assuming that the initial best estimate is based on prior fact, the solution vector ought not to deviate significantly from it.

Step 1: One approach for determining W_I that accounts for the impact of the RTT variance shown in Figure 2.4 is to base W_I on average historical measures of the changes in RTT over time. Let χ_k be the probability density function of ∆RTT over a period of length 3 · 2^k · m. Given that the arrivals of R_i^0 are uniformly distributed over the i-th interval (defined by the probability density function t_i), then

    E[W_{i,i+2}^0] = ∫_{i+1}^{i+2} f_{χ0}(x) dx
    E[W_{i,i+3}^0] = ∫_{i+2}^{i+3} f_{χ0}(x) dx
    E[W_{i,i+4}^0] = ∫_{i+3}^{i+4} f_{χ0}(x) dx
    E[W_{i,i+5}^1] = ∫_{i+4}^{i+5} f_{χ1}(x) dx    (2.23)
    E[W_{i,i+6}^1] = ∫_{i+5}^{i+6} f_{χ1}(x) dx
    E[W_{i,i+7}^1] = ∫_{i+6}^{i+7} f_{χ1}(x) dx

where f_{χk}(t) is the convolution of t_i and χ_k, shifted by the retry delay (3 seconds for the 1st retry, 9 seconds for the 2nd):

    f_{χ0}(t) = ∫_{−∞}^{∞} χ_0(x) t_i(t − 3 − x) dx
    f_{χ1}(t) = ∫_{−∞}^{∞} χ_1(x) t_i(t − 9 − x) dx    (2.24)

In other words, E[W_{i,i+2}^0] is the mean portion of R_i^0 that is expected to return during the (i+2)-nd interval as part of R_{i+2}^1. Note that in Equation 2.23 the E[W_{p,i}^j] terms are independent of p. We now set W_I to E[W_{p,i}^j], in effect replacing each W_{p,i}^j in Equation 2.20 with its historical mean. By replacing the variables W_{p,i}^j by their means, the error can be quantified using Chernoff's bound [124].
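The integrals of Equation 2.23 can be approximated numerically. As a hedged stand-in for the analytic convolution, the expected weight for each candidate arrival interval is simply the fraction of simulated retry arrival times (uniform initial offset plus backoff plus sampled RTT jitter) landing in it:

```python
def weight_estimates(arrival_samples, intervals):
    """Monte-Carlo approximation of Equation 2.23: for each candidate
    interval [a, b), E[W] is the probability mass of the retry-arrival
    density f_chi falling inside it, estimated from sampled arrival times."""
    n = len(arrival_samples)
    return {(a, b): sum(a <= t < b for t in arrival_samples) / n
            for (a, b) in intervals}
```

When the candidate intervals cover the support of the arrival density, the estimated weights sum to one, matching the constraint of Equation 2.15.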
Step 2: Substituting the current estimated values of W_{p,i}^0 and W_{p,i}^1 into Equation 2.20 turns the problem into a linear system of N equations in N unknowns, for N intervals (i.e., since W_{p,i}^0 and W_{p,i}^1 are now constants, the only unknowns left are the R_i^0).

During system initialization, note that all SYNs arriving, accepted, or dropped during the first interval are initial SYNs. Likewise, R_i^j = 0 for 1 ≤ j ≤ k, 1 ≤ i ≤ 3 (no 1st, 2nd, ..., k-th retries can occur in the first three intervals) and R_i^j = 0 for 2 ≤ j ≤ k, 4 ≤ i ≤ 9 (no 2nd, 3rd, ..., k-th retries can occur during the 4th through 9th intervals). In general,

    R_i^j = 0  for i ≤ 3(2^z − 1), j ≥ z, 1 ≤ z ≤ k    (2.25)
    R_i^j = 0  for i ≤ 0, ∀j

For the initial N intervals, there are only N unknowns:

    OFFERED_LOAD_1 = R_1^0
    OFFERED_LOAD_2 = R_2^0
    OFFERED_LOAD_3 = R_3^0
    OFFERED_LOAD_4 = R_1^0 · W_{1,4}^0 · DR_1 + R_2^0 · W_{2,4}^0 · DR_2 + R_4^0    (2.26)
    OFFERED_LOAD_5 = R_1^0 · W_{1,5}^0 · DR_1 + R_2^0 · W_{2,5}^0 · DR_2 + R_3^0 · W_{3,5}^0 · DR_3 + R_5^0

Step 3: If step 2 does not produce a satisfactory solution, an adjustment is made to the values of W_{p,i}^0 and W_{p,i}^1. There are several ways to perform this adjustment. One method is based on the partial derivatives of R_i^0 with respect to W_{p,i}^0 and W_{p,i}^1, as defined by the gradient matrix:

    G = [ ∂R_i^0/∂W_{i−2,i}^0   ∂R_{i−1}^0/∂W_{i−2,i}^0   ∂R_{i−2}^0/∂W_{i−2,i}^0   ...
          ∂R_i^0/∂W_{i−3,i}^0   ∂R_{i−1}^0/∂W_{i−3,i}^0   ∂R_{i−2}^0/∂W_{i−3,i}^0   ...    (2.27)
          ∂R_i^0/∂W_{i−4,i}^0   ∂R_{i−1}^0/∂W_{i−4,i}^0   ∂R_{i−2}^0/∂W_{i−4,i}^0   ...
          ...                   ...                       ...                       ... ]

The number of columns in G is equal to the number of intervals in the sliding window, and the number of rows in G is equal to the total number of W_{p,i}^j in the sliding window. Using G we can formulate a linear program to determine the W_{p,i}^j for the next iteration:
    G^T ∆W = ∆R^0    (2.28)

where the column vector ∆W = (∆W_{i−2,i}^0, ∆W_{i−3,i}^0, ∆W_{i−4,i}^0, ∆W_{i−7,i}^1, ∆W_{i−6,i}^1, ∆W_{i−5,i}^1, ...)^T is the amount of (unknown) change to apply to the W_{p,i}^j for the next iteration, and the column vector ∆R^0 = (∆R_i^0, ∆R_{i−1}^0, ∆R_{i−2}^0, ...)^T is the amount of change we would like to witness in each R_i^0 by applying the new values for W_{p,i}^j. In this case,

    ∆R_i^0 = ||0 − R_i^0||  if R_i^0 < 0
    ∆R_i^0 = 0              otherwise    (2.29)

Essentially, Equation 2.28 uses the gradient matrix G^T to determine how much each weight ought to be changed in order to achieve a viable solution. Equation 2.28 can be solved using a linear least squares method [130] to obtain a best-fit solution for ∆W.

Final Step: Once step 2 produces a satisfactory solution for R_i^0 and W_{p,i}^j, these values can be plugged into Equation 2.17 to obtain the values for R_i^j. The values for R_i^j can then be used in Equation 2.13 to determine C_i^j. Having determined the values for R_i^j and C_i^j for the i-th interval, we use these values in Equation 2.4 to obtain the mean client response time.

2.3 Fast Online Approximation of the Certes Model

Section 2.2 describes a computationally expensive algorithm: solving a system of nonlinear equations. We now present a fast, online implementation of Certes that produces near-optimal results based on a non-iterative approach. We simplify the mathematical approach in two ways:

1. We assume that all transactions that complete during the i-th interval have roughly the same SYN-to-END time. If variance in SYN-to-END time leads to an inconsistency in the model, we make an online adjustment similar to Equation 2.13 but based on the mean SYN-to-END time for a given interval. For the remainder of the chapter, when referring to SYN-to-END time, we imply the mean SYN-to-END time for a given interval.
2. We compute an initial estimate of the weights, W_I, by assuming the RTT has no variance. If this assumption leads to an inconsistency in the model, we make simple online adjustments to W_{p,i}^j in the current and future time intervals.

What follows is a step-by-step exposition of this approach.

Step 1: An alternative to the approach given in the prior section for determining W_I is to begin with the assumption that the RTT has no variance. Under this assumption, the initial values for W_I become:

    0 = W_{i−m−1,i}^0 = W_{i−2m−1,i}^1 = W_{i−4m−1,i}^2 = ... = W_{i−2^{k−1}m−1,i}^{k−1}
    1 = W_{i−m,i}^0   = W_{i−2m,i}^1   = W_{i−4m,i}^2   = ... = W_{i−2^{k−1}m,i}^{k−1}    (2.30)
    0 = W_{i−m+1,i}^0 = W_{i−2m+1,i}^1 = W_{i−4m+1,i}^2 = ... = W_{i−2^{k−1}m+1,i}^{k−1}

If, under this assumption, a solution cannot be found, we adjust for RTT variance by increasing or decreasing the values of W_{p,i}^j using simple online heuristics in Step 3. These adjustments serve as an alternative to iterating over Equation 2.20 to determine optimal values for W_{p,i}^j.

Step 2: The following demonstrates how to efficiently solve Equation 2.20 via online direct substitution over a sliding window of intervals. Assume that the server is booted at time t_0 (or that there is a period of inactivity prior to t_0), as shown in Figure 2.5.

[Figure 2.5: Initial connection attempts that get dropped become retries three seconds later.]

Certes assumes that all SYNs arriving during the first interval [t_0, t_1] are initial SYNs. During the first interval [t_0, t_1] the server measures ACCEPTED_1 and DROPPED_1 and can use those measurements to determine A_1^0 = ACCEPTED_1, D_1^0 = DROPPED_1, and R_1^0 = OFFERED_LOAD_1. Section 2.8 shows the results when Certes is applied when the SYNs in the first interval are not all initial SYNs.
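Before walking through Figures 2.5 and 2.6, note that once the weights are fixed, the triangular structure of Equation 2.26 lets R_i^0 be recovered interval by interval with no matrix solve at all. A minimal sketch for the k = 1 case (the function and variable names are ours, not from Certes):

```python
def solve_initial_syns(offered, dr, W0):
    """Forward substitution over Equation 2.26 (k = 1 for brevity).

    offered -- offered[i] = OFFERED_LOAD_i (measured)
    dr      -- dr[i] = DR_i (measured)
    W0      -- W0[(p, i)]: estimated portion of interval p's initial-SYN
               drops returning as 1st retries during interval i
    Returns R0, with R0[i] the estimated initial SYNs in interval i.
    """
    R0 = {}
    for i in sorted(offered):
        # 1st retries predicted to land in interval i from earlier drops
        retries = sum(W0.get((p, i), 0.0) * dr[p] * R0[p] for p in R0)
        R0[i] = offered[i] - retries
    return R0
```

With the zero-variance weights of Equation 2.30 and m = 3, drops from interval 1 return wholly in interval 4, so an offered load of 9 there with 5 returning retries leaves 4 initial SYNs.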
The dropped SYNs, D_1^0, will return to the server as 1st retries three seconds later, as R_4^1 during interval [t_3, t_4].

[Figure 2.6: A second connection attempt that gets dropped becomes a retry six seconds later.]

Moving ahead in time to interval [t_3, t_4], as shown in Figure 2.6, the server measures ACCEPTED_4 and DROPPED_4 and calculates the SYN drop rate for the 4th interval, DR_4, using Equation 2.6. The Web server cannot distinguish between an initial SYN and a 1st retry; therefore the drop rate applies to both R_4^0 and R_4^1 equally, giving D_4^1 = DR_4 · R_4^1, and then A_4^1 = R_4^1 − D_4^1. From Equations 2.3, A_4^0 = ACCEPTED_4 − A_4^1 and D_4^0 = DROPPED_4 − D_4^1. Finally, the number of initial SYNs arriving during the 4th interval is R_4^0 = A_4^0 + D_4^0. We have now determined the values for all terms in Figure 2.6.

Note that the D_4^1 dropped SYNs will return to the server as 2nd retries six seconds later during interval [t_9, t_10], as R_10^2, when those clients experience their second TCP timeout, and that the D_4^0 dropped SYNs will return to the server as 1st retries, as R_7^1, three seconds later during interval [t_6, t_7].

By continuing in this manner it is possible to recursively compute all values of R_i^j, A_i^j, and D_i^j for all intervals, for a given k. Figure 2.7 depicts the 10th interval, including those intervals that directly contribute to the values in the 10th interval. Clients that give up after k connection attempts are depicted as ending the transaction. Figure 2.8 shows the final model defining the relationships between the incoming, accepted, dropped, and completed connections during the i-th interval.
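The recursive bookkeeping of Step 2 condenses into a short loop. The following is a sketch under the zero-variance assumption of Equation 2.30 (a SYN dropped as a j-th attempt returns exactly 3 · 2^j seconds later); the names and data layout are ours:

```python
def certes_online(measurements, m=3, k=2):
    """Non-iterative Certes pass: earlier drops determine how many j-th
    retries R_i^j must arrive during interval i; the equal-drop-rate
    assumption (Equation 2.7) then splits ACCEPTED/DROPPED by class.

    measurements -- list of (ACCEPTED_i, DROPPED_i) pairs, one per interval
    Returns one (R, D, A) triple of per-class dicts for each interval.
    """
    n = len(measurements)
    # due[i][j]: class-j retries scheduled to arrive during interval i
    due = [{} for _ in range(n + (2 ** k) * m + 1)]
    out = []
    for i, (acc, drop) in enumerate(measurements):
        offered = acc + drop
        dr = drop / offered if offered else 0.0      # DR_i, Equation 2.6
        R = {j: due[i].get(j, 0.0) for j in range(1, k + 1)}
        R[0] = offered - sum(R.values())             # the rest are initial SYNs
        D = {j: dr * R[j] for j in R}                # equal drop probability
        A = {j: R[j] - D[j] for j in R}
        for j in range(k):                           # schedule future retries
            t = i + (2 ** j) * m                     # 3, 6, 12, ... seconds later
            due[t][j + 1] = due[t].get(j + 1, 0.0) + D[j]
        out.append((R, D, A))
    return out
```

Replaying the example of Figures 2.5 and 2.6: if 2 of 10 SYNs are dropped in the first interval, those 2 arrive as 1st retries exactly three intervals later.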
Connections accepted during the i-th interval complete during the (i + SYN-to-END)-th interval. The client frustration timeout (FTO) is specified in seconds, and the term R_{i+[FTO−(2^k−1)m]}^{k+1} indicates that clients who do not get accepted during the i-th interval on the k-th retry will cancel their attempt for service during the (i + [FTO − (2^k−1)m])-th interval.

The model in Figure 2.8 can be implemented in a Web server by using a simple data structure with a sliding window. Note that during each time interval, only the aggregate counters for DROPPED_i, ACCEPTED_i, and COMPLETED_i are incremented. At the end of each time interval, the more detailed counters for R_i^j, A_i^j, D_i^j, and C_i^j are computed using a fixed number of computations.

Step 3: As mentioned in Section 2.2, due to inconsistencies in network delays, the 1st retry from a client may not arrive at the server exactly three seconds later; rather, it may arrive in the interval before or after the interval in which it was expected. Likewise, since the measurement for SYN-to-END is not constant, there will be instances where C_{i+SYN-to-END}^j ≠ A_i^j; in other words, some of the j-th retries accepted in the i-th interval may complete before or after the (i + SYN-to-END)-th interval.

These occurrences relate to an interesting aspect of the choice of interval length. In general, when sampling techniques are used, the smaller the sampling period (the more frequent the sampling), the more accurate the result. Certes is not a sampling-based approach, yet one might intuit that using shorter intervals would somehow provide better results.

[Figure 2.7: After three connection attempts the client gives up; clients get frustrated and give up before the next retry.]
[Figure 2.8: Relationship between incoming, accepted, dropped, and completed requests. Transactions accepted in the i-th interval complete during the (i + SYN-to-END)-th interval, where SYN-to-END is the server-measured response time.]

[Figure 2.9: The smaller the interval, the more difficult it is to accurately discretize events.]

Just the opposite is true. As shown in Figure 2.9, as the size of the interval is reduced below a certain point, the probability that events happen when expected is reduced as well. For example, the probability that a dropped initial SYN will arrive back at the server during the interval that is exactly three seconds later becomes zero as the size of the interval is reduced toward zero. This is similar in nature to the problem of RTT variance shown in Figure 2.4. Likewise, with small intervals, the probability of events occurring on an interval boundary increases.

These inconsistencies can be accounted for by performing online adjustments to W_{p,i}^j to ensure that relationships within and between intervals remain consistent. The function rint, round to integer, is used to ensure that certain values in the model remain integral (i.e., we do not allow a fractional number of dropped SYNs). The first heuristic we use is:
    if (OFFERED_LOAD_i < R_i^1 + R_i^2) then
        overload  = (R_i^1 + R_i^2) − OFFERED_LOAD_i
        R_i^1     = R_i^1 − rint(overload · R_i^1 / (R_i^1 + R_i^2))
        R_{i+1}^1 = R_{i+1}^1 + rint(overload · R_i^1 / (R_i^1 + R_i^2))    (2.31)
        R_i^2     = R_i^2 − rint(overload · R_i^2 / (R_i^1 + R_i^2))
        R_{i+1}^2 = R_{i+1}^2 + rint(overload · R_i^2 / (R_i^1 + R_i^2))

If the number of retries exceeds the measured offered load, we simply delay a portion of the overload until the next interval. The second heuristic we use is:

    if (ACCEPTED_{i−SYN-to-END} ≠ COMPLETED_i) then
        C_i^0 = rint(COMPLETED_i · A_{i−SYN-to-END}^0 / ACCEPTED_{i−SYN-to-END})
        C_i^1 = rint(COMPLETED_i · A_{i−SYN-to-END}^1 / ACCEPTED_{i−SYN-to-END})    (2.32)
        C_i^2 = rint(COMPLETED_i · A_{i−SYN-to-END}^2 / ACCEPTED_{i−SYN-to-END})

Since our approximation uses the mean SYN-to-END time, the number of completed connections may not equal the number of accepted connections. We adjust for this difference by using the ratio A_{i−SYN-to-END}^0 : A_{i−SYN-to-END}^1 : A_{i−SYN-to-END}^2 to set the ratio C_i^0 : C_i^1 : C_i^2. This attempts to adjust for variance in the SYN-to-END time. As we show in Section 2.5, the results obtained by using these heuristics are sufficiently accurate to allow us to bypass the costlier optimization approach defined in Section 2.2.

Final Step: Having determined the values for R_i^j and C_i^j for the i-th interval, we use these values in Equation 2.4 to obtain the mean client response time.

2.3.1 Packet Loss in the Network

Packet drops that occur in the network (and not explicitly by the server) are included in the model to refine the client response time estimate. Since the client-side TCP reacts to network drops in the same manner as it does to server-side drops, network drops are estimated and added to the drop counts, D_i^j. As shown in Figure 2.10, SYNs dropped by the network (NDS_i^j) are combined with those dropped at the server.
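One way to turn a path loss-rate estimate into a network drop count, from server-side observables alone, is the following back-of-the-envelope helper. The constant, independent-loss assumption and the formula are our inference for illustration, not a procedure stated in the thesis:

```python
def network_syn_drops(observed_syns, loss_rate):
    """Estimate SYNs lost in the network before reaching the server.

    If each SYN is lost independently with probability loss_rate, then the
    observed count is a (1 - loss_rate) fraction of what clients actually
    sent, so roughly observed * loss_rate / (1 - loss_rate) SYNs went
    missing; these are added to the D_i^j counts as NDS_i^j.
    """
    return observed_syns * loss_rate / (1.0 - loss_rate)
```

For example, at a 2% loss rate, observing 98 SYNs suggests about 2 more were lost in transit.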
To estimate the SYN drop rate in the network, one can use a general estimate of a 2-3% [164, 168] packet loss rate in the Internet or, in the case of private networks, obtain packet loss probabilities from routers. Another approach is to assume that the packet loss rate from the client to the server is equal to the loss rate from the server to the client; the server can then estimate the packet loss rate to the client from the number of TCP retransmissions.

2.3.2 Client Frustration Time Out (FTO)

A scenario that is very often neglected when calculating response times occurs when the client cancels the connection request due to frustration while waiting to connect. This was shown in Figure 2.7. Any client is only willing to wait a certain amount of time before hitting reload on the browser or going to another site. Such failed transactions must be included when determining client response time. To include this latency, the Certes model defines a limit, referred to as the client frustration timeout (FTO), which is the longest amount of time a client is willing to wait for an indication of a successful connection. In other words, the FTO bounds the number of connection attempts that a client's TCP implementation will make before the client hits reload on the browser or goes to another Web site.

[Figure 2.10: Addition of network SYN drops to the model. Transactions accepted in the i-th interval complete during the (i + SYN-to-END)-th interval, where SYN-to-END is the server-measured response time.]
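Because the n-th retry is sent 3(2^n − 1) seconds after the initial SYN, the retry count implied by a given FTO (tabulated in Table 2.1) can be computed directly; a small sketch, with a function name of our choosing:

```python
def retries_for_fto(fto_seconds):
    """Number of SYN retries sent before a client with the given
    frustration timeout gives up (Table 2.1). Under TCP exponential
    backoff, the n-th retry fires 3 * (2**n - 1) seconds after the
    initial SYN."""
    n = 0
    while fto_seconds >= 3 * (2 ** (n + 1) - 1):
        n += 1
    return n
```

A client with an FTO under 3 seconds sends no retries at all, while one willing to wait 93 seconds (1.55 minutes) reaches the fifth retry.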
Table 2.1 specifies the relationship between the FTO and the value for k, which was introduced in Section 2.2 as the maximum number of retries a client is willing to attempt before giving up.

    If the client frustration timeout is:        then the number of retries will be:
    less than 3 sec                              0
    at least 3 sec but less than 9 sec           1
    at least 9 sec but less than 21 sec          2
    at least 21 sec but less than 45 sec         3
    at least 45 sec but less than 1.55 min       4
    at least 1.55 min but less than 3.15 min     5

    Table 2.1: Relationship between client frustration timeout and number of connection attempts.

Any single value chosen for the number of retries covers a range of client behavior; unfortunately, that range will not cover all client behavior. Although it is possible to use a distribution of the FTO derived from real-world Web browsing traffic, for simplicity we used a constant default value of 21 seconds (k = 2) in most of our experiments. In Section 2.5 we look more carefully at the impact of using an incorrect assumption for the FTO.

2.3.3 SYN Flood Attacks

Another scenario that is very often neglected when calculating response times arises during a SYN flood (denial of service) attack. During a SYN flood, the attackers keep the server's SYN queue full by continually sending large numbers of initial SYNs, which essentially reduces the FTO to zero. The end result is that the server stands idle, with a full SYN queue, while very few client connections are established and serviced. A SYN flood attack is very different from a period of heavy load. The perpetrators of a SYN attack do not adhere to the TCP timeout and exponential backoff mechanisms, never respond to a SYN/ACK, and never establish a connection with the server; no transactions are serviced. On the other hand, in heavy load conditions, clients adhere to the TCP protocol and large numbers of transactions are serviced (excluding situations where the server enters a thrashing state). Certes works well under heavy load conditions precisely because clients adhere to the TCP protocol.

During a SYN flood attack, Certes faces the problem of identifying the distribution of the FTO. Our approach to a solution involves identifying when a SYN attack is underway, allowing Certes to switch from the FTO distribution currently in use to one that is representative of a SYN attack. While identifying a SYN attack is relatively simple, constructing a representative FTO distribution for a SYN flood attack is not. Implementing this approach was beyond the scope of this thesis and is left for future work.

2.3.4 Categorization

Certes can be used, in parallel, to obtain response time estimates for multiple classes of transactions. Since Certes is based on the drop activity associated with SYN packets, the classification of a dropped SYN is limited to the information contained in the SYN packet: the device the packet arrived on, the source IP address and port, and the destination IP address and port.

2.4 Certes Linux Implementation

In this section we describe our implementation of the fast online Certes model from Section 2.3, which executes on the Web server machine itself. In Chapter 3 we discuss our implementation of the Certes model on an appliance that sits in front of the Web server.

[Figure 2.11: Certes implementation on a Linux Web server. Certes runs as a user-space process alongside Apache, reading TCP counters and the SYN-to-END measurement from a kernel module via /proc; local and remote administrators subscribe to events, and modeling results are logged.]

Figure 2.11 shows Certes executing alongside, but separate from, Apache.
Apache is shown as the Web server application in Figure 2.11, but any Web server application can be used, since Certes runs completely independently. Certes was built with the expectation that it would be part of a control loop. As such, a local or remote administrator (or control module) can subscribe to Certes to receive notification when response time thresholds are exceeded. Certes also periodically logs its modeling results to disk to provide a history of Web server performance that can be used for additional performance analysis.

Certes is mostly implemented as a user-space application that obtains kernel measurements at the end of each time interval and then uses the measurements to perform the modeling calculations in user space. This split between user and kernel space is by design, to reduce the number of changes introduced into the kernel. The time interval is the same one introduced in Section 2.2 and can be set to any value less than the initial TCP retry timeout value of 3 seconds. The kernel measurements required by Certes are the total number of accepted, dropped, and completed connections, and the total SYN-to-END time for all completed connections during an interval. We implemented these as global running counters within the kernel. These variables increase monotonically from the time at which the machine is booted, regardless of whether the Certes model is executing in user space. If the kernel already provided these four measurements, then Certes could be implemented without any kernel modifications. However, since RedHat 7.1 is not fully instrumented for Certes, minor modifications were made to the kernel, totaling less than 50 lines of code. To expose the ACCEPTED, COMPLETED, and DROPPED counters and the SYN-to-END measurement to user space, we wrote a simple kernel module that extended the /proc directory.
User-space programs can then obtain the kernel values simply by reading a file in the /proc directory. This is the de facto method in Linux for obtaining kernel values from user space. To provide further details on our kernel modifications, we first describe the steps by which the Linux kernel manages TCP connection establishment and termination, and then discuss the instrumentation code that maintains the ACCEPTED, COMPLETED, and DROPPED counters and the SYN-to-END measurement.

Figure 2.12 shows the structure of the TCP/IP connection establishment implementation in Linux. The three important data structures are the rx queue, the SYN queue, and the accept queue. The rx queue contains incoming packets that have just arrived. The SYN queue, which is actually implemented as a hash table, contains those connections which have yet to complete the TCP three-way handshake. The accept queue contains those connections which have completed the three-way handshake but have not yet been accepted by the Apache Web server application. The accept queue is often referred to as the listen queue, since the socket it is attached to is a listening socket.

Figure 2.12: TCP/IP connection establishment on Linux.

Figure 2.12 is numbered according to the following steps that occur during TCP connection establishment in an unmodified Linux kernel:

1. A SYN arrives and is timestamped (denoted ts in the figure).

2. The incoming SYN packet is placed onto the rx queue during the hardware interrupt. The rx queue does have a limit, but this limit can be changed, and packets rarely get dropped due to the rx queue being full.

3. During post-interrupt handling, the IP layer routes the incoming SYN to TCP.
If the SYN hash table is full, TCP drops the incoming SYN. Otherwise, TCP creates an open (connection) request and places it into the SYN hash table. Note that Linux does not save the timestamp of the initial SYN packet in the open request structure.

4. TCP responds to the incoming SYN immediately by sending a SYN/ACK to the client. If TCP cannot immediately send a SYN/ACK (i.e., the tx queue in Figure 2.13 is full), TCP drops the incoming SYN.

5. The client completes the TCP three-way handshake by sending an ACK to the server.

6. Once TCP receives the third part of the three-way handshake from the client, the open request is placed onto the accept queue for processing. At this point the new child socket is created and pointed to by the open request, and the connection is considered established.

7. The GET request may arrive at the server before the child connection is accepted by Apache, in which case the GET request is simply attached to the child socket as an inbound data packet.

8. Apache accepts the newly established child connection and proceeds to process the request. The speed at which Apache can process requests, along with the limit on the number of running Apache processes, affects the length of the accept queue.

Our kernel modifications with respect to steps 1 through 8 are as follows. We added an 8-byte timestamp field to both the open request structure and the socket structure so that the timestamp of the initial SYN could be saved across the lifetime of the connection. In step 3 the timestamp in the SYN is copied to the open request structure, and in step 6 it is copied from the open request structure to the child socket structure. The ACCEPTED counter is also incremented during step 6.
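The timestamp plumbing from step 3 to step 6 can be sketched as follows. The structure and function names here are illustrative stand-ins for the kernel's open request and socket structures; the actual change added an 8-byte timestamp field to each.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's open request structure,
 * with the added timestamp field. */
struct open_request_s {
    unsigned long long syn_ts; /* arrival time of the initial SYN */
};

/* Simplified stand-in for the child socket structure. */
struct child_sock_s {
    unsigned long long syn_ts;
};

static unsigned long accepted_counter; /* the ACCEPTED counter */

/* Step 3: the SYN's arrival timestamp is saved in the open
 * request so it survives until the handshake completes. */
void on_syn(struct open_request_s *req, unsigned long long now)
{
    req->syn_ts = now;
}

/* Step 6: on the third ACK of the handshake, the timestamp is
 * copied from the open request to the child socket, and the
 * ACCEPTED counter is incremented. */
void on_handshake_complete(const struct open_request_s *req,
                           struct child_sock_s *sk)
{
    sk->syn_ts = req->syn_ts;
    accepted_counter++;
}
```

Carrying the timestamp across both structures is what later allows the SYN-to-END time to be measured from the very first SYN rather than from socket acceptance.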
For our DROPPED counter, we used the existing SNMP/TCP drop counter, but fixed several functions in the kernel that either failed to increment the counter when necessary or, due to incorrect logic, incremented it more than once for the same SYN drop.

Figure 2.13 shows the outbound processing that occurs when Apache is sending data to the client. Figure 2.13 is numbered according to the following steps that occur during TCP outbound data transmission in an unmodified Linux kernel:

Figure 2.13: TCP/IP outbound data transmission on Linux.

(9) Apache compiles a response to the GET request. This may include executing a program (such as a CGI script or Java program) or reading a file from disk.

(10) Once the response is composed, Apache makes a socket system call to send the data (e.g., writev()). If there is space available in the kernel for the response data, the data is copied into the kernel and writev() returns immediately, before the packet is transmitted. If not, writev() blocks the Apache process until space becomes available.

(11) Once the data is copied to kernel space, TCP immediately attempts to queue the data for transmission by placing it onto the tx queue. If the tx queue is full, TCP places the data on the socket outbound write queue; if that queue is also full, TCP blocks the Apache process.

(12) Placed onto the tx queue, the data waits to be transmitted onto the network.

(13) After the data is transmitted onto the network, the data packet is placed onto the free queue.

(14) Pending acknowledgment by the remote TCP, the data packet is freed.

Our kernel modifications centered on step 13.
As a data packet is placed onto the free queue, the current time is stored in the socket structure. Likewise, if the server application (e.g., Apache) closes the connection, or a TCP FIN or RST packet is received from the client, the current time is saved in another timestamp field in the socket structure. This is also the point at which the COMPLETED counter is incremented. In this manner we are able to identify when the server finished sending data to the client and when either the server or the client closed the connection. We choose the earlier of these two as the end of the transaction. Subtracting the timestamp obtained from the initial SYN gives the SYN-to-END time for the connection (which is then added to the running total). In other words, we define the end of the transaction to be whichever occurs first: the last data packet is sent from the server to the client, or the first TCP RST or FIN packet arrives or is transmitted.

In this section we provided some insights into the key kernel modifications we performed, all of which were relatively minor. Other modifications, not included in the above discussion, were too far removed from the purpose of this thesis to be discussed. Suffice it to say, a thorough investigation of the Linux kernel TCP/IP stack was undertaken to ensure that all code paths relevant to Certes were examined. Although we provided all of the above by directly modifying and rebuilding the kernel, it would be possible to provide identical support using a kernel module (though with greater implementation difficulty).

2.5 Experimental Results

To demonstrate the effectiveness of Certes, we implemented Certes on Linux and evaluated its performance in HTTP 1.0/1.1 environments, under constant and changing workloads.
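The end-of-transaction rule described above can be written down directly. The sketch below assumes opaque tick-count timestamps and uses a zero close timestamp to mean that no FIN/RST or application close was observed; those conventions are illustrative, not from the thesis.

```c
#include <assert.h>

/* The transaction ends at whichever comes first: the time the
 * last data packet was sent to the client, or the time the
 * connection was closed (first FIN/RST in either direction, or
 * an application-level close). close_ts == 0 means no close
 * event was observed. */
unsigned long long transaction_end(unsigned long long last_data_ts,
                                   unsigned long long close_ts)
{
    if (close_ts != 0 && close_ts < last_data_ts)
        return close_ts;
    return last_data_ts;
}

/* SYN-to-END time: transaction end minus the timestamp of the
 * initial SYN carried across the connection's lifetime. */
unsigned long long syn_to_end(unsigned long long syn_ts,
                              unsigned long long last_data_ts,
                              unsigned long long close_ts)
{
    return transaction_end(last_data_ts, close_ts) - syn_ts;
}
```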
The results presented here focus on evaluating the accuracy of Certes for determining client perceived response time in the presence of failed connection attempts. The accuracy of Certes is quantified by comparing its estimate of client perceived response time with client-side measurements obtained through detailed client instrumentation. Section 2.5.1 describes the experimental design and the test bed used for the experiments. Section 2.5.2 presents the client perceived response time measurements obtained for various HTTP 1.0/1.1 Web workloads. Section 2.6 demonstrates how a Web server control mechanism can use Certes to evaluate its own ability to manage client response time.

2.5.1 Experimental Design

The test bed consisted of six machines: a Web server, a WAN emulator, and four client machines (Figure 2.14). Each machine was an IBM Netfinity 4500R with dual 933 MHz CPUs, 10/100 Mbps Ethernet, 512 MB RAM, and a 9.1 GB SCSI HD. Both the server and clients ran RedHat Linux 7.1, while the WAN emulator ran FreeBSD 4.4. The client machines were connected to the Web server via two 10/100 Mbps Ethernet switches and a WAN emulator used as a router between the two switches. The client-side switch was a 3Com SuperStack II 3900 and the server-side switch was a Netgear FS508. The WAN emulator software used was DummyNet [136], a flexible and commonly used FreeBSD tool. The WAN emulator simulated network environments with different network latencies, ranging from 0.3 to 150 ms of round-trip time, as would be experienced in LAN and cross-country WAN environments, respectively. The WAN emulator also simulated networks with 0-3% packet loss, which is not uncommon over the Internet.

Figure 2.14: Experimental test bed.

The Web server machine ran the latest stable version of the Apache HTTP server, V1.3.20.
Apache was configured to run 255 daemons, and a variety of test Web pages and CGI scripts were stored on the Web server. The number of test pages was small, with page sizes of 1 KB, 5 KB, 10 KB, and 15 KB. The CGI scripts dynamically generated a set of pages of similar sizes. Certes also executed on the server machine, independently from Apache (as shown in Figure 2.11). The Certes implementation was designed to periodically obtain the counters and aggregate SYN-to-END time from the kernel, perform the modeling calculations in user space, and periodically log the modeling results to disk. For our experiments, the Certes implementation was configured to use 250 ms measurement intervals and a default frustration timeout of 21 seconds (except where noted).

The client machines ran an improved version of the WebStone 2.5 Web traffic generator [160]. Five improvements were made to the traffic generator. First, we removed all interprocess communication (IPC) and made each child process autonomous, to avoid any load associated with IPC. Second, we modified the WebStone log files to be smaller yet contain more information. Third, we extended the error handling mechanisms and modified how and when timestamps were taken, to obtain more accurate client-side measurements. Fourth, we implemented a client frustration timeout mechanism after discovering that the one provided in WebStone was only triggered during the select() function call and was not a true wall-clock frustration timeout mechanism. Fifth, we added an option to the traffic generator to produce a variable load on the server by switching between on and sleep states. The traffic generators were used on the four client machines to impose a variety of workloads on the Web server. The results for sixteen different workloads are presented, half of which were HTTP 1.0 and the other half HTTP 1.1.
While both HTTP 1.0 and HTTP 1.1 support persistent and non-persistent connections, we configured the traffic generators to run HTTP 1.0 over non-persistent connections and HTTP 1.1 over persistent connections. Recent studies indicate that non-persistent connections are still used far more frequently than persistent connections in practice [146]; the use of persistent connections increases the duration of each connection and reduces the number of connection attempts, thereby reducing the effect that SYN drops have on client response time. Measuring both HTTP 1.0 and 1.1 workloads thus provides a way to quantify the benefits of using Certes, for different versions of HTTP, over simpler SYN-to-END measurements. For the HTTP 1.1 workloads considered, the number of Web objects per connection ranged from 5 to 15, consistent with recent measurements of the number of objects (e.g., banners, icons) typically found in a Web page [146]. The characteristics of the sixteen workloads are summarized in Table 2.2. In an attempt to cover a broad range of conditions, we varied the workloads along the following dimensions:

1. static pages and dynamic content (Perl and C)

2. HTTP 1.0 and 1.1

3. 1 to 15 pages per connection

4. 0% to 3% network drop rate

5. 5 ms to 150 ms network delays

6. 1400 to 4800 clients (30 to 1670 conn/sec)

7. CPU and bandwidth bound

8. consistent and variable load

All of the sixteen workloads imposed a constant load on the server except for Test I and Test J, which imposed a highly varying load on the server. Each experimental workload was run for 20 minutes. For each workload, we measured at the server the steady-state number of connections per second and the mean SYN drop rate during successive one-second time intervals. These measurements provide an indication of the load imposed on the server.
Test  Page Types  Pages/Conn  Net Drop  HTTP  ping RTT (ms)  Conn/Sec   SYN Drop  Clients
                                              min/avg/max
A     static      1           0         1.0   1/8/21         1210-1670  11%-22%   2000
B     static+cgi  1           0         1.0   0.2/0.5/5      330-580    11%-33%   2000
C     static+cgi  1           0         1.0   141/152/165    320-675    0.5%-26%  2000
D     cgi         1           0         1.0   0.2/0.4/4      175-320    26%-44%   2000
E     static      15          0         1.1   4/11/17        80-150     45%-63%   2000
F     static      15          0         1.1   141/153/167    50-96      0.5%-36%  1400
G     static+cgi  5           0         1.1   0.2/0.7/6      97-173     42%-59%   2000
H     static+cgi  5           0         1.1   140/152/165    95-175     9%-37%    1600
I     static+cgi  5           0         1.1   120/133/147    30-185     0%-54%    2000
J     static      1           0         1.0   142/151/165    55-470     0%-78%    4800
K     static+cgi  5           3%        1.1   0.2/0.6/6      103-177    35%-56%   2000
L     static      1           3%        1.0   0.2/0.9/8      340-1310   0%-30%    2000
M     static+cgi  5           3%        1.1   140/151/161    50-115     14%-54%   1600
N     static      1           3%        1.0   144/151/164    145-400    0.5%-34%  2000
O     static+cgi  5           3%        1.1   140/150/161    57-147     8%-53%    1500
P     static+cgi  1           3%        1.0   140/151/161    180-400    5%-38%    1800

Table 2.2: Test configurations included HTTP 1.0/1.1, with static and dynamic pages.

2.5.2 Measurements and Results

Figure 2.15a compares the client-side, Certes, SYN-to-END, and Apache measured mean response times for each experiment. The values shown are the response times, calculated on a per-second interval, averaged over the 20-minute test period. Figure 2.15b shows the same results normalized with respect to the client-side measurements.

Figure 2.15: Certes accuracy and stability in various environments.
The results show that the SYN-to-END measurement consistently underestimated the client-side measured response time, with the error ranging from 5% to more than 80%. The Apache measurements of response time, which by definition will always be less than the SYN-to-END time, were extremely inaccurate, with an error of at least 80% in all test cases. In contrast, the Certes estimate was consistently very close to the client-side measured response time, with the error being less than 2.5% in all cases except Tests L, N, and P, where it was less than 7.4%. Figures 2.12 and 2.13 explain why the Apache-level measure of response time is so short compared to the mean client perceived response time: Apache measures neither the inbound kernel queuing that occurs nor the time it takes to perform the TCP three-way handshake, and on the outbound side, Apache marks the end of the transaction before the data is transmitted (i.e., as soon as writev() returns).

Figures 2.16a and 2.16b show the response time distributions for Test D using HTTP 1.0 and Test G using HTTP 1.1. These results show that Certes not only provides an accurate aggregate measure of client perceived response time, but also an accurate measure of the distribution of client perceived response times. Figure 2.16 again shows how erroneous the SYN-to-END measurements are as estimates of client perceived response time. Figures 2.17a and 2.17b show how the response time varies over time for Test A using HTTP 1.0 and Test G using HTTP 1.1. The figures show the mean response time at one-second time intervals as determined by each of the four measurement methods. The client-side measured response time increases at the beginning of each test run, then reaches a steady state during most of the run while the generated traffic is relatively constant.
At the end of the experiment the clients are terminated, the generated traffic drops off, and the response time drops to zero.

Figure 2.16: Certes response time distribution approximates that of the client for Tests D and G.

Figure 2.17: Certes online tracking of the client response time in Tests A and G.

Figure 2.17 shows that Certes can track in real time the variations in client perceived response time in both HTTP 1.0 and 1.1 environments. The figure also indicates that Certes is effective at tracking both smaller and larger scale response times, and that Certes is able to track client perceived response time over time in addition to providing the accurate long-term aggregate measures of mean response time shown in Figure 2.15. Again, Certes provides a far more accurate real-time measure of client perceived response time than SYN-to-END times or Apache. The large amount of overlap in the figures between the Certes and client-side response time measurements shows that the two are very close. In contrast, the SYN-to-END and Apache measurements have almost no overlap with the client-side measurements and are substantially lower.
To gain insight into Certes' sensitivity to the FTO, Test O and Test P were executed using false assumptions for the number of retries k. In these two cases the FTO was distributed across clients: 1/3 of the transactions were from clients configured with an FTO of 9 seconds (k = 1), 1/3 from clients configured with an FTO of 21 seconds (k = 2), and 1/3 from clients configured with an FTO of 45 seconds (k = 3); the online model used the incorrect assumption that all clients had an FTO of 21 seconds (k = 2). The results for Tests O and P show that the Certes response time measurements were still within 2% and 7.4%, respectively, of the client-side response time measurements. For Test O the resulting Certes estimate was off by only 108 ms, and for Test P the difference was 677 ms. As mentioned earlier, if the distribution of k were known (via historical measurements), that distribution could easily be incorporated into the model. Further study is needed to determine whether error bounds exist for Certes and under which specific conditions Certes is least accurate and why.

One of the key requirements for an online algorithm such as Certes is to be able to quickly observe rapid changes in client response time. Figure 2.18 shows how Certes is able to track the client response time as it rapidly changes over time. There is no significant lag in Certes' reaction to these changes. This is an important feature for any mechanism to be used in real-time control. As expected, the SYN-to-END measurement tracks the client perceived response time during the time intervals in which SYN drops do not occur.

Figure 2.18: Certes online tracking of the client response time in Test J, in on-off mode.
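The FTO values used in the mixture above (9, 21, and 45 seconds for k = 1, 2, 3) follow directly from TCP's SYN retransmission backoff: a 3-second initial timeout that doubles on each retry. A minimal sketch of that relationship:

```c
#include <assert.h>

/* Frustration timeout implied by giving up after k SYN retries,
 * with a 3-second initial TCP retransmission timeout that doubles
 * on each retry:
 *   FTO(k) = 3 + 6 + 12 + ... = 3 * (2^(k+1) - 1) seconds.
 * This reproduces the per-client FTO values used in Tests O and P:
 * k=1 -> 9 s, k=2 -> 21 s, k=3 -> 45 s. */
unsigned int fto_seconds(unsigned int k)
{
    return 3u * ((1u << (k + 1)) - 1u);
}
```

If the distribution of k across clients were known, the model's single-FTO assumption could be replaced by a weighted mixture of these values.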
During the intervals in which SYN drops occur, the SYN-to-END measurement reaches a maximum (about 6 seconds in Figure 2.18), which reflects the inaccuracy of the SYN-to-END time for those connections that are accepted when the accept queue is nearly full. We note for completeness that Figure 2.18 is zoomed in to show detail and does not contain information from the entire experiment. The erratic behavior at the end of the test run is indicative of the time-dependent nature of SYN dropping: the relatively few remaining clients experienced SYN drops prior to these last few intervals, which increases the overall mean client response time during a period when the load on the system is actually very light. The mean client response time during these intervals thus reflects heavy load in the recent past.

An important consideration in using an online measurement tool such as Certes is ensuring that the measurement overhead does not adversely affect the performance of the Web server. To determine the overhead of Certes, we re-executed Tests A, G, and H on the server without the Certes instrumentation and found the difference in throughput and client response time to be insignificant. This suggests that Certes imposes little or no overhead on the server.

2.6 Certes Applied in Admission Control

In this section we demonstrate how Certes can be combined with a Web server control mechanism to better manage client response time, and identify a common pitfall into which many admission control mechanisms fall. Web server control mechanisms often manipulate inbound kernel queue limits as a way to achieve response time goals [92, 94, 86, 125, 8, 42, 41]. Unfortunately, a serious pitfall can occur when post-TCP-connection measurements are used as an estimate of the client response time.
Using these measurements as the response time goal can lead the control mechanism to take actions that have the exact opposite effect on client perceived response time from that which is intended. Without a model such as Certes, the control mechanism will be unaware of this effect. To emulate the effects of a control mechanism at the Web server, we modified the server to dynamically change the Apache accept queue limit over time. Figure 2.19 shows the accept queue limit changing every 10 seconds between the values of 2^5 and 2^11. Figure 2.20 shows the effect this has on the client perceived response time. When the queue limit is small, such as near the 200th interval, the response time at the clients is high due to failed connection attempts, but the SYN-to-END time is small due to short queue lengths at the server. The pitfall occurs when the control mechanism decides to shorten the accept queue to reduce response time, causing SYN drops, which in turn increase the mean client response time. The control mechanism must be aware of the effect that SYN drops have on the client perceived response time and include this as an input when deciding on the proper queue limits.

Figure 2.19: Web server control manipulating the Apache accept queue limit.

Figure 2.20: Client response time increases as accept queue limit decreases.
2.7 Shortcomings of the (strictly) Queuing Theoretic Approach

As discussed in Chapter 1, the related work on modeling TCP that we cite assumes that the SYN drop rate remains consistent over time (and is based on network drop probabilities, not drop rates at the server). We show here that such a queuing theoretic approach leads to an error-prone result that is not nearly as accurate as the Certes model. Using an M/M/1 queuing system to represent the Web server, the steady-state expected client response time is:

CLIENT_RT = (t_s + t_q) · (1 − p^3) + 3p + 6p^2 + 12p^3 + ···   (2.33)

where t_s is the service time of the request (i.e., 1/µ), t_q is the time spent waiting in the queue, and p is the probability of dropping a connection request. The assumptions of this overly simplified model are that the offered load remains constant over time and that t_s remains constant regardless of the offered load. Nevertheless, Figure 2.21 is a plot of Equation 2.33 (with t_s + t_q = 0.010 seconds) showing the additional time added to the mean service time under the given SYN drop rate. For example, a drop rate of approximately 20% adds 1 second to the mean service time.

Figure 2.21: Effect of SYN drop rate on client response time, as modeled by an M/M/1 queuing system.

Substituting SYN-to-END for t_s + t_q in Equation 2.33, we can obtain the results that the M/M/1 model produces for Test J. Figure 2.22 shows Figure 2.18 overlaid with the M/M/1 results. The M/M/1 model fails to track the client perceived response time as effectively as Certes, due to its inability to track the dependencies between time intervals.
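Equation 2.33, truncated to the terms shown, is easy to evaluate directly; the sketch below does so. Note that at a 20% drop rate the truncated series already yields a penalty close to the 1 second cited above.

```c
#include <assert.h>
#include <math.h>

/* Equation 2.33, truncated to the terms shown in the text:
 * steady-state client response time under an M/M/1 view of the
 * server, where ts_tq is the service-plus-queueing time (seconds)
 * and p is the probability that a connection request (SYN) is
 * dropped. The 3/6/12 coefficients are the incremental delays of
 * TCP's SYN retransmission backoff. */
double mm1_client_rt(double ts_tq, double p)
{
    return ts_tq * (1.0 - p * p * p)
         + 3.0 * p + 6.0 * p * p + 12.0 * p * p * p;
}
```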
Note that this model still requires collecting the SYN-to-END time and the number of dropped, accepted, and completed connections for the current interval. Equation 2.34 shows a more accurate approximation, using the drop probabilities from prior intervals:

CLIENT_RT = mean(SYN-to-END_i)
            + 3 · DR(i − SYN-to-END − 3)
            + 6 · DR(i − SYN-to-END − 6) · DR(i − SYN-to-END − 9)
            + 12 · DR(i − SYN-to-END − 12) · DR(i − SYN-to-END − 18) · DR(i − SYN-to-END − 21)   (2.34)

This approach captures some, but not all, of the dependencies that exist between time intervals. The results from applying Equation 2.34 to Test J are shown in Figure 2.23. Note that to apply this approach, a sliding window of the number of dropped, accepted, and completed connections is required - exactly what Certes requires. Therefore, Certes gives a more accurate result using the same information at an equivalent computational cost.

Figure 2.22: Modeling as an M/M/1 queuing system fails to accurately track client perceived response time.

Figure 2.23: Using a sliding window of drop probabilities fails to capture all the dependencies between time intervals.

2.8 Convergence

The online implementation of Certes makes the assumption that during the first interval, all SYNs are initial SYNs. Certes will converge even if this assumption is not true, i.e., if Certes begins modeling during the ith interval. Figures 2.24 and 2.25 show the results of starting the Certes modeling halfway through the execution of a consistent-load and a variable-load experiment. Figure 2.24 is similar to Test A except that the accept queue limit was set to 512, and Figure 2.25 is a re-execution of Test J.
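The sliding-window approximation of Equation 2.34 can be sketched as follows. This is a simplified illustration: it assumes drop rates are kept per one-second interval in a flat history array, and it uses the retry-schedule offsets as reconstructed above (the thesis presents the equation, not code).

```c
#include <assert.h>

/* Sketch of Equation 2.34. dr[] holds the SYN drop rate of each
 * past one-second interval, i is the index of the current
 * interval, and s is the mean SYN-to-END time rounded to whole
 * seconds. The offsets 3/9/21 and 6, 12, 18 follow TCP's SYN
 * retry schedule (retries at 3, 9, and 21 seconds after the
 * initial SYN). The caller must ensure i - s - 21 >= 0. */
double eq234_client_rt(double mean_syn_to_end, const double *dr,
                       int i, int s)
{
    double rt = mean_syn_to_end;
    rt += 3.0 * dr[i - s - 3];
    rt += 6.0 * dr[i - s - 6] * dr[i - s - 9];
    rt += 12.0 * dr[i - s - 12] * dr[i - s - 18] * dr[i - s - 21];
    return rt;
}
```

Unlike Certes, this formula treats the window's drop rates as independent probabilities, which is exactly why it misses some of the inter-interval dependencies noted above.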
Figures 2.24 and 2.25 represent worst-case scenarios in the sense that none of the measurements for prior intervals are available when Certes begins modeling. In both cases, Certes converges after 21 seconds, which is the FTO.

Figure 2.24: Certes begins modeling at the 600th interval during a consistent load test.

Figure 2.25: Certes begins modeling at the 575th interval (in the middle of a peak) during a variable load test.

2.9 Summary

This chapter presented Certes, an online server-based mechanism that enables Web servers to measure client perceived response time. Certes is based on a model of TCP that quantifies the effect that SYN drops have on client perceived response time by using three simple server-side measurements. Certes does not suffer from any of the drawbacks associated with adding new hardware, modifying existing Web pages or HTTP servers, or relying on third-party sampling. Certes can also be used for the delivery of non-HTML objects such as PDF or PS files.

A key result of Certes is its robustness and accuracy. Certes was shown to provide accurate estimates in HTTP 1.0/1.1 environments, with both static and dynamically created pages, under constant and variable loads of differing scale. Certes can be applied over long periods of time and does not drift or diverge from the client perceived response time; any errors that may be introduced into the model do not accumulate over time. Certes is computationally inexpensive and can be used online at the Web server to provide information in real time.
Certes captures the subtle changes that can occur under constant load as well as the rapid changes that occur under bursty conditions. Certes can also determine the distribution of the client perceived response time, which is extremely important, since service-level objectives may not only specify mean response time targets but also indicate variability measures such as mode, maximum, standard deviation, and variance.

Certes can be readily applied in a number of contexts. Certes is particularly useful to Web servers that manage QoS by performing admission control. Certes allows such servers to quantify the effect that admission control drops have on client perceived response time, as well as to avoid the pitfalls associated with using application-level or kernel-level SYN-to-END measurements of response time. Certes is accurate under heavy server load, exactly when admission control or scheduling algorithms must make critical decisions. Algorithms that manage resource allocation, reservations, or congestion [17] can also benefit from the short-term forecasting [45] of connection retries modeled by Certes. Our work with Certes led us to develop ksniffer, a system for determining the per-pageview client perceived response time, which is presented in the next chapter.

Chapter 3

Modeling Client Perceived Response Time

The Certes research did what no other published work had done to date: it exposed a significant flaw in existing admission control techniques that ignore the latency effects of SYN drops. We now step beyond the Certes work by developing an approach that is capable of determining the response time, as perceived by the remote client, on a per-pageview basis.
We introduce ksniffer, a kernel-based traffic monitor capable of determining pageview response times as perceived by remote clients, in real time at gigabit traffic rates. ksniffer is based on novel, online mechanisms that take a “look once, then drop” approach to packet analysis to reconstruct TCP connections and learn client pageview activity. These mechanisms are designed to operate accurately with live network traffic even in the presence of packet loss and delay, and can be efficiently implemented in kernel space. This enables ksniffer to perform analysis that exceeds the functionality of current traffic analyzers while doing so at high bandwidth rates. ksniffer only needs to passively monitor network traffic and can be integrated with systems that perform server management to achieve specified response time goals. Our experimental results demonstrate that ksniffer can run on an inexpensive, commodity, Linux-based PC and provide online pageview response time measurements, across a wide range of operating conditions, that are within five percent of the response times measured at the client by detailed instrumentation. As described in Chapter 1 and shown in Figure 3.1, a pageview consists of a container page and zero or more embedded objects, which are usually obtained over multiple TCP connections. This introduces a set of problems not present when simply measuring per-URL server response time. In addition to the SYN drop latencies covered by Certes, we are now faced with the problem of correlating multiple HTTP requests over multiple TCP connections into a consistent model for each pageview. SYN drop latencies on any connection, ambiguities in object containment, TCP retransmission latencies, HTTP protocol parsing, missing information, and network packet loss are some of the problems that need to be addressed.
ksniffer mechanisms take a “look once, then drop” approach to packet analysis, use simple hashing data structures to match Web objects to pageviews, and can be efficiently implemented in kernel space. ksniffer uses the Certes model of TCP retransmission and exponential backoff, which accounts for latency due to connection setup overhead and network packet loss. It combines this model with higher-level online mechanisms that use access history and HTTP Referer information, when available, to learn relationships among Web objects, correlate connections and Web objects, and determine pageview response times. Furthermore, ksniffer only looks at TCP/IP and HTTP protocol header information and does not need to parse any HTTP data payload. This enables ksniffer to perform higher-level Web pageview analysis online, in the presence of high data rates; it can monitor traffic at gigabit line speeds while running on an inexpensive, commodity PC. These mechanisms enable ksniffer to provide accurate results across a wide range of operating conditions, including high load, connection drops, and packet loss. In these cases, obtaining accurate performance measures is most crucial, because Web server and network resources may be overloaded.

Figure 3.1: Downloading a container page and embedded objects over multiple TCP connections. (Packet timeline starting at t0: a SYN J / SYN K, ack J+1 / ack K+1 handshake followed by GET index.html and GET obj3.gif on one connection, and a second connection, established with SYN M / SYN N, ack M+1 / ack N+1, fetching obj1.gif, obj8.gif, obj2.gif, and obj4.gif, ending at time te.)

Current approaches to measuring response time include active probing from geographically distributed monitors, instrumenting HTML Web pages with JavaScript, offline analysis of packet traces, and instrumenting Web servers to measure application-level performance or per-connection performance.
These all fall short, in one area or another, in terms of accuracy, cost, scalability, usefulness of information collected, and real-time availability of measurements. ksniffer provides several significant advantages over these approaches. First, ksniffer does not require any modifications to Web pages, Web servers, or browsers, making deployment easier and faster. This is particularly important for Web hosting companies that are responsible for maintaining the infrastructure surrounding a Web site but are often not permitted to modify the customer’s server machines or content. Second, ksniffer captures network characteristics such as packet loss and delay, aiding in distinguishing network problems from server problems. Third, ksniffer measures the behavior of every session for every real client who visits the Web site. Therefore, it does not fall prey to the biases that arise when sampling from a select, predefined set of remote monitoring machines that have better connectivity, and use different Web browser software, than the actual users of the Web site. Fourth, ksniffer can obtain metrics for any Web content, not just HTML. Fifth, ksniffer performs online analysis of high bandwidth, live packet traffic instead of offline analysis of traces stored on disk, bypassing the need to manage large amounts of disk storage for packet traces. More importantly, ksniffer can provide performance measurements to Web servers in real time, enabling them to respond immediately to performance problems through diagnosis and resource management. The rest of this chapter is outlined as follows. Section 3.1 presents an overview of the ksniffer architecture. Section 3.2 describes the ksniffer algorithms for reconstructing TCP connections and pageview activities. Section 3.3 discusses how ksniffer handles less ideal operating conditions, such as packet loss and server overload.
Sections 3.4 and 3.5 discuss some issues related to result aggregation. Section 3.6 presents experimental results quantifying the accuracy and scalability of ksniffer under various operating conditions. We measure the accuracy of ksniffer against measurements obtained at the client and compare the scalability of ksniffer against user-space packet analysis systems. Finally, we present some concluding remarks.

Figure 3.2: Multi-tiered server farm with a ksniffer monitor. (Clients with browsers connect over the Internet to HTTP servers, application servers, and database servers, with ksniffer monitoring the traffic in front of the server complex.)

3.1 ksniffer Architecture

ksniffer was motivated by the desire to have a fast, scalable, flexible, yet inexpensive traffic monitor that can be used in production environments for observing latencies in server farms. As such, performance is one of our key design issues: online analysis of high bandwidth links, where information is available in an on-demand manner. Cost of deployment is another major consideration. Figure 3.2 depicts how ksniffer is deployed within a server farm. ksniffer receives packet streams from the mirrored port of the switch that is situated in front of the server complex. ksniffer, in turn, provides information back to the server complex, which can be used for managing the client perceived response time. This overall approach to gathering traffic from a multi-tiered server farm is used in existing systems [109]. What distinguishes ksniffer from other high speed systems is that ksniffer achieves high performance on commodity PCs by splitting its functionality between the kernel and user space. Figure 3.3 (right) illustrates the architecture of ksniffer. ksniffer attaches itself directly to the device independent layer within the Linux kernel. To the kernel, ksniffer appears as just another network protocol layer within the stack, treated similarly to the IP code.
ksniffer does not produce any log files, but it can read configuration parameters and write debugging information to disk from kernel space.

Figure 3.3: Typical libpcap based sniffer architecture (left) vs. the ksniffer architecture (right). (The libpcap monitor copies the packet stream from the device driver through a raw socket into a user space monitor, with trace logs written for local, offline, or remote analysis; ksniffer instead performs filtering and protocol analysis in kernel modules above the device independent layer, reading config files and sharing reports with user space via an mmap'ed region.)

ksniffer is similar to other computing systems in that a portion of the functionality is contained in the kernel, as dynamically loadable modules, and the remainder is contained in user space. Lower level features requiring high performance are implemented in the operating system, and exploit in-kernel performance advantages such as zero-copy buffer management, the elimination of system calls, and reduced context switching. This functionality includes TCP connection tracking, HTTP session monitoring, IP address longest prefix matching, file extension matching, longest URI path matching, multiple attribute range matching, and a hot list module. Some of this code was developed by duplicating and then modifying code from the Linux kernel, as well as from an in-kernel HTTP server [78, 83]. Certain components, such as HTTP session tracking, could be placed in either kernel or user space. In these cases, for performance reasons, we opted for the faster kernel implementation. In this way, we reduce the amount of information that is transferred between kernel and user space, and reduce the number of user space processes and their attendant context switching costs. In addition, the finer-grained timing capabilities afforded in the kernel allow for more accurate timekeeping than in user space.
Higher level analysis features, such as clustering and rule generation, are implemented in user space. This functionality is less performance-critical, and benefits from available support such as floating point arithmetic and a large number of specialty libraries. Application level information developed by ksniffer is made available to user space via an mmap’ed shared memory region. This lets communication occur between the user and kernel components, in a producer/consumer relationship, without the overhead of buffer copies or system calls. Making the decision to place a portion of ksniffer in the kernel has both advantages and drawbacks. Other research has demonstrated the scalability of this approach in the context of HTTP servers: the world’s fastest Web servers, AFPA [83, 78] and Tux [132], execute in kernel space. On the other hand, kernel module development requires more expertise than user space development. This issue is ameliorated somewhat by the availability in recent years of virtualization tools that allow kernel development in user space, such as VMWare [158] and User-Mode Linux [52]. Similarly, kernel programming is inherently less safe than user-space programming, since programming errors can result in crashes that halt the machine and disable other services that might be running. Since ksniffer is designed to be used as a dedicated monitoring appliance, however, this is less of a concern. We show that the performance gains of developing in the kernel outweigh the cost of developing for special purpose hardware. Nprobe [75] avoids copying overheads by mapping a section of kernel memory into user space and placing packets into the shared memory area. ksniffer also uses shared memory, but instead of transferring packets between kernel and user space, it transfers application level events. This allows ksniffer to access kernel structures directly, such as the skb, which contains pointers into the raw data packet.
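The producer/consumer exchange over the mmap'ed region can be sketched as a single-producer, single-consumer ring buffer of fixed-size event records. ksniffer itself implements this in kernel C; the record layout, index placement, and field names below are illustrative assumptions, not ksniffer's actual format.

```python
import mmap
import struct

REC = struct.Struct("<IIQ")  # event record: flow id, event type, timestamp
NSLOTS = 8                   # ring capacity (NSLOTS - 1 slots usable)
HDR = 8                      # two 4-byte indices at the front: head, tail
region = mmap.mmap(-1, HDR + NSLOTS * REC.size)  # shared, zero-initialized

def _idx(off):
    return struct.unpack_from("<I", region, off)[0]

def _set_idx(off, val):
    struct.pack_into("<I", region, off, val)

def produce(event):
    """Producer (kernel side): publish one event without buffer copies."""
    head, tail = _idx(0), _idx(4)
    if (head + 1) % NSLOTS == tail:
        return False                  # ring full; producer drops or backs off
    REC.pack_into(region, HDR + head * REC.size, *event)
    _set_idx(0, (head + 1) % NSLOTS)  # advance head only after the write
    return True

def consume():
    """Consumer (user side): drain one event, or None if the ring is empty."""
    head, tail = _idx(0), _idx(4)
    if head == tail:
        return None
    event = REC.unpack_from(region, HDR + tail * REC.size)
    _set_idx(4, (tail + 1) % NSLOTS)
    return event
```

Because only the producer writes the head index and only the consumer writes the tail index, neither side needs a system call or lock on the fast path, which is the property the text attributes to the mmap'ed channel.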
This avoids the dual header parsing that would occur with Nprobe: 1) the kernel creates an skb and parses the headers in the packet; 2) Nprobe passes the raw packet up to user space, where it is re-parsed and analyzed. Gigascope [48] is a traffic monitor designed for high-speed IP backbone links and requires a programmable network interface adapter. Where ksniffer splits its functionality between user and kernel space, Gigascope splits its functionality between the NIC and user space. The key difference between the two is that ksniffer provides higher level functions within its kernel modules (such as TCP connection tracking) whereas Gigascope modifies the NIC firmware to perform packet filtering. They report being able to count packets on port 80 whose payload matches the regular expression ^[^\n]*HTTP/1.* at 610 Mbps when their code is executing directly on the NIC. Executing their NIC packet filtering code on the host CPU instead of on the NIC, they fall to 480 Mbps, at which point they experience interrupt livelock. This was using a 733 MHz processor with 2 GB of RAM. ksniffer has already demonstrated more functionality at similar bandwidth rates, using a machine with less memory. In addition, the cost of a Gigascope deployment far exceeds that of ksniffer.

3.2 ksniffer Pageview Response Time

To determine the client perceived response time for a Web page, ksniffer measures the time from when the client sends the packet corresponding to the start of the transaction until the client receives the packet corresponding to the end of the transaction. How a packet may indicate the start or end of a transaction depends upon several factors.

Figure 3.4: Objects used by ksniffer for tracking. (Each client, identified by IP address, holds a pageview hash table, a FIFO pageview queue, a loners queue, and a round trip time (RTT). Each flow, identified by its 4-tuple, records start and end times, the number of requests, a FIFO request queue, a FIFO finish queue, and the requesting client. Each Web object records its URL, start and end times, server reply state, and referring pageview. Each pageview records its URL, start and end times, request count, container pattern, timeout, requesting client, container Web object, and an embedded object hash table.)

To show how this is done, we first briefly describe some basic entities tracked by ksniffer, then describe how ksniffer determines response time based on an anatomical view of the client-server behavior that occurs when a Web page is downloaded. ksniffer keeps track of four entities to maintain the information it needs to measure response time: clients, pageviews, HTTP objects, and TCP connections. ksniffer tracks each of these entities using the corresponding data objects shown in Figure 3.4. Clients are uniquely identified by their IP address. A pageview consists of a container page and a set of embedded HTTP objects. For example, a typical Web page consists of an HTML file as the container page and a set of embedded images as the embedded HTTP objects. Pageviews are identified by the URL of the associated container page, and Web objects are identified by their URL. A flow represents a TCP connection, and is uniquely identified by the four-tuple consisting of source and destination IP addresses and port numbers. It is the associations between instances of these objects that enable ksniffer to reconstruct the activity at the Web site. To efficiently manage these associations, ksniffer maintains sets of hash tables to perform fast lookup and correlation between the four types of objects. Separate hash tables are used for finding clients and flows, indexed by hash functions on the IP address and four-tuple, respectively.
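A minimal sketch of these lookup structures follows. The dictionary layout and field names are illustrative assumptions; ksniffer itself uses in-kernel hash tables.

```python
# Illustrative tracking structures: flows are keyed by the TCP 4-tuple,
# clients by source IP, and each flow links back to its client object.
clients = {}   # client IP address -> client object
flows = {}     # (src IP, src port, dst IP, dst port) -> flow object

def lookup_flow(four_tuple):
    """Find or create the flow for a 4-tuple, linking it to the
    (possibly newly created) client object for the source IP."""
    flow = flows.get(four_tuple)
    if flow is None:
        client = clients.setdefault(four_tuple[0],
                                    {"ip": four_tuple[0], "rtt": None,
                                     "pageviews": {}})
        flow = {"tuple": four_tuple, "start": None, "end": None,
                "requests": [], "finished": [], "client": client}
        flows[four_tuple] = flow
    return flow
```

Two connections from the same source IP thus share one client object, which is what lets ksniffer correlate pageview activity across a client's flows.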
Each client object contains a pageview hash table indexed by a hash function over the container page URL. Flows contain a FIFO request queue of Web objects that have been requested but not completed, and a FIFO finish queue of Web objects that have been completed. Suppose a remote client, Cj, requests a Web page. We decompose the resulting client-server behavior into four parts: TCP connection setup, HTTP request, HTTP response, and embedded object processing. We use the following notation in our discussion. Let Cj be the jth remote client and Fij be the ith TCP connection associated with remote client Cj. Let pvij be the ith pageview associated with remote client Cj, and wkj,i be the kth Web object requested on Fij. Let ti be the ith moment in time, d represent an insignificant amount of processing time, either at the client or the server, p represent the Web server processing time of an HTTP request, and RTT be the round trip time between the client and the server.

3.2.1 TCP Connection Setup

If the client, Cj, is not currently connected to the Web server, the pageview transaction begins with making a connection. Connection establishment is performed using the well known TCP three-way handshake, as shown in Figure 3.5. The start of the pageview transaction corresponds to the SYN J packet transmitted by the client at time t0. However, ksniffer is located on the server side of the network, where a dotted line is used in Figure 3.5 to represent the point at which ksniffer captures the packet stream. ksniffer does not capture SYN J until time t0 + .5RTT, after the packet takes 1/2 RTT to traverse the network. This assumes ksniffer and the Web server are located close enough together that they see packets at essentially the same time.

Figure 3.5: HTTP request/reply. (Timeline of the TCP three-way handshake on F1j followed by GET index.html and the HTTP response, annotated with times from t0 through tk + .5RTT; a dotted line marks the point at which ksniffer captures the packet stream.)

If this is the first connection from Cj, ksniffer will create a flow object F1j and insert it in the flow hash table. At this moment, ksniffer does not know the value of RTT, since only the SYN J packet has been captured, so it cannot immediately determine time t0. Instead, it sets the start time for F1j equal to t0 + .5RTT. ksniffer then waits for further activity on the connection. At t0 + 1.5RTT + 2d, ksniffer and the Web server receive the ACK K+1 packet, establishing the TCP connection between client and server. ksniffer can now determine the RTT as the difference between the SYN-ACK from the server (the SYN K, ACK J+1 packet) and the resulting ACK from the client during connection establishment (the ACK K+1 packet). ksniffer then updates F1j's start time by subtracting 1/2 RTT from its value to obtain t0. At time t0 + 1.5RTT + 2d, for the first connection from Cj, ksniffer creates a client object Cj, saves the RTT value, and inserts the object into the client hash table. For each subsequent connection from Cj, a new flow object Fij will be created and linked to the existing client object, Cj. The RTT for each new flow will be computed, and Cj's RTT will be updated based on an exponentially weighted moving average of the RTTs of its flows, in the same manner as TCP [126]. The updated RTT is then used to determine the actual start time for each flow.

3.2.2 HTTP Request

Once connected to the server, the remote client transmits an HTTP request for the container page and waits for the response.
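The RTT bookkeeping above can be sketched as follows. The EWMA gain of 1/8 mirrors TCP's smoothed RTT estimator and is an assumption here; the text only states that the average is computed in the same manner as TCP.

```python
ALPHA = 0.125  # EWMA gain; TCP's SRTT uses 1/8 (assumed, not stated above)

def handshake_rtt(t_synack, t_ack):
    """Server-side RTT sample: the gap between capturing the SYN-ACK
    and capturing the client's completing ACK."""
    return t_ack - t_synack

def update_client_rtt(client, sample):
    """Fold a new per-flow RTT sample into the client's smoothed RTT."""
    if client["rtt"] is None:
        client["rtt"] = sample
    else:
        client["rtt"] = (1 - ALPHA) * client["rtt"] + ALPHA * sample
    return client["rtt"]

def flow_start_time(t_syn_captured, rtt):
    """Estimate t0, when the client actually sent its SYN, by backing
    the server-side capture time off by half an RTT."""
    return t_syn_captured - 0.5 * rtt
```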
If this is not the first request over the connection, then this HTTP request indicates the beginning of the pageview transaction. Figure 3.5 depicts the first request over a connection. At time ti, the client transmits the HTTP GET request onto the network, and after taking 1/2 RTT to traverse the network, the server receives the request at ti + .5RTT. ksniffer captures and parses the packet containing the HTTP GET request, splitting the request into its constituent components and identifying the URL requested. Since this is the first HTTP request over connection F1j, it incurs the connection setup overhead. In this case, a Web object, w1j,1, is created to represent the request, and the start time for w1j,1 is set to the start time of F1j. In this manner, the connection setup time is attributed to the first HTTP request on each flow. w1j,1 is then inserted into F1j's request queue and F1j's number-of-requests field is set to one. If this were not the first HTTP request over connection F1j, but instead the kth request on F1j, a Web object wkj,1 would be created and its start time would be set equal to ti. Next, ksniffer creates pv1j, the pageview object that will track the pageview, and inserts it into Cj's pageview hash table. We assume for the moment that w1j,1 is a container page; embedded objects are discussed in Section 3.2.5. ksniffer sets pv1j's start time equal to w1j,1's start time, and sets w1j,1 as the container Web object for pv1j. At this point in time, ksniffer has properly determined which pageview is being downloaded, and the correct start time of the transaction.

3.2.3 HTTP Response

After the Web server receives the HTTP request and takes p amount of time to process it, the server sends a reply back to the client. ksniffer captures the value of p, the server response time, which is often mistakenly cited as the client perceived response time.
As we demonstrated in Chapter 2, server response time can underestimate the client perceived response time by more than an order of magnitude. The first response packet contains the HTTP response header, along with the initial portion of the Web object being retrieved. ksniffer looks at the response headers but never parses the actual Web content returned by the server; HTML parsing would entail too much overhead to be used in an online, high bandwidth environment (in Chapter 4 we discuss packet manipulation techniques which do not require an HTML language parser). ksniffer obtains F1j from the flow hash table and determines that the first Web object in F1j's request queue is w1j,1, which was placed onto the queue when the request was captured. An HTTP response header does not specify the URL to which the response corresponds. Instead, HTTP protocol semantics dictate that, for a given connection, HTTP requests be serviced in the order they are received by the Web server. As a result, F1j's FIFO request queue enables ksniffer to match each response over a flow with the correct request object. ksniffer updates w1j,1's server reply state based on information contained in the response header. In particular, ksniffer uses the Content-length: and Transfer-Encoding: fields, if present, to determine the sequence number of the last byte of data transmitted by the server for this request. ksniffer captures each subsequent packet to identify the time of the end of the response. This is usually done by identifying the packet containing the sequence number for the last byte of the response. When the response is chunked [61], sequence number matching cannot be used. Instead, ksniffer follows the chunk chain within the response body across multiple packets to determine the packet containing the last byte of the response.
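For a response that carries a Content-length: header, the end-of-response sequence number can be computed directly. The helper below is a hypothetical illustration of that arithmetic (chunked responses instead require walking the chunk chain, which is not shown).

```python
def last_byte_seq(first_seq, header_len, content_length):
    """TCP sequence number of the final data byte of an HTTP response
    whose body length comes from a Content-length: header (hypothetical
    helper). first_seq is the sequence number of the first byte of the
    response header; the body follows header_len header bytes."""
    return first_seq + header_len + content_length - 1
```

Once this sequence number is known, the packet whose payload covers it marks the (server-side) end of the response; ksniffer adds .5RTT to its capture time to project when the client receives it.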
For CGI responses over HTTP 1.0 that do not specify the Content-length: field, the server closes the connection to indicate the end of the response. In this case, ksniffer simply keeps track of the time of the last data packet before the connection is closed. ksniffer sets w1j,1's end time to the arrival time of each response packet, plus 1/2 RTT to account for the transit time of the packet from server to client. ksniffer also sets pv1j's end time to w1j,1's end time. The end time will monotonically increase until the server reply has been completed, at which point the (projected) end time will be equal to tk + .5RTT, as shown in Figure 3.5. When ksniffer captures the last byte of the response at time tk, w1j,1 is moved from F1j's request queue to F1j's finish queue, where it remains until either F1j is closed or ksniffer determines that all segment retransmissions (if any) have been accounted for, which is discussed in Section 3.3. Most Web browsers in use today serialize multiple HTTP requests over a connection, such that the next HTTP request is not sent until the response for the previous request has been fully received. For these clients, there is no need for each flow object to maintain a queue of requests, since there will only be one outstanding request at any given time. The purpose of ksniffer's request queue mechanism is to support HTTP pipelining, which has been adopted by a small but potentially growing number of Web browsers. Under HTTP pipelining, a browser can send multiple HTTP requests at once, without waiting for the server to reply to each individual request. ksniffer's request queues provide support for HTTP pipelining by conforming to RFC 2616 [61], which states that a server must send its responses to a set of pipelined requests in the same order that the requests are received on that connection.
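Because responses follow request order on a connection, a per-flow FIFO queue suffices to pair each response with its request, even under pipelining. A minimal sketch (field names are illustrative):

```python
from collections import deque

class Flow:
    """Per-connection request/response matching (a sketch). RFC 2616
    requires responses to be sent in request order, so the head of a
    FIFO queue always identifies the request that the current response
    answers, even when requests are pipelined."""
    def __init__(self):
        self.requests = deque()  # requested but not yet completed
        self.finished = []       # completed Web objects

    def on_request(self, url, t_start):
        self.requests.append({"url": url, "start": t_start, "end": None})

    def on_response_done(self, t_end):
        obj = self.requests.popleft()  # FIFO head matches this response
        obj["end"] = t_end
        self.finished.append(obj)
        return obj
```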
Since TCP is a reliable transport mechanism, requests that are pipelined from the client in a certain order are always received by the server in the same order. Any packet reordering that may occur in the network is handled by TCP at the server. ksniffer provides similar mechanisms to handle packet reordering, so that HTTP requests are placed in F1j's request queue in the correct sequence. This entails properly handling a packet that contains multiple HTTP requests, as well as an HTTP request that spans packet boundaries. At this point in time, ksniffer has properly determined tk + .5RTT, the time at which the packet containing the last byte of data for w1j,1 was received by client Cj. If the Web page has no embedded objects, then this marks the end of the pageview transaction. For example, if w1j,1 corresponds to a PDF file instead of an HTML file, ksniffer can determine that the transaction has completed, since a PDF file cannot have embedded objects. If w1j,1 can potentially embed one or more Web objects, ksniffer cannot assume that pv1j has completed. Unfortunately, at time tk + .5RTT, ksniffer cannot yet determine whether requests for embedded objects are forthcoming. In particular, ksniffer does not parse the HTML within the container page to identify which embedded objects may be requested by the browser. Such processing is too computationally expensive for an online, high bandwidth system, and often does not even provide the necessary information. For example, a JavaScript within the container page could download an arbitrary object that could only be detected by executing the JavaScript, not just parsing the HTML. Furthermore, HTML parsing would not indicate which embedded objects will be directly downloaded from the server, since some may be obtained via caches or proxies. ksniffer instead takes a simpler approach based on waiting and observing what further HTTP requests are sent by the client, then using HTTP request header information to dynamically learn which container pages embed which objects.

3.2.4 Online Embedded Pattern Learning

ksniffer learns which container pages embed which objects by tracking the Referer: field in HTTP request headers. The Referer: field contained in subsequent requests is used to group embedded objects with their associated container page. Since the Referer: field is not always present, ksniffer develops patterns from those it does collect to infer embedded object relationships when requests are captured that do not contain a Referer: field. This technique is faster than parsing HTML, executing JavaScript, or walking the Web site with a Web crawler. In addition, it allows ksniffer to react to changes in container page composition as they are reflected in the actual client transactions. ksniffer creates referer patterns on the fly. For each HTTP request that is captured, ksniffer parses the HTTP header and determines whether the Referer: field is present. If so, this relationship is saved in a pattern for the container object. For example, when monitoring www.ibm.com, if a GET request for obj1.gif is captured, and the Referer: field is found to contain “http://www.ibm.com/index.html”, ksniffer adds obj1.gif as an embedded object within the pattern for index.html. If a Referer: field is captured that specifies a host not being monitored by ksniffer, such as “http://www.xyz.com/buy.html”, it is ignored. ksniffer uses file extensions as a heuristic when building patterns. Web objects with an extension such as .ps and .pdf cannot contain embedded objects, nor can they be embedded within a page. As such, patterns are not created for them, nor are they associated with a container page. Web objects with an extension such as .gif or .jpg are usually associated with a container page, but cannot themselves embed other objects.
Web objects with an extension such as .html or .htm can embed other objects or be embedded themselves. Each individual .html object has its own unique pattern, but currently an .html object is never a member of another object's pattern. This prevents cycles within the pattern structures, but results in ksniffer treating frames of .html pages as separate pageviews. Taking this approach means that ksniffer does not need to be explicitly told which Web pages embed which objects; it learns this on its own. Patterns are persistently kept in memory using a hash table indexed by the container page URL. Each pvij and container wkj,i is linked to the pattern for the Web object it represents, allowing ksniffer to efficiently query the patterns associated with the set of active pageview transactions. Since Web pages can change over time, patterns get dynamically updated, based on the client activity seen at the Web site. Therefore, a particular embedded object, obj1.jpg, may not belong to the pattern for container index.html at time ti, and yet belong to the pattern at time ti±k. Likewise, a pattern may not exist for buy.html at time ti, but then be created at a later time ti+k, when a request is captured. Of course, the same embedded object, obj1.jpg, may appear in multiple patterns, index.html and buy.html, at the same time or at different times. Since patterns are only created from client transactions, the set of patterns managed by ksniffer may be a subset of all the container pages on the Web site. This can save memory: ksniffer maintains patterns for container pages that are being downloaded, but not for those container pages on the Web site which do not get requested. Only the Referer: field is used to manipulate patterns, and the embedded objects within a pattern are unordered. ksniffer places a configurable upper bound of 100 embedded objects within a pattern so as to limit storage requirements.
When the limit is reached, an LRU algorithm is used for replacement, removing the embedded object that has not been linked to the container page in an HTTP request for the longest amount of time. Each pattern typically contains a superset of those objects which the container page actually embeds. As the pattern changes, new embedded objects get added to the pattern, but old embedded objects only get removed from the pattern when the limit is reached. This is perfectly acceptable, since ksniffer does not use patterns in a strict sense to determine, absolutely, whether or not a container page embeds a particular object. Most Web browsers, including Internet Explorer and Mozilla, provide Referer: fields, but some do not, and privacy proxies may remove them. To see what percentage of embedded objects have Referer: fields in practice, we analyzed the access log files of a popular musician resource Web site that has over 800,000 monthly visitors. The access logs covered a 15 month period from January 2003 until March 2004. 87% of HTTP requests had a Referer: field, indicating that a substantial portion of embedded objects may have Referer: fields in practice. ksniffer is specifically designed for monitoring high speed links that transmit a large number of transactions per second. In the domain of pattern generation, this is an advantage: the probability that at least one HTTP request with the Referer: field set for a particular container page will arrive within a given time interval is extremely high.

3.2.5 Embedded Object Processing

If a container page references embedded objects, the end of the transaction will be indicated by the packet containing the sequence number of the last byte of data for the last object to complete transmission.
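The pattern learning mechanism of Section 3.2.4 can be sketched as follows. This is a simplified illustration: the check that the Referer: host is one being monitored is omitted, the extension heuristic is reduced to a few hard-coded extensions, and the data layout is an assumption.

```python
from collections import OrderedDict

MAX_EMBEDDED = 100  # configurable cap on embedded objects per pattern
patterns = {}       # container page URL -> LRU-ordered embedded objects

def learn(request_url, referer_url):
    """Update referer patterns from one captured HTTP request."""
    if referer_url is None:
        return                        # no Referer: field on this request
    ext = "." + request_url.rsplit(".", 1)[-1].lower()
    if ext in (".ps", ".pdf"):
        return                        # can neither embed nor be embedded
    if ext in (".html", ".htm"):
        return                        # an .html object is never a member
                                      # of another object's pattern
    pat = patterns.setdefault(referer_url, OrderedDict())
    pat.pop(request_url, None)        # refresh this object's LRU position
    pat[request_url] = True
    if len(pat) > MAX_EMBEDDED:
        pat.popitem(last=False)       # evict the least recently seen
```

An `OrderedDict` doubles as the unordered membership set and the LRU queue: re-inserting an object moves it to the most-recently-seen end, so `popitem(last=False)` removes the object that has gone longest without being linked to the container page.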
To identify this packet, ksniffer determines which embedded object requests are related to each container page using the Referer: field of HTTP requests, file extension information, and the referer patterns discussed in Section 3.2.4. In our example, suppose index.html contains references to five embedded images: obj1.gif, obj2.gif, obj3.gif, obj4.gif, and obj8.gif. The embedded objects will be identified and processed as shown in Figure 3.6 (ignoring for the moment F_3^j).

[Figure 3.6: Downloading multiple container pages and embedded objects over multiple connections.]

At time t_k + .5RTT, the browser parses the HTML document and identifies any embedded objects. If embedded objects are referenced within the HTML, the browser opens an additional connection, F_2^j, to the server so that multiple HTTP requests for the embedded objects can be serviced in parallel, reducing the overall latency of the transaction. The packet containing the sequence number of the last byte of the last embedded object to be fully transmitted indicates the end of the pageview transaction, t_e. The start and end times for embedded object requests are determined in the same manner as described in Sections 3.2.2 and 3.2.3. Each embedded object that is requested is tracked in the same manner as the container page, index.html. For example, when the second connection is initiated, ksniffer creates a flow object F_2^j to track the connection and associates it with C_j. When the request for obj1.gif on F_2^j is captured at time t_q, a w_1^(j,2) object is created to track the request and is placed onto F_2^j's request queue. Determining the pageview response time, calculated as t_e - t_0, requires correlating embedded objects to their proper container pages, which involves tackling a set of challenging problems. Clients, especially proxies, may be downloading multiple pageviews simultaneously. It is possible for a person to open two or more browsers and connect to the same Web site, or for a proxy to send multiple pageview requests to a server on behalf of several remote clients. In either case, there can be multiple active pageview transactions simultaneously associated with the remote client C_j (e.g., pv_1^j, pv_2^j, ..., pv_k^j). In addition, some embedded objects being requested may appear in multiple pageviews, and some Web objects may be retrieved from caches or CDNs. ksniffer applies a set of heuristics that attempt to determine the true container page for each embedded object. We present experimental results in Section 3.6 demonstrating that these heuristics are effective for accurately measuring client perceived response time. For example, suppose that F_3^j in Figure 3.6 depicts client C_j downloading buy.html at roughly the same time as index.html (i.e., t_0 ≈ t_j). Suppose also that ksniffer knows in advance that index.html embeds {obj1.gif, obj3.gif, obj8.gif, obj4.gif, obj2.gif} and that buy.html embeds {obj1.gif, obj8.gif, obj11.gif}. This means that both container pages are valid candidates for the true container page of obj1.gif. Whether or not t_r < t_q is a crucial indication as to the identity of the true container page. At time t_a, when connection F_2^j is being established, there is no information which could distinguish whether this connection belongs to index.html or buy.html. The only difference between F_1^j, F_2^j, and F_3^j with respect to the TCP/IP 4-tuple is the remote client port number.
Hence only the client, C_j, can be identified at time t_a; and at time t_q, it is unknown whether index.html or buy.html is the true container page for obj1.gif.

[Figure 3.7: Client active pageviews. Each client C_j has a loners queue (.pdf, .ps, .zip, etc.), a FIFO queue, and a hash table of pageviews (.html, .shtml, etc.).]

To manage pageviews and their associated embedded objects, ksniffer maintains three lists of active pageviews for each client, each sorted by request time, as shown in Figure 3.7. The loners queue contains pageviews which represent objects that cannot have embedded objects. These pageviews are kept in their own list, which is never searched when attempting to locate a container page for a new embedded object request. All other pageviews, which could potentially embed an object, are placed on both a FIFO pageview queue and the pageview hash table. This enables ksniffer to quickly locate the youngest candidate container page. Each pageview also maintains an embedded object hash table, not shown in Figure 3.7, that consists of the embedded objects associated with that pageview and state indicating whether and to what extent they have been downloaded. Given a request w_i^(j,k) captured on flow F_k^j for client C_j, ksniffer performs the following actions:

1. If w_i^(j,k) ∈ {.html, .shtml, ...}, ksniffer treats w_i^(j,k) as a container page by placing it into the pageview hash table (and FIFO queue) for client C_j. In addition, if a pageview is currently associated with F_k^j, ksniffer assumes that pageview is done.

2. If w_i^(j,k) ∈ {.pdf, .ps, ...}, ksniffer treats w_i^(j,k) as a loner object by placing it on the loner queue for C_j. In addition, if a pageview is currently associated with F_k^j, ksniffer assumes it is done.

3. If w_i^(j,k) ∈ {.jpg, .gif, ...}, then:

   (a) If the Referer: field contains the monitored server name, such as http://www.ibm.com/buy.html, then C_j's pageview hash table is searched to locate pv_c^j, the youngest pageview downloading that container page that has yet to download w_i^(j,k). If pv_c^j exists, w_i^(j,k) is associated with pv_c^j as one of its embedded objects. If no pageview meets the criterion, pv_c^j is created and w_i^(j,k) is associated with it.

   (b) If the Referer: field contains a foreign host name, such as http://www.xyz.com/buy.html, then w_i^(j,k) is treated as a loner object.

   (c) If w_i^(j,k) has no Referer: field, then the FIFO queue is searched to locate pv_c^j, the youngest pageview which has w_i^(j,k) in its referer pattern and has yet to download w_i^(j,k). If pv_c^j exists, then w_i^(j,k) is associated with pv_c^j as one of its embedded objects. If no pageview meets the criterion, then w_i^(j,k) is treated as a loner object.

The algorithm above is based on several premises. If a request for an embedded object w_i^(j,k) arrives with a Referer: field containing the monitored server as the host (e.g., http://www.ibm.com/buy.html), then the remote browser almost certainly must have previously downloaded that container page (e.g., buy.html) from the monitored server (e.g., www.ibm.com), parsed the page, and is now sending the request for the embedded object w_i^(j,k). If ksniffer failed to capture the request for the container page (e.g., buy.html), it is highly likely that it is being served from the browser cache for this particular transaction. If a request for an embedded object arrives with a Referer: field containing a foreign host (e.g., http://www.xyz.com/buy.html), it is highly likely that the foreign host is simply embedding objects from the monitored Web site into its own pages.
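The three per-request rules can be sketched in user-level Python. This is a simplified illustration of the dispatch logic, not ksniffer's actual kernel implementation; all class and function names are our own:

```python
# Illustrative extension sets; the real lists are configurable.
CONTAINER_EXTS = (".html", ".shtml", ".htm")
LONER_EXTS = (".pdf", ".ps", ".zip")

class Pageview:
    def __init__(self, url, t):
        self.url, self.start, self.done = url, t, False
        self.embedded = {}            # embedded object URL -> request time
        self.pattern = set()          # referer pattern (Section 3.2.4), may be empty
    def has_downloaded(self, u): return u in self.embedded
    def add_embedded(self, u, t): self.embedded[u] = t
    def pattern_contains(self, u): return u in self.pattern

class Flow:
    """One TCP connection; `current` is the pageview it is working on."""
    def __init__(self): self.current = None
    def finish_current_pageview(self):
        if self.current is not None:
            self.current.done = True
        self.current = None

class Client:
    def __init__(self):
        self.fifo = []                # active pageviews, oldest first
        self.pv_table = {}            # container URL -> pageviews for that URL
        self.loners = []              # objects that cannot embed others
    def new_pageview(self, url, t):
        pv = Pageview(url, t)
        self.fifo.append(pv)
        self.pv_table.setdefault(url, []).append(pv)
        return pv

def dispatch(client, flow, url, referer, server_host, now):
    if url.endswith(CONTAINER_EXTS):          # rule 1: container page
        flow.finish_current_pageview()
        flow.current = client.new_pageview(url, now)
        return flow.current
    if url.endswith(LONER_EXTS):              # rule 2: loner object
        flow.finish_current_pageview()
        client.loners.append(url)
        return None
    # rule 3: embedded object (.jpg, .gif, ...)
    if referer and referer.startswith(server_host):   # 3(a): our own host in Referer
        container_url = referer[len(server_host):]
        for pv in reversed(client.pv_table.get(container_url, [])):
            if not pv.done and not pv.has_downloaded(url):
                pv.add_embedded(url, now)     # youngest matching pageview
                return pv
        pv = client.new_pageview(container_url, now)  # container likely came from a cache
        pv.add_embedded(url, now)
        return pv
    if referer:                               # 3(b): foreign host in Referer
        client.loners.append(url)
        return None
    for pv in reversed(client.fifo):          # 3(c): no Referer, consult patterns
        if not pv.done and pv.pattern_contains(url) and not pv.has_downloaded(url):
            pv.add_embedded(url, now)
            return pv
    client.loners.append(url)
    return None
```

Searching the lists in reverse implements the "youngest candidate first" heuristic discussed below.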
When a request for an embedded object arrives without a Referer: field, every pageview associated with the client becomes a potential candidate for the container page of that object. This is depicted in Figure 3.6 when the request for obj1.gif arrives without a Referer: field. If the client is actually a remote proxy, then the number of potential candidates may be large. ksniffer applies the patterns described in Section 3.2.4 to reduce the number of potential candidates and focus on the true container page of the embedded object. The heuristic is to locate the youngest pageview which contains the object in its pattern but has yet to download the object. Patterns are therefore exclusionary: any candidate pageview not containing the embedded object in its pattern is excluded from consideration. This may result in the true container page being passed over, but as mentioned in Section 3.2.4, the likelihood that a container page embeds an object that does not appear in the page's pattern is very low for an active Web site. If a suitable container pageview is not found, then the object is treated as a loner object. If a Referer: field is missing, it was most likely removed by a proxy rather than by a browser on the client machine; and if the proxy had cached the container page during a prior transaction, it is likely to have cached the embedded object as well. This implies the object is not being requested as part of a page, but is being downloaded as an individual loner object. If a client downloads an embedded object, such as obj1.gif, it is unlikely that the client will download the same object again for the same container page. If an object appears in multiple places within a container page, most browsers will only request it once from the server. Therefore, ksniffer not only checks whether an embedded object is in the pattern for a container page, but also checks whether that instance has already downloaded the object.
The youngest candidate is usually a better choice than the oldest candidate. If browsers could not obtain objects from a cache or CDN, then the oldest candidate would be a better choice, based on FCFS. Since this is not the case, choosing the oldest candidate would tend to assign an object obj1.jpg to a container page whose 'slot' for obj1.jpg was already filled via an unseen cache hit, which overestimates response time for older pages. It is more likely that an older page obtained obj1.jpg from a cache and that the younger page is the true container for obj1.jpg than vice versa. ksniffer relies on capturing the last byte of data for the last embedded object to determine the pageview response time. However, given the use of browser caches and CDNs, not all embedded objects will be seen by ksniffer, since not all objects will be downloaded directly from the Web server. The purpose of a cache or CDN is to provide much faster response time than can be delivered by the original Web server. As a result, it is likely that objects requested from a cache or CDN will be received by the client before objects requested from the original server. If the Web server is still serving the last embedded object received by the client, other objects served from a cache or CDN will not impact ksniffer's pageview response time measurement accuracy. If the last embedded object received by the client is from a cache or CDN, ksniffer will end up not including that object's download time as part of its pageview response time. Since caches and CDNs are designed to be fast, the time unaccounted for by ksniffer will tend to be small even in this case. Given that embedded objects may be obtained from someplace other than the server, and that a pattern for a container page may not be complete, how can ksniffer determine that the last embedded object has been requested?
For example, at time t_e, how can ksniffer determine whether the entire download for index.html has completed, or whether another embedded object will be downloaded for index.html on either F_1^j or F_2^j? This is essentially the same problem described at the end of Section 3.2.3 with respect to whether or not embedded object requests will follow a request for a container page. ksniffer approaches this problem in two ways. First, if no embedded objects are associated with a pageview after a timeout interval, the pageview transaction is assumed to be complete. A six second timeout is used by default, in part based on the fact that the current ad hoc industry quality goal for complete Web page download times is six seconds [87]. If a client does not generate additional requests for embedded objects within this time frame, it is very likely that the pageview is complete. ksniffer also cannot report the response time for a pageview until the timeout expires; a six second timeout is small enough to impose only a modest delay in reporting. We discuss the implications of such a timeout in the presence of connection failure in the next section on packet loss. Second, if a request for a container page, w_k^(j,i), arrives on a persistent connection F_i^j, then we consider all pageview transactions associated with each prior object, w_b^(j,i), b < k, on F_i^j to be complete. In other words, a new container page request over a persistent connection signals the completion of the prior transaction and the beginning of a new one. We believe this to be a reasonable assumption, even under pipelined requests, since in most cases only the embedded object requests will be pipelined. Typical user behavior will end up serializing container page requests over any given connection. Hence, the arrival of a new container page request indicates a user click in the browser associated with this connection.
Taking this approach also allows ksniffer to properly handle quick clicks, when the user clicks on a visible link before the entire pageview is downloaded and displayed in the browser.

3.3 Packet Loss

Studies have shown that the packet loss rate within the Internet is roughly 1-3% [167]. We classify packet loss into three types: (A) a packet is dropped by the network before being captured by ksniffer, (B) a packet is dropped by the network after being captured, and (C) a packet is dropped by the server or client after being captured. Types A and B are most often due to network congestion or transmission errors, while type C drops occur when the Web server (or, less likely, the client) becomes temporarily overloaded. The impact that a packet drop has on measuring response time depends not only on where or why it was dropped, but also on the contents of the packet. We first address the impact of SYN drops, then look at how a lost data packet can affect response time measurements. Figure 3.5 depicts the well known TCP connection establishment protocol. Suppose that the initial SYN, transmitted at time t_0, is dropped either in the network or at the server. In either case, no SYN/ACK response is forthcoming from the server. The client side TCP recognizes such SYN drops through use of a timer [33]. If a response is not received within 3 seconds, TCP retransmits the SYN packet. If that SYN packet is also dropped by the network or server, TCP will again resend the same SYN packet, but not until after waiting an additional 6 seconds. As each SYN is dropped, TCP doubles the wait period between SYN retransmissions: 3 s, 6 s, 12 s, 24 s, etc. TCP continues in this manner until either the configured limit of retries is reached, at which time TCP reports "unable to connect" back to the browser, or the user takes an action to abort the connection attempt, such as refreshing or closing the browser.
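The retransmission schedule above is easy to state in closed form: the n-th retransmitted SYN is sent 3(2^n - 1) seconds after the original. A small Python sketch (illustrative, not ksniffer code) makes the schedule explicit:

```python
INITIAL_SYN_TIMEOUT = 3.0  # seconds; classic BSD-derived initial value [33]

def syn_retransmit_schedule(retries):
    """Offsets from the original SYN at which each retransmission is sent.
    The wait doubles each time (3, 6, 12, ...), so the n-th retransmission
    goes out at cumulative offset 3 * (2**n - 1) seconds."""
    offsets, wait, t = [], INITIAL_SYN_TIMEOUT, 0.0
    for _ in range(retries):
        t += wait
        offsets.append(t)
        wait *= 2
    return offsets
```

For example, `syn_retransmit_schedule(3)` yields offsets of 3 s, 9 s, and 21 s, the same cumulative connection delays referred to later in this section.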
This behavior is the TCP exponential backoff mechanism we have been discussing throughout this dissertation and is depicted in Figure 3.8. This additional delay has a large impact on the client perceived response time.

[Figure 3.8: Network and server dropped SYNs.]

Suppose there is a 3% network packet loss rate from client to server. Three percent of the SYN packets sent from the remote clients will be dropped in the network before reaching ksniffer or the server. The problem is that since these SYN packets are dropped in the network before reaching the server farm, both ksniffer and the server are completely unaware that the SYNs were dropped. This will automatically result in an error for any traffic monitoring system which measures response time using only those packets which are actually captured. If each client is using two persistent connections to access the Web site, this error will be 180% for a 100 ms response time and 4.5% for a 4 s response time. Under HTTP 1.0 without KeepAlive, where a connection is opened to obtain each object, the probability of a network SYN drop grows with the number of objects in the pageview. For a page download of 10 objects, there is a 30% chance of incurring the 3 second retransmission delay, a 60% chance for 20 objects, and a 90% chance for 30 objects. ksniffer uses a simple technique, similar to the fast online Certes algorithm, for capturing this undetectable connection delay (type A SYN packet loss). Three counters are kept for each subnet. One of the three counters is incremented whenever a SYN/ACK packet is retransmitted from the server to the client (which indicates that the SYN/ACK packet was lost in the network).
The counter that gets incremented depends on how many times the SYN/ACK has been transmitted. Every time a SYN/ACK is sent twice, the first counter is incremented; every time a SYN/ACK is sent 3 times, the second counter is incremented; and every time a SYN/ACK is sent 4 times, the third counter is incremented. Whenever a SYN packet arrives for a new connection, if one of the three counters is greater than zero, then ksniffer subtracts the appropriate amount of time from the start time of the connection and decrements the counter (round robin is used to break ties). Assuming that a SYN packet will be dropped as often as a SYN/ACK, this gives ksniffer a reasonable estimate of the number of connections experiencing a 3 s, 9 s, or 21 s connection delay. Alternatively, an approach such as the one in Sting [140] could be used to estimate client to server network loss rates, but this involves active probing. The same retransmission delays are incurred when SYNs are dropped by the server (type C). In this case, ksniffer is able to capture and detect that the SYNs were dropped by the server, and can distinguish these connection delays, which are due to server overload, from those previously described, which are due to network congestion. ksniffer also determines when a client is unable to connect to the server. If the client reattempts access to the Web site within six seconds of a connection failure, ksniffer counts the time associated with the first failed connection attempt as part of the connection latency for the reattempt; otherwise the failed connection attempt is reported under the category "frustrated client".
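The three-counter technique can be sketched as follows. This is a simplified user-level illustration with names of our own choosing; in particular, it resolves ties by taking the largest pending delay first, whereas the text specifies round robin:

```python
# Cumulative client-side delay implied by n lost handshake round trips: 3(2^n - 1) s.
DELAY_FOR = {1: 3.0, 2: 9.0, 3: 21.0}

class SubnetSynLoss:
    """Per-subnet counters of SYN/ACK retransmissions (an illustrative sketch)."""
    def __init__(self):
        self.counters = {1: 0, 2: 0, 3: 0}

    def on_synack_retransmit(self, times_sent):
        # A SYN/ACK sent n+1 times implies n lost round trips on this subnet.
        n = min(times_sent - 1, 3)
        if n >= 1:
            self.counters[n] += 1

    def adjust_start_time(self, syn_arrival):
        """On a new connection's first captured SYN, consume one pending delay
        (if any) and shift the connection start time earlier by that amount.
        Ties are broken largest-first here for simplicity (the text uses
        round robin)."""
        for n in (3, 2, 1):
            if self.counters[n] > 0:
                self.counters[n] -= 1
                return syn_arrival - DELAY_FOR[n]
        return syn_arrival
```

The underlying assumption, as stated above, is that client-to-server SYNs are lost about as often as server-to-client SYN/ACKs, so observed SYN/ACK retransmissions are a usable proxy for unobservable SYN drops.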
Interestingly, some undetectable network SYN drops are actually detectable. In Figure 3.8, the presence of a 6 second gap between the first and second captured SYN allows ksniffer to infer the presence of the network dropped SYN. Although this situation is less common, whenever two SYNs are captured by ksniffer, the gap between them can be examined to determine whether one or more network dropped SYNs occurred. The algorithm in Figure 3.9 depicts the process of inferring detectable network SYN drops from the time gap between two captured SYNs. In the algorithm, t_i is the capture time of the current SYN packet, k is the number of expected SYN retransmissions to account for, and num_syns is the total number of SYNs sent by the client, both captured and inferred.

    if (num_syns > 1) return;
    if (num_syns == 0) {
        num_syns = 1;
        last_syn = t_i;
        start_time = t_i;
    } else {
        delta = t_i - last_syn;
        for (j = 1; j <= k; j++) {
            F = 3 * (2^j - 1);           /* send offset of the j-th retransmission */
            for (i = j + 1; i <= k; i++) {
                L = 3 * (2^i - 1);       /* send offset of the i-th retransmission */
                if (delta ≈ L - F) {     /* gap matches the backoff schedule */
                    start_time -= F;
                    num_syns = i + 1;
                    return;
                }
            }
        }
    }

Figure 3.9: Algorithm for detecting network dropped SYNs from captured SYNs.

Similar undetected latency occurs when a GET request is dropped in the network before reaching ksniffer or the server, then retransmitted by the client. An undetected GET request drop differs from an undetected SYN drop in two ways. First, unlike SYN drops, TCP determines the retransmission timeout period based on the RTT and a number of implementation dependent parameters. ksniffer implements the standard RTO calculation [126] using Linux TCP parameters, and adjusts for this undetectable time in the same manner as mentioned above. Note that fast retransmit does not come into play since the server is not actually expecting to receive a TCP segment from the client.
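The core of the Figure 3.9 search can be restated as a small, runnable Python sketch (our own names; the tolerance parameter is an assumption we introduce to absorb RTT and timer jitter):

```python
def infer_dropped_syns(delta, k=6, tol=0.5):
    """Given the gap `delta` (seconds) between two captured SYNs of the same
    connection attempt, find retransmission indices j < i whose TCP backoff
    spacing matches it. Retransmission n is sent 3(2^n - 1) s after the
    original SYN. Returns (shift, total_syns), where `shift` is how far the
    connection start time must be moved back, or None if no schedule matches."""
    offset = lambda n: 3.0 * (2 ** n - 1)
    for j in range(1, k + 1):
        for i in range(j + 1, k + 1):
            if abs(delta - (offset(i) - offset(j))) <= tol:
                # The first captured SYN was really retransmission j, sent
                # offset(j) seconds after the client's original (dropped) SYN.
                return offset(j), i + 1
    return None
```

For instance, a 6 s gap matches j=1, i=2: the original SYN was dropped, the start time shifts back 3 s, and the client has sent 3 SYNs in total, exactly the scenario of Figure 3.8.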
Second, a dropped GET request will only affect the measurement of the overall pageview response time if the GET request is for a container page and is not the first request over the connection. Otherwise, the start of the transaction will be indicated by the start of connection establishment, not the time of the container page request. As mentioned in the previous section, ksniffer uses a timeout as one mechanism for determining when a pageview is finished: if no embedded objects are associated with a pageview after a specified timeout period, the pageview transaction is assumed to be complete. Yet the presence of network and server dropped SYNs affects this heuristic, since both can increase the time gap between object downloads. This is easily resolved by using the 6 s timeout period in conjunction with the notion that a pageview remains 'in progress' while the client is attempting to establish a connection (which may include a series of SYN drops). The usual rules then apply: the first request over the new connection will be for either an embedded object (i.e., continued processing of the current pageview) or a container page (i.e., start of a new pageview). ksniffer often expects to capture the packet containing the sequence number of the last byte of data for a particular request. To capture retransmissions, ksniffer uses a timer along with a finish queue on each flow, updating the end of response time appropriately. Suppose the last packet of a response is captured by ksniffer at time t_k, at which point ksniffer identifies it as containing the sequence number of the last byte of the response, and moves the w_k^(j,i) request object from the flow's request queue to the flow's finish queue. The packet is then dropped in the network before reaching the client (type B).
At time t_(k+h), ksniffer will capture the retransmitted packet and, using its sequence number, determine that it is a retransmission for w_k^(j,i), which is located on the finish queue. The completion time of w_k^(j,i) is then set to the timestamp of this packet.

3.4 Longest Prefix Matching

Longest prefix matching determines which subnet a remote client is a member of. This capability enables ksniffer to monitor or aggregate information on a per remote subnet basis: determining response times per subnet, identifying the most active subnets, and tracking changes in activity over time for specific subnets are all examples of such analysis. This capability cannot be provided using common packet header filtering techniques. As such, ksniffer implements the Chiueh and Pradhan [44] algorithm for longest prefix matching. We made two slight modifications to the published work. First, in our implementation, we shared the prefix structures across multiple hash tables, reducing memory requirements. Second, we tracked the most frequently accessed prefixes using a statistical hot list, in which we kept pointers to the top 200 prefixes. We perform an offline preprocessing of routing tables to improve the load time performance of the kprefix module. This includes the following offline steps:

1. Download the compressed, binary formatted BGP routing tables from RIPE NCC [135] and the Oregon Route Views Project [155].

2. Parse the compressed, binary BGP tables, construct a longest prefix matching hash structure, and save it to disk.

3. Copy the longest prefix file to the ksniffer machine.

The online process simply involves reading the longest prefix file from disk during module load and creating a hash table structure in kernel memory. The files obtained from RIPE NCC [135] and the Oregon Route Views Project [155] are in the well known MRT binary format [104].
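A hash-based longest prefix match can be sketched in a few lines of Python. This is a deliberately simplified per-prefix-length lookup for IPv4, not the actual Chiueh-Pradhan scheme ksniffer implements, but it conveys the idea of constant-expected-time probing of hash tables rather than a trie walk:

```python
import ipaddress

class PrefixTable:
    """Simplified IPv4 longest-prefix match: one hash table per prefix length,
    probed from longest to shortest prefix (an illustrative sketch)."""
    def __init__(self):
        self.tables = {}          # prefix length -> {network address int -> prefix string}

    def add(self, prefix):
        """Insert a prefix given as e.g. '10.1.0.0/16'."""
        net = ipaddress.ip_network(prefix)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = prefix

    def lookup(self, addr):
        """Return the longest matching prefix for dotted-quad `addr`, or None."""
        a = int(ipaddress.ip_address(addr))
        for plen in sorted(self.tables, reverse=True):
            key = a & (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            if key in self.tables[plen]:
                return self.tables[plen][key]
        return None
```

With a bounded number of distinct prefix lengths, each lookup probes a constant number of hash tables, consistent with the O(c) per-connection cost discussed below.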
The process of downloading and parsing the files takes several hours and is performed offline using a Java program. The longest prefix file is simply a list of prefix/length pairs. During module load, this file can be read into the hash structure in less than one second. The offline process takes several hours for two reasons: downloading the BGP routing table files takes time, and each file contains a significant number of duplicate entries. Krishnamurthy and Wang [90] found 391,497 unique prefix/netmask entries following a related approach. We obtained 139,508 unique longest prefixes, which required 21.5 MB of kernel memory. At runtime, the cost of a longest prefix match is O(c), where c is a small constant independent of the number of entries in the hash tables. ksniffer performs a longest prefix match only once per connection (for the remote IP address), when it observes the TCP three-way handshake, not for every packet, and then stores the result in a data structure.

3.5 Tracking the Most Active

Often it is only the most active clients, pages, or subnets that are of interest in managing a Web server. Knowing which subnets or clients are most active during different times of the day is useful for providing quality of service, performing capacity planning, or deciding where to place a content distribution node. Such a "hot list" would be expected to change over time, based on the activities of the clients. Unfortunately, for items like IP addresses, it is impractical to track each and every client over long periods of time (a hash table of size 2^32 would be required). Instead, we use a probabilistic approach to maintaining hot lists, based on the Gibbons and Matias algorithm [67], which tracks the most relevant items over time to within a small error fraction.
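To illustrate how a bounded-memory hot list can track heavy hitters, the sketch below uses the well known Misra-Gries frequent-items summary as a stand-in; it is not the Gibbons-Matias algorithm [67] that ksniffer actually uses, but it shares the key property of bounded space with a small error fraction:

```python
class HotList:
    """Approximate hot list of the most active keys (e.g. client IPs),
    using at most k counters (Misra-Gries style, shown as a stand-in
    for the Gibbons-Matias algorithm cited in the text)."""
    def __init__(self, k=200):
        self.k, self.counts = k, {}

    def observe(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.k:
            self.counts[key] = 1
        else:
            # Table full: decrement every counter and drop keys that hit
            # zero. Heavy hitters survive; memory stays bounded by k.
            for other in list(self.counts):
                self.counts[other] -= 1
                if self.counts[other] == 0:
                    del self.counts[other]

    def top(self, n=10):
        return sorted(self.counts, key=self.counts.get, reverse=True)[:n]
```

Each observation touches at most k counters, so memory and per-update cost stay fixed regardless of how many distinct clients appear over time.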
This allows ksniffer to maintain hot lists for a number of item types without large storage overheads. ksniffer also supports the ability to quickly perform various types of matching on a URI string (or a substring thereof), which is useful for aggregating response times. Three matching algorithms are implemented: 1) tracking response time for URIs based on the longest directory path matched, 2) matching on an exact directory path, and 3) matching on the file extension of the requested filename.

3.6 Experimental Results

We implemented ksniffer as a set of Linux kernel modules and installed it on a commodity PC to demonstrate its accuracy and performance under a wide range of Web workloads. We report an evaluation of ksniffer in a controlled experimental setting as well as an evaluation of ksniffer tracking user behavior at a live Internet Web site. Our experimental test bed is shown in Figure 3.10.

[Figure 3.10: Experimental test bed. Four client subnets (10.1.0.0 at 200 ms RTT, 10.2.0.0 at 140 ms, 10.3.0.0 at 80 ms, 10.4.0.0 at 20 ms), each generated by a 1 GHz PIII machine with 512 MB RAM running Redhat 7.3, connect through a Catalyst 6500 switch to the Web server and the ksniffer monitor, each a 1.7 GHz Xeon with 750 MB RAM running Redhat 9.0.]

We used a traffic model based on Surge [23] but made some minor adjustments to reflect more recent work [76, 146] on characterizing Web traffic: the maximum number of embedded objects in a given page was reduced from 150 to 100, and the percentages of base, embedded, and loner objects were changed from 30%, 38%, and 32% to 42%, 48%, and 10%, respectively. The total number of container pages was 1041, with 959 unique embedded objects. 49% of the embedded objects are embedded by more than one container page. We also fixed a bug in the modeling code and included CGI scripts in our experiments, something not present in Surge. For traffic generation, we used an updated version of WaspClient [107], which is a modified version of the client provided by Surge.
Virtual clients on each machine cycle through a series of pageview requests, first obtaining the container page and then all its embedded objects. A virtual client can open 2 parallel TCP connections for fetching pages, which mimics the behavior of Microsoft IE. Requests on a TCP connection are serialized, so that the next request is not sent until the current response on that connection is received. In addition, each virtual client binds to a unique IP address using IP aliasing on the client machine. This lets each client machine appear to the server as a collection of up to 200 unique clients from the same subnet. To emulate wide-area network conditions, we extended the rshaper [139] bandwidth shaping tool to include packet loss and round trip latencies. We installed this software on each client traffic generator machine, enabling us to impose packet drops as well as RTT delays between 20 and 200 ms, as specified in Figure 3.10. To quantify the accuracy of the client perceived response times measured by ksniffer, we ran fifteen different experiments with different traffic loads under non-ideal and high-stress operating conditions and compared ksniffer's measurements against those obtained by the traffic generators executing on the client machines. We measured with two different Web servers, Apache and TUX; used both HTTP 1.0 without KeepAlive and persistent HTTP 1.1; and included a combination of static pages and CGI programs for Web content. We also measured in the presence of network and server packet loss, missing Referer: fields, client caching, and near gigabit traffic rates. Table 3.1 summarizes these experimental results. In all cases, the difference between the mean response time as determined by ksniffer and that measured directly on the remote client was less than 5%.
Furthermore, the absolute time difference between ksniffer and client-side instrumentation was in some cases less than 1 ms and in all cases less than 50 ms.

Table 3.1: Summary of results.

Test  Web Server  Type        HTTP  Virtual Clients  PV/s      URL/s     Mbps    Client RT  ksniffer RT  diff (ms)  % diff  elapsed time
A     Apache      static      1.0   120              5-140     5-625     1-60    1.528s     1.498s       -29        -1.9    133m
B     Apache      cgi+static  1.0   120              5-160     10-660    1-60    1.513s     1.483s       -30        -2.0    133m
C     Apache      static      1.1   120              10-180    30-730    3-70    1.003s     0.981s       -22        -2.2    79m
D     Apache      cgi+static  1.1   120              10-400    40-1520   3-140   0.726s     0.699s       -27        -3.7    72m
E     TUX         static      1.0   800              65-750    260-3000  15-270  1.556s     1.506s       -49        -3.2    20m
F     TUX         static      1.1   800              125-1370  500-5300  35-455  0.815s     0.782s       -33        -4.1    11m
G     Apache      static      1.0   500              35-500    140-2000  10-200  1.537s     1.489s       -48        -3.1    32m
H     Apache      cgi+static  1.1   400              60-690    250-2880  15-250  0.792s     0.825s       -33        -4.0    22m
I     Apache      static      1.1   500              60-700    260-3000  20-265  0.884s     0.929s       -45        -4.8    18m
S1    TUX         static      1.0   16               1909      8,007     690     7.8ms      7.7ms        -0.17      -2.2    210s
S2    TUX         static      1.1   80               2423      10,164    878     30.5ms     29.7ms       -0.83      -2.7    165s
V     TUX         static      1.0   800              0-2410    0-10,000  0-850   0.574s     0.571s       -3         -0.5    29m
O1    Apache      static      1.0   800              419       1756      152     1.849s     1.806s       -42        -2.3    16m
O2    Apache      static      1.1   240              728       3054      264     .328s      .318s        -10        -3.1    9m
X     Apache      static      1.0   800              2174      9120      462     .365s      .363s        -1.7       -0.5    184s

[Figure 3.11: Test F, pageviews per second.]

All tests (except Tests S1 and S2) were done under non-ideal conditions found in the Internet, with 2% packet loss and 20% missing Referer: fields.
Each client requested the same sequence of pageviews, but since each traffic generator machine was configured with a different RTT to the Web server, as shown in Figure 3.10, the clients took different amounts of time to obtain all of their pages, resulting in a variable load on the Web server over time. For example, Figure 3.11 shows results from Test F comparing ksniffer against client-side instrumentation in measuring pageviews/s over time. There are two lines in the figure, but they are hard to distinguish because ksniffer's pageview count is so close to that of direct client-side instrumentation. Figure 3.12 shows results from Test F comparing ksniffer against client-side instrumentation in measuring mean client perceived pageview response time for each 1 second interval. The ksniffer results are very accurate and hard to distinguish from client-side instrumentation.

[Figure 3.12: Test F, mean pageview response time — mean response time (s) measured by the clients and by ksniffer over the 600 s test interval.]

As indicated by Figure 3.11, the response time varies over time because the client subnets complete at different points. During the initial 250 s, clients from each of the four subnets are actively making requests. At around 250 s, the clients from subnet 10.4.0.0 with an RTT of 20 ms have completed, while clients from the other subnets remain active. At around 300 s, the clients from subnet 10.3.0.0 with an RTT of 80 ms have completed, leaving clients from subnets 10.2.0.0 and 10.1.0.0 active. At time 475 s, clients from subnet 10.2.0.0 with an RTT of 140 ms have completed, leaving only those clients from subnet 10.1.0.0 with an RTT of 200 ms. Note that, although the pageview request rate decreases, the mean response time increases because the remaining clients have larger RTTs to the Web server and thus incur larger response times.
Figure 3.13 shows results for Tests A through F obtained by applying the longest prefix matching algorithm described in Section 3.4 to categorize RTT and response time on a per subnet basis.

[Figure 3.13: Client perceived response time on a per subnet basis — response time (s) for subnets 10.1.0.0 through 10.4.0.0 and for ksniffer, for Experiments A through F.]

In the figure, the blue bars represent the response time obtained by ksniffer for the corresponding set of clients in each subnet. These results show that ksniffer provides accurate pageview response times as compared to client-side instrumentation, even on a per subnet basis when different subnets have different RTTs to the Web server. ksniffer RTT measurements are also very accurate as compared to the actual RTT used for each subnet. The results show how this mechanism can be very effective in differentiating performance and identifying problems across different subnets.

Tests S1 and S2 were done under high bandwidth conditions to show results at the maximum bandwidth rate possible in our test bed. This was done by using the faster TUX Web server and by imposing no packet loss or network delay. For HTTP 1.1, 80 virtual clients generated the greatest bandwidth rate, but under HTTP 1.0 only 16 clients generated the highest bandwidth rate. ksniffer is within 3% of client-side measurements, even at a rate of 878 Mbps of HTTP content. The absolute time difference between ksniffer and client response time measurements was less than 1 ms. We note that the resolution of the packet timer on ksniffer is only 1 ms, due to the Linux clock timer granularity. Under HTTP 1.0 without KeepAlive, each object retrieved requires its own TCP connection. The TCP connection rate under Test S1 was 8,000 connections/s. These results demonstrate ksniffer's ability to track TCP connection establishment and termination at high connection rates.
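The per-subnet categorization described above can be sketched as follows. This is a minimal illustration, not ksniffer's actual implementation: the subnet table mirrors the four client subnets used in the experiments, and the prefix lengths and aggregation class are assumptions.

```python
import ipaddress

# Hypothetical subnet table; /16 prefixes are an assumption for illustration.
SUBNETS = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
    ipaddress.ip_network("10.3.0.0/16"),
    ipaddress.ip_network("10.4.0.0/16"),
]

def longest_prefix_match(ip, subnets=SUBNETS):
    """Return the most specific subnet containing ip, or None.
    With overlapping prefixes, the longest (most specific) one wins."""
    addr = ipaddress.ip_address(ip)
    matches = [n for n in subnets if addr in n]
    return max(matches, key=lambda n: n.prefixlen) if matches else None

class SubnetStats:
    """Aggregate per-subnet RTT and pageview response time samples."""
    def __init__(self):
        self.samples = {}  # subnet -> list of (rtt_s, response_time_s)

    def record(self, client_ip, rtt_s, response_time_s):
        net = longest_prefix_match(client_ip)
        if net is not None:
            self.samples.setdefault(net, []).append((rtt_s, response_time_s))

    def mean_response_time(self, subnet):
        rts = [rt for _, rt in self.samples[subnet]]
        return sum(rts) / len(rts)
```

A monitor records one sample per completed pageview and can then report per-subnet means, as in Figure 3.13.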
Test V was done with severe variations in load, alternating between no load and maximum bandwidth load by switching the clients between on and off modes every 50 s. Figure 3.14 compares ksniffer response time with that measured at the client, and Figure 3.15 compares the distribution of the response time. This indicates ksniffer's accuracy under extreme variations in load.

[Figure 3.14: Test V, mean pageview response time over the 1000 s test interval.]

[Figure 3.15: Test V, response time distribution — count versus seconds for the clients and for ksniffer.]

Tests O1 and O2 were done with the Web server experiencing overload and therefore dropping connections. We configured Apache to support up to 255 simultaneous connections, then started 240 virtual clients. Since each client opens two connections to the server to obtain a container page and its embedded objects, this overwhelmed Apache. During Tests O1 and O2, the Web server machine reported a connection failure rate of 27% and 12%, respectively. Table 3.1 shows that ksniffer's pageview response times for these tests were only 3% less than those from the client side. These results show ksniffer's ability to measure response times accurately in the presence of both server overload and network packet loss.

Test X was done to show ksniffer performance with caching clients, by modifying the clients so that 50% of the embedded objects requested were obtained from a zero-latency local cache. Figure 3.16 compares ksniffer and client-side instrumentation in measuring pageview response time over the course of the experiment.

[Figure 3.16: Test X, mean pageview response time over the 150 s test interval.]
The results show that ksniffer can provide very accurate response time measurements in the presence of client caching as well.

We deployed ksniffer in front of a live Internet Web site, GuitarNotes.com, which is hosted in NYC. Figure 3.17 depicts results for tracking a single user during a logon session from Hawthorne, NY.

[Figure 3.17: Live Internet Web site — per-page response time (s) measured by PageDetailer and by ksniffer, and the number of embedded objects (secondary Y axis), for each of the twelve pages downloaded during the user session.]

Using MS IE V6, and beginning with an empty browser cache, the user first accessed the home page and then visited a dozen pages within the site, including the product review section, discussion forum, FAQ, and classified ads, and performed several site searches for information. This covered a range of static and dynamically generated pageviews. The number of embedded objects for each page varied between 5 and 30, and is indicated by the dotted line, which is graphed against the secondary Y axis on the right. These objects included .gif, .css and .js objects. PageDetailer [79] was executing on the client machine, monitoring all socket level activity of IE. PageDetailer uses a Windows socket probe to monitor and timestamp each socket call made by the browser: connect(), select(), read() and write(). By parsing the HTTP requests and replies, it is able to determine the response time for a pageview, as well as for each embedded object within a page. The pageview response time is calculated as the difference between the entry into the connect() system call and the return from the read() system call for the last byte of data of the last embedded object. As shown in Figure 3.17, the response time which ksniffer calculates in NYC at the Web server is nearly identical to that measured by PageDetailer running on the remote client machine.
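The PageDetailer timing rule described above — first connect() entry to the return of the read() delivering the last byte of the last embedded object — can be sketched as follows. The event-tuple format is an illustrative assumption, not PageDetailer's actual data model.

```python
# Sketch of a PageDetailer-style pageview response time computation.
# events: list of (timestamp_s, call_name, phase) tuples for one page
# download, where phase is "enter" or "return".
def pageview_response_time(events):
    start = min(t for t, call, phase in events
                if call == "connect" and phase == "enter")
    end = max(t for t, call, phase in events
              if call == "read" and phase == "return")
    return end - start
```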
For each of the twelve pages downloaded by the client, ksniffer is within 5% of the response time recorded by PageDetailer.

ksniffer provides excellent performance scalability compared to common user-space passive packet capture systems. Almost all existing passive packet capture systems in use today are based on libpcap [151], a user space library that opens a raw socket to provide packets to user space monitor programs. As a scalability test, we wrote a libpcap based traffic monitor program whose only function was to count TCP packets. Executing on the same physical machine as ksniffer, the libpcap packet counter began to drop a large percentage of packets when the traffic rate reached roughly 325 Mbps. In contrast, ksniffer performs complex pageview analysis at near-gigabit traffic rates without such packet loss.

In Chapter 5 we discuss several alternative methods for measuring client perceived response time, along with their respective shortcomings. One approach, which we mention now, involves instrumenting the Web server to track when requests arrive and complete service at the application level [8, 86, 92, 94]. This approach has the desirable properties that it only requires information available at the Web server and can be used for non-HTML content. However, application-level approaches do not account for network interactions or delays due to TCP connection setup or waiting in kernel queues on the Web server. Our results from Chapter 2 demonstrate that application-level Web server measurements can underestimate response time by more than an order of magnitude. Figure 3.18 compares the response time captured within Apache, by tracking when requests arrive and complete service, with the response time measured at the remote client and by ksniffer.
[Figure 3.18: Apache measured response time, per URL — mean response time (s) over time for the clients (per pageview), ksniffer (per pageview), and Apache (per URL).]

[Figure 3.19: Apache measured response time for loner pages — mean response times: clients 0.131 s, ksniffer 0.133 s, Apache 0.000429 s.]

Apache does not correlate embedded objects with their container pages to provide a per pageview response time. Instead, Apache only measures the response time for each individual URL request. As such, the response time reported by Apache is completely unrelated to the response time perceived by the remote client. Even the per URL response time reported by Apache is grossly inaccurate with respect to the per URL response time measured by the client. Figure 3.19 depicts the Apache response time for only the loner pages, i.e., container pages without embedded objects. This per URL response time measured by Apache is extremely inaccurate with respect to the response time experienced by the remote client. Nevertheless, systems continue to mistakenly use the Apache measured response time as the basis for determining and managing quality of service in Web servers [26, 46].

3.7 Summary

We have designed, implemented and evaluated ksniffer, a kernel-based traffic monitor that can be colocated with Web servers to measure their performance as perceived by remote clients in real-time. State of the art packet level approaches for reconstructing network protocol behavior are based on offline analysis, after the network packets have been logged to disk [6, 37, 60, 66, 75]. This kind of analysis uses multiple passes and is limited to analyzing only reasonably sized log files [146].
ksniffer's correlation algorithm differs from EtE [66] in that it does not require multiple passes or offline operation, uses file extensions and referring host names in addition to the filename in the Referer: field, handles multiple requests for the same Web page from the same client, and accounts for connection setup time and packet loss in determining response time. Feldmann [60] describes many of the issues involved in TCP/HTTP reconstruction, but does not consider the problem of measuring response time.

ksniffer shares certain limitations that are present in all network traffic monitors. Response time components due to processing on the remote client machines cannot be directly measured from server-side network traffic. Examples include times for DNS query resolution and for HTML parsing and rendering on the client. Embedded objects obtained from locations other than the monitored servers may have an impact on accuracy as well, but only if their download completion time exceeds that of the last object obtained from the monitored server. As a passive network monitor, ksniffer requires no changes to clients or Web servers, and does not perturb performance in the way that intrusive instrumentation methods can.

ksniffer determines client perceived pageview response times using novel, online mechanisms that take a “look once, then drop” approach to packet analysis to reconstruct TCP connections and learn client pageview activity. We have implemented ksniffer as a set of loadable Linux kernel modules and validated its performance using both a controlled experimental test bed and a live Internet Web site. Our results show that ksniffer's in-kernel design scales much better than common user-space approaches, enabling ksniffer to monitor gigabit traffic rates using only commodity hardware, software, and network interface cards.
More importantly, our results demonstrate ksniffer's unique ability to accurately measure client perceived response times even in the presence of network and server packet loss, missing HTTP Referer: fields, client caching, and widely varying static and dynamic Web content. Our measurement work with ksniffer led us to develop Remote Latency-based Management, an approach for not only measuring but also managing the remote client perceived response time.

Chapter 4

Remote Latency-based Web Server Management

Although many techniques have been developed for managing the response time of Web services [3, 8, 30, 31, 43, 46, 54, 89, 159], previous approaches focused on controlling only the Web server response time of individual URL requests. Unfortunately, this has little relevance to end users, who are located remotely, not at the Web server, and who are interested in viewing entire Web pages that consist of multiple objects, not just individual URLs. The problem is exacerbated by the fact that the response time measured within the Web server can be an order of magnitude less than that perceived by the remote client, as shown in Chapter 2. These techniques are controlling the wrong measure of response time: they may in fact improve server response time while unknowingly and unexpectedly degrading the overall response time seen from the perspective of end users, as shown in Section 2.6.

Managing the response time of Web services is crucial for providing differentiated services, in which different classes of traffic or clients can receive different quality of service. For example, an e-commerce Web site may want to ensure that users with full shopping carts are given highest priority for receiving the best response time, while users who are casual visitors are given lower priority.
Existing approaches that provide differentiated services often depend on load shedding in the form of admission control to maintain a specified set of response time thresholds [31, 43, 54, 89]. Requests from low priority clients are dropped when they begin to interfere with the response time of high priority clients. However, prior techniques ignore the effect that admission control drops have on the overall response time perceived by end users.

In this chapter we present Remote Latency-based Management (RLM), a novel approach for managing Web response times as perceived by remote clients using only server-side techniques. RLM correlates container pages and their embedded objects to manage pageview response times of entire Web pages, not just individual URLs. RLM tracks the progress of each Web page in real-time as it is downloaded, and uses this information to dynamically control the client perceived response time by manipulating the network traffic into and out of the Web server complex. RLM provides its management functionality in a non-invasive manner, as a stand-alone appliance that simply sits in front of a Web server complex, without any changes to existing Web clients, servers, or applications.

RLM builds on our work with ksniffer from Chapter 3, our kernel-based traffic monitor that accurately estimates pageview response times as perceived by remote clients at gigabit traffic rates. RLM uses passive packet capture to track the elapsed time of each pageview download, then uses this information to build a novel event node model that enables RLM to make key management decisions dynamically at each point in the download process. In particular, our model defines and includes the effect of connection admission control drops on partially successful Web page downloads. It also accounts for some notable behaviors of common Web browsers in the presence of connection failures.
Using this model, RLM applies two sets of techniques for managing pageview response time: fast SYN and SYN/ACK retransmission, and embedded object removal and rewrite. Fast SYN and SYN/ACK retransmission reduce connection latencies associated with bursty loads and network loss, which are key factors for the short-lived TCP connections typical of Web transactions. Embedded object removal and rewrite reduce server and network transfer latencies by adapting Web page content while the pageview is in the process of being downloaded. These techniques can be applied using a simple set of management rules that can be defined to provide differentiated services across multiple classes of Web clients and content.

We implemented RLM on an inexpensive, commodity, Linux-based PC and demonstrate that it can manage client perceived pageview response times in real-time for three-tier Web architectures. Using our prototype, we present experimental data obtained from using RLM to manage client perceived pageview response times under the TPC-W e-commerce Web workload. We present results for both single and multiple service class environments. Our results show RLM's unique ability to track a pageview download as it occurs, properly measure its elapsed response time as perceived by the remote client, decide whether action ought to be taken at key junctures during the download, and apply latency control mechanisms to the current activities.

The rest of this chapter is outlined as follows. Section 4.1 presents an overview of the RLM architecture. Section 4.2 describes the RLM pageview download model and the pageview event node framework used for making response time management decisions. Section 4.3 describes the RLM mechanisms used for connection latency management and their effect on client browsers. Section 4.4 describes the RLM mechanisms used for page transfer latency management.
Section 4.5 describes the implementation of RLM and presents some experimental results based on managing a TPC-W workload. Section 4.6 presents theoretical analysis based on several simplifying assumptions. Finally, we present some concluding remarks.

4.1 RLM Architecture Overview

[Figure 4.1: RLM deployment — RLM sits between the Internet and the Web server complex, handling HTTP/TCP traffic in both directions.]

As shown in Figure 4.1, RLM is a stand-alone appliance which sits in front of a Web server complex. RLM does not require any modifications to Web pages, the server complex, or browsers, making deployment fast and easy. This is particularly important for Web hosting companies that are responsible for maintaining the infrastructure surrounding a Web site, but are not permitted to modify the customer's server machines or content.

RLM builds on our work with ksniffer from Chapter 3 and uses a similar architecture. It is designed as a set of dynamically loadable kernel modules that reside above the network device independent layer in the operating system. Its device independence makes it easy to deploy on any inexpensive, commodity PC without special NIC hardware or device driver modifications. RLM monitors and manages bidirectional network traffic and looks at each packet only once. Its in-kernel implementation exploits several performance optimizations, such as zero-copy buffer management, eliminated system calls, and reduced context switches [78, 84]. Our work with ksniffer demonstrated that this in-kernel architecture can support gigabit traffic rates. RLM operates as a server-side mechanism with a low-delay control path to the Web server complex that is unaffected by outside network conditions.
RLM measures client perceived pageview response times for Web transactions, then uses that information as real-time feedback in managing the behavior of the Web server complex to deliver desired response times. This tight measurement and management feedback loop near the server complex is key to RLM's ability to provide real-time control of the performance of the Web server complex. RLM passively captures network packets to measure client perceived response times, then actively manipulates the packet stream between client and server to meet desired response time goals.

RLM operates at the network packet level in part to provide its functionality without any modifications to the Web server complex. More importantly, as discussed in Section 4.2, measuring and controlling client perceived response times requires tracking the client-server interaction at the packet level. RLM measures client perceived response times by capturing and analyzing packets using the Certes model of TCP retransmission and exponential backoff, which accounts for latency due to connection setup overhead and network packet loss. It combines this model with ksniffer's higher level online mechanisms, which use access history and HTTP referrer information, when available, to learn relationships among Web objects and to correlate connections and Web objects to determine pageview response times. The remainder of this chapter focuses on the management mechanisms RLM provides that work in conjunction with its measurement mechanisms.

4.2 RLM Pageview Event Node Model

RLM introduces a new model for specifying and achieving response time service level objectives based on tracking a pageview download as it happens and making service decisions at each key juncture based on the current state of the pageview download.
A pageview download can be viewed as a set of well defined activities, such as establishing a connection, getting the container page, and getting the embedded objects in a page. RLM models a pageview download as an event node graph, where each node represents an activity and each link indicates a precedence relationship. The nodes in the graph are ordered by time and each node is annotated with the elapsed time from the start of the transaction. Each activity contributes to the overall response time. Some activities may overlap in time, have greater potential to incur larger latencies, be on the critical path, or be more difficult to control than others. RLM controls pageview response times by identifying and managing the high latency activities on the critical path of the pageview download.

To illustrate our approach, Figure 4.2 depicts the response time te − t0 for the pageview download of index.html, which embeds obj3.gif, obj6.gif and obj8.gif. This example uses two connections to download the page, consistent with modern Web browsers, which open multiple connections to download Web content faster.

[Figure 4.2: Breakdown of client response time — two parallel connections downloading index.html and its embedded objects, with Tconn, Tserver, Ttransfer and Trender serialized on each connection between t0 and te.]

Four types of latencies are serialized over each connection and delimited by specific events:

1. Tconn: TCP connection establishment latency, using the TCP 3-way handshake. Begins when the client sends the TCP SYN packet to the server.

2. Tserver: latency for the server complex to compose the response by opening a file, or calling a CGI program or servlet. Begins when the server receives an HTTP request from the client.

3. Ttransfer: time required to transfer the response from the server to the client.
Begins when the server sends the HTTP response header to the client.

4. Trender: time required for the browser to process the response, such as parsing the HTML or rendering an image. Begins when the client receives the last byte of the HTTP response.

The corresponding event node graph generated by RLM is shown in Figure 4.3. Each link is identified with the type of latency that results from the particular activity. Each node is annotated with the elapsed time from the start of the transaction.

[Figure 4.3: Pageview modeled as an event node graph — 18 nodes from SYN arrival at 0 ms through the final object response at 3436 ms, with links labeled Tconn, Tserver, Ttransfer and Trender.]

By measuring the elapsed time at a given node, RLM can track the page download as it progresses and determine at each node whether to take additional actions to satisfy response time goals for the given page. This ability to make management decisions at each point in time within the context of the pageview download is a key difference between RLM and other QoS approaches.

It is crucial for RLM to model network loss and track client-server interaction at the packet level to measure and manage the entire client perceived response time te − t0 shown in Figure 4.2. Mechanisms which attempt to measure response time by timestamping server-side user space events are ineffective.
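The event node bookkeeping described above can be sketched as follows. The class names and the goal-checking rule are illustrative assumptions, not RLM's actual implementation; the node times are taken from Figure 4.3.

```python
# Minimal sketch of an event node graph: each node records an activity
# and its elapsed time since the start of the transaction (t0); links
# carry the latency type (Tconn, Tserver, Ttransfer, Trender).
class EventNode:
    def __init__(self, name, elapsed_ms):
        self.name = name
        self.elapsed_ms = elapsed_ms   # elapsed time since the initial SYN
        self.successors = []           # list of (latency_type, EventNode)

    def link(self, latency_type, node):
        self.successors.append((latency_type, node))

def needs_action(node, goal_ms, remaining_estimate_ms):
    """At this node, is the response time goal at risk? Compare elapsed
    time plus an estimate of the remaining work against the goal."""
    return node.elapsed_ms + remaining_estimate_ms > goal_ms

# The container-page portion of the download in Figure 4.3:
syn = EventNode("SYN arrival", 0)
get = EventNode("GET index.html", 75)
hdr = EventNode("index.html response header", 884)
syn.link("Tconn", get)
get.link("Tserver", hdr)
```

For instance, at the node reached 884 ms into the download, a rule could decide to adapt the remaining content if the estimated remaining time would push the total past the page's response time goal.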
Measuring response time within an Apache Web server, for example, ignores time spent during the TCP 3-way handshake establishing the connection and time spent in kernel queues before the request is given to Apache. We showed in Chapter 2 that such measurements are an order of magnitude less than the response time experienced by the remote client. Likewise, measuring and controlling the time required to service a single URL (i.e., Tserver) is simply not relevant to the remote client, who is downloading not just a single URL but an entire pageview. Latency control mechanisms must take into account effects seen at the packet level which impose latency on the remote client.

RLM allows a variety of rules to be defined and enforced at different points in an event node graph to manage response time. Each node in the graph has a set of associated characteristics that determine what types of rules can be defined. For example, when a SYN is captured by RLM at node 1 in Figure 4.3, the management decision can be based only on fields contained within the SYN packet, namely the source IP address, destination IP address, source port, and destination port. The decision cannot be based on which page is being requested, since the GET request has not yet been sent by the client. Algorithms for quickly classifying packets or clients into service classes have been previously studied and are not discussed in this dissertation. We focus instead on the management framework and on introducing a set of techniques that can be applied to each instance of a page download to reduce the remaining time to complete the pageview, in order to meet a defined response time goal. Section 4.5 provides some example rules that can be used with RLM and their impact on response time.

4.3 Connection Latency Management

One of the key types of latency that RLM must manage is the TCP connection establishment latency Tconn.
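As an illustration of the rule evaluation at node 1 described above — where only the SYN's address and port fields are available, and where service-class decisions also govern the connection latency mechanisms that follow — a classifier might look like this. The rule table, rule syntax, and class names are assumptions for illustration, not RLM's actual rule language.

```python
import ipaddress

# Hypothetical rules, checked in order; first match wins.
# (source network, destination port, service class)
RULES = [
    (ipaddress.ip_network("10.1.0.0/16"), 80, "premium"),
    (ipaddress.ip_network("0.0.0.0/0"), 80, "default"),
]

def classify_syn(src_ip, dst_port):
    """Assign a service class using only fields visible in the SYN."""
    addr = ipaddress.ip_address(src_ip)
    for net, port, svc_class in RULES:
        if addr in net and dst_port == port:
            return svc_class
    return None  # no rule matched; e.g. traffic to an unmanaged port
```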
It is especially important to understand its impact on client perceived pageview response times, since a great deal of work on controlling Web server performance has focused on applying admission control to prevent Web server overload by dropping TCP connections. However, the effect of admission control drops on the behavior of Web browsers has not been carefully studied. To determine how load shedding affects the client perceived response time on real Web browsers, we conducted a series of experiments using Microsoft Internet Explorer v6.0 and Firefox v1.02 in which we performed various types of connection rejection by dropping SYNs to emulate an admission control mechanism at the Web server. The end result was that the response time at the browser is greatly affected not only by the number of SYN drops, but also by which connection the SYN drops occur on.

Figure 4.4 depicts the behavior of TCP under server SYN drops. The client sends the initial SYN at t0, but the server drops this connection request due to admission control. The client's TCP implementation waits 3 seconds for a response. If no response is received, the client retransmits the SYN at t0 + 3 s. If that SYN is also dropped, then the next SYN retransmission occurs at t0 + 9 s. The timeout period doubles (3 s, 6 s, 12 s, etc.) until either the connection is established, the client hits stop/refresh on the browser, which cancels the connection, or the maximum number of SYN retries is reached. This is the well-known TCP exponential backoff mechanism we have been discussing throughout this dissertation.

Server SYN drops are not a denial of service, but rather a means of rescheduling the connection into the near future. Although this behavior is effective in shedding server
[Figure 4.4: SYN drops at the server — the SYNs sent at t0 and t0 + 3 are dropped; the SYN sent at t0 + 9 is accepted, the connection is established, and the GET for index.html is received at tA.]

load, it has significant effects on the response time perceived at the remote clients. Existing admission control mechanisms which perform SYN throttling simply ignore this effect and report the response time once the connection is accepted, beginning from time tA. This misrepresents both the client perceived response time and the throttling rate at the Web site.

Because Web browsers open multiple connections to the server, as shown in Figure 4.2, it is important to understand the effect of a SYN drop in the context of which connection is being affected. If only the first SYN on the first connection is dropped, then the client will experience the 3 s retransmission delay, but will still be serviced. If the first connection gets established immediately, but all SYNs on the second connection are dropped, as shown in Figure 4.5, then the client will eventually receive a connection failure after multiple retries.

[Figure 4.5: Second connection in page download fails — the first connection, established at t0, services requests until tx, when the last byte of the last embedded object on that connection arrives; the SYNs for the second connection sent at tz, tz + 3 and tz + 9 are all dropped, and the connection failure is reported at tz + 21.]

While the second connection is undergoing SYN drops at the server, our study shows that Web browsers will display an hourglass cursor on the screen, a spinning busy icon in the corner of the browser, and a progress bar at the bottom of the browser showing ‘in progress’. The browser continues to show these signs that the page is in the process of being downloaded until TCP reports the connection failure to the browser after 21 s, as shown in Figure 4.5.
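The SYN retransmission schedule described above can be computed directly. This sketch reproduces the 3 s initial timeout and the doubling behavior; the number of retries is a parameter, since it depends on the client operating system.

```python
# Sketch of TCP's SYN retransmission schedule as described above:
# the initial timeout is 3 s and doubles after each dropped SYN.
def syn_retransmit_offsets(retries):
    """Seconds after t0 at which each retransmitted SYN is sent."""
    offsets, timeout, elapsed = [], 3, 0
    for _ in range(retries):
        elapsed += timeout
        offsets.append(elapsed)
        timeout *= 2
    return offsets

def connection_timeout(retries):
    """Seconds after t0 at which TCP reports the connection failure:
    the last SYN's timeout must also expire before failure is reported."""
    elapsed, timeout = 0, 3
    for _ in range(retries + 1):
        elapsed += timeout
        timeout *= 2
    return elapsed

# With two retries: retransmissions at t0 + 3 and t0 + 9, and the
# failure reported at t0 + 21, matching Figures 4.4 and 4.5.
```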
Although all objects successfully obtained from the server are obtained over the first connection during the interval t0 through tx, the browser does not indicate the end of the page download until tz + 21, when TCP reports the failure of the second connection to the browser. For the scenario in Figure 4.5, our study of Web browsers indicates that only a partial page download occurs in practice. The browser never retrieves the first object that would have been retrieved on the second connection. The browser retrieves all other objects over the first connection, including those which would have been obtained over the second connection had it been established. Therefore, exactly one embedded object is strictly associated with the failed second connection and is not obtained. If the second connection is eventually established, the embedded object associated with it is then obtained. For example, suppose the SYN transmitted at tz + 9 is accepted by the server, the connection is established, and an object is requested and obtained over that connection. The end of the client perceived response time is then the time at which the last byte of the response for that object is received by the client. A variety of SYN drop combinations can occur across multiple connections, with varying effects on the client perceived response time. If all SYNs on the first connection are dropped, then the client is actually denied access to the server. If both connections are established, each after one or more SYN drops, then the TCP exponential backoff mechanism plays an important role in the latency experienced at the remote browser. This effect becomes more pronounced under HTTP 1.0 without KeepAlive, where each URL request requires its own TCP connection and the retrieval of each embedded object faces the possibility of SYN drops and possible connection failure.
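These combinations can be composed into a toy calculator. Times are measured from when each connection is first attempted, and a failed connection is assumed to resolve 21 s after its first SYN (the two-retry timeout); this is an illustration of the scenarios above, not a browser model:

```python
def syn_delay(drops, initial_timeout=3.0, max_retries=2):
    """Extra setup delay caused by `drops` consecutive server SYN
    drops; None means the connection fails (drops exceed retries)."""
    if drops > max_retries:
        return None
    return sum(initial_timeout * 2 ** k for k in range(drops))

def pageview_end(last_byte, conn_drops):
    """The browser signals completion only once every connection has
    finished or failed; a failure is reported 21 s after the first
    SYN. `last_byte` is the drop-free download time (illustrative)."""
    ends = []
    for drops in conn_drops:
        d = syn_delay(drops)
        ends.append(21.0 if d is None else last_byte + d)
    return max(ends)
```

For the Figure 4.5 scenario (no drops on the first connection, all SYNs on the second dropped), `pageview_end(1.5, [0, 3])` resolves to 21 s even though all delivered bytes arrived by 1.5 s.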
Although the majority of browsers use persistent HTTP, the trend for Web servers is to close a connection after a single URL request is serviced if load is high. Apache Tomcat [153] behaves in this manner when the number of simultaneous connections exceeds 90% of the configured limit, and reduces the idle timeout when it exceeds 66%. This effectively reduces all transactions to HTTP 1.0 without KeepAlive. The maximum number of SYN retries before a connection failure defines the connection timeout and depends on the operating system used by the client Web browser. In most cases, the default number of SYN retries is used. For example, Windows XP defaults to two retries, resulting in a connection timeout after 21 s. As such, RLM uses 21 s in its model as the pageview response time for a Web page request that suffers a connection timeout. Other operating systems allow more SYN retries, so the value we use is conservative: a larger value would increase the effect of connection failure on response time, exaggerating the benefit of the RLM mechanisms used for managing Tconn. On the other hand, if the browser is painting the screen in a piecemeal manner, indicating that progress is being made, clients are more likely to read the pageview as it is slowly displayed. This behavior occurs when SYN drops hit the second connection; in this situation, the pageview response time can exceed 21 s. In all cases, our study of Web browsers indicates that packet drops during connection establishment can have a significant, coarse-grained impact on pageview response time. Because of the TCP exponential backoff mechanism, any SYN drop results in a significant increase in Tconn.
Note that other types of packet drops, occurring once a TCP connection is established, do not have the same coarse-grained effect. For example, if an HTTP GET request is dropped, the client retransmits after the retransmission timeout expires, but this timeout is much smaller than the 3 s initial timeout used during connection establishment.

RLM introduces a fast SYN retransmission technique that can be used to reduce the coarse-grained effect of SYN drops. Figure 4.6 depicts the behavior of this mechanism. After a server SYN drop, RLM retransmits the SYN, on behalf of the remote client, at a shorter interval than the TCP exponential backoff. Since RLM resides within the same complex as the server and is not retransmitting the SYNs over the network, it could at most be considered a locally controlled violation of the TCP protocol.

Figure 4.6: Fast SYN retransmission.

The net effect is that a connection is established as soon as the server is able to accept the request. Since dropping a SYN at the server requires little processing, the overhead of this approach on the server complex is minimal, even when the server is loaded. Nevertheless, the retransmission gap can be adjusted based on the current load or the number of active simultaneous connections. RLM also introduces a fast SYN/ACK retransmission technique that can be used to reduce the coarse-grained effect of SYN/ACK drops. SYN/ACKs dropped in the network cause the same latency effect as a SYN dropped at the server: from the client perspective there is no difference, since in either case no SYN/ACK arrives at the client and the TCP exponential backoff mechanism applies. Figure 4.7 depicts the behavior of the RLM fast SYN/ACK retransmission mechanism.
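To see why the retransmission gap matters, consider when each strategy first reaches a server that sheds load for a short interval. A toy comparison follows; the 1.2 s recovery time and 500 ms gap are illustrative assumptions, not measured values:

```python
def established_at(server_ready, attempt_times):
    """First (re)transmitted SYN that arrives once the server can
    accept connections again; None if every attempt is dropped."""
    for t in attempt_times:
        if t >= server_ready:
            return t
    return None

# Client-driven TCP backoff retransmits at 3 s and 9 s after the
# initial SYN; RLM's fast SYN retransmission re-injects every 500 ms.
tcp_backoff = [0.0, 3.0, 9.0]
fast_syn = [0.5 * k for k in range(60)]
server_ready = 1.2  # illustrative: server sheds load for 1.2 s
```

Under pure TCP backoff the connection completes at 3 s; with fast SYN retransmission it completes at 1.5 s, i.e., essentially as soon as the server is able to accept the request.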
If RLM does not capture an ACK from the client within a timeout much smaller than the TCP exponential backoff, RLM retransmits the SYN/ACK to the client on behalf of the server. Fast SYN/ACK retransmission violates the TCP protocol by performing retransmissions using a shorter retransmission timeout period than the exponential backoff.

Figure 4.7: Fast SYN/ACK retransmission.

One can make several arguments that this is a minor divergence from the protocol. On the other hand, an Internet Web site which uses this technique to improve Tconn can rightly be labeled an unfair participant on the Internet. If deployed, the overhead, either in the network or at the remote client, is minimal. Referring to Figure 4.3, both fast SYN and fast SYN/ACK retransmission can be applied during state transitions 1→2 and 7→8 to reduce the critical path Tconn.

4.4 Transfer Latency Management

Another key type of latency that RLM must manage is the TCP transfer latency Ttransfer, which can become a dominant component of response time when the network connection between the client and the server is the bottleneck. Ttransfer is known to be a function of object size, network round trip time (RTT) and packet loss rate:

Ttransfer = f(size, RTT, loss)    (4.1)

Several analytic models of f(size, RTT, loss) have been developed [121, 38, 145]. Figure 4.8 depicts the transfer latency function defined by Cardwell et al. [38] for realistic Internet conditions of an RTT of 80 ms and a loss rate of 2% [167].

Figure 4.8: Cardwell et al. transfer latency function f for 80 ms RTT and 2% loss rate.
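The shape of such a model can be sketched numerically. The following is a deliberately coarse approximation of f(size, RTT, loss), not the actual Cardwell et al. model: the window doubles each round trip (slow start) until it reaches the steady-state window suggested by the well-known W ≈ sqrt(1.5/p) relation, after which the transfer proceeds near-linearly at roughly W packets per RTT:

```python
import math

def transfer_latency(size_pkts, rtt, loss, init_cwnd=2):
    """Coarse sketch of f(size, RTT, loss): logarithmic (slow start)
    for small objects, near-linear (steady state) for large ones.
    An approximation for intuition only, not a validated model."""
    w_ss = math.sqrt(1.5 / loss)       # steady-state window (packets)
    t, sent, w = 0.0, 0, init_cwnd
    while sent < size_pkts:
        t += rtt                       # one round trip per window
        sent += w
        w = min(2 * w, w_ss)           # exponential growth, capped
    return t
```

For an 80 ms RTT and 2% loss, this sketch reproduces the qualitative shape of Figure 4.8: small objects (under about 10 packets) finish within a few RTTs, while latency grows roughly linearly with size beyond the slow start region.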
The line indicates the expected time (y-axis) to transfer an object of a given size (x-axis). For smaller objects, in this case less than 10 packets in size, the transfer latency is dominated by TCP slow start behavior, the logarithmic part of the graph. For larger objects, the transfer latency is dominated by TCP steady-state behavior, the near-linear part of the graph. Cardwell's function models the expected time required, not the minimum time. The farther a point is from the line, the less likely it is to occur in practice. For example, it is extremely unlikely that an object of 50 packets can be transferred in under 1 s if the RTT is 80 ms and the loss rate is 2%; we label the region below the line as infeasible. The model predicts that under higher loss rates and longer RTTs, reducing object size can cut Ttransfer in half.

Since RTT and loss rate are a function of the end-to-end path from client to server through the Internet, and therefore uncontrollable, RLM is left with varying the response size as a control mechanism for affecting Ttransfer. RLM accomplishes this using two simple techniques:

1. Embedded object rewrite: translate a request for a large image into a request for a smaller image. Capture the HTTP request packet; if the request is for a large image, modify the request packet by overwriting the URL so that it specifies a smaller image, then pass the request on to the server.

2. Embedded object removal: remove references to embedded objects from container pages. Capture the HTTP response packets; if the response is for a container page, modify the response packet by overwriting references to embedded objects with blanks, then pass the response packet on to the client.

Embedded object rewrite retrieves an embedded object, but one of much smaller size than the original, reducing the response size and Ttransfer for that object.
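As a concrete illustration, an in-place URL rewrite must keep the payload length unchanged (so the TCP sequence space stays consistent) and must update the checksum, as discussed below. The sketch here pads the shorter URL with spaces and uses the standard incremental checksum update of RFC 1624; the URL mapping and function names are assumptions of this sketch, not RLM's actual code:

```python
def rewrite_get_request(payload: bytes, mapping: dict) -> bytes:
    """Overwrite the URL in an HTTP GET with a smaller object's URL,
    padding with spaces so the payload length is unchanged. (A real
    implementation would be constrained by the white space actually
    present in the packet; extra spaces before the HTTP version are
    tolerated by most servers but are a simplification.)"""
    line, sep, rest = payload.partition(b"\r\n")
    parts = line.split(b" ")
    if len(parts) == 3 and parts[0] == b"GET":
        small = mapping.get(parts[1])
        if small is not None and len(small) <= len(parts[1]):
            parts[1] = small + b" " * (len(parts[1]) - len(small))
            line = b" ".join(parts)
    return line + sep + rest

def csum16(data: bytes) -> int:
    """Internet one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def incremental_update(old_cksum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 incremental update for one changed 16-bit word:
    HC' = ~(~HC + ~m + m'). Avoids re-summing the whole segment."""
    s = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF
```

The incremental form matters here because RLM forwards modified packets without buffering: only the changed words need to be folded into the existing checksum.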
The tradeoff is that the quality of the content is affected, since the client sees a lower quality image. By modifying the client-to-server HTTP request, RLM can decide on a per-request basis, in the middle of a pageview download, whether or not to change the requested object size. This presumes the existence of smaller objects; for some Web sites, maintaining all or some of their images in two or more sizes may not be possible. The technique can also be applied to dynamic content, where a less computationally expensive CGI is executed in place of the original, or the arguments to the CGI are modified (e.g., a search request has its arguments changed to return at most 25 items instead of 200). Embedded object removal avoids retrieving an embedded object entirely, eliminating Ttransfer for that embedded object. This has a greater latency reduction effect than embedded object rewrite, but may further reduce the quality of the Web content displayed: instead of viewing thumbnail images, the client sees only text. Unlike embedded object rewrite, which can be applied to any image retrieval during the pageview download, the decision whether to blank out embedded objects in the container page can be made at only one point in the pageview download, when the container page is being sent from the server to the client, which is transition 3→4 in Figure 4.3. These content adaptation techniques may also reduce other types of latencies. Embedded object rewrite can reduce Tserver at the server and Trender at the client when the smaller object is also faster to serve and render. Embedded object removal can eliminate Tconn for an embedded object if a new connection is no longer required, and can eliminate Tserver and Trender if the object no longer needs to be served or displayed.
However, the resulting change in pageview response time due to these latencies depends on whether the server delivering embedded objects or the client rendering them is the bottleneck, the latter being unlikely with modern PC clients. RLM adapts content by simply modifying a packet and forwarding the modified version; it does not need to buffer packet content. RLM is not a proxy, and as such must ensure the consistency of the TCP sequence space for each connection. This means that changing the HTTP request/response is constrained by the size and amount of white space in each packet, and the checksum must be recomputed since it changes with the payload.

4.5 Experimental Results

We implemented RLM as a set of kernel modules that can be loaded into an inexpensive, off-the-shelf PC running Linux. Our kernel module approach is based on our work with ksniffer, which demonstrated significant performance scalability benefits from executing within kernel space. We present results using RLM to manage client perceived response times in both single and multiple service class environments using TPC-W [152], a transactional Web e-commerce benchmark which emulates an online book store, running on a three-tier Web architecture.

Figure 4.9: Experimental test bed.

Figure 4.9 shows our experimental test bed. It consists of seven machines: three Web clients, one RLM appliance, and three servers functioning as a three-tier Web architecture. Apache 2.0.55 [64] was installed as the first tier HTTP server and was configured to run up to 1200 server threads using the worker multi-processing module configuration.
Apache Tomcat 5.5.12 [153] was employed as the second tier application server (servlet engine) and was configured to maintain a pool of 1500 to 2000 AJP 1.3 server threads to service requests from the HTTP server, and a pool of 1000 persistent JDBC connections to the database server. MySQL 1.3 [105] was employed as the third tier database (DB) server and used the default configuration, except that max_connections was raised to accommodate the 1000 persistent connections from Tomcat. Each of the three client machines was an IBM IntelliStation M Pro 6868 with a 1 GHz Pentium 3 CPU and 512 MB RAM. The RLM machine was an IBM IntelliStation 6850 with a 1.7 GHz Xeon CPU and 768 MB RAM. The Apache machine was an IBM IntelliStation M Pro 6868 with a 1 GHz Pentium 3 CPU and 1 GB RAM. The Tomcat machine was an IBM IntelliStation M Pro 6849 with a 1.7 GHz Pentium 4 CPU and 768 MB RAM. The MySQL machine was an IBM IntelliStation 6850 with a 1.7 GHz Xeon CPU and 768 MB RAM. All machines ran RedHat Linux, with the DB server running a 2.6.8.1 Linux kernel and the other machines running a 2.4.20 Linux kernel. The machines were connected via 100 Mbps Fast Ethernet Netgear, CentreCOM, and Dell switches. We installed a modified version of the rshaper [139] bandwidth shaping tool on each of the three client machines to emulate wide-area network conditions in terms of transmission latencies and packet loss. We used a popular Java implementation of TPC-W [154] for our workload, but made two important modifications to the client emulated browser (EB) code to make it behave like a real Web browser such as Microsoft Internet Explorer. First, we modified the EB code to use two persistent parallel connections, as shown in Figure 4.2, over which the container object and embedded objects are retrieved.
These connections were not closed by the client but remained open during the client think periods (unless closed by the server). The original EB sent HTTP/1.1 request headers but actually used one connection per GET request, effectively emulating HTTP/1.0 behavior by opening a connection, sending the request, reading the response, and closing the connection. Second, we modified the EB to behave under connection failure as shown in Figure 4.5. We also used IP aliasing so that each individual EB could obtain its own unique IP address. The TPC-W e-commerce application consists of a set of 14 servlets. Each pageview download consists of the container page and a set of embedded images. All container pages are built dynamically by one of the 14 servlets running within Tomcat. First, the servlet performs a DB query to obtain a list of items from one or more DB tables; then the container page is dynamically built to contain that list of items as references to embedded images. After the container page is sent to the client, the client parses it to obtain the list of embedded images, which are then retrieved from Apache. As such, all images are served by the front end Apache server, and all container pages are served by Tomcat and MySQL.

Figure 4.10: 0.3 ms RTT, 0% loss (mean 0.26 s, 81.4th percentile; 95th percentile 1.404 s; 32314 total pages).

4.5.1 Response Time Distribution

We first present measurements running TPC-W under ideal conditions of light load and no network loss or delay; we then add network loss and delay to see the effect this has on the response time distribution. We use 200 clients, which keeps the DB server, the bottleneck resource in our multi-tier complex, at only 60% utilization.
Figure 4.10 shows the response time distribution under ideal network conditions of minimal delay and zero loss, along with the mean and 95th percentile of the client perceived response time and the total number of Web pages served. This scenario is often used for Web server performance benchmarking and QoS experimentation, but is very unrealistic for an Internet Web site. Figure 4.11 shows the response time distribution for the same experiment under 80 ms RTT and zero packet loss. The additional RTT shifts and spreads the distribution to the right as the transfer latency becomes more significant for larger pageviews.

Figure 4.11: 80 ms RTT, 0% loss (mean 0.98 s, 68.5th percentile; 95th percentile 1.90 s; 29307 total pages).

Figure 4.12 shows the response time distribution under more realistic network conditions of 80 ms RTT and 2% network loss in each direction; studies have shown that the packet loss rate within the Internet is roughly 1-3% [167]. The distribution shifts further to the right and a clearly distinguishable spike occurs just after 3 s, attributable to either the first or second connection of the pageview having an initial SYN or SYN/ACK dropped in the network. The response time distribution in Figure 4.12, not the one shown in Figure 4.10, is most likely the shape of the response time distribution for real Web clients. Any approach which claims to manage client perceived response time for Internet Web service ought to be verified under conditions found in the Internet: network latency and loss. Unless otherwise indicated, all of our experiments use the same 80 ms RTT network latency and 2% network loss.

Figure 4.12: 80 ms RTT, 4% loss (mean 1.9 s, 67.4th percentile; 95th percentile 4.452 s; 26601 total pages).
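The size of the spike just after 3 s can be sanity-checked with a simple calculation, assuming independent packet drops and two connections per pageview:

```python
def handshake_delay_fraction(loss_each_way, conns_per_page=2):
    """Assuming independent drops, the fraction of pageviews in which
    at least one connection loses its initial SYN or SYN/ACK and
    therefore incurs the 3 s retransmission timeout. An illustrative
    back-of-the-envelope estimate, not a measured result."""
    clean = (1.0 - loss_each_way) ** 2      # SYN and SYN/ACK arrive
    return 1.0 - clean ** conns_per_page
```

With 2% loss in each direction, roughly 7-8% of pageviews would have a handshake delayed by the initial 3 s timeout, which is consistent with a visible spike in the distribution.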
4.5.2 Managing Connection Latency

Table 4.1 shows how RLM can improve response times by applying fast SYN/ACK retransmission to the same experiment shown in Figure 4.12, with different SYN/ACK retransmission gaps. For each retransmission gap, we report the mean client pageview (PV) response time (RT), the mean PV RT speedup versus Figure 4.12, the 95th percentile PV RT, the percentage of pages downloaded with greater than 3 s response time, and the total number of pageviews. For example, the following RLM rule is used for a 500 ms retransmission gap:

IF IP.SRC == *.*.*.* THEN FAST SYN/ACK GAP 500ms    (4.2)

SYN/ACK gap   mean PV RT   mean speedup   95th % PV RT   >3 s PV RT   total pages
3 s           1.9 s        0%             4.45 s         22.36%       26601
1 s           1.72 s       9.5%           4.22 s         18.17%       27001
500 ms        1.64 s       13.7%          4.09 s         16.05%       27287
10 ms         1.58 s       16.8%          4 s            15.42%       27455

Table 4.1: Fast SYN/ACK retransmission.

Fast SYN/ACK retransmission results in a modest reduction in the mean response time, as much as 16.8% for the smallest retransmission gap of 10 ms. More importantly, the technique results in a much larger reduction in the number of pages with high response times, reducing the number of pages with more than 3 s response time by over 30%. In general, we would not expect this technique to significantly affect the mean response time, but rather to significantly improve those pageviews that experience a network SYN/ACK drop.

4.5.3 Managing Load and Admission Control

Figure 4.13 shows the response time distribution for running TPC-W with 550 clients, causing heavy load. The mean client perceived response time increased to 5 s from the 1.9 s for 200 clients shown in Figure 4.12. No SYN drops are occurring at the server complex; the only SYNs being dropped are those lost in the network, so the percentage of SYN drops is the same for both the light and heavy load experiments shown in Figures 4.12 and 4.13, respectively.
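RLM rules such as Equation 4.2 match clients by dotted IP patterns like *.*.*.* or 10.4.*.*. A minimal sketch of such a pattern matcher follows; this illustrates the rule syntax used in this chapter, not RLM's actual implementation:

```python
def ip_matches(pattern: str, addr: str) -> bool:
    """Match a dotted pattern like '10.4.*.*' against an IPv4
    address; '*' matches any single octet."""
    p, a = pattern.split("."), addr.split(".")
    return len(p) == len(a) and all(x == "*" or x == y
                                    for x, y in zip(p, a))
```

A rule engine would evaluate each arriving SYN's source address against the pattern of every active rule, applying the first rule whose condition holds.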
Bandwidth is at low utilization throughout the entire test bed; the increase in response time is due to increased CPU utilization within the multi-tier complex. In such a scenario, it is usually desirable to apply a load shedding technique to prevent the Web server from overloading, or simply to improve response time by reducing the load. A simple and common load shedding mechanism is to manipulate the Apache MaxClients setting [51, 93, 161]. MaxClients is an upper bound on the number of httpd threads available to service incoming connections; it limits the number of simultaneous connections being serviced by Apache.

Figure 4.13: Unmanaged heavy load (mean 5.01 s, 58th percentile; 95th percentile 11.5 s; 54193 total pages).

Figure 4.14 shows the result of setting MaxClients to 400 for the same workload shown in Figure 4.13. The spike at 5 s in the distribution represents those pageviews which incurred an initial SYN drop, resulting in a 3 s timeout on one of the two EB connections to the server (in addition to the 2 s baseline latency shown in Figure 4.12). The spike at 8 s, barely visible in Figure 4.12 but pronounced in Figure 4.14, represents those pageviews which incurred a 3 s timeout on both connections to the server. The spike at 21 s represents those clients which experienced a connection failure.

Figure 4.14: MaxClients load shedding (mean 7.86 s, 61.8th percentile; 95th percentile 19.61 s; 44087 total pages).

Table 4.2 depicts the results of setting various limits on the number of simultaneous connections served. We instrumented the TPC-W servlets to capture their response time by taking a timestamp when the servlet was called and a timestamp when it returned; this covers the time to build the container page, including the DB query, but does not include the time to connect to the server complex or transmit the response.

MaxClients   mean PV RT   95th % PV RT   Tomcat RT   total pages   server SYN drops
600          5.01 s       11.6 s         3.14 s      54193         0%
500          5.28 s       11.9 s         1.013 s     53038         4.8%
400          7.86 s       19.6 s         0.405 s     44087         18.2%
300          12.3 s       28.5 s         0.155 s     34440         30.7%
200          19.1 s       40.5 s         0.068 s     25894         43.5%

Table 4.2: Load shedding via connection throttling.

The Tomcat response time column of Table 4.2 shows that as the number of simultaneous connections decreases, the mean time to query the DB and create the container page decreases. However, when measured in terms of pageviews perceived by the client, including those pages which experienced the admission control drops, the overall mean pageview response time actually increases. Some clients experience response times that can be considered better than required, while other clients experience significant latencies due to SYN drops. The results demonstrate that this common form of load shedding is ineffective at reducing client perceived response times. Furthermore, the significant effect that SYN drops have on the response time distribution makes providing service level agreements based on meeting a threshold for the 95th percentile impossible to achieve. A common alternative to changing the Apache MaxClients is to perform SYN throttling to control the offered load on the system. We apply this technique in the context of a multi-class QoS environment. Many Web sites demand the ability to support different users with multiple classes of service and response time, such as providing buyers with better performance than casual visitors at an e-commerce site. It is often desirable to maintain a specific response time threshold for a certain class of clients.
Given a finite set of resources under heavy load, high priority clients are expected to receive better response times than if all clients were treated equally; similarly, low priority clients suffer worse response times than if all clients were treated equally. We ran TPC-W with 550 clients as in Figure 4.13, but divided the clients into one-third high priority clients from subnet 10.4.*.* and two-thirds low priority clients from other subnets. We perform typical SYN throttling (i.e., admission control) by dropping SYNs arriving from low priority clients when the high priority clients exceed their response time threshold. We employ RLM using the following rule:

IF IP.SRC != 10.4.*.* AND RT HIGH > 3.0s THEN DROP SYN    (4.3)

Figure 4.15 shows that the mean response time for the 184 high priority clients was roughly 3 s, but at a heavy cost to the 366 low priority clients. The vertical spike at 21 s for the low priority clients indicates the set of connection failures experienced by those clients.

Figure 4.15: Low priority penalties (high priority: 3.11 s mean RT, 8.4 s 95th percentile, 21634 pages; low priority: 9.53 s mean RT, 26.6 s 95th percentile, 28280 pages).

From Figure 4.12 we see that 200 clients alone receive a 2 s mean response time. As such, our 184 high priority clients leave processing capacity to spare for the low priority clients, but the high retransmission penalty significantly affects the response time of the low priority clients. RLM can improve this situation by using fast SYN and SYN/ACK retransmission. Figure 4.16 shows the effect of the following rule:

IF IP.SRC == *.*.*.* THEN FAST SYN + SYN/ACK GAP 500ms
IF IP.SRC != 10.4.*.* AND RT HIGH > 3.0s THEN DROP SYN HALT FAST SYN    (4.4)

Figure 4.16: Improvement from applying fast SYN retransmission (high priority: 3.19 s mean RT, 7.55 s 95th percentile, 21520 pages; low priority: 7.74 s mean RT, 22 s 95th percentile, 29201 pages).

In enforcing this rule, if the mean response time for the high priority clients exceeds 3 s, then incoming SYNs from low priority clients are dropped by RLM and existing low priority fast SYN retransmissions are temporarily halted. The moment RT HIGH < 3.0 s, low priority fast SYN retransmission resumes and new SYNs from low priority clients are passed to the server. Low priority clients thus have their requests processed without waiting the full TCP retransmission timeout periods. This led to a 23% improvement for low priority clients, while maintaining essentially the same response time and throughput for the high priority clients. Most importantly, the spike at 21 s in Figure 4.15 is gone, with the removal of the large number of connection failures experienced by the low priority clients. Figure 4.17 depicts an alternative distribution to Figure 4.16, based on the same rule with the addition that all initial SYNs on the first connection of the pageview from low priority clients are dropped, with a fast SYN retransmitted 3 s later and every 500 ms thereafter. By dropping all initial SYNs for low priority clients we effectively increased their think time, the interarrival time between container page requests.

Figure 4.17: Widening the think time gap (high priority: 2.65 s mean RT, 6.28 s 95th percentile, 22472 pages; low priority: 7.51 s mean RT, 18.1 s 95th percentile, 29902 pages).

This shifts the distribution to the right and has the effect of improving the 95th percentile of both service classes. Note that some low priority pages are served very fast; these pages reuse the TCP connection for the container page and are hence unaffected by SYN manipulation.
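The admission decision in Equation 4.4 can be summarized as a small feedback gate. The class, subnet, and return values below are assumptions chosen to mirror the rule, not RLM's actual interface:

```python
class PriorityGate:
    """Sketch of the Equation 4.4 policy: while the high priority
    class's mean RT exceeds the threshold, drop SYNs from low
    priority subnets and halt their fast SYN retransmissions;
    resume the moment the threshold is met."""
    def __init__(self, high_subnet="10.4", threshold=3.0):
        self.high_subnet = high_subnet
        self.threshold = threshold
        self.rt_high = 0.0           # updated by the RT monitor

    def on_syn(self, src_ip):
        high = src_ip.startswith(self.high_subnet + ".")
        if high or self.rt_high <= self.threshold:
            return "PASS"            # admit; fast SYN retx active
        return "DROP_HALT_FAST_SYN"  # shed low priority load
```

Because the gate is purely a function of the current RT measurement, low priority SYNs begin passing again the instant RT HIGH falls back under 3.0 s, just as described above.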
Reducing the arrival rate of low priority clients resulted in reduced load on the MySQL server, which in turn reduced server time and client response time. Application of this technique warrants further investigation: over-use could lead to livelock [102] or create more work for the system, offsetting the benefits of reducing the drop latencies (although, in an e-commerce environment, the front-end Web server is not the bottleneck). We have not observed any negative effects on the TCP endpoints (client or server) with respect to their ability to measure RTT and enforce their TCP timeouts and retransmissions.

Figure 4.18: Embedded object removal (mean client perceived RT vs. number of embedded objects retrieved, for 160 ms, 220 ms, and 300 ms RTT).

4.5.4 Managing Transfer Latency

We now consider managing transfer latency under conditions of variable and larger RTTs. We ran TPC-W with 200 clients so that the multi-tier complex was under modest load, and modified our environment by splitting the clients into three groups: one with 160 ms RTT, another with 220 ms RTT, and the third with 300 ms RTT. The resulting mean client perceived response times were roughly 3 s, 4 s, and 5 s, respectively. In this environment, nothing in the server complex is overloaded and no server-side SYN drops occur; as such, load shedding performed at the server will not improve response times. Figure 4.18 shows the impact of embedded object removal on mean response time as a function of the number of embedded objects removed and retrieved. In this graph, the x-axis represents the number of embedded objects retrieved and therefore not removed. For example, when x is 3, the first 3 embedded objects are retrieved while the rest are removed from the container page. No page had more than 11 objects, so the rightmost points in Figure 4.18 correspond to full downloads. The leftmost points correspond to all embedded objects being removed, reducing the mean response time to near 1 s, using the following RLM rule:

IF IP.SRC == *.*.*.* THEN REMOVE EMBEDS    (4.5)

Without embedded object removal, the difference in RTT separates the clients into three service classes when only one service class is desired. Figure 4.18 shows that different numbers of embedded objects can be removed for clients with different RTTs to provide similar response times for all clients. For example, all clients can be given the same mean response time of 3 s by doing full downloads for the 160 ms RTT clients, allowing the 220 ms RTT clients to download only two embedded objects, and having half the 300 ms RTT clients download one object (receiving 2 s response time) while the other half download two objects (receiving 4 s response time). The mix of different numbers of embedded objects for the 300 ms RTT clients is due to the discrete nature of the technique: either an object is obtained or it is not. Although discrete, embedded object removal has the advantage over embedded object rewrite that multiple copies of the same object need not be maintained, an issue if disk space is limited or charged by use. Note the large jump in Figure 4.18 when the second embedded object is downloaded. While this is partly due to the overhead of opening the second connection, the key reason is that for roughly 18% of the pageviews in TPC-W, the second embedded object is a large 256 KB image. Since it is downloaded as the first object on the second connection, it incurs not only Tconn but also TCP slow start. By configuring RLM to
remove only the second image, the response time dropped to 2.19 s for the 160 ms RTT clients, 2.85 s for the 220 ms RTT clients, and 3.68 s for the 300 ms RTT clients. The curves in Figure 4.18 are relatively flat after the second object, for two reasons: as more objects are downloaded on the same persistent connection the TCP window size increases, and fewer pages have a larger number of embedded objects. Roughly 75% of the pages contain 9 objects, and 18% of the TPC-W pages contain 10 or more. If the typical e-commerce Web site has a similar shape, then removal/rewrite could be applied in this manner, bottom to top, for the pages with the most items, so that fewer clients are affected. Figure 4.19 shows the corresponding results for embedded object rewrite. Embedded object removal is more effective at reducing response time than embedded object rewrite, but the effect is coarse-grained. The removal of the references to embedded objects must occur during the transition 3→4 in Figure 4.3, essentially eliminating states 6 through 18. In contrast, embedded object rewrite can be applied at a finer granularity as different parts of the page are downloaded. To reduce response time, one would like to apply a rule such as:

IF RT > 2s THEN REWRITE EMBEDS (4.6)

However, this may not be effective. Referring back to Figure 4.3, it is at node 5 that the browser obtains the list of embedded objects to retrieve. In our current scenario, this is after the server, which is under light load, returns the container page. At this point, the elapsed response time is relatively short, less than 1 s. It is at this moment that the EB opens the second connection and may request the large image which greatly increases the response time. Even if we decide after 1 s to rewrite the remaining objects, the time
[Figure 4.19: Embedded object rewrite. Mean client perceived response time (sec) vs. number of full-sized objects, for 160 ms, 220 ms and 300 ms RTT clients.]

required to finish downloading the large image will extend the pageview response time beyond 1 s. Indeed, the result we obtain is a response time of 2.8 s, 3.52 s and 4.55 s for the 160 ms, 220 ms and 300 ms RTT clients, respectively; this matches the download of two full-size images in Figure 4.19. This indicates the need to predict the latency contribution that a request will make to the overall pageview response time. Figure 4.20 shows the results of rewriting embedded objects if the predicted Ttransfer for that object would cause the pageview to exceed the specified elapsed time. RLM keeps an average of the Ttransfer for image downloads under different size, RTT and loss groupings, comparable to Equation 4.1. Note this does not include Trender, and as such is an underestimate of the latency associated with the image. For an RTT of 160 ms, 220 ms and 300 ms the large image is predicted to have a Ttransfer value of 6.17 s, 8.11 s and 11.05 s, respectively. Notice that at a 1 s threshold Figure 4.20 matches the response time seen in Figure 4.19 when rewriting all images - hence correctly removing the large image for the short threshold.

[Figure 4.20: Applying predicted elapsed time. Mean client perceived response time (sec) vs. predicted elapsed RT threshold (sec), for 160 ms, 220 ms and 300 ms RTT clients.]

As the threshold increases, the predictor allows the large image to be included in the pageview when it is no longer a factor in achieving the response time goal. This is seen at 7 s, 9 s and 12 s for the 160 ms, 220 ms and 300 ms clients. This technique tends to shorten the tail of the response time distribution by removing embedded objects which take longer to download.
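The rewrite decision described above can be sketched as follows. This is a minimal illustration, not RLM's actual code; the helper name and parameters are hypothetical, and the predicted transfer time stands in for RLM's table of mean Ttransfer values keyed by size, RTT and loss groupings.

```python
# Sketch: keep an embedded object at full size only if its predicted
# T_transfer would not push the pageview past the elapsed-time threshold.
# (Hypothetical helper; RLM's implementation works at the packet level.)
def should_rewrite(elapsed_s, predicted_transfer_s, threshold_s):
    """True if the object should be rewritten to its small variant."""
    return elapsed_s + predicted_transfer_s > threshold_s
```

For example, with the large TPC-W image at 160 ms RTT (predicted Ttransfer of 6.17 s) and 0.8 s already elapsed, a 2 s goal triggers a rewrite while a 12 s goal allows the full-size download.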
Figure 4.21 shows the effect of embedded object removal compared to full page downloads as the number of clients increases.

[Figure 4.21: Full vs. Empty, mean pageview response time (sec, 220 ms RTT) vs. number of clients, for full downloads, the container portion, and all embeds removed.]

Once the number of clients reaches 550, the time to download full pages and pages with no embedded objects is the same. At this point, when full pages are being downloaded the 550 clients spend half their time at the DB server obtaining the container page and half their time at Apache retrieving the embedded objects, meaning the DB server is serving roughly 275 requests simultaneously. When the embedded images are removed from the container page, the client spends no time at Apache and all its time at the DB server, effectively doubling the load on the DB server, so that it is serving roughly 550 requests simultaneously. The extra load on the DB server causes longer delays in serving the container page. As a reference point, the dotted line in Figure 4.21 shows the split between the time spent on the container page (below the line) and the embedded objects (above the line). The time required to obtain the embedded objects from Apache increases the time between DB requests to the MySQL server. As such, the complex could return a full pageview or an empty pageview at the same response time. Note that the think time in both cases is the same. This clearly shows that attempting to manage one portion of the response time instead of the overall pageview response time may not be effective.
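The per-RTT-class equalization example earlier in this section (full downloads at 160 ms, two objects at 220 ms, one object at 300 ms, to equalize mean response time near 3 s) can be sketched as a simple truncation policy. The names and the limit table are illustrative assumptions, not RLM's actual configuration format.

```python
# Sketch: per-RTT-class embedded-object limits (illustrative values
# taken from the 3 s equalization example; None means full download).
OBJECT_LIMITS = {
    160: None,  # 160 ms RTT clients get full downloads
    220: 2,     # 220 ms RTT clients retrieve only the first 2 objects
    300: 1,     # 300 ms RTT clients retrieve only the first object
}

def truncate_embeds(embed_refs, rtt_class_ms):
    """Return the embedded-object references to keep in the container page."""
    limit = OBJECT_LIMITS.get(rtt_class_ms)
    if limit is None:
        return list(embed_refs)
    return list(embed_refs)[:limit]
```

Because the choice is per object ("either an object is obtained or it is not"), intermediate mean response times for a class require mixing limits across its clients, as described above for the 300 ms group.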
4.6 Theoretical Analysis

As mentioned throughout this dissertation, a key problem is that the retransmission timeout values for SYN drops are so large that they have a disproportionate effect on client response time, yet are ignored by existing admission control mechanisms. In this section we look at this problem from a theoretical perspective, with the goal of identifying any points of interest, such as an optimal response time minimum. To simplify matters we consider the situation where the client is downloading a container page with no embedded objects over a single connection, such as when a client downloads a Postscript or PDF file. In Section 2.2 we presented Equation 2.4 defining CLIENT_RT_i, the mean client perceived response time for the transactions completing during the ith interval. For simplicity we assume k = 2, which means Equation 2.4 resolves to:

CLIENT_RT_i = ( Σ SYN-to-END + 21 R_i^3 + 9 C_i^2 + 3 C_i^1 ) / ( COMPLETED_i + R_i^3 ) + RTT   (4.7)

where the sum is taken over the individual SYN-to-END times of the transactions completing in the ith interval. Intuition and general experience suggest that the SYN-to-END time of a transaction depends on the current load in the system: as the number of currently active transactions in the system increases, the resource share allotted to each individual transaction decreases and the per-transaction service time increases (the general processor sharing model). Assume the variance of SYN-to-END is small, such that all transactions accepted in the ith interval complete in the (i + SYN-to-END)th interval. Let SYN-to-END = f(µ) be the mean SYN-to-END as a function of the given acceptance rate µ. We modify Equation 4.7 to be:
CLIENT_RT_i = ( COMPLETED_i · SYN-to-END + 21 R_i^3 + 9 C_i^2 + 3 C_i^1 ) / ( COMPLETED_i + R_i^3 ) + RTT   (4.8)

Equation 2.3 allows us to substitute Σ_{j=0}^{k} C_i^j for COMPLETED_i and rewrite Equation 4.8 as:

CLIENT_RT_i = ( (C_i^2 + C_i^1 + C_i^0) · SYN-to-END + 21 R_i^3 + 9 C_i^2 + 3 C_i^1 ) / ( R_i^3 + C_i^2 + C_i^1 + C_i^0 ) + RTT   (4.9)

which resolves to:

CLIENT_RT_i = ( 21 R_i^3 + C_i^2 (SYN-to-END + 9) + C_i^1 (SYN-to-END + 3) + C_i^0 · SYN-to-END ) / ( R_i^3 + C_i^2 + C_i^1 + C_i^0 ) + RTT   (4.10)

In order to make our optimization claims we first re-formulate Equation 4.10 under a steady-state model. Equation 2.12 gives us:

C_i^j = A_{i−SYN-to-END}^j = R_{i−SYN-to-END}^j − [DR_{i−SYN-to-END} · R_{i−SYN-to-END}^j]   (4.11)

Substituting Equation 4.11 into Equation 4.10 gives us (writing S for SYN-to-END):

CLIENT_RT_i = ( 21 R_i^3
  + (R_{i−S}^2 − [DR_{i−S} · R_{i−S}^2])(S + 9)
  + (R_{i−S}^1 − [DR_{i−S} · R_{i−S}^1])(S + 3)
  + (R_{i−S}^0 − [DR_{i−S} · R_{i−S}^0]) S )
  / ( R_i^3
  + R_{i−S}^2 − [DR_{i−S} · R_{i−S}^2]
  + R_{i−S}^1 − [DR_{i−S} · R_{i−S}^1]
  + R_{i−S}^0 − [DR_{i−S} · R_{i−S}^0] )
  + RTT   (4.12)

which presents CLIENT_RT_i in terms of DR_i, R_i^j, SYN-to-END and RTT. Since we are able to specify R^j in terms of R^0 we are able to create a steady state flow model, depicted in Figure 4.22.

[Figure 4.22: Steady state flow model. Arrivals at rate λ0 pass through drop states with probabilities p0, p1, p2 and retransmission delays of 3 s, 6 s and 12 s, ending in the served state S (latency SYN-to-END) or the give-up state G.]

Let λ0 be the arrival rate of R^0 and p_j be the drop probability for R^j. State S represents those requests which get served and state G represents those requests which do not get served (give-ups). Each node is annotated with its associated latency.
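Equation 4.10 is just a weighted average over the outcome classes, which a few lines of code make concrete. This is an illustrative sketch with hypothetical names: r3 counts the give-ups (21 s each), and c2, c1, c0 count transactions completing after two, one, or zero SYN drops.

```python
# Sketch of Equation 4.10: interval-i mean response time as a weighted
# average over give-ups (r3, at 21 s each) and transactions completing
# after 2, 1, or 0 SYN drops (c2, c1, c0). Counts for the first four
# arguments; syn_to_end and rtt in seconds.
def client_rt(r3, c2, c1, c0, syn_to_end, rtt):
    num = (21 * r3 + c2 * (syn_to_end + 9)
           + c1 * (syn_to_end + 3) + c0 * syn_to_end)
    return num / (r3 + c2 + c1 + c0) + rtt
```

With no drops at all (only c0 nonzero) this reduces to SYN-to-END + RTT; with only give-ups it reduces to 21 + RTT.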
The steady state flow model has allowed us to remove the concept of an ith interval, making the analysis tractable by removing the explicit interdependencies between intervals from the equation. Let λj be:

λ1 = (1 − p0)λ0
λ2 = p0(1 − p1)λ0
λ3 = p0 p1 (1 − p2)λ0
λ4 = p0 p1 p2 λ0

It should be clear that just as we are able to write λj in terms of pj, we are able to write pj in terms of λj:

p0 = 1 − λ1/λ0
p1 = 1 − λ2/(p0 λ0)
p2 = 1 − λ3/(p0 p1 λ0)

We'll tend to write λj in terms of pj in the following since it results in easier-to-understand formulas. We define µ as the acceptance rate into state S in Figure 4.22. We can now rewrite Equation 4.10 as:

CLIENT_RT = RTT + ( λ1 · SYN-to-END + λ2 (SYN-to-END + 3) + λ3 (SYN-to-END + 9) + 21 λ4 ) / λ0   (4.13)

subject to the constraints:

λ0 = λ1 + λ2 + λ3 + λ4
µ = λ1 + λ2 + λ3
λj ≥ 0
µ ≥ 0

The first question we answer is how we should control λj to minimize the overall mean response time. In other words, does it affect mean response time if we accept or drop a SYN based on whether it is an initial SYN, 1st retry, 2nd retry, etc.? Put another way, is it possible to minimize Equation 4.13 for an arbitrary value of SYN-to-END? We can view Equation 4.13 as:

CLIENT_RT = ( w1 λ1 + w2 λ2 + w3 λ3 + w4 λ4 ) / ( λ1 + λ2 + λ3 + λ4 ) + RTT   (4.14)

where the wj are cost weights applied to each λj:

w1 = SYN-to-END
w2 = SYN-to-END + 3
w3 = SYN-to-END + 9
w4 = 21   (4.15)

Minimizing Equation 4.13 can now be stated as determining values for λj, 1 ≤ j ≤ 4, such that the cost of

w1 λ1 + w2 λ2 + w3 λ3 + w4 λ4   (4.16)

is minimal, under the same constraints as Equation 4.13. It should be clear that the minimal solution will choose λj to be largest when wj is smallest (and vice versa).
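The minimization argument can be sketched directly: pour the serviceable portion of the arrival rate into the cheapest-weight class, and divert the rest to the give-up class. This is an illustrative sketch (hypothetical names), not a general linear-programming solver; it encodes the two cases derived below.

```python
# Sketch: allocate arrivals lam0 across the four classes to minimize
# Equation 4.16, given acceptance rate mu and weights from Equation
# 4.15. When SYN-to-END > 21 s, denial (w4) is cheaper than service.
def optimal_lambdas(lam0, mu, syn_to_end):
    if 21 < syn_to_end:                        # denial cheaper than service
        return {1: 0.0, 2: 0.0, 3: 0.0, 4: float(lam0)}
    served = float(min(lam0, mu))              # accept on the initial SYN
    return {1: served, 2: 0.0, 3: 0.0, 4: lam0 - served}

def mean_rt(lam, syn_to_end, rtt):
    # Equation 4.14: weighted average of the per-class latencies.
    w = {1: syn_to_end, 2: syn_to_end + 3, 3: syn_to_end + 9, 4: 21.0}
    return sum(w[j] * lam[j] for j in lam) / sum(lam.values()) + rtt
```

Note that λ2 and λ3 are always zero in the optimum: delaying an acceptance to a retry only adds retransmission latency.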
In most cases the minimum mean response time will be achieved when

λ1 > λ2 > λ3 > λ4   (4.17)

since in most cases

w1 < w2 < w3 < w4   (4.18)

The optimal algorithm for accepting SYNs is therefore to always accept as many initial SYNs as possible, then as many 1st retries as possible, then as many 2nd retries as possible, etc. The exception to this rule is when the ordering of the weights in Equation 4.18 changes, which can happen when SYN-to-END gets large. Since

w1 < w2 < w3   (4.19)

always holds and w4 is a constant, the only possible orderings are:

w1 < w2 < w3 < w4
w1 < w2 < w4 < w3
w1 < w4 < w2 < w3
w4 < w1 < w2 < w3   (4.20)

Assuming that it is better to serve a request at 21 s rather than deny it, the ordering of the wj in terms of SYN-to-END is:

if SYN-to-END ≤ 12 then SYN-to-END < SYN-to-END + 3 < SYN-to-END + 9 ≤ 21
if 12 < SYN-to-END ≤ 18 then SYN-to-END < SYN-to-END + 3 ≤ 21 < SYN-to-END + 9
if 18 < SYN-to-END ≤ 21 then SYN-to-END ≤ 21 < SYN-to-END + 3 < SYN-to-END + 9
if 21 < SYN-to-END then 21 < SYN-to-END < SYN-to-END + 3 < SYN-to-END + 9   (4.21)

Equation 4.21 simply states that as SYN-to-END increases, at some point the client will perceive a shorter response time if denied service rather than being provided with a response. This ordering result is general and independent of f(µ), which determines the value of SYN-to-END. This implies the following two cases:

Case 1: if 21 < SYN-to-END

λ4 = λ0
λ1 = λ2 = λ3 = 0
p0 = p1 = p2 = 1
CLIENT_RT = 21   (4.22)

Case 2: if SYN-to-END ≤ 21

λ1 = µ
λ4 = λ0 − µ
λ2 = λ3 = 0
p0 = (λ0 − µ)/λ0
p1 = p2 = 1
CLIENT_RT = ( w1 λ1 + w4 λ4 ) / ( λ1 + λ4 )   (4.23)

Although Equation 4.21 has four orderings to consider, the constraint that λ1 + λ2 + λ3 = µ and the fact that we are minimizing force us to merge the first three orderings into case 2.
If λ0 ≤ µ then all requests get accepted on the initial SYN; if λ0 > µ then there will always be λ0 − µ transactions not getting any service. Of the µ receiving service, it is always optimal to accept them on the initial SYN, unless w4 < w1 (Case 1). Since w1 < w2 < w3, dropping some portion of λ1 only to accept it later as λ2 or λ3 merely adds latency to the response time. In summary, this result states that, in the general case, to minimize response time one should either accept a request on the initial SYN or completely deny the request by dropping all subsequent SYN retransmissions. However, our experimental results under realistic workload conditions show a benefit in not following this policy. Under realistic workload conditions, bursts in the arrival rate are often followed by a period t∆ of low λ0, eliminating the need to continually drop incoming requests during t∆. In other words, the additional work imposed by the burst can be serviced after the burst, when few other requests arrive. The second question we answer is for what functions f(µ) it is possible to optimize the overall mean response time. In other words, can we reduce the acceptance rate such that the resulting decrease in SYN-to-END more than offsets the latencies associated with SYN drops? Since µ = λ1 + λ2 + λ3, the only way to reduce the acceptance rate µ is to increase λ4, the number of requests which do not receive service. We now define µ0 to be the service rate at state S. Whereas µ is the acceptance rate (the rate at which requests enter state S), µ0 is the rate at which the accepted requests are serviced. By definition, µ ≤ µ0 always holds. From the result given in Equation 4.23 in case 2, we have:

CLIENT_RT = ( w1 λ1 + w4 λ4 ) / ( λ1 + λ4 ), with λ1 = µ, λ4 = λ0 − µ   (4.24)

Assume an M/G/1 queuing model for state S.
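As a numeric sanity check on this model, a brute-force sweep of the acceptance rate µ locates the trade-off between queuing delay and the 21 s drop penalty. This is an illustrative sketch assuming the 1/(µ0 − µ) service delay used below and µ0 = λ0 = 1000 requests/s, the values used later in Figures 4.23 and 4.24.

```python
# Sketch: sweep the acceptance rate mu and find the response-time
# minimum by brute force, assuming service delay 1/(mu0 - mu) and a
# 21 s give-up penalty for the lam0 - mu denied requests.
def steady_state_rt(mu, mu0=1000.0, lam0=1000.0):
    served_delay = mu / (mu0 - mu)     # accepted requests, 1/(mu0-mu) each
    denied_delay = 21.0 * (lam0 - mu)  # denied requests give up at 21 s
    return (served_delay + denied_delay) / lam0

# Sweep mu in steps of 0.1 over (0, 1000).
best_mu = min((mu / 10.0 for mu in range(1, 10000)), key=steady_state_rt)
```

The sweep lands on µ ≈ 993.1 with a mean response time of about 0.29 s, agreeing with the closed-form optimum derived next.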
The service delay for the M/G/1 model is:

SYN-to-END = 1 / (µ0 − µ)   (4.25)

By substituting Equation 4.25 into Equation 4.24 we get:

CLIENT_RT = ( µ/(µ0 − µ) + 21(λ0 − µ) ) / ( µ + (λ0 − µ) )   (4.26)

which is equivalent to:

CLIENT_RT = 21 − 21µ0/λ0 − 1/λ0 + (1/λ0)[ µ0/(µ0 − µ) + 21(µ0 − µ) ]   (4.27)

To minimize Equation 4.27, note that the only variable in the equation that is not a steady-state constant is µ, the acceptance rate. Therefore, minimizing the last term minimizes the equation: we must determine an acceptance rate µ which minimizes the response time. It is well known that a + b ≥ 2√(ab), with equality iff a = b. If we let

a = µ0/(µ0 − µ)
b = 21(µ0 − µ)   (4.28)

then

µ0/(µ0 − µ) + 21(µ0 − µ) = 2√(21µ0)   (4.29)

iff µ0/(µ0 − µ) = 21(µ0 − µ) = √(21µ0). By solving for µ we obtain µ_opt = µ0 − √(µ0/21) as the acceptance rate which minimizes the response time for a given capacity µ0, independent of the offered load λ0. Substituting √(21µ0) for both µ0/(µ0 − µ) and 21(µ0 − µ) in Equation 4.27, we obtain the minimum possible response time:

CLIENT_RT = 21 − 21µ0/λ0 − 1/λ0 + (2/λ0)√(21µ0)   (4.30)

The other result is µ = µ0 − 1/21, the point at which the accept rate causes the response time to exceed 21 s. This suggests that when the accept rate reaches 95.2% of the capacity of the system, it is better to simply drop all requests than to provide a response whose latency will be > 21 s. The optimal drop rate is then:

p0 = 1 − µ_opt/λ0, subject to p0 ≥ 0   (4.31)

[Figure 4.23: Equation 4.27 for µ0 = λ0 = 1000; response time (sec) vs. acceptance rate µ.]

which is zero when λ0 ≤ µ_opt. This implies that as long as λ0 is greater than the optimal accept rate, the minimum CLIENT_RT can only be achieved through the use of SYN drops. This leaves us with three ranges for setting the drop rate p0:
1. If 0 ≤ λ0 ≤ µ_opt then p0 = 0; λ0 = µ and λ0 < µ0.
2. If µ_opt ≤ λ0 ≤ µ0 − 1/21 then p0 = 1 − µ_opt/λ0.
3. If µ0 − 1/21 < λ0 then p0 = 1.

Figure 4.23 is a graph of Equation 4.27 and Figure 4.24 is the same graph, zoomed in on the minimum point. Figure 4.23 shows that for values of µ << λ0 (or µ << µ0) the drop penalty significantly affects the response time. As µ increases, the drops decrease and the response time decreases. The dotted portion of the curve, to the right of µ_opt, increases sharply; here the latency associated with queuing delay dominates the SYN drop latencies. Figure 4.24 shows the minimum point at CLIENT_RT = 0.29 s

[Figure 4.24: Equation 4.27 for µ0 = λ0 = 1000, zoomed in on the minimum at µ = µ0 − √(µ0/21) = 993.099.]

when µ = 993.099, which is when the acceptance rate is at 99.3% of system capacity. The drop rate at the minimum is p0 = 0.006901. As a comparison, Figure 4.25 shows the M/G/1 service latency, 1/(µ0 − µ), overlaid onto Figure 4.24 as a dashed line. The M/G/1 curve is monotonically increasing with no minimum point, short of servicing only a single request. Suppose the M/G/1 curve were used to determine when to drop or accept requests. If a 1 s threshold were selected, then indeed the mean response time would be 1 s, including the overhead of SYN drop latencies. Yet, for the same arrival rate and system capacity, the optimal mean response time is 0.29 s, 3.4 times less than the 1 s threshold. Likewise, if the system were throttled at 90% capacity, the resulting response time would be 3.8 s, 13 times larger than the optimum. An alternative view of the RLM/server complex can be based on capacity (a fixed buffer), as shown in Figure 4.26. Let C be the maximum number of requests the server complex can process concurrently. When a request is received by the server it is allocated
[Figure 4.25: M/G/1 service latency 1/(µ0 − µ) overlaid onto Figure 4.24.]

one of C slots. Let µ0 be the service rate, with 1/(µ0 − µ) being the service latency. Suppose that the server complex is currently servicing C − 1 requests. From this state the server can either enter state C, if a request arrives before a request completes, or state C − 2, if a request completes before a request arrives. This implies that if the goal is to keep the server fully loaded in state C, then RLM should send requests to the server complex at a rate of µ0, or every 1/µ0 seconds, once the server is full. Alternatively, RLM could transmit a request whenever it detects an available slot at the server (assuming RLM knows the value of C). Consider the requests waiting at RLM to be sent on to the server at the rate of µ0. If λ0 > µ0 then some requests will not be serviced. In this case, the policy which minimizes mean response time for the client is a LIFO service policy at RLM. In other words, RLM should pass on to the server the requests which have been waiting the shortest amount of time; since there will be λ0 − µ0 requests denied service, the best choice for those requests are the requests which currently have waited the longest. So the top of the stack is popped and sent on to the server, while the bottom of the stack is dequeued as clients are denied service.

[Figure 4.26: Capacity model. RLM maintains a stack of waiting requests in front of the server's C slots; requests are forwarded from the top while frustrated clients are shed from the bottom.]

This is consistent with the flow model we presented, which suggested that a request either be accepted or completely denied service. Here, we are simply saying that as λ0 varies over time, the LIFO stack at RLM stores requests which can be serviced when λ0 again falls below µ0.
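The LIFO stack with bottom shedding can be sketched as follows. This is an illustrative structure with hypothetical names, not RLM's implementation: arrivals push onto the top, the server is fed from the top, and load is shed from the bottom, where requests have waited the longest.

```python
from collections import deque

# Sketch of the LIFO admission stack described above: forward the most
# recently arrived request; deny service to the oldest when the stack
# depth bound is exceeded.
class LifoAdmission:
    def __init__(self, max_depth):
        self.stack = deque()
        self.max_depth = max_depth

    def arrive(self, request):
        """Push a new arrival; return any requests denied service."""
        self.stack.append(request)
        dropped = []
        while len(self.stack) > self.max_depth:
            dropped.append(self.stack.popleft())  # oldest, at the bottom
        return dropped

    def dispatch(self):
        """Pop the most recent arrival when a server slot frees up."""
        return self.stack.pop() if self.stack else None
```

Dispatching at rate µ0 while shedding from the bottom matches the flow-model result: a request is either served promptly or denied outright, rather than queued into a long wait.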
This is the basic idea behind fast SYN retransmission. The analysis presented in this section complements our experimental results, which were performed using actual Web servers under realistic workloads, conditions which are difficult to analyze. Although the analysis covers a limited set of conditions, it provides a theoretical framework and insight into how such systems behave. On the Internet, for example, the arrival rate λ0 and service capacity µ0 are not constant, are difficult to measure, and depend on numerous factors. Likewise, the default accept queue throttling mechanism within Linux is not a strict M/G/1 queue with a bounded buffer length, but rather resembles a variant of random early detection. Although the analysis suggests a 'persistent drop' approach to load shedding, we found through experimentation with actual Web servers that the variance in the arrival rate λ0 allowed us to effectively apply our techniques of fast SYN + SYN/ACK retransmission to obtain the benefit demonstrated in the prior section. This can be seen if we consider the server complex as changing from one steady state to the next over time: as λ0 and µ0 change over time, so does the optimal drop rate p0. Changes in p0 at time t0 will affect p1 at time t3 and p2 at time t9, allowing us to accept SYNs which had been dropped during a prior time interval.

4.7 Alternative Approaches

If we allow modifications to the Web server then alternative approaches arise. Some or all of the functionality of RLM could be moved into the Web server complex; alternatives arise based on which functions are moved from RLM into the server. Since tracking client perceived response time requires tracking activity at the packet level, this functionality as a whole would most likely remain a single unit, either in RLM or moved to the server.
Therefore, one option would be to load all the RLM kernel modules into the Web server kernel and pass response time information up to the Web server executing in user space. The Web server could then make control decisions and/or adjustments based on the response time. For example, with enough coordination between kernel and user space, the technique of embedded object rewrite/removal could be performed by the Web server or application server, while fast SYN + SYN/ACK retransmission would be performed by the RLM modules executing in kernel space. In addition, the RLM modules could combine fast SYN + SYN/ACK with other connection management techniques, such as manipulating the accept queue and SYN hash table. RLM could also share the learned embedded object patterns with the Web server (or vice versa) so that the Web server could perform pageview management within user space. The drawback to such an approach is that the Web server would now incur the additional overhead of performing RLM duties, but in an e-commerce environment where the front-end system is underutilized this is not a problem. If the server complex consists of several front-end systems, then placing RLM functionality in each of them loses the single point of control. Another option would be to keep measurement functionality within RLM and move the management and control functions into the server complex. In such a scenario, RLM would be used only as a measurement device, reporting per-pageview response times to the server complex, which makes adjustments based on the client perceived response time. One drawback to this approach is that the ability to manage TCP connection establishment latencies would be lost, unless the server was loaded with kernel modules which performed this RLM functionality.
Likewise, if RLM reported per-pageview response times only after a page has completed downloading, the ability to make adjustments for the client in real-time, while the page is in the process of being downloaded, is lost. By splitting RLM functionality across two machines, RLM and the server, we lose the tight feedback loop between measurement and management which allows us to affect the remote client perceived response time in real-time, in an online manner.

4.8 Summary

Remote Latency-based Management (RLM) is a new approach for managing the client perceived response time of a Web services infrastructure using only server-side techniques. RLM tracks each pageview download as it occurs and makes control decisions at each key juncture in the context of that specific pageview download. RLM introduces fast SYN and SYN/ACK retransmission mechanisms and uses content adaptation in the form of embedded object removal and rewrite. These techniques can be applied based on control decisions made during the pageview download. We have implemented RLM as a stand-alone Linux PC appliance that simply sits in front of a Web server complex and manages response time, without any changes to existing Web clients, servers, or applications. Our results demonstrate the importance of measuring client perceived pageview response time in managing Web server performance, the limitations of existing admission control techniques, and the benefits of RLM's mechanisms for controlling response time to manage an overloaded Web server complex and to mitigate the negative impact of network latencies and loss. RLM provides a starting point for future work in developing a comprehensive management scheme that can provide a range of policies for controlling client perceived response times and can meet different service level objectives for different classes of service.
Many other packet manipulation techniques can be explored in this context, including manipulating the drop, delay or retransmission of URL requests and responses. While RLM provides a black-box approach to management when modifications to a Web server complex are not possible, its core management framework can be used with other, more invasive mechanisms that can be deployed in the Web server complex to achieve a greater degree of control over resources and manage overall response time.

Chapter 5

Related Work

Many approaches for measuring and managing Web server latency have been developed [94, 159, 86, 125, 53, 8, 123, 42, 30]. Each approach has contributed to the field in its own way, yet almost all consist of the following elements, which provide a simple framework for viewing the measurement and management of Web servers:

1. A service level objective (SLO), the defined set of latency goals to achieve. The SLO is often expressed as a set of policy rules and a set of classification rules that categorize clients or requests into different service classes.

2. A measure of latency which is to be controlled, often used as feedback for decision making and validation.

3. A set of effectors which can be configured to control the latency. Examples are the scheduling mechanisms on CPU, disk or network transmission queues, or an admission control mechanism.

4. A decision algorithm for properly configuring the effectors given a set of measurements and a target latency to achieve.

The two most commonly cited service level objectives are the relative and the absolute objective. Also known as proportional, a relative objective seeks to distinguish service classes by a relative factor.
For example, low-priority requests may experience at worst twice the latency of high-priority requests, yet neither the high-priority nor the low-priority requests are guaranteed a specific threshold. In an absolute objective, each class is guaranteed to experience a latency below a certain threshold. For example, high-priority requests are guaranteed to complete in under 3 s, while low-priority requests are guaranteed to complete in under 10 s. Hybrids of the two have been proposed, and the definition of what a 'guarantee' actually means varies, with penalties being assessed for missed objectives. Verma [157] provides background on policy management and different types of service level agreements. A latency management approach that is applicable to only one type of service level objective is considered weaker than an approach which is applicable across several different types of service level objectives. Often the objective chosen is the one that best fits the mechanism. For example, weighted fair queuing naturally lends itself to a proportional objective, while admission control tends to be applied when response times exceed a specific threshold. The focus of this dissertation is not on policy-related matters such as policy definition, management or evaluation. We used absolute objectives for our work in Chapter 4 with RLM simply because we felt that an absolute threshold for response time is more relevant to the remote client. Our focus on managing the shape of the response time distribution has not been seen in the types of service level objectives discussed in the literature or appearing on commercial Web sites. In this context we showed that defining service level objectives by the use of percentiles (e.g., the 95th) works poorly in the context of admission control drops, since the rejected transactions cause a significant shift in the response time distribution. Future work in service level agreement definition and
validation may incorporate many of the ideas presented in this dissertation. The focus of this dissertation has been on measuring remote client perceived response time using only the information available at the server complex. As such, in this chapter we describe existing techniques for measuring response time and indicate how our work differs from these other approaches. Existing approaches fail to capture all the key latencies (such as the latencies associated with SYN drops), or fail to capture the latencies for actual clients (instead capturing latencies for monitor machines), or simply measure the wrong thing (e.g., per-URL server response time). In this dissertation we also applied novel server-side techniques for managing the remote client perceived response time distribution. As such, we present in this chapter existing latency management approaches and contrast them with our work, citing what we feel to be their strengths and weaknesses. Each of the latency management approaches we discuss here relies on one of the inferior methods presented in the next section for measuring response time (or a variant thereof). Whatever a latency management approach chooses as its measure of latency becomes the latency that is actually managed and controlled. Often the effector being used is an effective mechanism for controlling some resource (e.g., scheduling, admission control), but it is usually applied to manage the wrong latency (e.g., server response time) or without an understanding of the effect it has on the remote client pageview response time. As such, our work is unique among the work presented in this chapter in that, unlike these other approaches, we base our latency management approach on an accurate measure of the remote client perceived pageview response time.
Doing so has led us to develop unique effectors (e.g., fast SYN + SYN/ACK retransmission) and decision algorithms specific to the problem of managing per-pageview response times (e.g., accepting or dropping a SYN based on whether it belongs to the first or second connection of a pageview download), which also differentiates our work from the existing approaches.

5.1 Measuring Client Perceived Response Time

A key focus of this dissertation has been on accurately determining the response time as perceived by the remote client. As a result, we developed a number of measurement techniques that allow us to determine latency using only information available at the Web server. We then verified our approach by validating our measure of latency against measurements taken at the client via detailed instrumentation. In this section we present existing alternative approaches for measuring latency. One approach, taken by a number of companies [69, 88, 98, 59, 149], is to periodically measure the response times obtained by a geographically distributed set of monitors. These monitors can be fully instrumented to provide a complete measurement of response time across all of the ten steps previously discussed, as perceived by the monitors. However, this approach differs from ours in at least five significant ways. First, no actual client transactions are measured; only the response times for transactions generated by the monitors are reported. Second, this approach is based on coarse-grained sampling and may suffer from statistical biases. ksniffer and RLM, on the other hand, capture and analyze all actual client transactions, not just a sample. Third, monitors are limited to performing transactions that do not affect other users or modify state in backend databases. For example, it would be unwise to configure a monitor to actually purchase an airline ticket or trade stock on an open exchange.
Although it is possible to tag the transactions from monitors as 'fake', this requires that the server complex be changed to treat such transactions differently, hence exercising different code paths than those taken by actual client transactions. Neither ksniffer nor RLM requires changes to existing systems. It would, however, be an interesting study to compare the mean response time reported by monitor machines with the mean response time reported by ksniffer for the actual client pageviews. Fourth, the information gathered by monitors is generally not available at the Web server in real-time, limiting the ability of a Web server to respond to changes in response time to meet delay bound guarantees. Both ksniffer and RLM are placed at the server and produce results which are available in real-time. Lastly, CDN providers are known to place CDN servers near the monitors used by these companies to artificially improve their own performance measurements [50]. This effect is not seen by the real clients being tracked by ksniffer and RLM. A second approach involves instrumenting existing Web pages with client-side scripting (e.g., JavaScript) in order to gather client response time statistics [131]. Like our approach, this approach can account for actual client transactions. However, client-side scripting will always consider the start of the transaction to be sometime after the first byte of the HTTP response is received by the client and the client begins processing the HTTP response (step 8). A 'post-connection' approach such as this does not account for any delays that occur in steps 1 through 7, including time due to TCP connection setup or waiting in kernel queues on the Web server. Throughout this dissertation we showed how important it is to capture TCP connection establishment latency, especially in the presence of admission control drops.
Both ksniffer and RLM have been explicitly designed and engineered to determine these latencies. Client-side scripting also cannot be applied to non-HTML files that cannot be instrumented, such as PDF and Postscript files. It may also not work for older browsers or browsers with scripting capabilities disabled. Both ksniffer and RLM can determine response times for non-HTML content, regardless of whether scripting is enabled or disabled on the client browser, and without modification to existing systems. JavaScript measurements cannot accurately decompose the response time into server and network components and therefore provide no insight into whether server or network providers would be responsible for problems. A packet level approach such as that used by ksniffer and RLM is able to provide these insights. Network behaviors such as packet retransmissions are not visible to JavaScript executing within a Web browser. A third approach for measuring response time is to have the Web server application track when requests arrive and complete service [92, 94, 86, 8]. Like our approach, this approach has the desirable properties that it only requires information available at the Web server and can be used for non-HTML content. However, this approach only measures step 6 of the total response time - the per URL server latency. Server latency measurements at the application level do not properly include network interactions and provide no information on network problems that might occur and affect client perceived response time. They also do not account for overheads associated with the TCP protocol underlying HTTP, including the time due to TCP connection setup or waiting in kernel queues.
These times can be significant, especially for servers which discard connection attempts to avoid overloading the server [159], or for servers which limit the input queue lengths of an application server [53] in order to provide a bound on the time spent in the application layer. We showed in Chapter 2 how the Apache level measure of response time can be an order of magnitude less than the client perceived response time, and in Chapter 3 we presented mechanisms in ksniffer which can account for these latencies. A variant of this approach is to measure response time within the kernel of the Web server. We showed in Chapter 2 that without tracking or modeling the latencies associated with SYN drops and retransmissions, even the best designed kernel level measure of response time can be orders of magnitude less than the response time perceived by the remote client. A fourth approach is to simply log network packets to disk, and then use the log files to reconstruct the client response time [146, 6, 37, 60, 66, 75]. This kind of analysis is performed offline, using multiple passes, and is limited to analyzing only reasonably sized log files [146]. Hence this technique cannot be used in a real-time latency management system. EtE [66] uses a packet level reconstruction approach similar to that of ksniffer and RLM to determine per pageview response time, but uses multiple pass, offline algorithms to do so. Although EtE does account for steps 4 to 9, it does not account for SYN drop latencies due to admission control or network loss. Scalability can also be a drawback of any packet capture/logging approach: the mechanism must be able to capture packets at line speed while at the same time writing all or a portion of each packet to disk. Both ksniffer and RLM execute within kernel space, avoiding packet copies to user space, and perform online analysis without the need to generate log files.
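The magnitude of the SYN drop latencies discussed above can be illustrated with a small sketch. This is not the Certes model from Chapter 2; it simply assumes the classic BSD-style SYN retransmission schedule (an initial 3 second timeout that doubles after each successive drop), which is one common client TCP behavior:

```python
# Illustrative sketch (not the dissertation's Certes model): how dropped
# SYNs inflate the client perceived connection setup time. Assumes a
# BSD-style retransmission schedule: an initial 3 second timeout that
# doubles on each successive drop (3s, 6s, 12s, ...).

def perceived_connect_latency(syn_drops, rtt):
    """Seconds from the client's first SYN until the handshake completes,
    when the first `syn_drops` SYNs are dropped by admission control."""
    wait = sum(3.0 * (2 ** i) for i in range(syn_drops))
    return wait + rtt  # the surviving SYN still needs one round trip

if __name__ == "__main__":
    for drops in range(4):
        print(drops, perceived_connect_latency(drops, rtt=0.05))
```

With two drops a client on a 50 ms path waits 9.05 seconds instead of 0.05 seconds - over two orders of magnitude - which is exactly why a server-side measure that ignores dropped SYNs can be so far below the client perceived response time.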
BLT [60] is a system that aggregates TCP and HTTP information from packet traces to produce application-layer information. BLT is a user-space program built on top of tcpdump to produce logs that are processed off-line, and is used in a number of network monitoring projects within AT&T [37]. Feldmann [60] describes many of the issues involved in TCP/HTTP reconstruction from packet traces, but does not consider the problem of measuring response time. Indeed, simply reconstructing the TCP sequence space is much simpler than accurately determining latencies as perceived by the remote TCP endpoint. In addition to the above measurement approaches, a number of analytical models have been developed for modeling TCP latencies [122, 121, 38]. For example, Padhye et al. [121] derived a model of the steady state latency of a TCP bulk transfer for a given loss rate and round trip time. This model was further extended by Cardwell et al. [38] to include the effects of the TCP three-way handshake and TCP slow start. The extended model can accurately estimate throughput for TCP transfers of a given length. These analytical models focus on modeling only a portion of the pageview response time - the TCP transfer latency. They assume a fixed packet loss rate that remains constant over time and is known a priori. These assumptions are often invalid when measuring Web server performance. For example, SYN packet loss rates may change frequently due to server load or when the Web server uses SYN drops to manipulate its quality of service. The models and algorithms presented in this thesis are explicitly designed to handle large variances in SYN drop rates over time. As we mentioned in Chapter 4, we see the static models as being useful in the prediction of the transfer latency, Ttransfer, but they may need to be adjusted and combined with actual measurements for better accuracy.
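The flavor of these analytical models can be conveyed with a much-simplified, loss-free sketch in the spirit of the Cardwell et al. extension: transfer latency as one RTT for the TCP handshake plus the number of slow start rounds needed to deliver the object. The parameter names and the initial window of 2 segments are illustrative assumptions, not the published model:

```python
import math

# Loss-free, simplified slow-start latency estimate. One RTT for the
# three-way handshake, then one RTT per slow start round, with the
# congestion window doubling each round (2, 4, 8, ... segments).

def slow_start_rounds(segments, init_cwnd=2):
    """Rounds needed to send `segments` when cwnd doubles each RTT."""
    rounds, sent, cwnd = 0, 0, init_cwnd
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds

def transfer_latency(object_bytes, rtt, mss=1460, init_cwnd=2):
    """Estimated seconds to fetch an object: handshake + slow start."""
    segments = math.ceil(object_bytes / mss)
    return rtt + slow_start_rounds(segments, init_cwnd) * rtt
```

Even this toy version makes the text's point concrete: the estimate is driven entirely by RTT and object size, with the loss rate (here zero) fixed in advance, whereas SYN drop rates at a real server vary widely over time.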
5.2 Latency Management using Admission Control

Throughout this dissertation we presented the problem that existing admission control approaches have in relation to controlling client perceived response time - existing approaches ignore the effect of admission control drops on the remote client perceived response time. In this section we mention several of these systems and contrast their approaches with ksniffer and RLM. Quorum [31] is a proxy front-end that throttles URL requests into the server. Quorum is driven by the per URL server response time and reports results for only successful requests, ignoring the dropped URL requests and their impact on the pageview response time. Like us, they take a black box approach towards the server complex, requiring no changes to existing systems. However, being a user space proxy, they fail to capture kernel and network level effects on both sides of the proxy: client to/from the proxy and proxy to/from the server. The key issue is that this system will drop a URL request without understanding the context (page download) in which the object is being requested and the impact this has on the remote client. By comparison, RLM is not a user space proxy but a packet level capture and manipulation system, so it can track latencies at all network protocol levels. The pageview correlation algorithms and online event node model allow RLM to understand the context of each URL being requested. Other systems suffer from similar problems. In [86], Kanodia and Knightly combine admission control and scheduling with service envelopes to provide relative delay bounds. They measure delay in user space on the Web server and only for those transactions that are accepted, ignoring all transactions that are dropped due to admission control. In [53], Eggert and Heidemann take an application-level approach that allocates resources into two classes, foreground and background.
They relegate lower priority responses to background processing. In [42], Chen et al. present an admission control algorithm named PACERS. Admission is based on the near-future request rate and the expected service time of the request. Unfortunately, only simulations were performed. A control theoretic approach is presented by Kamra et al. [85] in which a self-tuning proportional integral controller was integrated into a front end Web proxy and then used in admission control. One key benefit of this approach is that a learning period is not required. Although they used client measured response time to validate that the controller/proxy overhead is insignificant (as compared to not using the proxy), the remainder of their testing and validation was performed using response time measurements taken within the proxy, in user space, and not at the client. This work also focused on the per URL and not the pageview response time, and failed to report the effect that admission control drops have on latency (although the drop rates were reported). Welsh and Culler [162] decompose Internet services into multiple stages, each of which can perform admission control. They monitor the response time through a stage, where each stage can enforce its own targeted 90th percentile response time. Voigt et al. [159] proposed TCP SYN policing and prioritized accept queues to support different service classes. Elnikety et al. [54] developed a DB proxy that prevents overload on the database server by performing admission control on the database queries being sent from the 2nd tier application server to the 3rd tier DB server. Although they directly manage the bottleneck resource by dropping DB queries, they ignore the effect this has on the pageview response time. In addition, these drops are performed late, after substantial resources and processing have already been invested in developing a response.
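The proportional-integral admission control idea discussed above can be sketched as follows. This is a hedged illustration, not the Kamra et al. controller: the gains, target, and the use of a drop probability knob are assumptions chosen to show the control loop shape:

```python
# Sketch of PI-style admission control: adjust the probability of
# admitting a new request so that measured response time tracks a
# target. Gains (kp, ki) and all names are illustrative assumptions.

class PIAdmissionController:
    def __init__(self, target, kp=0.1, ki=0.01):
        self.target, self.kp, self.ki = target, kp, ki
        self.integral = 0.0
        self.admit_prob = 1.0  # start by admitting everything

    def update(self, measured_latency):
        """Called once per measurement interval; returns new admit prob."""
        error = measured_latency - self.target  # positive when too slow
        self.integral += error
        adjust = self.kp * error + self.ki * self.integral
        # Admit less when we are over target, more when under it.
        self.admit_prob = min(1.0, max(0.0, self.admit_prob - adjust))
        return self.admit_prob
```

Note that the loop is driven entirely by whatever latency is fed into update(); if that measurement is a user space proxy latency rather than the client perceived pageview latency, the controller faithfully manages the wrong quantity - which is precisely the criticism made above.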
Our load shedding mechanisms in RLM perform packet stream manipulations before resources are invested by the server complex in generating a response. Cherkasova et al. [43] introduced the concept of session based admission control, the idea being that the true measure of a Web site is its ability to successfully complete sessions; they consider the cost of rejecting a client session from the perspective of the amount of computational resources wasted on the rejected session. A session is defined to be a series of pageview downloads which culminate in a sale. Kihl and Widell [89] showed that session based admission control reduces the number of angry customers, defined to be those customers who gain access to the Web site but are unable to complete their full transaction. Yet, those customers that are rejected before they have spent any time in the site are not classified as angry. We see the possibility of extending RLM to support a variety of session-based management techniques.

5.3 Web Server Based Approaches

The systems we presented in this dissertation are server-side solutions, yet require no modifications to existing server complex infrastructure. Both ksniffer and RLM rely on packet level analysis, avoiding the severe problems associated with measuring latencies within user space. What follows is a description of existing server-side approaches which fall prey to these and other problems associated with attempting to manage latency within the Web server itself. eQoS [161] is an approach for managing pageview response time by adjusting the number of simultaneous connections being serviced by the Web server. This system tracks the activity between the client and Apache by intercepting socket calls in user space within the Web server.
As such, it is unable to detect latencies due to SYN drops, time spent in kernel queues, and network RTT and loss, and is therefore unable to obtain an accurate measure of client perceived response time. As shown in Figure 2.13, the writev() system call, which writes data to a socket, returns after the data has merely been copied from user space into the kernel - not when the transmission is actually completed. This approach simply manages the server response time, not the remote client perceived response time, and could be prone to the effects we presented in Section 2.6 - throttling the number of simultaneous connections enough to cause undetected SYN drops in the kernel. On the other hand, our approach directly tracks the packet level interaction between the remote client and server, allowing us to capture latencies associated with both TCP and HTTP. The tight feedback loop between measurement and management allows RLM to manage the pageview download online, as it happens. Hellerstein et al. [51, 93, 125] are performing research into using control theory to control computer systems. In [93] they apply control theory to determine the proper value of the well known MaxClients setting in Apache to minimize response time for all users. They measure response time as the average wait time in the accept queue (ignoring the TCP 3-way handshake) plus the average number of HTTP requests per page times the Apache response time. A major contribution of this dissertation is to show the significant effects that SYN drops have on remote client response time, rather than simply ignoring them as they have. We present results in Figure 4.14 which clearly show the problems associated with throttling the MaxClients setting within Apache. The control theory approach makes the assumption that the outputs of the system are a linear function of the system inputs, which is not typically true of a Web server.
It also assumes that the system itself is time invariant, meaning that the system will always behave in exactly the same way for a given load. This is not true for a Web server whose service times change due to application changes. Sensitivity to instability is another concern. An oversensitive controller can cause oscillations in the system by rapidly changing from one extreme setting to another. Another issue is that training such a controller usually requires a training set that covers a large portion of the operating space under which the system is expected to run. RLM does not require a training period but rather can begin working once placed in front of a server complex. In [125] they are able to achieve latency goals for individual RPC requests sent to Lotus Notes by managing the length of the inbound request queue. Lu et al. [94] have also looked at the control theory approach, to provide relative delays. Pradhan et al. [129] presented a closed-loop adaptive mechanism for controlling a single tier Web server. Once again, this work was per URL based and, unlike our per pageview based work, requires modifications to the Web server complex. Other work on managing service quality on a per URL request basis includes [46, 30, 8, 3, 159, 41]. We feel that all these per URL approaches are missing the simple, highly important idea that a remote client does not download a single URL, but rather a whole pageview into his or her browser.

5.4 Content Adaptation

In Chapter 4 we presented the content adaptation techniques of embedded object rewrite and removal used by RLM to manage the client perceived response time. What differentiates our work from prior art is that the application of these adaptation techniques is driven by the goal of achieving a specified response time, as perceived by the remote client, for the currently active pageview download.
In addition, our implementation is not proxy or Web server-based, but is instead based on a system which can analyze and manipulate packet streams, in real-time, as the pageview is being downloaded. Proxy or Web server based systems do not have the ability to accurately measure client perceived response time due to their inability to capture TCP level or kernel level latencies. T. Abdelzaher and N. Bhatti in [4] provide a good overview of the many issues involved with content adaptation within the Web server, using server load as the motivation for its application. In order for a Web server to perform embedded object rewrite, an initial offline process must take place that creates multiple versions of the embedded objects, each version varying in size and fidelity. Then, in order to perform embedded object rewrite, either (a) multiple versions of the container pages must be created, with each version embedding objects of alternate sizes, or (b) the underlying file system on the Web server is changed so that when the Web server performs a sendfile() the symbolic link to that file is swapped to point to an alternate version of the file, or (c) if the container page is created dynamically, the program creating the container page must be altered to choose which size objects to embed. Similar modifications must be made to the Web server in order to support embedded object removal: multiple versions of container pages must be created in advance or modifications must be made to the scripts which dynamically create the container pages. Our server-side approach does not require changes to the existing Web server for embedded object rewrite and removal, except that it requires the existence of alternate versions of an object. In their approach, content adaptation is driven by the need for load shedding on the Web server, whereas in our approach we drive the use of content adaptation based on the latency required by the remote client.
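The two adaptation techniques can be sketched as simple HTML transformations. This is illustrative only: the "-low" filename convention for alternate object versions is a hypothetical assumption (the dissertation only requires that alternates exist), and RLM applies such transformations at the packet level, not on whole documents as here:

```python
import re

# Embedded object rewrite: point IMG references at a lower-fidelity
# variant. The "-low" suffix naming scheme is a hypothetical example.
def rewrite_to_low_fidelity(html):
    def swap(match):
        name, ext = match.group(1), match.group(2)
        return 'SRC="%s-low.%s"' % (name, ext)
    return re.sub(r'SRC="([^".]+)\.(gif|jpg|png)"', swap, html,
                  flags=re.IGNORECASE)

# Embedded object removal: blank the SRC= attribute of strictly
# cosmetic images (those whose ALT text is empty or absent), so the
# browser falls back to the alternate text and fetches nothing.
def remove_cosmetic_images(html):
    def strip_src(match):
        tag = match.group(0)
        if re.search(r'ALT="[^"]+"', tag, flags=re.IGNORECASE):
            return tag  # meaningful ALT text: leave the image alone
        return re.sub(r'\s*SRC="[^"]*"', '', tag, flags=re.IGNORECASE)
    return re.sub(r'<IMG[^>]*>', strip_src, html, flags=re.IGNORECASE)
```

The removal policy here implements the selective variant described later in this chapter: only images whose ALT attribute is "" or absent are blanked, leaving clickable or meaningful images intact.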
As a way to use latency to drive the content adaptation, they suggest the use of a server-based agent, which executes directly on the Web server machine and periodically measures the response time by obtaining a request from the Web server. This, once again, is indicative of the focus existing systems have placed on server response time rather than the response time as perceived by the remote client. Content adaptation has been applied in the context of content negotiation, where the endpoints (client and server) negotiate the content format, such as when a browser requests that images be sent as .PNG rather than as .GIF files which the browser does not support [39]. The negotiation between endpoints typically requires two requests for each object downloaded: the first request is made to retrieve the list of alternative formats, and the second request is made to actually retrieve the selected object [77]. In client-initiated content negotiation [143], the browser first obtains a list of alternate versions of the container page, then retrieves an estimate of the available bandwidth to the server from a monitoring machine (using SPAND [142]). Based on the bandwidth prediction and the sizes of the alternates, the browser estimates the transfer time for the different versions of the container page and chooses the one that most closely matches the user's requested response time. In this approach, the browser chooses between a predefined set of alternative container pages, where each variant of the container page has references to a set of embedded objects whose total size varies between the variants. Once the variant of the container page is selected there is no going back - all embedded objects are retrieved. This differs from our approach in that we track the pageview as it is being downloaded and make decisions as each embedded object is being requested.
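The client-initiated selection logic just described can be sketched as follows. The function and parameter names are illustrative assumptions, and the transfer-time estimate (variant size divided by the SPAND bandwidth prediction) is the simple model the approach implies, not a published algorithm:

```python
# Sketch of client-initiated variant selection: given the sizes of the
# alternate container pages and a bandwidth estimate (e.g. from SPAND),
# pick the richest variant whose predicted transfer time fits the
# user's requested response time. Falls back to the smallest variant
# when nothing fits.

def choose_variant(variant_sizes, bandwidth_bps, latency_budget):
    """variant_sizes: page sizes in bytes; bandwidth_bps: bits/sec."""
    fitting = [s for s in variant_sizes
               if (8 * s) / bandwidth_bps <= latency_budget]
    return max(fitting) if fitting else min(variant_sizes)
```

The key contrast with our approach is visible in the signature: the decision is made once, up front, from a single bandwidth prediction, with no opportunity to adapt per embedded object as the download proceeds.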
In addition, our approach does not require changes to Web browsers or the existence of a monitor machine to provide the bandwidth estimate to the Web browser. In server-initiated content negotiation the amount of outbound available bandwidth is used to determine the variant to return, with the goal of keeping the outbound bandwidth to at most 90% of the link capacity. In our server-side solution, we do not seek to maintain a specified bandwidth utilization. Instead, we seek to provide a specified response time. We do so by tracking the RTT and loss rate between the server and client to develop an estimate for the transfer latency of an object. We then use this estimate to predict the impact the object will have on the overall pageview response time, which is currently in progress. In our model we do not necessarily assume that the outbound bandwidth is the sole key contributor to the latency of the overall pageview download, but instead treat the component latencies of the pageview separately, and adjust as the bottleneck latency changes. The IETF has two efforts related to content adaptation. The first, named Open Pluggable Edge Services (OPES) [22, 138, 63, 137, 21], is a working group defining an architecture and set of protocols for a series of network-based proxies (termed OPES servers) capable of modifying a request or response as the message traverses from one endpoint to the other. These intermediary proxies perform well-defined transformations on the message as they forward the message on toward its intended destination, usually modifying the message by adding or modifying the HTTP headers in the request or response. In addition, they perform typical proxy-like functions such as accounting, translation, compression, redirection, and servicing requests from local cache.
In the OPES architecture each intermediary OPES service is a TCP endpoint, thus a series of application-to-application hops are performed as the request or response traverses the series of OPES servers. Such a model is quite different from our approach, which consists of a single device placed in front of the Web server for the purpose of packet stream analysis and manipulation. RLM is not a TCP endpoint or proxy, and as such, is able to capture RTT, network loss and latencies due to SYN drops and retransmissions. In OPES, the services applied to a request or response are distributed across multiple machines. In RLM, the measurement and control feedback loop is tightly bound within the same machine. We wonder about the overhead associated with proxy-to-proxy hops as defined in OPES and what effect this may have on scalability. On the other hand, our approach requires maintaining the TCP sequence number space over the connection, which in turn places limitations on the amount of modification we can perform on the HTTP request or response. The second IETF effort, the Internet Content Adaptation Protocol (ICAP) [55], is a lightweight protocol for executing a remote procedure call on HTTP messages. An ICAP client can pass HTTP messages to an ICAP server for transformation. A typical example of ICAP usage would be a proxy which sends an HTTP container page response to an ICAP server which then inserts the correct advertising links into the container page. As such, ICAP servers are user level services which perform proxy-like functions - similar to OPES but with a different API. Once again this differs from our work, which seeks to manage remote client perceived response time through packet stream analysis and manipulation, rather than provide a general use architecture for HTTP request/response transformation. However, this does raise the question as to whether or not RLM could be used in conjunction with OPES or ICAP based proxies.
If the proxy is placed between RLM and the server complex, RLM will simply treat the proxy as part of the server complex. If such a proxy were positioned between the client and RLM, then RLM would consider the proxy to be the remote client (i.e. the remote TCP endpoint). In such a case, RLM would be measuring and managing response time to the proxy and not to the remote client. In Section 6.1 we consider the possibility of having distributed RLM machines, each associated with a distributed set of OPES or ICAP proxies. Related to embedded object removal is the optional HTTP ALT="string" attribute. Used by most Web sites, it allows a non-graphical browser to substitute string in place of the referenced image:

<IMG SRC="sell.gif" ALT="[sell icon]">

In cases where the image is strictly cosmetic (not clickable) the string is usually set to the NULL string "" so that the user sees nothing at all. The embedded object removal technique we describe blanks out the SRC= portion of the reference so that the browser is left with:

<IMG ALT="[sell icon]">

As such, this forces the browser to use the alternate text string where the image would have been displayed. One policy for embedded object removal would be to blank out only the non-clickable or strictly cosmetic images, whose ALT attribute is the empty string "" or is absent.

5.5 TCP Level Mechanisms

ksniffer and RLM track behaviors at a packet level, using an understanding of how TCP/IP behaves under various conditions. Other research has sought to control or manage latencies by examining the low-level behaviors of TCP/IP. Schroeder et al. [141] show how changing the outbound packet scheduler in Linux from the default (FCFS) to a Shortest-Remaining-Processing-Time-first (SRPT) scheduler can reduce mean response times for all clients in an overloaded Web server. However, they do not consider multi-class scheduling, nor dynamic adjustments to the scheduler.
They focus on comparing the two static scheduling approaches for a Web server serving only static pages. Since no content is dynamically generated, the bottleneck in the system is known a priori to always be the transmission link between the server and clients, and they use file size as the predictor of processing time, hence ignoring RTT and loss. Packets are scheduled on the outbound link based on which file the packet is from, and how many bytes remain from the file to be transmitted. Their traffic generator emulates a user as a single, persistent connection over which multiple requests are issued, which does not match the behavior of a typical end user in the Internet using IE or Netscape, which open multiple persistent connections. They also measured client response time at the traffic generator machine on a per request basis, not a per pageview basis. Crovella et al. [49] schedule outbound bandwidth based on SRN, where the byte count of the response is used as the response length - this is applied when the bandwidth at the server is the bottleneck and also does not take into account RTT and loss. Barford and Crovella used critical path analysis to examine the bottlenecks that exist for a single TCP connection [25]. Jamjoom and Shin [82] showed that the best approach to SYN dropping is all or nothing - accept the initial SYN, or drop it and all its retransmissions. With Pillai [81] they presented Abacus, a modified token bucket filter, and showed under simulation how it smoothed out the synchronization effects associated with SYN retransmissions. A fast retransmission mechanism that violates TCP semantics was presented in the context of wireless networking [16, 18] to alleviate latencies associated with loss due to the physical medium. However, this was not applied to SYN and SYN/ACK processing but only to data transfer over established connections. Williamson and Wu [163] detail the effects of individual packet drops (i.e.
SYN drops) on the transfer time for a pageview download, motivating the need to merge HTTP level information into TCP/IP to reduce timeout and retransmission latencies. Nahum et al. [107] discuss the effects of RTT and loss over the WAN. Offline, multiple pass pageview reconstruction from packet dumps was performed in [66]. Bhatti et al. have studied users' tolerance for delay [29]. Pradhan et al. [128] showed under simulation that a portal router residing in front of a Web server could reduce the TCP slow start time for short lived connections by modifying the receiver window size within client-to-server TCP packets, causing the server to skip slow start and immediately send larger numbers of packets. In addition, they effectively reduced the server-perceived RTT by transmitting ACK packets to the server immediately from the portal server, before the data packet is received by the remote client. Since their focus was on reducing TCP slow start, they did not consider mechanisms such as fast SYN + SYN/ACK retransmission for connection establishment as we presented here.

5.6 Packet Capture and Analysis Systems

ksniffer and RLM have a basic dependence on fast packet capture. In Chapter 3 we showed the scalability of ksniffer by tracking and reconstructing pageview response times up to gigabit bandwidth rates. This exceeds the functional capabilities of existing systems at that rate. Many packet capture and analysis systems have been developed over the years [48, 47, 75, 109, 151, 97]. Although none of these systems perform protocol reconstruction to determine the remote client per pageview response time, they can be compared to ksniffer and RLM in their design and engineering. Gigascope [48] is a general traffic monitor designed for high-speed IP backbone links. Gigascope is a compiled query system: the user submits a set of SQL-like queries and the system generates C/C++ code that is compiled, linked to a runtime system and then executed in user space.
Queries filter the packet stream into smaller streams which are stored in a data warehouse for offline report generation. They demonstrate scalability, but not at the level of online functionality that ksniffer and RLM provide. In this dissertation we have not sought to develop a general packet analysis and filtering system, but rather have sought to solve the problem of determining an accurate measurement of the client perceived response time, on a per pageview basis, using only server-side information. It might be possible to extend a system like Gigascope with the algorithms that are presented here, but most of the algorithms in this dissertation cannot be expressed as SQL queries (e.g., tracking the state of a TCP connection). Aguilera et al. [6] use passive packet capture for performance debugging of distributed systems. They treat each machine in the distributed system as a black box and then use offline statistical methods to infer the relationships between machines based on traffic patterns. ksniffer and RLM do not use statistical methods, but rather perform protocol reconstruction, examining each packet's relationship to the overall transaction. We do see that statistical pattern matching could be applied within ksniffer and RLM, but at a higher level, to determine historical patterns of behavior, which in turn could be used in prediction. NetQoS SuperAgent [109], a commercial product, is a passive traffic monitor that, like us, provides information such as round-trip times to remote clients and measures the components of response times that are due to packet retransmissions. SuperAgent also analyzes certain TCP-based applications such as Oracle and Lotus Notes. They report response times on a per URL request basis instead of the per pageview response time reported by ksniffer and RLM. Packeteer [120] sells an inline traffic monitor/shaper appliance.
Instead of connecting the device to a mirrored port, the device is plugged inline like a router, as RLM is. Although the device can monitor a large number of protocols, it does not reconstruct pageview response times, but only reports response time on a per URL request basis. It counts the number of dropped SYNs, but does not track the latency associated with them. Their focus is mostly on monitoring and shaping traffic in an enterprise network. The top model supports up to 500 Mbps and costs roughly $27,000 per unit. They also sell a centralized controller (software which runs on Windows) that can collect and aggregate information from each monitor device. Hypertrak from Trio Networks [111], a similar system, logs each response time entry into an Oracle database, from which reports can be generated. Using a third party database they can determine the geographic location of a remote client from his/her IP address. Their software performs filtering at the device driver, and runs on Red Hat 7.1. Similar to Packeteer, they can deploy multiple boxes within a server farm to achieve scalability: a central controller collects and aggregates information to develop final reports. The ability to collect and correlate information from multiple devices was not within the scope of this dissertation. We see this functionality as being useful, but not a research challenge. Most other network traffic monitoring systems have focused on improving scalability through packet filtering. The assumption here is that the amount of network traffic of real interest to the monitor is actually a small subset of the volume of the traffic as a whole. They are thus concerned with efficient techniques for filtering 'irrelevant' packets to reduce the processing load of the monitor. Systems falling into this category include the work by Mogul et al. [103], the BSD Packet Filter (BPF) [97], the CSPF language [103], tcpdump [80], libpcap [151], and Ethereal [58].
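The filtering model these systems share can be illustrated with a small sketch (a toy stand-in for BPF-style filtering, not any of the cited implementations): a filter specification is compiled into a predicate, and packets failing the predicate are discarded before any expensive analysis.

```python
def compile_filter(proto=None, port=None):
    """Compile a simple filter spec into a packet predicate.
    A toy stand-in for BPF-style filtering, not BPF's actual VM."""
    def match(pkt):
        # pkt is a dict, e.g. {'proto': 'tcp', 'src_port': 1234, 'dst_port': 80}
        if proto is not None and pkt.get('proto') != proto:
            return False
        if port is not None and port not in (pkt.get('src_port'), pkt.get('dst_port')):
            return False
        return True
    return match

def capture(packets, flt):
    """Keep only packets of interest, discarding 'irrelevant' traffic early."""
    return [p for p in packets if flt(p)]
```

The contrast with ksniffer is that, for a busy Web server, almost every packet passes such a predicate, so the cost moves from filtering to the per-packet protocol reconstruction itself.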
Subsequent research efforts on packet filters have focused on improving performance, for example by merging common predicates in filters [165], redesigning filtering engines [15], using dynamic code generation [57], and through just-in-time compilation [27]. For high-volume Web sites, however, the vast majority of the traffic is of interest, and thus we require efficient techniques for processing as much of it as possible. In this vein, the use of programmable Ethernet adapters that can copy packets directly into user space is of interest [56, 32]. The key drawback to these devices is their expense. In Chapter 3 we showed scalability up to the gigabit level using just off-the-shelf hardware. 5.7 Services On-demand Server farm management systems that allocate resources on-demand to meet specified response time goals are receiving much attention [134, 9, 96]. The ability of a Web hosting center to move CPU cycles, machines, bandwidth and storage from a hosted Web site that is meeting its latency goal to one that is not, is a key requirement for an automated management system. B. Urgaonkar and P. Shenoy state in their Cataclysm [156] paper that 'a hosting platform can take one or more of three actions during an overload: (i) add capacity to the application by allocating idle or underused servers, (ii) turn away excess requests and preferentially service only important requests, and (iii) degrade the performance of admitted requests in order to service a larger number of aggregate requests'. They did not consider a fourth option, which we presented in this dissertation and which was also presented by T. Abdelzaher and N. Bhatti in [4]: reducing the amount of work required for each pageview. Throughout this dissertation we have assumed a fixed amount of physical resources is available within the server farm (i.e., a black box). Nevertheless, on-demand allocation decisions must be based on accurate measurements.
Over-allocating resources to one hosted Web site results in an overcharge to that customer and a reduction in the available physical resources left to meet the needs of the others. Under-allocation results in poor response time and unsatisfied Web site users. The ability to base these allocation decisions on a measure that is relevant to both the Web site owner and the end user of the Web site is a competitive advantage. Likewise, the opportunity to apply content adaptation instead of throwing more physical resources at the problem could lead to a significant reduction in cost. Merging our work on ksniffer and RLM with an on-demand resource management system is potentially interesting future work. 5.8 Stream-based Systems There has been much work related to analyzing data streams, which has focused on information retrieval and not actually applying the information towards managing Web server latency. In this context, the packet stream into and out of the Web server complex can be viewed as a data stream which is to be mined. Several research groups have taken the approach of developing Data Stream Management Systems (DSMS), capable of performing persistent SQL-type queries over continuous data streams (i.e. performing selects and joins over multiple streams). The basic idea is to take the fundamentals of database systems and extend them into the realm of data streams. These extensions include making the queries persistent over time (rather than a simple request/response on a set of fixed-size tables), adding functionality for handling relational tables which are unbounded in length (via the use of windows) and adding support for time-related predicates. The largest research group addressing DSMS is located at Stanford [72]. They have published work related to building general-purpose DSMS [72], data stream query languages [10], operator scheduling [11], load shedding queries [13], and clustering [74, 73, 114].
So far, their only application of this system has been towards network traffic management [14]. Other work has been focused on optimizing the query engine by sharing and ordering selection predicates [40, 95]. One major benefit to these systems is being able to use a well-known existing API, such as SQL queries, to obtain information about the data streams. This provides the user the ability to compose SQL queries on the fly to obtain specific information by using select and join. On the other hand, drawbacks exist. First, such a system is fairly large and complex, requiring multiple components, such as an SQL parser, optimizer and query processor. Second, SQL queries alone are insufficient for modeling complex entities such as a TCP connection. In order to track TCP connection state, one must maintain state information and make complex decisions with respect to prior and current events. SQL queries alone are not amenable to this problem. Indeed, Gigascope [47] provides the capability for installing user-built custom modules to handle exactly this. Other modeling tasks that cannot be expressed well using SQL include determining remote client subnet membership via longest prefix matching and performing longest path matching on a URL. As such, these systems are different from the work presented in this dissertation. Our systems were designed to solve a specific problem, whereas these systems appear to be designed more for browsing streams to identify any interesting or useful patterns or information. Although further afield from our work, the problem of clustering and modeling online numeric data streams presents engineering challenges that are similar to the problem of tracking and modeling Web transactions (i.e. packet streams).
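To make the contrast concrete, here is a minimal sketch (hypothetical, not the dissertation's implementation) of the two computations just mentioned, tracking TCP connection state and longest prefix matching, both of which require the kind of imperative state or hierarchy that plain SQL queries express poorly.

```python
class ConnTracker:
    """Follows the TCP 3-way handshake from observed packet flags.
    Toy sketch; a real tracker also handles sequence numbers,
    timeouts, retransmissions and simultaneous opens."""

    def __init__(self):
        self.state = {}  # (client, server) -> connection state

    def on_packet(self, client, server, flags):
        key = (client, server)
        cur = self.state.get(key, 'CLOSED')
        if 'RST' in flags or 'FIN' in flags:
            cur = 'CLOSED'
        elif cur == 'CLOSED' and flags == {'SYN'}:
            cur = 'SYN_SENT'        # client SYN observed
        elif cur == 'SYN_SENT' and flags == {'SYN', 'ACK'}:
            cur = 'SYN_RECEIVED'    # server SYN/ACK observed
        elif cur == 'SYN_RECEIVED' and 'ACK' in flags:
            cur = 'ESTABLISHED'     # final ACK completes the handshake
        self.state[key] = cur
        return cur

def longest_prefix_match(ip_bits, table):
    """Return the value for the longest prefix in `table` matching
    `ip_bits` (both given as bit strings).  Linear scan for clarity;
    production code would use a trie."""
    best = None
    for prefix in table:
        if ip_bits.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return table[best] if best is not None else None
```

The tracker's transition logic depends on the prior event for the same connection, which is exactly the kind of stateful decision that Gigascope delegates to user-built modules.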
Such systems [166, 62, 74, 20, 5, 150] require high-speed incremental update, compact memory representations whose memory requirements grow slowly with the number of entities being tracked, and the ability to work well in the presence of incomplete information. An example would be a system that analyzes real-time sensor data. Barbara et al. [19] and Babcock et al. [12] both provide a good overview of such issues. 5.9 Internet Standards Activity We now end the related work section by mentioning the work being done by relevant standards bodies. As mentioned earlier in this dissertation, no standards body is tackling the problem of developing a standard definition of, or methodology for, measuring remote client perceived pageview response time. The Internet Engineering Task Force (IETF) [2] is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. The mission of the IETF is to produce high-quality, relevant technical and engineering documents that influence the way people design, use, and manage the Internet in such a way as to make the Internet work better. These documents include protocol standards, best current practices, and informational documents of various kinds. Transactions over TCP (T/TCP) [34, 35] is a set of RFCs concerned with the latency problems associated with the TCP 3-way handshake. They propose the TCP Accelerated Open (TAO) mechanism for eliminating the TCP 3-way handshake delay. In TAO, the request and response are sent in the payload of the SYN and SYN/ACK packets, respectively. Each endpoint accepts the other's sequence number if the sequence number is greater than the last sequence number seen from that host. This requires that each endpoint maintain a per-host cache of the last sequence number received from that host.
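Following the description above (which simplifies the connection-count machinery actually specified in RFC 1644), the per-host cache test can be sketched as:

```python
class TaoCache:
    """Sketch of the TAO test as described above: a SYN carrying data is
    accepted without waiting for the 3-way handshake only if its initial
    sequence number exceeds the last one cached for that host.  A missing
    cache entry falls back to the normal handshake.  Simplified from
    RFC 1644, which uses per-connection counts (CC values)."""

    def __init__(self):
        self.last = {}  # host -> last accepted initial sequence number

    def accept_syn(self, host, seq):
        prev = self.last.get(host)
        ok = prev is not None and seq > prev
        if prev is None or seq > prev:
            # cache the newest value (in RFC 1644 the update happens
            # once the connection is validated)
            self.last[host] = seq
        return ok
```

The monotonicity check is what lets the server safely deliver the SYN's payload to the application without waiting for the third handshake packet: an old duplicate SYN would carry a stale number and fail the test.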
This was proposed in 1992 and has never caught on for a variety of reasons, including the requirement to modify existing TCP implementations, C libraries, and user-level applications. The existing TCP specification does allow a data payload to be contained in a SYN packet, but data cannot be presented to the user application until the 3-way handshake is completed; hence, TAO proposed the use of cached sequence numbers to eliminate the need to wait for the third part of the handshake to complete. Regardless, no existing TCP implementation provides the capability to include a data payload in an initial SYN packet. This is something of an artifact of the sockets API and its system calls. A user application must first call connect() to establish the connection to the server (3-way handshake), then call writev() to send data. Even if the call to connect() is non-blocking and a writev() is executed immediately after returning from connect() in hopes of appending the data as payload to the initial SYN packet, the underlying TCP implementation will return an error to the user application on the call to writev(). By contrast, our approach does not seek to change the TCP protocol, nor does it require changes to existing TCP implementations. The Interconnect Software Consortium (ICSC) [1] was formed to develop and publish software specifications, guidelines and compliance tests that enable the successful deployment of fast interconnects such as those defined by the InfiniBand specification. The Extended Sockets API [71] presented by the ICSC proposes a new set of event-driven, asynchronous socket calls to improve the performance and scalability of the sockets API. On the server side, the Web server will be able to accept a large number of incoming connections at once by making a single call to accept().
This reduces the number of system calls required to handle all the incoming connection requests since they are grouped into a single call. On the client side, the browser has little added benefit: an asynchronous connect() call instead of a blocking or non-blocking call. In either case, the TCP 3-way handshake is still required before data can be transferred over the connection. The work presented in this dissertation is essentially neutral with respect to the proposed Extended Sockets API. Chapter 6 Conclusion This dissertation shows that it is possible to determine the remote client perceived response time for Web transactions using only server-side techniques and that doing so is useful for the management of latency-based service level agreements. First, we presented Certes, a novel modeling algorithm that accurately estimates connection establishment latencies as perceived by the remote clients, even in the presence of admission control drops. We presented a non-linear optimization that models the effect that the TCP exponential backoff mechanism has on connection establishment latency. We then presented an O(c) time and space online approximation algorithm as an improvement over the non-linear optimization. We implemented the fast online approximation algorithm on a Web server and performed numerous experiments validating that the latencies determined by our online model were within 5% of the latencies observed at the remote client. In addition, through the use of kernel-level accept queue limit adjustment on the Web server, we showed how existing techniques of admission control which ignore the latency effects of SYN drops can have the exact opposite effect on response time from the one they intend, making response time worse for the remote client rather than improving it.
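The core accounting behind this result can be caricatured in a few lines (a hypothetical sketch, not Certes' actual non-linear optimization or O(c) online algorithm): if the server knows how many connections completed after k dropped SYNs, each class contributes the cumulative exponential-backoff delay (3s, 6s, 12s, ... in classic BSD-derived TCP) that the server itself never observes.

```python
def est_connect_latency(counts, base_timeout=3.0):
    """Estimate the mean backoff component of client perceived
    connection-establishment latency.

    counts[k] = number of connections whose SYN was dropped k times
    before being accepted.  Each drop costs one retransmission timeout,
    doubling per attempt.  Hypothetical sketch; ignores the RTT portion
    of the latency and is not Certes' actual algorithm.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    acc = 0.0
    for k, n in counts.items():
        # cumulative client-side wait before the accepted SYN arrived
        delay = sum(base_timeout * (2 ** i) for i in range(k))
        acc += n * delay
    return acc / total
```

A server that only measures accepted connections would report zero for all of this latency, which is precisely why server-side logs understate the client's experience under admission control.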
The basic idea that a SYN drop does not deny service, but rather postpones service of a request, has significant impact on all prior and future work involving admission control. Second, we presented ksniffer, an intelligent traffic monitor which accurately determines the pageview response times experienced by a remote client without any changes to existing systems or Web content. We presented novel algorithms for inferring the remote client perceived response time on a per pageview basis which take into account network loss, RTT, and incomplete information. By noticing gaps in TCP retransmissions we are able to infer the existence of a network packet drop affecting the pageview response time, without the dropped packet ever being captured by our system. Our algorithms perform embedded object correlation, online, in the absence of Referer: fields and in the presence of multiple simultaneous pageview downloads to the same remote IP address. Our kernel-level design and implementation was shown capable of tracking packet streams at near gigabit rates, far exceeding the functionality and performance of existing systems. The basic idea that a client downloads an entire pageview and not just a single URL, and that what matters to the client is the response time as measured at the remote browser and not the server response time, has profound impact on all prior and future work involving response time measurement; it also impacts the basic concepts surrounding how Web server performance is benchmarked, evaluated, managed and controlled. Third, we presented Remote Latency-based Management (RLM), a system that controls the latencies experienced by the remote client by manipulating the packet traffic into and out of the Web server complex.
RLM tracks the progress of each pageview download in real time and manages latencies dynamically, as each embedded object is requested, making fine-grained decisions on the processing of each request as it pertains to the overall pageview latency, as perceived by the remote client. Our system is based on a novel event node graph that models the pageview download as a set of specific, well-defined activities with latency, precedence and critical path attributes. Using packet manipulation techniques such as fast SYN + SYN/ACK retransmission and embedded object rewrite and removal, we are able to manipulate not only the mean pageview response time but also the shape of the pageview response time distribution, without modifying the Web server complex or Web content. The basic idea that the QoS given to an individual URL request should be based on the context of the pageview it is currently being downloaded for has significant impact on how Web server performance is viewed, evaluated and managed. An important aspect of our approach has been to experimentally validate our ideas with realistic workloads under Internet conditions of network loss and delay, which can have significant impact on response time. Toward this end, we have also implemented various techniques to make our traffic generator behave as realistically as possible. By studying the behavior of existing Web browsers, such as Microsoft's Internet Explorer, we were able to uncover some notable effects that occur under failure conditions and incorporate this behavior into our own traffic generator. Likewise, we discovered behaviors in traffic generators used by the research community today that do not mimic the behavior of Internet Explorer at all.
For example, something as simple as the traffic generator closing the connection instead of keeping it open until the Web server closes the connection (which is how IE works) has significant impact on latency, load on the server, number of simultaneous idle connections at the server, and the perception of load by an admission control mechanism. The basic idea that a traffic generator used to validate a latency management approach should mimic the behavior of real Web browsers under all conditions has significant impact on prior and future research involving Web server benchmarking and latency management. 6.1 Future Work This dissertation provides a foundation for future work in managing Web server performance and also raises some fundamental questions to consider as follow-on research. The TCP exponential backoff mechanism as it is currently defined and implemented for connection establishment is antiquated and needs to be changed. We predict that future implementations of TCP will have a different mechanism, based on a change or extension to the TCP protocol specification. Indeed, what we learned about the effect that SYN drops have on connection establishment latency applies to any new protocols being developed for the Internet which do not rely on TCP as their underlying transport mechanism (e.g., peer-to-peer, chat). New protocols ought to be developed with the understanding of how retransmissions, similar in nature to those due to SYN drops, can affect overall response time. Perhaps another transport-layer protocol will be developed for the Internet and gain wide acceptance; one which is a hybrid of TCP, UDP, etc. Instead of dropping a SYN (if the protocol does have the concept of a SYN), the protocol may return a packet indicating to the client when it ought to re-attempt connection establishment. Unfortunately, it may take many years for a new version of TCP or a new transport protocol to be deployed throughout the world.
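The benefit of an explicit retry hint over silent SYN drops can be illustrated with a small comparison (hypothetical numbers; the 3-second initial timeout and per-attempt doubling follow classic BSD-derived TCP behavior):

```python
def backoff_wait(drops, base=3.0):
    """Total client-side wait imposed by TCP exponential backoff after
    `drops` consecutive SYN losses: 3s + 6s + 12s + ..."""
    return sum(base * (2 ** i) for i in range(drops))

def hinted_wait(drops, hint):
    """Wait if the server instead told the client when to retry:
    the client sleeps exactly `hint` seconds per rejection."""
    return drops * hint

# After three rejections, backoff costs 3 + 6 + 12 = 21 seconds of
# client perceived latency; a 1-second retry hint would cost only 3.
```

The point is that backoff delays are fixed by the protocol regardless of how soon the server could actually accept the connection, whereas a hint lets the server shape the retry schedule to its real capacity.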
A guiding principle when developing systems is to layer and isolate network protocols from one another, with each layer providing a well-defined yet somewhat limited API. This can be extended from the network into the server, with the kernel being one layer and the HTTP user-level process being another. This is considered a tried-and-true method for making the development of complex systems more tractable. What we found, of course, is that this makes for considerable difficulties when trying to measure and correlate activities across multiple layers. Each layer below hides information from the layer above, with little or no access to any latency measurement of real value. This layering concept needs to be re-evaluated; at the very least, new protocols being developed ought to be designed with the explicit goal of making accurate latency measurement a key design point. As Web services become more distributed, measuring and managing the client perceived response time becomes more difficult. If any portion of a pageview can be obtained from any computer on the Internet, then the only single centralized point of measurement and control will be the Web browser itself. It will be up to the browser to choose from which computer to obtain each portion of the pageview. Yet, Web browsers may not have access to the information required to make an intelligent choice of which source to use. Likewise, as content becomes ever more distributed, Web sites lose their ability to control the client experience. In such a scenario it may be the Internet infrastructure itself which measures performance and routes requests based on the content locations, load on those machines and available or projected bandwidth on the paths to those machines. This implies an Internet where the browser can place a request for an object or data onto the Internet infrastructure and receive a response without specifying where to obtain it from.
Such global load balancing may not be possible, but the infrastructure of the Internet could at least be extended to be more supportive of the measurement and management of Web transactions. Understanding client perception and behavior is key to a Web site's success. As such, additional research needs to be focused on managing the perceptions and behaviors of the remote client. Future browsers ought to be able to measure, analyze and relay their users' perceptions and intentions to the Web server complex, working in conjunction with the server complex to provide the best possible experience for the client when visiting the Web site. A full and thorough examination of how current browsers behave under various conditions and how clients react may lead to the development of better browsers, not just better Web site content [36]. A Web server in such a partnership with Web browsers would be pageview-centric, optimizing the pageview downloads of all currently active clients, given the current and expected load, RTT and loss to each client, and prediction of near future behaviors of each client. On-demand resource allocation promises the ability to shift resources, in real time, to exactly where they are needed. Such environments require accurate measures of response time as input to the allocation decision. In addition, research in this area needs to consider and address the issues involved in the cost trade-off between the choice of redirecting resources and the choice of adapting content or computation to the given set of resources. No industry standard method for measuring pageview response time exists today, which makes scientific evaluation and comparison of existing techniques difficult at best. A well-defined standard for the measurement of client perceived response time needs to be developed. This would allow for a more meaningful comparison between latency management approaches.
Such a standard may spark the development of new service level agreements based on remote client perceived response time (existing service level agreements only contractually bind to availability objectives). Given such a standard, verification tools could then be written to validate a system's ability to measure and manage the client perceived response time. Given the number of packet switching devices on the Internet and available for use in a multi-tier server complex, other packet manipulation techniques that can manage latency or control the server complex ought to be explored. We showed in Chapter 4 how embedded object removal shifted load from the front-end Apache server to the backend DB server. Other techniques could exist that control the behaviors of remote clients, browsers or Web servers. Bibliography [1] The Interconnect Software Consortium. http://www.opengroup.org/icsc/. [2] The Internet Engineering Task Force. http://www.ietf.org/. [3] Tarek Abdelzaher, Kang G. Shin, and Nina Bhatti. Performance Guarantees for Web Server End-Systems: A Control-Theoretical Approach. IEEE Transactions on Parallel and Distributed Systems, 13(1):80–96, January 2002. [4] Tarek F. Abdelzaher and Nina Bhatti. Web Content Adaptation to Improve Server Overload Behavior. Computer Networks, 31(11-16):1563–1577, 1999. [5] Charu C. Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu. A Framework for Clustering Evolving Data Streams. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 81–92, Berlin, Germany, September 2003. [6] Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen. Performance Debugging for Distributed Systems of Black Boxes. In Proceedings of the 19th ACM Symposium on Operating System Principles (SOSP ’03), pages 74–89, Lake George, NY, October 2003. [7] Mark Allman. A Web Server’s View of the Transport Layer.
ACM Computer Communication Review, 30(4):133–142, October 2000. [8] Jussara Almeida, Mihaela Dabu, Anand Manikutty, and Pei Cao. Providing Differentiated Levels of Service in Web Content Hosting. Technical Report CS-TR1998-1364, Computer Sciences Department, University of Wisconsin-Madison, 1998. [9] Karen Appleby, Sameh Fakhouri, Liana Fong, German Goldszmidt, Michael Kalantar, Srirama Krishnakumar, Donald Pazel, John Pershing, and Benny Rochwerger. Oceano-SLA Based Management of a Computing Utility. In Proceedings of the IFIP/IEEE Symposium on Integrated Network Management, pages 855–868. IEEE, May 2001. [10] Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. Technical Report 2003-67, Stanford University, October 2003. [11] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Dilys Thomas. Operator Scheduling in Data Stream Systems. Technical Report 2003-68, Stanford University, 2003. [12] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and Issues in Data Stream Systems. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS ’02), pages 1–16, Madison, Wisconsin, 2002. [13] Brian Babcock, Mayur Datar, and Rajeev Motwani. Load Shedding Techniques for Data Stream Systems. In Proceedings of the 2003 Workshop on Management and Processing of Data Streams (MPDS ’03), June 2003. [14] Shivnath Babu, Lakshminarayan Subramanian, and Jennifer Widom. A Data Stream Management System for Network Traffic Management. In Proceedings of the Workshop on Network-Related Data Management (NRDM ’01), May 2001. [15] Mary L. Bailey, Burra Gopal, Michael A. Pagels, Larry L. Peterson, and Prasenjit Sarkar. PATHFINDER: A Pattern-Based Packet Classifier. In Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI ’94), pages 115–123, Monterey, CA, November 1994.
[16] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan, and Randy Katz. A Comparison of Mechanisms for Improving TCP Performance over Wireless Links. IEEE/ACM Transactions on Networking (TON), 5(6):756–769, December 1997. [17] Hari Balakrishnan, Hariharan S. Rahul, and Srinivasan Seshan. An Integrated Congestion Management Architecture for Internet Hosts. ACM SIGCOMM Computer Communication Review, 29(4):175–187, October 1999. [18] Hari Balakrishnan, Srinivasan Seshan, and Randy Katz. Improving Reliable Transport and Handoff Performance in Cellular Wireless Networks. ACM Wireless Networks, 1(4):469–481, December 1995. [19] Daniel Barbara. Requirements for Clustering Data Streams. ACM SIGKDD Explorations, 3(2):23–27, 2002. [20] Daniel Barbara and Ping Chen. Using the Fractal Dimension to Cluster Datasets. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 260–264, Boston, MA, 2000. [21] Abbie Barbir, Eric Burger, Robin Chen, Stephen McHenry, Hilarie Orman, and Reinaldo Penno. RFC 3752: Open Pluggable Edge Services (OPES) Use Cases and Deployment Scenarios. IETF, April 2004. [22] Abbie Barbir, Reinaldo Penno, Robin Chen, Markus Hofmann, and Hilarie Orman. RFC 3835: An Architecture for Open Pluggable Edge Services (OPES). IETF, August 2004. [23] Paul Barford and Mark Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’98), pages 151–160, July 1998. [24] Paul Barford and Mark Crovella. A Performance Evaluation of Hyper Text Transfer Protocols. ACM SIGMETRICS Performance Evaluation Review, 27(1):188–197, June 1999. [25] Paul Barford and Mark Crovella. Critical Path Analysis for TCP Transactions.
In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’00), pages 127–138, Stockholm, Sweden, 2000. [26] Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. Using Magpie for Request Extraction and Workload Modelling. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), pages 259–272, San Francisco, CA, December 2004. [27] Andrew Begel, Steven McCanne, and Susan L. Graham. BPF+: Exploiting Global Data-Flow Optimization in a Generalized Packet Filter Architecture. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’99), pages 123–134, August 1999. [28] Tim Berners-Lee, Roy Fielding, and Henrik Frystyk. RFC 1945: Hypertext Transfer Protocol – HTTP/1.0. IETF, May 1996. [29] Nina Bhatti, Anna Bouch, and Allan Kuchinsky. Integrating User-Perceived Quality into Web Server Design. In Proceedings of the 9th International World Wide Web Conference (WWW-9), pages 1–16, Amsterdam, Netherlands, May 2000. [30] Nina Bhatti and Rich Friedrich. Web Server Support for Tiered Services. IEEE Network, 13(5):64–71, September 1999. [31] Josep M. Blanquer, Antoni Batchelli, Klaus Schauser, and Rich Wolski. Quorum: Flexible Quality of Service for Internet Services. In Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI ’05), pages 159–174, Boston, MA, May 2005. [32] Herbert Bos, Willem de Bruijn, Mihai Cristea, Trung Nguyen, and Georgios Portokalidis. FFPF: Fairly Fast Packet Filters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), pages 347–363, San Francisco, CA, December 2004. [33] Robert Braden. RFC 1122: Requirements for Internet Hosts - Communication Layers. IETF, October 1989. [34] Robert Braden. RFC 1379: Extending TCP for Transactions – Concepts. IETF, November 1992. [35] Robert Braden.
RFC 1644: T/TCP – TCP Extensions for Transactions Function Specification. IETF, July 1994. [36] Browster. http://www.browster.org/. [37] R. Caceres, N. G. Duffield, A. Feldmann, J. Friedmann, A. Greenberg, R. Greer, T. Johnson, C. Kalmanek, B. Krishnamurthy, D. Lavelle, P. Mishra, K. K. Ramakrishnan, J. Rexford, F. True, and J. E. van der Merwe. Measurement and Analysis of IP Network Usage and Behavior. IEEE Communications Magazine, 38(5):144–151, May 2000. [38] Neal Cardwell, Stefan Savage, and Thomas Anderson. Modeling TCP Latency. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’00), pages 1742–1751, Tel-Aviv, Israel, March 2000. IEEE. [39] Surendar Chandra, Carla Ellis, and Amin Vahdat. Differentiated Multimedia Web Services using Quality Aware Transcoding. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’00), pages 961–969, Tel-Aviv, Israel, March 2000. [40] Jianjun Chen, David J. DeWitt, Feng Tian, and Yuan Wang. NiagaraCQ: a Scalable Continuous Query System for Internet Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 379–390, Dallas, TX, 2000. [41] Xiangping Chen and Prasant Mohapatra. Providing Differentiated Service from an Internet Server. In Proceedings of the IEEE 8th International Conference On Computer Communications and Networks, pages 214–217, Boston, MA, October 1999. [42] Xiangping Chen, Prasant Mohapatra, and Huamin Chen. An Admission Control Scheme for Predictable Server Response Time for Web Accesses. In Proceedings of the 10th International World Wide Web Conference, pages 545–554, Hong Kong, China, May 2001. [43] Ludmila Cherkasova and Peter Phaal. Session Based Admission Control: a Mechanism for Improving Performance of Commercial Web Sites. In Proceedings of the 7th IEEE/IFIP International Workshop on Quality of Service (IWQoS ’99), pages 226–235, London, UK, May 1999. [44] Tzi-Cker Chiueh and Prashant Pradhan.
High Performance IP Routing Table Lookup using CPU Caching. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’99), pages 1421–1428, Orlando, FL, March 1999.
[45] Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford. Efficient Algorithms for Predicting Requests to Web Servers. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’99), pages 284–293, Orlando, FL, March 1999.
[46] Ira Cohen, Jeff Chase, Moises Goldszmidt, Terence Kelly, and Julie Symons. Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), pages 231–244, San Francisco, CA, December 2004.
[47] Chuck Cranor, Theodore Johnson, Vladislav Shkapenyuk, and Oliver Spatscheck. Gigascope: A Stream Database for Network Applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data, San Diego, CA, May 2003.
[48] Chuck Cranor, Theodore Johnson, Vladislav Shkapenyuk, Oliver Spatscheck, and G. Yuan. Gigascope: A Fast and Flexible Network Monitor. Technical Report TD-5ABQY6, AT&T Labs–Research, Florham Park, NJ, May 2002.
[49] Mark Crovella, Robert Frangioso, and Mor Harchol-Balter. Connection Scheduling in Web Servers. In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS ’99), pages 243–254, Boulder, CO, October 1999.
[50] Peter Danzig. Ideas for Next Generation Content Delivery. Talk given at the 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV ’01), June 2001.
[51] Yixin Diao, Joseph L. Hellerstein, and Sujay Parekh. Optimizing Quality of Service Using Fuzzy Control. In Proceedings of the 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management: Management Technologies for E-Commerce and E-Business Applications (DSOM ’02), pages 42–53, Montreal, Canada, 2002.
[52] Jeff Dike. User-mode Linux. http://user-mode-linux.sourceforge.net/.
[53] Lars Eggert and John Heidemann. Application-Level Differentiated Services for Web Servers. World Wide Web Journal, 3(2):133–142, August 1999.
[54] Sameh Elnikety, Erich Nahum, John Tracey, and Willy Zwaenepoel. A Method for Transparent Admission Control and Request Scheduling in E-Commerce Web Sites. In Proceedings of the 13th International World Wide Web Conference (WWW2004), pages 276–286, New York, NY, May 2004.
[55] Jeremy Elson and Alberto Cerpa. RFC 3507: Internet Content Adaptation Protocol (ICAP). IETF, April 2003.
[56] Endace. http://www.Endace.com/.
[57] Dawson R. Engler and M. Frans Kaashoek. DPF: Fast, Flexible Message Demultiplexing Using Dynamic Code Generation. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’96), pages 53–59, Palo Alto, CA, August 1996.
[58] Ethereal. http://www.Ethereal.com/.
[59] Exodus. http://www.Exodus.com/.
[60] Anja Feldmann. BLT: Bi-Layer Tracing of HTTP and TCP/IP. In Proceedings of the 9th International World Wide Web Conference (WWW-9), pages 321–335, Amsterdam, Netherlands, May 2000.
[61] Roy Fielding, Jim Gettys, Jeffrey Mogul, Henrik Frystyk, and Tim Berners-Lee. RFC 2068: Hypertext Transfer Protocol – HTTP/1.1. IETF, January 1997.
[62] Doug H. Fisher. Iterative Optimization and Simplification of Hierarchical Clusterings. Journal of Artificial Intelligence Research, 4:147–180, 1996.
[63] Sally Floyd and Leslie Daigle. RFC 3238: IAB Architectural and Policy Considerations for Open Pluggable Edge Services. IETF, January 2002.
[64] The Apache Software Foundation. Apache. http://www.apache.org/.
[65] FreeBSD. http://www.FreeBSD.org/.
[66] Yun Fu, Ludmila Cherkasova, Wenting Tang, and Amin Vahdat. EtE: Passive End-to-End Internet Service Performance Monitoring. In Proceedings of the USENIX Annual Conference, pages 115–130, Monterey, CA, June 2002.
[67] Phillip B. Gibbons and Yossi Matias.
New Sampling-based Summary Statistics for Improving Approximate Query Answers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 331–342, Seattle, WA, June 1998.
[68] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 2715 North Charles Street, Baltimore, MD 21218-4319, 1996.
[69] Gomez Inc. http://www.Gomez.com/.
[70] Boston Consulting Group. http://www.BCG.com/.
[71] The Open Group Sockets API Extensions Working Group. Extended Sockets API (ES-API) 1.0. http://www.opengroup.org/icsc/sockets/, January 2005.
[72] The STREAM Group. STREAM: The Stanford Stream Data Manager. IEEE Data Engineering Bulletin, 26(1), March 2003.
[73] Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and Liadan O’Callaghan. Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, May/June 2003.
[74] Sudipto Guha, Nina Mishra, Rajeev Motwani, and Liadan O’Callaghan. Clustering Data Streams. In Proceedings of the Annual Symposium on Foundations of Computer Science, pages 359–366. IEEE Computer Society, November 2000.
[75] James Hall, Ian Pratt, and Ian Leslie. Non-Intrusive Estimation of Web Server Delays. In Proceedings of the 26th Annual IEEE Conference on Local Computer Networks (LCN), page 215, Tampa, FL, November 2001.
[76] Felix Hernandez-Campos, Kevin Jeffay, and F. Donelson Smith. Tracking the Evolution of Web Traffic: 1995–2003. In Proceedings of the 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS), pages 16–25, Orlando, FL, October 2003.
[77] Koen Holtman and Andrew Mutz. RFC 2295: Transparent Content Negotiation in HTTP. IETF, May 1998.
[78] Elbert Hu, Philippe Joubert, Richard King, Jason LaVoie, and John Tracey. Adaptive Fast Path Architecture. IBM Journal of Research and Development, 45(2):191–206, April 2001.
[79] IBM AlphaWorks.
Page Detailer. http://www.alphaworks.ibm.com/tech/pagedetailer.
[80] Van Jacobson, Craig Leres, and Steve McCanne. tcpdump. ftp://ftp.ee.lbl.gov/.
[81] Hani Jamjoom, Padmanabhan Pillai, and Kang G. Shin. Re-synchronization and Controllability of Bursty Service Requests. IEEE/ACM Transactions on Networking (TON), 12(4):582–594, August 2004.
[82] Hani Jamjoom and Kang G. Shin. Persistent Dropping: An Efficient Control of Traffic Aggregates. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’03), pages 287–298, Karlsruhe, Germany, August 2003.
[83] Philippe Joubert, Richard King, Rich Neves, Mark Russinovich, and John M. Tracey. High-Performance Memory-Based Web Servers: Kernel and User-Space Performance. In Proceedings of the USENIX Annual Conference, pages 175–188, Boston, MA, June 2001.
[84] M. Frans Kaashoek, Dawson Engler, Gregory R. Ganger, Hector Briceno, Russell Hunt, David Mazieres, Tom Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application Performance and Flexibility on Exokernel Systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP ’97), Saint-Malo, France, October 1997.
[85] Abhinav Kamra, Vishal Misra, and Erich Nahum. Yaksha: A Self-Tuning Controller for Managing the Performance of 3-Tiered Web Sites. In Proceedings of the 12th IEEE/IFIP International Workshop on Quality of Service (IWQoS ’04), pages 47–56, Montreal, Canada, June 2004.
[86] Vikram Kanodia and Edward W. Knightly. Multi-Class Latency-Bounded Web Services. In Proceedings of the 8th IEEE/IFIP International Workshop on Quality of Service (IWQoS ’00), pages 231–239, Pittsburgh, PA, June 2000.
[87] Terry Keeley. Thin, High Performance Computing over the Internet. In Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS ’00), page 407, San Francisco, CA, 2000.
[88] KeyNote. http://www.KeyNote.com/.
[89] Maria Kihl and Niklas Widell. Admission Control Schemes Guaranteeing Customer QoS in Commercial Web Sites. In Proceedings of the IFIP/IEEE Conference on Network Control and Engineering (NETCON ’02), pages 305–316, Paris, France, October 2002.
[90] Balachander Krishnamurthy and Jia Wang. On Network-Aware Clustering of Web Clients. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’00), pages 97–110, Stockholm, Sweden, August 2000.
[91] Sam C. M. Lee, John C. S. Lui, and David K. Y. Yau. Admission Control and Dynamic Adaptation for a Proportional-Delay DiffServ-Enabled Web Server. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’02), pages 172–182, Marina Del Rey, CA, June 2002.
[92] Kelvin Li and Sugih Jamin. A Measurement-Based Admission-Controlled Web Server. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’02), pages 651–659, New York, NY, June 2002.
[93] Xue Liu, Lui Sha, Yixin Diao, Steve Froehlich, Joseph L. Hellerstein, and Sujay Parekh. Online Response Time Optimization of Apache Web Server. In Proceedings of the IEEE/IFIP International Workshop on Quality of Service (IWQoS ’03), pages 461–478, 2003.
[94] Chenyang Lu, Tarek Abdelzaher, John Stankovic, and Sang H. Son. A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers. In Proceedings of the 7th IEEE Real-Time Technology and Applications Symposium, Taipei, Taiwan, June 2001.
[95] Samuel Madden, Mehul Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously Adaptive Continuous Queries over Streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 49–60, Madison, WI, 2002.
[96] Francisco Matias Cuenca-Acuna and Thu D. Nguyen. Self-Managing Federated Services.
In Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems (SRDS ’04), pages 240–250, Florianopolis, Brazil, October 2004.
[97] Steven McCanne and Van Jacobson. The BSD Packet Filter: A New Architecture for User-level Packet Capture. In Proceedings of the Winter USENIX Technical Conference, pages 259–270, 1993.
[98] Mercury Interactive. http://www-heva.MercuryInteractive.com/.
[99] Microsoft. http://www.MicroSoft.com/.
[100] Paul Mockapetris. RFC 1034: Domain Names – Concepts and Facilities. IETF, November 1987.
[101] Paul Mockapetris. RFC 1035: Domain Names – Implementation and Specification. IETF, November 1987.
[102] Jeffrey Mogul and K. K. Ramakrishnan. Eliminating Receive Livelock in an Interrupt-Driven Kernel. ACM Transactions on Computer Systems (TOCS), 15(3):217–252, 1997.
[103] Jeffrey Mogul, Richard Rashid, and Michael Accetta. The Packet Filter: An Efficient Mechanism for User-Level Network Code. In Proceedings of the 11th Symposium on Operating System Principles (SOSP ’87), pages 39–51, Austin, TX, November 1987.
[104] Multi-Threaded Routing Toolkit. http://www.mrtd.net/.
[105] MySQL. http://www.MySQL.com.
[106] Erich Nahum, Tsipora Barzilai, and Dilip Kandlur. Performance Issues in WWW Servers. ACM SIGMETRICS Performance Evaluation Review, 27(1):216–217, May 1999.
[107] Erich Nahum, Marcel Rosu, Srinivasan Seshan, and Jussara Almeida. The Effects of Wide-Area Conditions on WWW Server Performance. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’01), pages 257–267, Cambridge, MA, June 2001.
[108] NetBSD. http://www.NetBSD.org/.
[109] NetQoS. http://www.NetQoS.com/.
[110] Netscape. http://www.Netscape.com/.
[111] Trio Networks. Hypertrak. http://www.TrioNetworks.com/.
[112] Henrik Frystyk Nielsen, James Gettys, Anselm Baird-Smith, Eric Prud’hommeaux, Håkon Wium Lie, and Chris Lilley. Network Performance Effects of HTTP/1.1, CSS1, and PNG.
ACM SIGCOMM Computer Communication Review, 27(4):155–166, October 1997.
[113] Jakob Nielsen. The Need for Speed. http://www.useit.com/alertbox/9703a.html, March 1997.
[114] Liadan O’Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, and Rajeev Motwani. Streaming-Data Algorithms For High-Quality Clustering. In Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE ’02), page 685, San Jose, CA, March 2002.
[115] David Olshefski and Jason Nieh. Understanding the Management of Client Perceived Response Time. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’06), pages 240–251, Saint-Malo, France, June 2006.
[116] David Olshefski, Jason Nieh, and Dakshi Agrawal. Inferring Client Response Time at the Web Server. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’02), pages 160–171, Marina Del Rey, CA, June 2002.
[117] David Olshefski, Jason Nieh, and Dakshi Agrawal. Using Certes to Infer Client Response Time at the Web Server. ACM Transactions on Computer Systems (TOCS), 22(1):49–93, February 2004.
[118] David Olshefski, Jason Nieh, and Erich Nahum. ksniffer: Determining the Remote Client Perceived Response Time from Live Packet Streams. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), pages 333–346, San Francisco, CA, December 2004.
[119] OneStat. Microsoft’s Windows OS global market share is more than 97% according to OneStat.com. OneStat Press Release, September 2002.
[120] Packeteer. http://www.Packeteer.com/.
[121] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. Modeling TCP Throughput: A Simple Model and its Empirical Validation. ACM SIGCOMM Computer Communication Review, 28(4):303–314, October 1998.
[122] Jitendra Padhye and Sally Floyd. On Inferring TCP Behavior.
In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’01), pages 287–298, San Diego, CA, August 2001.
[123] Raju Pandey, J. Fritz Barnes, and Ronald Olsson. Supporting Quality of Service in HTTP Servers. In Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing, pages 247–256, Puerto Vallarta, Mexico, June 1998.
[124] Athanasios Papoulis and S. Unnikrishna Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Series in Electrical Engineering, 2001.
[125] Sujay Parekh, Neha Gandhi, Joe Hellerstein, Dawn Tilbury, T. S. Jayram, and Joe Bigus. Using Control Theory to Achieve Service Level Objectives in Performance Management. In Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, pages 841–854, Seattle, WA, May 2001.
[126] Vern Paxson and Mark Allman. RFC 2988: Computing TCP’s Retransmission Timer. IETF, November 2000.
[127] John Postel. RFC 793: Transmission Control Protocol. IETF, September 1981.
[128] Prashant Pradhan, Tzi-cker Chiueh, and Anindya Neogi. Aggregate TCP Congestion Control Using Multiple Network Probing. In Proceedings of the 20th IEEE International Conference on Distributed Computing Systems (ICDCS ’00), page 30, Taipei, Taiwan, April 2000.
[129] Prashant Pradhan, Renu Tewari, Sambit Sahu, Abhishek Chandra, and Prashant Shenoy. An Observation-based Approach Towards Self-managing Web Servers. In Proceedings of the 10th IEEE/IFIP International Workshop on Quality of Service (IWQoS ’02), pages 13–22, Miami, FL, May 2002.
[130] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition. Cambridge University Press, United Kingdom, 1992.
[131] Ramakrishnan Rajamony and Mootaz Elnozahy. Measuring Client-Perceived Response Times on the WWW.
In Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS ’01), San Francisco, CA, March 2001.
[132] Red Hat Inc. The Tux WWW server. http://people.redhat.com/~mingo/TUXpatches/.
[133] RedHat. http://www.RedHat.com/.
[134] IBM T.J. Watson Research. The Oceano Project. http://www.research.ibm.com/oceanoproject/.
[135] RIPE Network Coordination Centre. http://data.ris.ripe.net/.
[136] Luigi Rizzo. Dummynet: A Simple Approach to the Evaluation of Network Protocols. ACM SIGCOMM Computer Communication Review, 27(1):31–41, January 1997.
[137] Alex Rousskov. RFC 4037: Open Pluggable Edge Services (OPES) Callout Protocol (OCP) Core. IETF, March 2005.
[138] Alex Rousskov and Martin Stecher. RFC 4236: HTTP Adaptation with Open Pluggable Edge Services (OPES). IETF, November 2005.
[139] Alessandro Rubini. rshaper. http://www.linux.it/~rubini/software/index.html.
[140] Stefan Savage. Sting: A TCP-based Network Measurement Tool. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS ’99), pages 71–79, Boulder, CO, October 1999.
[141] Bianca Schroeder and Mor Harchol-Balter. Web Servers under Overload: How Scheduling can Help. ACM Transactions on Internet Technology, 6(1):20–52, 2006.
[142] Srinivasan Seshan, Mark Stemm, and Randy H. Katz. SPAND: Shared Passive Network Performance Discovery. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS ’97), Monterey, CA, December 1997.
[143] Srinivasan Seshan, Mark Stemm, and Randy H. Katz. Benefits of Transparent Content Negotiation in HTTP. In Proceedings of the IEEE Globecom 98 Internet Mini-Conference, Sydney, Australia, November 1998.
[144] Peter Sevcik. Customers Need Performance-Based Network ROI Analysis. Business Communications Review, pages 12–14, July 1999.
[145] Biplab Sikdar, Shivkumar Kalyanaraman, and Kenneth Vastola.
Analytic Models and Comparative Study of the Latency and Steady-State Throughput of TCP Tahoe, Reno and SACK. In Proceedings of IEEE GLOBECOM, pages 100–110, San Antonio, TX, November 2001.
[146] F. Donelson Smith, Felix Hernandez-Campos, Kevin Jeffay, and David Ott. What TCP/IP Protocol Headers can tell us about the Web. ACM SIGMETRICS Performance Evaluation Review, 29(1):245–256, June 2001.
[147] Freshwater Software. Web Server Monitor. http://www.freshtech.com/white paper/bookchapter/chapter.html.
[148] W. Richard Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, Massachusetts, 1994.
[149] StreamCheck. http://www.StreamCheck.com/.
[150] Nitin Thaper, Sudipto Guha, Piotr Indyk, and Nick Koudas. Dynamic Multidimensional Histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 428–439, 2002.
[151] The libpcap project. http://sourceforge.net/projects/libpcap/.
[152] The Transaction Processing Performance Council (TPC). http://www.TPC.org/tpcw.
[153] Tomcat 5.5. http://www.jakarta.apache.org/tomcat.
[154] TPC-W Java Implementation. http://mitglied.lycos.de/jankiefer/tpcw.
[155] University of Oregon Route Views Archive Project. http://archive.routeviews.org/.
[156] Bhuvan Urgaonkar and Prashant Shenoy. Cataclysm: Policing Extreme Overloads in Internet Applications. In Proceedings of the 14th International World Wide Web Conference (WWW2005), pages 740–749, Chiba, Japan, May 2005.
[157] Dinesh Verma. Policy-Based Networking: Architecture and Algorithms. New Riders, ISBN 1-57870-226-7, 2000.
[158] VMWare. http://www.VMWare.com/.
[159] Thiemo Voigt, Renu Tewari, Ashish Mehra, and Douglas Freimuth. Kernel Mechanisms for Service Differentiation in Overloaded Web Servers. In Proceedings of the USENIX Annual Conference, pages 189–202, Boston, MA, June 2001.
[160] WebStone. http://www.mindcraft.com/.
[161] Jianbin Wei and Cheng-Zhong Xu. eQoS: Provisioning of Client-Perceived End-to-End QoS Guarantees in Web Servers.
In Proceedings of the International Workshop on Quality of Service (IWQoS 2005), Passau, Germany, June 2005.
[162] Matt Welsh and David Culler. Adaptive Overload Control for Busy Internet Servers. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS ’03), Seattle, WA, 2003.
[163] Carey Williamson and Qian Wu. A Case for Context-Aware TCP/IP. ACM SIGMETRICS Performance Evaluation Review, 29(4):11–23, 2002.
[164] Maya Yajnik, Sue B. Moon, James F. Kurose, and Donald F. Towsley. Measurement and Modeling of the Temporal Dependence in Packet Loss. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM ’99), pages 345–352, Orlando, FL, March 1999.
[165] Masanobu Yuhara, Brian N. Bershad, Chris Maeda, and J. Eliot B. Moss. Efficient Packet Demultiplexing for Multiple Endpoints and Large Messages. In Proceedings of the Winter USENIX Technical Conference, pages 153–165, 1994.
[166] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 103–114, Montreal, Canada, 1997.
[167] Yin Zhang, Nick Duffield, Vern Paxson, and Scott Shenker. On the Constancy of Internet Path Properties. In Proceedings of the ACM SIGCOMM Internet Measurement Workshop (IMW ’01), San Francisco, CA, November 2001.
[168] Yin Zhang, Vern Paxson, and Scott Shenker. The Stationarity of Internet Path Properties: Routing, Loss and Throughput. ACIRI Technical Report, May 2000.