An Autonomous Management and Control System for Content
An Autonomous Management and Control for Content Delivery Networks

Bruce Zamaere
2009.04.24

Master of Science Thesis Report
Conducted at Ericsson Research AB

Industrial Supervisor: Victor Souza
Academic Supervisor: Johan Montelius

School of Information and Communication Technology (ICT)
Royal Institute of Technology (KTH)

Abstract

Content Distribution Networks (CDNs) have been around for almost a decade, but in recent years they have gained a phenomenal amount of interest and attention. Many factors are fuelling the adoption of CDN services. These include the development and deployment of new protocols, applications and services on the Internet that demand higher bandwidth and lower latency. This is further exacerbated by an ever growing demand for improved end-user experience and quality of service from end-users who are spread right across the globe.

Though CDNs present content providers with an array of solutions, the operation and management of the CDNs themselves on a global scale is inherently a complex and laborious task. CDN infrastructures are in a constant state of flux, because CDN operators constantly have to adjust and re-adjust to meet the needs of both content providers and the ever-changing demands of end-users.

This work presents a novel approach that employs a multi-agent system architecture to create an autonomous management and control framework for CDNs. A multi-agent system architecture that comprises stationary and mobile agents is described. A proof-of-concept prototype is realised that adheres to open industry standards for greater interoperability. Arguments are presented that show how this approach, through intelligence, collaboration and communication, is able to mitigate some of the major issues that make the operation of a CDN a complex task.
Finally, through empirical measurements, the performance of the proposed system is shown for a variety of test cases and use case scenarios.

Sammanfattning

<<The Swedish version of the abstract should be here>>

Table of Contents

Chapter 1 | Introduction
    1.1 Background
Chapter 2 | Theoretical Background
    2.1 The Structure of the Internet
    2.2 Content Delivery Overview
    2.3 Content Delivery Networks
    2.4 Agent Technology
Chapter 3 | Research Objectives
    3.1 Motivation
    3.2 Research Objectives
    3.3 Research Questions
    3.4 Scope
    3.5 Research Method
    3.6 Expected results
    3.7 Evaluation strategy
Chapter 4 | Related Work
Chapter 5 | The JADE Platform
    5.1 JADE Architecture
    5.2 JADE Agents
    5.3 JADE Service Agents
    5.4 Agent Communication
    5.5 Behaviours
    5.6 Agent mobility and cloning
    5.7 Agent Security
Chapter 6 | Proposed Architecture
    6.1 Overview
    6.2 CDN Implementation
    6.3 Functional overview
    6.4 Multi-Agent System Architecture
Chapter 7 | Implementation
    7.1 Architectural Overview
    7.2 Agent Roles
    7.3 Agent Reasoning and Intelligence
    7.4 Use Case Scenarios
Chapter 8 | Measurements
    8.1 Testbed overview
    8.2 Migration Time
    8.3 Creation Time
    8.4 Deployment Time
    8.5 Agent Population
    8.6 Ping-Pong Protocol
    8.7 Policy Decision Point
    8.8 Inference Engine
Chapter 9 | Discussion
    9.1 Time Dependent Measurements
    9.2 Agent Population
    9.3 Bundling Overhead
    9.4 Network Load
    9.5 Maintaining Consistency
    9.6 Resilience and Stability
    9.7 Matchmaking / Service Discovery
    9.8 Built-in Inference Engine
    9.9 All-Mobile Agent Architecture
    9.10 Global vs Local Optimisation
    9.11 Global Decisions vs Local Decisions
    9.12 Ping-Pong Protocol
    9.13 Security Considerations
    9.14 Degree of Autonomy
    9.15 Economic Considerations
Chapter 10 | Conclusions
Chapter 11 | Future work
    11.1 Custom agent platform
    11.2 Further measurements of performance
    11.3 Peer to peer location of platforms
    11.4 Adaptiveness
    11.5 Resource Consumption

List of Figures

Figure 2-1: Content Delivery over the Internet
Figure 2-2: Internet user traffic across all networks Source [20]
Figure 2-3: Infrastructure components of a CDN Source [16]
Figure 2-4: Classification of Autonomous Agents Source [33]
Figure 5-5: An Overview of the Main Architectural Elements of JADE
Figure 5-6: Source Code for a Simple JADE Agent
Figure 5-7: Screen shot of the JADE RMA
Figure 6-8: Overview of Media Agent
Figure 6-9: Format of Agent Names
Figure 6-10: Basic Structure of an Inference Rule
Figure 7-11: Proposed Architectural Overview
Figure 7-12: Screenshot of Controller Agent
Figure 7-13: Screenshot of the Client Application
Figure 7-14: Sample Rule for a Media Agent
Figure 7-15: Sequence Diagram - Agent Deployment
Figure 7-16: Sequence Diagram - Update Content Event
Figure 7-17: Sequence Diagram - Policy Based Decisions
Figure 7-18: Sequence Diagram - Client Requests Content
Figure 8-19: Migration Time of Agents

List of Tables

Table 1: Summary of Agent Middleware
Table 2: Sample Decision Matrix for Services Received
Table 3: Size of Ping-Pong Protocol Messages

Acronyms and Abbreviated Terms

ACL     Agent Communication Language
AID     Agent Identifier
AMS     Agent Management System
AOSE    Agent Oriented Software Engineering
AP      Agent Platform
API     Application Programming Interface
AS      Autonomous System
BGP     Border Gateway Protocol
CDN     Content Delivery Network
DARPA   Defense Advanced Research Projects Agency
DDoS    Distributed Denial of Service
DF      Directory Facilitator
EGP     Exterior Gateway Protocol
FIPA    Foundation for Intelligent Physical Agents
FTP     File Transfer Protocol
HD      High Definition
HTTP    Hyper Text Transfer Protocol
IGP     Interior Gateway Protocol
IIOP    Internet Inter-ORB Protocol
IOR     Interoperable Object Reference
IP      Internet Protocol
IPMS    Inter Platform Mobility Service
ISP     Internet Service Provider
J2SE    Java 2 Standard Edition
JADE    Java Agent DEvelopment Framework
JESS    Java Expert System Shell
JICP    JADE Internal Communication Protocol
JVM     Java Virtual Machine
MAS     Multi-Agent System
MPLS    Multi-Protocol Label Switching
MTP     Message Transport Protocol
OMG     Object Management Group
ORB     Object Request Broker
OS      Operating System
OSPF    Open Shortest Path First
P2P     Peer to Peer
PoP     Point of Presence
QoS     Quality of Service
RFC     Request for Comments
RIP     Routing Information Protocol
RMA     Remote Monitoring Agent
RMI     Remote Method Invocation
RPC     Remote Procedure Call
SLA     Service Level Agreement
SMA     Software Mobile Agents
SNMP    Simple Network Management Protocol
TCP     Transmission Control Protocol
URI     Uniform Resource Locator
W3C     World Wide Web Consortium
XACML   eXtensible Access Control Markup Language
XML     eXtensible Markup Language

Acknowledgements

I would like to thank both my supervisors: Victor Souza, my industrial supervisor, who spent endless hours debugging code with me, and Johan Montelius, my academic supervisor, for always pointing me in the right direction.
I would also like to thank Per Karlsson and the whole Packet Technologies (TLA) Research Group at Ericsson for giving me this opportunity to conduct interesting research, and for their support throughout the months I was at Ericsson Research. I would also like to thank Bemnet Merha Tesfaye, who was always available to bounce implementation ideas off of over a working lunch (that he, more often than not, paid for). I truly value your friendship.

Last but not least, I would like to thank my family for always being there for me, and for their patience and understanding throughout my studies. I dedicate this thesis to you all.

Chapter 1 | Introduction

“Three Rules of Work: Out of clutter find simplicity; From discord find harmony; In the middle of difficulty lies opportunity.” - Albert Einstein

This chapter introduces the general area of this thesis. The general problem is presented and the structure of the remaining sections of this manuscript is briefly outlined.

1.1 Background

The Internet has continued to experience exponential growth in recent years. According to [1], the number of users on the Internet is expected to increase by over 120 million between the years 2009 and 2010 alone. This trend is expected to continue as Internet penetration in developing countries increases. Another factor fuelling this increase is the number of mobile terminals that are being connected to the Internet as data rates become more affordable.

These trends are driving more and more businesses to start offering services online. Through the Internet, businesses have the potential to reach a global audience on a 24x7 basis at the mere cost of a single network connection. Today there is an unprecedented number of applications on the Internet that its original designers could not have conceived. The Internet now competes with technologies such as broadcast radio/television, CDs/DVDs and encyclopaedias.
One application that has been generating a tremendous amount of interest is multimedia content. Multimedia content accounts for the largest share of traffic on the Internet today. This position will be further accentuated with the advent of High Definition (HD) video content. Today websites like YouTube [2] see over 65,000 videos being uploaded and over 100 million videos being watched daily. With over 75 billion videos and well over 375 million unique visitors annually [3], sites like these generate a phenomenal amount of network traffic that must traverse many networks to reach a global audience.

This results in congestion at network bottlenecks. These typically exist at the origin servers (first mile), at the end-users (last mile), and at peering points, as most of the links in question are already running at full capacity. At the same time, viewers of such content are demanding ever higher quality and are less willing to wait for the content. Furthermore, ubiquitous access and broadband initiatives that try to solve the last mile problem are expected to solve only part of the problem. All of these factors have a negative impact on businesses, which stand to lose both revenue and market share due to poor performance of their web services.

Content providers, who are the originators of content, are not able to cope with these demands from their clients. Optimising the delivery of content to a global audience is not a trivial task. It requires a great deal of knowledge and expertise in network engineering. This is further exacerbated by unpredictable demands for content during special events. Consequently, content providers turn to Content Delivery Networks (CDNs), whose principal business is the delivery of third party content to end-users. CDNs emerged in the late 1990s and have steadily been growing in popularity. Today the CDN market size for video delivery is projected to be US$800 million [4].
CDNs optimise the delivery of content using a variety of techniques. Essentially, CDNs place content closer to the end-users to reduce delays and to ensure that the content does not have to traverse congested links numerous times. This results in faster access times and improves the perceived end-user experience. Reliability of services is also increased, since CDNs will typically mirror the content, thereby introducing redundancy.

There are, however, many technical challenges that must be addressed by a CDN provider. Most of these are related to the mechanisms used to transparently offer content on behalf of the content provider. There are also many issues related to operating and managing a geographically distributed CDN infrastructure with a high degree of precision. Furthermore, the CDN needs to provide detailed billing, logging, and reporting data, all of which must be done in real time. This typically involves the collection and processing of large volumes of data, a task that requires a significant amount of computational power. Finally, owning and maintaining such delivery infrastructures requires a huge outlay of both capital and operating expenditure.

Agent technology, however, has been demonstrated to be highly effective in telecommunication systems. Such systems, a category under which CDNs can be regarded as falling, tend to be extremely large and highly distributed. As is the case with CDNs, telecommunication systems place stringent requirements on operators. Such systems must be synchronised, managed and monitored to ensure five nines (99.999%) [5] availability. The competitive nature of the telecoms industry is also pushing operators to offer new and improved services to their customers ahead of their competitors. This is driving many operators to consider new and emerging technologies, including agent technology [6]. Software agents are highly effective in dealing with situations that have a significant degree of uncertainty.
Agents are also uniquely suited to complex and combinatorial tasks that involve numerous variables. Lange and Oshima in [7] and [8] outlined the following advantages to using agents:

• Reduce network loads
• Overcome network latency
• Encapsulate protocols
• Execute asynchronously and autonomously
• Adapt dynamically
• Naturally heterogeneous

These characteristics make agents well suited to overcoming the inherent limitations of Internet content delivery. Furthermore, these attributes are highly desirable for a Content Delivery Network. This work therefore explores the possibility of deploying a CDN infrastructure using agent technology. A Multi-Agent System architecture that addresses the design challenges of a CDN is presented.

The remainder of this work is organised as follows. Chapter 2 presents the theoretical background for understanding the rest of this manuscript. This chapter may be skipped by readers who are already well versed in CDN technology and agent technology. Chapter 3 outlines a detailed problem statement and the research objectives of this work. Chapter 4 presents some of the related work that has been published in the areas of CDNs and Multi-Agent Systems. Chapter 5 gives a brief overview of the JADE agent middleware. This chapter may also be skipped by readers who are already familiar with this agent development platform and middleware. Chapter 6 presents a proposed software mobile agent architecture that can be used to implement a CDN infrastructure. Chapter 7 provides implementation details of our proof-of-concept prototype. Chapter 8 presents empirical measurements made on our prototype. Discussions and conclusions are presented in Chapters 9 and 10 respectively, with future extensions of this work presented in Chapter 11.

Chapter 2 | Theoretical Background

“In theory there is no difference between theory and practice.
In practice there is.” - Yogi Berra

In this chapter we present background information that is necessary to comprehend the remainder of this work. We also present the current state of the art in CDN technology.

2.1 The Structure of the Internet

The Internet is a global network of networks that use the Internet Protocol Suite to communicate [9]. The Internet Protocol Suite, upon which the Internet relies for end-to-end communications, is commonly known as TCP/IP because it bundles two important protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP). This suite was a result of work carried out by the Defense Advanced Research Projects Agency (DARPA) in the early 1970s.

The version of the Internet Protocol currently in widespread deployment is version 4, or IPv4, as described by RFC 791 [14]. It has been in deployment since the early 1980s. Its design approach was to move complexity to the end terminals, thus eliminating the need for intelligence in the network itself. The Internet Protocol Suite offers a best effort service, meaning that it provides no guarantees of bit rate or delivery time but is instead heavily dependent on the traffic conditions prevailing at that particular instant. In this regard, no network resources are pre-allocated for applications using the protocol.

Networks are typically organised as Autonomous Systems (ASs). Traffic between ASs, whether it is peering or transiting, is generally regulated through Service Level Agreements (SLAs) and implemented via Exterior Gateway Protocols (EGPs) [10], of which the Border Gateway Protocol (BGP) [11] is the de facto standard. EGPs take into account the SLAs and business policies that exist between networks to make routing decisions for traffic. Conversely, Interior Gateway Protocols (IGPs) like OSPF [12] and RIP [13] are used within an AS and, unlike EGPs, they are able to take into account network topology and link bandwidth to make routing decisions.
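To make the IGP side of this contrast concrete, the following is a minimal sketch of a link-state style shortest-path computation in which link cost is derived from bandwidth. It is illustrative only and not taken from this work: the topology, the reference bandwidth constant and the function names are all assumptions chosen for the example.

```python
import heapq

# Assumed reference bandwidth (Mbps) used to turn bandwidth into a cost.
# This value is an illustrative choice, not something defined in this work.
REFERENCE_BANDWIDTH_MBPS = 100_000


def link_cost(bandwidth_mbps):
    """Cost falls as bandwidth rises, so faster links are preferred."""
    return max(1, REFERENCE_BANDWIDTH_MBPS // bandwidth_mbps)


def shortest_paths(topology, source):
    """Dijkstra over a dict {router: {neighbour: bandwidth_mbps}}."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, bandwidth in topology[node].items():
            new_cost = cost + link_cost(bandwidth)
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                prev[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the chosen path by walking predecessors back to the source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical four-router AS: A has a slow direct link to D and a fast
# two-hop route via C.
TOPOLOGY = {
    "A": {"B": 1_000, "C": 10_000, "D": 100},
    "B": {"A": 1_000, "D": 1_000},
    "C": {"A": 10_000, "D": 10_000},
    "D": {"A": 100, "B": 1_000, "C": 10_000},
}

dist, prev = shortest_paths(TOPOLOGY, "A")
print(path_to(prev, "A", "D"), dist["D"])
```

In this sketch the two-hop route over the fast links is preferred to the direct but slow link. An EGP such as BGP would instead select routes according to the business policies agreed between the networks, which this example does not attempt to model.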
Internet Service Providers (ISPs) often limit the amount of traffic that they accept from their peers for purely business reasons. Furthermore, ISPs disallow traffic from transiting their network unless a business arrangement has been made a priori. Lastly, EGPs do not take into account prevailing network traffic conditions. All these factors lead to an inevitable conclusion: the path taken by traffic on the Internet is often sub-optimal.

Network congestion has become the norm for Internet users today [15]. Congestion causes packet delays and packet loss on a network, which leads to deteriorating levels of Quality of Service (QoS). One of the principal causes of congestion can be attributed to deliberate decisions made by network owners in an effort to optimise their link utilisation [16]. Network links tend to be costly, so it makes good business sense for network owners to pay only for the bandwidth capacity that they need. Another cause can be attributed to new network protocols that aggressively use packet retransmissions to ensure reliable delivery of data [17]. Rich media content, such as High Definition (HD) content, further complicates this situation by placing huge demands on the existing network infrastructures [16]. The resulting effects include queuing delay, high packet loss, or blocking of new connections.

The sad reality is that provisioning additional bandwidth does not always remedy this situation. This is because some applications in use today are designed to simply increase the demands they place on network resources when additional bandwidth becomes available [18]. Furthermore, sudden changes in network traffic patterns cannot be accommodated without altering existing SLAs [16] between network providers.
Lastly, although protocols such as Multi Protocol Label Switching (MPLS) exist to provision end-to-end QoS, deployment of such protocols is often restricted to a single Autonomous System [19].

2.2 Content Delivery Overview

Content delivery over the Internet involves a content provider (the producer of the content) and clients (the consumers of the content), as depicted in Figure 2-1. Delivery over the Internet inevitably also involves the Internet Service Providers (ISPs) whose networks are traversed by the traffic as it moves between the content provider and the client. The first mile represents the links that exist between a content provider’s network and the Internet backbone. Typically, data must traverse numerous networks to reach the end-users. The hand over of data packets from one network to another occurs at peering points or Internet exchange points. Finally, the last mile links are the ISP links that service the end-users. These range from dial-up modems to high speed optical fibre infrastructures.

Figure 2-1: Content Delivery over the Internet

Clients demand high quality content delivered at high speed, which they perceive as responsiveness. The online behaviour of clients suggests that they tend to abandon sites that take time to load [16]. With the commercialisation of the Internet, content providers that do not adequately address these client demands face loss of revenue and market share. This motivates content providers to consistently deliver a rich and engaging experience.

As the number of clients demanding the content increases, the content provider needs to invest in higher capacity servers to accommodate the surge of new requests. Content providers also need to match the increase in infrastructure with personnel who are competent enough to manage it. This becomes challenging for a content provider whose principal business is the creation and provisioning of content.
To this end, it is neither strategic nor cost effective for them to invest in such content delivery infrastructures.

The structure of the Internet adds an additional layer of complexity to this problem. As described in Section 2.1, the Internet is a network of networks. This makes the Internet highly distributed. A bird's-eye view of the Internet looks like a graph of networks with no clearly visible centre [10][20]. Rather, there is one big edge where users are located. This means that the greater the network distance between the content provider and the client (i.e. the number of Autonomous Systems that need to be traversed to reach the client), the more unpredictable the performance will be. As a result, content providers need to co-locate content at sites that are closer to the users in order to optimise the delivery of their content.

This becomes a challenge because Internet users follow a long tail distribution [21] globally, as shown in Figure 2-2. According to Akamai Technologies [22], one of the largest CDNs in operation today, no single ISP network sees more than 5% of the total Internet traffic. Akamai further estimates that reaching 50% of subscribers requires covering about 30 network providers, and that reaching 95% requires covering over 15,000 different networks [20].

To further exacerbate this problem, a phenomenon commonly known as flash crowds, where a significant number of users suddenly become interested in a particular website, can very quickly overwhelm the resources of the servers that are offering content. In order to meet these seemingly insurmountable challenges, content providers turn to Content Delivery Networks (CDNs) to assist in the delivery of their content to a global audience.
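The effect of a long tail on coverage can be illustrated with a short calculation. The sketch below counts how many of the largest networks must be covered to reach a given share of traffic. The Zipf-like distribution, its exponent and the number of networks are purely hypothetical assumptions for illustration; they are not Akamai's data and are not expected to reproduce the figures quoted above.

```python
def networks_needed(shares, target):
    """Count how many of the largest networks are needed to reach `target`.

    `shares` and `target` must use the same unit (fractions, percent, ...).
    """
    covered = 0
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        covered += share
        if covered >= target:
            return count
    raise ValueError("target exceeds the total of all shares")


def zipf_shares(n, exponent=1.0):
    """Hypothetical long-tail traffic shares for n networks, summing to 1."""
    weights = [1 / rank ** exponent for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed population of 1,000 networks with a Zipf-like traffic split.
shares = zipf_shares(1_000)
half = networks_needed(shares, 0.50)
most = networks_needed(shares, 0.95)
print(half, most)
```

Under these assumptions, moving from half of the traffic to nearly all of it requires a disproportionately larger number of networks, which is the qualitative point made by the long tail argument.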
Figure 2-2: Internet user traffic across all networks Source [20]

2.3 Content Delivery Networks

Content Delivery Networks (CDNs) are commercial entities that host third party content, mirror or replicate this content over their global network infrastructure, and transparently redirect users to the best replica sites [16]. This allows them to significantly reduce the delays experienced by end-users who are geographically dispersed. It also makes it possible to dilute the effects of flash crowds and Distributed Denial of Service (DDoS) attacks. Though there are no hard and fast rules that determine what does and does not constitute a CDN infrastructure, the components outlined in Figure 2-3 are universally accepted.

Figure 2-3: Infrastructure components of a CDN Source [16]

Origin Server: These are the content provider’s servers that originate content, which is either pushed into the CDN’s network or periodically pulled by the CDN’s network.

Clients: These are the consumers of the content that is provisioned by the content provider.

Surrogate Servers: These are the servers that exist within the CDN infrastructure and replicate or mirror all or part of the content that is available on the content provider’s Origin Server.

Distribution Infrastructure: This is the infrastructure inside the CDN for distributing content obtained from the content provider’s Origin Server to the CDN’s Surrogate Servers, such that relevant content is as close to the clients as is feasible.

Request Routing Infrastructure: This is the infrastructure inside the CDN that routes client requests for content to the closest Surrogate Server that can fulfil the QoS requirements of the request.

Accounting Infrastructure: This is the infrastructure within the CDN that accounts for all requests that the CDN fulfils on behalf of the content provider, for both optimisation and billing purposes.
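The roles of the Request Routing and Accounting Infrastructure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism used by any particular CDN or by the prototype in this work: closeness is approximated by measured latency, the QoS requirement is reduced to a latency bound and a load threshold, and all class, field and server names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    name: str
    latency_ms: float  # assumed: measured latency between surrogate and client
    load: float        # assumed: fraction of capacity in use, 0.0 to 1.0
    objects: set = field(default_factory=set)  # content replicated here


def route_request(surrogates, obj, max_latency_ms, max_load=0.9):
    """Pick the lowest-latency surrogate that holds `obj` and meets the QoS bound.

    Returns None when no surrogate qualifies; a caller could then fall back
    to the Origin Server.
    """
    candidates = [
        s for s in surrogates
        if obj in s.objects and s.latency_ms <= max_latency_ms and s.load < max_load
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.latency_ms)


class Accounting:
    """Accumulates bytes served per content provider for billing and statistics."""

    def __init__(self):
        self.bytes_served = {}

    def record(self, provider, size_bytes):
        self.bytes_served[provider] = self.bytes_served.get(provider, 0) + size_bytes


# Hypothetical surrogate population as seen from one client.
SURROGATES = [
    Surrogate("stockholm", 12.0, 0.95, {"video.mp4"}),
    Surrogate("frankfurt", 25.0, 0.40, {"video.mp4"}),
    Surrogate("tokyo", 180.0, 0.10, {"video.mp4", "clip.mp4"}),
]

chosen = route_request(SURROGATES, "video.mp4", max_latency_ms=50.0)
accounting = Accounting()
if chosen is not None:
    accounting.record("provider-a", 1_000_000)
print(chosen.name if chosen else "origin", accounting.bytes_served)
```

Here the nearest surrogate is skipped because it is overloaded, so the request goes to the next closest one that still satisfies the latency bound. A production request router would weigh many more signals, such as link cost and content popularity.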
With respect to Figure 2-3 above, the CDN infrastructure operates as follows:

1. The content provider's Origin Server delegates its Uniform Resource Locator (URL) name space to the CDN's Request Routing Infrastructure for objects that will be provisioned by the CDN.
2. The content provider's Origin Server either pushes content to, or content is pulled by, the CDN's Distribution Infrastructure.
3. The CDN's Distribution Infrastructure mirrors or replicates the content to the Surrogate Servers based on some initial configuration defined by the content provider and on replica placement mechanisms.
4. The client requests content from the Origin Server. Because of the URL name space delegation, this request is redirected to the Request Routing Infrastructure.
5. The Request Routing Infrastructure routes the request to a suitable Surrogate Server that can fulfil the QoS requirements of the request.
6. The Surrogate Server delivers the content to the client and sends accounting information to the Accounting Infrastructure.
7. The Accounting Infrastructure collects all accounting information and processes the content access records for input to the billing system. Statistics are fed back to the Request Routing Infrastructure for the optimisation of future requests.
8. The billing system uses the detailed content records to work out how much shall be charged to or paid by each content provider.

2.3.1 Classification of Content Delivery

2.3.1.1 Distributed Delivery

This is the more traditional type of CDN in use today. Typically the CDN invests in a large infrastructure and deploys it globally. Akamai Technologies [22][23], for example, builds its infrastructure over existing links using unreliable servers, but places the servers in such a way as to avoid traversing congested network links when serving clients. Akamai places its servers inside ISP networks.
Another approach is to deploy a high-capacity optical network that is independent of the Internet and use it to deliver content to geographically spread Points of Presence (PoPs), as in the case of LimeLight Networks [24]. LimeLight places its servers at peering points and exchange points on the Internet. Others, like MicroSpace [25], avoid terrestrial links altogether and use satellite links to deliver content. The CDN must always have additional resources (servers, bandwidth, etc.) available to accommodate sudden surges in demand, as provisioning these servers can take weeks or even months. All in all, deploying a CDN infrastructure in this fashion requires a huge capital expenditure, which translates to high costs for CDN services.

2.3.1.2 Peer to Peer Content Delivery

In order to avoid these costs, companies are turning to peer-to-peer (P2P) systems for the delivery of content. Using the resources that clients make available to the system, P2P systems scale extremely well to accommodate a large number of users. In fact, it is often argued that the more users there are on the system, the better the system scales. This is unlike a client/server model, where the server's response time can degrade as the number of clients requesting resources increases. Services such as BitTorrent DNA [26] offer this form of delivery. However, using P2P systems for content delivery also has its drawbacks. The client/server approach outperforms P2P systems in terms of end-to-end delivery. P2P systems also exhibit high start-up delays and are susceptible to churn. Lastly, P2P systems lack the efficient management and control platform that is a necessary feature of global CDN infrastructures.

2.3.1.3 Hybrid Content Delivery

Today companies like Velocix [27] have realized that P2P systems alone cannot compete with the value proposition offered by CDNs [28]. To address this problem, a traditional CDN approach is mixed with P2P systems.
The CDN therefore provides reliability and the P2P systems provide the scalability required in the system.

2.3.2 Benefits and Drawbacks of CDN Services

The benefits of CDNs to both clients and content providers are numerous.

Users: CDNs improve the end-user's online experience by increasing the perceived speed and responsiveness of websites and reducing the download time of web objects [29]. This is all done transparently, which means that users do not need any special configuration or software to reap these benefits.

Content Providers: CDNs reduce the load placed on a content provider's origin servers [18]. This in turn reduces the hardware that content providers need in order to deliver content to a global audience, which translates into a reduction in capital expenditure and in the personnel needed to manage such infrastructures [16]. Furthermore, additional resources can immediately be provisioned to cater for special events or to deal with sudden increases in demand from clients, also known as flash crowds [30][13].

Network Operators: CDNs help reduce network congestion and significantly reduce the traffic flowing over ISP networks [16][18]. This further reduces the amount of bandwidth going to peering points, which translates into cost savings [18].

Despite these numerous benefits, CDNs and CDN services still have a number of drawbacks. These are discussed below:

High Cost of CDN Services: CDNs need to maintain a sophisticated high-capacity network that spans a tremendously large geographical area. This network needs to have adequate capacity to accommodate sudden and unpredictable growth in network traffic. As a result the network links in question are often underutilised. This leads to a huge operating expenditure overhead that is passed down to CDN customers.

Complexity of CDN Networks: The sheer magnitude and scale of a CDN's network is daunting.
What is more challenging are the intricate and interdependent systems that make the CDN function optimally. This makes managing a CDN a relatively complex task. Furthermore, requirements such as ensuring that changes in content propagate throughout the entire CDN infrastructure, or ensuring that real-time logging and billing of services is carried out, can be overwhelming tasks even for the smallest CDN infrastructures.

2.3.3 CDN Problem Models and Solutions

Operating a CDN is a complex and expensive activity. A successful CDN infrastructure must implement a number of essential mechanisms. These are: replica server placement, content placement, content update & management, active measurements, and request routing & redirection.

Replica Server Placement Mechanism: The Replica Server Placement Mechanism attempts to identify the optimal place to locate a surrogate server [18]. The aim is to optimize server-to-client communication by ensuring that data does not need to traverse already congested links [16][18]. The goal of this mechanism is therefore to place a server as close to the users as possible, i.e. the least number of hops away.

Content Placement Mechanism: The content placement mechanism attempts to establish how to populate the replica servers with content in the most efficient way [16]. This may depend on factors such as the type of content, disk space and server utilisation, among others.

Content Update & Management Mechanism: The content update and management mechanism exists to ensure that content consistency is maintained in the CDN. This involves keeping content up to date and also ensuring that stale content expires in the surrogate servers. Furthermore, when the content provider removes content from the CDN infrastructure, the change needs to ripple through all of the surrogates that hold copies of the content, to ensure that the content is no longer served to requesting clients [16].
Active Measurements Mechanism: An Active Measurements Mechanism is required by a CDN infrastructure to present an almost real-time view of the network and of content usage [16]. This allows the CDN to make informed decisions about which servers to use to fulfil client requests, and also allows it to optimize its network by provisioning new servers to address imminent hot spots. One of the big challenges that taking active measurements introduces is the processing of large volumes of data in the form of log files [20].

Request Routing & Redirection Mechanism: The request routing & redirection mechanism attempts to choose the best replica server to fulfil the QoS requirements of the content. In some cases this means merely selecting the closest server, but in more advanced scenarios it could involve other parameters in order to provide load balancing as well as access control features such as geo-blocking [16].

2.4 Agent Technology

Agent technology is not new: it has existed for over a decade, yet its use is still predominantly in academia. Although agent technology is widely used in the area of network management, it is still regarded by many as a solution looking for a problem. The agent paradigm applies key concepts from artificial intelligence to distributed object technology [31]. An agent is an entity that assists people and acts on their behalf. Agents function by allowing people to delegate work to them [32]. Since the focus of this work is on software entities, we will only discuss the field of task-specific agents as described in [33] and illustrated in Figure 2-4 below.

Figure 2-4: Classification of Autonomous Agents. Source: [33]

We further classify agents into stationary and mobile agents as defined in [32] as follows:

Stationary Agent: “A stationary agent executes only on the system where it begins execution.
If it needs information that is not on that system, or needs to interact with an agent on a different system, it typically uses a communication mechanism such as remote procedure calling (RPC)” [32].

Mobile Agent: “A mobile agent is not bound to the system where it begins execution. It has the unique ability to transport itself from one system in a network to another. The ability to travel, allows a mobile agent to move to a system that contains an object with which the agent wants to interact, and then to take advantage of being in the same host or network as the object” [32].

A mobile agent, therefore, is a program that can migrate from host to host in a heterogeneous network and perform specific tasks. Finally, we appeal to the definition of intelligent agents given in [32]: “Intelligent agents are software entities that carry out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in doing so, employ some knowledge or representations of the user’s goals or desires”.

Agents therefore have certain special properties that distinguish them from standard programs. These include:

Goal oriented: an agent is capable of handling a task to meet its desired goal.

Agency: the degree to which an agent represents the user, application, computer system, etc.

Learning: agents can learn from their conditions, analyze trends and develop a certain degree of reasoning that enables them to take intelligent decisions.

Autonomy: agents can operate independently, without the direct intervention of humans or other entities. Agents have control over the actions or tasks they perform and over their internal state.

Mobility: the ability of an agent to move between systems in a network. This raises concerns of security and cost.

Communicative: agents can interact with other agents and (possibly) humans using some kind of agent communication language or protocol.
Collaborative: agents are capable of cooperating with other agents to achieve predetermined goals and objectives.

Intelligent agents have been the focus of researchers for many years; however, widespread adoption to date has been inhibited by security concerns[reference]. This is further underscored by the fact that the most prolific variant of software mobile agents today is the computer virus.

2.4.1 Agent Middleware

Over the years a large number of agent middleware platforms have been proposed. Each has attempted to solve some of the basic problems associated with the development of multi-agent systems. Rather than expecting developers to build the core functionality of their agents from scratch, these platforms provide an existing framework that already implements it. This allows developers to build their systems with greater ease and speed, saving much time through reusable components. We reviewed literature on a number of agent middleware platforms, including Aglets [7], JADE [34], Grasshopper [35], Cougaar [36], Concordia [37], Caffeine [38] and Voyager [38]. Due to the lack of documentation on most of these platforms, this work focuses on Aglets and JADE. These are discussed in greater detail below.

2.4.1.1 Aglets

Aglets is a Java-based agent API developed by Danny B. Lange at IBM's Tokyo Labs [7]. Aglets built on the success of Java Applets and on Java's security and platform independence to provide an agent framework. Aglets is a MASIF-compliant agent platform implementation. An aglet is a Java object that can move from host to host on the Internet. Aglets exist within an execution context, are uniquely named and are globally addressable. Aglets interact with other aglets through a proxy aglet. Aglets are capable of migrating to other execution contexts, taking their state with them. The execution context is a type of Java Virtual Machine (JVM) that acts as a host for aglets.
This provides facilities to aglets such as naming, dispatching and cloning. Execution contexts coordinate the movement of aglets to other execution contexts and implement security policies that manage the life cycle of aglets. Two execution context models were proposed by the developers of aglets, called the Tahiti model and the Fiji model. The Tahiti model provides a server-based execution context and a shared context for communication, whereas the Fiji model provides an execution context within a browser.

Aglets were designed to make it easy for Java programmers to develop agents. This is reflected in the simplicity and extensibility of the API. Furthermore, by mirroring the Applet model, aglets are platform independent and can be run on any agent host that supports the Java Aglet API (J-AAPI) [7]. Security in aglets, just as in Applets, is provided in the form of 'trusted' and 'untrusted' aglets. Aglets can therefore be signed and are trusted within the same domain. Access to resources by aglets, or to an aglet's properties, is then determined by the trust relationship that exists.

During the late 1990s Aglets had a thriving development community that included commercial, military and academic users. However, at the time of this writing we were not aware of a user community that was still developing with Aglets. Most of the information we found was outdated.

2.4.1.2 JADE

The Java Agent DEvelopment Framework (JADE) is a Java-based agent middleware developed by the Telecom Italia Lab. JADE complies with the FIPA specification and provides a set of graphical tools that support the debugging and deployment phases.
The agent platform can be distributed across machines (which do not even need to share the same OS) and the configuration can be controlled via a remote GUI.

• Middleware for the development, deployment and debugging of multi-agent systems
• It is a distributed system

2.4.1.3 Summary of Agent Middleware

Table 1: Summary of Agent Middleware

Middleware | Languages supported | License | Compliance | Activity | Documentation
JADE | Java | Open Source (LGPL) | FIPA | Active online community; many recent papers use the platform. | Book, Papers
Aglets | Java | Open Source (?) | MASIF | No papers published in the last 4-5 years were found. | API, Book, Papers

2.4.2 Standardisation

The Object Management Group (OMG) and the Foundation for Intelligent Physical Agents (FIPA) are the two major organisations that provide standardisation for agent technologies.

Object Management Group: OMG developed the Mobile Agents Standard Interface Facility (MASIF) [39], which caters for both static and mobile agents. It addresses agent management, agent tracking, agent security, agent transport, naming of agents and agent systems, agent system type and location syntax, and the consideration and integration of Common Object Request Broker Architecture (CORBA) services.

Foundation for Intelligent Physical Agents: The Foundation for Intelligent Physical Agents (FIPA) [40] aims to set standards for agent management, agent naming and locating, and agent-to-agent interaction through the definition of a standard communication language based on speech act theory (KQML and KIF). The Knowledge Query Manipulation Language (KQML) uses a set of predefined message types such as ask, tell, register and reply for agent communications. The Knowledge Interchange Format (KIF) is used for knowledge representation in KQML. FIPA also works on agent-to-software interaction, defining ways in which agents can be linked to legacy software and other back-end processing systems.
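The idea of communicating through a small set of predefined message types can be sketched as follows. This is a plain Java illustration of dispatching on a message type; it does not use KQML syntax, KIF or any FIPA library. The four message types are the ones named above, while the class names, the "key=value" content format and the reply strings are assumptions made purely for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: dispatching on predefined message types. */
class PerformativeSketch {

    /** Message types named in the text; not the full KQML set. */
    enum Performative { ASK, TELL, REGISTER, REPLY }

    static final class Message {
        final Performative type;
        final String sender;
        final String content;  // a real system would carry a KIF expression here

        Message(Performative type, String sender, String content) {
            this.type = type;
            this.sender = sender;
            this.content = content;
        }
    }

    /** Toy agent that stores facts told by registered agents and answers queries. */
    static final class KnowledgeAgent {
        private final Set<String> registered = new HashSet<>();
        private final Map<String, String> facts = new HashMap<>();

        Message handle(Message m) {
            switch (m.type) {
                case REGISTER:
                    registered.add(m.sender);
                    return reply("registered");
                case TELL:
                    if (!registered.contains(m.sender)) {
                        return reply("not registered");
                    }
                    String[] kv = m.content.split("=", 2);  // hypothetical "key=value" content
                    if (kv.length != 2) {
                        return reply("malformed");
                    }
                    facts.put(kv[0], kv[1]);
                    return reply("ok");
                case ASK:
                    return reply(facts.getOrDefault(m.content, "unknown"));
                default:
                    return reply("ignored");
            }
        }

        private Message reply(String content) {
            return new Message(Performative.REPLY, "knowledge-agent", content);
        }
    }
}
```

The point of the sketch is that the receiver decides what to do from the message type alone, so agents written independently can interoperate as long as they agree on the set of types and on the content representation.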
Chapter 3 | Research Objectives

“A set of definite objectives must be established if we are to accomplish anything in a big way.” – John McDonalds.

This chapter presents the research objectives of this work. Research questions that arise are also detailed. The scope of this work is discussed and the relevant research methodologies that will be employed are described. Finally, the expected results and evaluation strategy are presented.

3.1 Motivation

The delivery of digital media content over the Internet is an issue of paramount importance. This is underscored by the growth of the market for Content Delivery Network (CDN) services [41] in recent years. A CDN's value proposition is to improve performance, scalability and cost efficiency in the delivery of digital media content to end-users. This is achieved by delivering the content as close to the users' network as possible. While a number of CDNs have been in existence for over a decade, there is an increasing number of new entrants in the market offering new ways of delivering content over the Internet.

CDNs are typically built using proprietary technology that is either patented or kept secret. This makes it especially difficult for different CDN vendors to cooperate or collaborate. CDN providers roll out large network infrastructures that are both costly and complex to manage and maintain. They are costly because such structures are deployed globally with over-provisioned resources in order to cater for unpredictable demand. They are complex because they are always in a state of flux and need to be adjusted and readjusted depending on user patterns and prevailing network conditions, among other factors. The associated costs are passed on to the content providers, who pay exorbitant rates for CDN services. However, the challenges of managing a CDN infrastructure still remain.
This work proposes an autonomous management and control platform using Software Mobile Agents (SMAs) that addresses the challenges associated with managing a global CDN infrastructure. SMAs are expected to be bundled with media assets and are granted the intelligence to make decisions.

3.2 Research Objectives

The primary research objective is to develop an autonomous CDN management framework using Software Mobile Agents. Specifically, this work seeks to achieve the following research objectives:

RO-1. To study and document the current state-of-the-art technology used in Content Delivery Networks.
RO-2. To investigate the use of Software Mobile Agents to manage Content Delivery Networks.
RO-3. To investigate and document the requirements that a Software Mobile Agent based CDN management framework would impose on a network, both for the migration of agents and for operational control, i.e. maintaining consistency.
RO-4. To investigate, through experimentation and/or the development of analytical models, how our proposed CDN framework is influenced by network metrics such as bandwidth, jitter, packet loss, node failure, flash crowds and network latency, among other factors.
RO-5. To quantify the overall performance of the proposed CDN framework as the number of users, caches and media assets, the popularity of assets and the asset size increase, and to measure the responsiveness of the CDN framework.

3.3 Research Questions

Based on the research objectives listed above, we have formulated the following research questions that this work must attempt to answer. These research questions will assist us in focusing our efforts throughout the project. The research questions are as follows:

Questions related to Research Objective RO-2

RQ-2.1. Is it technically feasible to develop and implement an autonomous Content Delivery Network using Mobile Agents?
RQ-2.2.
Are Software Mobile Agents well suited to solving the problems associated with deploying and managing a content distribution network?
RQ-2.3. What benefits can Content Providers reap by utilizing Software Mobile Agents?

Questions related to Research Objective RO-3

RQ-3.1. Is the overhead from bundling content inside a mobile agent justified? Does it affect the performance of the system for relatively small media assets? What is the recommended minimum size of content that should be used with mobile agents?
RQ-3.2. Does using the proposed CDN framework increase or reduce network load and/or network traffic?
RQ-3.3. Does our CDN framework reduce or increase the amount of data collected throughout the CDN infrastructure for optimisation and for billing purposes?
RQ-3.4. What are the security implications and how would these be addressed?

Questions related to Research Objective RO-4

RQ-4.1. Does our CDN framework demonstrate stability, resilience and fault tolerance?

Questions related to Research Objective RO-5

RQ-5.1. What is the migration time of objects?
RQ-5.2. What is the replication time of objects?
RQ-5.3. Does our CDN framework make it easier for the CDN to respond to localized events?

3.4 Scope

Content Distribution Networks encompass thousands of content servers that are globally distributed, together with a large number of clients and media assets. It is impractical for us to attempt to roll out an infrastructure of such scale and magnitude for this work, due to both time and resource limitations. This work will therefore focus on the key mechanisms that such networks can utilise. Though modern CDNs are capable of delivering multiple types of media in a variety of formats, this work will focus on the delivery of video content.
A study into the use of agents in heterogeneous network environments that may span organisational boundaries would be incomplete without considering the security implications of such a system. However, for our purposes we will have to rely primarily on the tools that are currently available. As a result, security concerns will not be addressed in the design and development of the proposed prototype. A detailed discussion of some of the major security issues that can be identified will nevertheless be presented.

3.5 Research Method

Software Engineering is a synthetic and multi-disciplinary field that often involves both technical and social factors [42]. This makes the field complex, as many of the problem areas can be addressed by more than one approach. This inevitably results in different solutions, none of which can be regarded as more correct than the others. Such being the case, it is vitally important for us to adopt a formal approach and methodology for our problem, in the interest of adding credibility to our proposed solution.

The research methodology adopted by this work will most likely involve Constructive and ‘Problem-solving’ Research. In this regard, the author intends to address a real-world problem by bringing all his intellectual resources to bear on the proposed solution. A prototype will be constructed and empirically tested to ascertain its performance. This work will endeavour to utilise proven software development methodologies, tools and techniques to realise the software artefacts.

3.6 Expected results

One of the unique contributions expected to emerge from this work is a prototype framework for managing CDNs using software mobile agents. This prototype is expected to play two important roles. Firstly, it will be a proof-of-concept that demonstrates the workability of our novel approach.
At the same time, by implementing such a prototype the author will inevitably deepen his understanding of the implications of using agents to develop and deploy a CDN infrastructure. Secondly, the prototype will be a proof-of-performance that demonstrates the performance of the proposed prototype relative to traditional CDN technology through the creation of a test bed for empirical measurements.

3.7 Evaluation strategy

CDNs are relatively young and are still evolving. This means that it is rather difficult to refer to an industry standard for how they should be implemented. Existing CDNs tend to utilise proprietary solutions to deal with CDN infrastructure issues. This makes it extremely difficult to identify a benchmark against which we can compare the performance of our framework [29]. With this in mind, our evaluation strategy will be to compare the proposed CDN infrastructure with traditional forms of Internet media delivery. Wherever possible we will provide models that describe how the proposed approach differs from the traditional methods in use today.

Chapter 4 | Related Work

There are many benefits to the use and adoption of agent technologies. Though numerous issues, challenges and barriers have emerged that have hampered widespread adoption, the merits of agent-based systems cannot simply be ignored or overlooked. Pechoucek's [43] paper on the “Industrial deployment of multi-agent systems” suggests that industry at large is very much involved in research into agent technology deployment today. Much research on agent-based systems is being carried out in fields as diverse as military [44], space exploration [45], tourism (travel agents)[], healthcare [46], electronic commerce[] and air traffic control[]. In comparison, there is relatively little work on the use of multi-agent systems in network applications. Most such applications are in the area of network management.
In this regard, agent systems have been used extensively to manage and control network resources [47]. Jennings [48] argues for the suitability of agents in complex engineering control systems. In that work he outlines how complexity is managed in the software engineering process, detailing the unique benefits of an agent-oriented methodology. Two engineering control system case studies are presented: an electricity transportation management system developed by Iberdrola [49], a Spanish utility company, and a manufacturing line control application developed and deployed by DaimlerChrysler [50] in Stuttgart, Germany. These case studies demonstrate the practicality of using agent systems to build management and control systems.

Autonomy is a characteristic that makes agent technology extremely attractive. Autonomy has been defined by Huber [51] as the level of separation between an agent and external influences. He further states that it can be regarded as a measure of how difficult it is to corrupt or manipulate an agent. Reed presents the notion of Adjustable Autonomy in [44], where the degree to which an agent can be influenced by external entities varies over time. He argues that this makes agents more flexible, since their behaviour can be modified during execution. This is very important in complex situations where an agent's behaviour cannot be predicted or determined a priori.

Finally, Marik [52] argues that agent systems are well suited to applications where a centralised solution would not be appropriate. Though classical centralised solutions may often perform better, he suggests that agent systems would still be advantageous if the required solution is expected to be robust and to operate in an ever-changing environment that constantly requires readjustment. This case is especially interesting for Content Delivery Networks.
Chapter 5 | The JADE Platform

“A Jade Stone is useless before it is possessed;” – Chinese Proverb

For our prototype we have chosen to use the Java Agent DEvelopment (JADE) Framework [34], an agent development middleware and agent runtime platform. JADE is the result of work carried out by Telecom Italia (TILAB) in the late 1990s. It has thrived for nearly a decade as an open source project since the year 2000, and is popular in academia for its simplicity. JADE complies with the FIPA specification [40] for agent interoperability. This chapter introduces some of the concepts and key features of agent-oriented programming using JADE. A thorough and complete discussion of the JADE platform is beyond the scope of this work.

5.1 JADE Architecture

The architectural elements of JADE are platforms, containers and agents. These elements can be seen in Figure 5-5 below. As the figure shows, agents live in containers and containers belong to platforms.

Figure 5-5: An Overview of the Main Architectural Elements of JADE

A Platform is the top-level component in JADE. A platform can be regarded as a single instance of the JADE environment. Platforms are composed of containers that can be distributed over a network, which allows a platform to span several machines. It should be noted, however, that JADE is not capable of crossing Network Address Translators (NATs).

A Container is a subordinate component of a platform. Each container represents one Java thread. Containers belonging to the same platform can reside on multiple hosts and are linked using Java's Remote Method Invocation (RMI). This allows for a great deal of flexibility in agent-based applications. The first container that is launched in a given platform is unique and is referred to as the ‘main container’. There can only be one ‘main container’ in a platform.
Consequently, all other containers that are subsequently created for a given platform must join the main container by registering with it. It should also be mentioned that, in order to introduce a higher level of reliability, JADE allows the main container to be mirrored on multiple machines. This introduces redundancy, allowing another machine to take over the role played by the ‘main container’ when it becomes unavailable for some reason.

Agents in JADE reside in containers. An agent also represents a single thread of execution in Java and is free to move between containers, so long as the containers belong to the same platform. JADE literature terms this type of mobility Intra-Platform Mobility. The Inter-Platform Mobility Service (IPMS) [53] is required to facilitate the movement of agents between containers belonging to different platforms.

5.2 JADE Agents

JADE agents follow a task-based programming model. Agents must inherit from the class Agent. For a list of the inherited methods please refer to the JADE API documentation [54]. Execution of an agent begins in the setup method, which is roughly equivalent to the main method of a Java application. This is where the agent can be initialized and can set up the behaviours that determine how it will conduct itself. Figure 5-6 shows a simple agent that prints out “I’m an agent!” when executed.

import jade.core.Agent;

public class CDNAgent extends Agent {
    @Override
    protected void setup() {
        // Print out a welcome message
        System.out.println("I'm an agent!");
    }
}

Figure 5-6: Source Code for a Simple JADE Agent

Every agent has a globally unique Agent Identifier (AID). This comprises a name and other parameters such as transport addresses or name resolution service addresses. The name of the agent is immutable; the other parameters, however, can be altered at execution time.
An example of an agent name is:

    CDNAgent@JadePlatform:1099/JADE

As can be seen from this example, the globally unique name for this agent is made up of a ‘localname’ (CDNAgent), the '@' symbol, the hostname where the agent resides, the port number of the JADE RMI registry service and the word 'JADE'.

5.3 JADE Service Agents

A running JADE platform provides some basic services for agents, which include message passing support and the following service agents:

Agent Management System (AMS): provides a FIPA-prescribed white pages service and plays a supervisory role over agents and their life cycle. The AMS provides a naming service and has authority over all other agents that reside on a platform.

Directory Facilitator (DF): provides a yellow pages lookup service to other agents. This is achieved by mapping service descriptions to Agent Identifiers (AIDs). Agents can therefore register, edit and remove service descriptions for the services they provide, and can also locate other agents that offer specific services.

Remote Management Agent (RMA): provides a JADE graphical user interface. Figure 5-7 shows a screenshot of the RMA agent.

Figure 5-7: Screen shot of the JADE RMA

5.4 Agent Communication

JADE provides an agent communication mechanism that is implemented in accordance with the FIPA specifications and is based on the Agent Communication Language (ACL). This is a fundamental feature of JADE and is required by all agent applications. Agent communication is message based, as opposed to being based on Remote Procedure Calls (RPC). Message routing is handled by the agent platform. To facilitate communication, all agents have a mailbox for sending and receiving messages. Communication is asynchronous and an agent can select the order in which it processes incoming messages. JADE uses three message transport protocols for agent communications.
Firstly, RMI is used for intra-platform communication between agents. The Internet Inter-ORB Protocol (IIOP) and the Hypertext Transfer Protocol (HTTP) are also available for inter-platform communications. Both of these protocols support three kinds of encoding: string encoding, bit-efficient encoding and XML.

An ACL message in JADE is a structured, text-based message that is intended for flexible communication between agents. ACL messages follow a semantically defined standard for vocabulary and syntax to allow for interoperability between agent implementations. A message contains the following fields:

• Sender
• Receiver(s)
• Performative – Indicates the type of communicative act
• ConversationID – Used to link messages in the same conversation
• In reply to – Refers to the earlier message that this message answers
• Reply with – An identifier that the responder uses to mark its answer
• Reply by – Used to set a time limit on an answer
• Language – Specifies which language is used in the content
• Ontology – Specifies which ontology is used in the content
• Protocol – Specifies the protocol used
• Content – This is the main content of the message

Through the use of FIPA ACL messages, JADE agents can communicate, and thus collaborate, with agents on other agent platforms as long as those platforms comply with the FIPA standard.

5.5 Behaviours

The purpose of an agent can be defined through behaviours. A behaviour represents a task that an agent can perform. Behaviours execute until they are completed. Agents typically have multiple behaviours. Intelligent agents are even capable of changing their behaviour during execution. JADE provides behaviour templates that the programmer can use to build custom behaviours. These range from simple behaviours that perform simple tasks either once or repetitively, as in OneShotBehaviour or CyclicBehaviour respectively, to complex behaviours that resemble finite state machines. Please refer to the JADE API documentation [54] for a more in-depth discussion of agent behaviours.
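The difference between a one-shot and a cyclic behaviour can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: it does not use the JADE library, and the names BehaviourSketch, Task, OneShot, Cyclic and schedule are hypothetical. It mirrors the idea that a behaviour performs one step at a time and reports whether it has completed, with a simple round-robin loop standing in for the agent's scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java model of task-based behaviours. Hypothetical types, not JADE classes.
class BehaviourSketch {

    /** A task: action() runs one step, done() reports whether the task is finished. */
    interface Task {
        void action();
        boolean done();
    }

    /** Runs its step exactly once, in the spirit of a one-shot behaviour. */
    static final class OneShot implements Task {
        private final Runnable step;
        private boolean executed = false;
        OneShot(Runnable step) { this.step = step; }
        public void action() { step.run(); executed = true; }
        public boolean done() { return executed; }
    }

    /** Never completes on its own, in the spirit of a cyclic behaviour. */
    static final class Cyclic implements Task {
        private final Runnable step;
        Cyclic(Runnable step) { this.step = step; }
        public void action() { step.run(); }
        public boolean done() { return false; }
    }

    /**
     * Round-robin loop: each turn runs one step of the task at the head of the
     * queue and re-queues it unless it is done. Returns the number of tasks left.
     */
    static int schedule(Deque<Task> queue, int maxSteps) {
        for (int i = 0; i < maxSteps && !queue.isEmpty(); i++) {
            Task task = queue.pollFirst();
            task.action();
            if (!task.done()) {
                queue.addLast(task);
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Deque<Task> queue = new ArrayDeque<>();
        queue.add(new OneShot(() -> log.add("register")));
        queue.add(new Cyclic(() -> log.add("poll")));
        int remaining = schedule(queue, 5);
        System.out.println(log + " remaining=" + remaining);
    }
}
```

In this model the one-shot task leaves the queue after its first step, while the cyclic task keeps being rescheduled until the step budget is exhausted.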
5.6 Agent mobility and cloning

As discussed in earlier sections, agents can move to any container in the platform they reside on. This is known as intra-platform mobility. Furthermore, agents can move to containers on a different platform by way of the Inter Platform Mobility Service (IPMS), which is an add-on to JADE. Mobility is achieved by calling the method doMove(Location) and specifying the new location. To facilitate a graceful movement of agents, the beforeMove() and afterMove() methods allow the programmer to define the actions an agent must take before and after it relocates itself, respectively.

Agents are also capable of replicating themselves just as easily as they can move. This is achieved by calling the doClone(Location, newName) method. Just as beforeMove() and afterMove() exist to facilitate smooth agent relocation, beforeClone() and afterClone() exist to aid replication. To date, agent mobility is still a major security concern on the JADE platform. Mobility is self-initiated: the agent requests it from the AMS. JADE provides a Mobility Ontology in the jade.domain.MobilityOntology class, where all the concepts and actions required to support agent mobility and cloning are defined.

5.7 Agent Security

Agent security is an area of research that has received much attention in recent years [55][56][57][58]. The security concerns that have been raised range from message encryption to the protection of agents from competing or malicious agents and from the host execution environment itself. Security is often cited as the main reason for the lack of industrial support for agent technology. As indicated in Section 3.4, security issues are beyond the scope of this work. It should be noted, however, that much effort has been devoted to this area and it is plausible that this issue will be sufficiently addressed in the coming years.

Chapter 6 | Proposed Architecture

"Design is not just what it looks like and feels like.
Design is how it works." - Steve Jobs

The major goal of this work is the use of mobile software agents to create an autonomous control system for content distribution networks. In order to realize this goal our solution needs to exhibit characteristics such as flexibility, autonomy, intelligence, adaptability and communication. Adaptability, which stems from the domain of machine learning, will not be addressed here but is proposed as a future extension of this work.

Agent architectures are well suited for distributed applications [7]. Agents can be distributed over a network, allowing them to cope with the scale and unstable behaviour of the Internet [59]. Furthermore, agent systems introduce contextual information that is often lacking in traditional Internet based applications. In this regard agents can be deployed wherever they are required and are able to perform their tasks using local information and in adherence to local conditions. Such functionality is difficult, though not impossible, to achieve using traditional applications that are typically client/server based.

6.1 Overview

This work proposes a unique way of delivering content in a CDN. In our approach we propose to couple media assets with agents. Each media agent in the framework will therefore be responsible for the management of a single media asset. The agent has the ability to monitor the current environment characteristics, the intelligence to make decisions based on predefined policies and the authority to carry out control functions that manage its life cycle and existence.

Figure 6-8: Overview of Media Agent

Figure 6-8 depicts the proposed media agent.
The media agent will be able to: interact with end-users requesting access to the media assets that it is responsible for; interact with other media agents to share knowledge and information; interact with the execution environment for life cycle management capabilities such as replication, migration and termination; and interact with supporting services for any additional services that are not available directly via the execution environment.

The media agent comprises the following components:

Media Asset – This is the media asset that the media agent is responsible for managing. The agent will be responsible for providing access control as well as life cycle management for the asset. The agent will further be responsible for delivering this asset to end-users who request it.

State Information – This is status information that the agent will keep. This information can be used at a later time for utility purposes such as logging and billing, but could also be used internally by the agent to make decisions. Examples of state information include the number of times the media asset associated with the agent has been accessed and a list of locations that the agent has previously visited.

Policies – The policies are used to determine how the agent should behave. These policies can also be used to specify access policies for the media assets as well as to control the degree of replication or the amount of resources an agent is permitted to consume.

Executable Code – This is the logic that the media agent will use to determine its behaviour. One key functionality that must be implemented as executable code is the ability of the agent to handle end-user client requests for the media asset it is responsible for. It is therefore essential for the agent to be capable of streaming video content to the requesting end-user.
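The four components listed above can be pictured as the fields and methods of a single class. The following is a minimal sketch, not the prototype's implementation: the class name MediaAgentSketch, the use of a string locator for the asset, the key/value form of the policies and the example values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four media agent components; not the thesis prototype.
class MediaAgentSketch {
    // Media asset: represented here only by a locator (an assumption).
    private final String assetLocator;
    // State information: access count and previously visited locations.
    private long accessCount = 0;
    private final List<String> visitedLocations = new ArrayList<>();
    // Policies: simple key/value settings (an assumption about their form).
    private final Map<String, String> policies;

    MediaAgentSketch(String assetLocator, Map<String, String> policies, String startLocation) {
        this.assetLocator = assetLocator;
        this.policies = new HashMap<>(policies);
        this.visitedLocations.add(startLocation);
    }

    // Executable code: handle an end-user request and update the state.
    // Returning the locator stands in for streaming the content.
    String handleRequest(String clientId) {
        accessCount++;
        return assetLocator;
    }

    // Record a new location after a migration.
    void arriveAt(String location) { visitedLocations.add(location); }

    long accessCount() { return accessCount; }
    List<String> visitedLocations() { return Collections.unmodifiableList(visitedLocations); }
    String policy(String key) { return policies.get(key); }

    public static void main(String[] args) {
        Map<String, String> policies = new HashMap<>();
        policies.put("maxReplicas", "3"); // hypothetical policy key
        MediaAgentSketch agent =
                new MediaAgentSketch("rtp://origin.example/asset", policies, "platform-a");
        agent.handleRequest("client-1");
        agent.arriveAt("platform-b");
        System.out.println(agent.accessCount() + " " + agent.visitedLocations());
    }
}
```

Keeping the state next to the request-handling code is what would let such an agent use its own access statistics when it later evaluates its policies.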
6.2 CDN Implementation

This section discusses how the proposed infrastructure addresses some of the CDN problems that were discussed in Chapter 2.

6.2.1 Replica Server Placement Mechanism

In the proposed prototype no assumptions are made about the replica server placement mechanisms or algorithms that will be used. Any existing or optimised heuristic can be implemented in the media agent. The mechanism used could therefore be identical to that used in traditional CDN infrastructures. This is expected to result in servers being placed in the same locations, in terms of proximity to the clients to which the media is delivered. It should be noted, however, that traditional CDNs would typically only have access to servers that belong to them. The proposed architecture would be able to use any CDN’s platform that allows JADE mobile software agents to execute. This would be a significant departure from current CDN provisioning and could level the playing field of the CDN market. Further details on how the author envisages this to function are discussed in the future work section of this manuscript.

6.2.2 Content Placement Mechanism

The content placement mechanism attempts to establish how to populate the replica servers with content in the most efficient way [16]. This may depend on factors such as the type of content, disk space, server utilisation and end-user requests, among others. This step could also be carried out a priori using information that is known about the content, as is the case with content that is published regularly.

In our approach, once the agents have been deployed initially, they will be responsible for deciding the optimal location to deliver content from. This will be achieved by collecting statistics on end-user requests as well as on the local host platform conditions. Once collected, this data will be processed locally and independently by the agent. The agent will therefore periodically check its state against the rules to determine what action to take next.
If the action is indeterminate the agent will remain at its current location. To supplement this mechanism we will also provide a means to make the agent migrate, clone or self-destruct by sending it a specific command. In the case of migrate and clone instructions, this command can either be specific, telling the agent exactly where it needs to go or copy itself, or it can simply trigger a mechanism within the agent that causes it to discover a new location based on its current state. In the latter scenario the agent is expected to use the FIPA Contract Net Protocol [60] behaviour to determine the best platform.

6.2.3 Content Update & Management Mechanism

The content update and management mechanism exists to ensure that content consistency is maintained in the CDN. This involves keeping content up to date and ensuring that stale content expires on the surrogate servers. The proposed architecture addresses these issues in a number of ways. Firstly, policies can be defined within the media agent managing a specific piece of content. These policies could, for instance, request the media agent to periodically check for updated content on the content provider’s origin servers. The policies could also specify an expiry date for content that is stale. Secondly, it should be possible to instruct an agent to perform certain functions remotely. In this way a CDN provider could issue a command that instructs all agents to update their policy or download the latest version of the media asset they are responsible for. These instructions could also be used to purge all agents responsible for stale content from the CDN by instructing them to self-destruct.

6.2.4 Active Measurements Mechanism

The active measurement mechanisms are still vital to a Software Mobile Agent (SMA) based CDN infrastructure.
However, it is proposed that the purpose of such a mechanism evolve from simply an incident alarm system to part of a self-organising, self-healing infrastructure. In this regard we shift the function of such a mechanism away from simply notifying CDN operators of imminent hot spots. Instead, our proposed infrastructure makes this information available to the SMAs. This is expected to allow SMAs to independently take countermeasures that preclude adverse situations.

6.2.5 Request Routing & Redirection Mechanism

The request routing and redirection mechanism attempts to choose the best replica server to fulfil the QoS requirements of the content. In some cases this means merely selecting the closest server to service a given network segment. In more advanced scenarios this could involve other parameters to provide load balancing as well as other advanced access control features. This work does not specify a mechanism for request routing and redirection. Rather, it is assumed that existing mechanisms that are currently in use in traditional CDNs could be used.

6.3 Functional overview

This section gives an overview of the functional capabilities that are required by our system.

6.3.1 Bootstrapping and Start-up Procedures

Bootstrapping of our system is expected to be manual. This is expected to reduce the overall complexity of the system. In this regard, we assume that the surrogate servers hosting our execution environments are already provisioned with all of the necessary elements for the successful deployment of our agents.

6.3.2 Locating Execution Environments

Execution environments will be made aware of other host environments manually. Though a simple discovery mechanism and protocol could be created for this purpose, we feel this task is beyond the scope of this work. We will therefore assume that all execution environments know of other environments that are available and reachable.
Due to these assumptions it would not be feasible to explore how our agent infrastructure can behave in a fault tolerant manner should execution environments become unavailable. For our prototype we will bootstrap each platform with the platforms within its vicinity. This information will be held in a local directory service that agents can query for available execution environments. A query is expected to return zero or more network locations.

6.3.3 Monitor the conditions of a Host Execution Environment

An agent will be required to monitor the conditions of the host execution environment that it currently resides in. This is expected to enable the agent to make timely and informed decisions. One example of such a decision is the decision to migrate to another execution platform when the resources of the current host execution environment are becoming depleted. We envisage that this will be achieved in two separate ways: a) the agent will be able to poll a monitoring service to get the latest information on the host execution conditions; and b) through a publisher/subscriber mechanism, the agent will subscribe to the monitoring service for periodic notifications on the prevailing host execution environment conditions.

6.3.4 Discover the capabilities of execution environments

An agent will be required to discover the capabilities of an execution environment. This information is expected to be used by an agent when it wishes to migrate or replicate itself to alternate platform locations. The capabilities discovery is expected to provide sufficient details for the agent to decide whether the alternate platform can accommodate it. In this regard an agent should be able to request a remote execution environment to provide it with its prevailing environment conditions.
Upon receipt of this data the agent can decide, based on thresholds defined in its policies, whether that execution environment is suitable or not. Agents will rely on the Locating Execution Environments mechanism described in Section 6.3.2 to retrieve the list of available execution environments. Once this list is obtained, the agent will iterate through it, sending a request for the prevailing environmental conditions to each network location in the list. All execution environments that receive such a request will respond with their environmental conditions.

6.3.5 Locating and Naming of Agents

Agents in the proposed architecture will have globally unique names. The names are expected to be informative and indicative.

    A123-6701-FE223401@HOSTNAME:RMIPORT/JADE

Figure 6-9: Format of Agent Names

Figure 6-9 shows the format of our globally unique agent names. As discussed in Section 5.2, the name of the agent is composed of a local name and the host where the agent resides. We propose a simple naming convention in which the local name is a 16 character hexadecimal string. This can be broken down into the following fields:

Organisational Identifier (4 characters) – This field will identify the owner of the agent. This is useful if the agents are deployed in a multi-vendor CDN environment. This field will therefore identify who owns or created the agent and can be used for inter-vendor billing purposes. The length of this field is 4 characters, supporting up to 65,536 (16^4) CDN providers. It is expected that these identifiers would be globally assigned by a Local Internet Registry (LIR).

Owner Identifier (4 characters) – This field will identify the owner of the media asset. This field is assigned by a CDN provider. The length of this field is 4 characters, supporting up to 65,536 (16^4) content providers.

Asset Identifier (8 characters) – This field identifies the media assets owned by a content provider. This field is assigned by a content provider.
The length of this field is 8 hexadecimal characters, supporting up to about 4.3 billion (16^8) assets per provider.

For the purposes of simplicity, we will restrict all copies of an agent to have identical names. This will make it easier to locate agents without the requirement for maintaining a one-to-many list that maps media assets to agent names. Though this is not a problem for locating any one agent that is responsible for a given media asset, it makes it more challenging, as will be discussed later, to update all copies of a media asset deployed on the CDN infrastructure. This one-to-one mapping of media assets to agent names adds one further restriction: there can only exist one agent responsible for a given media asset on a given platform. Such a restriction is acceptable and justified.

6.3.6 Configurability

Our agents need to be highly configurable and flexible. This will involve increasing their intelligence and autonomy. We expect to achieve this through policies. Upon deployment it should be possible to define a set of policies that the agent can use in whatever situation may arise. Two types of policies are envisaged:

Agent Policies – These are policies that agents will use internally to determine their behaviour in the different situations that may arise. These policies could specify how the agent should behave when resources on a platform are being depleted, i.e. what threshold levels the agent should adhere to. The policies could also specify how much the agent may replicate, among other such rules.

Access Policies – These are policies that will determine the use of the media assets that are coupled with the agent. An example of such a policy is restricting a media asset to include or exclude a specific geographical region (geo-blocking).
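The two policy types can be sketched as small value classes. This is an illustrative sketch only: the thesis does not fix a policy representation at this point, so the class names, the CPU load threshold, the replica limit and the region codes below are all assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical policy representation; names and parameters are assumptions.
class PolicySketch {

    enum Action { STAY, MIGRATE }

    /** Agent policy: thresholds that steer the agent's own behaviour. */
    static final class AgentPolicy {
        private final double maxCpuLoad; // migrate when the host load exceeds this value
        private final int maxReplicas;   // upper bound on copies of the agent

        AgentPolicy(double maxCpuLoad, int maxReplicas) {
            this.maxCpuLoad = maxCpuLoad;
            this.maxReplicas = maxReplicas;
        }

        Action onHostLoad(double cpuLoad) {
            return cpuLoad > maxCpuLoad ? Action.MIGRATE : Action.STAY;
        }

        boolean mayReplicate(int currentReplicas) {
            return currentReplicas < maxReplicas;
        }
    }

    /** Access policy: geo-blocking by excluding a set of region codes. */
    static final class AccessPolicy {
        private final Set<String> blockedRegions;

        AccessPolicy(Set<String> blockedRegions) {
            this.blockedRegions = new HashSet<>(blockedRegions);
        }

        boolean permits(String regionCode) {
            return !blockedRegions.contains(regionCode);
        }
    }

    public static void main(String[] args) {
        AgentPolicy agentPolicy = new AgentPolicy(0.80, 3);
        AccessPolicy accessPolicy = new AccessPolicy(new HashSet<>(Arrays.asList("XX")));
        System.out.println(agentPolicy.onHostLoad(0.95));  // load above the threshold
        System.out.println(agentPolicy.mayReplicate(3));   // replica limit already reached
        System.out.println(accessPolicy.permits("SE"));    // region not in the blocked set
    }
}
```

Separating the two classes reflects the distinction drawn above: agent policies govern the agent's life cycle decisions, while access policies govern who may receive the media asset.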
6.3.7 Other Functional Requirements

For the purposes of this work, other functional requirements such as security, data integrity, reliability and fault tolerance are assumed to be provided by the underlying platform and are beyond the scope of this work.

6.4 Multi-Agent System Architecture

A Multi-Agent System (MAS) architecture can be regarded as a system that is made up of multiple independent components. The degree of independence is expected to vary from component to component. The central theme of MAS is that each of these components performs a specific role yet communicates and collaborates with other components to solve a problem that no component could practically address individually [6].

To fulfil the functional requirements outlined in Section 6.3 we adopted a MAS architecture. This was expected to greatly simplify the development of the system by focusing on the task-specific aspects of each agent. It allowed us to assemble intricate functionality and features that would not be practical for an individual agent to perform. The Prometheus Agent Oriented Software Engineering (AOSE) methodology and the Prometheus Design Tool (PDT) [61] were used extensively to analyse and design our multi-agent architecture.

Our multi-agent system allows agents to exchange information through FIPA Agent Communication Language (ACL) messages [62]. This allows the agents to work in collaboration with each other, to share knowledge that is pertinent to furthering their individual goals and to carry out sophisticated interactions.

The multi-agent system comprises a combination of both mobile and stationary agents. The stationary agents offer basic services to our mobile agents that allow them to make informed decisions in a timely manner.

6.4.1 Media Agent

The goal of the media agent is to optimize the delivery of the media assets that it is responsible for managing.
This can be further broken down into two tasks: 1) optimize the end-user experience and 2) minimize the cost of delivering the media assets. This is achieved by identifying optimal platform locations from which to service client requests. Policies for content migration, replication and deletion are stored within the media agent. This allows the agent to make local decisions based on these policies at runtime. In an effort to make the agents lightweight we have opted to decouple the media assets being managed from the agent.

6.4.2 Agent Intelligence

In this work we attempt to use mobile agents to realize a functional autonomous Content Delivery Network infrastructure. This requires that we employ some intelligence in these agents that will enable them to make decisions based on current network conditions and a predetermined set of rules.

In artificial intelligence, a knowledge-based system (KBS) is a system that is programmed to mimic human problem solving ability. A KBS typically comprises three components: a knowledge base, which is a database of knowledge on a particular subject; a working memory, which contains derived facts; and an inference engine, which contains the methods and techniques to process the facts and knowledge in order to reach a reasonable solution to a given problem [63].

In formal logic, rules of inference are usually presented as one or more premises followed by a conclusion, as shown in Figure 6-10 below.

    Premise #1
    Premise #2
    ...
    Premise #n
    -----------
    Conclusion

Figure 6-10: Basic Structure of an Inference Rule

The expression above implies that whenever the given premises can be considered to hold true, the conclusion follows with necessity from the premises.

6.4.3 Execution Environment

Our agents rely on an underlying execution environment that will provide them with the capabilities of instantiation, migration, replication and halting. These capabilities can be described as follows:

Instantiation – This is required when bootstrapping an agent.
This involves loading the agent class from disk and calling its setup() method.

Migration – The ability of an agent to migrate from one host to another by calling its doMove() method, as described in Section 5.6.

Replication – The ability of an agent to clone itself to another host by calling its doClone() method, as described in Section 5.6.

Halting – The ability of an agent to self-destruct by calling its doDie() method.

6.4.4 Supporting Services

To allow agents to perform their duties, we propose a number of supporting services that are available on each platform. These supporting services will be responsible for providing capabilities that are not available to the agent through the agent execution environment. An example of such a service is the monitoring of the conditions of the host execution environment.

6.4.4.1 Monitoring Service

This is a service that will be responsible for monitoring the local execution environment that is hosting the JADE platform, as described in Section 6.3.3. This is achieved by acquiring information from the SNMP daemon on the running host. Once retrieved, this information is made available to other agents that subscribe for periodic notifications or that poll the service for specific information.

6.4.4.2 Authorisation Service

This service implements a simple Policy Decision Point for access control to the media assets. It is expected that all requests for a media asset will therefore have to be approved by such a service. In this regard the authorisation service will need to be extremely flexible to cater for the diverse use cases that may exist. The policies themselves will need to be universally understood and precise, leaving no room for ambiguity. This service could be implemented within the agent itself or provided as an external service that the agent will utilise.

6.4.4.3 URL2AgentTranslator Service

This service will be responsible for translating URLs to agent names.
In this way the service will be responsible for interfacing with non-agent entities via external protocols.

6.4.4.4 Platform Location Service

This service will be responsible for locating adjacent platforms for our agents. It will also provide a simple lookup service for agents to find the platforms where copies exist.

Chapter 7 | Implementation

“For the things we have to learn before we can do them, we learn by doing them.” – Aristotle.

This chapter presents the implementation details of the proposed proof-of-concept prototype.

7.1 Architectural Overview

Figure 7-11: Proposed Architectural Overview

Figure 7-11 above shows the proposed architectural overview of the prototype. Depicted in this figure is a single instance of a platform. Each platform runs the Ubuntu Linux [64] operating system with SNMP and FTP daemons. We used the Sun Java Runtime Environment as our virtual machine, which hosts the JADE runtime system. Finally, the Sun Microsystems NetBeans [65] Integrated Development Environment (IDE) was used as our development platform.

Our multi-agent system allows agents to exchange information through FIPA Agent Communication Language (ACL) messages [62]. This allows the agents to work in collaboration with each other and share knowledge that is pertinent to furthering their individual goals.

The central actor in our proposed architecture is the media agent. This agent is expected to rely on key services. The media agent is expected to be mobile and thus will be able to migrate from platform to platform, delivering the content that it is responsible for to the requesting clients. A platform in our architecture is therefore expected to have zero or more media agents at any instant in time. However, all service agents must be present in order for these services to be available to media agents. Additional agents are also required to complete this architecture.
These agents will provide a Control Platform that the administrator of the CDN infrastructure can use to manage it. These additional agents are the ControllerAgent and the ReplicatorAgent.

7.2 Agent Roles

The multi-agent system comprises a combination of both mobile and stationary agents. The stationary agents offer basic services to our mobile agents which allow them to make informed decisions in a timely manner. Our stationary agents are: Platform Monitor Service, Replication Service, Controller Agent, Url2AgentTranslator Service, and FTP Service. The roles that our agents currently play are as follows:

7.2.1 Platform Location Service

One of the core functionalities required by the system was the ability to discover execution environments that can support our agents. For this service we simplified the requirements by hard coding the list of available platforms on each host. This is a reasonable simplification for a prototype. Another requirement for this service was to distribute agent location information to other execution environments. To achieve this functionality we relied primarily on ACL messages. A simple protocol, the Ping-Pong Protocol, was therefore created to provide this functionality to our service.

The Ping-Pong protocol is a simple session establishment protocol that was created to allow platforms to discover each other. For our purposes, all available platforms are hard coded in each platform. However, since these platforms may or may not be available at the time a platform is being bootstrapped, the challenge was to ensure that a peer platform is available and ready to receive messages before proceeding. To achieve this, a platform that is being bootstrapped is required to send pings to all of the platforms that it knows about. These pings are sent every 5 seconds. When a platform receives a ping from a peer platform it responds with a pong message.
Once a pong has been received, the platform no longer sends pings to the peer that sent the pong. A connection is established between these peers once these messages have been exchanged. Each peer then requests its neighbour to send a refresh message that informs it of the neighbour's conditions and the current list of agents that are available on it. To ensure that changes in the resident agents on a platform are propagated to all known platforms, a platform sends periodic updates to its peers. If no updates are required because no changes have occurred on a platform, the platform is expected to periodically send a keep alive message informing its peers that it is still available. Failure to send keep alive messages results in the connection with that peer being terminated.

7.2.2 Platform Monitor Service

This agent is responsible for monitoring the local execution environment that is hosting the JADE platform. This is achieved by acquiring information from the Simple Network Management Protocol (SNMP) daemon on the running host. SNMP is a de facto standard for managing and monitoring network devices. Defined in RFC 1157 [66], this protocol is implemented by most vendors of network equipment. We propose the use of an SNMP server to monitor the health of our host execution platform. Through the use of an SNMP API, agents will be able to periodically poll the health of the local system. Once retrieved, this information is made available to other agents via either a request/response or a publisher/subscriber communication model.

7.2.3 Authorisation Service

This agent implements a simple Policy Decision Point for access control to the media assets. The agent uses a simple request/response protocol that is based on the OASIS eXtensible Access Control Markup Language (XACML) [67][68]. The OASIS XACML is a standardised access control policy language utilising XML syntax.
It is a widely accepted standard in the industry because it is extensible and can be used to implement policies for different uses and environments. Policies created using the language can be extremely expressive, allowing users to specify precisely what they intend for the resources the policies are written for.

7.2.4 URL2AgentTranslator Service

This agent is responsible for translating URLs into agent names. The agent is also responsible for checking the requestor's access permissions with the local Policy Decision Point.

7.2.5 FTP Service

This agent is responsible for the transfer of media assets via FTP on behalf of the Media Agent when it migrates or replicates itself. This agent uses the Apache Commons Net FTP Library [69].

In addition to these services, we have also built utility services. These utility services are responsible for collating logging and billing information for all events that occur on a host platform. This information is aggregated and filtered based on predefined criteria. Finally, the information is transmitted to a central location for further processing.

7.2.6 Controller Agent

This agent is responsible for providing a management interface for the CDN. Through the graphical user interface provided by this agent, the running multi-agent system, distributed as it may be over multiple hosts, can be orchestrated. The agent is responsible for the deployment and management of agents. It therefore has the capability to create and deploy agents over the available platforms using predefined policies. It is also able to send commands to agents to update their policies or associated media assets.

Figure 7-12 below shows a screenshot of our controller agent. On the left is a list of assets that are available in the CDN infrastructure. This is a tree view that includes both the running execution environments and the agents that are resident on those platforms.
On the right is a simple map view that presents a graphical view of the infrastructure and the location of the agents. Using this interface the user is able to deploy new agents as well as manage existing ones.

Figure 7-12: Screenshot of Controller Agent

7.2.7 Client Application

In order to fulfil the requirement of delivering video streams to an end-user we created a customised client application. It is often desirable to use existing technologies on the client side to simplify the delivery of content. However, this could not be achieved without losing some functionality. The client application is therefore responsible for formulating a request for content, which it does by sending a URL to the URL2Agent translation service. One of the core capabilities that the client application needs is the ability to play out an RTP stream. This was achieved using the Java bindings for the VideoLAN Client (JVLC) library [70].

Figure 7-13 shows a screenshot of the Client Application. The application allows the user to specify the URL for the media asset in the text box. The send and exit buttons are used to send the request and to exit the application, respectively.

Figure 7-13: Screenshot of the Client Application

7.2.8 Media Agent

The media agent is the most crucial agent in the proposed framework. This agent is responsible for the management of media assets. The media agent is therefore capable of migrating from one execution environment to another in order to find the optimal place from which to deliver content to an end-user. At the same time, the media agent must be capable of communicating with the other agents described here to obtain services that are critical for its life cycle management. It must be able to receive and process notifications relating to local execution environment conditions. It must also be capable of delivering media to the client application described in the previous section.
This is also achieved using the Java bindings for the VideoLAN Client (JVLC) library. Policies for content migration, replication and deletion are stored in the media agent. This allows the agent to make local decisions based on these policies at runtime. In an effort to make the agents lightweight we have opted to decouple the media assets from the agent that manages them.

7.3 Agent Reasoning and Intelligence

One feature that enhances the performance of the proposed system is the agent's ability to reason based on the prevailing conditions and predefined policies. The policies are simply a rule base, whilst the prevailing conditions are either sensed directly or communicated by other agents. This is achieved through a simple inference engine. A number of inference engines are available for the Java platform. Jess and Hammurapi Rules are two such engines that are widely used in industry and academia. Jess is both a rule engine and a scripting environment that was developed by Ernest Friedman-Hill at Sandia National Laboratories [71]. [72] and [73] present an architecture that combines Jess with the JADE platform to create intelligent agent applications. Jess is provided under a commercial license but is available at no cost for academic use. Hammurapi Rules is another rules engine for the Java platform that has been developed entirely in the Java language. Hammurapi Rules is freely available for use. This implementation uses an Inference Engine presented by Joseph Bigus and Jennifer Bigus in [63].

    Rule migrate_decision = new Rule(rb, "migrate_decision",
        new Clause[]{
            new Clause(service, cEquals, "Poor"),
            new Clause(num_requests, cEquals, "Low")},
        new Clause(decision, cEquals, "Migrate"));

Figure 7-14: Sample Rule for a Media Agent

Figure 7-14 shows a simple snippet from a rule base for a media agent. A rule base is simply a collection of rules. Each rule is made up of Clauses. As described in Section x.
A rule has one or more Premise Clauses followed by a Conclusion Clause. The rule shown, migrate_decision, is made up of two Premise Clauses. These are service = “Poor” and num_requests = “Low”, indicating that if the agent considers the services it is obtaining from the host execution environment to be “Poor” AND the number of user requests it is getting per minute is relatively “Low”, then it can consider migrating to another platform. Naturally, such a rule relies on other rules that define how the agent determines whether the services it is obtaining are indeed “Poor” or the number of requests is “Low”. To better understand how such a rule base can be built, we present a table below that details how an agent may determine the type of service it is obtaining from a host execution environment.

Table 2: Sample Decision Matrix for Services Received

                               QoE
    Environment    Poor                 Acceptable           Best
    Poor           Bad Service          Bad Service          Bad Service
    Acceptable     Bad Service          Acceptable Service   Acceptable Service
    Best           Acceptable Service   Acceptable Service   Best Service

Table 2 above presents a matrix that could be used by an agent to evaluate the service it is being rendered. The columns represent the Quality of Experience (QoE) that is being reported by end-user clients. This could be based on sensed values such as the number of dropped frames or how often the video stream being delivered from the current execution environment freezes. The rows represent environmental conditions, which could be determined from the current load, the available memory or disk space, or the percentage of CPU time that is free. In the example given, both are classified as either Poor, Acceptable or Best. Each individual agent could in fact determine its own thresholds, so two different agents resident on the same execution environment may arrive at two completely different outcomes.
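As a minimal sketch of how per-agent thresholds and the matrix in Table 2 might fit together, the Java fragment below grades two sensed scores and looks the pair up in the matrix. The class name ServiceMatrix, the two threshold parameters and the use of a single "higher is better" score for both inputs are illustrative assumptions and are not taken from the prototype. The matrix cells assume that Table 2 is read row by row.

```java
/** Hypothetical helper: grades sensed scores and applies the Table 2 matrix. */
final class ServiceMatrix {
    enum Grade { POOR, ACCEPTABLE, BEST }
    enum Service { BAD, ACCEPTABLE, BEST }

    // Rows: environment grade. Columns: QoE grade.
    // Assumption: the cells of Table 2 are read row by row.
    private static final Service[][] MATRIX = {
        { Service.BAD,        Service.BAD,        Service.BAD        },
        { Service.BAD,        Service.ACCEPTABLE, Service.ACCEPTABLE },
        { Service.ACCEPTABLE, Service.ACCEPTABLE, Service.BEST       },
    };

    private final double poorBelow; // scores below this value are graded POOR
    private final double bestFrom;  // scores at or above this value are graded BEST

    /** Each agent supplies its own thresholds (assumed to apply to both inputs). */
    ServiceMatrix(double poorBelow, double bestFrom) {
        if (poorBelow > bestFrom) {
            throw new IllegalArgumentException("poorBelow must not exceed bestFrom");
        }
        this.poorBelow = poorBelow;
        this.bestFrom = bestFrom;
    }

    /** Grades a "higher is better" score, for example the percentage of free CPU time. */
    Grade grade(double score) {
        if (score < poorBelow) {
            return Grade.POOR;
        }
        return score >= bestFrom ? Grade.BEST : Grade.ACCEPTABLE;
    }

    /** Looks up the service level for an environment score and a QoE score. */
    Service evaluate(double environmentScore, double qoeScore) {
        return MATRIX[grade(environmentScore).ordinal()][grade(qoeScore).ordinal()];
    }
}
```

With thresholds (50, 90) an environment score of 40 is graded Poor, whereas with thresholds (20, 60) the same score is graded Acceptable, so two agents on the same host can reach different service levels from identical measurements.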
These outcomes are then combined in additional rules to determine the final action that the agent must take. A media agent will therefore be required to periodically monitor the conditions of the host execution environment and test these against such a rule base.

7.4 Use Case Scenarios

This section presents some of the use case scenarios that detail the system's behaviour as it responds to certain events and user actions.

7.4.1 Case One: Creation of an Agent

Agents can be created and deployed via the Controller Agent's Graphical User Interface (GUI). The user creates an agent by performing the following steps:
(i) Predefine how the agent will behave by way of rules expressed in predicate logic.
(ii) Associate a media asset file with the agent.
(iii) Assign an access policy for the media asset.

With these parameters specified, the Controller Agent formulates a request to the Replicator Agent, which instantiates a Media Agent based on these parameters and assigns an itinerary to it. Though no content placement mechanisms have been implemented as yet for the prototype, it is easy to envisage how the Replicator Agent could use such mechanisms to decide where to dispatch newly created agents. Upon instantiation the Media Agent checks whether its current location matches the one prescribed in its itinerary. In the event that there is a discrepancy, the Media Agent is required to migrate to the prescribed location before continuing.

7.4.2 Case Two: Agent deployment

For the purposes of this work the deployment of an agent occurs once the agent has been instantiated. Though the instantiation, or creation, of the actual agent is controlled by the other entities described in Section 7.4.1, the deployment of the agent is concerned with the actions the agent takes in its own thread of execution. Upon instantiation the Media Agent checks whether its current location matches the one prescribed in its itinerary.
In the event that there is a discrepancy, the Media Agent is required to migrate to the prescribed location before continuing. When an agent arrives at the location in its itinerary it must perform the following actions:
(i) Ensure that the associated media assets are available on the local file system. If not, request the local FTP Agent to transfer the assets from its previous location.
(ii) Deposit its access policy file in the local Policy Information Base (PIB) to be used by the local PDP Agent.
(iii) Register with the local Directory Facilitator (DF) the URL that the agent will be servicing.
(iv) Subscribe for host platform information with the local Platform Monitor Agent.

Figure 7-15: Sequence Diagram - Agent Deployment

7.4.3 Case Three: Content Provider Updates Content

When a content provider updates content this will either be in the form of updated requirements, which would necessitate a change in agent policies, or a change in content. Either way, versioning of agents becomes important in order to identify which agents are running which version of a policy or of the content.

Figure 7-16: Sequence Diagram - Update Content Event

7.4.3.1 Case Three A: New Policies

When the content provider makes available a new set of requirements the following actions must be taken:
1. Requirements are translated into a new policy.
2. The ReplicatorAgent obtains from the locator agent a list of all platforms that have the associated agent.
3. A command object that contains this new policy set is created and encapsulated in an ACL Message.
4. This message is dispatched to all available agents.
5. For each agent that does not confirm that it has correctly updated its policy:
   a. The message is sent again.

7.4.3.2 Case Three B: New Content

When the content provider makes available a new version of the media content the following actions take place:
1. The ReplicatorAgent obtains from the locator agent a list of all platforms that have the associated agent.
2. A command object is created that instructs receiving agents to download a given filename from a specified location.
3. The command object is encapsulated in an ACL Message.
4. This message is dispatched to all available agents.
5. For each agent that does not confirm that it has correctly updated its content:
   a. The message is sent again.

7.4.4 Case Four: Policy Based Agent Decisions

The migration and replication of our agents is based on policies that are defined as rules in predicate logic. Our agents periodically check their current state and environment conditions against these rules and, using an inference engine, are able to reason about the appropriate actions to take. When an agent resident on a platform either senses conditions or infers that it should migrate, it takes the following steps:
(i) The agent sends a request to the platform location service for all available platform locations.
(ii) Using the list of returned platforms, the agent initiates a FIPA-ContractNet-Protocol [60] and formulates a Call for Proposals (CFP) that is sent to each of these platforms.
(iii) Upon receipt of a CFP, a platform location service is required to fill out a proposal stating its current environmental conditions and submit it to the requesting agent.
(iv) Based on the proposals it receives, the agent selects an optimal location to migrate to and sends the appropriate “accept-proposal” or “refuse-proposal” messages.
(v) Finally, the agent deregisters from all of the services it had subscribed to on the current platform and calls the doMove() method, as prescribed by the JADE Inter-Platform Mobility Service, to move to the newly chosen location.

Figure 7-17: Sequence Diagram - Policy Based Decisions

7.4.5 Case Five: Client Requests Content

The last case is the delivery of media to a client. This is done through the use of the JVLC wrapper library on both the sending and receiving ends.
For the client we have created a simple Java application that is able to carry out a basic HTTP conversation. For this case the following steps take place:
1. The Client Application sends an HTTP GET for a URL to the URL2AgentTranslatorAgent, which is listening on port 80.
2. The URL2AgentTranslatorAgent looks up the requested URL in the Directory Facilitator.
3. The Directory Facilitator returns an agent that is associated with the requested URL.
4. If the agent was not registered in the Directory Facilitator, a REDIRECT or a NOT_FOUND (404 Not Found error) can be sent to the client application.
5. The URL2AgentTranslatorAgent requests the PDPAgent to validate this request against the access policy it has for the given agent.
6. The PDPAgent validates the request and sends an appropriate response back to the URL2AgentTranslatorAgent.
7. The URL2AgentTranslatorAgent can then send a REFUSE (403 Forbidden error) or an OK (200 OK) to the client, based on the response it received from the PDPAgent.
8. An OK response causes the client to initialise the JVLC client and start listening for RTP streams being sent to it.
9. At the same time the URL2AgentTranslatorAgent sends a request to the MediaAgent that is responsible for the requested URL. This request states the IP address that it should stream to.
10. When the MediaAgent receives such a request it begins streaming to the given IP address using the JVLC library and logs the bytes sent for billing purposes.
11. The client sends QoE measurements back to the MediaAgent, which includes them in the logs it sends for billing purposes.

Figure 7-18: Sequence Diagram - Client Requests Content

Chapter 8 | Measurements

“Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.” - Albert Einstein

This section presents the measurements that were carried out on the proof-of-concept prototype developed.
8.1 Testbed overview

We carried out a number of tests to ascertain the performance of our prototype. Our test environment included two personal computers with the following specifications: Pentium 4 Processor @ 3.5 GHz, 512 MB of RAM. These two computers were connected using TCP/IP over a 100 Mbps Ethernet connection. Using this setup we performed the following tests:

8.2 Migration Time

The migration time of our agents is the time it takes for an agent to leave one execution environment and relocate itself on another. For our purposes it was important to determine how this time is affected by the media assets that are coupled with the agent. We therefore measured the migration time for agents with different payload sizes. The payloads used in these measurements were JPEG images. We also compared two different migration strategies: migration using purely FIPA ACL Messages, and migration using a combination of FIPA ACL Messages, for the agent itself, and FTP, for the media asset.

Migration using FIPA ACL Messages only: Migration using FIPA ACL Messages necessitated loading the payload into the memory space of the agent. This payload was then written out to disk upon completion of the migration step. Since images in Java are not serialisable, we had to load the pixels that make up the image into a byte array. For each of these tests we configured our agents to print out the system time before migration was initiated, in the agent's beforeMove() method, and after the agent had completed migration, in the agent's afterMove() method. To overcome the problem of synchronising the clocks on the execution environments, we programmed the agents to migrate between the execution environments a total of 10 times in each test. This meant that we could simply take the elapsed time between the call to the beforeMove() method and the call to the afterMove() method and divide by two.
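The halving step can be illustrated with a short Java sketch. It assumes that the start and end timestamps of a round trip are both read on the same host, which is our reading of the procedure described above, and that both legs of the trip take roughly the same time. The class and method names are hypothetical and do not come from the prototype.

```java
/** Hypothetical helper for the round-trip timing described above. */
final class MigrationTimer {
    private MigrationTimer() {}

    /**
     * Estimates the one-way migration time from a round trip whose start and
     * end timestamps were both read on the same host (assumption), so that no
     * clock synchronisation between hosts is needed.
     */
    static double oneWayMillis(long beforeMoveMillis, long afterReturnMillis) {
        if (afterReturnMillis < beforeMoveMillis) {
            throw new IllegalArgumentException("end timestamp precedes start timestamp");
        }
        return (afterReturnMillis - beforeMoveMillis) / 2.0;
    }

    /** Averages the one-way estimate over several round trips given as {start, end} pairs. */
    static double meanOneWayMillis(long[][] roundTrips) {
        if (roundTrips.length == 0) {
            throw new IllegalArgumentException("at least one round trip is required");
        }
        double sum = 0.0;
        for (long[] trip : roundTrips) {
            sum += oneWayMillis(trip[0], trip[1]);
        }
        return sum / roundTrips.length;
    }
}
```

The estimate is only as good as the symmetry assumption: if one direction is consistently slower, halving the round trip hides that difference.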
Migration using FIPA ACL Messages and FTP Service: Migration using a combination of FIPA ACL Messages and FTP comprised two basic steps. The first step was the ordinary migration of the agent using FIPA ACL Messages. Once this migration step had been completed, the agent would request an FTP service present on the execution environment to retrieve a specified file from its previous location. For these tests we configured our agents to print out the system time before migration was initiated, in the agent's beforeMove() method. We also configured our agents to print out the system time upon receipt of the notification from the FTP service that the media asset had been successfully retrieved from the requested location. For these tests we also configured 10 migrations on each execution environment to avoid clock synchronisation issues.

Figure 8-19: Migration Time of Agents (x-axis: size of agent payload in kilobytes; y-axis: migration time in ms; series: migration using ACL Messages, migration using ACL Messages and FTP)

Figure 8-19 shows the migration time of agents using ACL Messages versus using a combination of ACL Messages and FTP. Both the size of the agent and the migration time are plotted on a logarithmic scale. We notice that the migration time is roughly constant between 1 KB and 20 KB. At these sizes migration using purely ACL Messages outperforms migration using ACL Messages and FTP by as much as 36.04%. We attribute this to the messaging overhead: in the FTP case the agent must send additional messages to the FTP Service and wait until it receives a notification message. However, beyond 25 KB migration using a combination of FIPA ACL Messages and FTP exhibits far better performance. Due to memory limitations we could not instantiate agents with bundled images larger than 1 MB. Even so, migration using purely FIPA ACL Messages was 56.85 times slower.
8.3 Creation Time

The creation time of agents is the time it takes for an agent to be instantiated. Agents are instantiated by the ReplicatorAgent. On the JADE platform, to create an agent a message must be sent to the AMS. This message contains the class name of the agent that is to be instantiated, its agent name and any additional arguments that must be passed to the created agent. For our measurements we configured the ReplicatorAgent to print out the system time just before it sends the message requesting the creation of an agent. We further print out the system time when the AMS responds with a notification that the agent was created successfully. Lastly, for comparison we had the agent print out the system time as soon as it was created. We performed these tests multiple times and took the average, standard deviation and average deviation of the times recorded. For this test we measured an average time of 22.2 ms with an average deviation of 7.24 ms and a standard deviation of 9.31 ms.

8.4 Deployment Time

The deployment time of agents is the time it takes for an agent to settle on a new platform. This occurs when the agent has just migrated or has been cloned onto a new platform. Typically our agents need to subscribe to a number of services, including authorisation and monitoring services. For our purposes this time excludes the actual migration time, so if the agent needs to retrieve any media assets from a remote location that time is not included here. The deployment time can therefore be determined by the time it takes for an agent to:
1. Register with the directory facilitator the URL it is responsible for.
2. Deposit a policy file for the PDPAgent to use as an access policy for the agent.
3. Subscribe to the PlatformMonitorAgent for platform event notifications.

Steps 1 and 3 depend on FIPA ACL Messages and can be expected to remain roughly constant. Step 2 is an I/O operation whose duration depends on the size of the policy file in bytes.
However, since such files are typically text files, the time for such operations is also expected to be fairly small. We carried out 10 tests to ascertain the times associated with each of these steps and thereby determined the aggregate time. We found this to be between 160 and 800 milliseconds.

8.5 Agent Population

The aim of this test was to determine the total number of agents that could be supported on our platform. In these tests a ReplicatorAgent was created that instantiated MediaAgents in a loop. Upon receiving a notification from the AMS, a counter was incremented to represent a newly created agent. The ReplicatorAgent repeated this loop until all the available resources on the host execution environment were depleted. For this test we were able to create about 1,400 agents in the same container before getting an OutOfMemoryError from the Java Virtual Machine.

8.6 Ping-Pong Protocol

The Ping-Pong protocol is particularly useful for connection establishment. There are three scenarios in which this protocol is used. The first is the case where an execution environment is attempting to establish a connection with a peer. To achieve this, the execution environment periodically sends out Ping messages. The second is the case where two peers have established a connection with each other. In this case updates need to be exchanged between them in order to maintain a consistent state. Finally, if two peers have established a connection but no updates need to be exchanged, KeepAlive messages are exchanged between them to ensure that the connection is kept alive. Table 3 below shows the size of the messages that are used in this protocol.
    Message                    Size (Bytes)
    Ping Message               12
    Pong Message               12
    Refresh_Request Message    16
    Refresh Message            Varies
    Update Message             Varies
    KeepAlive Message          16

Table 3: Size of Ping-Pong Protocol Messages

The refresh and update messages vary in size depending on the number of agents that are resident on a platform and the churn experienced, respectively. These messages are further wrapped as ACLMessages by the JADE platform before being transmitted on the wire.

8.7 Policy Decision Point

The policy decision point (PDP) is another critical system component. It is responsible for evaluating the access policies that are associated with our agents. In a distributed system like this, the time taken to reach such a decision is of utmost importance. For the PDP based on Sun Microsystems' implementation the steps involved are as follows:
1. Instantiation and initialisation (125 ms)
2. Evaluation of a request based on policies (15 ms)

Step 1 involves I/O operations, whereas Step 2 is mostly characterised by XML parsing and string comparison operations. The total time measured was 140 ms for simple access policies. It was beyond the scope of this work to determine how long the PDP would take to evaluate complex policies that are made up of numerous files.

8.8 Inference Engine

The inference engine is a critical component of the media agents. It is exceptionally important for this engine to be able to provide valid and timely decisions. The time taken to reach a decision using a fairly simple rule base was measured to give an indication of how well this engine would perform. This time was found to be 56 ms and was made up of the following components:
1. Instantiating and initialising the rule base (31 ms)
2. Loading the rule variables (0 ms)
3. Forward chaining (25 ms)
   a. Fire rules
   b. Conflict resolution
4. Returning the final decision

It should be noted that for fairly large and complex rule bases step 3 may need to be iterated through a number of times before a final decision is reached.
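To illustrate why step 3 may need several iterations, the following is a minimal forward-chaining sketch in Java. It is not the engine from [63] and does not use its API. The classes ForwardChainer and Rule are hypothetical, premises are reduced to equality tests, and conflict resolution is simplified to "the first rule in list order that sets a variable wins".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified forward chainer (not the engine used in the prototype). */
final class ForwardChainer {
    /** A rule: if every premise variable equals the given value, set one conclusion variable. */
    static final class Rule {
        final String name;
        final Map<String, String> premises;
        final String variable;
        final String value;

        Rule(String name, Map<String, String> premises, String variable, String value) {
            this.name = name;
            this.premises = premises;
            this.variable = variable;
            this.value = value;
        }
    }

    /** Repeats passes over the rule list until a pass fires no rule, then returns all facts. */
    static Map<String, String> infer(List<Rule> rules, Map<String, String> initialFacts) {
        Map<String, String> facts = new HashMap<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                boolean premisesHold = true;
                for (Map.Entry<String, String> premise : rule.premises.entrySet()) {
                    if (!premise.getValue().equals(facts.get(premise.getKey()))) {
                        premisesHold = false;
                        break;
                    }
                }
                // Simplified conflict resolution: a variable that is already set is never overwritten.
                if (premisesHold && !facts.containsKey(rule.variable)) {
                    facts.put(rule.variable, rule.value);
                    changed = true;
                }
            }
        }
        return facts;
    }
}
```

If a rule that derives service = "Poor" from the environment is listed after migrate_decision, the first pass can only fire the service rule and a second pass is needed before migrate_decision fires. The number of passes therefore grows with the depth of such dependencies in the rule base.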
Due to time limitations it was not feasible to determine the time this engine would take to process fairly large and complex rule bases.

Chapter 9 | Discussion

“Convinced myself, I seek not to convince.” - Edgar Allan Poe

This section discusses the measurements that were made on the proof-of-concept prototype that was realised. Attempts are also made to provide analytical models that compare the performance of the proposed system with existing solutions.

9.1 Time Dependent Measurements

Our time-dependent measurements were highly dependent on JADE ACL messages. These messages are asynchronous and there are no guarantees as to when a message will be processed by a recipient. The message is merely placed in an inbox until such a time as the receiving agent executes code to read it. To further complicate this, JADE messages have no timestamps, so we cannot verify when a message was actually sent. Despite these irregularities, the creation and replication times of agents were acceptable for such a system. Both were in the order of milliseconds.

The migration time of agents clearly shows that migrating media assets using FTP is the preferred method. This is also in line with the decision to decouple the media asset from the agent in order to reduce memory consumption. If the media file is held within the agent and the agent is resident in memory, then it can be inferred that the host platform will experience a significant performance penalty in terms of memory utilisation and page swapping. However, decoupling the media from the agent also introduces a security issue. This issue is discussed later in Section 9.13.

9.2 Agent Population

The agent population was quite low considering the number of media assets that a typical CDN would have to host on its surrogate servers. This could be improved somewhat by using a more powerful computer. We could also scale the platform further by spreading the containers across multiple hosts.
Furthermore, we could also attempt to introduce multithreading in our MediaAgents. This would make it possible for a single agent to manage multiple media assets, greatly reducing the number of agents required. All in all, JADE is probably not well suited to supporting a large number of agents on a single JVM. For such applications a more suitable framework should be identified.

9.3 Bundling Overhead

Due to the restrictions identified in the above section it is critical to keep the number of agents hosted on an execution environment low. The typical size of an agent jar file and policy file was found to be no more than 10 KB. The migration tests discussed in section 8.2.2 suggest that a media asset should be no smaller than 1 MB to benefit from the performance gains of the migration strategy the prototype utilises. This suggests that the overhead of bundling an agent with a media asset is almost negligible (~1%). This is further emphasised by the fact that for the selected application, streaming video content, the media assets tend to be extremely large files.

9.4 Network Load

It can be shown that, for certain events that require real-time measurements, the distributed approach used in this work is more favourable than the centralised approach. Traditional CDNs typically need to collect raw data on access statistics, host conditions and so on from each surrogate server in order to decide how to shuffle content around the CDN infrastructure, react to certain events or even perform billing cycles for clients. This framework, by contrast, is able to filter and aggregate raw information and take local decisions. This greatly reduces the amount of information that needs to be transmitted to the central servers for management and control decisions. The proposed framework eliminates the need for the central management and control platform to have real-time data for decision-making.
The logic required to make such decisions is embedded in agents, and thus instead of reporting its raw statistics a platform can merely report its state. This is an important concept in large CDNs where the number of surrogates is large. The sheer volume of data generated in such networks is such that it can cripple network links and place huge burdens on the servers that must process this data. This approach decentralises this task, making it possible for the CDN to react to such events independently based on predefined policies.

9.5 Maintaining Consistency

One of the common problems related to maintaining consistency is updating policies. If a system with n surrogate servers distributed globally is considered, the challenge is to ensure that all of these servers get updated at the same time. This ensures that all servers respond in a similar manner to requests made for a specific media asset. In such a system, if the traditional centralised approach is used, the central server will need to update each of these n servers regardless of where the media asset is. This is necessary because changes in viewer patterns could occur at any time, which would result in the media asset being relocated at a later time. Therefore, with n servers the centralised approach will need to update n server locations with new policies.

In the proposed decentralised system, changes made to a policy need to be applied only in locations where the media asset is located. This means that with n servers only a subset of k server locations will need to be updated, where k <= n. This is because the policies for a media asset are bundled with the asset. In the event that the media asset needs to be relocated, the policy file moves with it to the new location. For popular content k and n are very close, so both approaches are expected to yield comparable performance. However, for long tail content k can be significantly smaller than n.
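The comparison can be made concrete with a small worked calculation. The symbols U_c and U_d and the values n = 1000 and k = 20 are hypothetical, chosen only for illustration, and are not measurements from the prototype.

```latex
\begin{align*}
U_{c} &= n && \text{(centralised: every surrogate is updated)} \\
U_{d} &= k, \quad k \le n && \text{(decentralised: only hosts holding the asset)} \\
\frac{U_{c} - U_{d}}{U_{c}} &= 1 - \frac{k}{n} = 1 - \frac{20}{1000} = 0.98
\end{align*}
```

Under these assumed values the decentralised scheme sends 98% fewer policy updates. For popular content, where k approaches n, the saving tends to zero.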
Since most content online can be characterised as long tail content, it can be said that, probabilistically, content updates in the proposed framework require fewer network resources. Furthermore, content in our framework has an associated agent that is able to reason based on predefined policies. These policies can be fine-grained, offering many possibilities, including requiring the agents to call home at specified intervals to update their internal logic. It can therefore be stated that this added intelligence would greatly simplify the management of content.

9.6 Resilience and Stability

In the proposed CDN prototype, media agents are basically reactive. This means that they remain on a host execution environment, resident in memory, until either the conditions become adverse or a request is made for the media asset they are responsible for. In this regard we can consider them to be relatively stable, because for as long as favourable conditions prevail the agents can be thought of as being at an equilibrium point. In the same vein, agents can be described as being resilient. This is due to their ability to adapt to destabilising events in an effort to return to a state of equilibrium.

The question of resilience can also be addressed from a different perspective by considering the impact of failure on the system. In the proposed framework, failure of a service is equivalent to failure of an agent. Since multiple redundant agents can exist on a single platform, the implications of such an occurrence are minimal. In a traditional CDN framework the failure of a service would suggest server failure. This is a frequent occurrence in large CDNs that use cheap off-the-shelf PCs, as is the case for Akamai [20]. Such a failure, though significantly more serious than agent failure, is circumvented by having multiple redundant servers which are able to take over the role of the failed server.
This approach, in the author’s opinion, is expensive. Typically the setup and maintenance cost of such servers is relatively high, and such deployments cannot be easily orchestrated on a global scale. This further underscores the importance of deploying generic CDN platforms from which multiple CDNs can serve content.

9.7 Matchmaking / Service Discovery

We chose to let the agents control where they migrate or replicate to. This made them seemingly more autonomous, since they did not have to rely on a brokerage service provided by the host execution environment. However, as the number of agents on a platform that need to migrate or replicate at roughly the same time increases, and the number of possible alternate locations becomes high, this might prove to be sub-optimal. Consider a case in which a host execution environment needs to evict all of its agents, possibly for maintenance reasons. If the number of agents that reside on that host is n and the number of alternate locations available is k, then each agent needs to send k messages, for a total of n*k messages. As n and k become large, as is the case in a real CDN infrastructure, this would generate a significant amount of network traffic and consume a lot of computational resources. The number of messages sent could be controlled such that only platforms within close proximity are contacted. However, this does not improve the situation considerably for our agents. Since each agent carries out independent negotiations, it is conceivable that most, if not all, agents may decide to migrate to the same alternate location, thereby depleting the resources on that host execution environment. In order to minimise these risks, a brokerage service could be used on the host machine.
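The eviction scenario can be illustrated with a short sketch. It is not taken from the prototype. The function names, host names and capacities are hypothetical, the greedy "most free capacity first" rule is only one assumed way a broker might spread load, and Python is used purely for brevity.

```python
import heapq


def rfp_messages(num_agents, num_hosts):
    """Messages sent when every agent contacts every alternate host."""
    return num_agents * num_hosts


def independent_choice(agents, free_slots):
    """Each agent negotiates alone and picks the host that looks best."""
    best = max(free_slots, key=free_slots.get)
    return {agent: best for agent in agents}


def brokered_allocation(agents, free_slots):
    """A broker assigns agents one by one to the host with most room left."""
    # Min-heap on negated capacity, so the emptiest host is popped first.
    heap = [(-slots, host) for host, slots in free_slots.items()]
    heapq.heapify(heap)
    placement = {}
    for agent in agents:
        neg_slots, host = heapq.heappop(heap)
        if neg_slots == 0:
            raise RuntimeError("no alternate host has capacity left")
        placement[agent] = host
        heapq.heappush(heap, (neg_slots + 1, host))
    return placement


# Hypothetical eviction: 6 agents, 3 alternate execution environments.
agents = [f"media-agent-{i}" for i in range(6)]
free_slots = {"ee-a": 3, "ee-b": 2, "ee-c": 2}

print("messages without a broker:", rfp_messages(len(agents), len(free_slots)))
print("independent:", independent_choice(agents, free_slots))
print("brokered:   ", brokered_allocation(agents, free_slots))
```

In this toy case the independent agents all select the same host and exceed its assumed capacity, whereas the brokered placement never assigns more agents to a host than it has free slots.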
Rather than sending out requests for proposals each time an agent needs to migrate or replicate, execution environments could periodically advertise and exchange capability information, much like modern routing protocols. These advertisements could be cached by the brokerage service. Upon request from agents, the service would be able to quickly and efficiently advise the agent on possible execution environments to relocate to. In the event that the host execution environment needs to evict its resident agents, this service can expeditiously spread the resident agents across the alternate execution environments, ensuring that no execution environment is overwhelmed.

9.8 Built-in Inference Engine

Another design decision that would adversely affect the performance of our system, should the population of agents resident on an execution environment increase significantly, was the decision to incorporate the inference engine inside the agent. This was expected to make the agents more independent and thus more autonomous. However, since the engine is rather generic and needs to be resident in memory with each agent, it is not difficult to see that much of the available computational resources will be depleted. This is further complicated by the fact that, as the number of rules in the rule base increases, the inference engine may need to iterate through the list of rules a number of times to determine a final outcome. The rule base containing the policies the agent must follow is perhaps the only aspect unique to each agent, and thus an external inference service resident on the host execution environment would probably be a better choice. In fact, similar to the authorisation service where an agent registers its access policy, the agent could also register its rule base with this service and receive notifications of decisions that are reached based on its rule base.
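A shared inference service of this kind can be sketched in a few lines. The sketch is not the engine used in the prototype. The class name, the rule format (a set of premises and one conclusion) and the fact names are all hypothetical, and the naive loop merely shows why a forward-chaining engine may need several passes over the rules. Python is used purely for brevity.

```python
class InferenceService:
    """One engine per host; agents register only their rule bases."""

    def __init__(self):
        self._rule_bases = {}

    def register(self, agent_name, rules):
        # Each rule is a (premises, conclusion) pair.
        self._rule_bases[agent_name] = list(rules)

    def decide(self, agent_name, facts):
        """Naive forward chaining: fire rules until nothing new is derived."""
        known = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self._rule_bases[agent_name]:
                if conclusion not in known and premises <= known:
                    known.add(conclusion)
                    changed = True
        # Report only the newly derived decisions, not the input facts.
        return known - set(facts)


# Hypothetical policy; the second rule enables the first, so the engine
# needs more than one pass over the rule list.
rules = [
    (frozenset({"host_overloaded", "peer_available"}), "migrate"),
    (frozenset({"cpu_high"}), "host_overloaded"),
]

service = InferenceService()
service.register("media-agent-1", rules)
print(service.decide("media-agent-1", {"cpu_high", "peer_available"}))
```

With this arrangement the agent carries only its rule base when it migrates, while the engine stays on the host.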
Moving the inference engine out of the agent will result in agents that are more lightweight and consume fewer resources. These benefits will also reflect positively on the migration time of agents from host to host.

An inference engine can use either forward chaining or backward chaining to arrive at a conclusive decision based on the given facts. In many situations backward chaining is more appropriate and more efficient than forward chaining. Forward chaining was adopted in this work due to its simplicity. The inference engine used does, however, support backward chaining, and future work could compare the performance of agents using forward chaining with that of agents using backward chaining.

9.9 All-Mobile Agent Architecture

In the proposed system only the MediaAgent is mobile. However, it is not difficult to conceive of a system where all of the service agents are also mobile. Such a system would be an ideal solution for provisioning new servers in a CDN. Here a CDN provider would be able to, for instance, acquire a generic execution environment from a third party. Such an execution environment could be based on virtualisation technologies. With this environment the CDN provider would be able to dynamically deploy all of the necessary service agents, in effect expanding their CDN infrastructure. Should the location of this execution environment later be deemed not strategic, the CDN provider merely needs to migrate all of its associated services and terminate the contract with the owner of the platform.

9.10 Global vs Local Optimisation

9.11 Global Decisions vs Local Decisions

One of the key attributes of the proposed SMA architecture is the ability to make local decisions. This has the benefit of not having to rely on a central control platform for local decisions. As a result, decisions can be made in a timely fashion and raw data does not need to be sent to the central control platform for processing. Such processing was found to be extremely processor intensive.
However, there are some merits to taking global decisions. Global decisions take into consideration all of the players in a CDN and can thus be seen as optimal. Local decisions, on the other hand, can result in undesirable conditions. One such condition is where agents find themselves oscillating between two execution environments. This can be avoided through well-written policies that check the locations an agent has visited before migration. Another is a race condition, where two or more instances of the same agent attempt to relocate to the same host simultaneously. This condition is unlikely when making global decisions but is highly probable if each agent is acting independently. One proposed workaround is the inclusion of an additional step in the Contract Net Protocol. In this workaround, an execution environment that responds positively to a certain agent records the name of the agent in a temporary registry. Once the agent migrates to the execution environment, it can erase its name from this registry. Furthermore, the execution environment must use this registry to ensure that it does not respond positively to further calls for proposals from agents with the same name.

9.12 Ping-Pong Protocol

The Ping-Pong Protocol developed for this work was a stopgap measure. The protocol, albeit simple, provides the functionality required for coordinating the execution environments and performed relatively well. However, the protocol was found to be rather chatty. This was a trade-off that was made to make the discovery process more responsive. Though the amount of traffic is negligible when compared with the size of the media assets that would be used in such an infrastructure, it is not easy to identify how this protocol could be further enhanced. The major problem with the protocol stems from the fact that there is no way of knowing when a peer execution environment has become available.
As a result, the protocol simply attempts to establish a connection with the peer by sending ping messages until one of its pings is answered. An enhancement to this protocol could be to reduce the frequency at which a peer sends ping messages if it receives no response. In this regard, the execution environment would send a series of ping messages for a predefined length of time. If no responses are received, the frequency of sending pings could be halved and another series of pings sent. This procedure could be repeated, resulting in a steady decay in the number of ping messages sent. Finally, the execution environment needs to be able to reset the frequency when a ping has been received from a peer.

9.13 Security Considerations

In this section we discuss some of the security concerns that are associated with our system. Most of these concerns are inherent to the JADE platform and cannot be attributed to flaws in our design. Numerous security issues are evident, which we have not attempted to correct due to the scope of this work. We wish only to highlight some key security concerns that could have been addressed but were left out due to time constraints.

9.13.1 Agent on Agent Impersonation

Communication between agents, especially to send commands from the ControllerAgent to media agents, is achieved through ACL messages. Though these messages use serialised objects that cannot be easily read by merely capturing the raw IP packets in which they are sent, it is possible for an agent that knows the format of such messages to impersonate the ControllerAgent and issue commands to other agents in the system. This can enable an agent to instruct other agents to move to certain locations or even to self-destruct. This concern has not been addressed because only the agents that we create are running on our prototype.

9.13.2 Agent on Host Environment

The resources consumed by an agent as it resides on a platform are not closely monitored.
This is due to constraints in the tools that were at our disposal. We were only able to measure the resource usage of the Java runtime environment on which the JADE platform was running. We would have preferred to be able to report this information at class level. This would be very useful for billing purposes, but also as a mechanism to detect Denial of Service attacks by agents on the host environment. In such an attack an agent could merely reserve resources for itself, making it difficult for other agents with legitimate resource requirements to perform their tasks optimally.

9.13.3 Host Environment on Agent

The de-coupling of media from the agent introduces some security risks that are worth mentioning. Since the media file is now resident on the file system, it is prone to abuse by the host environment. That is, anyone who gains access to the host environment could potentially access the media asset. This introduces Digital Rights Management (DRM) issues that are beyond the scope of this work.

9.13.4 Distribution of FTP Login Credentials

The distribution of login credentials is another potential security loophole in our system. So far, all the FTP servers in use in our infrastructure use the same password. However, in a real CDN infrastructure with numerous servers, possibly spread across multiple autonomous systems, the distribution of passwords would be a challenge. This issue is related to Section 8.3.

9.13.5 Delivery of Video Streaming

Another DRM issue that arises is related to the delivery of media to the client. In this work the media agent is presented with the IP address of an authorised user and subsequently streams the media content to that IP address. Anyone listening promiscuously on any of the network segments that this stream traverses en route to that IP address can potentially rebuild the media file. Taking appropriate steps to remedy this security loophole was beyond the scope of this work.
However, encrypting the video stream as it is being delivered is a plausible direction for solving this issue.

9.14 Degree of Autonomy

Huber [51] states that autonomy is commonly conceived as the quantity of work an agent can perform independent of human intervention. By this definition our prototype can be regarded as autonomous. Agents in the proposed prototype are able to independently make decisions that manage both their assigned tasks and their lifecycle.

9.15 Economic Considerations

The use of software mobile agents to deliver content in a Content Delivery Network has many merits. It would provide a means to create heterogeneous CDN infrastructures that can span organisational boundaries. This approach is of particular interest because it presents an opportunity for different CDN vendors to share resources and also lowers the barriers to entry for start-up CDN providers. Today, interoperability between CDN providers is next to impossible due to proprietary technologies and the lack of standardisation in CDN technology. This work proposes a generic execution platform that could be deployed by different CDN providers. Each platform would provide some level of basic services to resident agents. With all of the business logic and intelligence resident within these agents, they would be free to roam the Internet from one platform to another based on predefined policies, delivering the content that they are responsible for. CDN providers would therefore be able to bill each other for the network and computing resources consumed. Such an arrangement is expected to drastically reduce the cost of CDN services. CDNs will no longer have to deploy parallel networks to reach clients serviced by their competitors. On the contrary, a CDN provider would be able to dynamically negotiate with a peer CDN provider and deploy its agents into the competitor’s network. This vision is, however, far from being realised.
For starters, standardisation would be essential before industry-wide acceptance of such an approach would be possible. Furthermore, many of the concerns discussed in this chapter will need to be addressed before such an infrastructure can be made possible. The future work chapter of this manuscript discusses some of the extensions to this work that could make widespread adoption of agent technology in CDNs possible.

Chapter 10 | Conclusions

“It is a bad plan that admits of no modification.” - Publilius Syrus

This work has studied and documented the current state-of-the-art trends and technologies that are being employed in Content Delivery Networks (CDNs) today. The use of Software Mobile Agents (SMAs) to manage CDNs has been investigated, and an innovative approach to building CDN infrastructures using SMAs has been presented. This has afforded us the opportunity to appreciate how SMAs can deal with problems unique to CDNs and helped us to understand the distinct challenges that they introduce.

We have presented a novel SMA architecture that can be used to implement a decentralised management and control framework. Our prototype, developed as a proof of concept, demonstrates that it is technically feasible to develop and deploy a CDN using SMAs.

This work demonstrates that SMAs offer a robust and powerful infrastructure for the deployment, delivery and management of content in a CDN. Through collaboration and intelligence, agents can exhibit high levels of flexibility and autonomy. These features in turn can lead to a lower cost of operations for CDN providers. Another benefit to CDN providers is the possibility of collocating facilities, which is inconceivable using the dissimilar proprietary technologies that are in use today.

This work has demonstrated that agents are able to exhibit high-level interaction through mechanisms such as contract net negotiations and auctions.
These have been demonstrated to be useful for self-management in this work. Such interactions could be useful for dynamic negotiations in multi-vendor CDN infrastructures. In such a scenario, agents from several designers, several vendors and several organisations could cooperate and share resources based on SLAs. Through the use of open standards such as OASIS XACML, SNMP and FIPA, together with application domain ontologies, to ensure openness and heterogeneity, it is relatively easy to envisage how SMAs could become an emerging standard in future deployments of multi-vendor CDNs.

An SMA-based CDN infrastructure offers content providers the benefit of choice. With the proposed architecture a content provider would be able to deploy their media assets to numerous Content Delivery Network providers easily. This would open up many more opportunities, as content providers would not be locked into a single provider. It would also allow a content provider to optimise the delivery of their content on a global scale in terms of cost, quality of experience and availability, among other parameters.

Through the use of simple models we have shown that SMAs can reduce the network load and network traffic significantly. We have also shown that through the use of SMAs a reduction in the amount of data collected throughout the CDN infrastructure for optimisation and billing purposes can be achieved. For certain content delivery scenarios, we have been able to show that SMAs can exhibit a high degree of stability, resilience and fault tolerance.

Finally, through well-defined agent and access policies our CDN architecture is able to make localised decisions, which simplifies configuration and reduces the number of configuration and re-configuration steps. Moreover, for a number of content update scenarios, we have shown that ensuring consistency is a trivial task. This in turn makes it possible for CDN providers to offer more customised and fine-tuned services.
Chapter 11 | Future Work

“For tomorrow belongs to the people who prepare for it today.” - African Proverb

This chapter identifies and describes some of the areas where further work is required. These are proposed as future extensions of this work. A few open questions are also highlighted.

11.1 Custom agent platform

The JADE platform was successfully used to realise the proposed SMA CDN architecture. However, there were some issues with this platform that presented numerous challenges in the realisation of the prototype. Of these issues, multi-threading support was of paramount importance. This was not an issue for the interaction between agents. However, in order to accommodate and interface with external entities, the ability to spawn new threads of execution was vital. A typical application of this capability is handling requests for media. It is therefore proposed, as a future extension to this work, that an agent platform that supports multi-threading be identified. Agent platforms that can provide fault tolerance, inherent security and (semi) real-time performance should also be explored.

11.2 Further measurements of performance

Due to time limitations it was not feasible to quantify the performance of our proposed CDN framework as the number of users, caches and media assets, and the popularity of media assets, increase. As a result, only a minimal view of the overall performance has been provided. Furthermore, this performance has not been benchmarked against other existing solutions. These tasks are also proposed as future extensions to this work. Investigations into how this framework is influenced by factors such as bandwidth, jitter, packet loss, node failure, flash crowds and network latency are also proposed. Lastly, it is proposed that work on developing a CDN-specific ontology be carried out as a first step towards standardising agent-based CDN deployments.
11.3 Peer to peer location of platforms

This work presented the Ping-Pong protocol as a stopgap measure for locating execution environments. The author, however, envisages a mechanism that performs much like a modern routing protocol as a future extension. In this mechanism, the ability to define policies, like those found in the BGP routing protocol, should allow execution environments, both in the same AS and in different ASs, to discover each other. This may even involve the dynamic negotiation of SLAs between peering execution environments in different ASs. This is seen as a necessary step in dynamically composing multi-vendor CDNs.

11.4 Adaptiveness

Though it may be feasible to identify a priori all of the probable scenarios that we expect the agent to encounter, it could be argued that defining these scenarios in the agent policies that were described may not be practical and could even degrade the response time of the agents. Consequently, it is proposed that some form of machine learning be employed by the agents. We therefore propose two approaches that could be adopted as future extensions to this work.

Neural Network Learning:- In this approach most of the parameters that will affect our agents are fairly well understood and are known beforehand. However, the weight that each of these parameters contributes to the final outcome or decision cannot be determined in advance. This is true even for the proposed system. One example is the question of whether memory utilisation should take precedence over CPU utilisation. This approach addresses the issue by adjusting the weights of these factors over time in order to improve the accuracy or correctness of the final outcome.

Data mining:- This is the ability to extract valuable information that is often not obvious from large collections of data.
In this approach it is proposed that some agents be made responsible for analysing patterns in the data collected by an execution environment. This will enable them to learn, through pattern recognition, information that was not available at the outset. A challenge that this presents is how the data could be shared in a CDN infrastructure that spans organisational boundaries.

11.5 Resource Consumption

This is an important question if hosted agents are developed by third parties. Available tools such as top, stats, mem and others only show that the Java runtime is consuming these resources. It is important to find out which classes are actually consuming which resources. This can be useful for billing and for detecting Denial of Service (DoS) attacks by agents on a platform. In such an attack an agent could merely reserve resources for itself, making it difficult for agents with legitimate resource needs to perform their tasks. Another research direction is how virtualisation can be used to address the problem highlighted above.

References

[1] “Internet Growth Statistics (1995-2010).” Available at: http://www.allaboutmarketresearch.com/internet.htm Last accessed: April, 2009.
[2] “YouTube Homepage.” Available at: http://www.youtube.com Last accessed: February, 2009.
[3] “Google Losing up to $1.65M a Day on YouTube.” Available at: http://www.internetevolution.com/author.asp?section_id=715&doc_id=175123& Last accessed: February, 2009.
[4] “Market Size For Video CDN.” Available at: http://blog.streamingmedia.com/the_business_of_online_vi/2007/12/market-sizefor.html Last accessed: February, 2009.
[5] “Uptime Definition - Wikipedia, the free encyclopedia.” Available at: http://en.wikipedia.org/wiki/Uptime Last accessed: February, 2009.
[6] N.R. Jennings, K. Sycara, and M. Wooldridge, “A Roadmap of Agent Research and Development,” Autonomous Agents and Multi-Agent Systems, vol. 1, Mar. 1998, pp. 7-38.
[7] D. Lange, M. Oshima, G. Karjoth, and K. Kosaka, “Aglets: Programming mobile agents in Java,” Worldwide Computing and Its Applications, 1997, pp. 253-266.
[8] D.B. Lange and M. Oshima, Programming and Deploying Java Mobile Agents with Aglets, Addison-Wesley Longman Publishing Co., Inc., 1998.
[9] “Internet Definition - Wikipedia, the free encyclopedia.” Available at: http://en.wikipedia.org/wiki/Internet Last accessed: February, 2009.
[10] “RFC 791 - Internet Protocol.” Available at: http://tools.ietf.org/html/rfc791 Last accessed: February, 2009.
[11] S. Halabi, Internet Routing Architectures (2nd Edition), Cisco Press, 2000.
[12] “RFC 4271 - A Border Gateway Protocol 4 (BGP-4).” Available at: http://www.ietf.org/rfc/rfc4271.txt Last accessed: February, 2009.
[13] “RFC 2328 - OSPF Version 2.” Available at: http://www.ietf.org/rfc/rfc2328.txt Last accessed: February, 2009.
[14] “RFC 2453 - RIP Version 2.” Available at: http://www.ietf.org/rfc/rfc2453.txt Last accessed: February, 2009.
[15] M. Welzl, Network Congestion Control: Managing Internet Traffic, Wiley, 2005.
[16] N. Bartolini, E. Casalicchio, and S. Tucci, “A Walk through Content Delivery Networks,” Lecture Notes in Computer Science, 2004, pp. 1-25.
[17] “Network congestion - Wikipedia, the free encyclopedia.” Available at: http://en.wikipedia.org/wiki/Network_congestion Last accessed: February, 2009.
[18] G. Pallis and A. Vakali, “Insight and perspectives for content delivery networks,” Communications of the ACM, vol. 49, 2006, pp. 101-106.
[19] “Multiprotocol Label Switching - Wikipedia, the free encyclopedia.” Available at: Last accessed: February, 2009.
[20] “Experiences with Scalable Network Operations at Akamai, at USENIX LISA '07.” Available at: Last accessed: February, 2009.
[21] “The Long Tail - Wikipedia, the free encyclopedia.” Available at: http://en.wikipedia.org/wiki/The_Long_Tail Last accessed: February, 2009.
[22] “Akamai Technologies.” Available at: http://www.akamai.com/ Last accessed: April, 2009.
[23] “How Akamai Works.” Available at: http://research.microsoft.com/enus/um/people/ratul/akamai.html Last accessed: October, 2009.
[24] “Limelight Networks.” Available at: http://uk.limelightnetworks.com/index.php Last accessed: April, 2009.
[25] P. De Lancie, “Delivering under pressure,” EContent, vol. 26, 2003, pp. 26-30.
[26] “BitTorrent DNA.” Available at: http://www.bittorrent.com/dna/ Last accessed: April, 2009.
[27] X. Liu, H. Yin, C. Lin, Y. Liu, Z. Chen, and X. Xiao, “Performance Analysis and Industrial Practice of Peer-Assisted Content Distribution Network for Large-Scale Live Video Streaming,” Advanced Information Networking and Applications, 2008. AINA 2008. 22nd International Conference on, 2008, pp. 568-574.
[28] D. Rayburn, “The State of the Content Delivery Market, 2008,” Streaming Media Magazine, 2008, pp. 36-38, 40-41.
[29] B. Molina, C. Palau, and M. Esteve, “Modeling content delivery networks and their performance,” Computer Communications, vol. 27, 2004, pp. 1401-1411.
[30] W.-Y. Ma, B. Shen, and J. Brassil, “Content Services Network: The Architecture and Protocols,” 2001.
[31] F. Bellifemine, G. Caire, A. Poggi, and G. Rimassa, “JADE a White Paper.” Available at: http://sharon.cselt.it/projects/jade/papers/2003/WhitePaperJADEEXP.pdf Last accessed: April, 2009.
[32] D.B. Lange, “Mobile Objects and Mobile Agents: The Future of Distributed Computing?,” ECOOP’98 - Object-Oriented Programming, 1998, p. 1.
[33] S. Franklin and A. Graesser, “Is it an Agent, or Just a Program? : A Taxonomy for Autonomous Agents.” Available at: http://www.msci.memphis.edu/~franklin/AgentProg.html Last accessed: October, 2008.
[34] F.L. Bellifemine, G. Caire, and D. Greenwood, Developing Multi-Agent Systems with JADE, Wiley, 2007.
[35] S. Covaci, “Grasshopper: The First Reference Implementation of the OMG MASIF.” Available at: http://www.omg.org/docs/orbos/98-04-05.pdf Last accessed: October, 2008.
[36] I. Gorton, J. Haack, D. McGee, A. Cowell, O. Kuchar, and J. Thomson, “Evaluating Agent Architectures: Cougaar, Aglets and AAA,” Software Engineering for Multi-Agent Systems II, 2004, pp. 264-278.
[37] D. Wong, N. Paciorek, T. Walsh, J. DiCelie, M. Young, and B. Peet, “Concordia: An Infrastructure for Collaborating Mobile Agents,” Lecture Notes in Computer Science, 1997, pp. 86-97.
[38] M. Dikaiakos, M. Kyriakou, and G. Samaras, “Performance Evaluation of Mobile-Agent Middleware: A Hierarchical Approach,” Mobile Agents, 2001, pp. 244-259.
[39] “MASIF Specification.” Available at: http://www.omg.org/docs/orbos/97-10-05.pdf Last accessed: October, 2008.
[40] “The Foundation for Intelligent Physical Agents (FIPA).” Available at: http://www.fipa.org/ Last accessed: February, 2009.
[41] “List Of Video Delivery Networks Now Tops 50 Providers | The Business Of Online Video.” Available at: http://blog.streamingmedia.com/the_business_of_online_vi/2008/05/list-of-video-d.html Last accessed: November, 2008.
[42] “Academic Careers for Experimental Computer Scientists and Engineers.” Available at: http://www.nap.edu/openbook.php?record_id=2236 Last accessed: December, 2008.
[43] M. Pěchouček and V. Mařík, “Industrial deployment of multi-agent technologies: review and selected case studies,” Autonomous Agents and Multi-Agent Systems, vol. 17, Dec. 2008, pp. 397-431.
[44] N. Reed, “A User Controlled Approach to Adjustable Autonomy,” System Sciences, 2005. HICSS '05. Proceedings of the 38th Annual Hawaii International Conference on, 2005, p. 295b.
[45] W. Truszkowski and C. Rouff, “Progressive autonomy: a method for gradually introducing autonomy into space missions,” Software Engineering Workshop, 2002. Proceedings. 27th Annual NASA Goddard/IEEE, 2002, pp. 164-171.
[46] M. Tentori, J. Favela, and M. Rodriguez, “Privacy-Aware Autonomous Agents for Pervasive Healthcare,” Intelligent Systems, IEEE, vol. 21, 2006, pp. 55-62.
[47] C. Frei and B. Faltings, “A dynamic hierarchy of intelligent agents for network management,” Intelligent Agents for Telecommunication Applications, 1998, pp. 1-16.
[48] N. Jennings and S. Bussmann, “Agent-based control systems: Why are they suited to engineering complex systems?,” Control Systems Magazine, IEEE, vol. 23, 2003, pp. 61-73.
[49] “Iberdrola.” Available at: http://www.iberdrola.es/webibd/corporativa/iberdrola?IDPAG=ESWEBINICIO Last accessed: April, 2009.
[50] “Daimler Chrysler Group.” Available at: http://www.daimler.com/ Last accessed: April, 2009.
[51] M.J. Huber, “Agent Autonomy: Social Integrity and Social Independence,” Information Technology, 2007. ITNG '07. Fourth International Conference on, 2007, pp. 282-290.
[52] V. Marik and D. McFarlane, “Industrial adoption of agent-based technologies,” Intelligent Systems, IEEE, vol. 20, 2005, pp. 27-35.
[53] “JADE Inter-Platform Mobility Service.” Available at: http://sourceforge.net/projects/jipms/ Last accessed: March, 2009.
[54] “JADE API Documentation v3.6.1.” Available at: http://jade.tilab.com/doc/api/index.html Last accessed: March, 2009.
[55] N. Santoro, “Mobile Agents Computing: Security Issues and Algorithmic Solutions,” Theoretical Computer Science, 2005, p. 22.
[56] G. Karjoth, D. Lange, and M. Oshima, “A Security Model for Aglets,” Mobile Agents and Security, 1998, pp. 188-205.
[57] D. Chess, C. Harrison, and A. Kershenbaum, “Mobile agents: Are they a good idea?,” Mobile Object Systems Towards the Programmable Internet, 1997, pp. 25-45.
[58] V. Gunupudi and S. Tate, “SAgent: a security framework for JADE,” Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, May, 2006.
[59] L. Crow and N. Shadbolt, “IMPS - Internet Agents for knowledge engineering.” Available at: http://eprints.ecs.soton.ac.uk/2322/1/index.html Last accessed: December, 2009.
[60] “FIPA Contract Net Interaction Protocol Specification.” Available at: http://www.fipa.org/specs/fipa00029/SC00029H.html Last accessed: February, 2009.
[61] “Prometheus Design Tool.” Available at: http://www.cs.rmit.edu.au/agents/pdt/ Last accessed: January, 2009.
[62] “FIPA ACL Message Structure Specification.” Available at: http://www.fipa.org/specs/fipa00061/SC00061G.html Last accessed: February, 2009.
[63] J.P. Bigus and J. Bigus, Constructing Intelligent Agents Using Java: Professional Developer's Guide, 2nd Edition, Wiley, 2001.
[64] “Ubuntu Home Page.” Available at: http://www.ubuntu.com/ Last accessed: February, 2009.
[65] “Netbeans Integrated Development Environment.” Available at: http://www.netbeans.org/ Last accessed: February, 2009.
[66] “RFC 1157 - A Simple Network Management Protocol (SNMP).” Available at: http://www.ietf.org/rfc/rfc1157.txt Last accessed: February, 2009.
[67] “OASIS eXtensible Access Control Markup Language (XACML).” Available at: http://www.oasis-open.org/committees/xacml/repository/cs-xacml-specification-01-1.pdf Last accessed: February, 2009.
[68] “Sun's XACML Implementation.” Available at: http://sunxacml.sourceforge.net/ Last accessed: February, 2009.
[69] “Commons Net - Jakarta Commons Net.” Available at: http://commons.apache.org/net/ Last accessed: February, 2009.
[70] “Java bindings for the VideoLan Client (jvlc).” Available at: http://sourceforge.net/projects/freshmeat_jvlc/ Last accessed: February, 2009.
[71] “Sandia National Laboratories.” Available at: http://www.sandia.gov/ Last accessed: February, 2009.
[72] B. Balachandran, “Developing Intelligent Agent Applications with JADE and JESS,” Knowledge-Based Intelligent Information and Engineering Systems, 2008, pp. 236-244.
[73] “Integrating JADE and Jess.” Available at: http://sharon.cselt.it/projects/jade/doc/tutorials/jade-jess/jade_jess.html Last accessed: January, 2009.