Simulative Evaluation of the Greediness Alignment Algorithm
Louis-Marie Loe, Emmanuel Twumasi Appiah-Bonna
Zurich, Switzerland
Student IDs: 05-310-214, 12-755-625
Master Project – Communication Systems Group, Prof. Dr. Burkhard Stiller
Supervisors: Patrick Poullie, Thomas Bocek
Date of Submission: March 20, 2015

University of Zurich, Department of Informatics (IFI)
Communication Systems Group (CSG)
Binzmühlestrasse 14, CH-8050 Zurich, Switzerland
URL: http://www.csg.uzh.ch/

Abstract

Cloud computing emerged recently as the leading technology for delivering reliable, secure, fault-tolerant, sustainable and scalable computational services, which are offered as Software, Infrastructure and Platform as a Service (SaaS, IaaS, PaaS). These services may be provided in private datacenters (private clouds), offered commercially to users (public clouds), or combined in hybrid clouds. With the rise of SCIs (Shared Computing Infrastructures) in general and of cloud SCIs in particular, the fair allocation of multiple resources rapidly gains relevance in communication systems. In particular, different resources like CPU, RAM, disk space and bandwidth have to be shared among users with different demands, such that the overall outcome can be considered fair. Investigating resource allocation mechanisms in cloud SCIs and the applicable fairness mechanisms requires a cloud simulator. This project is about the design, implementation and testing of a cloud infrastructure simulator, also referred to as the cloud simulator in this project. The cloud simulator is implemented using the Openstack cloud technology. Furthermore, we design and implement an elastic load simulator as a distinct software component that is nevertheless integrated with the cloud simulator. The load simulator simulates the sharing of resources requested by VMs (Virtual Machines) running on compute hosts in the cloud using a fairness metric. Additionally, we present the results obtained from the implemented cloud simulator as well as the results obtained from the implemented load simulator. We finally identify potential further work.

Acknowledgments

We would like to express our gratitude to our supervisors, Patrick Poullie and Dr. Thomas Bocek, for their support throughout the entire course of this project. Their patient guidance, encouragement, explanations and useful critiques contributed in a remarkable way to the success of this project. Our grateful thanks also go to Prof. Dr. Burkhard Stiller, who gave us the opportunity to work on a project that was interesting and challenging.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Motivation
  1.2 Description of Work
  1.3 Thesis Outline
2 Related Work
  2.1 Optimal Joint Multiple Resource Allocation [5]
  2.2 Multi-dimensional Resource Allocation [6]
  2.3 Dominant Resource Fairness [14]
  2.4 Multi-Resource Allocation [7]
  2.5 Limitation
3 SCI Architectures Concepts
  3.1 Cluster Computing Infrastructure [1]
  3.2 Grid Computing Infrastructure [1]
  3.3 Cloud Computing Infrastructure [1]
4 Virtualization Concepts
  4.1 Hardware Virtualization [4]
  4.2 Role of VMM [3]
  4.3 CPU Virtualization [3]
  4.4 Memory Virtualization [3]
  4.5 Device and I/O Virtualization [3]
  4.6 Hypervisor Technologies in Cloud SCIs
5 Resource Allocation in Clouds
  5.1 How Cloud Resources are bundled
  5.2 The Role of the Cloud Scheduler
  5.3 The Role of the Hypervisor or Compute Host
  5.4 Cloud Consolidation Ratio
  5.5 Memory Overcommitment Techniques [13]
  5.6 CPU Overcommitment Techniques [17]
  5.7 Hypervisors Resource Allocation Techniques [2]
6 Search for a suitable Cloud Simulation Tool
  6.1 Comparison of CloudSim and Openstack [8]
  6.2 Openstack Architecture Overview [12]
  6.3 VMs provisioning in the Openstack Cloud [12]
  6.4 Architecture of Openstack Keystone [12]
  6.5 Architecture of Openstack Nova [12]
  6.6 Architecture of Openstack Glance [12]
7 Overview of the Cloud Infrastructure Simulator
  7.1 Physical Configuration
  7.2 Cloud Resources, Tenants and VMs
  7.3 Nova Compute FakeDriver and the Cloud Simulator
  7.4 Principle of Decoupling and VMs Creation in the Cloud
  7.5 Cloud Simulator high Level Design Principles
8 Overview of the Load Simulator
  8.1 Logical Architecture Overview
  8.2 Reader Layer
  8.3 Validation, Aggregation and Grouping Layer
  8.4 Load Consumption, Time Translation and Reporting Layer
  8.5 Input Parameter Design: Load Design
  8.6 Physical Time in the Load Simulator
  8.7 The Fair Share Metric
  8.8 Allocation and Reallocation Design
9 Evaluation
  9.1 Output of some implemented Cloud Primitives
  9.2 Load Simulator Tests and Results
  9.3 Discussion on the Load Simulator Results
  9.4 Further Work
10 Summary and Conclusion
Bibliography
Abbreviations
Glossary
List of Figures
List of Tables
A Report on Milestones Implementation
  A.1 Guiding Principle 1
  A.2 Guiding Principle 2
  A.3 Milestone 1: Search for a suitable Cloud Simulation Tool
  A.4 Milestone 2: Comparison to Simulator Integration into Openstack
  A.5 Milestone 3: Decision on Alternatives
  A.6 Milestone 4: Input Parameter Design
  A.7 Milestone 5: Reallocation Design
  A.8 Milestone 6: Consumption Data Design
  A.9 Milestone 7: Implementation
  A.10 Milestone 8: Evaluation
B User Guide
  B.1 Implemented Code Statistics
  B.2 Use-Case 1: Running Simulations with existing Cloud Components
  B.3 Use-Case 2: Running Simulations with new Cloud Components
  B.4 Use-Case 3: Exploring the Cloud using the implemented Primitives
C Contents of the CD

Chapter 1 Introduction

1.1 Motivation

With the rise of SCIs such as clusters, grids and clouds, the fair allocation of multiple resources rapidly gains relevance in communication systems. In particular, different resources like CPU, RAM, disk space and bandwidth have to be shared among users with different demands, such that the overall outcome can be considered fair. Although several recent studies have focused on resource allocation in clouds, they have done so without presenting the physical, logical and system architecture underlying the cloud SCIs. Since this underlying cloud SCI architecture is undergoing constant innovation, we found it beneficial in this study to present it along with the investigated resource allocation mechanisms to achieve more relevance and accuracy. Resource allocation mechanisms that are applicable to centralized SCIs such as clusters are completely different from those applicable to clouds. Hence, any meaningful study of resource allocation mechanisms in SCIs, including cloud SCIs, requires an in-depth understanding of the workings of the system architecture that underlies the SCI under study. Moreover, investigating resource allocation mechanisms in cloud SCIs and the applicable fairness mechanisms requires a cloud simulator where timely, repeatable and controllable methodologies for investigating these algorithms can be applied. We design and implement such a cloud simulator in this study.

1.2 Description of Work

In this study we place a special focus on the cloud SCI architecture. We present virtualization as the key foundation of resource sharing in clouds, together with its related technologies. Further, we design and implement a cloud infrastructure simulator using the Openstack technology. In addition, we design and implement an elastic load simulator that is a distinct software component but is integrated with the cloud infrastructure simulator. The load simulator uses real-time cloud state to simulate resource allocation among different VMs running on any compute host in the cloud such that the overall outcome at any time is consistent with the fairness metric used.
As such it is able to simulate resource allocation for a few VMs or for the entire cloud.

1.3 Thesis Outline

In chapter 2, we review related work. Chapter 3 is about SCI architecture concepts, including cluster, grid and cloud SCIs. Chapter 4 deals with virtualization concepts, where we present RAM, CPU and I/O virtualization; further, we present modern-day cloud hypervisor technologies. In chapter 5, we cover resource allocation in clouds by defining RAM and CPU overcommitment techniques as well as hypervisor allocation and reclaiming techniques. Chapter 6 begins with a summary comparison between the CloudSim simulation tool and the Openstack cloud technology, followed by a detailed presentation of the Openstack cloud technology. In chapters 7 and 8, we present the implemented cloud infrastructure simulator and the implemented load simulator respectively. The evaluation in chapter 9 shows the results of some sample simulations performed with the implemented load simulator. The mapping of the implementation of the initial milestones to the present report is found in Appendix A. Appendix B contains the code statistics along with the user guide.

Chapter 2 Related Work

In this chapter, we review four studies related to multi-resource allocation in shared SCIs.

2.1 Optimal Joint Multiple Resource Allocation [5]

This study models the cloud as allocating the required amount of multiple types of resources simultaneously from a common resource pool for a certain period of time for each request. The allocated resources are used by a single request and not shared. The study then proposes a new resource allocation method that considers only the identified resources in the selection of a center. This method adopts a Best-Fit approach and aims to reserve as much capacity as possible for future requests that may require a larger amount of processing. In addition, the proposed method aims to reduce the possibility that a deadlock situation will occur. This improves on the current allocation method, which is based on only one type of resource and uses Round-Robin. Round-Robin does not consider the situation of both processing ability and bandwidth in the resource allocation. The proposed method tends to reduce the request loss probability in comparison to the use of Round-Robin.
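To make the Best-Fit idea concrete, the following sketch (our own illustration with made-up host and resource names, not the exact algorithm of [5]) picks, among the hosts that can satisfy a multi-resource request, the one that leaves the least spare capacity, so that larger future requests are more likely to still fit somewhere:

```python
# Illustrative sketch: Best-Fit placement over two resource types (CPU, bandwidth).
# Among the hosts that can satisfy the whole request, pick the one whose remaining
# slack after placement is smallest, keeping the larger hosts free for big requests.

def best_fit_host(hosts, request):
    """hosts: dict name -> {'cpu': free_cpu, 'bw': free_bandwidth};
    request: dict with the same keys. Returns a host name or None."""
    candidates = [
        (name, free) for name, free in hosts.items()
        if all(free[r] >= request[r] for r in request)
    ]
    if not candidates:
        return None  # the request is lost / rejected
    # choose the candidate with the smallest total slack over all requested resources
    return min(candidates,
               key=lambda cand: sum(cand[1][r] - request[r] for r in request))[0]

hosts = {'h1': {'cpu': 8, 'bw': 10}, 'h2': {'cpu': 3, 'bw': 4}}
print(best_fit_host(hosts, {'cpu': 2, 'bw': 3}))   # -> 'h2' (tightest fit)
```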
2.2 Multi-dimensional Resource Allocation [6]

The starting point of this study is the following remark: cloud resource allocation is typically restricted to optimizing up to three objective functions, namely cost, makespan and data locality. The study then proposes an optimization over four objectives: makespan, cost, data locality and the satisfaction level of the user. To evaluate the proposed allocation scheme, the study proposes a multi-cloud workflow framework architecture developed using the CloudSim simulation toolkit. Figure 2.1 shows the multi-cloud workflow framework architecture based on CloudSim.

Figure 2.1: Multi-cloud workflow framework architecture based on CloudSim [6]

A closer look at the inner workings of the proposed framework reveals the following details: in a first step, the workflow engine receives a workflow description and the SLA requirements from the user. After parsing the description, the workflow engine applies different clustering techniques to reduce the number of workflow tasks. The match-maker then selects the cloud resources that fit the user-given requirements by applying different matching policies. After that, all the requested virtual machines (VMs) and cloud storage are deployed on the selected clouds, and the workflow engine transfers the input data from the client to the cloud storage and then starts to release the workflow tasks with respect to their execution order. As presented above, the proposed framework aims at the optimization of four objective functions, which represents an improvement over existing systems that can optimize only three objective functions.

2.3 Dominant Resource Fairness [14]

This study proposes Dominant Resource Fairness (DRF), a generalization of max-min fairness, to address the problem of allocating multiple resource types. The following scenario is provided as an example of a situation requiring a fair allocation of multiple resources: a system consisting of 9 CPUs and 18 GB RAM, and two users: user A runs tasks that require ⟨1 CPU, 4 GB⟩ each, and user B runs tasks that require ⟨3 CPUs, 1 GB⟩ each. What constitutes a fair allocation policy for this case, where a max-min fair allocation policy must handle multiple resources and heterogeneous requests? To address this gap, the study proposes DRF. The study requires the following important properties of the DRF scheme:

1. Sharing incentive: Each user should be better off sharing the cluster than exclusively using her own partition of the cluster. Consider a cluster with identical nodes and n users. Then a user should not be able to allocate more tasks in a cluster partition consisting of 1/n of all resources.

2. Strategy-proofness: Users should not be able to benefit by lying about their resource demands. This provides incentive compatibility, as a user cannot improve her allocation by lying.

3. Envy-freeness: A user should not prefer the allocation of another user. This property embodies the notion of fairness.

4. Pareto efficiency: It should not be possible to increase the allocation of a user without decreasing the allocation of at least another user. This property is important as it leads to maximizing system utilization subject to satisfying the other properties.

The study presents a basic DRF scheduling algorithm and a weighted variant. Figure 2.2 shows that the proposed DRF scheme performs better than slot-based fair sharing and CPU-only fair sharing.

Figure 2.2: Number of large jobs completed for each allocation scheme, comparing DRF against slot-based fair sharing and CPU-only fair sharing [14]

As presented above, DRF addresses the problem of heterogeneous multi-resource sharing.
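The progressive-filling intuition behind DRF can be illustrated with a short sketch (our own code, not the algorithm listing from [14]): repeatedly hand one task to the user whose dominant share, i.e. the largest fraction of any single resource the user currently holds, is smallest. With the numbers from the example above it reproduces the allocation reported in the paper:

```python
# Sketch of DRF progressive filling for the 9 CPU / 18 GB example in the text.

capacity = {'cpu': 9.0, 'ram': 18.0}
demand = {'A': {'cpu': 1.0, 'ram': 4.0},   # user A: <1 CPU, 4 GB> per task
          'B': {'cpu': 3.0, 'ram': 1.0}}   # user B: <3 CPUs, 1 GB> per task
alloc = {u: {'cpu': 0.0, 'ram': 0.0} for u in demand}
free = dict(capacity)

def dominant_share(user):
    # largest fraction of any single resource currently held by the user
    return max(alloc[user][r] / capacity[r] for r in capacity)

while True:
    user = min(demand, key=dominant_share)            # lowest dominant share first
    if any(free[r] < demand[user][r] for r in capacity):
        break                                          # next task no longer fits
    for r in capacity:
        alloc[user][r] += demand[user][r]
        free[r] -= demand[user][r]

print(alloc)  # A ends with 3 CPUs / 12 GB (3 tasks), B with 6 CPUs / 2 GB (2 tasks)
```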
2.4 Multi-Resource Allocation [7]

This study notes that fairness can be quantified with a variety of metrics. Moreover, different notions of fairness, including proportional and max-min fairness, can be achieved through various techniques. However, the study contends that when it comes to allocating multiple types of resources, there has been much less systematic study. Indeed, it is unclear what it means to say that a multi-resource allocation is fair. For example, datacenters allocate different resources (memory, CPUs, storage, bandwidth) to competing users with different requirements. One such user might have computational jobs requiring more CPU cycles than memory, while another might have the opposite requirements.

Figure 2.3: Example of multi-resource requirements in datacenters [7]

Using Figure 2.3, the paper presents the following example to highlight several multi-resource allocation fairness questions: user 1 requires 2 GB of memory and 3 CPUs per job, while user 2 needs 2 GB of memory and 1 CPU per job. There is a total of 6 GB of memory and 4 CPUs. Many allocations might be considered fair in this example: should users be allocated resources in proportion to their resource requirements, or should they be allocated resources so as to process equal numbers of jobs? Moreover, the paper cites datacenters that sell bundles of CPUs, memory, storage and network bandwidth as examples of the multi-resource allocation problem. As a result, the paper presents mathematical functions that could help to define fairness. These functions include FDS (Fairness on Dominant Shares) and GFJ (Generalized Fairness on Jobs), two families of fairness functions that could be used for multi-resource allocation. Further mathematical fairness theory, including the axioms of continuity, saturation, partition and starvation, is presented in the appendix of the paper. By developing the FDS and GFJ functions, the study aims to contribute to the formalization of fairness theory.

2.5 Limitation

In this work we design, implement and test a cloud infrastructure simulator based on the Openstack cloud technology. Moreover, we implement a load simulator using a fair share metric. The cloud infrastructure simulator we design and implement, as well as the load simulator, are meant to support further research on fairness in cloud resource allocation mechanisms at the CSG. Although the load simulator is integrated into the cloud infrastructure simulator, it is a distinct and separate component.

Chapter 3 SCI Architectures Concepts

In this chapter we present an architectural overview of three of the main SCI types: cluster, grid and cloud [1]. This serves as the foundation of the subsequent discussion and helps to put the use of the term cloud SCI and its related architecture into perspective.

3.1 Cluster Computing Infrastructure [1]

A cluster, as presented in Figure 3.1, is a group of computers with a direct network interconnect, centralized management, and distributed execution facilities. In a cluster, the centralized management includes authorization and authentication, a shared filesystem, and application execution and management. The distributed execution facilities include the execution of jobs; multiple units of the same parallel job may reside on separate resources. One of the main usages of cluster computing is batch processing. With respect to the present work, the following cluster functional components are of interest: the resource manager, which monitors the compute infrastructure, launches and supervises jobs, and cleans up after termination; the job manager/scheduler, which allocates resources and time slots (scheduling); and the workload manager, which handles policy and orchestration of jobs: fair share, workflow orchestration, QoS, SLA. Some standard scheduling schemes used in cluster systems include First Come First Served, Shortest Job First, priority-based scheduling and fair-share scheduling. In the context of resource sharing, the two scheduling schemes of interest are priority-based scheduling and fair-share scheduling. Following are further characteristics of the scheduling schemes used in clusters.
Priority-based scheduling: In this scheme, the priority function is usually the weighted sum of various contributions, including the requested run time, the number of processors, the wait time in the queue, the recent usage by the same user/group (fair share), and an administrator-set QoS.

Fair-share scheduling: Fair-share scheduling assigns higher priorities to users/groups that have not used all of their resource quota (usually expressed in CPU time). It uses a variety of parameters such as the window length: how much historical information is kept and used for calculating resource usage; the interval: how often resource utilization is computed; and the decay: weights applied to resource usage in the past (e.g. 2 hours of CPU time one week ago might weigh less than 2 hours of CPU time today).

Figure 3.1: Overview of a cluster computing architecture [1]. Fairness can be introduced by modifying the scheduler and resource allocation manager

Possibility of introducing single-resource or multi-resource allocation fairness in clusters: In the cluster, all users' jobs compete for the same physical resources. These users' jobs are centrally submitted and managed by the cluster scheduler and the resource manager. In the cluster SCI, introducing a novel single-resource or multi-resource fairness scheme would imply modifying the behaviour of both the scheduler and the resource manager to achieve the desired fairness outcome. It should be noted that some cluster schedulers already implement some sort of fairness. However, in order to complement existing fairness schemes or to introduce a novel fairness scheme, there is a need to modify both the scheduler and the resource manager. An example of a cluster is the UZH main HPC cluster Schroedinger.

3.2 Grid Computing Infrastructure [1]

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities. From a system viewpoint, a grid can be viewed as an aggregation of computational clusters for the execution of a large number of batch jobs. As shown in Figure 3.2, it is typically geographically distributed and resources come from multiple domains (or clusters).

Figure 3.2: Overview of a grid computing architecture [1]. Fairness can be introduced at the domain or cluster level

Using the discovery service, the client host selects one cluster and submits a job there. It then periodically polls for status information. From the foregoing, resource allocation considerations in the grid are similar to those described for the cluster, with the supplementary step of using a directory service to locate a cluster or domain, the reason being that resource allocation decisions are taken locally by each cluster independently of the others. The main purpose of the discovery service is to publish the clusters available in the grid to the user. An example of a grid is the Swiss Multi-Science Computing Grid (SMSCG).

3.3 Cloud Computing Infrastructure [1]

Cloud computing is a model for enabling convenient on-demand network access to a shared pool of virtualized, configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned over the internet and released with minimal management effort or service provider interaction.
In a cloud computing infrastructure, virtualization is the foundation for resource allocation and sharing. Thus, studying resource allocation in cloud computing environments requires an in-depth understanding of the fundamentals of virtualization. Resource allocation and sharing mechanisms in cloud SCIs are not applicable to SCIs where virtualization is not central to the allocation mechanisms; these include traditional cluster and grid SCIs, where virtualization technology is not the mechanism on which resource allocation is based. Cloud computing can be defined as internet-based computing in which large groups of remote servers, storage arrays and network equipment are networked, virtualized, dynamically provisioned and orchestrated via a cloud operating system, allowing the creation of a pool of compute, storage and network resources which can be allocated to a user's VMs on demand. We will examine resource allocation in cloud SCIs in detail in a later chapter. The important components of a modern-day cloud infrastructure architecture include the following:

Compute: responsible for the instantiation of VMs.
Identity: responsible for authentication and authorization.
Image Repository: used to locate and retrieve the images used in instantiating VMs.
Storage: block or object storage used to store images and users' data.
Network: responsible for defining complete, complex user networks using Software Defined Networking (SDN).
Telemetry: used to store, process and retrieve the cloud metrics.

An overview of a cloud infrastructure architecture is shown in Figure 3.3.

Figure 3.3: Overview of a cloud SCI architecture showing an Openstack cloud [1]

Chapter 4 Virtualization Concepts

In this chapter we define hardware virtualization, the VMM, CPU virtualization, memory virtualization and I/O virtualization.

4.1 Hardware Virtualization [4]

The term virtualization broadly describes the separation of a service request from the underlying physical delivery of that service. With x86 hardware virtualization, a virtualization layer is added between the hardware and the operating system, as shown in Figure 4.1. The virtualization layer (hypervisor) is the software responsible for hosting and managing all virtual machines on a host. This virtualization layer allows multiple operating system instances to run concurrently within virtual machines on a single computer, dynamically partitioning and sharing the available physical resources such as CPU, storage, memory and I/O devices. For standard x86 systems, virtualization approaches use either a hosted or a hypervisor architecture. Figure 4.1 shows a hosted virtualization approach, or type 2 hypervisor. A hosted architecture installs and runs the virtualization layer as an application on top of an operating system. Figure 4.2 shows a hypervisor virtualization approach using a type 1 hypervisor. A type 1 hypervisor is mostly referred to simply as a hypervisor or bare-metal hypervisor, as it sits directly on the hardware: it installs the virtualization layer directly on standard x86 hardware. Since it has direct access to the hardware resources rather than going through an operating system, a type 1 hypervisor is more efficient than a type 2 hypervisor and delivers greater scalability, robustness and performance. Most production clouds almost exclusively use type 1 hypervisors. Figure 4.3 shows a different view of x86 hardware virtualization using a type 1 hypervisor.
In the cloud simulator we designed and implemented in the framework of the present work, however, we used a type 2 hypervisor, i.e. a hosted virtualization solution (Oracle Virtualbox), to create the controller and the compute nodes. Within the compute nodes we use the QEMU hypervisor, which is an emulated type 1 hypervisor (emulated KVM hypervisor).

Figure 4.1: x86 virtualization overview: a hosted virtualization or type 2 hypervisor. The hypervisor runs on an OS. Examples: Oracle Virtualbox, VMware Player [4]

Figure 4.2: x86 virtualization overview: a hypervisor virtualization or type 1 hypervisor. The hypervisor runs on bare metal and is most widely used in productive clouds. Examples: KVM, XEN, VMware ESXi, Microsoft Hyper-V [4]

Figure 4.3: x86 virtualization overview: a virtualization layer is added between the hardware and the operating system [3]

4.2 Role of VMM [3]

Within any given type 1 hypervisor architecture there exists a key component, sometimes called a Virtual Machine Monitor (VMM), that implements the virtual machine hardware abstraction and is responsible for running a VM. Each VMM has to partition and share the CPU, memory and I/O devices of the physical host and present them to the VM as fully virtualized resources. Figure 4.4 shows an overview of a VMM within a hypervisor.

Figure 4.4: VMM architecture overview: each VMM partitions physical resources and presents them to VMs as virtual resources [3]

Figure 4.5: x86 privilege level architecture with no virtualization implemented [3]

4.3 CPU Virtualization [3]

x86 operating systems are designed to run directly on the bare-metal hardware, so they naturally assume they fully own the computer hardware. The x86 architecture offers four levels of privilege, known as Ring 0, 1, 2 and 3, to operating systems and applications to manage access to the computer hardware. While user-level applications generally run in Ring 3, operating systems must have direct access to the hardware and hence must run in Ring 0. Figure 4.5 shows an overview of the x86 hardware access privilege levels with no virtualization implemented. Virtualizing an x86 processor therefore poses the challenge of placing a virtualization layer under the OS, which expects to run in the most privileged Ring 0. The virtualization layer will in turn be responsible for creating VMs and their hardware (resource provisioning or assignment). To address these challenges, several CPU virtualization technologies have been developed, including binary translation, paravirtualization and hardware-assisted virtualization. These virtualization technologies are implemented in hypervisors running in modern-day clouds. Following is a brief presentation of these techniques.

Full virtualization using binary translation: This technique translates the guest OS kernel code to replace non-virtualizable instructions with new sequences of instructions that have the intended effect on the virtual hardware. Meanwhile, user-level code is executed directly on the processor to achieve a good level of performance.

OS-assisted virtualization, also called paravirtualization: In this approach the kernel of the guest OS is modified to replace non-virtualizable instructions with hypercalls that communicate directly with the virtualization layer (hypervisor). The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and time keeping.

Hardware-assisted virtualization: In hardware-assisted virtualization, virtualization technologies such as Intel VT-x and AMD-V are built right into the CPU chipset through a new CPU execution mode called the root mode.
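Whether a given compute host offers these extensions can be checked from user space; the following Linux-specific sketch (added here for illustration, not part of the project code) looks for the vmx (Intel VT-x) and svm (AMD-V) CPU flags in /proc/cpuinfo:

```python
# Linux-specific sketch: detect the hardware virtualization extensions discussed
# above by inspecting the CPU flags exposed in /proc/cpuinfo.

def hardware_virt_support(cpuinfo_path='/proc/cpuinfo'):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith('flags'):
                flags = set(line.split(':', 1)[1].split())
                if 'vmx' in flags:
                    return 'Intel VT-x'
                if 'svm' in flags:
                    return 'AMD-V'
    return None  # fall back to binary translation or paravirtualization

print(hardware_virt_support() or 'no hardware-assisted virtualization')
```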
4.4 Memory Virtualization [3]

Besides the CPU, the x86 memory must also be virtualized. This involves sharing the physical system memory and dynamically allocating it to virtual machines. Virtual machine memory virtualization is very similar to the virtual memory support provided by modern operating systems such as Linux. Applications see a contiguous address space that is not necessarily tied to the underlying physical memory in the system. The operating system keeps the mappings of virtual page numbers to physical page numbers stored in page tables. All modern x86 CPUs include a memory management unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory performance. To run multiple virtual machines on a single system, another level of memory virtualization is required: the MMU has to be virtualized to support the VM (guest OS). The guest OS continues to control the mapping of virtual addresses to the guest physical memory addresses, but the guest OS cannot have direct access to the actual physical machine memory. The VMM is responsible for mapping guest physical memory to the actual physical machine memory, and it uses shadow page tables to accelerate the mappings. The VMM uses the TLB hardware to map the virtual memory directly to the machine memory to avoid the two levels of translation on every access. When the guest OS changes the virtual-memory-to-physical-memory mapping, the VMM updates the shadow page tables to enable a direct lookup. MMU virtualization creates some overhead, which can be mitigated by using hardware-assisted virtualization. Figure 4.6 shows an overview of memory virtualization.

Figure 4.6: x86 memory virtualization. The VMM is responsible for mapping the VM physical memory to the host physical memory [3]
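The two levels of mapping described above can be pictured with a toy sketch (the page numbers are arbitrary; this illustrates the idea, not any real MMU data structure): the guest page table maps guest-virtual to guest-physical pages, the VMM maps guest-physical to machine pages, and the shadow page table caches the composed mapping so that a lookup needs only one step.

```python
# Toy illustration of the two mapping levels and the shadow page table.

guest_page_table = {0: 7, 1: 3}      # guest virtual page  -> guest physical page
vmm_page_table   = {7: 42, 3: 19}    # guest physical page -> machine page

# Shadow page table maintained by the VMM: guest virtual page -> machine page
shadow = {gv: vmm_page_table[gp] for gv, gp in guest_page_table.items()}

def translate(guest_virtual_page):
    return shadow[guest_virtual_page]   # a single lookup instead of two

print(translate(0), translate(1))  # -> 42 19
```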
4.5 Device and I/O Virtualization [3]

In addition to CPU and memory virtualization, x86 hardware devices and I/O must also be virtualized. Virtualizing the x86 devices involves managing the routing of I/O requests between virtual devices and the shared physical hardware. In most modern-day deployments, I/O virtualization is done in software, in contrast to a direct pass-through to the hardware. This approach enables a set of new features and simplified management. For example, with networking, creating virtual NICs (vNICs) and virtual switches allows the creation of virtual networks. Virtual networks consume no bandwidth on the physical network as long as the traffic is not destined for a VM running on a different physical host. As such, sharing bandwidth on modern-day clouds does not always involve sharing the physical bandwidth; this becomes necessary only when VM traffic must exit the compute host (hypervisor). The hypervisor uses software to virtualize the physical hardware and presents each virtual machine with a standardized set of virtual devices. These virtual devices effectively emulate well-known hardware and translate the virtual machine requests to the system hardware. Figure 4.7 shows an overview of device and I/O virtualization.

Figure 4.7: x86 device and I/O virtualization. The hypervisor uses software to emulate virtual devices and I/O and translates VM requests to the system hardware [3]

4.6 Hypervisor Technologies in Cloud SCIs

Most modern-day hypervisors typically implement a combination of virtualization technologies while offering support for others. The most often implemented technologies include full virtualization using binary translation, OS-assisted virtualization or paravirtualization, and hardware-assisted virtualization using chipset virtualization extensions such as Intel VT-x and AMD-V. In the following, we briefly present the four main modern-day hypervisors used in clouds: KVM, VMware ESXi, XEN and Microsoft Hyper-V.

KVM hypervisor: KVM is one of the most widely used hypervisors in clouds. The KVM hypervisor uses a combination of hardware-assisted virtualization and paravirtualization. Hardware-assisted virtualization is used for the core CPU and memory virtualization by leveraging the Intel and AMD processor virtualization extensions. The processor extensions enable running fully isolated virtual machines at native hardware speeds for some workloads. KVM supports paravirtualization for device drivers to improve I/O performance; this paravirtualized driver support is implemented in the virtio modules.

VMware ESXi: While VMware ESXi implements full virtualization using binary translation, it also takes full advantage of hardware-assisted virtualization in modern chipsets and of paravirtualization in the form of paravirtualized drivers to achieve higher virtualization performance. Rather than implementing code to emulate real-world I/O devices, VMware ESXi writes code for simpler virtual devices that are practical for all purposes and yet achieve greater levels of performance. These paravirtualized drivers ship in the form of VMware Tools.

XEN hypervisor: The XEN hypervisor uses paravirtualization and supports hardware-assisted virtualization. Paravirtualization requires modification of the OS kernel to support the guests. This implies that guest OSes running on the XEN hypervisor must be virtualization-aware. To address this disadvantage for Linux guest OSes, most recent Linux distributions have built-in drivers to run unmodified on XEN. XEN achieves a lower virtualization overhead because the operating system and hypervisor work together more efficiently, without the overhead imposed by the emulation of the system's hardware resources. This can allow virtual disks and virtual network cards to operate at near-native hardware performance.

Main differences between XEN and KVM: Xen is an external hypervisor and as such it assumes control of the physical machine and divides resources among guests. KVM, on the other hand, is part of Linux and uses the regular Linux scheduler and memory management. This means that KVM is much smaller and simpler to use; for example, KVM can swap guests to disk in order to free RAM. While KVM only runs on processors that support the Intel VT and AMD-V instruction extensions, Xen also allows running modified guest OSes on CPUs without hardware assistance.

Main differences between KVM and QEMU: It should be noted that KVM and QEMU are two related hypervisors, sometimes called KVM-QEMU. While the QEMU hypervisor uses emulation, KVM uses processor extensions for virtualization. QEMU allows a user to use a VM as a compute host. Thus, QEMU is the hypervisor used in the cloud simulator infrastructure we implemented in this project.
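On a KVM/QEMU compute host, the running VMs and the resources promised to them can be inspected through libvirt; the sketch below assumes the libvirt Python bindings and a local qemu:///system connection are available (this is generic libvirt usage, not code from our simulator).

```python
# Sketch: list the domains (VMs) a KVM/QEMU host runs and the vCPUs and RAM
# promised to each of them, using the libvirt Python bindings.
import libvirt

conn = libvirt.openReadOnly('qemu:///system')   # local KVM/QEMU hypervisor
try:
    for dom in conn.listAllDomains():
        state, max_mem_kib, mem_kib, vcpus, cpu_time_ns = dom.info()
        print('%-20s vCPUs=%d RAM=%d MiB' % (dom.name(), vcpus, max_mem_kib // 1024))
finally:
    conn.close()
```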
Microsoft Hyper-V hypervisor: Just like the KVM hypervisor and XEN, Microsoft Hyper-V uses hardware-assisted virtualization technology. As such, any hardware on which Hyper-V is run requires a processor with HVM (hardware virtualization extensions) instruction sets such as Intel VT-x and AMD-V. It should be noted that most recent x86 processors are built with these extensions. Also like KVM, Hyper-V supports paravirtualization to improve I/O performance.

Chapter 5 Resource Allocation in Clouds

In this chapter we look at how cloud resources are bundled, the role of the cloud scheduler in the resource allocation process and the role of the compute host (hypervisor), and we explain the cloud consolidation ratio. Next, we present resource overcommitment and reclaiming techniques used in cloud SCIs.

5.1 How Cloud Resources are bundled

In a cloud SCI, resources are bundled in flavors (resource templates). These flavors are associated with VMs at creation time. After this association is successfully done, the newly created VM inherits the resources bundled in the flavor. Flavors encapsulate the maximum resources intended for the VM. These include the maximum RAM size, the maximum number of vCPUs and the maximum disk size. VMs also obtain other resources at creation time that are not part of a flavor. For example, network bandwidth is not part of a flavor, but a VM will obtain a vNIC (Virtual Network Interface Card) for its networking needs at creation time. Some hypervisors allow the resizing of a VM to make room for more resources in the VM. This resizing is a permanent operation, and from the resizing point onward the VM will be bound by its new resource limits. Resource control in the cloud is done at two layers: at the cloud scheduler layer and at the compute host layer.

5.2 The Role of the Cloud Scheduler

In cloud SCIs, the role of the cloud scheduler can be summarized as follows: given a request for a VM from a cloud user, find a suitable compute host in the cloud that has enough resources to create and host the user's VM. If no suitable host able to satisfy the user's VM creation request is found in the cloud, the request is denied and an error message is generated. These mechanisms imply that the validity of the user's request, the user's authentication and the user's authorization limits (quotas) have already been checked. Further, the placement of workloads or jobs in VMs is done at the VM layer by the user. As the cloud technology evolves, new tools are being developed to help the user automate the placement of workloads to his VMs in the cloud; in the Openstack cloud technology one such tool is Heat. But even with such tools, the users' workloads remain tied to the resources available in their VMs. The cloud scheduler's main responsibility is to orchestrate the dynamic placement of VMs on compute hosts in the cloud. This has led to new challenging research use-cases in clouds, such as the optimization of VM placement in clouds along a number of dimensions such as locality.
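The decision just described can be sketched as a filter-and-weigh step over the hosts' free capacities (a simplification we add for illustration; it mirrors the idea, not the actual nova-scheduler code, and the host names and flavor fields are made up):

```python
# Sketch of a cloud scheduler decision: filter out hosts that cannot satisfy the
# flavor, then weigh the remaining candidates (here: prefer the most free RAM).

def schedule(hosts, flavor):
    """hosts: dict name -> free capacity {'vcpus', 'ram_mb', 'disk_gb'};
    flavor: requested amounts with the same keys."""
    candidates = [name for name, free in hosts.items()
                  if all(free[k] >= flavor[k] for k in flavor)]        # filter
    if not candidates:
        raise RuntimeError('No valid host found: request denied')
    # weigh: keep the cloud spread out by picking the host with the most RAM left
    return max(candidates, key=lambda n: hosts[n]['ram_mb'] - flavor['ram_mb'])

hosts = {'compute1': {'vcpus': 8, 'ram_mb': 16384, 'disk_gb': 200},
         'compute2': {'vcpus': 2, 'ram_mb': 4096, 'disk_gb': 50}}
flavor = {'vcpus': 2, 'ram_mb': 2048, 'disk_gb': 20}
print(schedule(hosts, flavor))   # -> 'compute1'
```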
5.3 The Role of the Hypervisor or Compute Host

The hypervisor or compute host has the responsibility to use its virtualization technology to manage resource allocation and reallocation, as well as resource reclaiming from its running VMs. To this end it uses its own scheduler. This means that the runtime allocation of resources to the VMs themselves is handled by the compute host scheduler. At this stage, the cloud scheduler has already determined that the hypervisor can host the VM, and hence the VM has already been placed on the hypervisor. When a user starts a huge workload on a small-size VM (a VM with few resources), the compute host will not increase the VM size to accommodate the huge workload. On the contrary, this will result in slow processing or, in case of extreme resource scarcity, even in the failure of such processing. The compute host will always allocate resources to a VM under the constraint that the amount of these resources can never exceed the VM size. Assume the compute host has already allocated all of its resources to some VMs. What happens when another VM hosted on the same compute host requests resources? The compute host will gradually reclaim resources from other VMs and allocate them to the requesting VM, under the constraint that no running job in the other VMs should fail. In the rare case where reclaiming resources from running VMs could lead to the failure of already running workloads, the hypervisor will not satisfy the new request for resources for a period of time, leading to further waiting time on the part of the requesting VM [2]. At this point the cloud scheduler can also intervene to automatically and transparently place the requesting VM on another compute host with enough resources, if such a host exists in the cloud.

5.4 Cloud Consolidation Ratio

A key benefit of virtualization is the ability to consolidate multiple workloads onto a single computer system. It enables users to consolidate virtual hardware on fewer physical hardware resources, thereby using hardware resources efficiently. To achieve a higher utilization rate, a higher VM density and thus a better cloud consolidation ratio, cloud SCIs make use of resource overcommitment [2]. For instance, the amount of overcommitted memory in a cloud is the amount of memory the cloud pretends to have; this amount is usually higher than what the cloud actually has, which is the amount of the physical resource itself. The consolidation ratio is a measure of the virtual hardware that has been placed on physical hardware. For example, if a cloud has a consolidation ratio of 2, the VMs created in the cloud have together been promised twice the amount of physical resources available in the cloud. A higher consolidation ratio typically indicates greater efficiency. We can infer from the foregoing that resources assigned to a VM at creation time are not actually allocated at creation time; they are promised to the VM at that time and are allocated to the VM when it actually has to process some workload. The idea behind overcommitment is that all VMs will seldom request their maximum resources simultaneously. The advantages modern clouds derive from a higher consolidation ratio include savings in power consumption, capital expense and administration costs. The degree of savings depends on the ability to overcommit hardware resources such as memory, CPU cycles, I/O and network bandwidth. It should be noted that the same techniques used to overcommit resources in clouds are equally used to reclaim resources. We next consider some memory and CPU overcommitment techniques [2], [3], [4].
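Before turning to those techniques, a small worked example makes the consolidation ratio defined above concrete (the numbers are invented for illustration):

```python
# Worked example (made-up numbers): the consolidation ratio compares the virtual
# hardware promised to VMs through their flavors with the physical hardware
# actually present in the cloud.

physical = {'vcpus': 32, 'ram_gb': 128}
vms = [{'vcpus': 4, 'ram_gb': 16}] * 10 + [{'vcpus': 2, 'ram_gb': 8}] * 12

promised = {k: sum(vm[k] for vm in vms) for k in physical}
ratio = {k: promised[k] / float(physical[k]) for k in physical}
print(promised)   # {'vcpus': 64, 'ram_gb': 256}
print(ratio)      # consolidation ratio of 2.0 for both resources
```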
5.5 Memory Overcommitment Techniques [13]

Memory overcommitment enables a higher consolidation ratio in a hypervisor. Using memory overcommitment, users can consolidate VMs on a physical machine such that physical resources are utilized in an optimal manner while delivering good performance.

Memory ballooning is a technique in which the host instructs a cooperative guest to release some of its assigned memory so that it can be used for another purpose. This technique can help shift memory pressure from the host onto a guest.

Kernel Same-page Merging (KSM) uses a kernel thread that scans previously identified memory ranges for identical pages, merges them together, and frees the duplicates. Systems that run a large number of homogeneous virtual machines benefit most from this form of memory sharing.

Memory swapping: Using this technique, a hypervisor follows the concepts of virtual memory overcommitment traditionally used in Linux systems. Memory pages requested by a process are not allocated until they are actually used. Using the Linux page cache, multiple processes can save memory by accessing files through shared pages; as memory is exhausted, the system can free up memory by swapping less frequently used pages to disk.

These techniques can result in a substantial difference between the amount of memory that is allocated and the amount actually used by VMs, leading to higher consolidation. In KVM, for example, VMs are seen by the Linux host simply as Linux processes.

5.6 CPU Overcommitment Techniques [17]

A virtual CPU assigned to a VM equates to a physical core in the hypervisor, but when the VM attempts to process something, it can potentially run on any of the cores that happen to be available at that moment in the hypervisor. The hypervisor scheduler handles this, and the VM is not aware of it. One can also assign multiple vCPUs to a VM, which allows it to run concurrently across several cores of the hypervisor as long as the process being run supports some form of parallelism (simultaneous usage of multiple cores). Cores are shared between all VMs as needed; for example, we could have a 4-core hypervisor and 10 VMs running on it with 2 vCPUs assigned to each. VMs share all the cores in the hypervisor quite efficiently, as determined by the hypervisor scheduler, leading to the maximum use of under-utilized resources and a higher consolidation ratio. However, if the VMs are so busy that they have to contend for CPU time, the outcome is that VMs may have to wait for CPU time. Although this is transparent to the VMs and managed by the hypervisor scheduler, it results in processing delays in the VMs. At the cloud level, the cloud scheduler can be configured in advance to limit the amount of CPU and memory overcommitment in the entire cloud. This requires prior knowledge of the cloud's resource needs.

5.7 Hypervisors Resource Allocation Techniques [2]

KVM hypervisor: VMs are regular processes in KVM, and therefore standard memory management techniques like swapping apply. For Linux guests, a balloon driver is installed and controlled by the host via the balloon monitor command. Some hosts also support kernel same-page merging (KSM). KVM requires host and guest OSes to support memory overcommitment; a guest OS that does not support memory overcommitment cannot run on KVM.

VMware ESXi: ESXi works for all guest OSes. In addition to ballooning, it also uses content-based page sharing and memory compression. This approach improves VM performance as compared to the use of only ballooning and hypervisor-level swapping.
Xen hypervisor: Xen uses a mechanism called dynamic memory control (DMC) to implement memory reclamation. It works by proportionally adjusting memory among running VMs based on predefined minimum and maximum memory. VMs generally run with maximum memory, and the memory can be reclaimed via a balloon driver when memory contention occurs in the host. However, Xen does not provide a way to overcommit the host physical memory, hence its consolidation ratio is largely limited. Xen provides a memory management mechanism to manage all host idle memory and guest idle memory. The idle memory is collected into a pool and distributed based on the demand of running VMs. This approach requires the guest OS to be paravirtualized, and only works well for guests with non-concurrent memory pressure.

Microsoft Hyper-V: Hyper-V uses dynamic memory to support memory overcommitment. With dynamic memory, each VM is configured with a small initial RAM when powered on. When the guest applications require more memory, a certain amount of memory will be hot-added to the VM and the guest OS. When a host lacks free memory, a balloon driver will reclaim memory from other VMs and make memory available for hot-adding to the demanding VM. In rare and restricted scenarios, Hyper-V will swap VM memory to a host swap space.

Chapter 6 Search for a suitable Cloud Simulation Tool

In this chapter, we present a comparison overview of CloudSim [8] and Openstack [12], followed by a detailed Openstack presentation. The detailed Openstack presentation includes the Keystone identity service, the Nova compute service and the Glance image service. These are the three most critical services used in the cloud simulator. They are also directly related to the implementation of the load simulator, which includes a Nova reader and a Keystone reader.

6.1 Comparison of CloudSim and Openstack [8]

The layered cloud computing architecture is shown in Figure 6.1 [8]. To research the complex mechanisms underlying cloud SCIs, including the resource allocation mechanisms along with their fairness schemes, we need an adequate cloud simulator. The flexibility offered by a simulator is the ability to design, implement and test without affecting any production environment. Further, one can improve an initial design over time without worrying about the cost related to using a productive cloud. In relation to the present project, an adequate cloud simulation tool should therefore have the functionality to enable the creation and management of cloud components. This should include the ability to create and simulate IaaS components such as compute hosts, VMs and tenants. Ideally, such a tool should be open source, have extensive documentation and enjoy wide acceptance both in the academic research community and in industry research labs. While there are several tools aiming at simulating real-world clouds, we restricted our focus to two such tools: CloudSim and Openstack. We investigated both tools using the aforementioned desirable qualities. Table 6.1 summarizes the results of our investigation. Although both CloudSim and Openstack can be used as cloud simulators, the following four critical reasons determined our final decision to adopt Openstack. These reasons can be inferred from Table 6.1:
Figure 6.1: Layered cloud computing architecture [8]

Table 6.1: Comparison of the CloudSim and Openstack cloud simulation tools

Parameter | CloudSim | OpenStack
Platform | SimJava | Linux
License type | Open source | Open source
Speed of execution | Moderate, built on Java | Fast, built on Python
Extent of adoption | Limited: Uni Melbourne, European universities, Microsoft | Worldwide: HP, MIT, Berkeley, CERN, IBM, Cisco, Google, NASA, Microsoft
Physical model | None, no CloudSim cloud technology | Full, productive Openstack cloud technology
Documentation | Limited: developer guides | Extensive: admin, architect and developer guides
Ease of creating IaaS components | Limited, no hypervisor technology support | Extensive, supports KVM, Hyper-V, ESXi
Integration into the Openstack cloud | None, no such integration exists | By default

1. CloudSim remains a cloud simulator: it has no associated cloud technology. Openstack has become not only the standard in cloud technology, but it can also be used as a cloud simulator.

2. Ease of integration with Openstack: We were specifically required to look for a cloud simulator that easily integrates with the Openstack technology. The Openstack cloud simulator is built into the Openstack cloud technology itself.

3. Available documentation: While CloudSim has some available documentation, the Openstack documentation, including the user guides, the architecture guides and the development guides, is much more extensive.

4. Widespread use and a tool for the future: While CloudSim has been around for several years and is widely used, recent developments have made Openstack the industry-standard cloud technology. While most industry research labs have embraced Openstack, we are convinced the academic community will follow soon after overcoming the initial steep learning curve.

Openstack appears to be much more complex than CloudSim. However, this complexity is due to the fact that Openstack has become the de facto cloud technology and not just a cloud simulator tool. Most of the cloud services built therein can also be used for simulative purposes. This makes Openstack a very powerful cloud simulation tool with much more functionality than CloudSim.

6.2 Openstack Architecture Overview [12]

The OpenStack project as a whole is designed to deliver a massively scalable cloud operating system. To achieve this, each component or service is designed to work with other components to provide a complete Infrastructure as a Service (IaaS). This integration is facilitated through public application programming interfaces (APIs) that each component offers and that other components and users alike consume. While these APIs allow each of the services to use another service, they also allow a developer to modify any service transparently to the user as long as the APIs remain unchanged. These APIs are available both to other cloud services and to the cloud end-users/tenants. The Openstack release used in this project is the Icehouse release. Figure 6.2 shows the architecture overview of the Openstack cloud infrastructure.

Figure 6.2: Openstack cloud architecture overview [12]

After reviewing several sources, we found it beneficial to present the overview of each service from both the Wikipedia [11] and the Openstack.org [12] perspective.

Compute (Nova) is the control layer of the Infrastructure-as-a-Service (IaaS) cloud computing platform. It allows control over instances and networks, and manages and controls access to the cloud through users and projects. The Nova compute service does not include virtualization software. Instead, it defines drivers that interact with the underlying virtualization mechanisms that run on the hypervisor, and exposes functionality over a web-based API [12]. Compute (Nova) is a cloud computing fabric controller, which is the main part of an IaaS system. It is designed to manage and automate pools of computer resources and can work with widely available virtualization technologies, as well as bare metal and high-performance computing (HPC) configurations. KVM, Xen, Hyper-V and Linux container technology such as LXC are all supported [11].
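As an illustration of the web-based APIs just mentioned, the following sketch drives Keystone and Nova over plain HTTP with the requests library; the endpoints, credentials and IDs are placeholders for an Icehouse-era deployment, and real deployments differ in details such as API versions.

```python
# Sketch: authenticate against Keystone's v2.0 API, then call the Nova API to
# list flavors and boot a server. All names, IDs and endpoints are placeholders.
import requests

KEYSTONE = 'http://controller:5000/v2.0'
NOVA = 'http://controller:8774/v2/<tenant-id>'   # placeholder tenant id

auth = {'auth': {'tenantName': 'demo',
                 'passwordCredentials': {'username': 'demo', 'password': 'secret'}}}
token = requests.post(KEYSTONE + '/tokens', json=auth).json()['access']['token']['id']

headers = {'X-Auth-Token': token}
# list the flavors (resource bundles) Nova offers
print(requests.get(NOVA + '/flavors', headers=headers).json())

# ask Nova to create an instance from an image and a flavor
server = {'server': {'name': 'vm-1', 'imageRef': '<image-id>', 'flavorRef': '1'}}
print(requests.post(NOVA + '/servers', headers=headers, json=server).status_code)
```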
Identity Service (Keystone) performs the following functions: tracking users and their permissions, and providing a catalog of available services with their API endpoints. When implementing the Identity Service, one must register each service to be made available in the cloud infrastructure. The Identity Service can then track which Openstack services are available and where they are located on the network [12]. Identity Service (Keystone) provides a central directory of users mapped to the Openstack services they can access. It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication including standard username and password credentials, token-based systems and AWS-style (Amazon Web Services) logins. Additionally, the catalog provides a queryable list of all of the services deployed in an Openstack cloud in a single registry. Users and third-party tools can programmatically determine which resources they can access [11].

Networking (Neutron) is a system for managing networks and IP addresses. Openstack Networking ensures the network is not a bottleneck or limiting factor in a cloud deployment, and gives users self-service ability, even over network configurations. Users can create their own networks, control traffic, and connect servers and devices to one or more networks. Administrators can use software-defined networking (SDN) technology like OpenFlow to support high levels of multi-tenancy and massive scale. Openstack Networking provides an extension framework that can deploy and manage additional network services such as intrusion detection systems (IDS), load balancing, firewalls and virtual private networks (VPN) [11]. Networking (Neutron) allows the creation and attachment of interface devices managed by other Openstack services to networks. Plug-ins can be implemented to accommodate different networking equipment and software, providing flexibility to Openstack architecture and deployment [12].

Object Storage (Swift) is a scalable redundant storage system. Objects and files are written to multiple disk drives spread throughout servers in the data center, with the Openstack software responsible for ensuring data replication and integrity across the storage cluster. Storage clusters scale horizontally simply by adding new servers. Should a server or hard drive fail, Openstack replicates its content from other active nodes to new locations in the cluster. Because Openstack uses software logic to ensure data replication and distribution across different devices, inexpensive commodity hard drives and servers can be used [11]. Object Storage (Swift) is a multi-tenant object storage system.
It is highly scalable and can manage large amounts of unstructured data at low cost through a RESTful HTTP API [12]. Block Storage (Cinder) provides persistent block-level storage devices for use with Openstack compute instances. The block storage system manages the creation, attaching and detaching of the block devices to servers. Block storage volumes are fully integrated into Openstack Compute and the Dashboard allowing for cloud users to manage their own storage needs [11]. Block Storage (Cinder) adds persistent storage to a virtual machine. Block Storage provides an infrastructure for managing volumes, and interacts with Openstack Compute to provide volumes for instances. The service also enables management of volume snapshots, and volume types [12]. Image Service (Glance) provides discovery, registration, and delivery services for disk and server images. Stored images can be used as a template. It can also be used to store and catalog an unlimited number of backups. The Image Service can store disk and server images in a variety of back-ends, including Openstack Object Storage [11]. Image Service (Glance) is central to Infrastructure-as-a-Service (IaaS). It accepts API requests for disk or server images, and image metadata from end users or Openstack Compute components. It also supports the storage of disk or server images on various repository types, including Openstack Object Storage [12]. Telemetry (Ceilometer) provides a single point of contact providing all the counters across all current Openstack components. The delivery of counters is traceable and auditable, the counters must be easily extensible to support new projects, and agents doing data collections should be independent of the overall system [11]. The Telemetry module performs the following functions: efficiently collects the metering data about the CPU and network costs; collects data by monitoring notifications sent from services or by polling the infrastructure; configures the type of collected data to meet various operating requirements. It accesses and inserts the metering data through the REST API; expands the framework to collect custom usage data by additional plug-ins; produces signed metering messages that cannot be repudiated [12]. Dashboard (Horizon) provides administrators and users a graphical interface to access, provision, and automate cloud-based resources. The design accommodates third party products and services, such as billing, monitoring, and additional management tools. The dashboard is one of several ways users can interact with Openstack resources [11]. Dashboard (Horizon) is a modular Django web application that provides a graph- 28 CHAPTER 6. SEARCH FOR A SUITABLE CLOUD SIMULATION TOOL ical interface to Openstack services [12]. Orchestration (Heat) is a service to orchestrate multiple composite cloud applications using templates, through both an Openstack-native REST API and a cloud formationcompatible Query API [11]. The Orchestration module provides a template-based orchestration for describing a cloud application, by running Openstack API calls to generate running cloud applications. The software integrates other core components of Openstack into a one-file template system. The templates allow you to create most Openstack resource types, such as instances, floating IPs, volumes, security groups and users. This enables Openstack core projects to receive a larger user base. The service enables deployers to integrate with the Orchestration module directly or through custom plug-ins [12]. 
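Because each of these services is consumed through its public API, a short, hedged sketch may help make this concrete: the snippet below authenticates against Keystone, inspects the service catalog and lists the current tenant's instances. The package names and call signatures correspond to the Icehouse-era python-keystoneclient and python-novaclient and may differ in other releases; the endpoint URL and credentials are placeholders for this report, not values taken from the implemented cloud simulator.

```python
# Hedged sketch: consume the Keystone and Nova public APIs with the
# Icehouse-era Python clients. URL and credentials are placeholders.
from keystoneclient.v2_0 import client as keystone_client
from novaclient import client as nova_client

AUTH_URL = "http://ctr01.mgmt.local:5000/v2.0"   # controller endpoint (placeholder)

# Authenticate against the Identity service and obtain a token implicitly.
keystone = keystone_client.Client(username="admin",
                                  password="secret",
                                  tenant_name="admin",
                                  auth_url=AUTH_URL)

# The service catalog lists every registered service and its type.
for service in keystone.services.list():
    print(service.name, service.type)

# The same credentials are accepted by the Compute API.
nova = nova_client.Client("2", "admin", "secret", "admin", auth_url=AUTH_URL)
for server in nova.servers.list():
    print(server.name, server.status)
```

The remaining services described below are consumed in the same manner through their respective client libraries.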
Database Service (Trove) is a database-as-a-service providing relational and nonrelational database engine [11]. The Database service (Trove) provides scalable and reliable cloud provisioning functionality for both relational and non-relational database engines. Users can quickly and easily use database features without the burden of handling complex administrative tasks. Cloud users and database administrators can provision and manage multiple database instances as needed. The Database service provides resource isolation at high performance levels, and automates complex administrative tasks such as deployment, configuration, patching, backups, restores, and monitoring [12]. 6.3 VMs provisioning in the Openstack Cloud [12] In this section, we present a detailed VM provisioning process description in the cloud using the Openstack cloud software as shown in Figure 6.3. It assumes all the cloud services involved and their respective software components have been successfully implemented. But in a user-specific implementation such as the cloud infrastructure implemented in this project, some cloud services deemed unnecessary such as Neutron for the network or Cinder for block storage have not been implemented. Instead we have implemented Nova legacy networks and Nova legacy storage in these cases. Following are the detailed steps of an instance provisioning process: 1. The dashboard or CLI gets the user credentials and authenticates with the Identity Service via REST API. The Identity Service authenticates the user with the user credentials, and then generates and sends back an auth-token which will be used for sending the request to other components through REST-call. 2. The dashboard or CLI converts the new instance request specified in launch instance or nova-boot form to a REST API request and sends it to nova-api. 3. nova-api receives the request and sends a request to the Identity Service for validation of the auth-token and access permission. The Identity Service validates the token and sends updated authentication headers with roles and permissions. 4. nova-api checks for conflicts with nova-database. nova-api creates initial database entry for a new instance. 6.3. VMS PROVISIONING IN THE OPENSTACK CLOUD [12] 29 5. nova-api sends the rpc.call request to nova-scheduler expecting to get updated instance entry with host ID specified. 6. nova-scheduler picks up the request from the queue. 7. nova-scheduler interacts with nova-database to find an appropriate host via filtering and weighing. nova-scheduler returns the updated instance entry with the appropriate host ID after filtering and weighing. nova-scheduler sends the rpc.cast request to nova-compute for launching an instance on the appropriate host. 8. nova-compute picks up the request from the queue. 9. nova-compute sends the rpc.call request to nova-conductor to fetch the instance information such as host ID and flavor (RAM, CPU, Disk). 10. nova-conductor picks up the request from the queue. 11. nova-conductor interacts with nova-database. nova-conductor returns the instance information. nova-compute picks up the instance information from the queue. 12. nova-compute performs the REST call by passing the auth-token to glance-api. Then, nova-compute uses the Image ID to retrieve the Image URI from the Image Service, and loads the image from the image storage. 13. glance-api validates the auth-token with keystone. nova-compute gets the image metadata. 14. 
nova-compute performs the REST-call by passing the auth-token to Network API to allocate and configure the network so that the instance gets the IP address. 15. neutron-server validates the auth-token with keystone. nova-compute retrieves the network info. 16. nova-compute performs the REST call by passing the auth-token to Volume API to attach volumes to the instance. 17. cinder-api validates the auth-token with keystone. nova-compute retrieves the block storage info. 18. nova-compute generates data for the hypervisor driver and executes the request on the hypervisor (via libvirt or API). In the following sections, we present an overview of the architecture of the Keystone Identity service, the Nova Compute service and the Glance Image service which are the 3 most critical services leveraged in the design and implementation of both the cloud infrastructure simulator and that of the load simulator. 30 CHAPTER 6. SEARCH FOR A SUITABLE CLOUD SIMULATION TOOL Figure 6.3: Overview of the VM provisioning process in an Openstack based cloud [12] 6.4. ARCHITECTURE OF OPENSTACK KEYSTONE [12] 31 Figure 6.4: Overview of the Keystone identity service architecture [12] 6.4 Architecture of Openstack Keystone [12] Figure 6.4 shows the overview of the Keystone architecture. The Keystone Identity service performs two essential functions in the cloud: User management: Its tracks users and their permissions by managing the Users, Tenants and Roles entities. Service catalog: It provides a catalogue of available services with their API endpoints. The keystone service uses a number of backend stores for managing its entities. Remark: The cloud load simulator implemented in this project uses the Identity backend to establish the correspondence between VMs inserted in the input loadfile and their owners (tenants names). 6.5 Architecture of Openstack Nova [12] The Nova Compute service is made up of several components as shown in Figure 6.5. We present next some Nova important components relevant to the cloud infrastructure simulator and the load simulator implemented in this project: Nova-API: It accepts and responds to end user compute API calls. It also initiates most of the orchestration activities (such as running an instance) as well as enforces some policy (mostly quota checks). The nova-compute process: It is primarily a worker daemon that creates and terminates virtual machine instances via hypervisor’s APIs 32 CHAPTER 6. SEARCH FOR A SUITABLE CLOUD SIMULATION TOOL Figure 6.5: Overview of the Nova compute service architecture highlighting the implemented fake compute driver [12] The legacy nova-network: It is a worker daemon that accepts networking tasks from the queue and then performs tasks to manipulate the network (such as setting up bridging interfaces or changing iptables rules. The nova-schedule process: It takes a virtual machine instance request from the queue and determines where it should run (specifically, which compute server host it should run on). The queue: It provides a central repository for passing messages between daemons. Remark: In the implemented cloud simulator, the queue implementation is done via the deployment of the RabbitMQ Message Broker. The MySQL database: It stores most of the build-time and runtime state of the cloud infrastructure. Remark: The load simulator leverages the real-time cloud state by accessing the buildtime and runtime cloud information via its several MySQL databases interfaces. 6.6. 
ARCHITECTURE OF OPENSTACK GLANCE [12] 6.6 33 Architecture of Openstack Glance [12] The Glance service provides services for discovering, registering, and retrieving virtual machine images. It is made up of the following components: Glance-API: It accepts Image API calls for image discovery, image retrieval and image storage. Glance-registry: It stores, processes and retrieves metadata about images (size, type). A database: It is used to store the image metadata. Remark: In our implementation the Glance database is a MySQL database. A storage repository: It is used for the actual image files. Remark: The cloud infrastructure simulator leverages Glance to instantiate VMs. 34 CHAPTER 6. SEARCH FOR A SUITABLE CLOUD SIMULATION TOOL Chapter 7 Overview of the Cloud Infrastructure Simulator In this chapter, we present how the implemented cloud infrastructure simulator works. This includes its physical configuration and how tenants and VMs are created. Further we explain how the Nova compute FakeDriver affects the cloud simulator and the load simulator. Additionally we explain how decoupling affects VMs creation and placement in the cloud. Finally we review the high level design principles used in the implementation of the cloud simulator. 7.1 Physical Configuration The cloud infrastructure simulator uses the Openstack technology specifically the icehouse release which was the current release at the time of the implementation [15]. It is made up of one controller node (ctr01.mgmt.local) and 16 compute nodes (cp01.mgmt.local up to cp16.mgmt.local). The cloud simulator infrastructure is hosted on the UZH/CSG n19 physical node. The configuration presented here was a design decision to take into account the physical resources available on the n19 physical node (16-core CPU, 64GB RAM, 500GB Hard Disk). Figure 7.1 shows the overview of the implemented cloud infrastructure simulator. The load simulator is a distinct implemented software component integrated to the cloud simulator infrastructure via the controller node and can be seen hosted in the cloud controller node in Figure 7.1. While the n19 is a physical node, all the 17 nodes (1 contoller node and 16 compute nodes) of the cloud infrastructure simulator are all VMs created using the Oracle Virtual Box technology. Table 7.1 summarizes the physical resources of node n19 and those of the 17 cloud nodes. Table 7.2 and Table 7.3 present the implemented cloud services on the cloud controller node and on the cloud compute nodes respectively. Remark: The basic services also called cloud core services must be implemented in any cloud deployment based on the Openstack cloud software. 35 36 CHAPTER 7. 
OVERVIEW OF THE CLOUD INFRASTRUCTURE SIMULATOR

Figure 7.1: Overview of the cloud infrastructure simulator based on the Openstack cloud technology

Table 7.1: Physical resources of the cloud infrastructure

Node components                          CPU              RAM    Disk
n19, 1 x physical node,                  2.5GHz 12-core   64GB   500GB
64-bit AMD Opteron, Ubuntu 14.04
17 x VMs, Ubuntu 14.04                   2.5GHz 1-vCPU    2GB    13GB

Table 7.2: Implemented cloud services running on the cloud controller node

Service          Service type         Utility
MySQL Database   Supporting service   Database services for the cloud
RabbitMQ         Supporting service   Message broker service for the cloud
Keystone         Basic service        Identity services for the cloud
Nova             Basic service        Provides compute services for the cloud
Glance           Basic service        Provides image services for the cloud

Table 7.3: Implemented cloud services running on each of the 16 compute nodes

Service           Service type    Utility
Nova API          Basic service   Provides compute services for the compute host
Hypervisor        Basic service   Instantiates VMs for the cloud tenants
Nova Networking   Basic service   Provides network services for cloud VMs

Table 7.4: Theoretic number of VMs that can be created in the cloud infrastructure simulator

Size of VMs                               Theoretic no. of VMs
[1 vCPU, 2GB RAM, 10GB Disk, 100Mbps]     16 x 10^6 VMs
[2 vCPU, 4GB RAM, 20GB Disk, 100Mbps]     8 x 10^6 VMs
[4 vCPU, 8GB RAM, 40GB Disk, 100Mbps]     4 x 10^6 VMs
[8 vCPU, 16GB RAM, 80GB Disk, 100Mbps]    2 x 10^6 VMs

7.2 Cloud Resources, Tenants and VMs

The cloud simulator controller has been configured with a RAM overcommitment ratio of 1:10^6 and a CPU overcommitment ratio of 1:10^6. As a result, the number of VMs that can be created and hosted on this infrastructure is virtually unlimited. Table 7.4 shows the number of VMs that can be created for some example VM sizes. This design choice allows the cloud simulator infrastructure to scale, as new VMs can be added as needed. While creating new VMs is the normal mode of operation, deleting VMs should be the exception and should be done only in rare cases. One situation where VM deletion is appropriate is the removal of VMs with duplicate names in the cloud.

To interact with the cloud infrastructure simulator, we have implemented a number of primitives, some of which are presented in Table 7.5. A user guide is found in Appendix B of the current report. It shows, with the help of use cases, how to use the implemented primitives to operate the cloud simulator infrastructure. The user guide also contains use cases for the load simulator.

7.3 Nova Compute FakeDriver and the Cloud Simulator

During the design of the cloud infrastructure simulator, we opted for an environment that can scale with virtually no limit, constrained only by the physical resources available on the n19 node. One key architectural decision to achieve this has been the implementation of the Nova FakeDriver with a RAM overcommitment ratio of 1:10^6 and a CPU overcommitment ratio of 1:10^6, leading to the theoretical limits found in Table 7.4. How does overcommitment of resources affect the cloud simulator? Simply put, by using a RAM overcommitment ratio of 1:2 and a CPU overcommitment ratio of 1:2, for example, we are telling the cloud scheduler to allow a number of VMs in the cloud such that the overall total amount of RAM and CPU in the created VMs is double the overall amount of physical RAM and CPU in all the compute hosts.
In other words the total CPU and RAM promised by the cloud is double the overall amount of physical resources in the cloud. In our implementation we promised a total amount of RAM and CPU = overall physical resources (CPU and RAM on the 16 compute hosts) x 106 . Hence the very high theoretical limit of the number of VMs that can be created in the implemented cloud simulator as seen in Table 7.4. 38 CHAPTER 7. OVERVIEW OF THE CLOUD INFRASTRUCTURE SIMULATOR Table 7.5: Some implemented cloud primitives along with some default primitives. These are used to explore the cloud infrastructure user-interface Purpose create vm.py Creates a number of VMs in the cloud. These will belong to the current tenant. create tenant.py Creates a new tenant. Must be created with admin credentials. view cloud.py Shows all the VMs running in the cloud along with the host and VMs resources. Useful for selecting the compute hosts on which to run simulations. view cloud by vms.py Shows all the VMs running in the cloud along with the host and VMs resources. Useful for selecting the next VMs valid names view cloud all hosts.py Shows all compute hosts running in the cloud along with their resources and number of VMs. view cloud detailed host.py Shows all VMs running in a specific compute host along with their resources and number of VMs. keystone tenant-list View all tenants configured in the cloud. nova list Shows the current user’s VMs in the cloud. nova keypair-list Shows the current user’s keypair. How does overcommitment affect VMs creation in the cloud? The cloud overcommitment level is spread to all compute hosts that are running in the cloud. The cloud scheduler and specifically the Nova scheduler in our case is responsible to enforce the resource limits promised by the cloud. Once a request for creation of a VM in the cloud arrives at the scheduler, it accepts it or rejects it based among other factors on the current level of resources already provisioned. Thus, once a VM has been created and placed on a compute host, there is a guarantee that the cloud resources needed for this VM creation are within the predefined overcommitment level. Using our implementation as an example, the request for a VM creation will theoretically never be rejected. How does this impact the load simulator designed in the present project? Simply put neither the compute host nor the load simulator has to perform any check for adequate available resources in the cloud. This check has already been performed by the cloud scheduler that will automatically, transparently and dynamically move VMs from one compute host to another at startup or at creation to enforce the fact that the cloud can only deliver the resources it has promised using its predefined overcommitment level. Taking advantage of this cloud operation principle means that the only job left for the load simulator is to attempt to perform an allocation of resources that is consistent with the fair share metric. 7.4. PRINCIPLE OF DECOUPLING AND VMS CREATION IN THE CLOUD 7.4 39 Principle of Decoupling and VMs Creation in the Cloud To run experiments on the cloud simulator infrastructure, one can either use the existing tenants and VMs or create new ones. The default cloud quotas are set to a limit almost infinite. This means that when a tenant requests the creation of a VM using the implemented primitives, the request is passed on to the nova compute scheduler. 
The nova compute scheduler checks if there is an available compute host with enough compute resources to satisfy the VM creation. Since the scheduler will theoretically always find such a host in the implemented cloud simulator, it will therefore transparently and automatically allow the VM creation and placement on the available compute host. This process is completely independent of the load simulator (decoupling). In order words when the cloud scheduler which is part of the cloud infrastructure simulator decides to place a VM on a compute host, it has no idea whether the load simulator intends to place a load on this VM in the future. The only criteria for placing a VM on a compute host is enough resources on the compute host to satisfy the VM creation. When VMs are restarted or after new VMs are created, the nova compute scheduler may place them dynamically on a different host. This placement process is completely independent of the cloud user and of the load simulator. This decoupling between the cloud infrastructure simulator and the load simulator is a key principle in our design and guarantees that the cloud infrastructure simulator and the load simulator performs correctly. Moreover, by enforcing this design, we make sure the process of creating VMs and placing them on compute hosts is completely independent from the process of simulating a load placement on a VM. This reflects the workings of a real world cloud where a user can decide at any time to connect to his VMs to start or stop workloads. Cloud state after the creation of new VMs: After the creation of new VMs in the cloud, the cloud operator (the person doing the experiments in the cloud) should always review the new cloud state with the help of the view cloud.py primitive or another adequate primitive. Simply adding the newly created VMs to the input loadfile along with previous VMs already in the loadfile can lead to unexpected results. The reason is simple: After the creation of the new VMs the nova cloud scheduler has dynamically reorganized VMs placement in the cloud. The previous VMs which were likely selected because they run together on a given compute host do no longer necessarily run on the same compute hosts as before the creation of the new VMs. VMs default loads: Although the implemented cloud infrastructure may have a number of VMs running (over 170 VMs as of the writing of the present report), from the perspective of the load simulator all these VMs are up and running but consume 0 resources. The 16 compute hosts resources are available and can at any time be requested for consumption by any valid VM running in the cloud. The load simulator will always arbitrate resources requests relative to the cloud compute host on which the VMs are running. Cloud compute hosts resources: For the sake of experiments the compute hosts resources can be artificially increased to any level desired. The load simulator is built-in with methods to this effect such that a compute host possessing n x vCPUs can actually appear to the simulator as possessing 2n x vCPUs. This can be generalized to kn x vCPUs where k is an arbitrary multiplier factor. 40 CHAPTER 7. OVERVIEW OF THE CLOUD INFRASTRUCTURE SIMULATOR 7.5 Cloud Simulator high Level Design Principles The design of the cloud simulator infrastructure factors the following principles: Isolation: A given cloud tenant/user can only access his own VMs and networks. Automation: Vagrant/Oracle Virtual Box tools are used to automatically semi-provision the cloud infrastructure. 
The cloud operator can assume a tenant ID and use the create vm.py to automatically create the needed number of VMs belonging to the tenant whose identity is assumed. Scalability: The cloud infrastructure can scale by allowing VMs and tenants to be added as the need may be. This is constrained by the amount of physical resources present in the physical node n19. Quotas-free: The cloud environment has been configured with quotas that are theoretically infinite to allow users and tenants to freely create VMs and carry out experiments as the need may be. Decoupling: The cloud simulator has been designed and implemented in such a way that it is completely decoupled from the load simulator. Chapter 8 Overview of the Load Simulator In this chapter we present how the load simulator works. This includes the load simulator logical architecture components including the reader layer, the validation, aggregation and grouping layer. Further, the load consumption, time translation and report generation layers as well as the input parameter design (load design) are presented. A dedicated section explains why physical time cannot be measured in the load simulator. Finally, the fair-share metric is presented followed by the allocation and reallocation design. 8.1 Logical Architecture Overview The load simulator is implemented with interfaces to the cloud infrastructure simulator that enable it to build a cloud state that is always up-to-date. At runtime, it makes realtime calls to both the nova compute layer and to the keystone identity layer to retrieve the real-time cloud state. It reads its input loadfile, performs real-time calls to the nova compute layer and to the keystone identity layer using MySQL DBMS interfaces. It gathers all necessary information about the VMs resources sizes, the resources of the corresponding compute hosts, the tenants and owners of the VMs. Based on this dynamic information, it starts an allocation of resources using the fairness metric. If the input loadfile contains a VM that doesn’t exist in the cloud an error message is displayed and the load simulation is not even attempted. The load simulator is made up of three logical layers as presented in Figure 8.1. Additionally Figure 8.4 shows the logical view of the load simulator relative to the cloud simulator. Let’s present the functions of each of these layers. 8.2 Reader Layer This layer is made up of 3 components: The Nova reader, the Keystone reader and the load tables reader. An example of a valid input loadfile is shown in Figure 8.2. The load data from the input loadfile is imported into an Sqlite3 database prior to processing. 41 42 CHAPTER 8. OVERVIEW OF THE LOAD SIMULATOR Figure 8.1: Logical architecture of the load simulator Hence the load tables reader accesses load data via a DBMS Sqlite interface for further processing. The load tables reader reads the load consumption data from the load table, retrieves the VM names and passes them to the Nova Reader. The Nova Reader will scan the cloud to retrieve the following information with respect to each VM present in the loadfile: VM resources size for example: csg01 [2vCPUs, 8GB RAM, 120GB Disk, 10 Gbps vNIC], its project ID; the name of the compute host on which it is running and the corresponding host resources for example cp16 [10vCPUs, 1000 GB RAM, 10000 GB Disk, 1000 Mbps NIC]. The Nova reader also retrieves the project ID data with respect to VMs and passes this information to the Keystone reader. 
The Keystone reader will scan the cloud to establish a correspondence between the loadfile VMs project ID and their respective tenants and owners. 8.3 Validation, Aggregation and Grouping Layer Since the input loadfile contains only load information pertaining to each VM making a resource request and no other information, the load simulator via its MySQL DBMS interfaces to the nova compute layer, performs an initial validation of all VMs to be loaded. Through this process, the cloud is scanned to check the existence and state of the VM to be loaded. If the VM is not valid (doesn’t exist or has been deleted), the validation layer displays an error message and the load simulation is not even attempted. On the other hand if all VMs present in the input loadfile are valid, the load aggregation and grouping 8.4. LOAD CONSUMPTION, TIME TRANSLATION AND REPORTING LAYER 43 layer proceeds to aggregate and group all relevant information from the cloud. As a result it determines the VMs resources sizes, their owners, their grouping by compute hosts, the resource sizes of the compute hosts on which they are currently running and passes this information to the load consumption layer. Remark: The cloud scheduler can automatically and dynamically reassign VMs to cloud compute hosts as it sees fit at any time. Thus, it is critical for the load simulator to always have this updated information at run-time. 8.4 Load Consumption, Time Translation and Reporting Layer Based on the aggregated and grouped data, the time translator uses the start time data in the input loadfile. It evaluates both the total length of the load (total number of instants) and their start times relative to all other VMs running on the same compute host. This results in a dynamic allocation of resources to VMs whose overall outcome is consistent with the fair share metric. This is the case whether the VMs started at the same time or not. As a result the load simulator simulates load consumption of VMs relative to other VMs running inside the same compute host taking the current cloud state into account. At the end of the load simulation process, a number of summary reports are generated for the whole allocation experiment. These reports include among other, the summary of all resources requested, the summary of all resources allocated, the duration of the allocation per VM and per compute host. Remark1: In order to obtain a real-time cloud state, the load simulator requires that each VM name on the cloud has a unique name. The load simulator results may differ from expectations when two VMs with the same name exist in the cloud. One strong constraint for the implemented load simulator is that the cloud names remain unique. Remark2: To ensure unique names in the cloud, a standard naming convention where VMs are named after tenants followed by a sequential number (e.g. Patrick1, Patrick2, ..., Patrick100) has been implemented with the create vms.py primitive. However, it is the responsibility of the cloud operator to ensure that the parameters given to this command enforce uniqueness. The cloud technology itself allows duplicates names in the cloud. Remark3: If VMs with duplicate names are found in the cloud, the duplicates must be deleted for the sake of the load simulator. 8.5 Input Parameter Design: Load Design A load in the context of this project is a bundle of resources (CPU, RAM, Disk, Bandwidth) that are requested over a number of instants by a specific cloud VM identified by its unique name. The load has a relative start time. 
An input loadfile is made up of a set of valid loads. Using the example of the input loadfile of Figure 8.2, we extract a valid input load and present it on Figure 8.3. We succinctly present hereafter the fields of a valid load. 44 CHAPTER 8. OVERVIEW OF THE LOAD SIMULATOR Figure 8.2: Input table to the load simulator Figure 8.3: An example of a valid load extracted from Figure 8.2 8.6. PHYSICAL TIME IN THE LOAD SIMULATOR 45 Instant: This field is a positive integer field varying from 0 to the maximum number of instants of the resources requests. This field uniquely identifies each line in the load (but does not uniquely identify each line in the loadfile) and represents the instant at which an amount of resources is requested. A complete load is a set of such requests over all load instants. The assumption is that the loads can be of different lengths and vary over time from instant 0 to 6 using the example on Figure 8.3. cpu, ram, disk, bandwidth: These fields are positive numbers varying from 0 to the maximum available resources present in the VMs. Thus, a VM created with [2 vCPUs, 1GB RAM, 10GB Disk, 100Mbps vNIC] will have a valid CPU request from [0..2], a valid RAM request [0..1], a valid disk request [0..10] and a valid bandwidth request of [0..100]. vm: This field is the vm name which uniquely identifies the cloud VM which makes the resource allocation request. start time: This field is the relative start time of the load consumption of a VM in a compute host relative to all other VMs requesting resources on the same compute host. All resource allocation for a VM will always start at the start time. Using the example of Figure 8.2, the load simulator will attempt to place a load (allocate resources) on csg01 at t=2, on csg02 at t=5, on csg03 at t=6 and on csg04 at t=9. Remarks: The terms: load, resources request, allocation request, load request have the same meaning in the context of this project. On the other hand the terms: allocation, resource allocation, load allocation and load consumption also have the same meaning in the context of this project. 8.6 Physical Time in the Load Simulator At the start of this project, we had to make an architectural design decision: The first option was to build a small real-world productive cloud with real VMs that are accessible for processing using the Nova Libvirt compute driver. The advantage of this option is to allow the use of a well known tool such as the ”Stress” tool widely used in the industry to simulate real workloads in the cloud VMs. The disadvantage of this option is to restrict ourselves to fewer real compute hosts and a few real VMs running in the compute hosts. Furthermore the reallocation mechanisms would involve kernel programming at the KVM hypervisor layer to implement any fair scheme reallocation mechanism. The second option was to build a simulated cloud environment using the Nova fake compute driver. The advantage of this possibility is to build a cloud that theoretically can have an infinite number of fake VMs. The VMs are considered fake in the sense that they do not allow any real processing and consume very little resources. Moreover, the second possibility allows the simulation of the fair allocation mechanisms using a cloud load simulator application rather than having to implement them directly in the hypervisor kernel as is the case in the first option. The disadvantage of the second option is to have a cloud with a high number of VMs but these VMs do not allow any real-world processing. 
By choosing the second option in this project, the direct consequence for the load simulator is that the time used to run loads on VMs must also be simulated. Thus, implementation mechanisms that attempt to measure the physical time an allocation request takes to complete in a VM are not feasible. Clearly, by choosing the second option we gain the ability to simulate the running of hundreds of VMs with little physical resource overhead. Moreover, the implemented load simulator can simulate the allocation of resources using the fair share metric. However, because the running VMs are not real functional entities, we have no way to measure the physical time used to complete a resource allocation in a VM.

8.7 The Fair Share Metric

Following is an overview of the fair share metric used for the allocation and reallocation implemented in the load simulator:

Case 1: At any given instant, if the sum of all resources requested by all VMs running on a given compute host falls within the limits of the compute host's total resources, each VM receives its full request at that instant.

Case 2: The sum of all requests from all VMs is greater than the compute host's total maximum. Each VM receives its fair share, and the unallocated amount of resources is carried over to the next instant. The fraction received at each allocation instant using the fair share metric is:

    allocation_i = vm_i / (vm_1 + vm_2 + ... + vm_n) * host_max,   i = 1, ..., n

where n is the total number of VMs simultaneously requesting this resource type on the compute host, vm_1, ..., vm_n are the maxima of this resource type in the corresponding VMs (vm_i being the maximum of VM i), and host_max is the total maximum of this resource type in the compute host.

Remark: The fair share metric scheme described here is based on the following constraints: a VM never receives more than it requested, and a VM never requests more than its maximum limit of a resource type. Moreover, all resources requested by a VM must eventually be allocated.

8.8 Allocation and Reallocation Design

The resource allocation and reallocation mechanisms implemented in this project are based on the fair share metric. Hence, once several VMs request a total amount of resources that is greater than the maximum total resources of a compute host, the load simulator arbitrates the allocation requests among these VMs such that the overall outcome is always fair. The load simulator's fair share arbitration scales to the dimension of the whole cloud, allowing it to arbitrate the requests for resources among competing VMs on any valid compute host in the cloud.

Let us refer to the example of Figure 8.2, where the load simulator will attempt to place a load on csg01 at t=2, on csg02 at t=5, on csg03 at t=6 and on csg04 at t=9. The dynamic information retrieved from the cloud (cloud state) at run time includes the compute host on which these VMs are running, the resource sizes of the compute hosts, the resource sizes of the VMs and the total number of VMs simultaneously making requests on the compute host. This information is of critical importance to the load simulator.

Figure 8.4: Logical view of the load simulator relative to the cloud simulator

Using the above example, let us explain two possible scenarios:

Scenario 1: The load simulator detects at run-time that csg01, csg02, csg03 and csg04 are running on the same compute host: the compute host resources will be shared by the load simulator among the 4 VMs using the fair share metric.
Scenario 2: The load simulator detects at run-time that each of the VMs csg01, csg02, csg03 and csg04 runs on a different compute host: The resource allocation will produced a complete different outcome as in scenario 1. In this case each VMs will get the maximum resources obtainable at each instant. Remark: Suppose there are other VMs running on the same host as the compute host in Scenario 1 and as the compute hosts in Scenario 2: The outcome remains unchanged as these other VMs consume 0 resources. Only the VMs on the input loadfile consume resources and have an influence on the load simulator outcome. An important assumption is that the person performing experiments with the cloud simulator will validate the input loadfile. The results obtained will always reflect the data present in this file. 48 CHAPTER 8. OVERVIEW OF THE LOAD SIMULATOR Chapter 9 Evaluation 9.1 Output of some implemented Cloud Primitives 1. view cloud.py python view cloud.py: It shows a complete view of the cloud as shown in Figure 9.1. Its output includes the total number of VMs running in the cloud as shown in Figure 9.2 2. view host cloud.py python view host cloud.py: It shows the complete view of a cloud host including its VMs and its resources as shown in Figure 9.3 3. view all user vms.py python view all user vms.py: It shows VMs belonging to a specific cloud tenant or user and on which compute host they are running as shown in Figure 9.4 9.2 Load Simulator Tests and Results Test Case 1: 3 VMs are chosen from the cloud such that each VM runs on a different compute host. Each VM requests an amount of resources within its own size. We use the primitive view cloud.py and view host cloud.py in making our VMs selection. For the sake of the size of the output we choose loads of small lengths. Scenario 1: All the 3 VMs request different amount of resources and have different start time. Figure 9.5 shows the resources requests. Test Case 1/Scenario 1 results: The results are shown in Figure 9.6, Figure 9.7 and Figure 9.8. Test Case 2: 5 VMs are chosen from the cloud such that: 2 VMs run on the first compute host and 3 VMs run on the second compute host. The amount of resources they request is always within their size limits. We use the primitive view cloud.py and view host cloud.py in making our VMs selection. Scenario 1: All the 5 VMs request resources with the same start time in their respective 49 50 CHAPTER 9. EVALUATION Figure 9.1: Part1 of the truncated output from the view cloud.py Figure 9.2: Part 2 of the truncated output from the view cloud.py Figure 9.3: Truncated output of the view host cloud.py command 9.2. LOAD SIMULATOR TESTS AND RESULTS 51 Figure 9.4: Truncated output of the view all user vms.py command Figure 9.5: Test Case 1/Scenario 1: All the 3 VMs request resources with different starting times 52 CHAPTER 9. EVALUATION Figure 9.6: Part 1 results: Test Case 1/Scenario 1 Figure 9.7: Part 2 results: Test Case 1/Scenario 1 9.3. DISCUSSION ON THE LOAD SIMULATOR RESULTS 53 Figure 9.8: Part 3 results: Test Case 1/Scenario 1 compute host. Figure 9.9 shows the resources requests. Test Case 2/Scenario 1 results: The results are shown in Figure 9.10, Figure 9.11, Figure 9.12 and Figure 9.13. Scenario 2: All the 5 VMs request resources with a different start time within their compute host. Figure 9.14 shows the resources requests. Test Case 2/Scenario 2 results: The results are shown in Figure 9.15, Figure 9.16, Figure 9.17 and Figure 9.18. 
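To make the discussion in the next section easier to verify, the following is a minimal, self-contained sketch (not the project code) of one instant of the fair-share arbitration described in Sections 8.7 and 8.8, for a single resource type on a single compute host. Redistribution of any surplus share is deliberately omitted, and the function names are chosen for this report only.

```python
# Minimal sketch of one instant of fair-share arbitration (Sections 8.7/8.8).
# pending  : VM name -> amount of this resource still requested
#            (carry-overs from earlier instants included)
# vm_max   : VM name -> maximum of this resource type in the VM
# host_max : total maximum of this resource type in the compute host
def fair_share_step(pending, vm_max, host_max):
    total_requested = sum(pending.values())
    if total_requested <= host_max:
        # Case 1: every VM receives its full request at this instant.
        return dict(pending)
    # Case 2: VM i receives vm_i / (vm_1 + ... + vm_n) * host_max,
    # capped by what it still requests; the remainder is carried over.
    denom = float(sum(vm_max[name] for name in pending))
    return {name: min(pending[name], vm_max[name] / denom * host_max)
            for name in pending}

# Reproduces the RAM split of Test Case 2/Scenario 1 discussed below:
# lm21 and lm15 request 8192 and 4096 on a host whose RAM maximum is 8192.
print(fair_share_step({"lm21": 8192, "lm15": 4096},
                      {"lm21": 8192, "lm15": 4096},
                      8192))
# -> lm21 receives about 5461.33, lm15 about 2730.67
```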
9.3 Discussion on the Load Simulator Results Consistent with its design, the load simulator reads the input load table, validates, aggregates and groups this data such that all VMs running in a particular compute host are grouped together. It also retrieves the VMs resources sizes and their corresponding compute host resource sizes from the cloud layer and performs a resource allocation consistent with the fair share metric whether the VMs started at the same instant or not. While each compute host has a real physical size resource limit, the resource allocation is done based on the simulated maximum resources size of the compute host. This value appears as max cpu, max ram, max disk and max bandwidth on the output. This functionality has been added to introduce elasticity to the compute host resources sizes. Thus, once the amount of resources requested by VMs is above the simulated compute host maximum, the fair share metric arbitrates resource allocation to the competing VMs. The above conclusions can be verified from all test cases scenarios. Let’s have a look at the test cases results. Test Case 1/Scenario 1: Based on the results shown in Figure 9.6, Figure 9.7 and Figure 9.8, the following can be inferred: The load simulator detects that each VM lm10, lm11 and lm12 each runs on a separate compute host. The compute hosts detected are respectively cp4, cp10 and cp16. In this case each VM receives the resources it requested 54 CHAPTER 9. EVALUATION Figure 9.9: Test Case 2/Scenario 1: All the 5 VMs with same start times within their respective compute hosts 9.3. DISCUSSION ON THE LOAD SIMULATOR RESULTS Figure 9.10: Part 1 results: Test Case 2/Scenario 1 Figure 9.11: Part 2 results: Test Case 2/Scenario 1 55 56 CHAPTER 9. EVALUATION Figure 9.12: Part 3 results: Test Case 2/Scenario 1 Figure 9.13: Part 4 results: Test Case 2/Scenario 1 9.3. DISCUSSION ON THE LOAD SIMULATOR RESULTS 57 Figure 9.14: Test Case 2/ Scenario 2: Load request for 5 VMs with different start times within their respective compute hosts 58 CHAPTER 9. EVALUATION Figure 9.15: Part 1 results: Test Case 2/Scenario 2 Figure 9.16: Part 2 results: Test Case 2/Scenario 2 9.3. DISCUSSION ON THE LOAD SIMULATOR RESULTS Figure 9.17: Part 3 results: Test Case 2/Scenario 2 Figure 9.18: Part 4 results: Test Case 2/Scenario 2 59 60 CHAPTER 9. EVALUATION at each instant. The reason is that the amount of resources requested by each VM at each instant is within the VM resources sizes and therefore also within the compute host resources. In this case the duration of each allocation equals the duration of the request. The time output shows that the allocation starts from the instant of request at t=0, t=3 and t=5 respectively. Test Case 2/Scenario 1: Based on the results shown in Figure 9.10, Figure 9.11, Figure 9.12 and Figure 9.13, the following can be inferred: The load simulator detects that the first two VMs lm21 and lm15 are both running on cp10. Since their start time is identical at t=5, after the validation, aggregation and grouping phase, the load simulator allocates the resources requests. The total amount of cpu, disk and bandwidth requested at each cycle by the 2 VMs fall within the maximum physical resources of the compute host cp10. For these resources the number of instants of requests equal the number of instants of allocation. Things are different for the RAM requests however. 
Since the total requested at each cycle is 12288 GB, well above the 8192 GB maximum of the compute host, the load simulator uses the fair share metric to perform the RAM allocation. Thus at t=5, each of the two VMs receives a fair share proportional to its size, namely vm_i / (vm_1 + vm_2) * host_max, where vm_i (i = 1, 2) is the maximum RAM size of the respective VM and host_max is the maximum RAM size of the compute host. At each cycle the non-fulfilled requests are carried over to the next cycle until the allocation is complete. As this example shows, the least available resource, in this case the RAM, determines the longest allocation time.

Using the fair share metric at t=5, lm21 receives 8192 / (8192 + 4096) * 8192 = 5461.33 and lm15 receives 4096 / (8192 + 4096) * 8192 = 2730.67. Whatever is left unallocated in this cycle is carried over to the next cycle. Thus at t=6 the resource requests change for both VMs: lm21 now requests 8192 + (8192 - 5461.33) = 10922.67 and lm15 now requests 4096 + (4096 - 2730.67) = 5461.33. However, the amount of resources they receive at t=6 does not change. This process continues until t=13, where both VMs receive the total amount of RAM requested. At this point both lm21 and lm15 have received all the resources they requested.

Test Case 2/Scenario 2: Using the results presented in Figure 9.15, Figure 9.16, Figure 9.17 and Figure 9.18, the following can be inferred: The two VMs lm21 and lm15 make their allocation requests starting at different start times, t=5 and t=8. During the first three instants of allocation, at t=5, t=6 and t=7, lm21 receives the full amount requested because it is the only VM making requests. At t=8, however, lm15 makes its first request, and since the total requested by both VMs is greater than the physical maximum of the compute host, the allocation continues using the fair share metric. At t=12, lm21 has received all the resources it requested, leaving lm15 with full access to the compute host resources. At t=13, lm15 receives a full share and the allocation finishes at t=14 when all resources are allocated. The time translator reports durations of 7 instants and 6 instants, starting at t=5 and t=8 respectively.

As for the three VMs fake4, lm10 and lm13 running on compute host cp14, the allocation results are consistent and correct. All total requests for all resources at all instants are always within the total physical resources available on the compute host. However, since the requests are made at different start times, the end times are also different. The respective allocation start times are t=1, t=4 and t=7, while the finish times are t=6, t=9 and t=12.

9.4 Further Work

As can be inferred from the discussion of the load simulator results, the fair share metric introduces fairness into the compute hosts' resource allocation mechanisms. However, since the load simulator time output is not based on a physical time scheme, a future comparative study could establish the gains of introducing fair share mechanisms in real-world compute hosts that do not implement such fairness schemes or that implement alternative fairness metrics. Such a study could quantify the gains in terms of processing time, in terms of reduced VM resource allocation famine, or with respect to other valid metrics. Another exciting future possibility is the extension of the load simulator with the implementation of other cloud resource allocation algorithms.
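A purely illustrative sketch of what such an extension could look like is given below; the class and method names are invented for this discussion and do not exist in the implemented load simulator.

```python
# Hypothetical extension point for comparing allocation algorithms.
# None of these names exist in the implemented load simulator; they only
# illustrate how alternative policies could be swapped in and compared.

class AllocationPolicy(object):
    """One instant of arbitration for a single resource type on one host."""

    def allocate(self, pending, vm_max, host_max):
        raise NotImplementedError


class FairShare(AllocationPolicy):
    """The fair share metric of Section 8.7."""

    def allocate(self, pending, vm_max, host_max):
        if sum(pending.values()) <= host_max:
            return dict(pending)
        denom = float(sum(vm_max[v] for v in pending))
        return {v: min(pending[v], vm_max[v] / denom * host_max)
                for v in pending}


class FirstComeFirstServed(AllocationPolicy):
    """An alternative scheme: serve requests in arrival order until the
    host maximum is exhausted (pending is assumed to preserve arrival
    order, e.g. an OrderedDict)."""

    def allocate(self, pending, vm_max, host_max):
        remaining, result = host_max, {}
        for vm, amount in pending.items():
            result[vm] = min(amount, remaining)
            remaining -= result[vm]
        return result
```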
The results of these can be easily compared with those obtained from the fair share metric with a view to innovate and improve. 62 CHAPTER 9. EVALUATION Chapter 10 Summary and Conclusion In this project, we have addressed the problem of simulating resource allocation in clouds in two ways: First by designing and implementing a cloud infrastructure simulator using the Openstack cloud technology. The implemented cloud simulator allows the creation of IaaS components such as compute hosts, VMs, tenants and projects. It can be used to conduct repeatable and controllable investigation of the resource allocation mechanisms, algorithms and policies used in the cloud SCIs. Second by designing and implementing an elastic load simulator that simulates fair resource allocation in clouds using a fair share metric. The load simulator architecture is made up of three logical layers: a reader layer, a layer for validation, aggregation and grouping, a layer for load consumption, time translation and reporting. These 3 layers enable the load simulator to maintain a current cloud state at all times and thus to perform an accurate simulation of resource allocation in the cloud. The load simulator is an elastic and scalable load simulator. As such it can be used to simulate resource allocation on a few VMs or on all VMs in the cloud taking into account all tenants and compute hosts. The load simulator is an extensible framework that can be extended with the implementation of other cloud resource allocation algorithms. This offers the interesting possibility of comparing the implemented algorithms with a view to improve these algorithms. The results obtained from the dataset used in the load simulator are consistent with the fair share metric used. As possible future work, we have identified the possibility of conducting a comparative study to establish the gains of the implemented fair share metric over systems that implement an alternate allocation scheme using metrics such as gains in time, gains in VMs famine reduction or any other gains metrics. Another interesting opportunity is to extend the load simulator by implementing other cloud resource allocation algorithms with a view to establish comparisons. 63 64 CHAPTER 10. SUMMARY AND CONCLUSION Bibliography [1] LSCI 2012, Riccardo Murri, Sergio Maffioletti Grid Computing Competence Center, UZH [2] Memory Overcommitment in the ESX Server,VMware Technical Journal,Vol 2, NO. 1 June 2013 [3] VMware Whitepaper: Understanding Full Virtualization, Paravirtualization and Hardware Assist, David Marshall, VMware Inc, 2007 [4] An Overview of Virtualization Technologies, Pierre Riteau University of Rennes 1, IRISA, June 2011 [5] Optimal Joint Multiple Resource Allocation Method for Cloud Computing Environments,Shin-ichi Kuribayashi, International Journal of Research and Reviews in Computer Science,Vol. 2, No. 
1, March 2011 [6] Multi-dimensional Resource Allocation for Data-intensive Large-scale Cloud Applications, Foued Jrad, Jie Tao, Ivona Brandic and Achim Streit; Closer 2014 - 4th International Conference on Cloud Computing and Services Science [7] Multi-Resource Allocation: Fairness-Efficiency Tradeoffs in a Unifying Framework, Carlee Joe-Wong, Soumya Sen, Tian Lany, Mung Chiang, INFOCOM, 2012 Proceedings IEEE [8] CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software Practice and Experience, Volume 41, January 2011 [9] Cloudsim, http://www.cloudbus.org/cloudsim/ [10] KVM, http://www.linux-kvm.org [11] Wikipedia, http://en.wikipedia.org/wiki/OpenStack [12] Openstack http://www.openstack.org [13] Manage resources on overcommitted KVM hosts Consolidating workloads by overcommitting resources, IBM DeveloperWorks, Feb 2011 65 66 BIBLIOGRAPHY [14] Dominant Resource Fairness: Fair Allocation of Multiple Resource Types,Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica, Proceedings of the 8th USENIX conference on Networked systems design and implementation, Pages 323-336 [15] Openstack, http://docs.openstack.org/icehouse/install-guide/install/ apt/content/index.html [16] Oracle Virtualbox, https://www.virtualbox.org/ [17] kvm: the Linux virtual machine monitor, Kivity & al, Proceedings of the Linux Symposium, volume 1, pages 225–230, year 2007 [18] Virtual Cpu Scheduling Techniques for Kernel Based Virtual Machine (Kvm), Raghavendra, K. T. Cloud Computing in Emerging Markets (CCEM), 2013 IEEE International Conference on. IEEE, 2013. Abbreviations SaaS IaaS PaaS SCIs CPU RAM VM I/O SLA DRF FDS GFJ QoS HPC SMSCG VMM OS MMU TLB vNIC NIC HVM vCPU KSM DMC API LDAP AWS IP SDN IDS VPN HTTP REST CLI ID Software as a Service Infrustructure as a Service Platform as a Service Shared Computing Infrastructures Central Processing Unit Random Access Memory Virtual Machine Input/Output Service Level Agreement Dominant Resource Fairness Fairness on Dominant Shares Generalized Fairness on Jobs Quality of Service High Performance Compting Swiss Multi-Science Computing Grid Virtual Machine Monitor Operating System Memory Management Unit Translation Lookaside Buffer Virtual Network Interface Card Network Interface Card Hardware Virtualization Extensions Virtual Central Processing Unit Kernel Same-page Merging Dynamic Memory Control Application Programming Interface Lightweight Directory Access Protocol Amazon Web Services Internet Protocol Software Defined Networking Intrusion Detection System Virtual Private Network Hypertext Transfer Protocol Representational State Transfer Command Line Interface Identity 67 68 URI DBMS ABBREVIATONS Uniform Resource Identifier Database Management System Glossary Cloud Computing It is a model for enabling convenient on-demand network access to a shared pool of virtualized, configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned over the internet and released with minimal management effort or service provider interaction. Hypervisor It is a piece of computer software, firmware or hardware that creates and runs virtual machines. Type 1 Hypervisor These are hypervisors that run directly on the host hardware to control the hardware and to manage guest operating systems. Type 2 Hypervisor These hypervisors run on a conventional operating system just as other computer programs do. 
Virtual Machine Monitor (VMM) It implements the virtual machine hardware abstraction and is responsible for running a VM. Binary Translation It is a CPU virtualization technique that does the translation of the guest OS kernel code to replace non-virtualizable instructions with new sequences of instructions that have the intended effect on the virtual hardware. Overcommitment is a hypervisor feature that allows a virtual machine (VM) to use more memory space than the physical host has available. Memory Ballooning It is a memory reclaiming technique in which the host instructs a cooperative guest to release some of its assigned memory so that it can be used for another purpose. Nova (Compute) It is a cloud computing fabric controller, which is the main part of an IaaS system based on the Openstack cloud technology. Keystone (Identity) It provides authorization and authentication for users and tenants in the cloud. It provides a central directory of users mapped to the OpenStack services they can access. Neutron (Networking) It allows the creation and attachment of interface devices managed by other OpenStack services to networks. Swift (Object Storage) It is a multi-tenant object storage system. 69 70 GLOSSARY Cinder (Block Storage) It adds persistent storage to a virtual machine. Glance (Image Service) It provides discovery, registration, and delivery services for disk and server images. Ceilometer (Telemetry) It provides a single point of contact providing all the counters across all current OpenStack components Horizon (Dashboard) It is a modular Django web application that provides a graphical interface to OpenStack services. Heat (Orchestration) It is a service to orchestrate multiple composite cloud applications using templates, through both an OpenStack-native REST API and a cloud formation-compatible Query API. Trove (Database) It provides scalable and reliable cloud provisioning functionality for both relational and non-relational database engines. Virtualization It describes the separation of a service request from the underlying physical delivery of that service. Nova Compute FakeDriver It is a Nova Compute driver that allows the creation of non-functional VMs that consume little or no resources on the compute host. It is mainly used for simulation purposes. List of Figures 2.1 Multi-cloud workflow framework architecture based on CloudSim [6] . . . . 4 2.2 Number of large jobs completed for each allocation scheme in comparison of DRF against slot-based fair sharing and CPU-only fair sharing [14] . . . 5 2.3 Example of multi-resource requirements in data-centers [7] . . . . . . . . . 6 3.1 Overview of a cluster computing architecture [1]. Fairness can be introduced by modifying the scheduler and resource allocation manager . . . . . 8 Overview of a grid computing architecture [1]. Fairness can be introduced at the domain or cluster level . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 3.3 Overview of a cloud SCI architecture showing an Openstack cloud [1] . . . 10 4.1 x86 Virtualization Overview: A hosted virtualization or type 2 hypervisor: The hypervisor runs on an OS. Example: Oracle Virtualbox, VMware flash player [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4.2 x86 Virtualization Overview: A hypervisor virtualization or type 1 hypervisor: The hypervisor runs on bare-metal. Most widely used in productive clouds. Example: KVM, XEN, VMware ESXi, Microsoft Hyper-V [4] . . . 
List of Figures

2.1  Multi-cloud workflow framework architecture based on CloudSim [6]
2.2  Number of large jobs completed under each allocation scheme, comparing DRF against slot-based fair sharing and CPU-only fair sharing [14]
2.3  Example of multi-resource requirements in data centers [7]
3.1  Overview of a cluster computing architecture [1]. Fairness can be introduced by modifying the scheduler and resource allocation manager
3.2  Overview of a grid computing architecture [1]. Fairness can be introduced at the domain or cluster level
3.3  Overview of a cloud SCI architecture showing an Openstack cloud [1]
4.1  x86 virtualization overview: a hosted virtualization or type 2 hypervisor. The hypervisor runs on an OS. Examples: Oracle VirtualBox, VMware Player [4]
4.2  x86 virtualization overview: a hypervisor virtualization or type 1 hypervisor. The hypervisor runs on bare metal and is the most widely used in production clouds. Examples: KVM, Xen, VMware ESXi, Microsoft Hyper-V [4]
4.3  x86 virtualization overview: a virtualization layer is added between the hardware and the operating system [3]
4.4  VMM architecture overview: each VMM partitions physical resources and presents them to VMs as virtual resources [3]
4.5  x86 privilege level architecture with no virtualization implemented [3]
4.6  x86 memory virtualization. The VMM is responsible for mapping the VM physical memory to the host physical memory [3]
4.7  x86 device and I/O virtualization. The hypervisor uses software to emulate virtual devices and I/O and translates VM requests to the system hardware [3]
6.1  Layered cloud computing architecture [8]
6.2  Openstack cloud architecture overview [12]
6.3  Overview of the VM provisioning process in an Openstack-based cloud [12]
6.4  Overview of the Keystone identity service architecture [12]
6.5  Overview of the Nova compute service architecture highlighting the implemented fake compute driver [12]
7.1  Overview of the cloud infrastructure simulator based on the Openstack cloud technology
8.1  Logical architecture of the load simulator
8.2  Input table to the load simulator
8.3  An example of a valid load extracted from Figure 8.2
8.4  Logical view of the load simulator relative to the cloud simulator
9.1  Part 1 of the truncated output of view_cloud.py
9.2  Part 2 of the truncated output of view_cloud.py
9.3  Truncated output of the view_host_cloud.py command
9.4  Truncated output of the view_all_user_vms.py command
9.5  Test Case 1/Scenario 1: all 3 VMs request resources with different start times
9.6  Part 1 results: Test Case 1/Scenario 1
9.7  Part 2 results: Test Case 1/Scenario 1
9.8  Part 3 results: Test Case 1/Scenario 1
9.9  Test Case 2/Scenario 1: all 5 VMs with the same start times within their respective compute hosts
9.10 Part 1 results: Test Case 2/Scenario 1
9.11 Part 2 results: Test Case 2/Scenario 1
9.12 Part 3 results: Test Case 2/Scenario 1
9.13 Part 4 results: Test Case 2/Scenario 1
9.14 Test Case 2/Scenario 2: load requests for 5 VMs with different start times within their respective compute hosts
9.15 Part 1 results: Test Case 2/Scenario 2
9.16 Part 2 results: Test Case 2/Scenario 2
9.17 Part 3 results: Test Case 2/Scenario 2
9.18 Part 4 results: Test Case 2/Scenario 2
B.1  View of the selected 5 VMs from compute host cp12
B.2  View of the selected 3 VMs from compute host cp16
B.3  View of the selected 2 VMs from compute host cp15
B.4  View of the load requests for all 10 VMs
B.5  View of the tenant list showing that tenant patrick does not exist
B.6  View of the output of the create_tenant.py command
B.7  View of the output of the keystone tenant-list command showing the newly created tenant patrick
B.8  View of the output of the nova keypair-list command showing the newly created patrick-key
B.9  View showing the 20 newly created VMs of tenant patrick as well as the updated total for the cloud
B.10 View of the entire cloud sorted on compute host names. Useful to select compute hosts for simulations
B.11 Entire view of the cloud sorted on VM names. Useful to select the next valid VM names in the cloud
B.12 View of the compute hosts along with their resources and number of VMs
B.13 Detailed view of compute host cp01
B.14 View of all the tenants configured in the cloud
B.15 View of the current loadfile

List of Tables

6.1  Comparison of the CloudSim and Openstack cloud simulation tools
7.1  Physical resources of the cloud infrastructure components
7.2  Implemented cloud services running on the cloud controller node
7.3  Implemented cloud services running on each of the 16 compute nodes
7.4  Theoretical number of VMs that can be created in the cloud infrastructure simulator
7.5  Some implemented cloud primitives along with some default primitives. These are used to explore the cloud infrastructure
B.1  Code implementation statistics: total number of code lines = 1533
B.2  Examples of some implemented primitives that allow interaction with the cloud infrastructure simulator

Appendix A

Report on Milestones Implementation

A.1 Guiding Principle 1

In agreement with the project supervisor, the project was evolved to take the latest developments in cloud technology and allocation mechanisms into account; this information is contained in Chapters 4 and 5 of the present report. We therefore agreed not to adhere to the allocation/reallocation design strategy of the Master Thesis of Beat Kuster. For instance, we do not develop a reallocation design based on the "nova resize" primitives of the KVM hypervisor. Instead, we implement an allocation/reallocation design based on the fair share metric.

A.2 Guiding Principle 2

This project was extended to cover the full implementation of the load simulator, described in Chapter 8. This is an additional contribution on our part to this project.
A.3 Milestone 1: Search for a suitable Cloud Simulation Tool

A.4 Milestone 2: Comparison to Simulator Integration into Openstack

A.5 Milestone 3: Decision on Alternatives

Milestones 1, 2 and 3 are covered in Chapter 6.

A.6 Milestone 4: Input Parameter Design

Milestone 4 is covered in Chapter 8, specifically in Section 8.5.

A.7 Milestone 5: Reallocation Design

A.8 Milestone 6: Consumption Data Design

Milestones 5 and 6 are covered in Chapter 8, specifically in Section 8.8.

A.9 Milestone 7: Implementation

The implementation of the cloud infrastructure simulator and of the load simulator is covered in Chapters 7 and 8.

A.10 Milestone 8: Evaluation

Milestone 8 is entirely covered in Chapter 9. The evaluation is done by running various loads (datasets) through the cloud simulator and the load simulator. The results are checked for correctness with respect to the implemented fairness metric.

Appendix B

User Guide

B.1 Implemented Code Statistics

All the implemented programs are shown in Table B.1 along with the number of lines of code in each.

Table B.1: Code implementation statistics: total number of code lines = 1533

Name of program                  Purpose                                                      Code lines
db_read_load_tables_new.py       validation and load tables reader                            407
dbquery_vm.py                    Nova and Keystone reader                                     248
db_ram_cpu_disk_band_alloc.py    main simulator class: time translation, aggregation,
                                 grouping, allocation                                         499
view_cloud_by_vms.py             entire cloud view by VMs                                      57
view_cloud.py                    entire cloud view by hosts                                    57
view_cloud_all_hosts.py          shows all compute hosts                                       38
view_cloud_detailed_host.py      shows details of a compute host                               75
db_view_load.py                  view of the load table                                        30
create_tenant.py                 create a tenant in the cloud                                  53
create_keypair.py                create a keypair for a tenant                                 22
create_vms.py                    create VMs for a tenant                                       47

B.2 Use-Case 1: Running Simulations with existing Cloud Components

Description: We run a simulation using 10 VMs running on 3 compute hosts. All the VMs and tenants needed for this experiment already exist in the cloud.

Assumption: The login process assumes the user has an active IFI network connection or an active UZH VPN connection. Moreover, we use a VNC viewer for a GUI display; it can be downloaded from the Internet and installed on a local laptop. The user name used here is louismariel; each cloud operator should, however, use their own user name to perform load simulations in the cloud.

1. Login to the node n19 with the following command:
   # putty -ssh -L 4545:n19:5901 [email protected]
   The password for louismariel is: csgcsg123. On a Windows laptop the above command is typed from the cmd command line.

2. Launch the VNC viewer. We use the 64-bit Windows version. The password for the VNC viewer is: csgcsg. We are now logged into the node n19.

3. We now select the 10 VMs on which to run the load simulations, such that 2 VMs run on the first compute host, 3 VMs run on the second compute host, and 5 VMs run on the third compute host.

4. We use the following commands to view the cloud:
   # cd /home/csg/v_cloud_fake
   # python view_cloud.py
   Based on the output we write down the VM names we intend to use for the simulations; in a next step we will add these VMs to the input loadfile. Our selections are shown in Figure B.1, Figure B.2 and Figure B.3.

   Figure B.1: View of the selected 5 VMs from compute host cp12
   Figure B.2: View of the selected 3 VMs from compute host cp16
   Figure B.3: View of the selected 2 VMs from compute host cp15

5. We now create the input loadfile on the controller node ctr01. For this experiment all VMs request the maximum amount of resources, equal to their sizes.
   Remark: All load simulator commands should be executed from the directory /home/csg/v_cloud_fake.

6. We copy the template file load_for_test_1.txt to a new loadfile named load_csg_1.txt:
   # cd /home/csg/v_cloud_fake
   # cp load_for_test_1.txt load_csg_1.txt
   A user can also create this file from scratch instead of copying. The fields in this file are comma-separated, and a text editor is required; in these examples we use vi.

7. We modify load_csg_1.txt to contain the new load for the 10 selected VMs.

8. We connect to the load database to import the loadfile; loadvm.db is the load database specified on the command line:
   # sqlite3 loadvm.db
   sqlite> .separator ,
   sqlite> .import load_csg_1.txt load_vms

9. We verify the new load table with the command:
   # python db_view_load.py
   The results of the above command are shown in Figure B.4.

   Figure B.4: View of the load requests for all 10 VMs

10. We run the simulation with the following command and redirect the results to the file results_csg_1.txt:
    # python db_ram_cpu_disk_band_alloc.py > results_csg_1.txt

11. The results can now be consulted in the file results_csg_1.txt with the command:
    # more results_csg_1.txt

Remark: We can also import these results to our laptop for further processing. When running multiple experiments, we should differentiate the result files by using different numbers at the end of the filenames, for example results_csg_1.txt, results_csg_2.txt and so on.
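Steps 6 to 9 can also be scripted. The following minimal Python sketch shows roughly what the manual sqlite3 import does, assuming the load table is named load_vms as in step 8; the column names used here are placeholders for the actual input table layout of Figure 8.2, and db_read_load_tables_new.py remains the authoritative implementation.

# Minimal sketch of the loadfile import of steps 6-9 (hypothetical column names).
import csv
import sqlite3

COLUMNS = ("vm_name", "cpu", "ram", "disk", "bandwidth", "start_time")  # placeholders

def import_loadfile(loadfile, db="loadvm.db", table="load_vms"):
    """Read the comma-separated loadfile and insert its rows into the load table,
    roughly what the manual sqlite3 .import step does."""
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS %s (%s)" % (table, ", ".join(COLUMNS)))
    with open(loadfile) as fh:
        rows = [r for r in csv.reader(fh) if len(r) == len(COLUMNS)]
        conn.executemany("INSERT INTO %s VALUES (%s)"
                         % (table, ", ".join("?" * len(COLUMNS))), rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    import_loadfile("load_csg_1.txt")

Running this script and then db_view_load.py should show the same load table contents as the manual procedure in step 9.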
B.3 Use-Case 2: Running Simulations with new Cloud Components

Description: We use 50 VMs running on 10 compute hosts for the second experiment. Some of these VMs will be newly created and will belong to the newly created tenant.

1. We want to create a new tenant called patrick that will own 30 VMs in the cloud. At present this tenant does not exist. We use the following two commands to view the existing tenants in the cloud:
   # source admin-openrc.sh
   # keystone tenant-list
   The output of the above command is shown in Figure B.5. It confirms that tenant patrick does not exist.

   Figure B.5: View of the tenant list showing that tenant patrick does not exist

2. Next we create the tenant patrick with the following commands:
   # source admin-openrc.sh
   # python create_tenant.py patrick
   The above command creates tenant patrick and generates the file patrick-openrc.sh containing the credentials of user patrick. The command output is shown in Figure B.6.

   Figure B.6: View of the output of the create_tenant.py command

3. We confirm the creation of the new tenant patrick with the following command:
   # keystone tenant-list
   The output of the above command is shown in Figure B.7 and confirms that tenant patrick has been created.

   Figure B.7: View of the output of the keystone tenant-list command showing the newly created tenant patrick

4. Next we create a keypair for tenant patrick with the following commands:
   # source patrick-openrc.sh
   # python create_keypair.py patrick

5. We verify that the keypair has been created with the following command:
   # nova keypair-list
   The output of the above command is shown in Figure B.8.

   Figure B.8: View of the output of the nova keypair-list command showing the newly created patrick-key

6. Next we view the VM flavors with the following command:
   # nova flavor-list

7. Next we create VMs for the patrick tenant with the following command:
   # python create_vms.py patrick m1.large 1 20
   The above command creates 20 VMs with flavor m1.large for tenant patrick, named Patrick1, Patrick2, ..., Patrick20.

8. Finally we verify that the new VMs have been created with the following command:
   # python view_cloud_by_vms.py
   Figure B.9 shows the 20 newly created VMs of the new tenant patrick.

   Figure B.9: View showing the 20 newly created VMs of tenant patrick as well as the updated total for the cloud

Remark 1: VM names need to be unique in the cloud in order for the load simulator to resolve them. Duplicate names created by error need to be deleted.

Remark 2: From now on the new VMs can be used in simulation experiments as shown in Use-Case 1. The fact that these new VMs belong to the new tenant patrick is automatically retrieved by the load simulator at runtime.

Remark 3: Deletion of VMs from the cloud is discouraged; it should be the exception and not the rule. The developed cloud primitives help to check where the indices associated with the VMs of a tenant end, which determines the valid starting index for new VMs.
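As a rough indication of what create_tenant.py and create_vms.py plausibly do internally, the sketch below uses the Icehouse-era OpenStack Python clients (python-keystoneclient v2.0 and python-novaclient). The endpoint URL, passwords and the image name are hypothetical placeholders and not taken from this deployment; the implemented scripts listed in Table B.1 remain the reference.

# Hedged sketch of tenant and VM creation with the Icehouse-era OpenStack clients.
# Credentials, the auth URL and the image name are placeholders.
from keystoneclient.v2_0 import client as ksclient
from novaclient import client as nvclient

AUTH_URL = "http://controller:5000/v2.0"  # placeholder endpoint

# 1) Create the tenant and a user for it (roughly what create_tenant.py does).
keystone = ksclient.Client(username="admin", password="ADMIN_PASS",
                           tenant_name="admin", auth_url=AUTH_URL)
tenant = keystone.tenants.create(tenant_name="patrick",
                                 description="tenant for load simulations",
                                 enabled=True)
keystone.users.create(name="patrick", password="PATRICK_PASS", tenant_id=tenant.id)

# 2) Create 20 m1.large VMs for that tenant (roughly what create_vms.py does).
nova = nvclient.Client("2", "patrick", "PATRICK_PASS", "patrick", AUTH_URL)
flavor = nova.flavors.find(name="m1.large")
image = nova.images.find(name="cirros")  # placeholder image name
for i in range(1, 21):
    nova.servers.create(name="Patrick%d" % i, image=image,
                        flavor=flavor, key_name="patrick-key")

Because the fake compute driver creates non-functional VMs that consume little or no resources on the compute hosts, experiments of this size remain practical on the simulator.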
B.4 Use-Case 3: Exploring the Cloud using the implemented Primitives

Table B.2 shows how to explore the cloud infrastructure using the implemented primitives; it summarizes the commands and their purposes. Next we show the output of each command as executed from the implemented cloud simulator.

Remark: All commands are executed from the simulator home directory /home/csg/v_cloud_fake.

Table B.2: Examples of some implemented primitives that allow interaction with the cloud infrastructure simulator

User interface                 Purpose
view_cloud.py                  Shows all the VMs running in the cloud along with the host and VM resources. Useful for selecting the compute hosts on which to run simulations.
view_cloud_by_vms.py           Shows all the VMs running in the cloud along with the host and VM resources. Useful for selecting the next valid VM names.
view_cloud_all_hosts.py        Shows all compute hosts running in the cloud along with their resources and number of VMs.
view_cloud_detailed_host.py    Shows all VMs running on a specific compute host along with their resources.
keystone tenant-list           Views all tenants configured in the cloud. These will belong to the current tenant.

Example 1: Viewing the entire cloud

# python view_cloud.py

Figure B.10 shows the entire view of the cloud, including all running VMs, their resources and their owners. The information provided here is ideal for selecting the compute hosts on which to run simulations, because the compute hosts are listed with all their running VMs.

Figure B.10: View of the entire cloud sorted on compute host names. Useful to select compute hosts for simulations

# python view_cloud_by_vms.py

Figure B.11 shows the entire view of the cloud, including all running VMs, their resources and their owners. The information provided here is ideal for selecting the next valid VM names, because all VM names belonging to a tenant appear in sequential order.

Figure B.11: Entire view of the cloud sorted on VM names. Useful to select the next valid VM names in the cloud

Example 2: Viewing Cloud compute hosts

# python view_cloud_all_hosts.py

Figure B.12 shows the compute hosts running in the cloud along with their resources and the total number of VMs running on each of them.

Figure B.12: View of the compute hosts along with their resources and number of VMs

# python view_cloud_detailed_host.py cp01

Figure B.13 shows the detailed view of a specific compute host in the cloud, in this example compute host cp01, specified on the command line.

Figure B.13: Detailed view of compute host cp01
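As a rough indication of how such view primitives can be built, the sketch below lists the compute hosts and all VMs in the cloud with the Icehouse-era python-novaclient library. The credentials and the auth URL are placeholders; the implemented primitives (see Table B.1) additionally join in Keystone tenant information and format the output as shown in the figures.

# Hedged sketch of a "view cloud" primitive using python-novaclient (Icehouse era).
# The admin credentials and the auth URL below are placeholders.
from novaclient import client as nvclient

nova = nvclient.Client("2", "admin", "ADMIN_PASS", "admin",
                       "http://controller:5000/v2.0")

# Compute hosts with their physical resources (roughly view_cloud_all_hosts.py).
for hyp in nova.hypervisors.list():
    print("%-10s vcpus=%-4d ram=%-7d MB disk=%-5d GB vms=%d" % (
        hyp.hypervisor_hostname, hyp.vcpus, hyp.memory_mb,
        hyp.local_gb, hyp.running_vms))

# All VMs across all tenants with their hosting compute node (roughly view_cloud.py).
for srv in nova.servers.list(search_opts={"all_tenants": 1}):
    host = getattr(srv, "OS-EXT-SRV-ATTR:host", "n/a")  # admin-only attribute
    print("%-12s host=%-10s status=%s" % (srv.name, host, srv.status))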
Example 3: Viewing Cloud tenants

# source admin-openrc.sh
# keystone tenant-list

Figure B.14 shows the detailed view of all tenants configured in the cloud. In this example the tenants louis, patrick, test and demo have been created for experiments; the admin tenant is used for administrative purposes and the service tenant is used for service creation in the cloud.

Figure B.14: View of all the tenants configured in the cloud

Example 4: Viewing the load requests

# python db_view_load.py

Figure B.15 shows the current input loadfile. It contains the resource requests for the VMs appearing in the loadfile; the start time represents the start time of load placement on a specific VM.

Figure B.15: View of the current loadfile

Appendix C

Contents of the CD

The CD contains all the implemented code.