Critical Evaluation of Current Approaches to Grid Security
MSc Secure Electronic Commerce
Royal Holloway, University of London
Submitted by Ali Nasrat Haidar
Supervised by Dr. Kenny Paterson
2002-2003

CONTENTS
ACKNOWLEDGMENT
CHAPTER 1 INTRODUCTION TO GRID SECURITY
1.1 Introduction
1.2 Example of a Grid application
1.3 Globus
1.3.1 Globus Model
1.3.2 Globus and CORBA
1.4 Setting the scene
1.5 Security issues on the Grid
1.6 E-commerce security Vs. Grid Security
1.7 Aim of this research
1.8 Dissertation Organisation
CHAPTER 2 VIRTUAL ORGANISATION
2.1 Introduction
2.2 VO partner organisations trust all VO members
2.3 VO organizations trust all VO members and a Central Database
2.4 Public Key Infrastructure (PKI)
2.4.1 Overview of Public Key Cryptography
2.4.2 X.509 Certificate
2.4.3 Certificate Authority (CA)
2.4.4 Registration Authority (RA)
2.4.5 Certificate Revocation
2.4.6 Problems with PKI
2.5 VO organizations trust a third party
CHAPTER 3 GRID AUTHENTICATION
3.1 Introduction
3.2 Design issues in Grid Authentication protocol
3.3 Approaches to Authentication
3.4 VO sites trust all VO members
3.5 VO sites trust all VO members and a Central Database
3.6 Grid Authentication with PKI: VO sites trust a third party
3.6.1 Advantages of using PKI on the Grid
3.6.2 Vulnerabilities in PKI
3.7 Proxies and Delegation
3.8 Security issues in proxies
3.9 Other Alternatives
3.10 Globus Toolkit (GT2): GSI approach to Authentication
3.10.1 GT2 authentication with proxies
3.10.2 MyProxy
CHAPTER 4 GRID AUTHORISATION
4.1 Introduction
4.2 Fundamental model of access control
4.3 Resource centred with Access Control List (ACL)
4.4 Role Based Access Control (RBAC)
4.5 Distributed Authorisation
4.6 Globus approach to Authorisation
4.7 Community Authorisation Service (CAS)
4.8 Access Control with PKI
4.9 Firewalls and the Grid
4.9.1 Brief overview of firewalls
4.9.2 Accessing resources behind the firewall
4.9.3 Naming issue with Network Address Translation (NAT)
4.9.4 Globus and firewalls
4.10 Future network solutions
CHAPTER 5 CONFIDENTIALITY, INTEGRITY, AVAILABILITY AND ACCOUNTABILITY ON THE GRID
5.1 Introduction
5.2 Confidentiality on the Grid
5.2.1 Brief overview of Encryption
5.2.1 Communication security
5.2.2 Data resource privacy
5.2.3 Remote Data privacy
5.3 Integrity on the Grid
5.4 Grid Availability
5.5 Accountability
CHAPTER 6 TOWARD A TOP DOWN VIEW OF GRID SECURITY
6.1 Introduction
6.2 Risk management of Grid Assets
6.2.1 Overview of Risk and Risk Analysis
6.2.2 Risk Analysis of Grid Assets
6.2.3 Enhancing security of core Grid Assets
6.3 Security hierarchy of Grid Resources
6.4 Threats
7. CONCLUSION
REFERENCES

Acknowledgment
I am most grateful to Dr. Kenny Paterson for his supervision, guidance, illuminating discussions and critical feedback on this project. In fact, I couldn't have wished for a better supervisor! I am also very grateful to Prof.
Peter Wild for his extremely valuable comments on a draft of this report and for his supervision while Dr. Paterson was away in August. Having said that, all inaccuracies and deficiencies in this report are my responsibility alone.

I would like to thank all the lecturers, research staff, and MSc colleagues in the Information Security Group at RHUL for creating an exciting, lively environment for learning, intellectual discussions, and professional activities within a most friendly and enjoyable atmosphere!

I am also indebted to several people who gave wonderful lectures at the MENA Advanced Summer School on parallel, distributed, and Internet computing, July 7-19, 2002: most of all, Prof. Dieter Gollmann, for interesting me in security through his brilliant lectures on this subject; Prof. Mark Baker (Portsmouth) and Prof. Salim Hariri (Arizona), for introducing me to Grid, parallel computing and middleware concepts; and Prof. Peter Coveney (director of the Centre for Computational Science, UCL), for his lectures on RealityGrid and for many enjoyable discussions this year, which made me appreciate the big gap between what scientists require and what current Grid Security solutions offer. Finally, I thank Dr. Ali Abdallah (London South Bank University) for introducing me to formal modelling, abstraction, and precision.

Chapter 1 Introduction to Grid Security

1.1 Introduction
The vision of the computational Grid [1, 3] is to provide a high performance computing and data infrastructure supporting flexible, secure and coordinated resource sharing among dynamic collections of individuals and institutions known as “virtual organizations” (VOs) [1, 3]. “Grid Computing” is rapidly emerging from the scientific and academic area into the industrial and commercial world. It is intended to offer seamless and uniform access to substantial resources without having to consider their geographical locations.
Resources can be high performance supercomputers, massive storage space, sensors, satellites, software applications, and data belonging to different institutions and connected through the Internet [1, 3]. Grids can enable collaboration between several organisations [1, 3]. The Grid provides the infrastructure that enables dispersed institutions (commercial companies, universities, government institutions, and laboratories) to form virtual organisations (VOs) that share resources and collaborate in order to solve common problems.

1.2 Example of a Grid application
A typical example of a Grid application is “weather prediction”. This involves collaboration between several partners: TV stations that produce regular weather news reports, a satellite company that regularly provides space images of the earth, a supercomputing centre that rapidly analyses the images, and a visualization centre that produces visual interpretations of the weather analysis (Figure 1.1). The smooth running of this project for the timely production of regular weather reports crucially depends on appropriate schemes for securely sharing, exchanging, and coordinating information between these partners.

Figure 1.1 Weather prediction using Grid (a VO linking a satellite company, a TV station, an analysis centre performing intensive computation, and a visualization centre with a visualization tool)

The power of the Grid is particularly useful in areas involving intensive processing, such as life science research [26], financial modelling [26], industrial design [26], and graphics rendering [26]. Many governments have recently initiated special programmes to support the Grid: the UK e-Science programme [27] is funding projects such as RealityGrid [28]; the EU is funding projects such as the European Data Grid [29] and EuroGrid [30]; and in the US, NASA is funding the Information Power Grid [31] and the Department of Energy is funding the Globus project [32].
The benefits of partnerships between institutions for achieving ambitious projects have been well recognised and are well documented in [1]. Currently these programmes exist in concept, and one of the biggest barriers to their realisation is security.

1.3 Globus
Globus [31] was the first attempt to coordinate resource sharing between many research institutions in the US collaborating to enable the construction of a computational Grid. It provides a software infrastructure, the Globus Toolkit [31], which enables an application to pool geographically distributed instruments, visualization tools, high performance computing, and information resources. It also introduces security access controls to resources, despite their geographical distribution and their heterogeneous nature. The primary objective of Globus is to integrate these heterogeneous resources into a single virtual machine [8]. Currently, the Globus Toolkit, GT2 [31], is considered the de facto standard middleware for building Grid applications because of its wide acceptance and deployment worldwide [3, 7]. Many Globus concepts are adopted in IBM products [26] and in current Grid projects such as NASA’s Information Power Grid and the European Data Grid. Several alternatives exist, such as Legion [39] and Unicore [34], the latter of which is used by the RealityGrid project for building Grid applications.

1.3.1 Globus Model
Globus adopts the “hourglass model”. This model provides a set of core services as basic infrastructure. A typical Grid environment incorporates heterogeneous resources such as different machines, different operating systems, and possibly different hardware architectures. It is therefore very difficult to design a middleware that meets the requirements of all kinds of applications. Each Grid project may specialise in a different area that requires different types of resources.
This model is used to construct high-level specialised Grid applications.

Figure 1.2 The “hourglass” model of Globus [31] (layers shown: diverse applications, core services (GSI), various operating systems)

Instead of offering a specialised solution for Grid applications, this model allows developers to build dedicated applications on top of Globus, while keeping the part played by Globus itself small. Globus consists of many components, namely: the Globus Resource Allocation Manager (GRAM) [45], GridFTP, the Monitoring and Discovery Service (MDS) [46], and the Grid Security Infrastructure (GSI) [8]. Here is a brief explanation of these components.

GRAM: an HTTP-based protocol used for remote allocation of computational resources and for monitoring and managing the status of the execution on those resources [8].
GridFTP: based on the File Transfer Protocol; used to provide high-performance, secure and reliable data transfer and data access on the Grid.
MDS: used to provide access to static and dynamic information about resources. This information includes the capability and availability of each resource.
GSI: used to provide authentication and related security services, discussed in detail in the next section.

Figure 1.3 Components hierarchy in GT2 (GridFTP, MDS and GRAM layered on top of the Grid Security Infrastructure (GSI))

The Grid Security Infrastructure (GSI) component is the most important part of the Globus Toolkit. All other components are built on top of it (Figure 1.3). This helps developers to define protocols and primitives that allow “…Secure negotiations, initiations, monitoring, accounting, and payment of sharing operations on individual resources” [3]. The functionalities of GSI are discussed in more detail in Chapters 3 and 4.

1.3.2 Globus and CORBA
Many large organizations, such as financial institutions, have built their IT systems over the years by adding new hardware, software and applications to meet new user requirements. An upgrade sometimes requires rewriting an application in order for it to work on a new platform. Therefore, sharing data between those applications requires customised software solutions [2]. CORBA [2] is a middleware designed to enable the construction of distributed applications within an organisation. It provides a set of library primitives that allows applications to be constructed to run on machines with different kinds of operating systems and different classes of parallel hardware architectures, thus improving the interoperability of applications. CORBA is built on the client-server model rather than on the coordinated use of multiple resources [3]. CORBA does not allow:
• A user to grant access rights to an application running on a remote site.
• Pooling resources from multiple administrative domains when required to solve a complex problem.
• Sharing between different organisations. Sharing in CORBA is restricted to one organization with one security policy, rather than spanning many organizations with different security policies.

Globus complements CORBA rather than replacing it. Many scenarios are described in [3] for how Globus could be used with CORBA within enterprise computing. We examine Globus security mechanisms in several sections throughout this project: the Globus approach to authentication in Chapter 3, and to authorisation and firewalls in Chapter 4.

1.4 Setting the scene
To set the scene, we first need to understand what a virtual organisation (VO) is. A VO is a community of resource providers and users from multiple administrative domains, collaborating in order to achieve common objectives [3]. The following VO example will be used in several sections throughout this work.
We have a virtual organisation (VO) comprising several institutions collaborating in a Grid project. These institutions can be academic, governmental, industrial and commercial institutions. We assume that these institutions are geographically distributed. Figure 1.4 illustrates the Grid infrastructure and the collaborating partners’ characteristics.

Figure 1.4 A typical Virtual Organisation and Grid Infrastructure (Company A, University B, Lab C and Company D contribute Grid resources and Grid users to the VO over the Grid infrastructure)

Each institution in the project has a local security policy that governs access to its local resources. Although these institutions are partners in the VO, not all their users are members of the VO (those who are members are denoted Grid users in Figure 1.4). In addition, not all resources are shared with the VO (those that are shared are denoted Grid resources in Figure 1.4). Some resources are withheld from the VO and are accessible only by local users. Each institution has a local intranet security solution such as Kerberos [2, 3] or Public Key Infrastructure (PKI) [10, 12, 13, 14, 53].

The main roles involved in this VO are:

User: This role includes users from institutions A, B, C and D. Each user has a job position and associated permissions to perform job functions. Let U.A denote user U from institution A.

Administrator: This role includes administrators from institutions A, B, C and D. The administrator is responsible for managing project users’ accounts and granting permissions to those users/roles. In addition, this role involves administering firewalls, intrusion detection systems and databases. Let Admin.A denote administrator Admin from institution A.

Site Contact: Each VO site will have a site contact. The main role of the site contact is to confirm the identity of users from the site he represents.
Also, the site contact role includes communicating with resource administrators in other sites in order to add users to, or remove users from, the VO. Let S.A denote the site contact in A.

Project Leader: Each Grid project has a project leader. This role involves identifying members of the project, accepting new users, and setting permissions for the project members according to the project security policy.

Registration Authority (RA) [12, 13, 14]: This role is only relevant when a PKI solution is adopted. The RA is responsible for authenticating project users’ identities and submitting certificate requests on their behalf to the Certificate Authority (CA) [12, 14].

Resource: The role of a resource is to provide services to project users. On the Grid, a resource can submit jobs on a user’s behalf to other remote resources in different sites. Therefore, resources must be authenticated, which is why they are considered a role. The resources that are being shared can include [1]:
• Computer resources such as high performance supercomputers, large clusters, massive storage devices and desktop machines.
• Data resources such as databases, archives and sophisticated simulation software.
• Instruments such as telescopes, satellites, lab facilities and sensors.
These resources are dynamic and heterogeneous. For instance, computing power can be added to or removed from the project. Data resources such as databases and archives may be available to the project for a limited period only. Let R.A denote resource R from site A.

1.5 Security issues on the Grid
Grid applications are characterised by the coordinated use of resources from different administrative domains. Figure 1.5 indicates this situation by showing the policies and platforms in each domain. Each site in the VO is independently administered and has its own local security solutions such as Kerberos and PKI.
These solutions are built on top of different platforms such as UNIX [35], Windows [36] and OS/2 [37].

Figure 1.5 Reconcile local policies with Global policy (Company A, University B, Lab C and Company D each have a local policy, Policy A to Policy D, alongside the VO global policy; the platforms and mechanisms shown include UNIX, Kerberos, OS/2 and PKI)

When these institutions are brought together to collaborate on a common project in this heterogeneous environment, many security problems arise:

Interoperability: Interoperability is a key issue on the Grid. It is impractical to change the security mechanisms at each site in the VO, for technical, financial and political reasons [7]. Thus, the security of the Grid project must be able to interoperate with the local security solutions at different levels:

Policy level: Each partner in the VO has its own security policy (Figure 1.5), which is carefully tailored to maximise the protection of its valuable resources. The main issues to be addressed are:
• How to reconcile the global security policy with the local security policies.
• How conflicts between local and global policy can be resolved; in other words, which policy applies, local or global.

Authentication level: VO sites require mechanisms for identifying users from one security domain in another. For example, the identity of a user from company A (U.A) and his credential as expressed in Policy A are meaningless in the other VO sites. Therefore, how does U.A, who authenticates locally with, e.g., a UNIX login, authenticate to site B, which uses, e.g., Kerberos, in order to access resource R.B?

Authorisation level: The access control mechanisms used vary from one VO site to another, depending on the type and value of the resources accommodated. For example, site A may use an Access Control List (ACL) [2] or Role Based Access Control (RBAC) [2] as the mechanism controlling access to its resources. The first problem is how to determine whether a user, U.A, authenticated in site B, is allowed access to resource R.B in B.
The second is who decides what the access rights of U.A are.

Scalability: The number of users and resources in the VO is dynamic. Users and resources can be added to or removed from the project as required. For example, CERN [38], a high energy and nuclear physics project, involves 1800 physicists from 150 institutions in 32 countries. Thus, a scalable way to manage users’ authentication and their rights to access project resources is required.

Confidentiality and integrity issues: On the Grid, users transmit data over the Internet and access remote data resources that may be very sensitive. Moreover, Grid users can run programs on remote sites. Therefore, confidentiality and integrity are required to:
• Protect data transmitted over a public network such as the Internet.
• Ensure the privacy and accuracy of the results of programs executed on remote sites.
• Ensure the secrecy and correctness of the shared data resources.

Trust: Scientists and commercial companies want to know whom they are trusting with their data and commodities. The question that arises is whom to trust: individuals, sites, or third parties.

Usability: Grid users come from different types of organisations, such as academic, government and financial institutions, and they may not be security experts. Therefore, usability is required so that access to the VO resources is as smooth and seamless as access to local resources.

Firewall: A frequently encountered problem on the Grid is firewalls [7, 57]. VO members want to share resources with other partners but also want to keep their other resources private. Collaborating partners on the Grid have to allow requests from, and replies to, jobs initiated from other sites to pass through their firewall to access their resources. This requires opening a port in the firewall to access those resources, which could introduce another vulnerability to the local security of the VO partner’s organisation.
For commercial companies, it is unthinkable to compromise local security, so they may end up not collaborating at all.

Figure 1.6 VO with Firewalls (Company A, University B, Lab C and Company D each connect to the VO Grid infrastructure through a firewall)

1.6 E-commerce security Vs. Grid Security
The major differences between e-commerce security and Grid security are:

Collaboration and sharing: In e-commerce, there is no concept of collaboration and resource sharing. The notion of resources in e-commerce is restricted to files and databases.

Remote program execution: Users in e-commerce cannot run programs, because of the security consequences for the e-commerce company. On the Grid, users can run programs on remote sites.

Trust: There is no trust relationship between the e-commerce company and its customers. This allows the company to install a firewall and define two zones: a trusted zone inside the company and an untrusted zone, the Internet.

Authentication: Authentication is not as high a priority in e-commerce as it is on the Grid. As long as the customer can provide valid credit card details, he can get access to services and resources on the company’s site.

1.7 Aim of this research
The aim of this research is to investigate, compare and contrast several approaches to Grid security. The Grid is based on the concept of virtual organizations (VOs), whose definitions, administrative capabilities and functionalities have rapidly evolved over the last decade. One reason for this evolution, among several others, is the drive to provide more satisfactory solutions to aspects of Grid security such as integrity, confidentiality, availability, accountability, authorisation, and authentication. The Grid solutions to these aspects depend strongly on the definition of a VO, the trust relationship between VO sites and third parties, and the adoption of new advances in cryptography and PKI technology.
The main aims of this research are to:
• Give a brief introduction to Grid computing and the reasons for using it, and show why Grid computing is important in practice.
• Present several types of VOs, ranging from elementary schemes to the current Grid concept, by precisely modelling the roles involved, the administrative capabilities, and the trust relationships. Understand the concept and the mission of a virtual organisation.
• Understand how to reconcile aspects of the security policies of local sites with the global security policy of the VO.
• Understand mechanisms for achieving certain fundamental security aspects such as:
o Authentication mechanisms: user name and password, Single Sign-On [2], PKI, identity delegation [3, 4, 5].
o Authorisation mechanisms: Access Control Lists [2], Role Based Access Control [2], identity mapping to apply local security [3, 6], Community Authorisation Service [5] based on a Trusted Third Party.
o Confidentiality, integrity, availability and accountability mechanisms.
• Give an overview of the de facto standard Globus [1] project. The Globus infrastructure introduces security access controls over resources despite their geographical distribution and their heterogeneous nature.
• Give a critical discussion of the mechanisms and solutions to aspects of Grid security in each type of VO and compare them with Globus (GT2, toolkit 2). Provide a serious attempt at giving a top down view of Grid security based on combining classical security definitions with useful concepts taken from risk analysis, threat modelling and resource ordering (hierarchy).

1.8 Dissertation Organisation
The dissertation is organised as follows. In Chapter 2, we discuss how the concept of the VO has evolved in the last decade and attempt to clarify the mission of the VO via several schemes.
In Chapter 3, we present the authentication problem on the Grid, the issues to be solved, and possible solutions; examples from the Grid Security Infrastructure and Globus are used throughout the chapter. In Chapter 4, we present the authorisation problem, the issues to be solved, possible solutions and the Globus approach. In Chapter 5, we explain how confidentiality, integrity and availability are maintained on the Grid, and we give our definition of the accountability problem on the Grid. In Chapter 6, we present a top down view of Grid security. We discuss the relationship between resources on the Grid and security considerations, and conclude with the current status of the Grid and future work.

Chapter 2 Virtual Organisation

2.1 Introduction
The virtual organisation (VO) [1, 3] is a fundamental concept on the Grid. A VO is a set of resource providers and users from multiple administrative domains, possibly geographically distributed, collaborating in order to achieve common objectives [3]. The essential issue on the Grid is enabling the sharing of resources. According to [3], “…This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO)”. The difference between a VO and a classical organisation is that individuals and/or institutions that have agreed to collaborate and share resources continue to belong to real organisations. In addition, they may be members of several VOs at the same time [3].

The characterization of the VO given above is very broad. Understanding what a VO means is essential for the design of authentication protocols and authorisation mechanisms on the Grid. Researchers and scientists want to know whom they are trusting with their data.
Commercial companies want to know whom they trust with their commodities. The aim of this chapter is to clarify the mission of the VO, the explicit assumptions about trust relationships among roles in the VO (administrators, sites, users, certificate authorities, and third parties), and how security policy conflicts between the VO and local sites are resolved. We will investigate several VO schemes and see how these schemes have evolved over the last decade. In addition, we will analyse their merits and weaknesses.

2.2 VO partner organisations trust all VO members
Typically, a VO member will have access to a wide range of resources within a Grid project. The question to be addressed is “how does a user from an institution become a VO member?” In the first scheme, a VO is a set of ad hoc connections. Consider the Grid project described in section 1.3.
[Figure 2.1 The first scheme: VO sites trust all VO members. Company A, University B, Lab C and Company D are linked by ad hoc connections.]
The roles involved in this scheme are Site Contact and Users. Each VO site will have a site contact (e.g. S.A, S.B). Figure 2.1 illustrates how user U.A can be a VO member. Each user who wishes to join the VO will need to have a separate account at each VO site. The VO sites will treat project users as an extended part of their core organisation. Therefore, if U.A has a valid username and password for each resource at each site, then all VO sites will consider him trustworthy. Thus, the trust relationship in this scheme is with “named users”. The policy of the VO will be governed by local policies, because each site will locally authenticate users and apply its local security policy.
This scheme has several disadvantages:
1. The user is responsible for the details and security key of each account.
Therefore, the security of the VO resources depends on how U.A protects his account details.
2. It is unacceptable to security-sensitive organisations, in particular commercial companies, because users will be considered an extended part of the organisation while they still belong to a different institution.
3. It is not scalable. An administrator at one VO site will have to manage members from different institutions in addition to his local users.

2.3 VO organizations trust all VO members and a Central Database
In this scheme, the VO is a central authority responsible for maintaining the users’ account information needed to access VO resources. This information is stored in a database maintained by the VO. All VO members and all VO sites trust the VO database. Thus, the VO becomes a trusted third party that has its own policy, roles, procedures and mechanisms. The VO is set up and funded by the partners and is managed by employees from the participating institutions. The roles involved in this scheme include:
Project Leader: This role involves identifying members of the project, accepting new users, and setting permissions for the project members according to the project security policy. An employee from institution C, emp.C, can also have a role as project leader in the VO, Projectleader.VO.
Site contact: This role includes liaising with resource administrators in other sites in order to confirm the identity of a user from his institution who wishes to join the VO.
Administrator: The administrator is responsible for managing project users’ account information on the VO’s database server. For instance, an administrator from institution A, Admin.A, can also have a role as VO database administrator, Admin.VO.
Users: This role includes users from the VO partners’ organisations. Each user will have an account in the VO that allows him to gain access to VO resources.
To summarise, the VO is set up and funded by the collaborating partners. It is managed by employees from the collaborating institutions (A, B, C and D). Those employees may have a role in their organisation that is different from their role in the VO. For instance, a project manager in institution A can be the database administrator in the VO, and a scientist from institution C can be the project leader of the VO.
For U.A to become a VO member (Figure 2.2):
1. U.A registers in the VO database.
2. The project leader in the VO contacts institution A to get assurance that U.A’s registered information is true.
3. The site contact, S.A, confirms the validity of this information.
4. If the confirmation is positive, the project leader contacts the administrators of the other VO sites to create a local account and permissions for U.A. [steps 4, 5, 6]
5. Each local site administrator in the VO sends the newly created username and password to the VO. These credentials are stored in U.A’s account on the VO’s server. Thus, U.A does not need to store these credentials locally. [steps 7, 8, 9]
[Figure 2.2 All VO organizations trust all VO members and the VO server. User.A at Company A: 1-Register with the VO DB. Project leader: 2-Verify information with Site contact.A; 3-Information valid. The project leader initiates steps 2, 4, 5 and 6. 4, 5, 6-Create account U.A at Company B, Company C and Company D. 7-(u1,pw1,B), 8-(u2,pw2,C), 9-(u3,pw3,D) returned to the VO DB.]
This scheme provides a significant improvement over the previous scheme. The credentials needed to access VO resources are protected by the VO, not by the user. This substantially reduces the risk that passwords to resources are compromised due to user mismanagement. In addition, it gives assurance to the VO organisations about the identity of users accessing their resources.
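As a rough illustration, the registration flow of Figure 2.2 can be sketched in Python. This is a minimal sketch under stated assumptions, not a description of any deployed system: the class `VODatabase`, the helper `create_local_account` and the username format are hypothetical names introduced here, and the site contact's confirmation is reduced to a boolean.

```python
import secrets


def create_local_account(site, user):
    """Hypothetical stand-in for a site administrator creating a local
    account (steps 4-6) and returning its credentials (steps 7-9)."""
    return (f"{user}@{site}", secrets.token_hex(8))


class VODatabase:
    """Central store of per-site credentials, held by the VO for each user."""

    def __init__(self, sites):
        self.sites = list(sites)
        # user -> {site: (username, password)}; a real server would
        # protect these entries rather than keep them in plain memory.
        self.credentials = {}

    def enrol(self, user, home_site, confirmed_by_site_contact):
        # Steps 2-3: the project leader only proceeds if the site contact
        # of the user's home institution confirms the registration data.
        if not confirmed_by_site_contact:
            return False
        # Steps 4-9: every other site creates an account and sends the
        # credentials back to the VO, which stores them for the user.
        self.credentials[user] = {
            site: create_local_account(site, user)
            for site in self.sites
            if site != home_site
        }
        return True

    def lookup(self, user, site):
        """Return the stored (username, password) for a site, or None."""
        return self.credentials.get(user, {}).get(site)


vo = VODatabase(["A", "B", "C", "D"])
vo.enrol("U.A", home_site="A", confirmed_by_site_contact=True)
print(sorted(vo.credentials["U.A"]))
```

The sketch makes the trade-off visible: the user holds no per-site secrets, but every credential now sits in one place.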
However, this scheme also introduces new risks:
• The VO becomes a single point of attack. A successful denial of service attack on the VO database server will leave users disconnected from their resources.
• The VO central database is vulnerable to failure, which could also prevent users from accessing project resources. However, this problem can be solved with replication techniques.
The third VO scheme is based on Public Key Infrastructure (PKI). It requires an understanding of the main components of PKI and their functionalities, which are described next.

2.4 Public Key Infrastructure (PKI)
PKI is recognized as an essential enabling technology for security in a large-scale network. The core concept in PKI is that of a “certificate” [12, 13, 53]. A certificate is a data structure containing a public key and related details about the key owner, signed by a Certification Authority (CA); thus it is tamper evident [10, 12]. The role of the certificate is to bind the public key to a particular entity on the Grid. The private key represents the identity of each entity on the Grid: user, resource and process.
An important advantage of PKI is interoperability. PKI functionality is increasingly included in standard products and protocols such as Windows [36], Web Services [22], .NET [22], XML [22, 33], Secure Shell (SSH) [18, 47] and Secure Socket Layer (SSL)/Transport Layer Security (TLS) [9, 18]. The standards related to PKIs are X.509 v3 [12], PKIX (RFC 2459) [12], and SPKI (RFC 2692, RFC 2693) [12].

2.4.1 Overview of Public Key Cryptography
With public key cryptography [10] (asymmetric encryption), each party has a pair of related keys: one for encryption and one for decryption. The same key cannot be used for both. The main assumption in public key cryptography is that one of the keys must remain secret (private key) and the other is made public (public key).
If a message sender uses its private key to encrypt the message, then any recipient who can obtain the public key can decrypt it. In contrast, if a message sender uses a public key to encrypt a message, then only the owner of the private key can decrypt the message. In simple terms, signing is the process of encrypting with the private key, and verification is the process of decrypting with the public key. RSA [11, 42] and ElGamal [11] are two well-known public key algorithms currently in use.
The main assumption in public key cryptography is that the “private key must remain secret”. Thus, it needs to be adequately protected. One way to protect the private key file is to encrypt it with a password. Then, if it is stolen from a machine or a storage device, the user will have enough time to revoke the corresponding public key certificate. For more effective protection of the private key, a smart card [6, 7] with a password can be used. The number of attempts to enter the right password can be restricted to prevent brute force [2] and dictionary [2] attacks on the smart card. The only problem with this solution is that not all users have a smart card reader.

2.4.2 X.509 Certificate
X.509 version 3 is the most widely used data format for public key certificates today. It provides a uniform way of expressing the identity of entities on the Grid. It is also standardised by the Internet Engineering Task Force (IETF) [43]. For illustration, we give the X.509 v3 certificate format in Figure 2.3. There are several types of certificates [13]:
• Identity certificate: contains the public key of a user and his identity together with some other information, signed with the private key of the CA. This makes the certificate tamper resistant.
• Attribute certificate: contains a set of attributes of a user, such as his occupation or role in a company, together with some other information, digitally signed with the private key of the CA.
The extensions in the certificate are essential for the Grid as they allow standardising policy with respect to the use of certificates. Globus uses X.509 certificates as the main credential for authenticating users of the Grid.

Version: v3
Serial number: 54321
Signature algorithm id: RSA with SHA1
Subject name: CN=Ali Haidar, OU= MSc. Student, O =RHUL
Issuer name: CA= Trust Me, OU= PKI, O = Trust Me
Subject public key info: 12676576436434654366543…..
Validity period: Not before 20/7/02, Not after 27/07/04
Issuer public key info: 657854566765756732522115….
Extensions: Key Usage, Policy restriction…..
CA Signature: 56456454$$$&&666894#4…
Figure 2.3 X.509 version 3 certificate format (Adapted from [12])

2.4.3 Certificate Authority (CA)
A CA is an institution trusted by others to vouch for the authenticity of a public key [12, 14]. The main role of the CA is to issue digital certificates that cryptographically bind a public key to the user’s identity information [10]. This is done by signing the information using the CA’s private key. The relying parties require the CA’s public key so that they can verify the digital signature on the certificates issued by the CA [10]. The CA has many other responsibilities in addition to issuing certificates. These responsibilities include generating key pairs, revoking certificates and maintaining a Certificate Revocation List (CRL) and other revocation forms [12, 13].
A CA can be any partner in the Grid project, such as a university or a government body, or a third party operating for profit such as Verisign [40] and Entrust [41]. If the Grid project is large enough, then it might establish its own CA. For example, in the UK, the e-Science programme has established its own CA called UK-CA [7], and in the Globus project, the Department of Energy in the US has established the DOE-CA [3] to authenticate all Globus users. A CA must be trusted only to issue valid certificates [2].
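To make the binding between a public key and an identity concrete, the following Python sketch shows a CA signing a simplified certificate and a relying party verifying it. It is an illustration under loud assumptions: the `Certificate` class holds only a few of the fields of Figure 2.3, the field encoding is ad hoc rather than the real X.509 encoding, and the RSA key is a toy built from two small known primes with no padding, which would be insecure in practice. All names (`ca_sign`, `verify`, `CA_N`, and so on) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace

# Toy RSA key pair for the CA, built from two known Mersenne primes.
# Illustration only: real CA keys are far larger and use padded signatures.
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q                               # public modulus
CA_E = 65537                               # public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


@dataclass(frozen=True)
class Certificate:
    serial: int
    subject: str
    issuer: str
    subject_public_key: str
    not_before: str
    not_after: str
    signature: int = 0


def digest(cert):
    """Hash the signed fields (everything except the signature itself)."""
    fields = "|".join([str(cert.serial), cert.subject, cert.issuer,
                       cert.subject_public_key, cert.not_before,
                       cert.not_after])
    h = hashlib.sha256(fields.encode()).digest()
    return int.from_bytes(h, "big") % CA_N


def ca_sign(cert):
    """CA side: sign the digest with the CA's private exponent."""
    return replace(cert, signature=pow(digest(cert), CA_D, CA_N))


def verify(cert):
    """Relying party: check the signature using only the CA's public key."""
    return pow(cert.signature, CA_E, CA_N) == digest(cert)


cert = ca_sign(Certificate(54321, "CN=Ali Haidar", "Trust Me",
                           "subject-public-key", "2002-07-20", "2004-07-27"))
tampered = replace(cert, subject="CN=Someone Else")
print(verify(cert), verify(tampered))
```

Changing any signed field changes the digest, so the altered certificate no longer matches the CA's signature; this is the sense in which a certificate is tamper resistant.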
2.4.4 Registration Authority (RA)
The RA [14] is the trusted representative of the CA. The basic functionalities of the RA include:
• Authenticating VO members’ claimed identity. The procedure for verifying a user’s identity depends on the sensitivity of the Grid project and the resources involved. The strictness of the procedure may vary from a face-to-face interview to a letter from the human resources department of the user’s organisation confirming his identity.
• Providing the CA’s public key and certificate to VO members. The VO member establishes trust with the CA by obtaining the CA’s public key and the CA’s certificate from the RA in a secure way, for instance in person on a floppy disk or over the Internet via a secure SSL connection.
• Sending the certificate creation request to the CA.
• Obtaining the public key from the subscriber (optional).

2.4.5 Certificate Revocation
Certificate revocation [12, 14] is a core component of PKI. It is used for checking whether a certificate is still valid or not. There are a number of events that can invalidate a certificate: for example, if the private key corresponding to the certificate is stolen or lost, if the CA’s private key is compromised, if the validity period has expired, or if the holder of the certificate is no longer authorised to use the certificate [12, 46]. When any of these events happens, the certificate must be revoked, as it no longer represents the owner. The relying parties must check that a certificate is not revoked before using the public key in the certificate. There are different forms of revocation mechanism, such as Certificate Revocation Lists (CRLs), the Online Certificate Status Protocol (OCSP) and the Simple Certification Verification Protocol (SCVP) [12, 14]. Here is a brief explanation of the CRL and OCSP mechanisms, as they are commonly used.

2.4.5.1 Certificate Revocation List (CRL)
The CRL is the first form of revocation mechanism.
It is a highly controlled online database that contains the serial number and the revocation time of certificates that have become invalid [12]. Many CAs update and sign the CRL on a daily basis to allow relying parties to verify the integrity and authenticity of the CRL [12]. This allows the CRL to be transmitted over a public network without undetected tampering. The CRL is usually specific to a single CA. There are several types of CRL, namely the full CRL, the delta CRL and the partitioned CRL [14]. The full CRL contains all revocation information for all certificates of a particular CA [14]. The main disadvantages of this type of CRL are:
• Freshness: The CRL is issued periodically, on a daily basis for instance. So, if a certificate is compromised and reported for revocation, it may not appear on the CRL for up to 24 hours (the revocation time should be reasonable). In the meantime, the certificate can still be used although it is invalid, because it is not yet listed on the CRL.
• Scalability: The size of the CRL may be large because it is always increasing. Thus, it will take time to download it every time. This is not convenient for clients.
In order to reduce the size of the full CRL, a delta CRL can be used [14]. The idea is to publish only the changes to the revocation information since the last full CRL was issued. The CRL that contains the changes is called a delta CRL. This mechanism requires that the user already has a full CRL. In order to create a fresh CRL, the delta CRL is applied to the full CRL.

2.4.5.2 Online Certificate Status Protocol (OCSP)
OCSP [12, 14] is an online protocol for requesting status information about certificates. It is a server-based revocation mechanism that provides real-time verification of certificate status information [12]:
• The OCSP client requests status information for a specific certificate.
• The OCSP server replies with a signed response giving the status: good (acceptable), revoked or unknown.
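The two revocation mechanisms above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a CRL is modelled as a plain dictionary from serial number to revocation time, signatures on the CRL and on responses are omitted, a delta CRL only adds entries, and the function names `apply_delta` and `check_status` are hypothetical rather than part of any PKI product.

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def apply_delta(full_crl, delta_crl):
    """Build a fresh CRL by applying a delta CRL to the last full CRL.
    Both map certificate serial number -> revocation time."""
    fresh = dict(full_crl)
    fresh.update(delta_crl)
    return fresh


def check_status(serial, issued_serials, crl):
    """OCSP-style answer for one certificate of one CA."""
    if serial not in issued_serials:
        return Status.UNKNOWN      # the responder knows nothing about it
    if serial in crl:
        return Status.REVOKED
    return Status.GOOD


issued = {54321, 54322, 54323}
full_crl = {54322: "2003-05-01T09:00"}     # last full CRL
delta_crl = {54321: "2003-05-01T17:30"}    # revoked after the full CRL

# Freshness problem: against the stale full CRL the compromised
# certificate 54321 still looks valid.
print(check_status(54321, issued, full_crl))
# After applying the delta it is reported as revoked.
print(check_status(54321, issued, apply_delta(full_crl, delta_crl)))
```

The relying party only downloads the small delta, which addresses the size problem, but the freshness of the answer is still bounded by how often the CA publishes.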
The OCSP protocol has several disadvantages [14]:
• Response time and scaling: The response signing process will limit the scalability of the server because creating a digital signature is computationally intensive. Signing is slow, but verification is fast.
• Multiple queries are needed to verify the entire certificate path. Thus, the time to set up the connection with the server and perform the check can be long.
• According to [14] “The protocol provides pre-computed responses including validity.”

2.4.6 Problems with PKI
PKI functionalities can only be reliable if the CA, the RA and the certificate revocation mechanisms are operating with a high level of security. Here are some of the problems faced by PKI:
• Registration process [14]: The major problem is entity registration with the RA. The RA needs to prove the identity of an entity that will be issued a certificate by the CA. The procedure of validating that the entity’s information is correct before issuing and signing the certificate is vital. For example, in 2002, Verisign, a commercial CA, issued an ex-Microsoft employee a certificate that allowed him to sign code on behalf of Microsoft as a current Microsoft employee [10]. When registering a host name, the verification process can be done using, for instance, a WHOIS lookup for host names on the Internet. This will give the name of the institution and the corresponding Domain Name Server (DNS).
• Key quality [11]: Another important issue with public key encryption in general is the generation of the key pair. A user who wishes to generate his own key pair will need to use specialised software. The quality of the key pair depends on the ability of the Pseudo Random Number Generator (PRNG) [11] in the software to produce a non-weak key pair, and on the software not keeping a copy of the private key.
• Communicating the private key to the end entity [10, 13]: The user may allow the CA to generate the key pair.
The problem that arises is how to communicate the private key to the end entity in a secure way. Currently, a smart card is a reliable option, but it is not practical because not all users have a smart card reader.
• Revocation [13, 46]: The revocation mechanisms have been a continuing problem for PKI systems. There is a trade-off between how often a CRL update should be released and the security of the relying party’s system. According to [13], “Certificates are of most use in off-line scenarios, but revocation pushes to do online checks for the revocation”. If the relying parties need to get a new CRL each time they want to validate a certificate, then the CA must be on-line, but the primary advantage of PKI is that it is supposed to allow off-line verification.
• Bootstrapping (trusted anchor) [13]: When a user gets a certificate from a CA, he will need to get a valid copy of the CA’s public key and certificate from a trusted source. Otherwise, how can he trust that the certificate is really valid, if there is no infrastructure to validate it?
• Trusting other CAs [12, 46]: There are two practical problems with trusting other CAs:
1. Interoperability between the two CAs. Each CA may be using a different PKI product; thus, they may be using different cryptographic algorithms, different revocation mechanisms and different revocation formats.
2. How the CA operates. Even if one trusts one’s own CA to validate another CA’s credentials (e.g. in cross-certification), should one trust that this other CA is taking the proper precautions with the entities it certifies?
In order to promote trust in PKI, issues of security, liability and obligations are set out in a Certificate Policy (CP) [12]. A CP “is a named set of rules that indicates the applicability of a public key certificate to a particular community and/or class of application with common security requirements...” [12].
Furthermore, some CAs publish a detailed description of the practices followed in issuing and maintaining certificates in a Certificate Practice Statement (CPS) [12]. A CPS is “a statement of the practices which a CA employs in issuing public key certificates.” The CPS is also the basis for compliance audits, which ensure that PKI components are operating in accordance with the specification contained in the CPS. Two CAs can establish trust after reviewing each other’s CP and confirming that they both offer equivalent levels of trust, verification processes and liability [14].

2.5 VO organizations trust a third party
Having defined the main PKI components and their functionalities, we can now describe the third VO scheme. The trust relationship in this scheme is with a third party, the Certification Authority (CA). All VO sites will trust the CA. The role of the CA is to issue digital certificates for project users. Before a certificate is issued, a background check on the identity of the certificate requestor and his role in his local institution must be done. A Registration Authority (RA) usually does this check. In this scheme, the VO acts as an RA for the CA. The VO becomes the trusted representative of the CA and is responsible for authenticating VO members’ identity. The roles involved in this scheme include, in addition to the roles described in the previous scheme, a Registration Authority role.
[Figure 2.4 VO is a Registration Authority. Participants: U.A and site contact S.A at Company A; the VO (Registration Authority, project leader); the Certificate Authority; University B, Lab C and Company D. Steps: 1-Register, 2-Verify information, 3-Verified, 4-Request certificate, 5-Certificate distribution, 6-Create account U.A, 7-same as 6, 8-same as 6, 9-Confirmation.]
Figure 2.4 illustrates how U.A can become a VO member:
1. U.A registers with the VO to get a certificate.
He would be required to submit some form of identification (a letter from HR or a personal interview).
2. The VO contacts institution A in order to verify that the information submitted by U.A is true.
3. The site contact in A confirms U.A’s identity and his role in the project.
4. The VO asks the CA to issue a digital certificate for U.A.
5. Once the certificate is created, the user can download it.
6. The project leader contacts all VO sites to create a local account for U.A based on the Subject name in his certificate. [Steps 6, 7, 8]
7. The user is sent a confirmation that his account with the VO has been established. [Step 9]
This scheme has several advantages:
• If implemented “properly”, it can provide the Grid infrastructure with basic security features such as confidentiality, integrity and non-repudiation.
• Uniform credentials. Users will have an X.509 v3 certificate that allows them to express their identities to different security domains in a uniform way. Furthermore, it is platform independent, which means that it does not require major changes to the local security solutions of local sites.
• Interoperability. PKI functionality is increasingly included in standard products and protocols such as SSL/TLS.
• This scheme does not require major changes in the local security policy of local sites.
• The digital certificate is tamper resistant because the CA has signed it with its private key. If an attacker attempts to modify a certificate, the modification will be detected.
This scheme crucially relies on the secure operation of the PKI components, namely the CA, the RA and the CRL. Here are some of the issues that might put the whole VO at risk if it is based on PKI:
• The CA’s private key is the crucial part of the whole public key infrastructure. The CA signs every digital certificate issued to Grid users. Therefore, it is crucial that the CA protects its private key according to best security practices [12].
• The procedure of validating that the user’s information is correct before issuing and signing the certificate is vital [12].
• A major way to jeopardize trust in a PKI environment on the Grid is to compromise the integrity of the CRL process. If it is not possible to assure the validity of the certificate in use, then the whole Grid authentication system is at risk [12].

How to trust a CA
The VO must decide which CA to trust. The Grid involves institutions from different countries. One of these institutions may trust a local CA in the country where it resides. When this institution joins a Grid project, other VO members may choose not to accept its certificates, because the CA is not well known and they do not know how this CA operates. Moreover, the security practice of the CA may be considered unreliable. Therefore, VO members need to agree on which CA to trust and on the criteria for trusting a CA.
This VO scheme is currently adopted in most Grid projects, such as Globus and Unicore. Both rely on the existence of a public key infrastructure to allow users to access VO resources.

Chapter 3 Grid Authentication

3.1 Introduction
Authentication [2, 4] is paramount to Grid security. Authorisation, confidentiality and auditing all depend on it. Thus, if authentication fails, the security of the whole project will fail as well. Authentication aims at verifying the identity of an entity [2]. Users of a Grid project often require remote access to valuable project resources and services over the Internet. Authentication is needed because the user identity is a parameter in most access control (authorisation) solutions used on the Grid [2]. Furthermore, access to confidential Grid information will only be authorised if users are properly authenticated.
Finally, authentication is crucial to accountability, because the user’s identity is part of the security events logged in the audit trail [2]. In addition to authenticating users on the Grid, resources and processes running on a user’s behalf may require authentication [4]. For example, users want to make sure that they are communicating with a genuine resource before sending confidential data.

3.2 Design issues in Grid Authentication protocol
There are several issues that need to be taken into account when designing an authentication protocol for the Grid:
• The parties in the Grid project do not trust each other. Thus, the protocol must provide mutual authentication.
• Interoperability is fundamental. Institutions have different platforms and security solutions (e.g. Kerberos and PKI). The authentication protocol must be uniform, not platform dependent.
• Usability. For users, usability is extremely important. Access to the VO has to be as seamless and transparent as accessing the local organisation’s resources.
• Dynamic administration. The number of users is dynamic. Users are added and removed as required. Issuing and revoking credentials for project users should be flexible.
• Mobility. The user should not be tied to one machine. Scientists and researchers move between institutions to collaborate with each other. The authentication mechanism should allow users to be mobile.
• Organisations have to make some changes to their security policy. The authentication protocol should make these changes minimal.
• Scalability. A VO may be large, such as a CERN project with 1800 physicists, or small. The authentication protocol should be able to scale to the size of the virtual organisation. Currently only PKI seems to scale.
• Single sign-on [2, 3]. The project may involve many resources. It is impractical for a user to authenticate to every resource during a session.
The user must be able to authenticate once per work session.

3.3 Approaches to Authentication
Consider the Grid project described in section 1.3. The first task of the VO is to decide how to authenticate users of that particular project. So, if a user U from institution A (U.A) is authenticated in A, would the VO allow him to access resource R in site B (R.B)? The answer to this question is deeply influenced by the nature of the trust relationships between VO users, VO organisations, and third parties. Chapter 2 highlighted three VO schemes with different trust relationships:
1. VO organisations trust all VO users. In this scheme, trust is with “named users”. Each VO user has a separate account with each VO organisation. VO users are added to the core membership list of each organisation in the VO. The details and security key of each account are stored on the user’s machine.
2. VO organisations trust all VO users as well as a Central Database. The details and security key of each account are stored in a central database that users and partners trust.
3. VO organisations trust a third party. All VO organisations and VO users will trust a CA. The VO will act as a registration authority for that CA.
The authentication solution that can be applied depends on which VO scheme is adopted. A solution designed for one scheme would not be applicable to the other VO schemes.

3.4 VO sites trust all VO members
Each VO user will have an account at each VO site. Therefore, any access to a resource requires a username and a password. This means local sites will treat VO members as an extended part of their core organisation. The details and password for each account in the VO are stored with the user. In this case, U.A requires accounts on all VO sites and B has to authenticate U.A locally. This type of authentication is considered “weak authentication” [2].
[Figure 3.1 User Authentication. User.A at Company A holds the credentials (u1,pw1,B), (u2,pw2,C) and (u3,pw3,D) and presents 1-(u1,pw1) to University B, 2-(u2,pw2) to Lab C and 3-(u3,pw3) to Company D.]
This solution has many disadvantages:
1) It is unacceptable to “security-sensitive” organisations such as commercial, pharmaceutical and financial companies, because adding non-staff members to their security domain weakens their security defences. These accounts will be behind the company’s firewall.
2) The solution does not scale because it involves a large management and administration overhead. The local site administrator has to manage individual accounts. Each time a user is added to or removed from the project, the local site administrator has to be informed in order to create or delete accounts.
3) Usability problems. The user has to remember numerous usernames and passwords, one for each VO site. Even worse, a password may be required for each resource in the VO.
4) High vulnerability to attacks. The security of local sites depends on the strength of the passwords used by U.A. Because there are so many of them, the user might take shortcuts by using the same password or a shorter password on all accounts, or by sending the whole authorisation/authentication list by email to his/her mailbox. Passwords are vulnerable to guessing [2], dictionary attacks [2] and password cracking.
5) Long-term credentials. Forcing users to change passwords at regular intervals is not practical because the user has too many passwords to change. Thus, it is very likely that the user will use the same password for the whole period of the project.
6) It does not allow coordinated use of resources. If a user submits a job to site B and this job determines that it needs some data from site C, then it will not be able to continue, because only the user knows the username and password for site C.

3.5 VO sites trust all VO members and a Central Database
This approach is a variation of the previous solution.
The security details of each account are stored in a central database that users trust as well as VO partners. A central authority is responsible for supplying the right account information for a given resource request. U.A is authenticated as a VO member first, and then the target resource site, site B, authenticates U.A locally. In this scenario, the VO maintains a central database (DB). The DB contains a credential shared between the user and the VO. For each VO user, a list of usernames and passwords for each resource in the VO is stored in the database. Figure 3.2 shows how a VO user can use the VO database to authenticate to a specific resource. The initial assumption is that only the VO user, the VO DB and the resource know the password.
[Figure 3.2 User authentication using VO DB. User.A at Company A contacts the central DB server over SSL/TLS (steps 1 and 2). The DB holds the entries (U.A, (U1,PW1), R, B), (U.A, (U2,PW2), R, C) and (U.A, (U3,PW3), R, D). 3-U.A uses c to authenticate to R.B at Company B, where c is the credential to R.B.]
When a user from A (U.A) wants to access a resource at site B (R.B):
1. U.A sends his credentials and the target resource name to the VO over a secure connection such as SSL/TLS [9]. [U.A (Credential, Resource-name)].
2. The VO compares U.A’s credential against the entries stored in the VO’s database. U.A’s authentication will succeed if his credential exists in the stored entries. The VO fetches the username and password (U1, PW1) for the requested resource R.B.
3. U.A can now submit a task and use the username and password supplied by the VO to access R.B.
This approach is similar to online banking in e-commerce, where the user authenticates to the bank with a user name and password to access his account details. This solution solves several problems of the previous approach, such as:
• Usability problem. The usability is significantly improved.
U.A is authenticated to the VO with one username and password, instead of maintaining a list of passwords.
• Accountability. Each user has a “Username” and a “Password” associated with his identity. Therefore, it is possible to determine the user responsible for performing a job on a specific resource.
• Reduced vulnerability. The vulnerability is reduced because the list of usernames and passwords is under the VO’s protection, not the user’s. The user is only responsible for protecting the one password used to authenticate to the VO.

This solution still has most of the drawbacks discussed in the previous section. These include scalability, long-term passwords to resources, no coordinated use of resources and acceptability. This solution also introduces new security risks. The VO becomes a:
• Single point of attack. Since all sites in the VO depend on the VO’s database server, a denial of service attack on that server will prevent users from accessing the project resources.
• Single point of failure. A failure in the VO’s database server will leave all users disconnected from the VO resources.

3.6 Grid Authentication with PKI: VO sites trust a third party
The previous two sections described two authentication approaches based on what is called weak authentication [2]. Furthermore, the solutions described above do not satisfy many of the requirements mentioned in section 3.2. Presently, PKI is the most widely adopted authentication solution in Grid projects such as Globus [32] and Unicore [34]. The trust relationship is with the CA. The VO acts as a registration authority (RA) for that CA. Each Grid user is required to register with the VO in order to get a digital certificate. This certificate is issued by a CA trusted by the VO. Any access to resources on the Grid has to be accompanied by a digital certificate. The authentication solution based on the third VO scheme is depicted in figure 3.3.
The authentication process is carried out by SSL/TLS implementations on both sides. The protocol should be configured to provide mutual authentication. When a user U.A wants to access resource R at site B (R.B):
1. U.A sends his digital certificate to R.B.
2. R.B checks whether the certificate was issued by a trusted CA. Then, it checks the integrity and validity (format, expiry date and CRL) of the certificate. Finally, R.B verifies that the user can demonstrate proof of possession of the private key corresponding to the certificate.
3. R.B sends its digital certificate to U.A.
4. U.A performs the same checks as in step 2.

If these steps complete successfully, then the mutual authentication succeeds. However, having a valid certificate trusted by the VO does not mean that U.A can access resources at any VO site. U.A still needs to be registered as a VO user at each VO site.

Figure 3.3 User authentication using trusted third party (Company A and Company B both trust the Certificate Authority; step 1: U.A uses an X.509v3 certificate to authenticate to R.B over SSL/TLS.)

Members of a Grid project may trust different Certificate Authorities that are considered trustworthy by the VO. Thus, they should be able to present credentials obtained from any such source: their local CA, a third-party CA or the project CA. Therefore, if U.A already has a certificate, this raises the following questions. Which CA issued the certificate? Is it trustworthy? How does the CA operate? If U.A’s certificate was not issued by a trusted CA, then U.A must get another certificate issued by a CA trusted by the VO. Otherwise, he cannot be part of the project. Consequently, U.A will need to use the new certificate to authenticate to the target site. However, U.A can be a member of another VO that trusts the CA that issued the first certificate. As a result, U.A will have many credentials, which causes a new problem: how to decide which one to use where.
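The credential-selection problem can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: a credential is reduced to a subject name and an issuer name, each VO is reduced to the set of CA names it trusts, and the names `Credential`, `select_credential` and the example CAs are hypothetical, not part of any Grid toolkit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Credential:
    subject: str  # subject name on the certificate, e.g. "U.A"
    issuer: str   # name of the CA that issued the certificate


def select_credential(wallet: Iterable[Credential],
                      trusted_cas: Set[str]) -> Optional[Credential]:
    """Return the first credential issued by a CA the target VO trusts."""
    for credential in wallet:
        if credential.issuer in trusted_cas:
            return credential
    return None  # no usable credential: U.A must obtain a new certificate


# Hypothetical example data: U.A holds two certificates.
wallet = [Credential("U.A", "Local CA of A"), Credential("U.A", "Project CA")]
vo1_trusted = {"Project CA"}
vo2_trusted = {"Local CA of A", "Third-party CA"}

print(select_credential(wallet, vo1_trusted))
print(select_credential(wallet, vo2_trusted))
```

With one VO per trusted-CA set, the user (or his software) has to repeat this lookup for every site he contacts, which is the bookkeeping burden described above.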
In this scheme, authentication need not involve explicitly named users. By using attribute certificates (see section 2.4.2 for details), VO organisations might enable access to any user who can demonstrate that he is a “scientist”, a “student” or a member of a specific institution.

The protocol needs to be configured to allow mutual authentication; otherwise the authentication fails. Also, the root certificates of the different CAs trusted by the VO should be installed. Only the administrator at each VO site can add or remove a root certificate. It is the role of the VO (partner organisations) to decide which CAs to trust.

3.6.1 Advantages of using PKI on the Grid
There are many reasons for PKI to be the best candidate for authentication on the Grid. PKI solves most of the issues mentioned in section 3.2. In addition to the advantages mentioned in section 2.5, PKI, if implemented properly [10], offers the following:
• It provides a one-to-many authentication mechanism when used with a certificate.
• It provides strong authentication [2].
• It assumes no previous trust relationship between Grid entities.
• The usability problem is significantly reduced. The user is required to remember only a pass phrase for his certificate.
• It provides for mobility, as the certificate can be stored on a floppy disk or smart card device.
• Scalability. User authentication can be done offline.
• Uniform credentials through the use of X.509v3 certificates.

3.6.2 Vulnerabilities in PKI
Section 2.4.6 highlighted some of the issues that put PKI and Grid authentication at risk. The security of PKI crucially relies on the PKI components operating with a high degree of security. For instance:
• The user’s private key must be kept confidential.
• The CA’s private key must be protected with a high degree of security.
• The procedure used by the VO to authenticate candidate users must be strong in order to avoid identity theft at the point of certificate creation.
[7] describes a real scenario for identity theft.
• CA and RA computer systems and applications must be physically protected and protected from tampering according to best security practices such as BS7799 [52].
• The public key of the CA must be securely communicated to VO users and VO sites.

3.7 Proxies and Delegation
A typical Grid project consists of a dynamic number of resources. A project user can initiate a job request that involves the coordinated use of these different resources at different sites. However, this raises two problems:
1. Convenience: it is impractical for the user to authenticate multiple times to access different resources [3].
2. Vulnerability: the user will have to sign a challenge with his private key for each authentication. This gives an attacker an opportunity to collect challenges and the corresponding signed responses in order to recover the private key [5].

These problems are solved using proxy credentials [3, 5, 7]. A proxy certificate is a special X.509 v3 certificate that is signed with the private key of a Grid entity [4, 5]. It allows processes running on the user’s behalf to authenticate directly to Grid resources as the user. In addition, this certificate can be used for delegation [2, 5], that is, the process of passing authority from one entity to another. The proxy credential grants the bearer all or a subset of the Grid entity’s access rights [5]. These credentials have a short lifetime, usually measured in hours. There are two types of proxy credentials [4, 5]:
1. “Restricted proxy” [4, 5]: this type of proxy can hold policy information restricting its use. The policy restriction field is part of the extensions of the X.509 certificate. For instance, the policy may state that the holder of this proxy can run a SELECT query on a specific database resource.
If the user submits a request to run an UPDATE query, it will fail, as it is not consistent with the policy on the proxy. This proxy type has been enhanced and submitted by the Globus project [3, 5] as an Internet draft to the Internet Engineering Task Force (IETF) PKIX working group [3, 4, 5].
2. “Impersonation certificate” [4, 5]: an unrestricted proxy certificate that allows an entity to delegate all its authority to another entity.

3.8 Security issues in proxies
The main advantage of proxy credentials is significant speed [4, 5, 6]. Users can run jobs that require coordinated access to resources without the user’s intervention. This raises new security concerns such as:
• How are the proxy’s key pairs generated?
• How is the proxy’s private key protected on the user’s machine?
• When delegating proxy credentials to a remote Grid entity, who has control over the proxy’s private key?

The first concern can be viewed from a cryptographic point of view, as the length and quality of the key pairs are crucial. The random number generator used to generate the primes for the key pair should be reliable. Factorisation attacks have significantly improved in the last decade [11]. A 512-bit RSA [25] public key generated by the user is vulnerable to factorisation [10, 11]. As a result, the attacker will be able to recover the private key from the public key [28] and sign requests on the user’s behalf. The minimum recommended key length is 760-bit [11]. Current implementations of RSA use 1024-bit and 2048-bit keys [11]. Presently, there is no mechanism to detect compromise of a proxy’s private key [5].

Another problem with proxy credentials is clock synchronisation. A job may determine that it needs to access resources at a different site located in a different country. The proxy’s lifetime is short, 10 hours in Globus [6], and the remote site may be 12 hours ahead.
As a result, it will consider the proxy invalid as it has expired.

3.9 Other Alternatives
Currently, Kerberos and Secure Shell are commonly used authentication protocols. This section shows why they are not suitable for the Grid environment, starting with a brief explanation of each protocol.

Kerberos: Kerberos is a TTP-aided authentication protocol based on symmetric key cryptography [2]. The Grid user and a designated trustworthy server, the Key Distribution Centre (KDC) [10], share a long-term secret key. The user authenticates to the resource by getting a ticket from the KDC. Kerberos splits the role of the KDC between two entities, the Authentication Server (AS) [2] and the Ticket Granting Server (TGS) [2], in order to limit bottlenecks and the exposure of the long-term keys. Kerberos achieves inter-organisational authentication by sharing key servers (AS or KDC) with other organisations [6]. Kerberos meets many of the requirements mentioned in section 3.2, such as usability and single sign-on, but when used for multi-domain authentication, several issues arise:
• Acceptability: according to [6], “using Kerberos for intersite authentication also means using it for intrasite authentication”. This is not acceptable to commercial companies, because sharing key servers to allow this type of authentication means giving up control over local policy. (PKI does not require major changes in the local security policy of a VO site.)
• Scalability: it is very hard to extend the protocol to multiple administrative domains because of equipment and staffing costs [6]. Also, the performance bottleneck of having a TTP tends to limit the scalability of a system.
• Clock synchronization [2]: all clocks across VO sites will need to be synchronized.
Thus the AS, the TGS and all clients and servers must have nearly the same time, or allow for the difference within the key validity periods, which decreases security. On the Grid this is very difficult to achieve because VO resources may exist in different countries with different time zones.
• Key revocation [2]: Kerberos has no mechanism for key revocation, but relies on timestamps for tickets to expire. Short ticket lifetimes give more security, but require more tickets to be issued.
• Availability: Kerberos relies on the Authentication Server and the TGS being online. Availability becomes more important when using tickets with short lifetimes [2]. (PKI allows offline authentication.)
• Kerberos does not address the delegation of access rights (tickets) [2]. (In PKI, proxy certificates can be used for delegation.)

Secure Shell: Secure Shell, SSH [18], is another alternative used for providing remote login. SSH is based on public-key authentication and offers a secure channel over an assumed reliable transport, typically TCP [18]. It supports mutual authentication, provides confidentiality of transmitted user credentials, and can be easily deployed. SSH, however, is not suitable for Grid authentication because of:
• Usability: users have to copy the public keys for every VO site they want to access. This is not practical because not all users are security experts [2].
• Limited functionality: according to [6], SSH supports limited capabilities such as remote shell and file transfer, but not others that require authentication, such as collaborative environments and web browsers.

That is why PKI is currently the most widely used solution in Grid projects such as Globus and Unicore.

3.10 Globus Toolkit (GT2): GSI approach to Authentication
GSI deals with bridging between the local security solutions of the different sites in the VO.
It is based on the IETF standard TLS protocol [9], public key encryption [10, 11] and the X.509 version 3 certificate format. The infrastructure provides fundamental security services [7]:
• Single and mutual authentication: Globus uses an implementation of the Secure Sockets Layer (SSL) protocol, the predecessor of TLS. The implementation requires a PKI and is configured to provide mutual authentication using X.509 certificates.
• Single sign-on: in Globus, certificates and proxy credentials (details in section 3.7) are used to allow the user to authenticate once to access all Grid resources. The Globus team has submitted a draft to the IETF PKIX working group to standardise the proxy certificate format.
• Confidential communication: data transmitted over the Internet is protected using the SSL communication protocol (details on confidentiality in section 5.2).
• Authorisation: Globus supports identity mapping (details in section 4.6) and the Community Authorisation Service, CAS (details in section 4.7), as access control mechanisms.
• Delegation: Globus supports identity delegation and access rights delegation from the user to processes running on his behalf via short-lived proxy certificates.

3.10.1 GT2 authentication with proxies
The primary objective of GSI is to provide authentication and message protection [4, 6]. This section describes user proxy authentication in GSI and the delegation process. These steps are usually carried out over a channel secured with SSLv3, which provides mutual authentication, confidentiality and integrity.

Consider the Grid project described in section 1.3. U.A wishes to authenticate to B in order to access resource R.B. The assumptions on U.A are:
• U.A has a key pair (PU.A, SU.A), where PU.A is the public key of U.A and SU.A is the corresponding private key.
• SU.A is known only to U.A.
• U.A has a certificate Cert-(U.A) issued by a CA trusted by the VO.
U.A creates a proxy credential on the local machine in two steps:
1. U.A generates a new key pair for the proxy (PPU.A, SPU.A), where PPU.A is the public key of U.A’s proxy and SPU.A is the corresponding private key.
2. U.A creates the proxy certificate and signs it with his private key SU.A to produce the proxy credential: {Proxy} SU.A

The proxy authentication process in GSI is depicted in figure 3.4. The steps are as follows (step numbers in parentheses refer to the figure):
1. U.A sends his certificate and the proxy certificate to R.B (step 1).
2. R.B validates U.A’s certificate using the CA public key, the CRL and the expiry date (step 2).
3. R.B checks the validity of the proxy certificate using U.A’s public key recovered from U.A’s certificate (step 3).
4. R.B generates a challenge and sends it to U.A (steps 4-5).
5. U.A signs the challenge with the proxy’s private key (step 6).
6. R.B verifies that the response contains the genuine RAND (step 7).

Once these steps complete successfully, the authentication succeeds and R.B can consider that the user is associated with the identity on the certificate of U.A.

Figure 3.4 User Proxy authentication (Initially U.A holds Cert-(U.A), PCA, (PU.A, SU.A) and (PPU.A, SPU.A), and R.B holds PCA. Messages and actions: 1- Cert-(U.A) + {proxy} SU.A; 2- Validate Cert-(U.A); 3- Validate {proxy} SU.A; 4- Generate RAND; 5- RAND; 6- {RAND} SPU.A; 7- R.B recovers RAND1 from the response; if RAND = RAND1 then U.A is authenticated.)

Proxy credentials can be delegated to a process on a remote host [5]. For instance, U.A may need to run a program at resource R.B. The program running on R.B determines that it needs to access a resource at C, R.C. U.A can delegate a proxy credential to the program running on B to act on his behalf.
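The chain of checks in the proxy authentication steps of figure 3.4 can be sketched as follows. This is a toy model, not GSI code: it uses unpadded textbook RSA built from well-known Mersenne primes (insecure, chosen only so the example is self-contained), a certificate is reduced to a name bound to a public key, and the CRL and expiry checks are omitted. All function and variable names are assumptions made for the illustration.

```python
import hashlib
import secrets

E = 65537  # public exponent shared by all toy keys


def make_keypair(p, q):
    """Textbook RSA key pair from two primes (toy parameters, NOT secure)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, E), (n, pow(E, -1, phi))  # (public, private); Python 3.8+


def digest(data, n):
    """Hash the data and reduce it modulo n so it can be signed directly."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private_key):
    n, d = private_key
    return pow(digest(data, n), d, n)


def verify(data, signature, public_key):
    n, e = public_key
    return pow(signature, e, n) == digest(data, n)


def encode(name, public_key):
    """Stand-in for a certificate body: a name bound to a public key."""
    return f"{name}|{public_key[0]}|{public_key[1]}".encode()


def decode_key(body):
    """Recover the public key carried in a certificate body."""
    _, n, e = body.decode().split("|")
    return int(n), int(e)


# Toy keys for the CA, for U.A and for U.A's proxy.
P_CA, S_CA = make_keypair(2**61 - 1, 2**89 - 1)
P_UA, S_UA = make_keypair(2**107 - 1, 2**127 - 1)
PP_UA, SP_UA = make_keypair(2**521 - 1, 2**607 - 1)

cert_ua = encode("U.A", P_UA)
cert_ua_sig = sign(cert_ua, S_CA)   # Cert-(U.A), signed by the CA
proxy = encode("U.A proxy", PP_UA)
proxy_sig = sign(proxy, S_UA)       # {Proxy}SU.A, signed by U.A


def authenticate(cert, cert_sig, proxy_body, proxy_body_sig, respond):
    """R.B's checks; `respond` stands in for U.A answering the challenge."""
    if not verify(cert, cert_sig, P_CA):                        # validate Cert-(U.A)
        return False
    if not verify(proxy_body, proxy_body_sig, decode_key(cert)):  # validate proxy
        return False
    rand = secrets.token_bytes(16)                              # generate RAND
    response = respond(rand)                                    # signed with SPU.A
    return verify(rand, response, decode_key(proxy_body))       # check response


print(authenticate(cert_ua, cert_ua_sig, proxy, proxy_sig,
                   lambda rand: sign(rand, SP_UA)))
```

Note that U.A’s long-term private key SU.A is used only once, to sign the proxy; every challenge is answered with the short-lived proxy key, which is the point of the scheme.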
Assuming that U.A and R.B have already established a secure channel using OpenSSL version, the delegation process is described as follows:

Figure 3.5 Delegation of proxy credential to a process on a remote resource (Participants: U.A, holding Cert-(U.A) and (PU.A, SU.A); R.B; and R.C; each holds PCA. Steps shown: 1- Run program at R.B; 2- Needs access to C; 3- Generate proxy credential; 4- proxy (PP.B, SP.B), sign {proxy} SU.A; 5- Cert-(U.A) + {proxy} SU.A; 6- Cert-(U.A) + {proxy} SU.A; 7- Request data from C; 8- Authentication as in the previous diagram.)

3.10.2 MyProxy
The Globus team has proposed an online trusted credential server, MyProxy, to store and manage long-term user credentials, private keys and certificates. In addition, the server can be used to perform delegation on the user’s behalf [6]. When the user logs in, he creates a proxy certificate and sends it to the MyProxy server along with a tag and a pass phrase. When the user initiates a job request (e.g. a program on a remote site), the process running the job connects to the MyProxy server, presents the tag and the pass phrase, and receives a proxy for that user [6]. The advantages of introducing this server are:
• There is no need to generate the key pairs on the user’s machine.
• It protects the user’s private key.
• It provides mobility for users, so they do not have to carry their private key on a floppy disk or send it to their mailbox.
The disadvantage of this approach is that the server becomes a target for attacks because it holds the tags and pass phrases of all VO site users who are using the Grid.

Chapter 4 Grid Authorisation

4.1 Introduction
Resources involved in a Grid project can be extremely valuable, such as supercomputers, or extremely sensitive, such as classified medical records. Thus controlling access to these resources is crucial to maintain their confidentiality [2], integrity [2] and availability [2].
On the Grid, authorisation aims at controlling and restricting access to resources [2]. Users of a Grid project often require access to valuable and sensitive resources that do not belong to their institutions. Authorisation is needed so that only legitimate Grid users can access confidential Grid information and resources. Furthermore, authorisation is vital to resource integrity because only authorised Grid users can modify data resources and the configuration of equipment resources. Finally, authorisation is essential to availability, because attackers who manage to gain unauthorised access can destroy data resources.

4.2 Fundamental model of access control
“The very nature of access control suggests that there is an active subject and a passive object with some specific access operations and a reference monitor that grants or denies access” [2]. On the Grid, the subjects are users, processes running on a user’s behalf and resources. The objects are resources to be shared, such as supercomputers, databases and instruments. A resource on the Grid can be a subject in one request and an object in another. Access control can be either user centred or resource centred. The first focuses on the user’s capabilities and the second on what can be done with an object (first design principle [2]).

Figure 4.1 The fundamental model of access control [2] (A user’s access request passes through the reference monitor before it reaches the resource.)

Consider the Grid project described in section 1.3. It consists of multiple trust domains A, B, C and D. Figure 4.2 below illustrates this situation by showing the local access policy that applies in institutions A, B, C and D. In addition, it shows the heterogeneous platforms, which include Kerberos, OS2 and UNIX. The authorisation problem on the Grid can be described as follows: if U.A is successfully authenticated to the VO to access resource R.B in domain B, how does B decide what access rights U.A has on R.B?
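A minimal sketch may help to make the question concrete. It assumes a deliberately simplified reference monitor whose policy is a table from (subject, object) pairs to permitted operations; the class name, the policies and the operation names are hypothetical. Each domain runs its own monitor with its own policy, so a subject that is well known in domain A simply does not appear in domain B’s table.

```python
from typing import Dict, Set, Tuple

# A local policy maps (subject, object) to the operations that are allowed.
Policy = Dict[Tuple[str, str], Set[str]]


class ReferenceMonitor:
    """Grants or denies an access request against one domain's local policy."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def check(self, subject: str, obj: str, operation: str) -> bool:
        # Unknown (subject, object) pairs have no permitted operations.
        return operation in self.policy.get((subject, obj), set())


# Hypothetical local policies for domains A and B.
monitor_a = ReferenceMonitor({("U.A", "R.A"): {"read", "write"}})
monitor_b = ReferenceMonitor({("U.B", "R.B"): {"read", "execute"}})

print(monitor_a.check("U.A", "R.A", "read"))  # U.A is known in domain A
print(monitor_b.check("U.A", "R.B", "read"))  # U.A is unknown in domain B
```

The authorisation approaches discussed in this chapter differ in how they populate domain B’s table for users who originate elsewhere.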
Figure 4.2 Multiple trust domains

The main obstacle is that the identity of U.A and his credentials as expressed in policy A are meaningless in domain B. Therefore, the first issue to be solved is how to enable interoperability between these heterogeneous domains. The second issue is to decide where the decision on U.A’s access rights is made (at the target site or centrally). Typically, each organisation in the VO wants the access control decision to remain under the control of its local site administrator. For commercial companies, it is unthinkable to give up control over their resources. As a result, the decision on U.A’s access rights has to remain under the full control of the local site administrator. Several possibilities have been considered:
• Resource centred using Access Control Lists (ACL) [2]
• User centred with Role Based Access Control (RBAC) [2]
• Distributed authorisation using identity mapping [4, 7]
• Community Authorisation Service (CAS) [3, 4, 5]

4.3 Resource centred with Access Control List (ACL)
ACLs [2] specify, for each resource, a list of authorised users with their privileges on that resource [2]. When U.A is authenticated by the VO to access R.B and his name is on the list of authorised users of R.B, he can use the resource with his associated privileges. The access rights of U.A to R.B in the form of an ACL can be viewed as follows:
ACL (R.B) = [(U.A, [opB1, opB2…]), (U.C, [opB4, opB5…])…]
where opB1 is an operation on resource R.B.

This approach has many disadvantages:
1. Organisations in the VO must agree in advance on an access policy that can be enforced by each local site administrator. However, it would be difficult to agree on a policy that operates at the level of individuals, particularly when they are unknown to the organisation providing the resources.
2. It does not scale. ACLs are tied to resources.
The local site administrator has to manage individual users’ accounts on each resource. Each time a user joins, leaves or has new responsibilities in the VO, the local site administrator at each VO site has to update the ACLs of each resource. For m resources and n users, the administrator has to manage m*n associations. For instance, the CERN project [38] has 1800 users. Suppose that they share 10 valuable resources with one site. This means that the administrator of this site has to deal with 18,000 associations, which is a huge number to manage.
3. Vulnerability to errors. Due to the dynamic nature of the Grid, managing ACLs on an individual basis becomes cumbersome and error prone. When a user is removed from the project, the administrator has to go through each resource’s ACL to revoke that user’s permissions. As a result, it is very likely that he will make a mistake by leaving someone’s access right in place when that person is no longer authorised to use the resource.

4.4 Role Based Access Control (RBAC)
The main problem with the previous approach is administering individuals. To reduce this problem, the RBAC [2] authorisation model could be used. RBAC focuses on users and the jobs users perform within the organisation [2]. Thus, it is appropriate for a large and dynamic environment such as the VO. RBAC has numerous advantages over the previous approaches. Here are some of them:
1. Reduced administrative cost and complexity [2]. The local site administrator can assign permissions to roles in the project instead of to individuals, according to the global and local policy.
RBAC (U.A) = [(Site1, Role1), (Site2, Role2)...]
Permission (Site1, Role1) = [op11, op12, op13….]
Operations can include execute program, read, write and update. Every time there are personnel changes, such as members being added, removed or given new responsibilities, only role permissions are updated or added.
2. Consistency.
Each role can be mapped to a user’s position within the project and can be granted the set of operations needed to perform the job. For example, two students in the same role will have exactly the same permissions.
The RBAC systems currently in use are PERMIS [6] and Akenti [6], which has been used with Globus.

4.5 Distributed Authorisation
Distributed authorisation is a user centred authorisation model [3]. The main idea behind this approach is that each site in the project has a proxy responsible for converting a global credential to a local credential. So, each Grid user is required to have an account in each administrative domain. Users’ access rights are decided and managed at each VO site. Access to resources is achieved through mapping of identities/credentials from the user’s domain to the resource’s domain. For example, the decision on the access rights of U.A to R.B is achieved through identity mapping from domain A (e.g. UNIX) to a local account in domain B (e.g. Windows). Therefore, U.A can access R.B as a local user in domain B. As a result, B can apply its local security policy. This solution is currently adopted in many Grid projects such as Globus [32] and Unicore [34].

This solution has several advantages:
1. It allows the local site administrator to be in full control of local resources. It is the administrator’s job to create user accounts and set permissions to resources at his site.
2. It provides for accountability. Usernames and passwords are associated with the subject name on the certificate. Thus, it is possible to track individual users who performed actions on a specific resource.
3. It does not require changes to the security mechanism at the local site. It can be implemented with ACL and RBAC mechanisms. Figures 4.3 and 4.4 illustrate the identity mapping process with ACL and RBAC respectively.

When U.A wants to access resource R.B at site B:
1.
U.A sends his X.509 certificate to the resource at site B. Upon receipt, the resource checks the certificate’s validity and that the sender has the private key that corresponds to the public key on the certificate (done via SSL/TLS).
2. Once the authentication succeeds, the resource extracts the subject name (SN.U.A) from the certificate.
3. The resource compares the subject name to the entries stored in its mapping database. If there is no match, then the user is not authorised to access the resource.
4. Otherwise, the resource fetches the username and password corresponding to the subject name from the database. Now, the user can access the resource as a local user in domain B.

Figure 4.3 Identity mapping process with ACL (U.A at Company A contacts Company B; the subject name is compared with the mapping database of entries [(SN, Un, Pw)], which holds a username and password for each of U.A, U.B and U.C; if SN.U.A = U.A is true, the user is authorised with (username, password) on UNIX; if false, he is not authorised.)

Figure 4.4 Identity mapping with role based (As in figure 4.3, with the addition of a role table at Company B: Student: U.B; Scientist: U.D, U.C; Admin: U.B.)

However, this approach still has several drawbacks, such as:
1) Scalability problem [7]. The user will have local accounts at all VO sites where he has access to resources. This implies that the administrator has to create or delete an account and set its access rights every time a user is added, removed or given new responsibilities.
2) Distribution problem (inconsistency). Removing a user’s access rights to project resources is cumbersome and error prone. The project leader has to contact all local system administrators to inform them of personnel changes or policy changes.
If one site is not informed, then there will be inconsistency in the policy.

4.6 Globus approach to Authorisation
The Globus toolkit uses the distributed authorisation approach, where the local site administrator is in full control. This section describes identity mapping in Globus (figure 4.5). Consider the Grid project described in section 1.3. Assuming that R.B successfully authenticates U.A:
• The identity of U.A (the subject name) is extracted from the certificate.
• The Grid-mapfile [3, 6] on the resource maps the subject name to a local identity such as a UNIX account or a Kerberos ticket, depending on the resource platform. If the subject name is not in the mapping file, the user is not authorised to access the resource. If the resource is running on the Windows operating system, a function in Globus called SSLK5D [6], a modified Kerberos Key Distribution Centre [2, 6], takes a user credential and returns a Kerberos ticket. (This assumes that each member of the project already has an account on the site where he is authorised.)

Figure 4.5 Identity mapping in Globus (U.A at Company A presents Cert U.A to GSI at Company B; the subject name U.A is looked up in the Grid-mapfile, which holds a username and password for each of U.A, U.B and U.C; if SN (Cert U.A) = U.A is true, the user is authorised with (username, password) on UNIX; if false, he is not authorised.)

4.7 Community Authorisation Service (CAS)
The previous sections highlighted some of the main problems faced by approaches to Grid authorisation. Scalability, heterogeneity and the distributed nature of the Grid are factors that could exclude ACL and RBAC from being the best choice for Grid authorisation. To solve these problems, Globus proposed a centralised authorisation model [3, 5] in which:
• The VO partner organisation allows the resource administrator to grant access to blocks of resources to the project as a whole. Thus, resources are grouped together and access to these blocks is granted to the VO instead of to individual VO users.
• The project itself manages fine-grained access control mechanisms. Thus, the access policy is flexible enough to allow the project and the resource administrator to specify the way resources should be allocated.

These features have been implemented in the Community Authorisation Service (CAS) framework [5]. The VO requires a CAS server that is responsible for tracking members and managing fine-grained access to resources [5]. The local site administrator uses the local access control mechanism to grant access to local resources based on the subject name associated with the CAS server, not the individual’s name. The CAS security architecture is based on certificate-based PKI and delegation. The delegation is done using “restricted proxies” (details in section 3.7) because this type of proxy allows the CAS server to delegate a subset, not all, of its authority to users. “Impersonation” proxies (details in section 3.7) are not suitable because the project may involve different roles. Thus, it is inappropriate for the CAS server to delegate all its authority to users.

The CAS authorisation process is shown in figure 4.6. When a user U.A wishes to access a VO resource:
1. U.A sends his X.509 certificate and the request to the CAS server. The latter verifies U.A’s certificate and fetches U.A’s rights, granted by the project, from the policy database.

Figure 4.6 CAS authorisation model (Adapted from [5]) (1- U.A at Company A sends the request and his certificate to the CAS server over SSL/TLS; the CAS server asks the VO project policy database what rights the project grants to this user. 2- The CAS server returns a restricted proxy. 3- U.A sends the resource request, authenticated with the CAS proxy, to resource R in Company B, which consults its local policy information (is this request authorised for the project?) and the proxy’s project subject name and policy restriction (does the proxy restriction authorise this request?). 4- The resource replies.)
2.
The CAS server creates a restricted proxy certificate and sends it to the user so that the user can access the requested resource. This certificate allows the CAS server to delegate a subset of its access rights to the user in the form of capabilities. These capabilities are based on the type of the request and the role of U.A in the project. The certificate contains the name of the CAS server in the subject name field and the restrictions in the policy restriction field.

3. The user sends both the proxy certificate and the request to the resource. The user authenticates to the resource with the proxy certificate as a project user (not as an individual). The resource checks whether the request is authorised by the organisation's local policy for the Grid project. In addition, it checks whether the proxy restriction authorises the request. If both checks succeed, the request is processed on the remote resource.

4. The resource sends the result of the request.

The CAS model has several advantages:

1. Scalability: users need to be known and trusted only by the CAS server in order to get permissions to access resources. There is no direct relationship between users and resources. Likewise, resource providers need to be known and trusted by the CAS server, not by all users. This significantly reduces the number of associations to be managed between resource providers and users. For instance, consider 1800 users at CERN accessing 10 resources. With the CAS model, there are 1800 associations between the CAS server and VO users and 10 associations between the CAS server and resources, that is, 1810 associations in total. This is much easier to manage than the 18000 associations (1800 × 10) required by identity mapping.

2. Policy support: CAS allows the local site administrator to enforce local policy, and the VO to enforce global policy.
With this approach, conflicts between local and global policy can be easily resolved: any request that conflicts with the resource provider’s local policy is rejected.

3. Support for ACL and RBAC: CAS allows both mechanisms to be implemented on local sites, and supports both the global policy of the project and the local security policy of each site.

CAS also has its drawbacks:

• Single point of failure: if the CAS server fails, the whole Grid project stops working. Users cannot obtain proxy certificates from the CAS server, so the project resources become inaccessible.
• Single point of attack: a denial of service attack on the CAS server leaves the project users disconnected from the project resources.
• No accountability: in CAS, individual users are not accountable for their actions, because users act on behalf of the project and not as individuals. In case of resource misuse, the resource administrator will know that a user from a specific project performed the misuse, but will not be able to determine which user in that project performed the action.
• Giving up control: the local site gives up some control to the VO. This is not acceptable to security-paranoid companies such as pharmaceutical and financial companies.

4.8 Access Control with PKI

It is possible to do both authentication and authorisation in one go. With X.509 v3 certificates, access rights can be linked to a public key. To get access to a resource, the user has to prove knowledge of the private key corresponding to the public key on the certificate [2], and the access right on the certificate must allow the holder to perform the requested task. Simple PKI [2, 33] “specifies a standard form for X.509 digital certificates whose main purpose is authorisation”. But currently, there is a debate about the extension that holds the access rights.
The IETF has not standardised the extensions in X.509 v3 certificates [2]. This solution is not scalable and does not provide for accountability, as it is not associated with an identity.

4.9 Firewalls and the Grid

A major problem for the Grid is firewalls [7, 15]. VO partner organisations want to share specific resources while keeping their other resources private. Most VO organisations use firewalls to protect the resources on their Local Area Network (LAN) from the Internet. Within a VO site, the shared resources and the private resources are probably connected for internal use. By enabling access to resources behind the firewall, companies expose their LAN, and thus their private resources, to the outside world. This defeats the purpose of having a firewall in the first place.

4.9.1 Brief overview of firewalls

A firewall is an access control mechanism that usually divides a network into at least three domains:

• “Inner” (green, safe) trusted network, that is, the LAN,
• “Outer” (red, unsafe) untrusted network, that is, the Internet,
• Demilitarised zone (DMZ) (orange), more dangerous than inner but more sheltered than outer.

For instance, the Information Security Group (ISG) [49] in Royal Holloway [50] has four network domains: student network (inner), staff network (inner), DMZ, and outer network.

Typically, the firewall checks each network packet as it arrives, logs it, and allows it to pass through if it satisfies the rules set by the firewall administrator. Many firewalls use Network Address Translation (NAT) [15] and a router to control access to the inner network. The inner network is most likely to use private IP addresses, which make computers on the LAN invisible to the Internet at large. The inner network therefore appears to the outside world as one single machine: the firewall host. As a result, all packets going from the internal network to the outside appear to originate from the firewall.
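The rule check described above can be illustrated with a minimal Python sketch. It assumes a first-match rule list with a default-deny fallback, which is one common convention rather than the behaviour of any particular firewall product. The `Rule` class, the `decide` function, the documentation-range addresses and the port numbers are all hypothetical and chosen only for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical firewall rule: source network, destination port, action."""
    source: str   # source network in CIDR notation
    port: int     # destination port the rule applies to
    action: str   # "allow" or "deny"


def decide(rules, src_ip, dst_port):
    """Return the action of the first matching rule, or "deny" if none match."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if dst_port == rule.port and addr in ipaddress.ip_network(rule.source):
            return rule.action
    return "deny"  # assumed default: anything not explicitly allowed is dropped


# Illustrative rule set: two partner networks, one port each (all values assumed).
RULES = [
    Rule("192.0.2.0/24", 2119, "allow"),
    Rule("198.51.100.7/32", 2811, "allow"),
]

partner_result = decide(RULES, "192.0.2.10", 2119)
stranger_result = decide(RULES, "203.0.113.5", 2119)
```

With these assumed rules, a packet from the partner network on the listed port is allowed, while the same port is closed to any other source address.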
Several issues arise when using firewalls on the Grid:

• How can a VO organisation allow VO members to access the resources shared with the VO without compromising the security of the resources that are not shared with the VO?
• Because many institutions use NAT and private IP addresses, there is a problem with the name to use on the resource certificate. (Should it be the firewall host name or the resource machine name?)

4.9.2 Accessing resources behind the firewall

Each VO site would be required to open dedicated ports to allow access to its resources. The European Data Grid project lists 10 ports [7] that must be opened to give access to resources. Once a user is behind the firewall, he can attempt to exploit vulnerabilities in other machines on the local network. To reduce this risk, the local site administrator can restrict access to these ports so that only requests from specific IP addresses, for instance those of VO partner hosts, can go through. However, this solution has drawbacks:

• It will not work with UDP [14] network packets, because their source address field is unreliable and can be easily spoofed. Rules that accept packets from a specific IP address will therefore not be effective.
• Due to the dynamic nature of the Grid, the firewall administrator will have to reconfigure the firewall each time a company joins or leaves the project, and may make a mistake during the reconfiguration. If the firewall rules allow people in by default, then the firewall is not doing its job properly. If they deny access by default (which they should!), then the rules have to be modified to let the partner companies in, and modified again to lock them out once they no longer need the access.
Also, using a revision control system (RCS) [14] for firewall rules is highly recommended, so that the administrator can quickly roll back to a previous configuration if something goes wrong with the new one.

4.9.3 Naming issue with Network Address Translation (NAT)

On the Grid, mutual authentication between a user and a resource is required. Some firewalls use NAT [15] so that remote users do not directly access resources behind the firewall. Since the inner network appears as the firewall host, a major issue is the selection of the name to use on the resource certificate.

[Figure 4.7: Firewall problem. Labels: Company A, F/W1 (add1), University B (add2); add1: public IP address, add2: private IP address.]

A resource certificate usually holds a hostname, as in e-commerce systems. If the resource is behind a NAT firewall, the VO user connects to the public IP address, which is that of the firewall, on a specific port. The firewall name is therefore expected to be on the certificate (figure 4.7). The firewall then tunnels the request to the target resource. However, internal users will not be able to authenticate the resource, because the certificate does not hold the resource name [3]. Usually, when checking the resource certificate, the user’s host performs a lookup on the IP address of the host presenting the certificate and compares the resulting hostname with the one on the certificate. As long as the hostname it receives matches the one on the certificate, and assuming everything else about the certificate is correct, it accepts the certificate. Some PKI implementations have unspecified behaviour if multiple hostnames are returned for an IP address (as can be the case if the host has DNS aliases, which is standard procedure). So, the problem can be addressed at the DNS level. The technical details of configuring firewalls are beyond the scope of this dissertation.
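The naming problem can be made concrete with a small Python sketch of the hostname comparison described above. It is a simplified model only: the reverse lookup is simulated with a dictionary instead of real DNS calls, and the function name, hostnames and addresses are hypothetical (the two addresses play the roles of add1 and add2 in figure 4.7).

```python
def certificate_accepted(connect_ip, reverse_dns, cert_hostname):
    """Model of the check in the text: resolve the address the client
    connected to and compare the resulting hostname with the one on the
    certificate. Everything else about the certificate is assumed valid."""
    resolved = reverse_dns.get(connect_ip)
    return resolved is not None and resolved.lower() == cert_hostname.lower()


# Simulated reverse DNS table (hypothetical names and addresses).
REVERSE_DNS = {
    "203.0.113.1": "fw1.site.example",      # add1: public address of the firewall
    "10.0.0.5": "resource.site.example",    # add2: private address of the resource
}

# The certificate carries the firewall name, as external users expect.
CERT_HOSTNAME = "fw1.site.example"

external_ok = certificate_accepted("203.0.113.1", REVERSE_DNS, CERT_HOSTNAME)
internal_ok = certificate_accepted("10.0.0.5", REVERSE_DNS, CERT_HOSTNAME)
```

In this model the external user, who connects through the public address, accepts the certificate, whereas the internal user, who reaches the resource on its private address, sees a name mismatch.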
A possible solution to the previous problem is to have a second firewall (Fw2) for the internal users (figure 4.8). Internal users then access the resource via the first firewall, in the same way as external users, so the name on the certificate matches the name obtained by resolving the IP address. This solution is not convenient, as the company has to manage two firewalls.

[Figure 4.8: Solution with 2 firewalls.]

Currently, most large organisations partition their local area network into domains using firewalls. For instance, in Royal Holloway, ISG has its own firewall, which separates it from other network domains: the student network, staff network and administration network. According to [7], Grid users should be able to access only the domain that provides the Grid services. The entire LAN should therefore be configured so that it is very difficult for an attacker to gain access to other domains from the Grid domain.

4.9.4 Globus and firewalls

Globus allows access to resources behind the firewall via dedicated ports. According to [7], these ports are 2119, 2135 and 2811. Globus suffers from the problem described above when the firewall is using NAT. Currently there is no good solution for firewalls with NAT.

4.10 Future network solutions

The IP Security protocol, IPSEC [18, 51], is designed to provide a point-to-point secure channel, typically offering origin authentication and confidentiality at the network layer. Point-to-point means host-to-host, gateway-to-gateway or host-to-gateway. IPSEC is widely used to implement Virtual Private Network (VPN) connections [18]. It operates in two modes [51]:

• “Transport” mode: used between two IPSEC-enabled hosts (host-to-host). In this mode the payload of the IP packet (for example, a TCP segment) is protected, and the destination address is left in clear text.
For instance, this mode can be used when confidentiality is not an issue but we want to know whom we are communicating with and that the data is unchanged. (This is useful for firewalls on the Grid.)
• “Tunnel” mode: used to establish a secure connection between gateways (firewalls). In this mode the whole original packet, including the destination address, is encrypted. A new IP header is added, so that a Virtual Private Network can be established between remote sites.

IPSEC provides origin authentication and data confidentiality by using [18]:

• IP Authentication Header (AH)
• IP Encapsulating Security Payload (ESP)

Both protocols require shared secret keys. The keys are negotiated between the communicating parties using the Internet Key Exchange (IKE) protocol. In a Grid environment, the origin authentication provided by AH will:

• Allow firewalls on VO sites to accept only network packets originating from a trusted IP address that belongs to a partner organisation, because with IPSEC it is difficult to spoof the IP source address.
• Provide some protection against denial of service attacks, because the IP source address is authenticated and hard to forge.
• Work with both TCP and UDP.

IPSEC also provides data confidentiality by using the ESP protocol. This feature can provide limited protection against traffic flow analysis [51]. The AH and ESP protocols can be used alone or combined to provide origin authentication, confidentiality or both. According to [18], there are engineering and political reasons for separating them (encryption export restrictions).

Chapter 5
Confidentiality, Integrity, Availability and Accountability on the Grid

5.1 Introduction

The Grid is characterised by the flow of information between VO users, resources and VO partner organisations. Information is stored electronically and transmitted by electronic means. By using the Internet or public communications, VO members may be exposing themselves to new risks.
VO organisations want to protect their data resources while collaborating with other VO partners. Traditionally, security has been concerned with maintaining the confidentiality, integrity and availability of data resources [2]. Accountability [54] has recently been added to these attributes.

On the Grid, confidentiality and integrity are required because data is transmitted over a public network. Furthermore, the resources involved may be extremely sensitive and valuable, which requires that only authorised project users can gain access to read or modify them. Resource availability is vital on the Grid because, without it, the purpose of sharing resources cannot be accomplished. Finally, accountability is required in order to determine which project users are responsible for performing specific jobs on VO resources, and for billing purposes in the future.

The smooth running of the Grid project crucially relies on security mechanisms that ensure these attributes. This chapter describes each of these attributes, why it is needed and the mechanisms used to achieve it.

5.2 Confidentiality on the Grid

Confidentiality on the Grid is crucial. The existence of some institutions in the VO, in particular commercial companies, depends on keeping their information secret. Confidential information can include proprietary databases, sophisticated software and intellectual property such as software source code and drug formulas and compounds.

Confidentiality deals with preventing disclosure of information to unauthorised entities [2]. On the Grid, confidentiality is needed for:

• Ensuring network security. Users remotely communicate with resources, submit jobs and upload proprietary source code to remote supercomputers over the Internet. This means that traffic between users and resources is a point of attack.
• Ensuring the privacy of data resources.
The project may involve extremely sensitive or valuable data such as patients’ medical records and drug experiment databases. These data are shared on the basis that only authorised project users will have access to them.
• Maintaining the secrecy of data processed on remote resources. The Grid project may allow its users to run programs on remote sites. The result of the program and, possibly, the source code running on the remote machine should remain confidential.

The usual security mechanism used to provide confidentiality is encryption [10, 11], which is described next.

5.2.1 Brief overview of Encryption

Encryption is a means of transforming plaintext into cipher text under the control of a secret key. We write C = E_K(M), where M is the plaintext, E is the encryption algorithm, K is the key and C is the cipher text. The reverse process is called decryption or decipherment, and we write M = D_K(C), where D is the corresponding decryption algorithm. The algorithms currently in use are either symmetric [10, 11] or asymmetric (public key cryptography) [10, 11].

5.2.1.1 Secret key cryptography

With secret key cryptography (symmetric encryption), both parties use the same key value to encrypt clear text into cipher text and to decrypt cipher text back into clear text. The assumption is that the key K remains secret. For a well-designed cipher algorithm E, effectiveness is determined by the length of the secret key K: the longer the key, the more difficult it is to break the cipher text.

The most widely used symmetric encryption algorithm is the Data Encryption Standard (DES) [11]. It has a 56-bit key length and a 64-bit block size. Due to advances in computing power and its short key length, DES is currently considered insecure for high value transactions [10] and for long-term data encryption. The successor to DES is the Advanced Encryption Standard (AES) [11]. It offers a minimum key length of 128 bits and a 128-bit block size.
It is considered secure for the next two decades [11].

A major problem with symmetric keys is how to exchange the secret key between parties who do not know or trust each other, as on the Internet and, in our case, the Grid. The number of keys to be managed is also large. For a group of n users in which every pair shares a key, the number of keys to be managed is n(n-1)/2. If n = 10 users, the number of keys to be managed is 10 × 9 / 2 = 45 keys.

5.2.1.2 Public key cryptography (PKC)

PKC has been described in section 2.4.1. Public key cryptography has several advantages over secret key cryptography, such as:

• Private keys need not be distributed. Only the integrity and authenticity of the public keys themselves need to be demonstrated.
• For a group of n users, the number of keys to be managed is 2n, which is easier to manage than the n(n-1)/2 keys (roughly n²/2) needed with secret keys. For n = 10, the number of keys to be managed is 20. For 1000 users on the Grid, it is 2000 keys rather than 499,500.

5.2.1.3 Combining secret key and public key

A major problem with public key cryptography is performance, because it relies on complex mathematical operations that are computationally intensive. Public key cryptosystems are very slow, about 1000 times slower than symmetric systems [11]. Therefore, a hybrid approach is often used: public key algorithms are used to securely exchange the secret keys for symmetric algorithms. This approach is adopted in communication protocols such as SSL/TLS and SSH.

5.2.1.4 Key management

Key management is by far the most important area of the security of cryptosystems. Kerckhoffs’ principle states, “Security shouldn’t rest on the security of the algorithms, but only on the secrecy of the key”. Encryption algorithms such as RSA and AES are assumed secure, but how to manage the keys is the issue to worry about.
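To give a sense of the scale of the key-management problem, the key counts quoted in sections 5.2.1.1 and 5.2.1.2 can be reproduced with a short Python sketch. The function names are illustrative only; the symmetric count assumes that every pair of users shares one distinct key, and the asymmetric count assumes one public/private pair per user.

```python
def symmetric_keys(n: int) -> int:
    """Keys needed when every pair of n users shares one secret key."""
    return n * (n - 1) // 2


def asymmetric_keys(n: int) -> int:
    """Keys needed when each of n users holds one public/private key pair."""
    return 2 * n


# Compare both schemes for a small group and for a Grid-sized community.
comparison = {n: (symmetric_keys(n), asymmetric_keys(n)) for n in (10, 1000)}
```

The symmetric count grows quadratically with the number of users while the public key count grows linearly, which is why the difference only becomes dramatic for large virtual organisations.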
Key management is really about three things: keeping the keys for symmetric algorithms secret, keeping the private keys for asymmetric systems secret, and ensuring that the public keys in asymmetric systems are authentic (that is, that each key belongs to its claimed owner) [10]. According to [10], key management generally covers six areas: key generation, distribution, storage, usage (and preventing misuse), changing, and destruction. On the Grid we are interested in the following.

Key generation: this is important for proxy servers and for users generating their own key pairs. Generally, key generation involves generating random (or at least pseudorandom) numbers to make up the key bits. Since key generation is usually a single unit within a security system, it is often the focus of sophisticated attacks, which are difficult to detect.

• For symmetric ciphers, the key generator needs to exclude weak or semi-weak keys. These are keys that have been shown to produce undesirable regularities in a cipher’s output. For instance, when a weak key is used in DES, the process of encryption is the same as the process of decryption [11].
• For asymmetric ciphers, the key generator usually needs to be able to generate strong primes, in addition to producing unpredictable values. This process is usually very slow and complicated. If users generate their own key pairs, a non-repudiation problem can arise: one user may hold a key pair issued by a CA while another generates, by chance, the same pair. An issue of liability then arises. The recommended key length to defeat factorisation attacks is 760-bit [11].

Key storage and distribution have been described in section 2.4 using smart cards and certificates. Key usage, changing and destruction are beyond the scope of this dissertation.

5.2.1 Communication security

SSL/TLS are communication protocols that are commonly used to provide confidentiality on the Internet.
Typically, encryption is used to keep information secret from a third party. But if the third party manages to masquerade as the intended recipient of the information, the encryption fails to achieve its objective. SSL depends on the existence of a PKI to provide confidentiality between two entities, and it needs to be configured to provide mutual authentication between those entities. This requires:

• Trusted root certificates: the VO site administrator must install the root certificates in accordance with the VO policy.
• Validation and verification: when the certificates validate, the intended parties are communicating with each other; otherwise the authentication fails. The client must ensure that the certificate strictly identifies the party with whom it wants to communicate. CAs usually include the DNS name (or IP address) of the resource in the certificates they issue.
• CRL checking: until recently, the Konqueror and Internet Explorer implementations did not check CRLs. Without checking the CRL, the authentication is not reliable.

Export restriction: SSL/TLS can be weakened because of encryption export restrictions. A VO may comprise organisations from different countries, and some of these organisations reside in countries where there are restrictions on the use of cryptographic algorithms. For instance, some restrictions may allow only 40-bit DES, which gives little protection to high value data transmitted over the network between Grid entities. This would prevent companies from benefiting from the Grid.

Passive attacks: encrypted data transmitted over a public network between project partners is vulnerable to passive attacks [11]. In such an attack, the attacker collects network packets in order to decrypt them offline and infer some information. The protection against this type of attack depends on the length of the encryption key.
The length of the key should be chosen such that the time it takes to exhaustively search the key space is prohibitively long (years or decades). In practice, 112-bit and 128-bit key lengths are used in Triple DES and AES respectively.

5.2.2 Data resource privacy

Data resources usually include databases, archives and file systems. These resources may be extremely sensitive and valuable. For instance, in the example in section 1.3, the Laboratory may provide proprietary drug compound databases for project users. Only authorised project users may access these databases, so a reliable access control mechanism is required. The security mechanisms currently used have already been described in chapter 4.

Sometimes attackers manage to obtain an encrypted copy of database backup files. The backup files should remain secret even if they are stolen. This can be achieved using strong encryption such as that mentioned in the previous section.

5.2.3 Remote Data privacy

A major confidentiality issue arises when sensitive data are processed on a remote resource. The Grid project may allow users to run programs and upload their private source code to a remote resource. The local site administrator, as well as other privileged users on that site, can access the data processed on that resource. They are therefore capable of reading or copying the result of the program execution. A level of trust must exist between project users and the local site administrator that the administrator will not access their data. Otherwise, commercial companies such as pharmaceutical companies, which rely on keeping their experiments secret, will not be able to benefit from the Grid.

In order to reduce this reliance on trusting the administrator, an asymmetric encryption function can be integrated within the program. The program then takes an additional input, a public key supplied by the user or proxy, which it uses to encrypt the result of the program execution.
In this case, only the user or the proxy running the program has the private key that can decrypt the result of the execution. This measure makes it harder for the administrator to access the result of the program, but it has several limitations:

• Performance will degrade, because public key encryption is very slow, especially if the amount of data to be encrypted is large.
• Each program will need to include an asymmetric encryption function.
• If another resource requires the result of the execution, it will not be able to read it because it does not have the private key to decrypt it. Resource coordination will therefore be very limited.

5.3 Integrity on the Grid

Integrity deals with the prevention of unauthorised modification of data resources and resource configurations. Consider the Grid project described in section 1.3. Integrity can be achieved at different levels:

Access control level: In institution A, only authorised users such as Admin.A are allowed to change the direction of the satellite. In C, changing the attributes of a chemical item on the database server should only be done by authorised users such as Scientist.C. Access controls, described in chapters 3 and 4, are mechanisms used to provide data integrity.

Administration level: Hackers manage to get unauthorised access to data by exploiting known weaknesses in the systems that run the Grid project resources. This includes vulnerabilities in operating systems, database servers, firewalls, Intrusion Detection Systems (IDSs) and routers. Vendors respond by issuing patches and upgrades. It is therefore vital to apply all patches recommended by the vendor.

Network and configuration level: Hash and Message Authentication Code (MAC) algorithms [11] are also used to maintain the integrity of data. A hash algorithm is a one-way function that takes as input a message of arbitrary length and returns an output of fixed length.
A MAC takes a message and a key as inputs and returns a checksum of fixed length. SHA-1 [11] and MD5 [11] are hash algorithms that produce 160-bit and 128-bit hash values respectively [11]. HMAC [11] is a MAC algorithm. A hash function HASH has several properties [11]. Let h = HASH(M) be the hash of message M:

• Pre-image resistant: given h = HASH(M), it is very difficult to find M.
• Second pre-image resistant: given M and h = HASH(M), it is very difficult to find N ≠ M such that h = HASH(N).
• Collision free: it is very difficult to find distinct M and N such that HASH(M) = HASH(N).

Hashing or MACing can be used to ensure the integrity of data resources by hashing backup files, archives or file systems. Once the hash value is produced, it can be encrypted and saved. If unauthorised modification of any of these files is suspected, the hash function is applied again: if the file has been modified, the resulting hash value will differ from the saved one, and the modification is detected.

Digital Signature: this is a fundamental mechanism for ensuring data integrity. A digital signature is usually created by calculating the hash value of the data (document, network packets) and encrypting the hash value with the private key of an entity. The hash, rather than the data itself, is signed for performance reasons. For example, the digital signature on a certificate gives assurance that the certificate has not been modified since its creation by the CA. Modifications to the certificate content can be detected because the hash value of the certificate will be different from the value signed by the CA. Digital signatures can also be applied to data resources, so that Grid users can ensure that the data they are using is from the intended VO site. In communication over the public network, a digital signature provides origin authentication, which gives Grid users assurance of the identity of the party they are communicating with.
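The use of hashing and MACing to detect unauthorised modification of a backup file can be sketched in Python with the standard `hashlib` and `hmac` modules. This is a minimal sketch under stated assumptions: SHA-256 is used here in place of the SHA-1 and MD5 algorithms named in the text, and the sample data, the key and the helper names (`file_digest`, `make_tag`, `verify_tag`) are hypothetical.

```python
import hashlib
import hmac


def file_digest(data: bytes) -> str:
    """Unkeyed hash of the data (SHA-256 assumed), as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, data: bytes) -> str:
    """Keyed checksum (HMAC with SHA-256) over the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, data: bytes, tag: str) -> bool:
    """Recompute the MAC and compare it with the stored tag in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


# Hypothetical backup contents and shared secret, for illustration only.
backup = b"compound-id,formula\n1,C8H10N4O2\n"
key = b"hypothetical-shared-secret"

# Values saved when the backup is taken.
stored_digest = file_digest(backup)
stored_tag = make_tag(key, backup)

# Later, an attacker alters one character of the backup.
tampered = backup.replace(b"C8H10N4O2", b"C8H10N4O3")
digest_changed = file_digest(tampered) != stored_digest
tag_still_valid = verify_tag(key, tampered, stored_tag)
```

A plain hash only helps if the stored digest itself is protected, which is why the text suggests encrypting it; the keyed MAC variant achieves a similar effect because an attacker without the key cannot produce a matching tag.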
Concurrency level: Since the Grid is about coordinated resource sharing, concurrency problems will occur. A job running on the Grid may involve a file or a database update. Sub-jobs must not read the portion of a file that is concurrently being updated by another job; otherwise an application might read partially updated data and receive a combination of old and new data. Locking and synchronisation primitives are required to maintain data integrity [19]. Typically, these primitives are built into the file system or database to prevent this automatically.

5.4 Grid Availability

Most VO organisations are totally dependent on their computerised information systems and the data that they store, process and transmit. Consequently, system failure or information loss can have grave consequences for the smooth running of the Grid project. VO organisations are also faced with an increasingly wide and sophisticated range of threats, including viruses, hacking and denial of service attacks, because resources on the Grid are connected to each other via the Internet.

Availability aims at ensuring that authorised Grid users have timely and uninterrupted access to resources on the Grid [2]. Availability can be viewed from many angles: threats to the Grid infrastructure layer, threats to the underlying operating system and supporting applications, and finally threats to the network infrastructure.

A major threat to the Grid is “Denial-of-Service” (DoS) [27]. Since a Grid server is accessible through the Internet, it is vulnerable to “Denial-of-Service” (DoS) and “Distributed-Denial-of-Service” attacks [27]. These types of attacks have been widely experienced on the Internet. For example, a DoS attack in early 2000 seriously interrupted the services of some well-known Internet sites such as Amazon, eBay and Yahoo [27].
The Grid infrastructure itself has some critical components, such as the CA, the CAS server and the Grid-mapfile. If any of these components is compromised, the Grid project will fail.

CAS server: When the Grid project relies on the CAS server for granting users access to resources, the availability of the server becomes crucial. Since the server is accessible via the Internet, it is vulnerable to a wide range of threats with different impacts on availability:

• If the CAS server is hacked and all its content is deleted, VO users will not be able to access any resource in the VO.
• If the CAS server is infected by a worm such as Code Red, the performance of the server will degrade dramatically, because worms consume bandwidth in order to replicate. This will delay the response to VO users’ requests.
• The CAS server is also vulnerable to network attacks such as SYN flooding [14], which exploits the handshake operation used when establishing a TCP connection. (Communication with the CAS server is done via SSL and thus over TCP.)

Grid-mapfile: Each VO site has a Grid-mapfile. One way of interrupting services on the Grid would be to delete the Grid-mapfile, or the database used to map global credentials to local accounts on the resource. Compromising a Grid-mapfile affects one VO site rather than the whole project.

In Grid projects that involve data resources, replicating the data to different machines at different sites can increase data availability. To stop the service, the attacker then needs to disable all the servers that host the data. A DoS attack could also consist of flooding the network with a huge number of fake requests. The success of such an attack depends on the type of scheduling algorithm used to manage requests, and its effect may vary from delayed job execution to a complete stop.
5.5 Accountability

We have researched the Grid literature and have not found an explicit statement of the accountability problem. We now present our own definition of it. Accountability tracks job execution with the intention of determining the principals responsible for performing a job. Audit logging is a mechanism used to maintain accountability. What makes accountability on the Grid more complex is the ability to access heterogeneous resources in geographically distributed locations simultaneously.

Typically, an audit system contains records of transactions such as logging on to the system and initiating a process. According to [4], a job on the Grid “is composed of dynamic group of processes running on different resources and sites.”

Consider the Grid project described in section 1.3. When a user U.A submits a job to a remote resource, R.B, on site B, the resource first authenticates U.A and then maps his global credential to a local account, LocalU.A-B, on B. The audit trail on the local resource records that a local user, LocalU.A-B, has successfully logged in. From this point, all of U.A’s activities (job processes) are associated with his local username on that resource, LocalU.A-B, and recorded in the audit trail of that resource. These activities may include:
• Initiating processes: the audit trail is expected to show when the processes were initiated, who initiated them and when each process completed (or terminated for some reason).
• File access: the audit trail provides information about which files (or databases) U.A has opened and closed. It also records the operations U.A has performed on these files, such as reading from, writing to or deleting from a file. Finally, it may record the files on which U.A made invalid access attempts.
• Resource utilisation: the audit trail shows U.A’s level of utilisation of each resource. This allows the administrator to determine which user is abusing the resource. Resources can be massive storage devices, CPU cycles, memory and network bandwidth. The details of resource use can help the administrator to determine whether unauthorised processes, such as worms or malicious code, are abusing the resources, and to take appropriate action.

The job submitted by U.A to R.B may determine that it requires access to a resource R.C on site C. R.B then initiates a proxy on U.A’s behalf to access the new remote resource R.C. R.C maps U.A’s identity on the certificate to another local username, LocalU.A-C, which may differ from LocalU.A-B, and starts recording U.A’s activity locally in a similar way. In order to determine U.A’s activities on the Grid, each local site administrator in the VO has to filter the audit trails of his resources using the username associated with the Subject name on U.A’s certificate.

Accountability on the Grid is therefore distributed: each site in the VO maintains accountability locally. Currently, there is no method of tracking users’ activities on different VO sites from a central audit trail, for reasons that include the following:
• Access to audit trails is usually restricted to the local administrator of the resource. If the integrity of the audit trails is compromised, the organisation loses accountability. Therefore, it is unthinkable to allow external processes to access the audit trails.
• Due to the heterogeneous nature of resources on the Grid, audit logs have different formats on different resources.

The accountability level in the VO schemes described in chapter 2 depends on how the local administrator creates and manages users’ accounts on resources.
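
The filtering step that each site administrator performs can be sketched as follows. This is a minimal illustration, not an existing Grid tool: the record format, the function name, the subject name string and the sample data are all hypothetical, and the sketch assumes that the sites have agreed to hand over their already filtered records, which, as noted above, is not current practice.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AuditRecord:
    """One entry in a site's local audit trail (simplified, hypothetical format)."""
    site: str
    local_user: str
    event: str


def activities_for_subject(
    subject: str,
    grid_maps: Dict[str, Dict[str, str]],
    audit_trails: Dict[str, List[AuditRecord]],
) -> List[AuditRecord]:
    """Collect the audit records that belong to one certificate subject.

    grid_maps:    site -> {certificate subject name -> local account}
    audit_trails: site -> records kept by that site's administrator
    """
    matched: List[AuditRecord] = []
    for site, records in audit_trails.items():
        # Each site maps the global subject name to its own local account.
        local_user = grid_maps.get(site, {}).get(subject)
        if local_user is None:
            continue  # the subject has no account on this site
        matched.extend(r for r in records if r.local_user == local_user)
    return matched


# Hypothetical data following the U.A example in the text.
SUBJECT_UA = "/O=ExampleVO/CN=U.A"  # made-up subject name
GRID_MAPS = {
    "B": {SUBJECT_UA: "LocalU.A-B"},
    "C": {SUBJECT_UA: "LocalU.A-C"},
}
AUDIT_TRAILS = {
    "B": [
        AuditRecord("B", "LocalU.A-B", "login"),
        AuditRecord("B", "LocalU.X-B", "login"),
    ],
    "C": [AuditRecord("C", "LocalU.A-C", "process started")],
}

if __name__ == "__main__":
    for record in activities_for_subject(SUBJECT_UA, GRID_MAPS, AUDIT_TRAILS):
        print(record.site, record.local_user, record.event)
```

The filter relies on each subject having its own local account at every site. If a site maps several subjects to the same local account, the records it returns cannot be attributed to a single user.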
The system administrator may give all users from a particular Grid project the same account, even with the same access rights, because:
• The number of users may be large
• The number of accounts to be created and managed is too great

In that case it is difficult to track individual users’ activities on the Grid: the administrator can determine only that some user from a particular project has performed a particular transaction. The CAS authorisation model has this weakness, as accounts on resources are associated with the CAS server name, not with users’ identities.

Chapter 6 Toward a Top Down View of Grid Security

6.1 Introduction

The Grid provides control over huge resources consisting of enormous computational power (parallel processing machines), storage (hard disks, memory), scientific devices (telescopes, satellites) and safety-critical systems such as the Ferno at QMUL used by the RealityGrid project. Because of these valuable assets, the Grid can be a very attractive target for hackers who wish to cause intentional damage: cracking codes, obtaining confidential information, storing illegal documents, sabotaging equipment, infecting programs and even causing physical harm. These are intentional external threats.

So far we have introduced several types of VOs and discussed in depth the security mechanisms associated with each type. These mechanisms provide the infrastructure upon which Grid security solutions can be built in a bottom-up approach. They have mainly been user-centred, focusing on authentication and credentials. To form a better understanding of Grid security, I attempt in this chapter to provide a top-down view that could be useful to users, site managers and security administrators. The approach is resource-centred and is inspired by the lectures of Prof. Dieter Gollmann on “security” and Dr Sherwood on “risk management”.
To do this, it may be useful to step back a little and remind ourselves of what “security is about”. According to Prof. Dieter Gollmann, security is about taking appropriate measures to prevent the misuse of valuable assets (protection), to identify damage that has happened to them (detection) and to recover the assets after the damage is done (reaction).

In the Grid context the assets are the shared VO resources and the resources of the individual sites. To protect these assets, it is important to have a threat model that identifies the known threats to each asset and the vulnerabilities introduced by the security mechanisms in use.

As we have seen, when an organisation becomes a VO site in a Grid project, some of its resources may be shared with other VO sites from different security domains. In addition, these shared resources will be connected to resources in other security domains through the Internet, which was not designed to be secure. Thus, these resources (even the site resources that are not intended to be shared by the VO!) are vulnerable to a range of new threats that might affect the whole organisation’s mission.

In order to develop a strategy to protect these resources, they have to be assessed for their degree of importance. The impact resulting from a breach of the confidentiality, integrity or availability of a resource decides what level of security is required. This can be achieved by conducting a risk analysis (RA) [20, 21] on the organisations’ resources. Risk analysis helps VO organisations to identify and manage the risk of sharing, with the aim of reducing it to an acceptable level. The risk analysis is done from two different perspectives:
• VO: to protect the security interests of VO users
• Individual sites: to protect the security interests of the individual organisations within the VO.
Any workable solution has to be a compromise acceptable to both parties.
6.2 Risk management of Grid Assets

The purpose of risk management [20] is to determine the current security status of a resource and the level of protection it requires. It also offers a cost-effective way of providing the protection needed for the organisations’ resources. However, risk management can only be effective if it is based on a sound risk analysis process. Here is a brief explanation of risk analysis.

6.2.1 Overview of Risk and Risk Analysis

The key elements of a risk analysis are:
• Asset: a valuable resource of a VO partner organisation, such as a supercomputer, databases, equipment and software.
• Threat: any potential danger to resources (equipment, information or systems), for example viruses, personnel, sabotage, theft and system failure.
• Vulnerability: a weakness in the protection that allows a threat to materialise. For example, a misconfigured firewall would allow unauthorised access to resources.

Risk is the possibility of damage. It depends on the asset value, the threats and the vulnerabilities. Risk analysis examines the relationship between these elements to determine the potential loss [20].

6.2.2 Risk Analysis of Grid Assets

Sherwood [21] defines RA within a business environment as “the process of identifying business assets, recognising the threats, assessing the level of business impact that would be suffered if the threats were to materialize, and analysing the vulnerabilities”. This variation of the RA definition is more suitable for the Grid.

Any security solution needs to be viewed from the perspective of the VO as well as from that of the individual VO sites. The VO needs risk analysis to ensure that the VO infrastructure has an acceptable level of security. Individual VO sites need to do risk analysis in order to:
• Recognise potential new threats that arise from joining the Grid.
• Examine the effectiveness of current security controls used by the VO.
• Understand vulnerabilities
• Estimate the impact of loss of confidentiality, integrity and availability on resources.

Figure 6.1 Risk analysis diagram depicting the relationship between functionalities, security mechanisms and vulnerabilities (diagram labels: User, Trust, Security mechanisms, Vulnerabilities, Threats, Resource, Functionalities, CIA - Impact).

Typically, each resource in the VO has:
• A set of possible threats (threat model).
• An impact, which results from a threat that materialises (usually loss of confidentiality, integrity or availability).
• Security mechanisms for enforcing an access control policy (local and global).

The VO user gains access to VO resources via various security mechanisms such as authentication and authorisation. These security mechanisms have vulnerabilities that may allow threats to materialise. After a risk analysis has been carried out, the VO site can decide whether the vulnerabilities of the security mechanisms are acceptable. If the VO site finds that the level of risk is acceptable, it will join the VO. In some circumstances, the VO site may impose conditions on certain procedures in the VO. For instance, a condition may state that only digital certificates issued by a particular CA, X, are acceptable.

6.2.3 Enhancing security of core Grid Assets

As depicted in the figure below [2], there is an inverse relationship between “security” and “functionality”. It is normally the case that adding more functionality to a service results in less security. Therefore, to minimise security risks to core services, non-essential functionalities should be removed [2].

Figure 6.2 More security, less functionality (diagram labels: Security, Functionality).

6.3 Security hierarchy of Grid Resources

To ease the process of risk assessment, it is very useful to define a partial ordering relation on resources that captures security dependency.
Typically, r1 <= r2 means that any vulnerability of resource r2 is also a vulnerability of resource r1. In other words, the security of r1 totally depends on the security of r2. For example, we have:
r.B <= map-file.B
map-file.B <= CA

This ordering allows the construction of a dependency chain and greatly facilitates deriving the risk assessment of an asset from the cumulative impact on all its dependent assets. For example, compromising the CA implies compromising all the resources in the VO!

This hierarchy leads naturally to a protection rings model [2] in which the most critical security services are placed in ring 0 (which has the highest protection: access control, firewalls, fewer functionalities, etc.) and the least critical security services are placed in the outer ring.

Figure 6.3 Protection rings [2] (rings numbered 0 to 3).

The Grid infrastructure contains some critical resources that support the functions of the VO. Consider the VO scheme in which all VO organisations trust a third party, the CA (described in section 2.5). The major resources of the VO can be classified into two categories, as follows:

1. The first category comprises the Grid infrastructure resources, which include:
• Certificate Authority: The CA is the most critical resource on the Grid because it issues and signs certificates for project users and other resources. These certificates are used for authentication and authorisation on the Grid. If the CA’s private key is compromised, the digital certificates are no longer reliable. As a result:
  1. The authentication process will fail because the attacker will be able to sign a false certificate, which enables him to gain access to resources for which he is not authorised.
  2. Authorisation will fail. Authentication failure triggers authorisation failure (a domino effect) because authorisation is based on the subject name in the certificate.
  3. The whole Grid project will fail.
• Registration Authority: The RA is the second most critical resource. VO partners will trust the certificates issued by a CA only if they are satisfied with the amount of checking done by the RA to confirm an identity before a public key is certified for that identity. The level of the verification procedure depends on the intended purpose of the certificate.
• CAS server: The CAS server provides authorisation data to Grid users. Grid users can access the VO resources only if they have a proxy certificate signed by the CAS server. If the CAS server is compromised, the attacker will be able to access any resource on the Grid to which the VO is authorised. The impact of this breach depends on the type of resources shared and the authorisation level granted to the VO. For instance, if the resources involve databases or archives, unauthorised write access would destroy their value. Unauthorised read access would disclose confidential information and may result in financial losses; in the worst case, some companies may go out of business. Furthermore, access to resources by legitimate users will be interrupted completely if the data on the CAS server is deleted.

2. The second category includes the shared resources at each VO site, which comprise:
• Grid map files: The map files are necessary for converting VO users’ global credentials to local accounts. Without the map file, VO users cannot access resources at a specific VO site. The impact of compromising a Grid map file is less significant than that of compromising the CAS server, because a map file operates on an individual VO site or even an individual resource, not across all VO resources. Therefore, the impact of unavailability is restricted to one resource or at most one site, which may still cause an availability problem for the entire VO.
Also, the attacker may manage to modify the Grid map file so that his subject name is mapped to an existing account with high privileges. By masquerading as a legitimate user, the attacker may find confidential data on the resources, such as drug description data or future car design information.

Figure 6.4 Resources Hierarchy (diagram labels: VO components, CA, RA, CAS-Server, Grid-mapfile, Computer resources, Instruments, Data resources, Network resources, Desktop machines, Telescopes, Databases, Large clusters, Supercomputers, Sensors, Archives, Campus network, Massive storage, Satellites, Software applications, Local resources).

• Data resources: Data resources vary from file systems to very sensitive databases such as drug experiments and medical records. The value of the data depends on the impact caused by a breach of its confidentiality or integrity. The attacker may manage to infer a drug formula from the data. Also, if an attacker manages to modify or overwrite existing data resources, VO users will be using unreliable data and will end up with unreliable job results.
• Computer and equipment resources: Computer resources vary from CPU cycles and massive storage to multi-million high-performance supercomputers. Equipment includes lab facilities, telescopes and satellites, which are extremely valuable. The value of the resource decides the level of protection required.

6.4 Threats

Here is a sample list of possible threats to VO resources that could bring severe damage and loss of revenue:
• Loss or theft of the CA’s private key.
• Loss or theft of a VO user’s private key.
• Loss of CAS server availability: The availability of the CAS server is crucial. If the server is not available, due to a denial-of-service attack for instance, VO users will not be able to access VO resources. Thus, the Grid project will stop completely.
• Unauthorised access: Unauthorised access to crucial file systems such as Grid map files or the CAS server.
• Unauthorised disclosure: Disclosure of information that is confidential to a VO member, such as databases, software code and results of experiments.
• Accidental system failure: Computer, data and network resources are vulnerable to failures caused, for instance, by fluctuations in the electricity supply.
• Deliberate sabotage: Computer, data and network resources are also vulnerable to viruses, worms and other types of deliberate sabotage.

7. Conclusion

Over the last decade the Grid concept has rapidly evolved from a rigid, non-scalable set of ad hoc point-to-point connections with poor functionality toward a scalable, flexible and dynamic virtual organisation that provides a much richer set of functionalities. The potential benefits of the Grid (and Virtual Organizations) are enormous, but the biggest barrier to their wider adoption is security. Currently, a limited concept of VO is realised by the Grid for scientific collaborations, but the domain of potential applications is huge: VO possibilities range from e-government, health and e-learning to military applications and the coordination of multinational forces!

Grid security, as embodied in Globus 3, has made a strong leap forward in recent years in many aspects (for example authentication and confidentiality), but it still has many deficiencies and is not yet convincing for the commercial world. The innovative applications of cryptography, PKI and X.509 have been the source of many of the radical advances in the evolution of security solutions to these aspects.

In this report, we have focused on understanding the nature of a VO and the way this concept has evolved over the last dozen years.
We attempted to clarify the mission of the virtual organization, the explicit assumptions about the roles involved in the VO (administrators, sites, users, certificate authorities and third parties), the trust relationships among these roles, and how security policy conflicts between the VO and the local sites are resolved.

Specific solutions corresponding to the various types of VOs and to the Grid Security Infrastructures have been discussed in depth throughout this report. For example, it has become clear, as depicted in Table 7.1, which kinds of tasks can feasibly be solved by each type of VO. For instance, running a task on a remote named resource is feasible in all the VOs, but the collective execution of a distributed task on remote resources (in different sites) seems feasible only with a VO trusting a CA.

Table 7.1: Task functionalities for each type of VOs
VO types (columns): Ad hoc connections; Trusting Central Database; Trusting a CA; Delegation; MyProxy
Functionalities (rows):
• U.A accesses named resources R.B
• U.A is dynamically allocated a resource to perform a task
• Distributed execution on named resources
• Distributed execution on dynamically allocated resources

We have considered each of the main security aspects (authentication, authorisation, confidentiality, integrity, availability and accountability) in turn, and critically discussed the main security mechanisms for achieving it within each type of VO. We have also included desirable criteria such as scalability, flexibility and usability in the evaluation of each security aspect in each VO type. For authentication, the major technical problems have been solved with a CA. However, serious problems are still to be solved with confidentiality and authorisation, and major problems have not yet been addressed with accountability!
The major contributions of my own research to this report can be highlighted as follows:
• Adopting a notation (from Hoare’s Communicating Sequential Processes) for a clearer description of VO entities (users, sites, resources, contact, etc.).
• Making the initial steps towards a more accurate modelling of VOs as mathematical structures (as a vector of relevant entities such as CA, sites, resources, etc.). This approach to modelling helps to identify clearly and concisely the main issue in reconciling security policies among the VO sites.
• Classifying types of VOs according to the trust relationships between users, sites, CAs and third parties.
• Giving critical discussions of the major mechanisms for achieving security properties (authentication, authorisation, confidentiality, etc.) in each type of VO. Some of the benefits and drawbacks of Globus are available in the literature, but they have mainly been presented for developers and users.
• Providing the initial groundwork for a seemingly new top-down approach to viewing and analysing Grid security, based on a careful combination of modelling, classical characterisation of security and concepts of risk analysis.

Finally, I have greatly enjoyed exploring this exciting topic and hope to have the opportunity to explore further some of the avenues and challenging problems in the future.

References

[1] I. Foster, C. Kesselman (eds.). “The Grid: Blueprint for a New Computing Infrastructure”. Morgan Kaufmann, 1999.
[2] D. Gollmann. “Computer Security”. John Wiley, 1999.
[3] I. Foster, C. Kesselman, S. Tuecke. “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”. International Journal of Supercomputer Applications and High Performance Computing, 2001, 200-222.
[4] I. Foster, C. Kesselman, G. Tsudik, S. Tuecke. “A Security Architecture for Computational Grids”. ACM Conference on Computers and Security, 1998, 83-91.
[5] I. Foster, L. Pearlman, V. Welch, C. Kesselman, S. Tuecke. “A Community Authorisation Service for Group Collaboration”. www.Globus.org
[6] R. Butler, V. Welch, D. Engert, I. Foster, S. Tuecke, J. Volmer, C. Kesselman. “A National-Scale Authentication Infrastructure”. IEEE, Dec. 2000.
[7] M. Surridge. “A Rough Guide to Grid Security”. Issue 1.1a, IT Innovation Centre, 2002.
[8] D. De Roure, M.A. Baker, N. Jennings, N. Shadbolt. “The Evolution of the Grid”. International Journal of Concurrency and Computation: Practice and Experience, Wiley and Sons Ltd., June 2002, ISSN 1040-3108.
[9] S. Thomas. “SSL and TLS Essentials: Securing the Web”. John Wiley, 2000.
[10] F. Piper. “Introduction to Cryptography”. Lecture notes, RHUL, 2003.
[11] M. Robshaw, S. Murphy. “Advanced Cryptography”. Lecture notes, RHUL, 2003.
[12] H. Mack. “Public Key Infrastructure in E-Commerce Environments”. E-Commerce Infrastructure, Lecture notes, Royal Holloway, University of London, 2003.
[13] G. Price. “Public Key Infrastructure: Challenges and Challengers”. Current Development in E-Commerce, Lecture notes, RHUL, 2003.
[14] M. D. Harper, Herald Information Systems. “Trust, Security and Confidence Online: The Verifier’s Perspective”. Current Development in E-Commerce, Lecture notes, RHUL, 2003.
[15] A. Stone. “Network Security: Firewalls”. E-Commerce Infrastructure, Lecture notes, Royal Holloway, University of London, 2003.
[16] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, S. Tuecke. “Security for Grid Services”. www.globus.org
[17] S. Zaba. “Web Security, SSL”. Network Security, Lecture notes, RHUL, 2003. http://www.isg.rhul.ac.uk/msc/teaching/sec3/sec3.shtml
[18] S. Zaba. “Secure Protocols and VPNs (Part 1 & 2)”. Network Security, Lecture notes, RHUL, 2003.
[19] C. Ciechanowicz. “Database Security”. Lecture notes, Royal Holloway, 2003.
[20] I. E. Gilbert. “Guide for Selecting Automated Risk Analysis Tools”. Computer Security Division, NIST.
[21] J. Sherwood. “Security Issues in Internet E-Commerce”. Lecture notes, RHUL, 2003.
[22] B. LaMacchia, S. Lange, M. Lyons, R. Martin, K. Price. “.NET Framework Security”. Addison-Wesley, 2002.
[23] CNN.com. Cyber-attacks batter Web heavyweights. http://www.cnn.com/2000/TECH/computing/02/09/cyberattacks.01/index.html%1, February 2000.
[24] CERT Coordination Centre. Denial of Service Attacks. http://www.cert.org/tech_tips/denial_of_service.html, June 2001.
[25] Global Grid Forum: www.ggf.org
[26] IBM Grid solutions: http://www-1.ibm.com/grid/solutions/index.shtml
[27] UK E-Science programme: http://www.research-councils.ac.uk/escience/
[28] www.realitygrid.org
[29] www.eu-datagrid.org
[30] www.eurogrid.org
[31] www.ipg.nasa.gov
[32] www.Globus.org
[33] www.ietf.org/rfc/rfc2692.txt?number=2692; www.ietf.org/rfc/rfc2693.txt?number=2693
[34] www.unicore.org
[35] www.unix.org
[36] www.microsoft.com/windows
[37] www.ibm.com/os2
[38] www.cern.org
[39] www.legion.org
[40] www.verisign.com
[41] www.entrust.com
[42] www.rsa.com
[43] www.ietf.org
[44] www.xml.org
[45] www.globus.org/gram
[46] www.globus.org/mds
[47] www.openssl.org
[48] www.openssh.org
[49] www.isg.rhul.ac.uk
[50] www.rhul.ac.uk
[51] L. Coles-Kemp. “Virtual Private Network and IPSEC”. Current Development in E-Commerce, Lecture notes, RHUL, 2003.
[52] R. Sandhu. “Identification and Authentication”. Chapter 16, Computer Security Handbook, Fourth Edition, Wiley, 2002.
[53] S. Chokhani. “Public Key Infrastructures and Certificate Authorities”. Chapter 23, Computer Security Handbook, Fourth Edition, Wiley, 2002.
[54] D. Levine. “Auditing Computer Security”. Chapter 23, Computer Security Handbook, Fourth Edition, Wiley, 2002.
[55] N. Smart. “Cryptography: An Introduction”. McGraw-Hill, 2003.
[56] B. Schneier. “Secrets and Lies: Digital Security in a Networked World”. Wiley, 2000.
[57] L.D. Stein. “Web Security: A Step-by-Step Reference Guide”. Addison Wesley, 1998.
[58] A. E. Abdallah, P. Ryan, S. Schneider. “Formal Aspects of Security”. LNCS, 2003.