 D6.1: Requirements and Coordinated Security Strategy
EUBra-BIGSEA is funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement No 690116.
Abstract: Europe - Brazil Collaboration of BIG Data Scientific Research through Cloud-Centric Applications (EUBra-BIGSEA) is a medium-scale research project funded by the European Commission and the Government of Brazil, in the frame of the third European-Brazilian coordinated call. This document has been produced to perform an analysis of the security concerns of the project, defining the security scope and eliciting the security requirements for the different workpackages.
Document identifier: EUBRA BIGSEA -WP6-D6.1
Deliverable lead UC
Related work package WP6
Author(s): Nuno Antunes (UC), Tânia Basso (UNICAMP), Roberta Matsunaga (UNICAMP), Paulo Simões (UC), Regina Moraes (UNICAMP), Rogerio de Lemos (UC), Edmundo Monteiro (UC), Marco Vieira (UC)
Contributor(s): Ivano A. Elia (UC), Sandro Fiore (CMCC), Donatello Elia (CMCC), Ignacio Blanquet (UPV), Danilo Ardagna (POLIMI), Andrey Brito (UFCG), Dorgival Guedes (UFMG), Walter dos Santos Filho (UFMG), Daniele Lezzi (BSC)
Due date 31/06/2016
Actual submission date 07/07/2016
Start date of Project 01/01/2016
Duration 24 months
Keywords: security requirements, security strategy, privacy, AAA.
Initial structure 15/04/2016 Nuno Antunes (UC), Tânia Basso (UNICAMP), First version of state of the art Roberta Matsunaga (UNICAMP) 0.3 0.4 10/05/2016 Nuno Antunes (UC), Tânia Basso (UNICAMP)
01/06/2016 Nuno Antunes (UC), Tânia Basso (UNICAMP) Initial requirements elicitation. 0.5 0.6 16/06/2016 Nuno Antunes (UC), Tânia Basso (UNICAMP)
Integration of other WPs concerns.
22/06/2016 Nuno Antunes (UC), Tânia Basso (UNICAMP), Complete requirements elicitation. Paulo Simões (UC) 0.7 0.8 27/06/2016 Marco Vieira (UC), Regina Moraes (UNICAMP)
30/06/2016 Nuno Antunes (UC), Marco Vieira (UC), Tânia Address Danilo’s comments. Basso (UNICAMP) 0.9 1.0 04/07/2016 Nuno Antunes (UC)
Address Andrey’s comments.
05/07/2016 Nuno Antunes (UC) Draft of Section 3, 4 and 5
Internal revision of the document
Copyright notice: This work is licensed under the Creative Commons CC-BY 4.0 license.
2.2 Roles (Non-malicious Users)
2.3 Adversaries and Attack Venues
2.4 Interaction with other technical Work Packages
2.4.1 Scope of WP6 in the context of WP3
2.4.2 Scope of WP6 in the context of WP4
2.4.3 Scope of WP6 in the context of WP5
3 State of the Art
3.1 AAA Provisioning
3.1.1 Mainstream and Emerging AAA Standards and Protocols
3.1.2 Federated Identity Services for Academic and Scientific Communities
3.1.3 Identity and Access Management Services in Cloud Infrastructures
3.2 Security Assurances
3.2.1 Security testing
3.2.2 Static and Dynamic Analysis
3.2.3 Security Benchmarking
3.2.4 Vulnerability and Attack Injection
3.2.5 Security Modelling, Risk Management and Security Controls
3.3 Data Privacy Solutions
3.3.1 Data Anonymization
3.3.2 Privacy Policies
3.3.3 Policy Negotiation
3.3.4 Privacy Policy Enforcement
3.3.5 Privacy Threats
4 Project Security Requirements
4.1 AAA Provisioning Requirements (T6.2)
4.1.1 Requirements for the EUBra-BIGSEA Infrastructure AAA
4.1.2 Requirements for the EUBra-BIGSEA Applications AAAaaS
4.2 Security Assurances Requirements (T6.3)
4.3 Data Privacy Requirements (T6.4)
4.4 Summary of the elicited requirements
5 Conclusions
6 REFERENCES
www.eubra-bigsea.eu | contact@eubra-bigsea.eu | @bigsea_p3peubr
EXECUTIVE SUMMARY
The EUBra-BIGSEA project aims at developing cloud services empowering Big Data analytics to ease the development of massive data processing applications. Such applications deal with large amounts of heterogeneous and complex data produced very quickly by a high number of diverse sources. In this scenario, traditional treatment of data, from security to transformation, may be inefficient and inadequate. Thus, the project aims at researching efficient mechanisms to ensure privacy and security, on top of a QoS-aware layer for the smart and rapid provisioning of resources in a cloud-based environment.
It is well known that the security concerns of a large and complex system should not be addressed individually or in an ad-hoc manner, as this may result in insufficient solutions. This is particularly hard for complex systems such as the one being developed in EUBra-BIGSEA. It is therefore necessary to define a coordinated strategy that allows achieving the required levels of security. Such a strategy should guide the research, development and integration of the security solutions throughout the project.
The main objective of this document is to define a global security solution able to deal with the security objectives of the project: the provisioning of Authentication, Authorization and Accounting (AAA), the assurance of the security properties of the cloud and Big Data services, and the protection of the data privacy. Considering these objectives, the development of this deliverable followed a process with five steps:
1) definition of the security scope and concerns of the EUBra-BIGSEA infrastructure, taking into account the concerns of the remaining work packages;
2) analysis of the state of the art regarding solutions in the scope of the work package and related to the concerns identified;
3) elicitation of the concrete security requirements whose implementation will cover those concerns;
4) identification of the requirements that can be implemented by using/adapting existing solutions and the ones for which further research is needed (gaps); and
5) prioritization of the requirements to be met (in terms of must have and nice to have requirements).
The result is a list of 30 high level requirements whose implementation will provide a secure environment for the infrastructure, for the application developers and even for the end users of the applications running inside the framework. In practice, these requirements will guide the research and development work of WP6 in coordination with the other technical work packages, thus defining the security strategy of the project.
The defined solution includes two distinct AAA blocks: 1) a EUBra-BIGSEA Infrastructure AAA Service, to provide the AAA functionalities to infrastructure managers and application developers/providers; and 2) a EUBra-BIGSEA Applications AAAaaS, to serve the end users of applications hosted in the EUBra-BIGSEA. It also includes the security assessment of key infrastructure components and the development of solutions for the issues uncovered, the benchmarking and improvement of intrusion detection systems, and the proposal of metrics to characterize the trustworthiness of the system. Finally, it includes the definition of two distinct privacy control barriers, which are responsible for protecting the anonymity of both the raw data to be used and the data resulting from the predictive and descriptive models built.
1 INTRODUCTION
This document defines the requirements for a global security solution able to deal with the main security concerns of the EUBra-BIGSEA project: the provisioning of Authentication, Authorization and Accounting (AAA), the assurance of the security properties of the cloud and Big Data services, and the protection of the data privacy. For this, the document defines the security scope and concerns of the EUBra-BIGSEA infrastructure, taking into account the concerns of the remaining work packages; reviews the state of the art regarding solutions in the scope of the work package and related to the concerns identified; presents the concrete security requirements whose implementation will cover those concerns; and identifies and prioritizes the requirements that can be implemented by using/adapting existing solutions and the ones for which further research is needed (gaps).
The EUBra-BIGSEA general infrastructure comprises four main blocks:
● QoS Cloud Infrastructure services, which integrate the modelling of the workload, the monitoring of the resources, the implementation of vertical and horizontal elasticity and the contextualization.
● Big Data Analytics services, which allow operators to process huge datasets and can be integrated in the programming models. Analytics services are characterized in the QoS cloud infrastructure models of the underlying layer, which automatically (or explicitly driven by the analytics services) adjusts resources to the expected workload, considering its specificities.
● Programming Models, which provide a higher-level programmatic framework (Python, Java, Spark) and are also characterized by the models of the infrastructure. The programming models ease the parallelisation of the applications developed on top of them.
● Privacy and Security framework, which provides the means to annotate data and processing and to ensure the proper protection of privacy and security. This document focuses on this block.
On top of those four blocks, applications are developed using the programming models and the data analytics extensions. Application developers are expected to use the programming models and may use other features of underlying layers, such as the user-level QoS metrics.
Figure 1.1: High-level view of the EUBra-BIGSEA Architecture.
Figure 1.1 shows the high-level view of the EUBra-BIGSEA architecture, depicting the interactions among the main blocks. Figure 1.2 shows the interactions among Work Packages, focussing on how they are related to WP6 and, consequently, to this document. As shown, information about the remaining work packages is of the utmost importance to correctly define the scope of WP6 and to elicit the requirements. More details are provided in the next sections.
Figure 1.2: Relations between WP6 and the other technical WPs.
1.1 Target Audience
This document is mainly intended for internal use, although it is publicly released. Internally, WP6 members will use the document as a guide for the work to be performed. The document also sets the objectives that the team must achieve throughout the project. Researchers from the remaining WPs will be able to learn what is being developed in terms of security, and thus what they can count on.
1.2 Procedure for the Development of the Deliverable
The security concerns of a large and complex system should not be addressed individually or in an ad-hoc manner, as this usually results in insufficient solutions. A common example is that adding 'security features' to a system (e.g. encryption, IDSs) does not make it secure, as a few vulnerabilities in the software are enough to expose the system to attackers. The reverse is also true: correctly implemented software cannot guarantee the security of the system if the infrastructure is incorrectly configured. This is particularly challenging for very complex systems such as the one to be developed in the EUBra-BIGSEA project. Thus, it is of the utmost importance to define a strategy, in coordination with the other technical work packages, that allows achieving the required levels of security. Such a strategy, defined in terms of the requirements to be met, will guide the research, development and integration of the security solutions throughout the project.
In this context, the main objective of this document is to define the requirements of a global security solution able to deal with the security concerns of the project, which are the following:
● The provisioning of Authentication, Authorization and Accounting (AAA). Due to the criticality of the data stored, managed, and analysed, AAA services/mechanisms have to be provided to the data scientists that use the framework. These services are transversal to all the components, but the most critical aspect is the link between the external services/users and the Application Development Services as, in practice, this represents the (external) access point to the platform.
● The assurance of the security properties of the cloud and Big Data services. This concern is transversal to all the services, and thus it is necessary to implement security assessment methodologies that provide a degree of trustworthiness on the security of these components. A key aspect is to understand the priorities of the infrastructure and the trade-offs between security and QoS, or even between different security properties.
● The protection of the data privacy. This is a major challenge, as the database engines mostly used for storing Big Data (e.g. NoSQL databases) provide limited security mechanisms. Therefore, mechanisms that support the definition and implementation of privacy policies, including the boundaries of the access to protected data and the boundaries within which the data can be moved, should be provided considering the requirements of different types of data.
Considering these concerns, we defined a process based on the set of steps depicted in Figure 1.3 and described next:
1. Define the security scope and concerns of the EUBra-BIGSEA infrastructure to systematize the objectives of the project from the security perspective. This must be performed taking into account the concerns of the remaining technical work packages, and includes an analysis of the roles that interact with the infrastructure and of the relevant security threats.
2. Perform an analysis of the state of the art in the solutions in the scope of the work package and related to the concerns identified. Although this analysis need not be exhaustive, it must review the main options available.
3. Elicit the security requirements of the project, whose implementation will cover those concerns. This step receives inputs from the remaining technical work packages of the project, which have very specific requirements in terms of security. Furthermore, Deliverable D7.1 (End-User Requirements Elicitation) will help in the definition of the security requirements, as it allows understanding how the different components of the solution will be used together.
4. Identify the gaps, i.e. which requirements can be implemented by using/adapting existing state of the art solutions, which ones require further research, and the relations among them.
5. Prioritize the requirements and the respective challenges, taking into account the identified gaps, thus establishing the ones that are more important to address (in terms of must have and nice to have requirements).
Figure 1.3: Requirements and strategy definition procedure.
1.3 Structure
The rest of this document is structured in four main sections, as follows.
Section 2 defines the project security scope and concerns. This is done first considering the general EUBra-BIGSEA infrastructure and then in the concrete context of WP3, WP4 and WP5, taking into account the respective concerns and requirements.
Section 3 reviews the state of the art relevant to the work package. This review is divided into three subsections, one for each of the security dimensions of the work package: AAA Provisioning, Security Assurances, and Data Privacy Solutions.
Section 4 elicits the security requirements defined in coordination with the remaining technical work packages. It also includes an analysis of what can be implemented by using/adapting existing state of the art solutions and what requires further research work. The requirements are also prioritized in terms of must have and nice to have.
Finally, Section 5 concludes the deliverable.
2 EUBRA-BIGSEA SCOPE AND CONCERNS
The first step in defining the security strategy is to clarify the system and the scope that the strategy should cover, including both the regular users of the system and the potential malicious users that it might be necessary to deal with. Figure 2.1 depicts a very high level view of the infrastructure to be developed during the project, the roles interacting with it and the main classes of attackers that might try to exploit the existing weaknesses. The figure also marks the scope of this document (and of WP6) with a dashed blue line.
[Figure 2.1 diagram; recoverable labels: App Owner, End Users, Data App, Data Analytics Application, Application Development Services, analytics privacy, raw data privacy, sync + etl, Data Source, WP6 Scope, WP6 Privacy, Legitimate Roles, Malicious Roles, Adversaries and Attack Venues, Data flow.]
Figure 2.1: Roles, adversaries and threats and their relation with the project.
2.1 Components
A simplified view of the EUBra-BIGSEA framework is provided in Figure 2.1 to better explain how each role and attacker interacts with it. Only the most relevant components for this analysis are represented in the figure. Some of the included components represent complex modules or sets of modules, but those details were abstracted to simplify the representation. The represented components are the following:
● lvl0DS - Level 0 External Data Sources. They may represent a relational or a non-relational database or even a data stream. They are connected to the framework through modules of synchronization and ETL (Extract, Transform and Load). Their security is out of the scope of the project, but the communication channels, synchronization and ETL used must be secure.
● IaaS Layer - Infrastructure Layer of the EUBra-BIGSEA, which includes:
○ VMI repos - Repository of virtual machine images. Discussed in detail in Section 2.4.1.
○ CMF - cloud management frameworks that support the VMs/containers running the data analytics applications (the project aims at supporting at least OpenStack (OStack) and OpenNebula (ONe)).
● PaaS Layer - Platform Layer of the EUBra-BIGSEA, which includes:
○ QoS - modules to be developed in the context of WP3 (Quality of Service Cloud Infrastructure). Discussed in detail in Section 2.4.1.
○ Data Analytics - modules to be developed in the context of WP4 (Integrated fast and Big Data cloud platform). Discussed in detail in Section 2.4.2.
○ Application Development Services - modules to be developed in the context of WP5 (Programming abstractions layer). Discussed in detail in Section 2.4.3.
○ App Container - application binary and associated dependencies, embedded in a container, to execute inside the infrastructure.
○ lvl1DS - Level 1 Data Sources are repositories for storing and efficiently handling pre-processed data that was acquired into the infrastructure, from one or more level 0 data sources, through synchronization and ETL processes.
○ lvl2DS - Level 2 Data Sources are models obtained through the processing of level 1 data sources with machine learning algorithms (e.g. Descriptive Models and Predictive Models).
○ Raw data privacy - privacy protection layer dedicated to the preliminary anonymization of the "raw" data acquired from level 0 data sources, according to the preferences of the Data Source Owner, which are expressed as Data Policies (usually, anonymization is done in the Transformation process of ETL). Discussed in Section 2.4.2.
○ Analytics privacy - privacy protection layer dedicated to assuring that the data sent outside the infrastructure is not sensitive. The protection is based on the data policies. Discussed in detail in Section 2.4.2.
● Data Policies - Anonymization and access control policies that specify, respectively, which data can be accessed, and who can access the data and under which conditions. These policies are defined by Data Source Owners and are enforced by the privacy layers: raw data privacy and analytics privacy. Discussed in detail in Section 2.4.2.
● Data Analytics Application - developed by the data scientist using the APIs and abstractions provided by WP5. It is deployed and runs inside the framework. WP7 will develop use cases that emulate this.
2.2 Roles (Non-malicious Users)
The figure presents a set of roles that represent people or entities that interact with the framework to use the available functionalities in a non-malicious way. The meaning of each role is the following:
● End-Users - This role represents the users that will take advantage of the functionalities of the applications developed.
● App Owner - This role represents the entities that manage the applications developed.
● Data App Developer - This role represents the scientists that develop applications to run inside EUBra-BIGSEA and that produce the results that are of interest for the end user.
● Data App Admin - This role represents the scientists that create and configure the infrastructure that will be used to execute the developed Apps. This role only applies to WP3 (Section 2.4.1).
● Data Scientist - This role represents the scientists that mine and analyse the data from data sources. This role only applies to WP4 (Section 2.4.2).
● BIGSEA Manager - Represents the entity that maintains the framework. During the project, it is represented by the consortium.
● Infrastructure Managers - Represents the owners and managers of the infrastructures (e.g. CMFs) that are being used inside the framework as processing and storage power.
● Data Source Owners - Role that represents the owner and the manager of the level 0 external data sources.
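To make the relation between Data Source Owners, Data Policies and the two privacy layers more concrete, the following is a minimal sketch, not the project's design. It assumes a policy can be reduced to a set of fields to suppress (one simple form of anonymization) and a set of roles allowed to read the data. The names `DataPolicy`, `suppressed_fields`, `allowed_roles`, `raw_data_privacy` and `analytics_privacy` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy attached to one level 0 data source by its owner."""
    source: str
    suppressed_fields: frozenset = frozenset()  # anonymization: fields never kept
    allowed_roles: frozenset = frozenset()      # access control: roles that may read


def raw_data_privacy(record: dict, policy: DataPolicy) -> dict:
    """Raw data privacy layer (assumed behaviour): drop the suppressed fields
    while the record goes through the Transform step of ETL."""
    return {k: v for k, v in record.items() if k not in policy.suppressed_fields}


def analytics_privacy(record: dict, role: str, policy: DataPolicy) -> dict:
    """Analytics privacy layer (assumed behaviour): refuse roles the policy does
    not list, then re-apply suppression before data leaves the infrastructure."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read {policy.source!r}")
    return raw_data_privacy(record, policy)
```

In this sketch the same policy object is consulted at both layers, which mirrors the statement that policies are defined once by the Data Source Owner and enforced by both raw data privacy and analytics privacy. Real policies would need richer conditions than a role name.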
2.3 Adversaries and Attack Venues
Several classes of attackers might also try to use the framework without authorization or with malicious purposes. These classes are described below:
● Unauthorized Users - users that do not have the proper authorization to access the end-user applications or the data analytics application.
● App Attacker - adversaries that try to use the application in a malicious way, affecting at least one of its security properties: availability, integrity and/or confidentiality.
● Service Attacker - adversaries that try to exploit the interface provided in a malicious way (e.g. through its inputs), submitting specially crafted requests or sets of requests to affect the security properties of the framework.
● Data Attacker - adversaries that try to access or affect the data sources or the respective repositories, during transport, processing or storage.
● Container Attackers - adversaries that deploy specially crafted applications that try to escape the respective containers and affect the security properties of the infrastructure.
● Infrastructure Attackers - adversaries that try to affect the security properties of the cloud management frameworks or of the virtualization and storage infrastructures.
● VMI Attackers - adversaries that deploy specially crafted VMIs in the repository, which will try to use the respective virtualization context to affect the security properties of the host or of the remaining guests.
2.4 Interaction with other technical Work Packages
Considering the complexity of the project and of the infrastructure to be developed, it is not possible to capture all the nuances of the project in one single representation. Therefore, the following sections analyse WP6 from the perspective of each of the other technical work packages, explaining and discussing how WP6 interacts with each one.
2.4.1 Scope of WP6 in the context of WP3
This section clarifies the scope of WP6 from the perspective of WP3, which focuses on the development of QoS Cloud Infrastructure services. The project deliverable D3.1 presents the architecture of the QoS Monitoring System and therefore is an important source of information regarding WP3. Figure 2.2 presents a high level view of the WP3 interaction diagram and software architecture. It also represents, among these components, the specific concerns that are to be handled by WP6. The represented roles match those of Figure 2.1 and the same colour-code was used. Corresponding components are also represented using the same names, although Figure 2.2 is more detailed.
Figure 2.2: Relation between WP3 and WP6, based on D3.1 diagrams.
All the applications and tasks submitted by the users should be subject to authentication, authorization and accounting, to make sure they only access resources that they are allowed to. The authentication of each user application needs to be extended to the modules that it uses, something that can be addressed by a solution that uses security tokens.
The software architecture devised to implement the WP3 solution includes several components that are exposed to security threats. Thus, WP6 should provide a priori assurances that these components meet the project security requirements, and should also provide metrics that depict how trustworthy these components are from a security point of view. The following modules are identified as a priority from a security assurances point of view, due to their exposure to threats: the application containers (that will host the applications from the Compose and Submission Service), Cloud Management Frameworks, Virtualization Infrastructures (used to host both processing and storage resources), and, from WP4 and WP5, the Application Development Services and Data Analytics Services (as discussed in the following sections).
Finally, as part of the continuous assurance of the security of the working system, it is necessary to use an intrusion detection system to protect the functioning of the infrastructure and respective software. The attacks detected, and the ones that are not detected but are effective, will have an impact on the trustworthiness of the system.
2.4.2 Scope of WP6 in the context of WP4
This section clarifies the scope of WP6 with respect to the particularities of WP4. The project deliverable D4.1 presents the design for the integrated big and fast data ecosystem, and therefore is an important source of information regarding WP4. Figure 2.3 presents a high level view of the WP4 components and represents, among these components, the specific concerns that should be handled by WP6. The represented roles match the previous figures, the same colour-code was used, and the corresponding components are represented with the same names.
[Figure 2.3 diagram; recoverable labels: Lvl0: External Data Sources, Synchronization/Download Modules, ETL Modules, Raw Data Privacy (1), Lvl1: Data Sources, e.g. Machine Learning, Analytics Privacy (2), Lvl2: Data Sources, Analytics Privacy (3), Application Development Services, AAA (authentication, authorization and accounting), Access Control, Data Policies, Data Flow, Data Source, End Users, WP6 Privacy, WP6 Assurances.]
Figure 2.3: Relation between WP4 and WP6.
WP4 will provide a low level API focused on data analytics. Given the focus of the work package on data, it is understandable that most of the concerns are related to privacy and the control of access to the data. However, the users of the API defined by WP4 will need authentication, authorization and accounting. The APIs to be provided to the users will also require security assessment, to make sure that the users cannot subvert them to do more than they should be allowed to do. This will be discussed in more detail together with WP5. Additionally, it is also necessary to assure the security of the data transportation in the Synchronization/Download Modules. This layer provides the functionalities to download and synchronize data stored in the system with the external data sources, for instance to manage streams of data and to (automatically) synchronize the storage layer.
The privacy concerns are located at three main levels, which represent the three main boundaries that the data must cross, located respectively:
(1) between the ETL layer and the remaining infrastructure. This layer is concerned with the anonymization of the raw data acquired into the infrastructure, and depends on the data policies defined by the owner of the data source, who entrusts the data to the infrastructure according to certain conditions;
(2) between Lvl1DS and Lvl2DS, concerning the anonymization of data that may be disclosed by the algorithms used to build predictive and descriptive models based on the data; and
(3) between the infrastructure and the API/Users, which concerns the control of privacy of all the data that leaves the infrastructure and the control of the access to all the data handled by the external entities.
It is important to understand that the described privacy layers do not act equally for every data source and scenario.
In fact, it is expected that the data source owners are able to express in policies when and how should these privacy layers protect the data. Considering the types of data that are to be handled in the EUBra‐BIGSEA infrastructure and the goals of the data analytics applications to be used, it is foreseeable that there will be three very distinct scenarios regarding data privacy concerns, as depicted in Figure 3.4: 14
www.eubra‐bigsea.eu | contact@eubra‐bigsea.eu |@bigsea_p3peubr
● Statistical Disclosure (a): a big data analytics algorithm that is able to extract sensitive information about one or more specific individuals by combining information that was already anonymized (e.g., identifying a person based on the bus trajectories that the person performs every day).
● Aggregation (b): a big data analytics algorithm that produces less sensitive data from personally identifiable information (e.g., the weight of a person may be private information, but the average weight of a large population is not sensitive from a privacy perspective).
● Equivalent privacy (c): algorithms that produce data that is directly related to the raw or anonymized data used, so that the output carries an equivalent privacy level.
Figure 2.4: Possible relations between the privacy of the algorithms' input and output, to be expressed by developers.
Although the privacy protection layers cannot be expected to identify the characteristics of the algorithms automatically, it is reasonable to assume that the developer of the algorithm will provide this information. This is discussed in the next section, together with the concerns of WP5.
2.4.3 Scope of WP6 in the context of WP5
This section discusses the scope of WP6 in relation to WP5. Figure 2.5 presents a high-level view of the WP5 components together with the specific concerns that should be handled by WP6. The represented roles match the previous figures, the same colour code is used, and corresponding components are represented with the same names.
Figure 2.5: Relation between WP5 and WP6.
As the goal of WP5 is to propose programming abstractions that build on the layers below, the security concerns of WP5 are closely connected with those of these layers. As this layer will provide one of the interfaces of the system, services for authentication, authorization and accounting will be needed. It is also important to assure the privacy and access control of the data for the operators that work with WP4, as discussed in the previous section.
One of the concerns more specific to WP5 is the security of the API provided for application development, which should not allow developers to perform tasks that interfere with other running applications. The API should also not allow malicious users to take advantage of its inputs to subvert the functionality of the applications or even of the complete framework. It is also important to have a set of guidelines for secure development (best practices), similar to the ones that exist for other high-level languages.
Finally, regarding the different scenarios of data privacy concerns of the algorithms (see Figure 2.4), the WP5 programming abstractions must include ways for the programmer to express the privacy characteristics of the developed algorithms.
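One possible way for a programmer to declare the privacy characteristics of an algorithm is sketched below. This is only an illustration under stated assumptions: the names `PrivacyRelation`, `privacy_relation` and `needs_output_review` are hypothetical and are not part of any WP5 API, and the review rule is an assumed policy chosen for the example.

```python
from enum import Enum
from typing import Callable, Iterable


class PrivacyRelation(Enum):
    """Hypothetical labels for the three input/output privacy scenarios."""
    STATISTICAL_DISCLOSURE = "a"  # output may be more sensitive than the input
    AGGREGATION = "b"             # output is less sensitive than the input
    EQUIVALENT = "c"              # output is as sensitive as the input


def privacy_relation(relation: PrivacyRelation) -> Callable:
    """Decorator (illustrative only) that records the relation declared by the developer."""
    def decorate(func: Callable) -> Callable:
        func.privacy_relation = relation
        return func
    return decorate


@privacy_relation(PrivacyRelation.AGGREGATION)
def average_weight(weights: Iterable[float]) -> float:
    """Example algorithm: an average over a population of individual weights."""
    values = list(weights)
    return sum(values) / len(values)


def needs_output_review(func: Callable) -> bool:
    """Assumed policy: only outputs declared as aggregation skip the privacy review.

    Undeclared algorithms are treated conservatively and are reviewed.
    """
    relation = getattr(func, "privacy_relation", None)
    return relation is not PrivacyRelation.AGGREGATION
```

With such a declaration, a privacy layer could decide how to treat the output of each algorithm without inspecting its code, relying on the information supplied by the developer.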
3 STATE OF THE ART
This section reviews the existing tools and methodologies that are relevant to the security scope and concerns of the EUBra-BIGSEA project. Some of these methodologies and tools may be used, or extended, in the project, as defined later in this document.
3.1 AAA Provisioning
AAA is a popular and generalized designation for support services providing authentication, authorization and accounting for access to resources (e.g. network, processing, storage, data, applications, documents). While there is a wide variety of AAA protocols and implementations, corresponding to a wide variety of usage scenarios and protected assets, they all share the same base concepts:
● AAA services are support services, since they focus on allowing providers to control (and possibly measure and charge) user access to protected assets (applications, services, resources, and content).
● Authentication typically refers to the first phase of the interaction with the AAA service, where the user provides some sort of identity credentials, associating himself with a previously defined identity. A wide variety of formats is possible for both identity and authentication credentials (login, password, IPN, IP or Ethernet network address, mobile phone SIM card, digital signatures, biometrics, secure tokens, etc.). While in some cases the management of identities is considered part of the core AAA system (especially in scenarios with few users), in general user databases (user data, credentials, usage data, etc.) are kept in autonomous Identity Management Systems. Depending on the usage scenario, these databases may be internal to the organization (e.g. a subscribers' database), federated with other organizations (e.g. for roaming purposes), or delegated (e.g. authenticating on a third-party internet service using a Facebook account).
● Authorization refers to the phase where the AAA service verifies whether the (already identified and authenticated) user is allowed to perform a specific action on a resource, based on predefined access policies. Depending on the desired granularity, authorization may happen once at the beginning of the session or several times during the session (for different actions).
● Federated and/or delegated authentication/authorization. As already mentioned, in some application scenarios users' data is spread across different domains in a federated manner, or entirely kept by external entities, instead of being kept internally by the entity providing the assets covered by AAA. In these cases, authorization and/or authentication protocols are required to explicitly support external entities in such a way that users' credentials are safely kept at their original location (e.g. Facebook authenticates a certain user but never reveals the user's credentials to the third-party service). Typically, only authentication is externalized (federated or delegated) and authorization decisions are performed locally, but other arrangements are possible.
● Accounting refers to the collection of usage data during service delivery. This data can be used for a wide variety of purposes (e.g. billing and charging, auditing, trend analysis and capacity planning). AAA services are not responsible for directly obtaining this usage data; instead, they focus on collecting and maintaining the data provided by the underlying service.
It should also be mentioned that accounting operations are frequently deemed less relevant than authentication and authorization, especially when the usage of the service is not charged. This led to the coining of alternative designations for tools that focus specifically on access control and do not actively address accounting functionality. The two most commonly used terms are AA (Authentication and Authorization) and IAM (Identity and Access Management).
Figure 3.1 illustrates the role of AAA in a typical service interaction. In the pre-service phase, the request from the user is processed in order to collect authentication information and to check and authorize (or not) access to information. The service delivery phase is responsible for collecting query details and delivering results. Accounting occurs after the service phase, to charge for the service used.
For a few classic scenarios (especially in the field of network access), AAA has already reached a remarkable state of maturity, with well-established protocols and tools that transformed AAA into a commodity support service. Nonetheless, there are still many outstanding challenges, especially in the field of authentication and authorization, as well as in the support of AAA in emerging specialized scenarios. This observation is supported by the analysis of key standardization activities, provided next.
Figure 3.1: AAA Service Interaction (Rensing, Karsten and Stiller, 2002).
3.1.1 Mainstream and Emerging AAA Standards and Protocols
Looking for instance at the Internet Engineering Task Force (IETF), we note that the AAA Working Group concluded its activities in 2007, with the specification of the DIAMETER base protocol for network access (IETF RFC 6733; Fajardo et al., 2012) and an associated set of specific application protocols. DIAMETER is an improvement over previous solutions such as RADIUS (Rigney et al., 2000) and Cisco's TACACS+ (Carrel and Grant, 1997). It has been widely adopted by the mobile telecommunications industry, due to its strong integration with the architecture of the 3GPP IP Multimedia Subsystem (IMS) and the LTE Evolved Packet System (EPS). The DIAMETER core features have remained mostly stable, despite regular update cycles. Meanwhile, RADIUS has kept a significant usage base in other application segments (e.g. wired and wireless local area networks, internet access, VPNs, etc.) and is still actively endorsed by the IETF. The current IETF efforts in the field of AAA focus mostly on:
● The recent Working Group on Authentication and Authorization for Constrained Environments (ace WG, 2016), which addresses the specific authentication and authorization challenges imposed by constrained environments such as the Internet of Things (IoT).
● The Web Authorization Working Group, which addresses the improvement of the Web Authorization protocol (OAuth 2.0) (Hardt, 2012). OAuth is a standard for user authorization (at third-party applications and resources), which is sometimes also used for authentication (or pseudo-authentication (Sakimura, 2016)). It is a widespread web authorization protocol, with support from players such as Google, Facebook, Twitter, Microsoft, LinkedIn and many others.
The OASIS Consortium (Organization for the Advancement of Structured Information Standards) is another industry-led consortium developing standards for areas such as security, IoT and content technologies. The most relevant outcomes of OASIS regarding AAA are:
● SAML (Security Assertion Markup Language), an XML-based protocol for exchanging authentication and user data (e.g. attributes, entitlements) between identity providers and service providers (OASIS, 2005). Figure 3.2 provides an example of a simple single sign-on (SSO) initiated by a Service Provider using redirect and POST bindings.
Figure 3.2: SAML-enabled SSO initiated by Service Provider (OASIS 2005).
● XACML (eXtensible Access Control Markup Language), an XML-based standard for interoperability across access control implementations (OASIS, 2013), allowing the exchange of access control rules and policies based on attributes. Each user is associated with attributes, and these attributes, together with the request and system context, are used to decide whether the user is granted access to a specific resource. There are several implementations of XACML, including for instance the well-known Shibboleth package. Figure 3.3 presents the main actors in the XACML domain and the typical data flows. The access requester sends a request to the Policy Enforcement Point (PEP), which forwards it to the context handler. The context handler in turn interacts with the Policy Decision Point (PDP) and the Policy Information Point (PIP) to collect the policies and attributes needed (from the Policy Administration Point (PAP), subjects or environment), handles the context, and returns the response to the PEP, which fulfils the associated obligations through an obligations service.
Considering the specific field of Cloud, OASIS has two Technical Committees (TCs) worth mentioning: the OASIS Identity in the Cloud TC, focused on the challenges imposed by identity management in the Cloud; and the OASIS Cloud Authorization TC, focused on devising enhanced models for managing authorizations and entitlements in Cloud contexts. Both TCs have already produced interesting sets of reference use cases for defining proper requirements (OASIS 2012b) (OASIS 2014).
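The attribute-based decision flow described for XACML can be illustrated with a minimal in-memory sketch. It is not an XACML implementation: real policies are XML documents and the standard defines several rule-combining algorithms. The sketch assumes a simplified "deny-overrides" combination, a PEP that treats any decision other than Permit as a refusal, and hypothetical class names, subjects and resource names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass
class Rule:
    effect: str                              # "Permit" or "Deny"
    condition: Callable[[Attributes], bool]  # predicate over the request attributes


@dataclass
class PolicyDecisionPoint:
    rules: List[Rule] = field(default_factory=list)

    def decide(self, attributes: Attributes) -> str:
        # Simplified deny-overrides: any matching Deny rule wins immediately.
        decision = "NotApplicable"
        for rule in self.rules:
            if rule.condition(attributes):
                if rule.effect == "Deny":
                    return "Deny"
                decision = "Permit"
        return decision


class PolicyInformationPoint:
    """Hypothetical attribute store that resolves subject attributes."""

    def __init__(self, subject_attributes: Dict[str, Attributes]) -> None:
        self._subject_attributes = subject_attributes

    def lookup(self, subject_id: str) -> Attributes:
        return dict(self._subject_attributes.get(subject_id, {}))


class PolicyEnforcementPoint:
    def __init__(self, pdp: PolicyDecisionPoint, pip: PolicyInformationPoint) -> None:
        self.pdp = pdp
        self.pip = pip

    def is_allowed(self, subject_id: str, resource: str, action: str) -> bool:
        # Build the request context from subject attributes plus request data.
        attributes = {**self.pip.lookup(subject_id), "resource": resource, "action": action}
        # Only an explicit Permit grants access.
        return self.pdp.decide(attributes) == "Permit"


pip = PolicyInformationPoint({"alice": {"role": "analyst"}, "bob": {"role": "guest"}})
pdp = PolicyDecisionPoint([
    Rule("Permit", lambda a: a.get("role") == "analyst" and a["action"] == "read"),
    Rule("Deny", lambda a: a["resource"].startswith("raw/")),
])
pep = PolicyEnforcementPoint(pdp, pip)
```

In this example an analyst may read anonymized data, nobody may access resources under the (hypothetical) `raw/` prefix, and requests that match no rule are refused by the PEP.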
Figure 3.3: XACML main actors and reference data flows (OASIS 2013).
OpenID Connect is an alternative for decentralized authentication. It is promoted by the OpenID Foundation, and the latest releases (OpenID, 2015) are essentially an identity layer built on top of the OAuth 2.0 framework. Comprehensive reviews of OpenID, including a comparison with SAML, can be found for instance in (Chadwick and Shaw, 2008) and (Park, 2012).
The most widespread protocols for decentralized AA are probably OpenID Connect and SAML, for authentication, and OAuth and XACML, for authorization. They attract a wide base of supporters, which keeps growing. Complementing them, other organizations are also addressing specific AAA fields and/or application areas.
The GSMA (GSM Association), for instance, is implementing Mobile Connect (GSMA 2016), an initiative focused on building a standard framework for user authentication and identity services between mobile network operators and service providers. This framework supports website and application authentication through the use of the user's mobile phone number. While technically different from the authentication services provided by players such as Google and Facebook, Mobile Connect is also built on top of OpenID and targets the same market. Mobile operators endorse Mobile Connect as a way of taking advantage of their large customer databases to compete with Google, Facebook and other internet players in this field.
The FIDO Alliance is an industry-led consortium focused on developing industry specifications for strong authentication mechanisms, including smart cards, biometrics and trusted platform modules (Fido 2016). FIDO has meanwhile joined efforts with the World Wide Web Consortium (W3C) in order to bring FIDO-enabled authentication to web applications (W3C 2016).
GlobalPlatform is leading the international industry standardization efforts for trusted end-to-end secure deployment and management solutions on top of secure chip technology, encompassing for instance payment cards, electronic passports and ID cards, mobile phones and tablets. While AAA is not in the core scope of GlobalPlatform, secure identity management and authorization mechanisms are a relevant topic of its standardization efforts (GlobalPlatform 2016).
3.1.2 Federated Identity Services for Academic and Scientific Communities
In some application scenarios users' data is spread across different domains in a federated manner, or entirely kept by external entities, instead of being kept internally by the entity providing the assets covered by AAA. Considering the specific scope of the EUBra-BIGSEA project, it is worthwhile to identify the most relevant federated AAA services for academic and scientific communities.
The best-known example of a federated AAA framework is probably eduroam (Milinović et al., 2008), a widespread roaming access service developed for the international research and education community, which provides academics with secure network access both in their own institution and when visiting other institutions. eduroam is based on a hierarchical system of RADIUS servers. Initially restricted to a few European countries, it became global and currently covers academic communities from 77 territories.
While eduroam is specialized in network access, eduGAIN (EduGAIN 2016) aims at federating more general Authentication and Authorization Infrastructures for the same academic communities. Implemented using Shibboleth and SAML, it connects identity providers (academic institutions) with service providers (often the same set of academic institutions, but also others). Although still less known and less used than eduroam, eduGAIN already covers a large number of countries, including for instance most European Community countries, North America, and Brazil (through the CAFe service, described next).
CAFe (Comunidade Acadêmica Federada, Federated Academic Community) (Cafe, 2016) is an identity federation that brings together educational and research institutions in Brazil. Through CAFe, a user keeps all his information at the home institution and can access services offered by the institutions that participate in the federation, without the need to create a username and password for each service.
The CAFe infrastructure is composed of identity providers (IdPs), service providers (SPs) and the WAYF (Where Are You From) federation discovery service. An institution can act as an identity provider and/or as a service provider. Developed by the Brazilian RNP (Rede Nacional de Ensino e Pesquisa, the National Research and Education Network), CAFe uses SAML, universally adopted by academic identity federations, and Shibboleth, for single sign-on (SSO). CAFe has been available since 2009 and currently has more than 80 associated institutions, including universities (e.g., University of Campinas - UNICAMP, Federal University of Minas Gerais - UFMG, University of São Paulo - USP), hospitals (e.g., Hospital A.C. Camargo) and museums (e.g., Museum of Astronomy and Related Sciences - MAST). Furthermore, the Brazilian eduroam infrastructure, launched in 2012, is integrated with CAFe. In December 2012, CAFe became part of the already mentioned eduGAIN infrastructure.
More recently, there has been some preparatory work towards a more ambitious goal: a common European Authentication and Authorization platform covering scientific data and scientific resources in a wider manner (Florio et al. 2012). This work, driven by TERENA, aims at designing an integrated AA platform capable of encompassing the various existing infrastructures (eduGAIN, eduroam, numerous sectorial scientific data networks, GRID resources, and institutions such as ESA and CERN). The study, mandated by the European Commission, is still more focused on identifying all the resources that should be encompassed by this platform than on defining in detail the technical solutions for building it. Nonetheless, SAML 2.0 is already identified as the prime candidate standard for the exchange of AA data, and eduGAIN as the prime base for the inter-federation infrastructure. The study also discusses the specific impact of cloud infrastructures on such a framework.
3.1.3 Identity and Access Management Services in Cloud Infrastructures
In addition to other security concerns, AAA services (often designated as User Identity and Access Management (IAM) systems) are also deeply impacted by the growing adoption of cloud infrastructures.
AAA in cloud scenarios needs to be addressed at multiple levels, corresponding to the different cloud paradigms. When looking at Infrastructure‐as‐a‐Service (IaaS) or at Platform‐as‐a‐Service (PaaS) from the perspective of the Cloud Service Provider, the role of AAA is essentially to control the access to cloud resources (virtual machines, networking resources, storage, database instances, etc.) by the direct users of these cloud resources, which we will generically designate as “application developers”. When looking at PaaS or Software‐as‐a‐Service (SaaS) from the perspective of the application developer, the role of AAA is to identify and authorize the applications’ end‐users. Next, we discuss each of these two distinct perspectives.
IAM Tools for cloud infrastructure management
Public cloud offerings such as Microsoft Azure and Amazon Web Services (AWS) offer specialized AA tools for managing access to cloud resources – for instance an application developer requesting a new Virtual Machine (VM) or a new VPN connection or requesting access to a set of already deployed VMs. Examples include AWS Identity and Access Management (IAM) and Microsoft Azure Active Directory (AD). The major advantage of these tools is the full integration with the native infrastructure, and the full support for multi‐tenancy, large organizations (e.g. the notion of groups and roles) and sophisticated, fine‐grained and automated security policies. They also promise reasonable levels of integration with third party systems, although this does not eliminate some level of vendor lock‐in.
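To give an idea of what a fine-grained policy looks like in practice, the sketch below uses the JSON structure of AWS IAM policy documents (`Version`, `Statement`, `Effect`, `Action`, `Resource`) together with a deliberately simplified evaluator. The evaluator is an assumption made for illustration: it only captures the "default deny, explicit deny overrides allow" principle, and ignores conditions, principals and the other policy types that the real service evaluates. The resource identifier is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Example policy document: allow read-only queries and instance creation,
# but explicitly forbid terminating instances.
policy: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"},
        {"Effect": "Allow", "Action": "ec2:RunInstances", "Resource": "*"},
        {"Effect": "Deny", "Action": "ec2:TerminateInstances", "Resource": "*"},
    ],
}


def _as_list(value: Any) -> List[str]:
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: Dict[str, Any], action: str, resource: str) -> bool:
    """Simplified evaluation: default deny, and an explicit Deny always wins."""
    allowed = False
    for statement in policy["Statement"]:
        action_match = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
        resource_match = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
        if not (action_match and resource_match):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


# Hypothetical resource identifier used only for the example.
vm = "arn:aws:ec2:eu-west-1:123456789012:instance/i-0example"
```

Here `is_allowed(policy, "ec2:DescribeInstances", vm)` is granted by the wildcard statement, while terminating the instance is refused by the explicit Deny and any action not mentioned in the policy is refused by default.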
AWS IAM, for instance, supports integration via SAML 2.0 and lists a considerable number of compliant third-party AA tools (AWS 2016). On the other hand, the available Google Cloud IAM documentation provides little technical information about integration with third-party systems and seems to be focused on the Google ecosystem alone (Google 2016a), with more open approaches relegated to the end-user authentication services provided directly by application engines.
Microsoft Azure also offers specific identity and access management tools, based on Azure Active Directory (AD) solutions. Unlike AWS IAM, which remains focused mainly on infrastructure management, Azure AD is gradually unifying several Microsoft AAA and identity management solutions under a common umbrella, covering not only infrastructure access but also end-users of enterprise and consumer applications. Azure AD claims SAML 2.0 support for integration with third-party identity providers, but the available documentation is less clear on how this integration is actually achieved. The list of compliant tools is also extensive but less clear on the technical details (Microsoft 2016), although an interesting comparative analysis of user and access management in the Azure ecosystem is available (Kuppinger, 2014).
Open source cloud frameworks also provide AAA tools, but their feature set varies from case to case, in line with the maturity of the underlying framework (Habiba, 2014). OpenStack, for instance, provides OpenStack Identity, also known as Keystone (Keystone 2016), for managing access to cloud resources, with support for multi-tenancy, authentication, policy management and catalog services, including tenant and user registration, user authentication and token-based authorization. It also claims support for SAML 2.0 and OpenID Connect for federated scenarios, with some literature presenting specific examples of integration between OpenStack and mature SAML 2.0 federations such as eduGAIN (Héder et al., 2015). Figure 3.4 illustrates the Keystone identity service flow for creating a new instance (e.g. a new VM). Following OpenStack's general design, Keystone is an independent component that may (at least in theory) be replaced by alternative IAM implementations.
Figure 3.4: Keystone identity service flow (Keystone 2016).
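The first step of this token-based flow can be sketched as follows. The request body follows the password authentication method of the Keystone Identity API v3 (sent with `POST /v3/auth/tokens`), and the token is returned in the `X-Subject-Token` response header. The function names, user, password and project below are hypothetical, and no network call is made: the sketch only builds the payload and the header used in subsequent calls.

```python
import json
from typing import Any, Dict, Mapping


def build_token_request(user: str, password: str, project: str,
                        domain_id: str = "default") -> Dict[str, Any]:
    """Body of a project-scoped password authentication request."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # The scope asks for a token that is valid for one project (tenant).
            "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
        }
    }


def auth_headers(token_response_headers: Mapping[str, str]) -> Dict[str, str]:
    """Turn the Keystone response headers into headers for later service calls.

    Keystone returns the token in X-Subject-Token; other OpenStack services
    expect it in X-Auth-Token.
    """
    return {"X-Auth-Token": token_response_headers["X-Subject-Token"]}


# Hypothetical credentials, for illustration only.
body = build_token_request("app-developer", "secret", "bigsea")
payload = json.dumps(body)
```

The other services (e.g. the compute service asked to create the new VM) then validate the token with Keystone before executing the request.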
OpenNebula users and groups management services combine the organizational concepts of users, groups and virtual data centers (VDC) to support multiple deployment scenarios. OpenNebula users are classified as (i) Administrators (users that belong to an admin group), (ii) Regular users (who may access most OpenNebula functionality), (iii) Public users (who may access only public interfaces) and (iv) Service users (users with an OpenNebula service account). The permission mechanism is inspired by Unix: resource owners share resources by granting permissions to other users in their group and/or to other users in the system, and access levels include Use, Manage and Admin. OpenNebula is supposed to support SAML 2.0 for integration with external systems, but little documentation and few known integration cases are available (OpenNebula 2016).
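The Unix-inspired permission scheme can be modelled with a few lines of code. This is a simplified model written for illustration, not OpenNebula code: the `User`, `Resource` and `can` names are hypothetical, and the assumption is that exactly one class of rights (owner, group or other) applies to each request, as with Unix file permissions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class User:
    name: str
    group: str


@dataclass
class Resource:
    owner: str
    group: str
    # Rights granted per class of user, e.g. {"owner": {"use", "manage"}, ...}
    permissions: Dict[str, Set[str]]


def can(user: User, resource: Resource, right: str) -> bool:
    """Check one of the access levels ("use", "manage", "admin") for a user."""
    if user.name == resource.owner:
        scope = "owner"
    elif user.group == resource.group:
        scope = "group"
    else:
        scope = "other"
    return right in resource.permissions.get(scope, set())


# A virtual machine that its owner can use and manage, and that other
# members of the owner's group can only use.
vm = Resource(
    owner="alice",
    group="analytics",
    permissions={"owner": {"use", "manage"}, "group": {"use"}, "other": set()},
)
alice = User("alice", "analytics")
bob = User("bob", "analytics")
carol = User("carol", "visitors")
```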
Eucalyptus IAM takes an approach that closely follows AWS IAM principles and interfaces. In practice Eucalyptus IAM provides a (reasonably large) subset of AWS IAM functionalities (HP 2016), in line with the close relation between Eucalyptus in general and AWS.
FIWARE Identity Management (FIWARE 2015) is a FIWARE Generic Enabler (GE) for managing user identity and access. This GE encompasses both the access to cloud resources (e.g. for app developers) and the access of end-users to FIWARE-enabled consumer applications, as illustrated in Figure 3.5. AA operations include SAML 2.0 and OAuth 2.0 interfaces for federated identity management.
Figure 3.5: FIWARE Identity Management Generic Enabler - High Level Architecture (FIWARE 2015).
IAM Tools focused on end-users and enterprise/consumer applications
Complementing IAM tools for controlling the access to cloud infrastructure or platform resources, it is also important to look at IAM tools focused on managing the identities of end-users of cloud-based applications and controlling their access to those applications. The aim of these tools is to provide the application developer with a set of pre-built IAM services for integration into the application being developed.
As already hinted, some of the previously mentioned IAM tools also address this usage scenario, providing support for end-user management. The two most notable cases are the Azure AD suite and the FIWARE Identity Management Generic Enabler.
The Azure AD suite includes specific IAM functionalities for the business to consumer market (B2C), including easy integration with third party identity services such as Facebook, Google and LinkedIn, scalable deployments able to support a very large number of users, and a wide array of libraries and APIs for easy integration with the applications being developed. This suite also includes specific IAM functionalities for corporate applications, including easy integration and/or interfacing with legacy corporate directory services.
The FIWARE Identity Management GE services for enterprise/consumer applications are less sophisticated, providing only basic services/interfaces and requiring some additional effort from application developers for achieving truly sophisticated IAM services for enterprise and/or consumer applications. In line with FIWARE’s initiative in general, more advanced functionalities are expected to become gradually available.
The Google Identity Platform (Google 2016b) provides a set of libraries, SDKs and services for managing and authenticating end-users. It supports SAML, OAuth 2.0 and OpenID Connect, and also provides services for authentication using third-party accounts (e.g. Google, LinkedIn, Netflix).
The Cloud Foundry Platform‐as‐a‐Service framework includes the User Account and Authentication (UAA) Service for end user authentication (Cloud Foundry 2016). UAA includes native support for OAuth2, SAML, OpenID Connect and SCIM 2.0, centralized identity management and SSO.
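From the application developer's side, relying on such an OAuth 2.0 / OpenID Connect service typically starts by redirecting the end-user to the authorization endpoint of the identity service. The sketch below builds that redirect URL for the authorization code grant, using the standard request parameters of OAuth 2.0 (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`). The endpoint, client identifier and callback URL are hypothetical, and the function name is an assumption made for the example.

```python
import secrets
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scopes: List[str],
                            state: Optional[str] = None) -> Tuple[str, str]:
    """Return the URL the end-user is redirected to, plus the anti-CSRF state value."""
    # The state must be unpredictable and checked again when the user returns.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",    # authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # "openid" requests an OpenID Connect ID token
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# Hypothetical identity service and application, for illustration only.
url, state = build_authorization_url(
    "https://idp.example.org/oauth/authorize",
    "bigsea-app",
    "https://app.example.org/callback",
    ["openid", "profile"],
    state="xyz",
)
params = parse_qs(urlparse(url).query)
```

After the user authenticates, the identity service redirects back to the callback URL with a short-lived code, which the application exchanges for tokens at the token endpoint; the application never handles the user's credentials.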
A possible alternative to such cloud-native services is the adoption of one of the many classic IAM tools available in the market (see footnote 1). Generally, these tools may be virtualized and deployed over cloud infrastructures, or at least integrated with and accessed from applications running on cloud infrastructures. However, it should be noted that there is a difference between running a classic IAM tool in a virtual machine (virtualization) and full cloudification of IDM services, with integrated support for scalability, elasticity and lifecycle management. Plain virtualization may suffice for less demanding applications, while more extensive cloudification may be required in other scenarios, in order to take full advantage of the cloud paradigm. The available documentation provides few technical details about the level of support for cloudification effectively provided by each of these tools.
3.2 Security Assurances
This section reviews the existing tools and methodologies that are relevant for the assessment of the security properties of the cloud and data services, with special focus on availability and confidentiality.
3.2.1 Security testing
Several security assessment approaches are based on tests. Black-box testing is based on the analysis of the program execution from an external point of view (Myers et al., 2011). In short, it consists of exercising the software and comparing the execution outcome with the expected result. Black-box testing can be applied at several levels, ranging from unit testing to integration testing and system testing. The testing approach can also be more formalized (based on models and well-defined test specifications) or less formalized (e.g. informal "smoke testing"). The test specification should define the coverage criteria (i.e. the criteria that guide the definition of the tests in terms of what is expected to be covered) and should be elaborated before development. The idea is that the test specification can help developers during the coding process (e.g. tests can be executed during development) and that, by designing tests a priori, it is possible to avoid biasing the tests with knowledge about the developed code.
Robustness testing is a specific form of black-box testing (Myers et al., 2011). The goal is to characterize the behaviour of a system in the presence of erroneous input conditions. Although it is not directly related to benchmarking (as there is no standard procedure meant to compare different systems/components concerning robustness), authors usually refer to robustness testing as robustness benchmarking. Thus, as proposed by (Mukherjee and Siewiorek, 1997), a robustness benchmark is essentially a suite of robustness tests or stimuli. A robustness benchmark stimulates the system in a way that triggers internal errors, and thereby exposes both programming and design errors in the error detection or recovery mechanisms (systems can be differentiated according to the number of errors uncovered).
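The idea of a robustness test suite can be illustrated with a very small harness. It is a sketch under stated assumptions rather than an established benchmark: `parse_age` is a hypothetical component under test, the list of stimuli is arbitrary, and an input is counted as a robustness failure when the component raises an exception other than the documented one.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


def parse_age(text: str) -> int:
    """Hypothetical component under test: parses an age supplied as text."""
    value = int(text)  # raises ValueError for non-numeric text
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value


def robustness_suite(target: Callable[[Any], Any], stimuli: Iterable[Any],
                     expected_errors: Tuple[Type[BaseException], ...] = (ValueError,)
                     ) -> Dict[str, List[Any]]:
    """Feed erroneous inputs to the target and classify its reaction to each one."""
    report: Dict[str, List[Any]] = {"handled": [], "accepted": [], "robustness_failure": []}
    for stimulus in stimuli:
        try:
            target(stimulus)
        except expected_errors:
            report["handled"].append(stimulus)             # documented, controlled rejection
        except Exception:
            report["robustness_failure"].append(stimulus)  # unexpected internal error
        else:
            report["accepted"].append(stimulus)            # invalid input silently accepted
    return report


# Erroneous inputs: empty, non-numeric, out of range, wrong type, wrong format.
INVALID_INPUTS = ["", "abc", "-1", "999", None, "12.5"]
report = robustness_suite(parse_age, INVALID_INPUTS)
```

In this example the `None` stimulus is reported as a robustness failure, because the component reacts with an undocumented `TypeError` instead of the expected `ValueError`; the remaining inputs are rejected in a controlled way.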
Penetration testing, a specialization of robustness testing, consists of the analysis of the program execution in the presence of malicious inputs, searching for potential vulnerabilities (Stuttard and Pinto, 2007). In this approach, the tester does not know the internals of the web application and uses fuzzing techniques over the HTTP requests (Stuttard and Pinto, 2007). The tester needs no knowledge of the implementation details and tests the inputs of the application from the user's point of view. Penetration testing tools (see footnote 2) provide an automatic way to search for vulnerabilities, avoiding the repetitive and tedious task of performing hundreds or even thousands of tests by hand for each vulnerability type.
Despite the use of automated tools, in many situations it is not possible to test all possible input streams, as that would take too much time. So, as soon as the software specifications are complete, test cases should be designed to have the highest coverage and representativeness possible. Automated security testing tools are generally referred to as security scanners (or vulnerability scanners). These scanners have a predefined set of test cases that are adapted to the system under test, saving the user from defining all the tests to be performed. In practice, the user only needs to configure the scanner and let it test the application. Once the test is completed, the scanner reports the vulnerabilities detected (if any). Most of these scanners are commercial tools, but there are also some free application scanners, often of limited use, since they lack most of the functionality of their commercial counterparts.
Footnote 1: http://solutionsreview.com/identity-management/identity-management-solutions-directory/
Footnote 2: E.g. http://tools.kali.org/, https://www.metasploit.com/, http://www.acunetix.com/
3.2.2 Static and Dynamic Analysis
Static analysis techniques examine the code of applications without executing it (Stuttard and Pinto, 2007). This can be done in one of two ways: manually, during inspections and reviews, or automatically, by using automated analysis tools.
Inspections, initially proposed by Michael Fagan in the mid-1970s (Fagan, 1976), are a technique that consists of the manual analysis of documents, including source code, searching for problems. It is a formal technique based on a well-defined set of steps that have to be carefully undertaken. The main advantage of inspections is that they allow problems to be uncovered in the early phases of development (where the cost of fixing them is lower). An inspection requires several experts, each with a well-defined role, namely: author (author of the document under inspection), moderator (in charge of coordinating the inspection process), reader (responsible for reading and presenting his interpretation of the document during the inspection meeting), note keeper (in charge of taking notes during the inspection meeting), and inspectors (all the members of the team, including the ones mentioned before). A code inspection is the process by which a programmer delivers the code to his peers and they systematically examine it, searching for programming mistakes that can introduce bugs. A security inspection is an inspection that is specifically targeted at finding security vulnerabilities. Inspections are the most effective way of making sure that a service has a minimum number of vulnerabilities (Curphey et al., 2002) and are a crucial procedure when developing software for critical systems. Nevertheless, they are usually very long and expensive, and require inspectors to have deep knowledge of security.
Code reviews are a simplified and less expensive version of code inspections that can be considered when analysing less critical systems (Freedman and Weinberg, 2000).
Reviews are also a manual approach, but they do not include the formal inspection meeting. The reviewers perform the code review individually and the moderator is in charge of filtering and merging the outcomes from the several experts. With respect to the roles and the remaining steps, reviews are very similar to inspections. Although reviews are also very effective, they are still quite expensive.
Code walkthroughs are an informal approach that consists of manually analysing the code by following the code paths as determined by predefined input conditions (Freedman and Weinberg, 2000). In practice, the developer, in conjunction with other experts, simulates the code execution, in a way similar to debugging. Although less formal, walkthroughs are also effective in detecting security issues, as long as the input conditions are adequately chosen. However, they still impose the cost of having more than one expert manually analysing the code.
The solution to reduce the cost of white-box analysis is to rely on automated tools, such as static code analysers. In fact, the use of automated code analysis tools is seen as an easy and fast way of finding bugs and vulnerabilities in web applications. Static code analysis tools inspect software code, either in source or binary form, in an attempt to identify common implementation-level bugs (Stuttard and Pinto, 2007). The analysis performed by existing tools varies depending on their sophistication, ranging from tools that consider only individual statements and declarations to others that consider dependencies between lines of code. Among other usages (e.g. model checking and data flow analysis), these tools provide an automatic way of highlighting possible coding errors. The main problem of this approach is that exhaustive source code analysis may be difficult and cannot find many security flaws, due to the complexity of the code and the lack of a dynamic (runtime) view. The following paragraphs briefly introduce some of the most used and well-known static code analysers, including both commercial and free tools.
Static analysis does not take into account the runtime view of the code, while the main limitation of black-box approaches is that vulnerability detection is restricted by the output of the application. Dynamic analysis combines both techniques to overcome their limitations and can be used for both vulnerability and attack detection. Dynamic program analysis consists of the analysis of the behaviour of the software while executing it (Stuttard and Pinto, 2007). The idea is that by analysing the internal behaviour of the code in the presence of realistic inputs it is possible to identify bugs and vulnerabilities. Obviously, the effectiveness of dynamic analysis depends strongly on the input values (similarly to testing), but it takes advantage of the observation of the source code (similarly to static analysis). To improve the effectiveness of dynamic program analysis, the program must be executed with sufficient test inputs. Code coverage analysers help to ensure an adequate coverage of the source code (Doliner, 2006; Atlassian, 2010).
3.2.3 Security Benchmarking
Comparing different alternatives in terms of security is a difficult problem faced by many developers and system administrators. Security benchmarking allows assessing and comparing the security of systems and/or components, supporting informed decisions while designing, developing, and deploying complex software systems and tools.
Several security evaluation methods have been proposed in the past (Commission of the European Communities 1993; Infrastructure and Profile 2002; Qiu et al. 1985; Sandia National Laboratories 2012). The Orange Book (Qiu et al. 1985) and the Common Criteria for Information Technology Security Evaluation (Infrastructure and Profile 2002) define a set of generic rules that allow developers to specify the security attributes of their products and evaluators to verify whether products actually meet their claims. Another example is the red team strategy (Sandia National Laboratories 2012), in which a group of experts tries to hack the organization's own computer systems to evaluate their security. The work presented in (Maxion and Tan, 2000) addresses the problem of determining, in a thorough and consistent way, the reliability and accuracy of anomaly detectors. This work addresses some key aspects that must be taken into consideration when benchmarking the performance of anomaly detection in the cyber-domain.
The set of security configuration benchmarks created by the Center for Internet Security (CIS) is a very interesting initiative (CIS 2012). CIS is a non-profit organization formed by several well-known academic, commercial, and governmental entities that has created a series of security configuration documents for several commercial and open source systems. These documents focus on the practical aspects of the configuration of these systems and state the concrete values each configuration option should have in order to enhance the overall security of real installations.
Although CIS refers to these documents as benchmarks, they mainly reflect best practices and are not explicitly designed for system assessment or comparison.
Vieira & Madeira proposed a practical way to characterize the security mechanisms in database systems (Vieira and Madeira, 2005). In this approach, database management systems (DBMS) are classified according to a set of security classes ranging from Class 0 to Class 5 (from the worst to the best). Systems are assigned to a given class according to the security requirements they satisfy. In (Neto and Vieira, 2008) the authors analyse the security best practices behind the many configuration options available in several well-known DBMS. These security best practices are then generalized and used to define a set of configuration tests that can be used to compare different database installations. A benchmark that allows database administrators to assess and compare database configurations is presented in (Neto and Vieira, 2009). The benchmark provides a trust-based security metric, named minimum untrustworthiness, that expresses the minimum level of distrust the DBA should have in a given configuration regarding its ability to prevent attacks. The use of trust-based metrics as an alternative to security measurement is discussed in (Neto and Vieira, 2010). Araújo & Vieira also proposed a trustworthiness benchmark based on the systematic collection of evidence of the use (or lack of use) of secure coding practices, collected using static analysis techniques, which can be used to select one among several web applications from a security point of view.
A similar approach was followed in the assessment of security configurations for mobile devices (Vecchiato et al., 2016). As proposed in (Neto and Vieira, 2010), security assessment considers two perspectives: 1) the active search for vulnerabilities and security problems, and 2) the characterization of the proneness for other hidden or unidentified problems to exist. The methodology compares the security of user-defined configurations with the recommendations of CIS benchmarks. Then, a risk analysis approach is used to estimate the impact and probability of known vulnerabilities and/or attacks harming the device or its owner. This information is used to benchmark the security of mobile devices for personal and corporate use. Finally, the authors developed a security certification process that uses the risk analysis approach and the security benchmarking methodology to certify whether a device in a contextualized environment is secure enough to access the desired information or system.
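The two steps described above for mobile devices (comparing a configuration against recommended values, then weighting the deviations by risk) can be illustrated with a minimal sketch. It is not the method of Vecchiato et al.: the setting names, the probability and impact values, and the use of probability times impact as the risk of a deviation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    setting: str        # hypothetical configuration option name
    expected: object    # value recommended by the (hypothetical) benchmark
    probability: float  # assumed likelihood of exploitation if violated (0..1)
    impact: float       # assumed impact if exploited (0..10)


def assess(config, recommendations):
    """Return the violated recommendations and an aggregate risk score.

    Assumption: the risk of each deviation is probability * impact,
    and the score of a configuration is the sum over all deviations.
    """
    violations = [r for r in recommendations
                  if config.get(r.setting) != r.expected]
    risk = sum(r.probability * r.impact for r in violations)
    return violations, risk


# Hypothetical recommendations, not taken from any real CIS document.
RECOMMENDATIONS = [
    Recommendation("screen_lock_enabled", True, 0.5, 8.0),
    Recommendation("unknown_sources_allowed", False, 0.25, 6.0),
    Recommendation("storage_encrypted", True, 0.125, 10.0),
]

device_a = {"screen_lock_enabled": True,
            "unknown_sources_allowed": False,
            "storage_encrypted": True}
device_b = {"screen_lock_enabled": False,
            "unknown_sources_allowed": False,
            "storage_encrypted": False}

violations_a, risk_a = assess(device_a, RECOMMENDATIONS)
violations_b, risk_b = assess(device_b, RECOMMENDATIONS)
print(risk_a, risk_b)
```

Under these assumptions the device with the lower score deviates less (or less dangerously) from the recommendations, which gives a simple basis for ranking alternative configurations.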
Security benchmarking, and security assessment in general, is an open research problem. In fact, although there are several works in the literature, there is no "good enough" model for assessing and comparing the security of alternative systems and components. A key issue is that security is largely related to "unknown" vulnerabilities and attacks, and comparing systems based on well-defined attack loads may lead to conclusions that ultimately do not hold in the field (e.g. when a new vulnerability or attack type is discovered).
3.2.4 Vulnerability and Attack Injection
Fault injection has long been an effective approach to validate specific fault handling mechanisms and to assess the impact of faults in actual systems, allowing the estimation of fault-tolerant system measures such as fault coverage and error latency (Arlat et al., 1989). In the past decades, research on fault injection has mainly targeted the emulation of hardware faults, where a large number of works have shown that it is possible to emulate these faults in a quite realistic way (e.g. (Carreira et al., 1998; Rodríguez et al., 1999)). More recently, the interest in the injection of software faults has increased, giving rise to several works on the emulation of this type of fault (Durães and Madeira, 2006; Durães and Madeira, 2003). In practice, software fault injection deliberately introduces faults into the system in a way that emulates real software faults. A reference technique is G-SWFI (Generic Software Fault Injection Technique) (Durães and Madeira, 2006), which supports the injection of realistic software faults (i.e. the faults most likely to be present in a piece of software). The faults injected are described in a library derived from an extensive field study aimed at identifying the types of bugs that can reasonably be expected to occur frequently in a software system.
The use of fault injection techniques to assess security is a particular case of software fault injection, focused on the software faults that represent security vulnerabilities or may cause the system to fail to prevent a security problem. Security vulnerabilities are in fact a particular case of software faults, which require adapted injection approaches. In (Fonseca and Vieira, 2008) the vulnerabilities of six applications were analysed using field data based on a set of 655 security fixes. Results show that only a small subset of 12 generic software faults is responsible for all the security problems. In fact, the distribution of security-related fault types differs considerably from the distributions reported in studies of common software faults.
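The idea that a vulnerability can be emulated by injecting a software fault can be illustrated with a small sketch. It is not G-SWFI nor any of the cited tools: it uses Python's standard `ast` module to remove a call to a sanitization function (a fault of the "missing function call" kind, chosen here for illustration) from a toy target program. The target code, `sanitize`, `build_query` and the `RemoveCall` class are hypothetical names introduced for this example.

```python
import ast

# Toy target program (hypothetical): builds an SQL string from user input.
SOURCE = """
def sanitize(text):
    return text.replace("'", "''")

def build_query(user):
    return "SELECT * FROM users WHERE name = '" + sanitize(user) + "'"
"""


class RemoveCall(ast.NodeTransformer):
    """Replace each call `target(arg)` with `arg`, emulating a missing call."""

    def __init__(self, target):
        self.target = target
        self.injected = 0  # number of faults actually injected

    def visit_Call(self, node):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == self.target
                and len(node.args) == 1 and not node.keywords):
            self.injected += 1
            return node.args[0]
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<target>", "exec"), namespace)
    return namespace


original = load(ast.parse(SOURCE))

injector = RemoveCall("sanitize")
faulty_tree = ast.fix_missing_locations(injector.visit(ast.parse(SOURCE)))
faulty = load(faulty_tree)

# The same attack input is applied to both versions of the program.
attack = "x' OR '1'='1"
safe_query = original["build_query"](attack)
vulnerable_query = faulty["build_query"](attack)
print(safe_query)
print(vulnerable_query)
```

In the faulty version the quote characters of the input reach the query unescaped, so the injected fault behaves as an SQL injection vulnerability that an attack load can then exercise.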
Neves et al. proposed a tool (AJECT) focused on discovering vulnerabilities in network servers, specifically in IMAP servers (Neves et al., 2006). In their work, the fault space is the pair (attack, vulnerability), which creates an intrusion that may cause an error and, possibly, a failure of the target system. To attack the target system, they used predefined test classes of attacks and some form of fuzzing.
A procedure inspired by the fault injection technique (which has been used for decades in the dependability area) and targeting security vulnerabilities is proposed in (Fonseca et al., 2007). In this work, the "security vulnerability" plus the "attack" represent the space of the "faults" that can be injected in an application, and the "intrusion" is the "error". To accurately emulate real-world vulnerabilities, this work relies on the results obtained in a field study on real security vulnerabilities, which were used to develop a novel Vulnerability Injection tool. Conceptually, attack injection is based on the injection of realistic vulnerabilities that are automatically attacked; finally, the result of the attack is evaluated. As proposed in (Fonseca et al., 2009), a tool able to perform vulnerability and attack injection is a key instrument that can be used in several relevant scenarios, namely: building a realistic attack injector, training security teams, evaluating security teams, and estimating the total number of vulnerabilities still present in the code, among others.
The problem is that current knowledge on vulnerability and attack models is quite limited, and additional studies are required to better understand how, where and when such faults should be injected (in a way that assures high representativeness). Also, existing work is focused on very specific types of vulnerabilities. Extending such approaches to additional domains is a relevant research challenge.
3.2.5 Security Modelling, Risk Management and Security Controls
Different approaches are followed in the security modelling domain. Some works present an analysis based on the relative attack surface (Howard et al., 2005). In this work, the authors propose a metric to help determine whether one version of a system is more secure than another, with respect to a fixed set of dimensions. Instead of counting code bugs or vulnerability reports, this metric counts the system attack opportunities, i.e. the system "attackability". The attack surface of the system is described along three abstract dimensions: targets and enablers, channels and protocols, and access rights. The more exposed the system surface is, the more attack opportunities there are, and the more likely the system is to be a target of attack.
An alternative approach is the one presented in (Prasad et al., 2006), in this particular case applied to automotive software. This work proposes the modelling of four cross-functional attributes of software: Security, Privacy, Usability, and Reliability (SPUR). The reasoning behind modelling these four attributes at once is that, from a user's perspective, none of them makes sense individually without knowing the others. A similar approach is followed in (Hatzivasilis et al., 2016), where the authors focus on characterizing security, privacy and dependability (SPD). This methodology is based on the formal analysis of the attack surface and on other standards. In practice, the analysis is performed considering the relations between assets and threats, the controls used and their limitations. The technique produces a compound metric, which allows these properties of a system to be characterized in tandem. The SPD methodology was then demonstrated in the EU-funded nSHIELD project.
An example of a tool to model the security of a system was built based on the Mobius framework (Deavours et al., 2002).
Mobius supports multiple modelling formalisms, including ADversary VIew Security Evaluation (ADVISE) (Ford et al., 2013). ADVISE allows the creation and analysis of a specific adversary profile in an executable state-based security model of a system. Based on graphical primitives, ADVISE provides a high-level modelling formalism to easily create the adversary profile of the system under consideration. In practice, the formalism makes it possible to simulate how the adversary is likely to attack the system and how well the system is likely to defend itself against the attack. ADVISE models are built using five primitive types of objects: attack steps, accesses, knowledge items, skills, and goals.
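To make the five primitive types more concrete, the following toy sketch models attack steps whose preconditions are expressed over access, knowledge and skills, and computes which goals a given adversary can eventually reach. It is not the Mobius/ADVISE tool or its API, and it ignores aspects that a real adversary model would include (such as attack cost, detection and the adversary's preferences); all class, function and step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackStep:
    name: str
    needs_access: frozenset = frozenset()
    needs_knowledge: frozenset = frozenset()
    needs_skills: frozenset = frozenset()
    gives_access: frozenset = frozenset()
    gives_knowledge: frozenset = frozenset()
    achieves_goals: frozenset = frozenset()


def achievable_goals(access, knowledge, skills, goals, steps):
    """Apply enabled attack steps until no new step becomes possible."""
    access, knowledge, skills = set(access), set(knowledge), set(skills)
    achieved, remaining = set(), list(steps)
    progress = True
    while progress:
        progress = False
        for step in list(remaining):
            if (step.needs_access <= access
                    and step.needs_knowledge <= knowledge
                    and step.needs_skills <= skills):
                access |= step.gives_access
                knowledge |= step.gives_knowledge
                achieved |= step.achieves_goals & set(goals)
                remaining.remove(step)
                progress = True
    return achieved


# Hypothetical attack steps for a small web system.
STEPS = [
    AttackStep("dump_database",
               needs_access=frozenset({"web_portal"}),
               needs_skills=frozenset({"sql_injection"}),
               achieves_goals=frozenset({"steal_records"})),
    AttackStep("log_in",
               needs_access=frozenset({"internet"}),
               needs_knowledge=frozenset({"user_password"}),
               gives_access=frozenset({"web_portal"})),
    AttackStep("phish_user",
               needs_skills=frozenset({"social_engineering"}),
               gives_knowledge=frozenset({"user_password"})),
]

skilled = achievable_goals({"internet"}, set(),
                           {"social_engineering", "sql_injection"},
                           {"steal_records"}, STEPS)
unskilled = achievable_goals({"internet"}, set(),
                             {"social_engineering"},
                             {"steal_records"}, STEPS)
print(skilled, unskilled)
```

Changing the adversary profile (for example, removing a skill) changes the set of reachable goals, which is the kind of question such a state-based model is meant to answer.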
3.3 Data Privacy Solutions
This section reviews the existing tools and methodologies for data privacy protection that can cope with the threats faced by Big Data storage and analytics processes. This includes techniques for privacy policy modelling, which will allow data owners to define how the data can be accessed or used, and mechanisms to enforce these policies, especially regarding data anonymization.
Privacy is a very abstract concept whose values and extensions vary from person to person, and it has a broad and comprehensive context. In a broad sense, privacy is strongly connected with the idea that there are some things that other people should not see or know (Elgesem, 1996). More recently, in the electronic information context, Bertino et al. (2008) defined privacy as "the right of an entity to be secure from unauthorized disclosure of sensible information that is contained in an electronic repository". This is the definition we will adopt for this project.
Anonymity, from the adjective "anonymous", refers to the fact that a specific individual cannot be singled out when he or she is part of a set of subjects. In other words, the individual has no identification: her/his identity is kept hidden from third parties (Wright, 2004). A record or transaction can be considered anonymous when its data, individually or combined with other data, cannot be associated with a particular subject (Clarke, 1999). According to Wright (2004), when discussing anonymity, it is fundamental to distinguish between total anonymity and pseudonymity:
● Total anonymity: in this case, it is impossible to determine the origin of the communication. A classic example given by Wright (2005) is an unsigned letter without a return address.
● Pseudonymity: a real entity is hidden using an alias. In this case, it is possible to receive an answer without the possibility of linking the origin of the communication to a person, whose identity remains unknown (e.g., the usage of nicknames in a chat room).
Data Anonymization
Data Anonymization, also known as de-identification, consists of techniques that can be applied to prevent the recovery of individual information. For example, the result of a statistical query may be withheld when the number of records retrieved falls below some threshold. Also, deliberately introducing small inaccuracies or "noise" in the results of statistical queries makes the deduction of individual information difficult (Elmasri and Navathe, 2011). In a relational database, data are stored in tables and each record corresponds to one individual. Each record has a number of attributes, which are classified into three categories:
● Key attributes: attributes that uniquely identify individuals (e.g., ID, name, social security number);
● Quasi-identifiers: attributes that can be combined with external information to expose some individuals, or to reduce uncertainty about their identities (e.g., birth date, ZIP code, position, job, blood type);
● Sensitive attributes: attributes that contain sensitive information about individuals (e.g., salary, medical examinations, credit card transactions).
There are several anonymization techniques that can be applied to data before or during the mining process, in order to protect the privacy of individuals. Some of the most used techniques are generalization, suppression, encryption and perturbation/masking, which are addressed next. These methods may be used and/or combined with the goal of making the data anonymous:
a) Generalization: replaces (or recodes) quasi-identifier values with less specific but semantically consistent values. In this technique, a value is replaced by a more generic one that remains faithful to the original. Figure 3.6 shows an example of generalization (Fredj, 2015), where the columns Age and Education were replaced by generalized data, specified through ranges instead of the exact information.
Figure 3.6: Generalization Example.
b) Suppression: in this technique, the key identifier or the quasi-identifier is deleted to form the anonymized table. It is used in the context of statistical databases, which provide only summaries of the table data instead of individual data (Xu, 2014). Figure 3.7 shows an example of suppression, where the data from the columns Name, Age and Education were deleted.
Figure 3.7: Suppression Example.
c) Encryption: this technique uses cryptographic schemes based on public keys or symmetric keys to replace sensitive data (key attributes, quasi-identifiers and sensitive attributes) with encrypted data. It transforms data to make it unreadable to those who do not have the key (Hitachi, 2013). Figure 3.8 shows an example of the encryption technique, where all the columns were replaced by encrypted data.
Figure 3.8: Example of encryption technique.
d) Perturbation (Masking): this technique consists of replacing the actual data values with dummy data, usually to mask databases used for testing or training. The general idea is to randomly change the data to mask sensitive information while preserving the data that is critical for data modelling. Some of the masking techniques are:
○ Replacement: random replacement with similar content that has no relation to the real data;
○ Shuffling: random replacement similar to the previous item, with the difference that the replacement data is derived from the table column itself;
○ Blurring: this technique is applied to numerical data and dates. It changes each value by a random percentage of its real value;
○ Redaction/Nulling: this technique replaces sensitive data with null values (NULL). It is used when the data in the table are not required for testing or training.
Masked data is used to provide information that seems real but does not reveal information about anyone. This technique protects the privacy of personal data in the database, as well as other sensitive information that cannot be made available to the test team or to users in training. Figure 3.9 shows an example of masking, where the names were replaced by fake ones.
Figure 3.9: Example of perturbation/masking technique.
Data Anonymization Models
A common approach to prevent disclosure is called k-anonymity. This model requires that any combination of quasi-identifier values appears in at least k records of the anonymized table. k must be a positive integer value set by the owner of the data. A high value of k indicates that the anonymized table has a low risk of disclosure, because the probability of re-identifying a record is at most 1/k. Considering a table T(A1, …, An) with associated quasi-identifiers QIT, T is said to satisfy k-anonymity if, and only if, each sequence of values in T(QIT) appears at least k times in T(QIT) (Sweeney, 2002).
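The relation between generalization and k-anonymity can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, the attribute names (`age`, `zip`, `disease`) and the generalization rules (10-year age ranges, ZIP codes truncated to three digits) are hypothetical and are not taken from the figures above.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def k_of(records, quasi_identifiers):
    """Largest k for which a non-empty table is k-anonymous.

    This is the size of the smallest group of records sharing the
    same combination of quasi-identifier values.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


QI = ("age", "zip")

raw = [
    {"age": 23, "zip": "13083", "disease": "flu"},
    {"age": 27, "zip": "13084", "disease": "asthma"},
    {"age": 34, "zip": "13101", "disease": "flu"},
    {"age": 38, "zip": "13105", "disease": "diabetes"},
]

generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
    for r in raw
]

k_raw = k_of(raw, QI)
k_generalized = k_of(generalized, QI)
print(k_raw, k_generalized)
```

In this toy table every raw record is unique on its quasi-identifiers (k = 1), while after generalization each combination appears twice (k = 2), so the re-identification probability is bounded by 1/2.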
Another approach is the ℓ-diversity model, which requires that, for each combination of quasi-identifier attributes, there are at least ℓ "well-represented" values for each sensitive attribute. ℓ must be a positive integer value set by the owner of the data. Considering a q*-block as the set of tuples in T (table) whose nonsensitive attribute values generalize to q*, a q*-block is ℓ-diverse if it contains at least ℓ "well-represented" values for the sensitive attribute S. A table is ℓ-diverse if every q*-block is ℓ-diverse (Machanavajjhala, 2006).
The t-closeness model was proposed to correct some limitations of ℓ-diversity with regard to protection against attribute disclosure. The goal is to limit the risk of disclosure to an acceptable level. The t-closeness technique assumes that the adversary can infer information on sensitive attributes from the knowledge of the frequency of occurrence of these attributes in the table. An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness.
The β-likeness model ensures that an attacker's confidence in the value of a sensitive attribute does not increase, in relative terms, by more than a predefined threshold β after the attacker becomes aware of the published data. Assume that DB is a table with a sensitive attribute SA. Let V = {v1, v2, . . ., vm} be the SA domain, and P = (p1, p2, . . ., pm) be the overall SA distribution in DB. Suppose that Q = (q1, q2, . . ., qm) is the SA distribution in an equivalence class G, formed by tuples from DB. The information gain on any SA value vi ∈ V is D(pi, qi), where D is a distance function between pi and qi (Cao, 2012).
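A small sketch can show why k-anonymity alone may be insufficient and how the two previous models are evaluated. It rests on two simplifying assumptions that should be read as such: "well-represented" is interpreted as "distinct" (the simplest ℓ-diversity variant), and the distance between distributions is the variational distance (half the sum of absolute differences), rather than whatever distance a particular t-closeness implementation adopts. The table and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers, sensitive):
    """Group the sensitive values by combination of quasi-identifier values."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return classes


def distinct_l(records, quasi_identifiers, sensitive):
    """Largest l such that every class has at least l distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    return min(len(set(values)) for values in classes.values())


def distribution(values):
    """Relative frequency of each value in a non-empty list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def max_distance(records, quasi_identifiers, sensitive):
    """Smallest t for which the table has t-closeness (variational distance)."""
    overall = distribution([r[sensitive] for r in records])
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    worst = 0.0
    for values in classes.values():
        local = distribution(values)
        d = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, d)
    return worst


QI = ("age", "zip")

# Hypothetical 2-anonymous table: each QI combination appears twice.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "131**", "disease": "flu"},
    {"age": "30-39", "zip": "131**", "disease": "flu"},
]

l_value = distinct_l(table, QI, "disease")
t_value = max_distance(table, QI, "disease")
print(l_value, t_value)
```

Although the table is 2-anonymous, the second class contains a single disease value, so anyone known to be in that class is revealed to have the flu; the table is only 1-diverse, and it has t-closeness only for t of at least 0.25 under the assumed distance.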
Query Anonymization
There are two types of statistical databases: the pure statistical database and the ordinary database with statistical access. The pure statistical database only stores statistical data; an example is the census database of a country. In this case, users are authorized to execute queries over the entire database. The second type, the ordinary database, contains individual entries and supports a set of statistical users who are only permitted to execute statistical queries. Aggregate statistics based on the underlying raw data are generated in response to a user query. For the second database type, query control intends to provide users with the aggregate information without compromising the confidentiality of any individual entity represented in the database. According to Stallings and Brown (2007) "the security problem is one of inference. The database administrator must prevent, or at least detect, the statistical user who attempts to gain individual information through one or a series of statistical queries."
Statistical queries produce values calculated over a query set. They are restricted to obtaining only aggregate or statistical data from the database and cannot access individual records. In other words, the aggregate information obtained by a user as a result of successive queries should not allow him or her to infer information on specific individuals (Domingo-Ferrer, 2015). According to Stallings and Brown (2007) and Domingo-Ferrer (2015), there are three types of query control:
● Query restriction: rejects a query that can lead to a compromise; the answers provided are accurate.
● Perturbation: provides answers to all queries, but the answers are approximate.
● Camouflage: "hides" the confidential data within a larger data set and answers queries with respect to that set.
Privacy Policies
In cloud computing, cloud providers may have to share data or release the data to authorized requestors. Privacy policies aim to describe the organization's practices, including, most of the time, the collection, usage, storage and disclosure of personally identifiable information (PII) from their users and customers. The policies intend to protect the organization and to signal a commitment to integrity to site visitors. To guide browsing and transaction decisions, consumers adhere (or should adhere) to the stated policies of cloud providers. These policies are so important that they can influence the organization's credibility: if they are clearly and explicitly stated, then the visitor/consumer perceives the organization as more trustworthy (Han and Maclaurin, 2002).
Nowadays, privacy policies are usually written in natural language. They typically consist of a long text written in legal terms, and they are rarely fully understood, or even read, by the users. As a result, most of the users of cloud and service providers are not aware of the conditions under which their data are handled. There is a need to support the user in this process, providing a means to process privacy policies that is as automatic as possible.
Another need regarding privacy policies is to allow users to express their privacy preferences. Usually, cloud and service providers present the privacy policy and give the user only the options to agree or disagree with the whole policy, not allowing them to express their own privacy preferences (i.e., to make specific choices from the policy). This leads to the possibility of private data being handled for purposes different from the ones intended by their owners.
Still in the context of privacy policy definition and processing, it is known that multiple service providers coexist in clouds and collaborate to provide various services. These service providers might have different security approaches and privacy mechanisms. So, a standard to deal with the heterogeneity among their policies is necessary (Catteddu and Hogben, 2009; Blaze et al. 2009; Zhang and Joshi, 2009).
According to Wharton and Lin (2015), no clear, definitive and universally accepted standards have emerged with regard to cloud computing policies.
The Platform for Privacy Preferences (P3P) (Cranor et al., 2006), developed by the W3C (World Wide Web Consortium), provides a standard and machine-understandable privacy policy, which allows users to express their privacy preferences (a user can consent to the collection and processing of the given personal data by agreeing to parts of a P3P policy). However, the policies described through P3P lack semantic information and P3P only applies to websites, not supporting privacy protection in service composition. Therefore, P3P cannot be applied in cloud computing, since all entities in cloud computing are services and provide their functionality through service composition.
Similar to P3P, the Enterprise Privacy Authorization Language (EPAL) (Ashley et al., 2003) allows enterprises to formalize their privacy promises into policies. The aim is that service providers can substantiate that they have followed their privacy policy. Also, an EPAL policy is linked to the corresponding personal data so that access requests to these data can be decided based upon this policy. However, EPAL does not support interaction with the individuals/data owners (Sonehara et al., 2011). Lack of semantics and of support for service composition are also limitations of this solution.
The Organization for the Advancement of Structured Information Standards (OASIS) proposed the Extensible Access Control Markup Language (XACML) (OASIS, 2003), which makes it possible to create and enforce access control policies. XACML 2.0 (OASIS, 2013) extends XACML and introduces support for privacy policies. It implements two standard attributes (one indicates the purpose for which the data resource was collected and the other indicates the purpose for which access to the data resource is requested) and one standard rule, which stipulates that access shall be denied unless the purpose for which access is requested matches the purpose for which the data resource was collected (OASIS 2005). However, different users in cloud computing have different privacy requirements, which require different definitions of sensitive private information. XACML privacy policies only apply to the service provider, without considering the user's privacy requirements, and can hardly guarantee that a composite service satisfies those requirements (Ke et al., 2013).
From a software development perspective, privacy reference models can help stakeholders to understand the privacy domain. The Privacy Management Reference Model and Methodology (PMRM) (OASIS, 2012), which is also an OASIS specification, provides a conceptual model and a methodology for understanding and analysing privacy policies and their privacy management requirements.
Ponder (Damianou et al., 2001) is an object-oriented policy language for the management of distributed systems and networks. It supports expressions for permissions, obligations and delegation of rights. However, Ponder lacks explicit support for purposes, not allowing users to express their privacy preferences. Furthermore, it presents some drawbacks, such as lack of generality, category-specific syntax, and difficulty in updating policies dynamically (Suzic et al., 2015).
Karjoth et al. (2002) introduce the paradigm of sticky policies. In this paradigm, a user, when submitting data to an enterprise, consents to the applicable policy and to the selected opt-in and opt-out choices. Then the policy (with the respective user's preferences) is linked to the data and this set (data and policy) is sent within the cloud.
However, the work of Karjoth et al. does not describe how the strong associations between policies and confidential data are enforced, especially across enterprise boundaries.
Several works build on the sticky policy approach. Trabelsi et al. (2011) developed the PrimeLife Privacy Policy Language (PPL), which extends XACML to express users’ and service providers’ privacy policies. Song et al. (2012) proposed DPaaS (Data Protection as a Service), a trusted platform that uses a combination of encryption, application confinement and information‐flow controls to enforce application‐level sticky policies attached to the data. Mont et al. (2012) presented another sticky policy approach, with a Consent & Revocation module that dynamically checks access to an enterprise’s data by applications running inside or outside that enterprise. Chadwick and Fatema (2012) proposed a scalable authorization infrastructure, based on sticky policies and with conflict resolution capabilities, which a cloud provider can use to control the web service requests made by applications on the sensitive data. Li et al. (2015) propose a solution for modelling and enforcing sticky policies. It is composed of an implementation‐independent and fine‐grained meta‐model for security policy and a sticky policy framework, which supports the proposed meta‐model at the IaaS (Infrastructure as a Service) level.
All these approaches basically offer policy‐based control when an application wants to access sensitive data, but this control is not really embedded within the cloud infrastructure. The negotiation is only between users and services. Solutions are needed to trigger this control automatically before privacy‐sensitive data is accessed. Furthermore, service providers might have different privacy policies and approaches, as well as enforcement mechanisms, which reinforces the need for a universally accepted standard for specifying and managing privacy policies.
3.3.3
Policy Negotiation
One of the characteristics of cloud computing is service composition in SaaS (Software as a Service). Different services can have different privacy policies, and negotiation and matching of privacy policies is desirable for privacy‐preserving web service interactions. There are some works in this direction. Zhu and Zhou (2006) proposed a semantic web service privacy framework, which defined a privacy ontology and allowed the service provider to declare the privacy data required as service input. This framework also provided a privacy negotiation protocol, through which user and service provider can negotiate automatically. Similarly, Garcia et al. (2010) use policies defined in the Web Services Policy Framework (WS‐Policy) and an ontology defined in the Web Ontology Language (OWL) in order to support Web service interactions with suitable privacy preservation levels. The goal is to use the ontology to verify compatibility between privacy policy assertions, based on semantic comparison using OWL‐based operators. El‐Khatib et al. (2003) presented a negotiation protocol that addresses inconsistencies between user and service provider privacy policies, and also put forward a privacy system in which privacy policies based on P3P can be defined. Yan et al. (2009) set up a framework of parsimonious semantic trust negotiation, which can greatly reduce the amount of private identity information disclosed, without exchanging entire attribute certificates. Meziane and Benbernou (2010) propose a framework for privacy management in Web services, in which service customer and provider can agree before any process runs. Tbahriti et al. (2011) also propose a dynamic framework, called Meerkat. In this framework, clients and providers specify their privacy concerns/practices via privacy requirements and policies and, in case of incompatibility, a negotiation protocol reconciles the requirements.
These previous works have some limitations. Garcia et al. (2010) only verify compatibility between privacy policy assertions, without negotiation procedures. The research of El‐Khatib et al. (2003) addresses privacy policy negotiation based on P3P, which applies only to web sites and does not support service composition. Moreover, privacy information described with P3P has no semantics, so it can hardly be extracted and negotiated automatically. Furthermore, none of these works (Zhu and Zhou, 2006; Garcia et al., 2010; El‐Khatib et al., 2003; Meziane and Benbernou, 2010; Tbahriti et al., 2011) was conceived considering the specific characteristics of cloud computing.
Ke et al. (2013) present a privacy information description method and negotiation mechanism for privacy protection in cloud computing. First, the authors describe a Privacy Negotiation Language (PNL). Second, they establish a negotiation between user and service composer. Third, they obtain a privacy policy that satisfies both parties. Although the authors established this theoretical basis for cloud computing, the work was not performed specifically for this environment and needs to be implemented in it for evaluation. Tbahriti et al. (2014) proposed a dynamic privacy model for Web services. The model deals with privacy at the data and operation levels and provides a negotiation approach to tackle the incompatibilities between privacy policies and requirements. The authors’ goal is to extend DaaS (Data as a Service ‐ service‐oriented technologies to enable fast access to data resources on the Web) descriptions with privacy capabilities. Sadki and El Bakkali (2015) present an approach to solve the problem of conflicting privacy policies in mobile health‐Cloud environments through negotiation, trying to reach an agreement between two negotiators. The patients are classified into groups in terms of privacy preferences, which makes resolving conflicts among policies easier.
3.3.4
Privacy Policy Enforcement
The enforcement of a privacy policy is not a trivial task. It requires resources that can enforce the privacy policy statements while respecting the user’s preferences. Currently, in most cases, policies are defined in natural language text, which makes their enforcement difficult due to the semantics involved. Moreover, the most common way to enforce policies is to rely only on existing security‐related technologies such as access control, auditing and cryptography, among others. However, these technologies alone are not enough to protect privacy as a whole. We believe the key is to combine different ones to overcome their individual limitations. In the following, we describe some technologies that can be used for privacy policy enforcement. It is important to mention that this list is representative, but not exhaustive.
●
Activity Tracking Detection comprises tools that verify whether the system user’s activities are being tracked. An example is the work of Roesner et al. (2012), who developed a client‐side method for detecting and classifying five kinds of third‐party trackers based on how they manipulate browser state.
● Privacy Violation Monitoring/Detection comprises tools that verify whether the user’s privacy is violated at any time. An example is the work of Meziane et al. (2010), which proposed PAM ‐ Privacy Agreement Monitoring, a tool for dynamically controlling the private data usage flow. The tool helps to analyse and diagnose violations and provides reasoning services on them, for example explaining why the violations happen. Also, the work of Gao et al. (2010) proposes a collaborative method to identify websites that disclose users’ private information by using a privacy disclosure finding protocol inspired by secure multiparty computation (SMC). It identifies email addresses disclosed to third parties and used improperly for spam.
● Data leakage detection. Papadimitriou and Garcia‐Molina (2011) developed a model for assessing the “guilt” of agents in data leakage. This is done in the scenario where, after giving a set of objects to agents, the distributor discovers some of those same objects in an unauthorized place. Similarly, Kumar et al. (2014) propose a model to identify the culprit who has leaked critical organizational data in a cloud computing environment.
● Attack Detection comprises tools that verify whether the system is under attack. Examples of attack detection tools are the Sign‐WS tool (Antunes and Vieira, 2011), which is based on attack signatures and interface monitoring for detection of injection vulnerabilities, and J‐Attack (Fernandes et al., 2011), a tool to perform attacks looking for Cross Site Scripting, SQL Injection and Cross‐site Request Forgery vulnerabilities.
● User Pattern Identification is a process that analyses stored user behaviours and uses them as a security resource against fake users. Usually it consists of observing and collecting data over time periods and then applying analysis methodologies to identify different user patterns. As it gathers personally identifiable information, this process must be used only for security purposes.
● Auditing refers to auditing processes and mechanisms that web applications must use to monitor and identify possible sources of privacy violations. These resources should monitor all the system elements, such as databases, servers, applications, service calls, etc. This can be done automatically or with the support of auditors’ actions. The work of Biswas and Niemi (2011) is an example of automating part of the auditing process. It presents a solution to streamline the log generation process by deriving the auditing specifications directly from the policies to be audited. Given privacy policies as input, the output of the proposed tool is the corresponding auditing specifications that can be installed directly in the databases, to produce logs to audit the given policies.
● Identity Management is a set of processes and technologies to manage identities, simplify their handling and protect against unauthorized access. An example is Shibboleth (2016), whose emphasis is on the privacy of user attributes, based on privacy policies and the user’s personal preferences.
●
Access Control is also a process, with a set of rules by which users are authenticated and by which access to applications and other information services is granted or denied. Usually, access control policies are used. Many technologies can be used, such as P‐RBAC ‐ Privacy‐Aware Role‐Based Access Control (Ni et al., 2009) and XACML ‐ eXtensible Access Control Markup Language (OASIS, 2003), which allow access control policies to be created and enforced.
● Cryptography is the process used to encrypt information and prevent unauthorized access. An example is PGP ‐ Pretty Good Privacy (PGP, 2002), a public key encryption program based on the RSA ‐ Rivest‐Shamir‐Adleman algorithm.
● Anonymization is the process used to avoid disclosure of stored confidential information, even when it is retrieved by means of data analysis. k‐anonymity is a representative approach to support this process.
● Security Measures are configurations each user must perform in their own environment in order to protect their privacy, for example, settings in their own system or web pages to refuse services such as advertisements or cookies. They include web browser security settings, security package updates, use of antivirus software and firewalls, etc.
Betgé‐Brezetz et al. (2013) present an approach to end‐to‐end privacy policy enforcement over the cloud infrastructure, based on the sticky policy paradigm. Data protection is performed within the cloud nodes and is transparent to the applications. Hamlen et al. (2012) outline a general policy enforcement framework needed for policy‐compliant cloud data management. They discuss different policy types applicable to cloud data management and show how various techniques (such as in‐lined reference monitors ‐ IRM) could be used to enforce such policies in a cloud‐certified manner. Younis et al. (2014) propose an access control model for cloud computing. The goal is to fulfil access control requirements for diverse cloud‐based users who share resources among potentially untrusted tenants.
3.3.5
Privacy Threats
Statistical Disclosure Control (SDC), Statistical Disclosure Limitation (SDL), database anonymization or database sanitization deal with the protection of data so that it can be published without revealing confidential information that can be linked to a specific individual among those to whom the data correspond (Domingo‐Ferrer, 2015). The challenge for SDC is to provide the necessary and sufficient protection for the released information with the least possible loss of information. There are two types of disclosure that may occur in anonymised data: identity disclosure, which occurs when the identity of an individual can be reconstructed and associated with a record in a table; and attribute disclosure, which occurs when the value of an attribute can be associated with an individual. There are three types of privacy attacks, described below:
1. Attack of attribute connection: In this type of attack, sensitive attribute values are inferred from the published anonymised data. To prevent this attack, the overall strategy is to reduce the correlation between the sensitive attributes and the quasi‐identifier attributes;
2. Attack of register connection: In this type of attack, records with the same values for a given set of quasi‐identifier attributes form a group. If the values of the quasi‐identifier attributes are vulnerable and can be connected to a small number of records in the group, the adversary can identify that a particular record relates to an individual;
3. Attack of table connection: This type of attack happens when the adversary can infer the absence or presence of a victim’s record in the anonymised table. In the case of medical or financial records, the mere presence of the victim’s record in the table may already cause harm to the victim.
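The register connection attack relies on small groups of records sharing the same quasi‐identifier values. As a hedged illustration, the sketch below counts the size of each such group and checks the k‐anonymity condition (every group has at least k records). The function names and the sample records are assumptions made for this example only and do not come from the project.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = (tuple(record[q] for q in quasi_identifiers) for record in records)
    return Counter(keys)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every group of identical quasi-identifier values has at least k records."""
    return all(size >= k for size in group_sizes(records, quasi_identifiers).values())


# Illustrative, made-up records with generalised age and postal code.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "131**", "diagnosis": "flu"},
]

# The third record is alone in its group, so an adversary who knows the
# quasi-identifiers of that individual can single the record out.
print(is_k_anonymous(records, ["age", "zip"], k=2))
```

This check covers only the register connection case. A group can satisfy it and still expose a sensitive value when all of its records share that value, which is the attribute connection case.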
4 PROJECT SECURITY REQUIREMENTS
This section elicits the security requirements of the project. We present high level requirements and explain and discuss the concepts that are related to the needs identified. These requirements are also discussed considering the state of the art, identifying which parts of them are addressed, or may be adapted from other solutions. Based on this discussion, the requirements are prioritized, defining the direction that will be adopted. In practice, the requirements will guide the research and development work of WP6, thus defining the security strategy of the project. These requirements were defined in coordination with the remaining technical work packages of the project and are presented in three subsections, which are aligned with the remaining tasks of WP6. For each requirement, we present, besides the name, a set of definitions necessary to understand the requirement, and its priority (must have ‐ mandatory requirement; and nice to have ‐ optional). Furthermore, we explain the requirement, discussing how it is addressed in the state of the art. Finally, we list candidate solutions to meet the requirement, whether or not it is handled by the state of the art, and whether it requires action in the context of the EUBra‐BIGSEA project.
4.1 AAA Provisioning Requirements (T6.2)
The requirements regarding the AAA Provisioning are organized according to the two distinct AAA blocks, which have distinct functionality: one that provides AAA support to the overall EUBra‐BIGSEA infrastructure, and the other that provides AAA support to the end users deploying applications. Figure 4.1 illustrates both of these blocks, described as follows:
●
The EUBra‐BIGSEA Infrastructure AAA, which provides the AAA functionalities required for managing the EUBra‐BIGSEA framework (access to cloud resources), from both the Infrastructure and Platform perspectives (focusing on infrastructure managers and application developers/providers). The scope of this service is the whole EUBra‐BIGSEA framework, and it directly matches the nature of the first set of services analysed in Section 3.1.3 (“IAM Tools for cloud infrastructure management”). The first set of requirements presented next relates to this service (R6.2.1 to R6.2.6).
●
The EUBra‐BIGSEA Applications AAAaaS, which provides AAA‐as‐a‐Service for applications developed and hosted in the EUBra‐BIGSEA framework and in need of services for authenticating and authorizing their end users. The scope of an AAAaaS instance is limited to the application making use of it, and AAAaaS directly matches the nature of the second set of services discussed in Section 2.1.4 (“IAM Tools focused on end‐users and enterprise/consumer applications”). Requirements R6.2.7 to R6.2.10 apply to this service.
This option of defining two distinct AAA blocks will improve focus, since each of these blocks requires different features and different design and implementation approaches. This does not preclude the sharing of some modules (whenever appropriate), but provides increased flexibility. Furthermore, this will allow us to dynamically adapt to the evolving priorities of the project, investing more or less in each block according to the effective needs relayed by other work packages. In fact, this approach is not different from the one followed by ecosystems such as Azure AD (with a common umbrella designation but different tools for each of these two AAA services) or Google Cloud (IAM and Identity Platform).
Figure 4.1: Workflow for the usage of AAAaaS, in the context of EUBra‐BIGSEA infrastructure.
4.1.1
Requirements for the EUBra‐BIGSEA Infrastructure AAA
The EUBra‐BIGSEA Infrastructure AAA service corresponds to an extension of the planned infrastructure management tool (the IM ‐ Infrastructure Manager), complementing it with AAA functionalities focused on the access to Cloud resources. This AAA service will therefore run on top of each EUBra‐BIGSEA cloud deployment, which will be based on specific Cloud Management Frameworks (e.g. OpenStack). The AAA Service will have a southbound interface for actually implementing the AAA functionalities on each CMF and a westbound interface for interconnection with external identity providers. For some CMFs, specific adaptation plugins may be required as well.
R6.2.1: Provide a Cloud Management Framework‐Agnostic solution
Definitions
● CMF‐Agnostic: Independence from underlying cloud management frameworks.
Priority: Must have
Explanation: Taking into account the intention of keeping EUBra‐BIGSEA agnostic towards the underlying cloud management framework(s) to be used, the design of the EUBra‐BIGSEA Infrastructure AAA Service must also be agnostic towards any proprietary solutions employed by such platforms, even if those solutions inspire and integrate with the EUBra‐BIGSEA Infrastructure AAA Service. To do this, it will be necessary to define a CMF‐independent architecture for the EUBra‐BIGSEA Infrastructure AAA Service. This should follow a top‐down design, incorporating clear and standards‐based interfaces with underlying frameworks, based on the already identified AAA standards, and should follow cloud management standards such as the Open Cloud Computing Interface (OCCI).
Candidate Technologies / Solutions:
● Open standards and APIs for integration of underlying frameworks.
● Flexible architectures based on well‐defined interfaces.
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research
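One way to picture the CMF‐independent architecture called for by R6.2.1 is a common interface that every framework‐specific plugin implements, so that the AAA layer never calls a CMF directly. The sketch below is only an illustration under that assumption. The names `CMFPlugin`, `InMemoryPlugin` and `InfrastructureAAA` are hypothetical, and the in‐memory plugin stands in for a real adapter, which would call the framework’s own API.

```python
from abc import ABC, abstractmethod


class CMFPlugin(ABC):
    """Hypothetical southbound interface that each CMF adapter would implement."""

    @abstractmethod
    def create_user(self, tenant: str, username: str) -> None: ...

    @abstractmethod
    def is_authorized(self, username: str, action: str) -> bool: ...


class InMemoryPlugin(CMFPlugin):
    """Stand-in for a real adapter; keeps users and grants in local memory."""

    def __init__(self) -> None:
        self.users = {}
        self.grants = set()

    def create_user(self, tenant: str, username: str) -> None:
        self.users[username] = tenant

    def grant(self, username: str, action: str) -> None:
        self.grants.add((username, action))

    def is_authorized(self, username: str, action: str) -> bool:
        return (username, action) in self.grants


class InfrastructureAAA:
    """CMF-agnostic layer: it only depends on the CMFPlugin interface."""

    def __init__(self, plugins: dict) -> None:
        self.plugins = dict(plugins)

    def create_user(self, tenant: str, username: str) -> None:
        # Reflect the new user consistently on every registered framework.
        for plugin in self.plugins.values():
            plugin.create_user(tenant, username)

    def is_authorized(self, cmf: str, username: str, action: str) -> bool:
        return self.plugins[cmf].is_authorized(username, action)


cloud_a, cloud_b = InMemoryPlugin(), InMemoryPlugin()
aaa = InfrastructureAAA({"cloud-a": cloud_a, "cloud-b": cloud_b})
aaa.create_user("research", "alice")
cloud_a.grant("alice", "vm:create")
print(aaa.is_authorized("cloud-a", "alice", "vm:create"))
```

Under this design, supporting a new cloud framework means adding one class that implements the interface, without changing the AAA layer itself.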
R6.2.2: Support identity and access control management functionalities
Definitions
● Key identity management functionalities: include support for multi‐tenancy, user and group management, role‐based access control and definition of fine‐grained access policies.
Priority: Must have
Explanation: In line with the AAA functionalities currently provided by the most popular AAA frameworks for cloud resources, the EUBra‐BIGSEA Infrastructure AAA Service must provide support for multi‐tenancy scenarios, adequate user and group management functionalities, and fine‐grained access policies. A top‐down approach will be followed for the definition of the AAA functionalities to be supported, inspired by the best practices of existing approaches and the specific requirements which may be imposed by EUBra‐BIGSEA.
Candidate Technologies / Solutions:
● IAM functionalities inspired by or even directly adapted from frameworks such as OpenStack’s Keystone and AWS IAM.
Handled by State of the Art: YES
Action(s) Needed: Reuse and Adapt
R6.2.3: Support for/compliance with base CMF to be adopted by EUBra‐BIGSEA
Definitions
●
Compliance with CMFs: consistent integration with the underlying cloud management frameworks to be adopted by EUBra‐BIGSEA, likely including IM (Infrastructure Manager), OpenStack and others, either by usage of open and standardized APIs or by the usage/development of direct integration layers.
Priority: Must have
Explanation: According to the current vision of the project, the EUBra‐BIGSEA platform will be built on top of already existing cloud management frameworks, with a unified management layer to be provided by IM. As such, the EUBra‐BIGSEA infrastructure AAA service needs to provide consistent integration with those frameworks, since they will be ultimately responsible for managing and controlling the access to the infrastructure resources. This means a common AAA layer should be provided at the top level (IM), allowing the sophisticated functionalities mentioned in the previous requirement, with consistent reflection on the AAA services of each cloud framework to be adopted by EUBra‐BIGSEA (e.g. OpenStack’s Keystone), also including support for SSO. Ideally this integration should be accomplished using solely open standards such as OpenID Connect and SAML (for authentication), and OAuth and XACML (for authorization), but in practice some tailor‐made integration work may be required to create platform‐specific plugins, depending on the set of Cloud frameworks to be supported. These plugins should use a common interface on the top side, in order to make it possible to add support for new cloud frameworks in the long run. For achieving complete integration, and depending on the EUBra‐BIGSEA usage and business models, accounting might become a relevant feature. This specific aspect will be discussed in Requirement R6.2.5. The plan is to create a CMF‐agnostic AAA module, as an extension of Infrastructure Management, and to use the most relevant open standards for authentication and authorization as the basis for the southbound interface with each adopted CMF. The preliminary analysis conducted in the state‐of‐the‐art study shows that CMFs are increasingly adopting these standards, ensuring an adequate integration path.
Candidate Technologies / Solutions:
● OpenID Connect and SAML (open interfaces for authentication).
● OAuth and XACML (open interfaces for authorization).
● Tailor‐made integration plug‐ins, with a common northbound interface (plug‐in <‐> unified layer).
Handled by State of the Art: NO
Action(s) Needed: Reuse and Adapt
R6.2.4: Support for authentication using external identity providers and support for federation
Definitions
●
Identity Federation: means of linking an electronic identity and attributes stored across multiple distinct identity management systems (e.g., eduroam, EduGAIN and CAFe).
Priority: Nice to have
Explanation: While it is possible to use only internal user databases (handled manually or via directory services), it is desirable that the EUBra‐BIGSEA platform allows user authentication based on external services, in order to support the usage of federations such as eduroam, EduGAIN and CAFe. The idea is not necessarily to grant access to all users existing in those federations (although this might be possible) but to use the users’ credentials in those federations to avoid duplicating identities or credentials. From a technical point of view, commercial identity repositories such as Google or Facebook may also be adopted, though this is not a core objective at this point. In addition to the simple usage of those external authentication services, it might also be desirable to provide support for federated identity management, in order to have, for instance, two EUBra‐BIGSEA domains mutually providing user authentication. A side effect of this support is the indirect possibility of using alternative (and potentially safer) authentication methods, such as FIDO‐enabled web authentication and GSMA’s Mobile Connect services. It should be noted, however, that at the moment support for such authentication methods is not a priority. The plan is to add westbound services based on OpenID Connect and SAML to the already mentioned CMF‐agnostic AAA module (extension of Infrastructure Management). If this strategy does not enable eduroam support (either directly or through CAFe and EduGAIN), the addition of RADIUS functionalities will be studied. FIDO and GSMA support will be studied later, since they are likely to soon become supported indirectly through OpenID Connect and SAML.
Candidate Technologies / Solutions:
● OpenID Connect and SAML.
● RADIUS (in case of eduroam).
● FIDO‐enabled authentication, GSMA Mobile Connect (lower priority).
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research
R6.2.5: Consistent support for Accounting Data across underlying frameworks
Definitions
●
Accounting data: all the data provided by the monitoring infrastructures to allow the functioning of the accounting services.
Priority: Nice to Have
Explanation: Unlike authentication and authorization, for which there are well‐known and popular interoperability standards, cross‐framework integration of accounting (resource usage) data is more difficult to provide (there are fewer standards and open interfaces) and will require developing specialized modules for each framework. Since the current vision of EUBra‐BIGSEA envisages the usage of different cloud frameworks, each with its own accounting features, it will be necessary to integrate and unify accounting data if the EUBra‐BIGSEA deployment and business models truly need unified accounting functionality (the effective need for this integration is still under discussion). For achieving this integration, two possible alternatives are (i) the usage of classic network‐oriented AAA protocols such as RADIUS and Diameter, where accounting is reasonably well supported (despite the lesser role played by those protocols in cloud scenarios) and/or (ii) the integration of logging information collected from each framework (for providing at least a basic level of integrated accounting information). Nonetheless, extensive work will still be required for each adopted framework, in order to devise how to transform local accounting data into unified data. The goal is to study the specific accounting needs of the EUBra‐BIGSEA framework to assess whether they fit into existing accounting data formats or require new formats/extensions. Afterwards, the best integration solution will be studied, considering the current support for RADIUS and Diameter in cloud ecosystems, and the expected effort of developing a specific solution for collecting accounting data.
Candidate Technologies / Solutions:
● RADIUS, Diameter
● Collection and unification of logging data with relevance for accounting.
Handled by State of the Art: NO
Action(s) Needed: Adapt and Research
R6.2.6: Common Authentication to the Infrastructure
Definitions
● Authentication: An algorithm that can be used to verify if the user is who he claims to be.
● Some evidence that can be provided in order to help the process of validation.
Priority: Must have
Explanation: Password‐based methods oblige the user to choose strong passwords. However, the security of strong passwords is still far from that of the cryptographic keys used in certificate‐based solutions. Users tend to reuse their passwords across different applications, and malware can record keyboard input to recover the password. Clearly, passwords must be protected against eavesdroppers when transmitted over the network, and must be stored in the servers using an appropriate cryptographic primitive, for example a secure hash algorithm like SHA‐512, or a stronger encryption algorithm. On the other hand, certificate‐based methods require careful key management. Keys must be stored in the client and, if they are not properly stored, malware could recover the key, breaking the security of the system. The use of authentication tokens helps to improve the security of certificate‐based methods, since the key is stored in the token, offering more protection to the system. The plan is to adopt the most secure web authentication mechanisms for non‐critical authentication scenarios, including two‐factor authentication. For more sensitive scenarios it is possible to consider the adoption of additional authentication mechanisms, including certificate‐based tokens and FIDO‐ and GlobalPlatform‐based mechanisms.
Candidate Technologies / Solutions:
● Certificate‐based using token
● Cryptography token
● Hybrid solution using biometry
Handled by State of the Art: YES
Action(s) Needed: Reuse and Adapt
4.1.2
Requirements for the EUBra‐BIGSEA Applications AAAaaS
Figure 4.1 depicts the functionality of the EUBra‐BIGSEA Applications AAAaaS, which provides AAA‐as‐a‐Service for applications developed and hosted in EUBra‐BIGSEA. As we can observe, the developer who deploys an application is able to include a dedicated AAA instance, which should be configurable. The scope of such an AAA instance is limited to the application. Below, we present the requirements identified for implementing this part of the solution.
R6.2.7: Support for B2C IAM functionalities
Definitions
● B2C: business to consumer
Priority: Must have
Explanation: Support for business to consumer IAM functionalities, including user self‐registration (sign‐up), web based login (sign‐in), user self‐management and easy integration of the most popular web based authentication clients/methods. This will allow application developers to use AAAaaS to build consumer oriented applications. The plan is to adopt OpenID Connect and OAuth as base interfaces for integration with third‐party identity providers, since these are the prevailing protocols with increasing support from relevant providers, to develop and integrate adequate user self‐* tools, and to provide adequate APIs and SDKs for application developers.
Candidate Technologies / Solutions:
● SSO, OpenID Connect, OAuth
Handled by State of the Art: NO
Action(s) Needed: Research
R6.2.8: Support for external Identity Providers
Definitions
●
Support for open authentication: ability to support external identity providers such as Google and Facebook, as well as federations such as eduroam, EduGAIN and CAFe.
Priority: Must have
Explanation: This requirement is similar to Requirement R6.2.4, although in this particular case it is more focused on consumers, who are potentially more interested in providers such as Google and Facebook. Academic and research federations are also relevant, since for some applications the consumers might be part of those federations. The plan is to define and implement northbound services based on OpenID Connect and SAML. If this strategy does not enable eduroam support (either directly or through CAFe and EduGAIN), the addition of RADIUS functionalities will be studied. FIDO and GSMA support will be studied later, since they are likely to soon become supported indirectly through OpenID Connect and SAML.
Candidate Technologies / Solutions:
● OpenID Connect and SAML (in the case of external identity providers)
● OAuth and XACML (open interfaces for authorization)
● GSMA Mobile Connect (in the specific case of mobile phone-based authentication)
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research

R6.2.9: Support for High Availability, including Dynamic Elasticity
Definitions
● Elasticity: on-demand allocation/deallocation of resources during system operation.
Priority: Nice to Have
Explanation: Support for coping with consumer applications' requirements regarding high availability (e.g., server failover, session failover, redundant servers and load balancing), and support for elastic capacity management (e.g., scale-up, scale-down) in order to dynamically adapt the service to variable demand. This is important to cope with demanding consumer applications with workload peaks. It will be necessary to design adequate self-management tools for the AAAaaS components, including a service orchestrator capable of monitoring the AAAaaS workload and performance and of dynamically adding or removing resources, in order to achieve the desired levels of redundancy and elasticity. At this point it is still not clear whether this will be possible using a CMF-agnostic approach (based, for instance, on OCCI) or whether CMF-specific support will be required, due to specificities of lifecycle management tools and load-balancing resources.
Candidate Technologies / Solutions:
● Redundancy and load-balancing technologies.
● Cloud-based service lifecycle management mechanisms.
● Provision of self-adaptive AAA OpenStack/Keystone
Handled by State of the Art: NO
Action(s) Needed: Research

R6.2.10: Support for Application-level Accounting Mechanisms
Definitions
● Application-level: mechanisms to enable application-aware rating & charging or capacity planning.
Priority: Nice to Have
Explanation: While resource usage is the focus of accounting services when controlling access to cloud-based resources, when controlling access to consumer-oriented applications the focus is mainly on application-specific usage data. This requires support for application-aware accounting mechanisms (i.e., allowing the application to report accounting data to the AAA service, using a common and expandable reporting format). It will be necessary to develop and provide libraries and SDKs to application developers, so that their applications can easily provide accounting data to the AAAaaS module.
Candidate Technologies / Solutions:
● Extensions of current messaging formats for monitoring and accounting.
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research
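
As an illustration of what a "common and expandable reporting format" for R6.2.10 could look like, the following minimal sketch defines a record with a fixed set of common fields plus a free-form extension map. All names (AccountingRecord, parse_record, the field names and the example application) are hypothetical and are not defined by the project; JSON is assumed here only for simplicity.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict


@dataclass
class AccountingRecord:
    """Hypothetical application-level accounting record (illustrative only)."""
    application_id: str
    user_id: str
    metric: str          # what is being accounted, e.g. "queries_executed"
    value: float
    unit: str
    timestamp: str       # ISO 8601 string supplied by the reporting application
    # Expandable part: application-specific data that the AAA service stores as-is.
    attributes: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record so an application can report it to the AAA service."""
        return json.dumps(asdict(self), sort_keys=True)


def parse_record(payload: str) -> AccountingRecord:
    """Rebuild a record from its JSON form, as the receiving side would do."""
    return AccountingRecord(**json.loads(payload))
```

An application would build a record, call to_json() and send the result to the accounting endpoint; new application-specific fields go into attributes without changing the common part of the format.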

4.2 Security Assurances Requirements (T6.3)

R6.3.1: Security Assessment of Application Development Services
Definitions
● Security Assessment: process to analyse a system in order to understand how resilient it is to malicious attacks.
● End Users: users that will take advantage of the functionalities of the applications developed (see Section 3.2).
Priority: Must have
Explanation: The applications that will run inside the EUBra-BIGSEA infrastructure will receive inputs directly or indirectly from the respective End Users and, after processing, provide them with the results. The End Users should not be able to subvert, through the inputs of such applications, the functionalities implemented. The high-level APIs to be made available to the developers are not general-use APIs, therefore there are no existing assessment works studying whether they perform as specified. It is thus necessary to perform a detailed assessment of the APIs to be used in the development of applications. This includes using security testing techniques and code analysis to uncover problems of availability, confidentiality and integrity. It will also be necessary to propose a set of recommendations for development best practices.
Candidate Technologies / Solutions:
● Security testing
● Static Analysis and Code Reviews
● Dynamic Analysis
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research

R6.3.2: Assessment of Application Development Services Security regarding Developers
Definitions
● Data App Developer: scientists that develop applications to run inside EUBra-BIGSEA (see Section 3.2).
● Application Development Services: API to be developed by WP5.
Priority: Must have
Explanation: A high-level API, to be developed by WP5, will be provided to the Data App Developers for the development of Data Analytics Applications. Data App Developers should not be able to develop applications that interfere with the remaining applications running inside the framework. It is thus necessary to perform a detailed assessment of the API and propose a set of recommendations for development best practices in the context of WP5. This includes using security testing techniques and code analysis to uncover problems of robustness, isolation, availability and integrity. Although there are some works on analysing and assessing the security of languages and APIs, from which it is possible to learn very valuable lessons, each of these works is very language specific. Furthermore, the APIs to be assessed here are of an extremely high level and therefore require different techniques.
Candidate Technologies / Solutions:
● Security testing
● Static Analysis and Code Reviews
● Dynamic Analysis
Handled by State of the Art: NO
Action(s) Needed: Adapt and Research

R6.3.3: Security Assessment of Data Analytics Services
Definitions
● Data Analytics Services: API to be developed by WP4.
● Data Scientists: scientists that mine and analyse the data from data sources (see Section 3.2).
Priority: Nice to have
Explanation: The lower-level API, to be developed by WP4, will support the WP5 API and will be provided to the Data Scientists for the ad hoc analysis of the data. Regarding security, although this API is to be used by “trusted” users, it is important that these users are not able to subvert the API to perform functionalities other than those duly allowed by the authentication and authorization modules. It is thus necessary to perform a detailed assessment of the API. This includes security testing techniques and code analysis to uncover problems of availability, confidentiality, integrity and isolation. Because its purpose is to handle data, this API must be checked with particular attention to make sure that it does not allow the user to circumvent the access control and privacy enforcement layers.
Candidate Technologies / Solutions:
● Security testing
● Static Analysis and Code Reviews
● Dynamic Analysis
Handled by State of the Art: NO
Action(s) Needed: Research

R6.3.4: Security Assessment of Application Containers
Definitions
● Application Containers: also known as operating-system-level virtualization, a virtualization method that allows the existence of multiple isolated user-space instances, instead of just one (e.g., Docker).
Priority: Must have
Explanation: Application containers are a lighter alternative to traditional virtualization that avoids the overhead of starting and maintaining virtual machines. Platforms such as Docker provide abstraction and automation for this technology, and their containers wrap a piece of software in a filesystem that contains everything it needs to run: code, runtime, system tools and system libraries. The problem is that these containers share machines with other containers that, as happens in this project, may be owned by a different organization, so their security is of utmost importance. It is thus necessary to perform a detailed assessment of the security of the Docker platform, which will be used in the project. This includes online security reports, security testing techniques and code analysis to uncover problems of availability, confidentiality, integrity and isolation in the platform. Work on the assessment of this type of technology has so far been very limited in scope and depth. These technologies are mostly used by users to deploy their own applications, which they can trust, and seldom for the deployment of third-party applications. Conversely, in EUBra-BIGSEA the application developers may be third parties that are not totally trusted. Therefore, they may submit malicious applications that exploit the weaknesses of the containers.
Candidate Technologies / Solutions:
● Security testing
● Static Analysis
● Code analysis
Handled by State of the Art: NO
Action(s) Needed: Adapt and Research

R6.3.5: Security Assessment of Cloud Management Frameworks
Definitions
● Cloud Management Frameworks (CMF): system or set of systems used to deploy and manage cloud computing infrastructures (e.g., OpenStack, OpenNebula).
Priority: Nice to have
Explanation: CMFs play a strategic role in the project and thus it is necessary to assess their resilience to malicious attacks. Due to their complexity, these frameworks are extremely hard to assess and traditional security testing may not be enough. Often the components are validated individually, to reduce the impact of the complexity on the assessment. Therefore, it will be necessary to develop new techniques that can assess these frameworks following a more holistic approach. Vulnerability injection techniques allow just that: they introduce security vulnerabilities into the system so that it can subsequently be attacked. This allows understanding how the layers below deal with the attack, and therefore validating defence-in-depth strategies. In a system as complex as a CMF, the methodology may be applied in many ways; e.g., vulnerabilities might be injected into the network management modules to analyse how the remaining modules deal with attacks.
Candidate Technologies / Solutions:
● Vulnerability and Attack Injection
● Attack Surface Analysis
Handled by State of the Art: NO
Action(s) Needed: Adapt and Research

R6.3.6: Security Assessment of Virtualization Infrastructures
Definitions
● Virtualization Infrastructures: infrastructures that allow the creation of virtual instances of physical devices such as network, storage or processing units (e.g., Xen, KVM, VMware).
Priority: Must have
Explanation: Virtualization technology made the Cloud possible, as it allows the creation of virtual instances of physical devices such as network, storage or processing units. A virtualized system is governed by a hypervisor and resources are shared amongst virtual machines (VMs), which are entitled to a contracted amount of each resource. While virtualization provides many benefits, it also introduces new challenges, including security, availability and isolation. A successful attack against a hypervisor or the host OS can degrade the quality of service of the VMs or expose sensitive information. These attacks can be external (e.g., over the network) or internal (e.g., from a malicious user). It is thus necessary to perform a detailed assessment of the security of the virtualization infrastructures that are of most interest for the project: Xen and KVM. This includes online security reports, security testing techniques and code analysis to uncover problems of availability, confidentiality, integrity and isolation in the platform. The assessment of the security properties of virtualization infrastructures has received attention in the last few years, although there are still some attack avenues that need to be analysed. Most of the existing techniques can be adapted to the technologies of interest to the project, and it will be necessary to understand which attack avenues are not covered and are important for the project.
Candidate Technologies / Solutions:
● Security testing
● Dynamic Analysis
● Vulnerability and Attack Injection
● Attack Surface Analysis
Handled by State of the Art: YES
Action(s) Needed: Reuse and Adapt

R6.3.7: Benchmarking Intrusion Detection Systems
Definitions
● Benchmarking: standard tools that allow evaluating and comparing different systems, components and tools according to specific characteristics.
Priority: Must have
Explanation: Intrusion Detection Systems (IDSs) can be used at many levels to protect a cloud infrastructure such as the one being developed in the project. However, it is important to understand how good these systems are at detecting and stopping attacks. This is also important for selecting the best alternatives and configurations to include in the overall system. New techniques should be considered, including new metrics that can represent how the IDS copes with the elasticity of the system. Based on the benchmarking results achieved, it is necessary to propose an IDS solution for the EUBra-BIGSEA infrastructure and the respective configurations. Parts of benchmarking approaches have been proposed for this domain, including representative workloads and new metrics. Attack loads have also been proposed in the past, but with very limited size and usefulness. Therefore, a complete and convincing benchmarking approach that can help in the selection of IDSs for specific scenarios is still to be proposed and demonstrated.
Candidate Technologies / Solutions:
● Vulnerability and Attack Injection
Handled by State of the Art: YES
Action(s) Needed: Adapt and Research

R6.3.8: Development of Intrusion Detection Systems
Definitions
● Elasticity: on-demand allocation/deallocation of virtualized resources during system operation.
Priority: Nice to have
Explanation: Several IDS solutions are available to be used and adapted, but besides the need for evaluation and comparison, there is also some room for improvement. It is possible to build on top of existing open source solutions to develop IDSs with adaptive characteristics that would allow them to cope with the elasticity of the system and the consequent changing resource requirements, thus reducing their impact on the QoS provided by WP3 methodologies. Although IDS solutions are available to be used and/or adapted for Cloud and Virtualization infrastructures, there are no satisfactory solutions that can perform online detection of attacks while coping with the characteristics of cloud systems, in particular elasticity.
Candidate Technologies / Solutions:
● Xenini
● OSSEC
● Snort
Handled by State of the Art: NO
Action(s) Needed: Research

R6.3.9: Devise Solutions for Security Problems Found
Definitions
● Security Problems: weaknesses that may expose the system to attacks.
Priority: Nice to have
Explanation: Security issues can be detected during the assessment of the EUBra-BIGSEA application development services and supporting infrastructure. In these cases, there are two main options: 1) do not use the related component, or 2) mitigate the identified issues. Although the first option is usually preferable, in practice it is not always possible. However, some of these problems can be corrected or mitigated, lowering the threat to an acceptable level. Examples are software vulnerabilities that can be solved using strict input limitations, or an infrastructure vulnerability that can be mitigated by limiting the users that can access a certain resource. Several techniques have been proposed in the past for automated correction of vulnerabilities or for the prevention of attacks. However, these techniques are usually very specific to the kind of vulnerabilities being addressed.
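
The "strict input limitations" mitigation mentioned for R6.3.9 can be illustrated with a small wrapper that rejects inputs outside a narrow, explicitly allowed form before they reach the vulnerable component. This is only a sketch under stated assumptions: limit_input, RejectedInput and lookup_bus_line are hypothetical names, and the length limit and character set are arbitrary example values.

```python
import re
from functools import wraps


class RejectedInput(ValueError):
    """Raised when an input violates the limits enforced by the wrapper."""


def limit_input(max_length, allowed_pattern):
    """Build a decorator that only forwards short, whitelisted string inputs."""
    allowed = re.compile(allowed_pattern)

    def decorator(func):
        @wraps(func)
        def wrapper(value):
            if not isinstance(value, str):
                raise RejectedInput("input must be a string")
            if len(value) > max_length:
                raise RejectedInput("input too long")
            if not allowed.fullmatch(value):
                raise RejectedInput("input contains disallowed characters")
            return func(value)
        return wrapper
    return decorator


@limit_input(max_length=16, allowed_pattern=r"[A-Za-z0-9_-]+")
def lookup_bus_line(line_id):
    # Stand-in for a component known to misbehave on unexpected input.
    return f"line:{line_id}"
```

The wrapped component is left unchanged; only the set of inputs that can reach it is narrowed, which is why this kind of mitigation tends to be specific to the vulnerability being addressed.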
Candidate Technologies / Solutions:
● Wrapping techniques
● Intrusion Detection Systems
Handled by State of the Art: NO
Action(s) Needed: Research

R6.3.10: Trustworthiness Characterization of the System
Definitions
● Trustworthiness: trust that can be justifiably put in the security of a system.
● Metrics: consistent standard for measurement with the goal of quantifying data to facilitate insight.
Priority: Must have
Explanation: Considering the assessment techniques proposed, it is necessary to aggregate their output in order to compute understandable metrics that characterize the security of the services and how resilient they are to malicious attacks. The idea is to provide users with a degree of confidence in the framework, allowing them to decide whether to use the applications hosted there. These metrics can change with time depending on the security features deployed, on the success of attacks, on the results of the auditing systems, etc. In practice, users can use such a metric to judge, depending on the criticality of their application, whether the framework is adequate for them. Security benchmarking is still an open problem. Several works propose the use of trustworthiness benchmarking as a way to characterise the system based on the evidence of how securely it was designed and built. However, these works are far from finished products: they still lack metrics that can be easily understood by users, as well as ways to integrate data from different sources in an automated fashion.
Candidate Technologies / Solutions:
● Trustworthiness Benchmarking (Neto and Vieira, 2010)
Handled by State of the Art: NO
Action(s) Needed: Research

4.3 Data Privacy Requirements (T6.4)

R6.4.1: Definition of a Standard Privacy Policy Format
Definitions
● Privacy Policy: statement (or a legal document, usually based on privacy laws) that discloses the conditions under which a party can gather, use, disclose, and manage personally identifiable information.
● Format: standard format for expressing privacy policies, with focus on a machine-readable structure.
Priority: Must have
Explanation: Usually, external data sources store personally identifiable information. The aim of this requirement is to provide a standard format that allows data source owners to specify, through data policies, who can access this information and under which conditions. Enforcing these data policies would prevent any user or application with access to the data from obtaining and using personally identifiable information until given explicit permission. Nowadays, the most common way of presenting a privacy policy is in human-readable format, i.e., using natural language. We want to define a machine-readable format, adequate to the domains of data that are targeted by EUBra-BIGSEA. Expressing privacy policies in machine-readable standard formats allows these policies to be directly used as input for mechanisms, tools and software enforcement frameworks. Although some standard privacy policy formats exist (e.g., P3P, EPAL), they are focused on the end users, allowing them to express their preferences. Our privacy policy will be defined by the data source owners, and we need to define a machine-readable format for this policy that allows specifying the information that must be protected. This includes personally identifiable (or sensitive) information and information obtained by data combination (e.g., trajectories, i.e., the path traversed by one end user while using public transportation). This policy shall be interpreted and enforced by the framework, through appropriate anonymization and access control techniques.
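
To make the idea of a machine-readable policy defined by the data source owner more concrete, the sketch below parses a small XML policy and answers, per attribute and role, whether access is allowed, must go through anonymization, or is denied. The XML vocabulary (policy, rule, role, the action values) and the attribute names are assumptions made up for this example; they are not the format to be defined by the project, nor P3P or EPAL syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy written by a data source owner (illustrative vocabulary).
POLICY_XML = """
<policy source="bus-ticketing">
  <rule attribute="card_id" action="anonymize">
    <role>data-scientist</role>
  </rule>
  <rule attribute="trajectory" action="deny"/>
  <rule attribute="bus_line" action="allow">
    <role>data-scientist</role>
    <role>app-developer</role>
  </rule>
</policy>
"""


def load_policy(xml_text):
    """Map each protected attribute to (action, set of roles the rule applies to)."""
    root = ET.fromstring(xml_text)
    rules = {}
    for rule in root.findall("rule"):
        roles = {r.text.strip() for r in rule.findall("role")}
        rules[rule.get("attribute")] = (rule.get("action"), roles)
    return rules


def decide(rules, role, attribute):
    """Return "allow", "anonymize" or "deny"; anything not covered is denied."""
    if attribute not in rules:
        return "deny"
    action, roles = rules[attribute]
    if action == "deny" or role not in roles:
        return "deny"
    return action
```

The default-deny choice is a design assumption of this sketch: an attribute the owner did not mention is treated as protected, which errs on the side of privacy.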
In our view, privacy-related policies can be organized in a hierarchy: the highest-level policies are described in natural language; the lowest-level policies are specified in machine-readable format and used by the application itself to, e.g., perform access control. Reproducing high-level statements as machine-readable statements is a big challenge due to the semantics involved. The lower the level, the lower the impact of semantics.
Candidate Technologies / Solutions:
● P3P (Platform for Privacy Preferences)
● EPAL (Enterprise Privacy Authorization Language)
● XML (eXtensible Markup Language)
Handled by State of the Art: NO
Action(s) Needed: Research

R6.4.2: Avoidance of Statistical Disclosure
Definitions
● Statistical Disclosure: occurs when statistics are published and, from these statistics, a third party can identify data from an individual, revealing previously unknown information about him/her.
● Statistical Disclosure Control (SDC): methods that aim to protect data in statistical databases, so that the data can be published without revealing confidential information about a specific individual.
Priority: Must have
Explanation: Usually, big data analytics processes are interested in statistical data to identify trends among larger groups of information (which include sensitive and personally identifiable information). However, in some cases, it is possible to obtain information that identifies specific individuals, violating their privacy. Therefore, information must be obtained from the database only for statistical purposes, without revealing data about specific individuals. For this, SDC methods must be used, i.e., anonymization mechanisms must be implemented in order to protect the identity of the users. The database anonymization will be implemented according to the privacy policies defined by the data source owner. The SDC must also help prevent privacy attacks (e.g., attribute linkage, record linkage and table linkage attacks). It will be necessary to follow the data owner's anonymization policy for the Big Data. The state of the art shows that there are some techniques to achieve data anonymization. A study will be performed in order to verify how these techniques can meet the data owner's anonymization policy. It may also be necessary to provide extensions or adaptations to the anonymization techniques.
Candidate Technologies / Solutions:
● Anonymization techniques and models (e.g., k-anonymity, ℓ-diversity, t-closeness or β-likeness, or a combination of these models).
Handled by State of the Art: YES
Action(s) Needed: Reuse, adapt.

R6.4.3: Efficient Data Anonymization/Obfuscation Techniques
Definitions
● Data Anonymization: process of modifying personal data in such a way that individuals cannot be re-identified and no information about them can be learned.
● Data Obfuscation: process where data is purposely masked to prevent unauthorized access, especially to sensitive and personally identifiable information.
Priority: Must have
Explanation: It is necessary to have efficient techniques for data anonymization and obfuscation (e.g., generalization, suppression, encryption, perturbation, masking, etc.). These techniques can be used alone or combined with each other in order to reach the best results. Due to the context of big data, it is imperative that the techniques used have a reduced performance impact, while keeping acceptable levels of privacy protection.
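
As a minimal sketch of how two of the techniques named above can be combined, the example below generalizes quasi-identifiers (age to a 10-year band, postal code to a 3-digit prefix) and then suppresses records whose group is still smaller than k, so that the output satisfies k-anonymity with respect to those quasi-identifiers. The record fields (age, zip, line), the generalization levels and the function names are assumptions chosen for illustration only.

```python
from collections import Counter


def generalize(record):
    """Coarsen the quasi-identifiers of one record."""
    low = (record["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",          # e.g. 23 -> "20-29"
        "zip": record["zip"][:3] + "**",    # e.g. "13083" -> "130**"
        "line": record["line"],             # payload attribute, kept as-is
    }


def _group_key(record, quasi_identifiers):
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymize(records, k, quasi_identifiers=("age", "zip")):
    """Generalize every record, then suppress those in groups smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = Counter(_group_key(r, quasi_identifiers) for r in generalized)
    return [r for r in generalized if sizes[_group_key(r, quasi_identifiers)] >= k]


def is_k_anonymous(records, k, quasi_identifiers=("age", "zip")):
    """Check that every quasi-identifier combination occurs at least k times."""
    sizes = Counter(_group_key(r, quasi_identifiers) for r in records)
    return all(n >= k for n in sizes.values())
```

Both steps are single passes over the data, which hints at the trade-off discussed in this requirement: coarser generalization suppresses fewer records but loses more detail, and both effects need to be weighed against the processing cost on large datasets.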
There are several techniques for data anonymization and obfuscation (e.g., generalization, suppression, encryption, perturbation, masking, etc.). However, they differ in their performance impact. The action to be taken in this context is to select solutions that, in the context of the project, have a reduced impact and keep acceptable levels of privacy protection. This can be done through performance evaluations and comparative studies of the anonymization techniques. The goal is that data anonymization can be done without compromising the quality of data mining results.
Candidate Technologies / Solutions:
● Use data partitioning techniques together with masking and fake data injection.
Handled by State of the Art: YES
Action(s) Needed: Reuse, adapt.

R6.4.4: Measurement of Data Utility and Disclosure Risk after anonymization
Definitions
● Data Utility: usefulness of the protected data for analysis; a challenge for Statistical Disclosure Control is to achieve protection with minimum loss of data accuracy.
● Disclosure Risk: risk that a user or an intruder can use the protected data to derive confidential information about an individual.
Priority: Nice to have
Explanation: In general, excessive anonymization can make the disclosed data less useful to the recipients, because some analyses become impossible or produce biased and incorrect results. It is therefore important to measure the data utility of an anonymized database; the information extracted from an anonymized database must remain useful and relevant. Another important measure is the disclosure risk. Even when anonymization methods are applied, there is a risk of data disclosure, i.e., a probability of data re-identification. Usually, a threshold is set to decide whether to release a dataset or not, and the calculated disclosure risk must be below this threshold for the release to be acceptable. There are some techniques that enable the measurement of Data Utility and Disclosure Risk, but these techniques have not been applied in the Big Data context. It will be necessary to evaluate the use of these techniques in the context of this project, provide improvements to the measurement techniques, and investigate the feasibility of using algorithm input/output privacy, to be expressed by developers, in the Disclosure Risk Measurement.
Candidate Technologies / Solutions:
● Mechanisms to measure data utility (propensity score measure, cluster analysis measure)
● Techniques to identify and classify risks of disclosure (score metric, information loss - IL, t-closeness).
Handled by State of the Art: YES
Action(s) Needed: Reuse, adapt.

R6.4.5: Query Anonymization
Definitions
● Query Anonymization: the aggregate information obtained by a user as a result of successive queries should not allow him or her to infer information on specific individuals.
Priority: Must have
Explanation: In some cases, the data analytics team may prefer to access the raw data of the database, i.e., the non-anonymized tables. In these cases, it is important to anonymize the queries that will be performed by the data analytics team, so query anonymization techniques should be used. There are some techniques that enable query anonymization, but these techniques have not been applied in the Big Data and Data Analytics context, especially regarding the use of data policies as the basis for the anonymization. It will be necessary to provide a technique to perform query anonymization in the context of this project.
Candidate Technologies / Solutions:
● Perturbation
● Query restriction
● Camouflage
Handled by State of the Art: YES
Action(s) Needed: Research.

R6.4.6: Access Control for Privacy Protection
Definitions
● Access Control: process with a set of rules by which users/services are authenticated and by which the access to information is granted or denied.
Priority: Must have
Explanation: Besides data anonymization, it is necessary to implement mechanisms to protect the privacy of the data, ensuring that the data is accessed and processed only by authorized roles and services, and preventing access otherwise. The access control is based on data policies and works at the data level, i.e., data access must be restricted to authorized users/services. There are several access control mechanisms available (e.g., Discretionary Access Control - DAC, Mandatory Access Control - MAC, Role-Based Access Control - RBAC, Privacy-aware Role-Based Access Control - P-RBAC, Access Control List - ACL). However, the innovative features of MapReduce systems and NoSQL datastores, along with the variety of data models and query languages introduced for them, make the definition of Privacy-Aware Access Control solutions a new research goal. New solutions or extensions of existing solutions must be provided. The mechanism must consider the format defined for data policies.
Candidate Technologies / Solutions:
● P-RBAC (Privacy-Aware Role-Based Access Control)
● XACML (eXtensible Access Control Markup Language)
Handled by State of the Art: YES
Action(s) Needed: Reuse, adapt.

R6.4.7: Privacy Violation Detection
Definitions
● Privacy Violation: event that breaches the privacy policy or privacy agreement between an end user (data subject) and a data collector.
Priority: Nice to have
Explanation: Similarly to current Intrusion Detection Systems (IDSs), it is important to have frameworks and tools to monitor, collect and assess events that indicate possible violations of data access and disclosure rules.
The idea is to check the privacy policies, which capture the data source owners' preferences, against the processing of sensitive information and the results of information flow analysis, in order to detect misalignments. There are some tools that verify whether the user's privacy is violated (e.g., Meziane et al., 2010; Gao et al., 2010). However, their focus is on privacy policy compatibility and negotiation protocols, with the goal of not allowing information traffic between a client and a service with inconsistent privacy policies. For data leakage detection, some models have been proposed (e.g., Papadimitriou and Garcia-Molina, 2011; Kumar et al., 2014), but their focus is on detecting the agent who leaked the information. We need automatic tools that, similarly to intrusion detection systems, and based on the privacy policies defined by data source owners, detect and avoid data leakage and privacy attacks.
Handled by State of the Art: NO
Action(s) Needed: Research

R6.4.8: Database Auditing
Definitions
● Database Auditing: monitoring and recording of individual and collective actions performed by database users.
Priority: Nice to have
Explanation: Auditing helps the detection and monitoring of data leakage and unauthorized access to data and operations. Beyond enforcing privacy policies and users' preferences, it is important to prove to the users that this has been done, in order to improve trust. This requirement therefore aims at implementing a set of auditing services, so that users can directly supervise how data are being protected and outsourced to the cloud databases. Auditing can be done automatically or with the support of human auditors. Our idea is to have an automatic process, based on privacy policies, to detect data leakage. Although some works address this task (e.g., Biswas and Niemi, 2011), they derive auditing specifications from privacy policies given in specific formats. We need to investigate how to derive auditing specifications from policies written in the policy format we provide. Also, the audit specifications need to consider the characteristics of different types of data sources, including NoSQL databases.
Candidate Technologies / Solutions:
● Auditing activities such as auditing of database connections, privileged activities and transaction logs.
Handled by State of the Art: YES
Action(s) Needed: Research
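
The notion of an automatic, policy-based audit in R6.4.8 can be sketched as a pass over recorded access events that flags every read of a protected column by a role the policy does not authorize. The event structure, the policy representation and all names below (AccessEvent, POLICY, audit, the table and column names) are hypothetical; deriving such a specification from the project's own policy format is precisely the open research question stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One recorded read of a column, as it might appear in a database audit log."""
    user: str
    role: str
    table: str
    column: str


# Hypothetical audit specification: protected (table, column) -> authorized roles.
POLICY = {
    ("trips", "card_id"): {"privacy-officer"},
    ("trips", "home_stop"): {"privacy-officer", "data-scientist"},
}


def audit(events, policy):
    """Return the events that read a protected column with a non-authorized role."""
    violations = []
    for event in events:
        allowed_roles = policy.get((event.table, event.column))
        if allowed_roles is not None and event.role not in allowed_roles:
            violations.append(event)
    return violations
```

In this sketch, columns that do not appear in the policy are considered unprotected and are never reported; a stricter audit could instead report any access not explicitly covered.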

4.4 Summary of the elicited requirements

Table 4.1 presents a summary of the requirements identified above for the three technical tasks of WP6. The priority of each requirement in the context of the project is presented, distinguishing between requirements that must be implemented and requirements that would be nice to have. Finally, the table links the requirements to the state of the art, distinguishing the ones for which a solution already exists, even if that solution may need to be adapted to be used in the context of the EUBra-BIGSEA project, from the ones that require further research. In practice, Table 4.1 provides an analysis of the existing gaps with regard to the requirements identified, which will guide the research and development work of WP6 in coordination with the other technical work packages, thus defining the security strategy of the project.

Table 4.1: Requirements analysis summary

Requirement | Priority | Handled by SoA | Actions Needed
------------|----------|----------------|---------------
Task 6.2 AAA Provisioning | | |
R6.2.1 (Should be Cloud Management Framework-Agnostic) | Must have | Yes | Adapt, Research
R6.2.2 (Support adequate identity and access control management functionalities) | Must have | Yes | Reuse, Adapt
R6.2.3 (Support for/compliance with base CMF to be adopted by EUBra-BIGSEA) | Must have | No | Reuse, Adapt
R6.2.4 (Support for authentication using external identity providers and support for federation) | Nice to have | Yes | Adapt, Research
R6.2.5 (Consistent support for Accounting Data across underlying frameworks) | Nice to have | No | Adapt, Research
R6.2.6 (Common Authentication to the Platform) | Must have | Yes | Reuse, Adapt
R6.2.7 (Support for B2C IAM functionalities) | Must have | No | Research
R6.2.8 (Support for external Identity Providers) | Must have | Yes | Adapt, Research
R6.2.9 (Support for High Availability, including Dynamic Elasticity) | Nice to have | No | Research
R6.2.10 (Support for Application-level Accounting Mechanisms) | Nice to have | Yes | Adapt, Research
Task 6.3 Security Assurances | | |
R6.3.1 (Security Assessment of Application Development Services) | Must have | No | Adapt, Research
R6.3.2 (Assessment of Application Development Services Security regarding Developers) | Must have | No | Adapt, Research
R6.3.3 (Security Assessment of Data Analytics Services) | Nice to have | No | Research
R6.3.4 (Security Assessment of Application Containers) | Must have | No | Adapt, Research
R6.3.5 (Security Assessment of Cloud Management Frameworks) | Nice to have | No | Adapt, Research
R6.3.6 (Security Assessment of Virtualization Infrastructures) | Must have | Yes | Reuse, Adapt
R6.3.7 (Benchmarking Intrusion Detection Systems) | Must have | Yes | Adapt, Research
R6.3.8 (Development of Intrusion Detection Systems) | Nice to have | No | Research
R6.3.9 (Assurance of Solutions for Security Problems Found) | Nice to have | No | Research
R6.3.10 (Trustworthiness Characterization of the System) | Must have | No | Research
Task 6.4 Data Privacy | | |
R6.4.1 (Standard Privacy Policy Format) | Must have | No | Research
R6.4.2 (Application of Statistical Disclosure Control) | Must have | Yes | Reuse, Adapt
R6.4.3 (Efficient Data Anonymization/Obfuscation Techniques) | Must have | Yes | Reuse, Adapt
R6.4.4 (Techniques to Measure Data Utility and Disclosure Risk after anonymization) | Nice to have | Yes | Reuse, Adapt
R6.4.5 (Query Anonymization) | Must have | Yes | Research
R6.4.6 (Access Control for Privacy Protection) | Must have | Yes | Reuse, Adapt
R6.4.7 (Privacy Violation Detection) | Nice to have | No | Research
R6.4.8 (Database Auditing) | Nice to have | Yes | Research
5 CONCLUSIONS

This document performed an extensive review of the goals of the project, its scope and its concerns regarding security. Because security is a transversal concern for EUBra-BIGSEA, it was necessary to have a clear view of the general requirements and of how the security solutions are going to be implemented. Furthermore, it was of utmost importance to work in coordination with the remaining work packages, understanding how they relate to WP6.

This document establishes the blueprint for the work of Work Package 6 throughout the remaining duration of the project, in the form of a list of high-level requirements for the research and implementation of the solution. This blueprint, developed within T6.1, was further detailed in the work of the remaining tasks: T6.2, T6.3 and T6.4.

We believe that the defined solution will provide the necessary AAA services not only to the infrastructure managers and application developers/providers, but also to the end users of the applications to be hosted inside the EUBra-BIGSEA infrastructure. Furthermore, it will assess the security of the infrastructure components and provide the users with information on the degree of trustworthiness that the infrastructure deserves from a security point of view. Finally, it will provide the EUBra-BIGSEA users with the necessary privacy precautions for big data processing.

The research and engineering necessary to meet the requirements prioritised as “must have” will provide the minimum acceptable degree of security for the framework to achieve its goals. Meeting the “nice to have” requirements will add to the level of excellence of the work to be developed.
