Operational GARUDA Architecture
Version 1.0
Jan 2010
Centre for Development of Advanced Computing
Knowledge Park, Bangalore
1. Introduction
GARUDA is an SOA-based cyber-infrastructure supporting collaboration among science
researchers and experimenters on a nationwide grid of computational nodes and
mass storage. It connects a wide variety of resources, services and users to provide
a stable, robust and efficient grid environment with guaranteed QoS for a wide
variety of uses.
The Department of Information Technology (DIT), Government of India has funded
the Centre for Development of Advanced Computing (C-DAC) to deploy the nationwide
computational grid 'GARUDA', which spans 17 cities and 45 institutions, with the
aim of bringing distributed/grid networked infrastructure to academic labs, research
labs and industries in India.
This document describes the top level of GARUDA's architecture, which spans the
critical pillars of the grid, namely Security, Data Management, Resource and
Service Monitoring, and Job Management.
GARUDA resources are integrated through a Service Oriented Architecture in which
each resource provides a "service" defined in terms of its interface and operations.
Functions such as single sign-on, remote job submission, workflow support for
complex applications and data movement tools are conceived from the end-user
perspective and are being (or will be) developed and offered as services.
Distributed accounting and user account management software, verification and
validation software, programming and monitoring tools, tools to enforce SLAs and
capture their violations, and operational and administrative support tools such as
a remote installer and a request tracker are also being (or will be) developed by
GARUDA.
2. Interoperability and Standards
It is important that GARUDA Grid's systems and service interfaces are designed and
operated so that they interoperate as smoothly as possible with the other elements
of our users' overall IT landscape. Although there is huge diversity in external
systems, over which we have little or no control, we believe that standards, both
formal and informal, offer a good likelihood of making our users' work easier as
they integrate GARUDA systems into their overall workflow.
Consequently, we place a high premium on compliance with community standards
wherever possible. This plays out most clearly in the design of the GARUDA
architecture and the definition of service interfaces. Grid community standards in
the areas of remote login and computation, data movement and management, science
workflow, scientific visualization support, and application development and runtime
support make it significantly easier for Garuda's virtual organizations and other
individual users to add GARUDA resources to their existing suite of systems.
Sridharan R, CDAC KP, Bangalore
Ver.1.0
One area where we have identified an opportunity for significant improvement is
user identity certification and credential management. Identity certification is
achieved by setting up and managing the IGCA, accredited by the APGridPMA for grid
authentication. X.509 PKI-based security credential management is done by deploying
MyProxy, a popular open-source tool, whereas authorization data within
multi-institutional collaborations is managed by VOMS.
Standardization in the areas of remote computation and data-related activities is
achieved by adhering to the community standards proposed and implemented by the
Globus Toolkit 4.x. Components of GT4 such as MDS, SOAP, GridFTP and WSRF each
follow their own standards; for detailed information, refer to the GT4 standards
documentation.
3. GARUDA Grid System Architecture
GARUDA's resources are key to achieving its objective of providing a stable and
robust grid environment for a wide variety of uses. They are not, however,
sufficient to attain its other objectives, namely efficiency and guaranteed QoS.
Progress in these areas relies on two additional architectural elements: (1) a set
of core system components that provide system-wide services and (2) a set of common
interface definitions that resources or services may implement in order to provide
users with familiar and consistent interfaces on which to build applications and
infrastructure extensions.
This architecture is conceived on a service-oriented approach, meaning that all
support extended or offered by Garuda to the user community is delivered through
Garuda Services (GS) with well-thought-out, unambiguous interfaces. Both internal
users, who maintain and manage the Garuda resources, and external users, namely
application developers who want to access the resources, do so only by writing
clients to these GS.
3.1. GARUDA Grid Core System Components
Core system components of Garuda include the network, resources, the Federated
Information Service, security with authentication and authorization services, and
job management, which comprises data movement, scheduling, reservation and
accounting. They also include access mechanisms such as the access portal, which
primarily acts as a GUI front end to the core systems, workflow tools and Problem
Solving Environments (PSE). The entire system needs proper procedures for
maintenance and packaging to ensure smooth operation of Garuda.
The above-mentioned components are captured in the layered diagram shown in
Figure 1. In the following sections some of these components are explained in
detail, and a separate section is dedicated to the description of fault tolerance
and failover processes in Garuda.
[Figure 1: High-level System Components — a layered view: Grid-Enabled
Applications at the top; Grid PSE, Visualization, Workflow tool and Access Portal;
Job Scheduler, Data, Packaging, Grid Programming & Development Environment
(MPICH-G2, Gridhra, compiler service) and Federated Information; WSRF + GT4 +
(login, accounting and other services); Virtualization support; Grid Security and
High-Performance Grid Networking over the NKN; and, at the base, Computing
Resources and Virtual Organizations (CDAC resource centres, research and
non-research organizations, educational institutions and computing centres).]
3.2. Network
The Garuda grid network depends entirely on the National Knowledge Network (NKN),
a facility built by the Government of India. The design philosophy of this network
is to build a scalable network that can expand both in accessibility and in speed.
It acts as a common network backbone, like a national highway, on which different
categories of users are supported.
Figure 2: Logical NKN Architecture
The main objective of this network is to build a national highway that enables
different categories of users to leverage a common infrastructure. The CORE of this
network runs at 2.5/10 Gbps and the EDGES at 100 Mbps / 1 Gbps.
Figure 2 shows the logical architecture of the NKN backbone links, routers and
last-mile connectivity to the NKN participating organizations. The last-mile
connectivity to the participating agencies varies from 1 Gbps and 100 Mbps down to
10 Mbps.
Features of the NKN include a high-capacity and scalable backbone, high reliability
and availability, support for strict QoS and security, wide geographical coverage
and a common standard platform.
Figure 3: NKN-GARUDA connectivity for Participating Organization
Figure 3 explains the very last mile connectivity between the GARUDA network
devices and the NKN router. The GARUDA Virtual Routing and Forwarding (VRF)
mechanism, enabled in the NKN routers for the GARUDA participating organizations,
allows the GARUDA participating institutes to continue using, in the ongoing GARUDA
project, the same IP address scheme that was used in the earlier GARUDA project
phases.
3.3. Resources
Essentially, the Garuda grid is formed by pooling the compute and storage
resources, and special devices such as a telescope, provided by C-DAC and its
partners. By a computing resource we mean a set of computers combined to form a
cluster. Every cluster has a head node with many compute nodes attached to it.
Every centre, be it a partner's or C-DAC's, has one gateway which acts as its entry
point. There are one or more access terminals through which users will
access the Garuda resources. Optionally every participating institution may
deploy firewall protection.
[Figure 4: GARUDA SOA-based deployment — access terminals reach each centre over
the Internet, LAN or MPLS through its Gateway (G); sites such as C-DAC Bangalore
and resource-contributing partners host a Head Node (H) with compute nodes,
storage, Gridfs and special devices such as a telescope, while partners without
resources connect users only. Legend: H = Head Node, G = Gateway.]
Software list on each Head Node (SF1):
• Globus 4.0.7 & above and its dependent software
• Ganglia (gmetad)
• MyProxy clients
• GridWay (integrated with Globus)
• PBS or LoadLeveler server & scheduler (integrated with Globus)
• Plus or Maui (reservation s/w)
Software list on each Compute Node (SF2):
• PBS_mom or LoadLeveler starter
• Ganglia (gmond)
Operating systems supported in Garuda include various flavours of Linux and AIX.
The Local Resource Managers (LRM) running on the head nodes include Torque and LSF
on Linux clusters and LoadLeveler on AIX clusters. Every cluster has a certain
amount of scratch space so that applications can run using that
scratch, and some storage space for temporarily storing input and output. There
should be dedicated storage to handle voluminous data that needs longer retention
times.
3.4. Garuda Information Service
The Garuda Information System (GIS) keeps track of all the information about the
distributed Garuda resources and plays a major role in resource allocation,
monitoring and management. The GIS needs to gather, collate and publish both static
and dynamic information. Static information includes operating systems, processors,
storage devices and network devices, whereas dynamic information includes CPU load,
file systems, job queues, etc.
3.4.1. GIS Vision:
• Create a coordinated way for Garuda resources to publish the services they
offer,
• Devise a method for Garuda to aggregate and index the information from all the
resources, including those of the partners, and
• Publish the collated information to users in a form that can easily be accessed
by user software, user interfaces, and Garuda service providers themselves, to
discover Garuda's capabilities and how to access them.
The publication of the collated information should follow a proper interface so
that all grid applications, monitoring and discovery tools, Garuda services and the
grid meta-scheduler(s) can fetch the same information for decision making.
GIS uses Globus MDS4 with the GLUE schema 2.0 as its base component. MDS4 has two
higher-level services: an Index Service, which collects and publishes aggregated
information as WSRF resource properties, and a Trigger Service, which collects
resource information and performs actions when certain conditions are met. There is
a set of information providers that collect and format resource information for use
by an aggregator source or by a WSRF service when creating resource properties.
3.4.2. Architecture:
GIS introduces an aggregation layer called the Regional Information Service (RIS).
Multiple levels of RIS can be built hierarchically based on the proximity of
resources in a geographical region. One of the Garuda cluster head nodes in each
region also acts as the RIS. At the lowest level, the prime activity of an RIS is
indexing the information from the different head nodes in its region; at higher
levels, an RIS indexes the information from the RIS nodes at the level immediately
below.
[Figure 5: GIS Architecture — head nodes publish to regional RIS nodes (some head
nodes double as first- or second-level RIS), which publish upward through further
RIS levels to the Centralized Information Server.]
Each head node runs an Index Service, which collects and indexes the information
provided by Ganglia and the local resource manager (LRM). The information is
published to the respective RIS. The Index Service running on the RIS publishes the
information to the next higher level until it reaches the centralized Information
Index Server, which provides a Garuda grid-wide view of the aggregated resource
information. Refer to figure 5.
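To make the publish-upward flow concrete, the following minimal sketch shows
information indexed at a head node propagating through an RIS to the central
index. This is illustrative only; the class and field names are assumptions, not
Garuda code.

```python
# Toy model of the GIS hierarchy: head node -> RIS -> centralized server.
class IndexService:
    """A node in the GIS hierarchy: a head node, an RIS, or the central server."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the next level up (an RIS or the central server)
        self.entries = {}         # resource name -> published properties

    def publish(self, resource, properties):
        """Index information locally, then push it one level up the hierarchy."""
        self.entries[resource] = properties
        if self.parent is not None:
            self.parent.publish(resource, properties)

central = IndexService("central")
ris_south = IndexService("RIS-south", parent=central)
head = IndexService("head-node-1", parent=ris_south)

# The kind of information a head node might collect from Ganglia and the LRM.
head.publish("cluster-A", {"free_cpus": 16, "queue": "batch"})
```

With this shape, a query against the RIS sees only its region, while the same
entry is also visible in the grid-wide view at the central server.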
3.4.3. High-level components
[Figure 6: High-Level Components of GIS — components shown include Apache 2.0,
Tomcat WebMDS and WS MDS4 serving WS/HTTP and WS/SOAP clients, a cache, and
PostgreSQL repositories of Garuda grid-wide information.]
3.4.4. Deployment
Information provided by Ganglia includes basic host data (name, ID), memory size,
OS name and version, file system data, processor load data and other basic cluster
data, while information provided by the LRM includes queue information, the number
of CPUs available and free, job count information,
etc. Custom information providers can also be added at the head node level to
publish information about installed and available software, tools, libraries and
their licenses. A sample deployment of the Garuda Information System based on the
new architecture is captured in figure 7.
[Figure 7: Sample Deployment of GIS — sites such as IGIB Delhi, JNU Delhi, IIT
Kharagpur, IIT Guwahati, IIT Delhi, C-DAC Pune, C-DAC KP, C-DAC Chennai, C-DAC
Hyderabad, IISc Bangalore, IMSc Chennai and RRI Bangalore publish to the GARUDA
Information Server (with a backup), which in turn serves applications & tools.]
To realize this architecture, MDS4 must be configured on each individual cluster
head node to collect and index information from Ganglia, the LRMS and custom
information providers. Each head node publishes its information to a pre-selected
head node that it accepts as the regional information index server. These regional
information index servers in turn publish the complete resource information of
their region to the central grid-level information index server.
3.4.5. Advantages
• With this GIS architecture, the GARUDA meta-scheduler can fetch resource
information directly from the respective RIS, avoiding the centralized information
server. This improves the availability of resource information to the
meta-scheduler: if a particular region, or the information server in a region, goes
down, the meta-scheduler can carry on working with the remaining regions.
• When the required match is not found within the RIS, the scheduler can look in
the Centralized Information Server.
• Applications and tools are provided with a centralized information server for
gathering information about the GARUDA Grid.
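The RIS-first lookup with centralized fallback can be sketched as follows. This is
a toy model; the dictionaries, the requirement keys and the matching rule are
illustrative assumptions, not the meta-scheduler's actual algorithm.

```python
# Sketch: prefer the regional index; fall back to the central index on a miss.
def find_resource(requirements, regional_index, central_index):
    """Return the first resource satisfying `requirements`, preferring the region."""
    def match(index):
        for name, props in index.items():
            # A resource matches when it meets or exceeds every requirement.
            if all(props.get(k, 0) >= v for k, v in requirements.items()):
                return name
        return None
    return match(regional_index) or match(central_index)

regional = {"cluster-A": {"free_cpus": 4}}
central = {"cluster-A": {"free_cpus": 4}, "cluster-B": {"free_cpus": 64}}

print(find_resource({"free_cpus": 32}, regional, central))  # falls back: cluster-B
print(find_resource({"free_cpus": 2}, regional, central))   # regional hit: cluster-A
```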
3.5. Security
The major security components of Garuda are VOMS for virtual organization
management and MyProxy for certificate management, while certificates themselves
are issued by IGCA. Clients accessing Garuda through any of the interfaces, such as
the portal, command line, PSE or workflow, need to have a PKI X.509 certificate.
Most of the resources under Garuda are protected by site-level firewall rules. Even
though individual resource sites may follow customised rules, they must adhere to
certain common rules specific to the operation of Garuda. These include rules
pertaining to the opening of a specific set of ports related to the middleware and
monitoring. Other rules, related to applications, can be decided on a case-by-case
basis.
[Figure 8: Garuda Security Components — the client environment (portal, PSE,
workflow, gsi-ssh client) uses voms-myproxy-init to obtain a proxy certificate
carrying VO credentials; the certificate management server hosts MyProxy and VOMS
(which supplies the VO attributes); the IGCA CA service, with its CP/CPS, issues
the user certificate and private key; and the meta-scheduler receives the signed
job description together with the proxy certificate with VO credentials, alongside
the information service and the resources.]
The Garuda security components and their interactions are captured in figure 8.
More details about the authentication and authorization services, with an
explanation of the policy taxonomy, PDP and PEP, are discussed in the coming
sections.
3.5.1. Certificate Award Management
Garuda depends entirely on the Indian Grid Certification Authority (IGCA) for
managing the entire process of identifying and verifying user credentials and
providing or revoking certificates. An accredited member of the APGridPMA (Asia
Pacific Grid Policy Management Authority) for grid authentication, IGCA provides
X.509 certificates to support a secure environment for grid computing.
This PKI certificate is the basic component without which no operations are
permitted within the Garuda grid. Every member and resource wanting to be a part of
the Garuda grid needs to adhere to the rules and regulations of IGCA. For a
detailed description of IGCA and its activities, including the certificate policy,
certification practice statement and end-entity certificates, refer to
http://ca.garudaindia.in
3.5.2. Authentication/Authorization Services
A primary purpose of authentication (verifying identity) and authorization
(granting permission to access resources based on identity) is to implement
the policies that organizations have created to govern and manage the use
of computing resources.
The goal of the Garuda grid is to create a scalable infrastructure that leverages
local identity (AuthN) while managing access to shared resources (AuthZ) across the
Garuda grid.
Authentication (AuthN) is the process in which a user's authentication credentials
are evaluated and verified as being trusted, or from a trusted source. Examples of
credentials include a password, a public-key certificate, a photo ID, a fingerprint
or another biometric.
Garuda grid's principles for AuthN are:
• Grid user authentication should leverage existing local identity management
processes.
• AuthN to various applications should be transparent to a user, seamlessly
integrating with the existing local infrastructure and user environment.
True to the grid's heterogeneous nature, partners participating in the Garuda grid
do not necessarily implement local authentication with the same mechanism.
Kerberos, LDAP (Lightweight Directory Access Protocol), password databases and even
PKI can all be used as mechanisms to establish local identity.
On Garuda, the Globus security component, which uses PKI, has been deployed. The
Globus gridmap file is used during grid-to-local identity translation and vice
versa, and informs the grid resource that the user's grid-identity certificate has
been verified.
Authorization (AuthZ) can refer to the issuing of a token that proves a user has
the right to access resources, to the permission granted to access an object, or to
the token itself (e.g., a signed assertion). AuthZ is used on grids to enforce the
conditions of use of a grid resource as specified by the resource owner.
In Garuda, the gridmap file can be viewed as the mechanism for AuthZ.
Authentication occurs when the user's local identity is expressed as a certificate
that is understandable within the grid PKI infrastructure. On the resource side,
authentication is completed when this certificate is verified. From that point on,
use of the certificate to obtain access to grid resources can be considered
authorization.
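The grid-mapfile itself is a simple text mapping from certificate DNs to local
accounts (a quoted DN followed by one or more comma-separated usernames). The short
parser below is an illustrative sketch; the sample DN is hypothetical, and this is
not Globus's own parsing code.

```python
# Sketch: map certificate Distinguished Names to local account names,
# following the quoted-DN line format of a Globus grid-mapfile.
def parse_gridmap(text):
    """Return {DN: [local accounts]} from grid-mapfile content."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        dn, _, accounts = line.rpartition('"')
        mapping[dn.strip('"')] = accounts.strip().split(",")
    return mapping

gridmap = '"/C=IN/O=IGCA/OU=CDAC/CN=A User" auser'
print(parse_gridmap(gridmap))  # {'/C=IN/O=IGCA/OU=CDAC/CN=A User': ['auser']}
```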
3.5.3. Access Policy Taxonomy
Any physical user of the grid needs to have an AuthN-ID, a Distinguished Name (DN)
and a user name. Once proper mapping to a local user is done, the grid user becomes
a logical user with a valid access ID for that cluster.
Users are grouped logically to form user groups with special attributes and roles.
Similarly, a physical resource has a physical file name, URL and Fully Qualified
Name (FQN), and is further abstracted as a logical resource with a logical file
name and a uniform resource name, which is visible across Garuda.
[Figure 9: Access Policy Taxonomy — a "physical" user (AuthN-ID, DN, username)
maps to a "logical" user (access ID) and into user groups with attributes and
"roles"; a physical resource (filename, URL, FQN) maps to a "logical" resource
(logical file, URN) and into resource groups with classifications; access rules
take the form PUser/LUser/UGroup/Role | Op | Perm | RGroup/LRsrc/PRsrc, with
permissions Permit | Deny | Not applicable, and meta-data is integrated with the
access policy.]
Meta-data with access policy is maintained for every physical resource along with
its corresponding logical resource and resource group.
A Garuda user (physical or logical), being a member of at least one user group with
valid permissions, is allowed to operate on Garuda resources (physical or logical),
which are grouped based on common classification criteria. Refer to figure 9.
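The rule shape of the taxonomy (subject | operation | permission | resource) can
be exercised with a tiny evaluator. This is a sketch under assumptions: the tuple
encoding, first-match semantics and example rule names are illustrative, not
Garuda's actual policy engine.

```python
# Sketch: evaluate (subject, op, resource) against ordered taxonomy-style rules.
PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "Not applicable"

def evaluate(rules, subject, operation, resource):
    """Return the decision of the first applicable rule, else Not applicable."""
    for rule_subject, rule_op, perm, rule_resource in rules:
        if (rule_subject, rule_op, rule_resource) == (subject, operation, resource):
            return perm
    return NOT_APPLICABLE

# Hypothetical rules: a user group may read, but not write, a resource group.
rules = [
    ("bio-group", "read", PERMIT, "genome-data"),
    ("bio-group", "write", DENY, "genome-data"),
]
print(evaluate(rules, "bio-group", "write", "genome-data"))  # Deny
```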
3.5.4. Operation and Permissions
Varieties of tools can be deployed to permit the users to do their requested
operations based on their attributes and policy decision. Possibilities of
different tools that can be deployed if needed in the future to strengthen the
Garuda Security are captured with its control flow in figure 10.
[Figure 10: Control flow for PDP and PEP — the client "pushes" AuthZ and attribute
assertions (obtained via MyProxy, VOMS and the gridmap) to the PDP/PEP server,
which in turn "pulls" assertions from AuthZ services (AZA) and attribute services
(ATA) backed by policy and attribute stores; candidate tools include PERMIS, XACML,
SAML, SAZ, PRIMA, CAS and VOMS. AZA: AuthZ Authority; ATA: Attribute Authority;
PDP: Policy Decision Point; PEP: Policy Enforcement Point.]
A user's PKI certificate issued by IGCA is stored on the MyProxy server, and his VO
credentials are stored on the VOMS server. When a client wants to use Garuda
resources, he downloads from the MyProxy server a proxy certificate for the desired
period, embedded with the relevant VO attributes. This information is pushed to the
policy decision/enforcement server.
This server in turn pulls the relevant information from the authorization and
attribute authorities. These authorities fetch the relevant policy and attributes
for the given certificate from their respective databases. Upon receiving this
information, the server matches the requested operations against the approved ones
and grants permissions accordingly. Presently, policy and attribute evaluation is
done in a single domain with a centralized policy database/service, but Garuda
ultimately needs to split the policy rules and attribute mappings and distribute
them across sites.
3.5.5. Access determination and policy assertions
VOMS
SAZ/PRIMA/GUMS
Puser / Luser/ UGroup/Role
|Op|
Data Service after staging..
Perm| Rgroup/ LRsrc /PRsrc
MyProxy AuthN – UsrName => DN Mapping
Meta-Data Catalog
Figure 11: Possible tools for Access and Policy Control
Access determination and policy assertions in Garuda are handled by deploying the
open-source toolkits MyProxy and VOMS. Figure 11 shows the role of MyProxy for user
mapping and of VOMS for user groups. Other possible
public domain tools for supporting the permitted operations are also indicated.
MyProxy and VOMS are discussed in detail in the next two sections.
3.5.6. MyProxy Deployment in Garuda
MyProxy, an open-source tool for managing X.509 Public Key Infrastructure (PKI)
security credentials (certificates and private keys), is installed in Garuda and
acts as an online credential repository, allowing users to securely obtain
credentials when and where they are needed inside Garuda.
o The MyProxy server, deployed on one of the identified head nodes, acts as the
centralized credential server.
o One more Garuda head node is also configured with a MyProxy server as a
fail-over or backup credential server in case the centralized server fails.
o MyProxy clients, which come as part of the standard GT4 distribution, are
available on all the head nodes.
o The MyProxy server only acts as an online credential repository and does not
function as a Certificate Authority.
3.5.6.1. MyProxy usage
Figure 12: Control flow in MyProxy tool
o After obtaining signed certificates from the IGCA, a user can optionally upload
his certificates into the MyProxy server, to make them available in the credential
repository.
o The duration and validity of the stored credential can be controlled by the user
at upload time.
o At the time of use, the user can download the stored credentials from the
MyProxy server to any of the machines from which he would like to access the grid
services. This proxy certificate enables single sign-on and access to any node
demanding authentication.
o Users can renew their expired proxy credentials using the MyProxy clients.
o Portals, PSEs and other tools that require grid authentication can integrate
directly with the MyProxy APIs to download the user credentials and access the grid
services on the user's behalf.
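The repository semantics in the steps above can be modelled by a toy sketch: a
credential is deposited with a user-chosen validity, and any proxy handed out later
can live no longer than the stored credential. This is not MyProxy's
implementation; the class, method names and hour-based lifetimes are illustrative
assumptions.

```python
# Toy model of an online credential repository in the MyProxy style.
class CredentialRepository:
    def __init__(self):
        self.store = {}  # username -> stored credential lifetime (hours)

    def upload(self, user, lifetime_hours):
        """Deposit a credential with a user-chosen validity (cf. myproxy-init)."""
        self.store[user] = lifetime_hours

    def logon(self, user, proxy_hours):
        """Issue a proxy no longer-lived than the stored credential (cf. myproxy-logon)."""
        if user not in self.store:
            raise KeyError("no credential stored for user")
        return min(proxy_hours, self.store[user])

repo = CredentialRepository()
repo.upload("auser", lifetime_hours=168)    # stored for one week
print(repo.logon("auser", proxy_hours=12))  # 12: a short-lived proxy for a session
```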
3.5.7. VOMS Deployment in Garuda
Virtual organizations in Garuda are managed by deploying the Virtual Organization
Membership Service (VOMS), an open-source tool developed and maintained by the EGEE
grid. It provides a database of user roles and capabilities, together with a set of
tools for accessing and manipulating the database and using its contents to
generate grid credentials for users when needed. A separate set of tools is
provided for the administrators responsible for admitting users and assigning their
roles.
• Each VO has a corresponding 'VO server'.
• Each VO server belonging to a specified application area is set up on one of the
identified head nodes belonging to the respective VO.
• Another head node belonging to the same VO is also installed with a VO server
and acts as a fail-over.
• (In addition, a common VO server, deployed centrally, is used for members not
belonging to any of the application VOs.)
• VO management clients are available on all the head nodes.
• The VO server can contain the fine-grained access control lists and role
definitions for the members of the VO.
• Role definitions can be used to collectively define a set of authorized
operations.
• Individual grid services can be enabled to validate and authorize based on the
VO credentials.
• Resources can be part of different Virtual Organizations (VO) in a grid.
• The same resource can be a part of one or more VOs.
• Grid users can become members of different VOs in the grid.
3.5.7.1. VOMS usage
[Figure 13: Control flow in VOMS tool — voms-proxy-init, invoked with the desired
role through a client tool for role selection, retrieves the VO membership and role
attributes from the VOMS server, which is backed by the VOMS attribute repository.]
o In addition to the standard X.509-based grid certificates, users are required to
obtain separate VO credentials from the VO server in order to be authorized and
access the VO-enabled services.
o The VO credentials identify the membership and 'role' of a grid user in a
specific VO.
o A grid user can have multiple 'roles' defined in a VO. When requesting
credentials, users can specify the role for which they need them.
o The VO credentials are appended to the standard X.509 grid proxy certificates.
o VO services can then grant or deny access to requesting users depending on the
validity of these credentials.
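A VO-enabled service's grant/deny decision on such credentials can be sketched as
follows. The attribute field names ("vo", "roles") and the example VO are
assumptions for illustration, not the VOMS attribute format.

```python
# Sketch: a service grants access only when the proxy's VO attributes carry
# membership of the required VO with the required role.
def authorize(proxy_attrs, required_vo, required_role):
    """Grant only if the proxy carries membership of the VO with the role."""
    return (proxy_attrs.get("vo") == required_vo
            and required_role in proxy_attrs.get("roles", []))

# A proxy certificate's VO attributes, reduced to a dictionary for illustration.
proxy = {"vo": "bioinformatics", "roles": ["analyst"]}

print(authorize(proxy, "bioinformatics", "analyst"))  # True
print(authorize(proxy, "bioinformatics", "admin"))    # False: role not held
```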
3.5.8. Integrated AuthN & AuthZ
The control flow of integrated authentication and authorization using MyProxy and
VOMS is captured in figure 14. Garuda has a centralized MyProxy server and two VO
servers, namely VO1 and VO2, installed on some of the head nodes.
[Figure 14: Flow of Integrated AuthN and AuthZ in Garuda — head nodes in VO 1 and
VO 2 send a grid credential request to the MyProxy server and receive proxy
credentials; a VO credential request to the corresponding VO-1 or VO-2 server
returns VO credentials with roles.]
4. Job Management in GARUDA
A key area in grid computing is job management, which typically includes planning a
job's dependencies, selecting the execution cluster(s) for the job, scheduling the
job at the cluster(s), and ensuring reliable submission and execution.
Garuda provides a mechanism to reserve resources for their assured availability,
which enables the smooth completion of jobs. It also provides functions to capture
accounting information once a job ends, to create Usage Records (UR) and Resource
Usage Services (RUS) as standardized by the OGF (Open Grid Forum).
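As a sketch of that accounting step, a minimal usage record built when a job ends
might look like this. The field names loosely echo the spirit of the OGF Usage
Record; the exact schema, the derived charge metric and the sample values are
illustrative assumptions.

```python
# Sketch: assemble a minimal usage record after a job completes.
def make_usage_record(job_id, user_dn, cluster, wall_seconds, cpus):
    return {
        "JobId": job_id,
        "UserIdentity": user_dn,
        "MachineName": cluster,
        "WallDuration": wall_seconds,
        "Processors": cpus,
        "CpuSeconds": wall_seconds * cpus,  # simple derived charge metric
    }

# A hypothetical one-hour, 8-CPU job.
ur = make_usage_record("job-42", "/C=IN/O=IGCA/CN=A User", "cluster-A", 3600, 8)
print(ur["CpuSeconds"])  # 28800
```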
4.1. Garuda Job Management Interfaces
Job management in Garuda is handled by the user portal, the Command Line Interface,
the meta-scheduler, the local resource managers and the various other components
that constitute the Garuda middleware.
The access points for GARUDA users for job submission are the GARUDA portal and the
Command Line Interface (CLI). The Garuda Access Portal is also accessible over the
Internet, so that users not belonging to the Garuda network can submit jobs on any
of the Garuda resources. Other job submission mechanisms, such as Problem Solving
Environments (PSE) and workflow tools, are accessible through the portal.
[Figure 15: Garuda Job Management Interfaces — users reach the Garuda portal (and
through it the PSE and workflow tools) over the Internet in the client environment,
or use the CLI from within the Garuda network; requests pass through the Garuda
middleware to the LRMs and on to the compute nodes.]
Both the PSE, an environment ready for specific problems that have already been
enabled on the grid, and the workflow tools, in which programs, data and I/O
relations together with their execution sequence are defined and submitted as a
job, can be invoked through the Garuda portal. Using the resource reservation
interfaces available on the head nodes, Garuda compute nodes can be reserved to run
jobs during specified time slots.
Jobs submitted on any of the resources in GARUDA can be tracked and queried for job
status using the portal. Users can also make use of the Command Line Interface
(CLI) available on the Garuda head nodes to log in to the GARUDA Grid and submit
their jobs on the resources.
4.2. Types of jobs supported in Garuda
Users can submit jobs which may include, but are not restricted to, the following
types:
o Serial/sequential jobs
o Array jobs
o Client/server jobs
o Data transfer jobs
o Parallel/distributed jobs
o Distributed jobs without a parallel programming paradigm
o MPI jobs running on a single cluster
o MPI jobs distributed across different clusters
Jobs requiring a specific software environment for execution can be accommodated on
selected clusters, on request, on a case-by-case basis. Currently the Garuda job
submission mechanism does not support the submission of interactive and real-time
jobs, which require user intervention during program execution.
4.3. Components involved in Job Management
The job submission portal and the command line tools accept job requests from end
users and pass them to the GARUDA meta-scheduler (GridWay). The meta-scheduler
takes care of workload management in the GARUDA grid.
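As an illustration of this submission path, GridWay describes jobs through job
templates. The fragment below is a generic GridWay-style template; the executable
and file names are placeholders, not taken from this document:

```
EXECUTABLE   = my_app
ARGUMENTS    = input.dat
INPUT_FILES  = input.dat
OUTPUT_FILES = output.dat
STDOUT_FILE  = job.out
STDERR_FILE  = job.err
```

Such a template would typically be submitted with the standard GridWay command
gwsubmit and monitored with gwps.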
[Figure 16: Various components of Garuda Job Management — Garuda job submission
feeds GridWay, which drives Globus and the Local Resource Manager (with a failover
cluster); accounting hooks at each stage feed a centralized accounting DB with a
replica DB.]
The Reservation module is well integrated into the Garuda Access Portal. The
module also offers command line tools which can help in creating advance
resource reservations. The meta-scheduler can schedule the jobs on reserved
resources based on a prior Resource Reservation Identifier.
• Job execution is achieved using the Globus GRAM services.
• The super scheduler submits jobs to the GRAM components available
  on the Head Nodes, after querying the information system to find a
  suitable resource candidate on which the job can potentially be executed.
• Job execution is carried out on the computing cluster with the help of the
  Local Resource Managers, namely PBS/Torque and LoadLeveler.
• The Accounting module integrates with GridWay, LRM and Globus
  components to gather usage information and store it in a centralized
  database.
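The match-then-submit flow described above can be sketched as follows. The information-system schema (host mapped to free CPU count) and the returned submission record are illustrative assumptions only, not the real GRAM or MDS interfaces.

```python
def find_candidate(info_system, requirements):
    """Query the information system for the first resource meeting
    the job's requirements (here simplified to a CPU count)."""
    for host, attrs in info_system.items():
        if attrs["free_cpus"] >= requirements["cpus"]:
            return host
    return None

def submit(info_system, job):
    host = find_candidate(info_system, job["requirements"])
    if host is None:
        raise RuntimeError("no suitable resource for job")
    # In GARUDA this step would hand the job to the Globus GRAM
    # service on the selected head node; here we only record it.
    return {"job": job["name"], "gram_host": host}

info = {"head-a.garuda": {"free_cpus": 2}, "head-b.garuda": {"free_cpus": 16}}
print(submit(info, {"name": "sim01", "requirements": {"cpus": 8}}))
```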
4.4. Data staging
Data staging for job execution in Garuda is primarily handled through GridFTP,
which builds on the FTP protocol. Users can also use RFT, which in turn uses
GridFTP. The storage locations from which to take input and to which to copy
output are specified by the user through the meta-scheduler. Application input
and output are moved into scratch space during execution. After execution is
complete, outputs are moved to the storage specified by the user. These data
need to be moved to permanent locations as early as possible, failing which
Garuda does not guarantee the availability of the data for future reference.
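The stage-in / execute / stage-out lifecycle described above can be laid out as a simple transfer plan. The `gsiftp://` URL form is GridFTP's usual scheme; the scratch path and the `plan_staging` helper are assumptions for illustration.

```python
import posixpath

def plan_staging(input_url, output_url, scratch="/scratch/garuda"):
    """Build the transfer plan for one job: input is staged into
    scratch space, the job runs there, and the output is staged out
    to the user-specified storage."""
    in_name = posixpath.basename(input_url)
    out_name = posixpath.basename(output_url)
    return [
        ("stage-in", input_url, posixpath.join(scratch, in_name)),
        ("execute", scratch, None),
        ("stage-out", posixpath.join(scratch, out_name), output_url),
    ]

for step in plan_staging("gsiftp://sg01.garuda/data/in.dat",
                         "gsiftp://sg01.garuda/results/out.dat"):
    print(step)
```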
4.5. Super scheduler
According to this new architecture, GridWay, the meta-scheduler, is installed
on all the head nodes. GridWay is customized to support Garuda's resource
reservation module by introducing wrappers for some of the GridWay
commands like ……. The customized GridWay needs to recognize the virtual
organization concept and identify resources according to the user's
membership in a particular VO before scheduling the user's job in Garuda.
GridWay has been configured to look for a resource match primarily in the
corresponding Regional Information Server. An unsuccessful search leads
GridWay to query the grid-wide centralized Information Server. This feature is
achieved by modifying the MAD module of GridWay. Connection failures to the
centralized information server are overcome by searching the pool of
regional information servers. This deployment model ensures high availability
of resource information and allied services for job submission in GARUDA.
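The failover order just described (regional server first, then the centralized server, then the remaining regional pool) can be sketched directly. Each server is modelled as a callable returning a match or `None`, with a raised `ConnectionError` standing in for an unreachable server; this is a behavioural sketch, not the MAD module's real API.

```python
def lookup(job_req, regional, central, regional_pool):
    """Resolve a resource match with the GARUDA failover order:
    1. the node's own Regional Information Server,
    2. the grid-wide centralized server,
    3. the remaining pool of regional servers."""
    servers = [regional, central] + list(regional_pool)
    for server in servers:
        try:
            match = server(job_req)
        except ConnectionError:
            continue            # server unreachable, try the next one
        if match is not None:
            return match
    return None

def regional(req):
    return None                 # no match in the local regional server

def central(req):
    raise ConnectionError       # centralized server is down

def pool_server(req):
    return "cluster-x"          # match found elsewhere in the pool

print(lookup({}, regional, central, [pool_server]))  # cluster-x
```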
4.6. Resource Reservation and Management (RRM)
The advance reservation facility in the grid ensures the availability of resources
required by a user or an application at future times. A resource reservation in
the grid, which can be independent of a particular job, can be requested by a
user or administrator. It is granted by the reservation management system
based on the privileges of the user or application, and in accordance with the
policies enforced on the requested / matched resources. The reservation causes
the associated resources to be reserved for the specified user, administrator, or
application.
4.6.1. RRM Vision
o RRM should allow the use of reserved resources for the specified
  duration without any influence from subsequent reserved and
  non-reserved jobs.
o Avoid non-utilization / mis-utilization of reserved resources by
  employing suitable mechanisms.
o Some resources should be made available to run non-reserved jobs
  also.
4.6.2. Reserved job
The Garuda middleware should handle co-allocation jobs as reserved ones;
MPI jobs demanding no co-allocation are to be treated as non-reserved.
Co-allocation jobs are jobs which run on multiple computing resources
concurrently to obtain solutions. To ensure their operation, reserved jobs
are to be assigned a higher priority than non-reserved ones.
Once a resource reservation is established for a reserved job, the job should
be started at the time and date specified during reservation. Failing this, the
reservation will be automatically cancelled after a certain period of time; this
elapse time should be configurable. Jobs should strictly adhere to the
maximum runtime for which the resource is booked.
4.6.3. Non-reserved job
Non-reserved jobs run without a usage time reservation, according to the
operation policy in force (First Come First Served, Fair Share, etc.). These
non-reserved jobs may be co-located with reserved jobs. They are scheduled
to run using free resources and the free time around established reservations;
however, running non-reserved jobs are to be cancelled when any reserved
job is submitted or run.
To minimize the impact of this feature, it is recommended to set aside a
computing resource which cannot be reserved, to assure an environment for
running non-reserved jobs. If an attempt is made to run a reserved or
non-reserved job exceeding the specified duration, the job is forcibly
terminated.
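The preemption rule above (reserved jobs cancel running non-reserved ones) can be captured in a small admission sketch. The job records and the `admit` function are hypothetical; a real scheduler would also checkpoint or requeue the cancelled jobs.

```python
def admit(running, new_job):
    """Admit a job under the GARUDA priority rule: a reserved job
    preempts (cancels) running non-reserved jobs, while a
    non-reserved job simply joins the running set."""
    cancelled = []
    if new_job["reserved"]:
        cancelled = [j for j in running if not j["reserved"]]
        running = [j for j in running if j["reserved"]]
    return running + [new_job], cancelled

running = [{"name": "batch1", "reserved": False},
           {"name": "rsv1", "reserved": True}]
running, cancelled = admit(running, {"name": "rsv2", "reserved": True})
print([j["name"] for j in cancelled])  # ['batch1']
```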
4.6.4. Architecture of Garuda RRM
The reservation architecture in Garuda is achieved using a two-level
implementation approach, namely grid-level and cluster-level components.
The block diagram depicting the components involved in RRM of the Garuda
grid is captured in Figure 17.
Figure 17: Components of Garuda Resource Reservation
The cluster reservation component, responsible for enforcing reservations at
the individual cluster along with the cluster reservation manager and
scheduler, guarantees the mapping of grid reservations onto the cluster
compute nodes. These components will reside on every cluster participating
in Garuda.
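A minimal sketch of the two-level idea, under the assumption that the grid-level component selects a cluster and records the grant in the reservation database while the cluster level maps it to concrete nodes; the `reserve` function, database shape, and identifier format are all illustrative.

```python
def reserve(grid_db, clusters, request):
    """Two-level reservation: the grid-level component picks a
    cluster with enough free nodes, the cluster level maps the
    request onto concrete compute nodes, and the grant is recorded
    in the (replicated) reservation database."""
    for name, free_nodes in clusters.items():
        if len(free_nodes) >= request["nodes"]:
            granted = free_nodes[:request["nodes"]]
            rid = f"RSV-{len(grid_db) + 1}"
            grid_db[rid] = {"cluster": name, "nodes": granted,
                            "slot": request["slot"]}
            return rid
    return None   # no cluster can honour the request

db = {}
clusters = {"kp-blr": ["n1", "n2"], "ctsf-blr": ["n1", "n2", "n3", "n4"]}
rid = reserve(db, clusters, {"nodes": 3, "slot": "2010-02-01T10:00/4h"})
print(rid, db[rid]["cluster"])  # RSV-1 ctsf-blr
```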
4.7. Accounting Server
Most commercial organizations are not likely to go on the Grid unless they are
paid for the resources they provide. In order to enable economic compensation,
it is necessary to keep track of the resources utilized by Grid users. This is
where Grid accounting enters the picture.
The information acquired through Grid accounting can serve several useful
purposes. It can:
• Form a basis for economic compensation. Once usage information is
  available, the resource provider could apply a transformation to convert
  resource usage into some monetary unit. Direct payments could also be
  envisioned, where resource usage is charged for immediately, e.g., by
  means of credit card transactions or bank payment gateways.
• Be used to enforce Grid-wide resource allocations. E.g., resources might
  only grant access to users whose current resource allocation has not been
  used up.
• Allow tracking of user jobs. Users can obtain information about a
  submitted job, such as where it was submitted, the resources it consumed
  and the output it generated.
• Enable evaluation of resource usage. Resource providers need to be able
  to determine to what extent different resources have been utilized.
  Furthermore, administrators can obtain information about which jobs
  executed on a specific resource at a certain time. Such information can be
  useful when debugging programs or tracking malicious users.
• Be used by resources to dynamically assign priorities to user requests
  based on previous resource usage.
4.7.1. Architecture of Garuda Accounting Service
GARUDA is deployed with Globus, which supports accounting and auditing
facilities, as its middleware. Through this facility, accounting information
related to jobs submitted to Globus can be captured into a centralized server
using a PostgreSQL database.
Similarly, information such as CPU and memory utilization, execution time,
etc., related to the jobs submitted to the LRM can also be configured to flow
into the central server by writing a custom module. The most important
activity is to identify the mapping between Globus and LRM job ids; once
identified, joining these two tables yields the complete accounting
information.
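The join on the Globus-to-LRM job-id mapping can be demonstrated with an in-memory SQLite stand-in for the central PostgreSQL database; the table and column names (`globus_jobs`, `lrm_jobs`, `lrm_id`) are assumed for illustration, not the actual schema.

```python
import sqlite3

# In-memory stand-in for the central PostgreSQL accounting database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE globus_jobs (globus_id TEXT, lrm_id TEXT, user TEXT);
CREATE TABLE lrm_jobs   (lrm_id TEXT, cpu_time REAL, memory INTEGER);
INSERT INTO globus_jobs VALUES ('g-001', 'pbs.17', 'alice');
INSERT INTO lrm_jobs    VALUES ('pbs.17', 3600.0, 2048);
""")

# Joining on the Globus-to-LRM job-id mapping yields the complete
# accounting record for each grid job.
row = con.execute("""
    SELECT g.user, g.globus_id, l.cpu_time, l.memory
    FROM globus_jobs g JOIN lrm_jobs l ON g.lrm_id = l.lrm_id
""").fetchone()
print(row)  # ('alice', 'g-001', 3600.0, 2048)
```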
Figure 18: Components of Garuda Accounting Service
A service interface needs to be developed to give the requested information to
individual clients after querying the database with user-selected filters.
These queries may be on usage details on a "Per User" or "Per Resource"
basis. This service model provides the much needed modularity, so that
changes in individual clients and their access methods can be avoided even if
the accounting database schema changes in the future.
4.8. Login and compiler services
The Garuda grid being built using a Service Oriented Architecture, some of the
essential facilities of Garuda need to be exposed to the user through service
components. The immediate requirement is that of the Login and Compiler
services.
Since Garuda can potentially host various kinds of large resources and
applications belonging to various user groups, there exists a need to provide
single sign-on, so as to avoid requiring the user to go through the
authorization mechanism for each resource within a session. Moreover, this
service should be built with the facility to accommodate the various types of
authentication (AuthN) and authorization (AuthZ) mechanisms that Garuda
may host in the future. It needs to take minimum input parameters from the
user and give a single identifier which will be recognized across Garuda for
authentication and authorization purposes.
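One way to picture a single sign-on service with pluggable AuthN mechanisms is sketched below. The `LoginService` class, its token format, and the password mechanism are purely illustrative assumptions; Garuda's actual implementation would typically rest on grid proxy credentials rather than ad-hoc tokens.

```python
import hashlib
import time

class LoginService:
    """Issues one session identifier per sign-on; per-resource
    authorization then checks the token instead of re-running the
    full AuthN/AuthZ exchange. New mechanisms plug in as callables
    keyed by name."""

    def __init__(self, authenticators):
        self.authenticators = authenticators   # name -> callable
        self.sessions = {}                     # token -> user

    def login(self, mechanism, user, credential):
        if not self.authenticators[mechanism](user, credential):
            return None
        token = hashlib.sha256(f"{user}:{time.time()}".encode()).hexdigest()
        self.sessions[token] = user
        return token

    def authorize(self, token):
        """Map a session token back to the signed-on user, if any."""
        return self.sessions.get(token)

svc = LoginService({"password": lambda u, c: c == "secret"})
tok = svc.login("password", "alice", "secret")
print(svc.authorize(tok))  # alice
```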
As Garuda is made up of different resources pooled from partners with a
variety of operating systems and LRMs, building an application is a significant
part of application development for Garuda. In order to hide the complexities
derived from Garuda's heterogeneous nature, a compiler service needs to be
deployed to build applications quickly.
Apart from having the normal features of a grid service, the Compiler service
is expected to provide features such as: a provision to modify the makefile,
even though it can generate a makefile on its own after getting the necessary
requirements from the user; a resource selection facility that helps the user
select the best matching resource on which to build his/her application; and
inbuilt intelligence to eliminate unsuitable resources based on availability and
the user's requirements for special libraries, software, etc.
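The resource elimination step can be sketched as a simple filter over a catalogue of build hosts; the catalogue shape and the `eligible_build_hosts` helper are assumptions for illustration.

```python
def eligible_build_hosts(resources, needs):
    """Drop resources that are unavailable or that lack a library
    or tool the user's application requires to build."""
    return [name for name, info in resources.items()
            if info["available"] and needs <= set(info["software"])]

resources = {
    "cluster-aix":   {"available": True,  "software": ["xlf", "loadleveler"]},
    "cluster-linux": {"available": True,  "software": ["gcc", "mpich", "pbs"]},
    "cluster-down":  {"available": False, "software": ["gcc", "mpich"]},
}
print(eligible_build_hosts(resources, {"gcc", "mpich"}))  # ['cluster-linux']
```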
5. Fault tolerance mechanisms
Considering the heterogeneity and geographical distribution of Garuda
resources, it is very difficult to ensure that all resources are always available.
In such a scenario, failover / fault tolerance becomes important so that users
do not face any difficulty in using the resources. A very important aspect is
that the user should not be aware of the change and should not feel any
hitches during their utilization. Triggering a sequence of actions upon a
service or resource failure, sending an alert, and managing the entire process
also play a vital role in a smoothly functioning operational Garuda.
Every individual tool or component of Garuda can have its own fault-tolerance
mechanisms; however, the mechanisms for some of the components that form
an essential part of the architecture are captured here.
All the tools developed as part of this architecture which require a centralized
database will replicate it at a geographically different location with a proper
synchronization mechanism. However, it is not mandatory that all the tools
use physically the same database server, as the current architecture does not
envisage providing the database itself as a service.
5.1. Information Service
In the case of the Information Server there will be a backup Information
Server, which will also follow the pull mechanism and pull the information
data available with the Regional Information Servers. The same FQDN will be
assigned to both machines; the DNS server needs to be tweaked to resolve to
the backup server once the first machine fails. As resource information in a
grid is dynamic in nature, there will not be any stale information in either
server.
5.2. Resource Reservation System
The failover mechanism of the Garuda Reservation System needs to be dealt
with in two parts. Failure of the grid-level and cluster-level reservation
program components can be redressed easily, as these components are
deployed in each and every cluster participating in Garuda; to overcome such
a failure, the user can log in to another cluster and continue his activity
without facing any problems.
The second part is failure of the Reservation Database System, which needs
to be solved by adhering to replication methods. The correctness of this is
highly dependent on the synchronization mechanism that is deployed.
5.3. Monitoring, Alert and Event triggering
Operational Garuda is deployed with a centralized Nagios, public domain
software, for comprehensive monitoring of all Garuda infrastructure
components, including applications, services, operating systems, network
protocols, system metrics, and network infrastructure. Nagios needs to be
configured to alert Garuda administrators via email and SMS. The multi-user
notification escalation capabilities of Nagios ensure alerts reach the attention
of the right people.
Nagios can remediate problems if event handlers are configured to
automatically restart failed applications, services, servers, and devices as soon
as problems are detected. The extendable architecture of Nagios provides
methods for easy integration of in-house developed and third-party
applications. Using this facility, Garuda can write its own plug-ins to gather
information relevant to supporting QoS.
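Nagios event handlers are ordinary scripts that receive the service state via macros such as `$SERVICESTATE$` and `$SERVICESTATETYPE$`. The sketch below shows the usual pattern of acting only on a HARD CRITICAL state; the restart itself is merely reported here, and the script name and service argument are hypothetical.

```python
#!/usr/bin/env python
import sys

def handle(state, state_type, service):
    """Decide the remedial action for a Nagios event-handler call:
    act only when the failure is confirmed (HARD CRITICAL), so soft
    transient states do not trigger restarts."""
    if state == "CRITICAL" and state_type == "HARD":
        return f"restarting {service}"
    return "no action"

if __name__ == "__main__" and len(sys.argv) >= 4:
    # Nagios would invoke this as, e.g.:
    #   handler.py $SERVICESTATE$ $SERVICESTATETYPE$ gridftp
    print(handle(sys.argv[1], sys.argv[2], sys.argv[3]))
```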
Nagios helps in meeting accepted SLAs by providing historical records of
outages, notifications, and alert responses for later analysis.