Presentation - Information Services
Role of advanced e-infrastructure in Scientific Collaboration and Big Discovery
Subrata Chattopadhyay
CDAC, Bangalore
[email protected]
18-Mar-15

Outline
• C-DAC and background
• Recent Big Experiments and discovery
• Role of e-Infrastructure – HPC / Grid / Cloud
• G.A.R.U.D.A - platform for collaboration
• Use Cases – Applications
• Conclusion

C-DAC
Centre for Development of Advanced Computing (C-DAC) is the premier R&D organization of the Department of Electronics and Information Technology (DeitY), Ministry of Communications & Information Technology (MCIT) for carrying out R&D in IT, Electronics and associated areas.
• Established in 1988
• Spread over 11 cities, about 3000 employees
• Eleven Centres

Thematic Areas
• High Performance Computing, Grid and Cloud Computing
• Multilingual Computing and Heritage Computing
• Professional Electronics including VLSI and Embedded Systems
• Software Technologies, including FOSS
• Cyber Security and Cyber Forensics
• Health Informatics
• Education and Training

Significant Achievements: HPC, Grid & Cloud Computing
• C-DAC's PARAM YUVA-II: India's fastest and the first supercomputer to cross 500 TeraFlops peak performance
• Ranked No. 44 in the Green500 list of World's Supercomputers announced in November 2013. It is the No. 1 system in India and No. 9 in Asia Pacific as per this list
• Bio Blaze: an exclusive Supercomputer for Bioinformatics research for better diagnosis of diseases and discovery of new drugs
• Meghdoot Cloud Stack: A free and open source cloud stack
• Launched PARAM Shavak – Supercomputer in a Box on 25th December 2014 by Shri Ravi Shankar Prasad, Hon'ble Minister of Communications and IT

National HPC Facilities
NPSF @ Pune; Biogene; CTSF @ Bangalore; Biocrome System; Bioinformatics Resources & Applications Facility (BRAF), Pune

Scientific Discovery
• Observational to Computationally-intensive research
• Computer simulations reconcile the inductive and deductive approaches of the Scientific Method

Time scale(s)
[Figure: timeline with labels LHC simulation, LHC approval, LHC start, LHC End (???); CLIC simulations, CLIC Approval (??), CLIC start, CLIC End (?????)]

Square Kilometre Array - SKA
• Next generation radio telescope
• Large multi national Project
• 100 x more sensitive
• 1000000 x faster
• 5 square km of dish over 3000 km
• Cost Euro 1.5 Billion; construction start in 2018; partially ready in 2020, fully in 2025
• 10 member countries, India is Assoc. member
• Currently the world's most ambitious IT project
• First real exascale ready application
• Largest global big-data challenge

SKA is a cosmic time machine
Cosmic Questions:
• Universe not eternal - what beginning and what end?
• Shape – Sphere / Saddle / Flat
• Multiverse?
Universe:
• 0.5% planets & stars
• 4% gas
• 24% Dark matter
• 71.5% Dark energy

Science data processor pipeline
[Figure: SKA 1 / SKA 2 processing pipeline. Stage labels: Incoming Data from collectors, Coarse Delays, Fine F-step / Correlation, Correlator, Beamformer, Corner Turning, Buffer store, Switch, Bulk Store. Imaging path: Visibility Steering, UV Processor, Gridding Visibilities, Observation Buffer, Image Processor, Imaging, Image Storage. Non-imaging path: Beam Steering, Beamforming / De-dispersion, Time-series Searching, Search analysis, Object/timing Storage. Also labelled: HPC science processing, Software complexity. Annotated rates and capacities: 10 Tb/s, 1000 Tb/s, 10/1 TB/s, 50 PB, 10 Pflop, 100 Pflop, 200 Pflop, 1 Eflop, 10 Eflop, 1 EB/y, 10 EB/y.]

Thirty Meter Telescope (TMT) Project
• Time line
– 2004: project start, design development
– 2009: preconstruction phase
– 2011: start construction
– 2018: complete, first light, start AO science
• Partnership
– UC
– Caltech
– Canada
– Japan
– India
– China
• Cost – 970M$
GTC 2009Jul25

About TMT
• The project was conceived in the year 2004
• USA, Canada, Japan, India and China are the participating countries
• Construction: 30 meter diameter primary mirror. The mirror consists of 492 smaller (1.4 m), hexagonal mirrors.
• The shape of each segment, as well as its position relative to neighbouring segments, is controlled actively.
• A 3 m secondary mirror produces an unobstructed field-of-view of 20 arc minutes in diameter
• Makes use of Adaptive Optics
• Scientific instrumentation for gathering information apart from images

[Figure: TMT, Mauna Kea]

BRAIN Initiative
• Announced by US president in 2013
• Mapping and understanding the most complex organ – 100 Billion Neurons
• The Brain Research through Advancing Innovative Neurotechnologies Initiative (BRAIN Initiative) is a broad, collaborative research initiative to unlock the mysteries of the human brain

The BRAIN Initiative: Surviving the Data Deluge
• Mapping brain activity will produce nearly as much data as the Large Hadron Collider, yet managing the sheer volume of information will be the simplest challenge for brain data managers.
• BRAIN Initiative spans biology, physical sciences, engineering, computer science, and the social and behavioral sciences.
• Research - the development of molecular-scale probes that can sense and record the activity of neural networks; advances in "Big Data" to analyze the huge amounts of information, mainly to understand how thoughts, emotions, actions, and memories are represented in the brain.
• Fund: $300 million/year for next 10 years – by 3 federal agencies (NIH, DARPA and NSF) and also 4 private research institutes

Role of e-Infrastructure

Grid Computing
[Figure: applications (Climate Modeling, Disaster Management, Bio Informatics, CFD, Cryptanalysis) on top of Grid Middleware, with sites GG-BLR, GG-CHE, GG-HYD, TF BLR, TF PUNE, IITD, PRL, YUVA]

Grid Computing
• Sharing of resources among the community
• Seen as a collective pool
• Heterogeneous
• Geographically distributed
• Different Administrative domains
• Wide variety of Tools, Interfaces to choose from
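
The grid characteristics listed above (a collective pool of heterogeneous, geographically distributed resources owned by different administrative domains) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the `Resource`, `Job` and `match` names, the architecture strings and all values are hypothetical and do not describe any real site or any particular grid middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One compute resource contributed to the shared pool (hypothetical model)."""
    name: str
    domain: str      # administrative domain that owns the resource
    location: str    # resources are geographically distributed
    arch: str        # the pool is heterogeneous, e.g. "linux-x86_64" or "aix-power"
    free_cpus: int


@dataclass(frozen=True)
class Job:
    name: str
    arch: str
    cpus: int


def match(job, pool):
    """Return the resources able to run the job, most free CPUs first."""
    candidates = [r for r in pool if r.arch == job.arch and r.free_cpus >= job.cpus]
    return sorted(candidates, key=lambda r: r.free_cpus, reverse=True)


# Illustrative values only; not measurements of any real site.
POOL = [
    Resource("cluster-a", "institute-1", "city-x", "linux-x86_64", 64),
    Resource("cluster-b", "institute-2", "city-y", "aix-power", 128),
    Resource("cluster-c", "institute-3", "city-z", "linux-x86_64", 256),
]

if __name__ == "__main__":
    job = Job("climate-run", "linux-x86_64", 100)
    for r in match(job, POOL):
        print(r.name, r.domain, r.location)
```

The user sees one pool and asks only for an architecture and a CPU count; which domain or city ends up running the job is decided by the matching step. Real grid middleware adds security, data staging and queueing on top of this idea.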

Components of Grid Middleware

Popular Middleware
• Globus – Globus Alliance
• GridBus – University of Melbourne
• UNICORE - Uniform Interface to Computing Resources
• gLite – CERN / EGEE / EGI
• Legion – (Avaki - Corporate Distributor)
• Alchemi – (.NET Grid Computing Framework)
• Condor
• SGE

GARUDA
• 70+ Partners
• 6000 CPUs – 550TF
• 1700+ Certificates
• 220TB Storage
• EGI, CHAIN-REDS, caBIG, NKN
GARUDA – India's national grid computing initiative bringing together academic, scientific and research communities for developing their data and compute intensive applications.

National Knowledge Network (NKN)
Emerging Advanced Network
• Themes: Virtual Classrooms, Remote Medical Diagnosis, Collaborative Research, Grid Computing
• Multi-10G Core Backbone
• Typically 1G at the Edge
• 1000+ Institutes Connected of 1500 Approved
• Eventually Connect 255,000 Villages
• Layers: Core, Distribution, Edge
• International: Mumbai-CERN Link: D. Foster Initiative (from 2008); now 2G guaranteed, bursting to 10G
• 2013: Transition to an International 10G Infrastructure

– Indian Grid Certification Authority (IGCA) located at C-DAC, Knowledge Park, Bangalore, India.
– IGCA is the accredited member of APGridPMA.
– Issues X.509 Certificates to support the secure environment in Grid (for GARUDA, institutes in India that do research in grid, and foreign institutes that collaborate with GARUDA).
– http://ca.garudaindia.in
• Certificates Issued: 1749
• Valid Host Certificates: 41
• Registration Authorities: 45

[Figure: GARUDA architecture. User access: CLI, Workflows, Grid PSE, Access Portal, Cloud Interface, Hand held devices. Services: Federated Information Server, Programming Development Environment, Job Scheduler. Middleware: WSRF+GT4 + other Services + Cloud S/W (Nimbus / VMware), Virtualization support. Grid Security and High-Performance Grid Networking: NKN. Computing Resources and Virtual Organizations: CDAC Resource centers, Research Organizations, Non-Research Organizations, Educational institutions, Computing Centers.]

[Figure: GARUDA components. Security, Middleware, Resource Management, User Environments, Programming Environments, Data Grid, Resource Enabler & Monitoring, GARUDA-enabled Applications, Visualization.]

GARUDA tools and services
• Garuda Access Portal
• GSRM
• Paryavekshanam
• Garuda Information Registry
• Garuda GridFTP
• Globus Online
• AGSG
• PSP
• Scilab
• Galaxy Workflow – OSDD
• Garuda Megha
• VRGeo
• Garuda User Forum

CDAC Resource:
• 4TF HPC clusters each at Bangalore, Chennai & Hyderabad
• PARAM Yuva II at Pune and PARAM Padma at Bangalore
• Fourteen of the partner institutions are also contributing resources including satellite terminals.
• Total computing power is more than 6000 CPUs, equivalent to 550TF
• Storage space 220 TB

Job Flow
[Figure labels: USER, RSL Job Template, Gridway, GLOBUS, LRM SCRIPT, GRIDFS; sites GG-CHE, IMSC, IITD, TF PUNE, GG-BLR, TF BLR, GG-HYD. Output & Error files are available to user.]

DMSAR Processing in Grid
DMSAR – Disaster Management using Synthetic Aperture Radar
Components of a Grid-enabled SAR System
• Data Acquisition & Raw data Transfer
– Transferring the captured data into disaster Ground Unit
– Raw data transmission to the GARUDA Grid Head Node
• Raw data Splitting & Initialization
– TB of Input Raw data @ t1 at the Bangalore Grid Head Node
– Splitter Programme
– Bangalore P-1 (Linux Cluster, AIX Cluster), Delhi P-2 (Linux Cluster), Chennai P-3 (Linux Cluster), Pune P-4 (Linux Cluster)
• Backward Transfer of Results
– Linux Clusters at Pune, Delhi and Chennai; Ban-HN A/L Cluster; Visualization Server @ Bangalore
– Using GARUDA High Speed Network resources
– Using G-SAT resources
• Disaster Remote Visualization
– Grid based Real time Remote Visualization Setup
– Windows based tool interfaced with Grid Setup

Bioinformatics: Open Source Drug Discovery
Project Team: OSDD community
[Figure labels: OSDD User Community, OSDD Customized Galaxy, OSDD HeadNode (Garuda Middleware Stack, login service, Gridway Metascheduler), DB, Ext DB, Internet / NKN, GARUDA Grid, Garuda Middleware Stack, JNU Cluster, Yuva Cluster (LRM - Torque), GGHYD Cluster, Other OSDD Cluster, OSDD Tools – weka, cdk, …]
• Galaxy Workflow for genomics proteomics applications
• Distributed job execution through Gridway
• HPC clusters to run drug discovery problems
• Users connected through both NKN and Internet
• OSDD users given access to Garuda through OSDD VO
• Grid Enabled Bioinformatics tools useful in drug discovery pipeline

CAE: Aeroacoustics Optimization
Project Team: Zeus Numerix
Aim: Optimize the noise generated by a 3-D wing with flaps in landing configuration by variation of flap location and orientation.
• Uses Kepler workflow framework integrated with native Globus job submission routines
• Optimization module uses the OPT4J framework
• Optimization module includes the AFFG (Adaptive Fuzzy Fitness Granule) routine, which can reduce the number of fitness function evaluations by up to 50%.
• Successfully completed 40 simultaneous simulations (parallel + serial)

Scilab
• Open source, cross-platform numerical computational package and a high-level, numerically oriented programming language.
• In collaboration with IITB
• scilab.in accesses Megha for executing Scilab code and rendering graphics
• Many textbook examples are solved and available as part of the Textbook Companion project

VRGeo
Open-Source Collaborative Mapping Platform for Crowdsourcing Geospatial information

The Global Grid… and the "non-Global" middleware
[Figure labels: CNGrid, Genesis II, NKN & Garuda, GISELA, SAGrid & SANREN, EUAsiaGrid. Courtesy: Roberto Barbera, INFN]

Collaboration in CHAIN-REDS
2,800 people outreached in total

• Serving applications of National Importance
– Alliance with the Open Source Drug Discovery (OSDD) project of CSIR
– Disaster management applications
– Weather forecasting models & Earthquake engineering
– Applications from the fields of Bioinformatics, CAE & Material sciences
• First in India
– Setting up of Indian Grid Certification Authority (IGCA) in 2009, to issue digital certificates for grid researchers in India
– Digital certificates trusted by other International Certification authorities
– Issued more than 1400 IGCA certificates
• Global Integration
– Integrated with the European Grid Infrastructure through the EU-India Grid
– Achieved middleware interoperability between the European gLite middleware & Garuda middleware components

Conclusion
• C-DAC is leading key developments in HPC, Grid and Cloud
• Advanced e-infrastructure plays a critical role in big scientific discovery
• Garuda, a unique platform in India, provides opportunity for R&D collaboration in order to solve national problems
• Garuda also aims to accelerate international collaboration for research in next generation technology

Applying Advanced Computing for Human Advancement
Thank you
www.cdac.in