Research proposal

HIGH PERFORMANCE COMPUTING FOR MOLECULAR DYNAMICS ANALYSIS:
PART OF INDONESIAN HERBAL PHARMACOLOGICAL SCREENING ACTIVITIES, AN IN SILICO STUDY
Heru Suhartanto1, Arry Yanuar2
Ari Wibisono1, Muhammad Hilman1, Surya Darma3
1: Faculty of Computer Science, University of Indonesia (UI)
2: Dept. of Pharmacy, UI
3: Dept. of Physics, UI
Presented at SEAIP 2010,
It will be available at http://hsuhartanto.wordpress.com
OUTLINE
• Introduction
• Molecular docking and dynamics
• Indonesian Higher Education Networks
• InGrid (Indonesian/Inherent Grid)
• Tentative HPC performance in Molecular Dynamics
• Challenges and prospects
MOLECULAR DYNAMICS
[Figures: molecule (source: 4), atom (source: 6), protein (source: 5)]
MOLECULAR DYNAMICS
[Diagram labels: molecular dynamics simulation; drug discovery; understanding molecular structure; trajectory (position, momentum, time); in vitro. Source: 10)]
PROTEIN SIMULATION
Curcumin
Curcuma longa
Inhibitor
Anti-cancer compound
Inflammation
MOLECULAR DOCKING AND VIRTUAL SCREENING
• Molecular docking is a computational procedure that attempts to predict the non-covalent binding of macromolecules.
• The goal is to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
• The prediction is based on information embedded in the chemical bonds of the substances.
• AutoDock Vina is used in the simulation.
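As a rough illustration of how one docking run might be set up, the sketch below only assembles an AutoDock Vina command line; it does not run anything. The option names follow Vina's documented command-line interface, while the file names, the search-box centre and size, and the helper name `vina_command` are hypothetical placeholders rather than values from this study.

```python
def vina_command(receptor, ligand, center, size, out, cpu=1):
    """Build the argument list for one AutoDock Vina docking run.

    All file names and box values passed in are illustrative assumptions.
    """
    cx, cy, cz = center  # centre of the search box (Angstrom)
    sx, sy, sz = size    # edge lengths of the search box (Angstrom)
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--out", out,
        "--cpu", str(cpu),
    ]


if __name__ == "__main__":
    cmd = vina_command("receptor.pdbqt", "ligand.pdbqt",
                       center=(0.0, 0.0, 0.0), size=(20, 20, 20),
                       out="docked.pdbqt", cpu=8)
    # Print the command instead of executing it; running it requires Vina
    # and prepared PDBQT files.
    print(" ".join(cmd))
```

Keeping the command as a list makes it straightforward to hand to a job scheduler or to `subprocess.run` once real input files exist.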
THEORETICAL ASPECTS OF DOCKING
CONFORMATION SEARCH & SCORING FUNCTION
Conformation search between ligand and receptor uses a specific search algorithm, and each candidate conformation is evaluated by a scoring function
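The interplay between a search algorithm and a scoring function can be sketched with a toy Metropolis Monte Carlo search. This is an assumption-laden illustration, not the algorithm of any particular docking engine: the "pose" is a single number, `toy_score` is a made-up function with its minimum at 2.0, and the step size and temperature are arbitrary.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function: lower is better, minimum at pose = 2.0."""
    return (pose - 2.0) ** 2


def monte_carlo_search(score_fn, start, n_steps=2000, step_size=0.5,
                       temperature=0.1, seed=0):
    """Random-perturbation search that keeps the best-scoring pose seen."""
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    for _ in range(n_steps):
        # Propose a small random change to the current pose.
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    pose, score = monte_carlo_search(toy_score, start=-5.0)
    print(f"best pose {pose:.3f} with score {score:.5f}")
```

In a real docking problem the pose would cover ligand position, orientation and torsions, and the scoring function would estimate binding affinity.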
MOLECULAR DYNAMICS SIMULATION
• Used to study the solvation of proteins, the interactions of DNA-protein complexes and lipid systems, and the ligand binding and folding of proteins.
• Produces a trajectory of the molecules over a finite time period, in which each molecule has position and momentum parameters.
• Can assist drug discovery. Computers offer an in silico method that complements the in vitro and in vivo methods commonly used in drug discovery. The term in silico, by analogy with in vitro and in vivo, refers to the use of computers in drug discovery studies.
• GROMACS is used in the simulation.
MOLECULAR DYNAMICS
[Diagram labels: drug discovery; in vitro; molecular dynamics]
• Protein information
• Conformations
• Enzyme activity
GROMACS
GROMACS (Groningen Machine for Chemical Simulation)
University of Groningen, the Netherlands
Molecular dynamics: one way to assess the movement of a molecular system according to the laws of physics
STAGES IN MOLECULAR DYNAMICS SIMULATION
WITH GROMACS
Newton's equations of motion describe the movement of the atoms (i = 1, 2, ..., N) of a molecular system, in terms of each atom's coordinates (r), velocity (v), and mass (m_i).
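For each atom i, Newton's equation of motion reads m_i d²r_i/dt² = F_i, where F_i is the force acting on that atom. The sketch below integrates this equation for a single particle in one dimension with the velocity Verlet scheme. It is a toy under stated assumptions, not GROMACS code: the harmonic force, unit mass, and time step are chosen only so the result can be compared with the exact solution cos(t).

```python
import math


def harmonic_force(x, k=1.0):
    """Toy force F = -k x; a molecular force field would take its place."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force_fn=harmonic_force):
    """Integrate m * d2x/dt2 = F(x); return [(time, position, momentum), ...]."""
    trajectory = [(0.0, x, mass * v)]
    f = force_fn(x)
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((step * dt, x, mass * v))
    return trajectory


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=628)
    t, x, p = traj[-1]
    print(f"t={t:.2f} x={x:.4f} p={p:.4f} exact x={math.cos(t):.4f}")
```

The returned list is a small trajectory in the sense used on these slides: position and momentum recorded over time.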
PRELIMINARY TEST
• GROMACS/inGrid
[Figures: Rad, Gem]
A 10 ns experiment/simulation needed 14 days on 5 processors.
AN ILLUSTRATION
Molecular dynamics
(10 ps)
Molecular docking
(Binding constant = pharmacological activity)
Arry Yanuar
LONGER SIMULATION TIME NEEDS MORE COMPUTATION
RESOURCES
CPU   days
1     0    41.7    206.3    1250     3750
4     0    10.4    52.1     512.5    937.5
8     0    5.2     26.0     156.3    468.8
16    0    2.6     13.0     78.1     234.4
32    0    1.3     6.5      39.1     117.2
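Many of the entries in this table appear consistent with ideal linear scaling, that is, the single-CPU time divided by the number of CPUs. The sketch below recomputes the table under that assumption, using the single-CPU row as input; the function names are illustrative, and real runs would normally fall short of this ideal because of communication overhead.

```python
def ideal_days(single_cpu_days, n_cpus):
    """Run time in days under perfect linear scaling (an idealisation)."""
    return single_cpu_days / n_cpus


def ideal_table(single_cpu_row, cpu_counts):
    """Map each CPU count to the ideally scaled run times for every column."""
    return {n: [ideal_days(d, n) for d in single_cpu_row] for n in cpu_counts}


if __name__ == "__main__":
    # Single-CPU row of the table (days), omitting the leading 0 column.
    single_cpu_row = [41.7, 206.3, 1250.0, 3750.0]
    for n, row in ideal_table(single_cpu_row, [1, 4, 8, 16, 32]).items():
        print(f"{n:>3} " + " ".join(f"{d:9.2f}" for d in row))
```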
Visualization of GROMACS results over 90 ns, showing the formation of a DPPC (dipalmitoylphosphatidylcholine) vesicle [De Vries 2008]
INHERENT:
INDONESIA HIGHER EDUCATION NETWORK
INGRID: INHERENT/INDONESIA GRID
• Idea
– RI-GRID: national Grid Computing infrastructure development proposal, May 2006, by the Faculty of Computer Science, UI
• Part of UI competitive grants (PHK INHERENT K1 UI)
"Towards a Digital Campus: Implementation of Virtual Library, Grid Computing, Remote Laboratory, Computer Mediated Learning, and Academic Management System in INHERENT," Sep '06 – May '07
• Objectives:
– Developing a Grid Computing infrastructure with an initial computation capacity of 32 processors (~Intel Pentium IV) and 1 TB of storage.
– Hope: the capacity will grow as other organizations join inGRID.
– Developing an e-Science community in Indonesia
INGRID: PORTAL
HTTP://GRID.UI.AC.ID/PORTAL
THE INGRID ARCHITECTURE
[Diagram: users, the inGRID portal and a custom portal; sites UI, U* and I* connected over INHERENT; three Globus head nodes; Linux/x86, Windows/x86, Solaris/x86 and Linux/Sparc clusters]
H/W SPECS
• inGRID Portal
– SUN Fire X2100, AMD Opteron Processor (2.4 GHz, dual core), 2 GB Memory, 80 GB Disk, 2
10/100/1000 Mbps NICs, DVD-ROM Drive
• Globus Head Node
– SUN Fire X2100, AMD Opteron Processor (2.2 GHz, dual core), 1 GB Memory, 80 GB Disk, 2
10/100/1000 Mbps NICs, DVD-ROM Drive
• Linux Cluster (16 nodes)
– SUN Fire X2100, AMD Opteron Processor (2.2 GHz, dual core), 1 GB Memory, 80 GB Disk, 2
10/100/1000 Mbps NICs
• Storage Server
– Dual Xeon Processor (3.0GHz), 2 GB Memory, 1 TB Disk
HW/SW SPECIFICATION
(CLUSTER HASTINAPURA)
Source: 13)
Head node (1)
• Sun Fire X2100
• AMD Opteron 2.2 GHz (dual core)
• 2 GB RAM
• Debian GNU/Linux 3.1 "Sarge"
Worker nodes (16)
• Sun Fire X2100
• AMD Opteron 2.2 GHz (dual core)
• 1 GB RAM
• Debian GNU/Linux 3.1 "Sarge"
Storage node (1)
• Dual Intel Xeon 2.8 GHz (HT)
• 2 GB RAM
• Debian GNU/Linux 4.0-testing "Etch"
• Hard disk: 3 x 320 GB
HW/SW SPECIFICATION
(CLUSTER FARMASI)
Worker node HW (6 units / 24 logical processors)
• Processor: Intel Quad Core (2.66 GHz)
• RAM: 4 GB
• Hard disk drive: Western Digital 320 GB
• Graphics card: NVIDIA GeForce 8800
• Ethernet speed: 1 Gb/s
Worker node SW (6 units / 24 logical processors)
• NFS (Network File System)
• MPI (Message Passing Interface): MPICH2
• GROMACS 4.0.5
HW/SW SPECIFICATION
(CLUSTER FARMASI)
[Network diagram: nodes grid01, grid03, grid04, grid05 and grid06; database server; web server; Gigabit Ethernet switch; Pharmacy router; JUITA (Jaringan Universitas Indonesia Terpadu, the integrated University of Indonesia network)]
INGRID S/W SPECS
• User interface:
– UCLA Grid Portal
• Middleware:
– Globus Toolkit
• Job scheduler:
– Sun Grid Engine (SGE)
• Programming:
– C, Java
– Parallel: MPICH
• Applications:
– Chemistry:
• GROMACS
– Biology:
• BLAST
– Computer graphics:
• POV-Ray
– Utilities:
• Matrix multiplication, sort, Octave (Matlab-like)
AUTODOCK VINA 1.1
Developed by The Scripps Research Institute, a nonprofit biomedical research institute in San Diego, California, USA
AutoDock Vina is the next-generation molecular docking engine from The Scripps Research Institute, following its original AutoDock
Boost C++ libraries for multithreading
Modified parallel Monte Carlo method
BFGS, an efficient quasi-Newton method, is used
AutoDock 4.2 and AutoDock Vina 1.1 can take advantage of cluster technology as embarrassingly parallel applications
MESSAGE PASSING
• Embarrassingly Parallel (EP) paradigm
– No communication required
– Easy to load balance
– Perfect speedup
• Regular and synchronous
– Easy (static) load balancing
– Expect good speedup for local communication
– Expect reasonable speedup for non-local communication
• Irregular and/or asynchronous
– Difficult to load balance
– Communication overhead usually high
– Usually can't be done efficiently using data parallel programming
Parallel Paradigm
EP PROBLEM
• Each element of an array (each subproblem) can be processed independently of the others.
• No communication is required, except to combine the final result.
• Static load balancing is usually trivial: any kind of distribution can be used, since communication is not a factor.
• Dynamic load balancing can be done using a task farm approach.
• Expect perfect speedup.
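A task farm can be sketched with a pool of workers that pull independent jobs from a shared queue, so an idle worker simply takes the next job (dynamic load balancing), and the only communication is collecting the results. The example is a minimal sketch under stated assumptions: `dock_one` is a hypothetical stand-in that returns a made-up score, and threads on one machine stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def dock_one(ligand_id):
    """Stand-in for one independent docking job; returns (ligand_id, score)."""
    # Hypothetical toy score; a real worker would launch a docking run here.
    return ligand_id, -float(ligand_id % 7) - 1.0


def task_farm(ligand_ids, n_workers=4):
    """Master: hand out jobs to workers and combine the final results."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(dock_one, lid) for lid in ligand_ids]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            ligand_id, score = future.result()
            results[ligand_id] = score
    return results


if __name__ == "__main__":
    scores = task_farm(range(20))
    print(f"{len(scores)} jobs done, best score {min(scores.values())}")
```

Because the jobs never talk to each other, the same structure carries over to processes or cluster nodes, where each job would be one ligand docked against the receptor.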
EP PROBLEM
[Figure: disconnected computational graph]
EP PROBLEM
[Figure: dynamic master-slave approach]
EXPERIMENT RESULT
AutoDock Vina 1.1: the speedup on the cluster (22 CPUs) is 29.16, with an efficiency of 1.325.
[Chart: AutoDock Vina 1.1, serial vs. parallel running time by bioinformatics case]
Bioinformatics case   Serial      Parallel
1000                  2277.42     77.43
2000                  4629.72     159.5
3000                  8117.6      292.27
4000                  12370.5     406.8
5000                  15294.2     509.8
ANOTHER RESULT
AutoDock Vina 1.1: the speedup on 8 CPUs is 7.25.
The Scripps Research Institute
Running time (minutes)
AutoDock 4.2         521.85
Vina 1.1 (1 CPU)     8.41
Vina 1.1 (8 CPUs)    1.16
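The reported figures follow from the usual definitions: speedup is the serial time divided by the parallel time, and efficiency is the speedup divided by the number of processors. The short sketch below reproduces the two reported results from numbers given on these slides; the function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(speedup_value, n_processors):
    """Efficiency E = S / p; values above 1 indicate superlinear speedup."""
    return speedup_value / n_processors


if __name__ == "__main__":
    # Vina 1.1 running times in minutes on 1 and 8 CPUs.
    print(f"speedup on 8 CPUs: {speedup(8.41, 1.16):.2f}")
    # Reported cluster speedup on 22 CPUs.
    print(f"efficiency on 22 CPUs: {efficiency(29.16, 22):.3f}")
```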
HASTINAPURA CLUSTER PERFORMANCE
ANALYSIS USING GROMACS
Execution Time Based on Processor
No   Time step   Execution times, in order of increasing processor count
1    200ps       1d:00h:28m:16s   12h:29m:01s      9h:37m:00s       5h:33m:27s
2    400ps       2d:02h:15m:59s   1d:00h:35m:07s   19h:12m:38s      12h:00m:06s
3    600ps       3d:05h:36m:52s   1d:11h:52m:40s   1d:05h:24m:26s   19h:59m:36s
4    800ps       4d:10h:05m:20s   2d:01h:39m:51s   1d:13h:04m:45s   1d:01h:01m:45s
5    1000ps      5d:13h:37m:29s   2d:12h:04m:00s   1d:19h:39m:35s   1d:05h:28m:02s
PHARMACY CLUSTER PERFORMANCE
ANALYSIS USING GROMACS
Time Based on Processor
No   Time     1 processor      2 processors     3 processors     4 processors     5 processors
1    200ps    13h:37m:38s      7h:23m:47s       5h:32m:34s       4h:26m:20s       3h:38m:48s
2    400ps    1d:03h:10m:06s   14h:44m:02s      11h:01m:38s      8h:41m:15s       7h:16m:42s
3    600ps    1d:16h:22m:34s   22h:04m:25s      16h:40m:14s      13h:17m:38s      10h:55m:54s
4    800ps    2d:06h:52m:48s   1d:03h:02m:46s   22h:11m:54s      17h:46m:35s      14h:35m:29s
5    1000ps   2d:21h:22m:57s   1d:13h:00m:25s   1d:03h:41m:49s   22h:06m:03s      18h:09m:47s
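To turn these execution times into speedups, each duration can be converted to seconds and the 1-processor time divided by the n-processor time. The sketch below does this for the 200ps row of the Pharmacy cluster table; the parser is an illustrative helper that assumes every field carries its unit letter (d, h, m or s).

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a string such as '1d:03h:10m:06s' to seconds."""
    fields = re.findall(r"(\d+)([dhms])", duration)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in fields)


def speedups(durations):
    """Speedup of each entry relative to the first (1-processor) entry."""
    seconds = [to_seconds(d) for d in durations]
    return [seconds[0] / s for s in seconds]


if __name__ == "__main__":
    # 200ps row of the Pharmacy cluster table, 1 to 5 processors.
    row_200ps = ["13h:37m:38s", "7h:23m:47s", "5h:32m:34s",
                 "4h:26m:20s", "3h:38m:48s"]
    for n, s in enumerate(speedups(row_200ps), start=1):
        print(f"{n} processor(s): speedup {s:.2f}")
```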
CLUSTER HASTINAPURA PERFORMANCE
ANALYSIS
[Chart: Hastinapura cluster performance; y-axis: time (s), 0 to 600000; x-axis: 200ps to 1000ps; series: 1 to 5 processors]
[Chart: Pharmacy cluster performance; y-axis: time (s), 0 to 300000; x-axis: 200ps to 1000ps; series: 1 to 5 processors]
CLUSTER HASTINAPURA SPEED UP
[Chart: Hastinapura cluster speed-up; y-axis: speed-up (x), 0 to 6; x-axis: 200ps to 1000ps; series: 1 to 5 processors]
[Chart: Pharmacy cluster speed-up; y-axis: speed-up (x), 0 to 6; x-axis: 200ps to 1000ps; series: 1 to 5 processors]
CHALLENGES
• Unreliable electricity supply
• Reliance on grant funds, which leads to other negative effects, such as:
– Most Indonesian funding sources do not allow investment in hardware (computers); only spare parts are allowed
– Permanent human resources to manage the Grid,
– Maintenance of the grid to keep up with current technology development.
• Many organizations are "very protective" of their computing resources; only a few are willing to share them.
• Only a few (maybe one or two) faculties teach cluster, cloud and grid computing, so only a few people master and understand them.
• A limited number of cluster computing nodes/workers (at most 22 were available); reliable results need more than 100 nodes.
PROSPECTS
• More people are becoming interested in shared computing facilities,
• Many free grid development tools are available,
• Considering GPGPU for the next computing environment,
• Develop a strong unit capable of building the Grid infrastructure, but this needs commitment and dedication from at least the university level and the government, or
• Perhaps cloud computing is an alternative solution in one way, however ...
• The internet connection is still not reliable and the cloud itself has some challenges
CLOUD COMPUTING CHALLENGES: DEALING WITH TOO MANY ISSUES [REF BUYYA]
[Diagram of issues: scalability; reliability; billing; utility & risk management; programming environment & application development; software engineering complexity. Speech bubble: "Uhm, I am not quite clear… Yet another complex IT paradigm?"]
REFERENCES
1. Luebke, David, The Democratization of Parallel Computing: High Performance Computing with CUDA, the International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
2. de Vries, A. H., A. E. Mark, and S. J. Marrink, Molecular Dynamics Simulation of the Spontaneous Formation of a Small DPPC Vesicle in Water in Atomistic Detail, J. Am. Chem. Soc. 2004, 126, 4488-448
3. Buck, Ian, CUDA Programming, the International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
4. Fatica, Massimiliano, CUDA Libraries, the International Conference for High Performance Computing, Networking, Storage and Analysis, 2007, http://sc07.supercomputing.org/
5. CUDA Medicine, medical applications, http://www.nvidia.co.uk/object/cuda_medical_uk.html [accessed 13 Feb 2010]
6. de Vries, A. H., A. E. Mark, and S. J. Marrink, Molecular Dynamics Simulation of the Spontaneous Formation of a Small DPPC Vesicle in Water in Atomistic Detail, J. Am. Chem. Soc. 2004, 126, 4488-448
7. Karplus, M. and J. Kuriyan, Molecular Dynamics and Protein Function, PNAS 2005, 102(19): 6679-6685
8. Van der Spoel, D., E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. C. Berendsen, GROMACS: Fast, Flexible, and Free, J. Comput. Chem. 2005, 26(16): 1701-1718
9. Adcock, S. A. and J. A. McCammon, Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins, Chem. Rev. 2006, 106(5): 1589-1615
10. Correll, R. N., C. Pang, D. M. Niedowicz, B. S. Finlin, and D. A. Andres, The RGK Family of GTP-binding Proteins: Regulators of Voltage-dependent Calcium Channels and Cytoskeleton Remodeling
11. Kutzner, C., D. van der Spoel, M. Fechner, E. Lindahl, U. W. Schmitt, B. L. de Groot, and H. Grubmüller, Speeding up parallel GROMACS on high-latency networks, J. Comput. Chem. 2007, 28(12): 2075-2084
CLOSING STATEMENTS
• Thanks for inviting us to this meeting,
• Thanks to the Indonesian Ministry of Research and Technology for the 2009 – 2010 research grant,
• Thank you for listening to our talk and providing your suggestions