Project (Protein) Science Mission
Transcription
Project (Protein) Science Mission
IBM Research Project (Protein) Science Mission Robert S. Germain Biomolecular Dynamics & Scalable Modeling http://www.research.ibm.com/bluegene Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Outline Science Overview Application Planning infrastructure for BGW to support protein science mission Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research December 1999: IBM Announces $100 Million Research Initiative to build World's Fastest Supercomputer "Blue Gene" to Tackle Protein Folding Grand Challenge YORKTOWN HEIGHTS, NY, December 6, 1999 -- IBM today announced a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world’s fastest computers today. The new computer -- nicknamed "Blue Gene" by IBM researchers -- will be capable of more than one quadrillion operations per second (one petaflop). This level of performance will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop PCs. Blue Gene's massive computing power will initially be used to model the folding of human proteins, making this fundamental study of biology the company's first computing "grand challenge" since the Deep Blue experiment. Learning more about how proteins fold is expected to give medical researchers better understanding of diseases, as well as potential cures. Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Blue Gene Science Mission Advance our understanding of biologically important processes via simulation, in particular the mechanisms behind protein folding Current Activities include: – Thermodynamic & kinetic studies of model peptide systems – Structural and dynamical studies of membrane and membrane/protein systems Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Time Scales: Biopolymers and Membranes Simulation 10-15 | Experiment 10-12 | 10-9 10-6 | | 10-3 | 1 | 103 | 106 109 | | Bond Vibration DNA Twisting Hinge Motion Helix-Coil Transition Protein Folding Lipid exchange via diffusion Ligand-Protein Binding Torsional correlation in lipid headgroups Electron Transfer Adapted from “The Protein Folding Problem”, Chan and Dill, Physics Today, Feb. 1993 Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Year Investigators Comp. Platform System Time Scale 1955 Fermi, Pasta, Ulam MANIAC 64 particle chain with nonlinear interactions 50,000 cycles Hard spheres 1959 Alder & Wainwright 1964 Rahman CDC 3600 Lennard-Jones liquid (argon), 864 particles 1971 Rahman & Stillinger Water, 216 molecules 1977 McCammon, Gelin et al. BPTI 10 ps 1998 Sheinerman & Brooks Cray T3D, T3E,C-90 Segment B1 of Protein G 100 ps 1998 Duan & Kollman Cray T3E Villin headpiece (36 residues) in 3000 water molecules 1 sec Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research The science plan – a spectrum of projects 1L2Y 1EOM 1LE1 systematically cover a range of system sizes, topological complexity 1FME – discovering the "rules" of folding – applying those rules to have impact on disease 1ENH address a broad range of scientific questions and impact areas: – thermodynamics – folding kinetics – folding-related disease (CF, Alzheimer's, GPCR's) improve our understanding not just of protein folding but protein function 1LMB 1BBL GPCR in membrane Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research How can we use large scale computational resources? Capability – Increase time scales probed (strong scaling) – Increase system size studied Capacity – Improve sampling to reduce statistical uncertainties – Run large ensembles of trajectories Make contact with experiment Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Protein Folding Simulations Initial thermodynamic studies – 32-64 replicas running on SP2 (one node/replica) – nanosecond-scale trajectories in each replica – small systems (~5000 atoms) Future studies: – 64-128 replicas running on BG/L (8-512 nodes/replica Æ 64 rack simulations possible) – tens of nanoseconds or more/replica – Larger systems (20,000+ atoms) Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research -hairpin Simulation Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Free Energy Landscape of Beta Hairpin (PNAS 2001) Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research “trp-cage” folding (PNAS 2003) Small 20 amino acid miniprotein Simulations started from a completely unfolded state Simulations could reproduce & explain sequencedependent folding Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Membrane Proteins Membrane processes enable: – cell signal detection, ion and nutrient transport – infection processes target specific membranes – Over 50% of drug discovery research targets are membrane proteins Experiment and simulation play a concerted role in understanding membrane biophysics Simulation can be validated by experiment Blue Gene Science Simulation can then help to interpret experiment May 11, 2005 © 2005 IBM Corporation IBM Research GPCR-based drugs among the 200 best-selling prescriptions, and their GPCR targets GPCR target Drug Company 2000 sales(US $m) AstraZeneca 870 Merck 850 Schering-Plough 2,200 Aventia 1,100 Psychosis Johnson & Johnson 1,600 Imitrex Migraine GlaxoSmithKline 1,100 BuSpar Anxiety Bristol-Myers Squibb 714 Zyprexa Schizophrenia Eli Lilly 2,400 Merck 1,700 AstraZeneca 580 Congestive heart failure GlaxoSmithKline 250 Serevent Asthma GlaxoSmithKline 940 Muscarinic acetylcholine receptors Atrovent COPD Boehringer Ingelheim 600 GnRH receptors Zoladex Cancer AstraZeneca 740 Dopamine receptors Requip Parkinson’s diseases AstraZeneca 90 Prostaglandin (PGE1) receptors Cytotec Ulcers Pharmacia 100 ADP receptors Plavix Stroke Bristol-Myers Squibb 900 Zantac Histamine receptors Pepcid Claritin Allegra Risperdal 5-HT receptors Angiotensin receptors Disease Ulcers Allergies Cozaar Hypertension Toprol-XL Adrenoceptors Blue Gene Science Coreg May 11, 2005 http://www.predixpharm.com/market_table.htm © 2005 IBM Corporation IBM Research Rhodopsin and the Eye Light sensitive Outer segment Protein of Rod Retina http://www2.mrc-lmb.cam.ac.uk/groups/GS/eye.html Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Overview of Blue Gene Membrane Protein Studies SOPE – Extensive hydrogen bonding network with headgroups – Excellent agreement with experiment for both structural and dynamic properties 3:1 SDPC/Cholesterol – Cholesterol induces dramatic lateral organization – Cholesterol shows preference of STEA over DHA – Significant Angular anisotropy of Cholesterol Environment GPCR in a membrane environment – Rhodopsin with 2:2:1 SDPC/SDPE/CHOL – 100 ns cis-retinal - 200+ ns trans-retinal – Current production rate ~5 hrs / ns on 1024 nodes BG/L Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Current Simulations of Rhodopsin in Membrane Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Rhodopsin in 2:2:1 SDPE/SDPC/Cholesterol after 120ns Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Simulations of Membranes & Membrane-bound proteins Lipid bilayers (12,000 atoms) -- 10-30ns Lipid bilayers with cholesterol – 10-30ns Rhodopsin in lipid bilayer with cholesterol (44,000 atoms) – goal of microsecond-scale simulation Simulations of larger functional units might involve 100,000+ atoms Parameterization of coarse-grained models Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Usage Monitoring Restart Simulation Problem set-up: starting structure generation, force field assignment Blue Gene Science Post-processing: analysis, visualization, .... May 11, 2005 © 2005 IBM Corporation IBM Research Blue Matter – Application Platform for BG Protein Science Architected from the ground up to track hardware architecture and incorporate “state-of-the-art” methods Prototype platform for exploration of application frameworks suitable for cellular architecture machines Blue Matter comprises all the necessary application components—those that run on BG/L and those that run on the host systems (offload function to host whenever possible) Protein folding simulations at SC2003 within 4 months of first hw Limited production science runs since May 2004, published work in early 2005 Setup MD Core (massively parallel, minimal in size) Monitoring & analysis Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research What Limits the Scalability of MD? Inherent limitations on concurrency: – Bonded force evaluation * Represents only small fraction of computation, can be distributed moderately well. – Real space non-bond force evaluation * Large fraction of computation, but good distribution can be achieved using volume or interaction decompositions. – Reciprocal space contribution to force evaluation for Ewald/P3ME * P3ME uses 3D FFT with global communication (global data dependencies) * Ewald with direct evaluation uses floating point reduction Load balancing Hardware and software overheads Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Scalability Results for Blue Matter 1 91K atom Factor IX 51K atom Mini FBP 43K atom Rhodopsin Elapsed Time (seconds) 23K atom DHFR 0.1 0.01 0.001 10 100 1000 10000 Node Count Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Blue Matter on BG/L vs. NAMD on PSC Lemieux 10 NAMD ApoA1 on BG/L (MPI) PME every step NAMD ApoA1 on Lemieux (Elan/Quadrics) Blue Matter Factor IX on BG/L (MPI) Elapsed Time (seconds) 1 0.1 0.01 0.001 1 10 100 1000 10000 Node/CPU Count Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research 3D-FFT Performance on MPI vs. BG/L Advanced Diagnostic Environment 0.01 $128^3$ MPI $128^3$ Blade Single Core $64^3$ MPI Elapsed Time (seconds) $64^3$ Blade Single Core 0.001 0.0001 100 1000 10000 100000 Node Count Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Blue Matter Dataflow Hardware Platform Description MD UDF Registry Blue Matter Molecular System Spec MSD.cpp File Blue Matter Parallel Application Kernel on BG/W XML Filter DVS File C++ Compiler Blue Matter Start Process Blue Matter Runtime User Interface RTP File Probspec Db2 Restart Process (Checkpoint) Reliable Datagrams DVS Filter RegressionTest Driver Scripts Raw Data Management System (UDP/IP-based) Long Term Archival Storage System Online Analysis/ Monitoring/ Visualization Blue Gene Science Molecular System Setup: CHARMM, IMPACT, AMBER, etc. Offline Analysis Visualization May 11, 2005 © 2005 IBM Corporation IBM Research Blue Matter Performance on 43K Atom Rhodopsin Node Count px 16 128 512 1024 2048 4096 4096 py 4 8 8 16 16 32 16 Blue Gene Science pz 2 4 8 8 16 16 16 2 4 8 8 8 8 16 Time/Time-step Computation Rate (ns/day) Dual Core Single Core Dual Core Single Core 0.3646 0.4471 0.47 0.39 0.09113 0.13215 1.90 1.31 0.02527 0.03172 6.84 5.45 0.01845 0.02057 9.37 8.40 0.01022 0.0137 16.91 12.61 0.0135 0.0157 12.80 11.01 0.00896 0.01035 19.29 16.70 May 11, 2005 © 2005 IBM Corporation IBM Research Data Funnel to Tape 40 GB/sec. bandwidth from BG/L Science Host Hierarchical Storage 0.5GB/sec 12 x 3592 tape drives Managment Tape Archive Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Data Rates for Molecular Dynamics Node Count Snapshot Period (ts) Data Rate (MB/sec.) Data Rate (GB/day) 512 1 78.78 6646.69 20480 1 3151.03 265867.77 20480 10 315.10 26586.78 20480 100 31.51 2658.68 1000 20480 3.15 265.87 20480 10000 0.32 26.59 20480 100000 0.03 2.66 Data rates extrapolated from measured 512 node results on 43K atom Rhodopsin system Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Outreach Activities Blue Gene Protein Science Workshops – San Diego 2001 – Edinburgh 2002 – Brookhaven 2003 Blue Gene Seminar Series – Over 40 external speakers since inception (2000) Collaborations – Bruce Berne (Columbia), Scott Feller (Wabash), Martin Grubele (UIUC), Klaus Gawrisch (NIH), Vijay Pande (Stanford), Ken Dill (UCSF), Teresa Head-Gordon (UC Berkeley), Jeff Madura (Duquesne), Hans Andersen (Stanford) Blue Gene Science May 11, 2005 © 2005 IBM Corporation IBM Research Acknowledgements Alex Balaeff Mike Pitman Bruce Berne Alex Rayshubskiy Maria Eleftheriou Yuk Sham Scott Feller Frank Suits Blake Fitch Klaus Gawrisch Alan Grossfield Jed Pitera Blue Gene Hardware and System Software teams, particularly Mark Giampapa Blue Gene Science Bill Swope Chris Ward Yuri Zhestkov Ruhong Zhou Facilities & BGW Support Staff May 11, 2005 © 2005 IBM Corporation IBM Research Selected Publications Role of Cholesterol and Polyunsaturated Chains in Lipid-Protein Interactions: Molecular Dynamics Simulation of Rhodopsin in a Realistic Membrane Environment J. Am. Chem. Soc.; 2005; 127(13) pp 4576 - 4577 Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids in a Phosphatidylcholine Bilayer Containing Cholesterol; Biochemistry 43(49); 2004; 15318-15328 Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of Physical Chemistry B; 2004; 108(21); 6571-6581 Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 2004; 108(21); 6582-6594 Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins, PNAS USA, Vol. 100, Issue 13, June 24, 2003, pp. 7587-7592 Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?, Proc. Natl. Acad. Sci. USA, Vol. 99, Issue 20, October 1, 2002, pp. 12777-12782 The free energy landscape for beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. USA, Vol. 98, Issue 26, December 18, 2001, pp. 14931-14936 Blue Gene Science May 11, 2005 © 2005 IBM Corporation