Project (Protein) Science Mission

Transcription

Project (Protein) Science Mission
IBM Research
Project (Protein) Science Mission
Robert S. Germain
Biomolecular Dynamics & Scalable Modeling
http://www.research.ibm.com/bluegene
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Outline
ƒ Science Overview
ƒ Application
ƒ Planning infrastructure for
BGW to support protein
science mission
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
December 1999:
IBM Announces $100 Million Research Initiative to
build World's Fastest Supercomputer
"Blue Gene" to Tackle Protein Folding Grand Challenge
YORKTOWN HEIGHTS, NY, December 6, 1999 -- IBM today announced a new $100 million
exploratory research initiative to build a supercomputer 500 times more powerful than the world’s
fastest computers today. The new computer -- nicknamed "Blue Gene" by IBM researchers -- will be
capable of more than one quadrillion operations per second (one petaflop). This level of performance
will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess
champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop
PCs.
Blue Gene's massive computing power will initially be used to model the folding of human proteins,
making this fundamental study of biology the company's first computing "grand challenge" since the
Deep Blue experiment. Learning more about how proteins fold is expected to give medical
researchers better understanding of diseases, as well as potential cures.
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Blue Gene Science Mission
ƒ Advance our understanding of biologically important
processes via simulation, in particular the
mechanisms behind protein folding
ƒ Current Activities include:
– Thermodynamic & kinetic studies of model
peptide systems
– Structural and dynamical studies of membrane and
membrane/protein systems
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Time Scales: Biopolymers and Membranes
Simulation
10-15
|
Experiment
10-12
|
10-9
10-6
|
|
10-3
|
1
|
103
|
106
109
|
|
Bond Vibration
DNA Twisting
Hinge Motion
Helix-Coil Transition
Protein Folding
Lipid exchange via diffusion
Ligand-Protein Binding
Torsional correlation in lipid headgroups
Electron Transfer
Adapted from “The Protein Folding Problem”, Chan and Dill, Physics Today, Feb. 1993
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Year Investigators
Comp.
Platform
System
Time Scale
1955 Fermi, Pasta,
Ulam
MANIAC
64 particle chain with nonlinear interactions
50,000
cycles
Hard spheres
1959 Alder &
Wainwright
1964 Rahman
CDC 3600
Lennard-Jones liquid
(argon), 864 particles
1971 Rahman &
Stillinger
Water, 216 molecules
1977 McCammon,
Gelin et al.
BPTI
10 ps
1998 Sheinerman &
Brooks
Cray T3D,
T3E,C-90
Segment B1 of Protein G
100 ps
1998 Duan & Kollman
Cray T3E
Villin headpiece (36
residues) in 3000 water
molecules
1 sec
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
The science plan – a spectrum of projects
1L2Y
1EOM
1LE1
ƒ systematically cover a range of system sizes, topological complexity
1FME
– discovering the "rules" of folding
– applying those rules to have impact on disease
1ENH
ƒ address a broad range of scientific questions and impact areas:
– thermodynamics
– folding kinetics
– folding-related disease (CF, Alzheimer's, GPCR's)
ƒ improve our understanding not just of protein folding but protein function
1LMB
1BBL
GPCR in membrane
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
How can we use large scale computational resources?
ƒ Capability
– Increase time scales
probed (strong scaling)
– Increase system size
studied
ƒ Capacity
– Improve sampling to reduce
statistical uncertainties
– Run large ensembles of
trajectories
ƒ Make contact with
experiment
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Protein Folding Simulations
ƒ Initial thermodynamic studies
– 32-64 replicas running on SP2 (one node/replica)
– nanosecond-scale trajectories in each replica
– small systems (~5000 atoms)
ƒ Future studies:
– 64-128 replicas running on BG/L (8-512 nodes/replica Æ 64
rack simulations possible)
– tens of nanoseconds or more/replica
– Larger systems (20,000+ atoms)
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
-hairpin Simulation
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Free Energy Landscape of Beta Hairpin (PNAS 2001)
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
“trp-cage” folding (PNAS 2003)
ƒ Small 20 amino acid
miniprotein
ƒ Simulations started
from a completely
unfolded state
ƒ Simulations could
reproduce & explain
sequencedependent folding
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Membrane Proteins
ƒ Membrane processes enable:
– cell signal detection, ion and nutrient transport
– infection processes target specific membranes
– Over 50% of drug discovery research targets are membrane
proteins
Experiment and simulation
play a concerted role in
understanding membrane
biophysics
Simulation can be
validated by
experiment
Blue Gene Science
Simulation can then help to interpret experiment
May 11, 2005
© 2005 IBM Corporation
IBM Research
GPCR-based drugs among the 200 best-selling prescriptions,
and their GPCR targets
GPCR target
Drug
Company
2000 sales(US $m)
AstraZeneca
870
Merck
850
Schering-Plough
2,200
Aventia
1,100
Psychosis
Johnson & Johnson
1,600
Imitrex
Migraine
GlaxoSmithKline
1,100
BuSpar
Anxiety
Bristol-Myers Squibb
714
Zyprexa
Schizophrenia
Eli Lilly
2,400
Merck
1,700
AstraZeneca
580
Congestive heart failure
GlaxoSmithKline
250
Serevent
Asthma
GlaxoSmithKline
940
Muscarinic acetylcholine
receptors
Atrovent
COPD
Boehringer
Ingelheim
600
GnRH receptors
Zoladex
Cancer
AstraZeneca
740
Dopamine receptors
Requip
Parkinson’s diseases
AstraZeneca
90
Prostaglandin (PGE1)
receptors
Cytotec
Ulcers
Pharmacia
100
ADP receptors
Plavix
Stroke
Bristol-Myers Squibb
900
Zantac
Histamine receptors
Pepcid
Claritin
Allegra
Risperdal
5-HT receptors
Angiotensin receptors
Disease
Ulcers
Allergies
Cozaar
Hypertension
Toprol-XL
Adrenoceptors
Blue Gene Science
Coreg
May 11, 2005
http://www.predixpharm.com/market_table.htm
© 2005 IBM Corporation
IBM Research
Rhodopsin and the Eye
Light sensitive
Outer segment
Protein
of Rod
Retina
http://www2.mrc-lmb.cam.ac.uk/groups/GS/eye.html
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Overview of
Blue Gene Membrane Protein Studies
ƒ
SOPE
– Extensive hydrogen bonding network with headgroups
– Excellent agreement with experiment for both structural
and dynamic properties
ƒ
3:1 SDPC/Cholesterol
– Cholesterol induces dramatic lateral organization
– Cholesterol shows preference of STEA over DHA
– Significant Angular anisotropy of Cholesterol Environment
ƒ
GPCR in a membrane environment
– Rhodopsin with 2:2:1 SDPC/SDPE/CHOL
– 100 ns cis-retinal - 200+ ns trans-retinal
– Current production rate ~5 hrs / ns on 1024 nodes BG/L
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Current Simulations of Rhodopsin in Membrane
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Rhodopsin in 2:2:1 SDPE/SDPC/Cholesterol after 120ns
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Simulations of Membranes & Membrane-bound proteins
ƒ Lipid bilayers (12,000 atoms) -- 10-30ns
ƒ Lipid bilayers with cholesterol – 10-30ns
ƒ Rhodopsin in lipid bilayer with cholesterol (44,000
atoms) – goal of microsecond-scale simulation
ƒ Simulations of larger functional units might
involve 100,000+ atoms
ƒ Parameterization of coarse-grained models
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Usage
Monitoring
Restart
Simulation
Problem set-up:
starting structure
generation,
force field
assignment
Blue Gene Science
Post-processing:
analysis,
visualization,
....
May 11, 2005
© 2005 IBM Corporation
IBM Research
Blue Matter – Application Platform for BG Protein Science
ƒ Architected from the ground up to track hardware architecture and incorporate
“state-of-the-art” methods
ƒ Prototype platform for exploration of application frameworks suitable for
cellular architecture machines
ƒ Blue Matter comprises all the necessary application components—those that
run on BG/L and those that run on the host systems (offload function to host
whenever possible)
ƒ Protein folding simulations at SC2003 within 4 months of first hw
ƒ Limited production science runs since May 2004, published work in early 2005
Setup
MD Core (massively parallel, minimal in size)
Monitoring & analysis
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
What Limits the Scalability of MD?
ƒ Inherent limitations on concurrency:
– Bonded force evaluation
* Represents only small fraction of computation, can be distributed moderately well.
– Real space non-bond force evaluation
* Large fraction of computation, but good distribution can be achieved using volume or
interaction decompositions.
– Reciprocal space contribution to force evaluation for Ewald/P3ME
* P3ME uses 3D FFT with global communication (global data dependencies)
* Ewald with direct evaluation uses floating point reduction
ƒ Load balancing
ƒ Hardware and software overheads
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Scalability Results for Blue Matter
1
91K atom Factor IX
51K atom Mini FBP
43K atom Rhodopsin
Elapsed Time (seconds)
23K atom DHFR
0.1
0.01
0.001
10
100
1000
10000
Node Count
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Blue Matter on BG/L vs. NAMD on PSC Lemieux
10
NAMD ApoA1 on BG/L (MPI) PME every step
NAMD ApoA1 on Lemieux (Elan/Quadrics)
Blue Matter Factor IX on BG/L (MPI)
Elapsed Time (seconds)
1
0.1
0.01
0.001
1
10
100
1000
10000
Node/CPU Count
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
3D-FFT Performance on MPI vs. BG/L Advanced
Diagnostic Environment
0.01
$128^3$ MPI
$128^3$ Blade Single Core
$64^3$ MPI
Elapsed Time (seconds)
$64^3$ Blade Single Core
0.001
0.0001
100
1000
10000
100000
Node Count
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Blue Matter Dataflow
Hardware
Platform
Description
MD UDF
Registry
Blue Matter
Molecular System
Spec
MSD.cpp
File
Blue Matter
Parallel Application
Kernel on BG/W
XML
Filter
DVS
File
C++ Compiler
Blue Matter
Start Process
Blue Matter Runtime
User Interface
RTP
File
Probspec
Db2
Restart Process
(Checkpoint)
Reliable
Datagrams
DVS
Filter
RegressionTest Driver Scripts
Raw Data
Management
System
(UDP/IP-based)
Long Term
Archival
Storage
System
Online
Analysis/
Monitoring/
Visualization
Blue Gene Science
Molecular
System
Setup:
CHARMM,
IMPACT,
AMBER,
etc.
Offline
Analysis
Visualization
May 11, 2005
© 2005 IBM Corporation
IBM Research
Blue Matter Performance on 43K Atom Rhodopsin
Node Count px
16
128
512
1024
2048
4096
4096
py
4
8
8
16
16
32
16
Blue Gene Science
pz
2
4
8
8
16
16
16
2
4
8
8
8
8
16
Time/Time-step
Computation Rate (ns/day)
Dual Core Single Core Dual Core
Single Core
0.3646
0.4471
0.47
0.39
0.09113
0.13215
1.90
1.31
0.02527
0.03172
6.84
5.45
0.01845
0.02057
9.37
8.40
0.01022
0.0137
16.91
12.61
0.0135
0.0157
12.80
11.01
0.00896
0.01035
19.29
16.70
May 11, 2005
© 2005 IBM Corporation
IBM Research
Data Funnel to Tape
40 GB/sec. bandwidth from BG/L
Science Host
Hierarchical
Storage
0.5GB/sec
12 x 3592
tape drives
Managment
Tape Archive
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Data Rates for Molecular Dynamics
Node Count
Snapshot Period (ts) Data Rate (MB/sec.) Data Rate (GB/day)
512
1
78.78
6646.69
20480
1
3151.03
265867.77
20480
10
315.10
26586.78
20480
100
31.51
2658.68
1000
20480
3.15
265.87
20480
10000
0.32
26.59
20480
100000
0.03
2.66
Data rates extrapolated from measured 512 node results on 43K atom
Rhodopsin system
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Outreach Activities
ƒ Blue Gene Protein Science Workshops
– San Diego 2001
– Edinburgh 2002
– Brookhaven 2003
ƒ Blue Gene Seminar Series
– Over 40 external speakers since inception (2000)
ƒ Collaborations
– Bruce Berne (Columbia), Scott Feller (Wabash), Martin
Grubele (UIUC), Klaus Gawrisch (NIH), Vijay Pande (Stanford),
Ken Dill (UCSF), Teresa Head-Gordon (UC Berkeley), Jeff
Madura (Duquesne), Hans Andersen (Stanford)
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation
IBM Research
Acknowledgements
ƒ Alex Balaeff
ƒ Mike Pitman
ƒ Bruce Berne
ƒ Alex Rayshubskiy
ƒ Maria Eleftheriou
ƒ Yuk Sham
ƒ Scott Feller
ƒ Frank Suits
ƒ Blake Fitch
ƒ Klaus Gawrisch
ƒ Alan Grossfield
ƒ Jed Pitera
ƒ Blue Gene Hardware and
System Software teams,
particularly Mark Giampapa
Blue Gene Science
ƒ Bill Swope
ƒ Chris Ward
ƒ Yuri Zhestkov
ƒ Ruhong Zhou
ƒ Facilities & BGW Support Staff
May 11, 2005
© 2005 IBM Corporation
IBM Research
Selected Publications
ƒ
Role of Cholesterol and Polyunsaturated Chains in Lipid-Protein Interactions: Molecular Dynamics Simulation
of Rhodopsin in a Realistic Membrane Environment J. Am. Chem. Soc.; 2005; 127(13) pp 4576 - 4577
ƒ
Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids in a Phosphatidylcholine Bilayer
Containing Cholesterol; Biochemistry 43(49); 2004; 15318-15328
ƒ
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of Physical
Chemistry B; 2004; 108(21); 6571-6581
ƒ
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine
Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 2004; 108(21); 6582-6594
ƒ
Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins, PNAS USA, Vol.
100, Issue 13, June 24, 2003, pp. 7587-7592
ƒ
Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?,
Proc. Natl. Acad. Sci. USA, Vol. 99, Issue 20, October 1, 2002, pp. 12777-12782
ƒ
The free energy landscape for beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. USA, Vol. 98, Issue
26, December 18, 2001, pp. 14931-14936
Blue Gene Science
May 11, 2005
© 2005 IBM Corporation