TU München
Physik-Department
AG Molekulardynamik (T38)
Prof. Dr. Martin Zacharias
Advanced Lab Course (Fortgeschrittenenpraktikum) - WS 2015/16
Experiment 74 - Molecular dynamics
Supervisors:
Christina Frost ([email protected])
Physik-Department - Physik I - Room 2073 - 089 / 289-12732
Isaure deBeauchene ([email protected])
Physik-Department - Physik I - Room 2061 - 089 / 289-12766
1 General information
During this experiment of the ’Blockpraktikum Biophysik’, you will learn the basic theory and application of Molecular Dynamics Simulations. Using this method, you will
create different small peptide/protein systems, simulate them, and analyze the simulation
output.
In a first part, from approx. 9.00h to 10.00h, the tutors will present the general concept of
Molecular Dynamics Simulations, the theory behind it, and some details about programs
that are used throughout this course.
The second part (approx. 10.00h to 12.00h) will take place at the workstations of the lab,
where you will be introduced to the AMBER simulation package [1] by working through a
short tutorial. After finishing this tutorial, every student will set up a small peptide folding simulation (as described in the manual), which will run during the lunch break
(12.00h to 14.00h).
After the lunch break, the results of the peptide simulation will be analyzed, and a second
simulation (investigating protein-ligand dissociation) will be set up and analyzed.
During the whole lab course, the tutors will be present to assist with problems.
Please be at room PH1-2073
(http://www.physik.tu-muenchen.de/roomfinder.htm?room=2073)
at 9.30 s.t.
2 Molecular Dynamics Simulations
Over the last decades, the Molecular Dynamics technique has evolved into a versatile method
for studying molecules in an implicit or explicit water environment at the atomic level, also under the
influence of external forces. It gives insight into dynamical processes on the picosecond
to nanosecond timescale and is thereby complementary to structural data obtained from
X-ray, NMR spectroscopy, or atomic force microscopy experiments. In the following, the
methodology, approximations, and limitations of this method are pointed out.
Approximation 1 – Born–Oppenheimer Approximation:
In 1925, Erwin Schrödinger postulated the equation that describes the time-dependence
of any quantum mechanical system:
$$H\psi = i\hbar \frac{\partial \psi}{\partial t}, \qquad (0.1)$$
with the Hamiltonian H (the operator of the total energy of the system), the wave function ψ (whose squared modulus is the probability
density of all particles in the system), and $\hbar = h / 2\pi$ (h is Planck's constant). Due to their much
lower mass (approx. 1:1836) as compared to the nuclei and thereby higher velocities, electrons are assumed to follow the nuclear motion instantaneously. Hence, it is possible to
separate the variables associated with the nuclear wave function from those of the electrons
(Born-Oppenheimer approximation):
$$\psi_{tot}(R, r) = \psi_{nucl}(R)\,\psi_{el}(R; r), \qquad (0.2)$$
where R is the vector containing the coordinates and momenta of the nuclei and r that
of the electrons. Inserting this product into the Schrödinger equation yields a time-dependent equation for the nuclei and a time-independent Schrödinger equation for the
electrons.
Approximation 2 – Force Fields:
In the second approximation, the electronic potential is approximated by a force field
(or potential function) V. Typical force fields treat biomolecules as spheres
(atoms) connected by sticks (bonds). They are composed of non-bonded interactions (Coulomb interactions and Lennard-Jones potentials) and bonded interactions. The
latter typically include harmonic terms for covalent bonds between atoms and for angles,
as well as a contribution for rotations around a bond (dihedral terms).
A typical force field is given by:
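$$
V = \sum_{i<j} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}
  + \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \sum_{bonds} \frac{1}{2} k^{b}_{ij} \left( r_{ij} - b^{0}_{ij} \right)^2
  + \sum_{angles} \frac{1}{2} k^{\theta}_{ijk} \left( \theta_{ijk} - \theta^{0}_{ijk} \right)^2
  + \sum_{dihedrals} k^{\phi} \left( 1 + \cos\left( n (\phi - \phi^{0}) \right) \right) \qquad (0.3)
$$
where $q_i$ is the partial charge of atom i, $r_{ij}$ denotes the distance between two atoms i and
j, $A_{ij}$ and $B_{ij}$ are Lennard-Jones parameters, $k^{b}$, $k^{\theta}$, and $k^{\phi}$ are bond/angle/dihedral force
constants, n is the dihedral multiplicity, and $b^{0}$, $\theta^{0}$, and $\phi^{0}$ are equilibrium values for bond
lengths, angles, and dihedral angles.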
During the MD simulation setup, a force field may be chosen from the different versions
that have been developed, e.g. AMBER [2], CHARMM [3], OPLS[4], or GROMOS[5].
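To make the individual terms of the force field expression concrete, the following minimal Python sketch evaluates each term for a single pair, bond, angle, or dihedral. It is an illustration only and not the AMBER implementation. The function names and the numerical value of the Coulomb prefactor (roughly 332 kcal mol^-1 Å e^-2 in the units commonly used with such force fields) are assumptions made for this example.

```python
import math

# Assumed Coulomb prefactor 1/(4*pi*eps0) in kcal mol^-1 Angstrom e^-2 (approximate value).
COULOMB_CONSTANT = 332.06


def coulomb_energy(q_i, q_j, r_ij):
    """Electrostatic energy of two partial charges at distance r_ij."""
    return COULOMB_CONSTANT * q_i * q_j / r_ij


def lennard_jones_energy(a_ij, b_ij, r_ij):
    """Lennard-Jones energy A/r^12 - B/r^6."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6


def bond_energy(k_b, r_ij, b0):
    """Harmonic bond term 1/2 * k * (r - b0)^2."""
    return 0.5 * k_b * (r_ij - b0) ** 2


def angle_energy(k_theta, theta, theta0):
    """Harmonic angle term 1/2 * k * (theta - theta0)^2 (angles in radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def dihedral_energy(k_phi, n, phi, phi0):
    """Periodic dihedral term k * (1 + cos(n * (phi - phi0)))."""
    return k_phi * (1.0 + math.cos(n * (phi - phi0)))


# Hypothetical parameters: A = B = 4 puts the Lennard-Jones minimum at r = 2^(1/6).
print(lennard_jones_energy(4.0, 4.0, 2 ** (1.0 / 6.0)))
print(bond_energy(100.0, 1.1, 1.0))
```

The total potential V is the sum of these terms over all pairs, bonds, angles, and dihedrals listed in the topology file.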
Approximation 3 – Newton’s second law:
The time-dependent Schrödinger equation for the nuclear positions is approximated by
Newton's equation of motion (Newton's second law, 1687),
$$F_i = m_i a_i, \qquad (0.4)$$
where the force $F_i$ acting on atom i equals the product of the mass $m_i$ of atom i and its acceleration $a_i$. The force follows from the force field as $F_i = -\nabla_i V$. An efficient numerical double integration of this equation can be
achieved with the Verlet algorithm [6] or the leap-frog algorithm [7].
The integration time step has to be small compared to the fastest motions in the system, e.g. bond vibrations involving hydrogen atoms, and is thereby restricted
to less than 1 femtosecond (= 1×10^-15 s). Constraining the bond lengths with algorithms such as LINCS [8] or SHAKE [9]
allows the time step to be increased to 2 fs.
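The leap-frog scheme mentioned above can be sketched in a few lines of Python. The example is a one-dimensional toy model, not the integrator of any MD package: the harmonic "bond" with unit mass and unit force constant, the time step, and the function name are all assumptions chosen so that the exact solution (a cosine) is known.

```python
import math


def leapfrog(x0, v0, force, mass, dt, n_steps):
    """Integrate m*a = F(x) with the leap-frog scheme; returns positions at full steps."""
    # Velocities live on half steps: obtain v(dt/2) from an initial half kick.
    v_half = v0 + 0.5 * dt * force(x0) / mass
    x = x0
    positions = [x]
    for _ in range(n_steps):
        x = x + dt * v_half                      # x(t+dt) = x(t) + dt * v(t+dt/2)
        v_half = v_half + dt * force(x) / mass   # v(t+3dt/2) = v(t+dt/2) + dt * a(t+dt)
        positions.append(x)
    return positions


# Hypothetical test system: harmonic oscillator with k = 1, m = 1 (period 2*pi).
positions = leapfrog(x0=1.0, v0=0.0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=628)
print(positions[-1], math.cos(6.28))
```

With a time step that is about 1/600 of the oscillation period, the numerical position stays close to the analytical cosine. Increasing dt towards the period makes the trajectory inaccurate and finally unstable, which is the reason for the femtosecond time step.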
Often, periodic boundary conditions are applied in the simulations: the simulation box is replicated infinitely in all three dimensions. In this way, e.g., a simulation of
a lipid bilayer resembles the experimental situation of multilamellar lipid bilayers.
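In practice, periodic boundary conditions mean that each atom interacts with the nearest periodic copy of every other atom (minimum image convention). The following sketch shows this for a rectangular box. The box shape and the function names are assumptions for illustration; the truncated octahedral box used later in this course needs a more involved treatment.

```python
import math


def minimum_image(delta, box_length):
    """Shift a coordinate difference into the interval [-L/2, L/2]."""
    return delta - box_length * round(delta / box_length)


def periodic_distance(p, q, box):
    """Distance between points p and q in a rectangular periodic box with edge lengths `box`."""
    squared = 0.0
    for a, b, length in zip(p, q, box):
        d = minimum_image(a - b, length)
        squared += d * d
    return math.sqrt(squared)


# Two atoms near opposite faces of a 10 x 10 x 10 box are close neighbours across the boundary.
print(periodic_distance((0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (10.0, 10.0, 10.0)))
```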
Because the non-bonded interactions (first two sums in equation 0.3) between atoms are
the computationally most expensive part of the force calculation, a cutoff of usually 1.0
to 1.4 nm is applied to the Lennard-Jones interactions, as these decrease quickly with
increasing distance between two atoms ($\propto 1/r^{6}$).
Long-range electrostatic interactions are treated by Ewald summation, which
calculates the electrostatic contributions within a cutoff explicitly in real space and the contributions
beyond this cutoff in reciprocal space. The Particle Mesh Ewald (PME) method
further improves the efficiency of Ewald summation: by evaluating the reciprocal-space part on a grid with fast Fourier
transformations, it reduces the computational cost from the order of $N^2$ for the explicit calculation of Coulomb interactions to $N \log(N)$,
N being the number of atoms in the system.
A constant temperature during the simulation can be maintained by coupling the system to an
external temperature bath with a given temperature $T_0$. This can be achieved, e.g., by the
Berendsen thermostat [10]:
$$\frac{dT}{dt} = \frac{T_0 - T}{\tau}, \qquad (0.5)$$
where a deviation of the system temperature T from $T_0$ decays exponentially with the
time constant τ (typical value 1 ps). The Berendsen coupling scheme can also be used to
keep the system's pressure at a constant value, e.g. 1 bar.
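The exponential relaxation of the Berendsen thermostat can be checked numerically. In the usual formulation of weak coupling, the velocities are multiplied in every step by a factor λ with λ² = 1 + (Δt/τ)(T0/T − 1). The sketch below only tracks the temperature, which scales with λ², and compares it with the analytical solution T(t) = T0 + (T(0) − T0) exp(−t/τ). The chosen numbers (heating from 100 K to 300 K, Δt = 2 fs, τ = 1 ps) and the function names are assumptions for this illustration.

```python
import math


def berendsen_lambda(t_current, t_target, dt, tau):
    """Velocity scaling factor of the weak-coupling (Berendsen) scheme."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, dt, tau, n_steps):
    """Follow the temperature when only the thermostat acts on the system."""
    temperature = t_start
    history = [temperature]
    for _ in range(n_steps):
        lam = berendsen_lambda(temperature, t_target, dt, tau)
        temperature = lam * lam * temperature  # kinetic energy scales with velocity squared
        history.append(temperature)
    return history


# 500 steps of 0.002 ps correspond to one time constant tau = 1 ps.
history = relax_temperature(t_start=100.0, t_target=300.0, dt=0.002, tau=1.0, n_steps=500)
analytical = 300.0 + (100.0 - 300.0) * math.exp(-1.0)
print(history[-1], analytical)
```

After one time constant, the remaining deviation from T0 has dropped to about 1/e of its initial value.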
However, there are also restrictions and limitations for MD simulations. Firstly, not all
molecules have yet been adequately parameterized. The second limitation is the
computational effort that is necessary for even one nanosecond of simulation, given the femtosecond time step.
Hence, the simulation length is practically restricted to 50 to 100
nanoseconds, depending on the system size. Furthermore, it is not possible to describe chemical reactions, as the classical force field approximation
cannot describe the breaking or formation of covalent bonds. Also, processes at low temperature (less than 10 K) and the dynamics of hydrogen atoms are poorly described due to
the increasing importance of quantum dynamical behaviour at low temperatures or small
masses.
3 Introduction to AMBER
Setup of a molecular dynamics simulation of a peptide
This part of the course will make you familiar with the AMBER 9 Simulation suite [1].
Step by step, we will go through a full simulation setup and execution.
Open a terminal and change to the lab tutorial folder named ’Blockpraktikum’
cd Blockpraktikum
Typing the command ’ls’ should show the 3 subfolders needed for this lab tutorial.
Change to the folder ’amber’
cd amber
In this tutorial we will set up and run an MD simulation of an oligo-alanine peptide in implicit solvent using the Amber9 simulation package (Case et al., 2003, UCSF).
Before we can start the simulation we need to create the following files:
1. Initial coordinates
2. Topology file (describes the molecular mechanics force field of the peptide)
3. Input files for energy minimization and MD run.
Stage 1 - Building the initial coordinates and topology
Initial coordinates can be taken from a database of experimental structures, or
we can generate the initial structure using the Leap module available in the Amber package. Here we use the Leap module to generate the start structure of oligo-alanine. The
sequence is ALA-ALA-ALA-ALA-ALA-ALA (6 alanine residues). To protect the
termini we use a capped poly-alanine peptide, so the sequence is ACE-ALA-ALA-ALA-ALA-ALA-ALA-NME. To start the leap program and generate the structure, type at the
command prompt
xleap
Here, be sure that ’Caps Lock’ on your keyboard is disabled. Now we have to load the
force field parameters
source leaprc.ff03.r1
Use the command sequence to create the structure, type:
protein = sequence {ACE ALA ALA ALA ALA ALA ALA NME}
We have created the structure. Now save the topology and coordinate files using the command saveamberparm:
saveamberparm protein ala.top ala.crd
Quit xleap by typing:
quit
Now in your working directory you will find the two files (ala.top, ala.crd) we generated from xleap.
From this topology and coordinate file we can generate a pdb file which can be used
to visualize the peptide structure. The amber package contains a module called ambpdb
to do this job. In the command prompt type
ambpdb -p ala.top < ala.crd > ala.pdb
Use VMD to visualize the structure
vmd ala.pdb
Stage 2 - Minimising the structure
Before we start running MD we need to perform a short minimization of our starting
structure. The energy minimization removes steric clashes, if any, and moves the
structure to the nearest local minimum. For this purpose, we need an input file, which you
can find in the folder (filename min.in). Look at the file (e.g. using more min.in);
the different settings for the minimization run are commented.
Run the minimization with the MD-executable of AMBER called sander:
sander -O -i min.in -p ala.top -c ala.crd -o alamin.out -r alamin.crd
-O - flag to overwrite the output files if they exist
-p - flag for topology file
-c - flag for input coordinate file
-o - flag for output file
-r - flag for output coordinate file
-i - flag for the input command file
If the simulation was successful, we have two more files alamin.out and alamin.crd in
the working directory. Look at the output files alamin.out and alamin.crd. In the
command prompt type
more alamin.out or
more alamin.crd
Now convert the output coordinate file into pdb, so that we can visualize the structure
ambpdb -p ala.top < alamin.crd > alamin.pdb
vmd alamin.pdb
Compare the minimized structure to the initial starting structure. What has changed?
Stage 3 - Heating up the system
The next stage is the MD run. As we normally perform MD simulations at 300 K
(room temperature), we heat the system up to 300 K step by step. Heating in
stages equilibrates the system at each temperature. We do it in three steps,
at 100 K, 200 K, and 300 K, with 5 ps for each step.
In your folder, you can find three input files named md1.in, md2.in, and md3.in.
Open and compare these files. What are the differences?
Now run the three MD stages one after the other:
MD at 100K
sander -O -i md1.in -p ala.top -c alamin.crd -o alamd1.out -r alamd1.crd -x alamd1.trj
MD at 200K
sander -O -i md2.in -p ala.top -c alamd1.crd -o alamd2.out -r alamd2.crd -x alamd2.trj
MD at 300K
sander -O -i md3.in -p ala.top -c alamd2.crd -o alamd3.out -r alamd3.crd -x alamd3.trj
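The three commands above follow one pattern: the restart file (-r) written by one stage is the input coordinate file (-c) of the next stage. The short Python sketch below only builds and prints these command lines to make the chaining explicit; it does not run sander. The helper name and its arguments are assumptions for this illustration.

```python
def heating_commands(n_stages=3, start_coordinates="alamin.crd"):
    """Return the sander command line for each heating stage, in order."""
    commands = []
    previous = start_coordinates
    for stage in range(1, n_stages + 1):
        prefix = "alamd{}".format(stage)
        commands.append(
            "sander -O -i md{n}.in -p ala.top -c {prev} "
            "-o {p}.out -r {p}.crd -x {p}.trj".format(n=stage, prev=previous, p=prefix)
        )
        previous = prefix + ".crd"  # the restart file of this stage feeds the next one
    return commands


for command in heating_commands():
    print(command)
```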
Note that these simulations and the following production run are performed in vacuum with a distance-dependent dielectric constant.
Stage 4 - Production run
Now we run a production MD simulation. Usually one needs to run it for several nanoseconds, but due to the time constraints of this lab course we run it for only 1 ns. The input file
for the production run is called ’mdpr.in’. The machine you are working on has a 4-core processor. To use all 4 cores efficiently and to decrease the wall-clock time of the simulation, we will
use MPI.
Run the simulation using:
mpiexec -n 4 sander.MPI -O -i mdpr.in -p ala.top -c alamd3.crd -o alamdpr.out -r alamdpr.crd -x alamdpr.trj
This will take around 3 minutes...
Stage 5 - Visualize the trajectory in VMD
vmd -parm7 ala.top -crd alamdpr.trj
If everything went fine, you can now see the trajectory in the display window.
Choose Graphics in the VMD panel.
The peptide can be represented with several drawing methods (as lines, as ribbons, as
cartoon).
Choose one from the list box under Drawing Method. If you choose, for example, the cartoon representation, you will see the cartoon representation of the peptide model in the display window.
Describe what is happening throughout the trajectory and which conformational changes
you can observe.
Stage 6 - RMSD vs. time plot
Now we calculate the root mean square deviation (RMSD) from the start structure along the
trajectory. We use the module ptraj, which is available in the Amber 9 package.
The contents of the input file rmsd.in are:
trajin alamdpr.trj
rms first @CA out rmsd.dat time 1.0
Run the ptraj program:
ptraj ala.top < rmsd.in > rmsd.out
Now plot the rmsd data using xmgrace
xmgrace rmsd.dat
The popup window shows the plot of Rmsd vs. time.
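For reference, the quantity that is plotted is the square root of the mean squared displacement of the selected atoms between a frame and the reference structure. The sketch below computes this for two small coordinate sets. Note the simplification: the rms command of ptraj superimposes each frame onto the reference by a best fit before measuring the deviation, whereas this sketch assumes that the structures are already superimposed. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) tuples.

    No superposition is performed; the structures are assumed to be aligned already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical Calpha positions: every atom of the second frame is displaced by 1 Angstrom in x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
frame = [(x + 1.0, y, z) for (x, y, z) in reference]
print(rmsd(reference, frame))
```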
Summary of the steps
1. Generate a start structure using the leap module
2. Generate coordinates and topology files
3. Perform energy minimization
4. Run MD simulations
5. Visualize the trajectory in VMD
6. Calculate the RMSD values using ptraj and plot rmsd vs. time using xmgrace.
Additional tasks:
• record the hydrogen bond formation.
• create a Ramachandran plot
• record the end-to-end distance of the peptide
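For the Ramachandran plot, the backbone dihedral angles φ (atoms C of residue i-1, N, Cα, C of residue i) and ψ (atoms N, Cα, C of residue i, N of residue i+1) are needed for each residue. The following sketch shows how a dihedral angle is obtained from four atom positions. It is a generic geometric formula and not the ptraj implementation; the function names and the test coordinates are assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (in degrees, range -180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical test geometries: cis (0 degrees) and a quarter turn (90 degrees).
print(dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))
print(dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Plotting ψ against φ for every frame and residue yields the Ramachandran plot.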
4 Folding simulation of Chignolin
The 10-residue chignolin peptide [11] is an example of a small peptide that forms a stable
hairpin-type three-dimensional (3D) structure in solution (at 300 K, see Figure 1). The aim of
the MD simulation study is to simulate the structure formation process starting from a
fully extended, unfolded peptide conformation. Note that the outcome can strongly depend
on the starting structure (traps in the energy landscape, ...). The simulations are performed
employing an implicit solvent model (Generalized Born model) as implemented in the
Amber simulation package (input parameter igb=1). During the simulations, a Langevin-type
equation of motion is solved numerically in small time steps.
Change to the chignolin-folder by
cd ../chignolin
As before, use xleap to create the following sequence:
protein = sequence {NGLY TYR ASP PRO GLU THR GLY THR TRP CGLY}
Here, we will use a set of Born radii that differs slightly from the default to enhance the performance:
set default pbradii mbondi2
Again, we save the topology and the coordinates and quit xleap.
saveamberparm protein chi.top chi.crd
Minimize the structure using the input file min.in:
mpiexec -n 4 sander.MPI -O -i min.in -p chi.top -c chi.crd -o chimin.out -r chimin.crd
Now, we run the short production simulation
mpiexec -n 4 sander.MPI -O -i mdpr.in -p chi.top -c chimin.crd -o chimdpr-short.out -r chimdpr-short.crd -x chimdpr-short.trj
This is a shorter simulation than the one we will analyse later, but it can still give us
a good first impression. As this simulation will run for approx. 1 h, you can now go to
the lunch break.
5 Analysis of the chignolin results
It is useful to take a look at the generated trajectory using VMD and compare the sampled
conformations with the starting structure and the reference conformation (folded experimental chignolin conformation). Prepare a figure showing the superposition of one of the
final structures and the start structure with the experimental peptide structure.
Prepare an RMSD plot with respect to the start structure and with respect to the experimental folded structure.
Estimate the fraction of folded conformations during the last 5 ns of simulation time.
Prepare RMSD plots for the central part of the structure and for the stem part of the hairpin.
Monitor the formation of key H-bonds in VMD and generate plots with the help of ptraj.
Which part forms first? Which part is the most flexible?
Make a Ramachandran plot of residues 2 to 9 (all residues except the caps).
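One possible way to estimate the fraction of folded conformations is to count the frames whose RMSD with respect to the experimental structure lies below a threshold. The sketch below does this for a two-column data set (time, RMSD) as written by the rms command. The threshold of 2.0 Å, the start time, and the example numbers are assumptions that have to be adapted to the actual data; lines starting with # or @ are skipped as comments.

```python
def fraction_below(lines, cutoff, t_min=0.0):
    """Fraction of frames with time >= t_min whose value is below `cutoff`."""
    selected = 0
    below = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0][0] in "#@":
            continue  # skip empty lines and comment/header lines
        time, value = float(fields[0]), float(fields[1])
        if time < t_min:
            continue
        selected += 1
        if value < cutoff:
            below += 1
    if selected == 0:
        raise ValueError("no frames in the selected time range")
    return below / selected


# Hypothetical data: time in ps, RMSD in Angstrom.
example = ["# time rmsd", "1000.0 4.8", "2000.0 1.6", "3000.0 1.2", "4000.0 3.9", "5000.0 1.4"]
print(fraction_below(example, cutoff=2.0, t_min=2000.0))
```

For a real file, the lines can be read with `open("rmsd.dat").read().splitlines()`.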
6 Potential of mean force for ligand-receptor dissociation
The WW-domain (shown in cartoon and ribbon representation) in complex with a
proline-rich ligand peptide (stick representation on top)
WW domains are small protein structures of ∼ 40 amino acids that are subdomains of
many larger proteins. WW domains can fold as isolated units and bind proline-rich peptides at a specific binding cleft. The aim of the simulation experiment is to induce dissociation
of a proline-rich peptide from the WW domain and to calculate the free energy change along
a dissociation pathway. The simulations include solvent and surrounding
ions explicitly. Dissociation is followed along a reaction coordinate (d) defined as the
distance between the centers of mass of the WW domain and the bound peptide. The dissociation is induced in a series of simulations by adding a penalty potential along the reaction
coordinate of the following form (umbrella potential):
$$V_{penalty}(d) = k_{penalty} \times (d - d_0)^2 \qquad (0.6)$$
The distance $d_0$ is a reference distance that is varied between 9 Å (complexed state) and 22 Å (dissociated state). The force constant $k_{penalty}$ is set to 1 kcal mol^-1
Å^-2. This allows for fluctuations of the actual distance d around $d_0$ of about 1 Å. Dissociation
is induced by running a set of simulations starting with $d_0$ = 9 Å, followed by a stepwise
(1 Å) increase of $d_0$ to 22 Å. During the simulations, the actually sampled distance d is
recorded. From the distance distributions it is finally possible to calculate the free energy
change along the reaction coordinate using the Weighted Histogram Analysis Method
(WHAM) [12].
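The umbrella sampling setup can be summarized in a small sketch: the penalty potential, the list of reference distances, and a rough estimate of how far d fluctuates around the reference distance. For the estimate it is assumed that only the penalty potential acts along d and that the force constant is an energy per squared distance; the distribution is then Gaussian with variance k_B T / (2 k_penalty). The real distributions are additionally shaped by the free energy profile. Function names and the value of the gas constant in kcal mol^-1 K^-1 are assumptions of this example.

```python
import math

K_B = 0.0019872  # Boltzmann (gas) constant in kcal mol^-1 K^-1, approximate value


def penalty(d, d0, k_penalty=1.0):
    """Umbrella potential k * (d - d0)^2, as in equation 0.6."""
    return k_penalty * (d - d0) ** 2


def window_centers(start=9.0, stop=22.0, step=1.0):
    """Reference distances d0 of the successive umbrella windows."""
    n_windows = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n_windows)]


def thermal_spread(k_penalty, temperature):
    """Standard deviation of d if only the penalty potential acts along d."""
    return math.sqrt(K_B * temperature / (2.0 * k_penalty))


centers = window_centers()
print(len(centers), centers[0], centers[-1])
print(thermal_spread(1.0, 300.0))
```

Under these assumptions the spread is roughly half an Ångström, so the distributions of neighbouring windows (1 Å apart) overlap, which is what the WHAM analysis requires.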
Setup of the simulations using the xleap program
Open the pdb file of the WW-domain in complex with a pro-rich ligand in xleap.
ww = loadpdb ww2.pdb
Using the command
charge ww
we see that the WW-domain carries a net charge.
This would be problematic, and thus we add one sodium ion to neutralize the charge:
addions ww Na+ 1
For this small water box we only need two Na+ and two Cl- ions to reach the intracellular
ion concentration (in vitro).
addions ww Na+ 2 Cl- 2
Create a truncated octahedral box with a 9 Å buffer around the protein and solvate it with TIP3P water
solvateoct ww TIP3PBOX 9.0
We will use pregenerated files for the later simulation and analysis and will thus only produce a pdb file.
saveamberparm ww ww-water.top ww-water.crd
savepdb ww ww-water.pdb
The stored protein topology file and structure contain not only parameters and coordinates
of the protein and peptide but also of the surrounding water and ions.
Now, an energy minimization and a series of short MD simulations to equilibrate the system are performed.
Use the input file mini.in for the minimization.
mpiexec -n 4 sander.MPI -O -i mini.in -p ww-water.top -c ww-water.crd -o ww-watermin.out -r ww-watermin.crd
During equilibration, the system is heated to 300 K with the protein coordinates restrained
to the initial experimental structure. In a second set of simulations, the restraints are gradually removed. For the next steps we use an already pre-equilibrated system (wws0.crd)
and continue directly with the induced dissociation.
Before we can start the induced peptide dissociation, the centers of mass of the WW-domain and of the bound peptide need to be defined in a restraint definition file. For this
purpose, the MD program needs the atom numbers of those atoms that form each center
of mass. It is best to use the Cα atoms of the peptide residues to define one center and the
Cα atoms of the protein to define the second center of mass. The format of the restraint
definition file will be provided.
For the peptide dissociation, a series of 14 MD simulations is performed with 14 different reference distances ($d_0$). Each simulation is run for 2000 steps (4 ps), and the
resulting trajectories are used to visualize the dissociation process. For the quantitative evaluation of the results, a much longer (already pre-calculated) trajectory will be
used. The MD input files for the distance restraining runs are mdrs1.in to mdrs14.in;
the runs can be started with the script prs.script by entering the following command in
the terminal:
./prs.script
The file disN.in contains the group definition and the distance restraining information:
&rest iat=-1, -1,
iresid=0, irstyp=0,ifvari=1,ninc=0, imult=0,ir6=0,ifntyp=0,
r1= 0.000,r2=11.000,r3=11.000,r4=99.000,rk1=1.000,rk2=1.000,
igr1= 150,166,...,0,
igr2= 567,581,...,0,
Simulation analysis and potential of mean force for peptide dissociation
Use the VMD program to visualize the dissociation trajectory and to prepare a movie of
the process.
Which parts of the contacts between peptide and protein are first disrupted? Has the
conformation of peptide and protein changed during dissociation?
Prepare distance distribution curves for each of the 14 MD simulations. Compare the
distributions with the reference distance d0 for each simulation.
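A distance distribution curve is a normalized histogram of the distances sampled in one simulation. The sketch below shows one way to compute it without external libraries. The bin range, the number of bins, and the example distances are assumptions; in practice the distances come from the recorded time series of each window.

```python
def histogram(values, lower, upper, n_bins):
    """Return bin centers and a normalized density for values in [lower, upper)."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for value in values:
        if lower <= value < upper:
            index = min(int((value - lower) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    centers = [lower + (i + 0.5) * width for i in range(n_bins)]
    # Dividing by total * width makes the area under the curve equal to one.
    density = [count / (total * width) if total else 0.0 for count in counts]
    return centers, density


# Hypothetical distances (in Angstrom) sampled in a window with d0 = 9.5.
distances = [9.1, 9.2, 9.6, 10.4]
centers, density = histogram(distances, lower=9.0, upper=11.0, n_bins=4)
for center, value in zip(centers, density):
    print(center, value)
```

Plotting the curves of all 14 windows in one graph shows directly whether neighbouring windows overlap and how far each maximum is shifted from its reference distance.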
Perform a WHAM analysis of the dissociation process to calculate the potential of mean
force or free energy profile along the reaction coordinate.
The WHAM program by A. Grossfield [12] can be used for this purpose with the following command line arguments:
wham 8.0 24.0 100 0.001 300.0 4 input.txt pmf.out
Arguments in order: begin = 8.0, end = 24.0, steps = 100, toler. = 0.001, temp = 300.0, a.e.p. = 4, infile = input.txt, outfile = pmf.out
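The input file for the WHAM program lists one line per simulation window. To our understanding of the WHAM documentation, each line contains the name of the time series file, the position of the minimum of the biasing potential, and the spring constant, and the program assumes a bias of the form 1/2 k (x − x0)². Since the penalty potential used here is written without the factor 1/2, the spring constant passed to WHAM would then be twice k_penalty. Please verify both points against the documentation of the installed version. The file name pattern dist_N.dat and the helper name are hypothetical.

```python
def wham_metadata(centers, k_penalty, template="dist_{index}.dat"):
    """Build the lines of a WHAM metadata file: time series file, window minimum, spring constant."""
    # Assumption: WHAM uses 1/2 * k * (x - x0)^2, the simulation uses k * (x - x0)^2.
    spring = 2.0 * k_penalty
    lines = []
    for index, d0 in enumerate(centers, start=1):
        lines.append("{} {:.1f} {:.1f}".format(template.format(index=index), d0, spring))
    return lines


reference_distances = [9.0 + i for i in range(14)]  # 9 to 22 Angstrom in steps of 1
metadata = wham_metadata(reference_distances, k_penalty=1.0)
print("\n".join(metadata))
```

The printed lines can be written to input.txt if the pre-generated input file is not used.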
The output file first has to be cleaned of invalid lines and columns to analyse it. Do
this with your editor of choice.
Tip: Vi/Vim/Emacs/Kate etc. have a column selection mode.
Kate: Ctrl+Shift+B
Vi/Vim: Ctrl+V
It is useful to check the Rmsd of the protein and the complex as a function of simulation
stage.
Can we relate the calculated free energy curve to specific dissociation events during the
simulation?
Important commands
Generate and setup the structures
xleap
load forcefield parameters
create a sequence
protein = sequence {...}
load a pdb structure
protein = loadpdb xyz.pdb
use different set of born radii
set default pbradii mbondi2
check the charge of the structure
charge protein
add ions to neutralize or simulate a cellular environment
addions protein ion amount
create a box and solvate with water
solvateoct protein TIP3PBOX 9.0
save the topology and coordinates
saveamberparm protein xyz.top xyz.crd
save as pdb
savepdb protein xyz.pdb
close the program xleap
quit
Visualize a structure
normal pdb
vmd -pdb xyz.pdb
implicit solvent simulation
vmd -parm7 xyz.top -crd xyz.crd/.trj
explicit solvent simulation
vmd -parm7 xyz.top -crdbox xyz.trj
Generate pdbfile from .top and .crd
ambpdb -p xyz.top < xyz.crd > xyz.pdb
Run a simulation
Minimisation
sander -O -i simulation parameter -p topology -c coordinates
-o output log -r new coordinates
Other
sander -O -i simulation parameter -p topology -c coordinates
-o output log -r new coordinates -x new trajectory
for long simulations use
mpiexec -n 4 sander.MPI ...
Analysis
create different infiles and read in the trajectory
trajin trajectory
calculate distances / hbonds
distance dist :residue@atom :residue@atom out filename.dat time x.y
calculate RMSD
rms reference structure :residue@atom out filename.dat time x.y
use ptraj to perform the calculations
ptraj xyz.top < infile > output log
Generate plots of the calculated data
xmgrace file1 file2 ...
Bibliography
[1] D.A. Case, T.A. Darden, T.E. Cheatham III, C.L. Simmerling, J. Wang, R.E. Duke, R. Luo, M. Crowley, Ross C. Walker, W. Zhang, K.M.
Merz, B. Wang, S. Hayik, A. Roitberg, G. Seabra, I. Kolossváry, K.F. Wong, F. Paesani, J. Vanicek, X. Wu, S.R. Brozell,
T. Steinbrecher, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui, D.H. Mathews, M.G. Seetin, C. Sagui,
V. Babin, and P.A. Kollman. Amber 9. University of California, San Francisco, (16), Dec.
[2] S. J. Weiner, P. A. Kollman, D. A. Case, U. Singh, C. Ghio, G. Alagona, Jr. S. Profeta, and P. Weiner. A new force
field for molecular mechanical simulation of nucleic acids and proteins. JACS, 106:765–784, 1984.
[3] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. CHARMM: A
program for macromolecular energy, minimization, and dynamics calculations. J Comp Chem, 4(2):187–217,
1983.
[4] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and testing of the OPLS all-atom force field
on conformational energetics and properties of organic liquids. JACS, 118:11225–11236, 1996.
[5] W. F. van Gunsteren, S. R. Billeter, A. A. Eising, P. H. Hünenberger, P. Krüger, A. E. Mark, W. R. P. Scott, and
I. G. Tironi. Biomolecular Simulation: The GROMOS96 Manual and User Guide. Vdf Hochschulverlag AG an
der ETH Zürich, Zürich, Switzerland, 1996.
[6] R. E. Gillilan and K. R. Wilson. Shadowing, rare events, and rubber bands. A variational Verlet algorithm for
molecular dynamics. JCP, 97:1757–1772, 1992.
[7] W. F. van Gunsteren and H. J. C. Berendsen. A leap-frog algorithm for stochastic dynamics. Mol. Sim., 1:173–185,
1988.
[8] B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije. LINCS: A linear constraint solver for molecular
simulations. J. Comp. Chemistry, 18:1463–1472, 1997.
[9] Vincent Kräutler, Wilfred F. van Gunsteren, and Philippe H. Hünenberger. A fast SHAKE algorithm to solve distance
constraint equations for small molecules in molecular dynamics simulations. Journal of Computational Chemistry,
22:501–508, 2001.
[10] H. J. C. Berendsen, J. P. M. Postma, W. F. Van Gunsteren, and J. Hermans. Interaction model for water in relation
to protein hydration, pages 331–342. D. Reidel Publishing Company, Dordrecht, The Netherlands, 1981.
[11] Shinya Honda, Kazuhiko Yamasaki, Yoshito Sawada, and Hisayuki Morii. 10 residue folded peptide designed by
segment statistics. Structure, 12(8):1507–1518, Aug 2004.
[12] http://membrane.urmc.rochester.edu/Software/WHAM/WHAM.html.