TU - Physik-Department
TU München, Physik-Department
AG Molekulardynamik (T38), Prof. Dr. Martin Zacharias
Advanced Lab Course (Fortgeschrittenen Praktikum), WS 2015/16
Experiment 74: Molecular Dynamics

Supervisors:
Christina Frost, Physik-Department, Physik I, Room 2073, 089 / 289-12732
Isaure deBeauchene, Physik-Department, Physik I, Room 2061, 089 / 289-12766

1 General information

During this experiment of the 'Blockpraktikum Biophysik', you will learn the basic theory and application of Molecular Dynamics simulations. Using this method, you will build several small peptide/protein systems, simulate them, and analyze the simulation output.

In the first part (approx. 9.00h to 10.00h), the tutors will present the general concept of Molecular Dynamics simulations, the theory behind it, and some details about the programs used throughout this course. The second part (approx. 10.00h to 12.00h) takes place at the lab workstations, where you will be introduced to the AMBER simulation package [1] by working through a short tutorial. After the tutorial, every student will set up a small peptide folding simulation (as described in this manual), which will run during the lunch break (12.00h to 14.00h). After lunch, the results of the peptide simulation will be analyzed, and a second simulation (investigating protein-ligand dissociation) will be set up and analyzed. The tutors will be present throughout the lab course to help with problems.

Please be at room PH1-2073 (http://www.physik.tu-muenchen.de/roomfinder.htm?room=2073) at 9.30 s.t.

2 Molecular Dynamics Simulations

Over the last decades, Molecular Dynamics (MD) has evolved into a versatile method for studying molecules at the atomic level, in implicit or explicit water environments and under the influence of external forces.
It gives insight into dynamical processes on the picosecond to nanosecond timescale and is thus complementary to structural data obtained from X-ray crystallography, NMR spectroscopy, or atomic force microscopy experiments. In the following, the methodology, approximations, and limitations of this method are outlined.

Approximation 1 – Born–Oppenheimer approximation: In 1925, Erwin Schrödinger postulated the equation that describes the time dependence of any quantum mechanical system:

$$ H\psi = i\hbar \frac{\partial \psi}{\partial t} \qquad (0.1) $$

with the Hamiltonian H (total energy of the system), the wave function ψ (whose squared modulus |ψ|² gives the probability density of all particles in the system), and ħ = h/2π (h is Planck's constant). Because electrons are much lighter than the nuclei (mass ratio at most approx. 1:1836) and therefore move much faster, they are assumed to follow the nuclear motion instantaneously. Hence, the variables of the nuclear wave function can be separated from those of the electrons (Born–Oppenheimer approximation):

$$ \psi_{\text{tot}}(R, r) = \psi_{\text{nucl}}(R)\,\psi_{\text{el}}(R; r) \qquad (0.2) $$

where R denotes the coordinates of the nuclei and r those of the electrons. Inserting this into the Schrödinger equation yields a time-dependent equation for the nuclei and a time-independent Schrödinger equation for the electrons.

Approximation 2 – Force fields: In the second approximation, the electronic potential is replaced by a force field (or potential function) V. Typical force fields treat biomolecules as spheres (atoms) connected by sticks (bonds). They are composed of non-bonded interactions (Coulomb interactions and Lennard-Jones potentials) and bonded interactions. The latter typically include harmonic terms for covalent bonds and for bond angles, as well as a term for rotations around a bond (dihedral terms).
A typical force field has the form:

$$
V = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
  + \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \sum_{\text{bonds}} \frac{1}{2} k^{b}_{ij} \left( r_{ij} - b^{0}_{ij} \right)^2
  + \sum_{\text{angles}} \frac{1}{2} k^{\theta}_{ijk} \left( \theta_{ijk} - \theta^{0}_{ijk} \right)^2
  + \sum_{\text{dihedrals}} k^{\phi} \left( 1 + \cos\left( n(\phi - \phi_0) \right) \right)
  \qquad (0.3)
$$

where q_i is the partial charge of atom i, r_ij is the distance between atoms i and j, ε_0 is the vacuum permittivity, A_ij and B_ij are Lennard-Jones parameters, k^b, k^θ, and k^φ are the bond, angle, and dihedral force constants, n is the dihedral multiplicity, and b^0, θ^0, and φ_0 are the equilibrium values of bond lengths, angles, and dihedral angles. When setting up an MD simulation, one can choose among several force fields that have been developed, e.g. AMBER [2], CHARMM [3], OPLS [4], or GROMOS [5].

Approximation 3 – Newton's second law: The time-dependent Schrödinger equation for the nuclear positions is approximated by Newton's equation of motion (Newton's second law, 1687),

$$ F_i = m_i a_i \qquad (0.4) $$

where the force F_i = −∂V/∂r_i acting on atom i equals the product of its mass m_i and its acceleration a_i. This equation can be integrated twice numerically in an efficient way with the Verlet algorithm [6] or the leap-frog algorithm [7]. The integration time step must be small compared with the period of the fastest motions in the system, e.g. bond vibrations involving hydrogen atoms, which restricts it to about 1 femtosecond (1 fs = 1 × 10⁻¹⁵ s). Constraint algorithms such as LINCS [8] or SHAKE [9] keep the bond lengths fixed and thereby allow the time step to be increased to 2 fs.

Periodic boundary conditions are often applied in simulations: the simulation box is replicated infinitely in all three dimensions. In this way, for example, a simulation of a single lipid bilayer resembles the experimental situation of multilamellar lipid bilayers.
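The leap-frog scheme mentioned above updates velocities at half steps and positions at full steps. As an illustration only (reduced, unitless quantities; the function name leapfrog_bond is hypothetical), the sketch below integrates Eq. 0.4 for two atoms on a line joined by the harmonic bond term of Eq. 0.3. It also shows why the time step has to be small compared with the bond vibration: for a harmonic bond, leap-frog is stable only if ω·dt < 2, where ω is the angular frequency of the vibration.

```python
def leapfrog_bond(k_b, b0, m1, m2, x1, x2, dt, n_steps):
    """Leap-frog integration of two atoms on a line joined by a harmonic bond.

    Bond energy 0.5 * k_b * (r - b0)**2 as in Eq. 0.3, reduced units.
    The half-step velocities v(-dt/2) start at zero. Returns the total energy
    at every full step.
    """
    v1 = v2 = 0.0  # velocities at t - dt/2
    energies = []
    for _ in range(n_steps):
        r = x2 - x1
        f2 = -k_b * (r - b0)  # F = -dV/dx2
        f1 = -f2              # equal and opposite force on atom 1
        # v(t + dt/2) = v(t - dt/2) + (F/m) dt   (Eq. 0.4)
        v1_new = v1 + f1 / m1 * dt
        v2_new = v2 + f2 / m2 * dt
        # total energy at time t, using the mean of the two half-step velocities
        e_kin = 0.5 * m1 * (0.5 * (v1 + v1_new)) ** 2 + 0.5 * m2 * (0.5 * (v2 + v2_new)) ** 2
        energies.append(e_kin + 0.5 * k_b * (r - b0) ** 2)
        # x(t + dt) = x(t) + v(t + dt/2) dt
        x1 += v1_new * dt
        x2 += v2_new * dt
        v1, v2 = v1_new, v2_new
    return energies


if __name__ == "__main__":
    # Reduced mass 0.5 and k_b = 1 give omega = sqrt(2): stability limit dt < 1.41.
    for dt in (0.01, 1.5):
        e = leapfrog_bond(k_b=1.0, b0=1.0, m1=1.0, m2=1.0, x1=0.0, x2=1.5, dt=dt, n_steps=100)
        print(f"dt={dt}: E(first)={e[0]:.4f}  E(last)={e[-1]:.4g}")
```

With dt = 0.01 the total energy stays essentially constant; dt = 1.5 exceeds the stability limit and the energy grows without bound, which is the toy analogue of choosing a time step that is too long for hydrogen bond vibrations.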
Because the non-bonded interactions (the first two sums in Eq. 0.3) are the computationally most expensive part of the force calculation, a cutoff of usually 1.0 to 1.4 nm is applied to the Lennard-Jones interactions, which decay quickly with increasing distance between two atoms (the attractive term falls off as 1/r⁶). Long-range electrostatic interactions are treated by Ewald summation, which computes the electrostatic contributions within a cutoff explicitly in real space and the contributions beyond this cutoff in reciprocal space. The grid-based Particle Mesh Ewald (PME) method makes this more efficient by using fast Fourier transforms, reducing the cost from O(N²) for the explicit calculation of Coulomb interactions to O(N log N), N being the number of atoms in the system.

A constant temperature during the simulation can be maintained by coupling the system to an external heat bath with a given temperature T_0. This can be achieved, e.g., with the Berendsen thermostat [10]:

$$ \frac{dT}{dt} = \frac{T_0 - T}{\tau} \qquad (0.5) $$

where a deviation of the system temperature T from T_0 decays exponentially with a time constant τ (typical value 1 ps). The Berendsen coupling scheme can also be used to keep the system's pressure constant, e.g. at 1 bar.

However, MD simulations also have limitations. First, not all molecules have yet been adequately parameterized. Second, considerable computational effort is needed even for a single nanosecond of simulation; hence, the simulation length is in practice restricted to about 50 to 100 nanoseconds, depending on the system size. Furthermore, chemical reactions cannot be simulated, because the classical force field approximation cannot describe the breaking or formation of covalent bonds.
Also, processes at low temperature (below about 10 K) and the dynamics of hydrogen atoms are poorly described, because quantum effects become increasingly important at low temperatures and for small masses.

3 Introduction to AMBER

Setup of a molecular dynamics simulation of a peptide

This part of the course will familiarize you with the AMBER 9 simulation suite [1]. Step by step, we will go through a complete simulation setup and run.

Open a terminal and change to the lab tutorial folder named 'Blockpraktikum':

    cd Blockpraktikum

Typing the command 'ls' should show the 3 subfolders needed for this lab course. Change to the folder 'amber':

    cd amber

In this tutorial we will set up and run an MD simulation of an oligo-alanine peptide in implicit solvent using the Amber 9 simulation package (Case et al., 2003, UCSF). Before we can start the simulation, we need to create the following files:

1. Initial coordinates
2. Topology file (describes the molecular mechanics force field of the peptide)
3. Input files for the energy minimization and the MD runs

Stage 1 - Building the initial coordinates and topology

Initial coordinates can either be taken from a database of experimental structures or generated with the Leap module of the Amber package. Here we use Leap to build the start structure of oligo-alanine. The sequence is ALA-ALA-ALA-ALA-ALA-ALA (six alanine residues). To cap the terminal ends, we add an acetyl group (ACE) at the N-terminus and an N-methylamide group (NME) at the C-terminus, so the sequence becomes ACE-ALA-ALA-ALA-ALA-ALA-ALA-NME.

Start the Leap program to generate the structure; at the command prompt type

    xleap

Make sure that 'Caps Lock' on your keyboard is disabled.
Now load the force field parameters:

    source leaprc.ff03.r1

Create the structure with the sequence command:

    protein = sequence {ACE ALA ALA ALA ALA ALA ALA NME}

The structure has been created; now save the topology and coordinate files with saveamberparm:

    saveamberparm protein ala.top ala.crd

Quit xleap by typing:

    quit

Your working directory now contains the two files generated by xleap (ala.top, ala.crd). From these topology and coordinate files we can generate a PDB file to visualize the peptide structure. The Amber module ambpdb does this job. At the command prompt type

    ambpdb -p ala.top < ala.crd > ala.pdb

Use VMD to visualize the structure:

    vmd ala.pdb

Stage 2 - Minimising the structure

Before running MD, we need to perform a short minimization of the starting structure. Energy minimization removes steric clashes, if any, and moves the structure to the nearest local energy minimum. For this purpose, we need an input file, which you can find in the folder (file name min.in). Look at the file (e.g. using more min.in); the different settings for the minimization run are commented.

Run the minimization with the AMBER MD program sander:

    sander -O -i min.in -p ala.top -c ala.crd -o alamin.out -r alamin.crd

    -O   overwrite the output files if they exist
    -i   input command file
    -p   topology file
    -c   input coordinate file
    -o   output (log) file
    -r   output coordinate file

If the minimization was successful, the working directory contains two more files, alamin.out and alamin.crd. Look at both files; at the command prompt type

    more alamin.out

or

    more alamin.crd

Now convert the output coordinate file into PDB format, so that we can visualize the structure:

    ambpdb -p ala.top < alamin.crd > alamin.pdb
    vmd alamin.pdb

Compare the minimized structure with the initial structure. What has changed?
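The idea behind energy minimization can be illustrated on a single Lennard-Jones pair from Eq. 0.3. The sketch below (hypothetical function names; ε and σ are illustrative values, with A = 4εσ¹² and B = 4εσ⁶) uses plain steepest descent with a simple adaptive step: accept the move and enlarge the step when the energy drops, otherwise halve the step. sander's minimizer is more elaborate (it typically starts with steepest descent and can switch to conjugate gradient, as set in min.in), so treat this only as a picture of the principle. Starting inside the repulsive wall mimics a steric clash.

```python
def lj_energy(r, a, b):
    """Lennard-Jones pair term of Eq. 0.3: A/r^12 - B/r^6."""
    return a / r**12 - b / r**6


def lj_gradient(r, a, b):
    """dV/dr of the Lennard-Jones pair term."""
    return -12.0 * a / r**13 + 6.0 * b / r**7


def steepest_descent(r, a, b, step=0.01, max_cycles=1000, gtol=1e-6):
    """Move r downhill with an adaptive step length; return (r, energy)."""
    e = lj_energy(r, a, b)
    for _ in range(max_cycles):
        g = lj_gradient(r, a, b)
        if abs(g) < gtol or step < 1e-12:
            break
        r_trial = r - step * (1.0 if g > 0 else -1.0)  # step of length `step` downhill
        e_trial = lj_energy(r_trial, a, b)
        if e_trial < e:   # energy dropped: accept and try a larger step next time
            r, e = r_trial, e_trial
            step *= 1.2
        else:             # energy rose: reject and halve the step
            step *= 0.5
    return r, e


if __name__ == "__main__":
    eps, sigma = 0.1, 3.4                       # illustrative values (kcal/mol, Angstrom)
    a, b = 4 * eps * sigma**12, 4 * eps * sigma**6
    r_min, e_min = steepest_descent(3.0, a, b)  # start inside the repulsive wall
    print(f"r_min = {r_min:.4f} A (analytic {2 ** (1 / 6) * sigma:.4f} A), E = {e_min:.4f} kcal/mol")
```

The analytic minimum of this pair lies at r = 2^(1/6)σ with energy −ε, which the search should reproduce; like the real minimization, it only finds the nearest local minimum.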
Stage 3 - Heating up the system

The next stage is the MD run. Since MD simulations are normally run at 300 K (room temperature), we heat the system up to 300 K step by step, which lets it equilibrate at each temperature. We use three steps, at 100 K, 200 K, and 300 K, of 5 ps each.

In your folder, you can find three input files named md1.in, md2.in, and md3.in. Open and compare these files. What are the differences?

Now run the three MD runs one after another; each run starts from the output coordinates of the previous one:

MD at 100 K

    sander -O -i md1.in -p ala.top -c alamin.crd -o alamd1.out -r alamd1.crd -x alamd1.trj

MD at 200 K

    sander -O -i md2.in -p ala.top -c alamd1.crd -o alamd2.out -r alamd2.crd -x alamd2.trj

MD at 300 K

    sander -O -i md3.in -p ala.top -c alamd2.crd -o alamd3.out -r alamd3.crd -x alamd3.trj

The additional flag -x names the output trajectory file. Note that these simulations and the following production run are done in vacuum with a distance-dependent dielectric constant.

Stage 4 - Production run

Now we run a production MD simulation. Usually one needs to run it for several nanoseconds, but due to the time constraints of this lab course we run it for only 1 ns. The input file for the production run is called 'mdpr.in'. The machine you are working on has a 4-core processor; to use all 4 cores and decrease the wall-clock simulation time, we use MPI. Run the simulation using:

    mpiexec -n 4 sander.MPI -O -i mdpr.in -p ala.top -c alamd3.crd -o alamdpr.out -r alamdpr.crd -x alamdpr.trj

This will take around 3 minutes.

Stage 5 - Visualize the trajectory in VMD

    vmd -parm7 ala.top -crd alamdpr.trj

If everything went fine, you can now see the trajectory in the display window. Open Graphics > Representations in the VMD main window. The peptide can be shown with several drawing methods (e.g. as lines, ribbons, or cartoon). Choose one from the 'Drawing Method' list; for example, choose Cartoon to see the cartoon representation of the peptide in the display window.
Describe what happens during the trajectory and which conformational changes you can observe.

Stage 6 - RMSD vs. time plot

Now we calculate the root mean square deviation (RMSD) of the Cα atoms along the trajectory with respect to the first frame. We use the ptraj module of the Amber 9 package. The input file rmsd.in contains:

    trajin alamdpr.trj
    rms first @CA out rmsd.dat time 1.0

Run ptraj:

    ptraj ala.top < rmsd.in > rmsd.out

Now plot the RMSD data using xmgrace:

    xmgrace rmsd.dat

The pop-up window shows the plot of RMSD vs. time.

Summary of the steps

1. Generate a start structure using the Leap module
2. Generate the coordinate and topology files
3. Perform an energy minimization
4. Run MD simulations
5. Visualize the trajectory in VMD
6. Calculate the RMSD values using ptraj and plot RMSD vs. time using xmgrace

Additional tasks:

• Record the formation of hydrogen bonds.
• Create a Ramachandran plot.
• Record the end-to-end distance of the peptide.

4 Folding simulation of Chignolin

The 10-residue chignolin peptide [11] is an example of a small peptide that forms a stable hairpin-type three-dimensional (3D) structure in solution (at 300 K, see Figure 1). The aim of this MD simulation study is to simulate the structure formation process starting from a fully extended, unfolded peptide conformation. Note that the outcome can strongly depend on the starting structure (e.g. due to traps in the energy landscape). The simulations are performed with an implicit solvent model (Generalized Born model) as implemented in the Amber simulation package (input parameter igb=1). During the simulations, a Langevin-type equation of motion is solved numerically in small time steps.
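A Langevin equation adds a friction force −γmv and a random force to Newton's equation; the strength of the random force is tied to the temperature (fluctuation–dissipation relation), so the system samples the canonical distribution at temperature T. As an illustration only, the sketch below integrates m dv/dt = F(x) − γmv + R(t) for one particle in a harmonic well with the BAOAB splitting scheme. This is a textbook scheme swapped in for illustration, not necessarily the discretization sander uses (in sander, Langevin dynamics is typically selected with ntt=3 and a collision frequency gamma_ln; check mdpr.in). The function name baoab_langevin, the reduced units, and all parameter values are hypothetical.

```python
import math
import random


def baoab_langevin(n_steps, dt=0.05, gamma=1.0, kT=1.0, mass=1.0, k_spring=1.0, seed=0):
    """Langevin dynamics of one particle in V(x) = 0.5 * k_spring * x**2 (reduced units).

    BAOAB splitting: half kick (B), half drift (A), exact friction + noise
    update (O), half drift (A), half kick (B). Returns the sampled positions.
    """
    rng = random.Random(seed)
    c1 = math.exp(-gamma * dt)                   # velocity damping over one step
    c2 = math.sqrt((1.0 - c1 * c1) * kT / mass)  # matching random-kick amplitude
    x, v = 0.0, 0.0
    force = -k_spring * x
    positions = []
    for _ in range(n_steps):
        v += 0.5 * dt * force / mass           # B: half kick
        x += 0.5 * dt * v                      # A: half drift
        v = c1 * v + c2 * rng.gauss(0.0, 1.0)  # O: friction and random force
        x += 0.5 * dt * v                      # A: half drift
        force = -k_spring * x
        v += 0.5 * dt * force / mass           # B: half kick
        positions.append(x)
    return positions


if __name__ == "__main__":
    xs = baoab_langevin(200_000)[1000:]  # drop a short equilibration phase
    print("<x^2> =", sum(x * x for x in xs) / len(xs), "(expected kT/k = 1)")
```

The random force keeps the particle at the target temperature: the mean-square displacement approaches the equipartition value kT/k, whereas plain Newtonian dynamics started at rest would stay at x = 0.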
Change to the chignolin folder:

    cd ../chignolin

As learned before, use xleap to create the following sequence:

    source leaprc.ff03.r1
    protein = sequence {NGLY TYR ASP PRO GLU THR GLY THR TRP CGLY}

Here we use a set of Born radii that differs slightly from the default, to enhance performance:

    set default pbradii mbondi2

Again, save the topology and coordinates and quit xleap:

    saveamberparm protein chi.top chi.crd
    quit

Minimize the structure using the input file min.in:

    mpiexec -n 4 sander.MPI -O -i min.in -p chi.top -c chi.crd -o chimin.out -r chimin.crd

Now run the short production simulation:

    mpiexec -n 4 sander.MPI -O -i mdpr.in -p chi.top -c chimin.crd -o chimdpr-short.out -r chimdpr-short.crd -x chimdpr-short.trj

This simulation is shorter than the one we will analyze later, but it still gives a good first impression. As it will run for approx. 1 h, you can now go to the lunch break.

5 Analysis of the chignolin results

It is useful to look at the generated trajectory in VMD and to compare the sampled conformations with the starting structure and with the reference conformation (the folded experimental chignolin conformation).

• Prepare a figure showing the superposition of one of the final structures and of the start structure with the experimental peptide structure.
• Prepare RMSD plots with respect to the start structure and with respect to the experimental folded structure. Estimate the fraction of folded conformations during the last 5 ns of simulation time.
• Prepare RMSD plots for the central part of the structure and for the stem part of the hairpin.
• Monitor the formation of key H-bonds in VMD and generate plots with ptraj. Which part forms first? Which part is the most flexible?
• Make a Ramachandran plot of residues 2 to 9 (all residues except the two terminal residues, for which φ or ψ is undefined).
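ptraj performs these analyses in the course. To make the underlying calculations explicit, here is a NumPy sketch of (i) a best-fit RMSD via Kabsch superposition, (ii) a crude folded fraction, i.e. the fraction of frames below an RMSD cutoff that you have to choose yourself, and (iii) the backbone dihedrals for the Ramachandran plot. All function names are hypothetical; coordinates are assumed to be already extracted (e.g. from PDB files written with ambpdb) as (N, 3) arrays in Å. ptraj's rms command may differ in details such as optional mass weighting.

```python
import numpy as np


def kabsch_rmsd(coords, ref):
    """Best-fit RMSD (translation and rotation removed) between two (N, 3) arrays."""
    p = np.asarray(coords, dtype=float)
    q = np.asarray(ref, dtype=float)
    p = p - p.mean(axis=0)                  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def fraction_below(rmsd_values, cutoff):
    """Fraction of frames whose RMSD lies below `cutoff` (a crude 'folded' criterion)."""
    values = np.asarray(rmsd_values, dtype=float)
    return float((values < cutoff).mean())


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b1 /= np.linalg.norm(b1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1  # components perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(backbone, i):
    """phi/psi of residue i; backbone maps residue number -> {'N', 'CA', 'C': xyz}."""
    phi = dihedral(backbone[i - 1]["C"], backbone[i]["N"], backbone[i]["CA"], backbone[i]["C"])
    psi = dihedral(backbone[i]["N"], backbone[i]["CA"], backbone[i]["C"], backbone[i + 1]["N"])
    return phi, psi
```

For the Ramachandran plot, call phi_psi for residues 2 to 9 in every frame and plot ψ against φ; for the folded fraction, compute kabsch_rmsd of each frame against the experimental structure and pass the values from the last 5 ns to fraction_below.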
6 Potential of mean force for ligand-receptor dissociation

[Figure: The WW domain (shown in cartoon and ribbon representation) in complex with a proline-rich ligand peptide (stick representation, on top).]

WW domains are small protein domains of ∼40 amino acids that occur as subdomains of many larger proteins. WW domains can fold as isolated units and bind proline-rich peptides at a specific binding cleft. The aim of this simulation experiment is to induce the dissociation of a proline-rich peptide from the WW domain and to calculate the free energy change along a dissociation pathway. The simulations include the solvent and surrounding ions explicitly. Dissociation is achieved along a reaction coordinate d, defined as the distance between the centers of mass of the WW domain and the bound peptide. It is induced in a series of simulations by adding a penalty potential (umbrella potential) along the reaction coordinate of the following form:

$$ V_{\text{penalty}}(d) = k_{\text{penalty}} \, (d - d_0)^2 \qquad (0.6) $$

The reference distance d_0 can be varied between 9 Å (complexed state) and 22 Å (dissociated state). The force constant k_penalty is set to 1 kcal mol⁻¹ Å⁻², which allows the actual distance d to fluctuate around d_0 on the order of 1 Å. Dissociation is induced by running a set of simulations starting with d_0 = 9 Å, followed by stepwise (1 Å) increases of d_0 up to 22 Å. During the simulations, the actually sampled distance d is recorded. From the distance distributions, the free energy change along the reaction coordinate can finally be calculated with the Weighted Histogram Analysis Method (WHAM) [12].

Setup of the simulations using the xleap program

Open the PDB file of the WW domain in complex with a proline-rich ligand in xleap:

    source leaprc.ff03.r1
    ww = loadpdb ww2.pdb

Using the command

    charge ww

we see that the WW domain still carries a net charge.
This would be problematic, so we add one sodium ion to neutralize the charge:

    addions ww Na+ 1

For this small water box, we only need two Na+ and two Cl- ions to approximate the intracellular ion concentration:

    addions ww Na+ 2 Cl- 2

Create a truncated octahedral box that extends at least 9 Å beyond the protein in every direction and fill it with TIP3P water:

    solvateoct ww TIP3PBOX 9.0

We will use pregenerated files for the later simulation and analysis; here we save the topology, the coordinates, and a PDB file of the solvated system:

    saveamberparm ww ww-water.top ww-water.crd
    savepdb ww ww-water.pdb

The stored topology file and structure contain not only the parameters and coordinates of the protein and peptide but also those of the surrounding water and ions.

Next, an energy minimization and a series of short MD simulations to equilibrate the system are performed. Use the input file mini.in for the minimization:

    mpiexec -n 4 sander.MPI -O -i mini.in -p ww-water.top -c ww-water.crd -o ww-watermin.out -r ww-watermin.crd

During equilibration, the system is heated to 300 K with the protein coordinates restrained to the initial experimental structure. In a second set of simulations, the restraints are gradually removed. For the next steps, we use an already pre-equilibrated system (wws0.crd) and continue directly with the induced dissociation.

Before we can start the induced peptide dissociation, the centers of mass of the WW domain and of the bound peptide need to be defined in a restraint definition file. For this purpose, the MD program needs the numbers of the atoms that form each center of mass. It is best to use the Cα atoms of the peptide residues to define one center of mass and the Cα atoms of the protein to define the second. The format of the restraint definition file will be provided. For the peptide dissociation, a series of 14 MD simulations is performed with 14 different reference distances (d_0).
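To see what Eq. 0.6 implies for these 14 windows, the short sketch below lists the reference distances and estimates the width of the distance distribution in a single window. It assumes the free energy is flat within the window so that only the bias acts; in the real simulations, the slope of the underlying free energy profile shifts and reshapes each distribution. The function names are hypothetical, and k_B ≈ 0.0019872 kcal mol⁻¹ K⁻¹.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def umbrella_penalty(d, d0, k=1.0):
    """Eq. 0.6: V_penalty(d) = k * (d - d0)**2, with k in kcal/(mol A^2) and d in A."""
    return k * (d - d0) ** 2


def bias_only_width(k=1.0, temperature=300.0):
    """Standard deviation of d if only the bias acted (flat free energy in the window).

    exp(-k (d - d0)^2 / kT) is a Gaussian with variance kT / (2 k).
    """
    return math.sqrt(KB * temperature / (2.0 * k))


# Reference distances d0 = 9, 10, ..., 22 A: 14 windows, 1 A apart.
windows = [9.0 + i for i in range(14)]

if __name__ == "__main__":
    print(f"{len(windows)} windows from {windows[0]} to {windows[-1]} A")
    print(f"bias-only standard deviation of d: {bias_only_width():.2f} A")
    print(f"penalty 1 A away from d0: {umbrella_penalty(10.0, 9.0):.1f} kcal/mol")
```

The estimated standard deviation is about 0.55 Å, so neighboring windows 1 Å apart are roughly 1.8 standard deviations apart and their distance distributions overlap, which the later WHAM analysis relies on.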
Each simulation runs for 2000 steps (4 ps), and the resulting trajectories will be used to visualize the dissociation process. For the quantitative evaluation of the results, a much longer (pre-calculated) trajectory will be used. The MD input files for the distance-restrained runs are mdrs1.in to mdrs14.in; they can be started with the script prs.script by entering the following command in the terminal:

    ./prs.script

The file disN.in contains the group definitions and the distance restraint information:

    &rest
      iat=-1, -1, iresid=0, irstyp=0, ifvari=1, ninc=0, imult=0, ir6=0, ifntyp=0,
      r1= 0.000, r2=11.000, r3=11.000, r4=99.000, rk1=1.000, rk2=1.000,
      igr1= 150,166,...,0,
      igr2= 567,581,...,0,

Simulation analysis and potential of mean force for peptide dissociation

Use the VMD program to visualize the dissociation trajectory and to prepare a movie of the process. Which contacts between peptide and protein are disrupted first? Has the conformation of the peptide or the protein changed during dissociation?

Prepare distance distribution curves for each of the 14 MD simulations. Compare the distributions with the reference distance d_0 of each simulation.

Perform a WHAM analysis of the dissociation process to calculate the potential of mean force, i.e. the free energy profile along the reaction coordinate. The WHAM program by A. Grossfield [12] can be used for this purpose with the following command-line arguments:

    wham 8.0 24.0 100 0.001 300.0 4 input.txt pmf.out

    8.0        begin of the histogram range
    24.0       end of the histogram range
    100        number of steps (bins)
    0.001      convergence tolerance
    300.0      temperature
    4          a.e.p.
    input.txt  input file
    pmf.out    output file

The output file first has to be cleaned of invalid lines and columns before it can be analyzed. Do this with the editor of your choice. Tip: Vi/Vim, Emacs, Kate, etc. have a column (block) selection mode, e.g. Kate: Ctrl+Shift+B; Vi/Vim: Ctrl+V.

It is useful to check the RMSD of the protein and of the complex as a function of simulation stage. Can the calculated free energy curve be related to specific dissociation events during the simulation?
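Grossfield's program does the actual WHAM analysis in this course. To show what it computes, here is a minimal NumPy sketch of the self-consistent WHAM equations for umbrella windows of the form of Eq. 0.6. The function name wham_pmf and the synthetic test profile are hypothetical; the histograms are assumed to be prepared beforehand, e.g. with numpy.histogram on the distances recorded in each window. Note the force-constant convention: this sketch uses k(d − d0)² as in Eq. 0.6, while the documentation of Grossfield's program describes the bias as ½k(x − x0)², so a force constant written in the form of Eq. 0.6 would have to be doubled in its input file (check the program's documentation).

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def wham_pmf(bin_centers, counts, d0, k=1.0, temperature=300.0, tol=1e-8, max_iter=100_000):
    """Minimal WHAM: unbiased potential of mean force (kcal/mol) from biased histograms.

    bin_centers: (B,) reaction-coordinate values in A
    counts:      (W, B) histogram counts, one row per umbrella window
    d0:          (W,) reference distances; bias w_i(d) = k (d - d0_i)^2 as in Eq. 0.6
    """
    beta = 1.0 / (KB * temperature)
    x = np.asarray(bin_centers, dtype=float)
    n = np.asarray(counts, dtype=float)
    centers = np.asarray(d0, dtype=float)
    boltz = np.exp(-beta * k * (x[None, :] - centers[:, None]) ** 2)  # exp(-beta w_i(x_b))
    n_per_window = n.sum(axis=1)  # N_i
    n_per_bin = n.sum(axis=0)     # sum over windows of n_i(x_b)
    f = np.zeros(len(centers))    # window free energies f_i
    for _ in range(max_iter):
        # P(x_b) = sum_i n_i(x_b) / sum_i N_i exp(beta (f_i - w_i(x_b)))
        denom = (n_per_window[:, None] * np.exp(beta * f)[:, None] * boltz).sum(axis=0)
        p = n_per_bin / denom
        p /= p.sum()
        # exp(-beta f_i) = sum_b P(x_b) exp(-beta w_i(x_b))
        f_new = -np.log((boltz * p[None, :]).sum(axis=1)) / beta
        f_new -= f_new[0]  # free energies are only defined up to a constant
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    pmf = np.full_like(p, np.inf)  # bins that were never visited get an infinite free energy
    visited = p > 0
    pmf[visited] = -np.log(p[visited]) / beta
    return pmf - pmf[visited].min()
```

The returned profile is shifted so that its minimum is zero; the free energy of dissociation can then be read off as the difference between the plateau at large d and the minimum of the complexed state.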
Important commands

Generate and set up the structures (in xleap)

    load force field parameters                        source leaprc.ff03.r1
    create a sequence                                  protein = sequence {...}
    load a PDB structure                               protein = loadpdb xyz.pdb
    use a different set of Born radii                  set default pbradii mbondi2
    check the charge of the structure                  charge protein
    add ions (neutralize or mimic a cellular           addions protein ion amount
      environment)
    create a box and solvate it with water             solvateoct protein TIP3PBOX 9.0
    save the topology and coordinates                  saveamberparm protein xyz.top xyz.crd
    save as PDB                                        savepdb protein xyz.pdb
    close xleap                                        quit

Visualize a structure

    normal PDB file                                    vmd -pdb xyz.pdb
    implicit solvent simulation                        vmd -parm7 xyz.top -crd xyz.crd (or xyz.trj)
    explicit solvent simulation                        vmd -parm7 xyz.top -crdbox xyz.trj

Generate a PDB file from .top and .crd

    ambpdb -p xyz.top < xyz.crd > xyz.pdb

Run a simulation

    minimization:
        sander -O -i INPUT -p TOPOLOGY -c COORDINATES -o OUTPUT_LOG -r NEW_COORDINATES
    other runs:
        sander -O -i INPUT -p TOPOLOGY -c COORDINATES -o OUTPUT_LOG -r NEW_COORDINATES -x NEW_TRAJECTORY
    for long simulations, use "mpiexec -n 4 sander.MPI ..." instead of "sander"

Analysis (ptraj input files)

    read in the trajectory                             trajin trajectory
    calculate distances / H-bonds                      distance dist :residue@atom :residue@atom out filename.dat time x.y
    calculate the RMSD                                 rms reference_structure :residue@atom out filename.dat time x.y
    run ptraj on an input file                         ptraj xyz.top < infile > output.log

Generate plots of the calculated data

    xmgrace file1 file2 ...

Bibliography

[1] D. A. Case, T. A. Darden, T. E. Cheatham III, C. L. Simmerling, J. Wang, R. E. Duke, R. Luo, M. Crowley, R. C. Walker, W. Zhang, K. M. Merz, B. Wang, S. Hayik, A. Roitberg, G. Seabra, I. Kolossváry, K. F. Wong, F. Paesani, J. Vaníček, X. Wu, S. R. Brozell, T. Steinbrecher, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui, D. H. Mathews, M. G. Seetin, C. Sagui, V. Babin, and P. A. Kollman. Amber 9. University of California, San Francisco.
[2] S. J. Weiner, P. A. Kollman, D. A. Case, U. Singh, C. Ghio, G. Alagona, S. Profeta Jr., and P. Weiner. A new force field for molecular mechanical simulation of nucleic acids and proteins. JACS, 106:765–784, 1984.
[3] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J Comp Chem, 4(2):187–217, 1983.
[4] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. JACS, 118:11225–11236, 1996.
[5] W. F. van Gunsteren, S. R. Billeter, A. A. Eising, P. H. Hünenberger, P. Krüger, A. E. Mark, W. R. P. Scott, and I. G. Tironi. Biomolecular Simulation: The GROMOS96 Manual and User Guide. Vdf Hochschulverlag AG an der ETH Zürich, Zürich, Switzerland, 1996.
[6] R. E. Gillilan and K. R. Wilson. Shadowing, rare events, and rubber bands. A variational Verlet algorithm for molecular dynamics. JCP, 97:1757–1772, 1992.
[7] W. F. van Gunsteren and H. J. C. Berendsen. A leap-frog algorithm for stochastic dynamics. Mol. Sim., 1:173–185, 1988.
[8] B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije. LINCS: A linear constraint solver for molecular simulations. J Comp Chem, 18:1463–1472, 1997.
[9] V. Kräutler, W. F. van Gunsteren, and P. H. Hünenberger. A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations. J Comp Chem, 22:501–508, 2001.
[10] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, and J. Hermans. Interaction model for water in relation to protein hydration. Pages 331–342. D. Reidel Publishing Company, Dordrecht, The Netherlands, 1981.
[11] S. Honda, K. Yamasaki, Y. Sawada, and H. Morii. 10 residue folded peptide designed by segment statistics. Structure, 12(8):1507–1518, Aug 2004.
[12] http://membrane.urmc.rochester.edu/Software/WHAM/WHAM.html