Analyzing a Simulation of Met-Enkephalin

This is a short piece of documentation designed to supplement a series of scripts that are relevant to analyzing the results of a Molecular Dynamics Simulation. Hopefully, you found this document in a directory containing those scripts.

If not, you can download and unzip the directory from the following link: tutorial2.tgz

In the top level directory you should find the following:

metp-wat8.top: This is an AMBER "prmtop" file for the protected pentapeptide met-enkephalin solvated in water

metp8.top: This is an AMBER "prmtop" file for the protected pentapeptide met-enkephalin NOT solvated in water

min.rst: This is an AMBER "restart" file containing the coordinates of a linear structure of met-enkephalin that looks like this:

master.bash: This is a shell script which can be used to sequentially execute all of the commands contained in this exercise.

You should understand, however, that by simply doing this, you could miss a lot of the detail of what is being done. You are therefore encouraged to look a little more deeply and follow what is written here. As is the case for master.bash, all of the scripts associated with this exercise are named with the extension bash (e.g. file.bash). They each have numerous comments written inside and you are encouraged to read them.

clean.bash is a script for removing the files produced in the exercise should you wish to start fresh.

In addition to the above files there are several subdirectories that shall be addressed in the following order:

Contents

A) Preparation (prep)

B) Principal Component Analysis (pca)

C) Extracting the Total Energies (etot)

D) Evaluating the Energies Using an Implicit Scoring Function (imin)

E) Clustering Analysis (clust)

Before getting started there are a few comments:

This tutorial is set up to run on a unix machine using the bash where the programs Amber8, MMTSB, Grace, and VMD are installed and available to all users. It follows on in a way from a more basic tutorial that can be found at Tutorial.htm. If you have difficulty with some of the aspects in the current tutorial, it may be a good idea to take at look at the previous one. Another handy link is http://www.google.com.

As mentioned above, everything is driven by shell scripts with extensive comments. If you wish to only execute a subset of the commands in any one script, you simply add the # symbol to the beginning of any line you don't wish to execute. Otherwise, most of the commands can be simply given to the shell by hand.

The commands given here are to the bash, written in italics and are preceded by ">"

A) Preparation

Assuming you are starting in the top level directory:

>cd prep

In this directory you will find a large file (>1Gb) called mdp1.crd.000. This is an AMBER trajectory file containing 25000 snapshots. It actually corresponds to the ensemble at the lowest temperature of a Replica Exchange Molecular Dynamics run using explicit solvent. At this point, however, that is unimportant in that the following procedures are applicable to any trajectory.

Other than that you should find a shell script called prep.bash and several inputs to the AMBER post-processing program "ptraj", all with the extension ptrj (e.g. file.ptrj).

The file prep.bash is a script that can be used to execute the various steps below.

**********

The 1st input "strip.ptrj" reads in snapshots contained in mdp1.crd.000, strips out (deletes) all water molecules and writes out the remaining coordinates to a binary "binpos" file (strip.bps).

strip.ptrj looks like this:

trajin mdp1.crd.000

strip :WAT

trajout strip.bps binpos nobox

The command:

>ptraj ../metp-wat8.top strip.ptrj

Will execute the ptraj commands contained in strip.ptrj using the topology contained in the appropriate file (metp-wat8.top). Because this command is working on a >1Gb ASCII file, it will take several minutes to complete.

**********

The 2nd input (allign-min.ptrj) alligns the resulting trajectory (strip.bps) to the starting conformation (min.rst) based on the positions of the alpha carbon atoms, outputs a new trajectory (almin.bps) as well as the (Calpha) RMS deviation of each structure from that conformation (rms-min.dat) and the corresponding average structure (av.pdb)

allign-min.ptrj looks like this:

reference ../min.rst

trajin strip.bps

trajout almin.bps binpos nobox

rms reference out rms-min.dat @CA

average av.pdb * pdb nobox

And the command:

>ptraj ../metp8.top allign-min.ptrj

will execute it accordingly. Note that we use metp8.top in place of metp-wat8.top as our input trajectories (min.rst & strip.bps) do not contain any water. Because of the lack of the water and the binary format of the trajectory, this will execute significantly faster than the previous command.

You can graph the RMS deviations (rms-min.dat) in a program such a Grace. It should look something like this:

/home/dsmith/metp-small/rms1.agr

You can see that most of the time, the peptide is in a conformation quite different to that of the starting structure although it often returns to similar structures folding and unfolding many times during in the simulation.

The structural average of all these snapshots (av.pdb) looks something like this:

We should not expect this to look like a physical structure and indeed it doesn't. If you look closely the aromatic rings of the PHE and TYR sidechains are squashed and the MET sidechain is also squashed. This is because the average structure represents the average positions of the atoms throughout a dynamical simulation.

**********

The 3rd input (allign-av.ptrj) realligns the trajectory (almin.bps) to the average structure (av.pdb), outputs the alligned trajectory (alav.bps) and computes the resulting RMS deviations (rms-av.dat)

allign-av.ptrj looks like this:

reference av.pdb

trajin almin.bps

trajout alav.bps binpos nobox

rms reference out rms-av.dat @CA

And the command:

>ptraj ../metp8.top allign-av.ptrj

will execute it accordingly.

If we add the resulting RMS data to our graph, we see that the deviation from the average is much less than from the reference but, considering the definition of average, this is hardly surprising.

/home/dsmith/metp-small/rms1.agr

For various reasons, we will use the trajectory that is aligned to the average structure in our subsequent analysis.

**********

The 4th input (cut.ptrj) simply cuts down our trajectory from 25000 frames (alav.bps) to 1000 frames (alav-sm.bps) by ignoring the first 5000 (equilibration) and then taking every 20th frame.

cut.ptrj looks like this:

trajin alav.bps 5001 25000 20

trajout alav-sm.bps binpos nobox

And the command:

>ptraj ../metp8.top cut.ptrj

will execute it accordingly.

We might ordinarily want to operate with say 20000 frames but for the purpose of speed in this exercise, we will use only 1000.

**********

The 5th input (conv.ptrj) converts our short binary trajectory (alav-sm.bps) to a short ascii one (alav-sm.crd) that we will need later.

conv.ptrj looks like this:

trajin alav-sm.bps

trajout alav-sm.crd trajectory nobox

And the command:

>ptraj ../metp8.top conv.ptrj

will execute it accordingly.

If you are interested, you could try and load the resulting alav-sm.crd into VMD and view the 1000 structures that we just extracted.

This is left as an exercise.

**********

Finally, we would like to cut down our rms graphs (rms-av.dat & rms-min.dat) so that they correspond to the 1000 frames that we will use later. This could be done by realigning the short trajectories and recalculating the RMSDs but here we will just use the "hack":

 

>sed -n '4981~20p' rms-av.dat | awk '{print $2}' > rms-av-sm.dat

>sed -n '4981~20p' rms-min.dat | awk '{print $2}' > rms-min-sm.dat

**********

Contents

B) Principal Component Analysis.

After our excursion into basic trajectory manipulation with ptraj, we will now look at some other features of this very useful program, namely principal component analysis.

Without going back to the top level directory you should now change into a subdirectory of prep called pca.

>cd pca

Here you should find a script pca.bash and two ptraj inputs.

The first input pca.ptrj looks like:

trajin ../alav.bps

matrix covar name covmat out covmat-ca.dat @CA

analyze matrix covmat out evecs-ca.dat vecs 2

And can be executed by:

>ptraj ../../metp8.top pca.ptrj

This input reads in the 25000-frame binary trajectory that we aligned to the average structure above (alav.bps). Our aim is to calculate the first two principal components of our dataset. We will later project our 1000 structures onto these components but we wish to evaluate the components themselves with the full 25000 frames. Now, the first two principal components correspond to the first two eigenvectors of the covariance matrix and that is how we will evaluate them. This means that we must first construct the covariance matrix and that is precisely the purpose of the first line of the above input. Following that, we need to determine the first two eigenvectors of this matrix and output them to a file (evecs-ca.dat). This is the function of the second line of input above. Note that we are again using only the alpha carbon atoms.

The second input proj.ptrj looks like:

trajin ../alav.bps

projection modes evecs-ca.dat out pca12-ca beg 1 end 2 @CA

And can be executed by:

>ptraj ../../metp8.top pca.ptrj

This simply calculates the projection of each structure onto the 1st and 2nd principal components and outputs it to a file (pca12-ca). To make this file easier to plot, we will do a little more hacking:

>sed '1,2d' pca12-ca | awk '{print $2 "\t" $3}' > pca12-ca.dat

>rm covmat-ca.dat

>rm pca12-ca

We can now plot the projections contained in pca12-ca.dat to gain a 2d overview of our 25000 structures.

/home/dsmith/metp-small/pca.agr

**********

 

We now wish to calculate the projections of our 1000 structures onto these principal components. We could hack this with sed but, as this is a task one may need to do in a different context, we will do it with ptraj to illustrate how.

The relevant files are in a different directory.

>cd ../../pca

Here you should find a ptraj input called projs.ptrj which looks like:

trajin ../prep/alav-sm.bps

projection modes ../prep/pca/evecs-ca.dat out pca12-cas beg 1 end 2 @CA

This should be more or less self-explanatory. After a little bit of hacking:

>sed '1,2d' pca12-cas | awk '{print $2 "\t" $3}' > pca12-cas.dat

>rm pca12-cas

We can plot the projections of our 1000 structures (pca12-cas.dat):

/home/dsmith/metp-small/pcas.agr

You can see that the 1000 structure "sample" gives a reasonable approximation to the entire dataset.

Contents

 

C) Extracting the Total Energies

We might like to know the potential energies of our 1000 structures. In order to extract them from the output file of our molecular dynamics run we need to do just a little bit of hacking.

>cd ../etot

In this directory you will find a file called mdp1.inf.000. This is the output corresponding to our trajectory file mdp1.crd.000.

In order to extract the correct 1000 energies, we simply need to:

>grep EPtot mdp1.inf.000 | awk '{print $9}' | sed -n '4977~20p' > eptot.dat

You could also just execute the script pegen.bash and achieve the same result. The comments in this script might be of interest.

Contents

 

D) Evaluating the Energies Using an Implicit Scoring Function

In this exercise we shall calculate the energies of our 1000 structures (obtained from dynamcis with explicit solvent) with an Implicit (generalized Born) treatment of solvation.

>cd ../imin

In this directory you should find a file called min.in which will basically do this for us. This is a normal input for an AMBER minimization except for imin=5, which means that the trajectory will be read in from a file (in this case the ASCII version of the short trajectory that we produced earlier, alav-sm.crd). Execution of this task can be achieved with:

>$AMBERHOME/exe/sander -O -i min.in -o min.out -p ../metp8.top -x ../prep/alav-sm.crd -c ../min.rst

This will take a couple of minutes. In order to extract the energy and its components for later use we can:

>grep ESURF min.out | awk '{print $3}' | sed 'n;d;' > esurf.dat

>grep VDWAALS min.out | awk '{print $3}' | sed 'n;d;' > vdw.dat

>grep VDWAALS min.out | awk '{print $6}' | sed 'n;d;' > eel.dat

>grep VDWAALS min.out | awk '{print $9}' | sed 'n;d;' > egb.dat

>grep ENE= min.out | awk '{print $4}' > eim.dat

Alternatively, you can execute the imin5.bash script that is present in the same directory.

esurf.dat will thus contain the surface energies for each of the 1000 structures.

vdw.dat will contain the Van der Waals energies

eel.dat will contain the electrostatic component of the energies

egb.dat will contain the generalized Born (GB) solvation energies

eim.dat will contain the total energy using the implicit description of the solvent

Contents

 

E) Clustering analysis

We now arrive to the main point of what we are doing. In short, we are going to cluster the 1000 structures in order to gain a better idea of what our simulation is telling us. We would then like to sort some of the properties we have been calculating by cluster to see if we can observe some interesting trends.

>cd ../clust

In this directory you will find quite a few files. The most important is anal.bash and its execution will carry out all of the steps below. For some tasks, the easiest way to carry them out is actually to comment out (use #) the unwanted steps and execute various pieces of the script independently. Nevertheless, you should find all the information you need to follow this exercise below. You are also encouraged to read the comments in the script anal.bash.

For the clustering, we will use the MMTSB toolkit. You are also encouraged to find the online documentation of this very useful set of tools.

The MMTSB toolkit needs pdb files as input. Our first step is thus to create them with ptraj using mpdbs.ptrj.

>mkdir pdb

>ptraj ../metp8.top mpdbs.ptrj

has the effect of creating 1000 pbd files in the directory pdb. The names (numbering) of these pdb files need to be fixed using a shell script:

>./fix_numbering_pdb.csh

We will also need a file containing a list of the files

>cd pdb

>ls . > ../clustfils

**********

Now we are in a position to actually perform the clustering.

>kclust -mode rmsd -ca -centroid -cdist -radius 2.0 -lsqfit -iterate ../clustfils > ../Centroids

This command uses the fixed radius clustering with a radius of 2.0 Angstrom taking into account only the coordinates of the alpha carbon atoms. For more information, see the MMTSB documentation. It should only take a minute or so to run.

>cd ..

You should now see a file called Centroids.

This file contains the coordinates of the centroid of each cluster as well as a list of the pdb files that constitute each cluster. To make it a bit easier to deal with, we will run an awk script called extract_centroids.awk.

>awk -f extract_centroids.awk Centroids > Centroids.stats

This will generate a few extra files. The direct output, Centroids.stats looks like:

1 #Centroid 410

2 #Centroid 284

3 #Centroid 306

and tells us that, with a radius of 2.0 Ang, we separated our 1000 snapshots into 3 clusters containing 410, 284, and 306 structures, respectively. We also obtain pdb files of the structures of the centroids themselves (centroid01.pdb, centroid02.pdb, & centroid03.pdb):

As with the average structure, these are not actual physical structures but averages of those structures present in each cluster. Recall that we used only the coordinates of the alpha carbon atoms for the clustering so it is only really the backbone conformation that we are interested in. You can see that the centroid of cluster 1 is quite helical, that of cluster 2 more extended while cluster 3 seems to represent some form of a turn.

You should also see three files that contain the members of each cluster: centroid01.member.dat, centroid02.member.dat, & centroid03.member.dat.

As the centroids themselves are not actually physical structures, it can be quite helpful to find the actual physical structure whose backbone structure most closely matches the centroid.

We could do this manually but we will automate it here. You can either, comment out everything except lines 49-70 in anal.bash and execute it, or you can copy the following text into a file (e.g. name.bash), make it executable (chmod +x name.bash) and run it (./name.bash).

#!/bin/bash

rm -rf temp

mkdir -p dat

i=1

while [ $i -le 3 ]; do

cat centroid0$i.member.dat >> temp

awk '{print $2}' centroid0$i.member.dat > dat/rmsc$i.dat

j=`sort -n -k2 centroid0$i.member.dat | head -1 | awk '{print $1}'`

hello=`head -$j clustfils | tail -1`

cp pdb/$hello ./best$i-$j.pdb

let i=i+1

done

sort -n <temp> centroidall.dat

rm -rf temp

While doing this we created a directory called dat and placed 3 files there (rmsc1.dat, rmsc2.dat, & rmsc3.dat). These files contain the deviations of each structure from its respective cluster centroid. We will investigate them shortly.

We should also have 3 new files in the current directory called best1-668.pdb, best2-612.pdb, & best3-472.pdb. The names simply reflects that the structure closest to the centroid of cluster 1 was structure number 668 etc. These files look like:

Again, recall that the clustering is done on the basis of the alpha carbon positioning and so the backbone is the most important thing here. Basically, these structures are a bit more meaningful than the centroids themselves as they correspond to actual configurations. Nevertheless, it is important to know how representative they are of each cluster. There are a few ways to do this. We will look at the RMS deviations of each member of each cluster from the so-called best cluster and compare them to the deviations from the centroids themselves (rmsc1.dat, rmsc2.dat, & rmsc3.dat, from above).

We still need to generate the former plots. While we are at it we will make trajectories of each cluster for later visualization or analysis. Again we will use our friend the shell to do this. You can either comment out all of anal.bash except for lines 77-96 and execute it or use the following text in a shell script of your own:

#!/bin/bash

mkdir -p dat

while [ $i -le 3 ]; do

lin=`wc -l centroid0$i.member.dat | awk '{print $1}'`

k=`expr $lin + 1`

rm -rf traj0$i

echo reference `ls best$i*` > traj0$i

j=1

while [ $j -lt $k ]; do

num=`head -$j centroid0$i.member.dat | tail -1 | awk '{print $1}'`

echo trajin pdb/`head -$num clustfils | tail -1` >> traj0$i

let j=j+1

done

echo trajout clust0$i.crd trajectory nobox >> traj0$i

echo rms reference out dat/rmsb$i.dat @CA >> traj0$i

ptraj ../metp8.top traj0$i

rm traj0$i

let i=i+1

done

You should now have 3 more files in the dat directory: rmsb1.dat, rmsb2.dat, & rmsb3.dat. These contain the deviations from the "best" structures that we wanted. We are now in a position to compare them to the deviations from the centroids themselves (rmsc1.dat, rmsc2.dat, & rmsc3.dat)

/home/dsmith/metp-small/rms-cent.agr

You can see that, although the deviations from the "best" structures (lower graph) are larger than from the centroids (upper graph), they follow basically the same pattern. These graphs also show that cluster 1 is significantly tighter than cluster 2 (is spread much less in RMSD space) while cluster 3 is somewhere in between.

The other files that we produced with the last script are called: clust01.crd, clust02.crd, & clust03.crd. These are AMBER trajectory files of the individual clusters and you are encouraged to visualize them with VMD to see for yourself how representative each centroid (or the actual structure closest to it) is of each cluster as a whole.

**********

In the next phase, we will make use of the "ensemble computing" cababilities of the MMTSB. Once again you are encouraged to look at the MMTSB documentation to see what else you can do with these tools.

We first need to check in our structures:

 

>cd pdb

>checkin.pl -dir ens -f ../clustfils metp

>cd ..

This can take a few minutes. The toolkit does have an "ensemble clustering" facility that we could have used. Unfortunately, it does not support the outputting of centroids or deviations from them. In addition, it does not support RMSD-based clustering with atom selections other than alpha AND beta carbons. However, once the "ensemble" knows about the cluster there are some neat things that it can do. Therefore we will trick it with an inefficient but successful solution. We simply execute the script enscl.bash and move the resulting output file to the ensemble directory.

>./enscl.bash

>mv metp.cluster pdb/ens

This can also take a couple of minutes.

Now we are in a position enter the various properties that we have calculated into the "ensemble" facility.

This can be done quite simply, either by selectively executing lines 124-134 of anal.bash or by sequentially executing the following commands, from the prompt or in a script of your own:

>setprop.pl -dir pdb/ens -f centroidall.dat -inx 2 metp rmsc

>setprop.pl -dir pdb/ens -f ../prep/rms-av-sm.dat -inx 1 metp rmsa

>setprop.pl -dir pdb/ens -f ../prep/rms-min-sm.dat -inx 1 metp rmsm

>setprop.pl -dir pdb/ens -f ../etot/eptot.dat -inx 1 metp eptot

>setprop.pl -dir pdb/ens -f ../imin/eim.dat -inx 1 metp eim

>setprop.pl -dir pdb/ens -f ../imin/esurf.dat -inx 1 metp esurf

>setprop.pl -dir pdb/ens -f ../imin/vdw.dat -inx 1 metp vdw

>setprop.pl -dir pdb/ens -f ../imin/egb.dat -inx 1 metp egb

>setprop.pl -dir pdb/ens -f ../imin/eel.dat -inx 1 metp eel

>setprop.pl -dir pdb/ens -f ../pca/pca12-cas.dat -inx 1 metp pca1

>setprop.pl -dir pdb/ens -f ../pca/pca12-cas.dat -inx 2 metp pca2

These commands should be self-explanatory and you can look at earlier parts of the exercise to remind yourself of the contents of the various files.

We will now use one of the handy ensemble tools to identify the lowest energy structure in our ensemble. Just to see the operation of this tool you can try:

>cd pdb/ens

>getprop.pl -prop eptot metp | sort +1n | head

This should give you a list of the 10 lowest energies along with their structure number. You will notice that structure number 767 has the lowest energy. We might want to view this to see what it looks like. This should be straightforward to do manually but we will go a little bit beyond that and align all of the structures to this lowest energy one and calculate the RMSDs. You can execute lines 141-153 of anal.bash or use the following text in a script of your own:

#!/bin/bash

best=`getprop.pl -prop eptot metp | sort +1n | head -1 | awk '{print $1}'`

bestf=`head -$best ../../clustfils | tail -1`

cp ../$bestf ../../low-$best.pdb

cd ../..

rm -rf traj

echo reference low-$best.pdb >> traj

echo trajin ../prep/alav-sm.bps >> traj

echo trajout ../prep/allow-sm.bps binpos nobox >> traj

echo rms reference out ../prep/rms-low-sm.dat @CA >> traj

ptraj ../metp8.top traj

rm -rf traj

setprop.pl -dir pdb/ens -f ../prep/rms-low-sm.dat -inx 2 metp rmsl

This will place a file called low-767.pdb in the clust directory of the exercise. It looks like:

It will also place the trajectory (allow-sm.bps) aligned to this structure and the corresponding RMS deviations (rms-low-sm.dat) in the prep directory. The final line enters these RMS deviations into the ensemble with the tag "rmsl"

**********

Now we finally get to the point of all this. With the data entered in this way we can very quickly produce plots of the properties sorted by cluster. A good example of this is the principal components. To do this one can type:

>getprop.pl -prop pca1,pca2 -cluster t.1 metp | awk '{print $2 "\t" $3}' > ../../dat/pca-1.dat

>getprop.pl -prop pca1,pca2 -cluster t.2 metp | awk '{print $2 "\t" $3}' > ../../dat/pca-2.dat

>getprop.pl -prop pca1,pca2 -cluster t.3 metp | awk '{print $2 "\t" $3}' > ../../dat/pca-3.dat

which should be relatively self explanatory. If we were to plot the files pca-1.dat, pca-2.dat, & pca3.dat, after a bit of fiddling, we would see the following:

/home/dsmith/metp-small/pcac.agr

In my opinion this gives a very concise overview of the data in 2d. You can see that cluster 1 is a bit "tighter" than cluster 3 which, in turn, is a bit "tighter" than cluster 2. You can see that, in this case, a projection onto 2 dimensions is able to provide a reasonable representation of the multidimensional clustering analysis with virtually no overlap of the clusters visible in the plot. It also gives you a good visual guide as to the relative populations of the clusters. All this in one 2d graph with 3 colors. The significance of the plot might also be better appreciated after you have viewed the trajectories themselves.

Obviously one can also see from this data that the spread of the clusters is not uniform and there are clearly features of the structural ensemble that go beyond a simple 3-cluster approach. If you wish to investigate those features you can try various options such as a smaller cluster radius or re-clustering the clusters. While that is beyond the scope of this exercise, you should be able to adapt the approach presented herein to carry out such tasks.

Of course, the PCA is not the only data that we might want to look at. Fortunately, we can now generate properties sorted by cluster with relative ease. If, for example, you were to execute lines 159-177 of anal.bash or the following text in a script:

#!/bin/bash

i=1

while [ $i -le 3 ]; do

getprop.pl -prop eptot -cluster t.$i metp > ../../dat/eptot$i.dat

getprop.pl -prop eim -cluster t.$i metp > ../../dat/eim$i.dat

getprop.pl -prop rmsl -cluster t.$i metp > ../../dat/rmsl$i.dat

getprop.pl -prop rmsm -cluster t.$i metp > ../../dat/rmsm$i.dat

getprop.pl -prop pca1,pca2 -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/pca-$i.dat

getprop.pl -prop rmsl,eptot -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/rmsl-eptot$i.dat

getprop.pl -prop rmsl,eim -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/rmsl-eim$i.dat

getprop.pl -prop pca1,eptot -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/pca1-eptot$i.dat

getprop.pl -prop pca1,eim -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/pca1-eim$i.dat

getprop.pl -prop pca1,esurf -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/esurf$i.dat

getprop.pl -prop pca1,vdw -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/vdw$i.dat

getprop.pl -prop pca1,egb -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/egb$i.dat

getprop.pl -prop pca1,eel -cluster t.$i metp | awk '{print $2 "\t" $3}' > ../../dat/eel$i.dat

let i=i+1

done

You would get many different files in the clust/dat directory. These are just some examples that are possible and you are encouraged to understand the procedure and try your own combinations.

**********

The following discussion will attempt to illustrate the utility of this approach. The graphs shown below give visual representations of some of the data we produced above. Investigation of the remainder of the data is left as an exercise.

/home/dsmith/metp-small/first.agr

The first graph (upper left) is simply the PCA1 vs PCA2 plot again.

The second graph (upper middle) is the explicit potential energies against the RMSD from the lowest energy structure (low-767.pdb). The first thing to notice is that the lowest energy structure (i.e. the one with zero deviation from low-767.pdb or the very first on the left) is a member of cluster 2 (extended). It is not therefore surprising that the other structures that have low deviations from this configuration are also members of cluster 2. Otherwise, the potential energies in different clusters are all spanning much the same range and so there is nothing particularly remarkable about this graph.

We can quantify this with averages if we wish. e.g.:

>bestcluster.pl -prop etot -crit avglow metp (you may also omit the "avglow" and see what happens)

In other types of folding simulations, such a plot (E vs RMSD) can be very revealing as to which clusters are lower in energy than others and whether or not structures close to the lowest energy (or indeed native) one are also low in energy.

The third graph (upper right) shows the implicit potential energies against the same RMSD from the lowest (explicit) structure. On this occasion, one can see a clear preference for cluster 1 (helical) structures, followed by cluster 2 (turn) structures with finally the cluster 3 (extended) structures. You can convince yourself of this with averages by simply substituting "eim" for "etot" in the above command(s). This is an interesting result, suggesting that the implicit and explicit solvation treatments don't agree as to the relative energies of met-enkephalin structures. Indeed, the implicit model seems to over stabilize helices at the expense of extended structures. We can investigate this a little further by looking at the energy components.

The fourth and fifth graphs (centre left and centre middle) are simply the same energies (explicit and implicit) plotted against the projection onto the 1st principal component, as this provides better separation than the RMSD plot (at least that with respect to the minimum energy structure, you could try the different references we used if you like).

The sixth graph (centre right) shows the electrostatic part of the implicit solvation energy (generalized Born). Before seeing this plot, one might have been tempted to speculate that it was this component that was responsible for the over stabilization of helical structures by the implicit energy function. Obviously the reverse is true here.

The seventh graph (bottom left) shows the non-electrostatic part of the implicit solvation energy (which is proportional to the solvent-accessible surface area). This term does indeed show a preference for helical structures over extended ones but the magnitude of this term is too small even to overcome the opposite tendency in the electrostatic component (GB) of the solvation energy, let alone to have a significant influence on the overall implicit energy.

On the other hand, the eighth and ninth graphs (bottom middle and bottom right) showing internal electrostatic and Van der Waals energies do look like good candidates to explain the trend (at first glance). However, we must remember that as we used the same force-field to evaluate the explicit and implicit solvation energies, these terms are practically the same in both approaches. To be more specific, the EEL and VDW components of the intra-met-enkephalin part of the energy in the explicit simulations are virtually the same as in the implicit (except for the use of a cutoff in the explicit simulations). It is, therefore, unlikely that the behavior of these components can explain the overall trend.

The logical conclusion (a.k.a the Sherlock Holmes logic) is that what remains must be the answer. That is, the compensation in the explicit calculations for the clear inherent preference of helical conformations by the force field is only overcome by specific interactions between the solute and the solvent in those same explicit simulations.

That is just one example of how one might go about analyzing the results generated in a molecular dynamics simulation. There are, of course, many others and you are encouraged to read the literature and explore the web to learn more about it.

OK, that's all for now. Try a bit on your own and see where you get.....

Contents