Instructions for Running Structure
Calculations
Written by:
Melanie Nelson and Patty Fagan Jones
This document contains some basic instructions for running NMR
structure calculations using DIANA and AMBER. It was written based
on advice from Lena Maler.
You will need approximately 200 distance constraints to make a first
attempt at a structure calculation. There are two important types of
NOEs to identify at this stage: medium range NOEs (which will help
define the secondary structure) and long range NOEs (which will
"fold up" the protein). However, in the process of calibrating the
bin sizes for the distance constraints and identifying the medium
range and long range NOEs, you will also identify many sequential
NOEs, and these should also be included in the first restraint
list.
Before you can use any NOEs from a particular NOESY in your
restraint list, you must calibrate the bin sizes.
I began by defining three bins for my distance restraints, with upper bounds of
3.5, 4.5, and 6.0 angstroms. Later in the structure
calculation process, it might be desirable to define four bins, and
to fine tune the bounds. Here is one procedure for calibrating the
bin sizes:
- Identify approximately 10 dNN (i to i+1) NOEs in regions
known to be helical (from the chemical shift index or HNHA data).
Use the volumes of these peaks to determine the smallest volume
that will be put into the tightest restraint bin (in my case, 3.5
angstroms).
- Identify approximately 10 daN (i to i+3) NOEs in regions
known to be helical. Use the volumes of these peaks to determine the
smallest volume that will be put into the middle restraint bin (e.g.
4.5 angstroms).
- All NOEs with volumes less than the cutoff identified in the
previous step will be put into the loosest restraint bin (in this
case, 6.0 angstroms).
- Check that these bins are OK once you have gone through and
found as many of the sequenctial and i to i+3 NOEs as can be
unambiguously identified. Some sequential NOEs may fall in the
medium bin, and a few i to i+3 NOEs may fall in the tight bin, but
you do not want too many of these exceptions.
I used the intra-residue distances between HD and HE in phenylalanines
to calibrate the tight bin for the D2O
NOESY. The measured distance between HD and HE is 2.5 angstroms. It was necessary to divide the measured volume by two for the
calibration, since there are two HDs and two HEs on a phenylalanine residue,
and both would be contributing to the NOE. It is also
necessary to be conservative in assigning bins from these distances,
since spin diffusion will certainly contribute to the volumes of
these intra-ring NOEs.
Once you have the bins calibrated, you are ready to begin to
build up a restraint list. If you have a good knowledge of the
secondary structure of your protein, either from chemical shift
indexing, experiments to measure torsion angles (such as the HNHA
experiment), or other considerations, a reasonable next step is to
systematically search for the NOEs that define the secondary
structure, such as the i to i+3 (HA to HN) NOEs for the helices. See
chapter 7 of NMR of Proteins and Nucleic Acids, by Wuthrich
for a useful table of short inter-proton distances in standard
secondary structures.
It doesn't really matter how you set up your structure calculation
directories. However, the instructions in this document will assume
the following set up:
Make the following directories under one main directory (called
struct_calc here):
- diana
Make directories under this directory with date-based names for each
structure calculation run. For instance, calculations run on May
12, 1999 would go in a directory called 990512. This
helps keep the output from the calculations straight.
- amber
Make date-based directories under this directory for the actual
AMBER runs. Also make:
- leap
This directory holds the pdb files as they are being converted
to AMBER xyz format. You need the following scripts/programs
and files in this directory:
- rst
This directory is where you convert the restraints from DIANA
format to AMBER format. You need the following scripts/programs and
files in this directory:
- dodomd (this script will
create date-based named directories with the RST files in them)
- domdprep (be sure to
edit the dodomd script to call this program correctly)
- noeconv (part of the GAP package of structure analysis
programs)
- diana2habas
- a sample PDB file that exactly matches the AMBER xyz
files. There is more information about how to create this PDB file
under the Converting from DIANA to AMBER section.
You will also need a directory with all of the MD scripts, etc.
The various scripts we currently use will assume that this directory
is called ~username/bin/md. You will have to edit the scripts to
make sure they have the correct location of this directory. Here is
a list of the contents of the
MD directory.
You will need the following files in order to run DIANA:
- The command file (the file that actually controls the DIANA
run). It is usually named basename.com,
where basename is whatever name you are using to identify your structures (for
instance, f36g or crcn).
Some points about the command file:
- Be sure to change the first line so that the cd
command has the correct directory
- The example script is generating 50 structures. Early in
the calculations, you will want to generate about this many
structures, because you may lose several during AMBER calculations,
as discussed below. DIANA runs fairly quickly, so I continued
to generate 50 structures in DIANA as my calculations progressed,
but then only took the best structures to AMBER calculations.
- The seed is the seed for the random number generator
(random numbers are used to help the program sample conformational
space). I never changed this value, but in principle you
could.
- The cutoffs determine which structures are included in the
restraint violation output. The only way to determine what these
should be is empirically.
- The sample script runs DIANA four times, using the results
from the previous run to start the next run. You are interested in
the coordinates generated by the last run (the .cor
files).
- The distance restraint upper bounds file. It is usually named
basename.upl.
See below for procedure to
create this file from Felix97. This distance restraint list must be in a
defined format, and must use the correct atom names for DIANA.
Errors in atomnames are the most common
reason for DIANA jobs to fail. Check the DIANA manual for the
correct atom names for each residue.
- The distance lower bounds file. It is usually named
basename.lol. This file is usually used for hydrogen
bond restraints. If you do not have any of these restraints, leave
the file empty.
- The angular restraint file. It is usually named
basename.aco.
Note the format for the
chi(1) restraints for the trans rotamer (120 degrees to 240 degrees,
not 120 degrees to -120 degrees. Although these are equivalent, the
latter does not work with the conversion to AMBER format
restraints.)
- The stereospecific assignments file. It is usually named
basename.ssa. If you do not
have any stereospecific assignments, this file should be left empty.
See below about making stereospecific
assignments and creating this file.
- The sequence file. This file lists the sequence of your
protein in three letter code. Here is an
example sequence file
Procedure for creating a .upl DIANA input file from Felix97:
- For each of your NOESY experiments, run the volume measurement routine in
Felix97 using the Measure:Integral/Volume:Measure_All_Volumes menu pulldown.
- Export the volume tables (one for each NOESY experiment) to text files
by using the File:Export:Table menu pulldown, double click on "vol", click
on filename indicating which experiment it is (i.e. nnoe3d), and hit okay.
Type text output filename into the window and save it.
- Export the xpk tables in the same way, but do File:Export:Peaks.
- Delete the first line in each of the exported .vol and .xpk text files.
- First time only: Edit
mkf27col. You need to edit the last section in
the file to be sure your Felix atom names are all converted to the appropriate
DIANA names. You also need to delete the section where pseudoatom NOE intensities
are divided by 2, and either leave it that way or put in your list of pseudoatoms
for each residue.
- First time only: Edit input and output file names
and choose between onenoe.7col and onenoerev.7col
in makerst and/or
makerst.new. Also edit the number of residues in
your protein in makerst.new.
- Run makerst (see here) or makerst.new (see below). makerst is a wrapper script which calls mkf27col and
diana_filt, followed by onenoe(rev).7col and the
sort command.
SYNTAX: makerst
INPUT FILES:
- noesyname.bin (bin file -- one for each NOESY experiment)
- noesyname.xpk (exported peak file from Felix97 -- one for each NOESY experiment)
- noesyname.vol (exported vol file from Felix97 -- one for each NOESY experiment)
- stereo (stereospecific assignments file, can be empty).
Note that this file is in different format from the basename.ssa
stereospecific assignments file which is used as DIANA input.
Breakdown of the commands executed by makerst:
- Runs mkf27col to convert your FELIX files
from each of your NOESY experiments to DIANA format (7 column format).
SYNTAX: mkf27col -b basename.bin -c 1 -p basename.xpk -s stereo -v
basename.vol > outputfile
- Concatenates all of the mkf27col output files into one file. Now you have a
file that makerst calls unfilt_rst.
- Runs diana_filt, which fixes the count of
intraresidue restraints.
SYNTAX: diana_filt unfilt_rst >! filt1.rst
- Runs onenoe.7col or
onenoerev.7col, which find duplicate restraints in the list
and keep the one with the shorter (onenoe.7col) or longer (onenoerev.7col) bin. You need
to specify which one is run by editing makerst!
SYNTAX: onenoe(rev).7col filt1.rst > filt2.rst
- Runs the UNIX command called sort which sorts the commented lines away from
the uncommented lines.
SYNTAX: sort filt2.rst > filt3.rst
Finishing the .upl file after makerst:
- OPTIONAL: Run addtropp5 to add 0.5 angstroms
to each methyl restraint as a correction factor. A script called
addtropp4 has a more thorough set of correction factors for restraints arising from pseudoatoms.
SYNTAX: addtropp5 inputfile > outputfile
- Delete all of the lines in filt3.rst which have been commented out.
- Run ambisort to sort intraresidue, sequential,
medium range, and long range NOEs in the restraint file. A tally will appear at the
beginning of the output file -- very useful information.
SYNTAX: ambisort -r [# of residues] filt3.rst > filt4.rst
- Copy filt4.rst to the directory in which you plan to run the DIANA job, and
rename the file to basename.upl
- OR run makerst.new. makerst.new is the same as
makerst but also calls rst_fix. rst_fix reads
fix_list, a list of restraints to
be "fixed" (because the peaks are in overlapped regions, etc.) You need to
generate this list manually. If you have used the multiple assign module in
Felix97, makerst.new will also read mult_assig,
which can be generated using mkmult_assig
(you should generate separate files for each multiply assigned NOESY and
concatenate them to make the final mult_assig file).
SYNTAX: makerst.new
INPUT FILES:
- noesyname.bin (bin file -- one for each NOESY experiment)
- noesyname.xpk (exported peak file from Felix97 -- one for each NOESY experiment)
- noesyname.vol (exported vol file from Felix97 -- one for each NOESY experiment)
- stereo (stereo file, can be empty) Note that this file is in different format
from the basename.ssa stereospecific assignments
file which is used as DIANA input.
- fix_list (a list of restraints that need to be adjusted)
- mult_assig (a list of restraints from multiply assigned peaks in Felix97, generated
using mkmult_assig)
Breakdown of the commands executed by makerst.new:
- Runs mkf27col to convert your FELIX files
from each of your NOESY experiments to DIANA format (7 column format).
SYNTAX: mkf27col -b basename.bin -c 1 -p basename.xpk -s stereo -v
basename.vol > outputfile
- Concatenates all of the mkf27col output files plus your mult_assig file
into one file. Now you have a file that makerst calls unfilt_rst.
- Runs diana_filt, which fixes the count of
intraresidue restraints.
SYNTAX: diana_filt unfilt_rst >! filt1.rst
- Runs onenoe.7col or
onenoerev.7col, which find duplicate restraints in the list
and keep the one with the shorter (onenoe.7col) or longer (onenoerev.7col) bin. You need
to specify which one is run by editing makerst.new!!
SYNTAX: onenoe(rev).7col filt1.rst > filt2.rst
- Runs the UNIX command called sort which sorts the commented lines away from
the uncommented lines.
SYNTAX: sort filt2.rst > filt3.rst
- Runs rst_fix, which reads the fix_list file
and removes or changes the distances for those restraints in the filt3.rst file. Also
removes duplicate restraints (lines with comments).
SYNTAX: rst_fix -r [# of residues] -f fix_list >! filt4.rst
- Runs ambisort to sort intraresidue, sequential,
medium range, and long range NOEs in the restraint file. A tally will appear at the
beginning of the output file -- very useful information.
SYNTAX: ambisort -r [# of residues] filt4.rst > basename.sort.rst
Finishing the .upl file after makerst.new:
- OPTIONAL: Run addtropp5 to add 0.5 angstroms to each
methyl restraint as a correction factor. A script called
addtropp4 has a more thorough set of corrections for restraints arising from pseudoatoms.
SYNTAX: addtropp4 inputfile > outputfile
- Copy basename.sort.rst to the directory in which you plan to run the DIANA
job, and rename the file to basename.upl
You will also need to have the diana executable and libraries
accessible. I installed them in my home directory. If you are
copying the diana directory from someone else (and not reinstalling
it from scratch), you will need to rebuild diana in order to have
the correct libraries used. If you do not do this, diana will
continue to use the libraries in the old directory (the place from
which you copied the directory). As long as the old directory stays
in place, this will not cause problems. However, it is best to
rebuild the program. To rebuild diana in the new
directory:
- log onto a machine that has a fortran compiler (f77) that
has the same OS as your machine and/or the SGI clusters
(depending on where you want to run diana)
- cd to your (new) diana directory
- type make clean
(this removes the configuration files leftover from the previous
installation)
- type ./configure
- edit the config.make file:
remove the tmp-mnt portion of the path from the BASEDIR line
- edit the Makefile:
change the INSTALLDIR line to reflect the new location of the
executable
Run diana either via the batch queue on the SGI clusters (see the
Research Computing
page on how to use NQE for batch queue instructions if you are
at Scripps) or on your local machine. The command is simply the
command file name (i.e. basename.com).
The most useful output file from DIANA is the .ovw file.
This is a overview of the restraint violations in the structures.
There is one for each set of structures DIANA generated. Look at the
last one (basename_1d.ovw).
After you have finished with distance geometry, you are ready to
do simulated annealing with AMBER. First, you must convert both the
coordinate files and the restraint files to AMBER format.
To convert the coordinate files:
- run d2p.com in the diana directory in which you have
your output files. The input directory will therefore be .
and the diana cor filename is basename_1d, where basename is the
name by which you are referring to your structure (such as
f36g and the 1d portion refers to the fourth set of
structures diana generated. You can delete all the other .cor files.
- move the resulting .pdb files to the amber/leap
directory under your main structure calculations directory.
- Convert the PDB files to AMBER xyz files as follows (in the
leap directory):
- lmpdb2amber (this generates a lot of error
output to the screen. Ignore it.)
- change
- runleap >! leap.log (you will need to have the
LEAPROOT variable correctly set in your .cshrc in order to run
leap)
- move the .xyz files from the leap directory to the data-based
directory in which you will be running AMBER
To convert the restraint files:
- copy all of the restraint files (.upl,
.aco, .lol,
.ssa, if
they are not empty) to the amber/rst directory
- First time only: generate your sample
PDB file as follows:
- convert one of the .xyz files generated above into a PDB
file using ambpdb with the following command line:
ambpdb -p ../leap/prmtop.leap91 file.xyz > file.pdb
ambpdb can be found in the md directory, which you
should obtain from the bin of someone who has done structure
calculations.
- call this sample PDB file something reasonable, then
be sure to edit mdprep to refer to the correct
file.
- edit dodomd and
domdprep to reflect your
filenames
- run dodomd:
SYNTAX: dodomd YYMMDD
(where YYMMDD is the date, such as 990512)
To run amber, make a date-based named subdirectory in your
struct_calc/amber directory. Put the following files in
this directory:
- prmtop.leap91
- the .xyz files from LEAP (move them from your
amber/leap directory)
- the restraint files converted from your DIANA restraints
(move from amber/rst/mdprep.monomer.YYMMDD directory). This
file will be called basename.RST.
- copy basename.RST to basename.min.RST.
These will be the restraints used during the short minimization
before the simulated annealing run.
- mkanneal.sgi, a script
that will create the anneal command files for the actual AMBER run.
If you are at Scripps, you have two choices:
- Run on the SGI cluster.
- Run mkanneal.sgi. You will get a file named
anneal0XX for each .xyz file. You are now ready to run
AMBER.
- Log into krusty. Submit your anneal jobs to the batch queue
(SGI cluster) using the submit.sgi script. Be sure that your
anneal0xx files call the sander executable
/thr/slater/amber6/exe.R10000/sander_classic.
- Or, run on the Cray. Log into buzz and FTP your files over to your
directory there at /scratch/username/. Submit your anneal jobs to buzz
using the anneal.all script. Be sure the anneal0xx files call the
sander executable /scratch/case/amber6/exe/sander_classic.
Once all of your anneal jobs have finished, you are ready to
analyze the output and see how your structures look. There are two
common types of problems that will prevent a particular anneal job
from finishing correctly:
- The job crashes during minimization. In this case, the job
will quit very early in the run (after only a few seconds or
minutes). You will have no output, because the rMD files will not
have been created, and the minimization files will have been
deleted. If you would like to look at the minimization output files,
edit the mkanneal.sgi script so that they are not deleted. If the
job has crashed in minimization, you will find the following error
in these output files:
RESTARTED DUE TO LINMIN FAILURE
I saw more structures crashing in minimization in two
instances:
- when I had a few bad restraints in my list
- when my structures were occupying two distinct
conformations (in my cases, a particular loop could be in one
of two positions)
See the
LINMIN failures portion of the AMBER FAQ for more
information.
- The job "blows up" during the restrained molecular dynamics
(rMD). In this case, you will have all of your output files, but
there willbe no meaningful information in the output xyz file (here
is an example of the xyz file resulting
from this problem), and there will be some odd lines in the
output file (here is an example output
file resulting from this problem). This problem is most
common early in the calculations, when you don't have many
restraints.
As long as you didn't lose too many structures to these two types
of problems, you can continue with the analysis. I never lost more
than 10 structures out of 50.
Randy Ketchem wrote a handy script for analyzing your structures.
It is called doptrail, for "do paper
trail". It can be run for either all of the structures in your
ensemble, or for only a family of the best structures. Here are the
steps for running it:
- Set up your analysis directories. If you are at Scripps, the
easiest way is to copy the amber/YYMMDD/analysis
directory from someone else. Otherwise, here are the directories to
make under the analysis subdirectory in your
amber/YYMMDD directory:
- pdb
Put these scripts in this directory:
mkpdb and
mkload
- noevioe
Put the runnoevio script in
this directory
- senergy
Put the runsenergy script in
this directory
- srst
Put the runsrst script in
this directory
- suppose
Put the runsuppose script in
this directory (First time only: Edit
this script to overlay selected regions, probably helices,
in your molecule.)
- sviol
Put the runsviol script in
this directory
- torvio
Put the runtorvio script in
this directory
- These scripts all call scripts that are stored in the MD
directory in your bin. Here is a
list of the contents of this
directory. You will need to have the appropriate scripts in
this directory before you can analyze your AMBER output using
Randy's scripts.
- You must create PDB files in the correct format from
the AMBER output xyz files. Do this in the analysis/pdb
directory. Run mkpdb. When it encounters xyz files that
resulted from AMBER runs in which the structure blew up in rMN, it
will give an error. Ignore this error, but be sure to remove the PDB
files for these structures. They will be empty (0 size in a ls
-l listing).
- Create a family.all file. Do this in the pdb
directory as follows:
- vi family.all
- delete the contents, if any
- read in a list of the PDB files:
:r !ls *.pdb
- remove the (empty) first line
- Copy the family.all file to the main analysis
directory
- Run doptrail:
doptrail YYMMDD all
- You can print all the output from this script using
printem:
printem all
Once my structures got good enough to allow me to define a useful
family, I stopped printing all of this output, and just printed the
output from the runsrst script (analysis/srst/srst.all.ps)
- Use the output from the srst script (which is the structures
listed by increasing restraint violation energies) to define a
starting family. I usually took the 25 structures with the lowest
E(rst).
- Go to the pdb directory and copy family.all to
family.fam. Edit this file to remove all put the structures in your
preliminary family.
- Look at your family in Insight:
- Edit mkload to reflect the
correct names of your PDB files.
- Create an Insight bcl script for loading the families
using mkload:
mkload fam >! loadfam.bcl
- If you would rather look at all the entire ensemble, do
the following instead:
mkload all >! loadfam.bcl
- open Insight (it is best to do this in the pdb
directory)
- Read in your structures: File->Source file->loadfam.bcl
(this will take a while)
- You can often visually identify structures that are clearly
"outliers" in the family. Early in the calculation, it is OK to just
go ahead and remove them from the family.fam file.
- Once you have your "final" family listed in the family.fam
file, run doptrail and printem again, this time just for the family:
doptrail YYMMDD fam
printem fam
- Some things to check for when looking at the analyses from
doptrail, and at the structures in Insight:
- Look at the table of restraint violations in the various
structures. Use this to identify restraints that might be
wrong (either misassigned or put into too tight of a bin).
Removing these will generally improve your structures noticeably.
- Track the progress of the RMSDs reported by suppose.
Obviously, these should get smaller as the calculations progress.
- Check the "difference from mean by molecular number" at the
end of the suppose output. Remove a structure which is clearly
an outlier.
- Look at the family in Insight to identify regions that are
particularly poorly defined-- it often pays to look intensively
for some more NOEs in these regions. Even a few new NOEs can
noticeably improve things.
How to choose a family of structures
Criteria for a final family of structures
Here are some rules of thumb to check to see if you are finished with your structure
calculation!
Be sure to run twice as many structures through the calculation as you expect to have in your
final family.
Visually check your family in InsightII, and throw away structures if you can justify it upon
the basis of restraints and/or chemical reasonableness.
In the doptrail output which displays all of the analysis of your AMBER structures, check:
- suppose.fam file
- The all atom RMSD should be less than or equal to twice the backbone RMSD
(where RMSD = RMS difference from the mean structure).
- sviol.fam file
- In the first section, under the "max" column, you don't want any
values >0.2 in your final family.
- In the second section (ANGLES), under the "max" column, you don't want
any values >5 degrees in your final family.
- Check through the individual restraint violations to see if any
restraint is consistently violated by more than ? angstroms. Check it.
- senergy.fam file
1 star before value denotes >= (mean + 1*SD)
2 stars before value denotes >= (mean + 2*SD)
3 stars before value denotes >= (mean + 3*SD)
- Etot, Epot, EAMBER: should not have any items
with 2 or 3 stars (stars are listed to the left of the columns), except if the
value varies from the mean in the favorable direction.
- All other entries in the table at the top of the file: No items with 3 stars
allowed. If an item has 2 stars, check it.
- NOTE: EKtot is meaningless, and EElec is less meaningful
than the other items listed (BOND, ANGLE, DIHED, 1-4 NB, 1-4 EEL, VDWAALS,
EHBOND, CONSTRAINT)
last updated 12/31/02 by Kevin Weiss