Instructions for Running Structure Calculations

Written by: Melanie Nelson and Patty Fagan Jones

This document contains some basic instructions for running NMR structure calculations using DIANA and AMBER. It was written based on advice from Lena Maler.

Generating the First Restraint Lists

You will need approximately 200 distance constraints to make a first attempt at a structure calculation. There are two important types of NOEs to identify at this stage: medium range NOEs (which will help define the secondary structure) and long range NOEs (which will "fold up" the protein). However, in the process of calibrating the bin sizes for the distance constraints and identifying the medium range and long range NOEs, you will also identify many sequential NOEs, and these should also be included in the first restraint list.

Before you can use any NOEs from a particular NOESY in your restraint list, you must calibrate the bin sizes. I began by defining three bins for my distance restraints, with upper bounds of 3.5, 4.5, and 6.0 angstroms. Later in the structure calculation process, it might be desirable to define four bins, and to fine tune the bounds. Here is one procedure for calibrating the bin sizes:

Identify approximately 10 dNN (i to i+1) NOEs in regions known to be helical (from the chemical shift index or HNHA data). Use the volumes of these peaks to determine the smallest volume that will be put into the tightest restraint bin (in my case, 3.5 angstroms).
Identify approximately 10 daN (i to i+3) NOEs in regions known to be helical. Use the volumes of these peaks to determine the smallest volume that will be put into the middle restraint bin (e.g. 4.5 angstroms).
All NOEs with volumes less than the cutoff identified in the previous step will be put into the loosest restraint bin (in this case, 6.0 angstroms).
Check that these bins are OK once you have gone through and found as many of the sequenctial and i to i+3 NOEs as can be unambiguously identified. Some sequential NOEs may fall in the medium bin, and a few i to i+3 NOEs may fall in the tight bin, but you do not want too many of these exceptions.

I used the intra-residue distances between HD and HE in phenylalanines to calibrate the tight bin for the D₂O NOESY. The measured distance between HD and HE is 2.5 angstroms. It was necessary to divide the measured volume by two for the calibration, since there are two HDs and two HEs on a phenylalanine residue, and both would be contributing to the NOE. It is also necessary to be conservative in assigning bins from these distances, since spin diffusion will certainly contribute to the volumes of these intra-ring NOEs.

Once you have the bins calibrated, you are ready to begin to build up a restraint list. If you have a good knowledge of the secondary structure of your protein, either from chemical shift indexing, experiments to measure torsion angles (such as the HNHA experiment), or other considerations, a reasonable next step is to systematically search for the NOEs that define the secondary structure, such as the i to i+3 (HA to HN) NOEs for the helices. See chapter 7 of NMR of Proteins and Nucleic Acids, by Wuthrich for a useful table of short inter-proton distances in standard secondary structures.

Setting Up the Structure Calculations Directories

It doesn't really matter how you set up your structure calculation directories. However, the instructions in this document will assume the following set up:

Make the following directories under one main directory (called struct_calc here):

diana
Make directories under this directory with date-based names for each structure calculation run. For instance, calculations run on May 12, 1999 would go in a directory called 990512. This helps keep the output from the calculations straight.
amber
Make date-based directories under this directory for the actual AMBER runs. Also make:
- leap
  This directory holds the pdb files as they are being converted to AMBER xyz format. You need the following scripts/programs and files in this directory:
  - lmpdb2amber
  - PROTON_INFO
  - change
  - changeh
  - prmtop.leap91
  - tleap (part of the LEAP program)
  - xleap (part of the LEAP program)
  - runleap
- rst
  This directory is where you convert the restraints from DIANA format to AMBER format. You need the following scripts/programs and files in this directory:
  - dodomd (this script will create date-based named directories with the RST files in them)
  - domdprep (be sure to edit the dodomd script to call this program correctly)
  - noeconv (part of the GAP package of structure analysis programs)
  - diana2habas
  - a sample PDB file that exactly matches the AMBER xyz files. There is more information about how to create this PDB file under the Converting from DIANA to AMBER section.

You will also need a directory with all of the MD scripts, etc. The various scripts we currently use will assume that this directory is called ~username/bin/md. You will have to edit the scripts to make sure they have the correct location of this directory. Here is a list of the contents of the MD directory.

Running DIANA (Distance Geometry)

You will need the following files in order to run DIANA:

The command file (the file that actually controls the DIANA run). It is usually named basename.com, where basename is whatever name you are using to identify your structures (for instance, f36g or crcn). Some points about the command file:
- Be sure to change the first line so that the cd command has the correct directory
- The example script is generating 50 structures. Early in the calculations, you will want to generate about this many structures, because you may lose several during AMBER calculations, as discussed below. DIANA runs fairly quickly, so I continued to generate 50 structures in DIANA as my calculations progressed, but then only took the best structures to AMBER calculations.
- The seed is the seed for the random number generator (random numbers are used to help the program sample conformational space). I never changed this value, but in principle you could.
- The cutoffs determine which structures are included in the restraint violation output. The only way to determine what these should be is empirically.
- The sample script runs DIANA four times, using the results from the previous run to start the next run. You are interested in the coordinates generated by the last run (the .cor files).
The distance restraint upper bounds file. It is usually named basename.upl. See below for procedure to create this file from Felix97. This distance restraint list must be in a defined format, and must use the correct atom names for DIANA. Errors in atomnames are the most common reason for DIANA jobs to fail. Check the DIANA manual for the correct atom names for each residue.
The distance lower bounds file. It is usually named basename.lol. This file is usually used for hydrogen bond restraints. If you do not have any of these restraints, leave the file empty.
The angular restraint file. It is usually named basename.aco. Note the format for the chi(1) restraints for the trans rotamer (120 degrees to 240 degrees, not 120 degrees to -120 degrees. Although these are equivalent, the latter does not work with the conversion to AMBER format restraints.)
The stereospecific assignments file. It is usually named basename.ssa. If you do not have any stereospecific assignments, this file should be left empty. See below about making stereospecific assignments and creating this file.
The sequence file. This file lists the sequence of your protein in three letter code. Here is an example sequence file

Procedure for creating a .upl DIANA input file from Felix97:

For each of your NOESY experiments, run the volume measurement routine in Felix97 using the Measure:Integral/Volume:Measure_All_Volumes menu pulldown.
Export the volume tables (one for each NOESY experiment) to text files by using the File:Export:Table menu pulldown, double click on "vol", click on filename indicating which experiment it is (i.e. nnoe3d), and hit okay. Type text output filename into the window and save it.
Export the xpk tables in the same way, but do File:Export:Peaks.
Delete the first line in each of the exported .vol and .xpk text files.
First time only: Edit mkf27col. You need to edit the last section in the file to be sure your Felix atom names are all converted to the appropriate DIANA names. You also need to delete the section where pseudoatom NOE intensities are divided by 2, and either leave it that way or put in your list of pseudoatoms for each residue.
First time only: Edit input and output file names and choose between onenoe.7col and onenoerev.7col in makerst and/or makerst.new. Also edit the number of residues in your protein in makerst.new.
Run makerst (see here) or makerst.new (see below). makerst is a wrapper script which calls mkf27col and diana_filt, followed by onenoe(rev).7col and the sort command.
SYNTAX: makerst
INPUT FILES:
- noesyname.bin (bin file -- one for each NOESY experiment)
- noesyname.xpk (exported peak file from Felix97 -- one for each NOESY experiment)
- noesyname.vol (exported vol file from Felix97 -- one for each NOESY experiment)
- stereo (stereospecific assignments file, can be empty). Note that this file is in different format from the basename.ssa stereospecific assignments file which is used as DIANA input.
Breakdown of the commands executed by makerst:
- Runs mkf27col to convert your FELIX files from each of your NOESY experiments to DIANA format (7 column format).
  SYNTAX: mkf27col -b basename.bin -c 1 -p basename.xpk -s stereo -v basename.vol > outputfile
- Concatenates all of the mkf27col output files into one file. Now you have a file that makerst calls unfilt_rst.
- Runs diana_filt, which fixes the count of intraresidue restraints.
  SYNTAX: diana_filt unfilt_rst >! filt1.rst
- Runs onenoe.7col or onenoerev.7col, which find duplicate restraints in the list and keep the one with the shorter (onenoe.7col) or longer (onenoerev.7col) bin. You need to specify which one is run by editing makerst!
  SYNTAX: onenoe(rev).7col filt1.rst > filt2.rst
- Runs the UNIX command called sort which sorts the commented lines away from the uncommented lines.
  SYNTAX: sort filt2.rst > filt3.rst
Finishing the .upl file after makerst:
- OPTIONAL: Run addtropp5 to add 0.5 angstroms to each methyl restraint as a correction factor. A script called addtropp4 has a more thorough set of correction factors for restraints arising from pseudoatoms.
  SYNTAX: addtropp5 inputfile > outputfile
- Delete all of the lines in filt3.rst which have been commented out.
- Run ambisort to sort intraresidue, sequential, medium range, and long range NOEs in the restraint file. A tally will appear at the beginning of the output file -- very useful information.
  SYNTAX: ambisort -r [# of residues] filt3.rst > filt4.rst
- Copy filt4.rst to the directory in which you plan to run the DIANA job, and rename the file to basename.upl
OR run makerst.new. makerst.new is the same as makerst but also calls rst_fix. rst_fix reads fix_list, a list of restraints to be "fixed" (because the peaks are in overlapped regions, etc.) You need to generate this list manually. If you have used the multiple assign module in Felix97, makerst.new will also read mult_assig, which can be generated using mkmult_assig (you should generate separate files for each multiply assigned NOESY and concatenate them to make the final mult_assig file).
SYNTAX: makerst.new
INPUT FILES:
- noesyname.bin (bin file -- one for each NOESY experiment)
- noesyname.xpk (exported peak file from Felix97 -- one for each NOESY experiment)
- noesyname.vol (exported vol file from Felix97 -- one for each NOESY experiment)
- stereo (stereo file, can be empty) Note that this file is in different format from the basename.ssa stereospecific assignments file which is used as DIANA input.
- fix_list (a list of restraints that need to be adjusted)
- mult_assig (a list of restraints from multiply assigned peaks in Felix97, generated using mkmult_assig)
Breakdown of the commands executed by makerst.new:
- Runs mkf27col to convert your FELIX files from each of your NOESY experiments to DIANA format (7 column format).
  SYNTAX: mkf27col -b basename.bin -c 1 -p basename.xpk -s stereo -v basename.vol > outputfile
- Concatenates all of the mkf27col output files plus your mult_assig file into one file. Now you have a file that makerst calls unfilt_rst.
- Runs diana_filt, which fixes the count of intraresidue restraints.
  SYNTAX: diana_filt unfilt_rst >! filt1.rst
- Runs onenoe.7col or onenoerev.7col, which find duplicate restraints in the list and keep the one with the shorter (onenoe.7col) or longer (onenoerev.7col) bin. You need to specify which one is run by editing makerst.new!!
  SYNTAX: onenoe(rev).7col filt1.rst > filt2.rst
- Runs the UNIX command called sort which sorts the commented lines away from the uncommented lines.
  SYNTAX: sort filt2.rst > filt3.rst
- Runs rst_fix, which reads the fix_list file and removes or changes the distances for those restraints in the filt3.rst file. Also removes duplicate restraints (lines with comments). SYNTAX: rst_fix -r [# of residues] -f fix_list >! filt4.rst
- Runs ambisort to sort intraresidue, sequential, medium range, and long range NOEs in the restraint file. A tally will appear at the beginning of the output file -- very useful information.
  SYNTAX: ambisort -r [# of residues] filt4.rst > basename.sort.rst
Finishing the .upl file after makerst.new:
- OPTIONAL: Run addtropp5 to add 0.5 angstroms to each methyl restraint as a correction factor. A script called addtropp4 has a more thorough set of corrections for restraints arising from pseudoatoms.
  SYNTAX: addtropp4 inputfile > outputfile
- Copy basename.sort.rst to the directory in which you plan to run the DIANA job, and rename the file to basename.upl

You will also need to have the diana executable and libraries accessible. I installed them in my home directory. If you are copying the diana directory from someone else (and not reinstalling it from scratch), you will need to rebuild diana in order to have the correct libraries used. If you do not do this, diana will continue to use the libraries in the old directory (the place from which you copied the directory). As long as the old directory stays in place, this will not cause problems. However, it is best to rebuild the program. To rebuild diana in the new directory:

log onto a machine that has a fortran compiler (f77) that has the same OS as your machine and/or the SGI clusters (depending on where you want to run diana)
cd to your (new) diana directory
type make clean
(this removes the configuration files leftover from the previous installation)
type ./configure
edit the config.make file:
remove the tmp-mnt portion of the path from the BASEDIR line
edit the Makefile:
change the INSTALLDIR line to reflect the new location of the executable

Run diana either via the batch queue on the SGI clusters (see the Research Computing page on how to use NQE for batch queue instructions if you are at Scripps) or on your local machine. The command is simply the command file name (i.e. basename.com).

The most useful output file from DIANA is the .ovw file. This is a overview of the restraint violations in the structures. There is one for each set of structures DIANA generated. Look at the last one (basename_1d.ovw).

Converting from DIANA to AMBER

After you have finished with distance geometry, you are ready to do simulated annealing with AMBER. First, you must convert both the coordinate files and the restraint files to AMBER format.

To convert the coordinate files:

run d2p.com in the diana directory in which you have your output files. The input directory will therefore be . and the diana cor filename is basename_1d, where basename is the name by which you are referring to your structure (such as f36g and the 1d portion refers to the fourth set of structures diana generated. You can delete all the other .cor files.
move the resulting .pdb files to the amber/leap directory under your main structure calculations directory.
Convert the PDB files to AMBER xyz files as follows (in the leap directory):
- lmpdb2amber (this generates a lot of error output to the screen. Ignore it.)
- change
- runleap >! leap.log (you will need to have the LEAPROOT variable correctly set in your .cshrc in order to run leap)
move the .xyz files from the leap directory to the data-based directory in which you will be running AMBER

To convert the restraint files:

copy all of the restraint files (.upl, .aco, .lol, .ssa, if they are not empty) to the amber/rst directory
First time only: generate your sample PDB file as follows:
- convert one of the .xyz files generated above into a PDB file using ambpdb with the following command line:
  ambpdb -p ../leap/prmtop.leap91 file.xyz > file.pdb
  ambpdb can be found in the md directory, which you should obtain from the bin of someone who has done structure calculations.
- call this sample PDB file something reasonable, then be sure to edit mdprep to refer to the correct file.
edit dodomd and domdprep to reflect your filenames
run dodomd:
SYNTAX: dodomd YYMMDD
(where YYMMDD is the date, such as 990512)

Running AMBER

To run amber, make a date-based named subdirectory in your struct_calc/amber directory. Put the following files in this directory:

prmtop.leap91
the .xyz files from LEAP (move them from your amber/leap directory)
the restraint files converted from your DIANA restraints (move from amber/rst/mdprep.monomer.YYMMDD directory). This file will be called basename.RST.
copy basename.RST to basename.min.RST. These will be the restraints used during the short minimization before the simulated annealing run.
mkanneal.sgi, a script that will create the anneal command files for the actual AMBER run.

If you are at Scripps, you have two choices:

Run on the SGI cluster.
- Run mkanneal.sgi. You will get a file named anneal0XX for each .xyz file. You are now ready to run AMBER.
- Log into krusty. Submit your anneal jobs to the batch queue (SGI cluster) using the submit.sgi script. Be sure that your anneal0xx files call the sander executable /thr/slater/amber6/exe.R10000/sander_classic.
Or, run on the Cray. Log into buzz and FTP your files over to your directory there at /scratch/username/. Submit your anneal jobs to buzz using the anneal.all script. Be sure the anneal0xx files call the sander executable /scratch/case/amber6/exe/sander_classic.

Analyzing the AMBER Output

Once all of your anneal jobs have finished, you are ready to analyze the output and see how your structures look. There are two common types of problems that will prevent a particular anneal job from finishing correctly:

The job crashes during minimization. In this case, the job will quit very early in the run (after only a few seconds or minutes). You will have no output, because the rMD files will not have been created, and the minimization files will have been deleted. If you would like to look at the minimization output files, edit the mkanneal.sgi script so that they are not deleted. If the job has crashed in minimization, you will find the following error in these output files:
RESTARTED DUE TO LINMIN FAILURE
I saw more structures crashing in minimization in two instances:
- when I had a few bad restraints in my list
- when my structures were occupying two distinct conformations (in my cases, a particular loop could be in one of two positions)
See the LINMIN failures portion of the AMBER FAQ for more information.
The job "blows up" during the restrained molecular dynamics (rMD). In this case, you will have all of your output files, but there willbe no meaningful information in the output xyz file (here is an example of the xyz file resulting from this problem), and there will be some odd lines in the output file (here is an example output file resulting from this problem). This problem is most common early in the calculations, when you don't have many restraints.

As long as you didn't lose too many structures to these two types of problems, you can continue with the analysis. I never lost more than 10 structures out of 50.

Randy Ketchem wrote a handy script for analyzing your structures. It is called doptrail, for "do paper trail". It can be run for either all of the structures in your ensemble, or for only a family of the best structures. Here are the steps for running it:

Set up your analysis directories. If you are at Scripps, the easiest way is to copy the amber/YYMMDD/analysis directory from someone else. Otherwise, here are the directories to make under the analysis subdirectory in your amber/YYMMDD directory:
- pdb
  Put these scripts in this directory: mkpdb and mkload
- noevioe
  Put the runnoevio script in this directory
- senergy
  Put the runsenergy script in this directory
- srst
  Put the runsrst script in this directory
- suppose
  Put the runsuppose script in this directory (First time only: Edit this script to overlay selected regions, probably helices, in your molecule.)
- sviol
  Put the runsviol script in this directory
- torvio
  Put the runtorvio script in this directory
These scripts all call scripts that are stored in the MD directory in your bin. Here is a list of the contents of this directory. You will need to have the appropriate scripts in this directory before you can analyze your AMBER output using Randy's scripts.
You must create PDB files in the correct format from the AMBER output xyz files. Do this in the analysis/pdb directory. Run mkpdb. When it encounters xyz files that resulted from AMBER runs in which the structure blew up in rMN, it will give an error. Ignore this error, but be sure to remove the PDB files for these structures. They will be empty (0 size in a ls -l listing).
Create a family.all file. Do this in the pdb directory as follows:
- vi family.all
- delete the contents, if any
- read in a list of the PDB files:
  :r !ls *.pdb
- remove the (empty) first line
Copy the family.all file to the main analysis directory
Run doptrail:
doptrail YYMMDD all
You can print all the output from this script using printem:
printem all
Once my structures got good enough to allow me to define a useful family, I stopped printing all of this output, and just printed the output from the runsrst script (analysis/srst/srst.all.ps)
Use the output from the srst script (which is the structures listed by increasing restraint violation energies) to define a starting family. I usually took the 25 structures with the lowest E(rst).
Go to the pdb directory and copy family.all to family.fam. Edit this file to remove all put the structures in your preliminary family.
Look at your family in Insight:
- Edit mkload to reflect the correct names of your PDB files.
- Create an Insight bcl script for loading the families using mkload:
  mkload fam >! loadfam.bcl
- If you would rather look at all the entire ensemble, do the following instead:
  mkload all >! loadfam.bcl
- open Insight (it is best to do this in the pdb directory)
- Read in your structures: File->Source file->loadfam.bcl (this will take a while)
You can often visually identify structures that are clearly "outliers" in the family. Early in the calculation, it is OK to just go ahead and remove them from the family.fam file.
Once you have your "final" family listed in the family.fam file, run doptrail and printem again, this time just for the family:
doptrail YYMMDD fam
printem fam
Some things to check for when looking at the analyses from doptrail, and at the structures in Insight:
- Look at the table of restraint violations in the various structures. Use this to identify restraints that might be wrong (either misassigned or put into too tight of a bin). Removing these will generally improve your structures noticeably.
- Track the progress of the RMSDs reported by suppose. Obviously, these should get smaller as the calculations progress.
- Check the "difference from mean by molecular number" at the end of the suppose output. Remove a structure which is clearly an outlier.
- Look at the family in Insight to identify regions that are particularly poorly defined-- it often pays to look intensively for some more NOEs in these regions. Even a few new NOEs can noticeably improve things.

Distance Filtering

Stereospecific Assignments

How to choose a family of structures

Criteria for a final family of structures

Here are some rules of thumb to check to see if you are finished with your structure calculation!
Be sure to run twice as many structures through the calculation as you expect to have in your final family.
Visually check your family in InsightII, and throw away structures if you can justify it upon the basis of restraints and/or chemical reasonableness.

In the doptrail output which displays all of the analysis of your AMBER structures, check:

suppose.fam file
- The all atom RMSD should be less than or equal to twice the backbone RMSD (where RMSD = RMS difference from the mean structure).
sviol.fam file
- In the first section, under the "max" column, you don't want any values >0.2 in your final family.
- In the second section (ANGLES), under the "max" column, you don't want any values >5 degrees in your final family.
- Check through the individual restraint violations to see if any restraint is consistently violated by more than ? angstroms. Check it.
senergy.fam file
1 star before value denotes >= (mean + 1*SD)
2 stars before value denotes >= (mean + 2*SD)
3 stars before value denotes >= (mean + 3*SD)
- E_tot, E_pot, E_AMBER: should not have any items with 2 or 3 stars (stars are listed to the left of the columns), except if the value varies from the mean in the favorable direction.
- All other entries in the table at the top of the file: No items with 3 stars allowed. If an item has 2 stars, check it.
- NOTE: E_Ktot is meaningless, and E_Elec is less meaningful than the other items listed (BOND, ANGLE, DIHED, 1-4 NB, 1-4 EEL, VDWAALS, E_HBOND, CONSTRAINT)

last updated 12/31/02 by Kevin Weiss