Instructions for Running Structure Calculations: Distance Filtering

Written by: Patty Fagan Jones and Melanie Nelson, 1999

This document contains some basic instructions for running distance filtering during NMR structure calculations using DIANA and AMBER.

Distance Filtering

Once you can define a reasonable family of structures, you can use distance filtering to help you find more restraints. In this process, the sum of the rest of your data (the information contained in your preliminary family of structures) helps you decide which of the possible assignments (based on chemical shifts) for a given NOE is valid. Of course, you will need to be conservative about making assignments, particularly early in the structure calculations when your family of structures is poorer. There are several ways to do distance filtering, but all of them require:

A way to generate the possible assignments for each NOE. (Methods for this include using GENXPK and using FELIX's built-in assignment procedures.)
A method for generating the distances between protons in your family of structures. (Methods for this include using either the distance or the noevio program in Garry Gippert's GAP package.)
A method for connecting these two pieces of information and using the distances to rule out potential assignments. (Methods for this include two scripts written by Randy Ketchem. disambig is a more manual method and is particularly useful at the early stages of distance filtering, when you need to be most conservative. doambig is really a wrapper script that calls several different filters, including one or more distance filters. The more manual disambig script is also useful in regions where simply decreasing the number of possible assignments to 2-3 may allow assignments to be made based on chemical shifts. I found this to be the case in my D₂O NOESY, where I could often distinguish between two aromatic protons that were close in chemical shift, but not between the variety of aliphatic protons to which they could be making an NOE. Both disambig and the scripts called by doambig will probably need to be modified to match the atom nomenclature you are using. Check the Perl scripts for structure calculations page for more information about the scripts, what they do, and how to modify them.)
A consistent set of rules by which you filter the possible assignments. These rules will become increasingly stringent as the calculations progress, allowing more NOEs to be assigned. Potential starting points are:
- For disambig: a cutoff distance of 9 or 10 angstroms. A peak is assigned to a given possibility if the average distance from the family minus the RMSD of distances is within the realm of detection (5-6 angstroms, depending on your data), the average distance plus the RMSD is less than the cutoff, and no other possibility has an average distance (minus the RMSD or twice the RMSD, depending on how stringent you want to be) of less than the cutoff.
- For doambig: the distance filtering is done by ambidis, a program called by dofilter. ambidis removes possible assignments for which the average distance is greater than the cutoff and fewer than a defined percentage of the structures in the family have distances less than the cutoff. Reasonable starting points for the cutoff and the percentage are 9-10 angstroms and 20%.
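The two rule sets above can be restated as a short Python sketch. This is not the code of disambig or ambidis, which are Perl scripts whose internals are not reproduced here. It only encodes the rules as written and assumes that each candidate assignment comes with a list of per-structure distances in angstroms. The parameter names detect, cutoff, k, and min_fraction are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Optional

def family_stats(distances: List[float]):
    """Average distance and RMS deviation of one proton pair over the family."""
    avg = mean(distances)
    rmsd = math.sqrt(mean((d - avg) ** 2 for d in distances))
    return avg, rmsd

def disambig_rule(cands: Dict[str, List[float]], detect: float = 5.0,
                  cutoff: float = 10.0, k: float = 1.0) -> Optional[str]:
    """Return the one candidate that passes the disambig-style rule, or None.

    cands maps a candidate label to its per-structure distances (angstroms).
    A candidate is accepted if avg - rmsd <= detect, avg + rmsd < cutoff,
    and every other candidate has avg - k*rmsd >= cutoff (k = 1 or 2).
    """
    stats = {name: family_stats(d) for name, d in cands.items()}
    for name, (avg, rmsd) in stats.items():
        others_far = all(o_avg - k * o_rmsd >= cutoff
                         for other, (o_avg, o_rmsd) in stats.items()
                         if other != name)
        if avg - rmsd <= detect and avg + rmsd < cutoff and others_far:
            return name
    return None

def ambidis_rule(cands: Dict[str, List[float]], cutoff: float = 10.0,
                 min_fraction: float = 0.20) -> List[str]:
    """Drop a candidate only if avg > cutoff AND fewer than min_fraction
    of the structures have a distance below the cutoff."""
    kept = []
    for name, d in cands.items():
        frac_close = sum(x < cutoff for x in d) / len(d)
        if mean(d) > cutoff and frac_close < min_fraction:
            continue  # ruled out by distance
        kept.append(name)
    return kept
```

Raising k from 1 to 2 makes the disambig-style rule more stringent, because every competing candidate must then lie even further beyond the cutoff before an assignment is made.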

Distance filtering using GENXPK followed by doambig/disambig:

This method utilizes GENXPK to do a first step of chemical shift filtering, followed by doambig/disambig to do the second step of distance filtering. Note that GENXPK only analyzes the peaks which are unassigned in Felix, and ignores all fully assigned peaks in Felix.

If you wish to filter against all of your peaks in a particular NOESY spectrum, you can use "unassign all peaks" in the Assign module of Felix and then export the peaks to a text file (genxpk.txt; see the instructions below). DON'T save the unassigned state to your database (dba). In this way the disambig/doambig output will cover all of your picked peaks.

Start with GENXPK. Run GENXPK in your /home/yourname/felix97/text/noesyname/ directory. Select one NOESY spectrum to filter against.

Make a .assignments file from Felix97:
- In Felix97, copy the spectrum-specific chemical shifts for the spectrum you wish to filter against to the generic chemical shifts (in the Assign module).
- Export the spinsystems table to a text file called spinsys.txt
- Run pat_to_4col. This script converts the file format to 4 column GENXPK format.
  SYNTAX: pat_to_4col spinsys.txt [expt # of this NOESY in the Felix97 project experiment list] > spinsys.4col
- Check the spinsys.4col file. Delete all lines which contain zero information.
- First time only: Edit fix_4col to include any non-IUPAC atom or residue names you may have used in your Felix assignments.
- Run fix_4col. This script converts your Felix97 nomenclature to GENXPK nomenclature. This script has not yet been tested with a stereospecific assignments file.
  SYNTAX: fix_4col spinsys.4col > .assignments
Make a refparm.noesy file manually or in Felix97 using getref.mac, a Felix macro for extracting reference parameters for 3D matrices.
Make the other two input files you will need for GENXPK: genxpk.vol and genxpk.txt.
- In FELIX, measure all volumes for the NOESY you have selected.
- Export the volumes to a text file called /felix/text/noesyname/noesyname.vol
- Export the peaks to a text file called /felix/text/noesyname/noesyname.xpk
- Manually delete the first line of each of these two text files.
- Rename these two files: noesyname.vol -> genxpk.vol, and noesyname.xpk -> genxpk.txt
Run GENXPK (it must first be installed on your machine).
- SYNTAX: Type the following commands.
  - genxpk refparm.cnoe (or whatever your refparm file is called) (This command starts up GENXPK.)
  - vol 1
  - rxpk (You'll get lots of output to the screen. Don't worry.)
  - sap (This command shows you the current settings for assignment parameters. Change them if they are incorrect. See the GENXPK manual for commands.)
  - sap write
  - asg (You'll get lots of output to the screen. Don't worry.)
  - quit
- OUTPUT file: ASG_RESULTS
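The manual edits to the two Felix exports described above (deleting the first line and renaming) can also be scripted. This Python sketch assumes both exports sit in one directory as noesyname.vol and noesyname.xpk. Unlike a rename, it writes genxpk.vol and genxpk.txt as new files and leaves the Felix exports untouched, so a mistake can be redone.

```python
from pathlib import Path

def prepare_genxpk_inputs(noesyname: str, directory: str = ".") -> None:
    """Write genxpk.vol and genxpk.txt from the Felix exports, minus their first line."""
    d = Path(directory)
    for ext, target in ((".vol", "genxpk.vol"), (".xpk", "genxpk.txt")):
        src = d / f"{noesyname}{ext}"
        lines = src.read_text().splitlines(keepends=True)
        (d / target).write_text("".join(lines[1:]))  # drop the header line
```

For example, prepare_genxpk_inputs("cnoe") run in the noesyname directory would produce the two GENXPK input files from cnoe.vol and cnoe.xpk.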

Repeat this procedure if you wish to do distance filtering against a different NOESY spectrum.

Now run disambig or doambig.

disambig concatenates the ambiguity output from either GENXPK or Felix with the distance information for a family, as generated by distance (part of Garry Gippert's GAP package). The atom names in the two files must match! Therefore, another version of this script is available for cases where there is GENXPK format output with Felix-type atom nomenclature. This modified version also uses the newer nomenclature for pseudoatoms; for instance, "QPA" is replaced by "QA". All modifications are restricted to the ConvertGenxpkName subroutine. Users with Felix format ambiguity files may need to modify the ConvertFelixName subroutine. Another modified version of the script is required for GENXPK output from 3D data sets, because the column numbers in the ASG_RESULTS file differ. This script is called disambig3d, and was made from the modified version of disambig (disambig.mn). It is run exactly as disambig is.
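Because disambig only works if the atom names in the ambiguity file and the distance file match, it can help to picture this step as a join on normalized atom names. The Python sketch below is a hypothetical illustration, not the logic of ConvertGenxpkName. Peaks map to candidate atom pairs, distances are keyed by unordered atom pair, and the renaming table contains only the QPA to QA example mentioned above.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Atom = Tuple[int, str]  # (residue number, atom name)

# Hypothetical renaming table; only QPA -> QA comes from the text above.
NAME_MAP = {"QPA": "QA"}

def normalize(atom_name: str) -> str:
    """Translate an ambiguity-file atom name into distance-file nomenclature."""
    return NAME_MAP.get(atom_name, atom_name)

def attach_distances(
    peaks: Dict[str, List[Tuple[Atom, Atom]]],
    distances: Dict[FrozenSet[Atom], float],
) -> Dict[str, List[Tuple[Atom, Atom, Optional[float]]]]:
    """Pair every candidate assignment of every peak with its family distance.

    A candidate whose names do not match the distance file gets None,
    which is exactly the failure the matching-names warning is about.
    """
    joined = {}
    for peak, candidates in peaks.items():
        rows = []
        for (r1, a1), (r2, a2) in candidates:
            key = frozenset({(r1, normalize(a1)), (r2, normalize(a2))})
            rows.append(((r1, a1), (r2, a2), distances.get(key)))
        joined[peak] = rows
    return joined
```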
To use a disambig script:
- Generate a list of unassigned NOEs, with the possible assignments, either using GENXPK or Felix. The comments in the script give examples of the correct format for these input files.
- Generate a list of inter-atom distances for your current family of structures, using the distance program in Garry Gippert's GAP package. It is important that all the pseudoatoms are included in this distance file. Here is how I got that to work: 1) I used a modified pseudoatom map file, in which I had removed the pseudoatoms I did not want included. 2) I ran distance with the following command line:
  distance -fam family.fam -sub "^[HCMNQ]" -pms "^[HMNQ]" -pam pseudomap.mn -list 1 -cut 25 > output
  Refer to the help and documentation available for the distance program for more information about the command line options.
- Run the script. The comments in the script give the command line:
  disambig -d [distance file] -f [felix|genxpk] -p [peak file]
written by: Randal R. Ketchem; modified by Melanie Nelson
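For each proton pair, the distance program reports statistics over the family. As an illustration only, the Python sketch below computes the average distance and the RMS deviation about that average for one pair. It assumes the coordinates have already been read from the .pdb files into dictionaries keyed by (residue number, atom name). Pseudoatoms are placed at the geometric center of their member protons, which is one common convention; the GAP distance program may handle them differently. math.dist requires Python 3.8 or later.

```python
import math
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Atom = Tuple[int, str]                          # (residue number, atom name)
Model = Dict[Atom, Tuple[float, float, float]]  # coordinates of one structure

def position(model: Model, atom: Atom, pseudo: Dict[Atom, Sequence[Atom]]):
    """Coordinates of a real atom, or the geometric center of a pseudoatom's members."""
    if atom in pseudo:
        members = [model[m] for m in pseudo[atom]]
        return tuple(mean(p[i] for p in members) for i in range(3))
    return model[atom]

def pair_stats(family: List[Model], a: Atom, b: Atom,
               pseudo: Optional[Dict[Atom, Sequence[Atom]]] = None):
    """Average distance and RMS deviation of atoms a and b over the family."""
    pseudo = pseudo or {}
    d = [math.dist(position(m, a, pseudo), position(m, b, pseudo)) for m in family]
    avg = mean(d)
    rmsd = math.sqrt(mean((x - avg) ** 2 for x in d))
    return avg, rmsd
```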
aro_filt filters disambig output and prints only the lines in which the potential assignment in D1 is an aromatic proton. It is intended for use with D₂O NOESY data. It is run as follows:
aro_filt inputfile [>! output file]
Where inputfile is the output from disambig.
written by: Melanie Nelson
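The following Python sketch shows what an aro_filt-style filter does. The column layout is an assumption: residue name, residue number, and atom name for D1 at the start of each whitespace-separated line. It will not match real disambig output without adjustment. The aromatic atom lists use IUPAC names for carbon-bound ring protons, plus the QD/QE pseudoatoms of Phe and Tyr.

```python
from typing import List

# IUPAC names of carbon-bound aromatic ring protons (plus Phe/Tyr ring pseudoatoms).
AROMATIC_H = {
    "PHE": {"HD1", "HD2", "HE1", "HE2", "HZ", "QD", "QE"},
    "TYR": {"HD1", "HD2", "HE1", "HE2", "QD", "QE"},
    "TRP": {"HD1", "HE3", "HZ2", "HZ3", "HH2"},
    "HIS": {"HD2", "HE1"},
}

def is_aromatic(resname: str, atom: str) -> bool:
    return atom in AROMATIC_H.get(resname.upper(), set())

def aro_filter(lines: List[str], res_col: int = 0, atom_col: int = 2) -> List[str]:
    """Keep lines whose D1 candidate is an aromatic proton.

    Hypothetical layout: the D1 residue name is in column res_col and the
    D1 atom name in column atom_col (whitespace-separated).
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) > max(res_col, atom_col) and is_aromatic(fields[res_col], fields[atom_col]):
            kept.append(line)
    return kept
```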
doambig performs a similar function to disambig. Its output can be sent to ambigrab to simplify the process of manually finding new assignments. doambig screens through the unassigned peaks from one NOESY spectrum, based on the spectrum-specific assignments from that NOESY spectrum. doambig is a wrapper script that runs or requires the following scripts:
- ambig2ncol (three versions: 2D, 3D, and 3D with chemical shift filtering)
  Chemical shift filtering removes from the file the peaks that fall within a user-defined chemical shift range. It is needed when many unassigned resonances in this range prevent unambiguous assignment of such peaks.
  Note: you need to edit the doambig script to specify which of these three versions of ambig2ncol you wish to run. To test ambig2ncol before running doambig, use the following syntax:
  ambig2ncol(.3d) -f [felix|genxpk] -p [peak file] -s [stereo file] > outputfile
- noevio
- all of the scripts and files in ~username/bin/ambi/ (check with anyone who has done structure calculations)
- dofilter: This script can be commented out of the doambig script when doing manual distance filtering (early in your structure calculations). dofilter performs automated distance filtering. It uses ambipick and ambidis, which contain the filters that you will need to edit to do automated distance filtering.
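The chemical shift filtering done by the third version of ambig2ncol amounts to dropping peaks inside a shift window. Here is a minimal Python sketch. It assumes each peak is a tuple of shifts in ppm and that only one dimension is tested; the real script may check a different dimension or several.

```python
from typing import List, Sequence

def shift_window_filter(peaks: List[Sequence[float]], lo: float, hi: float,
                        dim: int = 0) -> List[Sequence[float]]:
    """Drop peaks whose shift in dimension `dim` lies within [lo, hi] ppm."""
    return [p for p in peaks if not (lo <= p[dim] <= hi)]
```

For example, a crowded region with many unassigned resonances, say 0.8-1.5 ppm in the first dimension (a hypothetical choice), would be removed with shift_window_filter(peaks, 0.8, 1.5).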
INPUT files:
- noesyname_asgresults (called the "peak file", this is the GENXPK output file ASG_RESULTS renamed to noesyname_asgresults)
- noesyname.ssa (stereospecific assignments file, can be empty)
- family.all or family.fam (lists .pdb files)
OUTPUT:
- 7col.noesyname_asgresults
- 7col.noesyname_asgresults.noevio (this file can be fed into ambigrab)
- 7col.noesyname_asgresults.noevio.filter
Edit doambig before running:
- to point to the correct noesyname_asgresults file
- to point to either family.all or family.fam file
- to use the correct version of ambig2ncol (see the three versions above)
Run doambig in your amber/rst/doambig/ directory.
SYNTAX: doambig YYMMDD [monomer|dimer]
ambigrab sorts the output from doambig and places asterisks next to the most probable assignments.
(OPTIONAL: Before running ambigrab, you can run ambigroup. ambigroup outputs a list in which each line is
[ambiguity, or # of assignment possibilities]: [number of peaks which display this ambiguity].
SYNTAX: ambigroup 7col.noesyname_asgresults.noevio > outputfile)
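What ambigroup reports is a histogram of ambiguity levels. The Python sketch below assumes the peaks have already been parsed into a mapping from peak identifier to its list of possible assignments. Parsing the 7col file is not shown, and the identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def ambiguity_histogram(peaks: Dict[str, List[str]]) -> List[Tuple[int, int]]:
    """Return (number of assignment possibilities, number of peaks) pairs, sorted."""
    counts = Counter(len(candidates) for candidates in peaks.values())
    return sorted(counts.items())

def print_histogram(peaks: Dict[str, List[str]]) -> None:
    # Same "ambiguity: count" layout as described for ambigroup.
    for ambiguity, n_peaks in ambiguity_histogram(peaks):
        print(f"{ambiguity}: {n_peaks}")
```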

To run ambigrab,
INPUT files:
- 7col.noesyname_asgresults.noevio
OUTPUT:
- filename designated in execute command
Run ambigrab in your amber/rst/doambig/YYMMDD/ directory, where the doambig output is.
SYNTAX: ambigrab -b ["bins"] -n [number of peaks] -p [peak file] > outputfile
(for example, ambigrab -b "3 6 10" -n 8 -p 7col.cnoe_asgresults.noevio > 7col.cnoe_asgresults.noevio.8
In this example, ambigrab places *** next to assignments with an avg. distance less than 3 angstroms, ** next to those which are 3 to 6 angstroms, and * next to those which are 6 to 10 angstroms.)
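The star labels in this example can be reproduced with a small Python function. This sketches the binning as described, not ambigrab's code. It assumes that a distance exactly on a bin edge belongs to the upper bin, so 3.0 gets ** and 6.0 gets *.

```python
from bisect import bisect_right
from typing import Sequence

def stars(avg_distance: float, bins: Sequence[float] = (3.0, 6.0, 10.0)) -> str:
    """One star per bin edge that avg_distance falls below; '' beyond the last edge."""
    edges_passed = bisect_right(bins, avg_distance)  # number of edges <= avg_distance
    return "*" * (len(bins) - edges_passed)
```

The -b "3 6 10" string from the command line corresponds to bins = tuple(float(x) for x in "3 6 10".split()).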

Note: To print out ambigrab output in a useful hardcopy format, use a2ps first:
SYNTAX: a2ps -q -nL -1 -F4.5 -H 7col.cnoe_asgresults.noevio.8 > 7col.cnoe_asgresults.noevio.8.ps
Then print the resulting postscript file using lp or lpr as you normally would.

Distance filtering using xpkasgn.sgi

xpkasgn.sgi is a compiled program (written in FORTRAN) that originated in the Wright lab. It performs chemical shift filtering followed by distance filtering on Felix output. The main difference between this method and the GENXPK/doambig or disambig method is that all peaks in a spectrum are analyzed, both those already assigned in Felix and those that are unassigned. Note that the .xpk and .assgn files must use nomenclature that matches the .pdb output from AMBER. You can use a script like subs to make the nomenclature changes, but you will need to edit subs to match your current nomenclature!
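A subs-style nomenclature change can be sketched in Python as a single-pass, whole-word substitution. Doing every replacement in one pass means that a swap such as HB2 to HB3 and HB3 to HB2 does not undo itself. The table entries below are hypothetical examples. The sketch also assumes atom names consist of letters and digits only, since the word-boundary match would misbehave on names containing characters such as '.

```python
import re
from typing import Dict

def apply_subs(text: str, table: Dict[str, str]) -> str:
    """Replace whole-token names using `table` (e.g. Felix names -> AMBER names)."""
    if not table:
        return text
    # Longest names first, all alternatives in one regex, so each token is replaced once.
    alternatives = "|".join(map(re.escape, sorted(table, key=len, reverse=True)))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

For example, apply_subs(open("crcn.assgn").read(), {"HN": "H"}) would rename amide protons, if that is one of the differences between your Felix and AMBER nomenclature.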

INPUT files:
- crcn.xpk
- crcn.ref (contains chemical shift referencing info)
- crcn.pat (contains tolerances for chemical shift filter)
- crcn.assgn (contains assignments in 4 column format)
- crcn.pdb (contains a list of .pdb files for distance filter)
- crcn1.pdb, crcn2.pdb, crcn3.pdb, ..., crcnx.pdb (AMBER output family)
- x.com (OPTIONAL)
OUTPUT files:
- assign.lis
- close.dis
- xpk.new
SYNTAX: Type x.com (automated) or xpkasgn.sgi (for interactive input)
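The crcn.pdb list file can be generated rather than typed. This Python sketch assumes the list holds one .pdb file name per line; check that assumption against what xpkasgn.sgi actually expects before relying on it.

```python
from pathlib import Path

def write_family_list(prefix: str, n_structures: int, directory: str = ".") -> Path:
    """Write <prefix>.pdb listing <prefix>1.pdb ... <prefix>N.pdb, one per line (assumed format)."""
    names = [f"{prefix}{i}.pdb" for i in range(1, n_structures + 1)]
    out = Path(directory) / f"{prefix}.pdb"
    out.write_text("\n".join(names) + "\n")
    return out
```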

Ambiguous assignments

You can use the ambigrab output to identify ambiguous restraints. The simplest method for handling these restraints is to enter them manually into the fix_list file, which is read by diana_filt when makerst.new is run to create DIANA input restraints.
Look through the ambigrab output to find two or more possible NOEs that contribute to one NOE cross peak. Check the average distance for each proton pair, and confirm that the average distance minus the RMSD is less than 7 angstroms. If two or more proton pairs pass both the chemical shift filter and this distance filter, manually assign them to the largest bin (5 or 6 angstroms).
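The check described above can be sketched in Python. Each candidate proton pair that has already passed the chemical shift filter is given as (average distance, RMSD) in angstroms. The 7 angstrom limit comes from the text. The default upper bound of 6.0 is simply one of the two bins mentioned, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def ambiguous_restraint(cands: Dict[str, Tuple[float, float]], limit: float = 7.0,
                        upper_bound: float = 6.0) -> Optional[Tuple[List[str], float]]:
    """Return (pairs, upper bound) if two or more pairs pass avg - rmsd < limit, else None."""
    passing = [pair for pair, (avg, rmsd) in cands.items() if avg - rmsd < limit]
    if len(passing) >= 2:
        return passing, upper_bound  # candidates for one ambiguous fix_list entry
    return None
```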

last updated 8/21/99 by Patty Fagan Jones (fagan@scripps.edu)