Formatting AMBER pdb files for submission to the PDB

This document briefly explains, by example, how to format pdb files generated by AMBER during NMR structure refinement for submission to the Protein Data Bank (PDB).

For reference, updated information, and further documentation on completing a submission, visit the official PDB data deposition procedures page at Rutgers.


I have an awk script called amb2brook, which lives in /sb/apps/bin/noarch, that does most of the work for you. It reads a family of pdb files in the format created by ambpdb and writes them all into one file suitable for PDB deposition. It takes care of things like strand IDs, 4-letter atom names, MODEL and ENDMDL records, and adding null occupancy and B-factor fields.

There is also some code in the script that handles HETATM records. The residue name(s) for this function are hardwired in, so to use this feature you'll need to make your own copy of amb2brook and slightly modify that section of the script. This should only require changing a residue name in one or two places.
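
To give a feel for what this kind of merge involves, here is a minimal Python sketch. It is not amb2brook itself: the function name merge_models is hypothetical, and the placeholder values 1.00 (occupancy) and 0.00 (B-factor) are assumptions, so the values the real script writes may differ. The sketch only wraps each structure in MODEL/ENDMDL records and fills columns 55-66 of ATOM and HETATM records. It does not touch strand IDs, atom names, or residue names.

```python
def merge_models(models, occupancy=1.00, bfactor=0.00):
    """Combine per-structure PDB line lists into one multi-model line list.

    `models` is a list of structures, each a list of PDB lines (strings).
    The occupancy and bfactor defaults are assumed placeholder values.
    """
    out = []
    for number, lines in enumerate(models, start=1):
        # MODEL record: serial number right-justified in columns 11-14.
        out.append(f"MODEL     {number:4d}")
        for line in lines:
            line = line.rstrip("\n")
            record = line[:6].strip()
            if record == "END":
                continue  # per-file END cards are replaced by ENDMDL
            if record in ("ATOM", "HETATM"):
                # Keep columns 1-54 (through the z coordinate), then
                # write occupancy (55-60) and B-factor (61-66).
                line = f"{line[:54]:<54}{occupancy:6.2f}{bfactor:6.2f}"
            out.append(line)
        out.append("ENDMDL")
    out.append("END")  # single END card for the whole family
    return out
```

Each element of models would be the lines of one superimposed pdb file, for example open(name).readlines(), listed in the order you want the MODEL records to appear.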

There are a couple of things you need to do in order to prepare your pdb files before processing them with amb2brook.

  1. You need to superimpose the structures on each other. I usually use suppose for that job; it superimposes the structures and writes the results back out to pdb files. The rotated, superimposed files get names like <input_prefix>.rot.pdb. A Linux version of suppose should be in your path at /sb/apps/bin/Linux2/suppose.

    Here is an example all-atom superposition:

    tintin:/home/jsmith/pdb% suppose -mat -rot *.pdb
    
    TIP: Look at the output of the above suppose command and make a note of which structure is closest to the mean. Rename your pdb files so that this one appears first in a directory listing. This will cause the subsequent steps to put this structure first in your pdb deposition, making it easier for people to figure out (and for you to specify) which MODEL record should be used as the representative structure of your NMR family.

  2. Make sure there is a TER card between strands in each of your superimposed pdb files. If there isn't, just add a line containing only TER between the strands in each file.

  3. Add END cards to the bottom of each rotated pdb file. Here is an easy way to accomplish that (csh):

    tintin:/home/jsmith/pdb% foreach f (*.rot.pdb)
    foreach? echo 'END' >> $f
    foreach? end
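
Steps 2 and 3 can be spot-checked before running amb2brook. The following Python sketch is one way to do that. The helper name count_cards is hypothetical, and the sketch assumes the rotated files match *.rot.pdb in the current directory. It only reports how many TER cards each file contains and whether the file ends with an END card; you still need to compare the TER count with the number of strands you expect.

```python
import glob


def count_cards(lines):
    """Return (number of TER cards, True if the last non-blank line is END)."""
    # The record name occupies columns 1-6 of every PDB line.
    records = [line[:6].strip() for line in lines if line.strip()]
    n_ter = records.count("TER")
    ends_with_end = bool(records) and records[-1] == "END"
    return n_ter, ends_with_end


if __name__ == "__main__":
    for name in sorted(glob.glob("*.rot.pdb")):
        with open(name) as handle:
            n_ter, has_end = count_cards(handle)
        status = "present" if has_end else "MISSING"
        print(f"{name}: {n_ter} TER card(s), END card {status}")
```

A file that reports zero TER cards for a multi-strand structure, or a missing END card, needs fixing by hand as described in steps 2 and 3.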
    
Once you have followed these steps, your pdb files should be ready for processing by amb2brook. Here is how to run it:

    tintin:/home/jsmith/pdb% amb2brook *.rot.pdb > myfamily.pdb

In this example, myfamily.pdb should be ready for submission. Look it over, checking the MODEL records against your input pdb files to make sure nothing obvious went wrong.
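
Part of that final check can be automated. The sketch below (Python, with the hypothetical helper name atoms_per_model) counts the ATOM and HETATM records inside each MODEL/ENDMDL pair. It assumes the combined file uses the standard MODEL and ENDMDL record names in columns 1-6. It is a rough sanity check, not a substitute for reading the file or for the PDB's own validation.

```python
def atoms_per_model(lines):
    """Return a list with the ATOM/HETATM count of each MODEL, in file order."""
    counts = []
    in_model = False
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            counts.append(0)  # start a new model
            in_model = True
        elif record == "ENDMDL":
            in_model = False
        elif in_model and record in ("ATOM", "HETATM"):
            counts[-1] += 1
    return counts
```

Calling atoms_per_model(open("myfamily.pdb")) should give one entry per input pdb file, and for an NMR family every entry should be the same number. A different number of entries, or unequal counts, suggests that something went wrong in one of the earlier steps.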