![]() | Chazin Home | Ca-binding Protein DB | Vanderbilt Home | ![]() |
Research Description | Publications | Wisdom | Search | ||
How to contribute | About this page |
written by: Melanie Nelson, 8/15/96
This is a general method to generate contact maps, and also more specific information about contact between residues in a structure. CHARMM determines the center of geometry for each residue and then can give you a list of residues which are in contact. You define what you mean by "contact" by setting a cutoff value-- any residues closer than this cutoff are considered to be in contact. You can also set this cutoff to be really high and use this method to generate a list of the distances between the centers of geometry of all residues. I usually use this method, and then filter the output later-- either with an awk script or with Charlie Brooks' map generating program. CHARMM can generate a distance list for all members of an NMR family. Charlie's map generating program can then be used to average these lists to generate an average contact map. I also have a modified version of the map generating program which outputs a list of the average distances in addition to the postscript plot. In the future this might be further modified to output the standard deviation as well, but the program is in FORTRAN, so don't hold your breath.
Use CHARMM for this purpose. You will need your pdb file(s) in absolutely correct CHARMM format. Since CHARMM is written in FORTRAN it tolerates no deviations from this format. You will also need a .seq (sequence) file, a topology file, a parameter file, and an input file. I will describe each of these in turn.
A. The PDB File(s): CHARMM is written in FORTRAN, and is therefore rather particular about the format of its input files. Therefore it is ABSOLUTELY ESSENTIAL that all the spacing, etc in your pdb file be correct. In addition, the final lines must be the TER and the END line, or else CHARMM will ignore the last residue. I have a sample file called sample.pdb which is absolutely correct. (or at least correct enough to work!) CHARMM uses the old pdb format. Therefore, several minor adjustments are required in order to use most PDB files. I have a script which makes this conversion. It is called std_to_charmm, and is located in my bin (~mnelson/bin/pdb_tools). It is a shell script containing two gawk scripts. It absolutely requires gawk (or a relatively new version of nawk), and will not run correctly with awk. The command line is: "std_to_charmm inputfile > outputfile" . Several other programs also require this older format, or some variation thereof, so please feel free to take this script and change it to fit your particular needs. This script does not deal with histidines or tryptophans. There are three types of histidine according to the charmm parameters file I use-- HSD (neutral histidine with a proton on ND1), HSE (neutral histidine with a proton on NE2, and HSP (protonated histidine with protons on both ND1 and NE2). For NMR structues you will have to determine which type of histidines you have, and change the residue names in your pdb file. CHARMM will not recognize residues named "HIS." For crystal structures, you can just pick one type. For the purposes of contact maps it should't matter which type you use.
As far as I can tell, CHARMM requires that your pdb file start with residue #1 (which also has to be the first residue in your .seq file). This means that you will have to renumber residues if you have a pdb file for a crystal structure which is missing the first few residues. I am sure that there is a way around this, but I have not found it yet.
For large NMR families you may end up with more than 9999 lines in your pdb file. In this case, the line numbers (in column 12 in the pdb file) start to replace the last letter of the PDB code (in column 11). This is a problem, because CHARMM requires that each line of the PDB file have the same PDB code. I fixed this with a sed script which replaced all occurances of 1CF1 (part pdb code, part line number) with 1CFC (the correct pdb code for my structures). CHARMM does not seem to mind the fact that the line numbers are changed by this fix. This is not a general fix- the sed script will have to be modified for each case. However, if you want this script you are welcome to it. It is called fix_code.sed, and is in ~mnelson/bin. If you make this fix before you split up your family into individual files, this fix can easily be performmed in vi.
As I mentioned above, NMR families must be split into separate files. I have a script which does this. It assumes that the individual structures are separated by a line containing the word "MODEL" (as is the case for files downloaded from the PDB). The script is called splitpdb, and is located in ~mnelson/bin/pdb_tools. It is a shell script which calls gawk. Again, it absolutely requires either gawk or a new version of nawk. As written, this script calls the new PDB files model_001.pdb, model_002.pdb, etc. To change this, you must change the script. At the end of the script there is a function which opens each successive file. Change the line setting the filename. Put whatever filename base you want in the sprintf statement.
If you are using a crystal strucutre, there will be no hydrogens in the
pdb file. You can add the hydrogens with CHARMM itself, or you can do it
in Insight/Discover first. Adding hydrogens with CHARMM is very simple.
The basic command is HBUILD. You can then minimize the hydrogen positions
(while holding heavy atoms fixed). However, in the test case I ran, the
RMSD between the coordinates before and after minimization was only 0.086
angstroms, so it is probably not necessary to run this minimization. The
file camn_ca.inp is an example input file which adds hydrogens and performs a minimization.
B. The .seq file: This is just a file with the number of residues followed by a list of the residues. The residues in the list must be in three letter code, capital letters, one residues per line. I also have a script for generating a .seq file from a PDB file. It is a shell script that uses awk to first count the number of residues in the pdb file and then generate the .seq file. It is called make_seq, and is in ~mnelson/bin/pdb_tools.
C. The topology file: This is a standard file which contains topology information about the amino acids. It is this file and the parameter file which set the atom nomenclature (among other things!). If someone ever gets around to creating a topology file with the new pdb nomenclature, the conversion described above would not be necessary. Don't hold your breath. A bunch of topology files can be found in /cb/monet2/charmm/c25a1/toppar. I have nothing to do with these files, and don't actually know all that much about them. The two I have used are top_all22_prot.inp, which contains information for residues using ALL hydrogens, and toph19.inp, which only has information about polar hydrogens. For the purposes of contact map making, I use top_all22_prot.inp. D. The parameter file This is another standard input file. This one contains information about atom partial charges and other stuff like that. Parameter files can also be found in /cb/monet2/charmm/c25a1/toppar. I have used par_all22_prot.inp. which contains information about all hydrogens, and param19.inp, which only contains information about polar hydrogens. For contact map making, I use par_all22_prot.inp.
D. The input file: This is the script which runs CHARMM. CHARMM has its own scripting language, which is apparently a lot like FORTRAN. A full description of this language is beyond my knowledge. Perhaps the easiest way to explain how to write a file to generate contact maps is to give an annotated example of an input file. An example input file is camn_apo.inp, which generates distance maps for a family of 25 structures of calmodulin N terminal domain.
Now you can actually run CHARMM. It runs on SGIs and on the HPs. Check out my .cshrc for the paths to the executables and that sort of thing. I copied this stuff from Charlie Brooks' .cshrc. Here is a summary of the changes: put these lines anywhere (or resign yourself to typing to long pathnames):
setenv toppar /cb/monet2/charmm/c25a1/topparput this alias under your "if ( $OS == IRIX )" stuff:
alias charmm25 /cb/monet2/charmm/c25a1/exec/sgi/charmm.large.mips2
put this alias under your "if ( $ARCH == hppa )" stuff:
alias charmm25 /cb/monet2/charmm/c25a1/exec/hpux/charmm.large.hpux
Note that these programs may have been moved. I will try to keep this page up to date. However, if you get a command not found error, check my .cshrc for the latest location (at least the latest location of which I am aware!)
I usually run on one of the HPs. I particularly recommend running on an HP if you are doing a minimization. To run, type: "charmm25" output Call the output file what ever you want. This will contain error messages and other general information about the run, but NOT the distance information. You name the files for the distance information in the input file. It is a good idea to get in the habit of looking at the error output, though-- CHARMM will run even if it can't find some of the atoms it needs, but it will print an error message to standard output. Check the output of CHARMM for any warnings indicating that it could not find coordinates for a particular atom or that no parameters could be found for a particular atom. This usually indicates a problem with your pdbfile and may create problems in the contact map.
I use a variation of a program written by Charlie Brooks. Both his original program and my variation are in my bin (~mnelson/bin). The only difference is that my variation will output a file called "ave_list," which contains the average distances used in generating the average contact map for a family of structures. The original program is called cntct_map-2-ps2. My variation is called cmap_to_ps. These programs are in FORTRAN. The compiled versions in my bin run on bianca and asterix, and probably other SGIs as well. If you want access to the source code for any reason, you will need to talk to Charlie Brooks. The program is interactive. For just one map, you can run it this way. If you are making an average of lots of maps, this is annoying, so I use an input script. I have an annotated example script called cmap_to_ps.inp (which also explains what all the questions the program asks mean). The output from this program is a postscript file. It is in color, but can be printed in black and white (on tintin, for instance).
I personally find a list of interresidue distances and/or a list of contacting residues and the distance between them more informative than a simple two-dimensional plot. I generate these lists from an "extra" file created by my version of the contact map generating program, called cmap_to_ps. When this program is generating the average contact map, it creates a file called "ave_list." This file is a list of the average distances between residues. NOTE: cmap_to_ps will not run if there is already a file by this name in the directory. I recommend immediately renaming the file after it is created. This file can then be filtered to create a list of residues deemed to be in contact and the distance between them. I have a simple awk script which will do this, called cntct_filt.awk.. It is in my bin (~mnelson/bin/cmap_stuff/cntct_filt.awk). The script uses a cutoff of 8 angstroms to define whether or not two residues are in contact.