Making Modelfree work.

	Chazin Home \| Ca-binding Protein DB \| Vanderbilt Home
	Research Description \| Publications \| Wisdom \| Search
	How to contribute \| About this page

Modelfree 4.0: Making things work.

Art Palmer's Modelfree is a very efficient and powerful program for the modelfree analysis of relaxation data -- if you can get it to run. For the most part, you can follow the manual and expect things to work correctly, although certain things will cause problems, usually resulting from problems in the input files, which are created in our lab through a set of scripts to assist the analysis of FELIX data written by Mikael Akke.

The README from that package is below, with some modifications and clarifications I figured out the hard way.

The following is a flow-chart type description of the analysis procedure; additional information on a certain script is generally available in the header of that script.

The evaluation of peak intensities are made within Felix. Felix macros (.mac) and schemas (.sch) are provided for this purpose.

1. Create an xpk:peaks entity in your dba. Back-up this data base (just to be on the safe side).

2. Copy the schema xpkhgtv.sch to your schema directory.

3. Run the macro xpk_hgt_vol.mac. This will create a new entity called peakmax in your dba. This macro has one small problem with it. It tries to delete vol:volumes and xpk:peakmax, which is fine, although the first time you run it there won't be those two tables and the macro will crash. Copy the macro, rename it, and run it again, with both deletion commands commented out. peakmax looks much like xpk:peaks, but contains the following additional items: (i) the intensity maximum found within the xpk boundaries and (ii) the volume of the xpk. in addition, the coords for the xpk center will be replaced by the coords for the intensity maximum. The macro will write out the peakmax entity. You may want to change the name of the output file, so that it fits nicely with subsequent analysis scripts (read on).

If you have a number of matrices to measure peak intensities in (as you most likely will have), then the macro doall.mac can be useful, at least as a starter for further modifications. Felix2.3 allows batch processing, making it easy to process-evalutate-and-delete the matrices, and thereby saving space on the disk. The script doall does just this.

NOTE: For all data sets, e.g. T1, T2 and NOE, there may be a concern that for weak peaks, especially at later time points of the relaxation data, the xpk_hgt_vol.mac will find the peak maximum at a location that doesn't correspond to that where the stronger peaks have been found. To check if this presents a problem run the script: checkmaxpt (more below).

If this indeed does present a problem, then a possible solution for the T1 and T2 data sets is to first run xpk_hgt_vol.mac on the spectrum with the highest intensities, and then xpk_hgtvfix.mac on the low intensity spectrum using the database that was modified by xpk_hgt_vol.mac.Note that xpk_hgtvfix.mac can have the same problems as xpk_hgt_vol.mac above; if it crashes, comment out the appropriate deletion command and re-run the macro.

After this stage, you will have one Felix output-file for each matrix (e.g. int.1, int.2, etc), containing the peak intensity information. Assuming that you have recorded duplicate spectra for some of the relaxation delays, the next step will be to use these for the evaluation of uncertainty in peak intensities in the spectra.

1. Run the snratio_felix script for each pair of duplicates:

snratio_felix int.1 int.2 > output

2. For T1 and T2 data: Create a masterfile (use your favorite text editor, e.g. `vi'), that contains a list of the identifier for the int file (e.g. for file int.2, the identifier is 2), the corresponding relaxation delay in seconds and the uncertainty in peak height (as obtained from snratio_felix). Interpolate points in order to obtain estimates of the uncertainties for the relaxation time points where no duplicate were taken.

A example masterfile lies below.

1 0.016 4561.33
2 2.608 3313.22
4 1.224 3012.10
5 0.216 3411.75
6 0.800 2919.85
7 0.360 3290.46
8 0.544 3135.48
9 0.216 3411.75
10 0.800 2919.85
11 0.016 4561.33
12 2.608 3313.22

1	0.016	4561.33
2	2.608	3313.22
4	1.224	3012.10
5	0.216	3411.75
6	0.800	2919.85
7	0.360	3290.46
8	0.544	3135.48
9	0.216	3411.75
10	0.800	2919.85
11	0.016	4561.33
12	2.608	3313.22

NOTE: Several of the scripts get their input by doing an `ls' and looking for all files present that conform (in some way) to the file name(s) provided on the command line. The draw-back with this is that you may need to think twice about how you name your files (and make sure that you keep rather clean directories), but the great benefit is that you never need to give the number of files, nor give a list of all of the files, as input.

3. For T1 and T2 data:
Run the felix2relax script:

felix2relax filename masterfile output_extension

filename is the core filename (i.e. int, in our example). masterfile is (you guessed it) the name of the masterfile. output_extension is the extension of the output-file names. One output file will be created for each peakmax entity (i.e. each record in the int.* files). The output-file name will be given by the xpk name (e.g. s74) and output_extension (e.g. t1) [resulting in s74.t1]. felix2relax will take care of ambiguous xpk names; currently we have indicated ambiguity by incorporating a "/" or "?" in the xpk name in the Felix dba (e.g. a12/g54, t24?, s45/, v34/k73?, e33/y99/w90, or any such combination). You can easily modify this to fit your own taste.

3 1/2. For T1 and T2 data:
Run the checkmaxpt script:

checkmaxpt file_name cut_off_w2 cut_off_w1 new_extension outfile

filename is the core filename (i.e. int, in our example). cut_off_w2 is the maximum number of points that you allow the location of the peak maximum to differ in w2 between different spectra in the series. cut_off_w1: ditto for w1. new_extension is for output files corresponding to each individual residue. outfile is file containing all of the individual files concatenated. checkmaxpt creates a file called outfile.summary that contains only those instances where the peak maximum location for a certain xpk differs more than cut_off points from the average coordinates.

3 1/2+. For T1 and T2 data:
Run the merge_input script:

merge_input input_extension1 input_extension2 checkmaxpt_file \ masterfile output_extension

input_extension1 is the extension filename for the relaxation series, where the intensity is evaluated by searching for the maximum point. input_extension2 is the extension filename for the relaxation series, where the intensity is evaluated at fixed coordinates. checkmaxpt_file is the summary output file from checkmaxpt (possibly edited by you, after you have looked into the origin of the deviations). masterfile is the same as the one described above. output_extension is the extension of the resulting output files (e.g. t1merge).

4. For T1 and T2 data:
Run the run.r1 (run.r2) script:

run.r1 input_extension output_extension output_file

input_extension is the same as output_extension from the previous step (i.e. t1, following the example of 3 above). output_extension is the extension of the output-file names for the individual xpk files (e.g. rifit). output_file is a file that contains the contents of all of the indivual files. run.r1 will drive the invrecr1 program that fits your t1 inversion recovery data (your *.t1 files), create one output file of fitted relaxation parameters for each residue (*.r1fit) and one output file containing the contents of all of the individual output files.

5. For T1 data:
Run the r1X2comp script:

r1X2comp input_file > output

input_file is the same as the output_file from run.r1 (step 4). r1X2comp will check which xpks have been successfully fitted to the inversion recovery equation, and which have not.

6. For T2 data:
Run the r2bestfit script:

r2bestfit input_file > output_file

input_file is the same as the output_file from run.r2 (step 4). r2bestfit will check which xpks have been successfully fitted to the cpmg decay equation, and how many parameters are needed for the fit.

7. For T1 data:
Run the r1plotextract script to get a file that can be fed to your favourite plotting program (e.g. kaleidagraph, templegraph):

r1plotextract input_extension output_file

input_extension is the same as output_extension from the previous step (i.e. r1fit, following the example of 4 above).

8. For T2 data:
Run the r2plotextract script to get a file that can be fed to your favourite plotting program (e.g. kaleidagraph, templegraph):

r2plotextract r2bestfit_output output_file

r2bestfit_output is the output_file from r2bestfit (step 6).

9. For NOE data:
Run the noecalc_felix script:

noecalc_felix noe_int no_noe_int >output

noe_int and no_noe_int are int files (Felix output). (Preferrably from a pair of spectra that have been acquired interleaved). output is a file with the xpk name and the noe, i.e. the ratio of the peak intensity in noe_int and no_noe_int [i.e. intensity(noe)/intensity(no_noe)] as calculated from both the peak heights and volumes.

10. For NOE data:
Run the noeavesd script:

noeavesd noe_file_1 noe_file_2 ... > output

noe_file_1 is the output from noecalc_felix for one pair of spectra. noe_file_2 is the output from noecalc_felix for another pair of spectra. (noe_file_3 is the output from noecalc_felix for a third pair of spectra.) noeavesd calculates the average noe and the standard deviation from the duplicate (triplicate?...) noe_files. The output can be directly imported into plotting programs.

11. Now you have all the relaxation data in neat little files. Time to create the input files for mfgrid. Run the makemfgin script:

makemfgin r1_file r2_file noe_file output_file (=mfinput)

You will need to edit the output file in order to set some flags that the optimization program uses.

12. After having run mfgrid and modelfree, you will probably want to extract the model-free parameters, and put them into a plotting- program-friendly format. Run the s2plotextract script:

s2plotextract modelfree_output_file > output

There are several different versions of s2plotextract that outputs different statistical data (*.95, *.gearyz) or the entire sequence of residue numbers (*.seq).

For further notes and help, please first take a look in the header of the script in question, or look at the man pages of the model-free programs.

Last edited on October 14th, 1998 by John Blankenship