relax_scripts. Copyright 1994, Mikael Akke.
Feb 26, 1994; Aug 20, 94; Nov 16, 94
Obtain by anonymous ftp from cuagpa.bioc.columbia.edu (128.59.96.21)
/users/ftp/pub/model_free/relax_scripts.tar.Z
**************** INTRO *******************************************************

This README file describes the use of a set of scripts that attempts to
stream-line the analysis of relaxation data using Dr. Arthur G. Palmer's 
suite of programs (including invrecr1, cpmgr2, tmest, mfgrid, modelfree). 

Scripts similar to these were originally created and compiled over several 
years, starting in 1991 at The Scripps Research Institute. 
The principal authors are Drs. Martin J. Stone, Johan Ko"rdel and Mikael Akke.

The current set of scripts (Copyright 1994, Mikael Akke) have been rewritten
and incorporate several modifications in order to "interface" the analysis 
with output from the Felix program (Biosym Technologies, Inc.). 
The scripts may be useful to you even if you don't use Felix, but you will
most likely have to modify some of the scripts that are used at the earlier
stages of the analysis process, e.g. `felix2relax'.
In several aspects this is a stripped-down package that is intended to 
provide a main engine of analysis scripts; the original set of scripts 
(from Scripps) is larger, and contain many "little" scripts to perform 
virtually any analysis you'll ever need on your data.

Copyright blurb:
I ask that you do not re-distribute this set of scripts without my approval,
and that (if and when you do so) you then include this README file together
with all of the _original_ scripts. Of course, you are free to make any 
changes you want for your own purposes. I would appreciate hearing your
comments and learning about your improvements to the current set of scripts.
The sole reason I want to keep the files away from the public domain is that
it will help me answer any questions that any future users may have regarding 
these scripts.
relax_scripts are distributed free of charge (of course) to anyone who
wants them. i encourage you to email me if you use these scripts, as i
keep a list of email addresses in order to let users know of bugs, bug fixes, 
new versions etc.

The "package" should contain the following files:

README.relax_scripts (this file)

xpk_hgt_vol.mac (Felix macro)
xpk_hgtvfix.mac (Felix macro)
xpkhgtv.sch     (Felix schema)

snratio_felix	(awk)
felix2relax 	(csh, awk)
checkmaxpt	(csh, awk)
checkmaxpt.awk	(awk)
merge_input     (csh)
merge_input.awk (awk)
run.r1	 	(csh, awk)
run.r2 		(csh, awk)
noecalc_felix	(awk)
r1plotextract	(csh, awk)
r2plotextract	(csh, awk)
r1X2comp	(awk)
r2bestfit	(awk)
t1t2ratio	(awk)
noeavesd	(csh)
noeavesd.awk	(awk)
makemfgin	(csh)
makemfgin.awk	(awk)
s2plotextract   (awk)

doall_stuff/README.doall
doall_stuff/doall			(csh, awk)
doall_stuff/doall.mac 			(Felix macro)
doall_stuff/init.mac.batch.mac		(Felix macro)
doall_stuff/t1w2batch.mac		(Felix macro)
doall_stuff/t1w1batch.mac		(Felix macro)
doall_stuff/xpk_hgt_vol_batch.mac	(Felix macro)


**************** HOW TO USE THESE SCRIPTS ************************************

The following is a flow-chart type description of the analysis procedure:
Additional information on a certain script is generally available
in the header of that script.


The evaluation of peak intensities are made within Felix. Felix macros (.mac) 
and schemas (.sch) are provided for this purpose.

1. Create an xpk:peaks entity in your dba. Back-up this data base (just to
   be on the safe side).

2. Copy the schema xpkhgtv.sch to your schema directory.

3. Run the macro xpk_hgt_vol.mac. This will create a new entity called 
   peakmax in your dba. peakmax looks much like xpk:peaks, but contains
   the following additional items: (i) the intensity maximum found within the 
   xpk boundaries and (ii) the volume of the xpk. in addition, the coords for 
   the xpk center will be replaced by the coords for the intensity maximum.
   The macro will write out the peakmax entity. You may want to change
   the name of the output file, so that it fits nicely with subsequent
   analysis scripts (read on).

   If you have a number of matrices to measure peak intensities in (as you
   most likely will have), then the macro doall.mac can be useful, at least
   as a starter for further modifications. Felix2.3 allows batch processing,
   making it easy to process-evalutate-and-delete the matrices, and thereby
   saving space on the disk. The script doall does just this.

 
NOTE: For all data sets, e.g. T1, T2 and NOE, there may be a concern that
      for weak peaks, especially at later time points of the relaxation
      data, the xpk_hgt_vol.mac will find the peak maximum at a location 
      that doesn't correspond to that where the stronger peaks have been
      found. To check if this presents a problem run the script:
      checkmaxpt (more below). 
      If this indeed does present a problem, then a possible solution for 
      the T1 and T2 data sets is to first run xpk_hgt_vol.mac on the 
      spectrum with the highest intensities, and then xpk_hgtvfix.mac 
      on the low intensity spectrum using the database that was modified 
      by xpk_hgt_vol.mac.


After this stage, you will have one Felix output-file for each matrix
(e.g. int.1, int.2, etc), containing the peak intensity information.
Assuming that you have recorded duplicate spectra for some of the relaxation 
delays, the next step will be to use these for the evaluation of uncertainty
in peak intensities in the spectra.


1. Run the snratio_felix script for each pair of duplicates:

	snratio_felix int.1 int.2 > output


2. For T1 and T2 data: 
   Create a masterfile (use your favorite text editor, e.g. `vi'),
   that contains a list of the identifier for the int file (e.g. for file
   int.2, the identifier is 2), the corresponding relaxation delay in seconds
   and the uncertainty in peak height (as obtained from snratio_felix).
   Interpolate points in order to obtain estimates of the uncertainties for 
   the relaxation time points where no duplicate were taken.


NOTE: Several of the scripts get their input by doing an `ls' and looking
      for all files present that conform (in some way) to the file name(s) 
      provided on the command line. 
      The draw-back with this is that you may need to think twice about
      how you name your files (and make sure that you keep rather clean
      directories), but the great benefit is that you never need to give 
      the number of files, nor give a list of all of the files, as input.


3. For T1 and T2 data: 
   Run the felix2relax script:

	felix2relax filename masterfile output_extension

   filename is the core filename (i.e. int, in our example).
   masterfile is (you guessed it) the name of the masterfile.
   output_extension is the extension of the output-file names.
   One output file will be created for each peakmax entity (i.e. each record
   in the int.* files). The output-file name will be given by the xpk name 
   (e.g. s74) and output_extension (e.g. t1) [resulting in s74.t1]. 
   felix2relax will take care of ambiguous xpk names; currently we have 
   indicated ambiguity by incorporating a "/" or "?" in the xpk name in the
   Felix dba (e.g. a12/g54, t24?, s45/, v34/k73?, e33/y99/w90, or any such
   combination). You can easily modify this to fit your own taste. 

3 1/2. For T1 and T2 data:
   Run the checkmaxpt script:

	checkmaxpt file_name cut_off_w2 cut_off_w1 new_extension outfile

   filename is the core filename (i.e. int, in our example).
   cut_off_w2 is the maximum number of points that you allow the location
   of the peak maximum to differ in w2 between different spectra in 
   the series.
   cut_off_w1: ditto for w1.
   new_extension is for output files corresponding to each individual 
   residue.
   outfile is file containing all of the individual files concatenated.
   checkmaxpt creates a file called outfile.summary that contains only
   those instances where the peak maximum location for a certain xpk
   differs more than cut_off points from the average coordinates.

3 1/2+. For T1 and T2 data:
   Run the merge_input script:

	merge_input input_extension1 input_extension2 checkmaxpt_file \
                    masterfile output_extension

   input_extension1 is the extension filename for the relaxation series,
   where the intensity is evaluated by searching for the maximum point.
   input_extension2 is the extension filename for the relaxation series,
   where the intensity is evaluated at fixed coordinates.
   checkmaxpt_file is the summary output file from checkmaxpt (possibly
   edited by you, after you have looked into the origin of the deviations).
   masterfile is the same as the one described above.
   output_extension is the extension of the resulting output files (e.g. 
   t1merge).

4. For T1 and T2 data: 
   Run the run.r1 (run.r2) script:

	run.r1 input_extension output_extension output_file

   input_extension is the same as output_extension from the previous step 
   (i.e. t1, following the example of 3 above).
   output_extension is the extension of the output-file names for the 
   individual xpk files (e.g. rifit). 
   output_file is a file that contains the contents of all of the indivual 
   files.
   run.r1 will drive the invrecr1 program that fits your t1 inversion recovery 
   data (your *.t1 files), create one output file of fitted relaxation
   parameters for each residue (*.r1fit) and one output file containing the 
   contents of all of the individual output files.


5. For T1 data:
   Run the r1X2comp script:

	r1X2comp input_file > output

   input_file is the same as the output_file from run.r1 (step 4).
   r1X2comp will check which xpks have been successfully fitted to
   the inversion recovery equation, and which have not.


6. For T2 data:
   Run the r2bestfit script:

	r2bestfit input_file  > output_file

   input_file is the same as the output_file from run.r2 (step 4).
   r2bestfit will check which xpks have been successfully fitted to
   the cpmg decay equation, and how many parameters are needed for the fit.


7. For T1 data:
   Run the r1plotextract script to get a file that can be fed to 
   your favourite plotting program (e.g. kaleidagraph, templegraph):

	r1plotextract input_extension output_file

   input_extension is the same as output_extension from the previous step 
   (i.e. r1fit, following the example of 4 above).

   
8. For T2 data:
   Run the r2plotextract script to get a file that can be fed to 
   your favourite plotting program (e.g. kaleidagraph, templegraph):

	r2plotextract r2bestfit_output output_file

   r2bestfit_output is the output_file from r2bestfit (step 6).

   
9. For NOE data:
   Run the noecalc_felix script:

	noecalc_felix noe_int no_noe_int >output

   noe_int and no_noe_int are int files (Felix output). (Preferrably 
   from a pair of spectra that have been acquired interleaved).
   output is a file with the xpk name and the noe, i.e. the ratio of the peak 
   intensity in noe_int and no_noe_int [i.e. intensity(noe)/intensity(no_noe)] 
   as calculated from both the peak heights and volumes.


10. For NOE data:
    Run the noeavesd script:

	noeavesd noe_file_1 noe_file_2 ... > output

    noe_file_1 is the output from noecalc_felix for one pair of spectra.
    noe_file_2 is the output from noecalc_felix for another pair of spectra.
    (noe_file_3 is the output from noecalc_felix for a third pair of spectra.)
    noeavesd calculates the average noe and the standard deviation from the
    duplicate (triplicate?...) noe_files.
    The output can be directly imported into plotting programs.


11. Now you have all the relaxation data in neat little files.
    Time to create the input files for mfgrid. Run the makemfgin script:

	makemfgin r1_file r2_file noe_file output_file (=mfinput)  

    You will need to edit the output file in order to set some flags
    that the optimization program uses.

12. After having run mfgrid and modelfree, you will probably want to
    extract the model-free parameters, and put them into a plotting-
    program-friendly format. Run the s2plotextract script:

	s2plotextract modelfree_output_file > output

    There are several different versions of s2plotextract that outputs
    different statistical data (*.95, *.gearyz) or the entire sequence
    of residue numbers (*.seq). 


Note that makemfgin is even less smart than the rest of the scripts. 
It is my intention that makemfgin (sometime in the future) will take any 
number of input files (e.g. if you have relaxation data from more than one 
field), and create the appropriate output.

That's about it...

For further notes and help, please first take a look in the header of the 
script in question, or look at the man pages of the model-free programs.