relax_scripts. Copyright 1994, Mikael Akke.
Feb 26, 1994; Aug 20, 94; Nov 16, 94

Obtain by anonymous ftp from
cuagpa.bioc.columbia.edu (128.59.96.21)
/users/ftp/pub/model_free/relax_scripts.tar.Z

**************** INTRO *******************************************************

This README file describes the use of a set of scripts that attempts to
streamline the analysis of relaxation data using Dr. Arthur G. Palmer's
suite of programs (including invrecr1, cpmgr2, tmest, mfgrid, modelfree).

Scripts similar to these were originally created and compiled over several
years, starting in 1991 at The Scripps Research Institute. The principal
authors are Drs. Martin J. Stone, Johan Kördel and Mikael Akke.

The current set of scripts (Copyright 1994, Mikael Akke) has been rewritten
and incorporates several modifications in order to "interface" the analysis
with output from the Felix program (Biosym Technologies, Inc.). The scripts
may be useful to you even if you don't use Felix, but you will most likely
have to modify some of the scripts that are used at the earlier stages of
the analysis process, e.g. `felix2relax'.

In several respects this is a stripped-down package that is intended to
provide a main engine of analysis scripts; the original set of scripts
(from Scripps) is larger, and contains many "little" scripts to perform
virtually any analysis you'll ever need on your data.

Copyright blurb:
I ask that you do not re-distribute this set of scripts without my approval,
and that (if and when you do so) you then include this README file together
with all of the _original_ scripts. Of course, you are free to make any
changes you want for your own purposes. I would appreciate hearing your
comments and learning about your improvements to the current set of scripts.
The sole reason I want to keep the files away from the public domain is that
it will help me answer any questions that any future users may have
regarding these scripts.
relax_scripts are distributed free of charge (of course) to anyone who wants
them. I encourage you to email me if you use these scripts, as I keep a list
of email addresses in order to let users know of bugs, bug fixes, new
versions etc.

The "package" should contain the following files:

        README.relax_scripts                    (this file)
        xpk_hgt_vol.mac                         (Felix macro)
        xpk_hgtvfix.mac                         (Felix macro)
        xpkhgtv.sch                             (Felix schema)
        snratio_felix                           (awk)
        felix2relax                             (csh, awk)
        checkmaxpt                              (csh, awk)
        checkmaxpt.awk                          (awk)
        merge_input                             (csh)
        merge_input.awk                         (awk)
        run.r1                                  (csh, awk)
        run.r2                                  (csh, awk)
        noecalc_felix                           (awk)
        r1plotextract                           (csh, awk)
        r2plotextract                           (csh, awk)
        r1X2comp                                (awk)
        r2bestfit                               (awk)
        t1t2ratio                               (awk)
        noeavesd                                (csh)
        noeavesd.awk                            (awk)
        makemfgin                               (csh)
        makemfgin.awk                           (awk)
        s2plotextract                           (awk)
        doall_stuff/README.doall
        doall_stuff/doall                       (csh, awk)
        doall_stuff/doall.mac                   (Felix macro)
        doall_stuff/init.mac.batch.mac          (Felix macro)
        doall_stuff/t1w2batch.mac               (Felix macro)
        doall_stuff/t1w1batch.mac               (Felix macro)
        doall_stuff/xpk_hgt_vol_batch.mac       (Felix macro)

**************** HOW TO USE THESE SCRIPTS ************************************

The following is a flow-chart type description of the analysis procedure.
Additional information on a given script is generally available in the
header of that script.

The evaluation of peak intensities is made within Felix. Felix macros (.mac)
and schemas (.sch) are provided for this purpose.

1. Create an xpk:peaks entity in your dba. Back up this database (just to
   be on the safe side).

2. Copy the schema xpkhgtv.sch to your schema directory.

3. Run the macro xpk_hgt_vol.mac. This will create a new entity called
   peakmax in your dba. peakmax looks much like xpk:peaks, but contains the
   following additional items: (i) the intensity maximum found within the
   xpk boundaries and (ii) the volume of the xpk. In addition, the coords
   for the xpk center will be replaced by the coords for the intensity
   maximum. The macro will write out the peakmax entity.
   You may want to change the name of the output file, so that it fits
   nicely with subsequent analysis scripts (read on).

If you have a number of matrices to measure peak intensities in (as you most
likely will have), then the macro doall.mac can be useful, at least as a
starter for further modifications. Felix 2.3 allows batch processing, making
it easy to process-evaluate-and-delete the matrices, thereby saving space
on the disk. The script doall does just this.

NOTE: For all data sets, e.g. T1, T2 and NOE, there may be a concern that
for weak peaks, especially at later time points of the relaxation data, the
xpk_hgt_vol.mac will find the peak maximum at a location that doesn't
correspond to that where the stronger peaks have been found. To check if
this presents a problem, run the script checkmaxpt (more below). If this
indeed does present a problem, then a possible solution for the T1 and T2
data sets is to first run xpk_hgt_vol.mac on the spectrum with the highest
intensities, and then xpk_hgtvfix.mac on the low intensity spectrum using
the database that was modified by xpk_hgt_vol.mac.

After this stage, you will have one Felix output-file for each matrix
(e.g. int.1, int.2, etc), containing the peak intensity information.

Assuming that you have recorded duplicate spectra for some of the relaxation
delays, the next step will be to use these for the evaluation of uncertainty
in peak intensities in the spectra.

1. Run the snratio_felix script for each pair of duplicates:

        snratio_felix int.1 int.2 > output

2. For T1 and T2 data: Create a masterfile (use your favorite text editor,
   e.g. `vi'), that contains a list of the identifier for the int file
   (e.g. for file int.2, the identifier is 2), the corresponding relaxation
   delay in seconds and the uncertainty in peak height (as obtained from
   snratio_felix). Interpolate points in order to obtain estimates of the
   uncertainties for the relaxation time points where no duplicates were
   taken.
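The interpolation mentioned in step 2 can be sketched as follows. This is
only an illustration in Python, not part of relax_scripts: it assumes linear
interpolation in the relaxation delay, clamps to the nearest measured value
outside the measured range, and prints whitespace-separated columns. The
function names and the exact masterfile layout are hypothetical; check the
header of felix2relax for the format it actually expects.

```python
def interpolate_uncertainties(delays, known):
    """Estimate an uncertainty for every delay.

    delays: relaxation delays in seconds, one per int file.
    known:  dict mapping delay -> uncertainty measured from duplicates.
    Assumption: linear interpolation between measured points, and the
    nearest measured value outside the measured range.
    """
    if not known:
        raise ValueError("need at least one measured uncertainty")
    pts = sorted(known.items())
    result = []
    for t in delays:
        if t <= pts[0][0]:
            result.append(pts[0][1])
        elif t >= pts[-1][0]:
            result.append(pts[-1][1])
        else:
            # Find the pair of measured points that brackets this delay.
            for (t0, u0), (t1, u1) in zip(pts, pts[1:]):
                if t0 <= t <= t1:
                    frac = (t - t0) / (t1 - t0)
                    result.append(u0 + frac * (u1 - u0))
                    break
    return result


def masterfile_lines(identifiers, delays, uncertainties):
    """Format one line per int file: identifier, delay, uncertainty.

    The column order follows the README; the separator is an assumption.
    """
    return [f"{i} {t:g} {u:g}"
            for i, t, u in zip(identifiers, delays, uncertainties)]


if __name__ == "__main__":
    delays = [0.0, 1.0, 2.0]
    known = {0.0: 100.0, 2.0: 200.0}  # made-up values from two duplicate pairs
    unc = interpolate_uncertainties(delays, known)
    print("\n".join(masterfile_lines([1, 2, 3], delays, unc)))
```

If the uncertainty is essentially constant across your duplicates, using a
single average value for all time points is a simpler alternative.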

NOTE: Several of the scripts get their input by doing an `ls' and looking
for all files present that conform (in some way) to the file name(s)
provided on the command line. The drawback with this is that you may need
to think twice about how you name your files (and make sure that you keep
rather clean directories), but the great benefit is that you never need to
give the number of files, nor give a list of all of the files, as input.

3. For T1 and T2 data: Run the felix2relax script:

        felix2relax filename masterfile output_extension

   filename is the core filename (i.e. int, in our example).
   masterfile is (you guessed it) the name of the masterfile.
   output_extension is the extension of the output-file names.

   One output file will be created for each peakmax entity (i.e. each
   record in the int.* files). The output-file name will be given by the
   xpk name (e.g. s74) and output_extension (e.g. t1) [resulting in s74.t1].

   felix2relax will take care of ambiguous xpk names; currently we have
   indicated ambiguity by incorporating a "/" or "?" in the xpk name in the
   Felix dba (e.g. a12/g54, t24?, s45/, v34/k73?, e33/y99/w90, or any such
   combination). You can easily modify this to fit your own taste.

3 1/2. For T1 and T2 data: Run the checkmaxpt script:

        checkmaxpt filename cut_off_w2 cut_off_w1 new_extension outfile

   filename is the core filename (i.e. int, in our example).
   cut_off_w2 is the maximum number of points that you allow the location
   of the peak maximum to differ in w2 between different spectra in the
   series.
   cut_off_w1: ditto for w1.
   new_extension is for output files corresponding to each individual
   residue.
   outfile is a file containing all of the individual files concatenated.

   checkmaxpt creates a file called outfile.summary that contains only
   those instances where the peak maximum location for a certain xpk
   differs by more than cut_off points from the average coordinates.

3 1/2+.
   For T1 and T2 data: Run the merge_input script:

        merge_input input_extension1 input_extension2 checkmaxpt_file \
                    masterfile output_extension

   input_extension1 is the extension filename for the relaxation series,
   where the intensity is evaluated by searching for the maximum point.
   input_extension2 is the extension filename for the relaxation series,
   where the intensity is evaluated at fixed coordinates.
   checkmaxpt_file is the summary output file from checkmaxpt (possibly
   edited by you, after you have looked into the origin of the deviations).
   masterfile is the same as the one described above.
   output_extension is the extension of the resulting output files
   (e.g. t1merge).

4. For T1 and T2 data: Run the run.r1 (run.r2) script:

        run.r1 input_extension output_extension output_file

   input_extension is the same as output_extension from the previous step
   (i.e. t1, following the example of 3 above).
   output_extension is the extension of the output-file names for the
   individual xpk files (e.g. r1fit).
   output_file is a file that contains the contents of all of the
   individual files.

   run.r1 will drive the invrecr1 program that fits your t1 inversion
   recovery data (your *.t1 files), and creates one output file of fitted
   relaxation parameters for each residue (*.r1fit) and one output file
   containing the contents of all of the individual output files.

5. For T1 data: Run the r1X2comp script:

        r1X2comp input_file > output

   input_file is the same as the output_file from run.r1 (step 4).

   r1X2comp will check which xpks have been successfully fitted to the
   inversion recovery equation, and which have not.

6. For T2 data: Run the r2bestfit script:

        r2bestfit input_file > output_file

   input_file is the same as the output_file from run.r2 (step 4).

   r2bestfit will check which xpks have been successfully fitted to the
   cpmg decay equation, and how many parameters are needed for the fit.

7. For T1 data: Run the r1plotextract script to get a file that can be fed
   to your favourite plotting program (e.g.
   kaleidagraph, templegraph):

        r1plotextract input_extension output_file

   input_extension is the same as output_extension from the previous step
   (i.e. r1fit, following the example of 4 above).

8. For T2 data: Run the r2plotextract script to get a file that can be fed
   to your favourite plotting program (e.g. kaleidagraph, templegraph):

        r2plotextract r2bestfit_output output_file

   r2bestfit_output is the output_file from r2bestfit (step 6).

9. For NOE data: Run the noecalc_felix script:

        noecalc_felix noe_int no_noe_int > output

   noe_int and no_noe_int are int files (Felix output), preferably from a
   pair of spectra that have been acquired interleaved.
   output is a file with the xpk name and the noe, i.e. the ratio of the
   peak intensity in noe_int and no_noe_int
   [i.e. intensity(noe)/intensity(no_noe)] as calculated from both the peak
   heights and volumes.

10. For NOE data: Run the noeavesd script:

        noeavesd noe_file_1 noe_file_2 ... > output

   noe_file_1 is the output from noecalc_felix for one pair of spectra.
   noe_file_2 is the output from noecalc_felix for another pair of spectra.
   (noe_file_3 is the output from noecalc_felix for a third pair of
   spectra.)

   noeavesd calculates the average noe and the standard deviation from the
   duplicate (triplicate?...) noe_files. The output can be directly
   imported into plotting programs.

11. Now you have all the relaxation data in neat little files. Time to
   create the input files for mfgrid. Run the makemfgin script:

        makemfgin r1_file r2_file noe_file output_file (=mfinput)

   You will need to edit the output file in order to set some flags that
   the optimization program uses.

12. After having run mfgrid and modelfree, you will probably want to
   extract the model-free parameters, and put them into a
   plotting-program-friendly format.
   Run the s2plotextract script:

        s2plotextract modelfree_output_file > output

   There are several different versions of s2plotextract that output
   different statistical data (*.95, *.gearyz) or the entire sequence of
   residue numbers (*.seq).

Note that makemfgin is even less smart than the rest of the scripts. It is
my intention that makemfgin (sometime in the future) will take any number of
input files (e.g. if you have relaxation data from more than one field), and
create the appropriate output.

That's about it...

For further notes and help, please first take a look in the header of the
script in question, or look at the man pages of the model-free programs.
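
For readers who want to check the NOE numbers from steps 9 and 10 by hand,
the arithmetic can be sketched as below. This is an illustrative Python
sketch, not the code in noecalc_felix or noeavesd: the function names are
hypothetical, the intensities are made up, and it assumes the sample
standard deviation (n-1 in the denominator); the scripts themselves may use
a different convention, so check noeavesd.awk.

```python
import statistics


def noe_ratio(intensity_noe, intensity_no_noe):
    """NOE as defined in step 9: intensity(noe) / intensity(no_noe)."""
    return intensity_noe / intensity_no_noe


def average_noe(ratios):
    """Mean and standard deviation of repeated NOE measurements.

    Assumption: sample standard deviation (n-1); with a single
    measurement no standard deviation can be estimated, so 0.0 is used.
    """
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    # Made-up peak heights for one xpk in two pairs of spectra.
    pairs = [(50.0, 100.0), (100.0, 100.0)]
    ratios = [noe_ratio(noe, no_noe) for noe, no_noe in pairs]
    mean, sd = average_noe(ratios)
    print(f"s74 {mean:.3f} {sd:.3f}")
```

With only two or three repeats the standard deviation is a rough estimate,
so treat it as an indication of reproducibility rather than a precise error.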