AMBER Archive (2002)

Subject: Re: "Unit 5 Error" with a Linux/MPICH Amber7

From: Stéphane Teletchéa (steletch_at_biomedicale.univ-paris5.fr)
Date: Mon May 06 2002 - 11:22:42 CDT


Hi Vincent, it seems that you have forgotten to put mini1.in in the
correct directory, or that your mini1.in is incorrect (corrupted, wrong, ...).
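
A quick way to rule out the first possibility is to check, in the directory
where the script is launched, that every file passed to sander exists, is
readable and is not empty. The sketch below is only an illustration of that
check: the helper name check_inputs is hypothetical, and the file names are
the ones from the quoted script. Since /data/test is NFS-mounted on the second
node, it may be worth running the same check from that node as well.

```python
import os
import tempfile


def check_inputs(workdir, filenames):
    """Map each problematic file name to a short description of the problem.

    Hypothetical helper: files that exist, are readable and are non-empty
    are left out of the returned dictionary.
    """
    problems = {}
    for name in filenames:
        path = os.path.join(workdir, name)
        if not os.path.isfile(path):
            problems[name] = "missing"
        elif not os.access(path, os.R_OK):
            problems[name] = "not readable"
        elif os.path.getsize(path) == 0:
            problems[name] = "empty"
    return problems


if __name__ == "__main__":
    # File names taken from the first sander call in the quoted script.
    needed = ["mini1.in", "test.top", "test.crd"]
    found = check_inputs(os.getcwd(), needed)
    if found:
        for name, reason in sorted(found.items()):
            print(f"{name}: {reason}")
    else:
        print("all input files look usable")
```

If the check reports nothing and sander still fails to open the file, the
content of mini1.in itself would be the next thing to look at.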

Stef

-- 
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
Teletchéa Stéphane - CNRS UMR 8601
Lab. de chimie et biochimie pharmacologiques et toxicologiques
45 rue des Saints-Peres 75270 Paris cedex 06
tel : (33) - 1 42 86 20 86 - fax : (33) - 1 42 86 83 87
mél : steletch_at_biomedicale.univ-paris5.fr
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*

On Monday 6 May 2002 17:50, Vincent BOSQUIER wrote:
> Hi all,
>
> I have installed AMBER7 with MPICH on a Linux RedHat-7.2 cluster. To
> validate my installation, I run a script that has already been run on
> AMBER7 installed on a 1 CPU SGI server. This script includes several
> successive commands, including calls to sander. The script has the
> following structure:
>
> -----------
>
> #!/bin/csh -f
>
> setenv $AMBERHOME /data/test/amber7
> setenv MPICH_HOME /usr/share/mpi
> setenv DO_PARALLEL "$MPICH_HOME/bin/mpirun -np 4 -machinefile
> $MPICH_HOME/share/machines.LINUX"
>
> $AMBERHOME/exe/sander -O \
> -i mini1.in \
> -o test1.out \
> -p test.top \
> -c test.crd \
> -inf test1.info \
> -r test1.rst
>
> $AMBERHOME/exe/sander -O \
> -i mini2.in \
> (...)
>
>
> -----------
>
> It seems that sander crashes with the following error messages whenever I
> try to run such a script:
>
> -----------
>
> Unit 5 Error on OPEN: mini1.in
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p0_3492: p4_error: : 1
>
> Unit 5 Error on OPEN: mini2.in
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p0_3493: p4_error: : 1
>
> (...)
>
> -----------
>
> As I already said above, a researcher in our molecular modeling team tried
> to run the same test files on an SGI machine where I previously installed
> AMBER7 locally without MPICH and it worked fine. Is there a problem with
> our input files? Is there a difference in the input files for AMBER7 when
> you run it on 1 or on several processors? Today, what I'm sure about is that
> "make test.sander" passed without any problem on the cluster. I don't know
> whether MPICH is correctly configured or not, but I think it is, because of
> some tests I have successfully made (see below).
>
> Can anyone tell me what a "Unit 5 error" is, and how I can handle it so that
> sander runs normally with all the processors I define in the machinefile?
>
> We also experienced sander crashes with a "Unit 6 error" that seemed
> to be related to ".out" files. Does anyone have any information about this too?
>
> Here is some information about the machines and the tests I ran to
> validate my MPICH module. Maybe it will help you get an idea of what is
> happening:
>
> -----------
>
> IBM x330series - Linux RedHat-7.2
> Test of parallel computing using "mpich-1.2.0" installed from RedHat's
> RPMs. $MPICH_HOME=/usr/share/mpi
> DO_PARALLEL="$MPICH_HOME/bin/mpirun -np 4 -machinefile
> $MPICH_HOME/share/machines.LINUX" MPICH Machinefile is "machines.LINUX" and
> contains 4 lines formatted this way:
>
> machine2.ourdomain
> machine2.ourdomain
> machine1.ourdomain
> machine1.ourdomain
>
> "machine2" and "machine1" are dual-processor nodes in my cluster
>
> The /data/test directory is a local directory on "machine1" and is
> NFS-mounted on "machine2" where /data/test is also the name of the
> mountpoint. User "me" owns $MPICH_HOME directory (and all of its contents).
> User "me" also owns /data/test directory (and all of its contents,
> including the "cpi" executable file). Command line used and associated
> results look like this:
>
> <me_at_machine1:/data/test>/usr/share/mpi/bin/mpirun -np 4 -machinefile
> /usr/share/mpi/share/machines.LINUX ./cpi Process 0 on machine1.ourdomain
> Process 3 on machine1.ourdomain
> Process 1 on machine2.ourdomain
> Process 2 on machine2.ourdomain
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.001346
>
> -----------
>
> Thanks in advance to all those who will help me.
>
> Vincent.
>
>
> ---------------------------------------------------------------------
> Vincent Bosquier
> IT Engineer
>
> Synt:em
> Computational Drug Discovery
> Parc Scientifique G.Besse
> Allee Charles Babbage
> 30035 Nimes Cedex 1
> France
>
> E-mail: vbosquier_at_syntem.com
> Direct line: +33 (0)466 042 294
> Switchboard: +33 (0)466 048 666
> Fax: +33 (0)466 048 667
> ---------------------------------------------------------------------
> Discover New Drugs, Discover Synt:em
> http://www.syntem.com
> ---------------------------------------------------------------------