AMBER Archive (2009)

Subject: Re: [AMBER] amber parallel test fail

From: Jason Swails (jason.swails_at_gmail.com)
Date: Fri Oct 23 2009 - 06:58:19 CDT


Nahoum,
It appears as though mdout.jar.001 already exists. This is probably because
the test has already been run in that directory, and Run.jar does not include
a command to remove the mdout files (it does remove every other output file
it creates, so this is probably an easy fix to apply). If you go into
$AMBERHOME/test/jar_multi, run rm -f mdout.jar.000 mdout.jar.001, and rerun
the tests, they should finish just fine (assuming, of course, that previous
tests don't have the same issue now that you've run them as well). The sander
call in Run.jar does not specify the -O (overwrite) flag, so it quits with an
error when it tries to open a 'new' file that already exists.
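
For reference, the manual clean-up would look something like this (just a
sketch; the make target is the one named in your log, and DO_PARALLEL should
still be set in the shell):

   # remove the stale per-group output files left over from the earlier run
   cd $AMBERHOME/test/jar_multi
   rm -f mdout.jar.000 mdout.jar.001

   # then re-run the parallel test suite from the test directory
   cd $AMBERHOME/test
   make test.sander.BASIC.MPI

The longer-term fix would be either adding -O to the sander call in Run.jar
or having its clean-up step remove mdout.jar.* as well, so that stale files
can't trip the test on a second run.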

Good luck!
Jason

On Fri, Oct 23, 2009 at 6:20 AM, Nahoum Anthony <nahoum.anthony_at_strath.ac.uk> wrote:

> Dear Amber users,
>
>
>
> I've compiled AMBER in parallel using ifort and the Math Kernel Libraries,
> after a successful serial installation (passing all tests), followed by make
> clean and configure with mpich2 (Myricom's version, as we're using a Myrinet
> interconnect). The make parallel command compiles without problems, but the
> jar_multi test fails and aborts the testing process. My terminal shows this
> output:
>
>
>
> ...
>
> ==============================================================
> cd plane_plane_restraint && ./Run.dinuc_pln
> SANDER: Dinucleoside restrained with new plane-plane angle
> restraint that was defined with new natural
> language restraint input.
> diffing mdout.dinucAU_pln.save with mdout.dinucAU_pln
> PASSED
> ==============================================================
> diffing dinuc_pln_vs_t.save with dinuc_pln_vs_t
> PASSED
> ==============================================================
> cd bintraj && ./Run.bintraj
> diffing nc_headers.save with nc_headers
> PASSED
> ==============================================================
> make[1]: Leaving directory `/home/amber/AMBER/amber10/test'
> export TESTsander=/home/amber/AMBER/amber10/exe/sander.MPI; cd 4096wat &&
> ./Run.column_fft
> diffing mdout.column_fft.save with mdout.column_fft
> PASSED
> ==============================================================
> export TESTsander=/home/amber/AMBER/amber10/exe/sander.MPI; cd jar_multi &&
> ./Run.jar
>
> Running multisander version of sander amber10
> Total processors = 2
> Number of groups = 2
>
> Unit 6 Error on OPEN: mdout.jar.001
>
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
> rank 1 in job 197 imp1.sibs.strath.ac.uk_50001 caused collective abort of
> all ranks
> exit status of rank 1: return code 1
> ./Run.jar: Program error
> make: *** [test.sander.BASIC.MPI] Error 1
>
> For the purpose of this test, I have DO_PARALLEL set to 'mpiexec -n 2', and
> I can see sander.MPI appearing on both nodes when I use 'top' to check
> processes whilst the test is running. I've checked the AMBER mailing list
> archives and couldn't find anything pointing to the cause of the problem.
> Does anyone have an idea, or need any more information?
>
>
>
> Best regards and thanks for your time,
>
>
>
> Nahoum
>
>
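
On the DO_PARALLEL point above: that setting looks right, and the Makefile
exports TESTsander itself (as your log shows), so once the stale mdout files
are removed, the whole parallel suite can simply be driven again, roughly
like this (a sketch; the mpiexec line mirrors the launcher you quoted):

   # DO_PARALLEL is what the Run scripts put in front of each sander.MPI call
   export DO_PARALLEL='mpiexec -n 2'
   cd $AMBERHOME/test
   make test.parallel   # or just the failing target, test.sander.BASIC.MPI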

-- 
---------------------------------------
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber