AMBER Archive (2006)

Subject: RE: AMBER: Amber 9 parallel test fail on 4096wat/Run.column_fft

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Wed Nov 01 2006 - 11:21:02 CST


Dear Yu,
 
Assuming you did not compile in support for binary trajectories (the
-bintraj option to configure), you can safely ignore the first error.
The second error is strange. Are you certain DO_PARALLEL is set to 'mpirun
-np 4'?
 
This works fine for me. Typically the column_fft test case will fail if the
number of processors selected is not a power of two, e.g. setting -np 3 or
-np 6 will likely cause this test case to fail. 4, however, should be
fine.
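
The power-of-two rule above can be checked before running the test. The following is a minimal sketch only, not part of the AMBER test suite: the helper names is_power_of_two and np_from_do_parallel are hypothetical, and it assumes DO_PARALLEL has the form 'mpirun -np N'.

```shell
# Hypothetical helper (not part of AMBER): succeeds if $1 is a power of two.
is_power_of_two() {
    n=$1
    # Reject non-numeric, zero, or negative input.
    [ "$n" -gt 0 ] 2>/dev/null || return 1
    # Strip factors of two; only a power of two reduces to exactly 1.
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Hypothetical helper: print the value following -np in a string such as
# 'mpirun -np 4'. Assumes the count is given as a separate word after -np.
np_from_do_parallel() {
    set -- $1
    while [ $# -gt 0 ]; do
        if [ "$1" = "-np" ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    return 1
}

# Usage: falls back to 'mpirun -np 4' if DO_PARALLEL is unset.
np=$(np_from_do_parallel "${DO_PARALLEL:-mpirun -np 4}") || np=""
if is_power_of_two "$np"; then
    echo "np=$np: power of two, column_fft test should be runnable"
else
    echo "np=$np: not a power of two, column_fft test will likely fail"
fi
```

This only checks the processor count; it says nothing about other causes of failure, such as the assertion reported in the original post.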
 
For the time being you can just ignore the column_fft test case (comment it
out of $AMBERHOME/test/Makefile) if you want to be able to run the rest of
the test cases. As long as you don't set column_fft=1 in the &ewald namelist
everything else should be fine. Column_fft is only really of use at very
high processor counts (>128) and so you are unlikely to have need of it.
 
Nonetheless we should try to track down where this is coming from. Can you
try the following:
 
export TESTsander=$AMBERHOME/exe/sander.MPI
export DO_PARALLEL='mpirun -np 2'
cd $AMBERHOME/test/4096wat
./Run.column_fft
export DO_PARALLEL='mpirun -np 4'
./Run.column_fft
export DO_PARALLEL='mpirun -np 8'
./Run.column_fft
 
Then let us know here which runs pass and which fail. With that, Mike
Crowley might have a better chance of tracking down where the problem is
coming from.
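
The same sequence of runs can be scripted. This is a rough sketch under stated assumptions: run_sweep and the sweep_npN.log file names are made up for illustration, and it treats a nonzero exit status from the test script as a failure. Differences in the numerical output would still have to be checked by reading the logs.

```shell
# Hypothetical helper (not part of AMBER): run a test script once per
# processor count and print PASS or FAIL based on its exit status.
# Usage: run_sweep SCRIPT NP [NP ...]
run_sweep() {
    script=$1
    shift
    for np in "$@"; do
        # The test scripts read the MPI launch command from DO_PARALLEL.
        DO_PARALLEL="mpirun -np $np"
        export DO_PARALLEL
        # Keep each run's output in its own log (file name is an assumption).
        if "$script" >"sweep_np$np.log" 2>&1; then
            echo "np=$np PASS"
        else
            echo "np=$np FAIL"
        fi
    done
}

# Intended use, mirroring the commands above (left commented out here):
#   export TESTsander=$AMBERHOME/exe/sander.MPI
#   cd $AMBERHOME/test/4096wat
#   run_sweep ./Run.column_fft 2 4 8
```

The PASS/FAIL lines plus the three log files would be the information to send back to the list.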
 
All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

 

  _____

From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of
Yu Chen
Sent: Wednesday, November 01, 2006 08:23
To: Amber Maillist
Subject: AMBER: Amber 9 parallel test fail on 4096wat/Run.column_fft

Hi, I have successfully compiled and installed Amber 9 on our RHEL AS 3
Linux cluster. It passed the serial tests, but in the parallel tests I got
the following errors. I hope someone can help me with them. Thanks in advance!

First, our configurations:
=================================
RHEL AS 3 on Athlon,
Using LAM 7.0.6, which was compiled with the Intel compilers version 8.0
(icc/icpc/ifort)
Amber 9 was compiled using the same compilers
DO_PARALLEL set to 'mpirun -np 4'

Second, the error messages:
====================================
...
...
cd bintraj; ./Run.bintraj
sander and ptraj: test sander netCDF output and ptraj netCDF input
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 29219 failed on node n1 (10.0.0.8) with exit status 1.
-----------------------------------------------------------------------------
./Run.bintraj: Program error
make[1]: [test.sander.BASIC] Error 1 (ignored)
make[1]: Leaving directory `/raid5/p2/raid1_p12/hhmi/software/Amber/v9/test'
export TESTsander=/hhmi/software/Amber/v9/exe/sander.MPI; cd 4096wat;
./Run.column_fft
ASSERTion 'processor == numtasks' failed in spatial_fft.f at line 488.
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 29221 failed on node n1 (10.0.0.8) with exit status 1.
-----------------------------------------------------------------------------
./Run.column_fft: Program error
make: *** [test.sander.BASIC.MPI] Error 1

===============================================

Yu Chen
chen_at_hhmi.umbc.edu
Baltimore, MD 21250
