AMBER Archive (2006)
Subject: Re: AMBER: Amber 9 parallel test fail on 4096wat/Run.column_fft
From: Yu Chen (chen_at_hhmi.umbc.edu)
Date: Fri Nov 03 2006 - 09:18:35 CST
Hello, Ross
Thanks for replying. See my replies inline.
> Dear Yu,
>
> Assuming that you did not compile in support for binary
> trajectories, the -bintraj option to configure, then you can safely
> ignore the first error. The second error is strange. Are you
> certain DO_PARALLEL is set to 'mpirun -np 4'?
Yeah, I ignored the first error, and I am certain DO_PARALLEL is set
to "mpirun -np 4". Afterwards, I commented it out, and everything
finished nicely.
> Nonetheless we should try and track down where this is coming from.
> Can you try the following:
>
> export TESTsander=$AMBERHOME/exe/sander.MPI
> export DO_PARALLEL='mpirun -np 2'
> cd $AMBERHOME/test/4096wat
> ./Run.column_fft
> export DO_PARALLEL='mpirun -np 4'
> ./Run.column_fft
> export DO_PARALLEL='mpirun -np 8'
> ./Run.column_fft
>
Here is the interesting part. I ran the tests: it passed with np=2, 8,
32, and 128, but failed with np=4, 16, and 64 with the "ASSERTion
'processor == numtasks' failed in spatial_fft.f" error. And, just to
try it, I ran it with values of np that are not powers of 2, and it
failed on all of them.
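A minimal sketch of how such a sweep over processor counts could be scripted, not necessarily how the runs above were done. The function name sweep_np is hypothetical, and the PASS/FAIL report relies only on the exit status of the test script. Whether Run.column_fft returns nonzero for every kind of failure is an assumption here; it does for the "Program error" case shown in the log.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): run a command once per
# processor count, with DO_PARALLEL set for that run only, and report
# PASS/FAIL based on the command's exit status.
sweep_np() {
    cmd=$1; shift
    for np in "$@"; do
        if DO_PARALLEL="mpirun -np $np" sh -c "$cmd" >/dev/null 2>&1; then
            echo "np=$np PASS"
        else
            echo "np=$np FAIL"
        fi
    done
}

# Example use (assumes AMBERHOME is set and sander.MPI has been built):
#   export TESTsander=$AMBERHOME/exe/sander.MPI
#   cd $AMBERHOME/test/4096wat
#   sweep_np ./Run.column_fft 2 4 8 16 32 64 128
```

Output is discarded in this sketch, so a failing count should be rerun by hand to see the actual error message.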
Physically, we have a 25-node cluster plus the head node, each node
with two AMD Athlon CPUs, and we are using LAM/MPI.
BTW, do any other programs in Amber require the number of processors
to be a power of 2?
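For anyone scripting around this, here is a small sketch of a power-of-2 check for a processor count. The helper names is_pow2 and pow2_exponent are hypothetical and not part of Amber; the check uses the usual bit trick that n & (n - 1) is zero exactly when n is a power of 2.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Amber).

# Succeeds if $1 is a positive power of 2.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Prints the exponent k such that 2^k = $1.
# Assumes is_pow2 "$1" already holds.
pow2_exponent() {
    n=$1; k=0
    while [ "$n" -gt 1 ]; do
        n=$(( n / 2 )); k=$(( k + 1 ))
    done
    echo "$k"
}

# List the power-of-2 counts tried in this thread with their exponents.
for np in 2 4 8 16 32 64 128; do
    if is_pow2 "$np"; then
        echo "np=$np = 2^$(pow2_exponent "$np")"
    fi
done
```

Read against the results reported above, the passing counts (2, 8, 32, 128) have odd exponents and the failing ones (4, 16, 64) have even exponents. That is only an observation from these runs, not a statement about what spatial_fft.f actually requires.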
Thanks,
Chen
> And let us know which tests pass and which don't here. Then perhaps
> Mike Crowley might have a better chance of tracking down where the
> problem is coming from.
>
> All the best
> Ross
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery,
> may not be read every day, and should not be used for urgent or
> sensitive issues.
>
>
>
> From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On
> Behalf Of Yu Chen
> Sent: Wednesday, November 01, 2006 08:23
> To: Amber Maillist
> Subject: AMBER: Amber 9 parallel test fail on 4096wat/Run.column_fft
>
> Hi, I have successfully compiled and installed Amber 9 on our RHEL
> AS 3 Linux cluster. It passed the serial tests, but in the parallel
> tests I got the following errors. I hope someone can help me with
> them. Thanks in advance!
>
> First, our configurations:
> =================================
> RHEL AS 3 on Athlon,
> Using LAM 7.0.6, which was compiled with version 8.0 of the Intel
> compilers (icc/icpc/ifort)
> Amber 9 was compiled using the same compilers
> DO_PARALLEL set to 'mpirun -np 4'
>
> Second, the error messages:
> ====================================
> ...
> ...
> cd bintraj; ./Run.bintraj
> sander and ptraj: test sander netCDF output and ptraj netCDF input
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 29219 failed on node n1 (10.0.0.8) with exit status 1.
> -----------------------------------------------------------------------------
> ./Run.bintraj: Program error
> make[1]: [test.sander.BASIC] Error 1 (ignored)
> make[1]: Leaving directory `/raid5/p2/raid1_p12/hhmi/software/Amber/v9/test'
> export TESTsander=/hhmi/software/Amber/v9/exe/sander.MPI; cd 4096wat; ./Run.column_fft
> ASSERTion 'processor == numtasks' failed in spatial_fft.f at line 488.
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 29221 failed on node n1 (10.0.0.8) with exit status 1.
> -----------------------------------------------------------------------------
> ./Run.column_fft: Program error
> make: *** [test.sander.BASIC.MPI] Error 1
>
> ===============================================
>
>
> Yu Chen
> chen_at_hhmi.umbc.edu
> Baltimore, MD 21250
>
>
>
Yu Chen
chen_at_hhmi.umbc.edu
Baltimore, MD 21250
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu