AMBER Archive (2007)

Subject: RE: AMBER: sander MPI fails included tests

From: Sergio Wong (swong_at_mccammon.ucsd.edu)
Date: Thu Jun 28 2007 - 14:14:02 CDT


Hi;

> So let me get this correct - everything works except the change_target.ntr
> test case is that right?

Almost. I commented out that test, and then I get errors in the REM
calculations. Apparently it can't rewrite the rem.out.00* files. It
completes the first run (up to step 100), but when it has to restart
and write a new out file, it says

   Unit 6 Error on OPEN: ./rem.out.000
[1] MPI Abort by user Aborting program !
[1] Aborting program!
forrtl: error (76): Abort trap signal
Image              PC                Routine   Line      Source
libpthread.so.0    0000003D20E0C430  Unknown   Unknown   Unknown
libc.so.6          0000003D2052E21D  Unknown   Unknown   Unknown
libc.so.6          0000003D2052FA1E  Unknown   Unknown   Unknown
sander.MPI         000000000075CCE2  Unknown   Unknown   Unknown
sander.MPI         0000000000764ADB  Unknown   Unknown   Unknown
sander.MPI         0000000000755C9C  Unknown   Unknown   Unknown
sander.MPI         00000000007431F0  Unknown   Unknown   Unknown
sander.MPI         000000000073AF49  Unknown   Unknown   Unknown
sander.MPI         0000000000597117  Unknown   Unknown   Unknown
sander.MPI         0000000000573C75  Unknown   Unknown   Unknown
sander.MPI         00000000004D8609  Unknown   Unknown   Unknown
sander.MPI         00000000004BA066  Unknown   Unknown   Unknown
sander.MPI         00000000004B71ED  Unknown   Unknown   Unknown
sander.MPI         0000000000408962  Unknown   Unknown   Unknown
libc.so.6          0000003D2051C3FB  Unknown   Unknown   Unknown
sander.MPI         00000000004088AA  Unknown   Unknown   Unknown

and the same for rem.out.001. The other tests, however, run fine.
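
The message above only says that the OPEN on ./rem.out.000 failed; the
cause is not established here. As a rough sketch of one thing that could
be checked (purely an assumption, not a diagnosis), the following Python
snippet lists any rem.out.* files already present in a test directory and
reports whether they, and the directory itself, are writable. The function
name check_outputs and the default pattern are illustrative and not part
of AMBER.

```python
import glob
import os


def check_outputs(directory, pattern="rem.out.*"):
    """Report write access for a directory and the files matching pattern.

    Hypothetical helper for illustration only; it is not an AMBER tool.
    Returns (dir_writable, [(file_name, file_writable), ...]).
    """
    # Can new files be created in the directory at all?
    dir_writable = os.access(directory, os.W_OK)

    files = []
    # Sorted so that rem.out.000 comes before rem.out.001, and so on.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        files.append((os.path.basename(path), os.access(path, os.W_OK)))
    return dir_writable, files


if __name__ == "__main__":
    dir_ok, found = check_outputs(".")
    print("directory writable:", dir_ok)
    for name, writable in found:
        print(name, "writable" if writable else "NOT writable")
```

Run from the REM test directory after a failed run, this would show any
leftover or read-only output files from the first pass.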

>
> Could you verify if it works in serial - I.e. without DO_PARALLEL set and
> using regular 'sander'
>

Running regular (serial) sander is fine. All tests pass.

> I.e. see if it is only when you run it in parallel with 2 cpus that it
> segfaults.

If I use -np 4, then I get the following message:

cd tgtmd/change_target.ntr; ./Run.tgtmd
SANDER: Targeted MD with changing target and restraints
  DO_PARALLEL set to mpirun -np 4
  too many processors for this test, exiting

> Next can you try compiling without optimization and with debugging turned
> on.
>
> on the load line in config.h add -g
>
> change the FFLAGS and FOPTFLAGS to read:
>
> -w95 -g $(LOCALFLAGS) $(AMBERBUILDFLAGS)
>
> Then make clean, make (if the serial one fails.) or make parallel (if it is
> only in parallel that it fails). Then hopefully it might show a traceback as
> to where it segfaulted.

OK, I recompiled with the options you specified and still get the same
error when running the change_target.ntr test. Ugh. I used the old
compiler for this last test. The build fails with the new compiler (v 10).
Apparently the naming conventions are somehow different, and it claims some
functions are declared multiple times.

Thanks

-Sergio

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu