AMBER Archive (2006)

Subject: Re: AMBER: MPICH and Sander

From: David A. Case (case_at_scripps.edu)
Date: Thu Jun 15 2006 - 22:47:36 CDT


On Fri, Jun 16, 2006, Andrew Box wrote:
>
> When performing a Sander run on our local supercomputer (VPAC, Australia),
> my runs sometimes freeze, as shown in the text below:
>
> | Flags: MPI
> getting new box info from bottom of inpcrd
>
> Unit 9 Error on OPEN: md7.rst
>
> While I know how the problem arises and can fix it easily, when the run
> fails, the simulation does not terminate, but continues to use CPU time
> (in this case, 16 CPUs for 72 hours). Does this problem affect anyone else,
> and if so, is there a way to fix it?

After printing the above message, the code should go to mexit.f, where
it should call mpi_abort(). You should probably add a print statement inside
mexit.f to make sure that mpi_abort() is indeed being called. Then you might
have to make sure that the call will actually stop jobs on your queuing/MPI
system. Write a tiny program that just calls mpi_abort(), and see if it
stops a job as it should.
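
A minimal sketch of such a test program, in fixed-form Fortran to match
mexit.f. The program name abort_test and the printed messages are made up
for illustration. Only rank 0 calls mpi_abort(), mimicking the case where
only the master hits the error, while the other ranks wait in a barrier
that rank 0 never reaches. If the abort propagates, the whole job should
end promptly; if the other ranks stay stuck in the barrier, the problem is
likely in the MPI/queuing setup rather than in sander.

```fortran
      program abort_test
      implicit none
      include 'mpif.h'
      integer ierr, rank, nprocs

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

c     only the master aborts, as in the failing sander run
      if (rank .eq. 0) then
         write(6,*) 'rank 0 calling mpi_abort, nprocs = ', nprocs
         call mpi_abort(MPI_COMM_WORLD, 1, ierr)
      end if

c     the other ranks block here; a working abort should kill them
      call mpi_barrier(MPI_COMM_WORLD, ierr)
      write(6,*) 'rank ', rank, ' passed the barrier (unexpected)'

      call mpi_finalize(ierr)
      end
```

Build it with your MPI installation's Fortran compiler wrapper (often
mpif77 or mpif90, but the name depends on the site), and submit it through
the same queue and with the same CPU count as the failing sander job.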

Only the master node is executing this code, but that is supposed to be enough
to signal all processes to quit. But it might require something different
on any given system to make sure that such a signal gets propagated to the
calling process.

...good luck...dac

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu