AMBER Archive (2002)

Subject: Re: MPI_BCAST error

From: Scott Brozell (sbrozell_at_scripps.edu)
Date: Wed Dec 18 2002 - 15:24:55 CST


Hello,

I have not encountered that message myself.
It generally indicates that a receive buffer is too small for the
incoming message.
Since mpi_bcast produces the error and since the problem is
sporadic, the cause is probably the equivalent of a dangling pointer,
which in Fortran means an out-of-bounds array index or some other kind
of memory corruption.

One approach to finding the problem is to build gibbs with the
compiler's automatic array bounds checking, stack checking,
etc. turned on.

Scott Brozell, Ph.D. | e-mail: sbrozell_at_scripps.edu
Dept. of Molecular Biology, TPC15 | fax: +1-858-784-8896
The Scripps Research Institute | phone: +1-858-784-8754
10550 N. Torrey Pines Rd. | home page:
La Jolla CA 92037 USA | http://www.scripps.edu/~sbrozell

On Wed, 18 Dec 2002, Nathan A. Baker wrote:

> Hi All --
>
> Has anyone encountered the message:
>
> -----------------------------------------------------
> 1 - MPI_BCAST : Message truncated
> [1] Aborting program !
> [1] Aborting program!
> Child process exited unexpectedly 0
>
>
> ** Signal 134519144 **
>
>
> End of diagnostics
> -----------------------------------------------------
>
> when running Gibbs using shared memory MPI (MPICH with device
> ch_shmem)?
>
> We've run into this a few times (very randomly, some runs work & some
> runs die in the middle with this error) on our Linux cluster and are
> trying to find the problem.
>
> Thanks for your help!
>
> -- Nathan Baker
>
>
> --
> Nathan A. Baker, Assistant Professor
> Washington University in St. Louis School of Medicine
> Dept. of Biochemistry and Molecular Biophysics
> Center for Computational Biology
> 700 S. Euclid Ave., Campus Box 8036, St. Louis, MO 63110
> Phone: (314) 362-2040, Fax: (314) 362-0234
> URL: http://www.biochem.wustl.edu/~baker
>