AMBER Archive (2003)

Subject: Re: AMBER: Problem with MPI_Finalize

From: Robert Duke (rduke_at_email.unc.edu)
Date: Tue Nov 18 2003 - 08:54:36 CST


Thomas -
This is sort of an "off the top of my head" guess, but it is possible you
are hitting the flush() problem with the new SGI libraries. What SGI
apparently did is change flush() to have 2 arguments instead of 1. Now
there is a second istat argument, and if you look at sys.f under
amber7/src/Machines/standard, you will see that amflsh uses a flush call
with one argument. Introduce a second integer argument, istat, that you
ignore, and things will probably be okay. What is happening is that the
stack is getting trashed on return from the flush call, and under some
circumstances I think this kills the master process (which does the mdout
I/O).
Generally, I think it just happens as a run is printing final data, and
because the master croaks, the MPI_Finalize calls get messed up. Anyway, it
is worth a try. I actually took the flush calls out of pmemd over this
issue (much annoyed) and instead do timed close/opens. Changing library
interfaces is bad. I don't know if there is a sander (amber7) bugfix for
this one or not, but the problem with doing a fix is that you have to know
the version of software being used (in other words, doing the fix on
machines with old SGI software will break them).
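If it helps, the fix amounts to something like the sketch below. This is
from memory and not the exact sys.f source (the real amflsh may do more
than this); the point is just the extra, ignored istat argument in the
flush call:

c     Sketch only -- the new SGI libraries expect flush() to take a
c     second integer status argument; pass one and ignore it.
      subroutine amflsh(lun)
      integer lun, istat
c     old call, fine with the old libraries:
c         call flush(lun)
c     new call for the updated SGI libraries:
      call flush(lun, istat)
      return
      end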
Regards - Bob

----- Original Message -----
From: <Thomas.Fox_at_bc.boehringer-ingelheim.com>
To: <amber_at_scripps.edu>
Sent: Tuesday, November 18, 2003 9:10 AM
Subject: AMBER: Problem with MPI_Finalize

> Hi -
>
> looking through the archives, I didn't find anything helpful, so I'm
> reporting my own observations:
>
> I have compiled sander (amber7) on an SGI O3000 with IRIX64 6.5, MPI
> 3.2.0.7, and MIPSpro 7.3.1.2m.
> Running an MD simulation of a protein on this machine, everything goes
> fine and my MD calculation runs through smoothly...however, we just got a
> new machine (O3000 with IRIX 6.5, and MPI 4.3 / MPT 1.8, but no compiler
> on it), and running my sander executable on it I get basically identical
> results and my simulation runs through to the end, but now I get the
> following error message:
>
> MPI: Program /home/foxt/amber7/exe_mpi/sander, Rank 0, Process 6182
> received signal SIGSEGV(11)
>
>
> MPI: --------stack traceback-------
>
>
> sh: dbx: not found
>
> MPI: -----stack traceback ends-----
> MPI: Program /home/foxt/amber7/exe_mpi/sander, Rank 0, Process 6182:
> Dumping core on signal SIGSEGV(11) into directory
> /home/foxt/PROJECTS/LIE_FXA/LIE_RUNS/RUNS_RST
> MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
> MPI: aborting job
> MPI: Received signal 11
>
> The output stops before the final timing information ("5. TIMINGS") - but
> this could be a buffering issue... Minimizations are no problem, just MD
> calculations.
>
> To be honest, this behavior is more of an annoyance, as I don't get the
> timing information, and I get a lot of garbage in my log-files (and, yes,
> lots of core dumps that I have to remove), but still... I have looked
> through the code but couldn't find anything obvious, which is probably
> because I'm not familiar enough with MPI...
>
> Any ideas/suggestions?
>
> Th.
>
> Dr. Thomas Fox
> Dept. Lead Discovery - Computer Aided Molecular Design
> K91-00-10
> Boehringer Ingelheim Pharma GmbH & Co KG
> 88397 Biberach, Germany
> thomas.fox_at_bc.boehringer-ingelheim.com
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu