AMBER Archive (2009)

Subject: Re: [AMBER] MPI process terminated unexpectedly after cluster upgrade

From: Dmitri Nilov (nilovdm_at_gmail.com)
Date: Mon Nov 02 2009 - 07:11:15 CST


Yes, I've recompiled Amber, but I couldn't change mvapich because I'm just a
user on a serious cluster :)
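
For reading the traceback quoted further down, here is a minimal Python sketch that tallies the frames per image. The frame lines are copied from that traceback. The helper name `frames_per_image` and the regular expression are illustrative assumptions, not part of Amber, mvapich, or the Intel runtime. It only summarizes where the interrupted processes were sitting when they received SIGINT. It does not identify the root cause of the "Exit code -5" failure.

```python
import re
from collections import Counter

# Frames copied from the traceback quoted in this thread.
TRACEBACK = """\
Image PC Routine Line Source
libpthread.so.0 00007F2132C1EB00 Unknown Unknown Unknown
libpthread.so.0 00007F2132C1DB7E Unknown Unknown Unknown
libmpich.so.1.0 00007F21334CB1AC Unknown Unknown Unknown
libmpich.so.1.0 00007F21334E1ADE Unknown Unknown Unknown
libmpich.so.1.0 00007F21334C050A Unknown Unknown Unknown
libmpich.so.1.0 00007F21334A2DED Unknown Unknown Unknown
libmpich.so.1.0 00007F21334A1DC6 Unknown Unknown Unknown
sander.MPI 000000000093A0EF Unknown Unknown Unknown
sander.MPI 00000000004BC222 Unknown Unknown Unknown
sander.MPI 000000000041E05C Unknown Unknown Unknown
libc.so.6 00007F213216ACF4 Unknown Unknown Unknown
sander.MPI 000000000041DF69 Unknown Unknown Unknown
"""

# Assumed frame layout: an image name followed by a 16-digit hex program counter.
FRAME_RE = re.compile(r"^(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{16})\s")


def frames_per_image(text):
    """Count traceback frames per image, in first-seen order."""
    counts = Counter()
    for line in text.splitlines():
        # Drop mail quoting such as "> > " before matching.
        line = line.lstrip("> ").strip()
        match = FRAME_RE.match(line)
        if match:
            counts[match.group("image")] += 1
    return counts


if __name__ == "__main__":
    for image, n in frames_per_image(TRACEBACK).items():
        print(f"{image}: {n}")
```

The header line and the wrapped "Unknown" continuation lines do not match the pattern, so pasting the traceback straight from the mail, quoting included, should give the same counts.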

On Mon, Nov 2, 2009 at 3:15 PM, Jason Swails <jason.swails_at_gmail.com> wrote:

> It could be that the new version of mvapich broke the previous installation,
> since the libraries could easily have changed (and if it's really, in fact,
> a new version, I'd bet on it since there's not much else that could
> 'change'). Did you try recompiling?
>
> Do the test cases still pass? If not, I'd say your only options are to
> recompile amber/pmemd in parallel or revert to the old version of
> mvapich if it's still on the cluster.
>
> Good luck!
> Jason
>
> On Mon, Nov 2, 2009 at 4:17 AM, Dmitri Nilov <nilovdm_at_gmail.com> wrote:
>
> > Hello!
> > sander.MPI tasks have been crashing right after launch since the mvapich
> > software was upgraded on the cluster.
> > Sander.MPI.out contains:
> >
> > MPI process terminated unexpectedly
> > Exit code -5 signaled from node-23-06
> > Killing remote processes...forrtl: error (69): process interrupted (SIGINT)
> > Image              PC                Routine  Line     Source
> > libpthread.so.0    00007F2132C1EB00  Unknown  Unknown  Unknown
> > libpthread.so.0    00007F2132C1DB7E  Unknown  Unknown  Unknown
> > libmpich.so.1.0    00007F21334CB1AC  Unknown  Unknown  Unknown
> > libmpich.so.1.0    00007F21334E1ADE  Unknown  Unknown  Unknown
> > libmpich.so.1.0    00007F21334C050A  Unknown  Unknown  Unknown
> > libmpich.so.1.0    00007F21334A2DED  Unknown  Unknown  Unknown
> > libmpich.so.1.0    00007F21334A1DC6  Unknown  Unknown  Unknown
> > sander.MPI         000000000093A0EF  Unknown  Unknown  Unknown
> > sander.MPI         00000000004BC222  Unknown  Unknown  Unknown
> > sander.MPI         000000000041E05C  Unknown  Unknown  Unknown
> > libc.so.6          00007F213216ACF4  Unknown  Unknown  Unknown
> > sander.MPI         000000000041DF69  Unknown  Unknown  Unknown
> > forrtl: error (69): process interrupted (SIGINT)
> > and so on..
> >
> > I found a similar problem at
> > http://archive.ambermd.org/200907/0092.html, which still seems to be
> > unsolved.
> > I don't think it's an InfiniBand problem. So what should I do?
> >
> > Thanks a lot!
> > Dmitri Nilov,
> > Lomonosov Moscow State University
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER_at_ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
> >
> >
>
>
> --
> ---------------------------------------
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Graduate Student
> 352-392-4032