AMBER Archive (2009)

Subject: Re: [AMBER] MPI process terminated unexpectedly after cluster upgrade

From: Jason Swails (jason.swails_at_gmail.com)
Date: Mon Nov 02 2009 - 11:04:14 CST


What about the tests?

On Mon, Nov 2, 2009 at 9:11 AM, Dmitri Nilov <nilovdm_at_gmail.com> wrote:

> Yes, I've recompiled Amber, but I couldn't change mvapich because I'm just a
> client on a serious cluster :)
>
> On Mon, Nov 2, 2009 at 3:15 PM, Jason Swails <jason.swails_at_gmail.com>
> wrote:
>
> > It could be that the new version of mvapich broke the previous installation,
> > since the libraries could easily have changed (and if it's really, in fact,
> > a new version, I'd bet on it since there's not much else that could
> > 'change'). Did you try recompiling?
> >
> > Do the test cases still pass? If not, I'd say your only options are to
> > recompile amber/pmemd in parallel or revert back to the old version of
> > mvapich if it's still on the cluster.
> >
> > Good luck!
> > Jason
> >
> > On Mon, Nov 2, 2009 at 4:17 AM, Dmitri Nilov <nilovdm_at_gmail.com> wrote:
> >
> > > Hello!
> > > sander.MPI tasks have been crashing right after launch since the mvapich
> > > software was upgraded on the cluster.
> > > Sander.MPI.out contains:
> > >
> > > MPI process terminated unexpectedly
> > > Exit code -5 signaled from node-23-06
> > > Killing remote processes...forrtl: error (69): process interrupted (SIGINT)
> > > Image              PC                Routine  Line     Source
> > > libpthread.so.0    00007F2132C1EB00  Unknown  Unknown  Unknown
> > > libpthread.so.0    00007F2132C1DB7E  Unknown  Unknown  Unknown
> > > libmpich.so.1.0    00007F21334CB1AC  Unknown  Unknown  Unknown
> > > libmpich.so.1.0    00007F21334E1ADE  Unknown  Unknown  Unknown
> > > libmpich.so.1.0    00007F21334C050A  Unknown  Unknown  Unknown
> > > libmpich.so.1.0    00007F21334A2DED  Unknown  Unknown  Unknown
> > > libmpich.so.1.0    00007F21334A1DC6  Unknown  Unknown  Unknown
> > > sander.MPI         000000000093A0EF  Unknown  Unknown  Unknown
> > > sander.MPI         00000000004BC222  Unknown  Unknown  Unknown
> > > sander.MPI         000000000041E05C  Unknown  Unknown  Unknown
> > > libc.so.6          00007F213216ACF4  Unknown  Unknown  Unknown
> > > sander.MPI         000000000041DF69  Unknown  Unknown  Unknown
> > > forrtl: error (69): process interrupted (SIGINT)
> > > and so on..
> > >
> > > I found a similar problem at
> > > http://archive.ambermd.org/200907/0092.html, which still seems to be
> > > unsolved.
> > > I don't think it's an InfiniBand problem. So what should I do?
> > >
> > > Thanks a lot!
> > > Dmitri Nilov,
> > > Lomonosov Moscow State University
> > >
> > > _______________________________________________
> > > AMBER mailing list
> > > AMBER_at_ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber
> > >
> > >
> >
> >
> > --
> > ---------------------------------------
> > Jason M. Swails
> > Quantum Theory Project,
> > University of Florida
> > Ph.D. Graduate Student
> > 352-392-4032
> >
>

-- 
---------------------------------------
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
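
As a reading aid for the traceback quoted in this thread, here is a minimal Python sketch (not part of the original messages) that tallies the frames by image. The names `parse_frames` and `frames_per_image` are illustrative, and the frame list is copied from Dmitri's message. Keep in mind that forrtl error 69 reports a SIGINT, so these frames most likely show where a rank was when the launcher killed it, not necessarily where the first rank failed.

```python
import re
from collections import Counter

# Frames copied from the traceback quoted in this thread (image, PC, ...).
TRACEBACK = """\
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine  Line     Source
libpthread.so.0    00007F2132C1EB00  Unknown  Unknown  Unknown
libpthread.so.0    00007F2132C1DB7E  Unknown  Unknown  Unknown
libmpich.so.1.0    00007F21334CB1AC  Unknown  Unknown  Unknown
libmpich.so.1.0    00007F21334E1ADE  Unknown  Unknown  Unknown
libmpich.so.1.0    00007F21334C050A  Unknown  Unknown  Unknown
libmpich.so.1.0    00007F21334A2DED  Unknown  Unknown  Unknown
libmpich.so.1.0    00007F21334A1DC6  Unknown  Unknown  Unknown
sander.MPI         000000000093A0EF  Unknown  Unknown  Unknown
sander.MPI         00000000004BC222  Unknown  Unknown  Unknown
sander.MPI         000000000041E05C  Unknown  Unknown  Unknown
libc.so.6          00007F213216ACF4  Unknown  Unknown  Unknown
sander.MPI         000000000041DF69  Unknown  Unknown  Unknown
"""

# A frame line starts with an image name followed by a 16-digit hex PC.
FRAME_RE = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\b")


def parse_frames(text):
    """Return (image, pc) pairs in the order printed, skipping non-frame lines."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


def frames_per_image(frames):
    """Count how many frames belong to each image (library or executable)."""
    return Counter(image for image, _pc in frames)


frames = parse_frames(TRACEBACK)
counts = frames_per_image(frames)

for image, n in counts.most_common():
    print(f"{image:18s} {n}")
```

The tally puts five of the twelve frames in libmpich.so.1.0, which is consistent with, but does not prove, a rank sitting inside an MPI call when it was interrupted. Running the parallel test suite against the recompiled build, as Jason suggests, remains the more direct check.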