AMBER Archive (2009)

Subject: Re: [AMBER] MPI process terminated unexpectedly after cluster upgrade

From: Dmitri Nilov (nilovdm_at_gmail.com)
Date: Tue Nov 03 2009 - 08:06:11 CST


One more thing.
mpif90 is in /usr/lib/mvapich-intel-x86_64/bin/ on the cluster. There is also
an mpirun in this folder, but executing it is forbidden. This was done so that
mpirun is run only from /usr/bin/. Could this cause a problem?

Thanks!
Dmitri Nilov,
Lomonosov Moscow State University
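
One way to check whether the compiler wrapper and the launcher come from the same MPI installation is to compare the directories they resolve to. The following is only a minimal sketch, not something posted in the thread: the helper name same_mpi_dir is hypothetical, and it compares install directories only, so a /usr/bin/mpirun that is a wrapper or symlink pointing at the MVAPICH launcher would still be reported as a mismatch.

```shell
#!/bin/sh
# Hypothetical helper: report whether two executables live in the same directory.
# Arguments are full paths, e.g. the output of `command -v mpif90`.
same_mpi_dir() {
    dir_a=$(dirname "$1")
    dir_b=$(dirname "$2")
    if [ "$dir_a" = "$dir_b" ]; then
        echo "match: $dir_a"
    else
        echo "MISMATCH: $dir_a vs $dir_b"
    fi
}

# Paths taken from the message above. On a real system one would instead run:
#   same_mpi_dir "$(command -v mpif90)" "$(command -v mpirun)"
same_mpi_dir /usr/lib/mvapich-intel-x86_64/bin/mpif90 /usr/bin/mpirun
```

A mismatch is not necessarily an error, but it is a reason to ask the cluster administrators which MPI installation /usr/bin/mpirun actually launches.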

On Tue, Nov 3, 2009 at 4:20 PM, Dmitri Nilov <nilovdm_at_gmail.com> wrote:

> Yes, I've followed all these instructions. The program is
> Amber10/Sander.MPI. The serial tests are OK. Most of the parallel test cases
> finish with "possible FAILURE: check *.dif", and the corresponding
> sander.MPI.out files contain a similar error.
> Which test cases are most appropriate for analysing the outputs?
>
> > ./configure -mvapich ifort
> I suppose that means ./configure_amber -mpich ifort?
>
> I don't think there are serious mistakes in the InfiniBand or mvapich
> installation.
>
> Thanks!
> Dmitri Nilov,
> Lomonosov Moscow State University
>
> On Tue, Nov 3, 2009 at 2:36 AM, Ross Walker <ross_at_rosswalker.co.uk> wrote:
>
>> Are you certain it is linking to the correct version of infiniband?
>>
>> Make sure you do the following:
>>
>> I assume this is sander but similar instructions should be followed for
>> pmemd.
>>
>> 1) run > which mpif90
>>
>> Check that it is the path you expect. Check that it is the same path as
>> mpirun. Also check that the compute nodes use the same mpirun.
>>
>> 2) cd $AMBERHOME/src/
>> 3) make clean
>> 4) Update your MPI_HOME to point to the NEW mpi location
>> 5) ./configure -mvapich ifort
>> 6) make parallel
>> 7) Run the test suite in parallel and see if this works - probably easiest
>> to request an interactive session on your cluster and then set DO_PARALLEL
>> to the correct run command, e.g. "mpirun -np 8 -machinefile $PBS_NODEFILE",
>> and then cd $AMBERHOME/test/; make test.parallel
>>
>> If this crashes then I would check to make sure the new MVAPICH is
>> actually working properly. There should be a test suite with it that
>> checks it is working. Is it definitely using the correct version, e.g. the
>> 64 bit version on x86_64?
>>
>> Note, if you just recompiled without making clean and without building a
>> new config_amber.h file and updating your MPI_HOME then it likely has been
>> built with a mix of the old and new versions of MPI, which is probably what
>> is causing your problems.
>>
>> Also make sure you are up to date on all the bugfixes.
>>
>> All the best
>> Ross
>>
>> > -----Original Message-----
>> > From: amber-bounces_at_ambermd.org [mailto:amber-bounces_at_ambermd.org] On
>> > Behalf Of Dmitri Nilov
>> > Sent: Monday, November 02, 2009 5:11 AM
>> > To: AMBER Mailing List
>> > Subject: Re: [AMBER] MPI process terminated unexpectedly after cluster
>> > upgrade
>> >
>> > Yes, I've recompiled Amber, but I couldn't change mvapich because I'm
>> > just a client on a serious cluster :)
>> >
>> > On Mon, Nov 2, 2009 at 3:15 PM, Jason Swails <jason.swails_at_gmail.com> wrote:
>> >
>> > > It could be that the new version of mvapich broke the previous
>> > > installation, since the libraries could easily have changed (and if it's
>> > > really, in fact, a new version, I'd bet on it since there's not much else
>> > > that could 'change'). Did you try recompiling?
>> > >
>> > > Do the test cases still pass? If not, I'd say your only options are to
>> > > recompile amber/pmemd in parallel or revert back to the old version of
>> > > mvapich if it's still on the cluster.
>> > >
>> > > Good luck!
>> > > Jason
>> > >
>> > > On Mon, Nov 2, 2009 at 4:17 AM, Dmitri Nilov <nilovdm_at_gmail.com> wrote:
>> > >
>> > > > Hello!
>> > > > Sander.MPI jobs have been crashing right after launch since the
>> > > > mvapich software was upgraded on the cluster.
>> > > > Sander.MPI.out contains:
>> > > >
>> > > > MPI process terminated unexpectedly
>> > > > Exit code -5 signaled from node-23-06
>> > > > Killing remote processes...forrtl: error (69): process interrupted (SIGINT)
>> > > > Image              PC                Routine   Line      Source
>> > > > libpthread.so.0    00007F2132C1EB00  Unknown   Unknown   Unknown
>> > > > libpthread.so.0    00007F2132C1DB7E  Unknown   Unknown   Unknown
>> > > > libmpich.so.1.0    00007F21334CB1AC  Unknown   Unknown   Unknown
>> > > > libmpich.so.1.0    00007F21334E1ADE  Unknown   Unknown   Unknown
>> > > > libmpich.so.1.0    00007F21334C050A  Unknown   Unknown   Unknown
>> > > > libmpich.so.1.0    00007F21334A2DED  Unknown   Unknown   Unknown
>> > > > libmpich.so.1.0    00007F21334A1DC6  Unknown   Unknown   Unknown
>> > > > sander.MPI         000000000093A0EF  Unknown   Unknown   Unknown
>> > > > sander.MPI         00000000004BC222  Unknown   Unknown   Unknown
>> > > > sander.MPI         000000000041E05C  Unknown   Unknown   Unknown
>> > > > libc.so.6          00007F213216ACF4  Unknown   Unknown   Unknown
>> > > > sander.MPI         000000000041DF69  Unknown   Unknown   Unknown
>> > > > forrtl: error (69): process interrupted (SIGINT)
>> > > > and so on..
>> > > >
>> > > > I've found a similar problem at
>> > > > http://archive.ambermd.org/200907/0092.html, which still seems to be
>> > > > unsolved.
>> > > > I don't think it's an InfiniBand problem. So what should I do?
>> > > >
>> > > > Thanks a lot!
>> > > > Dmitri Nilov,
>> > > > Lomonosov Moscow State University
>> > > >
>> > > > _______________________________________________
>> > > > AMBER mailing list
>> > > > AMBER_at_ambermd.org
>> > > > http://lists.ambermd.org/mailman/listinfo/amber
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > ---------------------------------------
>> > > Jason M. Swails
>> > > Quantum Theory Project,
>> > > University of Florida
>> > > Ph.D. Graduate Student
>> > > 352-392-4032
>> > >
>>
>>
>>
>
>
_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber
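
The rebuild procedure Ross describes in the quoted message can be collected into a single sequence. The sketch below is a dry run that only prints the commands rather than executing them; the function name print_rebuild_steps is hypothetical, and the MPI_HOME value is an assumption inferred from the mpif90 location mentioned at the top of this message. The configure line is reproduced as Ross wrote it; Dmitri's reply suggests that the Amber10 script may be named configure_amber, so that line should be verified against the local installation.

```shell
#!/bin/sh
# Dry run: print the rebuild commands instead of executing them.
# Assumptions (not confirmed in the thread): the MPI_HOME value and the
# process count are placeholders to adapt to the actual cluster.
print_rebuild_steps() {
    mpi_home=$1      # root of the NEW MPI installation
    nprocs=$2        # number of MPI processes for the parallel test suite
    cat <<EOF
cd \$AMBERHOME/src
make clean
export MPI_HOME=$mpi_home
./configure -mvapich ifort
make parallel
export DO_PARALLEL="mpirun -np $nprocs -machinefile \$PBS_NODEFILE"
cd \$AMBERHOME/test
make test.parallel
EOF
}

print_rebuild_steps /usr/lib/mvapich-intel-x86_64 8
```

Printing the steps first makes it easy to review them, or paste them into an interactive session on a compute node, before anything is rebuilt.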