AMBER Archive (2007)

Subject: Re: AMBER: MPI Quiescence problem in REMD

From: In Hee Park (ipark_at_chemistry.ohio-state.edu)
Date: Wed Jul 11 2007 - 11:41:47 CDT


> does the same system perform well with MD?
yes, I tested a single MD run using one of the 64 replicas from the REMD
setup, and it worked without any problem.

> did you equilibrate each temperature first (outside REMD)?
yes

> did you get any output in the mdout files? remlog?
yes, I see the normal output files (crd, rst, out, and mdout for every
replica) being updated regularly up to some time step, but then the REMD
run terminated with the `MPI Quiescence` message below and an empty rem.log.
===
MPIRUN: MPI progress Quiescence Detected.
MPIRUN: 48 out of 64 ranks showed no MPI send or receive progress in 900 seconds.
===
However, I just noticed that only 16 of the 64 replicas reached the final
time step (I set NSTEP=1000), whereas the other 48 replicas stopped at an
earlier step (e.g. NSTEP=800, 900, or 950), which seems to be relevant to
the MPI Quiescence message shown above.
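
I found where each replica stopped by checking the last "NSTEP =" line in
each replica's mdout, e.g. with something along these lines (the file-name
pattern is only an example; adjust it to your own output names):
===
# print the last "NSTEP =" line of every replica's mdout
for f in *.mdout.*; do
  echo "$f: $(grep 'NSTEP =' "$f" | tail -n 1)"
done
===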

Do I need to extend the number of time steps for all the replicas?
_____________
In-Hee Park

[2007-07-11.Wed.5:28am] Carlos Simmerling wrote `Re: AMBER: MPI Quiescence...'

  it's hard to guess what's going on.
  does the same system perform well with MD?
  did you equilibrate each temperature first (outside REMD)?
  did you get any output in the mdout files? remlog?
  there just isn't enough info to help except that it's likely
  some of the replicas have crashed. do they still show as
  running on the nodes?

  On 7/11/07, In Hee Park <ipark_at_chemistry.ohio-state.edu> wrote:
> Dear Amber users,
>
> I would like to ask about the `MPI Quiescence` problem that I've
> encountered during REMD.
>
> I was trying two sets of REMD runs, each consisting of 64 replicas, for
> (1) a monomer and (2) a protein dimer system, using Amber9 on an
> AMD/SuSE 10.1 cluster.
>
> For the (1) monomer system, REMD worked well through the
> temperature-exchange production run, as shown in the files attached to
> this message: "monomer-REMD.result" and "monomer-REMD-pr-Texchange.out".
>
> In contrast to the monomer case, for the (2) dimer system I performed an
> additional step -- using the NMR restraint option -- to prevent overflow
> of the rst files during equilibration to each of the 64 target
> temperatures, and I did obtain non-overflowing rst files.
>
> However, the dimer REMD (temperature-exchange) run still does not work;
> it ended with the following message:
> ===
> MPIRUN: MPI progress Quiescence Detected.
> MPIRUN: 48 out of 64 ranks showed no MPI send or receive progress in 900 seconds.
> ===
>
> To check whether the MPI communication problem is related to the cluster
> itself or rather to the dimer REMD setup, I ran the monomer REMD again
> (since it had already been confirmed to work well). The monomer REMD ran
> fine as usual, so the MPI system itself does not seem to be the problem;
> beyond that I have no clue, so I am asking for your help on this issue.
>
> For your information, I have attached both the monomer and dimer REMD
> results (showing how the temperature exchanges proceed) and the output
> messages from both runs.
>
> Has anyone encountered this kind of problem before? Thanks a lot.
>
> _____________
> In-Hee Park
>
> [2007-06-27.Wed.4:14pm] Carlos Simmerling wrote `Re: AMBER: rst overflow...'
>
> I would suggest trying a distance restraint on the center of mass, using
> the NMR restraint option and setting the r2 distance short (1A)
> and r3 larger, say 100A (e.g. something like the sketch below). That way
> it can move between these distances without any penalty but cannot move
> farther apart than 100A. I have to say, though, that once the dimer
> dissociates you will have a hard time getting it back.
> I am not aware of any studies using REMD on multiple chains except for
> looking at oligomerization of short chains under periodic boundary
> conditions. I would check the literature to see the current state of the
> art for figuring out protein-protein interactions - I don't think MD is
> the way to go. If you know the interface and just want to optimize it,
> then using shorter distances in the restraint to keep it from
> dissociating would be better, but you'll have to go carefully and may
> have to try many variations to find a protocol that works well. It all
> depends on what you mean by "drastic" changes. I would consider it an
> unsolved research problem.
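>
> A minimal sketch of such a flat-bottom center-of-mass restraint file
> (DISANG format); the igr1/igr2 atom lists, the r1/r4 bounds, and the
> force constants below are placeholders you would adapt to your dimer:
> ===
> # COM distance restraint between the two monomers (placeholder atom lists)
>  &rst
>    iat=-1,-1,
>    igr1=1,2,3,4,
>    igr2=501,502,503,504,
>    r1=0.5, r2=1.0, r3=100.0, r4=105.0,
>    rk2=10.0, rk3=10.0,
>  &end
> ===
> With iat=-1,-1, sander restrains the distance between the centers of mass
> of the igr1 and igr2 atom groups, and the penalty is zero between r2 and
> r3. To use it you would also set nmropt=1 in &cntrl and point to the file
> with a DISANG line in the mdin.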
>
> On 6/27/07, In Hee Park <ipark_at_chemistry.ohio-state.edu> wrote:
> >
> > Dr. Simmerling,
> >
> > Thanks for your critical help; your prediction was correct. The dimer
> > equilibration to the target temperatures ended up overflowing again,
> > even with the "NSCM" option.
> >
> > Could you give me more guidance on your suggestion of setting a
> > restraint to keep the centers of mass from getting too far apart? Since
> > I am interested in conformational changes around the dimer interface,
> > which is at the center of mass, I am a bit hesitant to simply set up
> > the typical group restraint around that interface.
> >
> > As long as I am concerned with capturing rather drastic conformational
> > changes around the interface, would LES on the dimer be better?
> >
> > Thanks for your help.
> >
> > _____________
> > In-Hee Park
> >
> > [2007-06-26.Tue.11:35am] Carlos Simmerling wrote `Re: AMBER: rst overflow...'
> >
> > with a dimer I am not sure if that is correct if the problem is that the
> > CM of the system (the dimer) is still at the origin.
> > the monomers may drift far apart. in essence, what you are simulating
> > is two monomers at infinite dilution. you probably should set a restraint
> > to keep the centers of mass from getting too far apart, or write some code
> > to keep the monomers inside a virtual box.
> >
> > On 6/26/07, David A. Case <case_at_scripps.edu> wrote:
> > >
> > > On Tue, Jun 26, 2007, In Hee Park wrote:
> > > >
> > > > Although setting "iwrap=1" is recommended to keep the coordinate
> > > > output from overflowing the trajectory file format, that option can
> > > > be used for PME runs only. Is shifting to explicit (or hybrid)
> > > > solvent REMD now the only way to make my dimer REMD possible? Is
> > > > there no way to resolve the overflow problem under GB?
> > > >
> > >
> > > I think the nscm option can be used to do what you want for GB runs.
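> > >
> > > A minimal sketch of the relevant &cntrl settings (the igb and nscm
> > > values here are only examples; all other keywords are omitted):
> > > ===
> > > &cntrl
> > >   ntb=0,       ! non-periodic, as required for GB
> > >   igb=5,       ! GB model; use whichever you already run
> > >   nscm=1000,   ! remove COM translation/rotation every 1000 steps
> > > &end
> > > ===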
> > >
> > > ...dac
> > >
> > >

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu