AMBER Archive (2007)

Subject: RE: AMBER: Test fails in Parallel version Amber9

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Fri Oct 19 2007 - 09:24:47 CDT


Hi Gong,

I suspect the problem you are seeing is that an exchange is being triggered
in parallel but not in serial. This may simply be a consequence of rounding
differences caused by running in parallel. Trajectories often diverge when
run on different numbers of processors. There is nothing inherently wrong
with this; it just reflects the sensitivity of Newton's equations of motion
to small perturbations. In parallel the order of operations is different, so
rounding differences occur, and as you integrate over time these can magnify
rapidly.
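
To make the order-of-operations point concrete, here is a minimal Python
sketch (not Amber code). The helper names ordered_sum and chunked_sum are
made up for this illustration, and the values are deliberately exaggerated
so the effect is easy to see; in a real force or energy sum the difference
would normally show up only in the last few digits. Each chunk stands in for
the partial sum one processor would compute before the partial results are
combined.

```python
def ordered_sum(values):
    """Add the values strictly left to right, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, nchunks):
    """Sum contiguous chunks separately, then add the partial sums.

    This mimics (very roughly) each processor reducing its own share of
    the terms before a final reduction across processors.
    """
    size = -(-len(values) // nchunks)  # ceiling division
    partials = [ordered_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return ordered_sum(partials)


# Hypothetical terms: two large contributions that cancel, two small ones.
values = [1.0e17, 1.0, -1.0e17, 1.0]

print("1 chunk :", chunked_sum(values, 1))
print("2 chunks:", chunked_sum(values, 2))
```

The inputs are identical in both calls; only the grouping of the additions
changes, and the two totals differ.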

I suspect that in this case the decision on whether to exchange or not is
right on the cusp, so that the rounding difference just happens to make it
exchange where before it didn't. Of course, once an exchange is made in
REMD the target temperature changes, and so the trajectory will diverge
very quickly.
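
The "on the cusp" situation can be sketched with the textbook Metropolis
criterion for temperature REMD, P = min(1, exp[(beta_i - beta_j)(E_i - E_j)])
with beta = 1/(kB*T). This is an illustrative Python sketch and is not taken
from the sander source; the function names, the energies and the random
number are all hypothetical, chosen so that the random draw sits exactly on
the acceptance threshold.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(e_i, t_i, e_j, t_j):
    """Textbook Metropolis probability for swapping the temperatures of
    replica i (potential energy e_i at t_i) and replica j (e_j at t_j)."""
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # favourable swap, always accepted
    return math.exp(delta)


def exchange_accepted(e_i, t_i, e_j, t_j, rand):
    """Accept the swap when the uniform random number is below P."""
    return rand < exchange_probability(e_i, t_i, e_j, t_j)


# Hypothetical potential energies (kcal/mol) for neighbours at 300 K and 400 K.
e_300, e_400 = -4.0, -3.0

# Assume the random number drawn for this attempt lands exactly on the threshold.
rand = exchange_probability(e_300, 300.0, e_400, 400.0)

reference = exchange_accepted(e_300, 300.0, e_400, 400.0, rand)
# Same attempt, but the 300 K energy differs in the fourth decimal place.
perturbed = exchange_accepted(e_300 + 1.0e-4, 300.0, e_400, 400.0, rand)

print("reference run exchanges:", reference)
print("perturbed run exchanges:", perturbed)
```

A shift of 0.0001 kcal/mol, the same size as the last-digit differences in
the jar_multi diff quoted below, is enough to flip the decision when the
random number happens to fall that close to the threshold.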

The question is whether what you are seeing is innocuous, just a function
of this test case being very sensitive to this rounding difference, or
whether it is a bug somewhere in the code. Can you try running this test
case with different numbers of processors? I'm not sure if it will run in
serial, but if it will, please try it. Then try 4, 8, 12 and 16 and see
what happens. This will help pin down what is going on.

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

> -----Original Message-----
> From: owner-amber_at_scripps.edu
> [mailto:owner-amber_at_scripps.edu] On Behalf Of gong wb
> Sent: Thursday, October 18, 2007 19:01
> To: amber_at_scripps.edu
> Subject: Re: AMBER: Test fails in Parallel version Amber9
>
> Hi, all
> Does anyone know how to figure out these problems?
>
> On 10/17/07, gong wb <bnmrcamber_at_gmail.com> wrote:
> > Dear Scott,
> > Thanks for your reply. We have checked ptraj.out and found that we
> > had only used -bintraj in the parallel compile, so ptraj did not
> > support bintraj. We therefore recompiled both serial and parallel
> > amber9 with -bintraj. This time we passed the bintraj test, but we
> > still got three possible failures, which are the same as we reported
> > last time.
> > Here is the diff file:
> > possible FAILURE: check mdout.jar.001.dif
> > /public/amber9/test/jar_multi
> > 177c177
> > < Etot = -3538.3785 EKtot = 478.2764 EPtot = -4016.6548
> > ---
> > > Etot = -3538.3784 EKtot = 478.2764 EPtot = -4016.6548
> > 180c180
> > < EELEC = -18.4200 EGB = -2503.6434 RESTRAINT = 3.6286
> > ---
> > > EELEC = -18.4199 EGB = -2503.6434 RESTRAINT = 3.6286
> > ---------------------------------------
> > possible FAILURE: check rem.log.dif
> > /public/amber9/test/rem_gb_4rep
> > 26c26
> > < 2 1.15 234.76 -3.24 300.00 400.00 0.80
> > ---
> > > 2 1.15 261.02 -4.61 300.00 400.00 0.80
> > ---------------------------------------
> > possible FAILURE: check reminfo.000.dif
> > /public/amber9/test/rem_gb_4rep
> > 16,20c16,20
> > < NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 234.76 PRESS = 0.
> > < Etot = 21.0164 EKtot = 24.2585 EPtot = -3.2421
> > < BOND = 14.3725 ANGLE = 19.8208 DIHED = 25.4361
> > < 1-4 NB = 5.7103 1-4 EEL = 182.5250 VDWAALS = -5.9319
> > < EELEC = -213.6574 EGB = -31.5175 RESTRAINT = 0.
> > ---
> > > NSTEP = 100 TIME(PS) = 100.800 TEMP(K) = 261.02 PRESS = 0.
> > > Etot = 22.3628 EKtot = 26.9719 EPtot = -4.6092
> > > BOND = 14.9791 ANGLE = 17.9986 DIHED = 25.4386
> > > 1-4 NB = 5.6257 1-4 EEL = 182.4183 VDWAALS = -5.8981
> > > EELEC = -213.4152 EGB = -31.7563 RESTRAINT = 0.
> > ---------------------------------------
> > And now we can say that the last two possible failures are
> > reproducible. How can we figure them out?
> >
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber_at_scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
>
