AMBER Archive (2004)

Subject: Re: AMBER: Minimization error

From: opitz_at_che.udel.edu
Date: Wed Aug 25 2004 - 15:55:24 CDT


Dear Dr. Walker,

I checked the first two of your suggestions already.
No, shake was not turned on.
Secondly, I went back and ran the tests again. They were fine with 2
processors; only one test reported a possible failure. The same was true
for 1 processor. The output there was as follows:
119a120
> CHECK_SKIN FAILED: maxdis = 1.0656 9 steps since last update
126d126
< CHECK_SKIN FAILED: maxdis = 1.0656 9 steps since last update
To run it with 1 processor I have to use mpirun -np 1. Would this make a
difference somehow?
I will try your other suggestions soon, I just wanted to know the answer
to these questions sooner.
As I do not know if the Amber installation on the cluster has passed the
tests, would it make more sense to contact that administrator and ask
about the tests before performing the short MD runs you suggested?

Best Regards,

Armin

==============Original message text===============
On Wed, 25 Aug 2004 13:23:13 EDT "Ross Walker" wrote:

Dear Armin

> The tests that Dr. Walker asked about, they passed on Octane with just a
> few warnings, but they turned out to be rounding errors. On the cluster I
> am not certain, as the system administrator had installed Amber7 there.

If the tests pass on the octane then you should be able to trust the results
you get from it. Just a note though - did you run the tests in parallel for
the same number of processors you have been using for the minimisation? I.e.
did you setenv DO_PARALLEL "mpirun -np 2" for example before running the
test? Often things can run perfectly on a single processor but when you move
to multiple processors you are more likely to have problems caused by
compiler / mpi optimisations etc.
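
As a rough illustration of the check above, here is a minimal Python sketch
that builds the environment for a test run at a given processor count. Only
the DO_PARALLEL variable and the "mpirun -np N" value come from this thread;
the function name is hypothetical, and the command that actually launches the
tests depends on how Amber is installed.

```python
import os


def parallel_test_env(nproc, base_env=None):
    """Return a copy of the environment with DO_PARALLEL set for nproc processes.

    Assumption: the test scripts read DO_PARALLEL, as described above.
    """
    if nproc < 1:
        raise ValueError("nproc must be at least 1")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["DO_PARALLEL"] = f"mpirun -np {nproc}"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run
when launching the tests, once for each processor count used in the
minimisation.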

Minimisation is a very difficult thing to get reproducible on different
numbers of processors due to rounding errors. However, your energies are
wildly different and so I am quite suspicious of the results. One quick
check - ensure you haven't got SHAKE turned on during the minimisation.

Try taking one of the structures you got and run MD with that structure on
1, 2, 4 CPUs etc. on your Octane and your cluster. Run it only for about 100
steps, and set ntpr=1 so it prints the info on every step. MD on different
machines and numbers of CPUs will diverge over time, again due to rounding
errors. However, over 100 steps you should get trajectories that are
almost identical to each other (if you used the same starting structure). If
the first 5 steps or so are NOT identical (to the last couple of decimal
places) then something is definitely wrong. Post your output files at this
point and we can take a look and see if something is not getting broadcast
correctly in the parallel run.
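
The comparison described above could be automated along these lines. This is
only a sketch: it assumes each printed step in the output contains
"NSTEP = <n>" followed by a line with "Etot = <value>", and the function
names, the tolerance, and the 5-step window are illustrative choices rather
than anything prescribed by Amber.

```python
import re

# Assumed output layout: "NSTEP = <int>" followed by "Etot = <float>".
STEP_RE = re.compile(r"NSTEP\s*=\s*(\d+)")
ETOT_RE = re.compile(r"\bEtot\s*=\s*(-?\d+(?:\.\d+)?)")


def read_energies(text):
    """Map step number -> first Etot value seen after that step header."""
    energies = {}
    step = None
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            step = int(m.group(1))
        e = ETOT_RE.search(line)
        # Keep only the first value per step so later summary blocks
        # that repeat a step number do not overwrite it.
        if e and step is not None and step not in energies:
            energies[step] = float(e.group(1))
    return energies


def first_divergence(a, b, tol=1e-3, max_step=5):
    """Return the first step <= max_step where energies differ by more
    than tol, or None if all compared steps agree."""
    for step in sorted(set(a) & set(b)):
        if step > max_step:
            break
        if abs(a[step] - b[step]) > tol:
            return step
    return None
```

Reading the output of the 1-CPU and 2-CPU runs with read_energies and passing
both results to first_divergence gives the earliest step at which the total
energies disagree, which is the step worth looking at first.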
 
> I have appended the final results output from each of the runs as well as
> attached an excel spreadsheet with these results a bit more organized. My
> question is, which of these states is acceptable? Are all of these wrong?

Since your differences are so huge I suspect that yes indeed something is
wrong. But it will take a bit to pin it down. The energy values on the first
step would be much more helpful for diagnosis than the last step. Are you
sure the octane tests passed for both 1 and 2 cpus?

All the best
Ross

/\
\/
|\oss Walker

| Department of Molecular Biology TPC15 |
| The Scripps Research Institute |
| Tel:- +1 858 784 8889 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk/ | PGP Key available on request |

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu