AMBER Archive (2002)

Subject: Different numbers on different architectures.

From: David Smith (David.Smith_at_cup.uni-muenchen.de)
Date: Fri Oct 04 2002 - 08:33:24 CDT


Hello all,

Up until now I have been running AMBER on linux systems compiled
with g77. Just recently I got access to some new platforms on which I
attempted to compile AMBER. I would like to use this mail to report on
my experiences and, in particular, ask about the seriousness of the
differing results I get between these builds.

For the benefit of those who might be able to help me I will try to
include as much relevant detail as possible, so the mail may end up
being rather long.

I am still running with AMBER 6 (sorry, I like sander_classic), which I
understand is not supported anymore. However, I think most of the issues
are not version specific, especially as my main interest is in the
results coming out of gibbs.

For each build I ran the tests provided as well as some jobs of my own.

A) G77

For example, on an AMD processor using an unmodified version of
Machine.g77, the gibbs tests pass with only one possible failure which
is the following in gibbs_2.out.dif:

----
144,146c144,146
<  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=  0.00000E+00
<  slope*delta(LAMBDA)= 0.00000E+00  corr. coef.= 0.000000  pts for line=   0.00
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=   0.0000
>  slope*delta(LAMBDA)=  0.0000      corr. coef.= 0.000000  pts for line=   0.00
>  delA(for)-delA(rev)=  0.0000      multiplier=  1.0000
----

This is only a difference in the output format of the zeros.

I also have a test job of my own I have been using. This is a perturbation in a small box of water with periodic boundary conditions in the NTP ensemble. I use internal constraints via intr=1 and itor=2. I equilibrate this box using gibbs at lambda=1 by setting NSTMEQ > NSTLIM. I then use the restart file from this run to start the perturbation (51 2ps steps with electrostatic decoupling and Thermodynamic Integration, total 102ps for the electrostatic part), reading in the positions, velocities and box dimensions via ntx=7. The input file is as follows:

perturbation of bu1 (to bu3) in water
 &cntrl
   irest=0, ntx=7, init=4, ntb=2,
   nrun=51, dt=0.001, nstlim=2000, nstmeq=600, nstmul=1400,
   ntc=2, ntf=2, intr=1,
   temp0=300.0, tautp=0.5, ntt=1,
   ntp=1, pres0=1.0, taup=0.5, npscal=1,
   cut=9.0, scnb=2.0, scee=1.2, dielc=1.0, cutprt=12.0, nsnb=20,
   ielper=1, intprt=0, idifrg=1, isande=1,
   almda=1.0, almdel=0.02, isldyn=-3,
   ntpr=200, ntwx=200,
 &end
00002 00001 00000 00000 00000 00000 00002 000.00000 001.00000 0000.00000 001.82532 000.00000 001.82795 0000 0000
other internal constraints ...
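The run length implied by these settings can be checked with a quick calculation. The following is only a sketch in Python (not part of AMBER). It assumes that nstmeq and nstmul are the equilibration and data-collection steps within each window, and that lambda is stepped down by almdel once per window; both are my reading of the input rather than something the output confirms.

```python
# Values copied from the &cntrl namelist above.
dt = 0.001      # time step in ps
nstlim = 2000   # MD steps per window
nstmeq = 600    # equilibration steps per window (assumed meaning)
nstmul = 1400   # data-collection steps per window (assumed meaning)
nrun = 51       # number of windows
almda = 1.0     # starting lambda
almdel = 0.02   # lambda step per window

ps_per_window = nstlim * dt
total_ps = nrun * ps_per_window

# Assumption: lambda decreases by almdel once per window.
lambdas = [round(almda - i * almdel, 2) for i in range(nrun)]

print(f"{ps_per_window:.1f} ps per window, {total_ps:.1f} ps in total")
print(f"lambda runs from {lambdas[0]:.2f} to {abs(lambdas[-1]):.2f}")
```

Under these assumptions the 51 windows of 2 ps give the 102 ps quoted above, and the last window sits at lambda = 0, which matches the final energy line.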

run with:

gibbs -O -i ti-100e.in -o ti-100e.out -r ti-100e.rst \
      -p bu1-s-pert.top -c equil.rst -ms ti-100e.sum -x ti-100e.crd

and the final energy is:

Lambda = 0.000000 F_energy = 0.40692 Enthalpy = 0.76915 T*Entropy = 0.36223

B) PGF77

I recently got hold of the Portland compiler and thought I'd give it a go. I used an unaltered Machine.pgf77 file. This time gibbs_2.out.dif has:

----
146c146
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delA(for)-delA(rev)= 0.92044E-15  multiplier=  1.0000
----

which I thought was a pretty small difference.
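A difference of order 1E-15 is the size that double-precision rounding alone can produce. As a minimal sketch (plain Python, not AMBER code, and assuming only that different compilers may evaluate the same sum in a different order), the grouping of three additions is enough to change the last bits of the result:

```python
# The same three numbers summed with two different groupings.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)
difference = left_to_right - right_to_left

print(f"(a + b) + c = {left_to_right!r}")
print(f"a + (b + c) = {right_to_left!r}")
print(f"difference  = {difference:.5e}")
```

Whether this is actually what happens inside gibbs on these builds is a guess on my part; the sketch only shows that a non-zero value at this level does not by itself indicate a bug.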

For my job above, I used the same input file with the same script and the same PINCRD (equil.rst).

The final energy is pretty close to before (at least G if not H and S) but not quite the same:

Lambda = 0.000000 F_energy = 0.39719 Enthalpy = 0.20729 T*Entropy = -0.18991

In addition, a window-by-window comparison shows substantial differences (e.g. window 35 has F=0.0112 for g77 and F=0.0057 for pgf77).

C) Alpha Linux

I also got hold of a couple of Compaq workstations and wanted to try Alpha Linux (RedHat 7.2), which I recently installed on them. I got the Compaq compilers and used the following (slightly modified) version of Machine.alpha_linux:

setenv MACHINE "DEC Alpha linux"
setenv MACH AXP_OSF
setenv MACHINEFLAGS " -DPREC -DREGNML -DEWALD -DHAS_FTN_ERFC"

# CPP is the cpp for this machine

setenv CPP "/lib/cpp -traditional"

# SYSDIR is the name of the system-specific source directory for makemake

setenv SYSDIR Machines/alpha

setenv LOADLIB "/usr/lib/libcxml.a "

# COMPILER ALIASES:

setenv CC "ccc "
setenv LOADCC "ccc "
setenv VENDOR_BLAS yes
setenv VENDOR_LAPACK yes

# LOADER/LINKER:

setenv LOAD "fort -convert big_endian "
setenv L0 "fort -arch host -extend_source -convert big_endian -c -tune host -fast "
setenv L1 "fort -arch host -extend_source -convert big_endian -c -O -tune host -fast "
setenv L2 "fort -arch host -extend_source -convert big_endian -c -O -tune host -fast "
setenv L3 "fort -arch host -extend_source -convert big_endian -c -O5 -tune host -fast -unroll 3 "

# ranlib, if it exists

setenv RANLIB ranlib

#--------------------------------

and then I had to compile leap separately using gcc but without the -taso, -non_shared, and -ldnet_stub flags in the $AMBERHOME/src/leap/src/leap/Imakefile.

This time gibbs_2.out.dif has:

----
144,146c144,146
<  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=  0.00000E+00
<  slope*delta(LAMBDA)= 0.00000E+00  corr. coef.= 0.000000  pts for line=   0.00
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delta(LAMBDA)=0.2500000E-01  dA/d(LAMBDA) [SLOPE]=   0.0000
>  slope*delta(LAMBDA)=  0.0000      corr. coef.= 0.000000  pts for line=   0.00
>  delA(for)-delA(rev)= 0.65746E-15  multiplier=  1.0000
----

Again I ran my test job and the final answer is:

Lambda = 0.000000 F_energy = 0.41389 Enthalpy = 0.33619 T*Entropy = -0.07770

Once again, the window-by-window comparison shows substantial differences.

D) SGI

Finally, on an SGI (8 x R12000), I compiled with an unaltered Machine.sgi (I used the sgi_mpi for sander).

gibbs_2.out.dif:

----
102a103
> | Running shared memory parallel version on 4 processors
146c147
<  delA(for)-delA(rev)= 0.00000E+00  multiplier=  1.0000
---
>  delA(for)-delA(rev)= 0.26298E-15  multiplier=  1.0000
----

and my job gives:

Lambda = 0.000000 F_energy = 0.37039 Enthalpy = 0.31467 T*Entropy = -0.05572

With MACHINE=Machine.sgi_nopar all the tests pass, but my job gives the same answer.

The window by window comparison is as before (quite different on all versions).

In summary:

Water box.

compiler/architecture    F_energy

g77/i686                 0.40692
pgf77/i686               0.39719
compaq/alpha             0.41389
mips/sgi                 0.37039
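To put a number on the scatter in the water box results, the four values can be summarised as follows. This is a small Python sketch using only the F_energy values listed above; the choice of mean, sample standard deviation and range as summary statistics is mine.

```python
import statistics

# Final F_energy for the water box on each build, as listed above.
f_energy = {
    "g77/i686": 0.40692,
    "pgf77/i686": 0.39719,
    "compaq/alpha": 0.41389,
    "mips/sgi": 0.37039,
}

values = list(f_energy.values())
mean = statistics.mean(values)
stdev = statistics.stdev(values)      # sample standard deviation (n - 1)
spread = max(values) - min(values)    # largest minus smallest value

print(f"mean   = {mean:.5f}")
print(f"stdev  = {stdev:.5f}")
print(f"spread = {spread:.5f}")
```

With only four builds the standard deviation is a rough indication at best, but it gives something to compare against the scatter between repeated runs on a single machine.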

I also have a 50ps FEP (double wide) run of a protein in a droplet, which I have not yet run under pgf77, but I do get:

Protein.

compiler/architecture    DG(forward)    DG(reverse)

g77/i686                   0.12555       -0.16087
compaq/alpha               0.17262       -0.21105
mips/sgi                   0.02955       -0.07174
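For the protein run, the forward and reverse values can be combined per build as sketched below in Python. I am assuming the usual double-wide convention here, namely that DG(forward) and DG(reverse) should be equal and opposite when the run has converged, so that their sum measures the hysteresis and half their difference is the combined estimate.

```python
# (DG(forward), DG(reverse)) for the protein run on each build, as listed above.
protein = {
    "g77/i686": (0.12555, -0.16087),
    "compaq/alpha": (0.17262, -0.21105),
    "mips/sgi": (0.02955, -0.07174),
}

summary = {}
for build, (forward, reverse) in protein.items():
    hysteresis = forward + reverse        # zero if forward = -reverse
    estimate = (forward - reverse) / 2.0  # assumed combined estimate
    summary[build] = (hysteresis, estimate)
    print(f"{build:14s} hysteresis = {hysteresis:+.5f}  estimate = {estimate:.5f}")
```

This puts the forward/reverse mismatch within each build next to the differences between builds, which is the comparison I am really asking about.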

I know that both of the runs are quite short, but I was expecting somewhat better agreement. I would like to know whether the developers have a feel for whether this is the normal amount of noise observed across different architectures, or whether I have done something obviously wrong.

Is there any way to reduce this variation, or is it something one just has to live with? In general, I extend my runs until they are long enough to show convergence. It seems to me that I cannot do this by continuing a run across different architectures. Would everybody agree?

If you got this far, thanks for your patience and I will really appreciate any comments anybody has.

---------------------------------------
Dr. David Smith
Department of Chemistry
Ludwig Maximilians University
Butenandt-Str. 5-13, D-81377 Munich
Germany
Tel.: +49 (0)89 2180 7740
Fax.: +49 (0)89 2180 7738
e-mail: David.Smith_at_cup.uni-muenchen.de
---------------------------------------