AMBER Archive (2005)

Subject: AMBER: pmemd - bug report

From: Petr Kulhanek (kulhanek_at_chemi.muni.cz)
Date: Mon Jul 04 2005 - 11:33:05 CDT


Dear AMBER developers,

I would like to report a bug in PMEMD code (AMBER 8.0).

The bug is responsible that some atoms are incorrectly addressed to
respective subcells. The bug occures only if some atoms are very close
to box walls (confirmed) or if they are on subcell interfaces (just
hypothesis). As the result, program can crash or the non-bonded energy
is not calculated correctly. The bug probably depends on architecture
and/or used compiler.

Test cases (for IA32, ifort 8.1.024, and 1CPU) can be downloaded from:

segmentation fault
http://www.ncbr.chemi.muni.cz/~kulhanek/downloads/tst1.tar.gz

infinity Eel
http://www.ncbr.chemi.muni.cz/~kulhanek/downloads/tst2.tar.gz

incorrect Eel
http://www.ncbr.chemi.muni.cz/~kulhanek/downloads/tst3.tar.gz

In each archive, six results are present:

mdout_bug_reference - calculated by pmemd on coordinates that leads to
incorrect results
mdout_bug_imaged_reference - calculated by pmemd on coordinates, which
were processed by ptraj (solut is centered and system is imaged)
mdout_fix_reference - calculated by modified pmemd (see below) on
coordinates that leads to incorrect results with non-modified pmemd
mdout_fix_imaged_reference - calculated by modified pmemd (see below) on
coordinates, which were processed by ptraj (solut is centered and system
is imaged)
mdout_sander_reference - calculated by sander on coordinates that leads
to incorrect results with non-modified pmemd
mdout_sander_imaged_reference - calculated by sander on coordinates,
which were processed by ptraj (solut is centered and system is imaged)

The bug is located in following files:

ew_direct_cit.f90 -> subroutine get_fract_crds (confirmed)
fraction could be equal to 1.0 due to round errors, however the code
suppose that it has to be less than 1.0

ew_direct_cit.f90 -> subroutine setup_crd_idx_lst_tbl (confirmed)
indexes to crd_idx_lst_tbl array are not checked if they are in legal ranges
if fractions calculated in get_fract_crds are equal to 1.0 then indexes
overflow crd_idx_lst_tbl boundaries

ew_pairlist.f90 -> subroutine get_nb_list (confirmed) and probably
get_nb_list_nonorthog (not tested)
indexes to x_bkts, y_bkts, and especially z_bkts arrays are not checked
if they are in legal ranges
array overflows could occur even if fractions are in legal ranges
because the indexes are calculated directly from imaged coordinates
(different approach -> different round errors)

ew_recip_cit.f90 -> subroutine claim_recip_imgs (hypothesis) and
probably claim_recip_imgs_nonorthog (not tested)
z_bkt_lo and z_bkt_hi, the same as above

I prepared two patches for testing purposes. (reference source is amber8
distro + bugfixes 1-44)
http://www.ncbr.chemi.muni.cz/~kulhanek/downloads/check.patch
http://www.ncbr.chemi.muni.cz/~kulhanek/downloads/fix.patch

The first one extends code with boundary check in above mentioned
subroutines. If some overflow occures then the program is immediatelly
stop with error message. The second one fixes the problem (boundary
check is still present).
fix.patch is only intended for testing purposes because it does not
solve possible problems in simulations using truncated octahedral box
and it was not tested under parallel execution.

Some comments at the end:

I found this bug when I did postprocessing analysis of trajectory
calculated by pmemd (MD was run without any problems as I remember). In
the analysis, the electrostatic energy is calculated for several system
subparts still under PBC. In the reported cases, only solut and solvent
are present (ions were stripped away), so the test cases represent rare
systems. However observed errors depend only on coordinates and I do not
see any reason why it could not occur during normal MD !
I think that it is neccessary to carefully check the pmemd code and
remove possible problems with reported round errors when subcell indexes
are calculated. Secondly, it is also important to keep the parts, in
which subcell indexes are calculated, as same as possible to avoid
different round errors.

For SANDER developers:

Please look to tst1.tar.gz archive. mdout_sander_reference and
mdout_sander_imaged_reference differ. The difference is about 3 kcal/mol
in Eel as well as in Evdw. I think that this could be a bug in sander.

Best regards,
        Petr

-- 
##################################################
   Petr Kulhanek
  ------------------------------------------------
   kulhanek_at_chemi.muni.cz
  ------------------------------------------------
   National Centre for Biomolecular Research
   Masaryk University Brno
   Kotlarska 2, CZ-611 37 Brno
   Czech Republic
##################################################
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu