AMBER Archive (2007)

Subject: Re: AMBER: pmemd source issues

From: Thomas Zeiser (thomas.zeiser_at_rrze.uni-erlangen.de)
Date: Wed May 09 2007 - 10:00:00 CDT


Hi Robert,

On Wed, May 09, 2007 at 09:37:59AM -0400, Robert Duke wrote:
> Hi Thomas -
>
> Issue 1):
>
> I recently had the same question in my mind regarding some new code I am
> developing. None of the various ways I can think of to avoid this is
> particularly elegant, and they all further obfuscate the code (which is
> already fairly obfuscated, unfortunately). I tried looking at the
> standards to determine whether this is allowed - I could not find a
> definitive statement, and I have never seen a compiler refuse to compile
> code that does this.

The compiler did not refuse to compile it, but "-check all" produced
a runtime error reporting that an unallocated variable was referenced.

> So here's the deal. A perfectly good test case on
> something like this is the allocated() intrinsic. If you can pass stuff in
> and check its allocation status, that basically says this is okay - and I
> think you can.

> Something to remember about using extensive checks is that
> the checking code itself often does not get everything right.

I fully agree on that - and I only activated it to track down some
other issues we have with MPI startup (probably not an Amber
issue but somewhere between [Intel-MPI], mpiexec and Torque - but
that's a different story.)

> Anyway, I will
> play with this issue some to ensure there is no real issue here, but I
> think there is not a real issue here (access to igrp(*) is guarded by the
> ibelly flag, essentially).

correct, the access itself is guarded, so it's more or less only
cosmetic.

> Issue 2):
>
> There is extensive code in loadbal to guard against just this sort of
> thing. Did you observe an out-of-range value for my_img_lo in an actual
> run?

yes, it's real-world: it occurs in a production run of one of our users.

> Please send me your config.h and your test case as well as the output
> of ifort -V and an exact description of what happened, and I will look into
> it further.

my last config.h is attached; the MPI version probably does not
matter (I used mvapich2 in the last tests and Intel-MPI before)

the code was compiled with (output of ifort -V):
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications,
Version 9.1 Build 20070320 Package ID: l_fc_c_9.1.045
(and 10.0.017beta gives exactly the same result).

If you apply the attached trivial patch, you do not need to rely on
the compiler's range checking to see the problem.

I'm not yet sure how much the boundary violation depends on the
number of MPI processes; at least using
mpirun -np 128 ./pmemd -i bench_1jv2.in -p box_neutral.top -c bench_1jv2.crd
results in
        my_img_lo=193534 and img_cnt=193533
(at least on some processes or calls of the routine). Running with
64 processes gives exactly the same result. Running with 16
processes only does not trigger it.

Concerning the input files for the testcase: I have to check with
the user who provided them to me. I hope to be able to post a
download link soon.

> This is in the category of something that is possible, but the
> code has already taken the issue into consideration and been extensively
> tested at very high processor count where this sort of thing would be
> likely to happen (the basic problem is in dividing up the image workload -
> you have to be sure that it sums exactly to img_cnt, and if it doesn't all
> sorts of pandemonium would be expected).

sounds reasonable.
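
To make the invariant concrete, here is a minimal sketch in Python.
It is NOT pmemd's loadbal code (which divides the image workload with
its own logic); partition_images() and first_unmapped() are
hypothetical helpers for illustration only. It splits images
1..img_cnt into contiguous, 1-based inclusive ranges whose counts sum
exactly to img_cnt, and scans a range only while the index is still
inside it.

```python
from __future__ import annotations


def partition_images(img_cnt: int, nproc: int) -> list[tuple[int, int]]:
    """Split images 1..img_cnt into nproc contiguous 1-based (lo, hi) ranges.

    Hypothetical helper, not pmemd's algorithm. An empty range is (lo, lo - 1).
    """
    if img_cnt < 0 or nproc <= 0:
        raise ValueError("need img_cnt >= 0 and nproc > 0")
    base, extra = divmod(img_cnt, nproc)
    ranges = []
    lo = 1
    for rank in range(nproc):
        # The first `extra` ranks take one additional image each.
        cnt = base + (1 if rank < extra else 0)
        hi = lo + cnt - 1
        ranges.append((lo, hi))
        lo = hi + 1
    # Invariant: counts sum exactly to img_cnt and the ranges tile 1..img_cnt.
    assert sum(h - l + 1 for l, h in ranges) == img_cnt
    assert lo == img_cnt + 1
    return ranges


def first_unmapped(img_atm_map: list[int], lo: int, hi: int) -> int | None:
    """Return the first 1-based index in lo..hi with a negative entry, or None."""
    # range() is empty when lo > hi, so an empty range never reads img_atm_map(lo).
    for img_i in range(lo, hi + 1):
        if img_atm_map[img_i - 1] < 0:
            return img_i
    return None


if __name__ == "__main__":
    print(partition_images(10, 3))
    print(partition_images(5, 8))
```

Note that in this scheme an empty range is stored as lo = hi + 1, so
with more processes than images the trailing ranges have
lo = img_cnt + 1; any access at lo must therefore be guarded by
lo <= hi.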

> Regards - Bob Duke

Regards,

thomas

> ----- Original Message -----
> From: "Thomas Zeiser" <thomas.zeiser_at_rrze.uni-erlangen.de>
> To: <amber_at_scripps.edu>
> Sent: Wednesday, May 09, 2007 7:33 AM
> Subject: AMBER: pmemd source issues
>
>
> >Dear All,
> >
> >I compiled pmemd9 (including Amber9 patches 1-34) using the latest
> >Intel EM64T compiler and enabled extensive runtime error checking
> >(-g -traceback -check all). Two types of issues came up:
> >
> >1) constraints.f90 only allocates "atm_igroup" if "ibelly" is set.
> >degcnt() is called from runmd.f90 and "atm_igroup" is passed in all
> >cases. If ibelly is not set, the Intel runtime check now complains
> >about the dummy argument "integer :: igrp(*)", because an unallocated
> >variable is accessed.
> >An "allocate(atm_igroup(0))" in constraints.f90 solves this issue.
> >
> >A similar behaviour is observed for "gbl_loadbal_node_dat" which
> >gets only allocated on the master process (alltasks_setup.f90).
> >
> >I did not check the Fortran standard on whether using "type :: var(*)"
> >is allowed here or not (I guess "no", as an unallocated variable does
> >not have any defined bounds that could be used for the assumed-size
> >array) - but passing a valid variable seems to be a good idea anyway.
> >
> >
> >2) The probably more severe issue was detected in find_img_range()
> >from img.f90. At least for the test case I got from our chemistry
> >people, "my_img_lo" is one unit larger than "img_cnt"; thus, the
> >check "img_atm_map(img_i) .lt. 0" causes an array bounds violation.
> >I have no idea about the implications of that (or what a correct fix
> >would be).
> >
> >
> >Kind regards,
> >
> >Thomas Zeiser





-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu