AMBER Archive (2002)

Subject: Re: message during minimization

From: David A. Case (case_at_scripps.edu)
Date: Thu Dec 19 2002 - 10:05:00 CST


On Wed, Dec 18, 2002, Ioana Cozmuta wrote:

[problem was trying to run minimization on 128 processors]

> Here is a more explicit error message from my minimization run:
>
> Job 56060.lomax.nas.nasa.gov started on Wed Dec 18 17:47:11 PST 2002
> mpirun -np 128 $AMBERHOME/exe/sander -O -i ./box20Amin.in -o ./box20Amin_128cpu.out -p ./box20A.prmtop -c ./box20A.prmcrd -r ./box20A_128cpu.restrt -ref ./box20A.prmcrd -inf ./box20A.128cpu.mdinfo
>
> * NB pairs 171 2394 exceeds capacity ( 2424) 2
> SIZE OF NONBOND LIST = 2424
> EWALD BOMB in subroutine ewald_list
> Non bond list overflow!
> check MAXPR in locmem.f
>

OK...here's what is happening: Amber assumes that the nonbonded list can
be distributed roughly equally among all processors. With a large number
of CPUs, the "granularity" becomes coarse enough that the algorithm for
the division no longer works. In your case, the assumed size of the nonbonded
list for each processor is very small (only 2424 elements), but some
processors require more than this.

Go into locmem.f, search for where MAXPR is calculated, and increase its
estimate; you could easily give each processor 10 times as large a value.
Rebuild sander after the change, and that should get you going.
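
To make the granularity argument concrete, here is a toy sketch in Python. It is
not Amber's code: the function names, the 25% slack, and the made-up neighbor
counts are all assumptions chosen for illustration. It only shows how a
per-processor list size based on an even share can be adequate at low processor
counts, overflow at a high one, and be rescued by a 10x larger estimate.

```python
# Toy model only: none of these names or numbers come from Amber's locmem.f.

def neighbor_counts(n_atoms, period=8, heavy=90, light=10):
    """Made-up data: every `period`-th atom has `heavy` pairs, the rest `light`."""
    return [heavy if i % period == 0 else light for i in range(n_atoms)]

def per_processor_demand(counts, nproc):
    """Pairs each processor must store when atoms are split into equal blocks."""
    block = len(counts) // nproc
    return [sum(counts[p * block:(p + 1) * block]) for p in range(nproc)]

def capacity(counts, nproc, slack=1.25, multiplier=1):
    """Assumed list size per processor: an even share of all pairs plus slack."""
    return int(multiplier * slack * sum(counts) / nproc)

def overflows(counts, nproc, **kwargs):
    """True if the busiest processor needs more room than the estimate allows."""
    return max(per_processor_demand(counts, nproc)) > capacity(counts, nproc, **kwargs)

counts = neighbor_counts(1024)
for nproc in (4, 128, 256):
    print(nproc, capacity(counts, nproc),
          max(per_processor_demand(counts, nproc)), overflows(counts, nproc))

# Same 256-processor split, but with the estimate scaled up tenfold.
print(overflows(counts, 256, multiplier=10))
```

In this toy data the blocks at 4 and 128 processors each contain whole
"periods" of atoms, so every processor gets the same load. At 256 processors
each block holds only four atoms, some blocks get the heavy atom and some do
not, and the even-share estimate is too small for the busy ones.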

..good luck...dac

-- 

==================================================================
David A. Case                     |  e-mail:      case_at_scripps.edu
Dept. of Molecular Biology, TPC15 |  fax:          +1-858-784-8896
The Scripps Research Institute    |  phone:        +1-858-784-9768
10550 N. Torrey Pines Rd.         |  home page:
La Jolla CA 92037  USA            |  http://www.scripps.edu/case
==================================================================