AMBER Archive (2000)

Subject: Re: Sander/MPI/redimension

From: Michael Crowley (crowley_at_scripps.edu)
Date: Fri Nov 03 2000 - 15:28:51 CST


Dear Mihaly,
This message is reporting a problem with your system, not with sander
or its compiled parameters. In a parallel PME sander run, the system
is distributed among the compute nodes spatially. Thus, for a 2 node
run, the atoms in half of the unit cell will have their nonbond force
contributions calculated by node 0, and the atoms in the other half of
the cell will be handled by the other node. Whenever there is a
large inhomogeneity in the atom density across your unit cell, some
nodes will get significantly more atoms to work on than other nodes.
Although there is some (30% or so) extra allowance for inhomogeneity,
the allocated space for things like the nonbond list can be exceeded
on nodes with too many atoms. An extreme example would be the 2 node
run if half of the cell was empty and all the atoms were on the other
side. Then the allocated space of 1.3*(half the atoms) will be
exceeded since that is less than all the atoms.
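
To make the arithmetic above concrete, here is a minimal sketch in
Python. It is not sander code: the function names are hypothetical,
and the 1.3 factor is simply the "30% or so" allowance mentioned
above, applied to an even share of the atoms per node.

```python
def node_capacity(n_atoms, n_nodes, slack=1.3):
    """Approximate per-node allowance: slack times the even share.

    Assumption: the allowance is modeled as slack * (n_atoms / n_nodes);
    the real allocation in sander may differ in detail.
    """
    return slack * n_atoms / n_nodes


def overflows(atoms_per_node, slack=1.3):
    """Return one flag per node: True if that node exceeds its allowance."""
    cap = node_capacity(sum(atoms_per_node), len(atoms_per_node), slack)
    return [count > cap for count in atoms_per_node]


# Extreme 2 node case: every atom sits in one half of the cell.
print(overflows([10000, 0]))
# Mildly uneven case that stays inside the allowance.
print(overflows([5500, 4500]))
```

With 10000 atoms on 2 nodes the allowance is about 6500 atoms per
node, so the first case overflows on node 0 and the second does not.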

We designed the space allocation for parallel runs with the
assumption that there would be relatively little inhomogeneity
(vacuum spaces and the like) and that no processor would hold more
than about 30% above the average share of 1/#nodes of the atoms.
In general, what it amounts to is that we assume that PME is used
for periodic simulations of solutions with no sizeable empty spaces.

Your problem has exceeded that limit. There must be some very dense
parts of the system compared to other parts of the unit cell where
you will find relatively little atom density. This can occur when the
unit cell is not constructed properly or is not specified properly,
or when you have, by design, tried to simulate a very inhomogeneous
system (a cell with only protein and no water, or a system with a
vacuum interface).
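
One rough way to look for such an imbalance in your own coordinates
is sketched below in Python. This is only an illustration under
stated assumptions: it cuts an orthorhombic box into equal-width
slabs along a single axis, which may not match how sander actually
divides the cell, and the names slab_counts and imbalance are
hypothetical helpers, not part of AMBER.

```python
def slab_counts(coords, box_length, n_nodes, axis=0):
    """Count atoms in n_nodes equal-width slabs along one box axis.

    Assumption: orthorhombic cell, coordinates in the same units as
    box_length; atoms outside the box are wrapped back into it.
    """
    counts = [0] * n_nodes
    for xyz in coords:
        frac = (xyz[axis] % box_length) / box_length
        # Clamp guards against frac rounding to exactly 1.0.
        idx = min(int(frac * n_nodes), n_nodes - 1)
        counts[idx] += 1
    return counts


def imbalance(counts):
    """Ratio of the fullest slab to the average slab population."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Toy system: 8 atoms near x = 1, 2 atoms near x = 15, box length 20.
coords = [(1.0, 0.0, 0.0)] * 8 + [(15.0, 0.0, 0.0)] * 2
counts = slab_counts(coords, box_length=20.0, n_nodes=2)
print(counts, imbalance(counts))
```

A ratio well above roughly 1.3 along some axis suggests the kind of
density inhomogeneity described above.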

Please continue this discussion if the explanation is not clear or
you need more help. This error message is a common one, and
relatively few people know what it means. I hope that we can help
others avoid and/or solve this problem in their systems.

Sincerely,
Mike Crowley