AMBER Archive (2008)
Subject: RE: AMBER: amber 9 on Intel Harpertown
From: Ross Walker (ross_at_rosswalker.co.uk)
Hi Geoff,
I have not encountered such problems before. I assume this is PMEMD you are
Note, however, that Open MPI's performance is pretty awful, especially on >64
All the best
Ross
From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of
Dear Reflector,
We are currently testing amber 9 on a new machine. We are having problems
The basic specs of the machine are as follows:
128 compute nodes, each with two quad-core Intel Harpertown 3.0 GHz
processors, for a total of 1024 cores;
Voltaire 20 Gbit/s InfiniBand fabric used both to share files thru GPFS and
11:07:15 cal2 root - /root > rpmg kernel
kernel-smp-2.6.16.46-0.12
kernel-ib-devel-1.3-2.6.16.46_0.12_smp.volt2986
kernel-smp-2.6.16.54-0.2.5
kernel-ib-1.3-2.6.16.46_0.12_smp.volt2986
kernel-source-2.6.16.46-0.12
kernel-source-2.6.16.54-0.2.5
We have successfully compiled amber 9 using openmpi/1.2.6_gcc-4.1.2 and
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):
The total number of times that the sender wishes the receiver to
retry timeout, packet sequence, etc. errors before posting a
completion error.
This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:
Thanks in advance.
----------------------------------------------------------------------------
Dr Geoffrey Wood
Ecole Polytechnique Fédérale de Lausanne
SB - ISIC - LCBC
BCH 4108
CH - 1015 Lausanne e-mail:
----------------------------------------------------------------------------
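
For readers hitting the same "retry count exceeded" error, here is a rough sketch of how the two knobs relate to actual wait times. It assumes the two MCA parameters referred to in the error text are btl_openib_ib_retry_count and btl_openib_ib_timeout (the names used by Open MPI's openib BTL), and that the local ACK timeout follows the usual InfiniBand rule of 4.096 microseconds * 2^timeout. The helper names below are hypothetical and are not part of Open MPI or AMBER; the values shown are illustrative, not recommendations.

```python
# Sketch only: helper names are hypothetical, not an Open MPI or AMBER API.
# Assumes the MCA parameters are btl_openib_ib_timeout and
# btl_openib_ib_retry_count, and that the InfiniBand local ACK timeout is
# 4.096 microseconds * 2**timeout.

IB_TIMEOUT_BASE_US = 4.096  # base unit of the local ACK timeout, in microseconds


def local_ack_timeout_us(timeout_exponent: int) -> float:
    """Return the local ACK timeout in microseconds for a given exponent."""
    # The exponent is a 5-bit field; 0 is not handled in this sketch.
    if not 1 <= timeout_exponent <= 31:
        raise ValueError("timeout exponent must be in 1..31")
    return IB_TIMEOUT_BASE_US * (2 ** timeout_exponent)


def mca_args(timeout_exponent: int, retry_count: int) -> list:
    """Build the '--mca' arguments one could pass to mpirun."""
    # The retry count is a 3-bit field, so 7 is the largest value.
    if not 0 <= retry_count <= 7:
        raise ValueError("retry count must be in 0..7")
    local_ack_timeout_us(timeout_exponent)  # reuse the range check
    return [
        "--mca", "btl_openib_ib_timeout", str(timeout_exponent),
        "--mca", "btl_openib_ib_retry_count", str(retry_count),
    ]


if __name__ == "__main__":
    for exponent in (10, 14, 20):
        ms = local_ack_timeout_us(exponent) / 1000.0
        print("timeout=%d -> %.3f ms per attempt" % (exponent, ms))
    print(" ".join(["mpirun"] + mca_args(20, 7)))
```

Each step in the exponent doubles the wait per attempt, so raising it can mask a slow or congested link, but it will not cure a faulty cable, port or host.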