AMBER Archive (2007)
Subject: Re: AMBER: Sander slower on 16 processors than 8
From: Martin Stennett (martin.stennett_at_postgrad.manchester.ac.uk)
In my experience Sander slows dramatically with even two processors. The message passing interface used means that it frequently drives itself into bottlenecks, with one or more processors waiting for very long periods for others to finish.
I have been trying to get decent scaling for Amber calculations on our cluster and keep running into bottlenecks. Any suggestions would be appreciated. The following are benchmarks for factor_ix and jac on 1-16 processors, using Amber8 compiled with PGI 6.0, except for the LAM runs, which used PGI 6.2.
BENCHMARKS
MPI (version)    benchmark   processors:  1     2     4     8     16
mpich1 (1.2.7)   factor_ix                928   518   318   240   442
mpich2 (1.0.5)   factor_ix                938   506   262   *
mpich1 (1.2.7)   jac                      560   302   161   121   193
mpich2 (1.0.5)   jac                      554   294   151   111   181
lam (7.1.2)      jac                      516   264   142   118   259
* timed out after 3 hours
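To make the scaling easier to read, the timings above can be turned into speedup (T1/Tn) and parallel efficiency (speedup/n). This is only a rough sketch: the post does not state the units, and it assumes each number is an elapsed time where lower is better. The dictionary layout and the `scaling` helper are illustrative names, not part of Amber or any MPI package.

```python
# Timings copied from the benchmark lines above; units are not stated in the post.
# Assumption: each value is an elapsed time, so lower is better.
timings = {
    ("mpich1 1.2.7", "factor_ix"): {1: 928, 2: 518, 4: 318, 8: 240, 16: 442},
    ("mpich2 1.0.5", "factor_ix"): {1: 938, 2: 506, 4: 262},  # 8-process run timed out
    ("mpich1 1.2.7", "jac"): {1: 560, 2: 302, 4: 161, 8: 121, 16: 193},
    ("mpich2 1.0.5", "jac"): {1: 554, 2: 294, 4: 151, 8: 111, 16: 181},
    ("lam 7.1.2", "jac"): {1: 516, 2: 264, 4: 142, 8: 118, 16: 259},
}


def scaling(times):
    """Return {n: (speedup, efficiency)} relative to the 1-process run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(times.items())}


for (mpi, bench), times in timings.items():
    for n, (speedup, eff) in scaling(times).items():
        print(f"{mpi:14s} {bench:10s} n={n:2d} "
              f"speedup={speedup:5.2f} efficiency={eff:6.1%}")
```

Under that assumption, the mpich1 factor_ix run reaches a speedup of about 3.87 on 8 processes (roughly 48% efficiency) and falls back to about 2.10 on 16 (roughly 13%), which is the slowdown the subject line describes.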
QUESTIONS
First off, is it unusual for the calculation to get slower with an increased number of processes?
Does anyone have benchmarks for a similar cluster, so I can tell if there is a problem with the configuration of our cluster? I would like to be able to run on more than one or two nodes.
SYSTEM CONFIGURATION
The 10 compute nodes use 2.0 GHz dual-core Opteron 270 chips with 4 GB memory and 1 MB cache, Tyan 2881 motherboards, an HP ProCurve 2848 switch, and a single 1 Gb/sec Ethernet connection to each motherboard. The master node is configured similarly but also has 2 TB of RAID storage that is automounted by the compute nodes. We are running SuSE 2.6.5-7-276-smp for the operating system. Amber8 and mpich were compiled with PGI 6.0.
I have used ganglia to look at the nodes when a 16-process job is running. The nodes are fully consumed by system CPU time. The user CPU time is only 5%, and this node is only pushing 1.4 kBytes/sec out over the network.
Steve
------------------------------
Stephen F. Sontum
-----------------------------------------------------------------------