AMBER Archive (2007)
Subject: Re: AMBER: amber on AMD opteron-250
From: servaas michielssens (servaas.michielssens_at_student.kuleuven.be)
More info:
2 CPUs per node
So my main problem is the jump beyond 4 CPUs: calculations are faster on 4 CPUs than on 8. Scaling from 2 to 4 is OK, but it breaks down with more than 4 CPUs. Any suggestions there?
kind regards,
servaas
----- Original Message -----
No, it should not be that bad, even for gigabit ethernet, presuming this is a more-or-less standard PME run. If I run PMEMD 8 on the JAC benchmark (PME, NVE simulation, 500 steps, ~23K atoms) on my two dual-CPU Intel Xeon 3.2 GHz workstations, connected with a crossover cable (gigabit ethernet, server NICs), I get the following runtimes:
# procs wallclock sec
The 3.2 GHz Xeons and Opterons really have pretty similar performance.
So if you look at the 2 --> 4 processor performance, it comes pretty close to doubling. The 1 --> 2 processor performance typically does not for small dual core nodes. This is typically a matter of shared cache and other sharing effects, as well as the fact that there is a ton of overhead in the parallelization code that has maximum impact and minimum benefit at 2 CPUs (the single CPU code has none of this; it is essentially a separate implementation, optimized for the single processor). You don't show single processor performance at all, though. PMEMD 9 performance is even better. So you have other things going on.
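For comparing runs like these, a minimal Python sketch can turn wallclock times into speedup and parallel efficiency relative to the smallest processor count measured. The function name scaling_table and the timings below are hypothetical placeholders, not measurements from this thread; substitute your own numbers.

```python
def scaling_table(wallclock):
    """Compute speedup and parallel efficiency from wallclock times.

    wallclock: dict mapping processor count -> wallclock seconds.
    Values are reported relative to the smallest processor count given.
    """
    base_procs = min(wallclock)
    base_time = wallclock[base_procs]
    rows = []
    for procs in sorted(wallclock):
        # Speedup relative to the baseline run.
        speedup = base_time / wallclock[procs]
        # Efficiency: achieved speedup divided by the ideal speedup.
        efficiency = speedup * base_procs / procs
        rows.append((procs, wallclock[procs], speedup, efficiency))
    return rows


# Hypothetical timings (seconds), chosen only to illustrate a run that
# doubles from 2 to 4 processors and then slows down at 8.
timings = {2: 200.0, 4: 100.0, 8: 125.0}

print(f"{'# procs':>7} {'wallclock':>10} {'speedup':>8} {'efficiency':>10}")
for procs, secs, speedup, eff in scaling_table(timings):
    print(f"{procs:>7} {secs:>10.1f} {speedup:>8.2f} {eff:>10.2f}")
```

An efficiency near 1.0 between two rows means the run scaled almost ideally; a sharp drop at a given processor count points at the interconnect or at node sharing rather than at the code itself.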
Hi Servaas,
This is generally due to your network, which you did not mention, so I assume we are talking about gigabit ethernet, and to the number of CPUs per node, which you also neglected to specify. However, in my experience with dual-CPU Opterons (240s and 248s) and gigabit ethernet, these numbers seem about right. Unfortunately, you may only be able to get good scaling for 20k atoms up to 32 CPUs, and only if you have a faster network such as InfiniBand or Myrinet.
Good luck,
On 12/5/07, servaas michielssens <servaas.michielssens_at_student.kuleuven.be> wrote:
4 processors gives the optimum, it seems to be independent of how I
kind regards,
servaas michielssens
-----------------------------------------------------------------------