AMBER Archive (2007)

Subject: Re: AMBER: amber on AMD opteron-250

From: Robert Duke (rduke_at_email.unc.edu)
Date: Wed Dec 05 2007 - 15:31:28 CST


No, it should not be that bad, even for gigabit ethernet, presuming this is a more-or-less standard PME run. If I run PMEMD 8 on the JAC benchmark (PME, NVE simulation, 500 steps, ~23K atoms) on my two Intel Xeon 3.2 GHz dual-CPU workstations connected with a crossover cable (gigabit ethernet, server NICs), I get the following runtimes:

# procs   wallclock (sec)
      1               186
      2               113
      4                64
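
For scale, here is a minimal Python sketch (not from the original message; the timings are just hard-coded from the table above) that turns those wallclock times into speedup and parallel efficiency:

    # Speedup and parallel efficiency for the JAC timings quoted above.
    timings = {1: 186, 2: 113, 4: 64}   # procs -> wallclock seconds
    t1 = timings[1]                     # single-processor baseline
    for procs, secs in sorted(timings.items()):
        speedup = t1 / secs
        efficiency = speedup / procs
        print(f"{procs} procs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

That works out to roughly 1.65x (82% efficiency) on 2 processors and 2.91x (73%) on 4.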

The 3.2 GHz Xeons and Opterons really have pretty similar performance.

So if you look at the 2 --> 4 processor scaling, it comes pretty close to doubling. The 1 --> 2 processor scaling typically does not on small dual-core nodes; that is typically a matter of shared cache and other resource-sharing effects, plus the fact that there is a ton of overhead in the parallelization code that has maximum impact and minimum benefit at 2 CPUs (the single-CPU code has none of this - it is essentially a separate implementation, optimized for the single processor). You don't show single-processor performance at all, though. PMEMD 9 performance is even better. So you have other things going on.
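
One way to make the overhead point concrete is the Karp-Flatt metric, which backs out an apparent serial/overhead fraction e = (1/S - 1/p) / (1 - 1/p) from the measured speedup S on p processors. A small illustrative Python sketch using the same timings (again, not part of the original message):

    # Karp-Flatt: apparent serial/overhead fraction from measured speedups.
    timings = {1: 186, 2: 113, 4: 64}   # procs -> wallclock seconds
    t1 = timings[1]
    for p in (2, 4):
        s = t1 / timings[p]             # measured speedup on p processors
        e = (1 / s - 1 / p) / (1 - 1 / p)
        print(f"p={p}: speedup {s:.2f}x, apparent serial fraction {e:.1%}")

The apparent serial/overhead fraction falls from about 22% at 2 processors to about 13% at 4, which fits the claim that the parallel overhead has maximum impact and minimum benefit at 2 CPUs.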
Regards - Bob
  ----- Original Message -----
  From: David LeBard
  To: amber_at_scripps.edu
  Sent: Wednesday, December 05, 2007 3:29 PM
  Subject: Re: AMBER: amber on AMD opteron-250

  Hi Servaas,

  This is generally due to your network, which you did not mention, so I assume we are talking about gigabit ethernet, and to the number of CPUs per node, which you also did not specify. In my experience on dual-CPU Opterons (240s and 248s) with gigabit ethernet, these numbers seem about right. Unfortunately, for 20k atoms you may only be able to get good scaling up to 32 CPUs, and only with a faster interconnect like InfiniBand or Myrinet.

  Good luck,
  David LeBard

  On 12/5/07, servaas michielssens <servaas.michielssens_at_student.kuleuven.be> wrote:
    I ran a 20ps simulation of a system of 20000 atoms on an AMD opteron 250
    cluster with 8 processors, I used amber8 and pmemd for the simulation. I
    found some strange results:
    procs   time (min)
        2           31
        3           29
        4           20
        5           23
        6           24
        7           20
        8           21
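
    The plateau stands out if you tabulate speedup relative to the 2-processor
    run; a minimal Python sketch with the numbers above hard-coded (the script
    is illustrative, not part of the original run):

        # Speedup relative to the 2-processor run, from the table above.
        times = {2: 31, 3: 29, 4: 20, 5: 23, 6: 24, 7: 20, 8: 21}  # minutes
        t2 = times[2]
        for procs in sorted(times):
            print(f"{procs} procs: {t2 / times[procs]:.2f}x vs 2 procs")

    Speedup tops out around 1.55x at 4 processors and never gets better beyond that.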

    Four processors gives the optimum, and it seems to be independent of
    which processors I address: for 5 processors, 1-2-3-4-5 or 1-2-3-4-7
    gives the same results, and the optimum is always at four processors.
    Has anyone experienced this scaling problem?

    kind regards,

    servaas michielssens


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu