AMBER Archive (2006)

Subject: Re: AMBER: PMEMD scaling

From: Robert Duke
Date: Thu Dec 07 2006 - 08:17:46 CST

Hi Tiziano,

Thanks for giving me an example of how the scaling game can be played! Consider two different implementations of a parallel application like MD. Say one of the implementations is twice as fast as the other on a single processor, and in addition presume that the machine has pretty darn fast cpus but a slow interconnect. The faster application will be impacted more by the slow interconnect than the slower one, because from a cpu-speed perspective it is capable of putting out 2x as much i/o to the interconnect. So the scaling is lower for the faster application, basically because you would have to push the interconnect past its capacity to attain the same scaling you obtained from the slower application.

I consider this to be a bit of a shell game that is played in the highly parallel computing world. If you never work on pushing your single cpu performance, then you can display better scaling numbers and push to higher cpu counts, because your per-cpu throughput expectation for 100% scaling is lower and it will take more cpus to start hitting the real limitations in the interconnect. So you could have one application that gets "scaling discounts" because it runs on a gazillion processors at a supercomputer center, and another that doesn't get these discounts but is in fact capable of getting more work done per cpu hour and per unit of time. I have developed pmemd purely from this latter perspective, aiming to maximize the work done per cpu hour and per unit of time.

In the development cycle between amber 8 and 9, I got some really good single cpu speedup, especially on intel and opteron ia32 architecture chips (not an accident). Presuming you really only have 2 and 4 processors here (this new "core 2 duo" processor stuff can be confusing - some are dual core dual cpu, i.e., 4 cpus in a box), these look like rather nice numbers - practically a nsec/day from a pair of workstations for JAC.

I appended my results from two older dual cpu machines with gigabit ethernet, from a year ago (from the amber 9 pmemd README), below.
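
As a rough illustration of the scaling argument above, here is a toy model in Python. It is only a sketch under stated assumptions, not how pmemd works: each step is assumed to cost the single-process time divided by the process count, plus a fixed interconnect cost whenever more than one process is used. The names step_time, efficiency, COMM, SLOW and FAST, and all of the numbers, are hypothetical.

```python
# Toy model (an assumption for illustration, not taken from pmemd):
# per-step wall time = compute time split across n processes
#                      + a fixed communication cost when n > 1.

def step_time(t1, n, comm):
    """Wall time per step on n processes, given single-process step time t1."""
    return t1 / n + (comm if n > 1 else 0.0)

def efficiency(t1, n, comm):
    """Parallel efficiency relative to the single-process run (1.0 = 100%)."""
    return t1 / (n * step_time(t1, n, comm))

COMM = 0.25             # hypothetical per-step interconnect cost (arbitrary units)
SLOW, FAST = 2.0, 1.0   # single-process step times; FAST is twice as fast

for name, t1 in (("slow", SLOW), ("fast", FAST)):
    for n in (1, 2, 4):
        rate = 1.0 / step_time(t1, n, COMM)
        pct = 100.0 * efficiency(t1, n, COMM)
        print(f"{name} code, n={n}: {rate:.2f} steps per unit time, scaling {pct:.0f}%")
```

With these made-up numbers the faster code shows the lower scaling percentage at 4 processes, yet still completes more steps per unit time than the slower code, which is the point of the argument.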

Regards - Bob Duke


So what about smaller machines? We have decent speedups relative to
pmemd 8 on my favorite Pentium 4 3.2 GHz / Gigabit ethernet setup. The MPI
implementation here is MPICH 1.2.6.

Factor IX NPT, 90906 atoms, 8 Angstrom cutoff, 1.5 fs step, PME

# procs    pmemd 9     pmemd 8
           psec/day    psec/day
   1          116          86
   2          182         138
   4          293         238

JAC NVE, 23558 atoms, 9 Angstrom cutoff, 1 fs step, PME

# procs    pmemd 9     pmemd 8
           psec/day    psec/day
   1          254         185
   2          432         315
   4          702         572
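
To make the comparison in these two tables concrete, the short Python sketch below computes, for each process count, the throughput ratio of pmemd 9 to pmemd 8 and each version's scaling relative to its own single-process run. The psec/day values are copied from the tables; the names RESULTS and efficiency are made up for this illustration.

```python
# psec/day values copied from the two README tables; keys are process counts.
RESULTS = {
    "Factor IX": {
        "pmemd 9": {1: 116, 2: 182, 4: 293},
        "pmemd 8": {1: 86, 2: 138, 4: 238},
    },
    "JAC": {
        "pmemd 9": {1: 254, 2: 432, 4: 702},
        "pmemd 8": {1: 185, 2: 315, 4: 572},
    },
}

def efficiency(rates, n):
    """Scaling of the n-process run relative to the 1-process run (1.0 = 100%)."""
    return rates[n] / (n * rates[1])

for system, versions in RESULTS.items():
    for n in (2, 4):
        gain = versions["pmemd 9"][n] / versions["pmemd 8"][n]
        e9 = 100.0 * efficiency(versions["pmemd 9"], n)
        e8 = 100.0 * efficiency(versions["pmemd 8"], n)
        print(f"{system}, {n} procs: pmemd 9 is {gain:.2f}x pmemd 8; "
              f"scaling {e9:.0f}% (pmemd 9) vs {e8:.0f}% (pmemd 8)")
```

On these numbers pmemd 9 delivers more psec/day at every process count, even though its scaling percentage is lower than that of pmemd 8.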


---- Original Message -----
From: Tiziano Tuccinardi
Sent: Thursday, December 07, 2006 3:46 AM
Subject: AMBER: PMEMD scaling

Hi all,
I'm building a Beowulf cluster: 2 (up to now) 2-cpu nodes with Intel E6600 2.4 GHz, FSB 1066, 4 MB cache (Core 2 Duo), 1 GB memory per node, gigabit ethernet (switch interconnected, 3Com Switch 2816) / MPICH2 1.0.4p1 / Intel Fortran compiler 9.1.040 without Intel MKL.

I installed both amber8 and amber9 and these are my benchmark results:


JAC - NVE ensemble, PME, 23,558 atoms

#procs    nsec/day    scaling, %

    2       0.383        100
    4       0.631         82

JAC - NVE ensemble, PME, 23,558 atoms

#procs    nsec/day    scaling, %

    2       0.600        100
    4       0.857         71

FACTOR IX - NVE ensemble, PME, 90,906 atoms

#procs    nsec/day    scaling, %

    2       0.283        100
    4       0.395         70
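
For reference, the scaling percentages in these tables can be reproduced by treating the 2-process run as the 100% baseline. The Python sketch below shows that arithmetic; the function name scaling_percent and the list RUNS are made up for this illustration, and the nsec/day values are copied from the tables in the order posted.

```python
def scaling_percent(rate_n, n, rate_base, n_base):
    """Scaling of an n-process run relative to a baseline run, in percent."""
    return 100.0 * (rate_n / rate_base) * (n_base / n)

# (nsec/day on 2 procs, nsec/day on 4 procs), in the order of the tables above.
RUNS = [(0.383, 0.631), (0.600, 0.857), (0.283, 0.395)]

for rate2, rate4 in RUNS:
    pct = scaling_percent(rate4, 4, rate2, 2)
    print(f"2 procs: {rate2:.3f} nsec/day, 4 procs: {rate4:.3f} nsec/day, scaling {pct:.0f}%")
```

This gives 82, 71 and 70 percent after rounding, matching the tables. Because the baseline is the 2-process run, any communication cost already paid at 2 processes is hidden inside the "100%" figure.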

Is it possible to see a decrease in scaling with "pmemd9" compared with amber8, or did I make a mistake during the installation? The "configure" command that I used is:
./configure linux_em64t ifort mpich2

Many Thanks


+++++++++++++++++++++++++++++++++++++
Tiziano Tuccinardi, PhD
Dip. Scienze Farmaceutiche
Via Bonanno 6, 56126 PISA
Tel ++39 050 2219572
Fax ++39 050 2219605
+++++++++++++++++++++++++++++++++++++
