AMBER Archive (2006)
Subject: Re: AMBER: PMEMD scaling
From: Robert Duke (rduke_at_email.unc.edu)
Hi Tiziano,
Thanks for giving me an example of how the scaling game can be played! Consider two different implementations of a parallel application like MD. Say one of the implementations is twice as fast as the other on a single processor, and in addition presume that the machine has pretty darn fast CPUs but a slow interconnect. Well, the faster application will be impacted more by the slow interconnect than the slower one, because from a cpu-speed perspective it is capable of putting out 2x as much i/o to the interconnect. So the scaling is lower for the faster application, basically because you would have to push the interconnect past its capacity to attain the same scaling you obtained from the slower application.

I consider this to be a bit of a shell game that is played in the highly parallel computing world. If you never work on pushing your single-cpu performance, then you can display better scaling numbers and push to higher cpu counts, because your per-cpu throughput expectation for 100% scaling is lower and it will take more CPUs to start hitting the real limitations of the interconnect. So you could have one application that gets "scaling discounts" because it runs on a gazillion processors at a supercomputer center, and another that doesn't get these discounts but is in fact capable of getting more work done per cpu hour and per unit of time. I have developed pmemd purely from this latter perspective, aiming to maximize the work done per cpu hour and per unit of time. In the development cycle between amber 8 and 9, I got some really good single-cpu speedup, especially on intel and opteron ia32 architecture chips (not an accident).

Presuming you really only have 2 and 4 processors here (this new "core 2 duo" processor stuff can be confusing - some are dual-core dual-cpu, i.e., 4 CPUs in a box), these look like rather nice numbers - practically a nsec/day from a pair of workstations for JAC. I have appended my results from two older dual-cpu machines with gb ethernet, from a year ago (from the amber 9 pmemd README), below.
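To put rough numbers on the "scaling discount" idea, here is a small Python sketch. The figures are invented for illustration (0.25 and 0.125 ns/day single-CPU throughput, and a 0.8 ns/day interconnect-limited ceiling); none of them come from a real benchmark. They just show how the slower code ends up with the prettier scaling percentage:

def scaling_pct(ns_per_day, nprocs, single_cpu_ns_per_day):
    """Parallel scaling as a percentage of ideal (linear) speedup."""
    ideal = single_cpu_ns_per_day * nprocs
    return 100.0 * ns_per_day / ideal

# Single-CPU throughput in ns/day (invented numbers).
code_a_1cpu = 0.25    # the faster implementation
code_b_1cpu = 0.125   # half the single-CPU speed

# Suppose the slow interconnect caps either code at about 0.8 ns/day total
# (also invented, just to model an interconnect-bound regime).
interconnect_cap = 0.8

for name, one_cpu in [("code A", code_a_1cpu), ("code B", code_b_1cpu)]:
    nprocs = 8
    achieved = min(one_cpu * nprocs, interconnect_cap)
    print(f"{name}: {achieved:.3f} ns/day on {nprocs} CPUs, "
          f"scaling {scaling_pct(achieved, nprocs, one_cpu):.0f}%")

# Prints: code A hits the cap at 0.800 ns/day and reports 40% scaling;
# code B also hits the cap at 0.800 ns/day but reports 80% scaling.
# Same work done per unit of time, prettier scaling number for the slower code.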
Regards - Bob Duke
From PMEMD 9 README:

So what about smaller machines? We have decent speedups relative to
pmemd 8 (throughput in psec/day):

Factor IX NPT, 90906 atoms, 8 Angstrom cutoff, 1.5 fs step, PME

JAC NVE, 23558 atoms, 9 Angstrom cutoff, 1 fs step, PME
  # procs   pmemd 9   pmemd 8
        1       254       185

End PMEMD 9 README.
----- Original Message -----
Hi all,
I installed both amber8 and amber9 and these are my benchmark results:
AMBER8

JAC - NVE ensemble, PME, 23,558 atoms
  #procs   nsec/day   scaling, %
       2      0.383          100

AMBER9

JAC - NVE ensemble, PME, 23,558 atoms
  #procs   nsec/day   scaling, %
       2      0.600          100

FACTOR IX - NVE ensemble, PME, 90,906 atoms
  #procs   nsec/day   scaling, %
       2      0.283          100
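For reference, scaling percentages in tables like these are normally computed against the smallest processor count shown, which is why the 2-processor rows read 100%. Below is a minimal Python sketch of that arithmetic; the 4-processor figure is made up purely for illustration:

def scaling_table(runs):
    """runs: list of (nprocs, nsec_per_day) tuples."""
    ref_procs, ref_rate = min(runs)            # reference row, e.g. (2, 0.383)
    for nprocs, rate in sorted(runs):
        ideal = ref_rate * nprocs / ref_procs  # linear extrapolation from the reference
        print(f"{nprocs:>7d} {rate:10.3f} {100.0 * rate / ideal:10.0f}")

# Hypothetical 4-processor number just to show the arithmetic:
scaling_table([(2, 0.383), (4, 0.650)])   # 4-proc row comes out near 85% scaling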
Is it possible to have a decrease in scaling using "pmemd9" compared with amber8, or did I make a mistake during the installation? The "configure" command that I used is:
Many Thanks