AMBER Archive (2008)

Subject: Re: AMBER: massively parallel computation

From: Mingfeng Yang (
Date: Tue May 20 2008 - 11:58:56 CDT


I have been very impressed by your work on PMEMD since I was a graduate
student. I just wanted to throw out a topic and see how others think
we (the Amber community) can push our limits. Your answer is very
enlightening to me. Thanks!


On Tue, 2008-05-20 at 10:22 -0400, Robert Duke wrote:
> Okay, several points. First of all, we (meaning mostly me - we have a
> lean and mean staffing profile for pmemd, and have been using one guy
> to basically come pretty close to keeping up with efforts by other
> groups using between 10 and 30 folks and many many $$$) have had an
> aggressive parallel performance effort in amber/pmemd for the last
> several years. We have greatly increased the capabilities of amber,
> not in regard to its ability to eat up time on large piles of
> processors, but in terms of its ability to produce maximal nsec/day
> of simulated time with minimal resources - we emphasize THROUGHPUT, not
> how many thousands of processors we can tie up (which by the way, most
> folks can't lay their hands on anyway). We have done this so far
> without making any compromises whatsoever in terms of
> accuracy/precision of results using the amber forcefields. Currently,
> there are some programs/systems running faster than we are. I have
> not studied this issue extensively as of yet, but I do know that in at
> least some instances, compromises have been made in arithmetic
> precision and energy conservation to meet the goal of higher
> performance. I am interested in these tradeoffs, but completely
> unwilling to make them without a reasonable degree of certainty that
> actual quality of results is not being sacrificed (I regard it as an
> open research question). So if you look at our benchmarks pages, we are
> actually doing quite well against things like namd, though we don't do
> direct comparisons (and some benchmark comparisons are
> apples/oranges). I think namd is finally a bit ahead of us due to
> its ability to do very fine grained workload distribution through its
> charm++ (or whatever it is) parallelization layer. This is a great
> idea in that it allows better overlap between computation and
> communication than we will ever achieve using fortran 90 plus mpi, but
> the difference is less than a factor of two (I have not posted amber
> 10 benchmarks yet; they are better than 9, but not a huge jump up - we
> are hitting some limitations with pme that are going to be hard to get
> around). Okay, so we can now get something like 26 nsec/day for ~23K
> atom pme simulations (JAC) on good hardware (sp5's, for instance, I
> think I get close to that on lonestar too, but would have to look it
> up). We do this typically in the range of low 100's of processors,
> which is pretty good throughput, and performance per atom typically
> improves as you increase the size of the system, at least into the low
> 100's of thousands of atoms. We then basically have problems with a
> fundamental tradeoff we made to get really good performance in the
> 100's of processor range, and don't go much further. I'll be
> continuing to think about the problem, but right now, the machines
> that are being built are not really addressing our problem space - we
> need more interconnect bandwidth and lower interconnect latency, and
> we are not getting it because it is much cheaper to slap together a
> large pile of multicore chips and count the flops. Okay, finally, to
> the paper. The paper you are referencing here is about Shaw's NT
> (neutral territory) method. It is probably a pretty good idea at the
> limit where the number of atoms each atom interacts with is the
> limiting factor on performance, because it cuts that number. Mind
> you, that number remains large, even with NT, but NT does cut it.
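[Editor's note: for scale, here is a back-of-the-envelope count of how many neighbors each atom interacts with inside a typical direct-space cutoff. The density and cutoff values are assumptions (roughly bulk water, a common PME cutoff), not figures from the thread.]

```python
import math

density = 0.1   # atoms per cubic angstrom, roughly bulk water (assumption)
cutoff = 9.0    # angstroms, a common PME direct-space cutoff (assumption)

# Atoms inside the cutoff sphere around a given atom.
neighbors = (4.0 / 3.0) * math.pi * cutoff**3 * density
print(f"~{neighbors:.0f} neighbors per atom within {cutoff} A")
print(f"~{neighbors / 2:.0f} pairs per atom after Newton's-third-law halving")
```

Even halved, that is still hundreds of pair interactions per atom, which is the sense in which the number "remains large, even with NT."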
> Well here's the deal. I can do things in pmemd that will cut the atom
> interaction number in half, and lose performance (well really, I have
> done a couple of different things that reduced interactions in the
> 20-50% range I believe, and I lost ground). Why? Because performance
> is the sum of many many things, and each decision you make has costs,
> often hidden. So NT undoubtedly has costs, and implementing it in
> pmemd would require heaven only knows how many months of completely
> tearing up and reworking fundamental architecture, and in the end, it
> probably would not help. I recently attended a talk given by Shaw.
> He is really pushing the limits of performance on the MD problem, but
> he is doing it by 1) lowering precision, and 2) more importantly, he
> is practically moving the entire problem into hardware, where he can
> parallelize everything at the hardware level. So he projects (but
> does not yet have running, as far as I know) something like 100x
> speedup over current common codes (I am just pulling this number out
> of the air as I am not digging out the Anton paper at the moment - it
> is something like that - maybe more). Now here's the point - I heard
> him say in the talk that, pushing into the "Anton range of
> performance", he is finally seeing an NT benefit. So that is where NT
> you, when you have used hardware to slay all the other dragons that
> kill you. Getting MD to run really really fast is a very icky
> problem...
> Regards - Bob Duke
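[Editor's note: as a rough sanity check on the throughput figure quoted above (26 nsec/day on the ~23K-atom JAC benchmark), the nsec/day rate can be converted to wall-clock time per MD step. The 2 fs timestep below is an assumption (typical for JAC with constrained bonds), not a figure from the thread.]

```python
# Convert an MD throughput figure into wall-clock time per step.
ns_per_day = 26.0   # quoted JAC throughput (nsec/day)
dt_fs = 2.0         # assumed timestep in femtoseconds (not from the thread)

steps_per_day = ns_per_day * 1e6 / dt_fs        # 1 ns = 1e6 fs
ms_per_step = 86400.0 * 1000.0 / steps_per_day  # 86400 s in a day

print(f"{steps_per_day:.0f} steps/day -> {ms_per_step:.2f} ms of wall time per step")
```

At only a few milliseconds of wall time per step, even a fraction of a millisecond of interconnect latency per step is a large overhead, which is why the post stresses interconnect bandwidth and latency over raw flop counts.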
> ----- Original Message -----
> From: Mingfeng Yang
> To:
> Sent: Tuesday, May 20, 2008 3:03 AM
> Subject: AMBER: massively parallel computation
> Recently, a few algorithms have been developed to enable
> massively parallel computation which can efficiently use
> hundreds of CPUs simultaneously for MD simulation. For
> example, J Comput Chem 26: 1318–1328, 2005.
> Is there a plan to implement such algorithms in Amber/PMEMD?
> As computer clusters get cheaper and cheaper, cluster sizes
> keep expanding quickly as well. Such algorithms should
> be very helpful and indispensable to reach >ms scale
> simulation.
> Thanks,
> Mingfeng

The AMBER Mail Reflector
To post, send mail to
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)