AMBER Archive (2006)

Subject: Re: AMBER: ACML and MASS/MASSV for sander and pmemd

From: Robert Duke (
Date: Tue Sep 19 2006 - 15:42:05 CDT

Hi Ross -
Thanks for the additional info on the actual characteristics of ACML;
historically (pre-GB) pmemd has not been helped much at all by math
libraries, which is probably the best explanation I have for why I am not
familiar with all of them :-) You are correct that pmemd makes no use of
BLAS or LAPACK, and I tend to agree with you about the fft libraries also.
The fft libraries that ship with pmemd were originally public domain, but I
cleaned them up and did a bit of tweaking, and the performance is actually
pretty good. Because FFTW can do runtime platform tuning, it is a bit
faster on some platforms for uniprocessor code (say 10% for the fft portion
on P4s, off the top of my head); the thing is that the overall contribution
of the fft computations drops pretty quickly when you go parallel and start
doing distributed fft transforms. FFTW also handles prime factors of 2, 3,
5 and 7, as opposed to the shipped code, which handles just 2, 3 and 5, so
in theory you may get a better grid fit; I have never seen that matter much
either.
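To make the grid-fit point concrete, here is a small Python sketch (not code from pmemd or FFTW) that finds the smallest FFT grid dimension at or above a requested size whose prime factors all lie in a given set. The target sizes 49, 97 and 110 are arbitrary illustrative values.

```python
def factors_over(n, primes):
    """Return True if n factors completely over the given primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_size(target, primes):
    """Smallest n >= target whose prime factors all lie in `primes`."""
    n = max(1, target)
    while not factors_over(n, primes):
        n += 1
    return n


SHIPPED = (2, 3, 5)        # factors the shipped fft code handles (per the post)
FFTW_LIKE = (2, 3, 5, 7)   # FFTW additionally handles 7 (per the post)

for target in (49, 97, 110):
    print(target, next_fft_size(target, SHIPPED), next_fft_size(target, FFTW_LIKE))
```

For these targets the extra factor of 7 gives grids of 49, 98 and 112 instead of 50, 100 and 120, which is the "better grid fit" in theory; as noted above, in practice it rarely matters much.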
Best regards - Bob

----- Original Message -----
From: "Ross Walker" <>
To: <>
Sent: Tuesday, September 19, 2006 4:24 PM
Subject: RE: AMBER: ACML and MASS/MASSV for sander and pmemd

> Hi Nick,
> See $AMBERHOME/src/pmemd/src/veclib.fpp
> This routine is used to essentially overload calls to things like
> vdinvsqrt with whatever the library call is.
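The veclib idea is a thin wrapper: the force code always calls one name, and a build-time switch decides which library routine sits behind it. A rough Python analogue follows; the registry, the backend names and the fallback behaviour are illustrative assumptions, not the contents of veclib.fpp, which is preprocessed Fortran and selects its backend at compile time.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical backend registry: backend name -> vectored 1/sqrt routine.
# pmemd makes this choice at compile time via preprocessor defines; a runtime
# dict is used here only to keep the sketch self-contained.
InvSqrt = Callable[[Sequence[float]], List[float]]


def _portable_invsqrt(x: Sequence[float]) -> List[float]:
    """Fallback: plain element-by-element loop (what a compiler might vectorise)."""
    return [1.0 / math.sqrt(v) for v in x]


BACKENDS: Dict[str, InvSqrt] = {"portable": _portable_invsqrt}


def vdinvsqrt(x: Sequence[float], backend: str = "portable") -> List[float]:
    """Single call site used by the force code; the implementation is swapped underneath."""
    impl = BACKENDS.get(backend, _portable_invsqrt)  # unknown backend: fall back
    return impl(x)


print(vdinvsqrt([1.0, 4.0, 16.0]))
```

Adding a vendor library then means registering one more entry (or, in the real code, one more `#ifdef` branch) without touching any caller.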
> However, taking a quick look at the ACML documentation
> ( there does not appear to be a routine for doing a double precision
> vectored inverse square root, so it is unlikely to help you much. You
> could maybe cheat this with the use of a vectored power function, but
> these only seem to be single precision, and even if they do a double
> precision non-integer power function it is likely to be hopelessly slow
> compared to doing a straight inverse square root.
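The accuracy side of this argument is easy to check. The Python sketch below compares 1/sqrt(x) with pow(x, -0.5) in double precision, and with a pow whose input and result are rounded to single precision as a stand-in for a single-precision vector library routine (an assumption; it does not call ACML). It says nothing about speed, which is the other half of Ross's point.

```python
import math
import random
import struct


def to_single(v: float) -> float:
    """Round a double to the nearest IEEE single-precision value and back."""
    return struct.unpack("f", struct.pack("f", v))[0]


def reference(x: float) -> float:
    return 1.0 / math.sqrt(x)                 # straight inverse square root


def via_pow_double(x: float) -> float:
    return math.pow(x, -0.5)                  # the "cheat", done in double


def via_pow_single(x: float) -> float:
    return to_single(math.pow(to_single(x), -0.5))  # mimics a single-precision pow


def max_rel_diff(xs, f, g):
    """Largest relative difference between f and g over the sample points."""
    return max(abs(f(x) - g(x)) / abs(g(x)) for x in xs)


random.seed(0)
xs = [random.uniform(0.5, 100.0) for _ in range(1000)]  # e.g. squared distances

print(max_rel_diff(xs, via_pow_double, reference))
print(max_rel_diff(xs, via_pow_single, reference))
```

The double-precision pow agrees with 1/sqrt to within a couple of ulps, while the single-precision path is off by roughly one part in 10^7 to 10^8, far too coarse for a double-precision MD force loop.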
> Intel's MKL supports a vdinvsqrt function and you may find that this
> works well even on Opteron chips. Typically the performance improvement
> is not great on Intel and AMD chips versus the compiler's internal
> vectorisation (-lsvml).
> GB calculations may benefit slightly from vectored exponential functions,
> but without a vectored inverse square root routine it probably isn't
> worth the effort to add this, although you are welcome to try. In the
> veclib file you should be able to work out how things are done for
> MASSV; you can just edit this with, say, #ifdef ACML and add the
> relevant ACML calls. Make sure you only use double precision routines,
> and that you edit the config.h file to have the correct ACML define and
> link in the correct libraries.
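Why a vectored exp helps little on its own: assuming the usual Still-style generalized Born form, f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j))), each atom pair needs one exp and one inverse square root. The Python sketch below batches both; `vexp` and `vinvsqrt` are hypothetical stand-ins for library calls, and this is not the sander/pmemd GB code.

```python
import math
from typing import List, Sequence


def vexp(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a vectored exp library call."""
    return [math.exp(v) for v in x]


def vinvsqrt(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a vectored inverse square root call."""
    return [1.0 / math.sqrt(v) for v in x]


def inv_fgb(r2: Sequence[float], rirj: Sequence[float]) -> List[float]:
    """1/f_GB for a batch of atom pairs, assuming the Still et al. form.

    r2:   squared pair distances
    rirj: products of the two atoms' effective Born radii
    """
    args = [-a / (4.0 * b) for a, b in zip(r2, rirj)]
    e = vexp(args)                                    # one vectored exp per batch
    d = [a + b * ex for a, b, ex in zip(r2, rirj, e)]
    return vinvsqrt(d)                                # one vectored 1/sqrt per batch


print(inv_fgb([0.0, 100.0], [4.0, 1.0]))
```

Speeding up only the `vexp` step leaves the inverse square root, which has the same per-pair cost, untouched; that is why a vectored exp alone is unlikely to be worth much.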
> Note ACML includes a number of so-called relaxed (or fast) routines such
> as fastexp. I think these just have relaxed error handling rather than
> relaxed precision, in which case you could call them in place of regular
> exp calls, but you would have to check this carefully. If the precision
> is also relaxed then they will not be appropriate for MD. Again, they
> don't have a fastsqrt or fastinvsqrt function, so they will probably be
> of limited use.
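"Check this carefully" can start with measuring the worst-case relative error against a trusted exp over the argument range you care about. The Python sketch below does that for two hypothetical candidates (neither is an ACML routine): one that differs from math.exp only in how it handles overflow, and one that rounds its result to single precision. The tolerance is an illustrative assumption, not an AMBER requirement.

```python
import math
import struct


def exp_relaxed_errors(x: float) -> float:
    """Hypothetical 'relaxed error handling' exp: same values as math.exp,
    but returns inf on overflow instead of raising OverflowError."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def exp_relaxed_precision(x: float) -> float:
    """Hypothetical 'relaxed precision' exp: result rounded to single precision."""
    return struct.unpack("f", struct.pack("f", math.exp(x)))[0]


def max_rel_error(candidate, xs):
    """Largest relative deviation of candidate(x) from math.exp(x) over xs."""
    return max(abs(candidate(x) - math.exp(x)) / math.exp(x) for x in xs)


xs = [i / 100.0 for i in range(-5000, 5001)]   # x in [-50, 50]
TOLERANCE = 1e-14                               # assumed double-precision budget

for f in (exp_relaxed_errors, exp_relaxed_precision):
    err = max_rel_error(f, xs)
    verdict = "within tolerance" if err <= TOLERANCE else "too imprecise for MD"
    print(f.__name__, err, verdict)
```

The first candidate passes with zero deviation (only its error handling differs); the second fails by many orders of magnitude, which is the case Ross warns would make such a routine unsuitable.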
> In terms of BLAS and LAPACK, pmemd and sander do not benefit greatly
> from these libraries, as they use very few of their functions. Maybe Bob
> can correct me here, but I don't think pmemd uses BLAS or LAPACK at all.
> I don't know where they got the line:
> "While Amber 8 is distributed with source for the required BLAS routines,
> an optimized BLAS library greatly improves performance."
> from. I suspect they just assumed this was the case and didn't bother to
> test it. Note that other programs in the AMBER suite, such as divcon and
> nmode, do use BLAS and will benefit here, but the MD engine itself
> (sander / pmemd) is unlikely to see any benefit.
> You could maybe squeeze out another 1% or so by really going to town and
> having pmemd use the ACML FFT library, but our experience in general is
> that linking to vendor FFT libraries makes little difference over using
> the custom tuned pubFFT routines in the AMBER code.
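The reason a faster FFT library buys so little is Amdahl's-law arithmetic: the saving on the whole run is the FFT share of the run times the cut in FFT time. Taking Bob's rough 10% FFT gain and assuming, purely for illustration, that FFTs take about 10% of the run gives roughly the 1% Ross mentions; in parallel runs, where Bob notes the FFT share drops, the saving shrinks further. The fractions below are illustrative assumptions, not measurements.

```python
def overall_time(fft_fraction: float, fft_time_reduction: float) -> float:
    """Relative wall time after speeding up only the FFT portion (Amdahl's law).

    fft_fraction:       share of total run time spent in FFTs (0..1)
    fft_time_reduction: fractional cut in FFT time, e.g. 0.10 for 10% less
    """
    return (1.0 - fft_fraction) + fft_fraction * (1.0 - fft_time_reduction)


# Illustrative FFT shares only; real values depend on system size and CPU count.
for frac in (0.20, 0.10, 0.05):
    saved = 1.0 - overall_time(frac, 0.10)
    print(f"FFT share {frac:.0%}: overall saving {saved:.1%}")
```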
> All the best
> Ross
> /\
> \/
> |\oss Walker
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- |
> | | PGP Key available on request |
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>> -----Original Message-----
>> From:
>> [] On Behalf Of Nicolas Lux Fawzi
>> Sent: Tuesday, September 19, 2006 12:36
>> To:
>> Subject: AMBER: ACML and MASS/MASSV for sander and pmemd
>> Hi Bob (and others who might be able to help),
>> I noticed your timings (
>> were on NERSC's new infiniband opteron and power5 machines. I am
>> starting to run sander and pmemd on these machines. I also noticed
>> that you did not use the ACML math library on the opteron, but did use
>> the MASSV lib on the power5. Here come the questions: Is there a
>> reason you (or anyone else) skipped the opteron math library -- is it
>> no faster? And second, clearly I could test this myself, but I need to
>> figure out how to get the ACML and MASSV libraries to be used instead
>> of the built-in functions. I have seen this page from Pathscale
>> regarding amber8 (
>> from which I suppose I can put together how to use the math libraries
>> for sander and pmemd, but I was wondering if there were instructions
>> out there somewhere. I checked through the Amber9 manual, but didn't
>> find anything.
>> Thanks for helping out a new person!
>> -Nick
>> -----------------------------------------------------------------------
>> The AMBER Mail Reflector
>> To post, send mail to
>> To unsubscribe, send "unsubscribe amber" to
