AMBER Archive (2008)

Subject: RE: Fw: RE: AMBER: MKL libraries/Amber10

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Fri May 30 2008 - 17:41:51 CDT


> If it is MKL that causes the problem (on different hardware, by Cenk and
> me), why not compile Amber10 with ifort + ACML or GOTO libraries? Is there
> any indication of how to set up the config file?

Last time I checked, GOTO was basically a cut-down BLAS library designed to
get very high Linpack numbers, and it lacked any of the really useful stuff.
This may have changed. Note that it is actually LAPACK that we really need,
which obviously makes a lot of BLAS calls, so one could perhaps use the
LAPACK code included in AMBER and link it against GOTO BLAS, but I don't know
how useful this would be.

As for ACML, last time I looked at it (albeit a couple of years ago now) it
seemed to have largely forgotten that double precision arithmetic exists.
All the vector functions were single precision and I don't recall it having
any of the matrix diagonalization routines in there, so again not very
useful. However, this may have changed, so perhaps we could consider adding
ACML support. Maybe if NSF awards an AMD-based machine to SDSC I'll do it to
earn some kudos ;-).
 
> In previous mail Ross said that MKL will speed up QMMM by a large (not
> minor) margin.

This will of course depend on the QM system size. For less than 30 atoms or
so there will be almost no difference, for 30 to 50 atoms it will be minor
and for >50 atoms it will be a large difference so the importance of MKL is
a function of what you want to run... <sigh> nothing is ever simple ;-).

> As far as I know ACML are equivalent BLAS, and perhaps GOTO
> is even better than MKL or ACML.

The key routines for QMMM are vdinvsqrt, vdexp, vdsqrt, vdcos, dspev,
dspevd, dspevr, dsyev, dsyevd and dsyevr.

The first 4 are vector math functions and not strictly lapack routines.
There is also not a consensus on the interface for such routines so GOTO
would be no use here. ACML might help if it does double precision vectors.
In fact looking at the documentation:
http://developer.amd.com/assets/acml_userguide.pdf

It does vectored cosine, vectored exponential and vectored log but AMD seem
to have forgotten that people might want to do vectored sqrt or inverse
sqrt. They support vectored power (to a float) but only in single precision
:-(. And that probably doesn't make use of specialist sqrt hardware anyway
even if you do it to the power 0.5 so that's probably not much good. So for
vector math ACML would appear to be pretty useless.
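
For reference, the four vector math routines named above are just
element-wise operations on double precision arrays. Below is a minimal
pure-Python sketch of what they compute, assuming the (n, input, output)
argument order used by MKL's vector math calls. The function names are
hypothetical stand-ins for illustration, not any library's API, and a plain
loop like this only shows the semantics, not the speed of a tuned version.

```python
import math

# Hypothetical reference versions of the four vector math routines.
# Each one fills y[0:n] from a[0:n], element by element.

def vd_inv_sqrt(n, a, y):
    """y[i] = 1 / sqrt(a[i])"""
    for i in range(n):
        y[i] = 1.0 / math.sqrt(a[i])

def vd_sqrt(n, a, y):
    """y[i] = sqrt(a[i])"""
    for i in range(n):
        y[i] = math.sqrt(a[i])

def vd_exp(n, a, y):
    """y[i] = exp(a[i])"""
    for i in range(n):
        y[i] = math.exp(a[i])

def vd_cos(n, a, y):
    """y[i] = cos(a[i])"""
    for i in range(n):
        y[i] = math.cos(a[i])

def vd_inv_sqrt_via_pow(n, a, y):
    """Same result as vd_inv_sqrt, but routed through a general power
    function (exponent -0.5) instead of a dedicated sqrt."""
    for i in range(n):
        y[i] = math.pow(a[i], -0.5)

if __name__ == "__main__":
    a = [1.0, 4.0, 16.0]
    y = [0.0] * len(a)
    vd_inv_sqrt(len(a), a, y)
    print(y)
```

The last function is the "power" workaround discussed above: it gives the
same numbers, but goes through a general-purpose pow rather than a dedicated
square root.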

It at least looks like the lapack implementation is complete so this might
help a bit - the threading might also help some for matrix diagonalizations
although one would have to check.
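
As a side note on the diagonalizers listed earlier: the "sp" routines (dspev,
dspevd, dspevr) take the symmetric matrix in LAPACK packed storage, whereas
the "sy" routines (dsyev, dsyevd, dsyevr) take a full square array. Below is
a small sketch of the upper-triangle packed layout, assuming column-major
order and 0-based indices. The helper names pack_upper and packed_get are
hypothetical and only illustrate the index mapping.

```python
def pack_upper(a):
    """Pack the upper triangle of a symmetric n x n matrix (list of rows)
    into a flat list, column by column, as used for LAPACK packed storage
    with uplo='U': element (i, j), i <= j, goes to index i + j*(j+1)//2."""
    n = len(a)
    ap = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            ap[i + j * (j + 1) // 2] = a[i][j]
    return ap

def packed_get(ap, i, j):
    """Read element (i, j) of the symmetric matrix back from packed storage."""
    if i > j:
        i, j = j, i  # symmetry: A[i][j] == A[j][i]
    return ap[i + j * (j + 1) // 2]

if __name__ == "__main__":
    a = [[1.0, 2.0, 4.0],
         [2.0, 3.0, 5.0],
         [4.0, 5.0, 6.0]]
    print(pack_upper(a))
```

Packed storage holds n(n+1)/2 values instead of n*n.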

Although as usual we see that the marketing department does not live in the
real world:

"ACML's aggressively tuned OpenMP versions mean that you don't have to worry
about managing sophisticated threading models or complex debugging. Whether
you are using dynamic or static linking, Windows, Linux or Solaris 32- or
64-bit, multi threading just works. "

Great.... for single-CPU, non-MPI code. If you are using MPI, this sort of
threading is just a complete pain...
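
The arithmetic behind that complaint: if every MPI rank on a node calls into
a threaded library and each call starts its own team of threads, the node
ends up with ranks times threads-per-rank runnable threads. Below is a rough
sketch with hypothetical helper names, assuming all ranks sit on one node and
every rank starts the same number of threads. The usual workaround is to cap
the library at one thread per rank, for example by setting the standard
OpenMP variable OMP_NUM_THREADS=1 before launching the job.

```python
def total_threads(mpi_ranks, threads_per_rank):
    """Runnable threads on the node if every rank starts its own team."""
    return mpi_ranks * threads_per_rank

def oversubscription(mpi_ranks, threads_per_rank, cores):
    """Runnable threads per physical core; above 1.0 means threads compete."""
    return total_threads(mpi_ranks, threads_per_rank) / cores

def safe_threads_per_rank(mpi_ranks, cores):
    """Largest per-rank thread count that does not exceed the core count
    (never less than one thread per rank)."""
    return max(1, cores // mpi_ranks)

if __name__ == "__main__":
    # 8 ranks on an 8-core node, library defaulting to 8 threads per rank
    print(total_threads(8, 8), oversubscription(8, 8, 8))
    print(safe_threads_per_rank(8, 8))
```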

Just my 3c.... (my prices are going up due to the cost of oil...)

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
