AMBER Archive (2008)

Subject: Re: AMBER: amber 10: sander and pmemd performance

From: Robert Duke (rduke_at_email.unc.edu)
Date: Fri Jul 25 2008 - 09:31:30 CDT


There are probably at least 10, if not 20, different things going on here,
some of which you are talking about and some of which you are not. I have no
idea how many processors you are using, and I don't know your interconnect.
This stuff can be impacted by:

1) compiler choice,
2) compiler options,
3) MPI choice,
4) how MPI was built,
5) how MPI was configured,
5a) how the system communications stacks are configured,
6) how pmemd was configured and optimized for the hardware and software in
   play,
7) the hardware being used, in terms of specifics about a) CPU speed, b) CPU
   cache size, c) multicore impacts on memory and other communications
   bandwidths, d) the system buses in use, and e) the net cards in use,
8) the actual benchmarks in use - the size of the benchmark can make a big
   difference in performance, depending on how the modeled system size
   matches the cache size, and on the processor count (as the processor
   count comes down, more is done in each individual processor in terms of
   total memory requirements, and at some point you run out of cache, which
   can really hurt performance - see the rough sketch below).
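
To make the cache point concrete, here is a rough back-of-envelope sketch in
Python. Everything in it is an assumption for illustration only: the ~1 KB
per atom working-set estimate (coordinates, velocities, forces, plus pairlist
entries) is a loose guess, and the 1 MB of cache per core is just an example;
your hardware will differ.

def working_set_mb(natoms, nranks, bytes_per_atom=1024):
    # Very rough per-process working set for a parallel MD run.
    return natoms / float(nranks) * bytes_per_atom / 1024**2

cache_mb = 1.0  # assumed 1 MB of cache per core
for nranks in (64, 16, 4, 1):
    ws = working_set_mb(60000, nranks)  # a 60K-atom system, as in the post below
    status = "fits in cache" if ws <= cache_mb else "spills out of cache"
    print("%3d ranks: ~%6.1f MB per rank -> %s" % (nranks, ws, status))

The point is just that the same system that fits in cache at high processor
counts no longer does at low counts, so per-step cost does not scale the way
you might naively expect.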

A wide range of options chosen in mdin can also totally whack performance.
So what I did in the Amber 8 and 9 timeframes was cook up a bunch of specific
configurations with known characteristics, and I carefully optimized the
software and provided configuration options to target those machines. It is
not a simple matter to then move to any new machine, new compiler, or new
implementation of any other supporting library and see performance STAY THE
SAME, LET ALONE GET BETTER. It is really, really easy to dink up the
performance of this sort of code; sad but true. This code is basically
optimized to sit on the edge of a bunch of interlocking bottlenecks; push it
a little in any direction and you start running slower for a different
reason. So I am sorry about that, but the community needs to realize that
this is not a simple matter.

I have spent a couple of decades working, to one extent or another, on
performance and reliability issues in the computer industry, and I have to
approach each new configuration carefully or I won't get particularly good
performance (I actually build about two dozen different versions of pmemd
and just run them at present). I choose not to spend the rest of my life
doing this for every combination of hardware and software folks can dream
up; I would really recommend that, if you are serious about running Amber
fast, you take a look at what we support well currently and consider making
purchases in that direction.

Right now, that probably means the best choice in MPI (for cluster builders)
is good InfiniBand hardware + MVAPICH and the ifort compiler. I would choose
faster CPUs and a lower core count, hang the additional cost, if molecular
dynamics is your thing. If I had or really wanted AMD processors, I would
choose the PathScale compilers - they are fast and work well. PGI is my
third choice in compilers, but this may be because of past issues that are
not that big a deal anymore - they have made an honest effort to respond to
past problems and should be given credit. For me, the Intel compilers have
always been pretty darn fast, but a bit of a pain to use in terms of them
changing things; still, they really know how to write a code optimizer. If
you are stuck with ethernet, well, don't expect much, but we support LAM,
MPICH 1, and MPICH 2; they all work well, and they are really pretty easy to
install (I think LAM in particular may be pretty easy; for historical
reasons I mostly settled on MPICH 1-vintage stuff myself, and I found that
all the additional features of
MPICH 2 were mostly a hassle for my small clusters).

If you want to run a totally new configuration and see what it will do with
pmemd, then you need to 1) optimize the MPI configuration, carefully, on
your machine, 2) optimize the compiler configuration, carefully, on your new
machine, with pretty aggressive compiler options, and 3) go through building
pmemd every way possible (all the various optimization options) to see what
you get. Then run different size benchmarks, different size runs, and on and
on.
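
If it helps, here is a minimal sketch, in Python, of that kind of
build-and-benchmark sweep. The build names, input files, and rank counts
below are placeholders for whatever combinations you actually built; only
the mpirun/pmemd command-line options (-O, -i, -o, -p, -c) are standard.

import itertools
import subprocess
import time

# Placeholder names - substitute your own builds and benchmark inputs.
builds = ["pmemd_ifort_mvapich", "pmemd_ifort_mpich2", "pmemd_pgi_openmpi"]
benchmarks = [("npt_60k", "npt.mdin", "prmtop", "inpcrd"),
              ("nve_60k", "nve.mdin", "prmtop", "inpcrd")]
rank_counts = [2, 4]

for build, (name, mdin, prmtop, inpcrd), n in itertools.product(
        builds, benchmarks, rank_counts):
    out = "%s.%s.np%d.out" % (name, build, n)
    cmd = ["mpirun", "-np", str(n), "./" + build,
           "-O", "-i", mdin, "-o", out, "-p", prmtop, "-c", inpcrd]
    t0 = time.time()
    subprocess.call(cmd)  # run one benchmark with one build
    print("%-22s %-8s np=%d  wall=%.1f s" % (build, name, n, time.time() - t0))

You can then pull the ns/day numbers out of the mdout files (or just compare
the wall times) to see which combination actually wins on your hardware.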

And another big point: if you are not willing to spend money on the rest of
your configuration to get stuff that is recommended and known to work, and
to spend the effort to set up the recommended configurations, then maybe you
shouldn't be so surprised that you don't get the best performance. I
wouldn't be...

Two potentially interesting notes:
1) You should not really expect pmemd 10 to be much faster than pmemd 9 on a
small CPU configuration unless you are running NVE or NVT with the default
value of ene_avg_sampling; in the development work for 10, I found very
little aside from things associated with this option that would improve
performance at low processor counts (and, as I have said elsewhere, for the
right NVE benchmarks on the right machines, the single-processor NVE
performance can be as much as 30% better, but this is sort of a best case).
2) We will reasonably soon be supporting configurations using Intel MPI on
InfiniBand. This stuff has better performance than anything I have seen for
commodity clusters - SUBSTANTIALLY better - and looks to me to be worth the
money.

Regards - Bob Duke

----- Original Message -----
From: "Vlad Cojocaru" <Vlad.Cojocaru_at_eml-r.villa-bosch.de>
To: "AMBER list" <amber_at_scripps.edu>
Sent: Friday, July 25, 2008 5:57 AM
Subject: AMBER: amber 10: sander and pmemd performance

> Dear ambers,
>
> I have compiled AMBER 10 with the Intel compilers ifort and icc (10.1),
> using two different MPI libraries: v1 = MPICH2 1.0.7; v2 = OpenMPI 1.2.6
> (MKL was used in both). Both MPI libraries were compiled with the same
> compilers as the AMBER package. NetCDF support was included in all
> compilations.
>
> I tested these compilations on a 60K-atom system on 4 cores of an AMD
> Opteron machine with two dual-core CPUs (OS: Debian Linux). After this I
> compared with an older compilation of AMBER 9 with gcc 4.1 (gfortran) and
> OpenMPI 1.2.5 (v3). I used both the NPT and NVE ensembles for testing.
>
> To my surprise, there is little difference in performance between the v1
> and v2 compilations of AMBER 10 and the old compilation of AMBER 9 with
> gcc. The new sander.MPI is just about 6-8% faster, while the new pmemd is
> just about the same speed. Interestingly, the v2 compilation
> (Intel + OpenMPI) was slightly faster than v1 (compiled with MPICH2).
> Also, a different compilation using PGI 7.1 and OpenMPI 1.2.5 is very
> similar in performance to the Intel ones.
>
> In general, sander.MPI runs 0.155 to 0.165 ns per day of NPT simulation,
> while pmemd runs 0.24 to 0.25 ns/day of NPT simulation and 0.245 to 0.267
> ns/day of NVE simulation. All simulations use a time step of 1 fs.
>
> I would like to ask whether, in your experience, these performance figures
> are what one would expect on such a machine. I was hoping that the Intel
> compilers would produce significantly faster executables (around 15-20%)
> compared to gfortran, but this is not the case (or maybe the increase in
> performance comes with higher CPU counts?). Is there something one can
> play with when compiling with the Intel compilers to increase performance?
> There are several messages in the AMBER archives suggesting that the Intel
> compilers produce faster executables ....
>
> Best wishes
> vlad
>
>
> --
> ----------------------------------------------------------------------------
> Dr. Vlad Cojocaru
>
> EML Research gGmbH
> Schloss-Wolfsbrunnenweg 33
> 69118 Heidelberg
>
> Tel: ++49-6221-533266
> Fax: ++49-6221-533298
>
> e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
>
> http://projects.villa-bosch.de/mcm/people/cojocaru/
>
> ----------------------------------------------------------------------------
> EML Research gGmbH
> Amtgericht Mannheim / HRB 337446
> Managing Partner: Dr. h.c. Klaus Tschira
> Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
> http://www.eml-r.org
> ----------------------------------------------------------------------------
>
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo_at_scripps.edu