AMBER Archive (2003)

Subject: Re: AMBER: PMEMD Performance on Beowulf systems

From: Robert Duke (rduke_at_email.unc.edu)
Date: Thu Dec 18 2003 - 22:35:40 CST


Stephen -
Several points -
1) Gigabit ethernet is not particularly good for scaling. The numbers I
published were on IBM blade clusters that had no other load on them, and the
gigabit interconnect was isolated from other network traffic. If you split
across switches or have other things going on (i.e., other jobs running
anywhere on machines attached to the interconnect), performance tends to
drop sharply; that is all you can expect from such a slow interconnect. A
real killer on dual Athlons is failing to take advantage of both processors:
with gigabit ethernet you typically get better performance by communicating
through shared memory within a node, and if one of the CPUs is being used
for something else, you can't do this. (A quick two-rank ping-pong test, see
the sketch after point 4, will show what the interconnect and the intranode
path are actually delivering.)
2) In my hands, LAM MPI is slower than MPICH, by around 10% if I recollect
correctly, though I have not tested this extensively (i.e., I probably only
did the check on some Athlons with a slow interconnect, but inferred that
LAM was not necessarily an improvement). Taking this into account, your Xeon
numbers are really not very different from mine (you are roughly 10% better
at 8 CPUs and 20% worse at 16 CPUs).
3) Our 1.6 GHz Athlons are slower than our 2.4 GHz Xeons. I like the
Athlons, but the Xeons can take advantage of vectorized SSE2 instructions.
I don't know what your Athlons are, but I am not surprised they are slower.
As for why they scale so badly, I would suspect loading, configuration,
network cards, motherboards, or heaven only knows what. Lots of things can
be slow (back to item 1).
4) I don't use the Portland Group compilers at all because I had problems
with them a couple of years ago, and the company did absolutely nothing to
help; it looked like floating-point register issues. That is probably no
longer the case, but the point is that I don't know what performance one
would expect from them. My numbers are from the Intel Fortran compiler.
There could also be issues with how LAM was built, or MPICH if you change
to that.
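
If you want to see what the interconnect (and the shared-memory path inside
a dual node) is actually giving you, a two-rank ping-pong timing is the
quickest check. Here is a minimal sketch in C against the standard MPI API;
it is not part of PMEMD, and the 8 KB message size and 1000 repetitions are
arbitrary choices for illustration:

  /* Two-rank MPI ping-pong: rough latency/bandwidth check (sketch only). */
  #include <stdio.h>
  #include <string.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      const int nbytes = 8192;        /* payload per message (arbitrary) */
      const int reps   = 1000;        /* round trips to time (arbitrary) */
      char buf[8192];
      int rank, size, i;
      double t0, t1, per_trip;
      MPI_Status stat;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (size != 2) {
          if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
          MPI_Finalize();
          return 1;
      }
      memset(buf, 0, sizeof(buf));
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < reps; i++) {
          if (rank == 0) {
              MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &stat);
          } else {
              MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
              MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();
      if (rank == 0) {
          per_trip = (t1 - t0) / reps;          /* seconds per round trip */
          printf("round trip: %.1f us, throughput: %.1f MB/s\n",
                 per_trip * 1.0e6, 2.0 * nbytes / per_trip / 1.0e6);
      }
      MPI_Finalize();
      return 0;
  }

Run it once with one rank on each of two nodes to time the gigabit link, and
once with both ranks on a single dual-Athlon node; if the two cases look
about the same, you are probably not getting a shared-memory transport
within the node.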

You really have to bear in mind that with gigabit ethernet you are at the
absolute bottom of reasonable interconnects for this type of system, and it
does not take much at all for numbers to be twofold worse than the ones I
published. My numbers are for isolated systems and good hardware, with the
MPI build carefully checked out, and with PMEMD built with ifc, which is
also well checked out.
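
For what it's worth, the speedup arithmetic on the Athlon ps/day numbers you
quote below works out as in the small C program here (a sketch only; the
data are taken straight from your table, and collapsing everything that does
not parallelize into a single "serial fraction" is a crude Amdahl-style
approximation that lumps communication cost in with true serial work):

  /* Speedup, efficiency, and apparent Amdahl serial fraction for the
   * quoted JAC ps/day rates.  S(p) = rate(p)/rate(1), E(p) = S(p)/p,
   * and from S(p) = 1/(f + (1-f)/p) we get f = (p/S - 1)/(p - 1).   */
  #include <stdio.h>

  int main(void)
  {
      int    cpus[]   = { 1, 2, 4, 8, 16, 32 };
      double athlon[] = { 108.0, 172.0, 239.0, 360.0, 419.0, 417.0 };
      int i;

      for (i = 1; i < 6; i++) {
          double s = athlon[i] / athlon[0];
          double f = ((double)cpus[i] / s - 1.0) / (cpus[i] - 1.0);
          printf("%2d cpu: speedup %.2f, efficiency %3.0f%%, "
                 "apparent serial fraction %.2f\n",
                 cpus[i], s, 100.0 * s / cpus[i], f);
      }
      return 0;
  }

The apparent serial-plus-communication fraction comes out around 0.2 to
0.27, which by itself caps the speedup at roughly 4 to 5 no matter how many
CPUs you add, and that is consistent with the plateau near 3.9 you describe.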

Regards - Bob Duke

----- Original Message -----
From: <Stephen.Titmuss_at_csiro.au>
To: <amber_at_scripps.edu>
Sent: Thursday, December 18, 2003 10:19 PM
Subject: AMBER: PMEMD Performance on Beowulf systems

> Hello All,
>
> We have been testing PMEMD 3.1 on a 32 cpu (16x dual Athlon nodes)
> cluster with a gigabit switch. The performance we have been seeing (in
> terms of scaling to larger numbers of CPUs) is a bit disappointing when
> compared to the figures released for PMEMD. For example, comparing
> ps/day rates for the JAC benchmark (with the specified cutoff changes,
> etc.) on our cluster (left column) and those presented for a 2.4 GHz Xeon
> cluster, also with a gigabit switch (right column), gives:
>
>           athlon    xeon
>  1 cpu:      108       -
>  2 cpu:      172     234
>  4 cpu:      239     408
>  8 cpu:      360     771
> 16 cpu:      419    1005
> 32 cpu:      417       -
>
> In general, in terms of wall-clock time, we only see a parallel speedup
> (cf. 1 cpu) of about 3.3 at 8 cpus and struggle to get much past 3.9
> going to higher numbers of cpus. The parallel scaling presented for
> other cluster machines appears to be much better. Has anyone else
> achieved good parallel speedup on Beowulf systems?
>
> Also, we are using the Portland f90 compiler and LAM in our setup - has
> anyone experienced problems with this compiler or MPI library with
> PMEMD?
>
> Thanks in advance,
>
> Stephen Titmuss
>
> CSIRO Health Sciences and Nutrition
> 343 Royal Parade
> Parkville, Vic. 3052
> AUSTRALIA
>
> Tel: +61 3 9662 7289
> Fax: +61 3 9662 7347
> Email: stephen.titmuss_at_csiro.au
> www.csiro.au www.hsn.csiro.au
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu