AMBER Archive (2003)
Subject: Re: AMBER: PMEMD Performance on Beowulf systems
From: Viktor Hornak (hornak_at_csb.sunysb.edu) 
Date: Mon Dec 22 2003 - 07:47:42 CST
 
 
 
 
ASUS A7M266-D Dual Socket A motherboard (AMD 762 chipset). It has three
32-bit/33MHz PCI slots and two 64/32-bit, 66/33MHz PCI slots. To get a
noticeable speedup in networking, the gigabit card (Intel PRO/1000) needs to
be placed in a 64-bit/66MHz PCI slot.
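
For what it's worth, a quick way to double-check which bus the NIC actually
ended up on is to look at "lspci -vv" output; each device's Status line
carries a 66MHz-capable flag (66MHz+ / 66MHz-), and the leading bus:slot.func
address tells you which PCI bus it sits on. Below is a minimal Python sketch
that just filters that output (the filter itself is only an illustration; it
assumes Linux with pciutils installed, and it reports what the card claims,
not the clock the shared bus is actually running at):

    #!/usr/bin/env python
    # Print each Ethernet controller reported by "lspci -vv" together with
    # any of its Status lines (the device Status line carries the
    # 66MHz-capable flag, shown as 66MHz+ or 66MHz-).
    import subprocess

    out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

    device = None                      # header line of the current PCI device
    for line in out.splitlines():
        if line and not line[0].isspace():
            device = line              # e.g. "02:04.0 Ethernet controller: Intel ..."
        elif device and "Ethernet controller" in device and "Status:" in line:
            print(device)
            print("    " + line.strip())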
 
Hope this helps,
-Viktor Hornak
Stony Brook University
 
Aldo Jongejan wrote:

> Hi,
>
> What kind of motherboards are we talking about?!
>
> aldo
>
 
> Carlos Simmerling wrote:
> >
> > We had gigabit network on both our dual Athlons (1.6GHz) and our dual
> > Xeons. Scaling was much worse on the Athlons until we found that moving
> > the network cards (Intel) to a different slot made a huge difference for
> > the Athlon motherboards. You should check this to see what the PCI
> > bandwidth is on each slot; for us they were not the same.
> > Carlos
> >
 
> > ----- Original Message -----
> > From: "Robert Duke" <rduke_at_email.unc.edu>
> > To: <amber_at_scripps.edu>
> > Sent: Thursday, December 18, 2003 11:35 PM
> > Subject: Re: AMBER: PMEMD Performance on Beowulf systems
> >
 
> > > Stephen -
> > > Several points -
> > > 1) Gigabit ethernet is not particularly good for scaling. The numbers I
> > > published were on IBM blade clusters that had no other load on them, and
> > > the gigabit interconnect was isolated from other net traffic. If you
> > > split across switches or have other things going on (i.e., other jobs
> > > running anywhere on machines on the interconnect), performance tends to
> > > really drop. This is all you can expect from such a slow interconnect. A
> > > real killer for dual Athlons is not taking advantage of the dual
> > > processors; typically if you have gigabit ethernet you will get better
> > > performance through shared memory, and if one of the CPUs is being used
> > > for something else, you can't do this. [A machinefile sketch for placing
> > > two MPI processes per node follows after this message.]
 
> > > 2) LAM MPI in my hands is slower than MPICH, around 10% if I recollect,
> > > without extensive testing (i.e., I probably only did the check on some
> > > Athlons with a slow interconnect, but inferred that LAM was not
> > > necessarily an improvement). Taking this into account, your Xeon numbers
> > > are really not very different from mine (you are 10% better at 8 CPUs
> > > and 20% worse at 16 CPUs, roughly).
 
> > > 3) Our 1.6 GHz Athlons are slower than our 2.4 GHz Xeons. I like the
> > > Athlons, but the Xeons can take advantage of vectorizing SSE2
> > > instructions. I don't know what your Athlons are, but I am not surprised
> > > they are slower. Why they are scaling so badly, I would suspect loading,
> > > config, net cards, motherboards, or heaven only knows. Lots of things
> > > can be slow (back to item 1).
 
> > > 4) I don't use the Portland Group compilers at all because I had
> > > problems with them a couple of years ago, and the company did absolutely
> > > nothing to help. It looked like floating point register issues. This is
> > > probably no longer the case, but the point is that I don't know what
> > > performance one would expect. My numbers are from the Intel Fortran
> > > compiler. There could also be issues with how LAM was built, or MPICH if
> > > you change to that.
 
> > >
> > > You have to really bear in mind that with gigabit ethernet you are at
> > > the absolute bottom of reasonable interconnects for this type of system,
> > > and it does not take much at all for numbers to be twofold worse than
> > > the ones I published. My numbers are for isolated systems, good
> > > hardware, with the MPI build carefully checked out, and with PMEMD built
> > > with ifc, which is also well checked out.
> > >
> > > Regards - Bob Duke
> > >
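
[Note on point 1 above: to get the intra-node traffic over shared memory on
dual-CPU nodes, MPI has to place two processes on each node rather than one.
The syntax below is from memory and should be treated as an assumption;
node01/node02 are placeholder hostnames.]

    # MPICH (ch_p4 device): a "machines" file with a process count per node;
    # launch with something like  mpirun -np 4 -machinefile machines pmemd ...
    # (MPICH itself should be built with shared-memory support, e.g. -comm=shared).
    node01:2
    node02:2

    # LAM: the boot schema handed to lamboot takes a cpu count per node;
    # after lamboot,  mpirun C pmemd ...  starts one process per listed cpu.
    node01 cpu=2
    node02 cpu=2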
 
> > > ----- Original Message -----
> > > From: <Stephen.Titmuss_at_csiro.au>
> > > To: <amber_at_scripps.edu>
> > > Sent: Thursday, December 18, 2003 10:19 PM
> > > Subject: AMBER: PMEMD Performance on Beowulf systems
> > >
> > >
 
> > > > Hello All,
> > > >
> > > > We have been testing PMEMD 3.1 on a 32-CPU (16 dual-Athlon nodes)
> > > > cluster with a gigabit switch. The performance we have been seeing (in
> > > > terms of scaling to larger numbers of CPUs) is a bit disappointing when
> > > > compared to the figures released for PMEMD. For example, comparing
> > > > ps/day rates for the JAC benchmark (with the specified cutoff changes,
> > > > etc.) on our cluster (left column) and those presented for a 2.4GHz
> > > > Xeon cluster, also with a gigabit switch (right column), gives:
> > > >
 
> > > >          athlon   xeon
> > > >  1 cpu:   108       -
> > > >  2 cpu:   172      234
> > > >  4 cpu:   239      408
> > > >  8 cpu:   360      771
> > > > 16 cpu:   419     1005
> > > > 32 cpu:   417       -
 
> > > >
> > > > In general, in terms of wall-clock time, we only see a parallel
> > > > speedup (cf. 1 CPU) of about 3.3 at 8 CPUs and struggle to get much
> > > > past 3.9 going to higher numbers of CPUs. The parallel scaling
> > > > presented for other cluster machines appears to be much better. Has
> > > > anyone else achieved good parallel speedup on Beowulf systems?
> > > >
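
[For reference, those figures follow directly from the JAC table above:
360/108 is about 3.3 at 8 CPUs and 419/108 is about 3.9 at 16 CPUs, while the
32-CPU rate of 417 ps/day is actually slightly below the 16-CPU rate, i.e. no
further speedup past 16 CPUs.]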
 
> > > > Also, we are using the Portland f90 compiler and LAM in our setup -
> > > > has anyone experienced problems with this compiler or MPI library with
> > > > PMEMD?
> > > >
 
> > > > Thanks in advance,
> > > >
> > > > Stephen Titmuss
> > > >
> > > > CSIRO Health Sciences and Nutrition
> > > > 343 Royal Parade
> > > > Parkville, Vic. 3052
> > > > AUSTRALIA
> > > >
> > > > Tel:   +61 3 9662 7289
> > > > Fax:   +61 3 9662 7347
> > > > Email: stephen.titmuss_at_csiro.au
> > > > www.csiro.au   www.hsn.csiro.au
> > > >
 
 
>
> ###########################################
>
> Aldo Jongejan
> Molecular Modeling Group
> Dept. of Pharmacochemistry
> Free University of Amsterdam
> De Boelelaan 1083
> 1081 HV Amsterdam
> The Netherlands
>
> e-mail: jongejan_at_few.vu.nl
> tlf:    +31 (0)20 4447612
> fax:    +31 (0)20 4447610
>
> ###########################################
>
 
 
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu