AMBER Archive (2009)
Subject: RE: [AMBER] Error in PMEMD run
From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Fri May 08 2009 - 13:11:19 CDT
I don't think I've seen anywhere what simulation you are actually running.
This will have a huge effect on parallel scalability. With InfiniBand
and a 'reasonable' system size you should easily be able to scale beyond 2
nodes. Here are some numbers for the JAC NVE benchmark from the suite
provided at http://ambermd.org/amber10.bench1.html
This is for NCSA Abe, which has dual-socket quad-core Clovertown nodes (E5345,
2.33 GHz, so very similar to your setup) and uses SDR InfiniBand.
Using all 8 processors per node (time for benchmark in seconds):
8 ppn 8 cpu 364.09
8 ppn 16 cpu 202.65
8 ppn 24 cpu 155.12
8 ppn 32 cpu 123.63
8 ppn 64 cpu 111.82
8 ppn 96 cpu 91.87
Using 4 processors per node (2 per socket):
4 ppn 8 cpu 317.07
4 ppn 16 cpu 178.95
4 ppn 24 cpu 134.10
4 ppn 32 cpu 105.25
4 ppn 64 cpu 83.28
4 ppn 96 cpu 67.73
As you can see it is still scaling at 96 cpus (24 nodes with 4 MPI tasks per
node): going from 32 to 96 cores at 4 ppn still cuts the time from 105.25 s to
67.73 s, roughly a 1.55x speedup on 3x the cores. So I think either you are
running a system that is unreasonably small to expect it to scale in parallel,
or there is something very wrong with the setup of your cluster.
All the best
> -----Original Message-----
> From: amber-bounces_at_ambermd.org [mailto:amber-bounces_at_ambermd.org] On
> Behalf Of Marek Malý
> Sent: Friday, May 08, 2009 10:58 AM
> To: AMBER Mailing List
> Subject: Re: [AMBER] Error in PMEMD run
> Hi Gustavo,
> thanks for your suggestion, but we have only 14 nodes in our cluster
> (each node = 2 x quad-core Xeon 5365 at 3.00 GHz = 8 cores per node,
> connected with Cisco InfiniBand).
> If I allocate 8 nodes and use just 2 CPUs per node for one of my jobs, it
> means that 8 x 6 = 48 cores will sit wasted. In that
> case I am sure that my colleagues will kill me :)) Moreover, I do not
> expect that an 8-node/2-CPU combination will have significantly better
> performance than a 2-node/8-CPU one, at least in the case of PMEMD.
> But anyway, thank you for your opinion/experience!
> On Fri, 08 May 2009 19:28:35 +0200, Gustavo Seabra
> <gustavo.seabra_at_gmail.com> wrote:
> >> the best performance I have obtained was when using a combination of
> >> nodes
> >> with 4 CPUs (out of 8) used per node.
> > I don't know exactly what you have in your system, but I gather you
> > are using 8-core nodes, and on them you got the best performance by
> > leaving 4 cores idle. Is that correct?
> > In that case, I would suggest that you go a bit further and also try
> > using only 1 or 2 cores per node, i.e., leaving the remaining 6-7
> > cores idle. So, for 16 MPI processes, try allocating 16 or 8 nodes.
> > (I didn't see this case in your tests.)
> > AFAIK, the 8-core nodes are arranged as 2 x 4-core sockets, and the
> > communication between cores, which is already bad within the 4 cores of
> > the same socket, gets even worse when you need to pass information
> > between the two sockets. Depending on your system, if you send 2 processes
> > to the same node, it may put them both in the same socket or automatically
> > split them, one onto each socket. You may also be able to tell it to make
> > sure that this gets split into 1 process per socket. (Look into the
> > mpirun flags; see the example invocation below.) From the tests we've run
> > on that kind of machine, we do get the best performance by leaving ALL BUT
> > ONE core idle in each socket.
> > Gustavo.
> > _______________________________________________
> > AMBER mailing list
> > AMBER_at_ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
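As a concrete illustration of the per-socket placement Gustavo suggests above:
with OpenMPI (flags from roughly the 1.4 series; MVAPICH, Intel MPI and other
implementations use different options, so check your own mpirun documentation),
a 16-process PMEMD run with 2 processes per node, one bound to each socket,
could look something like the sketch below. The pmemd file names are only
placeholders for an actual run.

  # 16 MPI ranks, 2 per node (i.e. 8 nodes), mapped round-robin over sockets
  # and bound one per socket; mdin/mdout/prmtop/inpcrd/restrt are placeholders
  mpirun -np 16 --npernode 2 --bysocket --bind-to-socket \
         pmemd -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt

Whether the placement actually ends up as intended depends on the MPI build and
on what the batch scheduler hands you, so it is worth checking (OpenMPI's
--report-bindings option prints the binding of each rank) rather than assuming
it.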
_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber