AMBER Archive (2008)
Subject: Re[2]: AMBER: Question fot Amber 10 Benchmarks

From: sychen (u8613020_at_msg.ndhu.edu.tw)
Date: Thu Nov 27 2008 - 20:18:00 CST

Next message: Guillermo Mulliert Carlín: "Re: AMBER: Who success in setting up a covalent bond between ligand and enzyme with tLeap ?"
Previous message: jitrayut jitonnom: "AMBER: Who success in setting up a covalent bond between ligand and enzyme with tLeap ?"
In reply to: Ross Walker: "RE: AMBER: Question fot Amber 10 Benchmarks"

Thank you for your kind explanation, and for Bob Duke's response.
Indeed, PMEMD performs better. These are the timings for the PME benchmark (original JAC):
8cpu: 32sec
2*8cpu: 28sec

We will consider replacing the GbE switch with an InfiniBand switch.

Thank you very much

Best Regards,
yuann

On Thu, 27 Nov 2008 08:57:06 -0800
"Ross Walker" <ross_at_rosswalker.co.uk> wrote:

> Hi Yuann,
>
> > PS: The nodes communicate with each other through one GbE switch
> > (3COM 2924-SFPplus)
>
> To follow up on what Bob Duke said, the problem is the gigabit interconnect.
> Essentially, in this day and age of multiple cores inside a node you cannot
> use ethernet to run MD in parallel. At least not regular MD. Things like
> thermodynamic integration should probably work okay, as long as you make
> sure your core mapping in the machine file is such that you run 16 threads
> but in the form of 8 threads for each image with those 8 threads all
> residing on the same node. The same is true for things like replica
> exchange. The only real solution for running MD in parallel across multiple
> nodes is a 'real' interconnect such as infiniband or myrinet etc. Something
> designed to do MPI (in hardware) as opposed to wrapping it up into tiny
> TCP/IP packets and sending it across the equivalent of the internet.
>
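The core-mapping advice above can be checked before a job is launched. The following is a minimal sketch, not part of the original message: the host names `node1` and `node2` are hypothetical, and it assumes that MPI ranks are split into images in consecutive blocks (ranks 0-7 for the first image, 8-15 for the second). Whether a given MPI launcher fills one node before moving to the next depends on the machine file and launcher settings, so the two mappings below are only illustrations.

```python
def block_mapping(hosts, cores_per_node):
    """Rank-to-host list that fills each host completely before the next one."""
    return [host for host in hosts for _ in range(cores_per_node)]


def round_robin_mapping(hosts, total_ranks):
    """Rank-to-host list that alternates hosts rank by rank."""
    return [hosts[rank % len(hosts)] for rank in range(total_ranks)]


def groups_stay_on_one_node(rank_hosts, ranks_per_group):
    """True if every consecutive block of ranks (one image) sits on a single host."""
    for start in range(0, len(rank_hosts), ranks_per_group):
        group = rank_hosts[start:start + ranks_per_group]
        if len(set(group)) != 1:
            return False
    return True


hosts = ["node1", "node2"]  # hypothetical host names
block = block_mapping(hosts, cores_per_node=8)
round_robin = round_robin_mapping(hosts, total_ranks=16)

# 16 ranks, 2 images of 8 ranks each
print(groups_stay_on_one_node(block, 8))        # True: each image is confined to one node
print(groups_stay_on_one_node(round_robin, 8))  # False: every image spans both nodes
```

With the block mapping, only the comparatively light traffic between images crosses the ethernet link; with the round-robin mapping, the heavy traffic within each image does.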
> Remember gigabit ethernet first came out in the days of the Pentium 2 300.
> At that time (ignoring latency and lots of other issues) the bandwidth
> to cpu speed ratio was 1000/300 = 3.3. Consider the situation now. You have
> 2xQuad core (ignoring all the extra SSE stuff which effectively doubles /
> triples the performance per MHz potentially) the ratio would now be
> 1000/(2800*8) = 0.0446 - the problem should thus be immediately obvious.
>
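The ratio quoted above is simple arithmetic and can be reproduced as a rough sketch. The function name is hypothetical, and the figures (1000 Mbit/s link, 300 MHz and 2800 MHz clocks, 8 cores) are taken directly from the message; like the original estimate, this ignores latency and per-clock performance gains.

```python
def bandwidth_per_mhz(link_mbit_per_s, clock_mhz, cores=1):
    """Link bandwidth divided by the aggregate clock rate of all cores in a node."""
    return link_mbit_per_s / (clock_mhz * cores)


then = bandwidth_per_mhz(1000, 300)            # one Pentium 2 at 300 MHz
now = bandwidth_per_mhz(1000, 2800, cores=8)   # 2 x quad core at 2800 MHz

print(f"Pentium 2 300:      {then:.2f}")    # 3.33
print(f"2 x quad core 2800: {now:.4f}")     # 0.0446
print(f"The ratio has dropped by a factor of about {then / now:.0f}")  # about 75
```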
> > We have compiled AMBER10 on the machines & platforms which are the same
> > as those described by Ross Walker in Amber 10 Benchmarks. (Dual XEON E5430
> > on SuperMicro X7DWA-N)
> > We use mpich2-1.0.8 & ifort9.1 to build sander.MPI; the original JAC
> > benchmark with sander.MPI seems fine
> > (2cpu: 161sec, 4cpu: 88sec, 8cpu: 54sec),
>
> As Bob said, the benchmarks I showed were for PMEMD which is designed to
> significantly outperform sander. It supports a subset of the methods
> (essentially PME and GB MD) but if the calculations you want to run fall
> within this feature set you will get better performance using PMEMD here. As
> you observe though within a machine sander does at least scale to all 8 cpus
> - although as usual with these multicore machines they are woefully
> underspecced on memory bandwidth so the scaling dies off once you try to use
> all the cores in a node.
>
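The scaling behaviour described above can be quantified from the timings reported in this thread. This is a small sketch with a hypothetical helper, `scaling`, that measures speedup and parallel efficiency relative to the smallest CPU count in each series; the sander and PMEMD numbers are the ones quoted in the messages (original JAC benchmark).

```python
def scaling(timings):
    """Return (cpus, speedup, efficiency) rows relative to the smallest CPU count.

    timings maps CPU count to wall-clock seconds.
    """
    base_cpus = min(timings)
    base_time = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        speedup = base_time / timings[cpus]
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


sander = {2: 161, 4: 88, 8: 54}   # sander.MPI within one node
pmemd = {8: 32, 16: 28}           # PMEMD, one node vs. two nodes over GbE

for name, timings in (("sander", sander), ("pmemd", pmemd)):
    for cpus, speedup, efficiency in scaling(timings):
        print(f"{name:6s} {cpus:2d} cpu  speedup {speedup:.2f}  efficiency {efficiency:.0%}")
```

Relative to 2 CPUs, sander reaches roughly 91% efficiency on 4 CPUs and 75% on 8, while doubling PMEMD from 8 to 16 CPUs across the gigabit link yields only about 57%.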
> Note I did not give any benchmarks beyond 8 cpus for these machines on the
> website. This is because you can't get any scaling over ethernet. If you
> want to run larger you will need to buy some infiniband clusters or
> alternatively see if there is a supercomputer center at which you can obtain
> time.
>
> > (For 16cpu computation, abnormal usage of system CPU (60~70%) was observed
> > by top or Ganglia monitoring, while 8cpu computation was fine & system CPU
> > < 5% & user CPU > 95%)
>
> This is due to the cpus either just spinning at barriers waiting for data to
> arrive over the ethernet or spending their whole time encoding and decoding
> tcp/ip packets.
>
> > Can anyone give me some ideas to solve this problem while running parallel
> > sander jobs across nodes?
>
> It cannot be solved - not without a new interconnect, sorry. The laws of
> physics are against you here I am afraid. As I said above though you should
> be able to run things like TI calculations over 2 nodes and REMD simulations
> over all the nodes as long as you are careful to make sure all threads for a
> given 'image' run on the same node.
>
> All the best
> Ross
>
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber_at_scripps.edu
> To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
> to majordomo_at_scripps.edu
>

-- sychen <u8613020_at_mail.ndhu.edu.tw>