AMBER Archive (2007)

Subject: Re: AMBER: benchmarking pmemd and sander (pmemd segmentation fault problem follow up)

From: Robert Duke (rduke_at_email.unc.edu)
Date: Thu Mar 29 2007 - 09:53:56 CST


Hi Vlad,
Okay, nothing really bad is obvious in the mdin. The fact that you are using
a 2 fs step means we can double the nsec/day estimates below. You may not
really need any output of mdvel or mden, but the frequency of writing is low
enough not to matter. It looks to me like the interconnect is limiting, at
least as currently configured, so the questions fall back to PNNL.

Sorry, I forgot your question about data distribution. The "DataDistrib" time
here is not initial data distribution. It is the constant (per-step) cost of
propagating coordinates and forces between nodes: in each succeeding step
some processors will need updated coordinates on atoms they "use" but don't
"own"; the atom "owners" are responsible for the coordinate update, which
requires that they have the summed forces from everybody else. Contrary to
statements by others about how pmemd works, we don't do a full update of the
system state at every step; instead a bunch of algorithms are used to figure
out who needs what, and only that subset of the system state is sent to each
processor. However, every time the pairlist is rebuilt, we currently do a
full update of coordinates to everyone. Typically this occurs about every 10
steps on most systems running a 2 angstrom skinnb at 300 K. So it is very
typical behaviour, when the interconnect is limiting, for the "DataDistrib"
time to go up out of proportion to everything else; the problem here is that
it just looks worse than I would expect for a modern interconnect. I don't
know when this machine went into service; maybe the interconnect is just not
the greatest.

As an aside, the data distribution problem is typically even worse than seen
here: the cost of fft slab distribution (the distributed fft transpose) is
actually a major factor in pme performance at high scaling. This cost is part
of the "FFT" number, and you will see that value going up too as the
interconnect gets overloaded. I'll forward this to the PNNL folks.
Best Regards - Bob
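
For readers of the archive: the throughput arithmetic referred to above is
just (simulated time per step) x (steps per day of wall time). A rough sketch
in Python (not part of the original exchange), using the 5000-step, 2 fs
benchmark totals quoted further down in the thread:

    # Back-of-the-envelope ns/day estimate from a fixed-step benchmark run.
    # nsteps and the wall-clock times come from the 32- and 64-CPU pmemd
    # outputs quoted below; dt_fs is the timestep in femtoseconds.
    def ns_per_day(nsteps, dt_fs, wall_s):
        simulated_ns = nsteps * dt_fs * 1.0e-6    # fs -> ns
        return simulated_ns * 86400.0 / wall_s    # scale to one day of wall time

    for ncpu, wall_s in [(32, 277.0), (64, 402.0)]:
        # At dt = 1 fs the 32-CPU run gives ~1.56 ns/day (the figure quoted
        # below); at the actual dt = 2 fs the estimate simply doubles to ~3.1.
        print(ncpu, "cpus:", round(ns_per_day(5000, 2.0, wall_s), 2), "ns/day")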

----- Original Message -----
From: "Vlad Cojocaru" <Vlad.Cojocaru_at_eml-r.villa-bosch.de>
To: <amber_at_scripps.edu>
Sent: Thursday, March 29, 2007 11:25 AM
Subject: Re: AMBER: benchmarking pmemd and sander (pmemd segmentation fault
problem follow up)

> Hi Bob,
>
> Below you have a sample of my mdin file. It's not 1 fs; I am doing 2 fs
> with SHAKE. If you wonder about ntp=2, it's because it's a membrane-protein
> complex. This is what I started with and I ran the tests I described.
> However, this script was initially built and used for sander and I have
> just transferred it to pmemd. If you see something that might interfere
> with the performance, let me know.
>
> I am still wondering... As far as I understand it, you calculated 1.56
> ns/day by just taking the total time of the 32-CPU run. If you look at the
> 64-CPU data, the time required for the evaluation of nonbonded interactions
> drops to half compared to 32 CPUs. However, the overall time is increased
> by "DataDistrib". If DataDistrib were done only once at the beginning of
> the simulation, then the overall time for 2 ns on 32 CPUs would be about
> 37,000 s, while on 64 CPUs it would be about 18,500 s ... so the scaling
> would actually be pretty good. But I am not sure what DataDistrib is and
> how the number of steps influences the time it needs. Also, on 256 CPUs,
> the same 5000 steps take 530 s, of which 487 s is DataDistrib alone ... so
> the DataDistrib time really does increase dramatically with the CPU
> count.
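
Since DataDistrib turns out to be a per-step cost rather than a one-time
setup cost (see the reply at the top of the thread), the 5000-step totals
scale roughly linearly with the number of steps. A small sketch of that
extrapolation in Python (not part of the original message), using the totals
quoted in this mail and below:

    # Extrapolate the 5000-step benchmark (10 ps at 2 fs) to a 2 ns run,
    # assuming every profiled cost, including DataDistrib, is paid on
    # (roughly) every step.
    target_ns = 2.0
    benchmark_ns = 5000 * 2.0e-6        # 0.01 ns simulated per benchmark run
    scale = target_ns / benchmark_ns    # = 200

    for ncpu, total_s in [(32, 277.0), (64, 402.0)]:
        print(ncpu, "cpus:", round(total_s * scale), "s for 2 ns")
    # ~55,400 s on 32 CPUs and ~80,400 s on 64 CPUs, rather than the
    # 37,000 s / 18,500 s estimates that assume a one-time DataDistrib cost.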
>
> I am preparing a complete graph for both pmemd and sander on different CPU
> counts and I will send it to you when it is ready.
>
> As for the other questions about the system at PNNL, I really have to do
> some research, because I am not sure about the MPI options or the queue
> specifications (apart from the fact that the queue is administered with
> LSF, and pmemd or sander is run with the prun command ...).
>
> Best wishes and thanx for the help with this,
>
> Vlad
>
> MDIN:
> &cntrl
> imin=0, ntx=5, irest=1, ntrx=1, ntxo=1,
> ntpr=100, ntwx=500, ntwv=2000, ntwe=2000,
> ntf=2, ntb=2, dielc=1.0, cut=9.0, scnb=2.0, scee=1.2,
> nsnb=100, igb=0,
> ntr=0,
> nstlim=1000000,
> t=300.0, dt=0.002,
> ntt=1, tautp=5.0, tempi=300.0, temp0=300.0,
> vlimit=15,
> ntp=2, pres0=1.0, taup=2.0,
> ntc=2, tol=0.00001,
> /
>
>
> Robert Duke wrote:
>
>> Hi Vlad,
>> Okay, good that you have it running. The benchmarking results are not
>> great, however. This is peaking at around 1.56 nsec/day if I assume you
>> have a 1 fs stepsize. If we look at lemieux at psc as a comparison (it
>> has/had a quadrics interconnect, but with dual-rail capability, so faster
>> than vanilla, and significantly less powerful processors, which puts more
>> stress on the interconnect, though the 4-processor nodes help), we see a
>> peak of 6.52 nsec/day on 80 processors for JAC (23.6k atoms, 1 fs step),
>> and a peak of 4.47 nsec/day on 160 processors for factor ix (nvt, my
>> setup; 91k atoms, 1.5 fs step). So I would expect the whopping power of a
>> single itanium 2 cpu to make it possible to exceed these values for
>> nsec/day at lower processor count, but for the system to bottleneck at
>> rather unspectacular total processor counts because the quadrics can't
>> keep up (but maybe this is a better quadrics than I know about - anybody
>> got interconnect latency times on this beast?). An sgi altix (itanium 2,
>> but big smp) will get up to 15 nsec/day on 96 procs for JAC and up to 7.7
>> nsec/day on 96 procs for factor ix.
>>
>> So there are several things to look at. First, please send me your mdin
>> file so I can look at the conditions for your run. There are lots of
>> things one can do in the mdin to get less than optimal performance while
>> at the same time not increasing the amount of useful data you collect.
>> Secondly, do you have the option of dual-rail runs, or of selecting the
>> layout of tasks on the machine? What options in general are available for
>> controlling how mpi works (MPI* environment variables, job queue specs,
>> etc.)? These questions are perhaps better addressed directly to the PNNL
>> support guys (I have been in communication with them and will forward
>> this mail). From PNNL it would also be helpful if I saw the exact
>> config.h they used in building pmemd - it is possible there were some
>> suboptimal decisions made in concocting the config.h, since I don't
>> directly support this machine configuration (PNNL is not one of the
>> places that lets me dink around with their machines, but then I have not
>> gotten around to making a request either...). Finally, getting some
>> numbers on both the JAC and factor ix benchmarks is a lot more helpful
>> for evaluating the machine than just looking at your system, because we
>> have data from all over the world on these two benchmarks. Then we can
>> see if you are doing something in your system that unnecessarily cuts the
>> performance you obtain.
>>
>> In general we get better per-processor performance than namd over a
>> practical range of processors; then, as you increase the processor count
>> to the point where efficiency is less than 50%, namd keeps scaling a bit
>> further and we bottleneck (this has to do mostly with our fft slab
>> distribution algorithm and should be fixed in the next release). I tend
>> to avoid getting into benchmarking wars with these other guys, though;
>> there are lots of apples-and-oranges comparisons possible that really
>> don't help anybody.
>> Best Regards - Bob
>>
>> ----- Original Message ----- From: "Vlad Cojocaru"
>> <Vlad.Cojocaru_at_eml-r.villa-bosch.de>
>> To: "AMBER list" <amber_at_scripps.edu>
>> Sent: Thursday, March 29, 2007 5:50 AM
>> Subject: AMBER: benchmarking pmemd and sander (pmemd segmentation fault
>> problem follow up)
>>
>>
>>> Dear Bob, Ros, amber community,
>>>
>>> So, as Bob suggested, it looks like the pmemd segmentation fault that I
>>> reported some days ago had something to do with the i8 and i4 versions
>>> of amber9 that the people at PNNL compiled. As soon as I changed to the
>>> i4 version the problem disappeared. I am currently trying to fix the
>>> problem for the i8 version together with the people responsible.
>>>
>>> I started a meticulous benchmark (pmemd9) of my system (65k atoms) by
>>> running 5000 steps of MD (10 ps) on 8, 16, 32, 64, 128, and 512 cpus.
>>> The first results for the total time are:
>>> 8 cpus - 775 s,
>>> 16 cpus - 463 s,
>>> 32 cpus - 277 s,
>>> 64 cpus - 402 s.
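
A quick way to see where the scaling breaks down is to turn these totals
into speedup and parallel efficiency relative to the 8-CPU run. A small
sketch in Python (not part of the original message):

    # Speedup and parallel efficiency relative to the 8-CPU run, using the
    # wall-clock totals listed above for the 5000-step benchmark.
    timings = {8: 775.0, 16: 463.0, 32: 277.0, 64: 402.0}
    base_cpus, base_time = 8, timings[8]

    for ncpus, t in sorted(timings.items()):
        speedup = base_time / t
        efficiency = speedup / (ncpus / base_cpus)
        print(f"{ncpus:3d} cpus: speedup {speedup:4.2f}, efficiency {efficiency:4.0%}")
    # Efficiency drops from ~84% at 16 cpus to ~70% at 32, and to ~24% at 64
    # (where the run is actually slower than on 32 cpus) - the DataDistrib
    # effect discussed in this thread.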
>>>
>>> Since I do not have experience with benchmarking, I was confused by the
>>> difference between 32 cpus and 64 cpus, and I noticed that the difference
>>> comes from "DataDistrib" at the end of the pmemd output (see the outputs
>>> for 32 and 64 cpus below). My question is: what does "DataDistrib"
>>> actually mean? Is this action done only once at the beginning of the
>>> simulation, and therefore independent of the number of MD steps? Could
>>> you tell me which of the actions in the output table are done only once
>>> at the beginning of the simulation and which are done at each step?
>>> (Obviously the energy terms are calculated at each step, but for instance
>>> RunMD seems to take about the same time on different numbers of CPUs.)
>>>
>>> I am asking this because I would like to use these 5000-step benchmark
>>> runs to estimate the number of ns/day for each run ... Is this actually
>>> possible?
>>>
>>> Thanks a lot for the help on this!!
>>>
>>> Best wishes
>>> vlad
>>>
>>> Output, 32 cpus (routine / seconds / % of total time):
>>> | DataDistrib 87.84 32.06
>>> | Nonbond 166.87 60.90
>>> | Bond 0.08 0.03
>>> | Angle 0.96 0.35
>>> | Dihedral 2.81 1.02
>>> | Shake 2.27 0.83
>>> | RunMD 13.09 4.78
>>> | Other 0.10 0.04
>>> | ------------------------------
>>> | Total 274.02
>>>
>>> Output, 64 cpus (routine / seconds / % of total time):
>>> | DataDistrib 306.89 77.25
>>> | Nonbond 71.37 17.96
>>> | Bond 0.03 0.01
>>> | Angle 0.47 0.12
>>> | Dihedral 1.37 0.34
>>> | Shake 1.54 0.39
>>> | RunMD 15.36 3.87
>>> | Other 0.24 0.06
>>> | ------------------------------
>>> | Total 397.27
>>>
>>>
>>>
>>
>>
>
> --
> ----------------------------------------------------------------------------
> Dr. Vlad Cojocaru
>
> EML Research gGmbH
> Schloss-Wolfsbrunnenweg 33
> 69118 Heidelberg
>
> Tel: ++49-6221-533266
> Fax: ++49-6221-533298
>
> e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
>
> http://projects.villa-bosch.de/mcm/people/cojocaru/
>
> ----------------------------------------------------------------------------
> EML Research gGmbH
> Amtgericht Mannheim / HRB 337446
> Managing Partner: Dr. h.c. Klaus Tschira
> Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
> http://www.eml-r.org
> ----------------------------------------------------------------------------
>
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu