AMBER Archive (2004)

Subject: Re: AMBER: pmemd and mpich - myrinet

From: Robert Duke (
Date: Fri Apr 23 2004 - 12:57:48 CDT

Lubos -
Regarding the last point about mpich-p4 not putting maximal load on the
processors: this happens whenever cpu time != wall time, and cpu time !=
wall time whenever there is any blocking due to communications delays, or
anything else that causes a process to yield the cpu (like multitasking on
a single cpu).

Now, in amber6 and pmemd the logfile reports cpu times for the various
processors (I think this may have changed to wall time for sander 7/8, but
am not sure). PMEMD balances its workload in terms of wall time (because
you want all the processors to arrive at the communications events in
sync), so you will see uneven utilization in the logfile if there is any
process blocking. This can get really bad with the slow interconnects
typical of mpich-p4. It will also be bad with mpich-gm if the machine is
heavily loaded, or if strange things are going on on the nodes (at UNC
there were some weird problems with hung non-pmemd processes eating up cpu
time, I believe).

Bottom line: for these systems, your best measure of things being okay is
whether mdout reports nonsetup wall and cpu times that are close. This of
course only tells you what is happening with the master process, but it is
generally indicative of blocking problems; I always look at these two
values to be sure there were no unusual loading conditions.

On a lot of higher-end gear (think supercomputer), I think communications
usually waits using spin-locks, and when this is happening cpu time ==
wall time even though you are really getting nothing done. However, most
machines will show some divergence of cpu/wall times if things get really
bad (I think they don't spin-lock indefinitely, but I have not really
sorted out the particulars). So that should help you understand some of
the timing issues. Perfection is pretty much unobtainable in this domain.
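
[Editor's note: the two regimes described above (blocking on a slow
interconnect vs. spin-waiting) can be illustrated with a minimal Python
sketch. The 0.5 s delays are arbitrary stand-ins for a communications
stall; sleep and busy-polling are only analogies for what an MPI layer
actually does, not AMBER/PMEMD code.]

```python
import time

def measure(work):
    """Return (wall_seconds, cpu_seconds) spent inside work()."""
    w0, c0 = time.perf_counter(), time.process_time()
    work()
    return time.perf_counter() - w0, time.process_time() - c0

# 1) Blocking wait (analogous to stalling on a slow interconnect):
#    wall time keeps advancing while cpu time stands nearly still,
#    so the two diverge -- the signature Bob describes in mdout.
wall_block, cpu_block = measure(lambda: time.sleep(0.5))

# 2) Spin-wait (busy polling, as communications layers on some
#    high-end machines do): cpu time tracks wall time even though
#    no useful work is being done.
def spin(seconds=0.5):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # burn cpu cycles while "waiting"

wall_spin, cpu_spin = measure(spin)

print(f"blocking: wall={wall_block:.2f}s cpu={cpu_block:.2f}s")
print(f"spinning: wall={wall_spin:.2f}s cpu={cpu_spin:.2f}s")
```

In the blocking case wall time exceeds cpu time by roughly the time spent
blocked; in the spinning case the two stay close, which is why close
wall/cpu times alone cannot distinguish real work from a busy-wait.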
Regards - Bob

----- Original Message -----
From: "Lubos Vrbka" <>
To: <>
Sent: Friday, April 23, 2004 1:33 PM
Subject: Re: AMBER: pmemd and mpich - myrinet

> bob,
> thanks for quick reply.
> > 1) There is no machinefile for xeons with myrinet simply because I did not
> > have access to a xeons + myrinet installation. It is easy to make the
> > appropriate modifications to Machine.mpich_gm_ifc (diff Machine.mpich_gm_ifc
> > and Machine.mpich_ifc_p4, and craft up your own Machine.mpich_gm_ifc_p4)
> ok, i'll try that...
> > 2) ... Find where libgm.a is installed on your system, and set
> > MPICH_LIBDIR2 to point at it. ...
> this stuff about MPICH_LIBDIR2 was exactly the thing i needed to know...
> i wasn't sure whether it isn't used for some other purposes, but if i
> can redefine it i'll make use of it...
> > 3) By the way, this is the hardest of all the possible pmemd or amber
> > builds, in my opinion, due to the conjunction of linux, ifc, and mpich-gm
> > and the vagaries/incompatibilities that are possible (sorry, really
> > nothing I can do about it).
> well, the build proceeded fine... i want this binary mainly for tests,
> since not all nodes at clusters i use possess myrinet... sometimes i
> experience that pmemd with mpich-p4 isn't putting the maximal load on
> the processors - and i guess this could be related to slow ethernet
> communication - myrinet should tell me, whether it is really so...
> once more, thanks for your help. have a nice weekend,
> --
> Lubos
> _@_"
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to
> To unsubscribe, send "unsubscribe amber" to
