AMBER Archive (2008)

Subject: Re: AMBER: timing info output from pmemd

From: Vlad Cojocaru (Vlad.Cojocaru_at_eml-r.villa-bosch.de)
Date: Fri Jul 11 2008 - 08:54:02 CDT


Thanks, Bob, for all the details,

mdout_flush_interval did not change anything. The problem is still there,
and I also now believe it is a compilation issue. Unfortunately, the
person who compiles software at our place comes in only once a week, so I
am starting to compile a version for my own use. I attach here the
config_amber.h that was used for the compilation of AMBER 10 and the
config.h file used for building PMEMD. If you could just glance at these
files and you see anything obviously problematic, I will let the person
who compiled them know about it.

I will give it a try with both the Intel and PGI compilers, using MPICH2
instead of OpenMPI.
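
For completeness, this is roughly how I set the flush interval in the
&cntrl namelist for these tests (all the other values below are just
placeholders taken from a normal run, so purely illustrative):

 &cntrl
   imin = 0, irest = 1, ntx = 5,
   nstlim = 50000, dt = 0.002,
   ntpr = 100, ntwx = 1000,
   ntt = 1, temp0 = 300.0,
   ntc = 2, ntf = 2, cut = 8.0,
   mdout_flush_interval = 0,
 /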

Thanks again

vlad

Robert Duke wrote:
> Presuming that you are not writing over an NFS share, the difference
> should be no more than 0-3% for a low-scaling job such as one you
> would run on an ethernet cluster. I suppose if you were dumping
> output every step, the percentage could go up. Where this flush
> interval stuff comes into play is at 32+ processors, and most
> typically 128+ processors, in instances where, because of the way the
> i/o subsystem is designed, you don't get any mdout output in a
> convenient timespan. I originally suggested playing with this
> setting, thinking that if the problem occurred at a regular interval,
> this would change that interval and identify the file open/close as
> the point of vulnerability. But since you say the timing is random, I
> am not sure what it would really tell us. If the run is less than 1
> hr, you can essentially turn the flush off, and then if the problem
> went away, that would say it is somehow linked to closing and
> reopening the file. That still doesn't fix it, though, and it is
> still likely a problem down in the file system control variables,
> which are getting stomped on by something else. This may be in user
> space, where the application could be stomping on it, but I am not
> sure (fortran has a file abstraction that it layers over the system
> file abstraction, and data structures there could get hosed, I would
> think). I still think the best bet is that we have a somehow
> incompatible combination of libraries, compilers, etc.; if there were
> some sort of buffer overrun problem in the code itself, bad things
> would be happening with pmemd runs all over the place (not
> impossible, it just seems unlikely). I think I did suggest just
> running the factor ix benchmark (or jac if you prefer), with a few
> things like nstlim and maybe the output parameters tweaked, to see
> whether this occurs in a vanilla situation or whether there is
> something unusual about the combination of things you are doing here
> (which I, at least, have never done myself).
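>
> For a quick check, something along these lines should do (the
> processor count and file names are just illustrative; adjust to your
> own install and benchmark copy):
>
>   # run the factor ix benchmark inputs (prmtop, inpcrd, mdin),
>   # with nstlim raised in a local copy of the mdin
>   mpirun -np 4 $AMBERHOME/exe/pmemd -O -i mdin -o mdout.test \
>          -p prmtop -c inpcrd
>   tail -40 mdout.test     # the timing summary is at the end of mdout
>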
> Regards - Bob
>
> ----- Original Message ----- From: "Vlad Cojocaru"
> <Vlad.Cojocaru_at_eml-r.villa-bosch.de>
> To: <amber_at_scripps.edu>
> Sent: Friday, July 11, 2008 8:33 AM
> Subject: Re: AMBER: timing info output from pmemd
>
>
>> In fact, I was trying to test whether I still get the output problem
>> described in the previous thread by setting mdout_flush_interval to 0
>> or some other value. In parallel, I wanted to see how much slower
>> pmemd is when mdout_flush_interval is set to 0, for instance. Since
>> the output problem appears, on average, only after about 25000 steps,
>> having regular output about the timing (in a similar fashion to what
>> NAMD does) would immediately tell me the difference in performance
>> between runs using different mdout_flush_interval values, without the
>> need to test that in advance.
>>
>> If the difference in performance is significant, it would be
>> pointless to run tests of at least 25000 steps to see whether the
>> output problem is still present.
>>
>> On the other hand, I am thinking of compiling a version of AMBER 10
>> for my own use, since the compilation we have here produces these
>> problems, and such regular timing output would be convenient for
>> quick tests of different builds (using different compilers). But of
>> course this can also be done by running the benchmarks.
>>
>> To summarize, this output is not really a necessary feature... I was
>> just wondering if there is an option to have it.
>>
>> Vlad
>>
>>
>>
>> Robert Duke wrote:
>>> No, and I guess I don't understand why you would want to be able to
>>> do that. Are you looking for variations in the performance of the
>>> machine, or what? In pmemd there is what is basically a
>>> parallelization log (the logfile), which is somewhat similar to the
>>> sander profile file in that it offers summary parallel performance
>>> info. It also has the ability to dump details about how FFTs are
>>> being distributed and details about workload redistribution,
>>> including just how much time each processor has spent doing what
>>> since the last workload redistribution. This is intended for
>>> working on parallel performance problems, and the higher dumping
>>> levels may not even be documented (the namelist variable is
>>> loadbal_verbose in &cntrl; the default is 0, 1 gives a bit of
>>> additional info, and by 3 you are getting a whole lot of detail).
>>> This may not be what you want, but it is what I use to debug
>>> parallel performance problems.
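>>>
>>> For illustration, that just means adding one line to the existing
>>> &cntrl block of the mdin (the value 1 here is only an example):
>>>
>>>    loadbal_verbose = 1,
>>>
>>> and then looking at the logfile after the run.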
>>> Regards - Bob Duke
>>>
>>> ----- Original Message ----- From: "Vlad Cojocaru"
>>> <Vlad.Cojocaru_at_eml-r.villa-bosch.de>
>>> To: "AMBER list" <amber_at_scripps.edu>
>>> Sent: Friday, July 11, 2008 4:49 AM
>>> Subject: AMBER: timing info output from pmemd
>>>
>>>
>>>> Dear Bob, amber users,
>>>>
>>>> Is there a way to print timing info (time per MD step) at regular
>>>> intervals in pmemd (and/or sander)?
>>>>
>>>> Vlad
>>>>

-- 
----------------------------------------------------------------------------
Dr. Vlad Cojocaru

EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg

Tel: ++49-6221-533266
Fax: ++49-6221-533298

e-mail: Vlad.Cojocaru[at]eml-r.villa-bosch.de

http://projects.villa-bosch.de/mcm/people/cojocaru/

----------------------------------------------------------------------------
EML Research gGmbH
Amtgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
http://www.eml-r.org
----------------------------------------------------------------------------

config.h used for building pmemd:

MATH_DEFINES =
MATH_LIBS =
FFT_DEFINES = -DPUBFFT
FFT_INCLUDE =
FFT_LIBS =
NETCDF_HOME = /sw/mcm/app/amber/10/amd64/ompi-1.2.5/pgi-7.1/amber10/src/netcdf
NETCDF_DEFINES = -DBINTRAJ
NETCDF_MOD = netcdf.mod
NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
MPI_HOME = /sw/mcm/app/openmpi/1.2/5/dynamic/amd64/pgi-7.1/
MPI_DEFINES =
MPI_INCLUDE = -I$(MPI_HOME)/include
MPI_LIBDIR = $(MPI_HOME)/lib
MPI_LIBS = -L$(MPI_LIBDIR)
DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
CPP = /lib/cpp
CPPFLAGS = -traditional -P
F90_DEFINES = -DFFTLOADBAL_2PROC

F90 = mpif90
MODULE_SUFFIX = mod
F90FLAGS = -c
F90_OPT_DBG = -g
F90_OPT_LO = -fastsse -O1
F90_OPT_MED = -fastsse -O2
F90_OPT_HI = -fastsse -O3
F90_OPT_DFLT = $(F90_OPT_HI)

CC = pgcc
CFLAGS = -fastsse -O3

LOAD = mpif90
LOADFLAGS =
LOADLIBS =



-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo_at_scripps.edu