AMBER Archive (2005)

Subject: Re: AMBER: doesn't work in pmemd, and no output in parallel simulations

From: Robert Duke (rduke_at_email.unc.edu)
Date: Wed Apr 27 2005 - 07:06:46 CDT


Hwankyu -
The people to ask are your system administrators. Do the machines
support any type of parallel code running MPI? Did they run the MPI test
cases on the machines? Did they read all the information about doing
parallel builds for sander/pmemd? You still have not said anything about
the machines other than that they are "AMD", and I can tell that you are
using PBS for queueing, but just about ANYTHING can be wrong. One common
problem with public domain MPIs is that they don't get built correctly with
a matching Fortran compiler (i.e., the same compiler that is used to build
sander/pmemd). At any rate, at this level we are unlikely to be able to
help much, especially without every detail about your systems, and
even then it often takes being knowledgeable and having access to the system
to fix these sorts of problems.
Good luck - Bob Duke
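
Archive note: since "just about ANYTHING can be wrong", one cheap step is to
rule out the trivial failures inside the job script before mpirun starts.
The following is only a minimal sketch under stated assumptions, not part of
AMBER or PBS. The function names check_exe, check_inputs and raise_stack are
hypothetical. Because the job scripts quoted below request /bin/bash, the csh
"limit stacksize unlimited" line in .cshrc is not what the job shell itself
reads, so the sketch uses ulimit, as the earlier reply in this thread suggests
for Bourne-type shells.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers for a PBS job script (illustrative names,
# not part of AMBER, MPICH or PBS).

# Succeed only if $1 is an absolute path to an existing executable file.
check_exe() {
    case "$1" in
        /*) ;;
        *) echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    [ -f "$1" ] && [ -x "$1" ] || {
        echo "missing or not executable: $1" >&2
        return 1
    }
}

# Succeed only if every listed input file exists and is non-empty.
check_inputs() {
    for f in "$@"; do
        [ -s "$f" ] || { echo "missing or empty input: $f" >&2; return 1; }
    done
}

# Raise the stack limit as far as the hard limit allows; this is the
# Bourne-shell counterpart of csh's "limit stacksize unlimited".
raise_stack() {
    ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"
}
```

In a job script these would be called before mpirun, for example:

    raise_stack
    check_exe /home/leehk/exe.eth/sander || exit 1
    check_inputs md2.in g5ace90.prmtop md1.rst || exit 1

Processes that mpirun starts on other nodes may still take their limits from
the login shell's startup files, so this complements the .cshrc setting
rather than replacing it.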

----- Original Message -----
From: "Hwankyu Lee" <leehk_at_umich.edu>
To: <amber_at_scripps.edu>
Sent: Tuesday, April 26, 2005 10:46 PM
Subject: Re: AMBER: doesn't work in pmemd, and no output in parallel
simulations

> Dear Amber-users,
>
> Thanks for your advice. I tried both sander and pmemd in serial and
> parallel versions again, and found that both work in the serial version.
> With pmemd, it took much longer to write the first output file (I
> don't know why it took so long), but it eventually ran and wrote output
> files as sander did.
>
> However, both sander and pmemd still don't work for parallel simulation.
> An administrator of the AMD machines installed sander, pmemd, and their
> parallel (ethernet) versions separately (he built sander and pmemd for
> the ethernet version), and he said that all the executables passed the
> example tests. But when I tried sander and pmemd in the parallel (ethernet)
> version, they didn't write any output files (.out and .mdcrd) although the
> CPUs were running. I heard that a parallel build of sander can be run on one
> processor, but when I tried sander using "mpirun" with only a single
> processor, it didn't work. I also added "limit stacksize unlimited" to
> my .cshrc file, as suggested here.
> Could you tell me how to handle this problem? Thanks in advance for your
> help.
>
> best,
> Hwankyu.
>
>
> On Apr 26, 2005, at 5:24 PM, Robert Duke wrote:
>
>> Hwankyu -
>> Okay, I have no idea what the parallel run issues are; there is
>> insufficient information here to figure that out. You need to run the
>> amber test suite in parallel, and get that working, and then you know
>> that sander and pmemd parallel versions are properly installed.
>> Regarding the pmemd single processor run, I cannot see any cause for
>> failure; looks to me like the run may have blown up without flushing
>> buffers. The most common cause for pmemd not running when sander does
>> run on Linux systems is stacksize issues. Is this a Linux system? If so,
>> be sure there is a "limit stacksize unlimited" in your .cshrc (just
>> having it in .login is often not enough for parallel runs; if you invoke
>> things with the Bourne shell, you must use ulimit instead). Also, set ntpr
>> to something like 1 to debug a run; lots of things can happen in 500
>> steps, and you are only dumping output every 500 steps. Other than that,
>> I don't see anything. You of course do need the parallel builds of both
>> sander and pmemd (you do have two executables, built differently, for
>> each?). Also note that while the parallel build of sander can be run on
>> one processor, parallel pmemd requires a minimum of two processors (but
>> it should die with a very clear message). Anyway, check stacksize, set
>> ntpr really low, and see what you get.
>> Regards - Bob Duke
>>
>> ----- Original Message -----
>> From: "Hwankyu Lee" <leehk_at_umich.edu>
>> To: <amber_at_scripps.edu>
>> Sent: Tuesday, April 26, 2005 4:35 PM
>> Subject: AMBER: doesn't work in pmemd, and no output in parallel
>> simulations
>>
>>
>>> Dear Amber-users,
>>>
>>> I've done energy minimization of my system (~70,000 atoms, including
>>> water), and then ran MD with restraints on the solute using sander. I was
>>> then trying to simulate this system, testing pmemd and sander in both
>>> parallel and serial. Sander and pmemd passed the example tests, so there
>>> may be no problem with the installation. When I ran the simulation with
>>> sander on a single CPU, it worked nicely. But in other cases, I got two
>>> problems.
>>>
>>> 1) When I ran sander on one CPU, it worked nicely. However, when I ran
>>> sander on multiple CPUs, I saw the CPUs running but couldn't see any
>>> output files (.out and .mdcrd). I'm sure that I specified the right
>>> directory for the output files. Could you tell me why I can't see the
>>> output files in the directory (/home/leehk/run/g5ace90-par4)? This is the
>>> script file.
>>> ---------------------------
>>> #PBS -S /bin/bash
>>> #PBS -l nodes=2,walltime=500:00:00
>>> #PBS -q protein
>>> #PBS -N sand-cpu4
>>> #PBS -j oe
>>> #PBS -M leehk_at_umich.edu
>>> #PBS -m abe
>>> #
>>> echo "I ran on `hostname`"
>>> cat $PBS_NODEFILE
>>> #
>>> export GMPICONF=/home/leehk/.gmpi/$PBS_JOBID
>>> export PATH=/usr/cac/mpich.eth/bin:$PATH
>>> #
>>> cd /home/leehk/run/g5ace90-par4
>>> mpirun -np 4 -machinefile $PBS_NODEFILE \
>>>     home/leehk/exe.eth/sander -O -i md2.in \
>>>     -o md2.out -p g5ace90.prmtop -c md1.rst -r g5ace90-md0.rst \
>>>     -x g5ace90-md0.mdcrd
>>> #
>>> ---------------------------
>>>
>>> 2) When I ran pmemd on one CPU or multiple CPUs, I saw the CPUs running,
>>> but when I checked the .out file, I saw that it stopped as shown below.
>>> Since this system works with sander, I cannot understand why it didn't
>>> work with pmemd. I also attach my script file.
>>> -----------------------------
>>>
>>> -------------------------------------------------------
>>> Amber 8 SANDER Scripps/UCSF 2004
>>> -------------------------------------------------------
>>>
>>> | PMEMD implementation of SANDER, Release 8.0
>>>
>>> | Run on 04/26/2005 at 15:28:30
>>>
>>> [-O]verwriting output
>>>
>>> File Assignments:
>>> | MDIN: md2.in
>>> | MDOUT: md2.out
>>> | INPCRD: md1.rst
>>> | PARM: g5ace90.prmtop
>>> | RESTRT: g5ace90-md0.rst
>>> | REFC: refc
>>> | MDVEL: mdvel
>>> | MDEN: mden
>>> | MDCRD: g5ace90-md0.mdcrd
>>> | MDINFO: mdinfo
>>>
>>>
>>> Here is the input file:
>>>
>>> dendrimer : 2ns of MD
>>> &cntrl
>>> imin = 0, irest = 0, ntx = 7,
>>> ntb = 2, pres0 = 1.0, ntp = 1,
>>> taup = 5.0,
>>> cut = 10, ntr = 0,
>>> ntc = 2, ntf = 2,
>>> tempi = 298.0, temp0 = 298.0,
>>> ntt = 3, gamma_ln = 1.0,
>>> nstlim = 1000000, dt = 0.002,
>>> ntpr = 500, ntwx = 500, ntwr = 20000
>>> /
>>>
>>>
>>>
>>>
>>> | Largest sphere to fit in unit cell has radius = 40.001
>>>
>>> | Duplicated 0 dihedrals
>>>
>>> | Duplicated 0 dihedrals
>>>
>>> --------------------------------------------------------------------------------
>>> 1. RESOURCE USE:
>>> --------------------------------------------------------------------------------
>>>
>>> getting new box info from bottom of inpcrd
>>>
>>> NATOM = 63187 NTYPES = 11 NBONH = 60807 MBONA = 2367
>>> NTHETH = 6554 MTHETA = 2860 NPHIH = 10308 MPHIA = 3601
>>> NHPARM = 0 NPARM = 0 NNB = 105162 NRES = 19366
>>> NBONA = 2367 NTHETA = 2860 NPHIA = 3601 NUMBND = 15
>>> NUMANG = 26 NPTRA = 7 NATYP = 14 NPHB = 1
>>> IFBOX = 2 NMXRS = 635 IFCAP = 0 NEXTRA = 0
>>> NCOPY = 0
>>>
>>> | Coordinate Index Table dimensions: 17 17 17
>>> | Direct force subcell size = 5.7636 5.7636 5.7636
>>>
>>> BOX TYPE: TRUNCATED OCTAHEDRON
>>>
>>> --------------------------------------------------------------------------------
>>> 2. CONTROL DATA FOR THE RUN
>>> --------------------------------------------------------------------------------
>>>
>>> DEU
>>>
>>> General flags:
>>> imin = 0, nmropt = 0
>>>
>>> Nature and format of input:
>>> ntx = 7, irest = 0, ntrx = 1
>>>
>>> Nature and format of output:
>>> ntxo = 1, ntpr = 500, ntrx = 1, ntwr = 20000
>>> iwrap = 0, ntwx = 500, ntwv = 0, ntwe = 0
>>> ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>>>
>>> Potential function:
>>> ntf = 2, ntb = 2, igb = 0, nsnb = 25
>>> ipol = 0, gbsa = 0, iesp = 0
>>> dielc = 1.00000, cut = 10.00000, intdiel = 1.00000
>>> scnb = 2.00000, scee = 1.20000
>>>
>>> Frozen or restrained atoms:
>>> ibelly = 0, ntr = 0
>>> ------------------------------------
>>> Script file is below.
>>> -------------------------------------
>>> #PBS -S /bin/bash
>>> #PBS -l nodes=1,walltime=500:00:00
>>> #PBS -q protein
>>> #PBS -N sanpme1cpu
>>> #PBS -j oe
>>> #PBS -M leehk_at_umich.edu
>>> #PBS -m ae
>>>
>>> echo "I ran on `hostname`"
>>>
>>> cd /home/leehk/run/g5ace90-pme
>>> /home/leehk/exe.ser/pmemd -O -i md2.in -o md2.out -p g5ace90.prmtop \
>>>     -c md1.rst -r g5ace90-md0.rst -x g5ace90-md0.mdcrd &
>>> wait
>>> -------------------------------------
>>>
>>> best,
>>> Hwankyu.
>>> -----------------------------------------------------------------------
>>> The AMBER Mail Reflector
>>> To post, send mail to amber_at_scripps.edu
>>> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu