AMBER Archive (2008)

Subject: Re: AMBER: problems with Replica Exchange

From: Carlos Simmerling (carlos.simmerling_at_gmail.com)
Date: Fri Feb 29 2008 - 12:33:59 CST


I downloaded your files and they work fine for me.
Try setting ntpr to a smaller value and check whether
the output starts printing sooner.
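
For illustration only: the inputs quoted below use ntpr = 100 together with nstlim = 100, so energy output is printed only once per 100-step segment. A minimal sketch of changing one value in an mdin &cntrl namelist follows. The helper name set_namelist_value and the regex approach are assumptions made for this example, not part of AMBER; editing the file by hand works just as well.

```python
import re


def set_namelist_value(mdin_text: str, key: str, value) -> str:
    """Return mdin_text with `key = <old>` replaced by `key = <value>`.

    Hypothetical helper: assumes simple `key = value,` entries, as in the
    mdin files quoted in this thread.
    """
    # \b keeps e.g. "rest" from matching inside "irest".
    pattern = re.compile(rf"\b{re.escape(key)}\s*=\s*[^,\s]+")
    new_text, count = pattern.subn(lambda m: f"{key} = {value}", mdin_text)
    if count == 0:
        raise KeyError(f"{key} not found in namelist text")
    return new_text


line = " ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,\n"
print(set_namelist_value(line, "ntpr", 10), end="")
```

Only the requested key is touched; the other entries on the line are left exactly as written.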

I will comment that this is a very large system for REMD.
I'm not aware of any articles that show successful use of REMD
for something like this. It might work, I just want to make sure
you're very experienced and know what you're doing.
carlos
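
For readers repeating the check discussed in the quoted messages below (same files, but with the groupfile set to -rem 0 and numexchg removed from the mdin files), here is a rough sketch of that edit. The function names disable_remd and remove_namelist_key are hypothetical, and the sketch assumes each replica's groupfile entry sits on a single line; it leaves -remlog and all other options untouched.

```python
import re


def disable_remd(groupfile_text: str) -> str:
    """Rewrite every `-rem N` option to `-rem 0` (does not touch -remlog)."""
    # (?<!\S) requires the option to start at a whitespace boundary.
    return re.sub(r"(?<!\S)-rem\s+\d+", "-rem 0", groupfile_text)


def remove_namelist_key(mdin_text: str, key: str) -> str:
    """Drop a `key = value,` entry from an mdin namelist (hypothetical helper)."""
    pattern = rf"\b{re.escape(key)}\s*=\s*[^,\s]+[ \t]*,?[ \t]*"
    return re.sub(pattern, "", mdin_text)


groupfile = "-O -rem 1 -remlog rem.log -i ./rem.in.001 -o ./rem.out.001\n"
mdin_tail = " ntave = 0, numexchg=5,\n &end\n"

print(disable_remd(groupfile), end="")
print(remove_namelist_key(mdin_tail, "numexchg"), end="")
```

Running both the modified and the original inputs on the same topology and restart files keeps everything else identical, which is the point of the comparison.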

On Thu, Feb 28, 2008 at 12:43 PM, <rebeca_at_mmb.pcb.ub.es> wrote:

> Thank you very much for your answer. I tried your suggestion: when I
> changed my groupfile to -rem 0 and removed numexchg from the input files,
> I got the same error as with the replica job. So it should be a problem
> with the format. However, I don't understand why a standard molecular
> dynamics run works with exactly these same top and restart files.
> I am sending you both files. It would be really helpful if you could have
> a look at them and try them yourself.
> Thank you very much for your help.
>
>
> Rebeca García Fandiño Ph. D.
> Parc Cientific de Barcelona
> Barcelona Spain
> rebeca_at_mmb.pcb.ub.es
>
>
> Quoting Carlos Simmerling <carlos.simmerling_at_gmail.com>:
>
> > I tried your input files and they work fine for me with 2 replicas
> > under amber9.
> > can you just change your groupfile to -rem 0 and the mdin files to
> > remove numexchg, and run again? that way everything is the same
> > but it will not use remd.
> >
> > also if you send your inpcrd and prmtop I can try with those, I used
> > my own system with the mdin file that you sent.
> >
> > since it's only 2 processes, you might also try running outside of the
> > queuing software.
> > carlos
> >
> >
> > On Thu, Feb 28, 2008 at 10:16 AM, <rebeca_at_mmb.pcb.ub.es> wrote:
> >> Thanks for your reply. I am using only 2 replicas, and only one
> >> processor for each replica, since it is easier to optimize the method
> >> first with only 2.
> >> When it works, I will add more, of course.
> >> I have tried your suggestion: the multisander job works fine with the
> >> same restart and topology but DIFFERENT inputs (those for a standard
> >> molecular dynamics run). So do you think it could be a problem with the
> >> inputs? I am using those that work for the tests, these ones:
> >>
> >> rem.in.001:
> >>
> >> Title Line
> >> &cntrl
> >> imin = 0, nstlim = 100, dt = 0.002,
> >> ntx = 5, tempi = 0.0, temp0 = 325.0,
> >> ntt = 3, tol = 0.000001, gamma_ln = 1.0,
> >> ntc = 2, ntf = 1, ntb = 0,
> >> ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
> >> scee = 1.2, cut = 99.0,
> >> ntr = 0, tautp = 0.1, offset = 0.09,
> >> nscm = 500, igb = 5, irest=1,
> >> ntave = 0, numexchg=5,
> >> &end
> >>
> >> rem.in.002:
> >>
> >> Title Line
> >> &cntrl
> >> imin = 0, nstlim = 100, dt = 0.002,
> >> ntx = 5, tempi = 0.0, temp0 = 350.0,
> >> ntt = 3, tol = 0.000001, gamma_ln = 1.0,
> >> ntc = 2, ntf = 1, ntb = 0,
> >> ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
> >> scee = 1.2, cut = 99.0,
> >> ntr = 0, tautp = 0.1, offset = 0.09,
> >> nscm = 500, igb = 5, irest=1,
> >> ntave = 0, numexchg=5,
> >> &end
> >>
> >>
> >> As groupfile I use:
> >>
> >> #
> >> #
> >> -O -rem 1 -remlog rem.log -i ./rem.in.001 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.001 -inf reminfo.001 -r ./rem.r.001
> >> -O -rem 1 -remlog rem.log -i ./rem.in.002 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.002 -inf reminfo.002 -r ./rem.r.002
> >>
> >>
> >> And the script for executing the calculation is:
> >>
> >> #!/bin/bash
> >> # @ class = bsc_ls
> >> # @ job_name = test_parallel
> >> # @ initialdir = .
> >> # @ output = OUTPUT/mpi_%j.out
> >> # @ error = OUTPUT/mpi_%j.err
> >> # @ total_tasks = 2
> >> # @ wall_clock_limit = 00:01:00
> >>
> >> export XLFRTEOPTS="namelist=old:xrf_messages=no"
> >>
> >> srun /gpfs/apps/AMBER/src/9/exe/sander.MPI -O -ng 2 -groupfile groupfile < /dev/null
> >>
> >>
> >> As I told you, the restart and topology work well for a multisander
> >> job with standard molecular dynamics. When I try to run these inputs
> >> for Replica Exchange calculations, it only generates the EMPTY files
> >> rem.out.001 and rem.out.002, and I get this error in the error file:
> >>
> >>
> >> [0] MPI Abort by user Aborting program !
> >> [0] Aborting program!
> >> [1] MPI Abort by user Aborting program !
> >> [1] Aborting program!
> >> srun: error: s26c2b12: task[0-1]: Exited with exit code 255
> >>
> >>
> >>
> >> The output file gives:
> >>
> >> Running multisander version of sander amber9
> >> Total processors = 2
> >> Number of groups = 2
> >>
> >> Looping over processors:
> >> WorldRank is the global PE rank
> >> NodeID is the local PE rank in current group
> >>
> >> Group = 0
> >> WorldRank = 0
> >> NodeID = 0
> >>
> >> Group = 1
> >> WorldRank = 1
> >> NodeID = 0
> >>
> >>
> >> Any idea? Something wrong with the inputs?
> >>
> >>
> >> Rebeca García Fandiño Ph. D.
> >> Parc Cientific de Barcelona
> >> Barcelona Spain
> >> rebeca_at_mmb.pcb.ub.es
> >>
> >>
> >>
> >>
> >>
> >>
> >> Quoting Carlos Simmerling <carlos.simmerling_at_gmail.com>:
> >>
> >> > the thing to try first is 1 processor per group. this way you
> >> > know that output from shake errors etc will get written to the
> >> > output file, which only the master process for each replica can do.
> >> > this is the same situation in normal MD- if there is a problem with
> >> > no error msg in the output always try to run single processor to
> >> > test it.
> >> > you should not need anything special in the restart file from
> >> > sander, it can be used directly for remd. it's hard to help more
> >> > since you haven't told us much of anything about how you are doing
> >> > the calculation.
> >> >
> >> > are you using only 2 replicas?
> >> >
> >> > does the same multisander job work fine if you just turn remd off
> >> > (but otherwise use exactly the same input files)?
> >> >
> >> > On Thu, Feb 28, 2008 at 7:29 AM, <rebeca_at_mmb.pcb.ub.es> wrote:
> >> >> Hello,
> >> >> I am trying to do Replica Exchange calculations using Amber 9. When
> >> >> I try with the files from the test examples, it works, but when I
> >> >> try with my protein I have problems. Using the usual restart file
> >> >> from a sander calculation directly, I get errors of the type
> >> >>
> >> >> [1] MPI Abort by user Aborting program !
> >> >> [1] Aborting program!
> >> >> [0] MPI Abort by user Aborting program !
> >> >> [0] Aborting program!
> >> >> srun: error: s30c1b04: task[0-1]: Exited with exit code 255
> >> >>
> >> >> However, when I create the restart file from the trajectory file
> >> >> with ptraj, the calculation stops with no errors, but it stops
> >> >> writing at this point (in the rem.out files):
> >> >>
> >> >> ...................
> >> >> trajectory generated by ptraj
> >> >> begin time read from input coords = 0.000 ps
> >> >>
> >> >> Number of triangulated 3-point waters found: 0
> >> >> | Atom division among processors:
> >> >> | 0 2573
> >> >> | Running AMBER/MPI version on 1 nodes
> >> >>
> >> >> | MULTISANDER: 2 groups. 1 processors out of 2 total.
> >> >> ....................
> >> >>
> >> >> It creates the corresponding reminfo and rem.log files, but they
> >> >> are all empty.
> >> >> In the error file I can only see "srun: Force Terminated job".
> >> >>
> >> >> Since the same calculation works with the protein from the test
> >> >> examples, could it be a problem with the format? Should I apply any
> >> >> special treatment to the restart file I use for the calculations?
> >> >>
> >> >> Thank you very much for your help, in advance.
> >> >>
> >> >> Rebeca García Fandiño Ph. D.
> >> >> Parc Cientific de Barcelona
> >> >> Barcelona Spain
> >> >> rebeca_at_mmb.pcb.ub.es
> >> >>
> >> >>
> -----------------------------------------------------------------------
> >> >> The AMBER Mail Reflector
> >> >> To post, send mail to amber_at_scripps.edu
> >> >> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
> >> >>
> >> >
> >> >
