AMBER Archive (2008)
Subject: Re: AMBER: problems with Replica Exchange

From: Carlos Simmerling (carlos.simmerling_at_gmail.com)
Date: Thu Feb 28 2008 - 10:14:14 CST

Next message: Bill Ross: "Re: AMBER: Lennard-Jones parameters"
Previous message: Rita Cassia: "AMBER: Liquid/Gas Phase"
In reply to: rebeca_at_mmb.pcb.ub.es: "Re: AMBER: problems with Replica Exchange"
Next in thread: neva_at_mmb.pcb.ub.es: "AMBER: el w-end en galicia!"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I tried your input files and they work fine for me with 2 replicas under amber9.
Can you change your groupfile to use -rem 0, remove numexchg from the mdin
files, and run again? That way everything else stays the same, but the run
will not use REMD.
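
A rough sketch of how those control inputs could be generated from the REMD ones. The helper names (strip_numexchg, rem_off) and the output names (ctrl.in.NNN, groupfile.ctrl) are made up for illustration; only the rem.in.NNN and groupfile names come from your message.

```shell
#!/bin/bash
# Sketch: derive non-REMD control inputs from the REMD inputs.
# strip_numexchg, rem_off, ctrl.in.NNN and groupfile.ctrl are hypothetical names.

# Print an mdin file with the "numexchg=N," assignment removed.
strip_numexchg() {
    sed -E 's/numexchg[[:space:]]*=[[:space:]]*[0-9]+[[:space:]]*,?[[:space:]]*//' "$1"
}

# Print a groupfile with "-rem 1" switched to "-rem 0" and the
# mdin names pointed at the ctrl.in.NNN copies.
rem_off() {
    sed -E -e 's/-rem[[:space:]]+1([[:space:]]|$)/-rem 0\1/' \
           -e 's/rem\.in\./ctrl.in./' "$1"
}

# Apply to the files in the current directory, if they are present.
for f in rem.in.[0-9][0-9][0-9]; do
    [ -f "$f" ] || continue
    strip_numexchg "$f" > "ctrl.in.${f##*.}"
done
if [ -f groupfile ]; then
    rem_off groupfile > groupfile.ctrl
fi
```

The control job would then be started with -groupfile groupfile.ctrl. Since the groupfile lines keep -O and the same -o and -r names, the control run will overwrite the earlier rem.out.* and rem.r.* files unless those names are changed as well.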

Also, if you send your inpcrd and prmtop I can try with those; I used my own
system with the mdin file that you sent.

Since it's only 2 processes, you might also try running outside of the
queuing software.
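
For an interactive test, something along these lines might do. It is only a sketch: mpirun as the launcher, the SANDER variable, and the count_groups helper are assumptions about your machine, not AMBER features; the sander path is the one from your job script. It checks that the groupfile has one line per replica before launching, since a groupfile line that got wrapped would be read as several groups.

```shell
#!/bin/bash
# Sketch: run the 2-replica job directly, bypassing the batch queue.
# Assumptions: an MPI launcher called mpirun is on PATH; count_groups is a
# hypothetical helper, not part of AMBER.

SANDER=${SANDER:-/gpfs/apps/AMBER/src/9/exe/sander.MPI}
NG=2

# Count replica lines in a groupfile, skipping blank lines and '#' comments.
count_groups() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}

if [ -f groupfile ]; then
    n=$(count_groups groupfile)
    if [ "$n" -ne "$NG" ]; then
        echo "groupfile has $n replica lines, expected $NG" >&2
    elif command -v mpirun >/dev/null 2>&1 && [ -x "$SANDER" ]; then
        mpirun -np "$NG" "$SANDER" -ng "$NG" -groupfile groupfile < /dev/null
    else
        echo "mpirun or $SANDER is not available on this machine" >&2
    fi
fi
```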
carlos

On Thu, Feb 28, 2008 at 10:16 AM, <rebeca_at_mmb.pcb.ub.es> wrote:
> Thanks for your reply. I am using only 2 replicas, and only one processor for
> each replica, since it is easier to optimize the method first with only 2.
> When it works, I will add more, of course.
> I have tried your suggestion: the multisander job works fine with the same
> restart and topology but DIFFERENT inputs (those for a standard molecular
> dynamics run). So do you think it could be a problem with the inputs? I am using
> those that work for the tests, these ones:
>
> rem.in.001:
>
> Title Line
> &cntrl
> imin = 0, nstlim = 100, dt = 0.002,
> ntx = 5, tempi = 0.0, temp0 = 325.0,
> ntt = 3, tol = 0.000001, gamma_ln = 1.0,
> ntc = 2, ntf = 1, ntb = 0,
> ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
> scee = 1.2, cut = 99.0,
> ntr = 0, tautp = 0.1, offset = 0.09,
> nscm = 500, igb = 5, irest=1,
> ntave = 0, numexchg=5,
> &end
>
> rem.in.002:
>
> Title Line
> &cntrl
> imin = 0, nstlim = 100, dt = 0.002,
> ntx = 5, tempi = 0.0, temp0 = 350.0,
> ntt = 3, tol = 0.000001, gamma_ln = 1.0,
> ntc = 2, ntf = 1, ntb = 0,
> ntwx = 500, ntwe = 0, ntwr =500, ntpr = 100,
> scee = 1.2, cut = 99.0,
> ntr = 0, tautp = 0.1, offset = 0.09,
> nscm = 500, igb = 5, irest=1,
> ntave = 0, numexchg=5,
> &end
>
>
> As groupfile I use:
>
> #
> #
> -O -rem 1 -remlog rem.log -i ./rem.in.001 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.001 -inf reminfo.001 -r ./rem.r.001
> -O -rem 1 -remlog rem.log -i ./rem.in.002 -p ./1ftg_wat.top -c ./md_prod_5.r -o ./rem.out.002 -inf reminfo.002 -r ./rem.r.002
>
>
> And the script for executing the calculation is:
>
> #!/bin/bash
> # @ class = bsc_ls
> # @ job_name = test_parallel
> # @ initialdir = .
> # @ output = OUTPUT/mpi_%j.out
> # @ error = OUTPUT/mpi_%j.err
> # @ total_tasks = 2
> # @ wall_clock_limit = 00:01:00
>
> export XLFRTEOPTS="namelist=old:xrf_messages=no"
>
> srun /gpfs/apps/AMBER/src/9/exe/sander.MPI -O -ng 2 -groupfile groupfile < /dev/null
>
>
> As I told you, the restart and topology work well for a multisander job with
> standard molecular dynamics. When I try to run these inputs for Replica
> Exchange calculations, it only generates the EMPTY files rem.out.001 and
> rem.out.002, and I get this error in the error file:
>
>
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> [1] MPI Abort by user Aborting program !
> [1] Aborting program!
> srun: error: s26c2b12: task[0-1]: Exited with exit code 255
>
>
>
> The output file gives:
>
> Running multisander version of sander amber9
> Total processors = 2
> Number of groups = 2
>
> Looping over processors:
> WorldRank is the global PE rank
> NodeID is the local PE rank in current group
>
> Group = 0
> WorldRank = 0
> NodeID = 0
>
> Group = 1
> WorldRank = 1
> NodeID = 0
>
>
> Any ideas? Is something wrong with the inputs?
>
>
> Rebeca García Fandiño Ph. D.
> Parc Cientific de Barcelona
> Barcelona Spain
> rebeca_at_mmb.pcb.ub.es
>
>
>
>
>
>
> Quoting Carlos Simmerling <carlos.simmerling_at_gmail.com>:
>
> > The thing to try first is 1 processor per group. This way you
> > know that output from SHAKE errors etc. will get written to the
> > output file, which only the master process for each replica can do.
> > This is the same situation as in normal MD: if there is a problem with no
> > error message in the output, always try running on a single processor to
> > test it. You should not need anything special in the restart file from
> > sander; it can be used directly for REMD. It's hard to help more since you
> > haven't told us much about how you are doing the calculation.
> >
> > Are you using only 2 replicas?
> >
> > Does the same multisander job work fine if you just turn REMD off (but
> > otherwise use exactly the same input files)?
> >
> > On Thu, Feb 28, 2008 at 7:29 AM, <rebeca_at_mmb.pcb.ub.es> wrote:
> >> Hello,
> >> I am trying to do Replica Exchange calculations using Amber 9. When I try
> >> with the example files from the tests, it works, but when I try with my
> >> protein I have problems. Using the usual restart file from a sander
> >> calculation directly, I get errors of the type
> >>
> >> [1] MPI Abort by user Aborting program !
> >> [1] Aborting program!
> >> [0] MPI Abort by user Aborting program !
> >> [0] Aborting program!
> >> srun: error: s30c1b04: task[0-1]: Exited with exit code 255
> >>
> >> However, when I create the restart file from the trajectory file with
> >> ptraj, the calculation stops with no errors, but it stops writing at this
> >> point (in the rem.out files):
> >>
> >> ...................
> >> trajectory generated by ptraj
> >> begin time read from input coords = 0.000 ps
> >>
> >> Number of triangulated 3-point waters found: 0
> >> | Atom division among processors:
> >> | 0 2573
> >> | Running AMBER/MPI version on 1 nodes
> >>
> >> | MULTISANDER: 2 groups. 1 processors out of 2 total.
> >> ....................
> >>
> >> It creates the corresponding reminfo and rem.log files, but they are all
> >> empty. In the error file I can only see "srun: Force Terminated job".
> >>
> >> Since the same calculation works with the protein that appears in the test
> >> examples, could it be a format problem? Should I apply any special
> >> treatment to the restart file I use for the calculations?
> >>
> >> Thank you very much in advance for your help.
> >>
> >> Rebeca García Fandiño Ph. D.
> >> Parc Cientific de Barcelona
> >> Barcelona Spain
> >> rebeca_at_mmb.pcb.ub.es
> >>
> >> -----------------------------------------------------------------------
> >> The AMBER Mail Reflector
> >> To post, send mail to amber_at_scripps.edu
> >> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
> >>
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
