AMBER Archive (2008)

Subject: AMBER: sander.MPI setup with SGE

From: Sasha Buzko (obuzko_at_ucla.edu)
Date: Fri Apr 04 2008 - 11:59:34 CDT


Dear Amber community,

We are trying to set up sander and (eventually) pmemd on a cluster, but
we need a little help with the final configuration pieces.

The binaries for sander.MPI and pmemd compiled fine with OpenMPI
(version 1.2.5). I tested sander.MPI against serial sander and got
identical results, so the binaries seem to be in good shape. For now, we
would like to start with the sander.MPI configuration, since pmemd has
not been tested yet.
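For that comparison, a direct mpirun invocation outside of SGE was
enough; something along these lines, where the file names are just
placeholders for the tutorial inputs:

# direct parallel run on one node, no SGE involved
/var/openmpi/bin/mpirun -np 4 $AMBERHOME/exe/sander.MPI -O \
    -i mdin -o mdout.parallel -p prmtop -c inpcrd
# then diff the final energies against the serial mdout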

On the cluster management node, I created an additional queue (sander.q)
with 5 nodes for testing purposes (4 slots on each, total of 20) and
associated it with a parallel environment patterned after an MPI
template in the SGE distribution:

pe_name mpi
slots 20
user_lists admins
xuser_lists NONE
start_proc_args /opt/n1ge6/mpi/startmpi.sh $pe_hostfile
stop_proc_args /opt/n1ge6/mpi/stopmpi.sh
allocation_rule $round_robin
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
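The PE was registered and attached to the queue with the usual qconf
calls, roughly as follows (mpi_pe.txt is just the file holding the
definition above):

# register the PE and attach it to the test queue
qconf -Ap mpi_pe.txt
qconf -mattr queue pe_list mpi sander.q
# verify
qconf -sp mpi
qconf -sq sander.q | grep pe_list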

The nodes have the Amber exe/dat contents under /var/amber with
$AMBERHOME set, as well as the OpenMPI binaries under /var/openmpi. The
same arrangement exists on the cluster management node.
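In case it matters, the environment is set in the shell startup files on
each node, roughly like this (the profile.d file name is just how we
happen to do it):

# per-node environment (e.g. in /etc/profile.d/amber.sh)
export AMBERHOME=/var/amber
export PATH=$AMBERHOME/exe:/var/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/var/openmpi/lib:$LD_LIBRARY_PATH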

At this point, the problem seems to be in the job submission script, or
maybe some piece of the configuration is missing. The script I used is
as follows (trying to run an example from one of the Amber tutorials):

#!/bin/bash
# job name
#$ -N DNA_test
#
# pe request
#$ -pe mpi 20

/var/openmpi/bin/mpirun -np 20 -machinefile $TMPDIR/machines \
    $AMBERHOME/exe/sander.MPI -O \
    -i /data/amber/polyAT_gb_md1_12Acut.in \
    -o /data/amber/polyAT_gb_md1_12Acut.out \
    -c /data/amber/polyAT_gb_init_min.rst \
    -p /data/amber/polyAT_vac.prmtop \
    -r /data/amber/polyAT_gb_md1_12Acut.rst \
    -x /data/amber/polyAT_gb_md1_12Acut.mdcrd

(The mpirun invocation is a single logical command; it is broken across
lines with backslash continuations here in case some mail clients wrap
it.)
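A variant that takes the slot count from SGE instead of hard-coding 20
would presumably use the $NSLOTS variable that SGE sets for the job,
e.g.:

# let SGE supply the slot count granted to the job
/var/openmpi/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
    $AMBERHOME/exe/sander.MPI -O ...   # same sander arguments as above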

I invoked it as:

qsub -N DNA_test -q sander.q mpi.sh

The job appeared briefly in the pending state and was then dumped into
the finished state. I couldn't find any error messages on the nodes or
on the server. The submitting user is the queue owner, so it doesn't
seem to be an account/permissions issue. Both input and output files are
accessible on all hosts (submit host, server, and nodes); the directory
is NFS-mounted.
Does anything need to be done explicitly to specify the relevant nodes,
or is that information taken from the queue configuration?
Do I need to set up passwordless ssh access from the server to the
nodes?
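For what it's worth, the only diagnostics I know to try are along these
lines (the job id is whatever qsub reports):

# while the job is pending/running: full scheduling info
qstat -j <jobid>
# after it vanishes: the accounting record, including exit_status
qacct -j <jobid>
# check whether our OpenMPI build has SGE (gridengine) support compiled in
/var/openmpi/bin/ompi_info | grep gridengine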

Any hints/suggestions or references to docs would be very much
appreciated.

Thank you

Sasha

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu