AMBER Archive (2007)

Subject: RE: AMBER: Slow Processor Loads when Using PMEMD

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Mon Jul 02 2007 - 20:42:01 CDT


Hi Jonathan,
 
This is really weird. If all you did was change irest=0 to 1 and ntx=1 to 5,
then this makes no sense at all: you should see exactly the same performance
as you saw with irest=0 and ntx=1. That said, at low temperatures you might
see better performance due to less frequent list builds, but unless you have
some kind of NMR restraints I would expect your system to heat up pretty
quickly to temp0. The exception is if you are running NVE (ntb=1, ntt=0), in
which case this would make sense, since your temperature would then remain
at zero and you would never do a list build.
 
When you set irest=0 and ntx=1 and see the reasonable performance, do you use
the new restrt file as the inpcrd file (obviously just ignoring the
velocities), and what do you have the initial temperature set to?
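
One quick way to confirm that irest and ntx really are the only differences
is to compare the two input files setting by setting. The sketch below
assumes ordinary namelist-style input with comma-separated settings; the
function names and the file name first_run.in are made up for illustration
and are not part of AMBER.

```bash
#!/bin/bash
# Hypothetical helper: print one setting per line, whitespace removed, sorted,
# so that two input files can be compared regardless of their layout.
normalize_mdin() {
    tr ',' '\n' < "$1" | tr -d ' \t' | grep -v '^$' | sort
}

# Hypothetical helper: show settings that differ between two input files.
# Returns diff's exit status (0 = identical, 1 = different).
mdin_diff() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    normalize_mdin "$1" > "$tmp_a"
    normalize_mdin "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Example (file names are assumptions):
#   mdin_diff first_run.in production.in
```

If anything other than irest and ntx shows up in the output (tempi, ntt, ntb
and so on), that is worth looking at first.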
 
Gigabit ethernet is notoriously bad for performance, but you really should
not see any variation between runs based solely on whether it is a restart
or not. It would either both be bad or both be good.
 
A couple of things to investigate.
 
Try running the following two simulations starting from your initial
structure and the initial parameters.
 
1) 50ps straight up from the inpcrd file with irest=0 and ntx=1
 
2) 25ps straight up from the inpcrd file with irest=0 and ntx=1
  followed by a further 25ps from the restart file with irest=1 and ntx=5.
 
These should take approximately the same total time.
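
A rough sketch of how the two tests could be timed from the job script
follows. It assumes a 2 fs time step (dt=0.002), which the thread does not
state, so adjust the step counts to your own dt. The helper names and the
input file names run50.in, run25a.in and run25b.in are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: number of MD steps (nstlim) for a run of PS picoseconds
# at a time step of DT picoseconds, rounded to the nearest integer.
steps_for() {
    awk -v ps="$1" -v dt="$2" 'BEGIN { printf "%d\n", ps / dt + 0.5 }'
}

# Hypothetical helper: run a command and print the elapsed wall-clock time in
# whole seconds. The command's own stdout is sent to stderr so that only the
# elapsed time is captured by $(...).
time_run() {
    t_start=$(date +%s)
    "$@" >&2
    t_end=$(date +%s)
    echo $((t_end - t_start))
}

# Example usage inside the job script (input file names are assumptions):
#   t1=$(time_run $MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines $MDRUN -O -i run50.in  ...)
#   t2a=$(time_run $MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines $MDRUN -O -i run25a.in ...)
#   t2b=$(time_run $MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines $MDRUN -O -i run25b.in ...)
#   echo "test 1: ${t1}s   test 2: $((t2a + t2b))s"
```

With dt=0.002, steps_for 50 0.002 gives nstlim for the 50ps run and
steps_for 25 0.002 gives nstlim for each 25ps half.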
 
Another thing to check would be the locality of the nodes you get. Often
when people build large clusters with gigabit ethernet they just chain a
load of smaller gigabit switches together. Hence if you get all 16 nodes
allocated on one switch then you have 1 gigabit available between all nodes.
However, if they linked two switches together with just a single gigabit
crossover cable and you get 8 nodes on each switch, then you have a single
1 Gbit bottleneck between the two banks of 8 nodes, so for all-to-all
communication you have an effective bandwidth of only around 0.125 Gbit per
node. So check which nodes you get allocated for the two runs: are they
always the same, and/or can you force the queuing system to give you switch
locality?
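
As a sketch of that check, the helpers below summarize the machines file
that your job script already prints and redo the bandwidth arithmetic. They
assume the machines file lists one hostname per line per slot; the function
names are made up for illustration. Which hosts sit on which switch is
site-specific and has to come from your cluster administrators.

```bash
#!/bin/bash
# Hypothetical helper: print "hostname slot_count" for each host listed in a
# machines file (assumed format: one hostname per line, repeated per slot).
summarize_nodes() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Hypothetical helper: per-node share (in Gbit) of an inter-switch link of
# LINK Gbit shared by N nodes on one side of it.
per_node_gbit() {
    awk -v link="$1" -v n="$2" 'BEGIN { printf "%.3f\n", link / n }'
}

# Example usage inside the job script:
#   summarize_nodes "$TMPDIR/machines"
#   per_node_gbit 1 8
```

Saving the summarize_nodes output from a fast run and a slow run and
comparing the two lists shows directly whether the allocations differed.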
 
Good luck...
 
Ross
 

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

 

  _____

From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of
Jonathan Suever
Sent: Monday, July 02, 2007 17:29
To: amber_at_scripps.edu
Subject: AMBER: Slow Processor Loads when Using PMEMD

I am currently running a simulation for a total of 10 ns. I have previously
run the simulation up to 5 ns and now would like to submit another job in
order to continue running the simulation for the remaining 5 ns. To perform
these calculations, I am using PMEMD installed on a cluster and I am
utilizing 16 processors.

I made a few changes to the input file for the second portion. These
involve changing the following values:

ntx = 5 ## I use this in order to read in the formatted velocity information from the first job
irest = 1 ## This has to be set to 1 in order for the velocities to be read in

When I submit this job to the cluster, it runs fine with no errors and shows
that all 16 processors are currently being used. However, when the detailed
status is viewed, the highest load placed on any of the processors is around
0.20, resulting in very slow calculation times.

When I change the ntx and irest values of the input file back to 1 and 0 as
used in the first run, the load on the processors returns to a normal value.

I was basically wondering if anyone has experienced this same problem when
running pmemd on a cluster and attempting to use velocity information from a
previous run.

The only other changes that I made were to the shell script used to run the
job in order that my existing files were not overwritten during the process.
Also, I set the input coordinate file to the output coordinate file from the
previous simulation. Below is the shell script I use to execute the job
(almost entirely the same as the first run):

#!/bin/bash
#$ -S /bin/bash
#$ -m e
#$ -cwd
#$ -p 20
#$ -j y
#$ -N complex_pmemd
#$ -M *******@***.***
# Resource limits: number of CPUs to use
#$ -pe mpi 16
#$ -v MPIR_HOME=/opt/mpich/intel
#$ -v P4_RSHCOMMAND=ssh
#$ -v MPICH_PROCESS_GROUP=no
#$ -v CONV_RSH=ssh
## Prepare nodelist file for mdrun ...
#
echo "#####################################################################################"
echo " STARTED AT: $(date)"
echo ""
echo "NSLOTS: $NSLOTS"
echo "TMPDIR: $TMPDIR"
echo "$TMPDIR/machines file contains"
cat $TMPDIR/machines
#$ -V
export MPI_HOME=/opt/mpich/intel
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH
export AMBER=/ibrixfs/apps/amber/intel/amber-9-64-mpich
export AMBERHOME=/ibrixfs/apps/amber/intel/amber-9-64-mpich
export PATH=$MPI_HOME/bin:$AMBER/exe:$PATH

MPIRUN=${MPI_HOME}/bin/mpirun
MDRUN=${AMBER}/exe/pmemd

export MYFILE=production

$MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines $MDRUN -O -i $MYFILE.in \
    -o $MYFILE.out -p topology.top -c first_run.crd -r $MYFILE.crd \
    -x $MYFILE.mdcrd -inf $MYFILE.edr

Any help with this matter would be greatly appreciated. Thank you very
much.

-Jonathan Suever
Undergraduate Researcher
University of Alabama at Birmingham

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu