AMBER Archive (2005)

Subject: RE: AMBER: Amber Performance in Parallel on Itanium

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Wed Mar 30 2005 - 10:20:14 CST


Dear Rob,

> Could this problem simply be due to our use of NFS as a way
> to share the required files?
> Should we consider distributing the data over all of the
> nodes and have amber access local files? Any help or insight
> that you can provide would be greatly appreciated.

This is indeed strange behaviour. One question, however: how often are you
writing to the mdcrd or output files? When sander runs in parallel, only one
CPU (the master) does any file I/O. It is therefore possible that the node
stuck at around 50% utilisation is spending its whole time waiting to write
data over NFS, while the other nodes sit waiting on a blocking send. Normally
this does not happen because the amount of data written over NFS is small
and infrequent. However, if you are writing to the mdcrd file frequently
and/or the NFS backbone is saturated by other machines, you could definitely
have problems. I see similar behaviour on our SGI Altix machine: writing to a
filestore that is remote from the machine during a run is the kiss of death.
In that case, however, I believe the cause is the way PBS creates virtual
machines rather than a saturated NFS backbone.
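How often the master writes is set in the &cntrl namelist of the sander
input file (mdin): ntwx controls how often frames go to mdcrd, ntpr how often
energies go to mdout, and ntwr how often restrt is rewritten. As a minimal
sketch, assuming a 2 fs time step with SHAKE (ntc = 2, ntf = 2), the
hypothetical input below writes one trajectory frame every 1000 steps, i.e.
every 2 ps. All values are illustrative assumptions, not recommendations.

```shell
# Write an illustrative sander input file (values are assumptions).
#   ntpr : steps between energy prints to mdout
#   ntwx : steps between coordinate frames written to mdcrd
#   ntwr : steps between restart-file (restrt) writes
# The namelist ends with the standard Fortran "/" terminator; some older
# AMBER input files use "&end" instead.
cat > mdin <<'EOF'
Production MD with infrequent output (illustrative values)
 &cntrl
  imin = 0, nstlim = 500000, dt = 0.002,
  ntc = 2, ntf = 2,
  ntpr = 1000, ntwx = 1000, ntwr = 10000,
 /
EOF
```

Raising ntwx is usually the single biggest lever, since mdcrd is normally
the file that grows fastest.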

You can tweak the NFS server parameters (number of NFS server threads, buffer
sizes, etc.) and also hack the code a bit to cache all writes in RAM and
write them to disk several seconds later. However, the simplest solution is
to write the output files to a disk that is local to the master node and
copy the data off afterwards. You don't need to worry about the sander
executable or the input files: these are read only once, so they can be left
on NFS shares. Only the output files (mdcrd, mdout, restrt) need to be
written to a local disk. If you are using PBS for queuing, you can use the
$PBSTMPDIR variable as the directory for the output files.

E.g.

mpirun -np 8 /nfs_share/bin/sander -O -i /nfs_share/users_dir/mdin \
  -p /nfs_share/users_dir/prmtop -c /nfs_share/users_dir/inpcrd \
  -o $PBSTMPDIR/mdout -r $PBSTMPDIR/restrt -x $PBSTMPDIR/mdcrd

Then, as the last line of your PBS script, copy the output files out of
$PBSTMPDIR and into the user's home directory:

cp $PBSTMPDIR/mdcrd /nfs_share/users_dir/

Etc.
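The staging steps above can be wrapped in two small shell functions. This is
a sketch, not a site-tested script: scratch_dir and stage_back are
hypothetical names, $PBSTMPDIR is taken from this message, and the fallback
to $TMPDIR (which many PBS/Torque installations export per job) and then
/tmp is an assumption about your site. It also assumes the job script and
MPI rank 0 (the master) run on the same node, which is the usual arrangement
but worth checking.

```shell
# Resolve a node-local scratch directory. $PBSTMPDIR is the name used in
# the message; $TMPDIR is an assumed fallback; /tmp is the last resort.
scratch_dir() {
  printf '%s\n' "${PBSTMPDIR:-${TMPDIR:-/tmp}}"
}

# Hypothetical helper: copy whichever sander output files exist from the
# scratch directory ($1) to the destination directory ($2) on the share.
# File names match the -o/-r/-x arguments in the example command.
stage_back() {
  local src=$1 dest=$2 f
  mkdir -p "$dest" || return 1
  for f in mdout restrt mdcrd; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dest/" || return 1
    fi
  done
}
```

In a job script you would set SCRATCH=$(scratch_dir), register
trap 'stage_back "$SCRATCH" /nfs_share/users_dir' EXIT before the mpirun
line, and point -o, -r and -x at $SCRATCH. Using an EXIT trap rather than a
plain last line means the copy also runs if the script stops early, for
example under set -e when sander returns an error.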

Scali's management system should have a similar mechanism for scratch
directories, but I don't know offhand what it is.

The only drawback of this approach is that you can't easily watch the output
file to check on the status of the job. There are ways around this, but they
are not very elegant. I often find that the mdcrd file is by far the
largest, so really only this file needs to be written to $PBSTMPDIR. All the
other files can be written to the NFS share with very little loss in
performance. Your mileage may vary.
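Following the suggestion to keep only the trajectory on local disk, the
hypothetical helper below prints sander's argument list with mdcrd on
scratch and mdout/restrt on the NFS share, so mdout can still be watched
(e.g. with tail -f) while the job runs. The function name and argument order
are assumptions for illustration; the flags are the ones used in the example
command.

```shell
# Hypothetical helper: print one sander argument per line, with only the
# trajectory (mdcrd) on node-local scratch ($2); inputs, mdout and restrt
# stay on the NFS share ($1).
sander_args() {
  local share=$1 scratch=$2
  printf '%s\n' -O \
    -i "$share/mdin" -p "$share/prmtop" -c "$share/inpcrd" \
    -o "$share/mdout" -r "$share/restrt" \
    -x "$scratch/mdcrd"
}
```

With bash 4 or later the list can be captured with
mapfile -t args < <(sander_args /nfs_share/users_dir "$PBSTMPDIR") and passed
as mpirun -np 8 /nfs_share/bin/sander "${args[@]}". Printing one argument per
line assumes the paths contain no newlines.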

I hope this helps.

Just my 2c...

All the best
Ross

/\
\/
|\oss Walker

| Department of Molecular Biology TPC15 |
| The Scripps Research Institute |
| Tel:- +1 858 784 8889 |
| EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk/ | PGP Key available on request |


-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu