AMBER Archive (2002)
Subject: Sander, Setup for Parallel SMP Linux Cluster
From: Jianhui Wu (wujih_at_BRI.NRC.CA)
Date: Wed May 15 2002 - 14:02:27 CDT
Dear Amber Linux Cluster users,
I am trying to compile and run the parallel version of Sander on a Linux
cluster. So far, I can only run Sander (parallel version, amber7) on a
single processor. Basically, I don't know how to set up the parallel
computing environment to run a Sander job on multiple processors. Can
someone give me a hand or point me to useful instructions for my system?
Here is what I have done.
(1) Sander of amber7 was compiled using pgf77 with the machine file
Machine.pgf77_mpich (downloaded from the Amber web page); mpich-1.2.4 is installed.
(2) Machines: a Linux cluster of 15 dual-processor SMP nodes (amd3-mosix)
(3) I define the DO_PARALLEL variable as follows.
setenv DO_PARALLEL "$MPICH_HOME/bin/mpirun -np 2 -machinefile $MPICH_HOME/util/machines/machines.LINUX"
(4) The files are shared by all nodes and I can rlogin to each node
without problem.
(5) Problems:
With mpirun -np 1, the test jobs run fine.
With mpirun -np 2 or above, the sander job aborts with an error message.
For example, I submit the job with mpirun -np 2 at apple.x.y.ca,
after defining the machine file machines.LINUX as follows:
"apple.x.y.ca" 2
"cherry.x.y.ca" 2
......
(a) I get the error message:
****************************************************************************
p0_20194: p4_error: Could not gethostbyname for host "apple.x.y.ca"; may
be invalid name : 61
**************************************************************************
(b) A file named PI20114 exists after I submit the job. This file
contains:
--------------------------------------------------
apple.x.y.ca 0 /home/....../amber7/exe/sander
"apple.x.y.ca" 1 /home/....../amber7/exe/sander
-------------------------------------------------
(c) If I change the machine file to
apple.x.y.ca:2
cherry.x.y.ca:2
.....
I get the message:
**************************************
Host key not found from the list of known hosts.
Are you sure you want to continue connecting (yes/no)?
****************************************************
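The two symptoms in (a) to (c) can be narrowed down with a small shell sketch. It rests on two assumptions that nothing in this message confirms: that the double quotes in the first machine file are read literally as part of the host name (which would be consistent with the gethostbyname failure and with the quoted entry in the PI file), and that the host-key prompt appears because mpirun starts the remote processes through ssh. The function names normalize_machines and list_hosts are made up for illustration and are not part of MPICH or Amber.

```shell
#!/bin/sh
# Illustrative sketch only; both function names are hypothetical.

# Rewrite lines such as `"apple.x.y.ca" 2` into `apple.x.y.ca:2`.
# Lines already in host:N form pass through unchanged.
normalize_machines() {
    awk '
        NF == 0 { next }                      # skip blank lines
        {
            host = $1
            gsub(/"/, "", host)               # drop literal double quotes
            if (NF >= 2) print host ":" $2    # "host" N  ->  host:N
            else         print host           # single field: keep as is
        }
    ' "$1"
}

# Print only the host names from a machine file in host:N form.
list_hosts() {
    cut -d: -f1 "$1"
}
```

A possible use would be to write the cleaned file with `normalize_machines machines.LINUX > machines.fixed`, then log in by hand once to every host printed by `list_hosts machines.fixed` and answer "yes" to the prompt, so that each host key is recorded before mpirun is tried again. Whether this removes the prompt depends on the ssh assumption stated above.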
I also tried running lamboot on node1-3, setting -np 2, and
running sander again, with a similar problem.
It seems I cannot even get the two processors in the
same box to work on a single Sander job. As I am new to
parallel computing, could someone give me some tips on
what I should do (which library or software to install)
in order to run a Sander job on multiple processors (I have
15 dual-processor nodes)?
Thanks a lot for your help,
Jian Hui Wu
Lady Davis Institute