AMBER Archive (2002)

Subject: Sander, Setup for Parallel SMP Linux Cluster

From: Jianhui Wu (wujih_at_BRI.NRC.CA)
Date: Wed May 15 2002 - 14:02:27 CDT


Dear Amber Linux Cluster users,

I am trying to compile and run the parallel version of Sander on a Linux
cluster. So far, I have only managed to run Sander (parallel version, amber7)
on a single processor. Basically, I don't know how to set up the parallel
computing environment to run a sander job on multiple processors. Can someone
give me a hand or point me to some useful instructions for my system?

Here is what I have done.

(1) Sander from amber7 was compiled using pgf77 with the machine file
Machine.pgf77_mpich (downloaded from the Amber web page); mpich-1.2.4 is
installed.
 

(2) Machines: an SMP Linux cluster of 15 dual-processor nodes (amd3-mosix)

(3) I defined the DO_PARALLEL variable as follows.

setenv DO_PARALLEL "$MPICH_HOME/bin/mpirun -np 2 -machinefile $MPICH_HOME/util/machines/machines.LINUX"

(4) The files are shared by all nodes and I can rlogin to each node
without problem.

(5) Problems:
With mpirun -np 1, the test jobs run fine.
With mpirun -np 2 or above, the sander job aborts with an error message.

For example, if I submit the job with mpirun -np 2 on apple.x.y.ca,
after defining the machine file machines.LINUX as follows,

"apple.x.y.ca" 2
"cherry.x.y.ca" 2
......

(a) I got the error message
****************************************************************************
p0_20194: p4_error: Could not gethostbyname for host "apple.x.y.ca"; may
be invalid name : 61
**************************************************************************

(b) A file named PI20114 exists after I submit the job. This file
contains
--------------------------------------------------
apple.x.y.ca 0 /home/....../amber7/exe/sander
"apple.x.y.ca" 1 /home/....../amber7/exe/sander
-------------------------------------------------
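The quotes from the machine file seem to be carried through literally into
this file and into the hostname lookup in (a). A minimal sketch, assuming
mpirun wants plain host:n entries (the form tried in (c)): the helper name
normalize_machines is hypothetical, and it only rewrites the text of the
file, it does not check that the hosts resolve.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber or MPICH): rewrite machine-file
# lines such as
#   "apple.x.y.ca" 2
# into
#   apple.x.y.ca:2
# Lines already in host:n form pass through unchanged.
normalize_machines() {
    # $1 = path to a machine file
    sed -e 's/"//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^\([^[:space:]:]\{1,\}\)[[:space:]]\{1,\}\([0-9]\{1,\}\)$/\1:\2/' \
        "$1"
}
```

Possible use: normalize_machines machines.LINUX > machines.fixed, then point
-machinefile at machines.fixed.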

(c) If I change the machine file to

apple.x.y.ca:2
cherry.x.y.ca:2
.....

I got the message:
 **************************************
Host key not found from the list of known hosts.
Are you sure you want to continue connecting (yes/no)?
****************************************************
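
This prompt looks like the remote shell asking to accept a host key that is
not yet known, which a non-interactive mpirun launch cannot answer. A minimal
sketch, assuming mpirun is using ssh as its remote shell and that logging in
once to each node by hand is acceptable: list the unique hostnames from the
machine file so that each node can be visited once. The helper name
list_hosts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the unique hostnames found in a machine file
# whose entries look like host, host:n, or "host" n.
list_hosts() {
    # $1 = path to a machine file
    sed -e 's/"//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]:].*$//' \
        "$1" | grep -v '^$' | sort -u
}

# Example (run interactively, answering "yes" once per node):
#   for h in $(list_hosts machines.LINUX); do ssh "$h" true; done
```

After each node's key has been accepted once, the prompt should no longer
appear for that node, assuming the same user account launches the job.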

I also tried running lamboot on node1-3, setting -np 2, and
running sander again. The problem was similar.

It seems I cannot even get the two processors in the
same box to work on a single Sander job. As I am new to
parallel computing, could someone give me some tips on
what I should do (which library or software to install, etc.)
in order to run a Sander job on multiple processors (I have
15 dual-processor nodes)?

Thanks a lot for your help,

Jian Hui Wu

Lady Davis Institute