AMBER Archive (2005)
Subject: Re: AMBER: replica exchange trouble with unit 6
From: Yuuki Komata (komata_at_glyco.sci.hokudai.ac.jp)
Date: Thu Jan 27 2005 - 19:27:50 CST
Hello Dr. Carlos Simmerling,
Thank you for your reply! My machine has a /home partition that is shared among
three machines via NFS, and I tried running REM both under /home/ and under another
partition. Both attempts failed with the same error.
Yes, all nodes share the disk I am writing to, because this is not a cluster (though
I do not know for certain that every node CAN write to that disk). We use NIS to share
account information between the machines that share the /home/ partition, and one thing
worries me: when I type 'echo $HOME', I get /home/ME, but the command rsh MYMACHINE
'echo $HOME' returns /usr/usres/ME. I suspect MPI does not like such a situation and
cannot find $PWD and the .out files. Am I correct?
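To see whether every node really gets the same $HOME and can write to the run directory,
I was thinking of a check like the following (the host names and the test directory are
just placeholders for my setup):

    # csh sketch: compare $HOME and test write access on each node
    foreach h (node1 node2 node3)
        echo "--- $h ---"
        rsh $h 'echo $HOME; cd /home/ME/rem_test && pwd && touch .write_test && rm .write_test'
    end

If the reported $HOME or the writable path differs between nodes, I suppose that could
explain the unit 6 OPEN failures.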
For example, when I setenv DO_PARALLEL to "mpirun -np 4" and run the rem_vac test
program, rem.out.000 and 003 contain the RMS FLUCTUATION field (near the end of the
output file) while rem.out.001 and 002 do not; those two stop just before the RESULTS
section is written.
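(A quick way to see which replicas reached that final section is to list the output
files that contain the heading, e.g.:

    grep -l 'RMS FLUCTUATION' rem.out.00*
)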
It seems that the different MPI processes have different $PWD information...
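If the working directory really is the problem, I am thinking of forcing it with a small
wrapper script and pointing the test at that instead of the sander binary directly. This
is only a sketch of the idea; the directory below is a placeholder for my test case:

    #!/bin/csh -f
    # wrapper.csh: make every MPI process start in the same directory
    cd /home/ME/rem_test
    exec $AMBERHOME/exe/sander $*

Would that be a reasonable workaround, or is there a cleaner way to tell MPI which
working directory to use?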
Thank you very much.
Yuuki Komata
What kind of machine is it? Are you writing to a disk that is shared
by all nodes?
===================================================================
Carlos L. Simmerling, Ph.D.
Associate Professor Phone: (631) 632-1336
Center for Structural Biology Fax: (631) 632-1555
Stony Brook University Web: http://comp.chem.sunysb.edu/carlos
Stony Brook, NY 11794-5115 E-mail: carlos.simmerling_at_stonybrook.edu
===================================================================
Yuuki Komata wrote:
>Hello ambers,
>
> I am trying replica exchange (Multisander REM) with amber8 on SunOS 5.9.
>I succeeded in installing it with the command 'make parallel' after adding '-DREM' to
>AMBERBUILDFLAGS. The installation went fine.
>
> But when I downloaded the replica exchange test suite from the AMBER web site
>to test the program, it aborts with a core dump and the following messages:
>
> WorldRank = 5
> NodeID = 1
>
> WorldRank = 7
> NodeID = 1
>
> WorldRank = 1
> NodeID = 1
>
> WorldRank = 3
> NodeID = 1
>
>
> Unit 6 Error on OPEN: ./rem.out.000
>
>
> RUNNING MULTISANDER VERSION OF SANDER AMBER8
> Total processors = 8
> Number of groups = 4
>
> Looping over processors:
> WorldRank is the global PE rank
> NodeID is the local PE rank in current group
>
> Group = 0
> WorldRank = 0
> NodeID = 0
>
>Fatal error, aborting.
>
> Unit 6 Error on OPEN: ./rem.out.003
>
> Group = 3
> WorldRank = 6
> NodeID = 0
>
>Fatal error, aborting.
>Job cre.2427 on shikotsu-sv: received signal ABRT.
> ./Run.rem: Program error
>
>
>This is just the same problem that Jordi Rodrigo and Gilles Marcou reported
>on Jan 17, 2005 on IBM SP4/AIX. It seemed to be solved by modifying
>mdread.f and mdfil.f, but that change made no difference on my SunOS.
>
> I checked all the comments on the Jordi Rodrigo / Gilles Marcou case
>and found that none of them solved my problem. I want to know what the
>problem is. I would appreciate any suggestions.
>
>Yuuki Komata
Yuuki A. Komata, Ph. D. Eng.
komata_at_glyco.sci.hokudai.ac.jp
Fax & Phone : +81-11-706-9038
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu