AMBER Archive (2008)

Subject: AMBER: amber 10 and mpich2 (got eof on console error message from mpich2)

From: Vlad Cojocaru (Vlad.Cojocaru_at_eml-r.villa-bosch.de)
Date: Thu Jul 17 2008 - 08:30:24 CDT


Dear amber users,

This may not be the proper list for this question, but I searched all the
archives I could find (the mpich2 list as well) and found no answer. So I
am appealing to your experience with running MPI jobs.

As I reported before, I compiled AMBER 10 (including PMEMD) with MPICH2
(Intel compilers for both AMBER and MPICH2, no root access). I did this on
one node (named 06-01), in a local directory that is available through the
network. Everything seemed fine and the executables (both sander.MPI and
pmemd) run nicely (the parallel performance of PMEMD is also quite good),
so I was very happy. However, in the beginning I only tested on the node I
compiled on (06-01) and on one other node (06-02).

When I tried to run on a different node (05-02), I got an error:
mpiexec_node-05-02 (mpiexec 255): no msg recvd from mpd during version check

-- command used -----
${MPI_HOME}/bin/mpiexec -gdb -machinefile machines -n 4 \
${AMBERHOME}/exe/pmemd -O -i .............
-----

Trying to dissect this error, I started playing with the MPI daemons on
this node. I ran mpd and mpdtrace for diagnostics. To my surprise, mpdtrace
did not report the name of the node (as it had correctly done on 06-01 and
06-02). Instead I got "mpdtrace (mpdtrace 57): got eof on console". The
full error message (shown below) suggests a connection problem from
node-05-02 to itself. However, I can ssh (with a password) from 05-02 to
itself.
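
The self-connection symptom described above can be probed independently of mpd. The following is only a minimal sketch in Python 3 (mpd itself is a Python 2 program, and this is not its code): it opens a listening socket on an ephemeral port and then tries to connect to it under several names for the local machine. The function names `try_self_connect` and `check_local_names` are made up for illustration, and the choice of names to test (127.0.0.1, localhost, and whatever `socket.gethostname()` returns) is an assumption about what is worth checking.

```python
import socket


def try_self_connect(host, timeout=5.0):
    """Listen on an ephemeral port, then connect to that port via `host`.

    Returns (True, "ok") on success or (False, "<error text>") on failure.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("", 0))           # port 0: the kernel picks a free port
        listener.listen(5)
        port = listener.getsockname()[1]
        client.settimeout(timeout)       # avoid waiting for the full TCP timeout
        client.connect((host, port))
        return True, "ok"
    except OSError as exc:               # covers resolution errors and timeouts
        return False, str(exc)
    finally:
        client.close()
        listener.close()


def check_local_names(timeout=5.0):
    """Try the self-connection under several names for this machine."""
    names = ["127.0.0.1", "localhost", socket.gethostname()]
    return {name: try_self_connect(name, timeout) for name in names}


if __name__ == "__main__":
    for name, (ok, detail) in check_local_names().items():
        print(f"{name:<24} {'OK' if ok else 'FAILED'}  {detail}")
```

If 127.0.0.1 succeeds but the machine's own hostname fails or times out, that would point toward name resolution or packet filtering on the node rather than toward MPICH2 or AMBER, though this is a heuristic and not a definitive diagnosis.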

The nodes are AMD Opterons (05-02 has 2 dual-core CPUs, while 06-01 and
06-02 have 4 dual-core CPUs). The OS is Debian Linux. I should also say
that there are some differences in the kernel between the 05-02 node and
the 06 nodes.

Has anybody seen such behavior before? If so, and you need more details,
please let me know which ones and I will provide them.

Best wishes
vlad

-- full error message from mpdtrace -----
mpdtrace (mpdtrace 57): got eof on console
node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out
node-05-02_59965 (mpd_sockpair 233): connect error with 110 Connection timed out
node-05-02_59965 (mpd_sockpair 244): connect 22 Invalid argument
node-05-02_59965: mpd_uncaught_except_tb handling:
  socket.error: (22, 'Invalid argument')
    /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py 245 mpd_sockpair
        raise socket.error, errinfo
    /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py 802 create_single_mem_ring
        self.lhsSock,self.rhsSock = mpd_sockpair()
    /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py 848 enter_ring
        rhsHandler=rhsHandler)
    /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 250 run
        rhsHandler=self.handle_rhs_input)
    /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd 1492 ?
        mpd.run()
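
The "connect" lines in the message above carry raw errno values (110 and 22). As a small aid for reading such output, here is a hedged Python 3 sketch that pulls the fields out of one of these lines and looks the number up in the local errno table. The function name `parse_connect_error` and the field names are invented for this example, and the regular expression is only fitted to the three "connect" lines shown here, not to mpd's output in general. No meaning is assumed for the number after the underscore in "node-05-02_59965"; the whole token is kept as one field.

```python
import errno
import os
import re

# Fitted to lines such as:
#   node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out
_CONNECT_LINE = re.compile(
    r"^(?P<mpd_id>\S+) \((?P<function>\w+) (?P<line>\d+)\): "
    r"connect (?:error with )?(?P<code>\d+) (?P<message>.+)$"
)


def parse_connect_error(text):
    """Return the fields of one 'connect' error line, or None if it does not match."""
    match = _CONNECT_LINE.match(text.strip())
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "mpd_id": match.group("mpd_id"),
        "function": match.group("function"),
        "line": int(match.group("line")),
        "code": code,
        "message": match.group("message"),
        # Symbolic name and text on the machine running this script;
        # errno numbering is platform dependent.
        "symbol": errno.errorcode.get(code, "unknown"),
        "local_text": os.strerror(code),
    }


if __name__ == "__main__":
    sample = "node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out"
    print(parse_connect_error(sample))
```

Run on the same Linux node that produced the log, the symbolic name makes it easier to search for the error; on a different operating system the numbers may map to different names.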

-- 
----------------------------------------------------------------------------
Dr. Vlad Cojocaru

EML Research gGmbH Schloss-Wolfsbrunnenweg 33 69118 Heidelberg

Tel: ++49-6221-533266 Fax: ++49-6221-533298

e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de

http://projects.villa-bosch.de/mcm/people/cojocaru/

