AMBER Archive (2008)
Subject: AMBER: amber 10 and mpich2 (got eof on console error message from mpich2)
From: Vlad Cojocaru (Vlad.Cojocaru_at_eml-r.villa-bosch.de)
Date: Thu Jul 17 2008 - 08:30:24 CDT
 
 
 
 
Dear AMBER users,
Maybe this is not the proper list to ask about this, but I searched all
the archives I could find (the mpich2 list as well) and found no answer.
So I am appealing to your experience with running MPI jobs.
 
As I reported before, I compiled AMBER 10 (including PMEMD) with MPICH2
(Intel compilers for both AMBER and MPICH2, no root access). I did this
on one node (named 06-01), in a local directory that is available
through the network. Everything seemed fine: the executables (both
sander.MPI and pmemd) run nicely, and the parallel performance of PMEMD
is quite good, so I was very happy. However, in the beginning I only
tested on the node I compiled on (06-01) and on one other node (06-02).
 
 When I tried to run on a different node (05-02), I got an error: 
mpiexec_node-05-02 (mpiexec 255): no msg recvd from mpd during version check
 
 ---------------------------- command used ----------------------------
 ${MPI_HOME}/bin/mpiexec -gdb -machinefile machines -n 4 \
 ${AMBERHOME}/exe/pmemd -O -i .............
 ----------------------------------------------------------------------
 
Trying to dissect this error, I started playing with the MPI daemons on
this node. I ran mpd and mpdtrace for diagnostics. To my surprise,
mpdtrace did not report the name of the node (as it correctly did
previously on 06-01 and 06-02). Instead I got "mpdtrace (mpdtrace 57):
got eof on console". The full error message (shown below) suggests a
connection problem from node-05-02 to itself. However, I can ssh (with
a password) from 05-02 to itself.
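
To check the "node cannot reach itself" suspicion outside of mpd, something
like the following could be run on 05-02. It is only a rough sketch, not the
mpd code: it assumes the problem is in how the node resolves or reaches its
own addresses over TCP, and the function name check_self_connect and the list
of names tried are made up for this illustration.

```python
import socket


def check_self_connect(host, timeout=5.0):
    """Listen on an ephemeral TCP port, then connect to it through `host`.

    Returns (True, resolved_address) on success, or (False, error_text)
    if the name cannot be resolved or the connection fails.
    This is a hypothetical helper, not part of MPICH2.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Bind to all interfaces and let the kernel pick a free port.
        listener.bind(("", 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        # Step 1: can the name be resolved at all?
        try:
            addr = socket.gethostbyname(host)
        except socket.error as exc:
            return False, "cannot resolve %s: %s" % (host, exc)

        # Step 2: can this machine open a TCP connection to that address?
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(timeout)
        try:
            client.connect((addr, port))
        except socket.error as exc:
            return False, "connect to %s:%d failed: %s" % (addr, port, exc)
        finally:
            client.close()
        return True, addr
    finally:
        listener.close()


if __name__ == "__main__":
    # Try the loopback address, the loopback name, and the node's own name.
    for name in ("127.0.0.1", "localhost", socket.gethostname()):
        ok, detail = check_self_connect(name)
        print("%-20s %-6s %s" % (name, "OK" if ok else "FAILED", detail))
```

If one of these names fails or times out on 05-02 but works on 06-01, that
would point at name resolution (e.g. /etc/hosts) or packet filtering on
05-02 rather than at AMBER or the MPICH2 build itself.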
 
The nodes are AMD Opterons (05-02 has 2 dual-core CPUs, while 06-01 and
06-02 have 4 dual-core CPUs each). OS = Debian Linux. I should also say
that there are some differences in the kernel between the 05-02 node
and the 06 nodes.
 
Has anybody seen such behavior before? If so, and you need more details,
please let me know which ones and I will provide them.
 
 Best wishes
vlad
 
 --full error message from mpdtrace -----
mpdtrace (mpdtrace 57): got eof on console
 node-05-02_59965 (mpd_sockpair 226): connect 110 Connection timed out
 node-05-02_59965 (mpd_sockpair 233): connect error with 110 Connection timed out
 node-05-02_59965 (mpd_sockpair 244): connect 22 Invalid argument
 node-05-02_59965: mpd_uncaught_except_tb handling:
 socket.error: (22, 'Invalid argument')
 
 /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  245  mpd_sockpair
     raise socket.error, errinfo
 /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  802  create_single_mem_ring
     self.lhsSock,self.rhsSock = mpd_sockpair()
 /scratch/node-06-01/cojocavd/Software/mpich2-1.0.7-install/bin/mpdlib.py  848  enter_ring
     rhsHandler=rhsHandler)
 /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  250  run
     rhsHandler=self.handle_rhs_input)
 /scratch/node-06-01/cojocavd/Software/mpich2/bin/mpd  1492  ?
     mpd.run()
 
 
-- 
----------------------------------------------------------------------------
Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
 Tel: ++49-6221-533266
Fax: ++49-6221-533298
 e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
 http://projects.villa-bosch.de/mcm/people/cojocaru/
 ----------------------------------------------------------------------------
EML Research gGmbH
Amtgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
http://www.eml-r.org
----------------------------------------------------------------------------
 -----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
      to majordomo_at_scripps.edu