AMBER Archive (2006)
Subject: Re: AMBER: problems for running sander.MPI
From: Christophe Deprez (christophe.deprez_at_bri.nrc.ca)
Date: Mon Oct 30 2006 - 12:27:12 CST

Dear Ross,
 Thanks for your suggestions.
 I did not mention initially that our Fedora Core nodes were running
 OpenMosix (2.4.24), which we found was certainly part of the problem! We
 eventually switched from OpenMPI to MPICH2 (compiled with
 --enable-threads=single) and stopped OpenMosix on our nodes. This has
 been our most stable configuration so far.
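
 For anyone reproducing this setup, a minimal sketch of the MPICH2 build
 step follows. Only the --enable-threads=single flag comes from this
 thread; the install prefix, the run helper and the build_mpich2 function
 are illustrative assumptions. The sketch defaults to a dry run that only
 prints the commands, and it is meant to be called from the top of an
 unpacked MPICH2 source tree.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Configure, build and install MPICH2 with single-threaded support.
# PREFIX is an assumed install location, not something from the thread.
build_mpich2() {
    prefix=${PREFIX:-$HOME/mpich2}
    run ./configure --prefix="$prefix" --enable-threads=single &&
    run make &&
    run make install
}
```

 Calling build_mpich2 as is only lists the three commands; setting
 DRY_RUN=0 first executes them.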
 
 I am now testing another configuration using the latest OpenMosix 
(2.4.26) under CentOS 3.8, which looks fine too!
 
 Christophe

 Ross Walker wrote:
 > Dear Christophe,
 >
 > > This is my first experience with openmpi. Which openmpi test suite are
 > > you referring to? Where is it documented?
 >
 > I have never used openmpi myself either; I tend to use mpich2. There
 > should be some kind of test suite distributed with the source code,
 > though. Check the install docs. Typically you do something like:
 >
 >     ./configure; make; make test; make install
 >
 > It is the "make test" step that you need to look up.
 >
 > > Unfortunately, the error is not always from the same node!
 >
 > Hmm, then it could be the switch, but it could also be an issue with the
 > openmpi installation. Try downloading mpich2 and see whether that works
 > instead.
 >
 > You could also try building pmemd in $AMBERHOME/src/pmemd and then
 > testing this. If you see similar problems then it is definitely an
 > issue with the openmpi installation or the hardware.
 >
 > All the best
 > Ross
 >
 > /\
 > \/
 > |\oss Walker
 >
 >     ------------------------------------------------------------------------
 >     From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On
 >     Behalf Of Christophe Deprez
 >     Sent: Thursday, October 12, 2006 06:55
 >     To: amber_at_scripps.edu
 >     Subject: Re: AMBER: problems for running sander.MPI
 >
 >     Ross Walker wrote:
 >
 >>Hi Qizhi
 >>
 >>>[enode05:03662] mca_btl_tcp_frag_send: writev failed with errno=104
 >>>
 >>>(enode05 is one of the node names of the cluster.)
 >>>
 >>>Normally, there is no problem during minimization and constant
 >>>NVT steps.
 >>>The problems often occur during constant NPT and production runs.
 >>>
 >     Hi Ross, and thanks for your reply.
 >     I'm working as sysadmin with Qizhi to troubleshoot this issue.
 >
 >>This looks like a hardware problem to me. Unfortunately a Google search
 >>sheds little light. E.g.:
 >>http://www.open-mpi.org/community/lists/users/2006/02/0684.php
 >>
 >>Have you seen this with any other codes? Can you run the openmpi test suite
 >>successfully?
 >>
 >     This is my first experience with openmpi. Which openmpi test suite
 >     are you referring to? Where is it documented?
 >
 >>I would check to see if the error is always from the same node. If you
 >>unplug that node and use the remaining nodes, do you still see the problem?
 >>
 >     Unfortunately, the error is not always from the same node!
 >
 >>I would also try compiling with g95 instead of gfortran. While it appears
 >>that gfortran is now mature enough to compile Amber I don't know if it has
 >>been thoroughly tested. You will probably have to recompile openmpi with g95
 >>as well.
 >>
 >     I'll give this a try.
 >
 >     Thanks for your suggestions
 >
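
The quoted suggestion to check whether the error always comes from the
same node can be sketched as a small log tally. This is only a sketch: it
assumes the failures appear in the job output in the "[host:pid]" form
quoted above, and the function name tally_failing_nodes is made up for
illustration. On Linux, errno 104 is ECONNRESET (connection reset by
peer), so each matching line records a TCP connection that was dropped
while the reporting node was sending.

```shell
#!/bin/sh
# Count "writev failed" messages per node from log text on stdin.
# Assumes lines shaped like the one quoted in this thread:
#   [enode05:03662] mca_btl_tcp_frag_send: writev failed with errno=104
tally_failing_nodes() {
    awk '/mca_btl_tcp_frag_send: writev failed/ {
        host = $1
        sub(/^\[/, "", host)   # drop the leading bracket, if present
        sub(/:.*/, "", host)   # drop the ":pid]" suffix
        count[host]++
    }
    END { for (h in count) print count[h], h }' | sort -rn
}
```

Feeding it the combined output of several runs (for example
"cat run*.log | tally_failing_nodes", where the file names are
placeholders) prints one "count node" line per node, most frequent first.
An even spread across nodes points away from a single bad machine.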
 
 
-- 
Christophe Deprez                         christophe.deprez_at_bri.nrc.ca
----------------------------------------------------------------------
Institut de Recherche en Biotechnologies / Biotech. Research Institute
6100 Royalmount, Montréal (QC) H4P 2R2, Canada     Tel: (514) 496-6164 
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
 
 