AMBER Archive (2004)
Subject: Re: AMBER: pmemd and mpich - myrinet
From: Robert Duke (rduke_at_email.unc.edu) 
Date: Sat Apr 24 2004 - 07:31:59 CDT

Lubos -
 
Hard to say what it is, but I would suspect some system incompatibility
rather than a problem with the Pentium-specific options.  It is easy enough
to test: build pmemd with the shipped Machine.mpich_gm_ifc (with the
environment variables fixed up) and see what that does.  Look carefully at
your makefile outputs.  I also recommend always building the
single-processor version of pmemd first, since you will pick up a lot of
problems with the compile process without the added noise of the MPI stuff
(I don't think that will help much here, but I would always go that route
on a machine I had any doubts about).  My gut feeling is that perhaps the
compiler or the mpich-gm software differs between the installations, and
there are compatibility problems there.  I believe there are required
patches for mpich-gm if you are using the Intel Fortran compiler (get the
system administrators and the Myrinet people involved to confirm the status
of the mpich-gm software).
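The diff-and-craft step mentioned earlier in the thread can be sketched as
follows.  This is only a minimal illustration: the real Machine.* files ship
in the AMBER 8 pmemd source tree, and the two stand-in files created below
are invented so the example is self-contained, not the actual shipped
machine files.

```shell
# Sketch of hand-crafting a Machine.mpich_gm_ifc_p4, as suggested in the
# thread.  File names follow the AMBER 8 pmemd convention; the contents
# here are invented stand-ins, not the real machine files.
set -e
demo=$(mktemp -d)
cd "$demo"

cat > Machine.mpich_ifc <<'EOF'
setenv MACHINE LINUX_INTEL
setenv L2 "ifc -c -auto -tpp7 -mp1 -ip -O3"
EOF

cat > Machine.mpich_ifc_p4 <<'EOF'
setenv MACHINE LINUX_INTEL_P4
setenv L2 "ifc -c -auto -tpp7 -xW -mp1 -ip -O3"
EOF

# 1) See exactly what the P4 variant changes relative to the plain ifc file
#    (diff exits nonzero when the files differ, so guard it under set -e):
diff Machine.mpich_ifc Machine.mpich_ifc_p4 || true

# 2) Start the new gm+p4 file from the P4 variant, then fold the gm-specific
#    settings (MPICH_HOME, GM_HOME, the -lgm link line, etc.) in by hand,
#    as in the machinefile posted below:
cp Machine.mpich_ifc_p4 Machine.mpich_gm_ifc_p4
grep MACHINE Machine.mpich_gm_ifc_p4
```

The point of step 1 is that the P4 delta is small (machine name and the
`-xW` vectorization flags), so the same delta can be applied to the gm file
by inspection.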
 
Also, if you have not done so already, confirm that Myrinet is working NOW
by running your old pmemd, or whatever you have that used to work.  I have
seen the kind of messages you have here when Myrinet gets hosed.  I don't
administer the cluster, so I don't know the details or what you can do
about it.  Good luck.
 
Regards - Bob Duke
 
----- Original Message -----
From: "Lubos Vrbka" <shnek_at_tiscali.cz>
To: <amber_at_scripps.edu>
Sent: Saturday, April 24, 2004 6:50 AM
Subject: Re: AMBER: pmemd and mpich - myrinet
 
> bob,
>
> > 1) There is no machinefile for xeons with myrinet simply because I did not
> > have access to a xeons + myrinet installation.  It is easy to make the
> > appropriate modifications to Machine.mpich_gm_ifc (diff Machine.mpich_ifc
> > and Machine.mpich_ifc_p4, and craft up your own Machine.mpich_gm_ifc_p4)
> i tried to create my own Machine.mpich_gm_ifc_p4 according to your
> instructions
>
> ...
>
 
> setenv MPICH_HOME /software/mpich-1.2.5/build/LINUX/ch_gm_intel
> setenv GM_HOME /packages/run/gm-1.6.3
>
> ...
>
> setenv MPICH_INCLUDE $MPICH_HOME/include
> setenv MPICH_LIBDIR $MPICH_HOME/lib
> setenv MPICH_LIBDIR2 $GM_HOME/lib
>
> setenv MACHINE LINUX_INTEL_P4
> setenv MACH Linux
> setenv MACHINEFLAGS "-DREGNML -DMPI -DNO_MPI_BUFFER_ALIASING -DINTEL_P4_VECT_OPT"
> setenv FFLAGS "-DSHORT=INTEGER(2) -DLOGICAL_BYTE=LOGICAL(1)"
> setenv MACHINEFLAGS "$MACHINEFLAGS $FFLAGS"
>
> # CPP is the cpp for this machine
> setenv CPP "/lib/cpp -traditional -I$MPICH_INCLUDE"
> setenv CC "gcc"
> setenv LOADCC "gcc"
>
> # SYSDIR is the name of the system-specific source directory relative to src/*/
> setenv SYSDIR Machines/intel
>
> # COMPILER ALIASES:
>
> # little or no optimization:
> setenv L0 "ifc -c -auto -tpp7 -mp1 -O0"
>
> # modest optimization (local scalar):
> setenv L1 "ifc -c -auto -tpp7 -mp1 -O2"
>
> # high scalar optimization (but not vectorization):
> setenv L2 "ifc -c -auto -tpp7 -xW -mp1 -ip -O3"
>
> # high optimization (may be vectorization, not parallelization):
> setenv L3 "ifc -c -auto -tpp7 -xW -mp1 -ip -O3"
>
> # LOADER/LINKER:
> setenv LOAD "ifc -static"
> setenv LOADLIB "-limf -lsvml -lPEPCF90 -L$MPICH_LIBDIR -lmpich -L$MPICH_LIBDIR2 -lgm -lpthread"
>
> # ranlib, if it exists
> setenv RANLIB ranlib
 
>
 
> pmemd builds fine. all 4 copies of the job (i'm running on 2 nodes of
> dual xeon machines) are spawned correctly, but then:
 
>
 
> All 4 tasks started.
> read_gm_startup_ports: waiting for info
> read_gm_startup_ports: mpich gm version 1248
> read_gm_startup_ports: id 1 port 2 board 0 gm_node_id 19
>    numanode 0 pid 12146 remote_port     0
> read_gm_startup_ports: id 3 port 4 board 0 gm_node_id 19
>    numanode 0 pid 12160 remote_port     0
> [0] Error: Unable to open a GM port !
> [0] Error: write to socket failed !
> [2] Error: Unable to open a GM port !
> [2] Error: write to socket failed !
 
>
 
> and execution is aborted. it surprises me since the same version of the gm
> libraries works correctly on the same xeon machines with pmemd built
> without the xeon-specific options. i'm wondering whether there is some
> problem with myrinet and i should ask the sysadmins of the cluster, or
> whether the problem is somewhere in the machinefile...
 
>
 
> regards,
>
> -- 
> Lubos
> _@_"
 
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber_at_scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
>
 
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
 