AMBER Archive (2004)

Subject: Re: AMBER: pmemd and mpich - myrinet

From: Robert Duke (rduke_at_email.unc.edu)
Date: Sat Apr 24 2004 - 07:31:59 CDT


Lubos -
Hard to say what it is, but I would suspect some system incompatibility
rather than the Pentium-specific options. It is easy enough to test: build
pmemd with the shipped Machine.mpich_gm_ifc (with the environment variables
fixed up) and see what that does. Look carefully at your makefile outputs.
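For what it's worth, a rough sketch of that test (the .test copy name is
just illustration; the two paths are the ones from your mail, and the
rebuild itself goes however you built your _p4 variant):

   % cd <your pmemd source dir>
   % cp Machine.mpich_gm_ifc Machine.mpich_gm_ifc.test
   % vi Machine.mpich_gm_ifc.test
   # change only the two installation paths, nothing else:
   #   setenv MPICH_HOME /software/mpich-1.2.5/build/LINUX/ch_gm_intel
   #   setenv GM_HOME /packages/run/gm-1.6.3
   # then rebuild pmemd against this machinefile and rerun the failing job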
I also recommend always building the single-processor version of pmemd
first, since you will pick up a lot of problems with the compile process
without the added noise of the MPI stuff (I don't think that will help much
here, but I would always go that route on a machine I had any doubts
about). My gut feeling is that the compiler or the mpich-gm software
differs between the two setups, and that there are compatibility problems
there. I believe there are required patches for mpich-gm if you are using
the Intel Fortran compiler (get the system guys and the Myrinet folks
involved to confirm the status of the mpich-gm software).
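One quick check you can make yourself, assuming the mpich compile wrappers
were installed under your MPICH_HOME (-show just prints the underlying
compile command without running anything):

   % $MPICH_HOME/bin/mpif77 -show
   # the output names the Fortran compiler mpich-gm was configured with;
   # if it is not ifc, you are mixing compilers, and link trouble is likely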
Also, if you have not already done so, confirm that Myrinet is working NOW
by running your old pmemd, or whatever you have that used to work. I have
seen the kind of messages you are getting here when Myrinet gets hosed.
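A rough sanity check, assuming the stock GM utilities are installed under
your GM_HOME and that mpirun.ch_gm is your mpich-gm launcher (use whatever
launcher worked for you before):

   % $GM_HOME/bin/gm_board_info
   # should report the Myrinet board and its gm_node_id on each node;
   # if this fails or hangs, the GM layer itself is in trouble
   % mpirun.ch_gm -np 4 <your old working pmemd executable> ...
   # if the old binary now fails with the same GM port errors, the
   # problem is Myrinet itself, not your new machinefile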
I don't administer the cluster, so I don't know the details of what you can
do about it. Good luck.
Regards - Bob Duke

----- Original Message -----
From: "Lubos Vrbka" <shnek_at_tiscali.cz>
To: <amber_at_scripps.edu>
Sent: Saturday, April 24, 2004 6:50 AM
Subject: Re: AMBER: pmemd and mpich - myrinet

> bob,
>
> > 1) There is no machinefile for xeons with myrinet simply because I did
> > not have access to a xeons + myrinet installation. It is easy to make
> > the appropriate modifications to Machine.mpich_gm_ifc (diff
> > Machine.mpich_ifc and Machine.mpich_ifc_p4, and craft up your own
> > Machine.mpich_gm_ifc_p4)
> i tried to create my own Machine.mpich_gm_ifc_p4 according to your
> instructions:
>
> ...
>
> setenv MPICH_HOME /software/mpich-1.2.5/build/LINUX/ch_gm_intel
> setenv GM_HOME /packages/run/gm-1.6.3
>
> ...
>
> setenv MPICH_INCLUDE $MPICH_HOME/include
> setenv MPICH_LIBDIR $MPICH_HOME/lib
> setenv MPICH_LIBDIR2 $GM_HOME/lib
>
> setenv MACHINE LINUX_INTEL_P4
> setenv MACH Linux
> setenv MACHINEFLAGS "-DREGNML -DMPI -DNO_MPI_BUFFER_ALIASING -DINTEL_P4_VECT_OPT"
> setenv FFLAGS "-DSHORT=INTEGER(2) -DLOGICAL_BYTE=LOGICAL(1)"
> setenv MACHINEFLAGS "$MACHINEFLAGS $FFLAGS"
>
> # CPP is the cpp for this machine
> setenv CPP "/lib/cpp -traditional -I$MPICH_INCLUDE"
> setenv CC "gcc "
> setenv LOADCC "gcc "
>
> # SYSDIR is the name of the system-specific source directory relative to src/*/
> setenv SYSDIR Machines/intel
>
> # COMPILER ALIASES:
>
> # little or no optimization:
> setenv L0 "ifc -c -auto -tpp7 -mp1 -O0"
>
> # modest optimization (local scalar):
> setenv L1 "ifc -c -auto -tpp7 -mp1 -O2"
>
> # high scalar optimization (but not vectorization):
> setenv L2 "ifc -c -auto -tpp7 -xW -mp1 -ip -O3"
>
> # high optimization (may be vectorization, not parallelization):
> setenv L3 "ifc -c -auto -tpp7 -xW -mp1 -ip -O3"
>
> # LOADER/LINKER:
> setenv LOAD "ifc -static"
> setenv LOADLIB "-limf -lsvml -lPEPCF90 -L$MPICH_LIBDIR -lmpich -L$MPICH_LIBDIR2 -lgm -lpthread"
>
> # ranlib, if it exists
> setenv RANLIB ranlib
>
> pmemd builds fine. all 4 copies of the job (i'm running on 2 dual-xeon
> nodes) are spawned correctly, but then:
>
> All 4 tasks started.
> read_gm_startup_ports: waiting for info
> read_gm_startup_ports: mpich gm version 1248
> read_gm_startup_ports: id 1 port 2 board 0 gm_node_id 19
> numanode 0 pid 12146 remote_port 0
> read_gm_startup_ports: id 3 port 4 board 0 gm_node_id 19
> numanode 0 pid 12160 remote_port 0
> [0] Error: Unable to open a GM port !
> [0] Error: write to socket failed !
> [2] Error: Unable to open a GM port !
> [2] Error: write to socket failed !
>
> and execution is aborted. this surprises me, since the same version of
> the gm libraries works correctly on the same xeon machines with pmemd
> built without the xeon-specific options. i'm wondering whether there is
> some problem with myrinet (in which case i should ask the cluster
> sysadmins), or whether the problem is somewhere in the machinefile...
>
> regards,
>
> --
> Lubos
> _@_"

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu