AMBER Archive (2008)

Subject: RE: AMBER: Problems compiling Amber with MKL

From: Sasha Buzko (obuzko_at_ucla.edu)
Date: Mon Apr 14 2008 - 18:24:05 CDT


Ross,
one more update. I recompiled OpenMPI with both --disable-shared and
--enable-static. This time, when I compile with the -static flag, I get
a different error at link time after a set of warnings, which is easily
the least informative one I've seen so far:

ld: Warning: size of symbol `_int_realloc' changed from 772
in /data/openmpi/lib/libopen-pal.a(lt1-malloc.o) to 1411
in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o): In function `free':
(.text+0x5f10): multiple definition of `free'
/data/openmpi/lib/libopen-pal.a(lt1-malloc.o):malloc.c:(.text+0x38f2):
first defined here
ld: Warning: size of symbol `free' changed from 264
in /data/openmpi/lib/libopen-pal.a(lt1-malloc.o) to 454
in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o): In function `malloc':
(.text+0x4680): multiple definition of `malloc'
/data/openmpi/lib/libopen-pal.a(lt1-malloc.o):malloc.c:(.text+0x1e):
first defined here
ld: Warning: size of symbol `malloc' changed from 336
in /data/openmpi/lib/libopen-pal.a(lt1-malloc.o) to 466
in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o): In function `realloc':
(.text+0x60e0): multiple definition of `realloc'
/data/openmpi/lib/libopen-pal.a(lt1-malloc.o):malloc.c:(.text+0x33ba):
first defined here
ld: Warning: size of symbol `realloc' changed from 1336
in /data/openmpi/lib/libopen-pal.a(lt1-malloc.o) to 1148
in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/libc.a(malloc.o)
make[1]: *** [sander.MPI] Error 1
make[1]: Leaving directory `/data/amber9/src/sander'
make: *** [parallel] Error 2
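
The log above shows libopen-pal.a(lt1-malloc.o) and libc.a(malloc.o) each
defining the same allocator entry points. To see the full list of clashing
symbols without reading every warning, the linker output can be filtered.
This is only a sketch: dup_symbols is a hypothetical helper name, and it
assumes GNU ld's "multiple definition of `name'" message format as seen
in the log.

```shell
#!/bin/sh
# Summarize which symbols GNU ld reports as multiply defined.
# Reads linker output on stdin and prints each symbol name once, sorted.
# dup_symbols is a hypothetical helper, not part of Amber or OpenMPI.
dup_symbols() {
    # Keep only lines of the form: multiple definition of `name'
    # and reduce each one to the bare symbol name.
    sed -n "s/.*multiple definition of \`\([^']*\)'.*/\1/p" | sort -u
}
```

For example, "make parallel 2>&1 | dup_symbols" run against the output
above would list free, malloc and realloc.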

The dynamically linked version compiles, but fails at the test stage
because it cannot find the MKL libraries, as before. Running
mpirun -np 4 env produces output that contains the appropriate library
directory in LD_LIBRARY_PATH.
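
A directory appearing in LD_LIBRARY_PATH does not by itself prove that the
missing file is in it. A minimal sketch for checking this follows;
dirs_with_lib is a hypothetical helper name, and the library name passed
to it is taken from the loader error quoted further down.

```shell
#!/bin/sh
# Print every directory in a colon-separated search path that actually
# contains the named file. Prints nothing if no directory has it.
# dirs_with_lib is a hypothetical helper, not part of Amber or MKL.
# Usage: dirs_with_lib libmkl_lapack.so "$LD_LIBRARY_PATH"
dirs_with_lib() (
    # The function body runs in a subshell, so changing IFS here does
    # not leak into the caller's shell.
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir"
        fi
    done
)
```

If this prints nothing for libmkl_lapack.so, the directory listed in
LD_LIBRARY_PATH does not hold the library and the path itself needs
correcting. Running ldd on /data/amber9/exe/sander.MPI in the same shell
is a complementary check, since it reports which shared libraries the
loader fails to resolve.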

Sasha

On Mon, 2008-04-14 at 15:25 -0700, Sasha Buzko wrote:

> Ross,
> I tried the things you suggested. The replies are inline.
>
> On Mon, 2008-04-14 at 14:32 -0700, Ross Walker wrote:
>
> > 
> > Hi Sasha,
> >
> > export AMBERHOME=/data/amber9
> > export MPI_HOME=/data/openmpi
> >
> > source /opt/intel/cce/10.1.012/bin/iccvars.sh
> > source /opt/intel/fce/10.1.012/bin/ifortvars.sh
> >
> > PATH=/opt/intel/cce/10.1.012/bin:$PATH; export PATH
> > PATH=/opt/intel/fce/10.1.012/bin:$PATH; export PATH
> >
> > Try checking that MPI_HOME/bin is being picked up at
> > the beginning of your path as well - to make sure
> > that 'which mpif90' and 'which mpirun' return the
> > correct versions.
>
>
> which mpif90 and which mpirun return the correct paths. Below is the
> mpif90 -show output:
>
> [sasha@abicluster ~]$ mpif90 -show
> /opt/intel/fce/10.1.012/bin/ifort -I/data/openmpi/include -pthread
> -I/data/openmpi/lib -L/data/openmpi/lib -lmpi_f90 -lmpi_f77 -lmpi
> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>
> >
> > Also try running:
> >
> > mpif90 -show
> >
> > to make sure it returns the correct compiler etc.
> > E.g. here is mine for ifort with mpich2:
> >
> > [14:21][caffeine:0.04][rcw:~]$ mpif90 -show
> > ifort -g
> > -I/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/include
> > -I/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/include
> > -L/usr/local/mpi/mpich2-1.0.3_ifort9.1.039/lib
> > -lmpichf90 -lmpichf90 -lmpich -lpthread -lrt
> >
> >
> > Serial version compiles ok with or without the
> > -static flag, but make test.serial fails:
> >
> > So the serial version links against the MKL
> > libraries okay then? It is just the parallel version
> > below that doesn't?
> > cd qmmm2/2pk4; ./Run.2pk4_stan
> > This test not set up for parallel, skipping
> >
> > This is really weird - if you really did make
> > test.serial but it returns that "This test not set
> > up for parallel," then something is wrong here. Make
> > sure the DO_PARALLEL and TESTsander variables are
> > NOT set. Then try things again. My suspicion is that
> > you have DO_PARALLEL set so it is running the serial
> > version of sander through mpi - I.e. running
> > multiple copies of the same code - hence errors
> > opening restrt files etc.
>
>
> I did check that DO_PARALLEL is not set, but it still fails:
>
>
> cd amoeba_wat1; ./Run.amoeba_wat1
>
> Unit 16 Error on OPEN:
> restrt
> ./Run.amoeba_wat1: Program error
> make: *** [test.sander.AMOEBA] Error 1
>
>
> That said, I'm still mostly concerned with the parallel version.
>
>
> >
> > Sequence of actions to compile parallel Amber (after
> > patching the source):
> >
> > [sasha@abicluster src]$ ./configure -opteron
> > -openmpi ifort_x86_64
> >
> > Leave out the -opteron - I don't think it does
> > anything with ifort anyway.
> >
> > After this, I edit config.h to replace ifort with
> > mpif90 in FC and LOAD flags. It doesn't compile
> > without it, and it might be useful to have a note
> > about it in the installation instructions.
> >
> > Don't do this... It should be using mpif90 otherwise
> > you will be missing all sorts of library files that
> > are needed. You shouldn't need to edit the config.h
> > file at all. What is the problem when you do 'make
> > parallel' with mpif90 in the config.h file? I assume
> > mpif90 exists in your path and picks up the correct
> > compiler?
>
>
> If I don't put mpif90 in the config.h file (instead of ifort), the
> parallel compilation fails immediately with these errors (static or
> not):
>
>
> ifort -c -w95 -mp1 -O0 -FR -o evb_init.o _evb_init.f
> fortcom: Error: _evb_init.f, line 171: Cannot open include
> file 'mpif-common.h'
> include 'mpif-common.h'
> --------------^
> fortcom: Error: _evb_init.f, line 321: This name does not have
> a type, and must have an explicit type. [MPI_INTEGER]
> call mpi_bcast ( ndim, 1, MPI_INTEGER, 0, commworld, ierr )
> -----------------------------^
> fortcom: Error: _evb_init.f, line 361: This name does not have
> a type, and must have an explicit type.
> [MPI_DOUBLE_PRECISION]
> call mpi_bcast ( xdat_dia(n)% q, ndim,
> MPI_DOUBLE_PRECISION, 0, commworld, ierr )
> ------------------------------------------------^
> fortcom: Error: _evb_init.f, line 366: This name does not have
> a type, and must have an explicit type. [MPI_CHARACTER]
> call mpi_bcast ( xdat_dia(n)% filename, 512,
> MPI_CHARACTER, 0, commworld, ierr )
> ------------------------------------------------------^
> compilation aborted for _evb_init.f (code 1)
> make[1]: *** [evb_init.o] Error 1
> make[1]: Leaving directory `/data/amber9/src/sander'
> make: *** [parallel] Error 2
>
>
> So the problem is having "ifort" in the config.h, and is fixed (or,
> should I say, worked around) by replacing it with "mpif90".
>
>
> >
> > make parallel creates the executables, but make
> > test.parallel fails with this error:
> >
> > You are linking dynamically here I assume?
>
>
> Yes, as I said before, -static always fails.
>
>
>
> >
> > cd cytosine; ./Run.cytosine
> > /data/amber9/exe/sander.MPI: error while loading
> > shared libraries: libmkl_lapack.so: cannot open
> >
> > This implies that the environment is somehow
> > different on different nodes. Typically this happens
> > in parallel when you set some environment variables
> > on one node but the other node (which is also
> > running part of the mpi code) doesn't inherit these
> > - hence it doesn't know where to look for the mkl
> > libraries. Typically the simplest solution here is
> > to try and compile statically and then you don't
> > need to worry about it.
>
> Well, that's the thing - static compilation ALWAYS fails.
>
> I run the test immediately after the compilation on the same system,
> so there are no differences in the environment. I'm not even getting
> as far as running it on multiple nodes, since it fails the
> post-compilation test on the compilation node.
>
>
> >
> > Otherwise you will need to tweak things like the
> > default .profile or .bashrc so that something like
> > 'mpirun -np 4 env' returns you the same thing from
> > all nodes. Normally static linking (if you can do
> > it) avoids this hassle though.
> >
> >
> > The strange thing is that libmkl_lapack.so is located in the
> > directory that was happily noticed by the ./configure
> > script. Same error is thrown when sander.MPI is attempted to
> > run with one of the test cases from the Amber tutorial
> > (which is kind of expected after the test error).
> >
> > It is very possible that the mpirun command (even if
> > you run everything on the same physical node you are
> > compiling on) is invoking a new shell and not picking up the
> > correct paths. Try editing /etc/bashrc on all nodes so they
> > source the compiler and mkl environment setup scripts on
> > login.
> >
> > Compilation with -static flag fails invariably with the
> > following message:
> > ld: cannot find -lmpi_f90
> > make[1]: *** [sander.MPI] Error 1
> > make[1]: Leaving directory `/data/amber9/src/sander'
> > make: *** [parallel] Error 2
> >
> > I would hope this would go away with using mpif90 - although
> > maybe not if no static library is available for openmpi.
> > There should be a way to build a statically linkable openmpi
> > (I do it all the time with mpich2 without problems). So you
> > could try that. Although I would first look at making sure
> > the environment gets inherited correctly on all nodes under
> > an mpirun.
>
>
> Once again, compiling without the -static flag succeeds, but the result
> fails at "make test.parallel". I don't care so much about static
> compilation, so that part isn't a big deal, as long as I can get a
> parallel version with MKL to work.
>
> Finally, when I run "mpirun -np 4 env", the output contains this line:
>
> LD_LIBRARY_PATH=/opt/intel/mkl/10.0.1.014/em64t/lib:/data/openmpi/include:/data/openmpi/lib:/opt/intel/fce/10.1.012/lib:/opt/intel/cce/10.1.012/lib
>
> So the library directory is seen, but why the libraries can't be
> located at runtime is still a mystery.
>
> Let me know if you can think of any reasons for this.
>
> Thanks
>
> Sasha
> >
> > All the best
> > Ross
> > /\
> > \/
> > |\oss Walker
> >
> > | Assistant Research Professor |
> > | San Diego Supercomputer Center |
> > | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
> > | http://www.rosswalker.co.uk | PGP Key available on request
> > |
> >
> > Note: Electronic Mail is not secure, has no guarantee of
> > delivery, may not be read every day, and should not be used
> > for urgent or sensitive issues.
> >
> >
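
The suggestion in the thread to compare the output of "mpirun -np 4 env"
across processes can be sketched as a small filter. This is an
illustration only: distinct_ld_paths is a hypothetical helper name, and
it assumes the env output of all ranks arrives on stdin as plain
NAME=value lines.

```shell
#!/bin/sh
# Count how many distinct LD_LIBRARY_PATH values appear in the combined
# output of `mpirun -np N env` (read from stdin). A result of 1 means
# every rank that reported the variable saw the same value.
# distinct_ld_paths is a hypothetical helper, not part of OpenMPI.
distinct_ld_paths() {
    # tr strips the padding that some wc implementations add.
    grep '^LD_LIBRARY_PATH=' | sort -u | wc -l | tr -d ' '
}
```

Usage would be "mpirun -np 4 env | distinct_ld_paths". Ranks that do not
have the variable set at all are not counted, so it is worth also
comparing "mpirun -np 4 env | grep -c '^LD_LIBRARY_PATH='" against the
number of processes requested.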

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu