AMBER Archive (2009)
Subject: Re: [AMBER] MKL error ?
From: Marek Malý (maly_at_sci.ujep.cz)
Date: Thu Feb 05 2009 - 14:00:10 CST
Thank you very much for your time and an excellent and comprehensible
explanation.
On Thu, 05 Feb 2009 19:29:45 +0100, Ross Walker <ross_at_rosswalker.co.uk> wrote:
> Hi Marek,
>> your solution "OMP_NUM_THREADS=1" is working!
>> When I wrote "OMP_NUM_THREADS=1" and of course "export OMP_NUM_THREADS"
>> on the command line before starting the amber tests, all four tests
>> test.parallel, test.parallel.QMMM" passed without any problems!
>> It seems to me that the MKL problem is probably mainly in connection
>> with sander when igb=1,
>> because I have compiled NAB with the -openmp flag and I can use it without
>> any problems, for example
>> with OMP_NUM_THREADS=8. I just tested it on the common normal mode
>> analysis program:
> The issue comes about because versions of MKL from 10.0 onwards contain
> parallelization using openMP threads. Not all of the MKL routines include
> this so not everything is affected. GB in sander is affected because it
> makes extensive use of calls to MKL (mainly the vector functions). The
> same is true for QMMM where it uses the matrix diagonalizers in MKL. This of
> course can cause problems since the sander code uses MPI for
> parallelization. For example consider a job running with 4 mpi threads
> on a quad core machine. The code internally might call a vector exponential
> routine. The MPI code would issue 4 calls to vexp with a quarter of the
> array each time. Hence each processor works on a quarter of the array.
> But then the Intel MKL openMP would kick in and say fire up 4 threads for
> each call to vexp because it thinks you have 4 cpus available (it knows
> nothing about the other MPI threads). The net result is you get 16 threads
> running on 4 processors that all thrash like crazy and your net
> performance goes down.
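The thread count in the example above can be checked with a little shell arithmetic. This is only a sketch: the function and variable names are made up for illustration, and the numbers are the ones assumed in the example (4 MPI processes, 4 cores), not values detected from a real machine.

```shell
# total_threads MPI_PROCS THREADS_PER_PROC
# Prints how many threads end up competing for the cores.
total_threads() {
    echo $(( $1 * $2 ))
}

cores=4        # quad core machine from the example
mpi_procs=4    # one MPI process per core

# Assumption from the example: with OMP_NUM_THREADS unset, MKL starts
# one thread per core inside every MPI process.
unset_case=$(total_threads "$mpi_procs" "$cores")

# With OMP_NUM_THREADS=1 each MPI process stays single threaded.
pinned_case=$(total_threads "$mpi_procs" 1)

echo "OMP_NUM_THREADS unset: $unset_case threads on $cores cores"
echo "OMP_NUM_THREADS=1:     $pinned_case threads on $cores cores"
```

With these numbers the unset case gives 16 threads on 4 cores, while pinning to one thread keeps it at one thread per core.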
> Some argue that what you should do is run 1 MPI process per node and then
> ncores openmp threads within a node - so called hybrid programming that is
> supposed to be the holy grail for multicore chips and save us all but in
> practice it doesn't really work.
> However, it is useful to have the openMP MKL available since some things
> like the dsyevr diagonalizer are openMP parallel in MKL but not MPI
> parallel in the code. Only the master thread calls the diagonalization and
> all other threads block at that point. Hence with OMP_NUM_THREADS set to 4
> the code would idle 3 of the MPI processes at the diagonalization and in
> their place a total of 4 openMP threads get spawned by the master. This
> works well in some cases and badly in others, mainly dependent on how the
> MPI implementation does blocking. If it just spins the processors and keeps
> checking interrupts then sitting at a barrier takes 100% of the processor's
> time so it can't execute the openMP thread instead.
> This essentially explains why only specific test cases fail. The
> question is though why they fail. My understanding was that if
> omp_num_threads was not set it would default to 1 in the MKL code. And
> indeed I think this is what
> happens if you link statically - it can equally though default to the number
> of cpus you have which would be bad (I think we should probably update the
> amber code to have you specify it in the mdin file and have it override any
> environment variable). However, Intel has this terrible habit of
> changing the interfaces to their compilers and MKL libraries so every new
> version behaves in a different way. I should probably read through the
> massive MKL manual for 10.1 at some point and I assume some explanation of
> how openmp threading is handled is in there but then come 10.2 it will have
> all changed again :-(. The simplest approach for the moment is to force
> omp_num_threads to 1. However, I really think it is a bug in MKL because if
> you compile statically there is no problem, if you set omp_num_threads = 1
> there is no problem and if you set it to 2 there is no problem but if it is
> unset and you linked dynamically it crashes - but only on specific MKL
> routines. Hence it has to be an issue within the MKL code that does the
> wrong thing if omp_num_threads is not set.
>> And I saw clearly that in the case of conjgrad() the application
>> really used 8 CPUs.
>> No problems appeared even if I turned gb=1 !
> Yes I believe that the openmp in here is all hand coded in NAB and it
> doesn't use MKL with its 'implicit' openmp. Thus you don't see the problem
> in here. In sander there is no openmp parallelization (only MPI) so the
> openmp threads that get created are created internally within MKL and so
> the issue is within MKL itself, hence why the problem is seen only in
> sander.
>> Can you say in very short why this solution of yours works and prevents
>> Amber/Sander from
>> hitting the error ?
> Okay in very short - "Because Intel is for some strange reason not dealing
> with the situation where omp_num_threads is not set so 'perhaps' setting it
> to 1 forces it to skip the call to the non-existent routine in mkl."
> My advice would be though that on any machine you ever use these days you
> should ALWAYS hard wire omp_num_threads to 1. This way you always know what
> it is doing since lots of libraries are starting to include openmp and it
> can cause all sorts of problems if you are running MPI jobs and don't expect
> it. Then if you specifically want to run multiple openMP threads you can
> manually set omp_num_threads in the script that runs that job. The
> behavior when omp_num_threads is not set is not consistent, hence the
> problems.
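A minimal sketch of that advice in POSIX shell follows. Where the default goes (for example a shell start-up file) is left open, and the `sh -c` command is only a stand-in for a real OpenMP program such as one built with NAB; the variable names `one_job` and `after` are illustrative.

```shell
# Default for every session: one OpenMP thread per process.
OMP_NUM_THREADS=1
export OMP_NUM_THREADS

# Override for a single command only. The prefix assignment applies to
# that one command and does not change the exported default.
# "sh -c ..." stands in for the real OpenMP job.
one_job=$(OMP_NUM_THREADS=8 sh -c 'echo "$OMP_NUM_THREADS"')

# The session default is still in force afterwards.
after=$OMP_NUM_THREADS

echo "inside job: $one_job, afterwards: $after"
```

Putting the override on the command line of the one job that needs it keeps every other job, including MPI runs, at a single thread per process.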
>> MLK func load error: /opt/intel/mkl/10.0.011/lib/em64t/libmkl_vml_mc.so:
>> undefined symbol: vmlGetErrorCallBack
> I will probably pass this along to some friends I have at Intel when I
> get a
> chance and see if they can confirm the problem / comment on it.
>> Of course I also tried the recommended static compilation since it seemed
>> to me a little
>> "cleaner" solution than #1. But unfortunately after I successfully
>> compiled AmberTools and
>> Amber in serial with the -static flag I finally got into trouble during
>> compilation of the PARALLEL version of Amber - please see the errors below:
>> /opt/intel/impi/3.1/lib64/libmpiif.a(allgathervf.o): In function
>> allgathervf.c:(.text+0x63): undefined reference to `PMPI_Allgatherv'
>> Of course I tried "make clean" before this compilation but it
>> doesn't work :((
> This is because your mpi implementation was not built statically as well, so
> only shared objects are available, and when it tries to link things it can't
> find the static versions of the libraries. The solution is to recompile your
> MPI and have that link statically. How to do this varies by MPI
> implementation but it is often an argument you give to configure. For
> example with mpich2 you set the environment variable LDFLAGS=-static and
> CXXLDFLAGS=-static before you run the configure script. Then you should be
> able to build AMBER in parallel statically and link it against the static
> MPI.
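As a rough sketch of the mpich2 recipe described above: the two variables are the ones named in the text, while the install prefix is a hypothetical choice and the build is assumed to be started from the top of the unpacked MPI source tree. Other MPI implementations may need different settings.

```shell
# Ask the MPI build to link statically (variables as named above).
LDFLAGS=-static
CXXLDFLAGS=-static
export LDFLAGS CXXLDFLAGS

# Only configure and build if we really are in a source tree.
if [ -x ./configure ]; then
    # "$HOME/mpich2-static" is a hypothetical install location.
    ./configure --prefix="$HOME/mpich2-static" && make && make install
else
    echo "run this from the top of the MPI source tree" >&2
fi
```

After the static MPI is installed, AMBER would be rebuilt against it so that the parallel link step can find static MPI libraries.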
> Sorry if this email was a bit long and rambling but hopefully it explains
> what is going on. I haven't fully characterized the problem myself yet or
> even how to deal with the openmp within mkl correctly yet - i.e. to have it
> turned off in some places (such as vexp calls) but have it on in others (such
> as dsyevr calls).
> All the best
> |\oss Walker
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
> Note: Electronic Mail is not secure, has no guarantee of delivery, may
> be read every day, and should not be used for urgent or sensitive issues.
> AMBER mailing list