AMBER Archive (2009)

Subject: Re: [AMBER] MKL error ?

From: Marek Malý (maly_at_sci.ujep.cz)
Date: Thu Feb 05 2009 - 14:00:10 CST


Dear Ross,
thank you very much for your time and the excellent, comprehensible
explanation!

     Marek

On Thu, 05 Feb 2009 19:29:45 +0100, Ross Walker <ross_at_rosswalker.co.uk>
wrote:

> Hi Marek,
>
>> #1
>> Your solution "OMP_NUM_THREADS=1" is working!
>>
>> When I set "OMP_NUM_THREADS=1" and of course "export OMP_NUM_THREADS"
>> on the command line before starting the Amber tests, all four tests
>> "test.serial, test.serial.QMMM, test.parallel, test.parallel.QMMM"
>> passed without any problems!
>>
>> It seems to me that the MKL problem is probably mainly connected with
>> sander when igb=1, because I have compiled NAB with the -openmp flag
>> and I can use it without any problems, for example with
>> OMP_NUM_THREADS=8. I just tested it on the usual normal mode analysis
>> program:
>
> The issue comes about because MKL versions 10.0 and onwards contain
> parallelization using openMP threads. Not all of the MKL routines
> include this, so not everything is affected. GB in sander is affected
> because it makes extensive use of calls to MKL (mainly the vector
> functions). The same is true for QMMM, where it uses the matrix
> diagonalizers in MKL. This of course can cause problems, since the
> sander code uses MPI for parallelization. For example, consider a job
> running with 4 MPI processes on a quad-core machine. The code
> internally might call a vector exponential routine. The MPI code would
> issue 4 calls to vexp, each with a quarter of the array, so each
> processor works on a quarter of the array. But then the Intel MKL
> openMP kicks in and fires up 4 threads for each call to vexp, because
> it thinks you have 4 cpus available (it knows nothing about the other
> MPI processes). The net result is 16 threads running on 4 processors
> that all thrash like crazy, and your net performance goes down.
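>
> In other words, with 4 MPI processes each spawning 4 openMP threads you
> end up with 4 x 4 = 16 threads on 4 cores. A minimal sketch of the safe
> way to launch such a job (the binary path and input file names here are
> just placeholders):
>
> --------------
> # pin MKL's openMP to a single thread per MPI process
> export OMP_NUM_THREADS=1
>
> # 4 MPI processes = 4 threads total on a quad-core node
> mpirun -np 4 $AMBERHOME/exe/sander.MPI -O -i mdin -o mdout \
>        -p prmtop -c inpcrd
> --------------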
>
> Some argue that what you should do is run 1 MPI process per node and
> then ncores openMP threads within each node - so-called hybrid
> programming, which was supposed to be the holy grail for multicore
> chips and save us all - but in practice it doesn't really work.
>
> However, it is useful to have the openMP MKL available, since some
> routines, like the dsyevr diagonalizer, are openMP parallel in MKL but
> not MPI parallel in the code. Only the master thread calls the
> diagonalization and all other threads block at that point. Hence with
> OMP_NUM_THREADS set to 4, the code would idle 3 of the MPI processes at
> the diagonalization and in their place a total of 4 openMP threads get
> spawned by the master. This works well in some cases and badly in
> others, mainly depending on how the MPI implementation does blocking.
> If it just spins the processors and keeps checking interrupts, then
> sitting at a barrier takes 100% of the processor's time, so it can't
> execute the openMP thread instead.
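>
> That scenario, sketched in shell form (the binary path and inputs are
> again just placeholders), would look something like this:
>
> --------------
> # deliberately allow MKL's openMP at the (master-only) diagonalization;
> # whether this helps or hurts depends on how the MPI library blocks
> export OMP_NUM_THREADS=4
> mpirun -np 4 $AMBERHOME/exe/sander.MPI -O -i mdin -o mdout \
>        -p prmtop -c inpcrd
> --------------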
>
> This essentially explains why only specific test cases fail. The
> question, though, is why they fail at all. My understanding was that if
> OMP_NUM_THREADS was unset it would default to 1 in the MKL code. And
> indeed I think this is what happens if you link statically - it can
> equally, though, default to the number of cpus you have, which would be
> bad (I think we should probably update the Amber code to have you
> specify it in the mdin file and have that override any environment
> variable). However, Intel has this terrible habit of continually
> changing the interfaces to their compilers and MKL libraries, so every
> new version behaves in a different way. I should probably read through
> the massive MKL manual for 10.1 at some point, and I assume some
> explanation of how openMP threading is handled is in there, but then
> come 10.2 it will all have changed again :-(. The simplest approach for
> the moment is to force OMP_NUM_THREADS to 1. However, I really think it
> is a bug in MKL: if you compile statically there is no problem, if you
> set OMP_NUM_THREADS to 1 there is no problem, and if you set it to 2
> there is no problem, but if it is unset and you linked dynamically it
> crashes - and only on specific MKL routines. Hence it has to be an
> issue within the MKL code that does the wrong thing when
> OMP_NUM_THREADS is not set.
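>
> One quick way to check whether a given sander binary is picking up the
> threaded MKL/openMP libraries dynamically (the binary path and library
> names below will vary with your install, so treat this as a sketch) is
> something like:
>
> --------------
> # list the dynamically linked MKL and Intel openMP runtime libraries
> ldd $AMBERHOME/exe/sander.MPI | grep -iE 'mkl|iomp|guide'
>
> # show what the environment will hand to openMP at run time
> echo "OMP_NUM_THREADS='${OMP_NUM_THREADS}'"
> --------------
>
> A statically linked binary shows no MKL entries there, which matches
> the observation that the static build does not hit the problem.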
>
>> ---------------
>> conjgrad()
>> newton()
>> nmode()
>> --------------
>>
>> And I saw clearly that in the case of conjgrad() the application
>> really used 8 CPUs. No problems appeared even when I turned on gb=1!
>
> Yes, I believe the openMP here is all hand coded in NAB and it doesn't
> use MKL with its 'implicit' openMP, so you don't see the problem there.
> In sander there is no openMP parallelization (only MPI), so the only
> openMP threads that get created are internal to MKL, and since the
> issue is within MKL itself, that is why the problem is seen only in
> sander.
>
>> Can you say very briefly why your solution works and prevents
>> Amber/Sander from hitting the error?
>
> Okay, in very short: "Because Intel is for some strange reason not
> dealing with the situation where OMP_NUM_THREADS is not set, so
> 'perhaps' setting it to 1 forces it to skip the call to the
> non-existent routine in MKL."
>
> My advice, though, would be that on any machine you use these days you
> should ALWAYS hard wire OMP_NUM_THREADS to 1. That way you always know
> what it is doing, since lots of libraries are starting to include
> openMP and it can cause all sorts of problems if you are running MPI
> jobs and don't expect it. Then, if you specifically want to run
> multiple openMP threads, you manually set OMP_NUM_THREADS in the script
> that runs that job, as in the sketch below. The default behavior when
> OMP_NUM_THREADS is not set is not consistent, hence the problems.
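>
> A minimal sketch of that convention (the program name is just an
> example):
>
> --------------
> # in ~/.bashrc (or the system profile): a safe default everywhere
> export OMP_NUM_THREADS=1
>
> # in a job script where you explicitly want openMP, override it there
> export OMP_NUM_THREADS=8
> ./my_openmp_nab_program < input > output
> --------------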
>
>> MLK func load error: /opt/intel/mkl/10.0.011/lib/em64t/libmkl_vml_mc.so:
>> undefined symbol: vmlGetErrorCallBack
>
> I will probably pass this along to some friends I have at Intel when I
> get a
> chance and see if they can confirm the problem / comment on it.
>
>> #2
>>
>> Of course I also tried the recommended static compilation, since it
>> seems to me a slightly "cleaner" solution than #1. But unfortunately,
>> after I successfully compiled AmberTools and Amber in serial with the
>> -static flag, I finally got into trouble during the static compilation
>> of the PARALLEL version of Amber - please see the errors below:
>>
>>
>> /opt/intel/impi/3.1/lib64/libmpiif.a(allgathervf.o): In function
>> `MPI_ALLGATHERV':
>> allgathervf.c:(.text+0x63): undefined reference to `PMPI_Allgatherv'
>
>> Of course I tried "make clean" before this compilation, but it simply
>> doesn't work :((
>
> This is because your MPI implementation was not built statically as
> well, so only shared objects are available, and when the build tries to
> link it can't find static versions of the MPI libraries. The solution
> is to recompile the MPI and have that link statically. How to do this
> varies by MPI implementation, but it is often an argument you give to
> configure. For example, with mpich2 you set the environment variables
> LDFLAGS=-static and CXXLDFLAGS=-static before you run the configure
> script, roughly as sketched below. Then you should be able to build
> AMBER in parallel statically and link it against the static MPI.
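>
> Something along these lines (compiler names, install prefix, and the
> exact configure options depend on your mpich2 version; this is only an
> example):
>
> --------------
> # build a statically linked mpich2 with the Intel compilers
> export CC=icc CXX=icpc F77=ifort F90=ifort
> export LDFLAGS=-static
> export CXXLDFLAGS=-static
> ./configure --prefix=/opt/mpich2-static
> make && make install
>
> # then rebuild AMBER's parallel binaries against this static MPI
> --------------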
>
> Sorry if this email was a bit long and rambling, but hopefully it
> explains what is going on. I haven't fully characterized the problem
> myself yet, or even how to deal with the openMP within MKL correctly -
> i.e. how to have it turned off in some places (such as the vexp calls)
> but on in others (such as the dsyevr calls).
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may
> not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
>
>
>
> _______________________________________________
> AMBER mailing list
> AMBER_at_ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber
>

-- 
This message was created with Opera's revolutionary e-mail client:
http://www.opera.com/mail/

_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber