AMBER Archive (2009)

Subject: Re: [AMBER] Error in PMEMD run

From: Marek Malý (maly_at_sci.ujep.cz)
Date: Fri May 08 2009 - 11:38:27 CDT


Hi Bob,

I ran several tests with partially used nodes. As you can see from the
table below, the best performance came from the combination of 4 nodes
with 4 CPUs (out of 8) per node. This is the best combination for SANDER
and also for PMEMD, but this "CPU-wasting" combination is only slightly
better than the results I already reported for jobs on 2 fully used nodes.
For SANDER the difference is a bit larger, 49 s; for PMEMD it is just 5 s.

SANDER
1       2       4       4/3cpu  4/4cpu  4/5cpu  4/6cpu  4/7cpu  2/4cpu  2/5cpu  2/6cpu  2/7cpu
197.79  186.47  218     173     137     198     220     242     196     227     244     260

PMEMD
1       2       4       4/3cpu  4/4cpu  4/5cpu  4/6cpu  4/7cpu  2/4cpu  2/5cpu  2/6cpu  2/7cpu
145     85      422     95      80      102     111     543     176     231     105     95

[Number of nodes / number of CPUs per node; times in seconds]

If the table header shows just a single number (1, 2, or 4), it means
that all CPUs of each node are used,
so 1 = 1/8cpu, 2 = 2/8cpu, ...
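
One way to read the table above is to convert each timing into a speedup and a parallel efficiency relative to the single fully used node. The following is a minimal Python sketch, assuming the values are wall-clock times in seconds and that the plain columns 1, 2 and 4 mean 8 CPUs per node; the names `SANDER`, `PMEMD` and `speedup_rows` are made up for this illustration.

```python
# Timings copied from the table above, assumed to be wall-clock seconds.
# Keys are (nodes, CPUs used per node); plain 1, 2, 4 mean all 8 CPUs per node.
SANDER = {
    (1, 8): 197.79, (2, 8): 186.47, (4, 8): 218,
    (4, 3): 173, (4, 4): 137, (4, 5): 198, (4, 6): 220, (4, 7): 242,
    (2, 4): 196, (2, 5): 227, (2, 6): 244, (2, 7): 260,
}
PMEMD = {
    (1, 8): 145, (2, 8): 85, (4, 8): 422,
    (4, 3): 95, (4, 4): 80, (4, 5): 102, (4, 6): 111, (4, 7): 543,
    (2, 4): 176, (2, 5): 231, (2, 6): 105, (2, 7): 95,
}


def speedup_rows(times, baseline=(1, 8)):
    """Return (config, cores, speedup, efficiency) tuples, fastest first."""
    base_time = times[baseline]
    base_cores = baseline[0] * baseline[1]
    rows = []
    for (nodes, cpus), seconds in times.items():
        cores = nodes * cpus
        speedup = base_time / seconds
        # Efficiency: speedup divided by the factor by which the core count grew.
        efficiency = speedup / (cores / base_cores)
        rows.append(((nodes, cpus), cores, speedup, efficiency))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, times in (("SANDER", SANDER), ("PMEMD", PMEMD)):
    print(name)
    for (nodes, cpus), cores, speedup, eff in speedup_rows(times):
        print(f"  {nodes} x {cpus} = {cores:2d} cores: "
              f"speedup {speedup:4.2f}, efficiency {eff:4.2f}")
```

With these numbers, PMEMD on 4 nodes x 4 CPUs comes out at a speedup of about 1.81 over one full node, while the fully used 4-node run is slower than the single node.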

I have been told that we use "Cisco InfiniBand" for inter-node
communication.

#1
OK, I am not sure, but for the calculation on 2 nodes = 16 CPUs the
performance, especially in the PMEMD case, is not so bad. Of course, the
big problem is scaling to jobs on more than 2 nodes. As things stand, it
is pretty clear that 2 nodes are the maximum for our cluster. If you have
any idea what to try (some advanced InfiniBand settings) to improve
scaling beyond 2 nodes, I would be very grateful for your advice.

#2
The second issue, in light of your earlier comments, is of course the
reliability of my current (ifort 11) build. It seems to me that it works
fine, but ... How critical were the reported problems regarding the
"inaccuracy" of "ifort 11" calculations? If serious problems were found,
could you advise me where to find your recommended ifort version 10.1.021
(if possible with the matching MKL and cc libraries)?

I couldn't find it on the official Intel website. It seems that they
have no software archive.

Thank you very much in advance !

    Best,

      Marek

On Fri, 08 May 2009 02:11:25 +0200, Robert Duke <rduke_at_email.unc.edu>
wrote:

> Marek,
> There has been a whole other thread running about how ifort 11, various
> versions, will hang if you try to use it to compile pmemd (actual mails
> on the reflector right around yours...). I have recommended using ifort
> 10.1.021 because I know it works fine. As far as ifort 11.*, I have no
> experience, but there are reports of it hanging (this is a compiler bug
> - the compiler is defective). I also have coworkers that have tried to
> build Gaussian 03 with ifort 11.*, and it compiles, but the executables
> don't pass tests. I think German Ecklenberg (I am guessing at the name
> - I unfortunately cleaned up some mail and the May AMBER reflector
> postings are not available yet) did get some version of 11 to work
> (might have been 11.0.084, but we are dealing with a very dim
> recollection here), but I would still prefer to just trust 10.1.021...
> Boy, you are getting to hit all the speed bumps... These days I would
> not trust any software Intel releases for about 6 months after it is
> released - let other guys do the bleeding on the bleeding edge... Ross
> concurs with me on this one.
> Best Regards - Bob
> ----- Original Message ----- From: "Marek Malý" <maly_at_sci.ujep.cz>
> To: "AMBER Mailing List" <amber_at_ambermd.org>
> Sent: Thursday, May 07, 2009 7:59 PM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Dear Ross and Bob,
>
> first of all, thank you very much for your time and effort,
> which really brought a good result; however, some problems
> are still present ...
>
> OK,
>
>
> Today our admin installed ifort version 11, including the corresponding
> cc and MKL.
>
> Here is the current LD_LIBRARY_PATH:
>
> LD_LIBRARY_PATH=/opt/intel/impi/3.2.0.011/lib64:/opt/intel/mkl/10.1.0.015/lib/em64t:/opt/intel/cc/11.0.074/lib/intel64:/opt/intel/fc/11.0.074/lib/intel64::/opt/intel/impi/3.2/lib64
>
> Of course, first of all I tried simply to compile pmemd with these new
> settings, but I did not succeed :((
>
> Here is my configuration statement:
>
> ./configure linux_em64t ifort intelmpi
>
>
> Compilation started fine, but after some time it "stopped": the progress
> stopped, but not the compilation process itself. After about 1 hour the
> process was still alive; see this part of the "top" listing:
>
> 30599 mmaly 20 0 61496 16m 7012 R 50 0.1 58:51.64 fortcom
>
>
> But for almost the whole hour it was stuck here:
>
> .....
>
> runmin.f90(465): (col. 11) remark: LOOP WAS VECTORIZED.
> runmin.f90(482): (col. 3) remark: LOOP WAS VECTORIZED.
> runmin.f90(486): (col. 3) remark: LOOP WAS VECTORIZED.
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC veclib.fpp veclib.f90
> ifort -c -auto -tpp7 -xP -ip -O3 veclib.f90
> ifort: command line remark #10148: option '-tp' not supported
> gcc -c pmemd_clib.c
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC gb_alltasks_setup.fpp gb_alltasks_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 gb_alltasks_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pme_alltasks_setup.fpp pme_alltasks_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pme_alltasks_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pme_setup.fpp pme_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pme_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
> pme_setup.f90(145): (col. 17) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(159): (col. 22) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
> pme_setup.f90(80): (col. 8) remark: LOOP WAS VECTORIZED.
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC get_cmdline.fpp get_cmdline.f90
> ifort -c -auto -tpp7 -xP -ip -O3 get_cmdline.f90
> ifort: command line remark #10148: option '-tp' not supported
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC master_setup.fpp master_setup.f90
> ifort -c -auto -tpp7 -xP -ip -O3 master_setup.f90
> ifort: command line remark #10148: option '-tp' not supported
> /lib/cpp -traditional -P -I/opt/intel/impi/3.2/include -DPUBFFT
> -DBINTRAJ
> -DMPI -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DMKL
> -DFFTLOADBAL_2PROC pmemd.fpp pmemd.f90
> ifort -c -auto -tpp7 -xP -ip -O3 pmemd.f90
> ifort: command line remark #10148: option '-tp' not supported
> <<<<-HERE IS THE LAST LINE OF THE COMPILATION PROCESS
>
> After this one hour I killed the compilation and got these typical
> messages:
>
> make[1]: *** Deleting file `pmemd.o'
> make[1]: *** [pmemd.o] Error 2
> make: *** [install] Interrupt
>
> I really do not understand how it is possible that the compiler uses 50%
> of a CPU for one hour and gets stuck on one line ...
>
> I have to say that compilation with the old version of the ifort package
> was a matter of a few minutes.
>
> It looks to me like a typical case of an infinite loop ...
>
> Nevertheless, I then got the idea to just use the old pmemd build
> with the newly installed libraries (cc ...), and it works :)) !!!
>
> The situation with SANDER was different, but after a complete recompilation
> of AmberTools and Amber, everything is OK (at least for now :)) ).
>
> So I think that my problem is solved, but there is still the strange
> question of why the PMEMD compilation with the ifort 11 package cannot
> finish in reasonable time. Just to be complete, I have to say that
> I tried the pmemd installation with the original "configure" file
> and also with Bob's late-night one. The result is the same. Fortunately,
> it is not a crucial problem for me now ...
>
> So thank you both again !!!
>
> Best,
>
> Marek
>
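
A build that hangs like the one described above can be cut off automatically instead of being watched in "top". Below is a minimal Python sketch, assuming a POSIX system; the helper name `run_with_timeout` and the one-hour limit mentioned in the comment are made up for illustration and are not part of the pmemd build system.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it ran longer than timeout_s."""
    # A new session gives the command its own process group, so children
    # such as compiler back ends can be killed together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed process
        return None


if __name__ == "__main__":
    # Stand-in for a hanging build step; a real call might look like
    # run_with_timeout(["make", "install"], 3600).
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(hung, 1))  # prints None after about one second
```

Killing the whole process group rather than only the top-level command is a deliberate choice here, since a build tool typically spawns the compiler as a child process.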
> On Thu, 07 May 2009 01:04:10 +0200, Robert Duke <rduke_at_email.unc.edu>
> wrote:
>
>> Oh, very good find Ross; I have not had the experience of mixing these,
>> but I bet you are right! - Bob
>> ----- Original Message ----- From: "Ross Walker" <ross_at_rosswalker.co.uk>
>> To: "'AMBER Mailing List'" <amber_at_ambermd.org>
>> Sent: Wednesday, May 06, 2009 5:53 PM
>> Subject: RE: [AMBER] Error in PMEMD run
>>
>>
>>> Hi Marek,
>>>
>>>> here is the content of the LD_LIBRARY_PATH variable:
>>>>
>>>> LD_LIBRARY_PATH=/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:/opt/intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::/opt/intel/impi/3.1/lib64
>>>
>>> I suspect this is the origin of your problems... You have cce v9.1.043
>>> defined and fce v10.1.012 defined. I bet these are not compatible. Note
>>> there is a libsvml.so in /intel/cce/9.1.043/lib/ and this comes first in
>>> your LD path so will get picked up before the Fortran one. This is
>>> probably leading to all sorts of problems.
>>>
>>> My advice would be to remove the old cce library spec from the path so it
>>> picks up the correct libsvml. Or upgrade your cce to match the fce
>>> compiler version - this should probably always be done and I am surprised
>>> Intel let you have mixed versions this way but alas..... <sigh>
>>>
>>> All the best
>>> Ross
>>>
>>>
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> | Assistant Research Professor |
>>> | San Diego Supercomputer Center |
>>> | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>>> be read every day, and should not be used for urgent or sensitive issues.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> AMBER mailing list
>>> AMBER_at_ambermd.org
>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>
>>
>>
>>
>
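
Ross's diagnosis quoted above, that an older libsvml.so earlier in LD_LIBRARY_PATH is picked up before the Fortran one, can be checked mechanically. The following is a minimal Python sketch, assuming the /opt/intel/<component>/<version>/ directory layout seen in this thread; the function names are made up for illustration, and the search only mimics the order of LD_LIBRARY_PATH, not everything the dynamic linker consults.

```python
import os
import re


def intel_versions(ld_library_path):
    """Map component name (e.g. 'cce', 'fce') to the versions found in the path."""
    versions = {}
    for entry in ld_library_path.split(":"):
        # Assumed layout: /opt/intel/<component>/<version>/...
        match = re.match(r"/opt/intel/([^/]+)/([0-9][0-9.]*)(?:/|$)", entry)
        if match:
            versions.setdefault(match.group(1), set()).add(match.group(2))
    return versions


def first_provider(ld_library_path, libname):
    """Return the first directory in the path that contains libname, else None."""
    if not ld_library_path:
        return None
    for entry in ld_library_path.split(":"):
        directory = entry or "."  # an empty entry means the current directory
        if os.path.isfile(os.path.join(directory, libname)):
            return directory
    return None


# The path quoted in the thread, rejoined into one line.
OLD_PATH = (
    "/opt/intel/impi/3.1/lib64:/opt/intel/mkl/10.0.011/lib/em64t:"
    "/opt/intel/cce/9.1.043/lib:/opt/intel/fce/10.1.012/lib::"
    "/opt/intel/impi/3.1/lib64"
)

for component, found in sorted(intel_versions(OLD_PATH).items()):
    print(component, sorted(found))
print(first_provider(os.environ.get("LD_LIBRARY_PATH", ""), "libsvml.so"))
```

Run on the cluster itself, the last line reports which directory would supply libsvml.so first; on the old path the listing shows cce at 9.1.043 next to fce at 10.1.012, the mismatch Ross pointed out.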
