AMBER Archive (2005)
Subject: Re: AMBER: problems with running sander in parallel on linux
From: Scott Brozell (sbrozell_at_scripps.edu)
Date: Tue Feb 15 2005 - 16:02:56 CST
 
 
 
 
Hi,
 Yes, this is messy stuff and befriending a sys admin might be worthwhile ;-)
 Reducing these LD_LIBRARY_PATH problems should make life easier for
 Amber users and reduce traffic on the reflector.  All of the tools discussed
 are worthwhile.  I give the compiler flags in the hope that some users will
 want to solve their problems on their own.
 
 David E. Konerding has a good suggestion for general users of MPICH:
 mpirun -wdir `pwd` -m machines.all -np 8 -g 2 ./pmemd -O -i mdin \
     -p prmtop -c inpcrd.equil -o mdout \
     -MPDENV- LD_LIBRARY_PATH=/home/dsd/intel_compiler/compiler80/lib
 
 The LOADLIB approach can also work in general, provided one can rsh or ssh
 to the nodes to find where libimf.so lives; for example,
 rsh big_cluster_node1 locate libimf.so
 rsh big_cluster_node2 locate libimf.so
 ...
 
 Then all of the directories containing it can be given as -rpath arguments
 on the LOADLIB line of config.h, and sander rebuilt so that it does not
 require LD_LIBRARY_PATH.  Tedious but effective ...
 Note: if locate is unavailable, one can try find / -name libimf.so -print
 (that might get a sys admin's attention, so be prepared to sweet talk ;-)
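 A minimal shell sketch of that LOADLIB step, assuming the locate output from every node has been collected into one list of full paths to libimf.so. The function name make_loadlib and the node names are hypothetical. It keeps each distinct directory once and uses the comma form -Wl,-rpath,<dir>, which passes -rpath and the directory to the linker as two arguments of the same -Wl option.

```shell
# Hypothetical helper: read full paths to libimf.so (one per line) on stdin
# and print a LOADLIB line with one -Wl,-rpath,<dir> per distinct directory.
make_loadlib() {
    # strip the file name to get the directory; skip lines with no slash,
    # drop empty results, and keep only the first occurrence of each dir
    sed -n 's|/[^/]*$||p' | awk 'NF && !seen[$0]++' | {
        line="LOADLIB="
        while IFS= read -r dir; do
            line="$line -Wl,-rpath,$dir"
        done
        printf '%s\n' "$line"
    }
}

# Usage (node names are placeholders; use ssh if rsh is not allowed):
#   for node in big_cluster_node1 big_cluster_node2; do
#       rsh "$node" locate libimf.so
#   done | make_loadlib
```

 The printed flags would then be merged into the LOADLIB line of config.h before rebuilding sander.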
 
 Scott
 On Tue, 15 Feb 2005, Robert Duke wrote:
 > Scott -
> Okay, a few more responses :-)
 > 1) I indeed did not check whether LD_LIBRARY_PATH is actually USED by the
 > compiler; I only said it is a mechanism used by ld.  It IS, however, set by
 > the compiler initialization scripts to point at /opt/intel... or whatever.
 > The important point was that it is possible to get the dynamic loading to
 > work without it, since I dynamically load on nodes where it is not set.
 > 2) I agree that getting the libraries into a consistent location and using
 > ldconfig requires root privileges, which may sometimes be a problem.
 > Make friends with your sys admin ;-)
 > 3) While I think that the LOADLIB solution below has merit, it is still
 > subject to the headache of having to get all the Intel libraries visible at
 > the same path on every node; that is a system administration issue unless
 > you have access to an NFS share mounted under the same name on all nodes.
 > 4) Ultimately, it would be very nice if we could go back to statically
 > linked executables.  That is the most trouble-free solution as long as you
 > are not sucking in broken parts of the system (threads), and I think anyone
 > who grouses about executable size or shared memory usage is worrying about
 > things that don't matter for this application.
 > Apologies to the user base for all this; we got hammered by Red Hat.
 > Regards -Bob
 >
 > ----- Original Message -----
 > From: "Scott Brozell" <sbrozell_at_scripps.edu>
 > To: <amber_at_scripps.edu>
 > Sent: Tuesday, February 15, 2005 4:01 PM
 > Subject: Re: AMBER: problems with running sander in parallel on linux
 >
 >
 > > Hi,
 > >
 > > This page is a decent presentation of some LD_LIBRARY_PATH issues
 > > http://www.visi.com/~barr/ldpath.html
 > >
 > > Based on it, we should probably be doing more to set the runtime
 > > library search path.
 > >
 > > Piotr, I added this to the config.h produced by configure ifort:
 > > LOADLIB= -Wl,-rpath /opt/intel_fc_80/lib
 > >
 > > The resulting sander ran without setting LD_LIBRARY_PATH.
 > >
 > > We should investigate getting the right paths as arguments to -rpath
 > > in our configure script.  However, even merely using -rpath with
 > > the canonical path is a big step.
 > >
 > > A few responses below ...
 > >
 > > On Tue, 15 Feb 2005, Robert Duke wrote:
 > >
 > >> Okay, guys, I am a little fuzzy on this stuff, but as I recollect:
 > >> 1) LD_LIBRARY_PATH is first critical during linking, and only used in
 > >> loading (by ld.so) when the information needed to find a shared object is
 > >> not present in the executable.
 > >
 > > ifort does not use LD_LIBRARY_PATH to find the libraries in
 > > /opt/intel_fc_80/lib; running ifort -v shows that ifort invokes ld with
 > > -L/opt/intel_fc_80/lib.
 > >
 > > Using -Wl,-verbose,-M as LOADLIB shows the details of ld's actions, e.g.:
 > >
 > > attempt to open /opt/intel_fc_80/lib/libm.so failed
 > > attempt to open /opt/intel_fc_80/lib/libm.a failed
 > > attempt to open /usr/lib/libm.so succeeded
 > >
 > > ifort -V
 > > Intel(R) Fortran Compiler for 32-bit applications, Version 8.0   Build 20031016Z Package ID: l_fc_p_8.0.034
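 To pull the useful part out of a long -Wl,-verbose log, a small filter can keep only the files ld reports it actually opened. This is a sketch that assumes the link output was saved to a file (link.log is a hypothetical name) and that the lines look exactly like the "attempt to open ... succeeded" lines quoted above; linked_files is likewise a hypothetical helper name.

```shell
# Hypothetical helper: from GNU ld -verbose output on stdin, print only the
# files ld opened successfully, e.g. "attempt to open /usr/lib/libm.so succeeded".
linked_files() {
    # fields: $1=attempt $2=to $3=open $4=<path> $5=succeeded
    awk '/^attempt to open .* succeeded$/ { print $4 }'
}

# Usage (assumes the output of the link step was captured in link.log):
#   linked_files < link.log
```

 Applied to the three lines above, this prints only /usr/lib/libm.so, which makes it easy to see that libm came from /usr/lib rather than from the Intel directory.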
 > >
 > >> 2) I actually don't set LD_LIBRARY_PATH at all on the machines other than
 > >> the compile machine; I just make sure that all the machines have a version
 > >> of the Intel compiler libraries visible in the same place (via the same
 > >> path).
 > >
 > > Consistency counts, but a general user may not have control over that.
 > >
 > >> 3) It may be necessary to get the path to the ifort libs into
 > >> /etc/ld.so.cache by using ldconfig.  Please read the man pages for ldconfig
 > >> (the dynamic linking config tool), ld.so (the dynamic loader), and ld (the
 > >> linker) for clarifications on all this wonderful stuff.
 > >
 > > I cannot run ldconfig as a non-root user:
 > >
 > > ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
 > >
 > > Thus, this also may not be a general solution.
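 Although rebuilding the cache needs root, reading it does not: ldconfig -p prints the libraries already listed in /etc/ld.so.cache. The sketch below, with the hypothetical helper name cache_lookup, filters that listing for one library so a non-root user can see whether libimf.so is already known to the loader on a given node. It assumes the usual "name (flags) => path" line layout of ldconfig -p.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print the
# path(s) the ld.so cache records for the library named in $1.
cache_lookup() {
    # cache lines look like: "<tab>libm.so.6 (libc6) => /lib/tls/libm.so.6"
    awk -v lib="$1" '$1 == lib { print $NF }'
}

# Usage (ldconfig often lives in /sbin, outside a normal user's PATH):
#   /sbin/ldconfig -p | cache_lookup libimf.so
```

 Empty output means the cache has no entry for that name, so the loader would have to find the library through LD_LIBRARY_PATH or an rpath instead.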
 > >
 > > Scott
 > >
 > >> 4) I WOULD be 100% behind making everything static, but there were huge
 > >> problems with statically linked executables on RH 3 a while back,
 > >> associated with the fact that the static threads libraries were
 > >> essentially broken (whereas the dynamically loaded ones are okay).  One
 > >> could dink around with loader options and perhaps manage to dynamically
 > >> load only certain things, but I never got to the bottom of the problem,
 > >> and the OS and compiler were both moving targets.
 > >>
 > >> Bottom line -
 > >> 1) I continue to load everything dynamically because I have not sorted
 > >> out this grief.
 > >> 2) I use LD_LIBRARY_PATH when compiling, but I also list the correct
 > >> library paths in /etc/ld.so.conf and run ldconfig.  Please read the
 > >> ldconfig man page.
 > >> 3) I ensure the Intel libraries are visible by the same path.  I have no
 > >> idea whether 2 + 3 are both necessary; I would not think so, but it works.
 > >>
 > >> 4) Another common environment gotcha, especially for pmemd, but
 > >> potentially for sander, is the need to put "limit stacksize unlimited" in
 > >> your .login and .cshrc files (it should only be needed in the first, but
 > >> it IS needed in the second; I would regard this as a Linux bug).
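 For users whose shell is sh or bash rather than csh, the corresponding builtin is ulimit. The sketch below assumes it is placed in ~/.bashrc, which bash reads for interactive non-login shells and, on many systems, also for commands run through rsh/ssh, so it plays roughly the role .cshrc plays above. It asks for an unlimited soft stack limit and falls back to the hard limit when a non-root user is not allowed to go higher.

```shell
# Sketch for sh/bash users (assumed to live in ~/.bashrc): the rough
# counterpart of csh's "limit stacksize unlimited".
# First try an unlimited soft stack limit; if the hard limit forbids that
# for a non-root user, raise the soft limit as far as the hard limit allows.
ulimit -S -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
```

 If the hard limit itself is too low, only the system administrator can raise it.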
 > >>
 > >> I think I wrote up some notes on the Amber web page about all this;
 > >> David Konerding dug deeper than I did and there was some mail on the
 > >> reflector.  I used to be fascinated by all this stuff when I was a
 > >> systems guy, but got over it and just want to get the job done.
 > >>
 > >> Regards - Bob
 > >
 > > Date: Tue, 15 Feb 2005 11:42:45 -0800
 > > From: Scott Brozell <sbrozell_at_scripps.edu>
 > > Reply-To: amber_at_scripps.edu
 > > To: amber_at_scripps.edu
 > > Subject: Re: AMBER: problems with running sander in parallel on linux
 > >
 > > Hi,
 > >
 > > Verify that LD_LIBRARY_PATH is the problem:
 > > copy libimf.so to your working directory and rerun mpirun.
 > > Use the -v option to mpirun to get more details.
 > >
 > > Try a different libimf.so; copy one from another machine.
 > > Here are libimf.so details:
 > >
 > > 1504 -rwxr-xr-x    1 root     root      1533098 Oct 29  2003 libimf.so
 > >
 > > Intel(R) Fortran Compiler for 32-bit applications, Version 8.0   Build 20031016Z Package ID: l_fc_p_8.0.034
 > >
 > >
 > > What happens with the serial sander?
 > > What ifort version are you using?
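 One way to confirm that LD_LIBRARY_PATH is the problem is to ask the dynamic loader directly with ldd, which lists every shared library a binary needs and marks the unresolved ones as "not found". The filter below is a sketch with a hypothetical helper name; comparing its output with and without LD_LIBRARY_PATH set, locally and on a compute node, shows whether that variable is really what is missing.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the libraries the loader could not resolve ("libimf.so => not found").
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage (the sander path and node name are placeholders):
#   ldd ./sander | missing_libs
#   LD_LIBRARY_PATH=/opt/intel_fc_80/lib ldd ./sander | missing_libs
#   rsh big_cluster_node1 ldd /path/to/sander | missing_libs
```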
 > >
 > > good luck,
 > > Scott Brozell
 > >
 > >
 > >> ----- Original Message -----
 > >> From: "Ross Walker" <ross_at_rosswalker.co.uk>
 > >> To: <amber_at_scripps.edu>
 > >> Sent: Tuesday, February 15, 2005 1:50 PM
 > >> Subject: RE: AMBER: problems with running sander in parallel on linux
 > >>
 > >>
 > >> > Hi Piotr,
 > >> >
 > >> >> I compiled MPI sander on my Linux box using the Intel ifort compiler
 > >> >> (under the most recent mpi/mpd).  When I tried to run it (in this case
 > >> >> on 2 processors), I got the following error message:
 > >> >>
 > >> >> sander: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
 > >> >> sander: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
 > >> >
 > >> > What MPI wrapper are you using?  It is possible that the mpirun command
 > >> > is running rsh to your machine and this is not sourcing your login
 > >> > scripts correctly.  Hence it doesn't pick up LD_LIBRARY_PATH and so
 > >> > can't find the libraries.  See if you can find out how your mpirun is
 > >> > spawning processes, and in particular find out what shell it is
 > >> > running.  The default shell may be different from "your" shell.
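 Whether the remote shell sees the right LD_LIBRARY_PATH can be checked from the command line. In the sketch below (the helper path_has_dir and the node name are hypothetical), the remote command is single-quoted so that $LD_LIBRARY_PATH is expanded by the non-interactive shell on the node, which is roughly the environment mpirun's rsh-spawned processes get, rather than by your local interactive shell.

```shell
# Hypothetical helper: succeed if the colon-separated path list in $1
# contains the directory $2 as one complete entry.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (node name is a placeholder):
#   remote=$(rsh big_cluster_node1 'echo "$LD_LIBRARY_PATH"')
#   path_has_dir "$remote" /opt/intel_fc_80/lib || echo "not set for remote commands"
```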
 > >> >
 > >> > The best solution to this is to compile everything statically.  You can
 > >> > compile a static version of LAM by adding the flag -all-static to the
 > >> > configure script.  Then recompile LAM and also compile Amber statically.
 > >> > You can also compile MPICH statically, but I can't remember the options
 > >> > for this.
 > > -----------------------------------------------------------------------
 > > The AMBER Mail Reflector
 > > To post, send mail to amber_at_scripps.edu
 > > To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
 > >
 >
 >
 >
 