AMBER Archive (2005)

Subject: Re: AMBER: problems with running sander in parallel on linux

From: Scott Brozell (sbrozell_at_scripps.edu)
Date: Tue Feb 15 2005 - 16:02:56 CST


Hi,

Yes, this is messy stuff and befriending a sys admin might be worthwhile ;-)
Reducing these LD_LIBRARY_PATH problems should make life easier for
Amber users and reduce reflector bandwidth. All the tools discussed
are worthwhile. I give compiler flags in the hopes that some will
want to solve their problems on their own.

David E. Konerding has a good general user suggestion for mpich:
 mpirun -wdir `pwd` -m machines.all -np 8 -g 2 ./pmemd -O -i mdin -p prmtop \
   -c inpcrd.equil -o mdout \
   -MPDENV- LD_LIBRARY_PATH=/home/dsd/intel_compiler/compiler80/lib

The LOADLIB approach can also work generally if one can rsh or ssh
to the nodes; for example,
rsh big_cluster_node1 locate libimf.so
rsh big_cluster_node2 locate libimf.so
...

Then all these paths can be put into -rpath on the config.h LOADLIB
and a sander built that does not require LD_LIBRARY_PATH.
Tedious but effective ...
Note: if locate is unavailable, one can try find / -name libimf.so -print
(that might get a sys admin's attention, so be prepared to sweet talk ;-)
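
The tedious part of the LOADLIB approach can be partly scripted. What follows
is a minimal sketch, not something shipped with Amber: build_loadlib is a
hypothetical helper name, and it assumes the library paths have already been
collected (for instance from the locate commands above), one per line on
standard input. It also assumes the comma form -Wl,-rpath,DIR, which the
compiler driver hands to ld as a single option.

```shell
#!/bin/sh
# build_loadlib (hypothetical helper): read full paths of shared libraries,
# one per line, on stdin and print a config.h LOADLIB line that embeds each
# distinct directory as a runtime search path.
build_loadlib() {
    # Strip the file name, keep the first occurrence of each directory,
    # and emit one -Wl,-rpath,DIR option per directory.
    sed 's|/[^/]*$||' | awk '
        $0 != "" && !seen[$0]++ { opts = opts " -Wl,-rpath," $0 }
        END { print "LOADLIB=" opts }'
}
```

For example, { rsh big_cluster_node1 locate libimf.so; rsh big_cluster_node2
locate libimf.so; } | build_loadlib would print one line to paste into
config.h; the paths it lists are only as good as what locate reports.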

Scott

On Tue, 15 Feb 2005, Robert Duke wrote:

> Scott -
> Okay, a few more responses :-)
> 1) I indeed did not check whether LD_LIBRARY_PATH is actually USED by the
> compiler; I only said it is a mechanism used by ld; it IS however set by the
> compiler initialization scripts to point at /opt/intel... or whatever.
> The important point was that it is possible to get the dynamic loading to
> work without it, since I dynamically load on nodes where it is not set.
> 2) I agree that getting the libraries in a consistent location and using
> ldconfig require having root privileges, which may sometimes be a problem.
> Make friends with your sys admin ;-)
> 3) While I think that the LOADLIB solution below has merit, it is still
> subject to the headache of having to get all the intel libraries visible in
> the same place; a system administration issue unless you have access to an
> nfs sharepoint visible by the same name from all nodes.
> 4) Ultimately, it would be very nice if we could go back to statically
> linked exe's - it is the most trouble-free solution as long as you are not
> sucking in broken parts of the system (threads), and I think anyone that
> grouses about executable size or shared memory usage is worrying about
> things that don't matter for this application.
> Apologies to the user base on all this; we got hammered by redhat.
> Regards -Bob
>
> ----- Original Message -----
> From: "Scott Brozell" <sbrozell_at_scripps.edu>
> To: <amber_at_scripps.edu>
> Sent: Tuesday, February 15, 2005 4:01 PM
> Subject: Re: AMBER: problems with running sander in parallel on linux
>
>
> > Hi,
> >
> > This page is a decent presentation of some LD_LIBRARY_PATH issues
> > http://www.visi.com/~barr/ldpath.html
> >
> > Based on it, we should probably be doing more to set the runtime
> > library search path.
> >
> > Piotr, I added this to the config.h produced by configure ifort:
> > LOADLIB= -Wl,-rpath /opt/intel_fc_80/lib
> >
> > The resulting sander ran without setting LD_LIBRARY_PATH.
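
One way to confirm that an -rpath setting actually landed in the executable is
to inspect its dynamic section. The sketch below is an illustration only:
show_rpath is a hypothetical helper name, and it assumes that readelf from GNU
binutils is installed and that its -d output uses the usual
"Library rpath: [...]" or "Library runpath: [...]" form.

```shell
#!/bin/sh
# show_rpath (hypothetical helper): filter `readelf -d` output on stdin and
# print the search path recorded in the RPATH or RUNPATH dynamic entry.
show_rpath() {
    # Each expression keeps only the text between the square brackets.
    sed -n -e 's/.*(RPATH).*\[\(.*\)].*/\1/p' \
           -e 's/.*(RUNPATH).*\[\(.*\)].*/\1/p'
}
```

Typical use would be readelf -d ./sander | show_rpath; empty output suggests
that no runtime search path was recorded in the binary.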
> >
> > We should investigate getting the right paths as arguments to -rpath
> > in our configure script. However, even merely using -rpath with
> > the canonical path is a big step.
> >
> > A few responses below ...
> >
> > On Tue, 15 Feb 2005, Robert Duke wrote:
> >
> >> Okay, guys, I am a little fuzzy on this stuff, but as I recollect:
> >> 1) LD_LIBRARY_PATH is first critical during linking, and only used in
> >> loading (by ld.so) when the information needed to find a shared object is
> >> not present in the executable.
> >
> > ifort does not use LD_LIBRARY_PATH to find libs in /opt/intel_fc_80/lib;
> > ifort -v shows that it invokes ld with -L/opt/intel_fc_80/lib.
> >
> > Using -Wl,-verbose,-M as LOADLIB gets the details on ld's actions, eg:
> >
> > attempt to open /opt/intel_fc_80/lib/libm.so failed
> > attempt to open /opt/intel_fc_80/lib/libm.a failed
> > attempt to open /usr/lib/libm.so succeeded
> >
> > ifort -V
> > Intel(R) Fortran Compiler for 32-bit applications, Version 8.0 Build
> > 20031016Z Package ID: l_fc_p_8.0.034
> >
> >> 2) I actually don't set LD_LIBRARY_PATH at all on the machines other than
> >> the compile machine; I just make sure that all the machines have a version
> >> of the intel compiler libraries visible in the same place (via the same
> >> path).
> >
> > Consistency counts, but a general user may not have control over that.
> >
> >> 3) It may be necessary to get the path to the ifort libs into
> >> /etc/ld.so.cache by using ldconfig. Please read the man pages for ldconfig
> >> (the dynamic linking config tool), ld.so (the dynamic loader), and ld (the
> >> linker) for clarifications on all this wonderful stuff.
> >
> > I cannot execute ldconfig as non-root:
> >
> > ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
> >
> > Thus, this also may not be a general solution.
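
Rebuilding the cache needs root, but reading it normally does not: ldconfig -p
prints the current cache contents. As a rough sketch (cached_lib_path is a
hypothetical helper name, and the "name (flags) => path" line layout is assumed
from glibc's ldconfig), a non-root user can at least check whether a library is
already known to the loader on a given node:

```shell
#!/bin/sh
# cached_lib_path (hypothetical helper): filter `ldconfig -p` output on stdin
# and print the resolved path of the first cache entry named exactly $1.
cached_lib_path() {
    # The first field is the library name, the last field is its full path.
    awk -v lib="$1" '$1 == lib { print $NF; exit }'
}
```

For example, /sbin/ldconfig -p | cached_lib_path libimf.so prints nothing when
the Intel library is not in the cache, in which case -rpath or LD_LIBRARY_PATH
is still needed on that node.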
> >
> > Scott
> >
> >> 4) I WOULD be 100% behind making everything static, but there were huge
> >> problems with statically linked executables on RH 3 a while back, associated
> >> with the fact that the static threads libraries were essentially broken
> >> (whereas the dynamically loaded ones are okay). One could dink around with
> >> loader options and perhaps manage to dynamically load only certain things,
> >> but I never got to the bottom of the problem, and the OS and compiler were
> >> both moving targets.
> >>
> >> Bottom line -
> >> 1) I continue to load everything dynamically because I have not sorted out
> >> this grief.
> >> 2) I use LD_LIBRARY_PATH in compiling, but also use ldconfig to ensure the
> >> correct library paths are in /etc/ld.so.conf. Please read the ldconfig man
> >> page.
> >> 3) I ensure the intel libraries are visible by the same path. I have no
> >> idea whether 2 + 3 are both necessary; I would not think so, but it works.
> >>
> >> 4) Another common environment gotcha, especially for pmemd, but potentially
> >> for sander, is the need to put "limit stacksize unlimited" in your .login
> >> and .cshrc files (it should only be needed in the first, but it IS needed in
> >> the second; I would regard this as a linux bug).
> >>
> >> I think I wrote up some stuff that is out there on the amber web page about
> >> all this stuff; David Konerding dug deeper than I did and there was some
> >> mail on the reflector; I used to be fascinated by all this stuff when I was
> >> a systems guy, but got over it and just want to get the job done.
> >>
> >> Regards - Bob
> >
> > Date: Tue, 15 Feb 2005 11:42:45 -0800
> > From: Scott Brozell <sbrozell_at_scripps.edu>
> > Reply-To: amber_at_scripps.edu
> > To: amber_at_scripps.edu
> > Subject: Re: AMBER: problems with running sander in parallel on linux
> >
> > Hi,
> >
> > Verify that LD_LIBRARY_PATH is the problem:
> > copy libimf.so to your working directory and rerun mpirun.
> > Use the -v option to mpirun to get more details.
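
A complementary check is to ask the dynamic loader directly which libraries it
cannot resolve. This is only a sketch: missing_libs is a hypothetical helper
name, and it assumes the usual "name => not found" lines that ldd prints for
unresolved libraries.

```shell
#!/bin/sh
# missing_libs (hypothetical helper): filter `ldd` output on stdin and print
# the names of shared libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Running ldd ./sander | missing_libs in the same environment that mpirun gives
its processes should list libimf.so if the library search path is the culprit;
an interactive login shell may give a different answer.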
> >
> > Try a different libimf.so; copy one from another machine.
> > Here are libimf.so details:
> >
> > 1504 -rwxr-xr-x 1 root root 1533098 Oct 29 2003 libimf.so
> >
> > Intel(R) Fortran Compiler for 32-bit applications, Version 8.0 Build
> > 20031016Z Package ID: l_fc_p_8.0.034
> >
> >
> > What happens with the serial sander ?
> > What ifort version are you using ?
> >
> > good luck,
> > Scott Brozell
> >
> >
> >> ----- Original Message -----
> >> From: "Ross Walker" <ross_at_rosswalker.co.uk>
> >> To: <amber_at_scripps.edu>
> >> Sent: Tuesday, February 15, 2005 1:50 PM
> >> Subject: RE: AMBER: problems with running sander in parallel on linux
> >>
> >>
> >> > Hi Piotr,
> >> >
> >> >> I compiled mpi sander on my linux box using ifort intel compiler
> >> >> (under most recent mpi/mpd).
> >> >> When I tried to run it (in this case on 2 processors) I got
> >> >> the following error message:
> >> >>
> >> >> sander: error while loading shared libraries: libimf.so:
> >> >> cannot open shared object file: No such file or directory
> >> >> sander: error while loading shared libraries: libimf.so:
> >> >> cannot open shared object file: No such file or directory
> >> >
> >> > What mpi wrapper are you using? It is possible that the mpirun command is
> >> > running rsh to your machine and this is not sourcing your login scripts
> >> > correctly. Hence it doesn't pick up the LD_LIBRARY_PATH and so can't find
> >> > the libraries. See if you can find out how your mpirun is spawning processes
> >> > and in particular find out what shell it is running. The default shell may
> >> > be different from "your" shell.
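
The login-script theory can be tested by asking a non-interactive remote shell
what it sees. The following is a minimal sketch under stated assumptions:
remote_ld_path and the REMOTE_SHELL variable are hypothetical names, rsh is
assumed to be the launcher (substitute ssh if that is what mpirun uses), and
printenv is assumed to exist on the remote node.

```shell
#!/bin/sh
# remote_ld_path (hypothetical helper): print the LD_LIBRARY_PATH seen by a
# non-interactive shell on host $1, or "(unset)" if it is not defined there.
# REMOTE_SHELL (assumed name) selects the launcher and defaults to rsh.
remote_ld_path() {
    ${REMOTE_SHELL:-rsh} "$1" 'printenv LD_LIBRARY_PATH || echo "(unset)"'
}
```

If remote_ld_path big_cluster_node1 prints "(unset)" while an interactive login
on the same node shows the Intel path, the login scripts are not being sourced
for the processes that mpirun starts.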
> >> >
> >> > The best solution to this is to compile everything static. You can compile a
> >> > static version of lam by adding the flag -all-static to the configure
> >> > script. Then recompile lam and also compile amber statically. You can also
> >> > compile mpich statically but I can't remember the options for this.
> > -----------------------------------------------------------------------
> > The AMBER Mail Reflector
> > To post, send mail to amber_at_scripps.edu
> > To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
> >
>
>
