This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: dlopen_from()
On 2020-01-23 15:33, Carlos O'Donell wrote:
> On Thu, Jan 23, 2020 at 10:28 AM Nick Barnes via libc-help
> <libc-help@sourceware.org> wrote:
>> The semantics of dlopen() depend on which shared object the calling
>> function is in (RUNPATH, RPATH, ORIGIN, etc). This makes it difficult
>> for shim libraries (using LD_PRELOAD) to wrap calls to dlopen(). There's
>> no documented way to get at the underlying functionality (the actual
>> implementing function, dl_open, which takes a caller address). I find
>> myself digging through the libc source code, trying to fake up internal
>> data structures which will allow me to fool dlopen() into believing that
>> I'm calling it from some other shared library. The only alternative
>> seems to be some sort of ROP attack.
> Can you provide a concrete example of a shim library that doesn't work
> and how this would solve the problem?
The concrete examples I'm familiar with (our Breeze and Mistral products)
are proprietary, so I can't share their sources, but I expect this to
be a problem for anyone trying to wrap dlopen(), and likely to become
more common as application and library developers become more and more
conscious of dependency versioning and reproducibility (and so
increasingly likely to set RPATH or RUNPATH).
Applications and environments which want to nail down their library
dependency versions often ship with binary libraries and use RPATH or
RUNPATH to ensure they are the ones loaded. The same is true of
third-party libraries loaded by those applications. So it's not
surprising when (say) Python 3.7 installed by Anaconda has an RPATH in
its binary, which it uses to load libraries such as PyTorch 1.2.0, which
in turn has a (different) RPATH, which uses $ORIGIN to make sure that it
gets its own binary shipped libraries. Any system which uses LD_PRELOAD
to wrap dlopen(), for instance to identify and catalogue dynamic
dependencies when planning application migration into a container, will
get into trouble here. The PyTorch library calls dlopen(), which the
dynamic linker has resolved (thanks to LD_PRELOAD) to the wrapper in the
LD_PRELOAD library. The wrapper runs and, because it needs to record
information about the call and its result, calls dlopen() itself. But
now the caller is the wrapper library rather than PyTorch, so
glibc/libdl does not consult PyTorch's RPATH and cannot find the
library. The wrapper could try to fake it (by digging through the ELF
headers of the calling library, finding the RPATH and RUNPATH, inferring
the ORIGIN and PLATFORM, and reconstructing the search path used by
dlopen()), but this seems like a lot of work and amounts to a
reimplementation of large parts of glibc/elf/dl-open.c, so it would have
to keep pace with any future changes there.
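To make the failure mode concrete, here is a minimal sketch of the kind of LD_PRELOAD wrapper described above. It is not taken from Breeze or Mistral; the file name shim.c, the build command in the comment, and the log format are all assumptions for illustration. It forwards to the next dlopen() found with dlsym(RTLD_NEXT, ...), which is the usual interposition idiom.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical logging shim. Assumed build and use:
 *   cc -shared -fPIC shim.c -o shim.so -ldl
 *   LD_PRELOAD=./shim.so some-application
 */

typedef void *(*dlopen_fn)(const char *, int);

/* Find the next definition of dlopen() after this object in the search
 * order (normally the one provided by glibc). */
static dlopen_fn next_dlopen(void)
{
    static dlopen_fn fn;
    if (fn == NULL)
        fn = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
    return fn;
}

/* Interposed dlopen(): forward the call, then record what happened. */
void *dlopen(const char *filename, int flags)
{
    dlopen_fn real = next_dlopen();
    if (real == NULL)
        return NULL;

    /* This call is made from the shim, so the C library sees the shim as
     * the caller. The RPATH, RUNPATH and $ORIGIN of the library that
     * called the wrapper are not used for the search. */
    void *handle = real(filename, flags);

    fprintf(stderr, "dlopen(%s, %#x) = %p\n",
            filename != NULL ? filename : "(main program)", flags, handle);
    return handle;
}
```

With a wrapper like this preloaded, a dlopen() of a bare library name that would have succeeded through the calling library's RPATH can return NULL, even though the wrapper passes the arguments through unchanged.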
Inevitably there are ways to get around this, but they are pretty
fragile. I've spent several days implementing three different ones.
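As an illustration of why such workarounds tend to be fragile, here is a sketch of only the first step of the "fake it" route: finding the object that contains a given address and reading its raw DT_RPATH and DT_RUNPATH strings. This is not one of the three implementations mentioned above. The helper name search_paths_of is hypothetical, and the handling of DT_STRTAB rests on an assumption about how glibc stores that entry in memory, flagged in the comments.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given an address inside a loaded object (for
 * example a wrapper's return address), report that object's DT_RPATH and
 * DT_RUNPATH strings, or NULL where an entry is absent.
 * Returns 0 on success, -1 if the object cannot be identified. */
int search_paths_of(void *addr, const char **rpath, const char **runpath)
{
    Dl_info info;
    struct link_map *map = NULL;

    *rpath = NULL;
    *runpath = NULL;

    /* dladdr1() with RTLD_DL_LINKMAP returns the link_map of the object
     * that contains addr. */
    if (dladdr1(addr, &info, (void **)&map, RTLD_DL_LINKMAP) == 0
        || map == NULL || map->l_ld == NULL)
        return -1;

    uintptr_t strtab = 0, rpath_off = 0, runpath_off = 0;
    int has_rpath = 0, has_runpath = 0;

    /* Walk the object's dynamic section. */
    for (const ElfW(Dyn) *d = map->l_ld; d->d_tag != DT_NULL; d++) {
        switch (d->d_tag) {
        case DT_STRTAB:
            strtab = (uintptr_t)d->d_un.d_ptr;
            break;
        case DT_RPATH:
            rpath_off = (uintptr_t)d->d_un.d_val;
            has_rpath = 1;
            break;
        case DT_RUNPATH:
            runpath_off = (uintptr_t)d->d_un.d_val;
            has_runpath = 1;
            break;
        default:
            break;
        }
    }
    if (strtab == 0)
        return -1;

    /* ASSUMPTION: glibc usually leaves DT_STRTAB already relocated to an
     * absolute address; if the value lies below the load base, treat it
     * as an unrelocated offset instead. This is a heuristic. */
    if (strtab < (uintptr_t)map->l_addr)
        strtab += (uintptr_t)map->l_addr;

    /* The strings are raw: tokens such as $ORIGIN and $PLATFORM are not
     * expanded, and the search order is not reconstructed. */
    if (has_rpath)
        *rpath = (const char *)(strtab + rpath_off);
    if (has_runpath)
        *runpath = (const char *)(strtab + runpath_off);
    return 0;
}
```

Even with these strings in hand, a wrapper would still have to expand the tokens, apply the precedence rules between RPATH, RUNPATH and LD_LIBRARY_PATH, and then search the directories itself, which is the part that duplicates the dynamic loader's logic.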
>> The semantics of dlopen() explicitly depend on the calling function, so
>> inevitably the first thing the implementation does is obtain the return
>> address and call another function with the original dlopen() arguments
>> and that return address. My proposal is to expose (and rename) that
>> other function. So the maintenance burden should be fairly low.
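For illustration only, a wrapper built on such an entry point might look like the sketch below. The name dlopen_from comes from the subject of this thread, but the signature shown (filename, flags, caller address) is an assumption, and no such function exists in glibc. It is declared weak here so that the code still links and falls back to the ordinary dlopen() when the symbol is absent.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* HYPOTHETICAL: not a glibc API. Declared weak, so its address is NULL
 * when the C library does not provide it. */
extern void *dlopen_from(const char *filename, int flags, void *caller)
    __attribute__((weak));

typedef void *(*dlopen_fn)(const char *, int);

/* Interposed dlopen() that passes the original caller's address along. */
void *dlopen(const char *filename, int flags)
{
    /* An address inside whichever object called this wrapper. */
    void *caller = __builtin_return_address(0);

    if (dlopen_from != NULL)
        return dlopen_from(filename, flags, caller);

    /* Fallback: the ordinary dlopen(), which will treat this wrapper as
     * the caller and so ignore the original caller's RPATH/RUNPATH. */
    dlopen_fn real = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
    return real != NULL ? real(filename, flags) : NULL;
}
```

The point of the sketch is that the wrapper would need nothing beyond the return address it can already obtain, which is the same information the implementation is described as collecting on entry.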
> That depends largely on convincing maintainers that your new API has a
> use case that users care about and is worth maintaining forever.
Yes, a tough sell. But if such an API existed, we would certainly use
it, and so would anyone else serious about wrapping dlopen(). So that's
a potential community of maintainers right there.
--
*Nick Barnes*
Senior Software Developer
Ellexus is the I/O profiling company.
www.ellexus.com