Bug 15971

Summary: No interface for debugger access to libraries loaded with dlmopen
Product: glibc
Reporter: Gary Benson <gbenson>
Component: dynamic-link
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: enhancement
CC: carlos, codonell, eblake, fche, fweimer, jan, mathieu.lacage, orion, stsp, tromey, woodard
Priority: P2
Flags: fweimer: security-
Version: 2.34
Target Milestone: 2.35
Host:
Target:
Build:
Last reconfirmed:

Description Gary Benson 2013-09-19 15:00:22 UTC
glibc has no interface for debuggers to access libraries loaded using dlmopen (LM_ID_NEWLM).  This issue was originally filed as GDB bug 11839.

The current rtld-debugger interface is described in the file elf/rtld-debugger-interface.txt under the "Standard debugger interface" heading.  This interface only provides access to the first link map (LM_ID_BASE).  This interface is not obviously extendable for the reasons described here: http://gbenson.net/?p=407.
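
To make the limitation concrete, here is a minimal in-process sketch of what the standard interface exposes. It assumes a dynamically linked glibc program; the helper name count_base_namespace_objects is made up for illustration. A real debugger would read the same structures out of the inferior's memory (typically locating them through DT_DEBUG) rather than referencing _r_debug directly.

```c
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Walk the link-map chain published through _r_debug and count its
   entries.  Only objects in the base namespace (LM_ID_BASE) appear on
   this chain; anything loaded with dlmopen (LM_ID_NEWLM) is not
   reachable from here.  The function name is hypothetical.  */
static size_t count_base_namespace_objects(void)
{
    size_t count = 0;

    for (struct link_map *map = _r_debug.r_map; map != NULL; map = map->l_next) {
        /* The main program is conventionally listed with an empty name.  */
        const char *name = (map->l_name != NULL && map->l_name[0] != '\0')
                               ? map->l_name
                               : "(main program)";
        printf("%#lx %s\n", (unsigned long) map->l_addr, name);
        count++;
    }
    return count;
}
```

Calling count_base_namespace_objects() from a program that has also used dlmopen should list only the base-namespace objects, which is the gap this bug describes.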

The probes-based rtld-debugger interface allows debuggers to see libraries loaded using dlmopen as they appear.  This is enough to debug applications started in the debugger, but not enough to attach to a running process or to debug using a core file.

There was some discussion of this subject on the libc-alpha mailing list between November 2012 and January 2013.  The archives are not easily navigable, but the various messages can be found here:

  https://sourceware.org/ml/libc-alpha/2012-11/subjects.html#00656
  https://sourceware.org/ml/libc-alpha/2012-12/subjects.html#00078
  https://sourceware.org/ml/libc-alpha/2013-01/subjects.html#00358

I don't know in detail what the interface should look like, but some points:

1) A Solaris-style librtld_db.so would be undesirable as it would suffer much the same issues as libthread_db.so.

2) The interface must be usable by gdbserver, so it must be fairly lightweight.  An interface that required, e.g., a Python interpreter would exclude users running gdbserver in constrained environments.

3) There are people using GDB to debug applications with 5,000 shared libraries and more, so performance is an issue.

4) The interface should work without debugging symbols to be useful for tools such as ABRT.
Comment 1 Carlos O'Donell 2013-09-19 15:13:21 UTC
I acknowledge that glibc needs to do something about this.
Comment 2 mathieu lacage 2015-02-12 12:38:05 UTC
(In reply to Gary Benson from comment #0)
> glibc has no interface for debuggers to access libraries loaded using
> dlmopen (LM_ID_NEWLM).  This issue was originally filed as GDB bug 11839.

I filed that gdb bug 4 years ago and I just discovered this glibc bug because you added a comment to the gdb bug about this one.

> 3) There are people using GDB to debug applications with 5,000 shared
> libraries and more, so performance is an issue.

Yes, I am one of these people. I also wrote my own loader because I needed more than the 32 namespaces provided by glibc, so it would be really helpful if the solution you choose to implement could be made to work with a dynamic number of namespaces that is potentially larger than 32.

I have not looked at the glibc loader in a long time but if there was progress on this (gdb/glibc interface for namespaces) front, I would probably try to work on a patch for glibc to support dynamically-allocated namespaces.

Regardless of the status of this feature, I would be happy to provide testing help and/or debugging/implementation time once a decision on how to add this feature is made (I read the ML discussions and I would personally favor the dwarf or r_debug solution).
Comment 3 Eric Blake 2020-02-14 01:40:38 UTC
I ran into this today while working on nbdkit; I needed a way to work around VDDK's buggy proprietary library that calls dlopen("libcrypto.so") on a relative path name but where the library comes with its own version of libcrypto.so that is incompatible with the one in /usr/lib, but I also need a solution that would not require LD_LIBRARY_PATH.  My solution was to use dlmopen() to open a shim library defining an alternative dlopen(), so that I could intercept VDDK's poor dlopen calls and replace them with saner absolute loads.  But as a result, I'm now unable to debug any of my glue code, or to look at the assembly in the VDDK code (the fact that the VDDK code is proprietary already means that usefully debugging it was unlikely, but if it is loaded by dlmopen it is impossible).  If you need more people using dlmopen() as a reason to finally get around to this bug, then count me as such a person.
https://www.redhat.com/archives/libguestfs/2020-February/msg00154.html
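
For readers unfamiliar with the call being discussed, the following is a rough sketch of loading a library into a fresh namespace and asking which namespace it landed in. It is not the nbdkit code; the helper name load_in_new_namespace is hypothetical, and older glibc versions may need -ldl at link time.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Load `path` into a newly created link-map namespace.  On success,
   store the handle in *handle_out and return the namespace id reported
   by dlinfo; on failure return -1.  The helper name is hypothetical.  */
static Lmid_t load_in_new_namespace(const char *path, void **handle_out)
{
    void *handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return -1;
    }

    Lmid_t id;
    if (dlinfo(handle, RTLD_DI_LMID, &id) != 0) {
        fprintf(stderr, "dlinfo failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    *handle_out = handle;
    return id;
}
```

An object loaded this way gets its own copies of its dependencies, which is what makes the shim approach possible and also why the base-namespace link map never sees it.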
Comment 4 Florian Weimer 2020-02-14 09:09:50 UTC
(In reply to Eric Blake from comment #3)
> I ran into this today while working on nbdkit; I needed a way to work around
> VDDK's buggy proprietary library that calls dlopen("libcrypto.so") on a
> relative path name but where the library comes with its own version of
> libcrypto.so that is incompatible with the one in /usr/lib, but I also need
> a solution that would not require LD_LIBRARY_PATH.

Why is LD_LIBRARY_PATH not an option?

If the inheritance by subprocesses is a problem, you could try an explicit loader invocation with --library-path, or scrub the process environment after the process has been loaded. (After program start, changing the environment variable does not alter the search path.)

> My solution was to use
> dlmopen() to open a shim library defining an alternative dlopen(), so that I
> could intercept VDDK's poor dlopen calls and replace them with saner
> absolute loads.

It looks like you are reimplementing the la_objsearch hook from LD_AUDIT.

dlmopen seems the wrong solution for this because many things break with the current implementation, not just debugging.

This is more of a topic for libc-help. Posting to random bug reports isn't really the way to request help from your fellow Red Hatters.
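
As an illustration of the la_objsearch hook mentioned above, here is a minimal sketch of an audit library that redirects one relative name to an absolute path. The path in BUNDLED_LIBCRYPTO is a placeholder, not VDDK's real install location, and the sketch assumes the object is built as a shared library and activated through LD_AUDIT as described in rtld-audit(7).

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <string.h>

/* Placeholder path, purely for illustration.  */
#define BUNDLED_LIBCRYPTO "/opt/vendor/lib64/libcrypto.so"

/* Mandatory handshake: tell the dynamic linker which version of the
   audit interface this library implements.  */
unsigned int la_version(unsigned int version)
{
    (void) version;
    return LAV_CURRENT;
}

/* Called when the dynamic linker is about to search for an object.
   Returning a different string redirects the search; returning `name`
   unchanged leaves the default behaviour in place.  */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
    (void) cookie;

    /* LA_SER_ORIG marks the name as originally requested, e.g. the
       argument passed to dlopen.  */
    if (flag == LA_SER_ORIG && strcmp(name, "libcrypto.so") == 0)
        return (char *) BUNDLED_LIBCRYPTO;

    return (char *) name;
}
```

Because the rewrite happens inside the dynamic linker's own search, no second namespace is involved and the loaded objects remain visible on the base link map.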
Comment 5 Eric Blake 2020-02-14 13:47:53 UTC
(In reply to Florian Weimer from comment #4)

Replying out of order:

> This is more of a topic for libc-help. Posting to random bug reports isn't
> really the way to request help from your fellow Red Hatters.

This wasn't a random request for help, so much as confirmation that this (long-standing) bug is still out there, and that I encountered it because I came up with a use case where dlmopen() solved a problem for me.  I got my code to accomplish my goal before finding this bug report, and was merely confirming that the bug (of gdb not being able to debug dlmopen()d code) is still present, even if I managed to get my nbdkit patch working in spite of the debugging deficiency.  If anything, I'm hoping that my post here serves as documentation for why fixing this bug may have benefits to other users who try dlmopen(), rather than me needing help.

> (In reply to Eric Blake from comment #3)
> > I ran into this today while working on nbdkit; I needed a way to work around
> > VDDK's buggy proprietary library that calls dlopen("libcrypto.so") on a
> > relative path name but where the library comes with its own version of
> > libcrypto.so that is incompatible with the one in /usr/lib, but I also need
> > a solution that would not require LD_LIBRARY_PATH.
> 
> Why is LD_LIBRARY_PATH not an option?

VDDK's library is proprietary, and installing it puts both libvixDiskLib.so and a sub-par libstdc++.so (among others) in the same directory. If the user exports LD_LIBRARY_PATH to point to vddk's library, they break execution of any binary that depends on a newer libstdc++.so.  https://bugzilla.redhat.com/show_bug.cgi?id=1756307#c7

At the same time, libvixDiskLib.so exports an initialization function which in turn calls dlopen("libcrypto.so") and similar to load the versions of libraries that it shipped with; because VDDK appears to be built without proper rpath, this load fails if it finds /usr/lib64/libcrypto.so instead of the version it shipped alongside libvixDiskLib.so.  But because VDDK is proprietary, we can't rewrite their library to fix their bug.  So the only way to use VDDK is to influence the search path so that the relative loads performed by VDDK resolve to VDDK's installation path, but it is desirable to limit this influence to just the process loading VDDK.

Telling users they have to set LD_LIBRARY_PATH before running nbdkit (where nbdkit then loads libvixDiskLib.so via dlopen) is annoying: although nbdkit itself is (so far) not broken by any of the other libraries installed by VDDK (because it is a C program, not a C++ program), any child process that nbdkit spawns has to undo the LD_LIBRARY_PATH damage.  Worse, nbdkit is DESIGNED to spawn a child process, and the main executable that spawns a child process (nbdkit --run 'command ...') is a separate binary from the shared library that dlopen()s libvixDiskLib.so (nbdkit-vddk-plugin.so); coordinating environment variables between the two binaries introduces awkward coupling problems.

Since LD_LIBRARY_PATH is unpalatable, I then explored how I could hook into the dlopen process.  My solution was to dlmopen() a library that hooks dlopen(), although your suggestion of using la_objsearch() is worth exploring - the fact that the dlopen man page did not mention la_objsearch, and that the dlmopen man page did not mention that dlmopen() currently does not support gdb debugging, could be considered documentation bugs under the umbrella of this bug.

> 
> If the inheritance by subprocesses is a problem, you could try an explicit
> loader invocation with --library-path, or scrub the process environment
> after the process has been loaded. (After program start, changing the
> environment variable does not alter the search path.)

Re-execing nbdkit would involve adding coupling between the main binary and the dependent library that dlopen's vddk.  An explicit loader invocation implies a re-exec of nbdkit.

> 
> > My solution was to use
> > dlmopen() to open a shim library defining an alternative dlopen(), so that I
> > could intercept VDDK's poor dlopen calls and replace them with saner
> > absolute loads.
> 
> It looks like you are reimplementing the la_objsearch hook from LD_AUDIT.

The man page for dlopen did not mention la_objsearch.  And, on at least Fedora 31, 'man la_objsearch' fails.  (I did find online references to it, though, now that you've pointed it out).  Is it something that HAS to go through the environment variable LD_AUDIT, or can a standalone shared library be its own auditing interface for just a single process?  If it is really that easy to hook dlopen() to rewrite relative paths into absolute, it seems like it would be easier to locate documentation on the matter.

> 
> dlmopen seems the wrong solution for this because many things break with the
> current implementation, not just debugging.

Then those pitfalls should be documented in the dlmopen man page, as well as a pointer back to this bug (and any others that are the result of difficulties in process management caused by dlmopen).
Comment 6 Carlos O'Donell 2021-07-23 00:35:33 UTC
The most sensible way forward here is probably Cisco's suggestion, which is to create a _r_debug_dlmopen with the appropriate data to access the namespaces:
https://sourceware.org/pipermail/libc-alpha/2020-June/115445.html
Comment 7 Sourceware Commits 2021-09-19 21:49:34 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a93d9e03a31ec14405cb3a09aa95413b67067380

commit a93d9e03a31ec14405cb3a09aa95413b67067380
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 17 19:35:48 2021 -0700

    Extend struct r_debug to support multiple namespaces [BZ #15971]
    
    Glibc does not provide an interface for debugger to access libraries
    loaded in multiple namespaces via dlmopen.
    
    The current rtld-debugger interface is described in the file:
    
    elf/rtld-debugger-interface.txt
    
    under the "Standard debugger interface" heading.  This interface only
    provides access to the first link-map (LM_ID_BASE).
    
    1. Bump r_version to 2 when multiple namespaces are used.  This triggers
    the GDB bug:
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=28236
    
    2. Add struct r_debug_extended to extend struct r_debug into a linked-list,
    where each element correlates to a unique namespace.
    3. Initialize the r_debug_extended structure.  Bump r_version to 2 for
    the new namespace and add the new namespace to the namespace linked list.
    4. Add _dl_debug_update to return the address of struct r_debug of a
    namespace.
    5. Add a hidden symbol, _r_debug_extended, for struct r_debug_extended.
    6. Provide the symbol, _r_debug, with size of struct r_debug, as an alias
    of _r_debug_extended, for programs which reference _r_debug.
    
    This fixes BZ #15971.
    
    Reviewed-by: Florian Weimer <fweimer@redhat.com>
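
The linked-list layout described in the commit message can be sketched as follows. To keep the sketch buildable against headers older than 2.35, it uses a local mirror type, struct ns_debug, whose name and helpers are hypothetical; it only assumes what the commit message states, namely a struct r_debug followed by a link to the next namespace, valid when r_version is at least 2.

```c
#include <link.h>
#include <stddef.h>

/* Hypothetical local mirror of the extended layout: the classic
   struct r_debug followed by a pointer to the next namespace.  */
struct ns_debug {
    struct r_debug base;
    struct ns_debug *r_next;
};

/* Count the namespaces reachable from `head`.  With r_version below 2
   the r_next field must not be read, so only the first entry counts.  */
static size_t count_namespaces(const struct ns_debug *head)
{
    if (head == NULL)
        return 0;
    if (head->base.r_version < 2)
        return 1;

    size_t count = 0;
    for (const struct ns_debug *ns = head; ns != NULL; ns = ns->r_next)
        count++;
    return count;
}

/* Count the objects on one namespace's link-map chain.  */
static size_t count_objects(const struct r_debug *ns)
{
    size_t count = 0;
    for (const struct link_map *map = ns->r_map; map != NULL; map = map->l_next)
        count++;
    return count;
}
```

A debugger would apply the same two loops to memory read from the inferior or from a core file, which is what makes attach and post-mortem debugging of dlmopen namespaces possible.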
Comment 8 H.J. Lu 2021-09-19 22:04:08 UTC
Fixed for 2.35.