Bug 30766 - The reloc_complete probe can be hit when not all libraries have been relocated
Status: WAITING
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 30765
 
Reported: 2023-08-15 14:23 UTC by Andrew Burgess
Modified: 2024-03-08 09:41 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Patch that exposes the issue described in this bug. (857 bytes, patch)
2023-08-15 14:23 UTC, Andrew Burgess

Description Andrew Burgess 2023-08-15 14:23:44 UTC
Created attachment 15061
Patch that exposes the issue described in this bug.

The attached patch applies to current(ish) master (d6fe19facc) and exposes the issue described in this bug report.  With this patch applied I see:

 $ make test t=dlfcn/tst-rec-dlopen
 ... snip ...
 DSO moddummy1.so loaded when it shouldn't be
 make[2]: Leaving directory '/tmp/glibc/src/dlfcn'
 FAIL: dlfcn/tst-rec-dlopen
 original exit status 1
 Called dummy2()
 Called dummy2()
 make[1]: Leaving directory '/tmp/glibc/src'

This bug was found while investigating some GDB behaviour, and relates to the reloc_complete probe (elf/dl-open.c).  The docs for reloc_complete say:

 reloc_complete:
    The linker has relocated all objects in the specified namespace.
    The namespace's r_debug structure is consistent and may be
    inspected, and all objects in the namespace's link-map are
    guaranteed to have been relocated.

However, there are times when the reloc_complete probe is hit even though not every object in the namespace's link-map has been relocated.  Worse, a debugger walking the link-map has no way to tell a relocated object from a non-relocated one.

This results in GDB bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30765

In the glibc test I modified, a situation is set up in which a recursive dlopen is performed: the test overrides malloc and has the new malloc perform a dlopen.  At the top level (in do_test) we dlopen a library, and while this dlopen is being performed glibc calls malloc.  These malloc calls themselves trigger a dlopen call (we take care to avoid infinite recursion here).

The problem is that, while servicing the top-level dlopen, the library is added to the namespace's link-map list, and then malloc is called *before* the library has been relocated.  This malloc call triggers a second-level dlopen, and when we hit the reloc_complete probe for that second-level dlopen, the first library is already in the link-map list but is not yet relocated, in clear violation of the documented API.
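
The ordering described above can be illustrated with a small toy model.  This is only a sketch and is not glibc code: every name in it (toy_map, toy_dlopen, toy_malloc_hook, toy_reloc_complete_probe, and the library names) is hypothetical, and the per-object "relocated" flag exists only in the model, since the real link-map exposes no such flag to the debugger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in glibc. */
struct toy_map {
    const char *name;
    bool relocated;   /* model-only flag; not visible in a real link-map */
};

#define TOY_MAX 8
static struct toy_map toy_list[TOY_MAX];  /* stand-in for the link-map list */
static size_t toy_count;
static size_t toy_unrelocated_at_probe;   /* worst case seen at any probe hit */
static bool toy_in_malloc;                /* recursion guard for the hook */

/* Stand-in for the reloc_complete probe: record how many listed
   objects are still unrelocated at the moment the probe is hit. */
static void toy_reloc_complete_probe(void)
{
    size_t n = 0;
    for (size_t i = 0; i < toy_count; i++)
        if (!toy_list[i].relocated)
            n++;
    if (n > toy_unrelocated_at_probe)
        toy_unrelocated_at_probe = n;
}

static void toy_dlopen(const char *name);

/* Stand-in for the overridden malloc: performs one nested dlopen and
   guards against infinite recursion. */
static void toy_malloc_hook(void)
{
    if (toy_in_malloc)
        return;
    toy_in_malloc = true;
    toy_dlopen("inner.so");
    toy_in_malloc = false;
}

static void toy_dlopen(const char *name)
{
    assert(toy_count < TOY_MAX);
    struct toy_map *m = &toy_list[toy_count++];
    m->name = name;
    m->relocated = false;        /* 1: object added to the list */
    toy_malloc_hook();           /* 2: loader allocates; may nest a dlopen */
    m->relocated = true;         /* 3: object relocated */
    toy_reloc_complete_probe();  /* 4: probe hit for this dlopen */
}
```

Calling toy_dlopen("outer.so") in this model hits the probe twice.  On the first hit (for the nested inner.so) outer.so is already in the list but still unrelocated, so toy_unrelocated_at_probe ends up as 1 rather than 0.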

In GDB we hook the reloc_complete probe to figure out when a library has been loaded.  It is possible for a user to stop at this point and examine the inferior's memory, which can include examining global state that should have been relocated but (due to this bug) has not been.
Comment 1 Florian Weimer 2024-03-08 09:41:20 UTC
Andrew, would it help to add a “relocation complete” flag to struct r_debug_extended? At least we have an extension mechanism there. GDB could check if relocation is really complete for the namespace.

If we really need a per-link-map flag for this, it's going to need another extension mechanism.
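
As a rough illustration of the namespace-level idea, here is a sketch of what a debugger-side check might look like if such a flag existed.  The struct below is a simplified stand-in, not the real struct r_debug_extended from <link.h>; the r_reloc_complete field and the version number that gates it are hypothetical and do not exist in glibc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a per-namespace debug structure.
   NOT the real glibc layout. */
struct toy_r_debug_ext {
    int r_version;                   /* interface version reported by the loader */
    struct toy_r_debug_ext *r_next;  /* next namespace, or NULL */
    bool r_reloc_complete;           /* HYPOTHETICAL flag, not in glibc */
};

/* HYPOTHETICAL: first interface version that would carry the flag. */
#define TOY_VERSION_WITH_FLAG 3

/* Return true only when the loader positively reports that every
   object in this namespace has been relocated.  Older versions have
   no flag, so the answer is conservatively false. */
static bool toy_can_trust_relocations(const struct toy_r_debug_ext *ns)
{
    if (ns == NULL || ns->r_version < TOY_VERSION_WITH_FLAG)
        return false;
    return ns->r_reloc_complete;
}
```

With a check of this shape, a debugger stopped at reloc_complete could decide whether to treat relocated global state in that namespace as reliable.  It would still not say which individual objects are unrelocated, which is where a per-link-map flag would come in.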