Created attachment 15061 [details]
Patch that exposes the issue described in this bug.

The attached patch applies to current(ish) master (d6fe19facc) and exposes the issue described in this bug report. With this patch applied I see:

  $ make test t=dlfcn/tst-rec-dlopen
  ... snip ...
  DSO moddummy1.so loaded when it shouldn't be
  make[2]: Leaving directory '/tmp/glibc/src/dlfcn'
  FAIL: dlfcn/tst-rec-dlopen
  original exit status 1
  Called dummy2()
  Called dummy2()
  make[1]: Leaving directory '/tmp/glibc/src'

This bug was found while investigating some GDB behaviour, and relates to the reloc_complete probe (elf/dl-open.c). The docs for reloc_complete say:

  reloc_complete: The linker has relocated all objects in the specified namespace. The namespace's r_debug structure is consistent and may be inspected, and all objects in the namespace's link-map are guaranteed to have been relocated.

However, there are times when the reloc_complete probe fires before every object in the namespace's link-map has been relocated. Worse, a debugger walking the link-map has no way to tell a relocated object from a non-relocated one. This results in GDB bug:

  https://sourceware.org/bugzilla/show_bug.cgi?id=30765

The glibc test I modified sets up a situation in which a recursive dlopen is performed: the test overrides malloc, and the new malloc performs a dlopen. At the top level (in do_test) we dlopen a library, and while this dlopen is being performed glibc calls malloc. These malloc calls themselves trigger a dlopen call (we take care to avoid infinite recursion here).

The problem is that, while servicing the top-level dlopen, the library is added to the namespace's link-map list, and then malloc is called *before* the library has been relocated. As a result, this second-level malloc call results in another dlopen call, and when we hit the reloc_complete probe for this second-level dlopen call, the first library is already in the link-map list but is not yet relocated, in clear violation of the documented API.

In GDB we hook the reloc_complete probe to figure out when the library has been loaded. It is possible for a user to stop at this point and examine the inferior's memory, which can include examining global state that should have been relocated but (due to this bug) has not been.
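The ordering described above can be illustrated with a small toy model. This is only a sketch and not glibc code: `struct toy_map`, `toy_dlopen`, `toy_count_unrelocated` and `toy_simulate` are hypothetical names invented for illustration, and the nested call stands in for the dlopen made from the test's overridden malloc. It assumes only the sequence reported in this bug: the object is added to the list, a nested load runs, and relocation happens afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a link-map entry.  NOT glibc's struct link_map.  */
struct toy_map {
    const char *name;
    bool relocated;
    struct toy_map *next;
};

struct toy_map *toy_head;          /* toy namespace list */
int toy_depth;                     /* recursion guard, as in the test's malloc */
int toy_max_unrelocated;           /* worst case seen at any "probe" */
struct toy_map toy_nested = { "nested.so", false, NULL };

/* What a debugger stopped at the probe could observe if it could see
   relocation state: the number of listed objects not yet relocated.  */
int toy_count_unrelocated(void)
{
    int n = 0;
    for (struct toy_map *m = toy_head; m != NULL; m = m->next)
        if (!m->relocated)
            n++;
    return n;
}

void toy_dlopen(struct toy_map *m)
{
    assert(m != NULL);
    toy_depth++;

    /* Step 1: the object becomes visible in the namespace list.  */
    m->next = toy_head;
    toy_head = m;

    /* Step 2: an allocation happens before relocation; the overridden
       malloc performs one nested dlopen (guarded against recursion).  */
    if (toy_depth == 1)
        toy_dlopen(&toy_nested);

    /* Step 3: relocate, then fire the toy "reloc_complete" probe.  */
    m->relocated = true;
    int n = toy_count_unrelocated();
    if (n > toy_max_unrelocated)
        toy_max_unrelocated = n;

    toy_depth--;
}

/* Runs the scenario and returns the largest number of unrelocated
   objects visible at any probe hit.  */
int toy_simulate(void)
{
    static struct toy_map outer = { "outer.so", false, NULL };

    toy_head = NULL;
    toy_depth = 0;
    toy_max_unrelocated = 0;
    toy_nested.relocated = false;
    toy_nested.next = NULL;
    outer.relocated = false;
    outer.next = NULL;

    toy_dlopen(&outer);
    return toy_max_unrelocated;
}
```

Calling `toy_simulate()` from a `main` gives a non-zero result in this model: at the nested probe hit, the outer object is already on the list but still unrelocated, which is the state the documentation says cannot be observed.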
Andrew, would it help to add a “relocation complete” flag to struct r_debug_extended? At least we have an extension mechanism there. GDB could check if relocation is really complete for the namespace. If we really need a per-link-map flag for this, it's going to need another extension mechanism.
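To make the per-namespace flag idea concrete, here is a rough sketch of how such a flag might behave across the nested sequence from the report. Everything in it is an assumption for illustration: `struct toy_namespace_state`, its `loads_in_progress` counter and `relocation_complete` field, and the helper functions are hypothetical and are not part of glibc's struct r_debug_extended or of any existing interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace state.  NOT glibc's r_debug_extended.  */
struct toy_namespace_state {
    unsigned int loads_in_progress;  /* hypothetical bookkeeping */
    bool relocation_complete;        /* hypothetical flag for debuggers */
};

/* Linker side: an object has been added to the list but not relocated.  */
void toy_begin_load(struct toy_namespace_state *ns)
{
    ns->loads_in_progress++;
    ns->relocation_complete = false;
}

/* Linker side: one load has finished relocating its objects.  */
void toy_finish_relocation(struct toy_namespace_state *ns)
{
    assert(ns->loads_in_progress > 0);
    ns->loads_in_progress--;
    ns->relocation_complete = (ns->loads_in_progress == 0);
}

/* Debugger side: is it safe to trust relocated state in this namespace?  */
bool toy_debugger_may_inspect(const struct toy_namespace_state *ns)
{
    return ns->relocation_complete;
}

/* Replays the nested dlopen sequence.  Bit 0 of the result is the verdict
   at the nested probe hit, bit 1 the verdict at the outer probe hit.  */
int toy_replay_nested(void)
{
    struct toy_namespace_state ns = { 0, true };
    int result = 0;

    toy_begin_load(&ns);            /* outer dlopen adds its object */
    toy_begin_load(&ns);            /* nested dlopen from malloc */

    toy_finish_relocation(&ns);     /* nested probe fires here */
    if (toy_debugger_may_inspect(&ns))
        result |= 1;

    toy_finish_relocation(&ns);     /* outer probe fires here */
    if (toy_debugger_may_inspect(&ns))
        result |= 2;

    return result;
}
```

In this model the flag is clear at the nested probe hit and set only at the outer one, so a debugger could defer inspection. It also shows the limitation already noted: a single namespace-wide flag says that something is unrelocated, not which link-map entry.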