This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Thoughts on dlopen state rollback (bug 20839)
- From: Florian Weimer <fweimer at redhat dot com>
- To: libc-alpha at sourceware dot org
- Date: Wed, 11 Sep 2019 14:37:39 +0200
- Subject: Thoughts on dlopen state rollback (bug 20839)
We've encountered yet another instance of bug 20839, and we probably
need to fix it for real this time.
The core issue is that a NODELETE object which is an indirect dependency is
not removed during dlopen failure. Instead, its link map stays around,
but the object is in a potentially bad state (e.g., unrelocated).
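
The failure mode can be sketched with a toy model. This is only an
illustration, not glibc code: struct toy_map, toy_rollback and toy_dlopen
are hypothetical names, and the relocation order (from the end of the list
towards the front) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a link map; not glibc's struct link_map. */
struct toy_map {
    const char *name;
    bool nodelete;   /* models an object marked NODELETE */
    bool loaded;
    bool relocated;
};

/* Models the cleanup after a failed dlopen.  maps[0] is the requested
   object and is always removed; dependencies marked NODELETE are
   skipped and therefore stay around. */
void toy_rollback(struct toy_map *maps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (i == 0 || !maps[i].nodelete)
            maps[i].loaded = false;
}

/* Models dlopen of maps[0] with dependencies maps[1..n-1].  Objects are
   relocated from the end of the list towards the front (an assumption
   of this toy).  Relocation of maps[fail_at] fails; pass fail_at == n
   for a successful open. */
bool toy_dlopen(struct toy_map *maps, size_t n, size_t fail_at)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        maps[i].loaded = true;
    for (size_t i = n; i-- > 0; ) {
        if (i == fail_at) {
            toy_rollback(maps, n);
            return false;
        }
        maps[i].relocated = true;
    }
    return true;
}
```

If the last object in the list fails to relocate, a NODELETE dependency
earlier in the list is left loaded but unrelocated, which is the bad
state described above.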
I posted two patches which add additional checks for unrelocated
objects:
elf: Assert that objects are relocated before their constructors run
elf: Most symbol lookups cannot deal with unrelocated link maps
I think we have two ways forward here.
(1) Re-relocate the object if we encounter it as a dependency in a
subsequent dlopen. I don't think this is viable because of the amount
of symbol interposition in libpthread (which is NODELETE), and a second
dlopen could actually bind to these symbols without a DT_NEEDED
dependency (because the application expects the symbols to come from
libc). The second problem is that relocation processing is destructive
on some architectures (I think, please correct me if I'm wrong), so if
we encounter an undefined symbol during relocation processing of the
NODELETE object itself, we cannot recover from that failure in general
because we have already overwritten part of the relocation data.
(2) Ignore NODELETE during the post-dlopen closing after a dlopen
failure. We currently do this for the dlopen'ed object itself, but not
for indirect dependencies. This requires the second patch mentioned
above, I think, because we do not do dependency tracking for NODELETE
objects (quite rightly so), and if a reference to the NODELETE object is
created, we cannot safely unload it anymore. We need to define a point
at which a NODELETE object gains real NODELETE status. I think this is
immediately before any ELF constructors are invoked. Whether the
constructor belongs to a NODELETE object does not matter, because
something that has already run its ELF constructor could have a
relocation dependency on the NODELETE mapping. If we
fix bug 24304 (making lazy binding failures in ELF constructors fatal),
then we know that we can proceed to completion in dlopen at this point.
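
The concern in option (1) about destructive relocation processing can be
illustrated with a toy model of REL-style relocations, where the addend is
stored in the word being relocated. This is a sketch under assumptions, not
glibc code: struct toy_rel and toy_relocate are hypothetical, offsets are
word indices rather than byte offsets, and whether a given architecture
behaves this way is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical REL-style relocation: the addend is not part of the
   relocation record but lives in the target word itself. */
struct toy_rel {
    size_t offset;          /* word index into the image (toy simplification) */
    bool symbol_defined;    /* false models an undefined symbol */
    uint32_t symbol_value;
};

/* Applies the relocations in order.  On an undefined symbol it stops and
   returns false, but the earlier target words have already been
   overwritten, so their original addends are lost. */
bool toy_relocate(uint32_t *image, const struct toy_rel *rels, size_t n)
{
    assert(image != NULL && rels != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!rels[i].symbol_defined)
            return false;
        image[rels[i].offset] += rels[i].symbol_value;
    }
    return true;
}
```

In this model, a second pass after a failed first pass adds the symbol
value again to words that were already processed, so a simple retry does
not produce the intended result.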
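
Option (2) can be sketched as a two-stage NODELETE status with a commit
point just before the ELF constructors run. This is a toy model, not a
proposed patch: enum toy_nodelete, struct toy_obj and toy_dlopen_commit
are hypothetical names, and all objects in the array are assumed to be
newly loaded by this dlopen call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-stage NODELETE status. */
enum toy_nodelete { TOY_NONE, TOY_PENDING, TOY_ACTIVE };

struct toy_obj {
    const char *name;
    enum toy_nodelete nodelete;
    bool loaded, relocated, ctor_ran;
};

/* Models dlopen of objs[0] with dependencies objs[1..n-1].  Relocation
   of objs[fail_at] fails; pass fail_at == n for a successful open. */
bool toy_dlopen_commit(struct toy_obj *objs, size_t n, size_t fail_at)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        objs[i].loaded = true;
    for (size_t i = n; i-- > 0; ) {
        if (i == fail_at) {
            /* Before the commit point, pending NODELETE is ignored:
               everything that is not yet truly NODELETE is unloaded. */
            for (size_t j = 0; j < n; j++)
                if (objs[j].nodelete != TOY_ACTIVE) {
                    objs[j].loaded = false;
                    objs[j].relocated = false;
                }
            return false;
        }
        objs[i].relocated = true;
    }
    /* Commit point: all relocations succeeded, so pending NODELETE
       objects gain real NODELETE status before any constructor runs. */
    for (size_t i = 0; i < n; i++)
        if (objs[i].nodelete == TOY_PENDING)
            objs[i].nodelete = TOY_ACTIVE;
    for (size_t i = n; i-- > 0; )
        objs[i].ctor_ran = true;
    return true;
}
```

A failed open leaves no unrelocated object behind in this model, and a
later successful open promotes the object to real NODELETE status only
after relocation has completed.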
Comments?
Thanks,
Florian