This is the mail archive of the libc-alpha
mailing list for the glibc project.
Re: Thoughts on dlopen state rollback (bug 20839)
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>, libc-alpha at sourceware dot org
- Date: Thu, 12 Sep 2019 10:26:55 -0400
- Subject: Re: Thoughts on dlopen state rollback (bug 20839)
On 9/11/19 8:37 AM, Florian Weimer wrote:
> We've encountered yet another instance of bug 20839, and we probably
> need to fix it for real this time.
> The core issue is that a NODELETE object which is an indirect dependency is
> not removed during dlopen failure. Instead, its link map stays around,
> but the object is in a potentially bad state (e.g., unrelocated).
> I posted two patches which add additional checks for unrelocated
> objects:
>   elf: Assert that objects are relocated before their constructors run
>   elf: Most symbol lookups cannot deal with unrelocated link maps
> I think we have two ways forward here.
> (1) Re-relocate the object if we encounter it as a dependency in a
> subsequent dlopen. I don't think this is viable because of the amount
> of symbol interposition in libpthread (which is NODELETE), and a second
> dlopen could actually bind to these symbols without a DT_NEEDED
> dependency (because the application expects the symbols to come from
> libc). The second problem is that relocation processing is destructive
> on some architectures (I think, please correct me if I'm wrong), so if
> we encounter an undefined symbol during relocation processing of the
> NODELETE object itself, we cannot recover from that failure in general
> because we have already overwritten part of the relocation data.
The current solution with l_relocate is designed to implement (1), and
given your research it doesn't look like it works.
Worse, if you have a long-running process loading DSOs that are then
thrown away, autogenerated, or faulty, you can end up with all sorts of
mapped garbage you can't get rid of because you weren't as clean as you
should have been with the boundaries of use.
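To make the "relocation processing is destructive" concern in (1) concrete,
here is a minimal sketch of a REL-style relocation, where the addend lives in
the word being patched rather than in the relocation record. Everything in it
(struct rel_entry, apply_rel, relocate_twice_demo, the addresses) is a
hypothetical model for illustration, not glibc code. It only shows why a
naive second relocation pass after a mid-way failure produces a wrong value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified REL-style relocation record.  The addend is
   not stored here; it is the initial content of *where.  */
struct rel_entry {
    uint32_t *where;     /* word to patch; initially holds the addend */
    bool sym_defined;    /* would symbol lookup succeed? */
    uint32_t sym_value;  /* resolved symbol address, if defined */
};

/* Apply relocations in order.  Returns the number of entries processed
   before the first failure; a return value of count means success.  */
static size_t apply_rel(struct rel_entry *rels, size_t count)
{
    assert(rels != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!rels[i].sym_defined)
            return i;                         /* undefined symbol: stop */
        *rels[i].where += rels[i].sym_value;  /* addend is overwritten */
    }
    return count;
}

/* The first pass fails on entry 1 after entry 0 was already applied.
   A naive retry then applies entry 0 a second time.  */
static uint32_t relocate_twice_demo(void)
{
    uint32_t slot0 = 8, slot1 = 4;            /* implicit addends */
    struct rel_entry rels[] = {
        { &slot0, true, 0x1000 },
        { &slot1, false, 0 },
    };
    apply_rel(rels, 2);            /* slot0 is now 0x1008, slot1 untouched */
    rels[1].sym_defined = true;    /* symbol becomes available later */
    rels[1].sym_value = 0x2000;
    apply_rel(rels, 2);            /* slot0 is patched again */
    return slot0;                  /* 0x2008 instead of the correct 0x1008 */
}
```

In this model the original addend of slot0 is gone after the first pass, so
there is nothing left to recover from unless the object is mapped afresh.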
> (2) Ignore NODELETE during the post-dlopen closing after a dlopen
> failure. We currently do this for the dlopen'ed object itself, but not
> for indirect dependencies. This requires the second patch mentioned
> above, I think, because we do not do dependency tracking for NODELETE
> objects (quite rightly so), and if a reference to the NODELETE object is
> created, we cannot safely unload it anymore. We need to define a point
> at which a NODELETE object gains real NODELETE status. I think this is
> immediately before any ELF constructors are invoked. NODELETE or not
> does not matter because there could be a relocation dependency on a
> NODELETE mapping from something that has run its ELF constructor. If we
> fix bug 24304 (making lazy binding failures in ELF constructors fatal),
> then we know that we can proceed to completion in dlopen at this point.
I think this is the right solution.
The boundary for NODELETE needs to be made clear in the code.
We have two rough paths where this needs to be demarcated: one is
in elf/rtld.c and the other in elf/dl-open.c.
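As a rough illustration of what a clear boundary could look like, here is a
small stand-alone model of (2). The names (nodelete_state, map_model,
open_group) are made up for this sketch and do not correspond to fields in
struct link_map, and the model assumes every map in the group is newly
loaded by this dlopen. The point is only that NODELETE stays "pending", and
is ignored by rollback, until the commit point immediately before ELF
constructors run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not glibc's data structures.  */
enum nodelete_state { NODELETE_NONE, NODELETE_PENDING, NODELETE_ACTIVE };

struct map_model {
    enum nodelete_state nodelete;
    bool relocated;
    bool mapped;
};

/* Simulate dlopen of a group of newly loaded maps.  fail_at is the index
   of the map whose relocation fails, or count if nothing fails.
   Returns true on success.  */
static bool open_group(struct map_model *maps, size_t count, size_t fail_at)
{
    assert(fail_at <= count);

    for (size_t i = 0; i < count; i++)
        maps[i].mapped = true;

    for (size_t i = 0; i < count; i++) {
        if (i == fail_at) {
            /* Rollback: a pending NODELETE request is ignored, so every
               new map is unmapped and no unrelocated map stays behind.  */
            for (size_t j = 0; j < count; j++) {
                maps[j].mapped = false;
                maps[j].relocated = false;
            }
            return false;
        }
        maps[i].relocated = true;
    }

    /* Commit point: immediately before constructors would run, pending
       NODELETE requests gain real NODELETE status.  */
    for (size_t i = 0; i < count; i++)
        if (maps[i].nodelete == NODELETE_PENDING)
            maps[i].nodelete = NODELETE_ACTIVE;

    /* ELF constructors would be invoked here.  */
    return true;
}
```

With this split, a failed open leaves the NODELETE dependency unmapped and
still pending, while a successful open makes it active before any
constructor can create a reference to it.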