This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: IFUNC resolver scheduling (bugs 21041, 20019)


On 01/25/2017 08:57 AM, Florian Weimer wrote:
> I tried to fix most of the issues involving IFUNC resolvers and their
> access to unrelocated symbols.

Thank you very much for tackling this somewhat complicated topic.

Like the issue of constructors and destructors, IFUNC is not a trivially
black and white issue, and some trade-offs need to be made.

> My first attempt (on branch fw/bug21041) records all relocations
> against IFUNC resolvers during elf_machine_rela processing, and adds
> another pass to apply them.

I like this design. It's relatively straightforward to understand,
audit and maintain.
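
A minimal sketch of the two-pass idea, under stated assumptions: the
toy_* types and functions below are invented for illustration and are
not the data structures used on the fw/bug21041 branch or in
elf_machine_rela.  The first pass applies ordinary relocations and only
records the IFUNC ones; the second pass runs the resolvers once
everything else has been patched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation model (not the real ELF types).  */
enum toy_type { TOY_RELATIVE, TOY_IRELATIVE };

typedef uintptr_t (*toy_resolver) (void);

struct toy_reloc
{
  enum toy_type type;
  uintptr_t *target;      /* Slot to patch.  */
  uintptr_t addend;       /* Offset from the load base (TOY_RELATIVE).  */
  toy_resolver resolver;  /* Resolver to call (TOY_IRELATIVE).  */
};

/* Pass 1: apply ordinary relocations immediately and record pointers to
   the IFUNC relocations.  Returns the number of delayed relocations.  */
static size_t
toy_relocate (const struct toy_reloc *relocs, size_t count, uintptr_t base,
              const struct toy_reloc **delayed, size_t delayed_max)
{
  size_t n = 0;
  for (size_t i = 0; i < count; ++i)
    if (relocs[i].type == TOY_IRELATIVE)
      {
        assert (n < delayed_max);
        delayed[n++] = &relocs[i];
      }
    else
      *relocs[i].target = base + relocs[i].addend;
  return n;
}

/* Pass 2: run the recorded resolvers.  By now every ordinary relocation
   of this object has been applied, so resolvers see relocated data.  */
static void
toy_apply_delayed (const struct toy_reloc **delayed, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    *delayed[i]->target = delayed[i]->resolver ();
}

/* Demo only: a data slot and a resolver that reads it.  */
static uintptr_t toy_data_slot;

static uintptr_t
toy_demo_resolver (void)
{
  return toy_data_slot + 1;
}
```

Even if the IFUNC relocation is listed before the relocation that fills
toy_data_slot, the resolver only runs in the second pass and therefore
reads the relocated value.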

> The general approach is to execute IFUNC resolvers
> deepest-dependency-first (based on the object in which the IFUNC
> resolver resides, not the relocation which references it).  Most of
> the time, this ensures that IFUNC resolvers do not encounter
> unrelocated symbols because IFUNC resolvers from their dependencies
> have already executed, patching in the missing relocation.  Symbol
> interposition can still bypass that, though.

Is there a problem with symbol interposition?

Running the IFUNC resolvers in a depth-first order is exactly as ELF
specifies for these kinds of dependency-based ordering issues.
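
To illustrate what dependency-first ordering means here, a small
sketch follows.  It assumes a toy object graph (struct toy_object and
toy_order are hypothetical names, not the link map structures the
dynamic linker actually uses) and emits each object only after all of
its dependencies, which is a post-order depth-first traversal.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPS 4

/* Hypothetical stand-in for a loaded object and its dependencies.  */
struct toy_object
{
  const char *name;
  struct toy_object *deps[TOY_MAX_DEPS];  /* NULL-terminated if shorter.  */
  int visited;
};

/* Append OBJ and its not-yet-visited dependencies to OUT, dependencies
   first.  N is the number of entries already in OUT; the new count is
   returned.  Marking before recursing keeps dependency cycles from
   recursing forever.  */
static size_t
toy_order (struct toy_object *obj, struct toy_object **out, size_t n,
           size_t max)
{
  if (obj->visited)
    return n;
  obj->visited = 1;
  for (size_t i = 0; i < TOY_MAX_DEPS && obj->deps[i] != NULL; ++i)
    n = toy_order (obj->deps[i], out, n, max);
  assert (n < max);
  out[n++] = obj;
  return n;
}
```

For an application that depends on libfoo and libc, where libfoo itself
depends on libc, this yields libc, libfoo, application.  Resolvers
residing in libc would then run before any resolver in libfoo that might
call into libc.  Interposed symbols are not modeled in this sketch.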

> The branch is currently slightly buggy because it reverses the order
> of relocation processing within the same object, and it does not
> delay copy relocations, so those could copy unrelocated data.
> (Support for copy relocations whose source address is determined by
> an IFUNC resolver is missing as well, but this is a separate matter
> and quite easy to fix.)

I don't think there is any real bug in reversing the order of the
IRELATIVE relocation handling.  What matters more is that the
relocations listed earlier are fully processed, so that the resolver
function can run correctly (where possible, and where that is not
possible, at least deterministically).

Isn't delaying copy relocations and support for copy relocations whose
source address is determined by an IFUNC resolver the same thing?

> Support for DT_TEXTREL is currently missing, too.  This needs another
> preparatory patch like the one to delay RELRO activation.  (For
> dlopen, the RELRO activation delay does a bit too much work because
> it mprotects everything again, not just the newly allocated
> objects.)

DT_TEXTREL shouldn't be too different, just a matter of filling in the
right pieces in the framework you already have in place?

> One problem is that recording the relocations requires temporary
> memory allocations during relocation.  I have a mmap-based pool
> allocator for that (currently very specific to the task at hand, but
> I will have to generalize it a bit).  The allocations are required so
> that we do not have to walk through large parts of the relocation
> list in the second pass.  The number of delayed relocations is
> significant: around 100 for simple binaries, and more than 1,000 for
> complex ones.  The current delayed relocation record contains six
> words, so we are talking about several KiB even for small binaries.
> I'm not convinced it makes sense to add a large static stack
> allocation (like scratch_buffer does) to avoid calling mmap because
> the stack memory remains dirty and might not be reused by some
> programs.  Adding at least one mmap and munmap call to the start-up
> sequence is likely visible in benchmarks, though.

I think Szabolcs makes a good point here, in that we should consider:

* What is the cost of a second walk?

* What is the cost of recording only that IRELATIVE relocs exist
  (recording the first one) and then starting the walk from there?
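
The second alternative can be sketched as follows.  This is a
simplified model, not dynamic linker code: struct walk_reloc,
first_pass, second_pass and NO_IFUNC are hypothetical names, and the
"applied" flag stands in for actually patching memory.  The only state
kept per object is the index of the first IFUNC relocation.

```c
#include <assert.h>
#include <stddef.h>

#define NO_IFUNC ((size_t) -1)

/* Hypothetical, simplified relocation entry.  */
struct walk_reloc
{
  int is_ifunc;  /* Nonzero if the relocation needs an IFUNC resolver.  */
  int applied;   /* Set once the relocation has been processed.  */
};

/* First pass: apply everything that does not involve an IFUNC and
   remember where the first IFUNC relocation sits.  */
static size_t
first_pass (struct walk_reloc *r, size_t count)
{
  size_t first = NO_IFUNC;
  for (size_t i = 0; i < count; ++i)
    if (r[i].is_ifunc)
      {
        if (first == NO_IFUNC)
          first = i;
      }
    else
      r[i].applied = 1;
  return first;
}

/* Second pass: walk again, but only from FIRST onwards.  Returns the
   number of entries that had to be looked at a second time.  */
static size_t
second_pass (struct walk_reloc *r, size_t count, size_t first)
{
  if (first == NO_IFUNC)
    return 0;
  assert (first < count);
  size_t visited = 0;
  for (size_t i = first; i < count; ++i)
    {
      ++visited;
      if (r[i].is_ifunc)
        r[i].applied = 1;
    }
  return visited;
}
```

The storage cost is one index per object instead of one record per
delayed relocation.  The price is re-reading the tail of the relocation
list, which is cheap when the IFUNC relocations are clustered near the
end and expensive when the first one appears early.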

> The combreloc feature sorts relocations of the same type together.
> The static linker could sort relocations involving IFUNCs together,
> and we could recognize this in the dynamic linker and record spans of
> relocations instead of individual relocations.  This would reduce the
> number of delayed relocation records we'd have to allocate.

This is exactly what POWER does for OPD-related relocs (Alan Modra
explained this to me once).

I think this would be a reasonable optimization, but it's just that,
an optimization.  You can always record the first relocation and start
the walk there for the given object, can't you?
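
As a rough illustration of the span idea, the sketch below assumes the
static linker has already sorted IFUNC-related relocations next to each
other.  The struct reloc_span type and the record_spans function are
hypothetical names, and the is_ifunc array stands in for inspecting real
relocation entries.  Each run of consecutive IFUNC relocations collapses
into a single (start, length) record.

```c
#include <assert.h>
#include <stddef.h>

/* One record per run of adjacent IFUNC relocations.  */
struct reloc_span
{
  size_t start;   /* Index of the first relocation in the run.  */
  size_t length;  /* Number of consecutive relocations in the run.  */
};

/* Scan COUNT flags and record each maximal run of nonzero entries in
   SPANS.  Returns the number of spans recorded.  */
static size_t
record_spans (const unsigned char *is_ifunc, size_t count,
              struct reloc_span *spans, size_t max_spans)
{
  size_t n = 0;
  size_t i = 0;
  while (i < count)
    {
      if (!is_ifunc[i])
        {
          ++i;
          continue;
        }
      size_t start = i;
      while (i < count && is_ifunc[i])
        ++i;
      assert (n < max_spans);
      spans[n].start = start;
      spans[n].length = i - start;
      ++n;
    }
  return n;
}
```

With perfect sorting there is one span per object, which degenerates
into the "record the first one and walk from there" approach with a
known end point.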

> However, the current dynamic linker architecture makes this
> difficult: We do not exploit the combreloc optimization very well; we
> dispatch on the next relocation type (with a large switch statement)
> even though it is extremely likely that the type is the same as the
> current type.  If we change that, recognizing spans would become
> easier, too.

I think this is an optimization that we can approach gradually.

> Apart from the performance impact and complexity, there is another
> issue I worry about: What happens if an IFUNC-based relocation is
> subject to a copy relocation?  Assuming we delay IFUNC relocations
> and copy relocations and execute them in that order, the data
> produced by the copy relocation is correct because the IFUNC
> relocation has been performed before the copy is made.  But what
> happens if the copy-relocated symbol is accessed by the IFUNC
> resolver?  I think it will touch the object at its final location,
> and that has not been relocated yet.  I suppose this is similar to
> the dependency bypass due to function interposition.  Maybe it's
> sufficient to document this; it is certainly easier to avoid than the
> current restriction on accessing *any* relocated data.  This looks
> like a circular dependency which is impossible to resolve in general
> (and rather expensive to detect).

I respond to this issue in my downthread email:
https://www.sourceware.org/ml/libc-alpha/2017-01/msg00500.html

> On the positive side, nothing in this affects lazy binding and
> _dl_fixup.

OK.

> In retrospect, adding IFUNCs as a user-visible feature looks like a
> mistake, but now we have to live with them.

IFUNCs are no more or less a mistake than letting users use constructors
or destructors.

My only complaint is:

* Quality of tooling to detect errors is lacking.

We have no way to:

* Determine order of library initialization

* Determine order of constructor / destructor

* Determine effects of symbol visibility

... given a binary, shared objects, environment and system configuration.

We need a better ldd.

> If no one has a better idea, I'll generalize my mmap-based
> bump-pointer/pool allocator, use that to record copy relocations as
> well, and maybe try to squeeze out a few bytes here and there.

How hard is it to implement just recording the first reloc and walking
the rest?  This should reduce the storage needed dramatically.
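
For comparison, here is a minimal sketch of what an mmap-based
bump-pointer pool could look like.  It is not the allocator from the
branch; struct bump_pool and the bump_pool_* functions are hypothetical
names, the pool has a fixed size and never grows, and it assumes a
system that provides MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* A fixed-size region handed out front to back; nothing is freed
   individually.  */
struct bump_pool
{
  char *base;
  size_t size;
  size_t used;
};

/* Map SIZE bytes of anonymous memory.  Returns 0 on success, -1 on
   failure.  */
static int
bump_pool_init (struct bump_pool *pool, size_t size)
{
  assert (size > 0);
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  pool->base = p;
  pool->size = size;
  pool->used = 0;
  return 0;
}

/* Hand out BYTES bytes, rounded up to pointer alignment, or NULL if the
   pool is exhausted.  */
static void *
bump_pool_alloc (struct bump_pool *pool, size_t bytes)
{
  size_t align = sizeof (void *);
  size_t avail = pool->size - pool->used;
  if (bytes > avail)
    return NULL;
  size_t rounded = (bytes + align - 1) & ~(align - 1);
  if (rounded > avail)
    return NULL;
  void *result = pool->base + pool->used;
  pool->used += rounded;
  return result;
}

/* Release the whole pool with a single munmap.  */
static void
bump_pool_free (struct bump_pool *pool)
{
  munmap (pool->base, pool->size);
  pool->base = NULL;
  pool->size = 0;
  pool->used = 0;
}
```

This is where the one mmap and one munmap call per start-up mentioned
earlier come from; recording only the first reloc would avoid both.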
 
> Comments?

Great work.

I think we are almost at the point where we have a reasonable chance of
describing, in plain language, the restrictions on IFUNC resolver
functions.

-- 
Cheers,
Carlos.

