This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



IFUNC resolver scheduling (bugs 21041, 20019)


I tried to fix most of the issues involving IFUNC resolvers and their access to unrelocated symbols.

My first attempt (on branch fw/bug21041) records all relocations against IFUNC resolvers during elf_machine_rela processing, and adds another pass to apply them.

The general approach is to execute IFUNC resolvers deepest-dependency-first (based on the object in which the IFUNC resolver resides, not the relocation which references it). Most of the time, this ensures that IFUNC resolvers do not encounter unrelocated symbols because IFUNC resolvers from their dependencies have already executed, patching in the missing relocation. Symbol interposition can still bypass that, though.
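A minimal sketch of the second pass follows. It assumes each object has already been given a dependency depth such that every dependency has a strictly greater depth than the objects that load it. The struct delayed_reloc, its fields, and sort_delayed/apply_delayed are hypothetical names, not glibc code, and resolvers are modelled as returning void *. Using the first-pass sequence number as a tiebreaker keeps relocations from the same object in their original order.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for a relocation whose value comes from an IFUNC
   resolver.  resolver_depth is the (assumed) dependency depth of the
   object containing the resolver; seq is the position at which the
   relocation was encountered during the first pass.  */
struct delayed_reloc
{
  unsigned int resolver_depth;
  size_t seq;
  void **target;            /* Location to patch.  */
  void *(*resolver) (void); /* IFUNC resolver, simplified to return void *.  */
};

/* Deepest dependency first; ties keep first-pass order so that
   relocations within one object are not reversed.  */
static int
compare_delayed (const void *a, const void *b)
{
  const struct delayed_reloc *x = a;
  const struct delayed_reloc *y = b;
  if (x->resolver_depth != y->resolver_depth)
    return x->resolver_depth > y->resolver_depth ? -1 : 1;
  if (x->seq != y->seq)
    return x->seq < y->seq ? -1 : 1;
  return 0;
}

static void
sort_delayed (struct delayed_reloc *relocs, size_t count)
{
  qsort (relocs, count, sizeof (relocs[0]), compare_delayed);
}

/* Second pass: order the recorded relocations, then call each resolver
   and store its result at the relocation target.  */
static void
apply_delayed (struct delayed_reloc *relocs, size_t count)
{
  sort_delayed (relocs, count);
  for (size_t i = 0; i < count; ++i)
    *relocs[i].target = relocs[i].resolver ();
}
```

qsort is not a stable sort, which is why the sequence number has to be part of the comparison rather than relying on the input order.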

The branch is currently slightly buggy because it reverses the order of relocation processing within the same object, and it does not delay copy relocations, so those could copy unrelocated data. (Support for copy relocations whose source address is determined by an IFUNC resolver is missing as well, but this is a separate matter and quite easy to fix.)

Support for DT_TEXTREL is currently missing, too. This needs another preparatory patch like the one for delayed RELRO activation. (For dlopen, the RELRO activation delay does a bit too much work because it mprotects everything again, not just the newly allocated objects.)

One problem is that recording the relocations requires temporary memory allocations during relocation. I have an mmap-based pool allocator for that (currently very specific to the task at hand, but I will have to generalize it a bit). The allocations are required so that we do not have to walk through large parts of the relocation list in the second pass. The number of delayed relocations is significant: around 100 for simple binaries, and more than 1,000 for complex ones. The current delayed relocation record contains six words, so we are talking about several KiB even for small binaries. I'm not convinced it makes sense to add a large static stack allocation (like scratch_buffer does) to avoid calling mmap, because the stack memory remains dirty and might not be reused by some programs. Adding at least one mmap and munmap call to the start-up sequence is likely to be visible in benchmarks, though.
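A sketch of a bump-pointer pool of this kind, assuming a single fixed-size anonymous mapping and a six-word record whose fields are placeholders; reloc_pool, delayed_record, and the pool_* functions are hypothetical. Records are never freed individually: the whole mapping goes away with one munmap call after the second pass.

```c
#define _DEFAULT_SOURCE /* For MAP_ANONYMOUS under strict C modes.  */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical six-word delayed relocation record; the meaning of the
   individual words is left open here.  */
struct delayed_record
{
  uintptr_t words[6];
};

/* Bump-pointer pool backed by one anonymous mapping.  */
struct reloc_pool
{
  unsigned char *base;
  size_t used;
  size_t size;
};

static int
pool_init (struct reloc_pool *pool, size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  pool->base = p;
  pool->used = 0;
  pool->size = size;
  return 0;
}

/* Hand out the next record.  The mapping is page-aligned and the record
   size is a multiple of the word size, so records stay aligned.  */
static struct delayed_record *
pool_alloc_record (struct reloc_pool *pool)
{
  if (pool->size - pool->used < sizeof (struct delayed_record))
    return NULL; /* A real allocator would map another chunk here.  */
  struct delayed_record *rec
    = (struct delayed_record *) (pool->base + pool->used);
  pool->used += sizeof (struct delayed_record);
  return rec;
}

/* Release all records at once after the second relocation pass.  */
static void
pool_release (struct reloc_pool *pool)
{
  munmap (pool->base, pool->size);
  pool->base = NULL;
  pool->used = 0;
  pool->size = 0;
}
```

With 8-byte words, each record is 48 bytes, so 100 records need 4,800 bytes and 1,000 records about 47 KiB, which matches the "several KiB" estimate above.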

The combreloc feature sorts relocations of the same type together. The static linker could also sort relocations involving IFUNCs together, and we could recognize this in the dynamic linker and record spans of relocations instead of individual relocations. This would reduce the number of delayed relocation records we'd have to allocate.
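If IFUNC-related relocations were grouped together, the dynamic linker could store one (first, count) pair per run instead of one record per relocation. The sketch below assumes a precomputed needs_delay flag for each relocation; reloc_span and record_spans are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical span of consecutive relocations needing delayed IFUNC
   processing: only the first index and the length are stored.  */
struct reloc_span
{
  size_t first;
  size_t count;
};

/* Record maximal runs of entries with needs_delay[i] set.  Returns the
   number of spans written, which is at most max_spans.  */
static size_t
record_spans (const bool *needs_delay, size_t nrelocs,
              struct reloc_span *spans, size_t max_spans)
{
  size_t nspans = 0;
  size_t i = 0;
  while (i < nrelocs && nspans < max_spans)
    {
      if (!needs_delay[i])
        {
          ++i;
          continue;
        }
      size_t start = i;
      while (i < nrelocs && needs_delay[i])
        ++i;
      spans[nspans].first = start;
      spans[nspans].count = i - start;
      ++nspans;
    }
  return nspans;
}
```

For a table where all IFUNC-related relocations are adjacent, this produces a single span regardless of how many relocations it covers; an unsorted table degrades towards one span per relocation.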

However, the current dynamic linker architecture makes this difficult: We do not exploit the combreloc optimization very well; we dispatch on the next relocation type (with a large switch statement) even though it is extremely likely that the type is the same as the current type. If we change that, recognizing spans would become easier, too.
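A sketch of a dispatch loop that exploits combreloc ordering by switching on the relocation type once per run of identical types rather than once per relocation. The relocation layout, the two type constants, and process_runs are simplified, hypothetical stand-ins for the real elf_machine_rela code.

```c
#include <stddef.h>

/* Hypothetical simplified relocation: a type, a target slot, a value.  */
struct simple_reloc
{
  unsigned int type;
  size_t *slot;
  size_t value;
};

enum { RELOC_RELATIVE = 1, RELOC_GLOB_DAT = 2 };

/* Find the end of each run of identical types, dispatch once, then
   apply the whole run in a tight loop.  Returns the number of
   dispatches performed.  */
static size_t
process_runs (const struct simple_reloc *r, size_t n, size_t load_bias)
{
  size_t dispatches = 0;
  size_t i = 0;
  while (i < n)
    {
      unsigned int type = r[i].type;
      size_t end = i;
      while (end < n && r[end].type == type)
        ++end;
      ++dispatches;
      switch (type)
        {
        case RELOC_RELATIVE:
          for (; i < end; ++i)
            *r[i].slot = load_bias + r[i].value;
          break;
        case RELOC_GLOB_DAT:
          for (; i < end; ++i)
            *r[i].slot = r[i].value;
          break;
        default:
          i = end; /* Unknown types are skipped in this sketch.  */
          break;
        }
    }
  return dispatches;
}
```

Because the loop already computes where each run ends, the same boundaries could double as spans for delayed IFUNC processing.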

Apart from the performance impact and complexity, there is another issue I worry about: What happens if an IFUNC-based relocation is subject to a copy relocation? Assuming we delay IFUNC relocations and copy relocations and execute them in that order, the data produced by the copy relocation is correct because the IFUNC relocation has been performed before the copy is made. But what happens if the copy-relocated symbol is accessed by the IFUNC resolver? I think it will touch the object at its final location, and that has not been relocated yet. I suppose this is similar to the dependency bypass due to function interposition. Maybe it's sufficient to document this; it is certainly easier to avoid than the current restriction on accessing *any* relocated data. This looks like a circular dependency which is impossible to resolve in general (and rather expensive to detect).
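The ordering argument for copy relocations can be modelled in a few lines, assuming a single data object with one field filled in by an IFUNC-based relocation; dispatch_table and the apply_* functions are hypothetical. Applying the IFUNC relocation before the copy gives the main program the resolved value, while the reverse order copies the unrelocated contents. The sketch does not address a resolver that reads the main program's copy of the object, which has not been initialized at that point.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical data object defined in a shared object and subject to a
   copy relocation in the main program.  */
struct dispatch_table
{
  uintptr_t impl; /* Filled in from an IFUNC resolver's result.  */
};

/* Delayed IFUNC relocation: store the resolver's result in the shared
   object's copy of the data.  */
static void
apply_ifunc_reloc (struct dispatch_table *in_dso, uintptr_t resolved)
{
  in_dso->impl = resolved;
}

/* Delayed copy relocation: copy the shared object's contents to the
   main program's copy of the object.  */
static void
apply_copy_reloc (struct dispatch_table *in_exe,
                  const struct dispatch_table *in_dso)
{
  memcpy (in_exe, in_dso, sizeof (*in_exe));
}
```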

On the positive side, nothing in this affects lazy binding and _dl_fixup.

In retrospect, adding IFUNCs as a user-visible feature looks like a mistake, but now we have to live with them.

If no one has a better idea, I'll generalize my mmap-based bump-pointer/pool allocator, use that to record copy relocations as well, and maybe try to squeeze out a few bytes here and there.

Comments?

Thanks,
Florian

