ld: how can I get at the final strtab and symtab at link time?
Thu Jul 4 14:11:00 GMT 2019
So I'm working on .ctf section support for GNU ld and have run into a
bit of a problem.
CTF has internal string tables, meant to record things like structure
member names that do not appear in the ELF strtab. It also has a pair of
tables laid out in the same order as the final symtab, giving CTF types
for (most) STT_OBJECTs and functions (for functions, the return type and
the types of all args).
We want to deduplicate the internal strtab against the ELF strtab, which
means we need to get involved after the ELF strtab is laid out.
Similarly, our tables are in 1:1 correspondence with the ELF symbol
table, so to reorder them to match we need to run after the symtab is
laid out and everything has been emitted into it (local symbols, global
symbols, the works: if we miss even one we end up with a corrupted CTF
section). While this reshuffling at least leaves the size of the CTF
section unchanged, deduplicating against the ELF strtab changes the size
of the CTF section, because the CTF-internal string table shrinks.
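
To make the reshuffle concrete, here is a minimal sketch, assuming one
32-bit type ID per symbol and a final_to_old[] mapping that would only
be known once the symtab is final. The names type_id, reorder_types and
final_to_old are made up for illustration and are not libctf or BFD
API. Since this is a pure permutation, the table size does not change:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation: one 32-bit CTF type ID per symbol.  */
typedef uint32_t type_id;

/* Fill OUT so that OUT[i] holds the type of the symbol that ends up at
   final symtab index I.  FINAL_TO_OLD[i] is the index that symbol had
   in the provisional table OLD.  Both tables have N entries, so the
   table size is unchanged by the reshuffle.  */
static void
reorder_types (const type_id *old, const size_t *final_to_old,
               size_t n, type_id *out)
{
  for (size_t i = 0; i < n; i++)
    {
      /* A missing or out-of-range symbol would corrupt the table.  */
      assert (final_to_old[i] < n);
      out[i] = old[final_to_old[i]];
    }
}
```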
However, it looks like the strtab and symtab are laid out terribly late,
in bfd_elf_final_link(), *after* the size of every other section has
been determined and the offsets of everything computed: and by this time
bfd is writing directly to files, so if we change section sizes at that
stage we probably get corrupted output.
We may be saved from major rearchitecting because we can figure out how
much the CTF strtab will shrink in advance of laying out the ELF strtab,
given only knowledge of *which* strings will be in the ELF strtab, not
where they will be located. If we knew that, we could size the section
early, then lay it out later once the strtab layout is known. I briefly
considered using relocs to do the necessary updating of locations in the
CTF that point into the ELF strtab, but the representation of such
pointers in CTF is odd enough that we'd definitely need new relocation
types on every supported machine and honestly even *thinking* about
doing it that way makes me shudder: it feels far too invasive and likely
to annoy everyone. I'd much rather throw a callback in that just scans
the ELF strtab and does the dedup manually (I can exploit the machinery
used to dedup the CTF strtab, so it comes to only a few lines, assuming
I can get at the ELF strtab at all).
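
The size-early, lay-out-late idea might look roughly like the sketch
below. It assumes the CTF strings are already deduplicated among
themselves, uses exact matching with a linear search for brevity, and
ignores any tail-merging the ELF strtab might do. All function names
here are hypothetical, not existing libctf or BFD functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if S is one of the N strings in SET.  */
static int
in_set (const char *s, const char *const *set, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (s, set[i]) == 0)
      return 1;
  return 0;
}

/* Phase 1 (early, while sections can still be resized): the CTF strtab
   only needs room for strings that will NOT be in the ELF strtab.  Only
   the set of ELF strings matters here, not their offsets.  */
static size_t
ctf_strtab_final_size (const char *const *ctf_strs, size_t n_ctf,
                       const char *const *elf_strs, size_t n_elf)
{
  size_t size = 0;
  for (size_t i = 0; i < n_ctf; i++)
    if (!in_set (ctf_strs[i], elf_strs, n_elf))
      size += strlen (ctf_strs[i]) + 1;   /* Include the NUL.  */
  return size;
}

/* Phase 2 (late, once the ELF strtab is laid out): find the offset of S
   in the finished strtab, or (size_t) -1 if it is absent.  STRTAB must
   end with a NUL byte, as ELF string tables do.  */
static size_t
elf_strtab_offset (const char *s, const char *strtab, size_t strtab_len)
{
  assert (strtab_len == 0 || strtab[strtab_len - 1] == '\0');
  for (size_t off = 0; off < strtab_len; off += strlen (strtab + off) + 1)
    if (strcmp (s, strtab + off) == 0)
      return off;
  return (size_t) -1;
}
```

Phase 1 fixes the section size; phase 2 only fills in offsets, so
nothing needs to grow or shrink once files are being written.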
So... does anyone have any suggestions about how I might get in that
late? Is there a standard trick for access to the final ELF symtab
and/or strtab that I have failed to notice? I fear not: if not, I
suppose I'll have to accumulate them both in some new data structure,
like the existing hashes but covering every symbol, local and global,
and recording the offset of every string in the ELF strtab. Populating
these seems likely to be fraught given the number of places that seem to
be adding symbols to the symtab independently. (I'm also looking for a
way to get at the strings that will constitute the final ELF strtab at
an earlier stage, while sections can still be resized.)
Aside: I have long since given up trying to use the linker plugin
interface for this, since it doesn't let us get in anywhere near late
enough and is too easy to *not* use (and then you end up with a
corrupted CTF section), so am doing most of the necessary work via calls
into libctf in ldlang.c:lang_process() -- but it does now look like the
final CTF strtab dedup, symtab reshuffle and section data generation may
have to happen at ldwrite time, probably via a callback to ld invoked
from inside bfd_elf_final_link() or something similar. Perhaps I could
put one in at the same place where optimized stab strings are written
out? Smuggling the callback into BFD might be rather tricky...
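
For illustration only, the shape of callback I have in mind is roughly
the following. Nothing with these names exists in BFD: the struct
late_link_hooks, finish_strtab and count_strings are all hypothetical,
and finish_strtab merely stands in for the point inside a final-link
routine where the strtab is complete but not yet written:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook structure; not part of BFD.  */
struct late_link_hooks
{
  /* Called once the string table contents are final.  */
  void (*strtab_done) (const char *strtab, size_t len, void *closure);
  void *closure;
};

/* Stand-in for the place in a final-link routine where the strtab has
   just been finalized.  The hook runs before anything is written.  */
static void
finish_strtab (const struct late_link_hooks *hooks,
               const char *strtab, size_t len)
{
  assert (len == 0 || strtab[len - 1] == '\0');
  if (hooks != NULL && hooks->strtab_done != NULL)
    hooks->strtab_done (strtab, len, hooks->closure);
  /* A real linker would write the strtab to the output file here.  */
}

/* Example consumer: count the NUL-terminated strings in the strtab,
   including the empty string at offset 0.  */
static void
count_strings (const char *strtab, size_t len, void *closure)
{
  size_t *count = closure;
  for (size_t i = 0; i < len; i++)
    if (strtab[i] == '\0')
      ++*count;
}
```

A CTF consumer would do its dedup and offset fixups in the hook instead
of counting, but the plumbing would be the same.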
Dropping a callback into bfd_elf_final_link() does not mean that we
can't support non-ELF final_link()s later on fairly easily,
since I now have format extensions defined that will free the symbol
table stuff from *mandatory* reshuffling to match the ELF symtab: but
I'd still *rather* reshuffle if possible, because it will save space to
do so. And deduplication against the strtab is essential in any case.
NULL && (void)