ld: compressed sections that depend on the symtab and strtab contents (linker people, am I crazy?)

Nick Alcock nick.alcock@oracle.com
Fri Jul 5 00:35:00 GMT 2019

[... a problem description, and a possible solution: in particular
 search for "the right approach". But I don't know if it would be
 acceptable. Sorry for the length, but I'm explaining things that
 everyone surely already knows so that you can tell if my understanding
 is totally wrong.]

On 4 Jul 2019, Nick Alcock spake thusly:
> OK, I may have something. I think I can do strtab deduplication
> immediately after the call to _bfd_elf_strtab_finalize(), then write out
> the CTF section itself much later, right before the call to
> elf_final_link_free(). By this point the symtab section is finalized as
> well and I can reshuffle everything appropriately.
> I *think*. At least it's not horribly tangled up with everything else
> like I feared it might be.
> Let's see if I can make this work.

... so there were obviously more problems.

I was thinking of doing things immediately before this comment in

  /* Since ELF permits relocations to be against local symbols, we
     must have the local symbols available when we do the relocations.
     Since we would rather only read the local symbols once, and we
     would rather not keep them in memory, we handle all the
     relocations for a single input file at the same time.

At this point output has, as far as I can tell, not started, so I can
still call bfd_set_section_size for the CTF section. At this point, the
ELF strtab is almost entirely constructed, though not sorted, and the
only other thing that I need to do that depends on state that is not
ready yet is to sort two CTF sections into the same order as the ELF
symbol table: so I could in theory look through the CTF file and
determine what its size would be once we had done all this.
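
A minimal sketch of the reordering step, assuming each entry can be
tagged with the final symtab index of its symbol. The struct, its
fields and the function names are hypothetical, not libctf's real
layout; the point is only that the sort cannot run until those indexes
are final.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: one entry of a CTF section that must follow
   symbol-table order.  Field names are illustrative only.  */
struct toy_ctf_entry
{
  unsigned long symidx;   /* final index of the symbol in the ELF symtab */
  unsigned long type_id;  /* payload carried along with the symbol */
};

/* qsort comparator: ascending by symtab index.  */
static int
toy_cmp_symidx (const void *a, const void *b)
{
  const struct toy_ctf_entry *ea = a;
  const struct toy_ctf_entry *eb = b;

  return (ea->symidx > eb->symidx) - (ea->symidx < eb->symidx);
}

/* Put ENTRIES into the same order as the (already final) symtab.  */
void
toy_sort_by_symtab_order (struct toy_ctf_entry *entries, size_t n)
{
  qsort (entries, n, sizeof *entries, toy_cmp_symidx);
}
```

Sorting does not change the number of entries, so on its own this step
leaves the uncompressed size alone.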

Only... after writing half the code it became obvious that that can't
actually work. The trouble is that the CTF section is a ctf_archive
composed of members that are (if large enough) themselves compressed.
This means the size of the CTF section depends on its precise content!
So I cannot set the CTF section size until after the order of the
symbols in the symtab is known, and the offsets of the strings in the
strtab are known. And this is not worked out until after output has
started, when BFD locks down section sizes and they cannot be changed.
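
To illustrate why the size depends on the content, here is a toy
run-length encoder standing in for whatever compressor the archive
members really use. toy_rle_size is a hypothetical helper, not part of
libctf or BFD. The same bytes in a different order compress to a
different size, which is exactly what a symtab-driven reshuffle does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a real compressor: return the number of bytes a
   simple run-length encoding (one count byte plus one value byte per
   run, runs capped at 255) would need for BUF.  */
size_t
toy_rle_size (const unsigned char *buf, size_t len)
{
  size_t out = 0;
  size_t i = 0;

  while (i < len)
    {
      size_t run = 1;

      /* Extend the run while the next byte matches.  */
      while (i + run < len && buf[i + run] == buf[i] && run < 255)
        run++;

      out += 2;   /* count byte + value byte */
      i += run;
    }
  return out;
}
```

With this encoder the eight bytes "aaaabbbb" need 4 bytes and the eight
bytes "abababab" need 16, although both hold four of each letter.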

So I have a contradiction. I can only set the CTF section size after it
is too late to set any section sizes.

So... is there a fundamental reason for this limitation? I can see why
it's necessary for SEC_ALLOC sections, but the CTF section is non-
loaded. Is there some reason why I *can't* set the CTF section size near
the end of bfd_elf_final_link, perhaps right after the call to
_bfd_elf_write_section_eh_frame_hdr? Why does BFD prohibit this? I
haven't dug far enough into the linker to understand quite how all the
writing machinery works: at this point, we are seeking all over the
output and writing stuff (possibly making the file temporarily sparse in
the process?), so we obviously can't insert sections in front of points
in the file containing sections for which writing has already started --
but is there some reason why we can't do a late append of some
non-loaded sections to the end of the file, even after output has
started? We don't care at all where the CTF section is physically
located in the file: it could perfectly well be at the very end for all
we care, and there is nothing intrinsic in the CTF section that prevents
it from being positioned only at a very late stage.

As far as I can tell, the actual writing of non-loaded sections is done
in _bfd_elf_write_object_contents: this calls
_bfd_elf_assign_file_positions_for_non_load, and it's only invoked at
bfd_close time. This is a *very* long time after bfd_elf_final_link has
terminated -- so presumably it's actually fine to set section sizes
anywhere at all inside bfd_elf_final_link as long as we are targeting
ELF and as long as the section is non-loadable. It's just that
bfd_set_section_size won't let us, because output_has_begun.

So... would the right approach here be to add a new target... thing
(callback? operation? what's the terminology here?) allowing targets to
add their own constraints in place of the check on
abfd->output_has_begun in bfd_set_section_size, so we could allow ELF
targets to set the sizes of non-loadable sections even after
output_has_begun, up until _bfd_elf_assign_file_positions_for_non_load
is called? Or would the right approach be something different? As
you can tell, I'm pretty much a babe in the woods here -- but it feels
like this might work. I just don't want to violate some core bfd
invariant in the process, or otherwise turn out something entirely
horrible that everyone will reject. I don't want to induce projectile
vomiting right after the Fourth of July!
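
A rough sketch of the shape such a hook might take, as a self-contained
toy model rather than real BFD code. Every name here (toy_bfd,
toy_section, section_size_settable, nonload_positions_assigned,
TOY_SEC_ALLOC) is hypothetical and exists only to show the control
flow: the generic rule stays as it is, and a target may substitute a
looser one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Everything below is a hypothetical toy model, not real BFD code.  */

#define TOY_SEC_ALLOC 0x1   /* stands in for SEC_ALLOC */

struct toy_section
{
  unsigned int flags;
  size_t size;
};

struct toy_bfd
{
  bool output_has_begun;
  /* Set once file positions of non-loaded sections are assigned.  */
  bool nonload_positions_assigned;
  /* Hypothetical target hook: may SEC still be resized?  NULL means
     "use the generic rule".  */
  bool (*section_size_settable) (const struct toy_bfd *,
                                 const struct toy_section *);
};

/* Generic rule: no resizing once output has begun.  */
static bool
toy_generic_size_settable (const struct toy_bfd *abfd,
                           const struct toy_section *sec)
{
  (void) sec;
  return !abfd->output_has_begun;
}

/* Hypothetical ELF rule: non-loaded sections stay resizable until
   their file positions have been assigned.  */
bool
toy_elf_size_settable (const struct toy_bfd *abfd,
                       const struct toy_section *sec)
{
  if (!abfd->output_has_begun)
    return true;
  return (sec->flags & TOY_SEC_ALLOC) == 0
         && !abfd->nonload_positions_assigned;
}

/* Set SEC's size if the target's rule allows it; return success.  */
bool
toy_set_section_size (struct toy_bfd *abfd, struct toy_section *sec,
                      size_t size)
{
  bool ok = (abfd->section_size_settable != NULL
             ? abfd->section_size_settable (abfd, sec)
             : toy_generic_size_settable (abfd, sec));

  if (!ok)
    return false;
  sec->size = size;
  return true;
}
```

In this model a target that leaves the hook NULL behaves exactly as
before, so only targets that opt in see any change.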

So... if the above won't work or is too ugly to live, does anyone have
any suggestions for a way that will? I suspect there are lots of people
here who could figure out the right way to do this in the time it took
me to figure out where _bfd_elf_write_object_contents was called from :)

It's possible that something else I should do is to tell the ctf_archive
machinery not to compress anything itself (using the ctf_archive
compression only in the case of standalone archives not embedded in ELF
files), and ask BFD to compress CTF sections via SEC_ELF_COMPRESS,
taking that complexity out of libctf's hands: since libctf always opens CTF
sections using BFD machinery this should be entirely transparent to all
libctf consumers, and the section size is then the uncompressed size, so
does not depend on the content of the section. But even then it would be
*simpler* if we could set the section size later than
bfd_set_section_size currently allows, after the symtab and strtab are
laid out.

I hope there *is* a right way to do this. The fallback position is to
write the CTF section to a real file and then pexecute() objcopy to
rewrite the executable after the link is over, but even I think that is
too ugly to live, in addition to being slow and clunky and generally
gross. There must be a better way, even if what I suggest above isn't.

So... any suggestions? I know this is a weird file format, and it's
unusual to have the inside of sections depend on the contents of the
symbol and string tables. But it saves space, and that's what CTF is all
about. :)

(thanks to Egeyar Bagcioglu and Jose Marchesi for invaluable hints,
without which I wouldn't have got even this far.)
