[PATCH] Support SHF_GNU_RETAIN ELF section flag

Fangrui Song i@maskray.me
Thu Sep 24 19:06:48 GMT 2020


On 2020-09-24, Jozef Lawrynowicz wrote:
>On Wed, Sep 23, 2020 at 04:29:43PM -0700, Fangrui Song wrote:
>> Hi Jozef,
>
>Hi Fangrui,
>
>> I saw your proposal https://sourceware.org/pipermail/gnu-gabi/2020q3/000429.html
>> I did not subscribe to gnu-gabi before yesterday so it is inconvenient for me to
>> reply there. Since SHF_GNU_RETAIN is a new feature, and we already have facility
>> for making arbitrary sections alive with R_*_NONE, can you highlight the selling
>> point of a new flag?
>>
>> Copying my previous reply here:
>> > We already have a way to create an artificial reference:
>> >
>> >   .reloc ., R_X86_64_NONE, target_symbol
>> >
>> > If we allow a relocation number for the second operand
>> >
>> >   .reloc ., 0, target_symbol
>> >
>> > this will be generic. You can insert the directives in a GC root (e.g.
>> > _start or a symbol referenced by -u or maybe an .init_array)
>>
>> If you do not want to touch the section containing the -e (--entry) symbol, you
>> can use:
>>
>>   .section .init_array.1,"a",@init_array
>>   .reloc ., R_X86_64_NONE, retained_section
>>
>> (I find that gold has an internal error with such a relocation.)
>> But GNU ld should have supported this for a very long time.
>>
>> (I added these directives to llvm last year: https://reviews.llvm.org/D62014 )
>>
>
>The fact that this relies on the compiler knowing a specific section
>will be present in a linker script, when we are dealing with such a
>broad ecosystem of targets and operating systems, makes me uneasy. The
>functionality simply breaks if the user has a custom linker script which
>does not have .init_array.
>
>Many embedded applications can be written without requiring this
>section. If someone has written their linker script from scratch, only
>including the section directives for the sections they actually need,
>why must we enforce that they have a .init_array input section rule just
>so they can make use of the "retain" attribute. It doesn't make sense -
>.init_array and "retain" are not related.

I used .init_array (which happens to be a GC root) only as an example; I
am not advocating .init_array itself. My main point is about .reloc: you
can place a .reloc directive in any known GC root. It is not too bad if
you think of it as a benign zero-sized section.

>Even if this approach would work and pick the right section, I think
>it is nicer for the user for the "retain" attribute to have a
>dedicated ELF construct which describes the requirement to retain the
>section, instead of using an existing construct whose purpose is not
>related.

Relocations are the keystone of --gc-sections. In some cases we want a
dependency relation but do not want the relocation to alter the content.
We use R_*_NONE in such cases.

A relocation gives more control than a section flag: it can express
"if this section is retained, please retain these other sections",
rather than only "please always retain these sections".
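
A minimal sketch of that difference, as a toy model of the mark phase
rather than any linker's actual code. The section names, the
mark_live helper, and the retain_flagged parameter are all made up for
illustration; an R_*_NONE relocation is modelled as an ordinary edge.

```python
from collections import deque

def mark_live(roots, edges, retain_flagged=frozenset()):
    """Return the sections kept by a toy --gc-sections mark phase.

    roots:          sections that are always live (e.g. the one holding
                    the entry symbol)
    edges:          maps a section to the sections its relocations
                    reference; R_*_NONE counts like any other relocation
    retain_flagged: sections carrying a hypothetical always-retain flag
    """
    live = set()
    worklist = deque(set(roots) | set(retain_flagged))
    while worklist:
        sec = worklist.popleft()
        if sec in live:
            continue
        live.add(sec)
        worklist.extend(edges.get(sec, ()))
    return live

# .text.plugin holds an R_*_NONE reference to .data.table.
edges = {".text.start": [".text.main"], ".text.plugin": [".data.table"]}

# Relocation: .data.table is kept only if .text.plugin is reachable.
plugin_unused = mark_live([".text.start"], edges)
plugin_used = mark_live(
    [".text.start"], {**edges, ".text.main": [".text.plugin"]}
)

# Flag: .data.table is kept regardless of what references it.
flagged = mark_live([".text.start"], edges, retain_flagged={".data.table"})
```

In this model the relocation keeps .data.table alive only when
.text.plugin itself survives, while the flag keeps it unconditionally.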

>Your average user is going to be very confused why there are relocs in
>section X which point to various symbols in their code. If they have
>written the entire application, they might be able to infer that it is
>the "retain" attribute which generated these relocs, but if someone else
>wrote the code or the code is from a library or SDK it will not be
>clear.
>
>Ok we could maybe name a reloc like BFD_RELOC_RETAIN, but then what
>would the description be?
>  This relocation type does not actually perform any relocation action,
>  but is used to indicate that the symbol it references should not be
>  discarded by linker garbage collection. It must be placed in a section
>  which will definitely be present in the linked output file, and not be
>  subject to garbage collection, otherwise it will not have any effect.
>
>Can you tell me why it is preferable to use the relocation mechanism to
>implement this, instead of a precisely defined new section flag?
>
>Why must we look to workarounds to implement something like this
>anyway?  We can work out the details of a new section flag, and ensure
>it is precisely specified to ensure robustness, and then developers can
>benefit from understanding more about how their program has been put
>together.
>
>Do we want to make life easier for ourselves, or easier for our users?
>
>I get that ABI changes can be a bit disruptive, but this new flag in
>particular really isn't complicated anyway.
>
>> ---
>>
>> For a new section flag, there are a bunch of things needing thoughts
>>
>> * assembler
>>
>> The .retain directive seems to be discouraged... For section flags:
>>
>> .section .foo,"a"
>> .section .foo,"aR"        # is this a new section
>> .pushsection .foo,"aR"    # is this a new section
>
>No they are not new sections. From my original proposal:

If we use a section flag, my expected behavior for the second .section
with different flags is an error:
https://sourceware.org/pipermail/binutils/2020-February/109945.html

>> .section .foo,"a"
>> .section .foo,"aR"        # error

In this case, I agree that a separate directive can be more convenient
because the compiler does not need to know the flag when it is about to
emit the first .section directive (for example, due to a faraway
__attribute__((section(...)))).

But then it would be an innovation; I do not know of any precedent.
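
To make the trade-off concrete, here is a toy model of the two policies.
It is not how GAS is implemented; ToyAssembler, SectionFlagError and the
retain() method are hypothetical names. It assumes the strict rule above
(a second .section with different flags is an error) next to a separate
directive that adds "R" to the current section afterwards.

```python
class SectionFlagError(Exception):
    """Raised when a section is reopened with different flags."""

class ToyAssembler:
    def __init__(self):
        self.flags = {}      # section name -> set of flag letters
        self.current = None  # name of the section being assembled

    def section(self, name, flags=None):
        """Model `.section name,"flags"`; flags=None means none given."""
        if name not in self.flags:
            self.flags[name] = set(flags or "")
        elif flags is not None and set(flags) != self.flags[name]:
            raise SectionFlagError(
                f"{name}: flags {flags!r} differ from first definition"
            )
        self.current = name

    def retain(self):
        """Model a separate retain directive: add "R" after the fact."""
        self.flags[self.current].add("R")

asm = ToyAssembler()
asm.section(".foo", "a")
try:
    asm.section(".foo", "aR")   # strict policy: mismatch is rejected
    mismatch_rejected = False
except SectionFlagError:
    mismatch_rejected = True

asm.retain()                    # directive: no need to know "R" up front
```

With the directive, a one-pass emitter can open .foo before it learns
that something in it must be retained.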

>> Alternatively, the "R" flag is recognized by the "flags" argument to the
>> .section directive and will apply SHF_GNU_RETAIN to that section.
>> It is intended that SHF_GNU_RETAIN does not interfere with any validation when
>> switching to a section. It can be used to augment the section flags in a section
>> which has already been created.
>
>When you have two .section directives for the same section, GAS
>"switches" between them instead of creating new sections, which is what
>I referred to above.
>
>This is why the .retain directive more precisely describes what is
>happening. The compiler is telling the assembler that the section
>containing the declaration of the function or data symbol should have
>the SHF_GNU_RETAIN flag applied.
>
>>
>> Does the compiler need to remember that a section has the flag?
>> (Think how this works with __attribute__((section(...))); many asm streamers are
>> one-pass)
>
>The compiler does not need to worry about sections beyond getting the
>name of the section the declaration is in. The "retain" attribute just
>means that the section containing the declaration of the function or
>data object must be retained, so it emits a directive to describe that.
>Once the assembler has set SHF_GNU_RETAIN on a section, it will not be
>unset.
>
>I expect the most common use case to actually be when either the
>"section" attribute has been used, or the -f{function,data}-sections GCC
>options have been passed. If the user is trying to make the most out of
>garbage collection, they should be using -f{function,data}-sections.
>
>>
>> * linker
>> - What does -r do on two sections of the same name, one with the flag and the other
>> without? (as HJ mentioned)
>
>To reply to H.J. as well for this point:
>I don't think this warrants any special behavior, SHF_GNU_RETAIN doesn't
>need to change the behavior of section merging. The user should put the
>object to retain in its own section if they don't want large parts of
>their program to possibly be unnecessarily retained. The unique section
>name they give their SHF_GNU_RETAIN section will not be merged into a
>general output section name until they perform the final non-relocatable
>link.
>
>A section with SHF_GNU_RETAIN applied is being retained because it
>contains some information that is important to the program. So wherever
>that information ends up needs to be retained.
>
>> - Does the output section have the flag?
>
>SHF_GNU_RETAIN is applied to an input section.
>To ensure the input section is retained, SHF_GNU_RETAIN must be applied
>to any section that input section is merged with. The flag doesn't get
>removed from output sections.
>
>> - Does the flag retain other sections in the same section group?
>
>Yes.
>From the description on section groups from the ELF spec:
>  ... such groups must be included or omitted from the linked
>  object as a unit.
>
>I think potentially the only confusing part of any section flag merging
>behavior is the fact that the assembly code might have different
>.section directives for the same section, some with "R" and some without
>(+1 for a .retain directive ;)).
>Once the assembler has emitted its output, the SHF_GNU_RETAIN flag
>applied to an input section behaves like any other section flag.
>There is only one line of linker code which does anything specific with
>SHF_GNU_RETAIN, and that is the code in bfd/elflink.c to "gc_mark" the
>section.
>
>Thanks,
>Jozef
>
>>
>>
>> On 2020-09-23, H.J. Lu via Binutils wrote:
>> > On Wed, Sep 23, 2020 at 1:04 PM Jozef Lawrynowicz
>> > <jozef.l@mittosystems.com> wrote:
>> > >
>> > > On Wed, Sep 23, 2020 at 12:03:28PM -0700, H.J. Lu via Binutils wrote:
>> > > > On Wed, Sep 23, 2020 at 11:47 AM Jozef Lawrynowicz
>> > > > <jozef.l@mittosystems.com> wrote:
>> > > > >
>> > > > > On Wed, Sep 23, 2020 at 10:13:37AM -0700, H.J. Lu via Binutils wrote:
>> > > > > > On Wed, Sep 23, 2020 at 9:52 AM Jozef Lawrynowicz
>> > > > > > <jozef.l@mittosystems.com> wrote:
>> > > > > > >
>> > > > > > > On Wed, Sep 23, 2020 at 01:51:56PM +0000, Michael Matz wrote:
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > On Wed, 23 Sep 2020, H.J. Lu via Binutils wrote:
>> > > > > > > >
>> > > > > > > > > > I think that:
>> > > > > > > > > >
>> > > > > > > > > > >  .section .text,"ax"
>> > > > > > > > > > >    ...
>> > > > > > > > > > >  foo:
>> > > > > > > > > > >    ...
>> > > > > > > > > > >  .retain
>> > > > > > > > > > >  retained_fn:
>> > > > > > > > > > >    ...
>> > > > > > > > > >
>> > > > > > > > > > is some nice syntactic sugar compared to:
>> > > > > > > > > >
>> > > > > > > > > > >  .section .text,"ax"
>> > > > > > > > > > >    ...
>> > > > > > > > > > >  foo:
>> > > > > > > > > > >    ...
>> > > > > > > > > > >  .section .text,"axR"
>> > > > > > > > > > >  retained_fn:
>> > > > > > > > > > >    ...
>> > > > > > > > > >
>> > > > > > > > > > It's also partly for convenience; we have other directives which are
>> > > > > > > > > > synonyms or short-hand for each other.
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > You don't need to keep the whole section when only one symbol should
>> > > > > > > > > be kept.  Please drop the .retain directive.  GCC, as and ld should do the
>> > > > > > > > > right thing with
>> > > > > > > > >
>> > > > > > > > > .section .text,"ax"
>> > > > > > > > >    ...
>> > > > > > > > > foo:
>> > > > > > > > >   ...
>> > > > > > > > >  .section .text,"axR"
>> > > > > > > > >
>> > > > > > > > >  retained_fn:
>> > > > > > > > >
>> > > > > > > > > where foo can be dropped and retained_fn will be kept.
>> > > > > > > >
>> > > > > > > > This is not what we discussed at the ABI list, the flag is per section, so
>> > > > > > > > either the whole section is retained or not.  What you describe is
>> > > > > > > > something else that would work on a per symbol basis, which would have to
>> > > > > > > > be specified in a different way and might or might not be a good idea.
>> > > > > > > > But let's not conflate these two.
>> > > > > > >
>> > > > > > > Also, the linker cannot currently dissect a section and remove a
>> > > > > > > particular unused symbol anyway. Since garbage collection only operates
>> > > > > > > on the section level, marking the section itself as "retained" seems
>> > > > > > > most appropriate.
>> > > > > >
>> > > > > > It can be done.  If you put your branch on
>> > > > > >
>> > > > > > https://gitlab.com/x86-binutils/binutils-gdb
>> > > > > >
>> > > > > > I can help you implement it.
>> > > > >
>> > > > > It's not something I have time to look into at the moment, for now the
>> > > > > aim is just to prevent garbage collection of sections.
>> > > >
>> > > > Linker and assembler already support it.   You just need to add SHF_GNU_RETAIN
>> > > > to the framework.  Check how SHF_GNU_MBIND works.
>> > >
>> > > Sorry, I don't understand.
>> > >
>> > > Are you saying that LD already supports the garbage collection of
>> > > individual unused symbol definitions from input sections? Whilst
>> > > retaining other symbol definitions which are required by the program?
>> > > I cannot find any reference to this.
>> > >
>> > > How does that relate to SHF_GNU_MBIND? I looked at all the references
>> > > to "mbind" in Binutils and nothing seemed related to garbage collection of
>> > > sections, since SHF_GNU_MBIND is just used to indicate a particular
>> > > section should be placed in a special memory area.
>> >
>> > For
>> >
>> > .section .text,"ax"
>> >   ...
>> > foo:
>> >  ...
>> > .section .text,"axR"
>> > retained_fn:
>> >
>> > you need to create a new .text section with SHF_GNU_RETAIN for
>> > retained_fn.   See get_section in obj-elf.c.  If you want to avoid
>> > merging .text section with SHF_GNU_RETAIN with other .text
>> > sections by ld -r, linker needs to distinguish sections of the
>> > same name with and without SHF_GNU_RETAIN.
>

