Bug 26404 - ld: INSERT [AFTER|BEFORE] variant for extension purposes
Summary: ld: INSERT [AFTER|BEFORE] variant for extension purposes
Status: UNCONFIRMED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-19 00:13 UTC by Fangrui Song
Modified: 2021-09-03 08:42 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Fangrui Song 2020-08-19 00:13:23 UTC
`SECTIONS { ... } INSERT [AFTER|BEFORE]` complements the internal linker script.

This is nice: because in many cases the internal one works quite well and the
user does not want to reset the internal linker script.  Users tend to leverage
this property to provide extensions to the internal linker script.

However, the section anchor after [AFTER|BEFORE] makes extensions somewhat
cumbersome: there may not be a suitable anchor or the user just does
not want to set an anchor. The default orphan placement works well.

On one occasion, I've seen a user who wants to write:

  .note.gnu.build-id : {
    PROVIDE_HIDDEN (__build_id_start = . + 16);
    *(.note.gnu.build-id)
    PROVIDE_HIDDEN (__build_id_end = .);
  }
  // __build_id_start / __build_id_end delimiters the build ID payload (16 is
  // the size of the note header)


On another occasion, a user wants to assign some sections KEEP semantics
(https://sourceware.org/binutils/docs/ld/Input-Section-Keep.html#Input-Section-Keep)
but does not want to provide an external linker script.


I have sympathy with both feature requests but cannot convince myself that the
use cases are beyond the ad-hoc category. A natural generalization is that we
can have a syntax similar to INSERT [AFTER|BEFORE] but does not require an
anchor.

About the syntax,

* Suggestion (a)

  EXTEND SECTIONS { // another keyword can be used, e.g. OVERRIDE
    .foo : { KEEP(*(.foo)) }
    .bar : { KEEP(*(.bar)) }
    sym = .;   // symbol assignments are disallowed
  }
  // What if the internal linker script defines .foo as well?

* Suggestion (b)

  SECTIONS {
    .foo : { KEEP(*(.foo)) }
    .bar : { KEEP(*(.bar)) }
    sym = .;   // symbol assignments are disallowed
  } EXTEND;

* Suggestion (c)

  SECTIONS {
    .foo : { KEEP(*(.foo)) }
    .bar : { KEEP(*(.bar)) }
  } REPLACE .data;
  // If the internal linker script defines .data, replace it with the content in
  // the braces. Otherwise, treat .foo and .bar as orphans?

---

Some properties and edge cases need thoughts, e.g.

* It had better not interfere with normal orphan placement.
* When an output section is described by both the internal linker script and the
  external one, what should happen? For INSERT, the interaction is currently undocumented. It seems that the internal one is applied first, followed by the description from INSERT AFTER|BEFORE, e.g.

    SECTIONS { .data : { data_begin = .; *(.data) }} INSERT BEFORE .bss;
    // place .data.* into .data because the internal description does so.

    SECTIONS { .data : { data_begin = .; *(.data) *(.data1) }} INSERT BEFORE .bss;
    // place .data1 into .data

---

For the two feature requests, I think they don't need to define semantics when
there is a collision. The important thing is to have a syntax (not getting in
the way in the future). The collision cases can be considered "undefined
behaviors" for now.
Comment 1 Fangrui Song 2020-09-17 06:09:59 UTC
Another use case.

GCC ipa-preorder (call graph profiling) https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01142.html

If such an extension exists, users don't have to upgrade binutils to have
*(SORT(.text.sorted.*)) in the output section .text
GCC can provide such a linker script with the SECTIONS fragment.
Comment 2 Nick Clifton 2020-09-17 10:14:51 UTC
(In reply to Fangrui Song from comment #0)
[Sorry for not commenting on this PR earlier...]

Currently linker script fragments specified on the command
line override any conflicts with the built-in script, right ?
So how about:

  Suggestion (d):
   A new command line option to tell the linker that the
   argument is a script fragment that extends the built-in
   script.  eg:

     % cat extend.t
     SECTIONS {
         .data : { KEEP (*.foo) }
     }

     % ld --extension-script extend.t hello.o

  In this scenario any .foo sections from the input files
  will be added to the .data output section, *and* all the
  normal input data sections (.data, .data1, etc) will also 
  be added to the output .data section.
  
This has the advantage that no new linker script syntax is needed
and the command line is more readable.  (Compared to just placing
the script name on the command line without an option, which I
always find to be confusing).

Cheers
  Nick

   


The linker already supports augmenting the built-in script via the 
addition of script fragments on the command line, right ?  
not just use that facility.  Eg:


> * Suggestion (a)
> 
>   EXTEND SECTIONS { // another keyword can be used, e.g. OVERRIDE

> * Suggestion (b)
> 
>   SECTIONS {
>     .foo : { KEEP(*(.foo)) }
>     .bar : { KEEP(*(.bar)) }
>     sym = .;   // symbol assignments are disallowed
>   } EXTEND;
> 
> * Suggestion (c)
> 
>   SECTIONS {
>     .foo : { KEEP(*(.foo)) }
>     .bar : { KEEP(*(.bar)) }
>   } REPLACE .data;
>   // If the internal linker script defines .data, replace it with the
> content in
>   // the braces. Otherwise, treat .foo and .bar as orphans?
> 
> ---
> 
> Some properties and edge cases need thoughts, e.g.
> 
> * It had better not interfere with normal orphan placement.
> * When an output section is described by both the internal linker script and
> the
>   external one, what should happen? For INSERT, the interaction is currently
> undocumented. It seems that the internal one is applied first, followed by
> the description from INSERT AFTER|BEFORE, e.g.
> 
>     SECTIONS { .data : { data_begin = .; *(.data) }} INSERT BEFORE .bss;
>     // place .data.* into .data because the internal description does so.
> 
>     SECTIONS { .data : { data_begin = .; *(.data) *(.data1) }} INSERT BEFORE
> .bss;
>     // place .data1 into .data
> 
> ---
> 
> For the two feature requests, I think they don't need to define semantics
> when
> there is a collision. The important thing is to have a syntax (not getting in
> the way in the future). The collision cases can be considered "undefined
> behaviors" for now.
Comment 3 Peter Smith 2020-09-17 10:57:58 UTC
As I understand it [*] just adding the script on the command line but without the -T or --script will cause the linker to merge without the need for INSERT etc.

I agree that this is not intuitive outside the use case of GROUP for files, so I think adding a command line option is useful.

It would be good to concentrate on the semantics. I don't have a proposal right now, although I tend towards making the scope of the first implementation as narrow as possible. It could be expanded later on. For example just do a simple 1:1 replacement of an existing section name. If there is no-match then it can be added via the normal orphan placement rules.

[*] I had a recent experience when a colleagues binary wasn't running and it was caused by forgetting -T so a full linker script got merged with the internal linker script, which unfortunately clashed enough to break the program but not report errors.
Comment 4 Nick Clifton 2020-09-17 15:53:58 UTC
(In reply to Peter Smith from comment #3)
> As I understand it [*] just adding the script on the command line but
> without the -T or --script will cause the linker to merge without the need
> for INSERT etc.
 
You are right.  (I had thought that the fragment would override the
built-in script where there are conflicts, but it appears that it
augments it instead).

So maybe what we need are *two* new command line options, just to make
things clear:  --augment-script=<name> and --overwrite-script=<name>.

Hmm, maybe this is going too far.  We already have -t/--script, so
how about we just add --script-action=[replace|merge|overwrite|new] ?
So --script-action=replace would mean that --script replaces the built-in
script, as it does at the moment.  --script-action=merge would mean
that --script would behave like script fragments do at the moment, and
augment the built-in script.  --script-action=overwrite would mean
the same, except that where there are multiple definitions of the same
section, the definition in the script replaces the definition in the 
built-in.  --script-action=new means that only new definitions can be 
added and an error is generated if there are any conflicts.
Comment 5 Fangrui Song 2020-09-17 16:25:14 UTC
Nick and Peter, thanks for looking into the feature request :)

(In reply to Nick Clifton from comment #4)
> (In reply to Peter Smith from comment #3)
> > As I understand it [*] just adding the script on the command line but
> > without the -T or --script will cause the linker to merge without the need
> > for INSERT etc.
>  
> You are right.  (I had thought that the fragment would override the
> built-in script where there are conflicts, but it appears that it
> augments it instead).
> 
> So maybe what we need are *two* new command line options, just to make
> things clear:  --augment-script=<name> and --overwrite-script=<name>.

Looks fine with me (a bit undecided on which one is better when expressing the augment/overwrite intent, with command line option or a syntax). (Currently I probably prefer a command line option more than a syntax.)

> Hmm, maybe this is going too far.  We already have -t/--script, so
> how about we just add --script-action=[replace|merge|overwrite|new] ?
> So --script-action=replace would mean that --script replaces the built-in
> script, as it does at the moment.  --script-action=merge would mean
> that --script would behave like script fragments do at the moment, and
> augment the built-in script.  --script-action=overwrite would mean
> the same, except that where there are multiple definitions of the same
> section, the definition in the script replaces the definition in the 
> built-in.  --script-action=new means that only new definitions can be 
> added and an error is generated if there are any conflicts.

--script-action is position dependent and affects all the following -T/--script, right? I am wary of introducing new position dependent behaviors. In a large build system, linker options may be collected from all over the place, and having to worry about whether a lasting effect of an option can make things brittle. I have seen problems due to not closing --as-needed/-Bstatic/-Bdynamic (--start-group/--whole-archive can lead to problems as well).
Comment 6 Fangrui Song 2021-02-27 21:06:01 UTC
I try to incorporate the previous ideas. A concrete proposal:

Add --overwrite-script=<scriptfile>


scriptfile defines multiple SECTIONS commands. Each SECTIONS defines exactly an output section (for the first implementation we make the scope as narrow as possible).

  SECTIONS {
    .foo : { KEEP(*(.foo)) }
    /* Having another output section is disallowed.
       It is unclear whether the order here imposes a real output section order. */
    /* Having a symbol assignment is disallowed.
       It is unclear whether the symbol assignment and the output section have interaction.  */
  }
  SECTIONS {
    .bar : { start_bar = .; *(.bar); stop_bar = .; }
  }

If the main script defines .foo, its .foo is overwritten;
if the main script does not define .foo, the new .foo is inserted as an orphan (the regular orphan placement rule applies).

.bar is similar. The placement order of the two SECTIONS commands does not imply the section order.
Comment 7 Nick Clifton 2021-03-05 11:29:53 UTC
(In reply to Fangrui Song from comment #6)

> scriptfile defines multiple SECTIONS commands. Each SECTIONS defines exactly
> one output section (for the first implementation we make the scope as narrow
> as possible).
> 
>   SECTIONS {
>     .foo : { KEEP(*(.foo)) }

If there already is a .foo section in the linker script, should any kind of message be issued ?  I assume not by default, but maybe if the --verbose option is active then it would be helpful to know that the override has happened.  Also, if --verbose is used, then presumably the overwritten version is the one that should be displayed.

> If the main script defines .foo, its .foo is overwritten;
> if the main script does not define .foo, the new .foo is inserted as an
> orphan (the regular orphan placement rule applies).

This might be a problem as section placement is often very important.
We may need to implement your INSERT BEFORE/AFTER idea as well.
Comment 8 Fangrui Song 2021-03-05 19:36:38 UTC
(In reply to Nick Clifton from comment #7)
> (In reply to Fangrui Song from comment #6)
> 
> > scriptfile defines multiple SECTIONS commands. Each SECTIONS defines exactly
> > one output section (for the first implementation we make the scope as narrow
> > as possible).
> > 
> >   SECTIONS {
> >     .foo : { KEEP(*(.foo)) }
> 
> If there already is a .foo section in the linker script, should any kind of
> message be issued ?  I assume not by default, but maybe if the --verbose
> option is active then it would be helpful to know that the override has
> happened.  Also, if --verbose is used, then presumably the overwritten
> version is the one that should be displayed.

Sounds good.

> > If the main script defines .foo, its .foo is overwritten;
> > if the main script does not define .foo, the new .foo is inserted as an
> > orphan (the regular orphan placement rule applies).
> 
> This might be a problem as section placement is often very important.
> We may need to implement your INSERT BEFORE/AFTER idea as well.

Adding an orphan section is probably more important than supporting overwriting an existing output section description.
For regular applications, users want to control the properties of new metadata sections: (1) add KEEP (2) set alignment (3) define start/stop symbols (4) SORT (5) ...
They don't precise placement. The default orphan placement rule likely works and they just want to leverage that.

INSERT BEFORE/AFTER can be awakward because the user has to specify an existing output section, otherwise an error will be reported:

  % ld.bfd -T insert-not-exist.test a.o -o /dev/null
  ld.bfd: .not_exist not found for insert
Comment 9 Fangrui Song 2021-06-01 19:43:41 UTC
I created a ld.lld patch for the --overwrite-script idea https://reviews.llvm.org/D103303
Comment 10 Roland McGrath 2021-06-02 00:57:24 UTC
Build system integration often makes it much easier and simpler to supply new link input files than to plumb special switches that take input file arguments.
For this reason I think that it would be more valuable to more users to be able to get these semantics purely through an implicit input linker script without special command-line syntax.  I certainly don't object to the explicit switch syntax being available, but I would much rather use this feature myself in a form where it's expressed entirely within an input linker script.
Comment 11 Fangrui Song 2021-06-07 19:29:54 UTC
My latest revision of https://reviews.llvm.org/D103303 switched to a new keyword "OVERWRITE_SECTIONS". I dropped the command line option.One can place more than descriptions, though there is no ordering requirement, e.g.

OVERWRITE_SECTIONS {
  .foo.data : { FOO_DATA_START = .; *(.foo.data) FOO_DATA_END = .;  }
  .foo.text : { FOO_TEXT_START = .; *(.foo.text) FOO_TEXT_END = .;  }
}

.foo.data can be after .foo.text .
Comment 12 ceri dwen 2021-09-03 08:42:26 UTC Comment hidden (spam)