Bug 27451

Summary: ld: Provide a way to make C identifier name sections GCable under __start_/__stop_ references
Product: binutils
Reporter: Fangrui Song
Component: ld
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: 2.37   
Host: Target:
Build: Last reconfirmed:

Description Fangrui Song 2021-02-21 19:17:51 UTC
The discussion below is about C identifier name sections.

From my observation, GNU ld currently has a rule:

  __start_/__stop_ references from a live section retain all the associated C identifier name sections.

In the following example, the __start_meta and __stop_meta references retain both a.o:(meta) and b.o:(meta).

  # a.s
  .global _start
  .text
  _start:
    leaq __start_meta(%rip), %rdi
    leaq __stop_meta(%rip), %rsi
  
  .section meta,"a"
  .byte 0
  
  # b.s
  .section meta,"a"
  .byte 1
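
For readers more used to C than assembly, here is a rough C counterpart of the example above. It is only a sketch, assuming an ELF target and a GCC-compatible compiler; the names meta_a, meta_b, meta_size and meta_sum are made up for illustration and do not come from any project. Both contributions live in one file here, whereas the example above splits them across a.o and b.o.

```c
#include <assert.h>
#include <stddef.h>

/* Each definition contributes one byte to the input section named "meta".
   `used` only stops the compiler from dropping the unreferenced objects;
   it does not by itself protect them from linker garbage collection. */
__attribute__((section("meta"), used)) static const unsigned char meta_a = 0;
__attribute__((section("meta"), used)) static const unsigned char meta_b = 1;

/* Defined by the linker because "meta" is a valid C identifier. */
extern const unsigned char __start_meta[];
extern const unsigned char __stop_meta[];

/* Total number of bytes contributed to "meta" by all input files. */
size_t meta_size(void)
{
    assert(__stop_meta >= __start_meta);
    return (size_t)(__stop_meta - __start_meta);
}

/* Walk every byte between the two boundary symbols and add them up. */
unsigned meta_sum(void)
{
    unsigned sum = 0;
    for (const unsigned char *p = __start_meta; p != __stop_meta; ++p)
        sum += *p;
    return sum;
}
```

Under the current rule, calling meta_size() from a live function is enough to keep every meta input section when linking with --gc-sections.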

In LLD, we have augmented the rule with SHF_LINK_ORDER and SHF_GROUP (after https://reviews.llvm.org/D96753, target release 13.0.0):

  __start_/__stop_ references from a live section retain all the associated non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

At this point, some LLVM toolchain folks have concluded that the original rule does not carry its weight, and I am considering dropping it:

  __start_/__stop_ references from a live section do not retain the associated C identifier name sections.

  Either use undefined weak __start_/__stop_, or ensure there is at least one live C identifier name section to avoid "undefined symbol" errors.
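
The "undefined weak" option might look like the following in C. This is a sketch under assumptions: an ELF target, a GCC-compatible compiler, and a hypothetical section name meta_opt that no input file necessarily provides; meta_opt_size is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Weak declarations: if no "meta_opt" input section exists or survives,
   the symbols stay undefined and their addresses compare equal to null
   instead of the link failing with an "undefined symbol" error. */
extern const unsigned char __start_meta_opt[] __attribute__((weak));
extern const unsigned char __stop_meta_opt[] __attribute__((weak));

/* Size of the "meta_opt" output range, or 0 when the section is absent. */
size_t meta_opt_size(void)
{
    if (__start_meta_opt == NULL || __stop_meta_opt == NULL)
        return 0;
    assert(__stop_meta_opt >= __start_meta_opt);
    return (size_t)(__stop_meta_opt - __start_meta_opt);
}
```

Code written this way has to treat an empty or missing range as a normal case, which is the adaptation the proposed rule asks of users.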

There are more details in the "Metadata sections referenced by text sections" section of
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

There is a good chance that not many open-source projects need adaptation (I have tested thousands, and only Swift and systemd have issues).
I understand that there are still risks, so we may need a linker option to drop the original rule.

__start_/__stop_ is currently ELF specific, so the option is assumed to be under -z.
Given https://sourceware.org/pipermail/binutils/2020-June/111685.html "-z start-stop-visibility=",
we can probably name it -z start-stop-something, e.g.  -z start-stop-gc and -z nostart-stop-gc.

The option decides whether non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections are retained with __start_/__stop_ references.
For SHF_LINK_ORDER or SHF_GROUP sections, it seems that they should always be GCable.

LLD may switch to -z start-stop-gc by default. Such a move may be harder for GNU ld because of its stronger commitment to existing behavior.
(I have asked ClangBuiltLinux folks to test the Linux kernel: https://github.com/ClangBuiltLinux/linux/issues/1307 . I have tested x86-64 defconfig myself and it works.)
Comment 1 Fangrui Song 2021-02-23 21:36:30 UTC
In LLD, I'll add -z start-stop-gc so that __start_/__stop_ references do not retain C identifier name sections (https://reviews.llvm.org/D96914). -z nostart-stop-gc can disable it.

For SHF_LINK_ORDER or SHF_GROUP sections, __start_/__stop_ references do not retain them, regardless of -z start-stop-gc or -z nostart-stop-gc.
Comment 2 Fangrui Song 2021-02-27 20:31:49 UTC
Patch: https://sourceware.org/pipermail/binutils/2021-February/115557.html

This rule has caused trouble for clang -fprofile-generate and -fsanitize-coverage.


Other than glibc, my analysis has found some other uses:

* systemd and Swift (https://lists.llvm.org/pipermail/llvm-dev/2021-February/148682.html)
* Clang's ObjC implementation in GNU environment (clang/lib/CodeGen/CGObjCGNU.cpp): This has been taken care of by my https://reviews.llvm.org/D97448

I hope that at some point we can start to default to -z start-stop-gc.
Comment 3 Sourceware Commits 2021-03-01 06:59:17 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8ee10e86093150c70360d9e26b29e6d9b6398f33

commit 8ee10e86093150c70360d9e26b29e6d9b6398f33
Author: Alan Modra <amodra@gmail.com>
Date:   Mon Mar 1 08:22:49 2021 +1030

    PR27451, -z start_stop_gc
    
    When --gc-sections is in effect, a reference from a retained section
    to __start_SECNAME or __stop_SECNAME causes all input sections named
    SECNAME to also be retained, if SECNAME is representable as a C
    identifier and either __start_SECNAME or __stop_SECNAME is synthesized
    by the linker.  Add an option to disable that feature, effectively
    ignoring any relocation that references a synthesized linker defined
    __start_ or __stop_ symbol.
    
            PR 27451
    include/
            * bfdlink.h (struct bfd_link_info): Add start_stop_gc.
    bfd/
            * elflink.c (_bfd_elf_gc_mark_rsec): Ignore synthesized linker
            defined start/stop symbols when start_stop_gc.
            (bfd_elf_gc_mark_dynamic_ref_symbol): Likewise.
            (bfd_elf_define_start_stop): Don't modify ldscript_def syms.
            * linker.c (bfd_generic_define_start_stop): Likewise.
    ld/
            * emultempl/elf.em: Handle -z start-stop-gc and -z nostart-stop-gc.
            * lexsup.c (elf_static_list_options): Display help for them.  Move
            help for -z stack-size to here from elf_shlib_list_options. Add
            help for -z start-stop-visibility and -z undefs.
            * ld.texi: Document -z start-stop-gc and -z nostart-stop-gc.
            * NEWS: Mention -z start-stop-gc.
            * testsuite/ld-gc/start2.s,
            * testsuite/ld-gc/start2.d: New test.
            * testsuite/ld-gc/gc.exp: Run it.
Comment 4 Sourceware Commits 2021-03-02 11:27:00 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b80e421f9136117389be9d96210b35b3d562d725

commit b80e421f9136117389be9d96210b35b3d562d725
Author: Alan Modra <amodra@gmail.com>
Date:   Tue Mar 2 21:25:20 2021 +1030

    PR27451, -z start_stop_gc for powerpc64
    
    PowerPC64 has its own gc_mark_dynamic_ref.
    
    bfd/
            PR 27451
            * elf64-ppc.c (ppc64_elf_gc_mark_dynamic_ref): Ignore synthesized
            linker defined start/stop symbols when start_stop_gc.
    ld/
            * testsuite/ld-powerpc/startstop.d,
            * testsuite/ld-powerpc/startstop.r,
            * testsuite/ld-powerpc/startstop.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
Comment 5 Fangrui Song 2022-06-21 08:44:35 UTC
GNU ld 2.37 has -z start-stop-gc.