Bug 33530 - ld --gc-sections is quadratic-slow on input with huge list of sections (20K)
Summary: ld --gc-sections is quadratic-slow on input with huge list of sections (20K)
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.45
: P2 normal
Target Milestone: 2.46
Assignee: Alan Modra
URL: https://sourceware.org/pipermail/binu...
Keywords:
Depends on:
Blocks:
 
Reported: 2025-10-08 20:40 UTC by Sergei Trofimovich
Modified: 2025-10-30 06:01 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2025-10-10 00:00:00
Project(s) to access:
ssh public key:


Attachments
A patch (11.89 KB, patch)
2025-10-10 01:51 UTC, H.J. Lu
Details | Diff
A new patch (13.15 KB, patch)
2025-10-10 08:38 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sergei Trofimovich 2025-10-08 20:40:04 UTC
It's reproducible on x86_64-linux target, but I expect the problem to be more generic.

I'll start from the reproducer:

$ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections; echo "gold:"; time gcc main.o -o main -fuse-ld=gold -Wl,--gc-sections

bfd:

real    0m5,175s
user    0m2,586s
sys     0m2,575s
gold:

real    0m0,049s
user    0m0,036s
sys     0m0,014s

Note how faster `gold` is in this case. I hope `bfd` can be fixed to be on par with it.

IN the example we generate a function per section and build a chain references:

$ head -n 5 main.c
int f_0() __attribute__ ((section (".text.0"))); int f_0() { return 0; };
int f_1() __attribute__ ((section (".text.1"))); int f_1() { return f_0(); };
int f_2() __attribute__ ((section (".text.2"))); int f_2() { return f_1(); };
int f_3() __attribute__ ((section (".text.3"))); int f_3() { return f_2(); };
int f_4() __attribute__ ((section (".text.4"))); int f_4() { return f_3(); };

$ tail -n 5 main.c
int f_19996() __attribute__ ((section (".text.19996"))); int f_19996() { return f_19995(); };
int f_19997() __attribute__ ((section (".text.19997"))); int f_19997() { return f_19996(); };
int f_19998() __attribute__ ((section (".text.19998"))); int f_19998() { return f_19997(); };
int f_19999() __attribute__ ((section (".text.19999"))); int f_19999() { return f_19998(); };
int main() { return f_19999(); }

From what I understand staring at the profiles the trigger here is `-Wl,--gc-sections`. Otherwise `bfd` is as fast.

Where the time I think is spent: `_bfd_elf_gc_mark / _bfd_elf_gc_mark_reloc`
finds out unused sections for removal: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=91c77c211ef065a77883004eb696adacd92a00be;hb=815d9a14cbbb3b81843f7566222c87fb22e7255d#l14063

```c
bool
_bfd_elf_gc_mark_reloc (struct bfd_link_info *info,
                        asection *sec,
                        elf_gc_mark_hook_fn gc_mark_hook,
                        struct elf_reloc_cookie *cookie)
{
  asection *rsec;
  bool start_stop = false;

  rsec = _bfd_elf_gc_mark_rsec (info, sec, gc_mark_hook, cookie, &start_stop);
  while (rsec != NULL)
    {
      if (!rsec->gc_mark)
        {
          if (bfd_get_flavour (rsec->owner) != bfd_target_elf_flavour
              || (rsec->owner->flags & DYNAMIC) != 0)
            rsec->gc_mark = 1;
          else if (!_bfd_elf_gc_mark (info, rsec, gc_mark_hook))
            return false;
        }
      if (!start_stop)
        break;
      rsec = bfd_get_next_section_by_name (rsec->owner, rsec);
    }
  return true;
}
```

This `while()` loop looks suspicious. If we do such marking for each section it probably has quadratic complexity.

The performance drop is picked from a real example: such huge section count is typical for binaries built by haskell GHC compiler in `-split-sections` mode: https://downloads.haskell.org/ghc/9.12.2/docs/users_guide/phases.html#ghc-flag-fsplit-sections

I think it's similar to `-ffunction-sections` `gcc` option at least in spirit.

`gold` seems to work quite a bit faster on such inputs. While `bfd` uses a lot of RAM and gets slower and slower with the increase of functions I put to the file.

This was noticed in https://discourse.nixos.org/t/removing-gold-from-nixpkgs/70496 where `nixpkgs` considers switching from `gold` linker as `gold` is deprecated for removal. Switching to `bfd` so far looks like a big link-time regression: on larger inputs it's extra minutes of linking on individual binaries.

I hope that `bfd` could be fixed not to degrade as much on such pathological inputs.

Thanks!
Comment 1 H.J. Lu 2025-10-10 01:51:41 UTC
Created attachment 16416 [details]
A patch

Please try this.
Comment 2 Sam James 2025-10-10 05:24:53 UTC
(In reply to H.J. Lu from comment #1)
> Created attachment 16416 [details]
> A patch
> 
> Please try this.

I get libctf test failures:
```
tmpdir/lookup  tmpdir/out.so /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-ctf.o
Executing on host: sh -c {tmpdir/lookup  tmpdir/out.so /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-ctf.o 2>&1}  /dev/null ld.tmp (tim
eout = 300)
spawn [open ...]
BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms
Please report this bug.
BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms
Please report this bug.

regexp_diff match failure
regexp "^Large and huge structs working fine.$"
line   "BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms"
extra lines in tmpdir/lookup.out starting with "^Please report this bug.$"
EOF from /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-corruption.lk
FAIL: libctf-lookup/big-struct-corruption
```
Comment 3 H.J. Lu 2025-10-10 08:37:39 UTC
(In reply to Sam James from comment #2)
> (In reply to H.J. Lu from comment #1)
> > Created attachment 16416 [details]
> > A patch
> > 
> > Please try this.
> 
> I get libctf test failures:
> ```
> tmpdir/lookup  tmpdir/out.so
> /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/
> libctf-lookup/big-struct-ctf.o
> Executing on host: sh -c {tmpdir/lookup  tmpdir/out.so
> /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/
> libctf-lookup/big-struct-ctf.o 2>&1}  /dev/null ld.tmp (tim
> eout = 300)
> spawn [open ...]
> BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at
> /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in
> bfd_elf_get_elf_syms
> Please report this bug.
> BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at
> /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in
> bfd_elf_get_elf_syms
> Please report this bug.
> 
> regexp_diff match failure
> regexp "^Large and huge structs working fine.$"
> line   "BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting
> at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in
> bfd_elf_get_elf_syms"
> extra lines in tmpdir/lookup.out starting with "^Please report this bug.$"
> EOF from
> /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/
> libctf-lookup/big-struct-corruption.lk
> FAIL: libctf-lookup/big-struct-corruption
> ```

I can't reproduce it.
Comment 4 H.J. Lu 2025-10-10 08:38:07 UTC
Created attachment 16417 [details]
A new patch

Please try this.
Comment 5 Sergei Trofimovich 2025-10-10 11:02:52 UTC
(In reply to H.J. Lu from comment #4)
> Created attachment 16417 [details]
> A new patch
> 
> Please try this.

Synthetic test from #comment1 now works 100ms on `ld.bfd` (compared to 50ms for `ld.gold`):

real    0m0,108s
user    0m0,073s
sys     0m0,035s

On a real test linking of a large `pandoc` project shrunk from 36s down to 17s (gold does it in 13s).

Looks great to me!
Comment 6 Alan Modra 2025-10-21 23:51:02 UTC
Alernate patch series starting with https://sourceware.org/pipermail/binutils/2025-October/144989.html
Comment 7 Sergei Trofimovich 2025-10-22 19:51:44 UTC
(In reply to Alan Modra from comment #6)
> Alernate patch series starting with
> https://sourceware.org/pipermail/binutils/2025-October/144989.html

Does it fix #comment1 for you? I don't see any improvement if I apply 4 patches at https://inbox.sourceware.org/binutils/aPgbFuLXiYbXsWKW@squeak.grove.modra.org/raw against binutils-2.45.

It takes the same 5 seconds after the patch mostly spending in sys:

$ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
bfd:

real    0m5,800s
user    0m2,569s
sys     0m3,216s

`perf record -g` / `perf report` claims is still spends all the time in `_bfd_elf_gc_mark`:

    98.30%     0.02%  ld.bfd    libbfd-2.45.so        [.] _bfd_elf_gc_mark
            |
             --98.29%--_bfd_elf_gc_mark
                       |
                        --98.28%--_bfd_elf_gc_mark_reloc
                                  _bfd_elf_gc_mark
                                  |
                                   --98.28%--_bfd_elf_gc_mark_reloc
                                             _bfd_elf_gc_mark
                                             |
                                              --98.27%--_bfd_elf_gc_mark_reloc
                                                        _bfd_elf_gc_mark
                                                        _bfd_elf_gc_mark_reloc
                                                        _bfd_elf_gc_mark
   ...
Comment 8 Sergei Trofimovich 2025-10-22 20:14:35 UTC
(In reply to Sergei Trofimovich from comment #7)
> (In reply to Alan Modra from comment #6)
> > Alernate patch series starting with
> > https://sourceware.org/pipermail/binutils/2025-October/144989.html
> 
> Does it fix #comment1 for you? I don't see any improvement if I apply 4
> patches at
> https://inbox.sourceware.org/binutils/aPgbFuLXiYbXsWKW@squeak.grove.modra.
> org/raw against binutils-2.45.
> 
> It takes the same 5 seconds after the patch mostly spending in sys:

Oh, I probably applied only the first patch instead of the whole thread. I'll retest with all 4.
Comment 9 Alan Modra 2025-10-22 22:02:49 UTC
(In reply to Sergei Trofimovich from comment #8)
> (In reply to Sergei Trofimovich from comment #7)
> > It takes the same 5 seconds after the patch mostly spending in sys:
> 
> Oh, I probably applied only the first patch instead of the whole thread.
> I'll retest with all 4.

My "-O1 -fno-inline -g" builds:
before
real	0m8.052s
user	0m3.058s
sys	0m4.993s

after
real	0m0.182s
user	0m0.095s
sys	0m0.087s

gold
real	0m0.113s
user	0m0.089s
sys	0m0.024s
Comment 10 Sergei Trofimovich 2025-10-23 05:46:38 UTC
(In reply to Alan Modra from comment #6)
> Alernate patch series starting with
> https://sourceware.org/pipermail/binutils/2025-October/144989.html

Tested with all 4 patches cherry-picked. Observe similar performance improvement as H.J.'s patches:

$ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
bfd:

real    0m0,113s
user    0m0,070s
sys     0m0,043s

(was 5s before patching)

On a real test linking of a large `pandoc` project shrunk from 36s down to 18s (gold does it in 13s).

Also looks good!
Comment 11 Sourceware Commits 2025-10-30 05:57:32 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ad4b8c3e9586991795daf8358bee31335e412e56

commit ad4b8c3e9586991795daf8358bee31335e412e56
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Oct 30 16:26:27 2025 +1030

    Pass cookie and symndx to gc_mark_hook
    
    Replace the "sym" param with "cookie" and "symndx".  This is in
    preparation for the next patch.  Also remove "rel" param since this is
    available via "cookie", and is always set from cookie->rel.
    
            PR 33530
            * elf-m10300.c (mn10300_elf_gc_mark_hook): Replace "rel" and "sym"
            params with "cookie" and "symndx".  Adjust to suit.
            * elf32-arm.c (elf32_arm_gc_mark_hook): Likewise.
            * elf32-bfin.c (bfin_gc_mark_hook): Likewise.
            * elf32-cris.c (cris_elf_gc_mark_hook): Likewise.
            * elf32-csky.c (csky_elf_gc_mark_hook): Likewise.
            * elf32-d10v.c (elf32_d10v_gc_mark_hook): Likewise.
            * elf32-fr30.c (fr30_elf_gc_mark_hook): Likewise.
            * elf32-frv.c (elf32_frv_gc_mark_hook): Likewise.
            * elf32-hppa.c (elf32_hppa_gc_mark_hook): Likewise.
            * elf32-iq2000.c (iq2000_elf_gc_mark_hook): Likewise.
            * elf32-lm32.c (lm32_elf_gc_mark_hook): Likewise.
            * elf32-m32r.c (m32r_elf_gc_mark_hook): Likewise.
            * elf32-m68k.c (elf_m68k_gc_mark_hook): Likewise.
            * elf32-mcore.c (mcore_elf_gc_mark_hook): Likewise.
            * elf32-metag.c (elf_metag_gc_mark_hook): Likewise.
            * elf32-microblaze.c (microblaze_elf_gc_mark_hook): Likewise.
            * elf32-nds32.c (nds32_elf_gc_mark_hook): Likewise.
            * elf32-or1k.c (or1k_elf_gc_mark_hook): Likewise.
            * elf32-ppc.c (ppc_elf_gc_mark_hook): Likewise.
            * elf32-s390.c (elf_s390_gc_mark_hook): Likewise.
            * elf32-score.c (s3_bfd_score_elf_gc_mark_hook): Likewise.
            (_bfd_score_elf_gc_mark_hook): Likewise.
            * elf32-score7.c (s7_bfd_score_elf_gc_mark_hook): Likewise.
            * elf32-sh.c (sh_elf_gc_mark_hook): Likewise.
            * elf32-tilepro.c (tilepro_elf_gc_mark_hook): Likewise.
            * elf32-v850.c (v850_elf_gc_mark_hook): Likewise.
            * elf32-vax.c (elf_vax_gc_mark_hook): Likewise.
            * elf32-visium.c (visium_elf_gc_mark_hook): Likewise.
            * elf32-xstormy16.c (xstormy16_elf_gc_mark_hook): Likewise.
            * elf32-xtensa.c (elf_xtensa_gc_mark_hook): Likewise.
            * elf64-alpha.c (elf64_alpha_gc_mark_hook): Likewise.
            * elf64-mmix.c (mmix_elf_gc_mark_hook): Likewise.
            * elf64-ppc.c (ppc64_elf_gc_mark_hook): Likewise.
            * elf64-s390.c (elf_s390_gc_mark_hook): Likewise.
            * elfnn-loongarch.c (loongarch_elf_gc_mark_hook): Likewise.
            * elfxx-mips.c (_bfd_mips_elf_gc_mark_hook): Likewise.
            * elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Likewise.
            * elfxx-tilegx.c (tilegx_elf_gc_mark_hook): Likewise.
            * elfxx-x86.c (_bfd_x86_elf_gc_mark_hook): Likewise.
            * elflink.c (_bfd_elf_gc_mark_hook): Likewise.
            (elf_gc_mark_debug_section): Likewise.
            (_bfd_elf_gc_mark_rsec): Adjust gc_mark_hook calls.
            * elf32-cr16.c (elf32_cr16_gc_mark_hook): Delete.
            (elf_backend_gc_mark_hook): Don't define.
            * elf32-moxie.c (moxie_elf_gc_mark_hook): Delete.
            (elf_backend_gc_mark_hook): Don't define.
            * elf-bfd.h (elf_gc_mark_hook_fn, _bfd_elf_gc_mark_hook): Update
            declarations.
            * elf32-score.h (s7_bfd_score_elf_gc_mark_hook): Likewise.
            * elfxx-mips.h (_bfd_mips_elf_gc_mark_hook): Likewise.
            * elfxx-sparc.h (_bfd_sparc_elf_gc_mark_hook): Likewise.
            * elfxx-tilegx.h (tilegx_elf_gc_mark_hook): Likewise.
            * elfxx-x86.h (_bfd_x86_elf_gc_mark_hook): Likewise.
Comment 12 Sourceware Commits 2025-10-30 05:57:37 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=57ccec66689305cdb59afa179b3f9a6464c84820

commit 57ccec66689305cdb59afa179b3f9a6464c84820
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Oct 30 16:26:44 2025 +1030

    _bfd_elf_get_link_hash_entry tidy
    
    Replace the "Elf_Internal_Shdr *symtab_hdr" parameter with
    "unsigned int ext_sym_start", making it a duplicate of the existing
    get_link_hash_entry function.
    
    Also remove unnecessary checks from get_ext_sym_hash_from_cookie and
    find_merged_cie.  The sym_hashes and symbol index checks in
    get_ext_sym_hash_from_cookie are duplicates of those done in
    _bfd_elf_get_link_hash_entry, and there is no need to check for a
    global symbol before calling _bfd_elf_get_link_hash_entry.  When
    bad_symtab, local symbols will have a NULL sym_hashes entry.  Removing
    these unnecessary checks gets rid of some cookie->locsyms references.
    
            PR 33530
            * elf-bfd.h (_bfd_elf_get_link_hash_entry): Update declaration.
            * elflink.c (_bfd_elf_get_link_hash_entry): Rename from
            get_link_hash_entry, adjusting all calls and deleting original
            function.
            (get_ext_sym_hash_from_cookie): Make "symndx" unsigned int.
            Remove unnecessary check on sym_hashes, symbol index and
            symbol binding.
            * elf-eh-frame.c (find_merged_cie): Remove similar unnecessary
            checks.
            * elf64-x86-64.c (elf_x86_64_scan_relocs): Adjust.
            * elfxx-x86.c (_bfd_x86_elf_check_relocs): Adjust.
            (_bfd_x86_elf_link_relax_section): Adjust.
Comment 13 Sourceware Commits 2025-10-30 05:57:42 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8b99ca44515b2b56f1492786e23522e4abbe8a96

commit 8b99ca44515b2b56f1492786e23522e4abbe8a96
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Oct 30 16:26:50 2025 +1030

    Don't read and cache local syms for gc-sections
    
    Most places just need the local sym section, so reading and sometimes
    caching the symbols is excessive.  A symbol shndx can be stored in 4
    bytes, an elf symbol internal form requires 32 bytes.  When caching
    the local symbols we went slightly crazy trying to avoid memory usage,
    resulting in the symbols being freed then immediately read again for
    the testcase in the PR33530.
    
    To avoid this problem, this patch caches the local symbol section
    indices in the bfd rather than in the reloc cookie.  They are not
    initialised until there is a need for them, so unlike elf_sym_hashes
    for global syms you cannot rely on them being present.
    
    One place that does need local syms is adjust_eh_frame_local_symbols,
    but that is called once via bfd_discard_info so there is no problem
    simply reading them.  The other place that needs local syms is
    ppc64_elf_gc_mark_hook for the old ELFv1 ABI when handling .opd.
    bfd_sym_from_r_symndx should be sufficient for function pointer
    references to static functions, which is how this code is triggered.
    
            PR 33530
            * elf-bfd.h (struct elf_reloc_cookie): Delete "locsyms",
            "sym_hashes", "bad_symtab".  Make "locsymcount" and
            "extsymoff" unsigned int.
            (struct elf_obj_tdata): Add loc_shndx.
            (elf_loc_shndx): Define.
            (_bfd_get_local_sym_section): Declare.
            * elf-eh-frame.c (find_merged_cie): Use
            _bfd_get_local_sym_section for local syms.
            (adjust_eh_frame_local_symbols): Read local syms if any match
            .eh_frame section.  Return them if changed.
            (_bfd_elf_discard_section_eh_frame): Adjust.
            * elf64-ppc.c (ppc64_elf_gc_mark_hook): Use
            _bfd_get_local_sym_section.  Use bfd_sym_from_r_symndx when
            reading opd local symbol.
            * elflink.c (_bfd_get_local_sym_section): New function.
            (_bfd_elf_section_for_symbol): Use it.
            (elf_link_add_object_symbols): Remove unnecessary cast on
            bfd_zalloc return.
            (init_reloc_cookie): Remove "info" and "keep_memory" params.
            Adjust all callers.  Don't stash elf_sym_hashes and
            elf_bad_symtab to cookie.  Don't read local syms to cookie.
            (fini_reloc_cookie): Do nothing.
            (_bfd_elf_gc_mark_hook): Use _bfd_get_local_sym_section.
            (elf_gc_mark_debug_section): Likewise.
            (bfd_elf_reloc_symbol_deleted_p): Likewise.  Update cookie use.
Comment 14 Alan Modra 2025-10-30 06:01:03 UTC
Patch series pushed mainline.  I don't intend to push this to the branch.