It's reproducible on x86_64-linux target, but I expect the problem to be more generic. I'll start from the reproducer: $ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections; echo "gold:"; time gcc main.o -o main -fuse-ld=gold -Wl,--gc-sections bfd: real 0m5,175s user 0m2,586s sys 0m2,575s gold: real 0m0,049s user 0m0,036s sys 0m0,014s Note how faster `gold` is in this case. I hope `bfd` can be fixed to be on par with it. IN the example we generate a function per section and build a chain references: $ head -n 5 main.c int f_0() __attribute__ ((section (".text.0"))); int f_0() { return 0; }; int f_1() __attribute__ ((section (".text.1"))); int f_1() { return f_0(); }; int f_2() __attribute__ ((section (".text.2"))); int f_2() { return f_1(); }; int f_3() __attribute__ ((section (".text.3"))); int f_3() { return f_2(); }; int f_4() __attribute__ ((section (".text.4"))); int f_4() { return f_3(); }; $ tail -n 5 main.c int f_19996() __attribute__ ((section (".text.19996"))); int f_19996() { return f_19995(); }; int f_19997() __attribute__ ((section (".text.19997"))); int f_19997() { return f_19996(); }; int f_19998() __attribute__ ((section (".text.19998"))); int f_19998() { return f_19997(); }; int f_19999() __attribute__ ((section (".text.19999"))); int f_19999() { return f_19998(); }; int main() { return f_19999(); } From what I understand staring at the profiles the trigger here is `-Wl,--gc-sections`. Otherwise `bfd` is as fast. Where the time I think is spent: `_bfd_elf_gc_mark / _bfd_elf_gc_mark_reloc` finds out unused sections for removal: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=91c77c211ef065a77883004eb696adacd92a00be;hb=815d9a14cbbb3b81843f7566222c87fb22e7255d#l14063 ```c bool _bfd_elf_gc_mark_reloc (struct bfd_link_info *info, asection *sec, elf_gc_mark_hook_fn gc_mark_hook, struct elf_reloc_cookie *cookie) { asection *rsec; bool start_stop = false; rsec = _bfd_elf_gc_mark_rsec (info, sec, gc_mark_hook, cookie, &start_stop); while (rsec != NULL) { if (!rsec->gc_mark) { if (bfd_get_flavour (rsec->owner) != bfd_target_elf_flavour || (rsec->owner->flags & DYNAMIC) != 0) rsec->gc_mark = 1; else if (!_bfd_elf_gc_mark (info, rsec, gc_mark_hook)) return false; } if (!start_stop) break; rsec = bfd_get_next_section_by_name (rsec->owner, rsec); } return true; } ``` This `while()` loop looks suspicious. If we do such marking for each section it probably has quadratic complexity. The performance drop is picked from a real example: such huge section count is typical for binaries built by haskell GHC compiler in `-split-sections` mode: https://downloads.haskell.org/ghc/9.12.2/docs/users_guide/phases.html#ghc-flag-fsplit-sections I think it's similar to `-ffunction-sections` `gcc` option at least in spirit. `gold` seems to work quite a bit faster on such inputs. While `bfd` uses a lot of RAM and gets slower and slower with the increase of functions I put to the file. This was noticed in https://discourse.nixos.org/t/removing-gold-from-nixpkgs/70496 where `nixpkgs` considers switching from `gold` linker as `gold` is deprecated for removal. Switching to `bfd` so far looks like a big link-time regression: on larger inputs it's extra minutes of linking on individual binaries. I hope that `bfd` could be fixed not to degrade as much on such pathological inputs. Thanks!
Created attachment 16416 [details] A patch Please try this.
(In reply to H.J. Lu from comment #1) > Created attachment 16416 [details] > A patch > > Please try this. I get libctf test failures: ``` tmpdir/lookup tmpdir/out.so /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-ctf.o Executing on host: sh -c {tmpdir/lookup tmpdir/out.so /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-ctf.o 2>&1} /dev/null ld.tmp (tim eout = 300) spawn [open ...] BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms Please report this bug. BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms Please report this bug. regexp_diff match failure regexp "^Large and huge structs working fine.$" line "BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in bfd_elf_get_elf_syms" extra lines in tmpdir/lookup.out starting with "^Please report this bug.$" EOF from /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/libctf-lookup/big-struct-corruption.lk FAIL: libctf-lookup/big-struct-corruption ```
(In reply to Sam James from comment #2) > (In reply to H.J. Lu from comment #1) > > Created attachment 16416 [details] > > A patch > > > > Please try this. > > I get libctf test failures: > ``` > tmpdir/lookup tmpdir/out.so > /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/ > libctf-lookup/big-struct-ctf.o > Executing on host: sh -c {tmpdir/lookup tmpdir/out.so > /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/ > libctf-lookup/big-struct-ctf.o 2>&1} /dev/null ld.tmp (tim > eout = 300) > spawn [open ...] > BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at > /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in > bfd_elf_get_elf_syms > Please report this bug. > BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting at > /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in > bfd_elf_get_elf_syms > Please report this bug. > > regexp_diff match failure > regexp "^Large and huge structs working fine.$" > line "BFD: BFD (Gentoo 9999 p1) 2.45.50.20251010 internal error, aborting > at /var/tmp/portage/sys-devel/binutils-9999/work/binutils/bfd/elf.c:485 in > bfd_elf_get_elf_syms" > extra lines in tmpdir/lookup.out starting with "^Please report this bug.$" > EOF from > /var/tmp/portage/sys-devel/binutils-9999/work/binutils/libctf/testsuite/ > libctf-lookup/big-struct-corruption.lk > FAIL: libctf-lookup/big-struct-corruption > ``` I can't reproduce it.
Created attachment 16417 [details] A new patch Please try this.
(In reply to H.J. Lu from comment #4) > Created attachment 16417 [details] > A new patch > > Please try this. Synthetic test from #comment1 now works 100ms on `ld.bfd` (compared to 50ms for `ld.gold`): real 0m0,108s user 0m0,073s sys 0m0,035s On a real test linking of a large `pandoc` project shrunk from 36s down to 17s (gold does it in 13s). Looks great to me!
Alernate patch series starting with https://sourceware.org/pipermail/binutils/2025-October/144989.html
(In reply to Alan Modra from comment #6) > Alernate patch series starting with > https://sourceware.org/pipermail/binutils/2025-October/144989.html Does it fix #comment1 for you? I don't see any improvement if I apply 4 patches at https://inbox.sourceware.org/binutils/aPgbFuLXiYbXsWKW@squeak.grove.modra.org/raw against binutils-2.45. It takes the same 5 seconds after the patch mostly spending in sys: $ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections bfd: real 0m5,800s user 0m2,569s sys 0m3,216s `perf record -g` / `perf report` claims is still spends all the time in `_bfd_elf_gc_mark`: 98.30% 0.02% ld.bfd libbfd-2.45.so [.] _bfd_elf_gc_mark | --98.29%--_bfd_elf_gc_mark | --98.28%--_bfd_elf_gc_mark_reloc _bfd_elf_gc_mark | --98.28%--_bfd_elf_gc_mark_reloc _bfd_elf_gc_mark | --98.27%--_bfd_elf_gc_mark_reloc _bfd_elf_gc_mark _bfd_elf_gc_mark_reloc _bfd_elf_gc_mark ...
(In reply to Sergei Trofimovich from comment #7) > (In reply to Alan Modra from comment #6) > > Alernate patch series starting with > > https://sourceware.org/pipermail/binutils/2025-October/144989.html > > Does it fix #comment1 for you? I don't see any improvement if I apply 4 > patches at > https://inbox.sourceware.org/binutils/aPgbFuLXiYbXsWKW@squeak.grove.modra. > org/raw against binutils-2.45. > > It takes the same 5 seconds after the patch mostly spending in sys: Oh, I probably applied only the first patch instead of the whole thread. I'll retest with all 4.
(In reply to Sergei Trofimovich from comment #8) > (In reply to Sergei Trofimovich from comment #7) > > It takes the same 5 seconds after the patch mostly spending in sys: > > Oh, I probably applied only the first patch instead of the whole thread. > I'll retest with all 4. My "-O1 -fno-inline -g" builds: before real 0m8.052s user 0m3.058s sys 0m4.993s after real 0m0.182s user 0m0.095s sys 0m0.087s gold real 0m0.113s user 0m0.089s sys 0m0.024s
(In reply to Alan Modra from comment #6) > Alernate patch series starting with > https://sourceware.org/pipermail/binutils/2025-October/144989.html Tested with all 4 patches cherry-picked. Observe similar performance improvement as H.J.'s patches: $ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return 0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i() __attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))(); };\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc -O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections bfd: real 0m0,113s user 0m0,070s sys 0m0,043s (was 5s before patching) On a real test linking of a large `pandoc` project shrunk from 36s down to 18s (gold does it in 13s). Also looks good!
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ad4b8c3e9586991795daf8358bee31335e412e56 commit ad4b8c3e9586991795daf8358bee31335e412e56 Author: Alan Modra <amodra@gmail.com> Date: Thu Oct 30 16:26:27 2025 +1030 Pass cookie and symndx to gc_mark_hook Replace the "sym" param with "cookie" and "symndx". This is in preparation for the next patch. Also remove "rel" param since this is available via "cookie", and is always set from cookie->rel. PR 33530 * elf-m10300.c (mn10300_elf_gc_mark_hook): Replace "rel" and "sym" params with "cookie" and "symndx". Adjust to suit. * elf32-arm.c (elf32_arm_gc_mark_hook): Likewise. * elf32-bfin.c (bfin_gc_mark_hook): Likewise. * elf32-cris.c (cris_elf_gc_mark_hook): Likewise. * elf32-csky.c (csky_elf_gc_mark_hook): Likewise. * elf32-d10v.c (elf32_d10v_gc_mark_hook): Likewise. * elf32-fr30.c (fr30_elf_gc_mark_hook): Likewise. * elf32-frv.c (elf32_frv_gc_mark_hook): Likewise. * elf32-hppa.c (elf32_hppa_gc_mark_hook): Likewise. * elf32-iq2000.c (iq2000_elf_gc_mark_hook): Likewise. * elf32-lm32.c (lm32_elf_gc_mark_hook): Likewise. * elf32-m32r.c (m32r_elf_gc_mark_hook): Likewise. * elf32-m68k.c (elf_m68k_gc_mark_hook): Likewise. * elf32-mcore.c (mcore_elf_gc_mark_hook): Likewise. * elf32-metag.c (elf_metag_gc_mark_hook): Likewise. * elf32-microblaze.c (microblaze_elf_gc_mark_hook): Likewise. * elf32-nds32.c (nds32_elf_gc_mark_hook): Likewise. * elf32-or1k.c (or1k_elf_gc_mark_hook): Likewise. * elf32-ppc.c (ppc_elf_gc_mark_hook): Likewise. * elf32-s390.c (elf_s390_gc_mark_hook): Likewise. * elf32-score.c (s3_bfd_score_elf_gc_mark_hook): Likewise. (_bfd_score_elf_gc_mark_hook): Likewise. * elf32-score7.c (s7_bfd_score_elf_gc_mark_hook): Likewise. * elf32-sh.c (sh_elf_gc_mark_hook): Likewise. * elf32-tilepro.c (tilepro_elf_gc_mark_hook): Likewise. * elf32-v850.c (v850_elf_gc_mark_hook): Likewise. * elf32-vax.c (elf_vax_gc_mark_hook): Likewise. * elf32-visium.c (visium_elf_gc_mark_hook): Likewise. * elf32-xstormy16.c (xstormy16_elf_gc_mark_hook): Likewise. * elf32-xtensa.c (elf_xtensa_gc_mark_hook): Likewise. * elf64-alpha.c (elf64_alpha_gc_mark_hook): Likewise. * elf64-mmix.c (mmix_elf_gc_mark_hook): Likewise. * elf64-ppc.c (ppc64_elf_gc_mark_hook): Likewise. * elf64-s390.c (elf_s390_gc_mark_hook): Likewise. * elfnn-loongarch.c (loongarch_elf_gc_mark_hook): Likewise. * elfxx-mips.c (_bfd_mips_elf_gc_mark_hook): Likewise. * elfxx-sparc.c (_bfd_sparc_elf_gc_mark_hook): Likewise. * elfxx-tilegx.c (tilegx_elf_gc_mark_hook): Likewise. * elfxx-x86.c (_bfd_x86_elf_gc_mark_hook): Likewise. * elflink.c (_bfd_elf_gc_mark_hook): Likewise. (elf_gc_mark_debug_section): Likewise. (_bfd_elf_gc_mark_rsec): Adjust gc_mark_hook calls. * elf32-cr16.c (elf32_cr16_gc_mark_hook): Delete. (elf_backend_gc_mark_hook): Don't define. * elf32-moxie.c (moxie_elf_gc_mark_hook): Delete. (elf_backend_gc_mark_hook): Don't define. * elf-bfd.h (elf_gc_mark_hook_fn, _bfd_elf_gc_mark_hook): Update declarations. * elf32-score.h (s7_bfd_score_elf_gc_mark_hook): Likewise. * elfxx-mips.h (_bfd_mips_elf_gc_mark_hook): Likewise. * elfxx-sparc.h (_bfd_sparc_elf_gc_mark_hook): Likewise. * elfxx-tilegx.h (tilegx_elf_gc_mark_hook): Likewise. * elfxx-x86.h (_bfd_x86_elf_gc_mark_hook): Likewise.
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=57ccec66689305cdb59afa179b3f9a6464c84820 commit 57ccec66689305cdb59afa179b3f9a6464c84820 Author: Alan Modra <amodra@gmail.com> Date: Thu Oct 30 16:26:44 2025 +1030 _bfd_elf_get_link_hash_entry tidy Replace the "Elf_Internal_Shdr *symtab_hdr" parameter with "unsigned int ext_sym_start", making it a duplicate of the existing get_link_hash_entry function. Also remove unnecessary checks from get_ext_sym_hash_from_cookie and find_merged_cie. The sym_hashes and symbol index checks in get_ext_sym_hash_from_cookie are duplicates of those done in _bfd_elf_get_link_hash_entry, and there is no need to check for a global symbol before calling _bfd_elf_get_link_hash_entry. When bad_symtab, local symbols will have a NULL sym_hashes entry. Removing these unnecessary checks gets rid of some cookie->locsyms references. PR 33530 * elf-bfd.h (_bfd_elf_get_link_hash_entry): Update declaration. * elflink.c (_bfd_elf_get_link_hash_entry): Rename from get_link_hash_entry, adjusting all calls and deleting original function. (get_ext_sym_hash_from_cookie): Make "symndx" unsigned int. Remove unnecessary check on sym_hashes, symbol index and symbol binding. * elf-eh-frame.c (find_merged_cie): Remove similar unnecessary checks. * elf64-x86-64.c (elf_x86_64_scan_relocs): Adjust. * elfxx-x86.c (_bfd_x86_elf_check_relocs): Adjust. (_bfd_x86_elf_link_relax_section): Adjust.
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8b99ca44515b2b56f1492786e23522e4abbe8a96 commit 8b99ca44515b2b56f1492786e23522e4abbe8a96 Author: Alan Modra <amodra@gmail.com> Date: Thu Oct 30 16:26:50 2025 +1030 Don't read and cache local syms for gc-sections Most places just need the local sym section, so reading and sometimes caching the symbols is excessive. A symbol shndx can be stored in 4 bytes, an elf symbol internal form requires 32 bytes. When caching the local symbols we went slightly crazy trying to avoid memory usage, resulting in the symbols being freed then immediately read again for the testcase in the PR33530. To avoid this problem, this patch caches the local symbol section indices in the bfd rather than in the reloc cookie. They are not initialised until there is a need for them, so unlike elf_sym_hashes for global syms you cannot rely on them being present. One place that does need local syms is adjust_eh_frame_local_symbols, but that is called once via bfd_discard_info so there is no problem simply reading them. The other place that needs local syms is ppc64_elf_gc_mark_hook for the old ELFv1 ABI when handling .opd. bfd_sym_from_r_symndx should be sufficient for function pointer references to static functions, which is how this code is triggered. PR 33530 * elf-bfd.h (struct elf_reloc_cookie): Delete "locsyms", "sym_hashes", "bad_symtab". Make "locsymcount" and "extsymoff" unsigned int. (struct elf_obj_tdata): Add loc_shndx. (elf_loc_shndx): Define. (_bfd_get_local_sym_section): Declare. * elf-eh-frame.c (find_merged_cie): Use _bfd_get_local_sym_section for local syms. (adjust_eh_frame_local_symbols): Read local syms if any match .eh_frame section. Return them if changed. (_bfd_elf_discard_section_eh_frame): Adjust. * elf64-ppc.c (ppc64_elf_gc_mark_hook): Use _bfd_get_local_sym_section. Use bfd_sym_from_r_symndx when reading opd local symbol. * elflink.c (_bfd_get_local_sym_section): New function. (_bfd_elf_section_for_symbol): Use it. (elf_link_add_object_symbols): Remove unnecessary cast on bfd_zalloc return. (init_reloc_cookie): Remove "info" and "keep_memory" params. Adjust all callers. Don't stash elf_sym_hashes and elf_bad_symtab to cookie. Don't read local syms to cookie. (fini_reloc_cookie): Do nothing. (_bfd_elf_gc_mark_hook): Use _bfd_get_local_sym_section. (elf_gc_mark_debug_section): Likewise. (bfd_elf_reloc_symbol_deleted_p): Likewise. Update cookie use.
Patch series pushed mainline. I don't intend to push this to the branch.