Bug 19739 - ld.bfd performance regression
Summary: ld.bfd performance regression
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: 2.26
Importance: P2 critical
Target Milestone: 2.27
Assignee: Not yet assigned to anyone
Reported: 2016-02-27 15:52 UTC by Jan Smets
Modified: 2016-03-04 14:49 UTC (History)
Target: mips-wrs-vxworks,x86_64

Attachments
A patch (962 bytes, patch)
2016-02-29 19:19 UTC, H.J. Lu

Description Jan Smets 2016-02-27 15:52:48 UTC
ld.bfd -r on mips

When object files are compiled with DWARF debug info, the link time goes up significantly. I suspect this is due to the large number of sections.

I had to reduce the number of object files used in the example below, because otherwise the link would never finish.


Binutils 2.23.2, not very fast, but acceptable.

Samples: 8K of event 'cycles', Event count (approx.): 5959804941
 64.81%  ld-new  ld-new             [.] _bfd_elf_make_section_from_shdr
 11.45%  ld-new  [kernel.kallsyms]  [k] 0xffffffff8103ba6a
  2.51%  ld-new  ld-new             [.] walk_wild_section_general
  1.62%  ld-new  libc-2.12.so       [.] __gconv_transform_utf8_internal




Binutils 2.26

Samples: 214K of event 'cycles', Event count (approx.): 161542081964
 68.51%  ld-new  ld-new             [.] bfd_get_next_section_by_name
 14.44%  ld-new  libc-2.12.so       [.] __strcmp_sse42
 10.75%  ld-new  ld-new             [.] gldelf32btsmip_place_orphan
  2.40%  ld-new  ld-new             [.] _bfd_elf_make_section_from_shdr
  1.67%  ld-new  [kernel.kallsyms]  [k] 0xffffffff8103ba6a



       │      for (sh = (struct section_hash_entry *) sh->root.next;
  0.00 │      test   %rbx,%rbx
       │    ↓ je     50
       │           sh != NULL;
       │           sh = (struct section_hash_entry *) sh->root.next)
       │        if (sh->root.hash == hash
  0.78 │28:   cmp    %rbp,0x10(%rbx)
  1.38 │    ↑ jne    20
       │           && strcmp (sh->root.string, name) == 0)
 85.90 │      mov    0x8(%rbx),%rdi
  5.69 │      mov    %r13,%rsi
  0.01 │    → callq  strcmp@plt
       │      hash = sh->root.hash;
       │      name = sec->name;
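
For readers following the annotated loop above, here is a simplified Python model of the chain walk. It is only a sketch and not the actual BFD code: the `Entry` class, `build_chain`, and `next_by_name` are names made up for illustration, and all entries are assumed to sit in a single hash bucket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one section hash entry in a bucket chain."""
    hash_value: int
    string: str
    next_entry: Optional["Entry"] = None


def build_chain(names):
    """Link entries in the given order; assumes they all share one bucket."""
    head = None
    for name in reversed(names):
        head = Entry(hash_value=hash(name), string=name, next_entry=head)
    return head


def next_by_name(entry):
    """Walk the rest of the chain looking for the next entry with the same name.

    Returns (matching entry or None, number of entries visited).
    The hash is compared first; the string compare only runs on a hash match.
    """
    visited = 0
    sh = entry.next_entry
    while sh is not None:
        visited += 1
        if sh.hash_value == entry.hash_value and sh.string == entry.string:
            return sh, visited
        sh = sh.next_entry
    return None, visited


chain = build_chain([".text.a", ".text.b", ".text.c", ".text.a"])
found, visited = next_by_name(chain)
print(found.string, visited)
```

When a name has no later duplicate, the walk visits every remaining entry in the chain before giving up, so many calls over long chains add up quickly.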


Is this sufficient information?

Thanks
Comment 1 Markus Trippelsdorf 2016-02-27 16:57:03 UTC
There was a similar issue that got fixed recently: PR19542.
Can you try the current 2_26-branch?
Comment 2 Markus Trippelsdorf 2016-02-27 16:58:23 UTC
(In reply to Markus Trippelsdorf from comment #1)
> There was a similar issue that got fixed recently: PR19542.
> Can you try the current 2_26-branch?

Ah, you're on mips. Please ignore my comment.
Comment 3 Jan Smets 2016-02-27 21:51:46 UTC
I also tried the latest snapshot, so this is a different issue. Thanks.
Comment 4 Jan Smets 2016-02-28 12:58:41 UTC
Same issue on x86_64, roughly 14x slower.


x86_64 2.26.51 (latest snapshot) 

 39.90%  ld-new  ld-new             [.] bfd_get_next_section_by_name
 21.19%  ld-new  libc-2.12.so       [.] __strcmp_sse42
 20.93%  ld-new  ld-new             [.] gldelf_x86_64_place_orphan
  9.53%  ld-new  ld-new             [.] _bfd_elf_match_sections_by_type
  2.54%  ld-new  ld-new             [.] _bfd_elf_make_section_from_shdr

real    0m9.527s


x86_64 2.23.2 

 46.83%  ld-new  ld-new             [.] _bfd_elf_make_section_from_shdr
 16.17%  ld-new  [kernel.kallsyms]  [k] 0xffffffff8103ba6a
  4.00%  ld-new  ld-new             [.] bfd_elf_final_link
  2.38%  ld-new  ld-new             [.] walk_wild_section_general
  1.90%  ld-new  ld-new             [.] bfd_hash_lookup

real    0m0.672s
Comment 5 H.J. Lu 2016-02-28 15:20:35 UTC
Please provide a testcase.
Comment 6 Jan Smets 2016-02-28 15:23:40 UTC
What would be the easiest way to produce a testcase?
(Without sending proprietary data: are there any tools to discard all data from object files or something?)
Comment 7 H.J. Lu 2016-02-28 15:39:47 UTC
(In reply to Jan Smets from comment #6)
> What would be the easiest way to produce a testcase?
> (Without sending proprietary data: are there any tools to discard all data
> from object files or something?)

objcopy can remove sections from object files. Or you can write a script
to generate source files with many functions/data, compiling them using
-ffunction-sections -fdata-sections.  If neither works, you can send
those object files to me directly.
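
A minimal sketch of such a generator script, in Python. The file names (`gen_N.c`), the symbol naming scheme, and the helper names `make_source` and `write_testcase` are assumptions chosen for illustration; nothing here comes from the reporter's actual setup.

```python
from pathlib import Path


def make_source(file_index, functions_per_file):
    """Return C source text with one data object and one function per index.

    Symbol names embed the file index so they stay unique across files.
    """
    lines = []
    for i in range(functions_per_file):
        sym = f"f{file_index}_{i}"
        lines.append(f"int data_{sym} = {i};")
        lines.append(f"int {sym}(void) {{ return data_{sym}; }}")
    return "\n".join(lines) + "\n"


def write_testcase(out_dir, num_files, functions_per_file):
    """Write num_files generated C files into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(num_files):
        path = out / f"gen_{n}.c"
        path.write_text(make_source(n, functions_per_file))
        paths.append(path)
    return paths
```

Calling `write_testcase("testcase", 200, 500)` would write 200 files with 500 functions each; the counts are arbitrary and may need tuning before the slowdown shows up. Compiling each file with `gcc -c -g -ffunction-sections -fdata-sections` and linking the objects with `ld -r` should then produce one input section per function and per data object.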
Comment 8 H.J. Lu 2016-02-29 19:19:58 UTC
Created attachment 9054
A patch

Please try this.
Comment 9 Alan Modra 2016-02-29 22:28:33 UTC
Why is place_orphan being called?  ie. What section is missing from the linker script?
Comment 10 H.J. Lu 2016-02-29 22:35:11 UTC
(In reply to Alan Modra from comment #9)
> Why is place_orphan being called?  ie. What section is missing from the
> linker script?

It is called on .text.* sections.  We don't want to fold all .text.* sections
into .text section for "ld -r".
Comment 11 Alan Modra 2016-02-29 23:54:15 UTC
Ah, yes ld -r will get lots of orphans when linking -ffunction-sections objects, and my pr19162 fix is quite expensive.  Patch approved, but please move the comment down to just before "place = NULL" rather than into the new !bfd_link_relocatable block.
Comment 12 Jan Smets 2016-03-01 13:31:51 UTC
Patch works, also on mips.
Thanks
Comment 13 cvs-commit@gcc.gnu.org 2016-03-01 13:52:50 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=077fcd6a3b5729044acce83f77ebedd3adbadab0

commit 077fcd6a3b5729044acce83f77ebedd3adbadab0
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Feb 29 11:04:22 2016 -0800

    Speedup ELF orphan placement for relocatable link
    
    Since there is no need to place output sections in specific order for
    relocatable link, we can skip merging flags of other input sections.
    
    	PR ld/19739
    	* ld/emultempl/elf32.em (gld${EMULATION_NAME}_place_orphan): Don't
    	merge flags of other input sections for relocatable link.
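
The idea in this commit message can be sketched as a toy Python model. It is not the code in elf32.em: the function `orphan_flags`, the dictionaries standing in for input files, and the plain bitwise OR used to merge flags are all simplifying assumptions.

```python
def orphan_flags(name, flags, other_inputs, relocatable):
    """Return (flags used for placement, number of other inputs consulted).

    other_inputs is a list of dicts mapping section name -> flags, one dict
    per input file (a hypothetical stand-in for the real input BFDs).
    """
    consulted = 0
    if not relocatable:
        # Final link: look at same-named sections in the other inputs and
        # merge their flags (modelled here as a simple OR).
        for sections in other_inputs:
            consulted += 1
            if name in sections:
                flags |= sections[name]
    # Relocatable link: output section order does not matter, so the
    # per-input scan is skipped entirely.
    return flags, consulted


others = [{".text.f": 0x2}, {".data.g": 0x4}, {}]
print(orphan_flags(".text.f", 0x1, others, relocatable=False))
print(orphan_flags(".text.f", 0x1, others, relocatable=True))
```

In this model the cost per orphan grows with the number of inputs only in the non-relocatable case, which is the work the patch avoids for `ld -r`.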
Comment 14 cvs-commit@gcc.gnu.org 2016-03-02 13:09:29 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=523f4c9234439fd6ccc0dd2c3b387331dd64c54b

commit 523f4c9234439fd6ccc0dd2c3b387331dd64c54b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Mar 2 05:05:42 2016 -0800

    Speedup mmo and pe orphan placement for relocatable link
    
    Since there is no need to place output sections in specific order for
    relocatable link, we can skip merging flags of other input sections.
    
    	PR ld/19739
    	* emultempl/mmo.em (mmo_place_orphan): Don't merge flags of other
    	input sections for relocatable link.
    	* emultempl/pe.em (gld_${EMULATION_NAME}_place_orphan): Likewise.
    	* emultempl/pep.em (gld_${EMULATION_NAME}_place_orphan): Likewise.
Comment 15 cvs-commit@gcc.gnu.org 2016-03-04 14:47:41 UTC
The binutils-2_26-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cb09f0f53e66991e50a2482f0d79492c824d3bda

commit cb09f0f53e66991e50a2482f0d79492c824d3bda
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Feb 29 11:04:22 2016 -0800

    Speedup orphan placement for relocatable link
    
    Since there is no need to place output sections in specific order for
    relocatable link, we can skip merging flags of other input sections.
    
    Backport from master
    
    	PR ld/19739
    	* emultempl/elf32.em (gld${EMULATION_NAME}_place_orphan): Don't
    	merge flags of other input sections for relocatable link.
    	* emultempl/mmo.em (mmo_place_orphan): Likewise.
    	* emultempl/pe.em (gld_${EMULATION_NAME}_place_orphan): Likewise.
    	* emultempl/pep.em (gld_${EMULATION_NAME}_place_orphan): Likewise.
Comment 16 H.J. Lu 2016-03-04 14:49:14 UTC
Fixed on master and 2.26 branch.