Bug 24165 - fivefold time and memory usage since commit 3ae729d5 on large files generated by lto
Summary: fivefold time and memory usage since commit 3ae729d5 on large files generated...
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: gas
Version: 2.33
Importance: P2 normal
Target Milestone: 2.33
Assignee: Not yet assigned to anyone
Reported: 2019-02-04 14:59 UTC by Serge Belyshev
Modified: 2019-02-10 14:16 UTC (History)
Last reconfirmed: 2019-02-07 00:00:00


Attachments
Please try this (741 bytes, patch)
2019-02-07 22:48 UTC, H.J. Lu
An updated patch (1.48 KB, patch)
2019-02-09 04:46 UTC, H.J. Lu

Description Serge Belyshev 2019-02-04 14:59:31 UTC
Since the fix for bug 22874 (commit 3ae729d5), gas time and memory usage have increased about fivefold on very large files generated by LTO.

Before:

gas/as-new: total time in assembly: 215.922071
frag chains:
	0x1a78d60 .text     	   1143697 frags
	0x1a78df8 .data     	       748 frags
	0x1a78e90 .bss      	      5393 frags
	0x1a79518 .bss      	      4466 frags
	0x1a79188 .rodata   	     50145 frags
	0x1a79220 .rodata.str1.1	       152 frags
	0x1a792b8 .text.unlikely	      1429 frags
	0x1a79350 .rodata.str1.8	     14809 frags
	0x1a793e8 .text.startup	       789 frags
	0x1a79480 .init_array	         3 frags
	0x1a795b0 .rodata.cst8	        27 frags
	0x1a79648 .rodata.cst16	       109 frags
	0x1a796e0 .rodata.cst4	         7 frags
	0x1a79778 .comment  	         2 frags
	0x1a79810 .note.GNU-stack	         2 frags
	0x1a798a8 .eh_frame 	    103243 frags
fixups: 1254022
637223 mini local symbols created, 446255 converted

	Maximum resident set size (kbytes): 896072


After:

gas/as-new: total time in assembly: 1025.439759
frag chains:
	0x992d60 .text     	  12155234 frags
	0x992df8 .data     	       748 frags
	0x992e90 .bss      	      5393 frags
	0x993518 .bss      	      4466 frags
	0x993188 .rodata   	     50145 frags
	0x993220 .rodata.str1.1	       152 frags
	0x9932b8 .text.unlikely	     38334 frags
	0x993350 .rodata.str1.8	     14809 frags
	0x9933e8 .text.startup	     15250 frags
	0x993480 .init_array	         3 frags
	0x9935b0 .rodata.cst8	        27 frags
	0x993648 .rodata.cst16	       109 frags
	0x9936e0 .rodata.cst4	         7 frags
	0x993778 .comment  	         2 frags
	0x993810 .note.GNU-stack	         2 frags
	0x9938a8 .eh_frame 	    102827 frags
fixups: 1254022

	Maximum resident set size (kbytes): 3831980


(The source is a 150 MB .s file generated by the LTO final link for cc1plus; at 11 MB compressed, it did not fit the attachment limit.)
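For reference, the growth factors implied by the two runs above can be worked out directly. This is only an arithmetic sketch over the figures quoted in the description; the dictionary keys are made-up labels, not gas output fields. It comes to roughly 4.7x for time and 4.3x for peak memory, while the .text frag count grows by more than 10x.

```python
# Figures copied from the "Before" and "After" statistics in the description.
# The key names are illustrative labels chosen for this sketch.
before = {"time_s": 215.922071, "maxrss_kb": 896072, "text_frags": 1143697}
after = {"time_s": 1025.439759, "maxrss_kb": 3831980, "text_frags": 12155234}


def ratios(old, new):
    """Return the after/before growth factor for every metric in `old`."""
    return {key: new[key] / old[key] for key in old}


if __name__ == "__main__":
    for key, value in ratios(before, after).items():
        print(f"{key}: {value:.2f}x")
```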
Comment 1 Serge Belyshev 2019-02-04 21:03:14 UTC
The problem can be reproduced with a synthetic testcase generated like this:

yes .p2align 5 | head -n500000 > bug.s
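Where `yes` and `head` are not available, the same file can be produced with a few lines of Python. This is a sketch of an equivalent generator, not part of the original report; the function names are hypothetical.

```python
def synthetic_testcase(count=500000, directive=".p2align 5"):
    """Return `count` copies of `directive`, one per line, as the shell pipeline does."""
    return (directive + "\n") * count


def write_testcase(path="bug.s", count=500000):
    """Write the synthetic testcase to `path` (the file name used in the report)."""
    with open(path, "w") as out:
        out.write(synthetic_testcase(count))
```

Calling `write_testcase()` should leave a `bug.s` in the current directory with 500000 identical `.p2align 5` lines.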
Comment 2 Martin Liška 2019-02-05 07:36:42 UTC
And the original test-case can be found here:
https://drive.google.com/file/d/1_63e0GbykhVAr_ZubOP4YuiZOW2ThoGl/view?usp=sharing
Comment 3 H.J. Lu 2019-02-07 22:48:44 UTC
Created attachment 11598
Please try this
Comment 4 Serge Belyshev 2019-02-08 09:37:11 UTC
(In reply to H.J. Lu from comment #3)

Thanks! I have benchmarked different values of MAX_MEM_FOR_RS_ALIGN_CODE; here are the results.

synthetic testcase from comment 1:

build   MAX_MEM   time, s   maxrss, MB
--------------------------------------
before       31     0.279           85
after        31     0.278           85
after        63     0.295          100
after       127     0.356          137
after       255     0.447          202
after       511     0.546          331
after      1023     0.838          655
after      2047     2.867         2029
after      4095     5.296         3974


original cc1plus.s from comment 2:

build   MAX_MEM   time, s   maxrss, MB
--------------------------------------
before       31       188          875
after        31       187          876
after        63       193          888
after       127       204          912
after       255       217          963
after       511       236         1071
after      1023       256         1290
after      2047       547         2256
after      4095       906         3742
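The comment does not say how the time and maxrss figures were collected. One way to gather comparable numbers is sketched below, assuming a Unix host where Python's `resource` module is available; the `measure` helper and the example command line are illustrative, not the reporter's actual setup.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run `cmd` to completion and return (wall_seconds, peak_rss_of_children).

    ru_maxrss for RUSAGE_CHILDREN is the largest peak RSS among all child
    processes waited for so far, so use a fresh Python process per
    measurement when comparing builds. On Linux the unit is kilobytes.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python measure.py gas/as-new -o /dev/null bug.s
    seconds, peak_kb = measure(sys.argv[1:])
    print(f"time {seconds:.3f} s, maxrss {peak_kb / 1024:.0f} MB")
```

Running each assembler build in its own invocation keeps the peak-RSS reading from being contaminated by an earlier, larger child.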
Comment 5 H.J. Lu 2019-02-09 04:46:41 UTC
Created attachment 11600
An updated patch
Comment 6 Serge Belyshev 2019-02-09 06:31:53 UTC
(In reply to H.J. Lu from comment #5)
> An updated patch

This patch fixes the problem on both test cases: 

file        time, s   maxrss, MB
--------------------------------
bug.s         0.261           74
cc1plus.s       189          875
Comment 7 Sourceware Commits 2019-02-10 12:36:26 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=db22231044df03bbcb987496f3f29f0462b2e9ee

commit db22231044df03bbcb987496f3f29f0462b2e9ee
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT
    
    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800
    
        x86: Rewrite NOP generation for fill and alignment
    
    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095 which resulted in increase
    of assembler time and memory usage by 5 times for inputs with many
    .p2align directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check the
    maximum alignment for each rs_align_code frag.  Wall time to assemble
    the same cc1plus.s:
    
    before:
    
    423.78user 0.89system 7:05.71elapsed 99%CPU
    
    after:
    
    102.35user 0.27system 1:42.89elapsed 99%CPU
    
    	PR gas/24165
    	* frags.c (frag_var_init): Pass max_chars to TC_FRAG_INIT as
    	max_bytes.
    	* config/tc-aarch64.h (TC_FRAG_INIT): Add and pass max_bytes to
    	aarch64_init_frag.
    	* config/tc-arm.h (TC_FRAG_INIT): Add and pass max_bytes to
    	arm_init_frag.
    	* config/tc-avr.h (TC_FRAG_INIT): Add and ignore max_bytes.
    	* config/tc-ia64.h (TC_FRAG_INIT): Likewise.
    	* config/tc-mmix.h (TC_FRAG_INIT): Likewise.
    	* config/tc-nds32.h (TC_FRAG_INIT): Likewise.
    	* config/tc-ns32k.h (TC_FRAG_INIT): Likewise.
    	* config/tc-rl78.h (TC_FRAG_INIT): Likewise.
    	* config/tc-rx.h (TC_FRAG_INIT): Likewise.
    	* config/tc-score.h (TC_FRAG_INIT): Likewise.
    	* config/tc-tic54x.h (TC_FRAG_INIT): Likewise.
    	* config/tc-tic6x.h (TC_FRAG_INIT): Likewise.
    	* config/tc-xtensa.h (TC_FRAG_INIT): Likewise.
    	* config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
    	(alignment ? ((1 << alignment) - 1) : 1)
    	(i386_tc_frag_data): Add max_bytes.
    	(TC_FRAG_INIT): Add and track max_bytes.
    	(HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
    	fragP->tc_frag_data.max_bytes.
    	* doc/internals.texi: Update TC_FRAG_TYPE with max_bytes.
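To illustrate why the new definition helps, here is a Python rendering of the expression from the ChangeLog entry, `(alignment ? ((1 << alignment) - 1) : 1)`, compared with the fixed 4095 that the commit message says was in effect before. The helper names are hypothetical, and the model (each alignment directive accounts for at most that many bytes) is a simplifying assumption, not a description of the actual gas allocator.

```python
OLD_LIMIT = 4095  # fixed MAX_MEM_FOR_RS_ALIGN_CODE named in the commit message


def max_mem_for_rs_align_code(alignment):
    """Python form of the C expression (alignment ? ((1 << alignment) - 1) : 1)."""
    return (1 << alignment) - 1 if alignment else 1


def worst_case_bytes(alignments, fixed_limit=None):
    """Sum the per-directive upper bound over a list of .p2align arguments.

    With `fixed_limit` set, every directive is charged that constant
    (the old behaviour in this simplified model); otherwise each directive
    is charged only what its own alignment can require.
    """
    if fixed_limit is not None:
        return fixed_limit * len(alignments)
    return sum(max_mem_for_rs_align_code(a) for a in alignments)


if __name__ == "__main__":
    directives = [5] * 500000  # the synthetic testcase: 500000 x ".p2align 5"
    print("per-alignment bound:", worst_case_bytes(directives))
    print("fixed 4095 bound:   ", worst_case_bytes(directives, OLD_LIMIT))
```

In this model, `.p2align 5` needs at most 31 bytes of padding, so charging every directive 4095 bytes overstates the bound by a factor of about 132 on the synthetic testcase.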
Comment 8 Sourceware Commits 2019-02-10 13:16:05 UTC
The binutils-2_32-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d8699a0b89a3588ba049befa2249cd65f00fa195

commit d8699a0b89a3588ba049befa2249cd65f00fa195
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT
    
    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800
    
        x86: Rewrite NOP generation for fill and alignment
    
    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095 which resulted in increase
    of assembler time and memory usage by 5 times for inputs with many
    .p2align directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check the
    maximum alignment for each rs_align_code frag.  Wall time to assemble
    the same cc1plus.s:
    
    before:
    
    423.78user 0.89system 7:05.71elapsed 99%CPU
    
    after:
    
    102.35user 0.27system 1:42.89elapsed 99%CPU
    
    	PR gas/24165
    	* config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
    	(alignment ? ((1 << alignment) - 1) : 1)
    	(i386_tc_frag_data): Add max_bytes.
    	(TC_FRAG_INIT): Track max_chars in max_bytes.
    	(HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
    	fragP->tc_frag_data.max_bytes.
    
    (cherry picked from commit db22231044df03bbcb987496f3f29f0462b2e9ee)
Comment 9 Sourceware Commits 2019-02-10 13:56:18 UTC
The binutils-2_32-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6b6ff72600fca30fd116a6fed7af39b9c3656004

commit 6b6ff72600fca30fd116a6fed7af39b9c3656004
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 05:55:07 2019 -0800

    Add ChangeLog entries for PR gas/24165
Comment 10 Sourceware Commits 2019-02-10 14:08:11 UTC
The binutils-2_31-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=292144de4f26e2c6a1fa9d84da2c7758872d4725

commit 292144de4f26e2c6a1fa9d84da2c7758872d4725
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT
    
    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800
    
        x86: Rewrite NOP generation for fill and alignment
    
    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095 which resulted in increase
    of assembler time and memory usage by 5 times for inputs with many
    .p2align directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check the
    maximum alignment for each rs_align_code frag.  Wall time to assemble
    the same cc1plus.s:
    
    before:
    
    423.78user 0.89system 7:05.71elapsed 99%CPU
    
    after:
    
    102.35user 0.27system 1:42.89elapsed 99%CPU
    
    	PR gas/24165
    	* config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
    	(alignment ? ((1 << alignment) - 1) : 1)
    	(i386_tc_frag_data): Add max_bytes.
    	(TC_FRAG_INIT): Track max_chars in max_bytes.
    	(HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
    	fragP->tc_frag_data.max_bytes.
    
    (cherry picked from commit db22231044df03bbcb987496f3f29f0462b2e9ee)
Comment 11 H.J. Lu 2019-02-10 14:16:58 UTC
Fixed for 2.33 and on 2.31/2.32 branches.