Since the fix for bug 22874 (commit 3ae729d5), gas time and memory usage have increased about 5 times on very large files generated by LTO.

Before:

gas/as-new: total time in assembly: 215.922071
frag chains:
    0x1a78d60 .text                1143697 frags
    0x1a78df8 .data                    748 frags
    0x1a78e90 .bss                    5393 frags
    0x1a79518 .bss                    4466 frags
    0x1a79188 .rodata                50145 frags
    0x1a79220 .rodata.str1.1           152 frags
    0x1a792b8 .text.unlikely          1429 frags
    0x1a79350 .rodata.str1.8         14809 frags
    0x1a793e8 .text.startup            789 frags
    0x1a79480 .init_array                3 frags
    0x1a795b0 .rodata.cst8              27 frags
    0x1a79648 .rodata.cst16            109 frags
    0x1a796e0 .rodata.cst4               7 frags
    0x1a79778 .comment                   2 frags
    0x1a79810 .note.GNU-stack            2 frags
    0x1a798a8 .eh_frame             103243 frags
fixups: 1254022
637223 mini local symbols created, 446255 converted
Maximum resident set size (kbytes): 896072

After:

gas/as-new: total time in assembly: 1025.439759
frag chains:
    0x992d60 .text                12155234 frags
    0x992df8 .data                     748 frags
    0x992e90 .bss                     5393 frags
    0x993518 .bss                     4466 frags
    0x993188 .rodata                 50145 frags
    0x993220 .rodata.str1.1            152 frags
    0x9932b8 .text.unlikely          38334 frags
    0x993350 .rodata.str1.8          14809 frags
    0x9933e8 .text.startup           15250 frags
    0x993480 .init_array                 3 frags
    0x9935b0 .rodata.cst8               27 frags
    0x993648 .rodata.cst16             109 frags
    0x9936e0 .rodata.cst4                7 frags
    0x993778 .comment                    2 frags
    0x993810 .note.GNU-stack             2 frags
    0x9938a8 .eh_frame              102827 frags
fixups: 1254022
Maximum resident set size (kbytes): 3831980

(The source is a 150 MB .s file generated by the LTO final link for cc1plus, 11 MB compressed; it did not fit the attachment limit.)
The problem can be reproduced with a synthetic testcase generated like this:

    yes .p2align 5 | head -n500000 > bug.s
And the original test-case can be found here: https://drive.google.com/file/d/1_63e0GbykhVAr_ZubOP4YuiZOW2ThoGl/view?usp=sharing
Created attachment 11598
Please try this
(In reply to H.J. Lu from comment #3)

Thanks! I have benchmarked different MAX_MEM_FOR_RS_ALIGN_CODE values and here are the results.

Synthetic testcase from comment 1:

                 time, s   maxrss, MB
------------------------------------
before   31        0.279         85
after    31        0.278         85
after    63        0.295        100
after   127        0.356        137
after   255        0.447        202
after   511        0.546        331
after  1023        0.838        655
after  2047        2.867       2029
after  4095        5.296       3974

Original cc1plus.s from comment 2:

                 time, s   maxrss, MB
------------------------------------
before   31          188        875
after    31          187        876
after    63          193        888
after   127          204        912
after   255          217        963
after   511          236       1071
after  1023          256       1290
after  2047          547       2256
after  4095          906       3742
Created attachment 11600
An updated patch
(In reply to H.J. Lu from comment #5)
> An updated patch

This patch fixes the problem on both test cases:

             time, s   maxrss, MB
------------------------------------
bug.s          0.261         74
cc1plus.s        189        875
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=db22231044df03bbcb987496f3f29f0462b2e9ee

commit db22231044df03bbcb987496f3f29f0462b2e9ee
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT

    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800

        x86: Rewrite NOP generation for fill and alignment

    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095, which increased assembler
    time and memory usage by 5 times for inputs with many .p2align
    directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check
    the maximum alignment for each rs_align_code frag.  Wall time to
    assemble the same cc1plus.s:

    before: 423.78user 0.89system 7:05.71elapsed 99%CPU
    after:  102.35user 0.27system 1:42.89elapsed 99%CPU

        PR gas/24165
        * frags.c (frag_var_init): Pass max_chars to TC_FRAG_INIT as
        max_bytes.
        * config/tc-aarch64.h (TC_FRAG_INIT): Add and pass max_bytes to
        aarch64_init_frag.
        * config/tc-arm.h (TC_FRAG_INIT): Add and pass max_bytes to
        arm_init_frag.
        * config/tc-avr.h (TC_FRAG_INIT): Add and ignore max_bytes.
        * config/tc-ia64.h (TC_FRAG_INIT): Likewise.
        * config/tc-mmix.h (TC_FRAG_INIT): Likewise.
        * config/tc-nds32.h (TC_FRAG_INIT): Likewise.
        * config/tc-ns32k.h (TC_FRAG_INIT): Likewise.
        * config/tc-rl78.h (TC_FRAG_INIT): Likewise.
        * config/tc-rx.h (TC_FRAG_INIT): Likewise.
        * config/tc-score.h (TC_FRAG_INIT): Likewise.
        * config/tc-tic54x.h (TC_FRAG_INIT): Likewise.
        * config/tc-tic6x.h (TC_FRAG_INIT): Likewise.
        * config/tc-xtensa.h (TC_FRAG_INIT): Likewise.
        * config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
        (alignment ? ((1 << alignment) - 1) : 1)
        (i386_tc_frag_data): Add max_bytes.
        (TC_FRAG_INIT): Add and track max_bytes.
        (HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
        fragP->tc_frag_data.max_bytes.
        * doc/internals.texi: Update TC_FRAG_TYPE with max_bytes.
The binutils-2_32-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d8699a0b89a3588ba049befa2249cd65f00fa195

commit d8699a0b89a3588ba049befa2249cd65f00fa195
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT

    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800

        x86: Rewrite NOP generation for fill and alignment

    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095, which increased assembler
    time and memory usage by 5 times for inputs with many .p2align
    directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check
    the maximum alignment for each rs_align_code frag.  Wall time to
    assemble the same cc1plus.s:

    before: 423.78user 0.89system 7:05.71elapsed 99%CPU
    after:  102.35user 0.27system 1:42.89elapsed 99%CPU

        PR gas/24165
        * config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
        (alignment ? ((1 << alignment) - 1) : 1)
        (i386_tc_frag_data): Add max_bytes.
        (TC_FRAG_INIT): Track max_chars in max_bytes.
        (HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
        fragP->tc_frag_data.max_bytes.

    (cherry picked from commit db22231044df03bbcb987496f3f29f0462b2e9ee)
The binutils-2_32-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6b6ff72600fca30fd116a6fed7af39b9c3656004

commit 6b6ff72600fca30fd116a6fed7af39b9c3656004
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 05:55:07 2019 -0800

    Add ChangeLog entries for PR gas/24165
The binutils-2_31-branch branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=292144de4f26e2c6a1fa9d84da2c7758872d4725

commit 292144de4f26e2c6a1fa9d84da2c7758872d4725
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Feb 10 04:34:10 2019 -0800

    gas: Pass max_bytes to TC_FRAG_INIT

    commit 3ae729d5a4f63740ed9a778960b17c2912b0bbdd
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Wed Mar 7 04:18:45 2018 -0800

        x86: Rewrite NOP generation for fill and alignment

    increased MAX_MEM_FOR_RS_ALIGN_CODE to 4095, which increased assembler
    time and memory usage by 5 times for inputs with many .p2align
    directives, which is typical for LTO output.  This patch passes
    max_bytes to TC_FRAG_INIT so that MAX_MEM_FOR_RS_ALIGN_CODE can be set
    as needed and tracked by the backend, so that HANDLE_ALIGN can check
    the maximum alignment for each rs_align_code frag.  Wall time to
    assemble the same cc1plus.s:

    before: 423.78user 0.89system 7:05.71elapsed 99%CPU
    after:  102.35user 0.27system 1:42.89elapsed 99%CPU

        PR gas/24165
        * config/tc-i386.h (MAX_MEM_FOR_RS_ALIGN_CODE): Set to
        (alignment ? ((1 << alignment) - 1) : 1)
        (i386_tc_frag_data): Add max_bytes.
        (TC_FRAG_INIT): Track max_chars in max_bytes.
        (HANDLE_ALIGN): Replace MAX_MEM_FOR_RS_ALIGN_CODE with
        fragP->tc_frag_data.max_bytes.

    (cherry picked from commit db22231044df03bbcb987496f3f29f0462b2e9ee)
Fixed for 2.33 and on 2.31/2.32 branches.