Created attachment 10840 [details] aarch64 assembler testcase Hi. After the following patch in GCC land, providing finer grained .loc directives for inline functions, assembler time is significantly higher: https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00481.html tor:~/bld/aarch64-combined/gcc$ time ./as poly-big.s real 0m20.108s user 0m19.847s sys 0m0.203s tor:~/bld/aarch64-combined/gcc$ time ./as poly-small.s real 0m0.622s user 0m0.574s sys 0m0.046s tor:~/bld/aarch64-combined/gcc$ grep \.loc poly-big.s|wc 103773 784669 4036367 tor:~/bld/aarch64-combined/gcc$ grep \.loc poly-small.s|wc 30423 210225 1096181 I am reporting this, just in case the binutils team would like this addressed. This was originally reported as PR84408 in GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84408
gas is generating enormous chains of symbols. (gdb) set args ../pr22870-big.s (gdb) r Starting program: /home/alan/build/gas/aarch64-linux/gas/as-new ../pr22870-big.s C-c C-c Program received signal SIGINT, Interrupt. 0x0000000000423277 in S_GET_SEGMENT (s=<optimised out>) at /home/alan/src/binutils-gdb/gas/symbols.c:2221 2221 return s->bsym->section; (gdb) up #1 resolve_symbol_value (symp=0xb93b70, symp@entry=0xb93ab0) at /home/alan/src/binutils-gdb/gas/symbols.c:1177 1177 seg_left = S_GET_SEGMENT (add_symbol); (gdb) p *add_symbol $1 = {sy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, bsym = 0x7df660, sy_value = {X_add_symbol = 0xb938a0, X_op_symbol = 0xb93970, X_add_number = 0, X_op = O_illegal, X_unsigned = 0, X_extrabit = 0, X_md = 0}, sy_next = 0x1304c, sy_previous = 0x0, sy_frag = 0x0, sy_obj = {local = 12140608, size = 0x0, versioned_name = 0x0}, sy_tc = 0} (gdb) p *((struct local_symbol *)add_symbol) $2 = {lsy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, lsy_section = 0x7df660, lsy_name = 0xb938a0 ".LVU4166", u = {lsy_frag = 0xb93970, lsy_sym = 0xb93970}, lsy_value = 0} (gdb) up #2 0x0000000000422d80 in resolve_symbol_value (symp=0xb96020, symp@entry=0xb95f60) at /home/alan/src/binutils-gdb/gas/symbols.c:1176 1176 left = resolve_symbol_value (add_symbol); (gdb) p *add_symbol $3 = {sy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, bsym = 0x7df660, sy_value = {X_add_symbol = 0xb93aa0, X_op_symbol = 0xb93b70, X_add_number = 0, X_op = O_illegal, X_unsigned = 0, X_extrabit = 0, X_md = 0}, sy_next = 0x1304c, sy_previous = 0x0, sy_frag = 0x0, sy_obj = {local = 12140800, size = 0x0, versioned_name = 0x0}, sy_tc = 0} (gdb) p *((struct local_symbol *)add_symbol) $4 = {lsy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, lsy_section = 0x7df660, lsy_name = 0xb93aa0 ".LVU4167", u = {lsy_frag = 0xb93b70, lsy_sym = 0xb93b70}, lsy_value = 0} (gdb) up 99999 #4663 main (argc=<optimised out>, argv=<optimised out>) at /home/alan/src/binutils-gdb/gas/as.c:1283 1283 perform_an_assembly_pass (argc, argv); (gdb) down #4662 0x0000000000403fb8 in perform_an_assembly_pass (argv=0x7c8e98, argc=<optimised out>) at /home/alan/src/binutils-gdb/gas/as.c:1161 1161 read_a_source_file (*argv); (gdb) #4661 0x000000000041ec4b in read_a_source_file (name=<optimised out>) at /home/alan/src/binutils-gdb/gas/read.c:1148 1148 (*pop->poc_handler) (pop->poc_val); (gdb) #4660 0x000000000041c32f in cons_worker (nbytes=8, rva=0) at /home/alan/src/binutils-gdb/gas/read.c:4081 4081 ret = TC_PARSE_CONS_EXPRESSION (&exp, (unsigned int) nbytes); (gdb) #4659 0x000000000040f49d in expr (rankarg=rankarg@entry=0, resultP=resultP@entry=0x7fffffffda80, mode=mode@entry=expr_normal) at /home/alan/src/binutils-gdb/gas/expr.c:1807 1807 retval = operand (resultP, mode); (gdb) #4658 0x000000000040f21f in operand (expressionP=expressionP@entry=0x7fffffffda80, mode=mode@entry=expr_normal) at /home/alan/src/binutils-gdb/gas/expr.c:1357 1357 expressionP->X_add_number = S_GET_VALUE (symbolP); (gdb) p *symbolP $5 = {sy_flags = {sy_local_symbol = 0, sy_written = 0, sy_resolved = 0, sy_resolving = 1, sy_used_in_reloc = 0, sy_used = 1, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, bsym = 0xf7fc70, sy_value = {X_add_symbol = 0xf81fd0, X_op_symbol = 0x0, X_add_number = 1, X_op = O_symbol, X_unsigned = 1, X_extrabit = 0, X_md = 0}, sy_next = 0xf827c0, sy_previous = 0xf82290, sy_frag = 0x7c6f60 <zero_address_frag>, sy_obj = {local = 0, size = 0x0, versioned_name = 0x0}, sy_tc = 0} (gdb) p *symbolP->bsym $6 = {the_bfd = 0x7de110, name = 0xf82300 ".LVU8808", value = 0, flags = 0, section = 0x7a0590 <_bfd_std_section+560>, udata = {p = 0x0, i = 0}} (gdb) down #4657 0x000000000042386c in S_GET_VALUE (s=<optimised out>, s@entry=0xf82310) at /home/alan/src/binutils-gdb/gas/symbols.c:1964 1964 valueT val = resolve_symbol_value (s); (gdb) #4656 0x0000000000422d80 in resolve_symbol_value (symp=symp@entry=0xf82310) at /home/alan/src/binutils-gdb/gas/symbols.c:1176 1176 left = resolve_symbol_value (add_symbol); (gdb) p *add_symbol $7 = {sy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, bsym = 0x7df660, sy_value = {X_add_symbol = 0xf81fc0, X_op_symbol = 0xf82090, X_add_number = 0, X_op = O_illegal, X_unsigned = 0, X_extrabit = 0, X_md = 0}, sy_next = 0x1304c, sy_previous = 0x0, sy_frag = 0x0, sy_obj = {local = 16251728, size = 0x0, versioned_name = 0x0}, sy_tc = 0} (gdb) p *((struct local_symbol *)add_symbol) $8 = {lsy_flags = {sy_local_symbol = 1, sy_written = 0, sy_resolved = 0, sy_resolving = 0, sy_used_in_reloc = 0, sy_used = 0, sy_volatile = 0, sy_forward_ref = 0, sy_mri_common = 0, sy_weakrefr = 0, sy_weakrefd = 0}, lsy_section = 0x7df660, lsy_name = 0xf81fc0 ".LVU8807", u = {lsy_frag = 0xf82090, lsy_sym = 0xf82090}, lsy_value = 0} (gdb) So .LVU8808 is defined from .LVU8807, which is defined from ... all the way down to at least .LVU4166 (where the symbol resolution was interrupted) and I'll bet down to .LVU1. What's more, this symbol resolution is happening before finalize_syms is set so the value calculated for .LVU8808 won't be saved and thus won't cut short the recursion when gas wants the value for .LVU8809.
I suspect the source of the problem is the lack of view reset asserts. Without that, the assembler gets an uninterrupted chain of symbolic views, in which each view is computed based on the view before it and the offset between the addresses in which they were issued. Depending on the order in which the assembler attempts to resolve each view to a constant, it could get expensive. It was never meant for this kind of use: it was expected that most views would be zero-asserts, so this wouldn't arise. The issuance of view asserts was disabled in GCC, triggering this undesirable behavior in the assembler, but I'm not sure I'd consider it a bug in gas. GCC can and should avoid these long chains, issuing view reset asserts after insns known to generate code. It has code to do so, but it's disabled because a few targets had wrong lengths and need auditing and possibly overriding in a target hook.
> GCC can and should avoid these long chains, issuing view > reset asserts after insns known to generate code. That suggest a solution. The assembler certainly knowns when code has been generated, so can it "reset the view"? dwarf2_emit_insn comes to mind as a likely place to implement this.
IIRC it already does that; it only emits symbolic expressions when, at the time it sees the .loc directive, it can't yet determine whether . > .LVL(#-1) (IOW, have we advanced PC since the previous view in the same subsection?). If we can do more to resolve these early, we will certainly cut short the expression bloat and dep chain. Now, even if we can't determine that earlier, we could still do better. We multiply the negation of the above by (.LVU# + 1) (IOW, the successor to the previous view). If the multiplication could take a shortcut if one of the operands is found to be zero (it often will be, in this particular case) and resolve the result to zero (not necessarily for good, since relaxing might turn an align to nothing), we'd get much shorter evaluations overall. If optimizing multiplication is no good, maybe we could introduce a ternary operator for internal use, for use in this case? All that said, GCC will soon cut the chains short at function entry points, where it will force a view reset. This should shrink the chain lengths significantly, but I guess we can still use any of the suggested above to improve it so we only hit the O(2^n) case with small n.
Mine, just awaiting some feedback on the proposed courses of action.
Uhh... Do you still observe this slow down? We've fixed other instances of this at various times. Please reopen if it's still observable.