[PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]

Thu Mar 25 21:43:57 GMT 2021

On 3/24/21 5:31 AM, Jakub Jelinek wrote:
> Hi!
> 
> On the testcase in the PR with
> -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
> we have prologue:
> 0000000000000000 <_func_with_dwarf_issue_>:
>     0:	4c 8d 54 24 08       	lea    0x8(%rsp),%r10
>     5:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
>     9:	41 ff 72 f8          	pushq  -0x8(%r10)
>     d:	55                   	push   %rbp
>     e:	48 89 e5             	mov    %rsp,%rbp
>    11:	41 57                	push   %r15
>    13:	41 56                	push   %r14
>    15:	41 55                	push   %r13
>    17:	41 54                	push   %r12
>    19:	41 52                	push   %r10
>    1b:	53                   	push   %rbx
>    1c:	48 83 ec 20          	sub    $0x20,%rsp
> and emit
> 00000000 0000000000000014 00000000 CIE
>    Version:               1
>    Augmentation:          "zR"
>    Code alignment factor: 1
>    Data alignment factor: -8
>    Return address column: 16
>    Augmentation data:     1b
>    DW_CFA_def_cfa: r7 (rsp) ofs 8
>    DW_CFA_offset: r16 (rip) at cfa-8
>    DW_CFA_nop
>    DW_CFA_nop
> 
> 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
>    DW_CFA_advance_loc: 5 to 0000000000000005
>    DW_CFA_def_cfa: r10 (r10) ofs 0
>    DW_CFA_advance_loc: 9 to 000000000000000e
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 13 to 000000000000001b
>    DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
>    DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>    DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
>    DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
>    DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
> ...
> unwind info for that.  The problem is when async signal
> (or stepping through in the debugger) stops after the pushq %rbp
> instruction and before movq %rsp, %rbp, the unwind info says that
> caller's %rbp is saved there at *%rbp, but that is not true, caller's
> %rbp is either still available in the %rbp register, or in *%rsp,
> only after executing the next instruction - movq %rsp, %rbp - the
> location for %rbp is correct.  So, either we'd need to temporarily
> say:
>    DW_CFA_advance_loc: 9 to 000000000000000e
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
>    DW_CFA_advance_loc: 3 to 0000000000000011
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 10 to 000000000000001b
> or to me it seems more compact to just say:
>    DW_CFA_advance_loc: 12 to 0000000000000011
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 10 to 000000000000001b
> 
> The following patch implements the latter.  It keeps the call/jump and
> flush queue handling as is and only moves the clobbers_queued_reg_save
> forced flushes from before the instruction to immediately after the first
> instruction.  My reading of libgcc unwinder is that while for calls it
> executes cfi instruction until < pc (but pc is the end of call anyway),
> for async signal stops it executes cfi instruction until <= pc, so it
> will only omit cfi instructions starting with advance to a higher loc.
> On the testcase from the PR, the difference with the patch is:
> @@ -16,9 +16,9 @@ Contents of the .eh_frame section:
>   00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
>     DW_CFA_advance_loc: 5 to 0000000000000005
>     DW_CFA_def_cfa: r10 (r10) ofs 0
> -  DW_CFA_advance_loc: 9 to 000000000000000e
> +  DW_CFA_advance_loc: 12 to 0000000000000011
>     DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> -  DW_CFA_advance_loc: 13 to 000000000000001b
> +  DW_CFA_advance_loc: 10 to 000000000000001b
>     DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
>     DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>     DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
> When looking at other changes, e.g. on cc1plus itself when it has
> identical objdump -dr, the without patch to with patch difference
> is e.g.:
>   0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
>     DW_CFA_advance_loc: 1 to 0000000000e2fc81
>     DW_CFA_def_cfa_offset: 16
>     DW_CFA_offset: r6 (rbp) at cfa-16
>     DW_CFA_advance_loc: 3 to 0000000000e2fc84
>     DW_CFA_def_cfa_register: r6 (rbp)
> -  DW_CFA_advance_loc: 6 to 0000000000e2fc8a
> +  DW_CFA_advance_loc: 9 to 0000000000e2fc8d
>     DW_CFA_offset: r15 (r15) at cfa-24
>     DW_CFA_offset: r14 (r14) at cfa-32
>     DW_CFA_offset: r13 (r13) at cfa-40
> -  DW_CFA_advance_loc: 6 to 0000000000e2fc90
> +  DW_CFA_advance_loc: 5 to 0000000000e2fc92
>     DW_CFA_offset: r12 (r12) at cfa-48
>     DW_CFA_offset: r3 (rbx) at cfa-56
> -  DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
> +  DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
>     DW_CFA_remember_state
>     DW_CFA_def_cfa: r7 (rsp) ofs 8
>     DW_CFA_advance_loc: 1 to 0000000000e2fe20
>     DW_CFA_restore_state
>     DW_CFA_nop
>     DW_CFA_nop
> for
> 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
>    e2fc80:       55                      push   %rbp
>    e2fc81:       48 89 e5                mov    %rsp,%rbp
>    e2fc84:       41 57                   push   %r15
>    e2fc86:       41 56                   push   %r14
>    e2fc88:       41 55                   push   %r13
>    e2fc8a:       45 31 ed                xor    %r13d,%r13d
>    e2fc8d:       41 54                   push   %r12
>    e2fc8f:       53                      push   %rbx
>    e2fc90:       31 db                   xor    %ebx,%ebx
>    e2fc92:       48 81 ec 98 02 00 00    sub    $0x298,%rsp
>    e2fc99:       89 7d c4                mov    %edi,-0x3c(%rbp)
> ...
> prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
> starts applying before or after the xorl %r13d,%r13d instruction
> and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
> before or after the xorl %ebx,%ebx instruction.
> Or:
> -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
> +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
>     DW_CFA_advance_loc: 1 to 0000000000e3d161
>     DW_CFA_def_cfa_offset: 16
>     DW_CFA_offset: r6 (rbp) at cfa-16
>     DW_CFA_advance_loc: 9 to 0000000000e3d16a
>     DW_CFA_def_cfa_register: r6 (rbp)
> -  DW_CFA_advance_loc: 2 to 0000000000e3d16c
> +  DW_CFA_advance_loc: 5 to 0000000000e3d16f
>     DW_CFA_offset: r12 (r12) at cfa-24
> -  DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
> +  DW_CFA_advance_loc: 63 to 0000000000e3d1ae
>     DW_CFA_def_cfa: r7 (rsp) ofs 8
> -  DW_CFA_nop
> -  DW_CFA_nop
> -  DW_CFA_nop
> for:
> 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
>    e3d160:       55                      push   %rbp
>    e3d161:       48 63 c2                movslq %edx,%rax
>    e3d164:       49 89 f8                mov    %rdi,%r8
>    e3d167:       48 89 e5                mov    %rsp,%rbp
>    e3d16a:       41 54                   push   %r12
>    e3d16c:       49 89 c4                mov    %rax,%r12
>    e3d16f:       48 83 ec 08             sub    $0x8,%rsp
> again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
> before movq %rax,%r12 instruction or after it.
> I think if for some instructions it is possible that debugger or signal
> would stop in the middle of instruction, with some side-effects of the
> instruction already done and others not yet, then we can't do that,
> but at the start of instructions that modify the register the side-effects
> of the instruction should not have taken effect yet.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

LGTM.  This seems consistent with when we flush due to clobbers in 
dwarf2out_frame_debug.

> Will e.g. GDB be happy about the changes?

I would expect so, but it would be good to have someone from GDB verify.

> Thinking some more about this, what can be problematic on the GCC side
> is inline asm, that can and often will contain multiple instructions.
> For that is an easy fix, just test asm_noperands and handle
> clobbers_queued_reg_save before the insn rather than after in that case.

Sure, but does inline asm go through dwarf2out_frame_debug?

> But there is another problem, instruction patterns that emit multiple
> hw instructions, code can stop in between them.  So, do we want some
> instruction attribute that would conservatively tell us which instruction
> patterns are guaranteed to be single machine instructions?

It seems to me that in that situation you'd want to add the save to 
*%rsp, and still not update to *%rbp until after the combined instruction.

Incidentally, it seems a bit odd that clobbers_queued_reg_save returns 
true both for clobbering the saved register and for clobbering the save 
location; if I already know that %rbp is saved at *%rsp, why do I care 
that the insn we're looking at clobbers %rbp?  Maybe we should only 
check modified_in_p (q->reg, insn) if q->reg doesn't appear in 
regs_saved_in_regs.  That wouldn't help with this testcase, though.

> 2021-03-24  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR debug/99334
> 	* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
> 	forced flushes of queued register saves for non-call/jump insns
> 	that can't throw, emit them first after the insn rather than
> 	before it.
> 
> --- gcc/dwarf2cfi.c.jj	2021-03-02 11:25:47.217727061 +0100
> +++ gcc/dwarf2cfi.c	2021-03-23 17:34:58.240281522 +0100
> @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
>   	      dwarf2out_flush_queued_reg_saves ();
>   	    }
>   	  else if (!NONJUMP_INSN_P (insn)
> -		   || clobbers_queued_reg_save (insn)
>   		   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
>   	    dwarf2out_flush_queued_reg_saves ();
>   	  any_cfis_emitted = false;
>   
>   	  add_cfi_insn = insn;
> +
> +	  if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
> +	    dwarf2out_flush_queued_reg_saves ();
> +
>   	  scan_insn_after (insn);
>   	  control = insn;
>   	}
> 
> 
> 	Jakub
>