[PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
Jason Merrill
jason@redhat.com
Thu Mar 25 21:43:57 GMT 2021
On 3/24/21 5:31 AM, Jakub Jelinek wrote:
> Hi!
>
> On the testcase in the PR with
> -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
> we have prologue:
> 0000000000000000 <_func_with_dwarf_issue_>:
> 0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
> 5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
> 9: 41 ff 72 f8 pushq -0x8(%r10)
> d: 55 push %rbp
> e: 48 89 e5 mov %rsp,%rbp
> 11: 41 57 push %r15
> 13: 41 56 push %r14
> 15: 41 55 push %r13
> 17: 41 54 push %r12
> 19: 41 52 push %r10
> 1b: 53 push %rbx
> 1c: 48 83 ec 20 sub $0x20,%rsp
> and emit
> 00000000 0000000000000014 00000000 CIE
> Version: 1
> Augmentation: "zR"
> Code alignment factor: 1
> Data alignment factor: -8
> Return address column: 16
> Augmentation data: 1b
> DW_CFA_def_cfa: r7 (rsp) ofs 8
> DW_CFA_offset: r16 (rip) at cfa-8
> DW_CFA_nop
> DW_CFA_nop
>
> 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
> DW_CFA_advance_loc: 5 to 0000000000000005
> DW_CFA_def_cfa: r10 (r10) ofs 0
> DW_CFA_advance_loc: 9 to 000000000000000e
> DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> DW_CFA_advance_loc: 13 to 000000000000001b
> DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
> DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
> DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
> DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
> DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
> ...
> unwind info for that. The problem is when async signal
> (or stepping through in the debugger) stops after the pushq %rbp
> instruction and before movq %rsp, %rbp, the unwind info says that
> caller's %rbp is saved there at *%rbp, but that is not true, caller's
> %rbp is either still available in the %rbp register, or in *%rsp,
> only after executing the next instruction - movq %rsp, %rbp - the
> location for %rbp is correct. So, either we'd need to temporarily
> say:
> DW_CFA_advance_loc: 9 to 000000000000000e
> DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
> DW_CFA_advance_loc: 3 to 0000000000000011
> DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> DW_CFA_advance_loc: 10 to 000000000000001b
> or to me it seems more compact to just say:
> DW_CFA_advance_loc: 12 to 0000000000000011
> DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> DW_CFA_advance_loc: 10 to 000000000000001b
>
> The following patch implements the latter. It keeps the call/jump and
> flush queue handling as is and only moves the clobbers_queued_reg_save
> forced flushes from before the instruction to immediately after the first
> instruction. My reading of libgcc unwinder is that while for calls it
> executes cfi instruction until < pc (but pc is the end of call anyway),
> for async signal stops it executes cfi instruction until <= pc, so it
> will only omit cfi instructions starting with advance to a higher loc.
> On the testcase from the PR, the difference with the patch is:
> @@ -16,9 +16,9 @@ Contents of the .eh_frame section:
> 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
> DW_CFA_advance_loc: 5 to 0000000000000005
> DW_CFA_def_cfa: r10 (r10) ofs 0
> - DW_CFA_advance_loc: 9 to 000000000000000e
> + DW_CFA_advance_loc: 12 to 0000000000000011
> DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> - DW_CFA_advance_loc: 13 to 000000000000001b
> + DW_CFA_advance_loc: 10 to 000000000000001b
> DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
> DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
> DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
> When looking at other changes, e.g. on cc1plus itself when it has
> identical objdump -dr, the without patch to with patch difference
> is e.g.:
> 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
> DW_CFA_advance_loc: 1 to 0000000000e2fc81
> DW_CFA_def_cfa_offset: 16
> DW_CFA_offset: r6 (rbp) at cfa-16
> DW_CFA_advance_loc: 3 to 0000000000e2fc84
> DW_CFA_def_cfa_register: r6 (rbp)
> - DW_CFA_advance_loc: 6 to 0000000000e2fc8a
> + DW_CFA_advance_loc: 9 to 0000000000e2fc8d
> DW_CFA_offset: r15 (r15) at cfa-24
> DW_CFA_offset: r14 (r14) at cfa-32
> DW_CFA_offset: r13 (r13) at cfa-40
> - DW_CFA_advance_loc: 6 to 0000000000e2fc90
> + DW_CFA_advance_loc: 5 to 0000000000e2fc92
> DW_CFA_offset: r12 (r12) at cfa-48
> DW_CFA_offset: r3 (rbx) at cfa-56
> - DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
> + DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
> DW_CFA_remember_state
> DW_CFA_def_cfa: r7 (rsp) ofs 8
> DW_CFA_advance_loc: 1 to 0000000000e2fe20
> DW_CFA_restore_state
> DW_CFA_nop
> DW_CFA_nop
> for
> 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
> e2fc80: 55 push %rbp
> e2fc81: 48 89 e5 mov %rsp,%rbp
> e2fc84: 41 57 push %r15
> e2fc86: 41 56 push %r14
> e2fc88: 41 55 push %r13
> e2fc8a: 45 31 ed xor %r13d,%r13d
> e2fc8d: 41 54 push %r12
> e2fc8f: 53 push %rbx
> e2fc90: 31 db xor %ebx,%ebx
> e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp
> e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp)
> ...
> prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
> starts applying before or after the xorl %r13d,%r13d instruction
> and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
> before or after the xorl %ebx,%ebx instruction.
> Or:
> -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
> +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
> DW_CFA_advance_loc: 1 to 0000000000e3d161
> DW_CFA_def_cfa_offset: 16
> DW_CFA_offset: r6 (rbp) at cfa-16
> DW_CFA_advance_loc: 9 to 0000000000e3d16a
> DW_CFA_def_cfa_register: r6 (rbp)
> - DW_CFA_advance_loc: 2 to 0000000000e3d16c
> + DW_CFA_advance_loc: 5 to 0000000000e3d16f
> DW_CFA_offset: r12 (r12) at cfa-24
> - DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
> + DW_CFA_advance_loc: 63 to 0000000000e3d1ae
> DW_CFA_def_cfa: r7 (rsp) ofs 8
> - DW_CFA_nop
> - DW_CFA_nop
> - DW_CFA_nop
> for:
> 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
> e3d160: 55 push %rbp
> e3d161: 48 63 c2 movslq %edx,%rax
> e3d164: 49 89 f8 mov %rdi,%r8
> e3d167: 48 89 e5 mov %rsp,%rbp
> e3d16a: 41 54 push %r12
> e3d16c: 49 89 c4 mov %rax,%r12
> e3d16f: 48 83 ec 08 sub $0x8,%rsp
> again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
> before movq %rax,%r12 instruction or after it.
> I think if for some instructions it is possible that debugger or signal
> would stop in the middle of instruction, with some side-effects of the
> instruction already done and others not yet, then we can't do that,
> but at the start of instructions that modify the register the side-effects
> of the instruction should not have taken effect yet.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.
LGTM. This seems consistent with when we flush due to clobbers in
dwarf2out_frame_debug.
> Will e.g. GDB be happy about the changes?
I would expect so, but it would be good to have someone from GDB verify.
> Thinking some more about this, what can be problematic on the GCC side
> is inline asm, that can and often will contain multiple instructions.
> For that is an easy fix, just test asm_noperands and handle
> clobbers_queued_reg_save before the insn rather than after in that case.
Sure, but does inline asm go through dwarf2out_frame_debug?
> But there is another problem, instruction patterns that emit multiple
> hw instructions, code can stop in between them. So, do we want some
> instruction attribute that would conservatively tell us which instruction
> patterns are guaranteed to be single machine instructions?
It seems to me that in that situation you'd want to add the save to
*%rsp, and still not update to *%rbp until after the combined instruction.
Incidentally, it seems a bit odd that clobbers_queued_reg_save returns
true both for clobbering the saved register and for clobbering the save
location; if I already know that %rbp is saved at *%rsp, why do I care
that the insn we're looking at clobbers %rbp? Maybe we should only
check modified_in_p (q->reg, insn) if q->reg doesn't appear in
regs_saved_in_regs. That wouldn't help with this testcase, though.
> 2021-03-24 Jakub Jelinek <jakub@redhat.com>
>
> PR debug/99334
> * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
> forced flushes of queued register saves for non-call/jump insns
> that can't throw, emit them first after the insn rather than
> before it.
>
> --- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100
> +++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100
> @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
> dwarf2out_flush_queued_reg_saves ();
> }
> else if (!NONJUMP_INSN_P (insn)
> - || clobbers_queued_reg_save (insn)
> || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
> dwarf2out_flush_queued_reg_saves ();
> any_cfis_emitted = false;
>
> add_cfi_insn = insn;
> +
> + if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
> + dwarf2out_flush_queued_reg_saves ();
> +
> scan_insn_after (insn);
> control = insn;
> }
>
>
> Jakub
>
More information about the Gdb
mailing list