Bug 30028 - [gdb/tdep, x86_64, i386] Epilogue unwinder preferred over dwarf unwinder based on gcc 4.4
Summary: [gdb/tdep, x86_64, i386] Epilogue unwinder preferred over dwarf unwinder base...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 14.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-20 14:06 UTC by Tom de Vries
Modified: 2023-02-20 13:51 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tom de Vries 2023-01-20 14:06:37 UTC
Consider test-case gdb.base/unwind-on-each-insn.exp.

On x86_64-linux, it passes.

Now, let's revert the fix in commit 49d7cd733a7 ("Change calculation of frame_id by amd64 epilogue unwinder"):
...
$ git show 49d7cd733a7f1b87aa1d40318b3d7c2b65aca5ac gdb/amd64-tdep.c > tmp.patch
$ patch -p1 -R < tmp.patch 
patching file gdb/amd64-tdep.c
Hunk #1 succeeded at 2945 (offset 8 lines).
Hunk #2 succeeded at 2994 (offset 8 lines).
...

Now we have:
...
FAIL: gdb.base/unwind-on-each-insn.exp: instruction 5: $fba_value == $foo_fba
FAIL: gdb.base/unwind-on-each-insn.exp: instruction 5: check frame-id matches
...

OK, so the test-case acts as regression test for this commit, that seems as intended.

For context, let's look at the foo insns:
...
00000000004004bc <foo>:
  4004bc:       55                      push   %rbp
  4004bd:       48 89 e5                mov    %rsp,%rbp
  4004c0:       90                      nop
  4004c1:       5d                      pop    %rbp
  4004c2:       c3                      ret
...

This is a minimal version of the failure:
...
$ gdb -q -batch outputs/gdb.base/unwind-on-each-insn/unwind-on-each-insn \
  -ex "b *foo" \
  -ex run \
  -ex "si 3" \
  -ex "info frame" \
  -ex si \
  -ex "info frame"
Breakpoint 1 at 0x4004bc

Breakpoint 1, 0x00000000004004bc in foo ()
0x00000000004004c1 in foo ()
Stack level 0, frame at 0x7fffffffdae0:
 rip = 0x4004c1 in foo; saved rip = 0x4004b5
 called by frame at 0x7fffffffdaf0
 Arglist at 0x7fffffffdad0, args: 
 Locals at 0x7fffffffdad0, Previous frame's sp is 0x7fffffffdae0
 Saved registers:
  rbp at 0x7fffffffdad0, rip at 0x7fffffffdad8
0x00000000004004c2 in foo ()
Stack level 0, frame at 0x7fffffffdad8:
 rip = 0x4004c2 in foo; saved rip = 0x4004b5
 called by frame at 0x7fffffffdaf0
 Arglist at 0x7fffffffdae0, args: 
 Locals at 0x7fffffffdae0, Previous frame's sp is 0x7fffffffdae0
 Saved registers:
  rip at 0x7fffffffdad8
...

The problem is that the "frame at" lists a different address at the last insn.

However, we also have this information available:
...
00000068 000000000000001c 0000003c FDE cie=00000030 pc=00000000004004bc..00000000004004c3     
   LOC           CFA      rbp   ra
00000000004004bc rsp+8    u     c-8
00000000004004bd rsp+16   c-16  c-8
00000000004004c0 rbp+16   c-16  c-8
00000000004004c2 rsp+8    c-16  c-8
...

Let's see if it is correct.

At 0x00000000004004c1 we have:
...
(gdb) p $rbp+16
$1 = (void *) 0x7fffffffdae0
...

At 0x00000000004004c2 we have:
...
(gdb) p $rsp+8
$2 = (void *) 0x7fffffffdae0
...
Well, that looks correct.

So, why is this info not used?

The amd64 epilogue unwinder (the one containing the bug), has the highest priority:
...
  /* Hook the function epilogue frame unwinder.  This unwinder is                            
     appended to the list first, so that it supercedes the other                             
     unwinders in function epilogues.  */                                                    
  frame_unwind_prepend_unwinder (gdbarch, &amd64_epilogue_frame_unwind);                     
...

Let's try to see what happens if we disable it:
...
0x00000000004004c2 in foo ()
Stack level 0, frame at 0x7fffffffdbcc:
...
Hmm, that's also no good, it seems that the i386_epilogue_frame_unwind kicked in.

I don't under stand why this would be active for x86_64 code, maybe that is a bug in itself. Anyway, why does this one have this high priority? Well, the code says:
...

  /* Hook the function epilogue frame unwinder.  This unwinder is                             
     appended to the list first, so that it supercedes the DWARF                              
     unwinder in function epilogues (where the DWARF unwinder                                 
     currently fails).  */
  frame_unwind_append_unwinder (gdbarch, &i386_epilogue_frame_unwind);
...

Hmm, so the "currently" refers to July 2009.

In the submission discussion I find this ( https://sourceware.org/pipermail/gdb-patches/2009-July/066792.html ):
...
> I also think you should add a comment about the specific ordering of
> this unwinder.  It has to come before the dwarf2 unwinder because GCC
> doesn't provide proper CFI for the epilogue, right?

Right - I would like to have a way to suppress this unwinder, maybe
based on the producer string like other recognized dwarf2-frame
quirks, but we can worry about that later.  I hope it will be
unnecessary with GCC 4.5.
...

So, if I understand correctly, the priority decision was done based on gcc 4.4.

So maybe it's time to revisit this.

I suppose the producer test is doable, but we'll need to decide about trusted vs untrusted versions, which atm I have no clue about.

A minor benefit of preferring dwarf is that behaviour will be more similar to targets that do not have epilogue unwinders.

Anyway, disabling i386_epilogue_frame_unwind gives me the desired:
...
0x00000000004004c2 in foo ()
Stack level 0, frame at 0x7fffffffdad0:
...
and indeed, the test-case passes again.

At the very least, I would like a maintenance command that tells gdb to prefer dwarf over the epilogue unwinder, in order to be able to test the actual dwarf.  That could also be used to proactively spot bugs in the epilogue unwinder.
Comment 1 Tom de Vries 2023-01-20 14:13:10 UTC
FTR, there are just 5 targets using epilogue unwinders:
...
gdb/amd64-tdep.c:  frame_unwind_prepend_unwinder (gdbarch, &amd64_epilogue_frame_unwind);
gdb/arm-tdep.c:  frame_unwind_append_unwinder (gdbarch, &arm_epilogue_frame_unwind);
gdb/i386-tdep.c:  frame_unwind_append_unwinder (gdbarch, &i386_epilogue_frame_unwind);
gdb/nds32-tdep.c:  frame_unwind_append_unwinder (gdbarch, &nds32_epilogue_frame_unwind);
gdb/rs6000-tdep.c:      frame_unwind_append_unwinder (gdbarch, &rs6000_epilogue_frame_unwind);
gdb/rs6000-tdep.c:      frame_unwind_append_unwinder (gdbarch, &rs6000_epilogue_frame_unwind);
...

The arm one doesn't have priority over dwarf.  Same for nds32 and rs6000.

So, this particularity seems limited to amd64/i386.
Comment 2 Tom de Vries 2023-01-20 14:23:11 UTC
Here ( https://gcc.gnu.org/gcc-4.5/changes.html ) I read:
...
GCC now generates unwind info also for epilogues. 
...
So, maybe this was fixed in gcc 4.5 ?
Comment 3 Tom de Vries 2023-01-20 14:47:35 UTC
I did a build and test run with i386_epilogue_frame_unwind and amd64_epilogue_frame_unwind disabled, with gcc 7.5.0.  No regressions.
Comment 4 Tom de Vries 2023-01-20 16:02:50 UTC
(In reply to Tom de Vries from comment #2)
> Here ( https://gcc.gnu.org/gcc-4.5/changes.html ) I read:
> ...
> GCC now generates unwind info also for epilogues. 
> ...
> So, maybe this was fixed in gcc 4.5 ?

OK, I think this is referring to: https://gcc.gnu.org/pipermail/gcc-patches/2009-May/261377.html which was committed as cd9c1ca866b6aa5041a352e0ed07ae5f91e235e5 .

First tag to contain the commit: 4.5.0.
Comment 5 Tom de Vries 2023-01-20 16:24:32 UTC
(In reply to Tom de Vries from comment #4)
> (In reply to Tom de Vries from comment #2)
> > Here ( https://gcc.gnu.org/gcc-4.5/changes.html ) I read:
> > ...
> > GCC now generates unwind info also for epilogues. 
> > ...
> > So, maybe this was fixed in gcc 4.5 ?
> 
> OK, I think this is referring to:
> https://gcc.gnu.org/pipermail/gcc-patches/2009-May/261377.html which was
> committed as cd9c1ca866b6aa5041a352e0ed07ae5f91e235e5 .
> 
> First tag to contain the commit: 4.5.0.

And, we already know this:
...
      if (gcc_4_minor >= 5)
        cust->set_epilogue_unwind_valid (true);
...

So we already scan the producer for https://sourceware.org/pipermail/gdb-patches/2011-June/083429.html , and crucially:
...
I find questionable this detection uses DWARF DW_AT_producer.  Therefore on
binaries with .eh_frame but no (DWARF) debug info the detection fails and
epilogue unwinders will get into affect.
...

So, that's what's happening.  We have dwarf apart from .eh_frame, and we don't use the .eh_frame info because we can't check that the producer is 4.5 or older.

So, maybe we should just flip the switch on the default assumption.
Comment 6 Tom de Vries 2023-01-20 16:43:14 UTC
(In reply to Tom de Vries from comment #5)
> So, maybe we should just flip the switch on the default assumption.

Like so:
...
diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
index 76e843ecc35..86e12b4304a 100644
--- a/gdb/amd64-tdep.c
+++ b/gdb/amd64-tdep.c
@@ -2905,7 +2905,7 @@ amd64_stack_frame_destroyed_p
   struct compunit_symtab *cust;
 
   cust = find_pc_compunit_symtab (pc);
-  if (cust != NULL && cust->epilogue_unwind_valid ())
+  if (cust == NULL || cust->epilogue_unwind_valid ())
     return 0;
 
   if (target_read_memory (pc, &insn, 1))
diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
index 580664d2ce5..46ad2913788 100644
--- a/gdb/i386-tdep.c
+++ b/gdb/i386-tdep.c
@@ -2222,7 +2222,7 @@ i386_stack_frame_destroyed_p
   struct compunit_symtab *cust;
 
   cust = find_pc_compunit_symtab (pc);
-  if (cust != NULL && cust->epilogue_unwind_valid ())
+  if (cust == NULL || cust->epilogue_unwind_valid ())
     return 0;
 
   if (target_read_memory (pc, &insn, 1))
...
Comment 7 Tom de Vries 2023-01-20 17:46:51 UTC
(In reply to Tom de Vries from comment #6)
> (In reply to Tom de Vries from comment #5)
> > So, maybe we should just flip the switch on the default assumption.
> 
> Like so:
> ...
> diff --git a/gdb/amd64-tdep.c b/gdb/amd64-tdep.c
> index 76e843ecc35..86e12b4304a 100644
> --- a/gdb/amd64-tdep.c
> +++ b/gdb/amd64-tdep.c
> @@ -2905,7 +2905,7 @@ amd64_stack_frame_destroyed_p
>    struct compunit_symtab *cust;
>  
>    cust = find_pc_compunit_symtab (pc);
> -  if (cust != NULL && cust->epilogue_unwind_valid ())
> +  if (cust == NULL || cust->epilogue_unwind_valid ())
>      return 0;
>  
>    if (target_read_memory (pc, &insn, 1))
> diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
> index 580664d2ce5..46ad2913788 100644
> --- a/gdb/i386-tdep.c
> +++ b/gdb/i386-tdep.c
> @@ -2222,7 +2222,7 @@ i386_stack_frame_destroyed_p
>    struct compunit_symtab *cust;
>  
>    cust = find_pc_compunit_symtab (pc);
> -  if (cust != NULL && cust->epilogue_unwind_valid ())
> +  if (cust == NULL || cust->epilogue_unwind_valid ())
>      return 0;
>  
>    if (target_read_memory (pc, &insn, 1))
> ...

And likewise:
...
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 44b54f77de9..44e85ebcfe9 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -8484,7 +8484,15 @@ process_full_comp_unit (dwarf2_cu *cu, enum language pretend_la
nguage)
       if (cu->has_loclist && gcc_4_minor >= 5)
        cust->set_locations_valid (true);
 
-      if (gcc_4_minor >= 5)
+      if (cu->producer == nullptr)
+       /* In absence of producer information, optimistically assume that we're
+          not dealing with gcc < 4.5.0.  */
+       cust->set_epilogue_unwind_valid (true);
+      if (!producer_is_gcc (cu->producer, nullptr, nullptr))
+       /* Not gcc.  */
+       cust->set_epilogue_unwind_valid (true);
+      else if (gcc_4_minor >= 5)
+       /* gcc >= 4.5.0.  */
        cust->set_epilogue_unwind_valid (true);
 
       cust->set_call_site_htab (cu->call_site_htab);
...
Comment 8 Tom de Vries 2023-01-21 07:49:18 UTC
Submitted rfc: https://sourceware.org/pipermail/gdb-patches/2023-January/196035.html
Comment 9 Tom de Vries 2023-02-20 13:51:43 UTC
Fixed by commits:
- 8908d9c45cd ("[gdb/symtab] Trust epilogue unwind info for unknown
  producer (-g0 case)")
- 868014341a7 ("[gdb/symtab] Trust epilogue unwind info for unknown or
  non-gcc producer")