With current trunk on aarch64-linux I'm running into: ... FAIL: gdb.reverse/solib-precsave.exp: reverse-next third shr1 FAIL: gdb.reverse/solib-precsave.exp: reverse-next second shr1 FAIL: gdb.reverse/solib-precsave.exp: reverse-next first shr1 FAIL: gdb.reverse/solib-precsave.exp: reverse-next generic FAIL: gdb.reverse/solib-precsave.exp: reverse-step into solib function one FAIL: gdb.reverse/solib-precsave.exp: reverse-step within solib function one FAIL: gdb.reverse/solib-precsave.exp: reverse-step back to main one ... This seems to be a regression since gdb 12.1. I'll try to bisect.
1f3e37e057e876b37db49dbd8ed5ca22c33f6772 is the first bad commit commit 1f3e37e057e876b37db49dbd8ed5ca22c33f6772 Author: Bruno Larsen <blarsen@redhat.com> Date: Wed May 25 15:02:47 2022 -0300 gdb/reverse: Fix stepping over recursive functions
Running this test on Fedora 36 in aarch64, I only get 5 fails. Looking at the log: reverse-next^M 40 b[0] = 6; b[1] = 9; /* generic statement, end part two */^M (gdb) PASS: gdb.reverse/solib-precsave.exp: reverse-next generic reverse-step^M 40 b[0] = 6; b[1] = 9; /* generic statement, end part two */^M (gdb) FAIL: gdb.reverse/solib-precsave.exp: reverse-step into solib function one reverse-step^M 38 b[1] = shr2(17); /* middle part two */^M (gdb) FAIL: gdb.reverse/solib-precsave.exp: reverse-step within solib function one reverse-step^M shr2 (x=17) at /root/bld/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.reverse/shr2.c:23^M 23 }^M (gdb) FAIL: gdb.reverse/solib-precsave.exp: reverse-step back to main one reverse-step^M 22 return 2*x;^M (gdb) FAIL: gdb.reverse/solib-precsave.exp: reverse-step into solib function two reverse-step^M main () at /root/bld/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.reverse/solib-reverse.c:38^M 38 b[1] = shr2(17); /* middle part two */^M (gdb) FAIL: gdb.reverse/solib-precsave.exp: reverse-step within solib function two It seems that aarch64 is also experiencing https://sourceware.org/bugzilla/show_bug.cgi?id=28426 now, and x86 stopped experiencing it. While I agree that this should be fixed, I'm not sure 1f3e37e057e876b37db49dbd8ed5ca22c33f6772 is the actual culprit. I'll add this to my backlog, but it is pretty low priority, since testing in aarch64 is cumbersome for me, unfortunately. Also, if you are still seeing the failures with third, second and first shr1 with the cureent trunk, please post the relevant log here :-)
Created attachment 14416 [details] gdb.log on aarch64-linux, openSUSE Leap 15.4
This looks like a completely different issue to what I found. Looking at until 46 main () at /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c:46 46 return 0; /* end part one */ (gdb) PASS: gdb.reverse/solib-precsave.exp: forward to end part one reverse-next No more reverse-execution history. main () at /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c:27 27 char* cptr = "String 1"; It looks to be the reason I made my patch that changes frame-id calculations for amd64 epilogue unwinder. Essentially, an incorrect frame-id was set as the owner of the step-resume breakpoint used by reverse-next, GDB never "hit" that breakpoint and went back to the start of the execution history. Is gdb.base/unwind-on-each-insn.exp passing for you?
Created attachment 14417 [details] gdb.log on aarch64-linux, openSUSE Leap 15.4 (gdb.base/unwind-on-each-insn.exp) (In reply to B. Larsen from comment #4) > Is gdb.base/unwind-on-each-insn.exp passing for you? Yes.
(In reply to B. Larsen from comment #4) > This looks like a completely different issue to what I found. Well, it looks like you ran into PR28426, which requires column info to reproduce. This PR is using gcc 7.5.0, which does not have column info by default. So, you could try -gno-column-info with your compiler, and perhaps that'll allow you to get past PR28426 and run into this PR.
I can test on aarch64 and armhf quite easily if you want.
(In reply to Tom de Vries from comment #6) > (In reply to B. Larsen from comment #4) > > This looks like a completely different issue to what I found. > > Well, it looks like you ran into PR28426, which requires column info to > reproduce. > > This PR is using gcc 7.5.0, which does not have column info by default. > > So, you could try -gno-column-info with your compiler, and perhaps that'll > allow you to get past PR28426 and run into this PR. Bruno, have you been able to reproduce ?
> Bruno, have you been able to reproduce ? Haven't gotten around to it yet. I'm working on the regression that Simon found on gdb.reverse/step-reverse.exp using native-gdbserver. I hope to be able to get to this bug next week, but since that bug happens because of a corrupted stack somewhere, it might take me a little while.
Hi Tom, I am unable to reproduce the bug using gcc 12.2.1. Adding -gno-column-info only removed those 5 errors. I'm still suspecting that a frame id is being calculated incorrectly somewhere, I'll try to build gcc 7.5.0 and reproduce
The oldest gcc I could find was 8.5.0, and it isn't reproducing the issue. I don't think I can help with this regression... sorry
Are we sure this isn't the known issue of reverse-stepping through a non-contiguous block of PC addresses? The one we stop halfway through line because we have a couple distinct entries for the same line?
Created attachment 14450 [details] Reproducer patch, using Leap 15.4 .s files
(In reply to B. Larsen from comment #11) > The oldest gcc I could find was 8.5.0, and it isn't reproducing the issue. I > don't think I can help with this regression... sorry I created a patch that uses .s files for the two files that use debug info, so perhaps using those you should be able to reproduce. I understand if you consider this too much of a security risk...
Created attachment 14516 [details] Log failing to reproduce I've just attempted to reproduce using the .s files you provided, and I still don't see the same failures you have, tom. The failures that show up in this log are probably because some .s file had the path from your machine. I've attached the execution log, but I'm not sure how useful it will be... Again, I'm very sorry, I don't think I can help with this one.
First sign of trouble: ... [infrun] bpstat_check_breakpoint_conditions: incorrect frame {stack=0xfffffffff320,code=0x0000fffff7f9d594,!special} not {stack=0xfffffffff330,code=0x0000fffff7f9d594,!special}, not stopping ... Using additional_flags=-fasynchronous-unwind-tables works around this. On x86_64, using additional_flags=-fno-asynchronous-unwind-tables to drop the cfi info, the problem doesn't trigger. With x86_64, we have: ... 000000000000050a <shr1>: 50a: 55 push %rbp 50b: 48 89 e5 mov %rsp,%rbp 50e: 48 89 7d f8 mov %rdi,-0x8(%rbp) 512: 90 nop 513: 5d pop %rbp 514: c3 ret ... and at the first insn: ... 1: /x $pc = 0x7ffff7ff150a 2: /x $sp = 0x7fffffffdad8 3: /x $rbp = 0x7fffffffdaf0 (gdb) info frame Stack level 0, frame at 0x7fffffffdae0: rip = 0x7ffff7ff150a in shr1; saved rip = 0x4005df called by frame at 0x7fffffffdb00 Arglist at 0x7fffffffdad0, args: Locals at 0x7fffffffdad0, Previous frame's sp is 0x7fffffffdae0 Saved registers: rip at 0x7fffffffdad8 ... and at the last insn: ... 1: /x $pc = 0x7ffff7ff1514 2: /x $sp = 0x7fffffffdad8 3: /x $rbp = 0x7fffffffdaf0 (gdb) info frame Stack level 0, frame at 0x7fffffffdae0: rip = 0x7ffff7ff1514 in shr1; saved rip = 0x4005df called by frame at 0x7fffffffdb00 Arglist at 0x7fffffffdaf0, args: Locals at 0x7fffffffdaf0, Previous frame's sp is 0x7fffffffdae0 Saved registers: rip at 0x7fffffffdad8 ... the same frame address (as well as the same $sp and $fp). On aarch64 (with additional_flags=-fno-omit-frame-pointer additional_flags=-mno-omit-leaf-frame-pointer added, to have more comparable code), we have: ... 0000000000000594 <shr1>: 594: a9be7bfd stp x29, x30, [sp, #-32]! 598: 910003fd mov x29, sp 59c: f9000fa0 str x0, [x29, #24] 5a0: d503201f nop 5a4: a8c27bfd ldp x29, x30, [sp], #32 5a8: d65f03c0 ret ... and at the first insn: ... 1: /x $pc = 0xfffff7f9d594 2: /x $sp = 0xfffffffff320 3: /x $x29 = 0xfffffffff320 (gdb) info frame Stack level 0, frame at 0xfffffffff320: pc = 0xfffff7f9d594 in shr1; saved pc = 0x400718 called by frame at 0xfffffffff340 Arglist at 0xfffffffff320, args: Locals at 0xfffffffff320, Previous frame's sp is 0xfffffffff320 ... and at the last insn: ... 1: /x $pc = 0xfffff7f9d5a8 2: /x $sp = 0xfffffffff320 3: /x $x29 = 0xfffffffff320 (gdb) info frame Stack level 0, frame at 0xfffffffff340: pc = 0xfffff7f9d5a8 in shr1; saved pc = 0xfffff7d68fa0 called by frame at 0xfffffffff490 Arglist at 0xfffffffff320, args: Locals at 0xfffffffff320, Previous frame's sp is 0xfffffffff340 Saved registers: x29 at 0xfffffffff320, x30 at 0xfffffffff328 ... so a different frame address (while having the same $sp and $fp).
(In reply to Tom de Vries from comment #16) > On aarch64 (with additional_flags=-fno-omit-frame-pointer > additional_flags=-mno-omit-leaf-frame-pointer added, to have more comparable > code), we have: And for completeness-sake, without those flags (so, exercising the original failure): ... 0000000000000594 <shr1>: 594: d10043ff sub sp, sp, #0x10 598: f90007e0 str x0, [sp, #8] 59c: d503201f nop 5a0: 910043ff add sp, sp, #0x10 5a4: d65f03c0 ret ... and at first insn: ... 1: /x $pc = 0xfffff7f9d594 2: /x $sp = 0xfffffffff320 3: /x $x29 = 0xfffffffff320 (gdb) info frame Stack level 0, frame at 0xfffffffff320: pc = 0xfffff7f9d594 in shr1; saved pc = 0x400718 called by frame at 0xfffffffff340 Arglist at 0xfffffffff320, args: Locals at 0xfffffffff320, Previous frame's sp is 0xfffffffff320 ... and at last insn: ... 1: /x $pc = 0xfffff7f9d5a4 2: /x $sp = 0xfffffffff320 3: /x $x29 = 0xfffffffff320 (gdb) info frame Stack level 0, frame at 0xfffffffff330: pc = 0xfffff7f9d5a4 in shr1; saved pc = 0x400718 called by frame at 0xfffffffff340 Arglist at 0xfffffffff320, args: Locals at 0xfffffffff320, Previous frame's sp is 0xfffffffff330 Saved registers: x0 at 0xfffffffff328 ...
Tentative patch: ... diff --git a/gdb/aarch64-tdep.c b/gdb/aarch64-tdep.c index b576d3b9d99..06349353716 100644 --- a/gdb/aarch64-tdep.c +++ b/gdb/aarch64-tdep.c @@ -996,7 +996,11 @@ aarch64_make_prologue_cache_1 (frame_info_ptr this_frame, if (unwound_fp == 0) return; - cache->prev_sp = unwound_fp + cache->framesize; + if (cache->framereg == AARCH64_SP_REGNUM + && get_frame_register_unsigned (this_frame, AARCH64_FP_REGNUM) == unwound_fp) + cache->prev_sp = unwound_fp; + else + cache->prev_sp = unwound_fp + cache->framesize; /* Calculate actual addresses of saved registers using offsets determined by aarch64_analyze_prologue. */ ...
Isn't this a failure of gdb to interpret the line table changes properly? If so, why the target-specific change to aarch64's prologue cache?
(In reply to Luis Machado from comment #19) > Isn't this a failure of gdb to interpret the line table changes properly? If > so, why the target-specific change to aarch64's prologue cache? The problem happens in libshr1.sl, which is compiled without debug info, so there are no line tables...
Ok. So is this a completely different issue compared to the line table one?
(In reply to Luis Machado from comment #21) > Ok. So is this a completely different issue compared to the line table one? I'm not sure yet if there is one.
Filed spinoff PRs: - PR30010 - [gdb/tdep, aarch64] Incorrect frame address for last insn (non-leaf case) https://sourceware.org/bugzilla/show_bug.cgi?id=30010 - PR30011 - [gdb/tdep, aarch64] Incorrect frame address for last insn (leaf case) https://sourceware.org/bugzilla/show_bug.cgi?id=30011
(In reply to B. Larsen from comment #15) > I've attached the execution log, but I'm not sure how useful it will be... In the log I see this: ... warning: Can not parse XML target description; XML support was disabled at compile time ... I'd advise you to always go and fix this if you run into this, some targets don't function properly without.
(In reply to B. Larsen from comment #15) > I've just attempted to reproduce using the .s files you provided, and I > still don't see the same failures you have, tom. OK, I'll attach a patch that uses the .s file for all three files involved, hopefully that'll allow reproduction.
Created attachment 14609 [details] Reproducer patch, using Leap 15.4 .s files
(In reply to Tom de Vries from comment #24) > (In reply to B. Larsen from comment #15) > > I've attached the execution log, but I'm not sure how useful it will be... > > In the log I see this: > ... > warning: Can not parse XML target description; XML support was disabled at > compile time > ... > I'd advise you to always go and fix this if you run into this, some targets > don't function properly without. We should probably start erroring out if XML is not supported in some targets. Assuming targets descriptions when XML is not available is very error-prone.
(In reply to Tom de Vries from comment #26) > Created attachment 14609 [details] > Reproducer patch, using Leap 15.4 .s files And if there is somehow a problem with running the test-case, try the minimal: ... $ cat gdb.in set trace-commands on start record b 43 continue reverse-next $ gdb -q -batch outputs/gdb.reverse/solib-reverse/solib-reverse -x gdb.in +start Temporary breakpoint 1 at 0x40069c: file /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c, line 27. Temporary breakpoint 1, main () at /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c:27 27 char* cptr = "String 1"; +record +target record-full +b 43 Breakpoint 2 at 0x400700: file /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c, line 43. +continue Breakpoint 2, main () at /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c:43 43 shr1 ("message 2\n"); /* shr1 two */ +reverse-next +next No more reverse-execution history. main () at /home/tdevries/gdb/src/gdb/testsuite/gdb.reverse/solib-reverse.c:27 27 char* cptr = "String 1"; ...
I've submitted a patch series here ( https://sourceware.org/pipermail/gdb-patches/2023-January/195920.html ) that contains a fix for PR30011. With that fix, this PR no longer triggers. The remaining question for me is whether PR30011 and this PR are duplicates. Put differently, is there something we can do in gdb to deal with the situation that the frame id for first and last insn in a function are not necessarily equal. Say we assume that they are equal if the target contains an epilogue unwinder, but if not, we're handling things more conservatively. This kind of fix could be backported to fix the regression that was introduced. Bruno, any comment?
> Say we assume that they are equal if the target contains an epilogue unwinder, but if not, we're handling things more conservatively. This kind of fix could be backported to fix the regression that was introduced. This sounds ok, but I wonder if "handling things more conservatively" will regress [record/16678](https://sourceware.org/bugzilla/show_bug.cgi?id=16678). To fix that bug I assumed that we knew the frame ID would be the same throughout the whole function. If we can't rely on that, how are we supposed to detect a breakpoint being hit in a recursive call?
(In reply to B. Larsen from comment #30) > > Say we assume that they are equal if the target contains an epilogue unwinder, but if not, we're handling things more conservatively. This kind of fix could be backported to fix the regression that was introduced. > > This sounds ok, but I wonder if "handling things more conservatively" will > regress > [record/16678](https://sourceware.org/bugzilla/show_bug.cgi?id=16678). To > fix that bug I assumed that we knew the frame ID would be the same > throughout the whole function. If we can't rely on that, how are we supposed > to detect a breakpoint being hit in a recursive call? Well, you've fixed that problem for a recursive function with debug info, and for x86_64 without debug info (meaning, also using -fno-asynchronous-unwind-tables to make sure there's no cfi directives). On aarch64 though, I see: ... $ cat test.c extern int factorial (int x); int v; int main() { v = 1; factorial(5); return 0; } $ cat test2.c int factorial(int x) { if (x == 1) return 1; int result = x * factorial(x-1); return result; } $ gcc test2.c -c ; gcc test.c test2.o -g $ gdb a.out Reading symbols from a.out... (gdb) start Temporary breakpoint 1 at 0x40056c: file test.c, line 4. Starting program: a.out Temporary breakpoint 1, main () at test.c:4 4 v = 1; (gdb) record (gdb) next 5 factorial(5); (gdb) next 6 return 0; (gdb) reverse-next 0x00000000004005d4 in factorial () (gdb) ... This is essentially the same problem as in this PR ... and I'm proposing that we fix it, in a target-independent fashion. I think the following cases can be distinguished: - cfi info present, assume frame ID's accurate, no special handling needed - cfi info not present, but target has an epilogue unwinder , assume frame ID's accurate, no special handling needed - cfi info not present, no epilogue unwinder. Special handling needed, but no need to worry about breaking handling of recursive function, because it's already broken, and the special handling intends to fix it. My naive idea on how to approach this was to instead of doing a step-resume breakpoint at the last insn and resume, do first a single step back, and then insert the step-resume breakpoint and resume. This will already fix the aarch64 case. There may be cases where this gives the wrong answer, but AFAIU those cases are broken anyway.
> I think the following cases can be distinguished: > - cfi info present, assume frame ID's accurate, no special handling needed > - cfi info not present, but target has an epilogue unwinder , assume frame > ID's accurate, no special handling needed > - cfi info not present, no epilogue unwinder. Special handling needed, > but no need to worry about breaking handling of recursive function, > because it's already broken, and the special handling intends to fix it. > > My naive idea on how to approach this was to instead of doing a step-resume > breakpoint at the last insn and resume, do first a single step back, and > then insert the step-resume breakpoint and resume. This will already fix > the aarch64 case. There may be cases where this gives the wrong answer, but > AFAIU those cases are broken anyway. Ah ok, sounds like a good naive approach. My thought was avoiding the previous naive solution!