Bug 31507 - [gdb, aarch64] FAIL: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: relocated instruction
Summary: [gdb, aarch64] FAIL: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forw...
Status: ASSIGNED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-18 12:32 UTC by Tom de Vries
Modified: 2024-05-06 12:09 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
gdb.log without fail (2.62 KB, text/x-log)
2024-03-18 14:08 UTC, Tom de Vries
gdb.log with fail (2.64 KB, text/x-log)
2024-03-18 14:08 UTC, Tom de Vries

Description Tom de Vries 2024-03-18 12:32:12 UTC
On aarch64-linux (debian 12) I ran into:
...
(gdb) PASS: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: go to breakpoint 6
continue^M
Continuing.^M
^M
Breakpoint 8, can_relocate_adr_forward () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:356^M
356       asm ("set_point6:\n"^M
(gdb) FAIL: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: relocated instruction
...
Comment 1 Tom de Vries 2024-03-18 14:03:12 UTC
Doesn't reproduce all the time, but often enough:
...
$ for n in $(seq 1 10); do ./test.sh 2>&1 | grep "# of " | sort -u; done
# of expected passes		49
# of expected passes		37
# of unexpected failures	12
# of expected passes		49
# of expected passes		37
# of unexpected failures	12
# of expected passes		49
# of expected passes		36
# of unexpected failures	13
# of expected passes		49
# of expected passes		49
# of expected passes		46
# of unexpected failures	3
# of expected passes		49
...
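To put a number on "often enough", the summary lines above can be tallied per run. The following is a minimal sketch, not part of the testsuite: the helper name `tally_runs` is hypothetical, and it assumes each run prints one "expected passes" line, optionally followed by one "unexpected failures" line, as in the output above.

```python
import re

# Matches DejaGnu summary lines such as "# of unexpected failures  12".
SUMMARY_RE = re.compile(r"#\s+of\s+(expected passes|unexpected failures)\s+(\d+)")


def tally_runs(lines):
    """Return (total_runs, failing_runs) for a sequence of summary lines.

    Assumption: every run contributes exactly one "expected passes" line,
    and a failing run adds one "unexpected failures" line right after it.
    """
    total = 0
    failing = 0
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match is None:
            continue  # Ignore anything that is not a summary line.
        if match.group(1) == "expected passes":
            total += 1
        elif int(match.group(2)) > 0:
            failing += 1
    return total, failing


# Summary lines copied from the ten runs quoted above.
SAMPLE = """\
# of expected passes  49
# of expected passes  37
# of unexpected failures  12
# of expected passes  49
# of expected passes  37
# of unexpected failures  12
# of expected passes  49
# of expected passes  36
# of unexpected failures  13
# of expected passes  49
# of expected passes  49
# of expected passes  46
# of unexpected failures  3
# of expected passes  49
"""

if __name__ == "__main__":
    total, failing = tally_runs(SAMPLE.splitlines())
    print(f"{failing} of {total} runs had unexpected failures")
```

For the ten runs quoted above, this counts 4 failing runs out of 10.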
Comment 2 Tom de Vries 2024-03-18 14:08:31 UTC
Created attachment 15413 [details]
gdb.log without fail
Comment 3 Tom de Vries 2024-03-18 14:08:55 UTC
Created attachment 15414 [details]
gdb.log with fail
Comment 4 Tom de Vries 2024-03-18 14:10:54 UTC
First relevant difference in gdb.log:
...
@@ -221,132 +221,133 @@
 (gdb) continue
 Continuing.
 
-Breakpoint 8, can_relocate_adr_forward () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:356
-356	  asm ("set_point6:\n"
+Breakpoint 17, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
+31	}
 (gdb) PASS: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: go to breakpoint 6
 continue
 Continuing.
 
-Breakpoint 17, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
-31	}
-(gdb) PASS: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: relocated instruction
+Breakpoint 8, can_relocate_adr_forward () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:356
+356	  asm ("set_point6:\n"
+(gdb) FAIL: gdb.arch/disp-step-insn-reloc.exp: can_relocate_adr_forward: relocated instruction
...
Comment 5 Tom de Vries 2024-03-18 14:55:34 UTC
More minimal version:
...
$ cat gdb.in
file /home/linux/gdb/build/gdb/testsuite/outputs/gdb.arch/disp-step-insn-reloc/disp-step-insn-reloc
start
break *set_point0
break *set_point1
break *set_point2
break pass
break fail
set displaced-stepping on
continue
bt
continue
bt
continue
bt
continue
bt
continue
bt
continue
bt
$ gdb -q -batch -ex "set trace-commands on" -x gdb.in 2>&1 | tee LOG; grep Breakpoint LOG
...

Usually, we have:
...
Breakpoint 2, 0x0000aaaaaaaa08bc in can_relocate_b () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:128
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
Breakpoint 3, 0x0000aaaaaaaa090c in can_relocate_bcond_true () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:164
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
Breakpoint 4, 0x0000aaaaaaaa0958 in can_relocate_cbz () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:203
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
...

But once in a while:
...
Breakpoint 2, 0x0000aaaaaaaa08bc in can_relocate_b () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:128
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
Breakpoint 3, 0x0000aaaaaaaa090c in can_relocate_bcond_true () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:164
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
Breakpoint 4, 0x0000aaaaaaaa0958 in can_relocate_cbz () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:203
...

Looking at the backtrace, we hit pass twice without making progress:
...
+continue

Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
31      }
+bt
#0  pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
#1  0x0000aaaaaaaa0928 in can_relocate_bcond_true () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:179
#2  0x0000aaaaaaaa0c9c in main () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:629
+continue

Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
31      }
+bt
#0  pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
#1  0x0000aaaaaaaa0928 in can_relocate_bcond_true () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:179
#2  0x0000aaaaaaaa0c9c in main () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:629
...
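The symptom above (the same breakpoint reported twice in a row) can be checked mechanically in LOG. This is a rough sketch under the assumption that, for this particular gdb.in, two consecutive stops at the same breakpoint number never happen in a good run; the helper name `repeated_stops` is hypothetical, and the sample lines are abbreviated from the output above.

```python
import re

# Matches GDB stop lines such as "Breakpoint 5, pass () at ...".
STOP_RE = re.compile(r"^Breakpoint (\d+), ")


def repeated_stops(lines):
    """Return the breakpoint numbers that were reported twice in a row."""
    repeats = []
    previous = None
    for line in lines:
        match = STOP_RE.match(line)
        if match is None:
            continue  # Not a stop line (command echo, backtrace frame, ...).
        number = int(match.group(1))
        if number == previous:
            repeats.append(number)
        previous = number
    return repeats


# Stop lines abbreviated from the two outputs quoted above.
USUAL = [
    "Breakpoint 2, 0x0000aaaaaaaa08bc in can_relocate_b ()",
    "Breakpoint 5, pass ()",
    "Breakpoint 3, 0x0000aaaaaaaa090c in can_relocate_bcond_true ()",
    "Breakpoint 5, pass ()",
    "Breakpoint 4, 0x0000aaaaaaaa0958 in can_relocate_cbz ()",
    "Breakpoint 5, pass ()",
]
ONCE_IN_A_WHILE = [
    "Breakpoint 2, 0x0000aaaaaaaa08bc in can_relocate_b ()",
    "Breakpoint 5, pass ()",
    "Breakpoint 3, 0x0000aaaaaaaa090c in can_relocate_bcond_true ()",
    "Breakpoint 5, pass ()",
    "Breakpoint 5, pass ()",
    "Breakpoint 4, 0x0000aaaaaaaa0958 in can_relocate_cbz ()",
]

if __name__ == "__main__":
    print("usual:", repeated_stops(USUAL))
    print("once in a while:", repeated_stops(ONCE_IN_A_WHILE))
```

Running it over the real file would look like `repeated_stops(open("LOG").read().splitlines())`; a non-empty result flags a run that needs a closer look.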
Comment 6 Tom de Vries 2024-03-18 15:08:21 UTC
OK, so this hits the "PC did not move. Discarding PC adjustment" case:
...
Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
31      }
+x /i $pc
=> 0xaaaaaaaa0894 <pass>:       nop
+continue
[displaced] displaced_step_prepare_throw: displaced-stepping 1307111.1307111.0 now
[displaced] displaced_step_prepare_throw: original insn 0xaaaaaaaa0894: 1f 20 03 d5      nop
[displaced] prepare: selected buffer at 0xaaaaaaaa0788
[displaced] prepare: saved 0xaaaaaaaa0788: 1e 00 80 d2
[displaced] aarch64_displaced_step_copy_insn: writing insn d503201f at 0xaaaaaaaa0788
[displaced] displaced_step_prepare_throw: prepared successfully thread=1307111.1307111.0, original_pc=0xaaaaaaaa0894, displaced_pc=0xaaaaaaaa0788
[displaced] displaced_step_prepare_throw: replacement insn 0xaaaaaaaa0788: 1f 20 03 d5   nop
[displaced] finish: restored 1307111.1307111.0 0xaaaaaaaa0788
[displaced] aarch64_displaced_step_fixup: PC after stepping: 0xaaaaaaaa0788 (was 0xaaaaaaaa0788).
[displaced] aarch64_displaced_step_fixup: adjusting PC by 4
[displaced] aarch64_displaced_step_fixup: PC did not move. Discarding PC adjustment.
[displaced] aarch64_displaced_step_fixup: fixup: set PC to 0xaaaaaaaa0894:0

Breakpoint 5, pass () at /home/linux/gdb/src/gdb/testsuite/gdb.arch/insn-reloc.c:31
31      }
+x /i $pc
=> 0xaaaaaaaa0894 <pass>:       nop
...
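The decision visible in the [displaced] log above can be modelled in a few lines. This is a simplified sketch inferred from the log messages only, not the actual aarch64_displaced_step_fixup code; the function and parameter names are hypothetical.

```python
def fixup_pc(original_pc, displaced_pc, pc_after_step, pc_adjust):
    """Simplified model of the PC fixup after a displaced step.

    original_pc   -- address of the instruction being stepped over
    displaced_pc  -- address of the scratch buffer holding the copy
    pc_after_step -- PC reported once the step has finished
    pc_adjust     -- offset to apply when the copied insn did execute
    """
    if pc_after_step == displaced_pc:
        # Corresponds to "PC did not move. Discarding PC adjustment."
        pc_adjust = 0
    return original_pc + pc_adjust


# Values taken from the log above.
ORIGINAL_PC = 0xAAAAAAAA0894
DISPLACED_PC = 0xAAAAAAAA0788

if __name__ == "__main__":
    # Case from the log: the PC after stepping equals the buffer address.
    print(hex(fixup_pc(ORIGINAL_PC, DISPLACED_PC, DISPLACED_PC, 4)))
    # Case where the copied nop did execute: PC advanced by one insn.
    print(hex(fixup_pc(ORIGINAL_PC, DISPLACED_PC, DISPLACED_PC + 4, 4)))
```

With the logged values, the model leaves the PC at 0xaaaaaaaa0894, the address of the nop at pass, which matches the second stop at Breakpoint 5. Had the PC advanced to 0xaaaaaaaa078c, the result would have been 0xaaaaaaaa0898.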
Comment 7 Tom de Vries 2024-03-18 15:11:51 UTC
This may be a regression due to:
...
commit 0c27188999bfc5bf03536bf44593c4ed8df296c3
Author: Luis Machado <luis.machado@linaro.org>
Date:   Thu Jan 9 16:04:36 2020 -0300

    Fix step-over-syscall.exp failure
    
    In particular, this one:
    
    FAIL: gdb.base/step-over-syscall.exp: fork: displaced=on: check_pc_after_cross_syscall: single step over fork final pc
    
    When ptrace fork event reporting is enabled, GDB gets a PTRACE_EVENT_FORK
    event whenever the inferior executes the fork syscall.
    
    Then the logic is that GDB needs to step the inferior yet again in order to
    receive a predetermined SIGTRAP, but no execution takes place because the
    signal was already queued for delivery. That means the PC should stay the same.
    
    I noticed the aarch64 code is currently adjusting the PC in this situation,
    making the inferior skip an instruction without executing it.
    
    The following change checks if we did not execute the instruction
    (pc - to == 0), making proper adjustments for such case.
    
    Regression tested on aarch64-linux-gnu on the tryserver.
    
    gdb/ChangeLog:
    
    2020-01-21  Luis Machado  <luis.machado@linaro.org>
...

Luis, could you take a look?
Comment 8 Luis Machado 2024-03-18 15:21:22 UTC
Sure. I vaguely recall the situation with 0c27188999bfc5bf03536bf44593c4ed8df296c3.

I'm wondering if we're incorrectly identifying an instruction that we're trying to displaced-step and ending up with an incorrect outcome.
Comment 9 Luis Machado 2024-03-18 15:23:26 UTC
I tried running this test with RACY_ITER=100 but didn't see any FAILs. It comes out as non-racy.
Comment 10 Luis Machado 2024-03-18 17:20:47 UTC
Ok. This doesn't reproduce for me even if I run it 1000 times.

I'll check the log file more thoroughly to see if anything rings a bell.
Comment 11 Tom de Vries 2024-03-18 17:49:30 UTC
(In reply to Luis Machado from comment #10)
> Ok. This doesn't reproduce for me even if I run it 1000 times.

That's unfortunate.

Let me try to describe the setup, in case that helps in any way:
- lenovo ideapad 3 chromebook, SOC mt8183 (4 Cortex-A73, 4 Cortex-A53), 4GB RAM
- debian 12, installed from https://github.com/hexdump0815/imagebuilder/blob/main/systems/chromebook_kukui/readme.md
- system up-to-date
- uname -a: Linux changeme 6.1.51-stb-mt8+ #1 SMP PREEMPT Tue Sep  5 16:08:26 CEST 2023 aarch64 GNU/Linux
- ldd (Debian GLIBC 2.36-9+deb12u4) 2.36
- gcc version 12.2.0 (Debian 12.2.0-14) 
- GNU assembler (GNU Binutils for Debian) 2.40
- gdb built with -O0 -g -fuse-ld=mold
- built at commit 6549a232d25 ("Fix compiling bfd/vms-lib.c for a 32-bit host.")

I tried my usual tricks of "taskset -c 0" and "stress -c 8", but found that this makes the failure less likely.

I haven't been able to reproduce this on either a Pinebook Pro (running Manjaro) or an M1 MacBook (running Fedora Asahi Remix).
Comment 12 Luis Machado 2024-03-18 18:05:21 UTC
Ok, that's useful information. Let me try to get the kernel + tools versions right first, and then I'll make another attempt at reproducing things.

It is a bit suspicious that we're having an issue with a nop instruction. But let me play with it some and see if I find anything.

As usual, the case the patch was trying to address was a bit odd.
Comment 13 Tom de Vries 2024-03-19 07:20:26 UTC
(In reply to Tom de Vries from comment #7)
> This may be a regression due to:
> ...
> commit 0c27188999bfc5bf03536bf44593c4ed8df296c3
> Author: Luis Machado <luis.machado@linaro.org>
> Date:   Thu Jan 9 16:04:36 2020 -0300
> 
>     Fix step-over-syscall.exp failure

I've confirmed this: I ran a loop with 500 iterations, and it didn't fail before the commit but fails after it.
Comment 14 Luis Machado 2024-03-19 08:13:51 UTC
Thanks for confirming, Tom.
Comment 15 Luis Machado 2024-03-22 13:30:15 UTC
Just a quick update. I've tried this on different hardware with a newer Ubuntu (22.04). I couldn't get it to reproduce yet.

I'm trying a few more things.
Comment 16 Tom de Vries 2024-05-06 12:09:43 UTC
I've also run into this with gdb.dwarf2/dw2-lines.exp.