During any attempt of debugging session with ARM32 assembler binary gdb 13.1/13.2 breaks on regular basis: (gdb) break _start Breakpoint 1 at 0x1005c (gdb) run Starting program: /home/zb/devel/pi-asm-master/06_first_data.bin Breakpoint 1, 0x0001005c in _start () (gdb) nexti 0x00010060 in _start () (gdb) nexti /build/gdb-qUBixh/gdb-13.2/gdb/displaced-stepping.c:287: internal-error: copy_insn_closure_by_addr: Assertion `buffer.copy_insn_closure.get () != nullptr' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- 0x4ce739 ??? 0x6ea34b ??? 0x6ea4f1 ??? 0x7ce215 ??? 0x53065b ??? [..] 0x585b15 ??? 0x584a49 ??? --------------------- /build/gdb-qUBixh/gdb-13.2/gdb/displaced-stepping.c:287: internal-error: copy_insn_closure_by_addr: Assertion `buffer.copy_insn_closure.get () != nullptr' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) n About any simple assembler program can be used to repeat this; it happens all the time. Still I paste the one used in that attempt: ===================================== .section .rodata msg: .ascii "Hello, World\n" .text .align 2 SYS_EXIT = 1 SYS_WRITE = 4 STDOUT = 1 .global _start exit: mov r7, #SYS_EXIT svc #0 _start: mov r7, #SYS_WRITE mov r0, #STDOUT ldr r1, =msg mov r2, #13 svc #0 // Now exit mov r0, #0 b exit ============================== as program.s -o program.o && ld program.o -o program.bin Then simply: gdb -q ./program.bin — and the rest like above; after second or third „nexti” gdb will ALWAYS break. Tested both 13.1 and 13.2 versions of gdb. Gear: Banana Pi M2+, Armbian „Bookworm”, ARM32 — here's an excerpt from cpuinfo: processor : 0 model name : ARMv7 Processor rev 5 (v7l) BogoMIPS : 22.85 Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x0 CPU part : 0xc07 CPU revision : 5 [..] Hardware : Allwinner sun8i Family Revision : 0000 Serial : 02c00081f0069981
Just imagine: this bug DOESN'T occur in old version 10.1. Just a few minutes ago I verified it using old package (GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git).
I can't readily try your program, but locally we do a lot of testing on arm-elf, which works fine, so I wonder what the difference is. Maybe build a gdb with debug info so we can get a better stack trace? I don't know if that will really help but it definitely couldn't hurt.
I didn't compile gdb; I used packages from Debian (and from Armbian); compilation of MB-sized package on my little Banana Pi takes hours. Anyway I believe I already gave you plenty of information.
This is probably related to PR29272, since the assertion was added because of it.
CC'ing Luis, who may know.
Yeah, it is the assertion that should capture some broken situation, added as part of PR29272. I'll have to do some digging, as I have since forgotten what the particular situation was. It used to be a bit obscure.
I managed to reproduce this. Let me see what I can find.
Well, just should've been a bit obvious. I'm still checking this, but here's some useful information. The displaced stepping machinery in gdb, which is used to side-step breakpoints without removing them, uses scratch space for executing the instructions out of their original location. That scratch location was picked a while ago to be the entry point of the program, from the auxv's AT_ENTRY value. That's usually the first instruction of _start. Now, your example program starts from _start. So things get a bit confusing because gdb is trying to execute and modify things using the same address. As a temporary workaround, you can disable displaced stepping. It likely won't make a difference for your case. set displaced-stepping off That should allow you to resume your debugging session. I'll see what can be done about this corner case.
Please, note (and compare): when debugging x86_64 assembly routine, the same way (repeated „nexti”), the problem doesn't occur — somehow it seems to be ARM-specific.
Yes, this is arm-specific. Displaced stepping has a significant arch-specific layer behind it.
I have a WIP fix that I need to test.
Here's the proposed patch: https://inbox.sourceware.org/gdb-patches/20230929081503.4014732-1-luis.machado@arm.com/T/#u
The master branch has been updated by Luis Machado <luisgpm@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5d4a870e05ac45e3f5a301c672a4079995b5db7a commit 5d4a870e05ac45e3f5a301c672a4079995b5db7a Author: Luis Machado <luis.machado@arm.com> Date: Thu Sep 28 11:08:29 2023 +0100 Only allow closure lookup by address if there are threads displaced-stepping Since commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, we have an assertion in displaced_step_buffers::copy_insn_closure_by_addr that makes sure a closure is available whenever we have a match between the provided address argument and the buffer address. That is fine, but the report in PR30872 shows this assertion triggering when it really shouldn't. After some investigation, here's what I found out. The 32-bit Arm architecture is the only one that calls gdbarch_displaced_step_copy_insn_closure_by_addr directly, and that's because 32-bit Arm needs to figure out the thumb state of the original instruction that we displaced-stepped through the displaced-step buffer. Before the assertion was put in place by commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, there was the possibility of getting nullptr back, which meant we were not doing a displaced-stepping operation. Now, with the assertion in place, this is running into issues. It looks like displaced_step_buffers::copy_insn_closure_by_addr is being used to return a couple different answers depending on the state we're in: 1 - If we are actively displaced-stepping, then copy_insn_closure_by_addr is supposed to return a valid closure for us, so we can determine the thumb mode. 2 - If we are not actively displaced-stepping, then copy_insn_closure_by_addr should return nullptr to signal that there isn't any displaced-step buffers in use, because we don't have a valid closure (but we should always have this). Since the displaced-step buffers are always allocated, but not always used, that means the buffers will always contain data. In particular, the buffer addr field cannot be used to determine if the buffer is active or not. For instance, we cannot set the buffer addr field to 0x0, as that can be a valid PC in some cases. My understanding is that the current_thread field should be a good candidate to signal that a particular displaced-step buffer is active or not. If it is nullptr, we have no threads using that buffer to displaced-step. Otherwise, it is an active buffer in use by a particular thread. The following fix modifies the displaced_step_buffers::copy_insn_closure_by_addr function so we only attempt to return a closure if the buffer has an assigned current_thread and if the buffer address matches the address argument. Alternatively, I think we could use a function to answer the question of whether we're actively displaced-stepping (so we have an active buffer) or not. I've also added a testcase that exercises the problem. It should reproduce reliably on Arm, as that is the only architecture that faces this problem at the moment. Regression-tested on Ubuntu 20.04. OK? Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30872 Approved-By: Simon Marchi <simon.marchi@efficios.com>
The gdb-14-branch branch has been updated by Luis Machado <luisgpm@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=c9d954f934b64e2e3e66ed4d0652cd759eaf49f5 commit c9d954f934b64e2e3e66ed4d0652cd759eaf49f5 Author: Luis Machado <luis.machado@arm.com> Date: Thu Sep 28 11:08:29 2023 +0100 Only allow closure lookup by address if there are threads displaced-stepping Since commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, we have an assertion in displaced_step_buffers::copy_insn_closure_by_addr that makes sure a closure is available whenever we have a match between the provided address argument and the buffer address. That is fine, but the report in PR30872 shows this assertion triggering when it really shouldn't. After some investigation, here's what I found out. The 32-bit Arm architecture is the only one that calls gdbarch_displaced_step_copy_insn_closure_by_addr directly, and that's because 32-bit Arm needs to figure out the thumb state of the original instruction that we displaced-stepped through the displaced-step buffer. Before the assertion was put in place by commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, there was the possibility of getting nullptr back, which meant we were not doing a displaced-stepping operation. Now, with the assertion in place, this is running into issues. It looks like displaced_step_buffers::copy_insn_closure_by_addr is being used to return a couple different answers depending on the state we're in: 1 - If we are actively displaced-stepping, then copy_insn_closure_by_addr is supposed to return a valid closure for us, so we can determine the thumb mode. 2 - If we are not actively displaced-stepping, then copy_insn_closure_by_addr should return nullptr to signal that there isn't any displaced-step buffers in use, because we don't have a valid closure (but we should always have this). Since the displaced-step buffers are always allocated, but not always used, that means the buffers will always contain data. In particular, the buffer addr field cannot be used to determine if the buffer is active or not. For instance, we cannot set the buffer addr field to 0x0, as that can be a valid PC in some cases. My understanding is that the current_thread field should be a good candidate to signal that a particular displaced-step buffer is active or not. If it is nullptr, we have no threads using that buffer to displaced-step. Otherwise, it is an active buffer in use by a particular thread. The following fix modifies the displaced_step_buffers::copy_insn_closure_by_addr function so we only attempt to return a closure if the buffer has an assigned current_thread and if the buffer address matches the address argument. Alternatively, I think we could use a function to answer the question of whether we're actively displaced-stepping (so we have an active buffer) or not. I've also added a testcase that exercises the problem. It should reproduce reliably on Arm, as that is the only architecture that faces this problem at the moment. Regression-tested on Ubuntu 20.04. OK? Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30872 Approved-By: Simon Marchi <simon.marchi@efficios.com>
Fixed now on master and gdb-14-branch. Please reopen if you see any issue related to this.