Bug 30872 - Assertion `buffer.copy_insn_closure.get () != nullptr' failed
Summary: Assertion `buffer.copy_insn_closure.get () != nullptr' failed
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: 13.1
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-19 09:43 UTC by Zbigniew
Modified: 2023-10-16 11:01 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2023-09-27 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zbigniew 2023-09-19 09:43:55 UTC
During any attempt of debugging session with ARM32 assembler binary gdb 13.1/13.2 breaks on regular basis:

(gdb) break _start
Breakpoint 1 at 0x1005c
(gdb) run
Starting program: /home/zb/devel/pi-asm-master/06_first_data.bin 

Breakpoint 1, 0x0001005c in _start ()
(gdb) nexti
0x00010060 in _start ()
(gdb) nexti
/build/gdb-qUBixh/gdb-13.2/gdb/displaced-stepping.c:287: internal-error: copy_insn_closure_by_addr: Assertion `buffer.copy_insn_closure.get () != nullptr' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x4ce739 ???
0x6ea34b ???
0x6ea4f1 ???
0x7ce215 ???
0x53065b ???
[..]
0x585b15 ???
0x584a49 ???
---------------------
/build/gdb-qUBixh/gdb-13.2/gdb/displaced-stepping.c:287: internal-error: copy_insn_closure_by_addr: Assertion `buffer.copy_insn_closure.get () != nullptr' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

About any simple assembler program can be used to repeat this; it happens all the time. Still I paste the one used in that attempt:
=====================================
.section .rodata 
msg:
    .ascii "Hello, World\n"

.text
.align 2

SYS_EXIT = 1
SYS_WRITE = 4
STDOUT = 1

.global _start

exit:
    mov    r7, #SYS_EXIT
    svc    #0

_start:
    mov    r7, #SYS_WRITE
    mov    r0, #STDOUT
    ldr    r1, =msg
    mov    r2, #13
    svc    #0

    // Now exit
    mov    r0, #0
    b      exit
==============================
as program.s -o program.o && ld program.o -o program.bin
Then simply: gdb -q ./program.bin — and the rest like above; after second or third „nexti” gdb will ALWAYS break.

Tested both 13.1 and 13.2 versions of gdb.

Gear: Banana Pi M2+, Armbian „Bookworm”, ARM32 — here's an excerpt from cpuinfo:

processor       : 0
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 22.85
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5
[..]
Hardware        : Allwinner sun8i Family
Revision        : 0000
Serial          : 02c00081f0069981
Comment 1 Zbigniew 2023-09-19 12:02:25 UTC
Just imagine: this bug DOESN'T occur in old version 10.1. Just a few minutes ago I verified it using old package (GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git).
Comment 2 Tom Tromey 2023-09-19 12:29:12 UTC
I can't readily try your program, but locally we do a lot
of testing on arm-elf, which works fine, so I wonder what the
difference is.

Maybe build a gdb with debug info so we can get a better
stack trace?  I don't know if that will really help but
it definitely couldn't hurt.
Comment 3 Zbigniew 2023-09-19 12:34:49 UTC
I didn't compile gdb; I used packages from Debian (and from Armbian); compilation of MB-sized package on my little Banana Pi takes hours.
Anyway I believe I already gave you plenty of information.
Comment 4 Hannes Domani 2023-09-19 16:02:54 UTC
This is probably related to PR29272, since the assertion was added because of it.
Comment 5 Tom Tromey 2023-09-19 18:00:40 UTC
CC'ing Luis, who may know.
Comment 6 Luis Machado 2023-09-19 20:31:15 UTC
Yeah, it is the assertion that should capture some broken situation, added as part of PR29272. I'll have to do some digging, as I have since forgotten what the particular situation was. It used to be a bit obscure.
Comment 7 Luis Machado 2023-09-27 11:46:29 UTC
I managed to reproduce this. Let me see what I can find.
Comment 8 Luis Machado 2023-09-27 12:39:49 UTC
Well, just should've been a bit obvious. I'm still checking this, but here's some useful information.

The displaced stepping machinery in gdb, which is used to side-step breakpoints without removing them, uses scratch space for executing the instructions out of their original location.

That scratch location was picked a while ago to be the entry point of the program, from the auxv's AT_ENTRY value. That's usually the first instruction of _start.

Now, your example program starts from _start. So things get a bit confusing because gdb is trying to execute and modify things using the same address.

As a temporary workaround, you can disable displaced stepping. It likely won't make a difference for your case.

set displaced-stepping off

That should allow you to resume your debugging session.

I'll see what can be done about this corner case.
Comment 9 Zbigniew 2023-09-27 13:27:55 UTC
Please, note (and compare): when debugging x86_64 assembly routine, the same way (repeated „nexti”), the problem doesn't occur — somehow it seems to be ARM-specific.
Comment 10 Luis Machado 2023-09-27 14:07:47 UTC
Yes, this is arm-specific. Displaced stepping has a significant arch-specific layer behind it.
Comment 11 Luis Machado 2023-09-28 08:13:10 UTC
I have a WIP fix that I need to test.
Comment 13 Sourceware Commits 2023-10-16 10:57:15 UTC
The master branch has been updated by Luis Machado <luisgpm@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5d4a870e05ac45e3f5a301c672a4079995b5db7a

commit 5d4a870e05ac45e3f5a301c672a4079995b5db7a
Author: Luis Machado <luis.machado@arm.com>
Date:   Thu Sep 28 11:08:29 2023 +0100

    Only allow closure lookup by address if there are threads displaced-stepping
    
    Since commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, we have an assertion in
    displaced_step_buffers::copy_insn_closure_by_addr that makes sure a closure
    is available whenever we have a match between the provided address argument and
    the buffer address.
    
    That is fine, but the report in PR30872 shows this assertion triggering when
    it really shouldn't. After some investigation, here's what I found out.
    
    The 32-bit Arm architecture is the only one that calls
    gdbarch_displaced_step_copy_insn_closure_by_addr directly, and that's because
    32-bit Arm needs to figure out the thumb state of the original instruction
    that we displaced-stepped through the displaced-step buffer.
    
    Before the assertion was put in place by commit
    1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, there was the possibility of
    getting nullptr back, which meant we were not doing a displaced-stepping
    operation.
    
    Now, with the assertion in place, this is running into issues.
    
    It looks like displaced_step_buffers::copy_insn_closure_by_addr is
    being used to return a couple different answers depending on the
    state we're in:
    
    1 - If we are actively displaced-stepping, then copy_insn_closure_by_addr
    is supposed to return a valid closure for us, so we can determine the
    thumb mode.
    
    2 - If we are not actively displaced-stepping, then copy_insn_closure_by_addr
    should return nullptr to signal that there isn't any displaced-step buffers
    in use, because we don't have a valid closure (but we should always have
    this).
    
    Since the displaced-step buffers are always allocated, but not always used,
    that means the buffers will always contain data. In particular, the buffer
    addr field cannot be used to determine if the buffer is active or not.
    
    For instance, we cannot set the buffer addr field to 0x0, as that can be a
    valid PC in some cases.
    
    My understanding is that the current_thread field should be a good candidate
    to signal that a particular displaced-step buffer is active or not. If it is
    nullptr, we have no threads using that buffer to displaced-step.  Otherwise,
    it is an active buffer in use by a particular thread.
    
    The following fix modifies the displaced_step_buffers::copy_insn_closure_by_addr
    function so we only attempt to return a closure if the buffer has an assigned
    current_thread and if the buffer address matches the address argument.
    
    Alternatively, I think we could use a function to answer the question of
    whether we're actively displaced-stepping (so we have an active buffer) or
    not.
    
    I've also added a testcase that exercises the problem. It should reproduce
    reliably on Arm, as that is the only architecture that faces this problem
    at the moment.
    
    Regression-tested on Ubuntu 20.04. OK?
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30872
    Approved-By: Simon Marchi <simon.marchi@efficios.com>
Comment 14 Sourceware Commits 2023-10-16 10:59:33 UTC
The gdb-14-branch branch has been updated by Luis Machado <luisgpm@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=c9d954f934b64e2e3e66ed4d0652cd759eaf49f5

commit c9d954f934b64e2e3e66ed4d0652cd759eaf49f5
Author: Luis Machado <luis.machado@arm.com>
Date:   Thu Sep 28 11:08:29 2023 +0100

    Only allow closure lookup by address if there are threads displaced-stepping
    
    Since commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, we have an assertion in
    displaced_step_buffers::copy_insn_closure_by_addr that makes sure a closure
    is available whenever we have a match between the provided address argument and
    the buffer address.
    
    That is fine, but the report in PR30872 shows this assertion triggering when
    it really shouldn't. After some investigation, here's what I found out.
    
    The 32-bit Arm architecture is the only one that calls
    gdbarch_displaced_step_copy_insn_closure_by_addr directly, and that's because
    32-bit Arm needs to figure out the thumb state of the original instruction
    that we displaced-stepped through the displaced-step buffer.
    
    Before the assertion was put in place by commit
    1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, there was the possibility of
    getting nullptr back, which meant we were not doing a displaced-stepping
    operation.
    
    Now, with the assertion in place, this is running into issues.
    
    It looks like displaced_step_buffers::copy_insn_closure_by_addr is
    being used to return a couple different answers depending on the
    state we're in:
    
    1 - If we are actively displaced-stepping, then copy_insn_closure_by_addr
    is supposed to return a valid closure for us, so we can determine the
    thumb mode.
    
    2 - If we are not actively displaced-stepping, then copy_insn_closure_by_addr
    should return nullptr to signal that there isn't any displaced-step buffers
    in use, because we don't have a valid closure (but we should always have
    this).
    
    Since the displaced-step buffers are always allocated, but not always used,
    that means the buffers will always contain data. In particular, the buffer
    addr field cannot be used to determine if the buffer is active or not.
    
    For instance, we cannot set the buffer addr field to 0x0, as that can be a
    valid PC in some cases.
    
    My understanding is that the current_thread field should be a good candidate
    to signal that a particular displaced-step buffer is active or not. If it is
    nullptr, we have no threads using that buffer to displaced-step.  Otherwise,
    it is an active buffer in use by a particular thread.
    
    The following fix modifies the displaced_step_buffers::copy_insn_closure_by_addr
    function so we only attempt to return a closure if the buffer has an assigned
    current_thread and if the buffer address matches the address argument.
    
    Alternatively, I think we could use a function to answer the question of
    whether we're actively displaced-stepping (so we have an active buffer) or
    not.
    
    I've also added a testcase that exercises the problem. It should reproduce
    reliably on Arm, as that is the only architecture that faces this problem
    at the moment.
    
    Regression-tested on Ubuntu 20.04. OK?
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30872
    Approved-By: Simon Marchi <simon.marchi@efficios.com>
Comment 15 Luis Machado 2023-10-16 11:01:40 UTC
Fixed now on master and gdb-14-branch. Please reopen if you see any issue related to this.