Bug 28056 - [11/12 Regression] linux-tdep.c:2550: internal-error: displaced_step_prepare_status linux_displaced_step_prepare on s390x-linux-gnu
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: 11.1
Assignee: Simon Marchi
Reported: 2021-07-05 08:27 UTC by Matthias Klose
Modified: 2021-07-11 14:31 UTC (History)

Attachments
Proposed patch (1.88 KB, patch)
2021-07-05 18:14 UTC, Simon Marchi

Description Matthias Klose 2021-07-05 08:27:01 UTC
Seen with trunk 20210630 and the gdb-11 branch 20210705 on s390x-linux-gnu:

$ gdb /bin/ls
GNU gdb (Ubuntu 11.0.90.20210705-0ubuntu1) 11.0.90.20210705-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "s390x-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /bin/ls...
(No debugging symbols found in /bin/ls)
(gdb) run
Starting program: /usr/bin/ls 
/home/ubuntu/tmp/gdb-11.0.90.20210705/gdb/linux-tdep.c:2550: internal-error: displaced_step_prepare_status linux_displaced_step_prepare(gdbarch*, thread_info*, CORE_ADDR&): Assertion `gdbarch_data->num_disp_step_buffers > 0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.
(gdb)
Comment 1 Simon Marchi 2021-07-05 15:00:09 UTC
This is clearly related to my change.  I don't have access to an s390x machine to test this... do you have an easy way to set up a qemu machine or something like that?
Comment 2 Simon Marchi 2021-07-05 18:14:20 UTC
Created attachment 13540
Proposed patch

I managed to get a working qemu setup without much effort using Buildroot, and I can reproduce the issue.  That said, if there is a ready-to-use Ubuntu image, I'd be happy to use it (I just don't want to go through the trouble of installing the distro in qemu with the installer).

See the attached patch; it fixes the issue for me.

However, the inferior always hits this when running:

    Program received signal SIGSEGV, Segmentation fault.
    0x000003ffac312c6c in _dl_debug_initialize () from /lib/ld64.so.1

... and that was happening prior to my patches, and in GDB 10.  I guess you don't see it?
Comment 3 Matthias Klose 2021-07-06 05:03:03 UTC
no, that patch doesn't help.  the gdb 10.2 build was fine, working correctly.

gdb: configured with: --build=s390x-linux-gnu
         --host=s390x-linux-gnu
         --prefix=/usr
         --libexecdir=${prefix}/lib/gdb
         --disable-werror
         --disable-maintainer-mode
         --disable-dependency-tracking
         --disable-silent-rules
         --disable-gdbtk
         --disable-shared
         --with-system-readline
         --with-pkgversion='Ubuntu 11.0.90.20210705-0ubuntu2'
         --srcdir=/<<PKGBUILDDIR>>
         --with-expat
         --with-system-zlib
         --without-guile
         --without-babeltrace
         --with-debuginfod
         --with-babeltrace
         --with-system-gdbinit=/etc/gdb/gdbinit
         --with-system-gdbinit-dir=/etc/gdb/gdbinit.d
         --enable-tui
         --with-lzma
         --with-python=python3
         --with-xxhash
         --with-mpfr
Comment 4 Matthias Klose 2021-07-06 06:01:30 UTC
for a s390x setup, maybe see
https://developer.ibm.com/components/ibm-linuxone/gettingstarted/
Comment 5 Simon Marchi 2021-07-06 13:43:29 UTC
(In reply to Matthias Klose from comment #4)
> for a s390x setup, maybe see
> https://developer.ibm.com/components/ibm-linuxone/gettingstarted/

Thanks, that works wonderfully.  I tested on a RHEL 8.3 machine, I was able to reproduce the issue, and the patch fixes it for me there also.  So, can I ask you first to double check that you built and tested the right thing?  If it still fails, I'll need you to do a bit of assisted debugging, because I don't know what the problem would be.
Comment 6 Matthias Klose 2021-07-06 16:12:10 UTC
yes, double checked,
https://launchpad.net/ubuntu/+source/gdb/11.0.90.20210705-0ubuntu2/+build/21752179

Ubuntu's compiler enables a few hardening flags by default,
-D_FORTIFY_SOURCE=2 -fstack-protector-strong -Wformat -Werror=format-security
but that's unchanged from the gdb 10.2 build.
Comment 7 Simon Marchi 2021-07-06 16:42:42 UTC
Ok, can you check the calls to linux_init_abi (put a breakpoint or add a printf) and tell me what the num_disp_step_buffers argument values are?  I would expect them to be 1.

If you see a call where num_disp_step_buffers is 0, then please provide the backtrace at that point.

Thanks!
Comment 8 Matthias Klose 2021-07-07 08:29:40 UTC
looking at a 20210706 build, the fix seems to work.  Unsure what I did wrong yesterday.  So the fix would be needed on the trunk and the branch.
Comment 9 Simon Marchi 2021-07-07 12:58:17 UTC
(In reply to Matthias Klose from comment #8)
> looking at a 20210706 build, the fix seems to work.  Unsure what I did wrong
> yesterday.  So the fix would be needed on the trunk and the branch.

Ok, glad to hear that!

Patch posted here: https://sourceware.org/pipermail/gdb-patches/2021-July/180752.html
Comment 10 cvs-commit@gcc.gnu.org 2021-07-08 14:05:11 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=74b10a3219e44ba2585e3f7226a6455d41e92c1b

commit 74b10a3219e44ba2585e3f7226a6455d41e92c1b
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Wed Jul 7 08:57:36 2021 -0400

    gdb: don't set Linux-specific displaced stepping methods in s390_gdbarch_init
    
    According to bug 28056, running an s390x binary gives:
    
        (gdb) run
        Starting program: /usr/bin/ls
        /home/ubuntu/tmp/gdb-11.0.90.20210705/gdb/linux-tdep.c:2550: internal-error: displaced_step_prepare_status linux_displaced_step_prepare(gdbarch*, thread_info*, CORE_ADDR&): Assertion `gdbarch_data->num_disp_step_buffers > 0' failed.
    
    This is because the s390 architecture registers some Linux-specific
    displaced stepping callbacks in the OS-agnostic s390_gdbarch_init:
    
        set_gdbarch_displaced_step_prepare (gdbarch, linux_displaced_step_prepare);
        set_gdbarch_displaced_step_finish (gdbarch, linux_displaced_step_finish);
        set_gdbarch_displaced_step_restore_all_in_ptid
          (gdbarch, linux_displaced_step_restore_all_in_ptid);
    
    But then the Linux-specific s390_linux_init_abi_any passes
    num_disp_step_buffers=0 to linux_init_abi:
    
        linux_init_abi (info, gdbarch, 0);
    
    The problem happens when linux_displaced_step_prepare is called for the
    first time.  It tries to allocate the displaced stepping buffers, but
    sees that the number of displaced stepping buffers for that architecture
    is 0, which is unexpected / invalid.
    
    s390_gdbarch_init should not register the linux_* callbacks, that is
    expected to be done by linux_init_abi.  If debugging a bare-metal s390
    program, or an s390 program on another OS GDB doesn't know about, we
    wouldn't want to use them.  We would either register no callbacks, if
    displaced stepping isn't supported, or register a different set of
    callbacks if we wanted to support displaced stepping in those cases.
    
    The commit that refactored the displaced stepping machinery and
    introduced these set_gdbarch_displaced_step_* calls is 187b041e2514
    ("gdb: move displaced stepping logic to gdbarch, allow starting
    concurrent displaced steps").  However, even before that,
    s390_gdbarch_init did:
    
      set_gdbarch_displaced_step_location (gdbarch, linux_displaced_step_location);
    
    ... which already seemed wrong.  The Linux-specific callback was used
    even for non-Linux systems.  Maybe that was on purpose, because it would
    also happen to work in some other non-Linux case, or maybe it was simply
    a mistake.  I'll assume that this was a small mistake when
    s390-tdep.{h,c} were factored out of s390-linux-tdep.c, in d6e589456475
    ("s390: Split up s390-linux-tdep.c into two files").
    
    Fix this by removing the setting of these displaced step callbacks from
    s390_gdbarch_init.  Instead, pass num_disp_step_buffers=1 to
    linux_init_abi, in s390_linux_init_abi_any.  Doing so will cause
    linux_init_abi to register these same callbacks.  It will also mean that
    when debugging a bare-metal s390 executable or an executable on another
    OS that GDB doesn't know about, gdbarch_displaced_step_prepare won't be
    set, so displaced stepping won't be used.
    
    This patch will need to be merged in the gdb-11-branch, since this is a
    GDB 11 regression, so here's the ChangeLog entry:
    
    gdb/ChangeLog:
    
            * s390-linux-tdep.c (s390_linux_init_abi_any): Pass 1 (number
            of displaced stepping buffers) to linux_init_abi.
            * s390-tdep.c (s390_gdbarch_init): Don't set the Linux-specific
            displaced-stepping gdbarch callbacks.
    
    Change-Id: Ieab2f8990c78fde845ce7378d6fd4ee2833800d5
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28056
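
The mismatch described in the commit message above can be illustrated with a small stand-alone model.  This is only a sketch, not GDB code: toy_gdbarch, toy_linux_prepare, toy_linux_init_abi and the other toy_* / make_* names are hypothetical stand-ins, and the assumption that linux_init_abi registers the callbacks only when num_disp_step_buffers is greater than 0 is inferred from the commit message rather than copied from the GDB sources.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for GDB's per-architecture object (NOT the real struct gdbarch).
struct toy_gdbarch
{
  // Plays the role of the gdbarch_displaced_step_prepare callback slot.
  std::function<bool (const toy_gdbarch &)> displaced_step_prepare;

  // Plays the role of the buffer count recorded by the Linux ABI layer.
  int num_disp_step_buffers = 0;
};

// Plays the role of linux_displaced_step_prepare.  Returns false in the
// situation where the real function fails its assertion.
bool
toy_linux_prepare (const toy_gdbarch &arch)
{
  return arch.num_disp_step_buffers > 0;
}

// Plays the role of linux_init_abi.  Assumption: the callback is only
// registered when at least one buffer is requested.
void
toy_linux_init_abi (toy_gdbarch &arch, int num_disp_step_buffers)
{
  arch.num_disp_step_buffers = num_disp_step_buffers;
  if (num_disp_step_buffers > 0)
    arch.displaced_step_prepare = toy_linux_prepare;
}

// Before the fix: the OS-agnostic init sets the Linux callback, then the
// Linux ABI init passes 0 buffers.
toy_gdbarch
make_arch_before_fix ()
{
  toy_gdbarch arch;
  arch.displaced_step_prepare = toy_linux_prepare;
  toy_linux_init_abi (arch, 0);
  return arch;
}

// After the fix: only the Linux ABI init sets the callback, with 1 buffer.
toy_gdbarch
make_arch_after_fix ()
{
  toy_gdbarch arch;
  toy_linux_init_abi (arch, 1);
  return arch;
}

// True when a callback is installed but its precondition does not hold.
bool
would_trip_assertion (const toy_gdbarch &arch)
{
  return arch.displaced_step_prepare && !arch.displaced_step_prepare (arch);
}

// Small usage example exercising both configurations.
void
demo ()
{
  assert (would_trip_assertion (make_arch_before_fix ()));
  assert (!would_trip_assertion (make_arch_after_fix ()));
}
```

In this model, a default-constructed toy_gdbarch (the bare-metal or unknown-OS case) has no callback installed at all, which corresponds to displaced stepping simply not being used.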
Comment 11 cvs-commit@gcc.gnu.org 2021-07-08 14:06:11 UTC
The gdb-11-branch branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ff32938d4471ebf6dce462934de8ab47f9fd5d66

commit ff32938d4471ebf6dce462934de8ab47f9fd5d66
Author: Simon Marchi <simon.marchi@polymtl.ca>
Date:   Thu Jul 8 10:05:16 2021 -0400

    gdb: don't set Linux-specific displaced stepping methods in s390_gdbarch_init
    
Comment 12 Simon Marchi 2021-07-08 14:10:56 UTC
Fixed by that commit.
Comment 13 Joel Brobecker 2021-07-11 14:30:27 UTC
(In reply to Simon Marchi from comment #12)
> Fixed by that commit.

Hi Simon. Did you mean to also change the PR status to resolved/fixed, by any chance?
Comment 14 Simon Marchi 2021-07-11 14:31:07 UTC
Woops, yes, thanks.