Bug 30252 - gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed
Summary: gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferio...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep
Version: 13.1
Importance: P2 normal
Target Milestone: 14.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-20 16:00 UTC by Petr Šumbera
Modified: 2023-07-06 14:15 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
assertion hit with set debug infrun 1 (2.60 KB, text/plain)
2023-03-22 08:06 UTC, Petr Šumbera
Fix (1.68 KB, text/plain)
2023-03-23 17:36 UTC, Pedro Alves

Description Petr Šumbera 2023-03-20 16:00:19 UTC
I see the following regression (an internal error) when using GDB 13.1 on Solaris x86:

(gdb) run -P
Starting program: /usr/bin/firefox -P
[Thread debugging using libthread_db enabled]
warning: could not convert 'mutex_t' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
[New Thread 1 (LWP 1)]
/builds/psumbera/userland-gdb-13/components/gdb/gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x9b6213 ???
0xd3e52c ???
0xd3e778 ???
0xe798bc ???
0xcee510 ???
0xbdefca ???
0xc7273b ???
0xcf1443 ???
0xb3b9c4 ???
0xb4d008 ???
0x979b2f ???
0xe7a684 ???
0xb777f1 ???
0xb791e4 ???
0x922da6 ???
0x922c12 ???
---------------------
/builds/psumbera/userland-gdb-13/components/gdb/gdb-13.1/gdb/target.c:2641: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

--

I used git bisect and it says:

d51926f06a7f4854bebdd71dcb0a78dbaa2f4168 is the first bad commit
commit d51926f06a7f4854bebdd71dcb0a78dbaa2f4168
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Apr 21 14:20:36 2022 +0100

    Slightly tweak and clarify target_resume's interface
Comment 1 Petr Šumbera 2023-03-21 09:08:23 UTC
The problematic commit added the following two assertions:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/target.c;h=0cebecfafc379534180a6563d246cdbc3488311e;hb=HEAD#l2648

2648   gdb_assert (inferior_ptid != null_ptid);
2649   gdb_assert (inferior_ptid.matches (scope_ptid));

With them removed, GDB seems to work as it did before.
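
To make the two preconditions concrete, here is a minimal self-contained sketch of what they check. The `toy_ptid` type, its `matches` method, and `resume_preconditions_hold` are simplified stand-ins written for this illustration. They are modeled on my understanding of GDB's `ptid_t` (a wildcard scope matches everything, a bare process id matches any thread of that process, anything else must be exactly equal) and are not the real GDB definitions.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for GDB's ptid_t (illustration only).
struct toy_ptid {
    int pid;
    long lwp;

    bool operator==(const toy_ptid &o) const { return pid == o.pid && lwp == o.lwp; }
    bool operator!=(const toy_ptid &o) const { return !(*this == o); }

    // True when this ptid falls within the scope described by FILTER.
    bool matches(const toy_ptid &filter) const {
        if (filter.pid == -1)                      // wildcard: any thread
            return true;
        if (filter.lwp == 0 && filter.pid == pid)  // a whole process
            return true;
        return *this == filter;                    // exactly one thread
    }
};

const toy_ptid toy_null_ptid{0, 0};
const toy_ptid toy_minus_one_ptid{-1, 0};

// Mirrors the two assertions: a thread is selected, and it is in scope.
bool resume_preconditions_hold(const toy_ptid &current, const toy_ptid &scope) {
    return current != toy_null_ptid && current.matches(scope);
}
```

In this model, calling `resume_preconditions_hold(toy_null_ptid, toy_minus_one_ptid)` returns false regardless of the scope, which corresponds to the first assertion being the one that fires in the report.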
Comment 2 Tom Tromey 2023-03-21 14:55:51 UTC
I'm not super conversant with this area but I think that, most likely,
patching out these asserts will let it work in some scenarios but
crash in others.  Probably the native target needs some extra work here,
though unfortunately I don't know exactly what.
Comment 3 Simon Marchi 2023-03-22 01:02:08 UTC
(In reply to Tom Tromey from comment #2)
> I'm not super conversant with this area but I think that, most likely,
> patching out these asserts will let it work in some scenarios but
> crash in others.  Probably the native target needs some extra work here,
> though unfortunately I don't know exactly what.

I agree.

Can you run the same test case, but run "set debug infrun 1" first?
Comment 4 Petr Šumbera 2023-03-22 08:06:26 UTC
Created attachment 14771
assertion hit with set debug infrun 1

Thanks for looking into it. What are we looking for?
Comment 5 Pedro Alves 2023-03-23 16:13:28 UTC
The problem is that the backend called target_resume without switching to the leader thread of that resumption.

- inferior_ptid is a global in gdb that points at the currently selected thread.
- inferior_ptid == null_ptid means no thread is selected at that point.

- The target_resume interface is:

 /* Resume execution (or prepare for execution) of the current thread
    (INFERIOR_PTID), while optionally letting other threads of the
    current process or all processes run free.
    ...

Thus calling target_resume with inferior_ptid == null_ptid is bogus.

target_wait (which leads to procfs_target::wait on Solaris) is called with inferior_ptid == null_ptid on entry exactly to help catch such bogus uses.  It's procfs_target::wait that is calling target_resume incorrectly.  From the backtrace, it seems that the line in question is 2187, which has:

		    /* How to keep going without returning to wfi: */
		    target_continue_no_signal (ptid);
		    goto wait_again;

target_continue_no_signal is a small wrapper around target_resume, which is consistent with the assertion firing inside target_resume.  I guess we don't see target_continue_no_signal in the backtrace because that is an optimized gdb build.

I'm writing a fix.
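
The failure mode described above can be reproduced in miniature. The following is a toy model, not GDB code: `toy_inferior_ptid`, `toy_target_resume`, and `toy_wait_buggy` are hypothetical names, and the precondition throws an exception instead of calling gdb_assert so that the failure can be observed without aborting the process.

```cpp
#include <cassert>
#include <stdexcept>

// Toy thread id; 0 plays the role of null_ptid (illustration only).
using toy_ptid = int;
constexpr toy_ptid toy_null_ptid = 0;

// Models the global that names the currently selected thread.
toy_ptid toy_inferior_ptid = toy_null_ptid;

// Models target_resume's precondition: some thread must be selected.
void toy_target_resume() {
    if (toy_inferior_ptid == toy_null_ptid)
        throw std::logic_error("inferior_ptid != null_ptid");
    // A real implementation would resume toy_inferior_ptid here.
}

// Models a wait routine that is entered with no thread selected and
// then tries to resume through the public entry point anyway.
bool toy_wait_buggy() {
    toy_inferior_ptid = toy_null_ptid;  // state on entry to wait
    try {
        toy_target_resume();            // precondition is violated
    } catch (const std::logic_error &) {
        return false;                   // the "assertion" fired
    }
    return true;
}
```

Here `toy_wait_buggy()` returns false because nothing selected a thread before the resume call, while calling `toy_target_resume()` after setting `toy_inferior_ptid` to a non-null value succeeds.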
Comment 6 Pedro Alves 2023-03-23 17:36:21 UTC
Created attachment 14777
Fix

Could you give the attached patch a try?

I was able to confirm that gdb/procfs.c (the modified file) at least compiles using an old Solaris 11 VM I had lying around, but I wasn't able to build GDB fully due to other issues.
Comment 7 Petr Šumbera 2023-03-24 08:06:12 UTC
(In reply to Pedro Alves from comment #6)
> Created attachment 14777 [details]
> Fix
> 
> Could you give the attached patch a try?

Yes. Thank you! The assertion is no longer hit in my test case.
 
> I was able to confirm that gdb/procfs.c (the modified file) at least
> compiles using an old Solaris 11 VM I had lying around, but I wasn't able to
> build GDB fully due to other issues.

There is a year-old release that you should be able to use for such testing:

https://blogs.oracle.com/solaris/post/building-open-source-software-on-oracle-solaris-114-cbe-release
Comment 8 Marcel Telka 2023-06-12 20:16:26 UTC
I hit the same bug with gdb 13.2 on OpenIndiana (an illumos-based distro), and the attached patch solved the problem.
Comment 9 Sourceware Commits 2023-07-06 14:10:26 UTC
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b2ad7bb9e6a012699195d3eda9d40679c406ebdc

commit b2ad7bb9e6a012699195d3eda9d40679c406ebdc
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)
    
    PR tdep/30252 reports that using GDB on Solaris fails an assertion in
    target_resume:
    
     target.c:2648: internal-error: target_resume: Assertion `inferior_ptid != null_ptid' failed.
     A problem internal to GDB has been detected,
     further debugging may prove unreliable.
     Quit this debugging session? (y or n)
    
    The backtrace, after running it through c++filt, looks like:
    
     ----- Backtrace -----
     0xa18914 gdb_internal_backtrace_1
             /root/binutils-gdb/gdb/bt-utils.c:122
     0xa18914 gdb_internal_backtrace()
             /root/binutils-gdb/gdb/bt-utils.c:168
     0xdec834 internal_vproblem
             /root/binutils-gdb/gdb/utils.c:401
     0xdecad8 internal_verror(char const*, int, char const*, __va_list_tag*)
             /root/binutils-gdb/gdb/utils.c:481
     0xf3638c internal_error_loc(char const*, int, char const*, ...)
             /root/binutils-gdb/gdbsupport/errors.cc:58
     0xd70580 target_resume(ptid_t, int, gdb_signal)
             /root/binutils-gdb/gdb/target.c:2648
     0xc59e85 procfs_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
             /root/binutils-gdb/gdb/procfs.c:2187
     0xcf6da7 sol_thread_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
             /root/binutils-gdb/gdb/sol-thread.c:442
     0xd73711 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>)
             /root/binutils-gdb/gdb/target.c:2586
     ...
    
    The problem is that the procfs backend, while inside target_wait,
    called target_resume without switching to the leader thread of that
    resumption.
    
    The target_resume interface is:
    
     /* Resume execution (or prepare for execution) of the current thread
        (INFERIOR_PTID), while optionally letting other threads of the
        current process or all processes run free.
        ...
    
    Thus calling target_resume with inferior_ptid == null_ptid is bogus.
    
    target_wait (which leads to procfs_target::wait on Solaris) is called
    with inferior_ptid == null_ptid on entry exactly to help catch such
    bogus uses.
    
    From the backtrace, it seems that the relevant line in question is
    procfs.c:2187:
    
    2186  /* How to keep going without returning to wfi: */
    2187  target_continue_no_signal (ptid);
    2188  goto wait_again;
    
    target_continue_no_signal is a small wrapper around target_resume,
    which would make sense.
    
    The fix is to not call target_resume or go via the target stack at
    all.  Instead, factor out a new proc_resume function out of
    procfs_target::resume, and call that.  The new function does not rely
    on inferior_ptid.
    
    I've not been able to test it myself, but Petr confirmed it fixes the
    assertion failure with his test case, and Marcel Telka also confirmed
    it solves the problem.
    
    Tested-By: Petr Šumbera <petr.sumbera@oracle.com>
    Tested-By: Marcel Telka <marcel@telka.sk>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30252
    Change-Id: I6213c59b081d400a22e799ee621c2eff6dcafbf3
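
The shape of the fix described in the commit message can be sketched as follows. This is a toy model with hypothetical names (`toy_proc_resume`, `toy_target_resume`, `toy_wait_fixed`, `resumed_log`), not the actual patch to gdb/procfs.c. It only illustrates the idea of factoring out a helper that takes the thread as an argument, so the wait path no longer goes through the entry point that requires a selected thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy thread id; 0 plays the role of null_ptid (illustration only).
using toy_ptid = int;
constexpr toy_ptid toy_null_ptid = 0;

toy_ptid toy_inferior_ptid = toy_null_ptid;  // models the selected thread
std::vector<toy_ptid> resumed_log;           // records what got resumed

// Factored-out helper: resumes the thread it is given and never reads
// the global selection.
void toy_proc_resume(toy_ptid thread) {
    resumed_log.push_back(thread);
}

// Public entry point: keeps the precondition, then delegates.
void toy_target_resume() {
    if (toy_inferior_ptid == toy_null_ptid)
        throw std::logic_error("inferior_ptid != null_ptid");
    toy_proc_resume(toy_inferior_ptid);
}

// Wait routine after the fix: no thread is selected on entry, so it
// calls the helper directly with the thread that reported the event.
void toy_wait_fixed(toy_ptid event_thread) {
    toy_inferior_ptid = toy_null_ptid;
    toy_proc_resume(event_thread);
}
```

With this structure, `toy_wait_fixed(7)` records thread 7 as resumed and leaves `toy_inferior_ptid` untouched, and the precondition in `toy_target_resume` stays in place for callers that do go through the public entry point.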
Comment 10 Sourceware Commits 2023-07-06 14:12:37 UTC
The gdb-13-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=dfe07f10de81a38f595022556873d16585ea2b7e

commit dfe07f10de81a38f595022556873d16585ea2b7e
Author: Pedro Alves <pedro@palves.net>
Date:   Thu Jul 6 15:05:11 2023 +0100

    Fix Solaris regression (PR tdep/30252)
    
Comment 11 Pedro Alves 2023-07-06 14:15:28 UTC
Thanks for the extra confirmation, Marcel; I would have forgotten about this patch without it.

I've now merged the patch to both master and the gdb 13 branch.  I am not sure whether there will ever be another GDB 13 release (most likely there won't), but if you need to pull the fix for a downstream release, it's there.