Bug 28512 - waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed.
Summary: waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: ...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-28 09:03 UTC by Tom de Vries
Modified: 2021-12-09 02:05 UTC
CC List: 3 users

See Also:
Host: x86_64-linux, aarch64-linux
Target:
Build:
Last reconfirmed:


Attachments
gdb.log (10.65 KB, text/x-log)
2021-10-28 12:35 UTC, Tom de Vries

Description Tom de Vries 2021-10-28 09:03:38 UTC
On openSUSE Leap 15.2 aarch64 I run into:
...
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 3: print seconds_left
detach
/home/tdevries/gdb/src/gdb/target/waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed.
...
Comment 1 Tom de Vries 2021-10-28 09:06:02 UTC
I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make target_waitstatus safe" ( https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).
Comment 2 Simon Marchi 2021-10-28 11:49:19 UTC
(In reply to Tom de Vries from comment #1)
> I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make
> target_waitstatus safe" (
> https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).

In fact I think it is caused by it, in the sense that this patch just adds some assertions to make sure we access the active union field of target_waitstatus.  So that patch probably just exposes an existing bug, where the code uses .sig() although the union does not contain a signal number.
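To illustrate what "access the active union field" means here: a minimal C sketch, assuming a simplified tagged union. The names `struct waitstatus`, `enum waitkind` and `waitstatus_sig` are hypothetical stand-ins for GDB's C++ `target_waitstatus` and its `sig ()` accessor, not GDB's actual code.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for target_waitstatus: a kind tag
   plus a union whose active member depends on the kind.  */
enum waitkind
{
  WAITKIND_STOPPED,    /* Thread stopped; payload is a signal number.  */
  WAITKIND_SIGNALLED,  /* Process killed by a signal; payload is a signal.  */
  WAITKIND_EXITED,     /* Process exited; payload is an exit code.  */
  WAITKIND_FORKED      /* Thread forked; payload is the child's pid.  */
};

struct waitstatus
{
  enum waitkind kind;
  union
  {
    int sig;
    int exit_status;
    int child_pid;
  } value;
};

/* Checked accessor: only valid when the signal member is active.
   Calling it for any other kind is a caller bug, caught here instead
   of silently reading an unrelated union member.  */
static int
waitstatus_sig (const struct waitstatus *ws)
{
  assert (ws->kind == WAITKIND_STOPPED || ws->kind == WAITKIND_SIGNALLED);
  return ws->value.sig;
}
```

With such a checked accessor, calling `waitstatus_sig` on a status of another kind (for example a fork event) aborts at the assertion instead of returning whatever the union happens to hold, which is how the new assertions expose existing bugs.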
Comment 3 Tom de Vries 2021-10-28 12:25:00 UTC
(In reply to Simon Marchi from comment #2)
> (In reply to Tom de Vries from comment #1)
> > I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make
> > target_waitstatus safe" (
> > https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).
> 
> In fact I think it is caused by it, in the sense that this patch just adds
> some assertions to make sure we access the active union field of
> target_waitstatus.  So that patch probably just exposes an existing bug,
> where the code uses .sig() although the union does not contain a signal
> number.

Ah, right, I didn't realize this was already committed.  Anyway, a bug then.
Comment 4 Luis Machado 2021-10-28 12:28:56 UTC
It seems I'm out of luck. I don't run into any such failures with Ubuntu 20.04.

What are the versions of tools involved in your reproducer from openSUSE Leap 15.2?
Comment 5 Tom de Vries 2021-10-28 12:35:37 UTC
Created attachment 13738
gdb.log
Comment 6 Luis Machado 2021-10-28 12:40:19 UTC
Thanks.

I'll just post the backtrace bit...

FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: detach (GDB internal error)
Resyncing due to internal error.
0x4c82bb gdb_internal_backtrace_1
	/home/tdevries/gdb/src/gdb/bt-utils.c:121
0x4c82bb _Z22gdb_internal_backtracev
	/home/tdevries/gdb/src/gdb/bt-utils.c:164
0x7dd563 internal_vproblem
	/home/tdevries/gdb/src/gdb/utils.c:393
0x7dd713 _Z15internal_verrorPKciS0_St9__va_list
	/home/tdevries/gdb/src/gdb/utils.c:470
0x92ceeb _Z14internal_errorPKciS0_z
	/home/tdevries/gdb/src/gdbsupport/errors.cc:55
0x63b8d3 _ZNK17target_waitstatus3sigEv
	/home/tdevries/gdb/src/gdb/target/waitstatus.h:299
0x63b8d3 get_detach_signal
	/home/tdevries/gdb/src/gdb/linux-nat.c:1271
0x63f8f7 detach_one_lwp
	/home/tdevries/gdb/src/gdb/linux-nat.c:1341
0x63fb5f detach_callback
	/home/tdevries/gdb/src/gdb/linux-nat.c:1406
0x63fe07 _ZNK3gdb13function_viewIFiP8lwp_infoEEclES2_
	/home/tdevries/gdb/src/gdb/../gdbsupport/function-view.h:247
0x63fe07 _Z17iterate_over_lwps6ptid_tN3gdb13function_viewIFiP8lwp_infoEEE
	/home/tdevries/gdb/src/gdb/linux-nat.c:937
0x63ff47 _ZN16linux_nat_target6detachEP8inferiori
	/home/tdevries/gdb/src/gdb/linux-nat.c:1431
0x7787c7 _Z13target_detachP8inferiori
	/home/tdevries/gdb/src/gdb/target.c:2569
0x603caf _Z14detach_commandPKci
	/home/tdevries/gdb/src/gdb/infcmd.c:2702
0x4f6a83 _Z8cmd_funcP16cmd_list_elementPKci
	/home/tdevries/gdb/src/gdb/cli/cli-decode.c:2459
0x785bf7 _Z15execute_commandPKci
	/home/tdevries/gdb/src/gdb/top.c:670
0x5b3677 _Z15command_handlerPKc
	/home/tdevries/gdb/src/gdb/event-top.c:597
0x5b39db _Z20command_line_handlerOSt10unique_ptrIcN3gdb13xfree_deleterIcEEE
	/home/tdevries/gdb/src/gdb/event-top.c:782
0x5b40ef gdb_rl_callback_handler
	/home/tdevries/gdb/src/gdb/event-top.c:229
0x8595d3 rl_callback_read_char
	/home/tdevries/gdb/src/readline/readline/callback.c:281

Might be a generic GDB bug or bad signal/ptrace interaction, given I see no immediately obvious interaction with aarch64-specific code.
Comment 7 Tom de Vries 2021-10-28 12:49:43 UTC
(In reply to Luis Machado from comment #6)
> Might be a generic GDB bug or bad signal/ptrace interaction, given I see no
> immediately obvious interaction with aarch64-specific code.

Ack, reproduced on x86_64, using cpulimit -c 1.
Comment 8 Simon Marchi 2021-10-28 14:31:51 UTC
Ah, thanks for the backtrace (and thanks to those who added that automatic backtracing feature!).  I recognize it: I also hit it while working on a work-in-progress patch.  I made the obvious fix of adding `&& tp->pending_waitstatus ().kind () == TARGET_WAITKIND_STOPPED` to the condition, before accessing `->sig ()`:

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index cada889c5348..dead4309704e 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1267,7 +1267,8 @@ get_detach_signal (struct lwp_info *lp)
 
       if (target_is_non_stop_p () && !tp->executing ())
        {
-         if (tp->has_pending_waitstatus ())
+         if (tp->has_pending_waitstatus ()
+             && tp->pending_waitstatus ().kind () == TARGET_WAITKIND_STOPPED)
            signo = tp->pending_waitstatus ().sig ();
          else
            signo = tp->stop_signal ();


I was wondering if it should check for TARGET_WAITKIND_SIGNALLED as well, but I don't think so.  If the target reported TARGET_WAITKIND_SIGNALLED for a process, it means the process no longer exists (it's as if it had reported TARGET_WAITKIND_EXITED).  It's not possible to detach a thread that no longer exists.
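A minimal C sketch of the decision this fix makes, under simplifying assumptions: the types, `detach_signal` and `SIGNAL_0` are hypothetical stand-ins for `get_detach_signal`, `thread_info`, `target_waitstatus` and `GDB_SIGNAL_0`, and the enclosing `target_is_non_stop_p () && !tp->executing ()` condition is left out. It follows the behavior described in the final commit (Comment 10), which uses GDB_SIGNAL_0 for a pending event that is not TARGET_WAITKIND_STOPPED; the diff above instead falls back to `tp->stop_signal ()` in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the state get_detach_signal uses.  */
enum waitkind { WAITKIND_STOPPED, WAITKIND_SIGNALLED, WAITKIND_FORKED };

struct waitstatus
{
  enum waitkind kind;
  int sig;   /* Meaningful only for STOPPED and SIGNALLED.  */
};

struct thread
{
  bool has_pending_waitstatus;
  struct waitstatus pending;
  int stop_signal;   /* Signal of the last reported stop.  */
};

#define SIGNAL_0 0   /* Stand-in for GDB_SIGNAL_0: "no signal".  */

/* Return the signal to pass to thread T when detaching it.  */
static int
detach_signal (const struct thread *t)
{
  if (t->has_pending_waitstatus)
    {
      /* Only a STOPPED event carries a signal to deliver.  SIGNALLED is
         not checked: the process would already be gone.  Any other
         pending event (e.g. a fork) did not stop the thread with a
         signal, so pass none.  */
      if (t->pending.kind == WAITKIND_STOPPED)
        return t->pending.sig;
      return SIGNAL_0;
    }
  return t->stop_signal;
}
```

The key point is that the kind is checked before the signal field is read, so a pending fork event never reaches the signal accessor.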
Comment 9 Simon Marchi 2021-10-28 16:30:58 UTC
I'll cherry-pick my patch that fixes this and send it on its own.
Comment 10 Sourceware Commits 2021-12-09 02:03:19 UTC
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78

commit df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78
Author: Simon Marchi <simon.marchi@efficios.com>
Date:   Wed Dec 1 09:40:03 2021 -0500

    gdb, gdbserver: detach fork child when detaching from fork parent
    
    While working with pending fork events, I wondered what would happen if
    the user detached an inferior while a thread of that inferior had a
    pending fork event.  What happens with the fork child, which is
    ptrace-attached by the GDB process (or by GDBserver), but not known to
    the core?  Sure enough, neither the core of GDB nor the target detaches the
    child process, so GDB (or GDBserver) just stays ptrace-attached to the
    process.  The result is that the fork child process is stuck, while you
    would expect it to be detached and run.
    
    Make GDBserver detach fork children it knows about.  That is done in
    the generic handle_detach function.  Since a process_info already exists
    for the child, we can simply call detach_inferior on it.
    
    GDB-side, make the linux-nat and remote targets detach fork children
    known because of pending fork events.  These pending fork events can be
    stored in:
    
     - thread_info::pending_waitstatus, if the core has consumed the event
       but then saved it for later (for example, because it got the event
       while stopping all threads, to present an all-stop stop on top of a
       non-stop target)
     - thread_info::pending_follow: if we ran to a "catch fork" and we
       detach at that moment
    
    Additionally, pending fork events can be in target-specific fields:
    
     - For linux-nat, they can be in lwp_info::status and
       lwp_info::waitstatus.
     - For the remote target, they could be stored as pending stop replies,
       saved in `remote_state::notif_state::pending_event`, if not
       acknowledged yet, or in `remote_state::stop_reply_queue`, if
       acknowledged.  I followed the model of remove_new_fork_children for
       this: call remote_notif_get_pending_events to process /
       acknowledge any unacknowledged notification, then look through
       stop_reply_queue.
    
    Update the gdb.threads/pending-fork-event.exp test (and rename it to
    gdb.threads/pending-fork-event-detach.exp) to try to detach the process
    while it is stopped with a pending fork event.  In order to verify that
    the fork child process is correctly detached and resumes execution
    outside of GDB's control, make that process create a file in the test
    output directory, and make the test wait $timeout seconds for that file
    to appear (it happens instantly if everything goes well).
    
    This test catches a bug in linux-nat.c, also reported as PR 28512
    ("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig()
    const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind ==
    TARGET_WAITKIND_SIGNALLED' failed.").  When detaching a thread with a
    pending event, get_detach_signal unconditionally fetches the signal
    stored in the waitstatus (`tp->pending_waitstatus ().sig ()`).  However,
    that is only valid if the pending event is of type
    TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (it
    would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean
    the thread does not exist anymore, so we wouldn't be detaching it).  Add
    a condition in get_detach_signal to access the signal number only if the
    wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0
    instead (since the thread was not stopped with a signal to begin with).
    
    Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to
    verify that we consider events in pending stop replies in the remote
    target.  This test has many threads constantly forking, and we detach
    from the program while the program is executing.  That gives us some
    chance that we detach while a fork stop reply is stored in the remote
    target.  To verify that we correctly detach all fork children, we ask
    the parent to exit by sending it a SIGUSR1 signal and have it write a
    file to the filesystem before exiting.  Because the parent's main thread
    joins the forking threads, and the forking threads wait for their fork
    children to exit, if some fork child is not detached by GDB, the parent
    will not write the file, and the test will time out.  If I remove the
    new remote_detach_pid calls in remote.c, the test fails eventually if I
    run it in a loop.
    
    There is a known limitation: we don't remove breakpoints from the
    children before detaching them.  So the children could hit a trap
    instruction after being detached and crash.  I know this is wrong, and
    it should be fixed, but I would like to handle that later.  The current
    patch doesn't fix everything, but it's a step in the right direction.
    
    Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
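A minimal C sketch of the GDB-side idea in the commit above: before detaching the parent, scan its threads for pending fork events and collect the child pids so they can be detached as well. It models only the two generic locations the commit lists (thread_info::pending_waitstatus and thread_info::pending_follow) and leaves out the target-specific ones (lwp_info fields for linux-nat, pending stop replies for the remote target). All types and `collect_pending_fork_children` are hypothetical, not GDB's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of where a pending fork event may be
   stashed for one thread of the inferior being detached.  */
enum waitkind { WAITKIND_NONE, WAITKIND_STOPPED, WAITKIND_FORKED };

struct waitstatus
{
  enum waitkind kind;
  int child_pid;   /* Valid only for WAITKIND_FORKED.  */
};

struct thread
{
  struct waitstatus pending_waitstatus;  /* Event saved by the core.  */
  struct waitstatus pending_follow;      /* E.g. stopped at "catch fork".  */
};

static bool
is_fork_event (const struct waitstatus *ws)
{
  return ws->kind == WAITKIND_FORKED;
}

/* Store in OUT (at most MAX entries) the pids of fork children known
   only through pending fork events of THREADS.  Returns the count.  */
static size_t
collect_pending_fork_children (const struct thread *threads, size_t n,
                               int *out, size_t max)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      const struct waitstatus *sources[] = {
        &threads[i].pending_waitstatus, &threads[i].pending_follow
      };
      for (size_t j = 0;
           j < sizeof sources / sizeof sources[0] && count < max; j++)
        if (is_fork_event (sources[j]))
          out[count++] = sources[j]->child_pid;
    }
  return count;
}
```

The caller would then detach each collected pid (on Linux, with a PTRACE_DETACH request) in addition to the parent. Like the commit, this sketch does not remove breakpoints from the children first.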
Comment 11 Simon Marchi 2021-12-09 02:04:50 UTC
Should be fixed by the patch mentioned above.
Comment 12 Simon Marchi 2021-12-09 02:05:08 UTC
Marking fixed for real.