On openSUSE Leap 15.2 aarch64 I run into:

...
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 3: print seconds_left
detach
/home/tdevries/gdb/src/gdb/target/waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed.
...
I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make target_waitstatus safe" ( https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).
(In reply to Tom de Vries from comment #1)
> I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make
> target_waitstatus safe" (
> https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).

In fact I think it is caused by it, in the sense that this patch just adds some assertions to make sure we access the active union field of target_waitstatus. So that patch probably just exposes an existing bug, where the code uses .sig() although the union does not contain a signal number.
(In reply to Simon Marchi from comment #2)
> (In reply to Tom de Vries from comment #1)
> > I wonder whether this is fixed by "[PATCH 3/3] gdb, gdbserver: make
> > target_waitstatus safe" (
> > https://sourceware.org/pipermail/gdb-patches/2021-October/182502.html ).
>
> In fact I think it is caused by it, in the sense that this patch just adds
> some assertions to make sure we access the active union field of
> target_waitstatus. So that patch probably just exposes an existing bug,
> where the code uses .sig() although the union does not contain a signal
> number.

Ah, right, I didn't realize this was already committed. Anyway, a bug then.
It seems I'm out of luck. I don't run into any such failures with Ubuntu 20.04. What are the versions of tools involved in your reproducer from openSUSE Leap 15.2?
Created attachment 13738 [details] gdb.log
Thanks. I'll just post the backtrace bit...

FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: detach (GDB internal error)
Resyncing due to internal error.

0x4c82bb gdb_internal_backtrace_1
	/home/tdevries/gdb/src/gdb/bt-utils.c:121
0x4c82bb _Z22gdb_internal_backtracev
	/home/tdevries/gdb/src/gdb/bt-utils.c:164
0x7dd563 internal_vproblem
	/home/tdevries/gdb/src/gdb/utils.c:393
0x7dd713 _Z15internal_verrorPKciS0_St9__va_list
	/home/tdevries/gdb/src/gdb/utils.c:470
0x92ceeb _Z14internal_errorPKciS0_z
	/home/tdevries/gdb/src/gdbsupport/errors.cc:55
0x63b8d3 _ZNK17target_waitstatus3sigEv
	/home/tdevries/gdb/src/gdb/target/waitstatus.h:299
0x63b8d3 get_detach_signal
	/home/tdevries/gdb/src/gdb/linux-nat.c:1271
0x63f8f7 detach_one_lwp
	/home/tdevries/gdb/src/gdb/linux-nat.c:1341
0x63fb5f detach_callback
	/home/tdevries/gdb/src/gdb/linux-nat.c:1406
0x63fe07 _ZNK3gdb13function_viewIFiP8lwp_infoEEclES2_
	/home/tdevries/gdb/src/gdb/../gdbsupport/function-view.h:247
0x63fe07 _Z17iterate_over_lwps6ptid_tN3gdb13function_viewIFiP8lwp_infoEEE
	/home/tdevries/gdb/src/gdb/linux-nat.c:937
0x63ff47 _ZN16linux_nat_target6detachEP8inferiori
	/home/tdevries/gdb/src/gdb/linux-nat.c:1431
0x7787c7 _Z13target_detachP8inferiori
	/home/tdevries/gdb/src/gdb/target.c:2569
0x603caf _Z14detach_commandPKci
	/home/tdevries/gdb/src/gdb/infcmd.c:2702
0x4f6a83 _Z8cmd_funcP16cmd_list_elementPKci
	/home/tdevries/gdb/src/gdb/cli/cli-decode.c:2459
0x785bf7 _Z15execute_commandPKci
	/home/tdevries/gdb/src/gdb/top.c:670
0x5b3677 _Z15command_handlerPKc
	/home/tdevries/gdb/src/gdb/event-top.c:597
0x5b39db _Z20command_line_handlerOSt10unique_ptrIcN3gdb13xfree_deleterIcEEE
	/home/tdevries/gdb/src/gdb/event-top.c:782
0x5b40ef gdb_rl_callback_handler
	/home/tdevries/gdb/src/gdb/event-top.c:229
0x8595d3 rl_callback_read_char
	/home/tdevries/gdb/src/readline/readline/callback.c:281

Might be a generic GDB bug or bad signal/ptrace interaction, given I see no immediately obvious interaction with aarch64-specific code.
(In reply to Luis Machado from comment #6)
> Might be a generic GDB bug or bad signal/ptrace interaction, given I see no
> immediately obvious interaction with aarch64-specific code.

Ack, reproduced on x86_64, using cpulimit -c 1.
Ah, thanks for the backtrace (and thanks to those who added the automatic backtracing feature!). I recognize it; I also hit it while working on a work-in-progress patch. I made the obvious fix of adding `&& tp->pending_waitstatus ().kind () == TARGET_WAITKIND_STOPPED` to the condition, before accessing `->sig ()`:

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index cada889c5348..dead4309704e 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1267,7 +1267,8 @@ get_detach_signal (struct lwp_info *lp)
   if (target_is_non_stop_p () && !tp->executing ())
     {
-      if (tp->has_pending_waitstatus ())
+      if (tp->has_pending_waitstatus ()
+	  && tp->pending_waitstatus ().kind () == TARGET_WAITKIND_STOPPED)
 	signo = tp->pending_waitstatus ().sig ();
       else
 	signo = tp->stop_signal ();

I was wondering if it should check for TARGET_WAITKIND_SIGNALLED as well, but I don't think so. If the target reported TARGET_WAITKIND_SIGNALLED for a process, it means it no longer exists (it's as if it had reported TARGET_WAITKIND_EXITED). It's not possible to detach a thread that no longer exists.
I'll cherry-pick my patch that fixes this and send it on its own.
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78

commit df5ad102009c41ab4dfadbb8cfb8c8b2a02a4f78
Author: Simon Marchi <simon.marchi@efficios.com>
Date:   Wed Dec 1 09:40:03 2021 -0500

gdb, gdbserver: detach fork child when detaching from fork parent

While working with pending fork events, I wondered what would happen if the user detached an inferior while a thread of that inferior had a pending fork event. What happens with the fork child, which is ptrace-attached by the GDB process (or by GDBserver), but not known to the core? Sure enough, neither the core of GDB nor the target detach the child process, so GDB (or GDBserver) just stays ptrace-attached to the process. The result is that the fork child process is stuck, while you would expect it to be detached and run.

Make GDBserver detach fork children it knows about. That is done in the generic handle_detach function. Since a process_info already exists for the child, we can simply call detach_inferior on it.

GDB-side, make the linux-nat and remote targets detach fork children known because of pending fork events. These pending fork events can be stored in:

- thread_info::pending_waitstatus, if the core has consumed the event but then saved it for later (for example, because it got the event while stopping all threads, to present an all-stop stop on top of a non-stop target)
- thread_info::pending_follow: if we ran to a "catch fork" and we detach at that moment

Additionally, pending fork events can be in target-specific fields:

- For linux-nat, they can be in lwp_info::status and lwp_info::waitstatus.
- For the remote target, they could be stored as pending stop replies, saved in `remote_state::notif_state::pending_event`, if not acknowledged yet, or in `remote_state::stop_reply_queue`, if acknowledged.
I followed the model of remove_new_fork_children for this: call remote_notif_get_pending_events to process / acknowledge any unacknowledged notification, then look through stop_reply_queue.

Update the gdb.threads/pending-fork-event.exp test (and rename it to gdb.threads/pending-fork-event-detach.exp) to try to detach the process while it is stopped with a pending fork event. In order to verify that the fork child process is correctly detached and resumes execution outside of GDB's control, make that process create a file in the test output directory, and make the test wait $timeout seconds for that file to appear (it happens instantly if everything goes well).

This test catches a bug in linux-nat.c, also reported as PR 28512 ("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed."). When detaching a thread with a pending event, get_detach_signal unconditionally fetches the signal stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However, that is only valid if the pending event is of type TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (it would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean the thread does not exist anymore, so we wouldn't be detaching it). Add a condition in get_detach_signal to access the signal number only if the wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0 instead (since the thread was not stopped with a signal to begin with).

Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to verify that we consider events in pending stop replies in the remote target. This test has many threads constantly forking, and we detach from the program while the program is executing. That gives us some chance that we detach while a fork stop reply is stored in the remote target.
To verify that we correctly detach all fork children, we ask the parent to exit by sending it a SIGUSR1 signal and have it write a file to the filesystem before exiting. Because the parent's main thread joins the forking threads, and the forking threads wait for their fork children to exit, if some fork child is not detached by GDB, the parent will not write the file, and the test will time out. If I remove the new remote_detach_pid calls in remote.c, the test fails eventually if I run it in a loop.

There is a known limitation: we don't remove breakpoints from the children before detaching them. So the children could hit a trap instruction after being detached and crash. I know this is wrong, and it should be fixed, but I would like to handle that later. The current patch doesn't fix everything, but it's a step in the right direction.

Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
Should be fixed by the patch mentioned above.
Marking fixed for real.