On an aarch64-linux with gdb 13.2.1, I run into:
...
(gdb) PASS: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: Create worker function: input 9: end
python kill_and_detach()^M
Traceback (most recent call last):^M
  File "<string>", line 1, in <module>^M
  File "<string>", line 7, in kill_and_detach^M
gdb.error: Selected thread is running.^M
Error while executing Python code.^M
(gdb) FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: python kill_and_detach()
...
Created attachment 15276: gdb.log
CC-ing recent test-case author.
(In reply to Tom de Vries from comment #0)
> On an aarch64-linux with gdb 13.2.1, I run into:

More specifically, Fedora Asahi Remix on an M1.
Created attachment 15279: gdb.log (with internal-error instead of error)

Backtrace at error:
...
0x4f53fb gdb_internal_backtrace_1 /home/vries/gdb/src/gdb/bt-utils.c:122
0x4f53fb _Z22gdb_internal_backtracev /home/vries/gdb/src/gdb/bt-utils.c:168
0x89d897 internal_vproblem /home/vries/gdb/src/gdb/utils.c:396
0x89db67 _Z15internal_verrorPKciS0_St9__va_list /home/vries/gdb/src/gdb/utils.c:476
0xa001e3 _Z18internal_error_locPKciS0_z /home/vries/gdb/src/gdbsupport/errors.cc:58
0x83773f _Z25validate_registers_accessv /home/vries/gdb/src/gdb/thread.c:1002
0x61b50f _Z17get_current_framev /home/vries/gdb/src/gdb/frame.c:1681
0x65a7a3 _Z27call_function_by_hand_dummyP5valueP4typeN3gdb10array_viewIS0_EEPFvPviES6_ /home/vries/gdb/src/gdb/infcall.c:893
0x5f2afb _ZN4expr9operation16evaluate_funcallEP4typeP10expression6nosidePKcRKSt6vectorISt10unique_ptrIS0_St14default_deleteIS0_EESaISC_EE /home/vries/gdb/src/gdb/eval.c:678
0x5eeeaf _ZN4expr9operation17evaluate_for_castEP4typeP10expression6noside /home/vries/gdb/src/gdb/eval.c:2576
0x5effab _ZN10expression8evaluateEP4type6noside /home/vries/gdb/src/gdb/eval.c:111
0x5f019b _Z19parse_and_eval_longPKc /home/vries/gdb/src/gdb/eval.c:66
0x69a237 call_lseek /home/vries/gdb/src/gdb/linux-fork.c:210
0x69a237 fork_load_infrun_state /home/vries/gdb/src/gdb/linux-fork.c:237
0x69ad13 _Z17linux_fork_detachiP8lwp_info /home/vries/gdb/src/gdb/linux-fork.c:393
0x6a1533 _ZN16linux_nat_target6detachEP8inferiori /home/vries/gdb/src/gdb/linux-nat.c:1506
0x6afd07 _ZN16thread_db_target6detachEP8inferiori /home/vries/gdb/src/gdb/linux-thread-db.c:1385
0x8324e3 _Z13target_detachP8inferiori /home/vries/gdb/src/gdb/target.c:2527
0x65d4cf _Z14detach_commandPKci /home/vries/gdb/src/gdb/infcmd.c:2837
0x524633 _Z8cmd_funcP16cmd_list_elementPKci /home/vries/gdb/src/gdb/cli/cli-decode.c:2735
0x841a07 _Z15execute_commandPKci /home/vries/gdb/src/gdb/top.c:575
0x52dfe3 execute_control_command_1 /home/vries/gdb/src/gdb/cli/cli-script.c:529
0x52e497 _Z24execute_control_commandsP12command_linei /home/vries/gdb/src/gdb/cli/cli-script.c:411
0x75e6d3 execute_gdb_command /home/vries/gdb/src/gdb/python/python.c:676
0xffff266ecedb ???
0xffff266bc447 ???
0xffff266c8b63 ???
0xffff2677fac7 ???
0xffff267b0cdf ???
0xffff267a9bd7 ???
0xffff267971a7 ???
0xffff267970c3 ???
0x75fba7 python_command /home/vries/gdb/src/gdb/python/python.c:436
0x524633 _Z8cmd_funcP16cmd_list_elementPKci /home/vries/gdb/src/gdb/cli/cli-decode.c:2735
0x841a07 _Z15execute_commandPKci /home/vries/gdb/src/gdb/top.c:575
0x5f6d73 _Z15command_handlerPKc /home/vries/gdb/src/gdb/event-top.c:566
0x5f80eb _Z20command_line_handlerOSt10unique_ptrIcN3gdb13xfree_deleterIcEEE /home/vries/gdb/src/gdb/event-top.c:802
0x5f76b3 gdb_rl_callback_handler /home/vries/gdb/src/gdb/event-top.c:259
0x8eb9fb rl_callback_read_char /home/vries/gdb/src/readline/readline/callback.c:290
0x5f77eb gdb_rl_callback_read_char_wrapper_noexcept /home/vries/gdb/src/gdb/event-top.c:195
0x5f7963 gdb_rl_callback_read_char_wrapper /home/vries/gdb/src/gdb/event-top.c:234
0x8797cf stdin_event_handler /home/vries/gdb/src/gdb/ui.c:155
0xa008e7 gdb_wait_for_event /home/vries/gdb/src/gdbsupport/event-loop.cc:716
0xa0135b _Z16gdb_do_one_eventi /home/vries/gdb/src/gdbsupport/event-loop.cc:264
0x6bfccf start_event_loop /home/vries/gdb/src/gdb/main.c:408
0x6bfccf captured_command_loop /home/vries/gdb/src/gdb/main.c:472
0x6c2403 captured_main /home/vries/gdb/src/gdb/main.c:1343
0x6c2403 _Z8gdb_mainP18captured_main_args /home/vries/gdb/src/gdb/main.c:1362
0x4244e3 main /home/vries/gdb/src/gdb/gdb.c:39
...
Minimal version:
...
$ cat gdb.in
file /home/vries/gdb/build/gdb/testsuite/outputs/gdb.base/kill-during-detach/kill-during-detach
start
checkpoint
continue &
detach
$ gdb -q -batch -ex "set trace-commands on" -x gdb.in
+file /home/vries/gdb/build/gdb/testsuite/outputs/gdb.base/kill-during-detach/kill-during-detach
+start
Temporary breakpoint 1 at 0x4101f0: file /home/vries/gdb/src/gdb/testsuite/gdb.base/kill-during-detach.c, line 25.
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Temporary breakpoint 1, main () at /home/vries/gdb/src/gdb/testsuite/gdb.base/kill-during-detach.c:25
25        alarm (300);
+checkpoint
+continue &
+detach
gdb.in:5: Error in sourced command file:
Selected thread is running.
...
I've investigated why this doesn't trigger on x86_64.

I found that in the x86_64 case, we've only got fds 0, 1 and 2, and none of them get their file pointer restored, so inferior lseek is not called during linux_fork_detach.

In contrast, in the M1 case, we've got fds 42 and 52, whose file pointers are to be restored to 0, so inferior lseek is called during linux_fork_detach.

File descriptors in more detail:
...
$ ls -la /proc/3487043/fd
total 0
dr-x------. 2 vries vries  5 3 jan 11:41 .
dr-xr-xr-x. 9 vries vries  0 3 jan 11:41 ..
lrwx------. 1 vries vries 64 3 jan 11:41 0 -> /dev/pts/1
lrwx------. 1 vries vries 64 3 jan 11:41 1 -> /dev/pts/1
lrwx------. 1 vries vries 64 3 jan 11:41 2 -> /dev/pts/1
lr-x------. 1 vries vries 64 3 jan 11:41 42 -> anon_inode:inotify
lrwx------. 1 vries vries 64 3 jan 11:41 52 -> '/memfd:pulseaudio (deleted)'
...

So, my guess is that opening a file before the checkpoint and reading something from the file after the checkpoint will also trigger this on x86_64.
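The x86_64 versus M1 difference can be illustrated outside gdb. The following is a hypothetical Python sketch, not gdb's code: `save_positions` and `restore_positions` are made-up names that model the behaviour described in this thread, namely that a file position is recorded at checkpoint time only when lseek succeeds, and that only descriptors with a recorded position are seeked again on restore. A pipe stands in for the terminal on fds 0, 1 and 2 (neither supports seeking), and a temporary file stands in for a seekable descriptor such as the memfd on fd 52.

```python
import errno
import os
import tempfile


def save_positions(fds):
    """Record lseek (fd, 0, SEEK_CUR) per fd, using -1 for fds that cannot seek."""
    positions = {}
    for fd in fds:
        try:
            positions[fd] = os.lseek(fd, 0, os.SEEK_CUR)
        except OSError as exc:
            if exc.errno != errno.ESPIPE:
                raise
            positions[fd] = -1  # e.g. a pipe or a terminal
    return positions


def restore_positions(positions):
    """Seek back only the fds that have a recorded position.

    Returns the number of lseek calls made; in gdb each of these would be
    an inferior function call.
    """
    calls = 0
    for fd, pos in positions.items():
        if pos != -1:
            os.lseek(fd, pos, os.SEEK_SET)
            calls += 1
    return calls


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()

        # x86_64 situation: only unseekable fds, so nothing is restored.
        unseekable = save_positions([read_end])
        print(unseekable[read_end], restore_positions(unseekable))  # -1 0

        # M1 situation: a seekable fd is open, so restoring needs lseek.
        mixed = save_positions([read_end, f.fileno()])
        print(mixed[f.fileno()], restore_positions(mixed))  # 5 1
    os.close(read_end)
    os.close(write_end)
```

With only unseekable descriptors, restoring makes zero lseek calls, which is why the stale thread state never got a chance to cause an error on the x86_64 machine.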
I, too, have Fedora Asahi Remix running on a MacBook M1. I've attempted to reproduce this problem with Fedora 38 and then, after an update, with Fedora 39. Thus far, I haven't been able to reproduce it: when I run the test-case, it shows 54 expected passes.
(In reply to Kevin Buettner from comment #7)
> Thus far, I haven't been able to reproduce it.

I started from a clean OS install, built and tested gdb, and then grepped the test results for UNTESTED/UNSUPPORTED to find packages to add and configure settings to change, to make the build more complete. At some point in this process I started running into this failure. So I tried to revert the process, but didn't manage to: even a very minimal gdb still reproduced the failure 100% of the time.

I had also done some updates at some point, so I was in the 'need restart' state. After restarting, the failure disappeared. I investigated the file descriptor list in /proc/<pid>/fd, and found two sockets where I previously found the files mentioned in comment 6.

Anyway, I wrote a patch for the test-case that allows us to reproduce the failure reliably, also on x86_64-linux.
Created attachment 15281: Trigger patch
Not sure if this is the right place or way, but this fixes the FAIL:
...
diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c
index 1430ff89fa7..0d97ec1ba04 100644
--- a/gdb/linux-fork.c
+++ b/gdb/linux-fork.c
@@ -228,6 +228,8 @@ fork_load_infrun_state (struct fork_info *fp)
   inferior_thread ()->set_stop_pc
     (regcache_read_pc (get_thread_regcache (inferior_thread ())));
   nullify_last_target_wait_ptid ();
+  inferior_thread ()->set_executing (false);
+  inferior_thread ()->set_resumed (false);
 
   /* Now restore the file positions of open file descriptors. */
   if (fp->filepos)
...
(In reply to Tom de Vries from comment #10)
> Not sure if this is the right place or way, but this fixes the FAIL:
> ...

There are multiple problems with linux-fork.c that need to be addressed at some point, but your patch looks reasonable to me.
(In reply to Kevin Buettner from comment #11)
> There are multiple problems with linux-fork.c that need to be addressed
> at some point, but your patch looks reasonable to me.

Thanks for the pre-submission review, submitted (https://sourceware.org/pipermail/gdb-patches/2024-January/205754.html).
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=4ece39c56cfdd5647d4061f3c084b9de6f9e443c

commit 4ece39c56cfdd5647d4061f3c084b9de6f9e443c
Author: Tom de Vries <tdevries@suse.de>
Date:   Thu Jan 11 10:12:48 2024 +0100

    [gdb/testsuite] Extend gdb.base/kill-during-detach.exp

    I ran into the following FAIL:
    ...
    (gdb) python kill_and_detach()^M
    Traceback (most recent call last):^M
      File "<string>", line 1, in <module>^M
      File "<string>", line 7, in kill_and_detach^M
    gdb.error: Selected thread is running.^M
    Error while executing Python code.^M
    (gdb) FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: \
      python kill_and_detach()
    ...

    The FAIL happens as follows:
    - gdb is debugging a process A
    - a checkpoint is created, in other words, fork is called in the inferior,
      after which we have:
      - checkpoint 0 (the fork parent, process A), and
      - checkpoint 1 (the fork child, process B).
    - during checkpoint creation, lseek is called in the inferior (process A) for
      all file descriptors, and it returns != -1 for at least one file descriptor.
    - the process A continues in the background
    - gdb detaches from process A
    - gdb switches to process B, in other words, it restarts checkpoint 1
    - while restarting checkpoint 1, gdb tries to call lseek in the inferior
      (process B), but this fails because gdb incorrectly thinks that inferior B
      is running.

    This happens because linux_nat_switch_fork patches the pid of process B into
    the current inferior and current thread, which were originally representing
    process A.  So, because process A was running in the background, the
    thread_info fields executing and resumed are set accordingly, but they are
    not correct for process B.

    There's a line in fork_load_infrun_state that fixes up the thread_info field
    stop_pc, so fix this by adding similar fixups for the executing and resumed
    fields alongside.

    The FAIL did not always reproduce, so extend the test-case to reliably
    trigger this scenario.

    Tested on x86_64-linux.

    Approved-By: Kevin Buettner <kevinb@redhat.com>

    PR gdb/31203
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31203
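The mechanism described in the commit message can be reduced to a toy model. This is a hypothetical Python sketch, not gdb code: `ThreadInfo`, `restore_checkpoint`, `ThreadRunningError` and the pids are illustrative names and values. Only the `executing`, `resumed` and `stop_pc` fields, the "Selected thread is running." check, and the two reset lines come from this thread. Because the thread object that represented process A is reused for process B, A's running state carries over to B unless it is reset.

```python
from dataclasses import dataclass


@dataclass
class ThreadInfo:
    """Toy stand-in for gdb's thread_info, with only the fields discussed here."""
    pid: int
    executing: bool = False
    resumed: bool = False
    stop_pc: int = 0


class ThreadRunningError(Exception):
    pass


def validate_registers_access(thread):
    # Models the check behind "Selected thread is running."
    if thread.executing:
        raise ThreadRunningError("Selected thread is running.")


def restore_checkpoint(thread, child_pid, child_stop_pc, apply_fix):
    # The same thread object is reused: the child's pid is patched in ...
    thread.pid = child_pid
    # ... and stop_pc is fixed up, as fork_load_infrun_state already did.
    thread.stop_pc = child_stop_pc
    if apply_fix:
        # The fix: the checkpoint being restored is stopped, so clear the
        # flags left over from the parent that was running in the background.
        thread.executing = False
        thread.resumed = False
    # Restoring file positions calls lseek in the inferior, and an inferior
    # call needs register access for the selected thread.
    validate_registers_access(thread)


if __name__ == "__main__":
    for fix in (False, True):
        # Process A was continued in the background ("continue &").
        thread = ThreadInfo(pid=1000, executing=True, resumed=True)
        try:
            restore_checkpoint(thread, 1001, 0, apply_fix=fix)
            print(f"apply_fix={fix}: restored pid {thread.pid}")
        except ThreadRunningError as exc:
            print(f"apply_fix={fix}: {exc}")
```

Resetting the flags in the same place where stop_pc is already fixed up keeps all per-thread fixups for the switched-in fork together, which matches the rationale given in the commit message.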
Fixed.
(In reply to Tom de Vries from comment #0)
> On an aarch64-linux with gdb 13.2.1, I run into:

Hmm, that should have been gcc, not gdb.