Bug 31203 - [gdb] FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: python kill_and_detach()
Summary: [gdb] FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: ...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: 15.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-01 20:04 UTC by Tom de Vries
Modified: 2024-01-11 09:15 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
gdb.log (1.59 KB, text/x-log)
2024-01-01 20:04 UTC, Tom de Vries
gdb.log (with internal-error instead of error) (3.68 KB, text/x-log)
2024-01-03 08:47 UTC, Tom de Vries
Trigger patch (1.38 KB, patch)
2024-01-04 11:21 UTC, Tom de Vries

Description Tom de Vries 2024-01-01 20:04:08 UTC
On an aarch64-linux with gdb 13.2.1, I run into:
...
(gdb) PASS: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: Create worker function: input 9: end
python kill_and_detach()^M
Traceback (most recent call last):^M
  File "<string>", line 1, in <module>^M
  File "<string>", line 7, in kill_and_detach^M
gdb.error: Selected thread is running.^M
Error while executing Python code.^M
(gdb) FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: python kill_and_detach()
...
Comment 1 Tom de Vries 2024-01-01 20:04:33 UTC
Created attachment 15276 [details]
gdb.log
Comment 2 Tom de Vries 2024-01-02 08:01:00 UTC
CC-ing recent test-case author.
Comment 3 Tom de Vries 2024-01-02 08:36:42 UTC
(In reply to Tom de Vries from comment #0)
> On an aarch64-linux with gdb 13.2.1, I run into:

More specifically, Fedora Asahi Remix on an m1.
Comment 4 Tom de Vries 2024-01-03 08:47:57 UTC
Created attachment 15279 [details]
gdb.log (with internal-error instead of error)

Backtrace at error:
...
0x4f53fb gdb_internal_backtrace_1
        /home/vries/gdb/src/gdb/bt-utils.c:122
0x4f53fb _Z22gdb_internal_backtracev
        /home/vries/gdb/src/gdb/bt-utils.c:168
0x89d897 internal_vproblem
        /home/vries/gdb/src/gdb/utils.c:396
0x89db67 _Z15internal_verrorPKciS0_St9__va_list
        /home/vries/gdb/src/gdb/utils.c:476
0xa001e3 _Z18internal_error_locPKciS0_z
        /home/vries/gdb/src/gdbsupport/errors.cc:58
0x83773f _Z25validate_registers_accessv
        /home/vries/gdb/src/gdb/thread.c:1002
0x61b50f _Z17get_current_framev
        /home/vries/gdb/src/gdb/frame.c:1681
0x65a7a3 _Z27call_function_by_hand_dummyP5valueP4typeN3gdb10array_viewIS0_EEPFvPviES6_
        /home/vries/gdb/src/gdb/infcall.c:893
0x5f2afb _ZN4expr9operation16evaluate_funcallEP4typeP10expression6nosidePKcRKSt6vectorISt10unique_ptrIS0_St14default_deleteIS0_EESaISC_EE
        /home/vries/gdb/src/gdb/eval.c:678
0x5eeeaf _ZN4expr9operation17evaluate_for_castEP4typeP10expression6noside
        /home/vries/gdb/src/gdb/eval.c:2576
0x5effab _ZN10expression8evaluateEP4type6noside
        /home/vries/gdb/src/gdb/eval.c:111
0x5f019b _Z19parse_and_eval_longPKc
        /home/vries/gdb/src/gdb/eval.c:66
0x69a237 call_lseek
        /home/vries/gdb/src/gdb/linux-fork.c:210
0x69a237 fork_load_infrun_state
        /home/vries/gdb/src/gdb/linux-fork.c:237
0x69ad13 _Z17linux_fork_detachiP8lwp_info
        /home/vries/gdb/src/gdb/linux-fork.c:393
0x6a1533 _ZN16linux_nat_target6detachEP8inferiori
        /home/vries/gdb/src/gdb/linux-nat.c:1506
0x6afd07 _ZN16thread_db_target6detachEP8inferiori
        /home/vries/gdb/src/gdb/linux-thread-db.c:1385
0x8324e3 _Z13target_detachP8inferiori
        /home/vries/gdb/src/gdb/target.c:2527
0x65d4cf _Z14detach_commandPKci
        /home/vries/gdb/src/gdb/infcmd.c:2837
0x524633 _Z8cmd_funcP16cmd_list_elementPKci
        /home/vries/gdb/src/gdb/cli/cli-decode.c:2735
0x841a07 _Z15execute_commandPKci
        /home/vries/gdb/src/gdb/top.c:575
0x52dfe3 execute_control_command_1
        /home/vries/gdb/src/gdb/cli/cli-script.c:529
0x52e497 _Z24execute_control_commandsP12command_linei
        /home/vries/gdb/src/gdb/cli/cli-script.c:411
0x75e6d3 execute_gdb_command
        /home/vries/gdb/src/gdb/python/python.c:676
0xffff266ecedb ???
0xffff266bc447 ???
0xffff266c8b63 ???
0xffff2677fac7 ???
0xffff267b0cdf ???
0xffff267a9bd7 ???
0xffff267971a7 ???
0xffff267970c3 ???
0x75fba7 python_command
        /home/vries/gdb/src/gdb/python/python.c:436
0x524633 _Z8cmd_funcP16cmd_list_elementPKci
        /home/vries/gdb/src/gdb/cli/cli-decode.c:2735
0x841a07 _Z15execute_commandPKci
        /home/vries/gdb/src/gdb/top.c:575
0x5f6d73 _Z15command_handlerPKc
        /home/vries/gdb/src/gdb/event-top.c:566
0x5f80eb _Z20command_line_handlerOSt10unique_ptrIcN3gdb13xfree_deleterIcEEE
        /home/vries/gdb/src/gdb/event-top.c:802
0x5f76b3 gdb_rl_callback_handler
        /home/vries/gdb/src/gdb/event-top.c:259
0x8eb9fb rl_callback_read_char
        /home/vries/gdb/src/readline/readline/callback.c:290
0x5f77eb gdb_rl_callback_read_char_wrapper_noexcept
        /home/vries/gdb/src/gdb/event-top.c:195
0x5f7963 gdb_rl_callback_read_char_wrapper
        /home/vries/gdb/src/gdb/event-top.c:234
0x8797cf stdin_event_handler
        /home/vries/gdb/src/gdb/ui.c:155
0xa008e7 gdb_wait_for_event
        /home/vries/gdb/src/gdbsupport/event-loop.cc:716
0xa0135b _Z16gdb_do_one_eventi
        /home/vries/gdb/src/gdbsupport/event-loop.cc:264
0x6bfccf start_event_loop
        /home/vries/gdb/src/gdb/main.c:408
0x6bfccf captured_command_loop
        /home/vries/gdb/src/gdb/main.c:472
0x6c2403 captured_main
        /home/vries/gdb/src/gdb/main.c:1343
0x6c2403 _Z8gdb_mainP18captured_main_args
        /home/vries/gdb/src/gdb/main.c:1362
0x4244e3 main
        /home/vries/gdb/src/gdb/gdb.c:39
...
Comment 5 Tom de Vries 2024-01-03 10:31:29 UTC
Minimal version:
...
$ cat gdb.in
file /home/vries/gdb/build/gdb/testsuite/outputs/gdb.base/kill-during-detach/kill-during-detach
start
checkpoint
continue &
detach
$ gdb -q -batch -ex "set trace-commands on" -x gdb.in
+file /home/vries/gdb/build/gdb/testsuite/outputs/gdb.base/kill-during-detach/kill-during-detach
+start
Temporary breakpoint 1 at 0x4101f0: file /home/vries/gdb/src/gdb/testsuite/gdb.base/kill-during-detach.c, line 25.

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 1, main () at /home/vries/gdb/src/gdb/testsuite/gdb.base/kill-during-detach.c:25
25        alarm (300);
+checkpoint
+continue &
+detach
gdb.in:5: Error in sourced command file:
Selected thread is running.
...
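
The same reproducer can be generated from a script. A minimal sketch in Python, assuming only the commands and gdb options shown above; the helper name build_repro and its parameters are hypothetical, and it only builds the command file text and the argument list without starting gdb:

```python
def build_repro(binary, script_path="gdb.in"):
    """Return (command file text, gdb argv) for the minimal reproducer."""
    commands = [
        f"file {binary}",
        "start",
        "checkpoint",
        "continue &",  # process A keeps running in the background
        "detach",      # gdb then switches to the checkpoint's process
    ]
    script = "\n".join(commands) + "\n"
    argv = [
        "gdb", "-q", "-batch",
        "-ex", "set trace-commands on",
        "-x", script_path,
    ]
    return script, argv
```

Writing the returned text to gdb.in and passing the argument list to subprocess.run should replay the session above, assuming gdb is on PATH and the test binary has been built.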
Comment 6 Tom de Vries 2024-01-03 10:58:58 UTC
I've investigated why this doesn't trigger on x86_64.

I found that in the x86_64 case, we've only got fds 0, 1 and 2, and none of them get their file pointer restored, so inferior lseek is not called during linux_fork_detach.

In contrast, in the m1 case, we've got fds 42 and 52, whose file pointers are to be restored to 0, so inferior lseek is called during linux_fork_detach.

File descriptors in more detail:
...
$ ls -la /proc/3487043/fd
total 0
dr-x------. 2 vries vries  5  3 jan 11:41 .
dr-xr-xr-x. 9 vries vries  0  3 jan 11:41 ..
lrwx------. 1 vries vries 64  3 jan 11:41 0 -> /dev/pts/1
lrwx------. 1 vries vries 64  3 jan 11:41 1 -> /dev/pts/1
lrwx------. 1 vries vries 64  3 jan 11:41 2 -> /dev/pts/1
lr-x------. 1 vries vries 64  3 jan 11:41 42 -> anon_inode:inotify
lrwx------. 1 vries vries 64  3 jan 11:41 52 -> '/memfd:pulseaudio (deleted)'
...

So, my guess is that opening a file before the checkpoint and reading something from the file after the checkpoint will also trigger this on x86_64.
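
To illustrate the save/restore pattern behind this: a minimal Python sketch, run on the current process rather than on an inferior, assuming a file position is only recorded for fds on which lseek succeeds. The names save_offsets, restore_offsets and demo are made up for illustration, and this is not gdb's implementation:

```python
import os
import tempfile


def save_offsets(fds):
    """Record the current offset of each fd on which lseek succeeds."""
    saved = {}
    for fd in fds:
        try:
            saved[fd] = os.lseek(fd, 0, os.SEEK_CUR)
        except OSError:
            # Not seekable (for example a pipe): nothing to restore later.
            pass
    return saved


def restore_offsets(saved):
    """Seek each recorded fd back to the offset it had when saved."""
    for fd, offset in saved.items():
        os.lseek(fd, offset, os.SEEK_SET)


def demo():
    """Return (only the file was recorded, offset after read, offset after restore)."""
    pipe_r, pipe_w = os.pipe()
    file_fd, path = tempfile.mkstemp()
    try:
        os.write(file_fd, b"0123456789")
        os.lseek(file_fd, 0, os.SEEK_SET)
        saved = save_offsets([pipe_r, pipe_w, file_fd])  # at the "checkpoint"
        os.read(file_fd, 4)                              # file is read afterwards
        moved = os.lseek(file_fd, 0, os.SEEK_CUR)
        restore_offsets(saved)                           # "restart" the checkpoint
        restored = os.lseek(file_fd, 0, os.SEEK_CUR)
        return list(saved) == [file_fd], moved, restored
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
        os.close(file_fd)
        os.unlink(path)
```

In this sketch the pipe ends are skipped and only the regular file gets a restore call, which is the kind of fd that would make the inferior lseek call happen.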
Comment 7 Kevin Buettner 2024-01-04 04:37:56 UTC
I, too, have Fedora Asahi Remix running on a macbook m1.  I've attempted to reproduce this problem with Fedora 38 and then, after an update, with Fedora 39.  Thus far, I haven't been able to reproduce it.  When I run it, it shows 54 expected passes.
Comment 8 Tom de Vries 2024-01-04 11:20:44 UTC
(In reply to Kevin Buettner from comment #7)
> Thus far, I haven't been able to reproduce it.

I started from a clean OS install, built and tested gdb, and then grepped for UNTESTED/UNSUPPORTED to decide which packages to add and which settings to configure to make the build more complete.  At some point in this process I started running into this failure.  I tried to reverse the process, but didn't manage to make the failure go away: even a very minimal gdb still reproduced it 100% of the time.

I also did some updates at some point, so I was in the 'need restart' state.  After restarting, the failure disappeared.

I investigated the file descriptor list in /proc/<pid>/fd, and found two sockets where I previously found the files mentioned in comment 6.

Anyway, I wrote a patch for the test-case that allows us to reproduce the failure reliably, also on x86_64-linux.
Comment 9 Tom de Vries 2024-01-04 11:21:41 UTC
Created attachment 15281 [details]
Trigger patch
Comment 10 Tom de Vries 2024-01-08 16:29:20 UTC
Not sure if this is the right place or way, but this fixes the FAIL:
...
diff --git a/gdb/linux-fork.c b/gdb/linux-fork.c
index 1430ff89fa7..0d97ec1ba04 100644
--- a/gdb/linux-fork.c
+++ b/gdb/linux-fork.c
@@ -228,6 +228,8 @@ fork_load_infrun_state (struct fork_info *fp)
   inferior_thread ()->set_stop_pc
     (regcache_read_pc (get_thread_regcache (inferior_thread ())));
   nullify_last_target_wait_ptid ();
+  inferior_thread ()->set_executing (false);
+  inferior_thread ()->set_resumed (false);
 
   /* Now restore the file positions of open file descriptors.  */
   if (fp->filepos)
...
Comment 11 Kevin Buettner 2024-01-08 19:05:53 UTC
(In reply to Tom de Vries from comment #10)
> Not sure if this is the right place or way, but this fixes the FAIL:

There are multiple problems with linux-fork.c that need to be addressed
at some point, but your patch looks reasonable to me.
Comment 12 Tom de Vries 2024-01-09 16:57:23 UTC
(In reply to Kevin Buettner from comment #11)
> There are multiple problems with linux-fork.c that need to be addressed
> at some point, but your patch looks reasonable to me.

Thanks for the pre-submission review, submitted ( https://sourceware.org/pipermail/gdb-patches/2024-January/205754.html ).
Comment 13 Sourceware Commits 2024-01-11 09:12:15 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=4ece39c56cfdd5647d4061f3c084b9de6f9e443c

commit 4ece39c56cfdd5647d4061f3c084b9de6f9e443c
Author: Tom de Vries <tdevries@suse.de>
Date:   Thu Jan 11 10:12:48 2024 +0100

    [gdb/testsuite] Extend gdb.base/kill-during-detach.exp
    
    I ran into the following FAIL:
    ...
    (gdb) python kill_and_detach()^M
    Traceback (most recent call last):^M
      File "<string>", line 1, in <module>^M
      File "<string>", line 7, in kill_and_detach^M
    gdb.error: Selected thread is running.^M
    Error while executing Python code.^M
    (gdb) FAIL: gdb.base/kill-during-detach.exp: exit_p=true: checkpoint_p=true: \
      python kill_and_detach()
    ...
    
    The FAIL happens as follows:
    - gdb is debugging a process A
    - a checkpoint is created, in other words, fork is called in the inferior,
      after which we have:
      - checkpoint 0 (the fork parent, process A), and
      - checkpoint 1 (the fork child, process B).
    - during checkpoint creation, lseek is called in the inferior (process A) for
      all file descriptors, and it returns != -1 for at least one file descriptor.
    - the process A continues in the background
    - gdb detaches, from process A
    - gdb switches to process B, in other words, it restarts checkpoint 1
    - while restarting checkpoint 1, gdb tries to call lseek in the inferior
      (process B), but this fails because gdb incorrectly thinks that inferior B
      is running.
    
    This happens because linux_nat_switch_fork patches the pid of process B into
    the current inferior and current thread which were originally representing
    process A.  So, because process A was running in the background, the
    thread_info fields executing and resumed are set accordingly, but they are not
    correct for process B.
    
    There's a line in fork_load_infrun_state that fixes up the thread_info field
    stop_pc, so fix this by adding similar fixups for the executing and resumed
    fields alongside.
    
    The FAIL did not always reproduce, so extend the test-case to reliably
    trigger this scenario.
    
    Tested on x86_64-linux.
    
    Approved-By: Kevin Buettner <kevinb@redhat.com>
    
    PR gdb/31203
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31203
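
The stale-state problem described in the commit message can be modelled in a few lines. This is a toy Python model, not gdb code: the class ToyThread, the exception type, the pids and the apply_fix parameter are all hypothetical, and only the function names and the error text are borrowed from the report above.

```python
from dataclasses import dataclass


class ThreadRunningError(Exception):
    """Stands in for the gdb error seen in the FAIL."""


@dataclass
class ToyThread:
    pid: int
    executing: bool = False
    resumed: bool = False


def validate_registers_access(thread):
    # An inferior call needs registers, so a running thread is rejected.
    if thread.executing:
        raise ThreadRunningError("Selected thread is running.")


def switch_fork(thread, new_pid):
    # The same thread object is reused; only the pid is patched.
    thread.pid = new_pid


def fork_load_infrun_state(thread, apply_fix):
    if apply_fix:
        # Model of the fixup: process B is stopped, so clear the stale flags.
        thread.executing = False
        thread.resumed = False
    # Restoring file positions needs an inferior call (lseek).
    validate_registers_access(thread)


def detach_with_checkpoint(apply_fix):
    thread = ToyThread(pid=100)   # process A
    thread.executing = True       # "continue &"
    thread.resumed = True
    switch_fork(thread, 101)      # now represents process B
    try:
        fork_load_infrun_state(thread, apply_fix)
    except ThreadRunningError as err:
        return str(err)
    return "ok"
```

Without the fixup the model reports the same error as the FAIL; with it, the inferior call is allowed because the flags no longer describe process A.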
Comment 14 Tom de Vries 2024-01-11 09:13:45 UTC
Fixed.
Comment 15 Tom de Vries 2024-01-11 09:15:17 UTC
(In reply to Tom de Vries from comment #0)
> On an aarch64-linux with gdb 13.2.1, I run into:

Hmm, that should have been gcc, not gdb.