Bug 30574 - hang when using remote target, schedule-multiple, follow-fork-mode child, and stepping over a vfork
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: server
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2023-06-21 21:01 UTC by Andrew Burgess
Modified: 2023-07-17 08:52 UTC
Attachments
reproducer for this issue (811 bytes, application/x-bzip)
2023-06-21 21:01 UTC, Andrew Burgess

Description Andrew Burgess 2023-06-21 21:01:10 UTC
Created attachment 14941
reproducer for this issue

When stepping over a vfork on a remote target, with follow-fork-mode set to child and schedule-multiple set to on, gdbserver hangs, which in turn causes GDB to hang.

To reproduce the issue, download the attached remote-vfork-bug.tar.bz2, and then:

  $ tar -xf remote-vfork-bug.tar.bz2
  $ cd remote-vfork-bug/
  $ make GDB="/path/to/gdb"
  ... snip lots of output ...
  Breakpoint 1, main (argc=1, argv=0x7fffffffacb8) at test.c:11
  11	  pid = vfork (); /* VFORK */
  [Attaching after Thread 2182093.2182093 vfork to child Thread 2182094.2182094]
  [New inferior 2 (process 2182094)]

At which point GDB hangs.

The problem appears to be that, after the vfork, GDB resumes the vfork-child process. Because schedule-multiple is on, GDB sends a ptid of -1 to gdbserver, which means "resume everything".

I think this bit is fine; the same thing happens with, e.g., the linux-nat layer within GDB. However, when the linux-nat layer actions the -1 ptid, it loops over all threads and resumes them, except that it spots that one thread is a vfork-parent and specifically skips resuming that thread.

In contrast, gdbserver doesn't skip the vfork-parent; it resumes the vfork-parent as well.

While the vfork-child is running, the kernel will hold the vfork-parent stopped.

When the vfork-child execs, gdbserver sees the event and tries to stop all resumed threads. As the vfork-parent is resumed, gdbserver sends it a SIGSTOP and then waits for the thread to report a stop. It is this stop that never arrives, and this is where we seem to hang.

Some of the details above might be a little off; I've only given this a quick review before filing this bug.
Comment 1 Sourceware Commits 2023-07-17 08:52:08 UTC
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a068d1a6b2dd982e1019bc265610f07bb8adff94

commit a068d1a6b2dd982e1019bc265610f07bb8adff94
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Wed Jun 21 14:19:27 2023 +0100

    gdb/testsuite: expand gdb.base/foll-vfork.exp
    
    This commit provides tests for all of the bugs fixed in the previous
    four commits, this is achieved by expanding gdb.base/foll-vfork.exp to
    run with different configurations:
    
      * target-non-stop on/off
      * non-stop on/off
      * schedule-multiple on/off
    
    We don't test with schedule-multiple on if we are using a remote
    target, this is due to bug gdb/30574.  I've added a link to that bug
    in this commit, but all this commit does is expose that bug, there's
    no fixes here.
    
    Some of the bugs fixed in the previous commits are very timing
    dependent, as such, they don't always show up.  I've had more success
    when running this test on a very loaded machine -- I usually run ~8
    copies of the test in parallel, then the bugs would normally show up
    pretty quickly.
    
    Other than running the test in more configurations, I've not made any
    changes to what is actually being tested, other than in one place
    where, when testing with non-stop mode, GDB stops in a different
    inferior, as such I had to add a new 'inferior 2' call, this can be
    found in vfork_relations_in_info_inferiors.
    
    I have cleaned things up a little, for example, making use of
    proc_with_prefix to remove some with_test_prefix calls.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30574