[PATCH] gdb/linux-nat: use LP's inferior when handling vfork done in linux_handle_extended_wait
Simon Marchi
simon.marchi@polymtl.ca
Wed Aug 18 21:05:17 GMT 2021
On 2021-08-12 11:48 a.m., Simon Marchi wrote:
> I spotted this bug by reading the code and subsequently wrote a test to
> reproduce it. The bug is caught by the assertions that are added.
> Otherwise the bug wouldn't cause a visible problem, but GDB would still
> be in a wrong state.
>
> When receiving a PTRACE_EVENT_VFORK_DONE, we do:
>
> if (event == PTRACE_EVENT_VFORK_DONE)
> {
> if (current_inferior ()->waiting_for_vfork_done)
> {
> linux_nat_debug_printf
> ("Got expected PTRACE_EVENT_VFORK_DONE from LWP %ld: stopping",
> lp->ptid.lwp ());
>
> ourstatus->kind = TARGET_WAITKIND_VFORK_DONE;
> return 0;
> }
>
> linux_nat_debug_printf
> ("Got PTRACE_EVENT_VFORK_DONE from LWP %ld: ignoring", lp->ptid.lwp ());
>
> return 1;
> }
>
> However, the current inferior may not be the inferior for which we have
> received the event, it could be another inferior of the linux-nat
> target. The current inferior is set in do_target_wait_1 before calling
> target_wait, so that target_wait hits the target stack we want. And
> there isn't anything making LP's inferior current between pulling the
> event out of the kernel and that code that handles
> PTRACE_EVENT_VFORK_DONE.
>
> Let's say that inferior 1 is waiting for a vfork done event while
> inferior 2 is executing, doing things not vfork-related. Inferior 2 is
> chosen by do_target_wait and made the current one prior to calling
> target_wait. We pull the PTRACE_EVENT_VFORK_DONE event from the kernel
> for inferior 1. We check the current_inferior (inferior 2)'s
> waiting_for_vfork_done flag. It is false, so we (wrongfully) ignore the
> event that was meant for inferior 1.
>
> I thought that this would end up in inferior 1 getting stuck, waiting
> for the event forever. But we actually happen to attempt to resume
> inferior 1 at various points (like in resume_stopped_resumed_lwps) after
> having pulled the events), so inferior 1 resumes execution and is not
> stuck. However, the inferior::waiting_for_vfork_done flag stays set for
> inferior 1, which is not a correct state. A correct execution would
> return TARGET_WAITKIND_VFORK_DONE to infrun for inferior 1, which would
> clear the flag. When processing TARGET_WAITKIND_VFORK_DONE, infrun also
> clears the program_space::breakpoints_not_allowed flag. That sounds
> important, and failing to do that might lead to some other bad behavior.
>
> To catch this bad state, add some assertions in handle_inferior_event.
> The idea is that if an inferior is a vfork parent, we expect that it
> can't report any events other than TARGET_WAITKIND_VFORK_DONE or
> TARGET_WAITKIND_SIGNALLED (which can happen if you kill -9 it during the
> vfork period). The test program does multiple consecutive vforks. If
> the bug explained above happens and waiting_for_vfork_done stays
> wrongfully set, the assertion will fail when a different event.
>
> If I run the test without the fix in linux-nat.c, I get:
>
> run^M
> Starting program: /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/vfork-multi-inferior/vfork-multi-inferior-vforker ^M
> [Detaching after vfork from child process 822537]^M
> /home/simark/src/binutils-gdb/gdb/infrun.c:5255: internal-error: void handle_inferior_event(execution_control_state*): Assertion `!inf->waiting_for_vfork_ done' failed.^M
>
> At some point during my investigation, I suspected the random inferior
> selection in do_target_wait to not work properly (always choosing one of
> the two inferiors), so I wrote the test to try with the vforker as
> inferior 1 and then as inferior 2. In the end, I don't think that's
> necessary, but I left the test as-is, as it will just test more
> combinations, and the test runs quickly anyway.
>
> Change-Id: Ie5b922d4f988d3fabf9f41fdeb83166da00af269
> ---
> gdb/infrun.c | 13 +++
> gdb/linux-nat.c | 4 +-
> .../gdb.base/vfork-multi-inferior-other.c | 12 ++
> .../gdb.base/vfork-multi-inferior-vforker.c | 24 ++++
> .../gdb.base/vfork-multi-inferior.exp | 107 ++++++++++++++++++
> 5 files changed, 159 insertions(+), 1 deletion(-)
> create mode 100644 gdb/testsuite/gdb.base/vfork-multi-inferior-other.c
> create mode 100644 gdb/testsuite/gdb.base/vfork-multi-inferior-vforker.c
> create mode 100644 gdb/testsuite/gdb.base/vfork-multi-inferior.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 5ee650fa4645..87224f1e5096 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -5242,6 +5242,19 @@ handle_inferior_event (struct execution_control_state *ecs)
>
> mark_non_executing_threads (ecs->target, ecs->ptid, ecs->ws);
>
> + /* If an inferior is a vfork parent, we expect it to be stopped, the only
> + two possible outcomes for it is VFORK_DONE (when the vfork child exits
> + or execs) or SIGNALLED, if is is forcefully killed (e.g. kill -9) from
> + outside GDB. */
> + if (ecs->ws.kind != TARGET_WAITKIND_VFORK_DONE
> + && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
> + {
> + inferior *inf = find_inferior_ptid (ecs->target, ecs->ptid);
> + gdb_assert (inf != nullptr);
> + gdb_assert (inf->vfork_child == nullptr);
> + gdb_assert (!inf->waiting_for_vfork_done);
> + }
New piece of information. Today I learned that when a multi-threaded
program vforks, only the vfork-ing thread is frozen until the vfork
child execs or exits (this is documented in the vfork man page on
Linux). Other threads in the vfork-ing process still run. So while a
thread is waiting for a vfork done event, other threads could in theory
report some other event.
But that leads to the question: is it really safe that the other threads
in the vfork-ing process stay running? During the window of time where
the vfork-child and parent share an address space, we remove the
breakpoints from the parent:
https://gitlab.com/gnutools/binutils-gdb/-/blob/master/gdb/infrun.c#L439-449
On non-stop targets, by keeping the other threads running, don't we risk
them missing a breakpoint?
On all-stop targets, when we resume the vfork parent (expecting to
receive a vfork done event, which is the sign where it's finally safe to
re-insert the breakpoints), we do resume all threads. That means the
other threads of the vfork parent run free without breakpoints inserted,
can't they also miss a breakpoint?
I'm thinking that when handling a vfork where we detach the child, the
safe thing to do would be:
- non-stop target: stop all threads of the interior, remove
breakpoints, resume only the vfork-ing thread, get the vfork done
event, re-insert the breakpoints, resume all threads meant to be
running
- all-stop target: all threads are naturally stopped when the vfork
event is reported, remove breakpoints, resume only the vfork-ing
thread, get the vfork done event, re-insert breakpoints, resume all
threads meant to be running
That's pretty much the same thing as when doing an in-line single-step.
Maybe that's actually what we do today, but I couldn't find signs of it.
In the mean time, I think that the fix in linux-nat.c (shown below) is
still right and would like to merge it, but I would remove the
assertions. But that means I wouldn't know how to write a test.
> +
> switch (ecs->ws.kind)
> {
> case TARGET_WAITKIND_LOADED:
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 211e447dc6f4..1b45f4e6c875 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2027,7 +2027,9 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>
> if (event == PTRACE_EVENT_VFORK_DONE)
> {
> - if (current_inferior ()->waiting_for_vfork_done)
> + inferior *inf = find_inferior_ptid (linux_target, lp->ptid);
> +
> + if (inf->waiting_for_vfork_done)
Simon
More information about the Gdb-patches
mailing list