[PATCH v2 5/6] gdb, infrun: fix silent inferior switch in do_target_wait()
Andrew Burgess
aburgess@redhat.com
Wed Oct 23 11:52:04 GMT 2024
"Metzger, Markus T" <markus.t.metzger@intel.com> writes:
> Hello Andrew,
>
>>> In do_target_wait(), we iterate over inferiors and call
>>> do_target_wait_1(), which eventually calls target_wait() per inferior.
>>> Each time, we wait for minus_one_ptid.
>>
>>Could you expand on this a little more please as I don't understand what
>>you are telling me, and I can't align this explanation with the code.
>
> This used to be minus_one_ptid until you changed it (back) to wait_ptid in
> 07505b613a0605ef42a004364ab8ccb70fd3c8eb.
>
> There is a single call of that function that passes minus_one_ptid outside
> of condition evaluation:
>
> /* If the thread is in the middle of the condition evaluation, wait for
> an event from the current thread. Otherwise, wait for an event from
> any thread. */
> ptid_t waiton_ptid = in_cond_eval ? inferior_ptid : minus_one_ptid;
>
> if (!do_target_wait (waiton_ptid, &ecs, TARGET_WNOHANG))
> {
> infrun_debug_printf ("do_target_wait returned no event");
> disable_commit_resumed.reset_and_commit ();
> return;
> }
>
>>So it's not clear to me what you mean when you say 'Each time, we wait
>>for minus_one_ptid'. Maybe you mean elsewhere in GDB?
>
> I should say 'outside of condition evaluation, we wait for minus_one_ptid'.
>
>
>>> In some cases, e.g. gdb.threads/detach-step-over.exp, we ask to wait for
>>> one inferior, and get an event from a different inferior back without
>>> noticing the inferior switch.
>>
>>The way you mention detach-step-over.exp it seems like you think GDB is
>>doing something wrong. But the test fully passes for me without this
>>patch, and you don't adjust the expected output. So I'm guessing that
>>nothing changes for that test. But that seems at odds with the say you
>>mention the test; my reading of the above is that GDB is doing something
>>wrong. Could you clarify what's going on please.
>
> It is definitely wrong to go through all the effort of randomly selecting an inferior
> and then iterating over inferiors one-by-one if we then allow switching to a different
> inferior in the actual wait call.
>
>
>>I see that without this patch you do get some failures in the next patch
>>of this series, and my next job is to debug those failures in order to
>>understand this patch more...
>
> The next patch organizes the replay/record state per inferior. A single target
> may have inferiors that are currently recording and inferiors that are currently
> replaying. We need to wait for individual inferiors since they are handled by
> different target methods: the replaying ones are handled by record targets and
> the recording ones are handled by the target beneath the record target.
>
> We could handle this in the record target by splitting forwarding the request
> for some threads and handling it itself for others.
>
> Since the infrun do_target_wait() code obviously intends to do the wait per
> inferior and just forgets to adjust the wait focus to one inferior, fixing that
> seemed to be the better approach to me.
I agree with you. But I do have one concern. As part of the next
commit I think you should add:
gdb_assert (ptid != minus_one_ptid);
to the start of `record_btrace_target::wait`. For the reasons you
explain, this function can't currently wait on all inferiors, if one
inferior requires handling in record_btrace_target while another
inferior requires handling in the target beneath.
Which leads to my concern. With that assert in place we are, I claim,
making the promise that target_ops::wait() will _never_ be called with
minus_one_ptid, but this code:
/* Make sure we're not widening WAIT_PTID. */
if (!ptid.matches (wait_ptid)
/* Targets that cannot async will be asked for a blocking wait.
Blocking wait does not work inferior-by-inferior if the target
provides more than one inferior. Fall back to waiting for
WAIT_PTID in that case. */
|| !target_can_async_p () || ((options & TARGET_WNOHANG) == 0)
/* FIXME: I don't see why we should have inferiors with zero pid,
which indicates that the respective ptid is not a process.
They do exist, though, and we cannot wait for them. */
|| !ptid.is_pid ())
ptid = wait_ptid;
>From your branch, in infrun.c, would, if I understand it correctly, mean
that ptid could be set to minus_one_ptid in some cases.
So my concern here is: are there situations where
record_btrace_target::wait can end up being called with minus_one_ptid?
Lets be clear, I totally agree with you that for the case where
TARGET_WNOHANG is set, it makes more sense that we wait on each inferior
in turn. But I wonder if we should still be changing
record_btrace_target::wait so that it can handle minus_one_ptid?
I applied this HACK:
diff --git a/gdb/record-btrace.c b/gdb/record-btrace.c
index 7937bb3f8db..367225c3320 100644
--- a/gdb/record-btrace.c
+++ b/gdb/record-btrace.c
@@ -2637,6 +2637,13 @@ record_btrace_target::wait (ptid_t ptid, struct target_waitstatus *status,
if (moving.empty ())
{
+ if ((::execution_direction != EXEC_REVERSE)
+ && ptid == minus_one_ptid)
+ {
+ infrun_debug_printf ("forwarding the wait to target beneath");
+ return this->beneath ()->wait (ptid, status, options);
+ }
+
*status = btrace_step_no_moving_threads ();
DEBUG ("wait ended by %s: %s", null_ptid.to_string ().c_str (),
and revert patch 5/6 in your series, and all of your new tests still
pass. I don't think the above is the correct way to change ::wait, but
it does hint that it might be possible to support minus_one_ptid.
As I said, I do agree with you that do_target_wait could be improved,
but I still need convincing that record_btrace_target::wait is OK as is.
Thanks,
Andrew
>
>
>>... but is there any chance we could write a test that exposes an issue
>>that is fixed by this patch, that could be added just with this patch?
>>
>>> ---
>>> gdb/infrun.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
>>> gdb/linux-nat.c | 17 ++++++++++++-----
>>> gdb/remote.c | 22 +++++++++++++++++-----
>>> 3 files changed, 72 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/gdb/infrun.c b/gdb/infrun.c
>>> index a5030b16376..9ca0571065c 100644
>>> --- a/gdb/infrun.c
>>> +++ b/gdb/infrun.c
>>> @@ -4198,7 +4198,23 @@ do_target_wait (ptid_t wait_ptid,
>>execution_control_state *ecs,
>>>
>>> auto do_wait = [&] (inferior *inf)
>>> {
>>> - ecs->ptid = do_target_wait_1 (inf, wait_ptid, &ecs->ws, options);
>>> + ptid_t ptid { inf->pid };
>>> +
>>> + /* Make sure we're not widening WAIT_PTID. */
>>> + if (!ptid.matches (wait_ptid)
>>
>>Sorry, I just don't understand what's going on here.
>>
>>If wait_ptid is minus_one_ptid then ptid will be for pid -1, and the
>>ptid.matches() call will return true. But a ptid_t initialised with
>>only a pid of -1 is equal to minus_one_ptid, right?
>
> Right, but this ptid is initialized to inf->pid, not to wait_ptid.pid ().
>
>
>>If wait_ptid is for a process only, with no thread, then ptid ==
>>wait_ptid, and ptid.matches() returns true.
>>
>>If wait_ptid is for a thread then ptid.matches() returns false, and we
>>revert to using wait_ptid.
>>
>>So, if my understanding is correct, ptid always ends up equal to
>>wait_ptid.
>>
>>
>>> + /* Targets that cannot async will be asked for a blocking wait.
>>> +
>>> + Blocking wait does not work inferior-by-inferior if the target
>>> + provides more than one inferior. Fall back to waiting for
>>> + WAIT_PTID in that case. */
>>> + || !target_can_async_p () || ((options & TARGET_WNOHANG) == 0)
>>> + /* FIXME: I don't see why we should have inferiors with zero pid,
>>> + which indicates that the respective ptid is not a process.
>>> + They do exist, though, and we cannot wait for them. */
>>
>>I'm not a fan of FIXME comments that don't lay out a clear path for how
>>to fix the issue. e.g. "FIXME: This can't be changed until GCC 33.1 is
>>released with support for DWARF8" or some such.
>>
>>I cannot offer much insight here, but you do say "They do exist, though,
>>..." so you've clearly tracked down where they come from. So why not
>>share those details. Better yet, rewrite the comment as an explanation:
>>
>> "Process with a zero pid are created by ...... Unfortunately we cannot
>> wait for a zero pid so revert to using the full wait_ptid."
>
> One such inferior is created by initialize_inferiors ():
> set_current_inferior (add_inferior_silent (0));
>
> It is strange that the ptid for such an inferior says that this isn't a process.
>
> I changed the comment to:
> /* We cannot wait for inferiors without a pid.
>
> One such inferior is created by initialize_inferiors () to
> ensure that there always is an inferior. */
>
>
>>> + || !ptid.is_pid ())
>>> + ptid = wait_ptid;
>>> +
>>> + ecs->ptid = do_target_wait_1 (inf, ptid, &ecs->ws, options);
>>> ecs->target = inf->process_target ();
>>> return (ecs->ws.kind () != TARGET_WAITKIND_IGNORE);
>>> };
>>> @@ -4208,6 +4224,12 @@ do_target_wait (ptid_t wait_ptid,
>>execution_control_state *ecs,
>>> reported the stop to the user, polling for events. */
>>> scoped_restore_current_thread restore_thread;
>>>
>>> + /* The first TARGET_WAITKIND_NO_RESUMED execution state.
>>> +
>>> + If we do not find a more interesting event, we will report that. */
>>
>>I think this comment needs rewording. I had to study the code before
>>this made sense to me. By default the no_resumed object's `ws` member
>>will be TARGET_WAITKIND_IGNORE, so initially I didn't understand what
>>the TARGET_WAITKIND_NO_RESUMED comment was going on about.
>>
>>I think the comment should explain how the object will be copied into
>>later on when we find a TARGET_WAITKIND_NO_RESUMED result.
>
> Yeah, this is another side-effect of not adjusting the wait scope. If you're
> waiting for minus_one_ptid, the target will check all its inferiors and only
> return TARGET_WAITKIND_NO_RESUMED if no other thread has anything
> to report.
>
> It's still wrong in the case of multi-target, where we could have an event on
> some other target, yet we still return the TARGET_WAITKIND_NO_RESUMED
> from the first target we checked.
>
> I changed the comment to:
> /* The first TARGET_WAITKIND_NO_RESUMED execution state.
>
> We do not want to return TARGET_WAITKIND_NO_RESUMED right away since
> another inferior may have a more interesting event to report. If
> there is no other event to report, after all, we want to report the
> first such event.
>
> This variable holds that first event, which will be copied on the
> first TARGET_WAITKIND_NO_RESUMED below. */
>
>
>>I'm continuing to review this patch, but answers to some of the above
>>(and a test?) would help a lot.
>
> Here come answers to the above. I don't have a test that exploits this
> inferior switching. I noticed it when I debugged gdb.threads/detach-step-over.exp.
> I don't remember why exactly I debugged it. There were fails, of course, but
> they were not related to the inferior switching.
>
> Thanks,
> Markus.
>
> Intel Deutschland GmbH
> Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
> Tel: +49 89 99 8853-0, www.intel.de
> Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
> Chairperson of the Supervisory Board: Nicole Lau
> Registered Office: Munich
> Commercial Register: Amtsgericht Muenchen HRB 186928
More information about the Gdb-patches
mailing list