[PATCH 02/23] Don't rely on inferior_ptid in record_full_wait

Sat Aug 1 16:14:36 GMT 2020

On 2020-07-30 11:17 p.m., Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
> 
> Pedro> The multi-target patch sets inferior_ptid to null_ptid before handling
> Pedro> a target event, and thus before calling target_wait, in order to catch
> Pedro> places in target_ops::wait implementations that are incorrectly
> Pedro> relying on inferior_ptid (which could otherwise be a ptid of a
> Pedro> different target, for example).  That caught this instance in
> Pedro> record-full.c.
> 
> I found a few target_ops::wait implementations doing "return
> inferior_ptid" in error cases.  Based on this comment, and the comment
> for target_wait, I suspect these should actually return minus_one_ptid
> instead.
> 
> I've appended the patch so you can see what it looks like.  I haven't
> tried it at all.  Does this seem correct?

I think you are right that it's incorrect, but...

> diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
> index ae0b0f7ff0d..2cae87023f9 100644
> --- a/gdb/inf-ptrace.c
> +++ b/gdb/inf-ptrace.c
> @@ -343,7 +343,7 @@ inf_ptrace_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
>  	  /* Claim it exited with unknown signal.  */
>  	  ourstatus->kind = TARGET_WAITKIND_SIGNALLED;
>  	  ourstatus->value.sig = GDB_SIGNAL_UNKNOWN;
> -	  return inferior_ptid;
> +	  return minus_one_ptid;
>  	}

I don't know if this makes sense: you tell the core of GDB that some
process was signalled (terminated by a signal), but you don't tell it
which process.  I am not sure what the core of GDB can do with this
information.

I believe the contract that `wait` implementations must respect is not
very well documented, so it's not clear which ptid value is valid with
which event.  But I expect that pairing a TARGET_WAITKIND_SIGNALLED
event with minus_one_ptid (or null_ptid, for that matter) is invalid,
and will just break down later in handle_inferior_event, where we
handle this kind of event:

    case TARGET_WAITKIND_EXITED:
    case TARGET_WAITKIND_SIGNALLED:
      {
	/* Depending on the system, ecs->ptid may point to a thread or
	   to a process.  On some targets, target_mourn_inferior may
	   need to have access to the just-exited thread.  That is the
	   case of GNU/Linux's "checkpoint" support, for example.
	   Call the switch_to_xxx routine as appropriate.  */
	thread_info *thr = find_thread_ptid (ecs->target, ecs->ptid);
	if (thr != nullptr)
	  switch_to_thread (thr);
	else
	  {
	    inferior *inf = find_inferior_ptid (ecs->target, ecs->ptid);
	    switch_to_inferior_no_thread (inf);
	  }

find_thread_ptid will return nullptr, so we'll take the else branch.
find_inferior_ptid will also return nullptr, which we'll pass to
switch_to_inferior_no_thread, and it will assert somewhere in there.
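
To make that failure mode concrete, here is a toy model of the lookup.
It is only a sketch: toy_ptid, toy_inferior, toy_find_inferior and
toy_switch_to_inferior are made-up stand-ins, not GDB's real types or
functions, and the assertion just mirrors the "it will assert somewhere
in there" behaviour described above.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins, not GDB's real types.
struct toy_ptid { int pid; };
struct toy_inferior { int pid; };

// Plays the role of minus_one_ptid: a wildcard, not a real process.
static const toy_ptid toy_minus_one_ptid = { -1 };

// Hypothetical analogue of find_inferior_ptid: return the inferior
// whose pid matches PTID, or nullptr if there is none.
static const toy_inferior *
toy_find_inferior (const std::vector<toy_inferior> &inferiors, toy_ptid ptid)
{
  for (const toy_inferior &inf : inferiors)
    if (inf.pid == ptid.pid)
      return &inf;
  return nullptr;
}

// Hypothetical analogue of switch_to_inferior_no_thread: it requires a
// valid inferior, so a failed lookup trips the assertion.
static int
toy_switch_to_inferior (const toy_inferior *inf)
{
  assert (inf != nullptr);
  return inf->pid;
}
```

With a single inferior of pid 1234, looking up toy_minus_one_ptid yields
nullptr, and handing that to toy_switch_to_inferior would hit the
assertion.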

Note that this waitpid call is blocking (and there's no async stuff in
inf-ptrace.c), so I presume that this is only used in sync targets.

If GDB asked to wait for a specific (non-minus_one) ptid and waitpid
returns -1 with errno set to ECHILD, it means that the process GDB is
thinking about doesn't exist.  Something is wrong: we missed an event or
something.  I suppose that the original intention of pretending the
process was terminated by an unknown signal was to at least make the
debugger stop.  If we returned TARGET_WAITKIND_IGNORE, we would keep
waiting for an event that will never arrive, probably in an infinite busy
loop of calling waitpid and returning TARGET_WAITKIND_IGNORE.  We could
keep pretending the process was signalled, but return the ptid that GDB
passed as an argument to wait instead of inferior_ptid.
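
For reference, the ECHILD situation is easy to reproduce outside of GDB.
The sketch below is not taken from inf-ptrace.c; the function name is
made up, and it simply waits twice on the same child so that the second
waitpid targets a process that is no longer ours to wait for.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child, reap it, then wait for the same pid a second time.
// Returns the errno of the second waitpid, 0 if it unexpectedly
// succeeded, or -1 if fork failed.
static int
errno_of_wait_on_reaped_child ()
{
  pid_t child = fork ();
  if (child == -1)
    return -1;
  if (child == 0)
    _exit (0);			/* Child: exit immediately.  */

  /* First wait: reaps the child normally.  */
  int status;
  pid_t reaped = waitpid (child, &status, 0);
  assert (reaped == child);

  /* Second wait: the pid no longer names a child of ours.  */
  errno = 0;
  pid_t ret = waitpid (child, &status, 0);
  return ret == -1 ? errno : 0;
}
```

On a POSIX system I would expect this to return ECHILD, which is the
case the inf-ptrace.c code above is reacting to.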

But if GDB asked us to wait for minus_one_ptid, then we can't really do
that, since there is no specific ptid to return.  If we return
TARGET_WAITKIND_IGNORE, we'll probably get into an infinite loop as
described above.
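
Roughly, the two cases look like this.  Again a sketch only:
sketch_ptid and choose_ptid_on_echild are hypothetical names, and this
only models which ptid could be attached to the "signalled" event, not
what inf_ptrace_target::wait actually does.

```cpp
#include <cassert>

// Toy stand-in for ptid_t; pid == -1 plays the role of minus_one_ptid.
struct sketch_ptid { int pid; };

// Decide which ptid could accompany a "signalled" event after waitpid
// failed with ECHILD.  Returns true and fills *OUT when GDB asked for a
// specific process; returns false for the wildcard, since there is no
// process to name.
static bool
choose_ptid_on_echild (sketch_ptid requested, sketch_ptid *out)
{
  assert (out != nullptr);

  if (requested.pid == -1)
    return false;		/* Wildcard wait: nothing to report.  */

  *out = requested;		/* Specific wait: echo the request.  */
  return true;
}
```

The false branch is the one that has no obvious answer today.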

Perhaps we could return TARGET_WAITKIND_NO_RESUMED in both cases?

Simon