[PATCH v5 3/5] gdb/remote: do not delete a thread if it has a pending event

Tue Apr 21 18:54:06 GMT 2020

On 4/20/20 9:35 PM, Aktemur, Tankut Baris wrote:
> On Thursday, April 16, 2020 6:24 PM, Pedro Alves wrote:
>> On 4/6/20 4:45 PM, Tankut Baris Aktemur via Gdb-patches wrote:
>>> gdb/ChangeLog:
>>> 2020-04-06  Tankut Baris Aktemur  <tankut.baris.aktemur@intel.com>
>>>
>>> 	* remote.c (remote_target::update_thread_list): Do not delete
>>> 	a thread if it has a pending event.
>>> ---
>>>  gdb/remote.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/gdb/remote.c b/gdb/remote.c
>>> index bfbc0bc21d3..12ac7cb9862 100644
>>> --- a/gdb/remote.c
>>> +++ b/gdb/remote.c
>>> @@ -3821,6 +3821,9 @@ remote_target::update_thread_list ()
>>>  	  if (tp->inf->process_target () != this)
>>>  	    continue;
>>>
>>> +	  if (tp->suspend.waitstatus_pending_p)
>>> +	    continue;
>>> +
>>
>> This patch makes me nervous.  :-/
>>
>> Why doesn't prune_threads, via remote_target::thread_alive end
>> up removing the thread anyway?
> 
> As far as I see, prune_threads is called only if gdbserver does not
> send a thread list.  The comment in the function suggests that this
> could be the case for older servers.  I'm not sure how old it should
> be, though, for me to be able to test it.
>  
>> I think it'd be better to extend the check to also check for
>> TARGET_WAITKIND_EXITED/TARGET_WAITKIND_SIGNALLED.  Let threads
>> with other kinds of events pending be deleted.
>>
>> Other targets are likely to need something like this, but I'm
>> OK with going with remote-specific test to start with, so we
>> can all this in, including the testcase, and then work on
>> something more generic if we find a need.
>>
>> In any case, please add some comments.
> 
> First I did this change, but when I started working on Patch #5, interesting
> things have come up.
> 
> Having both of the inferiors on a single extended-remote target is supposed
> to trigger the same stop-all-threads scenario that this series attempts to fix.
> But it turned out to be slightly different.
> 
> The 'update_thread_list' at the beginning of the stop_all_threads pass
> removes all the threads because the remote target sends an empty list of threads
> (well both inferiors exited and no thread remained).  So, before we even attempt
> to stop threads, they are already gone when the list is updated.  The problem is,
> inferior 2 (i.e. the inferior whose thread we would attempt to stop but would
> receive an EXITED waitstatus for) is then left in a seemingly live state (pid != 0),
> but with no threads.  Hence, we cannot even switch to it after inferior 1's
> exit event is shown.
> 
> Therefore I updated this patch as below.  Although no regression was detected,
> I don't feel much safe with this new version.  Comments welcome.

Hmm.  The current_inferior() check seems like a red flag, IMO.
Why is that needed?

How does it happen that all threads are deleted?  I've seen
this before, but I'll need to refresh my memory.  Let's see.

We have this in gdbserver/linux-low.cc:

/* Return true if LWP has exited already, and has a pending exit event
   to report to GDB.  */

static int
lwp_is_marked_dead (struct lwp_info *lwp)
{
  return (lwp->status_pending_p
	  && (WIFEXITED (lwp->status_pending)
	      || WIFSIGNALED (lwp->status_pending)));
}

/* Return true if the given thread is still alive.  */

bool
linux_process_target::thread_alive (ptid_t ptid)
{
  struct lwp_info *lwp = find_lwp_pid (ptid);

  /* We assume we always know if a thread exits.  If a whole process
     exited but we still haven't been able to report it to GDB, we'll
     hold on to the last lwp of the dead process.  */
  if (lwp != NULL)
    return !lwp_is_marked_dead (lwp);
  else
    return 0;
}

If gdbserver deleted all threads, it seems like that would
mean that the linux-low.cc backend has already reported the
pending exit event to gdbserver's core, and the process exit event
is thus guaranteed to already be in transit, somewhere, maybe
queued in gdbserver/server.cc's notif_stop queue, or maybe even
already in gdb, but queued in gdb/remote.c's stop_reply_queue queue,
or at whatever other level we queue events.

As I mentioned, I've ran into something like this before.
See infrun.c:handle_no_resumed where it reads:

  /* Note however that we may find no resumed thread because the whole
     process exited meanwhile (thus updating the thread list results
     in an empty thread list).  In this case we know we'll be getting
     a process exit event shortly.  */

Maybe we should make sure that stop_all_threads ends up in wait_one
if the inferior has a pid != 0, and no threads?

If we still need to keep one thread in the inferior's thread list,
it feels like that should be handled by common code, like e.g.,
from within delete_thread.  

But if we do that, then we don't need to change stop_all_threads.
And that bit I quoted in handle_no_resumed becomes dead, and
seems like could trick gdb into reporting a TARGET_WAITKIND_NO_RESUMED
to the user.  :-/

It also feels like we're missing a thread state in enum thread_state,
like a THREAD_ZOMBIE state, for a thread that has exited, but that
should still be listed, and only exists because it needs to be waited
on to consume the exit event.  We can probably do without it, though
I've been having this feeling for a long while.  Maybe
THREAD_EXITED + pending status would be the same thing.

Or maybe we should be able to select an inferior and resume it
even if it has no threads but still has pid != 0, which indicates
that the inferior has execution and thus has a pending exit wait
status to collect.  That would go along with storing that
pending exit waitstatus on a separate list, instead of storing
it on a thread.  That might be tougher to hack in.  Not sure it's
worth it if we can do without it.  All the continue/resume code
in infcmd.c/infrun.c assumes there's a current thread.

Thanks,
Pedro Alves