[PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
John Baldwin
jhb@freebsd.org
Tue Dec 27 21:03:00 GMT 2016
On Tuesday, December 27, 2016 05:43:29 PM Vasil Dimov wrote:
> On Fri, Dec 23, 2016 at 15:43:19 -0600, Luis Machado wrote:
> > On 12/23/2016 03:28 PM, John Baldwin wrote:
> > > When resuming a native FreeBSD process, ignore exited threads when
> > > suspending/resuming individual threads prior to continuing the process.
> > >
> > > gdb/ChangeLog:
> > >
> > > PR threads/20743
> > > * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads.
> > > (resume_all_threads_cb): Likewise.
> > > (fbsd_resume): Assert resuming thread has not exited.
> > > ---
> > > gdb/ChangeLog | 7 +++++++
> > > gdb/fbsd-nat.c | 7 +++++++
> > > 2 files changed, 14 insertions(+)
> > >
> > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog
> > > index db6e913..4fb3732 100644
> > > --- a/gdb/ChangeLog
> > > +++ b/gdb/ChangeLog
> > > @@ -1,3 +1,10 @@
> > > +2016-12-23 John Baldwin <jhb@FreeBSD.org>
> > > +
> > > + PR threads/20743
> > > + * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads.
> > > + (resume_all_threads_cb): Likewise.
> > > + (fbsd_resume): Assert resuming thread has not exited.
> > > +
> > > 2016-12-22 Doug Evans <xdje42@gmail.com>
> > >
> > > * infrun.c (set_step_over_info): Add comment.
> > > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> > > index ade62f1..7cd08c6 100644
> > > --- a/gdb/fbsd-nat.c
> > > +++ b/gdb/fbsd-nat.c
> > > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data)
> > > if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid))
> > > return 0;
> > >
> > > + if (is_exited (tp->ptid))
> > > + return 0;
> > > +
> > > if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid))
> > > request = PT_RESUME;
> > > else
> > > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data)
> > > if (!ptid_match (tp->ptid, *filter))
> > > return 0;
> > >
> > > + if (is_exited (tp->ptid))
> > > + return 0;
> > > +
> > > if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1)
> > > perror_with_name (("ptrace"));
> > > return 0;
> > > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops,
> > > if (ptid_lwp_p (ptid))
> > > {
> > > /* If ptid is a specific LWP, suspend all other LWPs in the process. */
> > > + gdb_assert (!is_exited (ptid));
> >
> > If we're asserting on this (since supposedly it shouldn't happen), do we
> > need to check for is_exited on the two functions above?
> >
> > Also, is there a reason why we're not detecting a thread that has
> > exited? Aren't all threads stopped at this point (for all-stop mode at
> > least)?
> [...]
>
> Hello,
>
> I just nailed this down after it has been annoying me for some time,
> fixed it with a similar patch as the one submitted by John, and came
> here to report it.
>
> The reason that we are "not detecting" an exited thread (at least in the
> scenario I hit) is in gdb/thread.c:
>
> --- cut ---
> static void
> delete_thread_1 (ptid_t ptid, int silent)
> {
> ...
> /* If this is the current thread, or there's code out there that
> relies on it existing (refcount > 0) we can't delete yet. Mark
> it as exited, and notify it. */
> if (tp->refcount > 0
> || ptid_equal (tp->ptid, inferior_ptid))
> {
> ...
> /* Will be really deleted some other time. */
> printf_unfiltered ("========== Will be really deleted some other time %u\n", ptid);
> return;
> }
> ...
> if (tpprev)
> tpprev->next = tp->next;
> else
> thread_list = tp->next;
> --- cut ---
>
> In my scenario tp->refcount is 0, but
> "ptid_equal (tp->ptid, inferior_ptid)" is true, so the thread's entry is
> not removed from the global "thread_list".
>
> The gdb output (with "set debug fbsd-lwp" enabled):
>
> --- cut ---
> FLWP: adding thread for LWP 102009
> [New LWP 102009 of process 40304]
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FLWP: fbsd_resume for ptid (40304, 102009, 0)
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FLWP: fbsd_resume for ptid (40304, 102009, 0)
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FLWP: deleting thread for LWP 102009
> [LWP 102009 of process 40304 exited]
> ...
> ptrace: No such process.
> --- cut ---
>
> Hope this helps.

In particular, the sequence of events is this:

- An LWP (T1) reports a "normal" event (in the test case it is hitting a
  breakpoint). This is reported to the core and sets the current thread
  (and thus inferior_ptid) to T1.
- The same LWP (T1) then exits and a thread exit event is reported via
  ptrace() to the native target. The native target calls delete_thread(),
  but the thread is not removed, only marked as exited, since it ==
  inferior_ptid as Vasil noted. The native target then continues the
  process explicitly via ptrace() without reporting any event to the core
  aside from the call to delete_thread().
- Some other LWP (T2) reports an event (in the test case it is a
  breakpoint).
- The user continues, which invokes fbsd_resume(), which wants to resume
  all threads. Here iterate_over_threads() in fbsd_resume() encounters
  the exited thread for T1, since nothing has called update_thread_list()
  (which would invoke delete_exited_threads() from
  fbsd_update_thread_list()). Since the thread has exited, trying to
  manipulate it via ptrace() results in an error.
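
The sequence above can be sketched with a toy model. This is not gdb code: the toy_* names, the list layout, and LWP 102010 are made up for illustration, and toy_ptrace_resume() only imitates the way a PT_RESUME request fails for an LWP that no longer exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gdb's thread list; not the real gdb types.  */
enum toy_state { TOY_RUNNING, TOY_STOPPED, TOY_EXITED };

struct toy_thread
{
  int lwp;
  enum toy_state state;
  struct toy_thread *next;
};

/* Imitates ptrace (PT_RESUME, lwp, ...): fails for an exited LWP.  */
static int
toy_ptrace_resume (const struct toy_thread *tp)
{
  return tp->state == TOY_EXITED ? -1 : 0;
}

/* Walk the list and resume every thread.  With SKIP_EXITED == 0 this
   mimics the behavior before the patch.  Returns the number of resume
   requests that failed.  */
int
toy_resume_all (struct toy_thread *list, int skip_exited)
{
  int failures = 0;

  for (struct toy_thread *tp = list; tp != NULL; tp = tp->next)
    {
      if (skip_exited && tp->state == TOY_EXITED)
        continue;
      if (toy_ptrace_resume (tp) == -1)
        failures++;
      else
        tp->state = TOY_RUNNING;
    }
  return failures;
}

/* T1 (102009) has exited but is still on the list; T2 (102010, a
   made-up number) is stopped at a breakpoint.  */
int
toy_scenario (int skip_exited)
{
  struct toy_thread t2 = { 102010, TOY_STOPPED, NULL };
  struct toy_thread t1 = { 102009, TOY_EXITED, &t2 };

  assert (t1.next == &t2);
  return toy_resume_all (&t1, skip_exited);
}
```

Calling toy_scenario (0) reports one failed request, which corresponds to the "ptrace: No such process." error; toy_scenario (1) reports none.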

I have tried changing fbsd_wait() to return TARGET_WAITKIND_SPURIOUS
instead of explicitly continuing the process, but that doesn't help, and it
means that the ptid being returned is still T1 in that case.

I'm not sure whether I should explicitly call delete_exited_threads() in
fbsd_resume() before calling iterate_over_threads(). Alternatively,
fbsd_resume() could use ALL_NON_EXITED_THREADS() instead of
iterate_over_threads() (it isn't clear to me which of these is preferred,
since both are in use).
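
The two alternatives can be contrasted in a minimal sketch. It assumes a simplified singly linked list: struct lwp_entry, prune_exited(), FOR_EACH_LIVE_ENTRY, and count_live() are hypothetical names, not gdb APIs, and the sketch does not model gdb's rule that the current thread or a referenced thread cannot be deleted yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry.  Entries are not heap-allocated in this
   sketch, so pruning only unlinks them.  */
struct lwp_entry
{
  int lwp;
  int exited;
  struct lwp_entry *next;
};

/* Option 1: unlink exited entries before iterating.  Returns the
   number of entries removed.  */
int
prune_exited (struct lwp_entry **head)
{
  int removed = 0;
  struct lwp_entry **link = head;

  assert (head != NULL);
  while (*link != NULL)
    {
      if ((*link)->exited)
        {
          *link = (*link)->next;
          removed++;
        }
      else
        link = &(*link)->next;
    }
  return removed;
}

/* Option 2: leave the list alone and skip exited entries while
   iterating.  */
#define FOR_EACH_LIVE_ENTRY(e, head) \
  for ((e) = (head); (e) != NULL; (e) = (e)->next) \
    if (!(e)->exited)

/* Example consumer of option 2: count the entries that are still
   alive.  */
int
count_live (struct lwp_entry *head)
{
  struct lwp_entry *e;
  int n = 0;

  FOR_EACH_LIVE_ENTRY (e, head)
    n++;
  return n;
}
```

Both options visit the same set of entries. The first changes the list as a side effect of resuming, while the second leaves cleanup to whoever normally updates the list.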

I added the assertion for my own sanity. I suspect gdb should never try to
invoke target_resume() with the ptid of an exited thread, but if for some
reason it did, the effect on FreeBSD would be a hang: we would suspend
all the other threads, and when the process was continued via PT_CONTINUE it
would have nothing to do and would never return from wait(). I'd rather have
gdb fail an assertion in that case than hang.
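
The hang described above can be illustrated with a small simulation. The struct sim_lwp type and the sim_* functions are hypothetical and only model the bookkeeping; no ptrace() calls are made, and the -1 return stands in for the point where gdb would fail its assertion.

```c
#include <assert.h>
#include <stddef.h>

struct sim_lwp
{
  int lwp;
  int exited;     /* nonzero once the LWP is gone */
  int suspended;  /* nonzero after a simulated PT_SUSPEND */
};

/* Mimic resuming a single LWP: suspend every other live LWP and resume
   TARGET.  Returns how many LWPs would actually run once the process
   is continued.  */
int
sim_resume_one (struct sim_lwp *lwps, size_t n, int target)
{
  int runnable = 0;

  assert (lwps != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (lwps[i].exited)
        continue;  /* nothing left to suspend or resume */
      lwps[i].suspended = (lwps[i].lwp != target);
      if (!lwps[i].suspended)
        runnable++;
    }
  return runnable;
}

/* Guarded variant: refuse to resume an exited target instead of
   leaving the process with nothing to run.  */
int
sim_resume_one_checked (struct sim_lwp *lwps, size_t n, int target)
{
  for (size_t i = 0; i < n; i++)
    if (lwps[i].lwp == target && lwps[i].exited)
      return -1;
  return sim_resume_one (lwps, n, target);
}
```

When the target has exited, sim_resume_one() returns 0: every remaining LWP is suspended, so a wait on the continued process would never return. The checked variant reports the problem up front.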
--
John Baldwin