This is the mail archive of the
mailing list for the GDB project.
Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
- From: Luis Machado <lgustavo at codesourcery dot com>
- To: John Baldwin <jhb at freebsd dot org>, <gdb-patches at sourceware dot org>
- Cc: <vd at freebsd dot org>
- Date: Thu, 12 Jan 2017 19:27:43 -0600
- Subject: Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
- Authentication-results: sourceware.org; auth=none
- References: <20161223212842.42715-1-jhb@FreeBSD.org> <1700771.1OUYESxIQe@ralph.baldwin.cx> <email@example.com> <1723055.CyypAqrLYR@ralph.baldwin.cx>
- Reply-to: Luis Machado <lgustavo at codesourcery dot com>
On 01/12/2017 01:16 PM, John Baldwin wrote:
On Thursday, January 12, 2017 10:29:00 AM Luis Machado wrote:
On 12/28/2016 11:37 AM, John Baldwin wrote:
On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote:
On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote:
I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS
instead of explicitly continuing the process, but that doesn't help, and it
means that the ptid being returned is still T1 in that case.
I'm not sure if I should explicitly be calling delete_exited_threads() in
fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume()
could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't
clear to me which of these is preferred since both are in use).
I added the assertion for my own sanity. I suspect gdb should never try to
invoke target_resume() with a ptid of an exited thread, but if for some
reason it did the effect on FreeBSD would be a hang since we would suspend
all the other threads and when the process was continued via PT_CONTINUE it
would have nothing to do and would never return from wait(). I'd rather have
gdb fail an assertion in that case rather than hang.
I am not sure if this is related, but since I get a hang I would rather
mention it: with the John's patch (including the assert) gdb does not
emit the "ptrace: No such process" error, but when I attempt to quit,
No, this is a separate bug in the kernel whereby a process doesn't
treat PT_KILL as a detach-like event but incorrectly expects to keep
getting PT_CONTINUE events for a while until it finally exits. I'm
working on writing up regression/unit tests for PT_KILL and then
fixing the bug.
I think the patch is mainly papering over a bigger problem. My guess is
that the native fbsd backend is not doing something it should.
I'd check how linux-nat.c is doing things and then try to confirm the
fbsd behavior is sane.
For example, i noticed linux-nat.c has exit_lwp (...) that handles
deletion of both thread information and the thread itself (lwp). Even if
it is the currently-selected thread, we *will* get the lwp removed from
the list of existing lwp's.
FreeBSD's backend doesn't maintain a separate lwp list, it just uses
the existing GDB thread list. For FreeBSD's backend the two lists would
simply mirror each other so it seems a bit of a waste to maintain a
duplicate list. exit_lwp() calls delete_thread() which is the same thing
the FreeBSD backend is doing, so if that is the current thread in
inferior_ptid, the Linux backend will also being leaving the exited
thread around in GDB's list until some future call to delete_exited_threads().
I think the thing that makes Linux work is that it doesn't use GDB's
thread list. Meaning, it doesn't walk over GDB's thread list, but instead
iterates over its private LWP list via iterate_over_lwps(). It would seem
that GDB's thread list is designed so that backends shouldn't need their
own thread list (you can add target-specific data with a custom destructor
that gets invoked when freeing a thread for example), but the Linux backend
doesn't choose to use it that way?
Looking at some other threaded backends:
- aix-thread.c relies on custom ptrace ops that resume a single thread
- darwin-nat.c uses its own thread list (stored in the inferior's
private data) instead of GDB's thread list.
- gnu-nat.c uses its own thread list instead of GDB's thread list.
- obsd-nat.c uses GDB's thread list but doesn't seem to support resuming
individual threads (only entire processes).
- procfs.c maintains its own thread list, but it doesn't seem to use it
for resume but relies on the associated kernel resuming either an
entire process or a single thread in a process via different ioctls.
- remote.c:remote_resume() uses ALL_NON_EXITED_THREADS
- windows_nat.cwindows_resume() calls windows_continue() which uses a
target-internal thread list rather than GDB's thread list.
It doesn't make sense to keep a thread that has already exitted in the
list of threads we are manipulating.
FreeBSD's backend isn't making that choice. delete_thread() in threads.c
is the one making that choice. If FreeBSD's backend were to define its
own thread list, the contents would be identical except it would not
include any exited threads, so skipping exited threads gives the same
result as walking a hypothetical private list.
So i take it using ALL_NON_EXITED_THREADS is something that would seem
reasonable to use in this case and not iterate through all threads (even
ones marked exitting)?