This is the mail archive of the
mailing list for the GDB project.
Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
- From: Luis Machado <lgustavo at codesourcery dot com>
- To: John Baldwin <jhb at freebsd dot org>, <vd at freebsd dot org>
- Cc: <gdb-patches at sourceware dot org>
- Date: Thu, 12 Jan 2017 10:29:00 -0600
- Subject: Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
- References: <20161223212842.42715-1-jhb@FreeBSD.org> <2893581.89CAWbS1EM@ralph.baldwin.cx> <20161228080707.GA4007@nitro> <1700771.1OUYESxIQe@ralph.baldwin.cx>
- Reply-to: Luis Machado <lgustavo at codesourcery dot com>
On 12/28/2016 11:37 AM, John Baldwin wrote:
> On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote:
> > On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote:
> > > I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS
> > > instead of explicitly continuing the process, but that doesn't help, and it
> > > means that the ptid being returned is still T1 in that case.
> > > I'm not sure whether I should explicitly be calling delete_exited_threads() in
> > > fbsd_resume() before calling iterate_threads(). Alternatively, fbsd_resume()
> > > could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't
> > > clear to me which of these is preferred since both are in use).
> > > I added the assertion for my own sanity. I suspect gdb should never try to
> > > invoke target_resume() with a ptid of an exited thread, but if for some
> > > reason it did, the effect on FreeBSD would be a hang, since we would suspend
> > > all the other threads and when the process was continued via PT_CONTINUE it
> > > would have nothing to do and would never return from wait(). I'd rather have
> > > gdb fail an assertion in that case than hang.
> > I am not sure if this is related, but since I get a hang I would rather
> > mention it: with John's patch (including the assert) gdb does not
> > emit the "ptrace: No such process" error, but when I attempt to quit,
> No, this is a separate bug in the kernel whereby a process doesn't
> treat PT_KILL as a detach-like event but incorrectly expects to keep
> getting PT_CONTINUE events for a while until it finally exits. I'm
> working on writing up regression/unit tests for PT_KILL and then
> fixing the bug.
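As an illustration of the resume-time hazard discussed above (suspending
every other thread and then continuing one that has already exited), here
is a minimal C sketch. All names in it (toy_thread, toy_state,
toy_resume_one) are hypothetical and it is not the fbsd-nat.c code; it only
shows the idea of skipping exited entries while iterating and asserting
that the thread being resumed is still alive, so a bad request fails loudly
instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; this is NOT GDB's struct thread_info.  */
enum toy_state { TOY_STOPPED, TOY_RUNNING, TOY_EXITED };

struct toy_thread
{
  int lwp;
  enum toy_state state;
  bool suspended;
};

/* Resume only TARGET_LWP and keep every other live thread suspended.
   Exited entries are skipped.  Returns the number of threads that are
   left runnable (0 if TARGET_LWP was not found among the live threads).  */
static int
toy_resume_one (struct toy_thread *threads, size_t n, int target_lwp)
{
  int runnable = 0;

  for (size_t i = 0; i < n; i++)
    {
      struct toy_thread *t = &threads[i];

      if (t->state == TOY_EXITED)
        {
          /* Resuming an exited thread would leave nothing to run, so
             fail an assertion here rather than block forever later.  */
          assert (t->lwp != target_lwp);
          continue;
        }

      if (t->lwp == target_lwp)
        {
          t->suspended = false;
          t->state = TOY_RUNNING;
          runnable++;
        }
      else
        t->suspended = true;
    }

  return runnable;
}
```

A caller could treat a return value of 0 as "nothing will run" and refuse
to wait on the process at all.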
I think the patch is mainly papering over a bigger problem. My guess is
that the native fbsd backend is not doing something it should.
I'd check how linux-nat.c is doing things and then try to confirm the
fbsd behavior is sane.
For example, I noticed linux-nat.c has exit_lwp (...) that handles
deletion of both thread information and the thread itself (lwp). Even if
it is the currently-selected thread, we *will* get the lwp removed from
the list of existing lwps.
It doesn't make sense to keep a thread that has already exited in the
list of threads we are manipulating.
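To make the suggestion concrete, here is a rough C sketch of the general
idea: drop an lwp from the backend's list as soon as its exit is reported,
so later loops over the list never see it. The names (toy_lwp,
toy_add_lwp, toy_exit_lwp, toy_selected_lwp and so on) are made up for
illustration; this is not the linux-nat.c implementation of exit_lwp, and
what happens to the selection afterwards is deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; NOT GDB's struct lwp_info.  */
struct toy_lwp
{
  int lwpid;
  struct toy_lwp *next;
};

static struct toy_lwp *toy_lwp_list = NULL;
static int toy_selected_lwp = 0;   /* 0 means "no selection".  */

/* Add an lwp at the head of the list.  Returns NULL on allocation
   failure.  */
static struct toy_lwp *
toy_add_lwp (int lwpid)
{
  struct toy_lwp *lp = malloc (sizeof *lp);

  if (lp == NULL)
    return NULL;
  lp->lwpid = lwpid;
  lp->next = toy_lwp_list;
  toy_lwp_list = lp;
  return lp;
}

/* Called when LWPID is reported as exited.  The entry is unlinked and
   freed whether or not it is the selected one; toy_selected_lwp is not
   consulted, so removal never depends on the selection.  */
static void
toy_exit_lwp (int lwpid)
{
  struct toy_lwp **link = &toy_lwp_list;

  while (*link != NULL)
    {
      struct toy_lwp *lp = *link;

      if (lp->lwpid == lwpid)
        {
          *link = lp->next;
          free (lp);
          return;
        }
      link = &lp->next;
    }
}

/* Look up an lwp; returns NULL if it is not (or no longer) listed.  */
static struct toy_lwp *
toy_find_lwp (int lwpid)
{
  for (struct toy_lwp *lp = toy_lwp_list; lp != NULL; lp = lp->next)
    if (lp->lwpid == lwpid)
      return lp;
  return NULL;
}

/* Count the lwps still on the list.  */
static size_t
toy_count_lwps (void)
{
  size_t n = 0;

  for (struct toy_lwp *lp = toy_lwp_list; lp != NULL; lp = lp->next)
    n++;
  return n;
}
```

With this shape, any resume or suspend loop that walks toy_lwp_list cannot
touch an exited lwp, because the entry is gone by the time the loop runs.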