This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
- From: Don Breazeal <donb at codesourcery dot com>
- To: Pedro Alves <palves at redhat dot com>, "Breazeal, Don" <Don_Breazeal at mentor dot com>, Simon Marchi <simon dot marchi at ericsson dot com>
- Cc: GDB Patches <gdb-patches at sourceware dot org>
- Date: Fri, 24 Jul 2015 11:43:42 -0700
- Subject: Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode
- Authentication-results: sourceware.org; auth=none
- References: <1437672294-29351-1-git-send-email-palves at redhat dot com> <55B1308E dot 4020700 at redhat dot com>
On 7/23/2015 11:21 AM, Pedro Alves wrote:
> So I managed to extract out this smaller patch from the
> gdbserver fixes I mentioned. I think this one looks safe enough
> for 7.10. WDYT?
>
> -----------
> From 98d41152bff2a21f7fda864d87ee5dd0cffa2d17 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 23 Jul 2015 18:49:51 +0100
> Subject: [PATCH] remote follow fork and spurious child stops in non-stop mode
>
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
>
> Running gdb.threads/fork-plus-threads.exp against gdbserver in
> extended-remote mode, even though the test passes, we still see broken
> behavior:
>
> (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
> continue &
> Continuing.
> (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
> [New Thread 28092.28092]
>
> [Thread 28092.28092] #2 stopped.
> [New Thread 28094.28094]
> [Inferior 2 (process 28092) exited normally]
> [New Thread 28094.28105]
> [New Thread 28094.28109]
>
> ...
>
> [Thread 28174.28174] #18 stopped.
> [New Thread 28185.28185]
> [Inferior 10 (process 28174) exited normally]
> [New Thread 28185.28196]
>
> [Thread 28185.28185] #20 stopped.
> Cannot remove breakpoints because program is no longer writable.
> Further execution is probably impossible.
> [Inferior 11 (process 28185) exited normally]
> [Inferior 1 (process 28091) exited normally]
> PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
> info threads
> No threads.
> (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
> info inferiors
> Num Description Executable
> * 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
> (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
>
> All the "[Thread FOO] #NN stopped." above are bogus, as well as the
> "Cannot remove breakpoints because program is no longer writable.",
> which is a consequence.
>
> The problem is that when we intercept a fork event, we should report
> the event for the parent, only, and leave the child stopped, but not
> report its stop event. GDB later decides whether to follow the parent
> or the child. But because handle_extended_wait does not set the
> child's last_status.kind to TARGET_WAITKIND_STOPPED, a
> stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
> memory) by mistake ends up queueing a SIGSTOP on the child, resuming
> it, and then when that SIGSTOP is intercepted, because the LWP has
> last_resume_kind set to resume_stop, gdbserver reports the stop to
> GDB, as GDB_SIGNAL_0:
>
> ...
> >>>> entering unstop_all_lwps
> unstopping all lwps
> proceed_one_lwp: lwp 1600
> client wants LWP to remain 1600 stopped
> proceed_one_lwp: lwp 1828
> Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Sending sigstop to lwp 1828
> pc is 0x3615ebc7cc
> Resuming lwp 1828 (continue, signal 0, stop expected)
> continue from pc 0x3615ebc7cc
> unstop_all_lwps done
> sigchld_handler
> <<<< exiting unstop_all_lwps
> handling possible target event
> >>>> entering linux_wait_1
> linux_wait_1: [<all threads>]
> my_waitpid (-1, 0x40000001)
> my_waitpid (-1, 0x1): status(137f), 1828
> LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
> LLW: waitpid 1828 received Stopped (signal) (stopped)
> pc is 0x3615ebc7cc
> Expected stop.
> LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
> linux_wait_1 ret = LWP 1828.1828, 1, 0
> <<<< exiting linux_wait_1
> Writing resume reply for LWP 1828.1828:1
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> By inspection, I also noticed that we miss leaving the child with the
> suspend count incremented if stopping threads, like we do for clone
> threads.
>
> Tested on x86_64 Fedora 20, extended-remote.
>
> gdb/gdbserver/ChangeLog:
> 2015-07-23 Pedro Alves <palves@redhat.com>
>
> * linux-low.c (handle_extended_wait): Set the child's last
> reported status to TARGET_WAITKIND_STOPPED.
> ---
> gdb/gdbserver/linux-low.c | 7 ++++++
> gdb/testsuite/gdb.threads/fork-plus-threads.exp | 30 +++++++++++++++++++++++++
> 2 files changed, 37 insertions(+)
>
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 17b2a51..56a33ff 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -488,6 +488,13 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
> child_lwp->status_pending_p = 0;
> child_thr = get_lwp_thread (child_lwp);
> child_thr->last_resume_kind = resume_stop;
> + child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
This makes perfect sense to me.
> +
> + /* If we're suspending all threads, leave this one suspended
> + too. */
> + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
> + child_lwp->suspended = 1;
I have a question about this. In the definition of struct lwp_info in
linux-low.h, it has this comment:
/* When this is true, we shall not try to resume this thread, even
if last_resume_kind isn't resume_stop. */
int suspended;
Since we are setting last_resume_kind to resume_stop here, is this
unnecessary?
Thanks,
--Don
> +
> parent_proc = get_thread_process (event_thr);
> child_proc->attached = parent_proc->attached;
> clone_all_breakpoints (&child_proc->breakpoints,
> diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> index f44dd76..80d2464 100644
> --- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> +++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
> @@ -48,13 +48,43 @@ gdb_test_multiple $test $test {
> }
> }
>
> +# gdbserver had a bug that resulted in reporting the fork child's
> +# initial stop to gdb, which gdb does not expect, in turn resulting in
> +# a broken session, like:
> +#
> +# [Thread 31536.31536] #16 stopped. <== BAD
> +# [New Thread 31547.31547]
> +# [Inferior 10 (process 31536) exited normally]
> +# [New Thread 31547.31560]
> +#
> +# [Thread 31547.31547] #18 stopped. <== BAD
> +# Cannot remove breakpoints because program is no longer writable. <== BAD
> +# Further execution is probably impossible. <== BAD
> +# [Inferior 11 (process 31547) exited normally]
> +# [Inferior 1 (process 31454) exited normally]
> +#
> +# These variables track whether we see such broken behavior.
> +set saw_cannot_remove_breakpoints 0
> +set saw_thread_stopped 0
> +
> set test "reached breakpoint"
> gdb_test_multiple "" $test {
> + -re "Cannot remove breakpoints" {
> + set saw_cannot_remove_breakpoints 1
> + exp_continue
> + }
> + -re "Thread \[^\r\n\]+ stopped\\." {
> + set saw_thread_stopped 1
> + exp_continue
> + }
> -re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
> pass $test
> }
> }
>
> +gdb_assert !$saw_cannot_remove_breakpoints "no failure to remove breakpoints"
> +gdb_assert !$saw_thread_stopped "no spurious thread stop"
> +
> gdb_test "info threads" "No threads\." \
> "no threads left"
>
>