This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
[gdbserver/linux] Fix spurious SIGSTOPs
- From: Pedro Alves <pedro at codesourcery dot com>
- To: gdb-patches at sourceware dot org
- Date: Sat, 28 Aug 2010 16:14:13 +0100
- Subject: [gdbserver/linux] Fix spurious SIGSTOPs
While testing fast tracepoints + non-stop, with my favourite
test (*) I was noticing an occasional spurious SIGSTOP being
reported to GDB. I've applied the patch below to fix it, after
running the testsuite on x86_64-unknown-linux-gnu without
regressions.
There were a couple of cases where we were queuing a
SIGSTOP when we didn't need to. Fixing those made the spurious
SIGSTOP rate go much lower (my test needed a couple of minutes
to trigger rather than a few seconds), but it didn't solve it
fully. There are situations, when moving lwps out of jump pads,
where we end up deferring a SIGSTOP to report to GDB more than
once. It goes something like this:
- vCont;t arrives, we queue a SIGSTOP, and go wait.
- when the lwp reports the corresponding stop, we find the lwp is
collecting a fast tracepoint, so we decide to move it out of
the jump pad, deferring the SIGSTOP until that is done.
- Some other LWP hits an internal gdbserver breakpoint that
is no longer necessary, so we go about deleting it. We pause
all lwps while we do so, sending a SIGSTOP to the first LWP as well.
- the first LWP reports the stop for the SIGSTOP, and, since
it still hasn't moved fully out of the jump pad, we decide to
defer it (remember the vCont;t). But, this is the second SIGSTOP
we are putting in the defer queue.
- eventually, the first lwp moves out of the jump pad, and we dequeue
the first SIGSTOP from the deferred-signals-to-report queue,
and report the stop to GDB (as response to that original vCont;t).
- GDB resumes the lwp again, and since there's still a SIGSTOP in
the deferred signals queue, we report it now.
- GDB reports the SIGSTOP to the user.
(*) - a program similar to gdb.threads/schedlock.c, with 3 threads
spinning in a loop. I set a fast tracepoint with a collect $registers
action for each thread (somewhere within their tight loops). Then, set
circular tracing on, and, start the tracing experiment. While the
program is collecting, from gdb, do this:
define foo
tstatus
c -a&
interrupt -a
end
then, execute "foo" once. Put a weight on the "enter" key
(triggering auto-repeat). Eventually, if something goes wrong
with stepping-over-breakpoints support, you'll hit an assertion.
For this test, I've made gdbserver abort if it was about to
report a SIGSTOP to gdb.
Here are the commands I use. Might be useful to someone (or a future
self):
$ cat gdbserver.cmd
#!/bin/bash
./gdbserver --debug \
--wrapper \
env \
LD_PRELOAD=/home/pedro/gdb/baseline/build/gdb/gdbserver/libinproctrace.so \
-- \
:9999 \
~/gdb/tests/threads
$ ./gdbserver.cmd 1>gdbserver.log 2>&1
$ cat foo.cmd
ftrace 64
actions
collect $registers
end
ftrace 82
actions
collect $registers
end
define foo
c -a&
interrupt -a
tstatus
end
$ gdb -ex "r" --args ./gdb -ex "set target-async 1" -ex "set non-stop 1" \
-ex "set pagination off" -ex "tar rem :9999" -ex "b main" -ex "source foo.cmd" \
~/gdb/tests/threads
$ c ; set circular-trace-buffer on; tstart; foo; foo; foo; ...
--
Pedro Alves
2010-08-28 Pedro Alves <pedro@codesourcery.com>
* linux-low.c (__SIGRTMIN): Define if not already defined.
(linux_create_inferior): Check for __ANDROID__ rather than
__SIGRTMIN.
(enqueue_one_deferred_signal): Don't requeue non-RT signals that
are already deferred.
(linux_wait_1): Check for __ANDROID__ rather than __SIGRTMIN.
(linux_resume_one_thread): Don't queue a SIGSTOP if the lwp is
stopped and already has a pending signal to report.
(proceed_one_lwp): Don't queue a SIGSTOP if the lwp already has
a pending signal to report or is moving out of a jump pad.
(linux_init_signals): Check for __ANDROID__ rather than
__SIGRTMIN.
---
gdb/gdbserver/linux-low.c | 49 +++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 44 insertions(+), 5 deletions(-)
Index: src/gdb/gdbserver/linux-low.c
===================================================================
--- src.orig/gdb/gdbserver/linux-low.c 2010-08-28 15:06:40.000000000 +0100
+++ src/gdb/gdbserver/linux-low.c 2010-08-28 15:17:03.000000000 +0100
@@ -98,6 +98,12 @@
#define W_STOPCODE(sig) ((sig) << 8 | 0x7f)
#endif
+/* This is the kernel's hard limit. Not to be confused with
+ SIGRTMIN. */
+#ifndef __SIGRTMIN
+#define __SIGRTMIN 32
+#endif
+
#ifdef __UCLIBC__
#if !(defined(__UCLIBC_HAS_MMU__) || defined(__ARCH_HAS_MMU__))
#define HAS_NOMMU
@@ -566,7 +572,7 @@ linux_create_inferior (char *program, ch
{
ptrace (PTRACE_TRACEME, 0, 0, 0);
-#ifdef __SIGRTMIN /* Bionic doesn't use SIGRTMIN the way glibc does. */
+#ifndef __ANDROID__ /* Bionic doesn't use SIGRTMIN the way glibc does. */
signal (__SIGRTMIN + 1, SIG_DFL);
#endif
@@ -1337,6 +1343,30 @@ Deferring signal %d for LWP %ld.\n", WST
fprintf (stderr, " (no more currently queued signals)\n");
}
+ /* Don't enqueue non-RT signals if they are already in the deferred
+ queue. (SIGSTOP being the easiest signal to see ending up here
+ twice) */
+ if (WSTOPSIG (*wstat) < __SIGRTMIN)
+ {
+ struct pending_signals *sig;
+
+ for (sig = lwp->pending_signals_to_report;
+ sig != NULL;
+ sig = sig->prev)
+ {
+ if (sig->signal == WSTOPSIG (*wstat))
+ {
+ if (debug_threads)
+ fprintf (stderr,
+ "Not requeuing already queued non-RT signal %d"
+ " for LWP %ld\n",
+ sig->signal,
+ lwpid_of (lwp));
+ return;
+ }
+ }
+ }
+
p_sig = xmalloc (sizeof (*p_sig));
p_sig->prev = lwp->pending_signals_to_report;
p_sig->signal = WSTOPSIG (*wstat);
@@ -2230,7 +2260,7 @@ Check if we're already there.\n",
if (WIFSTOPPED (w)
&& current_inferior->last_resume_kind != resume_step
&& (
-#if defined (USE_THREAD_DB) && defined (__SIGRTMIN)
+#if defined (USE_THREAD_DB) && !defined (__ANDROID__)
(current_process ()->private->thread_db != NULL
&& (WSTOPSIG (w) == __SIGRTMIN
|| WSTOPSIG (w) == __SIGRTMIN + 1))
@@ -3332,7 +3362,14 @@ linux_resume_one_thread (struct inferior
the thread already has a pending status to report, we
will still report it the next time we wait - see
status_pending_p_callback. */
- send_sigstop (lwp);
+
+ /* If we already have a pending signal to report, then
+ there's no need to queue a SIGSTOP, as this means we're
+ midway through moving the LWP out of the jumppad, and we
+ will report the pending signal as soon as that is
+ finished. */
+ if (lwp->pending_signals_to_report == NULL)
+ send_sigstop (lwp);
}
/* For stop requests, we're done. */
@@ -3500,7 +3537,9 @@ proceed_one_lwp (struct inferior_list_en
return 0;
}
- if (thread->last_resume_kind == resume_stop)
+ if (thread->last_resume_kind == resume_stop
+ && lwp->pending_signals_to_report == NULL
+ && lwp->collecting_fast_tracepoint == 0)
{
/* We haven't reported this LWP as stopped yet (otherwise, the
last_status.kind check above would catch it, and we wouldn't
@@ -5117,7 +5156,7 @@ linux_init_signals ()
{
/* FIXME drow/2002-06-09: As above, we should check with LinuxThreads
to find what the cancel signal actually is. */
-#ifdef __SIGRTMIN /* Bionic doesn't use SIGRTMIN the way glibc does. */
+#ifndef __ANDROID__ /* Bionic doesn't use SIGRTMIN the way glibc does. */
signal (__SIGRTMIN+1, SIG_IGN);
#endif
}