This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[gdbserver/linux] Fix spurious SIGSTOPs

From: Pedro Alves <pedro at codesourcery dot com>
To: gdb-patches at sourceware dot org
Date: Sat, 28 Aug 2010 16:14:13 +0100
Subject: [gdbserver/linux] Fix spurious SIGSTOPs
While testing fast tracepoints + non-stop, with my favourite
test (*) I was noticing an occasional spurious SIGSTOP being
reported to GDB.  I've applied the patch below to fix it, after
running the testsuite on x86_64-unknown-linux-gnu without
regressions.

There were a couple of cases where we were queuing a
SIGSTOP when we don't need to.  Fixing those made the spurious
SIGSTOP rate go much lower (my test needed a couple of minutes
to trigger rather than a few seconds), but it didn't solve it
fully.  There are situations, when moving lwps out of jump pads,
were we end up deferring a SIGSTOP to report to GDB more than
once.  It goes something like this:

 - vCont;t arrives, we queue a SIGSTOP, and go wait.
 - when the lwp reports the corresponding stop, we find the lwp is
   collecting a fast tracepoint, so we decide to move it out of
   the jump pad, deferring the SIGSTOP until that is done.
 - Some other LWP hits an internal gdbserver breakpoint that
   is no longer necessary, so we go about deleting it.  We pause
   all lwps while we do so, sending a SIGSTOP to the first LWP as well.
 - the first LWP reports the stop for the SIGSTOP, and, since
   it still hasn't moved fully out of the jump pad, we decide to
   defer it (remember the vCont;t).  But, this is the second SIGSTOP
   we are putting in the defer queue.
 - eventually, the first lwp moves out of the jump pad, and we dequeue
   the first SIGSTOP from the deferred-signals-to-report queue,
   and report the stop to GDB (as response to that original vCont;t).
 - GDB resumes the lwp again, and since there's still a SIGSTOP in 
   the deferred signals queue, we report it now.
 - GDB reports the SIGSTOP to the user.

(*) - a program similar to gdb.threads/schedlock.c, with 3 threads
spinning in a loop.  I set a fast tracepoint with a collect $registers
action for each thread (somewhere within their tight loops).  Then, set
circular tracing on, and, start the tracing experiment.  While the
program is collecting, from gdb, do this:

define foo
  tstatus
  c -a&
  interrupt -a
end

then, execute "foo" once.  Put a weight on the "enter" key
(triggering auto-repeat).  Eventually, is something goes wrong
with stepping-over-breakpoints support, you'll hit an assertion.
For this test, I've made gdbserver abort if it was about to
report a SIGSTOP to gdb.

Here are the commands I use.  Might be useful to someone (or a future
self):

$ cat gdbserver.cmd
#!/bin/bash
./gdbserver --debug \
            --wrapper \
                      env \
                      LD_PRELOAD=/home/pedro/gdb/baseline/build/gdb/gdbserver/libinproctrace.so \
                      -- \
                      :9999 \
                      ~/gdb/tests/threads

$ ./gdbserver.cmd 1>gdbserver.log 2>&1

$ cat foo.cmd 
ftrace 64
actions
        collect $registers
end

ftrace 82
actions
        collect $registers
end

define foo
       c -a&
       interrupt -a
       tstatus
end

$ gdb -ex "r" --args ./gdb -ex "set target-async 1" -ex "set non-stop 1" \
  -ex "set pagination off" -ex "tar rem :9999" -ex "b main" -ex "source foo.cmd" \
   ~/gdb/tests/threads

$ c ; set circular-trace-buffer on; tstart; foo; foo; foo; ...

-- 
Pedro Alves


2010-08-28  Pedro Alves  <pedro@codesourcery.com>

	* linux-low.c (__SIGRTMIN): Define if not already defined.
	(linux_create_inferior): Check for __ANDROID__ rather than
	__SIGRTMIN.
	(enqueue_one_deferred_signal): Don't requeue non-RT signals that
	are already deferred.
	(linux_wait_1): Check for __ANDROID__ rather than __SIGRTMIN.
	(linux_resume_one_thread): Don't queue a SIGSTOP if the lwp is
	stopped and already has a pending signal to report.
	(proceed_one_lwp): : Don't queue a SIGSTOP if the lwp already has
	a pending signal to report or is moving out of a jump pad.
	(linux_init_signals): Check for __ANDROID__ rather than
	__SIGRTMIN.

---
 gdb/gdbserver/linux-low.c |   49 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

Index: src/gdb/gdbserver/linux-low.c
===================================================================
--- src.orig/gdb/gdbserver/linux-low.c	2010-08-28 15:06:40.000000000 +0100
+++ src/gdb/gdbserver/linux-low.c	2010-08-28 15:17:03.000000000 +0100
@@ -98,6 +98,12 @@
 #define W_STOPCODE(sig) ((sig) << 8 | 0x7f)
 #endif
 
+/* This is the kernel's hard limit.  Not to be confused with
+   SIGRTMIN.  */
+#ifndef __SIGRTMIN
+#define __SIGRTMIN 32
+#endif
+
 #ifdef __UCLIBC__
 #if !(defined(__UCLIBC_HAS_MMU__) || defined(__ARCH_HAS_MMU__))
 #define HAS_NOMMU
@@ -566,7 +572,7 @@ linux_create_inferior (char *program, ch
     {
       ptrace (PTRACE_TRACEME, 0, 0, 0);
 
-#ifdef __SIGRTMIN /* Bionic doesn't use SIGRTMIN the way glibc does.  */
+#ifndef __ANDROID__ /* Bionic doesn't use SIGRTMIN the way glibc does.  */
       signal (__SIGRTMIN + 1, SIG_DFL);
 #endif
 
@@ -1337,6 +1343,30 @@ Deferring signal %d for LWP %ld.\n", WST
       fprintf (stderr, "   (no more currently queued signals)\n");
     }
 
+  /* Don't enqueue non-RT signals if they are already in the deferred
+     queue.  (SIGSTOP being the easiest signal to see ending up here
+     twice)  */
+  if (WSTOPSIG (*wstat) < __SIGRTMIN)
+    {
+      struct pending_signals *sig;
+
+      for (sig = lwp->pending_signals_to_report;
+	   sig != NULL;
+	   sig = sig->prev)
+	{
+	  if (sig->signal == WSTOPSIG (*wstat))
+	    {
+	      if (debug_threads)
+		fprintf (stderr,
+			 "Not requeuing already queued non-RT signal %d"
+			 " for LWP %ld\n",
+			 sig->signal,
+			 lwpid_of (lwp));
+	      return;
+	    }
+	}
+    }
+
   p_sig = xmalloc (sizeof (*p_sig));
   p_sig->prev = lwp->pending_signals_to_report;
   p_sig->signal = WSTOPSIG (*wstat);
@@ -2230,7 +2260,7 @@ Check if we're already there.\n",
   if (WIFSTOPPED (w)
       && current_inferior->last_resume_kind != resume_step
       && (
-#if defined (USE_THREAD_DB) && defined (__SIGRTMIN)
+#if defined (USE_THREAD_DB) && !defined (__ANDROID__)
 	  (current_process ()->private->thread_db != NULL
 	   && (WSTOPSIG (w) == __SIGRTMIN
 	       || WSTOPSIG (w) == __SIGRTMIN + 1))
@@ -3332,7 +3362,14 @@ linux_resume_one_thread (struct inferior
 	     the thread already has a pending status to report, we
 	     will still report it the next time we wait - see
 	     status_pending_p_callback.  */
-	  send_sigstop (lwp);
+
+	  /* If we already have a pending signal to report, then
+	     there's no need to queue a SIGSTOP, as this means we're
+	     midway through moving the LWP out of the jumppad, and we
+	     will report the pending signal as soon as that is
+	     finished.  */
+	  if (lwp->pending_signals_to_report == NULL)
+	    send_sigstop (lwp);
 	}
 
       /* For stop requests, we're done.  */
@@ -3500,7 +3537,9 @@ proceed_one_lwp (struct inferior_list_en
       return 0;
     }
 
-  if (thread->last_resume_kind == resume_stop)
+  if (thread->last_resume_kind == resume_stop
+      && lwp->pending_signals_to_report == NULL
+      && lwp->collecting_fast_tracepoint == 0)
     {
       /* We haven't reported this LWP as stopped yet (otherwise, the
 	 last_status.kind check above would catch it, and we wouldn't
@@ -5117,7 +5156,7 @@ linux_init_signals ()
 {
   /* FIXME drow/2002-06-09: As above, we should check with LinuxThreads
      to find what the cancel signal actually is.  */
-#ifdef __SIGRTMIN /* Bionic doesn't use SIGRTMIN the way glibc does.  */
+#ifndef __ANDROID__ /* Bionic doesn't use SIGRTMIN the way glibc does.  */
   signal (__SIGRTMIN+1, SIG_IGN);
 #endif
 }
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]