This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



[rfc] Fix a Linux ignored-signals/threading bug


A customer reported a bug which caused a threaded program to hang when run
under GDB.  No breakpoints or anything, just running it with the debugger
attached.  When the hang triggered, one thread would be awake and waiting
for a lock, but all other threads would be marked as tracing-stopped.

We tracked the problem back to a combination of two clever optimizations:
one in linux_nat_wait which avoids stopping all threads if GDB would just
pass the signal back to the inferior and resume, and another in
linux_nat_resume which avoids restarting any threads if the desired thread
already has a saved event.

If the desired thread has a saved status, but that status will be ignored,
then linux_nat_resume bypasses the resume; linux_nat_wait then eats the
status and restarts only that thread.  If that thread then blocks on a lock
held by some other thread, we spin in linux_nat_wait waiting for the lock
to be released, but no other threads were resumed.  We're stuck.

So, I chose to solve this by detecting the special case in linux_nat_resume.
I could also have resumed all other threads (but not the event thread) when
resume_all is set and the event thread has a saved status; but in most cases
the saved status is probably a SIGTRAP rather than an ignored signal, so
resuming the other threads would be wasteful.
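
For readers who want the decision in isolation, here is a minimal standalone
model of the check described above.  It is a sketch, not GDB code: struct
signal_policy, make_stop_status, status_will_be_ignored and
can_short_circuit_resume are hypothetical names, the policy struct merely
stands in for GDB's per-signal stop/print/pass settings, and the status
encoding in make_stop_status assumes Linux.

```c
#include <stdbool.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical stand-in for the stop/print/pass settings that GDB keeps
   for each signal.  Not a GDB type.  */
struct signal_policy
{
  bool stop;   /* Stop the program when the signal arrives.  */
  bool print;  /* Announce the signal to the user.  */
  bool pass;   /* Deliver the signal to the program.  */
};

/* Build a "stopped by SIG" wait status.  Assumption: the Linux encoding,
   where such a status is (SIG << 8) | 0x7f.  */
static int
make_stop_status (int sig)
{
  return (sig << 8) | 0x7f;
}

/* Return true if STATUS is a signal stop that the wait loop would
   silently hand back to the thread.  POLICY is assumed to be the policy
   for the signal WSTOPSIG (STATUS).  */
static bool
status_will_be_ignored (int status, const struct signal_policy *policy)
{
  if (status == 0 || !WIFSTOPPED (status))
    return false;

  return !policy->stop && !policy->print && policy->pass;
}

/* Return true if a resume request may return early because the thread
   already has a pending status worth reporting.  An ignored signal must
   not take this path, since the other threads would never be resumed.  */
static bool
can_short_circuit_resume (int status, const struct signal_policy *policy)
{
  return status != 0 && !status_will_be_ignored (status, policy);
}
```

With a policy of { stop = false, print = false, pass = true } and a pending
SIGCHLD stop, can_short_circuit_resume returns false, so the caller would go
on to resume every thread; with a stop-and-print policy and a pending
SIGTRAP stop it returns true.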

Unfortunately, I can't think of a good way to reproduce this reliably.
The testcase just generates a lot of SIGCHLDs by creating a large number of
threads that continually fork `date' via system() (which takes a lock).
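
The testcase itself is not included in this message.  A rough sketch of the
shape described above might look like the following; it is an assumption
about the structure, not the actual test.  The names worker_args, worker and
run_stress are hypothetical, and the loop is bounded by an iteration count
here so that the program terminates, whereas the description above has the
threads forking continually.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread arguments and result.  Hypothetical type for this sketch.  */
struct worker_args
{
  const char *command;  /* Shell command to run, e.g. "date".  */
  int iterations;       /* How many times to run it.  */
  int failures;         /* Number of system() calls that returned -1.  */
};

/* Thread body: repeatedly run the command via system(), so that each
   call forks a child and the process later receives a SIGCHLD.  */
static void *
worker (void *arg)
{
  struct worker_args *args = arg;

  for (int i = 0; i < args->iterations; i++)
    if (system (args->command) == -1)
      args->failures++;

  return NULL;
}

/* Start NTHREADS workers, wait for them, and return the number of
   failures (threads that could not be created plus failed system()
   calls), or -1 if memory could not be allocated.  */
static int
run_stress (int nthreads, int iterations, const char *command)
{
  pthread_t *threads = calloc (nthreads, sizeof *threads);
  struct worker_args *args = calloc (nthreads, sizeof *args);
  int started = 0;
  int failures = 0;

  if (threads == NULL || args == NULL)
    {
      free (threads);
      free (args);
      return -1;
    }

  for (int i = 0; i < nthreads; i++)
    {
      args[i].command = command;
      args[i].iterations = iterations;
      args[i].failures = 0;
      if (pthread_create (&threads[i], NULL, worker, &args[i]) != 0)
	break;
      started++;
    }

  for (int i = 0; i < started; i++)
    {
      pthread_join (threads[i], NULL);
      failures += args[i].failures;
    }

  failures += nthreads - started;
  free (threads);
  free (args);
  return failures;
}
```

A main that calls something like run_stress (50, 1000, "date") and is then
run under GDB would approximate the scenario; the thread and iteration
counts are arbitrary, and since the hang depends on timing, no particular
run is guaranteed to trigger it.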

Tested on x86_64-pc-linux-gnu.  Any comments?  Otherwise I'll commit this in
a few days.

-- 
Daniel Jacobowitz
CodeSourcery, LLC

2005-09-28  Daniel Jacobowitz  <dan@codesourcery.com>

	* linux-nat.c (linux_nat_resume): Add more debugging messages.  Do
	not short-circuit resuming all threads if the signal will be ignored
	in linux_nat_wait.

Index: linux-nat.c
===================================================================
RCS file: /big/fsf/rsync/src/src/gdb/linux-nat.c,v
retrieving revision 1.33
diff -u -p -r1.33 linux-nat.c
--- linux-nat.c	10 Sep 2005 18:11:02 -0000	1.33
+++ linux-nat.c	27 Sep 2005 18:55:14 -0000
@@ -1072,6 +1072,14 @@ linux_nat_resume (ptid_t ptid, int step,
   struct lwp_info *lp;
   int resume_all;
 
+  if (debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog,
+			"LLR: Preparing to %s %s, %s, inferior_ptid %s\n",
+			step ? "step" : "resume",
+			target_pid_to_str (ptid),
+			signo ? strsignal (signo) : "0",
+			target_pid_to_str (inferior_ptid));
+
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
@@ -1097,12 +1105,45 @@ linux_nat_resume (ptid_t ptid, int step,
       lp->resumed = 1;
 
       /* If we have a pending wait status for this thread, there is no
-         point in resuming the process.  */
+	 point in resuming the process.  But first make sure that
+	 linux_nat_wait won't preemptively handle the event - we
+	 should never take this short-circuit if we are going to
+	 leave LP running, since we have skipped resuming all the
+	 other threads.  This bit of code needs to be synchronized
+	 with linux_nat_wait.  */
+
+      if (lp->status && WIFSTOPPED (lp->status))
+	{
+	  int saved_signo = target_signal_from_host (WSTOPSIG (lp->status));
+
+	  if (signal_stop_state (saved_signo) == 0
+	      && signal_print_state (saved_signo) == 0
+	      && signal_pass_state (saved_signo) == 1)
+	    {
+	      if (debug_linux_nat)
+		fprintf_unfiltered (gdb_stdlog,
+				    "LLR: Not short circuiting for ignored "
+				    "status 0x%x\n", lp->status);
+
+	      /* FIXME: What should we do if we are supposed to continue
+		 this thread with a signal?  */
+	      gdb_assert (signo == TARGET_SIGNAL_0);
+	      signo = saved_signo;
+	      lp->status = 0;
+	    }
+	}
+
       if (lp->status)
 	{
 	  /* FIXME: What should we do if we are supposed to continue
 	     this thread with a signal?  */
 	  gdb_assert (signo == TARGET_SIGNAL_0);
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"LLR: Short circuiting for status 0x%x\n",
+				lp->status);
+
 	  return;
 	}
 

