Bug 17314 - Any thread created by Python must block SIGCHLD
Summary: Any thread created by Python must block SIGCHLD
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: python (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-26 20:37 UTC by dje
Modified: 2014-09-13 22:55 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description dje 2014-08-26 20:37:10 UTC
This is related to PR 17247, but in this case I think a simple doc improvement is what's needed, modulo going with something more robust (if possible):
An alternate solution would be to have linux-nat.c:sigchld_handler detect, if possible (can one obtain the thread id in an async-safe way?  seems like gettid should be async-safe) that the wrong thread has received the signal and cope.

Basically, gdb is the only thread that can receive SIGCHLDs.
Otherwise gdb will hang in the sigsuspend calls in linux-nat.c.

For future debugging sake, IWBN to print the thread id in linux-nat.c:sigchld_handler.

Repro:

---foo.py---
import thread
import time

def forever():
  while True:
    time.sleep(1)

def start_background_thread():
  thread.start_new_thread(forever, ())
---

bash$ make run
(gdb) source foo.py
(gdb) py start_background_thread()
(gdb) file testsuite/gdb.threads/print-threads
(gdb) r
...
[New Thread 0x2aaaab7a3700 (LWP 14472)]
Print 1, thread 0
[New Thread 0x2aaaab9a4700 (LWP 14473)]
Print 1, thread 1
  C-c C-c

... gdb appears to be hung ...

bash2$ gdb gdb
(top-gdb) attach 14050
Attaching to program: ...
...
0x00002add9e70d5e4 in do_sigsuspend (set=0xbdf380 <suspend_mask>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
63        return INLINE_SYSCALL (rt_sigsuspend, 2, CHECK_SIGSET (set), _NSIG / 8);
(top-gdb) bt
#0  0x00002add9e70d5e4 in do_sigsuspend (set=0xbdf380 <suspend_mask>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#1  __GI___sigsuspend (
During symbol reading, Child DIE 0x116692 and its abstract origin 0x1202c0 have different tags.
During symbol reading, Child DIE 0x116692 and its abstract origin 0x1202c0 have different parents.
During symbol reading, Multiple children of DIE 0x118ae5 refer to DIE 0x1156e3 as their abstract origin.
During symbol reading, DW_AT_GNU_call_site_target target DIE has invalid low pc, for referencing DIE 0x118e57 [in module /g3/gnu/sourceware/main-gdb-git/b-maste\
r/obj64/gdb/gdb].
set=<optimized out>, set@entry=0xbdf380 <suspend_mask>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:78
#2  0x00000000004b95ea in wait_lwp (lp=lp@entry=0x227aec0)
    at ../../binutils-gdb/gdb/linux-nat.c:2314
#3  0x00000000004bb315 in stop_wait_callback (lp=0x227aec0,
    data=<optimized out>) at ../../binutils-gdb/gdb/linux-nat.c:2591
#4  0x00000000004ba1c8 in iterate_over_lwps (filter=...,
    callback=callback@entry=0x4bb2c0 <stop_wait_callback>, data=data@entry=0x0)
    at ../../binutils-gdb/gdb/linux-nat.c:1041
#5  0x00000000004bcc21 in linux_nat_wait_1 (ops=<optimized out>,
    target_options=1, ourstatus=0x7fffaeeadeb0, ptid=...)
    at ../../binutils-gdb/gdb/linux-nat.c:3530
#6  linux_nat_wait (ops=<optimized out>, ptid=..., ourstatus=0x7fffaeeadeb0,
    target_options=1) at ../../binutils-gdb/gdb/linux-nat.c:3680
#7  0x00000000004c3364 in thread_db_wait (
During symbol reading, Only single DW_OP_reg or DW_OP_fbreg is supported for DW_FORM_block* DW_AT_location is supported for DW_TAG_GNU_call_site child DIE 0x74b\
462 [in module /g3/gnu/sourceware/main-gdb-git/b-master/obj64/gdb/gdb].
ops=<optimized out>, ptid=..., ourstatus=0x7fffaeeadeb0, options=1)
    at ../../binutils-gdb/gdb/linux-thread-db.c:1489
#8  0x00000000005e9cae in delegate_wait (self=<optimized out>, arg1=...,
    arg2=<optimized out>, arg3=<optimized out>)
    at ../../binutils-gdb/gdb/target-delegates.c:116
#9  0x00000000005f7425 in target_wait (ptid=..., status=<optimized out>,
    options=<optimized out>) at ../../binutils-gdb/gdb/target.c:2057
#10 0x00000000005b9f4d in fetch_inferior_event (
    client_data=client_data@entry=0x0) at ../../binutils-gdb/gdb/infrun.c:2881
#11 0x00000000005d0453 in inferior_event_handler (event_type=INF_REG_EVENT,
    client_data=0x0) at ../../binutils-gdb/gdb/inf-loop.c:58
#12 0x00000000005ce733 in process_event ()
    at ../../binutils-gdb/gdb/event-loop.c:340
#13 0x00000000005cea4a in gdb_do_one_event ()
    at ../../binutils-gdb/gdb/event-loop.c:392
#14 0x00000000005ceca7 in start_event_loop ()
    at ../../binutils-gdb/gdb/event-loop.c:429
#15 0x00000000005c8283 in captured_command_loop (data=data@entry=0x0)
    at ../../binutils-gdb/gdb/main.c:322
#16 0x00000000005c598a in catch_errors (
    func=func@entry=0x5c8270 <captured_command_loop>,
    func_args=func_args@entry=0x0, errstring=errstring@entry=0x8cb5de "",
    mask=mask@entry=RETURN_MASK_ALL) at ../../binutils-gdb/gdb/exceptions.c:514
#17 0x00000000005c91a6 in captured_main (data=data@entry=0x7fffaeeae1c0)
    at ../../binutils-gdb/gdb/main.c:1178
#18 0x00000000005c598a in catch_errors (
    func=func@entry=0x5c8770 <captured_main>,
    func_args=func_args@entry=0x7fffaeeae1c0,
    errstring=errstring@entry=0x8cb5de "", mask=mask@entry=RETURN_MASK_ALL)
    at ../../binutils-gdb/gdb/exceptions.c:514
#19 0x00000000005c968b in gdb_main (args=args@entry=0x7fffaeeae1c0)
    at ../../binutils-gdb/gdb/main.c:1186
#20 0x00000000004740a5 in main (argc=<optimized out>, argv=<optimized out>)
    at ../../binutils-gdb/gdb/gdb.c:32
(top-gdb) 

... same as 17247.
Comment 1 Doug Evans 2014-08-31 02:39:51 UTC
Setting aside the configure.ac changes, here's a prototype patch to forward SIGCHLD to gdb's thread.

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 1e8991d..af0cfe2 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -24,9 +24,9 @@
 #include "nat/linux-nat.h"
 #include "nat/linux-waitpid.h"
 #include "gdb_wait.h"
-#ifdef HAVE_TKILL_SYSCALL
-#include <unistd.h>
-#include <sys/syscall.h>
+#if defined (HAVE_TKILL_SYSCALL) || defined (HAVE_GETTID_SYSCALL)
+#include <unistd.h>            /* for syscall */
+#include <sys/syscall.h>       /* for syscall numbers */
 #endif
 #include <sys/ptrace.h>
 #include "linux-nat.h"
@@ -171,6 +171,11 @@ blocked.  */
 #define O_LARGEFILE 0
 #endif

+#ifdef HAVE_GETTID_SYSCALL
+/* The thread id (result of gettid) for gdb.  */
+static pid_t gdb_tid;
+#endif
+
 /* The single-threaded native GNU/Linux target_ops.  We save a pointer for
    the use of the multi-threaded target.  */
 static struct target_ops *linux_ops;
@@ -4664,6 +4673,41 @@ sigchld_handler (int signo)
 {
   int old_errno = errno;

+#ifdef HAVE_GETTID_SYSCALL
+  /* If the user starts other threads (e.g. via python) and if the user
+     doesn't remember to block SIGCHLD in those threads, then that thread
+     may receive the SIGCHLD instead of GDB.  This is bad because gdb will
+     then hang in sigsuspend waiting for the SIGCHLD.  PRs 17247, 17314.
+     If we get a SIGCHLD in a different thread, forward the SIGCHLD onto
+     GDB's thread (if we can).  */     
+  {
+    pid_t this_tid = syscall (__NR_gettid);
+
+    /* IWBN to print an error message here if gettid fails.
+       However, it shouldn't ever fail, and if it does it's possible we'll
+       flood the terminal with failure messages.  Since we're in async code
+       it's harder to clip the output after a certain number of failures,
+       so instead just punt.  */
+#ifdef HAVE_TKILL_SYSCALL
+    if (this_tid > 0 && this_tid != gdb_tid)
+      {
+       if (debug_linux_nat)
+         {
+           static const char forwarding_msg[]
+             = "sigchld: forwarding signal to gdb's thread\n";
+
+           ui_file_write_async_safe (gdb_stdlog, forwarding_msg,
+                                     strlen (forwarding_msg));
+         }
+
+       kill_lwp (gdb_tid, SIGCHLD);
+       errno = old_errno;
+       return;
+      }
+#endif
+  }
+#endif
+
   if (debug_linux_nat)
     ui_file_write_async_safe (gdb_stdlog,
                              "sigchld\n", sizeof ("sigchld\n") - 1);
@@ -5013,6 +5057,12 @@ extern initialize_file_ftype _initialize_linux_nat;
 void
 _initialize_linux_nat (void)
 {
+#ifdef HAVE_GETTID_SYSCALL
+  /* Remember gdb's thread id so that we can perform a sanity check in
+     sigchld_handler.  */
+  gdb_tid = syscall (__NR_gettid);
+#endif
+
   add_setshow_zuinteger_cmd ("lin-lwp", class_maintenance,
                             &debug_linux_nat, _("\
 Set debugging of GNU/Linux lwp module."), _("\
Comment 2 cvs-commit@gcc.gnu.org 2014-09-13 22:55:23 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  3714cea7d44e2bd9cac28a61d45fcc8c3cc6ae83 (commit)
      from  1a8bdf69e7d1f6084af503407d6a4217b06d6ea5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3714cea7d44e2bd9cac28a61d45fcc8c3cc6ae83

commit 3714cea7d44e2bd9cac28a61d45fcc8c3cc6ae83
Author: Doug Evans <xdje42@gmail.com>
Date:   Sat Sep 13 15:52:15 2014 -0700

    Pass plain-text prompt to with_gdb_prompt.
    
    I had occasion to use with_gdb_prompt in a test for the patch for PR 17314
    and was passing the plain text prompt as the value, "(top-gdb)",
    instead of a regexp, "\(top-gdb\)" (expressed as "\\(top-gdb\\)" in TCL).
    
    I then discovered that in order to restore the prompt gdb passes the
    original value of $gdb_prompt to "set prompt", which works because
    "set prompt \(gdb\) " is equivalent to "set prompt (gdb) ".
    Perhaps I'm being overly cautious but this feels a bit subtle,
    but at any rate as an API choice I'd much rather pass the plain text
    form to with_gdb_prompt.
    
    I also discovered that the initial value of gdb_prompt is set in
    two places to two different values.
    At the global level gdb.exp sets it to "\[(\]gdb\[)\]"
    and default_gdb_init sets it to "\\(gdb\\)".
    The former form is undesirable as an argument to "set prompt",
    but it's not clear to me that just deleting this code won't break
    anything.  Thus I just changed the value to be consistent and added
    a comment.
    
    gdb/testsuite/ChangeLog:
    
    	* lib/gdb.exp (gdb_prompt): Add comment and change initial value to
    	be consistent with what default_gdb_init uses.
    	(with_gdb_prompt): Change form of PROMPT argument from a regexp to
    	the plain text of the prompt.  Add some logging printfs.
    	* gdb.perf/disassemble.exp: Update call to with_gdb_prompt.

-----------------------------------------------------------------------

Summary of changes:
 gdb/testsuite/ChangeLog                |    8 ++++++++
 gdb/testsuite/gdb.perf/disassemble.exp |    2 +-
 gdb/testsuite/lib/gdb.exp              |   31 +++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 3 deletions(-)