Bug 30680 - [gdb/dap] ThreadSanitizer: data race gdbsupport/event-pipe.h:44 in event_pipe::is_open() const
Summary: [gdb/dap] ThreadSanitizer: data race gdbsupport/event-pipe.h:44 in event_pipe...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: dap (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 14.1
Assignee: Tom Tromey
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-25 10:33 UTC by Tom de Vries
Modified: 2023-07-31 12:38 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
gdb.log (39.37 KB, text/x-log)
2023-07-25 10:33 UTC, Tom de Vries
Details
LOG.gz (7.68 MB, application/gzip)
2023-07-26 16:57 UTC, Tom de Vries
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom de Vries 2023-07-25 10:33:22 UTC
Created attachment 15010 [details]
gdb.log

When doing a gdb build with gcc 13.1.1 and with -O2 -fno-hoist-adjacent-loads to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799, I run into:
...
WARNING: ThreadSanitizer: data race (pid=22165)
Read of size 4 at 0x00000261ca40 by thread T13:
#0 event_pipe::is_open() const /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 (gdb+0x92f381) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
  ...
Previous write of size 4 at 0x00000261ca40 by main thread:
#0 event_pipe::close_pipe() /data/vries/gdb/src/gdbsupport/event-pipe.cc:62 (gdb+0x1610e2d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
...

event-pipe.h:44
...
    bool is_open () const
44  { return m_fds[0] != -1; } 
...

event-pipe.cc:62:
...
    void
    event_pipe::close_pipe ()
    {
      ::close (m_fds[0]);
      ::close (m_fds[1]);
62    m_fds[0] = -1;
      m_fds[1] = -1;
    }
...
Comment 1 Tom de Vries 2023-07-25 10:46:35 UTC
A bit more elaborate.

WARNING: ThreadSanitizer: data race (pid=22165)
  Read of size 4 at 0x00000261ca40 by thread T13:
    #0 event_pipe::is_open() const /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 (gdb+0x92f381) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #1 inf_ptrace_target::async_file_mark_if_open() /data/vries/gdb/src/gdb/inf-ptrace.h:73 (gdb+0x92f381)
    #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381)
    #3 <null> <null> (libtsan.so.2+0x3e2cc) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd)
    #4 _Py_read <null> (libpython3.11.so.1.0+0x2590e9) (BuildId: e08746d8f4494e69c6a275f48c30104fb13e6d81)

  Previous write of size 4 at 0x00000261ca40 by main thread:
    #0 event_pipe::close_pipe() /data/vries/gdb/src/gdbsupport/event-pipe.cc:62 (gdb+0x1610e2d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #1 inf_ptrace_target::async_file_close() /data/vries/gdb/src/gdb/inf-ptrace.h:82 (gdb+0x93033c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #2 linux_nat_target::async(bool) /data/vries/gdb/src/gdb/linux-nat.c:4279 (gdb+0x93033c)
    #3 target_ops::async(bool) /data/vries/gdb/src/gdb/target-delegates.c:2161 (gdb+0xce9b04) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #4 target_async(bool) /data/vries/gdb/src/gdb/target.c:4338 (gdb+0xcf987d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #5 inferior_event_handler(inferior_event_type) /data/vries/gdb/src/gdb/inf-loop.c:52 (gdb+0x8b761a) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #6 fetch_inferior_event() /data/vries/gdb/src/gdb/infrun.c:4501 (gdb+0x8f564e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #7 inferior_event_handler(inferior_event_type) /data/vries/gdb/src/gdb/inf-loop.c:42 (gdb+0x8b758c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #8 infrun_async_inferior_event_handler /data/vries/gdb/src/gdb/infrun.c:9820 (gdb+0x8d4fc7) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #9 check_async_event_handlers() /data/vries/gdb/src/gdb/async-event.c:338 (gdb+0x5a640f) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #10 gdb_do_one_event(int) /data/vries/gdb/src/gdbsupport/event-loop.cc:221 (gdb+0x1600024) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #11 wait_sync_command_done() /data/vries/gdb/src/gdb/top.c:426 (gdb+0xd1874b) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #12 maybe_wait_sync_command_done(int) /data/vries/gdb/src/gdb/top.c:443 (gdb+0xd1cb4c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #13 execute_command(char const*, int) /data/vries/gdb/src/gdb/top.c:576 (gdb+0xd1cb4c)
    #14 execute_control_command_1 /data/vries/gdb/src/gdb/cli/cli-script.c:529 (gdb+0x675883) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #15 execute_control_command(command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:701 (gdb+0x675d0e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #16 execute_control_commands(command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:411 (gdb+0x675e59) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #17 operator() /data/vries/gdb/src/gdb/cli/cli-script.c:432 (gdb+0x675ef7) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #18 __invoke_impl<void, execute_control_commands_to_string(command_line*, int)::<lambda()>&> /usr/include/c++/13/bits/invoke.h:61 (gdb+0x675ef7)
    #19 __invoke_r<void, execute_control_commands_to_string(command_line*, int)::<lambda()>&> /usr/include/c++/13/bits/invoke.h:111 (gdb+0x675ef7)
WROTE: <<<{"type": "event", "event": "output", "body": {"category": "stdout", "output": "    #20 _M_invoke /usr/include/c++/13/bits/std_function.h:290 (gdb+0x675ef7)
    #21 std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 (gdb+0xd189c1) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #22 execute_fn_to_ui_file(ui_file*, std::function<void ()>) /data/vries/gdb/src/gdb/top.c:630 (gdb+0xd189c1)
    #23 execute_fn_to_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::function<void ()>, bool) /data/vries/gdb/src/gdb/top.c:644 (gdb+0xd18bab) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #24 execute_control_commands_to_string[abi:cxx11](command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:430 (gdb+0x672034) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #25 execute_gdb_command /data/vries/gdb/src/gdb/python/python.c:690 (gdb+0xb1176e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #26 <null> <null> (libpython3.11.so.1.0+0x1cd27c) (BuildId: e08746d8f4494e69c6a275f48c30104fb13e6d81)
    #27 std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 (gdb+0xbc8db2) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #28 run_events /data/vries/gdb/src/gdb/run-on-main-thread.c:76 (gdb+0xbc8db2)
    #29 handle_file_event /data/vries/gdb/src/gdbsupport/event-loop.cc:573 (gdb+0x15fe861) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #30 gdb_wait_for_event /data/vries/gdb/src/gdbsupport/event-loop.cc:694 (gdb+0x15feead) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #31 gdb_do_one_event(int) /data/vries/gdb/src/gdbsupport/event-loop.cc:264 (gdb+0x15fff9a) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #32 start_event_loop /data/vries/gdb/src/gdb/main.c:412 (gdb+0x98ea81) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #33 captured_command_loop /data/vries/gdb/src/gdb/main.c:476 (gdb+0x98ea81)
    #34 captured_main /data/vries/gdb/src/gdb/main.c:1320 (gdb+0x992784) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)
    #35 gdb_main(captured_main_args*) /data/vries/gdb/src/gdb/main.c:1339 (gdb+0x992784)
    #36 main /data/vries/gdb/src/gdb/gdb.c:32 (gdb+0x47ab01) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898)

  Location is global 'inf_ptrace_target::m_event_pipe' of size 8 at 0x00000261ca40 (gdb+0x261ca40)

SUMMARY: ThreadSanitizer: data race /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 in event_pipe::is_open() const"
Comment 2 Tom de Vries 2023-07-25 11:10:10 UTC
Doesn't reproduce all the time.
Comment 3 Tom de Vries 2023-07-25 11:51:52 UTC
Also reproduces with -O0, reproduced 4 out of 25 times.
Comment 4 Tom Tromey 2023-07-25 12:36:39 UTC
>     #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381)

This is odd because all the DAP-related threads should be started
with SIGCHLD blocked.
Comment 5 Tom de Vries 2023-07-26 12:12:10 UTC
(In reply to Tom de Vries from comment #3)
> Also reproduces with -O0, reproduced 4 out of 25 times.

That was with gdb 13.1.1 on openSUSE Tumbleweed.

Also reproduces on openSUSE Leap 15.4, using gcc 7.5.0, reproduced 7 out of 25 times.
Comment 6 Tom de Vries 2023-07-26 16:23:55 UTC
(In reply to Tom Tromey from comment #4)
> >     #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381)
> 
> This is odd because all the DAP-related threads should be started
> with SIGCHLD blocked.

I've written an assert to try to catch this situation without tsan:
...
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 250a8f43282..d6d0ba0f415 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -69,6 +69,7 @@
 #include "gdbsupport/scope-exit.h"
 #include "gdbsupport/gdb-sigmask.h"
 #include "gdbsupport/common-debug.h"
+#include "run-on-main-thread.h"
 #include <unordered_map>
 
 /* This comment documents high-level logic of this file.
@@ -4227,6 +4228,7 @@ linux_nat_target::supports_disable_randomization ()
 static void
 sigchld_handler (int signo)
 {
+  gdb_assert (is_main_thread ());
   int old_errno = errno;
 
   if (debug_linux_nat)
diff --git a/gdb/run-on-main-thread.c b/gdb/run-on-main-thread.c
index c7d9de0afc8..f9388e33633 100644
--- a/gdb/run-on-main-thread.c
+++ b/gdb/run-on-main-thread.c
@@ -94,12 +94,27 @@ run_on_main_thread (std::function<void ()> &&func)
   serial_event_set (runnable_event);
 }
 
+#if CXX_STD_THREAD
+static void
+initialize_main_thread_var (void)
+{
+  static bool initialized = false;
+
+  if (initialized)
+    return;
+  initialized = true;
+
+  main_thread = std::this_thread::get_id ();
+}
+#endif
+
 /* See run-on-main-thread.h.  */
 
 bool
 is_main_thread ()
 {
 #if CXX_STD_THREAD
+  initialize_main_thread_var ();
   return std::this_thread::get_id () == main_thread;
 #else--
   return true;
@@ -111,7 +126,7 @@ void
 _initialize_run_on_main_thread ()
 {
 #if CXX_STD_THREAD
-  main_thread = std::this_thread::get_id ();
+  initialize_main_thread_var ();
 #endif
   runnable_event = make_serial_event ();
   add_file_handler (serial_event_fd (runnable_event), run_events, nullptr,
...
and it does trigger.
Comment 7 Tom de Vries 2023-07-26 16:57:37 UTC
Created attachment 15014 [details]
LOG.gz

(In reply to Tom de Vries from comment #6)
> and it does trigger.

In a run of 200, 12 times.

There are problems in other test-cases as well:
...
$ grep ERROR:.*sourcing LOG | sort | uniq -c
      3 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/ada-scopes.exp.
      6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/args-env.exp.
     87 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/basic-dap.exp.
      6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/bt-nodebug.exp.
     18 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/catch-exception.exp.
     60 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/cond-bp.exp.
     24 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/hover.exp.
      6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/log-message.exp.
     36 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/modules.exp.
      3 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/scopes.exp.
     12 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/stop-at-main.exp.
...
Comment 8 Tom de Vries 2023-07-26 16:59:18 UTC
(In reply to Tom de Vries from comment #7)
> Created attachment 15014 [details]
> LOG.gz

Also occurring:
...
"type": "event", "event": "output", "body": {"category": "stdout", "output": "Recursive internal problem.\n"}, "seq": 24}Content-Length: 129^M
^M
{"type": "event", "event": "output", "body": {"category": "stdout", "output": "\u0000corrupted double-linked list\n"}
...
Comment 9 Tom de Vries 2023-07-28 11:01:55 UTC
(In reply to Tom de Vries from comment #6)
> diff --git a/gdb/run-on-main-thread.c b/gdb/run-on-main-thread.c
> index c7d9de0afc8..f9388e33633 100644
> --- a/gdb/run-on-main-thread.c
> +++ b/gdb/run-on-main-thread.c
> @@ -94,12 +94,27 @@ run_on_main_thread (std::function<void ()> &&func)
>    serial_event_set (runnable_event);
>  }
>  
> +#if CXX_STD_THREAD
> +static void
> +initialize_main_thread_var (void)
> +{
> +  static bool initialized = false;
> +
> +  if (initialized)
> +    return;
> +  initialized = true;
> +
> +  main_thread = std::this_thread::get_id ();
> +}
> +#endif
> +
>  /* See run-on-main-thread.h.  */
>  
>  bool
>  is_main_thread ()
>  {
>  #if CXX_STD_THREAD
> +  initialize_main_thread_var ();
>    return std::this_thread::get_id () == main_thread;
>  #else--
>    return true;
> @@ -111,7 +126,7 @@ void
>  _initialize_run_on_main_thread ()
>  {
>  #if CXX_STD_THREAD
> -  main_thread = std::this_thread::get_id ();
> +  initialize_main_thread_var ();
>  #endif
>    runnable_event = make_serial_event ();
>    add_file_handler (serial_event_fd (runnable_event), run_events, nullptr,
> ...

Submitted this bit here ( https://sourceware.org/pipermail/gdb-patches/2023-July/201190.html ).
Comment 10 Tom Tromey 2023-07-28 16:04:38 UTC
I definitely (but intermittently) see the failure with the assert patch applied.
I'm not much closer to finding the problem, though.
I think the challenge is finding out what bit of code might be
unblocking SIGCHLD on this thread.
Comment 11 Tom de Vries 2023-07-28 16:20:47 UTC
(In reply to Tom Tromey from comment #10)
> I definitely (but intermittently) see the failure with the assert patch
> applied.
> I'm not much closer to finding the problem, though.
> I think the challenge is finding out what bit of code might be
> unblocking SIGCHLD on this thread.

I set a breakpoint on pthread_create, and ran a basic dap test.  For the dap threads, there's a backtrace:
...
#24 0x00007ffff7401b7e in _PyObject_FastCallDict (func=<function at remote 0x7fffec10c1e0>, args=0x0, 
    nargs=<optimized out>, kwargs=0x0) at Objects/abstract.c:2310
#25 0x0000000000a353ea in dap_interp::init (this=0x2bfdac0, top_level=true)
    at /data/vries/gdb/src/gdb/python/py-dap.c:82
#26 0x000000000089ef02 in interp_set (interp=0x2bfdac0, top_level=true) at /data/vries/gdb/src/gdb/interps.c:140
#27 0x000000000089f197 in set_top_level_interpreter (name=0x286de50 <interpreter_p[abi:cxx11]+16> "dap")
    at /data/vries/gdb/src/gdb/interps.c:209
#28 0x000000000091ac77 in captured_main_1 (context=0x7fffffffdaa0) at /data/vries/gdb/src/gdb/main.c:1139
...
so I tried to wrap the code around the py-dap.c:82:
...
  gdbpy_ref<> result_obj (PyObject_CallObject (func.get (), nullptr));
  if (result_obj == nullptr)
    gdbpy_handle_exception ();
...
in the gdb::block_signals wrapper, but that didn't fix it either.
Comment 12 Tom Tromey 2023-07-28 17:58:02 UTC
Found it.
Comment 14 Sourceware Commits 2023-07-31 12:37:03 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=30c01bb104e4dfc50e8cf3d0312bbf7d22f73d10

commit 30c01bb104e4dfc50e8cf3d0312bbf7d22f73d10
Author: Tom Tromey <tromey@adacore.com>
Date:   Fri Jul 28 12:02:38 2023 -0600

    Restore previous sigmask in gdb.block_signals
    
    Tom de Vries found a bug where, sometimes, a SIGCHLD would be
    delivered to a non-main thread, wreaking havoc.
    
    The problem is that gdb.block_signals after first blocking a set of
    signals, then unblocked the same set rather than restoring the initial
    situation.  This function being called from the DAP thread lead to
    SIGCHLD being unblocked there.
    
    This patch fixes the problem by restoring the previous set of signals
    instead.
    
    Tested-by: Tom de Vries <tdevries@suse.de>
    Reviewed-By: Tom de Vries <tdevries@suse.de>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30680
Comment 15 Tom Tromey 2023-07-31 12:38:35 UTC
Fixed.