Created attachment 15010 [details] gdb.log When doing a gdb build with gcc 13.1.1 and with -O2 -fno-hoist-adjacent-loads to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799, I run into: ... WARNING: ThreadSanitizer: data race (pid=22165) Read of size 4 at 0x00000261ca40 by thread T13: #0 event_pipe::is_open() const /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 (gdb+0x92f381) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) ... Previous write of size 4 at 0x00000261ca40 by main thread: #0 event_pipe::close_pipe() /data/vries/gdb/src/gdbsupport/event-pipe.cc:62 (gdb+0x1610e2d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) ... event-pipe.h:44 ... bool is_open () const 44 { return m_fds[0] != -1; } ... event-pipe.cc:62: ... void event_pipe::close_pipe () { ::close (m_fds[0]); ::close (m_fds[1]); 62 m_fds[0] = -1; m_fds[1] = -1; } ...
A bit more elaborate. WARNING: ThreadSanitizer: data race (pid=22165) Read of size 4 at 0x00000261ca40 by thread T13: #0 event_pipe::is_open() const /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 (gdb+0x92f381) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #1 inf_ptrace_target::async_file_mark_if_open() /data/vries/gdb/src/gdb/inf-ptrace.h:73 (gdb+0x92f381) #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381) #3 <null> <null> (libtsan.so.2+0x3e2cc) (BuildId: 8f2a9be581a0fcb3d7109755a6067408093b9dbd) #4 _Py_read <null> (libpython3.11.so.1.0+0x2590e9) (BuildId: e08746d8f4494e69c6a275f48c30104fb13e6d81) Previous write of size 4 at 0x00000261ca40 by main thread: #0 event_pipe::close_pipe() /data/vries/gdb/src/gdbsupport/event-pipe.cc:62 (gdb+0x1610e2d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #1 inf_ptrace_target::async_file_close() /data/vries/gdb/src/gdb/inf-ptrace.h:82 (gdb+0x93033c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #2 linux_nat_target::async(bool) /data/vries/gdb/src/gdb/linux-nat.c:4279 (gdb+0x93033c) #3 target_ops::async(bool) /data/vries/gdb/src/gdb/target-delegates.c:2161 (gdb+0xce9b04) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #4 target_async(bool) /data/vries/gdb/src/gdb/target.c:4338 (gdb+0xcf987d) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #5 inferior_event_handler(inferior_event_type) /data/vries/gdb/src/gdb/inf-loop.c:52 (gdb+0x8b761a) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #6 fetch_inferior_event() /data/vries/gdb/src/gdb/infrun.c:4501 (gdb+0x8f564e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #7 inferior_event_handler(inferior_event_type) /data/vries/gdb/src/gdb/inf-loop.c:42 (gdb+0x8b758c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #8 infrun_async_inferior_event_handler /data/vries/gdb/src/gdb/infrun.c:9820 (gdb+0x8d4fc7) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #9 check_async_event_handlers() /data/vries/gdb/src/gdb/async-event.c:338 (gdb+0x5a640f) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #10 gdb_do_one_event(int) /data/vries/gdb/src/gdbsupport/event-loop.cc:221 (gdb+0x1600024) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #11 wait_sync_command_done() /data/vries/gdb/src/gdb/top.c:426 (gdb+0xd1874b) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #12 maybe_wait_sync_command_done(int) /data/vries/gdb/src/gdb/top.c:443 (gdb+0xd1cb4c) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #13 execute_command(char const*, int) /data/vries/gdb/src/gdb/top.c:576 (gdb+0xd1cb4c) #14 execute_control_command_1 /data/vries/gdb/src/gdb/cli/cli-script.c:529 (gdb+0x675883) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #15 execute_control_command(command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:701 (gdb+0x675d0e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #16 execute_control_commands(command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:411 (gdb+0x675e59) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #17 operator() /data/vries/gdb/src/gdb/cli/cli-script.c:432 (gdb+0x675ef7) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #18 __invoke_impl<void, execute_control_commands_to_string(command_line*, int)::<lambda()>&> /usr/include/c++/13/bits/invoke.h:61 (gdb+0x675ef7) #19 __invoke_r<void, execute_control_commands_to_string(command_line*, int)::<lambda()>&> /usr/include/c++/13/bits/invoke.h:111 (gdb+0x675ef7) WROTE: <<<{"type": "event", "event": "output", "body": {"category": "stdout", "output": " #20 _M_invoke /usr/include/c++/13/bits/std_function.h:290 (gdb+0x675ef7) #21 std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 (gdb+0xd189c1) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #22 execute_fn_to_ui_file(ui_file*, std::function<void ()>) /data/vries/gdb/src/gdb/top.c:630 (gdb+0xd189c1) #23 execute_fn_to_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::function<void ()>, bool) /data/vries/gdb/src/gdb/top.c:644 (gdb+0xd18bab) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #24 execute_control_commands_to_string[abi:cxx11](command_line*, int) /data/vries/gdb/src/gdb/cli/cli-script.c:430 (gdb+0x672034) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #25 execute_gdb_command /data/vries/gdb/src/gdb/python/python.c:690 (gdb+0xb1176e) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #26 <null> <null> (libpython3.11.so.1.0+0x1cd27c) (BuildId: e08746d8f4494e69c6a275f48c30104fb13e6d81) #27 std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 (gdb+0xbc8db2) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #28 run_events /data/vries/gdb/src/gdb/run-on-main-thread.c:76 (gdb+0xbc8db2) #29 handle_file_event /data/vries/gdb/src/gdbsupport/event-loop.cc:573 (gdb+0x15fe861) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #30 gdb_wait_for_event /data/vries/gdb/src/gdbsupport/event-loop.cc:694 (gdb+0x15feead) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #31 gdb_do_one_event(int) /data/vries/gdb/src/gdbsupport/event-loop.cc:264 (gdb+0x15fff9a) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #32 start_event_loop /data/vries/gdb/src/gdb/main.c:412 (gdb+0x98ea81) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #33 captured_command_loop /data/vries/gdb/src/gdb/main.c:476 (gdb+0x98ea81) #34 captured_main /data/vries/gdb/src/gdb/main.c:1320 (gdb+0x992784) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) #35 gdb_main(captured_main_args*) /data/vries/gdb/src/gdb/main.c:1339 (gdb+0x992784) #36 main /data/vries/gdb/src/gdb/gdb.c:32 (gdb+0x47ab01) (BuildId: c3b5cbc2b30d2ab9521ca0586222e0001ad66898) Location is global 'inf_ptrace_target::m_event_pipe' of size 8 at 0x00000261ca40 (gdb+0x261ca40) SUMMARY: ThreadSanitizer: data race /data/vries/gdb/src/gdb/../gdbsupport/event-pipe.h:44 in event_pipe::is_open() const"
Doesn't reproduce all the time.
Also reproduces with -O0, reproduced 4 out of 25 times.
> #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381) This is odd because all the DAP-related threads should be started with SIGCHLD blocked.
(In reply to Tom de Vries from comment #3) > Also reproduces with -O0, reproduced 4 out of 25 times. That was with gdb 13.1.1 on openSUSE Tumbleweed. Also reproduces on openSUSE Leap 15.4, using gcc 7.5.0, reproduced 7 out of 25 times.
(In reply to Tom Tromey from comment #4) > > #2 sigchld_handler /data/vries/gdb/src/gdb/linux-nat.c:4238 (gdb+0x92f381) > > This is odd because all the DAP-related threads should be started > with SIGCHLD blocked. I've written an assert to try to catch this situation without tsan: ... diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index 250a8f43282..d6d0ba0f415 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -69,6 +69,7 @@ #include "gdbsupport/scope-exit.h" #include "gdbsupport/gdb-sigmask.h" #include "gdbsupport/common-debug.h" +#include "run-on-main-thread.h" #include <unordered_map> /* This comment documents high-level logic of this file. @@ -4227,6 +4228,7 @@ linux_nat_target::supports_disable_randomization () static void sigchld_handler (int signo) { + gdb_assert (is_main_thread ()); int old_errno = errno; if (debug_linux_nat) diff --git a/gdb/run-on-main-thread.c b/gdb/run-on-main-thread.c index c7d9de0afc8..f9388e33633 100644 --- a/gdb/run-on-main-thread.c +++ b/gdb/run-on-main-thread.c @@ -94,12 +94,27 @@ run_on_main_thread (std::function<void ()> &&func) serial_event_set (runnable_event); } +#if CXX_STD_THREAD +static void +initialize_main_thread_var (void) +{ + static bool initialized = false; + + if (initialized) + return; + initialized = true; + + main_thread = std::this_thread::get_id (); +} +#endif + /* See run-on-main-thread.h. */ bool is_main_thread () { #if CXX_STD_THREAD + initialize_main_thread_var (); return std::this_thread::get_id () == main_thread; #else-- return true; @@ -111,7 +126,7 @@ void _initialize_run_on_main_thread () { #if CXX_STD_THREAD - main_thread = std::this_thread::get_id (); + initialize_main_thread_var (); #endif runnable_event = make_serial_event (); add_file_handler (serial_event_fd (runnable_event), run_events, nullptr, ... and it does trigger.
Created attachment 15014 [details] LOG.gz (In reply to Tom de Vries from comment #6) > and it does trigger. In a run of 200, 12 times. There are problems in other test-cases as well: ... $ grep ERROR:.*sourcing LOG | sort | uniq -c 3 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/ada-scopes.exp. 6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/args-env.exp. 87 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/basic-dap.exp. 6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/bt-nodebug.exp. 18 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/catch-exception.exp. 60 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/cond-bp.exp. 24 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/hover.exp. 6 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/log-message.exp. 36 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/modules.exp. 3 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/scopes.exp. 12 ERROR: tcl error sourcing /data/vries/gdb/src/gdb/testsuite/gdb.dap/stop-at-main.exp. ...
(In reply to Tom de Vries from comment #7) > Created attachment 15014 [details] > LOG.gz Also occurring: ... "type": "event", "event": "output", "body": {"category": "stdout", "output": "Recursive internal problem.\n"}, "seq": 24}Content-Length: 129^M ^M {"type": "event", "event": "output", "body": {"category": "stdout", "output": "\u0000corrupted double-linked list\n"} ...
(In reply to Tom de Vries from comment #6) > diff --git a/gdb/run-on-main-thread.c b/gdb/run-on-main-thread.c > index c7d9de0afc8..f9388e33633 100644 > --- a/gdb/run-on-main-thread.c > +++ b/gdb/run-on-main-thread.c > @@ -94,12 +94,27 @@ run_on_main_thread (std::function<void ()> &&func) > serial_event_set (runnable_event); > } > > +#if CXX_STD_THREAD > +static void > +initialize_main_thread_var (void) > +{ > + static bool initialized = false; > + > + if (initialized) > + return; > + initialized = true; > + > + main_thread = std::this_thread::get_id (); > +} > +#endif > + > /* See run-on-main-thread.h. */ > > bool > is_main_thread () > { > #if CXX_STD_THREAD > + initialize_main_thread_var (); > return std::this_thread::get_id () == main_thread; > #else-- > return true; > @@ -111,7 +126,7 @@ void > _initialize_run_on_main_thread () > { > #if CXX_STD_THREAD > - main_thread = std::this_thread::get_id (); > + initialize_main_thread_var (); > #endif > runnable_event = make_serial_event (); > add_file_handler (serial_event_fd (runnable_event), run_events, nullptr, > ... Submitted this bit here ( https://sourceware.org/pipermail/gdb-patches/2023-July/201190.html ).
I definitely (but intermittently) see the failure with the assert patch applied. I'm not much closer to finding the problem, though. I think the challenge is finding out what bit of code might be unblocking SIGCHLD on this thread.
(In reply to Tom Tromey from comment #10) > I definitely (but intermittently) see the failure with the assert patch > applied. > I'm not much closer to finding the problem, though. > I think the challenge is finding out what bit of code might be > unblocking SIGCHLD on this thread. I set a breakpoint on pthread_create, and ran a basic dap test. For the dap threads, there's a backtrace: ... #24 0x00007ffff7401b7e in _PyObject_FastCallDict (func=<function at remote 0x7fffec10c1e0>, args=0x0, nargs=<optimized out>, kwargs=0x0) at Objects/abstract.c:2310 #25 0x0000000000a353ea in dap_interp::init (this=0x2bfdac0, top_level=true) at /data/vries/gdb/src/gdb/python/py-dap.c:82 #26 0x000000000089ef02 in interp_set (interp=0x2bfdac0, top_level=true) at /data/vries/gdb/src/gdb/interps.c:140 #27 0x000000000089f197 in set_top_level_interpreter (name=0x286de50 <interpreter_p[abi:cxx11]+16> "dap") at /data/vries/gdb/src/gdb/interps.c:209 #28 0x000000000091ac77 in captured_main_1 (context=0x7fffffffdaa0) at /data/vries/gdb/src/gdb/main.c:1139 ... so I tried to wrap the code around the py-dap.c:82: ... gdbpy_ref<> result_obj (PyObject_CallObject (func.get (), nullptr)); if (result_obj == nullptr) gdbpy_handle_exception (); ... in the gdb::block_signals wrapper, but that didn't fix it either.
Found it.
https://sourceware.org/pipermail/gdb-patches/2023-July/201215.html
The master branch has been updated by Tom Tromey <tromey@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=30c01bb104e4dfc50e8cf3d0312bbf7d22f73d10 commit 30c01bb104e4dfc50e8cf3d0312bbf7d22f73d10 Author: Tom Tromey <tromey@adacore.com> Date: Fri Jul 28 12:02:38 2023 -0600 Restore previous sigmask in gdb.block_signals Tom de Vries found a bug where, sometimes, a SIGCHLD would be delivered to a non-main thread, wreaking havoc. The problem is that gdb.block_signals after first blocking a set of signals, then unblocked the same set rather than restoring the initial situation. This function being called from the DAP thread lead to SIGCHLD being unblocked there. This patch fixes the problem by restoring the previous set of signals instead. Tested-by: Tom de Vries <tdevries@suse.de> Reviewed-By: Tom de Vries <tdevries@suse.de> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30680
Fixed.