This program:

---8<---
#include <pthread.h>
#include <unistd.h>

static void
function_that_segfaults (void)
{
  int *p = 0;
  *p = 1;
}

static void
break_here (void)
{}

static void *
thread_func (void *p)
{
  for (;;)
    sleep (1);

  return NULL;
}

static void *
thread_func2 (void *p)
{
  sleep (1);
  break_here ();

  return NULL;
}

int
main (void)
{
  pthread_t threads[10];

  pthread_create (&threads[0], NULL, thread_func, NULL);
  pthread_create (&threads[1], NULL, thread_func, NULL);
  pthread_create (&threads[2], NULL, thread_func, NULL);
  pthread_create (&threads[3], NULL, thread_func, NULL);
  pthread_create (&threads[5], NULL, thread_func, NULL);
  pthread_create (&threads[6], NULL, thread_func, NULL);
  pthread_create (&threads[4], NULL, thread_func2, NULL);

  sleep (60);

  return function_that_segfaults != 0;
}
--->8---

$ gcc test.c -g3 -O0 -pthread
$ ./gdb -q -nx --data-directory=data-directory a.out -ex "b break_here if function_that_segfaults()"
Reading symbols from a.out...
Breakpoint 1 at 0x11ae: file test.c, line 13.
(gdb) r
Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d99700 (LWP 3567019)]
[New Thread 0x7ffff7598700 (LWP 3567020)]
[New Thread 0x7ffff6d97700 (LWP 3567021)]
[New Thread 0x7ffff6596700 (LWP 3567022)]
[New Thread 0x7ffff5d95700 (LWP 3567023)]
[New Thread 0x7ffff5594700 (LWP 3567024)]
[New Thread 0x7ffff4d93700 (LWP 3567025)]
Error in testing breakpoint condition:
Couldn't get registers: No such process.
An error occurred while in a function called from GDB.
Evaluation of the expression containing the function
(function_that_segfaults) will be abandoned.
When the function is done executing, GDB will silently stop.
Selected thread is running.
(gdb)

The "Couldn't get registers: No such process." is very strange. We expect GDB to say that the thread received a signal (SIGSEGV) while running the hand-called function.

And then if you continue with:

(gdb) kill
Kill the program being debugged? (y or n) y
[Inferior 1 (process 3567034) killed]
(gdb) r
Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out
/home/smarchi/src/binutils-gdb/gdb/target.c:2607: internal-error: target_wait: Assertion `!proc_target->commit_resumed_state' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Looking at the proceed call here:

(top-gdb) bt
#0  proceed (addr=0x555555555189, siggnal=GDB_SIGNAL_0)
    at /home/smarchi/src/binutils-gdb/gdb/infrun.c:3046
#1  0x0000558e5d95a128 in run_inferior_call (sm=std::unique_ptr<call_thread_fsm> = {...}, call_thread=0x61700009e680, real_pc=0x555555555189)
    at /home/smarchi/src/binutils-gdb/gdb/infcall.c:610
#2  0x0000558e5d95ff6e in call_function_by_hand_dummy (function=0x611000489d00, default_return_type=0x0, args=..., dummy_dtor=0x0, dummy_dtor_data=0x0)
    at /home/smarchi/src/binutils-gdb/gdb/infcall.c:1279
#3  0x0000558e5d95b4be in call_function_by_hand (function=0x611000489d00, default_return_type=0x0, args=...)
    at /home/smarchi/src/binutils-gdb/gdb/infcall.c:741
#4  0x0000558e5d609a2e in evaluate_subexp_do_call (exp=0x6030001579f0, noside=EVAL_NORMAL, callee=0x611000489d00, argvec=..., function_name=0x0, default_return_type=0x0)
    at /home/smarchi/src/binutils-gdb/gdb/eval.c:674
#5  0x0000558e5d60a7c5 in expr::operation::evaluate_funcall (this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL, function_name=0x0, args=std::__debug::vector of length 0, capacity 0)
    at /home/smarchi/src/binutils-gdb/gdb/eval.c:702
#6  0x0000558e5c4090aa in expr::operation::evaluate_funcall (this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL, args=std::__debug::vector of length 0, capacity 0)
    at /home/smarchi/src/binutils-gdb/gdb/expression.h:136
#7  0x0000558e5d60ad63 in expr::var_value_operation::evaluate_funcall (this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL, args=std::__debug::vector of length 0, capacity 0)
    at /home/smarchi/src/binutils-gdb/gdb/eval.c:714
#8  0x0000558e5cb8d2be in expr::funcall_operation::evaluate (this=0x607000083f80, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL)
    at /home/smarchi/src/binutils-gdb/gdb/expop.h:2178
#9  0x0000558e5d604e00 in expression::evaluate (During symbol reading: Child DIE 0x8d876c and its abstract origin 0x8f9b2b have different parents sthis=0x6030001579f0, expect_type=0x0, noside=EVAL_NORMAL)
    at /home/smarchi/src/binutils-gdb/gdb/eval.c:101
#10 0x0000558e5d604f71 in evaluate_expression (exp=0x6030001579f0, expect_type=0x0)
    at /home/smarchi/src/binutils-gdb/gdb/eval.c:115
#11 0x0000558e5c8c99b9 in breakpoint_cond_eval (exp=0x6030001579f0)
    at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:4739
#12 0x0000558e5c8d1f11 in bpstat_check_breakpoint_conditions (bs=0x6060001b29c0, thread=0x61700009e680)
    at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:5303
#13 0x0000558e5c8d4b45 in bpstat_stop_status (aspace=0x603000045a00, bp_addr=0x5555555551ae, thread=0x61700009e680, ws=..., stop_chain=0x6060001b29c0)
    at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:5475
#14 0x0000558e5da1f939 in handle_signal_stop (ecs=0x7fff97a4bd50)
    at /home/smarchi/src/binutils-gdb/gdb/infrun.c:6200
#15 0x0000558e5da19441 in handle_inferior_event (ecs=0x7fff97a4bd50)
    at /home/smarchi/src/binutils-gdb/gdb/infrun.c:5690
#16 0x0000558e5da05206 in fetch_inferior_event ()
    at /home/smarchi/src/binutils-gdb/gdb/infrun.c:4091
#17 0x0000558e5d94fad4 in inferior_event_handler (event_type=INF_REG_EVENT)
    at /home/smarchi/src/binutils-gdb/gdb/inf-loop.c:41
#18 0x0000558e5dc29bdd in handle_target_event (error=0, client_data=0x0)
    at /home/smarchi/src/binutils-gdb/gdb/linux-nat.c:4096
#19 0x0000558e5f4e4dd1 in handle_file_event (file_ptr=0x607000016050, ready_mask=1)
    at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:574
#20 0x0000558e5f4e562c in gdb_wait_for_event (block=0)
    at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:700
#21 0x0000558e5f4e343c in gdb_do_one_event ()
    at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:212
#22 0x0000558e5dd29d99 in start_event_loop ()
    at /home/smarchi/src/binutils-gdb/gdb/main.c:421
#23 0x0000558e5dd2a1df in captured_command_loop ()
    at /home/smarchi/src/binutils-gdb/gdb/main.c:481
#24 0x0000558e5dd2fad9 in captured_main (data=0x7fff97a4c200)
    at /home/smarchi/src/binutils-gdb/gdb/main.c:1348
#25 0x0000558e5dd2fbc2 in gdb_main (args=0x7fff97a4c200)
    at /home/smarchi/src/binutils-gdb/gdb/main.c:1363
#26 0x0000558e5c3e1ddd in main (argc=7, argv=0x7fff97a4c378)
    at /home/smarchi/src/binutils-gdb/gdb/gdb.c:32

We find that GDB tries to resume threads other than the event thread (the one for which we evaluate the breakpoint condition), because it thinks they are not resumed. This is probably because, when the linux-nat target added them, they were added in the non-resumed state and stayed that way.

Wow, it's a small world. I literally just started looking at this same issue this week.

The whole thread not marked resumed issue is fixed by this excellent patch:

https://sourceware.org/pipermail/gdb-patches/2022-January/185109.html

Which you know as you already posted a link to this bug to that thread.

However, there are many other problems related to this issue.

The first thing I noticed is that run_inferior_call calls clear_proceed_status, which in all-stop mode calls clear_proceed_status_thread for each thread.

Once the above patch is merged I plan to add an assert to clear_proceed_status_thread that the thread we are clearing is not resumed and not executing.

Currently the not-executing assert will fail, but (due to the above patch being missing) the not-resumed assert will only fail sometimes.

If we ignore the clear_proceed_status issue, then with the above patch the resumed flag will be correct, and GDB will not try to start the already resumed threads as part of the inferior call.

However, after the call, as we're in all-stop mode, GDB will stop all threads.

However, if the breakpoint condition doesn't segfault, but instead just returns false, then GDB will resume the single thread that stopped for the breakpoint - leaving all the other threads stopped.

I'm currently working on the idea that when we evaluate the breakpoint condition we temporarily place GDB into non-stop mode. This would mean that, when we evaluate the b/p condition, we only restart the one thread, and afterwards we only expect the one thread to stop. I need to do lots more testing yet, though - maybe this is a really bad idea.

The only other option I can think of is to somehow have the infcall code figure out that we are in all-stop mode, but some threads are already running. Then, after making the inferior call we only stop the set of threads that we started. However, this has a massive problem: how to handle new threads?

I'll clean up my current patch and post it to this bug later today in case anyone wants to try it. I'll also add your crashing function test to my working branch to make sure that is handled too.

Created attachment 14005
A WIP patch

Here's the patch I'm currently working on. This should apply to current master and resolves the issue in this bug, as well as the original issue I was working on.

I've run the complete testsuite on GNU/Linux x86-64 with no regressions.

I still need to do lots more testing, especially around things like handling targets that don't support non-stop mode, and what happens if some other thread stops while we are evaluating the breakpoint condition.

But any initial thoughts are welcome.

(In reply to Andrew Burgess from comment #1)
> Wow, it's a small world. I literally just started looking at this same
> issue this week.
>
> The whole thread not marked resumed issue is fixed by this excellent patch:
>
> https://sourceware.org/pipermail/gdb-patches/2022-January/185109.html
>
> Which you know as you already posted a link to this bug to that thread.
>
> However, there are so many other problem related to this issue.
>
> The first thing I noticed is that run_inferior_call calls
> clear_proceed_status, which in all-stop mode calls
> clear_proceed_status_thread for each thread.
>
> Once the above patch is merged I plan to add an assert to
> clear_proceed_status_thread that the thread we are clearing is not resumed
> and not executing.
>
> Currently the not-executing assert will fail, but (due to the above patch
> being missing) the not-resumed assert will only fail sometimes.
>
> If we ignore the clear_proceed_status issue, then with the above patch the
> resumed flag will be correct, and GDB will not try to start the already
> resumed threads as part of the inferior call.
>
> However, after the call, as we're in all-stop mode, GDB will stop all
> threads.
>
> However, if the breakpoint condition doesn't segfault, but instead just
> returns false, then GDB will resume the single thread that stopped for the
> breakpoint - leaving all the other threads stopped.

Yeah, the fact that the breakpoint condition function caused a segfault is just another difficulty on top. You can ignore that part.

> I'm currently working on the idea that when we evaluate the breakpoint
> condition we temporarily place GDB into non-stop mode, this would mean that,
> when we evaluate the b/p condition we only restart the one thread, and
> afterwards, we only expect the one thread to stop, but I need to do lots
> more testing yet - maybe this is a really bad idea.
>
> The only other option I can think of is to somehow have the infcall code
> figure out that we are in all-stop mode, but some threads are already
> running. Then, after making the inferior call we only stop the set of
> threads that we started. However, this has a massive problem; how to handle
> new threads?

When thinking about this, my intuition was more like the latter.

In all-stop over a non-stop target:

1. A thread hits a breakpoint, only that thread is stopped while we process the breakpoint hit
2. When doing the infcall in the breakpoint condition, only that thread is resumed (the other threads already are)
3. When the infcall is done, only that thread is stopped
4a. If the condition is true, then GDB stops all threads
4b. If the condition is false, that thread is resumed

In all-stop over an all-stop target:

1. A thread hits a breakpoint, all threads are stopped while we process the breakpoint hit
2. When doing the infcall in the breakpoint condition, all threads are resumed (is this what would happen if the user were to do a manual infcall?)
3. When the infcall is done, all threads are stopped
4a. If the condition is true, all threads remain stopped
4b. If the condition is false, all threads are resumed

In non-stop over a non-stop target, it looks like "all-stop-on-top-of-non-stop", except that not all threads are stopped in step 4a.

I didn't really think through what would happen to new threads, I suppose they would just keep running.

> I'll clean up my current patch and post it to this bug later today in case
> anyone wants to try it. I'll also add your crashing function test to my
> working branch to make sure that is handled too.

Thanks, that's some really quick customer service.
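
For readers who prefer a compact view, the three cases listed above can be written as a small decision table. This is only a sketch of the proposal as worded in this comment; every name in it (ui_mode, target_mode, scope, cond_eval_plan, plan_for) is hypothetical and none of it is taken from GDB's sources.

```cpp
#include <cassert>

// Hypothetical names throughout; this models the proposal in the comment
// above and does not correspond to GDB's real data structures.
enum class ui_mode { all_stop, non_stop };
enum class target_mode { all_stop, non_stop };
enum class scope { event_thread_only, all_threads };

struct cond_eval_plan
{
  scope stopped_on_hit;        // step 1
  scope resumed_for_infcall;   // step 2
  scope stopped_after_infcall; // step 3
  scope stopped_if_true;       // step 4a
  scope resumed_if_false;      // step 4b
};

cond_eval_plan
plan_for (ui_mode ui, target_mode target)
{
  // Non-stop on top of an all-stop target is not one of the listed cases.
  assert (!(ui == ui_mode::non_stop && target == target_mode::all_stop));

  if (target == target_mode::all_stop)
    {
      const scope all = scope::all_threads;
      return {all, all, all, all, all};
    }

  // Non-stop target: only the event thread is touched, except that an
  // all-stop user interface stops every thread when the condition is true.
  const scope one = scope::event_thread_only;
  const scope if_true = (ui == ui_mode::all_stop) ? scope::all_threads : one;
  return {one, one, one, if_true, one};
}
```

New threads created during the infcall are deliberately not modelled, since the comment leaves that question open.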

A highly-related patch series was this:

https://sourceware.org/pipermail/gdb-patches/2021-March/176654.html

Perhaps there are a few useful things that still apply to the current master.

> In all-stop over an all-stop target:
>
> 1. A thread hits a breakpoint, all threads are stopped while we process
> the breakpoint hit
> 2. When doing the infcall in the breakpoint condition, all threads are
> resumed (is this what would happen if the user were to do a manual infcall?)

I think GDB should act like the "scheduler-locking on" mode in this case, because if another thread has a pending event, the condition evaluation could be dismissed. This is what distinguishes an infcall in condition evaluation from a manual infcall. The series linked above introduced an `in_cond_eval` flag to make this distinction.
https://sourceware.org/pipermail/gdb-patches/2022-October/192926.html
*** Bug 23191 has been marked as a duplicate of this bug. ***
*** Bug 28911 has been marked as a duplicate of this bug. ***

The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3df7843699ff3610f89ac880685396b531d8ec1b

commit 3df7843699ff3610f89ac880685396b531d8ec1b
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Oct 9 13:27:13 2020 +0200

    gdb: fix b/p conditions with infcalls in multi-threaded inferiors

    This commit fixes bug PR 28942, that is, creating a conditional
    breakpoint in a multi-threaded inferior, where the breakpoint
    condition includes an inferior function call.

    Currently, when a user tries to create such a breakpoint, then GDB
    will fail with:

      (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
      Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
      (gdb) continue
      Continuing.
      [New Thread 0x7ffff7c5d700 (LWP 2460150)]
      [New Thread 0x7ffff745c700 (LWP 2460151)]
      [New Thread 0x7ffff6c5b700 (LWP 2460152)]
      [New Thread 0x7ffff645a700 (LWP 2460153)]
      [New Thread 0x7ffff5c59700 (LWP 2460154)]
      Error in testing breakpoint condition:
      Couldn't get registers: No such process.
      An error occurred while in a function called from GDB.
      Evaluation of the expression containing the function
      (return_true) will be abandoned.
      When the function is done executing, GDB will silently stop.
      Selected thread is running.
      (gdb)

    Or, in some cases, like this:

      (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
      Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
      (gdb) continue
      Continuing.
      [New Thread 0x7ffff7c5d700 (LWP 2461106)]
      [New Thread 0x7ffff745c700 (LWP 2461107)]
      ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.

    The precise error depends on the exact thread state; so there's race
    conditions depending on which threads have fully started, and which
    have not. But the underlying problem is always the same; when GDB
    tries to execute the inferior function call from within the
    breakpoint condition, GDB will, incorrectly, try to resume threads
    that are already running - GDB doesn't realise that some threads
    might already be running.

    The solution proposed in this patch requires an additional member
    variable thread_info::in_cond_eval. This flag is set to true (in
    breakpoint.c) when GDB is evaluating a breakpoint condition.

    In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
    true, then GDB will only try to resume the current thread, that is,
    the thread for which the breakpoint condition is being evaluated.
    This solves the problem of GDB trying to resume threads that are
    already running.

    The next problem is that inferior function calls are assumed to be
    synchronous, that is, GDB doesn't expect to start an inferior
    function call in thread #1, then receive a stop from thread #2 for
    some other, unrelated reason. To prevent GDB responding to an event
    from another thread, we update fetch_inferior_event and
    do_target_wait in infrun.c, so that, when an inferior function call
    (on behalf of a breakpoint condition) is in progress, we only wait
    for events from the current thread (the one evaluating the
    condition).

    In do_target_wait I had to change the inferior_matches lambda
    function, which is used to select which inferior to wait on.
    Previously the logic was this:

      auto inferior_matches = [&wait_ptid] (inferior *inf)
        {
          return (inf->process_target () != nullptr
                  && ptid_t (inf->pid).matches (wait_ptid));
        };

    This compares the pid of the inferior against the complete ptid we
    want to wait on. Before this commit wait_ptid was only ever
    minus_one_ptid (which is special, and means any process), and so
    every inferior would match.

    After this commit though wait_ptid might represent a specific thread
    in a specific inferior. If we compare the pid of the inferior to a
    specific ptid then these will not match. The fix is to compare
    against the pid extracted from the wait_ptid, not against the
    complete wait_ptid itself.

    In fetch_inferior_event, after receiving the event, we only want to
    stop all the other threads, and call inferior_event_handler with
    INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
    If we are, then all the other threads should be left doing whatever
    they were before. The inferior_event_handler call will be performed
    once the breakpoint condition has finished being evaluated, and GDB
    decides to stop or not.

    The final problem that needs solving relates to GDB's commit-resume
    mechanism, which allows GDB to collect resume requests into a single
    packet in order to reduce traffic to a remote target.

    The problem is that the commit-resume mechanism will not send any
    resume requests for an inferior if there are already events pending
    on the GDB side.

    Imagine an inferior with two threads. Both threads hit a breakpoint,
    maybe the same conditional breakpoint. At this point there are two
    pending events, one for each thread.

    GDB selects one of the events and spots that this is a conditional
    breakpoint, GDB evaluates the condition.

    The condition includes an inferior function call, so GDB sets up for
    the call and resumes the one thread, the resume request is added to
    the commit-resume queue.

    When the commit-resume queue is committed GDB sees that there is a
    pending event from another thread, and so doesn't send any resume
    requests to the actual target, GDB is assuming that when we wait we
    will select the event from the other thread.

    However, as this is an inferior function call for a condition
    evaluation, we will not select the event from the other thread, we
    only care about events from the thread that is evaluating the
    condition - and the resume for this thread was never sent to the
    target.

    And so, GDB hangs, waiting for an event from a thread that was never
    fully resumed.

    To fix this issue I have added the concept of "forcing" the
    commit-resume queue. When enabling commit resume, if the force flag
    is true, then any resumes will be committed to the target, even if
    there are other threads with pending events.

    A note on authorship: this patch was based on some work done by
    Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made
    some changes to their work in this version.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942

    [1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html

    Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com>
    Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
    Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
    Tested-By: Luis Machado <luis.machado@arm.com>
    Tested-By: Keith Seitz <keiths@redhat.com>
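
The inferior_matches problem described in the commit message can be reproduced with a small stand-alone model. The sketch below is not GDB code: the ptid and inferior structs are simplified stand-ins written for this illustration, the matches logic is an assumption about how GDB's ptid_t::matches behaves (wildcard, then whole-process filter, then exact equality), and the process_target check from the real lambda is left out.

```cpp
// Simplified stand-in for GDB's ptid_t; only pid and lwp are modelled.
struct ptid
{
  int pid;
  long lwp;   // 0 means "the whole process" in this model

  constexpr bool operator== (const ptid &other) const
  { return pid == other.pid && lwp == other.lwp; }

  // True if this value names a whole process rather than one thread.
  constexpr bool is_pid () const
  { return pid > 0 && lwp == 0; }

  // Assumed matching rules: the wildcard matches everything, a
  // whole-process filter matches on pid, anything else must be equal.
  constexpr bool matches (const ptid &filter) const
  {
    if (filter == ptid {-1, 0})
      return true;
    if (filter.is_pid () && pid == filter.pid)
      return true;
    return *this == filter;
  }
};

constexpr ptid minus_one_ptid {-1, 0};

// Hypothetical stand-in for GDB's inferior; only the pid is modelled.
struct inferior
{
  int pid;
};

// The comparison quoted in the commit message: the inferior's pid is
// compared against the complete wait_ptid.
constexpr bool
old_inferior_matches (const inferior &inf, const ptid &wait_ptid)
{
  return ptid {inf.pid, 0}.matches (wait_ptid);
}

// The fix as described in prose: compare against only the pid taken
// from wait_ptid.
constexpr bool
new_inferior_matches (const inferior &inf, const ptid &wait_ptid)
{
  return ptid {inf.pid, 0}.matches (ptid {wait_ptid.pid, 0});
}
```

In this model, with wait_ptid set to minus_one_ptid both versions accept every inferior. Once wait_ptid names a single thread, such as {100, 101}, the old comparison rejects even inferior 100, while the new one accepts inferior 100 and rejects the others.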

(In reply to Sourceware Commits from comment #8)
> The master branch has been updated by Andrew Burgess
> <aburgess@sourceware.org>:
>
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3df7843699ff3610f89ac880685396b531d8ec1b
>
> commit 3df7843699ff3610f89ac880685396b531d8ec1b
> Author: Andrew Burgess <aburgess@redhat.com>
> Date: Fri Oct 9 13:27:13 2020 +0200
>
> gdb: fix b/p conditions with infcalls in multi-threaded inferiors
>
> This commit fixes bug PR 28942, that is, creating a conditional
> breakpoint in a multi-threaded inferior, where the breakpoint
> condition includes an inferior function call.

Is there still something left to do here?