[PATCH] Don't stop all threads prematurely after first step of "step N"
Simon Marchi
simark@simark.ca
Tue Jul 19 19:33:09 GMT 2022
On 2022-07-18 14:54, Pedro Alves wrote:
> In all-stop mode, when the target is itself in non-stop mode (like
> GNU/Linux), if you use the "step N" (or "stepi/next/nexti N") to step
> a thread a number of times:
>
> (gdb) help step
> step, s
> Step program until it reaches a different source line.
> Usage: step [N]
> Argument N means step N times (or till program stops for another reason).
>
> ... GDB prematurely stops all threads after the first step, and
> doesn't re-resume them for the subsequent N-1 steps. It's as if for
> the 2nd and subsequent steps, the command was running with
> scheduler-locking enabled.
>
> This can be observed with the testcase added by this commit, which
> looks like this:
>
> static pthread_barrier_t barrier;
>
> static void *
> thread_func (void *arg)
> {
> pthread_barrier_wait (&barrier);
> return NULL;
> }
>
> int
> main ()
> {
> pthread_t thread;
> int ret;
>
> pthread_barrier_init (&barrier, NULL, 2);
>
> /* We run to this line below, and then issue "next 3". That should
> step over the 3 lines below and land on the return statement. If
> GDB prematurely stops the thread_func thread after the first of
> the 3 nexts (and never resumes it again), then the join won't
> ever return. */
> pthread_create (&thread, NULL, thread_func, NULL); /* set break here */
> pthread_barrier_wait (&barrier);
> pthread_join (thread, NULL);
>
> return 0;
> }
>
> The test hangs and times out without the GDB fix:
>
> (gdb) next 3
> [New Thread 0x7ffff7d89700 (LWP 525772)]
> FAIL: gdb.threads/step-N-all-progress.exp: non-stop=off: target-non-stop=on: next 3 (timeout)
>
> The problem is a core gdb bug.
>
> When you do "step/stepi/next/nexti N", GDB internally creates a
> thread_fsm object and associates it with the stepping thread. For the
> stepping commands, the FSM's class is step_command_fsm. That object
> is what keeps track of how many steps are left to make. When one step
> finishes, handle_inferior_event calls stop_waiting and returns, and
> then fetch_inferior_event calls the "should_stop" method of the event
> thread's FSM. The implementation of that method decrements the
> steps-left counter. If the counter is 0, it returns true and we
> proceed to presenting the stop to the user. If it isn't 0 yet, then
> the method returns false, indicating to fetch_inferior_event to "keep
> going".
>
> Focusing now on when the first step finishes -- we're in "all-stop"
> mode, with the target in non-stop mode. When a step finishes,
> handle_inferior_event calls stop_waiting, which itself calls
> stop_all_threads to stop everything. I.e., after the first step
> completes, all threads are stopped, before handle_inferior_event
> returns. And after that, now in fetch_inferior_event, we consult the
> thread's thread_fsm::should_stop, which as we've seen, for the first
> step returns false -- i.e., we need to keep_going for another step.
> However, since the target is in non-stop mode, keep_going resumes
> _only_ the current thread. All the other threads remain stopped,
> inadvertently.
>
> If the target is in non-stop mode, we don't actually need to stop all
> threads right after each first step finishes, and then re-resume them
> again. We can instead defer stopping all threads until all the steps
> are completed.
>
> So fix this by delaying the stopping of all threads until after we
> called the FSM's "should_stop" method. I.e., move it from
> stop_waiting, to handle_inferior_events's callers,
> fetch_inferior_event and wait_for_inferior.
>
> New test included. Tested on x86-64 GNU/Linux native and gdbserver.
>
> Change-Id: Iaad50dcfea4464c84bdbac853a89df92ade6ae01
LGTM.
Simon
More information about the Gdb-patches
mailing list