Summary: | Attempt to resume already running thread | ||
---|---|---|---|
Product: | gdb | Reporter: | Andrew Burgess <aburgess> |
Component: | gdb | Assignee: | Pedro Alves <pedro> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | pedro |
Priority: | P2 | ||
Version: | HEAD | ||
Target Milestone: | 15.1 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: |
reproducer
testsuite patch |
The error can be reproduced with the native target too. In the reproducer replace gdb.in.cmd with: set height 0 set trace-commands on set non-stop on set displaced-stepping off file thr.x starti break main Then when you run make you'll see: $ make ... snip ... +stepi [New Thread 0x7ffff7ff8700 (LWP 3889982)] Hello from the thread. cmd.gdb:14: Error in sourced command file: Couldn't get registers: No such process. **** FAILED **** Despite the 'no such process' message, I think this is the same issue. The problem seems to be that in the following situation: - gdb is doing a single step over a clone syscall, and - displaced stepping is off Then, when GDB steps it might be that the first event reported to GDB is the creation of the new thread. If this is the case the thread is added to GDB with state THREAD_RUNNING and 'executing=true'. However, the resumed field is left as false. Once the in-place step over has completed then GDB tries to restart all threads, however, the new thread will have already been set running by the Linux code from resume_stopped_resumed_lwps. If feels like, when the new thread is spotted we should actually, in this case, realise that all other threads are currently stopped, and so add the thread to GDB in the stopped state. Annoyingly, the same problem is seen with the remote protocol, only, I think the situation here is worse. When the stepi is started the remote end will see the new thread, but as this is non-stop mode, the new thread will be set running again, and only the completed step is reported to GDB. Now, when GDB tries to complete the step-over and resume all threads, first GDB updates the threads list and sees the new thread, then GDB tries to set the new thread running. Unfortunately, the new thread is already running, and so we get an error. Of course, as the new thread was allowed to run unchecked while we were doing the step over, it could easily have missed hitting a breakpoint. It would seem (to me) that when doing an in-place step over, we almost need to take the remote target out of non-stop mode temporarily so that any new threads that appear will be held until the step has completed?? Created attachment 13432 [details]
testsuite patch
Of course, once I realised that what we have is a thread that manages to be running when it shouldn't be, then it's easy enough to get GDB to trigger an assertion like this:
internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
What's happening here is the new thread, the thread that GDB isn't expecting to be running, is hitting a stop event. The thread then ends up inside finish_step_over, where we make some assertions about the state of the thread that we expect to stop. As the thread that actually stops is not the one we expect, things go wrong and we hit the assertion.
The attached patch adds a new testcase that exposes this issue (on x86-64/Linux).
I've posted a series that fixes this, here: [PATCH 00/25] Step over thread clone and thread exit https://sourceware.org/pipermail/gdb-patches/2022-June/190181.html The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=0d36baa9af0d9929c96b89a184a469c432c68b0d commit 0d36baa9af0d9929c96b89a184a469c432c68b0d Author: Pedro Alves <pedro@palves.net> Date: Fri Nov 12 20:50:29 2021 +0000 Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED (A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com> The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=53de5394f7bf11995b1d9cb6885a8490b2ebc9da commit 53de5394f7bf11995b1d9cb6885a8490b2ebc9da Author: Pedro Alves <pedro@palves.net> Date: Tue Nov 23 20:35:12 2021 +0000 Support clone events in the remote protocol The previous patch taught GDB about a new TARGET_WAITKIND_THREAD_CLONED event kind, and made the Linux target report clone events. A following patch will teach Linux GDBserver to do the same thing. But before we get there, we need to teach the remote protocol about TARGET_WAITKIND_THREAD_CLONED. That's what this patch does. Clone is very similar to vfork and fork, and the new stop reply is likewise handled similarly. The stub reports "T05clone:...". GDBserver core is taught to handle TARGET_WAITKIND_THREAD_CLONED and forward it to GDB in this patch, but no backend actually emits it yet. That will be done in a following patch. Documentation for this new remote protocol feature is included in a documentation patch later in the series. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: If271f20320d864f074d8ac0d531cc1a323da847f Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=65c459abebf70bd5a64dcee11d4d7d4a8498465f commit 65c459abebf70bd5a64dcee11d4d7d4a8498465f Author: Pedro Alves <pedro@palves.net> Date: Tue Nov 23 20:35:12 2021 +0000 Thread options & clone events (core + remote) A previous patch taught GDB about a new TARGET_WAITKIND_THREAD_CLONED event kind, and made the Linux target report clone events. A following patch will teach Linux GDBserver to do the same thing. However, for remote debugging, it wouldn't be ideal for GDBserver to report every clone event to GDB, when GDB only cares about such events in some specific situations. Reporting clone events all the time would be potentially chatty. We don't enable thread create/exit events all the time for the same reason. Instead we have the QThreadEvents packet. QThreadEvents is target-wide, though. This patch makes GDB instead explicitly request that the target reports clone events or not, on a per-thread basis. In order to be able to do that with GDBserver, we need a new remote protocol feature. Since a following patch will want to enable thread exit events on per-thread basis too, the packet introduced here is more generic than just for clone events. It lets you enable/disable a set of options at once, modelled on Linux ptrace's PTRACE_SETOPTIONS. IOW, this commit introduces a new QThreadOptions packet, that lets you specify a set of per-thread event options you want to enable. The packet accepts a list of options/thread-id pairs, similarly to vCont, processed left to right, with the options field being a number interpreted as a bit mask of options. The only option defined in this commit is GDB_THREAD_OPTION_CLONE (0x1), which ask the remote target to report clone events. Another patch later in the series will introduce another option. For example, this packet sets option "1" (clone events) on thread p1000.2345: QThreadOptions;1:p1000.2345 and this clears options for all threads of process 1000, and then sets option "1" (clone events) on thread p1000.2345: QThreadOptions;0:p1000.-1;1:p1000.2345 This clears options of all threads of all processes: QThreadOptions;0 The target reports the set of supported options by including "QThreadOptions=<supported options>" in its qSupported response. infrun is then tweaked to enable GDB_THREAD_OPTION_CLONE when stepping over a breakpoint. Unlike PTRACE_SETOPTIONS, fork/vfork/clone children do NOT inherit their parent's thread options. This is so that GDB can send e.g., "QThreadOptions;0;1:TID" without worrying about threads it doesn't know about yet. Documentation for this new remote protocol feature is included in a documentation patch later in the series. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie41e5093b2573f14cf6ac41b0b5804eba75be37e The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=25b16bc9e791d53028c3c180125a80f345b97d94 commit 25b16bc9e791d53028c3c180125a80f345b97d94 Author: Pedro Alves <pedro@palves.net> Date: Tue Nov 23 20:35:12 2021 +0000 Thread options & clone events (native Linux) This commit teaches the native Linux target about the GDB_THREAD_OPTION_CLONE thread option. It's actually simpler to just continue reporting all clone events unconditionally to the core. There's never any harm in reporting a clone event when the option is disabled. All we need to do is to report support for the option, otherwise GDB falls back to use target_thread_events(). Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: If90316e2dcd0c61d0fefa0d463c046011698acf9 The master branch has been updated by Pedro Alves <palves@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=393a6b5947d037a55ce1b57474e1ffb3074f544e commit 393a6b5947d037a55ce1b57474e1ffb3074f544e Author: Pedro Alves <pedro@palves.net> Date: Tue Nov 23 20:35:12 2021 +0000 Thread options & clone events (Linux GDBserver) This patch teaches the Linux GDBserver backend to report clone events to GDB, when GDB has requested them with the GDB_THREAD_OPTION_CLONE thread option, via the new QThreadOptions packet. This shuffles code in linux_process_target::handle_extended_wait around to a more logical order when we now have to handle and potentially report all of fork/vfork/clone. Raname lwp_info::fork_relative -> lwp_info::relative as the field is no longer only about (v)fork. With this, gdb.threads/stepi-over-clone.exp now cleanly passes against GDBserver, so remove the native-target-only requirement from that testcase. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I3a19bc98801ec31e5c6fdbe1ebe17df855142bb2 Fixed. |
Created attachment 13423 [details] reproducer When using 'target remote', 'set non-stop on' and 'set displaced-stepping off', if we step over an instruction that creates a new thread then GDB ends up trying to resume a thread that is already running. To reproduce, download the attached tar file then: $ tar -xf thr-resume-issue.tar.xz $ cd thr-resume-issue $ make .....snip.... [New Thread 3878916.3878917] cmd.gdb:12: Error in sourced command file: PC register is not available **** FAILED **** This test case is only going to work on an x86-64/Linux setup. Notice the "PC register is not available" error. This is caused by GDB trying to read the $pc from the new running thread. In more detail what happens is that GDB stops at the syscall instruction that will spawn a new thread. At this point the inferior is single threaded. Next we 'stepi', as displaced stepping is off we do inplace stepping. When the step is completed we end up in finish_step_over (infrun.c), this then calls restart_threads. In restart_threads we call update_thread_list and spot the new thread. The thread is created with executing == true and state == THREAD_RUNNING, but, resumed == false. Back in restart_threads we then loop over all threads and try to put the thread "back to what they were trying to do back when we paused them for an in-line step-over." For our new thread we try to resume the thread (even though it's already running), which requires a read of $pc, which causes the error we see.