[PATCH 15/46] gdb, ze: add TARGET_WAITKIND_UNAVAILABLE
Lancelot SIX
lsix@lancelotsix.com
Wed Sep 4 10:47:25 GMT 2024
On 02/07/2024 12:42, Tankut Baris Aktemur wrote:
> From: Markus Metzger <markus.t.metzger@intel.com>
>
> This new WAITKIND means that we cannot interact with the thread at the
> moment. The thread may become available again at a later time.
>
> This will be used to model idle threads on Intel GT devices.
> ---
> gdb/fork-child.c | 10 +++++++---
> gdb/gdbthread.h | 12 +++++++++---
> gdb/infrun.c | 22 +++++++++++++++++++++-
> gdb/nat/fork-inferior.c | 10 ++++++++++
> gdb/remote.c | 6 +++++-
> gdb/target/waitstatus.c | 1 +
> gdb/target/waitstatus.h | 22 ++++++++++++++++++++++
> gdb/thread.c | 2 +-
> 8 files changed, 76 insertions(+), 9 deletions(-)
>
> diff --git a/gdb/fork-child.c b/gdb/fork-child.c
> index 539b11695d9..80372d023da 100644
> --- a/gdb/fork-child.c
> +++ b/gdb/fork-child.c
> @@ -124,14 +124,18 @@ gdb_startup_inferior (pid_t pid, int num_traps)
> {
> inferior *inf = current_inferior ();
> process_stratum_target *proc_target = inf->process_target ();
> + struct target_waitstatus ws;
>
> scoped_restore save_starting_up
> = make_scoped_restore (&inf->starting_up, true);
>
> - ptid_t ptid = startup_inferior (proc_target, pid, num_traps, NULL, NULL);
> + ptid_t ptid = startup_inferior (proc_target, pid, num_traps, &ws, NULL);
>
> - /* Mark all threads non-executing. */
> - set_executing (proc_target, ptid, false);
> + if (ws.kind () != TARGET_WAITKIND_UNAVAILABLE)
> + {
> + /* Mark all threads non-executing. */
> + set_executing (proc_target, ptid, false);
> + }
>
> return ptid;
> }
> diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
> index 73f6895fe46..91496c594e4 100644
> --- a/gdb/gdbthread.h
> +++ b/gdb/gdbthread.h
> @@ -847,9 +847,15 @@ extern bool threads_are_executing (process_stratum_target *targ);
> /* Merge the executing property of thread PTID of TARG over to its
> thread state property (frontend running/stopped view).
>
> - "not executing" -> "stopped"
> - "executing" -> "running"
> - "exited" -> "exited"
> + "not executing or not resumed" -> "stopped"
> + "executing and resumed" -> "running"
> + "exited" -> "exited"
> +
> + On GPUs, threads may exist but not currently be available, e.g. because
> + they are idle or are executing a dispatch of another process. We call
> + them unavailable and we model them as executing but not resumed. From
> + the front-end perspective, they are stopped. From the target
> + perspective, they are running.
Hi Baris,
I am not entirely sure I fully understand all the implications of this
model.
I can see a scenario where:
- you are in target-async & all-stop, want to stop_all_threads for a
given process (A),
- some of the device’s hardware threads are still busy executing work
for a process B, so those hardware threads are reported as "unavailable".
After that, all (current) threads of process A are stopped, but as soon
as HW is done with working on the dispatch of process B, it can start
executing more work from the current dispatch of process A. In this
configuration, can’t we end up in a situation where we think that all
activity is suspended, but still have "things actively running"?
I have not read the entire series yet, so maybe the solution to that
comes later, but I was wondering if there was more semantics attached to
"unavailable" to handle this scenario, or if another approach will come
later.
Writing this I also realize that the approach we use for AMDGPU to
prevent creation of new waves has not been sent upstream yet. Probably
because the test to actually validate this requires debug info, but we
should probably submit it as this is quite fundamental to properly
controlling a GPU. In short, the way it is done for AMDGPU is we have a
"prevent_new_threads" target method that the core can use when it needs
to be sure that no new threads can be created[1] (stop_all_threads is
one such place).
Best,
Lancelot.
[1]
https://github.com/ROCm/ROCgdb/commit/0260ac17a5e664c504092724f6bb393e18d0a6ba
>
> If PTID is minus_one_ptid, go over all threads of TARG.
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 1133ef3492f..cc80a4f7f96 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -5434,7 +5434,19 @@ mark_non_executing_threads (process_stratum_target *target,
> else
> mark_ptid = event_ptid;
>
> - set_executing (target, mark_ptid, false);
> + /* Unavailable threads are still executing.
> +
> + They were idle when we tried to stop them but they may start
> + executing work at any time.
> +
> + In all-stop mode, because the target does not listen to debug
> + events, those threads are practically not executing. But in
> + non-stop mode, the target can receive debug events from those
> + threads and the user can send interrupts to them. So, we leave
> + them as executing. */
> + if (!(target_is_non_stop_p ()
> + && ws.kind () == TARGET_WAITKIND_UNAVAILABLE))
> + set_executing (target, mark_ptid, false);
>
> /* Likewise the resumed flag. */
> set_resumed (target, mark_ptid, false);
> @@ -6625,6 +6637,14 @@ handle_inferior_event (struct execution_control_state *ecs)
> interps_notify_no_history ();
> stop_waiting (ecs);
> return;
> +
> + case TARGET_WAITKIND_UNAVAILABLE:
> + context_switch (ecs);
> + infrun_debug_printf ("unavailable");
> +
> + stop_print_frame = false;
> + stop_waiting (ecs);
> + return;
> }
> }
>
> diff --git a/gdb/nat/fork-inferior.c b/gdb/nat/fork-inferior.c
> index 41765b102bc..d70a3d616ca 100644
> --- a/gdb/nat/fork-inferior.c
> +++ b/gdb/nat/fork-inferior.c
> @@ -521,6 +521,16 @@ startup_inferior (process_stratum_target *proc_target, pid_t pid, int ntraps,
> resume_signal = ws.sig ();
> switch_to_thread (proc_target, event_ptid);
> break;
> +
> + case TARGET_WAITKIND_UNAVAILABLE:
> + /* We tried to interrupt the target but it responded that it is
> + currently unavailable.
> +
> + There is no guarantee that it will become available any time
> + soon. That's good enough for starting up the inferior,
> + however. */
> + switch_to_thread (proc_target, event_ptid);
> + return resume_ptid;
> }
>
> if (resume_signal != GDB_SIGNAL_TRAP)
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 7746abd1be2..25584db5b5a 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -4962,7 +4962,11 @@ remote_target::process_initial_stop_replies (int from_tty)
> || ws.sig () != GDB_SIGNAL_0)
> evthread->set_pending_waitstatus (ws);
>
> - set_executing (this, event_ptid, false);
> + /* Unavailable threads are executing (i.e. they may report events
> + and we cannot access their state) but not running (i.e. we tried
> + to stop them) from GDB's point of view. */
> + if (ws.kind () != TARGET_WAITKIND_UNAVAILABLE)
> + set_executing (this, event_ptid, false);
> set_running (this, event_ptid, false);
> get_remote_thread_info (evthread)->set_not_resumed ();
> }
> diff --git a/gdb/target/waitstatus.c b/gdb/target/waitstatus.c
> index 9e9b5633b12..1cd0eee2236 100644
> --- a/gdb/target/waitstatus.c
> +++ b/gdb/target/waitstatus.c
> @@ -62,6 +62,7 @@ DIAGNOSTIC_ERROR_SWITCH
> case TARGET_WAITKIND_NO_HISTORY:
> case TARGET_WAITKIND_NO_RESUMED:
> case TARGET_WAITKIND_THREAD_CREATED:
> + case TARGET_WAITKIND_UNAVAILABLE:
> return str;
> }
> DIAGNOSTIC_POP
> diff --git a/gdb/target/waitstatus.h b/gdb/target/waitstatus.h
> index 7d5ad3f9776..3d1d1b918c7 100644
> --- a/gdb/target/waitstatus.h
> +++ b/gdb/target/waitstatus.h
> @@ -107,6 +107,19 @@ enum target_waitkind
>
> /* The thread has exited. The exit status is in value.integer. */
> TARGET_WAITKIND_THREAD_EXITED,
> +
> + /* The thread is unavailable. We tried to stop it but it did not
> + respond in reasonable time. Chances are that we won't be able to
> + stop it.
> +
> + On GPUs, if we model hardware threads to avoid frequent entry/exit
> + notifications, idle threads may not respond to interrupts and hence
> + cannot be stopped by us.
> +
> + They become responsive again when they pick up new work and they may
> + create events such as hitting breakpoints. But we cannot tell when
> + this will happen - if at all. */
> + TARGET_WAITKIND_UNAVAILABLE,
> };
>
> /* Determine if KIND represents an event with a new child - a fork,
> @@ -165,6 +178,8 @@ DIAGNOSTIC_ERROR_SWITCH
> return "THREAD_CREATED";
> case TARGET_WAITKIND_THREAD_EXITED:
> return "THREAD_EXITED";
> + case TARGET_WAITKIND_UNAVAILABLE:
> + return "UNAVAILABLE";
> };
> DIAGNOSTIC_POP
>
> @@ -368,6 +383,13 @@ struct target_waitstatus
> return *this;
> }
>
> + target_waitstatus &set_unavailable ()
> + {
> + this->reset ();
> + m_kind = TARGET_WAITKIND_UNAVAILABLE;
> + return *this;
> + }
> +
> /* Get the kind of this wait status. */
>
> target_waitkind kind () const
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 4ee46936861..71dfb0d0fb1 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -970,7 +970,7 @@ finish_thread_state (process_stratum_target *targ, ptid_t ptid)
> bool any_started = false;
>
> for (thread_info *tp : all_non_exited_threads (targ, ptid))
> - if (set_running_thread (tp, tp->executing ()))
> + if (set_running_thread (tp, tp->executing () && tp->resumed ()))
> any_started = true;
>
> if (any_started)
More information about the Gdb-patches
mailing list