[PATCH v3] Fix sporadic XFAILs in gdb.threads/attach-many-short-lived-threads.exp

Thiago Jung Bauermann thiago.bauermann@linaro.org
Wed Apr 10 03:59:58 GMT 2024


Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> This is about random test failures like those:
>
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 6: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 7: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 10: attach (EPERM)
>
> The reason for this effect is apparently as follows:
>
> There is a race condition when gdb tries to attach a thread but the
> thread exits at the same time.  Normally when that happens the return
> code of ptrace(PTRACE_ATTACH, x) is EPERM, which could also have other
> reasons.  To detect the true reason, we try to open /proc/<pid>/status
> which normally fails in that situation, but it may happen that the
> fopen succeeds, and the thread disappears while reading the content,
> then linux_proc_pid_get_state fails to find the "State:" line.
> Since there is no other possible reason why this can happen,
> use that as an indication that the thread has most likely exited.
> ---
>  gdb/nat/linux-procfs.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> v2: from kernel code review, it seems the missing "State:"
>  can only happen if the thread disappeared, so no need to
>  look at errno at all here.

As I said in v1 of this patch, my reading of the kernel code (function
task_state in linux/fs/proc/array.c) agrees with the statement above,
but I'm not confident in my conclusion.

The reason is that it's tricky to understand the effect of kernel
preemption as well as simultaneous execution of "competing" code in a
different CPU. Case in point: in the beginning of task_state, there's a
block enclosed in rcu_read_lock *and* inside that block there's also
some code within task_lock. Why both are necessary? I don't know. And
the weirdest thing to me is that the part which reads the task state for
the "State:" line isn't protected by either of those. My uninformed
assumption would be that it would be best if it were.

So unfortunately I don't feel comfortable enough giving my Reviewed-by:
to this patch, even though I think it's probably right.

I would defer to Pedro, who wrote this code originally, in commit
8784d56326e7 ("Linux: on attach, attach to lwps listed under
/proc/$pid/task/"). It was a long time ago, but perhaps he remembers why
he decided to assume that there could be a live task with no "State:"
line in its status file.

> v3: update commit message.
>
> diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
> index e2086952ce6..8d46d5bf289 100644
> --- a/gdb/nat/linux-procfs.c
> +++ b/gdb/nat/linux-procfs.c
> @@ -157,17 +157,12 @@ linux_proc_pid_is_gone (pid_t pid)
>    enum proc_state state;
>
>    have_state = linux_proc_pid_get_state (pid, 0, &state);
> -  if (have_state < 0)
> +  if (have_state <= 0)
>      {
> -      /* If we can't open the status file, assume the thread has
> -	 disappeared.  */
> +      /* If we can't open the status file or there is no "State:" line,
> +	 assume the thread has disappeared.  */
>        return 1;
>      }
> -  else if (have_state == 0)
> -    {
> -      /* No "State:" line, assume thread is alive.  */
> -      return 0;
> -    }
>    else
>      return (state == PROC_STATE_ZOMBIE || state == PROC_STATE_DEAD);
>  }

--
Thiago


More information about the Gdb-patches mailing list