[RFC PATCH 0/3] Fix attaching to process when it has zombie threads

Thiago Jung Bauermann thiago.bauermann@linaro.org
Thu Mar 21 23:11:46 GMT 2024


Hello,

This patch series fixes a GDB hang that occurs often (but not always) when
attaching to a multi-threaded inferior on aarch64-linux and
powerpc64le-linux, as described in PR 31312.  See patch 3 for a detailed
description of the problem.

Patches 1 and 2 are preparatory patches because I want to use existing code
to parse the /proc/PID/stat file to get the thread's starttime value, so
that GDB and gdbserver aren't fooled by PID reuse.
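
To illustrate the idea, here is a minimal standalone sketch of reading
the starttime value (field 22 in proc(5)) from a thread's stat file and
using it to detect PID reuse.  It is not the code from the patches: the
names stat_field, stat_starttime, read_starttime and is_same_thread are
hypothetical, and the sketch only handles fields 3 and above.

```cpp
#include <cstdlib>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patches): return field number FIELD,
// numbered from 1 as in proc(5), from the text of a stat file.  Only
// fields 3 and above are handled in this sketch.
std::optional<std::string>
stat_field (const std::string &stat, int field)
{
  // Field 2 (comm) is parenthesized and may itself contain spaces or
  // ')' characters, so find where it ends by looking for the last ')'.
  std::string::size_type close = stat.rfind (')');
  if (close == std::string::npos || field < 3)
    return {};

  std::istringstream rest (stat.substr (close + 1));
  std::string token;
  for (int i = 3; rest >> token; i++)
    if (i == field)
      return token;

  return {};
}

// Hypothetical helper: parse starttime (field 22) as an integer.
std::optional<unsigned long long>
stat_starttime (const std::string &stat)
{
  std::optional<std::string> field = stat_field (stat, 22);
  if (!field.has_value ())
    return {};

  char *end;
  unsigned long long value = strtoull (field->c_str (), &end, 10);
  if (end == field->c_str () || *end != '\0')
    return {};

  return value;
}

// Hypothetical helper: read starttime for thread TID of process PID.
std::optional<unsigned long long>
read_starttime (int pid, int tid)
{
  std::ifstream in ("/proc/" + std::to_string (pid) + "/task/"
                    + std::to_string (tid) + "/stat");
  std::string line;
  if (!std::getline (in, line))
    return {};

  return stat_starttime (line);
}

// A thread ID whose starttime differs from the one recorded earlier
// belongs to a different thread, i.e. the ID was reused.
bool
is_same_thread (int pid, int tid, unsigned long long recorded_starttime)
{
  std::optional<unsigned long long> now = read_starttime (pid, tid);
  return now.has_value () && *now == recorded_starttime;
}
```

A caller would record read_starttime (pid, tid) when it first sees a
thread and later pass that value to is_same_thread before trusting the
thread ID again.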

This patch series was tested on native and extended-remote aarch64-linux
and armv8l-linux-gnueabihf and no regressions were found, except for the
following:

When running gdb.threads/detach-step-over.exp on armv8l-linux-gnueabihf
extended-remote, sometimes GDBserver dies with:

  builtin_spawn /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over
  Remote debugging from host 127.0.0.1, port 56624
  Process /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over created; pid = 840876
  Attached; pid = 840821
  Detaching from process 840821
  Attached; pid = 840821
  /home/thiago.bauermann/src/binutils-gdb/gdbserver/linux-low.cc:1956: A problem internal to GDBserver has been detected.
  unsuspend LWP 840821, suspended=-1

The assertion triggered is this one:

  /* Decrement LWP's suspend count.  */

  static void
  lwp_suspended_decr (struct lwp_info *lwp)
  {
    lwp->suspended--;

    if (lwp->suspended < 0)
      {
        struct thread_info *thread = get_lwp_thread (lwp);

        internal_error ("unsuspend LWP %ld, suspended=%d\n", lwpid_of (thread),
  		      lwp->suspended);
      }
  }

Unfortunately, I don't have time to debug this problem further at the
moment, and I didn't want to keep sitting on these patches until I can come
back to this issue.

Note that of all the testcases in the GDB testsuite, only
detach-step-over.exp triggers the GDBserver internal error, so it's a
localized problem.

This is why I'm posting the patch series as an RFC. Considering that it
fixes a problem that is causing instability in the testsuite results for
aarch64-linux and powerpc64le-linux, does it make sense to commit it as is,
and then investigate the GDBserver internal error on armv8l-linux-gnueabihf
later?

Thiago Jung Bauermann (3):
  gdb/nat: Use procfs(5) indexes in linux_common_core_of_thread
  gdb/nat: Factor linux_find_proc_stat_field out of
    linux_common_core_of_thread
  gdb/nat/linux: Fix attaching to process when it has zombie threads

 gdb/nat/linux-osdata.c | 65 +++++++++++++++++++++++++++++++++---------
 gdb/nat/linux-osdata.h |  7 +++++
 gdb/nat/linux-procfs.c | 19 ++++++++++++
 3 files changed, 77 insertions(+), 14 deletions(-)


base-commit: b42aa684f6ff2bce9b8bc58aa89574723f17f1ce

