Bug 27822 - [gdb/tdep, x86_64] Wrong thread picked to select process 64/32-bitness
Summary: [gdb/tdep, x86_64] Wrong thread picked to select process 64/32-bitness
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep
Version: HEAD
Importance: P2 normal
Target Milestone: 11.1
Assignee: Not yet assigned to anyone
Reported: 2021-05-04 14:00 UTC by Tom de Vries
Modified: 2021-05-23 08:11 UTC

Description Tom de Vries 2021-05-04 14:00:19 UTC
See this conversation here ( https://lore.kernel.org/io-uring/CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/ ).

The problem seems to be that gdb picks the wrong thread when determining whether a process is 32-bit or 64-bit.

AFAICT, this is done here in x86_linux_nat_target::read_description:
...
  /* GNU/Linux LWP ID's are process ID's.  */
  tid = inferior_ptid.lwp ();
  if (tid == 0)
    tid = inferior_ptid.pid (); /* Not a threaded program.  */
...

It's not obvious to me from the discussion what is wrong with that.
Comment 1 Tom de Vries 2021-05-04 15:26:41 UTC
Anyway, I've tested this patch and found no regressions:
...
diff --git a/gdb/x86-linux-nat.c b/gdb/x86-linux-nat.c
index 85c7f0ddc94..bdf381f1430 100644
--- a/gdb/x86-linux-nat.c
+++ b/gdb/x86-linux-nat.c
@@ -114,9 +114,7 @@ x86_linux_nat_target::read_description ()
   uint64_t xcr0_features_bits;
 
   /* GNU/Linux LWP ID's are process ID's.  */
-  tid = inferior_ptid.lwp ();
-  if (tid == 0)
-    tid = inferior_ptid.pid (); /* Not a threaded program.  */
+  tid = inferior_ptid.pid ();
 
 #ifdef __x86_64__
   {
...
so I wonder if this fixes the observed problems.
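
A minimal sketch of the before/after selection logic follows, for readers who want to see the effect of the diff in isolation. `ptid_model`, `tid_before_patch` and `tid_after_patch` are hypothetical names invented for this illustration and are not gdb code; the struct only mirrors the three fields gdb prints for `inferior_ptid`, and the value 1000 is an arbitrary stand-in for pid x.

```cpp
// Hypothetical stand-in for gdb's ptid_t (assumption: only the three
// fields printed as m_pid, m_lwp and m_tid are modelled here).
struct ptid_model
{
  int pid;
  long lwp;
  long tid;
};

// Selection before the patch: prefer the LWP, fall back to the pid
// when the LWP is 0 ("Not a threaded program").
constexpr long
tid_before_patch (const ptid_model &ptid)
{
  return ptid.lwp != 0 ? ptid.lwp : ptid.pid;
}

// Selection after the patch: always the pid, which is also the LWP ID
// of the thread group leader.
constexpr long
tid_after_patch (const ptid_model &ptid)
{
  return ptid.pid;
}

// Attach case: the current thread is not the leader (pid x, lwp x+1).
constexpr ptid_model attached = {1000, 1001, 0};
// Launch case: there is only one thread, so lwp == pid.
constexpr ptid_model launched = {1000, 1000, 0};

static_assert (tid_before_patch (attached) == 1001,
	       "old code probes the non-leader thread");
static_assert (tid_after_patch (attached) == 1000,
	       "patched code probes the thread group leader");
static_assert (tid_before_patch (launched) == tid_after_patch (launched),
	       "both versions agree when there is a single thread");
```

In this model the two versions differ only when the current thread is not the thread group leader, which is the situation the patch is meant to address.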
Comment 2 Tom de Vries 2021-05-07 08:44:50 UTC
Posted patch: https://sourceware.org/pipermail/gdb-patches/2021-May/178596.html
Comment 3 Sourceware Commits 2021-05-23 08:08:50 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=fbf3c4b97907cb198168f58e7a22d497868e5926

commit fbf3c4b97907cb198168f58e7a22d497868e5926
Author: Tom de Vries <tdevries@suse.de>
Date:   Sun May 23 10:08:45 2021 +0200

    [gdb/tdep] Use pid to choose process 64/32-bitness
    
    In a linux kernel mailing list discussion, it was mentioned that "gdb has
    this odd thing where it takes the 64-bit vs 32-bit data for the whole process
    from one thread, and picks the worst possible thread to do it (ie explicitly
    not even the main thread, ...)" [1].
    
    The picking of the thread is done here in
    x86_linux_nat_target::read_description:
    ...
      /* GNU/Linux LWP ID's are process ID's.  */
      tid = inferior_ptid.lwp ();
      if (tid == 0)
        tid = inferior_ptid.pid (); /* Not a threaded program.  */
    ...
    
    To understand what this code does, let's investigate a scenario in which
    inferior_ptid.lwp () != inferior_ptid.pid ().
    
    Say we start exec jit-attach-pie, identified with pid x.  The main thread
    starts another thread that sleeps, and then the main thread waits for the
    sleeping thread.  So we have two threads, identified with LWP IDs x and x+1:
    ...
    PID  LWP  CMD
    x    x    ./jit-attach-pie
    x    x+1  ./jit-attach-pie
    ...
    [ The thread with LWP x is known as the thread group leader. ]
    
    When attaching to this exec using the pid, gdb does a stop_all_threads which
    iterates over all the threads, first LWP x, and then LWP x+1.
    
    So the state we arrive with at x86_linux_nat_target::read_description is:
    ...
    (gdb) p inferior_ptid
    $1 = {m_pid = x, m_lwp = x+1, m_tid = 0}
    ...
    and consequently we probe 64/32-bitness from thread LWP x+1.
    
    [ Note that this is different from when gdb doesn't attach but instead
    launches the exec itself, in which case there's just one thread to begin with,
    and consequently the probed thread is LWP x. ]
    
    According to the aforementioned remark, a better choice would have been the
    main thread, that is, LWP x.
    
    This patch implements that choice, by simply doing:
    ...
      tid = inferior_ptid.pid ();
    ...
    
    The fact that gdb makes a per-process permanent choice for 64/32-bitness is a
    problem in itself: each thread can be in either 64 or 32 bit mode, and can
    switch back and forth.  That is a problem that this patch doesn't fix.
    
    Now finally: why does this matter in the context of the linux kernel
    discussion?  The discussion was related to a patch that exposed io_uring
    threads to user-space.  This made it possible that one of those threads would
    be picked out to select 64/32-bitness.  Given that such threads are atypical
    user-space threads in the sense that they don't return to user-space and don't
    have a userspace register state, reading their registers returns garbage, and
    so it could, for instance, occur that in a 64-bit process with all normal
    user-space threads in 64-bit mode, the probing would return 32-bit.
    
    It may be that this is worked around on the kernel side by providing userspace
    register state in those threads such that current gdb is happy.  Nevertheless,
    it seems prudent to fix this on the gdb side as well.
    
    Tested on x86_64-linux.
    
    [1] https://lore.kernel.org/io-uring/CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/
    
    gdb/ChangeLog:
    
    2021-05-23  Tom de Vries  <tdevries@suse.de>
    
            PR tdep/27822
            * target.h (struct target_ops): Mention target_thread_architecture in
            read_description comment.
            * x86-linux-nat.c (x86_linux_nat_target::read_description): Use
            pid to determine if process is 64-bit or 32-bit.
            * aarch64-linux-nat.c (aarch64_linux_nat_target::read_description):
            Same.
            * ppc-linux-nat.c (ppc_linux_nat_target::read_description): Same.
            * riscv-linux-nat.c (riscv_linux_nat_target::read_description): Same.
            * s390-linux-nat.c (s390_linux_nat_target::read_description): Same.
            * arm-linux-nat.c (arm_linux_nat_target::read_description): Same.
            Likewise, use pid to determine if kernel supports reading VFP
            registers.
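
The PID/LWP relationship described in the commit message can be observed on a Linux system by listing `/proc/PID/task`. The sketch below is only an illustration and is not part of the patch: `list_lwps` and `is_thread_group_leader` are hypothetical helper names, and the code assumes a Linux `/proc` filesystem is mounted.

```cpp
#include <dirent.h>
#include <unistd.h>

#include <cstdlib>
#include <string>
#include <vector>

// Return the LWP IDs listed under /proc/PID/task, or an empty vector
// if that directory cannot be opened.
std::vector<long>
list_lwps (long pid)
{
  std::vector<long> lwps;
  const std::string dir = "/proc/" + std::to_string (pid) + "/task";

  DIR *d = opendir (dir.c_str ());
  if (d == nullptr)
    return lwps;

  while (dirent *entry = readdir (d))
    {
      char *end = nullptr;
      long lwp = std::strtol (entry->d_name, &end, 10);

      // Skip "." and ".." and any other entry that is not a plain number.
      if (end != entry->d_name && *end == '\0')
	lwps.push_back (lwp);
    }

  closedir (d);
  return lwps;
}

// The thread group leader is the thread whose LWP ID equals the pid.
bool
is_thread_group_leader (long pid, long lwp)
{
  return lwp == pid;
}
```

Calling `list_lwps (getpid ())` from a multi-threaded program should return one entry per thread, and exactly one of them satisfies `is_thread_group_leader`; that is the thread the patched `read_description` probes.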
Comment 4 Tom de Vries 2021-05-23 08:11:20 UTC
Patch committed, marking resolved-fixed.