[PATCH] Limit the switch_to_thread() calls in startup_inferior()

Simon Marchi simark@simark.ca
Sun Oct 25 02:58:05 GMT 2020


On 2020-10-14 4:49 a.m., Kamil Rytarowski wrote:
> Do not jump over the threads during the startup unless we encounter
> TARGET_WAITKIND_STOPPED with SIGTRAP or TARGET_WAITKIND_EXECD.
>
> Otherwise whenever a startup-with-shell processes signals on the
> startup stage, it might indicate to switch to a non-existing
> thread or a special-thread number (target lwp=0 on NetBSD means
> that a signal was directed to all threads within a process).
>
> This caused a crash with tcsh on NetBSD, where the tcsh shell
> runs startup detection of the hostname. This action involves
> spwaning a new process through fork.
>
> GDB crashes this way:
> $ SHELL=tcsh /usr/bin/gdb echo
> (gdb) r
> Starting program: /bin/echo
> /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/thread.c:1309: internal-error: void switch_to_thread(thread_info*): Assertion `thr != NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
>
> And the core-dump is as follows:
>
> (top-gdb) bt
> During symbol reading: incomplete CFI data; unspecified registers (e.g., rax) at 0x7f7ff5b698a4
>     file=0x135314f "../../gdb/thread.c", line=1309, fmt=0x1353069 "%s: Assertion `%s' failed.", ap=0x7f7fffffdc08) at ../../gdb/utils.c:414
>     at ../../gdb/inf-ptrace.c:117
> args=0x0, from_tty=1, run_how=RUN_NORMAL) at ../../gdb/infcmd.c:493
> --Type <RET> for more, q to quit, c to continue without paging--

I tried to wrap my head around this and understand the need for the
switch_to_thread, what could be the impacts of your change, and I just
can't give anything close to a definitive answer.

I did a bit of archeology, and found that the switch_to_thread were
added by this patch:

  https://sourceware.org/pipermail/gdb-patches/2008-October/060950.html

Following this thread:

  https://sourceware.org/legacy-ml/gdb/2008-10/msg00034.html

Sooo... it sounds like the tid in the ptid made by gnu-nat.c is a bogus
integer (looks like it still is today, but it's in lwp instead of tid).
And the problem was because the tid increased at every exec.

So my theory is that these switch_to_thread were needed to keep the
inferior_ptid global sync'ed with the ptid gnu-nat.c knows about.

The failure reported in the thread above is this one:

  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/i386gnu-nat.c;h=3d71047eab7dd4184dd978bb7b351661f09beec5;hb=2e979b9404414113dc61b32eebcda8a53632605c#l124

You can see that back then, gnu_fetch_registers relied on inferior_ptid,
so it would make sense for this function to fail if inferior_ptid is
stale.  I just don't know where/when during the startup_inferior
function did gnu_fetch_registers get called (it would have been useful
to have a backtrace for that).  Today,
i386_gnu_nat_target::fetch_registers relies on the ptid of the regcache,
not inferior_ptid.  Perhaps it's not necessary to call switch_to_thread
at all today?  It's hard to say.

Samuel, as the last person who touched gnu-nat.c, do you have a bit of
time to test Kamil's patch on Hurd, see if it breaks starting inferiors?
And what if you remove all the switch_to_thread from that function, does
it still work?

Simon


More information about the Gdb-patches mailing list