[PATCH 2/2] glibc-2.34: Fix internal error when running gdb.base/gdb-sigterm.exp

Pedro Alves pedro@palves.net
Wed Jul 14 00:07:03 GMT 2021


On 2021-07-14 12:35 a.m., Kevin Buettner wrote:

> Pedro Alves <pedro@palves.net> wrote:

>> This isn't right.  The real bug is that inferior_ptid and current_thread_ got
>> out of sync.  Where did that happen?  What set inferior_ptid without using
>> switch_to_thread & friends?
> 
> inferior_ptid is temporarily set without changing the current thread in
> ps_xfer_memory in proc-service.c:
> 
> static ps_err_e
> ps_xfer_memory (const struct ps_prochandle *ph, psaddr_t addr,
> 		gdb_byte *buf, size_t len, int write)
> {
>   scoped_restore_current_inferior restore_inferior;
>   set_current_inferior (ph->thread->inf);
> 
>   scoped_restore_current_program_space restore_current_progspace;
>   set_current_program_space (ph->thread->inf->pspace);
> 
>   scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
>   inferior_ptid = ph->thread->ptid;
> 
>   CORE_ADDR core_addr = ps_addr_to_core_addr (addr);
> 
>   int ret;
>   if (write)
>     ret = target_write_memory (core_addr, buf, len);
>   else
>     ret = target_read_memory (core_addr, buf, len);
>   return (ret == 0 ? PS_OK : PS_ERR);
> }
> 
> Under normal circumstances, inferior_ptid is set and then either
> target_read_memory() or target_write_memory() are called after
> which it'll be reset to its former value when ps_xfer_memory
> returns.
> 
> However, should GDB receive a SIGTERM while this is happening,
> the QUIT processing code in target_read() will be taken, leading
> to an eventual call to inferior_thread where the assert occurs.
> 
> Here's the relevant portion of the backtrace on my F34 w/ glibc-2.33.9000
> machine:
> 
> #18 0x0000000000b6abf2 in internal_error (file=<optimized out>, 
>     line=<optimized out>, fmt=<optimized out>)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdbsupport/errors.cc:55
> #19 0x00000000009b72e2 in inferior_thread ()
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/thread.c:72
> #20 0x00000000009b8b2b in any_thread_of_inferior (inf=0x1603a60)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/thread.c:642
> #21 0x00000000009c45b8 in kill_or_detach (inf=0x1603a60, from_tty=0)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/top.c:1677
> #22 0x00000000009c49e7 in quit_force (exit_arg=0x0, from_tty=0)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/top.c:1779
> #23 0x0000000000a37e57 in quit ()
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/utils.c:629
> #24 0x0000000000a37eb3 in maybe_quit ()
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/utils.c:653
> #25 0x000000000099d8ef in target_read (ops=0x108b9c0 <the_thread_db_target>, 
>     object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7ffca2d41d90 "", 
>     offset=140737354129392, len=8)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:2025
> #26 0x000000000099d2e6 in target_read_memory (memaddr=0x7ffff7ffdff0, 
>     myaddr=0x7ffca2d41d90 "", len=8)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:1810
> #27 0x0000000000854fa0 in ps_xfer_memory (During symbol reading: const value length mismatch for 'sigall_set', got 128, expected 0
> ph=0x1b086d8, addr=0x7ffff7ffdff0, 
>     buf=0x7ffca2d41d90 "", len=8, write=0)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/proc-service.c:90
> #28 0x0000000000855160 in ps_pdread (ph=0x1b086d8, addr=0x7ffff7ffdff0, 
>     buf=0x7ffca2d41d90, size=8)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/proc-service.c:124
> #29 0x00007fc1240151f0 in _td_fetch_value (ta=ta@entry=0x1b08740, 
>     desc=desc@entry=0x1b087f0, descriptor_name=descriptor_name@entry=14, 
>     idx=idx@entry=0x0, address=<optimized out>, 
>     result=result@entry=0x7ffca2d41dc0) at fetch-value.c:115
> #30 0x00007fc124011bdd in td_ta_map_lwp2thr (ta_arg=0x1b08740, lwpid=1452077, 
>     th=0x7ffca2d41f90) at td_ta_map_lwp2thr.c:194
> #31 0x00000000007ab627 in thread_from_lwp (stopped=0x18588b0, ptid=...)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/linux-thread-db.c:416
> #32 0x00000000007ada46 in thread_db_target::wait (
>     this=0x108b9c0 <the_thread_db_target>, ptid=..., ourstatus=0x7ffca2d42578, 
>     options=...)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/linux-thread-db.c:1431
> #33 0x000000000099e953 in target_wait (ptid=..., status=0x7ffca2d42578, 
>     options=...)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:2608
> #34 0x0000000000756eaa in do_target_wait_1 (inf=0x1603a60, ptid=..., 
>     status=0x7ffca2d42578, options=...)
>     at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/infrun.c:3663

Ah.  This is sooo much clearer explained this way.

> 
> The key points are as follows:
> 
> 1) inferior_ptid and current_thread_ are set to null_ptid and nullptr
>    respectively via the call to switch_to_inferior_no_thread() in
>    do_target_wait_1().
> 2) Later on, while attempting to figure out the thread corresponding to an
>    LWP, ps_xfer_memory is called().
> 3) inferior_ptid is set (without also setting current_thread_) within
>    ps_xfer_memory().  Arrangements are made, however, to have
>    inferior_ptid's value restored upon return from ps_xfer_memory().

OK, this is one case where we can't use switch_to_thread, because we're in
really lower level code, and ps_xfer_memory may be reached even without a thread,
and also it's not a good idea to switch threads here because it flushes frame caches.

> 4) Lower level memory xfer functions are invoked from ps_xfer_memory().
> 5) A SIGTERM is delivered to GDB; the signal handler is invoked causing
>    sync_quit_force_run to be set.
> 6) The QUIT processing code in target_read_memory notices that sync_quit_force_run
>    is set.  It calls quit() which calls quit_force() which calls kill_or_detach().
> 7) Mayhem (well, an assert) occurs due to inferior_ptid being something other than
>    null_ptid while current_thread_ is still nullptr.

It looks to me that this sync_quit_force_run mechanism as is today is too dangerous.  We're
quite deep in the target implementation, with all sorts of state half switched around, we
can't really assume that it's safe to reenter kill_or_detach (and target_kill/target_detach)
like that.

Off hand, it seems to me that the right fix is to let the normal quit exception propagate all
the way to the top level, and then have the top level call quit_force if sync_quit_force_run
is set.


More information about the Gdb-patches mailing list