Bug 20743

Summary: can't usefully "continue" due to "ptrace: No such process" after gdb switches thread (gdb 7.11.1 on FreeBSD 11)
Product: gdb
Component: threads
Version: 7.11.1
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Target Milestone: ---
Reporter: misc-sourceware
Assignee: Not yet assigned to anyone <unassigned>
CC: jhb, misc-sourceware
Host:
Target:
Build:
Last reconfirmed:

Description misc-sourceware 2016-10-28 10:49:13 UTC
FreeBSD kitten 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep 29 01:43:23 UTC 2016     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

GNU gdb (GDB) 7.11.1 [GDB v7.11.1 for FreeBSD]


Steps to reproduce:

1. Attach to a running, multi-threaded process using --pid.
2. Set a breakpoint.
3. Continue (expecting the process to run until the breakpoint is hit).
4. Instead, gdb stops with "ptrace: No such process."
5. Continue again; the same error recurs on every attempt.


(gdb) inf thr
  Id   Target Id         Frame 
* 1    LWP 100638 of process 22406 0x000000080326e0da in _poll () from /lib/libc.so.7
(gdb) b blockchain_monitor.cpp:187
Breakpoint 1 at 0x4069df: file blockchain_monitor.cpp, line 187.
(gdb) c
Continuing.
[New LWP 101613 of process 22406]
[LWP 101613 of process 22406 exited]
[New LWP 101614 of process 22406]
[Switching to LWP 101614 of process 22406]
0x0000000802640a10 in ?? () from /lib/libthr.so.3
ptrace: No such process.
(gdb) inf thr
  Id   Target Id         Frame 
  1    LWP 100638 of process 22406 0x000000080326e0da in _poll () from /lib/libc.so.7
* 3    LWP 101614 of process 22406 0x0000000802640a10 in ?? () from /lib/libthr.so.3
(gdb) bt
#0  0x0000000802640a10 in ?? () from /lib/libthr.so.3
#1  0x00007fffdf9fc000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000
(gdb) c
Continuing.
[LWP 101614 of process 22406 exited]
[New LWP 101028 of process 22406]
[Switching to LWP 101028 of process 22406]
0x0000000802640a10 in ?? () from /lib/libthr.so.3
ptrace: No such process.
(gdb) c
Continuing.
[LWP 101028 of process 22406 exited]
[New LWP 100112 of process 22406]
[Switching to LWP 100112 of process 22406]
0x0000000802640a10 in ?? () from /lib/libthr.so.3
ptrace: No such process.
(gdb) 

[...and so on...]


Here is a more detailed "continue" on the same process as above, with debug output enabled:


(gdb) set debug fbsd-lwp on
(gdb) c
Continuing.
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: deleting thread for LWP 100482
[LWP 100482 of process 22406 exited]
FLWP: adding thread for LWP 100841
[New LWP 100841 of process 22406]
FLWP: fbsd_resume for ptid (-1, 0, 0)
[Switching to LWP 100841 of process 22406]
0x0000000802640a10 in ?? () from /lib/libthr.so.3
ptrace: No such process.
(gdb) set debug infrun 1
(gdb) c
Continuing.
infrun: clear_proceed_status_thread (LWP 100638 of process 22406)
infrun: clear_proceed_status_thread (LWP 100841 of process 22406)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 100841 of process 22406] at 0x802640a10
FLWP: fbsd_resume for ptid (-1, 0, 0)
infrun: prepare_to_wait
FLWP: deleting thread for LWP 100841
[LWP 100841 of process 22406 exited]
FLWP: adding thread for LWP 100499
[New LWP 100499 of process 22406]
infrun: target_wait (-1.0.0, status) =
infrun:   22406.100499.0 [LWP 100499 of process 22406],
infrun:   status->kind = spurious
infrun: TARGET_WAITKIND_SPURIOUS
infrun: Switching context from LWP 100841 of process 22406 to LWP 100499 of process 22406
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 100499 of process 22406] at 0x802640a10
FLWP: fbsd_resume for ptid (-1, 0, 0)
[Switching to LWP 100499 of process 22406]
0x0000000802640a10 in ?? () from /lib/libthr.so.3
ptrace: No such process.
(gdb)

Comment 1 misc-sourceware 2016-10-31 18:50:20 UTC
[also notifying FreeBSD port maintainer of this bug]

The crux of the issue seems to be that resume_all_threads_cb() in fbsd-nat.c tries to resume a thread that has already exited, which causes ptrace(PT_RESUME) to fail with "No such process". (As a side note, it doesn't matter which thread is current before the "continue" command, since gdb seems to switch to any newly spawned thread. Why is that?)

Exited threads can still be in the thread list when resume_all_threads_cb() is called, e.g. when the thread that exited was the current thread (the one in inferior_ptid).

To demonstrate this, add debug output to resume_all_threads_cb() as follows, so that it shows which thread it is about to resume and confirms which ptrace() call returns an error:

static int
resume_all_threads_cb (struct thread_info *tp, void *data)
{
  ptid_t *filter = (ptid_t *) data;

  if (!ptid_match (tp->ptid, *filter))
    return 0;

  if (debug_fbsd_lwp)
    fprintf_unfiltered (gdb_stdlog,
                        "FLWP: PT_RESUME for ptid (%d, %ld, %ld)\n",
                        ptid_get_pid (tp->ptid), ptid_get_lwp (tp->ptid),
                        ptid_get_tid (tp->ptid));

  if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1)
    perror_with_name (("ptrace PT_RESUME"));
  return 0;
}


Now the debugging output looks like this:

(gdb) set debug infrun 3
(gdb) set debug fbsd-lwp on
(gdb) c
Continuing.
infrun: clear_proceed_status_thread (LWP 101201 of process 35559)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 101201 of process 35559] at 0x8032880da
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
infrun: prepare_to_wait
FLWP: adding thread for LWP 101576
[New LWP 101576 of process 35559]
infrun: target_wait (-1.0.0, status) =
infrun:   35559.101576.0 [LWP 101576 of process 35559],
infrun:   status->kind = spurious
infrun: TARGET_WAITKIND_SPURIOUS
infrun: Switching context from LWP 101201 of process 35559 to LWP 101576 of process 35559
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 101576 of process 35559] at 0x80265aa10
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
FLWP: PT_RESUME for ptid (35559, 101576, 0)
infrun: prepare_to_wait
FLWP: deleting thread for LWP 101576
[LWP 101576 of process 35559 exited]
FLWP: adding thread for LWP 101586
[New LWP 101586 of process 35559]
infrun: target_wait (-1.0.0, status) =
infrun:   35559.101586.0 [LWP 101586 of process 35559],
infrun:   status->kind = spurious
infrun: TARGET_WAITKIND_SPURIOUS
infrun: Switching context from LWP 101576 of process 35559 to LWP 101586 of process 35559
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 101586 of process 35559] at 0x80265aa10
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
FLWP: PT_RESUME for ptid (35559, 101576, 0)
[Switching to LWP 101586 of process 35559]
0x000000080265aa10 in ?? () from /lib/libthr.so.3
ptrace PT_RESUME: No such process.
(gdb) 

Note the last few lines, which show a PT_RESUME call for LWP 101576, a thread that has already exited.


This may not be the ideal fix, but as a workaround, change the top of resume_all_threads_cb() to:

static int
resume_all_threads_cb (struct thread_info *tp, void *data)
{
  ptid_t *filter = (ptid_t *) data;

  /* Don't resume an exited thread.  */
  if (tp->state == THREAD_EXITED)
    return 0;

[existing code continues from here, starting with the ptid_match () check]
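
To illustrate the effect of that check outside gdb, here is a minimal, self-contained model in C. It is only a sketch: all of the names (model_thread, model_pt_resume, model_resume_all) are made up for illustration and are not gdb or FreeBSD APIs, and model_pt_resume merely imitates ptrace(PT_RESUME) failing with ESRCH ("No such process") for an LWP that is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for gdb's thread bookkeeping; these are not
   the real gdb types.  */
enum model_state { MODEL_STOPPED, MODEL_RUNNING, MODEL_EXITED };

struct model_thread
{
  long lwp;
  enum model_state state;
};

/* Imitates ptrace (PT_RESUME, lwp, ...): fails with ESRCH when the LWP
   no longer exists, otherwise marks the thread as running.  */
int
model_pt_resume (struct model_thread *tp)
{
  if (tp->state == MODEL_EXITED)
    {
      errno = ESRCH;
      return -1;
    }
  tp->state = MODEL_RUNNING;
  return 0;
}

/* Walk the list and resume every entry.  With SKIP_EXITED zero this
   behaves like the unpatched callback and stops at the first failing
   resume; with SKIP_EXITED nonzero it applies the work-around and
   ignores exited entries.  Returns 0 on success, -1 on failure.  */
int
model_resume_all (struct model_thread *threads, size_t count,
                  int skip_exited)
{
  size_t i;

  assert (threads != NULL || count == 0);

  for (i = 0; i < count; i++)
    {
      if (skip_exited && threads[i].state == MODEL_EXITED)
        continue;

      if (model_pt_resume (&threads[i]) == -1)
        return -1;
    }
  return 0;
}
```

With a list holding LWPs 101201 (stopped), 101576 (exited) and 101586 (stopped), calling model_resume_all() with skip_exited set to 0 fails at the second entry with errno set to ESRCH and never reaches the third, which matches the log above. With skip_exited set to 1, the two live threads are resumed and the call returns 0.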

Output showing that the workaround avoids the error:

(gdb) set debug infrun 3
(gdb) set debug fbsd-lwp on
(gdb) c
Continuing.
infrun: clear_proceed_status_thread (LWP 101201 of process 35559)
infrun: proceed (addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 101201 of process 35559] at 0x8032880da
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
infrun: prepare_to_wait
FLWP: adding thread for LWP 100444
[New LWP 100444 of process 35559]
infrun: target_wait (-1.0.0, status) =
infrun:   35559.100444.0 [LWP 100444 of process 35559],
infrun:   status->kind = spurious
infrun: TARGET_WAITKIND_SPURIOUS
infrun: Switching context from LWP 101201 of process 35559 to LWP 100444 of process 35559
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 100444 of process 35559] at 0x80265aa10
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
FLWP: PT_RESUME for ptid (35559, 100444, 0)
infrun: prepare_to_wait
FLWP: deleting thread for LWP 100444
[LWP 100444 of process 35559 exited]
FLWP: adding thread for LWP 101642
[New LWP 101642 of process 35559]
infrun: target_wait (-1.0.0, status) =
infrun:   35559.101642.0 [LWP 101642 of process 35559],
infrun:   status->kind = spurious
infrun: TARGET_WAITKIND_SPURIOUS
infrun: Switching context from LWP 100444 of process 35559 to LWP 101642 of process 35559
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 101642 of process 35559] at 0x80265aa10
FLWP: fbsd_resume for ptid (-1, 0, 0)
FLWP: PT_RESUME for ptid (35559, 101201, 0)
FLWP: PT_RESUME for ptid (35559, 101642, 0)
[...and so on...]

Comment 2 jhb 2016-12-22 01:42:13 UTC
I can confirm this. I have a similar patch; it uses is_exited(), patches both of the callbacks, and asserts that we never try to do a single-resume of an exited thread.

The reason gdb switches to new threads when they are created is that we always report a stop when a new thread arrives.  I could change this so that it only adds the thread without reporting a stop, but that gets messy if you are single-stepping across thread creation: in theory I would need to cache that info down in the fbsd nat layer and PT_SUSPEND the new thread before doing my own PT_CONTINUE.

Comment 3 Sourceware Commits 2017-04-18 16:50:01 UTC
The master branch has been updated by John Baldwin <jhb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d56060f08aa4ed5786042a066f62aa8e474cc0fd

commit d56060f08aa4ed5786042a066f62aa8e474cc0fd
Author: John Baldwin <jhb@FreeBSD.org>
Date:   Tue Apr 18 09:44:32 2017 -0700

    PR threads/20743: Don't attempt to suspend or resume exited threads.
    
    When resuming a native FreeBSD process, ignore exited threads when
    suspending/resuming individual threads prior to continuing the process.
    
    gdb/ChangeLog:
    
    	PR threads/20743
    	* fbsd-nat.c (resume_one_thread_cb): Remove.
    	(resume_all_threads_cb): Remove.
    	(fbsd_resume): Use ALL_NON_EXITED_THREADS instead of
    	iterate_over_threads.

Comment 4 Sourceware Commits 2017-04-18 16:53:41 UTC
The gdb-8.0-branch branch has been updated by John Baldwin <jhb@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=24b03ea864424cf8482ba07fb074389aa759e592

commit 24b03ea864424cf8482ba07fb074389aa759e592
Author: John Baldwin <jhb@FreeBSD.org>
Date:   Tue Apr 18 09:44:32 2017 -0700

    PR threads/20743: Don't attempt to suspend or resume exited threads.
    
    When resuming a native FreeBSD process, ignore exited threads when
    suspending/resuming individual threads prior to continuing the process.
    
    gdb/ChangeLog:
    
    	PR threads/20743
    	* fbsd-nat.c (resume_one_thread_cb): Remove.
    	(resume_all_threads_cb): Remove.
    	(fbsd_resume): Use ALL_NON_EXITED_THREADS instead of
    	iterate_over_threads.

Comment 5 jhb 2017-04-18 16:55:19 UTC
The fix is committed to master and the 8.0 branch and will appear in the 8.0 release.
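
As a rough illustration of the approach named in the ChangeLog (iterating only over non-exited threads instead of filtering inside a callback), here is a small self-contained C sketch. SKETCH_ALL_NON_EXITED, sketch_thread and sketch_resume are hypothetical names used only for this example; the actual ALL_NON_EXITED_THREADS macro and fbsd_resume are in the gdb sources and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real gdb types.  */
enum sketch_state { SKETCH_STOPPED, SKETCH_RUNNING, SKETCH_EXITED };

struct sketch_thread
{
  long lwp;
  enum sketch_state state;
  struct sketch_thread *next;
};

/* Iterate T over the linked list LIST, running the following
   statement only for entries that have not exited.  Always use a
   braced body to avoid dangling-else surprises.  */
#define SKETCH_ALL_NON_EXITED(T, LIST)                      \
  for ((T) = (LIST); (T) != NULL; (T) = (T)->next)          \
    if ((T)->state != SKETCH_EXITED)

/* Mark every live thread in LIST as running and return how many
   were resumed.  Exited entries are never touched.  */
size_t
sketch_resume (struct sketch_thread *list)
{
  struct sketch_thread *tp;
  size_t resumed = 0;

  SKETCH_ALL_NON_EXITED (tp, list)
    {
      /* The filter above guarantees we never resume an exited thread.  */
      assert (tp->state != SKETCH_EXITED);
      tp->state = SKETCH_RUNNING;
      resumed++;
    }
  return resumed;
}
```

The design point is that the exited-thread check lives in one place, the iteration itself, so each loop body can assume it only ever sees live threads.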