I wrote a GDB test that attaches to a program that constantly spawns
short-lived threads, which exposed several issues. This is one of them.

GDB sometimes prints out a warning like:

  ...
  [New LWP 20700]
  warning: unable to open /proc file '/proc/-1/status'
  [New LWP 20850]
  [New LWP 21019]
  ...

That happens because when a thread exits, and is joined, glibc does:

  nptl/pthread_join.c:
  pthread_join ()
  {
  ...
    if (__glibc_likely (result == 0))
      {
        /* We mark the thread as terminated and as joined. */
        pd->tid = -1;
  ...
        /* Free the TCB. */
        __free_tcb (pd);
      }

So if we stop the inferior at just the right time, and list threads with
td_ta_thr_iter / td_thr_get_info, we can find threads with kernel thread
ID -1 (td_thrinfo_t.ti_lid == -1).

Unfortunately, td_thrinfo_t.ti_state claims the thread is TD_THR_ACTIVE.
Turns out the set of states td_thr_get_info returns isn't very complete:

  td_thr_get_info ()
  {
    if ((((int) (uintptr_t) cancelhandling) & EXITING_BITMASK) == 0)
      /* XXX For now there is no way to get more information. */
      infop->ti_state = TD_THR_ACTIVE;
    else if ((((int) (uintptr_t) cancelhandling) & TERMINATED_BITMASK) == 0)
      infop->ti_state = TD_THR_ZOMBIE;
    else
      infop->ti_state = TD_THR_UNKNOWN;
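To make the gap in that state mapping concrete, here is a minimal standalone sketch of the same decision structure. The mask values and the sketch_* names are assumptions made for illustration only (the real definitions live in glibc's nptl sources); the point is merely that the thread's tid is never consulted, so an entry whose tid is -1 but whose exiting bit is clear is still reported as active.

```c
/* Assumed bit values, for illustration only; glibc defines the real
   EXITING_BITMASK / TERMINATED_BITMASK in its nptl headers.  */
#define SKETCH_EXITING_BITMASK    0x10
#define SKETCH_TERMINATED_BITMASK 0x20

/* Hypothetical stand-ins for the td_thr_state_e values.  */
enum sketch_state { SKETCH_UNKNOWN, SKETCH_ACTIVE, SKETCH_ZOMBIE };

/* Same if/else chain as the td_thr_get_info excerpt.  Only the
   cancelhandling word is inspected; the tid plays no part.  */
static enum sketch_state
sketch_classify (int cancelhandling)
{
  if ((cancelhandling & SKETCH_EXITING_BITMASK) == 0)
    return SKETCH_ACTIVE;
  else if ((cancelhandling & SKETCH_TERMINATED_BITMASK) == 0)
    return SKETCH_ZOMBIE;
  else
    return SKETCH_UNKNOWN;
}
```

With these assumed masks, a cancelhandling word of 0 classifies as active regardless of what the descriptor's tid field holds.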
I'll add a special case to GDB: ignore threads with ti_lid == -1.
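Roughly, the special case amounts to a filter like the one below. This is only a sketch, not the actual GDB change: struct sketch_thrinfo is a hypothetical stand-in for the one td_thrinfo_t field that matters here, and the helper names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for td_thrinfo_t, reduced to the field the
   check needs.  */
struct sketch_thrinfo
{
  long ti_lid;   /* kernel thread ID as reported by libthread_db */
};

/* Return nonzero if the debugger should ignore this entry, i.e. it
   carries the -1 marker left behind by pthread_join.  */
static int
sketch_skip_thread (const struct sketch_thrinfo *ti)
{
  return ti->ti_lid == -1;
}

/* Count the entries a debugger would keep after filtering.  */
static size_t
sketch_count_usable (const struct sketch_thrinfo *list, size_t n)
{
  size_t kept = 0;

  for (size_t i = 0; i < n; i++)
    if (!sketch_skip_thread (&list[i]))
      kept++;
  return kept;
}
```

In a real thread-iteration callback the equivalent of sketch_skip_thread would run before any attempt to open the thread's /proc entry, which is what avoids the warning.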
I've investigated this some more. I noticed that the thread's state is
not actually TD_THR_ACTIVE just after the thread is joined, before the
thread is removed from the thread list, here, in the code I pasted
before:

  nptl/pthread_join.c:
  pthread_join ()
  {
  ...
    if (__glibc_likely (result == 0))
      {
        /* We mark the thread as terminated and as joined. */
        pd->tid = -1;
  ...
        /* Free the TCB. */
        __free_tcb (pd);
      }

But, I _am_ seeing TD_THR_ACTIVE threads with pd->tid == -1.

Turns out that nothing in __free_tcb clears pd->tid. So later on, when a
new thread reuses the old thread's tcb/stack, the new thread will start
out with tid==-1 (reused from the old thread), up until the kernel
actually starts the new clone and fills in tid (CLONE_CHILD_SETTID), and
it's _that_ thread that has TD_THR_ACTIVE state.

I don't think a new state for when the thread is already listed in the
thread list but doesn't have a kernel clone associated yet could help
here, as a debugger can always attach between glibc changing the thread
state and the kernel filling in the clone's tid.

This made me wonder what happens if a detached thread's tcb/stack is
reused. Or, if a new stack is allocated for a new thread, instead of
reused, and gdb lists threads before the kernel spawns the new clone. In
that case, the thread's tid field starts out as 0. So I thought that
just like GDB can see threads with tid=-1, it should also find them with
tid=0 as well. But, turns out it doesn't, because
nptl_db/td_thr_get_info.c:td_thr_get_info has this:

  /* Initialization which are the same in both cases. */
  infop->ti_ta_p = th->th_ta_p;
  infop->ti_lid = tid == 0 ? ps_getpid (th->th_ta_p->ph) : (uintptr_t) tid;
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  infop->ti_traceme = report_events != 0;

Eh. So ti_lid (same as pd->tid inside the inferior) is never returned as
zero. Instead, for threads that are just being created, GDB is told that
their kernel thread ID is the overall thread group id. But this is wrong.
This can well confuse GDB if it decides to refresh its own thread state
cache, since given NPTL's 1:1 model, GDB only keeps track of threads by
their kernel ID.

(I'm guessing that the intent here was that tid == 0 indicates that this
is the main thread and the pthread library isn't fully initialized yet,
in which case the tgid would be correct.)
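To illustrate why the tid == 0 substitution is a problem, here is a small sketch of that mapping and of the collision it produces. The sketch_* names are hypothetical, and plain long values stand in for the real lwpid and pid types; this models only the quoted expression, not the rest of td_thr_get_info.

```c
/* Mirrors the quoted ti_lid expression: a zero tid is replaced by the
   process (thread group) ID; any other value passes through.  */
static long
sketch_reported_lid (long tid, long pid)
{
  return tid == 0 ? pid : tid;
}

/* Nonzero if a thread that is not the thread group leader would
   nevertheless be reported with the leader's kernel ID.  */
static int
sketch_collides_with_leader (long tid, long pid)
{
  return tid != pid && sketch_reported_lid (tid, pid) == pid;
}
```

Under this model, a thread that is still being created (tid 0) is indistinguishable from the main thread for a debugger that keys its thread list by kernel ID, while a stale tid of -1 passes through unchanged.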