This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
Re: fail to attach to process on Solaris
On Wednesday 21 September 2011 00:22:21, Burkhardt, Glenn wrote:
> The problem appears to be that the thread debug library has a callback for the
> register get operation that's connected to "sol-thread.c:ps_lgetregs()". In
> the case that fails, the thread exists, but the calling sequence tries to
> look up registers for an LWP with the same ID as the thread.
This is Solaris 9, with the default 1:1 model thread library, right?
> #0 find_procinfo_or_die (pid=12276, tid=67) at procfs.c:489
> #1 0x000a1cd0 in procfs_fetch_registers (ops=0x7293d8, regcache=0x71b1d0, regnum=-1) at procfs.c:3483
> #2 0x0012feec in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b1d0, regnum=-1) at sol-thread.c:457
> #3 0x00231af0 in target_fetch_registers (regcache=0x71b1d0, regno=-1) at target.c:3417
> #4 0x00130e48 in ps_lgetregs (ph=0x700998, lwpid=67, gregset=0xffbfe37c) at sol-thread.c:923
> #5 0xff0735dc in td_thr_getgregs () from /usr/lib/libthread_db.so.1
> #6 0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0, regnum=68) at sol-thread.c:473
But what is the rest of the stack trace? IOW, where's this being
called from?
>
> For this stack trace of 'gdb', 'sol_thread_fetch_registers()' is passed
> the following regcache:
>
> (gdb) frame
> #6 0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0, regnum=68) at sol-thread.c:473
> 473 val = p_td_thr_getgregs (&thandle, gregset);
> (gdb) p *regcache
> $24 = {descr = 0x84fc40, aspace = 0x7aa258, registers = 0x846c48 "",
> register_status = 0x14f37c0 "", readonly_p = 0, ptid = {pid = 12276,
> lwp = 0, tid = 67}}
>
> So it's looking for registers from a thread that's not associated with
> an LWP. But the function 'ps_lgetregs()' is always looking for the
> registers on the LWP list.
>
> I can't see how the callback 'ps_lgetregs()' is connected to the thread
> debug library. In fact, the documentation for the thread debug library
> seems sparse. I've only been able to find out about it in the man pages
> and the comments in sol-thread.c, so any pointers to documentation
> would be helpful.
That's about all there is... Luckily or not, glibc copied the same
interface out of Solaris, so people who understand the Linux version
can understand the Solaris one with ease.

Older Solaris versions supported an M:N thread model, where multiple
user space threads would be mapped to the same kernel thread (LWP), and
sometimes to no kernel thread (LWP) at all (when they were idle).
libthread_db.so is a system-provided library that debuggers load into
their own address space; it serves as a bridge between user threads and
however they're mapped underneath.

So in this case, GDB wants to fetch the registers of some thread. It
asks libthread_db.so for its registers. libthread_db.so internally knows
that the thread is mapped onto LWP 67, and to serve GDB's initial
request, it needs to fetch the registers of LWP 67. libthread_db.so
can't read registers off of an LWP itself, but the debugger client can.
So libthread_db.so calls back into the debugger through the `ps_lgetregs'
function of the proc_service interface (see man ps_lgetregs).

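To make the bridge concrete, here is a minimal C model of it. Everything in it is hypothetical: the names, the mapping table and the error codes are made up for illustration and are not the real interfaces from <thread_db.h> and <proc_service.h>. The "library" half privately knows the thread-to-LWP mapping (an lwpid of 0 stands for an idle M:N thread sitting on no LWP); the "debugger" half is the only code that can read an LWP's registers, and the library reaches it only through a callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the libthread_db <-> debugger bridge.
   None of these names belong to the real Solaris or GDB API. */

typedef struct { long pc; long sp; } mock_gregset;  /* stand-in register set */

typedef enum { MOCK_OK, MOCK_NO_THREAD, MOCK_NO_LWP, MOCK_ERR } mock_err;

/* Callback the debugger hands to the thread library; plays the role of
   proc_service's ps_lgetregs(): "read the registers of this LWP". */
typedef mock_err (*lgetregs_fn)(long lwpid, mock_gregset *regs);

/* Thread-library side: private knowledge of how user threads map onto
   LWPs.  lwpid == 0 models an idle M:N thread that is on no LWP. */
struct thread_map { long tid; long lwpid; };
static const struct thread_map thread_table[] = {
    { 1, 1 }, { 67, 67 }, { 68, 0 }, { 69, 70 },
};

/* Plays the role of td_thr_getgregs(): translate tid -> LWP, then ask the
   debugger, through the callback, to read that LWP's registers. */
mock_err mock_thr_getgregs(long tid, lgetregs_fn lgetregs, mock_gregset *regs)
{
    assert(lgetregs != NULL && regs != NULL);
    for (size_t i = 0; i < sizeof thread_table / sizeof thread_table[0]; i++) {
        if (thread_table[i].tid != tid)
            continue;
        if (thread_table[i].lwpid == 0)
            return MOCK_NO_LWP;         /* this sketch just reports the case */
        return lgetregs(thread_table[i].lwpid, regs);
    }
    return MOCK_NO_THREAD;
}

/* Debugger side: the only component that can read an LWP's registers.
   It searches its own list of known LWPs, much as procfs.c does. */
static const long known_lwps[] = { 1, 67 };

mock_err debugger_lgetregs(long lwpid, mock_gregset *regs)
{
    for (size_t i = 0; i < sizeof known_lwps / sizeof known_lwps[0]; i++) {
        if (known_lwps[i] == lwpid) {
            regs->pc = 0x1000 + lwpid;  /* fake values for illustration */
            regs->sp = 0x8000 + lwpid;
            return MOCK_OK;
        }
    }
    return MOCK_ERR;  /* LWP unknown to the debugger: fails in the callback */
}
```

With these tables, asking for thread 67 succeeds because the debugger knows LWP 67, while thread 69 maps to LWP 70, which the debugger has no record of, so the failure surfaces inside the callback rather than in the library. That is roughly the shape of the backtrace above.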
ps_lgetregs ends up recursing into sol_thread_fetch_registers, but this
time inferior_ptid points directly at an LWP, so we just hand the
request to the LWP support layer in procfs.c. It's at this point that
things are failing for some reason.

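The recursion can be modelled the same way. This sketch, again with made-up names and not GDB's actual sol-thread.c code, mirrors the {pid, lwp, tid} triple from the regcache dump: a ptid with a nonzero tid names a user thread and goes through the thread library, which (1:1 model, so LWP id equals thread id) calls back; the callback rewrites the current ptid to name the LWP and re-enters the fetch routine, which this time hands off to a procfs stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the dispatch; not GDB's real code. */
typedef struct { long pid; long lwp; long tid; } model_ptid;

model_ptid model_inferior_ptid;           /* stand-in for inferior_ptid */
int model_depth;                          /* how many times fetch was entered */
static const long procfs_lwps[] = { 1 };  /* LWPs the procfs stand-in knows */

/* Stand-in for the procfs layer: succeeds only for LWPs it has a record
   of (the real code dies in find_procinfo_or_die instead). */
static bool procfs_fetch(model_ptid p)
{
    for (size_t i = 0; i < sizeof procfs_lwps / sizeof procfs_lwps[0]; i++)
        if (procfs_lwps[i] == p.lwp)
            return true;
    return false;
}

bool model_fetch_registers(void);

/* Stand-in for ps_lgetregs: point the current ptid at the LWP, recurse,
   then restore the original thread ptid. */
static bool model_lgetregs(long lwpid)
{
    model_ptid saved = model_inferior_ptid;
    assert(lwpid != 0);
    model_inferior_ptid = (model_ptid){ saved.pid, lwpid, 0 };
    bool ok = model_fetch_registers();
    model_inferior_ptid = saved;
    return ok;
}

/* Stand-in for sol_thread_fetch_registers. */
bool model_fetch_registers(void)
{
    model_depth++;
    if (model_inferior_ptid.tid == 0)     /* already an LWP: go to procfs */
        return procfs_fetch(model_inferior_ptid);
    /* A user thread: the thread library maps it to an LWP (1:1 here) and
       calls back through the ps_lgetregs stand-in. */
    return model_lgetregs(model_inferior_ptid.tid);
}
```

Starting from {12276, 0, 67}, model_fetch_registers() is entered twice and fails in procfs_fetch(), because LWP 67 is not in procfs_lwps; the same walk for thread 1 succeeds. The open question is why the real procfs layer has no record of LWP 67.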
So the next step would be to find out whether LWP 67 really still exists
at the failure point. Can you check that by peeking at /proc/... from the
command line? Maybe the LWP had just exited while GDB was attaching to the
process, but GDB hadn't processed the exit event yet? Or has GDB gotten the
thread->lwp id mapping wrong somewhere?
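If you'd rather script the check, here is a small C helper. It assumes the Solaris proc(4) layout, in which each LWP of process <pid> has its own directory /proc/<pid>/lwp/<lwpid>; the function names are hypothetical, and the root parameter only exists so the path logic can be tested off-Solaris.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Build "<root>/<pid>/lwp/<lwpid>", assuming the Solaris proc(4) layout
   with one directory per LWP.  Returns false if buf is too small. */
bool lwp_dir_path(char *buf, size_t len, const char *root, long pid, long lwpid)
{
    assert(buf != NULL && root != NULL);
    int n = snprintf(buf, len, "%s/%ld/lwp/%ld", root, pid, lwpid);
    return n >= 0 && (size_t)n < len;
}

/* True if the per-LWP directory exists under root. */
bool lwp_exists(const char *root, long pid, long lwpid)
{
    char path[256];
    struct stat st;

    if (!lwp_dir_path(path, sizeof path, root, pid, lwpid))
        return false;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Calling lwp_exists("/proc", 12276, 67) at the failure point and getting false back would point toward the LWP-already-exited theory rather than a bad thread->lwp mapping.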
--
Pedro Alves