[non-stop] 08/10 linux native support

Wed Jun 25 23:08:00 GMT 2008

On Wed, Jun 25, 2008 at 10:17:25PM +0100, Pedro Alves wrote:
> > This may be trouble.  Sometimes the thread state is not
> > atomically updated, so peeking at it right after creation but before
> > an event can fail.
> >
> 
> Oh, that's not nice.  Is this something that's worthwhile and/or
> possible to fix in libthread_db?

I don't remember whether it's fixed in current libthread_db, or
whether it's impossible to fix due to the kernel interfaces involved.
There's tension between having the thread on the list early enough and
having its entry be correct.  I know I wrote a related kernel patch,
which was never merged.  libthread_db is better about this than it
used to be, though.

> > Why is it necessary?  We already know the ptid since we made them
> > independent of thread_db TID some time ago.  attach_thread should cope
> > if the thread is already in GDB's thread list when the event
> > eventually arrives.  So we should be able to just add the new
> > thread directly.
> 
> That's right, the only thing we'll miss if we do that, is the
> thread_db id of the thread in output like:
> 
> [New Thread 0xf7e11b90 (LWP 26100)]
>              ^^^^^^^^
> And info threads:
> 
>   2 Thread 0xf7e11b90 (LWP 26100)  (running)
>              ^^^^^^^^
> 
> Those will only show up on the next stop event (of any thread).
> It may take a while if all threads are running (unless we do the
> momentarily-stop-threads trick).

Oh, dear.  Options:

  - delay the notification until thread_db discovers the thread,
    if libthread_db is already active

  - display the notification without the thread ID; we'll have the
    LWP ID and we could add the GDB thread number

  - go with your code and fix broken situations as they arise

I'm undecided.  Note that your code is unnecessarily quadratic, by the
way.  It'll walk the entire thread list for every new thread; we could
just load the new thread, since we know its LWP ID.  libthread_db may
still do a walk internally in that case, though...
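
To put rough numbers on the "quadratic" remark, here is a toy cost
model.  It is not GDB or libthread_db code, and the function names are
made up for illustration.  It only counts how many thread entries get
visited when N threads appear one at a time, comparing a full walk per
new-thread event with a per-LWP lookup (such as libthread_db's
td_ta_map_lwp2thr), and it ignores any walk libthread_db might do
internally.

```c
/* Toy model (hypothetical names): count thread entries visited while
   N threads are discovered one at a time.  */

/* Strategy A: every new-thread event re-walks the whole thread list,
   so the k-th event visits k entries.  */
unsigned long work_full_walk(unsigned long nthreads)
{
    unsigned long visited = 0;
    for (unsigned long known = 1; known <= nthreads; known++)
        visited += known;
    return visited;
}

/* Strategy B: every new-thread event loads only the new thread,
   identified by its already-known LWP ID.  */
unsigned long work_direct(unsigned long nthreads)
{
    unsigned long visited = 0;
    for (unsigned long known = 1; known <= nthreads; known++)
        visited += 1;
    return visited;
}
```

For 100 threads the full-walk strategy visits 5050 entries against 100
for the direct one; the gap grows as N*(N+1)/2 versus N.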

> > SIGKILL should work even if the thread is stopped.
> 
> I think I'll need a SIGCONT as well in that case.  For some
> reason, I wasn't getting that to work all the time.  I'll
> experiment some more.

Kernels may vary in this regard.  Your code seems reasonable.
PTRACE_KILL is supposed to be just SIGKILL + PTRACE_CONT, and SIGKILL
is supposed to work even on stopped processes, but the details come
and go... as you know, signal handling is a very touchy area and hard
to write tests for.
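
A minimal sketch of the "SIGKILL works on a stopped process" claim,
for experimenting on a given kernel.  It assumes a POSIX system, and
the function name is made up for illustration.  Note the caveat: the
child here is in an ordinary job-control stop (SIGSTOP), not a ptrace
stop, so it does not settle the PTRACE_KILL question above; it only
checks the simpler case, with no SIGCONT sent.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself, then SIGKILL it while stopped.
   Returns the signal that terminated the child, or -1 on any error.
   (Hypothetical helper, for illustration only.)  */
int kill_stopped_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);     /* child: enter a job-control stop */
        _exit(0);           /* only reached if someone sends SIGCONT */
    }

    /* Parent: wait until the child is reported as stopped.  */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL); /* best-effort cleanup */
        waitpid(pid, &status, 0);
        return -1;
    }

    /* Kill the stopped child without sending SIGCONT first.  */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    /* Reap it and report how it died.  */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

If the function returns SIGKILL, the stopped child died without needing
a SIGCONT.  Repeating the experiment with a traced child would be the
closer analogue to the GDB case.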

-- 
Daniel Jacobowitz
CodeSourcery