[RFC]: fix for recycled thread ids

Jeff Johnston jjohnstn@redhat.com
Thu Mar 25 00:46:00 GMT 2004


Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> 
>> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:
>>
>>> Fortunately, in <gnu/libc-version.h> there is a function to return the
>>> runtime version of glibc.  We should be able to use that - and the not
>>> 100% valid, but generally valid and already assumed by thread_db,
>>> assumption that a native GDB, when used to debug native programs, is
>>> debugging the same version of the C library - to enable TD_DEATH when
>>> it is safe to do so.  This will let us detach the threads when they
>>> die.
>>>
>>> That has its own risks, since the thread continues to run for a short
>>> while after the death event is reported.  For instance, in NPTL the
>>> thread reports the event and then cleans up after itself; in LT I don't
>>> remember whether the manager or the thread does this, but I think it's
>>> the same.  I already wrote limited code to handle this, if you search
>>> for "zombie" in thread-db.c, so it should be OK.  The gist is that we
>>> remove it from the thread list right away, but do not detach the
>>> thread.  We resist attaching to zombies.
>>>
>>> At least, all that is how it looks to me.  I'll experiment with
>>> TD_DEATH before I speculate further.
>>
>>
>>
>> Here's a proof-of-concept patch.  It is not quite right, in that now I
>> can hit C-c about four times before I get the error().  It does
>> illustrate what I was suggesting, though, minus the version checking.
>>
>> The reason we still get the segfaults is not that surprising.  I spent
>> a little while coaxing it into producing a core dump, and got this:
>>
>> (gdb) i threads
>>   7 process 10303  0x40034531 in __nptl_create_event () from 
>> /lib/tls/i686/cmov/libpthread.so.0
>>   6 process 12467  0xffffe410 in __kernel_vsyscall ()
>>   5 process 12468  0xffffe410 in __kernel_vsyscall ()
>>   4 process 12469  0x0804884e in threadfunc (arg=0x25) at lphello.c:64
>>   3 process 12470  0x40034540 in __nptl_death_event () from 
>> /lib/tls/i686/cmov/libpthread.so.0
>>   2 process 12471  0xffffe410 in __kernel_vsyscall ()
>> * 1 process 12472  0xffffe410 in __kernel_vsyscall ()
>>
>> So one thread is running, four are probably sleeping, one is dying...
>> and one is creating another thread.  I'd bet money that we aren't
>> attached to the one being created yet.  I can produce all sorts of
>> other interesting GDB internal errors this way, too.
>>
> 
> I also tried the TD_DEATH method but gave up after running into 
> segfaults.  I have to go back and recheck whether those segfaults could 
> occur without the Crtl-C being issued.  I can't remember off-hand.
> 

Just to confirm, I don't see any segfaults just running, but this code is 
extremely brittle on my RHEL3-U1 system.  It is coming back to me why I 
abandoned TD_DEATH.

On the first Ctrl-C I usually get:

Program received signal SIGINT, Interrupt.
[Switching to Thread -1226028112 (LWP 16580)]
0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) info threads
[New Thread -1226163280 (LWP 16579)]
Can't attach LWP 16579: Operation not permitted
(gdb) info threads
[New Thread -1226298448 (LWP 16578)]
Can't attach LWP 16578: Operation not permitted
(gdb) info threads
   158 Thread -1226298448 (LWP 16578)  0xb75ce6b1 in __nptl_death_event ()
    from /lib/tls/libpthread.so.0
   157 Thread -1226163280 (LWP 16579)  0xb75ce6b1 in __nptl_death_event ()
    from /lib/tls/libpthread.so.0
* 156 Thread -1226028112 (LWP 16580)  0xb75ebc32 in _dl_sysinfo_int80 ()
    from /lib/ld-linux.so.2
   1 Thread -1218598080 (LWP 16423)  0xb75ce6a1 in __nptl_create_event ()
    from /lib/tls/libpthread.so.0
(gdb) c

After continuing, the program will go shortly and then die.

[Thread -1226163280 (zombie) exited]
[New Thread -1226433616 (LWP 16674)]
Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic 
error

My original fix doesn't break near as easily on RHEL-U1.  Hitting enough 
Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but 
this may be part of catching the race condition before we know about the thread.
Now, it is more than likely just a bug in the TD_DEATH event processing because 
we haven't exercised it.  Would it be worth looking at implementing your 
original alternative for my patch?

You asked me in a previous note to run with strace on and confirm that I was not 
getting WIFEXITTED for the dying threads.  I confirmed this.  I only get one 
WIFEXITTED and that is for the main thread.

I still need to do the Eclipse test.

-- Jeff J.


>> Does this patch fix the problems you were seeing with less pathological
>> test cases?  Eclipse, in this case, is a less pathological test case,
>> since it creates threads that actually do a bit of work.  If it does,
>> I'll add the necessary glibc version checking.  The only way to fix the
>> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
>> what it was for), but that will require some major surgery.
>>
> 
> As you know, I am also looking at the PTRACE_EVENT_CLONE stuff.  I'll 
> try this patch out with the Eclipse test.  I also will include your 
> lin-lwp linux event handler fix to see if that gets past the fork 
> problem they were seeing.
> 
>> Index: thread-db.c
>> ===================================================================
>> RCS file: /cvs/src/src/gdb/thread-db.c,v
>> retrieving revision 1.37
>> diff -u -p -r1.37 thread-db.c
>> --- thread-db.c    29 Feb 2004 02:39:47 -0000    1.37
>> +++ thread-db.c    24 Mar 2004 16:47:23 -0000
>> @@ -1,6 +1,6 @@
>>  /* libthread_db assisted debugging support, generic parts.
>>  
>> -   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
>> +   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
>>  
>>     This file is part of GDB.
>>  
>> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
>>  static void thread_db_find_new_threads (void);
>>  static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>>                 const td_thrinfo_t *ti_p, int verbose);
>> +static void detach_thread (ptid_t ptid, int verbose);
>>  
>>  
>>  /* Building process ids.  */
>> @@ -154,6 +155,8 @@ struct private_thread_info
>>    unsigned int th_valid:1;
>>    unsigned int ti_valid:1;
>>  
>> +  unsigned int dying:1;
>> +
>>    td_thrhandle_t th;
>>    td_thrinfo_t ti;
>>  };
>> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
>>    /* Set the process wide mask saying which events we're interested 
>> in.  */
>>    td_event_emptyset (&events);
>>    td_event_addset (&events, TD_CREATE);
>> -#if 0
>> +#if 1
>>    /* FIXME: kettenis/2000-04-23: The event reporting facility is
>>       broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
>>       now.  */
>> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
>>    struct thread_info *tp;
>>    td_err_e err;
>>  
>> +      /* We may already know about this thread, for instance when the
>> +         user has issued the `info threads' command before the SIGTRAP
>> +         for hitting the thread creation breakpoint was reported.  */
>> +  if (in_thread_list (ptid))
>> +    {
>> +      tp = find_thread_pid (ptid);
>> +      gdb_assert (tp != NULL);
>> +
>> +      if (!tp->private->dying)
>> +        return;
>> +
>> +      delete_thread (ptid);
>> +    }
>> +
>>    check_thread_signals ();
>>  
>>    /* Add the thread to GDB's thread list.  */
>> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
>>  static void
>>  detach_thread (ptid_t ptid, int verbose)
>>  {
>> +  struct thread_info *thread_info;
>> +
>>    if (verbose)
>>      printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
>> +
>> +  /* Don't delete the thread now, because it still reports as active 
>> until
>> +     it has executed a few instructions after the event breakpoint.  */
>> +  thread_info = find_thread_pid (ptid);
>> +  gdb_assert (thread_info != NULL);
>> +  thread_info->private->dying = 1;
>>  }
>>  
>>  static void
>> @@ -848,11 +873,7 @@ check_event (ptid_t ptid)
>>      {
>>      case TD_CREATE:
>>  
>> -      /* We may already know about this thread, for instance when the
>> -         user has issued the `info threads' command before the SIGTRAP
>> -         for hitting the thread creation breakpoint was reported.  */
>> -      if (!in_thread_list (ptid))
>> -        attach_thread (ptid, msg.th_p, &ti, 1);
>> +      attach_thread (ptid, msg.th_p, &ti, 1);
>>  
>>        break;
>>  
>> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
>>  
>>    ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
>>  
>> -  if (!in_thread_list (ptid))
>> -    attach_thread (ptid, th_p, &ti, 1);
>> +  attach_thread (ptid, th_p, &ti, 1);
>>  
>>    return 0;
>>  }
>>
>>
> 
> 



More information about the Gdb-patches mailing list