[RFC]: fix for recycled thread ids
Jeff Johnston
jjohnstn@redhat.com
Thu Mar 25 00:46:00 GMT 2004
Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
>
>> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:
>>
>>> Fortunately, in <gnu/libc-version.h> there is a function to return the
>>> runtime version of glibc. We should be able to use that - and the not
>>> 100% valid, but generally valid and already assumed by thread_db,
>>> assumption that a native GDB, when used to debug native programs, is
>>> debugging the same version of the C library - to enable TD_DEATH when
>>> it is safe to do so. This will let us detach the threads when they
>>> die.
>>>
>>> That has its own risks, since the thread continues to run for a short
>>> while after the death event is reported. For instance, in NPTL the
>>> thread reports the event and then cleans up after itself; in LT I don't
>>> remember whether the manager or the thread does this, but I think it's
>>> the same. I already wrote limited code to handle this, if you search
>>> for "zombie" in thread-db.c, so it should be OK. The gist is that we
>>> remove it from the thread list right away, but do not detach the
>>> thread. We resist attaching to zombies.
>>>
>>> At least, all that is how it looks to me. I'll experiment with
>>> TD_DEATH before I speculate further.
>>
>>
>>
>> Here's a proof-of-concept patch. It is not quite right, in that now I
>> can hit C-c about four times before I get the error(). It does
>> illustrate what I was suggesting, though, minus the version checking.
>>
>> The reason we still get the segfaults is not that surprising. I spent
>> a little while coaxing it into producing a core dump, and got this:
>>
>> (gdb) i threads
>> 7 process 10303 0x40034531 in __nptl_create_event () from
>> /lib/tls/i686/cmov/libpthread.so.0
>> 6 process 12467 0xffffe410 in __kernel_vsyscall ()
>> 5 process 12468 0xffffe410 in __kernel_vsyscall ()
>> 4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64
>> 3 process 12470 0x40034540 in __nptl_death_event () from
>> /lib/tls/i686/cmov/libpthread.so.0
>> 2 process 12471 0xffffe410 in __kernel_vsyscall ()
>> * 1 process 12472 0xffffe410 in __kernel_vsyscall ()
>>
>> So one thread is running, four are probably sleeping, one is dying...
>> and one is creating another thread. I'd bet money that we aren't
>> attached to the one being created yet. I can produce all sorts of
>> other interesting GDB internal errors this way, too.
>>
>
> I also tried the TD_DEATH method but gave up after running into
> segfaults. I have to go back and recheck whether those segfaults could
> occur without the Crtl-C being issued. I can't remember off-hand.
>
Just to confirm, I don't see any segfaults just running, but this code is
extremely brittle on my RHEL3-U1 system. It is coming back to me why I
abandoned TD_DEATH.
On the first Ctrl-C I usually get:
Program received signal SIGINT, Interrupt.
[Switching to Thread -1226028112 (LWP 16580)]
0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) info threads
[New Thread -1226163280 (LWP 16579)]
Can't attach LWP 16579: Operation not permitted
(gdb) info threads
[New Thread -1226298448 (LWP 16578)]
Can't attach LWP 16578: Operation not permitted
(gdb) info threads
158 Thread -1226298448 (LWP 16578) 0xb75ce6b1 in __nptl_death_event ()
from /lib/tls/libpthread.so.0
157 Thread -1226163280 (LWP 16579) 0xb75ce6b1 in __nptl_death_event ()
from /lib/tls/libpthread.so.0
* 156 Thread -1226028112 (LWP 16580) 0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
1 Thread -1218598080 (LWP 16423) 0xb75ce6a1 in __nptl_create_event ()
from /lib/tls/libpthread.so.0
(gdb) c
After continuing, the program will go shortly and then die.
[Thread -1226163280 (zombie) exited]
[New Thread -1226433616 (LWP 16674)]
Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic
error
My original fix doesn't break near as easily on RHEL-U1. Hitting enough
Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but
this may be part of catching the race condition before we know about the thread.
Now, it is more than likely just a bug in the TD_DEATH event processing because
we haven't exercised it. Would it be worth looking at implementing your
original alternative for my patch?
You asked me in a previous note to run with strace on and confirm that I was not
getting WIFEXITTED for the dying threads. I confirmed this. I only get one
WIFEXITTED and that is for the main thread.
I still need to do the Eclipse test.
-- Jeff J.
>> Does this patch fix the problems you were seeing with less pathological
>> test cases? Eclipse, in this case, is a less pathological test case,
>> since it creates threads that actually do a bit of work. If it does,
>> I'll add the necessary glibc version checking. The only way to fix the
>> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
>> what it was for), but that will require some major surgery.
>>
>
> As you know, I am also looking at the PTRACE_EVENT_CLONE stuff. I'll
> try this patch out with the Eclipse test. I also will include your
> lin-lwp linux event handler fix to see if that gets past the fork
> problem they were seeing.
>
>> Index: thread-db.c
>> ===================================================================
>> RCS file: /cvs/src/src/gdb/thread-db.c,v
>> retrieving revision 1.37
>> diff -u -p -r1.37 thread-db.c
>> --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37
>> +++ thread-db.c 24 Mar 2004 16:47:23 -0000
>> @@ -1,6 +1,6 @@
>> /* libthread_db assisted debugging support, generic parts.
>>
>> - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
>> + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
>>
>> This file is part of GDB.
>>
>> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
>> static void thread_db_find_new_threads (void);
>> static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>> const td_thrinfo_t *ti_p, int verbose);
>> +static void detach_thread (ptid_t ptid, int verbose);
>>
>>
>> /* Building process ids. */
>> @@ -154,6 +155,8 @@ struct private_thread_info
>> unsigned int th_valid:1;
>> unsigned int ti_valid:1;
>>
>> + unsigned int dying:1;
>> +
>> td_thrhandle_t th;
>> td_thrinfo_t ti;
>> };
>> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
>> /* Set the process wide mask saying which events we're interested
>> in. */
>> td_event_emptyset (&events);
>> td_event_addset (&events, TD_CREATE);
>> -#if 0
>> +#if 1
>> /* FIXME: kettenis/2000-04-23: The event reporting facility is
>> broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
>> now. */
>> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
>> struct thread_info *tp;
>> td_err_e err;
>>
>> + /* We may already know about this thread, for instance when the
>> + user has issued the `info threads' command before the SIGTRAP
>> + for hitting the thread creation breakpoint was reported. */
>> + if (in_thread_list (ptid))
>> + {
>> + tp = find_thread_pid (ptid);
>> + gdb_assert (tp != NULL);
>> +
>> + if (!tp->private->dying)
>> + return;
>> +
>> + delete_thread (ptid);
>> + }
>> +
>> check_thread_signals ();
>>
>> /* Add the thread to GDB's thread list. */
>> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
>> static void
>> detach_thread (ptid_t ptid, int verbose)
>> {
>> + struct thread_info *thread_info;
>> +
>> if (verbose)
>> printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
>> +
>> + /* Don't delete the thread now, because it still reports as active
>> until
>> + it has executed a few instructions after the event breakpoint. */
>> + thread_info = find_thread_pid (ptid);
>> + gdb_assert (thread_info != NULL);
>> + thread_info->private->dying = 1;
>> }
>>
>> static void
>> @@ -848,11 +873,7 @@ check_event (ptid_t ptid)
>> {
>> case TD_CREATE:
>>
>> - /* We may already know about this thread, for instance when the
>> - user has issued the `info threads' command before the SIGTRAP
>> - for hitting the thread creation breakpoint was reported. */
>> - if (!in_thread_list (ptid))
>> - attach_thread (ptid, msg.th_p, &ti, 1);
>> + attach_thread (ptid, msg.th_p, &ti, 1);
>>
>> break;
>>
>> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
>>
>> ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
>>
>> - if (!in_thread_list (ptid))
>> - attach_thread (ptid, th_p, &ti, 1);
>> + attach_thread (ptid, th_p, &ti, 1);
>>
>> return 0;
>> }
>>
>>
>
>
More information about the Gdb-patches
mailing list