This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC]: fix for recycled thread ids


Jeff Johnston wrote:
Daniel Jacobowitz wrote:

On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:

Fortunately, in <gnu/libc-version.h> there is a function to return the
runtime version of glibc.  We should be able to use that - and the not
100% valid, but generally valid and already assumed by thread_db,
assumption that a native GDB, when used to debug native programs, is
debugging the same version of the C library - to enable TD_DEATH when
it is safe to do so.  This will let us detach the threads when they
die.

That has its own risks, since the thread continues to run for a short
while after the death event is reported.  For instance, in NPTL the
thread reports the event and then cleans up after itself; in LT I don't
remember whether the manager or the thread does this, but I think it's
the same.  I already wrote limited code to handle this, if you search
for "zombie" in thread-db.c, so it should be OK.  The gist is that we
remove it from the thread list right away, but do not detach the
thread.  We resist attaching to zombies.

At least, all that is how it looks to me.  I'll experiment with
TD_DEATH before I speculate further.



Here's a proof-of-concept patch. It is not quite right, in that now I can hit C-c about four times before I get the error(). It does illustrate what I was suggesting, though, minus the version checking.

The reason we still get the segfaults is not that surprising.  I spent
a little while coaxing it into producing a core dump, and got this:

(gdb) i threads
7 process 10303 0x40034531 in __nptl_create_event () from /lib/tls/i686/cmov/libpthread.so.0
6 process 12467 0xffffe410 in __kernel_vsyscall ()
5 process 12468 0xffffe410 in __kernel_vsyscall ()
4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64
3 process 12470 0x40034540 in __nptl_death_event () from /lib/tls/i686/cmov/libpthread.so.0
2 process 12471 0xffffe410 in __kernel_vsyscall ()
* 1 process 12472 0xffffe410 in __kernel_vsyscall ()


So one thread is running, four are probably sleeping, one is dying...
and one is creating another thread.  I'd bet money that we aren't
attached to the one being created yet.  I can produce all sorts of
other interesting GDB internal errors this way, too.


I also tried the TD_DEATH method but gave up after running into segfaults. I have to go back and recheck whether those segfaults could occur without the Crtl-C being issued. I can't remember off-hand.



Just to confirm, I don't see any segfaults just running, but this code is extremely brittle on my RHEL3-U1 system. It is coming back to me why I abandoned TD_DEATH.


On the first Ctrl-C I usually get:

Program received signal SIGINT, Interrupt.
[Switching to Thread -1226028112 (LWP 16580)]
0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) info threads
[New Thread -1226163280 (LWP 16579)]
Can't attach LWP 16579: Operation not permitted
(gdb) info threads
[New Thread -1226298448 (LWP 16578)]
Can't attach LWP 16578: Operation not permitted
(gdb) info threads
  158 Thread -1226298448 (LWP 16578)  0xb75ce6b1 in __nptl_death_event ()
   from /lib/tls/libpthread.so.0
  157 Thread -1226163280 (LWP 16579)  0xb75ce6b1 in __nptl_death_event ()
   from /lib/tls/libpthread.so.0
* 156 Thread -1226028112 (LWP 16580)  0xb75ebc32 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
  1 Thread -1218598080 (LWP 16423)  0xb75ce6a1 in __nptl_create_event ()
   from /lib/tls/libpthread.so.0
(gdb) c

After continuing, the program will go shortly and then die.

[Thread -1226163280 (zombie) exited]
[New Thread -1226433616 (LWP 16674)]
Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic error


My original fix doesn't break near as easily on RHEL-U1. Hitting enough Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but this may be part of catching the race condition before we know about the thread.
Now, it is more than likely just a bug in the TD_DEATH event processing because we haven't exercised it. Would it be worth looking at implementing your original alternative for my patch?


You asked me in a previous note to run with strace on and confirm that I was not getting WIFEXITTED for the dying threads. I confirmed this. I only get one WIFEXITTED and that is for the main thread.

I still need to do the Eclipse test.

-- Jeff J.


Does this patch fix the problems you were seeing with less pathological
test cases?  Eclipse, in this case, is a less pathological test case,
since it creates threads that actually do a bit of work.  If it does,
I'll add the necessary glibc version checking.  The only way to fix the
race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
what it was for), but that will require some major surgery.


As you know, I am also looking at the PTRACE_EVENT_CLONE stuff. I'll try this patch out with the Eclipse test. I also will include your lin-lwp linux event handler fix to see if that gets past the fork problem they were seeing.


Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37
+++ thread-db.c 24 Mar 2004 16:47:23 -0000
@@ -1,6 +1,6 @@
/* libthread_db assisted debugging support, generic parts.
- Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
+ Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
This file is part of GDB.
@@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
static void thread_db_find_new_threads (void);
static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
const td_thrinfo_t *ti_p, int verbose);
+static void detach_thread (ptid_t ptid, int verbose);

/* Building process ids. */
@@ -154,6 +155,8 @@ struct private_thread_info
unsigned int th_valid:1;
unsigned int ti_valid:1;
+ unsigned int dying:1;
+
td_thrhandle_t th;
td_thrinfo_t ti;
};
@@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
/* Set the process wide mask saying which events we're interested in. */
td_event_emptyset (&events);
td_event_addset (&events, TD_CREATE);
-#if 0
+#if 1
/* FIXME: kettenis/2000-04-23: The event reporting facility is
broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
now. */
@@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
struct thread_info *tp;
td_err_e err;
+ /* We may already know about this thread, for instance when the
+ user has issued the `info threads' command before the SIGTRAP
+ for hitting the thread creation breakpoint was reported. */
+ if (in_thread_list (ptid))
+ {
+ tp = find_thread_pid (ptid);
+ gdb_assert (tp != NULL);
+
+ if (!tp->private->dying)
+ return;
+
+ delete_thread (ptid);
+ }
+
check_thread_signals ();
/* Add the thread to GDB's thread list. */
@@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
static void
detach_thread (ptid_t ptid, int verbose)
{
+ struct thread_info *thread_info;
+
if (verbose)
printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
+
+ /* Don't delete the thread now, because it still reports as active until
+ it has executed a few instructions after the event breakpoint. */
+ thread_info = find_thread_pid (ptid);
+ gdb_assert (thread_info != NULL);
+ thread_info->private->dying = 1;
}
static void
@@ -848,11 +873,7 @@ check_event (ptid_t ptid)
{
case TD_CREATE:
- /* We may already know about this thread, for instance when the
- user has issued the `info threads' command before the SIGTRAP
- for hitting the thread creation breakpoint was reported. */
- if (!in_thread_list (ptid))
- attach_thread (ptid, msg.th_p, &ti, 1);
+ attach_thread (ptid, msg.th_p, &ti, 1);
break;
@@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
- if (!in_thread_list (ptid))
- attach_thread (ptid, th_p, &ti, 1);
+ attach_thread (ptid, th_p, &ti, 1);
return 0;
}







Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]