[RFC]: fix for recycled thread ids

Jeff Johnston jjohnstn@redhat.com
Fri Mar 19 19:35:00 GMT 2004


Daniel Jacobowitz wrote:
> On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote:
> 
>>Daniel Jacobowitz wrote:
>>
>>>On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
>>>
>>>
>>>>The following patch fixes a problem when a user application creates a 
>>>>thread shortly after another thread has completed.  For nptl, thread ids 
>>>>are addresses. If a thread completes/dies, the tid is available for reuse 
>>>>by a new thread.
>>>
>>>
>>>Does NPTL re-use the TID quickly, or cycle around the way LT did so
>>>that we only see this under high thread pressure?
>>>
>>
>>I can't say for sure as I don't maintain libthread_db.  The test case in 
>>question does create high thread pressure, but I think it would be a 
>>mistake to generalize and think that this couldn't happen in an existing 
>>application.
> 
> 
> I know you don't maintain NPTL, but this is the sort of question that
> we need to understand before we can fix the problem correctly.  I see
> that you've attached a testcase, so I'll take a look at it when I get
> back from my trip on Monday.
>

Ok, thanks.

> 
>>>>On RH9 and RHEL3, nptl threads do not have exit events associated with 
>>>>them.  I have already discussed this with Daniel J. who feels that the 
>>>>kernels are not doing the right thing, but regardless, the current and 
>>>>previous RH nptl kernels are behaving this way and gdb needs to handle 
>>>>it.  As such, when a new thread is created, if it is reusing the tid of a 
>>>>previous thread that gdb hasn't figured out isn't around any more, gdb 
>>>>ignores the create event and the new thread is not added.  Ignoring the 
>>>>event is done because it is possible for gdb to find out about the thread 
>>>>before its creation event is reported and so the create event can be 
>>>>redundant information.
>>>
>>>
>>>What I haven't seen a good explanation of is what problem this causes. 
>>>If a thread goes away, and then a new thread using the same ID is
>>>created, and then we stop, what do we lose besides the cosmetic fact
>>>that there is no [New Thread] message?  Does anything go wrong?
>>>
>>>Also, I would like the issue of whether or not it is a kernel bug
>>>resolved before we discuss working around it in GDB.
>>>
>>
>>The problem is if a global signal is passed on to the inferior program when 
>>there are threads we have not attached to, the process terminates.  A 
>>Ctrl-C is such a signal.  In the example program, we only attach to the 
>>first 100 threads and when the Ctrl-C is issued, we get:
>>
>>ptrace: No such process.
>>thread_db_get_info: cannot get thread info: generic error
>>
>>The end-user is cooked.
> 
> 
> OK.  So what you're saying is, the problem is that we do not see that
> the new thread has been created, so we do not attach to it.  Is that
> right?
>

Yes.
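
To make the failure concrete, here is a rough sketch of the logic as I
understand it.  This is not the actual thread-db.c code; the thread list,
find_by_tid and handle_create_event are made-up names for illustration
only.  The point is that the create event is dropped whenever the tid is
already in the list, even if the entry belongs to a dead thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for gdb's thread list.  */
#define MAX_THREADS 8

struct thread_entry
{
  unsigned long tid;   /* thread id; for nptl this is an address */
  long lwp;            /* LWP we attached to for this thread */
  bool in_use;
};

static struct thread_entry thread_list[MAX_THREADS];

/* Look up an entry by tid only, the way the create-event path does
   in this sketch.  Returns NULL if the tid is unknown.  */
static struct thread_entry *
find_by_tid (unsigned long tid)
{
  for (size_t i = 0; i < MAX_THREADS; i++)
    if (thread_list[i].in_use && thread_list[i].tid == tid)
      return &thread_list[i];
  return NULL;
}

/* Returns true if the event led to a new entry (and, in real life, an
   attach).  Returns false if the event was ignored, either because the
   tid is already known or because the toy list is full.  */
static bool
handle_create_event (unsigned long tid, long lwp)
{
  if (find_by_tid (tid) != NULL)
    return false;   /* treated as redundant, even if the tid was recycled */

  for (size_t i = 0; i < MAX_THREADS; i++)
    if (!thread_list[i].in_use)
      {
        thread_list[i].tid = tid;
        thread_list[i].lwp = lwp;
        thread_list[i].in_use = true;
        return true;
      }
  return false;
}
```

With no exit event to clear the old entry, a second create event for the
same tid but a different LWP is ignored, so the new LWP is never attached.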

> Conceptually, we attach to LWPs, not to threads.  That suggests to me
> that the correct fix is to ask the LWP layer if the LWP is attached
> rather than looking it up in the thread list in the first place. 
> We've already got an appropriate list of LWPs though we might need a
> new accessor.
> 

I like that idea.  We still have to deal with the bogus thread list entry.  The 
routine prune_threads calls thread_db_alive and it won't realize the thread info 
it has is bogus because it will find the tid is valid.
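
Something along these lines is what I have in mind, as a sketch only.
The names lwp_is_attached, create_event_needs_attach and
thread_entry_is_stale are hypothetical, not existing gdb functions, and
the stale check assumes we can get the LWP id that libthread_db currently
reports for the tid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the LWP layer's list.  */
struct lwp_entry
{
  long lwp;
  bool attached;
};

/* Hypothetical thread list entry: the tid plus the LWP recorded when
   the thread was first seen.  */
struct thread_entry
{
  unsigned long tid;
  long lwp;
};

/* The "new accessor": is this LWP already attached?  */
static bool
lwp_is_attached (const struct lwp_entry *lwps, size_t n, long lwp)
{
  for (size_t i = 0; i < n; i++)
    if (lwps[i].lwp == lwp)
      return lwps[i].attached;
  return false;
}

/* Decide whether a create event needs an attach by asking about the
   LWP rather than looking the tid up in the thread list.  */
static bool
create_event_needs_attach (const struct lwp_entry *lwps, size_t n, long lwp)
{
  return !lwp_is_attached (lwps, n, lwp);
}

/* A thread list entry is stale if the LWP now reported for its tid
   differs from the one recorded.  A tid-only validity check cannot
   see this.  */
static bool
thread_entry_is_stale (const struct thread_entry *entry, long current_lwp)
{
  return entry->lwp != current_lwp;
}
```

Keying the attach decision on the LWP handles the missed attach, and the
second check is one possible way for the pruning code to notice a
recycled tid.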

> 
>>Regarding resolving this issue as a kernel error, any fix for RHEL3 won't 
>>get shipped until Update 3.  I know of no scheduled update for RH9 and this 
>>would not qualify as a security update.
> 
> 
> That's not what I said - I don't care whether an update is published
> for any particular vendor's product.  I want us to understand whether
> we are working around a kernel bug or fixing an actual bug in GDB.
> That's another part of the problem that we need to understand in order
> to fix it correctly.
> 
> As the author of the kernel code in question, I think that it's a
> kernel bug.  Roland seemed to agree.
> 
> 
>>Would it make sense to rename thread-db.c to lin-thread-db.c?
> 
> 
> Probably not, but some explanatory comments may be in order.
> 


