This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Ping[2]: [PATCH] Fix sporadic failure in tst-eintr1 test case
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Jonathan Nieder <jrnieder at gmail dot com>, KOSAKI Motohiro <kosaki dot motohiro at gmail dot com>, libc-alpha at sourceware dot org
- Date: Wed, 10 Apr 2013 13:22:23 +0530
- Subject: Re: Ping[2]: [PATCH] Fix sporadic failure in tst-eintr1 test case
- References: <20120920141557 dot 4d74155d at spoyarek> <20120920144516 dot 40d18070 at spoyarek> <20120924174116 dot 11fd6b63 at spoyarek> <20121002000423 dot 30e99e6a at spoyarek> <CAHGf_=rQSC33ZnpGR3xwuBUEZ=1-mEOupFP=8qXyc6r_Qy7G3g at mail dot gmail dot com> <20121001191239 dot GG16391 at elie dot Belkin> <5165144E dot 8080803 at redhat dot com>
On Wed, Apr 10, 2013 at 09:27:10AM +0200, Florian Weimer wrote:
> I ran into this on current Fedora 18.
I'm surprised it took so long :)
> This smells like a bug in our implementation. Can we fix this in
> glibc? Any pointers?
>
> pthread_join actually deallocating resources seems fairly important
> to me as a quality-of-implementation issue, irrespective of what the
> standard says.
It's not a fault in pthread_join, and I don't think this can be fixed
in glibc. The core problem is the latency between the kernel
notifying the pthread_join'er about the thread exit and the kernel
actually reaping the thread; only the reaping reduces NPROC. The
test case goes from joining to spawning the new thread faster than the
kernel is able to reap existing threads, resulting in a net increase
in NPROC.
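To illustrate the join-then-spawn pattern described above, here is a
minimal sketch of how a caller might tolerate a transient EAGAIN from
pthread_create when it runs close to its process limit. This is not the
patch under discussion; the helper names (create_with_retry,
spawn_join_cycles), the retry bound and the 1 ms pause are all
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Trivial thread body: returns immediately so the thread exits at once. */
static void *noop_thread(void *arg)
{
    return arg;
}

/* Hypothetical helper: call pthread_create and retry a bounded number of
   times when it fails with EAGAIN, pausing briefly between attempts so the
   kernel has a chance to finish reaping previously joined threads.
   Returns 0 on success, otherwise the last pthread_create error code. */
static int create_with_retry(pthread_t *th, void *(*fn)(void *), void *arg,
                             int max_retries)
{
    struct timespec pause = { 0, 1000000 }; /* 1 ms, an arbitrary choice */
    int attempt = 0;

    assert(max_retries >= 0);

    for (;;) {
        int err = pthread_create(th, NULL, fn, arg);
        if (err != EAGAIN || attempt >= max_retries)
            return err;
        attempt++;
        nanosleep(&pause, NULL);
    }
}

/* Hypothetical driver: create a thread, join it, and immediately repeat.
   Returns the number of cycles in which creation or joining failed. */
static int spawn_join_cycles(int cycles, int max_retries)
{
    int failures = 0;

    for (int i = 0; i < cycles; i++) {
        pthread_t th;

        if (create_with_retry(&th, noop_thread, NULL, max_retries) != 0) {
            failures++;
            continue;
        }
        if (pthread_join(th, NULL) != 0)
            failures++;
    }
    return failures;
}
```

Calling spawn_join_cycles(1000, 100) under a tight RLIMIT_NPROC would be
one way to exercise this. The retry only papers over the window in the
caller; it does nothing about the reaping latency itself.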
We could introduce a wait in pthread_join, but that would be a horrible
thing to do. Another possibility is to delay, within the kernel, the
notification of thread exit to the joiner for as long as possible. I
looked at it briefly last year, but since nobody's really hurting from
this other than the test itself, I didn't push too hard.
Siddhesh