This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] nptl: Fix join/tryjoin comments.
On 2/11/19 4:37 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>> This commit fixes several inaccurate comments in the implementation of
>> __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of
>> which are used to implement POSIX thread joining.
> GNU style is not to use () after function names.
Commit message fixed (by removing ()).
>> Firstly, in pthread_tryjoin_np() we only attempt the join if a read of
>> pd->tid == 0, because that means the thread has already been reaped by
>> the kernel and we can safely join it without blocking.
>> Secondly, all join implementations call the common
>> __pthread_timedjoin_ex and only if abstime is NULL (block) do we
>> actually need to use cancellation (to cancel the potentially infinite
> I think the change regarding abstime == NULL is wrong. This is about
> pthread_join/pthread_timedjoin_np (blocking) on the one hand and
> pthread_tryjoin_np on the other (not blocking).
Why is the comment wrong? Or do you mean to say the comment could be
made more accurate?
The same core __pthread_timedjoin_ex is used for everything, so it handles
In a join with abstime NULL we switch to asynchronous cancellation, and
this is done behind the scenes by lll_wait_tid calling lll_futex_wait_cancel
when abstime is NULL.
In a tryjoin we only call __pthread_timedjoin_ex when __tid is 0, and we
optimize away the lll_wait_tid by setting block to true. It doesn't change
the fact that if we had called lll_wait_tid, we would not have blocked
anyway, we would have stopped at the first atomic_load_acquire and never
Only in timedjoin, with abstime non-NULL, do we ever reach the call to
lll_futex_wait_cancel. It's the only time we switch to asychronous ancellation
because it's what we do around the *_cancel variants (until we have a better
method for doing cancellation).
Given this information:
(a) For join, block == true, and abstime == NULL, so we do use async cancel
for the futex wait. And the old comment was fine, but the new comment
is clearer. The old comment also was for when we *unconditionally*
called CANCEL_ASYNC() (which means "turn on async cancel"), but this
code was removed in 2018 by ce7eb0e9031, making this not something we
do unconditionally anymore.
(b) For tryjoin, block == false, and abstime == NULL, in theory the comment
in pthread_tryjoin should be deleted because block == false will mean
we *never* call lll_wait_tid...
29 /* If pd->tid != 0 then lll_wait_tid will not block on futex
30 operation. */
^^^^ With the block == false optimization we never call lll_wait_tid!
But the comment is wrong, probably just a typo in 2017 by
31 return __pthread_timedjoin_ex (threadid, thread_return, NULL, false);
but I like the comment because it holds regardless of the optimization.
(c) For timedjoin, block == true, and abstime != NULL, and the comment is
wrong, there is no async cancel involved. We call lll_wait_tid and
the abstime != NULL case calls __lll_timedwait_tid which is not a
cancellation point. No cancellation for timedjoin.
So in (c) we show that it's not about block vs non-block, it's about
making only pthread_join a cancellation point.
Is this a bug? Should __lll_timedwait_tid for very long timeouts have used
cancellation? No. Since the timedjoin is a GNU addition it is ours to define
as a cancellation point or not, and we choose not to. While join is mandated
by POSIX to be a cancellation point and it is (the only one of the joins to