This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] nptl: Fix join/tryjoin comments.


On 2/11/19 4:37 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> This commit fixes several inaccurate comments in the implementation of
>> __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of
>> which are used to implement POSIX thread joining.
> 
> GNU style is not to use () after function names.

Commit message fixed (by removing ()).

>> Firstly, in pthread_tryjoin_np() we only attempt the join if a read of
>> pd->tid == 0, because that means the thread has already been reaped by
>> the kernel and we can safely join it without blocking.
>>
>> Secondly, all join implementations call the common
>> __pthread_timedjoin_ex and only if abstime is NULL (block) do we
>> actually need to use cancellation (to cancel the potentially infinite
>> wait).
> 
> I think the change regarding abstime == NULL is wrong.  This is about
> pthread_join/pthread_timedjoin_np (blocking) on the one hand and
> pthread_tryjoin_np on the other (not blocking).

Why is the comment wrong? Or do you mean to say the comment could be
made more accurate?

The same core __pthread_timedjoin_ex is used for everything, so it handles
3 cases:

* join
* tryjoin
* timedjoin

In a join with abstime NULL we switch to asynchronous cancellation, and
this is done behind the scenes by lll_wait_tid calling lll_futex_wait_cancel
when abstime is NULL.

In a tryjoin we only call __pthread_timedjoin_ex when __tid is 0, and we
optimize away the lll_wait_tid by setting block to true. It doesn't change
the fact that if we had called lll_wait_tid, we would not have blocked
anyway, we would have stopped at the first atomic_load_acquire and never
waited.

Only in timedjoin, with abstime non-NULL, do we ever reach the call to
lll_futex_wait_cancel. It's the only time we switch to asychronous ancellation
because it's what we do around the *_cancel variants (until we have a better
method for doing cancellation).

Given this information:

(a) For join, block == true, and abstime == NULL, so we do use async cancel
    for the futex wait. And the old comment was fine, but the new comment
    is clearer. The old comment also was for when we *unconditionally*
    called CANCEL_ASYNC() (which means "turn on async cancel"), but this
    code was removed in 2018 by ce7eb0e9031, making this not something we
    do unconditionally anymore.

(b) For tryjoin, block == false, and abstime == NULL, in theory the comment
    in pthread_tryjoin should be deleted because block == false will mean
    we *never* call lll_wait_tid...

 29   /* If pd->tid != 0 then lll_wait_tid will not block on futex
 30      operation.  */

      ^^^^ With the block == false optimization we never call lll_wait_tid!
      But the comment is wrong, probably just a typo in 2017 by
      4735850f7a4.

 31   return __pthread_timedjoin_ex (threadid, thread_return, NULL, false);

    but I like the comment because it holds regardless of the optimization.

(c) For timedjoin, block == true, and abstime != NULL, and the comment is
    wrong, there is no async cancel involved. We call lll_wait_tid and
    the abstime != NULL case calls __lll_timedwait_tid which is not a
    cancellation point. No cancellation for timedjoin.

So in (c) we show that it's not about block vs non-block, it's about
making only pthread_join a cancellation point.

Is this a bug? Should __lll_timedwait_tid for very long timeouts have used
cancellation? No. Since the timedjoin is a GNU addition it is ours to define
as a cancellation point or not, and we choose not to. While join is mandated
by POSIX to be a cancellation point and it is (the only one of the joins to
be so).

-- 
Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]