This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Wed, 17 Sep 2014 19:13:27 -0300
- Subject: Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- Authentication-results: sourceware.org; auth=none
- References: <5410C70E dot 70207 at linux dot vnet dot ibm dot com> <1410533066 dot 4967 dot 96 dot camel at triegel dot csb>
On 12-09-2014 11:44, Torvald Riegel wrote:
> On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote:
>> I have summarized in [1] the current issues with GLIBC pthread cancellation system,
>> the current GLIBC implementation and the proposed solution by Rich Felker with the
>> adjustment required to enabled it on GLIBC.
>>
>> It is still heavily WIP and I'm still planning to add more content, so any question,
>> comments, advices are welcomed.
>>
>> The GLIBC adjustment to proposed solution is in fact the current work I'm doing to
>> rewrite pthread cancellation subsystem [2]. My code still needs a *lot* of cleanup,
>> but initial results are promising. It is building on both powerpc64 and x86_64
>> (it won't build on others platforms basically because I rewrite the way cancelable
>> syscalls are done).
> The general direction seems good to me. Thanks for working on this. I
> have some questions/comments about the steps you outline in the "GLIBC
> adjustment" section on the wiki page:
>
> "4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
> directly. This avoid synchronization issues about updating the
> cancellation status and also focus the logic on signal handler and
> cancellation syscall code."
>
> The current code seems focused on trying to avoid sending signal if
> possible, seemingly because of performance concerns. My gut feeling is
> that cancellation doesn't need to have low latency, but do we have
> sufficient confidence that always sending the signal is alright?
>
> I believe we can still avoid sending the signal with the new scheme;
> what we'd have to do is let the code for cancellable syscalls mark
> (using atomic ops) whether it's currently executing or not (ie,
> fetch_and_set to true and fetch_and_set to previous value); then the
> pthread_cancel should see whether it needs to send a signal or not.
> That adds a bit more synchronization, but seems possible.
This solution seems feasible, but I would like to address it, if possible, on a next
patch iterations. Right now, I would like to focus on implementation correctness
for such solution. I will add on the wiki the suggestion.
>
>
> "8. Adjust 'lowlevellock.h' arch-specific implementations to provide
> cancelable futex calls (used in libpthread code)."
>
> Only FUTEX_WAIT should be cancellable, I suppose.
Yes, I had to add cancel wrappers just for lll_futex_* that do FUTEX_WAIT.
>
> I've been thinking about whether it would be easier for the
> implementation of cancellable pthreads functions to just call a
> noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the
> wait but calling pthread_testcancel instead so that any pending
> cancellation is acted upon). However, this seems to save less
> complexity than just doing for FUTEX_WAIT what we do for all the other
> cancellable syscalls, so it's probably not worth it.
>
>
> I'd add another thing to the list of steps, though: For future-proofing,
> it would be good pthread_setcancelstate and pthread_setcanceltype would
> have an explicit compiler barrier in them. If at some point we have LTO
> including those functions, we use C11 atomics, and compilers start
> optimizing those more aggressively, then I believe we're at risk that
> the compiler reorders code across the atomics in pthread_setcancelstate;
> this could lead to cancellation handlers seeing unexpected state when
> they are actually run. C11 has tighter rules for what's allowed in
> signal handlers (IIRC, these are still under discussion, at least in the
> C++ committee) than what POSIX requires, yet the cancellation handler is
> similar to a signal handler. A compiler barrier should hopefully
> prevent any issues due to that mismatch. Thoughts?
>
I will add this on the future iterations for this patch, but since it is not really
and issue right now, I prefer to work after this fix is corrected.