This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)


On 12-09-2014 11:44, Torvald Riegel wrote:
> On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote:
>> I have summarized in [1] the current issues with the GLIBC pthread cancellation
>> system, the current GLIBC implementation, and the solution proposed by Rich Felker
>> with the adjustments required to enable it in GLIBC.
>>
>> It is still heavily WIP and I'm still planning to add more content, so any questions,
>> comments, or advice are welcome.
>>
>> The GLIBC adjustment to the proposed solution is in fact the current work I'm doing
>> to rewrite the pthread cancellation subsystem [2]. My code still needs a *lot* of
>> cleanup, but initial results are promising. It builds on both powerpc64 and x86_64
>> (it won't build on other platforms yet, basically because I rewrote the way
>> cancellable syscalls are done).
> The general direction seems good to me.  Thanks for working on this.  I
> have some questions/comments about the steps you outline in the "GLIBC
> adjustment" section on the wiki page:
>
> "4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
> directly. This avoid synchronization issues about updating the
> cancellation status and also focus the logic on signal handler and
> cancellation syscall code."
>
> The current code seems focused on trying to avoid sending a signal if
> possible, seemingly because of performance concerns.  My gut feeling is
> that cancellation doesn't need to have low latency, but do we have
> sufficient confidence that always sending the signal is alright?
>
> I believe we can still avoid sending the signal with the new scheme;
> what we'd have to do is let the code for cancellable syscalls mark
> (using atomic ops) whether it's currently executing or not (i.e.,
> fetch_and_set to true and fetch_and_set back to the previous value);
> then pthread_cancel can see whether it needs to send a signal or not.
> That adds a bit more synchronization, but seems possible.

This solution seems feasible, but I would like to address it, if possible, in a later
patch iteration.  Right now, I would like to focus on the correctness of the
implementation.  I will add the suggestion to the wiki.
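For reference, a rough sketch of the idea as I understand it, using C11 atomics
(the names here, such as 'struct thread_desc' and the enter/leave helpers, are
hypothetical and not actual glibc internals; in practice the flag would live in
the per-thread descriptor so pthread_cancel can read it for the target thread):

#include <stdatomic.h>
#include <stdbool.h>

struct thread_desc
{
  /* True while the thread is executing a cancellable syscall.  */
  atomic_bool in_cancellable_syscall;
};

/* Syscall-wrapper side: mark that the thread is about to block in a
   cancellable syscall and restore the previous value afterwards.  */
static inline bool
enter_cancellable_syscall (struct thread_desc *self)
{
  return atomic_exchange (&self->in_cancellable_syscall, true);
}

static inline void
leave_cancellable_syscall (struct thread_desc *self, bool prev)
{
  atomic_store (&self->in_cancellable_syscall, prev);
}

/* pthread_cancel side: after recording the cancellation request, send the
   signal only if the target may be blocked in a cancellable syscall;
   otherwise the next cancellation point will act on the request.  */
static inline bool
cancel_needs_signal (struct thread_desc *target)
{
  return atomic_load (&target->in_cancellable_syscall);
}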

>
>
> "8. Adjust 'lowlevellock.h' arch-specific implementations to provide
> cancelable futex calls (used in libpthread code)."
>
> Only FUTEX_WAIT should be cancellable, I suppose.

Yes, I had to add cancellable wrappers only for the lll_futex_* calls that do FUTEX_WAIT.
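Conceptually each such wrapper does something like the sketch below (illustrative
only, not the actual lll_futex_* macros; in the new scheme the syscall would be
bracketed by the enter/leave markers rather than calling pthread_testcancel
directly):

#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative cancellable FUTEX_WAIT: block until *uaddr changes from
   'expected', treating the wait as a cancellation point.  */
static long
futex_wait_cancellable (uint32_t *uaddr, uint32_t expected)
{
  long ret = syscall (SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
  if (ret == -1 && errno == EINTR)
    /* A signal interrupted the wait; if it was the cancellation signal,
       act on the pending request instead of retrying the wait.  */
    pthread_testcancel ();
  return ret;
}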

>
> I've been thinking about whether it would be easier for the
> implementation of cancellable pthreads functions to just call a
> noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the
> wait but calling pthread_testcancel instead so that any pending
> cancellation is acted upon).  However, this seems to save less
> complexity than just doing for FUTEX_WAIT what we do for all the other
> cancellable syscalls, so it's probably not worth it.
>
>
> I'd add another thing to the list of steps, though: For future-proofing,
> it would be good if pthread_setcancelstate and pthread_setcanceltype
> had an explicit compiler barrier in them.  If at some point we have LTO
> including those functions, we use C11 atomics, and compilers start
> optimizing those more aggressively, then I believe we're at risk that
> the compiler reorders code across the atomics in pthread_setcancelstate;
> this could lead to cancellation handlers seeing unexpected state when
> they are actually run.  C11 has tighter rules for what's allowed in
> signal handlers (IIRC, these are still under discussion, at least in the
> C++ committee) than what POSIX requires, yet the cancellation handler is
> similar to a signal handler.  A compiler barrier should hopefully
> prevent any issues due to that mismatch.  Thoughts?
>
I will add this in a future iteration of this patch, but since it is not really
an issue right now, I prefer to work on it after this fix is in place.
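For the record, the kind of barrier meant here is something like the sketch
below (the field name and placement are illustrative, not the actual
pthread_setcancelstate code); the explicit compiler barriers keep surrounding
code from being moved across the point where the cancellation state changes:

#include <stdatomic.h>

/* Compiler barrier: prevents the compiler from reordering memory accesses
   across this point.  atomic_signal_fence is the C11 spelling of the
   traditional __asm__ __volatile__ ("" ::: "memory") idiom.  */
static inline void
compiler_barrier (void)
{
  atomic_signal_fence (memory_order_seq_cst);
}

/* Illustrative stand-in for the per-thread cancellation state field.  */
static _Atomic int cancelstate;

/* Hypothetical shape of the update in pthread_setcancelstate: the barriers
   keep code from being reordered across the state change that the
   (signal-handler-like) cancellation handler may observe.  */
static int
setcancelstate_sketch (int newstate)
{
  int old = atomic_load_explicit (&cancelstate, memory_order_relaxed);
  compiler_barrier ();
  atomic_store_explicit (&cancelstate, newstate, memory_order_relaxed);
  compiler_barrier ();
  return old;
}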


