This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- From: Rich Felker <dalias at aerifal dot cx>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Fri, 12 Sep 2014 11:32:35 -0400
- Subject: Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- Authentication-results: sourceware.org; auth=none
- References: <5410C70E dot 70207 at linux dot vnet dot ibm dot com> <1410533066 dot 4967 dot 96 dot camel at triegel dot csb>
On Fri, Sep 12, 2014 at 04:44:26PM +0200, Torvald Riegel wrote:
> On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote:
> > I have summarized in  the current issues with GLIBC pthread cancellation system,
> > the current GLIBC implementation and the proposed solution by Rich Felker with the
> > adjustment required to enabled it on GLIBC.
> > It is still heavily WIP and I'm still planning to add more content, so any question,
> > comments, advices are welcomed.
> > The GLIBC adjustment to proposed solution is in fact the current work I'm doing to
> > rewrite pthread cancellation subsystem . My code still needs a *lot* of cleanup,
> > but initial results are promising. It is building on both powerpc64 and x86_64
> > (it won't build on others platforms basically because I rewrite the way cancelable
> > syscalls are done).
> The general direction seems good to me. Thanks for working on this. I
> have some questions/comments about the steps you outline in the "GLIBC
> adjustment" section on the wiki page:
> "4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
> directly. This avoid synchronization issues about updating the
> cancellation status and also focus the logic on signal handler and
> cancellation syscall code."
> The current code seems focused on trying to avoid sending signal if
> possible, seemingly because of performance concerns. My gut feeling is
> that cancellation doesn't need to have low latency, but do we have
> sufficient confidence that always sending the signal is alright?
> I believe we can still avoid sending the signal with the new scheme;
> what we'd have to do is let the code for cancellable syscalls mark
> (using atomic ops) whether it's currently executing or not (ie,
> fetch_and_set to true and fetch_and_set to previous value); then the
> pthread_cancel should see whether it needs to send a signal or not.
> That adds a bit more synchronization, but seems possible.
I agree that avoiding sending a signal is still desirable. One reason
that may not have been considered is short reads/writes. If you send a
signal which you don't expect to be acted upon, chances are fairly
good that it's going to cause spurious short reads/writes due to
interrupting a syscall that has already transferred some data. This is
not a conformance problem and not critical, but it still seems
desirable to avoid.
> "8. Adjust 'lowlevellock.h' arch-specific implementations to provide
> cancelable futex calls (used in libpthread code)."
> Only FUTEX_WAIT should be cancellable, I suppose.
> I've been thinking about whether it would be easier for the
> implementation of cancellable pthreads functions to just call a
> noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the
> wait but calling pthread_testcancel instead so that any pending
> cancellation is acted upon). However, this seems to save less
> complexity than just doing for FUTEX_WAIT what we do for all the other
> cancellable syscalls, so it's probably not worth it.
This is not viable; it has race conditions, of the exact type which
the futex syscall and the new cancellation system are designed to
AFAIK it also only works if the cancellation signal is an interrupting
signal (no SA_RESTART) which is a big problem -- it would introduce
illegal EINTR results into programs not expecting to receive them and
not prepared to handle them.
> I'd add another thing to the list of steps, though: For future-proofing,
> it would be good pthread_setcancelstate and pthread_setcanceltype would
> have an explicit compiler barrier in them. If at some point we have LTO
> including those functions, we use C11 atomics, and compilers start
> optimizing those more aggressively, then I believe we're at risk that
> the compiler reorders code across the atomics in pthread_setcancelstate;
> this could lead to cancellation handlers seeing unexpected state when
> they are actually run. C11 has tighter rules for what's allowed in
> signal handlers (IIRC, these are still under discussion, at least in the
> C++ committee) than what POSIX requires, yet the cancellation handler is
> similar to a signal handler. A compiler barrier should hopefully
> prevent any issues due to that mismatch. Thoughts?
Since you're only concerned about signal handlers, not access from
other threads, volatile is probably sufficient here.