This is the mail archive of the
mailing list for the glibc project.
Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- From: Rich Felker <dalias at aerifal dot cx>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Fri, 12 Sep 2014 13:17:21 -0400
- Subject: Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)
- Authentication-results: sourceware.org; auth=none
- References: <5410C70E dot 70207 at linux dot vnet dot ibm dot com> <1410533066 dot 4967 dot 96 dot camel at triegel dot csb> <20140912153235 dot GP23797 at brightrain dot aerifal dot cx> <1410538290 dot 4967 dot 111 dot camel at triegel dot csb>
On Fri, Sep 12, 2014 at 06:11:30PM +0200, Torvald Riegel wrote:
> On Fri, 2014-09-12 at 11:32 -0400, Rich Felker wrote:
> > On Fri, Sep 12, 2014 at 04:44:26PM +0200, Torvald Riegel wrote:
> > > On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote:
> > > "8. Adjust 'lowlevellock.h' arch-specific implementations to provide
> > > cancelable futex calls (used in libpthread code)."
> > >
> > > Only FUTEX_WAIT should be cancellable, I suppose.
> > >
> > > I've been thinking about whether it would be easier for the
> > > implementation of cancellable pthreads functions to just call a
> > > noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the
> > > wait but calling pthread_testcancel instead so that any pending
> > > cancellation is acted upon). However, this seems to save less
> > > complexity than just doing for FUTEX_WAIT what we do for all the other
> > > cancellable syscalls, so it's probably not worth it.
> > This is not viable; it has race conditions, of the exact type which
> > the futex syscall and the new cancellation system are designed to
> > avoid.
> The new scheme would be used, but would always set the cancellation as
> pending. So what I thought about would primarily change the way we act
> on cancellation, not when it's considered to have happened. There's no
> real side effect in waiting, so futex_wait is a special case.
The problem is that you end up waiting unboundedly long, possibly
forever. Consider cancellation of a pthread_cond_wait that will never
be signaled, where the cancellation request arrives in the race
> > > I'd add another thing to the list of steps, though: For future-proofing,
> > > it would be good pthread_setcancelstate and pthread_setcanceltype would
> > > have an explicit compiler barrier in them. If at some point we have LTO
> > > including those functions, we use C11 atomics, and compilers start
> > > optimizing those more aggressively, then I believe we're at risk that
> > > the compiler reorders code across the atomics in pthread_setcancelstate;
> > > this could lead to cancellation handlers seeing unexpected state when
> > > they are actually run. C11 has tighter rules for what's allowed in
> > > signal handlers (IIRC, these are still under discussion, at least in the
> > > C++ committee) than what POSIX requires, yet the cancellation handler is
> > > similar to a signal handler. A compiler barrier should hopefully
> > > prevent any issues due to that mismatch. Thoughts?
> > Since you're only concerned about signal handlers, not access from
> > other threads, volatile is probably sufficient here.
> Yes, we don't need HW synchronization between the interrupted code and
> the cancellation handler. Requiring programmers to use volatile
> whenever they need to communicate from interrupted code to cancellation
> handlers (on both sides!) could work in practice, but only if the
> pthread_setcancelstate etc. look to the compiler like they could call
> code that uses volatiles -- which isn't the case for these functions
> (assuming the compiler understands and optimizes atomics).
Oh, I see -- you're concerned about reordering with respect to code
in the calling application. In that case I think you're right -- you
need a full compiler barrier of some sort.