This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Propose fix for race conditions in pthread cancellation (bz#12683)

On Wed, 2014-09-10 at 18:47 -0300, Adhemerval Zanella wrote:
> I have summarized in [1] the current issues with GLIBC pthread cancellation system,
> the current GLIBC implementation and the proposed solution by Rich Felker with the
> adjustment required to enabled it on GLIBC.
> It is still heavily WIP and I'm still planning to add more content, so any question,
> comments, advices are welcomed.
> The GLIBC adjustment to proposed solution is in fact the current work I'm doing to
> rewrite pthread cancellation subsystem [2]. My code still needs a *lot* of cleanup,
> but initial results are promising. It is building on both powerpc64 and x86_64 
> (it won't build on others platforms basically because I rewrite the way cancelable
> syscalls are done).

The general direction seems good to me.  Thanks for working on this.  I
have some questions/comments about the steps you outline in the "GLIBC
adjustment" section on the wiki page:

"4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
directly. This avoid synchronization issues about updating the
cancellation status and also focus the logic on signal handler and
cancellation syscall code."

The current code seems focused on trying to avoid sending signal if
possible, seemingly because of performance concerns.  My gut feeling is
that cancellation doesn't need to have low latency, but do we have
sufficient confidence that always sending the signal is alright?

I believe we can still avoid sending the signal with the new scheme;
what we'd have to do is let the code for cancellable syscalls mark
(using atomic ops) whether it's currently executing or not (ie,
fetch_and_set to true and fetch_and_set to previous value); then the
pthread_cancel should see whether it needs to send a signal or not.
That adds a bit more synchronization, but seems possible.

"8. Adjust 'lowlevellock.h' arch-specific implementations to provide
cancelable futex calls (used in libpthread code)."

Only FUTEX_WAIT should be cancellable, I suppose.

I've been thinking about whether it would be easier for the
implementation of cancellable pthreads functions to just call a
noncancellable FUTEX_WAIT and handle EINTR properly (by not retrying the
wait but calling pthread_testcancel instead so that any pending
cancellation is acted upon).  However, this seems to save less
complexity than just doing for FUTEX_WAIT what we do for all the other
cancellable syscalls, so it's probably not worth it.

I'd add another thing to the list of steps, though: For future-proofing,
it would be good pthread_setcancelstate and pthread_setcanceltype would
have an explicit compiler barrier in them.  If at some point we have LTO
including those functions, we use C11 atomics, and compilers start
optimizing those more aggressively, then I believe we're at risk that
the compiler reorders code across the atomics in pthread_setcancelstate;
this could lead to cancellation handlers seeing unexpected state when
they are actually run.  C11 has tighter rules for what's allowed in
signal handlers (IIRC, these are still under discussion, at least in the
C++ committee) than what POSIX requires, yet the cancellation handler is
similar to a signal handler.  A compiler barrier should hopefully
prevent any issues due to that mismatch.  Thoughts?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]