POSIX thread cancellation
POSIX thread cancellation aims to provide a method for terminating a thread of execution.
The POSIX thread cancellation API includes 6 functions:
int pthread_setcancelstate(int state, int *oldstate);
int pthread_setcanceltype(int type, int *oldtype);
void pthread_cleanup_push(void (*routine)(void *), void *arg);
void pthread_cleanup_pop(int execute);
int pthread_cancel(pthread_t thread);
void pthread_testcancel(void);
Threads use these functions to set their cancel state and type, to push and pop cleanup handlers that quiesce state in the event of a cancellation, to cancel other threads, and to test for their own cancellation.
The API seems relatively simple, but as with all concurrent APIs there is a considerable level of complexity that is not entirely clear at first.
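As a rough illustration of how the six functions fit together, here is a minimal sketch in C. The names release_buffer, worker and run_cancel_demo are made up for this example, and the work loop is only a stand-in for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static int cleanup_ran;   /* set by the cleanup handler, read after pthread_join */

/* Hypothetical cleanup handler: releases the buffer the worker owned. */
static void release_buffer(void *arg)
{
    free(arg);
    cleanup_ran = 1;
}

static void *worker(void *unused)
{
    int oldstate, oldtype;
    (void)unused;

    /* Enabled and deferred are the defaults; set here only to show the calls. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate);
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &oldtype);

    unsigned char *buffer = calloc(64, 1);
    assert(buffer != NULL);

    pthread_cleanup_push(release_buffer, buffer);
    for (;;) {
        buffer[0]++;            /* stand-in for real work */
        sched_yield();
        pthread_testcancel();   /* explicit cancellation point */
    }
    pthread_cleanup_pop(1);     /* never reached, but must pair with the push */
    return NULL;
}

/* Returns 0 if the worker was cancelled and its cleanup handler ran. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (result == PTHREAD_CANCELED && cleanup_ran) ? 0 : 1;
}
```

The first cancellation point the worker reaches is the pthread_testcancel call inside the push/pop region, so the buffer is released by the cleanup handler no matter how early the request arrives.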
This document aims to describe the more complex behaviour of the GNU implementation when using gcc and glibc to build your application.
Can I use cancellation?
Using cancellation requires that the application and all libraries be aware that cancellation is being used and act accordingly.
This means the following:
- Libraries that install signal handlers for asynchronously delivered signals must install handlers that disable cancellation on entry and restore it on exit. Otherwise a call to a cancellation point inside the signal handler could trigger cancellation, which would run cleanup handlers in async-signal context.
- User callbacks must know if they are called with cancellation enabled and act accordingly, either disabling cancellation, or avoiding calling functions which are not allowed or are cancellation points.
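The second point can also be handled on the library side. The following is a minimal sketch, assuming a library that wants to shield user callbacks from cancellation; the names user_callback, call_with_cancellation_disabled, record_state and callback_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef void (*user_callback)(void *);

/* Hypothetical library helper: runs a user callback with cancellation
   disabled, then restores whatever state the caller had before. */
static void call_with_cancellation_disabled(user_callback cb, void *arg)
{
    int oldstate, ignored;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    cb(arg);
    pthread_setcancelstate(oldstate, &ignored);
}

/* Example callback: records the cancel state it was called with.
   There is no "get" function, so it sets the state and restores it. */
static void record_state(void *arg)
{
    int *seen = arg;
    int ignored;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, seen);
    pthread_setcancelstate(*seen, &ignored);
}

/* Returns 0 if the callback ran with cancellation disabled and the
   caller's state was enabled again afterwards. */
int callback_demo(void)
{
    int seen = -1, before, after;

    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &before);
    call_with_cancellation_disabled(record_state, &seen);
    pthread_setcancelstate(before, &after);   /* 'after' is the state the helper left behind */

    return (seen == PTHREAD_CANCEL_DISABLE && after == PTHREAD_CANCEL_ENABLE) ? 0 : 1;
}
```

A pending cancellation request is not lost while the callback runs; it stays pending and is acted upon at the next cancellation point after the state is restored.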
There is the additional implementation dependent requirement for C++ in gcc and glibc:
- Cancellation cannot be used with C++ if any object destructors exist in the cancellation region that would cause cancellation to occur within them.
The reason for this is more obscure. The cancellation support in glibc is provided by the unwinding machinery in the compiler. Destructors are treated as noexcept and the compiler expects nothing to throw from them. Cancellation is a form of exception, so when it tries to unwind beyond the destructor the process is terminated, because no throwing is allowed from the destructor. This can't be worked around, since the compiler may use noexcept to make changes to the generated code and unwind tables that make it impossible to cancel from that region, i.e. there is not enough information recorded.
As you can see, cancellation requires significant coordination between the application and libraries it uses. It is still a useful feature, but may not be readily useful for common off-the-shelf C or C++ libraries.
Asynchronous cancellation safety
The purpose of asynchronous cancellation is to allow purely computational threads to be interrupted. Asynchronous cancellation is not intended for any other purpose.
When asynchronous cancellation is active, POSIX states that only three functions are safe to call:
pthread_cancel
pthread_setcancelstate
pthread_setcanceltype
In summary, you can cancel yourself, disable cancellation, or move from asynchronous cancellation to deferred cancellation. All of these steps move you out of asynchronous cancellation.
The additional implementation details are as follows:
- In glibc today (2017-08-17) you cannot read or write to a thread local storage (TLS) variable for the first time in an asynchronous cancellation region because doing so might cause the implementation to call functions which are not asynchronous cancellation safe. The same problem extends to asynchronous signal handlers, and needs to be fixed in glibc to make TLS safe.
- You must prevent any compiler optimizations which would transform code in the asynchronous cancellation active region into function calls which are not asynchronous cancellation safe e.g. prevent loops from being converted into memcpy/memmove.
- You must not use any language feature or language functions which would cause the implementation to make library calls which themselves would call functions which are cancellation points.
Asynchronous cancellation is a very special case feature used by very few applications and in very special cases. This makes it unlikely that you will have problems if you only carry out computation in the code that has asynchronous cancellation enabled.
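A minimal sketch of the intended use, assuming a thread that does nothing but arithmetic once asynchronous cancellation is enabled. The names compute and async_cancel_demo are hypothetical, and the volatile accumulator is only there to keep the compiler from removing the loop or turning it into a library call.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t started;   /* posted by the thread before it enters the async region */

static void *compute(void *unused)
{
    int oldtype;
    volatile unsigned long sum = 0;
    (void)unused;

    /* All set-up that makes library calls happens while still deferred. */
    sem_post(&started);

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    for (unsigned long i = 0;; i++)
        sum += i;          /* pure computation: no function calls at all */

    return NULL;           /* not reached */
}

/* Returns 0 if the spinning thread was cancelled. */
int async_cancel_demo(void)
{
    pthread_t tid;
    void *result;

    if (sem_init(&started, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, compute, NULL) != 0)
        return -1;

    while (sem_wait(&started) == -1 && errno == EINTR)
        ;                  /* retry if interrupted by a signal */

    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    sem_destroy(&started);

    return (result == PTHREAD_CANCELED) ? 0 : 1;
}
```

The loop never reaches a cancellation point, so with deferred cancellation this thread could not be stopped at all; that is the case asynchronous cancellation exists for.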
Deferred cancellation and signals
A common pitfall when using deferred cancellation and signals is to fail to realize the compositional issues with both of these features.
If you enable deferred cancellation, and receive an asynchronous signal, and if during the asynchronous signal handler you call a function that is a cancellation point, it is the semantic equivalent of having enabled asynchronous cancellation. For example, calling stat in a signal handler is allowed, but if you do this in a signal handler that interrupts a deferred cancellation region, it will cause the cancellation to be immediately acted upon. The cancellation will then attempt to run cleanup handlers in asynchronous-signal context, and that could be problematic if those cleanup handlers were not asynchronous-signal safe.
Again this underscores the need to coordinate the entire application and libraries if cancellation is to be supported safely.
One way to make cancellation easier to use is to ignore deferred cancellation in a signal handler, and delay the handling until the signal handler returns.
Cancellation and C++
As discussed earlier, C++ destructors are noexcept regions from which cancellations cannot be started. This means that cancellations must be deferred until a later point. In practice this is not enforced in the GNU runtimes, and enabling cancellation in C++ requires those actions noted earlier in this document, e.g. no destructors may call functions which are cancellation points.
Harmonizing cancellation in C and C++
The biggest problem in the glibc cancellation implementation is the various interactions with C++.
The known issues are:
- Calling cancellation point functions from within destructors or any noexcept region results in program termination.
- The compiler may optimize away the unwind tables of functions that only call noexcept functions, leaving too little information to unwind through them.
Overcoming these issues is not impossible and we list here one way to do this.
In glibc we already have what is called nocancel entry points for many functions which would otherwise be cancellation points (syscall wrappers in particular). Calling one of these functions is the equivalent of disabling cancellation, calling the function, and then enabling cancellation, but without having to pay the cost of doing two additional function calls and their state manipulation.
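The equivalence described above can be written out by hand. This is a sketch of the slow, public-API version of the idea, not glibc's internal nocancel entry points; read_without_cancellation and nocancel_demo are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: behaves like read() but never acts on a
   cancellation request. It costs two extra calls, which the internal
   nocancel entry points avoid. */
static ssize_t read_without_cancellation(int fd, void *buf, size_t count)
{
    int oldstate, ignored, saved_errno;
    ssize_t n;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    n = read(fd, buf, count);
    saved_errno = errno;                 /* keep read()'s errno intact */
    pthread_setcancelstate(oldstate, &ignored);
    errno = saved_errno;

    return n;
}

/* Returns 0 if one byte written to a pipe is read back through the wrapper. */
int nocancel_demo(void)
{
    int fds[2];
    char byte = 0;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "k", 1) != 1)
        return -1;

    n = read_without_cancellation(fds[0], &byte, 1);

    close(fds[0]);
    close(fds[1]);
    return (n == 1 && byte == 'k') ? 0 : 1;
}
```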
If all C++ code was compiled so as to cause cancellation point functions (all 247 possible functions) to call their non-cancellation enabled entry points, then all C++ code would be free of any cancellation points without any real additional cost. This would allow C++ application developers to make use of pthread_testcancel to add a cancellation point at specific places in their C++ code that would be valid places e.g. outside of destructors and noexcept regions. The call to pthread_testcancel would also prevent the compiler from optimizing away the unwind tables needed to unwind from the function calling the routine. Care would still need to be taken when calling other C libraries because they may contain calls to cancellation points, but this is no different than the normal inter-library coordination required to enable cancellation. At the very least these changes would ensure that C++ code would be able to use C library functions safely without introducing cancellation points.
A simpler alternative would be to add a thread attribute in glibc which indicates that cancellation is disabled. The C++ library would use the glibc thread attribute and disable cancellation for all C++ started threads. Then, as a final resort for adding cancellation where needed, the developer could add calls to pthread_testcancel which would ignore the thread attribute and cause a cancellation point. Developers could use pthread_testcancel when they know it to be safe to throw an exception.
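No such thread attribute exists today, but the effect can be approximated with the existing API: run the thread with cancellation disabled and enable it only around an explicit pthread_testcancel at places known to be safe. The following is a sketch under that assumption; cancellation_checkpoint, worker and checkpoint_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: the only place in the thread where a pending
   cancellation request may be acted upon. */
static void cancellation_checkpoint(void)
{
    int oldstate, ignored;

    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate);
    pthread_testcancel();
    pthread_setcancelstate(oldstate, &ignored);
}

static void *worker(void *unused)
{
    int oldstate;
    (void)unused;

    /* Stands in for the proposed attribute: disabled for the whole thread. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    for (;;) {
        /* Work goes here. Any cancellation points it calls are inert. */
        sched_yield();

        /* A place the developer knows is safe to unwind from. */
        cancellation_checkpoint();
    }
    return NULL;   /* not reached */
}

/* Returns 0 if the worker was cancelled at its checkpoint. */
int checkpoint_demo(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    return (result == PTHREAD_CANCELED) ? 0 : 1;
}
```

Unlike the proposal, this relies on every thread start routine disabling cancellation itself, which is the coordination cost the attribute is meant to remove.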
Compiling a distribution
Compiling the distribution with the ability to interoperate with C++ and cancellation means that we need unwind tables enabled everywhere. There is no technical reason not to compile your entire distribution with unwind tables to support interoperability with C++ and POSIX thread cancellation. These unwind tables should at a minimum support synchronous exceptions.
Lastly, calling a function that is a cancellation point in an asynchronous signal handler with deferred cancellation is the equivalent of acting upon asynchronous cancellation. In practice this requires everything that can be interrupted by the signal handler to be compiled with -fasynchronous-unwind-tables, it requires the cleanup handlers to be async-signal safe, and this is almost an impossible requirement to meet in a distribution. Therefore deferred cancellation handling from a signal handler should be considered undefined behaviour.
To summarize the suggested options for compiling a distribution:
- Minimally compile everything with -funwind-tables.
- Do not enforce compiling with -fasynchronous-unwind-tables.
- Set the SIGCANCEL bit in sa_mask for all signal handler registrations. Then at every cancellation point check if the SIGCANCEL bit is set in the signal mask, and if it is further defer the cancellation to another time. In effect the SIGCANCEL bit in sa_mask acts as a proxy for detecting deferred cancellation in an asynchronous-signal handler. This prevents deferred cancellation from asynchronously cancelling code that is not expecting to be asynchronously cancelled.
- Consider fixing glibc and gcc to cause all C++ code to disable cancellation of all glibc cancellation points (either by calling non-cancelling versions of functions or using a thread attribute), except pthread_testcancel, thus allowing C++ authors to selectively use cancellation.