Created attachment 6650
test case demonstrating crash due to this bug

As specified in POSIX, cancellation cleanup handlers are simply "invoked" when cancellation is acted upon. Nowhere is it specified that the block scope of the function from which the cancellation point is called terminates prematurely at any time during the cleanup process, or that the block scope of one cleanup push/pop pair terminates before the next cleanup handler is run. However, glibc's implementation of cleanup in terms of C++-style exceptions results in cleanup handlers running as exception handlers in the block where they were installed, rather than merely being invoked. This means that objects used by the cleanup handler may no longer exist at the time the cleanup handler runs, and attempts to write to them will clobber the new stack frame in which the cleanup handler is running.

I'm attaching a working proof-of-concept that demonstrates the issue by clobbering its own stack and crashing, even though it has not done anything that the POSIX standard disallows. It's rather ugly and unnatural, since I wrote it to ensure that the crash happens, not to mimic code that might appear in the real world. A more real-world example is something like:

    char small_buf[SMALL_SIZE];
    struct context ctx = { .buf = small_buf };
    pthread_cleanup_push(cleanup_func, &ctx);
    ...
    if (...) {
        char large_buf[LARGE_SIZE];
        ctx.buf = large_buf;
        ...
        ctx.buf = small_buf;
    }
    ...
    pthread_cleanup_pop(0);

I'm aware that this bug is inherent in the cancellation design glibc chose to use, and is not easy to fix, but please do not mark this bug as invalid.
If the glibc team wants to be able to keep this behavior, a defect report process with the Austin Group based on this glibc bug report should be started, with the goal of making a rigorous definition of the interaction of cancellation cleanup handlers and automatic object lifetimes, so that programs like my test case become non-conforming in the next edition (or TC?) of POSIX. Short of an amendment to the standard, I believe this bug report is valid.
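For reference, the real-world pattern described above can be filled out into a small compilable sketch. This is not the attached test case and it is not meant to crash. The sizes, the `which` tag, the `seen_by_handler` variable, and the `worker` and `run_cancel_demo` functions are all assumptions added for illustration. The thread cancels itself so that cancellation is acted upon at a known point inside the inner block, and the handler deliberately does not write through `ctx->buf`; it only records which buffer was installed, so the sketch behaves the same whether handlers are merely invoked or run after unwinding.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes; the original example leaves them unspecified. */
enum { SMALL_SIZE = 16, LARGE_SIZE = 4096 };

enum which_buf { BUF_NONE, BUF_SMALL, BUF_LARGE };

struct context {
    char *buf;                      /* buffer currently in use */
    volatile enum which_buf which;  /* hypothetical tag read by the handler */
};

/* Written by the handler, read by the joining thread after pthread_join. */
static enum which_buf seen_by_handler = BUF_NONE;

static void cleanup_func(void *arg)
{
    struct context *ctx = arg;
    /* Only records which buffer was installed. Writing through ctx->buf
     * here is the access whose validity is in dispute when it points at
     * large_buf. */
    seen_by_handler = ctx->which;
}

static void *worker(void *unused)
{
    (void)unused;
    char small_buf[SMALL_SIZE];
    struct context ctx = { .buf = small_buf, .which = BUF_SMALL };

    pthread_cleanup_push(cleanup_func, &ctx);
    {
        char large_buf[LARGE_SIZE];
        ctx.buf = large_buf;
        ctx.which = BUF_LARGE;

        /* Request cancellation of this thread, then act on it at an
         * explicit cancellation point inside the inner block. */
        pthread_cancel(pthread_self());
        pthread_testcancel();

        /* Not reached once cancellation is acted upon. */
        ctx.buf = small_buf;
        ctx.which = BUF_SMALL;
    }
    pthread_cleanup_pop(0);
    return NULL;
}

/* Runs the worker once and reports what the cleanup handler observed. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *result = NULL;

    seen_by_handler = BUF_NONE;
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &result);
    assert(rc == 0);
    assert(result == PTHREAD_CANCELED);
    return (int)seen_by_handler;
}
```

Calling `run_cancel_demo()` returns `BUF_LARGE`: the handler runs while `ctx.buf` still refers to `large_buf`. Whether `large_buf` is still a live object at that moment is exactly the question the standard leaves unanswered.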
To quote the Rationale for the 1996 edition of POSIX.1 (since various bits of POSIX rationale may well not have ended up integrated in the versions of POSIX based on the Single Unix Specification):

B.18.1.3 Thread Cancellation Cleanup Handlers, page 579, lines 9213-9215: "it is an explicit goal of this standard to be compatible with existing exception facilities and languages having exceptions".

B.18.2.3 Establishing Cancellation Handlers, page 582, lines 9312-9314: "A more ambitious implementation of these routines might do even better by allowing the compiler to note that the cancellation cleanup handler is a constant and can be expanded inline."

B.18.3 Language-Independent Cancellation Functionality, page 585, lines 9455-9459: "It is intended that bindings be able to use language exception facilities as part of the implementation of thread cancellation. In particular, it would be desirable to have thread cancellation, cancellation scopes, and their associated cleanup code map into exception raise, exception scopes, and exception handlers in languages providing them."

I think that exceptions, the associated unwinding, and execution of cleanup handlers in the context where pthread_cleanup_push was called were pretty clearly intended to be allowed as an approach for implementing cancellation.
Yes, this is the response I got from their side too, but there is no specification for such behavior in the standard, and in the absence of such, there's no reason to believe that execution of any block ends as a result of cancellation being acted upon. I'd like to get this fixed in POSIX.