Bug 14609 - Stack clobbering in pthread cancellation cleanup handlers
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2012-09-22 23:46 UTC by Rich Felker
Modified: 2014-06-25 06:50 UTC (History)

fweimer: security-

test case demonstrating crash due to this bug (265 bytes, text/x-csrc)
2012-09-22 23:46 UTC, Rich Felker

Description Rich Felker 2012-09-22 23:46:47 UTC
Created attachment 6650
test case demonstrating crash due to this bug

As specified in POSIX, cancellation cleanup handlers are simply "invoked" when cancellation is acted upon. Nowhere is it specified that the block scope of the function from which the cancellation point is called terminates prematurely at any time during the cleanup process, or that the block scope of one cleanup push/pop terminates before the next cleanup handler is run. However, glibc implements cleanup in terms of C++-style exceptions, so cleanup handlers run as exception handlers in the block where they were installed, rather than merely being invoked. This means that objects used by the cleanup handler may no longer exist at the time the cleanup handler runs, and attempts to write to them will clobber the new stack frame in which the cleanup handler is running.
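
For reference, the part of the "invoked" model that is not in dispute can be observed with two nested handlers: when cancellation is acted upon, they run in last-in, first-out order. The following is only a minimal sketch of that. The names record, worker and run_nested_cleanup_demo are made up for illustration, and the handlers are deliberately given pointers to static objects so that no automatic object lifetime is in question.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t ready;     /* posted once both handlers are installed */
static int order[2];    /* handler ids, in the order they were invoked */
static int calls;       /* how many handlers have run so far */

/* Cleanup handler (hypothetical): just note which handler this is. */
static void record(void *arg)
{
    if (calls < 2)
        order[calls++] = *(int *)arg;
}

static void *worker(void *unused)
{
    /* Static storage, so no automatic object lifetime is in question. */
    static int outer_id = 1, inner_id = 2;
    (void)unused;

    pthread_cleanup_push(record, &outer_id);
    pthread_cleanup_push(record, &inner_id);
    sem_post(&ready);
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);     /* never reached; push/pop must pair lexically */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns order[0] * 10 + order[1], so 21 means the inner handler ran
 * first; returns -1 if the semaphore or thread could not be created. */
int run_nested_cleanup_demo(void)
{
    pthread_t t;
    void *res = NULL;

    calls = 0;
    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&ready);           /* both handlers are now installed */
    pthread_cancel(t);          /* deferred; acted upon inside sleep() */
    pthread_join(t, &res);
    sem_destroy(&ready);

    assert(res == PTHREAD_CANCELED);
    assert(calls == 2);
    return order[0] * 10 + order[1];
}
```

Calling run_nested_cleanup_demo() from main (built with -pthread where the toolchain needs it) exercises the sketch. It says nothing about which objects are still alive while each handler runs, which is the open question here.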

I'm attaching a working proof-of-concept that demonstrates the issue by clobbering its own stack and crashing, even though it has not done anything that the POSIX standard disallows. It's rather ugly and unnatural since I wrote it to ensure that the crash happens, not to mimic code that might appear in the real world. A more real-world example is something like:

char small_buf[SMALL_SIZE];
struct context ctx = { .buf = small_buf };
pthread_cleanup_push(cleanup_func, &ctx);
if (...) {
    char large_buf[LARGE_SIZE];
    ctx.buf = large_buf;
    /* ... cancellation point reached while ctx.buf points at large_buf ... */
    ctx.buf = small_buf;
}
pthread_cleanup_pop(1);
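
A possible application-side way to sidestep the problem, sketched here under assumptions and not as a fix for glibc: declare every object the handler may touch in the same block as pthread_cleanup_push, so that it should still exist whether the handler is invoked at the cancellation point or run after unwinding to the block where it was installed. The len field, the concrete sizes, the semaphore and the function run_hoisted_buffer_demo are all additions made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>
#include <unistd.h>

enum { SMALL_SIZE = 16, LARGE_SIZE = 4096 };   /* sizes assumed for the sketch */

struct context {
    char *buf;      /* buffer the handler is expected to scrub */
    size_t len;     /* its size (field added for this sketch) */
};

static sem_t ready;             /* posted once ctx points at large_buf */
static size_t scrubbed_len;     /* set by the handler, read after join */

static void cleanup_func(void *arg)
{
    struct context *ctx = arg;
    memset(ctx->buf, 0, ctx->len);  /* writes to whatever ctx->buf names */
    scrubbed_len = ctx->len;
}

static void *worker(void *unused)
{
    char small_buf[SMALL_SIZE];
    char large_buf[LARGE_SIZE];     /* hoisted: same scope as the push */
    struct context ctx = { .buf = small_buf, .len = sizeof small_buf };
    (void)unused;

    pthread_cleanup_push(cleanup_func, &ctx);
    {
        ctx.buf = large_buf;
        ctx.len = sizeof large_buf;
        sem_post(&ready);
        for (;;)
            sleep(1);               /* cancellation point; cancelled here */
    }
    pthread_cleanup_pop(1);         /* never reached in this sketch */
    return NULL;
}

/* Returns the number of bytes the handler scrubbed, or -1 if the
 * semaphore or thread could not be created. */
long run_hoisted_buffer_demo(void)
{
    pthread_t t;
    void *res = NULL;

    scrubbed_len = 0;
    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&ready);               /* worker now has ctx.buf == large_buf */
    pthread_cancel(t);
    pthread_join(t, &res);
    sem_destroy(&ready);

    assert(res == PTHREAD_CANCELED);
    return (long)scrubbed_len;
}
```

The cost is that large_buf occupies stack space for the whole function instead of only inside the inner block, which is exactly the kind of restructuring a conforming program should not be forced into.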

I'm aware that this bug is inherent in the cancellation design glibc chose to use, and is not easy to fix, but please do not mark this bug as invalid. If the glibc team wants to be able to keep this behavior, a defect report process with the Austin Group based on this glibc bug report should be started, with the goal of making a rigorous definition of the interaction of cancellation cleanup handlers and automatic object lifetimes, so that programs like my test case become non-conforming in the next edition (or TC?) of POSIX.

Short of an amendment to the standard, I believe this bug report is valid.
Comment 1 joseph@codesourcery.com 2012-09-23 19:24:58 UTC
To quote the Rationale for the 1996 edition of POSIX.1 (since various bits 
of POSIX rationale may well not have ended up integrated in the versions 
of POSIX based on the Single Unix Specification), "it is an explicit goal 
of this standard to be compatible with existing exception facilities and 
languages having exceptions" (B.18.1.3 Thread Cancellation Cleanup 
Handlers, page 579, lines 9213-9215).  Then, B.18.2.3 Establishing 
Cancellation Handlers, page 582, lines 9312-9314, "A more ambitious 
implementation of these routines might do even better by allowing the 
compiler to note that the cancellation cleanup handler is a constant and 
can be expanded inline.".  And B.18.3 Language-Independent Cancellation 
Functionality, page 585, lines 9455-9459: "It is intended that bindings be 
able to use language exception facilities as part of the implementation of 
thread cancellation.  In particular, it would be desirable to have thread 
cancellation, cancellation scopes, and their associated cleanup code map 
into exception raise, exception scopes, and exception handlers in 
languages providing them.".

I think that exceptions, associated unwinding and execution of cleanup 
handlers in the context where pthread_cleanup_push was called were pretty 
clearly intended to be allowed as an approach for implementing 
Comment 2 Rich Felker 2014-06-25 06:50:21 UTC
Yes, this is the response I got from their side too, but there is no specification for such behavior in the standard, and in the absence of such, there's no reason to believe that execution of any block ends as a result of cancellation being acted upon. I'd like to get this fixed in POSIX.