Bug 13065

Summary: Race condition in pthread barriers
Product: glibc Reporter: Rich Felker <bugdal>
Component: nptlAssignee: Torvald Riegel <triegel>
Status: RESOLVED FIXED    
Severity: normal CC: fweimer, triegel
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: 2.23   
Host: Target:
Build: Last reconfirmed:
Attachments: simple demonstration of the bug

Description Rich Felker 2011-08-07 18:30:06 UTC
The glibc/NPTL implementation of pthread barriers has a race condition, whereby threads exiting the barrier may access memory belonging to the barrier after one or more of the pthread_barrier_wait calls has returned. At this point, per POSIX, the barrier is supposed to be in "the state it had as a result of the most recent pthread_barrier_init() function that referenced it." In particular, it's valid to call pthread_barrier_destroy on the barrier then re-initialize it with a new value, or to free/unmap the memory it's located in. The latter operation would especially make sense for a process-shared barrier where the caller is done using the barrier but other processes may continue to use it.

See the attachedment for a proof-of-concept that causes NPTL's pthread_barrier_wait to crash. This usage is not quite "correct" (the barrier should be destroyed before unmapping the memory, and only one thread should destroy it) but these issues could easily be fixed by throwing in a mutex. I've just made the code as simple as possible to demonstrate the bug.

Michael Burr proposed a solution (http://stackoverflow.com/questions/5886614/how-can-barriers-be-destroyable-as-soon-as-pthread-barrier-wait-returns/5902671#5902671) to this problem which which we have successfully incorporated into musl:

http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=commitdiff;h=f16a3089be33a75ef8e75b2dd5ec3095996bbb87;hp=202911435b56fe007ca62fc6e573fa3ea238d337

However it only works for non-process-shared barriers, as it requires all waiters to be able to access the first waiter's address space. I am not aware of any fix for process-shared barriers that does not involve allocating shared resources at pthread_barrier_wait time, which could of course fail and leave the caller with no way to recover... I suspect fixing this robustly may require adding a FUTEX_BARRIER operation to the kernel that does not return success until "val" processes all call FUTEX_BARRIER on the same futex address.

Note that, unfortunately, process-shared barriers are the area where this bug has the greatest chance of hitting real-world applications, since a process is likely to unmap the barrier soon after it's done using it.
Comment 1 Rich Felker 2011-09-25 16:13:44 UTC
Created attachment 5944 [details]
simple demonstration of the bug

This is the attachment that was supposed to be included in the original bug report.
Comment 2 Torvald Riegel 2013-12-20 18:10:26 UTC
This is conceptually related to Bug 13690, whose resolution depends on the outcome of a POSIX request for clarification.  The same kind of wording that needs to be clarified for that bug is not present in the barrier specification, but it's essentially the same question of when POSIX synchronization objects can be safely destroyed.  Therefore, I think it's good to wait for a result of the clarification request.

Furthermore, destruction is somewhat more complex due to the standard leaving it unspecified whether a thread that processes a signal will keep other waiters block.  That is, if at least one of the other threads blocked on the barrier may process signals, then one cannot destruct a barrier after returning from a call to pthread_barrier_wait().
Comment 3 Rich Felker 2013-12-20 18:39:05 UTC
I don't think signals make it any more complicated. Implementation-wise, there are two possibilities:

1. A waiter stuck in a signal handler blocks other waiters until it returns. In this case, no waiter returns while the one waiter is still in the signal handler, and there's no special issue to deal with.

2. A waiter stuck in a signal handler allows other waiters to proceed once the last waiter has arrived. The ONLY way to implement this is to have some resource identifying the barrier instance (keep in mind: as soon as any waiter returns from the wait, the barrier is ready for reuse as a new instance) whose lifetime persists until the signal handler returns. In order to avoid requiring dynamic resource allocation for each barrier instance (which could fail, rendering barriers unsafe for any actual synchronization usage) the resource must essentially have its storage associated with the threads involved in the barrier instance (e.g. on their stacks, TLS, kernel task structures, etc.). In such an implementation, the thread stuck in the signal handler needs to be finished working with the barrier resource itself (since it could be reused for a new barrier instance, or destroyed) and must perform its waiting based on the instance resource associated with the waiting threads.

In case it's not clear, what I'm arguing is not in regards to what the standard says about barriers and self-synchronized destruction. My argument is that, in either case, there's no additional barrier-specific difficulty to supporting self-synchronized destruction. The other requirements of making the barrier implementation correct already put you in a good position for supporting self-synchronized destruction where it's no more difficult than for mutexes or semaphores.

BTW, since a thread's status as being a waiter on a barrier is not a testable condition (i.e. there's no way to measure whether it's waiting on the barrier versus suspended awaiting scheduling just prior to waiting on the barrier) the standard has no choice but to allow the option where signal handlers block forward process of other waiters. Allowing the other option, however, does create the possibility of observable behavior; if an implementation takes this option, you may observe other threads exiting the barrier wait while the signal handler is still running.
Comment 4 Torvald Riegel 2016-01-15 20:23:52 UTC
Fixed in b02840bacdefde318d2ad2f920e50785b9b25d69