Bug 14485

Summary: File corruption race condition in robust mutex unlocking
Product: glibc
Component: nptl
Status: ASSIGNED
Severity: normal
Priority: P2
Version: unspecified
Target Milestone: ---
Reporter: Rich Felker <bugdal>
Assignee: Torvald Riegel <triegel>
CC: carlos, drepper.fsp, fweimer, mail
Flags: fweimer: security-
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=20973
          https://sourceware.org/bugzilla/show_bug.cgi?id=20985
          https://sourceware.org/bugzilla/show_bug.cgi?id=19402
          https://sourceware.org/bugzilla/show_bug.cgi?id=19089
          https://sourceware.org/bugzilla/show_bug.cgi?id=21083
          https://sourceware.org/bugzilla/show_bug.cgi?id=17463

Description Rich Felker 2012-08-17 18:52:06 UTC
The general procedure for unlocking a robust mutex is:

1. Put the mutex address in the "pending" slot of the thread's robust mutex list.
2. Remove the mutex from the thread's linked list of locked robust mutexes.
3. Low level unlock (clear the futex and possibly wake waiters).
4. Clear the "pending" slot in the thread's robust mutex list.
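For illustration, here is a minimal sketch of those four steps written against the Linux robust-futex ABI declared in <linux/futex.h>. It is not glibc's actual NPTL code: the my_robust_mutex type, the single-element list, and the omission of set_robust_list(2) registration and of the memory-ordering details are simplifications of mine.

/* Simplified sketch of the four unlock steps; not glibc's implementation. */
#include <linux/futex.h>      /* struct robust_list_head, FUTEX_* constants */
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

struct my_robust_mutex {
    struct robust_list link;      /* node on the per-thread robust list */
    _Atomic unsigned int futex;   /* owner TID | FUTEX_WAITERS | FUTEX_OWNER_DIED */
};

/* Per-thread list head; the real code registers it (and futex_offset) with
 * the kernel via set_robust_list(2), which is omitted here. */
static __thread struct robust_list_head robust_head;

static void my_robust_unlock(struct my_robust_mutex *m)
{
    /* 1. Publish the pending operation so the kernel can complete it if the
     *    thread dies in the middle of the unlock. */
    robust_head.list_op_pending = &m->link;

    /* 2. Dequeue the mutex from the thread's robust list (a one-element,
     *    self-pointing list is shown for brevity). */
    robust_head.list.next = &robust_head.list;

    /* 3. Low-level unlock: clear the futex word and wake one waiter. */
    unsigned int old = atomic_exchange(&m->futex, 0);
    if (old & FUTEX_WAITERS)
        syscall(SYS_futex, &m->futex, FUTEX_WAKE, 1, NULL, NULL, 0);

    /* 4. Clear the pending slot.  The race described in this report lives
     *    between steps 3 and 4: the kernel still believes &m->link points
     *    at a robust mutex owned by this thread. */
    robust_head.list_op_pending = NULL;
}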

Suppose between steps 3 and 4, another thread in the same process obtains the mutex in such a way that it is necessarily the last user of the mutex, then unlocks, destroys, and frees it. It then calls mmap with MAP_SHARED on a file, device, or shared memory segment, which happens to be assigned the same address the robust mutex had, and the file contents at the offset where the futex was located happen to contain the tid of the first thread that was in between steps 3 and 4 above. Now, suppose the process is immediately killed. The kernel then sets bit 30 (owner died) at this offset in the mapped file, wrongly trusting that the pending field in the robust list header still points to a valid robust mutex.

As far as I can tell, the ONLY solution to this problem is to introduce a global (within the process) lock on mmap and munmap, and to hold it between steps 2 and 4 of the robust mutex unlock procedure. The same lock can also be used to fix bug #13064. To minimize cost, this lock should be a rwlock where mmap and munmap count as "read" operations (so they don't block one another) and only the dangerous robust mutex unlock and barrier operations count as "write" operations.
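A rough sketch of that idea follows; the names (vm_lock, guarded_mmap, and the window helpers) are hypothetical stand-ins for whatever internal hooks glibc would actually use, and error handling is omitted.

#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>

static pthread_rwlock_t vm_lock = PTHREAD_RWLOCK_INITIALIZER;

/* mmap and munmap take the lock for reading, so they never block each other. */
void *guarded_mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off)
{
    pthread_rwlock_rdlock(&vm_lock);
    void *p = mmap(addr, len, prot, flags, fd, off);
    pthread_rwlock_unlock(&vm_lock);
    return p;
}

int guarded_munmap(void *addr, size_t len)
{
    pthread_rwlock_rdlock(&vm_lock);
    int r = munmap(addr, len);
    pthread_rwlock_unlock(&vm_lock);
    return r;
}

/* The dangerous window of robust unlock (steps 2 through 4 above) takes the
 * lock for writing, so no mapping can be created or destroyed while the
 * thread's list_op_pending slot still points at the mutex. */
void robust_unlock_window_begin(void) { pthread_rwlock_wrlock(&vm_lock); }
void robust_unlock_window_end(void)   { pthread_rwlock_unlock(&vm_lock); }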
Comment 1 Rich Felker 2012-08-17 22:34:24 UTC
It seems this bug has been known (but not reported as a bug) since 2010 or earlier:

http://lists.freebsd.org/pipermail/svn-src-user/2010-November/003668.html

Keep in mind that the thread I'm linking to has some other complaints about NPTL's robust mutexes that are orthogonal to this bug report, such as the fact that you can maliciously mess up other processes that map the same mutex you have access to. Those other complaints are perhaps quality-of-implementation (QoI) issues, but not major bugs; an application has no basis to assume it is safe to let untrusted processes map its synchronization objects.
Comment 2 Florian Weimer 2014-06-25 10:47:42 UTC
What causes the corruption? Can you really unmap a page which is in use in a futex system call? Do we have a test case?
Comment 3 Rich Felker 2014-06-25 15:47:03 UTC
The corruption is performed by the kernel when it walks the robust list. The basic situation is the same as in PR #13690, except that here there's actually a potential write to the memory rather than just a read.

The sequence of events leading to corruption goes like this:

1. Thread A unlocks the process-shared, robust mutex and is preempted after the mutex is removed from the robust list and atomically unlocked, but before it's removed from the list_op_pending field of the robust list header.

2. Thread B locks the mutex, and, knowing by program logic that it's the last user of the mutex, unlocks and unmaps it, allocates/maps something else that gets assigned the same address as the shared mutex mapping, and then exits.

3. The kernel destroys the process, which involves walking each thread's robust list and processing each thread's list_op_pending field of the robust list header. Since thread A has a list_op_pending pointing at the address previously occupied by the mutex, the kernel obliviously "unlocks the mutex" by writing a 0 to the address and futex-waking it. However, the kernel has instead overwritten part of whatever mapping thread A created. If this is private memory it (probably) doesn't matter since the process is ending anyway (but are there race conditions where this can be seen?). If this is shared memory or a shared file mapping, however, the kernel corrupts it.

I suspect the race is difficult to hit since thread A has to get preempted at exactly the wrong time AND thread B has to do a fair amount of work without thread A getting scheduled again. So I'm not sure how much luck we'd have getting a test case.
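Purely to make the ordering concrete, here is a hypothetical sketch of thread B's side of that sequence. It is not a working reproducer: it still depends on thread A being preempted inside its unlock, and on the file contents at the futex offset looking like A's TID. MAP_LEN, data.bin, and thread_b are made-up names.

#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAP_LEN 4096   /* hypothetical size of the shared mutex mapping */

/* arg points at a process-shared robust pthread_mutex_t placed at the start
 * of a MAP_SHARED mapping of length MAP_LEN. */
void *thread_b(void *arg)
{
    pthread_mutex_t *m = arg;

    pthread_mutex_lock(m);       /* B is, by program logic, the last user */
    pthread_mutex_unlock(m);
    munmap(m, MAP_LEN);          /* the old mapping is gone ... */

    /* ... and a shared file mapping happens to land at the same address
     * (forced with MAP_FIXED here purely for illustration). */
    int fd = open("data.bin", O_RDWR);
    mmap(m, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

    /* If the process dies now, while A's list_op_pending still points at the
     * old mutex address, the kernel's robust-list walk rewrites the word at
     * that offset in data.bin (marking the owner dead) and futex-wakes it,
     * corrupting the file. */
    exit(1);
}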
Comment 4 Niklas Hambüchen 2015-02-09 00:28:33 UTC
@maintainers, do you acknowledge this as a bug?

I'd like to use robust mutexes in a shared memory setup, but I am worried about running into this case.
Comment 5 Carlos O'Donell 2015-02-09 20:41:17 UTC
(In reply to Rich Felker from comment #3)
> 1. Thread A unlocks the process-shared, robust mutex and is preempted after
> the mutex is removed from the robust list and atomically unlocked, but
> before it's removed from the list_op_pending field of the robust list header.
> 
> 2. Thread B locks the mutex, and, knowing by program logic that it's the
> last user of the mutex, unlocks and unmaps it, allocates/maps something else
> that gets assigned the same address as the shared mutex mapping, and then
> exits.

Isn't this undefined behaviour? You have not specified how you established a happens-after relationship between the destruction of the mutex by Thread B and the last use by Thread A. In this description you give it would seem to me that Thread A is still not done, and that the "program logic" from Thread B is destroying an in-use mutex and that results in undefined behaviour from Thread A. Thread B fails to establish a happens-after the use of the mutex from Thread A. If Thread B truly establishes a happens-after the unlock from Thread A, is there a problem? I don't think there is.

Did I get something wrong Rich?
Comment 6 Carlos O'Donell 2015-02-09 21:13:51 UTC
(In reply to Carlos O'Donell from comment #5)
> (In reply to Rich Felker from comment #3)
> > 1. Thread A unlocks the process-shared, robust mutex and is preempted after
> > the mutex is removed from the robust list and atomically unlocked, but
> > before it's removed from the list_op_pending field of the robust list header.
> > 
> > 2. Thread B locks the mutex, and, knowing by program logic that it's the
> > last user of the mutex, unlocks and unmaps it, allocates/maps something else
> > that gets assigned the same address as the shared mutex mapping, and then
> > exits.
> 
> Isn't this undefined behaviour? You have not specified how you established a
> happens-after relationship between the destruction of the mutex by Thread B
> and the last use by Thread A. In this description you give it would seem to
> me that Thread A is still not done, and that the "program logic" from Thread
> B is destroying an in-use mutex and that results in undefined behaviour from
> Thread A. Thread B fails to establish a happens-after the use of the mutex
> from Thread A. If Thread B truly establishes a happens-after the unlock from
> Thread A, is there a problem? I don't think there is.
> 
> Did I get something wrong Rich?

OK, I see what's wrong.

This issue is about self-synchronizing vs. not-self-synchronizing.

http://austingroupbugs.net/view.php?id=811

Given 811 has been accepted, I withdraw my complaint.

Your example is valid, and we do have a problem.
Comment 7 Rich Felker 2015-02-09 22:51:37 UTC
Carlos, there's actually a still-open related Austin Group issue, number 864:

http://austingroupbugs.net/view.php?id=864

If there's a desire from the glibc side that implementations not be required to handle the case of self-synchronized unmapping, please have someone make that case. I'd be interested in hearing some arguments on both sides, as I haven't really made up my own opinion on which way it should be resolved.
Comment 8 Rich Felker 2015-02-10 00:18:43 UTC
In reply to comment 4, this issue can be avoided by applications in at least two ways:

1. Use a separate mapping of the shared synchronization object for each user/thread that might want to unmap it.

2. Use a separate synchronization object local to the process to synchronize unmapping of the shared mutex.

Since the only way you'd have multiple threads in the same process accessing the shared synchronization object is by storing the pointer to the (mapping containing the) shared mutex in some process-local object that's shared between threads, it seems natural that you would already be synchronizing access to this memory with another mutex (or other synchronization object) stored with the pointer. So approach 2 seems like it's always practical, probably doesn't involve any new synchronization, and likely makes it unnecessary/useless to support self-synchronized unmapping. On the other hand, it may not actually be any harder to support self-synchronized unmapping than to support self-synchronized destruction+unmapping, which almost certainly needs to be supported.
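One possible shape of approach 2, assuming the process already keeps the pointer to the shared mapping in a process-local structure shared between its threads; all of the names here (shared_handle, acquire_shared, release_shared) are hypothetical:

#include <pthread.h>
#include <sys/mman.h>

struct shared_handle {
    pthread_mutex_t local_lock;   /* ordinary process-local mutex */
    int users;                    /* threads currently using the mapping */
    void *map;                    /* MAP_SHARED region containing the robust mutex */
    size_t map_len;
};

/* Returns the mapping and takes a reference, or NULL if it is already gone. */
void *acquire_shared(struct shared_handle *h)
{
    pthread_mutex_lock(&h->local_lock);
    void *p = h->map;
    if (p != NULL)
        h->users++;
    pthread_mutex_unlock(&h->local_lock);
    return p;
}

/* The last thread to drop its reference unmaps.  A thread drops its
 * reference only after its pthread_mutex_unlock() on the robust mutex has
 * returned, i.e. after glibc has already cleared list_op_pending, so no
 * thread can still be inside the dangerous window when munmap runs. */
void release_shared(struct shared_handle *h)
{
    pthread_mutex_lock(&h->local_lock);
    if (--h->users == 0 && h->map != NULL) {
        munmap(h->map, h->map_len);
        h->map = NULL;
    }
    pthread_mutex_unlock(&h->local_lock);
}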
Comment 9 Torvald Riegel 2015-02-10 21:57:01 UTC
I agree that there is an issue if we claim that a robust mutex can be destroyed as soon as the thread that wants to destroy it can acquire it and there is no other thread or process trying to acquire it anymore.  I don't think that whether we consider destruction or unmap without destruction makes a significant difference, except regarding performance of potential solutions.
Comment 10 Rich Felker 2015-02-10 22:17:51 UTC
Torvald, the distinction between unmap and destroy+unmap may be significant in that the costly synchronization could be tucked away in pthread_mutex_destroy to deal with the latter case but not the former. So I think this realistically comes into any performance-based argument of what the standard should mandate.
Comment 11 Niklas Hambüchen 2015-08-09 12:29:41 UTC
Could somebody summarise for me as somebody not familiar with the glibc internals, what is the status of this bug, and in which cases am I safe to use a robust mutex in a shared memory setup? Thanks!
Comment 12 Carlos O'Donell 2016-12-16 01:25:55 UTC
(In reply to mail from comment #11)
> Could somebody summarise for me as somebody not familiar with the glibc
> internals, what is the status of this bug, and in which cases am I safe to
> use a robust mutex in a shared memory setup? Thanks!

The problem is with the reuse of memory and waiting for the dying threads to finish their cleanup of the robust mutex.

You must not unmap the memory until the dying threads have had a chance to do their cleanup.

Re-iterating what Rich says in comment #8:
https://sourceware.org/bugzilla/show_bug.cgi?id=14485#c8

(a) Use a distinct mapping for each user/thread that uses the shared-memory robust mutex and that also wants to unmap that memory. This way, after everyone who uses the robust mutex removes their mappings, the dying thread's mapping remains while that thread is being shut down and is removed last, after its own cleanup is done. Only after that point will anything reuse that memory.

(b) Use some kind of process-local synchronization (say, a semaphore) to wait for all users of the object to be done before you unmap the memory. The dying thread would count as a user, so you would not be able to unmap the memory yet, and that way it would have a chance to run its own robust mutex cleanup. You might also use pthread_tryjoin_np to see whether the dying thread has been cleaned up by the kernel yet, since that would indicate it was done and you could recover the use count.
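A rough sketch of option (b), assuming one process-local token per user of the mapping; the token scheme and all names are mine, not something glibc provides:

#include <pthread.h>
#include <semaphore.h>
#include <sys/mman.h>

static sem_t done_tokens;

void init_tokens(void)
{
    sem_init(&done_tokens, /* pshared = */ 0, 0);   /* process-local only */
}

/* Each user posts its token only after its last pthread_mutex_unlock() on
 * the robust mutex has returned, i.e. after the library has finished its
 * robust-list bookkeeping for that unlock. */
void user_finished(void)
{
    sem_post(&done_tokens);
}

/* The unmapping thread collects every user's token before reusing the
 * address range.  A user that died while holding the mutex never posts its
 * token; its token could instead be recovered once pthread_tryjoin_np()
 * reports that the kernel has fully reaped the thread. */
void wait_and_unmap(void *map, size_t len, int nusers)
{
    for (int i = 0; i < nusers; i++)
        sem_wait(&done_tokens);
    munmap(map, len);
}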
Comment 13 Torvald Riegel 2016-12-16 10:56:48 UTC
(In reply to Carlos O'Donell from comment #12)
> (In reply to mail from comment #11)
> > Could somebody summarise for me as somebody not familiar with the glibc
> > internals, what is the status of this bug, and in which cases am I safe to
> > use a robust mutex in a shared memory setup? Thanks!
> 
> The problem is with the reuse of memory and waiting for the dying threads to
> finish their cleanup of the robust mutex.
> 
> You must not unmap the memory until the dying threads have had a chance to
> do their cleanup.
> 
> Re-iterating what Rich says in comment #8:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14485#c8
> 
> (a) Use a distinct mapping for each user/thread that uses the shared-memory
> robust mutex and that also wants to unmap that memory. This way, after
> everyone who uses the robust mutex removes their mappings, the dying
> thread's mapping remains while that thread is being shut down and is
> removed last, after its own cleanup is done. Only after that point will
> anything reuse that memory.
> 
> (b) Use some kind of process-local synchronization (say, a semaphore) to
> wait for all users of the object to be done before you unmap the memory.
> The dying thread would count as a user, so you would not be able to unmap
> the memory yet, and that way it would have a chance to run its own robust
> mutex cleanup. You might also use pthread_tryjoin_np to see whether the
> dying thread has been cleaned up by the kernel yet, since that would
> indicate it was done and you could recover the use count.

Note that what Carlos said is a workaround for the problem that glibc robust mutexes do not currently satisfy the POSIX mutex destruction requirements.  I think glibc should eventually satisfy those requirements, which is a more general problem than just being related to unmapping memory.

Fixing this bug will require additional support in the kernel, so it will probably take some more time until we get there.