This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: mmap'ed robust mutexes and possible undefined behaviour
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Marcos Dione <mdione at grulic dot org dot ar>, libc-help at sourceware dot org
- Date: Mon, 24 Nov 2014 16:48:32 -0500
- Subject: Re: mmap'ed robust mutexes and possible undefined behaviour
- References: <20141124203441 dot GA3759 at diablo dot grulicueva dot local>
On 11/24/2014 03:34 PM, Marcos Dione wrote:
> We found a situation where a robust mutex cannot be recovered
> from a stale lock and we're wondering if it's simply an undefined
> situation or a bug in the kernel. Attached you will find the sample
> code, which is loosely based on a glibc test case. The gist of it is as
> follows:
>
> 1. we open a file.
> 2. we mmap it and use that mem to store a robust mutex.
> 3. we lock the mutex.
> 4. we munmap the file.
> 5. we close the file.
Undefined behaviour.
This results in undefined behaviour since the allocated storage for
the mutex object has been lost. You need to keep that storage around
for the robust algorithms to work with. Once the mutex's state is gone,
neither glibc nor the kernel has anything left to operate on.
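
For contrast, here is a minimal sketch of the same five steps with the unlock
done while the mapping still exists. The helper name locked_roundtrip and the
file path are made up for illustration and are not taken from the attached
test case; error handling is kept short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the attached test case): create a robust,
   process-shared mutex inside a file mapping, lock it, and release it
   BEFORE the mapping goes away. Returns 0 on success, -1 on failure. */
static int locked_roundtrip(const char *path)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    int fd, rc, result = -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);      /* step 1 */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, sizeof(pthread_mutex_t)) != 0)
        goto out_close;

    m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);                            /* step 2 */
    if (m == MAP_FAILED)
        goto out_close;

    if (pthread_mutexattr_init(&attr) != 0)
        goto out_unmap;
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        goto out_unmap;

    if (pthread_mutex_lock(m) == 0) {                       /* step 3 */
        /* ... critical section ... */
        if (pthread_mutex_unlock(m) == 0)   /* release while still mapped */
            result = 0;
    }

out_unmap:
    munmap(m, sizeof *m);                                   /* step 4 */
out_close:
    close(fd);                                              /* step 5 */
    unlink(path);   /* the demo file is only scratch space */
    return result;
}
```

Calling locked_roundtrip("/tmp/robust-demo.bin") should return 0. The point
is only the ordering: the storage outlives every operation on the mutex.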
> The example does steps 1 and 2, then creates two children
> who will try to do steps 3 to 5. Of course only one gets the lock while
> the other waits. If the child who has the lock does the 4th step, then
> the other child never recovers the stale lock. In any other situation
> (that is, with that code commented out or removed) it works fine.
The other child will never wake up since you can't unlock the mutex.
You can't recover if you don't have the object any more.
You need the object to make it consistent and then use it again.
You've lost everything associated with the object unless you bring
back the mapping.
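
As a rough sketch of what recovery looks like when the object is still
mapped: a child takes the lock and dies holding it, and the parent, which
still has the mapping, gets EOWNERDEAD and marks the mutex consistent. The
helper name recover_from_dead_owner is hypothetical, and an anonymous shared
mapping stands in for the file mapping from your example to keep it short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the parent saw EOWNERDEAD and managed
   to make the mutex consistent again, -1 otherwise. */
static int recover_from_dead_owner(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    pid_t pid;
    int rc, result = -1;

    /* Assumption: anonymous shared memory instead of a mapped file. */
    m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    if (pthread_mutexattr_init(&attr) != 0)
        goto out;
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        goto out;

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Child: take the lock and exit without unlocking or unmapping. */
        pthread_mutex_lock(m);
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        goto out;

    /* Parent: the owner is dead, but the object is still there. */
    rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        if (pthread_mutex_consistent(m) == 0 && pthread_mutex_unlock(m) == 0)
            result = 0;
    } else if (rc == 0) {
        pthread_mutex_unlock(m);   /* unexpected here: no dead owner seen */
    }

out:
    munmap(m, sizeof *m);
    return result;
}
```

This works only because the child never unmapped the mutex before dying,
which is exactly the step your example adds.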
> This looks suspiciously like undefined behaviour, because it's like
> we're pulling the rug from under the mutex's feet, but on the other hand it
> looks like a kernel bug because it doesn't really recover from the
> situation. What do you think?
It is undefined behaviour. You've removed the memory required for the futex
operation, which needs a mapped-in page and address in order to operate.
Worse, you also destroyed the userspace memory registered with the kernel
via set_robust_list that allows the kernel to clean up the mutex if the thread
or process dies at an inopportune moment.
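
If you want to see that registration from userspace, one option is the
companion get_robust_list syscall. The sketch below is Linux-specific and
assumes the kernel UAPI header <linux/futex.h> is installed; glibc has no
wrapper for this call, so it goes through syscall(2). The helper name
query_robust_list is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel which robust list head is registered
   for the calling thread (pid 0 means "the caller").
   Returns 0 on success, -1 with errno set on failure. */
static int query_robust_list(struct robust_list_head **head, size_t *len)
{
    return (int)syscall(SYS_get_robust_list, 0, head, len);
}
```

On success, *head points at the per-thread list head in userspace memory that
the kernel walks when the thread exits, and *len is the size of that
structure.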
Does that answer your question?
I don't see how there is anything the kernel or glibc can do to prevent
the unmap from breaking all subsequent uses.
A historical note: In the past the robustness was maintained in the kernel
by extending the VMA to know if it had a robust list, and then at do_exit
the kernel could handle the cleanup, and in this case it could have handled
the cleanup during unmap in your use case. Unfortunately this was *terrible*
for performance and thus a pair of syscalls were created to manage the
robust lists in coordination with the kernel.
Cheers,
Carlos.