Bug 21422 - Deadlock in 2.25 pthread_cond_broadcast after process abort
Summary: Deadlock in 2.25 pthread_cond_broadcast after process abort
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.25
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-23 19:43 UTC by Michael Krasnyk
Modified: 2017-12-25 17:01 UTC
CC: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test for reproduction (1017 bytes, text/x-csrc)
2017-04-23 19:43 UTC, Michael Krasnyk

Description Michael Krasnyk 2017-04-23 19:43:10 UTC
Created attachment 10015: Test for reproduction

Hi, in glibc 2.25 the attached test hangs in pthread_cond_broadcast. The test code creates a shared mmapped file /tmp/test.mmap containing a process-shared mutex and a condition variable. If the waiting process is aborted and restarted, the signaling process hangs in pthread_cond_broadcast.

Correct behavior:
./a.out 1
./a.out 2 &  ( sleep 1 ; ./a.out 3 )

With glibc 2.25 the "./a.out 3" process on the third line hangs; with 2.24 it works:
./a.out 1
./a.out 2 &  p=$! ; ( sleep 1 ; kill -SIGABRT $p ; ./a.out 3 )
./a.out 2 &  ( sleep 1 ; ./a.out 3 )
Comment 1 Dimitri Staessens 2017-04-24 16:28:20 UTC
Hi, 

I'm following the nptl bugs since we found some apparent ones ourselves in 2.25.

I looked at your code and it could be you are depending on undefined behaviour. When you abort and restart the process, shouldn't you use a robust mutex? Can you try that?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html

and check whether the calls return EOWNERDEAD or ENOTRECOVERABLE.

cheers,

Dimitri
Comment 2 Michael Krasnyk 2017-04-24 17:08:12 UTC
Hi Dimitri,
I have tried the same test with the additional line
pthread_mutexattr_setrobust(&m_attr, PTHREAD_MUTEX_ROBUST);
but the result is the same. The aborted process waits not in pthread_mutex_lock
but in pthread_cond_wait, so the mutex is not locked by the process.
Also the hang occurs not in waiting for a signal, but in sending one by pthread_cond_signal or pthread_cond_broadcast.

Regards,
Michael
Comment 3 Dimitri Staessens 2017-04-24 17:41:16 UTC
Hi Michael,

thanks for trying. Just wanted to get that angle covered.

I can confirm I'm seeing the same behaviour on my system. It works in 2.24; on 2.25 the sending process hangs in pthread_cond_broadcast. However, when compiling a debug build (gcc -g3 -lpthread test.c -o test), it did work. Does a debug build work on your end as well?

I hope the devs can do something with this.

Thanks for flagging this, so we know we don't have to look for bugs in our code when we see this one pop up.

cheers,

Dimitri
Comment 4 Dimitri Staessens 2017-04-24 18:05:55 UTC
It has nothing to do with the debug build. It just doesn't always happen. I'm not sure it has anything to do with the SIGABRT either. The pthread_cond_broadcast just seems to hang randomly.

This is some gdb output:
0x00007f8fdcdd2477 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007f8fdcdd2477 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x000000000040103b in main (argc=2, argv=0x7ffc7507baf8) at test.c:137
(gdb) frame 1
#1  0x000000000040103b in main (argc=2, argv=0x7ffc7507baf8) at test.c:137
137           pthread_cond_broadcast(&block->condition);
(gdb) print block.condition
$1 = {__data = {{__wseq = 9, __wseq32 = {__low = 9, __high = 0}}, {__g1_start = 1, __g1_start32 = {__low = 1, __high = 0}}, __g_refs = {5, 2}, __g_size = {0, 0}, __g1_orig_size = 13, __wrefs = 25, __g_signals = {5, 0}}, 
  __size = "\t\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\005\000\000\000\002", '\000' <repeats 11 times>, "\r\000\000\000\031\000\000\000\005\000\000\000\000\000\000", __align = 9}
(gdb) 


cheers,

Dimitri
Comment 5 Torvald Riegel 2017-04-25 11:33:56 UTC
In general, if you kill processes that may be executing blocking synchronization, then this may block other processes that the former processes were synchronizing with.  For example, non-robust mutexes are blocking synchronization; if you happen to kill a process that has acquired a lock, no other process can subsequently acquire the lock.  Robust mutexes work around that by employing something like a failure detector.

Non-blocking synchronization mechanisms such as C11 atomics that are lock-free do continue to work if one of the processes stops, or is killed. (However, please treat this statement as just an illustrating example; killing a C11 process is kind of a grey zone regarding the standard's forward progress requirements.)

Condition variables are not required to perform non-blocking synchronization only.  POSIX (and glibc) do not specify (or provide) robust variants of condition variables either.  Therefore, this is not a bug.

I don't know what you want to do, but one approach would be to catch signals and use POSIX cancellation to cancel the pthread_cond_wait; another option might be to use nonblocking synchronization such as C11 atomics.
Comment 6 Dimitri Staessens 2017-04-25 13:38:17 UTC
Hi Torvald,

thank you for the insights. I have a question, though: 

POSIX does define pthread_cond_wait() as a cancellation point. So, if the calling thread is cancelled, shouldn't the condition variable handle this correctly? I also observe the process sending the pthread_cond_broadcast() lock up if the process blocking on the condvar was interrupted with SIGINT, and another process is started, for instance.

I do agree with you that a SIGKILL during the pthread_cond_wait() can result in undefined behaviour since there are no robust condvars.

thanks again,

Dimitri
Comment 7 Florian Weimer 2017-04-25 13:42:16 UTC
(In reply to Dimitri Staessens from comment #6)
> Hi Torvald,
> 
> thank you for the insights. I have a question, though: 
> 
> POSIX does define pthread_cond_wait() as a cancellation point. So, if the
> calling thread is cancelled, shouldn't the condition variable handle this
> correctly?

If you use pthread_cancel, then yes, it should work.  If the thread is terminated by other means, the condition variable may end up in an inconsistent state.

> I also observe the process sending the pthread_cond_broadcast()
> lock up if the process blocking on the condvar was interrupted with SIGINT,
> and another process is started, for instance.

What happens due to the SIGINT?  Does a futex system call return EINTR or something like that?
Comment 8 Dimitri Staessens 2017-04-25 14:36:18 UTC
I don't know exactly how SIGINT is implemented on Linux. But I now fully agree with you, this is not a bug but undefined behaviour.

Thanks for clearing it up for me!

cheers,

Dimitri
Comment 9 Michael Krasnyk 2017-04-25 16:05:55 UTC
Thanks for the discussion!

The original issue is https://github.com/Project-OSRM/osrm-backend/issues/3911 
Unfortunately shared condition variables may indeed end up in an inconsistent state, and pthread_cancel will not work because the OOM killer may deliver SIGKILL.

Mainly, the report asks for clarification: is this an undefined-behaviour change from 2.24 to 2.25, or a regression?
Without robust condition variables it will not be possible to implement correct IPC signaling.

Regards,
Michael
Comment 10 Carlos O'Donell 2017-04-25 23:41:16 UTC
Also note that your specific example is not allowed by POSIX. You cannot serialize the mutex state to disk and reload it. A glibc upgrade could change the internal structure of the type, and the second run of the application would then be unable to use the structure correctly. I've written a blog post about the problems inherent here when background upgrades are allowed:
https://developers.redhat.com/blog/2017/03/13/cc-library-upgrades-and-opaque-data-types-in-process-shared-memory/
Comment 11 Torvald Riegel 2017-04-25 23:47:32 UTC
(In reply to Michael Krasnyk from comment #9)
> Thanks for the discussion!
> 
> The original issue is
> https://github.com/Project-OSRM/osrm-backend/issues/3911 
> Unfortunately shared condition variables may indeed end up in some
> inconsistent state and pthread_cancel will not work due to possible OOM
> SIGKILL signals.

If some of the synchronizing processes can just be killed, then blocking synchronization in general will be a problem for you.  As I mentioned earlier, robust mutexes are different, but they had glibc implementation bugs before glibc 2.25 and still have a bug in current Linux kernels.

If you want to avoid busy waiting when using the atomics, perhaps try combining them with futexes to build exactly the synchronization mechanism you need.  I don't think that futexes explicitly promise to be nonblocking, but I believe they are in practice (eg, killing a process that's in a futex-wait call shouldn't block subsequent futex-wake calls).

> Mainly the report is about clarification if it is an undefined behavior
> change from 2.24 to 2.25 or a regression.

I'd like to stress again that this applies to everything that does potentially blocking synchronization (eg, semaphores).  How likely you are to run into these problems in practice will vary, but you can't really rely on them not happening.

> Without robust conditional variables it will not be possible to make correct
> IPC signaling.

You can with nonblocking atomics, but I agree that this is harder.
Comment 12 Florian Schmidt 2017-12-25 17:01:52 UTC
I also stumbled upon this, and missed this report before reporting to my distribution: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884776

I changed our communication middleware to use futex syscalls on Linux.

This lib is used cross-platform. I already had my own (hacky) condition variable impl for win32 and vxworks. The QNX pshared condvar worked as expected out of the box (they don't mention robust variants at all...).

I did have some strange behaviour with the old pshared condvar on Linux (I think it was related to pthread_cond_destroy()'ing them...),

so our code base grew another special case for signaling waiters. I like the futex-based special case, and I hope it behaves much better than the old impl.