This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RFC: Deadlock in multithreaded application when doing IO and fork.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: Roland McGrath <roland at hack dot frob dot com>
- Date: Thu, 31 Jan 2013 18:02:38 -0500
- Subject: RFC: Deadlock in multithreaded application when doing IO and fork.
Community,
I've seen what I believe to be the following deadlock
scenario in a multithreaded application when doing
IO and forking.
It is safe to call fork in a multithreaded environment.
It is also safe to do IO in a multithreaded environment.
Doing both at the same time is supposed to be safe,
but is known to be dangerous if you don't know what
you're doing.
The following deadlock scenario looks like a bug in glibc.
Thread A:
* Calls fork() which runs pthread_atfork handlers.
* Malloc's registered handler runs and locks out all other
threads from doing allocations by causing them to block.
* Tries to take the IO list lock via _IO_list_lock()
-> _IO_lock_lock(list_all_lock);
Thread B:
* Calls _IO_flush_all()
-> _IO_flush_all_lockp (do_lock=1)
-> _IO_lock_lock (list_all_lock); ...
* Walks the list of open fp's and tries to take each
fp's lock and flush.
Thread C:
* Calls _IO_getdelim which takes only the fp lock
and then tries to call either malloc or realloc.
* Tries to take arena lock to allocate memory.
So in summary:
Thread A waits on thread B for the list_all_lock lock.
Thread B waits on thread C for the fp file lock.
Thread C waits on thread A for the arena lock.
The window for this to happen is small.
You need at least three threads.
One possible fix looks like this:
Thread A:
* Calls fork() which runs pthread_atfork handlers.
* Run malloc's registered handler *last*
* Malloc's handler does:
- Call _IO_list_lock() first.
- Lock all arenas last.
* Continue with fork processing...
Thread B:
...
Thread C:
...
The salient point is that the last thing we do
is lock list_all_lock and then lock the arenas.
In this case A and B each try to acquire list_all_lock
before locking the arenas, and that ensures that other
threads are able to make forward progress or are
blocked, but not deadlocked.
The wrinkle here is that once you take list_all_lock,
another thread could be inside malloc and trigger an
assertion or debug output, which will block and
deadlock fork(), e.g.

    T1                          T2
    fork
    take list_all_lock
                                calls malloc
                                takes arena lock
                                malloc aborts and tries to do IO,
                                or a user-defined malloc tries to do IO
                                blocks on list_all_lock
    blocks on arena lock
This seems like a better scenario than before. Now we only
deadlock in a failure scenario during a smaller window.
We could detect this in malloc by doing a trylock on the
list_all_lock and avoid the IO, continuing on to calling
abort().
In abort() we will flush all the IO *without* taking locks
(calls _IO_flush_all_lockp(do_lock=0)) since abort()
might be called from anywhere.
User malloc handlers will have to setup pthread_atfork
handlers to notify themselves of the upcoming fork and
that they should stop doing IO or risk inconsistent
state in the child and deadlock in the parent.
I don't see any serious performance arguments against
this fix; we are simply changing the order of the lock
acquisition. I might even argue that by moving the
arena locking last, we allow other threads to
make progress while we run our handlers.
There are some arguments that can be made for locking
arenas early in the process and how that impacts performance
for the forking thread (but deteriorates it for all other
threads).
Comments?
Cheers,
Carlos.