This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] nptl: Fix deadlock on atfork handler which calls dlclose (BZ#24595)
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org, Florian Weimer <fweimer at redhat dot com>
- Date: Thu, 23 May 2019 09:36:08 -0500
- Subject: Re: [PATCH] nptl: Fix deadlock on atfork handler which calls dlclose (BZ#24595)
- References: <20190523133048.14922-1-adhemerval.zanella@linaro.org>
On 5/23/19 8:30 AM, Adhemerval Zanella wrote:
> Some real-world cases rely upon that atfork handlers can call functions
> that might change the atfork handlers, such as dlclose. Since 27761a10
> (Refactor atfork handlers), all internal atfork handlers access is
> protected with a simple lock, not allowing reentrancy. This leads to
> deadlocks for the aforementioned scenario. Although this specific usage
> is far from portable (as comment #2 in the bug report), glibc did allow
> it.
>
> This patch fixes by using a recursive lock along with a double-linked
> list to hold the atfork handlers. It allows atfork handlers to remove
> elements through dlclose without invaliding the forward or backward
> walking when issuing the atfork handlers.
>
> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
> powerpc-linux-gnu, and aarch64-linux-gnu.
Florian and I talked about this bug this morning, and I agreed with him
that we really need to get away from running foreign funcions with locks
held. Such solutions always lead to problems. If we are going to refactor
this isn't there a way we can do so that runs the foreign function without
holding the list lock?
Your solution looks correct to me, it's an incremental improvement that
fixes the underlying issue by allowing the lock to be recursive, so it
feels like a good stop gap. I'm just wondering out loud if we can do better
without too much more work.
For example look at stdlib/exit.c where we unlock the list lock before
running the handlers. We make a copy of the data we need and run the
function. Granted in this case it's only valid for one thread to ever be
calling _exit(), so we have it easier. In the atfork case we can have many
concurrent fork() calls in flight. Can we assume that the user is going
to coordinate their own dlclose vs. library-atfork-handler call?
Thoughts?
--
Cheers,
Carlos.