This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] nptl: Fix deadlock on atfork handler which calls dlclose (BZ#24595)



On 23/05/2019 11:36, Carlos O'Donell wrote:
> On 5/23/19 8:30 AM, Adhemerval Zanella wrote:
>> Some real-world cases rely upon that atfork handlers can call functions
>> that might change the atfork handlers, such as dlclose.  Since 27761a10
>> (Refactor atfork handlers), all internal atfork handlers access is
>> protected with a simple lock, not allowing reentrancy.  This leads to
>> deadlocks for the aforementioned scenario. Although this specific usage
>> is far from portable (as comment #2 in the bug report), glibc did allow
>> it.
>>
>> This patch fixes by using a recursive lock along with a double-linked
>> list to hold the atfork handlers.  It allows atfork handlers to remove
>> elements through dlclose without invaliding the forward or backward
>> walking when issuing the atfork handlers.
>>
>> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
>> powerpc-linux-gnu, and aarch64-linux-gnu.
> 
> Florian and I talked about this bug this morning, and I agreed with him
> that we really need to get away from running foreign funcions with locks
> held. Such solutions always lead to problems. If we are going to refactor
> this isn't there a way we can do so that runs the foreign function without
> holding the list lock?
> 
> Your solution looks correct to me, it's an incremental improvement that
> fixes the underlying issue by allowing the lock to be recursive, so it
> feels like a good stop gap. I'm just wondering out loud if we can do better
> without too much more work.
> 
> For example look at stdlib/exit.c where we unlock the list lock before
> running the handlers. We make a copy of the data we need and run the 
> function. Granted in this case it's only valid for one thread to ever be
> calling _exit(), so we have it easier. In the atfork case we can have many
> concurrent fork() calls in flight. Can we assume that the user is going
> to coordinate their own dlclose vs. library-atfork-handler call?

We can go back to old behaviour of using a lock-free registration on the
atfork lists, make a lock-free copy on fork, and handle the synchronization
with __unregister_atfork using locks plus futexes.  It would require all the
old complexity of mixing lock-free algorithms with locks accesses plus using 
an unbounded alloca on fork (maybe we could just use malloc to allocate
the backup list, my understanding it is avoid to to add another issue to
make fork async-signal-safe).

I still think the old solution is over-complex and the issue lays more
in the fact atfork handlers are under-specified under POSIX. My understanding
is also this problem is also out of the scope of POSIX, so I think it should
be an implementation detail to specify what exactly atfork handlers are
allowed to call.

Now, back to your question I think once user start to use dlclose on
library-atfork-handler I support they know are they are doing (even though
this is far from portable is not a bad design imho). Maybe we can check
exactly what user in trying on the pcks11 module and work towards a more
sane solution.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]