Bug 19448

Summary: deadlock in dlopen when ctor calls dlopen in another thread
Product: glibc
Component: dynamic-link
Version: 2.22
Status: RESOLVED DUPLICATE
Severity: normal
Priority: P2
Target Milestone: ---
Reporter: Szabolcs Nagy <nszabolcs>
Assignee: Not yet assigned to anyone <unassigned>
CC: carlos, fweimer
Flags: fweimer: security-

Description Szabolcs Nagy 2016-01-12 10:48:31 UTC
an internal lock is held in dlopen while user code is executed (ctors).

this means user code can deadlock dlopen, which is an observable break
in semantics if thread creation is allowed in ctors.

thread creation is needed for the deadlock because GL(dl_load_lock) is
a recursive lock.
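the recursive-lock behaviour can be sketched outside the dynamic linker. the snippet below is only a model: `load_lock` is a hypothetical stand-in for GL(dl_load_lock), built as an ordinary PTHREAD_MUTEX_RECURSIVE mutex, and the function names are made up for illustration. it shows that the owning thread can take the lock a second time, while a second thread cannot. the second thread uses pthread_mutex_trylock so that the demo reports EBUSY instead of hanging.

```c
#define _XOPEN_SOURCE 700  /* for PTHREAD_MUTEX_RECURSIVE under strict -std= modes */
#include <errno.h>
#include <pthread.h>

/* hypothetical stand-in for GL(dl_load_lock); the real lock is internal to glibc */
static pthread_mutex_t load_lock;

/* second thread: probe the lock with trylock so the demo cannot hang */
static void *try_from_other_thread(void *result)
{
	int rc = pthread_mutex_trylock(&load_lock);
	if (rc == 0)
		pthread_mutex_unlock(&load_lock);
	*(int *)result = rc;
	return 0;
}

/* returns the result of the nested lock in the owning thread and
   stores the other thread's trylock result in *other_rc */
static int recursive_lock_demo(int *other_rc)
{
	pthread_mutexattr_t attr;
	pthread_t td;
	int relock_rc;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	pthread_mutex_init(&load_lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&load_lock);             /* models the outer dlopen */
	relock_rc = pthread_mutex_lock(&load_lock); /* models a nested dlopen, same thread */

	pthread_create(&td, 0, try_from_other_thread, other_rc);
	pthread_join(td, 0); /* safe here only because the other thread never blocks */

	pthread_mutex_unlock(&load_lock);
	pthread_mutex_unlock(&load_lock);
	pthread_mutex_destroy(&load_lock);
	return relock_rc;
}
```

calling `recursive_lock_demo(&rc)` from a program linked with -pthread should return 0 and leave EBUSY in `rc`. with a blocking pthread_mutex_lock in the second thread, the pthread_join would never return, which is the shape of the deadlock reported here.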

// main.c
#include <dlfcn.h>
int main()
{
	dlopen("mod.so", RTLD_NOW); // lock is held during ctors
}

// mod.c
#include <dlfcn.h>
#include <pthread.h>
static void *start(void *a)
{
	dlopen("xxx", RTLD_NOW); // blocks: lock is held by the thread running the ctor
	return 0;
}
__attribute__((constructor)) static void foo(void)
{
	pthread_t td;
	pthread_create(&td, 0, start, 0);
	pthread_join(td, 0);  // main thread waits here
}
Comment 1 Carlos O'Donell 2016-01-13 03:05:18 UTC
This is another case of dlopen being synchronously reentered by another context of execution, but in this case there is no easy fix (like there was in the case of an interposed malloc that calls dlopen itself).

The constructor is essentially a foreign function being called while holding the load lock, and when you enter dlopen again from another thread that the constructor waits on, you get a deadlock.

Siddhesh and I talked about this at one point last year, and our consensus was that we can't fix this without rewriting the dynamic load routines to use atomic operations and removing all instances of the load lock. Doing so would simplify a lot of the other problems we have. We would also remove lazy TLS allocation, which would fix the AS-safety issue of malloc being called when a TLS variable is accessed for the first time in a signal handler.
Comment 2 Florian Weimer 2017-03-10 18:29:35 UTC
This is a symptom of bug 15686.

*** This bug has been marked as a duplicate of bug 15686 ***