C++11 introduces the thread_local keyword for variables that have thread storage duration. These variables are similar to the TLS variables declared with __thread, with a few added features. A key feature is support for non-trivial constructors and destructors, which are called on first use and at thread exit respectively. Additionally, if a thread causes the process to exit, the destructors for thread_local variables should be called for that thread as well.
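As an illustration of the lifetime rules above, here is a minimal sketch in standard C++. The names Tracker, tracker, run_worker and check_lifetime are made up for this example. It uses a block-scope thread_local so that construction happens the first time a thread reaches the declaration, and it assumes an implementation (such as glibc with libstdc++) where the thread_local destructors have finished by the time join() returns.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counters that let us observe construction and destruction from outside.
std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

struct Tracker {
    Tracker()  { ++constructed; }   // non-trivial constructor
    ~Tracker() { ++destroyed; }     // non-trivial destructor
};

// One Tracker per thread, constructed the first time that thread calls this.
Tracker& tracker() {
    thread_local Tracker instance;
    return instance;
}

// Starts a thread that optionally touches the variable, then waits for it.
void run_worker(bool use_variable) {
    std::thread t([use_variable] {
        if (use_variable) {
            tracker();   // first use in this thread: constructor runs
            tracker();   // later uses see the same object
        }
    });
    t.join();            // the thread has exited, so its destructor has run
}

// A thread that never uses the variable neither constructs nor destroys it.
void check_lifetime() {
    const int c0 = constructed, d0 = destroyed;
    run_worker(false);
    assert(constructed == c0 && destroyed == d0);
    run_worker(true);
    assert(constructed == c0 + 1 && destroyed == d0 + 1);
}
```

Calling check_lifetime() from main is enough to exercise both cases; build with thread support enabled (for example -pthread with GCC).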
The compiler can handle constructors, but destructors need more runtime context. A dynamically loaded library could define and construct a thread_local variable and then be dlclose()'d before thread exit. The destructor of this variable would then have the rug pulled from under it and crash. The dynamic linker therefore needs to be informed so that it does not unload the DSO.
To implement this support, glibc defines __cxa_thread_atexit_impl exclusively for use by libstdc++ (whose __cxa_thread_atexit wraps it). It registers destructors for thread_local variables in a list. Upon thread or process exit, the destructors are called in the reverse of the order in which they were added. The function is declared as:
int __cxa_thread_atexit_impl (void (*dtor) (void *), void *obj, void *dso_symbol);
dtor is the destructor to call,
obj is the object to pass as the first argument of dtor,
dso_symbol is an object defined within the DSO. This is usually the __dso_handle symbol linked in from crtbegin.o into every DSO.
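The reverse-order rule can be observed from ordinary C++ without calling __cxa_thread_atexit_impl directly. The following is a small sketch, not glibc code; Named, first, second, run_and_collect and check_reverse_order are names invented for the example. Each destructor appends its name to a shared log, so the log shows the order in which the registered destructors ran.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

std::mutex log_mutex;
std::vector<std::string> destruction_log;

struct Named {
    std::string name;
    explicit Named(std::string n) : name(std::move(n)) {}
    ~Named() {
        // Record which object is being destroyed.
        std::lock_guard<std::mutex> lock(log_mutex);
        destruction_log.push_back(name);
    }
};

// Two independent thread_local objects, each constructed on first call.
Named& first()  { thread_local Named n("first");  return n; }
Named& second() { thread_local Named n("second"); return n; }

// Constructs "first" then "second" in a new thread and returns the
// order in which their destructors ran at thread exit.
std::vector<std::string> run_and_collect() {
    {
        std::lock_guard<std::mutex> lock(log_mutex);
        destruction_log.clear();
    }
    std::thread t([] {
        first();    // destructor registered first
        second();   // destructor registered second
    });
    t.join();
    std::lock_guard<std::mutex> lock(log_mutex);
    return destruction_log;
}

// The object constructed last is destroyed first.
void check_reverse_order() {
    const std::vector<std::string> log = run_and_collect();
    assert(log.size() == 2);
    assert(log[0] == "second");
    assert(log[1] == "first");
}
```

With libstdc++ on glibc, each first call above is where the runtime registers the destructor, passing the object's address as obj.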
__cxa_thread_atexit_impl is called by libstdc++ on object construction. In this function, the DSO in which the dso_symbol is defined is marked as DF_1_NODELETE so that the DSO is not unloaded on dlclose. A reference counter is maintained for each DSO to keep track of the thread_local objects constructed or destroyed. That way, when all thread_local objects defined in a DSO are destroyed, the DF_1_NODELETE flag is cleared so that a subsequent dlclose unloads the DSO.
The list of destructors of thread_local variables for the thread is maintained in a TLS variable (declared with the __thread keyword). The destructors are called before pthread thread-specific data is destroyed. As a result, destructors for pthread thread-specific data cannot use thread_local variables.
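The ordering between the two kinds of destructors can be sketched as follows. This is an illustrative test, not part of glibc: Local, record, key_destructor and observe_teardown_order are hypothetical names, and the order it reports is the glibc behaviour described above, which other C libraries need not share.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <string>
#include <thread>
#include <vector>

std::mutex order_mutex;
std::vector<std::string> teardown_order;

void record(const char* what) {
    std::lock_guard<std::mutex> lock(order_mutex);
    teardown_order.push_back(what);
}

struct Local {
    Local() {}                                   // forces dynamic initialization
    ~Local() { record("thread_local destructor"); }
};

// Destructor for the pthread key; it runs only for a non-null stored value.
void key_destructor(void*) { record("pthread key destructor"); }

// Runs a thread that uses both mechanisms and returns the teardown order.
std::vector<std::string> observe_teardown_order() {
    {
        std::lock_guard<std::mutex> lock(order_mutex);
        teardown_order.clear();
    }
    pthread_key_t key;
    int rc = pthread_key_create(&key, key_destructor);
    assert(rc == 0);
    (void)rc;

    std::thread t([key] {
        static int dummy;
        pthread_setspecific(key, &dummy);   // arm the key destructor
        thread_local Local local;           // register the thread_local destructor
        (void)local;
    });
    t.join();
    pthread_key_delete(key);

    std::lock_guard<std::mutex> lock(order_mutex);
    return teardown_order;
}
```

On glibc the returned vector is expected to list the thread_local destructor before the pthread key destructor, which is why a key destructor that touches a thread_local variable would be using an object that is already gone.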
As of commit 90b37cac8b5a3e1548c29d91e3e0bff1014d2e5c, we track the number of thread_local objects that reference the DSO, and when that count reaches zero we allow the DSO to be unloaded. This change allows the removal of locking dl_load_lock during destructor execution, since the operations are now done atomically. Detailed concurrency notes were added with this patch.
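The counting scheme can be pictured with a toy model. This is only a sketch of the idea and not glibc's implementation: DsoModel, DtorEntry and ThreadDtorList are hypothetical types, and the memory orders shown are one plausible choice rather than a statement of what glibc uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-DSO bookkeeping.
struct DsoModel {
    std::atomic<std::size_t> live_tls_objects{0};

    // The DSO may be unloaded only when no thread_local object refers to it.
    bool can_unload() const {
        return live_tls_objects.load(std::memory_order_acquire) == 0;
    }
};

struct DtorEntry {
    void (*dtor)(void*);
    void* obj;
    DsoModel* dso;
};

// Models the per-thread destructor list.
class ThreadDtorList {
public:
    // Called when a thread_local object is constructed.
    void register_dtor(void (*dtor)(void*), void* obj, DsoModel& dso) {
        assert(dtor != nullptr);
        dso.live_tls_objects.fetch_add(1, std::memory_order_relaxed);
        entries_.push_back(DtorEntry{dtor, obj, &dso});
    }

    // Called at thread exit: run destructors in reverse registration order
    // and drop one reference on the DSO after each destructor returns.
    void run_all() {
        while (!entries_.empty()) {
            DtorEntry e = entries_.back();
            entries_.pop_back();
            e.dtor(e.obj);
            e.dso->live_tls_objects.fetch_sub(1, std::memory_order_release);
        }
    }

private:
    std::vector<DtorEntry> entries_;
};
```

Because the count is decremented only after the destructor has returned, a model dlclose that checks can_unload() never removes code that a pending destructor still needs, and no lock has to be held while the destructors run.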