Bug 12307 - accessing thread local storage blocks forever when using dlopen
Status: REOPENED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.12
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2010-12-09 15:12 UTC by Maxim Egorushkin
Modified: 2017-08-29 20:26 UTC

fweimer: security-

Description Maxim Egorushkin 2010-12-09 15:12:52 UTC
Description of problem:
A new thread created by a shared library's initialization routine blocks forever
when accessing thread-local storage if the shared library is loaded at run time
using dlopen().

Version-Release number of selected component (if applicable):
$ rpm -q glibc
glibc-2.12.1-4.x86_64

How reproducible:
Always.

Steps to Reproduce:
$ cat shared.cc
#include <cassert>
#include <pthread.h>
#include <stdio.h>

namespace {

__thread void* some_value;

void* thread(void*)
{
    printf("thread enter\n");
    void* value = some_value; // hangs here forever
    printf("thread leave\n");
    return value;
}

struct X
{
    X()
    {
        pthread_t thread_id;
        int r = pthread_create(&thread_id, NULL, thread, NULL);
        assert(!r);
        void* value;
        r = pthread_join(thread_id, &value);
        assert(!r);
    }
} x;

}

$ g++ -Wall -shared -fpic -pthread -g -o shared.so shared.cc

$ cat loader.cc
#include <stdio.h>
#include <dlfcn.h>

char const shared_path[] = "./shared.so";

int main()
{
    printf("loading %s\n", shared_path);
    void* h = dlopen(shared_path, RTLD_NOW | RTLD_GLOBAL);
    printf("%s loaded at %p\n", shared_path, h);
    dlclose(h);
    printf("%s unloaded\n", shared_path);
}

$ g++ -Wall -pthread -ldl -g -o loader loader.cc
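
The loader above passes the result of dlopen() straight to dlclose() without checking it. A minimal sketch of a more defensive variant follows, assuming a hypothetical helper named load_and_unload that is not part of the original report. It only adds error reporting through dlerror() and is not expected to change the hang described here.

```cpp
#include <dlfcn.h>
#include <stdio.h>

// Hypothetical helper, not part of the original report.
// Returns true if the library was loaded and unloaded successfully.
bool load_and_unload(char const* path)
{
    printf("loading %s\n", path);
    void* h = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!h) {
        // dlerror() describes the most recent dlopen()/dlclose() failure.
        char const* msg = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", msg ? msg : "(unknown)");
        return false;
    }
    printf("%s loaded at %p\n", path, h);
    if (dlclose(h) != 0) {
        char const* msg = dlerror();
        fprintf(stderr, "dlclose failed: %s\n", msg ? msg : "(unknown)");
        return false;
    }
    printf("%s unloaded\n", path);
    return true;
}
```

Calling load_and_unload("./shared.so") from main() should behave like the loader in the report, while a missing or unloadable library is reported instead of passing a null handle to dlclose().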


Actual results:
$ ./loader
loading ./shared.so
thread enter
(the above hangs forever)
  C-c C-c

Expected results:
$ ./loader
loading ./shared.so
thread enter
thread leave
./shared.so loaded at 0x7ff97355eb18
./shared.so unloaded

Additional info:
When the executable links the shared library explicitly at link time, i.e.:

$ g++ -Wall -pthread -ldl -g -o loader -l:./shared.so loader.cc 

it works as expected.

P.S. This was originally filed against Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=661676
Comment 1 Ondrej Bilka 2013-10-13 08:57:59 UTC
Here I agree with Jakub that spawning threads in a constructor is stupid.
Comment 2 Maxim Egorushkin 2013-10-13 11:42:11 UTC
A C++ constructor is used here instead of a function marked with the GCC constructor attribute. See http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Function-Attributes.html#index-g_t_0040code_007bdestructor_007d-function-attribute-2594

In either case (C++ constructor or GCC constructor function), the code is executed when the shared library is loaded, which may be before main() or during main(). As far as I am aware, there are no documented restrictions on which functions cannot be called in either case (in the way that there are restrictions on what can be called from a signal handler).
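
To illustrate the two mechanisms, here is a minimal sketch in which both a C++ static constructor and a function marked with the GCC constructor attribute create a thread that reads a __thread variable. All names (tls_value, run_reader_thread, attribute_ctor and so on) are hypothetical. When this code is part of the main executable, both initializers are expected to finish before main() starts; the report concerns the same pattern inside a library loaded with dlopen().

```cpp
#include <pthread.h>

namespace {

__thread int tls_value = 42;   // each thread gets its own copy

// Thread body: read the thread-local variable and report it back.
void* reader(void* out)
{
    *static_cast<int*>(out) = tls_value;
    return NULL;
}

// Create a thread, wait for it, and return what it read (-1 on error).
int run_reader_thread()
{
    int seen = -1;
    pthread_t id;
    if (pthread_create(&id, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_join(id, NULL) != 0)
        return -1;
    return seen;
}

int seen_by_attribute = -1;
int seen_by_cxx_ctor = -1;

// GCC-specific: runs when the containing object is loaded.
__attribute__((constructor)) void attribute_ctor()
{
    seen_by_attribute = run_reader_thread();
}

// Standard C++: constructor of a namespace-scope object.
struct X
{
    X() { seen_by_cxx_ctor = run_reader_thread(); }
} x;

}
```

Built into an executable with -pthread, both variables should hold the thread-local value by the time main() runs.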

Dynamic linking is designed to be transparent, so I expect the same result whether ld.so loads the .so before main() is executed or the program explicitly loads it from main().

Robert Y. Liu reports that this code works as expected with glibc-2.5, but not with glibc-2.12, so it appears that a regression crept in somewhere between these two versions. I just checked it with Fedora 18 and glibc-2.16 and it is still broken.

I don't think "stupid" is a valid justification for closing this technical issue.
Comment 3 Rich Felker 2013-10-14 23:09:34 UTC
Indeed, regardless of whether it's "stupid", as far as I know there's no formal reason it's invalid to create threads from a constructor and the bug is valid.
Comment 4 Ondrej Bilka 2013-10-15 06:15:36 UTC
On Mon, Oct 14, 2013 at 11:09:34PM +0000, bugdal at aerifal dot cx wrote:
> --- Comment #3 from Rich Felker <bugdal at aerifal dot cx> ---
> Indeed, regardless of whether it's "stupid", as far as I know there's no formal
> reason it's invalid to create threads from a constructor and the bug is valid.
> 
And will you write a patch?
Comment 5 Maxim Egorushkin 2013-10-15 08:59:43 UTC
(In reply to Ondrej Bilka from comment #4)
> On Mon, Oct 14, 2013 at 11:09:34PM +0000, bugdal at aerifal dot cx wrote:
> > --- Comment #3 from Rich Felker <bugdal at aerifal dot cx> ---
> > Indeed, regardless of whether it's "stupid", as far as I know there's no formal
> > reason it's invalid to create threads from a constructor and the bug is valid.
> > 
> And will you write a patch?

I would suggest diagnosing before prescribing.

A good start would be:

1) Confirm that this code works as expected with glibc-2.5.
2) Do git bisect between glibc-2.5 and glibc-2.12 to find the commit that broke it.
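
Because the failure is a hang and not a crash, step 2 needs a test that always terminates. A minimal sketch of one way to get that, assuming a hypothetical helper run_with_timeout: it runs the candidate code in a forked child that arms alarm(), so a hung child is killed by SIGALRM and counted as a failure. A result of 0 can then be used as the "good" exit status and 1 as "bad", which is the convention git bisect run uses.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns 0 if fn() finishes within `seconds`,
// 1 if the child hung, crashed, or could not be started.
int run_with_timeout(void (*fn)(), unsigned seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        // Child: with the default disposition, SIGALRM terminates the
        // whole process even if every thread is blocked.
        signal(SIGALRM, SIG_DFL);
        alarm(seconds);
        fn();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

// Stand-ins for the real test body; a bisect script would call
// dlopen() on shared.so here instead.
void finishes_quickly() {}
void hangs_forever() { for (;;) pause(); }
```

A bisect driver could return run_with_timeout(...) from main() after building each candidate glibc and running the loader against it.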
Comment 6 Florian Weimer 2017-03-10 18:32:53 UTC
This is an indirect consequence of bug 15686 and perhaps bug 16133.

All TLS access needs to be async-signal-safe, which is incompatible with locking. This may require separate fixes, so I am leaving this bug open.
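
As a minimal sketch of that requirement, the following installs a signal handler whose only work is an access to a __thread variable. The names are hypothetical. In a main executable such an access is normally a direct offset from the thread pointer and takes no lock; the concern in this bug is the dynamically loaded case, where the first access in a thread may go through the dynamic linker's TLS allocation path.

```cpp
#include <signal.h>
#include <string.h>

namespace {

__thread volatile sig_atomic_t handler_runs = 0;

// Signal handler: its only work is a thread-local access.
void on_signal(int)
{
    handler_runs = handler_runs + 1;
}

// Install the handler, deliver SIGUSR1 to the calling thread, and
// return how many times the handler ran in this thread (-1 on error).
int deliver_and_count()
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    // raise() does not return until the handler has run.
    if (raise(SIGUSR1) != 0)
        return -1;
    return handler_runs;
}

}
```

If the TLS access inside on_signal had to take a lock already held by the interrupted code, the handler could never return, which is why locking and async-signal-safety do not mix.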