pthread_mutex_lock hang during tls_get_addr_tail()

Paul Smith paul@mad-scientist.net
Sat Sep 10 20:33:00 GMT 2016


On Sat, 2016-09-10 at 11:00 -0700, Paul Pluzhnikov wrote:
> On Sat, Sep 10, 2016 at 10:17 AM, Paul Smith <paul@mad-scientist.net> wrote:
> > Thread 21 (Thread 0x7f0061c53700 (LWP 5295)):
> > #0  0x0000003f3e4094d1 in pthread_mutex_lock () from /lib64/libpthread.so.0
> > #1  0x0000003f3dc110f7 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
> > #2  0x0000003f3dc11500 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
> > #3  0x00007f0059679b9c in __cxa_get_globals () from /usr/local/lib64/libmylib.so
> > #4  0x00007f0058cc4c47 in UncaughtExceptionCounter::getUncaughtExceptionCount (this=0x7f0061c50ce4)
> >    ...
> >
> > I looked at the implementation of __cxa_get_globals() and it only
> > returns the address of a static __thread variable:
> >
> >   abi::__cxa_eh_globals*
> >   get_global() _GLIBCXX_NOTHROW
> >   {
> >     static __thread abi::__cxa_eh_globals global;
> >     return &global;
> >   }
> >
> >   extern "C" __cxa_eh_globals* __cxxabiv1::__cxa_get_globals() _GLIBCXX_NOTHROW
> >   { return get_global(); }
> >
> > More details: this environment is actually using a Java 1.8 JVM which is
> > loading my .so and using JNI to access it.  The hang doesn't happen on
> > the first call to these functions, but it happens "pretty soon".
> 
> I believe you are looking at
> https://sourceware.org/bugzilla/show_bug.cgi?id=16133
> Our attempts to fix it have been reverted :-(

Hm.  That does look somewhat suspicious.  I'm not sure I understand the
situation or implications completely though.  Surely we're not saying
that it's not possible to have reliable exception handling in a
dlopen()'d C++ .so if you're using glibc (because of its use of
__thread variables)?  Even if loaded in a JVM?

As far as I'm aware, no signals are being caught in the thread where
this hang occurs.  Certainly the stacktrace where the hang occurs is not
within a signal handler; it's not even in an exception catch block: it's
just in a local variable destructor.  I'm not sure why this relatively
straightforward use of a __thread variable would fall afoul of the issue
raised in the linked bug.
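For reference, here is a minimal, self-contained sketch of the same pattern. The names `PerThreadState`, `get_state()` and `addresses_from_threads()` are hypothetical, not from libstdc++. It is a function that returns the address of a function-local `static __thread` object, so each live thread gets its own instance. As far as I understand it, when code like this is compiled with -fPIC into a shared library, GCC on x86-64 by default reaches the variable through `__tls_get_addr()` (the general/local-dynamic TLS models), which is the call in frame #2 above. Linked into an executable, it would normally use a cheaper model that needs no call.

```cpp
// Minimal sketch (hypothetical names): same shape as libstdc++'s get_global(),
// a function returning the address of a function-local static __thread object.
// __thread is a GCC/Clang extension.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for abi::__cxa_eh_globals: a plain struct, so the
// __thread object is zero-initialized and needs no constructor/destructor.
struct PerThreadState {
  void* caught_exceptions;
  unsigned int uncaught_exceptions;
};

PerThreadState* get_state() noexcept {
  static __thread PerThreadState state;  // one instance per thread
  return &state;
}

// Starts n threads and records the address get_state() returns in each.
// Every thread stays alive until all have recorded their address, so a
// finished thread's TLS block cannot be reused by a later thread.
std::vector<std::uintptr_t> addresses_from_threads(int n) {
  assert(n > 0);
  std::vector<std::uintptr_t> out(static_cast<std::size_t>(n));
  std::atomic<int> arrived{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&out, &arrived, i, n] {
      out[i] = reinterpret_cast<std::uintptr_t>(get_state());
      arrived.fetch_add(1);
      while (arrived.load() < n) std::this_thread::yield();
    });
  }
  for (auto& t : threads) t.join();
  return out;
}
```

Calling `addresses_from_threads(4)` should give four distinct addresses, none equal to the address `get_state()` returns in the calling thread.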

I guess I should also point out (I didn't know this might make a
difference) that my shared library is statically linking jemalloc as a
replacement heap allocation library.  All of my code, including
jemalloc, is compiled and linked with -fvisibility=hidden and only a few
symbols are marked as having default visibility.

Is there something I can do in the way I link my .so that will work
around this problem?  If I use LD_PRELOAD to load my library when
starting the JVM could that make a difference?
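My understanding is that glibc handles TLS differently depending on when an object is loaded. An object loaded at process startup (which is what LD_PRELOAD would do) gets space in each thread's static TLS block. A dlopen()'d object's TLS block is normally allocated lazily on a thread's first access, through the `tls_get_addr_tail()` path seen in frame #1. As a first diagnostic, a sketch like the following lists the loaded objects that carry a `PT_TLS` segment, and the size of each TLS block. The helper names `TlsObject`, `collect_tls()` and `loaded_objects_with_tls()` are hypothetical. `dl_iterate_phdr()` is a GNU extension declared in `<link.h>`.

```cpp
// Sketch (hypothetical helper names): list the loaded ELF objects that have a
// PT_TLS program header, i.e. their own thread-local storage block.
// g++ defines _GNU_SOURCE by default, so <link.h> declares dl_iterate_phdr().
#include <link.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct TlsObject {
  std::string name;       // dlpi_name; glibc reports "" for the main program
  std::size_t tls_memsz;  // p_memsz of the object's PT_TLS segment
};

// Callback for dl_iterate_phdr(): invoked once per loaded object.
static int collect_tls(struct dl_phdr_info* info, std::size_t /*size*/,
                       void* data) {
  auto* out = static_cast<std::vector<TlsObject>*>(data);
  assert(out != nullptr);
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr)& ph = info->dlpi_phdr[i];
    if (ph.p_type == PT_TLS) {
      out->push_back({info->dlpi_name ? info->dlpi_name : "",
                      static_cast<std::size_t>(ph.p_memsz)});
      break;  // stop at the first PT_TLS segment of this object
    }
  }
  return 0;  // returning 0 continues the iteration
}

std::vector<TlsObject> loaded_objects_with_tls() {
  std::vector<TlsObject> out;
  dl_iterate_phdr(collect_tls, &out);
  return out;
}
```

Called from inside the JVM process (for example from a JNI entry point), the list should include libmylib.so if, as the backtrace suggests, it carries its own TLS.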


