pthread_mutex_lock hang during tls_get_addr_tail()

Paul Pluzhnikov ppluzhnikov@google.com
Sat Sep 10 18:00:00 GMT 2016


On Sat, Sep 10, 2016 at 10:17 AM, Paul Smith <paul@mad-scientist.net> wrote:
>
> Hi all. I have a weird issue and I wanted to see if anyone has any
> thoughts.
>
> On all the systems I've tried my code with it works fine (this is an
> extensively tested codebase).  However, one of my users is using CentOS
> 6.5 with glibc 2.12-1.166.el6.x86_64 installed, and they are seeing a
> hang in pthread_mutex_lock() during a call to __tls_get_addr().
>
> Specifically, I have a shared library written in C++ (GCC 4.9.2) and the
> call is from libstdc++'s __cxa_get_globals() function.  Here's a
> stacktrace:
>
> Thread 21 (Thread 0x7f0061c53700 (LWP 5295)):
> #0  0x0000003f3e4094d1 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1  0x0000003f3dc110f7 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
> #2  0x0000003f3dc11500 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
> #3  0x00007f0059679b9c in __cxa_get_globals () from /usr/local/lib64/libmylib.so
> #4  0x00007f0058cc4c47 in UncaughtExceptionCounter::getUncaughtExceptionCount (this=0x7f0061c50ce4)
>    ...
>
> I looked at the implementation of __cxa_get_globals() and it only
> returns the address of a static __thread variable:
>
>   abi::__cxa_eh_globals*
>   get_global() _GLIBCXX_NOTHROW
>   {
>     static __thread abi::__cxa_eh_globals global;
>     return &global;
>   }
>
>   extern "C" __cxa_eh_globals*
>   __cxxabiv1::__cxa_get_globals() _GLIBCXX_NOTHROW
>   { return get_global(); }
>
> More details: this environment is actually using a Java 1.8 JVM which is
> loading my .so and using JNI to access it.  The hang doesn't happen on
> the first call to these functions, but it happens "pretty soon".
>
> I've loaded a CentOS 6.5 system in a QEMU VM and tried to reproduce it
> with the default glibc there (2.12-1.132) and can't reproduce the hang.
> I also upgraded to the latest 6.5 glibc (2.12-1.192) and can't
> reproduce it there either.  I can't find this exact RPM (1.166) so I
> can't test that, so I'm not even sure if it's really a glibc issue or
> not.
>
> I guess what I'm wondering is whether the above stack trace and info ring
> any bells with anyone or suggest other places to look.  I'm severely
> hampered by not being able to repro the problem myself but my user can
> do it on their system (which I don't have access to) within a minute or
> two, every time.

I believe you are hitting
https://sourceware.org/bugzilla/show_bug.cgi?id=16133
Our attempts to fix it have been reverted :-(

No idea why you only see this on CentOS.

-- 
Paul Pluzhnikov


