This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: pthread_mutex_lock hang during tls_get_addr_tail()
- From: Paul Pluzhnikov <ppluzhnikov at google dot com>
- To: paul at mad-scientist dot net
- Cc: "libc-help at sourceware dot org" <libc-help at sourceware dot org>
- Date: Sat, 10 Sep 2016 11:00:07 -0700
- Subject: Re: pthread_mutex_lock hang during tls_get_addr_tail()
- References: <1473527863.15006.119.camel@mad-scientist.net>
On Sat, Sep 10, 2016 at 10:17 AM, Paul Smith <paul@mad-scientist.net> wrote:
>
> Hi all. I have a weird issue and I wanted to see if anyone has any
> thoughts.
>
> On all the systems I've tried my code with it works fine (this is an
> extensively tested codebase). However, one of my users is using CentOS
> 6.5 with glibc 2.12-1.166.el6.x86_64 installed, and they are seeing a
> hang in pthread_mutex_lock() during a call to __tls_get_addr().
>
> Specifically, I have a shared library written in C++ (GCC 4.9.2) and the
> call is from the STL's __cxa_get_globals() function. Here's a
> stacktrace:
>
> Thread 21 (Thread 0x7f0061c53700 (LWP 5295)):
> #0 0x0000003f3e4094d1 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1 0x0000003f3dc110f7 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
> #2 0x0000003f3dc11500 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
> #3 0x00007f0059679b9c in __cxa_get_globals () from /usr/local/lib64/libmylib.so
> #4 0x00007f0058cc4c47 in UncaughtExceptionCounter::getUncaughtExceptionCount (this=0x7f0061c50ce4)
> ...
>
> I looked at the implementation of __cxa_get_globals() and it only
> returns the address of a static __thread variable:
>
>   get_global() _GLIBCXX_NOTHROW
>   {
>     static __thread abi::__cxa_eh_globals global;
>     return &global;
>   }
>
> extern "C" __cxa_eh_globals*
> __cxxabiv1::__cxa_get_globals() _GLIBCXX_NOTHROW
> { return get_global(); }
>
> More details: this environment is actually using a Java 1.8 JVM which is
> loading my .so and using JNI to access it. The hang doesn't happen on
> the first call to these functions, but it happens "pretty soon".
>
> I've loaded a CentOS 6.5 system in a QEMU VM and tried to reproduce it
> with the default glibc there (2.12-1.132) and can't reproduce the hang.
> I also upgraded to the latest 6.5 glibc (2.12-1.192) and can't
> reproduce it there either. I can't find this exact RPM (1.166) so I
> can't test that, so I'm not even sure if it's really a glibc issue or
> not.
>
> I guess what I'm wondering is if the above stacktrace and info rings any
> bells with anyone or suggests other places to look. I'm severely
> hampered by not being able to repro the problem myself but my user can
> do it on their system (which I don't have access to) within a minute or
> two, every time.

I believe you are looking at
https://sourceware.org/bugzilla/show_bug.cgi?id=16133

Our attempts to fix it have been reverted :-(

No idea why you only see this on CentOS.

--
Paul Pluzhnikov
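
For anyone who wants to look at the access pattern from the quoted message in isolation, here is a minimal sketch. It is not the libstdc++ source: EhGlobals, get_global() and threads_have_private_copies() are hypothetical stand-ins that only mirror the shape of the quoted get_global(), which returns the address of a per-thread static. Note that when code like this is linked directly into an executable, the compiler typically picks a cheaper TLS model and never calls __tls_get_addr(). The path in the stack trace is what you would normally get only when the code is built with -fPIC into a shared library that is loaded at run time, as with a JNI library.

```cpp
// Sketch under stated assumptions; all names here are illustrative only.
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Stand-in for abi::__cxa_eh_globals (the field layout is an assumption).
struct EhGlobals {
    void* caught_exceptions = nullptr;
    unsigned uncaught_exceptions = 0;
};

// Same shape as the quoted get_global(): return this thread's own copy.
EhGlobals* get_global() noexcept {
    static thread_local EhGlobals global;
    return &global;
}

// Start thread_count threads; each writes a distinct value into its own
// copy and reads it back. Returns true if no thread saw another thread's
// value and the calling thread's copy was left untouched.
bool threads_have_private_copies(int thread_count) {
    get_global()->uncaught_exceptions = 99;  // mark the caller's copy
    std::atomic<int> mismatches{0};
    std::vector<std::thread> threads;
    for (int id = 0; id < thread_count; ++id) {
        threads.emplace_back([id, &mismatches] {
            EhGlobals* g = get_global();
            if (g->uncaught_exceptions != 0) ++mismatches;  // fresh copy is zeroed
            g->uncaught_exceptions = static_cast<unsigned>(id + 1);
            std::this_thread::yield();  // give other threads a chance to run
            if (get_global()->uncaught_exceptions !=
                static_cast<unsigned>(id + 1)) {
                ++mismatches;
            }
        });
    }
    for (auto& t : threads) t.join();
    return mismatches == 0 && get_global()->uncaught_exceptions == 99;
}
```

Calling threads_have_private_copies(8) from main() (built with -pthread) exercises the first-access-per-thread case. To get closer to the reported setup, the same functions would have to live in a separately built shared library, which this sketch does not attempt.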