pthread_mutex_lock hang during tls_get_addr_tail()

Carlos O'Donell carlos@systemhalted.org
Sun Sep 11 10:18:00 GMT 2016


On Sat, Sep 10, 2016 at 7:09 PM, Paul Smith <paul@mad-scientist.net> wrote:
> On Sat, 2016-09-10 at 15:43 -0700, Paul Pluzhnikov wrote:
>> On Sat, Sep 10, 2016 at 1:33 PM, Paul Smith <paul@mad-scientist.net> wrote:
>>
>> > As far as I'm aware, no signals are being caught in the thread where
>> > this hang occurs.
>>
>> Well, I guess it's something different then.
>>
>> I took a look at tls_get_addr_tail, and the only lock it takes is the
>> loader lock. It also locks it with __rtld_lock_lock_recursive, which
>> means the thread can't self-deadlock.
>>
>> The key question then is: what other thread is holding the loader
>> locked, and what prevents that other thread from completing whatever
>> loader operation it's in?
>
> All the other threads are hanging out in pthread_cond_wait, from what I
> can remember, except maybe a few waiting on recv().  However I haven't
> examined them closely (because many of them were started by the JVM I
> don't have really useful stacktraces for many threads).  What sorts of
> things should I be looking for?  Does anything else take this lock,
> other than a dlopen() or similar?  Certainly my library doesn't do any
> dlopen() or similar.
>
> I'll try to get the user to send me a complete stack trace for all
> threads, but it may not happen until Monday.

You _need_ the context of the other threads.

Constructors are foreign functions which run with the internal dynamic
loader lock held to keep the state of loaded libraries consistent.

To me this looks like you have a thread A which calls dlopen and
starts running constructors, one of which creates thread B. Thread B
touches a TLS variable and therefore has to wait for thread A to
finish with dlopen. That can't happen because thread A is waiting on
thread B, so you deadlock, but only if thread B touches the TLS
variable.
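
A rough model of that sequence, in case it helps when reading the
stack traces. This is only a sketch: loader_lock is an ordinary
recursive pthread mutex standing in for the real loader lock, and the
names thread_b and simulate_constructor_deadlock are made up for the
illustration. Thread B uses pthread_mutex_trylock so the program
reports the conflict instead of hanging.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE; must precede includes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the dynamic loader's recursive lock (a model only,
   not the lock glibc actually uses). */
static pthread_mutex_t loader_lock;

/* Thread B: models a TLS access that needs the loader lock.
   trylock is used so the demo terminates; a blocking lock here
   would wait forever, which is the hang being described. */
static void *thread_b(void *arg)
{
    int *would_block = arg;
    int rc = pthread_mutex_trylock(&loader_lock);

    if (rc == EBUSY) {
        *would_block = 1;              /* lock is held by thread A */
    } else if (rc == 0) {
        *would_block = 0;              /* lock was free, no conflict */
        pthread_mutex_unlock(&loader_lock);
    }
    return NULL;
}

/* Thread A: models dlopen taking the lock, then a constructor that
   creates thread B and waits for it. Returns 1 if B would have
   blocked, 0 if not, -1 on setup failure. */
int simulate_constructor_deadlock(void)
{
    pthread_mutexattr_t attr;
    pthread_t b;
    int would_block = -1;
    int rc;

    pthread_mutexattr_init(&attr);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_init(&loader_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&loader_lock);      /* "dlopen" begins */

    if (pthread_create(&b, NULL, thread_b, &would_block) == 0)
        pthread_join(b, NULL);             /* "constructor" waits on B */

    pthread_mutex_unlock(&loader_lock);    /* "dlopen" ends */
    pthread_mutex_destroy(&loader_lock);
    return would_block;
}
```

Calling simulate_constructor_deadlock() from main should return 1,
meaning thread B found the lock held while thread A was waiting on it.
The recursive attribute does not help thread B, because recursion only
lets the owning thread re-acquire the lock.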

The moral of the story is: constructors are foreign functions that run
during startup, are prone to lots of sequencing problems, and should
be kept to a minimum.
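
One way to keep a constructor minimal, sketched under assumptions: the
library below is hypothetical (lib_init, lib_do_work, lib_shutdown and
worker_main are invented names), and __attribute__((constructor)) is
the GCC/Clang spelling. The constructor creates no threads and waits
on nothing; the worker thread is started lazily with pthread_once on
the first real call, after the loader has finished.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

static pthread_once_t worker_once = PTHREAD_ONCE_INIT;
static pthread_t worker;
static int worker_started;     /* 1 while the worker thread exists */

/* Placeholder worker body for the sketch; it returns immediately. */
static void *worker_main(void *arg)
{
    (void)arg;
    return NULL;
}

/* Runs once, on first use, outside any constructor. */
static void start_worker(void)
{
    if (pthread_create(&worker, NULL, worker_main, NULL) == 0)
        worker_started = 1;
}

/* The constructor deliberately does no threading and no waiting,
   so it cannot block while the loader lock is held. */
__attribute__((constructor))
static void lib_init(void)
{
    /* intentionally minimal */
}

/* Hypothetical public entry point: starts the worker on first call.
   Returns 0 if the worker is available, -1 otherwise. */
int lib_do_work(void)
{
    pthread_once(&worker_once, start_worker);
    return worker_started ? 0 : -1;
}

/* Hypothetical teardown: joins the worker if it was started.
   After this, lib_do_work() returns -1 because the once-guard
   does not restart the worker. */
int lib_shutdown(void)
{
    if (worker_started) {
        pthread_join(worker, NULL);
        worker_started = 0;
    }
    return 0;
}
```

Whether lazy start fits depends on the library; the point is only that
thread creation and waiting happen after dlopen has returned.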

Cheers,
Carlos


