deadlock in __lll_lock_wait() @ /lib64/libpthread.so.0

Carlos O'Donell carlos@systemhalted.org
Fri Nov 16 20:46:00 GMT 2012


On Fri, Nov 16, 2012 at 3:02 PM, Paweł Sikora <pluto@agmk.net> wrote:
> On Thursday 15 of November 2012 17:03:14 Carlos O'Donell wrote:
>> On Thu, Nov 15, 2012 at 12:58 PM, Paweł Sikora <pluto@agmk.net> wrote:
>> > Hi,
>> >
>> > I'm working with an EDA simulator that dynamically loads my plugin (via dlopen).
>> > During plugin initialization (global ctors) it deadlocks in __lll_lock_wait.
>> > I'm observing this issue on RHEL-5/CentOS-5 with glibc-2.5-58.el5_6.4.
>> > Is it a known bug on the 2.5 branch?
>>
>> That was released about 6 years ago. I don't remember anything from that
>> time period :-)
>
> RHEL5 has at least 10 years of commercial support and many companies still use it ;-)

That's excellent, but *I* don't remember that far back :-)

>> > BTW, I can work around this issue with the -Wl,-z,now linker flag to avoid lazy
>> > symbol binding, but I'd like to avoid that if possible.
>>
>> Why do you assume it's a glibc bug?
>> (...)
>> It will always deadlock in __lll_lock_wait for any deadlock, since that's the
>> lowest-level function in the locking implementation.
>
> It works fine with the newer glibc-2.12 from RHEL6 and with glibc-2.16 from another Linux distro.
> Moreover, the traces from different threads are all stuck at the same point, _L_lock_1127,
> so I assume that it is probably a glibc-2.5 bug fixed in a newer version. The main problem
> is to locate the right fix in the glibc.git mirror and check the RHEL5 updates against it.
> I can't force the customer to update their RHEL5 cluster without strong arguments :)

In the glibc 2.9 era (2008, 2 years after 2.5[1]) on x86_64 we added
tlsdesc support.

The tlsdesc support had some interesting dependencies on _dl_load_lock, which
is the most likely lock being taken here. The lock is used to serialize access
to the dynamic loader data. As such it gets touched from a number of different
places to prevent corruption.

It's possible that _dl_load_lock access is the problem here in the 2.5 codebase.
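
To illustrate how a loader-wide lock can end up in a deadlock, here is a
minimal sketch. It is not glibc code and it is not a diagnosis of your trace.
The names loader_lock and app_lock are hypothetical stand-ins: loader_lock
plays the role of _dl_load_lock, and app_lock is any lock owned by the plugin
or the host. One thread models dlopen running a constructor that wants
app_lock; the other models a thread that holds app_lock and then needs the
loader lock (for example to resolve a lazily bound symbol). Timed lock
attempts are used so the sketch reports the inversion instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins, not glibc internals. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;

struct worker {
    pthread_mutex_t *first;   /* lock taken up front */
    pthread_mutex_t *second;  /* lock requested while holding first */
    int rc;                   /* result of the timed attempt on second */
};

/* Try to take m for up to 200 ms; returns 0 on success, ETIMEDOUT otherwise. */
static int lock_with_timeout(pthread_mutex_t *m)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_nsec += 200L * 1000L * 1000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &ts);
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    pthread_mutex_lock(w->first);
    pthread_barrier_wait(&both_hold_first);  /* each thread now holds one lock */
    w->rc = lock_with_timeout(w->second);    /* a plain lock would hang here */
    pthread_barrier_wait(&both_hold_first);  /* hold first until both attempts end */
    if (w->rc == 0)
        pthread_mutex_unlock(w->second);
    pthread_mutex_unlock(w->first);
    return NULL;
}

/* Returns how many of the two threads failed to get their second lock. */
int count_blocked_threads(void)
{
    struct worker dlopen_side = { &loader_lock, &app_lock, 0 };
    struct worker lazy_side = { &app_lock, &loader_lock, 0 };
    pthread_t a, b;
    int rc;

    rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, worker_main, &dlopen_side);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker_main, &lazy_side);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&both_hold_first);
    return (dlopen_side.rc != 0) + (lazy_side.rc != 0);
}
```

Calling count_blocked_threads() from a small main() should report that both
threads timed out. With plain pthread_mutex_lock calls instead of the timed
ones, both threads would sit in the low-level lock wait forever, which is the
same place your backtraces point at regardless of which lock is involved.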

It's possible the problem still exists and the changes for tlsdesc
have covered it up.

It's also possible you have a kernel bug that misses a futex wakeup.

Good luck.

Cheers,
Carlos.

[1] http://sourceware.org/glibc/wiki/Glibc%20Timeline


