I'm using glibc 2.13 from Slackware 13.37 on x86-64. One of our local programs hangs from time to time and I have to kill -9 it. Today I was able to get a core file and the following backtrace:

#0  0x00007f3118abfa8e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f3118a49c75 in _L_lock_12271 () from /lib64/libc.so.6
#2  0x00007f3118a47efd in realloc () from /lib64/libc.so.6
#3  0x00007f3118a3b633 in vasprintf () from /lib64/libc.so.6
#4  0x00007f3118a1c238 in asprintf () from /lib64/libc.so.6
#5  0x00007f31189f7863 in __assert_fail () from /lib64/libc.so.6
#6  0x00007f3118d76366 in __reclaim_stacks () from /lib64/libpthread.so.0
#7  0x00007f3118a78a36 in fork () from /lib64/libc.so.6
#8  0x000000000040672d in do_notify (in=<value optimized out>) at lib/inotify.c:39
#9  0x00007f3118d76d6b in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3118ab045d in clone () from /lib64/libc.so.6

The program uses about 4G of memory on a machine with 16G and a 2.6.38.7 kernel.

The notify function looks like this: https://github.com/gfto/tsdecrypt/blob/master/notify.c#L46

I understand the problem is some kind of malloc-related deadlock, but I have no idea how to avoid or work around it. glibc 2.14.1 is currently in slackware-current; will updating to that version fix the problem?
This could easily be the malloc deadlock I posted about here: http://sources.redhat.com/ml/libc-alpha/2012-02/msg00272.html
This should have been fixed by:

commit 7a775e6b3d938586db5a66a76de9c14667151cf9
Author: Andreas Schwab <schwab@redhat.com>
Date:   Thu Sep 15 14:48:01 2011 +0200

    Avoid race between {,__de}allocate_stack and __reclaim_stacks during fork

which is in 2.15 and later.
I'm now using 2.15 and the problem is gone (as expected). Thanks.
Created attachment 7714: Source code to reproduce the problem
The problem is still reproducible on glibc 2.17 / Ubuntu 13.04. Attached is the source code to reproduce it. The stack is the following:

__lll_lock_wait_private() at lowlevellock.S:95 0x7ffff73e3d7b
_L_lock_697() at 0x7ffff73d0af4
__GI___vsyslog_chk() at syslog.c:258 0x7ffff73d0524
__syslog() at syslog.c:117 0x7ffff73d099f
forkSyslogHang::forkSyslogHang() at forkSyslogHang.cpp:33 0x4036e1
threadPool::threadExecute() at threadPool.cpp:36 0x403ae0
start_thread() at pthread_create.c:311 0x7ffff7bc4f8e
clone() at clone.S:113 0x7ffff73d5a0d
(In reply to Ionut Ceausu from comment #5)
> Problem is still reproducible on GLIBC2.17/ubunt 13.04. Attached is the
> source code to reproduce the problem.
> Stack is the following one
> __lll_lock_wait_private() at lowlevellock.S:95 0x7ffff73e3d7b
> _L_lock_697() at 0x7ffff73d0af4
> __GI___vsyslog_chk() at syslog.c:258 0x7ffff73d0524
> __syslog() at syslog.c:117 0x7ffff73d099f

This is a different problem. I filed it as bug 19429.
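For anyone hitting a hang like the one in comment #5, here is a minimal sketch of a defensive pattern. It assumes the hang occurs in the child of a fork() made from a multithreaded process; the attachment is not reproduced here, so that is a guess. POSIX only guarantees async-signal-safe functions in such a child until it calls exec, and syslog() and malloc() are not on that list. The helper name run_helper is hypothetical and does not come from the attachment or from tsdecrypt. This is not a fix for either glibc bug discussed above; it only keeps calls that take libc locks out of the child.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (an assumption, not code from this report):
   run `path` with `argv` and return its exit status, or -1 on failure. */
static int run_helper(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed; errno is set */

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on.
           No syslog(), printf() or malloc(): another thread may have
           held their internal locks at the moment of fork(). */
        execv(path, argv);
        _exit(127);             /* reached only if execv failed */
    }

    /* Parent: logging and allocation are fine here. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;          /* waitpid failed for a real reason */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Any logging about the helper (started, failed, exit code) would then go in the parent after run_helper() returns, rather than in the child between fork() and execv().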