This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] Race in dl code: dl_open_worker and _dl_fini
- From: Sripathi Kodi <sripathik at in dot ibm dot com>
- To: libc-alpha at sources dot redhat dot com
- Date: Fri, 22 Dec 2006 13:49:28 +0530
- Subject: [PATCH] Race in dl code: dl_open_worker and _dl_fini
- Reply-to: sripathik at in dot ibm dot com
Hi,
Some time back, I had reported a problem where glibc detected invalid free:
http://sourceware.org/ml/libc-alpha/2006-07/msg00092.html
I have now traced the root cause of this problem. I think there is a
problem in dl code. I have attached a patch to the end of this mail.
I occasionally see a message of this type when I run a multi-threaded testcase:
*** glibc detected *** free(): invalid pointer: 0xb7f9fb78 ***
When I ran with MALLOC_CHECK_=2 and got a core file, the backtraces of
the two threads involved looked like this:
Thread 1:
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7dd37d5 in raise () at ../string/bits/string2.h:1000
#2 0xb7dd5149 in abort () at ../string/bits/string2.h:1000
#3 0xb7e0740a in __libc_message (do_abort=2, fmt=0xb7ec8c64 "*** glibc detected
*** %s: 0x%s ***\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:145
#4 0xb7e0db3f in _int_free (av=0xb7ed3820, mem=0xb7f38b78) at malloc.c:5525
#5 0xb7e0deba in __libc_free (mem=0xb7f38b78) at malloc.c:3404
#6 0xb7f75a5d in ___tls_get_addr (ti=0xb7ed1e30) at ../sysdeps/generic/dl-tls.c:670
#7 0xb7ea9c7b in __libc_dl_error_tsd () at dl-tsd.c:53
#8 0xb7f73045 in _dl_catch_error (objname=0xb7f382d0, errstring=0xb7f382d4,
operate=0xb7ea9770 <do_dlsym>,
args=0xb7f382d8) at dl-error.c:155
#9 0xb7ea98ce in *__GI___libc_dlsym (map=0xb7ef9510, name=0xb7f2044d
"_Unwind_Resume") at dl-libc.c:42
#10 0xb7f1f330 in _Unwind_ForcedUnwind (exc=0x0, stop=0, stop_argument=0x0)
at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:44
#11 0xb7f1ce21 in __pthread_unwind (buf=Variable "buf" is not available.
) at unwind.c:130
#12 0xb7f18200 in __pthread_exit (value=0x0) at pthreadP.h:222
#13 0xb7f3d019 in internal_exit ()
from /home/sripathi/ibm-java2-ws-sdk-50-linux-i386/jre/bin/realtime/libj9thr23.so
#14 0xb7f3c33f in thread_wrapper ()
from /home/sripathi/ibm-java2-ws-sdk-50-linux-i386/jre/bin/realtime/libj9thr23.so
#15 0xb7f175c1 in start_thread (arg=0xb7f38ba0) at pthread_create.c:261
#16 0xb7e736fe in clone () from /lib/tls/libc.so.6
Thread 2:
#0 0x497f58f4 in _exit () from /lib/tls/libc.so.6
#1 0x497954c4 in exit () from /lib/tls/libc.so.6
#2 0x4977fe2d in __libc_start_main (main=0x8048d4e <main>, argc=2,
ubp_av=0xbf9f78b4,
init=0x80597f8 <__libc_csu_init>, fini=0x805984c <__libc_csu_fini>,
rtld_fini=0x4975e670 <_dl_fini>,
stack_end=0xbf9f78ac) at ../sysdeps/generic/libc-start.c:240
#3 0x08048b81 in _start ()
Analysis:
Thread 2 is doing process exit. From exit(), it calls _dl_fini(). In
_dl_fini, we go through the list of loaded libraries, set l_init_called
for each of them to 0 and then call destructor for the library. From the
comment in the code, setting l_init_flag to 0 is to prevent the
destructors from being called multiple times:
/* 'maps' now contains the objects in the right order. Now call the
destructors. We have to process this array from the front. */
for (i = 0; i < nmaps; ++i)
{
l = maps[i];
if (l->l_init_called)
{
/* Make sure nothing happens if we are called twice. */
l->l_init_called = 0; <=========
/* Is there a destructor function? */
Meanwhile, Thread 1 is doing pthread_exit. As part of this, it dlopen()s
libgcc_s.so and looks for the symbol "_Unwind_ForcedUnwind" in it
(pthread_cancel_init). Thread 1 loads libgcc_s.so and since libc.so is a
dependency, checks if it is initialized. In dl_open_worker(), it checks
the l_init_called flag to decide if it should initialize the tls for it.
Since Thread 2 has just set l_init_called for libc.so to 0, Thread 2
thinks libc.so is not initialized and tries to initialize the tls for
libc.so! However, tls of libc.so is being used at this time, for symbol
lookup (in _dl_catch_error). This causes problems that lead to the
crash/invalid free message from glibc.
Solution:
1) From what I have understood, _dl_fini() is only called during process
exit and process exit can only be called once from anywhere in the
program. So there cannot be multiple calls to _dl_fini(). Hence we need
not take care of this situation. We just need to remove the line that
sets l_init_flag to 0. I have attached a patch below that does this.
2) Rather than depending on l_init_flag, have a different flag in
link_map structure, to be used in _dl_fini() to prevent destructors
being called multiple times. I can send a patch to do this if this is
the correct way.
Please let me know if both the above ideas are wrong and suggest any
alternatives I should look at.
Thanks and regards,
Sripathi.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
diff -uprN libc_org/elf/dl-fini.c libc/elf/dl-fini.c
--- libc_org/elf/dl-fini.c 2006-12-22 01:54:22.000000000 -0600
+++ libc/elf/dl-fini.c 2006-12-22 02:10:22.000000000 -0600
@@ -217,9 +217,6 @@ _dl_fini (void)
if (l->l_init_called)
{
- /* Make sure nothing happens if we are called twice. */
- l->l_init_called = 0;
-
/* Is there a destructor function? */
if (l->l_info[DT_FINI_ARRAY] != NULL
|| l->l_info[DT_FINI] != NULL)