This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Race in dl code: dl_open_worker and _dl_fini


Hi,

Some time back, I had reported a problem where glibc detected invalid free: 
http://sourceware.org/ml/libc-alpha/2006-07/msg00092.html 
I have  now traced the root  cause of this  problem. I think there  is a
problem in dl code. I have attached a patch to the end of this mail.

I occasionally see a message of this type when I run a multi-threaded testcase:
*** glibc detected *** free(): invalid pointer: 0xb7f9fb78 ***

When I ran  with MALLOC_CHECK_=2 and got a core  file, the backtraces of
the two threads involved looked like this:

Thread 1:
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7dd37d5 in raise () at ../string/bits/string2.h:1000
#2  0xb7dd5149 in abort () at ../string/bits/string2.h:1000
#3  0xb7e0740a in __libc_message (do_abort=2, fmt=0xb7ec8c64 "*** glibc detected
*** %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:145
#4  0xb7e0db3f in _int_free (av=0xb7ed3820, mem=0xb7f38b78) at malloc.c:5525
#5  0xb7e0deba in __libc_free (mem=0xb7f38b78) at malloc.c:3404
#6  0xb7f75a5d in ___tls_get_addr (ti=0xb7ed1e30) at ../sysdeps/generic/dl-tls.c:670
#7  0xb7ea9c7b in __libc_dl_error_tsd () at dl-tsd.c:53
#8  0xb7f73045 in _dl_catch_error (objname=0xb7f382d0, errstring=0xb7f382d4,
operate=0xb7ea9770 <do_dlsym>,
    args=0xb7f382d8) at dl-error.c:155
#9  0xb7ea98ce in *__GI___libc_dlsym (map=0xb7ef9510, name=0xb7f2044d
"_Unwind_Resume") at dl-libc.c:42
#10 0xb7f1f330 in _Unwind_ForcedUnwind (exc=0x0, stop=0, stop_argument=0x0)
    at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:44
#11 0xb7f1ce21 in __pthread_unwind (buf=Variable "buf" is not available.
) at unwind.c:130
#12 0xb7f18200 in __pthread_exit (value=0x0) at pthreadP.h:222
#13 0xb7f3d019 in internal_exit ()
   from /home/sripathi/ibm-java2-ws-sdk-50-linux-i386/jre/bin/realtime/libj9thr23.so
#14 0xb7f3c33f in thread_wrapper ()
   from /home/sripathi/ibm-java2-ws-sdk-50-linux-i386/jre/bin/realtime/libj9thr23.so
#15 0xb7f175c1 in start_thread (arg=0xb7f38ba0) at pthread_create.c:261
#16 0xb7e736fe in clone () from /lib/tls/libc.so.6

Thread 2:
#0  0x497f58f4 in _exit () from /lib/tls/libc.so.6
#1  0x497954c4 in exit () from /lib/tls/libc.so.6
#2  0x4977fe2d in __libc_start_main (main=0x8048d4e <main>, argc=2,
ubp_av=0xbf9f78b4,
    init=0x80597f8 <__libc_csu_init>, fini=0x805984c <__libc_csu_fini>,
rtld_fini=0x4975e670 <_dl_fini>,
    stack_end=0xbf9f78ac) at ../sysdeps/generic/libc-start.c:240
#3  0x08048b81 in _start ()


Analysis:

Thread 2  is doing process  exit. From  exit(), it calls  _dl_fini(). In
_dl_fini, we go through the  list of loaded libraries, set l_init_called
for each of them to 0 and then call destructor for the library. From the
comment  in  the code,  setting  l_init_flag  to  0  is to  prevent  the
destructors from being called multiple times:

      /* 'maps' now contains the objects in the right order.  Now call the
         destructors.  We have to process this array from the front.  */
      for (i = 0; i < nmaps; ++i)
        {
          l = maps[i];

          if (l->l_init_called)
            {
              /* Make sure nothing happens if we are called twice.  */
              l->l_init_called = 0;          <=========

              /* Is there a destructor function?  */

Meanwhile, Thread 1 is doing pthread_exit. As part of this, it dlopen()s
libgcc_s.so  and  looks  for  the symbol  "_Unwind_ForcedUnwind"  in  it
(pthread_cancel_init). Thread 1 loads libgcc_s.so and since libc.so is a
dependency, checks if it is  initialized. In dl_open_worker(), it checks
the l_init_called flag to decide if it should initialize the tls for it.
Since Thread  2 has just  set l_init_called for  libc.so to 0,  Thread 2
thinks libc.so  is not initialized and  tries to initialize the  tls for
libc.so! However, tls of libc.so is  being used at this time, for symbol
lookup  (in _dl_catch_error).  This  causes problems  that  lead to  the
crash/invalid free message from glibc.

Solution:

1) From what I have understood, _dl_fini() is only called during process
exit  and process  exit can  only be  called once  from anywhere  in the
program. So there cannot be multiple  calls to _dl_fini(). Hence we need
not take care  of this situation. We  just need to remove  the line that
sets l_init_flag to 0. I have attached a patch below that does this.

2)  Rather than  depending  on  l_init_flag, have  a  different flag  in
link_map  structure, to  be used  in _dl_fini()  to prevent  destructors
being called multiple  times. I can send  a patch to do this  if this is
the correct way.

Please let  me know if  both the above ideas  are wrong and  suggest any
alternatives I should look at.

Thanks and regards,
Sripathi.


Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>

diff -uprN libc_org/elf/dl-fini.c libc/elf/dl-fini.c
--- libc_org/elf/dl-fini.c	2006-12-22 01:54:22.000000000 -0600
+++ libc/elf/dl-fini.c	2006-12-22 02:10:22.000000000 -0600
@@ -217,9 +217,6 @@ _dl_fini (void)
 
 	  if (l->l_init_called)
 	    {
-	      /* Make sure nothing happens if we are called twice.  */
-	      l->l_init_called = 0;
-
 	      /* Is there a destructor function?  */
 	      if (l->l_info[DT_FINI_ARRAY] != NULL
 		  || l->l_info[DT_FINI] != NULL)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]