This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Race in _dl_open()


hi,

The patch attached here has a small race window still kept open for the case 2, where _dl_close is called !

Attaching a new, more complete patch which eliminates the race conditions.

Since the dl_load_lock is recursive, we hold it while calling the _dl_close() and release only after we are through it and finished the assert check!

Comments ?

Thanks,

Suzuki


suzuki wrote:
hi,

We haven't had the luck to reproduce this with a simple test case :( .. We are still working on it.

Btw, do you see the possibility of the kind of race pointed out here ? Also is this the proper way to fix it ?

Thanks,

Suzuki

suzuki wrote:

Hi,

One of our application was getting terminated with the following error

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. This race seems to be present in the libc I got from the CVS [at code inspection]. We were able to reproduce this consistently after running for 4-5hrs.

Upon debugging we found that it is due to a race between two threads doing a _dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


/* Make sure we are alone. */ __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


if (__builtin_expect (errstring != NULL, 0)) { [...] }

assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

And, if the thread which gets woken up is playing with the same namespace, and sets the r_state to RT_ADD in _dl_map_object_from_fd even before we reach here (truly possible in an SMP system), ( due to getting scheduled out ), we would hit the assert !

In our case we found that the r_state was RT_ADD and which was set by another thread.

So, it is not safe to believe that the r_state won't get changed once we release the lock.

We have tried the following solution for this issue, and it works fine.

Solution: Ensure the consistency in case of a successful dl_open_worker, before releasing the lock ! Our solution ignored the case where dl_open_worker may fail.

Attached patch addresses all the issues and we think this may be a complete solution for the problem.

1) For successful load by dl_open_worker or a failure where we don't have to remove any object from memory ( i.e, we don't have to do a _dl_close to ensure the consistency ), the assertion is performed before unlocking.

2) For the latter case, where we have to do a _dl_close(), the assertion is done just after the _dl_close.



Comments ?


Thanks,


Suzuki K P <suzuki@in.ibm.com>
Linux Technology Center,
IBM Systems & Technology Labs.




------------------------------------------------------------------------


Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c 2006-10-03 11:10:22.000000000 -0700
+++ glibc-2.4/elf/dl-open.c 2006-10-10 15:28:33.000000000 -0700
@@ -554,6 +554,16 @@
_dl_unload_cache ();
#endif
+ /* Check the consistency state, for a successful load, before we release
+ the lock. Somebody else might come up and modify the states once we + release this lock. In case of the failures, where we need a _dl_close,
+ consistency is achieved only after the _dl_close. So for that case
+ we delay the check until we finish _dl_close. */
+ if (__builtin_expect (errstring == NULL, 1) + || args.map == NULL)
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ +
/* Release the lock. */
__rtld_lock_unlock_recursive (GL(dl_load_lock));
@@ -564,7 +574,8 @@
size_t len_errstring;
/* Remove the object from memory. It may be in an inconsistent
- state if relocation failed, for example. */
+ state if relocation failed, for example. Enusre the conistent
+ state. */
if (args.map)
{
#ifdef USE_TLS
@@ -580,6 +591,7 @@
#endif
_dl_close (args.map);
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
}
/* Make a local copy of the error string so that we can release the
@@ -601,13 +613,9 @@
if (malloced)
free ((char *) errstring);
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
/* Reraise the error. */
_dl_signal_error (errcode, objname, NULL, local_errstring);
- }
-
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ } #ifndef SHARED
DL_STATIC_INIT (args.map);




Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c        2005-11-15 05:30:08.000000000 -0800
+++ glibc-2.4/elf/dl-open.c     2006-10-25 19:09:48.527370480 -0700
@@ -553,10 +553,14 @@
   /* We must munmap() the cache file.  */
   _dl_unload_cache ();
 #endif
-
-  /* Release the lock.  */
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-
+
+  /* Releasing the lock here would expose a window for other threads
+     to load libs and thus make the state of r_debug, !RT_CONSISTENT.
+     We release the lock only after we have performed the RT_CONSISTENT
+     check.  */
+
+   //__rtld_lock_unlock_recursive (GL(dl_load_lock));
+
   if (__builtin_expect (errstring != NULL, 0))
     {
       /* Some error occurred during loading.  */
@@ -565,6 +569,7 @@

       /* Remove the object from memory.  It may be in an inconsistent
         state if relocation failed, for example.  */
+
       if (args.map)
        {
 #ifdef USE_TLS
@@ -578,9 +583,14 @@
          if ((mode & __RTLD_AUDIT) == 0)
            GL(dl_tls_dtv_gaps) = true;
 #endif
-
+          /* Note that we already hold the dl_load_lock here. But thats ok, since
+             it is a recursive lock ! */
          _dl_close (args.map);
        }
+      /* Now we are sure that the state is consistent. Do the check and
+         release the lock */
+      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));

       /* Make a local copy of the error string so that we can release the
         memory allocated for it.  */
@@ -601,14 +611,16 @@
       if (malloced)
        free ((char *) errstring);

-      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
       /* Reraise the error.  */
       _dl_signal_error (errcode, objname, NULL, local_errstring);
+    }
+  else
+    {
+      /* Successful load, state should be consistent. Release the lock */
+      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
     }
-
-  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
+
 #ifndef SHARED
   DL_STATIC_INIT (args.map);
 #endif

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]