This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Race in _dl_open()


hi,

We haven't had the luck to reproduce this with a simple test case :( .. We are still working on it.

Btw, do you see the possibility of the kind of race pointed out here ? Also is this the proper way to fix it ?

Thanks,

Suzuki

suzuki wrote:
Hi,

One of our application was getting terminated with the following error

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. This race seems to be present in the libc I got from the CVS [at code inspection]. We were able to reproduce this consistently after running for 4-5hrs.

Upon debugging we found that it is due to a race between two threads doing a _dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


/* Make sure we are alone. */ __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


if (__builtin_expect (errstring != NULL, 0)) { [...] }

assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

And, if the thread which gets woken up is playing with the same namespace, and sets the r_state to RT_ADD in _dl_map_object_from_fd even before we reach here (truly possible in an SMP system), ( due to getting scheduled out ), we would hit the assert !

In our case we found that the r_state was RT_ADD and which was set by another thread.

So, it is not safe to believe that the r_state won't get changed once we release the lock.

We have tried the following solution for this issue, and it works fine.

Solution: Ensure the consistency in case of a successful dl_open_worker, before releasing the lock ! Our solution ignored the case where dl_open_worker may fail.

Attached patch addresses all the issues and we think this may be a complete solution for the problem.

1) For successful load by dl_open_worker or a failure where we don't have to remove any object from memory ( i.e, we don't have to do a _dl_close to ensure the consistency ), the assertion is performed before unlocking.

2) For the latter case, where we have to do a _dl_close(), the assertion is done just after the _dl_close.



Comments ?


Thanks,


Suzuki K P <suzuki@in.ibm.com>
Linux Technology Center,
IBM Systems & Technology Labs.




------------------------------------------------------------------------


Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c 2006-10-03 11:10:22.000000000 -0700
+++ glibc-2.4/elf/dl-open.c 2006-10-10 15:28:33.000000000 -0700
@@ -554,6 +554,16 @@
_dl_unload_cache ();
#endif
+ /* Check the consistency state, for a successful load, before we release
+ the lock. Somebody else might come up and modify the states once we + release this lock. In case of the failures, where we need a _dl_close,
+ consistency is achieved only after the _dl_close. So for that case
+ we delay the check until we finish _dl_close. */
+ if (__builtin_expect (errstring == NULL, 1) + || args.map == NULL)
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ +
/* Release the lock. */
__rtld_lock_unlock_recursive (GL(dl_load_lock));
@@ -564,7 +574,8 @@
size_t len_errstring;
/* Remove the object from memory. It may be in an inconsistent
- state if relocation failed, for example. */
+ state if relocation failed, for example. Enusre the conistent
+ state. */
if (args.map)
{
#ifdef USE_TLS
@@ -580,6 +591,7 @@
#endif
_dl_close (args.map);
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
}
/* Make a local copy of the error string so that we can release the
@@ -601,13 +613,9 @@
if (malloced)
free ((char *) errstring);
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
/* Reraise the error. */
_dl_signal_error (errcode, objname, NULL, local_errstring);
- }
-
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ } #ifndef SHARED
DL_STATIC_INIT (args.map);


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]