This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Race in _dl_open()
hi,
The patch attached here has a small race window still kept open for the
case 2, where _dl_close is called !
Attaching a new, more complete patch which eliminates the race conditions.
Since the dl_load_lock is recursive, we hold it while calling the
_dl_close() and release only after we are through it and finished the
assert check!
Comments ?
Thanks,
Suzuki
suzuki wrote:
hi,
We haven't had the luck to reproduce this with a simple test case :( ..
We are still working on it.
Btw, do you see the possibility of the kind of race pointed out here ?
Also is this the proper way to fix it ?
Thanks,
Suzuki
suzuki wrote:
Hi,
One of our application was getting terminated with the following error
"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion
`_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!
with glibc-2.4.31. This race seems to be present in the libc I got
from the CVS [at code inspection]. We were able to reproduce this
consistently after running for 4-5hrs.
Upon debugging we found that it is due to a race between two threads
doing a _dl_open().
The scenario is something like this :
In elf/dl-open.c, _dl_open:
/* Make sure we are alone. */
__rtld_lock_lock_recursive (GL(dl_load_lock));
[...]
int errcode = _dl_catch_error (&objname, &errstring, &malloced,
dl_open_worker, &args);
#ifndef MAP_COPY
/* We must munmap() the cache file. */
_dl_unload_cache ();
#endif
/* Release the lock. */
__rtld_lock_unlock_recursive (GL(dl_load_lock));
^^^^^ This would kick any other thread waiting on the lock.
if (__builtin_expect (errstring != NULL, 0))
{
[...]
}
assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
And, if the thread which gets woken up is playing with the same
namespace, and sets the r_state to RT_ADD in _dl_map_object_from_fd
even before we reach here (truly possible in an SMP system), ( due to
getting scheduled out ), we would hit the assert !
In our case we found that the r_state was RT_ADD and which was set by
another thread.
So, it is not safe to believe that the r_state won't get changed once
we release the lock.
We have tried the following solution for this issue, and it works fine.
Solution: Ensure the consistency in case of a successful
dl_open_worker, before releasing the lock ! Our solution ignored the
case where dl_open_worker may fail.
Attached patch addresses all the issues and we think this may be a
complete solution for the problem.
1) For successful load by dl_open_worker or a failure where we don't
have to remove any object from memory ( i.e, we don't have to do a
_dl_close to ensure the consistency ), the assertion is performed
before unlocking.
2) For the latter case, where we have to do a _dl_close(), the
assertion is done just after the _dl_close.
Comments ?
Thanks,
Suzuki K P <suzuki@in.ibm.com>
Linux Technology Center,
IBM Systems & Technology Labs.
------------------------------------------------------------------------
Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c 2006-10-03 11:10:22.000000000 -0700
+++ glibc-2.4/elf/dl-open.c 2006-10-10 15:28:33.000000000 -0700
@@ -554,6 +554,16 @@
_dl_unload_cache ();
#endif
+ /* Check the consistency state, for a successful load, before we
release
+ the lock. Somebody else might come up and modify the states once
we + release this lock. In case of the failures, where we need a
_dl_close,
+ consistency is achieved only after the _dl_close. So for that case
+ we delay the check until we finish _dl_close. */
+ if (__builtin_expect (errstring == NULL, 1) + || args.map ==
NULL)
+ assert (_dl_debug_initialize (0, args.nsid)->r_state ==
RT_CONSISTENT);
+ +
/* Release the lock. */
__rtld_lock_unlock_recursive (GL(dl_load_lock));
@@ -564,7 +574,8 @@
size_t len_errstring;
/* Remove the object from memory. It may be in an inconsistent
- state if relocation failed, for example. */
+ state if relocation failed, for example. Enusre the conistent
+ state. */
if (args.map)
{
#ifdef USE_TLS
@@ -580,6 +591,7 @@
#endif
_dl_close (args.map);
+ assert (_dl_debug_initialize (0, args.nsid)->r_state ==
RT_CONSISTENT);
}
/* Make a local copy of the error string so that we can release
the
@@ -601,13 +613,9 @@
if (malloced)
free ((char *) errstring);
- assert (_dl_debug_initialize (0, args.nsid)->r_state ==
RT_CONSISTENT);
-
/* Reraise the error. */
_dl_signal_error (errcode, objname, NULL, local_errstring);
- }
-
- assert (_dl_debug_initialize (0, args.nsid)->r_state ==
RT_CONSISTENT);
+ }
#ifndef SHARED
DL_STATIC_INIT (args.map);
Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c 2005-11-15 05:30:08.000000000 -0800
+++ glibc-2.4/elf/dl-open.c 2006-10-25 19:09:48.527370480 -0700
@@ -553,10 +553,14 @@
/* We must munmap() the cache file. */
_dl_unload_cache ();
#endif
-
- /* Release the lock. */
- __rtld_lock_unlock_recursive (GL(dl_load_lock));
-
+
+ /* Releasing the lock here would expose a window for other threads
+ to load libs and thus make the state of r_debug, !RT_CONSISTENT.
+ We release the lock only after we have performed the RT_CONSISTENT
+ check. */
+
+ //__rtld_lock_unlock_recursive (GL(dl_load_lock));
+
if (__builtin_expect (errstring != NULL, 0))
{
/* Some error occurred during loading. */
@@ -565,6 +569,7 @@
/* Remove the object from memory. It may be in an inconsistent
state if relocation failed, for example. */
+
if (args.map)
{
#ifdef USE_TLS
@@ -578,9 +583,14 @@
if ((mode & __RTLD_AUDIT) == 0)
GL(dl_tls_dtv_gaps) = true;
#endif
-
+ /* Note that we already hold the dl_load_lock here. But thats ok, since
+ it is a recursive lock ! */
_dl_close (args.map);
}
+ /* Now we are sure that the state is consistent. Do the check and
+ release the lock */
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ __rtld_lock_unlock_recursive (GL(dl_load_lock));
/* Make a local copy of the error string so that we can release the
memory allocated for it. */
@@ -601,14 +611,16 @@
if (malloced)
free ((char *) errstring);
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
/* Reraise the error. */
_dl_signal_error (errcode, objname, NULL, local_errstring);
+ }
+ else
+ {
+ /* Successful load, state should be consistent. Release the lock */
+ assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+ __rtld_lock_unlock_recursive (GL(dl_load_lock));
}
-
- assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
+
#ifndef SHARED
DL_STATIC_INIT (args.map);
#endif