Bugzilla Bug 3429
Race in _dl_open with r_debug.r_state consistency check
Last modified: 2009-07-24 01:52:04
Bug#: 3429
Reporter: Suzuki <suzuki@in.ibm.com>
Status: RESOLVED
Resolution: FIXED
Assigned To: Ulrich Drepper <drepper@redhat.com>

Attachment: dl-open-race-fix.diff
Description: patch to fix the race
Type: patch
Created: 2006-10-27 17:39


Description:   Opened: 2006-10-27 17:31
While running some stress tests on one of our applications, we encountered an
assert() in ld.so as follows:

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion
`_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!"

with glibc-2.4.31. Judging by code inspection, the race also seems to be present
in the libc I got from CVS. We were able to reproduce it consistently within
4-5 hours of running.

Upon debugging we found that it is due to a race between two threads doing a
_dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


  /* Make sure we are alone.  */
  __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


  if (__builtin_expect (errstring != NULL, 0))
    {
      [...]
      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
    }

  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

If the thread that gets woken up is working on the same namespace and sets
r_state to RT_ADD in _dl_map_object_from_fd before we reach the asserts above
(quite possible on an SMP system, or if we get scheduled out after the unlock),
we hit the assert.

So it is not safe to assume that r_state won't change once we release
the lock.
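
The pattern described above can be sketched in isolation. This is a minimal illustration, not glibc code: `struct loader`, `ST_ADD`, `ST_CONSISTENT`, `open_once` and `run_openers` are hypothetical names standing in for dl_load_lock, the r_state values and _dl_open. The consistency check sits before the unlock, so no other thread can have changed the state by the time it runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the r_state values; not glibc's definitions.  */
enum state { ST_CONSISTENT, ST_ADD };

struct loader
{
  pthread_mutex_t lock;   /* plays the role of dl_load_lock */
  enum state state;       /* plays the role of r_debug.r_state */
  long opens;             /* completed opens, protected by lock */
};

struct job
{
  struct loader *l;
  int iterations;
};

/* One simulated open.  */
static void
open_once (struct loader *l)
{
  pthread_mutex_lock (&l->lock);
  l->state = ST_ADD;          /* start modifying the shared list */
  l->opens++;
  l->state = ST_CONSISTENT;   /* modification finished */
  /* Check while still holding the lock.  If this line were moved below
     the unlock, another thread could already have set ST_ADD.  */
  assert (l->state == ST_CONSISTENT);
  pthread_mutex_unlock (&l->lock);
}

static void *
worker (void *arg)
{
  struct job *j = arg;
  for (int i = 0; i < j->iterations; i++)
    open_once (j->l);
  return NULL;
}

#define MAX_THREADS 16

/* Run up to MAX_THREADS workers and return the number of completed opens.  */
long
run_openers (int nthreads, int iterations)
{
  struct loader l;
  struct job j = { &l, iterations };
  pthread_t threads[MAX_THREADS];
  int started = 0;

  pthread_mutex_init (&l.lock, NULL);
  l.state = ST_CONSISTENT;
  l.opens = 0;

  if (nthreads > MAX_THREADS)
    nthreads = MAX_THREADS;
  for (int i = 0; i < nthreads; i++)
    if (pthread_create (&threads[started], NULL, worker, &j) == 0)
      started++;
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);

  long total = l.opens;
  pthread_mutex_destroy (&l.lock);
  return total;
}
```

Calling, say, `run_openers (4, 1000)` from a test program exercises the checked path from four threads at once. With the assert inside the locked region it cannot fire, whatever the scheduling.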

------- Additional Comment #1 From Suzuki 2006-10-27 17:39 -------
Created an attachment (id=1391)
patch to fix the race

This patch has been tested to fix the issue. 

Comments?

Thanks

------- Additional Comment #2 From Ulrich Drepper 2006-10-27 18:43 -------
You're addressing a real problem.  The asserts are unimportant but the _dl_close
call must be protected.  This is fixed now.

------- Additional Comment #3 From Suzuki 2006-10-27 18:50 -------
(In reply to comment #2)
> You're addressing a real problem.  The asserts are unimportant but the _dl_close
> call must be protected.  This is fixed now.

Could you please let us know whether a patch for the issue already exists?
Or can we use this patch as the final fix?

Thanks.

------- Additional Comment #4 From cvs-commit@gcc.gnu.org 2007-01-12 15:21 -------
Subject: Bug 3429

CVSROOT:	/cvs/glibc
Module name:	libc
Branch: 	glibc-2_5-branch
Changes by:	jakub@sourceware.org	2007-01-12 15:21:33

Modified files:
	.              : ChangeLog 
	elf            : Makefile dl-close.c dl-open.c 
Added files:
	elf            : tst-thrlock.c 

Log message:
	* elf/dl-close.c (_dl_close_worker): Renamed from _dl_close and
	split out locking and parameter checking.
	(_dl_close): Call _dl_close_worker after locking and checking.
	* elf/dl-open.c (_dl_open): Call _dl_close_worker instead of
	_dl_close.
	* elf/Makefile: Add rules to build and run tst-thrlock.
	* elf/tst-thrlock.c:  New file.
	
	[BZ #3429]
	* elf/dl-open.c (dl_open_worker): Keep holding dl_load_lock until
	we are sure we do not need it anymore for _dl_close.  Also move
	the asserts inside the lock region.
	Patch mostly by Suzuki <suzuki@in.ibm.com>.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.10362.2.7&r2=1.10362.2.8
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/tst-thrlock.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=NONE&r2=1.2.4.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/Makefile.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.315&r2=1.315.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-close.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.117&r2=1.117.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-open.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.128&r2=1.128.2.1
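
The log message describes splitting the close path into a worker that assumes the lock is held and a public wrapper that takes it, so that the open path can undo a failed load without dropping the lock first. A rough sketch of that shape follows, under stated assumptions: `struct registry`, `registry_open`, `registry_close` and `close_worker` are hypothetical names and are not the glibc functions, and the `locked` flag exists only to make the worker's precondition checkable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical registry standing in for the loader's shared state.  */
struct registry
{
  pthread_mutex_t lock;
  int locked;      /* nonzero while lock is held; protected by lock */
  int refcount;    /* number of live handles; protected by lock */
};

void
registry_init (struct registry *r)
{
  pthread_mutex_init (&r->lock, NULL);
  r->locked = 0;
  r->refcount = 0;
}

/* Worker: the caller must already hold r->lock.  */
static void
close_worker (struct registry *r)
{
  assert (r->locked);
  r->refcount--;
}

/* Public close: take the lock, then delegate to the worker.  */
void
registry_close (struct registry *r)
{
  pthread_mutex_lock (&r->lock);
  r->locked = 1;
  close_worker (r);
  r->locked = 0;
  pthread_mutex_unlock (&r->lock);
}

/* Public open.  On a simulated failure the partial work is undone by
   calling the worker directly, while the lock is still held.  Returns
   0 on success and -1 on failure.  */
int
registry_open (struct registry *r, int simulate_failure)
{
  pthread_mutex_lock (&r->lock);
  r->locked = 1;
  r->refcount++;
  if (simulate_failure)
    close_worker (r);
  r->locked = 0;
  pthread_mutex_unlock (&r->lock);
  return simulate_failure ? -1 : 0;
}
```

The point of the split is that the error path never has to release the lock just to reacquire it in the close routine, which is the window the report describes.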


------- Additional Comment #5 From Jim Radford 2009-07-24 01:48 -------
I noticed this same message with glibc-2.10.1-2.x86_64.  It happened after a
suspend when my disk was churning, so I suspect there's another race.

------- Additional Comment #6 From Ulrich Drepper 2009-07-24 01:52 -------
Stop reopening bugs.  If you have something to report open a new bug.  But not
if you're not providing real information like a reproducer.
