Bug 3429 - Race in _dl_open with r_debug.r_state consistency check
Summary: Race in _dl_open with r_debug.r_state consistency check
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.4
: P1 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-27 17:31 UTC by Suzuki
Modified: 2016-05-08 14:17 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
patch to fix the race (1.12 KB, patch)
2006-10-27 17:39 UTC, Suzuki
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Suzuki 2006-10-27 17:31:27 UTC
While running some stress tests on one of our application, we encountered an
assert() in ld.so as follows:

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion
`_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. This race seems to be present in the libc I got from the CVS
[at code inspection]. We were able to reproduce this consistently within 4-5hrs
of run.

Upon debugging we found that it is due to a race between two threads doing a
_dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


  /* Make sure we are alone.  */
  __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


if (__builtin_expect (errstring != NULL, 0))
  {
     [...]
   assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
  }

assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

And, if the thread which gets woken up is playing with the same namespace, and
sets the r_state to RT_ADD in _dl_map_object_from_fd even before we reach here
(truly possible in an SMP system),  ( due to getting scheduled out ), we would
hit the assert !

So, it is not safe to believe that the r_state won't get changed once we release
the lock.
Comment 1 Suzuki 2006-10-27 17:39:49 UTC
Created attachment 1391 [details]
patch to fix the race

This patch has been tested to fix the issue. 

Comments ?

Thanks
Comment 2 Ulrich Drepper 2006-10-27 18:43:52 UTC
You're addressing a real problem.  The assert are unimportant by the _dl_close
call must be protected.  This is fixed now.
Comment 3 Suzuki 2006-10-27 18:50:50 UTC
(In reply to comment #2)
> You're addressing a real problem.  The assert are unimportant by the _dl_close
> call must be protected.  This is fixed now.

So could you please let us know if there is already a patch existing for the
issue ? Or can we use this patch as the final fix ?

Thanks.
Comment 4 cvs-commit@gcc.gnu.org 2007-01-12 15:21:48 UTC
Subject: Bug 3429

CVSROOT:	/cvs/glibc
Module name:	libc
Branch: 	glibc-2_5-branch
Changes by:	jakub@sourceware.org	2007-01-12 15:21:33

Modified files:
	.              : ChangeLog 
	elf            : Makefile dl-close.c dl-open.c 
Added files:
	elf            : tst-thrlock.c 

Log message:
	* elf/dl-close.c (_dl_close_worker): Renamed from _dl_close and
	split out locking and parameter checking.
	(_dl_close): Call _dl_close_worker after locking and checking.
	* elf/dl-open.c (_dl_open): Call _dl_close_worker instead of
	_dl_close.
	* elf/Makefile: Add rules to build and run tst-thrlock.
	* elf/tst-thrlock.c:  New file.
	
	[BZ #3429]
	* elf/dl-open.c (dl_open_worker): Keep holding dl_load_lock until
	we are sure we do not need it anymore for _dl_close.  Also move
	the asserts inside the lock region.
	Patch mostly by Suzuki <suzuki@in.ibm.com>.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.10362.2.7&r2=1.10362.2.8
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/tst-thrlock.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=NONE&r2=1.2.4.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/Makefile.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.315&r2=1.315.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-close.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.117&r2=1.117.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-open.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.128&r2=1.128.2.1

Comment 5 Jim Radford 2009-07-24 01:48:21 UTC
I noticed this same message with glibc-2.10.1-2.x86_64.  It happened after a
suspend when my disk was churning, so I suspect there's another race.
Comment 6 Ulrich Drepper 2009-07-24 01:52:04 UTC
Stop reopening bugs.  If you have something to report open a new bug.  But not
if you're not providing real information like a reproducer.