Bugzilla Bug 4578
  Assertion `...r_state == RT_CONSISTENT' failed!
  Last modified: 2007-06-08 18:13:23
Bug#: 4578
Reporter: Larry Stewart <larry.stewart@sicortex.com>
Status: NEW
Assigned To: Ulrich Drepper <drepper@redhat.com>

Attachment      Description      Type         Created
dlopen-race.c   test case        text/plain   2007-05-31 19:00
glibc.patch     proposed patch   text/plain   2007-05-31 19:08


Description:   Opened: 2007-05-31 18:58
We hit an assertion in ld.so about once in every 6000 runs of the cluster
manager slurmstepd on SiCortex hardware.  This is evidently the same bug as

http://www.redhat.com/archives/phil-list/2003-December/msg00008.html

Evidently it has reappeared because of the ld.so consistency checking and
because our chip (6-way SMP at 500 MHz) has a wider window of vulnerability.

We've adapted the previously reported test case so that it fails about half the
time (attached) and developed a patch (attached) that resolves the problem.

The test case doesn't fail for us on Opterons, the only other systems we have
available.

The failure message we get on our machines is:

Inconsistency detected by ld.so: dl-open.c: 215: dl_open_worker: Assertion
`_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

Details:

If one thread holds dl_load_lock and has set r_state to RT_ADD or RT_DELETE
at the moment another thread calls fork(), then the child-side return path of
fork() (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) re-initializes
dl_load_lock but does not restore r_state to RT_CONSISTENT. If the child
subsequently requires ld.so functionality before calling exec(), then the
assertion will fire.
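
The hazard described above can be reproduced in miniature without ld.so. What follows is only a toy sketch, not the attached dlopen-race.c: toy_lock, toy_state, toy_loader and state_seen_by_child are hypothetical names standing in for dl_load_lock, r_state and a thread that is in the middle of a dlopen(). One thread takes the lock and marks the state as "add in progress"; the main thread then forks, and the child reports the state it inherited.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for RT_CONSISTENT / RT_ADD. */
enum { TOY_CONSISTENT = 0, TOY_ADD = 1 };

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER; /* models dl_load_lock */
static volatile int toy_state = TOY_CONSISTENT;              /* models r_state */
static int ready_pipe[2];   /* loader -> main: "I am mid-update" */
static int forked_pipe[2];  /* main -> loader: "fork() has happened" */

/* Models a thread that is part-way through modifying the link map. */
static void *toy_loader(void *arg)
{
    char c = 'x';
    (void)arg;
    pthread_mutex_lock(&toy_lock);
    toy_state = TOY_ADD;
    if (write(ready_pipe[1], &c, 1) != 1) abort();
    /* Stay in the inconsistent state until the other thread has forked. */
    if (read(forked_pipe[0], &c, 1) != 1) abort();
    toy_state = TOY_CONSISTENT;
    pthread_mutex_unlock(&toy_lock);
    return NULL;
}

/* Forks while the loader is mid-update; returns the state the child saw. */
int state_seen_by_child(void)
{
    pthread_t t;
    pid_t pid;
    int status;
    char c = 'x';

    if (pipe(ready_pipe) != 0 || pipe(forked_pipe) != 0) abort();
    if (pthread_create(&t, NULL, toy_loader, NULL) != 0) abort();
    if (read(ready_pipe[0], &c, 1) != 1) abort();

    pid = fork();
    if (pid < 0) abort();
    if (pid == 0) {
        /* Child: the loader thread does not exist here, so nothing
           will ever set toy_state back to TOY_CONSISTENT. */
        _exit(toy_state);
    }

    if (write(forked_pipe[1], &c, 1) != 1) abort();
    pthread_join(t, NULL);
    if (waitpid(pid, &status, 0) != pid) abort();
    return WEXITSTATUS(status);
}
```

Calling state_seen_by_child() from main() should return TOY_ADD: the child inherits the half-updated state, which is the condition the ld.so assertion checks for. The pipes only make the interleaving deterministic; in the real failure the window is hit by chance.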

The patch acquires dl_load_lock on entry to fork() and releases it on exit
from the parent path.  The child path is initialized as it is currently.
This is essentially a pthread_atfork handler, but forced to run first,
because dl_load_lock must be acquired before malloc_atfork is active
to avoid a deadlock.
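
The idea behind the patch can be sketched at user level with pthread_atfork(). This is not the attached glibc.patch, only a toy illustration under assumptions: toy_lock and toy_state are hypothetical stand-ins for dl_load_lock and r_state, and the handler ordering relative to malloc_atfork, which the real patch has to get right, is not modeled. Unlike the patch, which leaves the existing re-initialization in the child, the toy child handler simply unlocks the mutex, which is valid here because the forking thread owns it.

```c
#define _DEFAULT_SOURCE /* for usleep() on glibc */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for RT_CONSISTENT / RT_ADD. */
enum { TOY_CONSISTENT = 0, TOY_ADD = 1 };

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER; /* models dl_load_lock */
static volatile int toy_state = TOY_CONSISTENT;              /* models r_state */
static int ready_pipe[2];   /* loader -> main: "I am mid-update" */

/* fork() cannot proceed until no thread is mid-update. */
static void fork_prepare(void) { pthread_mutex_lock(&toy_lock); }
/* Parent path: release the lock taken in fork_prepare(). */
static void fork_parent(void)  { pthread_mutex_unlock(&toy_lock); }
/* Child path: the forking thread owns the lock, so it may unlock it. */
static void fork_child(void)   { pthread_mutex_unlock(&toy_lock); }

/* Models a thread that is part-way through modifying the link map. */
static void *toy_loader(void *arg)
{
    char c = 'x';
    (void)arg;
    pthread_mutex_lock(&toy_lock);
    toy_state = TOY_ADD;
    if (write(ready_pipe[1], &c, 1) != 1) abort();
    usleep(50000);               /* widen the window; fork() must wait */
    toy_state = TOY_CONSISTENT;
    pthread_mutex_unlock(&toy_lock);
    return NULL;
}

/* Forks while the loader is mid-update; returns the state the child saw.
   Intended to be called once, since it registers the fork handlers. */
int state_seen_by_child_with_handlers(void)
{
    pthread_t t;
    pid_t pid;
    int status;
    char c;

    if (pipe(ready_pipe) != 0) abort();
    if (pthread_atfork(fork_prepare, fork_parent, fork_child) != 0) abort();
    if (pthread_create(&t, NULL, toy_loader, NULL) != 0) abort();
    if (read(ready_pipe[0], &c, 1) != 1) abort();

    pid = fork();                /* blocks in fork_prepare() until unlocked */
    if (pid < 0) abort();
    if (pid == 0)
        _exit(toy_state);

    pthread_join(t, NULL);
    if (waitpid(pid, &status, 0) != pid) abort();
    return WEXITSTATUS(status);
}
```

With the handlers installed, state_seen_by_child_with_handlers() should return TOY_CONSISTENT: fork() waits in the prepare handler until the loader has finished its update and released the lock, so the child never inherits the intermediate state.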

------- Additional Comment #1 From Larry Stewart 2007-05-31 19:00 -------
Created an attachment (id=1873)
test case

------- Additional Comment #2 From Larry Stewart 2007-05-31 19:08 -------
Created an attachment (id=1874)
proposed patch

------- Additional Comment #3 From Petr Baudis 2007-06-08 17:49 -------
I think this is a dupe of bug 3429.

------- Additional Comment #4 From Larry Stewart 2007-06-08 18:13 -------
Actually it isn't a duplicate of 3429.  The assertion failure message is the same,
but it occurs on a different line of the source file, and applying
the patch for 3429 did not fix our bug.  Sorry I didn't make a more
complete report.
