Bug 4578

Summary: Assertion `...r_state == RT_CONSISTENT' failed!
Product: glibc
Reporter: Larry Stewart <larry.stewart>
Component: nptl
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---
Severity: normal
CC: gautamshruti66, glibc-bugs, stephen.robinson
Priority: P2
Flags: fweimer: security-
Version: 2.3.5
Target Milestone: ---
Host: mips64-linux-gnu
Target:
Build:
Last reconfirmed:
Attachments: test case, proposed patch, reliable test case, updated patch for glibc-2.22

Description Larry Stewart 2007-05-31 18:58:41 UTC
We hit an assertion in ld.so about every 6000 runs of the cluster manager
slurmstepd on SiCortex hardware.  This is evidently the same bug as

http://www.redhat.com/archives/phil-list/2003-December/msg00008.html

It has apparently reappeared because of the consistency checking in ld.so and
because our chip (6-way SMP at 500 MHz) has a wider window of vulnerability.

We've adapted the previously reported test case so that it fails about half the
time (attached) and developed a patch (attached) that resolves the problem.

The test case doesn't fail for us on Opterons, the only other systems we have
available.

The failure message we get on our machines is:

Inconsistency detected by ld.so: dl-open.c: 215: dl_open_worker: Assertion
`_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

Details:

If a thread happens to hold dl_load_lock and have r_state set to RT_ADD or
RT_DELETE at the time another thread calls fork(), then the child exit code
from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case) re-initializes
dl_load_lock but does not restore r_state to RT_CONSISTENT. If the child
subsequently requires ld.so functionality before calling exec(), then the
assertion will fire.
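
The sequence described above can be modelled in user space. This is only a sketch under stated assumptions, not the attached test case and not glibc code: model_load_lock and model_r_state are hypothetical stand-ins for dl_load_lock and r_state, and the worker thread is held inside its critical section on purpose so that the fork always lands in the vulnerable window.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the loader's r_state values. */
enum model_state { MODEL_CONSISTENT = 0, MODEL_ADD = 1 };

/* Hypothetical stand-ins for dl_load_lock and r_state. */
static pthread_mutex_t model_load_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_r_state = MODEL_CONSISTENT;

/* 0: idle, 1: worker is mid-update, 2: worker may finish. */
static atomic_int phase;

/* Plays the thread that is in the middle of a dlopen()-like update. */
static void *model_loader_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&model_load_lock);
    model_r_state = MODEL_ADD;          /* structures are now in flux */
    atomic_store(&phase, 1);
    while (atomic_load(&phase) != 2)    /* stay in the window until released */
        sched_yield();
    model_r_state = MODEL_CONSISTENT;
    pthread_mutex_unlock(&model_load_lock);
    return NULL;
}

/* Forks while the worker is mid-update and returns the state the child
   observed, or -1 on error.  The real child path also re-initialises the
   lock; that step is left out here because it does not touch the state. */
int child_state_after_unguarded_fork(void)
{
    pthread_t worker;
    int status = 0;

    atomic_store(&phase, 0);
    if (pthread_create(&worker, NULL, model_loader_thread, NULL) != 0)
        return -1;
    while (atomic_load(&phase) != 1)
        sched_yield();

    pid_t pid = fork();
    if (pid == 0)
        _exit(model_r_state);           /* child reports its snapshot */

    if (pid > 0)
        waitpid(pid, &status, 0);
    atomic_store(&phase, 2);            /* let the worker finish */
    pthread_join(worker, NULL);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this model the child is single-threaded and nobody will ever finish the update for it, yet child_state_after_unguarded_fork() reports MODEL_ADD rather than MODEL_CONSISTENT. That is the condition the ld.so assertion checks for.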

The patch acquires dl_load_lock on entry to fork() and releases it on exit
from the parent path.  The child path is initialized as currently done.
This is essentially pthread_atfork, but forced to be first because the
acquisition of dl_load_lock must happen before malloc_atfork is active
to avoid a deadlock.
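
The locking discipline of the patch can be approximated in user space with pthread_atfork. This is a hedged sketch, not the attached patch: guard_load_lock and guard_r_state are hypothetical stand-ins for dl_load_lock and r_state, the handlers are registered by the application rather than placed first inside fork(), and the child unlocks the mutex it inherited instead of re-initialising it as glibc's child path does.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum guard_state { GUARD_CONSISTENT = 0, GUARD_ADD = 1 };

/* Hypothetical stand-ins for dl_load_lock and r_state. */
static pthread_mutex_t guard_load_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int guard_r_state = GUARD_CONSISTENT;
static atomic_int stop_worker;

/* prepare handler: runs in the forking thread before the fork. */
static void lock_before_fork(void)  { pthread_mutex_lock(&guard_load_lock); }
/* parent and child handlers: the forking thread owns the lock in both. */
static void unlock_after_fork(void) { pthread_mutex_unlock(&guard_load_lock); }

/* Keeps flipping the state under the lock, like repeated dlopen/dlclose. */
static void *guard_loader_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker)) {
        pthread_mutex_lock(&guard_load_lock);
        guard_r_state = GUARD_ADD;
        guard_r_state = GUARD_CONSISTENT;
        pthread_mutex_unlock(&guard_load_lock);
        sched_yield();                  /* give the forking thread a turn */
    }
    return NULL;
}

/* Forks `forks` times while the worker runs and returns how many children
   saw an inconsistent state (or failed), or -1 if setup failed. */
int inconsistent_children_with_guard(int forks)
{
    static int registered;
    pthread_t worker;
    int bad = 0;

    if (!registered) {
        if (pthread_atfork(lock_before_fork, unlock_after_fork,
                           unlock_after_fork) != 0)
            return -1;
        registered = 1;
    }
    atomic_store(&stop_worker, 0);
    if (pthread_create(&worker, NULL, guard_loader_thread, NULL) != 0)
        return -1;

    for (int i = 0; i < forks; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(guard_r_state != GUARD_CONSISTENT);
        int status = 0;
        if (pid < 0 || waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }

    atomic_store(&stop_worker, 1);
    pthread_join(worker, NULL);
    return bad;
}
```

Because the prepare handler cannot take the lock while the worker is mid-update, every snapshot handed to a child is taken with the state consistent, so inconsistent_children_with_guard(50) is expected to return 0. An application-level handler like this gives no control over ordering relative to malloc's fork handling, which is why the patch takes the lock inside fork() itself.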
Comment 1 Larry Stewart 2007-05-31 19:00:05 UTC
Created attachment 1873
test case
Comment 2 Larry Stewart 2007-05-31 19:08:50 UTC
Created attachment 1874
proposed patch
Comment 3 Petr Baudis 2007-06-08 17:49:06 UTC
I think this is a dupe of bug 3429.
Comment 4 Larry Stewart 2007-06-08 18:13:23 UTC
Actually it isn't a duplicate of 3429.  The assertion failure message is the same,
but it occurs on a different line of the source file, and applying the patch
for 3429 did not fix our bug.  Sorry I didn't make a more complete report.
Comment 5 shruti 2015-09-15 01:41:26 UTC
Does this bug still exist?
Comment 6 shruti 2015-09-15 01:43:15 UTC
Does this bug still exist? I came across this assertion while running my C program via an application running on Heroku.
Comment 7 Stephen Robinson 2015-11-23 13:05:45 UTC
I have run into this bug in glibc-2.16.0 and can also confirm that it is still present in glibc-2.22.

The assertion was removed Jan 21st 2015 in changeset ccdb048, but the underlying bug is still present.

The bug occurs because there is no protection to ensure that the loader's internal structures are not being modified when a fork occurs and the child process receives a snapshot of those structures. 

I have created a new test case that reliably reproduces the bug as long as there are at least two CPUs available so that the threads can run in parallel.

The existing patch from Larry Stewart fixes this issue for me, though I've also updated it to acquire the new dl_load_write_lock.

While investigating this issue I also discovered a deadlock due to dl_load_write_lock not being reinitialised for the child process during fork. I will open a new bug for this.
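
The deadlock mentioned above follows the same pattern: a lock held by another thread at fork time stays locked in the child, where its owner no longer exists. A minimal user-space sketch, assuming a hypothetical model_write_lock in place of dl_load_write_lock, and using trylock in the child so the example reports the stale lock instead of hanging on it:

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for dl_load_write_lock. */
static pthread_mutex_t model_write_lock = PTHREAD_MUTEX_INITIALIZER;

/* 0: idle, 1: holder owns the lock, 2: holder may release it. */
static atomic_int holder_phase;

/* Holds the lock across the fork performed by the main thread. */
static void *lock_holder_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&model_write_lock);
    atomic_store(&holder_phase, 1);
    while (atomic_load(&holder_phase) != 2)
        sched_yield();
    pthread_mutex_unlock(&model_write_lock);
    return NULL;
}

/* Returns 1 if the child found the inherited lock still held, 0 if it
   could take it, and -1 on error. */
int child_sees_stale_lock(void)
{
    pthread_t holder;
    int status = 0;

    atomic_store(&holder_phase, 0);
    if (pthread_create(&holder, NULL, lock_holder_thread, NULL) != 0)
        return -1;
    while (atomic_load(&holder_phase) != 1)
        sched_yield();

    pid_t pid = fork();
    if (pid == 0) {
        /* The holder thread does not exist in the child, so a plain
           pthread_mutex_lock() here would block forever. */
        _exit(pthread_mutex_trylock(&model_write_lock) == EBUSY);
    }

    if (pid > 0)
        waitpid(pid, &status, 0);
    atomic_store(&holder_phase, 2);     /* let the holder release the lock */
    pthread_join(holder, NULL);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this model child_sees_stale_lock() returns 1. Re-initialising the lock on the child path of fork is what removes the stale ownership.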
Comment 8 Stephen Robinson 2015-11-23 13:07:08 UTC
Created attachment 8804
reliable test case
Comment 9 Stephen Robinson 2015-11-23 13:07:52 UTC
Created attachment 8805
updated patch for glibc-2.22
Comment 10 Stephen Robinson 2015-11-23 13:22:36 UTC
I've opened bug 19282 for the related issue of dl_load_write_lock not being reinitialised during fork