Bug 13882 - Cycle detection in dynamic loader is broken
Summary: Cycle detection in dynamic loader is broken
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.15
: P2 normal
Target Milestone: 2.16
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-20 18:06 UTC by law
Modified: 2019-04-14 11:10 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description law 2012-03-20 18:06:32 UTC
The cycle detection and initializer ordering in the dynamic loader is badly broken.  

This change:


2011-08-16  Andreas Schwab  <schwab@redhat.com>

        [BZ #11724]
        * elf/dl-deps.c (_dl_map_object_deps): Only assume cycle when
        object is seen twice.
        * elf/dl-fini.c (_dl_sort_fini): Likewise.

Attempted to fix cycle detection, but got it horribly wrong.

We track how often the each object is "seen"  and once an object is
seen more than once, we assume there's a cycle.

"seen" is actually a misnomer.  It actually represents the number of
times we have looked at an object to see if there's an object later in
the list for which the current one is a dependency.

So let's assume we have 6 objects.  1 2 3 4 5 6

1 depends on 2
2 depends on 3
3 depends on 4
4 depends on 5
5 depends on 6

At the start of the code to reorder the list and detect cycles, assume
the objects arrive in the following order

1 4 6 5 3 2 (from the link line ordering)

If we trace the changes in the list we'll see the following

1 6 5 3 4 2 (Step #1, move 4 after 3, 4 has been seen once)

1 5 6 3 4 2 (Step #2, move 6 after 5, 6 has been seen once)

1 6 3 4 5 2 (Step #3, move 5 after 4, 5 has been seen once)

1 3 4 5 6 2 (Step #4, move 6 after 5 (again), 6 has been seen twice)

1 4 5 6 2 3 (Step #5, move 3 after 2, 3 has been seen once)

1 5 6 2 3 4 (Step #6, move 4 after 3 (again), 4 has been seen twice)

1 6 2 3 4 5 (Step #7, move 5 after 4 (again), 5 has been seen twice)

At this point the code (erroneously) believes it's hit a cycle as it's
already marked object 6 as being seen twice.

However, there clearly isn't a cycle if you review the dependencies.
Thus, the check for an object already being seen twice is wrong.

Given the same 6 nodes, the pathological case begins with this state

6 5 4 3 2 1

and transitions like this:

5 6 4 3 2 1
6 4 5 3 2 1
4 5 6 3 2 1
5 6 3 4 2 1
6 3 4 5 2 1
3 4 5 6 2 1
4 5 6 2 3 1
5 6 2 3 4 1
6 2 3 4 5 1
2 3 4 5 6 1
3 4 5 6 1 2
4 5 6 1 2 3
5 6 1 2 3 4
6 1 2 3 4 5
1 2 3 4 5 6

Object #6 is moved 5 times.  It's fairly simple to show that for any
given number of objects the worst case scenario is N - 1 moves.


There is a secondary problem in this code; the "seen" array is an array of chars and thus we're assuming that we never have more than 128 DSOs that need ordering.  Several executables on my Fedora 16 box already exceed 128 DSOs.

The broken sorting code is replicated in 3 locations:

_dl_map_object_deps, _dl_sort_fini and _dl_open_worker

http://sourceware.org/ml/libc-alpha/2012-01/msg00072.html
http://sourceware.org/ml/libc-alpha/2012-01/msg00080.html
Comment 1 Paul Pluzhnikov 2012-03-21 15:19:42 UTC
> Several executables on my Fedora 16 box already exceed 128 DSOs.

FWIW, we routinely run executables (unit-tests) which have several thousand DSOs (but they generally don't need ordering ;-)
Comment 2 Carlos O'Donell 2012-05-07 20:27:25 UTC
We want this fix in 2.16 setting milestone.
Comment 3 law 2012-06-21 15:28:54 UTC
commit 28363bbf705830cb35791af679401559376eaa75
Author: Jeff Law <law@redhat.com>
Date:   Thu Jun 21 09:26:41 2012 -0600

    2012-06-21  Jeff Law  <law@redhat.com>
    
            [BZ #13882]
            * elf/dl-deps.c (_dl_map_object_deps): Fix cycle detection.  Use
            uint16_t for elements in the "seen" array to avoid char overflows.
            * elf/dl-fini.c (_dl_sort_fini): Likewise.
            * elf/dl-open.c (dl_open_worker): Likewise.
Comment 4 Don Hatch 2013-03-27 08:51:06 UTC
Note that this fix changes the running time from O(n^2) to O(n^3)--
I've opened a new bug 15310 on this.