Bug 14379 - shared object constructors are called in the wrong order (with interposition)
Summary: shared object constructors are called in the wrong order (with interposition)
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-20 19:12 UTC by Andy Lutomirski
Modified: 2022-08-29 09:46 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andy Lutomirski 2012-07-20 19:12:14 UTC
[This is not clearly a bug in the strictest sense, as the ELF spec, AFAICT, has nothing useful to say.  It's rather unfortunate behavior, though, that makes writing efficient and reliable malloc replacements very difficult, for example.]

glibc's dynamic elf loader makes some effort to initialize an SO's DT_NEEDED dependencies before initializing the SO that depended on them.  This is nice.  

Unfortunately, for sibling dependencies, it appears that the first library is searched first for symbols (as required by the ELF spec) but the *last* library is initialized first.  This means that, if an earlier library overrides a symbol called by the constructor of a later library, then the earlier library is called before it is initialized.

This is quite unfortunate for malloc replacements.  I have one that is intended to be used with LD_PRELOAD, which causes its symbols to be searched first (as desired) but causes it to be constructed last (which is bad).

Fixing this would be more complicated than just reversing the order of initialization: it would still be good for deeper dependencies to be initialized first.

As a demonstration, create lib.c and bin.c, as below.  Then do:

$ for name in lib1 lib2 preload_lib; do gcc -DLIBNAME="\"$name\"" -fPIC -shared -o ${name}.so lib.c -Wl,--no-undefined; done
$ gcc -Wl,--no-as-needed -o bin_12 bin.c lib1.so lib2.so -Wl,-rpath -Wl,.
$ gcc -Wl,--no-as-needed -o bin_1 bin.c lib1.so -Wl,-rpath -Wl,.

$ ./bin_12
lib2: ctor called. calling the_func
  lib1: the_func called
  lib1: NOT INITED!
lib1: ctor called. calling the_func
  lib1: the_func called
main: calling the_func
  lib1: the_func called

$ LD_PRELOAD=preload_lib.so ./bin_1
lib1: ctor called. calling the_func
  preload_lib: the_func called
  preload_lib: NOT INITED!
preload_lib: ctor called. calling the_func
  preload_lib: the_func called
main: calling the_func
  preload_lib: the_func called


-- start of lib.c --

#include <unistd.h>
#define PRINT(x) do { static const char __s[] = x; write(1, __s, sizeof(__s)-1); } while(0)

static int inited = 0;

void the_func(void)
{
  PRINT("  " LIBNAME ": the_func called\n");
  if (!inited)
    PRINT("  " LIBNAME ": NOT INITED!\n");
}

__attribute__((constructor)) static void ctor(void)
{
  inited = 1;
  PRINT(LIBNAME ": ctor called. calling the_func\n");
  the_func();
}

-- end of lib.c --

-- start of bin.c --

#include <unistd.h>
#define PRINT(x) do { static const char __s[] = x; write(1, __s, sizeof(__s)-1); } while(0)

extern void the_func(void);

int main()
{
  PRINT("main: calling the_func\n");
  the_func();
  return 0;
}

-- end of bin.c --
Comment 1 Carlos O'Donell 2012-07-24 01:40:49 UTC
Could you please provide a concrete example including:

(a) Use case where the current ordering causes a problem.

(b) New ordering where the problem is solved. Please describe in detail what you mean by "it would still be good for deeper dependencies to be
initialized first." 

We need to see more detailed use cases before we can make an informed judgement here.
Comment 2 Andy Lutomirski 2012-07-24 15:21:19 UTC
Here's a silly malloc replacement library:

#include <sys/mman.h>
#include <stdlib.h>

static char *arena;

__attribute__((constructor)) static void init()
{
  arena = (char*)mmap(0, 1 << 24, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

void *malloc(size_t n)
{
  void *ret = arena;
  arena += n;
  return ret;
}

void *realloc(void *ptr, size_t size)
{
  abort();
}

void free(void *p) {}


IMO it should work (as long as no one tries to use realloc, calloc, or threads, but this is just an example).  Here's a rather ordinary shared library:

#include <malloc.h>

__attribute__((constructor)) static void init()
{
	*(char*)malloc(1) = 0;
}

This is, of course, silly, but in C++, dynamic allocation due to global object constructors is common.

If I try to use that library with the malloc replacement LD_PRELOADed, it crashes:

   19797:	calling init: /lib64/ld-linux-x86-64.so.2
     19797:	
     19797:	
     19797:	calling init: /lib64/libc.so.6
     19797:	
     19797:	
     19797:	calling init: /lib64/libgcc_s.so.1
     19797:	
     19797:	
     19797:	calling init: /lib64/libm.so.6
     19797:	
     19797:	
     19797:	calling init: /lib64/libstdc++.so.6
     19797:	
     19797:	
     19797:	calling init: ./test_lib.so
     19797:	
Segmentation fault

The malloc library (silly_malloc.so in this case) wasn't initialized.  I think that, in any sensible initialization order, LD_PRELOADed libraries would initialize first, not last.  (Maybe libraries in the DT_NEEDED list of the LD_PRELOADed library should initialize even sooner, but that's irrelevant here.)

My comment about initializing deeper dependencies first is something that current glibc does right.  I don't think that should change.
Comment 3 Andy Lutomirski 2012-09-11 23:41:20 UTC
[marking as UNCONFIRMED because I don't think this is waiting for anything]
Comment 4 Carlos O'Donell 2012-09-12 02:10:44 UTC
(In reply to comment #3)
> [marking as UNCONFIRMED because I don't think this is waiting for anything]

That's right, you've provided an example and we need someone to review the example. Thanks for updating the ticket.
Comment 5 Andy Lutomirski 2014-10-02 16:43:28 UTC
I think this bug is involved in a fairly reliable deadlock when a multithreaded application that preloads tcmalloc and links to GnuTLS 3.2 or earlier calls fork:

https://bugs.freedesktop.org/show_bug.cgi?id=84567
Comment 6 Andy Lutomirski 2014-10-02 16:45:16 UTC
FWIW, in the comments to your patch, you mention that pthread_atfork is nearly impossible to use cleanly without atomics.  Unless I'm missing something, this is true for the prepare handler, but not for the parent and child handlers -- there are no threads when the parent and child handlers are called.
Comment 7 Andy Lutomirski 2014-10-02 16:46:21 UTC
(In reply to Andy Lutomirski from comment #6)

Sorry, ignore that comment.  Wrong BZ entry.
Comment 8 rabinv 2019-02-14 17:10:28 UTC
Proposed fix (for the preload case):
https://www.sourceware.org/ml/libc-alpha/2019-02/msg00373.html
Comment 9 Rich Felker 2019-03-01 16:19:52 UTC
I question the validity of this issue. Such a malloc implementation would not work linked into the main program, nor as a DT_NEEDED for the main program (assuming other libraries are in use with ctors), so there's no reason to expect it to work in the LD_PRELOAD case either.

If there is a desire to make this work, special-casing LD_PRELOAD is not the right way. Instead, when library A depends on library B, and library C interposes in front of a symbol defined in library B and used in library A, the ctor ordering should attempt to arrange for C's ctors to run before A's (and probably also before B's). I'm not sure if such a constraint is easy to satisfy in general though (for example, it wouldn't work in the main-program case above; you'd get a circular constraint), and I suspect it's a mistake to try.

For what it's worth, I think it's a bug to write a malloc implementation that depends on ctors having run, since malloc could be called very early (by other ctors, maybe even by the implementation before any ctors run). Normally, it's possible to arrange malloc logic such that the init code is in a non-critical path that's automatically reached on first use, like "if (bin_is_empty) ..." where the bin_is_empty predicate is true on the statically-initialized state. In more generality, you need pthread_once/call_once for this kind of functionality.
Comment 10 Florian Weimer 2019-03-06 12:34:19 UTC
(In reply to Rich Felker from comment #9)
> I question the validity of this issue. Such a malloc implementation would
> not work linked into the main program, nor as a DT_NEEDED for the main
> program (assuming other libraries are in use with ctors), so there's no
> reason to expect it to work in the LD_PRELOAD case either.
> 
> If there is a desire to make this work, special-casing LD_PRELOAD is not the
> right way. Instead, when library A depends on library B, and library C
> interposes in front of a symbol defined in library B and used in library A,
> the ctor ordering should attempt to arrange for C's ctors to run before A's
> (and probably also before B's). I'm not sure if such a constraint is easy to
> satisfy in general though (for example, it wouldn't work in the main-program
> case above; you'd get a circular constraint), and I suspect it's a mistake
> to try.

This is an interesting point.  I think you are right that this reliance of ELF constructors is always going to be brittle (especially for a malloc).  I'm less convinced now that this feature is desirable, I'm afraid.

We actually record implicit dependencies during binding, and I believe this can affect the order in which ELF finalizers are executed.

However, I'm not entirely sure if this behavior (whether implemented for ELF constructors or destructors) would be consistent with the ELF specification.  Furthermore, most users expect that finalizers run in the opposite order of initializers, disregarding any bindings which happened in the meantime.  This is so confusing that I've been contemplating if we should disregard the symbol binding information for finalizer ordering purposes.

I'm also worried that accidental interposition is fairly common for widely used C++ templates (e.g. std::vector<std::string>).  If I'm not mistaken, interposition goes in the opposite detection of the DT_NEEDED dependencies, so this would result in effectively reverting the order of initializers (objects initialized before their dependencies).
Comment 11 Andy Lutomirski 2019-03-06 18:20:06 UTC
I feel like the symbol binding approaches and the LD_PRELOAD_INIT_EARLY idea are both too complicated.  What's wrong with my original suggestion?  It seems to me that glibc currently initializes DSOs in an order that is based on DT_NEEDED but is partially backwards from what makes sense and that simply changing the order would solve the problem pretty well.  In other words, I don't see why the current behavior makes much sense, nor do I see why a more complicated set of rules is needed.
Comment 12 Rich Felker 2019-03-06 18:30:00 UTC
I agree with Florian's assessment that interposition usually ends up being close to the opposite of topological dependency order, so you don't have much room to impose such a rule like I'd suggested. Scratch that.

Are you saying it would suffice for your needs just to reverse the order of sibling ctor execution, so that earlier (in load order/symbol definition order) dependencies of the same parent get constructed before later ones, with all the (direct) preloads being ordered as if they were early DT_NEEDEDs of the main program? That seems reasonable to me, and I think it matches the newly introduced behavior in musl (I just pushed changes doing real topological dep order for ctors), which isn't really intentional in that regard but just the most natural fallout of the topological sort.

I still think your original usage case is highly broken though. A malloc implementation should be prepared to be called before any particular ctor has executed. If glibc documents requirements for safety of an interposed malloc replacement at some point, I think this should be considered for making a documented requirement.
Comment 13 Andy Lutomirski 2019-03-06 19:33:36 UTC
(In reply to Rich Felker from comment #12)
> Are you saying it would suffice for your needs just to reverse the order of
> sibling ctor execution, so that earlier (in load order/symbol definition
> order) dependencies of the same parent get constructed before later ones,
> with all the (direct) preloads being ordered as if they were early
> DT_NEEDEDs of the main program? That seems reasonable to me, and I think it
> matches the newly introduced behavior in musl (I just pushed changes doing
> real topological dep order for ctors), which isn't really intentional in
> that regard but just the most natural fallout of the topological sort.
> 

It would, and I think that behavior would make a lot more sense.

> I still think your original usage case is highly broken though. A malloc
> implementation should be prepared to be called before any particular ctor
> has executed. If glibc documents requirements for safety of an interposed
> malloc replacement at some point, I think this should be considered for
> making a documented requirement.

Indeed, and I fixed that.  I was just very surprised when I found out that the reason my library didn't work without this change was that glibc called the ctor much later than seemed reasonable.
Comment 14 Florian Weimer 2022-08-29 09:46:00 UTC
I'm closing this as INVALID per the discussion here: interposition necessarily goes in the opposite direction of DT_NEEDED dependencies, and interposing objects need to be written in a way that reflects that (e.g., using lazy initialization instead of ELF constructors).