This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Support -z initfirst for multiple shared libraries

From: Yury Gribov <y dot gribov at samsung dot com>
To: Kostya Serebryany <kcc at google dot com>
Cc: d wk <dwksrc at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, Jakub Jelinek <jakub at redhat dot com>
Date: Thu, 28 Apr 2016 19:57:41 +0300
Subject: Re: [PATCH] Support -z initfirst for multiple shared libraries
Authentication-results: sourceware.org; auth=none
References: <CAPESumqMqVem6VvaKXf_ko1zpM9_wOXixp1O7WGtS0RnvhMSpg at mail dot gmail dot com> <57206430 dot 9020206 at samsung dot com> <CAPESumqycYdiReUCUH51aLChaRLaTiGh121joQ_6rH7oRP+LAQ at mail dot gmail dot com> <5720F56E dot 6070704 at samsung dot com> <CAN=P9pjS1j480qBzy9xu9ZToyFA=xXgZmiLeFu_jkKDp3j87Rg at mail dot gmail dot com>

On 04/27/2016 08:31 PM, Kostya Serebryany wrote:

On Wed, Apr 27, 2016 at 10:22 AM, Yury Gribov <y.gribov@samsung.com> wrote:

On 04/27/2016 04:57 PM, d wk wrote:

Good points!

On Wed, Apr 27, 2016 at 3:03 AM, Yury Gribov <y.gribov@samsung.com>
wrote:

On 04/26/2016 11:58 PM, d wk wrote:


Hello libc developers,

In a project of mine, I needed to run some code before any constructors
from any system libraries (such as libc or libpthread). The linker/loader
-z initfirst feature is perfect for this, but it only supports one shared
library. Unfortunately libpthread also uses this feature (I assume the
feature exists because pthread needed it), so my project was incompatible
with libpthread.
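
For readers unfamiliar with the mechanism, here is a minimal sketch of the kind of early constructor being discussed. The file name, function name and flag variable are hypothetical, and the assumption that write(2) is safe to call this early is exactly that, an assumption (the author reports reimplementing such calls himself).

```c
#include <unistd.h>

/* Hypothetical flag, set so the effect of the constructor can be observed. */
int early_init_ran = 0;

/* Runs when this object is initialized, before main().  In a shared object
   linked with -z initfirst, the loader would run it ahead of the
   constructors of the other objects loaded at the same time.  */
__attribute__((constructor))
static void early_init(void)
{
  static const char msg[] = "early_init: running\n";

  /* Assumption: the write() wrapper does not rely on libc having been
     initialized.  stdio is avoided because it may not be usable yet.  */
  (void) write(STDERR_FILENO, msg, sizeof msg - 1);
  early_init_ran = 1;
}
```

Such a file could be built with something like `gcc -shared -fPIC -Wl,-z,initfirst early.c -o libearly.so` and injected with `LD_PRELOAD=./libearly.so`.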

So, I wrote a small patch which changes the single dl_initfirst variable
into a linked list. This patch does not change the size of any data
structures (it's ABI compatible), just turns dl_initfirst into a list. The
list is not freed (the allocator wouldn't free it anyway), and insertion
into the list is quadratic, but I expect there will never be more than
a handful of initfirst libraries!

This patch records initfirst libraries in load order, so LD_PRELOAD
libraries will have their constructors called before libpthread. If the
opposite behaviour is desired, the LD_PRELOAD'd library can always declare
a dependency on libpthread. Normally LD_PRELOAD constructors are run last,
which is very inconvenient when trying to inject new functionality, and
I expect anyone using -z initfirst with LD_PRELOAD to really want to run
first. The patch is written against latest glibc 2.23 (I also tested on
glibc 2.21, and it's not quite compatible with 2.17 since the other data
structures changed).
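
The list handling described above can be sketched outside the loader as follows. This is a standalone illustration, not the patch itself: `struct object` is a stand-in for glibc's `struct link_map`, and the function names are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct link_map; only a name is kept here.  */
struct object
{
  const char *name;
};

struct initfirst_node
{
  struct object *which;
  struct initfirst_node *next;
};

/* Append OBJ at the tail so the list preserves load order.  Each append
   walks the whole list, hence the quadratic total cost.
   Returns 0 on success, -1 if allocation fails.  */
int initfirst_append(struct initfirst_node **head, struct object *obj)
{
  struct initfirst_node *node = malloc(sizeof *node);
  if (node == NULL)
    return -1;
  node->which = obj;
  node->next = NULL;

  while (*head != NULL)
    head = &(*head)->next;
  *head = node;
  return 0;
}

/* Store the names of the listed objects, in the order their constructors
   would be called, into NAMES (at most MAX entries).  Returns the count.  */
size_t initfirst_order(const struct initfirst_node *head,
                       const char **names, size_t max)
{
  size_t n = 0;
  for (; head != NULL && n < max; head = head->next)
    names[n++] = head->which->name;
  return n;
}
```

Appending a preloaded object first and libpthread second yields the preloaded object at the head of the list, which is the ordering the paragraph above describes.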

I was not the first person to run into this problem; someone wanted
the same thing on Stack Overflow two years ago. You can see my answer
there with a complete test case.


http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861

Hope you will accept this patch. Comments welcome. Thanks,



Hi,

I think that many debugging/profiling tools would want this feature (e.g.
for AddressSanitizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9).
But here are a few questions you may want to consider:
* is your solution compatible with DF_1_INITFIRST behavior on Solaris?


From reading the Solaris ld.so.1(1) man page, they say that initfirst
"marks the object so that its runtime initialization occurs before the
runtime initialization of any other objects brought into the process at
the same time". Here LD_PRELOAD objects are arguably brought into the
process earlier, as described by their ld(1) man page. So we ought to
ensure that LD_PRELOAD initfirst libraries are initialized before all
other LD_PRELOAD libraries, and also that normal initfirst libraries
are initialized before all other normal libraries. Actually we will run
LD_PRELOAD initfirst, then normal initfirst, then normal constructors,
then normal LD_PRELOAD constructors (running these last is the default
behaviour without initfirst). So although it's a bit complicated, I think
the behaviour of this patch is compatible with Solaris.


Agreed.

* should DF_1_INITFIRST also influence destruction order?


Solaris does: "object runtime finalization will occur after the runtime
finalization of any other objects removed from the process at the same
time". glibc's previous initfirst did not do this (I guess pthread didn't
need a destructor). In general I think this is much less important; it's
usually only used for proper cleanup. The destructor code in dl-fini.c
also looks more complicated to adapt, but I can try if this is deemed
important. It seems like an orthogonal issue to me.


* what if an initfirst library has some dependencies, e.g. it needs malloc
from Glibc or dlsym from libdl.so during construction (that's e.g.
AddressSanitizer's case)? The current logic of initfirst is rather
primitive as it does not track such dependencies at all.


Unfortunately, libpthread depends on libc -- yet it uses initfirst to get
initialized before libc.


Yeah. It seems that initfirst is a crude hack which bypasses all
dependency tracking. I wonder if there's a place for another, hopefully
saner, dependency-respecting flag.

In a way, we cannot satisfy the constraints for initfirst (to paraphrase
Solaris, an initfirst library is initialized before the initialization of
other libraries present at load-time) and also allow the initfirst library
to have dependencies like this. It's a contradiction and it just makes the
loading process less deterministic. The developers just have to make sure
that the constructors do not call any functions from libraries that
haven't been initialized yet (or call functions that don't care about
initialization).

In my own system, I needed libc functionality. What I did was write
a minimal library which had -z initfirst, and reimplement malloc,
read, write, and whatever else I needed.


That's a possible approach, but requiring all tool developers to do the
same seems like overkill, as they'll typically need to reimplement a good
part of IO, getenv(), an ELF symtab parser and a (primitive) memory
allocator.

There seems to be no way around that given the current primitive
DF_1_INITFIRST semantics, so I wonder if a better approach would be to
throw in a completely different dynamic flag for more precise control over
library initialization order.

This library would pass off its data structures to another shared library,
which really was depending on libc and got initialized later. The user
would write LD_PRELOAD=libstage1.so:libstage2.so. My code had the
requirement that it had a constructor called very early, and another
constructor called late, however. In the simpler case where the
debugging/profiling tool developer needs to run some code early, then some
code later which depends on libc (but doesn't need constructing), it can
be done from within a single library, as libpthread currently does.

(I didn't try this, but maybe it could be arranged that calling malloc()
before libc is initialized uses the loader's own watermark allocator? The
loader itself has a similar dilemma, of course, and it uses its own malloc
until libc's becomes available...)
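
To make the idea concrete, here is a minimal sketch of a watermark (bump) allocator of the sort an initfirst library might carry for its own use. It is not the loader's allocator; the names, the 64 KiB arena and the 16-byte alignment are all assumptions chosen for the example, and it is neither thread-safe nor able to free memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena size; a real tool would size this for its needs.  */
#define EARLY_HEAP_SIZE (64 * 1024)

static _Alignas(16) unsigned char early_heap[EARLY_HEAP_SIZE];
static size_t early_used = 0;   /* the "watermark": bytes handed out so far */

/* Hand out SIZE bytes from the static arena, rounded up to a multiple of
   16 so that every returned pointer stays 16-byte aligned.  Returns NULL
   when the arena is exhausted.  Memory is never returned to the arena.  */
void *early_alloc(size_t size)
{
  const size_t align = 16;

  /* Reject oversized requests first; this also keeps the rounding below
     from overflowing.  */
  if (size > EARLY_HEAP_SIZE)
    return NULL;

  size_t rounded = (size + align - 1) & ~(align - 1);
  if (rounded > EARLY_HEAP_SIZE - early_used)
    return NULL;

  void *p = early_heap + early_used;
  early_used += rounded;
  return p;
}
```

Because it touches only static storage, a routine like this does not depend on any other library having been initialized.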

libdl is kind of a special case because it is so closely tied to the
loader. In my system, I ended up parsing the ELF headers from loaded
libraries to look up symbols. It's fairly simple to reproduce what the
loader is doing and walk its data structure to find load addresses. I
think, again, the best way to handle an initfirst library's dependency
on libdl would be to expose the loader's symbol map so that the library
could call loader functions if it really wanted to. A lot of libdl's
functionality (like dlopen'ing new libraries) just gets confusing at
initfirst time.
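
As an illustration of the symbol lookup mentioned above, the core of it can be sketched as a linear scan over an ELF symbol table and its string table. This is not the author's code: it assumes a 64-bit ELF object, assumes the caller has already located the tables (for example via DT_SYMTAB and DT_STRTAB) and knows the number of symbols, and the function names are hypothetical. The real loader uses the hash tables instead of a linear scan.

```c
#include <elf.h>
#include <stddef.h>

/* Local string comparison, so the lookup does not call into libc.  */
static int same_name(const char *a, const char *b)
{
  while (*a != '\0' && *a == *b)
    {
      a++;
      b++;
    }
  return *a == *b;
}

/* Scan COUNT entries of SYMTAB for a defined symbol called NAME, using
   STRTAB to resolve names.  Returns the symbol's st_value, or 0 if no
   defined symbol matches.  For a shared object the caller still has to
   add the object's load address to the result.  */
Elf64_Addr lookup_symbol(const Elf64_Sym *symtab, size_t count,
                         const char *strtab, const char *name)
{
  for (size_t i = 0; i < count; i++)
    {
      const Elf64_Sym *sym = &symtab[i];

      /* Undefined entries are references, not definitions.  */
      if (sym->st_shndx == SHN_UNDEF)
        continue;
      if (same_name(strtab + sym->st_name, name))
        return sym->st_value;
    }
  return 0;
}
```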

If we really wanted to honour these dependencies, we certainly could. I'm
just not sure it's what tool developers want.


I'm pretty sure myself that people would generally prefer to avoid
reimplementing parts of Glibc (the symbol resolver in particular). Let's
see if Kostya has something to say.



Hm? To say about what? :)
We almost never use asan as a DSO on Linux, so we don't get any problems
like this.

Just to clarify: you mean you don't need the ASan DSO's dependencies (libc, libpthread, librt, etc.) to be initialized before ASan initialization (i.e. __asan_init) runs? In that case the OP's patch would work for the ASan DSO.



-dwk.

-Y


~ dwk.



----[ cut here ]----
Support -z initfirst for multiple shared libraries (run in load order).

This is particularly useful when combined with LD_PRELOAD, as it is then
possible to run constructors before any code in other libraries runs.
---
    elf/dl-init.c              |  9 ++++++++-
    elf/dl-load.c              | 19 ++++++++++++++++++-
    elf/dl-support.c           |  4 ++--
    sysdeps/generic/ldsodefs.h |  7 +++++--
    4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/elf/dl-init.c b/elf/dl-init.c
index 818c3aa..da59d1f 100644
--- a/elf/dl-init.c
+++ b/elf/dl-init.c
@@ -84,7 +84,14 @@ _dl_init (struct link_map *main_map, int argc, char **argv, char **env)

      if (__glibc_unlikely (GL(dl_initfirst) != NULL))
        {
-      call_init (GL(dl_initfirst), argc, argv, env);
+      struct initfirst_list *initfirst;
+      for(initfirst = GL(dl_initfirst); initfirst; initfirst = initfirst->next)
+        {
+          call_init (initfirst->which, argc, argv, env);
+        }
+
+      /* We do not try to free this list, as the memory will not be reclaimed
+         by the allocator unless there were no intervening malloc()'s.  */
          GL(dl_initfirst) = NULL;
        }

diff --git a/elf/dl-load.c b/elf/dl-load.c
index c0d6249..1efabbf 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1388,7 +1388,24 @@ cannot enable executable stack as shared object requires");

      /* Remember whether this object must be initialized first.  */
      if (l->l_flags_1 & DF_1_INITFIRST)
-    GL(dl_initfirst) = l;
+    {
+      struct initfirst_list *new_node = malloc(sizeof(*new_node));
+      struct initfirst_list *it = GL(dl_initfirst);
+      new_node->which = l;
+      new_node->next = NULL;
+
+      /* We append to the end of the linked list. Whichever library was loaded
+         first has higher initfirst priority. This means that LD_PRELOAD
+         initfirst overrides initfirst in libraries linked normally.  */
+      if (!it)
+        GL(dl_initfirst) = new_node;
+      else
+        {
+          while (it->next)
+            it = it->next;
+          it->next = new_node;
+        }
+    }

      /* Finally the file information.  */
      l->l_file_id = id;
diff --git a/elf/dl-support.c b/elf/dl-support.c
index c30194c..d8b8acc 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -147,8 +147,8 @@ struct r_search_path_elem *_dl_all_dirs;
    /* All directories after startup.  */
    struct r_search_path_elem *_dl_init_all_dirs;

-/* The object to be initialized first.  */
-struct link_map *_dl_initfirst;
+/* The list of objects to be initialized first.  */
+struct initfirst_list *_dl_initfirst;

    /* Descriptor to write debug messages to.  */
    int _dl_debug_fd = STDERR_FILENO;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ddec0be..198c089 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -326,8 +326,11 @@ struct rtld_global
      /* Incremented whenever something may have been added to dl_loaded.  */
      EXTERN unsigned long long _dl_load_adds;

-  /* The object to be initialized first.  */
-  EXTERN struct link_map *_dl_initfirst;
+  /* The list of objects to be initialized first.  */
+  EXTERN struct initfirst_list {
+    struct link_map *which;
+    struct initfirst_list *next;
+  } *_dl_initfirst;

    #if HP_SMALL_TIMING_AVAIL
      /* Start time on CPU clock.  */

