On 04/27/2016 04:57 PM, d wk wrote:
Good points!
On Wed, Apr 27, 2016 at 3:03 AM, Yury Gribov <y.gribov@samsung.com>
wrote:
On 04/26/2016 11:58 PM, d wk wrote:
Hello libc developers,
In a project of mine, I needed to run some code before any constructors
from any system libraries (such as libc or libpthread). The linker/loader
-z initfirst feature is perfect for this, but it only supports one shared
library. Unfortunately libpthread also uses this feature (I assume the
feature exists because pthread needed it), so my project was incompatible
with libpthread.
So, I wrote a small patch which changes the single dl_initfirst variable
into a linked list. This patch does not change the size of any data
structures (it's ABI compatible), just turns dl_initfirst into a list. The
list is not freed (the allocator wouldn't free it anyway), and insertion
into the list is quadratic, but I expect there will never be more than
a handful of initfirst libraries!
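
To make the list idea concrete, here is a minimal sketch of appending
initfirst objects in load order. It is not the patch itself: struct object,
struct initfirst_node, initfirst_list and initfirst_add are hypothetical
names, and the real loader uses its own types and allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the loader's per-object record. */
struct object {
    const char *name;
};

/* One node per initfirst object. */
struct initfirst_node {
    struct object *obj;
    struct initfirst_node *next;
};

/* Head of the list; takes the place of a single "initfirst object" pointer. */
struct initfirst_node *initfirst_list;

/* Append obj at the tail so that traversal order equals load order.
   Each call walks the whole list, so n insertions cost O(n^2) in total.
   Nodes are never freed. Returns 0 on success, -1 if allocation fails. */
int initfirst_add(struct object *obj)
{
    assert(obj != NULL);

    struct initfirst_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->obj = obj;
    node->next = NULL;

    struct initfirst_node **tail = &initfirst_list;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = node;
    return 0;
}
```

Running the constructors first then amounts to walking initfirst_list from
the head before any other object is initialized.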
This patch records initfirst libraries in load order, so LD_PRELOAD
libraries will have their constructors called before libpthread. If the
opposite behaviour is desired, the LD_PRELOAD'd library can always declare
a dependency on libpthread. Normally LD_PRELOAD constructors are run last,
which is very inconvenient when trying to inject new functionality, and
I expect anyone using -z initfirst with LD_PRELOAD to really want to run
first. The patch is written against the latest glibc, 2.23 (I also tested
on glibc 2.21; it is not quite compatible with 2.17 since the other data
structures changed).
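
For readers who have not used the flag, a minimal sketch of a preload
library with an early constructor might look like the following. The file
name early.c, the library name libearly.so and the names early_init and
early_init_ran are made up for illustration, and the sketch assumes that
write(2) is safe to call before libc's own constructors have run.

```c
/* Build and run (names are illustrative):
 *   gcc -shared -fPIC -Wl,-z,initfirst early.c -o libearly.so
 *   LD_PRELOAD=./libearly.so ./some_program
 */
#include <unistd.h>

/* Set by the constructor so that later code can tell that it ran. */
int early_init_ran;

/* Called when the dynamic loader initializes this object. It uses write(2)
   instead of stdio because stdio may depend on libc initialization that has
   not happened yet (an assumption of this sketch). */
__attribute__((constructor))
void early_init(void)
{
    static const char msg[] = "early_init: running\n";
    early_init_ran = 1;
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n; /* nothing useful can be done about a failed write here */
}
```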
I was not the first person to run into this problem: someone wanted the
same thing on Stack Overflow two years ago. You can see my answer there
with a complete test case.
http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861
Hope you will accept this patch. Comments welcome. Thanks,
Hi,
I think that many debugging/profiling tools would want this feature (e.g.
for AddressSanitizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9).
But here are a few questions you may want to consider:
* is your solution compatible with DF_1_INITFIRST behavior on Solaris?
From reading the Solaris ld.so.1(1) man page, they say that initfirst
"marks the object so that its runtime initialization occurs before the
runtime initialization of any other objects brought into the process at
the same time". Here LD_PRELOAD objects are arguably brought into the
process earlier, as described by their ld(1) man page. So we ought to
ensure that LD_PRELOAD initfirst libraries are initialized before all
other LD_PRELOAD libraries, and also that normal initfirst libraries
are initialized before all other normal libraries. Actually we will run
LD_PRELOAD initfirst, then normal initfirst, then normal constructors,
then normal LD_PRELOAD constructors (running these last is the default
behaviour without initfirst). So although it's a bit complicated, I think
the behaviour of this patch is compatible with Solaris.
Agreed.
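
The four-step order described above can be written down as a small ranking
function. This is only a model of the described behaviour, not loader code:
struct lib, init_rank and init_cmp are hypothetical, and the load_index
tiebreak merely keeps the sort deterministic (a real loader orders objects
within a group by their dependencies).

```c
#include <stdlib.h>

struct lib {
    const char *name;
    int preloaded;   /* named in LD_PRELOAD */
    int initfirst;   /* linked with -z initfirst */
    int load_index;  /* position in load order, used only as a tiebreak */
};

/* 0: LD_PRELOAD initfirst, 1: normal initfirst,
   2: normal constructors,  3: LD_PRELOAD constructors. */
int init_rank(const struct lib *l)
{
    if (l->initfirst)
        return l->preloaded ? 0 : 1;
    return l->preloaded ? 3 : 2;
}

/* qsort comparator: lower rank is initialized earlier. */
int init_cmp(const void *a, const void *b)
{
    const struct lib *x = a;
    const struct lib *y = b;
    int rx = init_rank(x);
    int ry = init_rank(y);

    if (rx != ry)
        return rx - ry;
    return x->load_index - y->load_index;
}
```

Sorting an array of struct lib with qsort(libs, n, sizeof libs[0], init_cmp)
then yields the initialization order sketched in the reply.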
* should DF_1_INITFIRST also influence destruction order?
Solaris does: "object runtime finalization will occur after the runtime
finalization of any other objects removed from the process at the same
time". glibc's previous initfirst did not do this (I guess pthread didn't
need a destructor). In general I think this is much less important; it's
usually only used for proper cleanup. The destructor code in dl-fini.c
also looks more complicated to adapt, but I can try if this is deemed
important. It seems like an orthogonal issue to me.
* what if an initfirst library has some dependencies, e.g. it needs malloc
from Glibc or dlsym from libdl.so during construction (that's e.g.
AddressSanitizer's case)? The current logic of initfirst is rather
primitive as it does not track such dependencies at all.
Unfortunately, libpthread depends on libc -- yet it uses initfirst to get
initialized before libc.
Yeah. It seems that initfirst is a crude hack which bypasses all
dependency tracking. I wonder if there's a place for another, hopefully
saner, dependency-respecting flag.
In a way, we cannot satisfy the constraints for
initfirst (to paraphrase Solaris, an initfirst library is initialized
before the initialization of other libraries present at load-time) and
also allow the initfirst library to have dependencies like this. It's a
contradiction and it just makes the loading process less deterministic. The
developers just have to make sure that the constructors do not call
any functions from libraries that haven't been initialized yet (or call
functions that don't care about initialization).
In my own system, I needed libc functionality. What I did was write
a minimal library which had -z initfirst, and reimplement malloc,
read, write, and whatever else I needed.
That's a possible approach, but requiring all tool developers to do the same
seems like overkill, as they'll typically need to reimplement a good part
of IO, getenv(), an ELF symtab parser and a (primitive) memory allocator.
There seems to be no way around that given the current primitive
DF_1_INITFIRST semantics, so I wonder if a better approach would be to
throw in a completely different dynamic flag for more precise control over
library initialization order.
This library would pass
off its data structures to another shared library, which really was
depending on libc and got initialized later. The user would write
LD_PRELOAD=libstage1.so:libstage2.so. My code, however, required one
constructor to be called very early and another to be called late.
In the simpler case where the debugging/profiling tool developer
needs to run some code early, then some code later which depends on libc
(but doesn't need constructing), it can be done from within a single
library, as libpthread currently does.
(I didn't try this, but maybe it could be arranged that calling malloc()
before libc is initialized uses the loader's own watermark allocator? The
loader itself has a similar dilemma, of course, and it uses its own malloc
until libc's becomes available...)
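
As an illustration of what a watermark (bump) allocator for early use might
look like, here is a minimal sketch. It is not the loader's allocator: the
names early_malloc and early_free and the 64 KiB arena size are assumptions,
and the code is not thread-safe.

```c
#include <stddef.h>

#define ARENA_SIZE (64 * 1024)

/* Fixed arena living in this object's own .bss, so no libc support is needed. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];

/* Offset of the first unused byte in the arena. */
static size_t watermark;

/* Hand out memory by advancing the watermark. Sizes are rounded up so that
   every returned pointer is suitably aligned for any object type. A request
   of zero bytes returns the current watermark without advancing it.
   Returns NULL when the arena is exhausted. */
void *early_malloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);

    if (size > ARENA_SIZE)  /* also rules out overflow in the rounding below */
        return NULL;

    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > ARENA_SIZE - watermark)
        return NULL;

    void *p = arena + watermark;
    watermark += rounded;
    return p;
}

/* Memory is never returned to the arena. */
void early_free(void *p)
{
    (void)p;
}
```

A later, libc-dependent stage could switch to the real malloc once it is
known to be initialized, leaving blocks from the arena in place.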
libdl is kind of a special case because it is so closely tied to the
loader. In my system, I ended up parsing the ELF headers from loaded
libraries to look up symbols. It's fairly simple to reproduce what the
loader is doing and walk its data structure to find load addresses. I
think, again, the best way to handle an initfirst library's dependency
on libdl would be to expose the loader's symbol map so that the library
could call loader functions if it really wanted to. A lot of libdl's
functionality (like dlopen'ing new libraries) just gets confusing at
initfirst time.
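
For comparison, once libc is usable the load addresses of all loaded objects
can be listed with glibc's dl_iterate_phdr(). The sketch below is not the
hand-written walk described above, and since dl_iterate_phdr() and printf()
live in libc it is an assumption that calling them is acceptable at the
point where this runs; print_object and count_loaded_objects are
illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object. data points to an int counter. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    assert(count != NULL);

    /* The main program is reported with an empty name. */
    const char *name = info->dlpi_name[0] ? info->dlpi_name : "(main program)";

    printf("%s: base %#lx, %d program headers\n",
           name, (unsigned long)info->dlpi_addr, (int)info->dlpi_phnum);
    ++*count;
    return 0; /* returning non-zero would stop the iteration */
}

/* Print every loaded object and return how many there are. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

From each object's program headers one can locate PT_DYNAMIC and, from
there, the symbol and string tables needed for a manual symbol lookup.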
If we really wanted to honour these dependencies, we certainly could. I'm
just not sure it's what tool developers want.
I'm pretty sure myself that people would generally prefer to avoid
reimplementing parts of Glibc (the symbol resolver in particular). Let's see
if Kostya has something to say.