Re: RFC: Audit external function called indirectly via GOT



> To be specific we are talking about the Solaris LD_AUDIT support that is
> implemented in the GNU dynamic loader ld.so. This has been a very useful
> thing for developers to have, particularly those working on schemes that
> alter lookup paths or binding rules. Also those that use these hooks to
> do other useful auditing. There were a lot of Solaris LD_AUDIT users, and
> now there are a lot of users that use this same feature in the GNU tools.

The description of la_symbind*() says this:

       "The return value of la_symbind32() and la_symbind64() is the address
       to which control should be passed after the function returns.  If the
       auditing library is simply monitoring symbol bindings, then it should
       return sym->st_value.  A different value may be returned if the
       library wishes to direct control to an alternate location."

That implies that it is called only for symbols that are
dynamically-bound (i.e., lazy binding). Does this mean that you want
to cancel the immediate binding effects of -fno-plt?
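
For reference, here is a minimal sketch of a monitoring-only audit library of the kind the quoted description has in mind, assuming a 64-bit glibc system and the interface documented in rtld-audit(7). The la_objopen() hook is included because ld.so only calls la_symbind64() for objects that requested binding auditing there. The file and library names in the usage note are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for Lmid_t and the la_* prototypes */
#endif
#include <inttypes.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Handshake with ld.so: report the audit interface version we support. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Ask for symbol bindings both to and from every loaded object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)map; (void)lmid; (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Monitoring only: log the binding and return sym->st_value unchanged.
 * Returning a different address would redirect the call. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                       uintptr_t *refcook, uintptr_t *defcook,
                       unsigned int *flags, const char *symname)
{
    (void)ndx; (void)refcook; (void)defcook; (void)flags;
    fprintf(stderr, "bind: %s -> %#" PRIxPTR "\n",
            symname, (uintptr_t)sym->st_value);
    return (uintptr_t)sym->st_value;
}
```

Built as a shared object (for example `cc -shared -fPIC -o libaudit.so audit.c`) and loaded with `LD_AUDIT=./libaudit.so ./prog`, this logs each binding that ld.so reports to the audit library.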

> The problem comes when you build with -fno-plt, or if you elide a PLT slot
> for any other reason, there is no longer a place for the LD_AUDIT
> infrastructure to hook into.
>
> In the case of x86 the -fno-plt generated code is a direct call through
> the GOT. The GOT is RO after relocation (relro), and so most tooling expects
> that it cannot be changed. Therefore it's not entirely kosher to reuse the
> GOT for this purpose, though you could do that, in fact on x86 the GLOB_DAT
> reloc and GOT entry look an awful lot like a function descriptor and a call
> through that function descriptor (for arches that have non-code PLTs).
>
> By keeping the generation of the PLT slot, but not using it, you can go back
> and re-use that PLT entry for auditing. If you are RELRO then you are going
> to pay a performance cost for turning on auditing, you will be forced to
> go through the PLT call sequence every time, enter the loader, find your
> already computed resolution in the loader's cache, and continue. If you are
> non-RELRO you can finalize the binding in the PLT.

I'm not sure if you're saying that this is worse with -fno-plt than
without. Wouldn't you have the same performance cost either way, if
auditing is turned on?

> What does "statically relocated" mean?

If I'm reading HJ's proposal correctly, he's got: (1) a regular GOT
entry (with a GLOB_DAT relocation), (2) a "provisional" PLTGOT entry
(with a JUMP_SLOT relocation), and (3) a "provisional" PLT entry for
each external function, and all these extra dynamic table entries are
there so that:

(1) the dynamic loader can find the provisional PLTGOT entry for the
same function by matching the GLOB_DAT relocations with the JUMP_SLOT
relocations,
(2) it can use that to find the corresponding provisional PLT entry,
(3) it can relocate the GOT entry to point to that PLT entry, and
(4) the PLT entry will then use the PLTGOT entry for binding, as if
-fno-plt had not been used.

My suggestion was that the GOT entry could be statically initialized
by the linker to point to the provisional PLT entry, rather than
forcing the dynamic loader to go through all this messy computation.
If auditing is not enabled, it would process the GLOB_DAT relocation
normally, and set the GOT entry to point to the actual function,
bypassing the provisional PLT and PLTGOT entries completely. If
auditing is enabled, it could simply ignore the GLOB_DAT relocation
(or, if the binary is PIE, it could process it as a RELATIVE
relocation), and the -fno-plt calls will end up jumping to the
provisional PLT entry.
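
The decision described above can be modelled roughly as follows. This is a toy sketch, not loader code: struct got_slot, apply_glob_dat() and their fields are hypothetical names, and the GOT slot is assumed to hold the link-time address of the provisional PLT entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one GOT slot used by a -fno-plt call. */
struct got_slot {
    uintptr_t value;     /* statically initialized by the linker to the
                            link-time address of the provisional PLT entry */
    uintptr_t resolved;  /* address symbol lookup would yield for GLOB_DAT */
};

/* Model of how the loader would treat the slot's GLOB_DAT relocation. */
uintptr_t apply_glob_dat(struct got_slot *slot, bool auditing,
                         bool pie, uintptr_t load_bias)
{
    if (!auditing) {
        /* Normal processing: bind directly to the real function,
         * bypassing the provisional PLT and PLTGOT entries. */
        slot->value = slot->resolved;
    } else if (pie) {
        /* Auditing in a PIE: treat the relocation as RELATIVE so the
         * slot still points at the (now relocated) provisional PLT entry. */
        slot->value += load_bias;
    }
    /* Auditing, non-PIE: ignore the relocation and keep the static value. */
    return slot->value;
}
```

In all three cases the loader does no matching between GLOB_DAT and JUMP_SLOT relocations; the only per-slot work is the one it already does today, or less.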

(This is already how we handle the PLTGOT entries: the linker
statically initializes the entries to point to part (b)* of the PLT
entry, while putting JUMP_SLOT relocations for those entries into the
JMPREL table.)

I think if you do that, none of these extra dynamic table entries will
be needed, except for the IGNORE_JMPREL flag that indicates there are
no JMPREL slots other than those for the provisional PLT entries. How
useful is that flag? If the final program has even one external call
that was *not* compiled with -fno-plt, you won't be able to set it.
Would it be better to partition the JMPREL and PLT tables into
"regular" and "provisional" entries? That would take just a single new
DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
JMPREL entries for the provisional PLT entries begin; it can ignore
everything past that point when auditing is turned off.
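
A rough model of the partitioned table, to show how little the loader would need to do. The struct and function names are hypothetical, and the boundary is modelled as an index for simplicity; a real DT_PROVISIONAL_JMPREL entry would more likely hold an address or offset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: regular JUMP_SLOT relocations come first,
 * followed by those for the provisional PLT entries. */
struct jmprel_view {
    size_t count;              /* total number of JMPREL entries */
    size_t provisional_start;  /* boundary from the proposed
                                  DT_PROVISIONAL_JMPREL entry */
};

/* Number of leading JMPREL entries the loader has to process. */
size_t jmprel_entries_to_process(const struct jmprel_view *v, bool auditing)
{
    /* With auditing off, everything past the boundary is ignored. */
    return auditing ? v->count : v->provisional_start;
}

/* The all-or-nothing IGNORE_JMPREL flag is only usable when there are
 * no regular entries at all. */
bool can_set_ignore_jmprel(const struct jmprel_view *v)
{
    return v->provisional_start == 0;
}
```

With, say, 10 entries of which the last 7 are provisional, the flag cannot be set, but the boundary still lets the loader skip those 7 when auditing is off.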

I suppose you may also want to partition the GLOB_DAT relocations, so
that the dynamic loader can easily figure out which ones to ignore
when auditing is enabled. That would take another dynamic table entry.

Now, why do we need both the regular GOT entry and the provisional
PLTGOT entry? If the program is linked with -z relro and lazy binding,
you can put the GOT entries in the RELRO segment, and the PLTGOT
entries in writable data. That gives you the security when auditing is
turned off, and the ability to dynamically patch the PLTGOT when it's
turned on. In any other case, however, I see no reason to have both.
If you get rid of the GOT entry, and have the point of call jump
indirectly through the PLTGOT entry, which is initialized to point to
part (b) of the PLT entry, everything should work the same as without
-fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
PLT entry.
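
As a toy illustration of that last point, here is a C model in which a function pointer stands in for the PLTGOT entry. Part (a) of a PLT entry is the indirect jump via the PLTGOT entry, and part (b) is the code that enters the lazy binding routine. All names are hypothetical, and the "resolver" here simply patches the slot with a known function rather than doing a symbol lookup.

```c
typedef int (*func_t)(int);

/* Stands in for the external function being called. */
static int real_target(int x) { return x + 1; }

static int resolve_calls = 0;   /* how often the lazy path was taken */

static int lazy_stub(int x);    /* models part (b) of the PLT entry */

/* Models the PLTGOT entry: statically initialized to point at part (b). */
static func_t pltgot_slot = lazy_stub;

/* Part (b): bind the slot on first use, then continue to the target. */
static int lazy_stub(int x)
{
    resolve_calls++;
    pltgot_slot = real_target;  /* what the lazy binding routine would do */
    return pltgot_slot(x);
}

/* A -fno-plt style call site: part (a), the indirect jump through the
 * PLTGOT entry, is inlined at the point of call. */
int call_external(int x)
{
    return pltgot_slot(x);
}
```

The first call goes through the stub and patches the slot; later calls reach the target directly, which is the same behaviour a regular PLT call would have.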

-cary

* I'm using parts (a) and (b) to refer to the two parts of a PLT
entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
jumps to the lazy binding routine, passing the JUMP_SLOT index.