
Re: GNU dlopen(3) differs from POSIX/IEEE





On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
foo at runtime to be compliant with POSIX.

I disagree. Nothing in POSIX says that needs to be done. The
key failure in your reasoning is that you have assumed lazy
symbol resolution must happen at the point of the first function
call.

ld(1) on a GNU/Linux machine says:
---
-z lazy

When generating an executable or shared library, mark it to tell the dynamic linker to defer function call resolution to the point when the function is called (lazy binding)
---

This made me think that the GNU implementation matches other implementations -- that is, lazy resolution happens at the time of the first call.

You have read "shall be made available for relocation" and
then used implementation knowledge to decide that _today_ those
relocations have a happens-after relationship with dlopen in your
program. But because lazy symbol resolution is not an observable
event for a well-defined program,

Yes, I very much agree. But making some massive legacy enterprise application "well-defined" now is beyond what toolchain writers can do.

The very use of --unresolved-symbols=ignore-all for an executable link is bad in a way.

and no guarantees are made,
you can't make a happens-after relationship, and can't expect
'foo' to resolve to the loaded 'foo' that came into the global
scope with dlopen.

Perhaps in the future you want a mode where all lazy symbol
resolution is done before the first dlopen runs. Say we want to
do this to relocate the whole PLT and mark it read-only for
safety hardening.

This is going to be a "mode", much like BIND_NOW, but not the default. Even if it were made the default, a non-default mode (lazy, writable PLTs) would still exist.

If you were to _require_ lazy resolution to happen at the point
of the function call, which is what you're assuming here, then
it would prevent the above implementation from being conforming.

The two are mutually exclusive. In my opinion, a program wants either immediate binding or lazy binding, not an arbitrary mix of both.

However, because POSIX says nothing about when the lazy symbol
resolution happens, or anything at all about it,

It indeed says something:
---
RTLD_LAZY

Relocations shall be performed at an implementation-defined time, ranging from the time of the dlopen() call until the first reference to a given symbol occurs
---

And then, based on the ld(1) manpage, I thought the GNU/Linux implementation uses the time of the first call.

What is the harm if we go by the existing documentation and, under the option -z lazy or RTLD_LAZY, make lazy resolution happen at the point of the function call?

(BTW, the above is already in place and working as expected.)

And eventually change the semantics of RTLD_GLOBAL to match the description in the POSIX spec -- "...relocation processing of any other executable object file."

--
Supra