This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Evolution of ELF symbol management
On 11/22/2016 10:09 AM, Florian Weimer wrote:
> On 11/19/2016 06:25 PM, Zack Weinberg wrote:
>> On 11/18/2016 10:48 AM, Florian Weimer wrote:
>>> If we don't declare the library-safe names in headers, how can
>>> libraries call them?
>>
>> We could _declare_ the library-safe names in the headers, just not
>> as the primaries. Like how string.h currently declares both bzero
>> and __bzero.
>
> I have no idea at all why __bzero was introduced.
__bzero was the first case that came to mind where existing headers
declare both a name and a __name for the same function, is all. I've
been looking at string.h a lot lately.
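
For illustration, here is a minimal sketch of one function reachable under
both a user-namespace name and an implementation-namespace name. It assumes
GCC or Clang on an ELF target. The names demo_zero and __demo_zero are
hypothetical and are not glibc symbols, and this is not necessarily how
string.h arranges bzero and __bzero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Implementation-namespace name (hypothetical, not a glibc symbol). */
void __demo_zero(void *s, size_t n)
{
    memset(s, 0, n);
}

/* User-namespace name: a weak alias, so both names share one definition. */
void demo_zero(void *s, size_t n)
    __attribute__((weak, alias("__demo_zero")));

/* Either spelling reaches the same code. */
void demo_zero_check(void)
{
    char a[4] = { 1, 2, 3, 4 };
    char b[4] = { 1, 2, 3, 4 };

    demo_zero(a, sizeof a);
    __demo_zero(b, sizeof b);
    assert(memcmp(a, b, sizeof a) == 0 && a[0] == 0);
}
```

A library that wants to stay out of the user namespace would call
__demo_zero; application code would normally call demo_zero.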
>>> I also do not want to encourage application or library code to
>>> reference the implementation namespace at the source code level.
>>> It's ugly, and I suspect it encourages implementation namespace
>>> pollution once programmers are used to it.
>>
>> I don't like it either, but how else could a library's headers opt
>> into these special names on a per-symbol, per-use basis?
>
> In general, headers should avoid using libc types, particularly
> off_t, time_t, struct timeval, struct stat, and so on. But there
> might be exceptions.
... I'm not sure why we're suddenly discussing typedef names?
> For C, it's not clear at all to me whether we need any kind of
> opt-in besides compiling with a particular _*_SOURCE variant (which
> introduces the definition).
I am having trouble articulating why I don't like this. I think it
might be mostly to do with the way new symbols are now different than
old symbols. I'd be okay with an across-the-board change to symbol
resolution (as discussed below) that made interposition not work by
default, but I'm not okay with the idea of its being supported only for
symbols added before some arbitrary release.
Third-party C libraries don't tend to put nearly as much code in inline
functions, so the need for explicitly usable __names is lessened there,
I think.
> For C++, we might use something based on namespaces to get a clear
> separation. However, the problem there is that type names and
> struct tags end up in C++ mangled identifiers and thus impact
> application ABI. I have no good idea what to do there.
Arguably that's a Good Thing -- a change to what off_t means is an ABI
break whether or not it shows up in symbol names, and _making_ that one
in particular show up in symbol names might solve some of the problems
that lead to _FILE_OFFSET_BITS=64 still not being default for the older
32-bit architectures.
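
To make the off_t point concrete, here is a small sketch of how the width
of off_t is chosen per translation unit. On the older 32-bit glibc targets,
off_t is 32 bits wide unless _FILE_OFFSET_BITS is defined to 64 before any
system header is included. On 64-bit targets the macro changes nothing.
The function names are hypothetical.

```c
/* Must come before the first system header is included. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Width in bits of off_t as this translation unit sees it. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Fails if the request for a 64-bit off_t was not honoured. */
void require_large_off_t(void)
{
    assert(off_t_bits() == 64);
}
```

Two objects built with different settings disagree about the layout of any
interface that mentions off_t, and nothing in a C symbol name records which
setting was used.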
>> Come to think of it, to actually avoid polluting the user
>> namespace, any library that wants to use these will need a
>> secondary set of libc headers that declare _only_ the private
>> names. (This is especially relevant for C++ with so much code in
>> headers.) If we don't do that, the user-namespace libc prototype
>> (which still exists under your plan) might conflict with an
>> unrelated application definition.
>
> Yes, that's what using a namespace for C++ would achieve.
>
> I'm less convinced we should do this for C. There is precedent,
> Microsoft did exactly this for the POSIX-inspired interfaces in
> their libc (functions like _open, _close and so on).
That definitely does cause grief for portable code...
>> I was imagining a new annotation on _all_ undefined symbols in a
>> shared object, giving the soname of the object that they were
>> satisfied by at link time. At load time, 'getrandom!libc.so.6'
>> resolves to the 'getrandom' definition in libc.so.6, ignoring all
>> other definitions of the same name. If there are symbol versions
>> involved, only the versions exported by libc.so.6 are considered.
>> For instance, 'getrandom!libc.so.6@GLIBC_2.25' cannot be satisfied
>> by 'getrandom@GLIBC_2.25' exported by libmissing-syscalls.so.1.
>
> We still need to support LD_PRELOAD and interposition of arbitrary
> symbols, and not just malloc-related ones, for the benefit of
> Address Sanitizer, fakeroot, cwrap, memstomp and other tools.
>
> This is why hard-coding the DSO name does not seem advisable.
This argument applies equally to every new symbol we might add, and in
fact to every _intra_-libc call that currently _can't_ be interposed.
So I'm inclined to discount it.
The solution I'm leaning toward involves each library designating a set
of exported symbols, calls to which _can_ be interposed; the default is
not to allow it. We'd probably have to spend some time figuring out
exactly which of libc's symbols should be interposable.
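
As a rough sketch of the difference between a call that can be interposed
and one that cannot, assuming GCC or Clang on an ELF target: the exported
name below is interposable when this code is built into a shared library,
while the hidden alias is bound when that library is linked. All names are
hypothetical. glibc's own hidden-alias machinery is similar in spirit, but
this is not a copy of it.

```c
#include <assert.h>

/* Hypothetical library function, exported and therefore interposable. */
__attribute__((visibility("default")))
int demo_answer(void)
{
    return 42;
}

/* Hidden alias for the same code. References to it are resolved when the
   library itself is linked, so an LD_PRELOAD definition of demo_answer
   cannot redirect them. */
extern __typeof__(demo_answer) demo_answer_internal
    __attribute__((alias("demo_answer"), visibility("hidden")));

/* An intra-library caller that bypasses interposition. */
int demo_twice(void)
{
    return 2 * demo_answer_internal();
}

/* With nothing interposed, both routes give the same answer. */
void demo_interpose_check(void)
{
    assert(demo_twice() == 2 * demo_answer());
}
```

Under the scheme sketched above, the hidden-alias behaviour would be the
default for every call, and the exported, interposable form would be the
exception that a library opts into per symbol.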
> When an application is linked against a shared object, if it
> interposes any symbols in it, the symbols become exported, so that
> interposition works at run time (otherwise, it could not happen).
> You can see an example here:
>
> $ nm -Dg malloc/tst-interpose-nothread | grep ' T '
>
> The application is *not* compiled with -Bdynamic or something like
> that, it happens automatically.
>
> But the symbol version from libc.so.6 is not attached to this symbol
> (“nm” would not show it, but you can check with eu-readelf, for
> example).
Well, OK, why don't we just fix that? Is there a good reason why it
_doesn't_ pick up the symbol version? (I know that for compatibility's
sake an unversioned symbol has to interpose all versions of the same
name, but that wouldn't seem to apply when the versioned symbol was
visible to the static linker.)
>> I thought the issue here was controlling *which library* provides
>> a symbol, independent of whether the symbol has versions.
>
> I'm not convinced this is desirable because of the exceptions I listed
> above. Both manual name mangling (which I currently prefer) and
> symbol versioning with the no-interpose flag (the one I sketched
> earlier) would support them. But only name mangling addresses
> collisions before the first static link.
I hope I've explained clearly enough already why I'm not a fan of manual
name mangling, especially on an ad-hoc or new-symbols-only basis.
>> All this aside, this discussion is still very brainstormy and that
>> makes me think that we should *not* yet be supplying mangled names
>> for public use. Once we start doing that we are stuck with it
>> forever, after all. Contrariwise, we *can* always retrofit __libc_*
>> aliases or whatever once we know what we ought to be doing.
>
> I think getrandom is special because the name is rather generic,
> just like some of the new libm names (where we still might have to
> introduce mangling based on feedback from distribution rebuilds; I
> just don't know yet).
We agreed that the unmangled name has to exist, so how about we move
forward by introducing only the unmangled names for the new symbols
currently proposed (getrandom, explicit_bzero), introduce mangling if
necessary based on feedback, and work toward a long-term solution that
can be applied across the board?
zw