This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Evolution of ELF symbol management


On 11/23/2016 03:08 PM, Zack Weinberg wrote:

In general, headers should avoid using libc types, particularly
off_t, time_t, struct timeval, struct stat, and so on.  But there
might be exceptions.

... I'm not sure why we're suddenly discussing typedef names?

I'm trying to come up with reasons why the usual header file conflict avoidance mechanisms would not work.

I mean, if you use C, you pretty much agreed to using separate compilation to work around header conflict issues.

For C, it's not clear at all to me whether we need any kind of
opt-in besides compiling with a particular _*_SOURCE variant (which
introduces the definition).

I am having trouble articulating why I don't like this.  I think it
might be mostly to do with the way new symbols are now different than
old symbols.  I'd be okay with an across-the-board change to symbol
resolution (as discussed below) that made interposition not work by
default, but I'm not okay with the idea of its being supported only for
symbols added before some arbitrary release.

I prefer the case-by-case approach because it allows us to review ABI changes individually.

Third-party C libraries don't tend to put nearly as much code in inline
functions, so the need for explicitly-useable __names is lessened there,
I think.

Right, and separate compilation is available as workaround.

For C++, we might use something based on namespaces to get a clear
separation.  However, the problem there is that type names and
struct tags end up in C++ mangled identifiers and thus impact
application ABI. I have no good idea what to do there.

Arguably that's a Good Thing -- a change to what off_t means is an ABI
break whether or not it shows up in symbol names, and _making_ that one
in particular show up in symbol names might solve some of the problems
that lead to _FILE_OFFSET_BITS=64 still not being default for the older
32-bit architectures.

Suppose we want to make struct sockaddr_un available to C++ code under a namespaced name. Then C++ code has to use that name. But this could change the ABI on the C++ side merely due to name mangling (on top of potential type compatibility issues at the interface with user code). If we do not solve this in some way, I don't think many C++ projects will switch to internal names to avoid the header file collision, because it's not worth the impact on compatibility.

I was imagining a new annotation on _all_ undefined symbols in a
shared object, giving the soname of the object that they were
satisfied by at link time.  At load time, 'getrandom!libc.so.6'
resolves to the 'getrandom' definition in libc.so.6, ignoring all
other definitions of the same name.  If there are symbol versions
involved, only the versions exported by libc.so.6 are considered.
For instance, 'getrandom!libc.so.6@GLIBC_2.25' cannot be satisfied
by 'getrandom@GLIBC_2.25' exported by libmissing-syscalls.so.1.
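
To make the matching rule in the paragraph above concrete, here is a minimal C model of it. This is only a sketch: the 'name!soname@version' spelling is the notation proposed in this thread, not an existing ELF feature, and struct bound_ref, struct definition, parse_bound_ref and satisfies are names invented for the illustration. A real dynamic linker would work on symbol table entries, not strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a bound reference such as
   "getrandom!libc.so.6@GLIBC_2.25".  Empty soname or version
   means "not constrained". */
struct bound_ref {
    char name[64];
    char soname[64];
    char version[64];
};

/* Hypothetical model of one candidate definition seen at load time. */
struct definition {
    const char *name;
    const char *soname;   /* object that exports the definition */
    const char *version;  /* "" if unversioned */
};

/* Copy LEN bytes from SRC into DST; fail if they do not fit. */
static bool copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

/* Split "name[!soname][@version]" into its three parts. */
static bool parse_bound_ref(const char *text, struct bound_ref *out)
{
    assert(text != NULL && out != NULL);
    const char *end = text + strlen(text);
    const char *bang = strchr(text, '!');
    const char *at = strchr(bang ? bang : text, '@');
    const char *name_end = bang ? bang : (at ? at : end);

    if (name_end == text)
        return false;  /* empty symbol name */
    if (!copy_field(out->name, sizeof out->name, text,
                    (size_t) (name_end - text)))
        return false;

    out->soname[0] = '\0';
    if (bang) {
        const char *so_end = at ? at : end;
        if (!copy_field(out->soname, sizeof out->soname, bang + 1,
                        (size_t) (so_end - (bang + 1))))
            return false;
    }

    out->version[0] = '\0';
    if (at && !copy_field(out->version, sizeof out->version, at + 1,
                          (size_t) (end - (at + 1))))
        return false;
    return true;
}

/* A definition satisfies the reference only if the name matches and
   every constraint present in the reference (soname, version) matches. */
static bool satisfies(const struct bound_ref *ref, const struct definition *def)
{
    if (strcmp(ref->name, def->name) != 0)
        return false;
    if (ref->soname[0] != '\0' && strcmp(ref->soname, def->soname) != 0)
        return false;
    if (ref->version[0] != '\0' && strcmp(ref->version, def->version) != 0)
        return false;
    return true;
}
```

Under this model, a 'getrandom@GLIBC_2.25' definition exported by libmissing-syscalls.so.1 is rejected for the reference 'getrandom!libc.so.6@GLIBC_2.25' because the soname differs, even though name and version agree.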

We still need to support LD_PRELOAD and interposition of arbitrary
symbols, and not just malloc-related ones, for the benefit of
Address Sanitizer, fakeroot, cwrap, memstomp and other tools.

This is why hard-coding the DSO name does not seem advisable.

This argument applies equally to every new symbol we might add, and in
fact to every _intra_-libc call that currently _can't_ be interposed.
So I'm inclined to discount it.

I think there's a big difference between having to write new interceptors to support newer glibc versions and having to rewrite your whole library as an audit module because the ability to interpose the symbols you are interested in is gone completely.
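
For reference, the kind of interceptor meant here can be sketched as follows. It is a fakeroot-style wrapper for getuid, written under assumptions: the environment variable EXAMPLE_FAKE_UID is invented for this example, and real tools such as fakeroot or cwrap are considerably more involved. It relies on dlsym with RTLD_NEXT, which needs _GNU_SOURCE on glibc, and on the wrapper being found before libc in the symbol search order (for instance because it was built as a shared object and loaded through LD_PRELOAD).

```c
#define _GNU_SOURCE   /* for RTLD_NEXT; must come before any #include */
#include <assert.h>
#include <dlfcn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variable name, invented for this example. */
#define FAKE_UID_ENV "EXAMPLE_FAKE_UID"

/* Same name and signature as libc's getuid.  If this object comes
   before libc in the search order, callers bind to this definition. */
uid_t getuid(void)
{
    const char *fake = getenv(FAKE_UID_ENV);
    if (fake != NULL)
        return (uid_t) strtoul(fake, NULL, 10);

    /* No override requested: forward to the next definition of
       getuid in the search order, normally the one in libc. */
    uid_t (*real_getuid)(void) = (uid_t (*)(void)) dlsym(RTLD_NEXT, "getuid");
    assert(real_getuid != NULL);
    return real_getuid();
}
```

If references to getuid were bound to libc.so.6 by name at link time, a wrapper like this would simply never be called, which is the scenario the audit-module remark above refers to.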

The solution I'm leaning toward involves each library designating a set
of exported symbols, calls to which _can_ be interposed; the default is
not to allow it.  We'd probably have to spend some time figuring out
exactly which of libc's symbols should be interposeable.
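
The opt-in policy described in the paragraph above can be modelled in a few lines of C. This is a sketch of the idea only, not of any existing loader behaviour: struct object, in_list and resolve are hypothetical names, and the search order is reduced to a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One loaded object in the global search order (model only). */
struct object {
    const char *soname;
    const char *const *defined;       /* NULL-terminated exported symbols */
    const char *const *interposable;  /* NULL-terminated opt-in subset, or NULL */
};

static bool in_list(const char *const *list, const char *name)
{
    for (; list != NULL && *list != NULL; ++list)
        if (strcmp(*list, name) == 0)
            return true;
    return false;
}

/* Resolve NAME, which was satisfied at link time by objs[provider].
   Returns the soname of the object whose definition is used, or NULL
   if the provider does not define NAME at all. */
static const char *resolve(const struct object *objs, size_t nobjs,
                           size_t provider, const char *name)
{
    assert(provider < nobjs);
    const struct object *p = &objs[provider];

    if (!in_list(p->defined, name))
        return NULL;

    /* Only symbols the provider designates as interposable go through
       the global search order; the first definition found wins. */
    if (in_list(p->interposable, name))
        for (size_t i = 0; i < nobjs; ++i)
            if (in_list(objs[i].defined, name))
                return objs[i].soname;

    /* Default: interposition is not allowed, bind to the provider. */
    return p->soname;
}
```

With a preloaded object that defines both malloc and getrandom, and a libc that lists only malloc as interposable, this model sends malloc to the preloaded object and keeps getrandom in libc.so.6.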

It seems to me that interposition of arbitrary symbols is currently part of the programming interface. We didn't plan for things like fakeroot and cwrap, but someone created those tools eventually, and they apparently address a real need.

When an application is linked against a shared object, if it
interposes any symbols in it, those symbols become exported, so that
interposition works at run time (otherwise, it could not happen).
You can see an example here:

$ nm -Dg malloc/tst-interpose-nothread  | grep ' T '

The application is *not* compiled with -Bdynamic or something like
that, it happens automatically.

But the symbol version from libc.so.6 is not attached to this symbol
(“nm” would not show it, but you can check with eu-readelf, for
example).

Well, OK, why don't we just fix that?  Is there a good reason why it
_doesn't_ pick up the symbol version?

I'm not sure if interposition at load time will still happen. But this should be easy to verify. I'll give it a try.

We agreed that the unmangled name has to exist, so how about we move
forward by introducing only the unmangled names for the new symbols
currently proposed (getrandom, explicit_bzero), introduce mangling if
necessary based on feedback, and work toward a long-term solution that
can be applied across the board?

What kind of feedback would trigger mangled names? Is having a real-world application which triggers accidental interposition sufficient?

For getrandom, not using the mangled name by default looks like a security bug in the making. Less so for explicit_bzero.

Thanks,
Florian

