This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: ELF linking question related to symbol collisions
- From: Florian Weimer <fweimer at redhat dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: libc-help <libc-help at sourceware dot org>
- Date: Wed, 18 Dec 2013 15:47:25 +0100
- Subject: Re: ELF linking question related to symbol collisions
- References: <528CB7AB dot 2080901 at redhat dot com> <CAE2sS1gi_qV9n8Ynerh6nWTdTAftM1xvt=hQZ7DxxP9jfGyGwQ at mail dot gmail dot com> <528DDBB9 dot 8090208 at redhat dot com> <528E779E dot 3050009 at redhat dot com>
On 11/21/2013 10:14 PM, Carlos O'Donell wrote:
No, that's not the way it works. You must manage the global namespace
or collisions will lead to incorrect runtime behaviour.
Oh well, thanks for confirming my suspicions.
So we have several desktop applications that have an ambiguous
reference to json_object_get_type, via the pulseaudio library.
Yes, and that is a serious problem.
It turns out that there are further collisions between json-c and
jansson, on the json_object_get and json_object_iter_next symbols, but we
have no binaries in Fedora that trigger them (based on the static
DT_NEEDED information).
The problem is mainly that they fall afoul of the ELF rules for
interposition.
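
As a rough illustration of the interposition rule referred to here, the following Python sketch models a flat global lookup: the first object in the search order that defines a name wins, regardless of which library the caller was built against. The sonames and export sets are placeholders chosen for illustration, and the model ignores symbol versions, visibility, and weak symbols.

```python
def resolve_flat(symbol, search_order, exports):
    """Return the first object in search_order that defines symbol, or None."""
    for soname in search_order:
        if symbol in exports.get(soname, set()):
            return soname
    return None


# Placeholder export tables (assumption); real data would come from the
# dynamic symbol tables of the installed libraries.
EXPORTS = {
    "libjson-c.so": {"json_object_get_type", "json_object_get"},
    "libjson-glib.so": {"json_object_get_type"},
}

# The same reference binds to a different library depending only on load order.
first = resolve_flat("json_object_get_type",
                     ["libjson-glib.so", "libjson-c.so"], EXPORTS)
second = resolve_flat("json_object_get_type",
                      ["libjson-c.so", "libjson-glib.so"], EXPORTS)
print(first, second)
```

In this model a caller that expects the json-c function silently gets the other definition whenever the other library happens to come first in the search order.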
Right. But apart from the occasional LD_PRELOAD and the language
interpreter optimization, I don't think interposition is actually used.
(Fedora doesn't use that optimization: you compile your main
Python/Perl/etc. interpreter binary with a statically linked non-PIC copy
of the actual interpreter implementation, exported dynamically, and your
extension modules link against that instead of the interpreter DSO.)
Or more to the point, I'm not sure if the current linking algorithm is
what we need. Something like -B direct might be a better fit for our
current needs.
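
For contrast with the flat global lookup, direct binding can be sketched as a lookup that consults only the object recorded for each reference at link time. This is a simplified model, not a description of any particular linker's implementation; the binding table, the referrer names, and the export sets are all assumptions made up for the example.

```python
def resolve_direct(referrer, symbol, bindings, exports):
    """Look up symbol only in the object recorded for (referrer, symbol)."""
    provider = bindings.get((referrer, symbol))
    if provider is not None and symbol in exports.get(provider, set()):
        return provider
    return None


# Placeholder data (assumption): which library each reference was linked against.
EXPORTS = {
    "libjson-c.so": {"json_object_get_type"},
    "libjson-glib.so": {"json_object_get_type"},
}
BINDINGS = {
    ("libpulse.so", "json_object_get_type"): "libjson-c.so",
    ("libapp-ui.so", "json_object_get_type"): "libjson-glib.so",
}

pulse_target = resolve_direct("libpulse.so", "json_object_get_type",
                              BINDINGS, EXPORTS)
ui_target = resolve_direct("libapp-ui.so", "json_object_get_type",
                           BINDINGS, EXPORTS)
print(pulse_target, ui_target)
```

Because the provider is part of the lookup key, load order no longer decides which definition a reference reaches.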
The trouble with this is that it's fairly difficult to detect. Static
analysis misses collisions introduced by dlopen and dlsym.
Right, in this case you need a special-purpose analysis tool to
catch this, something that models the dynamic linker and ELF.
I have a pretty good approximation, but exclusively based on data that
is statically available (and some heuristics to avoid implementing rpath
resolution or /etc/ld.so.conf.d/ handling). The amount of data is
fairly large, so I haven't tried to detect actual symbol interposition.
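
A minimal sketch of this kind of static approximation, assuming the DT_NEEDED lists and exported symbols have already been extracted into plain dictionaries (the data below is invented for illustration). It reports collision candidates only; it does not decide whether interposition actually takes place, and it sees nothing loaded through dlopen.

```python
from collections import defaultdict


def needed_closure(root, needed):
    """Transitive closure of DT_NEEDED entries reachable from root."""
    seen, stack = set(), list(needed.get(root, []))
    while stack:
        soname = stack.pop()
        if soname not in seen:
            seen.add(soname)
            stack.extend(needed.get(soname, []))
    return seen


def collision_candidates(root, needed, exports):
    """Map each symbol defined by more than one library in the closure
    to the set of libraries that define it."""
    definers = defaultdict(set)
    for soname in needed_closure(root, needed):
        for symbol in exports.get(soname, ()):
            definers[symbol].add(soname)
    return {sym: libs for sym, libs in definers.items() if len(libs) > 1}


# Placeholder inputs (assumption), standing in for data read from ELF files.
NEEDED = {
    "app": ["libpulse.so", "libjson-glib.so"],
    "libpulse.so": ["libjson-c.so"],
}
EXPORTS = {
    "libpulse.so": {"pa_context_new"},
    "libjson-c.so": {"json_object_get_type", "json_object_get"},
    "libjson-glib.so": {"json_object_get_type"},
}

candidates = collision_candidates("app", NEEDED, EXPORTS)
print(candidates)
```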
Symbol collisions are only bad if both symbols do not implement the
same ABI and API. If they do implement the same ABI and API then it's
a replacement function that is safe to interpose.
Unless that function references static symbols, then it may or may not
be safe. If everything ends up being interposed, it is okay, but if
not, code might hit different static symbols.
Symbol versioning does not solve the problem in general either, since
then you need global version name management, and you need to fix
all applications to use versioning, which is a huge amount of work.
Even then you can still have problems if the projects lack the rigour
required to update their version maps.
I had hoped it would be possible to use a static map with a single
version-like string, like JSON-C, JSON-GLIB, and JANSSON. Maybe we
could even use the soname for this by default.
This is easier than renaming everything under a shared prefix, and would
not affect backward compatibility. Perhaps I'm a bit naïve, but I
suspect this could be used to implement part of the linking semantics of
-B direct and reap some of the performance benefits because it is easier
to locate the correct hash table.
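
The idea of one version-like string per library can be modelled as a lookup keyed on the pair (symbol, version string) rather than on the symbol name alone. The sketch below is a toy model under that assumption, with invented export tables; it does not reproduce the real symbol versioning data structures.

```python
def resolve_versioned(symbol, version, search_order, exports):
    """Return the first object defining the (symbol, version) pair, or None."""
    for soname in search_order:
        if (symbol, version) in exports.get(soname, set()):
            return soname
    return None


# Placeholder export tables (assumption): every symbol of a library carries
# that library's single version-like string.
EXPORTS = {
    "libjson-c.so": {("json_object_get_type", "JSON-C")},
    "libjson-glib.so": {("json_object_get_type", "JSON-GLIB")},
}
ORDER = ["libjson-glib.so", "libjson-c.so"]

json_c_hit = resolve_versioned("json_object_get_type", "JSON-C", ORDER, EXPORTS)
json_glib_hit = resolve_versioned("json_object_get_type", "JSON-GLIB", ORDER, EXPORTS)
print(json_c_hit, json_glib_hit)
```

With the version string in the key, both definitions of the same name can coexist, and the result no longer depends on the search order.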
A different linking algorithm isn't helpful either because at static
link time you don't need the devel libraries for all dependent libraries,
and requiring it would make compiling anything much more complicated.
Sure, but I think static linking has no future. If it reappears, it
will be in the form of LTO object files.
The only robust solution I see is a post-build tool that looks for
global namespace collisions and rejects the build if they exist.
That's difficult to do because anything that needs to consider more than
one piece of software in isolation has a high overhead, both
performance-wise and administratively.
If we decide to implement namespace management outside of the toolchain,
I think we should have a list of symbols and symbol prefixes that maps
each one to the library (soname, and then OS package) that defines it. If a library
(or dynamic executable) uses a symbol outside that list, we'd either
have to fix the list or address the unintentional out-of-namespace
symbol leakage. Both measures can be taken even before any actual
collision materializes, and it's only
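
Such an out-of-band namespace model might look roughly like the following sketch: a table of exact symbols and prefixes, each owned by one soname, plus a check that flags exported symbols falling outside the owner's namespace. All names, prefixes, and the export list are hypothetical, and the tie-breaking rule (exact match first, then longest prefix) is an assumption.

```python
def owner_of(symbol, owned_symbols, owned_prefixes):
    """Return the soname owning symbol: exact entries first, then longest prefix."""
    if symbol in owned_symbols:
        return owned_symbols[symbol]
    matches = [p for p in owned_prefixes if symbol.startswith(p)]
    if matches:
        return owned_prefixes[max(matches, key=len)]
    return None


def out_of_namespace(soname, exported, owned_symbols, owned_prefixes):
    """Exported symbols that the model does not assign to soname."""
    return sorted(
        s for s in exported
        if owner_of(s, owned_symbols, owned_prefixes) != soname
    )


# Hypothetical model and export list, for illustration only.
OWNED_SYMBOLS = {}
OWNED_PREFIXES = {"ex_": "libexample.so", "json_": "libjson-c.so"}
EXPORTED_BY_EXAMPLE = {"ex_init", "json_object_get_type", "helper_init"}

leaks = out_of_namespace("libexample.so", EXPORTED_BY_EXAMPLE,
                         OWNED_SYMBOLS, OWNED_PREFIXES)
print(leaks)
```

Each reported symbol would then call for one of the two measures: extend the list, or stop exporting the symbol.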
The workaround might be to register your allowed symbol interpositions in
the spec file such that the post-build tool can use those to resolve
such allowances. Note that just stating that symbol X may be interposed
is not sufficient to make this system safe; you must say symbol X from
SONAME Y may interpose.
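
The point about naming the interposing SONAME can be sketched as an allowlist of (symbol, soname) pairs that a post-build check consults. This is only an illustration under simplifying assumptions: the definer table, the search order, and the allowlist below are invented, and the winner is taken to be the first definer in a flat search order.

```python
def unapproved_collisions(definers, search_order, allowed):
    """Return {symbol: winning soname} for collisions whose winner is not
    an approved interposer.

    definers: symbol -> set of sonames that define it
    allowed:  set of (symbol, soname) pairs permitted to interpose
    """
    problems = {}
    for symbol, libs in definers.items():
        if len(libs) < 2:
            continue  # a single definition cannot collide
        winner = next((s for s in search_order if s in libs), None)
        if winner is not None and (symbol, winner) not in allowed:
            problems[symbol] = winner
    return problems


# Hypothetical inputs; libfastalloc.so is a made-up malloc replacement.
DEFINERS = {
    "malloc": {"libc.so.6", "libfastalloc.so"},
    "json_object_get_type": {"libjson-c.so", "libjson-glib.so"},
}
SEARCH_ORDER = ["libfastalloc.so", "libjson-glib.so", "libjson-c.so", "libc.so.6"]
ALLOWED = {("malloc", "libfastalloc.so")}

problems = unapproved_collisions(DEFINERS, SEARCH_ORDER, ALLOWED)
print(problems)
```

An allowlist keyed on the symbol alone would also accept any other library that happened to define malloc, which is why the pair is needed.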
I think it should be outside the SPEC file because of the syntax issues
(no good parsers, parsing requires arbitrary code execution etc.).
What's your next step?
If there's a strong desire to implement -B direct, that's the way to go,
but it's a bit out of my area of expertise. If there isn't, I'll look
into writing an out-of-band namespace model for the upcoming Fedora Base
package set.
--
Florian Weimer / Red Hat Product Security Team