This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: ELF linking question related to symbol collisions
- From: Florian Weimer <fweimer at redhat dot com>
- To: "Carlos O'Donell" <carlos at systemhalted dot org>
- Cc: libc-help <libc-help at sourceware dot org>
- Date: Thu, 21 Nov 2013 11:08:57 +0100
- Subject: Re: ELF linking question related to symbol collisions
- References: <528CB7AB dot 2080901 at redhat dot com> <CAE2sS1gi_qV9n8Ynerh6nWTdTAftM1xvt=hQZ7DxxP9jfGyGwQ at mail dot gmail dot com>
On 11/20/2013 10:13 PM, Carlos O'Donell wrote:
> On Wed, Nov 20, 2013 at 8:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> I've got a program which links (indirectly) to two DSOs which define the
>> same function. Is it guaranteed that ld.so resolves a symbol reference to
>> the topologically closest definition (from its own dependency graph), or
>> will ld.so pick a definition more or less at random?
> To be clear:
> Program -> lib1.so -> lib1a.so (defines foo)
>        \--> lib2.so -> lib2a.so (defines foo)
> Call sequence is: Program->lib1.so (some function)-> foo (which foo?)
> Program was built with `-l1 -l2' (very important because it sequences DT_NEEDED)
Thanks for your explanation.
> In this case the topological sort results in the following flat
> sequence (on x86-64):
> /lib64/ld-linux-x86-64.so.2
> /lib64/libc.so.6
> ./lib1a.so
> ./lib2a.so
> ./lib1.so
> ./lib2.so
> Thus the answer to "which foo?" is "lib1a.so's foo."
And lib2.so will get the same foo? Ugh.
> It's the closest definition from the *program*, not ld.so, but since
> ld.so is always the first dependency, it can also be correct to say this.
I was hoping that ld.so picks the closest definition from the
referencing library, so that lib1.so would get the definition from
lib1a.so, and lib2.so would end up with the one from lib2a.so. That
would scale a little bit better despite the lack of global namespace
management.
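To make the difference concrete, here is a rough Python model (not glibc code) of the two policies. It assumes that the search list is a breadth-first walk of the DT_NEEDED entries starting at the program; ld.so and libc are left out because they do not define foo, and the real position of each object in the list may differ from this model. resolve_global stands for the behavior described above, resolve_local for the hypothetical "closest to the referencing library" policy. Both function names are made up for this sketch.

```python
from collections import deque

# Dependency graph from the example above (DT_NEEDED order preserved).
NEEDED = {
    "Program": ["lib1.so", "lib2.so"],
    "lib1.so": ["lib1a.so"],
    "lib2.so": ["lib2a.so"],
    "lib1a.so": [],
    "lib2a.so": [],
}
# Which objects define which symbols.
DEFINES = {"lib1a.so": {"foo"}, "lib2a.so": {"foo"}}


def breadth_first(root):
    """Return the objects reachable from root in breadth-first order."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in NEEDED[obj]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def resolve_global(symbol):
    """Simplified model of the single search list shared by all objects."""
    for obj in breadth_first("Program"):
        if symbol in DEFINES.get(obj, ()):
            return obj
    return None


def resolve_local(symbol, referencer):
    """Hypothetical policy: search the referencer's own dependencies first."""
    for obj in breadth_first(referencer):
        if symbol in DEFINES.get(obj, ()):
            return obj
    return resolve_global(symbol)


for lib in ("lib1.so", "lib2.so"):
    print(lib, resolve_global("foo"), resolve_local("foo", lib))
# lib1.so lib1a.so lib1a.so
# lib2.so lib1a.so lib2a.so
```

In this model only the hypothetical policy gives lib2.so the definition from lib2a.so.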
The backstory on my question is this. I mistook an embedded copy of the
json-glib library for a copy of json-c, a totally different library
which also uses the json_object_ prefix for some of its functions. It
turns out that there is just one colliding symbol, json_object_get_type.
So I set out to find programs (f4) which link to both json-c (f1) and
json-glib (f2), and also link to something (f3) that references the
json_object_get_type function.
SELECT DISTINCT f4.name AS toplevel, f3.name AS json_object_get_type
FROM symboldb.file f1
JOIN symboldb.elf_closure ec1 ON f1.file_id = ec1.needed
CROSS JOIN symboldb.file f2
JOIN symboldb.elf_closure ec2
ON f2.file_id = ec2.needed AND ec1.file_id = ec2.file_id
JOIN symboldb.elf_closure ec3 ON ec3.file_id = ec2.file_id
JOIN symboldb.file f3
ON ec3.file_id = f3.file_id OR ec3.needed = f3.file_id
JOIN symboldb.elf_reference er ON f3.contents_id = er.contents_id
JOIN symboldb.file f4 ON ec3.file_id = f4.file_id
JOIN symboldb.package p ON f4.package_id = p.package_id
JOIN symboldb.package_set_member psm ON p.package_id = psm.package_id
WHERE f1.name = '/usr/lib64/libjson-c.so.2.0.1'
AND f2.name = '/usr/lib64/libjson-glib-1.0.so.0.1600.0'
AND er.name = 'json_object_get_type'
AND psm.set_id = symboldb.package_set('Fedora/19/x86_64');
I'm not sure how well the table will be preserved, but here it is:
                  toplevel                  |     json_object_get_type
--------------------------------------------+-------------------------------
 /usr/bin/gnome-control-center              | /usr/lib64/libpulse.so.0.15.3
 /usr/lib64/gnome-shell/libgnome-shell.so   | /usr/lib64/libpulse.so.0.15.3
 /usr/lib64/empathy/libempathy-gtk-3.8.4.so | /usr/lib64/libpulse.so.0.15.3
 /usr/lib64/cinnamon/libcinnamon.so         | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/gnome-shell                       | /usr/lib64/libpulse.so.0.15.3
 /usr/libexec/empathy-auth-client           | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/empathy-accounts                  | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/cinnamon                          | /usr/lib64/libpulse.so.0.15.3
 /usr/libexec/empathy-call                  | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/empathy-debugger                  | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/gnome-boxes                       | /usr/lib64/libpulse.so.0.15.3
 /usr/bin/empathy                           | /usr/lib64/libpulse.so.0.15.3
 /usr/libexec/empathy-chat                  | /usr/lib64/libpulse.so.0.15.3
(13 rows)
So we have several desktop applications that have an ambiguous reference
to json_object_get_type, via the pulseaudio library.
The trouble is that this is fairly difficult to detect. Static
analysis misses collisions introduced by dlopen and dlsym.
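The static part of such a check could look roughly like the Python sketch below. It only follows DT_NEEDED edges, so anything pulled in later through dlopen stays invisible to it, which is exactly the limitation mentioned above. The object names, edges and export sets are invented for illustration and are not taken from the Fedora data; only the colliding symbol name comes from the query.

```python
from collections import defaultdict

# Toy dependency graph (hypothetical edges, not real package data).
NEEDED = {
    "app": ["libpulse.so", "libjson-glib.so"],
    "libpulse.so": ["libjson-c.so"],
    "libjson-glib.so": [],
    "libjson-c.so": [],
}
# Toy export sets; the second symbol in each set is a made-up placeholder.
EXPORTS = {
    "libjson-c.so": {"json_object_get_type", "json_c_only_symbol"},
    "libjson-glib.so": {"json_object_get_type", "json_glib_only_symbol"},
    "libpulse.so": {"pulse_only_symbol"},
}


def closure(root):
    """Return the set of objects reachable from root via DT_NEEDED edges."""
    seen, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(NEEDED.get(obj, []))
    return seen


def collisions(root):
    """Map each symbol defined more than once in root's closure to its definers."""
    definers = defaultdict(list)
    for obj in sorted(closure(root)):
        for symbol in EXPORTS.get(obj, ()):
            definers[symbol].append(obj)
    return {sym: objs for sym, objs in definers.items() if len(objs) > 1}


print(collisions("app"))
# {'json_object_get_type': ['libjson-c.so', 'libjson-glib.so']}
```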
> Warning! The answer changes if you link with `-l2 -l1' because it changes
> the DT_NEEDED ordering which changes the order in which the graph
> is traversed.
Thanks for the warning. I should record the order of DT_NEEDED elements
in the database. I also need to reflect this in the elf_closure table
in some way, to model the ld.so behavior more accurately.
If I understand things correctly, this unpredictability means that
symbol collisions are always bad, even if they are working at present,
because a change in the dependency graph could interpose a different
definition in the future.
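One way to test whether a binding is stable in this sense is to try every ordering of the program's DT_NEEDED list and collect the definitions that can win. The Python sketch below does this for the lib1/lib2 example. It is a simplified model that assumes a breadth-first walk of the dependency graph, not a description of the actual ld.so code, and the function names are made up.

```python
from collections import deque
from itertools import permutations

NEEDED = {
    "Program": ["lib1.so", "lib2.so"],
    "lib1.so": ["lib1a.so"],
    "lib2.so": ["lib2a.so"],
    "lib1a.so": [],
    "lib2a.so": [],
}
DEFINES = {"lib1a.so": {"foo"}, "lib2a.so": {"foo"}}


def first_definition(symbol, needed, root="Program"):
    """Walk the graph breadth-first and return the first object defining symbol."""
    seen, queue = {root}, deque([root])
    while queue:
        obj = queue.popleft()
        if symbol in DEFINES.get(obj, ()):
            return obj
        for dep in needed[obj]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return None


def possible_winners(symbol, root="Program"):
    """Try every ordering of root's DT_NEEDED list and collect the winners."""
    winners = set()
    for order in permutations(NEEDED[root]):
        graph = dict(NEEDED)
        graph[root] = list(order)
        winners.add(first_definition(symbol, graph, root))
    return winners


print(first_definition("foo", NEEDED))   # link order -l1 -l2: lib1a.so
swapped = dict(NEEDED, Program=["lib2.so", "lib1.so"])
print(first_definition("foo", swapped))  # link order -l2 -l1: lib2a.so
print(sorted(possible_winners("foo")))   # ['lib1a.so', 'lib2a.so']
```

More than one possible winner means the binding depends on link order and could change with an unrelated change to the build.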
> Beware that if you introduce cycles the problem becomes non-deterministic
> and depends on where you break the cycle. We have several bugs open at
> the moment against glibc to make the cycle breaking deterministic and to
> enable testing for millions of permutations of N cycles to double check
> that the present code does the right thing. We'd appreciate any contributions
> in this area as the dynamic linker code is, in my opinion, in need of
> refactoring and simplification to enable future development.
I'm a bit worried that we're facing scalability issues at the
distribution level because we lack proper namespace management. I've
seen that a number of libraries do not export internal symbols, which is
good, but I think we either need some global namespace management (which
will be difficult for APIs with fairly general names, such as MAPI), a
different linking algorithm which provides better encapsulation
(reducing accidental symbol interposition), or a more aggressive move
towards symbol versioning as a namespace management tool. Or maybe
something else altogether. This is a really complicated topic, and I
don't think it can be considered in isolation from ld.so performance
improvements.
This is not mere speculation: we have already had symbol collisions which
impacted customers.
--
Florian Weimer / Red Hat Product Security Team