This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: ELF linking question related to symbol collisions


On 11/21/2013 05:08 AM, Florian Weimer wrote:
> On 11/20/2013 10:13 PM, Carlos O'Donell wrote:
>> On Wed, Nov 20, 2013 at 8:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> I've got a program which links (indirectly) to two DSOs which define the
>>> same function.  Is it guaranteed that ld.so resolves a symbol reference to
>>> the topologically closest definition (from its own dependency graph), or
>>> will ld.so pick a definition more or less at random?
>>
>> To be clear:
>>
>> Program -> lib1.so -> lib1a.so (defines foo)
>>             \--> lib2.so -> lib2a.so (defines foo)
>>
>> Call sequence is: Program->lib1.so (some function)-> foo (which foo?)
>>
>> Program was built with `-l1 -l2' (very important because it sequences DT_NEEDED)
> 
> Thanks for your explanation.
> 
>> In this case the topological sort results in the following flat
>> sequence (on x86-64):
>> /lib64/ld-linux-x86-64.so.2
>> /lib64/libc.so.6
>> ./lib1a.so
>> ./lib2a.so
>> ./lib1.so
>> ./lib2.so
>>
>> Thus the answer to "which foo?" is "lib1a.so's foo."
> 
> And lib2.so will get the same foo?  Ugh.

Yes. These are the rules for interposition in ELF.
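
As a rough illustration of why both libraries end up with the same foo, here
is a minimal Python sketch of the lookup. It assumes the global scope is built
by a breadth-first walk of DT_NEEDED entries starting at the program, as the
ELF gABI describes. It leaves out libc, ld.so itself, LD_PRELOAD, dlopen,
symbol versioning and visibility, so the ordering it prints is not meant to
reproduce the exact sequence quoted above. The names `lookup_scope` and
`resolve` are made up for this example.

```python
from collections import deque


def lookup_scope(root, needed):
    """Return objects in breadth-first DT_NEEDED order, each listed once."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def resolve(symbol, root, needed, defines):
    """Return the first object in the scope that defines symbol, or None."""
    for obj in lookup_scope(root, needed):
        if symbol in defines.get(obj, ()):
            return obj
    return None


# The example graph from this thread: only the leaf libraries define foo.
defines = {"lib1a.so": {"foo"}, "lib2a.so": {"foo"}}
deps = {"lib1.so": ["lib1a.so"], "lib2.so": ["lib2a.so"]}

link_12 = dict(deps, program=["lib1.so", "lib2.so"])  # built with -l1 -l2
link_21 = dict(deps, program=["lib2.so", "lib1.so"])  # built with -l2 -l1

# Every reference to foo, from any object, is resolved against the same scope.
winner_12 = resolve("foo", "program", link_12, defines)
winner_21 = resolve("foo", "program", link_21, defines)
print(winner_12, winner_21)
```

In this model the referencing object plays no part in the search, which is
why lib2.so cannot get "its own" foo, and swapping the order of the two
DT_NEEDED entries in the program is enough to change the winner.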
 
>> It's the closest definition from the *program*, not from ld.so, but since
>> ld.so is always the first dependency, it is also correct to say that.
> 
> I was hoping that ld.so picks the closest definition from the
> referencing library, so that lib1.so would get the definition from
> lib1a.so, and lib2.so would end up with the one from lib2a.so. That
> would scale a little bit better despite the lack of global namespace
> management.

No, that's not the way it works. You must manage the global namespace
or collisions will lead to incorrect runtime behaviour.
 
> The backstory on my question is this. I mistook an embedded copy of
> the json-glib library for a copy of json-c, a totally different
> library which also uses the json_object_ prefix for some of its
> functions. It turns out that there is just one colliding symbol,
> json_object_get_type.
> 
> So I set out to find programs (f4) which link to both json-c (f1) and
> json-glib (f2), and also link to something (f3) that references the
> json_object_get_type function.
> 
> SELECT DISTINCT f4.name AS toplevel, f3.name AS json_object_get_type
>   FROM symboldb.file f1
>   JOIN symboldb.elf_closure ec1 ON f1.file_id = ec1.needed
>   CROSS JOIN symboldb.file f2
>   JOIN symboldb.elf_closure ec2
>     ON f2.file_id = ec2.needed AND ec1.file_id = ec2.file_id
>   JOIN symboldb.elf_closure ec3 ON ec3.file_id = ec2.file_id
>   JOIN symboldb.file f3
>     ON ec3.file_id = f3.file_id OR ec3.needed = f3.file_id
>   JOIN symboldb.elf_reference er ON f3.contents_id = er.contents_id
>   JOIN symboldb.file f4 ON ec3.file_id = f4.file_id
>   JOIN symboldb.package p ON f4.package_id = p.package_id
>   JOIN symboldb.package_set_member psm ON p.package_id = psm.package_id
>   WHERE f1.name = '/usr/lib64/libjson-c.so.2.0.1'
>   AND f2.name = '/usr/lib64/libjson-glib-1.0.so.0.1600.0'
>   AND er.name = 'json_object_get_type'
>   AND psm.set_id = symboldb.package_set('Fedora/19/x86_64');
> 
> I'm not sure how well the table will be preserved, but here it is:
> 
>                   toplevel                  |     json_object_get_type
> --------------------------------------------+-------------------------------
>  /usr/bin/gnome-control-center              | /usr/lib64/libpulse.so.0.15.3
>  /usr/lib64/gnome-shell/libgnome-shell.so   | /usr/lib64/libpulse.so.0.15.3
>  /usr/lib64/empathy/libempathy-gtk-3.8.4.so | /usr/lib64/libpulse.so.0.15.3
>  /usr/lib64/cinnamon/libcinnamon.so         | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/gnome-shell                       | /usr/lib64/libpulse.so.0.15.3
>  /usr/libexec/empathy-auth-client           | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/empathy-accounts                  | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/cinnamon                          | /usr/lib64/libpulse.so.0.15.3
>  /usr/libexec/empathy-call                  | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/empathy-debugger                  | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/gnome-boxes                       | /usr/lib64/libpulse.so.0.15.3
>  /usr/bin/empathy                           | /usr/lib64/libpulse.so.0.15.3
>  /usr/libexec/empathy-chat                  | /usr/lib64/libpulse.so.0.15.3
> (13 rows)
> 
> So we have several desktop applications that have an ambiguous
> reference to json_object_get_type, via the pulseaudio library.

Yes, and that is a serious problem.

The problem is mainly that they fall afoul of the ELF rules for
interposition.
 
> The trouble is that this is fairly difficult to detect. Static
> analysis misses collisions introduced by dlopen and dlsym.

Right, in this case you need a special-purpose analysis tool to
catch this, something that models the dynamic linker and ELF.
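
A very small sketch of what the static half of such a tool might look like,
in Python: given each object's DT_NEEDED list, its defined symbols and its
undefined references, report every reference whose symbol has more than one
definer in the program's dependency closure. The object names below are
stand-ins and not the real Fedora dependency graph, and the function names
are hypothetical. As noted above, nothing loaded through dlopen is visible
to this kind of analysis.

```python
def closure(root, needed):
    """Return every object reachable from root through DT_NEEDED edges."""
    seen, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(needed.get(obj, []))
    return seen


def ambiguous_references(root, needed, defines, references):
    """Map (referencing object, symbol) to the sorted list of competing definers."""
    objs = closure(root, needed)
    result = {}
    for obj in objs:
        for sym in references.get(obj, ()):
            definers = sorted(o for o in objs if sym in defines.get(o, ()))
            if len(definers) > 1:
                result[(obj, sym)] = definers
    return result


# Hypothetical graph: an application pulls in two JSON libraries, one of them
# only indirectly through an audio library that references the shared name.
needed = {
    "app": ["libaudio.so", "libjson_b.so"],
    "libaudio.so": ["libjson_a.so"],
}
defines = {
    "libjson_a.so": {"json_object_get_type", "json_a_only"},
    "libjson_b.so": {"json_object_get_type", "json_b_only"},
}
references = {"libaudio.so": {"json_object_get_type", "json_a_only"}}

report = ambiguous_references("app", needed, defines, references)
for (obj, sym), definers in sorted(report.items()):
    print(f"{obj}: {sym} is defined by {', '.join(definers)}")
```

This only says that a reference is ambiguous. Deciding which definition
actually wins requires modelling the lookup order as well.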
 
>> Warning! The answer changes if you link with `-l2 -l1' because it changes
>> the DT_NEEDED ordering which changes the order in which the graph
>> is traversed.
> 
> Thanks for the warning. I should record the order of DT_NEEDED
> elements in the database. I also need to reflect this in the
> elf_closure table in some way, to model the ld.so behavior more
> accurately.
> 
> If I understand things correctly, this unpredictability means that
> symbol collisions are always bad, even if they are working at
> present, because a change in the dependency graph could interpose a
> different definition in the future.

Symbol collisions are only bad if the colliding symbols do not implement
the same ABI and API. If they do implement the same ABI and API, then one
is a replacement function that is safe to interpose.

In your case it sounds like you have symbols in the global namespace
of all distribution shared libraries that collide and do different things.
That's dangerous. Someone needs to rename the function and break the API.
 
>> Beware that if you introduce cycles the problem becomes non-deterministic
>> and depends on where you break the cycle. We have several bugs open at
>> the moment against glibc to make the cycle breaking deterministic and to
>> enable testing for millions of permutations of N cycles to double check
>> that the present code does the right thing. We'd appreciate any contributions
>> in this area as the dynamic linker code is in my opinion in need of refactoring
>> and simplification to enable future development.
> 
> I'm a bit worried that we're facing scalability issues at the
> distribution level because we lack proper namespace management. I've
> seen that a number of libraries do not export internal symbols, which
> is good, but I think we either need some global namespace management
> (which will be difficult for APIs with fairly general names, such as
> MAPI), a different linking algorithm which provides better
> encapsulation (reducing accidental symbol interposition), or a more
> aggressive move towards symbol versioning as a namespace management
> tool. Or maybe something else altogether. This is a really
> complicated topic, and I don't think it can be considered in
> isolation from ld.so performance improvements.

Symbol versioning does not solve the problem in general either, since
you then need global version name management, and you need to fix
all applications to use versioning, which is a huge amount of work.
Even then you can still have problems if the projects lack the rigour
required to update their version maps.

A different linking algorithm isn't helpful either, because at static
link time you don't need the devel libraries for all dependent libraries,
and requiring them would make compiling anything much more complicated.

The only robust solution I see is a post-build tool that looks for
global namespace collisions and rejects the build if they exist. The
workaround might be to register your allowed symbol interpositions in
the spec file such that the post-build tool can use those to resolve
such allowances. Note that just stating that symbol X may be interposed
is not sufficient to make this system safe; you must say that symbol X
from SONAME Y may interpose.
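
To make the allowance rule concrete, here is a hedged Python sketch of the
check such a tool could run. Everything in it is an assumption made for
illustration: the function name, the input shape (SONAME mapped to exported
symbols), the made-up library names, and in particular the policy that a
collision is accepted only when at most one definer is left unregistered.

```python
def unapproved_collisions(exports, allowed):
    """Return {symbol: sorted SONAMEs} for collisions not covered by allowed.

    exports: SONAME -> set of exported symbol names.
    allowed: set of (symbol, SONAME) pairs registered as permitted interposers.
    """
    definers = {}
    for soname, symbols in exports.items():
        for sym in symbols:
            definers.setdefault(sym, []).append(soname)

    rejected = {}
    for sym, sonames in definers.items():
        if len(sonames) < 2:
            continue  # a single definer is not a collision
        # Assumed policy: at most one definer may be an unregistered "original".
        unregistered = [s for s in sonames if (sym, s) not in allowed]
        if len(unregistered) > 1:
            rejected[sym] = sorted(sonames)
    return rejected


# Hypothetical build contents: an allocator replacement that is registered,
# and two unrelated JSON libraries that merely share a function name.
exports = {
    "libc.so.6": {"malloc", "free"},
    "libfastalloc.so.1": {"malloc", "free"},
    "libjson_a.so": {"json_object_get_type"},
    "libjson_b.so": {"json_object_get_type"},
}
allowed = {("malloc", "libfastalloc.so.1"), ("free", "libfastalloc.so.1")}

rejected = unapproved_collisions(exports, allowed)
print(rejected)
```

Keying the allowances on the (symbol, SONAME) pair means that registering
malloc for one library does not silently approve a malloc exported by some
other, unrelated library.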

This is a very interesting discussion on a topic I'd not considered
before.

> This is not mere speculation, we already had symbol collisions which
> impacted customers.

I expected that your question was driven by some pragmatic need.

What's your next step?

Cheers,
Carlos.

