Bug 10238 - Gold linker does not resolve symbols using indirect dependencies
Status: RESOLVED WONTFIX
Alias: None
Product: binutils
Classification: Unclassified
Component: gold
Version: 2.22
Importance: P2 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
Reported: 2009-06-04 01:10 UTC by apratt
Modified: 2012-12-11 09:10 UTC

Build: 2.19.51
Attachments
Shell script test case, demonstrates the bug. Edit GOLDBINDIR before running. (894 bytes, text/plain)
2009-06-04 01:14 UTC, apratt

Description apratt 2009-06-04 01:10:53 UTC
The "gold" linker in the weekly build 2.19.51 does not resolve unresolved
symbols using indirect dependent shared libraries, while the default linker does
do so.

I found this on x86_64 Linux RHEL 4 U6. The default linker I have is GNU ld
version 2.15.92.0.2 20040927.

If you have an object file "top.o" that depends on a symbol exported by some
library, the default linker can resolve that undefined symbol even if you don't
mention the library on the link line, as long as you mention some library that
depends on the needed library. But the gold linker won't resolve that way and
reports an undefined symbol for the same command line.
Comment 1 apratt 2009-06-04 01:14:45 UTC
Created attachment 3981
Shell script test case, demonstrates the bug. Edit GOLDBINDIR before running.

This shell script demonstrates the bug by creating three source files,
compiling two of them into shared libraries, and compiling the third to use
those libraries. It runs the system linker first to confirm that the link
succeeds, then runs the gold linker to show that it fails. You have to edit the
script before you run it, to set the GOLDBINDIR variable to the path where the
gold "ld" executable appears.
Comment 2 Ian Lance Taylor 2009-06-04 16:17:09 UTC
I haven't tried your test case yet, but in general this is intended behaviour
for gold.  The GNU linker goes to considerable effort to replicate the search
path used by the dynamic linker.  This leads to issues of the program linker and
the dynamic linker getting out of synch and finding different libraries.  I
don't think that doing this searching in the program linker is necessary, and I
chose not to implement it in gold.  So, given that the difference is
intentional, can you explain whether this is an important feature that gold
should implement, and why?  Without a good reason, my inclination is to close
this bug report as WONTFIX.  Thanks.
Comment 3 apratt 2009-06-04 17:04:05 UTC
I had understood that gold was to be a drop-in replacement for the system
linkers on the platforms it supports, accepting the same inputs and performing
valid (though much faster) links on them. I reported this issue because I came
across it "in the wild": a link line that works with the host linker but not
with gold, a classic case of incompatibility.

This isn't quite the same as replicating the load-time library search, because
once the program links it forgets which library satisfied the undefined symbols.
If the linker and the loader find different libraries, the semantics are still
satisfied as long as the libraries that are found export the symbols you use.

I didn't construct this test case out of whole cloth: I encountered this
incompatibility while testing IBM Rational PurifyPlus against the gold linker.
That is, I found this incompatibility using a real-world, shipping product that
produces linker command lines that work with the default linker.

Of course it's possible to work around this by changing PurifyPlus, but that's
not how I understood the goals of the "gold" project: I thought users who adopt
gold would not be expected to go back to other tool vendors and ask for changes
to support it. That's why I filed this bug: to call attention to this missing
feature/incompatibility with the default GNU linker.
Comment 4 Ian Lance Taylor 2009-06-05 05:05:34 UTC
I should say: thanks for the bug report.  I appreciate it.

gold is not intended to be a precise replacement for the GNU linker.  The GNU
linker has too much history and is the result of too many odd decisions (many
made by myself).  gold is intended to be a 98% replacement, but there are a
number of known incompatibilities.  This is one of them.

That said, if there is a good reason that gold should implement this feature,
I'm willing to consider it.  But precise compatibility with the GNU linker is
not yet enough of a reason to convince me.

It's not correct that it doesn't matter which library the linker finds at link
time.  If the library has version information in it, then there can be trouble
if the dynamic linker finds a different library.  Also, if the symbol being
resolved is a data symbol, the size of the symbol will be copied into the
executable.  If a COPY reloc is created, and the size of the symbol does not
match the size of the symbol in the library found by the dynamic linker, the
program will fail at runtime.  These are uncommon issues, but real enough that
it's fairly important in practice that both linkers use the same search path.

Your script did not work for me until I added -Wl,-rpath,. to the GNU linker
command line.  I'm not sure why you didn't need that.

Do you happen to know why PurifyPlus uses this feature of the GNU linker?  It's
not a feature of other ELF linkers.
Comment 5 Mark Wielaard 2009-07-25 09:16:39 UTC
I also hit this and I must admit it is slightly confusing at first. We were
linking -lnss3 which works fine with GNU ld, but with GNU gold you suddenly get
lots of unresolved references to PR_ functions. If you know about this
bug/feature then it is easy to figure out you need to add -lnspr4 explicitly.
But otherwise (especially in a large final link command) it is a bit mystifying
why the "replacement linker" didn't work.

g++ -Wall -Werror  -g -O2 -Werror -fstack-protector-all -D_FORTIFY_SOURCE=2   
-o stap stap-main.o stap-parse.o stap-staptree.o stap-elaborate.o
stap-translate.o stap-tapsets.o stap-buildrun.o stap-loc2c.o stap-hash.o
stap-mdfour.o stap-cache.o stap-util.o stap-coveragedb.o stap-dwarf_wrappers.o
stap-tapset-been.o stap-tapset-procfs.o stap-tapset-timers.o
stap-tapset-perfmon.o stap-tapset-mark.o stap-tapset-itrace.o
stap-tapset-utrace.o stap-task_finder.o stap-dwflpp.o stap-rpm_finder.o
stap-modsign.o stap-nsscommon.o -Wl,--start-group -ldw -lebl -Wl,--end-group
-lelf -lsqlite3 -lrpm -lnss3 -ldl
/usr/local/binutils/bin/ld: stap-modsign.o: in function
check_cert_db_path(std::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&):/home/mark/src/systemtap/modsign.cxx:309: error:
undefined reference to 'PR_GetFileInfo'
[... and lots and lots more ...]

Spot the missing -lnspr4 in the above :)
Comment 6 apratt 2009-07-27 18:24:25 UTC
Reviewing this bug (due to this weekend's new comment), I notice I never
answered the question about why PurifyPlus is using this indirect-link feature.

I spoke with the developer who did the original Linux port, and it sounds like
this amounted to a workaround for a bug in the stock Linux ld. The
strong-vs-weak symbol resolution wasn't working as expected. Starting with
Red Hat 8's libc-2.3.2, some versioned symbols like pthread_cond_wait were made
"strong hidden" but the linker was still picking a "weak hidden" symbol from
another library instead. (The "other library" is a stub library we provide in
case you don't link with libthread.) The choice appeared to depend on the order
that tables were getting built inside the linker rather than the strong-vs-weak
attributes of the symbols. The only way we found of getting the linker to pick
the right symbols was to make it see the instrumented libc indirectly, not on
the command line. 

We've been doing it that way ever since, with no reason to re-investigate the
issue or change the solution, even though much has changed in Linux libraries
(and probably in the linker) since then. 

This is not to say that there are no other possible solutions, only to explain
why we started using the indirect symbol resolution feature at all.
Comment 7 matt rice 2009-09-16 06:27:36 UTC
I would like to comment that I appreciate the lack of this feature.
I once had a typo in a makefile variable which was reported to me by a user of
either an old gnu-ld or a proprietary linker which didn't support this feature
(I am not sure which). It bothered me at the time that I was unable to find a
way to disable this feature of gnu-ld so I could test it myself. If you do end
up adding it, please consider an option to disable it.

Mark Wielaard said:
>If you know about this
>bug/feature then it is easy to figure out you need to add -lnspr4 explicitly.

Maybe the error reporting could be expanded, to do this lookup when the linker
finds unresolved symbols, then report the indirect shared libraries providing
the symbols?
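
Until a linker reports this itself, the lookup can be approximated by hand. The following is only a rough sketch and not something gold provides: the function names defines_symbol and find_provider are hypothetical, it assumes nm from binutils is on the PATH, and the caller has to supply the candidate library paths.

```shell
#!/bin/sh
# Hypothetical helpers; not part of gold or binutils.

# Read `nm -D --defined-only` output on stdin and succeed if it lists symbol $1.
# A trailing @VERSION or @@VERSION suffix, if present, is ignored.
defines_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
        END { exit !found }
    '
}

# Print each shared library among the remaining arguments that defines symbol $1.
find_provider() {
    sym=$1
    shift
    for lib in "$@"; do
        if nm -D --defined-only "$lib" 2>/dev/null | defines_symbol "$sym"; then
            echo "$lib"
        fi
    done
}
```

Called as, say, find_provider PR_GetFileInfo followed by the paths of the libraries being linked and their dependencies (paths are whatever applies on your system), it prints the library that exports the symbol, which is the one to name with -l.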

Looking at the failures at
http://people.debian.org/~lucas/logs/2009/07/13-binutils-gold/list_failures.txt

it looks like this is likely to be the cause of a large percentage of the failures.
Comment 8 Frank Ch. Eigler 2009-10-12 16:10:40 UTC
IMO, ld's automagic searching is a good thing.  Asking a program to enumerate
all the indirect dependencies of shared libraries is a burden that they may not
be equipped to carry.  How do you envision this being automated?  readelf to get
DT_NEEDED entries, and from that synthesize -lMMM calls?

It is as if header files didn't #include their own dependencies, forcing main.c
to include a topologically sorted list of all headers.  It's possible but unfair.
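
For illustration, the readelf approach could be sketched roughly as below. This assumes readelf -d prints each DT_NEEDED entry in its usual "Shared library: [libNAME.so...]" form; the function name needed_to_lflags is hypothetical, and entries that do not follow the libNAME.so pattern are silently skipped.

```shell
#!/bin/sh
# Hypothetical helper: turn `readelf -d` output (on stdin) into -l flags,
# one per line, for every DT_NEEDED entry named libNAME.so[.version].
needed_to_lflags() {
    sed -n 's/.*(NEEDED).*Shared library: \[lib\(.*\)\.so[^]]*] *$/-l\1/p'
}
```

A build script might then run readelf -d on a library, pipe the result through needed_to_lflags, and append the flags to the link line.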
Comment 9 Ian Lance Taylor 2009-10-12 20:02:24 UTC
To be clear, gold does not require that you enumerate all indirect dependencies
of shared libraries.  gold will not complain if a shared library refers to a
function defined in some dependency of that shared library.

What gold requires is that you enumerate all direct dependencies of the program
itself.  If your program calls foo(), then you must explicitly link against some
library which defines foo().  The GNU linker permits foo() to be defined
indirectly, by a dependency of some shared library which you do explicitly link
against.  gold does not search those indirect dependencies for symbol definitions.
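
The distinction can be sketched with a small script. Everything here is an assumption for illustration: the file and library names (dep, direct, top) are hypothetical and not taken from the attached test case, and the script assumes a gcc that accepts -fuse-ld=bfd and -fuse-ld=gold. The program calls foo() itself but links only against libdirect.so, which in turn depends on libdep.so where foo() is defined.

```shell
#!/bin/sh
# Hypothetical names throughout; this is not the attachment from this bug.

# Write three tiny C sources into directory $1.
write_sources() {
    dir=$1
    # libdep.so: defines foo()
    printf 'int foo(void) { return 0; }\n' > "$dir/dep.c"
    # libdirect.so: calls foo(), so it will carry a DT_NEEDED entry for libdep.so
    printf 'int foo(void);\nint bar(void) { return foo(); }\n' > "$dir/direct.c"
    # program: calls bar() and also calls foo() directly
    printf 'int foo(void);\nint bar(void);\nint main(void) { return foo() + bar(); }\n' > "$dir/top.c"
}

# Build the two shared libraries in directory $1.
build_libs() {
    dir=$1
    gcc -shared -fPIC -o "$dir/libdep.so" "$dir/dep.c" &&
    gcc -shared -fPIC -o "$dir/libdirect.so" "$dir/direct.c" -L"$dir" -ldep
}

# Link the program against libdirect.so only, with linker $2 (bfd or gold).
# libdep.so is deliberately left off the command line.
link_top() {
    dir=$1
    linker=$2
    gcc -fuse-ld="$linker" -o "$dir/top" "$dir/top.c" \
        -L"$dir" -ldirect -Wl,-rpath,"$dir"
}
```

Running write_sources, build_libs and then link_top with gold should show the undefined reference to foo discussed here; adding -ldep to the link line is the fix. What link_top does with bfd depends on the GNU ld version and its defaults, so no particular result is claimed for it.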
Comment 10 Ian Lance Taylor 2009-10-12 20:15:38 UTC
Carrying on, it's true that gold's behaviour does impose a burden when using
shared libraries which come in bundles.  If a package provides a shared library
which includes other shared libraries, and the interface of the package is
intended to be the union of all symbols defined in all those shared libraries,
then gold's behaviour is suboptimal.  So, I guess the question is: how common is
that?
Comment 11 apratt 2009-10-12 22:25:24 UTC
Pretty common, based on the link in comment #7. The vast majority of those
failures are due to unresolved symbols, and it's possible many (most? virtually
all?) of them are due to programs expecting the old behavior: symbol resolution
via indirect shared-library dependencies. 

There appears to have been a specific design decision NOT to support indirect
symbol resolution in gold. While this can seem "more correct" from one
perspective, I think gold could end up like a new compiler that is so strictly
standards-compliant it doesn't accept real-world, existing source code. 

Since the linker is a system-wide choice, a user or developer will be reluctant
to install gold if there's a good chance downloaded source projects won't work
with it. You could be creating the best linker nobody uses.
Comment 12 Ian Lance Taylor 2009-10-12 22:35:23 UTC
Comment #7 does not necessarily indicate that there are a lot of packages which
provide a union-of-defined-symbols interface.  What it indicates is that a lot
of people think that linking against the KDE or GNOME libraries also links
against the X11 libraries.

I didn't make this decision on the basis of an abstract standard of correctness.
 In areas like linker scripts I've adopted the GNU linker behaviour even when it
seems abstractly wrong.  I made this decision because the code in the GNU linker
which does this is ugly and fragile.  It was developed over many years in
response to changes in the dynamic linker.  In order to work correctly in all
cases it must precisely duplicate the dynamic linker, but the dynamic linker
changes over time.

In all cases that I am currently aware of, the fix to use gold is to add a -l
option or two.  I think most package maintainers will be willing to do that once
they are aware of the issue.  Obviously I could be wrong, either about the cause
of the problem or the willingness of package maintainers to change.  However, I
would like to act on the basis of real data rather than speculation.
Comment 13 Frank Ch. Eigler 2009-10-12 23:53:48 UTC
I'm confused about whether gold's lack of DT_NEEDED resolution is
intended to affect only pure-indirect or merely mixed-direct-indirect
dependencies.  Specifically:

liba { int a() { return b(); } }
libb { int b() { return 0; } }

main() { a() }   -la   # libb a pure indirect dependency
- versus -
main() { b() }   -la   # libb a direct dependency

If the latter, I'm more sympathetic to the desire to have a program state
its own direct dependencies.

If the former, I'm more sympathetic to a program not having to know all the
indirect dependencies of all of its shared libraries.
Comment 14 apratt 2009-10-13 00:27:47 UTC
It's the second one, with a variation. I wouldn't expect your second example to
link successfully as written. If you change it so main() calls both a() and b(),
it will link with today's GNU linker. That's because when liba comes in (thanks
to the call to a), the symbols from its DT_NEEDED libraries are also visible for
resolving symbols used in main(), like b. Not so with gold.

On the one hand you can say that a link line "should" express the direct,
first-order dependencies of the program being linked. But with today's GNU
linker, a project's link line does not have to do so. That's what's at issue.
Gold risks suffering upon release with this kind of review: "It's great but
there are a bunch of existing projects for which it doesn't work, so if you make
it your system linker you risk having to tinker with projects you download and
build."

Over time, I'm sure projects would adapt - at least, those which are being
maintained. But user resistance at first release could be a problem for
widespread adoption. Of course, whether that's a problem really depends on your
goals.

I'm going to stop advocating either way because I don't really have a dog in
this hunt. For our part, I'm sure we can make PurifyPlus work. I realize the
current GNU linker behavior is ill-specified and variable, and it's probably
hard to intentionally match any of the organically-evolved implementations. But
I would worry about having built-in barriers to adoption out of the gate. I hope
Ian can "collect real data rather than speculation" before the initial (wide)
release, and version 1.0 gets the kind of reception he and the other project
members desire.
Comment 15 Ian Lance Taylor 2009-10-13 00:54:46 UTC
As far as I am concerned, gold has been released.  The question now is what
changes distros will want to see before picking it up as the default linker for
those targets which it supports.
Comment 16 Roland McGrath 2009-10-14 22:18:29 UTC
Please make gold accept and ignore the --no-add-needed switch so there is a
single command line that has the same semantics for both ld implementations.
Comment 17 Ian Lance Taylor 2011-07-09 00:24:38 UTC
This bug has been silent for a year and a half and I'm not seeing any increased pushback on this issue.  I'm going to close it.
Comment 18 mattijs.janssens 2012-12-10 15:59:09 UTC
This feature is causing us quite a headache. We are developing an open source application (OpenFOAM) which is chock full of models and models of models which are in separate libraries. We have always used the facility of indirect linkage so a library needs to link in only those libraries it directly calls, and not those that those libraries need. Works great.

If we want to use gold we suddenly need to specify all the indirectly used libraries. Why should a user of our libraries need to know that e.g. the turbulence library internally depends on the liquid properties library (and about 10 more)?

From my point of view: I have gone through all my dependencies so as not to link in more than needed, and have 'told' the linker so with --copy-dt-needed-entries, --no-as-needed. But it seems to ignore these now.

Below is a modification of the test script which demonstrates the indirect linking problem at the shared library level.

My question: can we please keep/have an option to tell the linker to do indirect linkage?

Thanks,

Mattijs


# Build libl3.so with no dependencies
echo 'l3() { ; }' > l3.c
gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl3.so l3.c

# Build libl2.so  that depends on libl3.so
echo 'l2() { l3(); }' > l2.c
gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl2.so l2.c -L. -ll3

# Build libl1.so that depends on libl2.so
echo 'l1() { l2(); }' > l1.c
gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2

# Build main source file which depends on l1 only (so indirectly on l2)
echo 'main() { l1(); }' > top.c
gcc top.c -L. -ll1
Comment 19 H.J. Lu 2012-12-10 16:46:15 UTC
(In reply to comment #18)
> # Build libl3.so with no dependencies
> echo 'l3() { ; }' > l3.c
> gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl3.so l3.c
> 
> # Build libl2.so  that depends on libl3.so
> echo 'l2() { l3(); }' > l2.c
> gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl2.so l2.c -L. -ll3
> 
> # Build libl1.so that depends on libl2.so
> echo 'l1() { l2(); }' > l1.c
> gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2
> 
> # Build main source file which depends on l1 only (so indirectly on l2)
> echo 'main() { l1(); }' > top.c
> gcc top.c -L. -ll1

Please re-try this with bfd linker on trunk.  I just fixed a
--copy-dt-needed-entries -shared bug:

http://sourceware.org/bugzilla/show_bug.cgi?id=14915
Comment 20 Ian Lance Taylor 2012-12-10 17:22:17 UTC
When using gold you need to list the shared libraries that define symbols that you refer to directly.  You do not need to list libraries that define symbols that your shared libraries refer to.

> We have always used the facility of indirect linkage
> so a library needs to link in only those libraries it directly calls, and not
> those that those libraries need.

Yes, that is how gold works.

If it's not working for you, then something else is going on.

One possibility is this: gold will warn about undefined symbols in shared libraries for which gold has seen all the DT_NEEDED entries.  So if your shared libraries rely on picking up symbols from shared libraries that they do not explicitly depend on, you will get an undefined symbol error.  You can avoid that by using the --allow-shlib-undefined option.

Otherwise, you'll need to provide more details.
Comment 21 mattijs.janssens 2012-12-11 09:10:30 UTC
When I run that little test code above with 

GNU gold (GNU Binutils for Ubuntu 2.22) 1.11

I get:

gcc -Xlinker --no-as-needed -Xlinker  --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2
/usr/bin/ld: error: --copy-dt-needed-entries is not supported but is required for libl3.so in ./libl2.so
collect2: ld returned 1 exit status


So: l1 depends directly only on l2 and indirectly on l3. How do I get this to work?

(I'll try the '--allow-shlib-undefined' but find it a bit strange - what is undefined? I've specified all the link references)