Summary: | Gold linker does not resolve symbols using indirect dependencies | ||
---|---|---|---|
Product: | binutils | Reporter: | apratt |
Component: | gold | Assignee: | Ian Lance Taylor <ian> |
Status: | RESOLVED WONTFIX | ||
Severity: | normal | CC: | bug-binutils, fche, kirill, mattijs.janssens, mnowak, ratmice+bugzilla |
Priority: | P2 | ||
Version: | 2.22 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | 2.19.51 | Last reconfirmed: | |
Attachments: | Shell script test case, demonstrates the bug. Edit GOLDBINDIR before running. |
Description
apratt
2009-06-04 01:10:53 UTC
Created attachment 3981
Shell script test case, demonstrates the bug. Edit GOLDBINDIR before running.
This shell script demonstrates the bug by creating three source files,
compiling two of them into shared libraries, and compiling the third to use
those libraries. It runs the system linker first to confirm that the link
succeeds, then runs the gold linker to show that it fails. You have to edit the
script before you run it, to set the GOLDBINDIR variable to the path where the
gold "ld" executable appears.
I haven't tried your test case yet, but in general this is intended behaviour for gold. The GNU linker goes to considerable effort to replicate the search path used by the dynamic linker. This leads to issues of the program linker and the dynamic linker getting out of synch and finding different libraries. I don't think that doing this searching in the program linker is necessary, and I chose not to implement it in gold.
So, given that the difference is intentional, can you explain whether this is an important feature that gold should implement, and why? Without a good reason, my inclination is to close this bug report as WONTFIX. Thanks.
I had understood that gold was to be a drop-in replacement for the system linkers on the platforms it supports, accepting the same inputs and performing valid (though much faster) links on them. I reported this issue because I came across it "in the wild": a link line that works with the host linker but not with gold, a classic case of incompatibility.
This isn't quite the same as replicating the load-time library search, because once the program links it forgets which library satisfied the undefined symbols. If the linker and the loader find different libraries, the semantics are still satisfied as long as the libraries that are found export the symbols you use.
I didn't construct this test case out of whole cloth: I encountered this incompatibility while testing IBM Rational PurifyPlus against the gold linker. That is, I found this incompatibility using a real-world, shipping product that produces linker command lines that work with the default linker.
Of course it's possible to work around this by changing PurifyPlus, but that's not how I understood the goals of the "gold" project: I thought users who adopt gold would not be expected to go back to other tool vendors and ask for changes to support it. That's why I filed this bug: to call attention to this missing feature/incompatibility with the default GNU linker.
I should say: thanks for the bug report. I appreciate it.
gold is not intended to be a precise replacement for the GNU linker. The GNU linker has too much history and is the result of too many odd decisions (many made by myself). gold is intended to be a 98% replacement, but there are a number of known incompatibilities. This is one of them.
That said, if there is a good reason that gold should implement this feature, I'm willing to consider it. But precise compatibility with the GNU linker is not yet enough of a reason to convince me.
It's not correct that it doesn't matter which library the linker finds at link time. If the library has version information in it, then there can be trouble if the dynamic linker finds a different library. Also, if the symbol being resolved is a data symbol, the size of the symbol will be copied into the executable. If a COPY reloc is created, and the size of the symbol does not match the size of the symbol in the library found by the dynamic linker, the program will fail at runtime. These are uncommon issues, but real enough that it's fairly important in practice that both linkers use the same search path.
Your script did not work for me until I added -Wl,-rpath,. to the GNU linker command line. I'm not sure why you didn't need that.
Do you happen to know why PurifyPlus uses this feature of the GNU linker? It's not a feature of other ELF linkers.
I also hit this and I must admit it is slightly confusing at first. We were linking -lnss3 which works fine with GNU ld, but with GNU gold you suddenly get lots of unresolved references to PR_ functions. If you know about this bug/feature then it is easy to figure out you need to add -lnspr4 explicitly. But otherwise (especially in a large final link command) it is a bit mystifying why the "replacement linker" didn't work.
g++ -Wall -Werror -g -O2 -Werror -fstack-protector-all -D_FORTIFY_SOURCE=2 -o stap stap-main.o stap-parse.o stap-staptree.o stap-elaborate.o stap-translate.o stap-tapsets.o stap-buildrun.o stap-loc2c.o stap-hash.o stap-mdfour.o stap-cache.o stap-util.o stap-coveragedb.o stap-dwarf_wrappers.o stap-tapset-been.o stap-tapset-procfs.o stap-tapset-timers.o stap-tapset-perfmon.o stap-tapset-mark.o stap-tapset-itrace.o stap-tapset-utrace.o stap-task_finder.o stap-dwflpp.o stap-rpm_finder.o stap-modsign.o stap-nsscommon.o -Wl,--start-group -ldw -lebl -Wl,--end-group -lelf -lsqlite3 -lrpm -lnss3 -ldl
/usr/local/binutils/bin/ld: stap-modsign.o: in function check_cert_db_path(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&):/home/mark/src/systemtap/modsign.cxx:309: error: undefined reference to 'PR_GetFileInfo'
[... and lots and lots more ...]
Spot the missing -lnspr4 in the above :)
Reviewing this bug (due to this weekend's new comment), I notice I never answered the question about why PurifyPlus is using this indirect-link feature.
I spoke with the developer who did the original Linux port, and it sounds like this amounted to a workaround for a bug in the stock Linux ld. The strong-vs-weak symbol resolution wasn't working as expected. Starting with Red Hat 8's libc-2.3.2, some versioned symbols like pthread_cond_wait were made "strong hidden" but the linker was still picking a "weak hidden" symbol from another library instead. (The "other library" is a stub library we provide in case you don't link with libthread.) The choice appeared to depend on the order that tables were getting built inside the linker rather than the strong-vs-weak attributes of the symbols.
The only way we found of getting the linker to pick the right symbols was to make it see the instrumented libc indirectly, not on the command line.
We've been doing it that way ever since, with no reason to re-investigate the issue or change the solution, even though much has changed in Linux libraries (and probably in the linker) since then. This is not to say that there are no other possible solutions, only to explain why we started using the indirect symbol resolution feature at all.
I would like to comment that I appreciate the lack of this feature. I once had a typo in a makefile variable, which was reported to me by a user of either an old GNU ld or a proprietary linker that didn't support this feature (I am not sure which). It bothered me at the time that I was unable to find a way to disable this feature of GNU ld so I could test it myself. If you do end up adding it, please consider an option to disable it.
Mark Wielaard said:
> If you know about this
> bug/feature then it is easy to figure out you need to add -lnspr4 explicitly.
Maybe the error reporting could be expanded to do this lookup when the linker finds unresolved symbols, and then report the indirect shared libraries providing the symbols?
Looking at the failures at http://people.debian.org/~lucas/logs/2009/07/13-binutils-gold/list_failures.txt it looks like this is likely to be the cause of a large percentage of the failures.
IMO, ld's automagic searching is a good thing. Asking a program to enumerate all the indirect dependencies of shared libraries is a burden that they may not be equipped to carry. How do you envision this being automated? Use readelf to get the DT_NEEDED entries, and from that synthesize -lMMM options? It is as if header files didn't #include their own dependencies, forcing main.c to include a topologically sorted list of all headers. It's possible but unfair.
To be clear, gold does not require that you enumerate all indirect dependencies of shared libraries. gold will not complain if a shared library refers to a function defined in some dependency of that shared library.
What gold requires is that you enumerate all direct dependencies of the program itself. If your program calls foo(), then you must explicitly link against some library which defines foo(). The GNU linker permits foo() to be defined indirectly, by a dependency of some shared library which you do explicitly link against. gold does not search those indirect dependencies for symbol definitions.
Carrying on, it's true that gold's behaviour does impose a burden when using shared libraries which come in bundles. If a package provides a shared library which includes other shared libraries, and the interface of the package is intended to be the union of all symbols defined in all those shared libraries, then gold's behaviour is suboptimal.
So, I guess the question is: how common is that?
Pretty common, based on the link in comment #7. The vast majority of those failures are due to unresolved symbols, and it's possible many (most? virtually all?) of them are due to programs expecting the old behavior: symbol resolution via indirect shared-library dependencies.
There appears to have been a specific design decision NOT to support indirect symbol resolution in gold. While this can seem "more correct" from one perspective, I think gold could end up like a new compiler that is so strictly standards-compliant it doesn't accept real-world, existing source code. Since the linker is a system-wide choice, a user or developer will be reluctant to install gold if there's a good chance downloaded source projects won't work with it. You could be creating the best linker nobody uses.
Comment #7 does not necessarily indicate that there are a lot of packages which provide a union-of-defined-symbols interface. What it indicates is that a lot of people think that linking against the KDE or GNOME libraries also links against the X11 libraries.
I didn't make this decision on the basis of an abstract standard of correctness.
In areas like linker scripts I've adopted the GNU linker behaviour even when it seems abstractly wrong. I made this decision because the code in the GNU linker which does this is ugly and fragile. It was developed over many years in response to changes in the dynamic linker. In order to work correctly in all cases it must precisely duplicate the dynamic linker, but the dynamic linker changes over time.
In all cases that I am currently aware of, the fix to use gold is to add a -l option or two. I think most package maintainers will be willing to do that once they are aware of the issue.
Obviously I could be wrong, either about the cause of the problem or the willingness of package maintainers to change. However, I would like to act on the basis of real data rather than speculation.
I'm confused about whether gold's lack of DT_NEEDED resolution is intended to affect only pure-indirect or merely mixed-direct-indirect dependencies. Specifically:
liba { int a() { return b(); } }
libb { int b() { return 0; } }
main() { a() } -la # libb a pure indirect dependency
- versus -
main() { b() } -la # libb a direct dependency
If the latter, I'm more sympathetic to the desire to have a program state its own direct dependencies. If the former, I'm more sympathetic to a program not having to know all the indirect dependencies of all of its shared libraries.
It's the second one, with a variation. I wouldn't expect your second example to link successfully as written. If you change it so main() calls both a() and b(), it will link with today's GNU linker. That's because when liba comes in (thanks to the call to a), the symbols from its DT_NEEDED libraries are also visible for resolving symbols used in main(), like b. Not so with gold.
On the one hand you can say that a link line "should" express the direct, first-order dependencies of the program being linked. But with today's GNU linker, a project's link line does not have to do so. That's what's at issue.
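For anyone who wants to reproduce the distinction described above, here is a minimal sketch. It is not the attached test case. The file names, the symbol names, and the use of gcc's -B option to pick up the gold "ld" from GOLDBINDIR are assumptions made for illustration. The script only reports whether each link succeeds; the outcome depends on the linker versions installed, and newer GNU ld releases may also reject the first link unless --copy-dt-needed-entries is passed.

```shell
#!/bin/sh
# Illustrative sketch only: names below are hypothetical, not from the attachment.
GOLDBINDIR=${GOLDBINDIR:-/path/to/gold/bin}  # directory containing gold's "ld"
workdir=$(mktemp -d)
cd "$workdir"

# libb defines b(); liba defines a() and depends on libb.
echo 'int b(void) { return 0; }' > b.c
echo 'int b(void); int a(void) { return b(); }' > a.c
# main calls a() and b() directly, but the link line names only -la.
echo 'int a(void); int b(void); int main(void) { return a() + b(); }' > main.c

try_link() {
    # $1 = label for the report; remaining arguments = extra gcc options
    label=$1; shift
    if gcc "$@" -o "main-$label" main.c -L. -la -Wl,-rpath,. 2>"$label.log"; then
        echo "$label: link succeeded"
    else
        echo "$label: link failed (see $workdir/$label.log)"
    fi
}

if command -v gcc >/dev/null 2>&1; then
    gcc -shared -fPIC -o libb.so b.c
    gcc -shared -fPIC -o liba.so a.c -L. -lb
    try_link system
    if [ -x "$GOLDBINDIR/ld" ]; then
        try_link gold -B"$GOLDBINDIR/"
    else
        echo "gold: skipped, no ld found in $GOLDBINDIR"
    fi
else
    echo "gcc not found; source files were written to $workdir"
fi
```

Adding -lb to the final link line is the "add a -l option or two" fix mentioned earlier in the thread.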
Gold risks suffering upon release with this kind of review: "It's great but there are a bunch of existing projects for which it doesn't work, so if you make it your system linker you risk having to tinker with projects you download and build." Over time, I'm sure projects would adapt - at least, those which are being maintained. But user resistance at first release could be a problem for widespread adoption. Of course, whether that's a problem really depends on your goals.
I'm going to stop advocating either way because I don't really have a dog in this hunt. For our part, I'm sure we can make PurifyPlus work. I realize the current GNU linker behavior is ill-specified and variable, and it's probably hard to intentionally match any of the organically-evolved implementations. But I would worry about having built-in barriers to adoption out of the gate. I hope Ian can "collect real data rather than speculation" before the initial (wide) release, and version 1.0 gets the kind of reception he and the other project members desire.
As far as I am concerned, gold has been released. The question now is what changes distros will want to see before picking it up as the default linker for those targets which it supports.
Please make gold accept and ignore the --no-add-needed switch so there is a single command line that has the same semantics for both ld implementations.
This bug has been silent for a year and a half and I'm not seeing any increased pushback on this issue. I'm going to close it.
This feature is causing us quite a headache. We are developing an open source application (OpenFOAM) which is chock full of models and models of models which are in separate libraries. We have always used the facility of indirect linkage so a library needs to link in only those libraries it directly calls, and not those that those libraries need. Works great. If we want to use gold we suddenly need to specify all the indirectly used libraries.
Why should a user of our libraries need to know that e.g. the turbulence library internally depends on the liquid properties library (and about 10 more)?
From my point of view: I have gone through all my dependencies, do not link in more than needed, and have 'told' the linker so with --copy-dt-needed-entries, --no-as-needed. But it seems to ignore these now.
Below is a modification of the test script which demonstrates the indirect linking problem at the shared library level.
My question: can we please keep/have an option to tell the linker to do indirect linkage?
Thanks,
Mattijs
# Build libl3.so with no dependents
echo 'l3() { ; }' > l3.c
gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl3.so l3.c
# Build libl2.so that depends on libl3.so
echo 'l2() { l3(); }' > l2.c
gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl2.so l2.c -L. -ll3
# Build libl1.so that depends on libl2.so
echo 'l1() { l2(); }' > l1.c
gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2
# Build main source file which depends on l1 only (so indirectly on l2)
echo 'main() { l1(); }' > top.c
gcc top.c -L. -ll1
(In reply to comment #18)
> # Build libl3.so with no dependents
> echo 'l3() { ; }' > l3.c
> gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl3.so l3.c
>
> # Build libl2.so that depends on libl3.so
> echo 'l2() { l3(); }' > l2.c
> gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl2.so l2.c -L. -ll3
>
> # Build libl1.so that depends on libl2.so
> echo 'l1() { l2(); }' > l1.c
> gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker
> -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2
>
> # Build main source file which depends on l1 only (so indirectly on l2)
> echo 'main() { l1(); }' > top.c
> gcc top.c -L. -ll1
Please re-try this with bfd linker on trunk. I just fixed a --copy-dt-needed-entries -shared bug:
http://sourceware.org/bugzilla/show_bug.cgi?id=14915
When using gold you need to list the shared libraries that define symbols that you refer to directly. You do not need to list libraries that define symbols that your shared libraries refer to.
> We have always used the facility of indirect linkage
> so a library needs to link in only those libraries it directly calls, and not
> those that those libraries need.
Yes, that is how gold works.
If it's not working for you, then something else is going on.
One possibility is this: gold will warn about undefined symbols in shared libraries for which gold has seen all the DT_NEEDED entries. So if your shared libraries rely on picking up symbols from shared libraries that they do not explicitly depend on, you will get an undefined symbol error. You can avoid that by using the --allow-shlib-undefined option.
Otherwise, you'll need to provide more details.
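If more detail is needed on which libraries a given shared object actually records as dependencies, the readelf approach floated earlier in this thread can be sketched roughly as follows. The function name needed_to_lflags is made up for illustration and is not part of binutils. Mapping a soname such as libc.so.6 to -lc assumes the matching development symlink is on the library search path, which is not guaranteed.

```shell
# needed_to_lflags: hypothetical helper, not part of binutils.
# Reads `readelf -d` output on stdin and prints one -l option per
# DT_NEEDED entry whose soname looks like libNAME.so[.VERSION].
# Entries that do not start with "lib" are skipped.
needed_to_lflags() {
    sed -n 's/.*(NEEDED).*Shared library: \[lib\([^]]*\)\.so[^]]*\].*/-l\1/p' |
        paste -s -d ' ' -
}

# Example usage (libl1.so is the library built by the test script above):
#   readelf -d libl1.so | needed_to_lflags
```

This only lists the direct DT_NEEDED entries of one library; following them recursively would need a loop over the libraries it names.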
When I run that little test code above with GNU gold (GNU Binutils for Ubuntu 2.22) 1.11 I get:
gcc -Xlinker --no-as-needed -Xlinker --copy-dt-needed-entries -Xlinker -rpath=. -shared -fPIC -o libl1.so l1.c -L. -ll2
/usr/bin/ld: error: --copy-dt-needed-entries is not supported but is required for libl3.so in ./libl2.so
collect2: ld returned 1 exit status
So: l1 depends directly only on l2 and indirectly on l3. How do I get this to work?
(I'll try the '--allow-shlib-undefined' but find it a bit strange - what is undefined? I've specified all the link references)