Created attachment 5840 [details]
Test case

The glibc dynamic linker behaves unpredictably when asked to look up a symbol in the base version.

The GNU linker manual says:

    __asm__(".symver original_foo,foo@");

foo@ represents the symbol foo bound to the unspecified base version of the symbol.

The apparent intent is to support changing a library from having no versions to one having versions. The idea is to put the oldest version of the symbol into the base version, and then to add new symbols to later versions. Executables which have no version for the symbol should then presumably link against the base version.

However, because the glibc dynamic linker behaves unpredictably, this behaviour is useless.

I have attached a test case. The tests ver_test_12 and ver_test_13 are identical except for the name of the function. The function is named 't1' in ver_test_12 and 'f1' in ver_test_13. When I run "make", ver_test_12 fails and ver_test_13 passes. I've gotten this result on Ubuntu Lucid using eglibc 2.11.1 and on Fedora Core 14 using glibc 2.13. I used the default GNU ld in both cases.

Currently the gold linker rejects this usage because the behaviour is unpredictable. Using gold instead of GNU ld for the test case will give an error building ver_test_12b.so or ver_test_13b.so:

    ld: error: symbol t1 has undefined version

However, the developers of the FUSE library are protesting, saying that the behaviour is documented in the GNU ld manual, and that it does happen to work for them.

The unpredictable behaviour occurs because of how check_match, nested within do_lookup_x in elf/dl-lookup.c, handles symbols with no versions. The behaviour depends on the order in which the symbols are seen in the hash table.

In order to know what the linkers should do in this case, I think we need to understand what the glibc developers believe is the correct behaviour here. Is the current behaviour a bug which should be fixed in glibc? Or is gold correct in refusing to create a shared library like this?

There is some additional background at
http://sourceforge.net/mailarchive/forum.php?thread_name=m3wrq48tso.fsf%40pepe.airs.com&forum_name=fuse-devel
http://sourceware.org/bugzilla/show_bug.cgi?id=12261
What is the current status of this bug? Berkeley DB still fails with "checking whether the C compiler works... no", as you described here [1]. This is with glibc version 2.19.

[1] https://blog.flameeyes.eu/2011/06/gold-readiness-obstacle-1-berkeley-db
We need the DSOs to reproduce this.

With GNU ld version 2.25-17.fc23, ver_test_12b.so contains this:

Version definition section [ 6] '.gnu.version_d' contains 3 entries:
 Addr: 0x0000000000000538  Offset: 0x000538  Link to section: [ 4] '.dynstr'
  000000: Version: 1  Flags: BASE  Index: 1  Cnt: 1  Name: ver_test_12b.so
  0x001c: Version: 1  Flags: none  Index: 2  Cnt: 1  Name: VER1
  0x0038: Version: 1  Flags: none  Index: 3  Cnt: 2  Name: VER2
  0x0054: Parent 1: VER1

ver_test_13b.so looks like this:

Version definition section [ 6] '.gnu.version_d' contains 3 entries:
 Addr: 0x0000000000000538  Offset: 0x000538  Link to section: [ 4] '.dynstr'
  000000: Version: 1  Flags: BASE  Index: 1  Cnt: 1  Name: ver_test_13b.so
  0x001c: Version: 1  Flags: none  Index: 2  Cnt: 1  Name: VER1
  0x0038: Version: 1  Flags: none  Index: 3  Cnt: 2  Name: VER2
  0x0054: Parent 1: VER1

This means that the base version is VER1 in both cases.

The closest thing we have to a specification for our variant of ELF symbol versioning appears to be this:

<https://www.akkadia.org/drepper/symbol-versioning>

“In case only the object file with the reference does not use versioning but the object with the definition does, then the reference only matches the base definition. The base definition is the one with index numbers 1 and 2 (1 is the unspecified name, 2 is the name given later to the baseline of symbols once the library started using symbol versioning).”

It's not clear to me if a binary is valid if it provides conflicting definitions for version index 1 and 2. Or, put differently, the binutils example appears to be misleading.

With the DSOs generated by Fedora 23's ld.bfd, I cannot reproduce the difference in behavior between the two test cases. Perhaps the static linker has started to sort symbols with the same name in a predictable manner.

I came across this old bug because I was trying to file a bug about the dlsym behavior. elf/dl-lookup.c currently looks like this:

    /* No specific version is selected.  There are two ways we
       can got here:

       - a binary which does not include versioning information
       is loaded

       - dlsym() instead of dlvsym() is used to get a symbol which
       might exist in more than one form

       If the library does not provide symbol version information
       there is no problem at all: we simply use the symbol if it
       is defined.

       These two lookups need to be handled differently if the
       library defines versions.  In the case of the old
       unversioned application the oldest (default) version
       should be used.  In case of a dlsym() call the latest and
       public interface should be returned.  */
    if (verstab != NULL)
      {
        if ((verstab[symidx] & 0x7fff)
            >= ((flags & DL_LOOKUP_RETURN_NEWEST) ? 2 : 3))
          {
            /* Don't accept hidden symbols.  */
            if ((verstab[symidx] & 0x8000) == 0
                && (*num_versions)++ == 0)
              /* No version so far.  */
              *versioned_sym = sym;

            return NULL;
          }
      }

This definitely causes weird behavior for dlsym lookups of _sys_errlist in libc.so.6. I currently get sys_errlist@GLIBC_2.4, which is neither the base version nor the default version.
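To make the order dependence in the branch quoted above concrete, here is a minimal stand-alone model in C. It is a sketch, not glibc code: model_unversioned_lookup and MODEL_RETURN_NEWEST are hypothetical names, the array stands in for the version indices of same-named symbols in the order the hash chain yields them, and the final "exactly one versioned candidate" fallback is an assumption about what do_lookup_x does after the chain is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DL_LOOKUP_RETURN_NEWEST. */
#define MODEL_RETURN_NEWEST 1

/* Walk the same-named candidates in chain order and return the version
   index of the one an unversioned lookup would pick, or -1 if none.
   Each entry is a raw versym value: low 15 bits are the version index,
   bit 0x8000 marks a hidden symbol.  */
static int
model_unversioned_lookup (const uint16_t *versyms, size_t n, int flags)
{
  assert (versyms != NULL || n == 0);

  unsigned threshold = (flags & MODEL_RETURN_NEWEST) ? 2 : 3;
  int num_versions = 0;
  int versioned = -1;

  for (size_t i = 0; i < n; i++)
    {
      unsigned ndx = versyms[i] & 0x7fff;

      if (ndx >= threshold)
        {
          /* Mirrors the quoted branch: remember the first non-hidden
             versioned candidate, then keep walking the chain.  */
          if ((versyms[i] & 0x8000) == 0 && num_versions++ == 0)
            versioned = (int) ndx;
          continue;
        }

      /* Below the threshold: the first such candidate wins.  */
      return (int) ndx;
    }

  /* Assumption: a sole versioned candidate is accepted as a fallback.  */
  return num_versions == 1 ? versioned : -1;
}
```

In this model, a chain that yields index 1 before index 2 resolves to the base definition, while the reverse order resolves to index 2 (VER1 in the DSOs above), so the result for an unversioned binary depends only on chain order.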
Created attachment 9761 [details]
Symbol overwrite example

I found a different manifestation of this unpredictability w.r.t. the base version of symbols. I have attached an example program setup in which symbols are overwritten by the base version even though a library specifies that it needs a specific version.

The example contains two libraries (A and B) that offer the symbol f. A does not define any version, but B does define it in version B_1.0. Then there are two libraries (AU and BU) that use these libraries, each using only one of them, and they offer the fAU and fBU functions. Then a program 'p' links both libraries AU and BU (see the makefile for more information).

At run time, the symbol f that is requested by BU with version B_1.0 gets matched with the base version one provided by library A. Even though the symbol with the adequate version is also available, the base version is returned.

Here are the relevant symbol entries when doing objdump -TC:

libA.so:  0000000000000705 g    DF .text  0000000000000012  Base   f
libB.so:  0000000000000765 g    DF .text  0000000000000012  B_1.0  f
libAU.so: 0000000000000000      DF *UND*  0000000000000000         f
libBU.so: 0000000000000000      DF *UND*  0000000000000000  B_1.0  f

It also happens the other way around: the f base version requested by AU is overwritten by f@@B_1.0 if the link order is changed (-lBU -lAU). While this latter behaviour is also unexpected, matching the base version to a symbol containing a specific version sounds definitely unexpected to me.

If you compile and run the program 'p', you should see something like this:

AU::fAU() A::fA() A::f() A::f()
BU::fBU() B::fB() A::f() A::f()

If you add version information to library A (uncomment the commented line in the makefile), you will see that the symbols get matched as expected and the program reports:

AU::fAU() A::fA() A::f() A::f()
BU::fBU() B::fB() B::f() B::f()

I was able to reproduce this on Ubuntu 14.04 (eglibc 2.19) and on Ubuntu 16.04 (glibc 2.23).

I also had a look at the source code and think that the check_match subroutine in elf/dl-lookup.c behaves unexpectedly. To my outsider eyes, it looked like the 'else' case below could be at fault. It looks like it would return the default version if the version information does not match but a non-hidden default version exists. This would of course not be the case depending on "who-is-who" in these variables, but I could not get any further.

    const ElfW(Half) *verstab = map->l_versyms;
    if (version != NULL)
      {
        if (__builtin_expect (verstab == NULL, 0))
          {
            /*...*/
          }
        else
          {
            /* We can match the version information or use the
               default one if it is not hidden.  */
            ElfW(Half) ndx = verstab[symidx] & 0x7fff;
            if ((map->l_versions[ndx].hash != version->hash
                 || strcmp (map->l_versions[ndx].name, version->name))
                && (version->hidden || map->l_versions[ndx].hash
                    || (verstab[symidx] & 0x8000)))
              /* It's not the version we want.  */
              return NULL;
          }
    /*...*/

I tried adding a version_matched flag to keep looking in other 'scopes' (see the for loop in function _dl_lookup_symbol_x) for a better match instead of returning the first one, but it did not immediately work, so I decided to report it here instead.

Does this example help you reproduce this?
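The condition in the 'else' branch quoted above can be read in isolation. The following is a minimal sketch in C that restates only that boolean expression; struct model_version and model_rejects are hypothetical names, and the idea that a zero hash corresponds to a version entry the loader never filled in (such as the base entry) is an assumption, not something the quoted code states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a version record: only the fields the
   quoted condition reads.  */
struct model_version
{
  uint32_t hash;      /* 0 is assumed to mean "no real version name" */
  const char *name;
  bool hidden;        /* only meaningful for the requested version */
};

/* Return true when the quoted condition would reject the candidate,
   i.e. when check_match would "return NULL".  DEF is the version the
   candidate symbol is defined with, WANT is the requested version, and
   SYM_HIDDEN mirrors the 0x8000 bit of the candidate's versym entry.  */
static bool
model_rejects (const struct model_version *def, bool sym_hidden,
               const struct model_version *want)
{
  assert (def != NULL && want != NULL);

  /* Same short-circuit as the original: strcmp only runs when the
     hashes are equal.  */
  bool mismatch = (def->hash != want->hash
                   || strcmp (def->name, want->name) != 0);

  return mismatch && (want->hidden || def->hash != 0 || sym_hidden);
}
```

Read this way, a mismatching candidate is still accepted only when the requested version is not hidden, the candidate's version entry has a zero hash, and the candidate symbol itself is not hidden. A mismatch against a definition with a real (non-zero-hash) version is always rejected.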
(In reply to Diego Barrios Romero from comment #3)
> At run time, the symbol f that is requested by BU with version B_1.0 gets
> matched with the base version one provided by library A. Even though the
> symbol with the adequate version is also available, the base version is
> returned.

Do you really mean “base version” here? Isn't this an unversioned symbol?

Ulrich Drepper's description of symbol versioning is pretty clear that unversioned symbols preempt all versioned symbols of the same name, no matter what the symbol version is. I believe the original intent for this design choice was that it allows symbol versioning to be introduced in a future library version, while still permitting existing binaries to interpose those symbols. No matter what the original choice was, we today have many interposing libraries (such as alternative malloc implementations) which depend on this particular behavior of the dynamic linker.

I agree this behavior is surprising, but the behavior described in comment 3 does not qualify as a bug.

Let's keep this bug open for the original issue (dlsym result depends on hash chain ordering in a surprising way).
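The preemption rule described in this comment can be sketched as a walk over the search scope. This is a simplified model in C, not the loader's implementation: struct model_object and model_scope_lookup are hypothetical names, each object is assumed to carry exactly one definition of f, and the treatment of an unversioned reference against a single versioned definition is an assumption chosen to match what comment 3 reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one loaded object that defines f.  */
struct model_object
{
  const char *soname;
  const char *def_version;  /* NULL: object has no version information */
};

/* Return the first object in scope order whose definition satisfies a
   reference to version WANT (NULL for an unversioned reference), or
   NULL if no object matches.  */
static const struct model_object *
model_scope_lookup (const struct model_object *scope, size_t n,
                    const char *want)
{
  assert (scope != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      /* An unversioned definition matches any reference, versioned or
         not: this is the preemption described above.  */
      if (scope[i].def_version == NULL)
        return &scope[i];

      /* Assumption: an unversioned reference accepts the sole
         versioned definition of the first object that has one.  */
      if (want == NULL)
        return &scope[i];

      if (strcmp (scope[i].def_version, want) == 0)
        return &scope[i];
    }

  return NULL;
}
```

With libA.so ahead of libB.so in the scope, a reference to f@B_1.0 resolves to libA.so in this model; with the order reversed, both the versioned and the unversioned reference resolve to libB.so, which matches the two outcomes reported in comment 3.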
(In reply to Florian Weimer from comment #4)

I meant undefined version. Sorry, I got them mixed up. Thanks for the clarification.