Bug 12977 - glibc dynamic linker behaves unpredictable when using base version
Summary: glibc dynamic linker behaves unpredictable when using base version
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.13
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: glibc_2.13, glibc_2.19
Depends on:
Blocks: 12261
Reported: 2011-07-09 05:32 UTC by Ian Lance Taylor
Modified: 2017-01-19 06:39 UTC
CC List: 7 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test case (1.13 KB, application/x-bzip)
2011-07-09 05:32 UTC, Ian Lance Taylor
Symbol overwrite example (1.11 KB, application/x-bzip)
2017-01-18 17:58 UTC, Diego Barrios Romero

Description Ian Lance Taylor 2011-07-09 05:32:30 UTC
Created attachment 5840
Test case

The glibc dynamic linker behaves unpredictably when asked to look up a symbol in the base version.

The GNU linker manual says:

    __asm__(".symver original_foo,foo@");

    foo@ represents the symbol foo bound to the unspecified base version of the symbol.

The apparent intent is to support changing a library from having no versions to one having versions.  The idea is to put the oldest version of the symbol into the base version, and then to add new symbols to later versions.  Executables which have no version for the symbol should then presumably link against the base version.

However, because the glibc dynamic linker behaves unpredictably, this behaviour is useless.

I have attached a test case.  The tests ver_test_12 and ver_test_13 are identical except for the name of the function.  The function is named 't1' in ver_test_12 and 'f1' in ver_test_13.  When I run "make", ver_test_12 fails and ver_test_13 passes.  I've gotten this result on Ubuntu Lucid using eglibc 2.11.1 and on Fedora Core 14 using glibc 2.13. I used the default GNU ld in both cases.

Currently the gold linker rejects this usage because the behaviour is unpredictable.  Using gold instead of GNU ld for the test case will give an error building ver_test_12b.so or ver_test_13b.so:
    ld: error: symbol t1 has undefined version
However, the developers of the FUSE library are protesting, saying that the behaviour is documented in the GNU ld manual, and that it does happen to work for them.

The unpredictable behaviour occurs because of how check_match nested within do_lookup_x in elf/dl-lookup.c handles symbols with no versions.  The behaviour depends on the order in which the symbols are seen in the hash table.

In order to know what the linkers should do in this case, I think we need to understand what the glibc developers believe is the correct behaviour here.  Is the current behaviour a bug which should be fixed in glibc?  Or is gold correct in refusing to create a shared library like this?

There is some additional background at

http://sourceforge.net/mailarchive/forum.php?thread_name=m3wrq48tso.fsf%40pepe.airs.com&forum_name=fuse-devel

http://sourceware.org/bugzilla/show_bug.cgi?id=12261
Comment 1 David Heidelberg 2014-06-23 19:29:46 UTC
What is the current status of this bug?

Berkeley DB still fails with:
checking whether the C compiler works... no

as you described here [1], with glibc version 2.19.

[1] https://blog.flameeyes.eu/2011/06/gold-readiness-obstacle-1-berkeley-db
Comment 2 Florian Weimer 2016-10-09 18:15:37 UTC
We need the DSOs to reproduce this.  With GNU ld version 2.25-17.fc23, ver_test_12b.so contains this:

Version definition section [ 6] '.gnu.version_d' contains 3 entries:
 Addr: 0x0000000000000538  Offset: 0x000538  Link to section: [ 4] '.dynstr'
  000000: Version: 1  Flags: BASE   Index: 1  Cnt: 1  Name: ver_test_12b.so
  0x001c: Version: 1  Flags: none  Index: 2  Cnt: 1  Name: VER1
  0x0038: Version: 1  Flags: none  Index: 3  Cnt: 2  Name: VER2
  0x0054: Parent 1: VER1

ver_test_13b.so looks like this:

Version definition section [ 6] '.gnu.version_d' contains 3 entries:
 Addr: 0x0000000000000538  Offset: 0x000538  Link to section: [ 4] '.dynstr'
  000000: Version: 1  Flags: BASE   Index: 1  Cnt: 1  Name: ver_test_13b.so
  0x001c: Version: 1  Flags: none  Index: 2  Cnt: 1  Name: VER1
  0x0038: Version: 1  Flags: none  Index: 3  Cnt: 2  Name: VER2
  0x0054: Parent 1: VER1

This means that the base version is VER1 in both cases.  The closest thing we have to a specification for our variant of ELF symbol versioning appears to be this:

  <https://www.akkadia.org/drepper/symbol-versioning>

“
In case only the object file with the reference does not use versioning but the object with the definition does, then the reference only matches the base definition.  The base definition is the one with index numbers 1 and 2 (1 is the unspecified name, 2 is the name given later to the baseline of symbols once the library started using symbol versioning).
”

It's not clear to me whether a binary is valid if it provides conflicting definitions for version index 1 and 2.  Or put differently, the binutils example appears to be misleading.

With the DSOs generated by Fedora 23's ld.bfd, I cannot reproduce the difference in behavior between the two test cases.  Perhaps the static linker has started to sort symbols with the same name in a predictable manner.

I came across this old bug because I was trying to file a bug about the dlsym behavior.  elf/dl-lookup.c currently looks like this:

      /* No specific version is selected.  There are two ways we
	 can got here:

	 - a binary which does not include versioning information
	 is loaded

	 - dlsym() instead of dlvsym() is used to get a symbol which
	 might exist in more than one form

	 If the library does not provide symbol version information
	 there is no problem at all: we simply use the symbol if it
	 is defined.

	 These two lookups need to be handled differently if the
	 library defines versions.  In the case of the old
	 unversioned application the oldest (default) version
	 should be used.  In case of a dlsym() call the latest and
	 public interface should be returned.  */
      if (verstab != NULL)
	{
	  if ((verstab[symidx] & 0x7fff)
	      >= ((flags & DL_LOOKUP_RETURN_NEWEST) ? 2 : 3))
	    {
	      /* Don't accept hidden symbols.  */
	      if ((verstab[symidx] & 0x8000) == 0
		  && (*num_versions)++ == 0)
		/* No version so far.  */
		*versioned_sym = sym;

	      return NULL;
	    }
	}

This definitely causes weird behavior for dlsym lookups of _sys_errlist in libc.so.6.  I currently get sys_errlist@GLIBC_2.4, which is neither the base version nor the default version.
Comment 3 Diego Barrios Romero 2017-01-18 17:58:21 UTC
Created attachment 9761
Symbol overwrite example

I found a different manifestation of this unpredictability w.r.t. the base version of symbols.

I have attached an example program setup in which symbols are overwritten by the base version even though a library specifies that it needs a specific version.

The example contains two libraries (A and B) that offer the symbol f.
A does not define any version, but B does define it in version B_1.0.
Then there are two libraries (AU and BU) that use these libraries, each using only one of them, and they offer the fAU and fBU functions.
Then a program 'p' links both libraries AU and BU (See makefile for more information).

At run time, the symbol f that is requested by BU with version B_1.0 gets matched with the base version one provided by library A. Even though the symbol with the adequate version is also available, the base version is returned.

Here are the relevant symbol entries from objdump -TC:
libA.so:
  0000000000000705 g    DF .text	0000000000000012  Base        f

libB.so:
  0000000000000765 g    DF .text	0000000000000012  B_1.0       f

libAU.so:
  0000000000000000      DF *UND*	0000000000000000              f

libBU.so:
  0000000000000000      DF *UND*	0000000000000000  B_1.0       f

It also happens the other way around: if the link order is changed (-lBU -lAU), the base-version f requested by AU is overridden by f@@B_1.0.
While this latter behaviour is also unexpected, binding a reference that requests a specific version to the base version seems definitely unexpected to me.

If you compile and run the program 'p', you should see something like this:
  AU::fAU()
  A::fA()
  A::f()
  A::f()
  BU::fBU()
  B::fB()
  A::f()
  A::f()

If you add version information to library A (uncomment the commented-out line in the makefile), you will see that the symbols get matched as expected and the program reports:
  AU::fAU()
  A::fA()
  A::f()
  A::f()
  BU::fBU()
  B::fB()
  B::f()
  B::f()

I was able to reproduce this on Ubuntu 14.04 (eglibc 2.19) and on Ubuntu 16.04 (glibc 2.23).

I also had a look at the source code and think that the check_match subroutine in elf/dl-lookup.c behaves unexpectedly.

To my outsider eyes, the 'else' case below looks like it could be at fault. It appears to return the default version if the version information does not match but a non-hidden default version exists. Whether that is really the case depends on "who-is-who" in these variables, and I could not get any further.

        const ElfW(Half) *verstab = map->l_versyms;
	if (version != NULL)
	  {
	    if (__builtin_expect (verstab == NULL, 0))
	      {
	        /*...*/
	      }
	    else
	      {
		/* We can match the version information or use the
		   default one if it is not hidden.  */
		ElfW(Half) ndx = verstab[symidx] & 0x7fff;
		if ((map->l_versions[ndx].hash != version->hash
		     || strcmp (map->l_versions[ndx].name, version->name))
		    && (version->hidden || map->l_versions[ndx].hash
			|| (verstab[symidx] & 0x8000)))
		  /* It's not the version we want.  */
		  return NULL;
	      }
         /*...*/

I tried adding a version_matched flag and continuing to look in other 'scopes' (see the for loop in function _dl_lookup_symbol_x) for a better match instead of returning the first one, but it did not immediately work, so I decided to report it here instead.

Does this example help you reproduce this?
Comment 4 Florian Weimer 2017-01-18 18:24:16 UTC
(In reply to Diego Barrios Romero from comment #3)

> At run time, the symbol f that is requested by BU with version B_1.0 gets
> matched with the base version one provided by library A. Even though the
> symbol with the adequate version is also available, the base version is
> returned.

Do you really mean “base version” here?  Isn't this an unversioned symbol?

Ulrich Drepper's description of symbol versioning is pretty clear that unversioned symbols preempt all versioned symbols of the same name, no matter what the symbol version is.  I believe the original intent for this design choice was that it allows symbol versioning to be introduced in a future library version, while still permitting existing binaries to interpose those symbols.  No matter what the original choice was, we today have many interposing libraries (such as alternative malloc implementations) which depend on this particular behavior of the dynamic linker.

I agree this behavior is surprising, but the behavior described in comment 3 does not qualify as a bug.

Let's keep this bug open for the original issue (dlsym result depends on hash chain ordering in a surprising way).
Comment 5 Diego Barrios Romero 2017-01-19 06:39:42 UTC
(In reply to Florian Weimer from comment #4)

I meant undefined version. Sorry, I got them mixed up.
Thanks for the clarification.