Revert pr16467 change

Alan Modra amodra@gmail.com
Thu Jun 2 02:38:00 GMT 2016


On Wed, Jun 01, 2016 at 09:00:26AM -0700, H.J. Lu wrote:
> On Wed, Jun 1, 2016 at 6:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Wed, Jun 1, 2016 at 6:28 AM, Alan Modra <amodra@gmail.com> wrote:
> >> This reverts the pr16467 change, which was incorrect due to faulty
> >> analysis of the pr16467 testcase.  The failure was not due to a
> >> mismatch in symbol type (ifunc/non-ifunc) but due to a symbol loop
> >> being set up.  I've added a pr16467d.c that shows the same symbol
> >> loop without ifunc being involved.
> >
> > I agree.  PR 16467 is an odd case which moves a versoned symbol
> > from one DSO to another.

I should explain exactly what was happening with pr16467, while it it
fresh in my mind.

First a quick description of symbol versioning in shared libraries.  A
symbol with a version, "sym@VERS1" say, isn't anything special.  It's
simply a symbol with an odd name containing '@', and is used like any
other symbol.  Where things get interesting is with versioned symbols
containing two '@'s, "sym@@VERS1" say.  These satisfy references to
both "sym@VERS1" and plain "sym" (but don't satisfy a reference to
"sym@VERS0").  References to plain "sym" get converted to dynamic
references to "sym@VERS1".

A further complication is due to ELF shared library symbol overriding.
A function or variable defined in the executable (or another shared
library) may override the same symbol defined in a shared library.
For example, if you define your own "malloc" in an executable then
both the executable and libc.so will use your "malloc" rather than the
one defined in libc.so.  Now "malloc" in glibc is currently
"malloc@@GLIBC_2.2.5" on x86_64-linux.  So that means references to
the versioned symbol "malloc@GLIBC_2.2.5" also need to be overridden
by the executable "malloc".

What about references to a hypothetical "malloc@GLIBC_2.2.4"?  Should
they also be satisfied by "malloc" defined in the executable?  No,
because "malloc@GLIBC_2.2.4" might have different parameters!  (malloc
isn't a good example here, but imagine if the old malloc took an "int"
size while the new one took "size_t".)  Also, there is no link between
plain "malloc" in the executable and "malloc@GLIBC_2.2.4" because
"malloc@@GLIBC_2.2.4" does not exist in the shared library.  Note that
the fact that older versioned symbols are not overridden can cause you
a problem.  Imagine if your app defined "malloc" and had a "libA.so"
linked against our hypothetical old libc.so defining
"malloc@@GLIBC_2.2.4".  When the app is run with the old libc, the
app, libA.so and libc.so all use "malloc" in the executable.  However
if libc.so is upgraded then app and libc.so continue to use "malloc"
in the executable but libA.so does not.  What's more, the app "malloc"
may not match the signature of the libc.so "malloc@GLIBC_2.2.5".

In the pr16467 testcase, we have an object file, pr16467b.o, defining
"sd_get_seats" and referencing "sd_get_seats@LIBSYSTEMD_209".  A
shared library libpr16467a.so defines "sd_get_seats@@LIBSYSTEMD_209".
Normally that would mean any references to
"sd_get_seats@LIBSYSTEMD_209", both in the object file and shared
library would be resolved to the definition of "sd_get_seats".
However, we are building another shared library here, libpr16467b.so,
and "sd_get_seats" in the object is going to end up as
"sd_get_seats@@LIBSYSTEMD_208".  Which means it is wrong to point the
libpr16467a.so "sd_get_seats" to the libpr16467b.so "sd_get_seats".

> >> I'm not 100% happy with this fix, due to not always having symbol
> >> versions around early enough.  See the comment.  However, comparing
> >> versions directly addresses the problem exposed by the testcase, and I
> >> don't see any other approach that will work in all cases.
> >>
> >> The strtab save/restore change is necessary due to
> >> _bfd_elf_add_default_symbol now calling elf_backend_hide_symbol, which
> >> decrements dynstr strtab refcounts.  Previously we only added new
> >> dynamic symbols.
> >>
> >> I'll commit this in a few days if no more problems like the strtab
> >> refcounting one appear.
> >
> > Can you check your change into a branch?  I'd like to add a few
> > more testcases on top of your change.

I'll commit my patch without the testcases, since it seems you already
have that covered, and my pr16467d.c was just to illustrate the symbol
loop.

> 
> There is one case:
> 
> hjl@gnu-6 pr16467c]$ gcc -B./  -o x pr16467c.o pr16467b.o
> libpr16467a.so -Wl,-R,.
> [hjl@gnu-6 pr16467c]$ ./x
> ./x: symbol lookup error: ./x: undefined symbol:
> [hjl@gnu-6 pr16467c]$ readelf --dyn-sym x
> 
> Symbol table '.dynsym' contains 11 entries:
>    Num:    Value          Size Type    Bind   Vis      Ndx Name
>      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>      1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
>      2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND
> __libc_start_main@GLIBC_2.2.5 (2)
>      3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
>      4: 0000000000000000 0x20eb268 NOTYPE  LOCAL  DEFAULT  UND <corrupt>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, this sort of thing is why I've been running with MALLOC_PERTURB_
set lately.  :)

> Linker should refuse to link with something like
> 
> diff --git a/bfd/elflink.c b/bfd/elflink.c
> index 93e7dd2..58f2554 100644
> --- a/bfd/elflink.c
> +++ b/bfd/elflink.c
> @@ -9396,6 +9408,15 @@ elf_link_output_extsym (struct bfd_hash_entry
> *bh, void *data)
>   foo is used with no version, then we add an indirect symbol
>   foo which points to foo@@GNU_1.2.  We ignore these symbols,
>   since the indirected symbol is already in the hash table.  */
> +      if (h->dynindx != -1)
> + {
> +  (*_bfd_error_handler)
> +    (_("%B: incompatible versioned symbol `%s'"),
> +     flinfo->output_bfd, h->root.root.string);
> +  bfd_set_error (bfd_error_bad_value);
> +  eoinfo->failed = TRUE;
> +  return FALSE;
> + }
>        return TRUE;
>      }

No, there isn't anything incompatible about the versioned symbol, and
this patch isn't actually addressing the cause of the linker failure.
You should look into why we are making an indirect symbol dynamic.

It's a really odd case though.  By linking against libpr16467a.so you
set up a symbol loop from "sd_get_seats@LIBSYSTEMD_209" to
"sd_get_seats" in the executable, according to the reasoning I've
given above about symbol versioning and overriding.  But, having done
so, libpr16467a.so is no longer needed!

-- 
Alan Modra
Australia Development Lab, IBM



More information about the Binutils mailing list