[PATCH v2] elf: Remove one-default-version check when searching an unversioned symbol

Fangrui Song maskray@google.com
Wed Jun 1 08:14:54 GMT 2022


On 2022-06-01, Florian Weimer wrote:
>* Fangrui Song:
>
>>>Based on first principles, I think we should do the following:
>>>
>>>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>>>  and the target is versioned as well, a full match means equality of
>>>  symbol name and version.  The hidden/default flag and the version
>>>  index do not matter.
>>
>> Yes, the intuitive case.
>>
>>>* If we have a version to lookup up, but no version information in the
>>>  target, we match only the name.  This is necessary so that
>>>  interposition of e.g. malloc without a version works.
>>
>> Yes. Sanitizers work this way: they are unversioned but can intercept
>> *@GLIBC_* functions.
>
>They break symbol versioning due to this, so I'm not sure if they are a
>good example.

The behavior is natural to me: the behavior when DT_VERSYM is absent
matches the intention of the index 1/2 special rule when DT_VERSYM is present.

On the other hand, without the rule, an interceptor needs a glibc style version
script to intercept functions like malloc.  This would make the interceptor
cumbersome.  Also, if programs get newer malloc versions (because glibc updated malloc behavior),
the interceptor would have to keep adding new versioned definitions.

>>>  dlvsym should
>>>  match symbol binding here and pick up the unversioned definition as
>>>  well.
>>
>> Yes if the object does not have DT_VERSYM:
>> dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
>> does not have DT_VERSYM.
>>
>>>  Without interposition, the non-dlvsym cannot happen because we
>>>  have a version coverage check in elf/dl-version.c that should have
>>>  detected that the soname lacks the required version information.
>>
>> ?
>
>Needed versions contain a soname (vn_file), and we check that the shared
>object with that soname provides all the needed versions.  This is how
>we detect applications that run on too-old glibc even if lazy binding is
>active.

Yeah, this matches my understanding.

>>>* For symbol binding during relocation, a version-less lookup may need
>>>  to find a versioned definition.  This was initially implemented so
>>>  that glibc 2.1 could introduce symbol versioning without rebuilding
>>>  the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>>>  not have symbol versioning.)  It has subsequently been used by other
>>>  libraries.  In this case, we should pick the symbol with the lowest
>>>  version index and a matching name.  The hidden/default version index
>>>  flag does not matter.  We need to pick the lowest version even if it
>>>  is no longer the default.  This ensures that old binaries do not pick
>>>  up a newer definition, changing behavior.
>>
>> Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
>> is the only version attached to the symbol name) or 2.
>
>I believe this not to be true.  For example,
>sched_setaffinity@GLIBC_2.3.3 is the lowest version (with index 6, but
>that might vary from build to build), not sched_setaffinity@GLIBC_2.2.5.
>I think we should still bind to that if an object contains an
>unversioned undefined symbol reference to sched_setaffinity.

The current behavior is that an unversioned sched_setaffinity from
relocation does not bind to sched_setaffinity@GLIBC_2.3.3 (index > 2).

I think the idea is that when a DSO becomes versioned, all definitions
should be assigned VER_NDX_GLOBAL (1) or index 2.  Definitions from newer
versions do not count.

cat > ./a.c <<'eof'
#include <stdio.h>
int foo(int a);
int main(void) { printf("%d\n", foo(0)); }
eof
echo 'int foo(int a) { return -1; }' > ./b0.c
echo 'int foo_v1(int a) { return 1; } asm(".symver foo_v1, foo@v1, remove");' > ./b.c
echo 'va {}; v1 {}; v2 {};' > ./b.ver
sed 's/^        /\t/' > ./Makefile <<'eof'
.MAKE.MODE := meta curdirOk=1
CFLAGS = -fpic -g

a: a.o b0.so b.so
         $(CC) -Wl,--no-as-needed a.o b0.so -ldl -Wl,-rpath=$$PWD -o $@

b0.so: b0.o
         $(CC) $> -shared -Wl,--soname=b.so -o $@

b.so: b.o
         $(CC) $> -shared -Wl,--soname=b.so,--version-script=b.ver -o $@

clean:
         rm -f a *.so *.o *.meta
eof

Run `bmake; ./a` => `./a: symbol lookup error: ./a: undefined symbol: foo`

If va and v1 are swapped in b.ver, foo from a will be able to bind to foo@v1 in b.so.

Loosen the rule will unlikely break any users, but seems unnecessary to me
and does not help with the (necessary) inconsistency among reloc/dlsym/dlvsym.

>>>* For symbol lookup with dlsym (without a version), we need to find a
>>>  versioned definition.  The use case for that is mainly FFI bindings
>>>  (e.g., Python ctypes), where programmers generally do not specify
>>>  version information.  We decided that we would want the latest version
>>>  in this case.
>>
>> Correction: the default version, instead of the latest version.
>>
>> If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
>> picks up foo@@v2.
>
>If there is no default version, I think we need to pick the latest
>version, instead of failing the lookup.

I prefer the current rtld behavior: it matches ld behavior.

In ld, an unversioned reference foo can be resolved by either foo or foo@@v2.
It cannot bind to a non-default version definition foo@v3.

foo can be understood as VER_NDX_GLOBAL in a versioned shared object.

I have some notes about ld behavior:
https://maskray.me/blog/2020-11-26-all-about-symbol-versioning#linker-behavior

Basically rtld behavior mostly matches ld, but is loose in some aspects for DSO-upgrade compatibility.

>>>  However, in this case, I would want us to consider
>>>  default versions only if they exist.
>>
>> The current behavior already implements this.
>>
>> When searching a definition for foo,
>>
>> * for an object without DT_VERSYM
>>   + it can be bound to foo
>> * for an object with DT_VERSYM
>>   + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
>>   + it can be bound to foo of any default version
>>   + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)
>
>Is VER_NDX_GLOBAL actually used?  It seems to be a marker for “we no
>longer use symbol versioning for this symbol”.

VER_NDX_GLOBAL (1) is used.  If a definition is not attached to a version node,
it gets VER_NDX_GLOBAL.

>I think these rules are not quite correct because they do not find the
>earliest version (for relocation bindings) or latest eligible version
>(for dlsym).

I have tried many examples to testify the rules.
Do you have an example that the rules fail?

>>>The current implementation has two different, potentially conflicting
>>>flags for indicating dlvsym lookups: the “hidden” member in the struct
>>>r_found_version object that is passed in, and the
>>>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>>>versioned symbol binding during relocation processing, the hidden flag
>>>does not matter according to the overview above.
>>
>> Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
>> fine, but we'd still need a flag.
>>
>> For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.
>>
>> For the unversioned lookup code (dlvsym has been ruled out),
>> DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
>> dlsym.
>
>I think we don't need both DL_LOOKUP_RETURN_NEWEST and version->hidden.
>
>We need to differentiate between unversioned relocation binding and
>dlsym.
>
>Versioned relocation binding and dlvsym should behave identical, I htink.

The current behavior is: dlvsym is stricter than versioned relocation binding.
Relocation binding accepts VER_NDX_GLOBAL while dlvsym does not.

dlvsym is to find the exact match, while relocation binding (for compatibility
reasons) has a fallback.

The current rtld behavior on dlvsym exactly matches how ld resolves a foo@v1
reference, so I quite like its semantics.

>>>There is a corner case for unversioned symbol definitions in versioned
>>>object, and how versioned lookups can bind against them.  I'm changing
>>>this code in my patch for bug 29190.  Such symbol definitions have an
>>>empty version name and version hash zero.  Apparently, the intent is
>>>that they can bind to any version, particularly if they are the default.
>>>(Or at index 0 or 1?  But the existing code does not check that.)  I
>>>have never seen them used in practice, so I don't know how to test them.
>>>The current code has checks for the default/hidden version index flag,
>>>but I don't think we need that, it is inconsistent with that.
>>
>> Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
>> Such symbols, if present in .dynsym, should be STT_SECTION.
>> rtld does not need to check the invariant.
>> STT_SECTION symbols have been ignored by `check_match`.
>>
>> (GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a
>> STB_LOCAL section symbol.)
>
>dl-lookup.c does not encounter STB_LOCAL symbols because the DT_GNU_HASH
>array does not cover them (the symbias value in elf/dl-setup_hash.c).
>STB_LOCAL symbols come first in the symbol table (an ELF rule for symbol
>tables, “In each symbol table, all symbols with STB_LOCAL binding
>precede the weak and global symbols.”), and I think symbias is supposed
>to chosen in such a way that they do not show up in the table.

Agree

>However, in some objects, I see VER_NDX_LOCAL being used for undefined
>symbols.  Other objects use VER_NDX_GLOBAL.  As far as I can tell, they
>are all produced by BFD ld.

GNU ld before 2.37 may use VER_NDX_LOCAL for undefined symbols.
I reported the bug at https://sourceware.org/bugzilla/show_bug.cgi?id=26002

>>>Going back to your suggestion regarding default_version_sym, I think we
>>>need to turn this into a struct and keep track of the version index (the
>>>vd_ndx value) as well.  num_versions still should go because I don't
>>>think the number of versions matter in any case.
>>
>> Hmm. I don't think we need a vd_ndx value.
>
>I think we need it for unversioned relocation binding if version index 2
>does not exist for the symbol name, and for dlsym if the default version
>has been removed.

The debate is about whether we want to further loosen dlsym/reloc resolving behavior.
My opinion is no, so the current information is sufficient and we do not need a new value...

>Thanks,
>Florian
>


More information about the Libc-alpha mailing list