This is a repeat of GNATS libc/1188 issue from 13 years ago:
I accept that there is ambiguity in what dlsym(..., "foo") should return, but shouldn't the two dlsym() calls below return the *same* answer?
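#define _GNU_SOURCE  /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
int main(void) {
  void *p = dlsym(RTLD_NEXT, "fopen");
  void *h = dlopen("libc.so.6", RTLD_LAZY);
  void *q = dlsym(h, "fopen");
  printf("%p -- p (via RTLD_NEXT)\n", p);
  printf("%p -- q (via dlsym(%p))\n", q, h);
}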
gcc -m32 -g t.c -ldl && ./a.out
0xf76e41a0 -- p (via RTLD_NEXT)
0xf7634e70 -- q (via dlsym(0xf7758c88))
Google ref: b/7695672
Worse, dlsym returns NULL for a symbol that has no default version, as can happen with libpthread.so, and you must then use dlvsym to get the symbol.
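To illustrate the workaround mentioned here, a caller can retry with dlvsym when the plain dlsym lookup comes back NULL. This is only a minimal sketch, assuming glibc (dlvsym is a GNU extension). The helper name lookup_with_version_fallback is hypothetical, and the caller still has to know the version string, which is exactly the inconvenience being complained about.

```c
#define _GNU_SOURCE   /* exposes dlvsym() in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of glibc): try the plain dlsym() lookup
 * first and, if that yields NULL, retry with an explicit version string.
 * `version` may be NULL, in which case no fallback is attempted.
 */
void *lookup_with_version_fallback(void *handle, const char *name,
                                   const char *version)
{
    void *sym = dlsym(handle, name);
    if (sym == NULL && version != NULL)
        sym = dlvsym(handle, name, version);
    return sym;
}
```

The version string passed in is architecture-specific, so a real caller would need a per-architecture table or a build-time probe. A NULL result is also ambiguous for symbols whose value is legitimately NULL, which dlerror() can disambiguate.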
I agree this should be fixed and dlsym should behave like the application would normally behave had it just been compiled. You get the default version of the symbol and RTLD_NEXT gives you the next version and so on.
*** Bug 20220 has been marked as a duplicate of this bug. ***
I just saw a compiler-rt patch which wants to change a dlsym call to dlvsym because of this: https://reviews.llvm.org/D96348
> COMMON_INTERCEPT_FUNCTION_GLIBC_VER(regexec, "GLIBC_2.3.4");
Due to how the code is organized (it is difficult to inspect the default version name when compiling), compiler-rt will need to follow glibc default version names more closely, which is undesirable.
I think changing this is highly undesirable, as written in
the current dlsym behavior provides ABI stability, no matter what you run the program against it will always return the same symbol.
If it instead returned the default symbol (and there doesn't need to be one), its behavior would change depending on which glibc is used at runtime.
Say you compile and link your program against glibc 2.5, which provides the symbol foobar at @@GLIBC_2.5; later glibc 2.37 comes along and has
foobar@GLIBC_2.5 and foobar@@GLIBC_2.37. Some new symbol versions are solely about adding features while keeping the ABI the same, for example handling another bitmask in an argument that was previously rejected with -1 and errno set or similar (regexec, say); that perhaps can be handled. But in other cases the ABI of the function changes: an argument is added or removed, or the layout of what some argument points to changes (and not just in glibc, but in any versioned shared library).
You really don't want your old programs to suddenly break because of that.
Basically, dlsym should follow what happens when you link against a shared library before symbol versioning has been added to it (e.g. very old glibc).
Sanitizers need to be fixed so that the sanitizer libraries are also symbol versioned and, when they wrap some glibc symbols, they follow the versioning there, i.e. if they wrap regexec and regexec is on a particular architecture
regexec@GLIBC_2.2.5 and regexec@@GLIBC_2.3.4, then it wraps both of them
and each of those uses dlvsym to find the corresponding symbol in glibc.
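The lookup half of that proposal could look roughly like the sketch below: one dlvsym call per version name, so that each versioned wrapper can forward to its matching glibc implementation. This is an illustration under assumptions, not compiler-rt code. The helper name resolve_symbol_versions is hypothetical, and exporting the wrappers under matching versions (via a version script or .symver directives) is not shown.

```c
#define _GNU_SOURCE   /* exposes dlvsym() in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: resolve one address per version name.
 * out[i] receives the address of `name` at versions[i], or NULL if the
 * library does not define the symbol under that version.
 * Returns the number of versions that resolved.
 */
size_t resolve_symbol_versions(void *handle, const char *name,
                               const char *const versions[], size_t count,
                               void *out[])
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        out[i] = dlvsym(handle, name, versions[i]);
        if (out[i] != NULL)
            found++;
    }
    return found;
}
```

For the regexec case above, the caller would pass the two version names quoted in this comment, "GLIBC_2.2.5" and "GLIBC_2.3.4". Those names apply only to architectures where regexec carries exactly these versions, so other targets would need their own list.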
Alternatively, we could add a sanitizer-enabled build of glibc, similarly to what we have for profiling mode today, so that the need for interceptors goes away completely.
(This is slightly different from building most of glibc with sanitizers enabled.)
(In reply to Jakub Jelinek from comment #4)
> the current dlsym behavior provides ABI stability, no matter what you run
> the program against it will always return the same symbol.
This bug is about "dlsym(handle, "foo") and dlsym(RTLD_NEXT, "foo")".
Shouldn't they _both_ provide the same ABI stability (and therefore the _same_ answer)?