dlsym(RTLD_NEXT, "foo") is broken when more than one library contains "foo". Here are two small programs and four libraries that demonstrate the faulty behaviour:

# the libraries:
for i in 1 2 3 4
do
cat > lib$i.c <<EOF
#include <stdio.h>
#include <dlfcn.h>

void foo()
{
    void (*next_foo)(void);

    printf("lib$i.foo()\n");
    if ((next_foo = (void (*)(void)) dlsym(RTLD_NEXT, "foo")))
        next_foo();
}
EOF
gcc -shared -fPIC lib$i.c -o lib$i.so -D_GNU_SOURCE -ldl
done

# the test program linking the libs at compile time:
cat > chain.c <<EOF
#include <dlfcn.h>

extern void foo();

int main()
{
    foo();
    return 0;
}
EOF
gcc chain.c -o chain -L. -l1 -l2 -l3 -l4 -ldl
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./chain
# And the result:
#
# lib1.foo()
# lib2.foo()
# lib3.foo()
# lib4.foo()
#
# And now a runtime linking version:
#
cat > chain2.c <<EOF
#include <dlfcn.h>

int main()
{
    void *l1, *l2, *l3, *l4;
    void (*bar)();

    l1 = dlopen("lib1.so", RTLD_NOW | RTLD_GLOBAL);
    l2 = dlopen("lib2.so", RTLD_NOW | RTLD_GLOBAL);
    l3 = dlopen("lib3.so", RTLD_NOW | RTLD_GLOBAL);
    l4 = dlopen("lib4.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(RTLD_DEFAULT, "foo");
    bar();
    dlclose(l4);
    dlclose(l3);
    dlclose(l2);
    dlclose(l1);
    return 0;
}
EOF
gcc chain2.c -o chain2 -ldl -D_GNU_SOURCE
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./chain2
#
# And the result:
#
# lib1.foo()
#

On other platforms (BSD/libtld, Sun/libdl) the result is:

lib1.foo()
lib2.foo()
lib3.foo()
lib4.foo()

Here is a more verbose version of chain2:

cat > chain5.c <<EOF
#include <stdio.h>
#include <dlfcn.h>

int main()
{
    void *l1, *l2, *l3, *l4;
    void (*bar)();

    l1 = dlopen("lib1.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(l1, "foo");
    printf("l1 is %lx l1.foo is %lx\n", (unsigned long) l1, (unsigned long) bar);
    l2 = dlopen("lib2.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(l2, "foo");
    printf("l2 is %lx l2.foo is %lx\n", (unsigned long) l2, (unsigned long) bar);
    l3 = dlopen("lib3.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(l3, "foo");
    printf("l3 is %lx l3.foo is %lx\n", (unsigned long) l3, (unsigned long) bar);
    l4 = dlopen("lib4.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(l4, "foo");
    printf("l4 is %lx l4.foo is %lx\n", (unsigned long) l4, (unsigned long) bar);

    bar = (void (*)()) dlsym(l1, "foo");
    printf("l1.foo is %lx \n", (unsigned long) bar);
    bar = (void (*)()) dlsym(l2, "foo");
    printf("l2.foo is %lx \n", (unsigned long) bar);
    bar = (void (*)()) dlsym(l3, "foo");
    printf("l3.foo is %lx \n", (unsigned long) bar);
    bar = (void (*)()) dlsym(l4, "foo");
    printf("l4.foo is %lx \n", (unsigned long) bar);

    bar = (void (*)()) dlsym(RTLD_DEFAULT, "foo");
    printf("default foo is %lx \n", (unsigned long) bar);
    bar();

    dlclose(l4);
    dlclose(l3);
    dlclose(l2);
    dlclose(l1);
    return 0;
}
EOF
gcc chain5.c -o chain5 -ldl -D_GNU_SOURCE
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./chain5

And the result looks like this:

l1 is 804a018 l1.foo is 5556d584
l2 is 804a3b0 l2.foo is 55570584
l3 is 804a710 l3.foo is 55572584
l4 is 804aa70 l4.foo is 55574584
l1.foo is 5556d584
l2.foo is 55570584
l3.foo is 55572584
l4.foo is 55574584
default foo is 5556d584
lib1.foo()

Running the chain2 program with LD_DEBUG="files symbols" gives:

14129: symbol=dlsym; lookup in file=./chain2
14129: symbol=dlsym; lookup in file=/lib/tls/libdl.so.2
14129: symbol=_dl_sym; lookup in file=./chain2
14129: symbol=_dl_sym; lookup in file=/lib/tls/libdl.so.2
14129: symbol=_dl_sym; lookup in file=/lib/tls/libc.so.6
14129: symbol=foo; lookup in file=./chain2
14129: symbol=foo; lookup in file=/lib/tls/libdl.so.2
14129: symbol=foo; lookup in file=/lib/tls/libc.so.6
14129: symbol=foo; lookup in file=/lib/ld-linux.so.2
14129: symbol=foo; lookup in file=lib1.so
--> foo is found in lib1.so
lib1.foo()
14129: symbol=foo; lookup in file=/lib/tls/libc.so.6
14129: symbol=foo; lookup in file=/lib/ld-linux.so.2
--> Then all other lib$i.so are ignored:
14129: symbol=dlclose; lookup in file=./chain2
14129: symbol=dlclose; lookup in file=/lib/tls/libdl.so.2
14129: symbol=_dl_close; lookup in file=./chain2
14129: symbol=_dl_close; lookup in file=/lib/tls/libdl.so.2
14129: symbol=_dl_close; lookup in file=/lib/tls/libc.so.6
14129:

According to the dlopen and dlsym entries of the "Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition", this should be a bug. Feel free to contact me for any further information.

Regards.
Yann LANGAIS.
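To make the expected behaviour on the other platforms concrete, here is a small C model of load-order RTLD_NEXT resolution. It does not call the dynamic linker at all: struct model_object, model_rtld_next and model_chain are hypothetical names for this sketch, and the scope array is assumed to stand in for chain2's global search list in load order.

```c
#include <stddef.h>

/* Hypothetical model of one loaded object: its name and whether it
   defines the symbol "foo". This is not a real dynamic-linker structure. */
struct model_object {
    const char *name;
    int defines_foo;
};

/* Load-order RTLD_NEXT as described in the POSIX APPLICATION USAGE text:
   starting just after the calling object, return the index of the next
   object in the scope that defines the symbol, or -1 if there is none. */
int model_rtld_next(const struct model_object *scope, size_t n, size_t caller)
{
    for (size_t i = caller + 1; i < n; i++)
        if (scope[i].defines_foo)
            return (int) i;
    return -1;
}

/* Follow the chain the way each lib$i.so foo() does: start at the first
   definition (RTLD_DEFAULT), then keep asking for the next one.
   Writes visited indices to out and returns how many were visited. */
size_t model_chain(const struct model_object *scope, size_t n,
                   int *out, size_t max_out)
{
    size_t count = 0;
    int cur = -1;

    for (size_t i = 0; i < n; i++)          /* RTLD_DEFAULT lookup */
        if (scope[i].defines_foo) {
            cur = (int) i;
            break;
        }
    while (cur >= 0 && count < max_out) {
        out[count++] = cur;                 /* "call" foo in this object */
        cur = model_rtld_next(scope, n, (size_t) cur);
    }
    return count;
}
```

With a scope of chain2, libc.so.6 and lib1.so through lib4.so, model_chain visits all four lib$i.so entries, matching the four-line output reported for the other platforms rather than the single lib1.foo() seen with glibc.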
POSIX just reserves RTLD_NEXT for future use, nothing more.
According to http://www.opengroup.org/onlinepubs/009695399/toc.htm it is indeed reserved for future use:
>>>>>>>>>>>>>>>>>>
APPLICATION USAGE

Special purpose values for handle are reserved for future use. These values and their meanings are:

RTLD_DEFAULT
The symbol lookup happens in the normal global scope; that is, a search for a symbol using this handle would find the same definition as a direct use of this symbol in the program code.

RTLD_NEXT
Specifies the next object after this one that defines name. This one refers to the object containing the invocation of dlsym(). The next object is the one found upon the application of a load order symbol resolution algorithm (see dlopen()). The next object is either one of global scope (because it was introduced as part of the original process image or because it was added with a dlopen() operation including the RTLD_GLOBAL flag), or is an object that was included in the same dlopen() operation that loaded this one.

The RTLD_NEXT flag is useful to navigate an intentionally created hierarchy of multiply-defined symbols created through interposition. For example, if a program wished to create an implementation of malloc() that embedded some statistics gathering about memory allocations, such an implementation could use the real malloc() definition to perform the memory allocation, and itself only embed the necessary logic to implement the statistics gathering function.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

But it *IS* documented in the man page, with no mention of it being "reserved for future use". So the bug is either in the code or in the documentation.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Extract of "man dlsym":

dlsym
The function dlsym() takes a "handle" of a dynamic library returned by dlopen and the NUL-terminated symbol name, returning the address where that symbol is loaded into memory.
If the symbol is not found, in the specified library or any of the libraries that were automatically loaded by dlopen() when that library was loaded, dlsym() returns NULL. (The search performed by dlsym() is breadth first through the dependency tree of these libraries.) Since the value of the symbol could actually be NULL (so that a NULL return from dlsym() need not indicate an error), the correct way to test for an error is to call dlerror() to clear any old error conditions, then call dlsym(), and then call dlerror() again, saving its return value into a variable, and check whether this saved value is not NULL.

There are two special pseudo-handles, RTLD_DEFAULT and RTLD_NEXT. The former will find the first occurrence of the desired symbol using the default library search order. The latter will find the next occurrence of a function in the search order after the current library. This allows one to provide a wrapper around a function in another shared library.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
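The POSIX malloc() statistics example and the man page's dlerror() advice can be combined into one sketch of RTLD_NEXT interposition. To sidestep the recursion pitfalls of wrapping malloc() itself (dlsym() and dlerror() may allocate memory on some implementations), it interposes rand() instead. The helper names lookup_next and rand_call_count are hypothetical, and the sketch assumes a glibc-style <dlfcn.h> where RTLD_NEXT is only visible with _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc only exposes RTLD_NEXT with _GNU_SOURCE */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Statistic gathered by the wrapper: how many times rand() was called. */
static unsigned long rand_calls;

/* Look up the next definition of name after this object, using the error
   check recommended by the man page: clear dlerror(), call dlsym(), then
   read dlerror() again (a NULL symbol value alone is not an error). */
static void *lookup_next(const char *name)
{
    void *sym;
    const char *err;

    dlerror();                       /* discard any stale error */
    sym = dlsym(RTLD_NEXT, name);
    err = dlerror();                 /* non-NULL means the lookup failed */
    if (err != NULL) {
        fprintf(stderr, "lookup_next: %s\n", err);
        return NULL;
    }
    return sym;
}

/* Interposed rand(): record the call, then forward to the next rand()
   in the search order (normally the one in libc). */
int rand(void)
{
    static int (*real_rand)(void);   /* resolved lazily; not thread-safe */

    if (real_rand == NULL)
        real_rand = (int (*)(void)) lookup_next("rand");
    rand_calls++;
    return real_rand != NULL ? real_rand() : 0;
}

/* Accessor for the gathered statistic. */
unsigned long rand_call_count(void)
{
    return rand_calls;
}
```

Defined in the main program, this rand() is found first, and dlsym(RTLD_NEXT, "rand") then yields the next definition after the program, normally the one in libc. A production wrapper would resolve real_rand under pthread_once() or similar instead of the unsynchronized lazy initialization used here.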
That's not normative.
The new version of the LSB (3.0.0) makes the following statement about dlsym RTLD_NEXT: "The value RTLD_NEXT, which is reserved for future use shall be available, with the behavior as described in ISO POSIX (2003)." http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/baselib-dlsym-1.html The current behaviour is incompatible with the LSB 3.0.0 statement (according to the informative part of the POSIX description of dlsym()).
If the LSB specifies something different, they are wrong. File a bug with them.
LSB specifies the same thing that POSIX says in its descriptive part. The libc implementation differs from what LSB and POSIX say. So what? YOU are right to support something that doesn't behave as specified??? Did you take three minutes to understand what the problem is all about? What is the problem with you guys? Your political points of view are far removed from those of most Unix programmers, Ulrich! Clearly the LSB board is NOT perfect. But are you perfect? The point is that the LSB at the very least has the merit of existing. And by the way, your own company has its say, since it IS part of the LSB effort as a Gold Member. That's what we call "spitting in the soup" (biting the hand that feeds you).
OK. Since this place is supposed to be a bug-filing place (BUG-ZILLA) and not a pointless troll about the politically correct way to call a spade a spade, let me rephrase the bug description.

According to:
1/ the POSIX dlsym RTLD_NEXT descriptive part,
2/ the LSB (even prior to 2.0) description of dlsym RTLD_NEXT, which agrees with 1/,
3/ the RTLD_NEXT section of the dlsym man page,
the behavior of the following is *INCORRECT*:

for i in 1 2 3 4
do
cat > lib$i.c <<EOF
#include <stdio.h>
#include <dlfcn.h>

void foo()
{
    void (*next_foo)(void);

    printf("lib$i.foo()\n");
    if ((next_foo = (void (*)(void)) dlsym(RTLD_NEXT, "foo")))
        next_foo();
}
EOF
gcc -shared -fPIC lib$i.c -o lib$i.so -D_GNU_SOURCE -ldl
done

cat > chain2.c <<EOF
#include <dlfcn.h>

int main()
{
    void *l1, *l2, *l3, *l4;
    void (*bar)();

    l1 = dlopen("lib1.so", RTLD_NOW | RTLD_GLOBAL);
    l2 = dlopen("lib2.so", RTLD_NOW | RTLD_GLOBAL);
    l3 = dlopen("lib3.so", RTLD_NOW | RTLD_GLOBAL);
    l4 = dlopen("lib4.so", RTLD_NOW | RTLD_GLOBAL);
    bar = (void (*)()) dlsym(RTLD_DEFAULT, "foo");
    bar();
    dlclose(l4);
    dlclose(l3);
    dlclose(l2);
    dlclose(l1);
    return 0;
}
EOF
gcc chain2.c -o chain2 -ldl -D_GNU_SOURCE
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./chain2

Could you please either:
1/ correct the bug in dlsym, OR
2/ correct:
- the dlsym man page
- the LSB, from at least 1.2
- the POSIX dlsym RTLD_NEXT descriptive section

Sorry to bother you with this, and many thanks in advance.
There is no bug since there is *nowhere* a description of what RTLD_NEXT is supposed to do.
(In reply to comment #8)
> There is no bug since there is *nowhere* a description of what RTLD_NEXT is
> supposed to do.

THERE IS A BUG SINCE THE MAN PAGE DOES NOT CONFORM TO THE BEHAVIOUR OF LIBDL.SO.
PLEASE FIX AT LEAST THE DLSYM MAN PAGE BY REMOVING THE DESCRIPTION OF RTLD_NEXT.
The man pages are not part of glibc. If you reopen this bug again or file a new one for the same problem I have no choice but to block your access.
It's puzzling why this isn't a bug. I suspect RTLD_NEXT implements dependency order instead of load order. Even back in 2005, we could no longer change that, and the only way to deal with this is to document the discrepancy with other systems.
(In reply to Florian Weimer from comment #11)
> It's puzzling why this isn't a bug. I suspect RTLD_NEXT implements
> dependency order instead of load order. Even back in 2005, we could no
> longer change that, and the only way to deal with this is to document the
> discrepancy with other systems.

POSIX requires dlsym to implement dependency ordering; it's written into the standard. See the text in dlopen:
~~~
...
With the exception of the global symbol object obtained via a dlopen() operation on a file of 0, dependency ordering is used by the dlsym() function. Load ordering is used in dlsym() operations upon the global symbol object.
...
~~~
I'm reopening this because, regardless of the fact that POSIX only reserves RTLD_NEXT, we are basically implementing a Solaris feature and need to reconsider exactly what RTLD_NEXT does when more than one library defines the symbol. It seems entirely reasonable to be able to walk the entire hierarchy of defined symbols instead of stopping at the first definition.
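As an illustration of the dependency ordering quoted above, this sketch computes a breadth-first order over a small dependency graph starting from one object, which is the order a handle-based dlsym() would search under that rule. The adjacency-matrix representation, MODEL_MAX, dependency_order and first_definer are assumptions made for this sketch, not glibc internals; in particular, real DT_NEEDED entries carry their own order, whereas here dependencies are visited by ascending index.

```c
#include <stddef.h>

#define MODEL_MAX 16   /* hypothetical size limit for this sketch */

/* deps[i][j] != 0 means object i depends on object j.
   Writes the breadth-first order starting at root into out (which doubles
   as the BFS queue) and returns the number of objects reached. */
size_t dependency_order(int deps[][MODEL_MAX], size_t n, size_t root,
                        size_t *out)
{
    int seen[MODEL_MAX] = {0};
    size_t head = 0, tail = 0;

    out[tail++] = root;
    seen[root] = 1;
    while (head < tail) {
        size_t cur = out[head++];
        for (size_t j = 0; j < n; j++)
            if (deps[cur][j] && !seen[j]) {   /* enqueue each object once */
                seen[j] = 1;
                out[tail++] = j;
            }
    }
    return tail;
}

/* Return the first object in the given search order that defines the
   symbol (defines[i] != 0), or -1 if none does. */
int first_definer(const size_t *order, size_t len, const int *defines)
{
    for (size_t k = 0; k < len; k++)
        if (defines[order[k]])
            return (int) order[k];
    return -1;
}
```

For an application (0) depending on libA (1) and libB (2), both depending on libC (3), starting from the application gives the order 0, 1, 2, 3; if libB and libC both define a symbol, first_definer picks libB, and libC is only reachable by continuing the walk.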
I'm still worried that if we change the search results now, we break applications. This is less of a concern if after the bug fix, we only find *additional* symbols, but if we return *different* symbols, that seems quite risky.
(In reply to Florian Weimer from comment #13)
> I'm still worried that if we change the search results now, we break
> applications.
>
> This is less of a concern if after the bug fix, we only find *additional*
> symbols, but if we return *different* symbols, that seems quite risky.

Agreed, we'll need regression test cases for every minute change we make here. It's certainly a project that will need a lot more testing. I'm considering automation that creates complete DAGs of all DSO dependencies, records the resulting ordering, and then uses that to drive comparisons while I change the sort algorithms.
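One way to encode the concern about *additional* versus *different* symbols in such a regression test is to compare the chain of objects visited by successive RTLD_NEXT lookups before and after a change. This is a hypothetical helper: classify_chain, enum chain_change and the integer object ids are assumptions for illustration, and it treats a new chain that merely extends the old one as "additional only".

```c
#include <stddef.h>

/* How a lookup chain changed between an old and a new implementation. */
enum chain_change {
    CHAIN_SAME,        /* identical results */
    CHAIN_ADDITIONAL,  /* new chain extends the old one: only extra symbols */
    CHAIN_DIFFERENT    /* a previously returned symbol changed or vanished */
};

/* Compare the objects visited by successive lookups (as small integer ids)
   under the old and the new implementation. */
enum chain_change classify_chain(const int *old_chain, size_t old_len,
                                 const int *new_chain, size_t new_len)
{
    if (new_len < old_len)
        return CHAIN_DIFFERENT;            /* something disappeared */
    for (size_t i = 0; i < old_len; i++)
        if (old_chain[i] != new_chain[i])
            return CHAIN_DIFFERENT;        /* an earlier result changed */
    return new_len == old_len ? CHAIN_SAME : CHAIN_ADDITIONAL;
}
```

For chain2, a chain of just lib1 compared with a load-order chain lib1, lib2, lib3, lib4 would be classified as CHAIN_ADDITIONAL, while any reordering of previously returned objects is CHAIN_DIFFERENT, which is the risky case.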
Regarding ordering, I have a small test case showing that the libc symbol found by RTLD_NEXT is not always the default one (pthread_cond_* differs). In other words, anything interposing pthread_cond_* functions would always have to use dlvsym to look up the correct symbol versions.
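For the pthread_cond_* case, a wrapper can request an explicit symbol version with the GNU dlvsym() extension instead of taking whatever dlsym(RTLD_NEXT, ...) returns. This sketch assumes glibc; next_cond_wait is a hypothetical helper, and the version string is architecture-specific (GLIBC_2.3.2 is the default version of pthread_cond_wait on x86 and x86-64, which can be checked with objdump -T on libc.so.6).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* dlvsym() and RTLD_NEXT are GNU extensions */
#endif
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Signature of pthread_cond_wait(). */
typedef int (*cond_wait_fn)(pthread_cond_t *, pthread_mutex_t *);

/* Return the next pthread_cond_wait() after this object with exactly the
   requested symbol version, or NULL if that version is not found.
   The caller supplies the version because it differs between ABIs. */
cond_wait_fn next_cond_wait(const char *version)
{
    return (cond_wait_fn) dlvsym(RTLD_NEXT, "pthread_cond_wait", version);
}
```

A wrapper would call next_cond_wait() once with the version appropriate for its target and forward to the result, so that it reaches the intended pthread_cond_* implementation rather than a compatibility version.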