Possibly related to https://sourceware.org/bugzilla/show_bug.cgi?id=18684, if you attempt to dlmopen() a library that itself makes a failed call to dlopen() internally as part of initialization, control appears to be returned to the outer dlmopen() call immediately, instead of the inner dlopen() call returning null as expected. In contrast, using dlopen() works as expected.

In GDB, setting a breakpoint immediately before foo() shows:

#1  0x00007ffff7de7ef2 in call_init.part () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7de7fe6 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7dec16d in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#5  0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#7  0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#8  0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#9  0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#10 0x000000000040086c in main (argc=1, argv=0x7fffffffe3a8) at with_dlmopen.cpp:7

Running a single step, instead of moving to the line that prints "I expected to see this", control transfers to _dl_catch_error():

#0  0x00007ffff7078361 in _dl_catch_error () from /lib64/libc.so.6
#1  0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#3  0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#4  0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#5  0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#6  0x000000000040086c in main (argc=1, argv=0x7fffffffe3a8) at with_dlmopen.cpp:7

The assembly suggests that this is occurring because of some setjmp()/pseudo-exception handling logic within glibc.
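To make the suspected mechanism concrete, here is a toy model of a setjmp()-based catch scheme. It is only a sketch and not glibc's code: catch_error, signal_error, current_handler, and the install flag are hypothetical names invented for illustration. The assumption being modelled is that the inner dlopen() somehow fails to register its own catch frame, so an error raised inside it unwinds to the outer frame and skips everything after the inner call.

```cpp
#include <csetjmp>

// Toy model only; none of these names are glibc's.
struct Handler {
    std::jmp_buf env;  // where to resume when an error is signalled
};

// Innermost installed handler for this thread (hypothetical).
thread_local Handler *current_handler = nullptr;
thread_local const char *last_error = nullptr;
thread_local bool reached_after_inner = false;

// Report an error by jumping to the innermost installed handler.
// Returns false only when no handler is installed at all.
bool signal_error(const char *message) {
    if (current_handler == nullptr) return false;
    last_error = message;
    std::longjmp(current_handler->env, 1);
}

// Run fn(arg) under a catch frame. Returns the error message, or nullptr
// if fn completed. install=false models a frame that fails to register.
const char *catch_error(void (*fn)(void *), void *arg, bool install) {
    Handler handler;
    Handler *saved = current_handler;
    if (setjmp(handler.env) == 0) {
        if (install) current_handler = &handler;
        fn(arg);
        current_handler = saved;
        return nullptr;
    }
    current_handler = saved;  // arrived here via longjmp
    return last_error;
}

// Models the failing inner load of libdoesnotexist.so.
void failing_load(void *) {
    signal_error("libdoesnotexist.so: cannot open shared object file");
}

// Models the library constructor foo().
void initializer(void *arg) {
    bool inner_installs = *static_cast<bool *>(arg);
    catch_error(failing_load, nullptr, inner_installs);  // inner dlopen()
    reached_after_inner = true;  // the "I expected to see this" line
}

// Models the outer dlmopen(); returns the error it reports, if any.
const char *run_outer(bool inner_installs) {
    reached_after_inner = false;
    return catch_error(initializer, &inner_installs, true);
}
```

With run_outer(true) the inner frame catches the failure, the initializer runs to completion, and the outer call reports no error, which matches the dlopen() behaviour. With run_outer(false) the outer call returns the "cannot open" message and reached_after_inner stays false, which matches what with_dlmopen prints.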
Reproduction commands:

// foo
> cat foo.cpp
#include <dlfcn.h>
#include <iostream>

void __attribute__((constructor)) foo() {
    dlopen("libdoesnotexist.so", RTLD_LAZY);
    std::cerr << "I expected to see this" << std::endl;
}
> g++ -g foo.cpp -fpic -shared -o libfoo.so -ldl

// with_dlmopen
> cat with_dlmopen.cpp
#include <iostream>
#include <dlfcn.h>
#include <sysexits.h>

int main(int argc, char * argv[]){
    void * handle = dlmopen(LM_ID_NEWLM, "./libfoo.so", RTLD_LAZY);
    if(!handle){
        std::cerr << dlerror() << std::endl;
        return EX_SOFTWARE;
    }
    return EX_OK;
}
> g++ -g with_dlopen.cpp -o with_dlopen -ldl
> ./with_dlopen
< I expected to see this

// with_dlmopen
> cat with_dlmopen.cpp
#include <iostream>
#include <dlfcn.h>
#include <sysexits.h>

int main(int argc, char * argv[]){
    void * handle = dlmopen(LM_ID_NEWLM, "./libfoo.so", RTLD_LAZY);
    if(!handle){
        std::cerr << dlerror() << std::endl;
        return EX_SOFTWARE;
    }
    return EX_OK;
}
> g++ -g with_dlmopen.cpp -o with_dlmopen -ldl
> ./with_dlmopen
< libdoesnotexist.so: cannot open shared object file: No such file or directory
Apologies, pasted the same thing twice in repro. Here's with_dlopen.cpp:

> cat with_dlopen.cpp
#include <iostream>
#include <dlfcn.h>
#include <sysexits.h>

int main(int argc, char * argv[]){
    void * handle = dlopen("./libfoo.so", RTLD_LAZY);
    if(!handle){
        std::cerr << dlerror() << std::endl;
        return EX_SOFTWARE;
    }
    return EX_OK;
}
I believe this was fixed in glibc 2.34 via this commit:

commit b2964eb1d9a6b8ab1250e8a881cf406182da5875
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Apr 21 19:49:51 2021 +0200

    dlfcn: Failures after dlmopen should not terminate process [BZ #24772]

    Commit 9e78f6f6e7134a5f299cc8de77370218f8019237 ("Implement
    _dl_catch_error, _dl_signal_error in libc.so [BZ #16628]") has the
    side effect that distinct namespaces, as created by dlmopen, now have
    separate implementations of the rtld exception mechanism.  This means
    that the call to _dl_catch_error from libdl in a secondary namespace
    does not actually install an exception handler because the
    thread-local variable catch_hook in the libc.so copy in the secondary
    namespace is distinct from that of the base namespace.  As a result, a
    dlsym/dlopen/... failure in a secondary namespace terminates the process
    with a dynamic linker error because it looks to the exception handler
    mechanism as if no handler has been installed.

    This commit restores GLRO (dl_catch_error) and uses it to set the
    handler in the base namespace.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

(There have been other fixes for ld.so exception handling and dlmopen, and more generally dlmopen.)
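The situation the commit message describes can be pictured with a small model. This is a sketch for illustration only, not glibc code: LibcCopy, Loader, InstallVia, and dlopen_failure_in are hypothetical names, and a plain bool stands in for the thread-local catch_hook. The model assumes the dynamic linker only ever consults the base namespace's copy when it signals an error.

```cpp
#include <string>

// Stand-in for one libc.so copy and its own catch_hook (hypothetical).
struct LibcCopy {
    bool handler_installed = false;
};

// Stand-in for the dynamic linker, which consults only the base copy.
struct Loader {
    LibcCopy *base;
    std::string signal_error() const {
        return base->handler_installed ? "caught" : "fatal";
    }
};

enum class InstallVia {
    OwnLibcCopy,      // libdl calls the catch function in its own namespace
    LoaderInterface,  // libdl goes through a pointer supplied by the loader
};

// Models a failing dlopen()/dlsym() issued from the namespace that owns
// `own`, while `base` belongs to the base namespace.
std::string dlopen_failure_in(LibcCopy &own, LibcCopy &base, InstallVia via) {
    Loader loader{&base};
    LibcCopy &target = (via == InstallVia::OwnLibcCopy) ? own : base;
    target.handler_installed = true;   // install the handler
    std::string outcome = loader.signal_error();
    target.handler_installed = false;  // uninstall it again
    return outcome;
}
```

In the base namespace, own and base are the same object, so the failure is "caught" either way. In a secondary namespace, InstallVia::OwnLibcCopy sets the flag on a copy the loader never looks at and the result is "fatal", while InstallVia::LoaderInterface sets it on the base copy, which is roughly the effect the commit attributes to restoring GLRO (dl_catch_error).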
Thanks Florian, that sounds like precisely the issue. I'll re-test with 2.34 and report back.
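When re-testing, it may help to confirm which glibc the test binary actually runs against, since the build host and the runtime can differ. The sketch below uses gnu_get_libc_version() from <gnu/libc-version.h>; the helper names version_at_least and runtime_glibc_version are made up for this example, and the parser assumes a "major.minor" version string.

```cpp
#include <cstdio>
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

// Returns true if a "major.minor" string such as "2.34" is at least the
// requested version. Components are compared as integers, so "2.4" is
// older than "2.34". Unparseable input yields false.
bool version_at_least(const char *version, int major, int minor) {
    int have_major = 0;
    int have_minor = 0;
    if (std::sscanf(version, "%d.%d", &have_major, &have_minor) != 2) {
        return false;
    }
    if (have_major != major) return have_major > major;
    return have_minor >= minor;
}

// Returns the glibc version in use at run time, or "" when the program
// was not built against glibc.
const char *runtime_glibc_version() {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "";
#endif
}
```

Calling version_at_least(runtime_glibc_version(), 2, 34) at the top of main() in with_dlmopen.cpp would show whether the run is exercising a glibc that should contain the fix.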