The following program segfaults on glibc master:

```
cat >> main.c <<EOF
/* Test dlmopen of a DSO that calls dlopen RTLD_GLOBAL. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

#define DSO "./libfoo.so"
#define FUNC "foo"

int
main (void)
{
  void *dso;
  void (*func) (void);   /* foo() returns void */

  dso = dlmopen (LM_ID_NEWLM, DSO, RTLD_NOW|RTLD_LOCAL);
  *(void **) (&func) = dlsym (dso, FUNC);
  (*func) ();
  dlclose (dso);
  return 0;
}
EOF

cat >> foo.c <<EOF
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

#define DSO "./libbar.so"
#define FUNC "bar"

void
foo (void)
{
  void *dso;
  int (*func) (void);

  dso = dlopen (DSO, RTLD_NOW|RTLD_GLOBAL);
  *(void **) (&func) = dlsym (dso, FUNC);
  (*func) ();
  dlclose (dso);
}
EOF

cat >> bar.c <<EOF
int
bar (void)
{
  return 42;
}
EOF

# Quote the delimiter so $BUILD is expanded when build.sh runs, not when it is written.
cat >> build.sh <<'EOF'
#!/bin/bash
set -x
set -e
BUILD=/home/carlos/build/glibc
gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libbar.so bar.c
gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libfoo.so foo.c -ldl
gcc -Wl,--dynamic-linker=$BUILD/elf/ld.so -Wl,-rpath=$BUILD:$BUILD/elf:$BUILD/dlfcn -O0 -g3 -Wall -pedantic -o main main.c -ldl
LD_LIBRARY_PATH=. ./main
EOF

chmod u+x build.sh
./build.sh
+ set -e
+ BUILD=/home/carlos/build/glibc
+ gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libbar.so bar.c
+ gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libfoo.so foo.c -ldl
+ gcc -Wl,--dynamic-linker=/home/carlos/build/glibc/elf/ld.so -Wl,-rpath=/home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn -O0 -g3 -Wall -pedantic -o main main.c -ldl
+ LD_LIBRARY_PATH=.
+ ./main
./build.sh: line 8: 22948 Segmentation fault (core dumped) LD_LIBRARY_PATH=. ./main

gdb main
GNU gdb (GDB) Fedora 7.8.2-38.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...done.
(gdb) r
Starting program: /home/carlos/support/dlmopen-rtld-global/main

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dedd44 in add_to_global (new=new@entry=0x6039b0) at dl-open.c:94
94          = ns->_ns_main_searchlist->r_nlist + to_add + 8;
(gdb) bt
#0  0x00007ffff7dedd44 in add_to_global (new=new@entry=0x6039b0) at dl-open.c:94
#1  0x00007ffff7deeafe in dl_open_worker (a=a@entry=0x7fffffffdb88) at dl-open.c:563
#2  0x00007ffff7dea104 in _dl_catch_error (objname=objname@entry=0x7fffffffdb78, errstring=errstring@entry=0x7fffffffdb80, mallocedp=mallocedp@entry=0x7fffffffdb77, operate=operate@entry=0x7ffff7dee490 <dl_open_worker>, args=args@entry=0x7fffffffdb88) at dl-error.c:187
#3  0x00007ffff7dedf03 in _dl_open (file=0x7ffff76307ed "./libbar.so", mode=-2147483390, caller_dlopen=0x7ffff76307aa, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0x7fffffffdf18) at dl-open.c:648
#4  0x00007ffff742cfa9 in ?? ()
#5  0x00007fffffffdf18 in ?? ()
#6  0x00007fffffffddc0 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb)
```

The bug is that the namespace's global searchlist (the list RTLD_GLOBAL objects are added to) is never initialized, so add_to_global dereferences a NULL ns->_ns_main_searchlist. For the main namespace, rtld initializes the global searchlist at startup. We need a similar initialization in elf/dl-open.c (add_to_global) that sets ns->_ns_main_searchlist to something. The most appropriate choice is to set it to the searchlist of the first DSO loaded into the namespace with RTLD_GLOBAL.
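To make the proposed fix concrete, here is a toy model of the logic; it is not glibc code. The structs toy_link_map, toy_scope and toy_namespace and the function toy_add_to_global are hypothetical, and only the member names r_list, r_nlist and _ns_main_searchlist echo the real internals. The sketch shows the NULL check that add_to_global lacks and the "first RTLD_GLOBAL load seeds the list" policy suggested above, assuming the caller passes the new DSO's own search list (the DSO plus its dependencies) as to_add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for glibc's internal
   struct link_map, struct r_scope_elem and per-namespace state.
   This is a model of the proposed behavior, not glibc code.  */
struct toy_link_map { const char *l_name; };

struct toy_scope
{
  struct toy_link_map **r_list;   /* objects searched, in order */
  unsigned int r_nlist;           /* number of entries in r_list */
};

struct toy_namespace
{
  struct toy_scope *_ns_main_searchlist; /* NULL in a fresh LM_ID_NEWLM namespace */
  size_t global_alloc;                   /* capacity of r_list */
};

static bool
toy_in_scope (const struct toy_scope *sl, const struct toy_link_map *m)
{
  for (unsigned int i = 0; i < sl->r_nlist; ++i)
    if (sl->r_list[i] == m)
      return true;
  return false;
}

/* Append the N objects in TO_ADD to the namespace's global search list.
   The reported crash is a NULL _ns_main_searchlist dereference; here the
   first RTLD_GLOBAL load creates the list instead, so it starts out as a
   copy of that DSO's search list and later RTLD_GLOBAL loads extend it.
   Returns 0 on success, -1 on allocation failure.  */
int
toy_add_to_global (struct toy_namespace *ns,
                   struct toy_link_map **to_add, unsigned int n)
{
  if (ns->_ns_main_searchlist == NULL)
    {
      struct toy_scope *fresh = malloc (sizeof *fresh);
      if (fresh == NULL)
        return -1;
      fresh->r_list = NULL;
      fresh->r_nlist = 0;
      ns->_ns_main_searchlist = fresh;
      ns->global_alloc = 0;
    }

  struct toy_scope *sl = ns->_ns_main_searchlist;
  size_t needed = (size_t) sl->r_nlist + n;
  if (needed > ns->global_alloc)
    {
      /* Leave 8 spare slots, like the "+ 8" in the crashing dl-open.c line.  */
      size_t new_alloc = needed + 8;
      struct toy_link_map **grown
        = realloc (sl->r_list, new_alloc * sizeof *grown);
      if (grown == NULL)
        return -1;
      sl->r_list = grown;
      ns->global_alloc = new_alloc;
    }

  /* Objects already in the global scope are not added twice.  */
  for (unsigned int i = 0; i < n; ++i)
    if (!toy_in_scope (sl, to_add[i]))
      sl->r_list[sl->r_nlist++] = to_add[i];

  assert (sl->r_nlist <= ns->global_alloc);
  return 0;
}
```

Starting from an empty list and appending the first DSO's search list is equivalent to copying it, so the first and subsequent RTLD_GLOBAL loads share a single code path.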
I could be wrong, but Carlos's suggested fix sounds simple and reasonable:

> We need a similar initialization in elf/dl-open.c (add_to_global) and
> set ns->_ns_main_searchlist to something. The most appropriate thing is to
> set it to the searchlist of the first DSO loaded into the namespace with
> RTLD_GLOBAL.

Can the priority be bumped here? We just hit this trying to load Intel's libmkl_rt.so into its own namespace. Here are the details from gdb; it looks like the same thing Carlos hit:

```
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dee9cd in add_to_global (new=new@entry=0x87be00) at dl-open.c:101
101         = ns->_ns_main_searchlist->r_nlist + to_add + 8;
(gdb) bt
#0  0x00007ffff7dee9cd in add_to_global (new=new@entry=0x87be00) at dl-open.c:101
#1  0x00007ffff7def8b0 in dl_open_worker (a=a@entry=0x7fffffff7418) at dl-open.c:564
#2  0x00007ffff7dea7c4 in _dl_catch_error (objname=objname@entry=0x7fffffff7408, errstring=errstring@entry=0x7fffffff7410, mallocedp=mallocedp@entry=0x7fffffff7400, operate=operate@entry=0x7ffff7def150 <dl_open_worker>, args=args@entry=0x7fffffff7418) at dl-error.c:177
#3  0x00007ffff7deeb7b in _dl_open (file=0x7fffffffc650 "/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_core.so", mode=-2147483391, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fffffffe118, env=0x6feee0) at dl-open.c:649
```

Is there a workaround we can use from our code without touching glibc internals?

Thanks for your help!
-Eric
This link has more discussion on the issue: https://sourceware.org/legacy-ml/libc-alpha/2015-07/msg00474.html
Created attachment 13661
1-line patch plus comments originally written by Carlos O'Donell

Here is a summary of related comments from Michael Kerrisk, from here:
https://sourceware.org/legacy-ml/libc-alpha/2015-07/msg00628.html

---------------------------------------------------------------------
[...]
This is precisely the use case the Solaris dlmopen() does support: isolation of load namespaces, while allowing DSOs inside a namespace to share symbols via RTLD_GLOBAL.

>
> This trick fails for the same reason that calling dlmopen
> with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
> (dlmopen_doit). When you go to add the DSO to the global
> search list you find there is no search list setup. In the case of
> the application we have rtld setup the global search list.
>
> Which begs the question? What should the global search list
> be for a new namespace? I propose that the global search
> list for a new namespace should be a copy of the symbol search
> list (scope) of the first DSO loaded into the namespace with
> RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
> add to that list.

The above is what Solaris appears to provide.
[...]
One other deviation that I note from Solaris. The dlopen() man page currently says:

"If filename is NULL, then the returned handle is for the main program."

And this is what glibc currently does *regardless* of the namespace from which the dlopen(NULL, flags) call is made. But, in the context of dlmopen(LM_ID_NEWLM) namespaces, I'd expect this call to return something like "the root of this namespace". And that is what Solaris appears to do.
[...]
The dlmopen() seems to have been added to Solaris to support precisely the use cases that Carlos describes, and the glibc implementation doesn't support those cases at all.