Bug 18684 - dlmopen a DSO that dlopen's into RTLD_GLOBAL segfaults.
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.21
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 16881
Reported: 2015-07-16 03:15 UTC by Carlos O'Donell
Modified: 2022-10-08 08:47 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
1-line patch plus comments originally written by Carlos O'Donell (1.20 KB, patch)
2021-09-10 23:28 UTC, Eric Wheeler

Description Carlos O'Donell 2015-07-16 03:15:00 UTC
The following program segfaults on glibc master:

cat >> main.c <<EOF
/* Test dlmopen of a DSO that calls dlopen RTLD_GLOBAL.  */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#define DSO "./libfoo.so"
#define FUNC "foo"
int
main (void)
{
  void *dso;
  void (*func) (void);  /* foo is declared as returning void in foo.c.  */
  dso = dlmopen (LM_ID_NEWLM, DSO, RTLD_NOW|RTLD_LOCAL);
  *(void **) (&func) = dlsym (dso, FUNC);
  (*func) ();
  dlclose (dso);
  return 0;
}
EOF
cat >> foo.c <<EOF
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#define DSO "./libbar.so"
#define FUNC "bar"

void 
foo (void)
{
  void *dso;
  int (*func) (void);
  dso = dlopen (DSO, RTLD_NOW|RTLD_GLOBAL);
  *(void **) (&func) = dlsym (dso, FUNC);
  (*func) ();
  dlclose (dso);
}
EOF
cat >> bar.c <<EOF
int
bar (void)
{
  return 42;
}
EOF
cat >> build.sh <<'EOF'
#!/bin/bash
set -x
set -e
BUILD=/home/carlos/build/glibc
gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libbar.so bar.c
gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libfoo.so foo.c -ldl
gcc -Wl,--dynamic-linker=$BUILD/elf/ld.so -Wl,-rpath=$BUILD:$BUILD/elf:$BUILD/dlfcn -O0 -g3 -Wall -pedantic -o main main.c -ldl 
LD_LIBRARY_PATH=. ./main
EOF
chmod u+x build.sh
./build.sh

+ set -e
+ BUILD=/home/carlos/build/glibc
+ gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libbar.so bar.c
+ gcc -O0 -g3 -Wall -pedantic -shared -fPIC -o libfoo.so foo.c -ldl
+ gcc -Wl,--dynamic-linker=/home/carlos/build/glibc/elf/ld.so -Wl,-rpath=/home/carlos/build/glibc:/home/carlos/build/glibc/elf:/home/carlos/build/glibc/dlfcn -O0 -g3 -Wall -pedantic -o main main.c -ldl
+ LD_LIBRARY_PATH=.
+ ./main
./build.sh: line 8: 22948 Segmentation fault      (core dumped) LD_LIBRARY_PATH=. ./main

gdb main
GNU gdb (GDB) Fedora 7.8.2-38.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...done.
(gdb) r
Starting program: /home/carlos/support/dlmopen-rtld-global/main 

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dedd44 in add_to_global (new=new@entry=0x6039b0) at dl-open.c:94
94		= ns->_ns_main_searchlist->r_nlist + to_add + 8;
(gdb) bt
#0  0x00007ffff7dedd44 in add_to_global (new=new@entry=0x6039b0) at dl-open.c:94
#1  0x00007ffff7deeafe in dl_open_worker (a=a@entry=0x7fffffffdb88) at dl-open.c:563
#2  0x00007ffff7dea104 in _dl_catch_error (objname=objname@entry=0x7fffffffdb78, 
    errstring=errstring@entry=0x7fffffffdb80, mallocedp=mallocedp@entry=0x7fffffffdb77, 
    operate=operate@entry=0x7ffff7dee490 <dl_open_worker>, args=args@entry=0x7fffffffdb88)
    at dl-error.c:187
#3  0x00007ffff7dedf03 in _dl_open (file=0x7ffff76307ed "./libbar.so", mode=-2147483390, 
    caller_dlopen=0x7ffff76307aa, nsid=-2, argc=<optimized out>, argv=<optimized out>, 
    env=0x7fffffffdf18) at dl-open.c:648
#4  0x00007ffff742cfa9 in ?? ()
#5  0x00007fffffffdf18 in ?? ()
#6  0x00007fffffffddc0 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) 

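The bug is that the namespace's global searchlist (RTLD_GLOBAL) is never initialized. For a namespace created with dlmopen (LM_ID_NEWLM), ns->_ns_main_searchlist is still NULL when add_to_global reads its r_nlist field, hence the fault at dl-open.c:94.

The main global searchlist is initialized by rtld.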

We need a similar initialization in elf/dl-open.c (add_to_global) and set ns->_ns_main_searchlist to something. The most appropriate thing is to set it to the searchlist of the first DSO loaded into the namespace with RTLD_GLOBAL.
Comment 1 Eric Wheeler 2021-09-10 23:15:08 UTC
I could be wrong, but Carlos's suggested fix sounds simple and reasonable:

> We need a similar initialization in elf/dl-open.c (add_to_global) and 
> set ns->_ns_main_searchlist to something. The most appropriate thing is to 
> set it to the searchlist of the first DSO loaded into the namespace with
> RTLD_GLOBAL.

Can the priority be bumped here?

We just hit this trying to load Intel's libmkl_rt.so into its own namespace. Here are the details from gdb; it looks like the same thing Carlos hit:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dee9cd in add_to_global (new=new@entry=0x87be00) at dl-open.c:101
101		= ns->_ns_main_searchlist->r_nlist + to_add + 8;
(gdb) bt
#0  0x00007ffff7dee9cd in add_to_global (new=new@entry=0x87be00) at dl-open.c:101
#1  0x00007ffff7def8b0 in dl_open_worker (a=a@entry=0x7fffffff7418) at dl-open.c:564
#2  0x00007ffff7dea7c4 in _dl_catch_error (objname=objname@entry=0x7fffffff7408, errstring=errstring@entry=0x7fffffff7410, mallocedp=mallocedp@entry=0x7fffffff7400, operate=operate@entry=0x7ffff7def150 <dl_open_worker>, args=args@entry=0x7fffffff7418) at dl-error.c:177
#3  0x00007ffff7deeb7b in _dl_open (file=0x7fffffffc650 "/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_core.so", mode=-2147483391, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fffffffe118, env=0x6feee0) at dl-open.c:649



  Is there a workaround we can do from our code without touching glibc internals?

Thanks for your help!

-Eric
Comment 2 Eric Wheeler 2021-09-10 23:18:39 UTC
This link has more discussion on the issue:

https://sourceware.org/legacy-ml/libc-alpha/2015-07/msg00474.html
Comment 3 Eric Wheeler 2021-09-10 23:28:37 UTC
Created attachment 13661
1-line patch plus comments originally written by Carlos O'Donell


Here is a summary of related comments from Michael Kerrisk, taken from:
  https://sourceware.org/legacy-ml/libc-alpha/2015-07/msg00628.html

---------------------------------------------------------------------

[...]

This is precisely the use case the Solaris dlmopen() does support:
isolation of load namespaces, while allowing DSOs inside a namespace
to share symbols via RTLD_GLOBAL.
> 
> This trick fails for the same reason that calling dlmopen
> with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
> (dlmopen_doit). When you go to add the DSO to the global
> search list you find there is no search list setup. In the case of
> the application we have rtld setup the global search list.
> 
> Which begs the question? What should the global search list
> be for a new namespace? I propose that the global search
> list for a new namespace should be a copy of the symbol search
> list (scope) of the first DSO loaded into the namespace with
> RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
> add to that list.

The above is what Solaris appears to provide.

[...]

One other deviation that I note from Solaris. The dlopen() man page
currently says:

       If filename is NULL, then the returned handle is  for
       the  main  program.

And this is what glibc currently does *regardless* of the namespace
from which the dlopen(NULL, flags) call is made. But, in the context
of dlmopen(LM_ID_NEWLM) namespaces, I'd expect this call to return
something like "the root of this namespace". And that is what
Solaris appears to do.

[...]

The dlmopen() seems to have been added to Solaris to support
precisely the use cases that Carlos describes, and the glibc
implementation doesn't support those cases at all.