Bug 15271 - dlmopen()ed shared library with LM_ID_NEWLM crashes if it fails dlsym() twice
Summary: dlmopen()ed shared library with LM_ID_NEWLM crashes if it fails dlsym() twice
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: 2.34
Assignee: Florian Weimer
URL:
Keywords:
: 24772 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-03-12 17:22 UTC by Brian Nguyen
Modified: 2021-06-09 15:17 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test case, and patch which appears to work around the crash (2.18 KB, application/x-gzip)
2013-03-12 17:22 UTC, Brian Nguyen
Details
0001-dlerror-Use-check_free-to-free-the-error-string.patch (2.00 KB, patch)
2020-07-23 19:12 UTC, Adam Jackson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Nguyen 2013-03-12 17:22:45 UTC
Created attachment 6929 [details]
Test case, and patch which appears to work around the crash

The attached test case loads a dummy DSO using dlmopen() with LM_ID_NEWLM; the
library's constructor, in turn, calls dlsym() with bogus symbol names twice.
This reproduces a segfault in ld-2.15.so on my 64-bit Ubuntu 12.04 LTS system,
kernel 3.2.0-26-generic. I confirmed this also reproduces on an ld.so built from
git commit 72a3b700c592d39e0e76cd75b2c5ff483e70e083.

Investigating further, it appears that different instances of libc were used
to allocate and free the error string used internally by dlerror(). In my
reproduction using this test case, with the following maps file:

00400000-00401000 r-xp 00000000 08:11 3722707                           /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test
00600000-00601000 r--p 00000000 08:11 3722707                           /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test
00601000-00602000 rw-p 00001000 08:11 3722707                           /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test
555555554000-555555575000 r-xp 00000000 08:11 3728093                   /home/brnguyen/bin/glibc-build/elf/ld.so
555555774000-555555775000 r--p 00020000 08:11 3728093                   /home/brnguyen/bin/glibc-build/elf/ld.so
555555775000-555555777000 rw-p 00021000 08:11 3728093                   /home/brnguyen/bin/glibc-build/elf/ld.so
555555777000-555555798000 rw-p 00000000 00:00 0                         [heap]
7ffff6f8e000-7ffff6f90000 r-xp 00000000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff6f90000-7ffff7190000 ---p 00002000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7190000-7ffff7191000 r--p 00002000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7191000-7ffff7192000 rw-p 00003000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7192000-7ffff7339000 r-xp 00000000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7339000-7ffff7538000 ---p 001a7000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7538000-7ffff753c000 r--p 001a6000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff753c000-7ffff753e000 rw-p 001aa000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff753e000-7ffff7542000 rw-p 00000000 00:00 0
7ffff7542000-7ffff763e000 r-xp 00000000 08:11 3729959                   /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so
7ffff763e000-7ffff783e000 ---p 000fc000 08:11 3729959                   /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so
7ffff783e000-7ffff783f000 r--p 000fc000 08:11 3729959                   /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so
7ffff783f000-7ffff7840000 rw-p 000fd000 08:11 3729959                   /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so
7ffff7840000-7ffff7842000 r--p 00000000 08:11 3729060                   /home/brnguyen/bin/glibc-build/etc/ld.so.cache
7ffff7842000-7ffff7843000 r-xp 00000000 08:11 3722702                   /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1
7ffff7843000-7ffff7a42000 ---p 00001000 08:11 3722702                   /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1
7ffff7a42000-7ffff7a43000 rw-p 00000000 08:11 3722702                   /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1
7ffff7a43000-7ffff7a46000 rw-p 00000000 00:00 0
7ffff7a46000-7ffff7bed000 r-xp 00000000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7bed000-7ffff7dec000 ---p 001a7000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7dec000-7ffff7df0000 r--p 001a6000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7df0000-7ffff7df2000 rw-p 001aa000 08:11 3416152                   /home/brnguyen/bin/glibc-build/libc.so
7ffff7df2000-7ffff7df6000 rw-p 00000000 00:00 0
7ffff7df6000-7ffff7df8000 r-xp 00000000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7df8000-7ffff7ff8000 ---p 00002000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7ff8000-7ffff7ff9000 r--p 00002000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7ff9000-7ffff7ffa000 rw-p 00003000 08:11 3730007                   /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so
7ffff7ffa000-7ffff7ffb000 rw-p 00000000 00:00 0
7ffff7ffc000-7ffff7ffe000 rw-p 00000000 00:00 0
7ffff7ffe000-7ffff7fff000 r-xp 00000000 00:00 0                         [vdso]
7ffffffdd000-7ffffffff000 rw-p 00000000 00:00 0                         [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                 [vsyscall]

elf/dl-error.c:_dl_signal_error(), which allocates the error string, lives in ld.so within the application's namespace:

(gdb) p $pc
$22 = (void (*)()) 0x555555562161 <_dl_signal_error+113>
(gdb) p __GI__dl_find_dso_for_object($pc)
$23 = (struct link_map *) 0x5555557759d8
(gdb) p $->l_ns
$25 = 0

But the instance of dlfcn/dlerror.c:_dlerror_run() which clears the error lives in libdl.so within the dummy library's
new namespace:
(gdb) p $pc
$27 = (void (*)()) 0x7ffff6f8f460 <_dlerror_run>
(gdb) p __GI__dl_find_dso_for_object($pc)
$28 = (struct link_map *) 0x555555778020
(gdb) p $->l_ns
$30 = 1

For some reason, calling dlerror() in between the two dlsym() calls appears to
work around this crash. I attempted to investigate this, but haven't determined
why this suppresses the segfault.

There is an existing helper function in dlerror.c called check_free(), which
first checks if the caller is using the base link map before attempting to free
the error string. In the included patch, I have replaced direct invocations of
free() with check_free() in _dlerror_run() as well as __dlerror(); I verified
this patch fixed the segfault on my system.
Comment 1 Carlos O'Donell 2013-03-12 20:22:48 UTC
Brian,

I can confirm that I see the behaviour you're seeing.

Do you have a copyright notice on file with the FSF for glibc?
Comment 2 Brian Nguyen 2013-03-14 23:23:42 UTC
Hi Carlos,

I don't believe this patch is legally significant per
http://www.gnu.org/prep/maintain/maintain.html#Legally-Significant since it
simply moves an existing function and replaces free() with check_free() in
two locations in the code.

Furthermore, per this link I believe this is merely an "idea": a possible
fix for this bug is to check the frees that are performed in
_dlerror_run() and __dlerror().

Thanks,
Brian
Comment 3 Carlos O'Donell 2013-03-17 17:25:15 UTC
(In reply to comment #2)
> Hi Carlos,
> 
> I don't believe this patch is legally significant per
> http://www.gnu.org/prep/maintain/maintain.html#Legally-Significant since it
> simply moves an existing function and replaces free() with check_free() in
> two locations in the code.
> 
> Furthermore, per this link I believe this is merely an "idea": a possible
> fix for this bug is to check the frees that are performed in
> _dlerror_run() and __dlerror().

Brian,

Thanks for clarifying that. Given that you don't think it's legally significant I've had a look at the patch and I agree. I try not to look at patches until copyright is settled, thanks for understanding and answering the question :-)

Firstly, please avoid the use of `ld' directly and instead use the compiler driver `gcc' to link your shared library. Without a lot of care using `ld' directly can cause problems.

The patch itself looks good and does fix this test case. I'm still double checking that I didn't miss anything in the dlmopen() case.

How do you feel about doing a little more work?

* Add your test case to the glibc testsuite, you can see how other similar tests are handled, and we have a test-skeleton.c that can be used to run tests easily.

* Build and run the testsuite with `make -k check', and report back the results.

I'm not trying to avoid work, just leaning on you, since you appear to be an experienced developer and competent enough to have waded into dynamic loader code for multiple namespace support... and we could use the help :-)

Please don't feel that you need to say yes to this additional work, but it will help us speed up getting this patch into 2.18.

Is there a particular maintenance branch that you need or want this change for? 2.17? 2.16? Fedora? Debian? etc.

Cheers,
Carlos.
Comment 4 Brian Nguyen 2013-03-21 00:46:50 UTC
Hi Carlos,

My employer doesn't currently have the necessary paperwork in place to assign copyright to the FSF (and getting that paperwork in place is unlikely to happen in the near future).  I'm happy to help as long as my contributions can be categorized as an idea and not require explicit assignment of copyright.  Is that possible for testsuite updates?

If this patch is accepted, I would like to have it back-ported to Debian glibc. I'm not aware of any other maintenance branches where this would be needed at this time.

Thanks,
Brian
Comment 5 Carlos O'Donell 2013-03-27 04:03:21 UTC
(In reply to comment #4)
> Hi Carlos,
> 
> My employer doesn't currently have the necessary paperwork in place to assign
> copyright to the FSF (and getting that paperwork in place is unlikely to happen
> in the near future).  I'm happy to help as long as my contributions can be
> categorized as an idea and not require explicit assignment of copyright.  Is
> that possible for testsuite updates?
> 
> If this patch is accepted, I would like to have it back-ported to Debian glibc.
> I'm not aware of any other maintenance branches where this would be needed at
> this time.

Thanks for setting the expectation. In general you would need copyright assignment to do any further work. We can take it from here, but it might not happen as quickly as you wish :-(

Thanks for posting the patch and bug report though. We do appreciate it.
Comment 6 Ondrej Bilka 2013-10-02 22:25:01 UTC
ping
Comment 7 Adam Jackson 2020-07-23 19:12:35 UTC
Created attachment 12722 [details]
0001-dlerror-Use-check_free-to-free-the-error-string.patch

I've (trivially) rebased this patch to current glibc, and attempted to wire up the test case to 'make check'. While the patch does fix the crash, it does not make the test case actually pass, and I think there's another bug to be fixed here.

The test case now yields:

---
datura:~/git/glibc% ./build/testrun.sh ./build/dlfcn/bug-dlmopen1                 
Opening bug-dlmopen1-lib.so...
./build/dlfcn/bug-dlmopen1: symbol lookup error: ./build/dlfcn/bug-dlmopen1-lib.so: undefined symbol: foo
---

Which is clever, because we're just asking for foo with dlsym(), so this shouldn't be fatal. Attempting to reproduce this under gdb (I hope, I don't really understand testrun.sh) yields slightly different (also wrong) results:

---
datura:~/git/glibc% ./build/testrun.sh /usr/bin/gdb -q ./build/dlfcn/bug-dlmopen1
Reading symbols from ./build/dlfcn/bug-dlmopen1...
(gdb) break _dl_signal_cexception
Function "_dl_signal_cexception" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y  
Breakpoint 1 (_dl_signal_cexception) pending.
(gdb) run
Starting program: /home/ajax/git/glibc/build/dlfcn/bug-dlmopen1 
Opening bug-dlmopen1-lib.so...
failed dlmopen!
[Inferior 1 (process 2482832) exited with code 01]
---

gdb doesn't seem at all aware of the additional link map (hence not finding the copy of _dl_signal_cexception inside it, I suppose), but the ctor still seems to be crashing and thus saying dlmopen failed.

LD_DEBUG=all's output isn't super insightful either: (trimmed to the bit where control transfers to main()):

---
datura:~/git/glibc% ./build/testrun.sh /usr/bin/env LD_DEBUG=all  ./build/dlfcn/bug-dlmopen1
# ...	
   2482863:	transferring control: ./build/dlfcn/bug-dlmopen1
   2482863:	
   2482863:	symbol=printf;  lookup in file=./build/dlfcn/bug-dlmopen1 [0]
   2482863:	symbol=printf;  lookup in file=/lib64/libdl.so.2 [0]
   2482863:	symbol=printf;  lookup in file=/lib64/libc.so.6 [0]
   2482863:	binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libc.so.6 [0]: normal symbol `printf' [GLIBC_2.2.5]
Opening bug-dlmopen1-lib.so...
   2482863:	symbol=dlmopen;  lookup in file=./build/dlfcn/bug-dlmopen1 [0]
   2482863:	symbol=dlmopen;  lookup in file=/lib64/libdl.so.2 [0]
   2482863:	binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libdl.so.2 [0]: normal symbol `dlmopen' [GLIBC_2.3.4]
   2482863:	
   2482863:	file=bug-dlmopen1-lib.so [1];  dynamically loaded by ./build/dlfcn/bug-dlmopen1 [0]
   2482863:	find library=bug-dlmopen1-lib.so [1]; searching
   2482863:	 search cache=/etc/ld.so.cache
   2482863:	 search path=/lib64/tls/haswell/x86_64:/lib64/tls/haswell:/lib64/tls/x86_64:/lib64/tls:/lib64/haswell/x86_64:/lib64/haswell:/lib64/x86_64:/lib64:/usr/lib64/tls/haswell/x86_64:/usr/lib64/tls/haswell:/usr/lib64/tls/x86_64:/usr/lib64/tls:/usr/lib64/haswell/x86_64:/usr/lib64/haswell:/usr/lib64/x86_64:/usr/lib64		(system search path)
   2482863:	  trying file=/lib64/tls/haswell/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/tls/haswell/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/tls/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/tls/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/haswell/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/haswell/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/lib64/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/tls/haswell/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/tls/haswell/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/tls/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/tls/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/haswell/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/haswell/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/x86_64/bug-dlmopen1-lib.so
   2482863:	  trying file=/usr/lib64/bug-dlmopen1-lib.so
   2482863:	
   2482863:	symbol=puts;  lookup in file=./build/dlfcn/bug-dlmopen1 [0]
   2482863:	symbol=puts;  lookup in file=/lib64/libdl.so.2 [0]
   2482863:	symbol=puts;  lookup in file=/lib64/libc.so.6 [0]
   2482863:	binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libc.so.6 [0]: normal symbol `puts' [GLIBC_2.2.5]
failed dlmopen!
   2482863:	
   2482863:	calling fini: ./build/dlfcn/bug-dlmopen1 [0]
   2482863:	
   2482863:	
   2482863:	calling fini: /lib64/libdl.so.2 [0]
   2482863:	
---
Comment 8 Florian Weimer 2020-07-23 21:00:40 UTC
*** Bug 24772 has been marked as a duplicate of this bug. ***
Comment 9 Florian Weimer 2020-07-23 21:05:57 UTC
I think I posted fixes for this a while back:

https://sourceware.org/pipermail/libc-alpha/2019-July/104832.html
https://sourceware.org/pipermail/libc-alpha/2019-July/104840.html

I think they will need adjustment once this is merged:

https://sourceware.org/pipermail/libc-alpha/2020-February/111168.html

It's unfortunately very difficult to get reviews for dynamic linker patches.
Comment 10 Florian Weimer 2021-04-21 19:42:26 UTC
Fixed in glibc 2.34 via:

commit b2964eb1d9a6b8ab1250e8a881cf406182da5875
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Apr 21 19:49:51 2021 +0200

    dlfcn: Failures after dlmopen should not terminate process [BZ #24772]
    
    Commit 9e78f6f6e7134a5f299cc8de77370218f8019237 ("Implement
    _dl_catch_error, _dl_signal_error in libc.so [BZ #16628]") has the
    side effect that distinct namespaces, as created by dlmopen, now have
    separate implementations of the rtld exception mechanism.  This means
    that the call to _dl_catch_error from libdl in a secondary namespace
    does not actually install an exception handler because the
    thread-local variable catch_hook in the libc.so copy in the secondary
    namespace is distinct from that of the base namepace.  As a result, a
    dlsym/dlopen/... failure in a secondary namespace terminates the process
    with a dynamic linker error because it looks to the exception handler
    mechanism as if no handler has been installed.
    
    This commit restores GLRO (dl_catch_error) and uses it to set the
    handler in the base namespace.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 11 Florian Weimer 2021-06-09 13:57:43 UTC
*** Bug 27646 has been marked as a duplicate of this bug. ***
Comment 12 Florian Weimer 2021-06-09 15:17:38 UTC
Fixed in glibc 2.33 via:

commit f4cba6ca1e5ff2c229af6d474c3ef2c0df2fae52
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Apr 21 19:49:51 2021 +0200

    dlfcn: Failures after dlmopen should not terminate process [BZ #15271]
    
    Commit 9e78f6f6e7134a5f299cc8de77370218f8019237 ("Implement
    _dl_catch_error, _dl_signal_error in libc.so [BZ #16628]") has the
    side effect that distinct namespaces, as created by dlmopen, now have
    separate implementations of the rtld exception mechanism.  This means
    that the call to _dl_catch_error from libdl in a secondary namespace
    does not actually install an exception handler because the
    thread-local variable catch_hook in the libc.so copy in the secondary
    namespace is distinct from that of the base namepace.  As a result, a
    dlsym/dlopen/... failure in a secondary namespace terminates the process
    with a dynamic linker error because it looks to the exception handler
    mechanism as if no handler has been installed.
    
    Backport notes: GLRO (dl_catch_error) is replaced with
    _dl_catch_error_ptr to preserve _rtld_global_ro layout.
    
    (cherry picked from commit b2964eb1d9a6b8ab1250e8a881cf406182da5875)