Created attachment 6929 [details] Test case, and patch which appears to work around the crash The attached test case loads a dummy DSO using dlmopen() with LM_ID_NEWLM; the library's constructor, in turn, calls dlsym() with bogus symbol names twice. This reproduces a segfault in ld-2.15.so on my 64-bit Ubuntu 12.04 LTS system, kernel 3.2.0-26-generic. I confirmed this also reproduces on an ld.so built from git commit 72a3b700c592d39e0e76cd75b2c5ff483e70e083. Investigating further, it appears that different instances of libc were used to allocate and free the error string used internally by dlerror(). In my reproduction using this test case, with the following maps file: 00400000-00401000 r-xp 00000000 08:11 3722707 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test 00600000-00601000 r--p 00000000 08:11 3722707 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test 00601000-00602000 rw-p 00001000 08:11 3722707 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dlmopen_test 555555554000-555555575000 r-xp 00000000 08:11 3728093 /home/brnguyen/bin/glibc-build/elf/ld.so 555555774000-555555775000 r--p 00020000 08:11 3728093 /home/brnguyen/bin/glibc-build/elf/ld.so 555555775000-555555777000 rw-p 00021000 08:11 3728093 /home/brnguyen/bin/glibc-build/elf/ld.so 555555777000-555555798000 rw-p 00000000 00:00 0 [heap] 7ffff6f8e000-7ffff6f90000 r-xp 00000000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff6f90000-7ffff7190000 ---p 00002000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7190000-7ffff7191000 r--p 00002000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7191000-7ffff7192000 rw-p 00003000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7192000-7ffff7339000 r-xp 00000000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7339000-7ffff7538000 ---p 001a7000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7538000-7ffff753c000 r--p 001a6000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff753c000-7ffff753e000 rw-p 001aa000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff753e000-7ffff7542000 rw-p 00000000 00:00 0 7ffff7542000-7ffff763e000 r-xp 00000000 08:11 3729959 /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so 7ffff763e000-7ffff783e000 ---p 000fc000 08:11 3729959 /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so 7ffff783e000-7ffff783f000 r--p 000fc000 08:11 3729959 /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so 7ffff783f000-7ffff7840000 rw-p 000fd000 08:11 3729959 /home/brnguyen/bin/glibc-build/lib/libm-2.17.90.so 7ffff7840000-7ffff7842000 r--p 00000000 08:11 3729060 /home/brnguyen/bin/glibc-build/etc/ld.so.cache 7ffff7842000-7ffff7843000 r-xp 00000000 08:11 3722702 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1 7ffff7843000-7ffff7a42000 ---p 00001000 08:11 3722702 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1 7ffff7a42000-7ffff7a43000 rw-p 00000000 08:11 3722702 /home/brnguyen/bugs/1188502/dlmopen_bug_debug/dummy.so.1 7ffff7a43000-7ffff7a46000 rw-p 00000000 00:00 0 7ffff7a46000-7ffff7bed000 r-xp 00000000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7bed000-7ffff7dec000 ---p 001a7000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7dec000-7ffff7df0000 r--p 001a6000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7df0000-7ffff7df2000 rw-p 001aa000 08:11 3416152 /home/brnguyen/bin/glibc-build/libc.so 7ffff7df2000-7ffff7df6000 rw-p 00000000 00:00 0 7ffff7df6000-7ffff7df8000 r-xp 00000000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7df8000-7ffff7ff8000 ---p 00002000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7ff8000-7ffff7ff9000 r--p 00002000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7ff9000-7ffff7ffa000 rw-p 00003000 08:11 3730007 /home/brnguyen/bin/glibc-build/lib/libdl-2.17.90.so 7ffff7ffa000-7ffff7ffb000 rw-p 00000000 00:00 0 7ffff7ffc000-7ffff7ffe000 rw-p 00000000 00:00 0 7ffff7ffe000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso] 7ffffffdd000-7ffffffff000 rw-p 00000000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] elf/dl-error.c:_dl_signal_error(), which allocates the error string, lives in ld.so within the application's namespace: (gdb) p $pc $22 = (void (*)()) 0x555555562161 <_dl_signal_error+113> (gdb) p __GI__dl_find_dso_for_object($pc) $23 = (struct link_map *) 0x5555557759d8 (gdb) p $->l_ns $25 = 0 But the instance of dlfcn/dlerror.c:_dlerror_run() which clears the error lives in libdl.so within the dummy library's new namespace: (gdb) p $pc $27 = (void (*)()) 0x7ffff6f8f460 <_dlerror_run> (gdb) p __GI__dl_find_dso_for_object($pc) $28 = (struct link_map *) 0x555555778020 (gdb) p $->l_ns $30 = 1 For some reason, calling dlerror() in between the two dlsym() calls appears to work around this crash. I attempted to investigate this, but haven't determined why this suppresses the segfault. There is an existing helper function in dlerror.c called check_free(), which first checks if the caller is using the base link map before attempting to free the error string. In the included patch, I have replaced direct invocations of free() with check_free() in _dlerror_run() as well as __dlerror(); I verified this patch fixed the segfault on my system.
Brian, I can confirm that I see the behaviour you're seeing. Do you have a copyright notice on file with the FSF for glibc?
Hi Carlos, I don't believe this patch is legally significant per http://www.gnu.org/prep/maintain/maintain.html#Legally-Significant since it simply moves an existing function and replaces free() with check_free() in two locations in the code. Furthermore, per this link I believe this is merely an "idea": a possible fix for this bug is to check the frees that are performed in _dlerror_run() and __dlerror(). Thanks, Brian
(In reply to comment #2) > Hi Carlos, > > I don't believe this patch is legally significant per > http://www.gnu.org/prep/maintain/maintain.html#Legally-Significant since it > simply moves an existing function and replaces free() with check_free() in > two locations in the code. > > Furthermore, per this link I believe this is merely an "idea": a possible > fix for this bug is to check the frees that are performed in > _dlerror_run() and __dlerror(). Brian, Thanks for clarifying that. Given that you don't think it's legally significant I've had a look at the patch and I agree. I try not to look at patches until copyright is settled, thanks for understanding and answering the question :-) Firstly, please avoid the use of `ld' directly and instead use the compiler driver `gcc' to link your shared library. Without a lot of care using `ld' directly can cause problems. The patch itself looks good and does fix this test case. I'm still double checking that I didn't miss anything in the dlmopen() case. How do you feel about doing a little more work? * Add your test case to the glibc testsuite, you can see how other similar tests are handled, and we have a test-skeleton.c that can be used to run tests easily. * Build and run the testsuite with `make -k check', and report back the results. I'm not trying to avoid work, just leaning on you, since you appear to be an experienced developer and competent enough to have waded into dynamic loader code for multiple namespace support... and we could use the help :-) Please don't feel that you need to say yes to this additional work, but it will help us speed up getting this patch into 2.18. Is there a particular maintenance branch that you need or want this change for? 2.17? 2.16? Fedora? Debian? etc. Cheers, Carlos.
Hi Carlos, My employer doesn't currently have the necessary paperwork in place to assign copyright to the FSF (and getting that paperwork in place is unlikely to happen in the near future). I'm happy to help as long as my contributions can be categorized as an idea and not require explicit assignment of copyright. Is that possible for testsuite updates? If this patch is accepted, I would like to have it back-ported to Debian glibc. I'm not aware of any other maintenance branches where this would be needed at this time. Thanks, Brian
(In reply to comment #4) > Hi Carlos, > > My employer doesn't currently have the necessary paperwork in place to assign > copyright to the FSF (and getting that paperwork in place is unlikely to happen > in the near future). I'm happy to help as long as my contributions can be > categorized as an idea and not require explicit assignment of copyright. Is > that possible for testsuite updates? > > If this patch is accepted, I would like to have it back-ported to Debian glibc. > I'm not aware of any other maintenance branches where this would be needed at > this time. Thanks for setting the expectation. In general you would need copyright assignment to do any further work. We can take it from here, but it might not happen as quickly as you wish :-( Thanks for posting the patch and bug report though. We do appreciate it.
ping
Created attachment 12722 [details] 0001-dlerror-Use-check_free-to-free-the-error-string.patch I've (trivially) rebased this patch to current glibc, and attempted to wire up the test case to 'make check'. While the patch does fix the crash, it does not make the test case actually pass, and I think there's another bug to be fixed here. The test case now yields: --- datura:~/git/glibc% ./build/testrun.sh ./build/dlfcn/bug-dlmopen1 Opening bug-dlmopen1-lib.so... ./build/dlfcn/bug-dlmopen1: symbol lookup error: ./build/dlfcn/bug-dlmopen1-lib.so: undefined symbol: foo --- Which is clever, because we're just asking for foo with dlsym(), so this shouldn't be fatal. Attempting to reproduce this under gdb (I hope, I don't really understand testrun.sh) yields slightly different (also wrong) results: --- datura:~/git/glibc% ./build/testrun.sh /usr/bin/gdb -q ./build/dlfcn/bug-dlmopen1 Reading symbols from ./build/dlfcn/bug-dlmopen1... (gdb) break _dl_signal_cexception Function "_dl_signal_cexception" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (_dl_signal_cexception) pending. (gdb) run Starting program: /home/ajax/git/glibc/build/dlfcn/bug-dlmopen1 Opening bug-dlmopen1-lib.so... failed dlmopen! [Inferior 1 (process 2482832) exited with code 01] --- gdb doesn't seem at all aware of the additional link map (hence not finding the copy of _dl_signal_cexception inside it, I suppose), but the ctor still seems to be crashing and thus saying dlmopen failed. LD_DEBUG=all's output isn't super insightful either: (trimmed to the bit where control transfers to main()): --- datura:~/git/glibc% ./build/testrun.sh /usr/bin/env LD_DEBUG=all ./build/dlfcn/bug-dlmopen1 # ... 2482863: transferring control: ./build/dlfcn/bug-dlmopen1 2482863: 2482863: symbol=printf; lookup in file=./build/dlfcn/bug-dlmopen1 [0] 2482863: symbol=printf; lookup in file=/lib64/libdl.so.2 [0] 2482863: symbol=printf; lookup in file=/lib64/libc.so.6 [0] 2482863: binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libc.so.6 [0]: normal symbol `printf' [GLIBC_2.2.5] Opening bug-dlmopen1-lib.so... 2482863: symbol=dlmopen; lookup in file=./build/dlfcn/bug-dlmopen1 [0] 2482863: symbol=dlmopen; lookup in file=/lib64/libdl.so.2 [0] 2482863: binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libdl.so.2 [0]: normal symbol `dlmopen' [GLIBC_2.3.4] 2482863: 2482863: file=bug-dlmopen1-lib.so [1]; dynamically loaded by ./build/dlfcn/bug-dlmopen1 [0] 2482863: find library=bug-dlmopen1-lib.so [1]; searching 2482863: search cache=/etc/ld.so.cache 2482863: search path=/lib64/tls/haswell/x86_64:/lib64/tls/haswell:/lib64/tls/x86_64:/lib64/tls:/lib64/haswell/x86_64:/lib64/haswell:/lib64/x86_64:/lib64:/usr/lib64/tls/haswell/x86_64:/usr/lib64/tls/haswell:/usr/lib64/tls/x86_64:/usr/lib64/tls:/usr/lib64/haswell/x86_64:/usr/lib64/haswell:/usr/lib64/x86_64:/usr/lib64 (system search path) 2482863: trying file=/lib64/tls/haswell/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/lib64/tls/haswell/bug-dlmopen1-lib.so 2482863: trying file=/lib64/tls/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/lib64/tls/bug-dlmopen1-lib.so 2482863: trying file=/lib64/haswell/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/lib64/haswell/bug-dlmopen1-lib.so 2482863: trying file=/lib64/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/lib64/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/tls/haswell/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/tls/haswell/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/tls/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/tls/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/haswell/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/haswell/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/x86_64/bug-dlmopen1-lib.so 2482863: trying file=/usr/lib64/bug-dlmopen1-lib.so 2482863: 2482863: symbol=puts; lookup in file=./build/dlfcn/bug-dlmopen1 [0] 2482863: symbol=puts; lookup in file=/lib64/libdl.so.2 [0] 2482863: symbol=puts; lookup in file=/lib64/libc.so.6 [0] 2482863: binding file ./build/dlfcn/bug-dlmopen1 [0] to /lib64/libc.so.6 [0]: normal symbol `puts' [GLIBC_2.2.5] failed dlmopen! 2482863: 2482863: calling fini: ./build/dlfcn/bug-dlmopen1 [0] 2482863: 2482863: 2482863: calling fini: /lib64/libdl.so.2 [0] 2482863: ---
*** Bug 24772 has been marked as a duplicate of this bug. ***
I think I posted fixes for this a while back: https://sourceware.org/pipermail/libc-alpha/2019-July/104832.html https://sourceware.org/pipermail/libc-alpha/2019-July/104840.html I think they will need adjustment once this is merged: https://sourceware.org/pipermail/libc-alpha/2020-February/111168.html It's unfortunately very difficult to get reviews for dynamic linker patches.
Fixed in glibc 2.34 via: commit b2964eb1d9a6b8ab1250e8a881cf406182da5875 Author: Florian Weimer <fweimer@redhat.com> Date: Wed Apr 21 19:49:51 2021 +0200 dlfcn: Failures after dlmopen should not terminate process [BZ #24772] Commit 9e78f6f6e7134a5f299cc8de77370218f8019237 ("Implement _dl_catch_error, _dl_signal_error in libc.so [BZ #16628]") has the side effect that distinct namespaces, as created by dlmopen, now have separate implementations of the rtld exception mechanism. This means that the call to _dl_catch_error from libdl in a secondary namespace does not actually install an exception handler because the thread-local variable catch_hook in the libc.so copy in the secondary namespace is distinct from that of the base namepace. As a result, a dlsym/dlopen/... failure in a secondary namespace terminates the process with a dynamic linker error because it looks to the exception handler mechanism as if no handler has been installed. This commit restores GLRO (dl_catch_error) and uses it to set the handler in the base namespace. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*** Bug 27646 has been marked as a duplicate of this bug. ***
Fixed in glibc 2.33 via: commit f4cba6ca1e5ff2c229af6d474c3ef2c0df2fae52 Author: Florian Weimer <fweimer@redhat.com> Date: Wed Apr 21 19:49:51 2021 +0200 dlfcn: Failures after dlmopen should not terminate process [BZ #15271] Commit 9e78f6f6e7134a5f299cc8de77370218f8019237 ("Implement _dl_catch_error, _dl_signal_error in libc.so [BZ #16628]") has the side effect that distinct namespaces, as created by dlmopen, now have separate implementations of the rtld exception mechanism. This means that the call to _dl_catch_error from libdl in a secondary namespace does not actually install an exception handler because the thread-local variable catch_hook in the libc.so copy in the secondary namespace is distinct from that of the base namepace. As a result, a dlsym/dlopen/... failure in a secondary namespace terminates the process with a dynamic linker error because it looks to the exception handler mechanism as if no handler has been installed. Backport notes: GLRO (dl_catch_error) is replaced with _dl_catch_error_ptr to preserve _rtld_global_ro layout. (cherry picked from commit b2964eb1d9a6b8ab1250e8a881cf406182da5875)