Bug is originally reported as a https://bugs.gentoo.org/712356. In this case the following tests fail: FAIL: elf/ifuncmain9picstatic FAIL: elf/ifuncmain9static The crash seem to happen at a point when we access TLS canary before TLS segment is initialized (yes?) $ /tmp/portage/sys-libs/glibc-9999:gdb --quiet --args work/build-x86-x86_64-pc-linux-gnu-nptl/elf/ifuncmain9static Reading symbols from work/build-x86-x86_64-pc-linux-gnu-nptl/elf/ifuncmain9static... (gdb) run Starting program: /tmp/portage/sys-libs/glibc-9999/work/build-x86-x86_64-pc-linux-gnu-nptl/elf/ifuncmain9static Program received signal SIGSEGV, Segmentation fault. 0xf7f5a0cd in resolver () at ifuncmain9.c:47 47 { (gdb) bt #0 0xf7f5a0cd in resolver () at ifuncmain9.c:47 #1 0xf7f8ddc2 in elf_machine_rel (skip_ifunc=0, reloc_addr_arg=0xf7ffb098 <*ABS*@got.plt>, version=0x0, sym=0xf7f561ac, reloc=0xf7f58b80, map=0xf7ffba80 <_dl_main_map>) at ../sysdeps/i386/dl-machine.h:484 #2 elf_dynamic_do_Rel (skip_ifunc=0, lazy=0, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=<optimized out>) at do-rel.h:170 #3 _dl_relocate_static_pie () at dl-reloc-static-pie.c:49 #4 0xf7f5a4f8 in __libc_start_main (main=0xf7f59930 <main>, argc=1, argv=0xffffca14, init=0xf7f5af20 <__libc_csu_init>, fini=0xf7f5afc0 <__libc_csu_fini>, rtld_fini=0x0, stack_end=0xffffca0c) at ../csu/libc-start.c:144 #5 0xf7f59f62 in _start () at ../sysdeps/i386/start.S:113 (gdb) disassemble Dump of assembler code for function resolver: 0xf7f5a0c0 <+0>: call 0xf7f5a105 <__x86.get_pc_thunk.ax> 0xf7f5a0c5 <+5>: add $0xa0f3b,%eax 0xf7f5a0ca <+10>: sub $0x1c,%esp => 0xf7f5a0cd <+13>: mov %gs:0x14,%ecx 0xf7f5a0d4 <+20>: mov %ecx,0xc(%esp) 0xf7f5a0d8 <+24>: xor %ecx,%ecx 0xf7f5a0da <+26>: mov 0x1304(%eax),%edx 0xf7f5a0e0 <+32>: add $0x1,%edx 0xf7f5a0e3 <+35>: mov %edx,0x1304(%eax) 0xf7f5a0e9 <+41>: mov 0xc(%esp),%ecx 0xf7f5a0ed <+45>: sub %gs:0x14,%ecx 0xf7f5a0f4 <+52>: jne 0xf7f5a100 <resolver+64> 0xf7f5a0f6 <+54>: lea -0xa0f90(%eax),%eax 0xf7f5a0fc <+60>: add $0x1c,%esp 0xf7f5a0ff <+63>: ret 0xf7f5a100 <+64>: call 0xf7f8ab70 <__stack_chk_fail> End of assembler dump. glibc was build with the following configure options: * Manual CC: x86_64-pc-linux-gnu-gcc -m32 * Running do_src_configure for ABI x86 * Configuring glibc for nptl * ABI: x86 * CBUILD: x86_64-pc-linux-gnu * CHOST: x86_64-pc-linux-gnu * CTARGET: x86_64-pc-linux-gnu * CBUILD_OPT: i686-pc-linux-gnu * CTARGET_OPT: i686-pc-linux-gnu * CC: x86_64-pc-linux-gnu-gcc -m32 * CXX: * LD: * ASFLAGS: * CFLAGS: -march=sandybridge -mtune=sandybridge -pipe -fdiagnostics-show-option -Wall -Wextra -Wstack-protector -g -O2 * CPPFLAGS: * CXXFLAGS: -march=sandybridge -mtune=sandybridge -pipe -fdiagnostics-show-option -Wall -Wextra -Wstack-protector -g -O2 * LDFLAGS: -Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu * MAKEINFO: /dev/null * Manual CC: x86_64-pc-linux-gnu-gcc -m32 -march=sandybridge -mtune=sandybridge -pipe -fdiagnostics-show-option -Wall -Wextra -Wstack-protector -g -O2 -Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu * Manual CXX: x86_64-pc-linux-gnu-g++ -m32 -march=sandybridge -mtune=sandybridge -pipe -fdiagnostics-show-option -Wall -Wextra -Wstack-protector -g -O2 /tmp/portage/sys-libs/glibc-9999/work/glibc-9999/configure --enable-stack-protector=all --enable-stackguard-randomization --disable-cet --enable-kernel=3.2.0 --without-selinux --without-cvs --disable-werror --enable-bind-now --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --disable-profile --without-gd --with-headers=/usr/include --prefix=/usr --sysconfdir=/etc --localstatedir=/var --libdir=$(prefix)/lib --mandir=$(prefix)/share/man --infodir=$(prefix)/share/info --libexecdir=$(libdir)/misc/glibc --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion=Gentoo 9999 p16 --enable-crypt --enable-static-pie --disable-systemtap --disable-nscd --disable-timezone-tools
I'm not sure what requirements for IFUNC resolver() there are. Should it always run after canary is initialized? Or should resolver function always be marked as exempt from stack protection with 'inhibit_stack_protector' or similar?
You should not use --enable-stack-protector=all, but --enable-stack-protector=strong. “all” is very unlikely to add additional protection. If you do that, the IFUNC resolver will not be instrumented (because it has no local addressable variables, so no stack buffer overflow is possible), so the bug goes away. I'm not sure if we can or should order the startup sequence for statically linked binaries differently, so I'm not closing this bug.
(In reply to Florian Weimer from comment #2) > You should not use --enable-stack-protector=all, but > --enable-stack-protector=strong. “all” is very unlikely to add additional > protection. If you do that, the IFUNC resolver will not be instrumented > (because it has no local addressable variables, so no stack buffer overflow > is possible), so the bug goes away. > > I'm not sure if we can or should order the startup sequence for statically > linked binaries differently, so I'm not closing this bug. Oh, that's surprising to hear. Gentoo will have to switch the default from =all to =strong then. Does it mean --enable-stack-protector=all is not really a supported option for glibc? If' it's not I can hack a patch that complains loudly about the option being experimental. My main worry is that resolver in ifuncmain9picstatic's source code only happens to be a part of glibc. It could have been an external program. Any external program might decide to enable -fstack-protector-all and will get non-working resolver as a result. Is there a doc that describes this and other limitations imposed on IFUNC resolves by glibc? I wonder what else we break by accident.
(In reply to Sergei Trofimovich from comment #3) > Does it mean --enable-stack-protector=all is not really a supported option > for glibc? If' it's not I can hack a patch that complains loudly about the > option being experimental. It is supported in the sense that it is there and you can build glibc with it, but some edge cases in the test suite have not been covered. (At this time, I'm not sure if this is a test bug or a bug in the static initialization sequence, though.) > My main worry is that resolver in ifuncmain9picstatic's source code only > happens to be a part of glibc. It could have been an external program. Any > external program might decide to enable -fstack-protector-all and will get > non-working resolver as a result. Is there a doc that describes this and > other limitations imposed on IFUNC resolves by glibc? I wonder what else we > break by accident. It's a valid concern. We probably have to reorder the initialization sequence because on some architecture (notably powerpc as of commit 67385a01d229751569b6aac067ffdcd813a15d7a ("powerpc: Add hwcap/hwcap2/platform data to TCB."), IFUNC resolvers are expected to access the TCB to choose the appropriate implementation.
I can confirm this. IFUNC resolvers should be marked with inhibit_stack_protector: or, at least, when I added --enable-stack-protector=all, I marked all then present in glibc that way (and nobody said it was a bad idea). I don't see any other way to do it without radical reorganization of ld.so, for more or less no gain. IFUNC resolvers have lots of obscure requirements on them anyway: this is just another one. I'll submit a patch fixing this in the next few days (probably next weekend, over Easter) -- though the patch is so trivial that anyone else who's already got a build-many-glibcs setup that doesn't need major surgery to revive it (unlike me) is welcome to come up with it earlier :) (taken.)
(btw, I note that --enable-stack-protector=all *has* spotted two bugs in glibc that no lower level has identified. Another way of putting this, of course, is that it's only spotted two -- and one of them would have been entirely harmless without the canary breaking things :) Personally, I run with =strong everywhere except for firewalls, which get =all.)
Created attachment 12663 [details] inhibit_stack_protector_patch Patch posted to libc-alpha ML and also attached here. I went with Nick's advice with inhibit_stack_protector like he done previously. Tests no longer failing.
The master branch has been updated by Siddhesh Poyarekar <siddhesh@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=03f42a56eb4e88601ebb334787c8198156197b29 commit 03f42a56eb4e88601ebb334787c8198156197b29 Author: David Hughes <davidhughes205@gmail.com> Date: Mon Mar 15 20:23:39 2021 +0530 Add inhibit_stack_protector to ifuncmain9 [BZ #25680] Enabling --enable-stack-protector=all causes the following tests to fail: FAIL: elf/ifuncmain9picstatic FAIL: elf/ifuncmain9static Nick Alcock (who committed the stack protector code) marked the IFUNC resolvers with inhibit_stack_protector when he done the original work and suggested doing so again @ BZ #25680. This patch adds inhibit_stack_protector to ifuncmain9. After patch is applied, --enable-stack-protector=all does not fail the above tests. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fix is in mainline, thanks for the patch Dave! I've also created bug 27582 because it is possible (but just a bit more work to vwrite and verify than I can put in at the moment) to make this test non-flaky with stack-protector-all.