% cat a.c int main() {} % gcc -fuse-ld=bfd -mstack-protector-guard=global -fstack-protector-all a.c ... /usr/bin/ld.bfd: a.c:(.text+0x21): undefined reference to `__stack_chk_guard' (GCC source code let either libssp or libc to provide __stack_chk_guard. On many glibc platforms, TARGET_LIBC_PROVIDES_SSP is set.) In glibc/csu/libc-start.c, either -mstack-protector-guard=global or -mstack-protector-guard=tls is supported, but not both. On x86-64, =tls is the supported variant. However, there is no fundamental reason that the two variants cannot coexist. /* Set up the stack checker's canary. */ uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random); # ifdef THREAD_SET_STACK_GUARD THREAD_SET_STACK_GUARD (stack_chk_guard); # else __stack_chk_guard = stack_chk_guard; # endif There are some arguments that -mstack-protector-guard=global may be better: (1) Performance. On x86-64 -fno-pic and -fpie, gcc emits R_X86_64_PC32 referencing the external __stack_chk_guard. The code sequence is shorter than a TP relative access. (-fpie not using GOT was an unfortunate GCC 5 change). On -fpic =global requires a GOT indirection, which might be slower than a TP relative access. (2) Security. For a new thread, nptl/allocatestack.c allocates a memory region, places the static TLS block at the end, and uses the remaining space as the thread stack. The stack pointer may be just a few hundred bytes below the canary address (%fs:0x28 on x86-64) and a large out-of-bounds write can potentially override it. #include <stdio.h> #include <stdlib.h> #include <pthread.h> void *foo(void *_) { char a[0x100]; for (int i = 0; i < LEN; i++) a[i] = 0x41; return 0; } int main() { pthread_t pid; pthread_create(&pid, 0, foo, 0); pthread_join(pid, 0); } % gcc -fstack-protector-all a.c -pthread -D LEN=0x200 && ./a.out *** stack smashing detected ***: terminated [1] 2672403 abort ./a.out # %fs:0x28 has been overwritten, so there is no 'stack smashing detected'. % gcc -fstack-protector-all a.c -pthread -D LEN=0x1000 && ./a.out [1] 2672478 segmentation fault ./a.out -mstack-protector-guard=global does not have the problem. nsz mentioned: the kernel prefers tls because it can have different canary per thread. in userspace that would not work because some code may context switch to a different thread and check the wrong canary.
Note, -mstack-protector-guard=tls can mitigate the problem by inserting a guard page between the stack and the static TLS block, but that can come with a large overhead.
(In reply to Fangrui Song from comment #0) > (2) Security. > For a new thread, nptl/allocatestack.c allocates a memory region, places the > static TLS block at the end, and uses the remaining space as the thread > stack. > The stack pointer may be just a few hundred bytes below the canary address > (%fs:0x28 on x86-64) and a large out-of-bounds write can potentially > override it. We need to fix: Bug 11787 - Program with large TLS segments fail. https://sourceware.org/bugzilla/show_bug.cgi?id=11787 The static TLS should not be part of the stack accounting and when we split them out there won't be a problem with the stack being beside static TLS.