_dl_tlsdesc_dynamic only preserves integer registers, but not vector registers.
Created attachment 15358 [details] A patch I am testing this.
I think we need more discussion what the ABI should be like. The trampoline does not need to save argument registers.
(In reply to Florian Weimer from comment #2) > I think we need more discussion what the ABI should be like. > > The trampoline does not need to save argument registers. Or maybe it does. We should have official ABI clarification.
(In reply to Florian Weimer from comment #3) > (In reply to Florian Weimer from comment #2) > > I think we need more discussion what the ABI should be like. > > > > The trampoline does not need to save argument registers. > > Or maybe it does. We should have official ABI clarification. It is explicitly documented: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt --- The functions defined above use custom calling conventions that require them to preserve any registers they modify. This penalizes the case that requires dynamic TLS, since it must preserve (*) all call-clobbered registers before calling __tls_get_addr(), but it is optimized for the most common case of static TLS, and also for the case in which the code generated by the compiler can be relaxed by the linker to a more efficient access model: being able to assume no registers are clobbered by the call tends to improve register allocation. Also, the function that handles the dynamic TLS case will most often be able to avoid calling __tls_get_addr(), thus potentially avoiding the need for preserving registers. ---
Arm has the similar issue: https://github.com/zatrazz/glibc/commits/azanella/tls-descriptor-fixes-arm/
I feel like this emphasises that we need to look at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48868.
(In reply to Sam James from comment #6) > I feel like this emphasises that we need to look at > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48868. GCC testsuite isn't set up to support run-time tests with extra library dependencies. For now, we can add GNU2 TLS run-time tests to glibc. My glibc patch: https://patchwork.sourceware.org/project/glibc/list/?series=30995 includes one. So far, it has detected the GNU2 TLS bug on 3 arches.
(In reply to H.J. Lu from comment #7) > (In reply to Sam James from comment #6) > > I feel like this emphasises that we need to look at > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48868. > > GCC testsuite isn't set up to support run-time tests with extra library > dependencies. For now, we can add GNU2 TLS run-time tests to glibc. gcc.dg/torture/tls has some basic tests but only with a single TU. We could add -mtls-dialect=gnu2 to tls.exp in DG_TORTURE_OPTIONS but it would not gain us much. I hoped that there might be some for IPA or LTO but looks like no. > My glibc patch: > [...] > includes one. So far, it has detected the GNU2 TLS bug on 3 arches. Ah, thanks, I hadn't looked at it. I guess this with nsz's previous tests should give us OK coverage for now.
(In reply to Sam James from comment #8) > > My glibc patch: > > [...] > > includes one. So far, it has detected the GNU2 TLS bug on 3 arches. > > Ah, thanks, I hadn't looked at it. I guess this with nsz's previous tests > should give us OK coverage for now. Each arch should define PREPARE_MALLOC to clobber more caller-saved registers when calling malloc so that we can improve coverage on each arch.
The master branch has been updated by H.J. Lu <hjl@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0aac205a814a8511e98d02b91a8dc908f1c53cde commit 0aac205a814a8511e98d02b91a8dc908f1c53cde Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Feb 26 06:37:03 2024 -0800 x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers Compiler generates the following instruction sequence for GNU2 dynamic TLS access: leaq tls_var@TLSDESC(%rip), %rax call *tls_var@TLSCALL(%rax) or leal tls_var@TLSDESC(%ebx), %eax call *tls_var@TLSCALL(%eax) CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS and RAX/EAX, are unchanged after CALL. When _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow path. __tls_get_addr is a normal function which doesn't preserve any caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer caller-saved registers, but didn't preserve any other caller-saved registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all caller-saved registers. This fixes BZ #31372. Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) to optimize elf_machine_runtime_setup. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The master branch has been updated by H.J. Lu <hjl@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9b7091415af47082664717210ac49d51551456ab commit 9b7091415af47082664717210ac49d51551456ab Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Feb 28 12:08:03 2024 -0800 x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers _dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
These two other issues are needed and should be noted as needed: commit fd7ee2e6c5eb49e4a630a9978b4d668bff6354ee Author: Andreas Schwab <schwab@suse.de> Date: Tue Mar 19 13:49:50 2024 +0100 Add tst-gnu2-tls2mod1 to test-internal-extras That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers. Fixes: 717ebfa85c ("x86-64: Allocate state buffer space for RDI, RSI and RBX") commit 717ebfa85c8240d32d0d19d86a484c31c55c9617 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Mar 18 06:40:16 2024 -0700 x86-64: Allocate state buffer space for RDI, RSI and RBX _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack. After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to STATE_SAVE_OFFSET(%rsp). +==================+<- stack frame start aligned at 8 or 16 bytes | |<- RDI saved in the red zone | |<- RSI saved in the red zone | |<- RBX saved in the red zone | |<- paddings for stack realignment of 64 bytes |------------------|<- xsave buffer end aligned at 64 bytes | |<- | |<- | |<- |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp) | |<- 8-byte padding for 64-byte alignment | |<- 8-byte padding for 64-byte alignment | |<- R11 | |<- R10 | |<- R9 | |<- R8 | |<- RDX | |<- RCX +==================+<- RSP aligned at 64 bytes Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI and RBX are saved onto stack without adjusting stack pointer first, using the red-zone. This fixes BZ #31501. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
The release/2.39/master branch has been updated by H.J. Lu <hjl@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a364304718725a31ab141936322855c76c73e35e commit a364304718725a31ab141936322855c76c73e35e Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Feb 26 06:37:03 2024 -0800 x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers Compiler generates the following instruction sequence for GNU2 dynamic TLS access: leaq tls_var@TLSDESC(%rip), %rax call *tls_var@TLSCALL(%rax) or leal tls_var@TLSDESC(%ebx), %eax call *tls_var@TLSCALL(%eax) CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS and RAX/EAX, are unchanged after CALL. When _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow path. __tls_get_addr is a normal function which doesn't preserve any caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer caller-saved registers, but didn't preserve any other caller-saved registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all caller-saved registers. This fixes BZ #31372. Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) to optimize elf_machine_runtime_setup. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit 0aac205a814a8511e98d02b91a8dc908f1c53cde)
The release/2.39/master branch has been updated by H.J. Lu <hjl@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=853e915fdd6ae6c5f1a7a68d2594ec8dbfef1286 commit 853e915fdd6ae6c5f1a7a68d2594ec8dbfef1286 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Feb 28 12:08:03 2024 -0800 x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers _dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com> (cherry picked from commit 9b7091415af47082664717210ac49d51551456ab)
Fixed on 2.40. The x86 fixes are 0aac205a814a8511e98d02b91a8dc908f1c53cde, 9b7091415af47082664717210ac49d51551456ab, and 717ebfa85c8240d32d0d19d86a484c31c55c9617; while arm32 is fixed by 64c7e344289ed085517c2227d8e3b06388242c13 (along with other commits for the test infrastructure). The aarch64 implementation already handle the required caller-saved registers.