I believe the stack realignment code in sysdeps/x86_64/dl-tlsdesc-dynamic.h can clobber the initial set of saved registers (including %rbx) if the stack alignment is unfortunate. The stack pointer is aligned down without first decrementing it, so the created stack area can partially overlap with the part of the red zone that was used for the initial register save.
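To make the overlap concrete, here is a minimal sketch that models the frame arithmetic described above. It is not the glibc code. The constants, the slot order (RDI at -8, RSI at -16, RBX at -24 from the entry stack pointer) and the xsave size of 2688 bytes are assumptions chosen for illustration; `clobbered_registers` is a hypothetical helper.

```python
ALIGN = 64              # alignment required for the xsave buffer
STATE_SAVE_OFFSET = 64  # assumed: eight 8-byte slots below the xsave buffer


def clobbered_registers(entry_rsp, xsave_size):
    """Return the red-zone saved registers overwritten by the xsave buffer."""
    # Slots written below the entry RSP before realignment (order assumed).
    saved = {"rdi": entry_rsp - 8, "rsi": entry_rsp - 16, "rbx": entry_rsp - 24}
    # The frame is created by aligning RSP down and only then subtracting
    # the rounded-up state size; RSP is not decremented first.
    alloc = (xsave_size + STATE_SAVE_OFFSET + ALIGN - 1) & -ALIGN
    new_rsp = (entry_rsp & -ALIGN) - alloc
    start = new_rsp + STATE_SAVE_OFFSET
    end = start + xsave_size
    # An 8-byte slot is clobbered if it intersects [start, end).
    return sorted(r for r, addr in saved.items() if addr < end and addr + 8 > start)


# Entry RSP 8 bytes above a 64-byte boundary: the "unfortunate" case.
print(clobbered_registers(0x7FFD00001008, 2688))
# Entry RSP 24 bytes above a 64-byte boundary: all three slots survive.
print(clobbered_registers(0x7FFD00001018, 2688))
```

In this model the first call reports RBX and RSI as overwritten, while the second reports nothing, which is why the failure depends on the incoming stack alignment.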
Florian, do you mind if I ask how you noticed this? We haven't tested it much yet (I've only run it locally for the last few months), and I'm wondering what to look out for.
An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted in its LTO plugin crashing randomly, fortunately during the GCC build itself. The builder had AVX-512 support. We are not sure yet whether the problem can happen with AVX2 only.
(In reply to Florian Weimer from comment #2)
> An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted its LTO plugin
> crashing randomly, fortunately during the GCC build itself. The builder had
> AVX-512 support. We are not sure yet if the problem can happen with AVX2
> only.

Intel AVX machines won't crash. But my testcase may crash on AMD AVX machines. Can someone try my testcase on AMD?
(In reply to Florian Weimer from comment #2)
> An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted its LTO plugin
> crashing randomly, fortunately during the GCC build itself. The builder had
> AVX-512 support. We are not sure yet if the problem can happen with AVX2
> only.

I've only tried it on znver2 so far, but I've been daily driving it since then. Thanks.

(In reply to H.J. Lu from comment #3)
> Intel AVX machines won't crash. But my testcase
> may crash on AMD AVX machines. Can someone
> try my testcase on AMD?

OK.
I get PASS: elf/tst-gnu2-tls2-x86-64 with+without your change to sysdeps/x86/cpu-features.c on znver2.
Created attachment 15411
Dump xsave

On an Intel AVX machine, the last 128 bytes (LWP area) of the xsave buffer are unchanged:

[hjl@gnu-cfl-3 xsave-1]$ make
./x
xstate_header.xfeatures: 0xfffffffffffffff6
xmm0: 0xffffffffffffffffffffffffffffffff
xmm1: 0x00000000000000000000000000000000
xmm2: 0x00000000000000000000000000000000
xmm3: 0x00000000000000000000000000000000
xmm4: 0x00000000000000000000000000000000
xmm5: 0x00000000000000000000000000000000
xmm6: 0x00000000000000000000000000000000
xmm7: 0xffffffffffffffffffffffffffffffff
xmm8: 0x00000000000000000000000000000000
xmm9: 0x00000000000000000000000000000000
xmm10: 0x00000000000000000000000000000000
xmm11: 0x00000000000000000000000000000000
xmm12: 0x00000000000000000000000000000000
xmm13: 0x00000000000000000000000000000000
xmm14: 0x00000000000000000000000000000000
xmm15: 0xffffffffffffffffffffffffffffffff
ymm_h0: 0xffffffffffffffffffffffffffffffff
ymm_h1: 0x00000000000000000000000000000000
ymm_h2: 0x00000000000000000000000000000000
ymm_h3: 0x00000000000000000000000000000000
ymm_h4: 0x00000000000000000000000000000000
ymm_h5: 0x00000000000000000000000000000000
ymm_h6: 0x00000000000000000000000000000000
ymm_h7: 0xffffffffffffffffffffffffffffffff
ymm_h8: 0x00000000000000000000000000000000
ymm_h9: 0x00000000000000000000000000000000
ymm_h10: 0x00000000000000000000000000000000
ymm_h11: 0x00000000000000000000000000000000
ymm_h12: 0x00000000000000000000000000000000
ymm_h13: 0x00000000000000000000000000000000
ymm_h14: 0x00000000000000000000000000000000
ymm_h15: 0xffffffffffffffffffffffffffffffff
lwp: 0xffffffffffffffffffffffffffffffff
mpx0: 0x00000000000000000000000000000000
mpx1: 0x00000000000000000000000000000000
mpx2: 0x00000000000000000000000000000000
mpx3: 0x00000000000000000000000000000000
mpx4: 0xffffffffffffffffffffffffffffffff
mpx5: 0xffffffffffffffffffffffffffffffff
mpx6: 0xffffffffffffffffffffffffffffffff
mpx7: 0xffffffffffffffffffffffffffffffff
k0: 0xffffffffffffffff
k1: 0xffffffffffffffff
k2: 0xffffffffffffffff
k3: 0xffffffffffffffff
k4: 0xffffffffffffffff
k5: 0xffffffffffffffff
k6: 0xffffffffffffffff
k7: 0xffffffffffffffff
zmmh0: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh1: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh2: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh3: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh4: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh5: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh6: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh7: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh8: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh9: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh10: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh11: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh12: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh13: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh14: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh15: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh16: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh17: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh18: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh19: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh20: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh21: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh22: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh23: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh24: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh25: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh26: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh27: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh28: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh29: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh30: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh31: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
[hjl@gnu-cfl-3 xsave-1]$
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=717ebfa85c8240d32d0d19d86a484c31c55c9617

commit 717ebfa85c8240d32d0d19d86a484c31c55c9617
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 18 06:40:16 2024 -0700

    x86-64: Allocate state buffer space for RDI, RSI and RBX

    _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
    After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
    TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
    to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
    STATE_SAVE_OFFSET(%rsp).

       +==================+<- stack frame start aligned at 8 or 16 bytes
       |                  |<- RDI saved in the red zone
       |                  |<- RSI saved in the red zone
       |                  |<- RBX saved in the red zone
       |                  |<- paddings for stack realignment of 64 bytes
       |------------------|<- xsave buffer end aligned at 64 bytes
       |                  |<-
       |                  |<-
       |                  |<-
       |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- R11
       |                  |<- R10
       |                  |<- R9
       |                  |<- R8
       |                  |<- RDX
       |                  |<- RCX
       +==================+<- RSP aligned at 64 bytes

    Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
    for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
    and RBX are saved onto stack without adjusting stack pointer first, using
    the red-zone.

    This fixes BZ #31501.
    Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
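The sizing argument in the commit message can be checked by brute force. The sketch below is a simplified model, not the glibc implementation: it assumes STATE_SAVE_OFFSET is 64 bytes (six registers plus two padding slots, as in the diagram), that the allocation is the xsave size plus the register save area rounded up to 64 bytes, and that xsave writes at STATE_SAVE_OFFSET(%rsp). The helpers `gap` and `worst_gap`, the base address, and the range of xsave sizes are hypothetical.

```python
ALIGN = 64               # alignment required for the xsave buffer
STATE_SAVE_OFFSET = 64   # assumed from the diagram: 6 registers + 2 padding slots
RED_ZONE_SAVE = 24       # RDI, RSI and RBX, 8 bytes each
TLSDESC_CALL_REGISTER_SAVE_AREA = STATE_SAVE_OFFSET + RED_ZONE_SAVE


def gap(entry_rsp, xsave_size, save_area):
    """Bytes between the xsave buffer end and the lowest red-zone slot.

    A negative result means the buffer overlaps the saved registers.
    """
    alloc = (xsave_size + save_area + ALIGN - 1) & -ALIGN
    new_rsp = (entry_rsp & -ALIGN) - alloc
    xsave_end = new_rsp + STATE_SAVE_OFFSET + xsave_size
    return (entry_rsp - RED_ZONE_SAVE) - xsave_end


def worst_gap(save_area):
    """Smallest gap over every 8-byte-aligned entry RSP and a range of sizes."""
    base = 0x7FFD00000000  # hypothetical 64-byte-aligned stack address
    return min(
        gap(base + off, size, save_area)
        for off in range(0, ALIGN, 8)
        for size in range(576, 8193, 8)  # hypothetical xsave sizes
    )


print(worst_gap(STATE_SAVE_OFFSET))                # old sizing
print(worst_gap(TLSDESC_CALL_REGISTER_SAVE_AREA))  # sizing with the extra 24 bytes
```

Under these assumptions the old sizing has a worst-case gap of -24, meaning all three slots can be overlapped, while the new sizing never goes below 0. The model deliberately checks every 8-byte-aligned entry stack pointer, which is more conservative than what a conforming caller would produce.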
Fixed.
Also needs:

commit fd7ee2e6c5eb49e4a630a9978b4d668bff6354ee
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue Mar 19 13:49:50 2024 +0100

    Add tst-gnu2-tls2mod1 to test-internal-extras

    That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers.

    Fixes: 717ebfa85c ("x86-64: Allocate state buffer space for RDI, RSI and RBX")
The release/2.39/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=354cabcb2634abe16da7a2ba5e648aac1204b58e

commit 354cabcb2634abe16da7a2ba5e648aac1204b58e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 18 06:40:16 2024 -0700

    x86-64: Allocate state buffer space for RDI, RSI and RBX

    _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
    After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
    TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
    to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
    STATE_SAVE_OFFSET(%rsp).

       +==================+<- stack frame start aligned at 8 or 16 bytes
       |                  |<- RDI saved in the red zone
       |                  |<- RSI saved in the red zone
       |                  |<- RBX saved in the red zone
       |                  |<- paddings for stack realignment of 64 bytes
       |------------------|<- xsave buffer end aligned at 64 bytes
       |                  |<-
       |                  |<-
       |                  |<-
       |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- R11
       |                  |<- R10
       |                  |<- R9
       |                  |<- R8
       |                  |<- RDX
       |                  |<- RCX
       +==================+<- RSP aligned at 64 bytes

    Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
    for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
    and RBX are saved onto stack without adjusting stack pointer first, using
    the red-zone.

    This fixes BZ #31501.
    Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

    (cherry picked from commit 717ebfa85c8240d32d0d19d86a484c31c55c9617)