Bug 31501 - _dl_tlsdesc_dynamic_xsavec may clobber %rbx
Summary: _dl_tlsdesc_dynamic_xsavec may clobber %rbx
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.40
Importance: P2 normal
Target Milestone: 2.40
Assignee: H.J. Lu
Reported: 2024-03-16 14:23 UTC by Florian Weimer
Modified: 2024-04-03 17:43 UTC
CC List: 4 users

Target: x86-64
Flags: fweimer: security-


Attachments
Dump xsave (2.03 KB, application/octet-stream)
2024-03-17 16:28 UTC, H.J. Lu

Description Florian Weimer 2024-03-16 14:23:18 UTC
I believe the stack realignment code in sysdeps/x86_64/dl-tlsdesc-dynamic.h can clobber the initial set of saved registers (including %rbx) if the stack alignment is unfortunate. The stack pointer is aligned down without first decrementing it, so the created stack area can partially overlap with the part of the red zone that was used for the initial register save.
Comment 1 Sam James 2024-03-17 04:00:51 UTC
Florian, do you mind if I ask how you noticed this? We haven't tested much with it yet (I've only done it locally for the last few months), and I'm wondering what to look out for.
Comment 2 Florian Weimer 2024-03-17 08:12:24 UTC
An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted in its LTO plugin crashing randomly, fortunately during the GCC build itself. The builder had AVX-512 support. We are not sure yet if the problem can happen with AVX2 only.
Comment 3 H.J. Lu 2024-03-17 10:39:47 UTC
(In reply to Florian Weimer from comment #2)
> An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted in its LTO plugin
> crashing randomly, fortunately during the GCC build itself. The builder had
> AVX-512 support. We are not sure yet if the problem can happen with AVX2
> only.

Intel AVX machines won't crash. But my testcase
may crash on AMD AVX machines. Can someone
try my testcase on AMD?
Comment 4 Sam James 2024-03-17 10:43:23 UTC
(In reply to Florian Weimer from comment #2)
> An attempt to rebuild GCC with -mtls-dialect=gnu2 resulted in its LTO plugin
> crashing randomly, fortunately during the GCC build itself. The builder had
> AVX-512 support. We are not sure yet if the problem can happen with AVX2
> only.

So far I have only tried it on znver2, but I have been daily-driving it since then. Thanks.

(In reply to H.J. Lu from comment #3) 
> Intel AVX machines won't crash. But my testcase
> may crash on AMD AVX machines. Can someone
> try my testcase on AMD?

OK.
Comment 5 Sam James 2024-03-17 10:57:40 UTC
I get PASS: elf/tst-gnu2-tls2-x86-64 with and without your change to sysdeps/x86/cpu-features.c on znver2.
Comment 6 H.J. Lu 2024-03-17 16:28:34 UTC
Created attachment 15411 [details]
Dump xsave

On an Intel AVX machine, the last 128 bytes (the LWP area) of the xsave buffer are unchanged:

[hjl@gnu-cfl-3 xsave-1]$ make
./x
xstate_header.xfeatures: 0xfffffffffffffff6
xmm0: 0xffffffffffffffffffffffffffffffff
xmm1: 0x00000000000000000000000000000000
xmm2: 0x00000000000000000000000000000000
xmm3: 0x00000000000000000000000000000000
xmm4: 0x00000000000000000000000000000000
xmm5: 0x00000000000000000000000000000000
xmm6: 0x00000000000000000000000000000000
xmm7: 0xffffffffffffffffffffffffffffffff
xmm8: 0x00000000000000000000000000000000
xmm9: 0x00000000000000000000000000000000
xmm10: 0x00000000000000000000000000000000
xmm11: 0x00000000000000000000000000000000
xmm12: 0x00000000000000000000000000000000
xmm13: 0x00000000000000000000000000000000
xmm14: 0x00000000000000000000000000000000
xmm15: 0xffffffffffffffffffffffffffffffff
ymm_h0: 0xffffffffffffffffffffffffffffffff
ymm_h1: 0x00000000000000000000000000000000
ymm_h2: 0x00000000000000000000000000000000
ymm_h3: 0x00000000000000000000000000000000
ymm_h4: 0x00000000000000000000000000000000
ymm_h5: 0x00000000000000000000000000000000
ymm_h6: 0x00000000000000000000000000000000
ymm_h7: 0xffffffffffffffffffffffffffffffff
ymm_h8: 0x00000000000000000000000000000000
ymm_h9: 0x00000000000000000000000000000000
ymm_h10: 0x00000000000000000000000000000000
ymm_h11: 0x00000000000000000000000000000000
ymm_h12: 0x00000000000000000000000000000000
ymm_h13: 0x00000000000000000000000000000000
ymm_h14: 0x00000000000000000000000000000000
ymm_h15: 0xffffffffffffffffffffffffffffffff
lwp: 0xffffffffffffffffffffffffffffffff
mpx0: 0x00000000000000000000000000000000
mpx1: 0x00000000000000000000000000000000
mpx2: 0x00000000000000000000000000000000
mpx3: 0x00000000000000000000000000000000
mpx4: 0xffffffffffffffffffffffffffffffff
mpx5: 0xffffffffffffffffffffffffffffffff
mpx6: 0xffffffffffffffffffffffffffffffff
mpx7: 0xffffffffffffffffffffffffffffffff
k0: 0xffffffffffffffff
k1: 0xffffffffffffffff
k2: 0xffffffffffffffff
k3: 0xffffffffffffffff
k4: 0xffffffffffffffff
k5: 0xffffffffffffffff
k6: 0xffffffffffffffff
k7: 0xffffffffffffffff
zmmh0: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh1: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh2: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh3: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh4: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh5: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh6: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh7: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh8: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh9: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh10: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh11: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh12: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh13: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh14: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh15: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh16: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh17: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh18: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh19: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh20: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh21: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh22: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh23: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh24: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh25: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh26: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh27: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh28: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh29: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh30: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
zmmh31: 0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
        0xffffffffffffffffffffffffffffffff0xffffffffffffffffffffffffffffffff
[hjl@gnu-cfl-3 xsave-1]$
Comment 7 Sourceware Commits 2024-03-19 02:46:16 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=717ebfa85c8240d32d0d19d86a484c31c55c9617

commit 717ebfa85c8240d32d0d19d86a484c31c55c9617
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 18 06:40:16 2024 -0700

    x86-64: Allocate state buffer space for RDI, RSI and RBX
    
    _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
    After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
    TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
    to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
    STATE_SAVE_OFFSET(%rsp).
    
       +==================+<- stack frame start aligned at 8 or 16 bytes
       |                  |<- RDI saved in the red zone
       |                  |<- RSI saved in the red zone
       |                  |<- RBX saved in the red zone
       |                  |<- paddings for stack realignment of 64 bytes
       |------------------|<- xsave buffer end aligned at 64 bytes
       |                  |<-
       |                  |<-
       |                  |<-
       |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- 8-byte padding for 64-byte alignment
       |                  |<- R11
       |                  |<- R10
       |                  |<- R9
       |                  |<- R8
       |                  |<- RDX
       |                  |<- RCX
       +==================+<- RSP aligned at 64 bytes
    
    Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
    for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
    and RBX are saved onto stack without adjusting stack pointer first, using
    the red-zone.  This fixes BZ #31501.
    Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Comment 8 H.J. Lu 2024-03-19 02:46:52 UTC
Fixed.
Comment 9 Carlos O'Donell 2024-04-02 13:17:21 UTC
Also needs:

commit fd7ee2e6c5eb49e4a630a9978b4d668bff6354ee
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue Mar 19 13:49:50 2024 +0100

    Add tst-gnu2-tls2mod1 to test-internal-extras
    
    That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers.
    
    Fixes: 717ebfa85c ("x86-64: Allocate state buffer space for RDI, RSI and RBX")
Comment 10 Sourceware Commits 2024-04-03 17:43:09 UTC
The release/2.39/master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=354cabcb2634abe16da7a2ba5e648aac1204b58e

commit 354cabcb2634abe16da7a2ba5e648aac1204b58e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 18 06:40:16 2024 -0700

    x86-64: Allocate state buffer space for RDI, RSI and RBX
    
    (cherry picked from commit 717ebfa85c8240d32d0d19d86a484c31c55c9617)