[PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
H.J. Lu
hjl.tools@gmail.com
Sun Mar 17 03:14:25 GMT 2024
On Sat, Mar 16, 2024 at 6:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * H. J. Lu:
> > > >
> > > > > Please verify if this is the right testcase.
> > > >
> > > > Test case works (fails without my fix, succeeds with my fix). Some
> > > > comments below.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > new file mode 100644
> > > > > index 0000000000..8129b28061
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > @@ -0,0 +1,57 @@
> > > >
> > > > > + .text
> > > > > + .p2align 4
> > > > > + .globl apply_tls
> > > > > + .type apply_tls, @function
> > > > > +apply_tls:
> > > > > + .cfi_startproc
> > > >
> > > > Missing CET marker.
> > > >
> > > > > + subq $24, %rsp
> > > > > + .cfi_def_cfa_offset 32
> > > > > + movdqu (%rdi), %xmm0
> > > > > + movq %fs:40, %rax
> > > > > + movq %rax, 8(%rsp)
> > > > > + xorl %eax, %eax
> > > > > + leaq tls_var0@TLSDESC(%rip), %rax
> > > > > + call *tls_var0@TLSCALL(%rax)
> > > > > + addq %fs:0, %rax
> > > > > + movups %xmm0, (%rax)
> > > > > + movdqu 16(%rdi), %xmm1
> > > > > + movups %xmm1, 16(%rax)
> > > > > + movq 8(%rsp), %rdx
> > > > > + subq %fs:40, %rdx
> > > > > + jne .L5
> > > > > + addq $24, %rsp
> > > > > + .cfi_remember_state
> > > > > + .cfi_def_cfa_offset 8
> > > > > + ret
> > > > > +.L5:
> > > > > + .cfi_restore_state
> > > > > + call __stack_chk_fail@PLT
> > > >
> > > > Not sure if we need this?
> > > >
> > > > Maybe add some comment what exactly this subtest is exercising?
> > > >
> > > > These are present in the other TLS modules as well.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > > new file mode 100644
> > > > > index 0000000000..af4b7ca761
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > >
> > > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > > + clobber %rbx. */
> > > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > > +
> > > > > + .text
> > > > > + .p2align 4
> > > > > + .globl apply_tls
> > > > > + .type apply_tls, @function
> > > > > +apply_tls:
> > > > > + .cfi_startproc
> > > > > + pushq %rbp
> > > > > + .cfi_def_cfa_offset 16
> > > > > + .cfi_offset 6, -16
> > > > > + movq %rsp, %rbp
> > > > > + .cfi_def_cfa_register 6
> > > > > + /* Align stack to 64 bytes. */
> > > > > + andq $-64, %rsp
> > > > > + pushq %rbx
> > > > > + subq $OFFSET, %rsp
> > > >
> > > > The offset could be loaded from a global variable or something like
> > > > that. We should exercise a wide range of stack alignments—the
> > > > individual tests are cheap. And maybe check extra registers.
> > >
> > > I will clean it up with a different fix.
> > >
> >
> > I submitted a patch with a testcase:
> >
> > https://patchwork.sourceware.org/project/glibc/list/?series=31963
> >
> > My patch allocates 64 more bytes so that xsave does not clobber
> > the RDI, RSI and RBX values saved on the stack. It also avoids 2
> > stack adjustments. Either my fix or Florian's fix should resolve
> > the issue. I don't have a strong preference as long as my testcase
> > is included.
> >
> >
> I think my testcase may fail on AMD AVX CPUs without the
> fix. On Intel AVX CPUs, the state size is 960 bytes. But the
> last 128 bytes may be unused.
>
I sent out the v2 patch:
https://patchwork.sourceware.org/project/glibc/list/?series=31966
to simplify the testcase.
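
For readers following the thread, here is a rough sketch of why the
caller's stack alignment matters and why 64 extra bytes are enough.
It is a toy model, not the glibc code: the layout (three 8-byte
register slots sitting directly below the stack pointer, a save area
rounded down to 64 bytes) and all names in it are assumptions made
for illustration. Only the 64-byte figures and the 960-byte state
size come from the messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the layout itself is an assumption.  */
enum
{
  SAVE_ALIGN = 64,   /* alignment of the save area */
  SAVED_BYTES = 24,  /* three 8-byte slots: RDI, RSI, RBX */
  STATE_SIZE = 960   /* state size quoted for Intel AVX CPUs */
};

/* Exclusive top of the save area: skip EXTRA bytes below RSP, then
   round down to SAVE_ALIGN.  */
static uintptr_t
save_area_top (uintptr_t rsp, uintptr_t extra)
{
  return (rsp - extra) & ~(uintptr_t) (SAVE_ALIGN - 1);
}

/* True if [top - STATE_SIZE, top) overlaps the assumed register
   slots at [rsp - SAVED_BYTES, rsp).  */
static bool
clobbers_saved_slots (uintptr_t rsp, uintptr_t extra)
{
  uintptr_t top = save_area_top (rsp, extra);
  uintptr_t bottom = top - STATE_SIZE;
  uintptr_t slots_lo = rsp - SAVED_BYTES;
  return bottom < rsp && top > slots_lo;
}

/* Sweep every 8-byte-aligned stack pointer residue modulo SAVE_ALIGN
   and count how many of them lead to an overlap.  */
static int
count_clobbering_residues (uintptr_t extra)
{
  const uintptr_t base = 0x7ffd0000; /* arbitrary aligned address */
  int n = 0;

  assert ((base & (SAVE_ALIGN - 1)) == 0);
  for (uintptr_t r = 0; r < SAVE_ALIGN; r += 8)
    if (clobbers_saved_slots (base + r, extra))
      n++;
  return n;
}
```

In this model, count_clobbering_residues (0) finds overlaps only for
some stack pointer residues, which is why a test has to try more
than one alignment, and count_clobbering_residues (64) finds none.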
--
H.J.