Bug 12113

Summary: Segmentation fault in dynamic loader on AVX enabled OS and CPU with AVX
Product: glibc Reporter: Vitaly Slobodskoy <slvital>
Component: libcAssignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED FIXED    
Severity: critical CC: hjl.tools, jakub
Priority: P2 Flags: fweimer: security-
Version: 2.12   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2010-10-13 10:25:47
Attachments: small reproducer of this problem
A patch
An updated patch

Description Vitaly Slobodskoy 2010-10-12 16:05:10 UTC
Created attachment 5053 [details]
small reproducer of this problem

Small example is attached: unpack, run “make”, then run “./exe” -> segmentation fault (won’t segfault on non-AVX OS or CPU)

Stack of crash:
#0  _dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189
#1  0x0000003dc380a7dd in add_dependency (flags=5, map=0x7ffff0000910, undef_map=0x7ffff0000e70) at dl-lookup.c:613
#2  _dl_lookup_symbol_x (flags=5, map=0x7ffff0000910, undef_map=0x7ffff0000e70) at dl-lookup.c:816
#3  0x0000003dc380dbb0 in _dl_fixup (l=0x0, reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:118
#4  0x0000003dc3814315 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41
#5  0x00007ffff71cf5dc in hello2 () from ./libso2.so
#6  0x00007ffff73d0636 in hello1 () from ./libso1.so
#7  0x000000000040075e in doTask ()
#8  0x0000003dc4406a3a in start_thread (arg=0x7ffff7fd1710) at pthread_create.c:297
#9  0x0000003dc3cde77d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()

Short description:

There is AVX support in glibc 2.11. Function _dl_x86_64_save_sse has been updated with saving AVX registers. In fact, first AVX instruction in this code crashes being called from additional thread. To cause calling _dl_x86_64_save_sse function it is needed to have two DSOs loaded via dlopen with RTLD_LAZY having two global symbols with identical names. Then it is needed to call this function from both DSOs, second call will cause calling _dl_x86_64_save_sse.
Comment 1 Ulrich Drepper 2010-10-12 16:20:21 UTC
You're using obsolete code.  Check with the current git version or file a problem with your distribution provider.  There have been changes which probably affect this behavior.
Comment 2 Vitaly Slobodskoy 2010-10-12 18:18:32 UTC
The same happens for attached reproducer on FC13 (glibc-2.12-2).
Unfortunately I don't know how to check this on current git version..
Comment 3 Ulrich Drepper 2010-10-12 19:06:14 UTC
That's not recent either.  Report this to the Fedora people if you cannot reproduce the current code.  I don't have access to an AVX machine and cannot work on this.
Comment 4 Vitaly Slobodskoy 2010-10-12 19:22:23 UTC
Just a status for glibc 2.12.1-2:

#0  _dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189
#1  0x00007ffff7de8aea in add_dependency (undef_name=0x7ffff683c36f "hello",
    undef_map=0x7ffff0000e70, ref=0x7ffff763dde8, symbol_scope=0x7ffff00011c8,
    version=0x0, type_class=1, flags=5, skip_map=0x0) at dl-lookup.c:628
#2  _dl_lookup_symbol_x (undef_name=0x7ffff683c36f "hello",
    undef_map=0x7ffff0000e70, ref=0x7ffff763dde8, symbol_scope=0x7ffff00011c8,
    version=0x0, type_class=1, flags=5, skip_map=0x0) at dl-lookup.c:831
#3  0x00007ffff7debee0 in _dl_fixup (l=<value optimized out>,
    reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:118
#4  0x00007ffff7df2795 in _dl_runtime_resolve ()
    at ../sysdeps/x86_64/dl-trampoline.S:41
#5  0x00007ffff683c5dc in hello2 () from ./libso2.so
#6  0x00007ffff6a3d636 in hello1 () from ./libso1.so
#7  0x000000000040075e in doTask ()
#8  0x00007ffff79c5761 in start_thread (arg=0x7ffff763e710)
    at pthread_create.c:301
#9  0x00007ffff772051d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Comment 5 H.J. Lu 2010-10-13 10:25:47 UTC
It still happens with glibc trunk as of commit b833d51fbbf78b38c6ff68074c22d3fe3ddd0ce3.
Comment 6 H.J. Lu 2010-10-13 11:06:46 UTC
nptl/sysdeps/x86_64/tls.h has

# if __WORDSIZE == 64
  long int __unused2;
  /* Have space for the post-AVX register size.  */
  __m128 rtld_savespace_sse[8][4];

  void *__padding[8];
# endif
} tcbhead_t;

sysdeps/x86_64/dl-trampoline.S has

        vmovdqa %ymm0, %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE
        vmovdqa %ymm1, %fs:RTLD_SAVESPACE_SSE+1*YMM_SIZE
        vmovdqa %ymm2, %fs:RTLD_SAVESPACE_SSE+2*YMM_SIZE
        vmovdqa %ymm3, %fs:RTLD_SAVESPACE_SSE+3*YMM_SIZE
        vmovdqa %ymm4, %fs:RTLD_SAVESPACE_SSE+4*YMM_SIZE
        vmovdqa %ymm5, %fs:RTLD_SAVESPACE_SSE+5*YMM_SIZE
        vmovdqa %ymm6, %fs:RTLD_SAVESPACE_SSE+6*YMM_SIZE
        vmovdqa %ymm7, %fs:RTLD_SAVESPACE_SSE+7*YMM_SIZE
...
        vmovdqa %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE, %ymm0
        vmovdqa %fs:RTLD_SAVESPACE_SSE+1*YMM_SIZE, %ymm1
        vmovdqa %fs:RTLD_SAVESPACE_SSE+2*YMM_SIZE, %ymm2
        vmovdqa %fs:RTLD_SAVESPACE_SSE+3*YMM_SIZE, %ymm3
        vmovdqa %fs:RTLD_SAVESPACE_SSE+4*YMM_SIZE, %ymm4
        vmovdqa %fs:RTLD_SAVESPACE_SSE+5*YMM_SIZE, %ymm5
        vmovdqa %fs:RTLD_SAVESPACE_SSE+6*YMM_SIZE, %ymm6
        vmovdqa %fs:RTLD_SAVESPACE_SSE+7*YMM_SIZE, %ymm7

But rtld_savespace_sse may not be aligned at 32byte.
Comment 7 H.J. Lu 2010-10-13 12:09:06 UTC
Created attachment 5055 [details]
A patch

A patch align rtld_savespace_sse in tcbhead_t to maximum register
size.
Comment 8 Ulrich Drepper 2010-10-13 13:35:54 UTC
The aligning should happen explicitly.  There is no need for the preceding __unused field if this doesn't align the following SSE storage.  We need to add more __unused data so that we can, perhaps, reuse those fields in future for other things.


About the new test: of course it's good to have tests.  I just don't see where this test differs from other tests.  We of course already have tests which do dynamic loading.  What is this test adding?
Comment 9 H.J. Lu 2010-10-13 13:50:21 UTC
(In reply to comment #8)
> The aligning should happen explicitly.  There is no need for the preceding
> __unused field if this doesn't align the following SSE storage.  We need to add
> more __unused data so that we can, perhaps, reuse those fields in future for
> other things.

Offset is good for up to 128byte alignment:

tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128

But tcbhead_t is only aligned at 16byte. No matter
how much padding you add, rtld_savespace_sse can only
be guaranteed to be aligned at alignment of tcbhead_t.

> 
> About the new test: of course it's good to have tests.  I just don't see where
> this test differs from other tests.  We of course already have tests which do
> dynamic loading.  What is this test adding?

This test happens to make tcbhead_t only aligned at
16byte, not 32byte. My patch aligns tcbhead_t at maximum
register size so that rtld_savespace_sse is properly aligned.
Comment 10 Jakub Jelinek 2010-10-13 16:38:20 UTC
Isn't TLS_TCB_ALIGN 32 though?  That should be used when aligning the TLS block and thus tcbhead_t too.
Comment 11 H.J. Lu 2010-10-13 16:57:36 UTC
(In reply to comment #10)
> Isn't TLS_TCB_ALIGN 32 though?  That should be used when aligning the TLS block
> and thus tcbhead_t too.

It is

sysdeps/x86_64/tls.h:# define TLS_INIT_TCB_ALIGN __alignof__ (struct pthread)
Comment 12 Jakub Jelinek 2010-10-13 17:23:29 UTC
So shouldn't the fix be just that TLS_INIT_TCB_ALIGN is defined to 32 as well?
Comment 13 Ulrich Drepper 2010-10-13 22:29:56 UTC
(In reply to comment #9)
> Offset is good for up to 128byte alignment:
> 
> tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128
> 
> But tcbhead_t is only aligned at 16byte. No matter
> how much padding you add, rtld_savespace_sse can only
> be guaranteed to be aligned at alignment of tcbhead_t.

I don't doubt that the patch does the job.

I don't like aligning the struct member.  The struct itself (tcbhead_t) should get the alignment element.  Otherwise we might silently introduce holes in the structure which are glossed over by the alignment of the struct element.


> This test happens to make tcbhead_t only aligned at
> 16byte, not 32byte. My patch aligns tcbhead_t at maximum
> register size so that rtld_savespace_sse is properly aligned.

And what is it about this test that does it?  There seems to be nothing special.  Any alignment introduced by this is incidental, based on the current implementation having specific sizes.  I do not think it is worthwhile adding the patch in this form.
Comment 14 H.J. Lu 2010-10-13 23:22:42 UTC
Created attachment 5056 [details]
An updated patch

descr.h has

struct pthread
{
  union
  {
#if !TLS_DTV_AT_TP
    /* This overlaps the TCB as used for TLS without threads (see tls.h).  */
    tcbhead_t header;
#else
...
} __attribute ((aligned (TCB_ALIGNMENT)));

We just need to define TCB_ALIGNMENT to 32byte.
Comment 15 H.J. Lu 2010-10-13 23:24:42 UTC
(In reply to comment #12)
> So shouldn't the fix be just that TLS_INIT_TCB_ALIGN is defined to 32 as well?

Not enough due to

descr.h has

struct pthread
{
  union
  {
#if !TLS_DTV_AT_TP
    /* This overlaps the TCB as used for TLS without threads (see tls.h).  */
    tcbhead_t header;
#else
...
} __attribute ((aligned (TCB_ALIGNMENT)));

#define TCB_ALIGNMENT		16

If tcbhead_t isn't aligned at >= 32byte, struct pthread will be
aligned at 16byte,
Comment 16 H.J. Lu 2010-10-13 23:26:30 UTC
(In reply to comment #13)
> (In reply to comment #9)
> > Offset is good for up to 128byte alignment:
> > 
> > tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128
> > 
> > But tcbhead_t is only aligned at 16byte. No matter
> > how much padding you add, rtld_savespace_sse can only
> > be guaranteed to be aligned at alignment of tcbhead_t.
> 
> I don't doubt that the patch does the job.
> 
> I don't like aligning the struct member.  The struct itself (tcbhead_t) should
> get the alignment element.  Otherwise we might silently introduce holes in the
> structure which are glossed over by the alignment of the struct element.
>

I posted an updated patch to increase TCB_ALIGNMENT to 32.

> 
> > This test happens to make tcbhead_t only aligned at
> > 16byte, not 32byte. My patch aligns tcbhead_t at maximum
> > register size so that rtld_savespace_sse is properly aligned.
> 
> And what is it about this test that does it?  There seems to be nothing
> special.  Any alignment introduced by this is incidental, based on the current
> implementation having specific sizes.

That is true. This test happens to align TCB at 16byte by chance.
Comment 17 Ulrich Drepper 2010-10-14 02:13:03 UTC
I applied the patch without the extra tests.  It really has no real value as is.
Comment 18 Jackie Rosen 2014-02-16 17:46:57 UTC Comment hidden (spam)