Summary: | Segmentation fault in dynamic loader on AVX enabled OS and CPU with AVX | ||
---|---|---|---|
Product: | glibc | Reporter: | Vitaly Slobodskoy <slvital> |
Component: | libc | Assignee: | Ulrich Drepper <drepper.fsp> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | hjl.tools, jakub |
Priority: | P2 | Flags: | fweimer:
security-
|
Version: | 2.12 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | 2010-10-13 10:25:47 | |
Attachments: |
small reproducer of this problem
A patch An updated patch |
You're using obsolete code. Check with the current git version or file a problem with your distribution provider. There have been changes which probably affect this behavior. The same happens for attached reproducer on FC13 (glibc-2.12-2). Unfortunately I don't know how to check this on current git version.. That's not recent either. Report this to the Fedora people if you cannot reproduce the current code. I don't have access to an AVX machine and cannot work on this. Just a status for glibc 2.12.1-2: #0 _dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189 #1 0x00007ffff7de8aea in add_dependency (undef_name=0x7ffff683c36f "hello", undef_map=0x7ffff0000e70, ref=0x7ffff763dde8, symbol_scope=0x7ffff00011c8, version=0x0, type_class=1, flags=5, skip_map=0x0) at dl-lookup.c:628 #2 _dl_lookup_symbol_x (undef_name=0x7ffff683c36f "hello", undef_map=0x7ffff0000e70, ref=0x7ffff763dde8, symbol_scope=0x7ffff00011c8, version=0x0, type_class=1, flags=5, skip_map=0x0) at dl-lookup.c:831 #3 0x00007ffff7debee0 in _dl_fixup (l=<value optimized out>, reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:118 #4 0x00007ffff7df2795 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41 #5 0x00007ffff683c5dc in hello2 () from ./libso2.so #6 0x00007ffff6a3d636 in hello1 () from ./libso1.so #7 0x000000000040075e in doTask () #8 0x00007ffff79c5761 in start_thread (arg=0x7ffff763e710) at pthread_create.c:301 #9 0x00007ffff772051d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 It still happens with glibc trunk as of commit b833d51fbbf78b38c6ff68074c22d3fe3ddd0ce3. nptl/sysdeps/x86_64/tls.h has # if __WORDSIZE == 64 long int __unused2; /* Have space for the post-AVX register size. */ __m128 rtld_savespace_sse[8][4]; void *__padding[8]; # endif } tcbhead_t; sysdeps/x86_64/dl-trampoline.S has vmovdqa %ymm0, %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE vmovdqa %ymm1, %fs:RTLD_SAVESPACE_SSE+1*YMM_SIZE vmovdqa %ymm2, %fs:RTLD_SAVESPACE_SSE+2*YMM_SIZE vmovdqa %ymm3, %fs:RTLD_SAVESPACE_SSE+3*YMM_SIZE vmovdqa %ymm4, %fs:RTLD_SAVESPACE_SSE+4*YMM_SIZE vmovdqa %ymm5, %fs:RTLD_SAVESPACE_SSE+5*YMM_SIZE vmovdqa %ymm6, %fs:RTLD_SAVESPACE_SSE+6*YMM_SIZE vmovdqa %ymm7, %fs:RTLD_SAVESPACE_SSE+7*YMM_SIZE ... vmovdqa %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE, %ymm0 vmovdqa %fs:RTLD_SAVESPACE_SSE+1*YMM_SIZE, %ymm1 vmovdqa %fs:RTLD_SAVESPACE_SSE+2*YMM_SIZE, %ymm2 vmovdqa %fs:RTLD_SAVESPACE_SSE+3*YMM_SIZE, %ymm3 vmovdqa %fs:RTLD_SAVESPACE_SSE+4*YMM_SIZE, %ymm4 vmovdqa %fs:RTLD_SAVESPACE_SSE+5*YMM_SIZE, %ymm5 vmovdqa %fs:RTLD_SAVESPACE_SSE+6*YMM_SIZE, %ymm6 vmovdqa %fs:RTLD_SAVESPACE_SSE+7*YMM_SIZE, %ymm7 But rtld_savespace_sse may not be aligned at 32byte. Created attachment 5055 [details]
A patch
A patch align rtld_savespace_sse in tcbhead_t to maximum register
size.
The aligning should happen explicitly. There is no need for the preceding __unused field if this doesn't align the following SSE storage. We need to add more __unused data so that we can, perhaps, reuse those fields in future for other things. About the new test: of course it's good to have tests. I just don't see where this test differs from other tests. We of course already have tests which do dynamic loading. What is this test adding? (In reply to comment #8) > The aligning should happen explicitly. There is no need for the preceding > __unused field if this doesn't align the following SSE storage. We need to add > more __unused data so that we can, perhaps, reuse those fields in future for > other things. Offset is good for up to 128byte alignment: tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128 But tcbhead_t is only aligned at 16byte. No matter how much padding you add, rtld_savespace_sse can only be guaranteed to be aligned at alignment of tcbhead_t. > > About the new test: of course it's good to have tests. I just don't see where > this test differs from other tests. We of course already have tests which do > dynamic loading. What is this test adding? This test happens to make tcbhead_t only aligned at 16byte, not 32byte. My patch aligns tcbhead_t at maximum register size so that rtld_savespace_sse is properly aligned. Isn't TLS_TCB_ALIGN 32 though? That should be used when aligning the TLS block and thus tcbhead_t too. (In reply to comment #10) > Isn't TLS_TCB_ALIGN 32 though? That should be used when aligning the TLS block > and thus tcbhead_t too. It is sysdeps/x86_64/tls.h:# define TLS_INIT_TCB_ALIGN __alignof__ (struct pthread) So shouldn't the fix be just that TLS_INIT_TCB_ALIGN is defined to 32 as well? (In reply to comment #9) > Offset is good for up to 128byte alignment: > > tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128 > > But tcbhead_t is only aligned at 16byte. No matter > how much padding you add, rtld_savespace_sse can only > be guaranteed to be aligned at alignment of tcbhead_t. I don't doubt that the patch does the job. I don't like aligning the struct member. The struct itself (tcbhead_t) should get the alignment element. Otherwise we might silently introduce holes in the structure which are glossed over by the alignment of the struct element. > This test happens to make tcbhead_t only aligned at > 16byte, not 32byte. My patch aligns tcbhead_t at maximum > register size so that rtld_savespace_sse is properly aligned. And what is it about this test that does it? There seems to be nothing special. Any alignment introduced by this is incidental, based on the current implementation having specific sizes. I do not think it is worthwhile adding the patch in this form. Created attachment 5056 [details]
An updated patch
descr.h has
struct pthread
{
union
{
#if !TLS_DTV_AT_TP
/* This overlaps the TCB as used for TLS without threads (see tls.h). */
tcbhead_t header;
#else
...
} __attribute ((aligned (TCB_ALIGNMENT)));
We just need to define TCB_ALIGNMENT to 32byte.
(In reply to comment #12) > So shouldn't the fix be just that TLS_INIT_TCB_ALIGN is defined to 32 as well? Not enough due to descr.h has struct pthread { union { #if !TLS_DTV_AT_TP /* This overlaps the TCB as used for TLS without threads (see tls.h). */ tcbhead_t header; #else ... } __attribute ((aligned (TCB_ALIGNMENT))); #define TCB_ALIGNMENT 16 If tcbhead_t isn't aligned at >= 32byte, struct pthread will be aligned at 16byte, (In reply to comment #13) > (In reply to comment #9) > > Offset is good for up to 128byte alignment: > > > > tcb-offsets.h:#define RTLD_SAVESPACE_SSE 128 > > > > But tcbhead_t is only aligned at 16byte. No matter > > how much padding you add, rtld_savespace_sse can only > > be guaranteed to be aligned at alignment of tcbhead_t. > > I don't doubt that the patch does the job. > > I don't like aligning the struct member. The struct itself (tcbhead_t) should > get the alignment element. Otherwise we might silently introduce holes in the > structure which are glossed over by the alignment of the struct element. > I posted an updated patch to increase TCB_ALIGNMENT to 32. > > > This test happens to make tcbhead_t only aligned at > > 16byte, not 32byte. My patch aligns tcbhead_t at maximum > > register size so that rtld_savespace_sse is properly aligned. > > And what is it about this test that does it? There seems to be nothing > special. Any alignment introduced by this is incidental, based on the current > implementation having specific sizes. That is true. This test happens to align TCB at 16byte by chance. I applied the patch without the extra tests. It really has no real value as is. *** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla. |
Created attachment 5053 [details] small reproducer of this problem Small example is attached: unpack, run “make”, then run “./exe” -> segmentation fault (won’t segfault on non-AVX OS or CPU) Stack of crash: #0 _dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189 #1 0x0000003dc380a7dd in add_dependency (flags=5, map=0x7ffff0000910, undef_map=0x7ffff0000e70) at dl-lookup.c:613 #2 _dl_lookup_symbol_x (flags=5, map=0x7ffff0000910, undef_map=0x7ffff0000e70) at dl-lookup.c:816 #3 0x0000003dc380dbb0 in _dl_fixup (l=0x0, reloc_arg=<value optimized out>) at ../elf/dl-runtime.c:118 #4 0x0000003dc3814315 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41 #5 0x00007ffff71cf5dc in hello2 () from ./libso2.so #6 0x00007ffff73d0636 in hello1 () from ./libso1.so #7 0x000000000040075e in doTask () #8 0x0000003dc4406a3a in start_thread (arg=0x7ffff7fd1710) at pthread_create.c:297 #9 0x0000003dc3cde77d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #10 0x0000000000000000 in ?? () Short description: There is AVX support in glibc 2.11. Function _dl_x86_64_save_sse has been updated with saving AVX registers. In fact, first AVX instruction in this code crashes being called from additional thread. To cause calling _dl_x86_64_save_sse function it is needed to have two DSOs loaded via dlopen with RTLD_LAZY having two global symbols with identical names. Then it is needed to call this function from both DSOs, second call will cause calling _dl_x86_64_save_sse.