Created attachment 15345 [details]
Build log and out file of test fails

Originally discovered by drumgod on a Celeron Mendocino and then confirmed by myself on a Pentium 2 (Deschutes), which was kindly donated by a Gentoo user so that this issue could be investigated further. This currently affects 2.38 and the upcoming 2.39 release; no issue is found in the releases before these two.

The two failures are:

misc/tst-dirname:
Didn't expect signal from child: got `Illegal instruction

misc/tst-glibc-hwcaps-prepend-cache:
tst-glibc-hwcaps-prepend-cache.c:90: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:105: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:113: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 3 (0x3); from: 3
error: tst-glibc-hwcaps-prepend-cache.c:122: not true: dlopen (SONAME, RTLD_NOW) == NULL
tst-glibc-hwcaps-prepend-cache.c:129: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2

I think the first one is the main issue, but I have included both for completeness.
On a Gentoo system this can also be reproduced by installing glibc 2.38 and then running "emerge -va nano":

/usr/lib/portage/python3.11/ebuild.sh: line 789: 21 Illegal instruction ( if [[ -n ${PORTAGE_PIPE_FD} ]]; then eval "exec ${PORTAGE_PIPE_FD}>&-"; unset PORTAGE_PIPE_FD; fi; __ebuild_main ${EBUILD_SH_ARGS}; exit 0 )

dmesg output:

[106071.719000] process 'ld-linux.so.2' launched '/var/tmp/portage/sys-libs/glibc-9999/temp/testscriptRsIrAs' with NULL argv: empty string added
[110136.637049] traps: ld-linux.so.2[9275] trap invalid opcode ip:b7e4fc8f sp:bfa360cc error:0 in libc.so[b7cc7000+194000]
[110326.703936] ld-linux.so.2[9796]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[240275.802537] ld-linux.so.2[16706]: segfault at 9393b150 ip b7d814e3 sp bfc36c40 error 4 in libc.so[b7d02000+193000] likely on CPU 0 (core 0, socket 0)
[240275.802721] Code: 8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c <8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
[240275.807617] ld-linux.so.2[16707]: segfault at 151537d0 ip b7d814e3 sp bfc36c40 error 4 in libc.so[b7d02000+193000] likely on CPU 0 (core 0, socket 0)
[240275.807770] Code: 8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c <8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
[264420.901533] traps: ld-linux.so.2[18454] trap invalid opcode ip:b7f0258f sp:bf80e39c error:0 in libc.so[b7d7b000+193000]
[363201.403362] traps: bash[1205] trap invalid opcode ip:b7dcceb5 sp:bf90298c error:0 in libc.so.6[b7c46000+193000]
[363357.204421] traps: bash[2431] trap invalid opcode ip:b7e9aeb5 sp:bfb088ac error:0 in libc.so.6[b7d14000+193000]

I know finding one of these machines isn't easy, but I can arrange for access to be given. I have also found a way to reproduce this in QEMU:

1. qemu-img create -f qcow2 linux-nosse.qcow2 30G
2. Install distro of choice with the command "qemu-system-i386 -cpu pentium2 -enable-kvm -smp 8 -m 3G -cdrom <iso> -drive file=linux-nosse.qcow2,format=qcow2 -boot d"
3. Once installed, boot into the machine with "qemu-system-i386 -cpu pentium2 -smp 8 -m 3G -cdrom <iso> -drive file=linux-nosse.qcow2,format=qcow2 -boot c"
4. Now compile glibc as normal and run "make check".

Gentoo bug: https://bugs.gentoo.org/922497
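For anyone triaging the "trap invalid opcode" lines in the dmesg output above, here is a minimal sketch of turning one into an offset within the named mapping. The helper name `trap_offset` and the regular expression are illustrative, not part of any existing tool. As far as I know, the bracketed value is the start address and size of the mapping containing the instruction pointer, so the result is relative to that mapping; resolving it to a symbol would additionally need the mapping's file offset, which the line does not show.

```python
import re

# Matches the "ip:<hex>" field and the trailing "name[start+size]" mapping
# of a kernel "traps:" line. The pattern is an assumption based on the
# lines quoted in this report, not a documented format.
TRAP_RE = re.compile(r"ip:([0-9a-f]+).* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def trap_offset(line):
    """Return (mapping_name, ip - mapping_start), or None if the line does not parse."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    ip = int(match.group(1), 16)
    name = match.group(2)
    start = int(match.group(3), 16)
    size = int(match.group(4), 16)
    # Sanity check: the instruction pointer should lie inside the mapping.
    if not start <= ip < start + size:
        return None
    return name, ip - start


line = ("traps: ld-linux.so.2[9275] trap invalid opcode ip:b7e4fc8f "
        "sp:bfa360cc error:0 in libc.so[b7cc7000+194000]")
name, offset = trap_offset(line)
print(name, hex(offset))  # libc.so 0x188c8f
```

Applied to the two bash traps, the same helper yields the same offset (0x186eb5) for both, which suggests they hit the same instruction despite the different load addresses.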
The instruction stream does not make sense. Running this:

import base64
data="""8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1
0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c 8b 79
08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
""".replace(' ', '').replace('\n', '')
with open('f', 'wb') as out:
    out.write(base64.b16decode(data, casefold=True))

And:

objdump -b binary -m i386 -D f

gives:

   0:   8d 70 08                lea    0x8(%eax),%esi
   3:   f7 c6 0f 00 00 00       test   $0xf,%esi
   9:   0f 85 20 0a 00 00       jne    0xa2f
   f:   8d 5c 97 04             lea    0x4(%edi,%edx,4),%ebx
  13:   65 a1 0c 00 00 00       mov    %gs:0xc,%eax
  19:   85 c0                   test   %eax,%eax
  1b:   0f 85 46 04 00 00       jne    0x467
  21:   8b 4c 24 18             mov    0x18(%esp),%ecx
  25:   89 f0                   mov    %esi,%eax
  27:   c1 e8 0c                shr    $0xc,%eax
  2a:   8b 79 08                mov    0x8(%ecx),%edi
  2d:   8b 4c 24 14             mov    0x14(%esp),%ecx
  31:   31 f8                   xor    %edi,%eax
  33:   89 41 04                mov    %eax,0x4(%ecx)
  36:   8b 44 24 18             mov    0x18(%esp),%eax
  3a:   8b 40 04                mov    0x4(%eax),%eax
  3d:   89                      .byte 0x89
  3e:   44                      inc    %esp
  3f:   24                      .byte 0x24

The fault is at offset 0x2a, which is, as far as I can see, a perfectly fine i386 instruction. It's also not in a string function; the instruction sequence involving %gs is related to a single-thread optimization. The tail involving inc %esp is also really dubious.

Could you obtain a backtrace using GDB, possibly from a coredump?
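The fault offset quoted above can be read straight off the kernel's "Code:" line, where the byte at the faulting instruction pointer is wrapped in angle brackets. A minimal sketch, with `parse_code_line` as a hypothetical helper name:

```python
import base64


def parse_code_line(code):
    """Return (raw_bytes, fault_offset) for the hex dump of a kernel 'Code:' line.

    fault_offset is the index of the byte written as <xx>, or None if the
    dump carries no such marker.
    """
    fault_offset = None
    hex_bytes = []
    for index, token in enumerate(code.split()):
        if token.startswith("<") and token.endswith(">"):
            fault_offset = index
            token = token[1:-1]  # drop the angle brackets
        hex_bytes.append(token)
    raw = base64.b16decode("".join(hex_bytes), casefold=True)
    return raw, fault_offset


# Hex dump copied from the dmesg output in the original report.
code = ("8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1 "
        "0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c "
        "<8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24")
raw, fault = parse_code_line(code)
print(hex(fault), hex(raw[fault]))  # 0x2a 0x8b
```

Writing `raw` to a file gives the same input for objdump as the script above, without having to strip the marker by hand.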
I got access to an environment where it reproduces. Oops:

```
Thread 2.1 "ld-linux.so.2" received signal SIGILL, Illegal instruction.
[Switching to process 27432]
0xb7f4ccff in __memrchr_sse2 () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
(gdb) bt
#0  0xb7f4ccff in __memrchr_sse2 () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
#1  0xb7ec7906 in dirname () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
#2  0xb7fbf5d9 in ?? ()
#3  0xb7fbf684 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
```
Sam, could you show us the output from ld.so --list-diagnostics? Thanks.
Created attachment 15363 [details] ld.so --list-diagnostics (2.37)
Created attachment 15364 [details] /proc/cpuinfo
Created attachment 15365 [details]
ld.so --list-diagnostics (2.39/HEAD)

I forgot that the host was an older glibc -- annotated the two outputs properly now.

```
# diff -u <(elf/ld.so --library-path . --list-diagnostics) <(ld.so --list-diagnostics)
--- /dev/fd/63  2024-02-13 17:40:04.995461742 -0000
+++ /dev/fd/62  2024-02-13 17:40:04.998795127 -0000
@@ -46,12 +46,12 @@
 path.sysconfdir="/etc"
 path.system_dirs[0x0]="/lib/"
 path.system_dirs[0x1]="/usr/lib/"
-version.release="development"
-version.version="2.39.9000"
+version.release="stable"
+version.version="2.37"
 auxv[0x0].a_type=0x20
-auxv[0x0].a_val=0xb7f32570
+auxv[0x0].a_val=0xb7f93570
 auxv[0x1].a_type=0x21
-auxv[0x1].a_val=0xb7f32000
+auxv[0x1].a_val=0xb7f93000
 auxv[0x2].a_type=0x33
 auxv[0x2].a_val=0x5a0
 auxv[0x3].a_type=0x10
@@ -61,7 +61,7 @@
 auxv[0x5].a_type=0x11
 auxv[0x5].a_val=0x64
 auxv[0x6].a_type=0x3
-auxv[0x6].a_val=0xb7f34034
+auxv[0x6].a_val=0xb7f95034
 auxv[0x7].a_type=0x4
 auxv[0x7].a_val=0x20
 auxv[0x8].a_type=0x5
@@ -71,7 +71,7 @@
 auxv[0xa].a_type=0x8
 auxv[0xa].a_val=0x0
 auxv[0xb].a_type=0x9
-auxv[0xb].a_val=0xb7f51920
+auxv[0xb].a_val=0xb7fb2370
 auxv[0xc].a_type=0xb
 auxv[0xc].a_val=0x0
 auxv[0xd].a_type=0xc
@@ -83,13 +83,13 @@
 auxv[0x10].a_type=0x17
 auxv[0x10].a_val=0x0
 auxv[0x11].a_type=0x19
-auxv[0x11].a_val=0xbfb7b79b
+auxv[0x11].a_val=0xbfbdf86b
 auxv[0x12].a_type=0x1a
 auxv[0x12].a_val=0x0
 auxv[0x13].a_type=0x1f
-auxv[0x13].a_val_string="elf/ld.so"
+auxv[0x13].a_val="/usr/bin/ld.so"
 auxv[0x14].a_type=0xf
-auxv[0x14].a_val_string="i686"
+auxv[0x14].a_val="i686"
 auxv[0x15].a_type=0x1b
 auxv[0x15].a_val=0x1c
 auxv[0x16].a_type=0x1c
@@ -177,14 +177,6 @@
 x86.cpu_features.features[0x8].active[0x1]=0x0
 x86.cpu_features.features[0x8].active[0x2]=0x0
 x86.cpu_features.features[0x8].active[0x3]=0x0
-x86.cpu_features.features[0x9].cpuid[0x0]=0x0
-x86.cpu_features.features[0x9].cpuid[0x1]=0x0
-x86.cpu_features.features[0x9].cpuid[0x2]=0x0
-x86.cpu_features.features[0x9].cpuid[0x3]=0x0
-x86.cpu_features.features[0x9].active[0x0]=0x0
-x86.cpu_features.features[0x9].active[0x1]=0x0
-x86.cpu_features.features[0x9].active[0x2]=0x0
-x86.cpu_features.features[0x9].active[0x3]=0x0
 x86.cpu_features.preferred.Fast_Rep_String=0x0
 x86.cpu_features.preferred.Fast_Copy_Backward=0x0
 x86.cpu_features.preferred.Slow_BSF=0x0
@@ -195,7 +187,6 @@
 x86.cpu_features.preferred.I686=0x1
 x86.cpu_features.preferred.Slow_SSE4_2=0x0
 x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x0
-x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0
 x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x0
 x86.cpu_features.preferred.Prefer_ERMS=0x0
 x86.cpu_features.preferred.Prefer_No_AVX512=0x1
@@ -223,4 +214,3 @@
 x86.cpu_features.level3_cache_assoc=0x0
 x86.cpu_features.level3_cache_linesize=0x0
 x86.cpu_features.level4_cache_size=0x0
-x86.cpu_features.cachesize_non_temporal_divisor=0x4
```
These diagnostics look fine.

I think the issue is that in sysdeps/i386/i686/multiarch/memrchr-sse2.S, we have

strong_alias (__memrchr_sse2, __GI___memrchr)

under #if IS_IN (libc). This unconditionally sets the default implementation for libc.so.6 to the SSE2 one. That should probably be in sysdeps/i386/i686/multiarch/memrchr-c.c, for __memrchr_ia32.

Completely untested:

diff --git a/sysdeps/i386/i686/multiarch/memrchr-c.c b/sysdeps/i386/i686/multiarch/memrchr-c.c
index ef7bbbe792..20bfdf3af3 100644
--- a/sysdeps/i386/i686/multiarch/memrchr-c.c
+++ b/sysdeps/i386/i686/multiarch/memrchr-c.c
@@ -5,3 +5,4 @@ extern void *__memrchr_ia32 (const void *, int, size_t);
 #endif
 
 #include "string/memrchr.c"
+strong_alias (__memrchr_ia32, __GI___memrchr)
diff --git a/sysdeps/i386/i686/multiarch/memrchr-sse2.S b/sysdeps/i386/i686/multiarch/memrchr-sse2.S
index d9dae04171..e123f87435 100644
--- a/sysdeps/i386/i686/multiarch/memrchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memrchr-sse2.S
@@ -720,5 +720,4 @@ L(ret_null):
 	ret
 
 END (__memrchr_sse2)
-strong_alias (__memrchr_sse2, __GI___memrchr)
 #endif
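Because the alias sends internal callers to the SSE2 variant without any feature check, it only faults on CPUs that lack SSE2. A quick way to confirm whether a given machine is in that group is to look at the flags line of /proc/cpuinfo. A minimal sketch follows; the helper names are hypothetical, and the sample flags line is an illustrative Pentium II-class list, not copied from the attachment:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse2(cpuinfo_text):
    """True if the 'sse2' flag is listed for the first CPU."""
    return "sse2" in cpu_flags(cpuinfo_text)


# Illustrative sample only (assumed, not taken from the attached /proc/cpuinfo).
sample = (
    "processor\t: 0\n"
    "model name\t: Pentium II (Deschutes)\n"
    "flags\t\t: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov "
    "pat pse36 mmx fxsr\n"
)
print("sse2 supported:", has_sse2(sample))  # sse2 supported: False
```

On a real system, pass the contents of /proc/cpuinfo instead of `sample`.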
Thanks, that works and cures the illegal instruction. Took a few days to run the test suite on the machine:

      2 FAIL
   4843 PASS
     45 UNSUPPORTED
     17 XFAIL
      8 XPASS

FAIL: elf/tst-glibc-hwcaps-prepend-cache
FAIL: locale/tst-localedef-path-norm

```
$ cat ./elf/tst-glibc-hwcaps-prepend-cache.out
tst-glibc-hwcaps-prepend-cache.c:90: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:105: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:113: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 3 (0x3); from: 3
error: tst-glibc-hwcaps-prepend-cache.c:122: not true: dlopen (SONAME, RTLD_NOW) == NULL
tst-glibc-hwcaps-prepend-cache.c:129: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
error: 5 test failures
running post-clean rsync
```

Not sure what that's about, but it was also there before in the original report. Not checked on any other hw.

Thanks Florian.
Patch posted: [PATCH] i386: Use generic memrchr in libc (bug 31316) <https://inbox.sourceware.org/libc-alpha/87zfw1aik9.fsf@oldenburg.str.redhat.com/>
Fixed via:

commit 0d9166c2245cad4ac520b337dee40c9a583872b6
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Feb 16 07:40:37 2024 +0100

    i386: Use generic memrchr in libc (bug 31316)

    Before this change, we incorrectly used the SSE2 variant in the
    implementation, without checking that the system actually supports
    SSE2.

    Tested-by: Sam James <sam@gentoo.org>

I have no immediate plans to handle the backports.