Bug 31316 - Fails test misc/tst-dirname "Didn't expect signal from child: got `Illegal instruction'" on non SSE CPUs
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: build
Version: 2.38
Importance: P2 normal
Target Milestone: 2.40
Assignee: Florian Weimer
Reported: 2024-01-30 03:05 UTC by Immolo
Modified: 2024-02-16 06:42 UTC
CC List: 4 users

fweimer: security-


Attachments
Build log and out file of test fails (580.43 KB, application/x-xz)
2024-01-30 03:05 UTC, Immolo
ld.so --list-diagnostics (2.37) (1.55 KB, text/plain)
2024-02-13 17:27 UTC, Sam James
/proc/cpuinfo (441 bytes, text/plain)
2024-02-13 17:28 UTC, Sam James
ld.so --list-diagnostics (2.39/HEAD) (1.62 KB, text/plain)
2024-02-13 17:42 UTC, Sam James

Description Immolo 2024-01-30 03:05:44 UTC
Created attachment 15345 [details]
Build log and out file of test fails

Originally discovered by drumgod on a Celeron Mendocino, then confirmed by me on a Pentium II (Deschutes) that a Gentoo user kindly donated so we could dig further into this issue.

This currently affects 2.38 and the upcoming 2.39 release; the releases before these two do not show the issue.

The two failures are:

misc/tst-dirname: Didn't expect signal from child: got `Illegal instruction'
misc/tst-glibc-hwcaps-prepend-cache:
tst-glibc-hwcaps-prepend-cache.c:90: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:105: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:113: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 3 (0x3); from: 3
error: tst-glibc-hwcaps-prepend-cache.c:122: not true: dlopen (SONAME, RTLD_NOW) == NULL
tst-glibc-hwcaps-prepend-cache.c:129: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2

I think the first one is the main issue, but I have included both for completeness.

On a Gentoo system this can also be reproduced by installing glibc 2.38 then running "emerge -va nano":

/usr/lib/portage/python3.11/ebuild.sh: line 789:    21 Illegal instruction     ( if [[ -n ${PORTAGE_PIPE_FD} ]]; then
    eval "exec ${PORTAGE_PIPE_FD}>&-"; unset PORTAGE_PIPE_FD;
fi; __ebuild_main ${EBUILD_SH_ARGS}; exit 0 )

dmesg output:

[106071.719000] process 'ld-linux.so.2' launched '/var/tmp/portage/sys-libs/glibc-9999/temp/testscriptRsIrAs' with NULL argv: empty string added
[110136.637049] traps: ld-linux.so.2[9275] trap invalid opcode ip:b7e4fc8f sp:bfa360cc error:0 in libc.so[b7cc7000+194000]
[110326.703936] ld-linux.so.2[9796]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[240275.802537] ld-linux.so.2[16706]: segfault at 9393b150 ip b7d814e3 sp bfc36c40 error 4 in libc.so[b7d02000+193000] likely on CPU 0 (core 0, socket 0)
[240275.802721] Code: 8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c <8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
[240275.807617] ld-linux.so.2[16707]: segfault at 151537d0 ip b7d814e3 sp bfc36c40 error 4 in libc.so[b7d02000+193000] likely on CPU 0 (core 0, socket 0)
[240275.807770] Code: 8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c <8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
[264420.901533] traps: ld-linux.so.2[18454] trap invalid opcode ip:b7f0258f sp:bf80e39c error:0 in libc.so[b7d7b000+193000]
[363201.403362] traps: bash[1205] trap invalid opcode ip:b7dcceb5 sp:bf90298c error:0 in libc.so.6[b7c46000+193000]
[363357.204421] traps: bash[2431] trap invalid opcode ip:b7e9aeb5 sp:bfb088ac error:0 in libc.so.6[b7d14000+193000]

I know finding one of these machines isn't easy, so I can arrange for access to be given. I have also found a way to reproduce this in QEMU:

1. qemu-img create -f qcow2 linux-nosse.qcow2 30G
2. Install the distro of choice with the command "qemu-system-i386 -cpu pentium2 -enable-kvm -smp 8 -m 3G -cdrom <iso> -drive file=linux-nosse.qcow2,format=qcow2 -boot d"
3. Once installed, boot into the machine with "qemu-system-i386 -cpu pentium2 -smp 8 -m 3G -cdrom <iso> -drive file=linux-nosse.qcow2,format=qcow2 -boot c"
4. Now compile glibc as normal and run "make check".
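
For anyone scripting the steps above, here is a minimal Python sketch that assembles the two QEMU command lines. It only uses the flags quoted in steps 2 and 3; the helper name `qemu_command` and the ISO file name `install.iso` are placeholders I made up, so substitute your own image.

```python
import shlex


def qemu_command(disk, iso, boot, use_kvm):
    """Build a qemu-system-i386 argv emulating a Pentium II guest (no SSE)."""
    argv = ["qemu-system-i386", "-cpu", "pentium2"]
    if use_kvm:
        # Only the install step in the report passes -enable-kvm.
        argv.append("-enable-kvm")
    argv += [
        "-smp", "8",
        "-m", "3G",
        "-cdrom", iso,
        "-drive", f"file={disk},format=qcow2",
        "-boot", boot,
    ]
    return argv


DISK = "linux-nosse.qcow2"   # image created in step 1
ISO = "install.iso"          # hypothetical placeholder for <iso>

install = qemu_command(DISK, ISO, boot="d", use_kvm=True)   # step 2
run = qemu_command(DISK, ISO, boot="c", use_kvm=False)      # step 3

# Print shell-quoted command lines; pass the lists to subprocess.run to launch.
print(shlex.join(install))
print(shlex.join(run))
```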

Gentoo bug: https://bugs.gentoo.org/922497
Comment 1 Florian Weimer 2024-01-30 11:55:54 UTC
The instruction stream does not make sense.

Running this:

```
import base64

# Hex bytes from the kernel "Code:" line, with the <> fault marker removed.
data="""8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04
65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c 8b 79 08
8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24
""".replace(' ', '').replace('\n', '')
# Write the raw bytes to the file 'f' so objdump can disassemble them.
with open('f', 'wb') as out:
    out.write(base64.b16decode(data, casefold=True))
```

And: objdump -b binary -m i386 -D f

gives:

   0:	8d 70 08             	lea    0x8(%eax),%esi
   3:	f7 c6 0f 00 00 00    	test   $0xf,%esi
   9:	0f 85 20 0a 00 00    	jne    0xa2f
   f:	8d 5c 97 04          	lea    0x4(%edi,%edx,4),%ebx
  13:	65 a1 0c 00 00 00    	mov    %gs:0xc,%eax
  19:	85 c0                	test   %eax,%eax
  1b:	0f 85 46 04 00 00    	jne    0x467
  21:	8b 4c 24 18          	mov    0x18(%esp),%ecx
  25:	89 f0                	mov    %esi,%eax
  27:	c1 e8 0c             	shr    $0xc,%eax
  2a:	8b 79 08             	mov    0x8(%ecx),%edi
  2d:	8b 4c 24 14          	mov    0x14(%esp),%ecx
  31:	31 f8                	xor    %edi,%eax
  33:	89 41 04             	mov    %eax,0x4(%ecx)
  36:	8b 44 24 18          	mov    0x18(%esp),%eax
  3a:	8b 40 04             	mov    0x4(%eax),%eax
  3d:	89                   	.byte 0x89
  3e:	44                   	inc    %esp
  3f:	24                   	.byte 0x24

The fault is at offset 0x2a, which as far as I can see is a perfectly fine i386 instruction. It's also not in a string function; the instruction sequence involving %gs is related to a single-thread optimization. The tail involving inc %esp is also really dubious.
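
For reference, the 0x2a offset can be read straight off the kernel "Code:" line, where the faulting byte is wrapped in angle brackets. The following is a small sketch of that calculation; the function name `fault_offset` is made up for illustration, and the input is the line from the dmesg output in the description.

```python
import re


def fault_offset(code_line):
    """Return (offset, byte) for the byte the kernel marked with <..>."""
    # Drop any timestamp prefix and the "Code:" label, then split into bytes.
    tokens = code_line.split("Code:", 1)[-1].split()
    for offset, token in enumerate(tokens):
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if match:
            return offset, int(match.group(1), 16)
    raise ValueError("no <..> marker found in Code: line")


CODE = (
    "Code: 8d 70 08 f7 c6 0f 00 00 00 0f 85 20 0a 00 00 8d 5c 97 04 "
    "65 a1 0c 00 00 00 85 c0 0f 85 46 04 00 00 8b 4c 24 18 89 f0 c1 e8 0c "
    "<8b> 79 08 8b 4c 24 14 31 f8 89 41 04 8b 44 24 18 8b 40 04 89 44 24"
)

offset, byte = fault_offset(CODE)
print(hex(offset), hex(byte))  # 0x2a 0x8b
```

The result lines up with the `2a: 8b 79 08` row in the objdump listing.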

Could you obtain a backtrace using GDB, possibly from a coredump?
Comment 2 Sam James 2024-02-13 17:21:26 UTC
I got access to an environment where it reproduces.

Oops:
```
Thread 2.1 "ld-linux.so.2" received signal SIGILL, Illegal instruction.
[Switching to process 27432]
0xb7f4ccff in __memrchr_sse2 () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
(gdb) bt
#0  0xb7f4ccff in __memrchr_sse2 () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
#1  0xb7ec7906 in dirname () from /var/tmp/portage/sys-libs/glibc-9999/work/build-x86-i686-pc-linux-gnu-nptl/libc.so.6
#2  0xb7fbf5d9 in ?? ()
#3  0xb7fbf684 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
```
Comment 3 Florian Weimer 2024-02-13 17:25:32 UTC
Sam, could you show us the output from ld.so --list-diagnostics? Thanks.
Comment 4 Sam James 2024-02-13 17:27:41 UTC
Created attachment 15363 [details]
ld.so --list-diagnostics (2.37)
Comment 5 Sam James 2024-02-13 17:28:31 UTC
Created attachment 15364 [details]
/proc/cpuinfo
Comment 6 Sam James 2024-02-13 17:42:37 UTC
Created attachment 15365 [details]
ld.so --list-diagnostics (2.39/HEAD)

I forgot that the host was an older glibc -- annotated the two outputs properly now.

```
# diff -u <(elf/ld.so --library-path . --list-diagnostics) <(ld.so --list-diagnostics)
--- /dev/fd/63  2024-02-13 17:40:04.995461742 -0000
+++ /dev/fd/62  2024-02-13 17:40:04.998795127 -0000
@@ -46,12 +46,12 @@
 path.sysconfdir="/etc"
 path.system_dirs[0x0]="/lib/"
 path.system_dirs[0x1]="/usr/lib/"
-version.release="development"
-version.version="2.39.9000"
+version.release="stable"
+version.version="2.37"
 auxv[0x0].a_type=0x20
-auxv[0x0].a_val=0xb7f32570
+auxv[0x0].a_val=0xb7f93570
 auxv[0x1].a_type=0x21
-auxv[0x1].a_val=0xb7f32000
+auxv[0x1].a_val=0xb7f93000
 auxv[0x2].a_type=0x33
 auxv[0x2].a_val=0x5a0
 auxv[0x3].a_type=0x10
@@ -61,7 +61,7 @@
 auxv[0x5].a_type=0x11
 auxv[0x5].a_val=0x64
 auxv[0x6].a_type=0x3
-auxv[0x6].a_val=0xb7f34034
+auxv[0x6].a_val=0xb7f95034
 auxv[0x7].a_type=0x4
 auxv[0x7].a_val=0x20
 auxv[0x8].a_type=0x5
@@ -71,7 +71,7 @@
 auxv[0xa].a_type=0x8
 auxv[0xa].a_val=0x0
 auxv[0xb].a_type=0x9
-auxv[0xb].a_val=0xb7f51920
+auxv[0xb].a_val=0xb7fb2370
 auxv[0xc].a_type=0xb
 auxv[0xc].a_val=0x0
 auxv[0xd].a_type=0xc
@@ -83,13 +83,13 @@
 auxv[0x10].a_type=0x17
 auxv[0x10].a_val=0x0
 auxv[0x11].a_type=0x19
-auxv[0x11].a_val=0xbfb7b79b
+auxv[0x11].a_val=0xbfbdf86b
 auxv[0x12].a_type=0x1a
 auxv[0x12].a_val=0x0
 auxv[0x13].a_type=0x1f
-auxv[0x13].a_val_string="elf/ld.so"
+auxv[0x13].a_val="/usr/bin/ld.so"
 auxv[0x14].a_type=0xf
-auxv[0x14].a_val_string="i686"
+auxv[0x14].a_val="i686"
 auxv[0x15].a_type=0x1b
 auxv[0x15].a_val=0x1c
 auxv[0x16].a_type=0x1c
@@ -177,14 +177,6 @@
 x86.cpu_features.features[0x8].active[0x1]=0x0
 x86.cpu_features.features[0x8].active[0x2]=0x0
 x86.cpu_features.features[0x8].active[0x3]=0x0
-x86.cpu_features.features[0x9].cpuid[0x0]=0x0
-x86.cpu_features.features[0x9].cpuid[0x1]=0x0
-x86.cpu_features.features[0x9].cpuid[0x2]=0x0
-x86.cpu_features.features[0x9].cpuid[0x3]=0x0
-x86.cpu_features.features[0x9].active[0x0]=0x0
-x86.cpu_features.features[0x9].active[0x1]=0x0
-x86.cpu_features.features[0x9].active[0x2]=0x0
-x86.cpu_features.features[0x9].active[0x3]=0x0
 x86.cpu_features.preferred.Fast_Rep_String=0x0
 x86.cpu_features.preferred.Fast_Copy_Backward=0x0
 x86.cpu_features.preferred.Slow_BSF=0x0
@@ -195,7 +187,6 @@
 x86.cpu_features.preferred.I686=0x1
 x86.cpu_features.preferred.Slow_SSE4_2=0x0
 x86.cpu_features.preferred.AVX_Fast_Unaligned_Load=0x0
-x86.cpu_features.preferred.Prefer_MAP_32BIT_EXEC=0x0
 x86.cpu_features.preferred.Prefer_No_VZEROUPPER=0x0
 x86.cpu_features.preferred.Prefer_ERMS=0x0
 x86.cpu_features.preferred.Prefer_No_AVX512=0x1
@@ -223,4 +214,3 @@
 x86.cpu_features.level3_cache_assoc=0x0
 x86.cpu_features.level3_cache_linesize=0x0
 x86.cpu_features.level4_cache_size=0x0
-x86.cpu_features.cachesize_non_temporal_divisor=0x4
```
Comment 7 Florian Weimer 2024-02-13 17:48:32 UTC
These diagnostics look fine.

I think the issue is that in sysdeps/i386/i686/multiarch/memrchr-sse2.S, we have

strong_alias (__memrchr_sse2, __GI___memrchr)

under #if IS_IN (libc). This unconditionally sets the default implementation for libc.so.6 to the SSE2 one. That should probably be in sysdeps/i386/i686/multiarch/memrchr-c.c, for __memrchr_ia32.
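
To illustrate the difference between binding unconditionally to the SSE2 variant and selecting by CPU feature, here is a rough Python model. It is only a sketch: it is not how glibc implements its aliases or IFUNC selection, all function names are stand-ins, and the sample flags line is an invented example of a pre-SSE CPU rather than the contents of the attached /proc/cpuinfo.

```python
def parse_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def memrchr_generic(data, byte):
    """Portable fallback: index of the last occurrence of byte, or None."""
    index = data.rfind(bytes([byte]))
    return None if index < 0 else index


def memrchr_sse2(data, byte, cpu_flags):
    """Stand-in for the SSE2 variant; models SIGILL on CPUs without SSE2."""
    if "sse2" not in cpu_flags:
        raise RuntimeError("Illegal instruction (modelled): CPU lacks SSE2")
    return memrchr_generic(data, byte)


def select_memrchr(cpu_flags):
    """Pick an implementation by CPU feature instead of assuming SSE2."""
    if "sse2" in cpu_flags:
        return lambda data, byte: memrchr_sse2(data, byte, cpu_flags)
    return memrchr_generic


# Invented sample for a CPU with MMX but no SSE/SSE2 (an assumption).
SAMPLE_CPUINFO = "flags\t\t: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr\n"

flags = parse_flags(SAMPLE_CPUINFO)
path = b"/usr/lib/"

# Feature-based selection falls back to the generic code and works.
print(select_memrchr(flags)(path, ord("/")))

# Unconditional use of the SSE2 variant fails on this CPU.
try:
    memrchr_sse2(path, ord("/"), flags)
except RuntimeError as exc:
    print(exc)
```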

Completely untested:

diff --git a/sysdeps/i386/i686/multiarch/memrchr-c.c b/sysdeps/i386/i686/multiarch/memrchr-c.c
index ef7bbbe792..20bfdf3af3 100644
--- a/sysdeps/i386/i686/multiarch/memrchr-c.c
+++ b/sysdeps/i386/i686/multiarch/memrchr-c.c
@@ -5,3 +5,4 @@ extern void *__memrchr_ia32 (const void *, int, size_t);
 #endif
 
 #include "string/memrchr.c"
+strong_alias (__memrchr_ia32, __GI___memrchr)
diff --git a/sysdeps/i386/i686/multiarch/memrchr-sse2.S b/sysdeps/i386/i686/multiarch/memrchr-sse2.S
index d9dae04171..e123f87435 100644
--- a/sysdeps/i386/i686/multiarch/memrchr-sse2.S
+++ b/sysdeps/i386/i686/multiarch/memrchr-sse2.S
@@ -720,5 +720,4 @@ L(ret_null):
        ret
 
 END (__memrchr_sse2)
-strong_alias (__memrchr_sse2, __GI___memrchr)
 #endif
Comment 8 Sam James 2024-02-15 14:05:18 UTC
Thanks, that works and cures the illegal instruction.

Took a few days to run the test suite on the machine:
      2 FAIL
   4843 PASS
     45 UNSUPPORTED
     17 XFAIL
      8 XPASS

FAIL: elf/tst-glibc-hwcaps-prepend-cache
FAIL: locale/tst-localedef-path-norm

```
$ cat ./elf/tst-glibc-hwcaps-prepend-cache.out
tst-glibc-hwcaps-prepend-cache.c:90: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:105: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
tst-glibc-hwcaps-prepend-cache.c:113: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 3 (0x3); from: 3
error: tst-glibc-hwcaps-prepend-cache.c:122: not true: dlopen (SONAME, RTLD_NOW) == NULL
tst-glibc-hwcaps-prepend-cache.c:129: numeric comparison failure
   left: 1 (0x1); from: marker1 ()
  right: 2 (0x2); from: 2
error: 5 test failures
running post-clean rsync
```

Not sure what that's about, but it was also there in the original report. I have not checked on any other hardware. Thanks, Florian.
Comment 9 Florian Weimer 2024-02-15 14:50:33 UTC
Patch posted:

[PATCH] i386: Use generic memrchr in libc (bug 31316)
<https://inbox.sourceware.org/libc-alpha/87zfw1aik9.fsf@oldenburg.str.redhat.com/>
Comment 10 Florian Weimer 2024-02-16 06:42:51 UTC
Fixed via:

commit 0d9166c2245cad4ac520b337dee40c9a583872b6
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Feb 16 07:40:37 2024 +0100

    i386: Use generic memrchr in libc (bug 31316)
    
    Before this change, we incorrectly used the SSE2 variant in the
    implementation, without checking that the system actually supports
    SSE2.
    
    Tested-by: Sam James <sam@gentoo.org>

I have no immediate plans to handle the backports.