Bug 30037 - glibc 2.34 and newer segfault if CPUID leaf 0x2 reports zero
Summary: glibc 2.34 and newer segfault if CPUID leaf 0x2 reports zero
Status: RESOLVED DUPLICATE of bug 29953
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.36
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-24 03:48 UTC by Dexuan Cui
Modified: 2023-07-17 07:29 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Proposed patch (1.73 KB, patch)
2023-02-24 16:27 UTC, Adam Yi

Description Dexuan Cui 2023-01-24 03:48:00 UTC
When I start an Intel TDX Ubuntu 22.04/22.10/23.04 (or RHEL 9.0) guest on Hyper-V and on KVM, the guest always hits segfaults and can’t boot up:

[ 21.081453] Run /inits init process
[ 21.086896] with arguments:
[ 21.095790] /init
[ 21.100982] with environment:
[ 21.106611] HOME=/
[ 21.112463] TERM=linux
[ 21.119850] BOOT_IMAGE=/boot/vmlinuz-6.1.0-rc7-decui+
Loading, please wait...
Starting version 249.11-0ubuntu3.6
[ 21.253908] udevadm[144]: segfault at 56538d61e0c0 ip 00007f8f5899efeb sp 00007ffd08fb7648 error 6 in libc.so.6[7f8f58820000+195000] likely on CPU 0 (core 0, socket 0)
[ 21.316549] Code: 07 62 e1 7d 48 e7 4f 01 62 e1 7d 48 e7 67 40 62 e1 7d 48 e7 6f 41 62 61 7d 48 e7 87 00 20 00 00 62 61 7d 48 e7 8f 40 20 00 00 <62> 61 7d 48 e7 a7 00 30 00 00 62 61 7d 48 e7 af 40 30 00 00 48 83
Segmentation fault
[ 22.499317] setfont[153]: segfault at 55ef3b91b000 ip 00007f5899899fa4 sp 00007ffc8008f628 error 4 in libc.so.6[7f589971b000+195000] likely on CPU 0 (core 0, socket 0)
[ 22.602677] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83
[ 22.732413] loadkeys[156]: segfault at 563ffe292000 ip 00007fbff957afa4 sp 00007ffe31453808 error 4 in libc.so.6[7fbff93fc000+195000] likely on CPU 0 (core 0, socket 0)
[ 22.833061] Code: 06 62 e1 fe 48 6f 4e 01 62 e1 fe 48 6f 66 40 62 e1 fe 48 6f 6e 41 62 61 fe 48 6f 86 00 20 00 00 62 61 fe 48 6f 8e 40 20 00 00 <62> 61 fe 48 6f a6 00 30 00 00 62 61 fe 48 6f ae 40 30 00 00 48 83
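The kernel lines above give both the faulting instruction pointer (`ip`) and the mapping of `libc.so.6` as `[base+size]`, so the offset of the faulting instruction inside libc is `ip - base`. A minimal sketch of that arithmetic follows; `fault_offset` is a hypothetical helper written for this illustration, and it assumes the line layout shown in the log above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse a kernel "segfault at ... ip ... in lib[base+size]"
   line and store ip - base in *offset.  Returns 0 on success, -1 if the line
   does not match or ip lies outside the reported mapping. */
static int fault_offset(const char *line, unsigned long long *offset)
{
    unsigned long long ip, base, size;

    /* Locate the instruction pointer field. */
    const char *p = strstr(line, " ip ");
    if (p == NULL || sscanf(p, " ip %llx", &ip) != 1)
        return -1;

    /* The first '[' after the ip field opens the [base+size] mapping. */
    const char *q = strchr(p, '[');
    if (q == NULL || sscanf(q, "[%llx+%llx]", &base, &size) != 2)
        return -1;

    if (ip < base || ip - base >= size)
        return -1;

    *offset = ip - base;
    return 0;
}
```

For the lines logged above this gives 0x17efeb for udevadm and 0x17efa4 for both setfont and loadkeys. Given a libc.so.6 build that matches the crashing one, such an offset can be looked up with a tool like objdump or addr2line.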

The segfault only happens with recent glibc versions (e.g. v2.35 in Ubuntu 22.04, and v2.34 in RHEL 9.0). It doesn't happen with v2.31 in Ubuntu 20.04, or v2.32 in Ubuntu 20.10.

At first I thought this was Bug 28784 - x86: crash in 32bit memset-sse2.s when the cache size can not be determined (https://sourceware.org/bugzilla/show_bug.cgi?id=28784), but it turns out the fix for Bug 28784 (i.e. commit a51b76b71e8190a50b0e0c0b32f313888b930108 "x86: use default cache size if it cannot be determined [BZ #28784]") is already included in the Ubuntu distros.

The fix for Bug 28784 is in upstream glibc 2.35, so glibc 2.36 doesn't suffer from Bug 28784, but I'm seeing the same segfault with the Ubuntu 23.04 dev build (https://cloud-images.ubuntu.com/lunar/20230120/lunar-server-cloudimg-amd64-azure.vhd.tar.gz), which uses glibc 2.36-0ubuntu4. (BTW, this file confirms that the fix for Bug 28784 is indeed in the glibc 2.36 code in Ubuntu 23.04: https://git.launchpad.net/ubuntu/+source/glibc/tree/sysdeps/x86/cacheinfo.h?h=import/2.36-4#n64)


I suspect the segfault also exists in upstream glibc 2.36 and probably newer, but I can't confirm it because I don't know how to upgrade glibc in a distro (is this even possible?). I'm opening this bug in the hope that someone can shed some light. Thanks!
Comment 1 Dexuan Cui 2023-01-24 04:03:55 UTC
I'm reading "Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 2A: Instruction Set Reference, A-L" for the definition of CPUID leaf 0x2: 

"
INPUT EAX = 02H: TLB/Cache/Prefetch Information Returned in EAX, EBX, ECX, EDX

When CPUID executes with EAX set to 02H, the processor returns information about the processor’s internal TLBs, cache and prefetch hardware in the EAX, EBX, ECX, and EDX registers. The information is reported in encoded form and fall into the following categories:

• The least-significant byte in register EAX (register AL) will always return 01H. Software should ignore this value and not interpret it as an informational descriptor.

• The most significant bit (bit 31) of each register indicates whether the register contains valid information (set to 0) or is reserved (set to 1).

• If a register contains valid information, the information is contained in 1 byte descriptors. There are four types of encoding values for the byte descriptor, the encoding type is noted in the second column of Table 3-12. Table 3-12 lists the encoding of these descriptors. Note that the order of descriptors in the EAX, EBX, ECX, and EDX registers is not defined; that is, specific bytes are not designated to contain descriptors for specific cache, prefetch, or TLB types. The descriptors may appear in any order. Note also a processor may report a general descriptor type (FFH) and not report any byte descriptor of “cache type” via CPUID leaf 2.

"
Comment 2 Dexuan Cui 2023-01-24 04:18:00 UTC
CPUID leaf 0x2 is emulated for a TDX guest, and currently the returned EAX/EBX/ECX/EDX are all zeros, and I see the segfault issue in recent releases of glibc, including 2.36-0ubuntu4.

If I change the emulation logic in the Linux kernel to return 0xff01 in EAX, then the segfault is gone.

0xff01 in EAX means "CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters".

So it looks like recent versions of glibc (2.34 and newer?) require a non-zero value in EAX? Please shed some light on this. Thanks!
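To make the difference between the two return values concrete, here is a minimal sketch that decodes a leaf 0x2 result according to the rules quoted in comment 1 and then classifies it. It is not glibc's code: the names `leaf2_descriptors`, `classify_leaf2` and `enum leaf2_kind` are hypothetical, and the sketch takes the register values as input instead of executing CPUID so that it runs anywhere. It assumes that a 00H byte is a null descriptor that carries no information, as listed in Table 3-12.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a CPUID leaf 0x2 result. */
enum leaf2_kind {
    LEAF2_EMPTY,        /* no descriptors at all (e.g. all registers zero) */
    LEAF2_USE_LEAF4,    /* FFH present: query CPUID leaf 4 instead */
    LEAF2_DESCRIPTORS   /* ordinary 1-byte descriptors are present */
};

/* Collect the 1-byte descriptors from regs = { EAX, EBX, ECX, EDX }.
   At most 15 bytes can be descriptors (16 bytes minus AL).
   Returns the number of descriptors stored in out. */
static size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;

    for (int r = 0; r < 4; r++) {
        /* Bit 31 set means the whole register is reserved. */
        if (regs[r] & 0x80000000u)
            continue;
        for (int b = 0; b < 4; b++) {
            /* AL is not a descriptor and must be ignored. */
            if (r == 0 && b == 0)
                continue;
            uint8_t d = (uint8_t) (regs[r] >> (8 * b));
            /* Assumption: 00H is the null descriptor. */
            if (d != 0)
                out[n++] = d;
        }
    }
    return n;
}

static enum leaf2_kind classify_leaf2(const uint32_t regs[4])
{
    uint8_t desc[15];
    size_t n = leaf2_descriptors(regs, desc);

    for (size_t i = 0; i < n; i++)
        if (desc[i] == 0xff)
            return LEAF2_USE_LEAF4;
    return n == 0 ? LEAF2_EMPTY : LEAF2_DESCRIPTORS;
}
```

With this decoding, the all-zero result from the TDX emulation classifies as `LEAF2_EMPTY`: it contains neither cache descriptors nor the FFH hint. The modified emulation that returns 0xff01 in EAX classifies as `LEAF2_USE_LEAF4`.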
Comment 3 Noah Goldstein 2023-01-25 20:35:09 UTC
Maybe this is a duplicate of https://sourceware.org/bugzilla/show_bug.cgi?id=29953, in which case it was fixed by:
https://sourceware.org/git/?p=glibc.git;a=commit;h=48b74865c63840b288bd85b4d8743533b73b339b

x86: Check minimum/maximum of non_temporal_threshold [BZ #29953]

The minimum non_temporal_threshold is 0x4040.  non_temporal_threshold may
be set to less than the minimum value when the shared cache size isn't
available (e.g., in an emulator) or by the tunable.  Add checks for
minimum and maximum of non_temporal_threshold.

This fixes BZ #29953.
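The check described in the commit message amounts to clamping the threshold into a valid range. A minimal sketch, not the actual glibc code: the function name is hypothetical, the minimum 0x4040 comes from the commit message above, and the maximum is left as a caller-supplied parameter because the message does not state its value.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum taken from the commit message quoted above. */
#define MIN_NON_TEMPORAL_THRESHOLD ((size_t) 0x4040)

/* Hypothetical helper: clamp a computed or tunable-provided threshold.
   'maximum' is an assumption of this sketch and must be >= the minimum. */
static size_t clamp_non_temporal_threshold(size_t value, size_t maximum)
{
    if (value < MIN_NON_TEMPORAL_THRESHOLD)
        return MIN_NON_TEMPORAL_THRESHOLD;
    if (value > maximum)
        return maximum;
    return value;
}
```

A threshold of 0, as might result when the shared cache size is reported as unavailable, is raised to 0x4040, while values already inside the range pass through unchanged.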
Comment 4 Adam Yi 2023-02-24 16:27:50 UTC
Created attachment 14719
Proposed patch
Comment 5 Adam Yi 2023-02-24 16:29:00 UTC
Ah, sorry, please ignore the patch above. I added it to the wrong bug :(
Comment 6 ioanna alifieraki 2023-03-13 12:16:52 UTC
I can confirm that this is a duplicate of https://sourceware.org/bugzilla/show_bug.cgi?id=29953 and commit https://sourceware.org/git/?p=glibc.git;a=commit;h=48b74865c63840b288bd85b4d8743533b73b339b resolves the crash on Ubuntu 22.04.
Comment 7 Florian Weimer 2023-03-13 17:59:37 UTC
Marking as duplicate as instructed. Thanks.

*** This bug has been marked as a duplicate of bug 29953 ***