Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory. Ffsll function code size is 17 bytes. Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory. When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line. When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.
Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.
This patch fixes ffsll function random performance regression.
Yanzhang Wang [Tue, 2 Jan 2024 10:54:15 +0000 (18:54 +0800)]
RISC-V: Enable static-pie.
This patch referents the commit 374cef3 to add static-pie support. And
because the dummy link map is used when relocating ourselves, so need
not to set __global_pointer$ at this time.
It will also check whether toolchain supports to build static-pie. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The 551101e8240b7514fc646d1722f8b79c90362b8f change is incorrect for
alpha and sparc, since __NR_stat is defined by both kABI. Use
__NR_newfstat to check whether to fallback to __NR_fstat64 (similar
to what fstatat64 does).
Checked on sparc64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Wilco Dijkstra [Tue, 9 Jan 2024 15:32:08 +0000 (15:32 +0000)]
math: remove exp10 wrappers
Remove the error handling wrapper from exp10. This is very similar to
the changes done to exp and exp2, except that we also need to handle
pow10 and pow10l.
Wilco Dijkstra [Tue, 2 Jan 2024 17:08:02 +0000 (17:08 +0000)]
Benchtests: Increase benchmark iterations
Increase benchmark iterations for math and vector math functions to improve
timing accuracy. Vector math benchmarks now take 1-3 seconds on a modern CPU.
Xi Ruoyao [Thu, 4 Jan 2024 13:41:20 +0000 (21:41 +0800)]
Make __getrandom_nocancel set errno and add a _nostatus version
The __getrandom_nocancel function returns errors as negative values
instead of errno. This is inconsistent with other _nocancel functions
and it breaks "TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0))" in
__arc4random_buf. Use INLINE_SYSCALL_CALL instead of
INTERNAL_SYSCALL_CALL to fix this issue.
But __getrandom_nocancel has been avoiding from touching errno for a
reason, see BZ 29624. So add a __getrandom_nocancel_nostatus function
and use it in tcache_key_initialize.
Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
H.J. Lu [Wed, 10 Jan 2024 16:48:47 +0000 (08:48 -0800)]
x86-64/cet: Make CET feature check specific to Linux/x86
CET feature bits in TCB, which are Linux specific, are used to check if
CET features are active. Move CET feature check to Linux/x86 directory. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Stefan Liebler [Thu, 11 Jan 2024 13:01:18 +0000 (14:01 +0100)]
resolv: Fix endless loop in __res_context_query
Starting with commit 40c0add7d48739f5d89ebba255c1df26629a76e2
"resolve: Remove __res_context_query alloca usage"
there is an endless loop in __res_context_query if
__res_context_mkquery fails e.g. if type is invalid. Then the
scratch buffer is resized to MAXPACKET size and it is retried again.
Before the mentioned commit, it was retried only once and with the
mentioned commit, there is no check and it retries in an endless loop.
This is observable with xtest resolv/tst-resolv-qtypes which times out
after 300s.
This patch retries mkquery only once as before the mentioned commit.
Furthermore, scratch_buffer_set_array_size is now only called with
nelem=2 if type is T_QUERY_A_AND_AAAA (also see mentioned commit).
The test tst-resolv-qtypes is also adjusted to verify that <func>
is really returning with -1 in case of an invalid type. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Carlos O'Donell [Wed, 10 Jan 2024 15:46:03 +0000 (10:46 -0500)]
elf: Fix tst-nodeps2 test failure.
After 78ca44da0160a0b442f0ca1f253e3360f044b2ec
("elf: Relocate libc.so early during startup and dlmopen (bug 31083)")
we start seeing tst-nodeps2 failures when building the testsuite with
--enable-hard-coded-path-in-tests.
When building the testsuite with --enable-hard-coded-path-in-tests
the tst-nodeps2-mod.so is not built with the required DT_RUNPATH
values and the test escapes the test framework and loads the system
libraries and aborts. The fix is to use the existing
$(link-test-modules-rpath-link) variable to set DT_RUNPATH correctly.
H.J. Lu [Tue, 9 Jan 2024 20:23:27 +0000 (12:23 -0800)]
i386: Remove CET support bits
1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk.
2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64
features-offsets.sym.
3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only
for x86-64 and also used for PLT rewrite.
4. Add x86-64 ldsodefs.h to include feature-control.h.
5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only.
6. Move x86 dl-procruntime.c to x86-64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
H.J. Lu [Tue, 9 Jan 2024 20:23:24 +0000 (12:23 -0800)]
x86: Move x86-64 shadow stack startup codes
Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use
sysdeps/generic/libc-start.h for i386. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Joseph Myers [Wed, 10 Jan 2024 13:02:16 +0000 (13:02 +0000)]
Fix deprecated utcnow() usage in build-many-glibcs.py
Running build-many-glibcs.py with Python 3.12 or later produces a
warning:
build-many-glibcs.py:566: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
build_time = datetime.datetime.utcnow()
Replace with datetime.datetime.now(datetime.timezone.utc) (the
datetime.UTC constant is new in 3.11, so not suitable for use in this
script at present).
Joseph Myers [Wed, 10 Jan 2024 13:01:39 +0000 (13:01 +0000)]
Fix invalid escape sequence in build-many-glibcs.py
Running build-many-glibcs.py with Python 3.12 or later produces a
warning:
build-many-glibcs.py:173: SyntaxWarning: invalid escape sequence '\.'
m = re.fullmatch('([0-9]+)\.([0-9]+)[.0-9]*', l)
Use a raw string instead to avoid that warning. (Note: I haven't
checked whether any other Python scripts included with glibc might
have issues with newer Python versions.)
The feupdateenv tests added by 802aef27b2 do not restore the floating
point mask, which might keep some floating point exception enabled and
thus make the feupdateenv_single_test raise an unexpected signal.
Checked on x86_64-linux-gnu and aarch64-linux-gnu (on Apple M1 trapping
is supported). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
CET is only support for x86_64, this patch reverts:
- faaee1f07ed x86: Support shadow stack pointer in setjmp/longjmp.
- be9ccd27c09 i386: Add _CET_ENDBR to indirect jump targets in
add_n.S/sub_n.S
- c02695d7764 x86/CET: Update vfork to prevent child return
- 5d844e1b725 i386: Enable CET support in ucontext functions
- 124bcde683 x86: Add _CET_ENDBR to functions in crti.S
- 562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S
- f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
- 825b58f3fb i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__
- 7e119cd582 i386: Use _CET_NOTRACK in i686/memcmp.S
- 177824e232 i386: Use _CET_NOTRACK in memcmp-sse4.S
- 0a899af097 i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S
- 7fb613361c i386: Use _CET_NOTRACK in memcpy-ssse3.S
- 77a8ae0948 i386: Use _CET_NOTRACK in memset-sse2-rep.S
- 00e7b76a8f i386: Use _CET_NOTRACK in memset-sse2.S
- 90d15dc577 i386: Use _CET_NOTRACK in strcat-sse2.S
- f1574581c7 i386: Use _CET_NOTRACK in strcpy-sse2.S
- 4031d7484a i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump
- target
-
Checked on i686-linux-gnu.
The CET is only supported for x86_64 and there is no plan to add
kernel support for i386. Move the Makefile rules and files from the
generic x86 folder to x86_64 one.
Linux 6.7 removed ia64 from the official tree [1], following the general
principle that a glibc port needs upstream support for the architecture
in all the components it depends on (binutils, GCC, and the Linux
kernel).
Apart from the removal of sysdeps/ia64 and sysdeps/unix/sysv/linux/ia64,
there are updates to various comments referencing ia64 for which removal
of those references seemed appropriate. The configuration is removed
from README and build-many-glibcs.py.
The CONTRIBUTED-BY, elf/elf.h, manual/contrib.texi (the porting
mention), *.po files, config.guess, and longlong.h are not changed.
For Linux it allows cleanup some clone2 support on multiple files.
The following bug can be closed as WONTFIX: BZ 22634 [2], BZ 14250 [3],
BZ 21634 [4], BZ 10163 [5], BZ 16401 [6], and BZ 11585 [7].
H.J. Lu [Sat, 6 Jan 2024 22:03:37 +0000 (14:03 -0800)]
x32: Handle displacement overflow in PLT rewrite [BZ #31218]
PLT rewrite calculated displacement with
ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
On x32, displacement from 0xf7fbe060 to 0x401030 was calculated as
unsigned int disp = 0x401030 - 0xf7fbe060 - 5;
with disp == 0x8442fcb and caused displacement overflow. The PLT entry
was changed to:
0xf7fbe060 <+0>: e9 cb 2f 44 08 jmp 0x401030
0xf7fbe065 <+5>: cc int3
0xf7fbe066 <+6>: cc int3
0xf7fbe067 <+7>: cc int3
0xf7fbe068 <+8>: cc int3
0xf7fbe069 <+9>: cc int3
0xf7fbe06a <+10>: cc int3
0xf7fbe06b <+11>: cc int3
0xf7fbe06c <+12>: cc int3
0xf7fbe06d <+13>: cc int3
0xf7fbe06e <+14>: cc int3
0xf7fbe06f <+15>: cc int3
x32 has 32-bit address range, but it doesn't wrap address around at 4GB,
JMP target was changed to 0x100401030 (0xf7fbe060LL + 0x8442fcbLL + 5),
which is above 4GB.
Always use uint64_t to calculate displacement. This fixes BZ #31218. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
It seems to boiler down to __builtin_clz not having a variant for 8 or
16 bits. Fix it by explicit casting to the expected types. Although
not strickly required for older gcc, using the same __pacify macro
simpify the required code.
stdlib: Fix stdbit.h with -Wconversion for older gcc
With gcc 6.5.0, 7.5.0, 8.5.0, and 9.5.0 the tst-stdbit-Wconversion
issues the warnings:
../stdlib/stdbit.h: In function ‘__clo16_inline’:
../stdlib/stdbit.h:128:26: error: conversion to ‘uint16_t {aka short
unsigned int}’ from ‘int’ may alter its value [-Werror=conversion]
return __clz16_inline (~__x);
^
../stdlib/stdbit.h: In function ‘__clo8_inline’:
../stdlib/stdbit.h:134:25: error: conversion to ‘uint8_t {aka unsigned
char}’ from ‘int’ may alter its value [-Werror=conversion]
return __clz8_inline (~__x);
^
../stdlib/stdbit.h: In function ‘__cto16_inline’:
../stdlib/stdbit.h:232:26: error: conversion to ‘uint16_t {aka short
unsigned int}’ from ‘int’ may alter its value [-Werror=conversion]
return __ctz16_inline (~__x);
^
../stdlib/stdbit.h: In function ‘__cto8_inline’:
../stdlib/stdbit.h:238:25: error: conversion to ‘uint8_t {aka unsigned
char}’ from ‘int’ may alter its value [-Werror=conversion]
return __ctz8_inline (~__x);
^
../stdlib/stdbit.h: In function ‘__bf16_inline’:
../stdlib/stdbit.h:701:23: error: conversion to ‘uint16_t {aka short
unsigned int}’ from ‘int’ may alter its value [-Werror=conversion]
return __x == 0 ? 0 : ((uint16_t) 1) << (__bw16_inline (__x) - 1);
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../stdlib/stdbit.h: In function ‘__bf8_inline’:
../stdlib/stdbit.h:707:23: error: conversion to ‘uint8_t {aka unsigned
char}’ from ‘int’ may alter its value [-Werror=conversion]
return __x == 0 ? 0 : ((uint8_t) 1) << (__bw8_inline (__x) - 1);
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../stdlib/stdbit.h: In function ‘__bc16_inline’:
../stdlib/stdbit.h:751:59: error: conversion to ‘uint16_t {aka short
unsigned int}’ from ‘int’ may alter its value [-Werror=conversion]
return __x <= 1 ? 1 : ((uint16_t) 2) << (__bw16_inline (__x - 1) -
1);
^~~
../stdlib/stdbit.h:751:23: error: conversion to ‘uint16_t {aka short
unsigned int}’ from ‘int’ may alter its value [-Werror=conversion]
return __x <= 1 ? 1 : ((uint16_t) 2) << (__bw16_inline (__x - 1) -
1);
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../stdlib/stdbit.h: In function ‘__bc8_inline’:
../stdlib/stdbit.h:757:57: error: conversion to ‘uint8_t {aka unsigned
char}’ from ‘int’ may alter its value [-Werror=conversion]
return __x <= 1 ? 1 : ((uint8_t) 2) << (__bw8_inline (__x - 1) - 1);
^~~
../stdlib/stdbit.h:757:23: error: conversion to ‘uint8_t {aka unsigned
char}’ from ‘int’ may alter its value [-Werror=conversion]
return __x <= 1 ? 1 : ((uint8_t) 2) << (__bw8_inline (__x - 1) - 1);
It seems to boiler down to __builtin_clz not having a variant for 8 or
16 bits. Fix it by explicit casting to the expected types.
Checked on x86_64-linux-gnu and i686-linux-gnu with gcc 9.5.0.
1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.
With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the
memory offset of the indirect branch instruction.
Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branch after relocation when the lazy binding is disabled.
PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases. Use
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
to enable PLT rewrite with 32-bit direct jump at run-time or
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2
to enable PLT rewrite with 32-bit direct jump and on APX processors with
64-bit absolute jump at run-time.
Sergey Bugaev [Wed, 3 Jan 2024 17:14:45 +0000 (20:14 +0300)]
aarch64: Make cpu-features definitions not Linux-specific
These describe generic AArch64 CPU features, and are not tied to a
kernel-specific way of determining them. We can share them between
the Linux and Hurd AArch64 ports.
Sergey Bugaev [Wed, 3 Jan 2024 17:14:44 +0000 (20:14 +0300)]
hurd: Initializy _dl_pagesize early in static builds
We fetch __vm_page_size as the very first RPC that we do, inside
__mach_init (). Propagate that to _dl_pagesize ASAP after that,
before any other initialization.
In dynamic builds, this is already done immediately after
__mach_init (), inside _dl_sysdep_start ().
Sergey Bugaev [Wed, 3 Jan 2024 17:14:40 +0000 (20:14 +0300)]
hurd: Pass the data pointer to _hurd_stack_setup explicitly
Instead of relying on the stack frame layout to figure out where the stack
pointer was prior to the _hurd_stack_setup () call, just pass the pointer
as an argument explicitly. This is less brittle and much more portable.
H.J. Lu [Tue, 2 Jan 2024 15:03:29 +0000 (07:03 -0800)]
x86-64/cet: Check the restore token in longjmp
setcontext and swapcontext put a restore token on the old shadow stack
which is used to restore the target shadow stack when switching user
contexts. When longjmp from a user context, the target shadow stack
can be different from the current shadow stack and INCSSP can't be
used to restore the shadow stack pointer to the target shadow stack.
Update longjmp to search for a restore token. If found, use the token
to restore the shadow stack pointer before using INCSSP to pop the
shadow stack. Stop the token search and use INCSSP if the shadow stack
entry value is the same as the current shadow stack pointer.
It is a user error if there is a shadow stack switch without leaving a
restore token on the old shadow stack.
The only difference between __longjmp.S and __longjmp_chk.S is that
__longjmp_chk.S has a check for invalid longjmp usages. Merge
__longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP
macro. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Joseph Myers [Wed, 3 Jan 2024 12:07:14 +0000 (12:07 +0000)]
Implement C23 <stdbit.h>
C23 adds a header <stdbit.h> with various functions and type-generic
macros for bit-manipulation of unsigned integers (plus macro defines
related to endianness). Implement this header for glibc.
The functions have both inline definitions in the header (referenced
by macros defined in the header) and copies with external linkage in
the library (which are implemented in terms of those macros to avoid
duplication). They are documented in the glibc manual. Tests, as
well as verifying results for various inputs (of both the macros and
the out-of-line functions), verify the types of those results (which
showed up a bug in an earlier version with the type-generic macro
stdc_has_single_bit wrongly returning a promoted type), that the
macros can be used at top level in a source file (so don't use ({})),
that they evaluate their arguments exactly once, and that the macros
for the type-specific functions have the expected implicit conversions
to the relevant argument type.
Jakub previously referred to -Wconversion warnings in type-generic
macros, so I've included a test with -Wconversion (but the only
warnings I saw and fixed from that test were actually in inline
functions in the <stdbit.h> header - not anything coming from use of
the type-generic macros themselves).
This implementation of the type-generic macros does not handle
unsigned __int128, or unsigned _BitInt types with a width other than
that of a standard integer type (and C23 doesn't require the header to
handle such types either). Support for those types, using the new
type-generic built-in functions Jakub's added for GCC 14, can
reasonably be added in a followup (along of course with associated
tests).
This implementation doesn't do anything special to handle C++, or have
any tests of functionality in C++ beyond the existing tests that all
headers can be compiled in C++ code; it's not clear exactly what form
this header should take in C++, but probably not one using macros.
DIS ballot comment AT-107 asks for the word "count" to be added to the
names of the stdc_leading_zeros, stdc_leading_ones,
stdc_trailing_zeros and stdc_trailing_ones functions and macros. I
don't think it's likely to be accepted (accepting any technical
comments would mean having an FDIS ballot), but if it is accepted at
the WG14 meeting (22-26 January in Strasbourg, starting with DIS
ballot comment handling) then there would still be time to update
glibc for the renaming before the 2.39 release.
The new functions and header are placed in the stdlib/ directory in
glibc, rather than creating a new toplevel stdbit/ or putting them in
string/ alongside ffs.
Szabolcs Nagy [Tue, 21 Dec 2021 13:49:37 +0000 (13:49 +0000)]
aarch64: Add longjmp test for SME
Includes test for setcontext too.
The test directly checks after longjmp if ZA got disabled and the
ZA contents got saved following the lazy saving scheme. It does not
use ACLE code to verify that gcc can interoperate with glibc.
Szabolcs Nagy [Fri, 10 Sep 2021 15:52:17 +0000 (16:52 +0100)]
aarch64: Add SME runtime support
The runtime support routines for the call ABI of the Scalable Matrix
Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
libgcc_s.so have an implementation of __arm_za_disable in libc for
libc internal use in longjmp and similar APIs.
__libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
but it's a hidden symbol so it does not need variant PCS marking.
Using __libc_fatal instead of abort because it can print a message and
works in ld.so too. But for now we don't need SME routines in ld.so.
To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
_rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
object is accessed.
Florian Weimer [Tue, 2 Jan 2024 13:36:17 +0000 (14:36 +0100)]
libio: Check remaining buffer size in _IO_wdo_write (bug 31183)
The multibyte character needs to fit into the remaining buffer space,
not the already-written buffer space. Without the fix, we were never
moving the write pointer from the start of the buffer, always using
the single-character fallback buffer.
Fixes commit 04b76b5aa8b2d1d19066e42dd1 ("Don't error out writing
a multibyte character to an unbuffered stream (bug 17522)").
Noah Goldstein [Wed, 27 Dec 2023 19:29:32 +0000 (11:29 -0800)]
string: Add additional output in test-strchr failure
Seeing occasional failures in `__strchrnul_evex512` that are not
consistently reproducible. Hopefully by adding this the next failure
will provide enough information to debug.
H.J. Lu [Wed, 20 Dec 2023 15:34:42 +0000 (07:34 -0800)]
Add a setjmp/longjmp test between user contexts
Verify that setjmp and longjmp work correctly between user contexts.
Arrange stacks for uctx_func1 and uctx_func2 so that ____longjmp_chk
works when setjmp and longjmp are called from different user contexts.
Paul Eggert [Mon, 1 Jan 2024 18:40:37 +0000 (10:40 -0800)]
Omit regex.c pragmas no longer needed
* posix/regex.c: [!_LIBC && __GNUC_PREREQ (4, 3)]:
Omit GCC pragmas no longer needed when this file is used as part of Gnulib.
-Wold-style-definition no longer needs to be ignored because the regex
code no longer uses old style definitions. -Wtype-limits no longer
needs to be ignored because Gnulib already arranges for it to be
ignored in the C compiler flags. This patch is taken from Gnulib.
Paul Eggert [Mon, 1 Jan 2024 18:35:28 +0000 (10:35 -0800)]
Update copyright dates not handled by scripts/update-copyrights
I've updated copyright dates in glibc for 2024. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files.
H.J. Lu [Fri, 29 Dec 2023 16:43:53 +0000 (08:43 -0800)]
x86/cet: Don't set CET active by default
Not all CET enabled applications and libraries have been properly tested
in CET enabled environments. Some CET enabled applications or libraries
will crash or misbehave when CET is enabled. Don't set CET active by
default so that all applications and libraries will run normally regardless
of whether CET is active or not. Shadow stack can be enabled by
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK
at run-time if shadow stack can be enabled by kernel.
NB: This commit can be reverted if it is OK to enable CET by default for
all applications and libraries.
H.J. Lu [Fri, 29 Dec 2023 16:43:52 +0000 (08:43 -0800)]
x86/cet: Check feature_1 in TCB for active IBT and SHSTK
Initially, IBT and SHSTK are marked as active when CPU supports them
and CET are enabled in glibc. They can be disabled early by tunables
before relocation. Since after relocation, GLRO(dl_x86_cpu_features)
becomes read-only, we can't update GLRO(dl_x86_cpu_features) to mark
IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB
to decide if IBT and SHST are active.
H.J. Lu [Fri, 29 Dec 2023 16:43:51 +0000 (08:43 -0800)]
x86/cet: Enable shadow stack during startup
Previously, CET was enabled by kernel before passing control to user
space and the startup code must disable CET if applications or shared
libraries aren't CET enabled. Since the current kernel only supports
shadow stack and won't enable shadow stack before passing control to
user space, we need to enable shadow stack during startup if the
application and all shared library are shadow stack enabled. There
is no need to disable shadow stack at startup. Shadow stack can only
be enabled in a function which will never return. Otherwise, shadow
stack will underflow at the function return.
1. GL(dl_x86_feature_1) is set to the CET features which are supported
by the processor and are not disabled by the tunable. Only non-zero
features in GL(dl_x86_feature_1) should be enabled. After enabling
shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check
if shadow stack is really enabled.
2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is
safe since RTLD_START never returns.
3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static
executable. Since the start function using ARCH_SETUP_TLS never returns,
it is safe to enable shadow stack in ARCH_SETUP_TLS.