sourceware.org Git - glibc.git/log

Implement C23 cospi

C23 adds various <math.h> function families originally defined in TS
18661-4. Add the cospi functions (cos(pi*x)).

Tested for x86_64 and x86, and with build-many-glibcs.py.

malloc: Optimize small memory clearing for calloc

Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes
on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead
of up to 3 branches. On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Use Linux 6.12 in build-many-glibcs.py

Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).

locale: More strictly implement ISO 8601 for Esperanto locale

Esperanto, as an international language and a bit of a non-locale,
usually defaults to international consensus. In this commit, I make the
Esperanto locale more in line with ISO 8601 by setting the first day as
Monday, and the first week as containing January 4.

Closes: BZ #32323
Signed-off-by: Carmen Bianca BAKKER <carmen@carmenbianca.eu>
Reviewed-by: Mike FABIAN <mfabian@redhat.com>

elf: Consolidate stackinfo.h

And use sane default the generic implementation.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

manual: Describe struct link_map, support link maps with dlinfo

This does not describe how to use RTLD_DI_ORIGIN and l_name
to reconstruct a full path for the an object. The reason
is that I think we should not recommend further use of
RTLD_DI_ORIGIN due to its buffer overflow potential (bug 24298).
This should be covered by another dlinfo extension. It would
also obsolete the need for the dladdr approach to obtain
the file name for the main executable.

Obtaining the lowest address from load segments in program
headers is quite clumsy and should be provided directly
via dlinfo.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Add threaded test of sem_trywait

All the existing glibc tests of sem_trywait are single-threaded. Add
one that calls sem_trywait and sem_post in separate threads.

Tested for x86_64.

Add test of ELF hash collisions

Add tests that the dynamic linker works correctly with symbol names
involving hash collisions, for both choices of hash style (and
--hash-style=both as well). I note that there weren't actually any
previous tests using --hash-style (so tests would only cover the
default linker configuration in that regard). Also test symbol
versions involving hash collisions.

Tested for x86_64.

nptl: Add new test for pthread_spin_trylock

Add a threaded test for pthread_spin_trylock attempting to lock already
acquired spin lock and checking for correct return code.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

malloc: send freed small chunks to smallbin

Large chunks get added to the unsorted bin since
sorting them takes time, for small chunks the
benefit of adding them to the unsorted bin is
non-existant, actually hurting performance.

Splitting and malloc_consolidate still add small
chunks to unsorted, but we can hint the compiler
that that is a relatively rare occurance.
Benchmarking shows this to be consistently good.

Authored-by: k4lizen <k4lizen@proton.me>
Signed-off-by: Aleksa Siriški <sir@tmina.org>

AArch64: Remove zva_128 from memset

Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>

benchtests: Add calloc test

Two new benchmarks related to calloc added:
- bench-calloc-simple
- bench-calloc-thread
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

pthread_getcpuclockid: Add descriptive comment to smoke test

Add a descriptive comment to the tst-pthread-cpuclockid-invalid test and
also drop pthread_getcpuclockid from the TODO-testing list since it now
has full coverage.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Remove nios2-linux-gnu

GCC 15 (e876acab6cdd84bb2b32c98fc69fb0ba29c81153) and binutils
(e7a16d9fd65098045ef5959bf98d990f12314111) both removed all Nios II
support, and the architecture has been EOL'ed by the vendor. The
kernel still has support, but without a proper compiler there
is no much sense in keep it on glibc.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

libio: make _IO_least_marker static

Trivial cleanup to limit _IO_least_marker so that it's clear that it is
unused outside of genops.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

malloc: Avoid func call for tcache quick path in free()

Tcache is an important optimzation to accelerate memory free(), things
within this code path should be kept as simple as possible. This commit
try to remove the function call when free() invokes tcache code path by
inlining _int_free().

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.879
4 threads  | 0.874

The performance data shows it can improve bench-malloc-thread benchmark
by ~12% in both single thread and multi-thread scenario.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

debug: Fix tst-longjmp_chk3 build failure on Hurd

Explicitly include <unistd.h> for _exit and getpid.

math: Add internal roundeven_finite

Some CORE-MATH routines uses roundeven and most of ISA do not have
an specific instruction for the operation.  In this case, the call
will be routed to generic implementation.

However, if the ISA does support round() and ctz() there is a better
alternative (as used by CORE-MATH).

This patch adds such optimization and also enables it on powerpc.
On a power10 it shows the following improvement:

expm1f                      master      patched       improvement
latency                     9.8574       7.0139            28.85%
reciprocal-throughput       4.3742       2.6592            39.21%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>

RISC-V: Use builtin for fma and fmaf

The built-in functions `builtin_{fma, fmaf}` are sufficient to generate correct `fmadd.d`/`fmadd.s` instructions on RISC-V.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

RISC-V: Use builtin for copysign and copysignf

The built-in functions `builtin_{copysign, copysignf}` are sufficient to generate correct `fsgnj.d/fsgnj.s` instructions on RISC-V.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Silence most -Wzero-as-null-pointer-constant diagnostics

Replace 0 by NULL and {0} by {}.

Omit a few cases that aren't so trivial to fix.

Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
Link: <https://software.codidact.com/posts/292718/292759#answer-292759>
Signed-off-by: Alejandro Colomar <alx@kernel.org>

sysdeps: linux: Fix output of LD_SHOW_AUXV=1 for AT_RSEQ_*

The constants themselves were added to elf.h back in 8754a4133e but the
array in _dl_show_auxv wasn't modified accordingly, resulting in the
following output when running LD_SHOW_AUXV=1 /bin/true on recent Linux:

    AT_??? (0x1b): 0x1c
    AT_??? (0x1c): 0x20

With this patch:

    AT_RSEQ_FEATURE_SIZE: 28
    AT_RSEQ_ALIGN:        32

Tested on Linux 6.11 x86_64

Signed-off-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

debug: Wire up tst-longjmp_chk3

The test was added in commit ac8cc9e300a002228eb7e660df3e7b333d9a7414
without all the required Makefile scaffolding. Tweak the test
so that it actually builds (including with dynamic SIGSTKSZ).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

nptl: initialize cpu_id_start prior to rseq registration

When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.

While current kernels don't validate the content of this field on
registration, future ones could.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

math: Fix branch hint for 68d7128942

powerpc64le: ROP Changes for strncpy/ppc-mount

Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

math: Fix non-portability in the computation of signgam in lgammaf

The k>>31 in signgam = 1 - (((k&(k>>31))&1)<<1); is not portable:

* The ISO C standard says "If E1 has a signed type and a negative
  value, the resulting value is implementation-defined." (this is
  still in C23).
* If the int type is larger than 32 bits (e.g. a 64-bit type),
  then k = INT_MAX; line 144 will make k>>31 put 1 in bit 0
  (thus signgam will be -1) while 0 is expected.

Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0
is probably better for 2 reasons:

The signgam expression has more or less a condition on the sign
of fx (the goal of k>>31, which can be dropped with this new
condition). Since fx ≥ 0 should be the most common case, one can
get signgam directly in this case (value 1). And this simplifies
the expression for the other case (fx < 0).

This new condition may be easier/faster to test on the processor
(e.g. by avoiding a load of a constant from the memory).

This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH.

Checked on x86_64-linux-gnu.

malloc: Split _int_free() into 3 sub functions

Split _int_free() into 3 smaller functions for flexible combination:
* _int_free_check -- sanity check for free
* tcache_free -- free memory to tcache (quick path)
* _int_free_chunk -- free memory chunk (slow path)

hurd: Add MAP_NORESERVE mmap flag

This is already the current default behavior, which we will change with
overcommit support addition.

nptl: Add smoke test for pthread_getcpuclockid failure

Exercise the case where an exited thread will cause
pthread_getcpuclockid to fail.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

Add multithreaded test of sem_getvalue

Test coverage of sem_getvalue is fairly limited. Add a test that runs
it on threads on each CPU. For this purpose I adapted
tst-skeleton-thread-affinity.c; it didn't seem very suitable to use
as-is or include directly in a different test doing things per-CPU,
but did seem a suitable starting point (thus sharing
tst-skeleton-affinity.c) for such testing.

Tested for x86_64.

math: Use tanf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use lgammaf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic lgammaf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow usage, and to adapt to make it reentrant.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use erfcf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic erfcf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use erff from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic erff.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

math: Split s_erfF in erff and erfc

So we can eventually replace each implementation.

Reviewed-by: DJ Delorie <dj@redhat.com>

math: Use cbrtf from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic cbrtf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master        patched       improvement
x86_64                       68.6348        36.8908            46.25%
x86_64v2                     67.3418        36.6968            45.51%
x86_64v3                     63.4981        32.7859            48.37%
aarch64                      29.3172        12.1496            58.56%
power10                      18.0845         8.8893            50.85%
powerpc                      18.0859        8.79527            51.37%

reciprocal-throughput         master        patched       improvement
x86_64                       36.4369        13.3565            63.34%
x86_64v2                     37.3611        13.1149            64.90%
x86_64v3                     31.6024        11.2102            64.53%
aarch64                      18.6866        7.3474             60.68%
power10                       9.4758        3.6329             61.66%
powerpc                      9.58896        3.90439            59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

benchtests: Add tanf benchmark

Random inputs in [-pi, pi].

Reviewed-by: DJ Delorie <dj@redhat.com>

benchtests: Add lgammaf benchmark

Random inputs in the range [-20.0,20.0].

Reviewed-by: DJ Delorie <dj@redhat.com>

benchtests: Add erfcf benchmark

It is based on binary64 erfc-inputs, with random inputs in
[0,b=0x1.41bbf6p+3] where b in the smallest number such that
erfcf(b) rounds to 0 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>

benchtests: Add erff benchmark

It is based on binary64 erf-inputs, with random inputs in [0,b=0x1.f5a888p+1]
where b in the smallest number such that erff(b) rounds to 1 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>

benchtests: Add cbrtf benchmark

Based on binary64 benchtests, with random inputs in [1,8].

elf: Handle static PIE with non-zero load address [BZ #31799]

For a static PIE with non-zero load address, its PT_DYNAMIC segment
entries contain the relocated values for the load address in static PIE.
Since static PIE usually doesn't have PT_PHDR segment, use p_vaddr of
the PT_LOAD segment with offset == 0 as the load address in static PIE
and adjust the entries of PT_DYNAMIC segment in static PIE by properly
setting the l_addr field for static PIE. This fixes BZ #31799.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>

x86/string: Use `movsl` instead of `movsd` in strncat [BZ #32344]

The previous patch missed strncat, so fixed that.

Resolves: BZ #32344

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

stdlib: Make getenv thread-safe in more cases

Async-signal-safety is preserved, too.  In fact, getenv is fully
reentrant and can be called from the malloc call in setenv
(if a replacement malloc uses getenv during its initialization).

This is relatively easy to implement because even before this change,
setenv, unsetenv, clearenv, putenv do not deallocate the environment
strings themselves as they are removed from the environment.

The main changes are:

* Use release stores for environment array updates, following
  the usual pattern for safely publishing immutable data
  (in this case, the environment strings).

* Do not deallocate the environment array.  Instead, keep older
  versions around and adopt an  exponential resizing policy.  This
  results in an amortized constant space leak per active environment
  variable, but there already is such a leak for the variable itself
  (and that is even length-dependent, and includes no-longer used
  values).

* Add a seqlock-like mechanism to retry getenv if a concurrent
  unsetenv is observed.  Without that, it is possible that
  getenv returns NULL for a variable that is never unset.  This
  is visible on some AArch64 implementations with the newly
  added stdlib/tst-getenv-unsetenv test case.  The mechanism
  is not a pure seqlock because it tolerates one write from
  unsetenv.  This avoids the need for a second copy of the
  environ array that getenv can read from a signal handler
  that happens to interrupt an unsetenv call.

No manual updates are included with this patch because environ
usage with execve, posix_spawn, system is still not thread-safe
relative unsetenv.  The new process may end up with an environment
that misses entries that were never unset.  This is the same issue
described above for getenv.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

aarch64: Remove non-temporal load/stores from oryon-1's memset

The hardware architects have a new recommendation not to use
non-temporal load/stores for memset. This patch removes this path.
I found there was no difference in the memset speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

aarch64: Remove non-temporal load/stores from oryon-1's memcpy

The hardware architects have a new recommendation not to use
non-temporal load/stores for memcpy. This patch removes this path.
I found there was no difference in the memcpy speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

powerpc64le: _init/_fini file changes for ROP

The ROP instructions were added in ISA 3.1 (ie, Power10), however they
were defined so that if executed on older cpus, they would behave as
nops. This allows us to emit them on older cpus and they'd just be
ignored, but if run on a Power10, then the binary would be ROP protected.

Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.

Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>

mman.h: Fix MAP_HASSEMPHORE typo

BSD's MAP_HASSEMAPHORE is with an A. MAP_HASSEMPHORE is not used in any
Debian software for instance.

misc: remove extra va_end in error_tail (bug 32233)

This is an addendum to commit b7b52b9dec ("error, error_at_line: Add
missing va_end calls"), which added the va_end calls in the callers where
they belong.

intl: avoid alloca for arbitrary sizes (bug 32380)

Use malloc for the copy of the domain name and the category value, which
can both be of arbitrary size.

manual: Add description of AArch64-specific pkey flags

Describe AArch64 specific flags PKEY_DISABLE_READ and PKEY_DISABLE_EXECUTE that
are available on AArch64 systems with enabled Stage 1 permission overlays
feature introduced in Armv8.9 / 9.4 (FEAT_S1POE).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

AArch64: Add support for memory protection keys

This patch adds support for memory protection keys on AArch64 systems with
enabled Stage 1 permission overlays feature introduced in Armv8.9 / 9.4
(FEAT_S1POE) [1].

1. Internal functions "pkey_read" and "pkey_write" to access data
    associated with memory protection keys.
2. Implementation of API functions "pkey_get" and "pkey_set" for
    the AArch64 target.
3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
4. New target-specific test that checks behaviour of pkeys on
    AArch64 targets.
5. This patch also extends existing generic test for pkeys.
6. HWCAP constant for Permission Overlay Extension feature.

To support more accurate mapping of underlying permissions to the
PKEY flags, we introduce additional AArch64-specific flags. The full
list of flags is:

- PKEY_UNRESTRICTED: 0x0 (for completeness)
- PKEY_DISABLE_ACCESS: 0x1 (existing flag)
- PKEY_DISABLE_WRITE: 0x2 (existing flag)
- PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
- PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific)

The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as
it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ.
For this reason mapping between permission bits RWX and "restrictions"
bits awxr (a for disable access, etc) becomes complicated:

- PKEY_DISABLE_ACCESS disables both R and W
- PKEY_DISABLE_{WRITE,READ} disables W and R respectively
- PKEY_DISABLE_EXECUTE disables X

Combinations like the one below are accepted although they are redundant:

- PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE

Reverse mapping tries to retain backward compatibility and ORs
PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and
PKEY_DISABLE_WRITE would be present.

This will break code that compares pkey_get output with == instead
of using bitwise operations. The latter is more correct since PKEY_*
constants are essentially bit flags.

It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.

[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

AArch64: Remove thunderx{,2} memcpy

ThunderX1 and ThunderX2 have been retired for a few years now.
So let's remove the thunderx{,2} specific versions of memcpy.
The performance gain or them was for medium and large sizes
while the generic (aarch64) memcpy will handle just slightly worse.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

Fix femode_t conditionals for arc and or1k

Two of the architecture bits/fenv.h headers define femode_t if
__GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition
__GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit
0175c9e9be5f0b2000859666b6e1ef3696f1123b, but were probably first
developed before it and then not updated to take account of its
changes). This results in failures of the installed headers check for
fenv.h when building with GCC 15 (defaults to -std=gnu23 - we don't
yet have an installed-headers test specifically for C23 mode and don't
yet require a compiler with such a mode for building glibc) together
with a combination of options leaving C23 features enabled, since the
declarations of functions using femode_t use the correct conditions;
see
<https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>.
Fix the conditionals to get <fenv.h> to work correctly in C23 mode
again.

Tested with build-many-glibcs.py (arc-linux-gnu, arch-linux-gnuhf,
or1k-linux-gnu-hard, or1k-linux-gnu-soft).

powerpc64le: Optimized strcat for POWER10

This patch adds an optimized strcat which makes use of the default
strcat function which calls the Power10 strcpy and strlen routines.

powerpc: Improve the inline asm for syscall wrappers

Update the inline asm syscall wrappers to match the newer register constraint
usage in INTERNAL_VSYSCALL_CALL_TYPE. Use the faster mfocrf instruction when
available, rather than the slower mfcr microcoded instruction.

htl: move pthread_attr_init into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_setguardsize into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_setschedparam into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_setscope into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_setstackaddr into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_setstacksize into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_getstack into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_getstackaddr into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl move pthread_attr_getstacksize into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl move pthread_attr_getscope into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl move pthread_attr_getguardsize into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move __pthread_default_attr into libc

Signed-off-by: gfleury <gfleury@disroot.org>

htl: move pthread_attr_destroy into libc.

Signed-off-by: gfleury <gfleury@disroot.org>

stdio-common: Fix C23-ism in formatted output specifier tests [BZ #32360]

Nameless function parameters have only been added to ISO C with the C23
revision of the language standard. Give names to the unused parameters
of the stub 'dladdr' implementation then so as to make compilation happy
with the earlier language definitions, fixing errors such as:

tst-printf-format-skeleton.c:374:9: error: parameter name omitted
374 | dladdr (const void *, Dl_info *)

reported by older compilers.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

elf: handle addition overflow in _dl_find_object_update_1 [BZ #32245]

The remaining_to_add variable can be 0 if (current_used + count) wraps,
This is caught by GCC 14+ on hppa, which determines from there that
target_seg could be be NULL when remaining_to_add is zero, which in
turns causes a -Wstringop-overflow warning:

In file included from ../include/atomic.h:49,
                  from dl-find_object.c:20:
In function '_dlfo_update_init_seg',
     inlined from '_dl_find_object_update_1' at dl-find_object.c:689:30,
     inlined from '_dl_find_object_update' at dl-find_object.c:805:13:
../sysdeps/unix/sysv/linux/hppa/atomic-machine.h:44:4: error: '__atomic_store_4' writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
    44 |    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);                        \
       |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dl-find_object.c:644:3: note: in expansion of macro 'atomic_store_relaxed'
   644 |   atomic_store_relaxed (&seg->size, new_seg_size);
       |   ^~~~~~~~~~~~~~~~~~~~
In function '_dl_find_object_update':
cc1: note: destination object is likely at address zero

In practice, this is not possible as it represent counts of link maps.
Link maps have sizes larger than 1 byte, so the sum of any two link map
counts will always fit within a size_t without wrapping around.

This patch therefore adds a check on remaining_to_add == 0 and tell GCC
that this can not happen using __builtin_unreachable.

Thanks to Andreas Schwab for the investigation.

Closes: BZ #32245
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

x86/string: Use `movsl` instead of `movsd` in strncpy/strncat [BZ #32344]

`ld`, starting at 2.40, emits a warning when using `movsd`. There is
no change to the actual code produced.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

manual: Fix overeager s/int/size_t/ in memory.texi

The change in e3960d1c57e57f33e0e846d615788f4ede73b945 should only have
affected 'int' not 'internally'.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

hppa: Update libm-test-ulps

Update imaginary part of csin.

Signed-off-by: John David Anglin <dave.anglin@bell.net>

Revert "hurd: Stop depending on the default_pager stubs provided by gnumach"

This reverts commit f7f7dd8009275504b211c170caf5bce50fa472ac.

default_pager is actually also used in e.g. xosview.

linux: Add support for getrandom vDSO

Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque
state allocated with mmap using flags specified by the vDSO.

Multiple states are allocated at once, as many as fit into a page, and
these are held in an array of available states to be doled out to each
thread upon first use, and recycled when a thread terminates. As these
states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the
LSB of the opaque state address, falling back to the syscall if there's
reentrancy contention.

Also, _Fork() is handled by blocking signals on opaque state allocation
(so _Fork() always sees a consistent state even if it interrupts a
getrandom() call) and by iterating over the thread stack cache on
reclaim_stack. Each opaque state will be in the free states list
(grnd_alloc.states) or allocated to a running thread.

The cancellation is handled by always using GRND_NONBLOCK flags while
calling the vDSO, and falling back to the cancellable syscall if the
kernel returns EAGAIN (would block). Since getrandom is not defined by
POSIX and cancellation is supported as an extension, the cancellation is
handled as 'may occur' instead of 'shall occur' [1], meaning that if
vDSO does not block (the expected behavior) getrandom will not act as a
cancellation entrypoint. It avoids a pthread_testcancel call on the fast
path (different than 'shall occur' functions, like sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11,
and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are
available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x

io: Add setuid tests for faccessat

Add a new test tst-faccessat-setuid that iterates through real and
effective UID/GID combination and tests the faccessat() interface for
default and AT_EACCESS flags.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

tst-faccessat.c: Port to libsupport

Use libsupport convenience functions and macros instead of the old
test-skeleton. Also add a new xdup() convenience wrapper function.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

support: Add xdup

Add xdup as the error-checking version of dup for test cases.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

LoongArch: Update ulps

Needed for test-float-cacosh, test-float-csin, test-float32-cacosh and
test-float32-csin.

Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

stat.h: Fix missing declaration of struct timespec

When building with e.g. -std=c99 and _ATFILE_SOURCE, stat.h was missing
including bits/types/struct_timespec.h to get the struct timespec
declaration for utimensat.

mach: Fix __xpg_strerror_r on in-range but undefined errors [BZ #32350]

For instance, 1073741906 leads to system 16, subsystem 0 and code 82,
which is in range (max_code is 122), but not defined. Return EINVAL in
that case, like

x86/string: Use `movsl` instead of `movsd` [BZ #32344]

`ld`, starting at 2.40, emits a warning when using `movsd`. There is
no change to the actual code produced.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Rename new tst-sem17 test to tst-sem18

As noted by Adhemerval, we already have a tst-sem17 in nptl.

Tested for x86_64.

Avoid uninitialized result in sem_open when file does not exist

A static analyzer apparently reported an uninitialized use of the
variable result in sem_open in the case where the file is required to
exist but does not exist.

The report appears to be correct; set result to SEM_FAILED in that
case, and add a test for it.

Note: the test passes for me even without the sem_open fix, I guess
because result happens to get value SEM_FAILED (i.e. 0) when
uninitialized.

Tested for x86_64.

nptl: initialize rseq area prior to registration

Per the rseq syscall documentation, 3 fields are required to be
initialized by userspace prior to registration, they are 'cpu_id',
'rseq_cs' and 'flags'. Since we have no guarantee that 'struct pthread'
is cleared on all architectures, explicitly set those 3 fields prior to
registration.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>

s390x: Update ulps

Needed for test-float-cacosh, test-float-csin, test-float32-cacosh and
test-float32-csin.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

elf: avoid jumping over a needed declaration

The declaration of found_other_class could be jumped
over via the goto just above it, but the code jumped
to uses found_other_class. Move the declaration
up a bit to ensure it's properly declared and initialized.

math: Fix log10f on some ABIs

The commit 9247f53219 triggered some regressions on loongarch and
riscv:

math/test-float-log10
math/test-float32-log10

And it is due a wrong sync with CORE-MATH for special 0.0/-0.0
inputs.

Checked on aarch64-linux-gnu and loongarch64-linux-gnu-lp64d.

stdio-common: Add tests for formatted vsnprintf output specifiers

Wire vsnprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted vsprintf output specifiers

Wire vsprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted vfprintf output specifiers

Wire vfprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted vdprintf output specifiers

Wire vdprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted vasprintf output specifiers

Wire vasprintf into test infrastructure for formatted printf output
specifiers.

Owing to mtrace logging these tests take amounts of time to complete
similar to those of corresponding asprintf tests, so set timeouts for
the tests accordingly, with a global default for all the vasprintf
tests, and then individual higher settings for double and long double
tests each.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted vprintf output specifiers

Wire vprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted snprintf output specifiers

Wire snprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted sprintf output specifiers

Wire sprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted fprintf output specifiers

Wire fprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>

stdio-common: Add tests for formatted dprintf output specifiers

Wire dprintf into test infrastructure for formatted printf output
specifiers.

Reviewed-by: DJ Delorie <dj@redhat.com>