nptl: Fix stray process left by tst-cancel7 blocking testing
Fix an issue with commit b74121ae4bc5 ("Update.") and prevent a stray
process from being left behind by tst-cancel7 (and also tst-cancelx7,
which is the same test built with '-fexceptions' additionally supplied
to the compiler), which then blocks remote testing until the process has
been killed by hand.
This test case creates a thread that runs an extra copy of the test via
system(3) and using the '--direct' option so that the test wrapper does
not interfere with this instance. This extra copy executes its business
and calls sigsuspend(2) and then never terminates by itself. Instead it
relies on being killed by the main test process directly via a thread
cancellation request or, should that fail, by issuing SIGKILL either at
the conclusion of 'do_test' or by the test driver via 'do_cleanup' where
the test timeout has been hit or the test driver interrupted.
However if the main test process has been instead killed by a signal,
such as due to incorrect execution, before it had a chance to kill the
extra copy of the test case, then the test wrapper will terminate
without running 'do_cleanup' and consequently the extra copy of the test
case will remain forever in its suspended state, and in the remote case
in particular it means that the remote test wrapper will wait forever
for the SSH command to complete.
This has been observed with the 'alpha-linux-gnu' target, where the main
test process triggers SIGSEGV and the test wrapper correctly records:
Didn't expect signal from child: got `Segmentation fault'
in nptl/tst-cancel7.out and terminates, but then the calling SSH command
continues waiting for the remaining process started in the same session
on the remote target to complete.
Address this problem by also registering 'do_cleanup' via atexit(3),
observing that 'support_delete_temp_files' is registered by the test
wrapper before the test initializing function 'do_prepare' is called and
that we call all the functions registered in the reverse of the order in
which they were registered, so it is safe to refer to 'pidfilename' in
'do_cleanup' invoked by exit(3) because by that time temporary files
have not yet been deleted.
A minor inconvenience is that if 'signal_handler' is invoked in the test
wrapper as a result of SIGALRM rather than SIGINT, then 'do_cleanup'
will be called twice, once as a cleanup handler and again by exit(3).
In reality it is harmless though, because issuing SIGKILL is guarded by
a record lock, so if the first call has succeeded in killing the extra
copy of the test case, then the subsequent call will do nothing.
Move the release of the semaphore used to synchronize between an extra
copy of the test run as a separate process and the main test process
until after the PID file has been locked. It is so that if the cleanup
function gets called by the test driver due to premature termination of
the main test process, then the function does not get at the PID file
before it has been locked and conclude that the extra copy of the test
has already terminated. This won't usually happen due to a relatively
high amount of time required to elapse before timeout triggers in the
test driver, but it will change with the next change.
There is still a small time window remaining with this change in place
where the main test process gets killed for some reason between the
extra copy of the test has been already started by pthread_create(3) and
a successful return from the call to sem_wait(3), in which case the
cleanup function can be reached before PID has been written to the PID
file and the file locked. It seems that with the test case structured
as it is now and PID-based process management we have no means to avoid
it.
Wilco Dijkstra [Wed, 7 Aug 2024 13:49:33 +0000 (14:49 +0100)]
benchtests: Add random memset benchmark
Add a new randomized memset test similar to bench-random-memcpy. Instead of
repeating the same call to memset over and over again, it times a large number
of different inputs. The distribution of memset length and alignment is based
on SPEC2017 (length up to 4096 and alignment up to 64).
Wilco Dijkstra [Wed, 7 Aug 2024 13:43:47 +0000 (14:43 +0100)]
AArch64: Improve generic strlen
Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates. Change final
size computation to avoid increasing latency. On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.
Paul Zimmermann [Thu, 25 Jul 2024 14:38:08 +0000 (16:38 +0200)]
added inputs giving large errors on x86_64 for new C23 functions
These functions are exp10m1, exp2m1, log10p1, log2p1.
Also regenerated ulps on x86_64.
For each format, there are 4 values, one for each rounding mode.
(For the intel96 format, there are 8 values, 4 for Intel hardware,
and 4 for AMD hardware. However, regen-ulps was only run on Intel.
It should be run in a separate patch on a AMD x86_64.) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This commit further clarifies these entries and brings getc and getwc in
line with the descriptions of putc and putwc, removing any performance
claims from them as well. Reviewed-by: Florian Weimer <fweimer@redhat.com>
As for exit, also allows concurrent quick_exit to avoid race
conditions when it is called concurrently. Since it uses the same
internal function as exit, the __exit_lock lock is moved to
__run_exit_handlers. It also solved a potential concurrent when
calling exit and quick_exit concurrently.
The test case 'expected' is expanded to a value larger than the
minimum required by C/POSIX (32 entries) so at_quick_exit() will
require libc to allocate a new block. This makes the test mre likely to
trigger concurrent issues (through free() at __run_exit_handlers)
if quick_exit() interacts with the at_quick_exit list concurrently.
This is also the latest interpretation of the Austin Ticket [1].
Checked on x86_64-linux-gnu.
[1] https://austingroupbugs.net/view.php?id=1845 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Florian Weimer [Thu, 1 Aug 2024 21:31:30 +0000 (23:31 +0200)]
elf: Avoid re-initializing already allocated TLS in dlopen (bug 31717)
The old code used l_init_called as an indicator for whether TLS
initialization was complete. However, it is possible that
TLS for an object is initialized, written to, and then dlopen
for this object is called again, and l_init_called is not true at
this point. Previously, this resulted in TLS being initialized
twice, discarding any interim writes (technically introducing a
use-after-free bug even).
This commit introduces an explicit per-object flag, l_tls_in_slotinfo.
It indicates whether _dl_add_to_slotinfo has been called for this
object. This flag is used to avoid double-initialization of TLS.
In update_tls_slotinfo, the first_static_tls micro-optimization
is removed because preserving the initalization flag for subsequent
use by the second loop for static TLS is a bit complicated, and
another per-object flag does not seem to be worth it. Furthermore,
the l_init_called flag is dropped from the second loop (for static
TLS initialization) because l_need_tls_init on its own prevents
double-initialization.
The remaining l_init_called usage in resize_scopes and update_scopes
is just an optimization due to the use of scope_has_map, so it is
not changed in this commit.
The isupper check ensures that libc.so.6 is TLS is not reverted.
Such a revert happens if l_need_tls_init is not cleared in
_dl_allocate_tls_init for the main_thread case, now that
l_init_called is not checked anymore in update_tls_slotinfo
in elf/dl-open.c.
Reported-by: Jonathon Anderson <janderson@rice.edu> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Andreas Schwab [Mon, 10 Jun 2024 10:19:17 +0000 (12:19 +0200)]
iconv: Fix matching of multi-character transliterations (bug 31859)
Only return __GCONV_INCOMPLETE_INPUT for a partial match when the end of
the input buffer is reached. Otherwise it is a non-match, and other
patterns should be tried.
Samuel Thibault [Wed, 17 Jul 2024 00:03:13 +0000 (02:03 +0200)]
hurd: Fix missing pthread_ compat symbol in libc
5476f8cd2e68 ("htl: move pthread_self info libc.") and 9dfa2562162b ("htl: move pthread_equal into libc") to 1dc0bc8f0748 ("htl: move pthread_attr_setdetachstate into libc")
moved some pthread_ symbols from libpthread.so to libc.so, but missed
adding the compat version like 5476f8cd2e68 ("htl: move pthread_self
info libc.") did: libc already had these symbols as forwards,
but versioned GLIBC_2.21, while the symbols in libpthread.so were
versioned GLIBC_2.12.
To fix running executables built before this, we thus have to add the
GLIBC_2.12 version, otherwise execution fails with e.g.
/usr/lib/i386-gnu/libglib-2.0.so: symbol lookup error: /usr/lib/i386-gnu/libglib-2.0.so: undefined symbol: pthread_attr_setinheritsched, version GLIBC_2.12
H.J. Lu [Wed, 24 Jul 2024 21:05:13 +0000 (14:05 -0700)]
linux: Update the mremap C implementation [BZ #31968]
Update the mremap C implementation to support the optional argument for
MREMAP_DONTUNMAP added in Linux 5.7 since it may not always be correct
to implement a variadic function as a non-variadic function on all Linux
targets. Return MAP_FAILED and set errno to EINVAL for unknown flag bits.
This fixes BZ #31968.
Note: A test must be added when a new flag bit is introduced.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Florian Weimer [Thu, 27 Jun 2024 14:26:56 +0000 (16:26 +0200)]
Enhanced test coverage for strncmp, wcsncmp
Add string/test-strncmp-nonarray and
wcsmbs/test-wcsncmp-nonarray.
This is the test that uncovered bug 31934. Test run time
is more than one minute on a fairly current system, so turn
these into xtests that do not run automatically.
Thought about doing `exit` as well since its only called once per
process, but since its easy to imagine a hot path leading into
`exit(0)`, its less clear if its profitable.
Even if C/POSIX standard states that exit is not formally thread-unsafe,
calling it more than once is UB. The glibc already supports
it for the single-thread, and both elf/nodelete2.c and tst-rseq-disable.c
call exit from a DSO destructor (which is called by _dl_fini, registered
at program startup with __cxa_atexit).
However, there are still race issues when it is called more than once
concurrently by multiple threads. A recent Rust PR triggered this
issue [1], which resulted in an Austin Group ask for clarification [2].
Besides it, there is a discussion to make concurrent calling not UB [3],
wtih a defined semantic where any remaining callers block until the first
call to exit has finished (reentrant calls, leaving through longjmp, and
exceptions are still undefined).
For glibc, at least reentrant calls are required to be supported to avoid
changing the current behaviour. This requires locking using a recursive
lock, where any exit called by atexit() handlers resumes at the point of
the current handler (thus avoiding calling the current handle multiple
times).
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://github.com/rust-lang/rust/issues/126600
[2] https://austingroupbugs.net/view.php?id=1845
[3] https://www.openwall.com/lists/libc-coord/2024/07/24/4 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* mseal for all architectures.
* map_shadow_stack for x32.
* Replace sync_file_range with sync_file_range2 for csky (which
fixes a broken sync_file_range usage).
Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py. Reviewed-by: Florian Weimer <fweimer@redhat.com>
Michael Karcher [Sun, 28 Jul 2024 13:30:57 +0000 (15:30 +0200)]
Mitigation for "clone on sparc might fail with -EFAULT for no valid reason" (bz 31394)
It seems the kernel can not deal with uncommitted stack space in the area intended
for the register window when executing the clone() system call. So create a nested
frame (proxy for the kernel frame) and flush it from the processor to memory to
force committing pages to the stack before invoking the system call.
Bug: https://www.mail-archive.com/debian-glibc@lists.debian.org/msg62592.html
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
See-also: https://lore.kernel.org/sparclinux/62f9be9d-a086-4134-9a9f-5df8822708af@mkarcher.dialup.fu-berlin.de/ Signed-off-by: Michael Karcher <sourceware-bugzilla@mkarcher.dialup.fu-berlin.de> Reviewed-by: DJ Delorie <dj@redhat.com>
manual: make setrlimit() description less ambiguous
The existing description for setrlimit() has some ambiguity. It could be
understood to have the semantics of getrlimit(), i.e., the limits from the
process are stored in the provided rlp pointer.
Make the description more explicit that rlp are the input values, and that
the limits of the process is changed with this function.
The manual entry for `putc' described what "most systems" do instead of
describing the glibc implementation and its guarantees. This commit
fixes that by warning that putc may be implemented as a macro that
double-evaluates `stream', and removing the performance claim.
Even though the current `putc' implementation does not double-evaluate
`stream', offering this obscure guarantee as an extension to what
POSIX allows does not seem very useful.
The entry for `putwc' is also edited to bring it in line with `putc'. Reviewed-by: Florian Weimer <fweimer@redhat.com>
This helps compilers split the codegen for setting up the arguments
(`__expression`, `__filename`, etc...) from the potentially hot cold
where the `assert` is to a presumably cold region on the assertion
failure path.
Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Sam James <sam@gentoo.org>
stdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]
Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer
than INT_MAX (bug 27650)") and add a test case for the issue, inspired
by the reproducer provided with the bug report.
This has been verified to succeed as from the commit referred and fail
beforehand.
As the test requires 2GiB of data to be passed around its performance
has been evaluated using a choice of systems and the execution time
determined to be respectively in the range of 9s for POWER9@2.166GHz,
24s for FU740@1.2GHz, and 40s for 74Kf@950MHz. As this is on the verge
of and beyond the default timeout it has been increased by the factor of
8. Regardless, following recent practice the test has been added to the
standard rather than extended set.
Add a FAIL test failure helper analogous to FAIL_RET, that does not
cause the current function to return, providing a standardized way to
report a test failure with a message supplied while permitting the
caller to continue executing, for further reporting, cleaning up, etc.
Update existing test cases that provide a conflicting definition of FAIL
by removing the local FAIL definition and then as follows:
- tst-fortify-syslog: provide a meaningful message in addition to the
file name already added by <support/check.h>; 'support_record_failure'
is already called by 'support_print_failure_impl' invoked by the new
FAIL test failure helper.
- tst-ctype: no update to FAIL calls required, with the name of the file
and the line number within of the failure site additionally included
by the new FAIL test failure helper, and error counting plus count
reporting upon test program termination also already provided by
'support_record_failure' and 'support_report_failure' respectively,
called by 'support_print_failure_impl' and 'adjust_exit_status' also
respectively. However in a number of places 'printf' is called and
the error count adjusted by hand, so update these places to make use
of FAIL instead. And last but not least adjust the final summary just
to report completion, with any error count following as reported by
the test driver.
- test-tgmath2: no update to FAIL calls required, with the name of the
file of the failure site additionally included by the new FAIL test
failure helper. Also there is no need to track the return status by
hand as any call to FAIL will eventually cause the test case to return
an unsuccesful exit status regardless of the return status from the
test function, via a call to 'adjust_exit_status' made by the test
driver.
posix: Use <support/check.h> facilities in tst-truncate and tst-truncate64
Remove local FAIL macro in favor to FAIL_RET from <support/check.h>,
which provides equivalent reporting, with the name of the file of the
failure site additionally included, for the tst-truncate-common core
shared between the tst-truncate and tst-truncate64 tests.
nptl: Use <support/check.h> facilities in tst-setuid3
Remove local FAIL macro in favor to FAIL_EXIT1 from <support/check.h>,
which provides equivalent reporting, with the name of the file and the
line number within of the failure site additionally included. Remove
FAIL_ERR altogether and include ": %m" explicitly with the format string
supplied to FAIL_EXIT1 as there seems little value to have a separate
macro just for this.
1. Add shadow stack support to x32 signal.
2. Use the 64-bit map_shadow_stack syscall for x32.
3. Set up shadow stack for x32.
Add the map_shadow_stack system call to <fixup-asm-unistd.h> and regenerate
arch-syscall.h. Tested on Intel Tiger Lake with CET enabled x32. There
are no regressions with CET enabled x86-64. There are no changes in CET
enabled x86-64 _dl_start_user.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
H.J. Lu [Tue, 23 Jul 2024 00:47:21 +0000 (17:47 -0700)]
x86-64: Remove sysdeps/x86_64/x32/dl-machine.h
Remove sysdeps/x86_64/x32/dl-machine.h by folding x32 ARCH_LA_PLTENTER,
ARCH_LA_PLTEXIT and RTLD_START into sysdeps/x86_64/dl-machine.h. There
are no regressions on x86-64 nor x32. There are no changes in x86-64
_dl_start_user. On x32, _dl_start_user changes are
resolv: Do not wait for non-existing second DNS response after error (bug 30081)
In single-request mode, there is no second response after an error
because the second query has not been sent yet. Waiting for it
introduces an unnecessary timeout.
Miguel Martín [Tue, 16 Jul 2024 15:14:57 +0000 (17:14 +0200)]
malloc: add multi-threaded tests for aligned_alloc/calloc/malloc
Improve aligned_alloc/calloc/malloc test coverage by adding
multi-threaded tests with random memory allocations and with/without
cross-thread memory deallocations.
Perform a number of memory allocation calls with random sizes limited
to 0xffff.
Use the existing DSO ('malloc/tst-aligned_alloc-lib.c') to randomize
allocator selection.
The multi-threaded allocation/deallocation is staged as described below:
- Stage 1: Half of the threads will be allocating memory and the
other half will be waiting for them to finish the allocation.
- Stage 2: Half of the threads will be allocating memory and the
other half will be deallocating memory.
- Stage 3: Half of the threads will be deallocating memory and the
second half waiting on them to finish.
Add 'malloc/tst-aligned-alloc-random-thread.c' where each thread will
deallocate only the memory that was previously allocated by itself.
Add 'malloc/tst-aligned-alloc-random-thread-cross.c' where each thread
will deallocate memory that was previously allocated by another thread.
The intention is to be able to utilize existing malloc testing to ensure
that similar allocation APIs are also exposed to the same rigors. Reviewed-by: Arjun Shankar <arjun@redhat.com>
Miguel Martín [Tue, 16 Jul 2024 15:14:56 +0000 (17:14 +0200)]
malloc: avoid global locks in tst-aligned_alloc-lib.c
Make sure the DSO used by aligned_alloc/calloc/malloc tests does not get
a global lock on multithreaded tests. Reviewed-by: Arjun Shankar <arjun@redhat.com>
elf: Fix localplt.awk for DT_RELR-enabled builds (BZ 31978)
For each input readelf output, localplt.awk parses each 'Relocation
section' entry, checks its offset against the dynamic section entry, and
saves each DT_JMPREL, DT_RELA, and DT_REL offset value it finds. After
all lines are read, the script checks if any segment offset differed
from 0, meaning at least one 'Relocation section' was matched.
However, if the shared object was built with RELR support and the static
linker could place all the relocation on DT_RELR, there would be no
DT_JMPREL, DT_RELA, and DT_REL entries; only a DT_RELR.
For the current three ABIs that support (aarch64, x86, and powerpc64),
the powerpc64 ld.so shows the behavior above. Both x86_64 and aarch64
show extra relocations on '.rela.dyn', which makes the script check to
succeed.
This patch fixes by handling DT_RELR, where the offset is checked
against the dynamic section entries and if the shared object contains an
entry it means that there are no extra PLT entries (since all
relocations are relative).
It fixes the elf/check-localplt failure on powerpc.
Checked with a build/check for aarch64-linux-gnu, x86_64-linux-gnu,
i686-linux-gnu, arm-linux-gnueabihf, s390x-linux-gnu, powerpc-linux-gnu,
powerpc64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996)
The powerpc pkey_get/pkey_set support was only added for 64-bit [1],
and tst-pkey only checks if the support was present with pkey_alloc
(which does not fail on powerpc32, at least running a 64-bit kernel).
Samuel Thibault [Wed, 17 Jul 2024 12:56:14 +0000 (14:56 +0200)]
htl: Fix __pthread_init_thread declaration and definition
0e75c4a4634f ("hurd: Fix pthread_self() without libpthread") added a
declaration for ___pthread_init_thread instead of __pthread_init_thread,
and missed defining the external hidden symbol.
Samuel Thibault [Wed, 17 Jul 2024 12:06:25 +0000 (14:06 +0200)]
hurd: Fix pthread_self() without libpthread
5476f8cd2e68 ("htl: move pthread_self info libc.") moved the htl
pthread_self() function from libpthread to libc, replacing the previous libc
stub that just returns 0. And 53da64d1cf36 ("htl: Initialize ___pthread_self
early") added initialization code which is needed before being able to
call pthread_self. It is currently in libpthread, and thus never called
before programs can call pthread_self from libc, which then segfaults
when accessing _pthread_self()->thread.
This moves the initialization to libc itself, as initialized variables, so
pthread_self can always be called fine.
LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic
In _dl_tlsdesc_dynamic, there are three 'addi.d sp, sp, -size'
instructions to allocate stack size for Float/LSX/LASX registers.
Every 'addi.d sp, sp, -size' needs a cfi_adjust_cfa_offset because
of sp is used to compute CFA. But only one 'addi.d sp, sp, -size'
will be run according to HWCAP value. And all cfi_adjust_cfa_offset
will be executed in stack unwinding, it result in incorrect CFA.
Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic,
_dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx.
Conflicting cfi instructions can be distributed to the three functions.
And cfi instructions can correspond to stack down instructions.