Samuel Thibault [Tue, 14 Nov 2023 01:03:35 +0000 (02:03 +0100)]
hurd: Make _hurd_intr_rpc_mach_msg avoid returning MACH_SEND_INTERRUPTED
When the given options do not include MACH_SEND_INTERRUPT,
_hurd_intr_rpc_mach_msg (aka mach_msg) is not supposed to return
MACH_SEND_INTERRUPTED. In such a case we thus have to retry sending the
message.
This was observed to fix various occurrences of spurious
"(ipc/send) interrupted" errors when running haskell programs.
Wilco Dijkstra [Thu, 26 Oct 2023 16:30:36 +0000 (17:30 +0100)]
AArch64: Remove Falkor memcpy
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
Wilco Dijkstra [Thu, 26 Oct 2023 16:07:21 +0000 (17:07 +0100)]
AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Florian Weimer [Wed, 8 Nov 2023 14:18:02 +0000 (15:18 +0100)]
stdlib: Avoid element self-comparisons in qsort
This improves compatibility with applications which assume that qsort
does not invoke the comparison function with equal pointer arguments.
The newly introduced branches should be predictable, as leading to a
call to the comparison function. If the prediction fails, we avoid
calling the function.
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to a
anonymous virtual memory area might prevent that area from being
merged with adjacent virtual memory areas.
The kernel will potentially merge both mappings resulting in only one
segment of size 0x800000. If the segment is names with
PR_SET_VMA_ANON_NAME with different names, it results in two mappings.
Although this will unlikely be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page will result in
a PROT_NONE segment, similar to the alignment requirement for the arena
block), it still might prevent the mmap memory allocated for detail
malloc.
There is also another potential scalability issue, where the prctl
requires
to take the mmap global lock which is still not fully fixed in Linux
[1] (for pthread stacks and arenas, it is mitigated by the stack
cached and the arena reuse).
So this patch disables anonymous mapping annotations as default and
add a new tunable, glibc.mem.decorate_maps, can be used to enable
it.
[1] https://lwn.net/Articles/906852/
Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
Linux 4.5 removed thread stack annotations due to the complexity of
computing them [1], and Linux added PR_SET_VMA_ANON_NAME on 5.17
as a way to name anonymous virtual memory areas.
This patch adds decoration on the stack created and used by
pthread_create, for glibc crated thread stack the /proc/self/maps will
now show:
[anon: glibc: pthread stack: <tid>]
And for user-provided stacks:
[anon: glibc: pthread user stack: <tid>]
The guard page is not decorated, and the mapping name is cleared when
the thread finishes its execution (so the cached stack does not have any
name associated).
Checked on x86_64-linux-gnu aarch64 aarch64-linux-gnu.
Linux 5.17 added support to naming anonymous virtual memory areas
through the prctl syscall. The __set_vma_name is a wrapper to avoid
optimizing the prctl call if the kernel does not support it.
If the kernel does not support PR_SET_VMA_ANON_NAME, prctl returns
EINVAL. And it also returns the same error for an invalid argument.
Since it is an internal-only API, it assumes well-formatted input:
aligned START, with (START, START+LEN) being a valid memory range,
and NAME with a limit of 80 characters without an invalid one
("\\`$[]"). Reviewed-by: DJ Delorie <dj@redhat.com>
sysdeps: sem_open: Clear O_CREAT when semaphore file is expected to exist [BZ #30789]
When invoking sem_open with O_CREAT as one of its flags, we'll end up
in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
& O_EXCL) == 0)", which means that we don't expect the semaphore file
to exist.
In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
| O_CLOEXEC" and there's an attempt to open(2) the file, which will
likely fail because it won't exist. After that first (expected)
failure, some cleanup is done and we go back to the label "try_again",
which lives in the first part of the aforementioned "if".
The problem is that, in that part of the code, we expect the semaphore
file to exist, and as such O_CREAT (this time the flag we pass to
open(2)) needs to be cleaned from open_flags, otherwise we'll see
another failure (this time unexpected) when trying to open the file,
which will lead the call to sem_open to fail as well.
This can cause very strange bugs, especially with OpenMPI, which makes
extensive use of semaphores.
Fix the bug by simplifying the logic when choosing open(2) flags and
making sure O_CREAT is not set when the semaphore file is expected to
exist.
A regression test for this issue would require a complex and cpu time
consuming logic, since to trigger the wrong code path is not
straightforward due the racy condition. There is a somewhat reliable
reproducer in the bug, but it requires using OpenMPI.
This resolves BZ #30789.
See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
Signed-off-by: Sergio Durigan Junior <sergiodj@sergiodj.net> Co-Authored-By: Simon Chopin <simon.chopin@canonical.com> Co-Authored-By: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Fixes: 533deafbdf189f5fbb280c28562dd43ace2f4b0f ("Use O_CLOEXEC in more places (BZ #15722)")
Maxim Kuvyrkov [Fri, 19 May 2023 08:28:21 +0000 (08:28 +0000)]
Format test results closer to what DejaGnu does
The years of dealing with Binutils, GCC and GDB test results
made the community create good tools for comparison and analysis
of DejaGnu test results. This change allows to use those tools
for Glibc's test results as well.
The motivation for this change is Linaro's pre-commit testers,
which use a modified version of GCC's validate_failures.py
to create test xfail lists with baseline failures and known
flaky tests. See below links for an example xfails file (only
one link is supposed to work at any given time):
- https://ci.linaro.org/job/tcwg_glibc_check--master-arm-build/lastSuccessfulBuild/artifact/artifacts/artifacts.precommit/sumfiles/xfails.xfail/*view*/
- https://ci.linaro.org/job/tcwg_glibc_check--master-arm-build/lastSuccessfulBuild/artifact/artifacts/sumfiles/xfails.xfail/*view*/
Specifacally, this patch changes format of glibc's .sum files from ...
<cut>
FAIL: elf/test1
PASS: string/test2
</cut>
... to ...
<cut>
=== glibc tests ===
Running elf ...
FAIL: elf/test1
Running string ...
PASS: string/test2
</cut>.
And output of "make check" from ...
<cut>
FAIL: elf/test1
</cut>
... to ...
<cut>
FAIL: elf/test1
=== Summary of results ===
1 FAIL
1 PASS
</cut>.
Wilco Dijkstra [Tue, 24 Oct 2023 12:51:07 +0000 (13:51 +0100)]
AArch64: Cleanup ifuncs
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Arjun Shankar [Tue, 31 Oct 2023 09:44:32 +0000 (10:44 +0100)]
Use correct subdir when building tst-rfc3484* for mach and arm
Commit 7f602256ab5b85db1dbfb5f40bd109c4b37b68c8 moved the tst-rfc3484*
tests from posix/ to nss/, but didn't correct references to point to
their new subdir when building for mach and arm. This commit fixes
that.
This patch adds a qsort and qsort_r to trigger the worst case
scenario for the quicksort (which glibc current lacks coverage).
The test is done with random input, dfferent internal types (uint8_t,
uint16_t, uint32_t, uint64_t, large size), and with
different set of element numbers.
Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
stdlib: Remove use of mergesort on qsort (BZ 21719)
This patch removes the mergesort optimization on qsort implementation
and uses the introsort instead. The mergesort implementation has some
issues:
- It is as-safe only for certain types sizes (if total size is less
than 1 KB with large element sizes also forcing memory allocation)
which contradicts the function documentation. Although not required
by the C standard, it is preferable and doable to have an O(1) space
implementation.
- The malloc for certain element size and element number adds
arbitrary latency (might even be worse if malloc is interposed).
- To avoid trigger swap from memory allocation the implementation
relies on system information that might be virtualized (for instance
VMs with overcommit memory) which might lead to potentially use of
swap even if system advertise more memory than actually has. The
check also have the downside of issuing syscalls where none is
expected (although only once per execution).
- The mergesort is suboptimal on an already sorted array (BZ#21719).
The introsort implementation is already optimized to use constant extra
space (due to the limit of total number of elements from maximum VM
size) and thus can be used to avoid the malloc usage issues.
Resulting performance is slower due the usage of qsort, specially in the
worst-case scenario (partialy or sorted arrays) and due the fact
mergesort uses a slight improved swap operations.
This change also renders the BZ#21719 fix unrequired (since it is meant
to fix the sorted input performance degradation for mergesort). The
manual is also updated to indicate the function is now async-cancel
safe.
Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This patch makes the quicksort implementation to acts as introsort, to
avoid worse-case performance (and thus making it O(nlog n)). It switch
to heapsort when the depth level reaches 2*log2(total elements). The
heapsort is a textbook implementation.
Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The optimization takes in consideration both the most common elements
are either 32 or 64 bit in size and inputs are aligned to the word
boundary. This is similar to what msort does.
For large buffer the swap operation uses memcpy/mempcpy with a
small fixed size buffer (so compiler might inline the operations).
Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
All the crypt related functions, cryptographic algorithms, and
make requirements are removed, with only the exception of md5
implementation which is moved to locale folder since it is
required by localedef for integrity protection (libc's
locale-reading code does not check these, but localedef does
generate them).
Besides thec code itself, both internal documentation and the
manual is also adjusted. This allows to remove both --enable-crypt
and --enable-nss-crypt configure options.
Checked with a build for all affected ABIs.
Co-authored-by: Zack Weinberg <zack@owlfolio.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The libcrypt was maked to be phase out on 2.38, and a better project
already exist that provide both compatibility and better API
(libxcrypt). The sparc optimizations add the burden to extra
build-many-glibcs.py configurations.
Checked on sparc64 and sparcv9. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Wilco Dijkstra [Tue, 17 Oct 2023 15:54:21 +0000 (16:54 +0100)]
AArch64: Add support for MOPS memcpy/memmove/memset
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Arjun Shankar [Mon, 2 Oct 2023 12:55:28 +0000 (14:55 +0200)]
Move getnameinfo from 'inet' to 'nss'
getnameinfo is an entry points for nss functionality. This commit moves
it from the 'inet' subdirectory to 'nss'. The corresponding Versions
entry is also moved from 'posix' into 'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:27 +0000 (14:55 +0200)]
Move getaddrinfo from 'posix' into 'nss'
getaddrinfo is an entry point for nss functionality. This commit moves
it from 'sysdeps/posix' to 'nss', gets rid of the stub in 'posix', and
moves all associated tests as well. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:26 +0000 (14:55 +0200)]
Move 'services' routines from 'inet' into 'nss'
The getservby* and getservent* routines are entry points for nss
functionality. This commit moves them from the 'inet' subdirectory to
'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:25 +0000 (14:55 +0200)]
Move 'rpc' routines from 'inet' into 'nss'
The getrpcby* and getrpcent* routines are entry points for nss
functionality. This commit moves them from the 'inet' subdirectory to
'nss'. The Versions entries for these routines along with a test,
located in the 'sunrpc' subdirectory, are also moved into 'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:24 +0000 (14:55 +0200)]
Move 'protocols' routines from 'inet' into 'nss'
The getprotoby* and getprotoent* routines are entry points for nss
functionality. This commit moves them from the 'inet' subdirectory to
'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:23 +0000 (14:55 +0200)]
Move 'networks' routines from 'inet' into 'nss'
The getnetby* and getnetent* routines are entry points for nss
functionality. This commit moves them from the 'inet' subdirectory to
'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:22 +0000 (14:55 +0200)]
Move 'netgroup' routines from 'inet' into 'nss'
These netgroup routines are entry points for nss functionality.
This commit moves them along with netgroup.h from the 'inet'
subdirectory to 'nss', and adjusts any references accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:21 +0000 (14:55 +0200)]
Move 'hosts' routines from 'inet' into 'nss'
The gethostby* and gethostent* routines are entry points for nss
functionality. This commit moves them from the 'inet' subdirectory to
'nss'. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:20 +0000 (14:55 +0200)]
Move 'ethers' routines from 'inet' into 'nss'
ether_hostton and ether_ntohost are entry points for nss functionality.
This commit moves them from the 'inet' subdirectory to 'nss', and
adjusts any references accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:19 +0000 (14:55 +0200)]
Move 'aliases' routines from 'inet' into 'nss'
The aliases routines are entry points for nss functionality. This
commit moves aliases.h and the aliases routines from the 'inet'
subdirectory to 'nss', and adjusts any external references. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:18 +0000 (14:55 +0200)]
Remove 'shadow' and merge into 'nss'
The majority of shadow routines are entry points for nss functionality.
This commit removes the 'shadow' subdirectory and moves all
functionality and tests to 'nss'. References to shadow/ are accordingly
changed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:17 +0000 (14:55 +0200)]
Remove 'pwd' and merge into 'nss'
The majority of pwd routines are entry points for nss functionality.
This commit removes the 'pwd' subdirectory and moves all functionality
and tests to 'nss'. References to pwd/ are accordingly changed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:16 +0000 (14:55 +0200)]
Remove 'gshadow' and merge into 'nss'
The majority of gshadow routines are entry points for nss functionality.
This commit removes the 'gshadow' subdirectory and moves all
functionality and tests to 'nss'. References to gshadow/ are
accordingly changed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:15 +0000 (14:55 +0200)]
Remove 'grp' and merge into 'nss' and 'posix'
The majority of grp routines are entry points for nss functionality.
This commit removes the 'grp' subdirectory and moves all nss-relevant
functionality and all tests to 'nss', and the 'setgroups' stub into
'posix' (alongside the 'getgroups' stub). References to grp/ are
accordingly changed. In addition, compat-initgroups.c, a fallback
implementation of initgroups is renamed to initgroups-fallback.c so that
the build system does not confuse it for nss_compat/compat-initgroups.c.
Build time improves very slightly; e.g. down from an average of 45.5s to
44.5s on an 8-thread mobile x86_64 CPU. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Simon Chopin [Thu, 5 Oct 2023 12:54:31 +0000 (14:54 +0200)]
test-container: disable ld.so system cache on DSO detection
When building the testroot, the script runs the newly built ld.so on a
couple of binaries in order to copy over any additional libraries
needed. However, if the dependencies are found in the system cache, it
will be copied over using that path.
This is problematic if the system ld.so and the one built don't have the
exact same search configuration. We encountered this in Ubuntu, where we
build a variant of libc with -fno-omit-frame-pointer for accurate
performance profiling.
This variant is built using a non-standard slibdir to be able to be
co-installed with the default library (e.g. slibdir = /lib/libc6-prof).
Since we have /lib pointing to /usr/lib, any additional dependency
should still be reachable via /usr. However, resolving via the cache
might result in the additional DSOs being copied into $testroot/lib, out
of the search path in the container.
The problem has been triggered by 1d5024f4f052c12e404d42d3b5bfe9c3e9fd27c4
("support: Build with exceptions and asynchronous unwind tables [BZ #30587]")
which introduced a dependency on libgcc_s.so.1 under some circumstances.
Stefan Liebler [Thu, 19 Oct 2023 12:35:59 +0000 (14:35 +0200)]
tst-spawn-cgroup.c: Fix argument order of UNSUPPORTED message.
The arguments for "expected" and "got" are mismatched. Furthermore
this patch is dumping both values as hex. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Florian Weimer <fweimer@redhat.com>
Stefan Liebler [Wed, 18 Oct 2023 13:08:40 +0000 (15:08 +0200)]
s390: Fix undefined behaviour in feenableexcept, fedisableexcept [BZ #30960]
If feenableexcept or fedisableexcept gets excepts=FE_INVALID=0x80
as input, we have a signed left shift: 0x80 << 24 which is not
representable as int and thus is undefined behaviour according to
C standard.
This patch casts excepts as unsigned int before shifting, which is
defined.
For me, the observed undefined behaviour is that the shift is done
with "unsigned"-instructions, which is exactly what we want.
Furthermore, I don't get any exception-flags.
After the fix, the code is using the same instruction sequence as
before.
The commit changes the order of ELF destructor calls too much relative
to what applications expect or can handle. In particular, during
process exit and _dl_fini, after the revert commit, we no longer call
the destructors of the main program first; that only happens after
some dlopen'ed objects have been destructed. This robs applications
of an opportunity to influence destructor order by calling dlclose
explicitly from the main program's ELF destructors. A couple of
different approaches involving reverse constructor order were tried,
and none of them worked really well. It seems we need to keep the
dependency sorting in _dl_fini.
There is also an ambiguity regarding nested dlopen calls from ELF
constructors: Should those destructors run before or after the object
that called dlopen? Commit 6985865bc3ad5b2314 used reverse order
of the start of ELF constructor calls for destructors, but arguably
using completion of constructors is more correct. However, that alone
is not sufficient to address application compatibility issues (it
does not change _dl_fini ordering at all).
This patch implements comprehensive tests for strlcat/wcslcat
functions. Tests are mostly derived from strncat test suites
and modified to incorporate strlcat/wcslcat specifications.
This patch implements comprehensive tests for strlcpy/wcslcpy
functions. Tests are mostly derived from strncpy test suites
and modified to incorporate strlcpy/wcslcpy specifications.
Joseph Myers [Mon, 16 Oct 2023 13:19:26 +0000 (13:19 +0000)]
Add SCM_SECURITY, SCM_PIDFD to bits/socket.h
Linux 6.5 adds a constant SCM_PIDFD (recall that the non-uapi
linux/socket.h, where this constant is added, is in fact a header
providing many constants that are part of the kernel/userspace
interface). This shows up that SCM_SECURITY, from the same set of
definitions and added in Linux 2.6.17, is also missing from glibc,
although glibc has the first two constants from this set, SCM_RIGHTS
and SCM_CREDENTIALS; add both missing constants to glibc.
Joseph Myers [Mon, 16 Oct 2023 13:18:51 +0000 (13:18 +0000)]
Add AT_HANDLE_FID from Linux 6.5 to bits/fcntl-linux.h
Linux 6.5 adds a constant AT_HANDLE_FID; add it to glibc. Because
this is a flag for the function name_to_handle_at declared in
bits/fcntl-linux.h, put the flag there rather than alongside other
AT_* flags in (OS-independent) fcntl.h.
Andreas Schwab [Sun, 8 Oct 2023 16:23:30 +0000 (18:23 +0200)]
Avoid maybe-uninitialized warning in __kernel_rem_pio2
With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized
warning:
../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2':
../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized]
364 | y[0] = fq[0]; y[1] = fq[1]; y[2] = fw;
| ~~^~~
This is similar to the warning that is suppressed in the other branch of
the switch. Help the compiler knowing that the variable is always
initialized, which also makes the suppression obsolete.
Stefan Liebler [Thu, 28 Sep 2023 10:50:40 +0000 (12:50 +0200)]
Fix WAIT_FOR_DEBUGGER for container tests.
For container tests, gdb needs to set the sysroot to the corresponding
testroot.root directory. The assumption was that PIDs < 3 means that
we are running within a container.
Starting with commit 2fe64148a81f0d78050c302f34a6853d21f7cae4
"Allow for unpriviledged nested containers", the default is to use
the PID namespace of the parent. Thus support_test_main.c does not
recognize our container anymore.
This patch now assumes that we are running inside a container if
test-container.c has set PID_OUTSIDE_CONTAINER and always uses this
PID independent of having a new PID namespace or not. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Andreas Schwab [Wed, 11 Oct 2023 14:22:16 +0000 (16:22 +0200)]
stdlib: fix grouping verification with multi-byte thousands separator (bug 30964)
The grouping verification only worked for a single-byte thousands
separator. With a multi-byte separator it returned as if no separators
were present. The actual parsing in str_to_mpn will then go wrong when
there are multiple adjacent multi-byte separators in the number.
DJ Delorie [Thu, 21 Sep 2023 21:24:05 +0000 (17:24 -0400)]
build-many-glibcs: Check for required system tools
Notes for future devs:
* Add tools as you find they're needed, with version 0,0
* Bump version when you find an old tool that doesn't work
* Don't add a version just because you know it works
x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10
This commit refactors `strrchr-evex` and `strrchr-evex512` to use a
common implementation: `strrchr-evex-base.S`.
The motivation is `strrchr-evex` needed to be refactored to not use
64-bit masked registers in preperation for AVX10.
Once vec-width masked register combining was removed, the EVEX and
EVEX512 implementations can easily be implemented in the same file
without any major overhead.
The net result is performance improvements (measured on TGL) for both
`strrchr-evex` and `strrchr-evex512`. Although, note there are some
regressions in the test suite and it may be many of the cases that
make the total-geomean of improvement/regression across bench-strrchr
are cold. The point of the performance measurement is to show there
are no major regressions, but the primary motivation is preperation
for AVX10.
Benchmarks where taken on TGL:
https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html
EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74
EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87
Joe Ramsay [Wed, 4 Oct 2023 09:38:57 +0000 (10:38 +0100)]
aarch64: Optimise vecmath logs
* Transpose table layout for improved memory access
* Use half-vector special comparisons for AdvSIMD
* Improve register use near special-case branches
- Due to the presence of a function call, return value would get
mov-d out of x0 in order to facilitate PCS. By moving the final
computation after the branch this can be avoided
Also change SVE routines to use overloaded intrinsics for readability.
Joe Ramsay [Wed, 4 Oct 2023 09:40:04 +0000 (10:40 +0100)]
aarch64: Cosmetic change in SVE exp routines
Use overloaded intrinsics for readability. Codegen does not
change, however while we're bringing the routines up-to-date with
recent improvements to other routines in AOR it is worth copying
this change over as well.
Joe Ramsay [Thu, 5 Oct 2023 09:31:38 +0000 (10:31 +0100)]
aarch64: Improve vecmath sin routines
* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch
Volker Weißmann [Tue, 3 Oct 2023 17:18:44 +0000 (19:18 +0200)]
Fix FORTIFY_SOURCE false positive
When -D_FORTIFY_SOURCE=2 was given during compilation,
sprintf and similar functions will check if their
first argument is in read-only memory and exit with
*** %n in writable segment detected ***
otherwise. To check if the memory is read-only, glibc
reads frpm the file "/proc/self/maps". If opening this
file fails due to too many open files (EMFILE), glibc
will now ignore this error.
Arjun Shankar [Mon, 2 Oct 2023 12:55:14 +0000 (14:55 +0200)]
nss: Rearrange and sort Makefile variables
Rearrange lists of routines, tests, etc. into one-per-line in
nss/Makefile and sort them using scripts/sort-makefile-lines.py. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Arjun Shankar [Mon, 2 Oct 2023 12:55:13 +0000 (14:55 +0200)]
inet: Rearrange and sort Makefile variables
Rearrange lists of routines, tests, etc. into one-per-line in
inet/Makefile and sort them using scripts/sort-makefile-lines.py. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>