string: Fix tester build with fortify enable with gcc 6
When building with fortify enabled, GCC 6 issues an warning the fortify
wrapper might overflow the destination buffer. However, GCC does not
provide a specific flag to disable the warning (the failure is tied to
-Werror). So to avoid disable all errors, only enable the check for
GCC 7 or newer.
Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
On __convert_scm_timestamps GCC 6 issues an warning that tvts[0]/tvts[1]
maybe be used uninitialized, however it would be used if type is set to a
value different than 0 (done by either COMPAT_SO_TIMESTAMP_OLD or
COMPAT_SO_TIMESTAMPNS_OLD) which will fallthrough to 'common' label.
It does not show with gcc 7 or more recent versions.
Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Similar to memcpy, mempcpy, and memmove there is no need for an
specific memset_chk-nonshared.S. It can be provided by
memset-ia32.S itself for static library.
Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Although is not an issue for normal static builds, with fortify=3
glibc itself might use the fortify chk functions and thus static
build might fail with multiple definitions. For instance:
x86_64-glibc-linux-gnu-gcc -m32 -march=i686 -o [...]math/test-signgam-uchar-static -nostdlib -nostartfiles -static -static-pie [...]
x86_64-glibc-linux-gnu/bin/ld: [...]/libc.a(mempcpy-ia32.o):
in function `__mempcpy_chk': [...]/glibc-git/string/../sysdeps/i386/i686/mempcpy.S:32: multiple definition of `__mempcpy_chk';
[...]/libc.a(mempcpy_chk-nonshared.o):[...]/debug/../sysdeps/i386/mempcpy_chk.S:28: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [../Rules:298:
There is no need for mem*-nonshared.S, the __mem*_chk routines
are already provided by the assembly routines.
Checked on i686-linux-gnu with gcc 13 built with fortify=1,2,3 and
without fortify. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
With gcc 11.3.1, building with -D_FORTIFY_SOURCE=2 shows:
In function ‘getgroups’,
inlined from ‘do_test’ at test-errno.c:129:12:
../misc/sys/cdefs.h:195:6: error: argument 1 value -1 is negative
[-Werror=stringop-overflow=]
195 | ? __ ## f ## _alias (__VA_ARGS__)
\
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../posix/bits/unistd.h:115:10: note: in expansion of macro
‘__glibc_fortify’
115 | return __glibc_fortify (getgroups, __size, sizeof (__gid_t),
| ^~~~~~~~~~~~~~~
../posix/bits/unistd.h: In function ‘do_test’:
../posix/bits/unistd-decl.h:135:28: note: in a call to function
‘__getgroups_alias’ declared with attribute ‘access (write_only, 2, 1)’
135 | extern int __REDIRECT_NTH (__getgroups_alias, (int __size,
__gid_t __list[]),
| ^~~~~~~~~~~~~~~~~
../misc/sys/cdefs.h:264:6: note: in definition of macro ‘__REDIRECT_NTH’
264 | name proto __asm__ (__ASMNAME (#alias)) __THROW
It builds fine with gcc 12 and gcc 13.
Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Old GCC might trigger the the comparison will always evaluate as ‘true’
warnig for static build:
set-freeres.c:87:14: error: the comparison will always evaluate as
‘true’ for the address of ‘__libc_getgrgid_freemem_ptr’ will never be
NULL [-Werror=address]
if (&__ptr != NULL) \
So add pragma weak for all affected usages.
Checked on x86_64 and i686 with gcc 6 and 13. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Now that we have a proper configure argument for F_S (--enable-fortify-source),
just drop this entirely, to avoid conflicting with e.g. detected --enable-fortify-source
finding F_S=3, then nscd's Makefile setting F_S=2, resulting in a build-failure
because of the redefinition.
Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Stefan Liebler [Tue, 25 Jul 2023 09:34:30 +0000 (11:34 +0200)]
Include sys/rseq.h in tst-rseq-disable.c
Starting with commit 2c6b4b272e6b4d07303af25709051c3e96288f2d
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.
The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c. Reviewed-by: Florian Weimer <fweimer@redhat.com>
If fortify is enabled, the truncated output warning is issued by
the wrapper itself:
In function ‘strncpy’,
inlined from ‘test_strncpy’ at tester.c:505:10:
../string/bits/string_fortified.h:95:10: error: ‘__builtin_strncpy’
destination unchanged after copying no bytes from a string of length 3
[-Werror=stringop-truncation]
95 | return __builtin___strncpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
96 | __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../include/bits/string_fortified.h:1,
from ../string/string.h:548,
from ../include/string.h:60,
from tester.c:33,
from inl-tester.c:6:
In function ‘strncpy’,
inlined from ‘test_strncpy’ at tester.c:505:10:
Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
nscd: Use errval, not errno to guide cache update (bug 30662)
The errno variable is potentially clobbered by the preceding
send call. It is not related to the to-be-cached information.
The parallel code in hstcache.c and servicescache.c already uses
errval.
Ying Huang [Thu, 15 Jun 2023 07:50:21 +0000 (03:50 -0400)]
MIPS: Sync elf.h from binutils
Add new definitions for the MIPS target, specifically: relocation
types, machine flags, section type names, and object attribute tags
and values. On MIPS64, up to three relocations may be specified
within r_info, by the r_type, r_type2, and r_type3 fields, so add new
macros to get the respective reloc types for MIPS64.
If the kernel headers provide a larger struct rseq, we used that
size as the argument to the rseq system call. As a result,
rseq registration would fail on older kernels which only accept
size 32.
scripts: Fix fortify checks if compiler does not support _FORTIFY_SOURCE=3
The 30379efad1 added _FORTIFY_SOURCE checks without check if compiler
does support all used fortify levels. This patch fixes it by first
checking at configure time the maximum support fortify level and using
it instead of a pre-defined one.
We mentioned eventual dropping of libcrypt in the 2.28 NEWS. Actually
put that plan in motion by first disabling building libcrypt by default.
note in NEWS that the library will be dropped completely in a future
release.
Also add a couple of builds into build-many-glibcs.py.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Andreas K. Hüttel <dilfridge@gentoo.org>
The _FORTIFY_SOURCE is used as default by some system compilers,
and there is no way to check if some fortify extension does not
trigger any conformance issue.
Based on feedback by Mike Gilbert <floppym@gentoo.org>
Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor
-march=amdfam10
failures occur for x32 ABI
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Stefan Liebler [Thu, 13 Jul 2023 13:13:48 +0000 (15:13 +0200)]
Fix getting return address in elf/tst-audit28.c.
Starting with commit 1bcfe0f732066ae5336b252295591ebe7e51c301, the
test was enhanced and the object for __builtin_return_address (0)
is searched with _dl_find_object.
Unfortunately on e.g. s390 (31bit), a postprocessing step is needed
as the highest bit has to be masked out. This can be done with
__builtin_extract_return_addr.
Without this postprocessing, _dl_find_object returns with -1 and the
content of dlfo is invalid, which may lead to segfaults in basename.
Therefore those checks are now only done on success. Reviewed-by: Florian Weimer <fweimer@redhat.com>
[PATCH v1] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.
On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com>
x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
```
Split `shared` (cumulative cache size) from `shared_per_thread` (cache
size per socket), the `shared_per_thread` *can* be slightly off from
the previous calculation.
Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.
This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com>
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions. autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm). It appears to
be current in Gentoo as well.
All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations,
(sparc_fixup_plt/sparc64_fixup_plt). For BINDNOW, _dl_audit_symbind
will be responsible to setup the final relocation value; while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail cail elf_machine_fixup_plt (which will call
sparc64_fixup_plt).
This patch fixes by issuing the SPARC specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:
This patch checks if assembler supports vector instructions to
generate LASX/LSX code or not, and then define HAVE_LOONGARCH_VEC_ASM macro
We have added support for vector instructions in binutils-2.41
See:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=75b2f521b101d974354f6ce9ed7c054d8b2e3b7a
sysdeps/s390: Exclude fortified routines from being built with _FORTIFY_SOURCE
Depending on build configuration, the [routine]-c.c files may be chosen
to provide fortified routines implementation. While [routines].c
implementation were automatically excluded, the [routines]-c.c ones were
not. This patch fixes that by adding these file to the list to be
filtered.
Carlos O'Donell [Fri, 7 Jul 2023 15:27:08 +0000 (11:27 -0400)]
Translations: Add new ro support and update others.
This brings in the new Romanian language translations, and updates
nine other translations. Important translations in this update
include the Italian and Japanese translations for ESTALE which
remove the mention of "NFS" from the error message translation.
realloc: Limit chunk reuse to only growing requests [BZ #30579]
The trim_threshold is too aggressive a heuristic to decide if chunk
reuse is OK for reallocated memory; for repeated small, shrinking
allocations it leads to internal fragmentation and for repeated larger
allocations that fragmentation may blow up even worse due to the dynamic
nature of the threshold.
Limit reuse only when it is within the alignment padding, which is 2 *
size_t for heap allocations and a page size for mmapped allocations.
There's the added wrinkle of THP, but this fix ignores it for now,
pessimizing that case in favor of keeping fragmentation low.
Some locales define a list of mapping pairs of alternate digits and
separators for input digits (to_inpunct). This require the scanf
to create a list of all possible inputs for the optional type
modifier 'I'.
Checked on x86_64-linux-gnu.
Reviewed-by: Joe Simmons-Talbott <josimmon@redhat.com>
fileops: Don't process ,ccs= as individual mode flags (BZ#18906)
In processing the first 7 individual characters of the mode for fopen
if ,ccs= is used those characters will be processed as well. Stop
processing individual mode flags once a comma is encountered. This has
the effect of requiring ,ccs= to be the last mode flag in the mode
string. Add a testcase to check that the ,ccs= mode flag is not
processed as individual mode flags.
Frédéric Bérat [Wed, 28 Jun 2023 07:07:26 +0000 (09:07 +0200)]
libio/bits/stdio2.h: Clearly separate declaration from definitions
Move declarations from libio/bits/stdio.h to existing
libio/bits/stdio2-decl.h. This will enable future use of
__REDIRECT_FORTIFY in place of some __REDIRECT.
misc/bits/syslog.h: Clearly separate declaration from definition
This allows to include bits/syslog-decl.h in include/sys/syslog.h and
therefore be able to create the libc_hidden_builtin_proto (__syslog_chk)
prototype.
misc/bits/select2.h: Clearly separate declaration from definitions
The __fdelt_chk declaration needs to be available so that
libc_hidden_proto can be used while not redefining __FD_ELT.
Thus, misc/bits/select-decl.h is created to hold the corresponding
prototypes.
posix/bits/unistd.h: Clearly separate declaration from definitions
This change is similar to what was done for bits/wchar2.h.
Routines declaration are moved into a dedicated bits/unistd-decl.h file
which is then included into the bits/unistd.h file.
This will allow to adapt the files so that PLT entries are not created when
_FORTIFY_SOURCE is enabled.
misc/sys/cdefs.h: Create FORTIFY redirects for internal calls
The __REDIRECT* macros are creating aliases which may lead to unwanted
PLT entries when fortification is enabled.
To prevent these entries, the REDIRECT alias should be set to point to the
existing __GI_* aliases.
This is done transparently by creating a __REDIRECT_FORTIFY* version of
these macros, that can be overwritten internally when necessary.
stdio: Ensure *_chk routines have their hidden builtin definition available
If libc_hidden_builtin_{def,proto} isn't properly set for *_chk routines,
there are unwanted PLT entries in libc.so.
There is a special case with __asprintf_chk:
If ldbl_* macros are used for asprintf, ABI gets broken on s390x,
if it isn't, ppc64le isn't building due to multiple asm redirections.
This is due to the inclusion of bits/stdio-lbdl.h for ppc64le whereas it
isn't for s390x. This header creates redirections, which are not
compatible with the ones generated using libc_hidden_def.
Yet, we can't use libc_hidden_ldbl_proto on s390x since it will not
create a simple strong alias (e.g. as done on x86_64), but a versioned
alias, leading to ABI breakage.
This results in errors on s390x:
/usr/bin/ld: glibc/iconv/../libio/bits/stdio2.h:137: undefined reference
to `__asprintf_chk'
Frédéric Bérat [Fri, 16 Jun 2023 14:53:29 +0000 (16:53 +0200)]
sysdeps: Ensure ieee128*_chk routines to be properly named
The *_chk routines naming doesn't match the name that would be generated
using libc_hidden_ldbl_proto. Since the macro is needed for some of
these *_chk functions for _FORTIFY_SOURCE to be enabled, that needed to
be fixed.
While at it, all the *_chk function get renamed appropriately for
consistency, even if not strictly necessary.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Frédéric Bérat [Fri, 17 Mar 2023 09:17:28 +0000 (10:17 +0100)]
Exclude routines from fortification
Since the _FORTIFY_SOURCE feature uses some routines of Glibc, they need to
be excluded from the fortification.
On top of that:
- some tests explicitly verify that some level of fortification works
appropriately, we therefore shouldn't modify the level set for them.
- some objects need to be build with optimization disabled, which
prevents _FORTIFY_SOURCE to be used for them.
Assembler files that implement architecture specific versions of the
fortified routines were not excluded from _FORTIFY_SOURCE as there is no
C header included that would impact their behavior.
Frédéric Bérat [Fri, 17 Mar 2023 09:14:50 +0000 (10:14 +0100)]
Allow glibc to be built with _FORTIFY_SOURCE
Add --enable-fortify-source option.
It is now possible to enable fortification through a configure option.
The level may be given as parameter, if none is provided, the configure
script will determine what is the highest level possible that can be set
considering GCC built-ins availability and set it.
If level is explicitly set to 3, configure checks if the compiler
supports the built-in function necessary for it or raise an error if it
isn't.
If the configure option isn't explicitly enabled, it _FORTIFY_SOURCE is
forcibly undefined (and therefore disabled).
The result of the configure checks are new variables, ${fortify_source}
and ${no_fortify_source} that can be used to appropriately populate
CFLAGS.
A dedicated patch will follow to make use of this variable in Makefiles
when necessary.
Updated NEWS and INSTALL.
Adding dedicated x86_64 variant that enables the configuration.
manual: Update documentation of strerror and related functions
The current implementation of strerror is thread-safe, but this
has implications for the lifetime of the return string.
Describe the strerror_l function. Describe both variants of the
strerror_r function. Mention the lifetime of the returned string
for strerrorname_np and strerrordesc_np. Clarify that perror
output depends on the current locale.
manual: Enhance documentation of the <ctype.h> functions
Describe the problems with signed characters, and the glibc extension
to deal with most of them. Mention that the is* functions return
zero for the special argument EOF.
Andreas Schwab [Tue, 30 Jan 2018 09:16:00 +0000 (10:16 +0100)]
Always do locking when accessing streams (bug 15142, bug 14697)
Now that abort no longer calls fflush there is no reason to avoid locking
the stdio streams anywhere. This fixes a conformance issue and potential
heap corruption during exit.
Sergey Bugaev [Sun, 25 Jun 2023 23:17:51 +0000 (02:17 +0300)]
hurd: Implement MAP_EXCL
MAP_FIXED is defined to silently replace any existing mappings at the
address range being mapped over. This, however, is a dangerous, and only
rarely desired behavior.
Various Unix systems provide replacements or additions to MAP_FIXED:
* SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space
already contains a mapping in the requested range, Linux returns
EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the
MAP_FIXED_NOREPLACE implementation is intended to be compatible with
Linux.
* FreeBSD provides the MAP_EXCL flag that has to be used in combination
with MAP_FIXED. It returns EINVAL if the requested range already
contains existing mappings. This is directly analogous to the O_EXCL
flag in the open () call.
* DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with
different semantics. DragonFly BSD returns ENOMEM if the requested
range already contains existing mappings. NetBSD does not return an
error, but instead creates the mapping at a different address if the
requested range contains mappings. OpenBSD behaves the same, but also
notes that this is the default behavior even without MAP_TRYFIXED
(which is the case on the Hurd too).
Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary
API to request the behavior of not replacing existing mappings. Declare
MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL),
so any existing software that checks for either of those macros will
pick them up automatically. For compatibility with Linux, return EEXIST
if a mapping already exists.
Sergey Bugaev [Sun, 25 Jun 2023 23:17:50 +0000 (02:17 +0300)]
hurd: Fix mapping at address 0 with MAP_FIXED
Zero address passed to mmap () typically means the caller doesn't have
any specific preferred address. Not so if MAP_FIXED is passed: in this
case 0 means literal 0. Fix this case to pass anywhere = 0 into vm_map.
Sergey Bugaev [Sun, 25 Jun 2023 23:17:48 +0000 (02:17 +0300)]
hurd: Map brk non-executable
The rest of the heap (backed by individual pages) is already mapped RW.
Mapping these pages RWX presents a security hazard.
Also, in another branch memory gets allocated using vm_allocate, which
sets memory protection to VM_PROT_DEFAULT (which is RW). The mismatch
between protections prevents Mach from coalescing the VM map entries.
Sergey Bugaev [Sun, 25 Jun 2023 23:17:47 +0000 (02:17 +0300)]
htl: Let Mach place thread stacks
Instead of trying to allocate a thread stack at a specific address,
looping over the address space, just set the ANYWHERE flag in
vm_allocate (). The previous behavior:
- defeats ASLR (for Mach versions that support ASLR),
- is particularly slow if the lower 4 GB of the address space are mapped
inaccessible, as we're planning to do on 64-bit Hurd,
- is just silly.
Samuel Thibault [Sun, 2 Jul 2023 11:27:51 +0000 (11:27 +0000)]
mach: strerror must not return NULL (bug 30555)
This follows 1d44530a5be2 ("string: strerror must not return NULL (bug 30555)"):
«
For strerror, this fixes commit 28aff047818eb1726394296d27b ("string:
Implement strerror in terms of strerror_l"). This commit avoids
returning NULL for strerror_l as well, although POSIX allows this
behavior for strerror_l.
»
support: Build with exceptions and asynchronous unwind tables [BZ #30587]
Changing tst-cleanup4.c to use xread instead of read caused
the nptl/tst-cleanupx4 test to fail. The routines in libsupport.a
need to be built with exception handling and asynchronous unwind
table support.
v2: Use "CFLAGS-.oS" instead of "override CFLAGS".
H.J. Lu [Thu, 22 Jun 2023 21:30:31 +0000 (14:30 -0700)]
ld.so: Always use MAP_COPY to map the first segment [BZ #30452]
The first segment in a shared library may be read-only, not executable.
To support LD_PREFER_MAP_32BIT_EXEC on such shared libraries, we also
check MAP_DENYWRITE to decide if MAP_32BIT should be passed to mmap.
Normally the first segment is mapped with MAP_COPY, which is defined
as (MAP_PRIVATE | MAP_DENYWRITE). But if the segment alignment is
greater than the page size, MAP_COPY isn't used to allocate enough
space to ensure that the segment can be properly aligned. Map the
first segment with MAP_COPY in this case to fix BZ #30452.
Martin Coufal [Mon, 19 Jun 2023 14:05:21 +0000 (16:05 +0200)]
Add checks for wday, yday and new date formats
tm time struct contains tm_wday and tm_yday that were previously not
checked in this test. Also added new test cases for date formats
containing %D, %R or %h. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Joe Ramsay [Wed, 28 Jun 2023 11:19:39 +0000 (12:19 +0100)]
aarch64: Add vector implementations of exp routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Joe Ramsay [Wed, 28 Jun 2023 11:19:38 +0000 (12:19 +0100)]
aarch64: Add vector implementations of log routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines. Log lookup table
added as HIDDEN symbol to allow it to be shared between AdvSIMD and
SVE variants.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Joe Ramsay [Wed, 28 Jun 2023 11:19:37 +0000 (12:19 +0100)]
aarch64: Add vector implementations of sin routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Joe Ramsay [Wed, 28 Jun 2023 11:19:36 +0000 (12:19 +0100)]
aarch64: Add vector implementations of cos routines
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).
Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.
Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.
Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.
AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>