Define AT_MINSIGSTKSZ in the generic uapi header. It is already used
as generic ABI in glibc's generic elf.h, and this define will prevent
future namespace conflicts. In particular, x86 is also using this
generic definition.
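As a rough illustration of how user space might consume this auxiliary vector entry, the sketch below queries it with getauxval and falls back to the compile-time constant. The helper name min_signal_stack_size and the final fallback of 2048 are assumptions made for this example only.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Older headers may lack the constant; 51 is the value Linux uses.  */
#ifndef AT_MINSIGSTKSZ
# define AT_MINSIGSTKSZ 51
#endif

/* Hypothetical helper: return the kernel-reported minimum signal stack
   size, or a fallback when the auxv entry is absent.  */
size_t
min_signal_stack_size (void)
{
  /* getauxval returns 0 when the entry is not present.  */
  unsigned long from_kernel = getauxval (AT_MINSIGSTKSZ);
  if (from_kernel != 0)
    return (size_t) from_kernel;
#ifdef MINSIGSTKSZ
  return (size_t) MINSIGSTKSZ;
#else
  return 2048; /* Arbitrary fallback chosen for this sketch.  */
#endif
}
```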
Florian Weimer [Thu, 23 Dec 2021 14:01:07 +0000 (15:01 +0100)]
stdio: Implement %#m for vfprintf and related functions
%#m prints errno as an error constant if one is available, or
a decimal number as a fallback. This intends to address the gap
that strerrorname_np does not work well with printf for unknown
error codes due to its NULL return values in those cases.
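The fallback described above can be sketched in portable C. This is not the vfprintf implementation: error_constant_name is a hypothetical stand-in for a lookup that returns NULL for unknown codes, and it only knows three constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: returns NULL for codes it does not know,
   mirroring the NULL return mentioned in the text.  */
const char *
error_constant_name (int code)
{
  switch (code)
    {
    case ENOENT: return "ENOENT";
    case EINVAL: return "EINVAL";
    case ENOMEM: return "ENOMEM";
    default: return NULL;
    }
}

/* Format CODE as its constant name if one is available, or as a
   decimal number as a fallback.  Returns BUF.  */
const char *
format_error_code (int code, char *buf, size_t size)
{
  const char *name = error_constant_name (code);
  if (name != NULL)
    snprintf (buf, size, "%s", name);
  else
    snprintf (buf, size, "%d", code);
  return buf;
}
```

With a glibc that includes this commit, the same effect should be available directly through printf ("%#m\n") after errno has been set.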
Sunil K Pandey [Wed, 22 Dec 2021 14:20:41 +0000 (06:20 -0800)]
x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.
maminjie [Mon, 20 Dec 2021 11:36:32 +0000 (19:36 +0800)]
Linux: Fix 32-bit vDSO for clock_gettime on powerpc32
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID
on a 5.10 kernel on 32-bit powerpc, the 32-bit vDSO call succeeds
(because __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S
does not support these two IDs, the 32-bit time_t syscall is used),
but tp32.tv_sec is equal to 0, so the 64-bit time_t syscall is still used,
resulting in two system calls.
Joseph Myers [Mon, 20 Dec 2021 15:38:32 +0000 (15:38 +0000)]
Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h
Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along
with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol
numbers for PF_CAN") but apparently missed for glibc at the time.
Aurelien Jarno [Wed, 15 Dec 2021 22:46:19 +0000 (23:46 +0100)]
elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
stays the same. Handle that by first checking the presence of the Intel
feature on AMD systems.
powerpc64[le]: Allocate extra stack frame on syscall.S
The syscall function does not allocate the extra stack frame for scv like other
assembly syscalls using DO_CALL_SCV. So after commit d120fb9941 changed the
offset that is used to save LR, syscall ended up using an invalid offset,
causing regressions on powerpc64. So make sure the extra stack frame is
allocated in syscall.S as well to make it consistent with other uses of
DO_CALL_SCV and avoid similar issues in the future.
Tested on powerpc, powerpc64, and powerpc64le (with and without scv)
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Florian Weimer [Fri, 17 Dec 2021 11:01:20 +0000 (12:01 +0100)]
nss: Use "files dns" as the default for the hosts database (bug 28700)
This matches what is currently in nss/nsswitch.conf. The new ordering
matches what most distributions use in their installed configuration
files.
It is common to add localhost to /etc/hosts because the name does not
exist in the DNS, but is commonly used as a host name.
With the built-in "dns [!UNAVAIL=return] files" default, dns is
searched first and provides an answer for "localhost" (NXDOMAIN).
We never look at the files database as a result, so the contents of
/etc/hosts are ignored. This means that "getent hosts localhost"
fails without a /etc/nsswitch.conf file, even though the host name
is listed in /etc/hosts.
Florian Weimer [Fri, 17 Dec 2021 10:48:41 +0000 (11:48 +0100)]
arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro
Due to PIE-by-default, PIC is now defined in more cases. libc.a
does not have _rtld_global_ro, and statically linking setcontext
fails. SHARED is the right condition to use, so that libc.a
references _dl_hwcap instead of _rtld_global_ro.
For static PIE support, the !SHARED case would still have to be made
PIC. This patch does not achieve that.
I (and maybe one or two others) added a (C) to the copyright notice
regardless of the contribution checklist[1] not mentioning it. Fix all
these instances so that the notice reads as "Copyright The GNU Toolchain
Authors" across the source code.
Remove upper limit on tunable MALLOC_MMAP_THRESHOLD
The current limit on MALLOC_MMAP_THRESHOLD is either 1 Mbyte (for
32-bit apps) or 32 Mbytes (for 64-bit apps). This value was set by a
patch dated 2006 (15 years ago). Attempts to set the threshold higher
are currently ignored.
The default behavior is appropriate for many highly parallel
applications where many processes or threads are sharing RAM. In other
situations where the number of active processes or threads closely
matches the number of cores, a much higher limit may be desired by the
application designer. By today's standards on personal computers and
small servers, 2 Gbytes of RAM per core is commonly available. On
larger systems 4 Gbytes or more of RAM is sometimes available.
Instead of raising the limit to match current needs, this patch
proposes to remove the limit of the tunable, leaving the decision up
to the user of a tunable to judge the best value for their needs.
This patch does not change any of the defaults for malloc tunables,
retaining the current behavior of the dynamic malloc mmap threshold.
bugzilla 27801 - Remove upper limit on tunable MALLOC_MMAP_THRESHOLD
Reviewed-by: DJ Delorie <dj@redhat.com>
malloc/
malloc.c changed do_set_mmap_threshold to remove test
for HEAP_MAX_SIZE.
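The behavioral change can be pictured with a small model. This is a simplified sketch, not the malloc.c source: struct model_params and both function names are hypothetical, and the cap is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model state; not the real malloc internals.  */
struct model_params
{
  size_t mmap_threshold;
};

/* Old behavior as described above: values above a fixed cap are
   ignored and the previous threshold stays in effect.  */
int
set_threshold_capped (struct model_params *mp, size_t value, size_t cap)
{
  if (value > cap)
    return 0;
  mp->mmap_threshold = value;
  return 1;
}

/* New behavior: the tunable's user decides, so any value is accepted.  */
int
set_threshold_uncapped (struct model_params *mp, size_t value)
{
  mp->mmap_threshold = value;
  return 1;
}
```

In a real program the threshold is set through the glibc.malloc.mmap_threshold tunable or mallopt (M_MMAP_THRESHOLD, ...), not through helpers like these.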
Stefan Liebler [Thu, 16 Dec 2021 11:47:11 +0000 (12:47 +0100)]
Fix __minimal_malloc segfaults in __mmap due to stack-protector
Starting with commit b05fae4d8e34604a72ee36d2d3164391b76fcf0b
"elf: Use the minimal malloc on tunables_strdup",
I get lots of segfaults in static tests on s390x when also using, e.g.:
export GLIBC_TUNABLES="glibc.elision.enable=1"
tunables_strdup calls __minimal_malloc which tries to call __mmap
due to insufficient space left. __mmap itself first sets up a new
stack frame and segfaults when copying the stack-protector canary
from the thread pointer. The latter is not yet set up.
Thus this patch also turns off stack-protection for mmap.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch adds huge page support on main arena allocation,
enabled with the tunable glibc.malloc.hugetlb=2. The patch essentially
disables the __glibc_morecore() sbrk() call (similar to what memory
tagging does when sbrk() does not support it) and falls back to the
default page size if the memory allocation fails.
It is enabled by default for glibc.malloc.hugetlb set to 2 or higher.
It also uses non-configurable minimum and maximum values,
currently set to 1 and 4 times the selected huge page size, respectively.
The arena allocation with huge pages does not use MAP_NORESERVE. As
indicated by the kernel internal documentation [1], the flag might trigger
a SIGBUS on soft page faults if there are no pages left in the pool
at memory access.
On systems without a reserved huge pages pool, it just stresses the
mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is
required to create a pool with some allocated pages.
Checked on x86_64-linux-gnu with no reserved pages, 10 reserved pages
(which triggers mmap(MAP_HUGETLB) failures) and with 256 reserved pages
(which does not trigger mmap(MAP_HUGETLB) failures).
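The try-huge-pages-then-fall-back pattern described in this message can be sketched at the mmap level. This is a minimal Linux-only illustration, not the malloc code; map_prefer_huge and its used_huge output parameter are hypothetical.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: try an explicit huge page mapping first and fall
   back to the default page size if that fails.  *USED_HUGE is set to 1
   when the huge page mapping succeeded, 0 otherwise.  Returns NULL if
   both attempts fail.  */
void *
map_prefer_huge (size_t length, int *used_huge)
{
  void *p;
  *used_huge = 0;
#ifdef MAP_HUGETLB
  /* MAP_NORESERVE is deliberately not used, for the SIGBUS reason
     given in the text.  */
  p = mmap (NULL, length, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  if (p != MAP_FAILED)
    {
      *used_huge = 1;
      return p;
    }
#endif
  /* Fallback: ordinary anonymous mapping with the default page size.  */
  p = mmap (NULL, length, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```

On a system without a reserved pool the first mmap is expected to fail, which is the failure path the tests above exercise.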
With the morecore hook removed, there is no easy way to provide huge
page support with the glibc allocator without resorting to transparent
huge pages. And some users and programs do prefer to use huge pages
directly instead of THP for multiple reasons: no splitting or re-merging
by the VM, no TLB shootdowns for running processes, fast allocation
from the reserve pool, no competition with the rest of the processes
unlike THP, no swapping at all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a positive value means a specific page size that is matched
against the ones supported by the system.
Currently only memory allocated on sysmalloc() is handled; the arenas
still use the default system page size.
To test it, a new rule, tests-malloc-hugetlb2, is added, which runs the
added tests with the required GLIBC_TUNABLES setting. On systems without
a reserved huge pages pool, it just stresses the mmap(MAP_HUGETLB)
allocation failure. To improve test coverage it is required to create
a pool with some allocated pages.
malloc: Add madvise support for Transparent Huge Pages
Linux Transparent Huge Pages (THP) currently supports three different
states: 'never', 'madvise', and 'always'. 'never' is
self-explanatory and 'always' enables THP for all anonymous
pages. However, 'madvise' is still the default on some systems, and
in that case THP is only used if the memory range is explicitly
advised by the program through a madvise(MADV_HUGEPAGE) call.
To enable it a new tunable is provided, 'glibc.malloc.hugetlb',
where setting it to a value different from 0 enables the madvise call.
This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() with sizes larger than the default huge
page size. The madvise() call is disabled if the system does not support
THP or if it has the mode set to "never". Linux only supports
one page size for THP, even if the architecture supports multiple
sizes.
To test it, a new rule, tests-malloc-hugetlb1, is added, which runs the
added tests with the required GLIBC_TUNABLES setting.
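The mmap-then-madvise sequence can be sketched outside of malloc as follows. This is only an illustration: map_with_thp_hint is a hypothetical helper, the caller supplies the THP size, and treating a mapping of exactly one huge page as eligible is a choice made for this sketch.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LENGTH bytes and, if the mapping is at least
   THP_SIZE bytes long, advise the kernel to back it with transparent
   huge pages.  A THP_SIZE of 0 means THP is unavailable or set to
   "never", so no advice is given.  The advice is best effort: a
   failing madvise does not fail the allocation.  */
void *
map_with_thp_hint (size_t length, size_t thp_size)
{
  void *p = mmap (NULL, length, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
#ifdef MADV_HUGEPAGE
  if (thp_size != 0 && length >= thp_size)
    (void) madvise (p, length, MADV_HUGEPAGE);
#endif
  return p;
}
```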
Florian Weimer [Wed, 15 Dec 2021 15:06:25 +0000 (16:06 +0100)]
powerpc: Use global register variable in <thread_pointer.h>
A local register variable is merely a compiler hint, and so not
appropriate in this context. Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.
Use LFS and 64 bit time for installed programs (BZ #15333)
The installed programs are built with a combination of different
values for MODULE_NAME, as below. To enable both Long File Support
and 64 bit time, -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 is added for
nonlib, nscd, lddlibc4, libresolv, ldconfig, locale_programs,
iconvprogs, libnss_files, libnss_compat, libnss_db, libnss_hesiod,
libutil, libpcprofile, and libSegFault.
Also, to avoid adding both LFS and 64 bit time support on internal
tests they are moved to a newer 'testsuite-internal' module. It
should be similar to 'nonlib' regarding internal definition and
linking namespace.
This patch also enables LFS and 64 bit support of libsupport container
programs (echo-container, test-container, shell-container, and
true-container).
Khem Raj [Thu, 2 Dec 2021 07:13:13 +0000 (23:13 -0800)]
intl: Emit no lines in bison generated files
Improve reproducibility:
Do not put any #line preprocessor commands in bison generated files.
These lines contain absolute paths containing file locations on
the host build machine.
H.J. Lu [Wed, 8 Dec 2021 15:02:27 +0000 (07:02 -0800)]
Disable DT_RUNPATH on NSS tests [BZ #28455]
The glibc internal NSS functions should always load NSS modules from
the system. For testing purposes, disable DT_RUNPATH on NSS tests so
that the glibc internal NSS functions can load testing NSS modules
via DT_RPATH.
Akila Welihinda [Sun, 12 Dec 2021 18:35:03 +0000 (10:35 -0800)]
sysdeps: Simplify sin Taylor Series calculation
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.
Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
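For reference, the term in question can be read as da * (1 - x^2/2), a first-order estimate of da * cos(x) for small x, which is what sin(x + da) gains over sin(x). The sketch below only evaluates that term; sin_low_part_correction is a hypothetical name and this is not the TAYLOR_SIN macro.

```c
#include <assert.h>

/* Hypothetical helper: the correction term -0.5*da*x^2 + da, which
   equals da * (1 - x*x/2).  */
double
sin_low_part_correction (double x, double da)
{
  return da - 0.5 * da * x * x;
}
```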
math: Remove the error handling wrapper from hypot and hypotf
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
The generic hypotf is slightly slower, mostly due to the tricks the assembly
uses to optimize isinf/isnan/issignaling. The generic hypot is way
slower, since the optimized implementation uses the i386 default
excess precision to issue the operation directly. A similar
implementation is provided instead of using the generic implementation:
math: Use an improved algorithm for hypotl (ldbl-128)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With
1e9 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current
glibc implementation shows around 0.05% of results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
math: Use an improved algorithm for hypotl (ldbl-96)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With
1e8 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current
glibc implementation shows around 0.02% of results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).
Wilco Dijkstra [Tue, 30 Nov 2021 19:29:25 +0000 (16:29 -0300)]
math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Wilco Dijkstra [Mon, 8 Mar 2021 20:07:39 +0000 (17:07 -0300)]
math: Use an improved algorithm for hypot (dbl-64)
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with
1e9 random input pairs in the range of [DBL_MIN, DBL_MAX], the current
glibc implementation shows around 0.34% of results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).
The performance results are also only slightly worse than the current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Use a more optimized comparison to check for NaN and infinity, and
add an inlined issignaling implementation for float. With gcc it
results in 2 FP comparisons.
The file Copyright is also changed to use GPL, the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8
equivalents so that files don't end up with mixed encodings. With this,
all files (except tests that actually test different encodings) have a
single encoding.
Replace --enable-static-pie with --disable-default-pie
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain support it.
Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.
Only the following architectures now have PIE disabled by default
because they do not work at the moment. hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects. The microblaze compiler is currently
failing with an ICE. GNU hurd tries to enable static-pie, which does
not work and hence fails. All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.
build-many-glibcs runs clean for all targets. I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.
H.J. Lu [Fri, 10 Dec 2021 21:00:09 +0000 (13:00 -0800)]
x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.
Rongwei Wang [Fri, 10 Dec 2021 12:39:10 +0000 (20:39 +0800)]
elf: Properly align PT_LOAD segments [BZ #28676]
When PT_LOAD segment alignment > the page size, allocate enough space to
ensure that the segment can be properly aligned. This change makes it
simpler for code segments to use huge pages.
This fixes [BZ #28676].
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
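The arithmetic behind "allocate enough space" can be sketched as follows. This is an illustration of the idea, not the loader code: align_up and reservation_size are hypothetical helpers, and both the alignment and the page size are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round ADDR up to the next multiple of ALIGN (a power of two).  */
uintptr_t
align_up (uintptr_t addr, uintptr_t align)
{
  return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of bytes to reserve so that a segment of
   LENGTH bytes fits at an ALIGN-aligned address inside a reservation
   that is only guaranteed to be page aligned.  The aligned start is at
   most ALIGN - PAGE_SIZE bytes past the start of the reservation.  */
size_t
reservation_size (size_t length, size_t align, size_t page_size)
{
  if (align <= page_size)
    return length;
  return length + align - page_size;
}
```

For a 2 MiB alignment and 4 KiB pages, a mapping returned at 0x1234000 would have its segment placed at align_up (0x1234000, 0x200000), that is 0x1400000.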
Florian Weimer [Fri, 10 Dec 2021 15:06:36 +0000 (16:06 +0100)]
elf: Install a symbolic link to ld.so as /usr/bin/ld.so
This makes ld.so features such as --preload, --audit,
and --list-diagnostics more accessible to end users because they
do not need to know the ABI name of the dynamic loader.
Florian Weimer [Fri, 10 Dec 2021 04:14:24 +0000 (05:14 +0100)]
nptl: Add one more barrier to nptl/tst-create1
Without the bar_ctor_finish barrier, it was possible that thread2
re-locked user_lock before ctor had a chance to lock it. ctor then
blocked in its locking operation, xdlopen from the main thread
did not return, and thread2 was stuck waiting in bar_dtor:
Florian Weimer [Thu, 9 Dec 2021 16:57:11 +0000 (17:57 +0100)]
Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN
TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
does not have the same alignment as THREAD_SELF. Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.
In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.
hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32. However, it defines THREAD_SELF as:
/* Return the thread descriptor for the current thread. */
# define THREAD_SELF \
({ struct pthread *__self; \
__self = __get_cr27(); \
__self - 1; \
})
So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.
Similarly, on ia64, the definitions were different. We have:
And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).
On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that. The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add public rseq symbols and <sys/rseq.h>
The relationship between the thread pointer and the rseq area
is made explicit. The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).
Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add rseq registration
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Add tests that had src/dst non 4-byte aligned. Since src/dst are
initialized/compared as uint32_t type which is 4-byte aligned this can
break on some targets.
Fix the issue by specifying a new non-aligned 4-byte
`unaligned_uint32_t` for src/dst.
Another alternative is to rely on memcpy/memcmp for
initializing/testing src/dst. Using memcpy for initializing in memcpy
tests, however, could lead to future bugs.
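A type along these lines can be written with GNU C attributes. The sketch below assumes GCC or Clang; the exact attribute list and the two accessor helpers are illustrative and may differ from what the tests use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 4-byte integer type with 1-byte alignment (GNU C extension).
   may_alias lets it overlay a character buffer.  */
typedef uint32_t __attribute__ ((may_alias, aligned (1))) unaligned_uint32_t;

/* Hypothetical accessors: store and load a 32-bit value at an arbitrary
   byte offset, letting the compiler emit an access that is safe for
   unaligned addresses.  */
void
store_u32_unaligned (unsigned char *buf, size_t off, uint32_t value)
{
  unaligned_uint32_t *p = (unaligned_uint32_t *) (buf + off);
  *p = value;
}

uint32_t
load_u32_unaligned (const unsigned char *buf, size_t off)
{
  const unaligned_uint32_t *p = (const unaligned_uint32_t *) (buf + off);
  return *p;
}
```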
This patch improves the user experience by looking at the magic value,
which is always written, but never checked. It should still be possible
to trigger a segmentation fault with crafted files, but this already
catches many cases.
Florian Weimer [Mon, 6 Dec 2021 07:01:08 +0000 (08:01 +0100)]
misc, nptl: Remove stray references to __condvar_load_64_relaxed
The function was renamed to __atomic_wide_counter_load_relaxed
in commit 8bd336a00a5311bf7a9e99b3b0e9f01ff5faa74b ("nptl: Extract
<bits/atomic_wide_counter.h> from pthread_cond_common.c").
Florian Weimer [Sun, 5 Dec 2021 12:50:17 +0000 (13:50 +0100)]
csu: Always use __executable_start in gmon-start.c
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Florian Weimer [Sun, 5 Dec 2021 10:28:34 +0000 (11:28 +0100)]
elf: execve statically linked programs instead of crashing [BZ #28648]
Programs without dynamic dependencies and without a program
interpreter are now run via execve.
Previously, the dynamic linker either crashed while attempting to
read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocation in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.
<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.
H.J. Lu [Sat, 4 Dec 2021 19:25:53 +0000 (11:25 -0800)]
Add --with-timeoutfactor=NUM to specify TIMEOUTFACTOR
On Ice Lake and Tiger Lake laptops, some test programs time out when there
are 3 "make check -j8" runs in parallel. Add --with-timeoutfactor=NUM to
specify an integer to scale the timeout of test programs, which can be
overridden by the TIMEOUTFACTOR environment variable.
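The precedence between the configured factor and the environment variable can be sketched like this. It is not the test harness code: scaled_timeout is a hypothetical helper, and ignoring non-numeric or non-positive values is an assumption of this sketch.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: scale BASE_SECONDS by the TIMEOUTFACTOR
   environment variable if it holds a positive integer, otherwise by
   CONFIGURED_FACTOR (the value chosen at configure time).  */
unsigned int
scaled_timeout (unsigned int base_seconds, unsigned int configured_factor)
{
  unsigned int factor = configured_factor;
  const char *env = getenv ("TIMEOUTFACTOR");
  if (env != NULL)
    {
      char *end;
      long value = strtol (env, &end, 10);
      /* Accept only a fully parsed, positive number.  */
      if (end != env && *end == '\0' && value > 0)
        factor = (unsigned int) value;
    }
  return base_seconds * factor;
}
```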
Noah Goldstein [Fri, 3 Dec 2021 23:29:25 +0000 (15:29 -0800)]
x86-64: Use notl in EVEX strcmp [BZ #28646]
Must use notl %edi here, as the lower bits are for CHAR comparisons that
are potentially out of range and thus can be 0 without indicating a mismatch.
This fixes BZ #28646.
Wilco Dijkstra [Thu, 2 Dec 2021 18:33:26 +0000 (18:33 +0000)]
AArch64: Improve A64FX memcpy
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Wilco Dijkstra [Thu, 2 Dec 2021 18:30:55 +0000 (18:30 +0000)]
AArch64: Optimize memcmp
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Matheus Castanho [Tue, 26 Oct 2021 13:44:59 +0000 (10:44 -0300)]
powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532]
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.
The syscall pipe2 was added in linux 2.6.27 and glibc requires linux
3.2.0. The patch removes the arch-specific implementation for alpha,
ia64, mips, sh, and sparc which require a different kernel ABI
than the usual one.
Checked on x86_64-linux-gnu and with a build for the affected ABIs.
Variadic function calls in syscalls.list do not work for all ABIs
(for instance where the arguments are passed on the stack instead of
registers) and might have underlying issues depending on the variadic
type (for instance if a 64-bit argument is used).
The LFS prlimit64 requires an arch-specific implementation in
syscalls.list. Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.
HPPA is the only outlier which requires a different default
symbol.
Checked on x86_64-linux-gnu and with build for the affected ABIs.
Passing 64-bit arguments in syscalls.list is tricky: it requires
reimplementing the expected kernel ABI in each architecture. This
is much better represented in C code, where we already have
macros for this (SYSCALL_LL64).
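To illustrate why this is easier in C, the sketch below splits a 64-bit value into the two 32-bit halves that a 32-bit ABI passes as separate arguments. It is not the SYSCALL_LL64 macro: struct u32_pair and split_u64 are hypothetical, and the order of the halves and any register-pair alignment depend on the ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves of a 64-bit argument.  */
struct u32_pair
{
  uint32_t high;
  uint32_t low;
};

/* Split VALUE into its upper and lower 32 bits.  */
struct u32_pair
split_u64 (uint64_t value)
{
  struct u32_pair halves;
  halves.high = (uint32_t) (value >> 32);
  halves.low = (uint32_t) (value & 0xffffffffu);
  return halves;
}
```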