Mike FABIAN [Thu, 1 Jun 2023 15:02:44 +0000 (17:02 +0200)]
Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR
I made it to agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).
It seems to be impossible to follow the CLDR rules
&[before 1]๚<ฯ # should be "variable"
and
&๛<ๆ # should be "variable"
exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary differnence instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.
There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.
Using the existing collation rules in the th_TH locale did not sort that
test file completely correct, I think my new collation rules based on
iso14651_t1 are better.
Joseph Myers [Wed, 20 Sep 2023 13:36:46 +0000 (13:36 +0000)]
Update kernel version to 6.5 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py
and tst-pidfd-consts.py to 6.5. (There are no new constants covered
by these tests in 6.5 that need any other header changes;
tst-mount-consts.py was updated separately along with a header
constant addition.)
Key Points:
1. On lasx & lsx platforms, We must use _dl_runtime_{profile, resolve}_{lsx, lasx}
to save vector registers.
2. Via "tunables", users can choose str/mem_{lasx,lsx,unaligned} functions with
`export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,...`.
Note: glibc.cpu.hwcaps doesn't affect _dl_runtime_{profile, resolve}_{lsx, lasx}
selection.
Usage Notes:
1. Only valid inputs: LASX, LSX, UAL. Case-sensitive, comma-separated, no spaces.
2. Example: `export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL` turns on LASX & UAL.
Unmentioned features turn off. With default ifunc: lasx > lsx > unaligned >
aligned > generic, effect is: lasx > unaligned > aligned > generic; lsx off.
3. Incorrect GLIBC_TUNABLES settings will show error messages.
For example: On lsx platforms, you cannot enable lasx features. If you do
that, you will get error messages.
4. Valid input examples:
- GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX: lasx > aligned > generic.
- GLIBC_TUNABLES=glibc.cpu.hwcaps=LSX,UAL: lsx > unaligned > aligned > generic.
- GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL,LASX,UAL,LSX,LASX,UAL: Repetitions
allowed but not recommended. Results in: lasx > lsx > unaligned > aligned >
generic.
Wilco Dijkstra [Tue, 15 Aug 2023 17:01:53 +0000 (18:01 +0100)]
math: Add a no-mathvec flag for sin (-0.0)
Add support for a no-mathvec flag to gen-auto-libm-tests.c.
Update input test sin (-0.0) to be skipped in vector math libraries and
regenerate testcases.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Mike FABIAN [Thu, 14 Sep 2023 16:01:40 +0000 (18:01 +0200)]
Update to Unicode 15.1.0 [BZ #30854]
Unicode 15.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Total removed characters in newly generated CHARMAP: 0
Total changed characters in newly generated CHARMAP: 0
Total added characters in newly generated CHARMAP: 627
Total removed characters in newly generated WIDTH: 0
Total changed characters in newly generated WIDTH: 0
Total added characters in newly generated WIDTH: 627
alpha: Added 622 characters in new ctype which were not in old ctype
graph: Added 627 characters in new ctype which were not in old ctype
print: Added 627 characters in new ctype which were not in old ctype
punct: Added 5 characters in new ctype which were not in old ctype
The five characters added to punct are:
2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;
The Unicode announcement blog entry says "[...] adds 627
characters, [...] additions include 622 CJK unified ideographs in
a new block, [...]", so that looks OK. The Unicode
blog mentions "six completely new emoji" but they don't appear here as
they are all sequences and not single code points.
getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.
The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query. For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup. In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name. This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.
Fix this by copying h_name over and freeing it at the end. This
resolves BZ #30843, which is assigned CVE-2023-4806.
LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
According to glibc strrchr microbenchmark test results, this implementation
could reduce the runtime time as following:
Name Percent of rutime reduced
strrchr-lasx 10%-50%
strrchr-lsx 0%-50%
strrchr-aligned 5%-50%
Generic strrchr is implemented by function strlen + memrchr, the lasx version
will compare with generic strrchr implemented by strlen-lasx + memrchr-lasx,
the lsx version will compare with generic strrchr implemented by strlen-lsx +
memrchr-lsx, the aligned version will compare with generic strrchr implemented
by strlen-aligned + memrchr-generic.
LoongArch: Add ifunc support for strcpy, stpcpy{aligned, unaligned, lsx, lasx}
According to glibc strcpy and stpcpy microbenchmark test results(changed
to use generic_strcpy and generic_stpcpy instead of strlen + memcpy),
comparing with the generic version, this implementation could reduce the
runtime as following:
Name Percent of rutime reduced
strcpy-aligned 8%-45%
strcpy-unaligned 8%-48%, comparing with the aligned version, unaligned
version takes less instructions to copy the tail of data
which length is less than 8. it also has better performance
in case src and dest cannot be both aligned with 8bytes
strcpy-lsx 20%-80%
strcpy-lasx 15%-86%
stpcpy-aligned 6%-43%
stpcpy-unaligned 8%-48%
stpcpy-lsx 10%-80%
stpcpy-lasx 10%-87%
Joseph Myers [Thu, 14 Sep 2023 14:58:15 +0000 (14:58 +0000)]
Add MOVE_MOUNT_BENEATH from Linux 6.5 to sys/mount.h
This patch adds the MOVE_MOUNT_BENEATH constant from Linux 6.5 to
glibc's sys/mount.h and updates tst-mount-consts.py to reflect these
constants being up to date with that Linux kernel version.
CVE-2023-4527: Stack read overflow with large TCP responses in no-aaaa mode
Without passing alt_dns_packet_buffer, __res_context_search can only
store 2048 bytes (what fits into dns_packet_buffer). However,
the function returns the total packet size, and the subsequent
DNS parsing code in _nss_dns_gethostbyname4_r reads beyond the end
of the stack-allocated buffer.
Joseph Myers [Tue, 12 Sep 2023 14:08:53 +0000 (14:08 +0000)]
Update syscall lists for Linux 6.5
Linux 6.5 has one new syscall, cachestat, and also enables the
cacheflush syscall for hppa. Update syscall-names.list and regenerate
the arch-syscall.h headers with build-many-glibcs.py update-syscalls.
ia64: Work around miscompilation and fix build on ia64's gcc-10 and later
Needed since gcc-10 enabled -fno-common by default.
[In use in Gentoo since gcc-10, no problems observed.
Also discussed with and reviewed by Jessica Clarke from
Debian. Andreas]
Bug: https://bugs.gentoo.org/723268 Reviewed-by: Carlos O'Donell <carlos@redhat.com> Signed-off-by: Sergei Trofimovich <slyich@gmail.com> Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Adam Jackson [Fri, 8 Sep 2023 19:55:19 +0000 (15:55 -0400)]
libio: Fix oversized __io_vtables
IO_VTABLES_LEN is the size of the struct array in bytes, not the number
of __IO_jump_t's in the array. Drops just under 384kb from .rodata on
LP64 machines.
Fixes: 3020f72618e ("libio: Remove the usage of __libc_IO_vtables") Signed-off-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Florian Weimer <fweimer@redhat.com> Tested-by: Florian Weimer <fweimer@redhat.com>
When backporting commmit 6985865bc3ad5b23147ee73466583dd7fdf65892
("elf: Always call destructors in reverse constructor order
(bug 30785)"), we can move the l_init_called_next field to this
place, so that the internal GLIBC_PRIVATE ABI does not change.
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors. Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles. (The force_first handling in
_dl_sort_maps is not effective for dlclose.) After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.
A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.
In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated. (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.
After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order. These destructors
can call dlopen. Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them. Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)
After _dl_init_called_list traversal, two more loops follow. The
processing order changes to the original link map order in the
namespace. Previously, dependency order was used. The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.
The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list. The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.
tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications. The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.
The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.
There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.
Aurelien Jarno [Mon, 28 Aug 2023 21:30:37 +0000 (23:30 +0200)]
io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64
Commit 5f828ff824e3b7cd1 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.
Fix that by also checking that define, restoring the pre 4d0fe291aed3a476a commit values:
And shorten the section/node names a bit, so that the menu
entries become easier to read.
Texinfo 6.5 fails to process the previous structure:
./dynlink.texi:56: warning: node `Dynamic Linker Introspection' is
next for `Dynamic Linker Diagnostics' in sectioning but not in menu
./dynlink.texi:56: warning: node up `Dynamic Linker Diagnostics'
in menu `Dynamic Linker Invocation' and
in sectioning `Dynamic Linker' differ
./dynlink.texi:1: node `Dynamic Linker' lacks menu item for
`Dynamic Linker Diagnostics' despite being its Up target
./dynlink.texi:226: warning: node prev `Dynamic Linker Introspection' in menu `Dynamic Linker Invocation'
and in sectioning `Dynamic Linker Diagnostics' differ
riscv: Add support for XTheadBb in string-fz[a,i].h
XTheadBb has similar instructions like Zbb, which allow optimized
string processing:
* th.ff0: find-first zero is a CLZ instruction.
* th.tstnbz: Similar like orc.b, but with a bit-inverted result.
The instructions are documented here:
https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb
These instructions can be found in the T-Head C906 and the C910.
Tested with the string tests.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This code is generally unused in practice since there don't seem to be
any NSS modules that only implement _nss_MOD_gethostbyname2_r and not
_nss_MOD_gethostbyname3_r.
This interface allows to obtain the associated process ID from the
process file descriptor. It is done by parsing the procps fdinfo
information. Its prototype is:
pid_t pidfd_getpid (int fd)
It returns the associated pid or -1 in case of an error and sets the
errno accordingly. The possible errno values are those from open, read,
and close (used on procps parsing), along with:
- EBADF if the FD is negative, does not have a PID associated, or if
the fdinfo fields contain a value larger than pid_t.
- EREMOTE if the PID is in a separate namespace.
- ESRCH if the process is already terminated.
Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
Returning a pidfd allows a process to keep a race-free handle for a
child process, otherwise, the caller will need to either use pidfd_open
(which still might be subject to TOCTOU) or keep the old racy interface
base on pid_t.
To correct use pifd_spawn, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux
5.4). If kernel does not support the waitid, pidfd return ENOSYS.
It avoids the need to racy workarounds, such as reading the procfs
fdinfo to get the pid to use along with other wait interfaces.
These interfaces are similar to the posix_spawn and posix_spawnp, with
the only difference being it returns a process file descriptor (int)
instead of a process ID (pid_t). Their prototypes are:
A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the return
argument lifetime. Although on Linux pid_t and int are interchangeable,
POSIX only states that pid_t should be a signed integer.
Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to void rehash posix_spawn API or add a new one. It
also means that both interfaces support the same attribute and file
actions, and a new flag or file action on posix_spawn is also added
automatically for pidfd_spawn.
Also, using posix_spawn plumbing allows the reusing of most of the
current testing with some changes:
- waitid is used instead of waitpid since it is a more generic
interface.
- tst-posix_spawn-setsid.c is adapted to take into consideration that
the caller can check for session id directly. The test now spawns
itself and writes the session id as a file instead.
- tst-spawn3.c need to know where pidfd_spawn is used so it keeps an
extra file description unused.
Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2. Reviewed-by: Florian Weimer <fweimer@redhat.com>
These functions allow to posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, allowing the child process to
be created in a different cgroup version 2. These are GNU
extensions that are available only for Linux, and also only
for the architectures that implement clone3 wrapper
(HAVE_CLONE3_WRAPPER).
To create a process on a different cgroupv2, one can use the:
Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP control
whether the cgroup file descriptor will be used or not with
clone3.
There is no fallback if either clone3 does not support the flag
or if the architecture does not provide the clone3 wrapper, in
this case posix_spawn returns EOPNOTSUPP.
Bruno Haible [Mon, 4 Sep 2023 13:31:36 +0000 (15:31 +0200)]
intl: Treat C.UTF-8 locale like C locale (BZ# 16621)
The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
says that "Setting LC_ALL=C.UTF-8 will ignore LANGUAGE just like it
does with LC_ALL=C." This patch implements it.
* intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
like the C locale.
Szabolcs Nagy [Tue, 16 Feb 2021 12:55:13 +0000 (12:55 +0000)]
elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.
It may be possible to detect up-to-date dtv faster. But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.
This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once. The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.
This patch uses acquire/release synchronization when accessing the
generation counter.
Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.
I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.
H.J. Lu [Mon, 28 Aug 2023 19:08:14 +0000 (12:08 -0700)]
x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of
CPUDID leaf 2 was needed to retrieve the complete cache information. The
newer Intel manual has been changed to that it should always return 1
and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all. This addresses BZ #30643.
Florian Weimer [Tue, 29 Aug 2023 06:28:31 +0000 (08:28 +0200)]
nscd: Skip unusable entries in first pass in prune_cache (bug 30800)
Previously, if an entry was marked unusable for any reason, but had
not timed out yet, the assert would trigger.
One way to get into such state is if a data change is detected during
re-validation of an entry. This causes the entry to be marked as not
usable. If exits nscd soon after that, then the clock jumps
backwards, and nscd restarted, the cache re-validation run after
startup triggers the removed assert.
The change is more complicated than just the removal of the assert
because entries marked as not usable should be garbage-collected in
the second pass. To make this happen, it is necessary to update some
book-keeping data.
dengjianbo [Mon, 28 Aug 2023 02:08:39 +0000 (10:08 +0800)]
LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
According to glibc memcmp microbenchmark test results(Add generic
memcmp), this implementation have performance improvement
except the length is less than 3, details as below:
Name Percent of time reduced
memcmp-lasx 16%-74%
memcmp-lsx 20%-50%
memcmp-aligned 5%-20%
dengjianbo [Mon, 28 Aug 2023 02:08:38 +0000 (10:08 +0800)]
LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
According to glibc memset microbenchmark test results, for LSX and LASX
versions, A few cases with length less than 8 experience performace
degradation, overall, the LASX version could reduce the runtime about
15% - 75%, LSX version could reduce the runtime about 15%-50%.
The unaligned version uses unaligned memmory access to set data which
length is less than 64 and make address aligned with 8. For this part,
the performace is better than aligned version. Comparing with the generic
version, the performance is close when the length is larger than 128. When
the length is 8-128, the unaligned version could reduce the runtime about
30%-70%, the aligned version could reduce the runtime about 20%-50%.
dengjianbo [Mon, 28 Aug 2023 02:08:35 +0000 (10:08 +0800)]
LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
According to glibc rawmemchr microbenchmark, A few cases tested with
char '\0' experience performance degradation due to the lasx and lsx
versions don't handle the '\0' separately. Overall, rawmemchr-lasx
implementation could reduce the runtime about 40%-80%, rawmemchr-lsx
implementation could reduce the runtime about 40%-66%, rawmemchr-aligned
implementation could reduce the runtime about 20%-40%.
m68k: Use M68K_SCALE_AVAILABLE on __mpn_lshift and __mpn_rshift
This patch adds a new macro, M68K_SCALE_AVAILABLE, similar to gmp
scale_available_p (mpn/m68k/m68k-defs.m4) that expand to 1 if a
scale factor can be used in addressing modes. This is used
instead of __mc68020__ for some optimization decisions.
Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
m68k: Fix build with -mcpu=68040 or higher (BZ 30740)
GCC currently does not define __mc68020__ for -mcpu=68040 or higher,
which memcpy/memmove assumptions. Since this memory copy optimization
seems only intended for m68020, disable for other m680X0 variants.
Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
dengjianbo [Thu, 24 Aug 2023 08:50:19 +0000 (16:50 +0800)]
LoongArch: Add ifunc support for strncmp{aligned, lsx}
Based on the glibc microbenchmark, only a few short inputs with this
strncmp-aligned and strncmp-lsx implementation experience performance
degradation, overall, strncmp-aligned could reduce the runtime 0%-10%
for aligned comparision, 10%-25% for unaligend comparision, strncmp-lsx
could reduce the runtime about 0%-60%.
dengjianbo [Thu, 24 Aug 2023 08:50:18 +0000 (16:50 +0800)]
LoongArch: Add ifunc support for strcmp{aligned, lsx}
Based on the glibc microbenchmark, strcmp-aligned implementation could
reduce the runtime 0%-10% for aligned comparison, 10%-20% for unaligned
comparison, strcmp-lsx implemenation could reduce the runtime 0%-50%.
dengjianbo [Thu, 24 Aug 2023 08:50:17 +0000 (16:50 +0800)]
LoongArch: Add ifunc support for strnlen{aligned, lsx, lasx}
Based on the glibc microbenchmark, strnlen-aligned implementation could
reduce the runtime more than 10%, strnlen-lsx implementation could reduce
the runtime about 50%-78%, strnlen-lasx implementation could reduce the
runtime about 50%-88%.
Florian Weimer [Fri, 4 Aug 2023 10:44:01 +0000 (12:44 +0200)]
Linux: Avoid conflicting types in ld.so --list-diagnostics
The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value. Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.
H.J. Lu [Thu, 17 Aug 2023 16:42:29 +0000 (09:42 -0700)]
x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:
Before After Improvement
max 63.349 58.347 8%
min 4.448 5.651 -30%
mean 12.0674 10.336 14%
The minimum code path is
if (hx < 0x3FDA827A) /* x < 0.41422 */
{
if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */
{
...
}
if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */
{
math_force_eval (two54 + x); /* raise inexact */
if (ax < 0x3c900000) /* |x| < 2**-54 */
{
...
}
else
return x - x * x * 0.5;
FMA and non-FMA code sequences look similar. Non-FMA version is slightly
faster. Since log1p is called by asinh and atanh, it improves asinh
performance by:
Before After Improvement
max 75.645 63.135 16%
min 10.074 10.071 0%
mean 15.9483 14.9089 6%
and improves atanh performance by:
Before After Improvement
max 91.768 75.081 18%
min 15.548 13.883 10%
mean 18.3713 16.8011 8%
Mahesh Bodapati [Fri, 11 Aug 2023 15:38:25 +0000 (10:38 -0500)]
string: Fix tester build with fortify enable with gcc < 12
When building with fortify enabled, GCC < 12 issues a warning on the
fortify strncat wrapper might overflow the destination buffer (the
failure is tied to -Werror).
Checked on ppc64 and x86_64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Stefan Liebler [Mon, 14 Aug 2023 12:22:24 +0000 (14:22 +0200)]
s390x: Fix static PIE condition for toolchain bootstrapping.
The static PIE configure check uses link tests. When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so. As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.
dengjianbo [Tue, 15 Aug 2023 01:11:53 +0000 (09:11 +0800)]
Loongarch: Add ifunc support for memcpy{aligned, unaligned, lsx, lasx} and memmove{aligned, unaligned, lsx, lasx}
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx reduces the runtime about 8%-76%
memcpy-lsx reduces the runtime about 8%-72%
memcpy-unaligned reduces the runtime of unaligned data copying up to 40%
memcpy-aligned reduece the runtime of unaligned data copying up to 25%
memmove-lasx reduces the runtime about 20%-73%
memmove-lsx reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned reduces the runtime of unaligned data moving up to 25%
dengjianbo [Tue, 15 Aug 2023 01:08:11 +0000 (09:08 +0800)]
Loongarch: Add ifunc support for strchr{aligned, lsx, lasx} and strchrnul{aligned, lsx, lasx}
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx reduces the runtime about 50%-83%
strchr-lsx reduces the runtime about 30%-67%
strchr-aligned reduces the runtime about 10%-20%
strchrnul-lasx reduces the runtime about 50%-83%
strchrnul-lsx reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.
Reviewed-by: DJ Delorie <dj@redhat.com> Signed-off-by: Sam James <sam@gentoo.org>
Florian Weimer [Thu, 10 Aug 2023 17:36:56 +0000 (19:36 +0200)]
malloc: Remove bin scanning from memalign (bug 30723)
On the test workload (mpv --cache=yes with VP9 video decoding), the
bin scanning has a very poor success rate (less than 2%). The tcache
scanning has about 50% success rate, so keep that.
Update comments in malloc/tst-memalign-2 to indicate the purpose
of the tests. Even with the scanning removed, the additional
merging opportunities since commit 542b1105852568c3ebc712225ae78b
("malloc: Enable merging of remainders in memalign (bug 30723)")
are sufficient to pass the existing large bins test.
Remove leftover variables from _int_free from refactoring in the
same commit.
H.J. Lu [Fri, 11 Aug 2023 15:04:08 +0000 (08:04 -0700)]
x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:
Before After Improvement
max 70.204 68.054 3%
min 20.709 16.2 22%
mean 22.1221 16.7367 24%
NB: Add
extern long double __expm1l (long double);
extern long double __expm1f128 (long double);
for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
dengjianbo [Tue, 8 Aug 2023 06:15:44 +0000 (14:15 +0800)]
Loongarch: Add ifunc support and add different versions of strlen
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess