H.J. Lu [Fri, 10 May 2024 03:07:01 +0000 (20:07 -0700)]
Force DT_RPATH for --enable-hardcoded-path-in-tests
On Fedora 40/x86-64, the linker defaults to --enable-new-dtags, which
generates DT_RUNPATH instead of DT_RPATH. Unlike DT_RPATH, DT_RUNPATH
only applies to DT_NEEDED entries in the executable and doesn't apply
to DT_NEEDED entries in shared libraries which are loaded via DT_NEEDED
entries in the executable. Some glibc tests have libstdc++.so.6 in
DT_NEEDED, which has libm.so.6 in DT_NEEDED. When DT_RUNPATH is generated,
/lib64/libm.so.6 is loaded for such tests. If the newly built glibc is
older than glibc 2.36, these tests fail with:
assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_2.36' not found (required by /lib64/libm.so.6)
assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by /lib64/libm.so.6)
Pass -Wl,--disable-new-dtags to the linker when building glibc tests
with --enable-hardcoded-path-in-tests. This fixes BZ #31719.
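For illustration (a hedged sketch; the program name and rpath are made
up), the effect of the flag is visible with readelf:

  $ gcc -Wl,-rpath,/path/to/build -o tst tst.c
  $ readelf -d tst | grep -E 'RPATH|RUNPATH'    # RUNPATH with new dtags
  $ gcc -Wl,-rpath,/path/to/build -Wl,--disable-new-dtags -o tst tst.c
  $ readelf -d tst | grep -E 'RPATH|RUNPATH'    # DT_RPATH again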
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
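A minimal sketch of the arrangement described (illustrative; the real
definitions live in each port's bits/wordsize.h):

  /* On a 32-bit port whose __time_t is 32 bits: */
  #define __WORDSIZE 32
  #define __WORDSIZE_TIME64_COMPAT32 1
  /* With this set, the structs describing files under /var/log keep
     their 32-bit on-disk layout regardless of _TIME_BITS. */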
CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)
This avoids potential memory corruption when the underlying NSS
callback function does not use the buffer space to store all strings
(e.g., for constant strings).
Instead of custom buffer management, two scratch buffers are used.
This increases stack usage somewhat.
Scratch buffer allocation failure is handled by returning -1
(an invalid timeout value) instead of terminating the process.
This fixes bug 31679.
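A hedged sketch of the two-buffer approach (illustrative, not the
actual nscd code; <scratch_buffer.h> is a glibc-internal header):

  #include <scratch_buffer.h>
  #include <time.h>

  static time_t
  netgroup_query_sketch (size_t needed)
  {
    struct scratch_buffer nss_buf;  /* strings from the NSS callback */
    struct scratch_buffer out_buf;  /* assembled response record */
    scratch_buffer_init (&nss_buf);
    scratch_buffer_init (&out_buf);

    /* Grow each buffer independently on demand.  */
    while (nss_buf.length < needed)
      if (!scratch_buffer_grow (&nss_buf))
        {
          scratch_buffer_free (&out_buf);
          return (time_t) -1;       /* invalid timeout, do not abort */
        }

    /* ... run the NSS callback into nss_buf, assemble into out_buf ... */

    scratch_buffer_free (&nss_buf);
    scratch_buffer_free (&out_buf);
    return 0;
  }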
The addgetnetgrentX call in addinnetgrX may have failed to produce
a result, so the result variable in addinnetgrX can be NULL.
Use db->negtimeout as the fallback value if there is no result data;
the timeout is also overwritten below.
Also avoid sending a second not-found response. (The client
disconnects after receiving the first response, so the data stream did
not go out of sync even without this fix.) It is still beneficial to
add the negative response to the mapping, so that the client can get
it from there in the future, instead of going through the socket.
Charles Fol [Thu, 28 Mar 2024 15:25:38 +0000 (12:25 -0300)]
iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961)
ISO-2022-CN-EXT uses escape sequences to indicate character set changes
(as specified by RFC 1922). While the SOdesignation has the expected
bounds checks, neither SS2designation nor SS3designation has them,
allowing a write overflow of 1, 2, or 3 bytes with fixed values:
'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)
powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Wilco Dijkstra [Thu, 21 Mar 2024 16:48:33 +0000 (16:48 +0000)]
AArch64: Check kernel version for SVE ifuncs
Old Linux kernels disable SVE after every system call. Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy. This is true for kernels from 4.15.0 up to (but not
including) 6.2.0, except for 5.14.0, which was patched. Avoid this by
checking the kernel
version and selecting the SVE ifunc on modern kernels.
Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions. If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.
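A hedged sketch of that parsing idea (illustrative, not the actual
cpu-features code; the packing layout is an assumption):

  #include <sys/utsname.h>

  /* Pack "major.minor" from uname() into one comparable integer,
     parsing by hand so no library string routines are needed.  */
  static int
  kernel_version_sketch (void)
  {
    struct utsname buf;
    if (uname (&buf) != 0)
      return -1;                      /* unknown: assume modern */
    int major = 0, minor = 0;
    const char *p = buf.release;
    while (*p >= '0' && *p <= '9')
      major = major * 10 + *p++ - '0';
    if (*p++ != '.')
      return -1;                      /* unrecognized: assume modern */
    while (*p >= '0' && *p <= '9')
      minor = minor * 10 + *p++ - '0';
    return (major << 16) | (minor << 8);  /* e.g. "6.2.0" -> 0x060200 */
  }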
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)
Szabolcs Nagy [Wed, 13 Mar 2024 14:34:14 +0000 (14:34 +0000)]
aarch64: fix check for SVE support in assembler
Due to GCC bug 110901, -mcpu can override the -march setting when
compiling asm code, so a compiler targeting a specific cpu can fail
the configure check even when binutils gas supports SVE.
The workaround is that an explicit .arch directive overrides both
-mcpu and -march; since that is what the actual SVE memcpy uses, the
configure check should use it too, even if the GCC issue is fixed
independently.
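A hedged sketch of such a configure probe (the directive/instruction
pair is illustrative):

  /* conftest.S: assembles iff gas supports SVE, regardless of -mcpu */
          .arch armv8.2-a+sve
          ptrue p0.b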
Andreas Schwab [Thu, 23 Nov 2023 17:23:46 +0000 (18:23 +0100)]
aarch64: correct CFI in rawmemchr (bug 31113)
The .cfi_return_column directive changes the return column for the whole
FDE range. But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.
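For illustration, the annotation pattern the fix expresses (a sketch,
not the verbatim source):

          mov     x15, x30
          .cfi_register 30, 15    /* x30 (lr) now lives in x15 */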
Wilco Dijkstra [Thu, 26 Oct 2023 16:30:36 +0000 (17:30 +0100)]
AArch64: Remove Falkor memcpy
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
Wilco Dijkstra [Thu, 26 Oct 2023 16:07:21 +0000 (17:07 +0100)]
AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Wilco Dijkstra [Tue, 24 Oct 2023 12:51:07 +0000 (13:51 +0100)]
AArch64: Cleanup ifuncs
Clean up the ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Wilco Dijkstra [Tue, 17 Oct 2023 15:54:21 +0000 (16:54 +0100)]
AArch64: Add support for MOPS memcpy/memmove/memset
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Wilco Dijkstra [Wed, 1 Feb 2023 18:45:19 +0000 (18:45 +0000)]
AArch64: Improve SVE memcpy and memmove
Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.
Wilco Dijkstra [Wed, 11 Jan 2023 13:53:19 +0000 (13:53 +0000)]
AArch64: Improve strrchr
Use shrn for narrowing the mask which simplifies code and speeds up small
strings. Unroll the first search loop to improve performance on large
strings.
Wilco Dijkstra [Wed, 11 Jan 2023 13:53:05 +0000 (13:53 +0000)]
AArch64: Optimize strnlen
Optimize strnlen using the shrn instruction and improve the main loop.
Small strings are around 10% faster, large strings are 40% faster on
modern CPUs.
Wilco Dijkstra [Wed, 26 Oct 2022 13:16:50 +0000 (14:16 +0100)]
aarch64: Use memcpy_simd as the default memcpy
Since __memcpy_simd is the fastest memcpy on almost all cores, replace
the generic memcpy with it. If SVE is available, an SVE memcpy will be
used by default (including for Neoverse N2).
Danila Kutenin [Mon, 27 Jun 2022 16:12:13 +0000 (16:12 +0000)]
aarch64: Optimize string functions with shrn instruction
We found that string functions were using AND+ADDP to find the
nibble/syndrome mask, but there is an easier alternative:
`SHRN dst.8b, src.8h, 4` (shift right every 2 bytes by 4 and narrow
to 1 byte), which has the same latency as ADDP on all SIMD ARMv8
targets. There are also possible gains for memcmp, but that's for
another patch.
We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.
Wilco Dijkstra [Tue, 7 Jun 2022 15:44:35 +0000 (16:44 +0100)]
AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Clean up the memcpy and memmove ifunc selectors.
Wilco Dijkstra [Thu, 2 Dec 2021 18:30:55 +0000 (18:30 +0000)]
AArch64: Optimize memcmp
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
This restores the glibc 2.33 semantics for arena_get2, which were
changed by commit 11a02b035b46 to avoid arena_get2 calling malloc
(back when __get_nprocs was refactored to use a scratch_buffer -
903bc7dcc2acafc). __get_nprocs has since been refactored again and now
also avoids calling malloc.
Commit 11a02b035b46 did not take into consideration the performance
implications, which should have been discussed properly.
__get_nprocs_sched is still used as a fallback mechanism if procfs and
sysfs are not accessible.
The ffsll function randomly regresses by ~20%, depending on how the
code gets aligned in memory. The ffsll function's code size is 17
bytes. Since the default function alignment is 16 bytes, the function
can start at a 16-, 32-, 48-, or 64-byte-aligned address. When it
starts at a 16-, 32-, or 64-byte-aligned address, the entire code fits
in a single 64-byte cache line. When it starts at a 48-byte-aligned
address, the code is split across two cache lines, hence the random
regression.
Reducing the function's size from 17 bytes to 12 bytes ensures that it
always fits in a single 64-byte cache line.
This patch fixes the random performance regression of ffsll.
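The arithmetic behind the fix, as a hedged sketch (names are
illustrative):

  /* A SIZE-byte function starting OFF bytes into a 64-byte cache line
     spans two lines iff OFF + SIZE > 64.  With size 17, only offset 48
     fails (48 + 17 = 65); with size 12, every 16-byte-aligned offset
     fits (48 + 12 = 60). */
  static int
  crosses_cache_line (unsigned int off, unsigned int size)
  {
    return off % 64 + size > 64;
  }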
H.J. Lu [Thu, 21 Dec 2023 03:42:12 +0000 (19:42 -0800)]
x86-64: Fix the tcb field load for x32 [BZ #31185]
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:
_dl_tlsdesc_undefweak:
_CET_ENDBR
movq 8(%rax), %rax
subq %fs:0, %rax
ret
_dl_tlsdesc_dynamic:
...
subq %fs:0, %rax
movq -8(%rsp), %rdi
ret
Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64. This
fixes BZ #31185.
H.J. Lu [Thu, 21 Dec 2023 00:31:43 +0000 (16:31 -0800)]
x86-64: Fix the dtv field load for x32 [BZ #31184]
On x32, I got
FAIL: elf/tst-tlsgap
$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so
Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108 movq (%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)
This is caused by
_dl_tlsdesc_dynamic:
_CET_ENDBR
/* Preserve call-clobbered registers that we modify.
We need two scratch regs anyway. */
movq %rsi, -16(%rsp)
movq %fs:DTV_OFFSET, %rsi
Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit. Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.
_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.
Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.
Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.
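For example (the value is illustrative), an affected program can be
run with a larger reserve:

  GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4096 ./affected-program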
tunables: Terminate if end of input is reached (CVE-2023-4911)
The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.
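As an illustration (a real tunable name, but a made-up setting), a
malformed string of that shape is:

  GLIBC_TUNABLES=glibc.malloc.mxfast=glibc.malloc.mxfast=4096

After the double processing, tunestr would hold
glibc.malloc.mxfast=glibc.malloc.mxfast=4096:glibc.malloc.mxfast=4096,
which is longer than the input string.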
Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.
This also fixes up tst-env-setuid-tunables to actually handle failures
correctly and adds new tests to validate the fix for this CVE.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa)
getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.
The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query. For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup. In this case, if the first call reallocates tmpbuf
enough times to result in a malloc, th->h_name (which res->at->name
refers to) ends up in heap-allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name. This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.
Fix this by copying h_name over and freeing it at the end. This
resolves BZ #30843, which is assigned CVE-2023-4806.
All other cases of failures due to lack of memory return EAI_MEMORY, so
it seems wrong to return EAI_SYSTEM here. The only reason
convert_hostent_to_gaih_addrtuple could fail is on calloc failure.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b587456c0e7b59dcfdbd2d44db000a3bc8244e57)
Introduce the gaih_result structure and general paradigm for cleanups
that follow to process the lookup request and return a result. A lookup
function (like text_to_binary_address) should return an integer error
code and set members of gaih_result based on what it finds. If the
function does not have a result and no errors have occurred during the
lookup, it should return 0 and res.at should be set to NULL, allowing a
subsequent function to do the lookup until we run out of options.
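A hedged sketch of that contract (the struct members and the function
body are illustrative, not the actual code):

  struct gaih_addrtuple;           /* opaque for this sketch */

  struct gaih_result
  {
    struct gaih_addrtuple *at;     /* result list, NULL if none yet */
  };

  /* Return an EAI_* code on error; otherwise return 0, with res->at
     set on success or left NULL to let the next lookup function try. */
  static int
  lookup_step_sketch (const char *name, struct gaih_result *res)
  {
    res->at = NULL;                /* no result, no error */
    (void) name;
    return 0;
  }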
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 26dea461191cca519b498890a9682fe4bc8e4c2f)
Refactor the code to split out the service resolution code into a
separate function. Allocate the service tuples array just once to the
size of the typeproto array, thus avoiding the unnecessary pointer
chasing and stack allocations.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8d6cf99f2fb81a097f9334c125e5c23604af1a98)
Use realloc in convert_hostent_to_gaih_addrtuple and fix up pointers in
the result list so that a single block is maintained for
hostbyname3_r/hostbyname2_r and freed in gaih_inet. This result is
never merged with any other results, since the hosts database does not
permit merging.
Resolves BZ #28852.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 300460460706ce3ffe29a7df8966e68323ec5bf1)
Simplify logic for allocation of canon to remove the canonbuf variable;
canon now always points to an allocated block. Also pull the canon name
set into a separate function.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit d01411f6bc61429fc027c38827bf3103b48eef2e)
Simplify allocations and fix merge and continue actions [BZ #28931]
Allocation of address tuples is currently a bit confusing because of
the pointer chasing through PAT, making it hard to observe the sequence
in which allocations have been made. Narrow the scope of the pointer
chasing through PAT so that it is only used where necessary.
This also tightens actions behaviour with the hosts database in
getaddrinfo to comply with the manual text. The "continue" action
discards previous results and the "merge" action results in an immediate
lookup failure. Consequently, chaining of allocations across modules is
no longer necessary, thus opening up cleanup opportunities.
A test has been added that checks some combinations to ensure that they
work correctly.
Resolves: BZ #28931
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 1c37b8022e8763fedbb3f79c02e05c6acfe5a215)
x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.
On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8b9a0af8ca012217bf90d1dc0694f85b49ae09da)
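A hedged sketch of the bound just described (variable names are
illustrative, not the actual cacheinfo code):

  static unsigned long
  nt_threshold_sketch (unsigned long total_l3,
                       unsigned long per_thread_l3,
                       unsigned long divisor)
  {
    unsigned long threshold = total_l3 / divisor;     /* new value */
    unsigned long old_bound = per_thread_l3 * 3 / 4;  /* prior value */
    return threshold < old_bound ? old_bound : threshold;
  }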
x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
After splitting `shared` (cumulative cache size) from
`shared_per_thread` (cache size per socket), the `shared_per_thread`
value *can* be slightly off from the previous calculation.
Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.
This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 47f747217811db35854ea06741be3685e8bbd44d)
Noah Goldstein [Wed, 7 Jun 2023 18:18:01 +0000 (13:18 -0500)]
x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
The current `non_temporal_threshold` is set to roughly '3/4 *
sizeof_L3 / ncores_per_socket'. This patch updates that value to
roughly 'sizeof_L3 / 4'.
The original value (specifically dividing the `ncores_per_socket`) was
done to limit the amount of other threads' data a `memcpy`/`memset`
could evict.
Dividing by 'ncores_per_socket', however, leads to exceedingly low
non-temporal thresholds and to using non-temporal stores in cases
where REP MOVSB is multiple times faster.
Furthermore, non-temporal stores are written directly to main memory
so using it at a size much smaller than L3 can place soon to be
accessed data much further away than it otherwise could be. As well,
modern machines are able to detect streaming patterns (especially if
REP MOVSB is used) and provide LRU hints to the memory subsystem. This
in effect caps the total amount of eviction at 1/cache_associativity,
far below meaningfully thrashing the entire cache.
As best I can tell, the benchmarks that led to this small threshold
were done comparing non-temporal stores versus standard cacheable
stores. A better comparison (linked below) is to REP MOVSB which, on
the measured systems, is nearly 2x faster than non-temporal stores at
the low end of the previous threshold, and within 10% for over 100MB
copies (well past even the current threshold). In cases with a low
number of threads competing for bandwidth, REP MOVSB is ~2x faster up
to `sizeof_L3`.
The divisor of `4` is a somewhat arbitrary value. From benchmarks it
seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs
such as Broadwell prefer something closer to `8`. This patch is meant
to be followed up by another one to make the divisor cpu-specific, but
in the meantime (and for easier backporting), this patch settles on
`4` as a middle-ground.
Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable
stores were done using:
https://github.com/goldsteinn/memcpy-nt-benchmarks
Sheets results (also available in PDF on the GitHub repo):
https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit af992e7abdc9049714da76cae1e5e18bc4838fb8)
The signal handler installed in the ELF constructor cannot easily
be removed again (because the program may have changed handlers
in the meantime). Mark the object as NODELETE so that the registered
handler function is never unloaded.
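One way such a marking is applied at link time (a hedged example; the
library name is made up) is the nodelete flag:

  gcc -shared -fPIC -Wl,-z,nodelete -o libhandler.so handler.c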
H.J. Lu [Thu, 27 Apr 2023 20:06:15 +0000 (13:06 -0700)]
__check_pf: Add a cancellation cleanup handler [BZ #20975]
There are reports for hang in __check_pf:
https://github.com/JoeDog/siege/issues/4
It is reproducible only under specific configurations:
1. Large number of cores (>= 64) and large number of threads (> 3X of
the number of cores) with long lived socket connection.
2. Low power (frequency) mode.
3. Power management is enabled.
While holding the lock, __check_pf calls make_request, which calls
__sendto and __recvmsg. Since __sendto and __recvmsg are cancellation
points, the lock held by __check_pf won't be released, which can cause
a deadlock when thread cancellation happens in __sendto or __recvmsg.
Add a cancellation
cleanup handler for __check_pf to unlock the lock when cancelled by
another thread. This fixes BZ #20975 and the siege hang issue.
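A minimal sketch of that cleanup-handler pattern (illustrative names,
not the actual __check_pf code):

  #include <pthread.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void
  unlock_cb (void *arg)
  {
    pthread_mutex_unlock (arg);
  }

  static void
  check_pf_sketch (void)
  {
    pthread_mutex_lock (&lock);
    pthread_cleanup_push (unlock_cb, &lock);
    /* ... __sendto/__recvmsg here: cancellation points; if the thread
       is cancelled, unlock_cb still releases the lock ... */
    pthread_cleanup_pop (1);    /* pop and run the handler (unlock) */
  }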
Simon Kissane [Fri, 10 Feb 2023 21:58:02 +0000 (08:58 +1100)]
gmon: fix memory corruption issues [BZ# 30101]
V2 of this patch fixes an issue in V1, where the state was changed to ON not
OFF at end of _mcleanup. I hadn't noticed that (counterintuitively) ON=0 and
OFF=3, hence zeroing the buffer turned it back on. So set the state to OFF
after the memset.
1. Prevent double free, and reads from unallocated memory, when
_mcleanup is (incorrectly) called two or more times in a row,
without an intervening call to __monstartup; with this patch, the
second and subsequent calls effectively become no-ops instead.
While setting tos=NULL is the minimal fix, the safest action is to
zero the whole gmonparam buffer.
2. Prevent memory leak when __monstartup is (incorrectly) called two
or more times in a row, without an intervening call to _mcleanup;
with this patch, the second and subsequent calls effectively become
no-ops instead.
3. After _mcleanup, treat __moncontrol(1) as __moncontrol(0) instead.
With zeroing of gmonparam buffer in _mcleanup, this stops the
state incorrectly being changed to GMON_PROF_ON despite profiling
actually being off. If we'd just done the minimal fix to _mcleanup
of setting tos=NULL, there is risk of far worse memory corruption:
kcount would point to deallocated memory, and the __profil syscall
would make the kernel write profiling data into that memory,
which could have since been reallocated to something unrelated.
4. Ensure __moncontrol(0) still turns off profiling even in error
state. Otherwise, if mcount overflows and sets state to
GMON_PROF_ERROR, when _mcleanup calls __moncontrol(0), the __profil
syscall to disable profiling will not be invoked. _mcleanup will
free the buffer, but the kernel will still be writing profiling
data into it, potentially corrupting arbitrary memory.
Also adds a test case for (1). Issues (2)-(4) are not feasible to test.
When mcount overflows, no gmon.out file is generated, but no message is printed
to the user, leaving the user with no idea why, and thinking maybe there is
some bug - which is how BZ 27576 ended up being logged. Print a message to
stderr in this case so the user knows what is going on.
As a comment in sys/gmon.h acknowledges, the hardcoded MAXARCS value is too
small for some large applications, including the test case in that BZ. Rather
than increase it, add tunables to enable MINARCS and MAXARCS to be overridden
at runtime (glibc.gmon.minarcs and glibc.gmon.maxarcs). So if a user gets the
mcount overflow error, they can try increasing maxarcs (they might need to
increase minarcs too if the heuristic is wrong in their case.)
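For example (the sizes are illustrative):

  GLIBC_TUNABLES=glibc.gmon.minarcs=200000:glibc.gmon.maxarcs=2000000 ./app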
Note setting minarcs/maxarcs too large can cause monstartup to fail with an
out of memory error. If you set them large enough, it can cause an integer
overflow in calculating the buffer size. I haven't done anything to defend
against that - it would not generally be a security vulnerability, since these
tunables will be ignored in suid/sgid programs (due to the SXID_ERASE default),
and if you can set GLIBC_TUNABLES in the environment of a process, you can take
it over anyway (LD_PRELOAD, LD_LIBRARY_PATH, etc). I thought about modifying
the code of monstartup to defend against integer overflows, but doing so is
complicated, and I realise the existing code is susceptible to them even prior
to this change (e.g. try passing a pathologically large highpc argument to
monstartup), so I decided just to leave that possibility in-place.
Add a test case which demonstrates mcount overflow and the tunables.
`__monstartup()` allocates a buffer used to store all the data
accumulated by the monitor.
The size of this buffer depends on the size of the internal structures
used and the address range for which the monitor is activated, as well
as on the maximum density of call instructions and/or callable functions
that could be potentially on a segment of executable code.
In particular a hash table of arcs is placed at the end of this buffer.
The size of this hash table is calculated in bytes as
p->fromssize = p->textsize / HASHFRACTION;
but actually should be
p->fromssize = ROUNDUP(p->textsize / HASHFRACTION, sizeof(*p->froms));
This results in writing beyond the end of the allocated buffer when an
added arc corresponds to a call near the end of the monitored address
range, since `_mcount()` checks the incoming caller address against
the monitored range, but not the intermediate hash-like index that it
uses to write into the table.
It should be noted that when the results are output to `gmon.out`, the
table is read up to the last element calculated from the allocated
size in bytes, so the arcs stored outside the buffer boundary did not
make it into `gprof` for analysis. Thus this "feature" helped me find
this bug while working on https://sourceware.org/bugzilla/show_bug.cgi?id=29438
Just in case, I will explicitly note that the problem breaks the
`make test t=gmon/tst-gmon-dso` added for Bug 29438.
There, the arc of the `f3()` call disappears from the output, since in
the DSO case, the call to `f3` is located close to the end of the
monitored range.
Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>
Another minor error seems to be a related typo in the calculation of
`kcountsize`, but since kcounts are smaller than froms, this actually
serves to align the p->froms data.
Co-authored-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 801af9fafd4689337ebf27260aa115335a0cb2bc)
Adam Yi [Tue, 7 Mar 2023 12:30:02 +0000 (07:30 -0500)]
posix: Fix system blocks SIGCHLD erroneously [BZ #30163]
Fix a bug where SIGCHLD is erroneously blocked forever in the following
scenario:
1. Thread A calls system but hasn't returned yet
2. Thread B also calls system, and its call returns
SIGCHLD would be blocked forever in thread B after its system() returns,
even after the system() in thread A returns.
Although POSIX does not require it, the glibc system implementation
aims to be thread-safe and cancellation-safe. This bug was introduced
in 5fb7fc96350575c9adb1316833e48ca11553be49, when we moved restoring
the signal mask to happen when the last concurrently running system
returns, even though the signal mask is per-thread. This commit
reverts this logic
and adds a test.
Signed-off-by: Adam Yi <ayi@janestreet.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 436a604b7dc741fc76b5a6704c6cd8bb178518e7)
x86_64: Fix asm constraints in feraiseexcept (bug 30305)
The divss instruction clobbers its first argument, and the constraints
need to reflect that. Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.
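A hedged sketch of the constraint shape in question (not the verbatim
glibc source):

  /* divss reads and writes its destination operand, so that operand
     must be input/output ("+x"), not input-only.  */
  static void
  raise_divbyzero_sketch (void)
  {
    float num = 1.0f, den = 0.0f;
    __asm__ volatile ("divss %1, %0" : "+x" (num) : "x" (den));
  }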
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 5d1ccdda7b0c625751661d50977f3dfbc73f8eae)
Before this change, sgetsgent_r did not set errno to ERANGE, but
sgetsgent only checks errno, not the return value from sgetsgent_r.
Consequently, sgetsgent did not detect any error, and reported
success to the caller, without initializing the struct sgrp object
whose address was returned.
This commit changes sgetsgent_r to set errno as well. This avoids
similar issues in applications which only check errno.
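A hedged sketch of the calling pattern involved (the wrapper is
illustrative; sgetsgent_r is the real <gshadow.h> interface):

  #include <errno.h>
  #include <stddef.h>
  #include <gshadow.h>

  static struct sgrp *
  sgetsgent_sketch (const char *line, struct sgrp *sg,
                    char *buf, size_t buflen)
  {
    struct sgrp *result;
    errno = 0;
    if (sgetsgent_r (line, sg, buf, buflen, &result) != 0)
      return NULL;   /* errno is now reliably set, e.g. to ERANGE */
    return result;
  }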
H.J. Lu [Tue, 3 Jan 2023 21:06:48 +0000 (13:06 -0800)]
x86: Check minimum/maximum of non_temporal_threshold [BZ #29953]
The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may
be set to less than the minimum value when the shared cache size isn't
available (e.g., in an emulator) or by the tunable. Add checks for
minimum and maximum of non_temporal_threshold.
Vitaly Buka [Sat, 18 Feb 2023 20:53:41 +0000 (12:53 -0800)]
stdlib: Undo post review change to 16adc58e73f3 [BZ #27749]
Post review removal of "goto restart" from
https://sourceware.org/pipermail/libc-alpha/2021-April/125470.html
introduced a bug where some atexit handlers were skipped.
Florian Weimer [Wed, 8 Feb 2023 17:11:04 +0000 (18:11 +0100)]
elf: Smoke-test ldconfig -p against system /etc/ld.so.cache
The test is sufficient to detect the ldconfig bug fixed in
commit 9fe6f6363886aae6b2b210cae3ed1f5921299083 ("elf: Fix 64 time_t
support for installed statically binaries").
elf: Fix 64 time_t support for installed statically binaries
The use of internal static symbols for statically linked binaries
does not work correctly for objects built with -D_TIME_BITS=64,
since the internal definitions do not provide the expected aliases.
This patch makes them use the default stat functions instead (which
use the default 64-bit time_t aliases and types).
Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9fe6f6363886aae6b2b210cae3ed1f5921299083)
Define the __glibc_fortify and other macros only when
__USE_FORTIFY_LEVEL > 0. This has the effect of not defining these
macros on older C90
compilers that do not have support for variable length argument lists.
Also trim off the trailing backslashes from the definition of
__glibc_fortify and __glibc_fortify_n macros.
Carlos O'Donell [Fri, 28 Jan 2022 20:14:29 +0000 (15:14 -0500)]
malloc: Fix -Wuse-after-free warning in tst-mallocalign1 [BZ #26779]
The test leaks bits from the freed pointer via the return value
in ret, and the compiler correctly identifies this issue.
We switch the test to use TEST_VERIFY and terminate the test
if any of the pointers return an unexpected alignment.
This fixes another -Wuse-after-free error when compiling glibc
with gcc 12.
Joseph Myers [Mon, 2 Aug 2021 16:33:44 +0000 (16:33 +0000)]
Fix build of nptl/tst-thread_local1.cc with GCC 12
The test nptl/tst-thread_local1.cc fails to build with GCC mainline
because of changes to which other headers the libstdc++ headers
implicitly include:
tst-thread_local1.cc: In function 'int do_test()':
tst-thread_local1.cc:177:5: error: variable 'std::array<std::pair<const char*, std::function<void(void* (*)(void*))> >, 2> do_thread_X' has initializer but incomplete type
177 | do_thread_X
| ^~~~~~~~~~~
Fix this by adding an explicit include of <array>.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
Joseph Myers [Mon, 4 Oct 2021 19:10:43 +0000 (19:10 +0000)]
Fix stdio-common tests for GCC 12 -Waddress
My glibc bot shows failures building the testsuite with GCC mainline
across all architectures:
tst-vfprintf-width-prec.c: In function 'do_test':
tst-vfprintf-width-prec.c:90:16: error: the comparison will always evaluate as 'false' for the address of 'result' will never be NULL [-Werror=address]
90 | if (result == NULL)
| ^~
tst-vfprintf-width-prec.c:89:13: note: 'result' declared here
89 | wchar_t result[100];
| ^~~~~~
This is clearly a correct warning; the comparison against NULL is
clearly a cut-and-paste mistake from an earlier case in the test that
does use calloc. Thus, remove the unnecessary check for NULL flagged
by the warning.
Similarly, two other tests have bogus comparisons against NULL; remove
those as well:
scanf14a.c:95:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
95 | if (fname == NULL)
| ^~
scanf14a.c:93:8: note: 'fname' declared here
93 | char fname[strlen (tmpdir) + sizeof "/tst-scanf14.XXXXXX"];
| ^~~~~
scanf16a.c:125:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
125 | if (fname == NULL)
| ^~
scanf16a.c:123:8: note: 'fname' declared here
123 | char fname[strlen (tmpdir) + sizeof "/tst-scanf16.XXXXXX"];
| ^~~~~
Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.
Joseph Myers [Tue, 5 Oct 2021 14:25:40 +0000 (14:25 +0000)]
Fix stdlib/tst-setcontext.c for GCC 12 -Warray-compare
Building stdlib/tst-setcontext.c fails with GCC mainline:
tst-setcontext.c: In function 'f2':
tst-setcontext.c:61:16: error: comparison between two arrays [-Werror=array-compare]
61 | if (on_stack < st2 || on_stack >= st2 + sizeof (st2))
| ^
tst-setcontext.c:61:16: note: use '&on_stack[0] < &st2[0]' to compare the addresses
The comparison in this case is deliberate, so adjust it as suggested
in that note.
Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.
Replace a call to sprintf with an equivalent pair of stpcpy/strcpy calls
to avoid a GCC 12 -Wformat-overflow false positive due to recent optimizer
improvements.
Fangrui Song [Mon, 16 Aug 2021 16:59:30 +0000 (09:59 -0700)]
elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205]
elf/tls-macros.h was added for TLS testing when GCC did not support
__thread. __thread and tls_model attributes are mature now and have been
used by many newer tests.
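A hedged example of the modern idiom (the variable names are made up):

  /* Plain __thread, optionally with an explicit TLS model, replaces
     the old tls-macros.h helpers: */
  static __thread int tls_var_default;
  static __thread int tls_var_ie
    __attribute__ ((tls_model ("initial-exec")));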
Also delete tst-tls2.c which tests .tls_common (unused by modern GCC and
unsupported by Clang/LLD). .tls_common and .tbss definitions are almost
identical after linking, so the runtime test doesn't add additional
coverage. Assembler and linker tests should be on the binutils side.
When LLD 13.0.0 is allowed in configure.ac
(https://sourceware.org/pipermail/libc-alpha/2021-August/129866.html),
`make check` result is on par with glibc built with GNU ld on aarch64
and x86_64.
As a future clean-up, the TLS_GD/TLS_LD/TLS_IE/TLS_LE macros can be removed from
sysdeps/*/tls-macros.h. We can add optional -mtls-dialect={gnu2,trad}
tests to ensure coverage.
Tested on aarch64-linux-gnu, powerpc64le-linux-gnu, and x86_64-linux-gnu.
The tzfile_mtime is already compared against the 64-bit time_t value
from the stat call.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 4e21c2075193e406a92c0d1cb091a7c804fda4d9)