* A stack-based buffer overflow was found in libresolv when invoked from
libnss_dns, allowing specially crafted DNS responses to seize control
of execution flow in the DNS client. The buffer overflow occurs in
the functions send_dg (send datagram) and send_vc (send TCP) for the
NSS module libnss_dns.so.2 when calling getaddrinfo with AF_UNSPEC
family. The use of AF_UNSPEC triggers the low-level resolver code to
send out two parallel queries for A and AAAA. A mismanagement of the
buffers used for those queries could result in the response of a query
writing beyond the alloca allocated buffer created by
_nss_dns_gethostbyname4_r. Buffer management is simplified to remove
the overflow. Thanks to the Google Security Team and Red Hat for
reporting the security impact of this issue, and Robert Holiday of
Ciena for reporting the related bug 18665. (CVE-2015-7547)
See also:
https://sourceware.org/ml/libc-alpha/2016-02/msg00416.html
https://sourceware.org/ml/libc-alpha/2016-02/msg00418.html
Leonhard Holz [Tue, 13 Jan 2015 06:03:56 +0000 (11:33 +0530)]
Fix memory handling in strxfrm_l [BZ #16009]
[Modified from the original email by Siddhesh Poyarekar]
This patch solves bug #16009 by implementing an additional path in
strxfrm that does not depend on caching the weight and rule indices.
In detail the following changed:
* The old main loop was factored out of strxfrm_l into the function
do_xfrm_cached to be able to alternativly use the non-caching version
do_xfrm.
* strxfrm_l allocates a a fixed size array on the stack. If this is not
sufficiant to store the weight and rule indices, the non-caching path is
taken. As the cache size is not dependent on the input there can be no
problems with integer overflows or stack allocations greater than
__MAX_ALLOCA_CUTOFF. Note that malloc-ing is not possible because the
definition of strxfrm does not allow an oom errorhandling.
* The uncached path determines the weight and rule index for every char
and for every pass again.
* Passing all the locale data array by array resulted in very long
parameter lists, so I introduced a structure that holds them.
* Checking for zero src string has been moved a bit upwards, it is
before the locale data initialization now.
* To verify that the non-caching path works correct I added a test run
to localedata/sort-test.sh & localedata/xfrm-test.c where all strings
are patched up with spaces so that they are too large for the caching path.
Florian Weimer [Thu, 15 Oct 2015 07:23:07 +0000 (09:23 +0200)]
Always enable pointer guard [BZ #18928]
Honoring the LD_POINTER_GUARD environment variable in AT_SECURE mode
has security implications. This commit enables pointer guard
unconditionally, and the environment variable is now ignored.
[BZ #18928]
* sysdeps/generic/ldsodefs.h (struct rtld_global_ro): Remove
_dl_pointer_guard member.
* elf/rtld.c (_rtld_global_ro): Remove _dl_pointer_guard
initializer.
(security_init): Always set up pointer guard.
(process_envvars): Do not process LD_POINTER_GUARD.
The lookup function implementation in
nss/nss_files/files-XXX.c:DB_LOOKUP has code to prevent that. It is
supposed skip closing the input file if it was already open.
/* Reset file pointer to beginning or open file. */ \
status = internal_setent (keep_stream); \
\
if (status == NSS_STATUS_SUCCESS) \
{ \
/* Tell getent function that we have repositioned the file pointer. */ \
last_use = getby; \
\
while ((status = internal_getent (result, buffer, buflen, errnop \
H_ERRNO_ARG EXTRA_ARGS_VALUE)) \
== NSS_STATUS_SUCCESS) \
{ break_if_match } \
\
if (! keep_stream) \
internal_endent (); \
} \
keep_stream is initialized from the stayopen flag in internal_setent.
internal_setent is called from the set*ent implementation as:
status = internal_setent (stayopen);
However, for non-host database, this flag is always 0, per the
STAYOPEN magic in nss/getXXent_r.c.
Thus, the fix is this:
- status = internal_setent (stayopen);
+ status = internal_setent (1);
This is not a behavioral change even for the hosts database (where the
application can specify the stayopen flag) because with a call to
sethostent(0), the file handle is still not closed in the
implementation of gethostent.
Paul Pluzhnikov [Fri, 6 Feb 2015 05:30:42 +0000 (00:30 -0500)]
CVE-2015-1472: wscanf allocates too little memory
BZ #16618
Under certain conditions wscanf can allocate too little memory for the
to-be-scanned arguments and overflow the allocated buffer. The
implementation now correctly computes the required buffer size when
using malloc.
Carlos O'Donell [Wed, 19 Nov 2014 16:44:12 +0000 (11:44 -0500)]
CVE-2014-7817: wordexp fails to honour WRDE_NOCMD.
The function wordexp() fails to properly handle the WRDE_NOCMD
flag when processing arithmetic inputs in the form of "$((... ``))"
where "..." can be anything valid. The backticks in the arithmetic
epxression are evaluated by in a shell even if WRDE_NOCMD forbade
command substitution. This allows an attacker to attempt to pass
dangerous commands via constructs of the above form, and bypass
the WRDE_NOCMD flag. This patch fixes this by checking for WRDE_NOCMD
in exec_comm(), the only place that can execute a shell. All other
checks for WRDE_NOCMD are superfluous and removed.
We expand the testsuite and add 3 new regression tests of roughly
the same form but with a couple of nested levels.
On top of the 3 new tests we add fork validation to the WRDE_NOCMD
testing. If any forks are detected during the execution of a wordexp()
call with WRDE_NOCMD, the test is marked as failed. This is slightly
heuristic since vfork might be used in the future, but it provides a
higher level of assurance that no shells were executed as part of
command substitution with WRDE_NOCMD in effect. In addition it doesn't
require libpthread or libdl, instead we use the public implementation
namespace function __register_atfork (already part of the public ABI
for libpthread).
Will Newton [Fri, 16 Aug 2013 11:54:29 +0000 (12:54 +0100)]
malloc: Check for integer overflow in memalign.
A large bytes parameter to memalign could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15857]
* malloc/malloc.c (__libc_memalign): Check the value of bytes
does not overflow.
Will Newton [Fri, 16 Aug 2013 10:59:37 +0000 (11:59 +0100)]
malloc: Check for integer overflow in valloc.
A large bytes parameter to valloc could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15856]
* malloc/malloc.c (__libc_valloc): Check the value of bytes
does not overflow.
David S. Miller [Wed, 30 Apr 2014 19:57:51 +0000 (12:57 -0700)]
Fix v9/64-bit strcmp when string ends in multiple zero bytes.
[BZ #16885]
* sysdeps/sparc/sparc64/strcmp.S: Fix end comparison handling when
multiple zero bytes exist at the end of a string.
Reported by Aurelien Jarno <aurelien@aurel32.net>
* string/test-strcmp.c (check): Add explicit test for situations where
there are multiple zero bytes after the first.
Brooks Moses [Tue, 18 Feb 2014 17:18:44 +0000 (11:18 -0600)]
Fix erroneous (and circular) implied pattern rule for linkobj/libc.so.
[BZ #15915] As described in the bug, the pattern rule for lib%.so files
in Makerules includes linkobj/libc.so as a dependency. However, the
explicit rule for linkobj/libc.so is in the top-level Makefile.
Thus, the subdirectory makefiles that include Makerules end up with an
erroneous makefile pattern rule for linkobj/libc.so that includes
itself as a dependency. The result is make warnings whenever rules
for other .so files are resolved -- and, on occasion, actual makefile
failures when a race condition causes the implicit rule to actually be
used.
This patch moves the explicit rules for linkobj/libc.so into Makerules
to clear up this problem. It also elaborates a couple of comments
that I'd initially found confusing.
This patch adds GLIBC_2.3 mark on libm so it is always
define in abi-versions.h. This fixes a build issue with
fe_nomask.c in PPC64 LE where GLIBC_2_3 are no defined
for SHLIB_COMPAT, resulting in a wrong evaluation in the macro.
This patch creates implicit rules to match the abifiles if
abilist-pattern is defined in the architecture Makefile. This allows
machine specific Makefiles to define different abifiles names
(for instance *-le.abilist for powerpc64le).
H.J. Lu [Wed, 29 Jan 2014 15:51:41 +0000 (07:51 -0800)]
Disable x87 inline functions for SSE2 math
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
The IFUNC selector for gettimeofday runs before _libc_vdso_platform_setup where
__vdso_gettimeofday is set. The selector then sets __gettimeofday (the internal
version used within GLIBC) to use the system call version instead of the vDSO one.
This patch changes the check if vDSO is available to get its value directly
instead of rely on __vdso_gettimeofday.
This patch changes it by getting the vDSO value directly.
PowerPC: remove wrong truncl implementation for PowerPC64
The truncl assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_truncl.S)
returns wrong results for some inputs where first double is a exact integer
and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit 5c68d401698a58cf7da150d9cce769fa6679ba5f that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_truncl.c instead it fixes tgammal
issues regarding wrong result sign.
Tom Tromey [Mon, 20 Jan 2014 12:58:03 +0000 (12:58 +0000)]
[AArch64] BZ #16169 Add CFI directives to clone.S
[BZ #16169] Add CFI directives to the AArch64 clone.S implementation
and ensure that the FP in the child is zero'd in order to comply with
AAPCS.
(cherry picked from commit 3a3acb6afc753475675b5724f206e619d0c9590d)
H.J. Lu [Mon, 20 Jan 2014 19:05:22 +0000 (11:05 -0800)]
Include generic symbol-hacks.h for x32
In BZ #15605 fix with addding memset/memmove alias in symbol-hacks.h,
x32 symbol-hacks.h change was missing. Fixed by including
<sysdeps/generic/symbol-hacks.h> in x32 symbol-hacks.h.
PowerPC: Fix ftime gettimeofday internal call returning bogus data
This patches fixes BZ#16430 by setting a different symbol for internal
GLIBC calls that points to ifunc resolvers. For PPC32, if the symbol
is defined as hidden (which is the case for gettimeofday and time) the
compiler will create local branches (symbol@local) and linker will not
create PLT calls (required for IFUNC). This will leads to internal symbol
calling the IFUNC resolver instead of the resolved symbol.
For PPC64 this behavior does not occur because a call to a function in
another translation unit might use a different toc pointer thus requiring
a PLT call.
Maxim Kuvyrkov [Mon, 23 Dec 2013 20:44:50 +0000 (09:44 +1300)]
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Alan Modra [Sat, 17 Aug 2013 09:41:11 +0000 (19:11 +0930)]
A little more precise %Lf for IBM long double
http://sourceware.org/ml/libc-alpha/2013-08/msg00106.html
IBM long double has variable precision and we often have values with
107 bits during the normal course of calculations. It's possible to
have an IBM long double with 2k bits of precision, and while some might
argue we ought to print to the full precision, I'm not inclined to try
to implement that.
So this patch gives the mpn values returned from ldbl2mpn() an extra
11 bits of precision. To do that it's necessary to inform the caller
of __mpn_extract_long_double() exactly how many bits are returned
rather than assuming a fixed LDBL_MANT_DIG bits. I did that
indirectly by returning the number of zero bits in the last mp_limb,
which turns out to be convenient both in __mpn_extract_long_double(),
and it's caller. Given the extra parameter it then becomes possible
to omit shifting in __mpn_extract_long_double(), since the caller does
that anyway.
In the following, note that 1 - IEEE854_LONG_DOUBLE_BIAS is equal to
LDBL_MIN_EXP - 1. I prefer the former since we already use
IEEE854_LONG_DOUBLE_BIAS in the exponent calculation for normalised
values, and it reinforces the fact that denormals are treated as if
their unbiased exponent was 1.
[BZ #5268]
* include/gmp.h (__mpn_extract_double, __mpn_extract_long_double):
Update prototypes.
* stdio-common/printf_fp.c: Likewise.
(__printf_fp): Use returned zero_bits.
* stdlib/dbl2mpn.c (__mpn_extract_double): Update funtion arguments.
* sysdeps/i386/ldbl2mpn.c (__mpn_extract_long_double): Don't perform
shifts for denormals here. Instead return leading zero bit count
and adjust return value of function.
* sysdeps/ieee754/dbl-64/dbl2mpn.c (__mpn_extract_long_double): Likewise.
* sysdeps/ieee754/ldbl-128/ldbl2mpn.c (__mpn_extract_long_double):
Likewise.
* sysdeps/ieee754/ldbl-96/ldbl2mpn.c (__mpn_extract_long_double):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c (__mpn_extract_long_double):
Likewise. Return an extra 11 bits of precision.
Ulrich Weigand [Fri, 15 Nov 2013 18:04:30 +0000 (12:04 -0600)]
PowerPC64 ELFv2 ABI 6/6: Bump ld.so soname version number
To avoid having a ELFv2 binary accidentally picking up an old ABI ld.so,
this patch bumps the soname to ld64.so.2.
In theory (or for testing purposes) this will also allow co-installing
ld.so versions for both ABIs on the same system. Note that the kernel
will already be able to load executables of both ABIs. However, there
is currently no plan to use that theoretical possibility in a any
supported distribution environment ...
Note that in order to check which ABI to use, we need to invoke the
compiler to check the _CALL_ELF macro; this is done in a new configure
check in sysdeps/unix/sysv/linux/powerpc/powerpc64/configure.ac,
replacing the hard-coded value of default-abi in the Makefile.
Ulrich Weigand [Fri, 15 Nov 2013 18:01:33 +0000 (12:01 -0600)]
PowerPC64 ELFv2 ABI 5/6: LD_AUDIT interface changes
The ELFv2 ABI changes the calling convention by passing and returning
structures in registers in more cases than the old ABI:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01145.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01147.html
For the most part, this does not affect glibc, since glibc assembler
files do not use structure parameters / return values. However, one
place is affected: the LD_AUDIT interface provides a structure to
the audit routine that contains all registers holding function
argument and return values for the intercepted PLT call.
Since the new ABI now sometimes uses registers to return values
that were never used for this purpose in the old ABI, this structure
has to be extended. To force audit routines to be modified for the
new ABI if necessary, the patch defines v2 variants of the la_ppc64
types and routines.
In addition, the patch contains two unrelated changes to the
PLT trampoline routines: it fixes a bug where FPR return values
were stored in the wrong place, and it removes the unnecessary
save/restore of CR.
Ulrich Weigand [Fri, 15 Nov 2013 17:59:39 +0000 (11:59 -0600)]
PowerPC64 ELFv2 ABI 4/6: Stack frame layout changes
This updates glibc for the changes in the ELFv2 relating to the
stack frame layout. These are described in more detail here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
Specifically, the "compiler and linker doublewords" were removed,
which has the effect that the save slot for the TOC register is
now at offset 24 rather than 40 to the stack pointer.
In addition, a function may now no longer necessarily assume that
its caller has set up a 64-byte register save area its use.
To address the first change, the patch goes through all assembler
files and replaces immediate offsets in instructions accessing the
ABI-defined stack slots by symbolic offsets. Those already were
defined in ucontext_i.sym and used in some of the context routines,
but that doesn't really seem like the right place for those defines.
The patch instead defines those symbolic offsets in sysdeps.h,
in two variants for the old and new ABI, and uses them systematically
in all assembler files, not just the context routines.
The second change only affected a few assembler files that used
the save area to temporarily store some registers. In those
cases where this happens within a leaf function, this patch
changes the code to store those registers to the "red zone"
below the stack pointer. Otherwise, the functions already allocate
a stack frame, and the patch changes them to add extra space in
these frames as temporary space for the ELFv2 ABI.
Ulrich Weigand [Fri, 15 Nov 2013 17:56:31 +0000 (11:56 -0600)]
PowerPC64 ELFv2 ABI 3/6: PLT local entry point optimization
This is a follow-on to the previous patch to support the ELFv2 ABI in the
dynamic loader, split off into its own patch since it is just an optional
optimization.
In the ELFv2 ABI, most functions define both a global and a local entry
point; the local entry requires r2 to be already set up by the caller
to point to the callee's TOC; while the global entry does not require
the caller to know about the callee's TOC, but it needs to set up r12
to the callee's entry point address.
Now, when setting up a PLT slot, the dynamic linker will usually need
to enter the target function's global entry point. However, if the
linker can prove that the target function is in the same DSO as the
PLT slot itself, and the whole DSO only uses a single TOC (which the
linker will let ld.so know via a DT_PPC64_OPT entry), then it is
possible to actually enter the local entry point address into the
PLT slot, for a slight improvement in performance.
Note that this uncovered a problem on the first call via _dl_runtime_resolve,
because that routine neglected to restore the caller's TOC before calling
the target function for the first time, since it assumed that function
would always reload its own TOC anyway ...
Ulrich Weigand [Fri, 15 Nov 2013 17:54:51 +0000 (11:54 -0600)]
PowerPC64 ELFv2 ABI 2/6: Remove function descriptors
This patch adds support for the ELFv2 ABI feature to remove function
descriptors. See this GCC patch for in-depth discussion:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html
This mostly involves two types of changes: updating assembler source
files to the new logic, and updating the dynamic loader.
After the refactoring in the previous patch, most of the assembler source
changes can be handled simply by providing ELFv2 versions of the
macros in sysdep.h. One somewhat non-obvious change is in __GI__setjmp:
this used to "fall through" to the immediately following __setjmp ENTRY
point. This is no longer safe in the ELFv2 since ENTRY defines both
a global and a local entry point, and you cannot simply fall through
to a global entry point as it requires r12 to be set up.
Also, makecontext needs to be updated to set up registers according to
the new ABI for calling into the context's start routine.
The dynamic linker changes mostly consist of removing special code
to handle function descriptors. We also need to support the new PLT
and glink format used by the the ELFv2 linker, see:
https://sourceware.org/ml/binutils/2013-10/msg00376.html
In addition, the dynamic linker now verifies that the dynamic libraries
it loads match its own ABI.
The hack in VDSO_IFUNC_RET to "synthesize" a function descriptor
for vDSO routines is also no longer necessary for ELFv2.
Ulrich Weigand [Fri, 15 Nov 2013 17:52:37 +0000 (11:52 -0600)]
PowerPC64 ELFv2 ABI 1/6: Code refactoring
This is the first patch to support the new ELFv2 ABI in glibc.
As preparation, this patch simply refactors some of the powerpc64 assembler
code to move all code related to creating function descriptors (.opd section)
or using function descriptors (function pointer call) into a central place
in sysdep.h.
Note that most locations creating .opd entries were already using macros
in sysdep.h, this patch simply extends this to the remaining places.
Alan Modra [Fri, 15 Nov 2013 17:51:18 +0000 (11:51 -0600)]
PowerPC64: Report overflow on @h and @ha relocations
This patch updates glibc in accordance with the binutils patch checked
in here:
https://sourceware.org/ml/binutils/2013-10/msg00372.html
This changes the various R_PPC64_..._HI and _HA relocations to report
32-bit overflows. The motivation is that existing uses of @h / @ha
are to build up 32-bit offsets (for the "medium model" TOC access
that GCC now defaults to), and we'd really like to see failures at
link / load time rather than silent truncations.
For those rare cases where a modifier is needed to build up a 64-bit
constant, new relocations _HIGH / _HIGHA are supported.
The patch also fixes a bug in overflow checking for the R_PPC64_ADDR30
and R_PPC64_ADDR32 relocations.
Ulrich Weigand [Fri, 15 Nov 2013 17:49:21 +0000 (11:49 -0600)]
PowerPC64: Fix incorrect CFI in *context routines
The context established by "makecontext" has a link register pointing
back to an error path within the makecontext routine. This is currently
covered by the CFI FDE for makecontext itself, which is simply wrong
for the stack frame *inside* the context. When trying to unwind (e.g.
doing a backtrace) in a routine inside a context created by makecontext,
this can lead to uninitialized stack slots being accessed, causing the
unwinder to crash in the worst case.
Similarly, during parts of the "setcontext" routine, when the stack
pointer has already been switched to point to the new context, the
address range is still covered by the CFI FDE for setcontext. When
trying to unwind in that situation (e.g. backtrace from an async
signal handler for profiling), it is again possible that the unwinder
crashes.
Theses are all problems in existing code, but the changes in stack
frame layout appear to make the "worst case" much more likely in
the ELFv2 ABI context. This causes regressions e.g. in the libgo
testsuite on ELFv2.
This patch fixes this by ending the makecontext/setcontext FDEs
before those problematic parts of the assembler, similar to what
is already done on other platforms. This fixes the libgo
regression on ELFv2.
Ulrich Weigand [Fri, 15 Nov 2013 17:47:44 +0000 (11:47 -0600)]
PowerPC64: Add __private_ss field to TCB header
the TCB header on Intel contains a field __private_ss that is used
to efficiently implement the -fsplit-stack GCC feature.
In order to prepare for a possible future implementation of that
feature on powerpc64, we'd like to reserve a similar field in
the TCB header as well. (It would be good if this went in with
or before the ELFv2 patches to ensure that this field will be
available always in the ELFv2 environment.)
The field needs to be added at the front of tcbhead_t structure
to avoid changing the ABI; see the recent discussion when adding
the EBB fields.
This patch fixes the vDSO symbol used directed in IFUNC resolver where
they do not have an associated ODP entry leading to undefined behavior
in some cases. It adds an artificial OPD static entry to such cases
and set its TOC to non 0 to avoid triggering lazy resolutions.
Will Newton [Fri, 16 Aug 2013 11:54:29 +0000 (12:54 +0100)]
malloc: Check for integer overflow in memalign.
A large bytes parameter to memalign could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15857]
* malloc/malloc.c (__libc_memalign): Check the value of bytes
does not overflow.
Will Newton [Fri, 16 Aug 2013 10:59:37 +0000 (11:59 +0100)]
malloc: Check for integer overflow in valloc.
A large bytes parameter to valloc could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15856]
* malloc/malloc.c (__libc_valloc): Check the value of bytes
does not overflow.
Will Newton [Mon, 12 Aug 2013 14:08:02 +0000 (15:08 +0100)]
malloc: Check for integer overflow in pvalloc.
A large bytes parameter to pvalloc could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15855]
* malloc/malloc.c (__libc_pvalloc): Check the value of bytes
does not overflow.
Carlos O'Donell [Mon, 23 Sep 2013 04:52:09 +0000 (00:52 -0400)]
BZ #15754: CVE-2013-4788
The pointer guard used for pointer mangling was not initialized for
static applications resulting in the security feature being disabled.
The pointer guard is now correctly initialized to a random value for
static applications. Existing static applications need to be
recompiled to take advantage of the fix.
The test tst-ptrguard1-static and tst-ptrguard1 add regression
coverage to ensure the pointer guards are sufficiently random
and initialized to a default value.
Check for integer overflow in cache size computation in strcoll
strcoll is implemented using a cache for indices and weights of
collation sequences in the strings so that subsequent passes do not
have to search through collation data again. For very large string
inputs, the cache size computation could overflow. In such a case,
use the fallback function that does not cache indices and weights of
collation sequences.
Fall back to non-cached sequence traversal and comparison on malloc fail
strcoll currently falls back to alloca if malloc fails, resulting in a
possible stack overflow. This patch implements sequence traversal and
comparison without caching indices and rules.
PowerPC: strcpy/stpcpy optimization for PPC64/POWER7
This patch intends to unify both strcpy and stpcpy implementationsi
for PPC64 and PPC64/POWER7. The idead default powerpc64 implementation
is to provide both doubleword and word aligned memory access.
For PPC64/POWER7 is also provide doubleword and word memory access,
remove the branch hints, use the cmpb instruction for compare
doubleword/words, and add an optimization for inputs of same alignment.
This patch fixes another stack overflow in getaddrinfo when it is
called with AF_INET6. The AF_UNSPEC case was fixed as CVE-2013-1914,
but the AF_INET6 case went undetected back then.
Alan Modra [Fri, 4 Oct 2013 03:18:51 +0000 (12:48 +0930)]
Use stdint.h types in union unaligned.
* sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela):
Use stdint types in rather than __attribute__((mode())).
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
Alan Modra [Sat, 17 Aug 2013 09:18:36 +0000 (18:48 +0930)]
PowerPC LE memchr and memrchr
http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
Like strnlen, memchr and memrchr had a number of defects fixed by this
patch as well as adding little-endian support. The first one I
noticed was that the entry to the main loop needlessly checked for
"are we done yet?" when we know the size is large enough that we can't
be done. The second defect I noticed was that the main loop count was
wrong, which in turn meant that the small loop needed to handle an
extra word. Thirdly, there is nothing to say that the string can't
wrap around zero, except of course that we'd normally hit a segfault
on trying to read from address zero. Fixing that simplified a number
of places:
- /* Are we done already? */
- addi r9,r8,8
- cmpld r9,r7
- bge L(null)
becomes
+ cmpld r8,r7
+ beqlr
However, the exit gets an extra test because I test for being on the
last word then if so whether the byte offset is less than the end.
Overall, the change is a win.
Lastly, memrchr used the wrong cache hint.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with
insrdi. Make better use of reg selection to speed exit slightly.
Schedule entry path a little better. Remove useless "are we done"
checks on entry to main loop. Handle wrapping around zero address.
Correct main loop count. Handle single left-over word from main
loop inline rather than by using loop_small. Remove extra word
case in loop_small caused by wrong loop count. Add little-endian
support.
* sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper
cache hint.
* sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian
support. Avoid rlwimi.
* sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
Alan Modra [Sat, 17 Aug 2013 09:17:59 +0000 (18:47 +0930)]
PowerPC LE memset
http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
One of the things I noticed when looking at power7 timing is that rlwimi
is cracked and the two resulting insns have a register dependency.
That makes it a little slower than the equivalent rldimi.
Alan Modra [Sat, 17 Aug 2013 09:17:22 +0000 (18:47 +0930)]
PowerPC LE memcpy
http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
LIttle-endian support for memcpy. I spent some time cleaning up the
64-bit power7 memcpy, in order to avoid the extra alignment traps
power7 takes for little-endian. It probably would have been better
to copy the linux kernel version of memcpy.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
* sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better
use of regs. Use power7 mtocrf. Tidy function tails.
Alan Modra [Sat, 17 Aug 2013 09:16:47 +0000 (18:46 +0930)]
PowerPC LE memcmp
http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
This is a rather large patch due to formatting and renaming. The
formatting changes were to make it possible to compare power7 and
power4 versions of memcmp. Using different register defines came
about while I was wrestling with the code, trying to find spare
registers at one stage. I found it much simpler if we refer to a reg
by the same name throughout a function, so it's better if short-term
multiple use regs like rTMP are referred to using their register
number. I made the cr field usage changes when attempting to reload
rWORDn regs in the exit path to byte swap before comparing when
little-endian. That proved a bad idea due to the pipelining involved
in the main loop; Offsets to reload the regs were different first
time around the loop.. Anyway, I left the cr field usage changes in
place for consistency.
Aside from these more-or-less cosmetic changes, I fixed a number of
places where an early exit path restores regs unnecessarily, removed
some dead code, and optimised one or two exits.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support.
Formatting. Consistently use rXXX register defines or rN defines.
Use early exit labels that avoid restoring unused non-volatile regs.
Make cr field use more consistent with rWORDn compares. Rename
regs used as shift registers for unaligned loop, using rN defines
for short lifetime/multiple use regs.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with
addi 1,1,64 to pop stack frame. Simplify return value code.
* sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
Alan Modra [Sat, 17 Aug 2013 09:16:05 +0000 (18:46 +0930)]
PowerPC LE strchr
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
Adds little-endian support to optimised strchr assembly. I've also
tweaked the big-endian code a little. In power7/strchr.S there's a
check in the tail of the function that we didn't match 0 before
finding a c match, done by comparing leading zero counts. It's just
as valid, and quicker, to compare the raw output from cmpb.
Another little tweak is to use rldimi/insrdi in place of rlwimi for
the power7 strchr functions. Since rlwimi is cracked, it is a few
cycles slower. rldimi can be used on the 32-bit power7 functions
too.
* sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian
support. Correct typos, formatting. Optimize tail. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add
little-endian support. Correct typos.
* sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use
in loop and entry code to keep "and." results.
(strchr): Add little-endian support. Comment. Move cntlzd
earlier in tail.
* sysdeps/powerpc/powerpc32/strchr.S: Likewise.
Alan Modra [Sat, 17 Aug 2013 09:11:17 +0000 (18:41 +0930)]
PowerPC LE strcmp and strncmp
http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
More little-endian support. I leave the main strcmp loops unchanged,
(well, except for renumbering rTMP to something other than r0 since
it's needed in an addi insn) and modify the tail for little-endian.
I noticed some of the big-endian tail code was a little untidy so have
cleaned that up too.
Alan Modra [Sat, 17 Aug 2013 09:10:48 +0000 (18:40 +0930)]
PowerPC LE strnlen
http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
The existing strnlen code has a number of defects, so this patch is more
than just adding little-endian support. The changes here are similar to
those for memchr.
* sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add
little-endian support. Remove unnecessary "are we done" tests.
Handle "s" wrapping around zero and extremely large "size".
Correct main loop count. Handle single left-over word from main
loop inline rather than by using small_loop. Correct comments.
Delete "zero" tail, use "end_max" instead.
* sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
Alan Modra [Sat, 17 Aug 2013 09:10:11 +0000 (18:40 +0930)]
PowerPC LE strlen
http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
This is the first of nine patches adding little-endian support to the
existing optimised string and memory functions. I did spend some
time with a power7 simulator looking at cycle by cycle behaviour for
memchr, but most of these patches have not been run on cpu simulators
to check that we are going as fast as possible. I'm sure PowerPC can
do better. However, the little-endian support mostly leaves main
loops unchanged, so I'm banking on previous authors having done a
good job on big-endian.. As with most code you stare at long enough,
I found some improvements for big-endian too.
Little-endian support for strlen. Like most of the string functions,
I leave the main word or multiple-word loops substantially unchanged,
just needing to modify the tail.
Removing the branch in the power7 functions is just a tidy. .align
produces a branch anyway. Modifying regs in the non-power7 functions
is to suit the new little-endian tail.
* sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian
support. Don't branch over align.
* sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support.
Rearrange tmp reg use to suit. Comment.
* sysdeps/powerpc/powerpc32/strlen.S: Likewise.
This copies the sparc version of sigstack.h, which gives powerpc
#define MINSIGSTKSZ 4096
#define SIGSTKSZ 16384
Before the VSX changes, struct rt_sigframe size was 1920 plus 128 for
__SIGNAL_FRAMESIZE giving ppc64 exactly the default MINSIGSTKSZ of
2048.
After VSX, ucontext increased by 256 bytes. Oops, we're over
MINSIGSTKSZ, so powerpc has been using the wrong value for quite a
while. Add another ucontext for TM and rt_sigframe is now at 3872,
giving actual MINSIGSTKSZ of 4000.
The glibc testcase that I was looking at was tst-cancel21, which
allocates 2*SIGSTKSZ (not because the test is trying to be
conservative, but because the test actually has nested signal stack
frames). We blew the allocation by 48 bytes when using current
mainline gcc to compile glibc (le ppc64).
The required stack depth in _dl_lookup_symbol_x from the top of the
next signal frame was 10944 bytes. I guess you'd want to add 288 to
that, implying an actual SIGSTKSZ of 11232.
* sysdeps/unix/sysv/linux/powerpc/bits/sigstack.h: New file.
Use conditional form of branch and link to avoid destroying the cpu
link stack used to predict blr return addresses.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/makecontext.S: Use
conditional form of branch and link when obtaining pc.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/makecontext.S: Likewise.