This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [PATCH] Rewritten v9/64-bit sparc strcmp.
- From: Aurelien Jarno <aurelien at aurel32 dot net>
- To: David Miller <davem at davemloft dot net>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 29 Apr 2014 11:53:39 +0200
- Subject: Re: [PATCH] Rewritten v9/64-bit sparc strcmp.
- Authentication-results: sourceware.org; auth=none
- References: <20110824 dot 013854 dot 674433294054458127 dot davem at davemloft dot net>
On Wed, Aug 24, 2011 at 01:38:54AM -0700, David Miller wrote:
> This new code is heavily inspired by the powerpc 64-bit base strcmp.
> It's faster than the existing code, especially on Niagara cpus as
> the number of branches has been minimized to reduce cpu thread
> On UltraSPARC-3 the tail code executes in a constant 5 cycles,
> regardless of where the mismatching/zero byte is. The main aligned
> loop executes in 4 cycles, which with a 2 cycle load latency is
> essentially optimal.
> Committed to master.
> ChangeLog | 4 +
> sysdeps/sparc/sparc64/strcmp.S | 416 ++++++++++++++++------------------------
> 2 files changed, 173 insertions(+), 247 deletions(-)
> + retl
> + mov 0, %o0
> + /* All loops terminate here once they find an unequal word.
> + * If a zero byte appears in the word before the first unequal
> + * byte, we must report zero. Otherwise we report '1' or '-1'
> + * depending upon whether the first mis-matching byte is larger
> + * in the first string or the second, respectively.
> + *
> + * First we compute a 64-bit mask value that has "0x01" in
> + * each byte where a zero exists in rWORD1. rSTRXOR holds the
> + * value (rWORD1 ^ rWORD2). Therefore, if considered as an
> + * unsigned quantity, our "0x01" mask value is "greater than"
> + * rSTRXOR then a zero terminating byte comes first and
> + * therefore we report '0'.
> + *
> + * The formula for this mask is:
> + *
> + * mask_tmp1 = ~rWORD1 & 0x8080808080808080;
> + * mask_tmp2 = ((rWORD1 & 0x7f7f7f7f7f7f7f7f) +
> + * 0x7f7f7f7f7f7f7f7f);
> + *
> + * mask = ((mask_tmp1 & ~mask_tmp2) >> 7);
This method doesn't work when string 1 has a 0x00 char at the position
where string 2 has a 0x01 char. In that case the mask for this byte is
0x01 and the corresponding xor byte is also 0x01, so the unsigned
comparison between the mask and rSTRXOR is decided by the bytes that
follow. The result of the comparison therefore depends on the garbage
after the end of the string.
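The failing case can be reproduced outside of the assembly. The following
is a minimal C sketch, not the actual strcmp.S code. It assumes the words
were loaded big-endian as on sparc64 (first char in the most significant
byte), and that the tail code reports '0' exactly when the mask is
unsigned-greater than rSTRXOR, as the quoted comment describes. The
function names and the sample words are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask formula from the quoted comment: 0x01 in every byte of w that is 0x00. */
static uint64_t zero_byte_mask(uint64_t w)
{
    uint64_t tmp1 = ~w & 0x8080808080808080ULL;
    uint64_t tmp2 = (w & 0x7f7f7f7f7f7f7f7fULL) + 0x7f7f7f7f7f7f7f7fULL;
    return (tmp1 & ~tmp2) >> 7;
}

/* Assumed tail decision: report '0' when mask > (word1 ^ word2), unsigned. */
static int tail_reports_equal(uint64_t word1, uint64_t word2)
{
    return zero_byte_mask(word1) > (word1 ^ word2);
}

/* String 1 is "a\0...", string 2 is "a\x01..."; only the garbage differs. */
static void self_check(void)
{
    /* Garbage after the terminator in word1 is all zero bytes:
       mask = 0x0001010101010101, xor = 0x0001000000000000,
       so the strings are wrongly reported as equal. */
    assert(tail_reports_equal(0x6100000000000000ULL, 0x6101000000000000ULL) == 1);

    /* Same strings, but the garbage in word1 is 0xff bytes:
       mask = 0x0001000000000000, xor = 0x0001ffffffffffff,
       so the mismatch is detected. */
    assert(tail_reports_equal(0x6100ffffffffffffULL, 0x6101000000000000ULL) == 0);

    /* Control case: both strings are "a"; the differing garbage is ignored. */
    assert(tail_reports_equal(0x6100aaaaaaaaaaaaULL, 0x6100bbbbbbbbbbbbULL) == 1);
}
```

Calling self_check() from a main() passes all three assertions. The first
two compare the same pair of strings and differ only in the bytes after the
terminator of string 1.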
On Debian this causes, for example, debian-installer to fail to build,
and it might be the source of the random segfaults that we have been
trying to debug for a few years.
Aurelien Jarno GPG: 4096R/1DDD8C9B