This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Add memcmp/wmemcmp optimized with AVX2
On Thu, Jun 1, 2017 at 2:20 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/01/2017 11:17 PM, H.J. Lu wrote:
>> On Thu, Jun 1, 2017 at 2:00 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 06/01/2017 10:57 PM, H.J. Lu wrote:
>>>> I don't think it works with memcmp, since the return value depends
>>>> on the first byte that differs. Say
>>>>
>>>> ABCDE turns into EDCBDCBA
>>>>
>>>> If all bytes differ, we should only compare A, not EDCBDCBA.
>>>
>>> That's what the bswapq is for: it reverses the order of bytes.
>>>
>>
>> bswapq doesn't help since cmpq compares 8 bytes but only
>> the last byte matters. Comparing the highest byte gives you the
>> wrong result, like
>>
>> 0x36775382d1367753
>> 0x7b8d14025b7b8d14
>
> I don't understand. On big-endian, to compare two 8-byte arrays as if
> by memcmp, you can certainly do a uint64_t load, compute the difference
> as a 65-bit value, and return the integer sign of that.
>
> The code I posted does that (modulo bugs, but you can get a working
> patch from the old message I referenced). bswapq is needed to get an
> equivalent to that big-endian load.
>
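The approach described in the quoted message can be sketched in C. This is only an illustration, not code from the patch: `memcmp8` and `memcmp8_demo` are hypothetical names, the host is assumed to be little-endian (as x86-64 is), and `__builtin_bswap64` is the GCC/Clang builtin standing in for bswapq. The sign of the 65-bit difference is expressed here as an unsigned comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare two 8-byte arrays as memcmp would.
   Assumes a little-endian host, so a byte swap is needed to make
   byte 0 the most significant byte of the loaded value. */
static int memcmp8(const void *a, const void *b)
{
    uint64_t x, y;
    memcpy(&x, a, 8);              /* unaligned-safe 8-byte load */
    memcpy(&y, b, 8);
    x = __builtin_bswap64(x);      /* equivalent of a big-endian load */
    y = __builtin_bswap64(y);
    /* Sign of the (65-bit) difference x - y: -1, 0 or 1. */
    return (x > y) - (x < y);
}

/* Small usage check for the sketch above. */
static void memcmp8_demo(void)
{
    const unsigned char p[8] = { 0x80, 0, 0, 0, 0, 0, 0, 0 };
    const unsigned char q[8] = { 0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    assert(memcmp8(p, q) > 0);                   /* first byte decides */
    assert(memcmp8("ABCDEFGH", "ABCDEFGZ") < 0); /* last byte decides */
    assert(memcmp8("ABCDEFGH", "ABCDEFGH") == 0);
}
```

The comparison is unsigned on purpose, since memcmp compares bytes as unsigned char.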
I put memcmp-avx2.S on the hjl/avx2/master branch and changed it to:
L(between_4_7):
	movl	(%rdi), %r8d
	movl	(%rsi), %ecx
	shlq	$32, %r8
	shlq	$32, %rcx
	movl	-4(%rdi, %rdx), %edi
	movl	-4(%rsi, %rdx), %esi
	orq	%rdi, %r8
	orq	%rsi, %rcx
	bswap	%r8
	bswap	%rcx
	cmpq	%rcx, %r8
	je	L(zero)
	sbbl	%eax, %eax
	orl	$1, %eax
	ret
and got
Iteration 70485 - wrong result in function __memcmp_avx2 (18, 26, 5,
0) -1 != 1, p1 0x7ffff7ff0e00 p2 0x7ffff7fece00
What did I do wrong?
--
H.J.
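
One way to explore the reported mismatch is to model the L(between_4_7) sequence in C. The sketch below is a reading of the posted assembly, not a diagnosis confirmed in this message. All function names are hypothetical, a little-endian host is assumed, and `__builtin_bswap64` (GCC/Clang) stands in for bswap. In this model the posted order places the first 4 bytes in the high half before the swap, so after the swap the overlapping tail load becomes the most significant part. The second function is a hypothetical variant with the halves exchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Zero-extended 4-byte load, like movl on x86-64 (little-endian host assumed). */
static uint64_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, 4);
    return v;
}

/* Model of the posted sequence: head shifted into the high half,
   tail ORed into the low half, then one 64-bit byte swap. */
static int cmp_4_7_posted(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n >= 4 && n <= 7);
    uint64_t x = __builtin_bswap64((load32(a) << 32) | load32(a + n - 4));
    uint64_t y = __builtin_bswap64((load32(b) << 32) | load32(b + n - 4));
    return (x > y) - (x < y);   /* same result as cmpq; sbbl; orl $1 */
}

/* Hypothetical variant: tail in the high half, head in the low half,
   so that byte 0 is the most significant byte after the swap. */
static int cmp_4_7_swapped(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n >= 4 && n <= 7);
    uint64_t x = __builtin_bswap64((load32(a + n - 4) << 32) | load32(a));
    uint64_t y = __builtin_bswap64((load32(b + n - 4) << 32) | load32(b));
    return (x > y) - (x < y);
}
```

For example, with a = {1, 0, 0, 0, 0}, b = {0, 9, 0, 0, 0} and n = 5, memcmp returns a positive value, while the model of the posted sequence returns -1 and the variant returns 1.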