This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] x86-64: Add memcmp/wmemcmp optimized with AVX2

From: "H.J. Lu" <hjl dot tools at gmail dot com>
To: Florian Weimer <fweimer at redhat dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>
Date: Thu, 1 Jun 2017 14:29:11 -0700
Subject: Re: [PATCH] x86-64: Add memcmp/wmemcmp optimized with AVX2
Authentication-results: sourceware.org; auth=none
References: <20170601154519.GB14526@lucon.org> <33f989bd-5357-086a-27a7-7437718f5ac3@redhat.com> <CAMe9rOpYpksQnqBSZjF1dDM7YMr4Qj6hNdi1MCESBP825ysRrg@mail.gmail.com> <cbf2d909-206b-7552-60b6-f6912abd2c34@redhat.com> <CAMe9rOreBvFj04YcNUH83fkvyXy+awyPDk+Dd76GXVJUJkyYtA@mail.gmail.com> <970021c2-f07c-fe8b-8990-2f61d7fcce31@redhat.com> <CAMe9rOoKXizFHaTAwck9EzcUkXLC=bE960ELb=yB-=NoA5q0Gg@mail.gmail.com> <1e01f0e9-4398-6c49-7e5c-0aac2c334d67@redhat.com>

On Thu, Jun 1, 2017 at 2:20 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/01/2017 11:17 PM, H.J. Lu wrote:
>> On Thu, Jun 1, 2017 at 2:00 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 06/01/2017 10:57 PM, H.J. Lu wrote:
>>>> I don't think it works with memcmp since the return value depends on
>>>> the first byte that differs.  Say
>>>>
>>>> ABCDE   turns into EDCBDCBA
>>>>
>>>> If all bytes differ, we should only compare A, not EDCBDCBA.
>>>
>>> That's what the bswapq is for, it reverses the order of bytes.
>>>
>>
>> bswapq doesn't help since cmpq compares 8 bytes but only
>> the last byte matters.  Comparing the highest byte gives you the
>> wrong result, like
>>
>> 0x36775382d1367753
>> 0x7b8d14025b7b8d14
>
> I don't understand.  On big-endian, to compare two 8-byte arrays as if
> by memcmp, you can certainly do a uint64_t load, compute the difference
> as a 65-bit value, and return the integer sign of that.
>
> The code I posted does that (modulo bugs, but you can get a working
> patch from the old message I referenced).  bswapq is needed to get an
> equivalent to that big-endian load.
>

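The big-endian comparison described in the quoted text can be sketched in C as follows. This is a minimal illustration, not code from the patch: the names load_be64 and memcmp8_sketch are hypothetical, and the byte-by-byte load stands in for what a 64-bit load followed by bswapq would produce on x86-64. The sign is derived from two unsigned comparisons, which sidesteps the need for a 65-bit difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as one big-endian integer, so that
   the byte at the lowest address becomes the most significant byte.  */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: compare two 8-byte arrays as if by memcmp and
   return -1, 0 or 1.  Unsigned comparison of the big-endian values
   orders them by the first differing byte.  */
static int memcmp8_sketch(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    uint64_t x = load_be64(a);
    uint64_t y = load_be64(b);
    return (x > y) - (x < y);
}
```

With the two example values from the quoted text stored most significant byte first, memcmp8_sketch orders them by their leading bytes 0x36 and 0x7b.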
I put memcmp-avx2.S on the hjl/avx2/master branch and changed it
to

L(between_4_7):
        movl    (%rdi), %r8d
        movl    (%rsi), %ecx
        shlq    $32, %r8
        shlq    $32, %rcx
        movl    -4(%rdi, %rdx), %edi
        movl    -4(%rsi, %rdx), %esi
        orq     %rdi, %r8
        orq     %rsi, %rcx
        bswap   %r8
        bswap   %rcx
        cmpq    %rcx, %r8
        je      L(zero)
        sbbl    %eax, %eax
        orl     $1, %eax
        ret

and got

Iteration 70485 - wrong result in function __memcmp_avx2 (18, 26, 5, 0) -1 != 1, p1 0x7ffff7ff0e00 p2 0x7ffff7fece00

What did I do wrong?
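One way to reason about the sequence above is to model it in portable C. The sketch below is an assumption-laden illustration, not the fix adopted in the thread: load_le32, bswap64, sign_cmp, between_4_7_posted and between_4_7_swapped are hypothetical names. It suggests that shifting the first dword into the high half before the 64-bit bswap makes the last four bytes the most significant ones, so the comparison starts at byte n-4 rather than byte 0. Shifting the last dword into the high half instead appears to restore memcmp order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of movl on x86-64: a little-endian 32-bit load, zero-extended.  */
static uint64_t load_le32(const unsigned char *p)
{
    return (uint64_t)p[0] | ((uint64_t)p[1] << 8)
           | ((uint64_t)p[2] << 16) | ((uint64_t)p[3] << 24);
}

/* Model of the 64-bit bswap instruction: reverse the byte order.  */
static uint64_t bswap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Sign of an unsigned comparison: -1, 0 or 1.  */
static int sign_cmp(uint64_t x, uint64_t y)
{
    return (x > y) - (x < y);
}

/* As posted: first dword in the high half, last dword in the low half.
   After bswap64 the key reads a[n-4..n-1] followed by a[0..3].  */
static int between_4_7_posted(const unsigned char *a,
                              const unsigned char *b, size_t n)
{
    assert(n >= 4 && n <= 7);
    uint64_t x = (load_le32(a) << 32) | load_le32(a + n - 4);
    uint64_t y = (load_le32(b) << 32) | load_le32(b + n - 4);
    return sign_cmp(bswap64(x), bswap64(y));
}

/* Variant: last dword in the high half, first dword in the low half.
   After bswap64 the key reads a[0..3] followed by a[n-4..n-1].  */
static int between_4_7_swapped(const unsigned char *a,
                               const unsigned char *b, size_t n)
{
    assert(n >= 4 && n <= 7);
    uint64_t x = (load_le32(a + n - 4) << 32) | load_le32(a);
    uint64_t y = (load_le32(b + n - 4) << 32) | load_le32(b);
    return sign_cmp(bswap64(x), bswap64(y));
}
```

For the 5-byte inputs {1, 9, 0, 0, 0} and {2, 0, 0, 0, 0}, memcmp reports the first as smaller. The posted ordering starts its comparison at byte 1 (9 against 0) and reports it as larger, while the swapped ordering starts at byte 0 and agrees with memcmp.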


-- 
H.J.
