This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.

From: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>
To: Matt Turner <mattst88 at gmail dot com>
Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, Andreas Jaeger <aj at suse dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Thu, 25 Jul 2013 19:19:07 +0400
Subject: Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
References: <CAMe9rOreowCOEH+6zRaRNk_p9sYe3T2bhwPRbKpybW9cO0BhJA at mail dot gmail dot com> <1373419029-19125-1-git-send-email-mattst88 at gmail dot com> <51DCE51F dot 7000001 at suse dot com> <CAMe9rOqb3_DnhSh0jPh9=suJo5c+WjegxfDh1+1go6pY+7+PLA at mail dot gmail dot com> <CAEdQ38Go4UY=k==nYT_6S86-tsOoxOO=Wn=8_pNk+LkkxSxU_Q at mail dot gmail dot com> <CAMe9rOpgaNgGSdoM5rXdhLT-TqVEJjGMyHgKRP=t+2LrSTpFAA at mail dot gmail dot com> <CAEdQ38FBeyuJpQ1eSHnM5w=8MHD3cfFjgWekkXnRFHO+Aathnw at mail dot gmail dot com> <CAMe9rOompuMMzQm+RX=ejoPMX0uWmXarvSZa_fp-Fi1p_-8o1Q at mail dot gmail dot com> <CAHjhQ91+RSKU=1F4vQ1XrJ=1j1wAv6HuQJh_s9BzcBOOTP8BDg at mail dot gmail dot com> <CAEdQ38EX=Gni0kwK_Hqv71zGnACHD9EQ=A=U2498e88smrH7jQ at mail dot gmail dot com>

I checked performance on a machine with SSE4_1 but without SSE4_2.
On that machine the SSE4_1 version is faster than the SSSE3 version,
because of fast unaligned loads and the like.
I agree that SSE 4.1 is not really needed: we can just replace ptest
with a "pmovmskb + test" pair, and performance will be nearly identical.
We can then name the implementation memcmp_sse2_unaligned.
It will then look similar to the dispatching for strcpy, memcpy, etc.
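
To illustrate the "pmovmskb + test" idea, here is a minimal sketch in C
intrinsics. It is not code from the patch or from glibc: the function names
equal16_sse2 and first_diff16_sse2 are hypothetical, and it only handles one
16-byte block. It uses only SSE2 instructions (movdqu, pcmpeqb, pmovmskb),
so no ptest is needed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics only; no SSE4.1 required */
#include <stddef.h>

/* Hypothetical helper: returns 1 if the 16 bytes at a and b are equal,
   0 otherwise. Both pointers may be unaligned. */
static int equal16_sse2(const void *a, const void *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* movdqu */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);  /* movdqu */
    __m128i eq = _mm_cmpeq_epi8(va, vb);   /* pcmpeqb: 0xFF where bytes match */
    int mask = _mm_movemask_epi8(eq);      /* pmovmskb: one bit per byte */
    return mask == 0xFFFF;                 /* all 16 bits set means all equal */
}

/* Hypothetical helper: returns the index (0..15) of the first differing
   byte, or 16 if the two 16-byte blocks are equal. */
static size_t first_diff16_sse2(const void *a, const void *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    unsigned diff = ~(unsigned)mask & 0xFFFFu;  /* set bits mark mismatches */
    size_t i = 0;

    if (diff == 0)
        return 16;
    /* Bit i of the mask corresponds to the byte at offset i. */
    while ((diff & 1u) == 0) {
        diff >>= 1;
        i++;
    }
    return i;
}
```

A real memcmp would loop over such blocks, handle the tail, and return the
sign of the first differing byte pair; this sketch only shows the equality
test that would stand in for ptest.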

--
Liubov

On Thu, Jul 25, 2013 at 2:22 AM, Matt Turner <mattst88@gmail.com> wrote:
> On Thu, Jul 11, 2013 at 7:07 AM, Liubov Dmitrieva
> <liubov.dmitrieva@gmail.com> wrote:
>> My Silvermont patch in the latest edition doesn't touch memcmp and
>> wmemcmp at all because I didn't see good boost from switching SSE42
>> off for these 2 functions.
> >> Now I see why. There are no SSE42 instructions there. :)
>> The patch looks good. I will just check performance regressions for Penryn.
>
> Any performance numbers?

Attachment: bench-memcmp-ifunc.out
Description: Binary data

References:
- Re: Does memcmp_sse4_2 actually use SSE 4.2 instructions?
  - From: H.J. Lu
- [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Matt Turner
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Andreas Jaeger
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: H.J. Lu
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Matt Turner
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: H.J. Lu
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Matt Turner
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: H.J. Lu
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Liubov Dmitrieva
- Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
  - From: Matt Turner
