[PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.
Liubov Dmitrieva
liubov.dmitrieva@gmail.com
Thu Jul 25 15:19:00 GMT 2013
I checked performance on a machine with SSE4.1 but without SSE4.2.
On that machine the SSE4.1 version is faster than the SSSE3 one, thanks
to fast unaligned loads, among other things.
I agree that SSE4.1 is not really needed: we can replace ptest
with a "pmovmskb + test" pair, performance will be nearly identical,
and the implementation can then be called the
memcmp_sse2_unaligned version.
The dispatching will then look similar to that of strcpy, memcpy, etc.
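The ptest-to-"pmovmskb + test" swap described above can be sketched with compiler intrinsics. This is a minimal illustration assuming x86-64 and GCC or Clang (for `__builtin_ctz`); it is not the actual glibc assembly, and the names `block_differs` and `memcmp_sse2_sketch` are hypothetical. The SSE4.1 branch (XOR, then `ptest`) is only compiled when `__SSE4_1__` is defined (e.g. with `-msse4.1`); otherwise the plain SSE2 branch (`pcmpeqb` + `pmovmskb` + an integer compare) computes the same "do these 16 bytes differ?" predicate.

```c
#include <emmintrin.h>   /* SSE2: pcmpeqb, pmovmskb, unaligned loads */
#include <stddef.h>
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: ptest */
#endif

/* Hypothetical helper: nonzero if the two 16-byte vectors differ. */
static int block_differs(__m128i a, __m128i b)
{
#ifdef __SSE4_1__
    /* SSE4.1 form: XOR the blocks; ptest sets ZF iff the XOR is all zero. */
    __m128i x = _mm_xor_si128(a, b);
    return !_mm_testz_si128(x, x);
#else
    /* SSE2 form: pcmpeqb yields 0xFF for each matching byte, pmovmskb packs
       the 16 byte sign bits into an int, and an integer compare replaces ptest. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xFFFF;
#endif
}

/* Hypothetical SSE2-only memcmp sketch using unaligned 16-byte loads. */
int memcmp_sse2_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(p1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(p2 + i));
        if (block_differs(a, b)) {
            /* Bit k of diff is set iff byte k differs; the lowest set bit
               marks the first differing byte. */
            unsigned diff =
                (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) ^ 0xFFFFu;
            size_t k = (size_t)__builtin_ctz(diff);
            return p1[i + k] - p2[i + k];
        }
    }
    /* Byte-by-byte tail for the remaining n % 16 bytes. */
    for (; i < n; i++)
        if (p1[i] != p2[i])
            return p1[i] - p2[i];
    return 0;
}
```

Because both branches of `block_differs` return the same result, the SSE2 path alone is enough on any x86-64 CPU; a real libc would choose between implementations at run time (ifunc dispatch) rather than with `#ifdef`, so treat the compile-time switch here purely as a side-by-side comparison.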
--
Liubov
On Thu, Jul 25, 2013 at 2:22 AM, Matt Turner <mattst88@gmail.com> wrote:
> On Thu, Jul 11, 2013 at 7:07 AM, Liubov Dmitrieva
> <liubov.dmitrieva@gmail.com> wrote:
>> My Silvermont patch in its latest edition doesn't touch memcmp and
>> wmemcmp at all, because I didn't see a good boost from switching SSE4.2
>> off for these two functions.
>> Now I see why: there are no SSE4.2 instructions there. :)
>> The patch looks good. I will just check performance regressions for Penryn.
>
> Any performance numbers?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bench-memcmp-ifunc.out
Type: application/octet-stream
Size: 6895 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20130725/653c7ec7/attachment.obj>