This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
I checked performance on a machine with SSE4_1 but without SSE4_2.
On that machine the SSE4_1 version is faster than the SSSE3 one because of
fast unaligned loads and similar effects.

I agree that SSE4.1 is not really needed: we can replace ptest with a
"pmovmskb + test" pair, and performance will be nearly identical. We can
then call the implementation memcmp_sse2_unaligned, so that it looks
similar to the strcpy, memcpy, etc. dispatching.

--
Liubov

On Thu, Jul 25, 2013 at 2:22 AM, Matt Turner <mattst88@gmail.com> wrote:
> On Thu, Jul 11, 2013 at 7:07 AM, Liubov Dmitrieva
> <liubov.dmitrieva@gmail.com> wrote:
>> My Silvermont patch in the latest edition doesn't touch memcmp and
>> wmemcmp at all because I didn't see good boost from switching SSE42
>> off for these 2 functions.
>> Now I see why. There are no SSE42 instruction there. :)
>> The patch looks good. I will just check performance regressions for Penryn.
>
> Any performance numbers?
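For readers unfamiliar with the "pmovmskb + test" replacement for ptest mentioned in the message, here is a minimal sketch in C using SSE2 intrinsics. It is not the glibc assembly and not the proposed memcmp_sse2_unaligned; the helper names block16_equal and first_diff16 are hypothetical and exist only for illustration. The idea is that pcmpeqb produces a per-byte equality mask, pmovmskb packs it into a 16-bit integer, and an ordinary integer comparison then takes the place of the SSE4.1 ptest. A plain C fallback is included so the sketch still builds where SSE2 is unavailable; the __builtin_ctz call assumes GCC or Clang.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns 1 if the 16 bytes at a and b are equal.
   Uses unaligned loads, so a and b need no particular alignment. */
static int block16_equal(const void *a, const void *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);      /* pcmpeqb: 0xFF per equal byte */
    return _mm_movemask_epi8(eq) == 0xFFFF;   /* pmovmskb, then integer test */
#else
    return memcmp(a, b, 16) == 0;             /* portable fallback */
#endif
}

/* Hypothetical helper: index of the first differing byte among 16,
   or 16 if the blocks are identical. */
static size_t first_diff16(const void *a, const void *b)
{
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    unsigned ne = ~eq & 0xFFFFu;              /* bit i set: byte i differs */
    return ne ? (size_t)__builtin_ctz(ne) : 16;
#else
    const unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < 16; i++)
        if (pa[i] != pb[i])
            return i;
    return 16;
#endif
}
```

A memcmp built this way would call something like first_diff16 on each 16-byte block and, at the first mismatch, compare the two differing bytes as unsigned char to produce the sign of the result.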
Attachment: bench-memcmp-ifunc.out (binary data)