This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2


On Fri, 1 Jun 2018, Leonardo Sandoval wrote:
> this is partially true for AVX2 FMA and AVX512. What I am proposing
> contains none of the latter instructions, just AVX2 without FMA
> instructions.

This would address my concern (if true for all CPUs), but ...

> On the other hand, some microbenchmarks were run to see the benefit of
> this effort. They are summarized in the commit description, but the
> complete picture is here 

this does not. The whole point was that frequency behavior means the
slowdown on programs making *occasional* calls to strcmp will not be
captured by microbenchmarks. What good is saving dozens of cycles on
strcmp calls if the remaining program is slowed down by 5%?
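To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in C. It assumes the pessimistic case in which the whole program runs at the reduced clock; the function name, its parameters, and the figures in the note below are hypothetical illustrations, not measurements from the patch or from any CPU.

```c
/* Hypothetical model, not a measurement: ratio of run time with the
   AVX2 strcmp to run time with the baseline strcmp.

   total_cycles    cycles the program needs with the baseline strcmp
   strcmp_calls    number of strcmp calls the program makes
   saved_per_call  cycles the AVX2 version saves on each call
   freq_drop       fractional clock reduction, e.g. 0.05 for 5%

   Assumes the entire program runs at the reduced clock (worst case).
   A result above 1.0 means the program got slower overall.  */
static double
time_ratio (double total_cycles, double strcmp_calls,
            double saved_per_call, double freq_drop)
{
  /* Cycles still needed once the per-call savings are subtracted.  */
  double new_cycles = total_cycles - strcmp_calls * saved_per_call;

  /* Time is cycles / frequency; the baseline frequency cancels out,
     leaving only the relative clock (1 - freq_drop).  */
  return new_cycles / (total_cycles * (1.0 - freq_drop));
}

/* Nonzero if the savings outweigh the clock reduction in this model,
   i.e. if the fraction of cycles saved exceeds freq_drop.  */
static int
is_net_win (double total_cycles, double strcmp_calls,
            double saved_per_call, double freq_drop)
{
  return time_ratio (total_cycles, strcmp_calls,
                     saved_per_call, freq_drop) < 1.0;
}
```

With made-up inputs of 1e9 total cycles, 1e5 strcmp calls, 30 cycles saved per call and a 5% clock reduction, time_ratio comes out near 1.049: the program is almost 5% slower even though every strcmp call got faster. In this model the change only pays off when the cycles saved in strcmp exceed freq_drop times the program's total cycles, which a microbenchmark of strcmp alone cannot show.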

I was missing that, on recent generations, AVX frequency limits kick in
only if "heavy" operations are used. I'm not sure that's true for older
generations such as Haswell. Intel's white paper explaining Haswell AVX
clocks draws no distinction between "light" and "heavy" operations:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf

Can you please clarify further?

Alexander

