This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
On Fri, 1 Jun 2018, Leonardo Sandoval wrote:
> this is partially true for AVX2 FMA and AVX512. What I am proposing
> contains none of the latter instructions, just AVX2 without FMA
> instructions.
This would address my concern (if true for all CPUs), but ...
> On the other hand, some microbenchmarks were done to measure the benefit of
> this effort; they are summarized in the commit description, but the
> complete picture is here
this does not. The whole point was that, because of frequency behavior, the
slowdown on programs making *occasional* calls to strcmp will not be
captured by microbenchmarks. What good is saving dozens of cycles on
strcmp calls if the rest of the program is slowed down by 5%?
I had missed that, on recent generations, AVX frequency limits kick in only
if "heavy" operations are used. I'm not sure that's true for older
generations, e.g. Haswell. Intel's white paper explaining Haswell AVX clocks
makes no distinction between "light" and "heavy" operations:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf
Can you please clarify further?
Alexander