This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] faster string operations for bulldozer.
On Wed, Sep 26, 2012 at 12:44:23PM -0700, Roland McGrath wrote:
> > 2012-09-26 Ondrej Bilka <neleai@seznam.cz>
> >
> > * sysdeps/x86_64/multiarch/init_arch.c (__init_cpu_features):
> > Set bit_Prefer_PMINUB_for_stringop for AMD processors.
> > Set bit_Fast_Unaligned_Load for AMD processors with AVX
>
> Use tabs for indentation, not 8 spaces.
> Missing period on the second sentence.
>
> > @@ -131,6 +131,9 @@ __init_cpu_features (void)
> > __cpu_features.feature[index_Prefer_SSE_for_memop]
> > |= bit_Prefer_SSE_for_memop;
> >
> > + /* Assuming unaligned loads are fast when avx is available.*/
>
> "Assume", not "Assuming". AVX in caps. Two spaces after a sentence.
>
> > + if ((ecx & bit_AVX) != 0)
> > + __cpu_features.feature[index_Fast_Rep_String]
> > + |= ( bit_Fast_Unaligned_Load);
>
> Excess space and paren here. The indentation here is wrong.
> The consequent should be two spaces right of the "if".
> The |= should be two spaces right of that.
>
> > + __cpu_features.feature[index_Fast_Rep_String]
> > + |= bit_Prefer_PMINUB_for_stringop;
>
> Same indentation problem here.
>
> Once all these nits are fixed, for a performance change we need to see
> benchmark data. I thought Carlos had made a wiki page about benchmark
> requirements for supposed performance enhancements, but I'm not sure.
The strlen performance of the glibc variants on sse4_2-capable processors is as follows:
http://kam.mff.cuni.cz/~ondra/benchmark_string/i7/strlen/html/test_r.html
http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/strlen/html/test_r.html
http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen/html/test_r.html
Note that the sse4_2 version caused a 25% slowdown compared to the
previous implementation.
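
For anyone who wants to reproduce numbers of this kind locally, here is a
minimal timing sketch. It is not the harness that produced the pages linked
above. The helper bench_strlen_ns and its parameters are hypothetical names
made up for this illustration, and it assumes a POSIX system with
clock_gettime. It only times whichever strlen the dynamic linker selected
on the running machine, so comparing variants still needs separate builds
or machines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of glibc.  Returns the average time in
   nanoseconds of one strlen call on a string of LEN bytes that starts
   ALIGN bytes past a 64-byte boundary.  Returns a negative value if
   ITERS is zero or the buffer cannot be allocated.  */
double
bench_strlen_ns (size_t len, size_t align, size_t iters)
{
  /* Call through a volatile function pointer so the compiler cannot
     fold the calls or hoist them out of the loop.  */
  size_t (*volatile fn) (const char *) = strlen;
  volatile size_t sink = 0;
  struct timespec t0, t1;
  char *buf, *s;
  size_t i;

  if (iters == 0)
    return -1.0;

  /* Room for rounding up to 64 bytes, the offset, and the terminator.  */
  buf = malloc (len + align + 64 + 1);
  if (buf == NULL)
    return -1.0;

  s = (char *) (((uintptr_t) buf + 63) & ~(uintptr_t) 63) + align;
  memset (s, 'a', len);
  s[len] = '\0';

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (i = 0; i < iters; i++)
    sink += fn (s);
  clock_gettime (CLOCK_MONOTONIC, &t1);

  /* Sanity check: every call must have returned LEN.  */
  assert (sink == (size_t) (len * iters));
  free (buf);

  return ((double) (t1.tv_sec - t0.tv_sec) * 1e9
	  + (double) (t1.tv_nsec - t0.tv_nsec)) / (double) iters;
}
```

Calling bench_strlen_ns (256, 3, 1000000) would give a rough per-call
figure for a 256-byte string at offset 3. A single run like this is noisy,
so repeat it over several lengths and alignments before drawing any
conclusion.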
>
>
> Thanks,
> Roland
--
The cord jumped over and hit the power switch.