[PATCH] powerpc: Remove power8 strcasestr optimization
Adhemerval Zanella Netto
adhemerval.zanella@linaro.org
Mon Mar 18 12:52:28 GMT 2024
On 15/03/24 16:57, Peter Bergner wrote:
> I'm sorry I haven't replied earlier, but I just got back from vacation.
> I also see you've pushed this already. That said...
>
> As with the power7 patch, I'm all for cleanups, especially if they
> simplify things, remove more code than they add, and make things
> faster.
>
>
> On 3/5/24 2:13 PM, Adhemerval Zanella wrote:
>> Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
>> improvement compared to the generic implementation. The geomean
>> on bench-strcasestr shows:
>>
>>           __strcasestr_power8  __strcasestr_ppc
>> power10                  1159              1120
>> power9                   1640              1469
>> power8                   1787              1904
>
> The generic implementation being the one in string/strcasestr.c,
> correct? Then how do I read the performance numbers above?
Sorry about that. I generated these numbers with an improved
bench-strcasestr.c benchmark [1], based on bench-strstr.c, and I forgot
to send it along with the patch that removes the power8 implementation.
With that benchmark patch applied, you should be able to use
benchtests/scripts/compare_strings.py to get the numbers.
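As a rough aid for reading the table quoted above, here is a minimal
sketch that turns each row into a relative difference. It assumes the
figures are timings where lower is better (the message does not state
the unit), and the struct and function names are made up for this
illustration; it is not what compare_strings.py does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the quoted geomean table.  The field names are
   hypothetical; the values are copied from the message.  */
struct bench_row
{
  const char *cpu;
  double power8_impl;   /* __strcasestr_power8 */
  double generic_impl;  /* __strcasestr_ppc */
};

const struct bench_row bench_rows[] = {
  { "power10", 1159.0, 1120.0 },
  { "power9",  1640.0, 1469.0 },
  { "power8",  1787.0, 1904.0 },
};

/* Percent change of CANDIDATE relative to BASELINE.  A negative
   result means the candidate's figure is smaller.  */
double
relative_change_percent (double baseline, double candidate)
{
  assert (baseline > 0.0);
  return (candidate - baseline) / baseline * 100.0;
}

/* Print the generic routine's figure relative to the power8 one.  */
void
print_report (void)
{
  for (size_t i = 0; i < sizeof bench_rows / sizeof bench_rows[0]; i++)
    printf ("%-8s %+.1f%%\n", bench_rows[i].cpu,
            relative_change_percent (bench_rows[i].power8_impl,
                                     bench_rows[i].generic_impl));
}
```

Under that assumption, the generic routine's figure is about 3.4% lower
on power10, about 10.4% lower on power9, and about 6.5% higher on
power8.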
>
> When Raji first added the power8 optimized routine, it was
> showing big speedups. I see that was before Wilco's changes
> to the generic routine. Do you think that was the main
> reason why the generic implementation is better now?
Yes, but the change was not only motivated by this. Wilco added
a set of improvements (3ae725dfb6d7f6144, which improved performance
on aarch64 by about 4.3%, and 284f42bc778e487dfd5, which improved it
by 3-4%). But the main problem with the previous implementation was
that it used a non-linear vector scan and added a hack (counting the
iterations of the search) to fall back to the generic version.
This turns out to waste a lot of cycles on these tests, and it also
means the power8 implementation does not pick up the improvements made
to the generic one (not without rewriting it, anyway).
We are trying to avoid such 'optimizations', since it is really hard
to verify that they are not quadratic, and we want to keep such
pitfalls out of the code.
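To make the pattern concrete, here is a hypothetical sketch of a
search that counts its own work and bails out to another routine once
a budget is exceeded. It is not the removed power8 assembly: the
names, the budget value, and the stand-in for the generic routine are
all assumptions made for illustration (the stand-in here is a plain
scan, not the real implementation).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Arbitrary, hypothetical limit on character comparisons.  */
#define WORK_BUDGET 64

/* Counts how often the fallback path was taken (for demonstration).  */
size_t fallback_calls;

/* Case-insensitive check of NEEDLE at position P.  Every character
   comparison is added to *WORK.  Never reads past P's terminator.  */
static int
matches_at (const char *p, const char *needle, size_t *work)
{
  for (size_t i = 0; needle[i] != '\0'; i++)
    {
      (*work)++;
      if (p[i] == '\0'
          || tolower ((unsigned char) p[i])
             != tolower ((unsigned char) needle[i]))
        return 0;
    }
  return 1;
}

/* Placeholder for the generic routine.  This is only a simple scan
   so that the sketch is self-contained.  */
const char *
search_generic_standin (const char *haystack, const char *needle)
{
  fallback_calls++;
  for (const char *p = haystack;; p++)
    {
      size_t ignored = 0;
      if (matches_at (p, needle, &ignored))
        return p;
      if (*p == '\0')
        return NULL;
    }
}

/* "Optimized" path: scan until the work counter exceeds the budget,
   then restart from the beginning with the fallback.  All work done
   before the bail-out is thrown away.  */
const char *
search_with_budget (const char *haystack, const char *needle)
{
  assert (haystack != NULL && needle != NULL);
  size_t work = 0;
  for (const char *p = haystack;; p++)
    {
      if (matches_at (p, needle, &work))
        return p;
      if (*p == '\0')
        return NULL;
      if (work > WORK_BUDGET)
        return search_generic_standin (haystack, needle);
    }
}
```

On inputs with long partial matches the counter trips quickly, so the
cycles spent before the bail-out are pure overhead on top of the
fallback's own cost, and the budget check has to be reasoned about
separately to show that the whole thing stays sub-quadratic.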
However, I think there is still room for improvement on powerpc64le,
since it uses ifunc for some functions where it is not really required
because the minimum ISA is power8 (strnlen and strncat, for instance).
We can just set up the build system to use the power8 version and
not provide ifunc variants for LE (as zseries and x86_64-vN are
doing).
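The idea can be sketched in portable C as below. This is only an
illustration of run-time selection versus build-time binding, not
glibc's ifunc machinery: every name here, including the
SKETCH_MIN_ISA_IS_POWER8 macro and the CPU probe, is hypothetical, and
a real ifunc resolver runs once at relocation time rather than on each
call.

```c
#include <stddef.h>

typedef size_t (*strnlen_fn) (const char *, size_t);

/* Baseline variant.  */
size_t
strnlen_generic_sketch (const char *s, size_t maxlen)
{
  size_t n = 0;
  while (n < maxlen && s[n] != '\0')
    n++;
  return n;
}

/* Stands in for an ISA-specific variant.  The body is the same plain
   loop here; only the selection logic matters for this sketch.  */
size_t
strnlen_power8_sketch (const char *s, size_t maxlen)
{
  size_t n = 0;
  while (n < maxlen && s[n] != '\0')
    n++;
  return n;
}

/* Hypothetical CPU probe; always reports the feature as present.  */
int
cpu_has_power8_sketch (void)
{
  return 1;
}

/* Run-time selection, loosely analogous to what a resolver does.  */
strnlen_fn
resolve_strnlen_sketch (void)
{
  return cpu_has_power8_sketch () ? strnlen_power8_sketch
                                  : strnlen_generic_sketch;
}

#ifdef SKETCH_MIN_ISA_IS_POWER8
/* The build already guarantees the ISA level: bind directly and skip
   the dispatch entirely.  */
# define strnlen_selected strnlen_power8_sketch
#else
size_t
strnlen_selected (const char *s, size_t maxlen)
{
  return resolve_strnlen_sketch () (s, maxlen);
}
#endif
```

When the minimum ISA already implies the feature, the resolver can
only ever pick one variant, so the direct binding gives the same result
with fewer variants to build and test.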
>
>
> Even though it's already pushed...
>
> LGTM.
>
> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Thanks!
>
> Peter
>
>
[1] https://sourceware.org/pipermail/libc-alpha/2024-March/155413.html