[PATCH v3 2/2] powerpc: Add optimized stpncpy for POWER9
Adhemerval Zanella
adhemerval.zanella@linaro.org
Wed Sep 30 13:42:31 GMT 2020
On 29/09/2020 12:21, Raphael Moreira Zinsly via Libc-alpha wrote:
> Add stpncpy support into the POWER9 strncpy.
The benchmark numbers you provided [1] seem to show it is slightly worse than
the generic_strncpy, which uses the same strategy as string/strncpy.c
(which would use VSX instructions through memset/memcpy). Did you compare this
optimization against an implementation that just calls the power8/9
memset/memcpy instead?
It should result in a smaller implementation, which reduces the i-cache
footprint, and the code would be much simpler and more maintainable. The same
applies to stpncpy.
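To make the composable approach concrete, here is a rough sketch of strncpy
and stpncpy built only on memchr/memcpy/memset. It is an illustration, not the
actual glibc source: the names bounded_len, sketch_strncpy and sketch_stpncpy
are made up for this example, and memchr stands in for strnlen. The idea is
that any optimized memcpy/memset selected for the platform is picked up
without extra per-function assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of src capped at n (same result as strnlen). */
static size_t bounded_len(const char *src, size_t n)
{
    const char *end = memchr(src, '\0', n);
    return end != NULL ? (size_t)(end - src) : n;
}

/* strncpy composed from memcpy/memset; returns dest. */
char *sketch_strncpy(char *dest, const char *src, size_t n)
{
    size_t len = bounded_len(src, n);

    /* Zero-fill the tail only when src is shorter than n. */
    if (len != n)
        memset(dest + len, '\0', n - len);
    return memcpy(dest, src, len);
}

/* stpncpy: same copy, but returns a pointer to the first NUL written,
   or dest + n when src did not fit and no NUL was written. */
char *sketch_stpncpy(char *dest, const char *src, size_t n)
{
    size_t len = bounded_len(src, n);

    memcpy(dest, src, len);
    if (len != n)
        memset(dest + len, '\0', n - len);
    return dest + len;
}
```

The only difference between the two is the return value, which is why one
composed implementation can cover both entry points.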
I tried to convince Intel developers that such micro-optimizations are not
really a gain and that we should instead optimize only a handful of string
operations (memcpy/memset/etc.) and use composable implementations for the
rest (as the generic strncpy does). It still resulted in 1a153e47fcc, but I
think we might do better for powerpc.
[1] https://sourceware.org/pipermail/libc-alpha/2020-September/118049.html