[PATCH v2 2/2] powerpc: Add optimized stpncpy for POWER9
Raphael M Zinsly
rzinsly@linux.ibm.com
Wed Sep 16 12:56:59 GMT 2020
Hi Matheus,
On 16/09/2020 09:32, Matheus Castanho wrote:
> On 9/4/20 1:59 PM, Raphael M Zinsly via Libc-alpha wrote:
>> Benchtest output:
>> generic_stpncpy __stpncpy_power9 __stpncpy_power8 __stpncpy_power7 __stpncpy_ppc
> <snip>
>> Length 512, n 1024, alignment 0/ 0: 20.5111 22.9782 19.6648 21.3857 42.4801
> <snip>
>> Length 512, n 1024, alignment 1/ 6: 29.9694 24.3087 22.0513 46.7436 51.5908
>
> These two seem to be the only cases in which the power9 version loses to
> the power8 one. Have you investigated what happens in these two specific
> cases?
>
Yes, the power8 implementation calls memset to do the zero padding at
the end when n > length. In these two cases n (1024) is much larger than
the length (512), so memset is faster than the padding loop used in my
implementation.
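
For illustration, the two padding strategies can be sketched in C
roughly as follows. This is only a sketch and not the code of either
implementation; the helper names (bounded_len, stpncpy_pad_loop,
stpncpy_pad_memset) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of src, scanning at most n bytes. */
static size_t bounded_len(const char *src, size_t n)
{
    size_t len = 0;
    while (len < n && src[len] != '\0')
        len++;
    return len;
}

/* Sketch of the loop approach: the zero padding is written by a
   store loop inside the routine itself. */
static char *stpncpy_pad_loop(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t len = bounded_len(src, n);
    memcpy(dst, src, len);
    for (size_t i = len; i < n; i++)
        dst[i] = '\0';
    return dst + len;   /* points at the first padding byte, or dst + n */
}

/* Sketch of the memset approach: the zero padding is delegated to
   memset, which tends to pay off when n - len is large. */
static char *stpncpy_pad_memset(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t len = bounded_len(src, n);
    memcpy(dst, src, len);
    memset(dst + len, 0, n - len);
    return dst + len;
}
```

Both variants produce the same result; they differ only in how the
n - length trailing bytes get zeroed, which is where the time goes in
the "Length 512, n 1024" rows.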
Thanks for the review!
Regards,
--
Raphael Moreira Zinsly
IBM
Linux on Power Toolchain