[PATCH v2 2/2] powerpc: Add optimized stpncpy for POWER9

Raphael M Zinsly rzinsly@linux.ibm.com
Wed Sep 16 12:56:59 GMT 2020


Hi Matheus,

On 16/09/2020 09:32, Matheus Castanho wrote:
> On 9/4/20 1:59 PM, Raphael M Zinsly via Libc-alpha wrote:
>> Benchtest output:
>>                                  generic_stpncpy    __stpncpy_power9  __stpncpy_power8    __stpncpy_power7    __stpncpy_ppc
> <snip>
>> Length  512, n 1024, alignment  0/ 0:    20.5111    22.9782   19.6648    21.3857 42.4801
> <snip>
>> Length  512, n 1024, alignment  1/ 6:    29.9694    24.3087   22.0513    46.7436 51.5908
> 
> These two seem to be the only cases in which the power9 version loses to
> the power8 one. Have you investigated what happens in these two specific
> cases?
> 
Yes, the power8 implementation calls memset to do the zero padding at
the end when n > length. In these two cases n is much larger than the
length (1024 vs. 512), and memset is faster than the padding loop used
in my implementation.
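
To make the difference concrete, here is a rough sketch in portable C
of the two padding strategies. This is not the actual POWER8/POWER9
assembly: the function names are made up for illustration, and the
byte loop is only a stand-in for the inline padding loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy bytes from src until the terminator or
   until n bytes were copied.  Returns the number of bytes copied.  */
static size_t
copy_string_part (char *dst, const char *src, size_t n)
{
  size_t i = 0;
  while (i < n && src[i] != '\0')
    {
      dst[i] = src[i];
      i++;
    }
  return i;
}

/* Padding done inline with a loop (stand-in for the approach of
   padding inside the routine itself).  */
char *
stpncpy_pad_loop (char *dst, const char *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t len = copy_string_part (dst, src, n);
  for (size_t i = len; i < n; i++)
    dst[i] = '\0';
  return dst + len;
}

/* Padding delegated to memset, which can win when n - len is large
   because memset is already tuned for long runs.  */
char *
stpncpy_pad_memset (char *dst, const char *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t len = copy_string_part (dst, src, n);
  if (len < n)
    memset (dst + len, 0, n - len);
  return dst + len;
}
```

Both variants produce the same result and return a pointer to the
first padding byte (or dst + n if the source did not fit); they only
differ in how the n - len trailing bytes are zeroed.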


Thanks for the review!

Regards,
-- 
Raphael Moreira Zinsly
IBM
Linux on Power Toolchain

