[PATCH v2] powerpc64le: Optimize memset for POWER10

Florian Weimer fweimer@redhat.com
Fri Apr 30 12:21:04 GMT 2021


* Raoni Fassina Firmino:

> On Fri, Apr 30, 2021 at 06:52:42AM +0200, AL glibc-alpha wrote:
>> * Raoni Fassina Firmino via Libc-alpha:
>> 
>> > +L(dcbz_loop):
>> > +	/* Sets 512 bytes to zero in each iteration, the loop unrolling shows
>> > +	   a throughput boost for large sizes (2048 bytes or higher).  */
>> > +	dcbz	0,r6
>> > +	dcbz	r9,r6
>> > +	dcbz	r10,r6
>> > +	dcbz	r11,r6
>> > +	addi	r6,r6,512
>> > +	bdnz	L(dcbz_loop)
>> 
>> > +# ifdef __LITTLE_ENDIAN__
>> > +	    (hwcap2 & (PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_HAS_ISEL)
>> > +	     && hwcap & PPC_FEATURE_HAS_VSX)
>> > +	    ? __memset_power10 :
>> > +# endif
>> 
>> Should the IFUNC resolver check that the cache line size is 128 bytes?
>
> I'm not sure, this part was taken from power8 version and for that it
> does not check the cache line size.  In fact I looked all other memset
> versions (power7, power6, power4 and ppc), and only the ppc version does
> not assume a 128 bytes cache line.  And none has this check (and all of
> them uses dcbz).  So I don't know.

Hmm, you are right.  I think we already had a discussion about this (in
the POWER8 context) because the string function breaks if the cache line
size (as emulated by QEMU, if I recall correctly) is unexpected.
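
For illustration only, here is a minimal user-space sketch of the kind of
check I was asking about.  It is not part of the patch.  The helper names
are hypothetical, and it assumes the kernel reports the data cache block
size through AT_DCACHEBSIZE in the auxiliary vector.  A resolver inside
glibc would have to obtain the value through its own internal means rather
than by calling getauxval like this.

```c
#include <stdbool.h>
#include <elf.h>        /* AT_DCACHEBSIZE */
#include <sys/auxv.h>   /* getauxval (Linux) */

/* Block size the quoted loop assumes: four dcbz instructions are said
   to clear 512 bytes per iteration, i.e. 128 bytes each.  */
#define EXPECTED_DCBZ_BLOCK 128UL

/* Hypothetical helper: true if a dcbz loop written for 128-byte blocks
   would clear the number of bytes it expects.  */
static bool
dcbz_block_size_ok (unsigned long block_size)
{
  return block_size == EXPECTED_DCBZ_BLOCK;
}

/* Hypothetical helper: data cache block size as reported in the
   auxiliary vector.  getauxval returns 0 if the entry is absent.  */
static unsigned long
reported_dcache_block_size (void)
{
  return getauxval (AT_DCACHEBSIZE);
}
```

A selector could then fall back to a variant without dcbz when
dcbz_block_size_ok (reported_dcache_block_size ()) is false, which also
covers the case where no value is reported at all.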

Thanks,
Florian


