[PATCH v2] powerpc64le: Optimize memset for POWER10
Florian Weimer
fweimer@redhat.com
Fri Apr 30 12:21:04 GMT 2021
* Raoni Fassina Firmino:
> On Fri, Apr 30, 2021 at 06:52:42AM +0200, AL glibc-alpha wrote:
>> * Raoni Fassina Firmino via Libc-alpha:
>>
>> > +L(dcbz_loop):
>> > + /* Sets 512 bytes to zero in each iteration, the loop unrolling shows
>> > + a throughput boost for large sizes (2048 bytes or higher). */
>> > + dcbz 0,r6
>> > + dcbz r9,r6
>> > + dcbz r10,r6
>> > + dcbz r11,r6
>> > + addi r6,r6,512
>> > + bdnz L(dcbz_loop)
>>
>> > +# ifdef __LITTLE_ENDIAN__
>> > + (hwcap2 & (PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_HAS_ISEL)
>> > + && hwcap & PPC_FEATURE_HAS_VSX)
>> > + ? __memset_power10 :
>> > +# endif
>>
>> Should the IFUNC resolver check that the cache line size is 128 bytes?
>
> I'm not sure, this part was taken from power8 version and for that it
> does not check the cache line size. In fact I looked all other memset
> versions (power7, power6, power4 and ppc), and only the ppc version does
> not assume a 128 bytes cache line. And none has this check (and all of
> them uses dcbz). So I don't know.

Hmm, you are right. I think we already had a discussion about this (in
the POWER8 context), because the string function breaks if the cache line
size (as emulated by QEMU, if I recall correctly) is unexpected.
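
For illustration only, here is a rough sketch of what such a check could
look like if the selection logic were written as a plain function. It is
not the glibc resolver: the DEMO_* names and the function are hypothetical,
the bit values are stand-ins for the constants in the quoted patch, and the
128-byte requirement is the assumption under discussion. On Linux the block
size could presumably be read with getauxval (AT_DCACHEBSIZE), but here it
is passed in as a parameter so the logic can be tried anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hwcap bits named in the quoted patch.  */
#define DEMO_FEATURE_HAS_VSX    UINT64_C (0x00000080)
#define DEMO_FEATURE2_ARCH_3_1  UINT64_C (0x00040000)
#define DEMO_FEATURE2_HAS_ISEL  UINT64_C (0x08000000)

/* Assumption: the dcbz loop expects each dcbz to clear 128 bytes.  */
#define DEMO_EXPECTED_DCBZ_SIZE 128ul

enum demo_memset_impl
{
  DEMO_MEMSET_GENERIC,
  DEMO_MEMSET_POWER10
};

/* Pick the POWER10 variant only if the feature bits match (same shape as
   the quoted condition) and the reported data cache block size is the one
   the dcbz loop assumes.  Otherwise fall back to a generic variant.  */
static enum demo_memset_impl
demo_select_memset (uint64_t hwcap, uint64_t hwcap2,
                    unsigned long dcache_bsize)
{
  bool features_ok
    = (hwcap2 & (DEMO_FEATURE2_ARCH_3_1 | DEMO_FEATURE2_HAS_ISEL)) != 0
      && (hwcap & DEMO_FEATURE_HAS_VSX) != 0;
  bool block_size_ok = dcache_bsize == DEMO_EXPECTED_DCBZ_SIZE;

  return (features_ok && block_size_ok
          ? DEMO_MEMSET_POWER10 : DEMO_MEMSET_GENERIC);
}
```

With this shape, an emulator reporting a 64-byte block size would simply
get the generic variant instead of a dcbz loop that assumes 128 bytes.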
Thanks,
Florian