[PATCH v3 5/5] AArch64: Improve A64FX memset

Fri Aug 27 05:05:03 GMT 2021

Hi Wilco,

> > If you agree to the cmp and branch workaround (2 instructions at the beginning of the loop)
> > below, I'll submit a patch.
> 
> Yes, the 2 instruction workaround is clearly the best solution so far. It fixes the dips
> around 16KB but doesn't regress anything else. The results v4 vs v4fix [9] show there
> are even some uplifts in the 1-8KB range.

Thank you for the review. I have submitted a patch [1]; please take a look.

[1] https://sourceware.org/pipermail/libc-alpha/2021-August/130569.html

> > 2) Result of the cmp and branch workaround (2 instructions at the beginning of the loop)
> 
> It's interesting this works on both systems, however it's still a mystery why...
> It would be a good idea to ask your CPU team about this.

OK, I will ask them. In the meantime, the microarchitecture manual [2] is available if you're interested.

[2] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.5.pdf

Thanks.
Naohiro