[PATCH v2] aarch64: MTE compatible strlen
Tue Jun 9 17:24:37 GMT 2020
> > note that there is an strlen_asimd.S ifunc variant that
> > is used on falkor and kunpeng920, this new strlen is
> > likely fast on those cores too, so i think we can remove
> > that in a follow up patch. adding Xuelei Zhang on cc:
> > i plan to remove the strlen_asimd.S ifunc variant in this
> > release cycle unless there are regressions on kunpeng920.
> > (but even then it would be nicer to fix the generic version
> > to be fast on kunpeng920 instead of maintaining two variants)
I would expect the new generic version to be faster than strlen_asimd.S,
mainly because UMINV is a slow instruction on all microarchitectures.
>> Should we keep a non-simd version to handle such cores? Also
>> for multiarch build, the selection is based on midr which is
>> vendor specific. Should we use something else?
> a non-simd version can be useful for little cores
> since the performance drop is quite big, but it's
> also useful not to maintain many ifunc variants
The performance drop on Cortex-A53 is mainly for sizes over 32, so for
typical sized strings the average difference would be well below 10%.
I haven't tuned for in-order cores, so it could likely be improved.
It's reasonable to add an optimized strlen for modern high-end cores
which would be as fast as possible. I have an experimental version which
is about 2.5x as fast as strlen_asimd.S on Neoverse N1. We could use it
for Cortex-A53 if it happens to perform better than the MTE variant.
So for strlen we should be able to cover all microarchitectures with 2 ifuncs.
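To make the two-ifunc idea concrete, here is a rough sketch of what such a selector could look like. It is only an illustration under stated assumptions, not the glibc selector: the names strlen_mte_sketch, strlen_fast_sketch and select_strlen are hypothetical, both variants simply call strlen as stand-ins, and the policy (MTE variant by default, fast variant on Neoverse N1) is made up for the example. The MIDR field layout (implementer in bits 31:24, part number in bits 15:4) and the Arm part numbers are the only parts taken from the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

#define IMPL_ARM         0x41   /* Arm Ltd. implementer code */
#define PART_NEOVERSE_N1 0xd0c  /* Neoverse N1 part number */

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical stand-ins for the two variants; real ones would be assembly. */
static size_t strlen_mte_sketch(const char *s)  { return strlen(s); }
static size_t strlen_fast_sketch(const char *s) { return strlen(s); }

/* Hypothetical policy: the MTE-compatible variant is the default and is
   always used when MTE is enabled; the fast variant is only picked for a
   core it has been tuned for (Neoverse N1 in this sketch). */
static strlen_fn select_strlen(uint64_t midr, int mte_enabled)
{
    if (mte_enabled)
        return strlen_mte_sketch;
    if (MIDR_IMPLEMENTER(midr) == IMPL_ARM
        && MIDR_PARTNUM(midr) == PART_NEOVERSE_N1)
        return strlen_fast_sketch;
    return strlen_mte_sketch;
}
```

Keying the fast variant on a short list of known cores keeps the default path MTE-safe, and a core like Cortex-A53 could be added to that list later if benchmarks justify it.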