[PATCH v2] aarch64: MTE compatible strlen
Adhemerval Zanella
adhemerval.zanella@linaro.org
Tue Jun 9 12:47:02 GMT 2020
On 09/06/2020 04:35, Szabolcs Nagy wrote:
> The 06/05/2020 17:22, Andrea Corallo wrote:
>> Below is a performance comparison (obtained using glibc benchtests) of
>> the strlen benchmark run on Cortex-A72, Cortex-A53, and Neoverse N1.
>>
>> | length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
> ...
>> | 64 | 0 | 1.17x | 0.80x | 1.75x |
>> | 64 | 7 | 1.17x | 0.77x | 1.83x |
>> | 64 | 6 | 1.17x | 0.77x | 1.57x |
>> | 42 | 6 | 1.00x | 0.80x | 1.42x |
>> | 128 | 0 | 0.96x | 0.68x | 1.80x |
>> | 128 | 7 | 0.96x | 0.66x | 1.85x |
>> | 128 | 7 | 0.96x | 0.67x | 1.86x |
>> | 85 | 7 | 1.05x | 0.75x | 1.87x |
>> | 256 | 0 | 0.98x | 0.69x | 1.88x |
>> | 256 | 7 | 0.98x | 0.68x | 1.92x |
>> | 256 | 8 | 0.99x | 0.69x | 1.88x |
>> | 170 | 8 | 0.96x | 0.72x | 1.86x |
>> | 512 | 0 | 0.99x | 0.65x | 1.90x |
>> | 512 | 7 | 0.98x | 0.65x | 1.92x |
>> | 512 | 9 | 0.99x | 0.65x | 1.92x |
>> | 341 | 9 | 0.98x | 0.68x | 1.99x |
>> | 1024 | 0 | 0.99x | 0.63x | 1.90x |
>> | 1024 | 7 | 0.99x | 0.62x | 1.92x |
>> | 1024 | 10 | 0.99x | 0.62x | 1.92x |
>> | 682 | 10 | 0.99x | 0.64x | 1.96x |
>> | 2048 | 0 | 0.99x | 0.61x | 1.92x |
>> | 2048 | 7 | 1.01x | 0.61x | 1.93x |
>> | 2048 | 11 | 1.00x | 0.61x | 1.95x |
>> | 1365 | 11 | 1.00x | 0.62x | 1.94x |
>> | 4096 | 0 | 1.00x | 0.61x | 1.93x |
>> | 4096 | 7 | 1.00x | 0.61x | 1.94x |
>> | 4096 | 12 | 1.00x | 0.61x | 1.95x |
>> | 2730 | 12 | 1.00x | 0.61x | 1.94x |
> ...
>> Introduce an Arm MTE compatible strlen implementation.
>>
>> The existing implementation assumes that any access to the pages in
>> which the string resides is safe. This assumption is not true when
>> MTE is enabled. This patch updates the algorithm to ensure that
>> accesses remain within the bounds of an MTE tag (16-byte chunks) and
>> improves overall performance.
>
> there is non-trivial performance degradation on cortex-a53
> so i think the commit message is not appropriate. e.g. use
>
> ".. it improves overall performance on modern cores, but
> can be slower on cores with a less efficient Advanced SIMD
> implementation."
>
> note that there is an strlen_asimd.S ifunc variant that
> is used on falkor and kunpeng920, this new strlen is
> likely fast on those cores too, so i think we can remove
> that in a follow up patch. adding Xuelei Zhang on cc:
> i plan to remove the strlen_asimd.S ifunc variant in this
> release cycle unless there are regressions on kunpeng920.
> (but even then it would be nicer to fix the generic version
> to be fast on kunpeng920 instead of maintaining two variants)
Should we keep a non-SIMD version to handle such cores? Also,
for multiarch builds the selection is based on MIDR, which is
vendor-specific. Should we use something else?
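
To make the MIDR concern concrete, here is a minimal sketch of what a
per-core selection looks like. It is not the glibc ifunc code: the
names pick_strlen, STRLEN_SIMD and STRLEN_NON_SIMD are hypothetical,
and the caller is assumed to have obtained the MIDR_EL1 value already.
Only the field layout (implementer in bits [31:24], part number in
bits [15:4]) and the Cortex-A53 identifiers come from the Arm
documentation.

```c
#include <stdint.h>

/* MIDR_EL1 field layout: implementer in bits [31:24],
   part number in bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

/* Hypothetical variant tags, for illustration only. */
enum strlen_variant { STRLEN_SIMD, STRLEN_NON_SIMD };

/* Hypothetical selector: falls back to a non-SIMD variant on
   Cortex-A53 (implementer 0x41 = Arm, part number 0xd03) and uses
   the SIMD variant everywhere else. */
enum strlen_variant pick_strlen(uint64_t midr)
{
    if (MIDR_IMPLEMENTER(midr) == 0x41 && MIDR_PARTNUM(midr) == 0xd03)
        return STRLEN_NON_SIMD;
    return STRLEN_SIMD;
}
```

Every core that needs the fallback has to be listed by implementer and
part number, which is why this kind of selection is tied to specific
vendors.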
>
>> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
>>
>> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
>
> this is ok to commit with updated commit message.
>
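
For reference, the constraint from the quoted commit message (accesses
stay within 16-byte MTE tag granules) can be sketched in plain C. This
is only an illustration under stated assumptions, not the patch's
Advanced SIMD code: granule_strlen is a hypothetical name, and the
aligned 16-byte memcpy stands in for a single vector load. Reading the
bytes of a granule that lie outside the string is not covered by ISO C;
the real assembly relies on the hardware guarantee that one aligned
granule carries a single tag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16  /* MTE tag granule size in bytes */

/* Hypothetical helper, not the glibc symbol. */
size_t granule_strlen(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    /* Start of the aligned granule that contains the first byte. */
    const unsigned char *chunk =
        (const unsigned char *)(addr & ~(uintptr_t)(GRANULE - 1));
    /* Offset of s inside that granule; earlier bytes are ignored. */
    size_t i = (size_t)(addr & (GRANULE - 1));
    size_t len = 0;

    for (;;) {
        unsigned char block[GRANULE];
        /* One aligned 16-byte read: it never crosses a granule
           boundary, so it touches only one tag. */
        memcpy(block, chunk, GRANULE);
        for (; i < GRANULE; i++, len++)
            if (block[i] == 0)
                return len;
        chunk += GRANULE;  /* next granule is read only if no NUL yet */
        i = 0;
    }
}
```

A granule is read only when it holds the first string byte or follows a
granule with no NUL in it, so every read covers at least one byte of
the string or its terminator.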