[PATCH v2] aarch64: MTE compatible strlen

Adhemerval Zanella adhemerval.zanella@linaro.org
Tue Jun 9 12:47:02 GMT 2020



On 09/06/2020 04:35, Szabolcs Nagy wrote:
> The 06/05/2020 17:22, Andrea Corallo wrote:
>> Below is a performance comparison (obtained using glibc benchtests) of
>> the strlen benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.
>>
>> | length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
> ...
>> |     64 |         0 |           1.17x |           0.80x |      1.75x |
>> |     64 |         7 |           1.17x |           0.77x |      1.83x |
>> |     64 |         6 |           1.17x |           0.77x |      1.57x |
>> |     42 |         6 |           1.00x |           0.80x |      1.42x |
>> |    128 |         0 |           0.96x |           0.68x |      1.80x |
>> |    128 |         7 |           0.96x |           0.66x |      1.85x |
>> |    128 |         7 |           0.96x |           0.67x |      1.86x |
>> |     85 |         7 |           1.05x |           0.75x |      1.87x |
>> |    256 |         0 |           0.98x |           0.69x |      1.88x |
>> |    256 |         7 |           0.98x |           0.68x |      1.92x |
>> |    256 |         8 |           0.99x |           0.69x |      1.88x |
>> |    170 |         8 |           0.96x |           0.72x |      1.86x |
>> |    512 |         0 |           0.99x |           0.65x |      1.90x |
>> |    512 |         7 |           0.98x |           0.65x |      1.92x |
>> |    512 |         9 |           0.99x |           0.65x |      1.92x |
>> |    341 |         9 |           0.98x |           0.68x |      1.99x |
>> |   1024 |         0 |           0.99x |           0.63x |      1.90x |
>> |   1024 |         7 |           0.99x |           0.62x |      1.92x |
>> |   1024 |        10 |           0.99x |           0.62x |      1.92x |
>> |    682 |        10 |           0.99x |           0.64x |      1.96x |
>> |   2048 |         0 |           0.99x |           0.61x |      1.92x |
>> |   2048 |         7 |           1.01x |           0.61x |      1.93x |
>> |   2048 |        11 |           1.00x |           0.61x |      1.95x |
>> |   1365 |        11 |           1.00x |           0.62x |      1.94x |
>> |   4096 |         0 |           1.00x |           0.61x |      1.93x |
>> |   4096 |         7 |           1.00x |           0.61x |      1.94x |
>> |   4096 |        12 |           1.00x |           0.61x |      1.95x |
>> |   2730 |        12 |           1.00x |           0.61x |      1.94x |
> ...
>> Introduce an Arm MTE compatible strlen implementation.
>>
>> The existing implementation assumes that any access to the pages in
>> which the string resides is safe.  This assumption is not true when
>> MTE is enabled.  This patch updates the algorithm to ensure that
>> accesses remain within the bounds of an MTE tag (16-byte chunks) and
>> improves overall performance.
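
To make the 16-byte constraint described above concrete, here is a rough C
sketch. It is not the assembly from the patch: the function name and the
byte-wise scan are hypothetical stand-ins, and the real code uses Advanced
SIMD loads. The only point is that every load covers exactly one aligned
16-byte chunk, so no load crosses into a differently tagged granule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16  /* MTE tag granule size in bytes */

/* Hypothetical sketch: compute strlen using only aligned 16-byte loads.
   Reading the bytes that precede s inside the first granule is outside
   what ISO C guarantees; it stands in for what an assembly routine does. */
size_t strlen_granule_sketch(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    /* Round down to the start of the granule that contains s. */
    const unsigned char *chunk =
        (const unsigned char *)(addr & ~(uintptr_t)(GRANULE - 1));
    /* Offset of s inside its granule; earlier bytes are ignored. */
    size_t i = (size_t)(addr & (GRANULE - 1));

    for (;;) {
        unsigned char data[GRANULE];
        memcpy(data, chunk, GRANULE);  /* models one aligned 16-byte load */
        for (; i < GRANULE; i++) {
            if (data[i] == 0)
                return (size_t)((uintptr_t)chunk + i - addr);
        }
        chunk += GRANULE;  /* the next granule is only touched if no NUL was found */
        i = 0;
    }
}
```

Because the scan stops in the granule that holds the terminating NUL, the
following granule (which may carry a different tag) is never read.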
> 
> there is non-trivial performance degradation on cortex-a53
> so i think the commit message is not appropriate. e.g. use
> 
> ".. it improves overall performance on modern cores, but
> can be slower on cores with a less efficient Advanced SIMD
> implementation."
> 
> note that there is an strlen_asimd.S ifunc variant that
> is used on falkor and kunpeng920, this new strlen is
> likely fast on those cores too, so i think we can remove
> that in a follow up patch. adding Xuelei Zhang on cc:
> i plan to remove the strlen_asimd.S ifunc variant in this
> release cycle unless there are regressions on kunpeng920.
> (but even then it would be nicer to fix the generic version
> to be fast on kunpeng920 instead of maintaining two variants)

Should we keep a non-SIMD version to handle such cores? Also,
for multiarch builds, the ifunc selection is based on MIDR, which
is vendor-specific. Should we use something else?
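
To illustrate what MIDR-based selection means, a small sketch follows. The
field positions are the architectural MIDR_EL1 layout (implementer in bits
[31:24], part number in bits [15:4]). Everything else is an assumption made
for illustration: the helper names, the enum, and in particular the policy
of picking a scalar variant on Cortex-A53 are hypothetical and are not what
glibc does today.

```c
#include <stdint.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in bits [15:4]. */
static inline unsigned midr_implementer(uint64_t midr)
{
    return (unsigned)((midr >> 24) & 0xff);
}

static inline unsigned midr_partnum(uint64_t midr)
{
    return (unsigned)((midr >> 4) & 0xfff);
}

/* Hypothetical variant identifiers, for illustration only. */
enum strlen_variant { STRLEN_ASIMD, STRLEN_SCALAR };

/* Hypothetical policy: fall back to a scalar strlen on Cortex-A53
   (implementer 0x41 = Arm, part number 0xd03), use Advanced SIMD elsewhere. */
enum strlen_variant pick_strlen_sketch(uint64_t midr)
{
    if (midr_implementer(midr) == 0x41 && midr_partnum(midr) == 0xd03)
        return STRLEN_SCALAR;
    return STRLEN_ASIMD;
}
```

A part number is only meaningful together with its implementer code, so a
table like this has to be extended for every vendor and core, which is the
maintenance concern raised above.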

> 
>> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
>>
>> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
> 
> this is ok to commit with updated commit message.
> 

