[PATCH v2] aarch64: MTE compatible strlen

Szabolcs Nagy szabolcs.nagy@arm.com
Tue Jun 9 07:35:38 GMT 2020


The 06/05/2020 17:22, Andrea Corallo wrote:
> Below is a performance comparison (obtained using glibc benchtests) of
> the strlen benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.
> 
> | length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
...
> |     64 |         0 |           1.17x |           0.80x |      1.75x |
> |     64 |         7 |           1.17x |           0.77x |      1.83x |
> |     64 |         6 |           1.17x |           0.77x |      1.57x |
> |     42 |         6 |           1.00x |           0.80x |      1.42x |
> |    128 |         0 |           0.96x |           0.68x |      1.80x |
> |    128 |         7 |           0.96x |           0.66x |      1.85x |
> |    128 |         7 |           0.96x |           0.67x |      1.86x |
> |     85 |         7 |           1.05x |           0.75x |      1.87x |
> |    256 |         0 |           0.98x |           0.69x |      1.88x |
> |    256 |         7 |           0.98x |           0.68x |      1.92x |
> |    256 |         8 |           0.99x |           0.69x |      1.88x |
> |    170 |         8 |           0.96x |           0.72x |      1.86x |
> |    512 |         0 |           0.99x |           0.65x |      1.90x |
> |    512 |         7 |           0.98x |           0.65x |      1.92x |
> |    512 |         9 |           0.99x |           0.65x |      1.92x |
> |    341 |         9 |           0.98x |           0.68x |      1.99x |
> |   1024 |         0 |           0.99x |           0.63x |      1.90x |
> |   1024 |         7 |           0.99x |           0.62x |      1.92x |
> |   1024 |        10 |           0.99x |           0.62x |      1.92x |
> |    682 |        10 |           0.99x |           0.64x |      1.96x |
> |   2048 |         0 |           0.99x |           0.61x |      1.92x |
> |   2048 |         7 |           1.01x |           0.61x |      1.93x |
> |   2048 |        11 |           1.00x |           0.61x |      1.95x |
> |   1365 |        11 |           1.00x |           0.62x |      1.94x |
> |   4096 |         0 |           1.00x |           0.61x |      1.93x |
> |   4096 |         7 |           1.00x |           0.61x |      1.94x |
> |   4096 |        12 |           1.00x |           0.61x |      1.95x |
> |   2730 |        12 |           1.00x |           0.61x |      1.94x |
...
> Introduce an Arm MTE compatible strlen implementation.
> 
> The existing implementation assumes that any access to the pages in
> which the string resides is safe.  This assumption is not true when
> MTE is enabled.  This patch updates the algorithm to ensure that
> accesses remain within the bounds of an MTE tag (16-byte chunks) and
> improves overall performance.
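
for reference, the 16-byte chunk idea described above can be
sketched in c roughly as follows. this is not the code from the
patch (that is aarch64 assembly using Advanced SIMD): the name
strlen_granule_sketch, the GRANULE macro and the memcpy standing
in for a single aligned vector load are assumptions made only for
illustration. the point it shows is that every load starts at a
16-byte aligned address and covers exactly 16 bytes, so it never
crosses into a neighbouring tag granule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MTE tag granule size in bytes (hypothetical macro name). */
#define GRANULE 16u

/*
 * Illustrative sketch only: scan the string one aligned 16-byte
 * chunk at a time.  The first chunk is rounded down to the granule
 * containing s, and the bytes before s in that chunk are ignored.
 * Loading a whole granule can touch bytes outside the string object,
 * which plain ISO C does not promise is valid; a real implementation
 * does this in assembly.
 */
size_t strlen_granule_sketch(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    const unsigned char *chunk =
        (const unsigned char *)(addr & ~(uintptr_t)(GRANULE - 1));
    size_t skip = (size_t)(addr & (GRANULE - 1)); /* bytes before s */
    unsigned char buf[GRANULE];

    for (;;) {
        /* Stands in for one aligned 16-byte vector load. */
        memcpy(buf, chunk, GRANULE);

        for (size_t i = skip; i < GRANULE; i++) {
            if (buf[i] == 0)
                return (size_t)(chunk + i - (const unsigned char *)s);
        }

        chunk += GRANULE; /* next granule, still 16-byte aligned */
        skip = 0;         /* only the first chunk has leading bytes */
    }
}
```

the byte loop over buf is where a real implementation would use a
vector compare against zero; the sketch keeps it scalar so that the
access pattern is the only thing being shown.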

there is non-trivial performance degradation on cortex-a53
(0.61x-0.80x in the rows quoted above), so i think the commit
message is not appropriate. e.g. use

".. it improves overall performance on modern cores, but
can be slower on cores with a less efficient Advanced SIMD
implementation."

note that there is an strlen_asimd.S ifunc variant that
is used on falkor and kunpeng920, this new strlen is
likely fast on those cores too, so i think we can remove
that in a follow up patch. adding Xuelei Zhang on cc:
i plan to remove the strlen_asimd.S ifunc variant in this
release cycle unless there are regressions on kunpeng920.
(but even then it would be nicer to fix the generic version
to be fast on kunpeng920 instead of maintaining two variants)

> Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
> 
> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>

this is ok to commit with updated commit message.

