[PATCH v2] aarch64: MTE compatible strlen

Szabolcs Nagy szabolcs.nagy@arm.com
Tue Jun 9 14:14:51 GMT 2020


The 06/09/2020 09:47, Adhemerval Zanella via Libc-alpha wrote:
> On 09/06/2020 04:35, Szabolcs Nagy wrote:
> > The 06/05/2020 17:22, Andrea Corallo wrote:
> >> The following is a performance comparison (obtained using glibc benchtests)
> >> of the strlen benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.
> >>
> >> | length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
> > ...
> >> |     64 |         0 |           1.17x |           0.80x |      1.75x |
> >> |     64 |         7 |           1.17x |           0.77x |      1.83x |
> >> |     64 |         6 |           1.17x |           0.77x |      1.57x |
> >> |     42 |         6 |           1.00x |           0.80x |      1.42x |
> >> |    128 |         0 |           0.96x |           0.68x |      1.80x |
> >> |    128 |         7 |           0.96x |           0.66x |      1.85x |
> >> |    128 |         7 |           0.96x |           0.67x |      1.86x |
> >> |     85 |         7 |           1.05x |           0.75x |      1.87x |
> >> |    256 |         0 |           0.98x |           0.69x |      1.88x |
> >> |    256 |         7 |           0.98x |           0.68x |      1.92x |
> >> |    256 |         8 |           0.99x |           0.69x |      1.88x |
> >> |    170 |         8 |           0.96x |           0.72x |      1.86x |
> >> |    512 |         0 |           0.99x |           0.65x |      1.90x |
> >> |    512 |         7 |           0.98x |           0.65x |      1.92x |
> >> |    512 |         9 |           0.99x |           0.65x |      1.92x |
> >> |    341 |         9 |           0.98x |           0.68x |      1.99x |
> >> |   1024 |         0 |           0.99x |           0.63x |      1.90x |
> >> |   1024 |         7 |           0.99x |           0.62x |      1.92x |
> >> |   1024 |        10 |           0.99x |           0.62x |      1.92x |
> >> |    682 |        10 |           0.99x |           0.64x |      1.96x |
> >> |   2048 |         0 |           0.99x |           0.61x |      1.92x |
> >> |   2048 |         7 |           1.01x |           0.61x |      1.93x |
> >> |   2048 |        11 |           1.00x |           0.61x |      1.95x |
> >> |   1365 |        11 |           1.00x |           0.62x |      1.94x |
> >> |   4096 |         0 |           1.00x |           0.61x |      1.93x |
> >> |   4096 |         7 |           1.00x |           0.61x |      1.94x |
> >> |   4096 |        12 |           1.00x |           0.61x |      1.95x |
> >> |   2730 |        12 |           1.00x |           0.61x |      1.94x |
> > ...
> >> Introduce an Arm MTE compatible strlen implementation.
> >>
> >> The existing implementation assumes that any access to the pages in
> >> which the string resides is safe.  This assumption is not true when
> >> MTE is enabled.  This patch updates the algorithm to ensure that
> >> accesses remain within the bounds of an MTE tag (16-byte chunks) and
> >> improves overall performance.
> > 
> > There is a non-trivial performance degradation on Cortex-A53,
> > so I think the commit message is not accurate. For example, use:
> > 
> > ".. it improves overall performance on modern cores, but
> > can be slower on cores with a less efficient Advanced SIMD
> > implementation."
> > 
> > Note that there is an strlen_asimd.S ifunc variant that
> > is used on Falkor and Kunpeng920. This new strlen is
> > likely fast on those cores too, so I think we can remove
> > that variant in a follow-up patch. Adding Xuelei Zhang on cc:
> > I plan to remove the strlen_asimd.S ifunc variant in this
> > release cycle unless there are regressions on Kunpeng920
> > (but even then it would be nicer to fix the generic version
> > to be fast on Kunpeng920 than to maintain two variants).
> 
> Should we keep a non-SIMD version to handle such cores? Also,
> for multiarch builds the selection is based on MIDR, which is
> vendor-specific. Should we use something else?

A non-SIMD version can be useful for little cores,
since the performance drop is quite big, but it is
also useful not to maintain many ifunc variants,
and I don't think we need to heavily optimize glibc
for small cores. This depends on how users use glibc
on AArch64, and I might be wrong, but I think a single
strlen should be acceptable for now; if somebody
complains, we can change it.

I think MIDR-based dispatch is fine if we use
ifunc variants to work around core-specific
performance quirks. Using the MIDR of the current
core is not ideal on a big.LITTLE system, but
I think there is at least no correctness issue,
since all cores must support MTE for the kernel
to turn MTE on.
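
To make the MIDR-based selection concrete, the sketch below decodes the MIDR_EL1 fields (layout per the Arm Architecture Reference Manual) and models only the decision an ifunc resolver would make. The names select_strlen and enum strlen_variant are hypothetical, and the Falkor ('Q', part 0xc00) and Kunpeng920 ('H', part 0xd01) IDs are assumed to match the checks glibc uses for strlen_asimd; verify them against glibc's aarch64 cpu_features.h before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* MIDR_EL1 fields: Implementer [31:24], Variant [23:20],
   Architecture [19:16], PartNum [15:4], Revision [3:0].  */
static unsigned midr_implementer(uint64_t midr)
{
    return (unsigned)((midr >> 24) & 0xff);
}

static unsigned midr_partnum(uint64_t midr)
{
    return (unsigned)((midr >> 4) & 0xfff);
}

/* Hypothetical names for the two strlen variants discussed above.  */
enum strlen_variant { STRLEN_GENERIC, STRLEN_ASIMD };

/* Models only the resolver's choice.  The IDs are assumptions meant to
   mirror glibc's Falkor and Kunpeng920 checks.  */
static enum strlen_variant select_strlen(uint64_t midr)
{
    unsigned impl = midr_implementer(midr);
    unsigned part = midr_partnum(midr);

    if ((impl == 'Q' && part == 0xc00) || (impl == 'H' && part == 0xd01))
        return STRLEN_ASIMD;
    /* Every other core gets the generic (MTE-safe) version.  */
    return STRLEN_GENERIC;
}
```

A resolver typically runs once per process and sees the MIDR of whichever core it happens to execute on, which is the big.LITTLE caveat above; since the dispatch is only a performance choice, picking a variant tuned for another core costs speed, not correctness.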

Note that str* functions may be performance
critical in ld.so, where we can only use the
generic version (so it has to be MTE-safe and
should be fast on all cores anyway).
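
The MTE-safety requirement comes down to granule arithmetic: tags are checked per 16-byte granule, so a load is safe without knowing the buffer size as long as every granule it touches also holds at least one byte of the string (including its terminator). The hypothetical helpers below make that concrete: an aligned 16-byte load never spans two granules, while an unaligned 16-byte load always does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE 16u   /* MTE tag granule size in bytes */

/* True if an N-byte access at ADDR stays inside the granule holding ADDR.  */
static bool within_one_granule(uintptr_t addr, size_t n)
{
    return n > 0 && (addr % MTE_GRANULE) + n <= MTE_GRANULE;
}

/* Number of tag granules an N-byte access starting at ADDR touches.  */
static size_t granules_touched(uintptr_t addr, size_t n)
{
    assert(n > 0);
    return (size_t)((addr + n - 1) / MTE_GRANULE - addr / MTE_GRANULE) + 1;
}
```

A granule-bounded strlen therefore aligns the start address down to 16 bytes and loads whole granules from there, ignoring any bytes that precede the string in the first granule.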

