[PATCH] aarch64: MTE compatible strchr
Andrea Corallo
andrea.corallo@arm.com
Wed Jun 3 09:49:04 GMT 2020
Hi all,
I'd like to submit this patch introducing an Arm MTE compatible strchr
implementation.
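For readers unfamiliar with the constraint: MTE tags memory in 16-byte granules, so an MTE-safe string routine must not load from a granule that holds no bytes of the string. The scalar C sketch below illustrates that idea only. It is not the attached AArch64 assembly; the function name strchr_granule_sketch and the GRANULE macro are hypothetical, and the inner byte loop stands in for what a vector compare would do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16 /* MTE tag granule size in bytes */

/* Hypothetical illustration, not the patch's code: scan the string one
   aligned 16-byte granule at a time, so that no read ever touches a
   granule that lies entirely past the terminating NUL.  */
static char *
strchr_granule_sketch (const char *s, int c)
{
  const unsigned char ch = (unsigned char) c;
  uintptr_t addr = (uintptr_t) s;

  /* Round down to the start of the granule that contains S.  */
  const unsigned char *chunk
    = (const unsigned char *) (addr & ~(uintptr_t) (GRANULE - 1));

  /* Bytes in the first granule that precede S and must be ignored.  */
  size_t skip = addr & (GRANULE - 1);

  for (;;)
    {
      unsigned char bytes[GRANULE];

      /* One whole-granule read; stands in for a 16-byte vector load.  */
      memcpy (bytes, chunk, GRANULE);

      for (size_t i = skip; i < GRANULE; i++)
        {
          if (bytes[i] == ch)
            return (char *) (chunk + i);
          if (bytes[i] == '\0')
            return NULL;
        }

      /* No match and no NUL here, so the string continues into the
         next granule and reading it is safe.  */
      skip = 0;
      chunk += GRANULE;
    }
}
```

The match test comes before the NUL test so that searching for '\0' returns a pointer to the terminator, as strchr requires. The read of the bytes preceding S in the first granule is safe under MTE because they share the granule, and therefore the tag, with the first byte of the string.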
Below is a performance comparison of the strchr benchmark run on
Cortex-A72, Cortex-A53, and Neoverse N1.
| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
| 32 | 0 | 1.91x | 1.10x | 1.33x |
| 32 | 1 | 2.06x | 1.22x | 1.41x |
| 64 | 0 | 1.61x | 1.00x | 1.18x |
| 64 | 2 | 1.69x | 1.08x | 1.15x |
| 128 | 0 | 1.51x | 0.85x | 1.06x |
| 128 | 3 | 1.57x | 0.90x | 1.15x |
| 256 | 0 | 1.37x | 0.84x | 1.09x |
| 256 | 4 | 1.41x | 0.83x | 1.15x |
| 512 | 0 | 1.18x | 0.80x | 1.09x |
| 512 | 5 | 1.19x | 0.82x | 1.14x |
| 1024 | 0 | 1.15x | 0.78x | 1.09x |
| 1024 | 6 | 1.05x | 0.79x | 1.09x |
| 2048 | 0 | 1.15x | 0.76x | 1.08x |
| 2048 | 7 | 1.13x | 0.77x | 1.08x |
| 64 | 1 | 1.28x | 1.08x | 1.33x |
| 64 | 1 | 1.28x | 1.08x | 1.31x |
| 64 | 2 | 1.28x | 1.08x | 1.31x |
| 64 | 2 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.31x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.31x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 1 | 0 | 1.31x | 1.64x | 1.53x |
| 1 | 0 | 1.32x | 1.67x | 1.53x |
| 2 | 0 | 1.32x | 1.63x | 1.52x |
| 2 | 0 | 1.32x | 1.69x | 1.52x |
| 3 | 0 | 1.32x | 1.67x | 1.51x |
| 3 | 0 | 1.32x | 1.66x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.52x |
| 5 | 0 | 1.32x | 1.69x | 1.26x |
| 5 | 0 | 1.32x | 1.69x | 1.26x |
| 6 | 0 | 1.32x | 1.69x | 1.26x |
| 6 | 0 | 1.32x | 1.68x | 1.51x |
| 7 | 0 | 1.32x | 1.63x | 1.54x |
| 7 | 0 | 1.32x | 1.63x | 1.52x |
| 8 | 0 | 1.32x | 1.69x | 1.53x |
| 8 | 0 | 1.32x | 1.65x | 1.53x |
| 9 | 0 | 1.32x | 1.63x | 1.54x |
| 9 | 0 | 1.32x | 1.68x | 1.52x |
| 10 | 0 | 1.32x | 1.63x | 1.52x |
| 10 | 0 | 1.32x | 1.69x | 1.51x |
| 11 | 0 | 1.32x | 1.64x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.52x |
| 12 | 0 | 1.32x | 1.64x | 1.52x |
| 12 | 0 | 1.32x | 1.68x | 1.54x |
| 13 | 0 | 1.32x | 1.63x | 1.53x |
| 13 | 0 | 1.32x | 1.67x | 1.52x |
| 14 | 0 | 1.32x | 1.65x | 1.53x |
| 14 | 0 | 1.32x | 1.63x | 1.52x |
| 15 | 0 | 1.32x | 1.67x | 1.52x |
| 15 | 0 | 1.32x | 1.65x | 1.26x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.08x | 1.00x | 1.03x |
| 19 | 0 | 1.08x | 1.00x | 1.03x |
| 19 | 0 | 1.08x | 1.00x | 1.03x |
| 20 | 0 | 1.08x | 1.00x | 1.03x |
| 20 | 0 | 1.09x | 1.00x | 1.03x |
| 21 | 0 | 1.08x | 1.00x | 1.03x |
| 21 | 0 | 1.08x | 1.00x | 1.08x |
| 22 | 0 | 1.09x | 1.00x | 1.09x |
| 22 | 0 | 1.08x | 1.00x | 1.09x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.09x |
| 25 | 0 | 1.08x | 1.00x | 1.10x |
| 25 | 0 | 1.08x | 1.00x | 1.09x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 27 | 0 | 1.09x | 1.00x | 1.08x |
| 27 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 29 | 0 | 1.08x | 1.00x | 1.09x |
| 29 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 31 | 0 | 1.09x | 1.00x | 1.08x |
| 31 | 0 | 1.08x | 1.00x | 1.08x |
| 32 | 0 | 1.27x | 1.10x | 1.25x |
| 32 | 1 | 1.38x | 1.21x | 1.38x |
| 64 | 0 | 1.17x | 1.00x | 1.20x |
| 64 | 2 | 1.28x | 1.08x | 1.33x |
| 128 | 0 | 1.17x | 0.85x | 1.17x |
| 128 | 3 | 1.23x | 0.90x | 1.29x |
| 256 | 0 | 1.17x | 0.84x | 1.15x |
| 256 | 4 | 1.21x | 0.83x | 1.21x |
| 512 | 0 | 1.16x | 0.80x | 1.08x |
| 512 | 5 | 1.19x | 0.82x | 1.14x |
| 1024 | 0 | 1.15x | 0.78x | 1.09x |
| 1024 | 6 | 1.05x | 0.79x | 1.09x |
| 2048 | 0 | 1.15x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.08x |
| 64 | 1 | 1.20x | 1.08x | 1.33x |
| 64 | 1 | 1.28x | 1.08x | 1.33x |
| 64 | 2 | 1.28x | 1.08x | 1.35x |
| 64 | 2 | 1.28x | 1.08x | 1.35x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 4 | 1.28x | 1.08x | 1.35x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.35x |
| 64 | 5 | 1.28x | 1.08x | 1.35x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.35x |
| 64 | 7 | 1.28x | 1.08x | 1.35x |
| 0 | 0 | 1.32x | 1.68x | 1.52x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 1 | 0 | 1.32x | 1.69x | 1.52x |
| 1 | 0 | 1.32x | 1.68x | 1.52x |
| 2 | 0 | 1.32x | 1.69x | 1.51x |
| 2 | 0 | 1.32x | 1.69x | 1.52x |
| 3 | 0 | 1.32x | 1.67x | 1.51x |
| 3 | 0 | 1.32x | 1.69x | 1.52x |
| 4 | 0 | 1.32x | 1.67x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.56x |
| 5 | 0 | 1.32x | 1.69x | 1.52x |
| 5 | 0 | 1.32x | 1.69x | 1.52x |
| 6 | 0 | 1.32x | 1.69x | 1.51x |
| 6 | 0 | 1.32x | 1.69x | 1.52x |
| 7 | 0 | 1.32x | 1.63x | 1.52x |
| 7 | 0 | 1.32x | 1.63x | 1.53x |
| 8 | 0 | 1.32x | 1.65x | 1.52x |
| 8 | 0 | 1.32x | 1.63x | 1.52x |
| 9 | 0 | 1.32x | 1.63x | 1.51x |
| 9 | 0 | 1.32x | 1.64x | 1.52x |
| 10 | 0 | 1.32x | 1.63x | 1.52x |
| 10 | 0 | 1.32x | 1.65x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.51x |
| 12 | 0 | 1.32x | 1.63x | 1.53x |
| 12 | 0 | 1.32x | 1.63x | 1.51x |
| 13 | 0 | 1.32x | 1.63x | 1.52x |
| 13 | 0 | 1.32x | 1.65x | 1.52x |
| 14 | 0 | 1.32x | 1.66x | 1.53x |
| 14 | 0 | 1.32x | 1.64x | 1.26x |
| 15 | 0 | 1.32x | 1.68x | 1.26x |
| 15 | 0 | 1.32x | 1.69x | 1.26x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 16 | 0 | 1.08x | 1.00x | 1.05x |
| 17 | 0 | 1.08x | 1.00x | 1.08x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.09x | 1.00x | 1.08x |
| 18 | 0 | 1.08x | 1.00x | 1.08x |
| 19 | 0 | 1.08x | 1.00x | 1.08x |
| 19 | 0 | 1.08x | 1.00x | 1.09x |
| 20 | 0 | 1.09x | 1.00x | 1.08x |
| 20 | 0 | 1.08x | 1.00x | 1.08x |
| 21 | 0 | 1.08x | 1.00x | 1.09x |
| 21 | 0 | 1.08x | 1.00x | 1.08x |
| 22 | 0 | 1.09x | 1.00x | 1.08x |
| 22 | 0 | 1.08x | 1.00x | 1.09x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 25 | 0 | 1.08x | 1.00x | 1.08x |
| 25 | 0 | 1.08x | 1.00x | 1.09x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 26 | 0 | 1.08x | 1.00x | 1.09x |
| 27 | 0 | 1.09x | 1.00x | 1.08x |
| 27 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.09x | 1.00x | 1.03x |
| 29 | 0 | 1.08x | 1.00x | 1.03x |
| 29 | 0 | 1.08x | 1.00x | 1.03x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 31 | 0 | 1.09x | 1.00x | 1.08x |
| 31 | 0 | 1.08x | 1.00x | 1.08x |
This patch passes the glibc tests.
Regards
Andrea
8< --- 8< --- 8<
Introduce an Arm MTE compatible strchr implementation.
Benchmarks on Cortex-A72, Cortex-A53, and Neoverse N1 do not show
performance regressions.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strchr-mte.patch
Type: text/x-diff
Size: 6241 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200603/533d2719/attachment-0001.bin>