[PATCH v2] aarch64: MTE compatible strchr
Andrea Corallo
andrea.corallo@arm.com
Fri Jun 5 15:20:50 GMT 2020
Hi all,
I'd like to submit this patch introducing an Arm MTE compatible strchr
implementation.
Below is a performance comparison (obtained using the glibc benchtests) of the
strchr benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.
| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
| 32 | 0 | 1.91x | 1.10x | 1.33x |
| 32 | 1 | 2.06x | 1.22x | 1.41x |
| 64 | 0 | 1.61x | 1.00x | 1.18x |
| 64 | 2 | 1.69x | 1.08x | 1.15x |
| 128 | 0 | 1.51x | 0.85x | 1.06x |
| 128 | 3 | 1.57x | 0.90x | 1.15x |
| 256 | 0 | 1.37x | 0.84x | 1.09x |
| 256 | 4 | 1.41x | 0.83x | 1.15x |
| 512 | 0 | 1.18x | 0.80x | 1.09x |
| 512 | 5 | 1.19x | 0.82x | 1.14x |
| 1024 | 0 | 1.15x | 0.78x | 1.09x |
| 1024 | 6 | 1.05x | 0.79x | 1.09x |
| 2048 | 0 | 1.15x | 0.76x | 1.08x |
| 2048 | 7 | 1.13x | 0.77x | 1.08x |
| 64 | 1 | 1.28x | 1.08x | 1.33x |
| 64 | 1 | 1.28x | 1.08x | 1.31x |
| 64 | 2 | 1.28x | 1.08x | 1.31x |
| 64 | 2 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.31x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.31x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 1 | 0 | 1.31x | 1.64x | 1.53x |
| 1 | 0 | 1.32x | 1.67x | 1.53x |
| 2 | 0 | 1.32x | 1.63x | 1.52x |
| 2 | 0 | 1.32x | 1.69x | 1.52x |
| 3 | 0 | 1.32x | 1.67x | 1.51x |
| 3 | 0 | 1.32x | 1.66x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.52x |
| 5 | 0 | 1.32x | 1.69x | 1.26x |
| 5 | 0 | 1.32x | 1.69x | 1.26x |
| 6 | 0 | 1.32x | 1.69x | 1.26x |
| 6 | 0 | 1.32x | 1.68x | 1.51x |
| 7 | 0 | 1.32x | 1.63x | 1.54x |
| 7 | 0 | 1.32x | 1.63x | 1.52x |
| 8 | 0 | 1.32x | 1.69x | 1.53x |
| 8 | 0 | 1.32x | 1.65x | 1.53x |
| 9 | 0 | 1.32x | 1.63x | 1.54x |
| 9 | 0 | 1.32x | 1.68x | 1.52x |
| 10 | 0 | 1.32x | 1.63x | 1.52x |
| 10 | 0 | 1.32x | 1.69x | 1.51x |
| 11 | 0 | 1.32x | 1.64x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.52x |
| 12 | 0 | 1.32x | 1.64x | 1.52x |
| 12 | 0 | 1.32x | 1.68x | 1.54x |
| 13 | 0 | 1.32x | 1.63x | 1.53x |
| 13 | 0 | 1.32x | 1.67x | 1.52x |
| 14 | 0 | 1.32x | 1.65x | 1.53x |
| 14 | 0 | 1.32x | 1.63x | 1.52x |
| 15 | 0 | 1.32x | 1.67x | 1.52x |
| 15 | 0 | 1.32x | 1.65x | 1.26x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.08x | 1.00x | 1.03x |
| 19 | 0 | 1.08x | 1.00x | 1.03x |
| 19 | 0 | 1.08x | 1.00x | 1.03x |
| 20 | 0 | 1.08x | 1.00x | 1.03x |
| 20 | 0 | 1.09x | 1.00x | 1.03x |
| 21 | 0 | 1.08x | 1.00x | 1.03x |
| 21 | 0 | 1.08x | 1.00x | 1.08x |
| 22 | 0 | 1.09x | 1.00x | 1.09x |
| 22 | 0 | 1.08x | 1.00x | 1.09x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.09x |
| 25 | 0 | 1.08x | 1.00x | 1.10x |
| 25 | 0 | 1.08x | 1.00x | 1.09x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 27 | 0 | 1.09x | 1.00x | 1.08x |
| 27 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 29 | 0 | 1.08x | 1.00x | 1.09x |
| 29 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 31 | 0 | 1.09x | 1.00x | 1.08x |
| 31 | 0 | 1.08x | 1.00x | 1.08x |
| 32 | 0 | 1.27x | 1.10x | 1.25x |
| 32 | 1 | 1.38x | 1.21x | 1.38x |
| 64 | 0 | 1.17x | 1.00x | 1.20x |
| 64 | 2 | 1.28x | 1.08x | 1.33x |
| 128 | 0 | 1.17x | 0.85x | 1.17x |
| 128 | 3 | 1.23x | 0.90x | 1.29x |
| 256 | 0 | 1.17x | 0.84x | 1.15x |
| 256 | 4 | 1.21x | 0.83x | 1.21x |
| 512 | 0 | 1.16x | 0.80x | 1.08x |
| 512 | 5 | 1.19x | 0.82x | 1.14x |
| 1024 | 0 | 1.15x | 0.78x | 1.09x |
| 1024 | 6 | 1.05x | 0.79x | 1.09x |
| 2048 | 0 | 1.15x | 0.76x | 1.08x |
| 2048 | 7 | 1.14x | 0.77x | 1.08x |
| 64 | 1 | 1.20x | 1.08x | 1.33x |
| 64 | 1 | 1.28x | 1.08x | 1.33x |
| 64 | 2 | 1.28x | 1.08x | 1.35x |
| 64 | 2 | 1.28x | 1.08x | 1.35x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 3 | 1.28x | 1.08x | 1.15x |
| 64 | 4 | 1.28x | 1.08x | 1.35x |
| 64 | 4 | 1.28x | 1.08x | 1.31x |
| 64 | 5 | 1.28x | 1.08x | 1.35x |
| 64 | 5 | 1.28x | 1.08x | 1.35x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 6 | 1.28x | 1.08x | 1.31x |
| 64 | 7 | 1.28x | 1.08x | 1.35x |
| 64 | 7 | 1.28x | 1.08x | 1.35x |
| 0 | 0 | 1.32x | 1.68x | 1.52x |
| 0 | 0 | 1.32x | 1.63x | 1.53x |
| 1 | 0 | 1.32x | 1.69x | 1.52x |
| 1 | 0 | 1.32x | 1.68x | 1.52x |
| 2 | 0 | 1.32x | 1.69x | 1.51x |
| 2 | 0 | 1.32x | 1.69x | 1.52x |
| 3 | 0 | 1.32x | 1.67x | 1.51x |
| 3 | 0 | 1.32x | 1.69x | 1.52x |
| 4 | 0 | 1.32x | 1.67x | 1.52x |
| 4 | 0 | 1.32x | 1.69x | 1.56x |
| 5 | 0 | 1.32x | 1.69x | 1.52x |
| 5 | 0 | 1.32x | 1.69x | 1.52x |
| 6 | 0 | 1.32x | 1.69x | 1.51x |
| 6 | 0 | 1.32x | 1.69x | 1.52x |
| 7 | 0 | 1.32x | 1.63x | 1.52x |
| 7 | 0 | 1.32x | 1.63x | 1.53x |
| 8 | 0 | 1.32x | 1.65x | 1.52x |
| 8 | 0 | 1.32x | 1.63x | 1.52x |
| 9 | 0 | 1.32x | 1.63x | 1.51x |
| 9 | 0 | 1.32x | 1.64x | 1.52x |
| 10 | 0 | 1.32x | 1.63x | 1.52x |
| 10 | 0 | 1.32x | 1.65x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.52x |
| 11 | 0 | 1.32x | 1.63x | 1.51x |
| 12 | 0 | 1.32x | 1.63x | 1.53x |
| 12 | 0 | 1.32x | 1.63x | 1.51x |
| 13 | 0 | 1.32x | 1.63x | 1.52x |
| 13 | 0 | 1.32x | 1.65x | 1.52x |
| 14 | 0 | 1.32x | 1.66x | 1.53x |
| 14 | 0 | 1.32x | 1.64x | 1.26x |
| 15 | 0 | 1.32x | 1.68x | 1.26x |
| 15 | 0 | 1.32x | 1.69x | 1.26x |
| 16 | 0 | 1.08x | 1.00x | 1.03x |
| 16 | 0 | 1.08x | 1.00x | 1.05x |
| 17 | 0 | 1.08x | 1.00x | 1.08x |
| 17 | 0 | 1.09x | 1.00x | 1.03x |
| 18 | 0 | 1.09x | 1.00x | 1.08x |
| 18 | 0 | 1.08x | 1.00x | 1.08x |
| 19 | 0 | 1.08x | 1.00x | 1.08x |
| 19 | 0 | 1.08x | 1.00x | 1.09x |
| 20 | 0 | 1.09x | 1.00x | 1.08x |
| 20 | 0 | 1.08x | 1.00x | 1.08x |
| 21 | 0 | 1.08x | 1.00x | 1.09x |
| 21 | 0 | 1.08x | 1.00x | 1.08x |
| 22 | 0 | 1.09x | 1.00x | 1.08x |
| 22 | 0 | 1.08x | 1.00x | 1.09x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 23 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 24 | 0 | 1.08x | 1.00x | 1.08x |
| 25 | 0 | 1.08x | 1.00x | 1.08x |
| 25 | 0 | 1.08x | 1.00x | 1.09x |
| 26 | 0 | 1.08x | 1.00x | 1.08x |
| 26 | 0 | 1.08x | 1.00x | 1.09x |
| 27 | 0 | 1.09x | 1.00x | 1.08x |
| 27 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.08x | 1.00x | 1.08x |
| 28 | 0 | 1.09x | 1.00x | 1.03x |
| 29 | 0 | 1.08x | 1.00x | 1.03x |
| 29 | 0 | 1.08x | 1.00x | 1.03x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 30 | 0 | 1.08x | 1.00x | 1.08x |
| 31 | 0 | 1.09x | 1.00x | 1.08x |
| 31 | 0 | 1.08x | 1.00x | 1.08x |
This patch passes the glibc tests.
Regards
Andrea
8< --- 8< --- 8<
Introduce an Arm MTE compatible strchr implementation.
The existing implementation assumes that any access to the pages in
which the string resides is safe. This assumption is not true when
MTE is enabled. This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag granule (16-byte
chunks) and improves overall performance.
Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
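For readers without the attachment, the access pattern described in the commit
message can be sketched in portable C. This is only an illustration, not the
code in the patch (which is AArch64 assembly): the name strchr_granule_sketch
and the GRANULE constant are hypothetical. It assumes the string lives in
memory that covers whole 16-byte aligned chunks, which is what MTE tagging
provides at the hardware level; in strict ISO C, reading chunk bytes that lie
outside the string's object is not defined, so treat this as a model of the
idea rather than a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16  /* assumed MTE tag granule size in bytes */

/* Hypothetical sketch: scan one aligned 16-byte chunk at a time, so no
   load ever crosses into a chunk beyond the one holding the terminator. */
char *strchr_granule_sketch(const char *s, int c)
{
    const unsigned char target = (unsigned char)c;
    uintptr_t addr = (uintptr_t)s;

    /* Round down to the start of the chunk that contains s. */
    const unsigned char *chunk =
        (const unsigned char *)(addr & ~(uintptr_t)(GRANULE - 1));

    /* Bytes before s in the first chunk are loaded but never examined. */
    size_t i = (size_t)(addr & (GRANULE - 1));

    for (;;) {
        unsigned char buf[GRANULE];
        memcpy(buf, chunk, GRANULE);  /* one aligned 16-byte load */

        for (; i < GRANULE; i++) {
            if (buf[i] == target)     /* checked first so c == '\0' works */
                return (char *)(chunk + i);
            if (buf[i] == '\0')
                return NULL;          /* end of string: stop in this chunk */
        }
        chunk += GRANULE;
        i = 0;
    }
}
```

The loop stops in the chunk that holds either the match or the NUL
terminator, so the next chunk, which may carry a different tag, is never
touched. A real implementation would compare all 16 bytes at once with vector
instructions instead of the inner byte loop.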
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strchr-mte.patch
Type: text/x-diff
Size: 6241 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200605/b7df1266/attachment-0001.bin>