[PATCH v2] aarch64: MTE compatible strchr

Andrea Corallo andrea.corallo@arm.com
Fri Jun 5 15:20:50 GMT 2020


Hi all,

I'd like to submit this patch introducing an Arm MTE compatible strchr
implementation.

Below is a performance comparison (obtained using the glibc benchtests)
of the strchr benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
|     32 |         0 |           1.91x |           1.10x |          1.33x |
|     32 |         1 |           2.06x |           1.22x |          1.41x |
|     64 |         0 |           1.61x |           1.00x |          1.18x |
|     64 |         2 |           1.69x |           1.08x |          1.15x |
|    128 |         0 |           1.51x |           0.85x |          1.06x |
|    128 |         3 |           1.57x |           0.90x |          1.15x |
|    256 |         0 |           1.37x |           0.84x |          1.09x |
|    256 |         4 |           1.41x |           0.83x |          1.15x |
|    512 |         0 |           1.18x |           0.80x |          1.09x |
|    512 |         5 |           1.19x |           0.82x |          1.14x |
|   1024 |         0 |           1.15x |           0.78x |          1.09x |
|   1024 |         6 |           1.05x |           0.79x |          1.09x |
|   2048 |         0 |           1.15x |           0.76x |          1.08x |
|   2048 |         7 |           1.13x |           0.77x |          1.08x |
|     64 |         1 |           1.28x |           1.08x |          1.33x |
|     64 |         1 |           1.28x |           1.08x |          1.31x |
|     64 |         2 |           1.28x |           1.08x |          1.31x |
|     64 |         2 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.31x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.31x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      1 |         0 |           1.31x |           1.64x |          1.53x |
|      1 |         0 |           1.32x |           1.67x |          1.53x |
|      2 |         0 |           1.32x |           1.63x |          1.52x |
|      2 |         0 |           1.32x |           1.69x |          1.52x |
|      3 |         0 |           1.32x |           1.67x |          1.51x |
|      3 |         0 |           1.32x |           1.66x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.52x |
|      5 |         0 |           1.32x |           1.69x |          1.26x |
|      5 |         0 |           1.32x |           1.69x |          1.26x |
|      6 |         0 |           1.32x |           1.69x |          1.26x |
|      6 |         0 |           1.32x |           1.68x |          1.51x |
|      7 |         0 |           1.32x |           1.63x |          1.54x |
|      7 |         0 |           1.32x |           1.63x |          1.52x |
|      8 |         0 |           1.32x |           1.69x |          1.53x |
|      8 |         0 |           1.32x |           1.65x |          1.53x |
|      9 |         0 |           1.32x |           1.63x |          1.54x |
|      9 |         0 |           1.32x |           1.68x |          1.52x |
|     10 |         0 |           1.32x |           1.63x |          1.52x |
|     10 |         0 |           1.32x |           1.69x |          1.51x |
|     11 |         0 |           1.32x |           1.64x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.52x |
|     12 |         0 |           1.32x |           1.64x |          1.52x |
|     12 |         0 |           1.32x |           1.68x |          1.54x |
|     13 |         0 |           1.32x |           1.63x |          1.53x |
|     13 |         0 |           1.32x |           1.67x |          1.52x |
|     14 |         0 |           1.32x |           1.65x |          1.53x |
|     14 |         0 |           1.32x |           1.63x |          1.52x |
|     15 |         0 |           1.32x |           1.67x |          1.52x |
|     15 |         0 |           1.32x |           1.65x |          1.26x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.08x |           1.00x |          1.03x |
|     19 |         0 |           1.08x |           1.00x |          1.03x |
|     19 |         0 |           1.08x |           1.00x |          1.03x |
|     20 |         0 |           1.08x |           1.00x |          1.03x |
|     20 |         0 |           1.09x |           1.00x |          1.03x |
|     21 |         0 |           1.08x |           1.00x |          1.03x |
|     21 |         0 |           1.08x |           1.00x |          1.08x |
|     22 |         0 |           1.09x |           1.00x |          1.09x |
|     22 |         0 |           1.08x |           1.00x |          1.09x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.09x |
|     25 |         0 |           1.08x |           1.00x |          1.10x |
|     25 |         0 |           1.08x |           1.00x |          1.09x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     27 |         0 |           1.09x |           1.00x |          1.08x |
|     27 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     29 |         0 |           1.08x |           1.00x |          1.09x |
|     29 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     31 |         0 |           1.09x |           1.00x |          1.08x |
|     31 |         0 |           1.08x |           1.00x |          1.08x |
|     32 |         0 |           1.27x |           1.10x |          1.25x |
|     32 |         1 |           1.38x |           1.21x |          1.38x |
|     64 |         0 |           1.17x |           1.00x |          1.20x |
|     64 |         2 |           1.28x |           1.08x |          1.33x |
|    128 |         0 |           1.17x |           0.85x |          1.17x |
|    128 |         3 |           1.23x |           0.90x |          1.29x |
|    256 |         0 |           1.17x |           0.84x |          1.15x |
|    256 |         4 |           1.21x |           0.83x |          1.21x |
|    512 |         0 |           1.16x |           0.80x |          1.08x |
|    512 |         5 |           1.19x |           0.82x |          1.14x |
|   1024 |         0 |           1.15x |           0.78x |          1.09x |
|   1024 |         6 |           1.05x |           0.79x |          1.09x |
|   2048 |         0 |           1.15x |           0.76x |          1.08x |
|   2048 |         7 |           1.14x |           0.77x |          1.08x |
|     64 |         1 |           1.20x |           1.08x |          1.33x |
|     64 |         1 |           1.28x |           1.08x |          1.33x |
|     64 |         2 |           1.28x |           1.08x |          1.35x |
|     64 |         2 |           1.28x |           1.08x |          1.35x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         3 |           1.28x |           1.08x |          1.15x |
|     64 |         4 |           1.28x |           1.08x |          1.35x |
|     64 |         4 |           1.28x |           1.08x |          1.31x |
|     64 |         5 |           1.28x |           1.08x |          1.35x |
|     64 |         5 |           1.28x |           1.08x |          1.35x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         6 |           1.28x |           1.08x |          1.31x |
|     64 |         7 |           1.28x |           1.08x |          1.35x |
|     64 |         7 |           1.28x |           1.08x |          1.35x |
|      0 |         0 |           1.32x |           1.68x |          1.52x |
|      0 |         0 |           1.32x |           1.63x |          1.53x |
|      1 |         0 |           1.32x |           1.69x |          1.52x |
|      1 |         0 |           1.32x |           1.68x |          1.52x |
|      2 |         0 |           1.32x |           1.69x |          1.51x |
|      2 |         0 |           1.32x |           1.69x |          1.52x |
|      3 |         0 |           1.32x |           1.67x |          1.51x |
|      3 |         0 |           1.32x |           1.69x |          1.52x |
|      4 |         0 |           1.32x |           1.67x |          1.52x |
|      4 |         0 |           1.32x |           1.69x |          1.56x |
|      5 |         0 |           1.32x |           1.69x |          1.52x |
|      5 |         0 |           1.32x |           1.69x |          1.52x |
|      6 |         0 |           1.32x |           1.69x |          1.51x |
|      6 |         0 |           1.32x |           1.69x |          1.52x |
|      7 |         0 |           1.32x |           1.63x |          1.52x |
|      7 |         0 |           1.32x |           1.63x |          1.53x |
|      8 |         0 |           1.32x |           1.65x |          1.52x |
|      8 |         0 |           1.32x |           1.63x |          1.52x |
|      9 |         0 |           1.32x |           1.63x |          1.51x |
|      9 |         0 |           1.32x |           1.64x |          1.52x |
|     10 |         0 |           1.32x |           1.63x |          1.52x |
|     10 |         0 |           1.32x |           1.65x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.52x |
|     11 |         0 |           1.32x |           1.63x |          1.51x |
|     12 |         0 |           1.32x |           1.63x |          1.53x |
|     12 |         0 |           1.32x |           1.63x |          1.51x |
|     13 |         0 |           1.32x |           1.63x |          1.52x |
|     13 |         0 |           1.32x |           1.65x |          1.52x |
|     14 |         0 |           1.32x |           1.66x |          1.53x |
|     14 |         0 |           1.32x |           1.64x |          1.26x |
|     15 |         0 |           1.32x |           1.68x |          1.26x |
|     15 |         0 |           1.32x |           1.69x |          1.26x |
|     16 |         0 |           1.08x |           1.00x |          1.03x |
|     16 |         0 |           1.08x |           1.00x |          1.05x |
|     17 |         0 |           1.08x |           1.00x |          1.08x |
|     17 |         0 |           1.09x |           1.00x |          1.03x |
|     18 |         0 |           1.09x |           1.00x |          1.08x |
|     18 |         0 |           1.08x |           1.00x |          1.08x |
|     19 |         0 |           1.08x |           1.00x |          1.08x |
|     19 |         0 |           1.08x |           1.00x |          1.09x |
|     20 |         0 |           1.09x |           1.00x |          1.08x |
|     20 |         0 |           1.08x |           1.00x |          1.08x |
|     21 |         0 |           1.08x |           1.00x |          1.09x |
|     21 |         0 |           1.08x |           1.00x |          1.08x |
|     22 |         0 |           1.09x |           1.00x |          1.08x |
|     22 |         0 |           1.08x |           1.00x |          1.09x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     23 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     24 |         0 |           1.08x |           1.00x |          1.08x |
|     25 |         0 |           1.08x |           1.00x |          1.08x |
|     25 |         0 |           1.08x |           1.00x |          1.09x |
|     26 |         0 |           1.08x |           1.00x |          1.08x |
|     26 |         0 |           1.08x |           1.00x |          1.09x |
|     27 |         0 |           1.09x |           1.00x |          1.08x |
|     27 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.08x |           1.00x |          1.08x |
|     28 |         0 |           1.09x |           1.00x |          1.03x |
|     29 |         0 |           1.08x |           1.00x |          1.03x |
|     29 |         0 |           1.08x |           1.00x |          1.03x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     30 |         0 |           1.08x |           1.00x |          1.08x |
|     31 |         0 |           1.09x |           1.00x |          1.08x |
|     31 |         0 |           1.08x |           1.00x |          1.08x |


This patch passes the glibc tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strchr implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption is not true when
MTE is enabled.  This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.
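
As a rough illustration of the granule-bounded access pattern described
above, here is a minimal C sketch.  It is not the code in the attached
patch (which is AArch64 assembly); the function name
strchr_granule_sketch and the GRANULE constant are hypothetical, and the
sketch assumes the string lives in a buffer that covers whole 16-byte
granules, as an MTE-tagged allocation would.  Every load is one aligned
16-byte chunk, so no access crosses into a granule beyond the one
holding the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u  /* MTE tag granule size in bytes */

/* Hypothetical helper for illustration only, not the glibc routine.
   Assumes the buffer holding S spans whole 16-byte granules.  */
static char *strchr_granule_sketch(const char *s, int c)
{
    assert(s != NULL);

    const unsigned char ch = (unsigned char)c;

    /* Round the start address down to a 16-byte boundary; SKIP is the
       number of leading bytes in the first chunk that precede S.  */
    size_t skip = (uintptr_t)s % GRANULE;
    const unsigned char *chunk = (const unsigned char *)s - skip;

    for (;;) {
        /* One aligned 16-byte load: never touches another granule.  */
        unsigned char buf[GRANULE];
        memcpy(buf, chunk, GRANULE);

        for (size_t i = skip; i < GRANULE; i++) {
            /* Test the match first so that searching for '\0'
               returns a pointer to the terminator.  */
            if (buf[i] == ch)
                return (char *)(chunk + i);
            if (buf[i] == '\0')
                return NULL;
        }

        /* The terminator was not in this granule, so the next granule
           still belongs to the string and is safe to load.  */
        chunk += GRANULE;
        skip = 0;
    }
}
```

A real implementation would compare all 16 bytes at once with vector
instructions rather than the inner byte loop; the sketch only shows why
loads stay within tagged bounds.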

Benchmarked on Cortex-A72, Cortex-A53 and Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strchr-mte.patch
Type: text/x-diff
Size: 6241 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200605/b7df1266/attachment-0001.bin>