[PATCH v2] aarch64: MTE compatible strlen

Andrea Corallo andrea.corallo@arm.com
Fri Jun 5 15:22:26 GMT 2020


Hi all,

I'd like to submit this patch introducing an Arm MTE compatible strlen
implementation.

Below is a performance comparison (obtained with the glibc benchtests)
of the strlen benchmark run on Cortex-A72, Cortex-A53 and Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
|      1 |         1 |           1.00x |           0.96x |          1.13x |
|      1 |         0 |           2.15x |           0.96x |          1.00x |
|      2 |         2 |           1.16x |           0.95x |          1.09x |
|      2 |         0 |           1.17x |           0.93x |          1.00x |
|      3 |         3 |           1.30x |           0.95x |          1.09x |
|      3 |         0 |           1.32x |           0.96x |          1.00x |
|      4 |         4 |           1.14x |           0.87x |          0.99x |
|      4 |         0 |           1.14x |           0.96x |          1.00x |
|      5 |         5 |           1.15x |           0.89x |          1.09x |
|      5 |         0 |           1.19x |           0.96x |          1.00x |
|      6 |         6 |           1.14x |           0.96x |          1.39x |
|      6 |         0 |           1.14x |           0.95x |          1.00x |
|      7 |         7 |           1.03x |           0.90x |          1.09x |
|      7 |         0 |           1.14x |           0.95x |          1.27x |
|      4 |         0 |           1.15x |           0.87x |          1.00x |
|      4 |         7 |           1.15x |           0.96x |          1.10x |
|      4 |         2 |           1.27x |           0.95x |          1.39x |
|      2 |         2 |           1.14x |           0.96x |          1.09x |
|      8 |         0 |           1.15x |           0.96x |          1.00x |
|      8 |         7 |           1.14x |           0.96x |          1.09x |
|      8 |         3 |           1.17x |           0.96x |          1.39x |
|      5 |         3 |           1.14x |           0.96x |          1.39x |
|     16 |         0 |           1.15x |           0.83x |          1.48x |
|     16 |         7 |           1.14x |           0.80x |          1.43x |
|     16 |         4 |           1.15x |           0.83x |          1.48x |
|     10 |         4 |           1.15x |           0.96x |          1.27x |
|     32 |         0 |           1.04x |           0.88x |          1.16x |
|     32 |         7 |           1.02x |           0.84x |          1.19x |
|     32 |         5 |           1.04x |           0.84x |          1.23x |
|     21 |         5 |           1.14x |           0.83x |          1.60x |
|     64 |         0 |           1.17x |           0.80x |          1.75x |
|     64 |         7 |           1.17x |           0.77x |          1.83x |
|     64 |         6 |           1.17x |           0.77x |          1.57x |
|     42 |         6 |           1.00x |           0.80x |          1.42x |
|    128 |         0 |           0.96x |           0.68x |          1.80x |
|    128 |         7 |           0.96x |           0.66x |          1.85x |
|    128 |         7 |           0.96x |           0.67x |          1.86x |
|     85 |         7 |           1.05x |           0.75x |          1.87x |
|    256 |         0 |           0.98x |           0.69x |          1.88x |
|    256 |         7 |           0.98x |           0.68x |          1.92x |
|    256 |         8 |           0.99x |           0.69x |          1.88x |
|    170 |         8 |           0.96x |           0.72x |          1.86x |
|    512 |         0 |           0.99x |           0.65x |          1.90x |
|    512 |         7 |           0.98x |           0.65x |          1.92x |
|    512 |         9 |           0.99x |           0.65x |          1.92x |
|    341 |         9 |           0.98x |           0.68x |          1.99x |
|   1024 |         0 |           0.99x |           0.63x |          1.90x |
|   1024 |         7 |           0.99x |           0.62x |          1.92x |
|   1024 |        10 |           0.99x |           0.62x |          1.92x |
|    682 |        10 |           0.99x |           0.64x |          1.96x |
|   2048 |         0 |           0.99x |           0.61x |          1.92x |
|   2048 |         7 |           1.01x |           0.61x |          1.93x |
|   2048 |        11 |           1.00x |           0.61x |          1.95x |
|   1365 |        11 |           1.00x |           0.62x |          1.94x |
|   4096 |         0 |           1.00x |           0.61x |          1.93x |
|   4096 |         7 |           1.00x |           0.61x |          1.94x |
|   4096 |        12 |           1.00x |           0.61x |          1.95x |
|   2730 |        12 |           1.00x |           0.61x |          1.94x |

This patch passes the glibc tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strlen implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption does not hold when
MTE is enabled.  This patch updates the algorithm so that every access
stays within a single MTE tag granule (an aligned 16-byte chunk) that
the string occupies, and improves overall performance.
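
For readers unfamiliar with the constraint, here is a minimal portable C
sketch of the chunking idea only.  It is not the code in the attached
patch, which is AArch64 assembly.  The names CHUNK, first_nul and
strlen_chunked are made up for illustration, and the memcpy stands in
for one aligned 16-byte vector load.  A new chunk is loaded only when the
previous one held no NUL, so every load touches a granule that the
string (or its terminator) occupies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16  /* MTE tag granule size in bytes */

/* Hypothetical helper: index of the first NUL at or after `from` in the
   aligned chunk, or CHUNK if there is none.  */
static size_t first_nul(const unsigned char *chunk, size_t from)
{
    unsigned char buf[CHUNK];
    memcpy(buf, chunk, CHUNK);  /* stands in for one aligned 16-byte load */
    for (size_t i = from; i < CHUNK; i++)
        if (buf[i] == 0)
            return i;
    return CHUNK;
}

/* Illustrative strlen that only ever reads whole aligned 16-byte chunks.  */
size_t strlen_chunked(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    size_t offset = addr % CHUNK;  /* position of s inside its chunk */
    const unsigned char *chunk = (const unsigned char *)(addr - offset);

    /* First chunk: ignore the bytes that precede the start of the string.  */
    size_t i = first_nul(chunk, offset);

    /* Advance one chunk at a time; the next chunk is read only because
       the string is known to continue into it.  */
    while (i == CHUNK) {
        chunk += CHUNK;
        i = first_nul(chunk, 0);
    }
    return (size_t)(chunk + i - (const unsigned char *)s);
}
```

Note that the whole-chunk read can touch bytes before the start of the
string and after its terminator within the same granule.  That is fine
at the hardware level under MTE, but it is outside what strictly
portable C guarantees, so try this sketch only on buffers that are
16-byte aligned and padded to a multiple of 16 bytes.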

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strlen.patch
Type: text/x-diff
Size: 8030 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200605/34811f9e/attachment.bin>

