[PATCH] aarch64: MTE compatible strlen

Andrea Corallo andrea.corallo@arm.com
Wed Jun 3 09:53:11 GMT 2020


Hi all,

I'd like to submit this patch introducing an Arm MTE compatible strlen
implementation.
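
For readers unfamiliar with the constraint, here is a minimal C sketch of the
general idea behind an MTE-safe strlen. It is not the code in the attached
patch, which is AArch64 assembly. It assumes the 16-byte MTE tag granule, and
the names granule_strlen and has_zero_byte are hypothetical. The point it
illustrates is that every wide load stays inside one 16-byte aligned granule,
so no load touches a granule beyond the one holding the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u  /* assumed MTE tag granule size in bytes */

/* Nonzero if any byte of x is zero (classic bit trick). */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical illustration only, not the implementation in the patch. */
size_t granule_strlen(const char *s)
{
    const char *p = s;

    /* Head: go byte by byte until p is 16-byte aligned. */
    while (((uintptr_t)p & (GRANULE - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Body: inspect one aligned granule per iteration.  The load may read
       bytes after the terminator, but only within the same granule.  Strict
       ISO C does not permit that, so treat this as a model of the memory
       access pattern rather than portable code. */
    for (;;) {
        uint64_t lo, hi;
        memcpy(&lo, p, 8);
        memcpy(&hi, p + 8, 8);
        if (has_zero_byte(lo) || has_zero_byte(hi))
            break;
        p += GRANULE;
    }

    /* Tail: locate the terminator inside the final granule. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The sketch trades speed for clarity: the unaligned head is scanned bytewise
rather than by loading the enclosing aligned granule and masking off the
leading bytes.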

Below is a performance comparison of the strlen benchmark run on
Cortex-A72, Cortex-A53, and Neoverse N1.

| length | alignment | perf-uplift A72 | perf-uplift A53 | perf-uplift N1 |
|--------+-----------+-----------------+-----------------+----------------|
|      1 |         1 |           1.00x |           0.96x |      1.13x |
|      1 |         0 |           2.15x |           0.96x |      1.00x |
|      2 |         2 |           1.16x |           0.95x |      1.09x |
|      2 |         0 |           1.17x |           0.93x |      1.00x |
|      3 |         3 |           1.30x |           0.95x |      1.09x |
|      3 |         0 |           1.32x |           0.96x |      1.00x |
|      4 |         4 |           1.14x |           0.87x |      0.99x |
|      4 |         0 |           1.14x |           0.96x |      1.00x |
|      5 |         5 |           1.15x |           0.89x |      1.09x |
|      5 |         0 |           1.19x |           0.96x |      1.00x |
|      6 |         6 |           1.14x |           0.96x |      1.39x |
|      6 |         0 |           1.14x |           0.95x |      1.00x |
|      7 |         7 |           1.03x |           0.90x |      1.09x |
|      7 |         0 |           1.14x |           0.95x |      1.27x |
|      4 |         0 |           1.15x |           0.87x |      1.00x |
|      4 |         7 |           1.15x |           0.96x |      1.10x |
|      4 |         2 |           1.27x |           0.95x |      1.39x |
|      2 |         2 |           1.14x |           0.96x |      1.09x |
|      8 |         0 |           1.15x |           0.96x |      1.00x |
|      8 |         7 |           1.14x |           0.96x |      1.09x |
|      8 |         3 |           1.17x |           0.96x |      1.39x |
|      5 |         3 |           1.14x |           0.96x |      1.39x |
|     16 |         0 |           1.15x |           0.83x |      1.48x |
|     16 |         7 |           1.14x |           0.80x |      1.43x |
|     16 |         4 |           1.15x |           0.83x |      1.48x |
|     10 |         4 |           1.15x |           0.96x |      1.27x |
|     32 |         0 |           1.04x |           0.88x |      1.16x |
|     32 |         7 |           1.02x |           0.84x |      1.19x |
|     32 |         5 |           1.04x |           0.84x |      1.23x |
|     21 |         5 |           1.14x |           0.83x |      1.60x |
|     64 |         0 |           1.17x |           0.80x |      1.75x |
|     64 |         7 |           1.17x |           0.77x |      1.83x |
|     64 |         6 |           1.17x |           0.77x |      1.57x |
|     42 |         6 |           1.00x |           0.80x |      1.42x |
|    128 |         0 |           0.96x |           0.68x |      1.80x |
|    128 |         7 |           0.96x |           0.66x |      1.85x |
|    128 |         7 |           0.96x |           0.67x |      1.86x |
|     85 |         7 |           1.05x |           0.75x |      1.87x |
|    256 |         0 |           0.98x |           0.69x |      1.88x |
|    256 |         7 |           0.98x |           0.68x |      1.92x |
|    256 |         8 |           0.99x |           0.69x |      1.88x |
|    170 |         8 |           0.96x |           0.72x |      1.86x |
|    512 |         0 |           0.99x |           0.65x |      1.90x |
|    512 |         7 |           0.98x |           0.65x |      1.92x |
|    512 |         9 |           0.99x |           0.65x |      1.92x |
|    341 |         9 |           0.98x |           0.68x |      1.99x |
|   1024 |         0 |           0.99x |           0.63x |      1.90x |
|   1024 |         7 |           0.99x |           0.62x |      1.92x |
|   1024 |        10 |           0.99x |           0.62x |      1.92x |
|    682 |        10 |           0.99x |           0.64x |      1.96x |
|   2048 |         0 |           0.99x |           0.61x |      1.92x |
|   2048 |         7 |           1.01x |           0.61x |      1.93x |
|   2048 |        11 |           1.00x |           0.61x |      1.95x |
|   1365 |        11 |           1.00x |           0.62x |      1.94x |
|   4096 |         0 |           1.00x |           0.61x |      1.93x |
|   4096 |         7 |           1.00x |           0.61x |      1.94x |
|   4096 |        12 |           1.00x |           0.61x |      1.95x |
|   2730 |        12 |           1.00x |           0.61x |      1.94x |

This patch passes the glibc tests.

Regards

  Andrea

8< --- 8< --- 8<
Introduce an Arm MTE compatible strlen implementation.


Benchmarking on Cortex-A72, Cortex-A53, and Neoverse N1 does not show
performance regressions.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: strlen.patch
Type: text/x-diff
Size: 8030 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200603/63286801/attachment.bin>

