[PATCH] aarch64: MTE compatible strcmp

Alex Butler Alex.Butler@arm.com
Tue Jun 16 12:42:38 GMT 2020


This patch adds an MTE-compatible implementation of strcmp.

The table below shows the benchmark uplift on the Cortex-A72, Cortex-A53, and
Neoverse N1 (values above 1.00x are speedups).

| length | align1 | align2 | uplift A72 | uplift A53 | uplift N1 |
|      1 |      1 |      1 |      0.87x |      1.03x |     1.06x |
|      1 |      1 |      1 |      0.95x |      1.03x |     0.90x |
|      1 |      1 |      1 |      0.95x |      1.03x |     0.90x |
|      2 |      2 |      2 |      0.95x |      1.03x |     1.08x |
|      2 |      2 |      2 |      0.95x |      1.08x |     1.08x |
|      2 |      2 |      2 |      0.83x |      1.03x |     1.08x |
|      3 |      3 |      3 |      0.95x |      1.03x |     1.17x |
|      3 |      3 |      3 |      0.95x |      1.03x |     1.08x |
|      3 |      3 |      3 |      0.95x |      1.04x |     1.08x |
|      4 |      4 |      4 |      0.61x |      1.10x |     0.95x |
|      4 |      4 |      4 |      0.95x |      1.03x |     1.05x |
|      4 |      4 |      4 |      0.95x |      1.02x |     1.08x |
|      5 |      5 |      5 |      0.59x |      1.09x |     0.95x |
|      5 |      5 |      5 |      0.59x |      1.10x |     0.95x |
|      5 |      5 |      5 |      0.59x |      1.10x |     0.95x |
|      6 |      6 |      6 |      0.62x |      1.08x |     0.95x |
|      6 |      6 |      6 |      0.62x |      1.10x |     0.78x |
|      6 |      6 |      6 |      0.62x |      1.10x |     0.95x |
|      7 |      7 |      7 |      0.62x |      1.09x |     0.95x |
|      7 |      7 |      7 |      0.62x |      1.10x |     0.95x |
|      7 |      7 |      7 |      0.62x |      1.11x |     1.01x |
|      8 |      8 |      8 |      1.68x |      0.95x |     0.96x |
|      8 |      8 |      8 |      0.93x |      0.96x |     1.03x |
|      8 |      8 |      8 |      0.95x |      0.92x |     1.03x |
|      9 |      9 |      9 |      0.62x |      1.10x |     0.96x |
|      9 |      9 |      9 |      0.62x |      1.10x |     0.95x |
|      9 |      9 |      9 |      0.60x |      1.10x |     0.98x |
|     10 |     10 |     10 |      0.59x |      1.10x |     0.96x |
|     10 |     10 |     10 |      0.59x |      1.10x |     0.96x |
|     10 |     10 |     10 |      0.59x |      1.09x |     0.96x |
|     11 |     11 |     11 |      0.59x |      1.10x |     0.96x |
|     11 |     11 |     11 |      0.59x |      1.10x |     0.96x |
|     11 |     11 |     11 |      0.59x |      1.10x |     0.96x |
|     12 |     12 |     12 |      0.98x |      1.06x |     1.00x |
|     12 |     12 |     12 |      0.62x |      1.10x |     0.80x |
|     12 |     12 |     12 |      0.61x |      1.10x |     0.96x |
|     13 |     13 |     13 |      1.00x |      1.05x |     1.00x |
|     13 |     13 |     13 |      1.00x |      1.07x |     1.00x |
|     13 |     13 |     13 |      1.00x |      1.05x |     1.00x |
|     14 |     14 |     14 |      1.00x |      1.07x |     1.00x |
|     14 |     14 |     14 |      1.01x |      1.05x |     1.00x |
|     14 |     14 |     14 |      1.00x |      1.07x |     1.00x |
|     15 |     15 |     15 |      0.99x |      1.05x |     1.12x |
|     15 |     15 |     15 |      1.00x |      1.07x |     1.12x |
|     15 |     15 |     15 |      1.00x |      1.05x |     1.12x |
|     16 |     16 |     16 |      1.00x |      1.03x |     1.01x |
|     16 |     16 |     16 |      1.76x |      1.00x |     0.97x |
|     16 |     16 |     16 |      1.86x |      1.00x |     0.97x |
|     17 |     17 |     17 |      1.00x |      1.06x |     1.00x |
|     17 |     17 |     17 |      1.00x |      1.06x |     1.01x |
|     17 |     17 |     17 |      1.00x |      1.05x |     1.00x |
|     18 |     18 |     18 |      1.00x |      1.07x |     1.00x |
|     18 |     18 |     18 |      1.01x |      1.05x |     1.00x |
|     18 |     18 |     18 |      1.00x |      1.06x |     1.00x |
|     19 |     19 |     19 |      1.00x |      1.05x |     1.00x |
|     19 |     19 |     19 |      1.01x |      1.06x |     1.00x |
|     19 |     19 |     19 |      1.00x |      1.06x |     1.00x |
|     20 |     20 |     20 |      1.03x |      1.07x |     1.04x |
|     20 |     20 |     20 |      1.00x |      1.07x |     1.00x |
|     20 |     20 |     20 |      0.99x |      1.06x |     1.00x |
|     21 |     21 |     21 |      1.03x |      1.07x |     1.04x |
|     21 |     21 |     21 |      1.03x |      1.07x |     1.04x |
|     21 |     21 |     21 |      1.05x |      1.07x |     1.04x |
|     22 |     22 |     22 |      1.03x |      1.08x |     1.04x |
|     22 |     22 |     22 |      1.03x |      1.07x |     1.04x |
|     22 |     22 |     22 |      1.03x |      1.07x |     1.04x |
|     23 |     23 |     23 |      1.02x |      1.07x |     1.03x |
|     23 |     23 |     23 |      1.03x |      1.07x |     1.04x |
|     23 |     23 |     23 |      1.03x |      1.07x |     1.04x |
|     24 |     24 |     24 |      1.04x |      1.04x |     1.05x |
|     24 |     24 |     24 |      1.00x |      1.03x |     0.82x |
|     24 |     24 |     24 |      1.01x |      0.98x |     1.01x |
|     25 |     25 |     25 |      1.03x |      1.07x |     1.04x |
|     25 |     25 |     25 |      1.03x |      1.07x |     1.04x |
|     25 |     25 |     25 |      1.03x |      1.07x |     1.04x |
|     26 |     26 |     26 |      1.03x |      1.07x |     1.04x |
|     26 |     26 |     26 |      1.03x |      1.07x |     1.04x |
|     26 |     26 |     26 |      1.03x |      1.07x |     1.04x |
|     27 |     27 |     27 |      1.03x |      1.07x |     1.04x |
|     27 |     27 |     27 |      1.03x |      1.07x |     1.04x |
|     27 |     27 |     27 |      1.03x |      1.08x |     1.04x |
|     28 |     28 |     28 |      1.05x |      1.09x |     1.06x |
|     28 |     28 |     28 |      1.03x |      1.08x |     1.04x |
|     28 |     28 |     28 |      1.02x |      1.07x |     1.04x |
|     29 |     29 |     29 |      1.06x |      1.09x |     1.09x |
|     29 |     29 |     29 |      1.06x |      1.09x |     1.06x |
|     29 |     29 |     29 |      1.06x |      1.08x |     1.06x |
|     30 |     30 |     30 |      1.06x |      1.09x |     1.06x |
|     30 |     30 |     30 |      1.06x |      1.09x |     1.05x |
|     30 |     30 |     30 |      1.07x |      1.09x |     1.06x |
|     31 |     31 |     31 |      1.06x |      1.09x |     1.06x |
|     31 |     31 |     31 |      1.06x |      1.09x |     1.06x |
|     31 |     31 |     31 |      1.06x |      1.09x |     1.05x |
|      4 |      0 |      0 |      0.94x |      0.96x |     1.01x |
|      4 |      0 |      0 |      0.94x |      0.96x |     1.00x |
|      4 |      0 |      0 |      0.94x |      0.95x |     0.90x |
|      4 |      0 |      0 |      0.94x |      0.96x |     0.90x |
|      4 |      0 |      0 |      0.94x |      0.96x |     0.91x |
|      4 |      0 |      0 |      0.94x |      0.96x |     0.91x |
|      4 |      0 |      1 |      0.92x |      0.93x |     0.73x |
|      4 |      1 |      2 |      0.87x |      0.98x |     0.84x |
|      8 |      0 |      0 |      1.00x |      1.00x |     0.76x |
|      8 |      0 |      0 |      1.00x |      0.96x |     0.75x |
|      8 |      0 |      0 |      0.92x |      0.90x |     1.01x |
|      8 |      0 |      0 |      0.94x |      0.96x |     1.00x |
|      8 |      0 |      0 |      0.94x |      0.97x |     1.01x |
|      8 |      0 |      0 |      0.94x |      0.90x |     0.99x |
|      8 |      0 |      2 |      0.86x |      0.95x |     0.78x |
|      8 |      2 |      3 |      0.99x |      0.96x |     1.22x |
|     16 |      0 |      0 |      1.01x |      1.02x |     1.01x |
|     16 |      0 |      0 |      1.00x |      0.97x |     1.01x |
|     16 |      0 |      0 |      1.00x |      1.00x |     0.76x |
|     16 |      0 |      0 |      1.00x |      1.00x |     0.97x |
|     16 |      0 |      0 |      1.00x |      1.00x |     0.97x |
|     16 |      0 |      0 |      1.00x |      1.00x |     0.97x |
|     16 |      0 |      3 |      0.86x |      1.00x |     0.88x |
|     16 |      3 |      4 |      1.00x |      0.93x |     1.10x |
|     32 |      0 |      0 |      1.07x |      1.04x |     1.08x |
|     32 |      0 |      0 |      1.08x |      1.04x |     1.08x |
|     32 |      0 |      0 |      1.04x |      1.05x |     1.05x |
|     32 |      0 |      0 |      1.04x |      0.96x |     1.05x |
|     32 |      0 |      0 |      1.04x |      0.96x |     1.05x |
|     32 |      0 |      0 |      1.04x |      0.98x |     1.05x |
|     32 |      0 |      4 |      0.91x |      1.03x |     0.93x |
|     32 |      4 |      5 |      0.94x |      1.00x |     1.00x |
|     64 |      0 |      0 |      1.12x |      1.03x |     1.16x |
|     64 |      0 |      0 |      1.11x |      1.03x |     1.16x |
|     64 |      0 |      0 |      1.11x |      1.04x |     1.15x |
|     64 |      0 |      0 |      1.11x |      1.03x |     1.15x |
|     64 |      0 |      0 |      1.11x |      1.03x |     1.02x |
|     64 |      0 |      0 |      1.11x |      1.03x |     1.02x |
|     64 |      0 |      5 |      1.05x |      1.19x |     1.15x |
|     64 |      5 |      6 |      1.04x |      1.14x |     1.12x |
|    128 |      0 |      0 |      1.11x |      0.99x |     1.18x |
|    128 |      0 |      0 |      1.11x |      0.99x |     1.18x |
|    128 |      0 |      0 |      1.09x |      0.99x |     1.18x |
|    128 |      0 |      0 |      1.09x |      0.99x |     1.18x |
|    128 |      0 |      0 |      1.09x |      0.99x |     1.18x |
|    128 |      0 |      0 |      1.09x |      0.99x |     1.18x |
|    128 |      0 |      6 |      1.15x |      1.04x |     1.26x |
|    128 |      6 |      7 |      1.22x |      1.05x |     1.27x |
|    256 |      0 |      0 |      1.13x |      1.00x |     1.19x |
|    256 |      0 |      0 |      1.13x |      1.00x |     1.19x |
|    256 |      0 |      0 |      1.12x |      0.99x |     1.19x |
|    256 |      0 |      0 |      1.12x |      1.00x |     1.19x |
|    256 |      0 |      0 |      1.12x |      1.00x |     1.19x |
|    256 |      0 |      0 |      1.12x |      0.99x |     1.19x |
|    256 |      0 |      7 |      1.53x |      1.08x |     1.33x |
|    256 |      7 |      8 |      1.51x |      1.07x |     1.33x |
|    512 |      0 |      0 |      1.16x |      1.00x |     1.19x |
|    512 |      0 |      0 |      1.15x |      1.00x |     1.19x |
|    512 |      0 |      0 |      1.15x |      1.00x |     1.19x |
|    512 |      0 |      0 |      1.15x |      1.00x |     1.19x |
|    512 |      0 |      0 |      1.18x |      1.00x |     1.19x |
|    512 |      0 |      0 |      1.16x |      1.00x |     1.19x |
|    512 |      0 |      8 |      1.15x |      1.00x |     1.19x |
|    512 |      8 |      9 |      1.35x |      1.10x |     1.37x |
|   1024 |      0 |      0 |      1.19x |      1.00x |     1.19x |
|   1024 |      0 |      0 |      1.20x |      1.00x |     1.19x |
|   1024 |      0 |      0 |      1.19x |      1.00x |     1.19x |
|   1024 |      0 |      0 |      1.16x |      1.00x |     1.19x |
|   1024 |      0 |      0 |      1.16x |      1.00x |     1.19x |
|   1024 |      0 |      0 |      1.17x |      1.00x |     1.19x |
|   1024 |      0 |      9 |      1.39x |      1.12x |     1.40x |
|   1024 |      9 |     10 |      1.34x |      1.11x |     1.35x |
|     16 |      1 |      2 |      0.96x |      0.96x |     1.08x |
|     16 |      2 |      1 |      0.86x |      0.95x |     0.97x |
|     16 |      1 |      2 |      0.97x |      0.96x |     1.08x |
|     16 |      2 |      1 |      0.86x |      0.95x |     0.97x |
|     16 |      1 |      2 |      0.95x |      0.95x |     1.19x |
|     16 |      2 |      1 |      0.86x |      0.95x |     1.07x |
|     32 |      2 |      4 |      0.91x |      0.98x |     1.00x |
|     32 |      4 |      2 |      0.92x |      0.93x |     0.97x |
|     32 |      2 |      4 |      0.89x |      0.96x |     1.00x |
|     32 |      4 |      2 |      0.92x |      0.92x |     0.97x |
|     32 |      2 |      4 |      0.91x |      0.96x |     1.00x |
|     32 |      4 |      2 |      0.92x |      1.00x |     0.97x |
|     64 |      3 |      6 |      1.00x |      1.11x |     1.10x |
|     64 |      6 |      3 |      1.03x |      1.15x |     1.09x |
|     64 |      3 |      6 |      1.01x |      1.11x |     1.11x |
|     64 |      6 |      3 |      1.03x |      1.15x |     1.09x |
|     64 |      3 |      6 |      1.00x |      1.11x |     1.10x |
|     64 |      6 |      3 |      1.02x |      1.16x |     1.09x |
|    128 |      4 |      8 |      1.17x |      1.03x |     1.23x |
|    128 |      8 |      4 |      1.17x |      1.06x |     1.26x |
|    128 |      4 |      8 |      1.20x |      1.05x |     1.24x |
|    128 |      8 |      4 |      1.16x |      1.04x |     1.26x |
|    128 |      4 |      8 |      1.16x |      1.03x |     1.24x |
|    128 |      8 |      4 |      1.15x |      1.04x |     1.26x |
|    256 |      5 |     10 |      1.52x |      1.07x |     1.31x |
|    256 |     10 |      5 |      1.48x |      1.07x |     1.31x |
|    256 |      5 |     10 |      1.52x |      1.07x |     1.33x |
|    256 |     10 |      5 |      1.47x |      1.07x |     1.30x |
|    256 |      5 |     10 |      1.52x |      1.07x |     1.33x |
|    256 |     10 |      5 |      1.48x |      1.08x |     1.30x |
|    512 |      6 |     12 |      1.26x |      1.10x |     1.37x |
|    512 |     12 |      6 |      1.33x |      1.11x |     1.35x |
|    512 |      6 |     12 |      1.27x |      1.10x |     1.37x |
|    512 |     12 |      6 |      1.33x |      1.11x |     1.35x |
|    512 |      6 |     12 |      1.27x |      1.10x |     1.37x |
|    512 |     12 |      6 |      1.33x |      1.11x |     1.35x |
|   1024 |      7 |     14 |      1.39x |      1.12x |     1.45x |
|   1024 |     14 |      7 |      1.32x |      1.13x |     1.41x |
|   1024 |      7 |     14 |      1.39x |      1.12x |     1.45x |
|   1024 |     14 |      7 |      1.32x |      1.13x |     1.42x |
|   1024 |      7 |     14 |      1.39x |      1.12x |     1.45x |
|   1024 |     14 |      7 |      1.33x |      1.13x |     1.41x |

This patch passes the glibc tests (xcheck) with no regressions.

8< --- 8< --- 8<
Add support for MTE to strcmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access within the pages in which
the string resides is safe. This assumption does not hold when MTE is enabled.
This patch updates the algorithm to ensure that accesses remain within the
bounds of an MTE tag granule (16 bytes) and improves overall performance.
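
To illustrate the constraint described above, here is a minimal C sketch. It is
not the AArch64 assembly in the attached patch, and the names
granule_safe_strcmp and has_zero_byte are hypothetical. It relies on one
property: an 8-byte load from an 8-byte-aligned address cannot cross a 16-byte
boundary, so if the first byte of the load is a valid string byte, the whole
load stays inside a granule the string already occupies. For simplicity the
sketch falls back to a byte loop when the two strings are mutually misaligned.
Reading past the terminating NUL is outside what ISO C guarantees, so treat
this as a model of the memory accesses rather than portable code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of x is zero. */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Illustrative sketch only; assumes reads within a string's own 16-byte
   granule are safe, which is the MTE guarantee discussed above. */
int granule_safe_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Use word loads only if both pointers share the same offset modulo 8. */
    if ((((uintptr_t)p1 ^ (uintptr_t)p2) & 7) == 0) {
        /* Compare bytes until both pointers are 8-byte aligned. */
        while ((uintptr_t)p1 & 7) {
            if (*p1 != *p2 || *p1 == 0)
                return *p1 - *p2;
            p1++;
            p2++;
        }
        /* Each aligned 8-byte load starts at a byte known to belong to the
           string, so it never leaves that byte's 16-byte granule. */
        for (;;) {
            uint64_t w1, w2;
            memcpy(&w1, p1, sizeof w1);
            memcpy(&w2, p2, sizeof w2);
            if (w1 != w2 || has_zero_byte(w1))
                break;
            p1 += 8;
            p2 += 8;
        }
    }

    /* Resolve the final word, or a mutually misaligned pair, byte by byte. */
    while (*p1 == *p2 && *p1 != 0) {
        p1++;
        p2++;
    }
    return *p1 - *p2;
}
```

The byte-loop fallback keeps the sketch short; it never reads a byte past the
terminating NUL, so it is trivially granule-safe, at the cost of speed for
mutually misaligned inputs.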

Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-add-MTE-compatible-strcmp.patch
Type: text/x-patch
Size: 8320 bytes
Desc: 0001-aarch64-add-MTE-compatible-strcmp.patch
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200616/d47852b0/attachment-0001.bin>