[PATCH] aarch64: MTE compatible memchr

Alex Butler Alex.Butler@arm.com
Tue Jun 9 16:06:03 GMT 2020


This patch adds an MTE-compatible implementation of memchr.

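The benchmark results below show the performance uplift over the existing
implementation on Cortex-A72, Cortex-A53 and Neoverse N1 (values above 1.00x
indicate a speedup).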

| length | position | alignment | uplift A72  | uplift A53  | uplift N1   |
|--------|----------|-----------|-------------|-------------|-------------|
|   2048 |       32 |         0 |       1.53x |       1.02x |       1.21x |
|    256 |       64 |         1 |       1.57x |       1.03x |       1.02x |
|   2048 |       32 |         0 |       1.53x |       1.02x |       1.21x |
|    256 |       64 |         1 |       1.57x |       1.03x |       1.00x |
|   2048 |       64 |         0 |       1.58x |       0.95x |       0.95x |
|    256 |       64 |         2 |       1.57x |       1.03x |       1.01x |
|   2048 |       64 |         0 |       1.58x |       0.95x |       0.95x |
|    256 |       64 |         2 |       1.57x |       1.03x |       1.00x |
|   2048 |      128 |         0 |       1.35x |       0.90x |       0.96x |
|    256 |       64 |         3 |       1.57x |       1.03x |       1.00x |
|   2048 |      128 |         0 |       1.35x |       0.90x |       0.96x |
|    256 |       64 |         3 |       1.57x |       1.03x |       1.00x |
|   2048 |      256 |         0 |       1.10x |       0.84x |       1.01x |
|    256 |       64 |         4 |       1.57x |       1.03x |       1.00x |
|   2048 |      256 |         0 |       1.10x |       0.83x |       1.01x |
|    256 |       64 |         4 |       1.57x |       1.03x |       1.00x |
|   2048 |      512 |         0 |       1.03x |       0.80x |       0.98x |
|    256 |       64 |         5 |       1.57x |       1.03x |       1.00x |
|   2048 |      512 |         0 |       1.01x |       0.80x |       0.98x |
|    256 |       64 |         5 |       1.57x |       1.03x |       1.00x |
|   2048 |     1024 |         0 |       0.92x |       0.78x |       0.96x |
|    256 |       64 |         6 |       1.57x |       1.03x |       1.00x |
|   2048 |     1024 |         0 |       0.92x |       0.78x |       0.97x |
|    256 |       64 |         6 |       1.57x |       1.03x |       1.00x |
|   2048 |     2048 |         0 |       0.88x |       0.76x |       0.95x |
|    256 |       64 |         7 |       1.57x |       1.03x |       1.00x |
|   2048 |     2048 |         0 |       0.86x |       0.76x |       0.96x |
|    256 |       64 |         7 |       1.05x |       1.02x |       1.00x |
|      2 |        1 |         0 |       1.38x |       1.56x |       1.38x |
|      2 |        1 |         0 |       1.37x |       1.56x |       1.38x |
|      2 |        1 |         1 |       1.37x |       1.72x |       1.55x |
|      2 |        1 |         1 |       1.37x |       1.72x |       1.55x |
|      3 |        2 |         0 |       1.38x |       1.56x |       1.38x |
|      3 |        2 |         0 |       1.37x |       1.56x |       1.37x |
|      3 |        2 |         2 |       1.38x |       1.71x |       1.55x |
|      3 |        2 |         2 |       1.37x |       1.72x |       1.51x |
|      4 |        3 |         0 |       1.37x |       1.56x |       1.38x |
|      4 |        3 |         0 |       1.37x |       1.56x |       1.37x |
|      4 |        3 |         3 |       1.37x |       1.72x |       1.50x |
|      4 |        3 |         3 |       1.37x |       1.72x |       1.55x |
|      5 |        4 |         0 |       1.37x |       1.56x |       1.38x |
|      5 |        4 |         0 |       1.37x |       1.56x |       1.37x |
|      5 |        4 |         4 |       1.37x |       1.72x |       1.50x |
|      5 |        4 |         4 |       1.37x |       1.72x |       1.55x |
|      6 |        5 |         0 |       1.37x |       1.56x |       1.38x |
|      6 |        5 |         0 |       1.37x |       1.56x |       1.38x |
|      6 |        5 |         5 |       1.38x |       1.72x |       1.54x |
|      6 |        5 |         5 |       1.38x |       1.72x |       1.56x |
|      7 |        6 |         0 |       1.38x |       1.56x |       1.38x |
|      7 |        6 |         0 |       1.37x |       1.56x |       1.38x |
|      7 |        6 |         6 |       1.37x |       1.72x |       1.50x |
|      7 |        6 |         6 |       1.37x |       1.72x |       1.54x |
|      8 |        7 |         0 |       1.38x |       1.56x |       1.38x |
|      8 |        7 |         0 |       1.38x |       1.56x |       1.37x |
|      8 |        7 |         7 |       1.37x |       1.72x |       1.53x |
|      8 |        7 |         7 |       1.37x |       1.72x |       1.50x |
|      9 |        8 |         0 |       1.37x |       1.56x |       1.38x |
|      9 |        8 |         0 |       1.38x |       1.56x |       1.38x |
|      9 |        8 |         0 |       1.37x |       1.56x |       1.37x |
|      9 |        8 |         0 |       1.38x |       1.56x |       1.37x |
|     10 |        9 |         0 |       1.37x |       1.56x |       1.38x |
|     10 |        9 |         0 |       1.37x |       1.56x |       1.37x |
|     10 |        9 |         1 |       1.38x |       1.72x |       1.55x |
|     10 |        9 |         1 |       1.37x |       1.72x |       1.50x |
|     11 |       10 |         0 |       1.38x |       1.56x |       1.38x |
|     11 |       10 |         0 |       1.37x |       1.56x |       1.37x |
|     11 |       10 |         2 |       1.38x |       1.72x |       1.55x |
|     11 |       10 |         2 |       1.37x |       1.72x |       1.55x |
|     12 |       11 |         0 |       1.38x |       1.56x |       1.38x |
|     12 |       11 |         0 |       1.38x |       1.56x |       1.37x |
|     12 |       11 |         3 |       1.38x |       1.72x |       1.50x |
|     12 |       11 |         3 |       1.38x |       1.72x |       1.50x |
|     13 |       12 |         0 |       1.37x |       1.56x |       1.38x |
|     13 |       12 |         0 |       1.37x |       1.56x |       1.37x |
|     13 |       12 |         4 |       0.88x |       1.05x |       1.03x |
|     13 |       12 |         4 |       0.88x |       1.05x |       1.03x |
|     14 |       13 |         0 |       1.37x |       1.56x |       1.38x |
|     14 |       13 |         0 |       1.37x |       1.56x |       1.37x |
|     14 |       13 |         5 |       0.88x |       1.05x |       1.03x |
|     14 |       13 |         5 |       0.88x |       1.05x |       1.04x |
|     15 |       14 |         0 |       1.37x |       1.56x |       1.38x |
|     15 |       14 |         0 |       1.37x |       1.56x |       1.38x |
|     15 |       14 |         6 |       0.88x |       1.05x |       1.02x |
|     15 |       14 |         6 |       0.88x |       1.05x |       1.03x |
|     16 |       15 |         0 |       1.38x |       1.56x |       1.38x |
|     16 |       15 |         0 |       1.38x |       1.56x |       1.37x |
|     16 |       15 |         7 |       0.88x |       1.05x |       1.03x |
|     16 |       15 |         7 |       0.88x |       1.05x |       1.00x |
|     17 |       16 |         0 |       0.88x |       0.95x |       0.92x |
|     17 |       16 |         0 |       0.88x |       0.95x |       0.92x |
|     17 |       16 |         0 |       0.88x |       0.95x |       0.92x |
|     17 |       16 |         0 |       0.88x |       0.95x |       0.92x |
|     18 |       17 |         0 |       0.88x |       0.95x |       0.92x |
|     18 |       17 |         0 |       0.88x |       0.95x |       0.92x |
|     18 |       17 |         1 |       0.88x |       1.05x |       1.01x |
|     18 |       17 |         1 |       0.88x |       1.05x |       1.03x |
|     19 |       18 |         0 |       0.88x |       0.95x |       0.92x |
|     19 |       18 |         0 |       0.88x |       0.95x |       0.92x |
|     19 |       18 |         2 |       0.88x |       1.05x |       1.11x |
|     19 |       18 |         2 |       0.88x |       1.05x |       1.09x |
|     20 |       19 |         0 |       0.88x |       0.95x |       1.00x |
|     20 |       19 |         0 |       0.88x |       0.95x |       1.00x |
|     20 |       19 |         3 |       0.88x |       1.05x |       1.12x |
|     20 |       19 |         3 |       0.88x |       1.05x |       1.13x |
|     21 |       20 |         0 |       0.88x |       0.95x |       0.92x |
|     21 |       20 |         0 |       0.88x |       0.95x |       0.92x |
|     21 |       20 |         4 |       0.88x |       1.05x |       1.03x |
|     21 |       20 |         4 |       0.88x |       1.05x |       1.00x |
|     22 |       21 |         0 |       0.88x |       0.95x |       1.00x |
|     22 |       21 |         0 |       0.88x |       0.95x |       1.00x |
|     22 |       21 |         5 |       0.88x |       1.05x |       1.09x |
|     22 |       21 |         5 |       0.88x |       1.05x |       1.09x |
|     23 |       22 |         0 |       0.88x |       0.95x |       1.00x |
|     23 |       22 |         0 |       0.88x |       0.95x |       1.00x |
|     23 |       22 |         6 |       0.88x |       1.05x |       1.13x |
|     23 |       22 |         6 |       0.88x |       1.05x |       1.14x |
|     24 |       23 |         0 |       0.88x |       0.95x |       1.00x |
|     24 |       23 |         0 |       0.88x |       0.95x |       1.00x |
|     24 |       23 |         7 |       0.88x |       1.05x |       1.13x |
|     24 |       23 |         7 |       0.88x |       1.05x |       1.11x |
|     25 |       24 |         0 |       0.88x |       0.95x |       1.00x |
|     25 |       24 |         0 |       0.88x |       0.95x |       1.00x |
|     25 |       24 |         0 |       0.88x |       0.95x |       1.00x |
|     25 |       24 |         0 |       0.88x |       0.95x |       1.00x |
|     26 |       25 |         0 |       0.88x |       0.95x |       1.00x |
|     26 |       25 |         0 |       0.88x |       0.95x |       0.92x |
|     26 |       25 |         1 |       0.88x |       1.05x |       1.03x |
|     26 |       25 |         1 |       0.88x |       1.05x |       1.03x |
|     27 |       26 |         0 |       0.88x |       0.95x |       0.92x |
|     27 |       26 |         0 |       0.88x |       0.95x |       0.92x |
|     27 |       26 |         2 |       0.88x |       1.05x |       1.03x |
|     27 |       26 |         2 |       0.88x |       1.05x |       1.00x |
|     28 |       27 |         0 |       0.88x |       0.95x |       0.92x |
|     28 |       27 |         0 |       0.88x |       0.95x |       0.92x |
|     28 |       27 |         3 |       0.88x |       1.05x |       1.00x |
|     28 |       27 |         3 |       0.88x |       1.05x |       1.03x |
|     29 |       28 |         0 |       0.88x |       0.95x |       0.92x |
|     29 |       28 |         0 |       0.88x |       0.95x |       0.92x |
|     29 |       28 |         4 |       1.24x |       1.17x |       1.15x |
|     29 |       28 |         4 |       1.21x |       1.17x |       1.19x |
|     30 |       29 |         0 |       0.88x |       0.95x |       0.92x |
|     30 |       29 |         0 |       0.88x |       0.95x |       0.92x |
|     30 |       29 |         5 |       1.22x |       1.17x |       1.15x |
|     30 |       29 |         5 |       1.22x |       1.17x |       1.19x |
|     31 |       30 |         0 |       0.88x |       0.95x |       0.92x |
|     31 |       30 |         0 |       0.88x |       0.95x |       0.92x |
|     31 |       30 |         6 |       1.22x |       1.17x |       1.17x |
|     31 |       30 |         6 |       1.22x |       1.17x |       1.16x |
|     32 |       31 |         0 |       0.88x |       0.95x |       0.92x |
|     32 |       31 |         0 |       0.88x |       0.95x |       0.92x |
|     32 |       31 |         7 |       1.22x |       1.17x |       1.19x |
|     32 |       31 |         7 |       1.22x |       1.17x |       1.16x |

This patch passes the tests with no regressions.

8< --- 8< --- 8<
Add support for MTE to memchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
buffer resides is safe. This assumption does not hold when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag granule (16-byte aligned chunks) and improves performance in many
of the benchmarked cases.

Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-add-MTE-compatible-memchr.patch
Type: text/x-patch
Size: 6548 bytes
Desc: 0001-aarch64-add-MTE-compatible-memchr.patch
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200609/e48e3ae8/attachment-0001.bin>

