[PATCH v2] aarch64: MTE compatible strncmp
Alex Butler
Alex.Butler@arm.com
Tue Jun 16 12:44:24 GMT 2020
This patch adds an MTE-compatible implementation of strncmp.
The table below gives the benchmark results as the performance uplift on each
core; values above 1.00x are speedups and values below 1.00x are slowdowns.
| strlen | len | align1 | align2 | uplift A72 | uplift A53 | uplift N1 |
| 8 | 1 | 0 | 0 | 1.15x | 1.17x | 1.21x |
| 8 | 1 | 0 | 0 | 1.00x | 1.17x | 1.21x |
| 8 | 1 | 0 | 0 | 1.00x | 1.17x | 1.20x |
| 8 | 1 | 1 | 1 | 1.03x | 1.14x | 1.28x |
| 8 | 1 | 1 | 1 | 1.03x | 1.14x | 1.22x |
| 8 | 1 | 1 | 1 | 1.03x | 1.14x | 1.22x |
| 8 | 1 | 1 | 2 | 1.03x | 1.00x | 0.78x |
| 8 | 1 | 2 | 1 | 1.05x | 1.02x | 1.00x |
| 8 | 1 | 1 | 3 | 1.05x | 0.98x | 1.42x |
| 8 | 1 | 0 | 0 | 1.14x | 1.17x | 1.20x |
| 8 | 1 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 1 | 0 | 0 | 1.14x | 1.17x | 1.20x |
| 8 | 1 | 1 | 1 | 1.15x | 1.14x | 1.25x |
| 8 | 1 | 1 | 1 | 1.15x | 1.14x | 1.20x |
| 8 | 1 | 1 | 1 | 1.15x | 1.14x | 1.22x |
| 8 | 1 | 1 | 2 | 1.05x | 0.99x | 0.78x |
| 8 | 1 | 2 | 1 | 1.05x | 1.00x | 0.78x |
| 8 | 1 | 1 | 3 | 1.05x | 1.00x | 0.78x |
| 8 | 2 | 0 | 0 | 1.13x | 1.17x | 1.20x |
| 8 | 2 | 0 | 0 | 1.14x | 1.17x | 1.22x |
| 8 | 2 | 0 | 0 | 1.14x | 1.19x | 1.21x |
| 8 | 2 | 2 | 2 | 1.15x | 1.14x | 1.22x |
| 8 | 2 | 2 | 2 | 1.15x | 1.14x | 1.22x |
| 8 | 2 | 2 | 2 | 1.17x | 1.14x | 1.22x |
| 8 | 2 | 2 | 4 | 1.04x | 1.00x | 1.06x |
| 8 | 2 | 4 | 2 | 1.04x | 1.00x | 1.06x |
| 8 | 2 | 2 | 6 | 1.04x | 1.00x | 0.95x |
| 8 | 2 | 0 | 0 | 1.14x | 1.17x | 1.20x |
| 8 | 2 | 0 | 0 | 1.16x | 1.19x | 1.21x |
| 8 | 2 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 2 | 2 | 2 | 1.15x | 1.14x | 1.22x |
| 8 | 2 | 2 | 2 | 1.15x | 1.14x | 1.22x |
| 8 | 2 | 2 | 2 | 1.15x | 1.14x | 1.22x |
| 8 | 2 | 2 | 4 | 1.04x | 1.00x | 1.06x |
| 8 | 2 | 4 | 2 | 1.04x | 1.00x | 1.22x |
| 8 | 2 | 2 | 6 | 1.04x | 1.00x | 0.95x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.22x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 3 | 3 | 3 | 1.15x | 1.15x | 1.22x |
| 8 | 3 | 3 | 3 | 1.15x | 1.14x | 1.22x |
| 8 | 3 | 3 | 3 | 1.15x | 1.13x | 1.22x |
| 8 | 3 | 3 | 6 | 1.05x | 1.01x | 0.87x |
| 8 | 3 | 6 | 3 | 0.99x | 1.00x | 0.88x |
| 8 | 3 | 3 | 1 | 1.03x | 1.00x | 0.87x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 3 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 3 | 3 | 3 | 1.14x | 1.13x | 1.22x |
| 8 | 3 | 3 | 3 | 1.15x | 1.15x | 1.22x |
| 8 | 3 | 3 | 3 | 1.15x | 1.14x | 1.22x |
| 8 | 3 | 3 | 6 | 1.04x | 1.00x | 1.08x |
| 8 | 3 | 6 | 3 | 1.03x | 1.01x | 1.08x |
| 8 | 3 | 3 | 1 | 1.03x | 1.00x | 1.09x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.22x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 4 | 4 | 4 | 1.14x | 1.14x | 1.10x |
| 8 | 4 | 4 | 4 | 1.15x | 1.13x | 1.10x |
| 8 | 4 | 4 | 4 | 1.15x | 1.14x | 1.10x |
| 8 | 4 | 4 | 0 | 1.02x | 1.00x | 1.07x |
| 8 | 4 | 8 | 4 | 1.03x | 0.99x | 1.08x |
| 8 | 4 | 4 | 4 | 1.15x | 1.14x | 1.10x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.23x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.22x |
| 8 | 4 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 4 | 4 | 4 | 1.13x | 1.14x | 1.11x |
| 8 | 4 | 4 | 4 | 1.15x | 1.14x | 1.10x |
| 8 | 4 | 4 | 4 | 1.15x | 1.14x | 1.10x |
| 8 | 4 | 4 | 0 | 1.03x | 1.00x | 1.08x |
| 8 | 4 | 8 | 4 | 1.03x | 1.00x | 1.08x |
| 8 | 4 | 4 | 4 | 1.15x | 1.14x | 1.10x |
| 8 | 5 | 0 | 0 | 1.14x | 1.17x | 1.22x |
| 8 | 5 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 5 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 5 | 5 | 5 | 1.11x | 1.10x | 1.16x |
| 8 | 5 | 5 | 5 | 1.12x | 1.11x | 1.17x |
| 8 | 5 | 5 | 5 | 1.12x | 1.14x | 1.16x |
| 8 | 5 | 5 | 2 | 1.03x | 1.00x | 1.07x |
| 8 | 5 | 10 | 5 | 1.02x | 0.99x | 1.07x |
| 8 | 5 | 5 | 7 | 1.03x | 1.00x | 1.07x |
| 8 | 5 | 0 | 0 | 1.14x | 1.23x | 1.21x |
| 8 | 5 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 5 | 0 | 0 | 1.12x | 1.17x | 1.22x |
| 8 | 5 | 5 | 5 | 1.12x | 1.12x | 1.24x |
| 8 | 5 | 5 | 5 | 1.12x | 1.10x | 1.16x |
| 8 | 5 | 5 | 5 | 1.12x | 1.10x | 1.16x |
| 8 | 5 | 5 | 2 | 1.02x | 0.99x | 1.07x |
| 8 | 5 | 10 | 5 | 1.03x | 1.00x | 1.07x |
| 8 | 5 | 5 | 7 | 1.03x | 1.01x | 1.07x |
| 8 | 6 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 6 | 0 | 0 | 1.12x | 1.17x | 1.22x |
| 8 | 6 | 0 | 0 | 1.14x | 1.17x | 1.20x |
| 8 | 6 | 6 | 6 | 1.12x | 1.14x | 1.16x |
| 8 | 6 | 6 | 6 | 1.12x | 1.14x | 1.09x |
| 8 | 6 | 6 | 6 | 1.11x | 1.13x | 1.16x |
| 8 | 6 | 6 | 4 | 1.03x | 1.00x | 1.06x |
| 8 | 6 | 12 | 6 | 1.03x | 1.00x | 0.89x |
| 8 | 6 | 6 | 2 | 1.03x | 1.00x | 0.89x |
| 8 | 6 | 0 | 0 | 1.14x | 1.18x | 1.22x |
| 8 | 6 | 0 | 0 | 1.14x | 1.15x | 1.21x |
| 8 | 6 | 0 | 0 | 1.14x | 1.17x | 1.20x |
| 8 | 6 | 6 | 6 | 1.12x | 1.10x | 1.16x |
| 8 | 6 | 6 | 6 | 1.11x | 1.14x | 1.16x |
| 8 | 6 | 6 | 6 | 1.12x | 1.11x | 1.24x |
| 8 | 6 | 6 | 4 | 1.03x | 0.99x | 1.07x |
| 8 | 6 | 12 | 6 | 1.03x | 1.00x | 1.07x |
| 8 | 6 | 6 | 2 | 1.04x | 1.00x | 1.07x |
| 8 | 7 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 7 | 0 | 0 | 1.14x | 1.18x | 1.21x |
| 8 | 7 | 0 | 0 | 1.14x | 1.17x | 1.21x |
| 8 | 7 | 7 | 7 | 1.13x | 1.10x | 1.09x |
| 8 | 7 | 7 | 7 | 1.12x | 1.10x | 1.16x |
| 8 | 7 | 7 | 7 | 1.12x | 1.14x | 1.09x |
| 8 | 7 | 7 | 6 | 1.02x | 1.00x | 0.90x |
| 8 | 7 | 14 | 7 | 1.03x | 0.99x | 1.02x |
| 8 | 7 | 7 | 5 | 1.03x | 1.01x | 1.03x |
| 8 | 7 | 0 | 0 | 1.14x | 1.23x | 1.22x |
| 8 | 7 | 0 | 0 | 1.15x | 1.17x | 1.22x |
| 8 | 7 | 0 | 0 | 1.14x | 1.16x | 1.20x |
| 8 | 7 | 7 | 7 | 1.12x | 1.09x | 1.16x |
| 8 | 7 | 7 | 7 | 1.12x | 1.11x | 1.09x |
| 8 | 7 | 7 | 7 | 1.12x | 1.14x | 1.16x |
| 8 | 7 | 7 | 6 | 1.03x | 1.01x | 1.03x |
| 8 | 7 | 14 | 7 | 1.03x | 1.00x | 1.02x |
| 8 | 7 | 7 | 5 | 1.02x | 1.00x | 1.02x |
| 8 | 8 | 0 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 0 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 0 | 0 | 1.14x | 1.17x | 1.12x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.12x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.12x |
| 8 | 8 | 16 | 0 | 1.12x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 0 | 0 | 1.14x | 1.15x | 1.12x |
| 8 | 8 | 0 | 0 | 1.14x | 1.17x | 1.12x |
| 8 | 8 | 0 | 0 | 1.12x | 1.17x | 1.12x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.10x |
| 8 | 8 | 8 | 0 | 1.12x | 1.17x | 1.10x |
| 8 | 8 | 16 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 8 | 8 | 0 | 1.14x | 1.17x | 1.11x |
| 8 | 9 | 0 | 0 | 1.10x | 1.11x | 1.16x |
| 8 | 9 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 8 | 9 | 0 | 0 | 1.10x | 1.12x | 1.15x |
| 8 | 9 | 9 | 1 | 1.12x | 1.10x | 1.16x |
| 8 | 9 | 9 | 1 | 1.11x | 1.10x | 1.16x |
| 8 | 9 | 9 | 1 | 1.12x | 1.10x | 1.16x |
| 8 | 9 | 9 | 2 | 1.02x | 1.00x | 1.08x |
| 8 | 9 | 18 | 1 | 1.02x | 1.00x | 1.00x |
| 8 | 9 | 9 | 3 | 1.02x | 1.00x | 1.01x |
| 8 | 9 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 9 | 0 | 0 | 1.11x | 1.09x | 1.15x |
| 8 | 9 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 8 | 9 | 9 | 1 | 1.12x | 1.09x | 1.16x |
| 8 | 9 | 9 | 1 | 1.13x | 1.10x | 1.16x |
| 8 | 9 | 9 | 1 | 1.11x | 1.11x | 1.16x |
| 8 | 9 | 9 | 2 | 1.02x | 1.00x | 1.00x |
| 8 | 9 | 18 | 1 | 1.02x | 1.01x | 1.00x |
| 8 | 9 | 9 | 3 | 1.02x | 1.00x | 1.00x |
| 8 | 10 | 0 | 0 | 1.10x | 1.11x | 1.16x |
| 8 | 10 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 10 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 8 | 10 | 10 | 2 | 1.12x | 1.14x | 1.16x |
| 8 | 10 | 10 | 2 | 1.12x | 1.14x | 1.17x |
| 8 | 10 | 10 | 2 | 1.12x | 1.13x | 1.16x |
| 8 | 10 | 10 | 4 | 1.02x | 1.00x | 1.00x |
| 8 | 10 | 20 | 2 | 1.02x | 0.99x | 1.00x |
| 8 | 10 | 10 | 6 | 1.02x | 1.00x | 1.00x |
| 8 | 10 | 0 | 0 | 1.09x | 1.18x | 1.15x |
| 8 | 10 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 10 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 10 | 10 | 2 | 1.12x | 1.14x | 1.16x |
| 8 | 10 | 10 | 2 | 1.12x | 1.14x | 1.16x |
| 8 | 10 | 10 | 2 | 1.13x | 1.10x | 1.16x |
| 8 | 10 | 10 | 4 | 1.02x | 1.00x | 1.00x |
| 8 | 10 | 20 | 2 | 1.02x | 1.00x | 1.00x |
| 8 | 10 | 10 | 6 | 1.03x | 1.00x | 1.00x |
| 8 | 11 | 0 | 0 | 1.10x | 1.15x | 1.15x |
| 8 | 11 | 0 | 0 | 1.09x | 1.10x | 1.15x |
| 8 | 11 | 0 | 0 | 1.10x | 1.09x | 1.15x |
| 8 | 11 | 11 | 3 | 1.12x | 1.14x | 1.16x |
| 8 | 11 | 11 | 3 | 1.12x | 1.13x | 1.16x |
| 8 | 11 | 11 | 3 | 1.12x | 1.13x | 1.16x |
| 8 | 11 | 11 | 6 | 1.02x | 1.00x | 1.00x |
| 8 | 11 | 22 | 3 | 1.02x | 1.01x | 1.00x |
| 8 | 11 | 11 | 1 | 1.02x | 1.00x | 1.00x |
| 8 | 11 | 0 | 0 | 1.10x | 1.16x | 1.03x |
| 8 | 11 | 0 | 0 | 1.10x | 1.10x | 1.03x |
| 8 | 11 | 0 | 0 | 1.09x | 1.10x | 1.15x |
| 8 | 11 | 11 | 3 | 1.13x | 1.14x | 1.16x |
| 8 | 11 | 11 | 3 | 1.12x | 1.10x | 1.16x |
| 8 | 11 | 11 | 3 | 1.12x | 1.10x | 1.15x |
| 8 | 11 | 11 | 6 | 1.03x | 0.99x | 0.99x |
| 8 | 11 | 22 | 3 | 1.02x | 1.00x | 1.00x |
| 8 | 11 | 11 | 1 | 1.02x | 0.99x | 0.99x |
| 8 | 12 | 0 | 0 | 1.10x | 1.21x | 1.15x |
| 8 | 12 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 12 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 12 | 12 | 4 | 1.13x | 1.10x | 1.07x |
| 8 | 12 | 12 | 4 | 1.15x | 1.11x | 1.07x |
| 8 | 12 | 12 | 4 | 1.15x | 1.10x | 1.01x |
| 8 | 12 | 12 | 0 | 1.02x | 1.01x | 1.00x |
| 8 | 12 | 24 | 4 | 1.02x | 1.00x | 1.00x |
| 8 | 12 | 12 | 4 | 1.15x | 1.11x | 1.07x |
| 8 | 12 | 0 | 0 | 1.10x | 1.15x | 1.15x |
| 8 | 12 | 0 | 0 | 1.12x | 1.11x | 1.14x |
| 8 | 12 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 8 | 12 | 12 | 4 | 1.15x | 1.09x | 1.07x |
| 8 | 12 | 12 | 4 | 1.16x | 1.10x | 1.07x |
| 8 | 12 | 12 | 4 | 1.15x | 1.10x | 1.07x |
| 8 | 12 | 12 | 0 | 1.02x | 1.00x | 0.98x |
| 8 | 12 | 24 | 4 | 1.02x | 1.00x | 1.00x |
| 8 | 12 | 12 | 4 | 1.15x | 1.10x | 1.07x |
| 8 | 13 | 0 | 0 | 1.10x | 1.21x | 1.15x |
| 8 | 13 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 8 | 13 | 0 | 0 | 1.10x | 1.10x | 1.33x |
| 8 | 13 | 13 | 5 | 1.06x | 1.08x | 1.07x |
| 8 | 13 | 13 | 5 | 1.06x | 1.08x | 1.07x |
| 8 | 13 | 13 | 5 | 1.06x | 1.08x | 1.07x |
| 8 | 13 | 13 | 2 | 1.02x | 1.00x | 1.00x |
| 8 | 13 | 26 | 5 | 1.02x | 1.00x | 1.00x |
| 8 | 13 | 13 | 7 | 1.02x | 1.00x | 1.00x |
| 8 | 13 | 0 | 0 | 1.10x | 1.12x | 1.04x |
| 8 | 13 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 8 | 13 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 13 | 13 | 5 | 1.06x | 1.08x | 1.07x |
| 8 | 13 | 13 | 5 | 1.06x | 1.08x | 1.07x |
| 8 | 13 | 13 | 5 | 1.07x | 1.08x | 1.07x |
| 8 | 13 | 13 | 2 | 1.02x | 1.00x | 1.01x |
| 8 | 13 | 26 | 5 | 1.02x | 1.00x | 1.00x |
| 8 | 13 | 13 | 7 | 1.02x | 1.00x | 1.00x |
| 8 | 14 | 0 | 0 | 1.10x | 1.10x | 1.14x |
| 8 | 14 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 14 | 0 | 0 | 1.11x | 1.09x | 1.16x |
| 8 | 14 | 14 | 6 | 1.05x | 1.08x | 1.06x |
| 8 | 14 | 14 | 6 | 1.06x | 1.08x | 1.07x |
| 8 | 14 | 14 | 6 | 1.06x | 1.08x | 1.07x |
| 8 | 14 | 14 | 4 | 1.02x | 1.01x | 1.00x |
| 8 | 14 | 28 | 6 | 1.02x | 1.00x | 1.00x |
| 8 | 14 | 14 | 2 | 1.03x | 1.01x | 1.08x |
| 8 | 14 | 0 | 0 | 1.10x | 1.10x | 1.33x |
| 8 | 14 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 8 | 14 | 0 | 0 | 1.11x | 1.10x | 1.04x |
| 8 | 14 | 14 | 6 | 1.06x | 1.08x | 1.07x |
| 8 | 14 | 14 | 6 | 1.03x | 1.08x | 1.07x |
| 8 | 14 | 14 | 6 | 1.06x | 1.08x | 1.07x |
| 8 | 14 | 14 | 4 | 1.02x | 0.99x | 1.00x |
| 8 | 14 | 28 | 6 | 1.02x | 1.00x | 1.00x |
| 8 | 14 | 14 | 2 | 1.02x | 0.99x | 0.96x |
| 8 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 8 | 15 | 15 | 7 | 1.07x | 1.08x | 1.07x |
| 8 | 15 | 15 | 7 | 1.06x | 1.08x | 1.06x |
| 8 | 15 | 15 | 7 | 1.05x | 1.08x | 1.07x |
| 8 | 15 | 15 | 6 | 1.02x | 1.00x | 1.08x |
| 8 | 15 | 30 | 7 | 1.02x | 1.00x | 1.00x |
| 8 | 15 | 15 | 5 | 1.01x | 1.00x | 1.00x |
| 8 | 15 | 0 | 0 | 1.11x | 1.11x | 1.15x |
| 8 | 15 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 8 | 15 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 8 | 15 | 15 | 7 | 1.06x | 1.07x | 1.07x |
| 8 | 15 | 15 | 7 | 1.06x | 1.08x | 1.07x |
| 8 | 15 | 15 | 7 | 1.06x | 1.08x | 1.07x |
| 8 | 15 | 15 | 6 | 1.02x | 1.00x | 0.97x |
| 8 | 15 | 30 | 7 | 1.02x | 1.00x | 1.00x |
| 8 | 15 | 15 | 5 | 1.02x | 1.00x | 1.00x |
| 16 | 32 | 0 | 0 | 1.02x | 1.09x | 1.02x |
| 16 | 32 | 0 | 0 | 1.03x | 1.10x | 1.03x |
| 16 | 32 | 0 | 0 | 1.03x | 1.08x | 1.04x |
| 16 | 32 | 0 | 0 | 1.04x | 1.09x | 1.03x |
| 16 | 32 | 0 | 0 | 1.03x | 1.09x | 1.03x |
| 16 | 32 | 0 | 0 | 1.03x | 1.09x | 1.03x |
| 16 | 32 | 7 | 2 | 1.15x | 0.99x | 0.85x |
| 16 | 32 | 7 | 2 | 1.15x | 0.98x | 0.85x |
| 16 | 32 | 2 | 1 | 1.02x | 0.98x | 0.86x |
| 16 | 32 | 2 | 1 | 1.02x | 0.99x | 0.86x |
| 32 | 64 | 0 | 0 | 1.03x | 0.97x | 0.99x |
| 32 | 64 | 0 | 0 | 1.03x | 0.96x | 0.99x |
| 32 | 64 | 0 | 0 | 1.03x | 0.96x | 0.99x |
| 32 | 64 | 0 | 0 | 1.03x | 0.96x | 0.99x |
| 32 | 64 | 0 | 0 | 1.03x | 0.97x | 0.99x |
| 32 | 64 | 0 | 0 | 1.03x | 0.97x | 0.99x |
| 32 | 64 | 6 | 4 | 1.15x | 0.93x | 0.81x |
| 32 | 64 | 6 | 4 | 1.16x | 0.93x | 0.81x |
| 32 | 64 | 4 | 2 | 0.94x | 0.94x | 0.81x |
| 32 | 64 | 4 | 2 | 0.93x | 0.94x | 0.80x |
| 64 | 128 | 0 | 0 | 1.02x | 0.96x | 0.99x |
| 64 | 128 | 0 | 0 | 1.02x | 0.96x | 0.99x |
| 64 | 128 | 0 | 0 | 1.02x | 0.96x | 0.99x |
| 64 | 128 | 0 | 0 | 1.02x | 0.97x | 0.99x |
| 64 | 128 | 0 | 0 | 1.02x | 0.96x | 1.00x |
| 64 | 128 | 0 | 0 | 1.02x | 0.97x | 0.99x |
| 64 | 128 | 5 | 6 | 0.86x | 0.83x | 0.78x |
| 64 | 128 | 5 | 6 | 0.86x | 0.83x | 0.77x |
| 64 | 128 | 6 | 3 | 1.00x | 0.84x | 0.80x |
| 64 | 128 | 6 | 3 | 1.00x | 0.84x | 0.80x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 0 | 0 | 1.01x | 1.02x | 1.00x |
| 128 | 256 | 4 | 0 | 0.79x | 0.82x | 0.79x |
| 128 | 256 | 4 | 0 | 0.79x | 0.82x | 0.79x |
| 128 | 256 | 8 | 4 | 0.76x | 0.81x | 0.75x |
| 128 | 256 | 8 | 4 | 0.76x | 0.81x | 0.75x |
| 256 | 512 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 256 | 512 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 256 | 512 | 0 | 0 | 1.21x | 1.01x | 1.00x |
| 256 | 512 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 256 | 512 | 0 | 0 | 1.00x | 1.01x | 1.00x |
| 256 | 512 | 0 | 0 | 1.19x | 1.02x | 1.00x |
| 256 | 512 | 3 | 2 | 0.87x | 0.80x | 0.77x |
| 256 | 512 | 3 | 2 | 0.87x | 0.80x | 0.77x |
| 256 | 512 | 10 | 5 | 0.85x | 0.81x | 0.76x |
| 256 | 512 | 10 | 5 | 0.85x | 0.81x | 0.76x |
| 512 | 1024 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 512 | 1024 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 512 | 1024 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 512 | 1024 | 0 | 0 | 1.02x | 1.01x | 1.00x |
| 512 | 1024 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 512 | 1024 | 0 | 0 | 1.01x | 1.01x | 1.00x |
| 512 | 1024 | 2 | 4 | 0.79x | 0.79x | 0.75x |
| 512 | 1024 | 2 | 4 | 0.79x | 0.79x | 0.75x |
| 512 | 1024 | 12 | 6 | 0.79x | 0.78x | 0.75x |
| 512 | 1024 | 12 | 6 | 0.79x | 0.78x | 0.75x |
| 1024 | 2048 | 0 | 0 | 1.01x | 1.00x | 1.00x |
| 1024 | 2048 | 0 | 0 | 1.01x | 1.00x | 1.00x |
| 1024 | 2048 | 0 | 0 | 1.01x | 1.00x | 1.00x |
| 1024 | 2048 | 0 | 0 | 1.00x | 1.01x | 1.00x |
| 1024 | 2048 | 0 | 0 | 1.01x | 1.00x | 1.00x |
| 1024 | 2048 | 0 | 0 | 1.01x | 1.00x | 1.00x |
| 1024 | 2048 | 1 | 6 | 0.73x | 0.78x | 0.74x |
| 1024 | 2048 | 1 | 6 | 0.73x | 0.78x | 0.74x |
| 1024 | 2048 | 14 | 7 | 0.74x | 0.78x | 0.74x |
| 1024 | 2048 | 14 | 7 | 0.74x | 0.78x | 0.74x |
| 21 | 20 | 4 | 0 | 1.12x | 1.01x | 0.91x |
| 21 | 20 | 0 | 4 | 0.89x | 0.81x | 0.76x |
| 25 | 24 | 8 | 0 | 1.12x | 1.11x | 1.03x |
| 25 | 24 | 0 | 8 | 1.12x | 1.10x | 1.03x |
| 17 | 16 | 0 | 0 | 1.13x | 1.11x | 1.12x |
| 17 | 16 | 0 | 0 | 1.13x | 1.10x | 1.11x |
| 15 | 16 | 0 | 0 | 1.13x | 1.10x | 1.12x |
| 15 | 16 | 0 | 0 | 1.13x | 1.10x | 1.12x |
| 15 | 16 | 0 | 0 | 1.13x | 1.09x | 1.11x |
| 15 | 16 | 0 | 0 | 1.14x | 1.10x | 1.11x |
| 15 | 16 | 0 | 0 | 1.12x | 1.10x | 1.12x |
| 15 | 16 | 0 | 0 | 1.13x | 1.11x | 1.11x |
| 16 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 16 | 15 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 14 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 14 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 14 | 15 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 14 | 15 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 14 | 15 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 14 | 15 | 0 | 0 | 1.10x | 1.09x | 1.16x |
| 15 | 14 | 0 | 0 | 1.10x | 1.11x | 1.16x |
| 15 | 14 | 0 | 0 | 1.10x | 1.09x | 1.15x |
| 13 | 14 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 13 | 14 | 0 | 0 | 1.10x | 1.09x | 1.15x |
| 13 | 14 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 13 | 14 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 13 | 14 | 0 | 0 | 1.08x | 1.10x | 1.20x |
| 13 | 14 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 14 | 13 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 14 | 13 | 0 | 0 | 1.10x | 1.10x | 1.03x |
| 12 | 13 | 0 | 0 | 1.10x | 1.10x | 1.02x |
| 12 | 13 | 0 | 0 | 1.07x | 1.10x | 1.02x |
| 12 | 13 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 12 | 13 | 0 | 0 | 1.10x | 1.10x | 1.33x |
| 12 | 13 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 12 | 13 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 13 | 12 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 13 | 12 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 11 | 12 | 0 | 0 | 1.10x | 1.10x | 1.03x |
| 11 | 12 | 0 | 0 | 1.10x | 1.11x | 1.04x |
| 11 | 12 | 0 | 0 | 1.10x | 1.10x | 1.04x |
| 11 | 12 | 0 | 0 | 1.10x | 1.11x | 1.03x |
| 11 | 12 | 0 | 0 | 1.10x | 1.10x | 1.02x |
| 11 | 12 | 0 | 0 | 1.09x | 1.11x | 1.14x |
| 12 | 11 | 0 | 0 | 1.10x | 1.10x | 1.14x |
| 12 | 11 | 0 | 0 | 1.10x | 1.11x | 1.15x |
| 10 | 11 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 10 | 11 | 0 | 0 | 1.10x | 1.09x | 1.15x |
| 10 | 11 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 10 | 11 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 10 | 11 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 10 | 11 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 11 | 10 | 0 | 0 | 1.10x | 1.10x | 1.15x |
| 11 | 10 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 9 | 10 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 9 | 10 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 9 | 10 | 0 | 0 | 1.09x | 1.16x | 1.15x |
| 9 | 10 | 0 | 0 | 1.11x | 1.10x | 1.03x |
| 9 | 10 | 0 | 0 | 1.10x | 1.09x | 1.04x |
| 9 | 10 | 0 | 0 | 1.10x | 1.10x | 1.03x |
| 10 | 9 | 0 | 0 | 1.10x | 1.11x | 1.33x |
| 10 | 9 | 0 | 0 | 1.10x | 1.10x | 1.16x |
| 8 | 9 | 0 | 0 | 1.10x | 1.11x | 1.14x |
| 8 | 9 | 0 | 0 | 1.09x | 1.16x | 1.03x |
| 8 | 9 | 0 | 0 | 1.10x | 1.10x | 1.03x |
| 8 | 9 | 0 | 0 | 1.10x | 1.10x | 1.04x |
| 8 | 9 | 0 | 0 | 1.10x | 1.16x | 1.15x |
| 8 | 9 | 0 | 0 | 1.12x | 1.09x | 1.15x |
This patch passes the tests with no regressions.
8< --- 8< --- 8<
Add support for MTE to strncmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag granule (16-byte chunks) and improves overall performance.
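
As a rough illustration of the 16-byte constraint described above, the check
below shows the address arithmetic involved. It is only a sketch in C, not the
code from the attached patch (which is AArch64 assembly); the names
MTE_GRANULE, bytes_left_in_granule and load_stays_in_granule are hypothetical
and chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag granule size: MTE tags memory in 16-byte chunks. */
#define MTE_GRANULE 16u

static_assert((MTE_GRANULE & (MTE_GRANULE - 1)) == 0,
              "granule size must be a power of two");

/* Hypothetical helper: number of bytes from addr to the end of the
   16-byte granule that contains addr (between 1 and 16). */
static size_t bytes_left_in_granule(uintptr_t addr)
{
    return MTE_GRANULE - (size_t)(addr & (MTE_GRANULE - 1));
}

/* Hypothetical helper: true if a load of `size` bytes starting at addr
   touches only the granule that contains addr. */
static bool load_stays_in_granule(uintptr_t addr, size_t size)
{
    return size <= bytes_left_in_granule(addr);
}
```

Under these assumptions, an 8-byte load at an address whose low four bits are
0x8 stays inside its granule, while the same load one byte later would spill
into the next granule, which may carry a different tag.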
Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aarch64-add-MTE-compatible-strncmp-v2.patch
Type: text/x-patch
Size: 11625 bytes
Desc: aarch64-add-MTE-compatible-strncmp-v2.patch
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20200616/0600379f/attachment-0001.bin>