[PATCH 0/2] Enable EVEX strcmp
H.J. Lu
hjl.tools@gmail.com
Mon Nov 1 12:54:10 GMT 2021
Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. To compare two 32-byte
strings, EVEX strcmp now requires 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD
and 1 INCL, down from 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL.
AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1
TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger
Lake and Ice Lake.
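For readers who want to see why a single VPTESTM, one masked VPCMP and an
INCL are enough, here is a portable scalar sketch of that sequence in C. It
is only a model for illustration, not the code in strcmp-evex.S: the helper
names (nonzero_mask32, masked_eq_mask32, first_stop_index32) are made up
for this example, and the assumption is that VPTESTM produces the mask of
non-null bytes of the first string and that this mask is used as the write
mask of the equality compare.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPTESTM on one 32-byte vector:
   bit i is set when byte i of s1 is non-zero.  */
static uint32_t
nonzero_mask32 (const unsigned char *s1)
{
  uint32_t m = 0;
  for (int i = 0; i < 32; i++)
    if (s1[i] != 0)
      m |= (uint32_t) 1 << i;
  return m;
}

/* Hypothetical scalar model of an equality VPCMP under write mask K:
   bit i is set only when bit i of K is set and s1[i] == s2[i].  */
static uint32_t
masked_eq_mask32 (const unsigned char *s1, const unsigned char *s2,
                  uint32_t k)
{
  uint32_t m = 0;
  for (int i = 0; i < 32; i++)
    if (((k >> i) & 1) != 0 && s1[i] == s2[i])
      m |= (uint32_t) 1 << i;
  return m;
}

/* Return 32 when all 32 bytes are equal and non-null, otherwise the
   index of the first byte that is a mismatch or a null terminator.  */
static int
first_stop_index32 (const unsigned char *s1, const unsigned char *s2)
{
  uint32_t k2 = nonzero_mask32 (s1);            /* models VPTESTM */
  uint32_t k1 = masked_eq_mask32 (s1, s2, k2);  /* models masked VPCMP */
  uint32_t r = k1 + 1;                          /* models KMOVD + INCL */

  /* An all-ones mask wraps to zero, so one zero test covers both
     "no mismatch" and "no null byte".  */
  if (r == 0)
    return 32;

  /* The lowest set bit of k1 + 1 is the lowest clear bit of k1.  */
  int idx = 0;
  while (((r >> idx) & 1) == 0)
    idx++;
  return idx;
}
```

In this model the null check and the mismatch check end up in the same
mask, which is why no separate compare against zero and no mask OR are
needed before the branch.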
bench-strcmp data on Tiger Lake:
Function: strcmp
Variant: default
__strcmp_avx2 __strcmp_evex
=======================================================================
length=1, align1=1, align2=1: 23.69 25.56
length=1, align1=1, align2=1: 24.62 23.43
length=1, align1=1, align2=1: 23.87 23.43
length=2, align1=2, align2=2: 6.82 6.61
length=2, align1=2, align2=2: 5.38 5.98
length=2, align1=2, align2=2: 6.86 6.85
length=3, align1=3, align2=3: 6.85 6.86
length=3, align1=3, align2=3: 5.98 5.98
length=3, align1=3, align2=3: 5.98 6.10
length=4, align1=4, align2=4: 6.58 5.98
length=4, align1=4, align2=4: 6.37 5.98
length=4, align1=4, align2=4: 6.58 5.98
length=5, align1=5, align2=5: 5.98 5.98
length=5, align1=5, align2=5: 6.06 6.82
length=5, align1=5, align2=5: 5.98 5.98
length=6, align1=6, align2=6: 6.58 5.98
length=6, align1=6, align2=6: 6.58 6.06
length=6, align1=6, align2=6: 5.98 5.98
length=7, align1=7, align2=7: 5.98 5.98
length=7, align1=7, align2=7: 5.98 6.05
length=7, align1=7, align2=7: 5.98 5.98
length=8, align1=8, align2=8: 5.38 5.38
length=8, align1=8, align2=8: 5.98 5.38
length=8, align1=8, align2=8: 5.98 5.38
length=9, align1=9, align2=9: 5.38 5.38
length=9, align1=9, align2=9: 5.38 5.38
length=9, align1=9, align2=9: 4.78 5.38
length=10, align1=10, align2=10: 6.05 5.40
length=10, align1=10, align2=10: 5.38 5.38
length=10, align1=10, align2=10: 5.38 5.38
length=11, align1=11, align2=11: 4.78 5.38
length=11, align1=11, align2=11: 4.78 5.38
length=11, align1=11, align2=11: 4.78 5.38
length=12, align1=12, align2=12: 4.86 5.38
length=12, align1=12, align2=12: 5.98 5.38
length=12, align1=12, align2=12: 5.98 5.38
length=13, align1=13, align2=13: 5.98 5.38
length=13, align1=13, align2=13: 4.78 5.38
length=13, align1=13, align2=13: 4.78 5.38
length=14, align1=14, align2=14: 5.98 5.38
length=14, align1=14, align2=14: 5.47 5.38
length=14, align1=14, align2=14: 5.38 5.38
length=15, align1=15, align2=15: 5.38 5.38
length=15, align1=15, align2=15: 5.98 5.38
length=15, align1=15, align2=15: 6.05 5.38
length=16, align1=16, align2=16: 4.79 4.79
length=16, align1=16, align2=16: 4.78 4.78
length=16, align1=16, align2=16: 5.38 4.79
length=17, align1=17, align2=17: 6.58 7.18
length=17, align1=17, align2=17: 6.58 7.18
length=17, align1=17, align2=17: 6.58 7.20
length=18, align1=18, align2=18: 6.58 7.20
length=18, align1=18, align2=18: 6.58 7.20
length=18, align1=18, align2=18: 6.58 7.20
length=19, align1=19, align2=19: 6.58 7.20
length=19, align1=19, align2=19: 6.58 7.18
length=19, align1=19, align2=19: 6.58 7.20
length=20, align1=20, align2=20: 6.58 7.18
length=20, align1=20, align2=20: 6.58 7.17
length=20, align1=20, align2=20: 6.58 7.18
length=21, align1=21, align2=21: 6.58 7.07
length=21, align1=21, align2=21: 7.18 5.98
length=21, align1=21, align2=21: 7.18 5.98
length=22, align1=22, align2=22: 6.58 5.98
length=22, align1=22, align2=22: 7.18 5.98
length=22, align1=22, align2=22: 7.18 6.06
length=23, align1=23, align2=23: 6.58 5.98
length=23, align1=23, align2=23: 6.58 5.98
length=23, align1=23, align2=23: 6.58 5.98
length=24, align1=24, align2=24: 4.86 4.79
length=24, align1=24, align2=24: 5.38 4.79
length=24, align1=24, align2=24: 5.38 4.79
length=25, align1=25, align2=25: 4.78 4.79
length=25, align1=25, align2=25: 5.38 4.79
length=25, align1=25, align2=25: 5.38 4.78
length=26, align1=26, align2=26: 5.46 4.78
length=26, align1=26, align2=26: 5.38 4.79
length=26, align1=26, align2=26: 5.38 4.78
length=27, align1=27, align2=27: 4.78 4.79
length=27, align1=27, align2=27: 4.78 4.78
length=27, align1=27, align2=27: 4.78 4.79
length=28, align1=28, align2=28: 5.38 4.79
length=28, align1=28, align2=28: 4.78 4.79
length=28, align1=28, align2=28: 5.38 4.78
length=29, align1=29, align2=29: 4.78 4.79
length=29, align1=29, align2=29: 5.38 4.78
length=29, align1=29, align2=29: 4.78 4.79
length=30, align1=30, align2=30: 4.78 4.86
length=30, align1=30, align2=30: 5.38 4.79
length=30, align1=30, align2=30: 4.78 4.79
length=31, align1=31, align2=31: 4.78 4.86
length=31, align1=31, align2=31: 5.38 4.78
length=31, align1=31, align2=31: 5.38 4.78
length=4, align1=0, align2=0: 6.00 5.39
length=4, align1=0, align2=0: 6.00 5.38
length=4, align1=0, align2=0: 6.00 5.38
length=4, align1=0, align2=0: 5.98 5.38
length=4, align1=0, align2=0: 6.02 5.38
length=4, align1=0, align2=0: 5.98 5.38
length=4, align1=0, align2=1: 5.98 5.98
length=4, align1=1, align2=2: 5.38 5.98
length=8, align1=0, align2=0: 5.98 5.38
length=8, align1=0, align2=0: 6.02 5.38
length=8, align1=0, align2=0: 6.00 5.38
length=8, align1=0, align2=0: 6.00 5.38
length=8, align1=0, align2=0: 6.02 5.38
length=8, align1=0, align2=0: 5.98 5.38
length=8, align1=0, align2=2: 5.98 5.98
length=8, align1=2, align2=3: 5.38 5.98
length=16, align1=0, align2=0: 5.38 4.79
length=16, align1=0, align2=0: 5.38 4.78
length=16, align1=0, align2=0: 4.87 4.78
length=16, align1=0, align2=0: 5.38 4.79
length=16, align1=0, align2=0: 4.78 4.79
length=16, align1=0, align2=0: 5.38 4.79
length=16, align1=0, align2=3: 6.00 5.38
length=16, align1=3, align2=4: 5.98 5.38
length=32, align1=0, align2=0: 7.82 5.99
length=32, align1=0, align2=0: 7.71 6.58
length=32, align1=0, align2=0: 6.44 4.79
length=32, align1=0, align2=0: 6.81 4.79
length=32, align1=0, align2=0: 6.53 4.79
length=32, align1=0, align2=0: 6.33 4.79
length=32, align1=0, align2=4: 8.61 4.78
length=32, align1=4, align2=5: 6.74 5.49
length=64, align1=0, align2=0: 9.67 8.24
length=64, align1=0, align2=0: 11.11 8.23
length=64, align1=0, align2=0: 10.00 6.88
length=64, align1=0, align2=0: 12.82 6.88
length=64, align1=0, align2=0: 10.42 7.88
length=64, align1=0, align2=0: 10.37 6.88
length=64, align1=0, align2=5: 11.08 6.88
length=64, align1=5, align2=6: 9.29 6.88
length=128, align1=0, align2=0: 14.06 14.08
length=128, align1=0, align2=0: 14.23 14.14
length=128, align1=0, align2=0: 8.41 7.48
length=128, align1=0, align2=0: 10.55 7.48
length=128, align1=0, align2=0: 8.45 7.48
length=128, align1=0, align2=0: 9.38 7.48
length=128, align1=0, align2=6: 8.44 7.48
length=128, align1=6, align2=7: 8.66 7.48
length=256, align1=0, align2=0: 16.54 17.55
length=256, align1=0, align2=0: 16.42 17.49
length=256, align1=0, align2=0: 17.03 17.47
length=256, align1=0, align2=0: 17.57 17.49
length=256, align1=0, align2=0: 16.63 17.47
length=256, align1=0, align2=0: 17.88 17.54
length=256, align1=0, align2=7: 20.20 19.18
length=256, align1=7, align2=8: 20.17 19.14
length=512, align1=0, align2=0: 25.17 24.51
length=512, align1=0, align2=0: 24.60 24.38
length=512, align1=0, align2=0: 24.53 24.52
length=512, align1=0, align2=0: 25.71 24.34
length=512, align1=0, align2=0: 24.55 24.48
length=512, align1=0, align2=0: 25.15 24.44
length=512, align1=0, align2=8: 25.97 25.90
length=512, align1=8, align2=9: 25.88 25.92
length=1024, align1=0, align2=0: 40.13 36.75
length=1024, align1=0, align2=0: 39.84 36.63
length=1024, align1=0, align2=0: 40.50 36.84
length=1024, align1=0, align2=0: 40.16 36.76
length=1024, align1=0, align2=0: 39.72 36.76
length=1024, align1=0, align2=0: 40.67 36.76
length=1024, align1=0, align2=9: 40.57 39.59
length=1024, align1=9, align2=10: 40.66 39.60
length=16, align1=1, align2=2: 6.59 7.18
length=16, align1=2, align2=1: 7.18 7.18
length=16, align1=1, align2=2: 5.39 5.38
length=16, align1=2, align2=1: 5.97 5.40
length=16, align1=1, align2=2: 5.41 5.38
length=16, align1=2, align2=1: 5.98 5.38
length=32, align1=2, align2=4: 8.81 7.18
length=32, align1=4, align2=2: 8.79 7.18
length=32, align1=2, align2=4: 7.57 4.79
length=32, align1=4, align2=2: 6.79 4.79
length=32, align1=2, align2=4: 7.03 4.78
length=32, align1=4, align2=2: 7.04 4.78
length=64, align1=3, align2=6: 10.00 8.38
length=64, align1=6, align2=3: 8.89 9.57
length=64, align1=3, align2=6: 9.31 6.88
length=64, align1=6, align2=3: 10.06 6.88
length=64, align1=3, align2=6: 9.38 6.88
length=64, align1=6, align2=3: 10.42 6.88
length=128, align1=4, align2=8: 17.36 16.15
length=128, align1=8, align2=4: 14.30 14.50
length=128, align1=4, align2=8: 8.48 7.48
length=128, align1=8, align2=4: 8.78 7.48
length=128, align1=4, align2=8: 8.45 7.48
length=128, align1=8, align2=4: 8.57 7.55
length=256, align1=5, align2=10: 20.73 19.26
length=256, align1=10, align2=5: 16.81 18.56
length=256, align1=5, align2=10: 20.44 19.14
length=256, align1=10, align2=5: 16.76 18.57
length=256, align1=5, align2=10: 20.03 19.22
length=256, align1=10, align2=5: 17.01 18.55
length=512, align1=6, align2=12: 26.50 25.81
length=512, align1=12, align2=6: 24.64 25.61
length=512, align1=6, align2=12: 26.23 25.90
length=512, align1=12, align2=6: 24.78 25.70
length=512, align1=6, align2=12: 25.85 25.90
length=512, align1=12, align2=6: 25.98 25.71
length=1024, align1=7, align2=14: 40.62 39.69
length=1024, align1=14, align2=7: 39.74 39.06
length=1024, align1=7, align2=14: 40.70 39.58
length=1024, align1=14, align2=7: 40.16 39.04
length=1024, align1=7, align2=14: 40.62 39.65
length=1024, align1=14, align2=7: 39.68 39.12
length=128, align1=8063, align2=8063: 14.19 14.43
length=128, align1=8063, align2=8062: 14.57 14.48
length=129, align1=8062, align2=8063: 17.52 16.06
length=129, align1=8062, align2=8062: 14.13 14.08
length=129, align1=8062, align2=8062: 14.16 14.08
length=129, align1=8062, align2=8061: 15.59 14.54
length=130, align1=8061, align2=8062: 17.53 16.14
length=130, align1=8061, align2=8061: 14.66 14.08
length=130, align1=8061, align2=8061: 13.80 14.09
length=130, align1=8061, align2=8060: 14.28 14.47
length=131, align1=8060, align2=8061: 17.84 16.11
length=131, align1=8060, align2=8060: 14.08 14.07
length=131, align1=8060, align2=8060: 14.02 14.07
length=131, align1=8060, align2=8059: 15.05 14.48
length=132, align1=8059, align2=8060: 17.46 16.10
length=132, align1=8059, align2=8059: 13.99 14.07
length=132, align1=8059, align2=8059: 14.01 14.08
length=132, align1=8059, align2=8058: 14.54 14.54
length=133, align1=8058, align2=8059: 17.38 16.17
length=133, align1=8058, align2=8058: 14.14 14.08
length=133, align1=8058, align2=8058: 13.88 14.06
length=133, align1=8058, align2=8057: 14.66 14.47
length=134, align1=8057, align2=8058: 17.45 16.13
length=134, align1=8057, align2=8057: 14.10 14.07
length=134, align1=8057, align2=8057: 14.54 14.07
length=134, align1=8057, align2=8056: 14.58 14.49
length=135, align1=8056, align2=8057: 17.65 16.10
length=135, align1=8056, align2=8056: 13.91 14.08
length=135, align1=8056, align2=8056: 14.16 14.07
length=135, align1=8056, align2=8055: 15.19 14.74
length=136, align1=8055, align2=8056: 18.17 16.10
length=136, align1=8055, align2=8055: 14.68 14.64
length=136, align1=8055, align2=8055: 14.58 14.64
length=136, align1=8055, align2=8054: 15.21 15.03
length=137, align1=8054, align2=8055: 17.75 16.22
length=137, align1=8054, align2=8054: 14.51 14.62
length=137, align1=8054, align2=8054: 15.15 14.69
length=137, align1=8054, align2=8053: 15.11 14.94
length=138, align1=8053, align2=8054: 18.13 16.22
length=138, align1=8053, align2=8053: 14.61 14.70
length=138, align1=8053, align2=8053: 14.41 14.70
length=138, align1=8053, align2=8052: 14.96 14.94
length=139, align1=8052, align2=8053: 17.98 16.21
length=139, align1=8052, align2=8052: 14.63 14.68
length=139, align1=8052, align2=8052: 15.30 14.62
length=139, align1=8052, align2=8051: 15.20 14.95
length=140, align1=8051, align2=8052: 17.66 16.13
length=140, align1=8051, align2=8051: 14.60 14.68
length=140, align1=8051, align2=8051: 14.58 14.62
length=140, align1=8051, align2=8050: 15.51 14.94
length=141, align1=8050, align2=8051: 17.41 16.14
length=141, align1=8050, align2=8050: 14.77 14.71
length=141, align1=8050, align2=8050: 14.50 14.62
length=141, align1=8050, align2=8049: 14.95 14.97
length=142, align1=8049, align2=8050: 17.55 16.14
length=142, align1=8049, align2=8049: 14.46 14.70
length=142, align1=8049, align2=8049: 14.60 14.61
length=142, align1=8049, align2=8048: 14.77 14.78
length=143, align1=8048, align2=8049: 18.15 16.15
length=143, align1=8048, align2=8048: 13.92 14.02
length=143, align1=8048, align2=8048: 13.88 14.02
length=143, align1=8048, align2=8047: 14.11 14.32
length=144, align1=8047, align2=8048: 17.64 16.19
length=144, align1=8047, align2=8047: 14.20 13.96
length=144, align1=8047, align2=8047: 14.03 13.95
length=144, align1=8047, align2=8046: 14.36 14.32
length=145, align1=8046, align2=8047: 17.82 16.11
length=145, align1=8046, align2=8046: 14.39 13.95
length=145, align1=8046, align2=8046: 13.88 13.95
length=145, align1=8046, align2=8045: 14.55 14.33
length=146, align1=8045, align2=8046: 18.02 16.10
length=146, align1=8045, align2=8045: 13.91 13.95
length=146, align1=8045, align2=8045: 13.77 13.95
length=146, align1=8045, align2=8044: 14.26 14.32
length=147, align1=8044, align2=8045: 17.43 16.17
length=147, align1=8044, align2=8044: 14.02 14.01
length=147, align1=8044, align2=8044: 13.99 13.89
length=147, align1=8044, align2=8043: 14.40 14.32
length=148, align1=8043, align2=8044: 17.57 16.08
length=148, align1=8043, align2=8043: 14.00 13.95
length=148, align1=8043, align2=8043: 14.18 13.95
length=148, align1=8043, align2=8042: 14.66 14.33
length=149, align1=8042, align2=8043: 17.50 16.20
length=149, align1=8042, align2=8042: 13.87 13.95
length=149, align1=8042, align2=8042: 14.12 13.96
length=149, align1=8042, align2=8041: 14.74 14.32
length=150, align1=8041, align2=8042: 17.63 16.13
length=150, align1=8041, align2=8041: 13.87 13.95
length=150, align1=8041, align2=8041: 13.73 13.94
length=150, align1=8041, align2=8040: 14.31 14.34
length=151, align1=8040, align2=8041: 18.46 16.09
length=151, align1=8040, align2=8040: 15.37 13.95
length=151, align1=8040, align2=8040: 14.01 13.95
length=151, align1=8040, align2=8039: 14.25 14.32
length=152, align1=8039, align2=8040: 17.70 16.11
length=152, align1=8039, align2=8039: 13.89 14.03
length=152, align1=8039, align2=8039: 14.49 14.02
length=152, align1=8039, align2=8038: 14.31 14.39
length=153, align1=8038, align2=8039: 17.62 16.10
length=153, align1=8038, align2=8038: 13.75 13.95
length=153, align1=8038, align2=8038: 14.00 13.94
length=153, align1=8038, align2=8037: 14.25 14.33
length=154, align1=8037, align2=8038: 18.33 16.11
length=154, align1=8037, align2=8037: 14.12 13.96
length=154, align1=8037, align2=8037: 14.08 13.95
length=154, align1=8037, align2=8036: 15.15 14.33
length=155, align1=8036, align2=8037: 17.66 16.09
length=155, align1=8036, align2=8036: 14.22 14.01
length=155, align1=8036, align2=8036: 13.87 14.02
length=155, align1=8036, align2=8035: 14.63 14.32
length=156, align1=8035, align2=8036: 17.57 16.10
length=156, align1=8035, align2=8035: 14.00 13.96
length=156, align1=8035, align2=8035: 13.88 13.95
length=156, align1=8035, align2=8034: 14.79 14.41
length=157, align1=8034, align2=8035: 17.74 16.15
length=157, align1=8034, align2=8034: 14.13 13.94
length=157, align1=8034, align2=8034: 14.86 13.95
length=157, align1=8034, align2=8033: 14.35 14.33
length=158, align1=8033, align2=8034: 17.68 16.16
length=158, align1=8033, align2=8033: 13.94 13.94
H.J. Lu (2):
x86-64: Improve EVEX strcmp with masked load
x86-64: Remove Prefer_AVX2_STRCMP
sysdeps/x86/cpu-features.c | 8 -
sysdeps/x86/cpu-tunables.c | 2 -
...cpu-features-preferred_feature_index_1.def | 1 -
sysdeps/x86_64/multiarch/strcmp-evex.S | 461 +++++++++---------
sysdeps/x86_64/multiarch/strcmp.c | 3 +-
sysdeps/x86_64/multiarch/strncmp.c | 3 +-
6 files changed, 245 insertions(+), 233 deletions(-)
--
2.33.1