[PATCH v4 0/2] Improve wcsstr
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Tue Mar 19 15:07:37 GMT 2024
Hi Adhemerval,
> With this fix, and with the removal of the powerpc strcasestr
> optimization [2], it seems that only x86_64 still provides a non
> O(m*n) implementation [3]. Noah already gave a +1, so it would be
> good to have some confirmation that this implementation can really
> show some quadradic behaviour before propose a removal.
Yes it is a simple brute-force algorithm that checks the whole needle
at a matching character pair (and does so 1 byte at a time after the
first 64 bytes of a needle). Also it never skips ahead and thus can match
at every haystack position after trying to match all of the needle.
I added a quick test for this (every different implementation requires
a unique test for its worst-case), and I got:
"ifuncs": ["basic_strstr", "twoway_strstr", "__strstr_avx512", "__strstr_sse2_unaligned", "__strstr_generic"],
{
"len_haystack": 65536,
"len_needle": 1024,
"align_haystack": 0,
"align_needle": 0,
"fail": 1,
"desc": "Difficult bruteforce needle",
"timings": [4.0948e+07, 15094.5, 3.20818e+07, 108558, 10839.2]
},
{
"len_haystack": 1048576,
"len_needle": 4096,
"align_haystack": 0,
"align_needle": 0,
"fail": 1,
"desc": "Difficult bruteforce needle",
"timings": [2.69767e+09, 100797, 2.08535e+09, 495706, 82666.9]
}
So a 4x larger needle and 16x larger haystack gives a clear 65x slowdown on
both basic_strstr and __strstr_avx512.
Cheers,
Wilco
More information about the Libc-alpha
mailing list