Fastest String Search Algorithm.
Phil Blundell
pb@pbcl.net
Wed Jun 9 09:26:39 GMT 2021
On Wed, Jun 09, 2021 at 01:24:39PM +0530, Amit Choudhary wrote:
> So, similarly, a check can be made that if the needle length is greater
> than 4 KB then my algorithm should be used.
Okay, I see. That would add an extra comparison and a probably-not-taken
branch for strstr() of strings between 256 and 4095 bytes in length,
but I imagine that's in the noise.
The comment above two_way_needle() describes its cost as:
Return the first location of non-empty NEEDLE within HAYSTACK, or
NULL. HAYSTACK_LEN is the minimum known length of HAYSTACK. This
method is optimized for LONG_NEEDLE_THRESHOLD <= NEEDLE_LEN.
Performance is guaranteed to be linear, with an initialization cost
of 3 * NEEDLE_LEN + (1 << CHAR_BIT) operations.
If AVAILABLE does not modify HAYSTACK_LEN (as in memmem), then at
most 2 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching,
and sublinear performance O(HAYSTACK_LEN / NEEDLE_LEN) is possible.
If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching, and
sublinear performance is not possible.
How does this compare to your algorithm?
Phil
More information about the Libc-alpha
mailing list