Fastest String Search Algorithm.

Phil Blundell pb@pbcl.net
Wed Jun 9 09:26:39 GMT 2021


On Wed, Jun 09, 2021 at 01:24:39PM +0530, Amit Choudhary wrote:
> So, similarly, a check can be made that if the needle length is greater
> than 4 KB then my algorithm should be used.

Okay, I see.  That would add an extra comparison and a probably-not-taken
branch for strstr() of strings between 256 and 4095 bytes in length,
but I imagine that's in the noise.

The comment above two_way_needle() describes its cost as:

   Return the first location of non-empty NEEDLE within HAYSTACK, or
   NULL.  HAYSTACK_LEN is the minimum known length of HAYSTACK.  This
   method is optimized for LONG_NEEDLE_THRESHOLD <= NEEDLE_LEN.
   Performance is guaranteed to be linear, with an initialization cost
   of 3 * NEEDLE_LEN + (1 << CHAR_BIT) operations.

   If AVAILABLE does not modify HAYSTACK_LEN (as in memmem), then at
   most 2 * HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching,
   and sublinear performance O(HAYSTACK_LEN / NEEDLE_LEN) is possible.
   If AVAILABLE modifies HAYSTACK_LEN (as in strstr), then at most 3 *
   HAYSTACK_LEN - NEEDLE_LEN comparisons occur in searching, and
   sublinear performance is not possible.

How does this compare to your algorithm?

Phil


More information about the Libc-alpha mailing list