This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] Improve performance of memmem


On 10/06/2019 19:31, Wilco Dijkstra wrote:
> v2: Update comments after review.
> 
> This patch significantly improves performance of memmem using a novel
> modified Horspool algorithm.  Needles up to size 256 use a bad-character
> table indexed by hashed pairs of characters to quickly skip past mismatches.
> Long needles use a self-adapting filtering step to avoid comparing the whole
> needle repeatedly.
> 
> By limiting the needle length to 256, the shift table only requires 8 bits
> per entry, lowering preprocessing overhead and minimizing cache effects.
> This limit also implies worst-case performance is linear.
> 
> Small needles up to size 2 use a dedicated linear search.  Very long needles
> use the Two-Way algorithm (to avoid increasing stack size or slowing down
> the common case, inlining is disabled).
> 
> The performance gain is 6.6 times on English text on AArch64 using random
> needles with average size 8 (this is even faster than the recently improved strstr
> algorithm, so I'll update that in the near future).

the comment about strstr is no longer relevant.
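
for reference, the dedicated linear search for needles of size 2 mentioned
in the description could look roughly like the sketch below. that part of
the patch is not quoted in this mail, so find_pair and its signature are
hypothetical names made up for illustration, not the code under review.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: find a 2-byte needle
   by sliding a packed pair of bytes across the haystack.  */
static const void *
find_pair (const unsigned char *hs, size_t hs_len, const unsigned char *ne)
{
  if (hs_len < 2)
    return NULL;

  const unsigned char *last = hs + hs_len - 1;
  uint32_t nw = (uint32_t) ne[0] << 8 | ne[1];   /* needle as one word */
  uint32_t hw = (uint32_t) hs[0] << 8 | hs[1];   /* current window */
  const unsigned char *p = hs + 1;               /* second byte of window */

  /* Shift one new byte into the window per step; one compare per byte.  */
  while (hw != nw && p < last)
    {
      p++;
      hw = (hw << 8 | *p) & 0xffff;
    }
  return hw == nw ? p - 1 : NULL;
}
```

the point of the packed word is that each haystack byte is loaded once and
the inner loop has a single comparison.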

> 
> Tested against GLIBC testsuite and randomized tests. OK for commit?
> 
> ChangeLog:
> 2019-06-10  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* string/memmem.c (__memmem): Rewrite to improve performance.
> 

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

i only had one comment below; if that's addressed
then i think it's ready to commit.

(but i think you should wait a day in case there
are further comments on this latest version.)

> +  for ( ; hs <= end; )
>      {
> -      haystack = memchr (haystack, *needle, haystack_len);
> -      if (!haystack || __builtin_expect (needle_len == 1, 0))
> -	return (void *) haystack;
> -      haystack_len -= haystack - (const unsigned char *) haystack_start;
> -      if (haystack_len < needle_len)
> -	return NULL;
> -      /* Check whether we have a match.  This improves performance since we
> -	 avoid the initialization overhead of the two-way algorithm.  */
> -      if (memcmp (haystack, needle, needle_len) == 0)
> -	return (void *) haystack;
> -      return two_way_short_needle (haystack, haystack_len, needle, needle_len);
> +      /* Skip past character pairs not in the needle.  */
> +      do
> +	{
> +	  hs += m1;
> +	  tmp = shift[hash2 (hs)];
> +	}
> +      while (tmp == 0 && hs <= end);

i noticed that the two conditions in this check are in a different
order than in strstr; i wonder if that's deliberate.

if either way is fine i'd prefer to have the same
logic in strstr and memmem.
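
as far as i can tell from the quoted loop, the table read happens in the
loop body before the condition is evaluated, and both operands of the &&
are free of side effects, so either order should stop at the same place.
a minimal model of that, where shift_at[pos] is a hypothetical stand-in for
shift[hash2 (hs)] and both function names are made up for illustration:

```c
#include <stddef.h>

/* Model of the skip loop with the table value tested first.
   The caller must ensure pos <= end on entry and that shift_at has
   at least end + step + 1 entries.  */
static size_t
skip_tmp_first (const unsigned char *shift_at, size_t pos, size_t step,
                size_t end, unsigned *tmp_out)
{
  unsigned tmp;
  do
    {
      pos += step;
      tmp = shift_at[pos];   /* read happens before either test */
    }
  while (tmp == 0 && pos <= end);
  *tmp_out = tmp;
  return pos;
}

/* Same loop with the bound tested first.  */
static size_t
skip_bound_first (const unsigned char *shift_at, size_t pos, size_t step,
                  size_t end, unsigned *tmp_out)
{
  unsigned tmp;
  do
    {
      pos += step;
      tmp = shift_at[pos];
    }
  while (pos <= end && tmp == 0);
  *tmp_out = tmp;
  return pos;
}
```

in this model the only difference is which comparison is evaluated first
on each iteration, which may matter for branch behaviour but not for the
result.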

> +
> +      /* If the match is not at the end of the needle, shift to the end
> +	 and continue until we match the hash of the needle end.  */
> +      hs -= tmp;
> +      if (tmp < m1)
> +	continue;
> +
> +      /* Hash of the last 2 characters matches.  If the needle is long,
> +	 try to quickly filter out mismatches.  */
> +      if (m1 < 15 || memcmp (hs + offset, ne + offset, 8) == 0)
> +	{
> +	  if (memcmp (hs, ne, m1) == 0)
> +	    return (void *) hs;
> +
> +	  /* Adjust filter offset when it doesn't find the mismatch.  */
> +	  offset = (offset >= 8 ? offset : m1) - 8;
> +	}
> +
> +      /* Skip based on matching the hash of the needle end.  */
> +      hs += shift1;
>      }
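
to make the hunk above easier to try out in isolation, here is a
self-contained sketch built around the quoted loop. the loop body follows
the hunk; everything else is an assumption on my side, since it is not
quoted in this mail: the HASH2 definition, the table setup, the shift1
computation, and the name sketch_memmem. it only handles needle lengths
3..256 and returns NULL otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed hash of the byte pair ending at p, reduced to a table index.  */
#define HASH2(p) (((size_t) (p)[0] - ((size_t) (p)[-1] << 3)) % 256)

/* Sketch only, not the function under review.  */
static const void *
sketch_memmem (const void *haystack, size_t hs_len,
               const void *needle, size_t ne_len)
{
  const unsigned char *hs = haystack;
  const unsigned char *ne = needle;

  if (ne_len < 3 || ne_len > 256 || hs_len < ne_len)
    return NULL;

  const unsigned char *end = hs + hs_len - ne_len;
  uint8_t shift[256];            /* m1 <= 255, so 8 bits per entry suffice */
  size_t m1 = ne_len - 1;
  size_t offset = 0;
  size_t tmp, shift1;

  /* shift[h] is the last needle position whose byte pair hashes to h,
     or 0 if no pair in the needle has that hash.  */
  memset (shift, 0, sizeof shift);
  for (size_t i = 1; i < m1; i++)
    shift[HASH2 (ne + i)] = (uint8_t) i;
  /* Skip distance after the end-pair hash matched but the needle did not.  */
  shift1 = m1 - shift[HASH2 (ne + m1)];
  shift[HASH2 (ne + m1)] = (uint8_t) m1;

  while (hs <= end)
    {
      /* Skip past character pairs not in the needle.  */
      do
        {
          hs += m1;
          tmp = shift[HASH2 (hs)];
        }
      while (tmp == 0 && hs <= end);

      /* Realign so the hashed pair lines up with its needle position.  */
      hs -= tmp;
      if (tmp < m1)
        continue;

      /* End-pair hash matches; for long needles filter on 8 bytes first.  */
      if (m1 < 15 || memcmp (hs + offset, ne + offset, 8) == 0)
        {
          if (memcmp (hs, ne, m1) == 0)
            return hs;
          /* Move the filter window when it failed to catch the mismatch.  */
          offset = (offset >= 8 ? offset : m1) - 8;
        }

      hs += shift1;
    }
  return NULL;
}
```

with this hash, comparing only the first m1 bytes is enough once the
end-pair hash matches: if the second-to-last bytes agree and the hashes
agree, the last bytes must agree too.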
