This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v2] Improve performance of memmem

From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
Cc: nd <nd at arm dot com>
Date: Tue, 11 Jun 2019 09:30:29 +0000
Subject: Re: [PATCH v2] Improve performance of memmem
References: <DB5PR08MB1030105EE3706615A33BB02283B80@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB10306332982B5ADC766894BE83920@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB1030235578A697313FCF191F83730@DB5PR08MB1030.eurprd08.prod.outlook.com> <AM6PR08MB5078F033E9076789B9A1D1A383470@AM6PR08MB5078.eurprd08.prod.outlook.com> <AM6PR08MB5078D8B9BFBCA6DCF84D875383280@AM6PR08MB5078.eurprd08.prod.outlook.com> <VI1PR0801MB2127BC5B98C3584BFA3B92E883020@VI1PR0801MB2127.eurprd08.prod.outlook.com> <VI1PR0801MB212778920393C7817D1FE08983130@VI1PR0801MB2127.eurprd08.prod.outlook.com> <VI1PR0801MB21278FF8424901F1FEC9AAFB83130@VI1PR0801MB2127.eurprd08.prod.outlook.com>

On 10/06/2019 19:31, Wilco Dijkstra wrote:
> v2: Update comments after review.
> 
> This patch significantly improves performance of memmem using a novel
> modified Horspool algorithm.  Needles up to size 256 use a bad-character
> table indexed by hashed pairs of characters to quickly skip past mismatches.
> Long needles use a self-adapting filtering step to avoid comparing the whole
> needle repeatedly.
> 
> By limiting the needle length to 256, the shift table only requires 8 bits
> per entry, lowering preprocessing overhead and minimizing cache effects.
> This limit also implies worst-case performance is linear.
> 
> Small needles up to size 2 use a dedicated linear search.  Very long needles
> use the Two-Way algorithm (to avoid increasing stack size or slowing down
> the common case, inlining is disabled).
> 
> The performance gain is 6.6 times on English text on AArch64 using random
> needles with average size 8 (this is even faster than the recently improved strstr
> algorithm, so I'll update that in the near future).

the comment about strstr is no longer relevant.

> 
> Tested against GLIBC testsuite and randomized tests. OK for commit?
> 
> ChangeLog:
> 2019-06-10  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* string/memmem.c (__memmem): Rewrite to improve performance.
> 

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

i only had one comment below if that's addressed
then i think it's ready to commit.

(but i think you should wait a day in case there
are further comments on this latest version.)

> +  for ( ; hs <= end; )
>      {
> -      haystack = memchr (haystack, *needle, haystack_len);
> -      if (!haystack || __builtin_expect (needle_len == 1, 0))
> -	return (void *) haystack;
> -      haystack_len -= haystack - (const unsigned char *) haystack_start;
> -      if (haystack_len < needle_len)
> -	return NULL;
> -      /* Check whether we have a match.  This improves performance since we
> -	 avoid the initialization overhead of the two-way algorithm.  */
> -      if (memcmp (haystack, needle, needle_len) == 0)
> -	return (void *) haystack;
> -      return two_way_short_needle (haystack, haystack_len, needle, needle_len);
> +      /* Skip past character pairs not in the needle.  */
> +      do
> +	{
> +	  hs += m1;
> +	  tmp = shift[hash2 (hs)];
> +	}
> +      while (tmp == 0 && hs <= end);

i noticed that the check here is in different
order than in strstr, i wonder if that's deliberate.

if either way is fine i'd prefer to have the same
logic in strstr and memmem.

> +
> +      /* If the match is not at the end of the needle, shift to the end
> +	 and continue until we match the hash of the needle end.  */
> +      hs -= tmp;
> +      if (tmp < m1)
> +	continue;
> +
> +      /* Hash of the last 2 characters matches.  If the needle is long,
> +	 try to quickly filter out mismatches.  */
> +      if (m1 < 15 || memcmp (hs + offset, ne + offset, 8) == 0)
> +	{
> +	  if (memcmp (hs, ne, m1) == 0)
> +	    return (void *) hs;
> +
> +	  /* Adjust filter offset when it doesn't find the mismatch.  */
> +	  offset = (offset >= 8 ? offset : m1) - 8;
> +	}
> +
> +      /* Skip based on matching the hash of the needle end.  */
> +      hs += shift1;
>      }

References:
- Re: [PATCH] Improve performance of memmem
  - From: Wilco Dijkstra
- Re: [PATCH v2] Improve performance of memmem
  - From: Wilco Dijkstra

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]