This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Use Quicksearch in strstr


On Mon, 29 Oct 2018, Wilco Dijkstra wrote:

> The needle length is limited to 254 - combined with a hashed shift table
> this reduces the shift table memory 16 to 32 times, lowering preprocessing
> overhead and minimizing cache effects.

This is a bit unclear: given needle length limit, unhashed table would require
256 bytes, hashing as implemented in the patch reduces it to 64 bytes. I think
unhashed variant should be even faster, is saving 192 bytes on stack worth it?

> The performance gain using the improved bench-strstr on Cortex-A72 is 2.1x.
> Due to the low overhead the gain is larger on small haystacks: haystacks of 64
> chars are ~3 times faster for needles of 5-16 chars.

How does the worst case change look like (I think it's searching for "a{251}ba"
in "aaa...aaa" if memcmp works left-to-right)?  Does it make sense to move
needle length cutoff further below 254?

> 	* string/str-two-way.h (two_way_short_needle): Add inline to avoid warning.
> 	* string/strstr.c (strstr2): Add new function.
> 	(strstr3): Likewise.
> 	(strstr3): Likewise.
> 	(STRSTR): Completely rewrite strstr to improve performance

Generating the patch with 'git diff --patience' may produce a more readable diff.

> +  /* Use the Quick-Search algorithm for needle lengths less than 255.  */
> +  if (__glibc_likely (ne_len < 255))
> +    {
> +      uint8_t shift[1 << SHIFT_TABLE_BITS];
> +      const unsigned char *end = hs + hs_len - ne_len;
> +
> +      /* Initialize bad character shift hash table.  */
> +      memset (shift, ne_len + 1, sizeof (shift));
> +      for (int i = 0; i < ne_len; i++)
> +	shift[ne[i] % sizeof (shift)] = ne_len - i;
> +
> +      do
> +	{
> +	  hs--;

Doesn't this move hs out of bounds and thus invoke undefined behavior?

Thanks.
Alexander


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]