[PATCH] Use memmem algorithm for strstr
Eric Blake
eblake@redhat.com
Thu Jan 3 19:18:00 GMT 2019
On 12/27/18 12:30 PM, Wilco Dijkstra wrote:
> This patch further improves performance of strstr by using the new memmem
> algorithm (http://www.cygwin.com/ml/newlib/2018/msg01151.html).
> The performance gain on English text with randomized needles of average size
> 8 is 25% on small haystacks and 75% for large haystacks.
>
> Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
> (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
>
> --
> @@ -150,10 +130,6 @@ strstr (const char *haystack, const char *needle)
> return (char*)strchr (hs, ne[0]);
> if (ne[2] == '\0')
> return strstr2 (hs, ne);
> - if (ne[3] == '\0')
> - return strstr3 (hs, ne);
> - if (ne[4] == '\0')
> - return strstr4 (hs, ne);
>
> size_t ne_len = strlen (ne);
> size_t hs_len = strnlen (hs, ne_len | 512);
Is the new code faster than strstr[34] on short needles, or should you
be keeping those versions around?
Calling strlen (ne) can be wasteful for someone that calls
strstr("short", "super_long.....") - such callers will always get an
answer of NULL, but the question is how much dead work did we do in the
meantime. My original Two-Way implementation tried hard to not compute
the bounds of EITHER needle or haystack in isolation, but rather made a
first pass through both simultaneously to detect early exit
opportunities long before an unbounded strlen() on a disproportionately
long counterpart. Thus, I think this code should be using strnlen (ne,
256) before deciding to switch to Two-Way, rather than an unbounded length.
> @@ -162,39 +138,59 @@ strstr (const char *haystack, const char *needle)
> if (hs_len < ne_len)
> return NULL;
>
> - /* Use the Quick-Search algorithm for needle lengths less than 255. */
> - if (__builtin_expect (ne_len < 255, 1))
> + /* Use Two-Way algorithm for very long needles. */
> + if (__builtin_expect (ne_len > 256, 0))
> + return two_way_long_needle (hs, hs_len, ne, ne_len);
Is hs_len accurate (or even ne_len, if you take my suggestion to use
strnlen() in its compuatation)? Or do you first need to (re-)compute
accurate lengths?
> +
> + const unsigned char *end = hs + hs_len - ne_len;
> + uint8_t shift[256];
> + size_t tmp, shift1;
> + size_t m1 = ne_len - 1;
> + size_t offset = 0;
> +
> + /* Initialize bad character shift hash table. */
> + memset (shift, 0, sizeof (shift));
> + for (int i = 1; i < m1; i++)
> + shift[hash2 (ne + i)] = i;
> + shift1 = m1 - shift[hash2 (ne + m1)];
> + shift[hash2 (ne + m1)] = m1;
> +
> + while (1)
> {
> - uint8_t shift[1 << SHIFT_TABLE_BITS];
> - const unsigned char *end = hs + hs_len - ne_len;
> -
> - /* Initialize bad character shift hash table. */
> - memset (shift, ne_len + 1, sizeof (shift));
> - for (int i = 0; i < ne_len; i++)
> - shift[ne[i] % sizeof (shift)] = ne_len - i;
> + if (__builtin_expect (hs > end, 0))
> + {
> + end += strnlen (end + m1 + 1, 2048);
Why hard-coded 2048 here, vs. ne_len|512 above?
> + if (hs > end)
> + return NULL;
> + }
>
> + /* Skip past character pairs not in the needle. */
> do
> {
> - hs--;
> -
> - /* Search by skipping past bad characters. */
> - size_t tmp = shift[hs[ne_len] % sizeof (shift)];
> - for (hs += tmp; hs <= end; hs += tmp)
> - {
> - tmp = shift[hs[ne_len] % sizeof (shift)];
> - if (memcmp (hs, ne, ne_len) == 0)
> - return (char*) hs;
> - }
> - if (end[ne_len] == 0)
> - return NULL;
> - end += strnlen (end + ne_len, 2048);
> + hs += m1;
> + tmp = shift[hash2 (hs)];
> }
> - while (hs <= end);
> + while (hs <= end && tmp == 0);
>
> - return NULL;
> - }
> + /* If the match is not at the end of the needle, shift to the end
> + and continue until we match the last 2 characters. */
> + hs -= tmp;
> + if (tmp < m1)
> + continue;
>
> - /* Use Two-Way algorithm for very long needles. */
> - return two_way_long_needle (hs, hs_len, ne, ne_len);
> + /* The last 2 characters match. If the needle is long, check a
> + fixed number of characters first to quickly filter out mismatches. */
> + if (m1 <= 15 || memcmp (hs + offset, ne + offset, sizeof (long)) == 0)
> + {
> + if (memcmp (hs, ne, m1) == 0)
> + return (void *) hs;
> +
> + /* Adjust filter offset when it doesn't find the mismatch. */
> + offset = (offset >= sizeof (long) ? offset : m1) - sizeof (long);
> + }
> +
> + /* Skip based on matching the last 2 characters. */
> + hs += shift1;
> + }
> }
> #endif /* compilation for speed */
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://sourceware.org/pipermail/newlib/attachments/20190103/b11f063e/attachment.sig>
More information about the Newlib
mailing list