This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] Improve performance of memmem
- From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, 'GNU C Library' <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Tue, 11 Jun 2019 09:30:29 +0000
- Subject: Re: [PATCH v2] Improve performance of memmem
- References: <DB5PR08MB1030105EE3706615A33BB02283B80@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB10306332982B5ADC766894BE83920@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB1030235578A697313FCF191F83730@DB5PR08MB1030.eurprd08.prod.outlook.com> <AM6PR08MB5078F033E9076789B9A1D1A383470@AM6PR08MB5078.eurprd08.prod.outlook.com> <AM6PR08MB5078D8B9BFBCA6DCF84D875383280@AM6PR08MB5078.eurprd08.prod.outlook.com> <VI1PR0801MB2127BC5B98C3584BFA3B92E883020@VI1PR0801MB2127.eurprd08.prod.outlook.com> <VI1PR0801MB212778920393C7817D1FE08983130@VI1PR0801MB2127.eurprd08.prod.outlook.com> <VI1PR0801MB21278FF8424901F1FEC9AAFB83130@VI1PR0801MB2127.eurprd08.prod.outlook.com>
On 10/06/2019 19:31, Wilco Dijkstra wrote:
> v2: Update comments after review.
>
> This patch significantly improves performance of memmem using a novel
> modified Horspool algorithm. Needles up to size 256 use a bad-character
> table indexed by hashed pairs of characters to quickly skip past mismatches.
> Long needles use a self-adapting filtering step to avoid comparing the whole
> needle repeatedly.
>
> By limiting the needle length to 256, the shift table only requires 8 bits
> per entry, lowering preprocessing overhead and minimizing cache effects.
> This limit also implies worst-case performance is linear.
>
> Small needles up to size 2 use a dedicated linear search. Very long needles
> use the Two-Way algorithm (to avoid increasing stack size or slowing down
> the common case, inlining is disabled).
>
> The performance gain is 6.6 times on English text on AArch64 using random
> needles with average size 8 (this is even faster than the recently improved strstr
> algorithm, so I'll update that in the near future).
the comment about strstr is no longer relevant.
>
> Tested against GLIBC testsuite and randomized tests. OK for commit?
>
> ChangeLog:
> 2019-06-10 Wilco Dijkstra <wdijkstr@arm.com>
>
> * string/memmem.c (__memmem): Rewrite to improve performance.
>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
i only had one comment below if that's addressed
then i think it's ready to commit.
(but i think you should wait a day in case there
are further comments on this latest version.)
> + for ( ; hs <= end; )
> {
> - haystack = memchr (haystack, *needle, haystack_len);
> - if (!haystack || __builtin_expect (needle_len == 1, 0))
> - return (void *) haystack;
> - haystack_len -= haystack - (const unsigned char *) haystack_start;
> - if (haystack_len < needle_len)
> - return NULL;
> - /* Check whether we have a match. This improves performance since we
> - avoid the initialization overhead of the two-way algorithm. */
> - if (memcmp (haystack, needle, needle_len) == 0)
> - return (void *) haystack;
> - return two_way_short_needle (haystack, haystack_len, needle, needle_len);
> + /* Skip past character pairs not in the needle. */
> + do
> + {
> + hs += m1;
> + tmp = shift[hash2 (hs)];
> + }
> + while (tmp == 0 && hs <= end);
i noticed that the check here is in different
order than in strstr, i wonder if that's deliberate.
if either way is fine i'd prefer to have the same
logic in strstr and memmem.
> +
> + /* If the match is not at the end of the needle, shift to the end
> + and continue until we match the hash of the needle end. */
> + hs -= tmp;
> + if (tmp < m1)
> + continue;
> +
> + /* Hash of the last 2 characters matches. If the needle is long,
> + try to quickly filter out mismatches. */
> + if (m1 < 15 || memcmp (hs + offset, ne + offset, 8) == 0)
> + {
> + if (memcmp (hs, ne, m1) == 0)
> + return (void *) hs;
> +
> + /* Adjust filter offset when it doesn't find the mismatch. */
> + offset = (offset >= 8 ? offset : m1) - 8;
> + }
> +
> + /* Skip based on matching the hash of the needle end. */
> + hs += shift1;
> }