This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Improve strstr performance of short needles
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "Gabriel F. T. Gomes" <gabriel at inconstante dot eti dot br>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Sat, 29 Sep 2018 00:21:04 +0000
- Subject: Re: [PATCH] Improve strstr performance of short needles
- References: <DB5PR08MB1030D79438AF0E17282F3E1783020@DB5PR08MB1030.eurprd08.prod.outlook.com>,<20180928183018.551042ff@tereshkova>
Gabriel F. T. Gomes wrote:
> On Wed, 05 Sep 2018, Wilco Dijkstra wrote:
>
>>Improve strstr performance for the common case of short needles. For 2-4
>>characters a small loop is typically fastest. On large strings the speedup
>>with a needle size of 4 is about 65%.
>
> I ran some tests on a powerpc64le machine and I noticed that, as the
> haystack becomes larger, the search for a needle that is *not* in the
> haystack, became slower with this change.
That seems odd, particularly since the Two-Way algorithm is pretty slow
for small needles (large preprocessing overhead and small skips).
> For instance, when the haystack is 32 chars long, searching for a 4 chars
> long needle that is not in the haystack [1] took ~11% to ~22% longer to
> finish, when compared to the pristine code. When the needle *is* in the
> haystack, that did not happen and the performance was usually 30% better.
>
> Have you observed similar results in your tests?
I haven't looked in detail at the GLIBC strstr benchmark results. I'm running
tests with natural language texts from 64 bytes to 256KB. Generally speaking
the short needle Two-Way algorithm is terribly slow on both short and long
haystacks. Short needles make it worse since it can't skip much, so there isn't
any point in keeping it (I posted a much faster replacement in newlib).
Wilco