This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Improve strstr performance of short needles
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "Gabriel F. T. Gomes" <gabriel at inconstante dot eti dot br>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Mon, 1 Oct 2018 17:05:56 +0000
- Subject: Re: [PATCH] Improve strstr performance of short needles
- References: <DB5PR08MB1030D79438AF0E17282F3E1783020@DB5PR08MB1030.eurprd08.prod.outlook.com>,<20180928183018.551042ff@tereshkova>,<DB5PR08MB1030F8FBED4FB20D34E2A1CE83EC0@DB5PR08MB1030.eurprd08.prod.outlook.com>
>> I ran some tests on a powerpc64le machine and I noticed that, as the
>> haystack becomes larger, the search for a needle that is *not* in the
>> haystack, became slower with this change.
> That seems odd, particularly since the Two-Way algorithm is pretty slow
> for small needles (large preprocessing overhead and small skips).
I looked at bench-strstr again. There are cases where Two-Way wins by
10-15% when the string is found but for short strings the linear loop wins.
On average I can't see any performance difference.
However when iterating through 16K needles randomly chosen from natural
text I get the following speedups for different haystack and needle sizes:
H64 H256 H4096
N2: 64% 73% 46%
N3: 74% 66% 37%
N4: 94% 92% 66%
As you can see the speedup is larger for small needles since the preprocessing
overhead of the Two-Way algorithm is significant. This is completely hidden by
the GLIBC test due to using just 1 needle.