This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Improve strstr performance of short needles


Gabriel F. T. Gomes wrote:
> On Wed, 05 Sep 2018, Wilco Dijkstra wrote:
>
>>Improve strstr performance for the common case of short needles.  For 2-4 
>>characters a small loop is typically fastest.  On large strings the speedup
>>with a needle size of 4 is about 65%.
>
> I ran some tests on a powerpc64le machine and I noticed that, as the
> haystack becomes larger, the search for a needle that is *not* in the
> haystack, became slower with this change.

That seems odd, particularly since the Two-Way algorithm is pretty slow
for small needles (large preprocessing overhead and small skips).

> For instance, when the haystack is 32 chars long, searching for a 4 chars
> long needle that is not in the haystack [1] took ~11% to ~22% longer to
> finish, when compared to the pristine code.  When the needle *is* in the
> haystack, that did not happen and the performance was usually 30% better.
>
> Have you observed similar results in your tests?

I haven't looked in detail at the GLIBC strstr benchmark results. I'm running
tests with natural language texts from 64 bytes to 256KB. Generally speaking
the short needle Two-Way algorithm is terribly slow on both short and long
haystacks. Short needles make it worse since it can't skip much, so there isn't
any point in keeping it (I posted a much faster replacement in newlib).

Wilco

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]