This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Use Quicksearch in strstr
Ondřej Bílka wrote:
> Real benefit is fixing idiocy of calculating needle size before calling
> strchr, code after that tries to make as this make cold path faster.
Doing a single strchr early helps indeed if the first char of needle occurs
infrequently or it finds the match. But I don't see it improving performance much.
> Better would be to call strchr again and if it still doesn't match for third time then
> code after that shouldn't matter as long as its linear.
Repeated strchr/memchr seems to be very bad for performance on characters
that occur frequently.
> If you capture inputs of actual applications you will find what are bottlenecks.
Yes but we'd need applications which use strstr a lot.
> As for real inputs record someing with this
Thanks, I'll have a look.
> For arm and anything with assembly strchr you would gain more by similar trick as I did for x64,
> for hot path its easy modification of strchr where you instead of one
> character check first two characters where convient(function leaky_digraph), sometimes you check
> only first one rather than trying to include character with unsuitable
> alignment. Note that unaligned loads by one character more could be
> faster than trying to shift vectors.
Sure an assembly implementation could easily search for 2 characters at a time.
But right now I'm improving the generic C version first. It's not even using unaligned
accesses yet - and those can give significant gains both in brute force search
and smarter algorithms like Quicksearch.
logic is following:
if (!n) return s;
if (!n) return strchr (s, n); // corner case, one-char needles are rare
s = leaky_digraph(s, n, n);
for (i = 1; n[i] && s[i] == n[i]; i++);
if (!n[i]) return s;
Yes it would be feasible to add a digraph test like that to replace the initial strchr check.
Do you have an idea what percentage of calls it filters compared to plain strchr?