This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Use Quicksearch in strstr


Hi,

Ondřej Bílka wrote:
> Real benefit is fixing idiocy of calculating needle size before calling
> strchr, code after that tries to make as this make cold path faster.

Doing a single strchr early helps indeed if the first char of needle occurs
infrequently or it finds the match. But I don't see it improving performance much.

> Better would be to call strchr again and if it still doesn't match for third time then
> code after that shouldn't matter as long as its linear.

Repeated strchr/memchr seems to be very bad for performance on characters 
that occur frequently.

> If you capture inputs of actual applications you will find what are bottlenecks.

Yes but we'd need applications which use strstr a lot.

> As for real inputs record someing with this
> http://kam.mff.cuni.cz/~ondra/dryrun.tar.bz2

Thanks, I'll have a look.

> For arm and anything with assembly strchr you would gain more by similar trick as I did for x64,
> for hot path its easy modification of strchr where you instead of one
> character check first two characters where convient(function leaky_digraph), sometimes you check
> only first one rather than trying to include character with unsuitable
> alignment. Note that unaligned loads by one character more could be
> faster than trying to shift vectors.

Sure an assembly implementation could easily search for 2 characters at a time.
But right now I'm improving the generic C version first. It's not even using unaligned
accesses yet - and those can give significant gains both in brute force search
and smarter algorithms like Quicksearch.

logic is following:
strstr(s,n) {
if (!n[0]) return s;
if (!n[1]) return strchr (s, n); // corner case, one-char needles are rare
   s = leaky_digraph(s, n[0], n[1]);
   if (!s)
     return NULL;
   for (i = 1; n[i] && s[i] == n[i]; i++);
   if (!n[i]) return s;
   return strstr_two_way(s,n);
}

Yes it would be feasible to add a digraph test like that to replace the initial strchr check.
Do you have an idea what percentage of calls it filters compared to plain strchr?

Wilco
    

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]