This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Use Quicksearch in strstr
On Fri, Nov 09, 2018 at 12:46:54PM +0000, Wilco Dijkstra wrote:
> Rich Felker wrote:
> > On further reading, it looks like the whole "performance problem" is
> > probably lack of a machine-optimized comparison loop for the two
> > (forward and backwards) comparisons in two-way. If we had "rep cmp"
> > functions for each direction (optimized to use specialized insns or
> > vector ops, possibly misaligned if allowed by the ISA, taking into
> > account that we know the needle length and a lower bound on the
> > haystack length so that there's no overread issue) returning the first
> > mismatching offset, I see no reason why there should be any cases that
> > perform worse than "quicksearch", and pretty much all cases should
> > speed up a lot vs the status quo.
> > Ideally the "rep cmp" functions should be inline containing inline asm
> > fragments so they can be inlined right into the loop without any call
> > overhead, but this probably only matters for short needles.
> This won't help at all. The character matching loops don't show up in profiles.
> Almost all time is spent in finding the first character, ie. strchr (short needles)
> and the Horsepool loop (long needles). Using Quicksearch to find a prefix of
> the suffix string should be faster.
Which profiles? The common cases (where significant-length prefix
matches are usually rare) or the worst cases? For needles that factor
poorly (long right factor), the latter breaks down to essentially
repeated disjoint memcmp's of the right factor against the whole
haystack, in successive order, starting over at the beginning of the
needle's right factor whenever there's a mismatch. That sounds highly
"memcmp-bound" to me. If worst cases are something you care about
improving vs the status quo, this is worth pursuing, but if not, okay.
I agree that for common cases there are potentially much better
improvements to be made elsewhere. strchr is of course a big help when
the character to be searched is rare but useless (and pessimizes from
call/setup overhead unless you can inline it) if it's frequent. It's
probably better to search for the first up-to-N characters of the
right factor, where N is a word/vector size, rotating bytes of the
haystack in/out of a running word/vector for comparison (musl's strstr
does this for needle sizes up to 4 already, and some data has shown
it's usually better than two-way a bit further on 64-bit systems).
Performing the critical factorization via the maximal suffix process
described in the two-way paper is also suboptimal, and dependent on
the actual order of the alphabet. For example hello factors as
[hell][o] but hellk factors as [he][llk] despite them being
structurally isomorphic under a renaming of the alphabet. In some
cases, optimal factorizations can be determined by short-circuiting
past the MS process. For example if the loop setting up the bad char
table determines that the final character of the needle is the only
occurrance of that character, [l-1] is a critical factorization and
p=l. There are larger classes of this type of shortcut but I'm not
sure what's efficient/reasonable-tradeoff to do. Searching for a short
right factor is appealing though because it means you get to skip by
the whole period as soon as there's any mismatch on the left factor.
It might be possible to choose (at least heuristically) some
permutation of the alphabet (probably anything more than a rotation of
it is not practical, though) that optimizes/improves the MS
factorization results though. For example, in the "hellk" example,
rotating the alphabet so that 'k' is the first or last character would
make it factor as [hell][k] rather than [he][llk].