This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH v2] Improve performance of strstr
On Mon, Apr 15, 2019 at 04:15:02PM -0400, Zack Weinberg wrote:
> On Mon, Apr 15, 2019 at 2:02 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > Hi Rich,
> > >> Yes, without a reproducible example I can't see what your issue is. You
> > >> can't make it go quadratic because it simply isn't.
> > >
> > > Obviously it's not unbounded because you have a (very large) bound on
> > > the size, 256. I can make it do a 256-byte strcmp for nearly every
> > > byte of the input haystack. Maybe because of vectorization on some
> > > targets that's only 16x slower than the current code rather than 256x
> > > slower, but it's still a lot slower.
> > No you can't. It's impossible to make it do a full 256 byte memcmp every
> > character. And bad cases are not bad because of the time spent comparing
> > strings - they are bad because of mispredicted branches. So it's not possible
> > to compare bad cases without benchmarking actual examples on modern
> > CPUs.
> This discussion has been going in circles for quite some time now.
> Wilco, Rich, I think it would help a lot if you could BOTH write down
> some example needle and haystack pairs that you believe will
> demonstrate significantly improved performance with your preferred
> algorithm, and/or pathologically slow performance with your
> dispreferred algorithm. Even without specific numbers, that will give
> everyone something concrete to argue over, at least.
That would be easier if he didn't keep adding random heuristic
mitigations to hide the worst cases. Writing up the specific worst
case now requires reverse engineering the hash function for collisions
and figuring out the sliding offset that gets checked first.
The current code does not even look *correct*, much less fast. The
+ /* The last 2 characters match. If the needle is long, check a
occurs at a point where no comparison on the last 2 characters has
even been made; only the hash has been examined. I'm not sure if there
are collisions possible which would make it return false positives
(it's constrained somewhat by the second-to-last byte being compared
in the final memcmp), but it's very concerning.
I'll try to reverse-engineer the offset stuff later to produce a real
worst-case, but this whole exercise is really frustrating. I've
already been nerd-sniped into spending an inordinate amount of time
researching possible improvements to twoway, which should be the job
of someone proposing the changes, not someone objecting to bogus