This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ #12100] strstr with unaligned loads.
- From: Rich Felker <dalias at aerifal dot cx>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Mon, 26 Aug 2013 22:20:42 -0400
- Subject: Re: [PATCH][BZ #12100] strstr with unaligned loads.
- Authentication-results: sourceware.org; auth=none
- References: <20130825090758 dot GA23686 at domone dot kolej dot mff dot cuni dot cz> <20130825155406 dot GV20515 at brightrain dot aerifal dot cx> <20130826070925 dot GA4292 at domone dot kolej dot mff dot cuni dot cz> <20130826162446 dot GZ20515 at brightrain dot aerifal dot cx> <20130827020414 dot GA8611 at domone dot kolej dot mff dot cuni dot cz>
On Tue, Aug 27, 2013 at 04:04:14AM +0200, Ondřej Bílka wrote:
> > Note that the above-quoted code is only for the short-needle case
> > without the bad character table, which I also claim should be removed
> > (empirically, glibc is much slower than musl on short needles,
> False; on random testcases sse4.2 often beats musl by a 50% factor. I also
> tested musl on noise and it is 20% slower due to factorization overhead.
I'm assuming use of the two-way strstr code in glibc. The sse4.2 code
winning on random testcases is meaningless, since it has quadratic
performance in the worst case. For strstr, random input is close to the
best case.
> > presumably because musl always uses the bad-character table). There
> > may be additional logic considerations to integrating any optimized
> > "next pair" search with the bad character table; I have not checked
> > this.
> Bad character is also a heuristic that is not needed when the pair
> heuristic is used, as it would only slow it down.
The bad-character table changes the typical runtime from O(n) to O(n/m).
This is HUGE. For example, searching for a 100-byte needle in a
10000-byte haystack can often complete with just 100 comparisons.
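To make the O(n/m) claim concrete, here is a minimal sketch of the bad-character idea in the Horspool style — this is an illustration of the heuristic, not glibc's or musl's actual code, and `horspool` is our own name. When the last byte under the search window does not occur in the needle at all, the window slides forward a full m positions in one step, which is where the n/m typical cost comes from.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the bad-character (Horspool) heuristic.  shift[c] records how
   far the window may slide when byte c sits under the window's last
   position.  A byte absent from the needle allows a full m-byte jump, so
   on typical inputs only about n/m window positions are ever inspected. */
static const char *horspool(const char *hay, size_t n,
                            const char *ndl, size_t m)
{
    size_t shift[256];
    size_t i;

    if (m == 0)
        return hay;
    if (m > n)
        return NULL;

    for (i = 0; i < 256; i++)
        shift[i] = m;                     /* byte not in needle: full skip */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char) ndl[i]] = m - 1 - i;

    for (i = 0; i + m <= n; i += shift[(unsigned char) hay[i + m - 1]])
        if (memcmp(hay + i, ndl, m) == 0)
            return hay + i;
    return NULL;
}
```

For a 100-byte needle whose bytes are rare in the haystack, most iterations advance by the full 100 positions, matching the roughly-100-comparison figure above. Note this sketch keeps the quadratic worst case that the rest of the thread is concerned with; the point under discussion is combining the skip with two-way's linear guarantee.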
> There is a lower bound that you need to inspect each character to check
> if it is zero.
Fair enough, though this does not apply to memmem...
You could also cheat and allow reading past the end of the string up
to the next page boundary, but valgrind users would hate you for it.
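The reason this cheat is safe at the hardware level (even though valgrind flags it) is that a naturally aligned vector load can never straddle a page boundary, since the page size is a power-of-two multiple of the load width. A tiny sketch of that arithmetic, with a hypothetical helper name of our own:

```c
#include <stdint.h>
#include <stddef.h>

/* An aligned W-byte load at address addr stays within one page iff the
   first and last byte of the load fall in the same page.  For addr a
   multiple of width and page a multiple of width, this always holds,
   which is why over-reading up to the page boundary cannot fault. */
static int load_stays_in_page(uintptr_t addr, size_t width, size_t page)
{
    return addr / page == (addr + width - 1) / page;
}
```

So a 16-byte-aligned 16-byte load never crosses into an unmapped page, even when the string's terminating NUL sits in the middle of it; valgrind still reports the bytes past the NUL as an invalid read.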
> Provided that pairs are unlikely I could get within a
> factor of 2 of that lower bound. I traded worse benchmark
> performance here for better real-time performance. For a
> benchmark-oriented version I would look for triples, which
> would cut most of the misprediction.
If you can update two-way for C strings to integrate the pair search
with the end-of-string checking, I think you could have an unbeatable
solution. Right now, glibc just has an unbeatably slow solution, with
fake benchmarks (random input) making it look fast. Next time try
benchmarking searches for a^Mb in a^Nb or something else hard.
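To illustrate the kind of input meant by "a^M b in a^N b": a needle of M 'a's followed by 'b', searched in a haystack of N 'a's followed by 'b'. Every window matches for nearly the full needle length before failing, so a naive scan performs on the order of M*N character compares, while two-way stays linear. A small generator (the helper name is ours, purely for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Build "aaa...ab": count copies of 'a' followed by a single 'b'.
   Searching a_run_b(M) inside a_run_b(N) is a worst case for naive
   and for quadratic SSE scans: the only match is at the very end,
   at offset N - M, and every earlier window almost matches. */
static char *a_run_b(size_t count)
{
    char *s = malloc(count + 2);
    if (s != NULL) {
        memset(s, 'a', count);
        s[count] = 'b';
        s[count + 1] = '\0';
    }
    return s;
}
```

Benchmarking, say, a_run_b(1000) inside a_run_b(1000000) separates linear-time implementations from quadratic ones far more honestly than random input does.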
> > > And most of the time the first character will not match, and in the
> > > cases when it does, the second won't most of the time.
> > The case you have to worry about is when they do match every single
> > time, and whether this increases the runtime from linear to quadratic.
> Please read mails before replying to them; from the original mail:
> > On Sun, Aug 25, 2013 at 11:07:58AM +0200, Ondřej Bílka wrote:
> > > This could be solved by the standard buy-or-rent technique which I
> > > employ. For any position p there can be only p+64n+512 comparisons
> > > until I switch to the two-way algorithm, which gives linear complexity.
That's a reasonable solution, but I think it would be even better to
integrate your SSE tricks with two-way.