This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH][BZ #12100] strstr with unaligned loads.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 27 Aug 2013 14:16:41 +0200
- Subject: Re: [PATCH][BZ #12100] strstr with unaligned loads.
- Authentication-results: sourceware.org; auth=none
- References: <20130825090758 dot GA23686 at domone dot kolej dot mff dot cuni dot cz> <20130825155406 dot GV20515 at brightrain dot aerifal dot cx> <20130826070925 dot GA4292 at domone dot kolej dot mff dot cuni dot cz> <20130826162446 dot GZ20515 at brightrain dot aerifal dot cx> <20130827020414 dot GA8611 at domone dot kolej dot mff dot cuni dot cz> <20130827022042 dot GH20515 at brightrain dot aerifal dot cx> <20130827083048 dot GA12761 at domone dot kolej dot mff dot cuni dot cz> <20130827084500 dot GL20515 at brightrain dot aerifal dot cx>
On Tue, Aug 27, 2013 at 04:45:00AM -0400, Rich Felker wrote:
> On Tue, Aug 27, 2013 at 10:30:48AM +0200, OndÅej BÃlka wrote:
> > > winning on random testcases is meaningless since it has quadratic
> > > performance in the worst-case.
> > > For strstr random input is always the
> > > easy case.
> > How good is it on binary alphabeth?
> Two-way is very good on binary alphabet, and on 4-letter alphabet
> (e.g. DNA). The bad character table optimization is rather useless on
> small alphabets, of course, but that's not a core part of the
> algorithm, just an optimization that helps achieve sublinear
> performance in many real-world usages (while maintaining worst-case
> linear performance).
> Note that having an optimized search for the next occurrance of the
> first character or two is also rather useless in a binary alphabet...
A binary alphabet tends to be worst case for optimized implementations.
A most trick do not apply there, most shifts tend to be 1 or 2 bytes and
branches tend to be mispredicted. A performance impact changing alphabet from
binary to ternary in two-way is big.
> > > > > presumably because musl always uses the bad-character table). There
> > > > > may be additional logic considerations to integrating any optimized
> > > > > "next pair" search with the bad character table; I have not checked
> > > > > this.
> > > > >
> > > > Bad character is also heuristic that is not needed when pair heuristic
> > > > is used as it would only slow it down.
> > >
> > > Bad character table changes the typical runtime from O(n) to O(n/m).
> > > This is HUGE.
> > You are making basic mistake of arguing that some algorithm is better
> > in best case.
> My above comments were comparing two-way with bad character table to
> plain two-way. Bad character table does not always help (it just helps
> in many typical real-world cases), but it never hurts more than a
> small O(m) (m = needle length) startup cost. In my implementation, I
> combine this with the operation of computing the needle length. Of
> course strlen is a faster O(m) operation by several times, so one
> could argue that the bad character table still "costs too much", but
> empirically I found my version of two-way (in musl) outperforming
> glibc's version of two-way (not the SSE code in glibc) on many
> reasonable inputs where the needle was 16-31 bytes.
> > > > > On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > > > > > This could be solved by standard buy or rent techique which I employ.
> > > > > > For any position p there could be only p+64n+512 comparisons until I
> > > > > > switch to two way algorithm which gives linear complexity.
> > >
> > > That's a reasonable solution, but I think it would be even better to
> > > integrate your SSE tricks with two-way.
> > >
> > Its not worth doing from my perspective as factorization step cost
> > around 20 cycles per needle byte. In that time I could often find three
> > times that there is no match.
> I think we'd need to see more empirical data to determine this. Your
> switchover threshold is not exactly cheap. If the intent is just to
> give good performance for "typical" (e.g. English text) uses of
> strstr, and otherwise only avoid pathologically bad performance that
> could be used for DoS, then your approach may be reasonable. On the
Indent was get best possible performance for typical cases and
reasonable for all cases.
> other hand, if you want to support arbitrary reasonable uses of strstr
> (like DNA searches in an entire genome), it's probably not adequate.
> But that's just my semi-educated guess. A real determination would
> require empirical measurement.
You can not beat specialized algorithm (DNA search) without supplying
contextual information to algorithm. When you decide switch based of
only current run data you will slow common case down.
A profile-guided optimization is possible but it would need add
infrastructure to save context info.