This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Use Quicksearch in strstr
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: 'GNU C Library' <libc-alpha at sourceware dot org>, nd <nd at arm dot com>, Alexander Monakov <amonakov at ispras dot ru>
- Date: Mon, 12 Nov 2018 14:03:52 +0000
- Subject: Re: [PATCH] Use Quicksearch in strstr
- References: <DB5PR08MB1030E559B3818F52CA48A63583F30@DB5PR08MB1030.eurprd08.prod.outlook.com>,<20181029225421.GA21482@domone>,<DB5PR08MB1030CE58A51EDD8FE90D80DD83CD0@DB5PR08MB1030.eurprd08.prod.outlook.com>,<DB5PR08MB1030428E777D456220B6F09383CB0@DB5PR08MB1030.eurprd08.prod.outlook.com>,<DB5PR08MB1030F3640B0AA19923C5F0E583CB0@DB5PR08MB1030.eurprd08.prod.outlook.com>,<DB5PR08MB1030EE62F9542302047E379D83C40@DB5PR08MB1030.eurprd08.prod.outlook.com>,<DB5PR08MB1030AB09690D51BCA4C2ED5883C50@DB5PR08MB1030.eurprd08.prod.outlook.com>,<DB5PR08MB103098CFBADFFD62E9DB97FD83C60@DB5PR08MB1030.eurprd08.prod.outlook.com>
Rich Felker wrote:
> Which profiles? The common cases (where significant-length prefix
> matches are usually rare) or the worst cases?
Both - the profiles vary between the different code paths, but I don't see
long matches for good or bad cases. I made the slow cases significantly
slower by using a haystack of random 'a' and 'b' chars. This causes lots of
branch mispredictions by alternating between the slow paths that search
for the first character.
> For needles that factor
> poorly (long right factor), the latter breaks down to essentially
> repeated disjoint memcmp's of the right factor against the whole
> haystack, in successive order, starting over at the beginning of the
> needle's right factor whenever there's a mismatch. That sounds highly
> "memcmp-bound" to me. If worst cases are something you care about
> improving vs the status quo, this is worth pursuing, but if not, okay.
Eventhough it does only 1 character per iteration, finding the first matching
character is more complex and slower, so I haven't seen a case where it is
> I agree that for common cases there are potentially much better
> improvements to be made elsewhere. strchr is of course a big help when
> the character to be searched is rare but useless (and pessimizes from
> call/setup overhead unless you can inline it) if it's frequent. It's
> probably better to search for the first up-to-N characters of the
> right factor, where N is a word/vector size, rotating bytes of the
> haystack in/out of a running word/vector for comparison (musl's strstr
> does this for needle sizes up to 4 already, and some data has shown
> it's usually better than two-way a bit further on 64-bit systems).
Yes searching for a fixed size substring would work better indeed since
you get far fewer false matches than when searching for a single char.
I don't think brute force word-based matching is fast enough. I tried a
variant using unaligned accesses which are much faster. However
Quicksearch still beats it. The key is you can use any fast algorithm given
the substring would be fixed-size, so linear time is guaranteed.
> Performing the critical factorization via the maximal suffix process
> described in the two-way paper is also suboptimal, and dependent on
> the actual order of the alphabet. For example hello factors as
> [hell][o] but hellk factors as [he][llk] despite them being
> structurally isomorphic under a renaming of the alphabet. In some
> cases, optimal factorizations can be determined by short-circuiting
> past the MS process. For example if the loop setting up the bad char
> table determines that the final character of the needle is the only
> occurrance of that character, [l-1] is a critical factorization and
> p=l. There are larger classes of this type of shortcut but I'm not
> sure what's efficient/reasonable-tradeoff to do.
Indeed, the critical factorization doesn't appear optimal but I haven't
seen any research that tries to improve it. There must be better ways
of doing it since all you want is a short fixed-size substring with the
right properties. However finding an algorithm that does this efficiently
might not be trivial.
> Searching for a short
> right factor is appealing though because it means you get to skip by
> the whole period as soon as there's any mismatch on the left factor.
Indeed. And apart from highly repetitive needles it would allow for a
simple memcmp for matching the needle.