This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Proposal to handle __strstr_sse42 and friends issue on x86
- From: Nix <nix at esperi dot org dot uk>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Allan McRae <allan at archlinux dot org>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Thu, 09 Jan 2014 23:58:56 +0000
- Subject: Re: Proposal to handle __strstr_sse42 and friends issue on x86
- Authentication-results: sourceware.org; auth=none
- References: <52A7B7E5 dot 6020607 at archlinux dot org> <20131214191819 dot GA24565 at domone dot podge>
On 14 Dec 2013, Ondřej Bílka said:
> On Wed, Dec 11, 2013 at 10:55:01AM +1000, Allan McRae wrote:
>> would likely remove any advantage of the sse42 routine (not tested...),
>> and there are proposals to remove the sse42 routines for both x86 and
>> x86_64 due to quadratic complexity anyway [3,4].
Please. Half the GNU tools end up replacing strstr() with gnulib's
replacement strstr anyway because of this. And they're right to do so.
> sse42 routines are quite ineffective in that regard, with plain sse2 you
> can get around five times faster. I planned to add a version that avoids
> unaligned loads for older processors.
I'd say it's not worth bothering with any of this unless it implements
the same algorithm as the C strstr(), rather than implementing something
with quadratic slowdown in really fast assembler. It doesn't matter if
strstr() is an imperceptible little bit faster on tiny needle / haystack
combinations if it slows down quadratically on the big ones where its
performance hit is in any case most noticeable. (Do we even know the
distribution of needle / haystack sizes on real systems? A preloaded
wrapper could tell us...)
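
A minimal sketch of such a preloaded wrapper follows, assuming glibc on
Linux with GCC or Clang. Nothing in it comes from the thread: the file
name, the strstr_stats_* names, the STRSTR_PRELOAD macro and the log2
bucketing are all hypothetical choices for illustration. It records
needle and haystack lengths in power-of-two buckets, forwards to the real
strstr() found with dlsym(RTLD_NEXT, ...), and dumps the histogram to
stderr at exit.

```c
/* strstr_stats.c: sketch of an LD_PRELOAD size histogram for strstr().
 *
 * Assumed build and use (glibc/Linux; -ldl only matters on older glibc):
 *   cc -O2 -fPIC -shared -D_GNU_SOURCE -DSTRSTR_PRELOAD \
 *      strstr_stats.c -o strstr_stats.so -ldl
 *   LD_PRELOAD=./strstr_stats.so some_program
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STATS_BUCKETS 32

/* Bucket 0 counts empty strings; bucket b counts lengths in
 * [2^(b-1), 2^b); the last bucket also absorbs anything longer. */
unsigned long strstr_stats_haystack[STATS_BUCKETS];
unsigned long strstr_stats_needle[STATS_BUCKETS];

unsigned strstr_stats_bucket(size_t len)
{
    unsigned b = 0;
    while (len != 0 && b < STATS_BUCKETS - 1) {
        len >>= 1;
        b++;
    }
    return b;
}

/* Not thread-safe: concurrent callers may lose counts. */
void strstr_stats_record(size_t haystack_len, size_t needle_len)
{
    strstr_stats_haystack[strstr_stats_bucket(haystack_len)]++;
    strstr_stats_needle[strstr_stats_bucket(needle_len)]++;
}

void strstr_stats_dump(FILE *out)
{
    for (unsigned b = 0; b < STATS_BUCKETS; b++)
        if (strstr_stats_haystack[b] || strstr_stats_needle[b])
            fprintf(out, "bucket %u: haystack %lu, needle %lu\n", b,
                    strstr_stats_haystack[b], strstr_stats_needle[b]);
}

#ifdef STRSTR_PRELOAD
#include <dlfcn.h>

/* Interposed strstr(): record both lengths, then call the next
 * definition in symbol lookup order (normally libc's). */
char *strstr(const char *haystack, const char *needle)
{
    static char *(*real_strstr)(const char *, const char *);

    if (!real_strstr)
        real_strstr = (char *(*)(const char *, const char *))
            dlsym(RTLD_NEXT, "strstr");
    strstr_stats_record(strlen(haystack), strlen(needle));
    return real_strstr(haystack, needle);
}

/* Print the histogram when the process exits normally. */
__attribute__((destructor))
static void strstr_stats_at_exit(void)
{
    strstr_stats_dump(stderr);
}
#endif /* STRSTR_PRELOAD */
```

The two strlen() calls make every strstr() call slower, so this is only
useful for collecting the size distribution, not for timing. Calls that
the compiler folds at build time, or that libc makes internally, never
reach the wrapper.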
> You can also use this one; you would just improve performance 15 times instead
> of 30 if you expanded unaligned loads into aligned ones.
A 15-fold improvement is peanuts compared to the speedups you get from a
better algorithm -- and the generic code has a better algorithm than the
SSE4.2 code.
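
To put a number on why a constant-factor speedup cannot rescue a
quadratic algorithm, here is a small sketch that counts character
comparisons for plain naive matching on a bad input. This is naive
matching chosen for illustration, not the SSE4.2 routine, and the
function names are hypothetical. The input is a haystack of n 'a's and a
needle of m-1 'a's followed by 'b', which never matches.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Naive substring search that returns the number of character
 * comparisons performed instead of the match position. */
size_t naive_comparisons(const char *h, size_t n, const char *p, size_t m)
{
    size_t count = 0;

    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            count++;
            if (h[i + j] != p[j])
                break;
            j++;
        }
        if (j == m)
            break;              /* full match found, stop searching */
    }
    return count;
}

/* Haystack: n times 'a'. Needle: (m - 1) times 'a', then 'b'.
 * Returns 0 if m is 0 or allocation fails. */
size_t worst_case_comparisons(size_t n, size_t m)
{
    size_t count = 0;
    char *h, *p;

    if (m == 0)
        return 0;
    h = malloc(n);
    p = malloc(m);
    if (h && p) {
        memset(h, 'a', n);
        memset(p, 'a', m);
        p[m - 1] = 'b';
        count = naive_comparisons(h, n, p, m);
    }
    free(h);
    free(p);
    return count;
}
```

On this input every one of the n-m+1 start positions costs m comparisons,
so worst_case_comparisons(2000, 1000) is 1001 * 1000 = 1001000 and
worst_case_comparisons(4000, 2000) is 2001 * 2000 = 4002000. Doubling
both sizes roughly quadruples the work, and that factor keeps growing
with input size, whereas a linear-time algorithm only doubles.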