This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Potential issue with strstr on x86 with sse4.2 in glibc-2.18
- From: Rich Felker <dalias at aerifal dot cx>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Allan McRae <allan at archlinux dot org>, Alexander Monakov <amonakov at ispras dot ru>, libc-alpha at sourceware dot org
- Date: Tue, 20 Aug 2013 13:57:35 -0400
- Subject: Re: Potential issue with strstr on x86 with sse4.2 in glibc-2.18
- References: <520E181D dot 2040308 at archlinux dot org> <alpine dot LNX dot 2 dot 00 dot 1308191628370 dot 2626 at monopod dot intra dot ispras dot ru> <20130819144648 dot GF20515 at brightrain dot aerifal dot cx> <alpine dot LNX dot 2 dot 00 dot 1308191924490 dot 2626 at monopod dot intra dot ispras dot ru> <5212A278 dot 3090909 at archlinux dot org> <20130819230644 dot GM20515 at brightrain dot aerifal dot cx> <5212E278 dot 4030703 at archlinux dot org> <20130820033430 dot GN20515 at brightrain dot aerifal dot cx> <20130820043956 dot GO20515 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1308201531540 dot 15834 at digraph dot polyomino dot org dot uk>
On Tue, Aug 20, 2013 at 03:56:46PM +0000, Joseph S. Myers wrote:
> On Tue, 20 Aug 2013, Rich Felker wrote:
>
> > This all looks like a big mess, and it's all GCC's fault. With such a
> > nasty incompatible ABI change, they should have added a minimally
> > invasive way to build code that interoperates: not assuming the stack
> > pointer is aligned on entry, but preserving the alignment on calls
> > (i.e. keeping it the same mod 16 as it was on entry) so that both of
> > these cases work:
> >
> > 1. Caller is using old 4-byte alignment.
> >
> > 2. Caller is using 16-byte alignment and needs its callbacks to be
> > called with 16-byte alignment.
>
> The old 4-byte alignment case should only apply to very old binaries, but
> of course an old binary using strstr still ought to work on a new system.

Or a new binary built with gcc 3.4. While compiling glibc with gcc 3.4
is not supported, I don't think it's reasonable to tell people they
can't compile application code with it...

> That is, the intent of the glibc ABI is to support such binaries, and this
> worked as long as x86 didn't use SSE in glibc. In turn, that means that
> any entry point to glibc functions using SSE in a way that requires
> 16-byte alignment needs to do dynamic stack realignment.

Agreed.

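
[Illustrative sketch, not part of the original message.] The "dynamic
stack realignment" discussed above can be requested per function in C
with GCC's x86 function attribute force_align_arg_pointer, which makes
the prologue realign the stack instead of trusting the caller. The
thread does not say which mechanism glibc would use, so the macro
REALIGN_ENTRY and the function entry_with_aligned_scratch below are
hypothetical names chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only x86 GCC-compatible compilers understand this attribute; on other
   targets the macro expands to nothing so the sketch still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define REALIGN_ENTRY __attribute__((force_align_arg_pointer))
#else
# define REALIGN_ENTRY
#endif

/* Hypothetical entry point: it may be called with a stack that is only
   4-byte aligned, yet needs a 16-byte aligned local (as SSE loads and
   stores with alignment requirements would). */
REALIGN_ENTRY
int entry_with_aligned_scratch(const char *s)
{
    _Alignas(16) unsigned char scratch[16] = {0};
    size_t n = strlen(s);

    if (n > sizeof scratch)
        n = sizeof scratch;
    memcpy(scratch, s, n);

    /* Returns 1 when the scratch buffer really is 16-byte aligned. */
    return ((uintptr_t)scratch % 16) == 0;
}
```

Calling entry_with_aligned_scratch("needle") is expected to return 1
regardless of the caller's stack alignment, since the realignment
happens inside the callee.
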
> Given that since 2011 we don't try to build x86 glibc with
> -mpreferred-stack-boundary=2 for most functions to save stack space, I
> think the x86 glibc should already be preserving alignment as you request
> (and the $(uses-callbacks) incompleteness may not matter in practice for
> this issue) - the problem is for old binaries, maybe linked with glibc
> 2.0, doing things that were valid at the time with interfaces that existed
> at the time, although at least since 2006 it's not been considered valid
> to build something with the 4-byte alignment and expect to be able to call
> standard interfaces with that alignment (so maybe the realignment is only
> needed when the user's entry point to glibc is at a 2.4 or older symbol
> version, although I don't know if it's worth adding any new-version entry
> points that bypass the realignment).
>
> (In the case of strstr, bug 12100 for asymptotic slowness of the SSE4.2
> implementation is also still open - though the preference was to use a
> hybrid approach for a fix rather than completely removing the SSE4.2
> version, so I suppose the realignment issue will remain even with a fix
> for that bug.)

I question the reasoning for this. If the "short needle" version of
two-way were removed and the "long needle" version (with bad character
table) always used, I expect it would outperform the SSE code in
almost all cases. SSE is not at all well-suited to strstr since you
have to keep bitshifting and check all alignments. At best, the SSE
code will do one vector comparison per byte of the haystack (up until
the match, if any, is found). Two-way with the bad character table can
do much better, on average inspecting only C*n/m positions for some
constant C (where n is the haystack length up to the first match and
m is the needle length).

Rich
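
[Illustrative sketch, not part of the original message.] To show why a
bad character table lets a search skip most haystack positions, here
is a minimal Horspool-style search in C. It is deliberately simpler
than the two-way algorithm mentioned above and is not glibc's code;
the name bc_search and its signature are hypothetical. Unlike two-way,
this simplified variant does not have a linear worst-case bound; it
only illustrates the skipping behaviour behind the roughly n/m average.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find needle (length m) in hay (length n) using a
   bad character shift table. Returns a pointer to the first match or
   NULL if there is none. */
static const char *bc_search(const char *hay, size_t n,
                             const char *needle, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (m == 0)
        return hay;
    if (n < m)
        return NULL;

    /* A byte that does not occur in the needle (ignoring its last
       position) allows a jump of the full needle length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    /* Otherwise jump so that the byte's last occurrence in the needle
       lines up with the haystack byte just inspected. */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char)needle[i]] = m - 1 - i;

    /* Look at the haystack byte under the needle's last position, test
       the window, then advance by that byte's shift (always >= 1). */
    for (i = 0; i + m <= n; i += shift[(unsigned char)hay[i + m - 1]]) {
        if (memcmp(hay + i, needle, m) == 0)
            return hay + i;
    }
    return NULL;
}
```

For the haystack "the quick brown fox" and the needle "brown", the
loop tests only the windows starting at offsets 0, 5 and 10 before
matching, rather than all eleven leading positions.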