This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Make strcmp optimized for SSE2 with unaligned load/store as default

From: Ondřej Bílka <neleai at seznam dot cz>
To: "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: Roland McGrath <roland at hack dot frob dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Wed, 26 Aug 2015 00:52:49 +0200
Subject: Re: [PATCH] Make strcmp optimized for SSE2 with unaligned load/store as default
Authentication-results: sourceware.org; auth=none
References: <CAMe9rOqgek-2_GfDeFYdHpHvvt0RodUCOBrb-0nKMiCEd62ycg at mail dot gmail dot com> <20150825185920 dot 2DF5A2C3A73 at topped-with-meat dot com> <CAMe9rOqoL2FEaYWK1rTzi6F8B+xQhF_0QCsVkt3QMYA6cbpA9g at mail dot gmail dot com>

On Tue, Aug 25, 2015 at 02:25:12PM -0700, H.J. Lu wrote:
> On Tue, Aug 25, 2015 at 11:59 AM, Roland McGrath <roland@hack.frob.com> wrote:
> > Making ld.so use the same code as libc should be a separate change from
> > anything changing anything about the strcmp implementation itself.  The
> > latter is billed as a performance improvement and so needs a report about
> > the benchmarks or other performance analysis that justify it.
> 
> Here is a patch to make strcmp_sse2_unaligned the default.  The actual
> change isn't very big:
> 
>  rename sysdeps/x86_64/{strcmp.S => strcmp-sse2-aligned.S} (99%)
>  rename sysdeps/x86_64/{multiarch/strcmp-sse2-unaligned.S => strcmp.S} (98%)
> 
> Although I don't have any benchmarks nor performance analysis, given
> that unaligned load/store is fast on recent Intel and AMD processors,
> it should be the default.
> 
> OK for master?
>
I had a mail in progress, but I will reply here instead.

You do not need a benchmark to show again that they are better: if they
were slower on a processor that enables them, that would be a generic
regression that should be fixed.

By the way, when I read which processors these are, AMD does not seem to
be handled well, as I don't see it enabled there.

So the question we need to ask in the first place is whether we care more
about old machines or new ones.

As I wrote before when I improved strcmp, this is more a theoretical worst
case than a practical problem, as in the profile it is around 10% faster
everywhere.

The case that matters is short strings, where strcmp_sse2 and ssse3 have
very high overhead.

A problem is that on older machines aligned loads with shifts/ssse3 are
faster for larger sizes; on Core 2 I see that the improved sse2_unaligned
version is faster for sizes up to 128 bytes.
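
To make the trade-off concrete, here is a minimal C sketch of what an
unaligned-load comparison loop looks like. It is not the glibc assembly:
the function name sketch_strcmp_unaligned is hypothetical, the 4 KiB page
size is an assumption, __builtin_ctz is a GCC/Clang builtin, and the code
relies on the usual trick of reading past the terminator as long as the
load stays inside the current page.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical illustration only, not the glibc implementation. */
static int sketch_strcmp_unaligned(const char *a, const char *b)
{
    const __m128i zero = _mm_setzero_si128();

    for (;;) {
        /* A 16-byte load starting at page offset p touches p..p+15, so it
           stays inside an (assumed) 4 KiB page only when p <= 4096 - 16. */
        int a_near_end = ((uintptr_t)a & 4095) > 4096 - 16;
        int b_near_end = ((uintptr_t)b & 4095) > 4096 - 16;

        if (a_near_end || b_near_end) {
            /* Too close to a page boundary: compare one byte and retry. */
            unsigned char ca = (unsigned char)*a;
            unsigned char cb = (unsigned char)*b;
            if (ca != cb || ca == 0)
                return ca - cb;
            a++;
            b++;
            continue;
        }

        /* Unaligned loads: no alignment prologue and no shift fixups. */
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);

        unsigned eq  = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        unsigned nul = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, zero));

        /* Bit i is set where byte i differs or where a[i] is the NUL. */
        unsigned mask = (~eq | nul) & 0xFFFFu;
        if (mask) {
            int i = __builtin_ctz(mask);  /* first interesting byte */
            return (unsigned char)a[i] - (unsigned char)b[i];
        }

        a += 16;
        b += 16;
    }
}
```

For a short string the first iteration usually finds the answer with two
loads and two compares, which is where the low overhead comes from. An
aligned variant would instead align one pointer first and use shifts or
ssse3 to realign the other operand, which costs more up front but can pay
off for long inputs on older cores.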

I had additional patches that fixed that by using the same header but also
ssse3/sse2 shifts for the larger case, but I gave up on that patch after
several pings.

As for the benchmark, it is the same one that I used originally; here are
the graphs. An sse2 version should be there shortly, as I omitted it
previously because it was too slow for most machines.

http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html

References:
- [PATCH] Make strcmp optimized for SSE2 with unaligned load/store as default
  - From: H.J. Lu
- Re: [PATCH] Make strcmp optimized for SSE2 with unaligned load/store as default
  - From: Roland McGrath
- Re: [PATCH] Make strcmp optimized for SSE2 with unaligned load/store as default
  - From: H.J. Lu
