This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Optimize strstr, strcasestr and memmem

From: "Carlos O'Donell" <carlos at systemhalted dot org>
To: Ondřej Bílka <neleai at seznam dot cz>
Cc: Maxim Kuvyrkov <maxim at codesourcery dot com>, "Joseph S.Myers" <joseph at codesourcery dot com>, libc-alpha at sourceware dot org
Date: Fri, 18 May 2012 10:57:56 -0400
Subject: Re: [PATCH] Optimize strstr, strcasestr and memmem
References: <2C516CF2-D083-4C1D-AD27-6A31D381D548@codesourcery.com><Pine.LNX.4.64.1205172218020.28988@digraph.polyomino.org.uk><7416069C-E1FD-4F64-81DD-F09C726E63A0@codesourcery.com><20120518061821.GA2911@domone.kolej.mff.cuni.cz>

On Fri, May 18, 2012 at 2:18 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Fri, May 18, 2012 at 02:25:47PM +1200, Maxim Kuvyrkov wrote:
>> On 18/05/2012, at 10:20 AM, Joseph S. Myers wrote:
>>
>> > Anyone interested in performance of these functions may also be interested
>> > in bug 12100, where the SSE4 version of strstr reintroduces the unwanted
>> > quadratic asymptotic performance. (This is not a comment on your patch
>> > itself, but a mention of something people looking at this area might also
>> > be interested in.)
>>
>> Thanks for the pointer. For avoidance of doubt, I've benchmarked the patch on a Core2 machine without SSE4.
>>
>> I will benchmark the SSE4 implementation against the normal + this patch on short needles. The benchmark that motivated this patch is libosip message parsing, which heavily uses string functions with small strings.
>>
>
> I posted a strstr implementation to this list and got no response.
>
> I use only SSE2 instructions, and my algorithm is faster than the one using SSE4 instructions.
> With trivial modifications I could also use plain arithmetic or AVX2.
>
> The trick is to check the first two characters or the zero terminator in parallel.
> This is about 25 times faster than the glibc one on Core2; on i7 it is only 3
> times faster.
>
> I use this trick in my extension of a regular expression engine; a
> partially autogenerated strstr follows.
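
For readers who have not seen the trick quoted above, here is a minimal
sketch of how such a check might look. It is not the code Ondřej posted.
The names find_pair and sketch_strstr are hypothetical, the SSE2 path
assumes GCC or Clang (__builtin_ctz) and relies on the usual libc-style
assumption that an aligned 16-byte load may read past the terminator
without faulting, and a scalar fallback is used when SSE2 is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical helper: return the first position i with s[i] == c0 and
   s[i+1] == c1, or NULL if the terminator comes first.
   Requires c0 != 0 and c1 != 0. */
static const char *find_pair(const char *s, char c0, char c1)
{
    const __m128i v0 = _mm_set1_epi8(c0);
    const __m128i v1 = _mm_set1_epi8(c1);
    const __m128i vz = _mm_setzero_si128();
    unsigned off = (unsigned)((uintptr_t)s & 15);
    const char *p = s - off;                     /* round down to 16 bytes */
    unsigned keep = (0xFFFFu << off) & 0xFFFFu;  /* ignore bytes before s */
    unsigned carry = 0;      /* previous block ended with c0, no NUL seen */

    for (;;) {
        __m128i blk = _mm_load_si128((const __m128i *)p);
        /* One bit per byte: equals c0, equals c1, equals NUL. */
        unsigned m0 = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(blk, v0)) & keep;
        unsigned m1 = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(blk, v1)) & keep;
        unsigned z  = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(blk, vz)) & keep;

        /* Pair straddling the block boundary. */
        if (carry && (m1 & 1u))
            return p - 1;

        /* Bit i is set when byte i is c0 and byte i+1 is c1. */
        unsigned hit = m0 & (m1 >> 1);
        if (z)
            hit &= (z & (0u - z)) - 1u;  /* drop hits after the first NUL */
        if (hit)
            return p + __builtin_ctz(hit);
        if (z)
            return NULL;

        carry = (m0 >> 15) & 1u;
        keep = 0xFFFFu;
        p += 16;
    }
}
#else
/* Scalar fallback with the same contract. */
static const char *find_pair(const char *s, char c0, char c1)
{
    for (; *s != '\0'; s++)
        if (s[0] == c0 && s[1] == c1)
            return s;
    return NULL;
}
#endif

/* Hypothetical strstr-like wrapper: locate candidates with find_pair,
   then verify the rest of the needle with strncmp. */
static const char *sketch_strstr(const char *hay, const char *needle)
{
    if (needle[0] == '\0')
        return hay;
    if (needle[1] == '\0')
        return strchr(hay, needle[0]);

    size_t n = strlen(needle);
    for (const char *p = hay;
         (p = find_pair(p, needle[0], needle[1])) != NULL; p++)
        if (strncmp(p + 2, needle + 2, n - 2) == 0)
            return p;
    return NULL;
}
```

Note that the verification step is a plain strncmp, so the worst case of
this sketch is still quadratic. It only illustrates the candidate filter
and is not a substitute for an algorithm with linear worst-case behaviour.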

Ondrej, Maxim,

The biggest problem I have with reviewing *any* of this code is that
performance is relative to benchmarks.

As a community we have no baseline benchmark numbers.

I would like to see someone from the community come forward to help
sort this out.

I think at a minimum we need:

* A list of baseline benchmarks that we will use to evaluate
performance-related submissions.
* Collected data from the benchmark runs.
* A wiki page that records everything, including how to run the
benchmarks and the growing collection of results.

Is either of you interested in helping the community develop some
metrics for acceptance of performance-related patches?

An initial solution to this problem need not be that much work.

Ondrej,

Do you refer to: http://sourceware.org/ml/libc-help/2011-11/msg00011.html?

You posted some interesting benchmark code there that we could add to
glibc as part of a "benchmark" target to run after making changes to
performance-critical routines.
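
To make the idea concrete, here is a rough sketch of the kind of timing
loop such a target might run. It is not the code from the linked post;
the name bench_ns_per_call and its parameters are made up for
illustration, and it assumes a POSIX system with clock_gettime and
CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by strstr and any candidate replacement. */
typedef char *(*strstr_fn)(const char *, const char *);

/* Hypothetical helper: average wall-clock nanoseconds per call of fn
   on one (haystack, needle) pair, measured over iters calls. */
static double bench_ns_per_call(strstr_fn fn, const char *hay,
                                const char *needle, long iters)
{
    struct timespec t0, t1;
    strstr_fn volatile f = fn;  /* keep the call from being folded away */
    volatile size_t sink = 0;   /* keep the result live */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += (f(hay, needle) != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

A run might call bench_ns_per_call(strstr, haystack, needle, 1000000)
for a fixed set of inputs, for example short needles like the ones Maxim
mentions, and record the numbers alongside the machine description so
that results from different submissions can be compared.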

Does what I'm saying make sense?

Cheers,
Carlos.

Follow-Ups:
- Re: [PATCH] Optimize strstr, strcasestr and memmem
  - From: Eric Blake
- Re: [PATCH] Optimize strstr, strcasestr and memmem
  - From: Maxim Kuvyrkov

References:
- [PATCH] Optimize strstr, strcasestr and memmem
  - From: Maxim Kuvyrkov
- Re: [PATCH] Optimize strstr, strcasestr and memmem
  - From: Joseph S. Myers
- Re: [PATCH] Optimize strstr, strcasestr and memmem
  - From: Maxim Kuvyrkov
- Re: [PATCH] Optimize strstr, strcasestr and memmem
  - From: Ondřej Bílka
