This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Optimize strstr, strcasestr and memmem
On Fri, May 18, 2012 at 2:18 AM, OndÅej BÃlka <email@example.com> wrote:
> On Fri, May 18, 2012 at 02:25:47PM +1200, Maxim Kuvyrkov wrote:
>> On 18/05/2012, at 10:20 AM, Joseph S. Myers wrote:
>> > Anyone interested in performance of these functions may also be interested
>> > in bug 12100, where the SSE4 version of strstr reintroduces the unwanted
>> > quadratic asymptotic performance. Â(This is not a comment on your patch
>> > itself, but a mention of something people looking at this area might also
>> > be interested in.)
>> Thanks for the pointer. ÂFor avoidance of doubt, I've benchmarked the patch on a Core2 machine without SSE4.
>> I will benchmark the SSE4 implementation against the normal + this patch on short needles. ÂThe benchmark that motivated this patch is libosip message parsing, which heavily uses string functions with small strings.
> I posted strstr implementation at this list and got no response.
> I use only SSE2 instructions and my algorithm is faster than using SSE4 instructions.
> With trivial modifications I could also use plain arithmetic/AVX2
> A trick is to check first two characters or zero terminator in parallel.
> This is about 25 times faster on core2 vs glibc one, on i7 its only 3
> times faster.
> I use these trick for my extension of regular expression engine a
> partialy autogenerated strstr follows.
The biggest problem I have with reviewing *any* of this code is that
performance is relative to benchmarks.
As a community we have no baseline benchmark numbers.
I would like to see someone from the community come forward to help
sort this out.
I think at a minimum we need:
* A list of baseline benchmarks we are going to use to evaluate
performance related submissions.
* Collect data on the runs.
* Record everything in the wiki including how to run the benchmarks
and the collection of growing results.
Are either of you interested in helping the community develop some
metrics for acceptance of performance related patches?
An initial solution to this problem need not be that much work.
Do you refer to: http://sourceware.org/ml/libc-help/2011-11/msg00011.html?
You posted some interested benchmark code there that we could add to
glibc as part of a "benchmark" target to run after making changes to
performance critical routines.
Does what I'm saying make sense?