This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Optimize strstr, strcasestr and memmem
On 19/05/2012, at 2:57 AM, Carlos O'Donell wrote:
> On Fri, May 18, 2012 at 2:18 AM, OndÅej BÃlka <email@example.com> wrote:
>> On Fri, May 18, 2012 at 02:25:47PM +1200, Maxim Kuvyrkov wrote:
>>> On 18/05/2012, at 10:20 AM, Joseph S. Myers wrote:
>>>> Anyone interested in performance of these functions may also be interested
>>>> in bug 12100, where the SSE4 version of strstr reintroduces the unwanted
>>>> quadratic asymptotic performance. (This is not a comment on your patch
>>>> itself, but a mention of something people looking at this area might also
>>>> be interested in.)
>>> Thanks for the pointer. For avoidance of doubt, I've benchmarked the patch on a Core2 machine without SSE4.
>>> I will benchmark the SSE4 implementation against the normal + this patch on short needles. The benchmark that motivated this patch is libosip message parsing, which heavily uses string functions with small strings.
>> I posted strstr implementation at this list and got no response.
>> I use only SSE2 instructions and my algorithm is faster than using SSE4 instructions.
>> With trivial modifications I could also use plain arithmetic/AVX2
>> A trick is to check first two characters or zero terminator in parallel.
>> This is about 25 times faster on core2 vs glibc one, on i7 its only 3
>> times faster.
>> I use these trick for my extension of regular expression engine a
>> partialy autogenerated strstr follows.
> Ondrej, Maxim,
> The biggest problem I have with reviewing *any* of this code is that
> performance is relative to benchmarks.
While the other portion of this thread is being hijacked by discussion of GLIBC benchmarking, let's return to the patch that started this thread.
This patch is an algorithmic improvement, which makes strstr and company do less work by piggy backing the task of detecting EOL with the matching procedure. As is, EOL is detected by a dedicated call to memchr, which is extremely expensive.
The only non-algorithmic improvement in the patch is the conversion of array[index] addressing to pointers. Replacing two variables and, potentially, an add instruction ("array", "index" and "array+index") with one variable ("pointer_array") is a fairly uncontroversial micro-optimization.
Any reason for the review to not proceed?
CodeSourcery / Mentor Graphics