This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] powerpc: strstr optimization

On Fri, Jul 17, 2015 at 04:14:53PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
> >On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
> >And as generic case I many times asked if you ran my generic strstr
> >patch. It was lot faster than first two iteration and in practice looks
> I have run benchtest against your generic strstr patches
> and the results are attached in
> where 12750 out of 13442 combinations shows improvement.
And I was repeatedly telling you that your benchmark is invalid. You
compare generic digraph search with power7 strchr. That has two

First one is that I previously described that reason of using digraphs instead
of just strchr loop is better performance in more degenerated cases.
Your benchmarks just shows that checking for one character is better
than for two when probability of that character is very low.

So to test these you need to use different benchmark where you set
probabilities of first character to 1/8, 1/16, 1/32, 1/64 and you will
see that on some point digraphs beat single character. 

>From my previous mail:

Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
will see completely different numbers in benchmark.

This is real problem, not just pathological case as it could easily
happen in some programs. If you search english text for 
strstr(haystack," the") then on average every fourth character is space.

As second problem I was repeatedly telling you that you need compare
power7 optimized code with power7 one. So either add powerpc
instructions to skeleton or use my previous patch. As I said that my
patch would beat yours it holds as that applied to strstr v2. Then I
realized this problem and said that you should check naive loop which
you didn't do.

Following comment from my previous mail still applies:

My comment was that it wastes time most of time as in common case these
already differ in first byte so all these shifts were completely
unnecessary, a simple optimization of naive algorithm using


would be faster.

Which I also paraphrased here which applies to your v3. 

Second is about average case perforance. There worst case is irrelevant
as you use buy-or-rent techniques to handle it. I sumbitted following
patch with easy way to handle that.

[PATCH] Improve generic strstr performance.

its currently superseeded by generic vector bigraph search but it should
beat your strstr.
> >twice faster than this assembly. Thats because strings are generaly
> >short. Its bit slower than assembly or large size. Thats just caused by
> >using ppc strchr instead power7 one which is slower. So I just used two
> >versions.
> >
> >Results with modified benchmark attached, where I did hack to rename my
> >proposal as strstr_power7b while strstr_power7 is current to clearly see
> >difference.
> >
> How did you manage to get both __strstr_power7b and __strstr_power7
> results in the same benchtest run? or Did you run two different
> benchtests and merge the output?
1. Just add new implementation after patching without deleting old one, name it __strstr_power7b
2. Add __strstr_power7b to ifunc_impl_list
3. Run benchmark

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]