This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v3] powerpc: strstr optimization

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: libc-alpha at sourceware dot org
Date: Fri, 17 Jul 2015 09:13:59 -0300
Subject: Re: [PATCH v3] powerpc: strstr optimization
Authentication-results: sourceware.org; auth=none
References: <558A5761 dot 2000409 at linux dot vnet dot ibm dot com> <87oajpm8nc dot fsf at totoro dot br dot ibm dot com> <871tgijuri dot fsf at linux dot vnet dot ibm dot com> <55A6FE3F dot 6090701 at redhat dot com> <55A70B70 dot 6090607 at redhat dot com> <20150716195538 dot GA5140 at domone> <55A8110C dot 7000209 at redhat dot com> <87egk7dda1 dot fsf at linux dot vnet dot ibm dot com> <20150717004134 dot GA30504 at domone> <55A8DCA5 dot 3050309 at linux dot vnet dot ibm dot com> <20150717115331 dot GA26478 at domone>


On 17-07-2015 08:53, OndÅej BÃlka wrote:
> On Fri, Jul 17, 2015 at 04:14:53PM +0530, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
>>> On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
>  >
>>> And as generic case I many times asked if you ran my generic strstr
>>> patch. It was lot faster than first two iteration and in practice looks
>> I have run benchtest against your generic strstr patches
>> and the results are attached in
>> https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html
>> where 12750 out of 13442 combinations shows improvement.
>>
> And I was repeatedly telling you that your benchmark is invalid. You
> compare generic digraph search with power7 strchr. That has two
> problems.
> 
> First one is that I previously described that reason of using digraphs instead
> of just strchr loop is better performance in more degenerated cases.
> Your benchmarks just shows that checking for one character is better
> than for two when probability of that character is very low.
> 
> So to test these you need to use different benchmark where you set
> probabilities of first character to 1/8, 1/16, 1/32, 1/64 and you will
> see that on some point digraphs beat single character. 
> 
> From my previous mail:
> 
> "
> Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
> will see completely different numbers in benchmark.
> 
> This is real problem, not just pathological case as it could easily
> happen in some programs. If you search english text for 
> strstr(haystack," the") then on average every fourth character is space.
> "
> 
> As second problem I was repeatedly telling you that you need compare
> power7 optimized code with power7 one. So either add powerpc
> instructions to skeleton or use my previous patch. As I said that my
> patch would beat yours it holds as that applied to strstr v2. Then I
> realized this problem and said that you should check naive loop which
> you didn't do.
> 
> Following comment from my previous mail still applies:
> 
> "
> My comment was that it wastes time most of time as in common case these
> already differ in first byte so all these shifts were completely
> unnecessary, a simple optimization of naive algorithm using
> 
> while(s=strchr(s,n[0]))
> 
> would be faster.
> "
> 
> Which I also paraphrased here which applies to your v3. 
> 
> "
> Second is about average case perforance. There worst case is irrelevant
> as you use buy-or-rent techniques to handle it. I sumbitted following
> patch with easy way to handle that.
> 
> [PATCH] Improve generic strstr performance.
> 
> its currently superseeded by generic vector bigraph search but it should
> beat your strstr.
> "
>  
>>> twice faster than this assembly. Thats because strings are generaly
>>> short. Its bit slower than assembly or large size. Thats just caused by
>>> using ppc strchr instead power7 one which is slower. So I just used two
>>> versions.
>>>
>>> Results with modified benchmark attached, where I did hack to rename my
>>> proposal as strstr_power7b while strstr_power7 is current to clearly see
>>> difference.
>>>
>> How did you manage to get both __strstr_power7b and __strstr_power7
>> results in the same benchtest run? or Did you run two different
>> benchtests and merge the output?
>>
> Trivial.
> 1. Just add new implementation after patching without deleting old one, name it __strstr_power7b
> 2. Add __strstr_power7b to ifunc_impl_list
> 3. Run benchmark
> 

I would be very good if we could work on improve the strstr benchmark
to shows this issues you just raised.

Follow-Ups:
- Re: [PATCH v3] powerpc: strstr optimization
  - From: OndÅej BÃlka

References:
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Tulio Magno Quites Machado Filho
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Tulio Magno Quites Machado Filho
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Carlos O'Donell
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Carlos O'Donell
- Re: [PATCH v3] powerpc: strstr optimization
  - From: OndÅej BÃlka
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Carlos O'Donell
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Tulio Magno Quites Machado Filho
- Re: [PATCH v3] powerpc: strstr optimization
  - From: OndÅej BÃlka
- Re: [PATCH v3] powerpc: strstr optimization
  - From: Rajalakshmi Srinivasaraghavan
- Re: [PATCH v3] powerpc: strstr optimization
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]