This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH v3] powerpc: strstr optimization
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-alpha at sourceware dot org
- Date: Fri, 17 Jul 2015 09:13:59 -0300
- Subject: Re: [PATCH v3] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <558A5761 dot 2000409 at linux dot vnet dot ibm dot com> <87oajpm8nc dot fsf at totoro dot br dot ibm dot com> <871tgijuri dot fsf at linux dot vnet dot ibm dot com> <55A6FE3F dot 6090701 at redhat dot com> <55A70B70 dot 6090607 at redhat dot com> <20150716195538 dot GA5140 at domone> <55A8110C dot 7000209 at redhat dot com> <87egk7dda1 dot fsf at linux dot vnet dot ibm dot com> <20150717004134 dot GA30504 at domone> <55A8DCA5 dot 3050309 at linux dot vnet dot ibm dot com> <20150717115331 dot GA26478 at domone>
On 17-07-2015 08:53, OndÅej BÃlka wrote:
> On Fri, Jul 17, 2015 at 04:14:53PM +0530, Rajalakshmi Srinivasaraghavan wrote:
>> On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
>>> On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
>>> And as generic case I many times asked if you ran my generic strstr
>>> patch. It was lot faster than first two iteration and in practice looks
>> I have run benchtest against your generic strstr patches
>> and the results are attached in
>> where 12750 out of 13442 combinations shows improvement.
> And I was repeatedly telling you that your benchmark is invalid. You
> compare generic digraph search with power7 strchr. That has two
> First one is that I previously described that reason of using digraphs instead
> of just strchr loop is better performance in more degenerated cases.
> Your benchmarks just shows that checking for one character is better
> than for two when probability of that character is very low.
> So to test these you need to use different benchmark where you set
> probabilities of first character to 1/8, 1/16, 1/32, 1/64 and you will
> see that on some point digraphs beat single character.
> From my previous mail:
> Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
> will see completely different numbers in benchmark.
> This is real problem, not just pathological case as it could easily
> happen in some programs. If you search english text for
> strstr(haystack," the") then on average every fourth character is space.
> As second problem I was repeatedly telling you that you need compare
> power7 optimized code with power7 one. So either add powerpc
> instructions to skeleton or use my previous patch. As I said that my
> patch would beat yours it holds as that applied to strstr v2. Then I
> realized this problem and said that you should check naive loop which
> you didn't do.
> Following comment from my previous mail still applies:
> My comment was that it wastes time most of time as in common case these
> already differ in first byte so all these shifts were completely
> unnecessary, a simple optimization of naive algorithm using
> would be faster.
> Which I also paraphrased here which applies to your v3.
> Second is about average case perforance. There worst case is irrelevant
> as you use buy-or-rent techniques to handle it. I sumbitted following
> patch with easy way to handle that.
> [PATCH] Improve generic strstr performance.
> its currently superseeded by generic vector bigraph search but it should
> beat your strstr.
>>> twice faster than this assembly. Thats because strings are generaly
>>> short. Its bit slower than assembly or large size. Thats just caused by
>>> using ppc strchr instead power7 one which is slower. So I just used two
>>> Results with modified benchmark attached, where I did hack to rename my
>>> proposal as strstr_power7b while strstr_power7 is current to clearly see
>> How did you manage to get both __strstr_power7b and __strstr_power7
>> results in the same benchtest run? or Did you run two different
>> benchtests and merge the output?
> 1. Just add new implementation after patching without deleting old one, name it __strstr_power7b
> 2. Add __strstr_power7b to ifunc_impl_list
> 3. Run benchmark
I would be very good if we could work on improve the strstr benchmark
to shows this issues you just raised.