This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v3] powerpc: strstr optimization
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Cc: Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Fri, 17 Jul 2015 13:53:31 +0200
- Subject: Re: [PATCH v3] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <558A5761 dot 2000409 at linux dot vnet dot ibm dot com> <87oajpm8nc dot fsf at totoro dot br dot ibm dot com> <871tgijuri dot fsf at linux dot vnet dot ibm dot com> <55A6FE3F dot 6090701 at redhat dot com> <55A70B70 dot 6090607 at redhat dot com> <20150716195538 dot GA5140 at domone> <55A8110C dot 7000209 at redhat dot com> <87egk7dda1 dot fsf at linux dot vnet dot ibm dot com> <20150717004134 dot GA30504 at domone> <55A8DCA5 dot 3050309 at linux dot vnet dot ibm dot com>
On Fri, Jul 17, 2015 at 04:14:53PM +0530, Rajalakshmi Srinivasaraghavan wrote:
>
>
> On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
> >On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
>
> >And as generic case I many times asked if you ran my generic strstr
> >patch. It was lot faster than first two iteration and in practice looks
> I have run benchtest against your generic strstr patches
> and the results are attached in
> https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html
> where 12750 out of 13442 combinations shows improvement.
>
And I was repeatedly telling you that your benchmark is invalid. You
compare generic digraph search with power7 strchr. That has two
problems.
First one is that I previously described that reason of using digraphs instead
of just strchr loop is better performance in more degenerated cases.
Your benchmarks just shows that checking for one character is better
than for two when probability of that character is very low.
So to test these you need to use different benchmark where you set
probabilities of first character to 1/8, 1/16, 1/32, 1/64 and you will
see that on some point digraphs beat single character.
>From my previous mail:
"
Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
will see completely different numbers in benchmark.
This is real problem, not just pathological case as it could easily
happen in some programs. If you search english text for
strstr(haystack," the") then on average every fourth character is space.
"
As second problem I was repeatedly telling you that you need compare
power7 optimized code with power7 one. So either add powerpc
instructions to skeleton or use my previous patch. As I said that my
patch would beat yours it holds as that applied to strstr v2. Then I
realized this problem and said that you should check naive loop which
you didn't do.
Following comment from my previous mail still applies:
"
My comment was that it wastes time most of time as in common case these
already differ in first byte so all these shifts were completely
unnecessary, a simple optimization of naive algorithm using
while(s=strchr(s,n[0]))
would be faster.
"
Which I also paraphrased here which applies to your v3.
"
Second is about average case perforance. There worst case is irrelevant
as you use buy-or-rent techniques to handle it. I sumbitted following
patch with easy way to handle that.
[PATCH] Improve generic strstr performance.
its currently superseeded by generic vector bigraph search but it should
beat your strstr.
"
> >twice faster than this assembly. Thats because strings are generaly
> >short. Its bit slower than assembly or large size. Thats just caused by
> >using ppc strchr instead power7 one which is slower. So I just used two
> >versions.
> >
> >Results with modified benchmark attached, where I did hack to rename my
> >proposal as strstr_power7b while strstr_power7 is current to clearly see
> >difference.
> >
> How did you manage to get both __strstr_power7b and __strstr_power7
> results in the same benchtest run? or Did you run two different
> benchtests and merge the output?
>
Trivial.
1. Just add new implementation after patching without deleting old one, name it __strstr_power7b
2. Add __strstr_power7b to ifunc_impl_list
3. Run benchmark