This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH v3] powerpc: strstr optimization
- From: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- To: OndÅej BÃlka <neleai at seznam dot cz>, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Fri, 17 Jul 2015 16:14:53 +0530
- Subject: Re: [PATCH v3] powerpc: strstr optimization
- Authentication-results: sourceware.org; auth=none
- References: <558A5642 dot 5020107 at linux dot vnet dot ibm dot com> <558A5761 dot 2000409 at linux dot vnet dot ibm dot com> <87oajpm8nc dot fsf at totoro dot br dot ibm dot com> <871tgijuri dot fsf at linux dot vnet dot ibm dot com> <55A6FE3F dot 6090701 at redhat dot com> <55A70B70 dot 6090607 at redhat dot com> <20150716195538 dot GA5140 at domone> <55A8110C dot 7000209 at redhat dot com> <87egk7dda1 dot fsf at linux dot vnet dot ibm dot com> <20150717004134 dot GA30504 at domone>
On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
"Carlos O'Donell" <email@example.com> writes:
On 07/16/2015 03:55 PM, OndÅej BÃlka wrote:
Sorry Carlos, your trust is misplaced. This patch wasn't reviewed at
all. I did that as test how much we could test IBM to verify patches.
No. That isn't true.
I reviewed this patch myself. All the 4 versions.
So did Steven.
I did see only three. Also I didn't see you mails with comments on
libc-alpha. Please post these with comments so everybody could see
If I were you then I would deny that I reviewed it as that cast bad
light on you. There are numerous flaws on these patches and that both of
you didn't catch bug shows it.
I pointed out that it could have possibly quadratic behaviour which
still does. So please don't accept unreviewed patches next time.
Yes, this patch does have a quadratic worst case as does quicksort.
But for both algorithms, they're clearly an improvement over what was
available before them.
Is this algorithm perfect? No. I don't believe such a thing exist. I
believe in iterative improvements.
That isn't clear improvement. As patch history versions 1 and 2 were bad
algorithm which took me lot of time to convince change. Then this is
better but its just my patch for that poorly rewriten. I clearly see
problems there, for example calling strnlen is unnecessary overhead.
One thing is iterative change, second is wasted effort. If code needs to
be completely replaced to improve performance then its better not to
send a patch at all. If a assembly is slower than equivalent c
impementation then its pointless to use assembly. So why should you keep
custom assembly when generic implementation is better?
They showed cases for which the code does go faster and objectively
so using the microbenchmark, and that's a win for now. Please continue
to work with IBM to remove the quadratic worst case.
That's our intention.
An improvement to this implementation is already in our backlog (i.e. we
haven't started yet) and I'd be glad if someone is interested to help us with
That is simple as my generic implementation is both simpler and faster
and handles this. If you didn't waste time with assembly then you
wouldn't have backlog.
Tulio, You will need to work out why you have quadratic worst case.
It's certainly something we try to avoid. Did you make a particular
decision not to avoid it?
IMHO, the overall improvement brought by this patch is more important than
the worst case scenario OndÅej is showing.
Which doesn't mean we're neglecting the quadratic worst case.
As this was also negatively received on x64, see bug 12100 pointing out
to fix that should every reviewer do.
And as generic case I many times asked if you ran my generic strstr
patch. It was lot faster than first two iteration and in practice looks
I have run benchtest against your generic strstr patches
and the results are attached in
where 12750 out of 13442 combinations shows improvement.
twice faster than this assembly. Thats because strings are generaly
short. Its bit slower than assembly or large size. Thats just caused by
using ppc strchr instead power7 one which is slower. So I just used two
Results with modified benchmark attached, where I did hack to rename my
proposal as strstr_power7b while strstr_power7 is current to clearly see
How did you manage to get both __strstr_power7b and __strstr_power7
results in the same benchtest run? or Did you run two different
benchtests and merge the output?
So if you want better performance and fix quadratic behaviour what about
* benchtests/bench-strstr.c: Modify benchmark.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (routines): Add strstr-ppc64.
(strstr_ppc, strstr_power7: Add.
* sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New file.
* sysdeps/powerpc/powerpc64/multiarch/strstr.c: Likewise.