This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] powerpc: strstr optimization




On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
"Carlos O'Donell" <carlos@redhat.com> writes:

On 07/16/2015 03:55 PM, OndÅej BÃlka wrote:
Sorry Carlos, your trust is misplaced. This patch wasn't reviewed at
all. I did that as test how much we could test IBM to verify patches.

No.  That isn't true.
I reviewed this patch myself.  All the 4 versions.
So did Steven.

I did see only three. Also I didn't see you mails with comments on
libc-alpha. Please post these with comments so everybody could see
these.

If I were you then I would deny that I reviewed it as that cast bad
light on you. There are numerous flaws on these patches and that both of
you didn't catch bug shows it.

I pointed out that it could have possibly quadratic behaviour which
still does. So please don't accept unreviewed patches next time.

Yes, this patch does have a quadratic worst case as does quicksort.
But for both algorithms, they're clearly an improvement over what was
available before them.
Is this algorithm perfect?  No.  I don't believe such a thing exist.  I
believe in iterative improvements.

That isn't clear improvement. As patch history versions 1 and 2 were bad
algorithm which took me lot of time to convince change. Then this is
better but its just my patch for that poorly rewriten. I clearly see
problems there, for example calling strnlen is unnecessary overhead.

One thing is iterative change, second is wasted effort. If code needs to
be completely replaced to improve performance then its better not to
send a patch at all. If a assembly is slower than equivalent c
impementation then its pointless to use assembly. So why should you keep
custom assembly when generic implementation is better?

They showed cases for which the code does go faster and objectively
so using the microbenchmark, and that's a win for now. Please continue
to work with IBM to remove the quadratic worst case.

That's our intention.
An improvement to this implementation is already in our backlog (i.e. we
haven't started yet) and I'd be glad if someone is interested to help us with
this.

That is simple as my generic implementation is both simpler and faster
and handles this. If you didn't waste time with assembly then you
wouldn't have backlog.


Tulio, You will need to work out why you have quadratic worst case.
It's certainly something we try to avoid. Did you make a particular
decision not to avoid it?

IMHO, the overall improvement brought by this patch is more important than
the worst case scenario OndÅej is showing.
Which doesn't mean we're neglecting the quadratic worst case.

As this was also negatively received on x64, see bug 12100 pointing out
to fix that should every reviewer do.

And as generic case I many times asked if you ran my generic strstr
patch. It was lot faster than first two iteration and in practice looks
I have run benchtest against your generic strstr patches
and the results are attached in
https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html
where 12750 out of 13442 combinations shows improvement.

twice faster than this assembly. Thats because strings are generaly
short. Its bit slower than assembly or large size. Thats just caused by
using ppc strchr instead power7 one which is slower. So I just used two
versions.

Results with modified benchmark attached, where I did hack to rename my
proposal as strstr_power7b while strstr_power7 is current to clearly see
difference.

How did you manage to get both __strstr_power7b and __strstr_power7
results in the same benchtest run? or Did you run two different benchtests and merge the output?

So if you want better performance and fix quadratic behaviour what about
following?

	* benchtests/bench-strstr.c: Modify benchmark.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile (routines): Add strstr-ppc64.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
	(strstr_ppc, strstr_power7: Add.
	* sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New file.
	* sysdeps/powerpc/powerpc64/multiarch/strstr.c: Likewise.



--
Thanks
Rajalakshmi S


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]