This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] powerpc: strcasestr optimization




On 06/01/2015 09:52 PM, OndÅej BÃlka wrote:
On Mon, Jun 01, 2015 at 09:36:47AM -0500, Steven Munroe wrote:
On Mon, 2015-06-01 at 14:28 +0200, OndÅej BÃlka wrote:
On Mon, Jun 01, 2015 at 04:11:28PM +0530, Rajalakshmi Srinivasaraghavan wrote:

This patch optimizes strcasestr function for power >= 7 systems.
This patch uses optimized strlen and strnlen for calculating
string length and the average improvement of this optimization is ~40%.
This patch is tested on powerpc64 and powerpc64le.
Attached the benchresults with this new patch.

Thats not enough. As strcasestr that I submited is around three times
slower your implementation would likely be regression over generic one.

A problem here is that you use moronic algorithm. Fix algorithm first
before trying to optimize it.


This is not very helpful. You are demanding changes without clear
explanation and justification.

What is wrong with Raja's algorithm? What is insufficient in the
benchmark data she has provided? And why do you think your specific
design applies to PowerISA and POWER7/POWER8 micro-architecture.

What data do you have that justified this objection?

I replied on strstr patch thread on why what she submitted is
performance regression. So I will repeat arguments from other thread
which still apply.

First was problem with quadratic behaviour. She tried to fix it but it
isn't a fix at all. Just benchmark

strcasestr ("aaa...(4000 times)...aaa", "aaa...(2000 times)...aab")

That call would take around hundred times than before which is
unacceptable.

This is already handled in the patch.If the needle len is more than
2048, it calls default string/strcasestr.c

If we ignore that red flag second problem was that benchmark she used is
bogus. It test with periodic haystacks, needle is copy of first bytes of
haystack with last byte set to something else.
Which benchmark are you referring as bogus? The benchtest result attached in the previous thread was created using benchtests/bench-strcasestr.c . Since your proposed benchtest changes were not yet committed, I have used default ones.
.
.
.


Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
will see completely different numbers in benchmark.

Benchtest results attached with these changes.
.
.


As I don't have powerpc access now apply my patches

[PATCH v5] Generic string skeleton
[PATCH v5 4*] Generic string search functions (strstr, strcasestr, memmem)

I have attached the benchtest result with the above patches applied
along with benchtests/bench-strcasestr.c changes.(similiar changed as proposed by you for benchtests/bench-strstr.c).
The result attached clearly shows improvement.
and run (preferably fixed) benchmark with these. As gains that I see on
x64 are bigger than ones gained by this assembly you will likely see
that generic implementation is indeed better and it would be pointless
to try review that only to remove it shortly after adding to improve
performance.



--
Thanks
Rajalakshmi S

Attachment: newresults
Description: Text document

Attachment: new_benchtestcode
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]