This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] powerpc: strstr optimization




On 05/01/2015 05:35 PM, OndÅej BÃlka wrote:
On Fri, Apr 17, 2015 at 02:09:44AM -0500, Rajalakshmi Srinivasaraghavan wrote:
This patch optimizes strstr function for power >= 7 systems.  Performance
gain is obtained using doubleword aligned memory access and usage of cmpb
instruction for quicker comparison.  The average improvement of this
optimization is ~40%.
As I did same with x64 I will add improvements, I didn't look at patch
in detail.

First problem looks there is  quadratic worst case. That is solved by
buy-or-rent technique. You count number of comparisons and if that
exceeds C*N where N is number of haystack bytes read so far and C
suitable constant then you switch to slower generic algorithm.

Main improvement in x64 case was checking for pair of characters. A pair
of characters is less likely than single character even vectorized
branch would be rarely triggered saving misprediction cost that
dominates performance.

Though I did not understand your program completely, I agree with checking pair of characters
can improve the performance.

In my optimization there are totally three branches.
case 1# if needle len >= 8 (common case)
case 2# if needle < 8 (common case)
case 3# For page boundary cross haystack. (rare case)

For case 1, I always compare 8 characters at a time using cmpb instruction.
-> perform basic strlen, strnlen, checks. Move haystack based on strchr result. -> Load first 8 bytes(taking care of aligned load)from haystack and needle and compare them.
If they are same proceed to next 8 bytes of haystack and needle.
Else load next doubleword from haystack.Form a doubleword with last seven characters from first load and first character from
second load and compare again with needle. Proceed in a similar way.
This gives good improvement.

For case2, always compare 'n'  characters at a time, 'n' is needle length.
-> Algorithm is same as case 1, however before doing cmpb,create input mask based on needle len.

For case 1 and case 2, input is always loaded as double word.

For case 3, its byte by byte comparison.


--
Thanks
Rajalakshmi S


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]