This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc64: strrchr optimization for power8


On 03/09/2017 01:14 AM, Rajalakshmi Srinivasaraghavan wrote:
> 
> 
> On 02/28/2017 01:02 PM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 02/20/2017 09:36 PM, Carlos O'Donell wrote:
>>> On 02/20/2017 11:01 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>
>>>>
>>>> On 02/20/2017 07:12 PM, Carlos O'Donell wrote:
>>>>> On 02/14/2017 06:05 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>>> P7 code is used for <=32B strings and for > 32B vectorized loops
>>>>>> are used.
>>>>>> This shows as an average 25% improvement depending on the position
>>>>>> of search
>>>>>> character.  The performance is same for shorter strings.
>>>>>> Tested on ppc64 and ppc64le.
>>>>> What did you use to test the 25% improvement?
>>>>
>>>> This improvement is seen when compared to power7. Benchtest is
>>>> modified to use length from 0 to 400  to find the average for
>>>> different lengths.
>>>
>>> Could you post your modifications for review and explain your
>>> process in a little more detail. I'm curious about the changes
>>> you made.
>>
>> Carlos,
>> Posted benchtest modification here:
>> https://sourceware.org/ml/libc-alpha/2017-02/msg00380.html
> 
> Carlos,
> 
> Do you have further comments?
 
This is exactly what I was interested in seeing, and I see Siddhesh
has approved your commit to benchtests to increase the string lengths
used in the analysis.

When I review these changes I look at:

(a) What microbenchmark did you use?

- Can we include it in glibc?

  * We did; your improvements should be going into master so that
    others can reproduce them.

(b) What assumptions did you make and were they valid?

Increasing the microbenchmarks to measure up to 512 bytes is probably
a good thing: it gives broad coverage of performance from small to
large strings, including lengths that are multiples of most cache line
sizes (and lengths where prefetching might start helping).
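
For anyone who wants to try this kind of length sweep outside the glibc
tree, here is a minimal sketch in C. It is not the benchtests code. The
names fill_string, time_strrchr and MAX_LEN are hypothetical, and timing
with clock() is an assumption chosen for portability, so treat the
numbers it produces as rough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_LEN 512   /* upper bound on string length (assumption) */

/* Fill buf with len filler bytes, put the search character 'x' at
   offset pos when pos < len, and NUL-terminate.  buf must have room
   for len + 1 bytes.  */
static void
fill_string (char *buf, size_t len, size_t pos)
{
  memset (buf, 'a', len);
  if (pos < len)
    buf[pos] = 'x';
  buf[len] = '\0';
}

/* Return the average CPU time of one strrchr call, in nanoseconds,
   for a string of length len with 'x' at offset pos.  */
static double
time_strrchr (size_t len, size_t pos, unsigned long iters)
{
  static char buf[MAX_LEN + 1];
  /* Calling through a volatile pointer keeps the compiler from
     hoisting or removing the calls inside the timed loop.  */
  char *(*volatile call) (const char *, int) = strrchr;
  char *volatile sink = NULL;
  clock_t start, end;

  assert (len <= MAX_LEN);
  if (iters == 0)
    return 0.0;
  fill_string (buf, len, pos);

  start = clock ();
  for (unsigned long i = 0; i < iters; i++)
    sink = call (buf, 'x');
  end = clock ();
  (void) sink;

  return (double) (end - start) / CLOCKS_PER_SEC * 1e9 / (double) iters;
}
```

A driver would call time_strrchr (len, pos, iters) for len from 0 to
MAX_LEN and a few values of pos, then plot nanoseconds per call
against len.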

Does IBM internally have any good data about the low, median,
average, and high lengths of the strings that are passed to the
strrchr API? Such statistical data would allow us to tune
the microbenchmark.
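
If such length samples were available, summarizing them is simple. The
sketch below is only an illustration: struct len_stats, cmp_size and
summarize_lengths are hypothetical names, and the input is assumed to
be an array of string lengths collected by some tracing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Summary of a set of observed string lengths.  */
struct len_stats
{
  size_t low;
  size_t high;
  double median;
  double mean;
};

/* qsort comparator for size_t values, ascending.  */
static int
cmp_size (const void *a, const void *b)
{
  size_t x = *(const size_t *) a;
  size_t y = *(const size_t *) b;
  return (x > y) - (x < y);
}

/* Compute low, high, median and mean of lens[0..n-1].
   Sorts lens in place; n must be greater than zero.  */
static struct len_stats
summarize_lengths (size_t *lens, size_t n)
{
  struct len_stats s;
  double sum = 0.0;

  assert (n > 0);
  qsort (lens, n, sizeof lens[0], cmp_size);
  for (size_t i = 0; i < n; i++)
    sum += (double) lens[i];

  s.low = lens[0];
  s.high = lens[n - 1];
  if (n % 2 == 1)
    s.median = (double) lens[n / 2];
  else
    s.median = ((double) lens[n / 2 - 1] + (double) lens[n / 2]) / 2.0;
  s.mean = sum / (double) n;
  return s;
}
```

The median and mean could then be compared against the 32-byte
threshold to see which code path typical callers would hit.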

Knowing the mean string length would let us decide where to place
most of our optimization efforts. I don't know that we have any
good references to academic literature here.

Your lack of such references in your patch means you don't know either,
but given that you indicate low string size performance is no worse,
this patch looks fine.

In summary:

- You assume applications will be using strings > 32 bytes, and that's
  not an entirely unreasonable assumption to make.

- You show that performance for strings <= 32B remains the same and
  that longer strings improve.

- You contribute the microbenchmark changes that allowed you to measure
  these numbers.

That's exactly what I want to see from a good contribution.

I plotted the power8 performance, and there is a big bump in the middle.
Any idea why?

https://docs.google.com/a/redhat.com/spreadsheets/d/16kW90bXH7nC8Ak6Xyoe4cxVIvFPwjVDcO-7qsZs0iVc/pubhtml

-- 
Cheers,
Carlos.

