This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc64: strrchr optimization for power8
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Fri, 17 Mar 2017 11:38:48 -0400
- Subject: Re: [PATCH] powerpc64: strrchr optimization for power8
- References: <1487070321-27700-1-git-send-email-raji@linux.vnet.ibm.com> <52479e2b-f675-9d6e-7b34-27e24ee48081@redhat.com> <55dfbdb0-3776-caa9-e87b-10c297b4b496@linux.vnet.ibm.com> <139ea9b7-005d-fd33-a223-fe36fc995f4d@redhat.com> <00958c35-0b2d-60c9-f618-cfcd185710e8@linux.vnet.ibm.com> <5af3baaf-a340-cf07-10fc-f877d803dd6c@linux.vnet.ibm.com>
On 03/09/2017 01:14 AM, Rajalakshmi Srinivasaraghavan wrote:
>
>
> On 02/28/2017 01:02 PM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 02/20/2017 09:36 PM, Carlos O'Donell wrote:
>>> On 02/20/2017 11:01 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>
>>>>
>>>> On 02/20/2017 07:12 PM, Carlos O'Donell wrote:
>>>>> On 02/14/2017 06:05 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>>> The P7 code is used for strings <= 32B, and vectorized loops are
>>>>>> used for strings > 32B.
>>>>>> This shows an average 25% improvement, depending on the position
>>>>>> of the search character. Performance is the same for shorter
>>>>>> strings.
>>>>>> Tested on ppc64 and ppc64le.
>>>>> What did you use to test the 25% improvement?
>>>>
>>>> This improvement is seen when compared to power7. The benchtest
>>>> was modified to use lengths from 0 to 400 to find the average
>>>> across different lengths.
>>>
>>> Could you post your modifications for review and explain your
>>> process in a little more detail? I'm curious about the changes
>>> you made.
>>
>> Carlos,
>> Posted benchtest modification here:
>> https://sourceware.org/ml/libc-alpha/2017-02/msg00380.html
>
> Carlos,
>
> Do you have further comments?
This is exactly what I was interested in seeing, and I see Siddhesh
has approved your commit to benchtests to increase the string lengths
used in the analysis.
When I review these changes I look at:
(a) What microbenchmark did you use?
- Can we include it in glibc?
* We did; your improvements should be going into master so others
can reproduce them.
(b) What assumptions did you make and were they valid?
Extending the microbenchmark to measure lengths up to 512 bytes is
probably a good thing: it gives broad coverage of performance from
small strings to large strings whose lengths are multiples of most
cache line sizes (and reaches lengths where prefetching might start
helping).
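To illustrate the kind of measurement discussed above, here is a minimal
sketch of timing strrchr for one string length and one position of the
search character. It is not the glibc benchtests code. The helper names
make_input and time_strrchr, the filler and search characters, and the use
of clock() are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build a string of LEN filler bytes with a single
   occurrence of SEEK at offset POS.  For LEN == 0 the string is empty
   and POS is ignored.  The caller frees the result.  */
static char *
make_input (size_t len, size_t pos, char seek)
{
  char *s = malloc (len + 1);
  if (s == NULL)
    return NULL;
  memset (s, 'a', len);
  s[len] = '\0';
  if (len > 0)
    {
      assert (pos < len);
      s[pos] = seek;
    }
  return s;
}

/* Hypothetical helper: return the average CPU time in seconds of one
   strrchr call on a string of LEN bytes whose only match is at POS.
   Returns a negative value if allocation fails.  */
static double
time_strrchr (size_t len, size_t pos, long iters)
{
  char *s = make_input (len, pos, 'x');
  if (s == NULL)
    return -1.0;

  /* Call through a volatile pointer and store into a volatile sink so
     the compiler cannot fold or drop the calls.  */
  char *(*volatile fn) (const char *, int) = strrchr;
  char *volatile sink = NULL;

  clock_t start = clock ();
  for (long i = 0; i < iters; i++)
    sink = fn (s, 'x');
  clock_t end = clock ();

  (void) sink;
  free (s);
  return (double) (end - start) / CLOCKS_PER_SEC / (double) iters;
}
```

Calling time_strrchr for every length from 0 to 512, with the match placed
near the start, the middle, and the end, would give one curve per position.
The resolution of clock() is coarse, so the numbers are only meaningful
relative to each other and with a large iteration count.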
Does IBM internally have any good data on the low, median, average,
and high lengths of the strings that are passed to the strrchr API?
Such statistical data would allow us to tune the microbenchmark.
Knowing the mean string length would let us decide where to focus
most of our optimization effort. I don't know that we have any
good references to academic literature here.
Your lack of such references in your patch means you don't know either,
but given that you indicate low string size performance is no worse,
this patch looks fine.
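As a rough illustration of the statistics asked about above, the following
sketch computes the low, median, average, and high values of a set of
recorded string lengths. The struct length_stats type and the
summarize_lengths function are hypothetical names, and how the lengths
would be collected from real workloads is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical summary of a sample of string lengths, in bytes.  */
struct length_stats
{
  size_t low;
  size_t high;
  double median;
  double mean;
};

/* qsort comparison function for size_t values, ascending.  */
static int
compare_size (const void *a, const void *b)
{
  size_t x = *(const size_t *) a;
  size_t y = *(const size_t *) b;
  return (x > y) - (x < y);
}

/* Summarize N recorded lengths.  Sorts LENGTHS in place; N must be
   greater than zero.  */
static struct length_stats
summarize_lengths (size_t *lengths, size_t n)
{
  assert (n > 0);
  qsort (lengths, n, sizeof lengths[0], compare_size);

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += (double) lengths[i];

  struct length_stats st;
  st.low = lengths[0];
  st.high = lengths[n - 1];
  /* For an even count, take the midpoint of the two middle values.  */
  if (n % 2 == 1)
    st.median = (double) lengths[n / 2];
  else
    st.median = ((double) lengths[n / 2 - 1]
                 + (double) lengths[n / 2]) / 2.0;
  st.mean = sum / (double) n;
  return st;
}
```

A mean far above the median would point to a few very long strings
dominating the average, in which case the median is probably the better
guide for where short-string performance matters.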
In summary:
- You assume applications will be using strings > 32 bytes, and that's
not an entirely unreasonable assumption to make.
- You show that performance for strings <= 32 bytes remains the same
and that longer strings improve.
- You contribute the microbenchmark changes that allowed you to measure
these numbers.
That's exactly what I want to see from a good contribution.
I plotted the power8 performance, and there is a big bump in the
middle. Any idea why?
https://docs.google.com/a/redhat.com/spreadsheets/d/16kW90bXH7nC8Ak6Xyoe4cxVIvFPwjVDcO-7qsZs0iVc/pubhtml
--
Cheers,
Carlos.