This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcpy performance regressions 2.19 -> 2.24(5)
On Tue, May 23, 2017 at 3:12 PM, Erich Elsen <eriche@google.com> wrote:
> Sounds good to me. Even if tunables aren't added, does memcpy.S ->
> memcpy.c seem reasonable?
I prefer not to do it for now. We can revisit it later, after a tunable is
added to cpu_features.
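
A minimal sketch of the cpu_features route, assuming a tunable string that
clears feature bits before the memcpy variant is selected. Every name here
(cpu_features_sketch, FEATURE_ERMS, apply_feature_tunable, select_memcpy,
the "-erms" syntax) is hypothetical and is not glibc's actual interface;
the three variants are plain wrappers standing in for the real assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits; glibc's real cpu_features differs. */
enum {
    FEATURE_ERMS = 1 << 0,          /* REP MOVSB is considered fast */
    FEATURE_AVX_UNALIGNED = 1 << 1  /* unaligned vector copies are fast */
};

struct cpu_features_sketch {
    unsigned usable;                /* bitmask of usable features */
};

typedef void *(*memcpy_fn)(void *, const void *, size_t);

struct memcpy_impl {
    const char *name;
    memcpy_fn fn;
};

/* Stand-ins for the assembly variants: each simply forwards to memcpy. */
static void *copy_erms(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_vec(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }

static const struct memcpy_impl impl_erms = { "erms", copy_erms };
static const struct memcpy_impl impl_vec = { "vec", copy_vec };
static const struct memcpy_impl impl_generic = { "generic", copy_generic };

/* Parse a comma-separated list such as "-erms,-avx_unaligned".
   Each "-name" clears the matching bit; unknown names are ignored. */
void apply_feature_tunable(struct cpu_features_sketch *cf, const char *tunable)
{
    static const struct { const char *name; unsigned bit; } table[] = {
        { "erms", FEATURE_ERMS },
        { "avx_unaligned", FEATURE_AVX_UNALIGNED },
    };

    assert(cf != NULL);
    if (tunable == NULL)
        return;

    while (*tunable != '\0') {
        size_t len = strcspn(tunable, ",");   /* length of this token */
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            size_t nlen = strlen(table[i].name);
            if (len == nlen + 1 && tunable[0] == '-'
                && strncmp(tunable + 1, table[i].name, nlen) == 0)
                cf->usable &= ~table[i].bit;
        }
        tunable += len;
        if (*tunable == ',')
            tunable++;
    }
}

/* Selection reads only the (possibly tunable-adjusted) feature bits, so
   one tunable affects every routine that consults the same bits. */
const struct memcpy_impl *select_memcpy(const struct cpu_features_sketch *cf)
{
    assert(cf != NULL);
    if (cf->usable & FEATURE_ERMS)
        return &impl_erms;
    if (cf->usable & FEATURE_AVX_UNALIGNED)
        return &impl_vec;
    return &impl_generic;
}
```

With both bits set, select_memcpy returns the "erms" entry; after
apply_feature_tunable(&cf, "-erms") it falls back to "vec". A per-IFUNC
tunable would instead override the result of select_memcpy directly,
leaving the other routines untouched.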
BTW, REP MOV is expected to have lower bandwidth on multi-socket
systems, but has the benefit of lower cache disruption throughout the
cache hierarchy. This is a trade-off between overall system throughput
and single-program performance.
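
For readers unfamiliar with the instruction, here is a rough sketch of a
REP MOVSB copy written with GCC/Clang inline assembly. The function name
copy_rep_movsb is made up for illustration and this is not glibc's
implementation; on non-x86 targets the sketch falls back to memcpy. It
shows only the copy path, not the bandwidth or cache behaviour discussed
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst with REP MOVSB where available.
   Regions must not overlap, as with memcpy. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;   /* the asm advances dst, so save it first */

    assert(n == 0 || (dst != NULL && src != NULL));

#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
    /* RDI = destination, RSI = source, RCX = byte count. The ABI
       guarantees the direction flag is clear, so the copy runs forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback for other targets */
#endif
    return ret;
}
```

A vector-loop variant would instead issue explicit wide loads and stores,
which is where the difference in cache and memory traffic comes from.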
> On Tue, May 23, 2017 at 3:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, May 23, 2017 at 1:57 PM, Erich Elsen <eriche@google.com> wrote:
>>> Maybe there's room for both?
>>>
>>> Setting the cpu_features would affect everything; it would be useful
>>> to be able to target only specific (and very important) routines.
>>
>> I prefer to do the cpu_features first. If it turns out not to be
>> sufficient, we can then do the IFUNC implementation.
>>
>>> On Tue, May 23, 2017 at 1:46 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, May 23, 2017 at 1:39 PM, Erich Elsen <eriche@google.com> wrote:
>>>>> I was also thinking that it might be nice to have a TUNABLE that sets
>>>>> the implementation of memcpy directly. It would be easier to do this
>>>>> if memcpy.S was memcpy.c. Attached is a patch that does the
>>>>> conversion but doesn't add the tunables. How would you feel about
>>>>> this? It has no runtime impact, probably increases the size slightly,
>>>>> and makes the code easier to read / modify.
>>>>>
>>>>
>>>> It depends on how far you want to go. We can add TUNABLE support
>>>> to each IFUNC implementation or we can add TUNABLE support to
>>>> cpu_features to update processor features. I prefer the latter.
>>>>
>>>>
>>>> --
>>>> H.J.
>>
>>
>>
>> --
>> H.J.
--
H.J.