This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Add x86-64 memmove with unaligned load/store and rep movsb


On Tue, Mar 29, 2016 at 10:41 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 03/29/2016 12:58 PM, H.J. Lu wrote:
>> The goal of this patch is to replace SSE2 memcpy.S,
>> memcpy-avx-unaligned.S and memmove-avx-unaligned.S as well as
>> provide SSE2 memmove with faster alternatives.  bench-memcpy and
>> bench-memmove data on various Intel and AMD processors are at
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19776
>>
>> Any comments or feedback?
>
> I assume this is a WIP? I don't see how this code replaces the memcpy@GLIBC_2.14
> IFUNC we're currently using, or redirects the IFUNC to use your new functions
> under certain conditions.
>
> For memcpy:
>
> * On ivybridge the new code regresses 9% mean performance versus AVX usage?

Ivy Bridge currently uses __memcpy_sse2_unaligned.  The new one
will be __memcpy_sse2_unaligned_erms, not __memcpy_avx_unaligned_erms.

> * On penryn the new code regresses 18% mean performance versus SSE2 usage?

Penryn will stick with __memcpy_ssse3_back.

> * On bulldozer the new code regresses 18% mean performance versus AVX usage,
>   and 3% versus SSE2 usage?

Bulldozer will stick with __memcpy_ssse3.

> This means that out of 11 hardware configurations the patch regresses 4
> of those configurations, while progressing 7. If all devices are of equal
> value, then this change is of mixed benefit.
>
> Which is a mean improvement of 14% in the cases which improved, and a mean
> degradation of 12% in the cases which had worse performance.
>
> This seems like a bad change for Ivybridge, Penryn, and Bulldozer.
>
> Can you explain the loss of performance in terms of the hardware that is
> impacted, why did it do worse?
>
> Is it possible to limit the change to those key architectures where the
> optimizations make a difference? Are you trying to avoid the maintenance
> burden of yet another set of optimized routines?
>

As I said, the new one will replace the old one.  That is, the new SSE2/AVX
replaces the old SSE2/AVX.  It won't change the choice of SSE2, SSSE3
or AVX for a given processor.
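
To make the distinction concrete, here is a rough sketch in C.  It is
not the actual glibc selector: struct cpu_info, its fields, the
fam_* enumerators and both functions are hypothetical names made up
for illustration, and the baseline "__memcpy_sse2" label is an
assumption.  Only the other implementation names come from this
thread.  The point is that selection has two steps, and the patch
only changes what the second step returns for the SSE2-unaligned and
AVX-unaligned families.

```c
/* Hypothetical feature summary.  glibc's real selector tests its own
   cpu_features bits, which are not reproduced here.  */
struct cpu_info
{
  int fast_avx_unaligned;   /* AVX unaligned load/store is fast.  */
  int fast_unaligned;       /* SSE2 unaligned load/store is fast.  */
  int has_ssse3;            /* SSSE3 is available.  */
  int fast_copy_backward;   /* Backward copy is preferred.  */
};

/* Hypothetical implementation families.  */
enum family
{
  fam_sse2,
  fam_sse2_unaligned,
  fam_ssse3,
  fam_ssse3_back,
  fam_avx_unaligned
};

/* Step 1: pick a family for this processor.  The patch leaves this
   step alone, so no processor moves to a different family.  */
static enum family
select_family (const struct cpu_info *cpu)
{
  if (cpu->fast_avx_unaligned)
    return fam_avx_unaligned;
  if (cpu->fast_unaligned)
    return fam_sse2_unaligned;
  if (cpu->has_ssse3)
    return cpu->fast_copy_backward ? fam_ssse3_back : fam_ssse3;
  return fam_sse2;
}

/* Step 2: pick the implementation within the family.  With PATCHED
   nonzero, the unaligned SSE2/AVX families get the new code; ERMS is
   assumed to be available to keep the sketch short.  */
const char *
select_memcpy_name (const struct cpu_info *cpu, int patched)
{
  switch (select_family (cpu))
    {
    case fam_avx_unaligned:
      return patched ? "__memcpy_avx_unaligned_erms"
                     : "__memcpy_avx_unaligned";
    case fam_sse2_unaligned:
      return patched ? "__memcpy_sse2_unaligned_erms"
                     : "__memcpy_sse2_unaligned";
    case fam_ssse3_back:
      return "__memcpy_ssse3_back";
    case fam_ssse3:
      return "__memcpy_ssse3";
    default:
      /* Baseline name assumed for illustration.  */
      return "__memcpy_sse2";
    }
}
```

With flags set so that step 1 lands in the SSE2-unaligned family (the
Ivy Bridge case above), step 2 returns __memcpy_sse2_unaligned before
the patch and __memcpy_sse2_unaligned_erms after it.  A processor that
lands in an SSSE3 family gets the same answer either way.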

-- 
H.J.

