This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Add x86-64 memmove with unaligned load/store and rep movsb
- From: Carlos O'Donell <carlos at redhat dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 30 Mar 2016 19:12:42 -0400
- Subject: Re: [PATCH] Add x86-64 memmove with unaligned load/store and rep movsb
- Authentication-results: sourceware.org; auth=none
- References: <CAMe9rOopQ5rUGgH2vu9Xwe02Qw0UNrVNCNOAakiV7h0ukciMtQ at mail dot gmail dot com> <56FABE63 dot 2040705 at redhat dot com> <CAMe9rOoeFQ=a7=EA0DK9ug1ffZZjzbdu3vfwk0scbrro2aEu3g at mail dot gmail dot com>
On 03/29/2016 02:04 PM, H.J. Lu wrote:
> On Tue, Mar 29, 2016 at 10:41 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 03/29/2016 12:58 PM, H.J. Lu wrote:
>>> The goal of this patch is to replace SSE2 memcpy.S,
>>> memcpy-avx-unaligned.S and memmove-avx-unaligned.S as well as
>>> provide SSE2 memmove with faster alternatives. bench-memcpy and
>>> bench-memmove data on various Intel and AMD processors are at
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19776
>>>
>>> Any comments or feedback?
>>
>> I assume this is a WIP? I don't see how this code replaces the memcpy@GLIBC_2.14
>> IFUNC we're currently using, or redirects the IFUNC to use your new functions
>> under certain conditions.
>>
>> For memcpy:
>>
>> * On ivybridge the new code regresses 9% mean performance versus AVX usage?
>
> Ivy Bridge currently uses __memcpy_sse2_unaligned. The new one
> will be __memcpy_sse2_unaligned_erms, not __memcpy_avx_unaligned_erms.
OK.
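[The _erms suffix on these variant names refers to the Enhanced REP MOVSB/STOSB CPU feature; the core of such a variant is a single rep movsb instruction. A minimal x86-64 sketch of that copy primitive follows -- an illustration only, not the patch's actual code, which also handles small sizes, alignment, and overlap separately:]

```c
#include <stddef.h>

/* A rep-movsb memory copy: on CPUs advertising ERMS, microcode makes
   this one instruction competitive with vectorized copy loops for many
   sizes.  rep movsb copies RCX bytes from [RSI] to [RDI]. */
void *memcpy_repmovsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;                 /* memcpy returns the original dst */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");   /* the instruction reads/writes memory */
    return ret;
}
```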
>> * On penryn the new code regresses 18% mean performance versus SSE2 usage?
>
> Penryn will stick with __memcpy_ssse3_back.
OK.
>> * On bulldozer the new code regresses 18% mean performance versus AVX usage,
>> and 3% versus SSE2 usage?
>
> Bulldozer will stick with __memcpy_ssse3.
OK.
> As I said, the new one will replace the old one. That is, the new SSE2/AVX
> implementations replace the old SSE2/AVX ones. It won't change the choice
> of SSE2, SSSE3, or AVX for a given processor.
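[The per-processor choice discussed here is made by glibc's memcpy IFUNC resolver at relocation time. A minimal sketch of the mechanism, using GCC's ifunc attribute with hypothetical function names (my_memcpy*, resolve_my_memcpy) rather than glibc's real resolver logic:]

```c
#include <stddef.h>
#include <string.h>

/* Two stand-in variants; the real glibc variants are hand-written
   assembly such as __memcpy_sse2_unaligned and __memcpy_avx_unaligned. */
void *my_memcpy_sse2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);   /* placeholder body */
}

void *my_memcpy_avx(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);   /* placeholder body */
}

/* The resolver runs once; the dynamic linker binds my_memcpy to
   whatever implementation pointer it returns, so later calls go
   straight to the chosen variant. */
static void *(*resolve_my_memcpy(void))(void *, const void *, size_t)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return my_memcpy_avx;
    return my_memcpy_sse2;
}

void *my_memcpy(void *dst, const void *src, size_t n)
    __attribute__((ifunc("resolve_my_memcpy")));
```

[Because selection happens in the resolver, swapping in a faster body for a variant -- as this patch does -- leaves the dispatch decision untouched, which is the point H.J. is making.]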
Perfect. Thanks for clarifying. In that case, as long as the ifunc choices
are as above, then it looks good to me.
I've responded again with a more detailed review.
--
Cheers,
Carlos.