This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- To: Steve Ellcey <sellcey at marvell dot com>, "anton dot youdkevitch at bell-sw dot com" <anton dot youdkevitch at bell-sw dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Fri, 5 Apr 2019 15:21:51 +0000
- Subject: Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- References: <5CA2145C.8040402@bell-sw.com> <77ad27c025004dc9f7668baea2d1af59e0d8d3a4.camel@marvell.com>
On 02/04/2019 23:48, Steve Ellcey wrote:
> On Mon, 2019-04-01 at 16:38 +0300, Anton Youdkevitch wrote:
>> Here is the updated patch for improving the long unaligned
>> code path (the one using "ext" instruction).
>>
>> 1. The always-taken conditional branch at the beginning is
>> removed.
>>
>> 2. Epilogue code is placed after the end of the loop to
>> reduce the number of branches.
>>
>> 3. The redundant "mov" instructions inside the loop are
>> gone due to the changed order of the registers in the "ext"
>> instructions inside the loop; the prologue has an additional
>> "ext" instruction.
>>
>> 4. Updating count in the prologue was hoisted out, as
>> it is the same update for each prologue.
>>
>> 5. Invariant code of the loop epilogue was hoisted out.
>>
>> 6. As the ext chunk is currently exactly 16 instructions
>> long, a "nop" was added at the beginning of the code
>> sequence so that the loop entry for all the chunks is
>> aligned.
>>
>> make check - no regression (on linux-aarch64)
>> make bench - no performance regressions (on Thunderx2)
>>
>> Looks OK?
>
> This looks good to me, Anton. I can check it in for you if we have a
> consensus that this version is OK and there are no objections.
Yes, this is OK to commit; I have no objections.
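
For readers unfamiliar with the "ext"-based path discussed in the quoted description, here is a minimal C sketch of the general idea. It is not the code from the patch. It models the AArch64 "ext" instruction as a byte-wise function and shows how alternating the operand order in an unrolled loop can avoid a register-to-register "mov" (point 3 above). The names vec16, ext16 and copy_shifted are hypothetical, and the loop shape is an assumption made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vector type standing in for a NEON q register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of "ext vd.16b, vn.16b, vm.16b, #shift": take 16 bytes starting
   at byte `shift` of the 32-byte concatenation lo:hi (lo is the low half).
   `shift` must be in the range 0..15. */
static vec16 ext16(vec16 lo, vec16 hi, unsigned shift)
{
    vec16 r;
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = i + shift;
        r.b[i] = (j < 16) ? lo.b[j] : hi.b[j - 16];
    }
    return r;
}

/* Copy nblocks * 16 bytes from (src_aligned + shift) to dst using only
   16-byte "aligned" loads from src_aligned. Reads nblocks + 1 source
   blocks in total. Illustration only; assumes 0 <= shift <= 15. */
static void copy_shifted(uint8_t *dst, const uint8_t *src_aligned,
                         unsigned shift, size_t nblocks)
{
    vec16 a, b, out;
    size_t i = 0;

    memcpy(&a, src_aligned, 16);                 /* prologue load */

    /* Unrolled by two: the roles of a and b swap between the two ext
       operations, so no "mov a, b" is needed to carry the last block. */
    for (; i + 2 <= nblocks; i += 2) {
        memcpy(&b, src_aligned + 16 * (i + 1), 16);
        out = ext16(a, b, shift);
        memcpy(dst + 16 * i, &out, 16);

        memcpy(&a, src_aligned + 16 * (i + 2), 16);
        out = ext16(b, a, shift);                /* operands swapped */
        memcpy(dst + 16 * (i + 1), &out, 16);
    }

    /* Epilogue for an odd block count. */
    if (i < nblocks) {
        memcpy(&b, src_aligned + 16 * (i + 1), 16);
        out = ext16(a, b, shift);
        memcpy(dst + 16 * i, &out, 16);
    }
}
```

In this sketch, a call such as copy_shifted(dst, src, 5, 3) copies 48 bytes starting at src + 5 while reading four whole 16-byte blocks from src. A real implementation also has to handle the head and tail bytes and must not read past the end of the source buffer.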