This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- From: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- To: Steve Ellcey <sellcey at marvell dot com>, "anton dot youdkevitch at bell-sw dot com" <anton dot youdkevitch at bell-sw dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Fri, 5 Apr 2019 15:21:51 +0000
- Subject: Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- References: <5CA2145C.firstname.lastname@example.org> <email@example.com>
On 02/04/2019 23:48, Steve Ellcey wrote:
> On Mon, 2019-04-01 at 16:38 +0300, Anton Youdkevitch wrote:
>> Here is the updated patch for improving the long unaligned
>> code path (the one using "ext" instruction).
>> 1. Always taken conditional branch at the beginning is
>> removed.
>> 2. Epilogue code is placed after the end of the loop to
>> reduce the number of branches.
>> 3. The redundant "mov" instructions inside the loop are
>> gone due to the changed order of the registers in the "ext"
>> instructions inside the loop; the prologue has an additional
>> "ext" instruction.
>> 4. Updating count in the prologue was hoisted out as
>> it is the same update for each prologue.
>> 5. Invariant code of the loop epilogue was hoisted out.
>> 6. As the current size of the ext chunk is exactly 16
>> instructions long, a "nop" was added at the beginning
>> of the code sequence so that the loop entry for all the
>> chunks is aligned.
>> make check - no regression (on linux-aarch64)
>> make bench - no performance regressions (on Thunderx2)
>> Looks OK?
> This looks good to me Anton. I can check it in for you if we have a
> consensus that this version is OK and there are no objections.
yes, this is OK to commit, i have no objections.
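
For readers unfamiliar with the "ext"-based path discussed above, here is a
minimal C sketch of the general idea: read the source only in whole 16-byte
blocks and use an "ext"-style extract on each adjacent pair of blocks to
recover the misaligned bytes. This is not the code from the patch. The names
vec16, ext_emul and copy_with_ext are hypothetical, and the byte loop only
emulates what a single AArch64 "ext" instruction does on 16-byte vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC = 16 };                      /* one 128-bit vector register */

typedef struct { uint8_t b[VEC]; } vec16;   /* hypothetical stand-in for a q register */

/* Emulates "ext vd.16b, vn.16b, vm.16b, #shift": take 16 consecutive bytes,
   starting at byte index `shift`, from the 32-byte concatenation lo:hi
   (lo supplies the low-numbered bytes). */
static vec16 ext_emul(vec16 lo, vec16 hi, unsigned shift)
{
    vec16 out;
    assert(shift < VEC);
    for (unsigned i = 0; i < VEC; i++) {
        unsigned j = i + shift;
        out.b[i] = (j < VEC) ? lo.b[j] : hi.b[j - VEC];
    }
    return out;
}

/* Copies nvec * 16 bytes starting at base + shift into dst, reading the
   source only as whole 16-byte blocks beginning at base. The caller must
   guarantee that nvec + 1 blocks starting at base are readable. */
static void copy_with_ext(uint8_t *dst, const uint8_t *base,
                          unsigned shift, size_t nvec)
{
    vec16 prev, next;
    memcpy(&prev, base, VEC);                    /* prologue: first block */
    for (size_t i = 0; i < nvec; i++) {
        memcpy(&next, base + (i + 1) * VEC, VEC);    /* block-sized load */
        vec16 out = ext_emul(prev, next, shift);     /* realign the bytes */
        memcpy(dst + i * VEC, &out, VEC);            /* block-sized store */
        /* This copy plays the role of the per-iteration "mov" that item 3
           of the patch description says was eliminated by reordering the
           registers used by "ext"; the sketch keeps it for simplicity. */
        prev = next;
    }
}
```

With shift = 5 and nvec = 3, for example, copy_with_ext reads four blocks
from base and writes the 48 bytes found at base + 5 .. base + 52 to dst.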