This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- From: Steve Ellcey <sellcey at marvell dot com>
- To: "anton dot youdkevitch at bell-sw dot com" <anton dot youdkevitch at bell-sw dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Tue, 2 Apr 2019 22:48:27 +0000
- Subject: Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path
- References: <5CA2145C.8040402@bell-sw.com>
On Mon, 2019-04-01 at 16:38 +0300, Anton Youdkevitch wrote:
> Here is the updated patch for improving the long unaligned
> code path (the one using "ext" instruction).
>
> 1. The always-taken conditional branch at the beginning is
> removed.
>
> 2. Epilogue code is placed after the end of the loop to
> reduce the number of branches.
>
> 3. The redundant "mov" instructions inside the loop are
> gone due to the changed order of the registers in the "ext"
> instructions inside the loop; the prologue has an additional
> "ext" instruction.
>
> 4. Updating count in the prologue was hoisted out as
> it is the same update for each prologue.
>
> 5. Invariant code of the loop epilogue was hoisted out.
>
> 6. As the current size of the ext chunk is exactly 16
> instructions, a "nop" was added at the beginning
> of the code sequence so that the loop entry for all the
> chunks is aligned.
>
> make check - no regression (on linux-aarch64)
> make bench - no performance regressions (on Thunderx2)
>
> Looks OK?
This looks good to me, Anton. I can check it in for you if we have a
consensus that this version is OK and there are no objections.
Steve Ellcey
sellcey@marvell.com
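
For readers following the thread, the idea behind the "ext"-based path
can be illustrated with a minimal C sketch. This is not the glibc code:
ext16 and copy_ext are hypothetical names, the byte loop only emulates
the usual semantics of "ext Vd.16b, Vn.16b, Vm.16b, #imm", and the
sketch assumes the source may be read in whole 16-byte blocks starting
at src_base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulates "ext Vd.16b, Vn.16b, Vm.16b, #shift" (hypothetical helper):
   the result is 16 consecutive bytes of the 32-byte concatenation of
   lo followed by hi, starting at byte index shift (0..15) of lo.  */
static void
ext16 (uint8_t out[16], const uint8_t lo[16], const uint8_t hi[16],
       unsigned shift)
{
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned j = i + shift;
      out[i] = j < 16 ? lo[j] : hi[j - 16];
    }
}

/* Hypothetical helper: copies nblocks * 16 bytes from src_base + shift
   to dst while reading the source only in 16-byte blocks that start at
   src_base.  It reads nblocks + 1 blocks from src_base.  */
static void
copy_ext (uint8_t *dst, const uint8_t *src_base, unsigned shift,
          size_t nblocks)
{
  uint8_t prev[16], next[16], out[16];

  /* Prologue: load the first source block.  */
  memcpy (prev, src_base, 16);

  for (size_t b = 0; b < nblocks; b++)
    {
      /* Load the next source block and combine it with the previous one.  */
      memcpy (next, src_base + 16 * (b + 1), 16);
      ext16 (out, prev, next, shift);
      memcpy (dst + 16 * b, out, 16);

      /* Carry the block forward.  This copy plays the role of the
         register "mov" that item 3 of the patch description says is
         avoided by reordering the registers in the "ext" instructions.  */
      memcpy (prev, next, 16);
    }
}
```

Calling copy_ext (dst, src, 5, 2) on a 48-byte (or larger) source copies
src[5..36] into dst[0..31]. The shift is fixed for the whole copy, which
is presumably why the patch description speaks of one "ext chunk" per
shift value.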