This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path


On Mon, 2019-04-01 at 16:38 +0300, Anton Youdkevitch wrote:
> Here is the updated patch for improving the long unaligned
> code path (the one using "ext" instruction).
> 
> 1. The always-taken conditional branch at the beginning is
> removed.
> 
> 2. Epilogue code is placed after the end of the loop to
> reduce the number of branches.
> 
> 3. The redundant "mov" instructions inside the loop are
> gone due to the changed order of the registers in the "ext"
> instructions inside the loop; the prologue has an additional
> "ext" instruction.
> 
> 4. Updating count in the prologue was hoisted out, as
> it is the same update for each prologue.
> 
> 5. Invariant code of the loop epilogue was hoisted out.
> 
> 6. As the ext chunk is currently exactly 16 instructions
> long, a "nop" was added at the beginning of the code
> sequence so that the loop entry for all the chunks is
> aligned.
> 
> make check - no regression (on linux-aarch64)
> make bench - no performance regressions (on Thunderx2)
> 
> Looks OK?

This looks good to me, Anton.  I can check it in for you if we have a
consensus that this version is OK and there are no objections.

Steve Ellcey
sellcey@marvell.com
