This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v5] aarch64: thunderx2 memcpy optimizations for ext-based code path


On 02/04/2019 23:48, Steve Ellcey wrote:
> On Mon, 2019-04-01 at 16:38 +0300, Anton Youdkevitch wrote:
>> Here is the updated patch for improving the long unaligned
>> code path (the one using "ext" instruction).
>>
>> 1. The always-taken conditional branch at the beginning is
>> removed.
>>
>> 2. Epilogue code is placed after the end of the loop to
>> reduce the number of branches.
>>
>> 3. The redundant "mov" instructions inside the loop are
>> gone due to the changed order of the registers in the "ext"
>> instructions inside the loop; the prologue has an additional
>> "ext" instruction.
>>
>> 4. Updating the count in the prologue was hoisted out, as
>> it is the same update for each prologue.
>>
>> 5. Invariant code of the loop epilogue was hoisted out.
>>
>> 6. As the ext chunk is currently exactly 16 instructions
>> long, a "nop" was added at the beginning of the code
>> sequence so that the loop entry for all the chunks is
>> aligned.
>>
>> make check - no regression (on linux-aarch64)
>> make bench - no performance regressions (on Thunderx2)
>>
>> Looks OK?
> 
> This looks good to me Anton.  I can check it in for you if we have a
> consensus that this version is OK and there are no objections.

Yes, this is OK to commit; I have no objections.
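
For readers following the thread without the patch at hand, the general idea behind an "ext"-based unaligned copy can be sketched in portable C. This is only an illustration under stated assumptions, not the code from the patch: ext16 and copy_ext_sketch are hypothetical names, ext16 merely emulates what the AArch64 "ext" instruction does on two 16-byte registers, and the caller is assumed to pass a source pointer already rounded down to a 16-byte boundary together with the byte offset ("shift") of the real data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for "ext Vd.16b, Vn.16b, Vm.16b, #shift":
   the result is bytes shift..15 of lo followed by bytes 0..shift-1 of hi. */
static void ext16(uint8_t out[16], const uint8_t lo[16],
                  const uint8_t hi[16], size_t shift)
{
    uint8_t pair[32];

    assert(shift < 16);
    memcpy(pair, lo, 16);
    memcpy(pair + 16, hi, 16);
    memcpy(out, pair + shift, 16);
}

/* Copy nchunks * 16 bytes starting at src_aligned + shift into dst,
   reading the source only in 16-byte blocks at aligned offsets.
   Assumption: src_aligned[0 .. 16 * (nchunks + 1)) is readable. */
static void copy_ext_sketch(uint8_t *dst, const uint8_t *src_aligned,
                            size_t shift, size_t nchunks)
{
    uint8_t prev[16], next[16], out[16];

    /* Prologue: first aligned block. */
    memcpy(prev, src_aligned, 16);

    for (size_t i = 0; i < nchunks; i++) {
        /* Next aligned block from the source. */
        memcpy(next, src_aligned + 16 * (i + 1), 16);
        /* Stitch the two blocks together at the byte offset. */
        ext16(out, prev, next, shift);
        memcpy(dst + 16 * i, out, 16);
        /* Carry the block over to the next iteration; kept here for
           clarity, a register-rotating loop could avoid this copy. */
        memcpy(prev, next, 16);
    }
}
```

In real assembly the shift has to be an immediate, which is presumably why the quoted description speaks of one code chunk per offset; the sketch above takes it as a run-time parameter purely for brevity.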
