This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On 5/1/2019 14:34, Szabolcs Nagy wrote:
> On 30/04/2019 13:40, Anton Youdkevitch wrote:
> > Now with the patch
> >
> > On Tue, Apr 30, 2019 at 03:37:32PM +0300, Anton Youdkevitch wrote:
> > > Here is the patch to make memmove use thunderx2 capabilities
> > > more efficiently. The performance improvement is about 20%-30%
> > > for larger cases and about 1%-5% for smaller cases.
>
> This or a similar statement about the performance improvement on
> thunderx2 should be added to the commit message.
Will do.
> > > Used SIMD load/store instead of GPR for overlapping forward move.
> > > Reused existing memcpy implementation for small or overlapping
> > > backward move.
> > >
> > > Fixed the existing memcpy implementation to allow it to deal with
> > > the overlapping case.
> > >
> > > Simplified loop tails in the memcpy implementation - use branchless
> > > overlapping sequence of fixed length load/stores instead of
> > > branching depending on the size.
> > >
> > > Fixed some missing optimization mainly wrt ldr/str to ldp/stp
> > > conversion.
> > >
> > > Added __memmove_thunderx2 to the list of the available
> > > implementations.
> > >
> > > make check on linux/aarch64 - no regressions
> > > make bench on thunderx2 - improvements
> > >
> > > Looks OK?
> > >
> > > 	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Added
> > > 	__memmove_thunderx2 to the list of implementations
> > > 	* sysdeps/aarch64/multiarch/memmove.c: Likewise
> > > 	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S:
> > > 	(__memmove_thunderx2): rewritten using SIMD ld/st
> > > 	(__memcpy_thunderx2): fixed to handle overlapping cases
>
> This is ok to commit with the commit message fixed.
OK, thanks!
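
For readers of the archive, the "branchless overlapping sequence of fixed length load/stores" mentioned in the patch description can be sketched in C roughly as follows. This is only an illustration of the general idea, not the code from memcpy_thunderx2.S: the helper name copy_16_to_32 and the chunk16 type (standing in for one 128-bit SIMD register) are hypothetical, and the 16..32 byte range is an assumed example size class. Because both chunks are loaded before anything is stored, the same sequence also gives memmove semantics for overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit SIMD register. */
typedef struct { uint8_t b[16]; } chunk16;

/* Copy n bytes, 16 <= n <= 32, with no branch on the exact size.
   The first and the last 16 bytes are copied; for n < 32 the two
   chunks overlap in the middle, which is harmless.  All loads are
   done before any store, so overlapping src/dst is handled too. */
static void copy_16_to_32(void *dst, const void *src, size_t n)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    chunk16 head, tail;

    assert(n >= 16 && n <= 32);

    memcpy(&head, s, 16);            /* load first 16 bytes */
    memcpy(&tail, s + n - 16, 16);   /* load last 16 bytes  */
    memcpy(d, &head, 16);            /* store first 16 bytes */
    memcpy(d + n - 16, &tail, 16);   /* store last 16 bytes  */
}
```

A loop tail of anywhere from 16 to 32 leftover bytes could be finished with a single call to such a helper instead of a chain of size-dependent branches.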