This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH v2] aarch64: thunderx2 memmove performance improvements

From: Anton Youdkevitch <anton dot youdkevitch at bell-sw dot com>
To: Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
Cc: nd <nd at arm dot com>
Date: Fri, 3 May 2019 00:24:21 +0300
Subject: Re: [PATCH v2] aarch64: thunderx2 memmove performance improvements
References: <20190430123731.GA13500@bell-sw.com> <20190430124042.GB13500@bell-sw.com> <812e6b8a-2d3f-ddac-bc3b-2c8fd17a5daa@arm.com>

On 5/1/2019 14:34, Szabolcs Nagy wrote:

On 30/04/2019 13:40, Anton Youdkevitch wrote:

Now with the patch

On Tue, Apr 30, 2019 at 03:37:32PM +0300, Anton Youdkevitch wrote:

Here is the patch to make memmove use thunderx2
capabilities more efficiently.

The performance improvement is about 20%-30% for
larger cases and about 1%-5% for smaller cases.


This or a similar statement about the performance
improvement on thunderx2 should be added to the
commit message.

Will do.


Used SIMD load/store instead of GPR for overlapping
forward move.

Reused existing memcpy implementation for small or
overlapping backward move.

Fixed the existing memcpy implementation to allow it
to deal with the overlapping case.
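
For readers unfamiliar with why the copy direction matters, here is a
minimal C sketch of the usual memmove direction check. It is not taken
from the patch, which is written in aarch64 assembly; the function name
move_bytes and the byte loops are illustrative assumptions only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not the patch's code: pick the copy
   direction from the relative position of the two buffers. */
static void *move_bytes(void *dstp, const void *srcp, size_t n)
{
    unsigned char *dst = dstp;
    const unsigned char *src = srcp;

    /* Unsigned wrap-around: the difference is below n only when dst lies
       inside [src, src + n), the one case where a forward copy would
       overwrite source bytes before reading them. */
    if ((uintptr_t)dst - (uintptr_t)src >= n) {
        for (size_t i = 0; i < n; i++)      /* forward, memcpy-like path */
            dst[i] = src[i];
    } else {
        for (size_t i = n; i > 0; i--)      /* backward, from the end */
            dst[i - 1] = src[i - 1];
    }
    return dstp;
}
```

A single unsigned comparison covers both the non-overlapping case and
the case where dst sits below src, so only one branch separates the
two paths.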

Simplified loop tails in the memcpy implementation:
use a branchless overlapping sequence of fixed-length
load/stores instead of branching depending on the
size.
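
The branchless overlapping tail can be sketched in C roughly as below.
This is an illustration of the general technique under stated
assumptions, not the assembly from the patch: the 16-byte chunk type
and the name copy_16_to_32 are hypothetical, and the memcpy calls into
local temporaries stand in for SIMD register loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One 16-byte chunk, standing in for a SIMD register (hypothetical type). */
typedef struct { unsigned char b[16]; } chunk16;

/* Copy n bytes, 16 <= n <= 32, with no size-dependent branches.
   The second chunk is anchored at the end of the buffer, so for n < 32
   it overlaps the first one instead of needing a byte-wise tail loop. */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    chunk16 head, tail;

    assert(n >= 16 && n <= 32);

    /* Load everything before storing anything; this also keeps the copy
       correct when dst and src overlap. */
    memcpy(&head, src, 16);
    memcpy(&tail, src + n - 16, 16);

    memcpy(dst, &head, 16);
    memcpy(dst + n - 16, &tail, 16);
}
```

Some bytes in the middle are written twice with the same value, which
trades a little redundant work for the absence of a branch on the size.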

Added some missing optimizations, mainly converting
ldr/str pairs to ldp/stp.

Added __memmove_thunderx2 to the list of the
available implementations.
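
As a rough picture of what a list of available implementations is for,
the following C sketch selects a routine from a table at run time. It
does not use glibc's actual ifunc macros or data structures; struct
impl, select_memmove, the usable flag and both stand-in routines are
hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memmove_fn)(void *, const void *, size_t);

/* Hypothetical stand-ins for a CPU-tuned and a generic routine; both
   simply forward to the C library here. */
static void *memmove_tuned(void *d, const void *s, size_t n)
{
    return memmove(d, s, n);
}

static void *memmove_generic(void *d, const void *s, size_t n)
{
    return memmove(d, s, n);
}

/* One entry per candidate; usable says whether the running CPU
   qualifies (an assumption of this sketch). */
struct impl {
    const char *name;
    int usable;
    memmove_fn fn;
};

/* Return the first usable entry, falling back to the generic routine. */
static memmove_fn select_memmove(const struct impl *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (list[i].usable)
            return list[i].fn;
    return memmove_generic;
}
```

Listing every variant in one table also lets a test or benchmark
harness iterate over all of them, not only the one selected for the
current CPU.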


make check on linux/aarch64 - no regressions
make bench on thunderx2     - improvements

Looks OK?

* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Add
   __memmove_thunderx2 to the list of implementations.
* sysdeps/aarch64/multiarch/memmove.c: Likewise.
* sysdeps/aarch64/multiarch/memcpy_thunderx2.S
   (__memmove_thunderx2): Rewrite using SIMD ld/st.
   (__memcpy_thunderx2): Fix to handle overlapping cases.


This is ok to commit with the commit message fixed.

OK, thanks!

