This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
[PATCH 0/2] aarch64,falkor: memcpy/memmove performance improvements
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: libc-alpha at sourceware dot org
- Date: Thu, 3 May 2018 23:22:07 +0530
- Subject: [PATCH 0/2] aarch64,falkor: memcpy/memmove performance improvements
Hi,
Here are a couple of patches to improve the performance of the falkor
memcpy and memmove implementations, based on testing on the latest
hardware. The theme of the optimization is to avoid trying to train the
hardware prefetcher for smaller sizes and in the loop tail, since that
just mis-trains the prefetcher. Instead, use multiple registers to aid
reordering wherever possible. Testing showed that the regressions at
these sizes compared to generic memcpy are resolved with these patches.
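
To illustrate the "multiple registers" idea, here is a rough C sketch and
not the code in the patches, which are written in AArch64 assembly. The
function name copy32_multi_reg and the fixed 32-byte size are made up
for this example. It issues all loads into separate temporaries before
any store, so the loads do not depend on each other. A C compiler is
free to schedule this differently, so treat it as a picture of the
access pattern only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not part of glibc.
   Copies exactly 32 bytes using four independent temporaries.  */
static void
copy32_multi_reg (void *dst, const void *src)
{
  const unsigned char *s = src;
  unsigned char *d = dst;
  uint64_t a, b, c, e;

  /* Issue every load first.  Each one targets a different temporary,
     so no load has to wait for an earlier one to complete.  */
  memcpy (&a, s, 8);
  memcpy (&b, s + 8, 8);
  memcpy (&c, s + 16, 8);
  memcpy (&e, s + 24, 8);

  /* Store only after all loads are done.  As a side effect this makes
     the copy safe for overlapping buffers, as memmove requires.  */
  memcpy (d, &a, 8);
  memcpy (d + 8, &b, 8);
  memcpy (d + 16, &c, 8);
  memcpy (d + 24, &e, 8);
}
```

Using one temporary per load is the C analogue of using distinct
destination registers in assembly: nothing serializes the loads, so the
core has room to reorder them.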
Siddhesh
Siddhesh Poyarekar (2):
  aarch64,falkor: Ignore prefetcher hints for memmove tail
  Ignore prefetcher tagging for smaller copies

 sysdeps/aarch64/multiarch/memcpy_falkor.S  | 68 ++++++++++++++++++------------
 sysdeps/aarch64/multiarch/memmove_falkor.S | 48 ++++++++++++---------
 2 files changed, 70 insertions(+), 46 deletions(-)
--
2.14.3