This is the mail archive of the
glibc-cvs@sourceware.org
mailing list for the glibc project.
GNU C Library master sources branch master updated. glibc-2.28.9000-305-g5770c0a
- From: wilco at sourceware dot org
- To: glibc-cvs at sourceware dot org
- Date: 20 Nov 2018 12:38:40 -0000
- Subject: GNU C Library master sources branch master updated. glibc-2.28.9000-305-g5770c0a
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 5770c0ad1e0c784e817464ca2cf9436a58c9beb7 (commit)
from 9a62a9397d0a25643922d8d053f04ee895100d9a (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=5770c0ad1e0c784e817464ca2cf9436a58c9beb7
commit 5770c0ad1e0c784e817464ca2cf9436a58c9beb7
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date: Tue Nov 20 12:37:00 2018 +0000
[AArch64] Adjust writeback in non-zero memset
This fixes an inefficiency in the non-zero memset. Delaying the writeback
until the end of the loop is slightly faster on some cores - this shows a
~5% performance gain on Cortex-A53 when doing large non-zero memsets.
* sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
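The address arithmetic behind the change can be checked with a small C model. This is only an illustrative sketch, not part of the commit: the helper names (stp32, loop_old, loop_new, loops_agree, check_loops), the buffer size and the fill value are all hypothetical, and the model says nothing about performance. It shows that the post-index form used before the patch and the pre-index form used after it write the same bytes, and that the new loop leaves dst 32 bytes lower than the old one. That offset is the reason the ZVA path subtracts 32 from dst before branching to L(tail64).

```c
#include <assert.h>
#include <string.h>

enum { PAD = 64, BUF_SIZE = 1024 };   /* hypothetical sizes for the model */

/* Stand-in for one `stp q0, q0, [addr]`: writes 32 bytes at offset off. */
static void stp32(unsigned char *buf, long off, unsigned char val)
{
    memset(buf + off, val, 32);
}

/* Loop before the patch: writeback happens on the first store. */
static long loop_old(unsigned char *buf, long dst, int iters, unsigned char val)
{
    dst += 16;                       /* add dst, dst, 16 */
    for (int i = 0; i < iters; i++) {
        stp32(buf, dst, val);        /* stp q0, q0, [dst], 64 (store, then dst += 64) */
        dst += 64;
        stp32(buf, dst - 32, val);   /* stp q0, q0, [dst, -32] */
    }
    return dst;
}

/* Loop after the patch: writeback is delayed to the second store. */
static long loop_new(unsigned char *buf, long dst, int iters, unsigned char val)
{
    dst -= 16;                       /* sub dst, dst, 16 (dst biased by -32) */
    for (int i = 0; i < iters; i++) {
        stp32(buf, dst + 32, val);   /* stp q0, q0, [dst, 32] */
        dst += 64;                   /* stp q0, q0, [dst, 64]! (dst += 64, then store) */
        stp32(buf, dst, val);
    }
    return dst;
}

/* Returns 1 if both loops write identical bytes and the final dst of the
   old loop is exactly 32 above the final dst of the new loop. */
static int loops_agree(int iters)
{
    unsigned char a[BUF_SIZE], b[BUF_SIZE];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    long end_old = loop_old(a, PAD, iters, 0xAB);
    long end_new = loop_new(b, PAD, iters, 0xAB);
    return memcmp(a, b, sizeof a) == 0 && end_old - end_new == 32;
}

/* Convenience wrapper that aborts if the two loops ever disagree. */
static void check_loops(int iters)
{
    assert(loops_agree(iters));
}
```

Calling check_loops(n) for a few small n (for example 1, 4 and 10, which stay inside the 1024-byte model buffer) exercises both variants side by side.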
diff --git a/ChangeLog b/ChangeLog
index d340866..be23442 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2018-11-20 Wilco Dijkstra <wdijkstr@arm.com>
+
+ * sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
+
2018-11-20 Joseph Myers <joseph@codesourcery.com>
* conform/conformtest.py (ElementTest.run): Use unique identifiers
diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S
index 4a45459..9738cf5 100644
--- a/sysdeps/aarch64/memset.S
+++ b/sysdeps/aarch64/memset.S
@@ -89,10 +89,10 @@ L(set_long):
 	b.eq	L(try_zva)
 L(no_zva):
 	sub	count, dstend, dst	/* Count is 16 too large. */
-	add	dst, dst, 16
+	sub	dst, dst, 16		/* Dst is biased by -32. */
 	sub	count, count, 64 + 16	/* Adjust count and bias for loop. */
-1:	stp	q0, q0, [dst], 64
-	stp	q0, q0, [dst, -32]
+1:	stp	q0, q0, [dst, 32]
+	stp	q0, q0, [dst, 64]!
 L(tail64):
 	subs	count, count, 64
 	b.hi	1b
@@ -183,6 +183,7 @@ L(zva_other):
 	subs	count, count, zva_len
 	b.hs	3b
 4:	add	count, count, zva_len
+	sub	dst, dst, 32		/* Bias dst for tail loop. */
 	b	L(tail64)
 #endif
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 4 ++++
sysdeps/aarch64/memset.S | 7 ++++---
2 files changed, 8 insertions(+), 3 deletions(-)
hooks/post-receive
--
GNU C Library master sources