This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.



GNU C Library master sources branch master updated. glibc-2.28.9000-305-g5770c0a


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  5770c0ad1e0c784e817464ca2cf9436a58c9beb7 (commit)
      from  9a62a9397d0a25643922d8d053f04ee895100d9a (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=5770c0ad1e0c784e817464ca2cf9436a58c9beb7

commit 5770c0ad1e0c784e817464ca2cf9436a58c9beb7
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date:   Tue Nov 20 12:37:00 2018 +0000

    [AArch64] Adjust writeback in non-zero memset
    
    This fixes an inefficiency in the non-zero memset.  Delaying the writeback
    until the end of the loop is slightly faster on some cores; this shows a
    ~5% performance gain on Cortex-A53 when doing large non-zero memsets.
    
    	* sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
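
In AArch64 terms, "writeback" is the base-register update done by a post-index store (`[dst], 64`) or a pre-index store (`[dst, 64]!`). The C sketch below is illustrative only. The helpers `stp_offset`, `stp_post_index`, `stp_pre_index`, `old_iteration`, `new_iteration` and `check_same_addresses` are hypothetical and not glibc code. They model which address each `stp q0, q0` in the loop body targets and when `dst` is updated. They do not model timing.

```c
#include <assert.h>

/* Hypothetical model (not glibc code) of the STP addressing forms in this
   diff.  A base register is a long; each helper returns the address at
   which the 32-byte q0, q0 pair would be stored and applies any writeback. */

/* stp q0, q0, [base, imm]   signed offset: base is left unchanged */
static long stp_offset(long base, long imm) {
    return base + imm;
}

/* stp q0, q0, [base], imm   post-index: store at base, then base += imm */
static long stp_post_index(long *base, long imm) {
    long addr = *base;
    *base += imm;
    return addr;
}

/* stp q0, q0, [base, imm]!  pre-index: base += imm, then store at base */
static long stp_pre_index(long *base, long imm) {
    *base += imm;
    return *base;
}

/* Old loop body: the writeback is on the first store, so the second store
   is addressed from the already-updated dst. */
static void old_iteration(long *dst, long out[2]) {
    out[0] = stp_post_index(dst, 64);   /* stp q0, q0, [dst], 64  */
    out[1] = stp_offset(*dst, -32);     /* stp q0, q0, [dst, -32] */
}

/* New loop body: both stores are addressed from the incoming dst and the
   writeback is delayed to the last store. */
static void new_iteration(long *dst, long out[2]) {
    out[0] = stp_offset(*dst, 32);      /* stp q0, q0, [dst, 32]  */
    out[1] = stp_pre_index(dst, 64);    /* stp q0, q0, [dst, 64]! */
}

/* With dst biased by -32 in the new form, one iteration of each body must
   store to the same two addresses and keep the -32 bias afterwards. */
static void check_same_addresses(long old_dst) {
    long d_old = old_dst, d_new = old_dst - 32, a[2], b[2];
    old_iteration(&d_old, a);
    new_iteration(&d_new, b);
    assert(a[0] == b[0] && a[1] == b[1]);
    assert(d_new == d_old - 32);
}
```

In this model the old body's second store depends on the `dst` value just written back by the first store. In the new body, both stores are computed from the `dst` value at the top of the iteration. This is why the patched code starts with `dst` 32 bytes lower.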

diff --git a/ChangeLog b/ChangeLog
index d340866..be23442 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2018-11-20  Wilco Dijkstra  <wdijkstr@arm.com>
+
+	* sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
+
 2018-11-20  Joseph Myers  <joseph@codesourcery.com>
 
 	* conform/conformtest.py (ElementTest.run): Use unique identifiers
diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S
index 4a45459..9738cf5 100644
--- a/sysdeps/aarch64/memset.S
+++ b/sysdeps/aarch64/memset.S
@@ -89,10 +89,10 @@ L(set_long):
 	b.eq	L(try_zva)
 L(no_zva):
 	sub	count, dstend, dst	/* Count is 16 too large.  */
-	add	dst, dst, 16
+	sub	dst, dst, 16		/* Dst is biased by -32.  */
 	sub	count, count, 64 + 16	/* Adjust count and bias for loop.  */
-1:	stp	q0, q0, [dst], 64
-	stp	q0, q0, [dst, -32]
+1:	stp	q0, q0, [dst, 32]
+	stp	q0, q0, [dst, 64]!
 L(tail64):
 	subs	count, count, 64
 	b.hi	1b
@@ -183,6 +183,7 @@ L(zva_other):
 	subs	count, count, zva_len
 	b.hs	3b
 4:	add	count, count, zva_len
+	sub	dst, dst, 32		/* Bias dst for tail loop.  */
 	b	L(tail64)
 #endif
 

-----------------------------------------------------------------------
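
The following is a minimal sketch that checks whether both hunks preserve addresses. It assumes addresses can be modelled as plain integers. Under that assumption, the old and new loop bodies store to the same 32-byte slots in the same order on two paths:

- the `L(no_zva)` path, given the new `sub dst, dst, 16` entry;
- the ZVA exit path, given the new `sub dst, dst, 32` before `b L(tail64)`.

All names (`store_log`, `stp32`, `body`, `no_zva`, `zva_exit`, `same_stores`) are hypothetical. The test `count > 0` after subtracting stands in for `b.hi`, which matches only while `count` stays non-negative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C model (not glibc code) of the memset.S loop before and
   after this commit.  Each stp q0, q0 is one 32-byte store whose start
   address is appended to a log.  The final stores to dstend - 64 and
   dstend - 32 after L(tail64) do not depend on dst and are omitted. */

#define MAX_STORES 512

typedef struct {
    long addr[MAX_STORES];
    int n;
} store_log;

static void stp32(store_log *log, long addr) {
    assert(log->n < MAX_STORES);
    log->addr[log->n++] = addr;
}

/* The loop body labelled 1: in the diff. */
static void body(store_log *log, long *dst, int patched) {
    if (!patched) {
        stp32(log, *dst);        /* stp q0, q0, [dst], 64: store, dst += 64 */
        *dst += 64;
        stp32(log, *dst - 32);   /* stp q0, q0, [dst, -32] */
    } else {
        stp32(log, *dst + 32);   /* stp q0, q0, [dst, 32] */
        *dst += 64;              /* stp q0, q0, [dst, 64]!: dst += 64, store */
        stp32(log, *dst);
    }
}

/* L(no_zva): dst is the 16-byte-aligned start and dstend is the end of the
   buffer.  Execution falls through into the body before the first subs. */
static void no_zva(store_log *log, long dst, long dstend, int patched) {
    long count = dstend - dst;            /* sub count, dstend, dst */
    dst += patched ? -16 : 16;            /* old: add 16; new: sub 16 */
    count -= 64 + 16;                     /* sub count, count, 64 + 16 */
    do {
        body(log, &dst, patched);
        count -= 64;                      /* L(tail64): subs count, count, 64 */
    } while (count > 0);                  /* b.hi 1b (assumes count >= 0) */
}

/* Label 4: after the ZVA loop, dst is the next byte to write and the code
   branches straight to L(tail64), skipping the first body. */
static void zva_exit(store_log *log, long dst, long count, int patched) {
    if (patched)
        dst -= 32;                        /* new: sub dst, dst, 32 */
    while ((count -= 64) > 0)             /* L(tail64): subs; b.hi 1b */
        body(log, &dst, patched);
}

static int same_stores(const store_log *a, const store_log *b) {
    return a->n == b->n &&
           memcmp(a->addr, b->addr, (size_t)a->n * sizeof a->addr[0]) == 0;
}
```

For example, running `no_zva` in both forms over a 1000-byte span from an aligned address produces identical store logs. Dropping the `dst -= 32` in `zva_exit` would shift every tail store in the patched form by 32 bytes. This models why the second hunk is needed.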

Summary of changes:
 ChangeLog                |    4 ++++
 sysdeps/aarch64/memset.S |    7 ++++---
 2 files changed, 8 insertions(+), 3 deletions(-)


hooks/post-receive
-- 
GNU C Library master sources

