This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory




On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
> From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>

> I made the changes I requested, updated copyright entries, added a
> manual entry and fixed a build issue on powerpc64.

> --- 8< ---

> On POWER8, unaligned memory accesses to cached memory have little impact
> on performance, unlike on earlier POWER processors.

> This optimization is disabled by default and is only selected when the
> tunable glibc.tune.cached_memopt is set to 1.

>                   __memcpy_power8_cached      __memcpy_power7
> ============================================================
>      max-size=4096:     33325.70 ( 12.65%)        38153.00
>      max-size=8192:     32878.20 ( 11.17%)        37012.30
>     max-size=16384:     33782.20 ( 11.61%)        38219.20
>     max-size=32768:     33296.20 ( 11.30%)        37538.30
>     max-size=65536:     33765.60 ( 10.53%)        37738.40
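
The percentages in parentheses equal the relative reduction of the POWER8 timing against the __memcpy_power7 baseline, (power7 - power8_cached) / power7; lower values are better, as the positive percentages next to the smaller POWER8 numbers indicate. The sketch below recomputes them from the table to make that formula explicit. The struct, the table array and both function names are hypothetical and not part of glibc or its benchtests.

```c
#include <assert.h>
#include <stddef.h>

/* Timings from the table above (lower is better); reported_pct is the
   percentage printed in parentheses.  */
struct bench_row
{
  unsigned max_size;
  double power8_cached;   /* __memcpy_power8_cached */
  double power7;          /* __memcpy_power7, the baseline */
  double reported_pct;
};

static const struct bench_row rows[] = {
  {  4096, 33325.70, 38153.00, 12.65 },
  {  8192, 32878.20, 37012.30, 11.17 },
  { 16384, 33782.20, 38219.20, 11.61 },
  { 32768, 33296.20, 37538.30, 11.30 },
  { 65536, 33765.60, 37738.40, 10.53 },
};

/* Percentage by which new_time is lower than baseline.  */
static double
improvement_pct (double new_time, double baseline)
{
  assert (baseline > 0.0);
  return (baseline - new_time) / baseline * 100.0;
}

/* Return 1 if every reported percentage matches the recomputed value
   after rounding to two decimal places (tolerance 0.005), else 0.  */
static int
table_is_consistent (void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    {
      double diff = improvement_pct (rows[i].power8_cached, rows[i].power7)
                    - rows[i].reported_pct;
      if (diff < -0.005 || diff > 0.005)
        return 0;
    }
  return 1;
}
```

Calling table_is_consistent() from a small test driver returns 1 for the five rows above; for example, the first row gives (38153.00 - 33325.70) / 38153.00, about 12.65%.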

> 2017-12-08  Adhemerval Zanella  <azanella@linux.vnet.ibm.com>
> 	    Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>

> 	* manual/tunables.texi (Hardware Capability Tunables): Document
> 	glibc.tune.cached_memopt.
> 	* sysdeps/powerpc/cpu-features.c: New file.
> 	* sysdeps/powerpc/cpu-features.h: New file.
> 	* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
> 	_dl_powerpc_cpu_features.
> 	* sysdeps/powerpc/dl-tunables.list: New file.
> 	* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
> 	* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .

The description for this entry is missing.
> 	* sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
> 	use_aligned_memopt.

Should this be moved to init-arch.h? (The same applies to use_cached_memopt.)
> 	* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
> 	Add memcpy-power8-cached.
> 	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
> 	__memcpy_power8_cached.
> 	* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
> 	* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
> 	New file.
> ---
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
> new file mode 100644
> index 0000000..e5b6f25
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
> @@ -0,0 +1,179 @@
> +	stxvd2x	v0,r0,r3
> +L(dst_is_align_16):
> +	cmpldi	cr7,r5,127
> +	ble	cr7,L(tail_copy)
> +	addi	r8,r5,-128
> +	mr	r9,r12
> +	rldicr	r8,r8,0,56
> +	li	r11,16
> +	srdi	r10,r8,7
> +	addi	r0,r8,128
> +	addi	r10,r10,1

Can we directly do

	rldicr	r0,r5,0,56
	srdi	r10,r5,7

instead of this sequence (lines 79, 81 and 83 to 85)?

	addi	r8,r5,-128
	rldicr	r8,r8,0,56
	srdi	r10,r8,7
	addi	r0,r8,128
	addi	r10,r10,1
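
The two sequences agree whenever r5 is at least 128, which the preceding `cmpldi cr7,r5,127` / `ble cr7,L(tail_copy)` pair guarantees on this path. `rldicr rX,rY,0,56` keeps IBM bits 0 to 56, i.e. it clears the low 7 bits (rY & ~127), and subtracting 128 does not change those low bits, so (r5 - 128) & ~127 equals (r5 & ~127) - 128. That gives r0 = r5 & ~127 and r10 = r5 >> 7 either way. The C model below checks this over a range of lengths; it is only a sketch, it covers r0 and r10 alone, and it assumes r8 is not read later in the function, which the quoted excerpt does not show. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values that feed the 128-byte main loop.  */
struct loop_setup
{
  uint64_t r0;    /* bytes covered by the main loop */
  uint64_t r10;   /* iteration count later moved into CTR */
};

/* The sequence in the patch, for a length r5 >= 128.  */
static struct loop_setup
setup_original (uint64_t r5)
{
  uint64_t r8 = r5 - 128;             /* addi   r8,r5,-128 */
  r8 &= ~(uint64_t) 127;              /* rldicr r8,r8,0,56 */
  uint64_t r10 = r8 >> 7;             /* srdi   r10,r8,7 */
  struct loop_setup s;
  s.r0 = r8 + 128;                    /* addi   r0,r8,128 */
  s.r10 = r10 + 1;                    /* addi   r10,r10,1 */
  return s;
}

/* The two-instruction sequence suggested in the review.  */
static struct loop_setup
setup_suggested (uint64_t r5)
{
  struct loop_setup s;
  s.r0 = r5 & ~(uint64_t) 127;        /* rldicr r0,r5,0,56 */
  s.r10 = r5 >> 7;                    /* srdi   r10,r5,7 */
  return s;
}

/* Compare both sequences for every length in [128, limit).
   Returns 1 if they always agree, 0 otherwise.  */
static int
sequences_agree (uint64_t limit)
{
  for (uint64_t n = 128; n < limit; n++)
    {
      struct loop_setup a = setup_original (n);
      struct loop_setup b = setup_suggested (n);
      if (a.r0 != b.r0 || a.r10 != b.r10)
        return 0;
      /* The loop covers r10 * 128 == r0 bytes; fewer than 128 remain
         for the tail copy.  */
      assert (b.r10 * 128 == b.r0 && n - b.r0 < 128);
    }
  return 1;
}
```

For a 300-byte copy both versions yield r0 = 256 and r10 = 2, leaving 44 bytes for the tail; the suggested form simply reaches the same values in two instructions instead of five.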

> +	li	r6,32
> +	mtctr	r10
> +	li	r7,48
> +
> +	/* Main loop, copy 128 bytes each time.  */
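
The overall shape of the code path quoted above (copies of 127 bytes or less go straight to L(tail_copy); otherwise CTR holds n >> 7 and each main-loop iteration moves 128 bytes, leaving n & 127 bytes for the tail) can be modelled in plain C. This is a hypothetical sketch for reasoning about the loop bounds only: it ignores the destination-alignment prologue and the VSX loads and stores, and the 16-byte chunking inside each iteration is purely illustrative, not a claim about how the assembly splits the block. Function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes as (n >> 7) blocks of 128 bytes, then a 0..127 byte tail.  */
static void *
copy_blocks_then_tail (void *restrict dst, const void *restrict src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n > 127)                        /* cmpldi cr7,r5,127; ble -> tail */
    {
      size_t iterations = n >> 7;     /* value placed in CTR */
      for (size_t i = 0; i < iterations; i++)
        {
          /* One main-loop iteration: 128 bytes, here in 16-byte chunks.  */
          for (size_t off = 0; off < 128; off += 16)
            memcpy (d + off, s + off, 16);
          d += 128;
          s += 128;
        }
      n &= 127;                       /* bytes left for the tail copy */
    }

  memcpy (d, s, n);                   /* tail copy */
  return dst;
}

/* Check the model for every length 0..max_n (max_n <= 1024): the copied
   bytes must match and nothing past n may be written.  */
static int
model_matches_memcpy (size_t max_n)
{
  static unsigned char src[1024], dst[1024];
  if (max_n > sizeof src)
    return 0;
  for (size_t i = 0; i < sizeof src; i++)
    src[i] = (unsigned char) (i * 7 + 3);
  for (size_t n = 0; n <= max_n; n++)
    {
      memset (dst, 0, sizeof dst);
      copy_blocks_then_tail (dst, src, n);
      if (memcmp (dst, src, n) != 0)
        return 0;
      for (size_t i = n; i < sizeof dst; i++)
        if (dst[i] != 0)
          return 0;
    }
  return 1;
}
```

Splitting the length this way keeps the hot loop free of per-iteration length checks; only the final partial block needs the byte-granular tail path.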

LGTM.

Reviewed-by: Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>


--
Thanks
Rajalakshmi S

