This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCHv2] powerpc: P9 vector load instruction change in memcpy and memmove
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org, adhemerval dot zanella at linaro dot org
- Cc: raji at linux dot vnet dot ibm dot com
- Date: Thu, 19 Oct 2017 16:20:56 -0200
- Subject: [PATCHv2] powerpc: P9 vector load instruction change in memcpy and memmove
From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> According to "POWER8 Processor User’s Manual for the Single-Chip Module"
> (it is buried behind a sign-in wall at [1]), both lxvd2x/lvx and stxvd2x/stvx
> use the same pipeline and have the same latency and throughput. The
> only difference is that lxvd2x/stxvd2x have microcode handling for the
> unaligned case and for 4K-crossing or 32-byte-crossing L1 misses (which
> should not occur with aligned addresses).
>
> Why not change the POWER7 implementation instead of dropping in another
> one that is exactly the same for POWER9?
We're trying to limit the impact of this requirement on other processors so
that newer P7 or P8 optimizations can still benefit from lxvd2x and stxvd2x.
However, we can avoid duplicating source code with the LVX and STVX macros
that I propose here in version 2.
That way, we postpone the copy until (and unless) a P7-specific optimization
is contributed.
Do you think this is better?
--- 8< ---
POWER9 DD2.1 and earlier have an issue where some cache-inhibited
vector loads trap to the kernel, causing a performance degradation. To
handle this in memcpy and memmove, lvx/stvx are used for aligned
addresses instead of lxvd2x/stxvd2x. The rest of the optimization is
the same as the existing POWER7 code.
Reference: https://patchwork.ozlabs.org/patch/814059/
Tested on powerpc64le.
2017-10-19 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/multiarch/Makefile
(sysdep_routines): Add memcpy_power9 and memmove_power9.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
(memcpy): Add __memcpy_power9 to list of memcpy functions.
(memmove): Add __memmove_power9 to list of memmove functions.
(bcopy): Add __bcopy_power9 to list of bcopy functions.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c
(memcpy): Add __memcpy_power9 to ifunc list.
* sysdeps/powerpc/powerpc64/power9/memcpy.S: New file.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/bcopy.c
(bcopy): Add __bcopy_power9 to ifunc list.
* sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
Change bcopy to __bcopy.
* sysdeps/powerpc/powerpc64/multiarch/memmove.c
(memmove): Add __memmove_power9 to ifunc list.
* sysdeps/powerpc/powerpc64/power7/memcpy.S (LVX, STVX): New
macros to help reuse this code on POWER9.
* sysdeps/powerpc/powerpc64/power7/memmove.S:
Alias bcopy only if not defined before.
(LVX, STVX): New macros to help reuse this code on POWER9.
* sysdeps/powerpc/powerpc64/multiarch/memmove-power9.S:
New file.
* sysdeps/powerpc/powerpc64/power9/memmove.S: Likewise.
---
sysdeps/powerpc/powerpc64/multiarch/Makefile | 7 +-
sysdeps/powerpc/powerpc64/multiarch/bcopy.c | 6 +-
.../powerpc/powerpc64/multiarch/ifunc-impl-list.c | 6 +
.../powerpc/powerpc64/multiarch/memcpy-power9.S | 26 ++++
sysdeps/powerpc/powerpc64/multiarch/memcpy.c | 3 +
.../powerpc/powerpc64/multiarch/memmove-power7.S | 4 +-
.../powerpc/powerpc64/multiarch/memmove-power9.S | 29 +++++
sysdeps/powerpc/powerpc64/multiarch/memmove.c | 5 +-
sysdeps/powerpc/powerpc64/power7/memcpy.S | 68 ++++++-----
sysdeps/powerpc/powerpc64/power7/memmove.S | 134 +++++++++++----------
sysdeps/powerpc/powerpc64/power9/memcpy.S | 23 ++++
sysdeps/powerpc/powerpc64/power9/memmove.S | 22 ++++
12 files changed, 230 insertions(+), 103 deletions(-)
create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memcpy-power9.S
create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memmove-power9.S
create mode 100644 sysdeps/powerpc/powerpc64/power9/memcpy.S
create mode 100644 sysdeps/powerpc/powerpc64/power9/memmove.S
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index dea49ac..82728fa 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -1,6 +1,6 @@
ifeq ($(subdir),string)
-sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
- memcpy-power4 memcpy-ppc64 \
+sysdep_routines += memcpy-power9 memcpy-power7 memcpy-a2 memcpy-power6 \
+ memcpy-cell memcpy-power4 memcpy-ppc64 \
memcmp-power8 memcmp-power7 memcmp-power4 memcmp-ppc64 \
memset-power7 memset-power6 memset-power4 \
memset-ppc64 memset-power8 \
@@ -24,7 +24,8 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
stpncpy-power8 stpncpy-power7 stpncpy-ppc64 \
strcmp-power9 strcmp-power8 strcmp-power7 strcmp-ppc64 \
strcat-power8 strcat-power7 strcat-ppc64 \
- memmove-power7 memmove-ppc64 wordcopy-ppc64 bcopy-ppc64 \
+ memmove-power9 memmove-power7 memmove-ppc64 \
+ wordcopy-ppc64 bcopy-ppc64 \
strncpy-power8 strstr-power7 strstr-ppc64 \
strspn-power8 strspn-ppc64 strcspn-power8 strcspn-ppc64 \
strlen-power8 strcasestr-power8 strcasestr-ppc64 \
diff --git a/sysdeps/powerpc/powerpc64/multiarch/bcopy.c b/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
index 05d46e2..4a4ee6e 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
@@ -22,8 +22,12 @@
extern __typeof (bcopy) __bcopy_ppc attribute_hidden;
/* __bcopy_power7 symbol is implemented at memmove-power7.S */
extern __typeof (bcopy) __bcopy_power7 attribute_hidden;
+/* __bcopy_power9 symbol is implemented at memmove-power9.S. */
+extern __typeof (bcopy) __bcopy_power9 attribute_hidden;
libc_ifunc (bcopy,
- (hwcap & PPC_FEATURE_HAS_VSX)
+ (hwcap2 & PPC_FEATURE2_ARCH_3_00)
+ ? __bcopy_power9
+ : (hwcap & PPC_FEATURE_HAS_VSX)
? __bcopy_power7
: __bcopy_ppc);
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 6a88536..9040bbc 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -51,6 +51,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
#ifdef SHARED
/* Support sysdeps/powerpc/powerpc64/multiarch/memcpy.c. */
IFUNC_IMPL (i, name, memcpy,
+ IFUNC_IMPL_ADD (array, i, memcpy, hwcap2 & PPC_FEATURE2_ARCH_3_00,
+ __memcpy_power9)
IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_HAS_VSX,
__memcpy_power7)
IFUNC_IMPL_ADD (array, i, memcpy, hwcap & PPC_FEATURE_ARCH_2_06,
@@ -65,6 +67,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
/* Support sysdeps/powerpc/powerpc64/multiarch/memmove.c. */
IFUNC_IMPL (i, name, memmove,
+ IFUNC_IMPL_ADD (array, i, memmove, hwcap2 & PPC_FEATURE2_ARCH_3_00,
+ __memmove_power9)
IFUNC_IMPL_ADD (array, i, memmove, hwcap & PPC_FEATURE_HAS_VSX,
__memmove_power7)
IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_ppc))
@@ -168,6 +172,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
/* Support sysdeps/powerpc/powerpc64/multiarch/bcopy.c. */
IFUNC_IMPL (i, name, bcopy,
+ IFUNC_IMPL_ADD (array, i, bcopy, hwcap2 & PPC_FEATURE2_ARCH_3_00,
+ __bcopy_power9)
IFUNC_IMPL_ADD (array, i, bcopy, hwcap & PPC_FEATURE_HAS_VSX,
__bcopy_power7)
IFUNC_IMPL_ADD (array, i, bcopy, 1, __bcopy_ppc))
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power9.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power9.S
new file mode 100644
index 0000000..fbd0788
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power9.S
@@ -0,0 +1,26 @@
+/* Optimized memcpy implementation for PowerPC/POWER9.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+#define MEMCPY __memcpy_power9
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+#include <sysdeps/powerpc/powerpc64/power9/memcpy.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
index 9f4286c..4c16fa0 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy.c
@@ -35,8 +35,11 @@ extern __typeof (__redirect_memcpy) __memcpy_cell attribute_hidden;
extern __typeof (__redirect_memcpy) __memcpy_power6 attribute_hidden;
extern __typeof (__redirect_memcpy) __memcpy_a2 attribute_hidden;
extern __typeof (__redirect_memcpy) __memcpy_power7 attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_power9 attribute_hidden;
libc_ifunc (__libc_memcpy,
+ (hwcap2 & PPC_FEATURE2_ARCH_3_00)
+ ? __memcpy_power9 :
(hwcap & PPC_FEATURE_HAS_VSX)
? __memcpy_power7 :
(hwcap & PPC_FEATURE_ARCH_2_06)
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S b/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
index a9435fa..0599a39 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
+++ b/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
@@ -23,7 +23,7 @@
#undef libc_hidden_builtin_def
#define libc_hidden_builtin_def(name)
-#undef bcopy
-#define bcopy __bcopy_power7
+#undef __bcopy
+#define __bcopy __bcopy_power7
#include <sysdeps/powerpc/powerpc64/power7/memmove.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove-power9.S b/sysdeps/powerpc/powerpc64/multiarch/memmove-power9.S
new file mode 100644
index 0000000..16a2267
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/memmove-power9.S
@@ -0,0 +1,29 @@
+/* Optimized memmove implementation for PowerPC64/POWER9.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+
+#define MEMMOVE __memmove_power9
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+#undef __bcopy
+#define __bcopy __bcopy_power9
+
+#include <sysdeps/powerpc/powerpc64/power9/memmove.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove.c b/sysdeps/powerpc/powerpc64/multiarch/memmove.c
index db2bbc7..f02498e 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/memmove.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/memmove.c
@@ -31,9 +31,12 @@ extern __typeof (__redirect_memmove) __libc_memmove;
extern __typeof (__redirect_memmove) __memmove_ppc attribute_hidden;
extern __typeof (__redirect_memmove) __memmove_power7 attribute_hidden;
+extern __typeof (__redirect_memmove) __memmove_power9 attribute_hidden;
libc_ifunc (__libc_memmove,
- (hwcap & PPC_FEATURE_HAS_VSX)
+ (hwcap2 & PPC_FEATURE2_ARCH_3_00)
+ ? __memmove_power9
+ : (hwcap & PPC_FEATURE_HAS_VSX)
? __memmove_power7
: __memmove_ppc);
diff --git a/sysdeps/powerpc/powerpc64/power7/memcpy.S b/sysdeps/powerpc/powerpc64/power7/memcpy.S
index 1ccbc2e..aea1224 100644
--- a/sysdeps/powerpc/powerpc64/power7/memcpy.S
+++ b/sysdeps/powerpc/powerpc64/power7/memcpy.S
@@ -27,6 +27,10 @@
# define MEMCPY memcpy
#endif
+#ifndef LVX
+# define LVX lxvd2x
+# define STVX stxvd2x
+#endif
#define dst 11 /* Use r11 so r3 kept unchanged. */
#define src 4
#define cnt 5
@@ -91,63 +95,63 @@ L(aligned_copy):
srdi 12,cnt,7
cmpdi 12,0
beq L(aligned_tail)
- lxvd2x 6,0,src
- lxvd2x 7,src,6
+ LVX 6,0,src
+ LVX 7,src,6
mtctr 12
b L(aligned_128loop)
.align 4
L(aligned_128head):
/* for the 2nd + iteration of this loop. */
- lxvd2x 6,0,src
- lxvd2x 7,src,6
+ LVX 6,0,src
+ LVX 7,src,6
L(aligned_128loop):
- lxvd2x 8,src,7
- lxvd2x 9,src,8
- stxvd2x 6,0,dst
+ LVX 8,src,7
+ LVX 9,src,8
+ STVX 6,0,dst
addi src,src,64
- stxvd2x 7,dst,6
- stxvd2x 8,dst,7
- stxvd2x 9,dst,8
- lxvd2x 6,0,src
- lxvd2x 7,src,6
+ STVX 7,dst,6
+ STVX 8,dst,7
+ STVX 9,dst,8
+ LVX 6,0,src
+ LVX 7,src,6
addi dst,dst,64
- lxvd2x 8,src,7
- lxvd2x 9,src,8
+ LVX 8,src,7
+ LVX 9,src,8
addi src,src,64
- stxvd2x 6,0,dst
- stxvd2x 7,dst,6
- stxvd2x 8,dst,7
- stxvd2x 9,dst,8
+ STVX 6,0,dst
+ STVX 7,dst,6
+ STVX 8,dst,7
+ STVX 9,dst,8
addi dst,dst,64
bdnz L(aligned_128head)
L(aligned_tail):
mtocrf 0x01,cnt
bf 25,32f
- lxvd2x 6,0,src
- lxvd2x 7,src,6
- lxvd2x 8,src,7
- lxvd2x 9,src,8
+ LVX 6,0,src
+ LVX 7,src,6
+ LVX 8,src,7
+ LVX 9,src,8
addi src,src,64
- stxvd2x 6,0,dst
- stxvd2x 7,dst,6
- stxvd2x 8,dst,7
- stxvd2x 9,dst,8
+ STVX 6,0,dst
+ STVX 7,dst,6
+ STVX 8,dst,7
+ STVX 9,dst,8
addi dst,dst,64
32:
bf 26,16f
- lxvd2x 6,0,src
- lxvd2x 7,src,6
+ LVX 6,0,src
+ LVX 7,src,6
addi src,src,32
- stxvd2x 6,0,dst
- stxvd2x 7,dst,6
+ STVX 6,0,dst
+ STVX 7,dst,6
addi dst,dst,32
16:
bf 27,8f
- lxvd2x 6,0,src
+ LVX 6,0,src
addi src,src,16
- stxvd2x 6,0,dst
+ STVX 6,0,dst
addi dst,dst,16
8:
bf 28,4f
diff --git a/sysdeps/powerpc/powerpc64/power7/memmove.S b/sysdeps/powerpc/powerpc64/power7/memmove.S
index 93baa69..253f541 100644
--- a/sysdeps/powerpc/powerpc64/power7/memmove.S
+++ b/sysdeps/powerpc/powerpc64/power7/memmove.S
@@ -30,6 +30,10 @@
#ifndef MEMMOVE
# define MEMMOVE memmove
#endif
+#ifndef LVX
+# define LVX lxvd2x
+# define STVX stxvd2x
+#endif
.machine power7
ENTRY_TOCLESS (MEMMOVE, 5)
CALL_MCOUNT 3
@@ -92,63 +96,63 @@ L(aligned_copy):
srdi 12,r5,7
cmpdi 12,0
beq L(aligned_tail)
- lxvd2x 6,0,r4
- lxvd2x 7,r4,6
+ LVX 6,0,r4
+ LVX 7,r4,6
mtctr 12
b L(aligned_128loop)
.align 4
L(aligned_128head):
/* for the 2nd + iteration of this loop. */
- lxvd2x 6,0,r4
- lxvd2x 7,r4,6
+ LVX 6,0,r4
+ LVX 7,r4,6
L(aligned_128loop):
- lxvd2x 8,r4,7
- lxvd2x 9,r4,8
- stxvd2x 6,0,r11
+ LVX 8,r4,7
+ LVX 9,r4,8
+ STVX 6,0,r11
addi r4,r4,64
- stxvd2x 7,r11,6
- stxvd2x 8,r11,7
- stxvd2x 9,r11,8
- lxvd2x 6,0,r4
- lxvd2x 7,r4,6
+ STVX 7,r11,6
+ STVX 8,r11,7
+ STVX 9,r11,8
+ LVX 6,0,r4
+ LVX 7,r4,6
addi r11,r11,64
- lxvd2x 8,r4,7
- lxvd2x 9,r4,8
+ LVX 8,r4,7
+ LVX 9,r4,8
addi r4,r4,64
- stxvd2x 6,0,r11
- stxvd2x 7,r11,6
- stxvd2x 8,r11,7
- stxvd2x 9,r11,8
+ STVX 6,0,r11
+ STVX 7,r11,6
+ STVX 8,r11,7
+ STVX 9,r11,8
addi r11,r11,64
bdnz L(aligned_128head)
L(aligned_tail):
mtocrf 0x01,r5
bf 25,32f
- lxvd2x 6,0,r4
- lxvd2x 7,r4,6
- lxvd2x 8,r4,7
- lxvd2x 9,r4,8
+ LVX 6,0,r4
+ LVX 7,r4,6
+ LVX 8,r4,7
+ LVX 9,r4,8
addi r4,r4,64
- stxvd2x 6,0,r11
- stxvd2x 7,r11,6
- stxvd2x 8,r11,7
- stxvd2x 9,r11,8
+ STVX 6,0,r11
+ STVX 7,r11,6
+ STVX 8,r11,7
+ STVX 9,r11,8
addi r11,r11,64
32:
bf 26,16f
- lxvd2x 6,0,r4
- lxvd2x 7,r4,6
+ LVX 6,0,r4
+ LVX 7,r4,6
addi r4,r4,32
- stxvd2x 6,0,r11
- stxvd2x 7,r11,6
+ STVX 6,0,r11
+ STVX 7,r11,6
addi r11,r11,32
16:
bf 27,8f
- lxvd2x 6,0,r4
+ LVX 6,0,r4
addi r4,r4,16
- stxvd2x 6,0,r11
+ STVX 6,0,r11
addi r11,r11,16
8:
bf 28,4f
@@ -488,63 +492,63 @@ L(aligned_copy_bwd):
srdi r12,r5,7
cmpdi r12,0
beq L(aligned_tail_bwd)
- lxvd2x v6,r4,r6
- lxvd2x v7,r4,r7
+ LVX v6,r4,r6
+ LVX v7,r4,r7
mtctr 12
b L(aligned_128loop_bwd)
.align 4
L(aligned_128head_bwd):
/* for the 2nd + iteration of this loop. */
- lxvd2x v6,r4,r6
- lxvd2x v7,r4,r7
+ LVX v6,r4,r6
+ LVX v7,r4,r7
L(aligned_128loop_bwd):
- lxvd2x v8,r4,r8
- lxvd2x v9,r4,r9
- stxvd2x v6,r11,r6
+ LVX v8,r4,r8
+ LVX v9,r4,r9
+ STVX v6,r11,r6
subi r4,r4,64
- stxvd2x v7,r11,r7
- stxvd2x v8,r11,r8
- stxvd2x v9,r11,r9
- lxvd2x v6,r4,r6
- lxvd2x v7,r4,7
+ STVX v7,r11,r7
+ STVX v8,r11,r8
+ STVX v9,r11,r9
+ LVX v6,r4,r6
+ LVX v7,r4,7
subi r11,r11,64
- lxvd2x v8,r4,r8
- lxvd2x v9,r4,r9
+ LVX v8,r4,r8
+ LVX v9,r4,r9
subi r4,r4,64
- stxvd2x v6,r11,r6
- stxvd2x v7,r11,r7
- stxvd2x v8,r11,r8
- stxvd2x v9,r11,r9
+ STVX v6,r11,r6
+ STVX v7,r11,r7
+ STVX v8,r11,r8
+ STVX v9,r11,r9
subi r11,r11,64
bdnz L(aligned_128head_bwd)
L(aligned_tail_bwd):
mtocrf 0x01,r5
bf 25,32f
- lxvd2x v6,r4,r6
- lxvd2x v7,r4,r7
- lxvd2x v8,r4,r8
- lxvd2x v9,r4,r9
+ LVX v6,r4,r6
+ LVX v7,r4,r7
+ LVX v8,r4,r8
+ LVX v9,r4,r9
subi r4,r4,64
- stxvd2x v6,r11,r6
- stxvd2x v7,r11,r7
- stxvd2x v8,r11,r8
- stxvd2x v9,r11,r9
+ STVX v6,r11,r6
+ STVX v7,r11,r7
+ STVX v8,r11,r8
+ STVX v9,r11,r9
subi r11,r11,64
32:
bf 26,16f
- lxvd2x v6,r4,r6
- lxvd2x v7,r4,r7
+ LVX v6,r4,r6
+ LVX v7,r4,r7
subi r4,r4,32
- stxvd2x v6,r11,r6
- stxvd2x v7,r11,r7
+ STVX v6,r11,r6
+ STVX v7,r11,r7
subi r11,r11,32
16:
bf 27,8f
- lxvd2x v6,r4,r6
+ LVX v6,r4,r6
subi r4,r4,16
- stxvd2x v6,r11,r6
+ STVX v6,r11,r6
subi r11,r11,16
8:
bf 28,4f
@@ -832,4 +836,6 @@ ENTRY_TOCLESS (__bcopy)
mr r4,r6
b L(_memmove)
END (__bcopy)
+#ifndef __bcopy
weak_alias (__bcopy, bcopy)
+#endif
diff --git a/sysdeps/powerpc/powerpc64/power9/memcpy.S b/sysdeps/powerpc/powerpc64/power9/memcpy.S
new file mode 100644
index 0000000..d827cdf
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power9/memcpy.S
@@ -0,0 +1,23 @@
+/* Optimized memcpy implementation for PowerPC64/POWER9.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Avoid unnecessary traps on cache-inhibited memory on POWER9 DD2.1. */
+#define LVX lvx
+#define STVX stvx
+
+#include <sysdeps/powerpc/powerpc64/power7/memcpy.S>
diff --git a/sysdeps/powerpc/powerpc64/power9/memmove.S b/sysdeps/powerpc/powerpc64/power9/memmove.S
new file mode 100644
index 0000000..2c5887e
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power9/memmove.S
@@ -0,0 +1,22 @@
+/* Optimized memmove implementation for PowerPC64/POWER9.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#define LVX lvx
+#define STVX stvx
+
+#include <sysdeps/powerpc/powerpc64/power7/memmove.S>
--
2.9.5