This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
[PATCH] x86: Correct EVEX vector load/store optimization
On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -4075,6 +4075,56 @@ optimize_encoding (void)
> > i.types[j].bitfield.ymmword = 0;
> > }
> > }
> > + else if ((cpu_arch_flags.bitfield.cpuavx
> > + || cpu_arch_isa_flags.bitfield.cpuavx)
>
> Once again a questionable condition, as per earlier replies to
> other patches of yours.
Fixed.
> > + && i.vec_encoding != vex_encoding_evex
> > + && !i.types[0].bitfield.zmmword
> > + && !i.mask
> > + && is_evex_encoding (&i.tm)
> > + && (i.tm.base_opcode == 0x666f
> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
> > + || i.tm.base_opcode == 0xf36f
> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
> > + || i.tm.base_opcode == 0xf26f
> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>
> All three of these can be expressed with just a single comparison,
> using & or | instead of ^ and (if necessary) adjusting the literal
> value compared against.
Fixed.
> > + && i.tm.extension_opcode == None)
> > + {
> > + /* Optimize: -O1:
> > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16,
> > + vmovdqu32 and vmovdqu64:
> > + EVEX VOP %xmmM, %xmmN
> > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16)
> > + EVEX VOP %ymmM, %ymmN
> > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16)
> > + EVEX VOP %xmmM, mem
> > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16)
> > + EVEX VOP %ymmM, mem
> > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16)
> > + EVEX VOP mem, %xmmN
> > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16)
>
> There's some confusion on this line.
>
> > + EVEX VOP mem, %ymmN
> > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
> > + */
>
> For the variants with a memory operand I doubt the conversion
> is always a win, and it may be against the user request in case of
> -Os. This is because of the Disp8 scaling the EVEX encoding permits.
Fixed.
> > + if (i.tm.base_opcode == 0xf26f)
> > + i.tm.base_opcode = 0xf36f;
> > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
> > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
>
> This again can be expressed without "else if()" afaict.
>
Fixed.
Here is the patch.
Thanks.
--
H.J.
From 84ecabf0624411c1ab95bfadbd864aa4b226b2e8 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 19 Mar 2019 10:56:39 +0800
Subject: [PATCH] x86: Correct EVEX vector load/store optimization
Update EVEX vector load/store optimization:
1. There is no need to check AVX since AVX2 is required for AVX512F.
2. We need to check both operands for ZMM register since AT&T syntax
may not set zmmword on the first operand.
3. Update Opcode_SIMD_IntD check and set.
4. Since the VEX prefix has 2 or 3 bytes, the EVEX prefix has 4 bytes,
EVEX Disp8 has 1 byte and VEX Disp32 has 4 bytes, we choose EVEX Disp8
over VEX Disp32.
* config/tc-i386.c (optimize_encoding): Don't check AVX for
EVEX vector load/store optimization. Check both operands for
ZMM register. Update EVEX vector load/store opcode check.
Choose EVEX Disp8 over VEX Disp32.
* testsuite/gas/i386/optimize-1.d: Updated.
* testsuite/gas/i386/optimize-1a.d: Likewise.
* testsuite/gas/i386/optimize-2.d: Likewise.
* testsuite/gas/i386/optimize-4.d: Likewise.
* testsuite/gas/i386/optimize-5.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2a.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-2b.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-5.d: Likewise.
* testsuite/gas/i386/x86-64-optimize-6.d: Likewise.
* testsuite/gas/i386/optimize-1.s: Add ZMM register load
test.
* testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
---
gas/config/tc-i386.c | 46 +++++++++++++++------
gas/testsuite/gas/i386/optimize-1.d | 25 +++++------
gas/testsuite/gas/i386/optimize-1.s | 2 +
gas/testsuite/gas/i386/optimize-1a.d | 25 +++++------
gas/testsuite/gas/i386/optimize-2.d | 24 +++++------
gas/testsuite/gas/i386/optimize-4.d | 25 +++++------
gas/testsuite/gas/i386/optimize-5.d | 25 +++++------
gas/testsuite/gas/i386/x86-64-optimize-2.d | 25 +++++------
gas/testsuite/gas/i386/x86-64-optimize-2.s | 2 +
gas/testsuite/gas/i386/x86-64-optimize-2a.d | 25 +++++------
gas/testsuite/gas/i386/x86-64-optimize-2b.d | 25 +++++------
gas/testsuite/gas/i386/x86-64-optimize-3.d | 24 +++++------
gas/testsuite/gas/i386/x86-64-optimize-5.d | 25 +++++------
gas/testsuite/gas/i386/x86-64-optimize-6.d | 25 +++++------
14 files changed, 178 insertions(+), 145 deletions(-)
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 3885728de7..3447fe0fa3 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4068,18 +4068,14 @@ optimize_encoding (void)
i.types[j].bitfield.ymmword = 0;
}
}
- else if ((cpu_arch_flags.bitfield.cpuavx
- || cpu_arch_isa_flags.bitfield.cpuavx)
- && i.vec_encoding != vex_encoding_evex
+ else if (i.vec_encoding != vex_encoding_evex
&& !i.types[0].bitfield.zmmword
+ && !i.types[1].bitfield.zmmword
&& !i.mask
&& is_evex_encoding (&i.tm)
- && (i.tm.base_opcode == 0x666f
- || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
- || i.tm.base_opcode == 0xf36f
- || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
- || i.tm.base_opcode == 0xf26f
- || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
+ && ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0x666f
+ || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf36f
+ || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
&& i.tm.extension_opcode == None)
{
/* Optimize: -O1:
@@ -4098,10 +4094,34 @@ optimize_encoding (void)
EVEX VOP mem, %ymmN
-> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
*/
- if (i.tm.base_opcode == 0xf26f)
- i.tm.base_opcode = 0xf36f;
- else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
- i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
+ for (j = 0; j < 2; j++)
+ if (operand_type_check (i.types[j], disp)
+ && i.op[j].disps->X_op == O_constant)
+ {
+ /* Since the VEX prefix has 2 or 3 bytes, the EVEX prefix
+ has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4
+ bytes, we choose EVEX Disp8 over VEX Disp32. */
+ int evex_disp8, vex_disp8;
+ unsigned int memshift = i.memshift;
+ offsetT n = i.op[j].disps->X_add_number;
+
+ evex_disp8 = fits_in_disp8 (n);
+ i.memshift = 0;
+ vex_disp8 = fits_in_disp8 (n);
+ if (evex_disp8 != vex_disp8)
+ {
+ i.memshift = memshift;
+ return;
+ }
+
+ i.types[j].bitfield.disp8 = vex_disp8;
+ break;
+ }
+ if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
+ {
+ i.tm.base_opcode &= Opcode_SIMD_IntD;
+ i.tm.base_opcode |= 0xf36f;
+ }
i.tm.opcode_modifier.vex
= i.types[0].bitfield.ymmword ? VEX256 : VEX128;
i.tm.opcode_modifier.vexw = VEXW0;
diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d
index 70c802c002..2f40c72a4e 100644
--- a/gas/testsuite/gas/i386/optimize-1.d
+++ b/gas/testsuite/gas/i386/optimize-1.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -92,10 +92,11 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2
#pass
diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s
index 6dcfbc2799..4c15d16c2a 100644
--- a/gas/testsuite/gas/i386/optimize-1.s
+++ b/gas/testsuite/gas/i386/optimize-1.s
@@ -114,3 +114,5 @@ _start:
vmovdqu16 %ymm1, 128(%eax)
vmovdqu32 %ymm1, 128(%eax)
vmovdqu64 %ymm1, 128(%eax)
+
+ vmovdqa32 (%eax), %zmm2
diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d
index cee2383d84..d7c253a6fa 100644
--- a/gas/testsuite/gas/i386/optimize-1a.d
+++ b/gas/testsuite/gas/i386/optimize-1a.d
@@ -75,12 +75,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -93,10 +93,11 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2
#pass
diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d
index 19467f5c01..ed61dec6fa 100644
--- a/gas/testsuite/gas/i386/optimize-2.d
+++ b/gas/testsuite/gas/i386/optimize-2.d
@@ -29,12 +29,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -47,12 +47,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\)
+[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2
+[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2
+[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2
diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d
index 2df84654d6..f062ad7717 100644
--- a/gas/testsuite/gas/i386/optimize-4.d
+++ b/gas/testsuite/gas/i386/optimize-4.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
#pass
diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d
index ecc1ab139a..fdf5561af8 100644
--- a/gas/testsuite/gas/i386/optimize-5.d
+++ b/gas/testsuite/gas/i386/optimize-5.d
@@ -74,12 +74,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -92,12 +92,13 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d
index 067df076f7..45b98ae694 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2
#pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s
index 1275610e55..e5d298225a 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.s
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s
@@ -170,3 +170,5 @@ _start:
vmovdqu16 %ymm1, 128(%rax)
vmovdqu32 %ymm1, 128(%rax)
vmovdqu64 %ymm1, 128(%rax)
+
+ vmovdqa32 (%rax), %zmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
index 532a1458bc..39385b96ec 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d
@@ -125,12 +125,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -149,10 +149,11 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2
#pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2b.d b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
index 09474a1016..3eb3a59eac 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2b.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2b.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -148,10 +148,11 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2
#pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d
index 74336a4fe2..5e2832df4c 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-3.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d
@@ -43,12 +43,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -67,12 +67,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+[a-f0-9]+: 62 b1 7d 08 6f d5 vmovdqa32 %xmm21,%xmm2
+[a-f0-9]+: 62 b1 fd 08 6f d5 vmovdqa64 %xmm21,%xmm2
+[a-f0-9]+: 62 b1 7f 08 6f d5 vmovdqu8 %xmm21,%xmm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d
index 012237df57..5065d650d4 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-5.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d
index aca119e4f9..8ebd9b2475 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-6.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d
@@ -124,12 +124,12 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
+[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\)
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2
+[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2
@@ -148,12 +148,13 @@ Disassembly of section .text:
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
+[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
- +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\)
+ +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2
--
2.20.1