1)
vmovdqu %ymm0, %ymm1

as --64 -o test.o test.s

Assembles ok.

2)
vmovdqu %ymm0, %ymm16

as --64 -o test.o test.s
test.s: Assembler messages:
test.s:1: Error: unsupported instruction `vmovdqu'

2) Requires the vmovdqu<8/16/32/64> mnemonic.

I understand that vmovdqu is the VEX version, while vmovdqu<8/16/32/64> encodes as EVEX. I also understand that 2) requires EVEX. However, I don't see a reason why 2) could not default to one version of vmovdqu<8/16/32/64> with writemask k0.

If it doesn't, the consequence is that inline asm, e.g. written in C, has to use vmovdqu for ymm <= 15 and vmovdqu32 for ymm > 15. This is inconvenient, e.g. for macros:

#ifdef __AVX__
vmovdqu ymm0, []
[..]
vmovdqu ymm15, []
#ifdef __AVX512F__
vmovdqu32 ymm15, []
[..]
vmovdqu32 ymm31, []
#endif
#endif

as --version
GNU assembler (GNU Binutils for Debian) 2.30
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-linux-gnu'.

Thanks in advance!
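For illustration only, the per-register mnemonic split described in the report could be sketched in C as a small helper that emits AT&T-syntax text for an unmasked ymm register-to-register move. The function names ymm_move_mnemonic and format_ymm_move are hypothetical and not part of binutils or any library; the only rule assumed is the one stated above (vmovdqu when both registers are <= 15, vmovdqu32 otherwise).

```c
#include <stdio.h>

/* Hypothetical helper (not part of binutils): choose the mnemonic for an
   unmasked, unaligned ymm register-to-register move.  Registers 16..31
   can only be encoded with EVEX, so they need a vmovdqu<8/16/32/64>
   form; vmovdqu32 is picked here as an arbitrary default.  */
static const char *ymm_move_mnemonic(int src, int dst)
{
    return (src > 15 || dst > 15) ? "vmovdqu32" : "vmovdqu";
}

/* Hypothetical helper: write "mnemonic %ymmSRC, %ymmDST" into buf.
   Returns the snprintf result, or -1 if a register index is outside
   the range 0..31.  */
static int format_ymm_move(char *buf, size_t size, int src, int dst)
{
    if (src < 0 || src > 31 || dst < 0 || dst > 31)
        return -1;
    return snprintf(buf, size, "%s %%ymm%d, %%ymm%d",
                    ymm_move_mnemonic(src, dst), src, dst);
}
```

Calling format_ymm_move(buf, sizeof buf, 0, 16) would produce the vmovdqu32 form that case 2) requires, while registers 0..15 keep the plain VEX mnemonic.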
Created attachment 11683
A patch

You can use

vmovdqu32 %reg, %reg

and pass -O2 or -Os to the assembler. The assembler will encode vmovdqu32 as vmovdqu if possible.
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=97ed31ae00ea83410f9daf61ece8a606044af365

commit 97ed31ae00ea83410f9daf61ece8a606044af365
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 18 08:56:10 2019 +0800

    x86: Optimize EVEX vector load/store instructions

    When there is no write mask, we can encode lower 16 128-bit/256-bit
    EVEX vector register load and store instructions as VEX vector register
    load and store instructions with -O1.

    gas/

        PR gas/24348
        * config/tc-i386.c (optimize_encoding): Encode 128-bit and
        256-bit EVEX vector register load/store instructions as VEX
        vector register load/store instructions for -O1.
        * doc/c-i386.texi: Update -O1 documentation.
        * testsuite/gas/i386/i386.exp: Run PR gas/24348 tests.
        * testsuite/gas/i386/optimize-1.s: Add tests for EVEX vector
        load/store instructions.
        * testsuite/gas/i386/optimize-2.s: Likewise.
        * testsuite/gas/i386/optimize-3.s: Likewise.
        * testsuite/gas/i386/optimize-5.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-2.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-3.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-4.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-5.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-6.s: Likewise.
        * testsuite/gas/i386/optimize-1.d: Updated.
        * testsuite/gas/i386/optimize-2.d: Likewise.
        * testsuite/gas/i386/optimize-3.d: Likewise.
        * testsuite/gas/i386/optimize-4.d: Likewise.
        * testsuite/gas/i386/optimize-5.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-3.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-4.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-5.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-6.d: Likewise.
        * testsuite/gas/i386/optimize-7.d: New file.
        * testsuite/gas/i386/optimize-7.s: Likewise.
        * testsuite/gas/i386/x86-64-optimize-8.d: Likewise.
        * testsuite/gas/i386/x86-64-optimize-8.s: Likewise.

    opcodes/

        PR gas/24348
        * i386-opc.tbl: Add Optimize to vmovdqa32, vmovdqa64, vmovdqu8,
        vmovdqu16, vmovdqu32 and vmovdqu64.
        * i386-tbl.h: Regenerated.
Fixed for 2.33 with -O.