This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH] x86: Support Intel AVX512 BF16
- From: "Jan Beulich" <JBeulich at suse dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>,<xuepeng dot guo at intel dot com>
- Cc: <binutils at sourceware dot org>
- Date: Mon, 08 Apr 2019 02:56:50 -0600
- Subject: Re: [PATCH] x86: Support Intel AVX512 BF16
- References: <20190405180128.GA12026@intel.com>
>>> On 05.04.19 at 20:01, <hongjiu.lu@intel.com> wrote:
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -4710,3 +4710,33 @@ movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|CpuNo64, Modrm|IgnoreSize|No_bSu
> movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|AddrPrefixOpReg, { Unspecified|BaseIndex, Reg32|Reg64 }
>
> // MOVEDIR instructions end.
> +
> +// AVX512_BF16 instructions.
> +
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex=1|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }
Here and below, can you please avoid introducing anew inefficient
table entries with partly irrelevant / redundant attributes:
- use Disp8ShiftVL (and no explicit EVex<NNN> / EVex=<N> nor
CpuAVX512VL), thus folding the above three entries into one
- don't explicitly use [XYZ]mmWord (redundant with Reg[XYZ]MM,
once folding with the non-broadcast forms is also done - see
further down)
- presumably IgnoreSize is not needed
I hope I didn't forget further ones.
For brevity / readability I'd also like to recommend omitting"=1" in
insn attribute specifications. i386-gen assumes 1 whether there's
no explicit value given.
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
Why this second set of three entries? Broadcast is an optional
attribute, i.e. the first entries (really: entry) ought to cover
everything.
> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
> +// AVX512_BF16 instructions end.
Can we gain a blank line above this sentinel please?
Thanks, Jan