x86: Support Intel AVX VNNI

H.J. Lu hjl.tools@gmail.com
Wed Oct 14 13:28:14 GMT 2020


On Wed, Oct 14, 2020 at 6:12 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.10.2020 08:37, Cui, Lili wrote:
> > Subject: [PATCH] x86: Support Intel AVX VNNI
> >
> > Intel AVX VNNI instructions are marked with CpuVEX_PREFIX.  Without the
> > pseudo {vex} prefix, mnemonics of Intel VNNI instructions are encoded
> > with the EVEX prefix.  The pseudo {vex} prefix can be used to encode
> > mnemonics of Intel VNNI instructions with the VEX prefix.
>
> I take it this is (on the gas side) to avoid breaking existing code
> by suddenly switching from EVEX- to VEX-encoding. Ugly, but well ...
> (I imply that there are or are going to be CPUs with AVX512_VNNI but

Cascadelake has AVX512 VNNI.

> without AVX_VNNI, or else all of this would be quite pointless.) On
> the disassembler side, though, I don't see at all why the {vex3}
> thingy needs printing. We don't do so for other mnemonics allowing
> perhaps all of 2-byte VEX, 3-byte VEX, and EVEX encoding.
>
> I wonder anyway why these separate encodings are being introduced
> after their EVEX ones. Is this just because in many cases the
> encodings are shorter? If so, why isn't this done more consistently
> for all AVX512 insns not (yet) having VEX encoded counterparts?

Good questions.

> > @@ -1964,7 +1967,14 @@ cpu_flags_match (const insn_template *t)
> >        cpu = cpu_flags_and (x, cpu);
> >        if (!cpu_flags_all_zero (&cpu))
> >       {
> > -       if (x.bitfield.cpuavx)
> > +       if (x.bitfield.cpuvex_prefix)
> > +         {
> > +           /* We need to check a few extra flags with VEX_PREFIX.  */
> > +           if (i.vec_encoding == vex_encoding_vex
> > +               || i.vec_encoding == vex_encoding_vex3)
> > +             match |= CPU_FLAGS_ARCH_MATCH;
> > +         }
> > +       else if (x.bitfield.cpuavx)
>
> Is this (including the new cpuvex_prefix attribute, which imo shouldn't
> be a Cpu* bit) really needed? Couldn't you achieve the same by placing
> the templates _after_ the AVX512 counterparts? Iirc templates get
> tried in order, and the first match wins. The {vex3} prefix would then
> prevent a match on the EVEX-encoded AVX512_VNNI templates.

Lili, please look into it.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/avx-vnni.s
> > @@ -0,0 +1,22 @@
> > +     .allow_index_reg
> > +
> > +.macro test_insn mnemonic
> > +     \mnemonic       %xmm2, %xmm4, %xmm2
> > +     {evex} \mnemonic %xmm2, %xmm4, %xmm2
> > +     {vex}  \mnemonic %xmm2, %xmm4, %xmm2
> > +     {vex2} \mnemonic %xmm2, %xmm4, %xmm2
> > +     {vex3} \mnemonic %xmm2, %xmm4, %xmm2
> > +     {vex}  \mnemonic (%ecx), %xmm4, %xmm2
> > +     {vex2} \mnemonic (%ecx), %xmm4, %xmm2
> > +     {vex3} \mnemonic (%ecx), %xmm4, %xmm2
> > +.endm
>
> I question the {vex2} cases here: Bot {vex} and {vex3} are
> mandatory prefixes, while {vex2} has been deprecated (and it's
> not even documented anymore). If anything, {vex2} should result
> in an error here. But for simplicity omitting the tests now
> (allowing the case to become an error later on) would be fine
> with me.

Lili, let's drop {vex2} test.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3904,6 +3904,16 @@ vpshrdw, 4, 0x6672, None, 1, CpuAVX512_VBMI2, Modrm|Masking=3|OpcodePrefix=2|Vex
> >
> >  // AVX512_VBMI2 instructions end
> >
> > +// AVX_VNNI instructions
> > +
> > +vpdpbusd, 3, 0x6650, None, 1, CpuAVX_VNNI|CpuVEX_PREFIX, Modrm|Vex|OpcodePrefix=1|VexVVVV=1|VexW=1|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
> > +vpdpwssd, 3, 0x6652, None, 1, CpuAVX_VNNI|CpuVEX_PREFIX, Modrm|Vex|OpcodePrefix=1|VexVVVV=1|VexW=1|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
> > +
> > +vpdpbusds, 3, 0x6651, None, 1, CpuAVX_VNNI|CpuVEX_PREFIX, Modrm|Vex|OpcodePrefix=1|VexVVVV=1|VexW=1|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
> > +vpdpwssds, 3, 0x6653, None, 1, CpuAVX_VNNI|CpuVEX_PREFIX, Modrm|Vex|OpcodePrefix=1|VexVVVV=1|VexW=1|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
>
> At the very least please use more "modern" (and more meaningful /
> readable) constructs where available, e.g. VexW0 instead of VexW=1.
> But in new code I'd like to further encourage to avoid any
> attributes of the form <name>=<value>, and instead introduce
> suitable #define-s instead. Since we're already transforming the
> code (slowly, but still), this will reduce future churn. (For
> VexVVVV=1 please note that this is equivalent to just VexVVVV
> anyway.)

Lili, please adjust them.

-- 
H.J.


More information about the Binutils mailing list