x86: Support Intel AVX VNNI

Thu Oct 15 15:23:48 GMT 2020

On Thu, Oct 15, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.10.2020 14:38, H.J. Lu wrote:
> > On Thu, Oct 15, 2020 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 15.10.2020 13:15, H.J. Lu wrote:
> >>> On Thu, Oct 15, 2020 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 15.10.2020 09:10, Cui, Lili wrote:
> >>>>>>>> @@ -1964,7 +1967,14 @@ cpu_flags_match (const insn_template *t)
> >>>>>>>>        cpu = cpu_flags_and (x, cpu);
> >>>>>>>>        if (!cpu_flags_all_zero (&cpu))
> >>>>>>>>       {
> >>>>>>>> -       if (x.bitfield.cpuavx)
> >>>>>>>> +       if (x.bitfield.cpuvex_prefix)
> >>>>>>>> +         {
> >>>>>>>> +           /* We need to check a few extra flags with VEX_PREFIX.  */
> >>>>>>>> +           if (i.vec_encoding == vex_encoding_vex
> >>>>>>>> +               || i.vec_encoding == vex_encoding_vex3)
> >>>>>>>> +             match |= CPU_FLAGS_ARCH_MATCH;
> >>>>>>>> +         }
> >>>>>>>> +       else if (x.bitfield.cpuavx)
> >>>>>>>
> >>>>>>> Is this (including the new cpuvex_prefix attribute, which imo
> >>>>>>> shouldn't be a Cpu* bit) really needed? Couldn't you achieve the same
> >>>>>>> by placing the templates _after_ the AVX512 counterparts? Iirc
> >>>>>>> templates get tried in order, and the first match wins. The {vex3}
> >>>>>>> prefix would then prevent a match on the EVEX-encoded AVX512_VNNI
> >>>>>> templates.
> >>>>>>
> >>>>>> Lili, please look into it.
> >>>>>>
> >>>>>
> >>>>> I add an invalid test for it, we need cpuvex_prefix attribute for under scenario.
> >>>>>
> >>>>> .arch .noavx512_vnni
> >>>>> vpdpbusd %xmm2,%xmm4,%xmm2
> >>>>>
> >>>>> As without the pseudo {vex} prefix, this instruction should be encoded with EVEX prefix.
> >>>>> we should report error for it, I rename CpuVEX_PREFIX to PseudoVexPrefix
> >>>>> and move it into opcode_modifier bit, thanks.
> >>>>
> >>>> I disagree, unless AVX-VNNI was specified to have a dependency on
> >>>> AVX512-VNNI (which would seem pretty odd, as meanwhile I've noticed
> >>>> that another reason for introducing these encodings may be to allow
> >>>> their use on AVX512-incapable hardware). The above very much should
> >>>> result in the VEX encoding despite the absence of a {vex} prefix.
> >>>> It's really only the default case of everything being enabled where
> >>>> the pseudo-prefix should be mandated. This particularly implies
> >>>> that an explicit ".arch .avx_vnni" ought to _also_ eliminate the
> >>>> need for the pseudo prefix.
> >>>
> >>> AVX VNNI always requires the {vex} prefix.  It isn't optional.
> >>
> >> That's said or written where? These are new insns with - afaict - no
> >> specification beyond the ISA extensions doc. There's nothing like
> >
> > This is true.  When we implemented AVX VNNI, we decided that
> > the {vex} prefix is mandatory so that
> >
> > vpdpbusd %xmm2,%xmm4,%xmm2
> >
> > always mean EVEX encoding.
>
> And this decision was discussed internally at Intel, and other

Internally.

> community members get no say at all?
>
> >> that said there afaics.
> >>
> >>> It is similar to
> >>>
> >>> vmovdqu32 %xmm5, %xmm6
> >>>
> >>> vs
> >>>
> >>> vmovdqu %xmm5, %xmm6
> >>>
> >>> It is the 32 suffix vs the {vex} prefix.
> >>
> >> I don't see the similarity. the 32 / 64 suffix in the EVEX encoding
> >> controls EVEX.W. There's nothing similar here.
> >>
> >
> > There are no EVEX vmovdqu instructions,
>
> Right, another reason why the comparison isn't a helpful one.
>
> Jan
>
> > just like there are no
> > AVX VNNI without {vex}.
> >
>

-- 
H.J.