x86: Support Intel AVX VNNI

Thu Oct 15 15:22:29 GMT 2020

On 15.10.2020 14:38, H.J. Lu wrote:
> On Thu, Oct 15, 2020 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 15.10.2020 13:15, H.J. Lu wrote:
>>> On Thu, Oct 15, 2020 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 15.10.2020 09:10, Cui, Lili wrote:
>>>>>>>> @@ -1964,7 +1967,14 @@ cpu_flags_match (const insn_template *t)
>>>>>>>>        cpu = cpu_flags_and (x, cpu);
>>>>>>>>        if (!cpu_flags_all_zero (&cpu))
>>>>>>>>       {
>>>>>>>> -       if (x.bitfield.cpuavx)
>>>>>>>> +       if (x.bitfield.cpuvex_prefix)
>>>>>>>> +         {
>>>>>>>> +           /* We need to check a few extra flags with VEX_PREFIX.  */
>>>>>>>> +           if (i.vec_encoding == vex_encoding_vex
>>>>>>>> +               || i.vec_encoding == vex_encoding_vex3)
>>>>>>>> +             match |= CPU_FLAGS_ARCH_MATCH;
>>>>>>>> +         }
>>>>>>>> +       else if (x.bitfield.cpuavx)
>>>>>>>
>>>>>>> Is this (including the new cpuvex_prefix attribute, which imo
>>>>>>> shouldn't be a Cpu* bit) really needed? Couldn't you achieve the same
>>>>>>> by placing the templates _after_ the AVX512 counterparts? IIRC
>>>>>>> templates get tried in order, and the first match wins. The {vex3}
>>>>>>> prefix would then prevent a match on the EVEX-encoded AVX512_VNNI
>>>>>>> templates.
>>>>>>
>>>>>> Lili, please look into it.
>>>>>>
>>>>>
>>>>> I added an invalid test for it; we need the cpuvex_prefix attribute for the following scenario:
>>>>>
>>>>> .arch .noavx512_vnni
>>>>> vpdpbusd %xmm2,%xmm4,%xmm2
>>>>>
>>>>> Without the {vex} pseudo-prefix, this instruction should be encoded with an EVEX prefix,
>>>>> so we should report an error for it. I have renamed CpuVEX_PREFIX to PseudoVexPrefix
>>>>> and moved it into an opcode_modifier bit, thanks.
>>>>
>>>> I disagree, unless AVX-VNNI was specified to have a dependency on
>>>> AVX512-VNNI (which would seem pretty odd, as meanwhile I've noticed
>>>> that another reason for introducing these encodings may be to allow
>>>> their use on AVX512-incapable hardware). The above very much should
>>>> result in the VEX encoding despite the absence of a {vex} prefix.
>>>> It's really only the default case of everything being enabled where
>>>> the pseudo-prefix should be mandated. This particularly implies
>>>> that an explicit ".arch .avx_vnni" ought to _also_ eliminate the
>>>> need for the pseudo prefix.
>>>
>>> AVX VNNI always requires the {vex} prefix.  It isn't optional.
>>
>> Where is that said or written? These are new insns with, afaict, no
>> specification beyond the ISA extensions doc. There's nothing like
> 
> This is true.  When we implemented AVX VNNI, we decided that
> the {vex} prefix is mandatory so that
> 
> vpdpbusd %xmm2,%xmm4,%xmm2
> 
> always means EVEX encoding.

And this decision was discussed internally at Intel, and other
community members get no say at all?

>> that said there afaics.
>>
>>> It is similar to
>>>
>>> vmovdqu32 %xmm5, %xmm6
>>>
>>> vs
>>>
>>> vmovdqu %xmm5, %xmm6
>>>
>>> It is the 32 suffix vs the {vex} prefix.
>>
>> I don't see the similarity. The 32 / 64 suffix in the EVEX encoding
>> controls EVEX.W. There's nothing similar here.
>>
> 
> There are no EVEX vmovdqu instructions,

Right, another reason why the comparison isn't a helpful one.
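To make the EVEX.W point concrete, here is a minimal C sketch. It assumes the unmasked, 128-bit, register-to-register load form only (EVEX.128.F3.0F.W0 6F /r for vmovdqu32 and EVEX.128.F3.0F.W1 6F /r for vmovdqu64, as listed in the Intel SDM) and registers xmm0-xmm7, so none of the register-extension bits are needed. The helper name encode_vmovdqu_xmm is hypothetical and is not part of gas. The only byte that changes with the 32 / 64 suffix is the second payload byte, through its W bit. By contrast, {vex} on vpdpbusd does not set a field inside one prefix; it picks between two different prefix formats.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode vmovdqu{32,64} %xmm<src>, %xmm<dst>
   (AT&T operand order) as a 6-byte EVEX instruction using opcode 6F.
   Only xmm0-xmm7, no masking, 128-bit vector length.  */
static void
encode_vmovdqu_xmm (int elem_bits, unsigned src, unsigned dst, uint8_t out[6])
{
  assert (elem_bits == 32 || elem_bits == 64);
  assert (src < 8 && dst < 8);

  unsigned w = (elem_bits == 64);  /* the suffix only selects EVEX.W */

  out[0] = 0x62;                   /* EVEX escape byte */
  out[1] = 0xF1;                   /* P0: R X B R' = 1111 (inverted, no
                                      extension), 00, mm = 01 (0F map) */
  out[2] = (uint8_t) ((w << 7)     /* P1: W */
                      | (0xF << 3) /* vvvv = 1111 (inverted, unused) */
                      | (1 << 2)   /* fixed bit, always 1 */
                      | 0x2);      /* pp = 10 (implied F3 prefix) */
  out[3] = 0x08;                   /* P2: z = 0, L'L = 00 (128-bit),
                                      b = 0, V' = 1 (inverted), aaa = 000 */
  out[4] = 0x6F;                   /* opcode: load form */
  out[5] = (uint8_t) (0xC0         /* ModRM: mod = 11 (register) */
                      | (dst << 3) /* reg = destination */
                      | src);      /* r/m = source */
}
```

With this helper, vmovdqu32 %xmm5,%xmm6 comes out as 62 f1 7e 08 6f f5 and vmovdqu64 %xmm5,%xmm6 as 62 f1 fe 08 6f f5; the two differ only in the third byte (0x7e vs 0xfe), which is bit 7 of P1.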

Jan

> just like there are no
> AVX VNNI instructions without {vex}.
>
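The two positions in this thread can be sketched as a first-match walk over an ordered template list, following Jan's recollection that templates are tried in order and the first match wins. This is a minimal model, not gas's real code: the names select_encoding, struct tmpl, struct features and the two policy constants are all hypothetical. The EVEX template for vpdpbusd is listed before the VEX one. Under POLICY_PREFIX_MANDATORY (H.J.'s view) the VEX form is reachable only with {vex}; under POLICY_FEATURE_DRIVEN (a simplified reading of Jan's view) the enabled ISA extensions decide. The model does not capture Jan's further point that an explicit ".arch .avx_vnni" should also remove the need for the pseudo-prefix when everything is enabled by default.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not gas's real data structures.  */
enum encoding { ENC_NONE, ENC_EVEX, ENC_VEX };

enum policy
{
  POLICY_PREFIX_MANDATORY, /* {vex} required for the VEX form */
  POLICY_FEATURE_DRIVEN    /* enabled ISA extensions decide */
};

struct features
{
  bool avx512_vnni;        /* cleared by e.g. ".arch .noavx512_vnni" */
  bool avx_vnni;
};

struct tmpl
{
  enum encoding enc;
  bool needs_avx512_vnni;  /* false: the template needs AVX-VNNI instead */
};

/* vpdpbusd: EVEX template first, so that with no pseudo-prefix and
   both extensions enabled the EVEX form wins.  */
static const struct tmpl vpdpbusd_templates[] = {
  { ENC_EVEX, true },
  { ENC_VEX, false },
};

/* Try the templates in order; the first acceptable one wins.
   ENC_NONE stands for "the assembler reports an error".  */
static enum encoding
select_encoding (enum policy pol, struct features f, bool vex_prefix)
{
  size_t n = sizeof vpdpbusd_templates / sizeof vpdpbusd_templates[0];

  for (size_t k = 0; k < n; k++)
    {
      const struct tmpl *t = &vpdpbusd_templates[k];
      bool cpu_ok = t->needs_avx512_vnni ? f.avx512_vnni : f.avx_vnni;

      if (!cpu_ok)
        continue;
      /* Under both policies, {vex} rules out EVEX templates.  */
      if (vex_prefix && t->enc == ENC_EVEX)
        continue;
      /* Only the prefix-mandatory policy hides the VEX form without {vex}.  */
      if (pol == POLICY_PREFIX_MANDATORY && t->enc == ENC_VEX && !vex_prefix)
        continue;
      return t->enc;
    }
  return ENC_NONE;
}
```

For the ".arch .noavx512_vnni" case (avx512_vnni false, avx_vnni true) with no prefix, the first policy returns ENC_NONE, matching Lili's invalid test, while the second returns ENC_VEX, the behaviour Jan argues for. With everything enabled, both policies pick EVEX without a prefix and VEX with {vex}.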