x86: Support Intel AVX VNNI

Jan Beulich jbeulich@suse.com
Thu Oct 15 16:04:14 GMT 2020


On 15.10.2020 17:34, H.J. Lu wrote:
> On Thu, Oct 15, 2020 at 8:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 15.10.2020 17:23, H.J. Lu wrote:
>>> On Thu, Oct 15, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 15.10.2020 14:38, H.J. Lu wrote:
>>>>> On Thu, Oct 15, 2020 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 15.10.2020 13:15, H.J. Lu wrote:
>>>>>>> On Thu, Oct 15, 2020 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>> On 15.10.2020 09:10, Cui, Lili wrote:
>>>>>>>>>>>> @@ -1964,7 +1967,14 @@ cpu_flags_match (const insn_template *t)
>>>>>>>>>>>>        cpu = cpu_flags_and (x, cpu);
>>>>>>>>>>>>        if (!cpu_flags_all_zero (&cpu))
>>>>>>>>>>>>       {
>>>>>>>>>>>> -       if (x.bitfield.cpuavx)
>>>>>>>>>>>> +       if (x.bitfield.cpuvex_prefix)
>>>>>>>>>>>> +         {
>>>>>>>>>>>> +           /* We need to check a few extra flags with VEX_PREFIX.  */
>>>>>>>>>>>> +           if (i.vec_encoding == vex_encoding_vex
>>>>>>>>>>>> +               || i.vec_encoding == vex_encoding_vex3)
>>>>>>>>>>>> +             match |= CPU_FLAGS_ARCH_MATCH;
>>>>>>>>>>>> +         }
>>>>>>>>>>>> +       else if (x.bitfield.cpuavx)
>>>>>>>>>>>
>>>>>>>>>>> Is this (including the new cpuvex_prefix attribute, which imo
>>>>>>>>>>> shouldn't be a Cpu* bit) really needed? Couldn't you achieve the same
>>>>>>>>>>> by placing the templates _after_ the AVX512 counterparts? Iirc
>>>>>>>>>>> templates get tried in order, and the first match wins. The {vex3}
>>>>>>>>>>> prefix would then prevent a match on the EVEX-encoded AVX512_VNNI
>>>>>>>>>> templates.
>>>>>>>>>>
>>>>>>>>>> Lili, please look into it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I add an invalid test for it, we need cpuvex_prefix attribute for under scenario.
>>>>>>>>>
>>>>>>>>> .arch .noavx512_vnni
>>>>>>>>> vpdpbusd %xmm2,%xmm4,%xmm2
>>>>>>>>>
>>>>>>>>> As without the pseudo {vex} prefix, this instruction should be encoded with EVEX prefix.
>>>>>>>>> we should report error for it, I rename CpuVEX_PREFIX to PseudoVexPrefix
>>>>>>>>> and move it into opcode_modifier bit, thanks.
>>>>>>>>
>>>>>>>> I disagree, unless AVX-VNNI was specified to have a dependency on
>>>>>>>> AVX512-VNNI (which would seem pretty odd, as meanwhile I've noticed
>>>>>>>> that another reason for introducing these encodings may be to allow
>>>>>>>> their use on AVX512-incapable hardware). The above very much should
>>>>>>>> result in the VEX encoding despite the absence of a {vex} prefix.
>>>>>>>> It's really only the default case of everything being enabled where
>>>>>>>> the pseudo-prefix should be mandated. This particularly implies
>>>>>>>> that an explicit ".arch .avx_vnni" ought to _also_ eliminate the
>>>>>>>> need for the pseudo prefix.
>>>>>>>
>>>>>>> AVX VNNI always requires the {vex} prefix.  It isn't optional.
>>>>>>
>>>>>> That's said or written where? These are new insns with - afaict - no
>>>>>> specification beyond the ISA extensions doc. There's nothing like
>>>>>
>>>>> This is true.  When we implemented AVX VNNI, we decided that
>>>>> the {vex} prefix is mandatory so that
>>>>>
>>>>> vpdpbusd %xmm2,%xmm4,%xmm2
>>>>>
>>>>> always mean EVEX encoding.
>>>>
>>>> And this decision was discussed internally at Intel, and other
>>>
>>> Internally.
>>>
>>>> community members get no say at all?
>>
>> Please can such discussions be had in the public for open source projects?
> 
> The discussion happened around the time when I added
> 
> commit 86fa6981e7487e2c2df4337aa75ed2d93c32eaf2
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Thu Mar 9 09:58:46 2017 -0800
> 
>     X86: Add pseudo prefixes to control encoding
> 
>     Many x86 instructions have more than one encodings.  Assembler picks
>     the default one, usually the shortest one.  Although the ".s", ".d8"
>     and ".d32" suffixes can be used to swap register operands or specify
>     displacement size, they aren't very flexible.  This patch adds pseudo
>     prefixes, {xxx}, to control instruction encoding.  The available
>     pseudo prefixes are {disp8}, {disp32}, {load}, {store}, {vex2}, {vex3}
>     and {evex}.  Pseudo prefixes are preferred over the ".s", ".d8" and
>     ".d32" suffixes, which are deprecated.
> 
> It wasn't practical to discuss it in public.

Wow.

>> I continue to think that the behavior as implemented is not the best
>> possible choice. Therefore I'd like to at least hear the arguments that
>> led to this decision.
> 
> Please send me your detailed comments.  I will forward it to our internal
> group.

I've given my two points already - there are two cases where the
pseudo prefix shouldn't be required. Plus, as also said, the
disassembler shouldn't display it by default.

Jan


More information about the Binutils mailing list