x86: Support Intel AVX VNNI
Jan Beulich
jbeulich@suse.com
Thu Oct 15 16:04:14 GMT 2020
On 15.10.2020 17:34, H.J. Lu wrote:
> On Thu, Oct 15, 2020 at 8:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 15.10.2020 17:23, H.J. Lu wrote:
>>> On Thu, Oct 15, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 15.10.2020 14:38, H.J. Lu wrote:
>>>>> On Thu, Oct 15, 2020 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 15.10.2020 13:15, H.J. Lu wrote:
>>>>>>> On Thu, Oct 15, 2020 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>> On 15.10.2020 09:10, Cui, Lili wrote:
>>>>>>>>>>>> @@ -1964,7 +1967,14 @@ cpu_flags_match (const insn_template *t)
>>>>>>>>>>>> cpu = cpu_flags_and (x, cpu);
>>>>>>>>>>>> if (!cpu_flags_all_zero (&cpu))
>>>>>>>>>>>> {
>>>>>>>>>>>> - if (x.bitfield.cpuavx)
>>>>>>>>>>>> + if (x.bitfield.cpuvex_prefix)
>>>>>>>>>>>> + {
>>>>>>>>>>>> + /* We need to check a few extra flags with VEX_PREFIX. */
>>>>>>>>>>>> + if (i.vec_encoding == vex_encoding_vex
>>>>>>>>>>>> + || i.vec_encoding == vex_encoding_vex3)
>>>>>>>>>>>> + match |= CPU_FLAGS_ARCH_MATCH;
>>>>>>>>>>>> + }
>>>>>>>>>>>> + else if (x.bitfield.cpuavx)
>>>>>>>>>>>
>>>>>>>>>>> Is this (including the new cpuvex_prefix attribute, which imo
>>>>>>>>>>> shouldn't be a Cpu* bit) really needed? Couldn't you achieve the same
>>>>>>>>>>> by placing the templates _after_ the AVX512 counterparts? Iirc
>>>>>>>>>>> templates get tried in order, and the first match wins. The {vex3}
>>>>>>>>>>> prefix would then prevent a match on the EVEX-encoded AVX512_VNNI
>>>>>>>>>> templates.
>>>>>>>>>>
>>>>>>>>>> Lili, please look into it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I add an invalid test for it, we need cpuvex_prefix attribute for under scenario.
>>>>>>>>>
>>>>>>>>> .arch .noavx512_vnni
>>>>>>>>> vpdpbusd %xmm2,%xmm4,%xmm2
>>>>>>>>>
>>>>>>>>> As without the pseudo {vex} prefix, this instruction should be encoded with EVEX prefix.
>>>>>>>>> we should report error for it, I rename CpuVEX_PREFIX to PseudoVexPrefix
>>>>>>>>> and move it into opcode_modifier bit, thanks.
>>>>>>>>
>>>>>>>> I disagree, unless AVX-VNNI was specified to have a dependency on
>>>>>>>> AVX512-VNNI (which would seem pretty odd, as meanwhile I've noticed
>>>>>>>> that another reason for introducing these encodings may be to allow
>>>>>>>> their use on AVX512-incapable hardware). The above very much should
>>>>>>>> result in the VEX encoding despite the absence of a {vex} prefix.
>>>>>>>> It's really only the default case of everything being enabled where
>>>>>>>> the pseudo-prefix should be mandated. This particularly implies
>>>>>>>> that an explicit ".arch .avx_vnni" ought to _also_ eliminate the
>>>>>>>> need for the pseudo prefix.
>>>>>>>
>>>>>>> AVX VNNI always requires the {vex} prefix. It isn't optional.
>>>>>>
>>>>>> That's said or written where? These are new insns with - afaict - no
>>>>>> specification beyond the ISA extensions doc. There's nothing like
>>>>>
>>>>> This is true. When we implemented AVX VNNI, we decided that
>>>>> the {vex} prefix is mandatory so that
>>>>>
>>>>> vpdpbusd %xmm2,%xmm4,%xmm2
>>>>>
>>>>> always mean EVEX encoding.
>>>>
>>>> And this decision was discussed internally at Intel, and other
>>>
>>> Internally.
>>>
>>>> community members get no say at all?
>>
>> Please can such discussions be had in the public for open source projects?
>
> The discussion happened around the time when I added
>
> commit 86fa6981e7487e2c2df4337aa75ed2d93c32eaf2
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date: Thu Mar 9 09:58:46 2017 -0800
>
> X86: Add pseudo prefixes to control encoding
>
> Many x86 instructions have more than one encodings. Assembler picks
> the default one, usually the shortest one. Although the ".s", ".d8"
> and ".d32" suffixes can be used to swap register operands or specify
> displacement size, they aren't very flexible. This patch adds pseudo
> prefixes, {xxx}, to control instruction encoding. The available
> pseudo prefixes are {disp8}, {disp32}, {load}, {store}, {vex2}, {vex3}
> and {evex}. Pseudo prefixes are preferred over the ".s", ".d8" and
> ".d32" suffixes, which are deprecated.
>
> It wasn't practical to discuss it in public.
Wow.
>> I continue to think that the behavior as implemented is not the best
>> possible choice. Therefore I'd like to at least hear the arguments that
>> led to this decision.
>
> Please send me your detailed comments. I will forward it to our internal
> group.
I've given my two points already - there are two cases where the
pseudo prefix shouldn't be required. Plus, as also said, the
disassembler shouldn't display it by default.
Jan
More information about the Binutils
mailing list