[PATCH] x86: VCVTNEPS2BF16{X,Y} should permit broadcasting

Jan Beulich jbeulich@suse.com
Mon Jan 20 16:12:00 GMT 2020


On 20.01.2020 17:03, H.J. Lu wrote:
> On Mon, Jan 20, 2020 at 7:36 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.01.2020 16:26, H.J. Lu wrote:
>>> On Mon, Jan 20, 2020 at 7:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 20.01.2020 15:42, H.J. Lu wrote:
>>>>> On Mon, Jan 20, 2020 at 6:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 20.01.2020 14:58, H.J. Lu wrote:
>>>>>>> On Mon, Jan 20, 2020 at 4:00 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> Just like other VCVT*{X,Y} templates do, and to allow the programmer
>>>>>>>> flexibility (might be relevant in particular when heavily macro-izing
>>>>>>>> code), the two templates should also have Broadcast set, just like their
>>>>>>>> X/Y-suffix-less counterparts. This in turn requires them to also have
>>>>>>>> * Dword set on their memory operands, to cover the logic added to i386gen
>>>>>>>>   by 4a1b91eabbe7 ("x86: Expand Broadcast to 3 bits"),
>>>>>>>> * Xmmword/Ymmword set on their memory operands, to satisfy broadcast
>>>>>>>>   sizing logic in gas itself.
>>>>>>>> Otherwise ATTSyntax templates wouldn't need such operand size attributes.
>>>>>>>>
>>>>>>>> While extending the test cases, also add Intel syntax broadcast forms
>>>>>>>> without explicit size specifiers.
>>>>>>>>
>>>>>>>> gas/
>>>>>>>> 2020-01-XX  Jan Beulich  <jbeulich@suse.com>
>>>>>>>>
>>>>>>>>         * testsuite/gas/i386/avx512_bf16_vl.s,
>>>>>>>>         testsuite/gas/i386/x86-64-avx512_bf16_vl.s: Add broadcast forms
>>>>>>>>         of VCVTNEPS2BF16{X,Y}. Add operand-size less Intel syntax
>>>>>>>>         broadcast forms of VCVTNEPS2BF16.
>>>>>>>>         * testsuite/gas/i386/avx512_bf16_vl.d,
>>>>>>>>         testsuite/gas/i386/x86-64-avx512_bf16_vl.d: Adjust expectations.
>>>>>>>>
>>>>>>>> opcodes/
>>>>>>>> 2020-01-XX  Jan Beulich  <jbeulich@suse.com>
>>>>>>>>
>>>>>>>>         * i386-opc.tbl (vcvtneps2bf16x): Add Broadcast, Xmmword, and
>>>>>>>>         Dword.
>>>>>>>>         (vcvtneps2bf16y): Add Broadcast, Ymmword, and Dword.
>>>>>>>>         * i386-tbl.h: Re-generate.
>>>>>>>>
>>>>>>>
>>>>>>> OK.
>>>>>>
>>>>>> Thanks. What about this remark
>>>>>>
>>>>>>>> ---
>>>>>>>> Arguably, just like other VCVT*{X,Y}, the ones here could then also be
>>>>>>>> made permit RegXMM/RegYMM as source operand. Personally I'd prefer this,
>>>>>>>> but there was resistance to such in similar earlier cases.
>>>>>>
>>>>>> that the original mail had? I think I've meanwhile found a case where
>>>>>> it would actually help if all such templates were consistent in this
>>>>>> regard.
>>>>>
>>>>> Are you referring to use AVX512VL for suffix check? I don't we should
>>>>> do that.
>>>>
>>>> No. I think there might be a broadcast handling anomaly when there's no
>>>> RegXMM (or RegYMM/RegZMM) on such templates. And I also think that the
>>>> same argument (might be used in heavily macro-ized code) calls for
>>>> register operands to be allowed even with these artificial x/y suffixes,
>>>> and despite there then being some redundancy at the source level.
>>>
>>> Did you mean allow broadcast on register source operand?
>>
>> No. But the Broadcast attribute is permitted together with e.g. a RegXMM
>> source (you'll find many such templates); it won't be accepted unless
>> the actual source is a memory operand (i.e. the template obviously also
>> has to specify a memory operand as a possibility). To (hopefully)
>> clarify, instead of
>>
>> vcvtneps2bf16x, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex128|Masking=3|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Xmmword|Dword|Unspecified|BaseIndex, RegXMM }
>> vcvtneps2bf16y, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex256|Masking=3|VexW0|Broadcast|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Ymmword|Dword|Unspecified|BaseIndex, RegXMM }
>>
>> I'd then have
>>
>> vcvtneps2bf16x, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex128|Masking=3|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM }
>> vcvtneps2bf16y, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex256|Masking=3|VexW0|Broadcast|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { RegYMM|Dword|Unspecified|BaseIndex, RegXMM }
> 
> This looks OK.

May I take this as approval for the such adjusted patch to go in?

Jan



More information about the Binutils mailing list