[PATCH] x86: VCVTNEPS2BF16{X,Y} should permit broadcasting

H.J. Lu hjl.tools@gmail.com
Mon Jan 20 15:27:00 GMT 2020


On Mon, Jan 20, 2020 at 7:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.01.2020 15:42, H.J. Lu wrote:
> > On Mon, Jan 20, 2020 at 6:35 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 20.01.2020 14:58, H.J. Lu wrote:
> >>> On Mon, Jan 20, 2020 at 4:00 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> Just like other VCVT*{X,Y} templates do, and to allow the programmer
> >>>> flexibility (might be relevant in particular when heavily macro-izing
> >>>> code), the two templates should also have Broadcast set, just like their
> >>>> X/Y-suffix-less counterparts. This in turn requires them to also have
> >>>> * Dword set on their memory operands, to cover the logic added to i386gen
> >>>>   by 4a1b91eabbe7 ("x86: Expand Broadcast to 3 bits"),
> >>>> * Xmmword/Ymmword set on their memory operands, to satisfy broadcast
> >>>>   sizing logic in gas itself.
> >>>> Otherwise ATTSyntax templates wouldn't need such operand size attributes.
> >>>>
> >>>> While extending the test cases, also add Intel syntax broadcast forms
> >>>> without explicit size specifiers.
> >>>>
> >>>> gas/
> >>>> 2020-01-XX  Jan Beulich  <jbeulich@suse.com>
> >>>>
> >>>>         * testsuite/gas/i386/avx512_bf16_vl.s,
> >>>>         testsuite/gas/i386/x86-64-avx512_bf16_vl.s: Add broadcast forms
> >>>>         of VCVTNEPS2BF16{X,Y}. Add operand-size less Intel syntax
> >>>>         broadcast forms of VCVTNEPS2BF16.
> >>>>         * testsuite/gas/i386/avx512_bf16_vl.d,
> >>>>         testsuite/gas/i386/x86-64-avx512_bf16_vl.d: Adjust expectations.
> >>>>
> >>>> opcodes/
> >>>> 2020-01-XX  Jan Beulich  <jbeulich@suse.com>
> >>>>
> >>>>         * i386-opc.tbl (vcvtneps2bf16x): Add Broadcast, Xmmword, and
> >>>>         Dword.
> >>>>         (vcvtneps2bf16y): Add Broadcast, Ymmword, and Dword.
> >>>>         * i386-tbl.h: Re-generate.
> >>>>
> >>>
> >>> OK.
> >>
> >> Thanks. What about this remark
> >>
> >>>> ---
> >>>> Arguably, just like other VCVT*{X,Y}, the ones here could then also be
> >>>> made permit RegXMM/RegYMM as source operand. Personally I'd prefer this,
> >>>> but there was resistance to such in similar earlier cases.
> >>
> >> that the original mail had? I think I've meanwhile found a case where
> >> it would actually help if all such templates were consistent in this
> >> regard.
> >
> > Are you referring to use AVX512VL for suffix check? I don't we should
> > do that.
>
> No. I think there might be a broadcast handling anomaly when there's no
> RegXMM (or RegYMM/RegZMM) on such templates. And I also think that the
> same argument (might be used in heavily macro-ized code) calls for
> register operands to be allowed even with these artificial x/y suffixes,
> and despite there then being some redundancy at the source level.
>

Did you mean allow broadcast on register source operand?

-- 
H.J.



More information about the Binutils mailing list