x86: Add support for Intel AMX instructions

H.J. Lu hjl.tools@gmail.com
Tue Jun 30 12:20:25 GMT 2020


On Tue, Jun 30, 2020 at 2:48 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.06.2020 17:40, H.J. Lu wrote:
> > On Mon, Jun 29, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 29.06.2020 17:16, H.J. Lu wrote:
> >>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 29.06.2020 14:46, H.J. Lu wrote:
> >>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:
> >>>>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
> >>>>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
> >>>>>>>
> >>>>>>>  // TSXLDTRK instructions end.
> >>>>>>> +
> >>>>>>> +// AMX instructions.
> >>>>>>> +
> >>>>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
> >>>>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
> >>>>>>
> >>>>>> Aren't these lacking Vex128 and VexW0? Same for I think all further
> >>>>>> entries below; see also the respective test case remark further up.
> >>>>>>
> >>>>>> For Intel syntax these should allow for "qword ptr".
> >>>>>
> >>>>> I don't think it is correct since these 2 instructions take a
> >>>>> 64-byte memory location.
> >>>>
> >>>> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which
> >>>> I admit would be kind of ugly/misleading. It still would seem desirable
> >>>> to have a way to explicitly specify memory operand size here, but I have
> >>>> no good other suggestion for the moment.
> >>>
> >>> We don't do this for other instructions with 64-byte memory location, like
> >>> movdir64b.
> >>
> >> Well, yes, I'm aware, but I'm not happy with the situation. Still I
> >> can see why this may not warrant addressing right now.
> >>
> >>>>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.
> >>>>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
> >>>>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
> >>>>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
> >>>>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
> >>>>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
> >>>>>>> +
> >>>>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
> >>>>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
> >>>>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
> >>>>>>
> >>>>>> As per an earlier comment I think for Intel syntax these ought to accept
> >>>>>> "dword ptr" on their memory operands.
> >>>>>
> >>>>> See above.
> >>>>
> >>>> How "see above"? The units copied are, aiui, dwords. All larger blocks
> >>>> combined from these dwords are dynamically sized, and hence can't be
> >>>> expressed with a static size specifier. Hence "dword ptr" looks
> >>>> applicable to me here.
> >>>>
> >>>
> >>> So far, "dword ptr" means a pointer to dword.  But it isn't the case here.
> >>> Also dword isn't the basic unit.
> >>
> >> Is it not? What does the 'd' suffix in tileloadd and tilestored stand for
> >> then?
> >
> > I will check.  But the AMX spec doesn't have any dword operations in
> > tileloadd or tilestored.
>
> It's not very explicit, but the exception section has various mentions
> along the lines of "#UD if tsrc.colbytes mod 4 != 0". I.e. while
> arithmetic happens on int8 / bf16 units, organization is still in
> dword granularity. Also see the description of the dot product insns,
> which describe how "each dword" gets interpreted by these insns.
>

But we don't use "dword ptr" on vector instructions with dword granularity.
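
For reference, the dword granularity mentioned in the quoted text ("#UD if
tsrc.colbytes mod 4 != 0") can be sketched in C roughly as follows.  This is
only an illustration: the helper names are hypothetical and come from neither
the AMX spec nor binutils, and only the quoted mod-4 condition is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mirrors only the quoted condition
   "#UD if tsrc.colbytes mod 4 != 0".  Other fault conditions
   are not modelled here.  */
static bool
tile_colbytes_ok (uint16_t colbytes)
{
  return colbytes % 4 == 0;
}

/* Hypothetical helper: number of dword units in one tile row,
   assuming colbytes has already passed tile_colbytes_ok.  */
static unsigned int
tile_row_dwords (uint16_t colbytes)
{
  return colbytes / 4u;
}
```

With these, a row of 64 bytes is treated as 16 dwords, while a row of 62 bytes
would fail the check.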

-- 
H.J.

