[PATCH v2 2/2] Support Intel AMX-MOVRS
Jiang, Haochen
haochen.jiang@intel.com
Mon Dec 30 03:25:47 GMT 2024
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, December 27, 2024 8:48 PM
>
> On 24.12.2024 10:24, Haochen Jiang wrote:
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> > @@ -135,11 +135,19 @@ _start:
> > sttilecfg 0x123(%r31,%rax,4)
> > tileloadd 0x123(%r31,%rax,4),%tmm6
> > tileloaddt1 0x123(%r31,%rax,4),%tmm6
> > + tileloaddrs 0x10000000(%rbp, %r31, 8), %tmm6
> > + tileloaddrs (%r16), %tmm3
> > + tileloaddrst1 0x10000000(%r31, %r14, 8), %tmm6
> > + tileloaddrst1 (%r16), %tmm3
> > tilestored %tmm6,0x123(%r31,%rax,4)
> > t2rpntlvwz0 0x123(%r31,%rax,8),%tmm6
> > t2rpntlvwz0t1 0x123(%r31,%rax,8),%tmm6
> > t2rpntlvwz1 0x123(%r31,%rax,8),%tmm6
> > t2rpntlvwz1t1 0x123(%r31,%rax,8),%tmm6
> > + t2rpntlvwz0rs 0x123(%r31,%rax,8),%tmm6
> > + t2rpntlvwz0rst1 0x123(%r31,%rax,8),%tmm6
> > + t2rpntlvwz1rs 0x123(%r31,%rax,8),%tmm6
> > + t2rpntlvwz1rst1 0x123(%r31,%rax,8),%tmm6
>
> Please move these up a few lines, as in ASCII numbers sort ahead of letters.
> (I should have spotted this on the AMX-TRANSPOSE patch already, where it
> also wants correcting.)
I will do that for both the AMX-MOVRS and AMX-TRANSPOSE patches.
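To double-check the intended order, here is a minimal Python sketch (not part of the patch) that sorts some of the mnemonics from the testcase hunk by plain code point. The list is only an excerpt and the variable names are made up for illustration.

```python
# Mnemonics copied from the testcase hunk quoted above; this is an excerpt,
# not the complete file.
mnemonics = [
    "sttilecfg",
    "tileloadd",
    "tileloaddt1",
    "tileloaddrs",
    "tileloaddrst1",
    "tilestored",
    "t2rpntlvwz0",
    "t2rpntlvwz0t1",
    "t2rpntlvwz0rs",
    "t2rpntlvwz0rst1",
]

# Python compares str values by code point, which is ASCII order for these
# names: '2' (0x32) sorts ahead of 'i' (0x69), so t2rpntlvw* precedes tile*.
ascii_sorted = sorted(mnemonics)

for name in ascii_sorted:
    print(name)
```

With this ordering all t2rpntlvw* entries land ahead of the tile* ones, which is the move requested above.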
>
> > @@ -4112,6 +4130,14 @@ static const struct dis386 prefix_table[][4] = {
> > { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
> > },
> >
> > + /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */ {
> > + { Bad_Opcode },
> > + { Bad_Opcode },
> > + { "tileloaddrst1", { TMM, MVexSIBMEM }, 0 },
> > + { "tileloaddrs", { TMM, MVexSIBMEM }, 0 },
> > + },
> > +
> > /* PREFIX_VEX_0F384B_X86_64_L_0_W_0 */
> > {
> > { Bad_Opcode },
>
> At the example of this: Opcode 4A fully mirrors 4B afaict. Hence their decode
> paths would better be fully in sync.
Ok.
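The two table names quoted above already differ in the order of their W and L steps. Assuming that is the kind of divergence meant, a small Python sketch can flag it by comparing the step order encoded in the names. The helper decode_steps is hypothetical and nothing like it exists in opcodes/; it only inspects the two names from the hunk.

```python
import re

def decode_steps(name):
    """Return the ordered decode-step letters (e.g. ['W', 'L']) that follow
    the X86_64 component of a table name. Hypothetical helper."""
    _, _, tail = name.partition("_X86_64_")
    # Each step is a letter followed by its value, e.g. "W_0" or "L_0".
    return re.findall(r"([A-Z])_\d", tail)

name_4a = "PREFIX_VEX_0F384A_X86_64_W_0_L_0"
name_4b = "PREFIX_VEX_0F384B_X86_64_L_0_W_0"

steps_4a = decode_steps(name_4a)
steps_4b = decode_steps(name_4b)

# The paths are "in sync" only if they dispatch on the same steps in the
# same order.
in_sync = steps_4a == steps_4b
print(steps_4a, steps_4b, in_sync)
```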
>
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3235,9 +3235,15 @@ tdpbhf8ps, 0xf2fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, Re
> > tdphbf8ps, 0xf3fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
> > tdphf8ps, 0x66fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
> >
> > +t2rpntlvw<z>rs<loc>, 0x<z:opc>f8 | <loc:opc>, AMX_MOVRS&AMX_TRANSPOSE, Sibmem|Vex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
> > +t2rpntlvw<z>rs<loc>, 0x<z:opc>f8 | <loc:opc>, APX_F&AMX_MOVRS&AMX_TRANSPOSE, Sibmem|EVex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
> > +
> > <z>
> > <loc>
>
> Context-wise I'm afraid I can't associate this: It looks as if it went on top of
> other than (just?) the AMX-TRANSPOSE patch you sent to the list.
Yes. This patch needs the AMX-TRANSPOSE patch to be applied first.
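For reference, the t2rpntlvw<z>rs<loc> template in the hunk above is expected to expand to the four mnemonics exercised in the testcase. The Python sketch below only mimics the name substitution. The value lists for <z> and <loc> are assumptions inferred from those mnemonics, not copied from i386-opc.tbl, and the opcode part is deliberately left out.

```python
from itertools import product

# Assumed template values, inferred from the mnemonics in the testcase.
Z_VALUES = ["z0", "z1"]
LOC_VALUES = ["", "t1"]

def expand(template):
    """Substitute every <z>/<loc> combination into a mnemonic template."""
    return [
        template.replace("<z>", z).replace("<loc>", loc)
        for z, loc in product(Z_VALUES, LOC_VALUES)
    ]

expanded = expand("t2rpntlvw<z>rs<loc>")
print(expanded)
```

The result matches the four t2rpntlvwz*rs* lines added to x86-64-apx-evex-promoted.s.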
>
> I further wonder if APX_F() isn't usable even here, in a "non-standard" way:
> Either of AMX_MOVRS&APX_F(AMX_TRANSPOSE) and
> AMX_TRANSPOSE&APX_F(AMX_MOVRS) may work fine.
>
That would require changing or relaxing the assertion in cpu_flags_match,
gas_assert (cpu_flags_equal (&cpu, &all)), for either form to work.
Let me check whether there is an "equivalent" condition we could use instead.
Thx,
Haochen