[PATCH v2 2/2] Support Intel AMX-MOVRS

Jiang, Haochen haochen.jiang@intel.com
Mon Dec 30 03:25:47 GMT 2024


> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, December 27, 2024 8:48 PM
> 
> On 24.12.2024 10:24, Haochen Jiang wrote:
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
> > @@ -135,11 +135,19 @@ _start:
> >  	sttilecfg	0x123(%r31,%rax,4)
> >  	tileloadd	0x123(%r31,%rax,4),%tmm6
> >  	tileloaddt1	0x123(%r31,%rax,4),%tmm6
> > +	tileloaddrs     0x10000000(%rbp, %r31, 8), %tmm6
> > +	tileloaddrs     (%r16), %tmm3
> > +	tileloaddrst1   0x10000000(%r31, %r14, 8), %tmm6
> > +	tileloaddrst1   (%r16), %tmm3
> >  	tilestored	%tmm6,0x123(%r31,%rax,4)
> >  	t2rpntlvwz0	0x123(%r31,%rax,8),%tmm6
> >  	t2rpntlvwz0t1	0x123(%r31,%rax,8),%tmm6
> >  	t2rpntlvwz1	0x123(%r31,%rax,8),%tmm6
> >  	t2rpntlvwz1t1	0x123(%r31,%rax,8),%tmm6
> > +	t2rpntlvwz0rs	0x123(%r31,%rax,8),%tmm6
> > +	t2rpntlvwz0rst1	0x123(%r31,%rax,8),%tmm6
> > +	t2rpntlvwz1rs	0x123(%r31,%rax,8),%tmm6
> > +	t2rpntlvwz1rst1	0x123(%r31,%rax,8),%tmm6
> 
> Please move these up a few lines, as in ASCII numbers sort ahead of letters.
> (I should have spotted this on the AMX-TRANSPOSE patch already, where it
> also wants correcting.)

I will do that for both the AMX-MOVRS and the AMX-TRANSPOSE patches.
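
To make the expected ordering concrete, here is a minimal sketch (not part of the patch) that sorts a few of the mnemonics from the test file by plain byte-wise comparison. The helper names cmp_mnemonic and sort_mnemonics are made up for illustration. Since '2' (0x32) compares lower than 'i' (0x69), every t2rpntlvw* line ends up ahead of the tile* lines.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: qsort comparator doing a byte-wise (ASCII)
   comparison of two C strings.  */
static int
cmp_mnemonic (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper: sort an array of mnemonic strings in place,
   in ASCII order.  */
static void
sort_mnemonics (const char **names, size_t n)
{
  qsort (names, n, sizeof *names, cmp_mnemonic);
}
```

Calling sort_mnemonics on { "tileloaddrs", "tilestored", "t2rpntlvwz0rs", "t2rpntlvwz0", "tileloadd" } puts the two t2rpntlvw* entries first, followed by tileloadd, tileloaddrs and tilestored.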

> 
> > @@ -4112,6 +4130,14 @@ static const struct dis386 prefix_table[][4] = {
> >      { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
> >    },
> >
> > +  /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { Bad_Opcode },
> > +    { "tileloaddrst1",	{ TMM, MVexSIBMEM }, 0 },
> > +    { "tileloaddrs",	{ TMM, MVexSIBMEM }, 0 },
> > +  },
> > +
> >    /* PREFIX_VEX_0F384B_X86_64_L_0_W_0 */
> >    {
> >      { Bad_Opcode },
> 
> At the example of this: Opcode 4A fully mirrors 4B afaict. Hence their decode
> paths would better be fully in sync.

Ok.
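
For reference, a minimal sketch of how a four-slot row like the one above is selected by the mandatory prefix. This is a simplified stand-in, not the disassembler's actual code: the names prefix_slot, row_4a, slot_for_prefix and decode_4a are hypothetical, NULL stands in for Bad_Opcode, and the slot order (no prefix, F3, 66, F2) is an assumption based on where the two mnemonics sit in the row.

```c
#include <stddef.h>

/* Hypothetical slot indices; the order none, F3, 66, F2 is assumed.  */
enum prefix_slot { SLOT_NONE = 0, SLOT_F3 = 1, SLOT_66 = 2, SLOT_F2 = 3 };

/* One four-slot row for opcode 4A; NULL stands in for Bad_Opcode.  */
static const char *const row_4a[4] = {
  NULL,             /* no prefix */
  NULL,             /* F3 */
  "tileloaddrst1",  /* 66 */
  "tileloaddrs",    /* F2 */
};

/* Map a mandatory-prefix byte (0 for none) to a slot index,
   or -1 if the byte is not one of the recognized prefixes.  */
static int
slot_for_prefix (unsigned char prefix)
{
  switch (prefix)
    {
    case 0x00: return SLOT_NONE;
    case 0xf3: return SLOT_F3;
    case 0x66: return SLOT_66;
    case 0xf2: return SLOT_F2;
    default:   return -1;
    }
}

/* Return the mnemonic for opcode 4A under the given prefix,
   or NULL if that combination is not a valid encoding.  */
static const char *
decode_4a (unsigned char prefix)
{
  int slot = slot_for_prefix (prefix);
  return slot < 0 ? NULL : row_4a[slot];
}
```

If 4A and 4B share the same decode path, a row for 4B would differ from row_4a only in its mnemonics, which is the kind of symmetry being asked for.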

> 
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3235,9 +3235,15 @@ tdpbhf8ps, 0xf2fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, Re
> >  tdphbf8ps, 0xf3fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
> >  tdphf8ps, 0x66fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
> >
> > +t2rpntlvw<z>rs<loc>, 0x<z:opc>f8 | <loc:opc>, AMX_MOVRS&AMX_TRANSPOSE, Sibmem|Vex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
> > +t2rpntlvw<z>rs<loc>, 0x<z:opc>f8 | <loc:opc>, APX_F&AMX_MOVRS&AMX_TRANSPOSE, Sibmem|EVex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
> > +
> >  <z>
> >  <loc>
> 
> Context-wise I'm afraid I can't associate this: It looks as if it went on top of
> other than (just?) the AMX-TRANSPOSE patch you sent to the list.

Yes. The AMX-TRANSPOSE patch needs to be applied before this one.

> 
> I further wonder if APX_F() isn't usable even here, in a "non-standard" way:
> Either of AMX_MOVRS&APX_F(AMX_TRANSPOSE) and
> AMX_TRANSPOSE&APX_F(AMX_MOVRS) may work fine.
> 

Doing it that way would require changing or relaxing the assertion in
cpu_flags_match, namely gas_assert (cpu_flags_equal (&cpu, &all)).

Let me check if we could have another "equivalent" condition for that.

Thx,
Haochen

