[PATCH 1/2] Support Intel MOVRS

Jiang, Haochen haochen.jiang@intel.com
Mon Dec 30 07:39:20 GMT 2024


> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, December 27, 2024 8:27 PM
> 
> On 24.12.2024 10:24, Haochen Jiang wrote:
> > From: "Hu, Lin1" <lin1.hu@intel.com>
> >
> > The APX_F extension for MOVRS will be:
> >   - EVEX.LLZ.NP.MAP4.SCALABLE 8A !(11):rrr:bbb for r8/m8 with NF=0 and
> >     ND=0
> 
> This got to be IGNORED, not SCALABLE, to match other byte insns in APX?

It should not be SCALABLE, I copied it wrongly. But it is missing IGNORED.
I suppose it should be added. Let me check that.

> >
> > We did not merge the table together for APX_F since there is an
> > explicit
> > x64 for movrs insn. The current APX_F() did not support the
> > combination between CPUIDs.
> 
> This is an odd remark to make. APX_F() makes sense to use only for dual
> VEX/EVEX templates, yet the non-APX forms of MOVRS are legacy-encoded.
> See also the ENQCMD{,S} cleanup patch just sent.

I just got deeper understanding on APX_F() through that patch, I will remove this.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/movrs-inval.s
> > @@ -0,0 +1,12 @@
> > +# Check Illegal 32bit MOVRS instructions
> > +
> > +	.text
> > +_start:
> > +	movrs	0x10000000(%esp, %esi, 8), %dx
> > +	movrs	0x10000000(%esp, %esi, 8), %edx
> > +	movrs	0x10000000(%esp, %esi, 8), %r12
> > +	movrs	0x10000000(%esp, %esi, 8), %bl
> > +	vmovrsb	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsd	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsq	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsw	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> 
> Once again I'm uncertain of the usefulness of this test.

I did not remove this since MOVRS got prefetchrst2 which is valid
under 32-bit. It is not a whole CPUID under 64-bit.

I am okay for keeping it or removing it.

> 
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > @@ -118,6 +118,10 @@ Disassembly of section \.text:
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c .d 08 f8 bc 87 23 01 00 00[
> 	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[
> 	 ]+%r25d,0x123\(%r31,%rax,4\)
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[
> 	 ]+%r31,0x123\(%r31,%rax,4\)
> > +[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 8a 84 87 23 01 00 00[	 ]+movrs[
> 	 ]+0x123\(%r31,%rax,4\),%r16b
> 
> This one's WIG, but doesn't end up with W=1. I wonder whether that's related
> to the very last comment (on the opcode table entries).

Initially MOVRS is implemented under W0, which is wrong. I did a last minute
change to let it skip W. However, APX_F part is forgotten. I will handle that part.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.d
> > @@ -0,0 +1,13 @@
> > +#objdump: -dwMsuffix
> > +#name: x86_64 MOVRS insns
> > +
> > +.*: +file format .*
> > +
> > +Disassembly of section \.text:
> > +
> > +0+ <_start>:
> > +\s*[a-f0-9]+:\s*66 0f 38 8b 92 00 ff ff ff\s+movrsw
> > +-0x100\(%rdx\),%dx \s*[a-f0-9]+:\s*0f 38 8b 92 00 fe ff ff\s+movrsl
> > +-0x200\(%rdx\),%edx \s*[a-f0-9]+:\s*4c 0f 38 8b a2 00 fc ff
> > +ff\s+movrsq -0x400\(%rdx\),%r12 \s*[a-f0-9]+:\s*0f 38 8a 5a
> > +80\s+movrsb -0x80\(%rdx\),%bl #pass
> 
> While I agree with having a separate -Msuffix test, ...
> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.s
> > @@ -0,0 +1,15 @@
> > +# Check 64bit MOVRS instructions
> > +
> > +	.text
> > +_start:
> > +	movrsw	-256(%rdx), %dx
> > +	movrsl	-512(%rdx), %edx
> > +	movrsq	-1024(%rdx), %r12
> > +	movrsb	-128(%rdx), %bl
> 
> ... the source imo wants folding into x86-64-movrs.s and ...
> 
> > +_intel:
> > +	.intel_syntax noprefix
> > +	movrsw	dx, WORD PTR [rdx-256]
> > +	movrsl	edx, DWORD PTR [rdx-512]
> > +	movrsq	r12, QWORD PTR [rdx-1024]
> > +	movrsb	bl, BYTE PTR [rdx-128]
> 
> ... further bogus use of suffixes in Intel syntax shouldn't be added in testcases.

I will change them.

> 
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3554,3 +3554,17 @@ vcomxs<sdh>, 0x<sdh:spfx>2f, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8M
> >  vucomxs<sdh>, 0x<sdh:spfx>2e, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, {
> > RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM }
> >
> >  // AVX10.2 instructions end.
> > +
> > +// MOVRS instructions.
> > +
> > +prefetchrst2, 0xf18/4, MOVRS, Modrm|Anysize|IgnoreSize|NoSuf, {
> > +BaseIndex }
> > +
> > +movrs, 0x8a, MOVRS&x64,
> > +Modrm|Space0F38|No_wSuf|No_lSuf|No_sSuf|No_qSuf, {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&x64,
> > +Modrm|Anysize|Space0F38|CheckOperandSize|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> movrs,
> > +0x8a, MOVRS&APX_F,
> >
> +Modrm|EVex128|EVexMap4|VexW0|No_wSuf|No_lSuf|No_sSuf|No_qSuf
> , {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&APX_F,
> > +Modrm|CheckOperandSize|EVex128|EVexMap4|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> 
> These want to leverage W if at all possible, allowing to go from 4 to 2
> templates.

It is not a VEX/EVEX promotion, but a legacy/EVEX promotion. Could we do
that?

If we could, the problem here is still MOVRS is not 64-bit default. Then the
non-APX part needs x64, making it multiple CPUIDs. Maybe it also needs
work around in cpu_flags_match second assert just like
AMX_MOVRS & AMX_TRANSPOSE. At least we could check the all.bitfield
to let them skip that assert.

Thx,
Haochen


More information about the Binutils mailing list