[PATCH 1/2] Support Intel MOVRS
Jiang, Haochen
haochen.jiang@intel.com
Mon Dec 30 07:39:20 GMT 2024
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, December 27, 2024 8:27 PM
>
> On 24.12.2024 10:24, Haochen Jiang wrote:
> > From: "Hu, Lin1" <lin1.hu@intel.com>
> >
> > The APX_F extension for MOVRS will be:
> > - EVEX.LLZ.NP.MAP4.SCALABLE 8A !(11):rrr:bbb for r8/m8 with NF=0 and
> > ND=0
>
> This got to be IGNORED, not SCALABLE, to match other byte insns in APX?
It should not be SCALABLE, I copied it wrongly. But it is missing IGNORED.
I suppose it should be added. Let me check that.
> >
> > We did not merge the table together for APX_F since there is an
> > explicit
> > x64 for movrs insn. The current APX_F() did not support the
> > combination between CPUIDs.
>
> This is an odd remark to make. APX_F() makes sense to use only for dual
> VEX/EVEX templates, yet the non-APX forms of MOVRS are legacy-encoded.
> See also the ENQCMD{,S} cleanup patch just sent.
I just got deeper understanding on APX_F() through that patch, I will remove this.
>
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/movrs-inval.s
> > @@ -0,0 +1,12 @@
> > +# Check Illegal 32bit MOVRS instructions
> > +
> > + .text
> > +_start:
> > + movrs 0x10000000(%esp, %esi, 8), %dx
> > + movrs 0x10000000(%esp, %esi, 8), %edx
> > + movrs 0x10000000(%esp, %esi, 8), %r12
> > + movrs 0x10000000(%esp, %esi, 8), %bl
> > + vmovrsb 0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > + vmovrsd 0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > + vmovrsq 0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > + vmovrsw 0x10000000(%esp, %esi, 8), %zmm6{%k7}
>
> Once again I'm uncertain of the usefulness of this test.
I did not remove this since MOVRS got prefetchrst2 which is valid
under 32-bit. It is not a whole CPUID under 64-bit.
I am okay for keeping it or removing it.
>
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > @@ -118,6 +118,10 @@ Disassembly of section \.text:
> > [ ]*[a-f0-9]+:[ ]*62 4c .d 08 f8 bc 87 23 01 00 00[
> ]+movdir64b[ ]+0x123\(%r31,%rax,4\),%r31
> > [ ]*[a-f0-9]+:[ ]*62 4c 7c 08 f9 8c 87 23 01 00 00[ ]+movdiri[
> ]+%r25d,0x123\(%r31,%rax,4\)
> > [ ]*[a-f0-9]+:[ ]*62 4c fc 08 f9 bc 87 23 01 00 00[ ]+movdiri[
> ]+%r31,0x123\(%r31,%rax,4\)
> > +[ ]*[a-f0-9]+:[ ]*62 cc 7c 08 8a 84 87 23 01 00 00[ ]+movrs[
> ]+0x123\(%r31,%rax,4\),%r16b
>
> This one's WIG, but doesn't end up with W=1. I wonder whether that's related
> to the very last comment (on the opcode table entries).
Initially MOVRS is implemented under W0, which is wrong. I did a last minute
change to let it skip W. However, APX_F part is forgotten. I will handle that part.
>
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.d
> > @@ -0,0 +1,13 @@
> > +#objdump: -dwMsuffix
> > +#name: x86_64 MOVRS insns
> > +
> > +.*: +file format .*
> > +
> > +Disassembly of section \.text:
> > +
> > +0+ <_start>:
> > +\s*[a-f0-9]+:\s*66 0f 38 8b 92 00 ff ff ff\s+movrsw
> > +-0x100\(%rdx\),%dx \s*[a-f0-9]+:\s*0f 38 8b 92 00 fe ff ff\s+movrsl
> > +-0x200\(%rdx\),%edx \s*[a-f0-9]+:\s*4c 0f 38 8b a2 00 fc ff
> > +ff\s+movrsq -0x400\(%rdx\),%r12 \s*[a-f0-9]+:\s*0f 38 8a 5a
> > +80\s+movrsb -0x80\(%rdx\),%bl #pass
>
> While I agree with having a separate -Msuffix test, ...
>
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.s
> > @@ -0,0 +1,15 @@
> > +# Check 64bit MOVRS instructions
> > +
> > + .text
> > +_start:
> > + movrsw -256(%rdx), %dx
> > + movrsl -512(%rdx), %edx
> > + movrsq -1024(%rdx), %r12
> > + movrsb -128(%rdx), %bl
>
> ... the source imo wants folding into x86-64-movrs.s and ...
>
> > +_intel:
> > + .intel_syntax noprefix
> > + movrsw dx, WORD PTR [rdx-256]
> > + movrsl edx, DWORD PTR [rdx-512]
> > + movrsq r12, QWORD PTR [rdx-1024]
> > + movrsb bl, BYTE PTR [rdx-128]
>
> ... further bogus use of suffixes in Intel syntax shouldn't be added in testcases.
I will change them.
>
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3554,3 +3554,17 @@ vcomxs<sdh>, 0x<sdh:spfx>2f, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8M
> > vucomxs<sdh>, 0x<sdh:spfx>2e, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, {
> > RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM }
> >
> > // AVX10.2 instructions end.
> > +
> > +// MOVRS instructions.
> > +
> > +prefetchrst2, 0xf18/4, MOVRS, Modrm|Anysize|IgnoreSize|NoSuf, {
> > +BaseIndex }
> > +
> > +movrs, 0x8a, MOVRS&x64,
> > +Modrm|Space0F38|No_wSuf|No_lSuf|No_sSuf|No_qSuf, {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&x64,
> > +Modrm|Anysize|Space0F38|CheckOperandSize|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> movrs,
> > +0x8a, MOVRS&APX_F,
> >
> +Modrm|EVex128|EVexMap4|VexW0|No_wSuf|No_lSuf|No_sSuf|No_qSuf
> , {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&APX_F,
> > +Modrm|CheckOperandSize|EVex128|EVexMap4|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>
> These want to leverage W if at all possible, allowing to go from 4 to 2
> templates.
It is not a VEX/EVEX promotion, but a legacy/EVEX promotion. Could we do
that?
If we could, the problem here is still MOVRS is not 64-bit default. Then the
non-APX part needs x64, making it multiple CPUIDs. Maybe it also needs
work around in cpu_flags_match second assert just like
AMX_MOVRS & AMX_TRANSPOSE. At least we could check the all.bitfield
to let them skip that assert.
Thx,
Haochen
More information about the Binutils
mailing list