[PATCH] Support Intel SM4 AVX10.2 extension
Jan Beulich
jbeulich@suse.com
Fri Dec 13 11:47:30 GMT 2024
On 13.12.2024 09:31, Haochen Jiang wrote:
> This is the v2 patch for the Intel SM4 AVX10.2 extension.
>
> Changes, open items and the patch description are embedded below.
>
> Ok for trunk?
Yes, please apply as is, on the grounds of ...
> Open:
>
> Currently, in the v2 patch, I just templatize the table, with the following
> changes relative to the v1 patch:
>
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -2193,11 +2193,8 @@ vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|Src1VVVV|VexW0|NoSuf, { RegXMM|Uns
>
> // SM4 instructions.
>
> -vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -
> -vsm4key4, 0xf3da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> -vsm4rnds4, 0xf2da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +<sm4:isa:attr:reg, $y:SM4:Vex:, $z:SM4&AVX10_2:Disp8ShiftVL:RegZMM>
> +
> +vsm4key4<sm4>, 0xf3da, <sm4:isa>, Modrm|Space0F38|<sm4:attr>|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|<sm4:reg>|Unspecified|BaseIndex, RegXMM|RegYMM|<sm4:reg>, RegXMM|RegYMM|<sm4:reg> }
> +vsm4rnds4<sm4>, 0xf2da, <sm4:isa>, Modrm|Space0F38|<sm4:attr>|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|<sm4:reg>|Unspecified|BaseIndex, RegXMM|RegYMM|<sm4:reg>, RegXMM|RegYMM|<sm4:reg> }
> +
> +<sm4>
>
> // SM4 instructions end.
>
>
> I have also tried merging the table entries the way the dual AVX/AVX512
> templates are handled, but that needs the following changes relative to
> the v1 patch:
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -2224,7 +2224,8 @@ cpu_flags_match (const insn_template *t)
> /* Dual AVX/AVX512 templates need to retain AVX512* only if we already
> know that EVEX encoding will be needed. */
> if ((any.bitfield.cpuavx || any.bitfield.cpuavx2 || any.bitfield.cpufma)
> - && (any.bitfield.cpuavx512f || any.bitfield.cpuavx512vl))
> + && (any.bitfield.cpuavx512f || any.bitfield.cpuavx512vl
> + || any.bitfield.cpuavx10_2))
> {
> if (need_evex_encoding (t))
> {
> @@ -2238,6 +2239,7 @@ cpu_flags_match (const insn_template *t)
> {
> any.bitfield.cpuavx512f = 0;
> any.bitfield.cpuavx512vl = 0;
> + any.bitfield.cpuavx10_2 = 0;
> }
> }
>
> @@ -4033,13 +4035,15 @@ install_template (const insn_template *t)
> {
> if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> || maybe_cpu (t, CpuFMA))
> - && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> + && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
> + || maybe_cpu (t, CpuAVX10_2)))
> {
> if (need_evex_encoding (t))
> {
> i.tm.opcode_modifier.vex = 0;
> i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
> i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
> + i.tm.cpu.bitfield.cpuavx10_2 = i.tm.cpu_any.bitfield.cpuavx10_2;
> }
> else
> {
>
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -2193,11 +2193,8 @@ vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|Src1VVVV|VexW0|NoSuf, { RegXMM|Uns
>
> // SM4 instructions.
>
> -vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -
> -vsm4key4, 0xf3da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> -vsm4rnds4, 0xf2da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vsm4key4, 0xf3da, SM4&(AVX|AVX10_2), Modrm|Space0F38|Vex|EVexDYN|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vsm4rnds4, 0xf2da, SM4&(AVX|AVX10_2), Modrm|Space0F38|Vex|EVexDYN|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
>
> // SM4 instructions end.
>
>
> I am okay to go either way, but slightly prefer the templatized version,
> since SM4 will probably be the only AVX10.2 ISA that needs such a VEX
> to EVEX extension, as mentioned in the previous thread (MOVRS does
> not need it). Also, the tendency since AVX10 is to provide EVEX
> encodings directly, with no VEX encodings, for vector instructions.
... this statement of yours. I'll take you up on that if things end up
changing later ...
Jan
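
For readers unfamiliar with the opcode-table templates, here is a minimal
sketch in C of how the accepted <sm4:isa:attr:reg, ...> template is understood
to expand into the separate VEX and EVEX entries it replaces. It is not the
generator code from binutils. The names sm4_instance, replace_all and
sm4_expand are hypothetical. The sketch also assumes that an instance name
starting with '$' contributes nothing to the mnemonic, which is consistent
with both v1 entries being named plain vsm4key4.

```c
#include <assert.h>
#include <string.h>

/* One instance of the <sm4:isa:attr:reg, ...> template (hypothetical type).  */
struct sm4_instance {
  const char *isa;   /* substituted for <sm4:isa> */
  const char *attr;  /* substituted for <sm4:attr> */
  const char *reg;   /* substituted for <sm4:reg>, may be empty */
};

/* Copy SRC to DST (capacity N), replacing every occurrence of PAT by REP.
   Returns 0 on success, -1 if DST is too small.  */
static int
replace_all (char *dst, size_t n, const char *src,
             const char *pat, const char *rep)
{
  size_t plen = strlen (pat), rlen = strlen (rep), used = 0;

  assert (plen > 0);
  if (n == 0)
    return -1;
  while (*src)
    {
      if (strncmp (src, pat, plen) == 0)
        {
          if (used + rlen >= n)
            return -1;
          memcpy (dst + used, rep, rlen);
          used += rlen;
          src += plen;
        }
      else
        {
          if (used + 1 >= n)
            return -1;
          dst[used++] = *src++;
        }
    }
  dst[used] = '\0';
  return 0;
}

/* Expand one templated table line for one instance.
   Returns 0 on success, -1 if a buffer is too small.  */
static int
sm4_expand (char *out, size_t n, const char *line,
            const struct sm4_instance *inst)
{
  char a[512], b[512];

  if (replace_all (a, sizeof a, line, "<sm4:isa>", inst->isa)
      || replace_all (b, sizeof b, a, "<sm4:attr>", inst->attr)
      || replace_all (a, sizeof a, b, "<sm4:reg>", inst->reg)
      /* Assumption: a bare <sm4> after the mnemonic expands to nothing,
         because both instance names ($y, $z) start with '$'.  */
      || replace_all (out, n, a, "<sm4>", ""))
    return -1;
  return 0;
}
```

Calling sm4_expand once with { "SM4", "Vex", "" } and once with
{ "SM4&AVX10_2", "Disp8ShiftVL", "RegZMM" } on each templated line gives one
VEX-style and one EVEX-style entry per mnemonic, matching the four entries
removed from the v1 table. The empty reg field in the first instance leaves
an empty operand-type slot in this sketch.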