[PATCH] Support Intel SM4 AVX10.2 extension

Jan Beulich jbeulich@suse.com
Fri Dec 13 11:47:30 GMT 2024


On 13.12.2024 09:31, Haochen Jiang wrote:
> This is the v2 patch for Intel SM4 AVX10.2 extension.
> 
> Changes, open and patch description are embedded below.
> 
> Ok for trunk?

Yes, please apply as is, on the grounds of ...

> Open:
> 
> Currently, in the v2 patch, I just templatize the table with the
> following changes relative to the v1 patch:
> 
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -2193,11 +2193,8 @@ vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|Src1VVVV|VexW0|NoSuf, { RegXMM|Uns
> 
>  // SM4 instructions.
> 
> -vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -
> -vsm4key4, 0xf3da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> -vsm4rnds4, 0xf2da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +<sm4:isa:attr:reg, $y:SM4:Vex:, $z:SM4&AVX10_2:Disp8ShiftVL:RegZMM>
> +
> +vsm4key4<sm4>, 0xf3da, <sm4:isa>, Modrm|Space0F38|<sm4:attr>|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|<sm4:reg>|Unspecified|BaseIndex, RegXMM|RegYMM|<sm4:reg>, RegXMM|RegYMM|<sm4:reg> }
> +vsm4rnds4<sm4>, 0xf2da, <sm4:isa>, Modrm|Space0F38|<sm4:attr>|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|<sm4:reg>|Unspecified|BaseIndex, RegXMM|RegYMM|<sm4:reg>, RegXMM|RegYMM|<sm4:reg> }
> +
> +<sm4>
>  
>  // SM4 instructions end.
> 
> 
> I have also tried merging the table entries the way it is done for
> AVX/AVX512, but that needs the following changes relative to the v1 patch:
> 
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -2224,7 +2224,8 @@ cpu_flags_match (const insn_template *t)
>        /* Dual AVX/AVX512 templates need to retain AVX512* only if we already
>          know that EVEX encoding will be needed.  */
>        if ((any.bitfield.cpuavx || any.bitfield.cpuavx2 || any.bitfield.cpufma)
> -         && (any.bitfield.cpuavx512f || any.bitfield.cpuavx512vl))
> +         && (any.bitfield.cpuavx512f || any.bitfield.cpuavx512vl
> +             || any.bitfield.cpuavx10_2))
>         {
>           if (need_evex_encoding (t))
>             {
> @@ -2238,6 +2239,7 @@ cpu_flags_match (const insn_template *t)
>             {
>               any.bitfield.cpuavx512f = 0;
>               any.bitfield.cpuavx512vl = 0;
> +             any.bitfield.cpuavx10_2 = 0;
>             }
>         }
> 
> @@ -4033,13 +4035,15 @@ install_template (const insn_template *t)
>      {
>        if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>            || maybe_cpu (t, CpuFMA))
> -         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)
> +             || maybe_cpu (t, CpuAVX10_2)))
>         {
>           if (need_evex_encoding (t))
>             {
>               i.tm.opcode_modifier.vex = 0;
>               i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
>               i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
> +             i.tm.cpu.bitfield.cpuavx10_2 = i.tm.cpu_any.bitfield.cpuavx10_2;
>             }
>           else
>             {
> 
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -2193,11 +2193,8 @@ vsm3msg2, 0x66da, SM3, Modrm|Space0F38|Vex128|Src1VVVV|VexW0|NoSuf, { RegXMM|Uns
> 
>  // SM4 instructions.
> 
> -vsm4key4, 0xf3da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -vsm4rnds4, 0xf2da, SM4, Modrm|Space0F38|Vex|Src1VVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
> -
> -vsm4key4, 0xf3da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> -vsm4rnds4, 0xf2da, SM4&AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vsm4key4, 0xf3da, SM4&(AVX|AVX10_2), Modrm|Space0F38|Vex|EVexDYN|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vsm4rnds4, 0xf2da, SM4&(AVX|AVX10_2), Modrm|Space0F38|Vex|EVexDYN|Src1VVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> 
>  // SM4 instructions end.
> 
> 
> I am okay with going either way, but slightly prefer the templatized
> one, since SM4 will probably be the only AVX10.2 ISA that needs such a
> VEX to EVEX extension, as mentioned in the previous thread (MOVRS does
> not need it). Also, the trend since AVX10 is that we directly provide
> EVEX encodings, and no VEX encodings, for vector instructions.

... this statement of yours. I'll take you up on that if things end up
changing later ...

Jan
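
For readers unfamiliar with the table templates, the effect of the
`<sm4:isa:attr:reg, ...>` definition quoted above can be sketched as plain
text substitution. This is only an illustration, not the generator's actual
code: `replace_all`, `expand_sm4` and `struct sm4_instance` are hypothetical
names, and the sketch assumes that the `$`-prefixed instance names contribute
nothing to the mnemonic (consistent with both v1 entries being named
`vsm4key4`). Empty parameters are left as empty `|`-separated fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SRC to DST, replacing every occurrence of FROM with TO.
   Hypothetical helper; asserts rather than truncating on overflow.  */
static void
replace_all (char *dst, size_t dstlen, const char *src,
             const char *from, const char *to)
{
  size_t flen = strlen (from), tlen = strlen (to), used = 0;

  assert (flen > 0);
  while (*src != '\0')
    {
      if (strncmp (src, from, flen) == 0)
        {
          assert (used + tlen < dstlen);
          memcpy (dst + used, to, tlen);
          used += tlen;
          src += flen;
        }
      else
        {
          assert (used + 1 < dstlen);
          dst[used++] = *src++;
        }
    }
  dst[used] = '\0';
}

/* One instance of the <sm4> template: values for isa, attr and reg.  */
struct sm4_instance
{
  const char *isa, *attr, *reg;
};

/* Expand one table line for one template instance.  */
static void
expand_sm4 (char *out, size_t outlen, const char *line,
            const struct sm4_instance *inst)
{
  char a[512], b[512], c[512];

  replace_all (a, sizeof a, line, "<sm4:isa>", inst->isa);
  replace_all (b, sizeof b, a, "<sm4:attr>", inst->attr);
  replace_all (c, sizeof c, b, "<sm4:reg>", inst->reg);
  /* Assumption: "$y" / "$z" add nothing to the mnemonic.  */
  replace_all (out, outlen, c, "<sm4>", "");
}
```

Calling `expand_sm4` once with `{ "SM4", "Vex", "" }` and once with
`{ "SM4&AVX10_2", "Disp8ShiftVL", "RegZMM" }` on each templated line yields
one VEX and one EVEX entry per mnemonic, matching the shape of the four v1
lines the patch removes.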

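The alternative (merged) approach relies on the dual VEX/EVEX template
handling touched by the quoted `cpu_flags_match` hunk. A minimal sketch of
that idea follows, with AVX10.2 added to the EVEX side. `struct cpu_bits` and
`prune_dual_template` are hypothetical stand-ins, not gas code. The quoted
hunk only shows the branch that clears the AVX512*/AVX10.2 bits; that the
other branch clears the AVX/AVX2/FMA bits is an assumption, and any extra
conditions the real code applies are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant CPU feature bits.  */
struct cpu_bits
{
  bool avx, avx2, fma;              /* VEX-side features */
  bool avx512f, avx512vl, avx10_2;  /* EVEX-side features */
};

/* For a template carrying both VEX-side and EVEX-side features, keep
   only the side matching the encoding that will be used.  */
static void
prune_dual_template (struct cpu_bits *any, bool need_evex)
{
  bool has_vex = any->avx || any->avx2 || any->fma;
  bool has_evex = any->avx512f || any->avx512vl || any->avx10_2;

  if (!has_vex || !has_evex)
    return;  /* Not a dual template; nothing to prune.  */

  if (need_evex)
    {
      /* Assumption: this branch is elided in the quoted hunk.  */
      any->avx = false;
      any->avx2 = false;
      any->fma = false;
    }
  else
    {
      any->avx512f = false;
      any->avx512vl = false;
      any->avx10_2 = false;  /* The line the merged approach adds.  */
    }
}
```

With a template marked `SM4&(AVX|AVX10_2)`, a VEX-encodable use would then be
checked against AVX only, and a use needing EVEX against AVX10.2 only.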
