[PATCH 2/3] x86: Drop SwapSources

Tue Apr 30 06:18:24 GMT 2024

On 30.04.2024 04:56, Cui, Lili wrote:
>> On 29.04.2024 15:41, Cui, Lili wrote:
>>>> On 29.04.2024 14:23, Cui, Lili wrote:
>>>>>> On 28.04.2024 06:47, Cui, Lili wrote:
>>>>>>>> On 26.04.2024 10:14, Cui, Lili wrote:
>>>>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
>>>>>>>>>>> --- a/gas/config/tc-i386.c
>>>>>>>>>>> +++ b/gas/config/tc-i386.c
>>>>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
>>>>>>>>>>>
>>>>>>>>>>>    switch (i.tm.opcode_modifier.vexvvvv)
>>>>>>>>>>>      {
>>>>>>>>>>> +    case VexVVVV_SRC2:
>>>>>>>>>>> +      if (source != op)
>>>>>>>>>>> +	{
>>>>>>>>>>> +	  v = source++;
>>>>>>>>>>> +	  break;
>>>>>>>>>>> +	}
>>>>>>>>>>> +      /* For XOP: vpshl* and vpsha*.  */
>>>>>>>>>>> +      /* Fall through.  */
>>>>>>>>>>>      case VexVVVV_SRC1:
>>>>>>>>>>
>>>>>>>>>> This falling-through is odd and hence needs a better comment
>>>>>>>>>> (then also covering vprot*, which afaict is similarly affected).
>>>>>>>>>> The reason for this is the XOP.W-controlled operand swapping,
>>>>>>>>>> if I'm not mistaken? In which case perhaps instead of the
>>>>>>>>>> fall-through here the logic swapping the operands should
>>>>>>>>>> replace VexVVVV_SRC2 by VexVVVV_SRC1?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, vprot* should be included, and it is related to
>>>>>>>>> XOP.W-controlled operand swapping. The comment says
>>>>>>>>> "/* Only the first two register operands need reversing,
>>>>>>>>> alongside flipping VEX.W.  */", but there is actually a
>>>>>>>>> memory operand, not two register operands.
>>>>>>>>>
>>>>>>>>> I think VexVVVV_SRC2 makes more sense here: it matches the
>>>>>>>>> actual situation, where we want to use vvvv to encode the
>>>>>>>>> first operand.
>>>>>>>>>
>>>>>>>>> Opcode table:
>>>>>>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
>>>>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
>>>>>>>>> { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
>>>>>>>>>
>>>>>>>>> testcase:
>>>>>>>>> vprotb (%rax),%xmm12,%xmm15
>>>>>>>>> vprotb %xmm15,(%r12),%xmm0
>>>>>>>>
>>>>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the
>>>>>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to
>>>>>>>> replace the attribute when swapping operands.
>>>>>>>>
>>>>>>>
>>>>>>> If we replace Src2VVVV|VexW0 with Src1VVVV|VexW1 and swap the
>>>>>>> operands, we can put VexVVVV_SRC1 before VexVVVV_SRC2, but we
>>>>>>> still need to add "(!is_cpu (&i.tm, CpuXOP) || source == op)"
>>>>>>> under VexVVVV_SRC1, and match_template also needs to be
>>>>>>> adjusted (I made a simple modification and it still failed; I
>>>>>>> think continuing like this may go against the original
>>>>>>> intention).
>>>>>>>
>>>>>>>   switch (i.tm.opcode_modifier.vexvvvv)
>>>>>>>     {
>>>>>>>     /* VEX.vvvv encodes the first source register operand.  */
>>>>>>>     case VexVVVV_SRC1:
>>>>>>>       if (!is_cpu (&i.tm, CpuXOP) || source == op)
>>>>>>>         {
>>>>>>>           v =  dest - 1;
>>>>>>>           break;
>>>>>>>         }
>>>>>>>     /* For XOP: vpshl*, vpsha* and vprot*.  */
>>>>>>>     /* Fall through.  */
>>>>>>>     /* VEX.vvvv encodes the last source register operand.  */
>>>>>>>     case VexVVVV_SRC2:
>>>>>>>       v = source++;
>>>>>>>       break;
>>>>>>>     /* VEX.vvvv encodes the destination register operand.  */
>>>>>>>     case VexVVVV_DST:
>>>>>>>       v = dest--;
>>>>>>>       break;
>>>>>>>     default:
>>>>>>>       v = ~0;
>>>>>>>       break;
>>>>>>>      }
>>>>>>>
>>>>>>> Do you think we should add a separate patch 4 for XOP that
>>>>>>> removes the special handling in match_template and completes
>>>>>>> its templates, so we don't have to add special handling for
>>>>>>> src1vvvv or src2vvvv? This might go against your desire to
>>>>>>> reduce template size, but it would help simplify the logic.
>>>>>>> I'd like to know your thoughts.
>>>>>>
>>>>>> Indeed. You'd effectively revert earlier folding that I did. And
>>>>>> the adjustment I suggested earlier ought to be small/simple enough.
>>>>>>
>>>>>
>>>>> So, I continued working on the previous suggestion. With the
>>>>> following modification, it worked.
>>>>>
>>>>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
>>>>>                           || is_cpu (t, CpuAPX_F));
>>>>>               if (!operand_type_match (overlap0, i.types[0])
>>>>>                   || !operand_type_match (overlap1, i.types[j])
>>>>> -                 || (t->operands == 3
>>>>> +                 || (t->operands == 3 && !is_cpu (t, CpuXOP)
>>>>>                       && !operand_type_match (overlap2, i.types[1]))
>>>>>                   || (check_register
>>>>>                       && !operand_type_register_match (i.types[0],
>>>>
>>>> Just to mention it - this certainly isn't what I suggested. In fact I
>>>> seem to vaguely recall that something similar was once proposed
>>>> during the original APX work as well, where I then objected, too.
>>>>
>>>>> But I found that 4 test files failed. I didn't find the doc on
>>>>> how to encode vprotb, but I guess that is because I changed the
>>>>> default template from Src2VVVV|VexW0 to Src1VVVV|VexW1, so all
>>>>> the related test cases would need to be modified. Do you have
>>>>> any comments here?
>>>>>
>>>>> regexp "^[      ]*[a-f0-9]+:    8f e9 40 90 d8[         ]+vprotb %xmm7,%xmm0,%xmm3$"
>>>>> line   "    11ed:       8f e9 f8 90 df          vprotb %xmm7,%xmm0,%xmm3"
>>>>
>>>> Well, while changing the templates is in principle possible, and the
>>>> resulting code would still be correct, changing encodings is usually
>>>> not a good idea.
>>>> Thus when it can be avoided, it should be avoided, imo. Hence why I
>>>> didn't suggest this, but to amend the code doing the operand swapping
>>>> (for the case where operand order is controlled by XOP.W).
>>>>
>>>
>>> I mistakenly thought this was what you wanted, and I also think this
>>> modification is unacceptable. Could you elaborate further on your
>>> original suggestion?
>>
>> At the bottom of match_template() we have
>>
>>     case Opcode_VexW:
>>       /* Only the first two register operands need reversing, alongside
>> 	 flipping VEX.W.  */
>>       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
>>
>> This is where I think you further want to adjust
>> i.tm.opcode_modifier.vexvvvv.
>>
> 
> "vprotb %xmm7,%xmm0,%xmm3" doesn't go through this place, it finds the matching template directly (Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }).

Of course, because it has no memory operand (and no pseudo prefix). But that's
not the problem here - the problem is that, for the benefit of
build_modrm_byte(), you want to make adjustments when operands need swapping
(compared to what the templates say).
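
To illustrate the kind of adjustment meant (replacing VexVVVV_SRC2 by
VexVVVV_SRC1 at the point where VEX.W gets flipped), here is a minimal,
self-contained sketch. It is only a model: the model_* types and the function
are hypothetical stand-ins, not the real gas definitions, and whether this
small a change is sufficient in tc-i386.c is an assumption.

```c
/* Simplified, hypothetical model; these are NOT the real gas types.  */
enum model_vvvv
{
  MODEL_VVVV_NONE,
  MODEL_VVVV_SRC1,
  MODEL_VVVV_SRC2,
  MODEL_VVVV_DST
};

enum model_vexw
{
  MODEL_VEXW_NONE,
  MODEL_VEXW0,
  MODEL_VEXW1
};

struct model_template
{
  enum model_vvvv vexvvvv;
  enum model_vexw vexw;
};

/* Models the Opcode_VexW case: flip VEX.W and, because the first two
   operands get reversed, switch a SRC2 layout over to SRC1 so that later
   encoding logic needs no XOP-specific fall-through.  */
static void
model_reverse_for_vexw (struct model_template *t)
{
  /* Same idea as "vexw ^= VEXW0 ^ VEXW1": toggles W0 <-> W1.  */
  t->vexw = (enum model_vexw) (t->vexw ^ (MODEL_VEXW0 ^ MODEL_VEXW1));

  if (t->vexvvvv == MODEL_VVVV_SRC2)
    t->vexvvvv = MODEL_VVVV_SRC1;
}
```

With this, a template modelled as SRC2/W0 becomes SRC1/W1 after the swap, while
the templates themselves (and hence the default encodings) stay as they are.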

> I inserted a check to solve this problem, but it's a bit ugly. I'd like to abandon this optimization.

Didn't you have a working form earlier on? Could we use that, leaving the XOP
stuff untouched for now, so that I can look into it later?

> To avoid confusion, I post all the changes.
> 
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
>                           || is_cpu (t, CpuAPX_F));
>               if (!operand_type_match (overlap0, i.types[0])
>                   || !operand_type_match (overlap1, i.types[j])
> -                 || (t->operands == 3
> +                 || (t->operands == 3 && !is_cpu (t, CpuXOP)     --> This place also needs to change.
>                       && !operand_type_match (overlap2, i.types[1]))
>                   || (check_register
>                       && !operand_type_register_match (i.types[0],
> @@ -9035,6 +9035,10 @@ match_template (char mnem_suffix)
>                       specific_error = progress (i.error);
>                       continue;
>                     }
> +                 if (is_cpu (t, CpuXOP) && operand_types[0].bitfield.baseindex   --> It's ugly, and still has some test cases failing for other XOP instructions.
> +                     && i.types[0].bitfield.class == RegSIMD
> +                     && t->opcode_modifier.vexw == VEXW1)
> +                   found_reverse_match = Opcode_VexW;
>                   break;
>                 }
>             } 

Yeah, as indicated before: These probably shouldn't be changed like this.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -1938,10 +1938,10 @@ vpmacsww, 0x95, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, Reg
>  vpmadcsswd, 0xa6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vpmadcswd, 0xb6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vpperm, 0xa3, XOP, D|Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
> -vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
> +vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
>  vprot<xop>, 0xc0 | <xop:opc>, XOP, Modrm|Vex128|SpaceXOP08|VexW0|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
> -vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
> -vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
> +vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }

Aiui (and as per earlier discussion) this leads to encoding changes, which we
want to avoid whenever possible.
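
To make the encoding difference concrete, the two byte sequences from the
failing test quoted further up (8f e9 40 90 d8 expected, 8f e9 f8 90 df
produced) can be picked apart with a small sketch like the one below. The
struct and function names are made up for illustration; the sketch assumes the
usual 3-byte XOP layout (W in bit 7 and inverted vvvv in bits 6:3 of the third
byte) and ignores the R/X/B extension bits, which is fine here since only
%xmm0-%xmm7 are involved.

```c
/* Hypothetical helper for illustration only; not part of binutils.  */
struct xop_fields
{
  unsigned int w;         /* XOP.W */
  unsigned int vvvv;      /* register number encoded in XOP.vvvv */
  unsigned int modrm_reg; /* ModRM.reg */
  unsigned int modrm_rm;  /* ModRM.rm */
};

/* Decode a 5-byte register-only XOP insn: 8f, payload 1, payload 2,
   opcode, ModRM.  */
static struct xop_fields
decode_xop (const unsigned char *insn)
{
  struct xop_fields f;
  unsigned int p2 = insn[2];
  unsigned int modrm = insn[4];

  f.w = p2 >> 7;
  f.vvvv = ~(p2 >> 3) & 0xf;   /* vvvv is stored inverted.  */
  f.modrm_reg = (modrm >> 3) & 7;
  f.modrm_rm = modrm & 7;
  return f;
}
```

For "vprotb %xmm7,%xmm0,%xmm3" the expected bytes decode to W=0, vvvv=7, rm=0,
whereas the bytes produced with the changed templates decode to W=1, vvvv=0,
rm=7. Both describe the same operation, with %xmm3 in ModRM.reg either way, but
the output differs from what the testsuite expects.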

Jan