[PATCH 2/3] x86: Drop SwapSources
Jan Beulich
jbeulich@suse.com
Mon Apr 29 13:49:01 GMT 2024
On 29.04.2024 15:41, Cui, Lili wrote:
>> On 29.04.2024 14:23, Cui, Lili wrote:
>>>> On 28.04.2024 06:47, Cui, Lili wrote:
>>>>>> On 26.04.2024 10:14, Cui, Lili wrote:
>>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
>>>>>>>>> --- a/gas/config/tc-i386.c
>>>>>>>>> +++ b/gas/config/tc-i386.c
>>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
>>>>>>>>>
>>>>>>>>> switch (i.tm.opcode_modifier.vexvvvv)
>>>>>>>>> {
>>>>>>>>> + case VexVVVV_SRC2:
>>>>>>>>> + if (source != op)
>>>>>>>>> + {
>>>>>>>>> + v = source++;
>>>>>>>>> + break;
>>>>>>>>> + }
>>>>>>>>> + /* For XOP: vpshl* and vpsha*. */
>>>>>>>>> + /* Fall through. */
>>>>>>>>> case VexVVVV_SRC1:
>>>>>>>>
>>>>>>>> This falling-through is odd and hence needs a better comment
>>>>>>>> (then also covering vprot*, which afaict is similarly affected).
>>>>>>>> The reason for this is the XOP.W-controlled operand swapping, if
>>>>>>>> I'm not mistaken? In which case perhaps instead of the
>>>>>>>> fall-through here the logic swapping the operands should replace
>>>>>>>> VexVVVV_SRC2 by VexVVVV_SRC1?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, vprot* should be included, and it is related to XOP.W-controlled
>>>>>>> operand swapping. The comment says "/* Only the first two register
>>>>>>> operands need reversing, alongside flipping VEX.W. */", but there is
>>>>>>> actually a memory operand, not two register operands.
>>>>>>>
>>>>>>> I think VexVVVV_SRC2 makes more sense here; it matches the actual
>>>>>>> situation: we want to use vvvv to encode the first operand.
>>>>>>>
>>>>>>> Opcode table:
>>>>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
>>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
>>>>>>> { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
>>>>>>>
>>>>>>> testcase:
>>>>>>> vprotb (%rax),%xmm12,%xmm15
>>>>>>> vprotb %xmm15,(%r12),%xmm0
>>>>>>
>>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the
>>>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to
>>>>>> replace the attribute when swapping operands.
>>>>>>
>>>>>
>>>>> If we replace Src2VVVV|VexW0 with Src1VVVV|VexW1 and swap the operands,
>>>>> we can put VexVVVV_SRC1 before VexVVVV_SRC2, but we still need to add
>>>>> "if (!is_cpu (&i.tm, CpuXOP) || source == op)" under VexVVVV_SRC1, and
>>>>> match_template also needs to be adjusted (I made a simple modification
>>>>> and it still failed; I think continuing like this may go against the
>>>>> original intention).
>>>>>
>>>>>   switch (i.tm.opcode_modifier.vexvvvv)
>>>>>     {
>>>>>     /* VEX.vvvv encodes the first source register operand. */
>>>>>     case VexVVVV_SRC1:
>>>>>       if (!is_cpu (&i.tm, CpuXOP) || source == op)
>>>>>         {
>>>>>           v = dest - 1;
>>>>>           break;
>>>>>         }
>>>>>       /* For XOP: vpshl*, vpsha* and vprot*. */
>>>>>       /* Fall through. */
>>>>>     /* VEX.vvvv encodes the last source register operand. */
>>>>>     case VexVVVV_SRC2:
>>>>>       v = source++;
>>>>>       break;
>>>>>     /* VEX.vvvv encodes the destination register operand. */
>>>>>     case VexVVVV_DST:
>>>>>       v = dest--;
>>>>>       break;
>>>>>     default:
>>>>>       v = ~0;
>>>>>       break;
>>>>>     }
>>>>>
>>>>> Do you think we should add a separate patch 4 for XOP that removes the
>>>>> special handling in match_template and completes its template, so we
>>>>> don't have to add special handling for src1vvvv or src2vvvv? This might
>>>>> go against your desire to reduce template size, but it would help
>>>>> simplify the logic. I'd like to know your thoughts.
>>>>
>>>> Indeed. You'd effectively revert earlier folding that I did. And the
>>>> adjustment I suggested earlier ought to be small/simple enough.
>>>>
>>>
>>> So I continued working on the previous suggestion, and with the following
>>> modification it worked.
>>>
>>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
>>> || is_cpu (t, CpuAPX_F));
>>> if (!operand_type_match (overlap0, i.types[0])
>>> || !operand_type_match (overlap1, i.types[j])
>>> - || (t->operands == 3
>>> + || (t->operands == 3 && !is_cpu (t, CpuXOP)
>>> && !operand_type_match (overlap2, i.types[1]))
>>> || (check_register
>>> && !operand_type_register_match (i.types[0],
>>
>> Just to mention it - this certainly isn't what I suggested. In fact I seem to
>> vaguely recall that something similar was once proposed during the original
>> APX work as well, where I then objected, too.
>>
>>> But I found that 4 test files failed. I didn't find the doc on how to
>>> encode vprotb, but I guess that is because I changed the default template
>>> from Src2VVVV|VexW0 to Src1VVVV|VexW1, so all the related test cases
>>> would need to be modified. Do you have any comments here?
>>>
>>> regexp "^[ ]*[a-f0-9]+: 8f e9 40 90 d8[ ]+vprotb %xmm7,%xmm0,%xmm3$"
>>> line " 11ed: 8f e9 f8 90 df vprotb %xmm7,%xmm0,%xmm3"
>>
>> Well, while changing the templates is in principle possible, and the resulting
>> code would still be correct, changing encodings is usually not a good idea.
>> Thus when it can be avoided, it should be avoided, imo. Hence why I didn't
>> suggest this, but to amend the code doing the operand swapping (for the
>> case where operand order is controlled by XOP.W).
>>
>
> I mistakenly thought this was what you wanted, and I also think this modification is unacceptable. Could you elaborate further on your original suggestion?

At the bottom of match_template() we have

	case Opcode_VexW:
	  /* Only the first two register operands need reversing, alongside
	     flipping VEX.W. */
	  i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;

This is where I think you further want to adjust i.tm.opcode_modifier.vexvvvv.
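
As a rough illustration only, here is a standalone model of that adjustment. It is not the actual gas code: struct modifier_model, swap_sources_model() and the numeric enum values are invented for this sketch, and only the VEXW0/VEXW1 and VexVVVV_* names come from the thread. It assumes the vvvv attribute can be flipped between its two source forms the same way VEX.W is flipped:

```c
#include <assert.h>

/* Hypothetical stand-ins for the template attributes; the numeric
   values are assumptions made for this sketch only.  */
enum vexw_kind { VEXW0 = 1, VEXW1 = 2 };
enum vexvvvv_kind { VexVVVV_SRC1 = 1, VexVVVV_SRC2 = 2, VexVVVV_DST = 3 };

/* Reduced model of the two opcode-modifier fields involved.  */
struct modifier_model
{
  unsigned int vexw;
  unsigned int vexvvvv;
};

/* Model of the operand-swap step: flip VEX.W and, when VEX.vvvv
   names a source operand, flip which source it names as well.  */
static void
swap_sources_model (struct modifier_model *m)
{
  assert (m->vexw == VEXW0 || m->vexw == VEXW1);
  m->vexw ^= VEXW0 ^ VEXW1;

  /* Leave VexVVVV_DST (and anything else) untouched.  */
  if (m->vexvvvv == VexVVVV_SRC1 || m->vexvvvv == VexVVVV_SRC2)
    m->vexvvvv ^= VexVVVV_SRC1 ^ VexVVVV_SRC2;
}
```

In this model a template that starts out as VexVVVV_SRC2 with VEXW0 ends up as VexVVVV_SRC1 with VEXW1 after the swap, so the later vvvv switch would not need an XOP-specific fall-through. Whether the real change needs the guard, or can flip unconditionally, depends on which templates reach that case.
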
Jan