[PATCH 2/3] x86: Drop SwapSources

Tue Apr 30 07:34:32 GMT 2024

> On 30.04.2024 04:56, Cui, Lili wrote:
> >> On 29.04.2024 15:41, Cui, Lili wrote:
> >>>> On 29.04.2024 14:23, Cui, Lili wrote:
> >>>>>> On 28.04.2024 06:47, Cui, Lili wrote:
> >>>>>>>> On 26.04.2024 10:14, Cui, Lili wrote:
> >>>>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
> >>>>>>>>>>> --- a/gas/config/tc-i386.c
> >>>>>>>>>>> +++ b/gas/config/tc-i386.c
> >>>>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
> >>>>>>>>>>>
> >>>>>>>>>>>    switch (i.tm.opcode_modifier.vexvvvv)
> >>>>>>>>>>>      {
> >>>>>>>>>>> +    case VexVVVV_SRC2:
> >>>>>>>>>>> +      if (source != op)
> >>>>>>>>>>> +	{
> >>>>>>>>>>> +	  v = source++;
> >>>>>>>>>>> +	  break;
> >>>>>>>>>>> +	}
> >>>>>>>>>>> +      /* For XOP: vpshl* and vpsha*.  */
> >>>>>>>>>>> +      /* Fall through.  */
> >>>>>>>>>>>      case VexVVVV_SRC1:
> >>>>>>>>>>
> >>>>>>>>>> This falling-through is odd and hence needs a better comment
> >>>>>>>>>> (then also covering vprot*, which afaict is similarly affected).
> >>>>>>>>>> The reason for this is the XOP.W-controlled operand swapping,
> >>>>>>>>>> if I'm not mistaken? In which case perhaps instead of the
> >>>>>>>>>> fall-through here the logic swapping the operands should
> >>>>>>>>>> replace VexVVVV_SRC2 by VexVVVV_SRC1?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes, vprot* should be included, and it is related to the
> >>>>>>>>> XOP.W-controlled operand swapping. The comment says "/* Only the
> >>>>>>>>> first two register operands need reversing, alongside flipping
> >>>>>>>>> VEX.W.  */", but there is actually a memory operand, not two
> >>>>>>>>> register operands.
> >>>>>>>>>
> >>>>>>>>> I think VexVVVV_SRC2 makes more sense here. It matches the actual
> >>>>>>>>> situation: we want to use vvvv to encode the first operand.
> >>>>>>>>>
> >>>>>>>>> Opcode table:
> >>>>>>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
> >>>>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
> >>>>>>>>> { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
> >>>>>>>>>
> >>>>>>>>> testcase:
> >>>>>>>>> vprotb (%rax),%xmm12,%xmm15
> >>>>>>>>> vprotb %xmm15,(%r12),%xmm0
> >>>>>>>>
> >>>>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for
> >>>>>>>> the former. That uses VexVVVV_Src1 layout. Hence my suggestion
> >>>>>>>> to replace the attribute when swapping operands.
> >>>>>>>>
> >>>>>>>
> >>>>>>> If we replace Src2VVVV|VexW0 with Src1VVVV|VexW1 and swap the
> >>>>>>> operands, we can put VexVVVV_SRC1 before VexVVVV_SRC2, but we still
> >>>>>>> need to add "!is_cpu (&i.tm, CpuXOP) || source == op" under
> >>>>>>> VexVVVV_SRC1, and match_template also needs to be adjusted (I made
> >>>>>>> a simple modification and it still failed; I think continuing
> >>>>>>> like this may go against the original intention).
> >>>>>>>
> >>>>>>>   switch (i.tm.opcode_modifier.vexvvvv)
> >>>>>>>     {
> >>>>>>>     /* VEX.vvvv encodes the first source register operand.  */
> >>>>>>>     case VexVVVV_SRC1:
> >>>>>>>       if (!is_cpu (&i.tm, CpuXOP) || source == op)
> >>>>>>>         {
> >>>>>>>           v =  dest - 1;
> >>>>>>>           break;
> >>>>>>>         }
> >>>>>>>     /* For XOP: vpshl*, vpsha* and vprot*.  */
> >>>>>>>     /* Fall through.  */
> >>>>>>>     /* VEX.vvvv encodes the last source register operand.  */
> >>>>>>>     case VexVVVV_SRC2:
> >>>>>>>       v = source++;
> >>>>>>>       break;
> >>>>>>>     /* VEX.vvvv encodes the destination register operand.  */
> >>>>>>>     case VexVVVV_DST:
> >>>>>>>       v = dest--;
> >>>>>>>       break;
> >>>>>>>     default:
> >>>>>>>       v = ~0;
> >>>>>>>       break;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> Do you think we should add a separate patch 4 for XOP that removes
> >>>>>>> the special handling in match_template and completes its template,
> >>>>>>> so we don't have to add special handling for Src1VVVV or Src2VVVV?
> >>>>>>> This might go against your desire to reduce template size, but it
> >>>>>>> would help simplify the logic. I'd like to know your thoughts.
> >>>>>>
> >>>>>> Indeed. You'd effectively revert earlier folding that I did. And
> >>>>>> the adjustment I suggested earlier ought to be small/simple enough.
> >>>>>>
> >>>>>
> >>>>> So, I continued working on the previous suggestion. With the
> >>>>> following modification, it worked.
> >>>>>
> >>>>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
> >>>>>                           || is_cpu (t, CpuAPX_F));
> >>>>>               if (!operand_type_match (overlap0, i.types[0])
> >>>>>                   || !operand_type_match (overlap1, i.types[j])
> >>>>> -                 || (t->operands == 3
> >>>>> +                 || (t->operands == 3 && !is_cpu (t, CpuXOP)
> >>>>>                       && !operand_type_match (overlap2, i.types[1]))
> >>>>>                   || (check_register
> >>>>>                       && !operand_type_register_match (i.types[0],
> >>>>
> >>>> Just to mention it - this certainly isn't what I suggested. In fact
> >>>> I seem to vaguely recall that something similar was once proposed
> >>>> during the original APX work as well, where I then objected, too.
> >>>>
> >>>>> But I found that 4 test files failed. I didn't find the doc on how
> >>>>> to encode vprotb, but I guess that is because I changed the default
> >>>>> template from Src2VVVV|VexW0 to Src1VVVV|VexW1, so all the related
> >>>>> test cases would need to be modified. Do you have any comments here?
> >>>>>
> >>>>> regexp "^[      ]*[a-f0-9]+:    8f e9 40 90 d8[         ]+vprotb %xmm7,%xmm0,%xmm3$"
> >>>>> line   "    11ed:       8f e9 f8 90 df          vprotb %xmm7,%xmm0,%xmm3"
> >>>>
> >>>> Well, while changing the templates is in principle possible, and
> >>>> the resulting code would still be correct, changing encodings is
> >>>> usually not a good idea.
> >>>> Thus when it can be avoided, it should be avoided, imo. Hence why I
> >>>> didn't suggest this, but to amend the code doing the operand
> >>>> swapping (for the case where operand order is controlled by XOP.W).
> >>>>
> >>>
> >>> I mistakenly thought this was what you wanted, and I also think this
> >>> modification is unacceptable. Could you elaborate further on your
> >>> original suggestion?
> >>
> >> At the bottom of match_template() we have
> >>
> >>     case Opcode_VexW:
> >>       /* Only the first two register operands need reversing, alongside
> >> 	 flipping VEX.W.  */
> >>       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
> >>
> >> This is where I think you further want to adjust
> >> i.tm.opcode_modifier.vexvvvv.
> >>
> >
> > "vprotb %xmm7,%xmm0,%xmm3" doesn't go through this code; it finds
> > the matching template directly (Src1VVVV|VexW1|NoSuf,
> > { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }).
> 
> Of course, for not having a memory operand (and no pseudo prefix). But
> that's not the problem here - the problem is that for build_modrm_byte()
> you want to make adjustments when operands need swapping (compared to
> what the templates say).
> 
> > I inserted a check to solve this problem, but it's a bit ugly. I'd like to
> > abandon this optimization.
> 
> Didn't you have a working form earlier on? Could we use that, leaving the
> XOP stuff untouched for now, for me to see about looking into later?
> 

Yes, I will continue with these patches, but without the XOP optimization. Thanks for offering to look at this part later.
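
For reference when the XOP part is revisited, the adjustment suggested earlier in the thread (changing vexvvvv at the place where VEX.W is flipped, so that build_modrm_byte() needs no fall-through) could be sketched roughly as below. This is only a standalone model and not gas code: struct opcode_modifier_sketch, swap_xop_sources() and vvvv_operand() are hypothetical names, the enum values are stand-ins for the real header definitions, and returning -1 replaces the "v = ~0" default from the quoted switch.

```c
#include <assert.h>

/* Stand-in values named after the identifiers used in the thread;
   these are assumptions, not copied from the real gas/opcodes headers.  */
enum { VEXW0 = 1, VEXW1 = 2 };
enum vexvvvv_kind { VexVVVV_NONE, VexVVVV_SRC1, VexVVVV_SRC2, VexVVVV_DST };

/* Hypothetical, reduced model of i.tm.opcode_modifier.  */
struct opcode_modifier_sketch
{
  unsigned int vexw;
  enum vexvvvv_kind vexvvvv;
};

/* Model of the Opcode_VexW case: flip W and, as suggested in the
   thread, replace VexVVVV_SRC2 by VexVVVV_SRC1 at the same time.  */
static void
swap_xop_sources (struct opcode_modifier_sketch *m)
{
  assert (m->vexw == VEXW0 || m->vexw == VEXW1);
  m->vexw ^= VEXW0 ^ VEXW1;
  if (m->vexvvvv == VexVVVV_SRC2)
    m->vexvvvv = VexVVVV_SRC1;
}

/* Pick the operand index encoded in vvvv, following the switch quoted
   in the thread but without the XOP fall-through.  Returns -1 when
   vvvv encodes no operand.  */
static int
vvvv_operand (const struct opcode_modifier_sketch *m, int source, int dest)
{
  switch (m->vexvvvv)
    {
    case VexVVVV_SRC1: return dest - 1;
    case VexVVVV_SRC2: return source;
    case VexVVVV_DST:  return dest;
    default:           return -1;
    }
}
```

With a template starting as VexVVVV_SRC2|VexW0, calling swap_xop_sources() leaves it as VexVVVV_SRC1|VexW1, so the vvvv selection no longer needs to know about XOP. Whether this holds for all of vpshl*, vpsha* and vprot* in the real templates is not verified here.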

The next 5 days are our Labor Day holiday, so my email responses will be slower.

Regards,
Lili.