[PATCH 2/3] x86: Drop SwapSources

Mon Apr 29 13:41:27 GMT 2024

> On 29.04.2024 14:23, Cui, Lili wrote:
> >> On 28.04.2024 06:47, Cui, Lili wrote:
> >>>> On 26.04.2024 10:14, Cui, Lili wrote:
> >>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
> >>>>>>> --- a/gas/config/tc-i386.c
> >>>>>>> +++ b/gas/config/tc-i386.c
> >>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
> >>>>>>>
> >>>>>>>    switch (i.tm.opcode_modifier.vexvvvv)
> >>>>>>>      {
> >>>>>>> +    case VexVVVV_SRC2:
> >>>>>>> +      if (source != op)
> >>>>>>> +	{
> >>>>>>> +	  v = source++;
> >>>>>>> +	  break;
> >>>>>>> +	}
> >>>>>>> +      /* For XOP: vpshl* and vpsha*.  */
> >>>>>>> +      /* Fall through.  */
> >>>>>>>      case VexVVVV_SRC1:
> >>>>>>
> >>>>>> This falling-through is odd and hence needs a better comment
> >>>>>> (then also covering vprot*, which afaict is similarly affected).
> >>>>>> The reason for this is the XOP.W-controlled operand swapping, if
> >>>>>> I'm not mistaken? In which case perhaps instead of the
> >>>>>> fall-through here the logic swapping the operands should replace
> >>>>>> VexVVVV_SRC2 by VexVVVV_SRC1?
> >>>>>>
> >>>>>
> >>>>> Yes, vprot* should be included, and it is related to
> >>>>> XOP.W-controlled operand swapping. The comment says "/* Only the
> >>>>> first two register operands need reversing, alongside flipping
> >>>>> VEX.W.  */", but there is actually a memory operand, not two
> >>>>> register operands.
> >>>>>
> >>>>> I think VexVVVV_SRC2 makes more sense here: it matches the actual
> >>>>> situation, as we want to use vvvv to encode the first operand.
> >>>>>
> >>>>> Opcode table:
> >>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
> >>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
> >>>>> { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
> >>>>>
> >>>>> testcase:
> >>>>> vprotb (%rax),%xmm12,%xmm15
> >>>>> vprotb %xmm15,(%r12),%xmm0
> >>>>
> >>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the
> >>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to
> >>>> replace the attribute when swapping operands.
> >>>>
> >>>
> >>> If we replace Src2VVVV|VexW0 with Src1VVVV|VexW1 and swap the
> >>> operands, we can put VexVVVV_SRC1 before VexVVVV_SRC2, but we still
> >>> need to add "(!is_cpu (&i.tm, CpuXOP) || source == op)" under
> >>> VexVVVV_SRC1, and match_template also needs to be adjusted (I made a
> >>> simple modification and it still failed; I think continuing like this
> >>> may go against the original intention).
> >>>
> >>>   switch (i.tm.opcode_modifier.vexvvvv)
> >>>     {
> >>>     /* VEX.vvvv encodes the first source register operand.  */
> >>>     case VexVVVV_SRC1:
> >>>       if (!is_cpu (&i.tm, CpuXOP) || source == op)
> >>>         {
> >>>           v = dest - 1;
> >>>           break;
> >>>         }
> >>>     /* For XOP: vpshl*, vpsha* and vprot*.  */
> >>>     /* Fall through.  */
> >>>     /* VEX.vvvv encodes the last source register operand.  */
> >>>     case VexVVVV_SRC2:
> >>>       v = source++;
> >>>       break;
> >>>     /* VEX.vvvv encodes the destination register operand.  */
> >>>     case VexVVVV_DST:
> >>>       v = dest--;
> >>>       break;
> >>>     default:
> >>>       v = ~0;
> >>>       break;
> >>>     }
> >>>
> >>> Do you think we should add a separate patch 4 for XOP that removes
> >>> the special handling in match_template and completes its template,
> >>> so that we don't have to add special handling for src1vvvv or
> >>> src2vvvv? This might go against your desire to reduce template size,
> >>> but it would help simplify the logic. I'd like to know your thoughts.
> >>
> >> Indeed. You'd effectively revert earlier folding that I did. And the
> >> adjustment I suggested earlier ought to be small/simple enough.
> >>
> >
> > So I continued working on the previous suggestion, and with the
> > following modification it worked.
> >
> > @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
> >                           || is_cpu (t, CpuAPX_F));
> >               if (!operand_type_match (overlap0, i.types[0])
> >                   || !operand_type_match (overlap1, i.types[j])
> > -                 || (t->operands == 3
> > +                 || (t->operands == 3 && !is_cpu (t, CpuXOP)
> >                       && !operand_type_match (overlap2, i.types[1]))
> >                   || (check_register
> >                       && !operand_type_register_match (i.types[0],
> 
> Just to mention it - this certainly isn't what I suggested. In fact I seem to
> vaguely recall that something similar was once proposed during the original
> APX work as well, where I then objected, too.
> 
> > But I found that 4 test files failed. I didn't find documentation on
> > how to encode vprotb, but I guess the failures occur because I changed
> > the default template from Src2VVVV|VexW0 to Src1VVVV|VexW1, so all the
> > related test cases would need to be modified. Do you have any comments
> > here?
> >
> > regexp "^[      ]*[a-f0-9]+:    8f e9 40 90 d8[         ]+vprotb %xmm7,%xmm0,%xmm3$"
> > line   "    11ed:       8f e9 f8 90 df          vprotb %xmm7,%xmm0,%xmm3"
> 
> Well, while changing the templates is in principle possible, and the resulting
> code would still be correct, changing encodings is usually not a good idea.
> Thus when it can be avoided, it should be avoided, imo. Hence why I didn't
> suggest this, but to amend the code doing the operand swapping (for the
> case where operand order is controlled by XOP.W).
> 

I mistakenly thought this was what you wanted; I agree that this modification is unacceptable. Could you elaborate further on your original suggestion?
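
To make sure we are talking about the same thing, here is my current reading of "amend the code doing the operand swapping", written as a small standalone toy model rather than as a change to tc-i386.c. Everything in it is hypothetical (the toy_* names, the enum and the struct are made up for illustration and are not the gas definitions); only the three vvvv cases are taken from the switch quoted above. The assumption is that the template itself stays Src2VVVV|VexW0, and that the step which reverses the first two operands also switches the attribute to Src1VVVV, so that the vvvv switch no longer needs an XOP-specific fall-through.

```c
/* Toy model only: these names are hypothetical stand-ins, not gas code.  */
enum toy_vvvv { TOY_VVVV_NONE, TOY_VVVV_SRC1, TOY_VVVV_SRC2, TOY_VVVV_DST };

struct toy_insn
{
  enum toy_vvvv vexvvvv;   /* Which operand VEX.vvvv encodes.  */
  unsigned int vexw;       /* 0 or 1.  */
  unsigned int operands;   /* Operand count, destination last.  */
};

/* Model of the XOP.W-controlled swap: reversing the first two operands
   flips W and, in the same place, turns a "last source" template into a
   "first source" one.  Swapping the operand types is omitted here.  */
static void
toy_reverse_sources (struct toy_insn *insn)
{
  insn->vexw ^= 1;
  if (insn->vexvvvv == TOY_VVVV_SRC2)
    insn->vexvvvv = TOY_VVVV_SRC1;
}

/* Index of the operand that goes into vvvv, or -1 if there is none.
   Mirrors the quoted switch, with no special case for XOP.  */
static int
toy_vvvv_operand (const struct toy_insn *insn)
{
  unsigned int source = 0;
  unsigned int dest = insn->operands - 1;

  switch (insn->vexvvvv)
    {
    case TOY_VVVV_SRC1:
      return (int) dest - 1;
    case TOY_VVVV_SRC2:
      return (int) source;
    case TOY_VVVV_DST:
      return (int) dest;
    default:
      return -1;
    }
}
```

In this model a three-operand Src2VVVV|VexW0 instruction puts operand 0 into vvvv, and after toy_reverse_sources () it has W set and puts operand 1 into vvvv, which is how I understand the two vprotb test cases above. Please correct me if this is not what you meant.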

Thanks,
Lili.