[PATCH v3 4/9] Support APX GPR32 with extend evex prefix

Tue Dec 12 12:39:38 GMT 2023

On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>>>
>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>>>> variants.  */
>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>> -  {
>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>>>> +    {
>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>>>>  	{
>>>>>>>  	  if (need_evex_encoding ())
>>>>>>
>>>>>> There are several issues here:
>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>> - Why did you not model the addition after that original code?
>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>> AVX512
>>>>>> insn can be VEX-encoded?
>>>>>
>>>>>  I don't understand what you mean, we have this combination.
>>>>>
>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>
>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>
>>>>>> - If these new macros are really needed for whatever reason, they
>>>> shouldn't
>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>>>> assembler.
>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>
>>>>>> I think I asked before: How is it that you get away without
>>>>>> altering cpu_flags_match(), containing related and quite similar logic?
>>>>>>
>>>>>
>>>>> For the original logic ( ... || ... ) && ( ... || ...), the content
>>>>> in the first bracket and the content in the following brackets can be
>>>>> combined arbitrarily. I think it is inaccurate.
>>>>
>>>> In which way? If there are issues with the existing code, these
>>>> issues want taking care of in separate (prereq) patches. Of course
>>>> there are assumptions made here about the CPU combinations that can
>>>> (and cannot) occur in any of our templates. Similar assumptions are imo
>> fine to make in the APX additions.
>>>>
>>>> Note how I used two nested if()s despite that not having been
>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>> want to add another
>>>> (separate) inner if(), rather than altering the one that's there.
>>>
>>> Could we remove the CPU check here? It's a bit ugly and has limited
>>> effectiveness.
>>>
>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>     {
>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>
>> I agree on the "a bit ugly" part, but taking what's there right now I don't
>> understand "has limited effectiveness". Of course you can remove any code
>> you want, provided you can prove nothing breaks.
>>
> 
> Here is install_template().
> All I can say is that after removing the CPU check, no test cases failed. I
> know it's hard to convince you to delete this code; if not, what do you
> suggest doing with it? APX requires this, otherwise the test cases will fail.
> 
> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> -       {

So be it then (assuming you don't delete any pre-existing code there). As
said, I expect this will bite us later.
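
To make the difference between the two shapes of the condition concrete, here
is a minimal sketch in C. Everything in it is a hypothetical stand-in: the
F_* bits replace gas's i386_cpu_flags bitfields, and has_pair () assumes that
each AVX512F () / AVX512VL () / APX_F () macro in the patch tests exactly one
pair of features, which this thread does not spell out. Only one APX pair is
listed, for brevity.

```c
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  F_AVX = 1 << 0,
  F_AVX2 = 1 << 1,
  F_FMA = 1 << 2,
  F_AVX512F = 1 << 3,
  F_AVX512VL = 1 << 4,
  F_BMI = 1 << 5,
  F_APX_F = 1 << 6
};

/* Grouped form, shaped like the original check: any VEX-side feature
   may combine with any EVEX-side feature.  */
static bool
is_dual_grouped (unsigned int f)
{
  return (f & (F_AVX | F_AVX2 | F_FMA)) != 0
	 && (f & (F_AVX512F | F_AVX512VL)) != 0;
}

/* Assumed meaning of one macro invocation: both features are present.  */
static bool
has_pair (unsigned int f, unsigned int a, unsigned int b)
{
  return (f & a) != 0 && (f & b) != 0;
}

/* Pairwise form, shaped like the patch: only listed pairs qualify.  */
static bool
is_dual_pairwise (unsigned int f)
{
  return has_pair (f, F_AVX512F, F_AVX)
	 || has_pair (f, F_AVX512F, F_AVX2)
	 || has_pair (f, F_AVX512F, F_FMA)
	 || has_pair (f, F_AVX512VL, F_AVX)
	 || has_pair (f, F_AVX512VL, F_AVX2)
	 || has_pair (f, F_APX_F, F_BMI);
}
```

Under these assumptions the two forms disagree on F_FMA | F_AVX512VL, which
the grouped form accepts and the pairwise list omits, and on
F_BMI | F_APX_F, which only the pairwise list covers.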

>>>>> Just found cpu_flags_match() has similar logic, I think the
>>>>> following is the
>>>> only code related to CPUID alerts, but none of our combinations are
>>>> related to cpuavx.
>>>>>
>>>>>           if (all.bitfield.cpuavx)
>>>>>             {
>>>>>               /* We need to check SSE2AVX with AVX.  */
>>>>>               if (!t->opcode_modifier.sse2avx
>>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
>>>>>             }
>>>>
>>>> Not sure why you pick out this one. This special case is needed for
>>>> sse2avx; I don't see how it's related here. What I've been pointing
>>>> you at is the code in that function which follows a similar "Dual VEX/EVEX
>> templates ..."
>>>> comment.
>>>>
>>>
>>> I know you're talking about this code; I'm just guessing at what it does.
>>> I don't know what I missed.
>>
>> You pulled out this sse2avx code. Hence I was expecting you to tell me why
>> you consider it relevant here.
>>
> Here is cpu_flags_match().
> 
> I rechecked the code, maybe you want to say I missed the outer loop.
> 
>       cpu = cpu_flags_and (any, active);
>       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
>         {
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }
>           else
>             match |= CPU_FLAGS_ARCH_MATCH;
>         }

No, ...

>>> For example
>>>
>>> .arch .nobmi
>>> andn    (%eax), %eax, %eax
>>>
>>> ---------------------------------------------------------------------------------------------
>>>   if (flag_code != CODE_64BIT)
>>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>>>   else
>>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
>>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
>>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
>>>     {
>>>     ...
>>>     }
>>> Return  CPU_FLAGS_64BIT_MATCH
>>> ----------------------------------------------------------------------
>>> ------------------------
>>> Then we will report an arch error.
>>>
>>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
>>>             {
>>>               as_bad (_("`%s' is not supported on `%s%s'"),
>>>                       insn_name (current_templates.start),
>>>                       cpu_arch_name ? cpu_arch_name : default_arch,
>>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
>>>               return NULL;
>>>             }
>>
>> Which is what we want, I think (for the particular example you picked)? Yet
>> again, I don't think I can see what you're trying to tell me. I also have to
>> confess I've lost track of whether we're discussing install_template(),
>> cpu_flags_match(), or both. For example in install_template() you may indeed
>> be able to get away with little or no changes, as long as there's no used
>> features tracking for APX (see the early ELF-specific part of output_insn()).
>> Things would be somewhat inconsistent then, but that may be tolerable (as
>> long as properly justified in the patch description). Not getting this into
>> proper shape right with the introduction of APX may bite us later, though.
>>
> 
> Here is cpu_flags_match().
> I just want to say that the APX part doesn't need handling in the
> "Dual VEX/EVEX templates ..." logic.

... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
understand how you get away without touching that, but if everything works,
all is fine of course.
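
For reference, the ".arch .nobmi" trace quoted earlier boils down to a subset
test: mask the template's required flags with the active ones and compare.
A minimal sketch in C, assuming plain bitmasks in place of i386_cpu_flags;
the F_* names and all_required_enabled () are hypothetical and not part of
gas:

```c
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  F_BMI = 1 << 0,
  F_APX_F = 1 << 1
};

/* Mirrors "cpu = cpu_flags_and (all, active)" followed by
   "cpu_flags_equal (&cpu, &all)": true only if every flag the
   template requires is currently enabled.  */
static bool
all_required_enabled (unsigned int all, unsigned int active)
{
  unsigned int cpu = all & active;

  return cpu == all;
}
```

With BMI disabled and APX_F enabled, a template requiring both fails this
test, which corresponds to the "not supported" diagnostic in the trace.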

Jan