[PATCH v3] Support ymm rounding control for Intel AVX10.2

Thu Aug 15 15:29:44 GMT 2024

On 15.08.2024 15:08, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, August 15, 2024 5:24 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v3] Support ymm rounding control for Intel AVX10.2
>>
>> On 15.08.2024 03:06, Jiang, Haochen wrote:
>>>>> --- a/opcodes/i386-opc.tbl
>>>>> +++ b/opcodes/i386-opc.tbl
>>>>> @@ -156,6 +156,8 @@
>>>>>  // substantially similar), depending on what encoding was requested.
>>>>>  #define APX_F(cpuid) cpuid&(cpuid|APX_F)
>>>>>
>>>>> +#define AVX10_2(cpuid) cpuid&(cpuid|AVX10_2)
>>>>
>>>> ... this. The StaticRounding / SAE insn forms are all suitably
>>>> identified by these two attributes. So far they applied to EVEX512
>>>> only. All that changes is that now they apply to EVEX256 as well. As
>>>> long as AVX10.2 is available, of course. Therefore all (or at least
>>>> most) of the templates should be possible to leave alone (on the
>>>> assumption that there are no outliers, i.e. no mnemonics which allow
>>>> RC/SAE in 512-bit forms but not in 256-bit ones). You already change
>>>> check_VecOperands(), it just needs doing a little differently (in
>>>> particular without setting .ymm and without using maybe_cpu()).
>>>>
>>>> To achieve this, it may end up necessary to split encoding_evex512
>>>> into two enumerators, the new one being encoding_sae.
>>>
>>> I haven't investigated that fully, but I suppose we can definitely do that.
>>> The concern is if we should do that.
>>>
>>> IMO, if we do that and don't add AVX10.2 to the table, ymm rounding in
>>> AVX10.2 ends up implicitly enabled, unlike all the other features, which
>>> are explicitly enabled. That will cause confusion when someone goes
>>> through the table to find out which feature belongs to which ISA. It
>>> will also take more time to work out why ymm rounding is suddenly
>>> turned on for anyone not familiar with the latest ISA.
>>
>> The table's primary purpose isn't documentation. One of the goals continues to be
>> to limit the number of templates we have, to in turn limit the number of matching
>> attempts that need doing while trying to find the template for a given insn. That's
> 
> I get your concern. However, my change only alters the CPUIDs. No new
> templates are added to the table. Therefore, it won't increase the number of
> templates that have to be gone through when scanning the table.
> 
> As for the matching effort: whichever approach we use, we will always need to
> find the original AVX512 template, which is also the template I changed. It is
> exactly the same path, per my understanding.
> 
> If that is the only concern, I don't find it that convincing.

Hmm, you're right, I still had v1 in mind, it seems. Nevertheless:
Why a more complicated (and hence harder to parse/understand) CPU
specifier than needed? If the same functionality can be achieved while
leaving the templates alone, I guess I'd still prefer that.
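
To make the alternative concrete, here is a rough sketch of the idea of
leaving the templates alone and deciding at operand-check time. It is a toy
model under stated assumptions, not actual gas code: every type, field and
function name below (toy_template, toy_cpu, toy_rounding_ok, ...) is
hypothetical and does not exist in gas or in i386-opc.tbl.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in gas or the opcode table.  */
enum toy_vec_len { TOY_VEC_128, TOY_VEC_256, TOY_VEC_512 };

struct toy_template
{
  bool static_rounding;	/* template carries the StaticRounding attribute */
  bool sae;		/* template carries the SAE attribute */
};

struct toy_cpu
{
  bool avx10_2;		/* assumed flag: AVX10.2 enabled for this assembly */
};

/* Decide whether an RC/SAE operand is acceptable for a matched template.
   The template itself is unchanged; only this check looks at AVX10.2.  */
static bool
toy_rounding_ok (const struct toy_template *t, enum toy_vec_len len,
		 const struct toy_cpu *cpu)
{
  if (!t->static_rounding && !t->sae)
    return false;		/* insn has no RC/SAE form at all */

  switch (len)
    {
    case TOY_VEC_512:
      return true;		/* previously the only accepted length */
    case TOY_VEC_256:
      return cpu->avx10_2;	/* additionally accepted with AVX10.2 */
    default:
      return false;		/* other forms are not modelled here */
    }
}
```

In this model the CPU specifier on each template stays as simple as it was,
and the AVX10.2 dependency lives in a single check instead of being repeated
across the table.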

Jan