[PATCH v3] Support ymm rounding control for Intel AVX10.2
Jiang, Haochen
haochen.jiang@intel.com
Thu Aug 15 13:08:29 GMT 2024
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 15, 2024 5:24 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v3] Support ymm rounding control for Intel AVX10.2
>
> On 15.08.2024 03:06, Jiang, Haochen wrote:
> >>> --- a/opcodes/i386-opc.tbl
> >>> +++ b/opcodes/i386-opc.tbl
> >>> @@ -156,6 +156,8 @@
> >>> // substantially similar), depending on what encoding was requested.
> >>> #define APX_F(cpuid) cpuid&(cpuid|APX_F)
> >>>
> >>> +#define AVX10_2(cpuid) cpuid&(cpuid|AVX10_2)
> >>
> >> ... this. The StaticRounding / SAE insn forms are all suitably
> >> identified by these two attributes. So far they applied to EVEX512
> >> only. All that changes is that now they apply to EVEX256 as well. As
> >> long as AVX10.2 is available, of course. Therefore all (or at least
> >> most) of the templates should be possible to leave alone (on the
> >> assumption that there are no outliers, i.e. no mnemonics which allow
> >> RC/SAE in 512-bit forms but not in 256-bit ones). You already change
> >> check_VecOperands(), it just needs doing a little differently (in
> >> particular without setting .ymm and without using maybe_cpu()).
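One way to see why the templates could be left alone is to model a template's CPU column with plain bitmasks. This is a minimal sketch that assumes `&` lists features that are all required and `|` lists alternatives of which one suffices (our reading of the table syntax, not something stated in this thread). The names `CPU_AVX512F`, `CPU_AVX10_2`, `struct cpu_req` and `cpu_req_met` are hypothetical; binutils' real `i386_cpu_flags` type is generated by i386-gen. Under this model, `AVX10_2(AVX512F)` accepts exactly the same feature sets as plain `AVX512F`, because A && (A || B) reduces to A.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration only; binutils' real
   i386_cpu_flags type is generated by i386-gen and is not a plain
   integer mask.  */
enum
{
  CPU_AVX512F = 1u << 0,
  CPU_AVX10_2 = 1u << 1,
};

/* Hypothetical model of a template's CPU requirement: every bit in
   ALL_OF must be enabled, and when ANY_OF is non-empty at least one
   of its bits must be enabled as well.  */
struct cpu_req
{
  uint32_t all_of;
  uint32_t any_of;
};

/* Plain "AVX512F" in the CPU column.  */
static const struct cpu_req plain_req = { CPU_AVX512F, 0 };

/* "AVX10_2(AVX512F)", i.e. "AVX512F&(AVX512F|AVX10_2)".  */
static const struct cpu_req wrapped_req
  = { CPU_AVX512F, CPU_AVX512F | CPU_AVX10_2 };

/* Return true when the ENABLED feature set satisfies REQ.  */
static bool
cpu_req_met (struct cpu_req req, uint32_t enabled)
{
  if ((enabled & req.all_of) != req.all_of)
    return false;
  return req.any_of == 0 || (enabled & req.any_of) != 0;
}
```

Any behavioural difference would therefore have to come from code that separately checks whether AVX10.2 is enabled, which is where the operand check mentioned above comes in.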
> >>
> >> To achieve this, it may end up necessary to split encoding_evex512
> >> into two enumerators, the new one being encoding_sae.
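The operand check described here can be sketched as follows, assuming `encoding_evex512` is split into a value that still demands a 512-bit encoding and a new `encoding_sae` value that only records that RC/SAE was used. The enum is a simplified, hypothetical subset, and `rounding_vl_ok` is a made-up helper, not binutils' `check_VecOperands()`. It decides whether a vector length is acceptable without setting `.ymm` on the template and without consulting `maybe_cpu()`.

```c
#include <stdbool.h>

/* Simplified, hypothetical subset of the "requested encoding" enum,
   with encoding_evex512 split as suggested above.  */
enum encoding_kind
{
  encoding_default,
  encoding_evex,
  encoding_evex512,  /* A 512-bit encoding is required.  */
  encoding_sae,      /* RC/SAE seen; vector length not yet fixed.  */
};

/* Hypothetical helper: may an insn whose requested encoding is ENC be
   encoded with vector length VEC_BITS, given whether AVX10.2 is
   enabled?  Nothing here modifies the template.  */
static bool
rounding_vl_ok (enum encoding_kind enc, unsigned int vec_bits,
                bool have_avx10_2)
{
  switch (enc)
    {
    case encoding_evex512:
      /* A 512-bit encoding was explicitly required.  */
      return vec_bits == 512;
    case encoding_sae:
      /* RC/SAE: 512-bit always, 256-bit only with AVX10.2; other
         lengths are rejected in this sketch.  */
      return vec_bits == 512 || (vec_bits == 256 && have_avx10_2);
    default:
      /* No RC/SAE involved; not this check's concern.  */
      return true;
    }
}
```

Keeping "RC/SAE was used" separate from "512-bit was required" is what lets the 256-bit case be decided by a single AVX10.2 test instead of by per-template CPU columns.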
> >
> > I haven't investigated that fully, but I suppose we can definitely do that.
> > The concern is whether we should do that.
> >
> > IMO, if we do that without adding AVX10.2 to the table, ymm rounding in
> > AVX10.2 becomes implicitly enabled, unlike all the other features, which
> > are enabled explicitly. That will confuse anyone going through the table to
> > find out which feature belongs to which ISA. Someone unfamiliar with the
> > latest ISA will also need more time to work out why ymm rounding is
> > suddenly turned on.
>
> The table's primary purpose isn't documentation. One of the goals continues to be
> to limit the number of templates we have, to in turn limit the number of matching
> attempts that need doing while trying to find the template for a given insn. That's
I get your concern. However, my change only alters the CPUIDs; no new
templates are added to the table. Therefore, it won't increase the number of
templates that have to be gone through.
As for the matching effort, whichever approach we take, we will always need to
find the original AVX512 template, which is also the template I changed. As far
as I understand, the matching path is exactly the same.
If that is the only concern, I am not really convinced.
Thx,
Haochen
> why, for example, I went through the effort of folding the SAE templates back into
> their base ones. Which, as it turns out, is likely beneficial now - the same
> templates can hopefully simply be re-used for AVX10.2.
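The cost both sides are weighing can be sketched with a simplified matcher. The struct `tmpl` and the function `match_template_sketch` are hypothetical and only in the spirit of the assembler's template matching, not a copy of it: candidates for a mnemonic are examined in order, so the number of attempts is bounded by how many templates there are, not by what their CPU columns contain. In this model, editing a template's CPU requirement in place leaves the worst case unchanged, while adding separate templates (which folding the SAE forms back into their base ones avoided) raises it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified template; real binutils templates
   carry far more state.  */
struct tmpl
{
  const char *name;
  uint32_t cpu_all_of;    /* Required feature bits (illustrative).  */
  unsigned int operands;  /* Operand count the template accepts.  */
};

/* Linear scan over the N candidate templates T for one mnemonic.
   Returns the index of the first template whose CPU requirement is
   met by ENABLED and whose operand count equals OPERANDS, or -1.
   *ATTEMPTS receives the number of templates examined.  */
static int
match_template_sketch (const struct tmpl *t, size_t n, uint32_t enabled,
                       unsigned int operands, size_t *attempts)
{
  *attempts = 0;
  for (size_t i = 0; i < n; i++)
    {
      ++*attempts;
      if ((enabled & t[i].cpu_all_of) != t[i].cpu_all_of)
        continue;  /* Required features not enabled.  */
      if (t[i].operands != operands)
        continue;  /* Operand shape does not fit.  */
      return (int) i;
    }
  return -1;
}
```

For example, with three templates for one mnemonic, a failed lookup always costs three attempts whatever CPU bits those templates require.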
>
> There are, btw, other things which aren't explicit in the table (anymore).
>
> Jan