[PATCH 3/5] x86: support AVX10.1/512

Tue Sep 5 07:04:36 GMT 2023

> >> Actually there's something similar with AVX10 itself: AVX512F includes
> >> equivalents right away of what comes under separate extensions for AVX:
> >> F16C and FMA. AVX10, otoh, is presently specified to only guarantee
> >> AVX and AVX2. Does that mean that VEX-encoded vfm{add,sub}* and ps<-ph
> >> conversion insns aren't guaranteed to also be available? Doesn't seem
> >> logical to me, so I'm inclined to make FMA and F16C prereqs of AVX10.1
> >> as well (or alternatively of AVX512F, but I think this would have
> >> undesirable effects). AVX2 isn't an explicit prereq only because it
> >> already is one of AVX512F.
> >
> > I suppose AVX10 should only enable EVEX encoding,  they have nothing
> > to do with the VEX encoding.
> >
> > For those independent VEX ISAs, if AVX512F is not enabling it, AVX10 neither.
> >
> > Actually, not only F16C and FMA, under AVX10, ISAs like AVX-VNNI, AVX-IFMA
> > are also not enabled.
> 
> The difference to the AVX-* ones you mention is important here: AVX-VNNI
> (taking that as example) isn't a feature that had equivalent EVEX
> encodings added right in AVX512F. So I'd like to ask that you re-consider

I see your point, since here we are focusing only on features introduced in
AVX512F. I would still like to mention AVX-VNNI below, just for discussion.

> what you said. Also think about what the compiler does (which doesn't
> emit .arch directives to limit the usable ISA extensions) when just
> -mavx512vl is passed to it: VEX-encoded vfm{add,sub}* would then still be
> resulting (to prevent that, the compiler would need to further emit {evex}
> pseudo-prefixes). IOW in the compiler there is such an implication already
> anyway.

For FMA, GCC has the following comment:

;; The standard names for scalar FMA are only available with SSE math enabled.
;; CPUID bit AVX512F enables evex encoded scalar and 512-bit fma.  It doesn't
;; care about FMA bit, so we enable fma for TARGET_AVX512F even when TARGET_FMA
;; and TARGET_FMA4 are both false.
;; TODO: In theory AVX512F does not automatically imply FMA, and without FMA
;; one must force the EVEX encoding of the fma insns.  Ideally we'd improve
;; GAS to allow proper prefix selection.  However, for the moment all hardware
;; that supports AVX512F also supports FMA so we can ignore this for now.

Although the pattern is split between FMA/FMA4 and AVX512F, the code does not actually
emit an {evex} prefix on the mnemonic when only AVX512F is enabled, since no real
hardware has AVX512F without FMA and so codegen has no need to.

For F16C, the pattern is not even split, so the scenario is the same as for FMA/FMA4.

Therefore, I suppose it could be ok for AVX10 to imply FMA/F16C in gas for simplicity. But
let's wait for H.J.'s opinion on that.
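
To make the proposal concrete, here is a rough sketch of how such an implication could
be modeled as a prerequisite closure. This is not the actual gas code: the enum, the
table, and expand_features() are hypothetical names made up for illustration, and the
AVX10.1 row simply encodes what is proposed in this thread (FMA and F16C as prereqs,
with AVX2 coming in via AVX512F).

```c
/* Hypothetical feature bits for illustration; not the real gas identifiers. */
enum {
    F_AVX     = 1u << 0,
    F_AVX2    = 1u << 1,
    F_FMA     = 1u << 2,
    F_F16C    = 1u << 3,
    F_AVX512F = 1u << 4,
    F_AVX10_1 = 1u << 5,
    F_COUNT   = 6
};

/* Direct prerequisites, indexed by bit position.
   Assumption: the AVX10.1 row reflects the proposal discussed here,
   not what any released assembler does. */
static const unsigned direct_prereqs[F_COUNT] = {
    0,                          /* AVX */
    F_AVX,                      /* AVX2 */
    F_AVX,                      /* FMA */
    F_AVX,                      /* F16C */
    F_AVX2,                     /* AVX512F: AVX2 only, no FMA/F16C */
    F_AVX512F | F_FMA | F_F16C  /* AVX10.1 (proposed) */
};

/* Repeatedly OR in the prerequisites of every enabled feature
   until the set stops growing (transitive closure). */
static unsigned expand_features(unsigned enabled)
{
    unsigned prev;

    do {
        prev = enabled;
        for (int i = 0; i < F_COUNT; i++)
            if (enabled & (1u << i))
                enabled |= direct_prereqs[i];
    } while (enabled != prev);

    return enabled;
}
```

With this table, expand_features(F_AVX10_1) has the FMA, F16C, AVX2 and AVX bits set,
while expand_features(F_AVX512F) still leaves FMA and F16C unset, which matches the
"undesirable effects" concern about putting the prereqs on AVX512F itself.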

As for AVX-VNNI: it was introduced in Sapphire Rapids, which predates the introduction of
AVX10.1 (Granite Rapids), so any hardware that has AVX10.1 will also have AVX-VNNI.
There might therefore be a case for having AVX10.1 imply AVX-VNNI in the compiler, but
we could defer that discussion until everything in AVX10.1 is settled in the community.

Thx,
Haochen

> 
> Jan