[PATCH 4/5] x86/APX: extend SSE2AVX coverage

Cui, Lili lili.cui@intel.com
Wed Apr 3 09:22:02 GMT 2024


> On 03.04.2024 09:59, Cui, Lili wrote:
> >>> This conversion is clever. Although the mnemonic changes, since the
> >>> conversion is controlled by -msse2avx, maybe we can mention in the
> >>> option's documentation that it might change the mnemonic. Judging
> >>> from the option name alone, it is difficult for users to predict
> >>> that the mnemonic will change (traditionally it seems to just add
> >>> a V prefix).
> >>
> >> I don't think a doc adjustment is needed here. We already have at
> >> least one example where the mnemonic also changes: CVTPI2PD ->
> >> VCVTDQ2PD.
> >>
> >
> > Oh, so there has already been such a conversion. Another thing that
> > comes to mind: -msse2avx was previously used to support SSE-to-VEX
> > conversion, and that works on machines that don't support EVEX. We
> > now extend SSE to EVEX, which makes this option unusable on machines
> > that do not support EVEX encodings (e.g. hybrid machines like Alder
> > Lake). Do you think we should add a new option?
> 
> That's a question I've tentatively answered with "No". SSE => VEX
> requires systems supporting AVX. SSE-with-eGPR requires systems with
> APX. SSE-with-eGPR => EVEX can similarly rely on APX being there, and
> I expect all such systems will support at least AVX10/128. If that is
> deemed a wrong assumption, then indeed we may need to consider adding
> a new option (but not -msse2avx512 as you suggest further down, since
> SSE only ever covers 128-bit operations; -msse2avx10 maybe).
> 

Yes, I was wrong; only eGPRs trigger the SSE-to-EVEX conversion. Your assumption is correct.
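
To double-check my understanding, a minimal sketch (assuming gas built with APX support and invoked with -msse2avx; the resulting encodings are my expectation, not verified assembler output):

	.text
	addps	(%rax), %xmm1	# classic GPR base: converts to VEX vaddps
	addps	(%r16), %xmm1	# eGPR base: VEX cannot encode %r16-%r31,
				# so only the EVEX form can express this

So only the second line needs the SSE => EVEX path.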

> >>>> Should we also convert %xmm<N>-only templates (to consistently
> >>>> permit use of {evex})? Or should we reject use of {evex}, but then
> >>>> also that of {vex}/{vex3}?
> >>>
> >>> Do you mean SHA and KeyLocker?
> >>
> >> No, I mean templates with all-XMM operands and no memory ones. Such
> >> templates don't use eGPRs, yet could be converted to their EVEX
> >> counterparts, too (by way of the programmer adding {evex} to the
> >> _legacy_ insn). Hence the question of how to treat {evex} there,
> >> and then also {vex} / {vex3}. Take, for example, MOVHLPS or
> >> MOVLHPS.
> >
> > I'm not sure if you want to support this conversion under -msse2avx.
> > I think this conversion is only used by people writing assembly by
> > hand.
> 
> Aiui -msse2avx is there mainly for hand-written assembly. Compilers will do
> better insn selection on their own anyway.
> 
> > As for adding a prefix to convert SSE to VEX or EVEX, I don't think
> > this requirement makes much sense at the moment; maybe in the future,
> > if EVEX turns out to be faster than the VEX instruction, we can
> > provide an option like -msse2avx512 to achieve this conversion.
> 
> That's not my point. Consider this example:
> 
> 	.text
> sse2avx:
> 		movlhps	%xmm0, %xmm1
> 	{vex}	movlhps	%xmm0, %xmm1
> 	{vex3}	movlhps	%xmm0, %xmm1
> 	{evex}	movlhps	%xmm0, %xmm1
> 
> 		movlps	(%rax), %xmm1
> 	{vex}	movlps	(%rax), %xmm1
> 	{vex3}	movlps	(%rax), %xmm1
> 	{evex}	movlps	(%rax), %xmm1
> 
> Other than the {evex}-prefixed lines, everything assembles smoothly prior to
> the patch here. IOW even {vex3} has an effect on the non-VEX mnemonic.
> With my patch as it is now, the 2nd {evex}-prefixed line assembles fine, while
> the 1st doesn't. This is simply inconsistent. Hence why I see two
> options: Disallow all three pseudo-prefixes on legacy mnemonics, or permit
> {evex} consistently, too.
> 

Oh, I see what you mean; thank you for the detailed explanation. This increases the consistency of binutils, and since we already support some of these pseudo-prefixes, it would be nice to support {evex} as well if it doesn't require too much effort.

Thanks,
Lili.
