[PATCH 0/5] x86/Intel: AVX512 syntax enhancements

Wed May 18 15:07:47 GMT 2022

On Tue, May 17, 2022 at 11:40 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.05.2022 05:15, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Tuesday, May 17, 2022 8:00 PM
> >> To: Cui, Lili <lili.cui@intel.com>
> >> Cc: H.J. Lu <hjl.tools@gmail.com>; Binutils <binutils@sourceware.org>
> >> Subject: Re: [PATCH 0/5] x86/Intel: AVX512 syntax enhancements
> >>
> >>> 1. If we use BCST instead of {1to*}, it does not directly reflect the broadcast number: in both cases below the register size is zmm, but the broadcast number is not the same.
> >>>
> >>> -[      ]*[a-f0-9]+:[   ]*62 f5 54 58 58 31[     ]*vaddph zmm6,zmm5,WORD PTR \[ecx\]\{1to32\}
> >>> +[      ]*[a-f0-9]+:[   ]*62 f5 54 58 58 31[     ]*vaddph zmm6,zmm5,WORD BCST \[ecx\]
> >>>
> >>> -[      ]*[a-f0-9]+:[   ]*62 65 7d df 5b 72 80[          ]*vcvtph2dq zmm30\{k7\}\{z\},WORD PTR \[rdx-0x100\]\{1to16\}
> >>> +[      ]*[a-f0-9]+:[   ]*62 65 7d df 5b 72 80[          ]*vcvtph2dq zmm30\{k7\}\{z\},WORD BCST \[rdx-0x100\]
> >>
> >> This case is clearly disambiguated by the destination register.
> >> What I think you're worried about are conversions where the field size
> >> shrinks (e.g. from 32 bits to 16 bits, like in vcvtdq2ph). In this case you will
> >> note that for the purpose of keeping things unambiguous the disassembler
> >> will continue to emit {1to<N>}, and the assembler will continue to require
> >> that extra bit of information.
> >>
> >
> > Appending {1to<N>} for the vcvtdq2ph special case is great.
> > There is no ambiguity in the format vcvtph2dq zmm30{k7}{z},WORD BCST [rdx-0x100], but we cannot directly tell N (in {1to<N>}) from this BCST format, although we can confirm it with the SDM. I'm just trying to say that, as a first impression, the BCST format has this disadvantage.
>
> But that's no different for e.g. VADDPS - the element count isn't explicit
> anywhere, it's known from register kind only.
>
> I don't, btw, have insight into how MASM disambiguates VCVTDQ2PH and the like.
>
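To make the element-count point above concrete, here is a minimal sketch of how N in {1to<N>} follows from the source vector width and the element size. This is not binutils code: the function names are made up for illustration, and the source widths used for vcvtdq2ph (128- and 256-bit sources that both write an xmm destination) are an assumption taken from the SDM operand forms rather than from this thread.

```python
def broadcast_count(src_vector_bits: int, element_bits: int) -> int:
    """Return N for {1to<N>}: how many elements fill the source vector."""
    if element_bits <= 0 or src_vector_bits % element_bits:
        raise ValueError("vector width must be a multiple of the element size")
    return src_vector_bits // element_bits


def candidate_counts(src_widths, element_bits):
    """Return every N that is possible when only the element size is known."""
    return sorted({broadcast_count(w, element_bits) for w in src_widths})


# vaddph zmm6,zmm5,WORD BCST [ecx]: 512-bit source of 16-bit elements.
print(broadcast_count(512, 16))

# vcvtph2dq zmm30,WORD BCST [...]: the 16 dwords of the zmm destination
# come from a 256-bit source of 16-bit elements.
print(broadcast_count(256, 16))

# vcvtdq2ph xmm,DWORD BCST [...]: assumed 128- or 256-bit source, same
# xmm destination, so the destination alone does not pin down N.
print(candidate_counts([128, 256], 32))
```

With more than one candidate, as in the last case, the destination register cannot settle N, which is where an explicit {1to<N>} remains necessary.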
> >>> 2. Just removing the last comma is OK with me. I remember FP16 has an instruction with {sae} in the middle position in the AT&T format, whereas in the Intel format it is placed at the end; I don't know if that is a problem.
> >>>
> >>> -[      ]*[a-f0-9]+:[   ]*62 f5 54 18 58 f4[     ]*vaddph zmm6,zmm5,zmm4,\{rn-sae\}
> >>> +[      ]*[a-f0-9]+:[   ]*62 f5 54 18 58 f4[     ]*vaddph zmm6,zmm5,zmm4\{rn-sae\}
> >>>
> >>> FP16:
> >>> vcvtusi2sh %edx, {rn-sae}, %xmm29, %xmm30
> >>> vcvtusi2sh xmm6,xmm5,edx\{rn-sae\}
> >>
> >> Well, yes, this is not only not a problem, but intended. See how the SDM
> >> places the rounding/SAE modifiers. It's also not FP16-specific in any way.
> >>
> >
> > Yes, the SDM puts the rounding/SAE modifier after the last register operand; if the last operand is an immediate, it puts rounding/SAE before the immediate. But I don't quite understand why the AT&T format puts it after %edx instead of before.
>
> That's a question I raised back at the time when introducing the Intel
> syntax alternative. I don't recall having got a good answer. I guess I
> can only forward to H.J. here ...

AT&T syntax order is always different. SAE was new. I don't remember exactly
how the choice was made.

-- 
H.J.