FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
Cui, Lili
lili.cui@intel.com
Tue Jul 20 11:13:51 GMT 2021
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, July 20, 2021 4:46 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
> instructions
>
> On 20.07.2021 09:08, Cui, Lili wrote:
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, July 14, 2021 11:21 PM
> >> To: Cui, Lili <lili.cui@intel.com>
> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
> >> instructions
> >>
> >> On 13.07.2021 08:58, Cui, Lili wrote:
> >>
> >> Disassembler:
> >>
> >> d_scalar_mode looks to be unused.
> >>
> >> This
> >>
> >> /* EVEX_W_MAP5_2A_P_1 */
> >> {
> >> { "vcvtsi2sh{%LQ|}", { XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
> >> { "vcvtsi2sh{%LQ|}", { XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
> >> },
> >>
> >> can imo be expressed without decoding EVEX.W, by using Edq instead of
> >> (separately) Ed and Eq. There's at least one similar case elsewhere.
> >> Interestingly in the 2si/2usi conversions you do use Gdq already,
> >> which I think handles the EVEX.W=1 case correctly outside of 64-bit
> >> mode (unlike Eq, which will unconditionally produce 64-bit register names
> >> afaict).
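The Edq vs. Eq distinction above comes down to when a 64-bit register name is printed. Below is a simplified model of the behaviour Jan describes, not the actual i386-dis.c code. The enum `operand_kind` and the function `gpr_width` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand kinds mirroring the Ed/Eq/Edq names used in
   the opcode tables; this is not the i386-dis.c implementation.  */
enum operand_kind { OP_Ed, OP_Eq, OP_Edq };

/* Return the GPR width (in bits) whose register name would be printed.  */
static int
gpr_width (enum operand_kind kind, bool evex_w, bool mode_64bit)
{
  switch (kind)
    {
    case OP_Ed:
      return 32;
    case OP_Eq:
      /* Unconditionally 64-bit, even outside 64-bit mode.  */
      return 64;
    case OP_Edq:
      /* EVEX.W=1 selects a 64-bit GPR only in 64-bit mode; elsewhere
         the operand stays 32-bit.  */
      return (evex_w && mode_64bit) ? 64 : 32;
    }
  assert (!"unknown operand kind");
  return 0;
}
```

Under this model a single Edq entry covers both the EVEX.W=0 and EVEX.W=1 forms of vcvtsi2sh, which is why the separate EVEX.W decode step becomes unnecessary.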
> >>
> >> As to a broader question on decoding EVEX.W: Did you consider
> >> introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a
> >> valid encoding), to avoid this decode step for perhaps almost all
> >> entries? And if that's not an option, decoding EVEX.W first for all
> >> the opcodes which previously had no meaning at all would, in some
> >> cases, reduce the overall number of table entries (and in all other
> >> cases this would then merely be for consistency, as it also wouldn't
> >> increase the number of table entries). To give an example:
> >>
> >> { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
> >>
> >> =>
> >>
> >> /* PREFIX_EVEX_0F3AC2 */
> >> {
> >> { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
> >> { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
> >> },
> >>
> >> =>
> >>
> >> /* EVEX_W_0F3AC2_P_0 */
> >> {
> >> { "vcmpph", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
> >> },
> >> /* EVEX_W_0F3AC2_P_1 */
> >> {
> >> { "vcmpsh", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
> >> },
> >>
> >> i.e. a total of 1 + 4 + 2 * 2 entries, whereas decoding W first would
> >> yield 1 (evex) + 2 (evex_w) + 4 (prefix) entries.
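The entry counts above can be reproduced with a trivial calculation. This sketch assumes, as the thread does, a 4-slot prefix table and a 2-slot EVEX.W table, with only EVEX.W=0 being valid for these opcodes. The function names are hypothetical.

```c
/* Hypothetical helpers reproducing the table-entry counts above.  */

/* Prefix first: 1 EVEX entry + a 4-slot prefix table + one 2-slot
   EVEX.W table for each prefix that is actually used.  */
static int
entries_prefix_first (int prefixes_used)
{
  return 1 + 4 + prefixes_used * 2;
}

/* EVEX.W first: 1 EVEX entry + a 2-slot EVEX.W table + a single
   4-slot prefix table under EVEX.W=0.  */
static int
entries_w_first (void)
{
  return 1 + 2 + 4;
}
```

For 0F3AC2 (two prefixes used) this gives 9 entries versus 7; for MAP5_7D (all four prefixes used) it gives 13 versus 7.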
> >
> > Hi Jan,
> >
> > Do you want me to change it like this?
> > { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
> >
> > =>
> >
> > /* PREFIX_EVEX_0F3AC2 */
> > {
> > { "vcmp%XH", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
> > { "vcmp%XH", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
> > },
> >
> > %XH => print 'ph' or 'sh' depending on the EVEX.LL bits; if EVEX.W == 1,
> > report a bad encoding:
> > if (EVEX.LL == EVEX.LLIG)
> >   print 'sh'
> > else
> >   print 'ph'
>
> Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'
> depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce an
> appropriate indication of the encoding being bad if EVEX.W is set.
> IOW something like
>
> /* PREFIX_EVEX_0F3AC2 */
> {
> { "vcmpp%XH", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
> { "vcmps%XH", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
> },
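For comparison with %XW (which, per Jan, prints 's' or 'd' depending on VEX.W), here is a minimal sketch of how a %XH suffix could be expanded: append 'h' when EVEX.W is clear, and report the encoding as bad when it is set. The function `expand_xh` and its signature are hypothetical; binutils' putop() in i386-dis.c is structured differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: copy TMPL into OUT, expanding "%XH".  Returns false
   (caller would print "(bad)") if EVEX.W=1, which is not a valid
   encoding for these FP16 instructions.  */
static bool
expand_xh (const char *tmpl, bool evex_w, char *out, size_t outlen)
{
  size_t n = 0;
  bool bad = false;

  if (outlen == 0)
    return false;

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      if (p[0] == '%' && p[1] == 'X' && p[2] == 'H')
        {
          /* Only EVEX.W=0 is valid: print 'h', otherwise flag it.  */
          if (evex_w)
            bad = true;
          else if (n + 1 < outlen)
            out[n++] = 'h';
          p += 2; /* Skip "XH"; the loop increment skips the rest.  */
          continue;
        }
      if (n + 1 < outlen)
        out[n++] = *p;
    }
  out[n] = '\0';
  return !bad;
}
```

With this, "vcmpp%XH" expands to "vcmpph" and "vcmps%XH" to "vcmpsh" when EVEX.W is clear, while EVEX.W=1 is rejected without needing a separate EVEX.W table.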
>
> >> The delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2
> >> vs. 1 + 2 + 4. This also results in more related entries ending up
> >> closer to one another.
> >>
> > I don't quite understand this part. Should all FP16 disassembler entries go
> > through a W_TABLE first, or should I just add something like %XH instead of
> > going through a W_TABLE? Thanks.
>
> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per
> above, you can avoid that decoding step altogether by using %XH.
>
Okay, it is clear to me now. Many thanks!
Lili