[PATCH 4/4] x86: fold certain AVX and AVX2 templates

H.J. Lu hjl.tools@gmail.com
Fri Dec 15 16:49:00 GMT 2017


On Fri, Dec 15, 2017 at 8:32 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>> "H.J. Lu" <hjl.tools@gmail.com> 12/15/17 2:10 PM >>>
>>On Fri, Dec 15, 2017 at 2:35 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> Just like for instructions acting on GPRs, there's no need to have separate
>>> templates for otherwise identical insns acting on XMM or YMM registers
>>> (or memory of the same size).
>>>
>>> gas/
>>> 2017-12-15  Jan Beulich  <jbeulich@suse.com>
>>>
>>>         * config/tc-i386.c (regymm, regzmm): Delete.
>>>         (operand_type_register_match): Extend comment. Also handle some
>>>         memory operands here. Extend to cover .regsimd.
>>>         (build_vex_prefix): Derive vector_length from actual operand
>>>         size.
>>>         (process_operands, build_modrm_byte): Use .regsimd.
>>>
>>> opcodes/
>>> 2017-12-15  Jan Beulich  <jbeulich@suse.com>
>>>
>>>         * i386-gen.c (operand_type_init): Delete OPERAND_TYPE_REGYMM and
>>>         OPERAND_TYPE_REGZMM entries.
>>>         * i386-opc.h (enum of opcode modifiers): Extend comment.
>>>         * i386-opc.tbl (vaddpd, vaddps, vaddsubpd, vaddsubps, vandnpd,
>>>         vandnps, vandpd, vandps, vblendpd, vblendps, vblendvpd,
>>>         vblendvps, vbroadcastss, vcmpeq_ospd, vcmpeq_osps, vcmpeqpd,
>>>         vcmpeqps, vcmpeq_uqpd, vcmpeq_uqps, vcmpeq_uspd, vcmpeq_usps,
>>>         vcmpfalse_ospd, vcmpfalse_osps, vcmpfalsepd, vcmpfalseps,
>>>         vcmpge_oqpd, vcmpge_oqps, vcmpgepd, vcmpgeps, vcmpgt_oqpd,
>>>         vcmpgt_oqps, vcmpgtpd, vcmpgtps, vcmple_oqpd, vcmple_oqps,
>>>         vcmplepd, vcmpleps, vcmplt_oqpd, vcmplt_oqps, vcmpltpd,
>>>         vcmpltps, vcmpneq_oqpd, vcmpneq_oqps, vcmpneq_ospd,
>>>         vcmpneq_osps, vcmpneqpd, vcmpneqps, vcmpneq_uspd, vcmpneq_usps,
>>>         vcmpngepd, vcmpngeps, vcmpnge_uqpd, vcmpnge_uqps, vcmpngtpd,
>>>         vcmpngtps, vcmpngt_uqpd, vcmpngt_uqps, vcmpnlepd, vcmpnleps,
>>>         vcmpnle_uqpd, vcmpnle_uqps, vcmpnltpd, vcmpnltps, vcmpnlt_uqpd,
>>>         vcmpnlt_uqps, vcmpordpd, vcmpordps, vcmpord_spd, vcmpord_sps,
>>>         vcmppd, vcmpps, vcmptruepd, vcmptrueps, vcmptrue_uspd,
>>>         vcmptrue_usps, vcmpunordpd, vcmpunordps, vcmpunord_spd,
>>>         vcmpunord_sps, vcvtdq2ps, vcvtpd2dq, vcvtpd2ps, vcvtps2dq,
>>>         vcvttpd2dq, vcvttps2dq, vdivpd, vdivps, vdpps, vhaddpd, vhaddps,
>>>         vhsubpd, vhsubps, vlddqu, vmaskmovpd, vmaskmovps, vmaxpd,
>>>         vmaxps, vminpd, vminps, vmovapd, vmovaps, vmovdqa, vmovdqu,
>>>         vmovmskpd, vmovmskps, vmovntdq, vmovntpd, vmovntps, vmovshdup,
>>>         vmovsldup, vmovupd, vmovups, vmulpd, vmulps, vorpd, vorps,
>>>         vpermilpd, vpermilps, vptest, vrcpps, vroundpd, vroundps,
>>>         vrsqrtps, vshufpd, vshufps, vsqrtpd, vsqrtps, vsubpd, vsubps,
>>>         vtestpd, vtestps, vunpckhpd, vunpckhps, vunpcklpd, vunpcklps,
>>>         vxorpd, vxorps, vpblendd, vpbroadcastb, vpbroadcastd,
>>>         vpbroadcastw, vpbroadcastq, vpmaskmovd, vpmaskmovq, vpsllvd,
>>>         vpsllvq, vpsravd, vpsravq, vpsrlvd, vpsrlvq): Fold 128- and
>>>         256-bit forms. Use CheckRegSize instead of IgnoreSize where
>>>         appropriate. Drop Xmmword and Ymmword from the results where
>>>         possible.
>>>         * i386-tbl.h: Re-generate.
>>> ---
>>> For some yet to be understood reason folding the memory forms of
>>> vcvtpd2ps doesn't work (some Intel mode ymmword ptr forms produce
>>> 128-bit insns).
>>
>>Integer extension instructions also take 2 register operands of different
>>sizes.  How are they handled?
>
> As per the list of changed insns, conversions to/from scalar int aren't being
> folded, so their handling doesn't change. And quite obviously so, since no
> matter what the GPR size, the other side is an xmmword (register or
> memory), while here I'm folding only templates where one used xmmword
> and the other ymmword.
>
>>> Similarly I didn't figure out yet the reason for an anomaly when the
>>> "unspecified" checks in operand_type_register_match() are missing: In
>>> that case I've observed errors on vaddsubp{s,d}, but not on e.g.
>>> vaddp{s,d} with identical operands.
>>
>>Please open a bug with a testcase.
>
> You perhaps misunderstood: I've observed this issue while putting together
> the patch here. I'm not aware of an issue without the patch applied, nor with
> the patch in its current form applied. I'm merely pointing out that there is
> a _possible_ issue pointed out by this anomaly. This could e.g. be the result
> of some latent bug somewhere which was triggered by the not-yet-correct
> patch. I'm intending to investigate this, but I can't predict when this will be;
> I've put the note here in case the observation triggers something for you or
> anyone else who reads this, which might then save me from needlessly
> investigating what's going on there.
>

Patch is OK then.
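
For readers following the thread, here is a minimal sketch of the folding
idea discussed above. It is a toy model, not the gas implementation: the
names toy_template, toy_match, toy_vex_l, SZ_XMM and SZ_YMM are all
hypothetical and do not appear in tc-i386.c or i386-opc.h. It assumes that a
folded template lists every size each operand accepts, that a
CheckRegSize-style flag requires all operands to agree on one size, and that
the vector length is then derived from the size that actually matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand-size bits; NOT the real i386-opc.h definitions. */
enum { SZ_XMM = 1 << 0, SZ_YMM = 1 << 1 };

/* A folded template: each operand carries a mask of the sizes it accepts. */
struct toy_template {
    const char *name;
    unsigned operand_sizes[3];  /* allowed-size mask per operand */
    size_t n_operands;
    bool check_reg_size;        /* all operands must have the same size */
};

/* One folded entry standing in for separate 128- and 256-bit templates. */
static const struct toy_template toy_vaddpd = {
    "vaddpd",
    { SZ_XMM | SZ_YMM, SZ_XMM | SZ_YMM, SZ_XMM | SZ_YMM },
    3,
    true
};

/* Each entry of actual[] holds exactly one size bit.
   Returns the size of the first operand on a match, 0 otherwise. */
unsigned toy_match(const struct toy_template *t,
                   const unsigned *actual, size_t n)
{
    assert(t != NULL && actual != NULL);
    if (n != t->n_operands || n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        /* The operand's size must be one the template allows. */
        if (!(t->operand_sizes[i] & actual[i]))
            return 0;
        /* With the size check on, mixed XMM/YMM operands are rejected. */
        if (t->check_reg_size && actual[i] != actual[0])
            return 0;
    }
    return actual[0];
}

/* Derive the vector-length bit from the matched size:
   0 for 128-bit operands, 1 for 256-bit operands. */
int toy_vex_l(unsigned size)
{
    return size == SZ_YMM ? 1 : 0;
}
```

In this model, three XMM operands and three YMM operands both match the
single toy_vaddpd entry and yield different length bits, while a mix of XMM
and YMM operands is rejected by the size check.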

Thanks.


-- 
H.J.


