[PATCH 2/4] i386: Align branches within a fixed boundary
H.J. Lu
hjl.tools@gmail.com
Fri Dec 6 01:05:00 GMT 2019
On Thu, Dec 5, 2019 at 12:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 05.12.2019 00:55, H.J. Lu wrote:
> > On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 03.12.2019 20:02, H.J. Lu wrote:
> >>> +static int
> >>> +maybe_fused_with_jcc_p (void)
> >>> +{
> >>> + /* No RIP address. */
> >>> + if (i.base_reg && i.base_reg->reg_num == RegIP)
> >>> + return 0;
> >>> +
> >>> + /* and, add, sub with destination register. */
> >>> + if ((i.tm.base_opcode | 5) == 0x25
> >>> + || (i.tm.base_opcode | 5) == 0x5
> >>> + || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
> >>
> >> Why not the same "| 5" approach here as above? And why then not
> >> further fold AND and SUB (using "| 0xd" along the lines of ...
> >
> > It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work since
> > 5 doesn't contain consecutive bits. I need to use
> >
> > i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25
>
> Then use "| 7", as suggested further down, since ...
"| 7" == 0x27 will match 0x20 to 0x27. But we only want 0x20 to 0x25.
> >>> + || ((i.tm.base_opcode | 3) == 0x83
> >>> + && ((i.tm.extension_opcode | 1) == 0x5
> >>
> >> ... this?
> >>
> >>> + || i.tm.extension_opcode == 0x0)))
> >>> + return (i.types[1].bitfield.class == Reg
> >>> + || i.types[1].bitfield.instance == Accum);
>
> ... this will produce 0 for opcodes 0x26 and 0x27.
Are you suggesting checking i.types for opcode? I
don't think it is a good idea.
> >>> + /* test, cmp with any register. */
> >>> + if ((i.tm.base_opcode | 1) == 0x85
> >>> + || (i.tm.base_opcode | 1) == 0xa9
> >>> + || (i.tm.base_opcode | 1) == 0xf7
> >>
> >> This needs further qualification by extension opcode, to avoid
> >> covering not, neg, mul, etc.
> >
> > I am adding i.tm.extension_opcode == 0.
> >
> >>> + || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
> >>
> >> Along the lines of the above use "| 7" here, seeing that ...
> >
> > We can't use "| 7" since it will match 0x3e and 0x3f.
> >
> >>> + || ((i.tm.base_opcode | 3) == 0x83
> >>> + && (i.tm.extension_opcode == 0x7)))
> >>> + return (i.types[0].bitfield.class == Reg
> >>> + || i.types[0].bitfield.instance == Accum
> >>> + || i.types[1].bitfield.class == Reg
> >>> + || i.types[1].bitfield.instance == Accum);
> >>
> >> ... this will also be correct (returning 0) for AAS?
> >
> > AAS can't be fused with jcc.
>
> Of course, but my comment did cover this fact: The expression of
> the return statement will produce 0 for it, which is what you
> want (aiui).
We shouldn't check types for opcode.
> >>> +static int
> >>> +add_branch_prefix_frag_p (void)
> >>> +{
> >>> + /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix
> >>> + to PadLock instructions since they include prefixes in opcode. */
> >>> + if (!align_branch_power
> >>> + || now_seg == absolute_section
> >>> + || i.tm.cpu_flags.bitfield.cpupadlock
> >>
> >> I continue to question this: PadLock insns are no different to legacy
> >> encoded SSE ones, and even some newer GPR ones. Why don't you instead
> >> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
> >
> > There are a few i.tm.cpu_flags.bitfield.cpupadlock checks. I want to be
> > conservative here. PadLock insns should be on dead code path for affected
> > processors anyway.
>
> But legacy encoded insns with prefixes embedded in their base
> opcode won't be. I appreciate you wanting to be conservative here
> - what I'm asking for is to be even more conservative. If there
> is a concern about such prefixes in the first place, then the
> concern ought to be universal, I would think.
I will also count SSE prefix.
> >>> + || !cpu_arch_flags.bitfield.cpui386)
> >>> + return 0;
> >>> +
> >>> + /* Don't add prefix if it is a prefix or there is no operand. */
> >>> + if (!i.operands || i.tm.opcode_modifier.isprefix)
> >>> + return 0;
> >>
> >> Why is there not being any operand a reason to avoid adding prefixes?
> >
> > I am just a little bit conservative.
>
> Especially in cases of arbitrary decisions like this one, may I
> ask that you leave suitable comments, so it won't remain guess
> work down the road as to why the condition is there?
Will add a comment.
> >>> +static int
> >>> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> >>> +{
> >>> + int add_padding;
> >>> +
> >>> + /* NB: Don't work with COND_JUMP86 without i386. */
> >>> + if (!align_branch_power
> >>> + || now_seg == absolute_section
> >>> + || !cpu_arch_flags.bitfield.cpui386)
> >>> + return 0;
> >>> +
> >>> + add_padding = 0;
> >>> +
> >>> + /* Check for jcc and direct jmp. */
> >>> + if (i.tm.opcode_modifier.jump == JUMP)
> >>> + {
> >>> + if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> >>> + {
> >>> + *branch_p = align_branch_jmp;
> >>> + add_padding = align_branch & align_branch_jmp_bit;
> >>> + }
> >>> + else
> >>> + {
> >>> + *branch_p = align_branch_jcc;
> >>> + if ((align_branch & align_branch_jcc_bit))
> >>
> >> Extra pair of parentheses (more similar instances further down
> >> in this function).
> >
> > I consider it as a coding style in case a new condition is added.
>
> Oh. I didn't think we do so commonly. But anyway, you're the
> maintainer.
>
> Jan
Thanks.
--
H.J.
More information about the Binutils
mailing list