[PATCH 2/4] i386: Align branches within a fixed boundary

Fri Dec 6 01:05:00 GMT 2019

On Thu, Dec 5, 2019 at 12:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 05.12.2019 00:55, H.J. Lu wrote:
> > On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 03.12.2019 20:02,  H.J. Lu  wrote:
> >>> +static int
> >>> +maybe_fused_with_jcc_p (void)
> >>> +{
> >>> +  /* No RIP address.  */
> >>> +  if (i.base_reg && i.base_reg->reg_num == RegIP)
> >>> +    return 0;
> >>> +
> >>> +  /* and, add, sub with destination register.  */
> >>> +  if ((i.tm.base_opcode | 5) == 0x25
> >>> +      || (i.tm.base_opcode | 5) == 0x5
> >>> +      || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
> >>
> >> Why not the same "| 5" approach here as above? And why then not
> >> further fold AND and SUB (using "| 0xd" along the lines of ...
> >
> > It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work since
> > 5 doesn't contain consecutive bits.  I need to use
> >
> > i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25
>
> Then use "| 7", as suggested further down, since ...

"| 7" == 0x27 will match 0x20 to 0x27.  But we only want 0x20 to 0x25.

> >>> +      || ((i.tm.base_opcode | 3) == 0x83
> >>> +       && ((i.tm.extension_opcode | 1) == 0x5
> >>
> >> ... this?
> >>
> >>> +           || i.tm.extension_opcode == 0x0)))
> >>> +    return (i.types[1].bitfield.class == Reg
> >>> +         || i.types[1].bitfield.instance == Accum);
>
> ... this will produce 0 for opcodes 0x26 and 0x27.

Are you suggesting checking i.types for opcode?  I
don't think it is a good idea.

> >>> +  /* test, cmp with any register.  */
> >>> +  if ((i.tm.base_opcode | 1) == 0x85
> >>> +      || (i.tm.base_opcode | 1) == 0xa9
> >>> +      || (i.tm.base_opcode | 1) == 0xf7
> >>
> >> This needs further qualification by extension opcode, to avoid
> >> covering not, neg, mul, etc.
> >
> > I am adding i.tm.extension_opcode == 0.
> >
> >>> +      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
> >>
> >> Along the lines of the above use "| 7" here, seeing that ...
> >
> > We can't use "| 7" since it will match 0x3e and 0x3f.
> >
> >>> +      || ((i.tm.base_opcode | 3) == 0x83
> >>> +       && (i.tm.extension_opcode == 0x7)))
> >>> +    return (i.types[0].bitfield.class == Reg
> >>> +         || i.types[0].bitfield.instance == Accum
> >>> +         || i.types[1].bitfield.class == Reg
> >>> +         || i.types[1].bitfield.instance == Accum);
> >>
> >> ... this will also be correct (returning 0) for AAS?
> >
> > AAS can't be fused with jcc.
>
> Of course, but my comment did cover this fact: The expression of
> the return statement will produce 0 for it, which is what you
> want (aiui).

We shouldn't check types for opcode.

> >>> +static int
> >>> +add_branch_prefix_frag_p (void)
> >>> +{
> >>> +  /* NB: Don't work with COND_JUMP86 without i386.  Don't add prefix
> >>> +     to PadLock instructions since they include prefixes in opcode.  */
> >>> +  if (!align_branch_power
> >>> +      || now_seg == absolute_section
> >>> +      || i.tm.cpu_flags.bitfield.cpupadlock
> >>
> >> I continue to question this: PadLock insns are no different to legacy
> >> encoded SSE ones, and even some newer GPR ones. Why don't you instead
> >> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
> >
> > There are a few  i.tm.cpu_flags.bitfield.cpupadlock checks.  I want to be
> > conservative here.  PadLock insns should be on dead code path for affected
> > processors anyway.
>
> But legacy encoded insns with prefixes embedded in their base
> opcode won't be. I appreciate you wanting to be conservative here
> - what I'm asking for is to be even more conservative. If there
> is a concern about such prefixes in the first place, then the
> concern ought to be universal, I would think.

I will also count SSE prefix.

> >>> +      || !cpu_arch_flags.bitfield.cpui386)
> >>> +    return 0;
> >>> +
> >>> +  /* Don't add prefix if it is a prefix or there is no operand.  */
> >>> +  if (!i.operands || i.tm.opcode_modifier.isprefix)
> >>> +    return 0;
> >>
> >> Why is there not being any operand a reason to avoid adding prefixes?
> >
> > I am just a little bit conservative.
>
> Especially in cases of arbitrary decisions like this one, may I
> ask that you leave suitable comments, so it won't remain guess
> work down the road as to why the condition is there?

Will add a comment.

> >>> +static int
> >>> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> >>> +{
> >>> +  int add_padding;
> >>> +
> >>> +  /* NB: Don't work with COND_JUMP86 without i386.  */
> >>> +  if (!align_branch_power
> >>> +      || now_seg == absolute_section
> >>> +      || !cpu_arch_flags.bitfield.cpui386)
> >>> +    return 0;
> >>> +
> >>> +  add_padding = 0;
> >>> +
> >>> +  /* Check for jcc and direct jmp.  */
> >>> +  if (i.tm.opcode_modifier.jump == JUMP)
> >>> +    {
> >>> +      if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> >>> +     {
> >>> +       *branch_p = align_branch_jmp;
> >>> +       add_padding = align_branch & align_branch_jmp_bit;
> >>> +     }
> >>> +      else
> >>> +     {
> >>> +       *branch_p = align_branch_jcc;
> >>> +       if ((align_branch & align_branch_jcc_bit))
> >>
> >> Extra pair of parentheses (more similar instances further down
> >> in this function).
> >
> > I consider it as a coding style in case a new condition is added.
>
> Oh. I didn't think we do so commonly. But anyway, you're the
> maintainer.
>
> Jan

Thanks.

-- 
H.J.