This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [PATCH 2/4] i386: Align branches within a fixed boundary


On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.12.2019 20:02,  H.J. Lu  wrote:
> > +static int
> > +maybe_fused_with_jcc_p (void)
> > +{
> > +  /* No RIP address.  */
> > +  if (i.base_reg && i.base_reg->reg_num == RegIP)
> > +    return 0;
> > +
> > +  /* and, add, sub with destination register.  */
> > +  if ((i.tm.base_opcode | 5) == 0x25
> > +      || (i.tm.base_opcode | 5) == 0x5
> > +      || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
>
> Why not the same "| 5" approach here as above? And why then not
> further fold AND and SUB (using "| 0xd" along the lines of ...

It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work: the mask
5 doesn't have consecutive bits set, so the test misses the AND forms
0x22 and 0x23.  I need to use

i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25
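
A quick standalone check (illustration only, not part of the patch)
makes the gap visible:

  #include <stdio.h>

  /* (op | 5) == 0x25 only accepts opcodes whose bits outside the 0b101
     mask equal 0x20, i.e. 0x20, 0x21, 0x24 and 0x25.  Bit 1 is not part
     of the mask, so the AND forms 0x22 and 0x23 fall through, which is
     why the explicit range check is needed.  */
  int
  main (void)
  {
    unsigned int op;
    for (op = 0x20; op <= 0x25; op++)
      printf ("0x%02x: mask trick %d, range check %d\n",
	      op, (op | 5) == 0x25, op >= 0x20 && op <= 0x25);
    return 0;
  }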

> > +      || ((i.tm.base_opcode | 3) == 0x83
> > +       && ((i.tm.extension_opcode | 1) == 0x5
>
> ... this?
>
> > +           || i.tm.extension_opcode == 0x0)))
> > +    return (i.types[1].bitfield.class == Reg
> > +         || i.types[1].bitfield.instance == Accum);
> > +
> > +  /* test, cmp with any register.  */
> > +  if ((i.tm.base_opcode | 1) == 0x85
> > +      || (i.tm.base_opcode | 1) == 0xa9
> > +      || (i.tm.base_opcode | 1) == 0xf7
>
> This needs further qualification by extension opcode, to avoid
> covering not, neg, mul, etc.

I am adding a check for i.tm.extension_opcode == 0.
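
For reference, 0xF6/0xF7 form x86 "group 3", where ModRM.reg selects the
actual instruction, and only the TEST form is a fusion candidate.  A
simplified sketch of the qualified condition (hypothetical helper name,
illustration only, not the patch hunk):

  /* Group 3 extension opcodes:
       /0 TEST   /2 NOT   /3 NEG   /4 MUL
       /5 IMUL   /6 DIV   /7 IDIV
     Requiring extension_opcode == 0 keeps only TEST.  */
  static int
  is_group3_test (unsigned int base_opcode, unsigned int extension_opcode)
  {
    return (base_opcode | 1) == 0xf7 && extension_opcode == 0;
  }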

> > +      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
>
> Along the lines of the above use "| 7" here, seeing that ...

We can't use "| 7" here since it would also match 0x3e (a segment
override prefix byte) and 0x3f (AAS).
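
A tiny standalone check (illustration only) shows the over-match:

  #include <assert.h>

  /* "(op | 7) == 0x3f" accepts the whole 0x38-0x3f block, so besides
     the CMP opcodes 0x38-0x3d it would also accept 0x3e and 0x3f.  */
  int
  main (void)
  {
    assert ((0x3e | 7) == 0x3f && (0x3f | 7) == 0x3f);
    assert (!(0x3e >= 0x38 && 0x3e <= 0x3d));
    return 0;
  }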

> > +      || ((i.tm.base_opcode | 3) == 0x83
> > +       && (i.tm.extension_opcode == 0x7)))
> > +    return (i.types[0].bitfield.class == Reg
> > +         || i.types[0].bitfield.instance == Accum
> > +         || i.types[1].bitfield.class == Reg
> > +         || i.types[1].bitfield.instance == Accum);
>
> ... this will also be correct (returning 0) for AAS?

AAS can't be fused with jcc.

> Also, for any of the checks here (and just like now done in
> add_branch_padding_frag_p()) - wouldn't you better exclude VEX
> and alike encoding up front. I didn't check whether there currently
> are conflicting encodings, but new ones appearing would likely go
> unnoticed as far as this code is concerned.

I will add

  /* No VEX/EVEX encoding.  */
  if (is_any_vex_encoding (&i.tm))
    return 0;

> > +static int
> > +add_branch_prefix_frag_p (void)
> > +{
> > +  /* NB: Don't work with COND_JUMP86 without i386.  Don't add prefix
> > +     to PadLock instructions since they include prefixes in opcode.  */
> > +  if (!align_branch_power
> > +      || now_seg == absolute_section
> > +      || i.tm.cpu_flags.bitfield.cpupadlock
>
> I continue to question this: PadLock insns are no different to legacy
> encoded SSE ones, and even some newer GPR ones. Why don't you instead
> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?

There are already a few i.tm.cpu_flags.bitfield.cpupadlock checks, and I
want to be conservative here.  PadLock insns should be on a dead code path
for the affected processors anyway.
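
For context, PadLock insns such as the xcrypt forms encode a mandatory
0xF3 prefix as part of the opcode, which is what Jan's suggested shift
test would detect.  A simplified standalone sketch (hypothetical helper
and simplified opcode value, not GAS code):

  /* Shift away the nominal opcode bytes and see whether embedded
     prefix bytes remain.  For the PadLock xcrypt forms the table
     opcode carries the mandatory 0xF3 in front of the 0F A7 escape,
     so the shifted value is non-zero.  */
  static int
  has_embedded_prefix (unsigned int base_opcode, unsigned int opcode_bytes)
  {
    return (base_opcode >> (8 * opcode_bytes)) != 0;
  }

  /* has_embedded_prefix (0xf30fa7, 2) is non-zero;
     has_embedded_prefix (0x0fa7, 2) is zero.  */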

> > +      || !cpu_arch_flags.bitfield.cpui386)
> > +    return 0;
> > +
> > +  /* Don't add prefix if it is a prefix or there is no operand.  */
> > +  if (!i.operands || i.tm.opcode_modifier.isprefix)
> > +    return 0;
>
> Why is there not being any operand a reason to avoid adding prefixes?

I am just being a little bit conservative here.

> > +static int
> > +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> > +{
> > +  int add_padding;
> > +
> > +  /* NB: Don't work with COND_JUMP86 without i386.  */
> > +  if (!align_branch_power
> > +      || now_seg == absolute_section
> > +      || !cpu_arch_flags.bitfield.cpui386)
> > +    return 0;
> > +
> > +  add_padding = 0;
> > +
> > +  /* Check for jcc and direct jmp.  */
> > +  if (i.tm.opcode_modifier.jump == JUMP)
> > +    {
> > +      if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> > +     {
> > +       *branch_p = align_branch_jmp;
> > +       add_padding = align_branch & align_branch_jmp_bit;
> > +     }
> > +      else
> > +     {
> > +       *branch_p = align_branch_jcc;
> > +       if ((align_branch & align_branch_jcc_bit))
>
> Extra pair of parentheses (more similar instances further down
> in this function).

I consider it a matter of coding style, in case a new condition is added
later.

>
> > @@ -8471,9 +8806,75 @@ output_insn (void)
> >         if (j > 15)
> >           as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
> >                    j);
> > +       else if (fragP)
> > +         {
> > +           /* NB: Don't add prefix with GOTPC relocation since
> > +              output_disp() above depends on the fixed encoding
> > +              length.  Can't add prefix with TLS relocation since
> > +              it breaks TLS linker optimization.  */
> > +           unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
> > +           /* Prefix count on the current instruction.  */
> > +           unsigned int count = !!is_any_vex_encoding (&i.tm);
>
> How come VEX and alike count as exactly one byte worth of prefixes?

I will fix it.
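
For reference, the VEX/EVEX prefix itself is 2, 3, or 4 bytes (0xC5
two-byte VEX, 0xC4 three-byte VEX, 0x62 EVEX), so a byte-accurate count
would look roughly like this hypothetical sketch rather than a flat 1
(illustration only, not the eventual fix):

  /* Byte size of the VEX/EVEX prefix.  Two-byte VEX starts with 0xC5,
     three-byte VEX with 0xC4, and the EVEX prefix (leading 0x62) is
     always 4 bytes.  */
  static unsigned int
  vex_prefix_size (int is_evex, unsigned int vex_length)
  {
    return is_evex ? 4 : vex_length;  /* vex_length is 2 or 3.  */
  }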

Thanks.


-- 
H.J.

