This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH 2/4] i386: Align branches within a fixed boundary
On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 03.12.2019 20:02, H.J. Lu wrote:
> > +static int
> > +maybe_fused_with_jcc_p (void)
> > +{
> > + /* No RIP address. */
> > + if (i.base_reg && i.base_reg->reg_num == RegIP)
> > + return 0;
> > +
> > + /* and, add, sub with destination register. */
> > + if ((i.tm.base_opcode | 5) == 0x25
> > + || (i.tm.base_opcode | 5) == 0x5
> > + || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
>
> Why not the same "| 5" approach here as above? And why then not
> further fold AND and SUB (using "| 0xd" along the lines of ...
It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work since
5 doesn't contain consecutive bits. I need to use
i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25
> > + || ((i.tm.base_opcode | 3) == 0x83
> > + && ((i.tm.extension_opcode | 1) == 0x5
>
> ... this?
>
> > + || i.tm.extension_opcode == 0x0)))
> > + return (i.types[1].bitfield.class == Reg
> > + || i.types[1].bitfield.instance == Accum);
> > +
> > + /* test, cmp with any register. */
> > + if ((i.tm.base_opcode | 1) == 0x85
> > + || (i.tm.base_opcode | 1) == 0xa9
> > + || (i.tm.base_opcode | 1) == 0xf7
>
> This needs further qualification by extension opcode, to avoid
> covering not, neg, mul, etc.
I am adding i.tm.extension_opcode == 0.
> > + || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
>
> Along the lines of the above use "| 7" here, seeing that ...
We can't use "| 7" since it will match 0x3e and 0x3f.
> > + || ((i.tm.base_opcode | 3) == 0x83
> > + && (i.tm.extension_opcode == 0x7)))
> > + return (i.types[0].bitfield.class == Reg
> > + || i.types[0].bitfield.instance == Accum
> > + || i.types[1].bitfield.class == Reg
> > + || i.types[1].bitfield.instance == Accum);
>
> ... this will also be correct (returning 0) for AAS?
AAS can't be fused with jcc.
> Also, for any of the checks here (and just like now done in
> add_branch_padding_frag_p()) - wouldn't you better exclude VEX
> and alike encoding up front. I didn't check whether there currently
> are conflicting encodings, but new ones appearing would likely go
> unnoticed as far as this code is concerned.
I will add
/* No VEX/EVEX encoding. */
if (is_any_vex_encoding (&i.tm))
return 0;
> > +static int
> > +add_branch_prefix_frag_p (void)
> > +{
> > + /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix
> > + to PadLock instructions since they include prefixes in opcode. */
> > + if (!align_branch_power
> > + || now_seg == absolute_section
> > + || i.tm.cpu_flags.bitfield.cpupadlock
>
> I continue to question this: PadLock insns are no different to legacy
> encoded SSE ones, and even some newer GPR ones. Why don't you instead
> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
There are a few i.tm.cpu_flags.bitfield.cpupadlock checks. I want to be
conservative here. PadLock insns should be on dead code path for affected
processors anyway.
> > + || !cpu_arch_flags.bitfield.cpui386)
> > + return 0;
> > +
> > + /* Don't add prefix if it is a prefix or there is no operand. */
> > + if (!i.operands || i.tm.opcode_modifier.isprefix)
> > + return 0;
>
> Why is there not being any operand a reason to avoid adding prefixes?
I am just a little bit conservative.
> > +static int
> > +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> > +{
> > + int add_padding;
> > +
> > + /* NB: Don't work with COND_JUMP86 without i386. */
> > + if (!align_branch_power
> > + || now_seg == absolute_section
> > + || !cpu_arch_flags.bitfield.cpui386)
> > + return 0;
> > +
> > + add_padding = 0;
> > +
> > + /* Check for jcc and direct jmp. */
> > + if (i.tm.opcode_modifier.jump == JUMP)
> > + {
> > + if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> > + {
> > + *branch_p = align_branch_jmp;
> > + add_padding = align_branch & align_branch_jmp_bit;
> > + }
> > + else
> > + {
> > + *branch_p = align_branch_jcc;
> > + if ((align_branch & align_branch_jcc_bit))
>
> Extra pair of parentheses (more similar instances further down
> in this function).
I consider it as a coding style in case a new condition is added.
>
> > @@ -8471,9 +8806,75 @@ output_insn (void)
> > if (j > 15)
> > as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
> > j);
> > + else if (fragP)
> > + {
> > + /* NB: Don't add prefix with GOTPC relocation since
> > + output_disp() above depends on the fixed encoding
> > + length. Can't add prefix with TLS relocation since
> > + it breaks TLS linker optimization. */
> > + unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
> > + /* Prefix count on the current instruction. */
> > + unsigned int count = !!is_any_vex_encoding (&i.tm);
>
> How come VEX and alike count as exactly one byte worth of prefixes?
I will fix it.
Thanks.
--
H.J.