This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH 2/4] i386: Align branches within a fixed boundary
On 03.12.2019 20:02, H.J. Lu wrote:
> +static int
> +maybe_fused_with_jcc_p (void)
> +{
> + /* No RIP address. */
> + if (i.base_reg && i.base_reg->reg_num == RegIP)
> + return 0;
> +
> + /* and, add, sub with destination register. */
> + if ((i.tm.base_opcode | 5) == 0x25
> + || (i.tm.base_opcode | 5) == 0x5
> + || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
Why not the same "| 5" approach here as above? And why then not
further fold AND and SUB (using "| 0xd" along the lines of ...
> + || ((i.tm.base_opcode | 3) == 0x83
> + && ((i.tm.extension_opcode | 1) == 0x5
... this?
> + || i.tm.extension_opcode == 0x0)))
> + return (i.types[1].bitfield.class == Reg
> + || i.types[1].bitfield.instance == Accum);
> +
> + /* test, cmp with any register. */
> + if ((i.tm.base_opcode | 1) == 0x85
> + || (i.tm.base_opcode | 1) == 0xa9
> + || (i.tm.base_opcode | 1) == 0xf7
This needs further qualification by extension opcode, to avoid
covering not, neg, mul, etc.
> + || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
Along the lines of the above use "| 7" here, seeing that ...
> + || ((i.tm.base_opcode | 3) == 0x83
> + && (i.tm.extension_opcode == 0x7)))
> + return (i.types[0].bitfield.class == Reg
> + || i.types[0].bitfield.instance == Accum
> + || i.types[1].bitfield.class == Reg
> + || i.types[1].bitfield.instance == Accum);
... this will also be correct (returning 0) for AAS?
Also, for any of the checks here (and just like now done in
add_branch_padding_frag_p()) - wouldn't you better exclude VEX
and alike encoding up front. I didn't check whether there currently
are conflicting encodings, but new ones appearing would likely go
unnoticed as far as this code is concerned.
> +static int
> +add_branch_prefix_frag_p (void)
> +{
> + /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix
> + to PadLock instructions since they include prefixes in opcode. */
> + if (!align_branch_power
> + || now_seg == absolute_section
> + || i.tm.cpu_flags.bitfield.cpupadlock
I continue to question this: PadLock insns are no different to legacy
encoded SSE ones, and even some newer GPR ones. Why don't you instead
check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
> + || !cpu_arch_flags.bitfield.cpui386)
> + return 0;
> +
> + /* Don't add prefix if it is a prefix or there is no operand. */
> + if (!i.operands || i.tm.opcode_modifier.isprefix)
> + return 0;
Why is there not being any operand a reason to avoid adding prefixes?
> +static int
> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> +{
> + int add_padding;
> +
> + /* NB: Don't work with COND_JUMP86 without i386. */
> + if (!align_branch_power
> + || now_seg == absolute_section
> + || !cpu_arch_flags.bitfield.cpui386)
> + return 0;
> +
> + add_padding = 0;
> +
> + /* Check for jcc and direct jmp. */
> + if (i.tm.opcode_modifier.jump == JUMP)
> + {
> + if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> + {
> + *branch_p = align_branch_jmp;
> + add_padding = align_branch & align_branch_jmp_bit;
> + }
> + else
> + {
> + *branch_p = align_branch_jcc;
> + if ((align_branch & align_branch_jcc_bit))
Extra pair of parentheses (more similar instances further down
in this function).
> @@ -8471,9 +8806,75 @@ output_insn (void)
> if (j > 15)
> as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
> j);
> + else if (fragP)
> + {
> + /* NB: Don't add prefix with GOTPC relocation since
> + output_disp() above depends on the fixed encoding
> + length. Can't add prefix with TLS relocation since
> + it breaks TLS linker optimization. */
> + unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
> + /* Prefix count on the current instruction. */
> + unsigned int count = !!is_any_vex_encoding (&i.tm);
How come VEX and alike count as exactly one byte worth of prefixes?
Jan