This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/4] i386: Align branches within a fixed boundary


On 03.12.2019 20:02,  H.J. Lu  wrote:
> +static int
> +maybe_fused_with_jcc_p (void)
> +{
> +  /* No RIP address.  */
> +  if (i.base_reg && i.base_reg->reg_num == RegIP)
> +    return 0;
> +
> +  /* and, add, sub with destination register.  */
> +  if ((i.tm.base_opcode | 5) == 0x25
> +      || (i.tm.base_opcode | 5) == 0x5
> +      || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)

Why not the same "| 5" approach here as above? And why then not
further fold AND and SUB (using "| 0xd" along the lines of ...

> +      || ((i.tm.base_opcode | 3) == 0x83
> +	  && ((i.tm.extension_opcode | 1) == 0x5

... this?

> +	      || i.tm.extension_opcode == 0x0)))
> +    return (i.types[1].bitfield.class == Reg
> +	    || i.types[1].bitfield.instance == Accum);
> +
> +  /* test, cmp with any register.  */
> +  if ((i.tm.base_opcode | 1) == 0x85
> +      || (i.tm.base_opcode | 1) == 0xa9
> +      || (i.tm.base_opcode | 1) == 0xf7

This needs further qualification by extension opcode, to avoid
covering not, neg, mul, etc.

> +      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)

Along the lines of the above use "| 7" here, seeing that ...

> +      || ((i.tm.base_opcode | 3) == 0x83
> +	  && (i.tm.extension_opcode == 0x7)))
> +    return (i.types[0].bitfield.class == Reg
> +	    || i.types[0].bitfield.instance == Accum
> +	    || i.types[1].bitfield.class == Reg
> +	    || i.types[1].bitfield.instance == Accum);

... this will also be correct (returning 0) for AAS?

Also, for any of the checks here (and just like now done in
add_branch_padding_frag_p()) - wouldn't you better exclude VEX
and alike encoding up front. I didn't check whether there currently
are conflicting encodings, but new ones appearing would likely go
unnoticed as far as this code is concerned.

> +static int
> +add_branch_prefix_frag_p (void)
> +{
> +  /* NB: Don't work with COND_JUMP86 without i386.  Don't add prefix
> +     to PadLock instructions since they include prefixes in opcode.  */
> +  if (!align_branch_power
> +      || now_seg == absolute_section
> +      || i.tm.cpu_flags.bitfield.cpupadlock

I continue to question this: PadLock insns are no different to legacy
encoded SSE ones, and even some newer GPR ones. Why don't you instead
check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?

> +      || !cpu_arch_flags.bitfield.cpui386)
> +    return 0;
> +
> +  /* Don't add prefix if it is a prefix or there is no operand.  */
> +  if (!i.operands || i.tm.opcode_modifier.isprefix)
> +    return 0;

Why is there not being any operand a reason to avoid adding prefixes?

> +static int
> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
> +{
> +  int add_padding;
> +
> +  /* NB: Don't work with COND_JUMP86 without i386.  */
> +  if (!align_branch_power
> +      || now_seg == absolute_section
> +      || !cpu_arch_flags.bitfield.cpui386)
> +    return 0;
> +
> +  add_padding = 0;
> +
> +  /* Check for jcc and direct jmp.  */
> +  if (i.tm.opcode_modifier.jump == JUMP)
> +    {
> +      if (i.tm.base_opcode == JUMP_PC_RELATIVE)
> +	{
> +	  *branch_p = align_branch_jmp;
> +	  add_padding = align_branch & align_branch_jmp_bit;
> +	}
> +      else
> +	{
> +	  *branch_p = align_branch_jcc;
> +	  if ((align_branch & align_branch_jcc_bit))

Extra pair of parentheses (more similar instances further down
in this function).

> @@ -8471,9 +8806,75 @@ output_insn (void)
>  	  if (j > 15)
>  	    as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
>  		     j);
> +	  else if (fragP)
> +	    {
> +	      /* NB: Don't add prefix with GOTPC relocation since
> +		 output_disp() above depends on the fixed encoding
> +		 length.  Can't add prefix with TLS relocation since
> +		 it breaks TLS linker optimization.  */
> +	      unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
> +	      /* Prefix count on the current instruction.  */
> +	      unsigned int count = !!is_any_vex_encoding (&i.tm);

How come VEX and alike count as exactly one byte worth of prefixes?

Jan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]