This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: V3 [PATCH 2/4] i386: Align branches within a fixed boundary


On 06.12.2019 20:10,  H.J. Lu  wrote:
> Add 3 command-line options to align branches within a fixed boundary
> with segment prefixes or NOPs:
> 
> 1. -malign-branch-boundary=NUM aligns branches within NUM byte boundary.
> 2. -malign-branch=TYPE[+TYPE...] specifies types of branches to align.
> The supported branches are:
>   a. Conditional jump.
>   b. Fused conditional jump.
>   c. Unconditional jump.
>   d. Call.
>   e. Ret.
>   f. Indirect jump and call.
> 3. -malign-branch-prefix-size=NUM aligns branches with NUM segment
> prefixes per instruction.
> 
> 3 new rs_machine_dependent frag types are added:
> 
> 1. BRANCH_PADDING.  The variable size frag to insert NOP before branch.
> 2. BRANCH_PREFIX.  The variable size frag to insert segment prefixes to
> an instruction.  The choices of prefixes are:
>    a. Use the existing segment prefix if there is one.
>    b. Use CS segment prefix in 64-bit mode.
>    c. In 32-bit mode, use SS segment prefix with ESP/EBP base register
>    and use DS segment prefix without ESP/EBP base register.
> 3. FUSED_JCC_PADDING.  The variable size frag to insert NOP before fused
> conditional jump.
> 
> The new rs_machine_dependent frags aren't inserted if the previous item
> is a prefix or a constant directive, which may be used to hardcode an
> instruction, since there is no clear instruction boundary.  Segment
> prefixes and NOP padding are disabled before relaxable TLS relocations
> and tls_get_addr calls to keep TLS instruction sequence unchanged.
> 
> md_estimate_size_before_relax() and i386_generic_table_relax_frag() are
> used to handled BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags.
> i386_generic_table_relax_frag() grows or shrinks sizes of segment prefix
> and NOP to align the next branch frag:
> 
> 1. First try to add segment prefixes to instructions before a branch.
> 2. If there is no sufficient room to add segment prefixes, NOP will be
> inserted before a branch.
> 
> 	* config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc.
> 	(tls_get_addr): New.
> 	(last_insn): New.
> 	(align_branch_power): New.
> 	(align_branch_kind): New.
> 	(align_branch_bit): New.
> 	(align_branch): New.
> 	(MAX_FUSED_JCC_PADDING_SIZE): New.
> 	(align_branch_prefix_size): New.
> 	(BRANCH_PADDING): New.
> 	(BRANCH_PREFIX): New.
> 	(FUSED_JCC_PADDING): New.
> 	(i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING.
> 	(md_begin): Abort if align_branch_prefix_size <
> 	MAX_FUSED_JCC_PADDING_SIZE.
> 	(md_assemble): Set last_insn.
> 	(maybe_fused_with_jcc_p): New.
> 	(add_fused_jcc_padding_frag_p): New.
> 	(add_branch_prefix_frag_p): New.
> 	(add_branch_padding_frag_p): New.
> 	(output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or
> 	BRANCH_PREFIX frag and terminate each frag to align branches.
> 	(output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and
> 	relaxable TLS relocations.
> 	(output_imm): Likewise.
> 	(i386_next_non_empty_frag): New.
> 	(i386_next_jcc_frag): New.
> 	(i386_classify_machine_dependent_frag): New.
> 	(i386_branch_padding_size): New.
> 	(i386_generic_table_relax_frag): New.
> 	(md_estimate_size_before_relax): Handle COND_JUMP_PADDING,
> 	FUSED_JCC_PADDING and COND_JUMP_PREFIX frags.
> 	(md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and
> 	FUSED_JCC_PADDING frags.
> 	(OPTION_MALIGN_BRANCH_BOUNDARY): New.
> 	(OPTION_MALIGN_BRANCH_PREFIX_SIZE): New.
> 	(OPTION_MALIGN_BRANCH): New.
> 	(md_longopts): Add -malign-branch-boundary=,
> 	-malign-branch-prefix-size= and -malign-branch=.
> 	(md_parse_option): Handle -malign-branch-boundary=,
> 	-malign-branch-prefix-size= and -malign-branch=.
> 	(md_show_usage): Display -malign-branch-boundary=,
> 	-malign-branch-prefix-size= and -malign-branch=.
> 	(i386_target_format): Set tls_get_addr.
> 	(i386_cons_align): New.
> 	* config/tc-i386.h (i386_cons_align): New.
> 	(md_cons_align): New.
> 	(i386_generic_table_relax_frag): New.
> 	(md_generic_table_relax_frag): New.
> 	(i386_tc_frag_data): Add u, padding_address, length,
> 	max_prefix_length, prefix_length, default_prefix, cmp_size,
> 	classified and branch_type.
> 	(TC_FRAG_INIT): Initialize u, padding_address, length,
> 	max_prefix_length, prefix_length, default_prefix, cmp_size,
> 	classified and branch_type.
> 	* doc/c-i386.texi: Document -malign-branch-boundary=,
> 	-malign-branch= and -malign-branch-prefix-size=.
> ---
>  gas/config/tc-i386.c | 1046 +++++++++++++++++++++++++++++++++++++++++-
>  gas/config/tc-i386.h |   31 ++
>  gas/doc/c-i386.texi  |   26 ++
>  3 files changed, 1100 insertions(+), 3 deletions(-)
> 
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index b62af34268..0ab6651f24 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -368,6 +368,9 @@ struct _i386_insn
>      /* Has ZMM register operands.  */
>      bfd_boolean has_regzmm;
>  
> +    /* Has GOTPC or TLS relocation.  */
> +    bfd_boolean has_gotpc_tls_reloc;
> +
>      /* RM and SIB are the modrm byte and the sib byte where the
>         addressing modes of this insn are encoded.  */
>      modrm_byte rm;
> @@ -562,6 +565,8 @@ static enum flag_code flag_code;
>  static unsigned int object_64bit;
>  static unsigned int disallow_64bit_reloc;
>  static int use_rela_relocations = 0;
> +/* __tls_get_addr/___tls_get_addr symbol for TLS.  */
> +static const char *tls_get_addr;
>  
>  #if ((defined (OBJ_MAYBE_COFF) && defined (OBJ_MAYBE_AOUT)) \
>       || defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF) \
> @@ -622,6 +627,21 @@ static int omit_lock_prefix = 0;
>     "lock addl $0, (%{re}sp)".  */
>  static int avoid_fence = 0;
>  
> +/* Type of the previous instruction.  */
> +static struct
> +  {
> +    segT seg;
> +    const char *file;
> +    const char *name;
> +    unsigned int line;
> +    enum last_insn_kind
> +      {
> +	last_insn_other = 0,
> +	last_insn_directive,
> +	last_insn_prefix
> +      } kind;
> +  } last_insn;
> +
>  /* 1 if the assembler should generate relax relocations.  */
>  
>  static int generate_relax_relocations
> @@ -635,6 +655,44 @@ static enum check_kind
>    }
>  sse_check, operand_check = check_warning;
>  
> +/* Non-zero if branches should be aligned within power of 2 boundary.  */
> +static int align_branch_power = 0;
> +
> +/* Types of branches to align.  */
> +enum align_branch_kind
> +  {
> +    align_branch_none = 0,
> +    align_branch_jcc = 1,
> +    align_branch_fused = 2,
> +    align_branch_jmp = 3,
> +    align_branch_call = 4,
> +    align_branch_indirect = 5,
> +    align_branch_ret = 6
> +  };
> +
> +/* Type bits of branches to align.  */
> +enum align_branch_bit
> +  {
> +    align_branch_jcc_bit = 1 << align_branch_jcc,
> +    align_branch_fused_bit = 1 << align_branch_fused,
> +    align_branch_jmp_bit = 1 << align_branch_jmp,
> +    align_branch_call_bit = 1 << align_branch_call,
> +    align_branch_indirect_bit = 1 << align_branch_indirect,
> +    align_branch_ret_bit = 1 << align_branch_ret
> +  };
> +
> +static unsigned int align_branch = (align_branch_jcc_bit
> +				    | align_branch_fused_bit
> +				    | align_branch_jmp_bit);
> +
> +/* The maximum padding size for fused jcc.  CMP like instruction can
> +   be 9 bytes and jcc can be 6 bytes.  Leave room just in case for
> +   prefixes.   */
> +#define MAX_FUSED_JCC_PADDING_SIZE 20
> +
> +/* The maximum number of prefixes added for an instruction.  */
> +static unsigned int align_branch_prefix_size = 5;
> +
>  /* Optimization:
>     1. Clear the REX_W bit with register operand if possible.
>     2. Above plus use 128bit vector instruction to clear the full vector
> @@ -738,12 +796,19 @@ int x86_cie_data_alignment;
>  /* Interface to relax_segment.
>     There are 3 major relax states for 386 jump insns because the
>     different types of jumps add different sizes to frags when we're
> -   figuring out what sort of jump to choose to reach a given label.  */
> +   figuring out what sort of jump to choose to reach a given label.
> +
> +   BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING are used to align
> +   branches which are handled by md_estimate_size_before_relax() and
> +   i386_generic_table_relax_frag().  */
>  
>  /* Types.  */
>  #define UNCOND_JUMP 0
>  #define COND_JUMP 1
>  #define COND_JUMP86 2
> +#define BRANCH_PADDING 3
> +#define BRANCH_PREFIX 4
> +#define FUSED_JCC_PADDING 5
>  
>  /* Sizes.  */
>  #define CODE16	1
> @@ -1384,6 +1449,12 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit)
>      case rs_fill_nop:
>      case rs_align_code:
>        break;
> +    case rs_machine_dependent:
> +      /* Allow NOP padding for jumps and calls.  */
> +      if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING
> +	  || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING)
> +	break;
> +      /* Fall through.  */
>      default:
>        return;
>      }
> @@ -1528,7 +1599,7 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit)
>  	  return;
>  	}
>      }
> -  else
> +  else if (fragP->fr_type != rs_machine_dependent)
>      fragP->fr_var = count;
>  
>    if ((count / max_single_nop_size) > max_number_of_nops)
> @@ -3011,6 +3082,11 @@ md_begin (void)
>        x86_dwarf2_return_column = 8;
>        x86_cie_data_alignment = -4;
>      }
> +
> +  /* NB: FUSED_JCC_PADDING frag must have sufficient room so that it
> +     can be turned into BRANCH_PREFIX frag.  */
> +  if (align_branch_prefix_size > MAX_FUSED_JCC_PADDING_SIZE)
> +    abort ();
>  }
>  
>  void
> @@ -4536,6 +4612,17 @@ md_assemble (char *line)
>  
>    /* We are ready to output the insn.  */
>    output_insn ();
> +
> +  last_insn.seg = now_seg;
> +
> +  if (i.tm.opcode_modifier.isprefix)
> +    {
> +      last_insn.kind = last_insn_prefix;
> +      last_insn.name = i.tm.name;
> +      last_insn.file = as_where (&last_insn.line);
> +    }
> +  else
> +    last_insn.kind = last_insn_other;
>  }
>  
>  static char *
> @@ -8193,11 +8280,206 @@ encoding_length (const fragS *start_frag, offsetT start_off,
>    return len - start_off + (frag_now_ptr - frag_now->fr_literal);
>  }
>  
> +/* Return 1 for test, and, cmp, add, sub, inc and dec which may
> +   be macro-fused with conditional jumps.  */
> +
> +static int
> +maybe_fused_with_jcc_p (void)
> +{
> +  /* No RIP address.  */
> +  if (i.base_reg && i.base_reg->reg_num == RegIP)
> +    return 0;
> +
> +  /* No VEX/EVEX encoding.  */
> +  if (is_any_vex_encoding (&i.tm))
> +    return 0;
> +
> +  /* and, add, sub with destination register.  */
> +  if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
> +      || i.tm.base_opcode <= 5
> +      || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
> +      || ((i.tm.base_opcode | 3) == 0x83
> +	  && ((i.tm.extension_opcode | 1) == 0x5

One more minor suggestion to reduce the number of branches to
result from this code: Just like the "| 1" here, couldn't you

  if (((i.tm.base_opcode | 8) >= 0x28 && (i.tm.base_opcode | 8) <= 0x2d)

> +	      || i.tm.extension_opcode == 0x0)))
> +    return (i.types[1].bitfield.class == Reg
> +	    || i.types[1].bitfield.instance == Accum);
> +
> +  /* test, cmp with any register.  */
> +  if ((i.tm.base_opcode | 1) == 0x85
> +      || (i.tm.base_opcode | 1) == 0xa9
> +      || ((i.tm.base_opcode | 1) == 0xf7
> +	  && i.tm.extension_opcode == 0)
> +      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
> +      || ((i.tm.base_opcode | 3) == 0x83
> +	  && (i.tm.extension_opcode == 0x7)))
> +    return (i.types[0].bitfield.class == Reg
> +	    || i.types[0].bitfield.instance == Accum
> +	    || i.types[1].bitfield.class == Reg
> +	    || i.types[1].bitfield.instance == Accum);
> +
> +  /* inc, dec with any register.   */
> +  if ((i.tm.cpu_flags.bitfield.cpuno64
> +       && (i.tm.base_opcode | 0xf) == 0x4f)
> +      || ((i.tm.base_opcode | 1) == 0xff
> +	  && (i.tm.extension_opcode | 1) == 0x1))

Maybe

	  && i.tm.extension_opcode <= 0x1)

?

> +/* Return 1 if a BRANCH_PREFIX frag should be generated.  */
> +
> +static int
> +add_branch_prefix_frag_p (void)
> +{
> +  /* NB: Don't work with COND_JUMP86 without i386.  Don't add prefix
> +     to PadLock instructions since they include prefixes in opcode.  */
> +  if (!align_branch_power
> +      || !align_branch_prefix_size
> +      || now_seg == absolute_section
> +      || i.tm.cpu_flags.bitfield.cpupadlock

Didn't you confirm you'd take care of here of other insns than just
the PadLock ones including prefixes in their base_opcode? (I still
don't see what's wrong in this case, as you only mean to add
segment overrides, but that's a separate aspect.)

> @@ -8473,9 +8815,105 @@ output_insn (void)
>  	  if (j > 15)
>  	    as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
>  		     j);
> +	  else if (fragP)
> +	    {
> +	      /* NB: Don't add prefix with GOTPC relocation since
> +		 output_disp() above depends on the fixed encoding
> +		 length.  Can't add prefix with TLS relocation since
> +		 it breaks TLS linker optimization.  */
> +	      unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
> +	      /* Prefix count on the current instruction.  */
> +	      unsigned int count = i.vex.length;
> +	      unsigned int k;
> +	      for (k = 0; k < ARRAY_SIZE (i.prefix); k++)
> +		/* REX byte is encoded in VEX/EVEX prefix.  */
> +		if (i.prefix[k] && (k != REX_PREFIX || !i.vex.length))
> +		  count++;
> +
> +	      /* Count SSE prefix.  */
> +	      if (!i.vex.length)
> +		switch (i.tm.opcode_length)
> +		  {
> +		  case 3:
> +		    if (((i.tm.base_opcode >> 16) & 0xff) == 0xf)
> +		      {
> +			count++;
> +			switch ((i.tm.base_opcode >> 8) & 0xff)
> +			  {
> +			  case 0x38:
> +			  case 0x3a:
> +			    count++;
> +			    break;
> +			  default:
> +			    break;
> +			  }
> +		      }
> +		    break;
> +		  case 2:
> +		    if (((i.tm.base_opcode >> 8) & 0xff) == 0xf)
> +		      count++;

In particular (but not only) for this case the "SSE" in the comment
looks sufficiently misleading. How about "extended opcode maps"?

Jan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]