[PR24444] speed up locview resolution wiht relaxable frags

Sun Apr 14 13:22:00 GMT 2019

On Sat, Apr 13, 2019 at 05:55:34AM -0300, Alexandre Oliva wrote:
> 
> Targets such as xtensa incur a much higher overhead to resolve
> location view numbers than e.g. x86, because the expressions used to
> compute view numbers cannot be resolved soon enough.
> 
> Each view number is computed by incrementing the previous view, if
> they are both at the same address, or by resetting it to zero
> otherwise.  If PV is the previous view number, PL is its location, and
> NL is the location of the next view, its number is computed by
> evaluating NV = !(NL > PL) * (PV + 1).
> 
> set_or_check_view uses resolve_expression to decide whether portions
> of this expression can be simplified to constants.  The (NL > PL)
> subexpression is one that can often be resolved to a constant,
> breaking chains of view number computations at instructions of nonzero
> length, but not after alignment that might be unnecessary.
> 
> Alas, when nearly every frag ends with a relaxable instruction,
> frag_offset_fixed_p will correctly fail to determine a known offset
> between two unresolved addresses in neighboring frags, so the
> unresolved symbolic operation will be constructed and used in the
> computation of most view numbers.  This results in very deep
> expressions.
> 
> As view numbers get referenced in location view lists, each operand in
> the list goes through symbol_clone_if_forward_ref, which recurses on
> every subexpression.  If each view number were to be referenced, this
> would exhibit O(n^2) behavior, where n is the depth of the view number
> expressions, i.e., the length of view number sequences without an
> early resolution that cuts the expression short.
> 
> I see two ways to go about this:
> 
> - make forward_ref a tri-state, so that symbol_clone_if_forward_ref
> can avoid revisiting the same nodes over and over when they have
> already been visited and the conclusion was that their expression does
> not contain forward refs.

This doesn't work.  I threw together a quick patch when I saw this bug
report and the result was a reduction in gas run-time from 89 to 80
minutes on the testcase in the PR.  The problem is simply the depth of
the expressions..

> - enable address compares used by view numbering to be resolved even
> when exact offsets are not known, using new logic to determine when
> the location either remained the same or changed for sure, even with
> the possibility of relaxation.

Whereas this approach does in fact result in shorter expressions,
reducing run-time to 3 seconds.

> This patch implements the latter, assuming fr_fix to be the minimum
> size the frag will end up with, even for machine-dependent frags, but
> not rs_org: for those, or for negative fr_fix, it conservatively
> refrains from resolving the expression.  This enables most view number
> expressions to be resolved with a small, reasonable depth.
> 
> Now, is the assumption on fr_fix a reasonable one to make across all
> supported targets?

Yes, I believe so.  In fact, fr_fix is really unsigned.  Also, it is
an error for rs_org frags to go backwards.

> Tested on x86_64-linux-gnu, native and cross to xtensa-elf.  No new
> testcase; the issue was just performance, not output correctness.  Ok to
> install?

Um, the testcase object file after this patch differs from the
original.  First readelf -wi difference shown below.

@@ -109941,7 +109941,7 @@
  <2><3e4f6>: Abbrev Number: 8 (DW_TAG_inlined_subroutine)
     <3e4f7>   DW_AT_abstract_origin: <0x2e531>
     <3e4fb>   DW_AT_entry_pc    : 0x2a
-    <3e4ff>   DW_AT_GNU_entry_view: 9
+    <3e4ff>   DW_AT_GNU_entry_view: 2
     <3e501>   DW_AT_low_pc      : 0x2a
     <3e505>   DW_AT_high_pc     : 0x0
     <3e509>   DW_AT_call_file   : 3


> 
> 
> for  gas/ChangeLog
> 
> 	PR gas/24444
> 	* frags.c (frag_gtoffset_p): New.
> 	* frags.h (frag_gtoffset_p): Declare it.
> 	* expr.c (resolve_expression): Use it.
> ---
>  gas/expr.c  |    5 +++-
>  gas/frags.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gas/frags.h |    2 ++
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/gas/expr.c b/gas/expr.c
> index ee85bda1cc60..3efde88cc0a2 100644
> --- a/gas/expr.c
> +++ b/gas/expr.c
> @@ -2179,7 +2179,10 @@ resolve_expression (expressionS *expressionP)
>  		|| op == O_lt || op == O_le || op == O_ge || op == O_gt)
>  	       && seg_left == seg_right
>  	       && (finalize_syms
> -		   || frag_offset_fixed_p (frag_left, frag_right, &frag_off))
> +		   || frag_offset_fixed_p (frag_left, frag_right, &frag_off)
> +		   || (op == O_gt
> +		       && frag_gtoffset_p (left, frag_left,
> +					   right, frag_right, &frag_off)))
>  	       && (seg_left != reg_section || left == right)
>  	       && (seg_left != undefined_section || add_symbol == op_symbol)))
>  	{
> diff --git a/gas/frags.c b/gas/frags.c
> index 90096ff696d7..e2ff79fbcc04 100644
> --- a/gas/frags.c
> +++ b/gas/frags.c
> @@ -460,3 +460,77 @@ frag_offset_fixed_p (const fragS *frag1, const fragS *frag2, offsetT *offset)
>  
>    return FALSE;
>  }
> +
> +/* Return TRUE if we can determine whether FRAG1 OFF1 appears after
> +   (strict >, not >=) FRAG2 OFF2, assuming it is not before.  Set
> +   *OFFSET so that resolve_expression will resolve an O_gt operation
> +   between them to false (0) if they are guaranteed to be at the same
> +   location, or to true (-1) if they are guaranteed to be at different
> +   locations.  Return FALSE conservatively, e.g. if neither result can
> +   be guaranteed (yet).
> +
> +   They are known to be in the same segment, and not the same frag
> +   (this is a fallback for frag_offset_fixed_p, that always takes care
> +   of this case), and it is expected (from the uses this is designed
> +   to simplify, namely location view increments) that frag1 is
> +   reachable from frag2 following the fr_next links, rather than the
> +   other way round.  */

For the sake of anyone reading this code, please swap the param names,
with frag1/off1 being the first in the frag chain.

> +
> +bfd_boolean
> +frag_gtoffset_p (valueT off1, const fragS *frag1,
> +		 valueT off2, const fragS *frag2, offsetT *offset)
> +{
> +  if (frag1 == frag2 || off2 > (valueT)frag2->fr_fix)
> +    return FALSE;
> +
> +  /* If any frag from frag2, inclusive, to frag1, exclusive, has a
> +     nonzero fixed length, the addresses won't be the same as long as
> +     we reach frag1.  */
> +  bfd_boolean maybe_same_addr = off1 == 0 && off2 == (valueT)frag2->fr_fix;

Ins't it true that if the symbol offsets are not at exactly the above
values, then you already know the result and there's no need to
traverse the frags?  This is assuming you can't .org backwards, which
is the case.

> +  bfd_boolean found_varlength = FALSE;
> +  const fragS *frag = frag2;
> +  for (;;)
> +    {
> +      switch (frag->fr_type)
> +	{
> +	case rs_dummy:
> +	  break;
> +
> +	case rs_fill:
> +	case rs_align:
> +	case rs_align_code:
> +	case rs_align_test:
> +	case rs_space:
> +	case rs_space_nop:
> +	case rs_fill_nop:
> +	case rs_leb128:
> +	case rs_cfa:
> +	case rs_dwarf2dbg:
> +	case rs_machine_dependent:
> +	  found_varlength = TRUE;
> +	  break;
> +
> +	default:
> +     /* case rs_broken_word: */
> +	case rs_org:
> +	  return FALSE;
> +	}
> +
> +      frag = frag->fr_next;
> +      if (frag == frag1 || !frag)
> +	break;
> +      if (maybe_same_addr && frag->fr_fix > 0)
> +	maybe_same_addr = FALSE;
> +      if (frag->fr_fix < 0)
> +	return FALSE;
> +    }
> +
> +  /* If both are TRUE, we don't know whether or not it will resolve to
> +     the same address.  */
> +  if (maybe_same_addr && found_varlength)
> +    return FALSE;
> +
> +  if (frag)
> +    *offset = (off1 - off2 - !maybe_same_addr) * OCTETS_PER_BYTE;
> +  return !!frag;
> +}
> diff --git a/gas/frags.h b/gas/frags.h
> index 6ac5d4426c79..9eb066229085 100644
> --- a/gas/frags.h
> +++ b/gas/frags.h
> @@ -154,6 +154,8 @@ char *frag_var (relax_stateT type,
>  		char *opcode);
>  
>  bfd_boolean frag_offset_fixed_p (const fragS *, const fragS *, offsetT *);
> +bfd_boolean frag_gtoffset_p (valueT, const fragS *, valueT, const fragS *,
> +			     offsetT *);
>  
>  int get_frag_count (void);
>  void clear_frag_count (void);

The following makes the changes I'm suggesting, and simplifies a few
more things.  The object file for the testcase is the same as with
your patch.


/* Return TRUE if we can determine whether FRAG2 OFF2 appears after
   (strict >, not >=) FRAG1 OFF1, assuming it is not before.  Set
   *OFFSET so that resolve_expression will resolve an O_gt operation
   between them to false (0) if they are guaranteed to be at the same
   location, or to true (-1) if they are guaranteed to be at different
   locations.  Return FALSE conservatively, e.g. if neither result can
   be guaranteed (yet).

   They are known to be in the same segment, and not the same frag
   (this is a fallback for frag_offset_fixed_p, that always takes care
   of this case), and it is expected (from the uses this is designed
   to simplify, namely location view increments) that frag2 is
   reachable from frag1 following the fr_next links, rather than the
   other way round.  */

bfd_boolean
frag_gtoffset_p (valueT off2, const fragS *frag2,
		 valueT off1, const fragS *frag1, offsetT *offset)
{
  /* Insanity check.  */
  if (frag2 == frag1 || off1 > (valueT) frag1->fr_fix)
    return FALSE;

  /* If the first symbol offset is at the end of the first frag and
     the second symbol offset at the beginning of the second frag then
     it is possible they are at the same address.  Go looking for a
     non-zero fr_fix in any frag between these frags.  If found then
     we can say the O_gt result will be true.  If no such frag is
     found we assume that frag1 or any of the following frags might
     have a variable tail and thus the answer is unknown.  This isn't
     strictly true; some frags don't have a variable tail, but it
     doesn't seem worth optimizing for those cases.
     We could omit the loop on off1 != frag1->fr_fix || off2 != 0  and
     exit the loop on finding a non-zero fr_fix, but loop until we hit
     frag2 to check our assumptions.  */
  const fragS *frag = frag1;
  offsetT delta = off2 - off1;
  for (;;)
    {
      delta += frag->fr_fix;
      frag = frag->fr_next;
      if (frag == frag2)
	{
	  if (delta == 0)
	    return FALSE;
	  break;
	}
      /* Insanity check.  */
      if (frag == NULL)
	return FALSE;
    }

  *offset = (off2 - off1 - 1) * OCTETS_PER_BYTE;
  return TRUE;
}

-- 
Alan Modra
Australia Development Lab, IBM