This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [RFA] rs6000-tdep.c: Improve prologue handling with code motion.


On Nov 10,  1:26pm, Peter.Schauer wrote:

> I was also trying to minimize the risk of overscan and performance impacts.
[...]
> I don't know if we should include the find_pc_line test or not.

I have an alternate proposal (that uses find_pc_line) which attempts
to limit the possibility of overscan.  (See below.)

> Would you be willing to run the GDB testsuite with your version of the
> compiler, using -O2 ?
> It will almost certainly cause many failures, but it would be interesting,
> if you get better results with my original patch, or still even better
> results when you leave out the find_pc_line test.

This was certainly an interesting exercise, but the results are
inconclusive.  I do get slightly better results when I use your
patch *with* the find_pc_line test.  Here are the numerical
results:

(a) FAILs w/ patch with find_pc_line:        421
(b) FAILs w/ patch without find_pc_line:     422
(c) FAILs w/ limit refinement patch (below): 422

(There's some non-determinism in one of the threads tests; Sometimes
one additional failure would appear.)

When I compare the specific FAILures, I find that the FAILures for
(b) and (c) to be identical.  However, I think (c) performs better
and I would expect there to be cases where (c) would result in a
better prologue analysis due to the fact that it will prevent an
overscan.

The results are interesting when you compare the specific FAILs in (a)
and (b).  These tests do have some FAILs in common, but a large number
of them are different.  I.e, there were a lot of tests which got a
PASS for (a) and a FAIL for (b) and vice versa.

I will now describe two of the more interesting cases.

Case 1: gdb.base/break.exp: run until file:function(6) breakpoint

This FAILs with (b) and (c), but passes with (a).  The reason for
(b) and (c) FAILing is that the breakpoint gets set on an instruction
corresponding to line 97 in break.c instead of line 96 which is
expected by the test suite.

The generated code for factorial looks like this:

0x100005f8 <factorial>: stwu    r1,-16(r1)
0x100005fc <factorial+4>:       mflr    r0
0x10000600 <factorial+8>:       stw     r31,12(r1)
0x10000604 <factorial+12>:      mr      r31,r3
0x10000608 <factorial+16>:      cmpwi   r31,1
0x1000060c <factorial+20>:      stw     r0,20(r1)
0x10000610 <factorial+24>:      addi    r3,r31,-1
0x10000614 <factorial+28>:      ble     0x10000624 <factorial+44>
0x10000618 <factorial+32>:      crclr   4*cr1+eq
0x1000061c <factorial+36>:      bl      0x100005f8 <factorial>
0x10000620 <factorial+40>:      mullw   r31,r31,r3
0x10000624 <factorial+44>:      lwz     r0,20(r1)
0x10000628 <factorial+48>:      mr      r3,r31
0x1000062c <factorial+52>:      lwz     r31,12(r1)
0x10000630 <factorial+56>:      mtlr    r0
0x10000634 <factorial+60>:      addi    r1,r1,16
0x10000638 <factorial+64>:      blr

(b) and (c) put the breakpoint for "break break.c:factorial" on
factorial+24.  In my opinion, this is correct since the stw
instruction at factorial+20 is actually the last instruction in the
prologue.  Note also that factorial+24 corresponds to line 97, but
that factorial+28 corresponds to line 96.

(a) puts the breakpoint on factorial+16.  This instruction corresponds
to the "if" statement (line 96), but is before the end of the prologue.

So... even though we have a failure here where we had none before,
I think the breakpoint placement is preferable.

Case 2: gdb.base/sizeof.exp: check sizeof char == 1

In this case, a breakpoint is placed on main.  (b) and (c) PASS this
test whereas (a) FAILs it.

The code in question looks like this:

0x100004a4 <main>:      lis     r3,4096
0x100004a8 <main+4>:    stwu    r1,-16(r1)
0x100004ac <main+8>:    mflr    r0
0x100004b0 <main+12>:   li      r4,1
0x100004b4 <main+16>:   addi    r3,r3,1616
0x100004b8 <main+20>:   stw     r0,20(r1)
0x100004bc <main+24>:   crclr   4*cr1+eq
0x100004c0 <main+28>:   bl      0x100108a0 <printf>
...

Note that the very first instruction of the function *is not*
part of the prologue!

That being the case, (a) puts its breakpoint on 0x100004a4 <main>,
whereas (b) and (c) put the breakpoint on 0x100004bc <main+24> which
is just after the last prologue instruction.  Even if you disagree
with me on case 1, this breakpoint is clearly preferable for case 2.

[...]
> On the other hand we have prove that the patch doesn't introduce any
> testsuite regressions, even with -O2.

Hmm.  I don't see how this is going to be possible.  I examined
quite a few of the test cases and there were some where it was
shear luck that (a) passed some of the tests that (b) and (c) had
failed upon.

I do agree though that there should be no regressions when we
take the optimizer out of the picture.  I've tested for such
and have found none.

Here's a combined patch:

	* rs6000-tdep.c (refine_prolog_limit): New function.
	(skip_prologue): Adjust lim_pc by calling refine_prologue_limit.
	Also, adjust fencepost error regarding the limit in the loop.

	From Peter Schauer:
	* rs6000-tdep.c (skip_prologue):  Handle optimizer code motions into
	the prologue by continuing the prologue search, if we have no valid
	frame yet or if the return address is not yet saved in the frame.

Index: rs6000-tdep.c
===================================================================
RCS file: /cvs/src/src/gdb/rs6000-tdep.c,v
retrieving revision 1.16
diff -u -p -r1.16 rs6000-tdep.c
--- rs6000-tdep.c	2000/11/09 09:49:00	1.16
+++ rs6000-tdep.c	2000/11/14 02:32:59
@@ -380,11 +380,63 @@ rs6000_software_single_step (unsigned in
 
 #define GET_SRC_REG(x) (((x) >> 21) & 0x1f)
 
+/* Limit the number of skipped non-prologue instructions, as the examining
+   of the prologue is expensive.  */
+static int max_skip_non_prologue_insns = 10;
+
+/* Given PC representing the starting address of a function, and
+   LIM_PC which is the (sloppy) limit to which to scan when looking
+   for a prologue, attempt to further refine this limit by using
+   the line data in the symbol table.  If successful, a better guess
+   on where the prologue ends is returned, otherwise the previous
+   value of lim_pc is returned.  */
 static CORE_ADDR
+refine_prologue_limit (CORE_ADDR pc, CORE_ADDR lim_pc)
+{
+  struct symtab_and_line prologue_sal;
+
+  prologue_sal = find_pc_line (pc, 0);
+  if (prologue_sal.line != 0)
+    {
+      int i;
+      CORE_ADDR addr = prologue_sal.end;
+
+      /* Handle the case in which compiler's optimizer/scheduler
+         has moved instructions into the prologue.  We scan ahead
+	 in the function looking for address ranges whose corresponding
+	 line number is less than or equal to the first one that we
+	 found for the function.  (It can be less than when the
+	 scheduler puts a body instruction before the first prologue
+	 instruction.)  */
+      for (i = 2 * max_skip_non_prologue_insns; 
+           i > 0 && (lim_pc == 0 || addr < lim_pc);
+	   i--)
+        {
+	  struct symtab_and_line sal;
+
+	  sal = find_pc_line (addr, 0);
+	  if (sal.line == 0)
+	    break;
+	  if (sal.line <= prologue_sal.line 
+	      && sal.symtab == prologue_sal.symtab)
+	    {
+	      prologue_sal = sal;
+	    }
+	  addr = sal.end;
+	}
+
+      if (lim_pc == 0 || prologue_sal.end < lim_pc)
+	lim_pc = prologue_sal.end;
+    }
+  return lim_pc;
+}
+
+
+static CORE_ADDR
 skip_prologue (CORE_ADDR pc, CORE_ADDR lim_pc, struct rs6000_framedata *fdata)
 {
   CORE_ADDR orig_pc = pc;
-  CORE_ADDR last_prologue_pc;
+  CORE_ADDR last_prologue_pc = pc;
   char buf[4];
   unsigned long op;
   long offset = 0;
@@ -394,6 +446,9 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
   int framep = 0;
   int minimal_toc_loaded = 0;
   int prev_insn_was_prologue_insn = 1;
+  int num_skip_non_prologue_insns = 0;
+
+  lim_pc = refine_prologue_limit (pc, lim_pc);
 
   memset (fdata, 0, sizeof (struct rs6000_framedata));
   fdata->saved_gpr = -1;
@@ -402,19 +457,22 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
   fdata->frameless = 1;
   fdata->nosavedpc = 1;
 
-  pc -= 4;
-  while (lim_pc == 0 || pc < lim_pc - 4)
+  for (;; pc += 4)
     {
-      pc += 4;
-
       /* Sometimes it isn't clear if an instruction is a prologue
          instruction or not.  When we encounter one of these ambiguous
 	 cases, we'll set prev_insn_was_prologue_insn to 0 (false).
 	 Otherwise, we'll assume that it really is a prologue instruction. */
       if (prev_insn_was_prologue_insn)
 	last_prologue_pc = pc;
+
+      /* Stop scanning if we've hit the limit.  */
+      if (lim_pc != 0 && pc >= lim_pc)
+	break;
+
       prev_insn_was_prologue_insn = 1;
 
+      /* Fetch the instruction and convert it to an integer.  */
       if (target_read_memory (pc, buf, 4))
 	break;
       op = extract_signed_integer (buf, 4);
@@ -628,7 +686,31 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
 	}
       else
 	{
-	  break;
+	  /* Not a recognized prologue instruction.
+	     Handle optimizer code motions into the prologue by continuing
+	     the search if we have no valid frame yet or if the return
+	     address is not yet saved in the frame.  */
+	  if (fdata->frameless == 0
+	      && (lr_reg == -1 || fdata->nosavedpc == 0))
+	    break;
+
+	  if (op == 0x4e800020		/* blr */
+	      || op == 0x4e800420)	/* bctr */
+	    /* Do not scan past epilogue in frameless functions or
+	       trampolines.  */
+	    break;
+	  if ((op & 0xf4000000) == 0x40000000) /* bxx */
+	    /* Never skip branches. */
+	    break;
+
+	  if (num_skip_non_prologue_insns++ > max_skip_non_prologue_insns)
+	    /* Do not scan too many insns, scanning insns is expensive with
+	       remote targets.  */
+	    break;
+
+	  /* Continue scanning.  */
+	  prev_insn_was_prologue_insn = 0;
+	  continue;
 	}
     }
 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]