Bug 16157 - the function get_pc_function_start (CORE_ADDR pc) may be inaccurate
Summary: the function get_pc_function_start (CORE_ADDR pc) may be inaccurate
Status: WAITING
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: unknown
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-12 06:12 UTC by ggs334
Modified: 2013-11-14 09:57 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description ggs334 2013-11-12 06:12:30 UTC
get_pc_function_start (CORE_ADDR pc) tries to get the function start for a given pc, but lookup_minimal_symbol_by_pc (CORE_ADDR pc) may return a minimal_symbol that is not a function (e.g. a label in assembler code). In that case, fstart is not a function start address either.

This may cause a problem: in the following code, GDB does not stop when trying to "next" over Line 1. (lop2 and lop3 are mistaken for functions, so GDB thinks it has stepped into a new function, sets a breakpoint at the address stored in register $ra, and runs to it.)

Is this correct? 

==========================
        #.globl hardware_hazard_hook .text
        .globl  _start
        .ent    _start
_start:
        .set    noreorder
        addiu  v0, 1
        addiu  v0, 1
lop3:
        addiu  v0, 1
        addiu  v0, 1
lop2:
        addiu  v0, 1    // Line 1
        addiu  v0, 1
lop1:
        addiu  v0, 1
        addiu  v0, 1
        addiu  v0, 1
        addiu  v0, 1
        nop
...
-------------------------------
gdb/blockframe.c:
CORE_ADDR
get_pc_function_start (CORE_ADDR pc)
{
  struct block *bl;
  struct minimal_symbol *msymbol;

  bl = block_for_pc (pc);
  if (bl)
    {
      struct symbol *symbol = block_linkage_function (bl);

      if (symbol)
	{
	  bl = SYMBOL_BLOCK_VALUE (symbol);
	  return BLOCK_START (bl);
	}
    }

  msymbol = lookup_minimal_symbol_by_pc (pc);
  if (msymbol)
    {
      CORE_ADDR fstart = SYMBOL_VALUE_ADDRESS (msymbol);
      if (find_pc_section (fstart))
	return fstart;
    }

  return 0;
}
Comment 1 Pedro Alves 2013-11-12 11:46:23 UTC
> get_pc_function_start(CORE_ADDR pc) try to get the function start for a 
> special pc, but the function 
> lookup_minimal_symbol_by_pc(CORE_ADDR pc) may return a minimal_symbol, which 
> is not a function(e.g. a label in assembler code). So the fstart is not a 
> function start address, too.

The only way to tell a label apart from a function is by the minimal symbol's size.  Try stepping through lookup_minimal_symbol_by_pc_section_1, and see the comments there.

> This may cause a problem: in following code, GDB can not stop when try to next
> over Line 1.(lop2 and lop3 are mistaken for a function, so GDB thinks that it
> step into a new function, set a breakpoint at the address stored in register
> $ra, and run to it)

Sounds like something else might be tricking GDB into thinking you stepped into a new function.  See the code just below the "Check for subroutine calls." comment in infrun.c.  That's where the logic to detect whether the program called a new function is.
I wonder if this is related to the outermost-frame heuristics, or something odd in the unwinder/backtrace.  What does "bt" show when the program is stopped at the instruction just before lop2, and then again when you stepi to lop2?
Comment 2 ggs334 2013-11-12 13:01:29 UTC
> "bt" show when the program is stopped at the instruction just before lop2:
#0  _start () at crt0.S:93
> then stepi
(gdb) si
warning: GDB can't find the start of the function at 0xfffffffc.

> and then again "bt" when stepi to lop2?
#0  lop2 () at crt0.S:95
#1  0xfffffffe in ?? ()

> Sounds like something else might be tricking GDB into thinking you stepped 
> into a new function.  See the code just below "Check for subroutine calls." 
> part of infrun.c.  That's where the logic to detect if the program called a 
> new function is.

GDB uses the frame ID to "Check for subroutine calls", and frame_id_eq() checks .code_addr: if the .code addresses are different, the frames are different. If lop3 and lop2 are mistaken for function start addresses, the code addresses differ, so GDB thinks the program stepped into a new function.

gdb/frame.c:
int
frame_id_eq (struct frame_id l, struct frame_id r)
{
  int eq;

  if (!l.stack_addr_p && l.special_addr_p
      && !r.stack_addr_p && r.special_addr_p)
    /* The outermost frame marker is equal to itself.  This is the
       dodgy thing about outer_frame_id, since between execution steps
       we might step into another function - from which we can't
       unwind either.  More thought required to get rid of
       outer_frame_id.  */
    eq = 1;
  else if (!l.stack_addr_p || !r.stack_addr_p)
    /* Like a NaN, if either ID is invalid, the result is false.
       Note that a frame ID is invalid iff it is the null frame ID.  */
    eq = 0;
  else if (l.stack_addr != r.stack_addr)
    /* If .stack addresses are different, the frames are different.  */
    eq = 0;
  else if (l.code_addr_p && r.code_addr_p && l.code_addr != r.code_addr)
    /* An invalid code addr is a wild card.  If .code addresses are
       different, the frames are different.  */
    eq = 0;
  else if (l.special_addr_p && r.special_addr_p
	   && l.special_addr != r.special_addr)
    /* An invalid special addr is a wild card (or unused).  Otherwise
       if special addresses are different, the frames are different.  */
    eq = 0;
  else if (l.artificial_depth != r.artificial_depth)
    /* If artificial depths are different, the frames must be different.  */
    eq = 0;
  else
    /* Frames are equal.  */
    eq = 1;

  if (frame_debug)
    {
      fprintf_unfiltered (gdb_stdlog, "{ frame_id_eq (l=");
      fprint_frame_id (gdb_stdlog, l);
      fprintf_unfiltered (gdb_stdlog, ",r=");
      fprint_frame_id (gdb_stdlog, r);
      fprintf_unfiltered (gdb_stdlog, ") -> %d }\n", eq);
    }
  return eq;
}
Comment 3 ggs334 2013-11-12 13:18:17 UTC
Look at the code in get_pc_function_start (CORE_ADDR pc):
> CORE_ADDR
> get_pc_function_start (CORE_ADDR pc)
> {
> ...........
> ...........
> ...........
> ...........
> ...........
> ...........

>  msymbol = lookup_minimal_symbol_by_pc (pc);
>  if (msymbol)
>    {
>     CORE_ADDR fstart = SYMBOL_VALUE_ADDRESS (msymbol);
>      if (find_pc_section (fstart))
>	return fstart;
>    }

>  return 0;
>}
The labels lop2 and lop3 have address values. If the pc value is equal to the address of lop2 or lop3, the msymbol returned from lookup_minimal_symbol_by_pc() must be lop2 or lop3; GDB then uses SYMBOL_VALUE_ADDRESS (msymbol) to get the address, and treats it as the function start address.

I think this is the problem. Is that accurate?
Comment 4 Pedro Alves 2013-11-12 14:26:46 UTC
> the code address are different. So GDB thinking the program stepped into a new 
> function

That's not sufficient, the frame that was frame #0 before the step must be frame #1 after the step for GDB to consider this was a subroutine call.  That's this part of the condition:

      && (frame_id_eq (frame_unwind_caller_id (get_current_frame ()),
		       ecs->event_thread->control.step_stack_frame_id)

If before the stepi you have:

 #0  _start () at crt0.S:93

and then after you have:

 #0  lop2 () at crt0.S:95
 #1  0xfffffffe in ?? ()

Then I don't understand how that frame_id_eq returned true.  Well, unless both were outer_frame_id.  Please check that.

I also don't understand why GDB thinks the function is _start just before the stepi, instead of lop3.  What's different between lop3 and lop2?  You need to step through  lookup_minimal_symbol_by_pc_section_1 and understand that.

> the label lop2 and lop3 have address values, if the pc value is equal to the
> address of lop2 or lop3, the msymbol returned from
> lookup_minimal_symbol_by_pc() must be lop2 or lop3, then uses
> SYMBOL_VALUE_ADDRESS (msymbol) to get the address, and treats the address as
> function start address.
> I think this is the problem, is it accurate?

Not exactly.  lookup_minimal_symbol_by_pc, if not returning the "real" function, should be returning the closest label.  That is, for all instructions between lop3 and lop2, it should return lop3, etc.

But that shouldn't be a problem on its own, the other checks in the
"Check for subroutine calls" bit should catch that.  Unless, again, this is 
really the outer_frame_id bits triggering.  outer_frame_id really should die...
Comment 5 Pedro Alves 2013-11-12 14:29:20 UTC
> Unless, again, this is really the outer_frame_id bits triggering.  
> outer_frame_id really should die...

BTW, if this is the case, this means that this issue only triggers when stepping through code in the outermost frame (the entry point).  IOW, if your _start was actually some other function that was called by _start (so that it wouldn't be the outermost frame), this issue wouldn't trigger.
Comment 6 ggs334 2013-11-13 02:49:00 UTC
> Then I don't understand how that frame_id_eq returned true 
frame_id_eq returns false (eq == 0) because of the following condition:
-----
else if (l.code_addr_p && r.code_addr_p && l.code_addr != r.code_addr)
    /* An invalid code addr is a wild card.  If .code addresses are
       different, the frames are different.  */
    eq = 0;
-----
When I delete this code, the problem disappears.

> What's different between lop3 and lop2? 
There is no difference between lop2 and lop3; they are just two labels.

------------------

The next case: when I single-step at line 196, the program runs until exit:

Breakpoint 1, zerobss () at crt0.S:196
196             sw      v0, 0(s0)
(gdb) l
191             nop
192
193             # Tell other cores it's ready
194             li      v0, 1
195             LA      (s0, flag_ready)
196             sw      v0, 0(s0)
197
198     all_wait_1:
199             LA      (s0, flag_ready)
200             lw      v0, 0(s0)
(gdb) l 196
191             nop
192
193             # Tell other cores it's ready
194             li      v0, 1
195             LA      (s0, flag_ready)
196             sw      v0, 0(s0)
197
198     all_wait_1:
199             LA      (s0, flag_ready)
200             lw      v0, 0(s0)
(gdb) set debug infrun 1
(gdb) s
=pc:===ffffffffbfc0012c====
=func start===ffffffffbfc000d8====
infrun: clear_proceed_status_thread (Thread 1)
infrun: proceed (addr=0xffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 1] at 0xbfc0012c
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   42000 [Thread 1],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0xbfc00130
=pc:===ffffffffbfc00130====
=func start===ffffffffbfc00130====
=pc:===ffffffffbfc0011f====
=func start===ffffffffbfc000d8====
infrun: stepped into subroutine
infrun: inserting step-resume breakpoint at 0xbfc00004
infrun: resume (step=0, signal=0), trap_expected=0, current thread [Thread 1] at 0xbfc00130
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   42000 [Remote target],
infrun:   status->kind = exited, status = 0
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_EXITED
[Inferior 1 (Remote target) exited normally]
Comment 7 ggs334 2013-11-13 03:10:11 UTC
> BTW, if this is the case, this means that this issue only triggers when
> stepping through code in the outermost frame (the entry point).  IOW, if
> your _start was actually some other function that was called by _start (so
> that it wouldn't be the outermost frame), this issue wouldn't trigger.

You are right! This issue only triggers when debugging assembler code!
Comment 8 Pedro Alves 2013-11-13 10:06:56 UTC
I feel like you're either ignoring half my suggestions, or not reading carefully.  It makes it hard for me to help you.

> > What's different between lop3 and lop2? 
>No different from lop2 and lop3, only 2 labels.

I'm well aware they're two labels.  But what makes it so that for instructions between lop3 and lop2, gdb believes the function is _start, not lop3?

You still haven't checked for outer_frame_id.

> > BTW, if this is the case, this means that this issue only triggers when
> > stepping through code in the outermost frame (the entry point).  IOW, if
> > your _start was actually some other function that was called by _start (so
> > that it wouldn't be the outermost frame), this issue wouldn't trigger.
> You are right! this issue only triggers when debugging assembler code!

Sure, except that's not what I said.
Comment 9 ggs334 2013-11-14 09:57:24 UTC
> That's not sufficient, the frame that was frame #0 before the step must be 
> frame #1 after the step for GDB to consider this was a subroutine call.  
> That's this part of the condition:
>
>      && (frame_id_eq (frame_unwind_caller_id (get_current_frame ()),
>		       ecs->event_thread->control.step_stack_frame_id)

> Then I don't understand how that frame_id_eq returned true.  Well, unless 
> both were outer_frame_id.  Please check that.


I have checked it: frame_id_eq() returns true, but neither frame_unwind_caller_id() nor ecs->event_thread->control.step_stack_frame_id is outer_frame_id.

If frame_unwind_caller_id() can find a valid function address in register $ra, the returned frame ID is equal to ecs->event_thread->control.step_stack_frame_id, and the value is:
struct frame_id
{
  stack_addr=0xffffffff;
  code_addr=0x80001470;//The address of _start(entry point)
  special_addr=0x0;
  stack_addr_p=0x1;
  code_addr_p=0x1;
  special_addr_p=0x0;
  artificial_depth=0x0;
};


If it can't find a valid function address in register $ra, GDB prints:
============
warning: GDB can't find the start of the function at 0xfffffffc.

    GDB is unable to find the start of the function at 0xfffffffc
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0xfffffffc for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
============

So I think you are right: maybe there is something odd in the unwinder. But the unwinder is unfamiliar to me. Can you give me some advice?