This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



Re: [PATCH] Fix signals.exp test case on S/390


Hello,

I was trying to get the signals.exp test case to pass on s390.
This exposed a number of problems in the Linux kernel and gdb;
in particular I think I've found a bug in gdb common code
(step_over_function in infrun.c).

However, I'm not completely sure I fully understand the intended
flow of control throughout infrun.c, so I'd appreciate comments
on what I found.

Basically, the test case arranges for a signal (SIGALRM) to
arrive while the inferior is stopped, and then does a 'next'.
When the inferior is continued with PT_SINGLESTEP due to the
'next', the kernel tries to deliver the pending SIGALRM.

This in turn causes another ptrace intercept.  gdb gets active,
notices that it doesn't want to do anything special with the
signal, and continues with PT_SINGLESTEP giving the signal
number.

At this point, the Linux kernel delivers the signal, runs to
the first instruction of the signal handler, and gets the
single step trap there. (Note that there was a bug in the kernel that caused this not to work, but I've fixed that one.)


Now the interesting aspect starts.  handle_inferior_event
needs to figure out what has happened here.  There is special
code in handle_inferior_event that tries to recognize the
'we've just stepped into a signal handler' situation:

  /* Did we just take a signal?  */
  if (pc_in_sigtramp (stop_pc)
      && !pc_in_sigtramp (prev_pc)
      && INNER_THAN (read_sp (), step_sp))

[anything in infrun.c using INNER_THAN is always suspect]


    {
      /* We've just taken a signal; go until we are back to
         the point where we took it and one more.  */

Note, however, that this fundamentally does not work on
Linux because we use a signal trampoline only to *exit*
from a signal handler -- the kernel *starts* signal handlers
by jumping to them directly without any trampoline.
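
To make the point concrete, here is a small stand-alone model of the
quoted condition.  It is only a sketch, not gdb code: the addresses
are invented, tramp_model_pc_in_sigtramp is a stand-in that treats
the trampoline as a fixed address range, and tramp_model_inner_than
assumes a stack that grows downward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address range occupied by the sigreturn trampoline.  */
#define TRAMP_MODEL_START 0x7000u
#define TRAMP_MODEL_END   0x7008u

/* Stand-in for pc_in_sigtramp: true if PC lies in the trampoline.  */
static bool tramp_model_pc_in_sigtramp(uintptr_t pc)
{
    return pc >= TRAMP_MODEL_START && pc < TRAMP_MODEL_END;
}

/* Stand-in for INNER_THAN, assuming a downward-growing stack.  */
static bool tramp_model_inner_than(uintptr_t lhs, uintptr_t rhs)
{
    return lhs < rhs;
}

/* Same shape as the "Did we just take a signal?" condition.  */
static bool tramp_model_just_took_signal(uintptr_t stop_pc,
                                         uintptr_t prev_pc,
                                         uintptr_t sp,
                                         uintptr_t step_sp)
{
    return tramp_model_pc_in_sigtramp(stop_pc)
           && !tramp_model_pc_in_sigtramp(prev_pc)
           && tramp_model_inner_than(sp, step_sp);
}

static void tramp_model_selfcheck(void)
{
    /* Stopped at the handler's first instruction (0x4000, invented):
       the PC is not in the trampoline, so the check does not fire.  */
    assert(!tramp_model_just_took_signal(0x4000u, 0x1000u,
                                         0xff00u, 0xfff0u));

    /* Had the kernel entered the handler through the trampoline,
       the same check would have fired.  */
    assert(tramp_model_just_took_signal(0x7000u, 0x1000u,
                                        0xff00u, 0xfff0u));
}
```

In this model the first clause alone is enough to defeat the check,
whatever the stack pointer comparison says.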

So this test doesn't trigger, and the event turns out to
be interpreted as a call to a subroutine, and is passed on
to handle_step_into_function, which decides to step over
that function call using step_over_function.  This would
basically work just fine for this case -- we'd continue
just after the signal handler terminates, which is what
we want here.


However, step_over_function continues to run not only
until a specific PC has been reached, but at the same time
a specific *frame* needs to be reached.  In the situation
we're in right now:

  PC                                        frame
  1st instruction of signal handler         sig. handler frame
  1st instruction of sigreturn trampoline   sigtramp frame
  interrupted main routine                  main routine frame

step_over_function tries to continue running until it
reaches the current frame's return address (i.e. the 1st
instruction of the sigreturn trampoline) while at the same
time reaching the frame currently being stepped (i.e. the
main routine's frame).  Of course, these two events never
coincide, and hence gdb steps until the program terminates.
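
The mismatch can be shown with a toy model of the table above.  This
is a sketch under stated assumptions, not how infrun.c is written:
the PC values and the names step_model_should_stop and
step_model_count_stops are invented, and the "trace" simply lists the
three (PC, frame) pairs from the table in the order they would be
reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three frames in the table.  */
enum step_model_frame {
    STEP_FRAME_HANDLER,
    STEP_FRAME_SIGTRAMP,
    STEP_FRAME_MAIN
};

/* Invented PCs: handler entry, trampoline entry, interrupted main.  */
enum {
    STEP_PC_HANDLER = 0x4000,
    STEP_PC_TRAMP   = 0x7000,
    STEP_PC_MAIN    = 0x1000
};

struct step_model_stop {
    unsigned pc;
    enum step_model_frame frame;
};

/* Stop only when both the PC and the frame match the request.  */
static bool step_model_should_stop(struct step_model_stop at,
                                   unsigned want_pc,
                                   enum step_model_frame want_frame)
{
    return at.pc == want_pc && at.frame == want_frame;
}

/* Count how many states in the trace satisfy the combined test.  */
static size_t step_model_count_stops(unsigned want_pc,
                                     enum step_model_frame want_frame)
{
    static const struct step_model_stop trace[] = {
        { STEP_PC_HANDLER, STEP_FRAME_HANDLER },
        { STEP_PC_TRAMP,   STEP_FRAME_SIGTRAMP },
        { STEP_PC_MAIN,    STEP_FRAME_MAIN },
    };
    size_t hits = 0;

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
        if (step_model_should_stop(trace[i], want_pc, want_frame))
            hits++;
    return hits;
}

static void step_model_selfcheck(void)
{
    /* Return address in the trampoline, but frame of the main
       routine: no state matches, so the step never stops.  */
    assert(step_model_count_stops(STEP_PC_TRAMP, STEP_FRAME_MAIN) == 0);

    /* If the expected frame were the trampoline's, it would stop.  */
    assert(step_model_count_stops(STEP_PC_TRAMP,
                                  STEP_FRAME_SIGTRAMP) == 1);
}
```

The second assertion corresponds to the case where the frame that
step_over_function waits for is the caller of the handler frame,
i.e. the sigtramp frame, rather than the frame being stepped.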

I'm lost here. What happens with:


- get_frame_id (get_prev_frame (signal handler))

- get_frame_id (sigreturn trampoline)

Hopefully you can see the values by grubbing around in the output from "set debug frame 1".

They should match (check my tramp branch, it contains, er, interesting ideas on how to better handle trampolines).

Andrew


