Bug 3997

Summary: SIGTRAP handler gets reset when single stepping
Product: frysk Reporter: Mark Wielaard <mark>
Component: general Assignee: Unassigned <frysk-bugzilla>
Status: NEW ---    
Severity: normal CC: cagney, cmoller
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on: 4019    
Bug Blocks: 1496    

Description Mark Wielaard 2007-02-07 16:19:18 UTC
When you single-step into a SIGTRAP handler with ptrace(), the handler gets
reset on some kernels. This happens at least on 2.6.19-1.2895.fc6, but not on
2.6.17-1.2174_FC5. It also doesn't happen when doing a normal ptrace() CONT
through the signal handler.
Comment 1 Mark Wielaard 2007-02-07 16:51:41 UTC
Pushed to Fedora: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227693
Comment 2 Mark Wielaard 2007-02-09 21:36:37 UTC
*** Bug 4019 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Cagney 2007-02-10 00:22:04 UTC
Marking as suspended, test case added.
Comment 4 Mark Wielaard 2007-03-06 18:50:01 UTC
Note that "upstream" (fedora kernel maintainers in this case) said:

 "Happens on vanilla 2.6.18.6 from kernel.org, too"
 "Does not happen on 2.6.16.35"

So it really seems to be an (old!) upstream-of-upstream (kernel.org) bug.
Comment 5 Chris Moller 2007-03-07 17:33:51 UTC
Just as a bit of a blog, and as notes to myself, here's what's happening so far:

Presumably (I haven't checked yet, so it's "presumably") as a result of the
ptrace (PTRACE_SINGLESTEP, pid, 0, SIGTRAP); in the testcase,
kernel/utrace.c:utrace_signal_handler_singlestep() is called.  Something in
there (again, I haven't followed that path yet) results in a call to 

    arch/i386/kernel/traps.c:do_debug()

which calls 

    arch/i386/kernel/ptrace.c:send_sigtrap(SIGTRAP,...)

which calls 

    kernel/signal.c:force_sig_info()

which then sets 

    action->sa.sa_handler = SIG_DFL;

if the signal is currently blocked. Up to that point the handler was correctly
pointing at the testcase handler.

A comment in kernel/signal.c reads:

/*
 * Force a signal that the process can't ignore: if necessary
 * we unblock the signal and change any SIG_IGN to SIG_DFL.
 *
 * Note: If we unblock the signal, we always reset it to SIG_DFL,
 * since we do not want to have a signal handler that was blocked
 * be invoked when user space had explicitly blocked it.
 *
 * We don't want to have recursive SIGSEGV's etc, for example.
 */

so I guess the behaviour is deliberate.
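
To make the effect of that comment concrete, here is a small user-space model
of the reset it describes. This is only a sketch based on the comment text, not
the kernel code: struct model_sigstate, model_force_sig() and example_handler()
are made-up names for illustration.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-signal state a task carries. */
struct model_sigstate {
    void (*handler)(int);  /* current disposition for the signal */
    bool blocked;          /* is the signal in the blocked mask? */
};

/* Placeholder for a handler installed by the test program. */
void example_handler(int sig)
{
    (void)sig;
}

/*
 * Simplified model of the behaviour described in the quoted comment:
 * a forced signal must not be ignored or stay blocked, and whenever it
 * has to be unblocked the handler is reset to SIG_DFL.
 */
void model_force_sig(struct model_sigstate *st)
{
    bool ignored = (st->handler == SIG_IGN);

    if (st->blocked || ignored) {
        st->handler = SIG_DFL;  /* the user handler is lost here */
        st->blocked = false;
    }
    /* The real kernel would go on to deliver the signal at this point. */
}
```

In this model a state of { example_handler, true } comes out as
{ SIG_DFL, false }, while { example_handler, false } is left untouched, which
matches what the testcase sees.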

It will take me more poking to figure out what, if anything, should be done
about this.  I'm going to guess though that since PTRACE_SINGLESTEP results in
the child looking like it's been stopped by a SIGTRAP, and in the testcase a
non-SIG_DFL handler is being set by the child on SIGTRAP, there's a bit of
confusion.
Comment 6 Mark Wielaard 2007-03-09 10:42:07 UTC
According to a comment by roland on
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227693 this isn't a bug,
but the expected behaviour of ptrace single-stepping into a SIGTRAP handler
(so I assume it was a bug that this worked on older kernels). Stepping into a
SIGTRAP handler produces a SIGTRAP itself (because that is how ptrace reports
the single step), and the kernel cannot rely on there being a debugger/parent
swallowing that second SIGTRAP. Note that single-stepping into any other
signal handler doesn't have this problem.
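
As an illustration of "that is how ptrace reports the single step", the sketch
below forks a traced child, steps it by one instruction and returns the stop
signal the parent sees. It assumes Linux on an architecture that supports
PTRACE_SINGLESTEP and an environment where ptrace() is permitted;
signal_reported_for_single_step() is a made-up helper name, not part of the
testcase.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the signal the tracer observes after one
 * PTRACE_SINGLESTEP, or -1 if any step of the setup fails.
 */
int signal_reported_for_single_step(void)
{
    int status;
    int result = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Parent: wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
        /* Step one instruction; a data argument of 0 delivers no signal. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == 0
            && waitpid(pid, &status, 0) == pid
            && WIFSTOPPED(status))
            result = WSTOPSIG(status);
    }

    /* Clean up the child regardless of the outcome. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return result;
}
```

Where single-stepping is available the helper is expected to return SIGTRAP,
which is the same signal the testcase installs its own handler for.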

So we will have to come up with a trick to (simulate?) single stepping into a
sig trap handler.

Leaving this open for now.
Comment 7 Mark Wielaard 2007-03-19 12:00:13 UTC
This is a misfeature of ptrace single step. It uses SIGTRAP to signal that a
step was made. This used to work in older kernels, but newer kernels decided to
block the SIGTRAP handler if the child wasn't using a reentrant SIGTRAP handler
(even though the ptracing debugger would of course swallow the signal and never
deliver it to the child itself). Resetting the child's signal handler obviously
breaks our testcases. For now, to have minimal testing of SIGTRAP handler
stepping, we instrument the test programs to use SA_NODEFER. Also,
funit-breakpoints uses a simple SIGUSER handler to test signal stepping and
breakpointing.
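
For reference, a minimal sketch of what the SA_NODEFER change does to a test
program (not the actual frysk test code; trap_handler() and
sigtrap_blocked_in_handler() are made-up names). The handler records whether
SIGTRAP is in the blocked mask while it runs, which is the condition that
triggers the reset.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler: 1 if SIGTRAP was blocked while it ran, 0 if not. */
static volatile sig_atomic_t trap_blocked_in_handler = -1;

static void trap_handler(int sig)
{
    sigset_t current;

    (void)sig;
    /* A NULL new set makes sigprocmask() only query the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) == 0)
        trap_blocked_in_handler = sigismember(&current, SIGTRAP);
}

/*
 * Hypothetical helper: installs trap_handler for SIGTRAP with the given
 * sa_flags, raises SIGTRAP once and reports what the handler saw.
 * Returns 1 (blocked), 0 (not blocked) or -1 on error.
 */
int sigtrap_blocked_in_handler(int sa_flags)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trap_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;

    trap_blocked_in_handler = -1;
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    raise(SIGTRAP);                  /* handler runs before raise() returns */
    sigaction(SIGTRAP, &old, NULL);  /* restore the previous disposition */
    return trap_blocked_in_handler;
}
```

With flags of 0 the handler sees SIGTRAP blocked; with SA_NODEFER it does not,
so a SIGTRAP forced while the handler is being stepped no longer needs to be
unblocked and the handler is left in place.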

The real solution for this problem, so that we can also single-step unaltered
user programs that use SIGTRAP, is to use a not-yet-existing interface on top
of utrace that doesn't use SIGTRAP for reporting events to frysk.