"finish" command leads to SIGTRAP

John Baldwin jhb@FreeBSD.org
Thu Feb 21 20:50:00 GMT 2019

On 2/21/19 11:34 AM, Pedro Alves wrote:
> Hi John,
> Thanks for stepping in.
> On 02/21/2019 06:49 PM, John Baldwin wrote:
>> On 2/21/19 9:50 AM, Pedro Alves wrote:
>>> I wonder what other kernels, like e.g., FreeBSD do here?
>> FreeBSD also fails (and in the last year we had a set of changes to rework
>> TF handling in the kernel to boot).  This doesn't look trivial to solve.
>> To get the exception you have to have TF set in %rflags/%eflags, but that
>> means it is set when the pushf writes to the stack.  I think what would
>> have to happen (ugh) is that the kernel needs to recognize that the #DB
>> fault is due to a pushf instruction and that if the TF was a "shadow" TF
>> due to ptrace it needs to clear TF from the value written on the stack as
>> part of the fault handler.
>>> Guess if GDB is to work around this, it'll have to either add
>>> special treatment for this instruction (emulate, step over with a software
>>> breakpoint, something like that), or clear TF manually after
>>> single-stepping.  :-/
>> I suspect it will be common for kernels to have this bug because the CPU
>> will always write a value onto the stack with TF set as part of
>> executing the instruction.  A workaround in GDB would be much like what I
>> described above with the advantage that GDB actually knows it is stepping a
>> pushf before it steps it, so it can know to rewrite the value on the
>> stack after it gets the SIGTRAP for the single step over the pushf.
>> This may actually be hard for a kernel to get right as at the time of the
>> fault we don't get anything that says how long the faulting instruction was,
>> etc.  Thus, just looking at the byte before the current eip/rip in a #DB
>> fault handler for the pushf opcode (I believe it's a single byte) can get
>> false positives because you might have stepped over a mov instruction with
>> an immediate whose last byte happens to be the opcode, etc.
> I can think of other workarounds potentially possible:
> #1 - emulate the instruction: i.e., if you know you're stepping a
>    pushf instruction, you could instead push the flags state on the
>    stack yourself manually, advance the PC, and then raise a
>    fake trap.  Could be done by the kernel, or gdb.  Fixing it on
>    the kernel side should be more efficient, and fixes it for
>    all debuggers.  While fixing it on the debugger side fixes
>    it for all kernels...

Actually, yes, the PTRACE_STEP/PT_STEP can notice the pushf before it
executes it in the kernel.  That is not too bad then I guess.
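
As a rough sketch of why checking before the step is more reliable than
peeking backwards after the trap: the first function below decodes forward
from the PC, the second reproduces the "byte before eip/rip" check and its
false positive.  The helper names, the prefix handling, and the idea of
passing the code bytes in as a buffer are all assumptions made for
illustration; neither a kernel nor GDB is claimed to do it this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PUSHF_OPCODE 0x9c   /* pushf / pushfd / pushfq */

/* Legacy (non-REX) x86 instruction prefixes. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e:   /* es/cs/ss/ds overrides */
    case 0x64: case 0x65:                         /* fs/gs overrides */
    case 0x66: case 0x67:                         /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:              /* lock, repne, rep */
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: forward decode from the PC *before* stepping.
   Returns the length of the pushf encoding at code[0], or 0 if the
   instruction there is not a pushf. */
static size_t pushf_length_at(const uint8_t *code, size_t len, bool is_64bit)
{
    size_t i = 0;

    assert(code != NULL);
    while (i < len && is_legacy_prefix(code[i]))
        i++;
    /* 0x40-0x4f is a REX prefix only in 64-bit mode (inc/dec otherwise). */
    if (is_64bit && i < len && (code[i] & 0xf0) == 0x40)
        i++;
    return (i < len && code[i] == PUSHF_OPCODE) ? i + 1 : 0;
}

/* Hypothetical helper: the backward peek done *after* the trap, where the
   length of the instruction just executed is unknown. */
static bool byte_before_pc_is_pushf(const uint8_t *code, size_t pc)
{
    assert(code != NULL && pc > 0);
    return code[pc - 1] == PUSHF_OPCODE;
}
```

For "mov $0x9c000000,%eax" (bytes b8 00 00 00 9c) the backward peek at
offset 5 reports a pushf, while the forward decode at offset 0 does not.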

> #2 - if you know you're stepping a pushf instruction, set a breakpoint
>    after it, and PTRACE_CONTINUE instead of stepping.  (that's the software
>    single-step workaround mentioned earlier).

I prefer that to my suggestion above, and if we chose to do it in GDB my
guess is that #2 is a simpler / smaller patch to implement than #1?
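
To make #2 concrete, here is a minimal sketch of the breakpoint
bookkeeping.  A plain byte buffer stands in for inferior memory (a real
debugger would go through ptrace or the target layer), the pushf length is
assumed to have been decoded already, and every name below is hypothetical.
Since TF is never set, the value pushf writes to the stack is clean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xcc

/* Hypothetical record of one planted software breakpoint. */
struct sw_breakpoint {
    size_t  addr;    /* where the int3 was written */
    uint8_t saved;   /* original byte at that address */
};

/* Plant an int3 right after the insn_len-byte instruction at pc, so the
   instruction can be run with a continue instead of a single-step. */
static struct sw_breakpoint plant_after(uint8_t *mem, size_t mem_len,
                                        size_t pc, size_t insn_len)
{
    struct sw_breakpoint bp;

    assert(mem != NULL && insn_len > 0 && pc + insn_len < mem_len);
    bp.addr = pc + insn_len;
    bp.saved = mem[bp.addr];
    mem[bp.addr] = INT3_OPCODE;
    return bp;
}

/* Restore the original byte once the breakpoint has been hit. */
static void remove_breakpoint(uint8_t *mem, const struct sw_breakpoint *bp)
{
    assert(mem != NULL && bp != NULL);
    mem[bp->addr] = bp->saved;
}

/* After int3 traps, the reported PC is one byte past the int3, so it has
   to be backed up to the breakpoint address before resuming. */
static size_t pc_after_hit(size_t reported_pc)
{
    assert(reported_pc > 0);
    return reported_pc - 1;
}
```

The order would be: plant, continue, wait for the SIGTRAP, remove, then set
the PC to pc_after_hit() of the reported value.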

> #3 - have gdb always clear TF after a single-step.  This is the
>    easiest, even if the "less technically cool" solution.  This
>    would mean that it'd be impossible to debug a program that
>    sets the trace flag manually.  I actually once co-wrote
>    an in-process x86 debug stub, and in that use case
>    preserving TF mattered, made it possible to debug that
>    stub...  Quite a niche use case, though, and it'd have been
>    trivial for me to hack gdb for that special use case, of course.
> In order for GDB to know whether it is stepping a pushf instruction,
> it needs to read the memory at PC, which has a cost, but maybe it's
> negligible if we already end up reading memory anyway (because of the
> code cache), but I'm not sure we already do.  This can have a more
> noticeable effect with remote debugging (which should weigh on whether
> to do the workaround at the infrun.c level, or in the target backend, thus
> in gdbserver when remote).
> Solution #3 would require extra ptrace commands anyway (read-modify-write
> the flags), so it may end up being less performant, if #1 and #2 already
> hit the code cache.
> There are some extra complications around #1 and #2 for gdbserver,
> because we need to consider the cases when gdbserver handles 
> single-stepping without roundtripping to gdb:
>   - range-stepping
>   - stepping over breakpoints/tracepoints

Hmmm, I will probably try to fix (or get someone else to fix) FreeBSD's
kernel regardless, probably using the approach in #1.  For GDB itself, I
probably have a slight preference for #2 over #1, but I haven't yet worked
with gdbserver, so I'd defer to you on if #3 is the best solution when
taking gdbserver into account.  If the edge case of #3 matters (it might
for other things, such as the language runtimes that set TF and use
SIGTRAP handlers, which motivated FreeBSD's kernel changes last year), we
could perhaps provide a way for targets to override #3 if they know they
don't need it (e.g. a native target under a kernel known to work).  Not
sure how that would work over remote (e.g. if you would want gdbserver to
internalize this behavior so that only it deals with it and hides it from
the remote debugger).

John Baldwin

