"finish" command leads to SIGTRAP

Pedro Alves palves@redhat.com
Thu Feb 21 19:34:00 GMT 2019


Hi John,

Thanks for stepping in.

On 02/21/2019 06:49 PM, John Baldwin wrote:
> On 2/21/19 9:50 AM, Pedro Alves wrote:

>> I wonder what other kernels, like e.g., FreeBSD do here?
> 
> FreeBSD also fails (and in the last year we had a set of changes to rework
> TF handling in the kernel to boot).  This doesn't look trivial to solve.
> To get the exception you have to have TF set in %rflags/%eflags, but that
> means it is set when the pushf writes to the stack.  I think what would
> have to happen (ugh) is that the kernel needs to recognize that the #DB
> fault is due to a pushf instruction and that if the TF was a "shadow" TF
> due to ptrace it needs to clear TF from the value written on the stack as
> part of the fault handler.
> 
>> Guess if GDB is to workaround this, it'll have to either add
>> special treatment for this instruction (emulate, step over with a software
>> breakpoints, something like that), or clear TF manually after
>> single-stepping.  :-/
> 
> I suspect it will be common for kernels to have this bug because the CPU
> will always write a value onto the stack with TF set as part of
> executing the instruction.  A workaround in GDB would be much like what I
> described above with the advantage that GDB actually knows it is stepping a
> pushf before it steps it, so it can know to rewrite the value on the
> stack after it gets the SIGTRAP for the single step over the pushf.
> 
> This may actually be hard for a kernel to get right as at the time of the
> fault we don't get anything that says how long the faulting instruction was,
> etc.  Thus, just looking at the byte before the current eip/rip in a #DB
> fault handler for the pushf opcode (I believe it's a single byte) can get
> false positives because you might have stepped over a mov instruction with
> an immediate whose last byte happens to be the opcode, etc.

I can think of a few other possible workarounds:

#1 - emulate the instruction: i.e., if you know you're stepping a
   pushf instruction, you could instead push the flags state on the
   stack yourself manually, advance the PC, and then raise a
   fake trap.  Could be done by the kernel, or gdb.  Fixing it on
   the kernel side should be more efficient, and fixes it for
   all debuggers.  While fixing it on the debugger side fixes
   it for all kernels...

#2 - if you know you're stepping a pushf instruction, set a breakpoint
   after it, and PTRACE_CONT instead of stepping.  (That's the software
   single-step workaround mentioned earlier.)

#3 - have gdb always clear TF after a single-step.  This is the
   easiest, even if it's the "less technically cool" solution.  This
   would mean that it'd be impossible to debug a program that
   sets the trace flag manually.  I actually once co-wrote
   an in-process x86 debug stub, and in that use case
   preserving TF mattered: it made it possible to debug that
   stub...  Quite a niche use case, though, and it'd have been
   trivial for me to hack gdb for that special use case, of course.
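
For illustration only, here is a rough sketch of what #1 could look like.
It is not taken from GDB or from any kernel: the regs_sketch struct, the
flat stack buffer and the tf_is_ours flag are hypothetical stand-ins for
whatever the debugger or kernel really uses to hold thread state.  It
assumes 64-bit mode, the bare one-byte 0x9c encoding, and a little-endian
host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFLAGS_TF   0x00000100u  /* trap flag (bit 8) */
#define PUSHFQ_MASK 0x00FCFFFFu  /* pushfq stores RFLAGS with VM and RF cleared */

/* Hypothetical stand-in for the registers a debugger would fetch.  */
struct regs_sketch {
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
};

/* Emulate a one-byte 64-bit pushf instead of single-stepping it.
   STACK models target memory starting at address STACK_BASE.
   If TF_IS_OURS is nonzero, TF is set only because the debugger asked
   for a single-step, so it is hidden from the value that gets pushed.
   Returns 0 on success, -1 if the write would fall outside STACK.  */
static int
emulate_pushfq(struct regs_sketch *r, uint8_t *stack, uint64_t stack_base,
               size_t stack_size, int tf_is_ours)
{
    assert(r != NULL && stack != NULL);

    uint64_t value = r->rflags & PUSHFQ_MASK;
    uint64_t new_rsp = r->rsp - 8;

    if (tf_is_ours)
        value &= ~(uint64_t) EFLAGS_TF;

    if (stack_size < 8 || new_rsp < stack_base
        || new_rsp - stack_base > stack_size - 8)
        return -1;

    /* Little-endian host assumed, matching the x86 target.  */
    memcpy(stack + (new_rsp - stack_base), &value, sizeof value);
    r->rsp = new_rsp;
    r->rip += 1;  /* length of the bare 0x9c encoding */
    return 0;     /* the caller would now report a fake SIGTRAP */
}
```

The live flags register is left untouched; only the copy written to the
stack has TF hidden, so a later popf does not bring the flag back.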

In order for GDB to know whether it is stepping a pushf instruction,
it needs to read the memory at PC, which has a cost.  Maybe that cost is
negligible if we already end up reading that memory anyway (because of the
code cache), but I'm not sure we do.  The cost can be more
noticeable with remote debugging, which should weigh on whether
to do the workaround at the infrun.c level or in the target backend (thus
in gdbserver when remote).
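
To make the "read the memory at PC" check concrete, here is a minimal
sketch, assuming 64-bit mode and that the bytes at PC have already been
fetched into a buffer.  The function names are made up for the example and
are not GDB functions.  pushf_length decodes forward from PC, skipping
legacy prefixes and an optional REX prefix, and returns the instruction
length, which is also what #2 needs in order to place the breakpoint.
byte_before_pc_is_pushf is the backward-looking check a fault handler
would be limited to, shown here to illustrate the false positive John
describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PUSHF_OPCODE 0x9c

/* Legacy prefixes that may precede the opcode byte.  */
static int
is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e:  /* segment overrides */
    case 0x64: case 0x65:                        /* fs/gs overrides */
    case 0x66: case 0x67:                        /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:             /* lock, repne, rep */
        return 1;
    default:
        return 0;
    }
}

/* Return the length in bytes of the pushf at INSN, or 0 if the LEN
   bytes read at PC do not start with a pushf.  */
static size_t
pushf_length(const uint8_t *insn, size_t len)
{
    size_t i = 0;

    assert(insn != NULL);
    while (i < len && is_legacy_prefix(insn[i]))
        i++;
    if (i < len && (insn[i] & 0xf0) == 0x40)  /* optional REX prefix */
        i++;
    if (i < len && insn[i] == PUSHF_OPCODE)
        return i + 1;
    return 0;
}

/* Naive backward check: is the byte just before PC_OFFSET 0x9c?
   This cannot tell an opcode from the tail of an immediate.  */
static int
byte_before_pc_is_pushf(const uint8_t *code, size_t pc_offset)
{
    return pc_offset > 0 && code[pc_offset - 1] == PUSHF_OPCODE;
}
```

For example, the bytes b8 00 00 00 9c (mov $0x9c000000, %eax) are rejected
by pushf_length, but the backward check at offset 5 reports a match.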

Solution #3 would require extra ptrace commands anyway (read-modify-write
the flags), so it may end up being less performant, if #1 and #2 already
hit the code cache.
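
As a sketch of the read-modify-write that #3 implies, assuming Linux on
x86-64 and a tracee that is already stopped under ptrace: the helper names
below are hypothetical, and this is not how GDB's native target is
structured.  It also ignores whatever the kernel itself does to hide or
force TF in the reported flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#if defined(__linux__) && defined(__x86_64__)
#include <sys/ptrace.h>
#include <sys/user.h>
#endif

#define EFLAGS_TF 0x100ULL  /* trap flag (bit 8) */

/* Pure helper: the flags value with TF cleared.  */
static uint64_t
without_tf(uint64_t flags)
{
    return flags & ~EFLAGS_TF;
}

/* Clear TF in a stopped tracee.  Returns 1 if TF was set and has been
   cleared, 0 if it was already clear, -1 on error or on a platform
   this sketch does not cover.  */
static int
clear_tf_after_step(pid_t pid)
{
#if defined(__linux__) && defined(__x86_64__)
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    if ((regs.eflags & EFLAGS_TF) == 0)
        return 0;  /* nothing to write back */
    regs.eflags = without_tf(regs.eflags);
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1)
        return -1;
    return 1;
#else
    (void) pid;
    return -1;
#endif
}
```

That is up to two extra ptrace calls per single-step, which is the cost
being weighed against the memory read needed for #1 and #2.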

There are some extra complications around #1 and #2 for gdbserver,
because we need to consider the cases when gdbserver handles 
single-stepping without roundtripping to gdb:

  - range-stepping
  - stepping over breakpoints/tracepoints

Thanks,
Pedro Alves


