"finish" command leads to SIGTRAP

John Baldwin jhb@FreeBSD.org
Thu Feb 21 18:50:00 GMT 2019


On 2/21/19 9:50 AM, Pedro Alves wrote:
> On 02/21/2019 03:54 PM, David Griffiths wrote:
>> It's something to do with the nature of single stepping through a "popfq"
>> instruction. Given the following instructions:
> 
> I assume you have a pushf somewhere earlier?
> 
>>
>>    0x7fffe104638f:    add    $0x8,%rsp
>>    0x7fffe1046393:    popfq
>>    0x7fffe1046394:    pop    %rbp
>>    0x7fffe1046395:    jmpq   *%rax
>>
>> If I set a breakpoint at the first of that set and single step through, I
>> end up with:
>>
>> eflags         0x346    [ PF ZF TF IF ]
>>
>> but if I set a breakpoint on the last instruction and avoid single stepping
>> I get:
>>
>> eflags         0x246    [ PF ZF IF ]
>>
>> and I think it's that TF that is causing the SIGTRAP?
> 
> Same as <https://sourceware.org/bugzilla/show_bug.cgi?id=13508> ?
> 
> I can reproduce that here, on Fedora 27 / Linux 4.17.17-100.fc27.x86_64.
> 
> Sounds like PTRACE_SINGLESTEP enables TF, which then causes pushf to push
> the state with TF set.  And then popf restores that TF-enabled state.
> 
> I'd think this is a kernel bug, in the same vein as the signal issue
> I mentioned below (in which TF would get stuck when you stepped into
> a signal handler, or something like that).  The kernel could have special
> handling for pushf, emulating it instead of actually single-stepping it?
> 
> Maybe newer Linux kernels do something else.  Haven't tried.
> 
> I wonder what other kernels, like e.g., FreeBSD do here?

FreeBSD also fails (and in the last year we had a set of changes to rework
TF handling in the kernel to boot).  This doesn't look trivial to solve.
To get the exception you have to have TF set in %rflags/%eflags, but that
means it is set when the pushf writes to the stack.  I think what would
have to happen (ugh) is that the kernel needs to recognize that the #DB
fault is due to a pushf instruction and that if the TF was a "shadow" TF
due to ptrace it needs to clear TF from the value written on the stack as
part of the fault handler.

> Guess if GDB is to work around this, it'll have to either add
> special treatment for this instruction (emulate, step over with a software
> breakpoint, something like that), or clear TF manually after
> single-stepping.  :-/

I suspect it will be common for kernels to have this bug because the CPU
will always write a value onto the stack with TF set as part of
executing the instruction.  A workaround in GDB would be much like what I
described above with the advantage that GDB actually knows it is stepping a
pushf before it steps it, so it can know to rewrite the value on the
stack after it gets the SIGTRAP for the single step over the pushf.
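
As a rough sketch of that fixup (illustrative only, not GDB or kernel code):
the snippet below models the top of the stack as a byte buffer and clears
TF (bit 8, 0x100) in the 8-byte flags word that pushfq would have written.
The function name and the byte-buffer "stack" are made up for the example.
The 0x346 and 0x246 values are the eflags values reported earlier in this
thread.

```python
import struct

TF = 0x100  # trap flag, bit 8 of eflags/rflags


def clear_tf_in_pushed_flags(stack: bytearray, offset: int) -> int:
    """Clear TF in the little-endian 64-bit flags word at stack[offset].

    Hypothetical helper: 'stack' stands in for inferior memory at %rsp
    right after a pushfq has been single-stepped.
    """
    (flags,) = struct.unpack_from("<Q", stack, offset)
    flags &= ~TF
    struct.pack_into("<Q", stack, offset, flags)
    return flags


# Flags word as pushed while single-stepping: [ PF ZF TF IF ]
stack = bytearray(struct.pack("<Q", 0x346))
fixed = clear_tf_in_pushed_flags(stack, 0)
print(hex(fixed))  # 0x246, i.e. [ PF ZF IF ]
```

A real workaround would only do this when TF was set by the debugger for
the step and not by the program itself.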

This may actually be hard for a kernel to get right as at the time of the
fault we don't get anything that says how long the faulting instruction was,
etc.  Thus, just looking at the byte before the current eip/rip in a #DB
fault handler for the pushf opcode (I believe it's a single byte) can get
false positives because you might have stepped over a mov instruction with
an immediate whose last byte happens to be the opcode, etc.
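
To make the false positive concrete, here is a small sketch (again purely
illustrative, with a made-up helper name) of the naive "look at the byte
before the instruction pointer" check.  It assumes the pushf opcode is
0x9c and uses "mov $0x9c000000,%eax" (bytes b8 00 00 00 9c) as the
lookalike instruction.

```python
PUSHF_OPCODE = 0x9C  # single-byte pushf/pushfq opcode


def byte_before_ip_is_pushf(code: bytes, ip: int) -> bool:
    """Naive check: is the byte just before 'ip' the pushf opcode?

    Hypothetical helper; 'ip' is an offset into 'code' pointing at the
    instruction that will execute next, as seen after a single step.
    """
    return ip > 0 and code[ip - 1] == PUSHF_OPCODE


# pushfq ; nop  -> after stepping the pushfq, ip is at offset 1
real = bytes([0x9C, 0x90])

# mov $0x9c000000,%eax ; nop  -> after stepping the mov, ip is at offset 5
# (b8 followed by a little-endian imm32 whose last byte is 0x9c)
lookalike = bytes([0xB8, 0x00, 0x00, 0x00, 0x9C, 0x90])

print(byte_before_ip_is_pushf(real, 1))       # True, a genuine pushfq
print(byte_before_ip_is_pushf(lookalike, 5))  # True, but it was a mov
```

Both calls report a match, so the check alone cannot tell the two cases
apart without knowing where the stepped instruction started.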

-- 
John Baldwin

                                                                            


