This is the mail archive of the
frysk@sources.redhat.com
mailing list for the frysk project.
Re: First try of breakpoint support
- From: Wu Zhou <woodzltc at cn dot ibm dot com>
- To: Mark Wielaard <mark at klomp dot org>
- Cc: frysk at sources dot redhat dot com
- Date: Mon, 14 Aug 2006 22:29:29 -0400
- Subject: Re: First try of breakpoint support
Quoting Mark Wielaard <mark@klomp.org>:
Yes, I also had some intermittent failures. That first try depended on
the timing of the ptrace calls to be right. I have fixed that by
integrating the code with the proc and task state machines. That makes
sure that the various ptrace calls are made at the right time. This also
makes the code much faster.
Great! Nice to know that it is fixed.
I also have some thoughts about general breakpoint support.
- the kprobe-like mechanism to support multi-threaded breakpoints seems
to me a very good idea. Some candidates I can think of for holding the
original instruction are: the .init section (it is not needed after
the program starts up), the ELF header or program headers (storing
this information somewhere else, so there is no need to look into
them again), or an extra dynamically loaded library (through some kind
of dynamic code patching technique).
Yes, the only issue would be locking because we want the areas to be
only used by one thread at a time. Ideally we would have some per Task
section. Unfortunately we cannot use something like the per-thread-stack
since those are not executable these days. Loading an extra dynamically
loaded library is clever. But that might be pretty intrusive on the
process we monitor.
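The locking concern can be illustrated with a small sketch. It assumes a
fixed pool of executable scratch slots somewhere in the debuggee, each
holding one displaced original instruction, and it hands a slot to at most
one task at a time. The SlotPool class, the slot addresses and the task ids
are hypothetical names chosen for illustration; none of this is Frysk code.

```python
class SlotPool:
    """Hands out scratch slots so that each slot is used by one task at a time."""

    def __init__(self, base_address, slot_size, count):
        # Addresses of the free slots inside the (assumed) executable scratch area.
        self._free = [base_address + i * slot_size for i in range(count)]
        self._owner = {}  # task id -> slot address currently held

    def acquire(self, task_id):
        """Return a slot address for task_id, or None if every slot is busy."""
        if task_id in self._owner:
            return self._owner[task_id]
        if not self._free:
            return None  # caller has to keep the task stopped and retry later
        slot = self._free.pop(0)
        self._owner[task_id] = slot
        return slot

    def release(self, task_id):
        """Give the slot held by task_id back to the pool, if it holds one."""
        slot = self._owner.pop(task_id, None)
        if slot is not None:
            self._free.append(slot)


pool = SlotPool(base_address=0x1000, slot_size=16, count=2)
first = pool.acquire(task_id=101)
second = pool.acquire(task_id=102)
third = pool.acquire(task_id=103)   # both slots are taken at this point
pool.release(101)
retry = pool.acquire(task_id=103)   # reuses the slot task 101 gave back
```

A pool like this trades the per-Task section for a shared area plus
bookkeeping in the debugger, so the locking lives in the debugger rather
than in the debuggee.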
Yes, the non-executable stack is a problem. Is there any way for the
debugger to change the access attributes of the debuggee's pages? AFAIK,
ptrace can't do that. I am now wondering whether utrace can. It
does not seem that hard to me: in my opinion, utrace could register a
hook function to do it.
If this kind of mechanism is available, it might also help in
implementing watchpoints. To set a write watchpoint on some address,
we can make its page read-only. A write attempt on that page then
triggers a page fault, which the debugger can catch and report as
a write hit. A read watchpoint would work similarly, by revoking read
access to the page. Any ideas?
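The page-protection idea can be sketched as a pure simulation. Nothing
here talks to a real kernel: the Memory and Debugger classes and the
WriteFault exception are hypothetical stand-ins for the debuggee's address
space, the debugger and the page fault, and a 4096-byte page is assumed.

```python
PAGE_SIZE = 4096  # assumed page size


class WriteFault(Exception):
    """Stands in for the page fault the kernel would deliver."""


class Memory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.read_only_pages = set()

    def write(self, address, value):
        if address // PAGE_SIZE in self.read_only_pages:
            raise WriteFault(address)
        self.data[address] = value


class Debugger:
    def __init__(self, memory):
        self.memory = memory
        self.watched = set()
        self.hits = []

    def watch_write(self, address):
        # Protect the whole page that contains the watched address.
        self.watched.add(address)
        self.memory.read_only_pages.add(address // PAGE_SIZE)

    def debuggee_write(self, address, value):
        """Perform a write on behalf of the debuggee, handling faults."""
        try:
            self.memory.write(address, value)
        except WriteFault:
            page = address // PAGE_SIZE
            if address in self.watched:
                self.hits.append((address, value))
            # Let the write through, then protect the page again.
            self.memory.read_only_pages.discard(page)
            self.memory.write(address, value)
            self.memory.read_only_pages.add(page)


mem = Memory(2 * PAGE_SIZE)
dbg = Debugger(mem)
dbg.watch_write(0x10)
dbg.debuggee_write(0x10, 7)           # watched address: reported as a hit
dbg.debuggee_write(0x20, 9)           # same page, not watched: faults, not reported
dbg.debuggee_write(PAGE_SIZE + 4, 1)  # other page: no fault at all
```

The second write shows the cost of this approach: protection is per page,
so every write to the page faults and the debugger has to filter out the
addresses nobody is watching.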
Yes, loading an extra dynamically loaded library is somewhat intrusive.
In fact, many things a debugger does are intrusive in some respect.
The difference is the degree of intrusiveness. What degree can we
accept?
- this kind of dynamic code patching technique might also be used for
fast conditional breakpoints. By this I mean something like
Dyninst (http://www.cs.umd.edu/projects/dyninstAPI/). I know it is
mainly used for non-interactive dynamic instrumentation, and I am not
sure whether it is suitable for interactive debugging. Maybe some
other technique is feasible for doing something similar in interactive
debugging. Any ideas on how to implement fast conditional breakpoints
in Frysk?
I hadn't seen that yet. Unfortunately it isn't Free Software and can
only be used for research purposes. But the idea to have a rich library
of dynamic instrumentation code is nice. We will most likely need
something that can generate conditional breakpoints by code patching at
some point. But for now I have kept the design as easy as possible and
just have unconditional breakpoints. There are a lot of issues to think
about when multiple threads might have different conditionals on their
breakpoints.
Yes, multi-threaded debugging is always a hard problem. Maybe we can
first list all the problems we might encounter, and then the currently
available solutions with their pros and cons. What support does the
low level provide? What do we need to ask the low level for? Based on
these, we might come up with a better solution, though there is really
quite a lot of work to be done.
- I see that you create a new breakpoint by hardwiring it to
LinuxIa32Breakpoint. We think that a breakpoint for LinuxPPC64
will be very similar, except that we will use an illegal instruction
opcode, which will trigger a SIGTRAP as well. The rest of the workflow
is the same: frysk.sys.Wait will detect this when inspecting the wait
status, and handleTrappedEvent will be called. Is our understanding
correct? Is there anything we need to keep in mind when adding
LinuxPPC64 support for this? And we would appreciate it very much if
you could share your latest working code with us.
I'll post the rewritten code asap. I redesigned the state machine part 3
times now and I am still not completely satisfied, but having the code
out means more feedback. I made the Breakpoint class even simpler,
ignoring any possible optimizations for now. That should make it even
simpler to get it up and running on another architecture quickly. The
things to consider with this simple approach are 'branch-delay-slots' if
your processor has those. In that case the simple set/reset
trapping-instruction approach doesn't work if the target is a branch instruction
since then an instruction from the branch delay slot just after the
branch might be executed before the breakpoint is hit.
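The simple set/reset approach can be sketched against a plain byte buffer
standing in for the debuggee's code. This is a simulation only: the
Breakpoint class below is a hypothetical illustration, not the Frysk
class of the same name, and the only architecture fact it relies on is
that IA-32 has a one-byte int3 trap instruction (0xCC). Another
architecture would pass its own trap bytes. Branch delay slots are not
modeled.

```python
IA32_INT3 = bytes([0xCC])  # one-byte int3 trap instruction on IA-32


class Breakpoint:
    def __init__(self, address, trap=IA32_INT3):
        self.address = address
        self.trap = trap
        self.original = None  # bytes overwritten by the trap, once set

    def set(self, memory):
        """Save the original bytes and patch in the trapping instruction."""
        end = self.address + len(self.trap)
        self.original = bytes(memory[self.address:end])
        memory[self.address:end] = self.trap

    def reset(self, memory):
        """Put the original bytes back so the instruction can execute."""
        end = self.address + len(self.trap)
        memory[self.address:end] = self.original
        self.original = None


# push %ebp; mov %esp,%ebp; nop; ret
code = bytearray([0x55, 0x89, 0xE5, 0x90, 0xC3])
bp = Breakpoint(address=3)
bp.set(code)
patched = bytes(code)
bp.reset(code)
restored = bytes(code)
```

Keeping the trap bytes as a constructor parameter is one way to keep the
class architecture-neutral, which matches the goal of getting it running
on another architecture quickly.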
Thanks in advance for that. We will be very happy to go through your
code and see if we can provide any feedback. And it is really awful for
me to hear that you have changed the state machine three times (I hope
you don't change it much more, so that I don't need to spend again the
time I needed to understand it the first time. :-).
As for branch delay slots, I am not sure whether Power processors have
them. We will check.
- A hardware watchpoint hit will also deliver SIGTRAP, so watchpoints
might share some of the workflow with breakpoints. Have you given
this any consideration?
There doesn't seem to be a way to set those through something like
ptrace.
On ppc64, there is an extra ptrace command which can do this. I am
not very sure about the situation on i386 though.
So we would need kernel support for that since they can normally
only be set when running in kernel-privilege mode. But I haven't looked
into this yet. If we can support them they shouldn't work too differently
from how they work now by patching and resetting the original code with
trapping instructions. It would certainly be faster.
- I remember you (or someone else) mentioned that system call tracing
can't coexist with single-stepping. I didn't know about this before.
What is the reason for it? Is there any test case to confirm it?
What about inserting a breakpoint at the instruction following the
system call to simulate stepping over the system call?
There are 2 issues. The first is that the state machine support isn't
integrated yet. Breakpoints and syscall tracing are similar, but
different enough to make it non-trivial to use the same setup. The
second is as you mention, single-stepping over a syscall instruction and
interacting correctly with ptrace is a bit tricky. The other solution
for the problem you mention above is to turn on syscall tracing whenever
you set a breakpoint on a system call instruction. Both generate a
SIGTRAP when ptrace is used. You can then just have a flag or a table
that tells you how to interpret it (as syscall enter/leave or breakpoint
or both). I haven't looked at how utrace can be used here.
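One possible reading of "a flag or a table" is a per-address table of
flags that says how a SIGTRAP reported at that address should be
interpreted. The sketch below is only that reading: TrapKind, TrapTable
and the addresses are hypothetical, and no real ptrace interaction is
shown.

```python
from enum import Flag, auto


class TrapKind(Flag):
    NONE = 0
    BREAKPOINT = auto()
    SYSCALL = auto()


class TrapTable:
    def __init__(self):
        self._kinds = {}  # address -> combined TrapKind flags

    def add(self, address, kind):
        """Record one more reason for a trap at this address."""
        self._kinds[address] = self._kinds.get(address, TrapKind.NONE) | kind

    def interpret(self, address):
        """Return how a SIGTRAP at this address should be reported."""
        return self._kinds.get(address, TrapKind.NONE)


table = TrapTable()
table.add(0x8048000, TrapKind.BREAKPOINT)
# A breakpoint placed on a system call instruction, with syscall tracing on:
table.add(0x8048010, TrapKind.BREAKPOINT)
table.add(0x8048010, TrapKind.SYSCALL)

plain = table.interpret(0x8048000)
both = table.interpret(0x8048010)
unknown = table.interpret(0x8048020)
```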
I don't have testcases yet. Those should indeed be written asap.
Yes. A test case would make this situation easier to understand. I
hope someone can write one. :-)
Regards
- Wu Zhou