
Re: Optimizing watchpoints


Roland McGrath wrote:
For the latter, that means an individual thread or a group of threads that
share a set of watchpoints. Right now, the implementation can only be done
by setting each watchpoint individually on each thread. But it is likely
that future facilities will be able to share some low-level resources and
interface overhead by treating uniformly an arbitrary subset of threads in
the same address space.

Ideally, from an API perspective, I'd like both. In the past I have always found it useful to watch every thread in a process to see which one was clobbering a given memory address. However, I would still like to preserve single-thread watchpoints from a user (Frysk) API perspective.
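
To make the shape concrete, here is a rough sketch of the dual-granularity API I have in mind. All of the names here are hypothetical, none of this is current Frysk code:

    import java.util.List;

    interface WatchObserver {
        void hit(long tid, long address);
    }

    class WatchpointRequests {
        // Watch an address in a single thread.
        void addWatch(long tid, long address, int length,
                      boolean writeOnly, WatchObserver o) {
            // ... install via the low-level engine for this one thread ...
        }

        // Convenience: watch the same address in every thread of a
        // process, to see which thread is clobbering the memory.
        void addWatch(List<Long> tids, long address, int length,
                      boolean writeOnly, WatchObserver o) {
            for (long tid : tids)
                addWatch(tid, address, length, writeOnly, o);
        }
    }

The per-process form would simply fan out to the per-thread form today, but keeping them as separate entry points leaves room for the shared low-level resources Roland describes.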


It is also likely to matter whether the chosen subset
is in fact the whole set of all threads in the same address space, and
whether a thread has only the breakpoints shared with its siblings in a
chosen subset, or has those plus additional private breakpoints of its own.
So it's worthwhile to think about how the structure of keeping track of
watchpoints (and other kinds of low-level breakpoints) can reflect those
groupings of threads from the high-level semantic control plane down to the
lowest-level implementation, where the most important sharing can occur.

Right now (correct me if I'm wrong here, Mark), we do "software" code breakpoints via single-stepping, and none of the limited debug registers are used for hardware code breakpoints. I guess the question here is whether we ever will use them, and whether any design should be written to accommodate that, or whether we should just rewrite as necessary. For now I am going to take the latter approach and assume hardware code breakpoints will never exist, at least in Frysk.
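
For reference, the classic software code-breakpoint dance looks roughly like this. This is a sketch with made-up names, not Frysk's actual implementation; 0xCC is the x86 INT3 instruction:

    interface Memory {
        byte peekByte(long addr);
        void pokeByte(long addr, byte val);
    }

    interface Stepper {
        void singleStep();
    }

    class SoftwareBreakpoint {
        static final byte BREAKPOINT_INSN = (byte) 0xCC; // x86 INT3
        private final long address;
        private byte savedByte;

        SoftwareBreakpoint(long address) { this.address = address; }

        // Plant the trap, remembering the original code byte.
        void install(Memory mem) {
            savedByte = mem.peekByte(address);
            mem.pokeByte(address, BREAKPOINT_INSN);
        }

        // On hit: restore the original instruction, single-step over
        // it, then re-plant the trap so the breakpoint stays armed.
        void stepOver(Memory mem, Stepper stepper) {
            mem.pokeByte(address, savedByte);
            stepper.singleStep();
            mem.pokeByte(address, BREAKPOINT_INSN);
        }
    }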


There is one final aspect of organization to consider. At the lowest
level, there is a fixed-size hardware resource of watchpoint slots. When
you set them with ptrace, the operating system just context-switches them
for each thread in the most straightforward way. So the hardware resource
is yours to decide how to allocate. However, this is not what we expect to
see in future facilities. The coming model is that hardware watchpoints
are a shared resource managed and virtualized to a certain degree by the
operating system. The debugger may be one among several noncooperating
users of this resource, for both per-thread and system-wide uses. Rather
than having the hardware slots to allocate as you choose, you will specify
what you want in a slot, and a priority, and can get dynamic feedback about
the availability of a slot for your priority. (For compatibility, ptrace
itself will use that facility to virtualize the demands made by
PTRACE_SET_DEBUGREG and the like. ptrace uses a known priority number that
is fairly high, so that some system-wide or other background tracing would
have to knowingly intend to interfere with traditional user application use
by choosing an even higher priority.)

This is where I see the largest change in Frysk's implementation now, and where it will change again in the future with utrace; so it would be worth putting this setting and getting behind a fairly abstract class whose implementation can be reslotted as needed. This is where I have been spending a lot of my thinking time. Right now, the debug registers will be populated via Frysk's register access routines, which are themselves being refactored. The ptrace peek and poke is abstracted away from the code, and a simple set/get will be performed via the Frysk functions to populate and read the debug registers. But as you mention, it appears that in the utrace world this will be taken out of the hands of the (abstracted) ptrace user and managed by the kernel. For the purposes of context on this list: is that hardware watchpoint design set in stone with utrace now, and would it be safe to lay plans based on it?
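
Roughly the kind of abstraction I mean (all names hypothetical): the engine codes against the abstract class, and the backing implementation can be reslotted when the ptrace world gives way to utrace:

    interface RegisterAccess {
        long readDebugRegister(int regno);
        void writeDebugRegister(int regno, long value);
    }

    abstract class DebugRegisterSet {
        abstract long get(int regno);             // read DRn for a thread
        abstract void set(int regno, long value); // write DRn for a thread
    }

    // Today's slot: backed by Frysk's (ptrace-based) register access
    // routines; a utrace-era implementation would replace this class.
    class PtraceDebugRegisterSet extends DebugRegisterSet {
        private final RegisterAccess regs;

        PtraceDebugRegisterSet(RegisterAccess regs) { this.regs = regs; }

        long get(int regno) { return regs.readDebugRegister(regno); }
        void set(int regno, long value) { regs.writeDebugRegister(regno, value); }
    }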


At one extreme you have single-step, i.e. software watchpoints by storing
the old value, stepping an instruction, and checking if the value in memory
changed. This has few constraints on specification (only that you can't
distinguish stored-same from no-store, and it's not a mechanism for data
read breakpoints). It has no resource contention issues at all. It is
inordinately expensive in CPU time (though a straightforward in-kernel
implementation could easily be orders of magnitude faster than the
traditional debugger experience of implementing this).

Conceptually (correct me if I am wrong again, Mark/Tim) this is what we do with code breakpoints, so adding a software watchpoint would be a modification of that code, while hardware watchpoints, at least at the engine level, would be a separate implementation. The user may or may not know whether they are assigning a hardware or a software watchpoint, depending on how much tunability they are given. However, I have no plans for software watchpoints at this moment.
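
If we ever did build one, a software watchpoint on top of the single-step machinery would look roughly like this sketch (hypothetical names; as Roland notes, it cannot distinguish stored-same from no-store and cannot catch reads):

    interface Memory {
        long peekLong(long addr);
    }

    class SoftwareWatchpoint {
        private final long address;
        private long savedValue;

        SoftwareWatchpoint(Memory mem, long address) {
            this.address = address;
            this.savedValue = mem.peekLong(address);
        }

        // Called after every single-step: true if the watched word
        // has changed since the last step.
        boolean checkAfterStep(Memory mem) {
            long now = mem.peekLong(address);
            if (now == savedValue)
                return false; // no visible store (or stored the same value)
            savedValue = now;
            return true;      // value changed: the watchpoint "hit"
        }
    }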


Hardware watchpoints have some precise constraints and they compete for a
very limited dynamic resource, but they are extremely cheap in CPU time.

Yes, and the constraints seem to change between processor model revisions too. Fun! Anyway, I'm still working on the bag of tricks for optimizing watchpoints, but I wanted to respond to the first part of the email to give the wider scope and open my long-term intentions up for comments. I'll comment on the second part of your email later.
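
For context on just how model-specific this is, here is a sketch of how a single x86 slot gets encoded in DR7 (layout per the IA-32 manuals; the helper names are mine):

    class Dr7 {
        static final int RW_EXECUTE   = 0x0; // code breakpoint
        static final int RW_WRITE     = 0x1; // data write watchpoint
        static final int RW_READWRITE = 0x3; // data read/write watchpoint

        // slot is 0..3 (DR0..DR3 hold the addresses); len is 1, 2 or
        // 4 bytes (8 is x86-64 only, encoded as 0x2).
        static long enableSlot(long dr7, int slot, int rw, int len) {
            int lenBits = (len == 1) ? 0x0 : (len == 2) ? 0x1
                        : (len == 8) ? 0x2 : 0x3;
            dr7 |= 1L << (slot * 2);                    // local-enable bit
            dr7 &= ~(0xfL << (16 + slot * 4));          // clear old R/W + LEN
            dr7 |= ((long) rw) << (16 + slot * 4);      // R/W field
            dr7 |= ((long) lenBits) << (18 + slot * 4); // LEN field
            return dr7;
        }
    }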


Regards

Phil


