This is the mail archive of the frysk@sourceware.org mailing list for the frysk project.
Re: Optimizing watchpoints
- From: Phil Muldoon <pmuldoon at redhat dot com>
- To: Roland McGrath <roland at redhat dot com>
- Cc: Frysk Hackers <frysk at sourceware dot org>
- Date: Wed, 10 Oct 2007 08:11:34 +0100
- Subject: Re: Optimizing watchpoints
- References: <46FD7036.2010500@redhat.com> <20071001012529.D264A4D0325@magilla.localdomain>
Roland McGrath wrote:
For the latter, that means an individual thread or a group of threads that
share a set of watchpoints. Right now, the implementation can only be done
by setting each watchpoint individually on each thread. But it is likely
that future facilities will be able to share some low-level resources and
interface overhead by treating uniformly an arbitrary subset of threads in
the same address space.
Ideally, from an API perspective, I'd like both. In the past, I have always
found it useful to watch every thread in a process to see which one was
clobbering a given memory address. However, I would still like to preserve
single-thread watchpoints in the user-facing (Frysk) API.
It is also likely to matter whether the chosen subset
is in fact the whole set of all threads in the same address space, and
whether a thread has only the breakpoints shared with its siblings in a
chosen subset, or has those plus additional private breakpoints of its own.
So it's worthwhile to think about how the structure of keeping track of
watchpoints (and other kinds of low-level breakpoints) can reflect those
groupings of threads from the high-level semantic control plane down to the
lowest-level implementation, where the most important sharing can occur.
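The grouping Roland describes might be tracked with a structure along these lines (a hypothetical sketch of mine, not Frysk code, and in C rather than Frysk's Java):

```c
/* Hypothetical sketch (not Frysk code): watchpoints are tracked per address
 * space as a shared set, with an optional private set per thread, so the
 * lowest level can see both the sharing and the per-thread extras. */
#include <stdbool.h>

#define MAX_WP 4

struct wp_set {
    unsigned long addr[MAX_WP];
    int count;
};

struct thread {
    struct wp_set *shared;   /* set shared with sibling threads, or NULL */
    struct wp_set priv;      /* watchpoints private to this thread */
};

/* Does this thread watch the given address, via either set? */
static bool thread_watches(const struct thread *t, unsigned long addr)
{
    int i;
    if (t->shared)
        for (i = 0; i < t->shared->count; i++)
            if (t->shared->addr[i] == addr)
                return true;
    for (i = 0; i < t->priv.count; i++)
        if (t->priv.addr[i] == addr)
            return true;
    return false;
}
```

Keeping the shared set as one object per subset of threads is what lets the implementation notice "this is the whole address space" or "these siblings share everything" and collapse the low-level work accordingly.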
Right now (correct me if I'm wrong here, Mark), we do "software" code
breakpoints via single-stepping, and none of the limited debug registers
are used for hardware code breakpoints. I guess the question here is
whether we ever will, and whether any design should reflect and
accommodate that, or whether we should just "rewrite as necessary".
For now I am going to take the latter, and pretend the former will never
exist, at least in Frysk.
There is one final aspect of organization to consider. At the lowest
level, there is a fixed-size hardware resource of watchpoint slots. When
you set them with ptrace, the operating system just context-switches them
for each thread in the most straightforward way. So the hardware resource
is yours to decide how to allocate. However, this is not what we expect to
see in future facilities. The coming model is that hardware watchpoints
are a shared resource managed and virtualized to a certain degree by the
operating system. The debugger may be one among several noncooperating
users of this resource, for both per-thread and system-wide uses. Rather
than having the hardware slots to allocate as you choose, you will specify
what you want in a slot, and a priority, and can get dynamic feedback about
the availability of a slot for your priority. (For compatibility, ptrace
itself will use that facility to virtualize the demands made by
PTRACE_SET_DEBUGREG and the like. ptrace uses a known priority number that
is fairly high, so that some system-wide or other background tracing would
have to knowingly intend to interfere with traditional user application use
by choosing an even higher priority.)
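For concreteness, on x86 the per-slot constraints live in the DR7 control word: each of the four slots gets an enable bit, a two-bit read/write field, and a two-bit length field. A rough encoder (my own helper, not a kernel or Frysk API) might look like:

```c
/* Hypothetical helper (mine, not a kernel or Frysk API): build the x86 DR7
 * control-word bits that arm one debug-register slot as a data watchpoint. */
#include <stdint.h>

enum wp_rw { WP_EXEC = 0x0, WP_WRITE = 0x1, WP_RDWR = 0x3 };

static uint32_t dr7_encode(int slot, enum wp_rw rw, int len_bytes)
{
    uint32_t len_bits;
    switch (len_bytes) {
    case 1: len_bits = 0x0; break;
    case 2: len_bits = 0x1; break;
    case 4: len_bits = 0x3; break;
    case 8: len_bits = 0x2; break;    /* 64-bit mode only */
    default: return 0;                /* unsupported watch length */
    }
    /* L<slot> local-enable bit, plus the RW and LEN fields for that slot. */
    return (1u << (slot * 2))
         | ((uint32_t)rw << (16 + slot * 4))
         | (len_bits     << (18 + slot * 4));
}
```

The watched address itself goes in DR0-DR3; with today's ptrace, both would be poked into each thread's debug-register slots individually, which is exactly the per-thread cost described above.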
This is where I see the largest change in Frysk's implementation now,
and where it will change in the future with utrace; it would be wise to
put this setting and getting logic in a fairly abstract class that can
be reslotted depending on the implementation. This is where I have been
spending a lot of my thinking time lately. Right now, the debug
registers will be populated via Frysk's register access routines which
are themselves being refactored. The ptrace peek and poke is abstracted
from the code, and just a simple set/get will be performed via the Frysk
functions to populate and read the debug registers. But as you mention,
it appears in the utrace world that this will be taken from the
(abstracted) ptrace user and managed by the kernel. For the purposes of
context on this list, is that hardware watchpoint design set in stone
with utrace now, and would it be safe to lay plans based on that?
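That reslottable set/get layer could be sketched as a table of swappable backend operations, so a ptrace-based backend today and a utrace-mediated one later sit behind the same interface (all names hypothetical, and in C rather than Frysk's Java):

```c
/* Sketch of the abstraction described above (all names hypothetical): debug-
 * register reads and writes go through a swappable table of operations, so a
 * ptrace backend today and a utrace-mediated backend later can be reslotted
 * behind the same interface. */
struct dbgreg_ops {
    long (*get)(void *ctx, int regno);
    void (*set)(void *ctx, int regno, long value);
};

/* Mock backend for illustration: the "registers" are an in-memory array. */
static long mock_regs[8];
static long mock_get(void *ctx, int regno) { (void)ctx; return mock_regs[regno]; }
static void mock_set(void *ctx, int regno, long value) { (void)ctx; mock_regs[regno] = value; }
static const struct dbgreg_ops mock_ops = { mock_get, mock_set };

/* Engine code only ever sees the ops table, never ptrace itself. */
static void arm_watchpoint(const struct dbgreg_ops *ops, void *ctx,
                           int slot, long addr, long dr7_bits)
{
    ops->set(ctx, slot, addr);                      /* DR0..DR3: the address */
    ops->set(ctx, 7, ops->get(ctx, 7) | dr7_bits);  /* DR7: enable and mode  */
}
```

The point of the indirection is only that the engine never touches the transport directly, so swapping the backend does not disturb the watchpoint logic above it.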
At one extreme you have single-step, i.e. software watchpoints by storing
the old value, stepping an instruction, and checking if the value in memory
changed. This has few constraints on specification (only that you can't
distinguish stored-same from no-store, and it's not a mechanism for data
read breakpoints). It has no resource contention issues at all. It is
inordinately expensive in CPU time (though a straightforward in-kernel
implementation could easily be orders of magnitude faster than the
traditional debugger experience of implementing this).
Conceptually (again, correct me if I am wrong, Mark/Tim) this is what we
do with code breakpoints, so adding a software watchpoint would be a
modification of that code, and the hardware watchpoints - at least at
the engine level - would be a separate implementation. The user may or
may not know whether they are assigning a hardware or a software
watchpoint, depending on the tunability given to them. However, I have
no plans for software watchpoints at this moment.
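For context, the store/step/compare scheme can be sketched against raw Linux ptrace roughly as follows (a toy demo of my own, not Frysk's engine: it forks a child and single-steps it until the watched word changes):

```c
/* Toy software watchpoint on Linux (my own sketch, not Frysk's engine):
 * fork a child, single-step it with ptrace, and compare the watched word
 * after every instruction until its value changes. */
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile long watched;          /* same address in the child, via fork */

/* Returns the new value seen at the watchpoint, or -1 if none was seen. */
static long watch_child_store(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                /* hand control to the tracer */
        watched = 42;                  /* the store we want to catch */
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);          /* child stopped at its SIGSTOP */
    long old = ptrace(PTRACE_PEEKDATA, pid, (void *)&watched, NULL);

    for (long steps = 0; steps < 1000000; steps++) {
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
            break;
        waitpid(pid, &status, 0);
        if (!WIFSTOPPED(status))
            break;                     /* child exited before a change */
        long now = ptrace(PTRACE_PEEKDATA, pid, (void *)&watched, NULL);
        if (now != old) {
            printf("hit after %ld steps: %ld -> %ld\n", steps + 1, old, now);
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return now;
        }
    }
    return -1;
}
```

The cost is plain to see: every instruction in the child pays a full stop, two context switches, and a peek, which is the "inordinately expensive" CPU time Roland mentions.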
Hardware watchpoints have some precise constraints and they compete for a
very limited dynamic resource, but they are extremely cheap in CPU time.
Yes, and they seem to change between processor model revisions too. Fun!
Anyway, I'm still working on the bag of tricks for optimizing
watchpoints. But I just wanted to respond to the first part of the email
to give it a wider scope, and to open it up for comments on my long-term
intentions. I'll comment on the second part of your email later.
Regards
Phil