[RFA]: Modified Watchthreads Patch
Sat Dec 11 19:07:00 GMT 2004
On Sat, Dec 11, 2004 at 07:49:35PM +0100, Mark Kettenis wrote:
> Date: Sat, 11 Dec 2004 11:36:52 -0500
> From: Daniel Jacobowitz <email@example.com>
> > Adding hacks around hacks, like we've been doing to support threads on
> > Linux for quite some time now is definitely not a good idea.
> Mark, would you please stop saying this? I don't believe it to be true
> any more. If you think it's still accurate, please point me at
> specific hacks around hacks, and let's see if we can get rid of them.
> Sorry Daniel, I know you've done some really good work with regard to
> threads in the kernel. I guess I'm still somewhat frustrated about
> the situation back when I wrote the initial Linux LWP layer. Back
> then the attitude of the kernel developers was basically: "We won't
> complicate the kernel with support for debuggers, solve everything
> from userland!".
There's still some of that attitude, but when we can demonstrate the
value of a kernel change we can get it implemented. That's how
fork/clone events came to be.
> That said, there is just too much code in linux-nat.c. If you compare
> the code necessary to implement to_wait and to_resume that's there
> with the amount of code in inf-ttrace.c, you see what I mean. Most of
> the code is present because we need to stop each thread individually
> by sending it a SIGSTOP. Things become so much simpler if the kernel
> would provide an interface to stop them all in one go that doesn't
> interfere with signal delivery...
Unfortunately, I think this is basically impossible. I spent several
weeks talking to Roland (who has a better head for the signal delivery
issues than I do - and who didn't think it was impossible, so maybe
he's right :-) about how to do this.
The problem is that many threads could be executing right now; there is
no way to stop them before any pending signals are delivered,
especially if they're running on other processors.
Comparing the code size in inf-ttrace.c and linux-nat.c is unfair.
Remember what I said (below) about the different choice of models?
There's probably as much or more code in the HP/UX kernel to stop
multiple threads and queue their (possibly multiple) pending events
for ttrace. On HP/UX the code lives in the kernel where you can't see
it; on Linux it lives in GDB. I consider fitting the nat code into
three times the size of inf-ttrace a triumph. Of course, (A) it can be
slimmed down a lot (more below) and (B) inf-ttrace implements page
protection watchpoints also.
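
To make the "stop each thread individually" work concrete, here is a
minimal sketch of the userspace side. It is not the linux-nat.c code:
stop_all_lwps is a hypothetical helper, and it uses PTRACE_ATTACH
(which queues a SIGSTOP for that one LWP) as a stand-in for the
per-thread SIGSTOP described above. It assumes a Linux /proc with a
task directory, and it ignores threads created while the loop runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not GDB code: attach to every LWP listed under
   /proc/PID/task and wait for each one to report a stop.  Returns the
   number of LWPs that stopped, or -1 if the task list is unreadable.  */
int
stop_all_lwps (pid_t pid)
{
  char path[64];
  struct dirent *ent;
  DIR *dir;
  int stopped = 0;

  assert (pid > 0);
  snprintf (path, sizeof path, "/proc/%d/task", (int) pid);
  dir = opendir (path);
  if (dir == NULL)
    return -1;

  while ((ent = readdir (dir)) != NULL)
    {
      pid_t lwp = (pid_t) atoi (ent->d_name);
      int status;

      if (lwp <= 0)
        continue;               /* Skip "." and "..".  */

      /* PTRACE_ATTACH queues a SIGSTOP for this one LWP only.  */
      if (ptrace (PTRACE_ATTACH, lwp, NULL, NULL) != 0)
        continue;

      /* __WALL lets us wait for clone children as well.  The stop
         reported here may be for some other signal that arrived
         first; a real debugger has to record that event and keep
         waiting for the SIGSTOP.  This sketch only counts stops.  */
      if (waitpid (lwp, &status, __WALL) == lwp && WIFSTOPPED (status))
        stopped++;
    }

  closedir (dir);
  return stopped;
}
```

The comment in the wait loop is where the bulk comes from: every LWP
needs its own stop request and its own wait, and whatever else it
reports in the meantime has to be kept for later.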
> I admit there are some peculiarities related to stopping all threads.
> But most of them are related to very real situations that we want to be
> able to debug: two threads receiving a signal at the same time, hitting
> different breakpoints at the same time, et cetera. Life with threads
> is just more complicated. Some platforms do the complicated bits in
> the kernel, and Linux chose to expose an LWP-oriented interface rather
> than a whole-process oriented interface so we have to do the
> complicated bits in userspace. That is not going to change, because
> the Linux design philosophy for threading is that they are just a
> special kind of process; Linux has no concept of "the whole process"
> and will not be adding one. This has been discussed from time to time
> on the linux-kernel list. [There is some correlation to the POSIX
> threading concept of a process, for the purpose of POSIX-compliant
> signal delivery, but that's the extent of it.]
> I still think this is wrong. The very fact that these processes share
> a virtual memory space means that they're grouped together. The
> kernel shouldn't deny that. But even if folks don't want to support
> freezing that memory space atomically (at least to the observer), we
> really need a way to stop each process individually that doesn't
> interfere with signal delivery. I sincerely believe that we'll keep
> seeing thread-related problems if it isn't possible to stop threads
> while keeping all signals pending.
This is less difficult. The biggest wart in linux-nat, IMO, is the
code to backtrack off of breakpoints and signals and arrange to re-hit
them later; I intend to rip it out in the next couple of months.
Gdbserver is an existence proof that it isn't necessary.
After that, I might investigate a PTRACE_STOP that stops a single
thread without queueing a SIGSTOP; however, I have looked at this
before, and there's just no good way to do it inside the kernel. I
don't agree that it's necessary, though. It might be better to send a
gdb-chosen real-time signal to the process, because of the RT signal
properties of queueing and not displacing signals sent by other
processes using the same number. It doesn't need to stop on our
signal; just on some signal.
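
The queueing property is easy to see from userspace. The following is
a small sketch, not anything from GDB: count_queued is a hypothetical
helper that blocks a signal, sends it to the current process several
times, and then counts how many instances can be dequeued. It assumes
a single-threaded caller on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: send SIGNO to ourselves COUNT times while it is
   blocked, then drain the pending instances and return how many were
   actually queued.  */
int
count_queued (int signo, int count)
{
  sigset_t set, old;
  struct timespec zero = { 0, 0 };
  int delivered = 0;
  int rc, i;

  sigemptyset (&set);
  sigaddset (&set, signo);
  rc = sigprocmask (SIG_BLOCK, &set, &old);
  assert (rc == 0);

  for (i = 0; i < count; i++)
    kill (getpid (), signo);

  /* A zero timeout makes sigtimedwait poll: it returns the signal
     number while an instance is pending and -1 once none is left.  */
  while (sigtimedwait (&set, NULL, &zero) == signo)
    delivered++;

  sigprocmask (SIG_SETMASK, &old, NULL);
  return delivered;
}
```

Called with a count of 3, this should return 1 for a standard signal
such as SIGUSR1, since repeated instances collapse into one pending
signal, and 3 for SIGRTMIN, since each real-time instance is queued
separately.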
Remember, the "pending" signal may already have been delivered by the
time the observer asks to stop it. Making the kernel turn back time
would be a bit of a trick!