This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: what a probe-induced kernel fault looks like
- From: Martin Hunt <hunt at redhat dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: systemtap at sources dot redhat dot com
- Date: Fri, 12 Aug 2005 10:25:47 -0700
- Subject: Re: what a probe-induced kernel fault looks like
- Organization: Red Hat Inc.
- References: <20050812140715.GA10334@redhat.com>
On Fri, 2005-08-12 at 10:07 -0400, Frank Ch. Eigler wrote:
> Hi -
>
> Just as a matter of curiosity, here's what a kernel fault looks like
> when a division-by-zero is triggered (by virtue of incomplete checks).
> What's interesting about it is that the erroneous division was inside
> a "begin" probe. Due to a questionable aspect of runtime design, this
> is run long after actual module load/init time: it happens after an
> explicit handshake with stpd.
Unfortunately, due to the complexity of setting up communications and
exchanging parameters, it is not practical to run begin probes at module
load time. So how is this a problem?
> This sort of handshaking-based protocol would be terribly
> inappropriate in the case of module unloading. If stpd dies, there
> would be no way to safely remove the module, e.g. to trigger orderly
> kprobes removals. Martin, how does shutdown happen in your model?
Depends on how it is initiated. In the normal case, stpd sends an exit
message to the module. The module runs the exit probes and flushes the
transport. Then stpd rmmods the module.
If stpd dies for some reason, you just rmmod the module. It attempts to
send an informational message to stpd that it is exiting, then goes
ahead and unloads.
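
As a rough illustration only, the two shutdown paths described above can be
modelled in user space. This is a toy sketch, not runtime code: the names
Module, handle_exit_message, unload, and normal_shutdown are hypothetical,
and the model assumes the "exiting" notice is attempted on every unload.

```python
# Toy user-space model of the two shutdown paths described in this message.
# All names here are illustrative; none are SystemTap runtime APIs.

class Module:
    def __init__(self):
        self.loaded = True
        self.log = []

    def handle_exit_message(self):
        # Normal path: stpd has sent an exit message.
        self.log.append("run exit probes")
        self.log.append("flush transport")

    def unload(self, stpd_alive):
        # rmmod path: try to tell stpd we are exiting, then unload anyway.
        # Assumption: the notice is attempted on every unload.
        if stpd_alive:
            self.log.append("notify stpd: exiting")
        else:
            self.log.append("notify stpd: failed (ignored)")
        self.loaded = False
        self.log.append("unloaded")


def normal_shutdown(module):
    module.handle_exit_message()     # stpd sends the exit message
    module.unload(stpd_alive=True)   # stpd then rmmods the module
    return module.log


def shutdown_after_stpd_death(module):
    # The user runs rmmod by hand. The thread does not say whether exit
    # probes also run on this path, so the model leaves them out.
    module.unload(stpd_alive=False)
    return module.log
```

The point of the model is that unload never blocks on stpd: the notice is
best-effort, and the module goes away either way.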
> For what it's worth, in my mental model of probe shutdown/startup, the
> translator-emitted code would own the module init/exit hooks; perform
> begin/end probes and kprobes registrations therein, and call into the
> runtime to begin whatever stpd chitchat it wishes to engage in. The
> probe would in no way rely on the existence or activity of stpd for
> its crucial life cycle management functions.
And that would work fine except for all the complexity involved in setting
up either netlink or relayfs communications, which is why we have stpd:
something needs to manage the kernel-to-userspace communications.
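
For comparison, the life cycle Frank describes in the quoted paragraph could
be modelled roughly as follows. This is a hypothetical user-space sketch:
ProbeModule and its methods are invented for illustration, and the ordering
of begin probes relative to kprobes registration is an assumption, since the
quoted text does not fix it.

```python
# Toy model of a module whose init/exit hooks own the probe life cycle.
# Names are illustrative only; this is not translator or runtime code.

class ProbeModule:
    def __init__(self, transport=None):
        # transport is an optional callable; None models a missing or
        # dead stpd.
        self.transport = transport
        self.registered = False
        self.events = []

    def init(self):
        # Assumed order: begin probes first, then kprobes registration.
        self.events.append("begin probes")
        self.registered = True
        self.events.append("kprobes registered")
        self._talk("start")

    def exit(self):
        # Life cycle steps run whether or not stpd is around.
        self.registered = False
        self.events.append("kprobes unregistered")
        self.events.append("end probes")
        self._talk("stop")

    def _talk(self, action):
        # Communication is optional and never gates init or exit.
        if self.transport is None:
            self.events.append("transport " + action + " skipped")
            return
        self.transport(action)
        self.events.append("transport " + action)
```

What the sketch leaves out is exactly the hard part named above: the work
hidden behind the transport callable.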
Martin