This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: whitelist for safe-mode probes (or just a better blacklist?)


David Smith wrote:

Martin Hunt wrote:

On Wed, 2006-09-20 at 11:14 -0400, Frank Ch. Eigler wrote:

Martin Hunt <hunt@redhat.com> writes:

[...]  To guarantee a probe will not crash the kernel it is going to
be necessary to generate a whitelist of probe points.

Sure, except that this guarantee is only as good as the method used to generate the whitelist.


Of course.

[...]  How would this all work? The whitelist and blacklist would be
files distributed with Systemtap.  They would be updated
automatically with a test script. [...]

How do you imagine this test script working? Could it generate a list roughly matching the "in-our-experience-so-far-safe" set in a reasonable timeframe? (It would not be very helpful if it took months to run, or resulted in a small list.)


I imagine this would be a list that would be checked into CVS of
functions that have been tested and never caused problems.  The only
reason to use a whitelist instead of a blacklist is because we should be
paranoid and not assume as new functions get added to the kernel, they
are safely probeable, as we do now.
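
The difference between the two policies can be sketched in a few lines of Python. This is only an illustration under assumed names: `allowed_by_blacklist`, `allowed_by_whitelist`, and the sample function names are hypothetical and are not part of Systemtap.

```python
def allowed_by_blacklist(func, blacklist):
    # Default-allow: anything not known to be bad may be probed,
    # including functions added to the kernel after the list was made.
    return func not in blacklist


def allowed_by_whitelist(func, whitelist):
    # Default-deny: only functions that were tested may be probed.
    return func in whitelist


# Hypothetical sample data, for illustration only.
blacklist = {"known_bad_function"}
whitelist = {"tested_function_1", "tested_function_2"}
new_function = "brand_new_kernel_function"

print(allowed_by_blacklist(new_function, blacklist))  # True
print(allowed_by_whitelist(new_function, whitelist))  # False
```

A function that appears in neither list is probeable under the blacklist and refused under the whitelist, which is the "paranoid" behaviour described above.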

Writing a script to do this testing is not difficult, except for the
problems with lockups which require a way to remotely reboot a system.
This requires we assume the existence of special hardware or that the
test system is running on a specific virtualization system.  This needs
to be done regardless of what we decide about the need for a whitelist.  I
hoped to provoke some discussion about this.  We've talked about it, but
has anyone actually written any test scripts to test all the kernel
functions this way?
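
One possible shape for such a script, sketched in Python, is a sweep that writes the function under test to a journal file before probing it. If the machine locks up and is rebooted (by hand, by special hardware, or by a virtualization host), the next run finds the leftover entry and records that function as unsafe. Everything here is an assumption for illustration: `run_sweep`, the journal layout, and the `probe_one` callback (which would have to run the actual probe) are hypothetical and do not describe any existing Systemtap test.

```python
import json
import os


def _save(state, journal_path):
    # Write to a temporary file, force it to disk, then rename it into
    # place so a crash never leaves a half-written journal behind.
    tmp = journal_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, journal_path)


def run_sweep(functions, probe_one, journal_path):
    """Test each function once, resuming across reboots."""
    state = {"safe": [], "unsafe": [], "pending": None}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            state = json.load(f)

    # A leftover "pending" entry means the previous run never finished
    # that function, so assume it hung or crashed the machine.
    if state["pending"] is not None:
        state["unsafe"].append(state["pending"])
        state["pending"] = None
        _save(state, journal_path)

    done = set(state["safe"]) | set(state["unsafe"])
    for func in functions:
        if func in done:
            continue
        state["pending"] = func
        _save(state, journal_path)   # record before probing
        ok = probe_one(func)         # may never return on a lockup
        state["safe" if ok else "unsafe"].append(func)
        state["pending"] = None
        _save(state, journal_path)
    return state
```

The "safe" and "unsafe" lists in the journal could then seed a whitelist or a blacklist, whichever is chosen.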


I can tell you that looking into the problems probing 'kernel.function("*")' on x86 over the last couple of days I've rebooted my test system (what seems like) countless times. I certainly agree with you that we'll need special hardware (perhaps x10 could be a simple start) or virtualization to get this going using a script. I do think that this testing would be extremely useful, even without a whitelist feature.

I wonder if we really might need various levels of "whitelists" to satisfy customer concerns. Something like anyone in group A can only probe syscalls, users in group B can probe syscalls + exported kernel functions, etc.
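
As a rough sketch of how such levels might be expressed, the Python below maps each group to a tier and each probe point to the lowest tier that allows it. The tier names, the group names, the `may_probe` helper, and the classification of the two sample functions are all assumptions made up for this example; nothing here reflects an existing Systemtap feature.

```python
# Hypothetical tiers, ordered from most to least restrictive.
# Each tier includes everything allowed by the tiers before it.
TIERS = ["syscalls", "exported"]

# Hypothetical assignment of user groups to tiers.
GROUP_TIER = {"groupA": "syscalls", "groupB": "exported"}

# Hypothetical classification of probe points, for illustration only.
FUNCTION_TIER = {
    "sys_open": "syscalls",
    "vfs_read": "exported",
}


def may_probe(group, func):
    # Unknown groups and unclassified functions are refused (default-deny).
    if group not in GROUP_TIER or func not in FUNCTION_TIER:
        return False
    return TIERS.index(FUNCTION_TIER[func]) <= TIERS.index(GROUP_TIER[group])


print(may_probe("groupA", "sys_open"))  # True
print(may_probe("groupA", "vfs_read"))  # False
print(may_probe("groupB", "vfs_read"))  # True
```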

I would like to chime in.

Let us think of a white list not as a tool to increase systemtap stability but as a tool to decrease tap script debug time.

If I were a system manager in an environment where my next house payment depended on system up-time, I would never run any tap script that I had not fully tested, or was supplied by my ldp. Therefore the white list only helps me in a test environment, by speeding up the testing of scripts to be used later in production. In other words, the white list keeps me from falling into pitfalls caused by using untested tap points. But it won't eliminate finding new pitfalls during my testing.

But thinking about it now, that is the same thing the black list is doing....

Testing is a good thing, but we should match the effort with the correct paradigm and work on maintaining just the black list.

--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA dwilder@us.ibm.com
(503)578-3789


