This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Output Redesign in SystemTap Runtime



SUMMARY

The current hastily-implemented output format in the runtime is
inadequate.  I have come to the conclusion that we need to have tagged
data in the output stream for the kernel-to-userspace transport.

I'm proposing an xml-like syntax and expect to be able to increase
efficiency and greatly improve flexibility.

CURRENT STATUS

The current API has a per-cpu static buffer that is written into by
printf-like commands.  When the programmer wants the accumulated data
sent and timestamped, stp_print_flush() must be called.  A timestamp is
placed in the front of the accumulated data and the result copied to the
relayfs per-cpu data buffer, where it will be delivered when the buffer
fills.  Obviously we can improve this process.
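
As a rough illustration of the model described above, here is a minimal
userspace sketch in C. It is not the runtime's code: the names
sketch_printf() and sketch_flush(), the buffer sizes, and the "[timestamp]"
prefix format are all assumptions made for the example, and a plain char
array stands in for the relayfs per-cpu buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define STP_BUF_LEN 256   /* hypothetical staging-buffer size */

/* Stand-in for one CPU's static staging buffer. */
struct stp_cpu_buf {
    char   data[STP_BUF_LEN];
    size_t len;
};

/* printf-like append into the staging buffer; output is truncated when full. */
static void sketch_printf(struct stp_cpu_buf *b, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int n;

    assert(b->len < STP_BUF_LEN);
    room = STP_BUF_LEN - b->len;
    va_start(ap, fmt);
    n = vsnprintf(b->data + b->len, room, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    b->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/*
 * Put a timestamp in front of the staged text, copy the result to `out`
 * (standing in for the relayfs buffer), and reset the staging buffer.
 * Returns the number of bytes stored in `out`, excluding the NUL.
 */
static size_t sketch_flush(struct stp_cpu_buf *b, unsigned long long ts,
                           char *out, size_t out_len)
{
    int n;

    assert(out != NULL && out_len > 0);
    n = snprintf(out, out_len, "[%llu] %s", ts, b->data);
    b->len = 0;
    b->data[0] = '\0';
    if (n < 0)
        return 0;
    return ((size_t)n < out_len) ? (size_t)n : out_len - 1;
}
```

Note the extra copy in sketch_flush(): every byte is written once into the
staging buffer and once more into the transport buffer.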

REQUIREMENTS

1. It must be efficient!

2. Relayfs requires timestamps so we can reassemble the timeline from the
per-cpu files.

3. It should allow netlink to be used instead of relayfs in
situations where a stream of data is needed.

4. Netlink data needs optional cpu numbers and timestamps.

5. I would like to enable applications to easily grab different types of
data from an output stream. For example, an application might want to
have register dumps tagged so they would be easily extracted from the
data. Think of what kernel debuggers or trace tools might want to do.
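
To make requirement 2 concrete, here is a minimal sketch in C of how a
post-processor might reassemble one timeline from per-cpu streams. The
struct rec layout, MAX_CPUS, and merge_streams() are hypothetical; the
sketch only assumes that each per-cpu stream is already in timestamp order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8   /* hypothetical upper bound for this sketch */

/* One decoded record; a real record would also carry a payload. */
struct rec {
    unsigned long long ts;
    int cpu;
};

/*
 * Merge `ncpu` per-cpu streams, each already sorted by timestamp, into
 * `out`. Ties go to the lowest stream index. Returns the number of
 * records written (at most `out_cap`).
 */
static size_t merge_streams(const struct rec *const *streams,
                            const size_t *lens, size_t ncpu,
                            struct rec *out, size_t out_cap)
{
    size_t pos[MAX_CPUS] = {0};
    size_t n = 0;

    assert(ncpu <= MAX_CPUS);
    for (;;) {
        size_t best = ncpu;   /* ncpu means "no stream has data left" */

        for (size_t i = 0; i < ncpu; i++) {
            if (pos[i] == lens[i])
                continue;
            if (best == ncpu ||
                streams[i][pos[i]].ts < streams[best][pos[best]].ts)
                best = i;
        }
        if (best == ncpu || n == out_cap)
            break;
        out[n++] = streams[best][pos[best]++];
    }
    return n;
}
```

Without a timestamp on every record, this merge has nothing to order by,
which is why the requirement applies to relayfs in particular.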

Trace data can be placed within the following categories:
 - timestamps
 - cpu number
 - trace information (entering and exiting functions)
 - stack dumps
 - register dumps
 - memory dumps
 - aggregate and static variable values
 - user-defined tagged data
 - misc ASCII output
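
One way to picture these categories is as an enumeration with a name
table, so that a consumer can ask for one kind of data by tag name. This is
only a sketch in C: the enum, the lookup helpers, and most of the tag
strings are invented for illustration (only ENTRY, EXIT, stack, and REG
appear in this proposal).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tag identifiers, one per category in the list above. */
enum stp_tag {
    STP_TAG_TIMESTAMP,
    STP_TAG_CPU,
    STP_TAG_TRACE,
    STP_TAG_STACK,
    STP_TAG_REG,
    STP_TAG_MEM,
    STP_TAG_VAR,
    STP_TAG_USER,
    STP_TAG_TEXT,
    STP_TAG_COUNT
};

/* Hypothetical tag strings; only "stack" and "REG" come from the proposal. */
static const char *const stp_tag_names[STP_TAG_COUNT] = {
    "ts", "cpu", "trace", "stack", "REG", "mem", "var", "user", "text"
};

/* Map a tag identifier to its string. */
static const char *stp_tag_name(enum stp_tag t)
{
    assert((unsigned)t < STP_TAG_COUNT);
    return stp_tag_names[t];
}

/* Map a tag string back to its identifier; STP_TAG_COUNT if unknown. */
static enum stp_tag stp_tag_from_name(const char *name)
{
    for (int i = 0; i < STP_TAG_COUNT; i++)
        if (strcmp(name, stp_tag_names[i]) == 0)
            return (enum stp_tag)i;
    return STP_TAG_COUNT;
}
```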

RANDOM THOUGHTS

1. There are two logical output streams, the Event stream and the
Command stream.  All printed data goes over the Event stream, except
what is printed using stp_log().  All internal commands between the
runtime and stpd, and the output of stp_log(), go over the Command
stream.  The Event stream is by default relayfs, but can be defined at
compile time to use netlink instead.  The Command stream is always
netlink.  stp_log() output is debug messages, warnings, and other
transient output that isn't even guaranteed to get displayed. It could
be printk'd.

Why have netlink as an option for the Event stream? Sometimes we don't
want the buffering or per-cpu characteristics of relayfs.  This should be
rare (??), mainly for low-traffic realtime monitoring such as
shellsnoop.  It would be a compile-time option; output data is never
sent over both relayfs and netlink. Maybe we could simply force relayfs
to flush its buffers after every write, and that would be enough.
However, netlink is already implemented for the commands.

2. Forget the previous model (stp_print_flush, etc)

Any kprobe that generates output must call stp_print_entry(). This
creates the timestamp for any data that follows.

stp_print_entry(void) - prints a timestamped ENTRY tag.
stp_print_exit(void) - prints a timestamped EXIT tag. Used on return
probes.
(should those take strings so we can display output along with the
ENTRY/EXIT tags?)

output is <ENTRY timestamp func_addr> and <EXIT timestamp func_addr>
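
A minimal sketch in C of the ASCII form of these two tags follows. The
proposed functions take no arguments and would get the timestamp and
function address from the probe context; this sketch takes them as
parameters instead so it can run in userspace. sketch_print_tag() and the
decimal/hex number formats are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<ENTRY ts addr>" or "<EXIT ts addr>" into buf.
 * Returns the length snprintf() reports, i.e. the full tag length
 * even if the tag had to be truncated to fit `len`.
 */
static int sketch_print_tag(char *buf, size_t len, int is_exit,
                            unsigned long long ts, unsigned long func_addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "<%s %llu 0x%lx>",
                    is_exit ? "EXIT" : "ENTRY", ts, func_addr);
}
```

For example, an entry probe on a function at address 0xc01234ab firing at
time 12345 would format as "<ENTRY 12345 0xc01234ab>".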

3. All trace data is tagged with XML (or xml-like) tags. So stack data
is tagged as <stack addr addr ... /> and registers as <REG r1 r2 />
(or should that be <stack format=format_type> addr addr </stack>)
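
For the first of the two forms, a stack tag could be built up as in the
following C sketch. sketch_format_stack() is a hypothetical helper, and
printing addresses as 0x-prefixed hex is an assumption; the proposal does
not fix a number format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `n` return addresses as "<stack addr addr ... />".
 * Returns the tag length, or -1 if it does not fit in `len` bytes.
 */
static int sketch_format_stack(char *buf, size_t len,
                               const unsigned long *addrs, size_t n)
{
    size_t used;
    int w;

    assert(buf != NULL && len > 0);
    w = snprintf(buf, len, "<stack");
    if (w < 0 || (size_t)w >= len)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + used, len - used, " 0x%lx", addrs[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, len - used, " />");
    if (w < 0 || (size_t)w >= len - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

A register dump would follow the same pattern with a different tag name.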

4. Addresses, timestamps, and register values could actually be binary
to save time and space. No need to worry about endian problems because
this is just an internal format.  The post-processing step would need to
convert to ASCII and save in a proper XML format.  A simple #define could
be used to toggle between ASCII and binary to aid debugging.
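
The #define toggle might look something like the C sketch below. The macro
name STP_ASCII_OUTPUT, the helper sketch_emit_u64(), and the fixed-width
hex encoding are all assumptions. The binary branch copies the value in
host byte order, which matches the point above that endianness does not
matter for an internal format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch: build with -DSTP_ASCII_OUTPUT=1 when debugging. */
#ifndef STP_ASCII_OUTPUT
#define STP_ASCII_OUTPUT 0
#endif

/*
 * Emit one 64-bit value (address, timestamp, or register) into buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t sketch_emit_u64(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL);
#if STP_ASCII_OUTPUT
    /* Human-readable: 16 hex digits plus a terminating NUL. */
    int n = snprintf(buf, len, "%016llx", (unsigned long long)v);
    return (n < 0 || (size_t)n >= len) ? 0 : (size_t)n;
#else
    /* Raw bytes in host byte order: no formatting cost, fixed size. */
    if (len < sizeof v)
        return 0;
    memcpy(buf, &v, sizeof v);
    return sizeof v;
#endif
}
```

In binary mode each value costs 8 bytes and one memcpy(); in ASCII mode it
costs 16 bytes and a formatting call, but the stream can be read directly.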

5. For relayfs, don't use relayapp_write(). Instead implement our own
version of relay_write that writes directly into the underlying buffer.
Tom suggests using relay_reserve().
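
The reserve-then-fill idea can be sketched in userspace C as follows. This
does not call the real relayfs API: sketch_reserve() is a stand-in that
only mimics the general shape of a reserve call (ask for N bytes, get back
a pointer into the channel buffer), and the 64-byte channel and record
layout are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one per-cpu channel buffer. */
struct sketch_chan {
    char   buf[64];
    size_t offset;
};

/* Reserve `length` bytes in the channel; NULL if there is no room. */
static void *sketch_reserve(struct sketch_chan *c, size_t length)
{
    void *p;

    assert(c->offset <= sizeof c->buf);
    if (length > sizeof c->buf - c->offset)
        return NULL;
    p = c->buf + c->offset;
    c->offset += length;
    return p;
}

/*
 * Write a (timestamp, cpu) record straight into the reserved slot,
 * with no intermediate staging buffer. Returns 0 on success, -1 if full.
 */
static int sketch_write_record(struct sketch_chan *c, uint64_t ts,
                               uint32_t cpu)
{
    char *slot = sketch_reserve(c, sizeof ts + sizeof cpu);

    if (slot == NULL)
        return -1;
    memcpy(slot, &ts, sizeof ts);
    memcpy(slot + sizeof ts, &cpu, sizeof cpu);
    return 0;
}
```

The point of the pattern is that each record is written exactly once, in
place, rather than being assembled elsewhere and then copied.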

6. stpd probably needs a new name because it isn't really a daemon
anymore.

7. What about probes with custom GUI interfaces?  Do we define a way for
stpd to fork and forward data to a GUI app, or do we simply structure
stpd so anyone can grab the sources and easily build a custom version of
it?



