This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [pcp] suitability of PCP for event tracing
- From: Ken McDonell <kenj at internode dot on dot net>
- To: nathans at aconex dot com
- Cc: "Frank Ch. Eigler" <fche at redhat dot com>, Greg Banks <gnb at evostor dot com>, systemtap at sources dot redhat dot com, pcp at oss dot sgi dot com
- Date: Sun, 19 Sep 2010 00:21:34 +1000
- Subject: Re: [pcp] suitability of PCP for event tracing
- References: <388028135.1068001284679120887.JavaMail.root@mail-au.aconex.com>
On 17/09/2010 9:18 AM, nathans@aconex.com wrote:
...
[sp. TCP] :) ... local context mode could be used in that situation
(PM_CONTEXT_LOCAL), which would map more closely to the current trace
tools and doesn't use TCP. I haven't seen any reason why this scheme
won't work for our little-used local context friend, good thing we did
not remove that code, eh Ken? ;)
Yep. And the good thing is that there are no extra PMDA or PMCD
changes needed here ... build for the distributed case, then when the
client is running on the same host as the data source, you can choose
the low latency PM_CONTEXT_LOCAL path ... promoting PM_CONTEXT_LOCAL to
be a first class citizen has meant that we can accommodate new PMDAs in
this scheme, e.g. the event tracing PMDA.
...
I guess it remains to be seen what (existing) tools will do with the
trace data ... I'm guessing for the most part they will ignore it (as
many of them already do for STRING/AGGREGATE types: pmie, pmval, etc.).
So, there's still plenty of work to be done to do a good job of adding
support to the client tools - almost certainly a new tracing-specific
tool will be needed.
For the existing tools, I think we'll probably end up adding a routine
to libpcp to turn a PM_TYPE_EVENT "blob" into a pmResult ... this will
work for pminfo and pmprobe where the timestamps are ignored. For pmie,
pmval, pmdumptext, pmchart, ... I'm not sure how they can make sense of
the event trace data in real time. Expecting data from time t and
getting a set of values with different timestamps, all earlier than t,
is going to be a bit odd for these tools.
The tracing-specific tool will be expecting it, so that should be OK.
I've also thought we could even teach pmlogger about PM_TYPE_EVENT data
and it could emit one pmResult per event record to capture the correct
time sequences from the original event traces ... this only makes sense
if there are tools that can use this sort of data from an archive.
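To make the "one pmResult per event record" idea concrete, here is a minimal sketch in Python rather than C, for brevity. EventRecord, Result and unpack_events are hypothetical stand-ins, not libpcp names, and the layout of a PM_TYPE_EVENT value is assumed, not taken from any implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EventRecord:
    """One event inside a fetched PM_TYPE_EVENT value (assumed layout)."""
    timestamp: float            # when the event occurred
    params: Dict[str, object]   # event parameters keyed by name


@dataclass
class Result:
    """Stand-in for a pmResult: one timestamp plus a set of values."""
    timestamp: float
    values: Dict[str, object]


def unpack_events(records: List[EventRecord]) -> List[Result]:
    """Emit one Result per event record, ordered by event timestamp."""
    ordered = sorted(records, key=lambda rec: rec.timestamp)
    return [Result(rec.timestamp, dict(rec.params)) for rec in ordered]
```

Each Result carries the event's own timestamp, not the time of the fetch, which is what would let an archive replay the original sequence.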
...
Main concerns center around the PMDA buffering scheme ... things like,
how does a PMDA decide what a sensible timeframe for buffering data is
(probably will need some kind of per-PMDA memory limit on buffer size,
rather than time frame). Also, will the PMDA have to keep track of
which clients have been sent which (portions of?) buffered data? (in
case of multiple clients with different request frequencies ... might
get a bit hairy?).
I'm bald, so hairy is no threat ... 8^)>
I _do_ think this is simple:
- doubly linked list (or similar) of events
- reference count when event arrives based on number of matching client
registrations
- scan list for each client gathering matching events, decrementing
reference counts
- free event record when reference count is zero
- tune buffer depth per client with pmStore
- cull list if client is not keeping up and return PM_ERR_TOOSLOW
Plus several variants around lists per client or bit maps per client to
reduce matching overhead on each pmFetch.
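A rough sketch of the scheme listed above, in Python rather than C to keep it short. It uses the per-client "bit map" variant: each event keeps the set of clients still owed it, and the reference count is the size of that set. EventBuffer, TooSlow, set_depth and matching on an event "kind" are all assumptions made for illustration; TooSlow stands in for the proposed PM_ERR_TOOSLOW and set_depth for tuning via pmStore.

```python
from collections import deque


class TooSlow(Exception):
    """Stand-in for the PM_ERR_TOOSLOW error proposed in the thread."""


class Event:
    def __init__(self, kind, payload):
        self.kind = kind
        self.payload = payload
        self.pending = set()  # clients still owed this event; refcount == len(pending)


class EventBuffer:
    def __init__(self):
        self.events = deque()  # arrival order, oldest first
        self.clients = {}      # client id -> registration details

    def register(self, client, kinds, depth=4):
        self.clients[client] = {"kinds": set(kinds), "depth": depth, "culled": False}

    def set_depth(self, client, depth):
        # Plays the role of tuning the per-client buffer depth with pmStore.
        self.clients[client]["depth"] = depth

    def arrive(self, kind, payload):
        ev = Event(kind, payload)
        # Reference count on arrival is the number of matching registrations.
        ev.pending = {c for c, reg in self.clients.items() if kind in reg["kinds"]}
        if not ev.pending:
            return  # nobody registered interest, nothing to keep
        self.events.append(ev)
        for client in list(ev.pending):
            self._cull(client)
        self._free()

    def _cull(self, client):
        # Drop the oldest events owed to a client that is over its depth.
        reg = self.clients[client]
        owed = [e for e in self.events if client in e.pending]
        excess = max(len(owed) - reg["depth"], 0)
        for e in owed[:excess]:
            e.pending.discard(client)
            reg["culled"] = True

    def _free(self):
        # Free event records whose reference count has reached zero.
        self.events = deque(e for e in self.events if e.pending)

    def fetch(self, client):
        reg = self.clients[client]
        if reg["culled"]:
            reg["culled"] = False
            raise TooSlow(client)  # report the loss once, then carry on
        got = []
        for e in self.events:
            if client in e.pending:
                e.pending.discard(client)  # decrement the reference count
                got.append((e.kind, e.payload))
        self._free()
        return got
```

Clients fetching at different rates each see every matching event once, and an event is released only after the slowest interested client has either fetched it or been culled.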
If my battery would last long enough, I think this could be done on a
plane between Copenhagen and Melbourne!
Also, we've not really considered the additional requirements that we
have in archive mode. Unlike the sampled data, traces have explicit
start and end points, which we will need to know about. For example,
if I construct a chart with starting offset (-S) at 10am and ending
(-T) at 10:15, and a trace started at 9:45 which completes at 10:10,
I'd expect to see that trace displayed, even though the trace data
would (AIUI, in this proposal) all be stored at the time the trace
was sampled? ...
I think you'd need my "expand PM_TYPE_EVENT into a set of pmResults"
change to pmlogger to get close here. But even with that, I'm not sure
what pmchart is going to do with event data records having timestamps of
9:45:03.456, 9:45:03.501, 9:45:04.001, etc. The event parameters are
likely to be discrete, so the semantics are going to be hard for pmchart.
... Well, actually, not sure how this will look? - does a
trace have to end before a PMDA would see it? that'd be a bit lame;
or would we export start and end events separately? ...
This depends on the underlying event tracing subsystem. Some emit start
and end events, and then the consumer has to know how to match these up.
Others emit completion events (which usually include time taken and
other resources consumed to process the event, return status, etc).
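As an illustration of what "the consumer has to know how to match these up" involves, here is a small Python sketch that pairs start and end records by id and produces completion-style records. The (timestamp, phase, event_id) tuple layout and the pair_events name are assumptions for the example; real tracing subsystems differ in how events are identified.

```python
def pair_events(events):
    """Pair start/end records by id.

    events: iterable of (timestamp, phase, event_id), phase is "start" or "end".
    Returns (completed, still_open) where completed is a list of
    (event_id, start_time, duration) and still_open maps id -> start_time.
    """
    open_starts = {}
    completed = []
    for ts, phase, event_id in events:
        if phase == "start":
            open_starts[event_id] = ts
        elif phase == "end" and event_id in open_starts:
            start = open_starts.pop(event_id)
            completed.append((event_id, start, ts - start))
        # An "end" with no matching start is silently dropped in this sketch;
        # a real consumer would have to decide how to handle it.
    return completed, open_starts
```

Events still in still_open when the input runs out are the ones whose end has not been seen yet.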
As a generic tool, I'm not sure pmchart will be able to make a lot of
sense of the raw event data.
... then we need a
way to tie them back together in the client tools. Or in this example
of a long-running trace (relative to client sample time), does the
PMDA report "trace X is in-progress" on each sample? That'd be a bit
wasteful on disk space ... hmm, not clear what the best approach here
will be.
Not sure I follow. I'm expecting the events to be emitted once tracing
is activated (or an interest is registered), so I'm not sure the concept
of "trace X is in-progress" will be visible outside the PMDA.
Could extend the existing temporal index to index start/end time for
traces so we can quickly find whether a client sample covers a trace?
Either way, I suspect "trace start" and "trace end" may need to each
be a new metric semantics (in addition to PM_SEM_COUNTER, PM_SEM_INSTANT
and PM_SEM_DISCRETE that we have now, iow).
I think we need some input from those on the list likely to be the
generators of the events, as it seems Nathan and I don't have a common
view on what data is going to be emitted. In my mind, there are event
records when a trace is active, and there are no event records when a
trace is not active, so the notion of a "start or end of trace" event is
not explicitly present.
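For the temporal index idea quoted above, and only under the assumption that a trace does have a recorded start and end time, the lookup reduces to an interval overlap test. The sketch below is in Python; overlaps, traces_in_window and the (start, end, name) index entries are hypothetical and say nothing about how the existing PCP temporal index is laid out.

```python
from datetime import datetime


def overlaps(trace_start, trace_end, window_start, window_end):
    """True if the trace was active at any point inside the window."""
    return trace_start <= window_end and trace_end >= window_start


def traces_in_window(index, window_start, window_end):
    """index: list of (start, end, name); return names of overlapping traces."""
    return [name for start, end, name in index
            if overlaps(start, end, window_start, window_end)]


def at(hour, minute):
    """Helper: a time on the day of the quoted message."""
    return datetime(2010, 9, 17, hour, minute)


# The example from earlier in the thread: trace X runs 9:45 to 10:10,
# the chart window (-S/-T) is 10:00 to 10:15.
index = [(at(9, 45), at(10, 10), "X"), (at(9, 0), at(9, 30), "Y")]
selected = traces_in_window(index, at(10, 0), at(10, 15))
```

With these values, trace X is selected even though it started before the window opened, while trace Y, which finished at 9:30, is not.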
...
A lot of work here, but it's all fascinating stuff & gonna be great fun
to code!
Agreed ... this sounds like fun.