This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: [pcp] suitability of PCP for event tracing

From: Ken McDonell <kenj at internode dot on dot net>
To: nathans at aconex dot com
Cc: "Frank Ch. Eigler" <fche at redhat dot com>, Greg Banks <gnb at evostor dot com>, systemtap at sources dot redhat dot com, pcp at oss dot sgi dot com
Date: Sun, 19 Sep 2010 00:21:34 +1000
Subject: Re: [pcp] suitability of PCP for event tracing
References: <388028135.1068001284679120887.JavaMail.root@mail-au.aconex.com>

On 17/09/2010 9:18 AM, nathans@aconex.com wrote:

> ...
> [sp. TCP]  :) ... local context mode could be used in that situation
> (PM_CONTEXT_LOCAL), which would map more closely to the current trace
> tools and doesn't use TCP.  I haven't seen any reason why this scheme
> won't work for our little-used local context friend, good thing we did
> not remove that code, eh Ken?  ;)

Yep. And the good thing is that there are no extra PMDA or PMCD changes needed here ... build for the distributed case, then when the client is running on the same host as the data source, you can choose the low latency PM_CONTEXT_LOCAL path ... promoting PM_CONTEXT_LOCAL to be a first class citizen has meant that we can accommodate new PMDAs in this scheme, e.g. the event tracing PMDA.

> ...
> I guess it remains to be seen what (existing) tools will do with the
> trace data ... I'm guessing for the most part they will ignore it (as
> many of them already do for the STRING/AGGREGATE types: pmie, pmval,
> etc.). So, there's still plenty of work to be done to do a good job of
> adding support to the client tools - almost certainly a new
> tracing-specific tool will be needed.

For the existing tools, I think we'll probably end up adding a routine to libpcp to turn a PM_TYPE_EVENT "blob" into a pmResult ... this will work for pminfo and pmprobe, where the timestamps are ignored. For pmie, pmval, pmdumptext, pmchart, ... I'm not sure how they can make sense of the event trace data in real time: they expect data from time t, so getting a set of values with different timestamps, all earlier than t, is going to be a bit odd for them.
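As a rough illustration of the routine suggested above, here is a Python sketch (not libpcp code) that expands one event "blob" into one result per event record. `EventRecord`, `Result`, `unpack_event_blob` and the metric name `io.bytes` are all hypothetical. The point it shows is that each record keeps its own timestamp, so a tool that fetched at time t gets values stamped earlier than t.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    """Hypothetical: one traced event with its own timestamp (seconds)."""
    timestamp: float
    params: dict = field(default_factory=dict)


@dataclass
class Result:
    """Hypothetical stand-in for a pmResult: one timestamp, many values."""
    timestamp: float
    values: dict


def unpack_event_blob(records):
    """Expand one PM_TYPE_EVENT-style value into one Result per event
    record, keeping each record's own timestamp, in time order."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    return [Result(r.timestamp, dict(r.params)) for r in ordered]


# A fetch at time 100.0 returns a blob of events that happened earlier.
fetch_time = 100.0
blob = [EventRecord(97.5, {"io.bytes": 4096}),
        EventRecord(95.0, {"io.bytes": 512})]
results = unpack_event_blob(blob)
for r in results:
    print(f"{r.timestamp:6.1f} {r.values}")
```

Here a fetch at `fetch_time = 100.0` yields results stamped 95.0 and 97.5, which is the kind of timestamp mismatch that makes real-time use by pmval or pmchart awkward.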

The tracing-specific tool will be expecting it, so that should be OK.

I've also thought we could even teach pmlogger about PM_TYPE_EVENT data and it could emit one pmResult per event record to capture the correct time sequences from the original event traces ... this only makes sense if there are tools that can use this sort of data from an archive.

> ...
> Main concerns center around the PMDA buffering scheme ... things like,
> how does a PMDA decide what a sensible timeframe for buffering data is
> (probably will need some kind of per-PMDA memory limit on buffer size,
> rather than time frame).  Also, will the PMDA have to keep track of
> which clients have been sent which (portions of?) buffered data?  (in
> case of multiple clients with different request frequencies ... might
> get a bit hairy?).

I'm bald, so hairy is no threat ... 8^)>

I _do_ think this is simple:
- doubly linked list (or similar) of events
- reference count when event arrives, based on number of matching client registrations
- scan list for each client gathering matching events, decrementing reference counts
- free event record when reference count is zero
- tune buffer depth per client with pmStore
- cull list if client is not keeping up and return PM_ERR_TOOSLOW
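The scheme in the list can be sketched in Python as follows. This is a minimal sketch under several assumptions: a Python `dict` (insertion-ordered, O(1) deletion) stands in for the doubly linked list, client filters are plain predicates, buffer depth is an ordinary attribute rather than something set through pmStore, and `TooSlowError` is a hypothetical stand-in for the proposed PM_ERR_TOOSLOW return. None of the names are PCP APIs.

```python
from dataclasses import dataclass
from typing import Callable


class TooSlowError(Exception):
    """Hypothetical stand-in for the proposed PM_ERR_TOOSLOW return."""


@dataclass
class Event:
    seq: int
    payload: str
    refcount: int          # clients that still have to collect this event


@dataclass
class Client:
    matches: Callable[[str], bool]
    depth: int             # max buffered events (set via pmStore in the proposal)
    cursor: int            # highest seq this client has already seen
    backlog: int = 0       # matching events buffered but not yet fetched
    too_slow: bool = False


class EventBuffer:
    def __init__(self) -> None:
        # Insertion-ordered dict with O(1) deletion stands in for the
        # doubly linked list of event records.
        self.events: dict = {}
        self.clients: dict = {}
        self.next_seq = 0

    def register(self, name: str, matches: Callable[[str], bool], depth: int) -> None:
        # A new client only sees events that arrive after it registers.
        self.clients[name] = Client(matches, depth, cursor=self.next_seq - 1)

    def _release(self, seq: int) -> None:
        ev = self.events[seq]
        ev.refcount -= 1
        if ev.refcount == 0:
            del self.events[seq]          # last interested client is done

    def add(self, payload: str) -> None:
        interested = [c for c in self.clients.values() if c.matches(payload)]
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        if not interested:
            return                        # no matching registration: never buffered
        self.events[seq] = Event(seq, payload, refcount=len(interested))
        for c in interested:
            c.backlog += 1
            if c.backlog > c.depth:       # client not keeping up: cull its oldest
                oldest = next(e.seq for e in self.events.values()
                              if e.seq > c.cursor and c.matches(e.payload))
                c.cursor = oldest
                c.backlog -= 1
                c.too_slow = True
                self._release(oldest)

    def fetch(self, name: str) -> list:
        c = self.clients[name]
        if c.too_slow:                    # report the loss once
            c.too_slow = False
            raise TooSlowError(name)
        # Scan the shared list for events this client has not yet collected.
        due = [e for e in self.events.values()
               if e.seq > c.cursor and c.matches(e.payload)]
        for e in due:
            self._release(e.seq)
        if due:
            c.cursor = due[-1].seq
        c.backlog = 0
        return [e.payload for e in due]
```

With a depth of 1, a client registered only for `io.*` events that misses a fetch loses its oldest event and gets one `TooSlowError`; its next fetch returns whatever is still buffered, and each record is freed as soon as every interested client has collected it.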

Plus several variants around lists per client or bit maps per client to reduce matching overhead on each pmFetch.
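One of the bitmap variants might look like the following hypothetical Python sketch: each buffered event carries a bitmap of the clients that still need it, so the matching predicates run once when the event arrives rather than on every pmFetch, and an event is dropped when its bitmap reaches zero. Using list positions as client ids is an assumption made for brevity.

```python
from typing import Callable


class BitmapEventBuffer:
    """Hypothetical variant: per-event bitmap of clients still owed it."""

    def __init__(self) -> None:
        self.filters: list = []   # client id = index into this list
        self.events: list = []    # (payload, bitmap of interested clients)

    def register(self, matches: Callable[[str], bool]) -> int:
        self.filters.append(matches)
        return len(self.filters) - 1

    def add(self, payload: str) -> None:
        # Matching happens once, here, instead of on every fetch.
        mask = 0
        for cid, matches in enumerate(self.filters):
            if matches(payload):
                mask |= 1 << cid
        if mask:
            self.events.append((payload, mask))

    def fetch(self, cid: int) -> list:
        bit = 1 << cid
        out, kept = [], []
        for payload, mask in self.events:
            if mask & bit:
                out.append(payload)
                mask &= ~bit              # this client has now collected it
            if mask:
                kept.append((payload, mask))  # another client still needs it
        self.events = kept
        return out
```

Rebuilding the list on every fetch keeps the sketch short; a PMDA written in C would more likely unlink nodes in place.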

If my battery would last long enough, I think this could be done on a plane between Copenhagen and Melbourne!

> Also, we've not really considered the additional requirements that we
> have in archive mode.  Unlike the sampled data, traces have explicit
> start and end points, which we will need to know about.  For example,
> if I construct a chart with starting offset (-S) at 10am and ending
> (-T) at 10:15, and a trace started at 9:45 which completes at 10:10,
> I'd expect to see that trace displayed, even though the trace data
> would (AIUI, in this proposal) all be stored at the time the trace
> was sampled? ...
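Nathan's example comes down to an interval-overlap test rather than a point-in-window test. The Python sketch below assumes the tool knows each trace's start and end times, which is precisely what the thread has not yet settled; `overlaps` is a hypothetical helper.

```python
from datetime import time


def overlaps(trace_start, trace_end, window_start, window_end):
    """True if the closed interval [trace_start, trace_end] intersects
    [window_start, window_end]."""
    return trace_start <= window_end and trace_end >= window_start


window = (time(10, 0), time(10, 15))   # -S 10:00, -T 10:15
trace = (time(9, 45), time(10, 10))    # trace runs 9:45 -> 10:10

# Checking only whether the trace's start time falls in the window
# misses it, because the trace began before the window opened.
naive = window[0] <= trace[0] <= window[1]
print("naive:", naive, "overlap:", overlaps(*trace, *window))
```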

I think you'd need my "expand PM_TYPE_EVENT into a set of pmResults" change to pmlogger to get close here. But even with that, I'm not sure what pmchart is going to do with event data records having timestamps of 9:45:03.456, 9:45:03.501, 9:45:04.001, etc. The event parameters are likely to be discrete, so their semantics are going to be hard for pmchart to handle.

> ... Well, actually, not sure how this will look? - does a
> trace have to end before a PMDA would see it?  that'd be a bit lame;
> or would we export start and end events separately? ...

This depends on the underlying event tracing subsystem. Some emit start and end events, and then the consumer has to know how to match these up. Others emit completion events (which usually include the time taken and other resources consumed to process the event, the return status, etc.).
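For subsystems that emit separate start and end events, the consumer-side matching can be sketched in Python as below. The tuple layout `(timestamp, kind, op_id, status)` and the `Completion` record are assumptions for illustration; the output has roughly the shape a completion-event subsystem would emit directly, with the time taken and a return status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    op_id: int
    start: float
    duration: float
    status: str


def pair_events(events):
    """events: iterable of (timestamp, kind, op_id, status) in arrival
    order, where kind is "start" or "end". Returns the completed
    operations and the ids of operations still in progress."""
    open_ops = {}   # op_id -> start timestamp
    done = []
    for ts, kind, op_id, status in events:
        if kind == "start":
            open_ops[op_id] = ts
        elif kind == "end" and op_id in open_ops:
            start = open_ops.pop(op_id)
            done.append(Completion(op_id, start, ts - start, status))
        # An "end" with no matching "start" (tracing began mid-operation)
        # is ignored in this sketch.
    return done, sorted(open_ops)


events = [(1.0, "start", 7, ""), (1.5, "start", 8, ""),
          (2.25, "end", 7, "ok"), (3.0, "end", 9, "ok")]
done, pending = pair_events(events)
print(done, pending)
```

Operation 8 is still open when the input ends, which is the "in-progress" situation raised in the thread.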

As a generic tool, I'm not sure pmchart will be able to make a lot of sense of the raw event data.

> ... then we need a
> way to tie them back together in the client tools.  Or in this example
> of a long-running trace (relative to client sample time), does the
> PMDA report "trace X is in-progress" on each sample?  That'd be a bit
> wasteful on disk space ... hmm, not clear what the best approach here
> will be.

Not sure I follow. I'm expecting the events to be emitted once tracing is activated (or an interest is registered), so I'm not sure the concept of "trace X is in-progress" will be visible outside the PMDA.

> Could extend the existing temporal index to index start/end time for
> traces so we can quickly find whether a client sample covers a trace?
> Either way, I suspect "trace start" and "trace end" may need to each
> be a new metric type (in addition to PM_TYPE_COUNTER, PM_TYPE_INSTANT
> and PM_TYPE_DISCRETE that we have now, iow).
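A temporal index along the lines Nathan suggests could be sketched as follows: trace intervals sorted by start time, with a binary search (Python's `bisect` module) to skip traces that begin after the sample time. `TraceIndex` and the minutes-since-midnight times are hypothetical, and the final scan is linear in the worst case; an interval tree would avoid that.

```python
import bisect


class TraceIndex:
    """Hypothetical temporal index of trace intervals, sorted by start."""

    def __init__(self, intervals):
        # intervals: iterable of (start, end, trace_id)
        self.intervals = sorted(intervals)
        self.starts = [start for start, _, _ in self.intervals]

    def covering(self, t):
        """Return ids of traces whose [start, end] interval contains t."""
        # Traces starting after t cannot cover it; bisect skips them.
        hi = bisect.bisect_right(self.starts, t)
        return [tid for start, end, tid in self.intervals[:hi] if end >= t]


# Times in minutes since midnight: 9:45 = 585, 10:10 = 610.
index = TraceIndex([(585, 610, "X"), (600, 605, "Y"), (620, 630, "Z")])
print(index.covering(603))  # ['X', 'Y']
```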

I think we need some input from those on the list likely to be the generators of the events, as it seems Nathan and I don't have a common view on what data is going to be emitted. In my mind, there are event records when a trace is active, and there are no event records when a trace is not active, so the notion of a "start or end of trace" event is not explicitly present.

> ...
> A lot of work here, but it's all fascinating stuff and gonna be great fun
> to code!

Agreed ... this sounds like fun.

Follow-Ups:
- Re: [pcp] suitability of PCP for event tracing
  - From: Max Matveev

References:
- Re: [pcp] suitability of PCP for event tracing
  - From: nathans
