This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: [pcp] suitability of PCP for event tracing

From: Ken McDonell <kenj at internode dot on dot net>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: systemtap at sources dot redhat dot com
Date: Mon, 30 Aug 2010 01:54:57 +1000
Subject: Re: [pcp] suitability of PCP for event tracing
References: <20100827153906.GD3185@redhat.com>

[resending as text so it makes it to the systemtap list]

On 28/08/2010 1:39 AM, Frank Ch. Eigler wrote:
> Hi -
>
> We're investigating to what extent the PCP suite may be suitable for
> more general low-level event tracing.  Just from docs / source gazing
> (so please excuse my terminology errors), a few challenges would seem
> to be:

G'day Frank and others.

Apologies for the length of this reply, but there are a number of
non-trivial issues at play here.

Nathan has already answered some of your questions.  I'd like to start
by providing some historical and design center context.  From the outset
PCP was *not* designed for event-tracing, but PCP *was* designed for a
specific class of performance monitoring and management scenarios.

The table below outlines some of the differences ... these help to
explain why PCP is /a priori/ not necessarily suitable for event
tracing.  This does not mean PCP could not evolve to support
event-tracing in the ways Nathan has suggested; we just need to
understand that the needs are different and make sure we do not end up
morphing PCP into something that no longer works for the original design
center and may not work all that well for event tracing.

Locality of data processing
    PCP Design Center
        Monitored system is typically not the same system that the
        collection and/or analysis is performed on.
    Event Tracing
        Data collection happens on the system being monitored, analysis
        may happen later on another system.

Real time analysis
    PCP Design Center
        Central to the design requirements.
    Event Tracing
        Often not required, other than edge-triggers to start and stop
        collection.

Retrospective analysis
    PCP Design Center
        Central to the design requirements.
    Event Tracing
        Central to the design requirements.

Time scales
    PCP Design Center
        We are typically concerned with large and complex systems where
        average levels of activity over periods of the order of tens of
        seconds are representative.
    Event Tracing
        Short-term and transients are often important, and inter-arrival
        time for events may be on the order of milliseconds.

Data rates
    PCP Design Center
        Moderate. Monitoring is often long-term, requiring broad and
        shallow data collection, with a small number of narrow and deep
        collections aligned to known or suspected problem areas.
    Event Tracing
        Very high.  Monitoring is most often narrow, deep and short-lived.

Data spread
    PCP Design Center
        Very broad ... interesting data may come from a number of places,
        e.g.  hardware instrumentation, operating system stats, service
        layers and libraries, applications and distributed applications.
    Event Tracing
        Very narrow ... one source and one host.

Data semantics
    PCP Design Center
        A very broad range, but the most common are activity levels and
        event *counters* (with little or no event parameter information).
    Event Tracing
        Very specific, being the record of an event and its parameters
        with a high resolution time stamp.

Data source extensibility
    PCP Design Center
        Critical.
    Event Tracing
        Rare.

So with this background, let's look at Frank's specific questions.

> * poll-based data gathering
>
>    It seems as though PMDAs are used exclusively in 'polling' mode,
>    meaning that underlying system statistics are periodically queried
>    and summary results stored.  In our context, it would be useful if
>    PMDAs could push event data into the stream as they occur - perhaps
>    hundreds of times a second.

Yep, this would be a big change.  There is not really a data stream in
PCP ... there is a source of performance metrics (a host or an archive)
and clients connect to that source and pull data at a sample interval
defined by the client.

At the host source, the co-ordinating daemon (pmcd) maintains neither a
cache nor a stream of recent data ... a client asks for a specific subset
of the available information, which is instantiated and returned to the
client.
There is no requirement for the subsets of the requested information to
be the same for consecutive requests from a single client, and pmcd is
receiving requests from a number of clients that are handled completely
independently.
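To make the pull model concrete, here is a minimal sketch in C. Everything in it (the metric enum, read_live_value, fetch, run_client) is hypothetical and is not the PCP API; it only models the two properties described above: the server keeps no cache, so a metric is instantiated only when some request names it, and each client chooses its own subset of metrics and its own sampling interval.

```c
#include <stddef.h>

/* Hypothetical metric identifiers; real PCP names metrics via the PMNS. */
enum metric_id { M_CPU_USER, M_DISK_READS, M_NET_BYTES, M_COUNT };

/* Counts how often each metric was instantiated, to show that metrics
 * nobody asks for are never touched. */
static long instantiate_count[M_COUNT];

/* Hypothetical stand-in for reading a live system value at fetch time. */
static long read_live_value(enum metric_id id)
{
    return ++instantiate_count[id];
}

/* Pull-style fetch: no cache, no stream.  Only the metrics named in this
 * request are instantiated.  Returns the number of values written. */
static size_t fetch(const enum metric_id *want, size_t n, long *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = read_live_value(want[i]);
    return n;
}

/* A client picks its own subset (n <= M_COUNT) and polls every `interval`
 * ticks over `ticks` ticks.  Returns how many fetches it issued. */
static int run_client(const enum metric_id *want, size_t n,
                      int interval, int ticks)
{
    long out[M_COUNT];
    int fetches = 0;
    for (int t = 0; t < ticks; t += interval) {
        fetch(want, n, out);
        fetches++;
    }
    return fetches;
}
```

Two clients asking for different subsets at different rates (every tick versus every fifth tick) are handled completely independently; nothing is pushed to either of them.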

As Nathan has suggested, if event traces are intended for retrospective
analysis (as opposed to event counters being suited for either real time
or retrospective analysis), then there is an alternative approach,
namely to create a PCP archive directly from a source of data without
involving pmcd or a pmda or pmlogger.  We've recently reworked the
"pmimport" services to expose better APIs to support just this style of
use ... see LOGIMPORT(3) and sar2pcp(1) for an example.  I think this
approach is possibly a better semantic match between PCP and a stream of
event records.
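The following C sketch illustrates the "import directly into an archive" idea. The archive type and the archive_put/archive_write helpers are hypothetical, in-memory stand-ins that only mimic the general shape of the LOGIMPORT(3) services (where pmiPutValue and pmiWrite play roughly these roles). The real signatures differ and should be taken from the man page. The point is that each event record keeps its own high-resolution timestamp, so no pmcd, PMDA, pmlogger, or sampling interval is involved.

```c
#include <stddef.h>

/* Hypothetical event record from a tracer: timestamp plus one parameter. */
struct event {
    long sec, usec;
    long value;
};

/* Hypothetical in-memory archive; NOT the PCP archive format. */
struct archive_record {
    long sec, usec;
    long value;
};

struct archive {
    struct archive_record rec[64];
    size_t nrec;
    long pending;
    int have_pending;
};

/* Buffer a value for the next record (loosely analogous to pmiPutValue). */
static void archive_put(struct archive *a, long value)
{
    a->pending = value;
    a->have_pending = 1;
}

/* Emit the buffered value stamped with the caller's time (loosely analogous
 * to pmiWrite).  Returns 0 on success, -1 if nothing is buffered or full. */
static int archive_write(struct archive *a, long sec, long usec)
{
    if (!a->have_pending || a->nrec >= sizeof a->rec / sizeof a->rec[0])
        return -1;
    a->rec[a->nrec++] = (struct archive_record){ sec, usec, a->pending };
    a->have_pending = 0;
    return 0;
}

/* Convert an event stream straight into archive records, one per event,
 * preserving each event's own timestamp.  Returns records written. */
static size_t import_events(struct archive *a, const struct event *ev, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        archive_put(a, ev[i].value);
        if (archive_write(a, ev[i].sec, ev[i].usec) == 0)
            written++;
    }
    return written;
}
```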

> * relatively static pmns
>
>    It would be desirable if PMNS metrics were parametrizable with
>    strings/numbers, so that a PMDA engine could use it to synthesize
>    metrics on demand from a large space.  (Example: have a
>    "kernel-probe" PMNS namespace, parametrized by function name, which
>    returns statistics of that function's execution.  There are too many
>    kernel functions, and they vary from host to host enough, so that
>    enumerating them as a static PMNS table would be impractical.)

This is not so much of a problem.  We've relaxed the PMNS services to
allow PMDAs to dynamically define new metrics on the fly.  And as Nathan
has pointed out, the instance domain provides a dynamic dimension for
the available metric values that may also be useful, e.g. this is how
all of procfs is instantiated.
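As a rough illustration of why the instance domain helps with Frank's "kernel-probe" example, here is a hypothetical C sketch (not the libpcp_pmda API). It assumes a single metric counting calls, whose instances are kernel function names added the first time each one is seen, so nothing has to be enumerated in a static table in advance.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INST 32
#define NAME_LEN 64

/* Hypothetical instance domain: instances (kernel function names) are
 * discovered at run time rather than listed up front. */
struct indom {
    char name[MAX_INST][NAME_LEN];
    long calls[MAX_INST];
    int ninst;
};

/* Find an instance by name, adding it if new.
 * Returns the internal instance number, or -1 if the domain is full. */
static int indom_lookup(struct indom *d, const char *fname)
{
    for (int i = 0; i < d->ninst; i++)
        if (strcmp(d->name[i], fname) == 0)
            return i;
    if (d->ninst == MAX_INST)
        return -1;
    strncpy(d->name[d->ninst], fname, NAME_LEN - 1);
    d->name[d->ninst][NAME_LEN - 1] = '\0';
    d->calls[d->ninst] = 0;
    return d->ninst++;
}

/* Count one call of the named function against its instance. */
static void record_call(struct indom *d, const char *fname)
{
    int inst = indom_lookup(d, fname);
    if (inst >= 0)
        d->calls[inst]++;
}
```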

> * scalar payloads
>
>    It seems as though each metric value provided by PMDAs is
>    necessarily a scalar value, as opposed to some structured type.  For
>    event tracing, it would be useful to have tuples.  Front-ends could
>    choose the interesting fields to render.  (Example: tracing NFS
>    calls, complete with decoded payloads.)
>

We've tried really hard to make the PCP metadata rich enough (in the
data model and the API services) to enable clients to be data-driven,
based on what performance data happens to be available today from a host
or archive.  This is why the data aggregate (or blob) data type that
Nathan has mentioned is rarely used (although it is fully supported).

If there were a tight coupling between the source of the event data and
the client that interprets the event data, then the PCP data aggregate
could be used to provide a transport and storage encapsulation that is
consistent with the PCP APIs and protocols.  Of course, such a client
would be exposed to all of the word-size, endian and version issues that
plague other binary formats for performance data, e.g. the sar variants
based on AT&T UNIX.
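To show what "tight coupling" means in practice, here is a hypothetical C sketch of a producer and a client agreeing on a blob layout for one traced NFS call. The struct, field choice, and version byte are all assumptions for illustration. The encoder writes fixed-width little-endian fields with a leading version byte instead of copying the struct, because raw struct copies carry exactly the word-size, padding, and endian problems mentioned above. Any client decoding the blob has to know this layout; PCP's metadata cannot describe it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuple for one traced NFS call. */
struct nfs_call {
    uint32_t xid;
    uint16_t proc;
    uint64_t bytes;
};

#define BLOB_VERSION 1u
#define BLOB_SIZE (1 + 4 + 2 + 8)   /* version + xid + proc + bytes */

/* Store the low n bytes of v in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read n little-endian bytes back into an integer. */
static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode with explicit widths and byte order; returns bytes written. */
static size_t encode_call(const struct nfs_call *c, uint8_t out[BLOB_SIZE])
{
    out[0] = BLOB_VERSION;
    put_le(out + 1, c->xid, 4);
    put_le(out + 5, c->proc, 2);
    put_le(out + 7, c->bytes, 8);
    return BLOB_SIZE;
}

/* Returns 0 on success, -1 if the blob is short or has an unknown version. */
static int decode_call(const uint8_t *in, size_t len, struct nfs_call *c)
{
    if (len < BLOB_SIZE || in[0] != BLOB_VERSION)
        return -1;
    c->xid = (uint32_t)get_le(in + 1, 4);
    c->proc = (uint16_t)get_le(in + 5, 2);
    c->bytes = get_le(in + 7, 8);
    return 0;
}
```

The version check is what lets an older client reject a blob whose layout changed underneath it, rather than silently misreading it.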

> * filtering
>
>    It would be desirable for the apps fetching metric values to
>    communicate a filtering predicate associated with them, perhaps as
>    per pmie rules.  This is to allow the data server daemon to reduce
>    the amount of data sent to the gui frontends.  Perhaps also it could
>    use them to inform PMDAs as a form of subscription, and in turn they
>    could reduce the amount of data flow.

PMDAs are free to do as much or as little work as they choose.  Some are
totally demand-driven, instantiating only the information they are asked
for when they are asked for it.  Others use cacheing strategies to
refresh some or all of the information at each request.  Others maintain
timestamped caches and only refresh when the information is deemed
"stale".  Another class run a refresh thread that is continually updating
a data cache, and requests are serviced from the cache.
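The "timestamped cache" strategy can be sketched in a few lines of C. The names here are hypothetical, not part of the PMDA library. The current time is passed in explicitly so the staleness rule is easy to see and test: the expensive source is read on the first request and thereafter only once the cached value is at least max_age seconds old.

```c
#include <time.h>

/* Hypothetical cached metric value. */
struct cached_metric {
    long value;
    time_t refreshed;  /* time of the last refresh */
    int refreshes;     /* how many times the expensive source was read */
};

/* Hypothetical expensive collection, e.g. walking a large kernel table. */
static long collect_expensive(void)
{
    return 42;
}

/* Serve a request at time `now`: refresh only if never filled or stale,
 * otherwise answer straight from the cache. */
static long fetch_cached(struct cached_metric *m, time_t now, time_t max_age)
{
    if (m->refreshes == 0 || now - m->refreshed >= max_age) {
        m->value = collect_expensive();
        m->refreshed = now;
        m->refreshes++;
    }
    return m->value;
}
```

A fully demand-driven PMDA corresponds to max_age of zero, while the refresh-thread style would move the refresh into a separate loop and leave requests reading only from the cache.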

The PMDA behaviour can be modal ... based on client requests, or more
interestingly as Nathan has suggested using the pmStore(3) API to allow
one or more clients to enable/disable collection (think about expensive,
detailed information that you don't want to collect unless some client
*really* wants it).  The values passed into the PMDA via pmStore(3) are
associated with PCP metrics, so they have the full richness of the PCP
data model to encode switches, text strings, blobs, etc.
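Here is a minimal, hypothetical C sketch of the store-driven switch. The store_control function stands in for whatever handler a PMDA would run when a client calls pmStore(3) on a control metric. The expensive detail metric returns no value (and does no work) until some client has switched collection on.

```c
#include <stdbool.h>

/* Hypothetical PMDA state: detailed collection is off by default. */
struct pmda_state {
    bool detail_enabled;
    long detail_samples;   /* how many detailed samples were collected */
};

/* Hypothetical store handler for a control metric; only 0 (off) and
 * 1 (on) are accepted.  Returns 0 on success, -1 for a bad value. */
static int store_control(struct pmda_state *s, long value)
{
    if (value != 0 && value != 1)
        return -1;
    s->detail_enabled = (value == 1);
    return 0;
}

/* Fetch the detailed metric: fills *out and returns 0 when enabled,
 * returns -1 (no value) when disabled, skipping the expensive work. */
static int fetch_detail(struct pmda_state *s, long *out)
{
    if (!s->detail_enabled)
        return -1;
    *out = ++s->detail_samples;
    return 0;
}
```

Because the control is itself just a metric, the same idea extends to richer values (a filter string, say) without any new protocol.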

> * no web-based frontends
>
>    In our usage, it would be desirable to have some mini pcp-gui that
>    is based on web technologies rather than QT.

There are several examples of web interfaces driven by PCP data ... but
each of these has been developed as a proprietary and specific
application and hence is not included in the PCP open source
distribution.  The PCP APIs provide all the services needed to build
something like this.

>
> To what extent could/should PCP be used/extended to cover this space?

I think this suggestion is worth further discussion, but we probably
need some more concrete examples of the sorts of event trace data that
are being considered, and the most likely use cases and patterns for that
data.

Cheers, Ken.

Follow-Ups:
- Re: [pcp] suitability of PCP for event tracing
  - From: David Smith

References:
- suitability of PCP for event tracing
  - From: Frank Ch. Eigler
