This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: systemtap/pcp integration pmda 0.1
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- To: David Smith <dsmith at redhat dot com>
- Cc: Systemtap List <systemtap at sourceware dot org>, pcp <pcp at oss dot sgi dot com>
- Date: Tue, 23 Sep 2014 14:44:40 -0400
- Subject: Re: systemtap/pcp integration pmda 0.1
- References: <54133D71 dot 6040208 at redhat dot com> <y0mh9zztipi dot fsf at fche dot csb> <5421861B dot 1020504 at redhat dot com>
Hi -
dsmith wrote:
> [...]
> > probe json_data {
> > @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
> > @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
> > "sum of latency for xmit device", $UNITS/SCALE)
> > }
> >
> > so that the metadata is attached at the end of the data-supplying calls?
>
> Hmm, I hadn't considered that. It might be possible to do, but seems
> quite tricky in getting the macros right to support both schema output
> and data output.
The macros wouldn't need to -output- the schema, only update the stap
tapset globals from which the schema can be read later / separately.
(OTOH, it could indeed generate the schema document at the same time,
and store it in some other stap global variable, so that the procfs
schema-reader could be just a string copy-out.)
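A minimal sketch of that split, written as a Python analogy rather than as stap macro code. The names `schema_registry`, `output_string_value`, `output_numeric_value`, `read_data` and `read_schema` are all hypothetical, and the metadata layout is only an assumption. Each data-supplying call records its metadata as a side effect, and the schema reader just serializes what was recorded:

```python
import json

# Hypothetical stand-in for the stap tapset globals that hold metadata.
schema_registry = {}

def output_string_value(name, value, description):
    """Return one data item and remember its metadata as a side effect."""
    schema_registry[name] = {"type": "string", "description": description}
    return {name: value}

def output_numeric_value(name, value, description, units=None):
    """Same idea for a numeric value; units are optional metadata."""
    meta = {"type": "integer", "description": description}
    if units is not None:
        meta["units"] = units
    schema_registry[name] = meta
    return {name: value}

def read_data():
    """Data reader: the output helpers also refresh the registry."""
    data = {}
    data.update(output_string_value("xstring", "testing 1, 2, 3", "Test String"))
    data.update(output_numeric_value("xmit_count", 3,
                                     "sum of latency for xmit device"))
    return json.dumps(data)

def read_schema():
    """Schema reader: a plain copy-out of whatever the data calls recorded."""
    return json.dumps(schema_registry, sort_keys=True)
```

One property of this particular sketch is that the schema stays empty until the data path has run at least once, so a reader would need to tolerate that ordering.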
> >> {
> >> "generation": 1,
> >> "data": {
> >
> > (IMHO we shouldn't mandate such wrappers.)
>
> Here's the deal here. I stole the "generation" idea from the mmv code.
> If I want to support being able to add/remove fields on the fly, I have
> to let the pmda know something has changed. [...]
Understood, good idea. Such a field could be optional & identified by
another pcp metadata field rather than hardcoded. Or the schema could
be reread regularly.
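A minimal sketch of the optional variant, assuming a hypothetical metadata key `pcp-generation-field` that names the change counter in the data document. When the key or the counter is absent, the function falls back to always rereading:

```python
def needs_schema_reload(schema, payload, last_generation):
    """Decide whether the pmda should reread the schema.

    `pcp-generation-field` is a hypothetical metadata key naming the
    optional change counter inside the data document.
    Returns (reload_needed, generation_to_remember).
    """
    field = schema.get("pcp-generation-field")
    if field is None or field not in payload:
        # No counter available: fall back to rereading every time.
        return True, last_generation
    current = payload[field]
    return current != last_generation, current
```

A source that never emits a counter still works this way; it simply costs a schema reread per fetch.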
> I see what you are doing here, but I'm quite unsure. If your goal is to
> handle JSON from a variety of sources, to my mind this is a step
> backwards. Your more generic source isn't likely to output a schema in
> that format.
I wasn't explaining this part well, sorry. The idea is:
- the stap source of json data would programmatically emit both data &
pcp-schema
- non-stap sources of json data would emit data (in their own
preexisting custom format, not aware of pcp!), and a pcp-schema for
it would be hand-written by us
- both of the above would be usable by the *same* pmda code, making it
a schema-driven processor of general json data
(Perhaps conflating the words "schema" and "metadata" is not helping.)
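The three points above can be sketched as one small schema-driven extractor. This is only an illustration: the schema layout (a `metrics` list whose entries carry a `path` of keys plus a `pcp-name`) and the sample document are invented here, not an agreed format.

```python
import json

def extract_metrics(schema, document):
    """Pull only the schema-designated values out of an arbitrary JSON document."""
    metrics = {}
    for entry in schema["metrics"]:
        node = document
        try:
            for key in entry["path"]:
                node = node[key]
        except (KeyError, IndexError, TypeError):
            continue  # value absent in this sample: skip it rather than fail
        metrics[entry["pcp-name"]] = node
    return metrics

# Hand-written schema for a source that knows nothing about pcp.
# The JSON keys contain a space and a dot; only "pcp-name" reaches pcp.
handwritten_schema = {
    "metrics": [
        {"path": ["osd stats", "op.latency"], "pcp-name": "osd.op_latency"},
        {"path": ["osd stats", "ops"], "pcp-name": "osd.ops"},
    ]
}

# Invented sample data in the source's own (non-pcp-aware) layout.
sample = json.loads(
    '{"osd stats": {"op.latency": 12, "ops": 340, "other": "ignored"}}'
)
print(extract_metrics(handwritten_schema, sample))
```

The same function would serve a stap source equally well; the only difference is whether the schema was generated by the tapset or written by hand.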
> > One benefit of a formal "pcp-name" field here is that the mapping from
> > the JSON nesting structure need not match the pcp namespace exactly.
> > It would let the json object name components be free of constraints
> > like not containing dots (since we would not propagate them to pcp).
>
> Validating names (no dots, spaces, etc. and not too long) is on my todo
> list.
Right; my point is that instead of imposing such a constraint on the
JSON data structure, this could be a constraint on the pcp-specific
metadata tags in the metadata file.
> Originally I had designs of allowing the user to override
> {STAP_MODULE_NAME}. But then we have issues with that field being
> unique. For instance if the same systemtap script was run twice, both
> would try to override the field to the same value. Since we're assured
> that {STAP_MODULE_NAME} is unique, I just decided to go with it.
Yeah, that makes it simple, though stap_XXXXX names are hard to
predict/reuse, and stap -m FOO is also inconvenient. Perhaps the
schema could include a suggested root name, and the pmda could
resolve or reject ties amongst duplicates.
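One possible tie-breaking policy, as a sketch only. The function name `claim_root_name` and the policy of falling back to the unique module name are assumptions; rejecting the duplicate outright would be another valid choice.

```python
def claim_root_name(active_roots, suggested, module_name):
    """Pick a pcp root name for a newly seen JSON source.

    active_roots: set of root names already in use (updated in place).
    suggested:    the schema's suggested root name, or None.
    module_name:  the unique fallback, e.g. the stap module name.
    """
    if suggested and suggested not in active_roots:
        chosen = suggested
    else:
        # Tie, or no suggestion at all: fall back to the unique name.
        chosen = module_name
    active_roots.add(chosen)
    return chosen
```

With this policy the first instance of a script gets the predictable name, and a second concurrent run of the same script still shows up under its stap_XXXXX name.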
> I'm not really fond of the 'pcp-name' field idea. It means more
> validation (on both sides?) in not allowing things like "foo.bar" being
> a value and then "foo.bar.baz" being a value.
The pmda would be in a comfortable position to check such PCP PMNS
constraints, since it'd know every pcp-name used in a schema.
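A minimal sketch of such a check, assuming the pmda has already collected every pcp-name from one schema into a list (the function name is hypothetical). It reports each case where one metric name is also a dotted prefix of another:

```python
def find_pmns_conflicts(pcp_names):
    """Return (leaf, longer) pairs where a metric name is also a prefix
    of another metric name, e.g. "foo.bar" and "foo.bar.baz"."""
    names = set(pcp_names)
    conflicts = []
    for name in sorted(names):
        parts = name.split(".")
        # No proper dotted prefix of a leaf may itself be a leaf.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in names:
                conflicts.append((prefix, name))
    return conflicts
```

Because the check runs only on the pcp-name values, the JSON producer never has to know about the rule.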
> This is probably implementable, although I do lose the easy
> data/schema validation provided by the stock python JSON stuff.
(Well, not necessarily, as the pcp-* attributes could be just added to
a json-schema.org schema, so the same overall file can serve both
purposes. Again recall though that we are not really obligated to
validate arbitrary JSON data for consistency with a schema;
we really only want to pull out designated parts of it for relaying to
PCP.)
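As a sketch of the dual-purpose file, the walker below reads only the pcp-specific annotation out of a schema shaped like a json-schema.org document. It assumes the annotation is spelled `pcp-name` and that nesting uses the standard `properties` keyword; validators generally ignore keywords they don't recognize, so the same file could still be fed to one.

```python
def collect_pcp_names(schema_node, path=()):
    """Walk a JSON Schema object tree and yield (path, pcp-name) pairs."""
    name = schema_node.get("pcp-name")
    if name is not None:
        yield path, name
    for key, child in schema_node.get("properties", {}).items():
        yield from collect_pcp_names(child, path + (key,))

# Illustrative schema: ordinary JSON Schema keywords plus one pcp-* attribute.
example_schema = {
    "type": "object",
    "properties": {
        "net": {
            "type": "object",
            "properties": {
                "xmit_count": {"type": "integer", "pcp-name": "net.xmit"},
                "comment": {"type": "string"},
            },
        }
    },
}

for json_path, pcp_name in collect_pcp_names(example_schema):
    print(json_path, "->", pcp_name)
```

Fields without a `pcp-name` (such as `comment` here) are simply not relayed to PCP.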
> I guess I'm coming at this from a different angle.
>
> - If we want this pmda to (one day) support more generic JSON sources,
> we'll have to expect generic JSON schemas.
> - If we'd like the systemtap side of things to be able to support other
> data collectors (nagios, zabbix, etc.), it should export a fairly
> generic JSON schema.
> To my mind, the changes you've got here take us farther from both goals.
I hope the above clarifies why this is not actually the case. We get
to design a *specific* schema/metadata grammar for PCP, and our
tooling would construct these files (e.g., the stap tapset), or our
tools would *include* these files (e.g., imagine writing out by hand a
pcp-name etc. metadata file for the CEPH JSON data, and including
that with the pmda).
- FChE