This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: systemtap/pcp integration pmda 0.1


Hi -

dsmith wrote:
> [...]
> > probe json_data {
> >    @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
> >    @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
> >                          "sum of latency for xmit device", $UNITS/SCALE)
> > }
> > 
> > so that the metadata is attached at the end of the data-supplying calls?
> 
> Hmm, I hadn't considered that. It might be possible to do, but seems
> quite tricky in getting the macros right to support both schema output
> and data output.

The macros wouldn't need to -output- the schema, only update the stap
tapset globals from which the schema can be read later / separately.
(OTOH, it could indeed generate the schema document at the same time,
and store it in some other stap global variable, so that the procfs
schema-reader could be just a string copy-out.)


> >> {
> >>   "generation": 1,
> >>   "data": {
> > 
> > (IMHO we shouldn't mandate such wrappers.)
> 
> Here's the deal here. I stole the "generation" idea from the mmv code.
> If I want to support being able to add/remove fields on the fly, I have
> to let the pmda know something has changed. [...]

Understood, good idea.  Such a field could be optional & identified by
another pcp metadata field rather than hardcoded.  Or the schema could
be reread regularly.
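
As a rough illustration of the optional-generation idea, here is a
minimal python sketch. The metadata key "pcp-generation-field" and the
function name are hypothetical, not anything the pmda defines today.
When the metadata names no generation field, the sketch falls back to
rereading the schema every time.

```python
def needs_schema_reload(data, metadata, last_generation):
    """Decide whether the pmda should reread the schema.

    Returns (reload, generation_seen). The metadata key used here,
    "pcp-generation-field", is an assumption made for this sketch.
    """
    field = metadata.get("pcp-generation-field")
    if field is None:
        # No generation field designated: reread on every fetch.
        return True, last_generation
    current = data.get(field)
    return current != last_generation, current


metadata = {"pcp-generation-field": "generation"}
print(needs_schema_reload({"generation": 1, "data": {}}, metadata, None))
print(needs_schema_reload({"generation": 1, "data": {}}, metadata, 1))
```

A source that never adds or removes fields would simply omit the
designation and pay the cost of a regular reread.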


> I see what you are doing here, but I'm quite unsure. If your goal is to
> handle JSON from a variety of sources, to my mind this is a step
> backwards. Your more generic source isn't likely to output a schema in
> that format.

I wasn't explaining this part well, sorry.  The idea is:

- the stap source of json data would programmatically emit both data &
  pcp-schema

- non-stap sources of json data would emit data (in their own
  preexisting custom format, not aware of pcp!), and a pcp-schema for
  it would be hand-written by us

- both of the above would be usable by the *same* pmda code, making it
  a schema-driven processor of general json data

(Perhaps conflating the word "schema" and "metadata" is not helping.) 
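
The three points above could be sketched roughly as follows, in
python. The metadata layout (a list of entries with "pointer" and
"pcp-name" keys) and the sample data are assumptions made purely for
illustration. The same function would serve whether the metadata was
generated by the stap tapset or written by hand for some other source.

```python
import json

# Hypothetical pcp metadata: each entry maps a path into the JSON data
# to a pcp metric name. The key names are assumptions for this sketch.
METADATA = [
    {"pointer": ["net", "eth0", "xmit_count"], "pcp-name": "net.xmit.count"},
    {"pointer": ["xstring"], "pcp-name": "test.string"},
]


def extract_metrics(data, metadata):
    """Return {pcp-name: value} for each metadata entry present in data."""
    metrics = {}
    for entry in metadata:
        node = data
        try:
            for key in entry["pointer"]:
                node = node[key]
        except (KeyError, TypeError, IndexError):
            continue  # the source did not supply this value; skip it
        metrics[entry["pcp-name"]] = node
    return metrics


# Data as a source unaware of pcp might emit it; "other" is not
# designated by the metadata and is therefore ignored.
raw = ('{"xstring": "testing 1, 2, 3",'
       ' "net": {"eth0": {"xmit_count": 3}}, "other": 1}')
print(extract_metrics(json.loads(raw), METADATA))
```

Only the designated parts of the data are pulled out; nothing else in
the document needs to follow any particular layout.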


> > One benefit of a formal "pcp-name" field here is that the mapping from
> > the JSON nesting structure need not match the pcp namespace exactly.
> > It would let the json object name components be free of constraints
> > like not containing dots (since we would not propagate them to pcp).
> 
> Validating names (no dots, spaces, etc. and not too long) is on my todo
> list.

Right; my point is that instead of imposing such a constraint on the
JSON data structure, this could be a constraint on the pcp-specific
metadata tags in the metadata file.


> Originally I had designs of allowing the user to override
> {STAP_MODULE_NAME}. But then we have issues with that field being
> unique. For instance if the same systemtap script was run twice, both
> would try to override the field to the same value. Since we're assured
> that {STAP_MODULE_NAME} is unique, I just decided to go with it.

Yeah, that makes it simple, though stap_XXXXX names are hard to
predict/reuse, and stap -m FOO is also inconvenient.  Perhaps the
schema could include a suggested root name, and the pmda could
resolve or reject ties amongst duplicates.
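
One possible tie-breaking policy, sketched in python under the
assumption that each source reports its unique module name plus an
optional suggested root: the first claimant keeps the suggested name,
and later duplicates fall back to their module name. The function and
the policy are hypothetical; rejecting duplicates outright would be an
equally valid choice.

```python
def assign_root_names(sources):
    """Map each module name to the root name the pmda would use.

    sources: list of (module_name, suggested_root_or_None) pairs.
    The first source to claim a suggested root keeps it; later
    claimants fall back to their (unique) module name.
    """
    assigned = {}
    taken = set()
    for module_name, suggested in sources:
        if suggested and suggested not in taken:
            root = suggested
        else:
            root = module_name
        taken.add(root)
        assigned[module_name] = root
    return assigned


print(assign_root_names([
    ("stap_1234", "netlat"),
    ("stap_5678", "netlat"),  # same script run twice
    ("stap_9abc", None),      # no suggestion supplied
]))
```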
 

> I'm not really fond of the 'pcp-name' field idea. It means more
> validation (on both sides?) in not allowing things like "foo.bar" being
> a value and then "foo.bar.baz" being a value.

The pmda would be in a comfortable position to check such PCP PMNS
constraints, since it'd know every pcp-name used in a schema.
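
A minimal python sketch of such a check, assuming the pmda has
collected every pcp-name from a schema into a list. The function name
is hypothetical. It reports each case where one name is both a metric
and a dotted prefix of another metric, as with "foo.bar" and
"foo.bar.baz".

```python
def find_pmns_conflicts(pcp_names):
    """Return (prefix, name) pairs where prefix is itself a metric name.

    A name cannot be both a leaf and an interior node of the
    namespace, so every returned pair is a conflict.
    """
    names = set(pcp_names)
    conflicts = []
    for name in sorted(names):
        parts = name.split(".")
        # Check each proper dotted prefix of this name.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in names:
                conflicts.append((prefix, name))
    return conflicts


print(find_pmns_conflicts(["foo.bar", "foo.bar.baz", "foo.qux"]))
```

Because the check runs over the pcp-names only, the JSON object names
themselves stay unconstrained.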


> This is probably implementable, although I do lose the easy
> data/schema validation provided by the stock python JSON stuff.

(Well, not necessarily, as the pcp-* attributes could be just added to
a json-schema.org schema, so the same overall file can serve both
purposes.  Again recall though that we are not really obligated to
validate the random JSON data against any consistency with a schema;
we really only want to pull out designated parts of it for relaying to
PCP.)
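
To illustrate the single-file idea, here is a python sketch that walks
a json-schema.org style schema and collects the pcp-name annotations.
The schema content and the "pcp-name" keyword are assumptions for
illustration. The sketch uses only plain dictionaries and relies on the
extra keyword being ignorable by a validator, which is worth
confirming for whichever validator is used.

```python
# Hypothetical schema: ordinary "type"/"properties" keywords plus an
# extra "pcp-name" annotation on the values to be relayed to PCP.
SCHEMA = {
    "type": "object",
    "properties": {
        "xstring": {"type": "string", "pcp-name": "test.string"},
        "net": {
            "type": "object",
            "properties": {
                "xmit_count": {"type": "integer",
                               "pcp-name": "net.xmit.count"},
                "debug": {"type": "string"},  # not relayed to PCP
            },
        },
    },
}


def collect_pcp_names(schema, path=()):
    """Return {json_path_tuple: pcp-name} for every annotated node."""
    found = {}
    if "pcp-name" in schema:
        found[path] = schema["pcp-name"]
    for key, sub in schema.get("properties", {}).items():
        found.update(collect_pcp_names(sub, path + (key,)))
    return found


print(collect_pcp_names(SCHEMA))
```

The resulting path-to-name table is all the pmda needs for extraction;
validating the data against the rest of the schema remains optional.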


> I guess I'm coming at this from a different angle.
> 
> - If we want this pmda to (one day) support more generic JSON sources,
> we'll have to expect generic JSON schemas.

> - If we'd like the systemtap side of things to be able to support other
> data collectors (nagios, zabbix, etc.), it should export a fairly
> generic JSON schema.

> To my mind, the changes you've got here take us farther from both goals.

I hope the above clarifies why this is not actually the case.  We get
to design a *specific* schema/metadata grammar for PCP, and either our
tooling would construct these files (e.g., the stap tapset), or our
tools would *include* these files (e.g., imagine hand-writing a
metadata file, with pcp-name etc., for the CEPH JSON data, and
shipping that with the pmda).


- FChE

