This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: (PR11207) Macroprocessor discussion


Current status of the token-based macroprocessor. It's a bit rough, and I'm still poring over parts of it, but it's at the stage where I'm ready to start experimenting with implementing the basics (modulo any commentary anyone has for me).

# Proposal for embedded (token-based) macroprocessor

This macroprocessor is much more tightly coupled to the systemtap language than a text-based approach would be. It fits in place of `scan_pp()` in the parser code, and so transforms a stream of tokens into a stream of tokens with preprocessor constructs expanded.

Documentation generation is handled by adding a separate mode to `scan_pp()`, which expands macros *including* docstrings. This mode is wrapped in a separate (platform-independent) frontend, which simply extracts the resulting docstrings and uses them as input to kernel-doc (or some other suitable documentation generator).

Macro definitions are always local to a single file. For example, a macro defined in a tapset cannot be used in a systemtap script that includes the tapset.

# Basic Constructs

## Macro Definition (one line)

    %define macro_name(param_name1, param_name2, ...) macro_body

The body of the macro definition is expanded at the point of invocation, not when the macro is defined.

## Macro Definition (multiple lines)

    %define macro_name(param_name1, param_name2, ...) %(
      macro_body
    %)

The body of the macro definition is expanded at the point of invocation, not when the macro is defined.

Macro definitions cannot be nested (at least in the initial version of the preprocessor); this rule considerably simplifies semantics and implementation. Brackets belonging to conditionals must be balanced properly.

    // The following is an error -- brackets belong to the %define:
    %define foo %( condition %? a %: b %)

    // Use the following instead to avoid ambiguity:
    %define foo %(
        %( condition %? a %: b %)
    %)

The tricky possibility of someone trying to write a one-line conditional directly inside a macro definition is handled by assuming that a `%(` following a `%define` is automatically swallowed by the `%define`, as shown above.
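As a rough illustration of the swallowing rule, here is a Python sketch (not actual parser code). The names `tokenize` and `read_define_body` are hypothetical, and the whitespace tokenizer is only a stand-in for the real lexer. It shows how the body of a `%define` could be delimited:

```python
def tokenize(src):
    """Whitespace tokenizer yielding (text, line_number) pairs.
    An assumption for illustration; the real lexer is much richer."""
    return [(word, n)
            for n, line in enumerate(src.splitlines(), 1)
            for word in line.split()]

def read_define_body(tokens, start):
    """Return (body_texts, next_index) for a %define whose body starts at
    tokens[start]; tokens[start - 1] is the last token of the macro header."""
    header_line = tokens[start - 1][1]
    if start < len(tokens) and tokens[start][0] == "%(":
        # Multi-line form: this %( is swallowed by the %define itself,
        # and the body runs up to the matching %).
        depth, i, body = 1, start + 1, []
        while i < len(tokens):
            text = tokens[i][0]
            if text == "%(":
                depth += 1
            elif text == "%)":
                depth -= 1
                if depth == 0:
                    return body, i + 1
            body.append(text)
            i += 1
        raise SyntaxError("unterminated %( in %define body")
    # One-line form: the body is whatever remains on the header's line.
    i = start
    while i < len(tokens) and tokens[i][1] == header_line:
        i += 1
    return [text for text, _ in tokens[start:i]], i

bad = tokenize("%define foo %( condition %? a %: b %)")
print(read_define_body(bad, 2)[0])

good = tokenize("%define foo %(\n    %( condition %? a %: b %)\n%)")
print(read_define_body(good, 2)[0])
```

Under this sketch, the body of the first definition comes out as `condition %? a %: b`, with a stray `%?` and `%:` and no enclosing conditional, which is why that form is an error. The second definition keeps the whole conditional, brackets included.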

## Macro Invocation

    macro_name
    macro_name(param1,param2,...)

Brackets inside a macro invocation parameter must be balanced.

Macro parameters are expanded *before* being passed to the macro itself, except in a few special cases (namely the `grab_*` macros).

Parameter names occurring inside a macro body are expanded to the original values, in the same fashion as parameterless macros.
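To make the expansion order concrete, here is a small Python sketch of the rules above. It is only an illustration under simplifying assumptions: `tokenize`, `split_args` and `expand` are hypothetical names, the tokenizer is crude, the `grab_*` special cases are left out, and there is no guard against a macro that invokes itself.

```python
import re

def tokenize(src):
    """Crude tokenizer: identifiers, numbers, or single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", src)

def split_args(tokens, i):
    """tokens[i] is '('. Return (argument token lists, index after the matching ')')."""
    depth, args, cur = 0, [], []
    while i < len(tokens):
        t = tokens[i]
        if t == "(":
            depth += 1
            if depth > 1:
                cur.append(t)
        elif t == ")":
            depth -= 1
            if depth == 0:
                args.append(cur)
                return args, i + 1
            cur.append(t)
        elif t == "," and depth == 1:
            args.append(cur)
            cur = []
        else:
            cur.append(t)
        i += 1
    raise SyntaxError("unbalanced brackets in macro invocation")

def expand(tokens, macros, bindings=None):
    """macros maps name -> (param_names, body_tokens).
    bindings maps a parameter name to its already-expanded argument tokens."""
    bindings = bindings or {}
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        if t in bindings:            # parameter name: paste the argument as-is
            out.extend(bindings[t])
            i += 1
        elif t in macros:
            params, body = macros[t]
            i += 1
            args = []
            if params:
                if i >= len(tokens) or tokens[i] != "(":
                    raise SyntaxError("macro %s expects arguments" % t)
                raw_args, i = split_args(tokens, i)
                if len(raw_args) != len(params):
                    raise SyntaxError("wrong number of arguments to %s" % t)
                # Arguments are expanded *before* being passed to the macro.
                args = [expand(a, macros, bindings) for a in raw_args]
            # The body is expanded here, at the point of invocation.
            out.extend(expand(body, macros, dict(zip(params, args))))
        else:
            out.append(t)
            i += 1
    return out

macros = {"AREA": (["base", "height"], tokenize("((base)*(height)/2.0)"))}
print(" ".join(expand(tokenize("a = AREA(2+2,5)"), macros)))
# prints: a = ( ( 2 + 2 ) * ( 5 ) / 2.0 )
```

The `AREA` macro is borrowed from Usage Example 1 in this message. Keeping parameter bindings separate from the macro table means an argument is pasted in verbatim and never re-expanded inside the body.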

## Macro Invocation -- Alternate Variant

    %macro_name
    %macro_name(param1,param2,...)

Brackets inside a macro invocation parameter must be balanced.

Macro parameters are expanded *before* being passed to the macro itself, except in a few special cases (namely the `grab_*` macros).

Parameter names occurring inside a macro body are expanded to the original values, in the same fashion as parameterless macros.

The sigil used for macro invocation here conflicts with the modulo operator `%`, but this is not a huge problem because expressions like `200%(3+4)` already trigger the current preprocessor.

## Docstring (given special treatment)

Docstrings are ignored outside of a special docstring-generation mode. Otherwise, a docstring is retained in the token stream as a special kind of token.

    /** This standard docstring is recognized by the macroprocessor.
        Macros inside the docstring are expanded when marked with a
        sigil '%'. Docstrings can also be grabbed and manipulated
        using the special directives described below. */

    /*** This is also recognized as a docstring; but additionally, as
         the lexer encounters it, it is pasted together with the
         previous docstring that occurs (in the unexpanded text). */

Because the content of a docstring is arbitrary text, the token-based preprocessor is not suited to generating docstring contents in the exact same manner as in ordinary code. Instead, the sigil trick is required for a macroprocessing construct to be recognized:

    /*** To evaluate a macro 'foo', mark it with a sigil like so: %foo */

This causes the preprocessor to start consuming tokens until it has parsed an entire macro invocation. Then the invocation is expanded, and the resulting tokens are synthesized back into text.

Because the result of evaluating conditionals depends on the target system being compiled for, while documentation is supposed to be largely platform-independent, conditionals cannot occur inside docstrings.

## Manipulation of Docstrings

Evaluation of docstrings that immediately precede a macro invocation or `%define` construct is delayed. If the invocation turns out to grab the docstring, it is deleted from the token stream at that point (and reinserted wherever the invocation requires it, as detailed below). `%define` constructs always grab the immediately preceding docstrings.

The basic mechanism for retrieving a docstring is outlined below:

     grab_invocation_docs(param_name1,param_name2,...)

Found in a macro body, this grabs the docstring closest to the point where the macro was invoked, and inserts it wherever a macro of the form `invocation_docs(param1,param2,...)` is encountered. The parameters are substituted into the body of the docstring.

     grab_definition_docs(param_name1,param_name2,...)

Likewise, this grabs the docstring closest to the point where the macro was *defined*; the docstring is retrieved using the macro `definition_docs`.

The arguments to a `grab_*` macro should be single identifiers. They are not macro expanded. (See Usage Example 3 below.)

The arguments to a `definition_docs` or `invocation_docs` macro can be arbitrary text and are macro expanded as usual, but keep in mind that the text is tokenized and then de-tokenized (as explained above), which limits the amount of control one has over whitespace in the final output.

Here is an example of how this works. Docstrings can be constructed as follows:

    %define cakery %(
        grab_invocation_docs(foodstuff)

        /** EXTRA BLAH BLAH */
        invocation_docs(CHEESE)
        /*** ADDITIONAL BLAH BLAH */
        probe cheese { ... }

        invocation_docs(BEER)
        probe beer { ... }
    %)

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT %foodstuff */
    cakery

Note in particular that the three-star doc comment is to be used when extending a doc comment *in the original source text* (not in the output). The macros `invocation_docs` and `definition_docs` are smart about whether or not they occur immediately after a docstring (if they occur after a docstring, their output is glued together with that docstring).

Hence, this example produces output similar to the following:

    /** EXTRA BLAH BLAH
     * BLAH BLAH BLAH DOCUMENTATION ABOUT CHEESE
     * ADDITIONAL BLAH BLAH */
    probe cheese { ... }

    /** BLAH BLAH BLAH DOCUMENTATION ABOUT BEER */
    probe beer { ... }
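The substitute-and-glue step could be sketched in Python roughly as follows. This is text-level and much simplified: the proposal works on tokens, and `strip_markers`, `instantiate` and `glue` are hypothetical helper names, not part of any existing code.

```python
import re

def strip_markers(doc):
    """Drop the opening /** or /*** and the closing */ of a doc comment."""
    return re.sub(r"^/\*{2,3}\s*|\s*\*/$", "", doc.strip())

def instantiate(grabbed, param_names, args):
    """Substitute %param occurrences in a grabbed docstring with the
    arguments given to invocation_docs; unknown %names are left alone."""
    bindings = dict(zip(param_names, args))
    return re.sub(r"%([A-Za-z_]\w*)",
                  lambda m: bindings.get(m.group(1), m.group(0)),
                  strip_markers(grabbed))

def glue(parts):
    """Paste one or more docstring bodies into a single doc comment."""
    return "/** " + "\n * ".join(parts) + " */"

grabbed = "/** BLAH BLAH BLAH DOCUMENTATION ABOUT %foodstuff */"
print(glue([instantiate(grabbed, ["foodstuff"], ["BEER"])]))
# prints: /** BLAH BLAH BLAH DOCUMENTATION ABOUT BEER */
```

Here `grabbed` plays the role of the docstring captured by `grab_invocation_docs(foodstuff)`, and each `invocation_docs(...)` call corresponds to one `instantiate`. A docstring that immediately precedes or follows the call would simply be passed to `glue` as an additional part.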

## Preprocessor Conditionals

    %( ... %? ... %: ... %)

These work identically to preprocessor conditionals in the current stap. They are not expanded within docstrings.
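For reference, a token-level treatment of the conditional could look something like the Python sketch below. The function name `expand_conditionals` is hypothetical, and evaluating the condition is delegated to a caller-supplied `evaluate` callback, since the real condition syntax (`kernel_v`, `arch` and so on) is outside the scope of this illustration.

```python
def expand_conditionals(tokens, evaluate):
    """Replace each %( cond %? then %: else %) by the chosen branch.
    evaluate(cond_tokens) -> bool is supplied by the caller."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != "%(":
            out.append(tokens[i])
            i += 1
            continue
        # Split the conditional into sections at depth 1, leaving nested
        # conditionals untouched so they can be expanded recursively.
        depth, j = 1, i + 1
        sections, cur = [], []
        while j < len(tokens) and depth > 0:
            t = tokens[j]
            if t == "%(":
                depth += 1
            elif t == "%)":
                depth -= 1
                if depth == 0:
                    break
            if depth == 1 and t in ("%?", "%:"):
                sections.append(cur)
                cur = []
            else:
                cur.append(t)
            j += 1
        if depth != 0:
            raise SyntaxError("unbalanced %( ... %)")
        sections.append(cur)
        if len(sections) not in (2, 3):
            raise SyntaxError("expected %( cond %? then [%: else] %)")
        cond, then = sections[0], sections[1]
        other = sections[2] if len(sections) == 3 else []
        chosen = then if evaluate(cond) else other
        out.extend(expand_conditionals(chosen, evaluate))
        i = j + 1
    return out

tokens = "probe %( a %? begin %: end %) { }".split()
print(expand_conditionals(tokens, lambda cond: cond == ["a"]))
# prints: ['probe', 'begin', '{', '}']
```

Only the chosen branch is expanded further, so a nested conditional in the discarded branch is never evaluated.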

## Command Line Arguments

    $1, $2, $3, ..., $#
    @1, @2, @3, ..., @#

These work identically to command line arguments in the current stap. They are in fact already handled for us at the lexer level, being a text-based feature. This is the natural choice given the way the overall parser is structured, but it does in fact permit some spectacularly odd abuses, e.g.

    $ stap -e 'probe %( kernel_v > "3.0" $1 begin %: end %) { println("foo") }' '%?'

In docstring mode, command line arguments are simply not expanded.
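The reason the abuse above works is that substitution happens on raw text before any preprocessor construct is recognized. A minimal Python sketch of that behaviour follows, assuming the semantics described in stap(1): `$N` is pasted unquoted, `@N` is pasted as a string literal, and `$#`/`@#` give the argument count. The function name `substitute_args` is hypothetical, and quote escaping is ignored.

```python
import re

def substitute_args(script, args):
    """Textually substitute $N / @N / $# / @# in a script.
    Quote escaping inside @N values is not handled in this sketch."""
    def repl(m):
        sigil, ref = m.group(1), m.group(2)
        value = str(len(args)) if ref == "#" else args[int(ref) - 1]
        # $ pastes the raw text; @ pastes it as a string literal.
        return value if sigil == "$" else '"' + value + '"'
    return re.sub(r"([$@])(\d+|#)", repl, script)

script = 'probe %( kernel_v > "3.0" $1 begin %: end %) { println("foo") }'
print(substitute_args(script, ["%?"]))
# prints: probe %( kernel_v > "3.0" %? begin %: end %) { println("foo") }
```

After substitution, the `%?` supplied on the command line is indistinguishable from one written in the script, so the conditional parses normally.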

# Usage Example 1 -- equivalent to a standard cpp #define macro

    %define AREA(base,height) ((base)*(height)/2.0)
    a = AREA(2+2,5) // correctly handles precedence

# Usage Example 2 -- defining a shorthand for a cast operation

    %define FOO(ptr) @cast((ptr),"struct foo","/path/to/app:<sys/foo.h>")
    bar = FOO(p)->bar
    baz = FOO(p)->baz

# Usage Example 3 -- defining multiple probes with associated docstrings

    /** ... doc comments common to all ip probes ... */
    %define make_ipprobes(probe_name, hook4_name, hook6_name) %(
        grab_invocation_docs(probe_name, ipprotocol_name) // note that probe_name is not expanded
        grab_definition_docs

        invocation_docs(ip,IP)
        definition_docs
        /*** ... doc comments specific to ip ... */
        probe netfilter.ip.probe_name = netfilter.ipv4.probe_name,
                netfilter.ipv6.probe_name { }

        invocation_docs(ipv4,IPv4)
        definition_docs
        /*** ... doc comments specific to ipv4 ... */
        probe netfilter.ipv4.probe_name
                = netfilter.pf("NFPROTO_IPV4").hook(hook4_name) {
          ... stuff specific to ipv4 ...
        }

        invocation_docs(ipv6,IPv6)
        definition_docs
        /*** ... doc comments specific to ipv6 ... */
        probe netfilter.ipv6.probe_name
              = netfilter.pf("NFPROTO_IPV6").hook(hook6_name) {
          ... stuff specific to ipv6 ...
        }
    %)

    /** probe netfilter.%probe_name.pre_routing - Called before an %ipprotocol_name packet is routed */
    make_ipprobes(pre_routing,"NF_INET_PRE_ROUTING","NF_IP6_PRE_ROUTING")

