The Systemtap Translator - a tour on the inside

- pass 2: semantic analysis (steps 1, 2, 3)
- pass 3: translation (steps 1, 2)
------------------------------------------------------------------------
Translator general principles

- written in standard C++
- mildly O-O, sparing use of C++ features
- uses "visitor" concept for type-dependent (virtual) traversal
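The "visitor" bullet can be illustrated with a minimal sketch, assuming an AST of just two node types. The type and method names below are illustrative and only loosely follow the style of <staptree.h>; they are not copied from the real declarations.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical miniature AST; the real node types live in <staptree.h>.
struct literal_number;
struct binary_expression;

// The "visitor" interface: one virtual method per node type.
struct visitor {
  virtual ~visitor() = default;
  virtual void visit_literal_number(literal_number* e) = 0;
  virtual void visit_binary_expression(binary_expression* e) = 0;
};

struct expression {
  virtual ~expression() = default;
  // Each node type dispatches to its own visit_* method.
  virtual void visit(visitor* v) = 0;
};

struct literal_number : expression {
  long value;
  explicit literal_number(long v) : value(v) {}
  void visit(visitor* v) override { v->visit_literal_number(this); }
};

struct binary_expression : expression {
  expression* left;
  std::string op;
  expression* right;
  binary_expression(expression* l, std::string o, expression* r)
    : left(l), op(std::move(o)), right(r) {}
  void visit(visitor* v) override { v->visit_binary_expression(this); }
};

// Example traversal: render the tree as fully parenthesized text.
struct printing_visitor : visitor {
  std::string out;
  void visit_literal_number(literal_number* e) override {
    out += std::to_string(e->value);
  }
  void visit_binary_expression(binary_expression* e) override {
    assert(e->left && e->right);
    out += "(";
    e->left->visit(this);
    out += " " + e->op + " ";
    e->right->visit(this);
    out += ")";
  }
};
```

A new traversal (type checking, code emission) is then a new visitor subclass; the node classes stay untouched.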
------------------------------------------------------------------------
- abstract syntax tree <staptree.h>
- family of types and subtypes for language parts: expressions,
- includes outermost constructs: probes, aliases, functions
- an instance of "stapfile" represents an entire script file
- each annotated with a token (script source coordinates)
- data persists throughout run
- contains run-time parameters from command line
- contains all globals
- passed by reference to many functions
------------------------------------------------------------------------
- hand-written recursive-descent parser <parse.cxx>
- language specified in man page <stap.1>
- reads user-specified script file
- also searches path for all <*.stp> files, parses them too
  => syntax errors are caught immediately, throughout the tapset
- now includes baby preprocessor
    %( kernel_v == "2.6.9" %? inline("foo") %: function("bar") %)
- enforces guru mode for embedded code %{ C %}
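The preprocessor conditional shown above can be sketched as plain string rewriting. This is a simplified model under stated assumptions: it handles one non-nested conditional, only the kernel_v variable, only == and !=, and plain string equality, and it assumes the %: branch is present. The helper name expand_conditional is hypothetical; the real preprocessor lives inside the parser and is not described in this much detail here.

```cpp
#include <cassert>
#include <string>

// Strip leading and trailing blanks.
static std::string trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Expand one  %( kernel_v OP "literal" %? THEN %: ELSE %)  conditional.
// Hypothetical helper, not the parser's actual interface.
std::string expand_conditional(const std::string& text,
                               const std::string& kernel_v) {
  const auto npos = std::string::npos;
  const auto open = text.find("%(");
  if (open == npos) return text;            // nothing to expand
  const auto q = text.find("%?", open);
  const auto colon = text.find("%:", q);
  const auto close = text.find("%)", colon);
  assert(q != npos && colon != npos && close != npos);

  const std::string cond = trim(text.substr(open + 2, q - open - 2));
  const std::string then_part = trim(text.substr(q + 2, colon - q - 2));
  const std::string else_part =
      trim(text.substr(colon + 2, close - colon - 2));

  // Assumed condition shape: kernel_v == "x"  or  kernel_v != "x".
  const bool negate = cond.find("!=") != npos;
  const auto q1 = cond.find('"');
  const auto q2 = cond.rfind('"');
  assert(q1 != npos && q2 > q1);
  const std::string literal = cond.substr(q1 + 1, q2 - q1 - 1);
  const bool truth = (kernel_v == literal) != negate;

  // Splice the chosen branch back into the surrounding text.
  return text.substr(0, open) + (truth ? then_part : else_part)
       + text.substr(close + 2);
}
```

Only the selected branch reaches the rest of the parser, so the branch not taken never has to resolve on the running kernel.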
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 1: resolve symbols

- code in <elaborate.cxx>
- want to know all global and per-probe/function local variables
- one "vardecl" instance interned per variable
- fills in "referent" field in AST for nodes that refer to it
- collect "needed" probe/global/function list in session variable
- loop over file queue, starting with user script "stapfile"
  - add this file's globals, functions, probes to the "needed" list
  - resolve any symbols used in this file (function calls, variables)
  - if not resolved, search through all tapset "stapfile" instances;
    add to file queue if matched
  - if still not resolved, create as local scalar, or signal an error
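The file-queue loop above can be sketched as follows. This is a simplified model that resolves function calls only; the stapfile struct here is a hypothetical stand-in holding just names, and resolve is an illustrative helper rather than a function from <elaborate.cxx>.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed script file.
struct stapfile {
  std::string name;
  std::set<std::string> defines;  // functions defined in this file
  std::set<std::string> calls;    // functions called from this file
};

// Returns the needed files in the order they were queued.
std::vector<const stapfile*> resolve(const stapfile& user,
                                     const std::vector<stapfile>& tapset) {
  std::vector<const stapfile*> queue{&user};
  // The queue grows while it is walked, so index instead of iterating.
  for (std::size_t i = 0; i < queue.size(); ++i) {
    for (const std::string& fn : queue[i]->calls) {
      bool found = false;
      // First try the files that are already needed.
      for (const stapfile* f : queue) {
        if (f->defines.count(fn)) { found = true; break; }
      }
      // Then search the whole tapset and queue the first match.
      for (std::size_t t = 0; !found && t < tapset.size(); ++t) {
        if (tapset[t].defines.count(fn)) {
          queue.push_back(&tapset[t]);
          found = true;
        }
      }
      if (!found)
        throw std::runtime_error("unresolved function " + fn);
    }
  }
  assert(queue.front() == &user);  // user script stays at the head
  return queue;
}
```

Tapset files that nothing refers to are parsed but never queued, so they contribute nothing to the generated module.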
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 2: resolve types

- fills in "type" field in AST
- iterate over all probes and functions, until convergence
- infer types of variables from usage context / operators:
    a = 5           # a is a pe_long
    b["foo",a]++    # b is a pe_long array with indexes pe_string and pe_long
- loop until no further variable types can be inferred
- signal error if any still unresolved
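The "loop until convergence" idea can be sketched as a fixed-point iteration over assignments. This is a minimal model, assuming only scalar variables and "lhs = rhs" statements; the assignment struct and infer_types helper are hypothetical, and the enumerator names merely echo the pe_long and pe_string types mentioned above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum exp_type { pe_unknown, pe_long, pe_string };

// Hypothetical constraint "lhs = rhs". rhs is either a variable name,
// or empty when the right-hand side is a literal of type rhs_type.
struct assignment {
  std::string lhs;
  std::string rhs;
  exp_type rhs_type;
};

std::map<std::string, exp_type>
infer_types(const std::vector<assignment>& stmts) {
  std::map<std::string, exp_type> types;
  for (const assignment& s : stmts) {
    assert(!s.rhs.empty() || s.rhs_type != pe_unknown);
    types[s.lhs];                      // value-initialized to pe_unknown
    if (!s.rhs.empty()) types[s.rhs];
  }
  bool changed = true;
  while (changed) {                    // loop until nothing new is learned
    changed = false;
    for (const assignment& s : stmts) {
      exp_type& l = types[s.lhs];
      const exp_type r = s.rhs.empty() ? s.rhs_type : types[s.rhs];
      if (l == pe_unknown && r != pe_unknown) {
        l = r;                         // type flows right to left
        changed = true;
      } else if (r == pe_unknown && l != pe_unknown) {
        types[s.rhs] = l;              // or left to right
        changed = true;
      } else if (l != pe_unknown && r != pe_unknown && l != r) {
        throw std::runtime_error("type mismatch on " + s.lhs);
      }
    }
  }
  for (const auto& kv : types) {
    if (kv.second == pe_unknown)
      throw std::runtime_error("unresolved type for " + kv.first);
  }
  return types;
}
```

Each pass either assigns at least one previously unknown type or stops, so the loop terminates after at most one pass per variable plus one.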
------------------------------------------------------------------------
Pass 2 - semantic analysis - step 3: resolve probes

- probe points turned into "derived_probe" instances by code in <tapsets.cxx>
- derived_probes know how to talk to kernel API for registration/callbacks
- aliases get expanded at this point
- some probe points ("begin", "end", "timer*") are very simple
- DWARF ("kernel*", "module*") implementation very complicated
- target-variables "$foo" expanded to getter/setter functions
  with synthesized embedded-C
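Alias expansion can be sketched as a recursive lookup. This is a rough model under stated assumptions: an alias is just a name mapped to a list of probe points, whereas real aliases also carry code and can match wildcards. The alias_map type, the expand helper, and the alias names used with it are all illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using alias_map = std::map<std::string, std::vector<std::string>>;

// Replace alias names by the probe points they stand for, recursively.
void expand(const std::string& point, const alias_map& aliases,
            std::vector<std::string>& out, int depth = 0) {
  assert(depth < 32);  // crude guard against an alias that names itself
  const auto it = aliases.find(point);
  if (it == aliases.end()) {  // not an alias: a concrete probe point
    out.push_back(point);
    return;
  }
  for (const std::string& target : it->second)
    expand(target, aliases, out, depth + 1);
}
```

Each concrete probe point that comes out of the expansion would then be handed to the code that builds a derived_probe for it.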
------------------------------------------------------------------------
Pass 3 - translation - step 1: data

- we now know all types, all variables
- strings are copied by value everywhere (MAXSTRINGLEN bytes)
- emit data storage mega-struct "context" for all probes/functions
  - array instantiated per-CPU, per-nesting-level
  - can be pretty big static data
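The shape of the "context" mega-struct can be sketched like this. The layout is a guess for illustration only: the member names, the probe_bar and function_foo frames, and all three constants are assumptions, not the translator's actual output or defaults.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative values only, not the translator's defaults.
constexpr std::size_t MAXSTRINGLEN = 128;
constexpr std::size_t MAXNESTING = 4;
constexpr std::size_t NR_CPUS = 2;

// Strings are fixed-size buffers, so assigning one copies the whole buffer.
typedef char string_t[MAXSTRINGLEN];

struct context {
  unsigned nesting;      // current function-call depth
  unsigned actioncount;  // statements executed so far in this probe hit
  // One frame per nesting level. Only one probe body or function is
  // active in a frame at a time, so their locals can share storage.
  union {
    struct { long count; string_t name; } probe_bar;       // probe locals
    struct { string_t arg; long retvalue; } function_foo;  // args + result
  } locals[MAXNESTING];
};

// One context per CPU, allocated statically.
static context contexts[NR_CPUS];

context* context_for(std::size_t cpu) {
  assert(cpu < NR_CPUS);
  return &contexts[cpu];
}
```

With these numbers the array already exceeds NR_CPUS * MAXNESTING * MAXSTRINGLEN bytes, which shows why the static data grows quickly with string-heavy scripts.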
------------------------------------------------------------------------
Pass 3 - translation - step 2: code

- map script functions to C functions taking a context pointer
- map probes to two C functions:
  - one to interface with the probe point infrastructure (kprobes,
    kernel timer): reserves per-cpu context
  - one to implement probe body, just like a script function
- emit global startup/shutdown routine to manage orderly
  registration/deregistration of probes
- expressions/statements emitted in "natural" evaluation sequence
- emit code to enforce activity-count limits, simple safety tests
- global variables protected by locks
    function foo () { k ++ }     # write lock around increment
    probe bar { if (k>5) ... }   # read lock around read
- same thing for arrays, except foreach/sort take longer-duration locks
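The locking rule for globals can be sketched as a small emitter that wraps a generated C statement in a lock/unlock pair. The emit_locked helper and the global_k / global_k_lock naming scheme are hypothetical; the emitted read_lock/write_lock calls are the kernel's reader-writer spinlock primitives, used here as an assumption about what such generated code could call.

```cpp
#include <cassert>
#include <string>

enum class access { read, write };

// Emit C text that wraps one statement touching global `var` in a lock.
// Output naming (global_<var>_lock) is an illustrative convention.
std::string emit_locked(const std::string& var, access mode,
                        const std::string& stmt) {
  assert(!var.empty());
  const std::string lock = "global_" + var + "_lock";
  const std::string kind = (mode == access::write) ? "write" : "read";
  return kind + "_lock (&" + lock + ");\n"
       + stmt + "\n"
       + kind + "_unlock (&" + lock + ");\n";
}
```

For the two script lines above, the increment in foo would go through access::write and the comparison in bar through access::read, so concurrent readers do not block one another.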
------------------------------------------------------------------------
- write out C code in a temporary directory
- call into kbuild makefile to build module
- clean up temporary directory
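The kbuild step can be sketched as two strings: a one-line makefile and the make invocation that points the kernel build tree at the temporary directory. Both follow the usual kbuild convention for external modules; whether the translator writes exactly these strings is an assumption, and the helper names and the module and directory names in the usage are hypothetical.

```cpp
#include <cassert>
#include <string>

// Makefile content: ask kbuild to build <module>.o as a loadable module.
std::string kbuild_makefile(const std::string& module) {
  assert(!module.empty());
  return "obj-m := " + module + ".o\n";
}

// Command line: run the kernel's own makefile against directory `dir`.
std::string kbuild_command(const std::string& kernel_release,
                           const std::string& dir) {
  assert(!kernel_release.empty() && !dir.empty());
  return "make -C /lib/modules/" + kernel_release + "/build M=" + dir
       + " modules";
}
```

Running the command would leave the built module next to the generated C file in the temporary directory, ready to be loaded before that directory is removed.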