After getting the debug memory tracker (activated with -DDEBUG_MEM) going again, it found a problem with the following testcase (part of src/testsuite/systemtap.base/control_limits.exp): ==== # stap -u -DMAXNESTING=3 ../src/testsuite/systemtap.base/control_limits.stp ERROR: MAXNESTING exceeded near identifier 'recurse' at ../src/testsuite/systemtap.base/control_limits.stp:3:10 WARNING: Number of errors: 1, skipped probes: 0 ==== Note that the above testcase is designed to fail, so the error printed there is OK. When run, the debug memory tracker uses printk to print the following on the console: ==== Dec 15 17:02:02 kvm-rawhide-64-1 kernel: SYSTEMTAP ERROR: Memory fence corrupted after allocated memory Dec 15 17:02:02 kvm-rawhide-64-1 kernel: at addr ffff88003e476698. (Allocation ends at ffff88003e476697) ==== Since the memory fence (a small bit of memory placed after every allocation) has been overwritten, this means systemtap is writing past the end of the allocated memory. After a bit of debugging, this appears to be happening to the allocated context structure.
There is some subtlety to the way we allocate context locals, with [MAXNESTING+1] slots (to store outgoing arguments to a potentially excessively nested function). Maybe something there is behaving badly. See translate.cxx:1018 ish.
Here's code output by the translator to set up the arguments to a function, then call it: ==== c->locals[c->nesting+1].function_recurse.n = l->__tmp0; function_recurse (c); ==== One of the first things a function does is check MAXNESTING: ==== static void function_recurse (struct context* __restrict__ c) { struct function_recurse_locals * __restrict__ l = & c->locals[c->nesting+1].function_recurse; (void) l; #define CONTEXT c #define THIS l if (0) goto out; c->last_stmt = "identifier 'recurse' at ../src/testsuite/systemtap.base/control_limits.stp:3:10"; if (unlikely (c->nesting+1 > MAXNESTING)) { c->last_error = "MAXNESTING exceeded"; return; } else { c->nesting ++; } ==== But, this check is too late - setting up the arguments has already written past the end of the context structure.
Hmm, strangely enough, we try to handle this by allocating 1 extra set of locals: o->newline(-1) << "} locals [MAXNESTING+1];"; // NB: The +1 above for extra room for outgoing arguments of next nested function. // If MAXNESTING is set too small, the args will be written, but the MAXNESTING // check done at c_unparser::emit_function will reject. // // This policy wastes memory (one row of locals[] that cannot really // be used), but trades that for smaller code (not having to check // c->nesting against MAXNESTING at every call site).
Fixed in commit 101a2bb. The test: if (unlikely (c->nesting+1 > MAXNESTING)) { was wrong. It should have been: if (unlikely (c->nesting+1 >= MAXNESTING)) { The original code actually allowed MAXNESTING+1 recursion levels. Unfortunately, I don't see a good way of adding a testsuite test for this problem, since the -DDEBUG_MEM logic just prints to the console.