Bug 11097 - debug memory tracker shows memory overwrite in MAXNESTING
Summary: debug memory tracker shows memory overwrite in MAXNESTING
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: David Smith
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-16 15:25 UTC by David Smith
Modified: 2009-12-16 18:08 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Smith 2009-12-16 15:25:48 UTC
After getting the debug memory tracker (activated with -DDEBUG_MEM) going again,
it found a problem with the following testcase (part of
src/testsuite/systemtap.base/control_limits.exp):

====
# stap -u -DMAXNESTING=3 ../src/testsuite/systemtap.base/control_limits.stp
ERROR: MAXNESTING exceeded near identifier 'recurse' at
../src/testsuite/systemtap.base/control_limits.stp:3:10
WARNING: Number of errors: 1, skipped probes: 0
====

Note that the above testcase is designed to fail, so the error printed there is OK.

When run, the debug memory tracker uses printk to print the following on the
console:

====
Dec 15 17:02:02 kvm-rawhide-64-1 kernel: SYSTEMTAP ERROR: Memory fence corrupted
after allocated memory
Dec 15 17:02:02 kvm-rawhide-64-1 kernel: at addr ffff88003e476698. (Allocation
ends at ffff88003e476697)
====

Since the memory fence (a small bit of memory placed after every allocation) has
been overwritten, this means systemtap is writing past the end of the allocated
memory.

After a bit of debugging, this appears to be happening to the allocated context
structure.
Comment 1 Frank Ch. Eigler 2009-12-16 15:33:11 UTC
There is some subtlety to the way we allocate context locals,
with [MAXNESTING+1] slots (to store outgoing arguments to
a potentially excessively nested function).  Maybe something
there is behaving badly.  See translate.cxx:1018 ish.
Comment 2 David Smith 2009-12-16 15:47:47 UTC
Here's code output by the translator to set up the arguments to a function, then
call it:

====
        c->locals[c->nesting+1].function_recurse.n = l->__tmp0;
        function_recurse (c);
====

One of the first things a function does is check MAXNESTING:

====
static void function_recurse (struct context* __restrict__ c) {
  struct function_recurse_locals *  __restrict__ l =
    & c->locals[c->nesting+1].function_recurse;
  (void) l;
  #define CONTEXT c
  #define THIS l
  if (0) goto out;
  c->last_stmt = "identifier 'recurse' at
../src/testsuite/systemtap.base/control_limits.stp:3:10";
  if (unlikely (c->nesting+1 > MAXNESTING)) {
    c->last_error = "MAXNESTING exceeded";
    return;
  } else {
    c->nesting ++;
  }
====

But, this check is too late - setting up the arguments has already written past
the end of the context structure.
Comment 3 David Smith 2009-12-16 15:54:47 UTC
Hmm, strangely enough, we try to handle this by allocating 1 extra set of locals:

  o->newline(-1) << "} locals [MAXNESTING+1];"; 

  // NB: The +1 above for extra room for outgoing arguments of next nested function.
  // If MAXNESTING is set too small, the args will be written, but the MAXNESTING
  // check done at c_unparser::emit_function will reject.
  //
  // This policy wastes memory (one row of locals[] that cannot really
  // be used), but trades that for smaller code (not having to check
  // c->nesting against MAXNESTING at every call site).
Comment 4 David Smith 2009-12-16 18:08:54 UTC
Fixed in commit 101a2bb.  The test:

 if (unlikely (c->nesting+1 > MAXNESTING)) {

was wrong.  It should have been:

 if (unlikely (c->nesting+1 >= MAXNESTING)) {

The original code actually allowed MAXNESTING+1 recursion levels.

Unfortunately, I don't see a good way of adding a testsuite test for this
problem, since the -DDEBUG_MEM logic just prints to the console.