This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug runtime/19799] New: deleting from array of aggregate unreliable


https://sourceware.org/bugzilla/show_bug.cgi?id=19799

            Bug ID: 19799
           Summary: deleting from array of aggregate unreliable
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
          Assignee: systemtap at sourceware dot org
          Reporter: raeburn at permabit dot com
  Target Milestone: ---

I've got a SystemTap script that updates entries in an array of aggregates and
occasionally deletes entries, but the deletion doesn't seem to work reliably.

I'm using a function probe that collects some timing data and updates stats in
an array. Periodically (with a "timer.ms" probe) we pick a range of indices and
print out the values accumulated so far, and (try to) clear them.

  global stats[500000];
  probe module(...).function(...) {
    ...stats[a,1] <<< value1;...stats[a,2] <<< value2;...etc...
  }
  probe timer.ms(NNN) {
    for (...) {
      printf(...stats[x,y]...);
      delete stats[x,y];
    }
  }

As I understand it, the delete should get rid of the array entry, effectively
resetting the counter for the key-pair to zero. What I'm seeing instead is that
often the array entry doesn't get deleted; if I use:

            delete stats[thisIndex,1];
            if ([thisIndex,1] in stats) {
                printf("eek! [%d,1] in stats after deletion??\n",
                       thisIndex);
            }

then the error message fires pretty often, but not always, with my script. And
the values output are clearly continuing to accumulate data from one report to
the next.

This happens with "version 2.7/0.161, rpm 2.7-2.el6" on RHEL6, version 2.9 from
the web site, and git rev d3aa622.

Looking at pmap-gen.c in git (which could use a few more comments maybe?), it
looks to me like the data is stored in per-CPU maps, and collected from all of
them when read out, but _stp_pmap_del appears to operate only on the per-CPU
map for the current CPU.
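
To illustrate the suspected behavior, here is a toy model in C. It is only a
sketch and not the code from pmap-gen.c: the names (toy_pmap, toy_add,
toy_sum, toy_contains, toy_del_current_cpu), the fixed CPU count, and the
directly indexed key space are all made up for illustration, and locking is
ignored. It assumes, as described above, that adds land in the per-CPU table
of whichever CPU runs the probe, that reads merge all per-CPU tables, and that
delete touches only the current CPU's table.

```c
#include <stdbool.h>

#define NCPUS 4   /* hypothetical CPU count for the model */
#define NKEYS 8   /* hypothetical key space; the real maps are hashed */

/* One per-CPU entry: a running count and sum, like a tiny aggregate. */
struct slot { bool present; long count; long sum; };

/* Toy stand-in for a pmap: one small directly indexed table per CPU. */
struct toy_pmap { struct slot cpu[NCPUS][NKEYS]; };

/* Models "stats[key] <<< value" executed on the given CPU. */
void toy_add(struct toy_pmap *m, int cpu, int key, long value)
{
    struct slot *s = &m->cpu[cpu][key];
    s->present = true;
    s->count++;
    s->sum += value;
}

/* Reading merges the entry from every CPU's table. */
long toy_sum(const struct toy_pmap *m, int key)
{
    long total = 0;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (m->cpu[cpu][key].present)
            total += m->cpu[cpu][key].sum;
    return total;
}

/* Models "[key] in stats": true if any CPU still holds the key. */
bool toy_contains(const struct toy_pmap *m, int key)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (m->cpu[cpu][key].present)
            return true;
    return false;
}

/* Suspected behavior: delete clears only the current CPU's entry. */
void toy_del_current_cpu(struct toy_pmap *m, int cpu, int key)
{
    m->cpu[cpu][key] = (struct slot){0};
}
```

In this model, if the function probe has added to a key on CPUs 0 and 1 and
the timer probe then deletes it while running on CPU 0, toy_contains still
reports the key and toy_sum still returns CPU 1's contribution, which matches
the symptoms above.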

A quick experiment putting a for_each_possible_cpu loop into _stp_pmap_del
seems to fix the problem for me, on initial testing; the error message above
doesn't fire, and the counters reported are often smaller than on the previous
iteration. I won't bother sending my patch, as it seems to be functional but
isn't very good: it recomputes the hash value for every per-CPU map, and it
overlooks the aggregate map, where I assume the entry should probably be
removed too.
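
The shape of a more complete fix might look like the following toy model in C.
Again, this is a sketch under assumptions, not a patch against pmap-gen.c:
toy_pmap, toy_hash, toy_find, toy_add, toy_sum, toy_contains and
toy_del_all_cpus are hypothetical names, each bucket holds at most one key (no
collision chains), and no locking is shown, which real code running on
multiple CPUs would need. The points it illustrates are computing the hash
once, removing the key from every per-CPU map, and removing it from the
aggregate map as well.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCPUS    4    /* hypothetical CPU count */
#define NBUCKETS 16   /* hypothetical table size */

struct slot { bool present; long key; long count; long sum; };
struct toy_map { struct slot bucket[NBUCKETS]; };

/* Per-CPU maps plus one aggregate map that holds merged results. */
struct toy_pmap {
    struct toy_map cpu[NCPUS];
    struct toy_map agg;
};

static unsigned toy_hash(long key)
{
    return (unsigned)((unsigned long)key % NBUCKETS);
}

/* Look up a key in one map using an already computed hash. */
static struct slot *toy_find(struct toy_map *map, unsigned hash, long key)
{
    struct slot *s = &map->bucket[hash];
    return (s->present && s->key == key) ? s : NULL;
}

/* Models "stats[key] <<< value" on one CPU; fails on a bucket collision. */
bool toy_add(struct toy_pmap *m, int cpu, long key, long value)
{
    struct slot *s = &m->cpu[cpu].bucket[toy_hash(key)];
    if (s->present && s->key != key)
        return false;
    if (!s->present)
        *s = (struct slot){ true, key, 0, 0 };
    s->count++;
    s->sum += value;
    return true;
}

/* Merge all per-CPU entries into the aggregate map and return the sum. */
long toy_sum(struct toy_pmap *m, long key)
{
    unsigned hash = toy_hash(key);
    struct slot merged = { false, key, 0, 0 };
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        struct slot *s = toy_find(&m->cpu[cpu], hash, key);
        if (s) {
            merged.present = true;
            merged.count += s->count;
            merged.sum += s->sum;
        }
    }
    m->agg.bucket[hash] = merged;
    return merged.sum;
}

/* Models "[key] in stats": true if any per-CPU map still holds the key. */
bool toy_contains(struct toy_pmap *m, long key)
{
    unsigned hash = toy_hash(key);
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (toy_find(&m->cpu[cpu], hash, key))
            return true;
    return false;
}

/* Delete from every per-CPU map and the aggregate map, hashing only once. */
void toy_del_all_cpus(struct toy_pmap *m, long key)
{
    unsigned hash = toy_hash(key);
    struct slot *s;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if ((s = toy_find(&m->cpu[cpu], hash, key)) != NULL)
            s->present = false;
    if ((s = toy_find(&m->agg, hash, key)) != NULL)
        s->present = false;
}
```

With this version, after toy_del_all_cpus the key is absent from all maps, and
a later toy_add on any CPU starts the count and sum again from zero.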
