Differences between revisions 3 and 4
Revision 3 as of 2007-05-11 23:58:04
Size: 3150
Editor: FChE
Comment:
Revision 4 as of 2008-01-10 19:47:28
Size: 3150
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked like this. Additions are marked like this.
Line 78: Line 78:
See also ["WSKernelProfile"]. See also [[WSKernelProfile]].

Function call count

Problem

Sometimes one writes general algorithms to solve a problem where one doesn't have a good knowledge about the possible extreme distributions of input data. The algorithm may loop or nest too many times, sometimes orders of magnitude worse than for the average case the developer may encounter on his own box. Due to algorithmic complexity, the machine may seem to hang.

There was at least one such problem in the memory management code in an earlier RHEL kernel. The story involves some vicious interplay between parts of linux that attempt to recycle unused memory pages; parts that swap them to/from disk; parts that assess how bad memory pressure is at all. Apparently, some iteration function was being called millions of times per higher level operation instead of hundreds like its developer imagined. Or something like that.

In order to diagnose this problem, the developer had to manually instrument his code, recompile, and try to reproduce the unusual load that triggered the algorithmic bloat. It worked, but if he had systemtap at the time, something like this may have helped identify the situation.

Scripts

probe kernel.function("*@mm/*.c") {  # probe every function in any C file under mm/
  called[probefunc()] <<< 1  # add a count efficiently
}
global called
probe end,timer.ms(30000) {
  foreach (fn+ in called)  # Sort by function name
  #       (fn in called-)  # Sort by call count (in decreasing order)
    printf("%s %d\n", fn, @count(called[fn]))
  exit()
}

Output

# stap counts.stp
__alloc_pages 1807
__anon_vma_link 441
__do_page_cache_readahead 119
__filemap_fdatawrite_range 3
__free_pages 51
__free_pages_ok 2
__generic_file_aio_read 327
__generic_file_aio_write_nolock 10
__get_free_pages 6
__get_page_state 6
...
unlink_file_vma 1945
unlock_page 1004
unmap_region 121
unmap_vmas 157
vm_normal_page 17371
vm_stat_account 489
vma_adjust 143
vma_link 324
vma_merge 356
vma_prio_tree_add 1494
vma_prio_tree_insert 402
vma_prio_tree_remove 1714
wait_on_page_bit 3
wait_on_page_writeback_range 3
wakeup_pdflush 1
wb_kupdate 6
wb_timer_fn 6
zone_watermark_ok 1807

Lessons

  • Grossly large numbers are obvious even on a simple summary report like the above.
  • Filtering further (eliding counts below a few thousand) would focus on busy sections only.
  • Systemtap encourages incremental experimentation and exploration.
  • It would be nice to have finer-granularity ("every statement") for functions of particular interest. It could be expressed by supporting wildcards on the line-number portion of the probe point specification: probe kernel.statement("my_function@*:*") { counts[pp()]<<<1 }

See also WSKernelProfile.


WarStories

None: WSFunctionCallCount (last edited 2008-01-10 19:47:28 by localhost)