Function call count
Problem
Sometimes one writes general algorithms to solve a problem where one doesn't have a good knowledge about the possible extreme distributions of input data. The algorithm may loop or nest too many times, sometimes orders of magnitude worse than for the average case the developer may encounter on his own box. Due to algorithmic complexity, the machine may seem to hang.
There was at least one such problem in the memory management code in an earlier RHEL kernel. The story involves some vicious interplay between parts of linux that attempt to recycle unused memory pages; parts that swap them to/from disk; parts that assess how bad memory pressure is at all. Apparently, some iteration function was being called millions of times per higher level operation instead of hundreds like its developer imagined. Or something like that.
In order to diagnose this problem, the developer had to manually instrument his code, recompile, and try to reproduce the unusual load that triggered the algorithmic bloat. It worked, but if he had systemtap at the time, something like this may have helped identify the situation.
Scripts
probe kernel.function("*@mm/*.c") { # probe every function in any C file under mm/ called[probefunc()] <<< 1 # add a count efficiently } global called probe end,timer.ms(30000) { foreach (fn+ in called) # Sort by function name # (fn in called-) # Sort by call count (in decreasing order) printf("%s %d\n", fn, @count(called[fn])) exit() }
Output
# stap counts.stp __alloc_pages 1807 __anon_vma_link 441 __do_page_cache_readahead 119 __filemap_fdatawrite_range 3 __free_pages 51 __free_pages_ok 2 __generic_file_aio_read 327 __generic_file_aio_write_nolock 10 __get_free_pages 6 __get_page_state 6 ... unlink_file_vma 1945 unlock_page 1004 unmap_region 121 unmap_vmas 157 vm_normal_page 17371 vm_stat_account 489 vma_adjust 143 vma_link 324 vma_merge 356 vma_prio_tree_add 1494 vma_prio_tree_insert 402 vma_prio_tree_remove 1714 wait_on_page_bit 3 wait_on_page_writeback_range 3 wakeup_pdflush 1 wb_kupdate 6 wb_timer_fn 6 zone_watermark_ok 1807
Lessons
- Grossly large numbers are obvious even on a simple summary report like the above.
- Filtering further (eliding counts below a few thousand) would focus on busy sections only.
- Systemtap encourages incremental experimentation and exploration.
It would be nice to have finer-granularity ("every statement") for functions of particular interest. It could be expressed by supporting wildcards on the line-number portion of the probe point specification: probe kernel.statement("my_function@*:*") { counts[pp()]<<<1 }
See also WSKernelProfile.