Bug 15781 - improve pass-2 function suggestions
Summary: improve pass-2 function suggestions
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-25 13:45 UTC by Jonathan Lebon
Modified: 2014-01-22 21:10 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Lebon 2013-07-25 13:45:57 UTC
In bug 15375, function suggestions were added when DWARF function probes could not be resolved. There are two things that could be improved:

1. Also suggest things for optional probes

This is also mentioned in the code in tapsets.cxx:7189. In a nutshell, if a probe contains optional probe points, then suggestions should also be offered for the optional ones if none of the probe points could be resolved. A few ideas on how we could go about this:
 - Always calculate suggestions, regardless of optionality, except for process(glob) and library(glob) recursive calls to build()
 - To distinguish between truly script optional probe points (appended with ?) and optional probes that were synthesized from process(glob) and library(glob) components, instead of overriding the optional parameter, create another field in the probe_point struct (or change optional member to an enum?)
 - Buffer the suggestions in the original derive_probes() call and print them out at the end only, if all probe points failed.

2. Improve levenshtein() algorithm

- Since we're only going to be printing MAXFUNCS functions, we can abort the calculations in levenshtein() at any point we know that the final score will be higher than the top MAXFUNCS functions already calculated. This is also mentioned in the code in util.cxx:1087.
- An easy way to quickly improve performance is to compare the length of the strings right from the start. If their diff is larger than the max, then abort.
- Other possible tips available in these pages:

http://en.wikipedia.org/wiki/Levenshtein_distance
http://stackoverflow.com/questions/3183149/
http://markos.gaivo.net/blog/?p=211
Comment 1 Jonathan Lebon 2013-12-12 15:43:31 UTC
(In reply to Jonathan Lebon from comment #0)
> 1. Also suggest things for optional probes
> 
> This is also mentioned in the code in tapsets.cxx:7189. In a nutshell, if a
> probe contains optional probe points, then suggestions should also be
> offered for the optional ones if none of the probe points could be resolved.

Clarification on what this means:

probe kernel.function("not_a_func") { next }
   -- p2 fail, suggests alternatives to function name

probe kernel.function("not_a_func1"), kernel.function("not_a_func2") { next }
   -- p2 fail, suggests alternatives for both function names

probe kernel.function("not_a_func1")?, kernel.function("vfs_read") { next }
   -- works, does not suggest anything

probe kernel.function("not_a_func1")?, kernel.function("not_a_func2")? { next }
   -- works, does not suggest anything (assuming there are other probes in the script, otherwise p2 fail with 'no probes found' and no suggestions)

probe kernel.function("not_a_func1")?, kernel.function("not_a_func2") { next }
   -- p2 fail, suggests alternatives for the non-optional probe point only

What we're describing in this PR is that in the last case, we should suggest something for both probe points, regardless of optionality, because the whole probe failed to resolved. We could also extend this to catch the before-last case as well if it would result in a p2 fail.

This might be more trouble than it's worth, especially due to globby probes and recursive calls, as well as performance issues, if the approach relies on calculating suggestions for each probe point prematurely.

> 2. Improve levenshtein() algorithm
> 
> - Since we're only going to be printing MAXFUNCS functions, we can abort the
> calculations in levenshtein() at any point we know that the final score will
> be higher than the top MAXFUNCS functions already calculated. This is also
> mentioned in the code in util.cxx:1087.
> - An easy way to quickly improve performance is to compare the length of the
> strings right from the start. If their diff is larger than the max, then
> abort.

This was done in commit 59b11ea.
Comment 2 Jonathan Lebon 2014-01-16 23:19:53 UTC
(In reply to Jonathan Lebon from comment #0)
> In bug 15375, function suggestions were added when DWARF function probes
> could not be resolved. There are two things that could be improved:
> 
> 1. Also suggest things for optional probes

I've created the branch jlebon/pr15781 containing patches to do this. The technique relies on derive_probes buffering semantic_error objects caught for optional probe points as well as a new field in probe_point to track whether the probe point was created from a globby one.
Comment 3 Jonathan Lebon 2014-01-22 21:10:23 UTC
Now in master (commit 4bb17d5 and previous ones).