sourceware.org Git - systemtap.git/commitdiff
levenshtein: half substitution penalty if same case
author Jonathan Lebon <jlebon@redhat.com>
Fri, 15 Nov 2013 20:11:46 +0000 (15:11 -0500)
committer Jonathan Lebon <jlebon@redhat.com>
Fri, 15 Nov 2013 20:36:40 +0000 (15:36 -0500)
We tweak the penalties so that if two characters differ only in case, the
substitution penalty is halved. Insertion, deletion and ordinary
substitution now each cost 2, and a case-only substitution costs 1.
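The recurrence implemented by the patch can be written as follows, with a_i and b_j the i-th and j-th characters (1-based). The border cells d(i,0) and d(0,j) are set outside this hunk and are not shown here.

$$
d(i,j) =
\begin{cases}
d(i-1,j-1) & \text{if } a_i = b_j,\\
\min\bigl(d(i-1,j-1) + s(a_i,b_j),\; d(i-1,j) + 2,\; d(i,j-1) + 2\bigr) & \text{otherwise,}
\end{cases}
\qquad
s(x,y) =
\begin{cases}
1 & \text{if } \operatorname{tolower}(x) = \operatorname{tolower}(y),\\
2 & \text{otherwise.}
\end{cases}
$$

For instance, SYS_close to sys_close needs three case-only substitutions, 3 x 1 = 3. SYS_close to con_close needs three ordinary substitutions, 3 x 2 = 6. Under the old unit weights, both distances were 3.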

Example of the effect:

Equal weights:
SYS_close --> SYSC_close, SyS_close, SYSC_clone, SyS_clone, con_close

Half penalty for differing case:
SYS_close --> SyS_close, SYSC_close, SyS_clone, sys_close, SYSC_clone

The 'SYSC' versions rank lower. More importantly, 'sys_close' now ranks
higher than 'con_close', which drops out of the top five.

util.cxx

index b2c6df053f76f72b152206c940ccf6b22c0d2954..32ea329fe87c92e231dbe2ea62b49dcb1b5f1716 100644
--- a/util.cxx
+++ b/util.cxx
@@ -1120,10 +1120,16 @@ levenshtein(const string& a, const string& b)
       if (a[i-1] == b[j-1]) // match
         d(i,j) = d(i-1, j-1);
       else // penalties open for adjustments
-        d(i,j) = min(min(
-            d(i-1,j-1) + 1,  // substitution
-            d(i-1,j)   + 1), // deletion
-            d(i,j-1)   + 1); // insertion
+        {
+          unsigned subpen = 2; // substitution penalty
+          // check if they are upper/lowercase related
+          if (tolower(a[i-1]) == tolower(b[j-1]))
+            subpen = 1; // half penalty
+          d(i,j) = min(min(
+              d(i-1,j-1) + subpen,  // substitution
+              d(i-1,j)   + 2), // deletion
+              d(i,j-1)   + 2); // insertion
+        }
     }
   }
 