Fastest String Search Algorithm.
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Fri Jun 11 11:45:31 GMT 2021
Hi Amit,
> In my opinion, needle should be in the middle so as not to give any
> algorithm any unfair advantage - like Boyer-Moore which searches from the end.
There is no unfair advantage to any algorithm since they all start searching
from the start (note BM does not start at the end).
> I searched for the needle in the haystack using document editor's search
> and found that needle was located in the beginning.
No, it's always located at the end - it is even tested explicitly.
> The worst case scenario of my algorithm is O(mn) where m is the length of
> the text and n is the length of the pattern.
Exactly. And that is problematic if both m and n are large.
> And this worst case scenario will be hit in only one case:
There is not just one worst-case, the number is practically infinite.
A typical worst case is AAAAAAAAAAAAAAA with needle AAAAAAABA.
> But my theory is this: We should be fastest 100% if the time. If that is not possible,
> then we should try to get as close to 100% as possible.
>
> This means that if for average case, some algorithm is faster 99% of the time but
> slower 1% of the time in the worst case scenario then we should go for this
> algorithm rather than fretting over worst case scenario.
For large needles and haystacks the worst case becomes more important because
the slowdown becomes larger with the length. It's bad even if there is a 0.001%
chance that your computer locks up for minutes.
Cheers,
Wilco
More information about the Libc-alpha
mailing list