This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Improve memmem.


On Wed, May 13, 2015 at 06:06:58PM -0700, Paul Eggert wrote:
> OndÅej BÃlka wrote:
> 
> >You wont get illegal acces.
> 
> I don't see why not.  If memchr returns haystack_end - 1,
> haystack[1] will be equivalent to *haystack_end, i.e., one past the
> end of the haystack.
>
I am using different end here

+  const unsigned char *haystack_end = (const unsigned char *)
+                                      haystack_start + haystack_len
+                                      - needle_len + 1;

 
> >Here is equivalent without goto. It uses implementation behaviour that
> >memcmp(x,y,0) doesn't touch anything.
> 
> Better, but I think it still has the problem.  Also, come to think
> of it, it's not clear that we should bother with the 'needle_len ==
> 2' test (are needles of length 2 really that common?), and things
> are simpler and smaller without it, so how about the attached
> instead?
>
Reason is performance. Idea is to use this algorithm and switch to slow
two-way algorithm only for pathological inputs.

This uses simple heuristic to look only for first k characters and once
it matches character beyond k'th switch to two-way algorithm.

With single memchr as now its heuristic with k=1. What are you
suggesting is k=2. My algorithm uses k=3.

Main motivations is that pairs are still too common and in longer switch
you will quickly switch to slow two-way algorithm.

I could use more sophisticated accounting of runtime cost but it would
likely cost more than benefit as most inputs are quite small.

Finally there is buy-or-rent problem. Computing critical factorization
is very costy, often several times more expensive than using naive
algorithm. A solution to limit loss is buy-or-rent approach. You run a
naive algorithm even if its slower until you reach time diffierence
equal of doing factorization, then you switch to two-way algorithm and
you are at most twice slower than if you guessed correctly what to use.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]