The worst-case complexity of memmem is currently quadratic, O(m*n) where m is the length of the haystack and n the length of the needle. The worst case occurs when the distinguishing feature of the needle is at the end, the haystack consists of repetitions of the needle's prefix, and the needle is either not present or occurs late in the haystack. Since each iteration examines the bulk of the needle, but only advances one byte per iteration, you have quadratic performance. Using the Knuth-Morris-Pratt algorithm or Boyer-Moore algorithm would guarantee worst-case complexity of O(m+n), at the expense of a memory allocation of O(n). Gnulib recently converted its memmem implementation to use KMP, along with an allocation amortization that avoids KMP until after a bounded number of failed naive iterations, and falling back to the naive algorithm when KMP cannot allocate enough memory. http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=69d2edc;hb=fc068cf It looks like strstr could also benefit from this algorithmic speedup.
For libc, it is not really feasible to have functions like memmem and strstr calling malloc. Such functions have always been simple reentrant code before, and we can't go introducing locking and so forth there. An implementation that needs a varying amount of memory is only OK if the amount required is small enough to use alloca (we have __libc_use_alloca to provide a size limit test). If you would like to supply such an implementation for libc, post the patch to libc-alpha.
> it is not really feasible to have functions like memmem and strstr > calling malloc. When the call to malloc fails, the proposed memmem code falls back to the O(n^2) code, so degrades gracefully, still returning the correct result (just slower). > Such functions have always been simple reentrant code before, > and we can't go introducing locking and so forth there. Can you explain why? The locking is done inside malloc(). The performance will not suffer in most cases, since malloc() is called only when the algorithm has hints that it is running on one of the worst-case scenarios. > post the patch to libc-alpha. Our experience with submissions via bugzilla has been better than with posts to libc-alpha.
> Can you explain why? Because the malloc call makes the function no longer async-signal safe. While POSIX says nothing about memmem, it is quite desirable to let the basic mem* and str*/stp* memory/string ops are async-signal safe (of course strdup is an exception here). So the code should do: if (__libc_use_alloca (size_needed_to_allocate)) { ptr = alloca (size_needed_to_allocate); // optimized algorithm } else // current O(m*n) search
Thanks for the explanations. I tried to modify the KMP algorithm to use only bounded additional space, as permitted by alloca. See attached patches. But the result is unfortunately only O(n*(m-tm)) instead of O(n*m), where tm is the number of 'int's that can be stored on the stack, typically something like tm = 1024. For the really hard cases, where m can easily be 1000000 or similar, it hardly helps. Therefore I give up on this.
Created attachment 2161 [details] Attempted patch for bounded-stack-space solution, part 1
Created attachment 2162 [details] Attempted patch for bounded-stack-space solution, part 2
I agree with the analysis that memmem (and strstr) can either be async-safe, or have linear time complexity, but not both (there is no way around the fact that string searches are inherently quadratic in space-time; you can either have O(n^2) time and O(1) space, as is currently the case, or you can have O(n) time and O(n) space, as in KMP or Boyer-Moore, but you cannot have O(n) time and O(1) space, and O(n) space is not async-safe). However, POSIX does not require any of the standardized str* or mem* functions to be async-safe. I plan on asking the Austin Group if this is an inadvertent oversight for some of the obvious functions, like strlen. But while Jakub is correct in comment #3 that async-safety is nice, I see nothing that requires us to preserve async-safety in strstr (and by extrapolation, memmem). This means that without feedback from the Austin Group, we are now faced with the engineering decision of whether the default of async-safety or better time complexity is more desirable in the default case - after all, poor time complexity can be used as a denial of service attack. Whichever default is chosen, someone will come up with a counter program that is penalized. It almost makes me wonder if there should be two alternate interfaces, strstr_r that guarantees async-safety but with potentially poor performance, and strstr_fast that guarantees linear performance but with potential locking. Or is there a way to detect whether we are in the middle of a signal handler, and make a decision between speed and async-safety accordingly?
I stand corrected - my claim that string searches are inherently quadratic in space-time appears to be valid for arbitrary alphabets with no further constraints. But if [1] is to be believed, then the Two-Way algorithm provides an algorithm that exploits an additional constraint of random access and a sorted alphabet to provide an algorithm with the desirable properties of O(n) time and O(1) space searching. I'm not sure if strcasestr qualifies as using a sorted alphabet, but memmem and strstr both do. Therefore, I'll experiment with coding the Two-Way algorithm in gnulib, and if it works, post the results here. [1] http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
Gnulib now has a reentrant linear-time constant-space solution for memmem, which has the added benefit of sublinear speed for large enough needles: http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=622a034;hb=9d8d6cd It also has a testsuite that exposes a couple of cases where the current glibc takes orders of magnitude longer than the gnulib implementation to come up with the same answer: http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/test-memmem.c I'm still working on factoring the code to reuse the bulk of the algorithm in strstr, strcasestr, strncasestr, and wcsstr (although sublinear speed there is not an option, since the search for the NUL ending the haystack is an unavoidable linear task), before posting the same code back here as a diff against the glibc code base.
Created attachment 2339 [details] fixed-allocation linear memmem/strstr 2008-03-29 Eric Blake <ebb9@byu.net> Rewrite string searches to O(n) rather than O(n^2). * string/str-two-way.h: New file, for linear fixed-allocation string searching. * string/memmem.c: New implementation. * string/strstr.c: New implementation. * string/strcasestr.c: New implementation.
I don't think you have an assignment for glibc on file. Until you confirm that there is such an assignment (maybe specific to this code) I won't even look at the code.
I wrote this code for gnulib, where I already had assignment on file. At any rate, my glibc assignment is now also on file.
I added the code on the trunk.
It appears that this implementation (or at least the form in which it currently appears in glibc trunk) is buggy :-( See http://sourceware.org/bugzilla/show_bug.cgi?id=12092 for repro details.
http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses an assembly optimization. Can you first isolate whether your claim of a bug is limited to the SSE version, or whether it also affects the straight C code version?
(In reply to comment #15) > http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses > an assembly optimization. The assembly implementation is only for SSE4_2, and is correct. > Can you first isolate whether your claim of a bug is > limited to the SSE version, or whether it also affects the straight C code > version? The straight C version is the one that's broken (AFAICT).
(In reply to comment #16) > The assembly implementation is only for SSE4_2, and is correct. No, it isn't. The assembly implementation re-introduced the quadratic behavior that existed pre-glibc 2.9, rather than keeping the linear behavior of the C implementation (but only applies to strstr() and strcasestr(), and not memmem()). What REALLY needs to happen is a hybrid approach that uses assembly for short needles, but C for long needles, since the cost of factoring the needle is proportionately worse for short needles, and assembly can probably perform brute force searching on short needles more efficiently than C code. Quadratic scaling of brute force doesn't matter if you cap the maximum needle length where you use brute force. Also, the (non-quadratic) SSE4.2 assembly approach needs to be extended to cover memmem(). > > The straight C version is the one that's broken (AFAICT). The fix for the C code has been posted to http://sourceware.org/bugzilla/show_bug.cgi?id=12092
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla.