The worst-case complexity of memmem is currently quadratic, O(m*n) where m is
the length of the haystack and n the length of the needle. The worst case
occurs when the distinguishing feature of the needle is at the end, the haystack
consists of repetitions of the needle's prefix, and the needle is either not
present or occurs late in the haystack. Since each iteration examines the bulk
of the needle, but only advances one byte per iteration, you have quadratic
performance. Using the Knuth-Morris-Pratt algorithm or Boyer-Moore algorithm
would guarantee worst-case complexity of O(m+n), at the expense of a memory
allocation of O(n).
Gnulib recently converted its memmem implementation to use KMP, along with an
allocation amortization that avoids KMP until after a bounded number of failed
naive iterations, and falling back to the naive algorithm when KMP cannot
allocate enough memory.
It looks like strstr could also benefit from this algorithmic speedup.
For libc, it is not really feasible to have functions like memmem and strstr
calling malloc. Such functions have always been simple reentrant code before,
and we can't go introducing locking and so forth there. An implementation that
needs a varying amount of memory is only OK if the amount required is small
enough to use alloca (we have __libc_use_alloca to provide a size limit test).
If you would like to supply such an implementation for libc, post the patch to
> it is not really feasible to have functions like memmem and strstr
> calling malloc.
When the call to malloc fails, the proposed memmem code falls back to the
O(n^2) code, so degrades gracefully, still returning the correct result
> Such functions have always been simple reentrant code before,
> and we can't go introducing locking and so forth there.
Can you explain why? The locking is done inside malloc(). The performance
will not suffer in most cases, since malloc() is called only when the
algorithm has hints that it is running on one of the worst-case scenarios.
> post the patch to libc-alpha.
Our experience with submissions via bugzilla has been better than with
posts to libc-alpha.
> Can you explain why?
Because the malloc call makes the function no longer async-signal safe.
While POSIX says nothing about memmem, it is quite desirable to let
the basic mem* and str*/stp* memory/string ops are async-signal safe (of course
strdup is an exception here). So the code should do:
if (__libc_use_alloca (size_needed_to_allocate))
ptr = alloca (size_needed_to_allocate);
// optimized algorithm
// current O(m*n) search
Thanks for the explanations. I tried to modify the KMP algorithm to use
only bounded additional space, as permitted by alloca. See attached patches.
But the result is unfortunately only O(n*(m-tm)) instead of O(n*m), where tm
is the number of 'int's that can be stored on the stack, typically something
like tm = 1024. For the really hard cases, where m can easily be 1000000 or
similar, it hardly helps. Therefore I give up on this.
Created attachment 2161 [details]
Attempted patch for bounded-stack-space solution, part 1
Created attachment 2162 [details]
Attempted patch for bounded-stack-space solution, part 2
I agree with the analysis that memmem (and strstr) can either be async-safe, or
have linear time complexity, but not both (there is no way around the fact that
string searches are inherently quadratic in space-time; you can either have
O(n^2) time and O(1) space, as is currently the case, or you can have O(n) time
and O(n) space, as in KMP or Boyer-Moore, but you cannot have O(n) time and O(1)
space, and O(n) space is not async-safe).
However, POSIX does not require any of the standardized str* or mem* functions
to be async-safe. I plan on asking the Austin Group if this is an inadvertent
oversight for some of the obvious functions, like strlen. But while Jakub is
correct in comment #3 that async-safety is nice, I see nothing that requires us
to preserve async-safety in strstr (and by extrapolation, memmem). This means
that without feedback from the Austin Group, we are now faced with the
engineering decision of whether the default of async-safety or better time
complexity is more desirable in the default case - after all, poor time
complexity can be used as a denial of service attack. Whichever default is
chosen, someone will come up with a counter program that is penalized.
It almost makes me wonder if there should be two alternate interfaces, strstr_r
that guarantees async-safety but with potentially poor performance, and
strstr_fast that guarantees linear performance but with potential locking. Or
is there a way to detect whether we are in the middle of a signal handler, and
make a decision between speed and async-safety accordingly?
I stand corrected - my claim that string searches are inherently quadratic in
space-time appears to be valid for arbitrary alphabets with no further
constraints. But if  is to be believed, then the Two-Way algorithm provides
an algorithm that exploits an additional constraint of random access and a
sorted alphabet to provide an algorithm with the desirable properties of O(n)
time and O(1) space searching. I'm not sure if strcasestr qualifies as using a
sorted alphabet, but memmem and strstr both do. Therefore, I'll experiment with
coding the Two-Way algorithm in gnulib, and if it works, post the results here.
Gnulib now has a reentrant linear-time constant-space solution for memmem, which
has the added benefit of sublinear speed for large enough needles:
It also has a testsuite that exposes a couple of cases where the current glibc
takes orders of magnitude longer than the gnulib implementation to come up with
the same answer:
I'm still working on factoring the code to reuse the bulk of the algorithm in
strstr, strcasestr, strncasestr, and wcsstr (although sublinear speed there is
not an option, since the search for the NUL ending the haystack is an
unavoidable linear task), before posting the same code back here as a diff
against the glibc code base.
Created attachment 2339 [details]
fixed-allocation linear memmem/strstr
2008-03-29 Eric Blake <email@example.com>
Rewrite string searches to O(n) rather than O(n^2).
* string/str-two-way.h: New file, for linear fixed-allocation
* string/memmem.c: New implementation.
* string/strstr.c: New implementation.
* string/strcasestr.c: New implementation.
I don't think you have an assignment for glibc on file. Until you confirm that
there is such an assignment (maybe specific to this code) I won't even look at
I wrote this code for gnulib, where I already had assignment on file. At any
rate, my glibc assignment is now also on file.
I added the code on the trunk.
It appears that this implementation (or at least the form in which it currently appears in glibc trunk) is buggy :-(
See http://sourceware.org/bugzilla/show_bug.cgi?id=12092 for repro details.
http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses an assembly optimization. Can you first isolate whether your claim of a bug is limited to the SSE version, or whether it also affects the straight C code version?
(In reply to comment #15)
> http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses
> an assembly optimization.
The assembly implementation is only for SSE4_2, and is correct.
> Can you first isolate whether your claim of a bug is
> limited to the SSE version, or whether it also affects the straight C code
The straight C version is the one that's broken (AFAICT).
(In reply to comment #16)
> The assembly implementation is only for SSE4_2, and is correct.
No, it isn't. The assembly implementation re-introduced the quadratic behavior that existed pre-glibc 2.9, rather than keeping the linear behavior of the C implementation (but only applies to strstr() and strcasestr(), and not memmem()).
What REALLY needs to happen is a hybrid approach that uses assembly for short needles, but C for long needles, since the cost of factoring the needle is proportionately worse for short needles, and assembly can probably perform brute force searching on short needles more efficiently than C code. Quadratic scaling of brute force doesn't matter if you cap the maximum needle length where you use brute force.
Also, the (non-quadratic) SSE4.2 assembly approach needs to be extended to cover memmem().
> The straight C version is the one that's broken (AFAICT).
The fix for the C code has been posted to
*** Bug 260998 has been marked as a duplicate of this bug. ***
Seen from the domain http://volichat.com
Page where seen: http://volichat.com/adult-chat-rooms
Marked for reference. Resolved as fixed @bugzilla.