Bug 5514 - memmem is O(n^2), but should be O(n)
Summary: memmem is O(n^2), but should be O(n)
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.4
: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-20 13:30 UTC by Eric Blake
Modified: 2014-07-03 11:46 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Attempted patch for bounded-stack-space solution, part 1 (1.05 KB, patch)
2007-12-26 16:19 UTC, Bruno Haible
Details | Diff
Attempted patch for bounded-stack-space solution, part 2 (4.14 KB, patch)
2007-12-26 16:20 UTC, Bruno Haible
Details | Diff
fixed-allocation linear memmem/strstr (6.93 KB, patch)
2008-03-29 16:43 UTC, Eric Blake
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Blake 2007-12-20 13:30:12 UTC
The worst-case complexity of memmem is currently quadratic, O(m*n) where m is
the length of the haystack and n the length of the needle.  The worst case
occurs when the distinguishing feature of the needle is at the end, the haystack
consists of repetitions of the needle's prefix, and the needle is either not
present or occurs late in the haystack.  Since each iteration examines the bulk
of the needle, but only advances one byte per iteration, you have quadratic
performance.  Using the Knuth-Morris-Pratt algorithm or Boyer-Moore algorithm
would guarantee worst-case complexity of O(m+n), at the expense of a memory
allocation of O(n).

Gnulib recently converted its memmem implementation to use KMP, along with an
allocation amortization that avoids KMP until after a bounded number of failed
naive iterations, and falling back to the naive algorithm when KMP cannot
allocate enough memory.

http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=69d2edc;hb=fc068cf

It looks like strstr could also benefit from this algorithmic speedup.
Comment 1 Roland McGrath 2007-12-20 19:53:03 UTC
For libc, it is not really feasible to have functions like memmem and strstr
calling malloc.  Such functions have always been simple reentrant code before,
and we can't go introducing locking and so forth there.  An implementation that
needs a varying amount of memory is only OK if the amount required is small
enough to use alloca (we have __libc_use_alloca to provide a size limit test). 
If you would like to supply such an implementation for libc, post the patch to
libc-alpha.
Comment 2 Bruno Haible 2007-12-21 13:50:39 UTC
> it is not really feasible to have functions like memmem and strstr
> calling malloc.

When the call to malloc fails, the proposed memmem code falls back to the
O(n^2) code, so degrades gracefully, still returning the correct result
(just slower).

> Such functions have always been simple reentrant code before,
> and we can't go introducing locking and so forth there.

Can you explain why? The locking is done inside malloc(). The performance
will not suffer in most cases, since malloc() is called only when the
algorithm has hints that it is running on one of the worst-case scenarios.

> post the patch to libc-alpha.

Our experience with submissions via bugzilla has been better than with
posts to libc-alpha.
Comment 3 Jakub Jelinek 2007-12-21 14:00:10 UTC
> Can you explain why?

Because the malloc call makes the function no longer async-signal safe.
While POSIX says nothing about memmem, it is quite desirable to let
the basic mem* and str*/stp* memory/string ops are async-signal safe (of course
strdup is an exception here).  So the code should do:
if (__libc_use_alloca (size_needed_to_allocate))
  {
      ptr = alloca (size_needed_to_allocate);
      // optimized algorithm
    }
  else
    // current O(m*n) search
Comment 4 Bruno Haible 2007-12-26 16:18:21 UTC
Thanks for the explanations. I tried to modify the KMP algorithm to use
only bounded additional space, as permitted by alloca. See attached patches.
But the result is unfortunately only O(n*(m-tm)) instead of O(n*m), where tm
is the number of 'int's that can be stored on the stack, typically something
like tm = 1024. For the really hard cases, where m can easily be 1000000 or
similar, it hardly helps. Therefore I give up on this.
Comment 5 Bruno Haible 2007-12-26 16:19:26 UTC
Created attachment 2161 [details]
Attempted patch for bounded-stack-space solution, part 1
Comment 6 Bruno Haible 2007-12-26 16:20:00 UTC
Created attachment 2162 [details]
Attempted patch for bounded-stack-space solution, part 2
Comment 7 Eric Blake 2008-01-04 23:04:42 UTC
I agree with the analysis that memmem (and strstr) can either be async-safe, or
have linear time complexity, but not both (there is no way around the fact that
string searches are inherently quadratic in space-time; you can either have
O(n^2) time and O(1) space, as is currently the case, or you can have O(n) time
and O(n) space, as in KMP or Boyer-Moore, but you cannot have O(n) time and O(1)
space, and O(n) space is not async-safe).

However, POSIX does not require any of the standardized str* or mem* functions
to be async-safe.  I plan on asking the Austin Group if this is an inadvertent
oversight for some of the obvious functions, like strlen.  But while Jakub is
correct in comment #3 that async-safety is nice, I see nothing that requires us
to preserve async-safety in strstr (and by extrapolation, memmem).  This means
that without feedback from the Austin Group, we are now faced with the
engineering decision of whether the default of async-safety or better time
complexity is more desirable in the default case - after all, poor time
complexity can be used as a denial of service attack.  Whichever default is
chosen, someone will come up with a counter program that is penalized.

It almost makes me wonder if there should be two alternate interfaces, strstr_r
that guarantees async-safety but with potentially poor performance, and
strstr_fast that guarantees linear performance but with potential locking.  Or
is there a way to detect whether we are in the middle of a signal handler, and
make a decision between speed and async-safety accordingly?

Comment 8 Eric Blake 2008-01-05 00:09:41 UTC
I stand corrected - my claim that string searches are inherently quadratic in
space-time appears to be valid for arbitrary alphabets with no further
constraints.  But if [1] is to be believed, then the Two-Way algorithm provides
an algorithm that exploits an additional constraint of random access and a
sorted alphabet to provide an algorithm with the desirable properties of O(n)
time and O(1) space searching.  I'm not sure if strcasestr qualifies as using a
sorted alphabet, but memmem and strstr both do.  Therefore, I'll experiment with
coding the Two-Way algorithm in gnulib, and if it works, post the results here.

[1] http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260 
Comment 9 Eric Blake 2008-01-08 21:50:35 UTC
Gnulib now has a reentrant linear-time constant-space solution for memmem, which
has the added benefit of sublinear speed for large enough needles:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=622a034;hb=9d8d6cd

It also has a testsuite that exposes a couple of cases where the current glibc
takes orders of magnitude longer than the gnulib implementation to come up with
the same answer:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/test-memmem.c

I'm still working on factoring the code to reuse the bulk of the algorithm in
strstr, strcasestr, strncasestr, and wcsstr (although sublinear speed there is
not an option, since the search for the NUL ending the haystack is an
unavoidable linear task), before posting the same code back here as a diff
against the glibc code base.
Comment 10 Eric Blake 2008-03-29 16:43:48 UTC
Created attachment 2339 [details]
fixed-allocation linear memmem/strstr

2008-03-29  Eric Blake	<ebb9@byu.net>

	Rewrite string searches to O(n) rather than O(n^2).
	* string/str-two-way.h: New file, for linear fixed-allocation
	string searching.
	* string/memmem.c: New implementation.
	* string/strstr.c: New implementation.
	* string/strcasestr.c: New implementation.
Comment 11 Ulrich Drepper 2008-04-07 18:28:11 UTC
I don't think you have an assignment for glibc on file.  Until you confirm that
there is such an assignment (maybe specific to this code) I won't even look at
the code.
Comment 12 Eric Blake 2008-04-21 12:06:21 UTC
I wrote this code for gnulib, where I already had assignment on file.  At any
rate, my glibc assignment is now also on file.
Comment 13 Ulrich Drepper 2008-05-15 04:43:09 UTC
I added the code on the trunk.
Comment 14 Paul Pluzhnikov 2010-10-05 17:06:24 UTC
It appears that this implementation (or at least the form in which it currently appears in glibc trunk) is buggy :-(

See http://sourceware.org/bugzilla/show_bug.cgi?id=12092 for repro details.
Comment 15 Eric Blake 2010-10-05 17:11:59 UTC
http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses an assembly optimization.  Can you first isolate whether your claim of a bug is limited to the SSE version, or whether it also affects the straight C code version?
Comment 16 Paul Pluzhnikov 2010-10-05 17:14:16 UTC
(In reply to comment #15)
> http://sourceware.org/bugzilla/show_bug.cgi?id=12092 mentions SSE2, which uses
> an assembly optimization.

The assembly implementation is only for SSE4_2, and is correct.

> Can you first isolate whether your claim of a bug is
> limited to the SSE version, or whether it also affects the straight C code
> version?

The straight C version is the one that's broken (AFAICT).
Comment 17 Eric Blake 2010-10-06 15:52:16 UTC
(In reply to comment #16)
> The assembly implementation is only for SSE4_2, and is correct.

No, it isn't.  The assembly implementation re-introduced the quadratic behavior that existed pre-glibc 2.9, rather than keeping the linear behavior of the C implementation (but only applies to strstr() and strcasestr(), and not memmem()).

What REALLY needs to happen is a hybrid approach that uses assembly for short needles, but C for long needles, since the cost of factoring the needle is proportionately worse for short needles, and assembly can probably perform brute force searching on short needles more efficiently than C code.  Quadratic scaling of brute force doesn't matter if you cap the maximum needle length where you use brute force.

Also, the (non-quadratic) SSE4.2 assembly approach needs to be extended to cover memmem().

> 
> The straight C version is the one that's broken (AFAICT).

The fix for the C code has been posted to
 http://sourceware.org/bugzilla/show_bug.cgi?id=12092
Comment 18 Jackie Rosen 2014-02-16 19:43:52 UTC Comment hidden (spam)