This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Testing 2.16 release candidate with gnulib - quadratic behaviour detected in SSE42 strstr?
- From: Bruno Haible <bruno at clisp dot org>
- To: Carlos O'Donell <carlos_odonell at mentor dot com>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, Eric Blake <eblake at redhat dot com>
- Date: Thu, 28 Jun 2012 13:19 +0200
- Subject: Re: Testing 2.16 release candidate with gnulib - quadratic behaviour detected in SSE42 strstr?
- Bcc: bruno at haible dot de
- References: <4FEBDA04.6010709@mentor.com>
Hi Carlos,
> (d) strstr
>
> This is the one that worries me.
>
> configure:121073: checking whether strstr works in linear time
> configure:121161: gcc -lm -Wl,-rpath=/scratch/carloso/build4-lucid-cs/install//lib64:/scratch/carloso/build4-lucid-cs/install//usr/lib64 -Wl,--dynamic-linker=/scratch/carloso/build4-lucid-cs/install//lib64/ld-linux-x86-64.so.2 -std=gnu99 -o conftest -g -O2 -nostdinc -I/usr/local/tools/gcc-4.3.3/lib/gcc/x86_64-unknown-linux-gnu/4.3.2/include-fixed -I/scratch/carloso/build4-lucid-cs/install//usr/include -I/usr/local/tools/gcc-4.3.3/lib/gcc/x86_64-unknown-linux-gnu/4.3.2/include -Wall conftest.c >&5
> configure:121165: $? = 0
> configure:121171: ./conftest
> configure:121175: $? = 142
> configure: program exited with status 142
> configure: failed program was:
> | /* confdefs.h. */
> ...
> | /* end confdefs.h. */
> |
> | #include <signal.h> /* for signal */
> | #include <string.h> /* for strstr */
> | #include <stdlib.h> /* for malloc */
> | #include <unistd.h> /* for alarm */
> | static void quit (int sig) { exit (sig + 128); }
> |
> | int
> | main ()
> | {
> |
> | int result = 0;
> | size_t m = 1000000;
> | char *haystack = (char *) malloc (2 * m + 2);
> | char *needle = (char *) malloc (m + 2);
> | /* Failure to compile this test due to missing alarm is okay,
> | since all such platforms (mingw) also have quadratic strstr. */
> | signal (SIGALRM, quit);
> | alarm (5);
> | /* Check for quadratic performance. */
> | if (haystack && needle)
> | {
> | memset (haystack, 'A', 2 * m);
> | haystack[2 * m] = 'B';
> | haystack[2 * m + 1] = 0;
> | memset (needle, 'A', m);
> | needle[m] = 'B';
> | needle[m + 1] = 0;
> | if (!strstr (haystack, needle))
> | result |= 1;
> | }
> | return result;
> |
> | ;
> | return 0;
> | }
> configure:121193: result: no
> ...
> user 4m21.060s
Yes, if an strstr implementation takes more than 4 minutes to search for
a 100000 byte long string in a 2000000 byte long string, it's quadratic
behaviour.
The O(n) worst-case algorithm with O(1) intermediate storage was introduced
in string/strstr.c in 2008. But sysdeps/x86_64/multiarch/strstr.c was added
in 2009; it looks like it intends to use Knuth-Morris-Pratt only on small
substrings (the original Knuth-Morris-Pratt is O(n) worst-case and uses
O(n) intermediate storage) and therefore is O(n²).
Bruno