This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: strstr()

From: Tor Myklebust <tmyklebu at caffeine dot csclub dot uwaterloo dot ca>
To: Petr Baudis <pasky at suse dot cz>
Cc: libc-alpha at sources dot redhat dot com
Date: Mon, 17 Jul 2006 23:59:30 -0400 (EDT)
Subject: Re: strstr()
References: <Pine.LNX.4.64.0607082011580.17525@caffeine.csclub.uwaterloo.ca> <20060714120416.GA12931@pasky.or.cz>

On Fri, 14 Jul 2006, Petr Baudis wrote:

Dear diary, on Sun, Jul 09, 2006 at 02:38:56AM CEST, I got a letter
where Tor Myklebust <tmyklebu@caffeine.csclub.uwaterloo.ca> said that...

I've attached a strstr() implementation that uses the present code on
short patterns but switches over to a linear-time, constant-space
algorithm on longer patterns.  It was made against the glibc-2.3.2 package
from Debian sarge.  I've tested this patch extensively on both an i386
machine and an AMD64 machine, and have found no problems.  (It is, of
course, possible that I have overlooked all of its bugs.)


 did you do any benchmarks? It would be good to first see how does it
actually affect performance for various haystack-needle length
configurations. It seems that the performance impact on short haystack +
short needle would be non-negligible and it's more sensible to optimize
for that case; applications requiring large searches are more likely to
have own fast string search implementations.

What do you have in mind for "short"? For needles of length less than eight characters, the old strstr() is used; the performance impact here should be negligible. So, there should be very little difference in performance for truly short needles.

Yes, I have benchmarked my code against the present code. My code fares slightly worse on random cases (as should be expected), but is a big win in the pathological cases. Here is the code I used:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <time.h>

char  a[10000001],
      b[ 5000001];

int main() {
  int hl, nl;
  int j;
  printf("random caees:\n");
  for (hl = 20; hl <= 100; hl += 20)
    for (nl = 10; nl <= 50; nl += 10) {
      int i;
      int ticks = 0;
      for (j = 0; j < 1000; j++) {
        for (i = 0; i < hl; i++)
          a[i] = random() % 2 + 'a';
        for (i = 0; i < nl; i++)
          b[i] = random() % 2 + 'a';
        b[nl] = a[hl] = 0;
        char *foo = strstr(a,b);
        clock_t time0 = clock();
        for (i = 0; i < 22222; i++)
          assert(strstr(a,b) == foo);
        clock_t time1 = clock();
        ticks += time1 - time0;
      }
      printf("hl: %3d; nl: %3d; %.2lf\n", hl, nl,
             ticks / (double)CLOCKS_PER_SEC);
      fflush(stdout);
    }
  printf("bad case for svdb:\n");
  for (hl = 20; hl <= 100; hl += 20)
    for (nl = 10; nl <= 50; nl += 10) {
      int i;
      for (i = 0; i < hl; i++)
        a[i] = 'a';
      for (i = 0; i < nl; i++)
        b[i] = 'a';
      b[nl-1] = 'b';
      b[nl] = a[hl] = 0;
      char *foo = strstr(a,b);
      clock_t time0 = clock();
      for (i = 0; i < 2222222; i++)
        assert(strstr(a,b) == foo);
      clock_t time1 = clock();
      printf("hl: %3d; nl: %3d; %.2lf\n", hl, nl,
             (time1 - time0) / (double)CLOCKS_PER_SEC);
      fflush(stdout);
    }
}

Briefly, for each haystack length in {20, 40, 60, 80, 100} and each needle length in {10, 20, 30, 40, 50}, we generate 1000 random (needle, haystack) pairs of the appropriate lengths, call strstr() 22222 times on each pair, add up the time taken for all of these, and report it. Then, for each haystack length and each needle length, we generate a haystack of all a's and a needle of many a's followed by a b, then call strstr() 2222222 times.

I ran this code on an "AMD Athlon(tm) 64 X2 Dual Core Processor 4200+" against my strstr() and the old strstr(), then used the UNIX 'paste' command, yielding the following output (my strstr() on the left, and the old strstr() on the right):

random caees:	random caees:
hl:  20; nl:  10; 2.37	hl:  20; nl:  10; 1.70
hl:  20; nl:  20; 1.52	hl:  20; nl:  20; 1.72
hl:  20; nl:  30; 1.53	hl:  20; nl:  30; 1.72
hl:  20; nl:  40; 1.54	hl:  20; nl:  40; 1.74
hl:  20; nl:  50; 1.49	hl:  20; nl:  50; 1.73
hl:  40; nl:  10; 4.76	hl:  40; nl:  10; 3.93
hl:  40; nl:  20; 5.14	hl:  40; nl:  20; 4.02
hl:  40; nl:  30; 3.50	hl:  40; nl:  30; 4.00
hl:  40; nl:  40; 3.32	hl:  40; nl:  40; 4.02
hl:  40; nl:  50; 3.67	hl:  40; nl:  50; 3.98
hl:  60; nl:  10; 7.52	hl:  60; nl:  10; 6.40
hl:  60; nl:  20; 8.47	hl:  60; nl:  20; 6.65
hl:  60; nl:  30; 8.73	hl:  60; nl:  30; 6.66
hl:  60; nl:  40; 7.04	hl:  60; nl:  40; 6.67
hl:  60; nl:  50; 7.05	hl:  60; nl:  50; 6.65
hl:  80; nl:  10; 10.54	hl:  80; nl:  10; 8.92
hl:  80; nl:  20; 11.40	hl:  80; nl:  20; 9.43
hl:  80; nl:  30; 12.50	hl:  80; nl:  30; 9.48
hl:  80; nl:  40; 12.57	hl:  80; nl:  40; 9.49
hl:  80; nl:  50; 9.78	hl:  80; nl:  50; 9.45
hl: 100; nl:  10; 13.82	hl: 100; nl:  10; 11.71
hl: 100; nl:  20; 14.99	hl: 100; nl:  20; 12.33
hl: 100; nl:  30; 15.70	hl: 100; nl:  30; 12.34
hl: 100; nl:  40; 16.65	hl: 100; nl:  40; 12.38
hl: 100; nl:  50; 15.76	hl: 100; nl:  50; 12.39
bad case for svdb:	bad case for svdb:
hl:  20; nl:  10; 0.16	hl:  20; nl:  10; 0.66
hl:  20; nl:  20; 0.15	hl:  20; nl:  20; 1.17
hl:  20; nl:  30; 0.16	hl:  20; nl:  30; 1.15
hl:  20; nl:  40; 0.15	hl:  20; nl:  40; 1.16
hl:  20; nl:  50; 0.16	hl:  20; nl:  50; 1.16
hl:  40; nl:  10; 0.24	hl:  40; nl:  10; 1.30
hl:  40; nl:  20; 0.24	hl:  40; nl:  20; 2.90
hl:  40; nl:  30; 0.26	hl:  40; nl:  30; 3.48
hl:  40; nl:  40; 0.25	hl:  40; nl:  40; 3.70
hl:  40; nl:  50; 0.26	hl:  40; nl:  50; 3.74
hl:  60; nl:  10; 0.32	hl:  60; nl:  10; 1.97
hl:  60; nl:  20; 0.33	hl:  60; nl:  20; 4.62
hl:  60; nl:  30; 0.34	hl:  60; nl:  30; 5.92
hl:  60; nl:  40; 0.34	hl:  60; nl:  40; 6.88
hl:  60; nl:  50; 0.36	hl:  60; nl:  50; 7.48
hl:  80; nl:  10; 0.40	hl:  80; nl:  10; 2.65
hl:  80; nl:  20; 0.42	hl:  80; nl:  20; 6.37
hl:  80; nl:  30; 0.42	hl:  80; nl:  30; 8.35
hl:  80; nl:  40; 0.42	hl:  80; nl:  40; 9.97
hl:  80; nl:  50; 0.44	hl:  80; nl:  50; 11.37
hl: 100; nl:  10; 0.49	hl: 100; nl:  10; 3.31
hl: 100; nl:  20; 0.49	hl: 100; nl:  20; 8.10
hl: 100; nl:  30; 0.39	hl: 100; nl:  30; 10.84
hl: 100; nl:  40; 0.41	hl: 100; nl:  40; 13.11
hl: 100; nl:  50; 0.44	hl: 100; nl:  50; 15.27

As for your remark about it being more sensible to optimise for small needle/haystack combinations, I'm not sure this is the case. A cursory grep through the source code of several software packages (gcc, glibc, perl, vim) reveals that strstr() does not appear to be called repeatedly on short needles and haystacks very often. Do you know of any packages that do call strstr() often with small arguments?

I'm sure that most mature applications that actually need to do string matching on large strings will have an efficient strstr() implementation, yes; given that the old strstr() does not run efficiently on large strings, using a different strstr() would be a necessity. People do still write code, though, and the presence of an efficient strstr() in the standard C library ought to make this easier --- it seems rather wasteful of programmer time for every application to reinvent strstr() simply because the standard C library's strstr() is too slow.

I do not, by any means, claim that this is the "best" implementation of strstr() in C. It is, however, considerably more efficient (in the worst case) than the old code while taking only a modest performance hit in the "average" case.

Tor Myklebust

References:
- strstr()
  - From: Tor Myklebust
- Re: strstr()
  - From: Petr Baudis

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]