This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH v2] Improve performance of strstr

From: Zack Weinberg <zackw at panix dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: Rich Felker <dalias at libc dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, GNU C Library <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
Date: Mon, 15 Apr 2019 16:15:02 -0400
Subject: Re: [PATCH v2] Improve performance of strstr
References: <DB5PR08MB1030CDDFA10DCA6E414A844683B20@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB1030F31182F495BA589BAF9A83920@DB5PR08MB1030.eurprd08.prod.outlook.com> <DB5PR08MB1030487C1795F6E4E018BA5383730@DB5PR08MB1030.eurprd08.prod.outlook.com> <AM6PR08MB50781FE6FB08D76A2035032583470@AM6PR08MB5078.eurprd08.prod.outlook.com> <AM6PR08MB507809D6436B43083AF8306883280@AM6PR08MB5078.eurprd08.prod.outlook.com> <2b27b4ca-0370-442d-9f39-210265f00444@arm.com> <AM6PR08MB507895A65E1F0EE5780D39C983280@AM6PR08MB5078.eurprd08.prod.outlook.com> <20190412171613.GB23599@brightrain.aerifal.cx> <AM6PR08MB50783C3A091B45F16965261E832B0@AM6PR08MB5078.eurprd08.prod.outlook.com> <20190415144051.GE23599@brightrain.aerifal.cx> <AM6PR08MB5078A8FE8466F406C3BCD11B832B0@AM6PR08MB5078.eurprd08.prod.outlook.com>

On Mon, Apr 15, 2019 at 2:02 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi Rich,
...
> >> Yes, without a reproducible example I can't see what your issue is. You
> >> can't make it go quadratic because it simply isn't.
> >
> > Obviously it's not unbounded because you have a (very large) bound on
> > the size, 256. I can make it do a 256-byte strcmp for nearly every
> > byte of the input haystack. Maybe because of vectorization on some
> > targets that's only 16x slower than the current code rather than 256x
> > slower, but it's still a lot slower.
>
> No you can't. It's impossible to make it do a full 256 byte memcmp every
> character. And bad cases are not bad because of the time spent comparing
> strings - they are bad because of mispredicted branches. So it's not possible
> to compare bad cases without benchmarking actual examples on modern
> CPUs.

This discussion has been going in circles for quite some time now.

Wilco, Rich, I think it would help a lot if you could BOTH write down
some example needle and haystack pairs that you believe will
demonstrate significantly improved performance with your preferred
algorithm, and/or pathologically slow performance with your
dispreferred algorithm.  Even without specific numbers, that will give
everyone something concrete to argue over, at least.
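
To make such pairs easy to exchange and re-run, a small harness along
these lines might be enough.  This is only a sketch: make_string and
time_strstr are hypothetical helper names, not anything in the patch or
in glibc, and the pair suggested in the note after the code is a
placeholder, not a claim that it is a bad case for either algorithm.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build a NUL-terminated string of LEN bytes, all equal to FILL, except
   that the byte at index POS is set to ODD when POS < LEN.  Returns NULL
   on allocation failure; the caller frees the result.  */
static char *
make_string (size_t len, char fill, size_t pos, char odd)
{
  char *s = malloc (len + 1);
  if (s == NULL)
    return NULL;
  memset (s, fill, len);
  if (pos < len)
    s[pos] = odd;
  s[len] = '\0';
  return s;
}

/* Call strstr (HAYSTACK, NEEDLE) ITERS times and return the mean CPU
   time per call in nanoseconds, as measured by clock ().  The result of
   the last call is stored in *FOUND.  The call goes through a volatile
   function pointer so the compiler cannot hoist it out of the loop.  */
static double
time_strstr (const char *haystack, const char *needle, int iters,
             const char **found)
{
  char *(*volatile search) (const char *, const char *) = strstr;
  const char *r = NULL;

  assert (iters > 0);
  clock_t c0 = clock ();
  for (int i = 0; i < iters; i++)
    r = search (haystack, needle);
  clock_t c1 = clock ();

  *found = r;
  return (double) (c1 - c0) / CLOCKS_PER_SEC * 1e9 / iters;
}
```

One placeholder pair would be make_string (1 << 20, 'a', 1 << 20, 'a')
as the haystack and make_string (256, 'a', 255, 'b') as the needle,
timed with time_strstr under each implementation on the same machine.
Absolute numbers from clock () are coarse, so it is the ratio between
implementations for the same pair that is worth comparing.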

zw


