This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Improve performance of strstr


On Mon, Dec 31, 2018 at 09:01:56PM +0000, Wilco Dijkstra wrote:
> This patch significantly improves performance of strstr compared to the
> previous version [1] using a novel modified Horspool algorithm.  Needles up
> to size 256 use a bad-character table indexed by hashed pairs of characters
> to quickly skip past mismatches. Long needles use a self-adapting filtering
> step to avoid comparing the whole needle repeatedly.
> 
> By limiting the needle length to 256, the shift table only requires 8 bits
> per entry, lowering preprocessing overhead and minimizing cache effects.
> This limit also implies worst-case performance is linear.
> 
> Small needles up to size 3 use a dedicated linear search.  Very long needles
> use the Two-Way algorithm.
> 
> The performance gain using the improved bench-strstr [2] on Cortex-A72 is 5..8
> times basic_strstr and 3.7 times twoway_strstr.
> 
> Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
> (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
> 
> OK for commit?

This is a major performance regression and should not be committed.
The test case does not cover anything near the worst cases. With a
limit more like 16 bytes it might be reasonable but 256 is plenty
large to see the quadratic effects (64k is a big number).

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]