[PATCH] optimize memmem for short needles

Wilco Dijkstra Wilco.Dijkstra@arm.com
Wed Mar 6 19:10:36 GMT 2024


Hi James,

> The current implementation does not check for an early match with memchr
> before initializing the shift table because large shifts may be faster
> than memchr. However, for short shifts, memchr may be faster.
> find_edge_in_needle (taken from strstr_avx512) is used to find a rarer
> character for memchr to find.

It looks like it is faster on this particular benchmark - however I'm not convinced
doing this is faster in general. I tried the change on the SMART benchmark [1][2],
and there it is generally slower. A few cases show quite large differences:

rand2, len 2: 58% slower (len 3-16 also slower but by less)
rand4, len 2: 43% slower (len 3-16 also slower but by less)
rand8, len 2: 16% slower (len 3-16 slower but close)
chineseTexts, len 2: 8% faster
genome, len 2: 44% slower (len 3 slower but close)

Overall there is only one clear gain on Chinese but there are large slowdowns
with short needles in texts with a small number of symbols.

Cheers,
Wilco

[1] https://www.dmi.unict.it/faro/smart/howto.php
[2] https://github.com/smart-tool/smart.git


More information about the Libc-alpha mailing list