PATCH] Improve performance of memmem
Eric Blake
eblake@redhat.com
Tue Dec 18 21:08:00 GMT 2018
On 12/18/18 2:23 PM, Wilco Dijkstra wrote:
>> ...this is the naive quadratic implementation. But it does certainly
>> qualify as the algorithm to use for someone wanting a compilation as
>> small as possible, and who doesn't care about speed (most likely because
>> their needles are unlikely to be large enough to notice the effects of
>> quadratic slowdowns).
>
> However the interesting thing is that this naive implementation is actually
> faster than the existing Two-Way implementation on all typical inputs: it is
>> 50% faster for any needle up to 32 bytes on any haystack size. So it
> would have been better if memmem used the naive implementation for
> small needles. The quadratic behaviour only becomes problematic when
> needles are like 4KBytes.
Indeed - when I first submitted a non-quadratic strstr/memmem, I was so
caught up on benchmarking the large outlier cases for linear behavior
that I forgot to benchmark the overhead of the more common short cases.
Algorithmic complexity doesn't overcome sheer overhead of setup, so I
have been glad to watch over time as various consumers of my original
Two-Way code have added heuristics to avoid it for short needles.
>>> +Â memset (shift, 0, sizeof (shift));
>>
>> I don't know if we have a preference for 'sizeof shift' rather than
>> 'sizeof (shift)'.
>
> Of the 1677 occurences of sizeof in newlib, 984 are "sizeof (x)", 539 are
> "sizeof(x)" (missing space) and 74 are sizeof x, so I guess parenthesis is
> the preferred spelling.
There's also the fact that:
sizeof (int)
has to use parenthesis; grepping for the use of () is tricky if you
don't filter out the uses that are for types rather than for expressions
(but I _do_ prefer sizeof expression over sizeof (type), which is why I
prefer avoiding the () to make it easier to use grep to tell the two apart).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
More information about the Newlib
mailing list