PATCH] Improve performance of memmem

Eric Blake eblake@redhat.com
Tue Dec 18 21:08:00 GMT 2018


On 12/18/18 2:23 PM, Wilco Dijkstra wrote:
>> ...this is the naive quadratic implementation.  But it does certainly
>> qualify as the algorithm to use for someone wanting a compilation as
>> small as possible, and who doesn't care about speed (most likely because
>> their needles are unlikely to be large enough to notice the effects of
>> quadratic slowdowns).
> 
> However the interesting thing is that this naive implementation is actually
> faster than the existing Two-Way implementation on all typical inputs: it is
>> 50% faster for any needle up to 32 bytes on any haystack size. So it
> would have been better if memmem used the naive implementation for
> small needles. The quadratic behaviour only becomes problematic when
> needles are like 4KBytes.

Indeed - when I first submitted a non-quadratic strstr/memmem, I was so 
caught up on benchmarking the large outlier cases for linear behavior 
that I forgot to benchmark the overhead of the more common short cases. 
Algorithmic complexity doesn't overcome sheer overhead of setup, so I 
have been glad to watch over time as various consumers of my original 
Two-Way code have added heuristics to avoid it for short needles.


>>> +  memset (shift, 0, sizeof (shift));
>>
>> I don't know if we have a preference for 'sizeof shift' rather than
>> 'sizeof (shift)'.
> 
> Of the 1677 occurences of sizeof in newlib, 984 are "sizeof (x)", 539 are
> "sizeof(x)" (missing space) and 74 are sizeof x, so I guess parenthesis is
> the preferred spelling.

There's also the fact that:

sizeof (int)

has to use parenthesis; grepping for the use of () is tricky if you 
don't filter out the uses that are for types rather than for expressions 
(but I _do_ prefer sizeof expression over sizeof (type), which is why I 
prefer avoiding the () to make it easier to use grep to tell the two apart).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



More information about the Newlib mailing list