PATCH] Improve performance of memmem

Corinna Vinschen vinschen@redhat.com
Wed Dec 19 12:16:00 GMT 2018


On Dec 18 14:32, Eric Blake wrote:
> On 12/18/18 2:23 PM, Wilco Dijkstra wrote:
> > > ...this is the naive quadratic implementation.  But it does certainly
> > > qualify as the algorithm to use for someone wanting a compilation as
> > > small as possible, and who doesn't care about speed (most likely because
> > > their needles are unlikely to be large enough to notice the effects of
> > > quadratic slowdowns).
> > 
> > However the interesting thing is that this naive implementation is actually
> > faster than the existing Two-Way implementation on all typical inputs: it is
> > > 50% faster for any needle up to 32 bytes on any haystack size. So it
> > would have been better if memmem used the naive implementation for
> > small needles. The quadratic behaviour only becomes problematic when
> > needles are like 4KBytes.
> 
> Indeed - when I first submitted a non-quadratic strstr/memmem, I was so
> caught up on benchmarking the large outlier cases for linear behavior that I
> forgot to benchmark the overhead of the more common short cases. Algorithmic
> complexity doesn't overcome sheer overhead of setup, so I have been glad to
> watch over time as various consumers of my original Two-Way code have added
> heuristics to avoid it for short needles.
> 
> 
> > > > +  memset (shift, 0, sizeof (shift));
> > > 
> > > I don't know if we have a preference for 'sizeof shift' rather than
> > > 'sizeof (shift)'.
> > 
> > Of the 1677 occurences of sizeof in newlib, 984 are "sizeof (x)", 539 are
> > "sizeof(x)" (missing space) and 74 are sizeof x, so I guess parenthesis is
> > the preferred spelling.
> 
> There's also the fact that:
> 
> sizeof (int)
> 
> has to use parenthesis; grepping for the use of () is tricky if you don't
> filter out the uses that are for types rather than for expressions (but I
> _do_ prefer sizeof expression over sizeof (type), which is why I prefer
> avoiding the () to make it easier to use grep to tell the two apart).

Not a fixed rule since, afaik, BSD code always uses parens in sizeof
expressions.  Apart from that I think it's a good idea to use

  sizeof (type) 

vs.

  sizeof var

but again, there's no strong rule.


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/newlib/attachments/20181219/ca463378/attachment.sig>


More information about the Newlib mailing list