This is the mail archive of the
mailing list for the glibc project.
RE: [PATCH 4/*] Generic string memchr and strnlen
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'OndÅej BÃlka' <neleai at seznam dot cz>
- Cc: "'GNU C Library'" <libc-alpha at sourceware dot org>
- Date: Mon, 27 Jul 2015 14:54:21 +0100
- Subject: RE: [PATCH 4/*] Generic string memchr and strnlen
- Authentication-results: sourceware.org; auth=none
- References: <003b01d0c622$dd96f530$98c4df90$ at com> <20150724153758 dot GA855 at domone> <003c01d0c62f$3440aa00$9cc1fe00$ at com> <20150724170425 dot GA10041 at domone> <003d01d0c63b$0080b730$01822590$ at com> <20150724191248 dot GA2889 at domone>
> OndÅej BÃlka wrote:
> On Fri, Jul 24, 2015 at 07:03:10PM +0100, Wilco Dijkstra wrote:
> > > OndÅej BÃlka wrote:
> > > On Fri, Jul 24, 2015 at 05:38:43PM +0100, Wilco Dijkstra wrote:
> > > > > OndÅej BÃlka wrote:
> > > > > On Fri, Jul 24, 2015 at 04:10:24PM +0100, Wilco Dijkstra wrote:
> > Which alternatives? I didn't see a mention of an alternative that would
> > actually be faster.
> As I wrote before for example derive strnlen from memchr assembly.
That's not a realistic alternative at all for most targets. I am talking
about a generic C solution that applies to all targets, similar to what I
did for strcpy, strcat, mempcpy calling optimized strlen and memcpy.
> > I'd find it hard to believe you can beat assembly implementations. Do you
> > have any performance results for your patches? There were a lot of patches
> > posted but I don't recall any performance results in any.
> That isn't hard to believe, actually its quite easy. If current assembly
> isn't particulary good then c implementation will beat assembly.
That's extremely unlikely. The generic C implementations that we haven't
fixed already are quite inefficient, while the assembly implementations I've
looked at were all efficient.
> And as I read assembly it isn't particulary well optimized for most
> architectures. In most cases code looks like you would take current
> generic implementation and ran gcc -S. For example most assemlbly
> implementations don't do loop unrolling.
Loop unrolling doesn't always increase performance.
> As for performance results I asked maintainers to run them but didn't
> get response. I don't have access to exotic architectures so I couldn't
> provide it.
All we need is results for a few popular architectures. Without any result
I am very sceptical that it beats any assembly implementation.
> > That's only possible in a few cases. I'm talking about missing optimized
> > implementations. Are you saying we should continue to use slow C code rather
> > than trying to call an optimized assembler function?
> What cases are you talking about. All of strnlen, strlen, rawmemchr
> could be obtained from memchr assembly by doing dead code elimination of
> unused code.
A case where it may be possible is strchrnul vs strchr for example where it is
just a slightly different return value. Other cases are non-trivial and if you
don't spend a significant effort on optimizing each function you end up with
We have lots of targets which are missing optimized functions, but which do have
some related ones that can be called from generic code instead. Clearly doing
that is a far more practical and realistic approach than talking about how easy
it is to do dead-code elimination in assembly code when nobody does that.
Or are you planning to do the dead-code elimination yourself? If not, we need
to fix the generic code.
> > > > > Suggestion to express strlen as memchr would just cause regression. On
> > > > > my system there happened 9535682 calls of strlen while memchr was called
> > > > > just 11633 times and rawmemchr 1742 times.
> > > >
> > > > Why would it cause a regression? If you don't have an optimized strlen,
> > > > what other implementation would be the fastest alternative?
> > > >
> > > It would be my generic strlen implementation. If you don't have
> > > optimized strlen then you certainly don't have optimized memchr that is
> > > called 819 times less often.
> > Well I'd like to see results that show a C version of strlen beating an
> > optimized memchr on x64. Still it seems to me there is no real need for an
> > optimized C version of strlen - every target already provides an optimized
> > version and it is hard to believe it is possible to beat those.
> Thats trivial, I attached what I have. When I compile it I get
Please reread the subject, we're talking about generic C code...
> And for strlen being available it was exactly my point as its more important to
> have assembly strlen than using memchr/rawmechr so your case wont happen.
Strlen is implemented in most targets so that is not an issue indeed, however
there are quite a few targets which do have memchr but no strnlen, so that
scenario is certainly common.
> > That's only true for few cases. Note given its rarity, it seems better
> > to change any call to rawmemchr into memchr(s, c, SIZE_MAX) - the gain
> > due to cache sharing should far outweigh the small loss due to the extra
> > length checks.
> Thats also false as saving is much more than few cycles. Also that it
> doesn't follow that if function has low number of total calls it would
> be cold, just that there are few applications that use them. And in
> application that uses them it could use it only in one tight loop. Here
> rawmemchr is used only in mutt while strlen is used everywhere.
> A performance difference could be quite big in x64, from benchtests;
It looks like that implementation isn't very optimal then. Decrementing a
counter every 64 bytes is not a lot of overhead.