This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH] Improve performance of strncpy
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Rich Felker'" <dalias at libc dot org>, "Florian Weimer" <fweimer at redhat dot com>
- Cc: <azanella at linux dot vnet dot ibm dot com>, <libc-alpha at sourceware dot org>
- Date: Wed, 10 Sep 2014 19:25:21 +0100
- Subject: RE: [PATCH] Improve performance of strncpy
- Authentication-results: sourceware.org; auth=none
- References: <001301cfcd0a$f0b62670$d2227350$ at com> <54108BB0 dot 90902 at redhat dot com> <20140910180144 dot GK23797 at brightrain dot aerifal dot cx>
> Rich Felker wrote:
> On Wed, Sep 10, 2014 at 07:34:40PM +0200, Florian Weimer wrote:
> > On 09/10/2014 05:21 PM, Wilco Dijkstra wrote:
> > >Yes, you're right, I timed it and there is actually little difference, while
> > >the code is now even simpler. New version below (not attaching results in bad
> > >characters due to various mail servers changing line endings).
> > >
> > >OK for commit?
> > I think you could simplify it down to strnlen, memcpy, and memset.
> I don't think that's an improvement, at least not in the general case.
> It involves iterating twice over the source string, which for long
> strings could mean blowing the whole cache twice and fetching from
> main memory twice. There's a good reason that string operations are
> usually implemented to perform the copy and length computation
> together in a single pass.
Few strings will be larger than the typical L1 cache size of 32KB. You're
right that it is best to do a single pass in a highly optimized implementation.
However, the issue is that the C versions are so slow that even doing 2
passes will be significantly faster due to processing 8 bytes at a time,
likely even for strings much larger than L1 (I'll check that).
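For illustration, the two-pass form suggested above (strnlen, memcpy, memset) could look roughly like the sketch below. This is not the patch under review: the name strncpy_two_pass is hypothetical, and the sketch assumes strnlen is available (it is POSIX.1-2008, not ISO C) and that the source and destination do not overlap. Any speedup comes from whatever word-at-a-time or optimized strnlen, memcpy and memset the target provides.

```c
#define _POSIX_C_SOURCE 200809L   /* request the strnlen declaration */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not glibc code: strncpy semantics built from
   three library calls. Copies at most n bytes from src to dst and
   zero-fills the rest of dst when src is shorter than n. As with
   strncpy, dst is NOT null-terminated when src has n or more bytes. */
char *strncpy_two_pass(char *dst, const char *src, size_t n)
{
    /* Pass 1: scan src for the terminator, looking at no more than n bytes. */
    size_t len = strnlen(src, n);

    /* Pass 2: copy the len bytes that were just scanned. */
    memcpy(dst, src, len);

    /* Pad the remaining n - len bytes with zeros (no-op when len == n). */
    memset(dst + len, 0, n - len);

    return dst;
}
```

The source is read twice (once by strnlen, once by memcpy), which is the cost raised in the quoted objection. For strings that fit in L1, the second read hits in cache.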
The goal of these patches is to ensure the C string routines are quite
competitive out of the box, and benefit further when you add a few highly
optimized routines (e.g. strlen/strcpy). That means new targets are not
forced to add optimized versions of all of the string routines in order to
get decent performance (as unfortunately is the case today).