This is the mail archive of the
libc-alpha at sourceware dot org mailing list for the glibc project.
Re: [PATCH] Improve performance of strncpy
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: 'Rich Felker' <dalias at libc dot org>, Florian Weimer <fweimer at redhat dot com>, azanella at linux dot vnet dot ibm dot com, libc-alpha at sourceware dot org
- Date: Fri, 12 Sep 2014 08:22:03 +0200
- Subject: Re: [PATCH] Improve performance of strncpy
- Authentication-results: sourceware.org; auth=none
- References: <001301cfcd0a$f0b62670$d2227350$ at com> <54108BB0 dot 90902 at redhat dot com> <20140910180144 dot GK23797 at brightrain dot aerifal dot cx> <002501cfcdf7$cc046510$640d2f30$ at com>
On Thu, Sep 11, 2014 at 08:37:17PM +0100, Wilco Dijkstra wrote:
> > Rich Felker wrote:
> > On Wed, Sep 10, 2014 at 07:34:40PM +0200, Florian Weimer wrote:
> > > On 09/10/2014 05:21 PM, Wilco Dijkstra wrote:
> > > >Yes, you're right, I timed it and there is actually little difference, while
> > > >the code is now even simpler. New version below (not attaching results in bad
> > > >characters due to various mail servers changing line endings).
> > > >
> > > >OK for commit?
> > >
> > > I think you could simplify it down to strnlen, memcpy, and memset.
> > I don't think that's an improvement, at least not in the general case.
> > It involves iterating twice over the source string, which for long
> > strings could mean blowing the whole cache twice and fetching from
> > main memory twice. There's a good reason that string operations are
> > usually implemented to perform the copy and length computation
> > together in a single pass.
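Florian's suggestion above, which Rich objects to, can be sketched as follows. This is only an illustration of the idea, not the actual glibc implementation, and the function name is made up for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Sketch of strncpy built from strnlen, memcpy and memset,
   as suggested by Florian.  strnlen finds the length of src
   capped at n, so the two-pass structure Rich mentions is the
   strnlen pass followed by the memcpy pass. */
char *
strncpy_via_memops (char *dest, const char *src, size_t n)
{
  size_t len = strnlen (src, n);        /* length of src, at most n */
  memcpy (dest, src, len);              /* copy the string bytes */
  memset (dest + len, '\0', n - len);   /* zero-pad the remainder */
  return dest;
}
```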
No, that is what you think, but you never tested it. The problem is that in
strcpy you spend most of the time in calls that are at most 256 bytes large.
And if you spend 99% of the time in some path, slowing it down by 2% to
make a case where you spend 1% of the time faster is a bad idea, even if you
could make that 1% faster for free.
Give me the name of a linux package that regularly does strcpy of 32k+ strings;
until then you should not focus on a case that does not happen.
Anyway, we are talking about strncpy here, and anybody who uses it does not
care about performance; it is terrible by design. Writing a 64-byte string and
then doing useless zeroing of the remaining 4032 bytes speaks for itself.
And as its main usage is for fixed-size buffers, and these buffers are
typically less than 32k, it is even more dubious to optimize for large sizes.
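The zero-padding cost described above is mandated by strncpy's standard semantics. A small sketch to make it concrete; `strncpy_padding` is a hypothetical helper name used only for this illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Illustration of the cost pointed out above: strncpy must
   zero-fill every byte of dest past the end of the string.  For a
   64-character string in a 4096-byte buffer, that is 4032 bytes of
   padding the caller may never read.  Returns the number of bytes
   spent on padding. */
size_t
strncpy_padding (char *dest, const char *src, size_t n)
{
  strncpy (dest, src, n);          /* copy plus mandatory zero pad */
  return n - strnlen (src, n);     /* bytes written as padding */
}
```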
> I did a quick experiment with strcpy as it's simpler. Replacing it
> with memcpy (d, s, strlen (s) + 1) is 3 times faster even on strings
> of 16Mbytes! Perhaps more surprisingly, it has similar performance on
> these huge strings as an optimized strcpy.
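The replacement Wilco describes in his experiment is just the following (the function name is invented for the example; the point is the single memcpy covering the terminating NUL):

```c
#include <string.h>

/* The two-pass replacement Wilco benchmarked: strlen computes the
   length first, then one memcpy copies the string including its
   terminating NUL.  This traverses the source twice, but lets
   memcpy use its tuned large-block copy path. */
char *
strcpy_via_memcpy (char *dest, const char *src)
{
  return memcpy (dest, src, strlen (src) + 1);
}
```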
What architecture? This could also happen because memcpy has a special
case for handling large sizes that speeds this up. That is something I
tried in a one-pass strcpy, but there it harms performance, as the overhead
of checking the size is bigger than the benefit on larger sizes.
> So the results are pretty clear, if you don't have a super optimized
> strcpy, then strlen+memcpy is the best way to do it.
It is not that clear, as you spend a considerable amount of time on small
lengths; what is important is the constant overhead of strcpy startup.
However, this needs platform-specific tricks to decide which alternative to use.