This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

RE: [PATCH] Improve performance of strncpy

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: 'Ondřej Bílka' <neleai at seznam dot cz>
Cc: "'Rich Felker'" <dalias at libc dot org>, "Florian Weimer" <fweimer at redhat dot com>, <azanella at linux dot vnet dot ibm dot com>, <libc-alpha at sourceware dot org>
Date: Fri, 12 Sep 2014 12:04:07 +0100
Subject: RE: [PATCH] Improve performance of strncpy
Authentication-results: sourceware.org; auth=none
References: <001301cfcd0a$f0b62670$d2227350$ at com> <54108BB0 dot 90902 at redhat dot com> <20140910180144 dot GK23797 at brightrain dot aerifal dot cx> <002501cfcdf7$cc046510$640d2f30$ at com> <20140912062203 dot GB19287 at domone>

> Ondřej Bílka wrote:
> On Thu, Sep 11, 2014 at 08:37:17PM +0100, Wilco Dijkstra wrote:
> > I did a quick experiment with strcpy as it's simpler. Replacing it
> > with memcpy (d, s, strlen (s) + 1) is 3 times faster even on strings
> > of 16Mbytes! Perhaps more surprisingly, it has similar performance on
> > these huge strings as an optimized strcpy.
> >
> What architecture? This could also happen because memcpy has a special
> case to handle large strings that speeds this up. It's something that I
> tried in one-pass strcpy, but it harms performance as the overhead of
> checking the size is bigger than the benefit for large sizes.

The 3x speedup happens on all 3 ISAs I tried. On ARM the memcpy/strlen
variant even beats the optimized strcpy case for most sizes; on x64 it runs
at about 80% of the speed of the optimized strcpy for sizes above 4KB.
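
For illustration, the two approaches being compared can be sketched as below. This is a minimal sketch, not the code from the patch or from glibc; the names strcpy_loop and strcpy_two_pass are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Single-loop copy (hypothetical name): tests for the terminator and
   stores one byte per iteration. */
char *strcpy_loop(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Two-pass copy (hypothetical name): strlen finds the length, then
   memcpy copies the string including its terminating NUL. */
char *strcpy_two_pass(char *dst, const char *src)
{
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);
    return dst;
}
```

The two-pass version reads the source twice, but each pass can use whatever optimized strlen and memcpy the platform provides, which is the effect described above.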

> > So the results are pretty clear, if you don't have a super optimized
> > strcpy, then strlen+memcpy is the best way to do it.
> >
> It is not that clear, as you spend a considerable amount of time on small
> lengths; what is important is the constant overhead of strcpy startup.
> However, this needs platform-specific tricks to decide which alternative
> is fastest.

The overheads are relatively small on modern cores. The memcpy/strlen
variant is always faster than the single loop for lengths larger than
8-16 bytes.

Wilco
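
Since the thread subject is strncpy, the same two-pass idea can be sketched for it as well. This is an illustrative sketch under assumptions, not the submitted patch: the name strncpy_two_pass is hypothetical, and memchr stands in for a bounded length scan so that only ISO C functions are used.

```c
#include <stddef.h>
#include <string.h>

/* Two-pass strncpy (hypothetical name): find the length bounded by n,
   bulk-copy that many bytes, then zero-fill the rest of the n bytes. */
char *strncpy_two_pass(char *dst, const char *src, size_t n)
{
    /* Look for the terminator within the first n bytes only. */
    const char *end = memchr(src, '\0', n);
    size_t len = end ? (size_t)(end - src) : n;

    memcpy(dst, src, len);
    /* strncpy pads with NULs up to n; if len == n nothing is written
       and the result is not NUL-terminated, as with strncpy. */
    memset(dst + len, 0, n - len);
    return dst;
}
```

Whether this beats a single loop for short strings depends on the startup cost of the underlying routines on a given platform, which is the point under discussion in this message.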

References:
- RE: [PATCH] Improve performance of strncpy
  - From: Wilco Dijkstra
- Re: [PATCH] Improve performance of strncpy
  - From: Florian Weimer
- Re: [PATCH] Improve performance of strncpy
  - From: Rich Felker
- RE: [PATCH] Improve performance of strncpy
  - From: Wilco Dijkstra
- Re: [PATCH] Improve performance of strncpy
  - From: Ondřej Bílka
