This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][PING] Improve stpncpy performance
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 12 Aug 2015 22:57:44 +0200
- Subject: Re: [PATCH][PING] Improve stpncpy performance
- Authentication-results: sourceware.org; auth=none
- References: <000301d0b7df$02f253d0$08d6fb70$ at com> <20150709110655 dot GA29253 at domone> <000c01d0ba58$44d1c170$ce754450$ at com>
On Thu, Jul 09, 2015 at 04:02:26PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> >
> > You don't have to use the special case
> >
> > if (size == n)
> > return dest;
> >
> > as it should be handled by
> >
> > return memset (dest, '\0', 0);
> >
> > That could improve performance a bit if it's a rare case. It doesn't
> > matter much, as the memset makes this function slow, so it shouldn't
> > be used in performance-sensitive code.
> >
> > Otherwise ok for me.
>
> On the benchtests the extra if made a significant difference, particularly
> since a memset of 0 is relatively expensive when it is treated as a very
> rare case. It seems it should be less likely than the benchtests indicate,
> but we'd have to fix the benchtests first to use more realistic data.
>
OK, I did some data collection and I take my objection back, as that case
almost always happens in bash. I was surprised that it needs to use strncpy
to copy such a small number of bytes.
When I tested the dryrun benchmark, special-casing is faster. I got the
following data for strncpy but not for stpncpy, so we could reuse it for
the stpcpy patch that you also submitted.
replaying bash
calls 194
average n: 15.6082 n <= 0: 4.6% n <= 4: 6.7% n <= 8: 43.3% n <= 16: 80.9% n <= 24: 88.7% n <= 32: 88.7% n <= 48: 91.2% n <= 64: 98.5%
s aligned to 4 bytes: 99.5% 8 bytes: 97.9% 16 bytes: 0.5%
average *s access cache latency 0.9072 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 34.0% 8 bytes: 24.2% 16 bytes: 1.5%
s-s2 aligned to 4 bytes: 34.5% 8 bytes: 22.7% 16 bytes: 22.7%
average *s2 access cache latency 1.1186 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 15.6082 c <= 0: 4.6% c <= 4: 6.7% c <= 8: 43.3% c <= 16: 80.9% c <= 24: 88.7% c <= 32: 88.7% c <= 48: 91.2% c <= 64: 98.5% n == capa : 100.0%
replaying mc
calls 6971
average n: 9.0773 n <= 0: 2.7% n <= 4: 54.3% n <= 8: 71.9% n <= 16: 85.5% n <= 24: 91.0% n <= 32: 94.2% n <= 48: 96.9% n <= 64: 98.7%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 36.3347 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
s-s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
average *s2 access cache latency 1.0126 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 9.5847 c <= 0: 0.7% c <= 4: 52.2% c <= 8: 70.9% c <= 16: 84.9% c <= 24: 90.7% c <= 32: 93.8% c <= 48: 96.5% c <= 64: 98.7% n == capa : 63.6%
replaying mutt
calls 10415
average n: 7.6572 n <= 0: 0.1% n <= 4: 68.6% n <= 8: 82.5% n <= 16: 86.4% n <= 24: 87.7% n <= 32: 94.5% n <= 48: 96.6% n <= 64: 98.9%
s aligned to 4 bytes: 57.9% 8 bytes: 49.1% 16 bytes: 45.2%
average *s access cache latency 1.1092 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 85.7% 8 bytes: 79.8% 16 bytes: 79.5%
s-s2 aligned to 4 bytes: 43.6% 8 bytes: 28.9% 16 bytes: 24.7%
average *s2 access cache latency 1.1324 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 51.2750 c <= 0: 0.0% c <= 4: 65.3% c <= 8: 73.0% c <= 16: 74.6% c <= 24: 75.2% c <= 32: 75.3% c <= 48: 75.3% c <= 64: 75.3% n == capa : 64.4%
replaying /bin/bash
calls 60
average n: 10.3167 n <= 0: 1.7% n <= 4: 26.7% n <= 8: 56.7% n <= 16: 88.3% n <= 24: 98.3% n <= 32: 98.3% n <= 48: 98.3% n <= 64: 98.3%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 1.7%
average *s access cache latency 0.8833 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 3.3%
s-s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 31.7%
average *s2 access cache latency 0.9167 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 10.3167 c <= 0: 1.7% c <= 4: 26.7% c <= 8: 56.7% c <= 16: 88.3% c <= 24: 98.3% c <= 32: 98.3% c <= 48: 98.3% c <= 64: 98.3% n == capa : 100.0%
replaying as
calls 122
average n: 6.8115 n <= 0: 0.8% n <= 4: 6.6% n <= 8: 95.1% n <= 16: 98.4% n <= 24: 100.0% n <= 32: 100.0% n <= 48: 100.0% n <= 64: 100.0%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 1.0410 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
s-s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
average *s2 access cache latency 0.9262 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 126.9508 c <= 0: 0.8% c <= 4: 0.8% c <= 8: 0.8% c <= 16: 0.8% c <= 24: 0.8% c <= 32: 0.8% c <= 48: 0.8% c <= 64: 0.8% n == capa : 0.8%