This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH][PING] Improve stpncpy performance
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 12 Aug 2015 22:57:44 +0200
- Subject: Re: [PATCH][PING] Improve stpncpy performance
- Authentication-results: sourceware.org; auth=none
- References: <000301d0b7df$02f253d0$08d6fb70$ at com> <20150709110655 dot GA29253 at domone> <000c01d0ba58$44d1c170$ce754450$ at com>
On Thu, Jul 09, 2015 at 04:02:26PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> >
> > You don't have to use the special case
> >
> > if (size == n)
> > return dest;
> >
> > as it should be handled by
> >
> > return memset (dest, '\0', 0);
> >
> > That could improve performance a bit if it's a rare case. It doesn't
> > matter much, as the memset makes this function slow, so it shouldn't
> > be used in performance-sensitive code.
> >
> > Otherwise ok for me.
>
> On the benchtests the extra if made a significant difference, particularly
> since a memset of 0 is relatively expensive when it is treated as a very
> rare case. It seems it should be less likely than the benchtests indicate,
> but we'd have to fix the benchtests first to use more realistic data.
>
OK, I did some data collection and I take my objection back, as that case
almost always happens in bash. I was surprised that it needs to use strncpy
to copy such a small number of bytes.
When I tested the dryrun benchmark, special-casing is faster. I got the
following data for strncpy but not for stpncpy, so we could reuse it for
the stpcpy patch that you also submitted.
replaying bash
calls 194
average n: 15.6082 n <= 0: 4.6% n <= 4: 6.7% n <= 8: 43.3% n <= 16: 80.9% n <= 24: 88.7% n <= 32: 88.7% n <= 48: 91.2% n <= 64: 98.5%
s aligned to 4 bytes: 99.5% 8 bytes: 97.9% 16 bytes: 0.5%
average *s access cache latency 0.9072 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 34.0% 8 bytes: 24.2% 16 bytes: 1.5%
s-s2 aligned to 4 bytes: 34.5% 8 bytes: 22.7% 16 bytes: 22.7%
average *s2 access cache latency 1.1186 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 15.6082 c <= 0: 4.6% c <= 4: 6.7% c <= 8: 43.3% c <= 16: 80.9% c <= 24: 88.7% c <= 32: 88.7% c <= 48: 91.2% c <= 64: 98.5% n == capa : 100.0%
replaying mc
calls 6971
average n: 9.0773 n <= 0: 2.7% n <= 4: 54.3% n <= 8: 71.9% n <= 16: 85.5% n <= 24: 91.0% n <= 32: 94.2% n <= 48: 96.9% n <= 64: 98.7%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 36.3347 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
s-s2 aligned to 4 bytes: 55.3% 8 bytes: 49.5% 16 bytes: 48.1%
average *s2 access cache latency 1.0126 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 9.5847 c <= 0: 0.7% c <= 4: 52.2% c <= 8: 70.9% c <= 16: 84.9% c <= 24: 90.7% c <= 32: 93.8% c <= 48: 96.5% c <= 64: 98.7% n == capa : 63.6%
replaying mutt
calls 10415
average n: 7.6572 n <= 0: 0.1% n <= 4: 68.6% n <= 8: 82.5% n <= 16: 86.4% n <= 24: 87.7% n <= 32: 94.5% n <= 48: 96.6% n <= 64: 98.9%
s aligned to 4 bytes: 57.9% 8 bytes: 49.1% 16 bytes: 45.2%
average *s access cache latency 1.1092 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 85.7% 8 bytes: 79.8% 16 bytes: 79.5%
s-s2 aligned to 4 bytes: 43.6% 8 bytes: 28.9% 16 bytes: 24.7%
average *s2 access cache latency 1.1324 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 51.2750 c <= 0: 0.0% c <= 4: 65.3% c <= 8: 73.0% c <= 16: 74.6% c <= 24: 75.2% c <= 32: 75.3% c <= 48: 75.3% c <= 64: 75.3% n == capa : 64.4%
replaying /bin/bash
calls 60
average n: 10.3167 n <= 0: 1.7% n <= 4: 26.7% n <= 8: 56.7% n <= 16: 88.3% n <= 24: 98.3% n <= 32: 98.3% n <= 48: 98.3% n <= 64: 98.3%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 1.7%
average *s access cache latency 0.8833 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 3.3%
s-s2 aligned to 4 bytes: 36.7% 8 bytes: 33.3% 16 bytes: 31.7%
average *s2 access cache latency 0.9167 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 10.3167 c <= 0: 1.7% c <= 4: 26.7% c <= 8: 56.7% c <= 16: 88.3% c <= 24: 98.3% c <= 32: 98.3% c <= 48: 98.3% c <= 64: 98.3% n == capa : 100.0%
replaying as
calls 122
average n: 6.8115 n <= 0: 0.8% n <= 4: 6.6% n <= 8: 95.1% n <= 16: 98.4% n <= 24: 100.0% n <= 32: 100.0% n <= 48: 100.0% n <= 64: 100.0%
s aligned to 4 bytes: 100.0% 8 bytes: 100.0% 16 bytes: 100.0%
average *s access cache latency 1.0410 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
s-s2 aligned to 4 bytes: 25.4% 8 bytes: 13.1% 16 bytes: 5.7%
average *s2 access cache latency 0.9262 l <= 8: 100.0% l <= 16: 100.0% l <= 32: 100.0% l <= 64: 100.0% l <= 128: 100.0%
average capacity: 126.9508 c <= 0: 0.8% c <= 4: 0.8% c <= 8: 0.8% c <= 16: 0.8% c <= 24: 0.8% c <= 32: 0.8% c <= 48: 0.8% c <= 64: 0.8% n == capa : 0.8%