This is the mail archive of the libc-alpha at sourceware dot org
mailing list for the glibc project.
Re: [PATCH] Optimization for strncpy and stpncpy on Power7
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Tue, 06 May 2014 13:55:38 -0300
- Subject: Re: [PATCH] Optimization for strncpy and stpncpy on Power7
- Authentication-results: sourceware.org; auth=none
- References: <1399057865-9822-1-git-send-email-vidya at linux dot vnet dot ibm dot com>
On 02-05-2014 16:11, email@example.com wrote:
> From: Vidya Ranganathan <firstname.lastname@example.org>
> The optimization is achieved by the following techniques:
> * data alignment [gain from aligned memory access on read/write]
> * prefetch data [gain from cache misses by anticipating load]
> * POWER7 gains performance with loop unrolling/unwinding
>   [gain by reduction of branch penalty].
Thanks for the patch. I pushed it upstream as f360f94a05570045be615649e9a411cefba2e210
with some modifications:
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c header indentation was bogus
* I removed the prefetch instruction from strncpy.S for several reasons:
1. It didn't show a performance difference in my tests.
2. The usage for r3 is wrong; it should be dcbtst since the memory location will
be written.
3. Also, in other string/memory operations it is used in loops with large sizes.
Usually for a few bytes the cost is not shown (as I saw), and your usage just
prefetches the first cache line.
If you find a new use that shows better performance, please send a patch.
* Added a comment saying the algorithm uses memset to zero-pad the final bytes
(different from the default algorithm).
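
For readers unfamiliar with the zero-padding approach mentioned in the last
item, here is a rough C sketch of the idea: copy the string bytes first, then
hand the remaining bytes to memset in one call. This is only an illustration
under my own assumptions, not the POWER7 assembly from the commit; the names
sketch_stpncpy and sketch_strncpy are hypothetical and do not exist in glibc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not the glibc POWER7 implementation.
   Copies at most n bytes of src into dest, zero-fills the remainder with
   a single memset call, and returns a pointer to the first padding byte
   (or dest + n if src has n or more characters), like stpncpy.  */
char *
sketch_stpncpy (char *dest, const char *src, size_t n)
{
  size_t len = 0;

  /* Length of src, capped at n so we never read past the limit.  */
  while (len < n && src[len] != '\0')
    len++;

  /* Copy the string bytes, then pad the final n - len bytes with zeros.  */
  memcpy (dest, src, len);
  memset (dest + len, 0, n - len);

  return dest + len;
}

/* Same copy and padding, but returns dest, like strncpy.  */
char *
sketch_strncpy (char *dest, const char *src, size_t n)
{
  sketch_stpncpy (dest, src, n);
  return dest;
}
```

With an 8-byte buffer, sketch_strncpy (buf, "ab", 6) stores 'a', 'b' and four
zero bytes and leaves the last two bytes of buf untouched. When src has n or
more characters, no terminator is written, matching strncpy semantics.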