Re: [PATCH] Optimization for strncpy and stpncpy on Power7

On 02-05-2014 16:11, wrote:
> From: Vidya Ranganathan <>
> The optimization is achieved by following techniques:
>    > data alignment [gain from aligned memory access on read/write]
>    > prefetch data [gain from cache misses by anticipating load]
>    > POWER7 gains performance with loop unrolling/unwinding
>       [gain by reduction of branch penalty].

Hi Vidya,

Thanks for the patch. I pushed it upstream as
f360f94a05570045be615649e9a411cefba2e210 with some modifications:

* sysdeps/powerpc/powerpc64/multiarch/strncpy.c header indentation was bogus

* I removed the prefetch instruction from strncpy.S, for several reasons:
  1. It showed no performance difference in my tests.
  2. The prefetch on r3 uses the wrong instruction: it should be dcbtst, since
     the memory location will be written.
  3. In the other string/memory operations, prefetch is used in loops that
     handle large sizes. For a few bytes it usually makes no measurable
     difference (as I saw), and your usage only prefetches the first cache line.

  If you find another use of prefetch that shows better performance, please
  send a patch with the results.
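
  To illustrate points 2 and 3, here is a rough C sketch, not the code from the
  patch. It assumes GCC's __builtin_prefetch, where a second argument of 1
  requests a prefetch for write (on POWER this should correspond to dcbtst, and
  0 to dcbt). The function name and the 128-byte CACHE_LINE constant are my own
  assumptions for the example.

```c
#include <stddef.h>

/* Assumption for this sketch: 128-byte cache lines, as on POWER7. */
#define CACHE_LINE 128

/* Hypothetical helper, not part of glibc: byte copy that prefetches one
   cache line ahead inside the loop, so the hint only matters for large n. */
void copy_with_write_prefetch(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Once per cache line, hint the next line if it is still in range. */
        if (i % CACHE_LINE == 0 && i + CACHE_LINE < n) {
            __builtin_prefetch(src + i + CACHE_LINE, 0); /* will be read */
            __builtin_prefetch(dst + i + CACHE_LINE, 1); /* will be written */
        }
        dst[i] = src[i];
    }
}
```

  For short copies the condition never fires past the first line, which is
  consistent with seeing no gain on small sizes.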

* I added a comment saying that the algorithm uses memset to zero-pad the final
  bytes (unlike the default algorithm).
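
  For reference, the zero-padding idea looks roughly like this in C. This is
  only a sketch under my own naming (strncpy_memset_pad is hypothetical), not
  the POWER7 assembly: copy up to n bytes or until the terminator, then let
  memset clear whatever is left instead of storing zeros one by one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not the glibc implementation. */
char *strncpy_memset_pad(char *dst, const char *src, size_t n)
{
    size_t len = 0;

    /* Copy at most n bytes, stopping before the terminating NUL. */
    while (len < n && src[len] != '\0') {
        dst[len] = src[len];
        len++;
    }

    /* Zero-pad the remaining n - len bytes in a single call. If src was
       truncated, len == n and nothing is written (no terminator added). */
    memset(dst + len, 0, n - len);
    return dst;
}
```

  The result matches the standard strncpy semantics: dst is NUL-padded up to n
  bytes, and is not terminated when src has n or more characters.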

