Re: [Patch, AArch64] Optimized strcpy

On 17/12/14 12:22, Richard Earnshaw wrote:
> On 17/12/14 12:12, Richard Earnshaw wrote:
>> This patch contains an optimized implementation of strcpy for AArch64
>> systems.  Benchmarking shows that it is approximately 20-25% faster than
>> the generic implementation across the board.
>> R.
>> <date>  Richard Earnshaw  <>
>> 	* sysdeps/aarch64/strcpy.S: New file.

Following the various discussions about the above, I've done some
further tweaking of the code, and there are indeed some further
performance improvements, particularly for short strings.

I think this is likely to be the final version (at least, for 2.21).

Changes this time around:

- Add the ability to build the code as stpcpy().

- A small change to the page-crossing check that uses the same number of
instructions but could be faster on some micro-architectures.

- For the slow (page crossing) check, once a page cross is known to
occur, jump to the normal entry point.

- For big-endian only, on the first check we pre-reverse the bytes so
that we don't have to recalculate the syndrome in the (likely) case that
the string is short.

- For the initial unaligned fetch, detect zeros in the first and second
dwords independently and jump directly to the relevant epilogue
sequence.  This eliminates another level of branching later on for the
special cases where we have to use sub-dword-sized stores.

- Other changes are mostly re-ordering of the hunks of code and
micro-optimizations that fall out of the above changes.
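The only behavioural difference between the two entry points is the return value: strcpy() returns the destination, while stpcpy() returns a pointer to the terminating NUL written into the destination. A minimal C sketch of sharing one copy body between both is below. The names copy_impl, my_strcpy and my_stpcpy and the runtime return_end flag are hypothetical; the assembly presumably makes the same choice at build time instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared copy body.  return_end selects the stpcpy() result
   (pointer to the copied NUL) instead of the strcpy() result (dst). */
static char *copy_impl(char *dst, const char *src, int return_end) {
    assert(dst != NULL && src != NULL);
    char *d = dst;
    /* Copy one byte at a time; stop after the NUL has been stored, leaving
       d pointing at that NUL. */
    while ((*d = *src++) != '\0')
        d++;
    return return_end ? d : dst;
}

char *my_strcpy(char *dst, const char *src) { return copy_impl(dst, src, 0); }
char *my_stpcpy(char *dst, const char *src) { return copy_impl(dst, src, 1); }
```

Returning the end pointer is what makes stpcpy() convenient for chaining copies without rescanning the destination.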
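The page-crossing items can be illustrated with a short C sketch. It assumes a 4 KiB minimum page size and a 16-byte first fetch (both assumptions, named ASSUMED_PAGE_SIZE and FETCH_SIZE here), and it does not reproduce either the old or the new instruction sequence. One reading of why the slow path may jump back to the normal entry point: if no NUL lies between src and the page boundary, the string itself continues onto the next page, so that page must be readable and the unaligned fetch is safe. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumed minimum page size */
#define FETCH_SIZE 16u          /* assumed size of the first unaligned fetch */

/* Fast test: fewer than FETCH_SIZE bytes remain in the page, so the
   unaligned fetch from addr would touch the next page. */
static bool may_cross_page(uintptr_t addr) {
    return (addr & (ASSUMED_PAGE_SIZE - 1)) > ASSUMED_PAGE_SIZE - FETCH_SIZE;
}

/* Reference definition: first and last fetched bytes are on different pages. */
static bool fetch_crosses_page(uintptr_t addr) {
    return addr / ASSUMED_PAGE_SIZE !=
           (addr + FETCH_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/* Slow path for a flagged address.  block is the FETCH_SIZE-aligned block
   containing src (for a flagged address it ends exactly at the page
   boundary) and off is src's offset within it.  Returns true when no NUL
   lies between src and the boundary, i.e. the string continues onto the
   next page, which therefore must be mapped. */
static bool string_reaches_next_page(const unsigned char block[FETCH_SIZE],
                                     unsigned off) {
    assert(off < FETCH_SIZE);
    for (unsigned i = off; i < FETCH_SIZE; i++)
        if (block[i] == '\0')
            return false; /* string ends on this page: finish the copy here */
    return true;          /* safe to take the normal (unaligned) path */
}
```

Bytes before src in the aligned block are deliberately ignored, since a NUL there belongs to whatever precedes the string.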
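Why byte order matters for the syndrome can be shown with the common zero-byte test (x - 0x01..01) & ~x & 0x80..80. That the patch uses exactly this formula is an assumption. The test is exact for the lowest-order zero byte, but a borrow can falsely flag bytes above it. On little-endian the lowest-order byte is the first in memory, so counting trailing zeros finds the NUL. On big-endian the first byte in memory is the most significant, so taking the highest flag can land on a false positive; byte-reversing the data before computing the syndrome avoids a second, exact computation. The sketch uses the GCC/Clang builtins __builtin_bswap64, __builtin_clzll and __builtin_ctzll; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff x contains a zero byte.  Exact for the lowest-order zero
   byte; higher bytes may be falsely flagged by borrow propagation. */
static uint64_t zero_syndrome(uint64_t x) {
    return (x - ONES) & ~x & HIGHS;
}

/* Emulate a big-endian 8-byte load: p[0] becomes the most significant byte. */
static uint64_t load_be64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Naive big-endian lookup: highest flagged byte.  May be a false positive. */
static unsigned first_nul_be_naive(uint64_t be) {
    uint64_t s = zero_syndrome(be);
    assert(s != 0);
    return (unsigned)__builtin_clzll(s) / 8;
}

/* Pre-reversed lookup: memory byte 0 becomes least significant, so the
   lowest flagged byte is exactly the first NUL in memory. */
static unsigned first_nul_be_reversed(uint64_t be) {
    uint64_t s = zero_syndrome(__builtin_bswap64(be));
    assert(s != 0);
    return (unsigned)__builtin_ctzll(s) / 8;
}
```

For the bytes 'a', 0x01, 0x00, ... the naive lookup reports index 1 (the 0x01 byte is flagged by the borrow), while the pre-reversed lookup correctly reports index 2.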
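The dispatch on the initial 16-byte fetch can be sketched in C as follows: test each dword on its own and, when the NUL is found, go straight to an epilogue that writes the short string. Two things here are assumptions, not taken from the patch: the epilogue uses at most two overlapping stores of one width (8, 4, 2 or 1 bytes), and memory is little-endian, emulated portably by load_le64. The function names are hypothetical, and src must have 16 readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for a little-endian 8-byte load: p[0] is least significant. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Nonzero iff x has a zero byte; the lowest flagged byte is exact. */
static uint64_t zero_syndrome(uint64_t x) {
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Hypothetical epilogue: copy n bytes (1..16) with two possibly
   overlapping moves of the largest width that fits. */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(n >= 1 && n <= 16);
    if (n >= 8) {
        memcpy(dst, src, 8);
        memcpy(dst + n - 8, src + n - 8, 8);
    } else if (n >= 4) { /* sub-dword stores */
        memcpy(dst, src, 4);
        memcpy(dst + n - 4, src + n - 4, 4);
    } else if (n >= 2) {
        memcpy(dst, src, 2);
        memcpy(dst + n - 2, src + n - 2, 2);
    } else {
        dst[0] = src[0];
    }
}

/* Returns the bytes copied (including the NUL) when the NUL is in the
   first 16 bytes, else 0 so the caller can continue with its main loop. */
static size_t copy_first16(unsigned char *dst, const unsigned char *src) {
    uint64_t s0 = zero_syndrome(load_le64(src));
    if (s0 != 0) { /* NUL in the first dword */
        size_t n = (size_t)__builtin_ctzll(s0) / 8 + 1;
        copy_tail(dst, src, n);
        return n;
    }
    uint64_t s1 = zero_syndrome(load_le64(src + 8));
    if (s1 != 0) { /* NUL in the second dword */
        size_t n = 8 + (size_t)__builtin_ctzll(s1) / 8 + 1;
        copy_tail(dst, src, n);
        return n;
    }
    return 0;
}
```

Because each dword branch already knows the exact length, no further test on the length is needed before choosing the store width beyond the one in copy_tail.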


	* sysdeps/aarch64/strcpy.S: New file.
	* sysdeps/aarch64/stpcpy.S: New file.

