[PATCH, AARCH64] Optimized memcpy

Wed Jul 8 15:06:00 GMT 2015

This is an optimized memcpy for AArch64. Copies are split into 3 main cases: small copies of up to
16 bytes, medium copies of 17..96 bytes, which are fully unrolled, and large copies of more than 96
bytes, which align the destination and use an unrolled loop processing 64 bytes per iteration. In
order to share code with memmove, small and medium copies read all data before writing, allowing any
kind of overlap. On a random copy test, memcpy is 40.8% faster on A57 and 28.4% faster on A53.
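
For readers who want the shape of the three cases without reading the assembly, here is a rough C
sketch. It is not the code from the patch: the names sketch_memcpy, copy_small, copy_medium and
copy_large are made up for illustration, the medium case uses a temporary buffer where the real
code would use registers and unrolled loads/stores, and the tail handling of the large case is a
simplification of my own. It only shows the size dispatch and the "read everything, then write"
idea that makes the small and medium paths safe for overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch only; names and structure are not taken from the patch. */

/* Small: 0..16 bytes. Load head and tail (they may overlap each other),
   and only then store, so src/dst overlap is harmless. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 16);
    if (n >= 8) {
        uint64_t head, tail;
        memcpy(&head, src, 8);
        memcpy(&tail, src + n - 8, 8);
        memcpy(dst, &head, 8);
        memcpy(dst + n - 8, &tail, 8);
    } else if (n >= 4) {
        uint32_t head, tail;
        memcpy(&head, src, 4);
        memcpy(&tail, src + n - 4, 4);
        memcpy(dst, &head, 4);
        memcpy(dst + n - 4, &tail, 4);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every case. */
        unsigned char a = src[0], b = src[n / 2], c = src[n - 1];
        dst[0] = a;
        dst[n / 2] = b;
        dst[n - 1] = c;
    }
}

/* Medium: 17..96 bytes. Read all data before writing any of it.
   Assumption: a stack buffer stands in for the registers used in assembly. */
static void copy_medium(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char tmp[96];
    assert(n > 16 && n <= 96);
    memcpy(tmp, src, n);
    memcpy(dst, tmp, n);
}

/* Large: more than 96 bytes, forward copy, buffers must not overlap.
   Copy 16 unaligned bytes, advance to a 16-byte boundary of dst,
   then move 64 bytes per iteration and finish with the remainder. */
static void copy_large(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t skip;
    assert(n > 96);
    memcpy(dst, src, 16);
    skip = 16 - ((uintptr_t)dst & 15);   /* 1..16, dst + skip is aligned */
    dst += skip;
    src += skip;
    n -= skip;
    while (n >= 64) {
        memcpy(dst, src, 64);
        dst += 64;
        src += 64;
        n -= 64;
    }
    memcpy(dst, src, n);                 /* fewer than 64 bytes remain */
}

void *sketch_memcpy(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    if (n <= 16)
        copy_small(dst, src, n);
    else if (n <= 96)
        copy_medium(dst, src, n);
    else
        copy_large(dst, src, n);
    return dstv;
}
```

Since the first two paths never store before all loads are done, a memmove built on the same code
would only need its own handling for overlapping copies above 96 bytes.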

ChangeLog:
2015-07-08  Wilco Dijkstra  <wdijkstr@arm.com>

	* newlib/libc/machine/aarch64/memcpy.S (memcpy):
	Rewrite of optimized memcpy.

OK for commit?
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-Optimized-memcpy.txt
URL: <http://sourceware.org/pipermail/newlib/attachments/20150708/a52d6d9f/attachment.txt>