[PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
Joseph S. Myers
joseph@codesourcery.com
Fri Apr 19 21:47:00 GMT 2013
On Tue, 16 Apr 2013, Will Newton wrote:
> Add a high performance memcpy routine optimized for Cortex-A15 with
> variants for use in the presence of NEON and VFP hardware selected
> at runtime using indirect function support.
The functions __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8,
currently implemented to call memcpy, have their ABI defined to clobber
only the core registers permitted to be clobbered by AAPCS, and not the
normally call-clobbered VFP/NEON registers.
This patch would cause those functions to start clobbering some VFP/NEON
registers, so you need to do something to avoid that: make the
__aeabi_* functions save and restore the registers in the affected case,
make the new functions do so, or take some other approach such as having
__aeabi_* use a variant of the code with an extra save/restore.
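To illustrate the shape of one such approach, here is a rough sketch in
portable C, not glibc code. It models a resolver that picks a memcpy
variant from hardware capabilities, while the __aeabi_*-style entry point
is bound to a core-register-only variant regardless of what the resolver
would choose. All names here (SKETCH_HWCAP_*, memcpy_core, memcpy_simd,
select_memcpy, memcpy_dispatch, aeabi_memcpy_sketch) are hypothetical, and
C cannot actually show which registers get clobbered; the real fix would
live in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits for this sketch; NOT the real HWCAP_* values. */
enum { SKETCH_HWCAP_VFP = 1u << 0, SKETCH_HWCAP_NEON = 1u << 1 };

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Stand-in for a variant that touches core registers only. */
static void *memcpy_core(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a VFP/NEON variant. The real routine would clobber some
   VFP/NEON registers; here it simply defers to the C library. */
static void *memcpy_simd(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Resolver-style selection, similar in spirit to an IFUNC resolver. */
static memcpy_fn select_memcpy(unsigned hwcap)
{
    if (hwcap & (SKETCH_HWCAP_VFP | SKETCH_HWCAP_NEON))
        return memcpy_simd;
    return memcpy_core;
}

/* Ordinary memcpy path: uses whatever variant the resolver selects. */
static void *memcpy_dispatch(unsigned hwcap, void *dst, const void *src,
                             size_t n)
{
    memcpy_fn impl = select_memcpy(hwcap);
    assert(impl != NULL);
    return impl(dst, src, n);
}

/* Models an __aeabi_memcpy-style entry point: it never goes through the
   resolver, so it cannot end up in the VFP/NEON variant. */
static void aeabi_memcpy_sketch(void *dst, const void *src, size_t n)
{
    (void) memcpy_core(dst, src, n);
}
```

The alternative of keeping the fast variant and wrapping it in a
save/restore of the affected registers has the same structure, with
aeabi_memcpy_sketch calling a wrapper instead of memcpy_core.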
As I understand the code, memcpy within ld.so itself will always be a
version using the core registers only, so you shouldn't have the extra
issue of needing to avoid corrupting VFP registers when they are used for
argument passing in the VFP ABI variant. You would need to deal with that
issue, though, if you were to support building a glibc version that
requires VFP/NEON, where the new code is used unconditionally rather than
just through IFUNC. Such a glibc is a perfectly reasonable thing to build:
if you are building for the VFP ABI then you may as well assume at least
VFP to be present everywhere. (Cf.
<http://sourceware.org/ml/libc-ports/2012-04/msg00087.html>.)
--
Joseph S. Myers
joseph@codesourcery.com