This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.


On 19 April 2013 22:47, Joseph S. Myers <joseph@codesourcery.com> wrote:

Hi Joseph,

> On Tue, 16 Apr 2013, Will Newton wrote:
>
>> Add a high performance memcpy routine optimized for Cortex-A15 with
>> variants for use in the presence of NEON and VFP hardware selected
>> at runtime using indirect function support.
>
> The functions __aeabi_memcpy, __aeabi_memcpy4 and __aeabi_memcpy8,
> currently implemented to call memcpy, have their ABI defined to clobber
> only the core registers permitted to be clobbered by AAPCS, and not the
> normally call-clobbered VFP/NEON registers.
>
> This patch would cause those functions to start clobbering some VFP/NEON
> registers.  So you need to do something to avoid that, whether making the
> __aeabi_* functions save and restore registers in the affected case,
> making the new functions do so or some other approach such as making
> __aeabi_* use a variant of the code with an extra save/restore.
>
> As I understand the code, memcpy within ld.so itself will always be a
> version using the core registers only, so you shouldn't have the extra
> issue of needing to avoid corrupting such registers when used for argument
> passing in the VFP ABI variant.  Though if you were to support building a
> glibc version that requires VFP/NEON, where the new code is used
> unconditionally rather than just through IFUNC - and such a glibc is a
> perfectly reasonable thing to build, after all if you are building for the
> VFP ABI then you may as well assume at least VFP to be present everywhere
> - then you would need to deal with that issue.  (Cf.
> <http://sourceware.org/ml/libc-ports/2012-04/msg00087.html>.)

I suspect adding in extra saving/restoring would be a significant
performance overhead, particularly for small copies. Would it make
sense just to make __aeabi_memcpy call the fallback arm routine? That
would mean no performance improvement for __aeabi_memcpy calls but no
performance degradation for the explicit memcpy case.


--
Will Newton
Toolchain Working Group, Linaro


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]