[PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.

Richard Earnshaw rearnsha@arm.com
Fri Apr 12 12:15:00 GMT 2013

On 12/04/13 12:52, Schwarz, Konrad wrote:
>> -----Original Message-----
>> From: Richard Earnshaw [mailto:rearnsha@arm.com]
>> Sent: Friday, April 12, 2013 11:41 AM
>> Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
>> using NEON/VFP.
>> I would have thought these days, with hardware floating-point support
>> required by the Linux HF ABI, that this wasn't likely to be a major
>> issue.  The compiler will use FP insns freely as well whenever they are
>> available, even for data moves.  If you're using memcpy enough for
>> performance to be an issue, then you'd want to use the fastest sequence
>> possible.  If you're not, then why would you care?
> A thread's context switch time is increased if it uses
> floating point registers.  Once a thread has obtained a
> floating point context, there is no way of getting rid of
> it again.

Lazy context switching would essentially do that.

> Note that Section B1.8.4 of DDI0406B, an edition of the
> ARM V7 architecture manual, describes exactly the optimization
> I mentioned in my original post.

I'm well aware of it.  Indeed, I've written such context switching code 
myself in the past.

> So at least up to V7, the architects behind the ARM ISA cared
> about this.
> Also, I've never seen a compiler use FP instructions freely, as I expect
> compiler writers are aware of this issue.

Well I've certainly seen GCC do that rather than spill registers.

>> I see build options in the code for three variants: With Neon (and
>> VFP), with VFP only and without either.  That means that bare metal
>> systems have the option of using an integer-only variant (as does
>> anyone else if they are really worried about using FP registers within
>> memcpy).
> How would an application select which variant to use?

Everything you're talking about here is for full OS-based systems with 
context switching.  Since Newlib is primarily (outside of Cygwin) a 
library for bare metal systems, surely we should provide developers with
the ability to choose.  Given that the code provides three variants for
different configurations, I don't really see what you are arguing about, 
unless it is that we shouldn't give users a choice.
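
To make the build-time choice concrete, here is a minimal sketch of how
one of three variants might be picked from the compiler's predefined
macros.  The macro tests and the sketch_* names are my assumptions for
illustration; the patch itself may test different macros.

```c
/* Sketch only: choose a memcpy variant at build time.
   __ARM_NEON__ and __SOFTFP__ are assumed to be the relevant GCC
   predefines; every sketch_* / SKETCH_* name is hypothetical. */
#if defined(__arm__) && defined(__ARM_NEON__)
# define SKETCH_MEMCPY_VARIANT "neon"     /* NEON (and VFP) available */
#elif defined(__arm__) && !defined(__SOFTFP__)
# define SKETCH_MEMCPY_VARIANT "vfp"      /* VFP only */
#else
# define SKETCH_MEMCPY_VARIANT "integer"  /* no FP registers touched */
#endif

/* Report which variant this translation unit was built for. */
const char *sketch_memcpy_variant(void)
{
    return SKETCH_MEMCPY_VARIANT;
}
```

Under these assumptions, someone worried about FP registers in memcpy
would get the integer variant simply by building the library for a
soft-float configuration.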

On Linux the code can be bound at run time using the IFUNC feature.
Normally that would be done as a platform choice based on the hardware
features available, but I see no reason why it couldn't have some input
from the application developer if that was really felt to be necessary.
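
As a rough illustration of run-time binding, the sketch below uses the
GNU ifunc attribute.  It assumes GCC or Clang targeting an ELF system
with glibc, and that on ARM the loader passes the hwcap word to the
resolver.  The bit values are what I believe the Linux ARM headers
define, and all the function names are hypothetical stand-ins rather
than anything from the patch.

```c
#include <stddef.h>

/* Assumed to match HWCAP_VFP / HWCAP_NEON in the Linux ARM headers. */
#define SKETCH_HWCAP_VFP  (1UL << 6)
#define SKETCH_HWCAP_NEON (1UL << 12)

/* Placeholder body; the real variants would be hand-written assembly. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

static void *memcpy_int(void *d, const void *s, size_t n)  { return copy_bytes(d, s, n); }
static void *memcpy_vfp(void *d, const void *s, size_t n)  { return copy_bytes(d, s, n); }
static void *memcpy_neon(void *d, const void *s, size_t n) { return copy_bytes(d, s, n); }

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Resolver: run once by the dynamic loader when the symbol is bound. */
static memcpy_fn resolve_memcpy(unsigned long hwcap)
{
#if !defined(__arm__)
    hwcap = 0;  /* argument is only assumed meaningful on ARM */
#endif
    if (hwcap & SKETCH_HWCAP_NEON)
        return memcpy_neon;
    if (hwcap & SKETCH_HWCAP_VFP)
        return memcpy_vfp;
    return memcpy_int;
}

/* Callers see one symbol; the loader binds it to the chosen variant. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
    __attribute__((ifunc("resolve_memcpy")));
```

Input from the application developer could be fed into the resolver as
well, but how best to do that safely at relocation time is a separate
question that this sketch does not try to answer.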

