[PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
Fri Apr 12 12:15:00 GMT 2013
On 12/04/13 12:52, Schwarz, Konrad wrote:
>> -----Original Message-----
>> From: Richard Earnshaw [mailto:firstname.lastname@example.org]
>> Sent: Friday, April 12, 2013 11:41 AM
>> Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
>> using NEON/VFP.
>> I would have thought these days, with hardware floating-point support
>> required by the Linux HF ABI, that this wasn't likely to be a major
>> issue. The compiler will use FP insns freely as well whenever they are
>> available, even for data moves. If you're using memcpy enough for
>> performance to be an issue, then you'd want to use the fastest sequence
>> possible. If you're not, then why would you care?
> A thread's context switch time is increased if it uses
> floating point registers. Once a thread has obtained a
> floating point context, there is no way of getting rid of
> it again.
Lazy context switching would essentially do that.
> Note that Section B1.8.4 of DDI0406B, an edition of the
> ARM V7 architecture manual, describes exactly the optimization
> I mentioned in my original post.
I'm well aware of it. Indeed, I've written such context switching code
myself in the past.
> So at least up to V7, the architects behind the ARM ISA cared
> about this.
> Also, I've never seen a compiler use FP instructions freely, as I expect
> compiler writers are aware of this issue.
Well, I've certainly seen GCC do that rather than spill registers to the stack.
>> I see build options in the code for three variants: With Neon (and
>> VFP), with VFP only and without either. That means that bare metal
>> systems have the option of using an integer-only variant (as does
>> anyone else if they are really worried about using FP registers within
> How would an application select which variant to use?
Everything you're talking about here is for full OS-based systems with
context switching. Since Newlib is primarily (outside of Cygwin) a
library for bare metal systems, surely we should give developers the
ability to choose. Given that the code provides three variants for
different configurations, I don't really see what you are arguing about,
unless it is that we shouldn't give users a choice.
On Linux the code can be bound at run time using the GNU indirect
function (IFUNC) feature. Normally that would be done as a platform
choice based on the hardware features available, but I see no reason
why it couldn't take some input from the application developer if that
was really felt to be necessary.