[PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
Fri Apr 12 11:52:00 GMT 2013
> -----Original Message-----
> From: Richard Earnshaw [mailto:firstname.lastname@example.org]
> Sent: Friday, April 12, 2013 11:41 AM
> Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
> using NEON/VFP.
> I would have thought these days, with hardware floating-point support
> required by the Linux HF ABI, that this wasn't likely to be a major
> issue. The compiler will use FP insns freely as well whenever they are
> available, even for data moves. If you're using memcpy enough for
> performance to be an issue, then you'd want to use the fastest sequence
> possible. If you're not, then why would you care?
A thread's context switch time is increased if it uses
floating point registers. Once a thread has obtained a
floating point context, there is no way of getting rid of
Note that Section B1.8.4 of DDI0406B, an edition of the
ARMv7 architecture manual, describes exactly the optimization
I mentioned in my original post.
So at least up to ARMv7, the architects behind the ARM ISA cared
Also, I've never seen a compiler use FP instructions freely, presumably
because compiler writers are aware of this issue.
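
To make the context switch cost concrete, here is a toy model in C of lazy
floating point context handling. It is only a sketch: the toy_* names are
hypothetical, the register counts are illustrative (16 core registers, plus
32 64-bit D registers as on a VFPv3-D32/NEON register file), and the
"never cleared" behaviour is an assumption taken from the claim above, not
from any particular kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sizes in 32-bit words: 16 core registers, and
   32 64-bit D registers (64 words) for the FP/NEON register file. */
enum { CORE_WORDS = 16, FP_WORDS = 64 };

/* Hypothetical per-thread state for this toy model. */
struct toy_thread {
    bool has_fp_context;   /* set on first FP use; assumed never cleared */
};

/* Models the first FP/NEON instruction executed by a thread.  With lazy
   switching the FP unit starts disabled, so this would be a trap. */
static void toy_first_fp_use(struct toy_thread *t)
{
    t->has_fp_context = true;
}

/* Number of words that must be saved when switching away from t. */
static size_t toy_switch_cost_words(const struct toy_thread *t)
{
    size_t words = CORE_WORDS;
    if (t->has_fp_context)
        words += FP_WORDS;   /* FP state is saved only if it exists */
    return words;
}
```

In this model a thread that never touches the FP registers costs 16 words
per switch, while a single memcpy call that uses NEON or VFP raises that
to 80 words for every later switch of the same thread.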
> I see build options in the code for three variants: With Neon (and
> VFP), with VFP only and without either. That means that bare metal
> systems have the option of using an integer-only variant (as does
> anyone else if they are really worried about using FP registers within
How would an application select which variant to use?
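
For reference, a minimal sketch of what build-time selection of the three
variants could look like. __ARM_NEON__ and __SOFTFP__ are macros predefined
by GCC for ARM targets; whether the patch keys on exactly these macros is an
assumption here, and MEMCPY_VARIANT and memcpy_variant() are hypothetical
names used only for illustration.

```c
/* Assumption: the variant is fixed by compiler predefines when the
   library is built, not chosen by the application at run time. */
#if defined(__ARM_NEON__)
# define MEMCPY_VARIANT "neon"      /* NEON (and VFP) available */
#elif defined(__arm__) && !defined(__SOFTFP__)
# define MEMCPY_VARIANT "vfp"       /* VFP only */
#else
# define MEMCPY_VARIANT "integer"   /* no FP registers used */
#endif

/* Hypothetical helper: reports which variant this build selected. */
static const char *memcpy_variant(void)
{
    return MEMCPY_VARIANT;
}
```

With a scheme like this the choice follows from the flags used to build the
library (for example -mfpu and -mfloat-abi), so an application linked
against a prebuilt library gets whatever variant that build selected.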