This is the mail archive of the
mailing list for the newlib project.
RE: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
- From: "Schwarz, Konrad" <konrad dot schwarz at siemens dot com>
- To: Richard Earnshaw <rearnsha at arm dot com>
- Cc: Will Newton <will dot newton at linaro dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "newlib at sourceware dot org" <newlib at sourceware dot org>
- Date: Fri, 12 Apr 2013 11:52:00 +0000
- Subject: RE: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
- References: <515C5C47 dot 4090601 at linaro dot org> <5166F1AD dot 7010703 at arm dot com> <CANu=Dmhs_YKDj+6aJPO3tp_4Qk0NwhtYWUWibgy1tC3wWkd04g at mail dot gmail dot com> <8A6D8E6D161CD644B982513286072E8D02FB11 at DEFTHW99EJ1MSX dot ww902 dot siemens dot net> <5167D697 dot 5080906 at arm dot com>
> -----Original Message-----
> From: Richard Earnshaw [mailto:firstname.lastname@example.org]
> Sent: Friday, April 12, 2013 11:41 AM
> Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
> using NEON/VFP.
> I would have thought these days, with hardware floating-point support
> required by the Linux HF ABI, that this wasn't likely to be a major
> issue. The compiler will use FP insns freely as well whenever they are
> available, even for data moves. If you're using memcpy enough for
> performance to be an issue, then you'd want to use the fastest sequence
> possible. If you're not, then why would you care?
A thread's context switch time is increased if it uses
floating point registers. Once a thread has obtained a
floating point context, there is no way of getting rid of it.
Note that Section B1.8.4 of DDI0406B, an edition of the
ARM V7 architecture manual, describes exactly the optimization
I mentioned in my original post.
So at least up to V7, the architects behind the ARM ISA cared about this.
Also, I've never seen a compiler use FP instructions freely, as I expect
compiler writers are aware of this issue.
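
A minimal sketch of the cost being described, assuming the optimization in question is lazy floating-point context switching: FP access is disabled for threads that have never used FP, the first FP instruction traps, and only threads that have trapped pay for saving and restoring the FP register bank. Everything here (struct thread, context_switch, fp_trap, NUM_FP_REGS, the register array standing in for hardware) is hypothetical and only simulates the bookkeeping; it is not taken from any kernel or from the architecture manual.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_FP_REGS 32  /* hypothetical size of the FP register bank */

struct thread {
    bool   has_fp_context;        /* set on first FP use, never cleared */
    double fp_regs[NUM_FP_REGS];  /* saved copy of the FP registers */
};

static double cpu_fp_regs[NUM_FP_REGS]; /* stands in for the hardware bank */
static bool   fp_enabled;               /* stands in for the FP enable bit */

/* Switch from prev to next. Returns the number of FP registers copied,
 * used here as a rough proxy for the extra context switch time. */
static int context_switch(struct thread *prev, struct thread *next)
{
    int copied = 0;

    if (prev->has_fp_context) {          /* save only if FP was ever used */
        memcpy(prev->fp_regs, cpu_fp_regs, sizeof cpu_fp_regs);
        copied += NUM_FP_REGS;
    }
    if (next->has_fp_context) {          /* restore only if FP was ever used */
        memcpy(cpu_fp_regs, next->fp_regs, sizeof cpu_fp_regs);
        copied += NUM_FP_REGS;
        fp_enabled = true;
    } else {
        fp_enabled = false;              /* first FP instruction will trap */
    }
    return copied;
}

/* Simulated trap handler: the running thread touched an FP register
 * while FP access was disabled. From now on it carries an FP context. */
static void fp_trap(struct thread *cur)
{
    cur->has_fp_context = true;
    fp_enabled = true;
}
```

In this model a thread that calls a NEON/VFP memcpy even once goes through fp_trap and pays the save/restore cost on every later switch, which is the concern raised above.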
> I see build options in the code for three variants: With Neon (and
> VFP), with VFP only and without either. That means that bare metal
> systems have the option of using an integer-only variant (as does
> anyone else if they are really worried about using FP registers within
> memcpy).
How would an application select which variant to use?
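
To make the question concrete, here is a hedged sketch of what build-time selection between the three variants might look like. It assumes the choice is driven by compiler-predefined macros (`__ARM_NEON` / `__ARM_NEON__` and `__ARM_FP`, as defined by the ARM C Language Extensions); the names memcpy_variant_built, memcpy_variant_name and FORCE_INTEGER_MEMCPY are hypothetical and do not come from the patch.

```c
#include <stddef.h>

enum memcpy_variant { MEMCPY_INTEGER, MEMCPY_VFP, MEMCPY_NEON };

/* Reports which variant the preprocessor would pick for this
 * translation unit. The decision is fixed when the file is compiled. */
static enum memcpy_variant memcpy_variant_built(void)
{
#if defined(FORCE_INTEGER_MEMCPY)            /* hypothetical opt-out switch */
    return MEMCPY_INTEGER;
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return MEMCPY_NEON;                      /* NEON (and VFP) available */
#elif defined(__ARM_FP)
    return MEMCPY_VFP;                       /* VFP only */
#else
    return MEMCPY_INTEGER;                   /* no FP hardware assumed */
#endif
}

/* Human-readable name for a variant, e.g. for a startup log line. */
static const char *memcpy_variant_name(enum memcpy_variant v)
{
    switch (v) {
    case MEMCPY_NEON: return "neon";
    case MEMCPY_VFP:  return "vfp";
    default:          return "integer";
    }
}
```

If selection works like this, the variant is determined by the flags used to build the library, not by the application: a program linked against a prebuilt library gets whatever that build chose, and the only lever is something like the hypothetical FORCE_INTEGER_MEMCPY applied when the library itself is compiled.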