This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
RE: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
- From: "Schwarz, Konrad" <konrad dot schwarz at siemens dot com>
- To: Richard Earnshaw <rearnsha at arm dot com>
- Cc: Will Newton <will dot newton at linaro dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "newlib at sourceware dot org" <newlib at sourceware dot org>
- Date: Fri, 12 Apr 2013 11:52:00 +0000
- Subject: RE: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
- References: <515C5C47 dot 4090601 at linaro dot org> <5166F1AD dot 7010703 at arm dot com> <CANu=Dmhs_YKDj+6aJPO3tp_4Qk0NwhtYWUWibgy1tC3wWkd04g at mail dot gmail dot com> <8A6D8E6D161CD644B982513286072E8D02FB11 at DEFTHW99EJ1MSX dot ww902 dot siemens dot net> <5167D697 dot 5080906 at arm dot com>
> -----Original Message-----
> From: Richard Earnshaw [mailto:rearnsha@arm.com]
> Sent: Friday, April 12, 2013 11:41 AM
> Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
> using NEON/VFP.
> I would have thought these days, with hardware floating-point support
> required by the Linux HF ABI, that this wasn't likely to be a major
> issue. The compiler will use FP insns freely as well whenever they are
> available, even for data moves. If you're using memcpy enough for
> performance to be an issue, then you'd want to use the fastest sequence
> possible. If you're not, then why would you care?
A thread's context switch time is increased if it uses
floating point registers. Once a thread has obtained a
floating point context, there is no way of getting rid of
it again.
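
To make the cost argument concrete, here is a toy model in C. It is only a sketch under stated assumptions, not kernel code: the register counts are illustrative (16 core words and 64 FP words, loosely modelled on an ARMv7-A core with 32 doubleword FP registers), and the names struct thread, on_first_fp_use and switch_cost_words are hypothetical. It assumes a scheduler that saves and restores FP state only for threads that have ever touched the FP unit, and that never drops the FP context once it exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative register counts (assumption, not taken from any kernel). */
enum { CORE_WORDS = 16, FP_WORDS = 64 };

/* Hypothetical per-thread state for the toy model. */
struct thread {
    bool has_fp_context;  /* set on first FP use, never cleared afterwards */
};

/* Models the first FP/NEON instruction executed by a thread.
   In a real OS this would typically happen in a trap handler. */
static void on_first_fp_use(struct thread *t)
{
    assert(t != NULL);
    t->has_fp_context = true;
}

/* Number of register words moved for one switch from prev to next:
   core registers are always saved and restored, FP registers only
   for threads that own an FP context. */
static size_t switch_cost_words(const struct thread *prev,
                                const struct thread *next)
{
    assert(prev != NULL && next != NULL);
    size_t words = 2 * CORE_WORDS;          /* save prev, restore next */
    if (prev->has_fp_context)
        words += FP_WORDS;                  /* save prev's FP registers */
    if (next->has_fp_context)
        words += FP_WORDS;                  /* restore next's FP registers */
    return words;
}
```

In this model a single FP-using memcpy call is enough to make every later switch involving that thread more expensive, which is the concern raised above.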
Note that Section B1.8.4 of ARM DDI 0406B, the ARMv7-A and ARMv7-R
edition of the ARM Architecture Reference Manual, describes exactly
the optimization I mentioned in my original post.
So at least up to ARMv7, the architects behind the ARM ISA cared
about this.
Also, I've never seen a compiler use FP instructions freely, as I expect
compiler writers are aware of this issue.
> I see build options in the code for three variants: With Neon (and
> VFP), with VFP only and without either. That means that bare-metal
> systems have the option of using an integer-only variant (as does
> anyone else if they are really worried about using FP registers within
> memcpy).
How would an application select which variant to use?
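
For context, build options of this kind are often keyed off macros that the compiler predefines from its target flags, which would mean the variant is fixed when the library is compiled and not chosen by the application at run time. The following is a minimal sketch of that pattern, not the code from the patch. It assumes GCC-style predefined macros for 32-bit ARM (__ARM_NEON__, __arm__, __SOFTFP__); MEMCPY_VARIANT, memcpy_variant and copy_bytes are hypothetical names, and the wrapper simply forwards to the standard memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compile-time selection, assuming GCC-style predefined macros.
   MEMCPY_VARIANT is a hypothetical name used only in this sketch. */
#if defined(__ARM_NEON__)
#  define MEMCPY_VARIANT "neon"      /* NEON (and VFP) available */
#elif defined(__arm__) && !defined(__SOFTFP__)
#  define MEMCPY_VARIANT "vfp"       /* VFP only */
#else
#  define MEMCPY_VARIANT "integer"   /* no FP registers used */
#endif

/* Reports which variant this translation unit was compiled for. */
static const char *memcpy_variant(void)
{
    return MEMCPY_VARIANT;
}

/* Stand-in for the selected copy routine; a real library would place
   the variant-specific implementation behind this entry point. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    return n ? memcpy(dst, src, n) : dst;
}
```

Under this scheme an application that wants the integer-only variant would have to link against a library built with flags that leave the FP macros undefined, or supply its own memcpy.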
Thanks,
Konrad Schwarz