This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.

Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.

From: Richard Earnshaw <rearnsha at arm dot com>
To: "Schwarz, Konrad" <konrad dot schwarz at siemens dot com>
Cc: Will Newton <will dot newton at linaro dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "newlib at sourceware dot org" <newlib at sourceware dot org>
Date: Fri, 12 Apr 2013 10:40:39 +0100
Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.
References: <515C5C47 dot 4090601 at linaro dot org> <5166F1AD dot 7010703 at arm dot com> <CANu=Dmhs_YKDj+6aJPO3tp_4Qk0NwhtYWUWibgy1tC3wWkd04g at mail dot gmail dot com> <8A6D8E6D161CD644B982513286072E8D02FB11 at DEFTHW99EJ1MSX dot ww902 dot siemens dot net>

On 12/04/13 08:42, Schwarz, Konrad wrote:

-----Original Message-----
From: newlib-owner@sourceware.org [mailto:newlib-owner@sourceware.org]
On Behalf Of Will Newton
Sent: Thursday, April 11, 2013 10:14 PM
To: Ramana Radhakrishnan
Cc: newlib@sourceware.org
Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
using NEON/VFP.


My apologies if this has all been answered before, but the floating-point
units I am familiar with usually allow lazy context switching.
A common operating system optimization is to give threads a floating point
context only after they actually start executing floating point operations.
(The initial floating point instruction causes a trap; the OS can then
initialize the thread's floating point context before restarting
the instruction.)
Threads start off as integer (register) context only.

Won't an implementation of a nominally non-floating point function
such as memcpy break this optimization, and perhaps be harmful from
a system standpoint?
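
To make the concern concrete, here is a small host-side simulation of lazy
floating-point context allocation. It is only a sketch: the struct and every
function name in it are hypothetical, and a real kernel would do this in its
trap handler rather than in plain C. It shows that a thread which only ever
calls an FP-using memcpy still ends up with a floating point context.

```c
#include <stdbool.h>

/* Hypothetical per-thread state for a lazy-FP scheduler (illustration only). */
struct thread {
    bool has_fp_context;   /* false until the first FP instruction traps */
    unsigned fp_traps;     /* number of "first use" traps taken */
};

/* Stand-in for the OS trap handler: allocate the FP context on first use. */
static void fp_trap(struct thread *t)
{
    t->has_fp_context = true;
    t->fp_traps++;
}

/* Stand-in for executing any FP/NEON instruction in the given thread. */
static void execute_fp_instruction(struct thread *t)
{
    if (!t->has_fp_context)
        fp_trap(t);        /* trap once, then the instruction is restarted */
}

/* A memcpy built from integer loads/stores never touches the FP unit. */
static void integer_memcpy_call(struct thread *t)
{
    (void)t;
}

/* A memcpy built from NEON/VFP loads/stores counts as FP use. */
static void fp_memcpy_call(struct thread *t)
{
    execute_fp_instruction(t);
}
```

In this model, a thread that calls integer_memcpy_call() stays integer-only,
while a single fp_memcpy_call() is enough to take the trap and make the
thread carry a floating point context from then on.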

In bare-metal systems, the OS/Executive/whatever may not even
have floating point support and would be in for a rude surprise
when it calls memcpy().

Konrad Schwarz

I would have thought these days, with hardware floating-point support required by the Linux HF ABI, that this wasn't likely to be a major issue. The compiler will use FP insns freely as well whenever they are available, even for data moves. If you're using memcpy enough for performance to be an issue, then you'd want to use the fastest sequence possible. If you're not, then why would you care?

I see build options in the code for three variants: with Neon (and VFP), with VFP only, and without either. That means that bare-metal systems have the option of using an integer-only variant (as does anyone else if they are really worried about using FP registers within memcpy).
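
As a rough illustration of that three-way choice, the selection could be sketched in C as below. This is not the code from the patch. `__ARM_NEON__`, `__ARM_NEON` and `__SOFTFP__` are compiler-predefined macros, but whether the patch tests exactly these is an assumption here, and `MEMCPY_FORCE_INTEGER_ONLY`, `memcpy_variant()` and `memcpy_integer_only()` are hypothetical names made up for the example.

```c
#include <stddef.h>

/* MEMCPY_FORCE_INTEGER_ONLY is a hypothetical override for builds that must
   keep memcpy away from the FP registers. Non-ARM hosts also fall back to
   the integer-only path so the sketch compiles anywhere. */
#if defined(MEMCPY_FORCE_INTEGER_ONLY) || !defined(__arm__)
# define MEMCPY_VARIANT "integer-only"
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
# define MEMCPY_VARIANT "neon+vfp"
#elif !defined(__SOFTFP__)
# define MEMCPY_VARIANT "vfp"
#else
# define MEMCPY_VARIANT "integer-only"
#endif

/* Report which variant this build selected. */
static const char *memcpy_variant(void)
{
    return MEMCPY_VARIANT;
}

/* Plain byte-wise copy, written with integer loads and stores only. */
static void *memcpy_integer_only(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Note that the source alone does not guarantee an integer-only result: the compiler options used for the build also decide whether FP or vector registers may appear in the generated code.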

R.
