This is the mail archive of the mailing list for the newlib project.
Re: MMU Off / Strict Alignment
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: "Jonathan S. Shapiro" <shap at eros-os dot org>
- Cc: Christopher Covington <cov at codeaurora dot org>, "newlib at sourceware dot org" <newlib at sourceware dot org>, Marcus Shawcroft <marcus dot shawcroft at linaro dot org>, Matthew Gretton-Dann <matthew dot gretton-dann at linaro dot org>, "linaro-toolchain at lists dot linaro dot org" <linaro-toolchain at lists dot linaro dot org>
- Date: Thu, 19 Dec 2013 10:18:28 +0000
- Subject: Re: MMU Off / Strict Alignment
- Authentication-results: sourceware.org; auth=none
- References: <528CF7F1 dot 5050001 at codeaurora dot org> <CADSXKXqJgD3cq594+NeRk9=QHA1DKh3o7aPjsVYOx5OqT1Y6pw at mail dot gmail dot com> <52AF3E5A dot 4050507 at codeaurora dot org> <52B00D46 dot 6050302 at arm dot com> <CAAP=3QN-NHH+bONrB3P6oCEQ8R-aaULxJcR2_T_EdH5_EkZyQg at mail dot gmail dot com>
On 18/12/13 05:06, Jonathan S. Shapiro wrote:
> At the risk of sticking my nose in, this isn't a startup code issue.
> It's a contract issue.
> First, I don't buy Richard's argument about memcpy() startup costs and
> hard-to-predict branches. We do those tests on essentially every
> *other* RISC platform without complaint, and it's very easy to order
> those branches so that the currently efficient cases run well. Perhaps
> more to the point, I haven't seen anybody put forward quantitative
> data that using the MMU for unaligned references is any better than
> executing those branches. Speaking as a recovering processor
> architect, that assumption needs to be validated quantitatively. My
> guess is that the branches are faster if properly arranged.
> Second, this is a contract issue. If newlib intends to support
> embedded platforms, then it needs to implement algorithms that are
> functionally correct without relying on an MMU. By all means use
> simpler or smarter algorithms when an MMU can be assumed to be
> available in a given configuration, but provide an algorithm that is
> functionally correct when no MMU is available. "Good overall
> performance in memcpy" is a fine thing, but it is subject to the
> requirement of meeting functional specifications. As Jochen Liedtke
> famously put it (read this in a heavy German accent): "Fast, ya. But
> correct? (shrug) Eh!"
> So: we need a normative statement saying what the contract is. The
> rest of the answer will fall out from that.
> I do agree with Richard that startup code is special. I've built
> deeply embedded runtimes of one form or another for 25 years now, and
> I have yet to see a system where optimizing a simplistic byte-wise
> memcpy during bootstrap would have made any difference in anything
> overall. That said, if the specification of memcpy requires it to
> handle incompatibly aligned pointers (and it does), and the contract
> for newlib requires it to operate in MMU-less scenarios in a given
> configuration (which, at least in some cases, it does), it's
> completely legitimate to expect that bootstrap code can call memcpy()
> and expect behavior that meets specifications.
> So what's the contract?
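
The alignment tests argued over in the quoted message can be sketched roughly as follows. This is only an illustration, not newlib's memcpy and not code from anyone on this thread: the name copy_strict_align and the word_t typedef are hypothetical, the may_alias attribute assumes GCC or Clang, and the sketch assumes word-sized accesses are safe only at word-aligned addresses. The mutually aligned case is tested first so the common path pays for one branch; everything else falls back to byte copies, which never fault on alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a machine word that may alias other types
 * (GCC/Clang attribute; an assumption of this sketch). */
typedef unsigned long __attribute__((__may_alias__)) word_t;

/* Hypothetical name; not part of newlib or the C standard. */
void *copy_strict_align(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const uintptr_t mask = sizeof(word_t) - 1;

    /* Common case first: both pointers sit at the same offset within
     * a word, so they can be brought to a word boundary together. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
        /* Byte copies up to the first word boundary. */
        while (n > 0 && ((uintptr_t)d & mask) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Word copies; every access here is word-aligned. */
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }

    /* Tail bytes, or the whole buffer when the pointers are
     * incompatibly aligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Whether these branches cost more or less than relying on hardware support for unaligned accesses is exactly the quantitative question raised above; the sketch takes no position on it. A real build would also have to stop the compiler from reintroducing unaligned accesses or turning the byte loops back into a memcpy call, for example with GCC's -mstrict-align and -ffreestanding options.
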
I disagree with your assertion that newlib *requires* it to operate in
an MMU-less scenario for all targets; it only does so when the target
can reasonably be expected to not have an MMU.
The only contract that exists is the one written in the C standard:
7.21.2.1#2 The memcpy function copies n characters from the object
pointed to by s2 into the object pointed to by s1. If copying takes
place between objects that overlap, the behavior is undefined.
But that is written on the assumption that we're in a normal execution
environment, not in some special case.
What you're missing is that AArch64 is (in ARM ARM terms) an A-profile
only environment where an MMU is mandated in the system. Furthermore,
processors implementing the architecture will *expect* that the MMU be
turned on as soon as possible after boot, since without this the caches
cannot be used and without those the performance will be truly horrible.
Once the caches are enabled, it's perfectly reasonable to assume that
memcpy will only be used for copies to and from NORMAL memory, since
other types of memory have potential side effects, which means that use
of memcpy would be unsafe.
If you want to write an MMU-less memcpy, then feel free to write one;
but please install it with a different interface -- something like
__memcpy_nommu(). Don't penalise the standard case for the non-standard