This is the mail archive of the newlib at sourceware dot org
mailing list for the newlib project.
Re: MMU Off / Strict Alignment
- From: Christopher Covington <cov at codeaurora dot org>
- To: newlib at sourceware dot org
- Date: Wed, 18 Dec 2013 10:14:00 -0500
- Subject: Re: MMU Off / Strict Alignment
On 12/18/2013 09:10 AM, Corinna Vinschen wrote:
> On Dec 17 21:06, Jonathan S. Shapiro wrote:
>> At the risk of sticking my nose in, this isn't a startup code issue.
>> It's a contract issue.

To provide some context, we're looking to use Newlib for a low-level test
suite where, among other things, we may need to set up the page tables and
turn on the MMU in different ways than the existing code in libgloss does.

>> First, I don't buy Richard's argument about memcpy() startup costs and
>> hard-to-predict branches. We do those tests on essentially every
>> *other* RISC platform without complaint, and it's very easy to order
>> those branches so that the currently efficient cases run well. Perhaps
>> more to the point, I haven't seen anybody put forward quantitative
>> data that using the MMU for unaligned references is any better than
>> executing those branches. Speaking as a recovering processor
>> architect, that assumption needs to be validated quantitatively. My
>> guess is that the branches are faster if properly arranged.
>>
>> Second, this is a contract issue. If newlib intends to support
>> embedded platforms, then it needs to implement algorithms that are
>> functionally correct without relying on an MMU. By all means use
>> simpler or smarter algorithms when an MMU can be assumed to be
>> available in a given configuration, but provide an algorithm that is
>> functionally correct when no MMU is available. "Good overall
>> performance in memcpy" is a fine thing, but it is subject to the
>> requirement of meeting functional specifications. As Jochen Liedtke
>> famously put it (read this in a heavy German accent): "Fast, ya. But
>> correct? (shrug) Eh!"
>>
>> So: we need a normative statement saying what the contract is. The
>> rest of the answer will fall out from that.
>>
>> I do agree with Richard that startup code is special. I've built
>> deeply embedded runtimes of one form or another for 25 years now, and
>> I have yet to see a system where optimizing a simplistic byte-wise
>> memcpy during bootstrap would have made any difference in anything
>> overall. That said, if the specification of memcpy requires it to
>> handle incompatibly aligned pointers (and it does), and the contract
>> for newlib requires it to operate in MMU-less scenarios in a given
>> configuration (which, at least in some cases, it does), it's
>> completely legitimate to expect that bootstrap code can call memcpy()
>> and expect behavior that meets specifications.
>> So what's the contract?
> I don't know anything about this contract, but what keeps people
> from extending memcpy for aarch to contain two sets of instructions,
> one with MMU-less instructions and one with MMU?

One can essentially build those two variants already, by defining
PREFER_SIZE_OVER_SPEED or not. When that macro is defined, byte-at-a-time C
routines are used.

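
To illustrate the two variants, here is a minimal sketch of how such a
compile-time switch could look. This is not newlib's actual source: the names
sketch_memcpy and word_t are made up for this example, and the may_alias
attribute is a GCC/Clang extension that I am assuming is available. Building
with -DPREFER_SIZE_OVER_SPEED leaves only the byte loop in the source; without
it, whole words are copied, but only after a branch has confirmed that both
pointers are word-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type for this sketch. may_alias (a GCC/Clang
   extension) lets it legally overlay arbitrary byte buffers. */
typedef unsigned long word_t __attribute__((may_alias));

/* Illustrative only; not newlib's memcpy. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#ifndef PREFER_SIZE_OVER_SPEED
    /* Speed variant: copy whole words, but only when both pointers are
       word-aligned, so the source never asks for an unaligned access. */
    if ((((uintptr_t)d | (uintptr_t)s) % sizeof(word_t)) == 0) {
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }
#endif

    /* Size variant, and the tail of the speed variant: one byte at a
       time, which is correct for any alignment. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Both builds give the same result for any pair of pointers; the macro only
decides whether the aligned fast path is compiled in.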
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.