Mon Feb 28 11:41:00 GMT 2011
In the version of the newlib library distributed by Code Sourcery as part of the freescale-2010.09-56-powerpc-eabi release, memcpy() is implemented in assembly language. This code uses the 64-bit registers defined as part of the "Signal Processing Extension" (SPE) to the Power Architecture.
Instead of defining additional registers, as is usual with floating point units, SPE extends the Power general purpose registers (GPRs) to 64 bits when starting from a 32-bit implementation.
As with floating point units, a bit is defined in the machine state register (MSR) that causes SPE instructions to trap instead of executing successfully. Just as with floating point, this bit can be used to acquire an extended task context (extended beyond the core general purpose and other user readable/writable registers) lazily: the upper halves of the GPRs in a 32-bit implementation are saved/restored only when an SPE instruction is seen for the first time after they were last context-switched out.
Additional care must be taken for interrupt service routines: in most operating system implementations, they are responsible for saving/restoring the extended context if necessary.
GCC generates calls to memcpy() to implement assignment of structures larger than some internally-defined size. After studying the manual, I am convinced it is not possible to turn this off. GCC requires the standard library to provide these routines.
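To make the structure-assignment case concrete, here is a minimal sketch. The type sample_msg and the helpers copy_msg() and msgs_equal() are hypothetical names invented for illustration, and the payload size is an assumption chosen to be comfortably above whatever threshold GCC applies internally.

```c
#include <string.h>

/* Hypothetical message type: the size is an assumption, picked to be
   large enough that the compiler is unlikely to copy it inline. */
struct sample_msg {
    unsigned int  id;
    unsigned char payload[252];
};

/* Plain structure assignment. There is no memcpy() in the source, but
   for an aggregate this large GCC may emit a call to memcpy() anyway. */
void copy_msg(struct sample_msg *dst, const struct sample_msg *src)
{
    *dst = *src;
}

/* Helper used only to check the copy; returns 1 when both are equal. */
int msgs_equal(const struct sample_msg *a, const struct sample_msg *b)
{
    return a->id == b->id &&
           memcmp(a->payload, b->payload, sizeof a->payload) == 0;
}
```

Compiling a file like this with the cross compiler and -S, then searching the assembly output for a branch to memcpy, is one way to see whether a given assignment is affected.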
Naturally, I have an interrupt service routine that assigns a large enough structure without saving/restoring the additional context and runs into this problem. Rest assured that assigning such a structure in an ISR is not a design flaw.
The Newlib configuration script tests for the GCC-defined preprocessor macro __SPE__ to enable the problematic assembly language versions of memcpy().
Now what I would like is some advice on improving this test.
For a 32-bit implementation, in particular in the e200 series of cores, the 32-bit-only load/store multiple word instructions are a convenient and fast way of context switching the 32-bit portion of the task context. To keep interrupt-service-routine latency low, only the minimal context should be saved when entering an ISR, which is also handily achieved by the load/store multiple word instructions. ISR authors can't be faulted when GCC emits calls to memcpy(): it is not even clear when exactly GCC will do this.
A memcpy() that traps and causes each task to acquire extended context is of questionable value anyhow for predominantly integer code. It is also unclear whether a trapping memcpy() is actually conforming: a slightly slower memcpy() that works universally is probably preferable. Unfortunately, user-space code cannot check whether the SPE is enabled, as the machine state register is not readable from user space.
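As an illustration of what a "works universally" copy means here, the following is a deliberately simple sketch that touches only byte-sized integer loads and stores, so it needs no extended context. The name plain_memcpy() is hypothetical; this is not newlib's generic implementation, which is more elaborate.

```c
#include <stddef.h>

/* Hypothetical fallback: copies n bytes from src to dst one byte at a
   time, using only ordinary integer loads and stores. Like memcpy(),
   it assumes the two regions do not overlap, and it returns dst. */
void *plain_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;

    return dst;
}
```

One caveat: an optimizing compiler may recognize a loop like this and turn it back into a call to memcpy(), so a real replacement would have to be built with that transformation disabled (in GCC, -fno-tree-loop-distribute-patterns).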
So one thing I could do is to add a check for 32-bitness. I am not sure this solves the problem for 64-bit code. On the other hand, GCC probably provides the necessary preprocessor definitions for 32/64 bits already, though I am not sure where to gather additional information. Any other ideas?
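One possible shape for such a check, as a sketch only: SKETCH_USE_SPE_MEMCPY and memcpy_uses_spe() are hypothetical names, not anything newlib defines, and the policy shown (fall back to the generic routine whenever the target is 32-bit) is just one reading of the idea above. It relies on GCC predefining __powerpc64__ on 64-bit PowerPC targets and __LP64__ on LP64 targets.

```c
/* Hypothetical selection macro: keep the SPE assembly memcpy() only
   when __SPE__ is defined AND the target is not a 32-bit one. On a
   32-bit target the generic version would be used instead. */
#if defined(__SPE__) && (defined(__powerpc64__) || defined(__LP64__))
#define SKETCH_USE_SPE_MEMCPY 1
#else
#define SKETCH_USE_SPE_MEMCPY 0
#endif

/* Reports which variant the condition above selects for this build:
   1 for the SPE version, 0 for the generic one. */
int memcpy_uses_spe(void)
{
    return SKETCH_USE_SPE_MEMCPY;
}
```

Running the cross compiler with -dM -E on an empty input lists every predefined macro for a given set of target options, which is one way to confirm which of these names are actually available.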