Help porting newlib to a new CPU architecture (sorta)

Orlando Arias
Tue Jul 6 20:46:33 GMT 2021


On 7/6/21 4:01 PM, Hans-Bernhard Bröker wrote:
> On 06.07.2021 at 21:04, Orlando Arias wrote:
>> Right, went back and looked at the standard. There is no description of
>> what the abstract machine for the execution environment should be. I
>> guess my confusion came from the second paragraph in [1]. Harvard
>> architectures still have the thing that you have to define whether a
>> pointer refers to something in program space or data space, and standard
>> C has no way of signaling this. 
> You're mixing things up there.  Standard C has a perfectly fine
> distinction between program space and data space, including pointers
> thereto.  Function pointers and data pointers _are_ distinct.
> What Standard C does lack is a standardized distinction between pointers
> into ROM data and RAM data.  const-qualified pointers may seem like they
> offer that, but ultimately they don't.

Possibly I am explaining myself incorrectly here, and likely I am mixing
terminology, yes. There is also a good likelihood that I am conflating
things as well. If that is the case, my apologies, and please feel free to
correct my understanding/terminology. What I mean to say is the
following [through an example].

Consider the AVR architecture, where program and data memory have
distinct address spaces. We have a pointer to a string literal that
resides in program memory. We wish to compare it to a string that
resides in data memory. We could use a [naive] comparison method, along
the lines of strcmp().

const char str[] PROGMEM = "hello";	/* the array itself lives in flash */

const char* a = str;
const char* b = data_memory_location;

while(*a != '\0' && *a == *b) {
	a++; b++;
}
return *a - *b;

The problem with this code is that the compiler treats a as a pointer into
data memory and emits an ordinary load for *a. Declaring a to be PROGMEM
does not help. We actually need to rewrite the code to force the compiler
to use the proper instruction:

char t;
while((t = pgm_read_byte(a)) != '\0' && t == *b) {
	a++; b++;
}

return t - *b;

We use the pgm_read_byte() macro to issue the LPM instruction, instead
of a regular load instruction. In fact, avr-libc provides a collection
of functions [which can be identified by their suffix _P] for the
case where data resides in program memory. For example, we have
strcmp_P(), where the second argument is a pointer to data in the
program memory address space.
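
To make the point concrete without AVR hardware, here is a rough host-side
sketch of the same situation. It is only a model, not avr-libc code: the
names sim_flash, sim_ram, sim_addr_t, sim_flash_read(), sim_ram_read() and
sim_strcmp_P() are all made up for illustration, and the two arrays merely
stand in for the two address spaces. The same numeric address yields
different bytes depending on which accessor is used, which is why the
caller has to know which space each argument lives in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of a Harvard machine: two separate
 * memories, each indexed from address 0. Not avr-libc code. */
static const uint8_t sim_flash[16] = "hello";  /* stands in for program space */
static uint8_t       sim_ram[16]   = "help";   /* stands in for data space    */

/* An address by itself does not say which space it refers to. */
typedef uint16_t sim_addr_t;

/* Stand-in for an LPM-style load: reads from the program-space array. */
static uint8_t sim_flash_read(sim_addr_t addr)
{
    assert(addr < sizeof sim_flash);
    return sim_flash[addr];
}

/* Stand-in for an ordinary load: reads from the data-space array. */
static uint8_t sim_ram_read(sim_addr_t addr)
{
    assert(addr < sizeof sim_ram);
    return sim_ram[addr];
}

/* Compare a string in data space (s) with a string in program space (p),
 * in the spirit of strcmp_P(). Each side uses its own accessor. */
static int sim_strcmp_P(sim_addr_t s, sim_addr_t p)
{
    for (;;) {
        uint8_t a = sim_ram_read(s++);
        uint8_t b = sim_flash_read(p++);
        if (a != b || a == '\0')
            return (int)a - (int)b;
    }
}
```

In this model, address 3 holds 'l' when read through sim_flash_read() and
'p' when read through sim_ram_read(), so sim_strcmp_P(0, 0) compares
"help" against "hello" and returns a positive value.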

>> This is what I meant by the von Neumann requirement: all pointers
>> dereference to the same address space. 
> That's stated broadly enough to be wrong.  The C virtual machine is, in
> fact, a Harvard architecture.  It assumes that const and non-const data
> live in the same address space, but that doesn't make it von-Neumann.

Right, so herein lies a problem. A Harvard machine implies that program
and data are in different address spaces. Unless my understanding is
wrong, this means that there is one address bus for data, and one
address bus for instructions. Dereferencing a function pointer and
dereferencing a data pointer would then access different address
spaces. Now, I believe that doing something like (char*)fn_ptr
in C is either undefined behavior or implementation-defined behavior.
However, the implementations I have seen would treat the resulting
pointer as something in data memory, rather than something in program
memory. Actually modifying what fn_ptr points to would require the use of
an extension to the language [which would be implied if the behavior were
indeed UB or implementation-defined]. Please correct me on this one.


