Help porting newlib to a new CPU architecture (sorta)

Brian Inglis
Wed Jul 7 05:45:52 GMT 2021

On 2021-07-06 14:46, Orlando Arias wrote:
> Greetings,
> On 7/6/21 4:01 PM, Hans-Bernhard Bröker wrote:
>> On 06.07.2021 at 21:04, Orlando Arias wrote:
>>> Right, went back and looked at the standard. There is no description of
>>> what the abstract machine for the execution environment should be. I
>>> guess my confusion came from the second paragraph in [1]. Harvard
>>> architectures still require you to specify whether a pointer refers to
>>> something in program space or data space, and standard C has no way of
>>> signaling this.
>> You're mixing things up there.  Standard C has a perfectly fine
>> distinction between program space and data space, including pointers
>> thereto.  Function pointers and data pointers _are_ distinct.
>> What Standard C does lack is a standardized distinction between pointers
>> into ROM data and RAM data.  const-qualified pointers may seem like they
>> offer that, but ultimately they don't.
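A small, host-runnable C sketch of that last point (all names are made up for illustration): a const-qualified pointer can legitimately point at an ordinary writable array, so the qualifier only forbids writes through that pointer and says nothing about whether the object lives in ROM or RAM.

```c
#include <string.h>

/* An ordinary writable array: on a typical target it is placed in RAM. */
static char buffer[] = "hello";

/* A read-only view of the same object.  const forbids writes through
   this pointer; it does not move the object into ROM. */
static const char *const view = buffer;

/* Writes go through the non-const name.  (Writing view[0] = 'j'
   instead would be rejected by the compiler.) */
static void patch_first_char(char c)
{
    buffer[0] = c;
}

/* Returns nonzero when the const view sees the expected contents. */
static int view_matches(const char *expected)
{
    return strcmp(view, expected) == 0;
}
```

After patch_first_char('j'), view_matches("jello") holds: the change made through the non-const name is visible through the const view.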
> Possibly I am explaining myself incorrectly here, and likely I am mixing
> terminology, yes. There is also a good likelihood that I am conflating
> things. If that is the case, my apologies and please feel free to
> correct my understanding/terminology. What I mean to say is the
> following [through an example].
> Consider the AVR architecture, where program and data spaces have
> distinct address spaces. We have a pointer to a string literal that
> resides in program memory. We wish to compare it to a string that
> resides in data memory. We could use a [naive] comparison method, such
> as strcmp().
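> #include <avr/pgmspace.h>
>
> const char str[] PROGMEM = "hello";	/* the array itself is placed in flash */
>
> int naive_cmp(const char *b)		/* b points into data memory */
> {
> 	const char *a = str;	/* a holds a flash address... */
> 	while (*a != '\0' && *a == *b) {	/* ...but *a is an ordinary RAM load */
> 		a++; b++;
> 	}
> 	return *a - *b;
> }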
> The problem with this code is that we are treating a as a pointer in
> data memory. Declaring a to be PROGMEM does not help. We actually need
> to rewrite the code to force the compiler to use the proper instruction:
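> int progmem_cmp(const char *b)		/* b points into data memory */
> {
> 	const char *a = str;	/* a holds a flash address */
> 	char t;
> 	/* pgm_read_byte() reads via LPM instead of an ordinary load */
> 	while ((t = pgm_read_byte(a)) != '\0' && t == *b) {
> 		a++; b++;
> 	}
> 	return t - *b;
> }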
> We use the pgm_read_byte() macro to issue the LPM instruction, instead
> of a regular load instruction. In fact, avr-libc provides a collection
> of functions [identifiable by their _P suffix] for the case where data
> resides in program memory. For example, we
> have strcmp_P(), where the second argument refers to a pointer to data
> in the program memory address space.
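To make the idea host-runnable, here is a rough simulation, not avr-libc code: program memory is modelled as a separate array, program-space addresses as plain integer offsets, and a hypothetical flash_read_byte() stands in for pgm_read_byte(). The comparison follows strcmp_P()'s convention of taking the second argument from program space; flash_addr_t, flash_mem, flash_read_byte and strcmp_flash are all invented names.

```c
#include <stdint.h>

/* Hypothetical host-side model of a Harvard machine: "program memory" is
   a separate array, and a program-space address is an offset into it.
   Because flash_addr_t is an integer, not a pointer, it cannot be
   dereferenced by accident with an ordinary data load. */
typedef uint16_t flash_addr_t;

static const uint8_t flash_mem[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

/* Hypothetical stand-in for pgm_read_byte(): the single place that
   performs a program-space read (an LPM on a real AVR). */
static uint8_t flash_read_byte(flash_addr_t addr)
{
    return flash_mem[addr];
}

/* strcmp_P()-style comparison: s1 is in data memory, s2 in program memory. */
static int strcmp_flash(const char *s1, flash_addr_t s2)
{
    uint8_t c1, c2;
    do {
        c1 = (uint8_t)*s1++;           /* ordinary data-space load */
        c2 = flash_read_byte(s2++);    /* explicit program-space load */
    } while (c1 != '\0' && c1 == c2);
    return (int)c1 - (int)c2;
}
```

Giving program-space addresses their own type, rather than reusing const char *, is one way to make the compiler reject the accidental data-space dereference in the naive loop; the price is that every access must go through the accessor.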
>>> This is what I meant by the von Neumann requirement: all pointers
>>> dereference to the same address space.
>> That's stated broadly enough to be wrong.  The C virtual machine is, in
>> fact, a Harvard architecture.  It assumes that const and non-const data
>> live in the same address space, but that doesn't make it von Neumann.
> Right, so herein lies a problem. A Harvard machine implies that program
> and data are in different address spaces. Unless my understanding is
> wrong, this means that there is one address bus for data, and one
> address bus for instructions. Dereferencing a function pointer and
> dereferencing a data pointer would result in dereferencing to different
> address spaces. Now, I believe that doing something like (char*)fn_ptr
> in C is either undefined behavior or implementation-defined behavior.

Function and object pointer interconversions are UB: demons may fly out of 
your nose! For example, when the code and data memory models use different 
pointer sizes, the two kinds of pointer are physically incompatible.
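What ISO C does sanction is converting between different function pointer types and back again (C11 6.3.2.3p8), so a portable generic code pointer is a function pointer type rather than void * or char *. A minimal sketch, with made-up names:

```c
#include <stddef.h>

/* A "generic" code pointer type.  Converting a function pointer to another
   function pointer type and back yields a pointer that compares equal to
   the original; conversions to object pointers are not defined by ISO C. */
typedef void (*generic_fn)(void);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return 2 * x; }

/* A table holding code pointers under one generic function-pointer type. */
static const generic_fn table[] = {
    (generic_fn)add_one,
    (generic_fn)twice,
};

/* Convert back to the correct type before calling: calling through
   generic_fn directly would be undefined behavior. */
static int call_int_fn(size_t index, int arg)
{
    int (*fn)(int) = (int (*)(int))table[index];
    return fn(arg);
}

/* Round-trip check: the converted-back pointer equals the original. */
static int round_trip_ok(void)
{
    return (int (*)(int))table[0] == add_one;
}
```

Because the pointer never leaves the function-pointer family, this works even where code and data pointers differ in size.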

> However, the implementations I have seen would treat this pointer as
> something in data memory, rather than something in program memory.
> Actually modifying what fn_ptr points to would require the use of an
> extension to the language [which would be implied if the behavior was
> indeed UB or implementation defined]. Please correct me on this one.

That works on von Neumann architecture implementations, or where mapping 
registers map the same address ranges for code and data, perhaps with 
different access modes.
Modifying what a function pointer object points to is fairly common in 
C, as long as, when it is used, it is (cast to) a pointer to a function 
of the correct type; cf. the qsort() pointer to a comparison function in 
its last argument, and Unix system driver interfaces, which are 
effectively arrays of function pointers.
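A short sketch of both patterns. The struct dev_ops interface below is hypothetical, only loosely modelled on Unix device switch tables and not copied from any real kernel:

```c
#include <stdlib.h>
#include <string.h>

/* qsort() takes a pointer to a comparison function as its last argument. */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

static void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);
}

/* Hypothetical driver-style interface: a struct of function pointers. */
struct dev_ops {
    int (*d_open)(void);
    int (*d_read)(char *buf, size_t len);
};

static int null_open(void) { return 0; }

/* Behaves like a null device: reads return no data. */
static int null_read(char *buf, size_t len) { (void)buf; (void)len; return 0; }

/* Behaves like a zero device: fills the buffer with zero bytes. */
static int zero_read(char *buf, size_t len)
{
    memset(buf, 0, len);
    return (int)len;
}

/* Reassigning one member changes behavior without touching the callers. */
static struct dev_ops dev = { null_open, null_read };
```

Callers always go through dev.d_read(); assigning dev.d_read = zero_read swaps the implementation at run time, and every call is made through a pointer of the correct function type.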

If you look at e.g. the PDP-11 architecture, the somewhat similar 6800 
series models, or the like, they had a number of mainly orthogonal 
general register addressing modes, including PC relative and indirect PC 
relative, and either of those with autoincrement/decrement. So many 
registers could be used to access "program" memory and absolute "program" 
addresses, to move through that space like an IP for threaded code or as 
a subroutine stack, or to access RO data in the instruction space directly.
For example, to copy RO instruction space data to RAM, the move source 
register uses autoincrement PC relative addressing and the destination 
register uses autoincrement relative addressing from a RAM base address.
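A rough C analogue of that copy, assuming a toolchain that places const data in a read-only region: the classic *dst++ = *src++ loop is the kind of code that maps naturally onto autoincrement addressing on both operands (e.g. MOVB (R0)+,(R1)+ on a PDP-11), though whether a given compiler emits exactly that is not guaranteed. Names are illustrative.

```c
#include <stddef.h>

/* Read-only table: typically placed in a read-only section, which on
   ROM-based targets ends up in ROM alongside the code. */
static const unsigned char rom_table[] = { 0x10, 0x20, 0x30, 0x40 };

/* RAM destination buffer of the same size. */
static unsigned char ram_copy[sizeof rom_table];

/* Byte copy with post-increment on both pointers; a PDP-11 compiler can
   map the loop body onto a single autoincrement-to-autoincrement MOVB. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}
```

Calling copy_bytes(ram_copy, rom_table, sizeof rom_table) leaves ram_copy holding the same four bytes as the table.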

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
