Address sizes on 64-bit MIPS targets

Wed Mar 2 09:36:00 GMT 2005

Thiemo Seufer <ica2_ts@csv.ica.uni-stuttgart.de> writes:
> Richard Sandiford wrote:
> [snip]
>> However, if the sequence is:
>> 
>>         dla     $2,0xa8000000000000
>>         lw      $2,0x100000($2)
>> 
>> then the lw will use 32-bit address arithmetic:
>> 
>>         lui     $1,0x10
>>         addu    $1,$1,$2
>>         lw      $2,0($1)
>> 
>> and the behaviour will be unpredictable.
>
> Which isn't exactly a problem, because 0xa8000000000000 is an invalid
> address for n32. For 32bit sign-extended addresses it works fine.

Sure.  But you seem to be missing the whole point.  0xa8000000000000 was
just as invalid in the other two examples as well.  I see no fundamental
difference between the example quoted above and the two previous ones.

>> I suppose one justification for using "addu" might be that addresses
>> should stay within the 32-bit address space for n32 and o64, even if
>> the calculation overflows.  But suppose we have a (non-macro) instruction
>> with a 16-bit offset:
>> 
>>         lw      $2,0x1000($2)
>> 
>> The behaviour of this instruction is unpredictable for 32-bit address
>> spaces on 64-bit targets if $2 + 0x1000 overflows (for example, if
>> $2 == 0x7ffffff0).
>
> What is unpredictable there? It will point to the start of CKSEG0, and
> trigger an address exception.

It depends.  You only get wrap-around when running in user mode.
n32 code running in kernel or supervisor mode will not wrap around,
so the results are sensitive to processor mode.
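
To make the mode sensitivity concrete, here is a rough Python model of
the effective-address calculation for the 16-bit-offset lw.  The
function names and the "wraps" flag are made up for illustration.  The
model assumes $2 holds a sign-extended 32-bit value, and the
non-wrapping case is shown as the plain 64-bit sum, which is only one
plausible outcome; I'm not claiming any particular processor does
exactly that.

```python
MASK64 = (1 << 64) - 1

def sign_extend32(value):
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value

def effective_address(base, offset, wraps):
    """Model base + offset for a load.

    'wraps' is a hypothetical flag standing for the case where the
    result is sign-extended from bit 31 (user mode in this discussion).
    """
    total = (base + offset) & MASK64
    return sign_extend32(total) if wraps else total

base = sign_extend32(0x7FFFFFF0)   # $2 from the example
user = effective_address(base, 0x1000, wraps=True)
kernel = effective_address(base, 0x1000, wraps=False)
print(hex(user), hex(kernel))
```

In this model the wrapping case lands in the sign-extended upper half
of the address space, while the non-wrapping case yields a small
positive 64-bit value, so the same instruction names two different
addresses.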

And I'm not sure how predictable the behaviour is for all 64-bit
processors.  The MIPS64 spec guarantees wrap-around for user mode
programs when UX=0, but (for example) the description of supervisor
and kernel addressing in the VR4100 manual says:

   Usually, it is impossible for 32-bit mode programs to generate
   invalid addresses.  In an operation of base register + offset
   for addressing, however, a two's complement overflow may occur,
   causing an invalid address.  Note that the result becomes undefined.

But maybe I'm being overly influenced by that ;)

The bottom line is: you can only rely on the final address being
sign-extended from 32 bits when running in user mode.  And when you
_can_ rely on that, you'll get the behaviour you want regardless of
whether we use "addu" or "daddu" in the example above.  So what
harm is there in using daddu?
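
Here is that argument as a small Python sketch.  The addu and daddu
functions are simplified models I made up for this mail, not an
emulator: the addu model is only meaningful for sign-extended 32-bit
inputs, and user_mode_address() assumes the case where the final
address really is sign-extended from bit 31.

```python
MASK64 = (1 << 64) - 1

def sign_extend32(value):
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= 0xFFFFFFFF
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value

def addu(a, b):
    """Model of a 32-bit add whose result is sign-extended to 64 bits."""
    return sign_extend32(a + b)

def daddu(a, b):
    """Model of a 64-bit add, wrapping modulo 2**64."""
    return (a + b) & MASK64

def user_mode_address(addr):
    """Assumed behaviour: final address sign-extended from bit 31."""
    return sign_extend32(addr)

def expand_lw(base, add):
    """Final address for 'lw $2,0x100000($2)' expanded with 'add'."""
    upper = 0x10 << 16              # lui $1,0x10
    return user_mode_address(add(upper, base))

for base in (0x00001000, 0x7FFFFFF0, 0xFFFFFFFF80000000):
    assert expand_lw(base, addu) == expand_lw(base, daddu)
```

The intermediate values in $1 differ between the two models when the
32-bit add overflows, but once the final address is sign-extended from
bit 31 they agree, because the two sums share the same low 32 bits.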

Like I say, SGI (who defined n32 in the first place ;) use 64-bit
arithmetic in their assemblers.  I don't think we can claim that
we're being more ABI-compliant by using 32-bit arithmetic for the
example above instead.

>> I'd like to change things so that:
>> 
>>     - "dla"s with no symbolic component use 64-bit arithmetic
>>     - "la"s with no symbolic component use 32-bit arithmetic
>>     - "la"s and "dla"s with a symbolic component use 64-bit arithmetic
>>       iff HAVE_64BIT_ADDRESSES (i.e. they use whatever the ABI dictates). [*]
>> 
>>     - loads and stores with no symbolic component use 64-bit arithmetic
>>       iff HAVE_64BIT_GPRS.
>>     - loads and stores with a symbolic component use 64-bit arithmetic
>>       iff HAVE_64BIT_ADDRESSES. [*]
>
> Instead of changing the behaviour of this already fragile hack I would
> prefer an explicit -msym32 switch for ABI N64.

And I have a patch for that. ;)  Hope to get it polished up and posted
tonight.  But as I said at the beginning of yesterday's message, this
patch is paving the way for -msym32 by auditing existing uses of
HAVE_{32,64}BIT_ADDRESSES.

I couldn't really tell from your reply whether you object to the changes
outright.  Is there any way I can persuade you to go with this?  FWIW,
restricting the influence of HAVE_64BIT_ADDRESSES to symbolic arithmetic
really does make the -msym32 behaviour more obvious.
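
For reference, the proposed rules boil down to something like the
following Python sketch.  The function name, the "load_store" tag and
the boolean parameters are hypothetical stand-ins for illustration
(the parameters mirror HAVE_64BIT_GPRS and HAVE_64BIT_ADDRESSES); this
is not the actual gas code.

```python
def uses_64bit_arithmetic(insn, has_symbol,
                          have_64bit_gprs, have_64bit_addresses):
    """Return True if address arithmetic should be 64-bit.

    insn is "la", "dla" or "load_store" (a hypothetical tag covering
    load and store macros).
    """
    if insn not in ("la", "dla", "load_store"):
        raise ValueError("unknown instruction kind: %r" % (insn,))
    if has_symbol:
        # Symbolic addresses follow whatever the ABI dictates.
        return have_64bit_addresses
    if insn == "dla":
        return True
    if insn == "la":
        return False
    # Loads and stores with a purely numeric address.
    return have_64bit_gprs

# Example with n32-like settings: 64-bit GPRs, 32-bit addresses.
print(uses_64bit_arithmetic("load_store", False, True, False))
print(uses_64bit_arithmetic("load_store", True, True, False))
```

With those settings a load with a purely numeric address gets 64-bit
arithmetic, while one with a symbolic component gets 32-bit
arithmetic, so HAVE_64BIT_ADDRESSES only matters when a symbol is
involved.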

Richard