arm-coff-gcc function prologue w/ unsigned short parameter
Richard Earnshaw
rearnsha@arm.com
Thu May 11 03:20:00 GMT 2000
> > By declaring x as "unsigned short", you are saying that only the bottom 16
> > bits contain meaningful data; but you then try to use all 32 (assuming
> > that the top 16 are zero).
>
> Perhaps I was wrong, but I was under the impression that the ARM core
> cannot deal with just 16 bits of a register unless loading or storing it
> to memory. All ALU manipulations _must_ operate on all 32 bits. Is
> this the case? If so, then would it not be a requirement to zero the
> top 16 bits of a register containing an unsigned short value before
> operating on it? And if not, then why does gcc religiously do so, even
> under -O2 optimization (short of this one bug I'm investigating)?
Ok, let me rephrase my statement slightly more carefully.
By declaring x as "unsigned short", you are saying that when it is
transferred into a 32-bit register, only the bottom 16 bits are
meaningful. If you subsequently want to do an operation on that register
that relies on the top 16 bits being well defined, then you/the compiler
must first convert it into a 32-bit quantity (by zero- or sign-extending
it). If you do not do this, then the top 16 bits may contain garbage (the
compiler is not required to keep those top bits correct at all times).
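To make the zero- and sign-extension step concrete, here is a minimal C sketch that models a 32-bit register whose low half holds the 16-bit value and whose top half is junk. The helper names (zero_extend16, sign_extend16, check_extension) are made up for illustration; they are not anything gcc provides.

```c
#include <assert.h>
#include <stdint.h>

/* Model: 'reg' is a 32-bit register; only bits 0..15 are meaningful. */

/* Same effect as: mov r, r, asl #16 ; mov r, r, lsr #16 */
static uint32_t zero_extend16(uint32_t reg)
{
    return (uint32_t)(reg << 16) >> 16;
}

/* Same effect as asl #16 followed by asr #16, written without relying on
   implementation-defined right shifts of negative values. */
static int32_t sign_extend16(uint32_t reg)
{
    uint32_t low = reg & 0xffffu;
    return (low & 0x8000u) ? (int32_t)low - 65536 : (int32_t)low;
}

/* Hypothetical self-check: junk in the top half must not leak through. */
static void check_extension(void)
{
    uint32_t garbage = 0xdead8001u;   /* low half 0x8001, junk on top */
    assert(zero_extend16(garbage) == 0x8001u);
    assert(sign_extend16(garbage) == -32767);
}
```

Either helper turns the register into a well-defined 32-bit quantity; which one is right depends on whether the 16-bit type is unsigned or signed.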
So, for example, the code
    void foo(unsigned short *x)
    {
        *x += 16;
    }
can be compiled to
    ldrh    r1, [r0]        // r0 contains x
    add     r1, r1, #16     // 32-bit add
    strh    r1, [r0]        // Store bottom 16 bits
In this case there is no need to zero-extend r1, either on the load or on
the store, since the setting of those bits can never affect the behaviour
of the compiled code. Indeed, for the above example
    ldrsh   r1, [r0]
    add     r1, r1, #16
    strh    r1, [r0]
would have given exactly the same results, provided the value in r1 is not
needed after this. And further, on an ARM that doesn't support the
ldrh/strh instructions, the code (little-endian) could be
    ldr     r1, [r0]
    add     r1, r1, #16
    strb    r1, [r0]
    mov     r1, r1, lsr #8
    strb    r1, [r0, #1]
(remember that on ARM, ldr will rotate the addressed halfword to the bottom
of the register, even if it is only 16-bit aligned).
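The reason all three load variants give the same stored result is that the low 16 bits of a 32-bit add depend only on the low 16 bits of the operands. A small C sketch of that, assuming a little-endian target for the byte-wise store; the function names are hypothetical models of the instruction sequences, not compiler output:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "add r1, r1, #16 ; strh r1, [r0]": a full 32-bit add,
   after which only the bottom halfword is kept. */
static uint16_t add16_then_store(uint32_t reg)
{
    uint32_t sum = reg + 16u;   /* 32-bit add, wraps modulo 2^32 */
    return (uint16_t)sum;       /* strh keeps bits 0..15 only */
}

/* Model of the two strb instructions on a little-endian target. */
static void store_halfword_bytewise(uint8_t *dst, uint32_t reg)
{
    dst[0] = (uint8_t)reg;          /* strb r1, [r0] */
    dst[1] = (uint8_t)(reg >> 8);   /* mov r1, r1, lsr #8 ; strb r1, [r0, #1] */
}

/* Hypothetical self-check over every possible 16-bit value. */
static void check_high_bits_ignored(void)
{
    uint32_t low;
    uint8_t buf[2];

    for (low = 0; low <= 0xffffu; low++) {
        uint32_t zero_ext = low;                              /* as from ldrh  */
        uint32_t sign_ext = (low & 0x8000u) ? (low | 0xffff0000u) : low; /* ldrsh */
        uint32_t junk = low | 0xabcd0000u;                    /* as from ldr   */
        assert(add16_then_store(zero_ext) == add16_then_store(sign_ext));
        assert(add16_then_store(zero_ext) == add16_then_store(junk));
    }

    store_halfword_bytewise(buf, 0xabcd1234u);
    assert(buf[0] == 0x34 && buf[1] == 0x12);
}
```

Whatever sits in bits 16..31 before the add is discarded by the halfword (or two byte) store, so the compiler is free to leave it alone.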
On the other hand, the code
    int foo (unsigned short *x)
    {
        return (unsigned short)(*x + 16) < 5;
    }
must be coded as
    ldrh    r1, [r0]
    add     r1, r1, #16
    mov     r1, r1, asl #16
    mov     r1, r1, lsr #16
    cmp     r1, #5
    movlo   r0, #1
    movhs   r0, #0
The zero-extension is required because we now need to examine the top 16
bits. In this latter case, the compiler will sometimes optimize the above,
saving the second shift:
    ldrh    r1, [r0]
    add     r1, r1, #16
    mov     r1, r1, asl #16
    cmp     r1, #327680     // (5 << 16)
    movlo   r0, #1
    movhs   r0, #0
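The two comparison sequences above are interchangeable because, once the value has been shifted left by 16, an unsigned compare against (5 << 16) orders values exactly as a compare of the low halfword against 5 would. A rough C model of both forms, with hypothetical function names, checked over every 16-bit input:

```c
#include <assert.h>
#include <stdint.h>

/* asl #16 ; lsr #16 ; cmp #5 */
static int less_than_5_two_shifts(uint32_t reg)
{
    uint32_t v = (uint32_t)(reg << 16) >> 16;
    return v < 5u;
}

/* asl #16 ; cmp #(5 << 16) */
static int less_than_5_one_shift(uint32_t reg)
{
    uint32_t v = (uint32_t)(reg << 16);
    return v < (5u << 16);
}

/* Hypothetical self-check: both forms agree for every low halfword,
   even with junk in the top half before the add. */
static void check_compare_forms_agree(void)
{
    uint32_t low;
    for (low = 0; low <= 0xffffu; low++) {
        uint32_t reg = (low | 0x12340000u) + 16u;
        assert(less_than_5_two_shifts(reg) == less_than_5_one_shift(reg));
    }
}
```

Both shifted quantities have zeros in their bottom 16 bits, so the unsigned comparison is decided entirely by what used to be the low halfword.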
>
> > What you really need to write for swabw is
> >
> >     static unsigned short
> >     swabw(unsigned short x)
> >     {
> >         unsigned y = x;
> >         __asm__("orr %0, %0, %0, lsl #16 ; mov %0, %0, lsr #8" : "+r" (y) );
> >         return y & 0xffff;
> >     }
> >
> > This will then ensure that the top bits of 'y' are all zero.
>
> Thanks for the tip, Richard. I think I'll put this in our official
> version of swabw, as under -O2, it doesn't generate any extra instructions.
> In fact, it comes out exactly the same as my version, except that the
> erroneous asr #16 modifier is replaced with the correct lsr #16.
To be honest, the only thing I thought the compiler was doing oddly with
your original code was that it was doing any manipulation at all on the
incoming argument... (if you pass a 16-bit quantity into an asm, then you
must either not care about the high-order bits, or you must explicitly
clear them yourself).
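To see why the high-order bits matter for the orr/mov pair used in swabw, here is a portable C model of that two-instruction sequence. The names swab_sequence, swabw_model and check_swabw are made up for this sketch, and the junk value is an arbitrary choice:

```c
#include <assert.h>
#include <stdint.h>

/* Model of: orr r, r, r, lsl #16 ; mov r, r, lsr #8 */
static uint32_t swab_sequence(uint32_t reg)
{
    reg |= reg << 16;
    reg >>= 8;
    return reg;
}

/* Model of the whole function: widen first, so the top half is zero. */
static uint16_t swabw_model(uint16_t x)
{
    uint32_t y = x;   /* like "unsigned y = x;" */
    return (uint16_t)(swab_sequence(y) & 0xffffu);
}

/* Hypothetical self-check. */
static void check_swabw(void)
{
    /* With a clean top half the bytes are swapped. */
    assert(swabw_model(0x1234u) == 0x3412u);

    /* With junk in the top half, the junk is ORed into the result. */
    assert((swab_sequence(0xaaaa1234u) & 0xffffu) == 0xbe12u);
}
```

The orr merges the shifted copy with whatever was already in bits 16..31, so the sequence only swaps bytes correctly when those bits start out as zero.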
>
> However, as I said before, I see this happening in other places that
> do not have any inline assembly. I just haven't been able to come up
> with a very short example, suitable for posting here, that exhibits the
> bug using pure ANSI C. So though your fix addresses this one instance
> nicely, the problem, in general, remains.
We really are going to need a further example if we are going to get any
further with this.
>
> To elaborate on my assumption/question above, how, in theory, should gcc
> be dealing with 16-bit values on the arm, which natively wants only to
> deal with 32-bit values?
I hope the examples above have made the issue clearer.
Richard.