optimizing gcc output for ARM
Jens-Christian Lache
lache@tu-harburg.de
Wed Nov 22 10:20:00 GMT 2000
Sorry, there was a mistake in the source code I posted. (The
strange behavior remains the same, though.)
The C file should look like this:
#define MAX 10   /* loop bound; the listing below compares i against #9 */

int main(void) {
    int sum=10;
    int coeff=20;
    int sample=30;
    int i=0;
    for (i=0;i<MAX;i++) {
        asm volatile ("mla %0,%2,%3,%1":"=r" (sum):"r" (sum), "r" (coeff), "r" (sample), "r" (sum));
    }
    return 0;
}
And the corresponding assembler output:
@ Generated by gcc 2.95.2 19991024 (release) for ARM/elf
        .file   "multiplikation.c"
.gcc2_compiled.:
.text
        .align  2
        .global main
        .type   main,function
main:
        @ args = 0, pretend = 0, frame = 16
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        sub     sp, sp, #16
        bl      __gccmain
        mov     r3, #10
        str     r3, [fp, #-16]
        mov     r3, #20
        str     r3, [fp, #-20]
        mov     r3, #30
        str     r3, [fp, #-24]
        mov     r3, #0
        str     r3, [fp, #-28]
        mov     r3, #0
        str     r3, [fp, #-28]
.L3:
        ldr     r3, [fp, #-28]
        cmp     r3, #9
        ble     .L6
        b       .L4
.L6:
        ldr     r3, [fp, #-16]          /* r3= sum */
        ldr     r2, [fp, #-20]          /* r2= coeff */
        ldr     r1, [fp, #-24]          /* r1= sample */
        ldr     ip, [fp, #-16]          <- WHAT FOR???
        mla     r3,r2,r1,r3
        mov     r2, r3                  <- str r3, [fp,#-16] would be faster
        str     r2, [fp, #-16]
.L5:
        ldr     r3, [fp, #-28]          /* r3= i */
        add     r2, r3, #1
        str     r2, [fp, #-28]
        b       .L3
.L4:
        mov     r0, #0
        b       .L2
.L2:
        ldmea   fp, {fp, sp, pc}
.Lfe1:
        .size   main,.Lfe1-main
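
One possible reading of the listing, offered as a sketch and not as a confirmed diagnosis: the asm statement lists four input operands (sum, coeff, sample and sum again), while the template only refers to %0 through %3. gcc loads every listed input into a register whether or not the template uses it, which would account for the extra "ldr ip, [fp, #-16]". The variant below lists each operand exactly once. The helper names mac and accumulate are made up for illustration, the "=&r" early-clobber on the result is my own choice to keep it out of the input registers, and the #else branch is only there so the sketch also builds on a non-ARM host.

```c
/* Hypothetical helper: returns sum + coeff * sample. */
static int mac(int sum, int coeff, int sample)
{
#if defined(__arm__) && !defined(__thumb__)
    int result;
    /* One output and three inputs, each one used by the template:
       %0 = result, %1 = coeff, %2 = sample, %3 = sum.
       "=&r" asks gcc not to reuse an input register for the result. */
    __asm__ volatile ("mla %0, %1, %2, %3"
                      : "=&r" (result)
                      : "r" (coeff), "r" (sample), "r" (sum));
    return result;
#else
    /* Plain C fallback (an assumption, not part of the original post). */
    return sum + coeff * sample;
#endif
}

/* Hypothetical wrapper: repeats the multiply-accumulate n times,
   like the for loop in the C file above. */
static int accumulate(int sum, int coeff, int sample, int n)
{
    int i;
    for (i = 0; i < n; i++)
        sum = mac(sum, coeff, sample);
    return sum;
}
```

With the values from the C file, accumulate(10, 20, 30, 10) should return 6010. The listing also keeps every variable in the stack frame and reloads it on each pass through the loop, which is what gcc typically emits when no optimization flag is given; building with -O2 would normally let it keep sum in a register and drop the mov/str pair, although I have not checked this against gcc 2.95.2.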
On Wed, 22 Nov 2000 you wrote:
> If I compile the following lines
>
> for (i=0;i<10;i++) {
> asm volatile ("mla %0,%3,%2,%1":"=r" (sum):"r" (coeff), "r" (sample), "r" (sum));
> }
> I get the assembler output
> .L3:
> ldr r3, [fp, #-28]
> cmp r3, #9
> ble .L6
> b .L4
> .L6:
> ldr r3, [fp, #-16]
> ldr r2, [fp, #-20]
> ldr r1, [fp, #-24]
> ldr ip, [fp, #-16]
> mla r3,r1,r2,r3
> mov r2, r3
> str r2, [fp, #-16]
> .L5:
> ldr r3, [fp, #-28]
> add r2, r3, #1
> str r2, [fp, #-28]
> b .L3
>
> (explanation: [fp, #-16]: sum; [fp, #-20]:coeff; [fp, #-24]:sample)
>
> I can live with the fact that gcc doesn't recognize the
> special MAC instruction of the ARM, but the line
> after it is pointless;
> str r3, [fp, #-16]
> would work just as well and is one instruction shorter.
> The line above it is not necessary either.
> 1.) Why does gcc produce such output?
> 2.) How can I avoid this?
>
> Thanks for your hints!
> Jens-Christian
>
> --
>
>
> Jens-Christian Lache
> Technische Universitaet Hamburg-Harburg
> www.tu-harburg.de/~sejl1601
> Mail:
> lache@tu-harburg.de
> lache@ngi.de
> Tel.:
> +0491759610756