This is the mail archive of the crossgcc@sourceware.org mailing list for the crossgcc project.
See the CrossGCC FAQ for lots more information.
Basically, an older set of tools I built generates much faster floating point code. A newer set of tools I built produces noticeably slower FP code, and I'd like to figure out how to rebuild it so that it matches the old one.
I've compared some of the floating point routines in the disassembly of our code. In one example, __addsf3, the code from one toolsuite is markedly different from the other. I've included both disassemblies below.
Clearly, the floating point code in the fast case is highly optimized. It doesn't use the stack, it doesn't branch to other routines, etc.
TIA, Rick
New tools (slow):

    80101a30 <__addsf3>:
    80101a30: e92d4030 stmdb sp!, {r4, r5, lr}
    80101a34: e24dd038 sub sp, sp, #56 ; 0x38
    80101a38: e28d5020 add r5, sp, #32 ; 0x20
    80101a3c: e58d0034 str r0, [sp, #52]
    80101a40: e58d1030 str r1, [sp, #48]
    80101a44: e28d0034 add r0, sp, #52 ; 0x34
    80101a48: e1a01005 mov r1, r5
    80101a4c: e28d4010 add r4, sp, #16 ; 0x10
    80101a50: eb0001b1 bl 8010211c <__unpack_f>
    80101a54: e28d0030 add r0, sp, #48 ; 0x30
    80101a58: e1a01004 mov r1, r4
    80101a5c: eb0001ae bl 8010211c <__unpack_f>
    80101a60: e1a01004 mov r1, r4
    80101a64: e1a0200d mov r2, sp
    80101a68: e1a00005 mov r0, r5
    80101a6c: ebffff55 bl 801017c8 <_fpadd_parts>
    80101a70: eb00014e bl 80101fb0 <__pack_f>
    80101a74: e28dd038 add sp, sp, #56 ; 0x38
    80101a78: e8bd8030 ldmia sp!, {r4, r5, pc}
Old tools (fast):

    80101750 <__addsf3>:
    80101750: e1b02080 lsls r2, r0, #1
    80101754: 11b03081 lslsne r3, r1, #1
    80101758: 11320003 teqne r2, r3
    8010175c: 11f0cc42 mvnsne ip, r2, asr #24
    80101760: 11f0cc43 mvnsne ip, r3, asr #24
    80101764: 0a00003c beq 8010185c <__addsf3+0x10c>
    80101768: e1a02c22 lsr r2, r2, #24
    8010176c: e0723c23 rsbs r3, r2, r3, lsr #24
    80101770: c0822003 addgt r2, r2, r3
    80101774: c0201001 eorgt r1, r0, r1
    80101778: c0210000 eorgt r0, r1, r0
    8010177c: c0201001 eorgt r1, r0, r1
    80101780: b2633000 rsblt r3, r3, #0 ; 0x0
    80101784: e3530019 cmp r3, #25 ; 0x19
    80101788: 812fff1e bxhi lr
    8010178c: e3100102 tst r0, #-2147483648 ; 0x80000000
    80101790: e3800502 orr r0, r0, #8388608 ; 0x800000
    80101794: e3c004ff bic r0, r0, #-16777216 ; 0xff000000
    80101798: 12600000 rsbne r0, r0, #0 ; 0x0
    8010179c: e3110102 tst r1, #-2147483648 ; 0x80000000
    801017a0: e3811502 orr r1, r1, #8388608 ; 0x800000
    801017a4: e3c114ff bic r1, r1, #-16777216 ; 0xff000000
    801017a8: 12611000 rsbne r1, r1, #0 ; 0x0
    801017ac: e1320003 teq r2, r3
    801017b0: 0a000023 beq 80101844 <__addsf3+0xf4>
    801017b4: e2422001 sub r2, r2, #1 ; 0x1
    801017b8: e0900351 adds r0, r0, r1, asr r3
    801017bc: e2633020 rsb r3, r3, #32 ; 0x20
    801017c0: e1a01311 lsl r1, r1, r3
    801017c4: e2003102 and r3, r0, #-2147483648 ; 0x80000000
    801017c8: 5a000001 bpl 801017d4 <__addsf3+0x84>
    801017cc: e2711000 rsbs r1, r1, #0 ; 0x0
    801017d0: e2e00000 rsc r0, r0, #0 ; 0x0
    801017d4: e3500502 cmp r0, #8388608 ; 0x800000
    801017d8: 3a00000b bcc 8010180c <__addsf3+0xbc>
    801017dc: e3500401 cmp r0, #16777216 ; 0x1000000
    801017e0: 3a000004 bcc 801017f8 <__addsf3+0xa8>
    801017e4: e1b000a0 lsrs r0, r0, #1
    801017e8: e1a01061 rrx r1, r1
    801017ec: e2822001 add r2, r2, #1 ; 0x1
    801017f0: e35200fe cmp r2, #254 ; 0xfe
    801017f4: 2a00002d bcs 801018b0 <__addsf3+0x160>
    801017f8: e3510102 cmp r1, #-2147483648 ; 0x80000000
    801017fc: e0a00b82 adc r0, r0, r2, lsl #23
    80101800: 03c00001 biceq r0, r0, #1 ; 0x1
    80101804: e1800003 orr r0, r0, r3
    80101808: e12fff1e bx lr
    8010180c: e1b01081 lsls r1, r1, #1
    80101810: e0a00000 adc r0, r0, r0
    80101814: e3100502 tst r0, #8388608 ; 0x800000
    80101818: e2422001 sub r2, r2, #1 ; 0x1
    8010181c: 1afffff5 bne 801017f8 <__addsf3+0xa8>
    80101820: e16fcf10 clz ip, r0
    80101824: e24cc008 sub ip, ip, #8 ; 0x8
    80101828: e052200c subs r2, r2, ip
    8010182c: e1a00c10 lsl r0, r0, ip
    80101830: a0800b82 addge r0, r0, r2, lsl #23
    80101834: b2622000 rsblt r2, r2, #0 ; 0x0
    80101838: a1800003 orrge r0, r0, r3
    8010183c: b1830230 orrlt r0, r3, r0, lsr r2
    80101840: e12fff1e bx lr
    80101844: e3320000 teq r2, #0 ; 0x0
    80101848: e2211502 eor r1, r1, #8388608 ; 0x800000
    8010184c: 02200502 eoreq r0, r0, #8388608 ; 0x800000
    80101850: 02822001 addeq r2, r2, #1 ; 0x1
    80101854: 12433001 subne r3, r3, #1 ; 0x1
    80101858: eaffffd5 b 801017b4 <__addsf3+0x64>
    8010185c: e1a03081 lsl r3, r1, #1
    80101860: e1f0cc42 mvns ip, r2, asr #24
    80101864: 11f0cc43 mvnsne ip, r3, asr #24
    80101868: 0a000013 beq 801018bc <__addsf3+0x16c>
    8010186c: e1320003 teq r2, r3
    80101870: 0a000002 beq 80101880 <__addsf3+0x130>
    80101874: e3320000 teq r2, #0 ; 0x0
    80101878: 01a00001 moveq r0, r1
    8010187c: e12fff1e bx lr
    80101880: e1300001 teq r0, r1
    80101884: 13a00000 movne r0, #0 ; 0x0
    80101888: 112fff1e bxne lr
    8010188c: e31204ff tst r2, #-16777216 ; 0xff000000
    80101890: 1a000002 bne 801018a0 <__addsf3+0x150>
    80101894: e1b00080 lsls r0, r0, #1
    80101898: 23800102 orrcs r0, r0, #-2147483648 ; 0x80000000
    8010189c: e12fff1e bx lr
    801018a0: e2922402 adds r2, r2, #33554432 ; 0x2000000
    801018a4: 32800502 addcc r0, r0, #8388608 ; 0x800000
    801018a8: 312fff1e bxcc lr
    801018ac: e2003102 and r3, r0, #-2147483648 ; 0x80000000
    801018b0: e383047f orr r0, r3, #2130706432 ; 0x7f000000
    801018b4: e3800502 orr r0, r0, #8388608 ; 0x800000
    801018b8: e12fff1e bx lr
    801018bc: e1f02c42 mvns r2, r2, asr #24
    801018c0: 11a00001 movne r0, r1
    801018c4: 01f03c43 mvnseq r3, r3, asr #24
    801018c8: 11a01000 movne r1, r0
    801018cc: e1b02480 lsls r2, r0, #9
    801018d0: 01b03481 lslseq r3, r1, #9
    801018d4: 01300001 teqeq r0, r1
    801018d8: 13800501 orrne r0, r0, #4194304 ; 0x400000
    801018dc: e12fff1e bx lr
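For readers comparing the two listings: the slow __addsf3 unpacks both operands (__unpack_f), adds the unpacked parts (_fpadd_parts) and repacks the result (__pack_f), going through the stack each time, while the fast one works directly on the raw bit patterns in registers. The C sketch below mimics that unpack, add-parts, pack structure for single precision with round-to-nearest-even. It is an illustration under stated assumptions, not the libgcc code: it only handles normal (non-zero, finite, non-denormal) inputs whose sum is normal or exactly zero, and all names (soft_addsf3, unpacked_float, etc.) are hypothetical. The real routines also handle NaN, infinity and overflow; the mvnsne ... asr #24 tests at the top of the fast listing appear to be its exponent-255 special-case checks.

```c
#include <stdint.h>
#include <string.h>

/* Extra low-order bits kept below the 24-bit significand for rounding. */
#define GUARD_BITS 8

/* Hypothetical unpacked single-precision value (illustrative names only).
   ASSUMPTION: normal inputs and a normal (or exactly zero) result; NaN,
   infinity, zero inputs, denormals and overflow are NOT handled. */
typedef struct {
    uint32_t sign;  /* 0 = positive, 1 = negative */
    int32_t  exp;   /* biased exponent, 1..254 for normal numbers */
    uint64_t mant;  /* significand incl. implicit 1, shifted left by GUARD_BITS */
} unpacked_float;

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* "Unpack" step: split the bit pattern into sign, exponent and significand. */
static unpacked_float unpack(uint32_t u)
{
    unpacked_float p;
    p.sign = u >> 31;
    p.exp  = (int32_t)((u >> 23) & 0xFF);
    p.mant = ((uint64_t)(u & 0x7FFFFF) | 0x800000) << GUARD_BITS;
    return p;
}

/* "Add parts" step: align exponents, then add or subtract significands. */
static unpacked_float add_parts(unpacked_float a, unpacked_float b)
{
    unpacked_float r;
    if (a.exp < b.exp) { unpacked_float t = a; a = b; b = t; }

    /* Shift the smaller operand right; any bits shifted out become a sticky 1. */
    int32_t shift = a.exp - b.exp;
    if (shift > 40) {
        b.mant = 1;
    } else if (shift > 0) {
        uint64_t lost = b.mant & ((UINT64_C(1) << shift) - 1);
        b.mant = (b.mant >> shift) | (lost != 0);
    }

    r.exp = a.exp;
    if (a.sign == b.sign) {
        r.sign = a.sign;
        r.mant = a.mant + b.mant;
    } else if (a.mant >= b.mant) {
        r.sign = a.sign;
        r.mant = a.mant - b.mant;
    } else {
        r.sign = b.sign;
        r.mant = b.mant - a.mant;
    }
    return r;
}

/* "Pack" step: normalize, round to nearest-even, rebuild the bit pattern. */
static uint32_t pack(unpacked_float r)
{
    const uint64_t one = UINT64_C(1) << (23 + GUARD_BITS); /* implicit-1 position */
    if (r.mant == 0)
        return 0;                          /* exact cancellation gives +0 */
    while (r.mant >= 2 * one) {            /* carry out of the top: shift right, keep sticky */
        r.mant = (r.mant >> 1) | (r.mant & 1);
        r.exp++;
    }
    while (r.mant < one) {                 /* cancellation: shift left */
        r.mant <<= 1;
        r.exp--;
    }
    uint64_t rem  = r.mant & ((UINT64_C(1) << GUARD_BITS) - 1);
    uint64_t half = UINT64_C(1) << (GUARD_BITS - 1);
    r.mant >>= GUARD_BITS;
    if (rem > half || (rem == half && (r.mant & 1)))
        r.mant++;                          /* round to nearest, ties to even */
    if (r.mant == (UINT64_C(1) << 24)) {   /* rounding overflowed the significand */
        r.mant >>= 1;
        r.exp++;
    }
    return (r.sign << 31) | ((uint32_t)r.exp << 23) | (uint32_t)(r.mant & 0x7FFFFF);
}

/* Hypothetical stand-in for __addsf3 built from the three steps above. */
float soft_addsf3(float x, float y)
{
    return bits_float(pack(add_parts(unpack(float_bits(x)), unpack(float_bits(y)))));
}
```

Each step passes a whole struct around, which is roughly why the generic version needs stack space and extra calls; the hand-written assembly keeps sign, exponent and significand in registers throughout.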
A little more information: the ELF headers of the two binaries record different floating point flags (which would go a long way toward explaining what I'm seeing). The ELF built with the more recent tools shows this:
    $ xscale-elf-readelf -h h.elf
    ELF Header:
      Magic: 7f 45 4c 46 01 01 01 61 00 00 00 00 00 00 00 00
      Class: ELF32
      Data: 2's complement, little endian
      Version: 1 (current)
      OS/ABI: ARM
      ABI Version: 0
      Type: EXEC (Executable file)
      Machine: ARM
      Version: 0x1
      Entry point address: 0x80100000
      Start of program headers: 52 (bytes into file)
      Start of section headers: 448508 (bytes into file)
      Flags: 0x602, has entry point, GNU EABI, software FP, VFP
      Size of this header: 52 (bytes)
      Size of program headers: 32 (bytes)
      Number of program headers: 1
      Size of section headers: 40 (bytes)
      Number of section headers: 25
      Section header string table index: 22
The ELF built with the older tools shows this:

    $ arm-elf-readelf -h h.elf
    ELF Header:
      Magic: 7f 45 4c 46 01 01 01 61 00 00 00 00 00 00 00 00
      Class: ELF32
      Data: 2's complement, little endian
      Version: 1 (current)
      OS/ABI: ARM
      ABI Version: 0
      Type: EXEC (Executable file)
      Machine: ARM
      Version: 0x1
      Entry point address: 0x80100000
      Start of program headers: 52 (bytes into file)
      Start of section headers: 411484 (bytes into file)
      Flags: 0x402, has entry point, GNU EABI, VFP
      Size of this header: 52 (bytes)
      Size of program headers: 32 (bytes)
      Number of program headers: 1
      Size of section headers: 40 (bytes)
      Number of section headers: 26
      Section header string table index: 23
The relevant difference is in the Flags field (0x602 vs. 0x402): the new tools mark the binary "software FP", while the old tools don't.
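To check this flag on other binaries without eyeballing readelf output, here is a minimal C sketch that reads e_flags from byte offset 36 of a little-endian ELF32 header and decodes only the bits involved here. It assumes the ARM flag values from binutils' include/elf/arm.h (EF_ARM_HASENTRY 0x002, EF_ARM_SOFT_FLOAT 0x200, EF_ARM_VFP_FLOAT 0x400, with an EABI version of 0 shown by readelf as "GNU EABI"); with those values 0x602 = 0x400 | 0x200 | 0x002 and 0x402 = 0x400 | 0x002, which matches the readelf output above. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ARM e_flags bits; values ASSUMED to match binutils' include/elf/arm.h. */
#define EF_ARM_HASENTRY   0x00000002u
#define EF_ARM_SOFT_FLOAT 0x00000200u
#define EF_ARM_VFP_FLOAT  0x00000400u
#define EF_ARM_EABIMASK   0xFF000000u  /* EABI version; 0 is shown as "GNU EABI" */

/* Read e_flags (byte offset 36 of an Elf32_Ehdr) from a little-endian ELF32
   file. Returns 0 on success, -1 on any error. */
static int read_elf32_flags(const char *path, uint32_t *flags)
{
    unsigned char hdr[52];                      /* an Elf32_Ehdr is 52 bytes */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr || memcmp(hdr, "\177ELF", 4) != 0
        || hdr[4] != 1 /* ELFCLASS32 */ || hdr[5] != 1 /* ELFDATA2LSB */)
        return -1;
    *flags = (uint32_t)hdr[36] | ((uint32_t)hdr[37] << 8)
           | ((uint32_t)hdr[38] << 16) | ((uint32_t)hdr[39] << 24);
    return 0;
}

/* Append ", item" (or just "item" if buf is empty), truncating safely. */
static void append_item(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);
    snprintf(buf + used, len - used, "%s%s", used ? ", " : "", s);
}

/* Decode only the bits seen in this thread, in the order readelf prints them;
   all other bits are ignored. */
static void describe_arm_flags(uint32_t flags, char *buf, size_t len)
{
    buf[0] = '\0';
    if (flags & EF_ARM_HASENTRY)
        append_item(buf, len, "has entry point");
    if ((flags & EF_ARM_EABIMASK) == 0)
        append_item(buf, len, "GNU EABI");
    if (flags & EF_ARM_SOFT_FLOAT)
        append_item(buf, len, "software FP");
    if (flags & EF_ARM_VFP_FLOAT)
        append_item(buf, len, "VFP");
}
```

Typical use would be `read_elf32_flags("h.elf", &f)` followed by `describe_arm_flags(f, buf, sizeof buf)` and printing buf.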
Now, the processor doesn't have hardware floating point, yet the code runs in both cases, so some kind of software floating point code is being emitted.
TIA, Rick
I've been building tools targeting the Marvell XScale processor a lot lately. A set of tools I built a few months ago seems to generate much faster code on our target hardware than tools I built more recently. There were some significant differences in the way the tools were built, but it doesn't seem like that's enough to explain the difference. Unfortunately, I don't remember exactly how I built the older toolchain, so I'm hoping someone can help me determine what it was by looking at the build result.
Old tools:
    $ arm-elf-gcc -v
    Using built-in specs.
    Target: arm-elf
    Configured with: ../configure --prefix=/usr/local/arm3 --target=arm-elf --with-newlib --with-cpu=xscale --enable-languages=c,c++
    Thread model: single
    gcc version 4.2.1

    $ arm-elf-ld --version
    GNU ld (GNU Binutils) 2.18
How do I tell what version of newlib is installed (I think it's 1.15)?
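One way to answer the newlib question, assuming newlib's newlib.h defines a _NEWLIB_VERSION string macro (I believe releases of that era do) and that newlib's <stdio.h> pulls newlib.h in through <_ansi.h>, is a tiny file compiled with the cross compiler. The helper name is hypothetical. Searching the newlib.h installed under the toolchain's target include directory for _NEWLIB_VERSION should give the same answer without compiling anything.

```c
#include <stddef.h>
#include <stdio.h>  /* ASSUMED: with newlib this includes <_ansi.h>, which includes <newlib.h> */

/* Hypothetical helper: newlib's version string, or NULL when _NEWLIB_VERSION
   is not defined (for example when compiled against a different C library). */
static const char *newlib_version(void)
{
#ifdef _NEWLIB_VERSION
    return _NEWLIB_VERSION;
#else
    return NULL;
#endif
}
```

Since the macro is a compile-time constant, the string can also be found by inspecting the preprocessed output or the object file, which avoids needing working I/O on the target.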
I built them using a multistep process: first binutils, then gcc, then newlib (I don't recall whether I did a stage 1 GCC build first, but somehow I got it all working).
The latest tools are slightly different, and built with a combined tree build:
    gcc-4.2.2
    binutils-2.17
    newlib-1.15
    $ xscale-elf-gcc -v
    Using built-in specs.
    Target: xscale-elf
    Configured with: ../combined/configure --target=xscale-elf --disable-nls --with-newlib --prefix=/usr/local/gcc-xscale-elf --disable-newlib-supplied-syscalls
    Thread model: single
    gcc version 4.2.2
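Besides diffing the two "Configured with:" lines, it may help to compare what each compiler predefines. `arm-elf-gcc -dM -E - < /dev/null` (and the same with xscale-elf-gcc) lists every predefined macro; the small C sketch below instead reports a handful of floating-point and CPU macros from inside a compiled file. The macro list is an assumption: these are names I believe GCC's ARM port defines (__SOFTFP__ for the soft-float ABI, __VFP_FP__ for VFP word order, __XSCALE__, __ARM_ARCH_5TE__), so check it against the -dM output. The trick it relies on is that an undefined identifier stringifies to its own name, while a defined macro stringifies to its value. Function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define STR(x)  #x
#define XSTR(x) STR(x)  /* expand the macro first, then stringify the result */

/* Append " NAME=yes" or " NAME=no" to buf, truncating safely. */
static void append_macro(char *buf, size_t len, const char *name, int defined)
{
    size_t used = strlen(buf);
    snprintf(buf + used, len - used, "%s%s=%s", used ? " " : "", name,
             defined ? "yes" : "no");
}

/* An undefined macro stringifies to its own name, a defined one to its value. */
#define REPORT(buf, len, m) append_macro((buf), (len), #m, strcmp(XSTR(m), #m) != 0)

/* Fill buf with the compiler version and a few ARM-related predefines
   (the macro list is an ASSUMPTION, see the text above). */
static void describe_compiler(char *buf, size_t len)
{
    buf[0] = '\0';
#ifdef __VERSION__
    snprintf(buf, len, "version=\"%s\"", __VERSION__);
#endif
    REPORT(buf, len, __SOFTFP__);
    REPORT(buf, len, __VFP_FP__);
    REPORT(buf, len, __XSCALE__);
    REPORT(buf, len, __ARM_ARCH_5TE__);
}
```

Compiling the same file with both toolchains and comparing the resulting strings (for example in a debugger, or by looking at the preprocessed output) shows whether the two compilers were configured for the same floating point model.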
I'm sorry I can't provide better information, but I'd really like to figure this out. The code doesn't call into the standard C library, but it does use a lot of floating point. Is it possible that the floating point support code is better with the other tools (either built with more optimization, or just generally different)? I don't know; I'm just speculating. It is C++ code (bouncing balls on a screen; the balls are object instances).
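To separate the soft-float routines from the rest of the application, a small stand-in for this kind of workload might help. The sketch below is made up for illustration (the ball struct, constants and physics are hypothetical) and uses only single-precision add, multiply and compare, with no C library calls. Building it with each toolchain, checking with objdump which __addsf3 gets linked in, and timing it with whatever timer the board provides would show whether the floating point helpers alone account for the difference.

```c
/* Hypothetical stand-in for the bouncing-balls workload. */
typedef struct { float x, y, vx, vy; } ball;

#define NBALLS 16
#define WIDTH  320.0f
#define HEIGHT 240.0f

/* Advance every ball `steps` times and return the sum of all positions,
   so the compiler cannot discard the floating point work. */
static float run_balls(int steps)
{
    ball b[NBALLS];
    for (int i = 0; i < NBALLS; i++) {
        b[i].x  = 10.0f + 17.0f * (float)i;
        b[i].y  = 5.0f + 13.0f * (float)i;
        b[i].vx = 1.25f + 0.5f * (float)i;
        b[i].vy = -0.75f + 0.25f * (float)i;
    }
    for (int s = 0; s < steps; s++) {
        for (int i = 0; i < NBALLS; i++) {
            b[i].vy += 0.1f;                          /* gravity */
            b[i].x += b[i].vx;
            b[i].y += b[i].vy;
            if (b[i].x < 0.0f)   { b[i].x = -b[i].x; b[i].vx = -b[i].vx; }
            if (b[i].x > WIDTH)  { b[i].x = 2.0f * WIDTH - b[i].x; b[i].vx = -b[i].vx; }
            if (b[i].y < 0.0f)   { b[i].y = -b[i].y; b[i].vy = -b[i].vy; }
            if (b[i].y > HEIGHT) { b[i].y = 2.0f * HEIGHT - b[i].y; b[i].vy = -0.9f * b[i].vy; }
        }
    }
    float sum = 0.0f;
    for (int i = 0; i < NBALLS; i++)
        sum += b[i].x + b[i].y;
    return sum;
}
```

Because every operation is a plain float add, multiply or compare, on a target without an FPU nearly all of the time should go into calls such as __addsf3, __mulsf3 and the comparison helpers.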
Thanks for any help!
-- Rick
-- For unsubscribe information see http://sourceware.org/lists.html#faq