Poor performance on software cross-compiled for MinGW
Toralf Lund
toralf@procaptura.com
Wed Feb 21 21:08:00 GMT 2007
Bill Gatliff wrote:
> Toralf Lund wrote:
>> I just discovered that the output from my Linux-hosted MinGW gcc
>> cross compiler has some performance issues. That is, I have some code
>> that runs 4-5 times faster on Linux (Red Hat Enterprise 4) when built
>> using the standard compiler there, and using the same options as for
>> the MinGW build, than the cross-compiled code does on Windows. The
>> hardware is identical, and the job consists mainly of raw processing,
>> so I'm inclined to blame it on the compiler rather than OS
>> differences or similar. I'm not using any -O... flags at this time.
>>
>> Cross compiler version is 3.4.2, with
>> http://surfnet.dl.sourceforge.net/sourceforge/mingw/gcc-3.4.2-20040916-1-src.diff.gz
>> applied and otherwise built using the standard procedure, if there is
>> such a thing. I'll probably write up all the gory details later, but
>> thought I might send a quick post first just to ask for ideas about
>> where to start looking for the cause of the performance gap.
>
> If the cross and native compilers are the same versions, then their
> assembly language output should be virtually identical. Diff the asm
> for one of your hotspot functions, and see if there are major
> differences.
Good idea. I should have thought of that.
Now, doing this I have established that the actual computations are
identical in the sense that both versions use the same mul/div/sub/add
instructions - so a different CPU or FPU type setup does not seem to be
the problem.
On the other hand, there are some differences in the way the stack and
registers are used on function calls etc., and I think maybe the MinGW
variant addresses memory in ways that will make it somewhat slower.
For instance, these assembly lines from the MinGW C++ compiler:
__ZNK6IMBblk11getPixelRowERSt6vectorIiSaIiEEi:
        pushl %ebp
        movl %esp, %ebp
        pushl %ebx
        subl $116, %esp
        movl 8(%ebp), %eax
        movl (%eax), %eax
        addl $8, %eax
        movl %eax, (%esp)
        call __ZNK12IMBreferenceI10IMBblkInfoEptEv
        movl 4(%eax), %eax
        movl %eax, 4(%esp)
        movl 12(%ebp), %eax
        movl %eax, (%esp)
        call __ZNSt6vectorIiSaIiEE6resizeEj
        movl 12(%ebp), %eax
        movl %eax, (%esp)
        call __ZNSt6vectorIiSaIiEE5beginEv
        movl %eax, -12(%ebp)
        movl 8(%ebp), %eax
        movl %eax, (%esp)
        call __ZNK6IMBblk11getDataCharEv
have the following Linux equivalent:
_ZNK6IMBblk11getPixelRowERSt6vectorIiSaIiEEi:
.LFB3488:
        pushl %ebp
.LCFI830:
        movl %esp, %ebp
.LCFI831:
        pushl %ebx
.LCFI832:
        subl $100, %esp
.LCFI833:
        subl $8, %esp
        subl $4, %esp
        movl 8(%ebp), %eax
        movl (%eax), %eax
        addl $8, %eax
        pushl %eax
.LCFI834:
        call _ZNK12IMBreferenceI10IMBblkInfoEptEv
        addl $8, %esp
        pushl 4(%eax)
        pushl 12(%ebp)
.LCFI835:
        call _ZNSt6vectorIiSaIiEE6resizeEj
        addl $16, %esp
        leal -12(%ebp), %eax
        subl $8, %esp
        pushl 12(%ebp)
        pushl %eax
        call _ZNSt6vectorIiSaIiEE5beginEv
        addl $12, %esp
        subl $12, %esp
        pushl 8(%ebp)
        call _ZNK6IMBblk11getDataCharEv
I think this corresponds to the following lines of code:
void IMBblk::getPixelRow(std::vector<int> &pixels, int row) const
{
    pixels.resize(store->blkInfo->width);
    std::vector<int>::iterator p=pixels.begin();
    char *data=getDataChar();
Maybe you expect such differences due to platform-specific calling
conventions etc., though, and I somehow doubt that they explain the
performance gap. So I guess I have to keep looking...
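
One way to keep looking might be to time the function in isolation on
both platforms, so that the rest of the job is taken out of the
comparison. The following is only a rough sketch of that idea:
FakeBlock and timeGetPixelRow are made-up names, and FakeBlock is a
hypothetical stand-in with the same call pattern (resize, begin, copy
one row), not the real IMBblk class, which is not shown here.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for IMBblk; the real class is not shown in this post.
class FakeBlock {
public:
    // Expects width and height to be greater than zero.
    FakeBlock(std::size_t width, std::size_t height)
        : width_(width), data_(width * height, 1) {}

    // Same shape as IMBblk::getPixelRow: resize the output, then copy one row.
    void getPixelRow(std::vector<int> &pixels, int row) const
    {
        pixels.resize(width_);
        std::vector<int>::iterator p = pixels.begin();
        const char *data = &data_[0] + static_cast<std::size_t>(row) * width_;
        for (std::size_t i = 0; i < width_; ++i)
            *p++ = data[i];
    }

private:
    std::size_t width_;
    std::vector<char> data_;
};

// Calls getPixelRow for every row, 'iterations' times over, and returns the
// elapsed time in seconds as reported by std::clock(). 'checksum' is derived
// from the output so that the calls cannot be optimised away.
double timeGetPixelRow(const FakeBlock &blk, int rows, int iterations,
                       long &checksum)
{
    std::vector<int> pixels;
    checksum = 0;
    std::clock_t start = std::clock();
    for (int it = 0; it < iterations; ++it) {
        for (int r = 0; r < rows; ++r) {
            blk.getPixelRow(pixels, r);
            checksum += pixels[0];
        }
    }
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling timeGetPixelRow from a small main() built with both compilers
and the same flags should show whether the gap is already visible at
this level, or whether it only appears somewhere else in the job.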
- T