Poor performance on software cross-compiled for MinGW

Wed Feb 21 21:08:00 GMT 2007

Bill Gatliff wrote:
> Toralf Lund wrote:
>> I just discovered that the output from my Linux-hosted MinGW gcc 
>> cross compiler has some performance issues. That is, I have some code 
>> that runs 4-5 times faster on Linux (Red Hat Enterprise 4) when built 
>> using the standard compiler there, and using the same options as for 
>> the MinGW build, than the cross-compiled code does on Windows. The 
>> hardware is identical, and the job consists mainly of raw processing, 
>> so I'm inclined to blame it on the compiler rather than OS 
>> differences or similar. I'm not using any -O... flags at this time.
>>
>> Cross compiler version is 3.4.2, with 
>> http://surfnet.dl.sourceforge.net/sourceforge/mingw/gcc-3.4.2-20040916-1-src.diff.gz 
>> applied and otherwise built using the standard procedure, if there is 
>> such a thing. I'll probably write up all the gory details later, but 
>> thought I might send a quick post first just to ask for ideas about 
>> where to start looking for the cause of the performance gap.
>
> If the cross and native compilers are the same versions, then their 
> assembly language output should be virtually identical.  Diff the asm 
> for one of your hotspot functions, and see if there are major 
> differences.
Good idea. I should have thought of that.

Now, doing this I have established that the actual computations are 
identical in the sense that both versions use the same mul/div/sub/add 
instructions, so a different CPU or FPU type setup does not seem to be 
the problem.

On the other hand, there are some differences in the way the stack and 
registers are used around function calls, and I suspect the MinGW 
variant addresses memory in ways that make it somewhat slower.

For instance, these assembly lines from the MinGW C++ compiler:

__ZNK6IMBblk11getPixelRowERSt6vectorIiSaIiEEi:
    pushl    %ebp
    movl    %esp, %ebp
    pushl    %ebx
    subl    $116, %esp
    movl    8(%ebp), %eax
    movl    (%eax), %eax
    addl    $8, %eax
    movl    %eax, (%esp)
    call    __ZNK12IMBreferenceI10IMBblkInfoEptEv
    movl    4(%eax), %eax
    movl    %eax, 4(%esp)
    movl    12(%ebp), %eax
    movl    %eax, (%esp)
    call    __ZNSt6vectorIiSaIiEE6resizeEj
    movl    12(%ebp), %eax
    movl    %eax, (%esp)
    call    __ZNSt6vectorIiSaIiEE5beginEv
    movl    %eax, -12(%ebp)
    movl    8(%ebp), %eax
    movl    %eax, (%esp)
    call    __ZNK6IMBblk11getDataCharEv

have the following Linux equivalent:

_ZNK6IMBblk11getPixelRowERSt6vectorIiSaIiEEi:
.LFB3488:
    pushl    %ebp
.LCFI830:
    movl    %esp, %ebp
.LCFI831:
    pushl    %ebx
.LCFI832:
    subl    $100, %esp
.LCFI833:
    subl    $8, %esp
    subl    $4, %esp
    movl    8(%ebp), %eax
    movl    (%eax), %eax
    addl    $8, %eax
    pushl    %eax
.LCFI834:
    call    _ZNK12IMBreferenceI10IMBblkInfoEptEv
    addl    $8, %esp
    pushl    4(%eax)
    pushl    12(%ebp)
.LCFI835:
    call    _ZNSt6vectorIiSaIiEE6resizeEj
    addl    $16, %esp
    leal    -12(%ebp), %eax
    subl    $8, %esp
    pushl    12(%ebp)
    pushl    %eax
    call    _ZNSt6vectorIiSaIiEE5beginEv
    addl    $12, %esp
    subl    $12, %esp
    pushl    8(%ebp)
    call    _ZNK6IMBblk11getDataCharEv

I think this corresponds to the following lines of code:

void IMBblk::getPixelRow(std::vector<int> &pixels, int row) const
{
  pixels.resize(store->blkInfo->width);

  std::vector<int>::iterator p=pixels.begin();
  char *data=getDataChar();
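
One concrete difference in the listings above is how the iterator from 
pixels.begin() comes back: the MinGW code reads the result from %eax, 
while the Linux code pushes the address of a local slot (leal -12(%ebp)) 
as a hidden extra argument. Below is a minimal sketch for reproducing 
that in isolation. IntIter, first_element and sum_row are hypothetical 
names for illustration, not part of the real code; compiling the file 
with -S under both compilers should show the same pattern, though I have 
not verified that on every setup.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for std::vector<int>::iterator:
// a small class that wraps a single pointer.
struct IntIter {
    int *ptr;
};

// Returns the small class by value, the way vector::begin() does.
// How this value travels back to the caller (in a register or through
// a hidden pointer argument) is decided by the target's conventions.
IntIter first_element(std::vector<int> &v)
{
    IntIter it;
    it.ptr = v.empty() ? 0 : &v[0];
    return it;
}

// Caller that uses the returned value, so the call is not dead code.
long sum_row(std::vector<int> &pixels)
{
    IntIter p = first_element(pixels);
    long total = 0;
    for (std::size_t i = 0; i < pixels.size(); ++i)
        total += p.ptr[i];
    return total;
}
```

Diffing the assembly for first_element and sum_row is a lot easier than 
diffing a whole hotspot function.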

Maybe such differences are to be expected because of platform-specific 
calling conventions etc., though, and I somehow doubt that they explain 
the performance gap. So I guess I have to keep looking...
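
To check how much call overhead like this really costs, one option is 
to time a single function in isolation on both platforms. Here is a 
minimal sketch of such a harness, assuming a simplified stand-in for the 
hotspot: fill_row and time_fill_row are hypothetical names, and the loop 
body is not the real getPixelRow code.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the real hotspot: resizes the output
// vector and converts one row of bytes to ints.
void fill_row(std::vector<int> &pixels, const char *data, std::size_t width)
{
    pixels.resize(width);
    std::vector<int>::iterator p = pixels.begin();
    for (std::size_t i = 0; i < width; ++i, ++p)
        *p = static_cast<unsigned char>(data[i]);
}

// Calls fill_row `repeats` times on a row of `width` bytes (width > 0)
// and returns the elapsed time in seconds as reported by std::clock().
// The checksum is written out so the work cannot be optimised away.
double time_fill_row(std::size_t width, int repeats, long *checksum)
{
    std::vector<char> data(width, 1);
    std::vector<int> pixels;
    long sum = 0;

    std::clock_t start = std::clock();
    for (int r = 0; r < repeats; ++r) {
        fill_row(pixels, &data[0], width);
        sum += pixels[width - 1];
    }
    std::clock_t stop = std::clock();

    if (checksum)
        *checksum = sum;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Building this with the same options on both sides and calling, say, 
time_fill_row(1024, 100000, &sum) from main() should show whether a 
function of this shape reproduces the gap, or whether the cause lies 
elsewhere.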

- T
