This is the mail archive of the cygwin-patches mailing list for the Cygwin project.
Re: [PATCH v2 1/4] Cygwin: console: Add workaround for broken IL/DL in xterm mode.
On Sun, 1 Mar 2020 14:56:31 +0100
Hans-Bernhard Bröker wrote:
> On 01.03.2020 at 07:33, Takashi Yano wrote:
>
> > However, from the viewpoint of performance, a plain inline
> > static function is better.
>
> I don't see how that could be the case. Inline methods of a static C++
> object should not suffer any performance penalty compared to inline
> functions operating on static variables.
>
> > Attached code measures the
> > performance of access speed for wpbuf.
> > I compiled it by g++ 7.4.0 with -O2 option.
> >
> > The result is as follows.
> >
> > Total1: 2.315627 second
> > Total2: 1.588511 second
> > Total3: 1.571572 second
>
> Strange. The result here (with GCC 9.2) is rather different:
>
> $ g++ -O2 -o tt wpbuf-bench.cc && ./tt
> Total1: 0.753815 second
> Total2: 0.757444 second
> Total3: 1.217352 second
>
> And on inspection, all three bench*() functions do appear to have
> exactly the same machine code, too. They may be inlined and mixed into
> main() somewhat differently, though. That might explain the difference
> more readily than any actual difference in speed between the three
> implementations.
I looked into the code generated by g++ 7.4.0 with -O2. The generated
code differs between the functions.
With the 32-bit compiler:
bench1():
L3:
cmpl $255, %edx
jg L2
movb $65, _wpbuf(%edx)
movl $1, %ecx
addl $1, %edx
L2:
subl $1, %eax
[...]
bench2(), bench3():
L22:
cmpl $255, %edx
jg L21
movb $65, _wpbuf2(%edx)
addl $1, %edx
L21:
subl $1, %eax
[...]
With the 64-bit compiler:
bench1():
.L3:
cmpl $255, %edx
jg .L2
movslq %edx, %rcx
addl $1, %edx
movb $65, (%r8,%rcx)
movl $1, %ecx
.L2:
subl $1, %eax
[...]
bench2(), bench3():
.L15:
cmpl $255, %edx
jg .L14
movslq %edx, %rcx
addl $1, %edx
movb $65, (%r8,%rcx)
.L14:
subl $1, %eax
[...]
In both cases, the loop for bench2() and bench3() is one instruction
shorter than the loop for bench1(), which has an extra
"movl $1, %ecx".
However, g++ 9.2.0 with -O2 generates:
bench1(), bench2(), bench3():
L3:
cmpl $255, %edx
jg L2
movb $65, _wpbuf(%edx)
addl $1, %edx
L2:
subl $1, %eax
[...]
The code is exactly the same for all three functions, as you mentioned.
So, assuming g++ 9.2.0, please disregard my previous remarks
about speed.
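
For readers without the attachment, the following is a minimal sketch of
the kind of comparison discussed in this thread: an inline method of a
static C++ object versus an inline static function operating on static
variables. It is not the attached wpbuf-bench.cc. The names WpBuf,
wpbuf_obj, wpbuf_raw, wpixput, bench_object() and bench_static() are
hypothetical, and the buffer length of 256 is only inferred from the
"cmpl $255" bound in the listings above.

```cpp
// Hypothetical sketch only; NOT the attached wpbuf-bench.cc.
#include <cassert>

// Assumed buffer length, inferred from the `cmpl $255` bound in the listings.
constexpr int WPBUF_LEN = 256;

// Variant A: a static object with inline member functions.
class WpBuf {
  char buf[WPBUF_LEN];
  int ix = 0;
public:
  // Append one byte; silently drop it when the buffer is full.
  inline void put(char c) { if (ix < WPBUF_LEN) buf[ix++] = c; }
  inline void empty() { ix = 0; }
  inline int size() const { return ix; }
};
static WpBuf wpbuf_obj;

// Variant B: static variables plus an inline static function.
static char wpbuf_raw[WPBUF_LEN];
static int wpixput = 0;
static inline void wpbuf_put(char c) {
  if (wpixput < WPBUF_LEN) wpbuf_raw[wpixput++] = c;
}

// Fill through the object; returns the number of bytes stored.
int bench_object(long iterations) {
  wpbuf_obj.empty();
  for (long i = 0; i < iterations; i++) wpbuf_obj.put('A');
  return wpbuf_obj.size();
}

// Fill through the static function; returns the number of bytes stored.
int bench_static(long iterations) {
  wpixput = 0;
  for (long i = 0; i < iterations; i++) wpbuf_put('A');
  return wpixput;
}
```

Compiling a file like this with "g++ -O2 -S" and comparing the two loops
is one way to check whether a given compiler version emits the same code
for both variants. Timing results alone can be misleading, since they
also depend on how the functions are inlined into the caller.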
--
Takashi Yano <takashi.yano@nifty.ne.jp>