This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] [BZ15384] Enchance finite and isfinite.
- From: Marc Glisse <marc dot glisse at inria dot fr>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Sun, 21 Apr 2013 20:41:33 +0200 (CEST)
- Subject: Re: [RFC] [BZ15384] Enchance finite and isfinite.
- References: <20130421130733 dot GA13954 at domone dot kolej dot mff dot cuni dot cz> <alpine dot DEB dot 2 dot 02 dot 1304211522530 dot 3895 at laptop-mg dot saclay dot inria dot fr> <20130421145745 dot GA15025 at domone dot kolej dot mff dot cuni dot cz>
On Sun, 21 Apr 2013, Ondřej Bílka wrote:
On Sun, Apr 21, 2013 at 03:35:19PM +0200, Marc Glisse wrote:
On Sun, 21 Apr 2013, Ondřej Bílka wrote:
However, on x64 even gcc without optimizations expands finite to an inline
version, which is slower than my version (see benchmark).
This seems to depend on the CPU. Here:
model name : Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz
Cannot duplicate
on Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
and Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz
Could you try running the new version again?
However, on an AMD Phenom(tm) II X6 1090T Processor my results are below.
I fixed a few mistakes in the benchmark; the correct version should be there now.
One problem is that we are affected by gcc bugs, particularly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Funny, when I run your example from comment #2 in that PR, -march=native
helps. On the other hand, -march=native hurts in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57024#c1
Another explanation may be a bug in gcc that aligns loops only to 8
bytes. One implementation can get faster just because it is 16-byte
aligned, so I changed that in the assembly.
That sounds like a good reason.
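To take loop alignment out of such a comparison without editing the assembly, one option is to build every variant with the same GCC flag, for example `-falign-loops=16`. The harness below is a minimal sketch of that idea and not the benchmark from this thread: `count_finite` and `time_passes` are hypothetical names, and it times the standard `isfinite` macro only.

```c
#include <math.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: counts the finite elements of v[0..n).
   Building all variants with, e.g., gcc -O2 -falign-loops=16 keeps
   the loop alignment the same across implementations. */
static size_t count_finite(const double *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += isfinite(v[i]) ? 1 : 0;
    return count;
}

/* Runs `reps` passes over the array, stores the summed counts in *sink
   so the result stays observable, and returns elapsed CPU seconds. */
static double time_passes(const double *v, size_t n, int reps, size_t *sink)
{
    clock_t start = clock();
    size_t total = 0;
    for (int r = 0; r < reps; r++)
        total += count_finite(v, n);
    *sink = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

An optimizing compiler may still hoist or merge the repeated passes, so checking the generated assembly remains necessary before trusting the numbers.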
current
real 0m0.816s
user 0m0.813s
sys 0m0.000s
new # from mail
real 0m0.738s
user 0m0.737s
sys 0m0.000s
opt # with fixed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
real 0m0.703s
user 0m0.700s
sys 0m0.000s
nonzero #from PR
real 0m0.826s
user 0m0.820s
sys 0m0.003s
Different order here:
.81
.72
.76
.64
--
Marc Glisse