[PATCH v2.0] Use saturated arithmetic for overflow detection.
Paul Eggert
eggert@cs.ucla.edu
Fri Nov 1 20:44:00 GMT 2013
On 11/01/2013 10:58 AM, Ondřej Bílka wrote:
> I got similar slowdown on core2, nehalem and fx10 machines.
Conversely, I saw a 2x speedup on my platform, an AMD
Deneb (Phenom II X4 910e):
$ gcc -O2 assembly.c && time ./a.out
real 0m2.096s
user 0m2.095s
sys 0m0.001s
$ gcc -O2 branchfree.c && time ./a.out
real 0m1.057s
user 0m1.054s
sys 0m0.002s
> As far as code size is concerned, my assembly has 8 extra bytes
> (jump 2, xor 3, neg 3). When I use the sbb trick from the article,
> I could decrease that to 5.
5 bytes more than what we're doing now,
or 5 bytes more than the branchfree version?
I'm worried about code bloat compared to
what we're doing now.
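The sbb trick mentioned in the quoted text is one branch-free way to saturate on carry. Below is a rough sketch, not the code from this patch. The helper name sat_add is hypothetical. It shows saturating size_t addition in portable C: an overflowing sum pins at SIZE_MAX, so a later malloc (SIZE_MAX) would be expected to fail instead of returning an undersized block. On x86, GCC can often turn the (sum < a) test plus negation into a cmp/sbb pair, but whether it does depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturating size_t addition.
   Returns SIZE_MAX instead of wrapping around on overflow. */
static inline size_t
sat_add (size_t a, size_t b)
{
  size_t sum = a + b;   /* unsigned wraparound is well defined in C */
  /* (sum < a) is 1 exactly when the addition wrapped.  Negating it
     yields either 0 or SIZE_MAX, so the OR either leaves sum unchanged
     or forces it to SIZE_MAX, with no branch in the source.  */
  return sum | -(size_t) (sum < a);
}
```

Saturation is sticky: once a size has reached SIZE_MAX, further sat_add calls keep it there. So a chain such as sat_add (sat_add (hdr, n), pad) needs only a single overflow check at the end (or none, if the final allocation is left to fail on SIZE_MAX).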