[PATCH v2.0] Use saturated arithmetic for overflow detection.

Paul Eggert eggert@cs.ucla.edu
Fri Nov 1 20:44:00 GMT 2013


On 11/01/2013 10:58 AM, Ondřej Bílka wrote:

> I got similar slowdown on core2, nehalem and fx10 machines.

By contrast, I saw about a 2x speedup with the branch-free version
on my platform, an AMD Deneb (Phenom II X4 910e):

   $ gcc -O2 assembly.c && time ./a.out

   real    0m2.096s
   user    0m2.095s
   sys    0m0.001s
   $ gcc -O2 branchfree.c && time ./a.out

   real    0m1.057s
   user    0m1.054s
   sys    0m0.002s


> As code size is concerned my assembly has 8 extra bytes
> (jump 2, xor 3, neg 3).  When I use sbb trick from article
> I could decrease that to 5.

Is that 5 bytes more than what we're doing now,
or 5 bytes more than the branchfree version?
I'm worried about code bloat compared to
what we're doing now.
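
For anyone following along without the earlier messages, here is a
minimal sketch of the two styles of saturating arithmetic being compared.
It is an illustration only, not the contents of assembly.c or
branchfree.c: the helper names sat_add_branch and sat_add_branchfree are
hypothetical, and I use addition on size_t to keep it short. Whether the
compiler actually emits a jump, a conditional move, or an sbb-style
sequence for either function depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturating add written with an explicit branch. */
static size_t sat_add_branch(size_t a, size_t b)
{
    size_t sum = a + b;      /* unsigned wraparound is well defined */
    if (sum < a)             /* the sum wrapped, so the add overflowed */
        return SIZE_MAX;     /* saturate */
    return sum;
}

/* Hypothetical helper: the same result computed without a branch
   in the source.  The comparison yields 0 or 1; negating it as a
   size_t gives 0 or all ones, which is then OR-ed into the sum. */
static size_t sat_add_branchfree(size_t a, size_t b)
{
    size_t sum = a + b;
    size_t mask = -(size_t)(sum < a);  /* all ones on overflow, else 0 */
    return sum | mask;
}
```

A caller can then treat a SIZE_MAX result as "overflowed" with a single
check at the end of a size computation, rather than testing after every
step.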


