This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.


On Fri, Nov 01, 2013 at 05:10:26PM -0700, Paul Eggert wrote:
> Ondřej Bílka wrote:
> > Weird as I cannot get these on athlon X2 and phenom X6. As one iteration takes
> > 2.096 * 2600 / 340 = 16 cycles a slowdown is 8 cycles which is hard to explain.
> > 
> > I attached binaries which were used to test (gcc version 4.4.5 (Debian 4.4.5-8))
> 
> Yes, it's weird.  I built binaries on Ubuntu 13.10 and ran them
> on the Deneb machine in question and the times were identical.
> 
> Even stranger: the 2x difference came when I was using a GCC 4.8.2
> that I built myself, unmodified from the sources.  When I switched
> to the system-supplied Fedora 19 GCC 4.8.2 20131017 (Red Hat 4.8.2-1),
> the performance difference went away.
> 
> Could be a caching thing, I suppose.  I wouldn't worry about it too much.
>
I found that I had forgotten to mark rdx as clobbered in the branch-free
version, which probably caused that. The correct assembly follows.

  size_t scratch;
  asm ("mul %%rdx; sbb %%rdx, %%rdx; or %%rdx, %%rax"
       : "=a" (ret), "=d" (scratch) : "a" (x), "d" (y));
 
> > overhead is versus version with no checking.
> 
> The measurement I'd like to see is how much does it bloat
> the code compared to the way we're doing it now, namely,
> 
>    p = malloc (add_sat (mul_sat (a, b), c));
> 
Checking takes 10 bytes in the branch version and 12 bytes in the
branch-free version.
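
For reference, the two variants being compared can be sketched as
follows. This is only an illustration, not the code that was measured:
the names mul_sat_branch and mul_sat_branchfree are made up here, and
the non-asm paths assume __builtin_mul_overflow, which needs GCC 5 or
later (or clang) and so is newer than the compilers used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch version (hypothetical name): test the overflow flag and
   return SIZE_MAX when the product does not fit in size_t.  */
size_t
mul_sat_branch (size_t x, size_t y)
{
  size_t r;
  if (__builtin_mul_overflow (x, y, &r))
    return SIZE_MAX;
  return r;
}

/* Branch-free version (hypothetical name).  On x86-64, mul sets the
   carry flag when the high half of the product is nonzero, sbb turns
   the carry into 0 or an all-ones mask, and or forces the result to
   SIZE_MAX on overflow.  rdx is overwritten, so it is listed as an
   output.  */
size_t
mul_sat_branchfree (size_t x, size_t y)
{
#if defined __GNUC__ && defined __x86_64__ && !defined __ILP32__
  size_t ret, scratch;
  __asm__ ("mul %%rdx; sbb %%rdx, %%rdx; or %%rdx, %%rax"
           : "=a" (ret), "=d" (scratch) : "a" (x), "d" (y) : "cc");
  return ret;
#else
  /* Fallback for other targets: build the same mask in C.  */
  size_t r;
  size_t overflow = __builtin_mul_overflow (x, y, &r);
  return r | -overflow;
#endif
}

/* Both variants must agree on ordinary and overflowing inputs.  */
void
mul_sat_selfcheck (void)
{
  assert (mul_sat_branch (3, 5) == mul_sat_branchfree (3, 5));
  assert (mul_sat_branch (SIZE_MAX, 2) == mul_sat_branchfree (SIZE_MAX, 2));
  assert (mul_sat_branch (0, SIZE_MAX) == mul_sat_branchfree (0, SIZE_MAX));
}
```

The byte counts above refer to the hand-written assembly; what a
compiler emits for this sketch may differ.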

> versus
> 
>    p = a <= (SIZE_MAX - c) / b ? malloc (a * b + c) : (errno=ENOMEM, NULL);
>
> or perhaps this would be a better comparison:
> 
>    p = malloc (a <= (SIZE_MAX - c) / b ? a * b + c : SIZE_MAX);
>
Here the checking as compiled by gcc takes 19 bytes, mainly because
the moves of SIZE_MAX and (SIZE_MAX - c) / b each take 7 bytes.
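
To make the comparison concrete, here is a minimal sketch of the two
size computations side by side. The helper bodies are assumptions
written in portable C, not the implementations from the patch, and
size_saturated and size_divided are names invented for this example.
Note that the division form is only defined for b != 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed portable definitions of the saturating helpers.  */
size_t
add_sat (size_t x, size_t y)
{
  size_t s = x + y;             /* wraps around on overflow */
  return s < x ? SIZE_MAX : s;
}

size_t
mul_sat (size_t x, size_t y)
{
  return (y != 0 && x > SIZE_MAX / y) ? SIZE_MAX : x * y;
}

/* Saturated form: the size passed to malloc is SIZE_MAX on overflow.  */
size_t
size_saturated (size_t a, size_t b, size_t c)
{
  return add_sat (mul_sat (a, b), c);
}

/* Division form from the quoted message; the caller must ensure
   b != 0, otherwise the division is undefined.  */
size_t
size_divided (size_t a, size_t b, size_t c)
{
  return a <= (SIZE_MAX - c) / b ? a * b + c : SIZE_MAX;
}

/* The two forms should produce the same request size.  */
void
size_selfcheck (void)
{
  assert (size_saturated (10, 4, 2) == size_divided (10, 4, 2));
  assert (size_saturated (SIZE_MAX, 2, 0) == size_divided (SIZE_MAX, 2, 0));
  assert (size_saturated (SIZE_MAX / 2, 2, 2)
          == size_divided (SIZE_MAX / 2, 2, 2));
}
```

Either result would then be passed as malloc (size), relying on malloc
to fail for a SIZE_MAX request.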

 
> or, if we're going to go off the deep end tuning anyway, perhaps
> we should have a muladd_s primitive that does multiply *and* add!
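
A combined primitive along those lines might look like the sketch
below. The signature is a guess, since none was proposed in the
message, and the body assumes the __builtin_mul_overflow and
__builtin_add_overflow builtins from GCC 5 or later (or clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical muladd_s: returns a * b + c, or SIZE_MAX if either the
   multiplication or the addition overflows size_t.  */
size_t
muladd_s (size_t a, size_t b, size_t c)
{
  size_t r;
  int overflow = __builtin_mul_overflow (a, b, &r);
  overflow |= __builtin_add_overflow (r, c, &r);
  return overflow ? SIZE_MAX : r;
}

/* A few sanity checks on ordinary and overflowing inputs.  */
void
muladd_s_selfcheck (void)
{
  assert (muladd_s (10, 4, 2) == 42);
  assert (muladd_s (SIZE_MAX, 2, 0) == SIZE_MAX);
  assert (muladd_s (SIZE_MAX, 1, 1) == SIZE_MAX);
}
```

A call site would then read p = malloc (muladd_s (a, b, c));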


