This is the mail archive of the
`libc-alpha@sourceware.org`
mailing list for the glibc project.


*From*: Ondřej Bílka <neleai at seznam dot cz>
*To*: Paul Eggert <eggert at cs dot ucla dot edu>
*Cc*: libc-alpha at sourceware dot org
*Date*: Sat, 2 Nov 2013 10:25:09 +0100
*Subject*: Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
*Authentication-results*: sourceware.org; auth=none
*References*: <20131030174502 dot GA18107 at domone dot podge> <Pine dot LNX dot 4 dot 64 dot 1310301749400 dot 22878 at digraph dot polyomino dot org dot uk> <20131030183318 dot GA18706 at domone dot podge> <20131101133126 dot GA2546 at domone dot podge> <5273E29D dot 90000 at cs dot ucla dot edu> <20131101175802 dot GA5471 at domone dot podge> <527412A1 dot 8080707 at cs dot ucla dot edu> <20131101215358 dot GA7000 at domone dot podge> <527442F2 dot 3040705 at cs dot ucla dot edu>

On Fri, Nov 01, 2013 at 05:10:26PM -0700, Paul Eggert wrote:
> Ondřej Bílka wrote:
> > Weird as I cannot get these on athlon X2 and phenom X6. As one iteration takes
> > 2.096 * 2600 / 340 = 16 cycles a slowdown is 8 cycles which is hard to explain.
> >
> > I attached binaries which were used to test (gcc version 4.4.5 (Debian 4.4.5-8))
>
> Yes, it's weird. I built binaries on Ubuntu 13.10 and ran them
> on the Deneb machine in question and the times were identical.
>
> Even stranger: the 2x difference came when I was using a GCC 4.8.2
> that I built myself, unmodified from the sources. When I switched
> to the system-supplied Fedora 19 GCC 4.8.2 20131017 (Red Hat 4.8.2-1),
> the performance difference went away.
>
> Could be a caching thing, I suppose. I wouldn't worry about it too much.
>

I found that I forgot to mark rdx as clobbered in the branch-free version, which probably caused that. The correct assembly is the following:

    size_t scratch;
    asm ("mul %%rdx; sbb %%rdx, %%rdx; or %%rdx, %%rax"
         : "=a" (ret), "=d" (scratch)
         : "a" (x), "d" (y));

> > overhead is versus version with no checking.
>
> The measurement I'd like to see is how much does it bloat
> the code compared to the way we're doing it now, namely,
>
> p = malloc (add_sat (mul_sat (a, b), c));
>

Checking takes 10 bytes in the branch version and 12 bytes in the branch-free version.

> versus
>
> p = a <= (SIZE_MAX - c) / b ? malloc (a * b + c) : (errno=ENOMEM, NULL);
>
> or perhaps this would be a better comparison:
>
> p = malloc (a <= (SIZE_MAX - c) / b ? a * b + c : SIZE_MAX);
>

Here the check as generated by gcc takes 19 bytes, mainly because the moves of SIZE_MAX and (SIZE_MAX - c) / b each take 7 bytes.

> or, if we're going to go off the deep end tuning anyway, perhaps
> we should have a muladd_s primitive that does multiply *and* add!
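For readers following the thread, here is a minimal portable sketch of the `malloc (add_sat (mul_sat (a, b), c))` idiom discussed above. It is not the code from the patch: the names `mul_sat` and `add_sat` come from the quoted line, but the bodies are plain C with branches rather than the inline assembly shown above, and `alloc_array` is a hypothetical wrapper added only for illustration. The idea is that an overflowing size clamps to `SIZE_MAX`, a request that `malloc` is expected to refuse.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply two sizes, clamping to SIZE_MAX when the product would overflow.
   Portable stand-in for the asm version; assumed semantics, not the patch. */
static inline size_t mul_sat(size_t x, size_t y)
{
    if (y != 0 && x > SIZE_MAX / y)
        return SIZE_MAX;
    return x * y;
}

/* Add two sizes, clamping to SIZE_MAX when the sum would wrap around. */
static inline size_t add_sat(size_t x, size_t y)
{
    size_t sum = x + y;          /* unsigned wraparound is well defined */
    return sum < x ? SIZE_MAX : sum;
}

/* Hypothetical wrapper: allocate a * b + c bytes. On overflow the size
   saturates to SIZE_MAX, which malloc is expected to reject with NULL. */
static inline void *alloc_array(size_t a, size_t b, size_t c)
{
    return malloc(add_sat(mul_sat(a, b), c));
}
```

The caller needs no separate overflow branch: a saturated size flows through the normal allocation-failure path, which is the code-size argument being measured in this message.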

**Follow-Ups**: **Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Paul Eggert

**References**: **[PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Ondřej Bílka

**Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Paul Eggert

**Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Ondřej Bílka

**Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Paul Eggert

**Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Ondřej Bílka

**Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.** *From:* Paul Eggert
