This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Paul Eggert <eggert at cs dot ucla dot edu>, libc-alpha at sourceware dot org
- Date: Tue, 5 Nov 2013 11:22:26 +0100
- Subject: Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
- References: <20131030174502 dot GA18107 at domone dot podge> <Pine dot LNX dot 4 dot 64 dot 1310301749400 dot 22878 at digraph dot polyomino dot org dot uk> <20131030183318 dot GA18706 at domone dot podge> <20131101133126 dot GA2546 at domone dot podge> <Pine dot LNX dot 4 dot 64 dot 1311011640530 dot 24652 at digraph dot polyomino dot org dot uk>
On Fri, Nov 01, 2013 at 04:48:38PM +0000, Joseph S. Myers wrote:
> On Fri, 1 Nov 2013, Ondrej Bilka wrote:
>
> > This version adds saturated arithmetic support with optimized x86_64
> > version.
>
> I consider such an x86_64 version to be premature optimization without
> clear benchmark results (speed or code size) to justify it. It would not
> surprise me if use of the overflow flag is slow in some cases and straight
> comparisons would be faster. And using these asm versions prevents the
> compiler from optimizing based on constant arguments.
>
> In my view, we shouldn't optimize this with inline asm at all. For any
> optimizations, work on getting appropriate built-in functions into GCC
> (making sure that they do get optimized there based on constant arguments,
> so that the overflow flag is used only when it's the most efficient
> approach) and then use those functions in glibc (architecture-independent
> file) conditional on the GCC version.
>
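A minimal sketch of the architecture-independent, built-in-based version described above, assuming "saturated" means an overflowing size computation returns SIZE_MAX so that the following malloc fails instead of returning a too-small buffer (glibc's malloc rejects a SIZE_MAX request). The names `sat_mul`, `sat_add` and `alloc_array` are hypothetical and not the patch's interface. `__builtin_mul_overflow` is a real GCC built-in, but it only exists from GCC 5 onward, which postdates this thread; older compilers take the plain-comparison fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: a * b, saturating to SIZE_MAX on overflow.  */
static inline size_t
sat_mul (size_t a, size_t b)
{
#if defined __GNUC__ && __GNUC__ >= 5
  /* GCC built-in; the compiler can fold it for constant arguments.  */
  size_t r;
  if (__builtin_mul_overflow (a, b, &r))
    return SIZE_MAX;
  return r;
#else
  /* Portable fallback using a straight comparison.  */
  if (b != 0 && a > SIZE_MAX / b)
    return SIZE_MAX;
  return a * b;
#endif
}

/* Hypothetical helper: a + b, saturating to SIZE_MAX on overflow.
   Unsigned addition wraps, so a wrapped sum is smaller than A.  */
static inline size_t
sat_add (size_t a, size_t b)
{
  size_t r = a + b;
  return r < a ? SIZE_MAX : r;
}

/* Hypothetical caller: no separate overflow branch is needed, because an
   overflowing product becomes SIZE_MAX and malloc then fails.  */
static void *
alloc_array (size_t n, size_t elem_size)
{
  return malloc (sat_mul (n, elem_size));
}
```

The point of saturation is that the overflow check and the allocation failure path collapse into one: the caller only tests malloc's result.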
I could omit the assembly optimizations, but then please do not argue
about the performance impact. The results are passed to malloc, which is
around a hundred times slower than the overflow check, so the performance
lost is small. If a function allocates memory in two places, you should
merge those allocations into one; that gives a better ratio of
performance gained to complexity added.
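As an illustration of merging two allocations, here is a minimal sketch assuming a header structure followed by a variable-length array. Instead of one malloc for the header and a second for the array, a C99 flexible array member lets both live in a single block, so there is one size computation, one overflow check and one failure path. `struct vec` and `vec_new` are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: a length header followed by N doubles.  */
struct vec
{
  size_t n;
  double data[];   /* flexible array member, allocated together with the header */
};

static struct vec *
vec_new (size_t n)
{
  /* Single combined check: if N fits, header + N elements cannot overflow.
     Otherwise saturate to SIZE_MAX, which malloc rejects.  */
  size_t size = n > (SIZE_MAX - sizeof (struct vec)) / sizeof (double)
                ? SIZE_MAX
                : sizeof (struct vec) + n * sizeof (double);
  struct vec *v = malloc (size);
  if (v == NULL)
    return NULL;
  v->n = n;
  return v;
}
```

A single free (v) releases both the header and the array, which also removes the partial-failure cleanup that two separate allocations would need.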