This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: IEEE128 binary float to decimal float conversion routines

From: Joseph Myers <joseph at codesourcery dot com>
To: Steve Munroe <sjmunroe at us dot ibm dot com>
Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Michael R Meissner <mrmeissn at us dot ibm dot com>, "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
Date: Wed, 18 Nov 2015 23:53:26 +0000
Subject: Re: IEEE128 binary float to decimal float conversion routines
Authentication-results: sourceware.org; auth=none
References: <564A16D5 dot 3020105 at linux dot vnet dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511161803500 dot 30498 at digraph dot polyomino dot org dot uk> <564A6A90 dot 40607 at linux dot vnet dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511162356020 dot 32387 at digraph dot polyomino dot org dot uk> <201511180131 dot tAI1Vs2L023118 at d03av01 dot boulder dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511180144150 dot 2302 at digraph dot polyomino dot org dot uk> <201511182301 dot tAIN1Igc011083 at d03av02 dot boulder dot ibm dot com>

On Wed, 18 Nov 2015, Steve Munroe wrote:

> > The problem I see is with the final "result = temp;" which converts
> double
> > to float.
> >
> > The earlier steps are probably accurate to within 1ulp.  But if temp (a
> > double) is half way between two representable float values - while the
> > original argument is very close to that half way value, but not exact -
> > then the final conversion will round to even, which may or may not be
> > correct depending on which side of that double value the original
> > _Decimal128 value was.  (Much the same applies in other rounding modes
> > when the double value equals a float value but the original value isn't
> > exactly that float value.)
> >
> Would changing the the decimal to binary conversion to be round to odd,
> offset the following round double to float?
> 
> http://www.exploringbinary.com/gcc-avoids-double-rounding-errors-with-round-to-odd/

No, because it would just offload the problem onto getting a conversion 
from _Decimal128 to double that is correctly rounded to odd, which is no 
easier (indeed, requires more work, not less) than the original problem of 
converting to float.

The existing code loses some of the original precision when taking just 15 
digits of the mantissa for conversion to double (not OK when you want to 
determine the exact value rounded to odd after further operations - in the 
hard cases, the final decimal digit will affect the correct rounding).  
Then the multiplications / divisions by precomputed powers of 10 use a 
table of long double values - while that gives extra precision (though 
probably not enough extra precision), it's also incompatible with doing 
rounding to odd, since IBM long double doesn't give meaningful "inexact" 
exceptions or work in non-default rounding modes, while rounding to odd 
requires working in round-to-zero mode and then checking the "inexact" 
flag.

> We could look at this if it requires a few additional instructions. But I
> would be very reluctant to resort to heavy handed (and extremely slow)
> solutions to get perfect rounding for a few corner cases.

It is of course possible to achieve IEEE-conforming results by first doing 
an approximate conversion with rigorous error bounds, then only doing the 
slower conversion if the result of the first conversion was very close to 
half way / exact (depending on the rounding mode), within the error bounds 
(so only using the slow case rarely, as long as you avoid it in the cases 
where the conversion is exact).  Cf. the dbl-64 libm functions that do 
things like that (and get complaints for the slowness of the slow case, 
because they use far more precision than is actually needed for correct 
rounding - in the case of conversions it's much easier to determine how 
much precision is actually needed).  (Now most of those libm functions 
don't actually need to be correctly rounded at all - TS 18661-4 defines 
separate names such as crexp for correctly rounded functions - whereas 
conversions between binary and decimal are defined to be correctly rounded 
by both TS 18661-2 and the older TR 24732 specification of C bindings for 
decimal floating-point.)

Another issue I see with the implementation: the "Obvious underflow" case 
for exponents below -39 includes a substantial part of the subnormal 
range, so that decimal values in that range will be wrongly converted to 
zero instead of appropriate subnormal floats (so being wildly inaccurate 
rather than the incorrect last place of the issue discussed above).  
Likewise for truncation to double (trunctddf.c).

-- 
Joseph S. Myers
joseph@codesourcery.com

References:
- IEEE128 binary float to decimal float conversion routines
  - From: Paul E. Murphy
- Re: IEEE128 binary float to decimal float conversion routines
  - From: Joseph Myers
- Re: IEEE128 binary float to decimal float conversion routines
  - From: Paul E. Murphy
- Re: IEEE128 binary float to decimal float conversion routines
  - From: Joseph Myers
- Re: IEEE128 binary float to decimal float conversion routines
  - From: Joseph Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]