This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: IEEE128 binary float to decimal float conversion routines
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Steve Munroe <sjmunroe at us dot ibm dot com>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Michael R Meissner <mrmeissn at us dot ibm dot com>, "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Date: Wed, 18 Nov 2015 23:53:26 +0000
- Subject: Re: IEEE128 binary float to decimal float conversion routines
- Authentication-results: sourceware.org; auth=none
- References: <564A16D5 dot 3020105 at linux dot vnet dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511161803500 dot 30498 at digraph dot polyomino dot org dot uk> <564A6A90 dot 40607 at linux dot vnet dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511162356020 dot 32387 at digraph dot polyomino dot org dot uk> <201511180131 dot tAI1Vs2L023118 at d03av01 dot boulder dot ibm dot com> <alpine dot DEB dot 2 dot 10 dot 1511180144150 dot 2302 at digraph dot polyomino dot org dot uk> <201511182301 dot tAIN1Igc011083 at d03av02 dot boulder dot ibm dot com>
On Wed, 18 Nov 2015, Steve Munroe wrote:
> > The problem I see is with the final "result = temp;" which converts
> double
> > to float.
> >
> > The earlier steps are probably accurate to within 1ulp. But if temp (a
> > double) is half way between two representable float values - while the
> > original argument is very close to that half way value, but not exact -
> > then the final conversion will round to even, which may or may not be
> > correct depending on which side of that double value the original
> > _Decimal128 value was. (Much the same applies in other rounding modes
> > when the double value equals a float value but the original value isn't
> > exactly that float value.)
> >
> Would changing the the decimal to binary conversion to be round to odd,
> offset the following round double to float?
>
> http://www.exploringbinary.com/gcc-avoids-double-rounding-errors-with-round-to-odd/
No, because it would just offload the problem onto getting a conversion
from _Decimal128 to double that is correctly rounded to odd, which is no
easier (indeed, requires more work, not less) than the original problem of
converting to float.
The existing code loses some of the original precision when taking just 15
digits of the mantissa for conversion to double (not OK when you want to
determine the exact value rounded to odd after further operations - in the
hard cases, the final decimal digit will affect the correct rounding).
Then the multiplications / divisions by precomputed powers of 10 use a
table of long double values - while that gives extra precision (though
probably not enough extra precision), it's also incompatible with doing
rounding to odd, since IBM long double doesn't give meaningful "inexact"
exceptions or work in non-default rounding modes, while rounding to odd
requires working in round-to-zero mode and then checking the "inexact"
flag.
> We could look at this if it requires a few additional instructions. But I
> would be very reluctant to resort to heavy handed (and extremely slow)
> solutions to get perfect rounding for a few corner cases.
It is of course possible to achieve IEEE-conforming results by first doing
an approximate conversion with rigorous error bounds, then only doing the
slower conversion if the result of the first conversion was very close to
half way / exact (depending on the rounding mode), within the error bounds
(so only using the slow case rarely, as long as you avoid it in the cases
where the conversion is exact). Cf. the dbl-64 libm functions that do
things like that (and get complaints for the slowness of the slow case,
because they use far more precision than is actually needed for correct
rounding - in the case of conversions it's much easier to determine how
much precision is actually needed). (Now most of those libm functions
don't actually need to be correctly rounded at all - TS 18661-4 defines
separate names such as crexp for correctly rounded functions - whereas
conversions between binary and decimal are defined to be correctly rounded
by both TS 18661-2 and the older TR 24732 specification of C bindings for
decimal floating-point.)
Another issue I see with the implementation: the "Obvious underflow" case
for exponents below -39 includes a substantial part of the subnormal
range, so that decimal values in that range will be wrongly converted to
zero instead of appropriate subnormal floats (so being wildly inaccurate
rather than the incorrect last place of the issue discussed above).
Likewise for truncation to double (trunctddf.c).
--
Joseph S. Myers
joseph@codesourcery.com