This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: IEEE128 binary float to decimal float conversion routines
- From: Joseph Myers <joseph@codesourcery.com>
- To: Steven Munroe <munroesj@linux.vnet.ibm.com>
- Cc: Steve Munroe <sjmunroe@us.ibm.com>, "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, Michael R Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
- Date: Tue, 8 Dec 2015 18:25:11 +0000
- Subject: Re: IEEE128 binary float to decimal float conversion routines
- References: <564A16D5.3020105@linux.vnet.ibm.com> <alpine.DEB.2.10.1511161803500.30498@digraph.polyomino.org.uk> <564A6A90.40607@linux.vnet.ibm.com> <alpine.DEB.2.10.1511162356020.32387@digraph.polyomino.org.uk> <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com> <alpine.DEB.2.10.1511180144150.2302@digraph.polyomino.org.uk> <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com> <alpine.DEB.2.10.1511182322260.26547@digraph.polyomino.org.uk> <1449594999.9274.45.camel@oc7878010663>
On Tue, 8 Dec 2015, Steven Munroe wrote:
> The PowerISA (2.05 and later) Decimal Floating-point "Round to Prepare
> for Shorter Precision" mode would not address the Decimal128
> convert/truncate to shorter binary floating-point (double or float).
> But it will address the Float128 convert/truncate to shorter decimal
> floating-point (_Decimal64 and _Decimal32).
Yes, if you have a conversion from _Float128 to _Decimal128 that works for
Round to Prepare for Shorter Precision then you could use that as an
intermediate step in converting to _Decimal64 and _Decimal32 (it's not the
most efficient approach, but it's certainly simpler than having multiple
variants of the full conversion code).
The hardest part is converting from _Float128 to _Decimal128. Once you
can do that (for all rounding modes and with correct exceptions),
converting to the narrower types is easy, whether you have multiple
variants of the same code or use Round to Prepare for Shorter Precision.
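For the second variant, the narrowing wrapper has roughly this shape (a
sketch only, untested; it assumes the libdfp-style fe_dec_getround /
fe_dec_setround interfaces you mention, __kf_to_td is a made-up name for
the full conversion, which must itself be correct in DEC_ROUND_05UP, and
exception handling is elided):

#include <fenv.h>  /* The libdfp variant, declaring fe_dec_getround etc.  */

extern _Decimal128 __kf_to_td (_Float128);  /* The hard conversion.  */

_Decimal64
__kf_to_dd (_Float128 x)
{
  int mode = fe_dec_getround ();      /* Save the caller's DFP rounding mode.  */
  fe_dec_setround (DEC_ROUND_05UP);   /* Round to Prepare for Shorter Precision.  */
  _Decimal128 wide = __kf_to_td (x);  /* The only hard, inexact step.  */
  fe_dec_setround (mode);
  return (_Decimal64) wide;           /* Final rounding in the caller's mode.  */
}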
Likewise for conversions in the other direction - _Decimal128 to _Float128
is the hardest part; if you can do that, then converting to the narrower
types is easy.
> So in the case of TImode or KFmode conversion to _Decimal64/_Decimal32
> we can save the current rounding mode (fe_dec_getround ()), then use
> fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> Shorter Precision" mode before the multiply that converts the mantissa
> to the target radix. Then, just before the instruction that rounds to
> the final (_Decimal64 or _Decimal32) type, we restore the caller's
> rounding mode and execute the final conversion in the correct rounding
> mode.
> I believe that addresses your double rounding concern for these
> conversions.
For TImode it's not hard to avoid double rounding this way, by splitting
the TImode number into two numbers that are exactly convertible to
_Decimal128, so the only inexact operation is a single addition, which can
be done in the Round to Prepare for Shorter Precision mode (and then you
can convert to _Decimal64 / _Decimal32 in the original mode). [In all
cases, getting the preferred quantum for decimal results is a minor matter
to deal with at the end.]
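Concretely, that might look like (a sketch only, untested; __ti_to_dd is
a made-up name, the fe_dec_* interfaces are as in your message, and the
preferred-quantum fixup is ignored):

#include <fenv.h>

_Decimal64
__ti_to_dd (__int128 n)
{
  /* |n| < 2**127 < 10**39.  Split at 10**5 so the high part has at most
     34 significant digits (|q| < 2**127 / 10**5 < 10**34), making both
     parts exactly convertible to _Decimal128.  */
  __int128 q = n / 100000;
  int r = (int) (n % 100000);

  int mode = fe_dec_getround ();
  fe_dec_setround (DEC_ROUND_05UP);
  /* The conversions of q and r are exact, and the multiply by 1e5DL is
     a quantum shift, also exact; the addition is the single inexact
     step, done in Round to Prepare for Shorter Precision.  */
  _Decimal128 wide = (_Decimal128) q * 1e5DL + (_Decimal128) r;
  fe_dec_setround (mode);
  return (_Decimal64) wide;   /* Rounded once in the original mode.  */
}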
For _Float128, this only reduces the problem to doing a conversion of
_Float128 to _Decimal128 in that mode - which is not simply a single
multiply. Not all mantissa values for _Float128 can be represented in
_Decimal128 (2**113 > 10**34), nor can all the powers of 2 that you need
to multiply / divide by. And when you have more than one inexact
operation, the final result is generally not correctly rounded for any
rounding mode. So the complexity goes up massively (compare the fmaf
implementation with round-to-odd on double - a single inexact addition on
double done in round-to-odd, followed by converting back to float in the
original rounding mode - with the sysdeps/ieee754/dbl-64/s_fma.c code,
which also uses round-to-odd, but with far more complexity in order to
achieve the precision extension required for intermediate computations).
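(To make the representability point concrete, this self-contained check:

#include <stdio.h>

int
main (void)
{
  unsigned __int128 p2 = (unsigned __int128) 1 << 113;   /* 2**113 */
  unsigned __int128 p10 = 1;
  for (int i = 0; i < 34; i++)
    p10 *= 10;                                           /* 10**34 */
  /* 2**113 = 10384593717069655257060992658440192, a 35-digit number.  */
  printf ("2**113 %s 10**34\n", p2 > p10 ? ">" : "<=");
  return 0;
}

prints "2**113 > 10**34", so a full 113-bit mantissa does not always fit
in the 34-digit coefficient of _Decimal128.)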
You may well be able to use precision-extension techniques - so doing a
conversion that produces a sum of two or three _Decimal128 values (the
exact number needed being determined by a continued fraction analysis) and
then adding up those values in the Round to Prepare for Shorter Precision
mode. But I'd be surprised if there is a simple and correct
implementation of the conversion that doesn't involve extending
intermediate precision to have about 128 extra bits, given the complexity
and extra precision described in the papers on this subject such as the
one referenced in this thread.
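If that worked out, only the tail end of the conversion is simple
(a sketch; __td_parts_to_dd is a made-up name, and it assumes the
decomposition leaves hi + lo as the only inexact operation):

#include <fenv.h>

_Decimal64
__td_parts_to_dd (_Decimal128 hi, _Decimal128 lo)
{
  int mode = fe_dec_getround ();
  fe_dec_setround (DEC_ROUND_05UP);
  _Decimal128 sum = hi + lo;   /* The single inexact step, rounded 05UP.  */
  fe_dec_setround (mode);
  return (_Decimal64) sum;     /* One final rounding in the caller's mode.  */
}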
> My observation is that a common element of these conversions is a
> large precision multiply (to convert the radix of the mantissa) then a
> possible truncation (with rounding) to the final precision in the new
> radix.
Where large precision means about 256 bits (not simply 128 * 128 -> 256
multiplication, but also having the powers of 2 or 10 to that precision,
so more like 128 * 256 -> 384 which can be truncated to about 256).
Again, exact precisions to be determined by continued fraction analysis.
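The multiply itself is mechanical; in terms of 64-bit limbs it has this
shape (illustrative only - soft-fp would spell it with _FP_FRAC macros
rather than open-coded loops):

#include <stdint.h>

/* Multiply a 128-bit mantissa A (2 little-endian 64-bit limbs) by a
   256-bit constant B (4 limbs), giving a 384-bit result R (6 limbs).  */
static void
mul_128x256 (const uint64_t a[2], const uint64_t b[4], uint64_t r[6])
{
  for (int i = 0; i < 6; i++)
    r[i] = 0;
  for (int i = 0; i < 2; i++)
    {
      uint64_t carry = 0;
      for (int j = 0; j < 4; j++)
        {
          unsigned __int128 t = (unsigned __int128) a[i] * b[j]
                                + r[i + j] + carry;
          r[i + j] = (uint64_t) t;
          carry = (uint64_t) (t >> 64);
        }
      r[i + 4] += carry;
    }
}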
> It seems a simple effort to provide a soft-fp implementation that
> combines the multiply and truncation, without intermediate rounding.
That much is simple (the soft-fp code expects to produce a binary result,
but you could make it produce integer * power of 10 for the conversions to
decimal); cf. the _FP_FMA implementation that does a double-width multiply
plus addition before truncating. You do need to determine the right
intermediate precision and add the implementations of that
extra-precision arithmetic.
> This seems sufficient to address the issues you have raised and seems
> much simpler than wholesale additions of round to odd to the soft-fp
> code.
Adding round-to-odd would be simple enough as well (only a few places
check for particular rounding modes); I just don't think it would help
much. Anything using round-to-odd when working with separate
floating-point operations is better done in the soft-fp context by keeping
the sticky bit and avoiding intermediate roundings (and much the same
applies to Dekker-style precision extension - it makes much less sense
with soft-fp than with hardware floating point).
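That is, the usual soft-fp idiom is a sticky shift rather than a rounded
one - roughly:

#include <stdint.h>

/* Shift X right by SHIFT bits (0 < SHIFT < 64), ORing any discarded
   nonzero bits into the low bit so the single final rounding still sees
   that something was lost (cf. _FP_FRAC_SRS in soft-fp).  */
static inline uint64_t
srl_sticky (uint64_t x, int shift)
{
  uint64_t lost = x & (((uint64_t) 1 << shift) - 1);
  return (x >> shift) | (lost != 0);
}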
Joseph S. Myers