This is the mail archive of the
`libc-alpha@sourceware.org`
mailing list for the glibc project.


*From*: Joseph Myers <joseph@codesourcery.com>
*To*: Steven Munroe <munroesj@linux.vnet.ibm.com>
*Cc*: Steve Munroe <sjmunroe@us.ibm.com>, libc-alpha@sourceware.org, Michael R Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
*Date*: Tue, 8 Dec 2015 18:25:11 +0000
*Subject*: Re: IEEE128 binary float to decimal float conversion routines

On Tue, 8 Dec 2015, Steven Munroe wrote:

> The PowerISA (2.05 and later) Decimal Floating-Point "Round to Prepare
> for Shorter Precision" mode would not address the Decimal128
> convert/truncate to shorter binary floating-point (double or float).
>
> But it will address the Float128 convert/truncate to shorter decimal
> floating-point (_Decimal64 and _Decimal32).

Yes, if you have a conversion from _Float128 to _Decimal128 that works for Round to Prepare for Shorter Precision then you could use that as an intermediate step in converting to _Decimal64 and _Decimal32 (it's not the most efficient approach, but it's certainly simpler than having multiple variants of the full conversion code).

The hardest part is converting from _Float128 to _Decimal128. Once you can do that (for all rounding modes and with correct exceptions), converting to the narrower types is easy, whether you have multiple variants of the same code or use Round to Prepare for Shorter Precision. Likewise for conversions in the other direction: _Decimal128 to _Float128 is the hardest part; if you can do that, then converting to narrower types is straightforward.

> So in the case of TImode or KFmode conversion to _Decimal64/_Decimal32
> we can save the current rounding mode (fe_dec_getround()), then use
> fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> Shorter Precision" mode before the multiply that converts the mantissa
> to the target radix. Then, just before the instruction that rounds to
> the final (_Decimal64 or _Decimal32) type, we restore the caller's
> rounding mode and execute the final conversion in the correct rounding
> mode.
>
> I believe that addresses your double-rounding concern for these
> conversions.
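[The save/set/restore pattern described above can be illustrated with Python's `decimal` module, whose `ROUND_05UP` mode matches the "Round to Prepare for Shorter Precision" behavior; the function names and digit counts here are illustrative stand-ins, not glibc code. A wide result computed under `ROUND_05UP` can then be rounded to a shorter precision in the caller's mode without a double-rounding error:]

```python
from decimal import Decimal, Context, ROUND_05UP, ROUND_HALF_EVEN
import random

def double_round(x, mid_digits, final_digits):
    # First round to mid_digits under ROUND_05UP ("prepare for shorter
    # precision"), then to final_digits in the caller's mode (here,
    # round-to-nearest-even).  Safe when mid_digits >= final_digits + 2.
    mid = Context(prec=mid_digits, rounding=ROUND_05UP)
    fin = Context(prec=final_digits, rounding=ROUND_HALF_EVEN)
    return fin.plus(mid.plus(x))

def single_round(x, final_digits):
    # Reference: round the exact value once, directly.
    fin = Context(prec=final_digits, rounding=ROUND_HALF_EVEN)
    return fin.plus(x)

# The two paths agree: ROUND_05UP never produces an intermediate ending
# in 0 or 5 unless the result is exact, so the final rounding never sees
# a spurious tie or a spuriously exact value.
random.seed(1)
for _ in range(2000):
    x = Decimal(random.getrandbits(64))   # up to 20 decimal digits
    assert double_round(x, 12, 7) == single_round(x, 7)
```

The same argument is why DEC_ROUND_05UP before the wide step, followed by one final rounding in the restored mode, avoids double rounding when only a single inexact operation happens in the wide precision.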
For TImode it's not hard to avoid double rounding this way, by splitting the TImode number into two numbers that are exactly convertible to _Decimal128, so the only inexact operation is a single addition, which can be done in the Round to Prepare for Shorter Precision mode (and then you can convert to _Decimal64 / _Decimal32 in the original mode). [In all cases, getting the preferred quantum for decimal results is a minor matter to deal with at the end.]

For _Float128, this only reduces the problem to doing a conversion of _Float128 to _Decimal128 in that mode, which is not simply a single multiply. Not all mantissa values for _Float128 can be represented in _Decimal128 (2**113 > 10**34), nor can all powers of 2 that you need to multiply / divide by. And when you have more than one inexact operation, the final result is generally not correctly rounded for any rounding mode.

And so the complexity goes massively up (compare the fmaf implementation with round-to-odd on double - a single inexact addition on double done in round-to-odd followed by converting back to float in the original rounding mode - with the sysdeps/ieee754/dbl-64/s_fma.c code, which also uses round-to-odd, but with far more complexity in order to achieve the precision extension required for intermediate computations).

You may well be able to use precision-extension techniques - doing a conversion that produces a sum of two or three _Decimal128 values (the exact number needed being determined by a continued fraction analysis) and then adding up those values in the Round to Prepare for Shorter Precision mode. But I'd be surprised if there is a simple and correct implementation of the conversion that doesn't involve extending intermediate precision to have about 128 extra bits, given the complexity and extra precision described in the papers on this subject, such as the one referenced in this thread.
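[The TImode split can be sketched concretely. In this editor's sketch, Python's 34-digit decimal context stands in for _Decimal128 and a 16-digit context for _Decimal64; the helper names are invented for illustration. Splitting the 128-bit integer at 10**34 makes both pieces exactly representable in 34 significant digits (the low part has at most 34 digits; the high part is a small coefficient times 10**34), so the only inexact step is one addition:]

```python
from decimal import Decimal, Context, ROUND_05UP, ROUND_HALF_EVEN

# 34 digits stands in for _Decimal128, 16 digits for _Decimal64.
PREP = Context(prec=34, rounding=ROUND_05UP)  # "prepare for shorter precision"

def timode_to_d128(x, ctx):
    # Split the 128-bit integer at 10**34: lo fits in 34 digits, and hi
    # is a coefficient < 2**128 / 10**34 (about 34029, so 5 digits)
    # times 10**34.  Both pieces convert exactly; the single add is the
    # only inexact, rounded step.
    lo = x % 10**34
    hi = x - lo
    return ctx.add(Decimal(hi), Decimal(lo))

def timode_to_d64(x, rounding=ROUND_HALF_EVEN):
    # Do the one inexact add under ROUND_05UP, then a single final
    # rounding to 16 digits in the caller's mode - no double rounding.
    wide = timode_to_d128(x, PREP)
    return Context(prec=16, rounding=rounding).plus(wide)
```

Under these stand-in precisions, `timode_to_d64` agrees with rounding the exact 128-bit integer directly to 16 digits, which is exactly the property the save/restore scheme relies on.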
> My observation is that a common element of these conversions is a
> large precision multiply (to convert the radix of the mantissa) then a
> possible truncation (with rounding) to the final precision in the new
> radix.

Where large precision means about 256 bits (not simply 128 * 128 -> 256 multiplication, but also having the powers of 2 or 10 to that precision, so more like 128 * 256 -> 384, which can be truncated to about 256). Again, exact precisions to be determined by continued fraction analysis.

> It seems a simple effort to provide a soft-fp implementation that
> combines the multiply and truncation, without intermediate rounding.

That much is simple (the soft-fp code expects to produce a binary result, but you could make it produce integer * power of 10 for the conversions to decimal); cf. the _FP_FMA implementation that does a double-width multiply plus addition before truncating. You do need to determine the right intermediate precision and add the implementations of that extra-precision multiply.

> This seems sufficient to address the issues you have raised and seems
> much simpler than wholesale additions of round to odd to the soft-fp
> implementation.

Adding round-to-odd would be simple enough as well (only a few places check for particular rounding modes); I just don't think it would help much. Anything using round-to-odd when working with separate floating-point operations is better done in the soft-fp context by keeping the sticky bit and avoiding intermediate roundings (and much the same applies to Dekker-style precision extension - it makes much less sense with soft-fp than with hardware floating point).

-- 
Joseph S. Myers
joseph@codesourcery.com
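[The "keep the sticky bit and round once" point is the essence of how a soft-fp-style FMA avoids intermediate roundings. This editor's toy fixed-point sketch (names invented, not glibc code) shows the mechanics: the double-width product and the addend are combined exactly, and guard/sticky bits drive a single round-to-nearest-even at the end:]

```python
def round_to_nearest_even(n, extra_bits):
    # Discard extra_bits low-order binary digits of the non-negative
    # integer n, rounding to nearest with ties to even, using a guard
    # bit and a sticky bit - the same final step soft-fp performs after
    # an exact double-width computation.
    if extra_bits <= 0:
        return n << -extra_bits
    kept = n >> extra_bits
    guard = (n >> (extra_bits - 1)) & 1
    sticky = (n & ((1 << (extra_bits - 1)) - 1)) != 0
    if guard and (sticky or (kept & 1)):
        kept += 1
    return kept

def fma_fixed(a, b, c, frac_bits):
    # a, b, c are fixed-point values with frac_bits fractional bits.
    # The product a*b has 2*frac_bits fractional bits; adding c (shifted
    # up) is still exact in integer arithmetic, so the whole fused
    # operation is rounded exactly once.
    exact = a * b + (c << frac_bits)
    return round_to_nearest_even(exact, frac_bits)
```

Because every intermediate value is exact, no rounding-mode tricks (round-to-odd, Dekker splitting) are needed: those exist to recover exactness on hardware that rounds after every operation, which is the mismatch the paragraph above points out.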

**Follow-Ups**:

  * **Re: IEEE128 binary float to decimal float conversion routines**
    *From:* Steven Munroe

**References**:

  * **Re: IEEE128 binary float to decimal float conversion routines**
    *From:* Steven Munroe
