This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: IEEE128 binary float to decimal float conversion routines
- From: Steven Munroe <munroesj@linux.vnet.ibm.com>
- To: Joseph Myers <joseph@codesourcery.com>
- Cc: Steve Munroe <sjmunroe@us.ibm.com>, libc-alpha@sourceware.org, Michael R Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
- Date: Tue, 15 Dec 2015 15:18:46 -0600
- Subject: Re: IEEE128 binary float to decimal float conversion routines
- Authentication-results: sourceware.org; auth=none
- References: <564A16D5.3020105@linux.vnet.ibm.com> <alpine.DEB.2.10.1511161803500.30498@digraph.polyomino.org.uk> <564A6A90.40607@linux.vnet.ibm.com> <alpine.DEB.2.10.1511162356020.32387@digraph.polyomino.org.uk> <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com> <alpine.DEB.2.10.1511180144150.2302@digraph.polyomino.org.uk> <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com> <alpine.DEB.2.10.1511182322260.26547@digraph.polyomino.org.uk> <1449594999.9274.45.camel@oc7878010663> <alpine.DEB.2.10.1512081737230.19569@digraph.polyomino.org.uk>
- Reply-to: munroesj@linux.vnet.ibm.com
On Tue, 2015-12-08 at 18:25 +0000, Joseph Myers wrote:
> On Tue, 8 Dec 2015, Steven Munroe wrote:
>
> > The PowerISA (2.05 and later) Decimal Floating-point "Round to Prepare
> > for Shorter Precision" mode would not address the Decimal128
> > convert/truncate to shorter binary floating-point (double or float).
> >
> > But it will address the _Float128 convert/truncate to shorter decimal
> > floating-point (_Decimal64 and _Decimal32).
>
> Yes, if you have a conversion from _Float128 to _Decimal128 that works for
> Round to Prepare for Shorter Precision then you could use that as an
> intermediate step in converting to _Decimal64 and _Decimal32 (it's not the
> most efficient approach, but it's certainly simpler than having multiple
> variants of the full conversion code).
>
> The hardest part is converting from _Float128 to _Decimal128. Once you
> can do that (for all rounding modes and with correct exceptions),
> converting to the narrower types is easy, whether you have multiple
> variants of the same code or use Round to Prepare for Shorter Precision.
> Likewise for conversions in the other direction - _Decimal128 to _Float128
> is the hardest part, if you can do that then converting to narrower types
> is straightforward.
>
> > So in the case of TIMode or KFmode conversion to _Decimal64/_Decimal32
> > we can save the current rounding mode (fe_dec_getround()) then use
> > fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> > Shorter Precision" before the multiply that converts the mantissa to the
> > target radix. Then just before the instruction that rounds to the
> > final (_Decimal64 or _Decimal32) type, we restore the caller's rounding
> > mode and execute the final conversion in the correct rounding mode.
> >
> > I believe that addresses your double-rounding concern for these
> > conversions.
>
> For TImode it's not hard to avoid double rounding this way, by splitting
> the TImode number into two numbers that are exactly convertible to
> _Decimal128, so the only inexact operation is a single addition, which can
> be done in the Round to Prepare for Shorter Precision mode (and then you
> can convert to _Decimal64 / _Decimal32 in the original mode). [In all
> cases, getting the preferred quantum for decimal results is a minor matter
> to deal with at the end.]
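(The split described above is easy to check numerically. A Python sketch, not the proposed glibc code; the choice of peeling off the low 5 digits is mine: a TImode value has at most 39 decimal digits, so the high part has at most 34 digits, and 34 digits is exactly the _Decimal128 significand.)

```python
def split_for_decimal128(x):
    """Split a non-negative 128-bit integer into hi * 10**5 + lo so that
    each part has at most 34 significant decimal digits and is therefore
    exactly representable in _Decimal128."""
    hi, lo = divmod(x, 10**5)
    assert len(str(hi)) <= 34 and len(str(lo)) <= 34
    return hi, lo

# Worst case: the largest-magnitude TImode value (39 decimal digits).
hi, lo = split_for_decimal128(2**127 - 1)
assert hi * 10**5 + lo == 2**127 - 1   # only this final add can be inexact
```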
>
> For _Float128, this only reduces the problem to doing a conversion of
> _Float128 to _Decimal128 in that mode. Which is not simply a single
> multiply. Not all mantissa values for _Float128 can be represented in
> _Decimal128 (2**113 > 10**34). And nor can all powers of 2 that you need
> to multiply / divide by be represented in _Decimal128. And when you have
> more than one inexact operation, the final result is generally not
> correctly rounded for any rounding mode. And so the complexity goes
> massively up (compare the fmaf implementation with round-to-odd on double
> - a single inexact addition on double done in round-to-odd followed by
> converting back to float in the original rounding mode - with the
> sysdeps/ieee754/dbl-64/s_fma.c code, which also uses round-to-odd, but
> with far more complexity in order to achieve the precision extension
> required for intermediate computations).
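(The double-rounding-avoidance property of round-to-odd can be checked with exact rationals. A Python sketch, with `round_sig` as my own helper; 113 and 53 bits stand in for the extended and final precisions, and the property requires the intermediate precision to exceed the final one by at least 2 bits.)

```python
from fractions import Fraction
import random

def round_sig(x, p, odd=False):
    """Round a positive Fraction to p significant bits, using either
    round-to-nearest-even (default) or round-to-odd."""
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    ulp = Fraction(2)**e
    while x >= ulp * 2**p:          # normalize so the quotient has p bits
        ulp *= 2
    while x < ulp * 2**(p - 1):
        ulp /= 2
    q = x // ulp
    r = x - q * ulp
    if r == 0:
        return q * ulp
    if odd:
        q |= 1                      # round-to-odd: force the last bit on
    elif r > ulp / 2 or (r == ulp / 2 and q & 1):
        q += 1                      # round-to-nearest, ties to even
    return q * ulp

# Rounding first to 113 bits with round-to-odd, then to 53 bits in the
# normal mode, matches rounding directly to 53 bits (113 >= 53 + 2).
rng = random.Random(1)
for _ in range(200):
    x = Fraction(rng.getrandbits(160) + 1, rng.getrandbits(160) + 1)
    assert round_sig(round_sig(x, 113, odd=True), 53) == round_sig(x, 53)
```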
>
> You may well be able to use precision-extension techniques - so doing a
> conversion that produces a sum of two or three _Decimal128 values (the
> exact number needed being determined by a continued fraction analysis) and
> then adding up those values in the Round to Prepare for Shorter Precision
> mode. But I'd be surprised if there is a simple and correct
> implementation of the conversion that doesn't involve extending
> intermediate precision to have about 128 extra bits, given the complexity
> and extra precision described in the papers on this subject such as the
> one referenced in this thread.
>
> > My observation is that a common element of these conversion is a large
> > precision multiply (to convert the radix of the mantissa) then a
> > possible truncation (with rounding) to the final precision in the new
> > radix.
>
> Where large precision means about 256 bits (not simply 128 * 128 -> 256
> multiplication, but also having the powers of 2 or 10 to that precision,
> so more like 128 * 256 -> 384 which can be truncated to about 256).
> Again, exact precisions to be determined by continued fraction analysis.
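(A toy fixed-point version of that wide multiply, in Python; the 256-bit scaling and the 10**5 constant are mine and only illustrate the operand widths, not the precisions a real continued fraction analysis would pick.)

```python
# Scale a reciprocal power of ten to 256 fraction bits, multiply by a
# worst-case 113-bit significand (113 x 256 -> at most a 369-bit
# product), then truncate the product back down.
FRAC = 256
recip = (1 << FRAC) // 10**5        # fixed-point approximation of 1/10**5
m = 2**113 - 1                      # worst-case binary significand
prod = m * recip
approx = prod >> FRAC               # truncated approximation of m / 10**5

assert prod.bit_length() <= 113 + 256
assert abs(approx - m // 10**5) <= 1   # truncation error stays under 1 unit
```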
>
Ok, let me try with the simpler case of _Decimal128 to _Float128, where the
significand conversion is exact (log2(10^34) ~= 112.9, so <= 113 bits).
You mention "continued fraction analysis", which was not part of my
formal education (40+ years ago), but I will try.
The question is how many significant bits it takes to represent a
power of 10. This is interesting because my implementation of trunctfkf
involves a multiply of the converted (to _Float128) mantissa by 10^N, where N
is the exponent of the original _Decimal128. So which powers of 10 can be
represented exactly as a _Float128?
The required significant bits would be log2(10^N), but since 10^N = 2^N * 5^N,
the binary form of an exact power of 10 has one trailing zero bit for each
factor of 10 (1000 has 3 trailing zeros, 1000000 has 6, ...).
So the number of significant bits is log2(10^N) - N. A quick binary search
shows that powers up to 10^48 require no more than 113 bits and so can
be represented exactly in _Float128.
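That claim is easy to check mechanically; a quick Python sanity check (since 10^N = 2^N * 5^N, the significant bits are just the bit length of 5^N):

```python
def sig_bits(n):
    """Significant bits of 10**n: 10**n = 2**n * 5**n, so the n factors
    of two contribute only trailing zeros and 5**n carries the rest."""
    return (5**n).bit_length()

# 10**n really does have exactly n trailing zero bits ...
assert all(10**n % 2**n == 0 and (10**n >> n) & 1 for n in range(1, 60))
# ... and 10**0 .. 10**48 fit in a 113-bit significand; 10**49 does not.
assert max(sig_bits(n) for n in range(49)) <= 113
assert sig_bits(49) > 113
```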
So any _Decimal128 < 9999999999999999999999999999999999e48 (about 1.0e82) can
be converted with one _Float128 multiply of 2 exact values, giving a
result rounded to within 1 ulp.
This does not require conversion to string and back, or carrying more
precision than is naturally available in _Float128.
Now, as the exponent of the _Decimal128 input exceeds 48, the table of
_Float128 powers of 10 will contain values that have been rounded. I
assume that some additional exponent range can be covered by ensuring
that the table of _Float128 powers_of_10 has been pre-rounded to odd?
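As a sanity check of the round-to-odd idea on a table entry itself (a Python sketch on integers; `round_bits` is my own helper, and this only exercises the table-entry rounding step, not the subsequent inexact multiply):

```python
def round_bits(x, p, odd=False):
    """Round a positive integer to p significant bits, keeping the same
    magnitude: round-to-nearest-even by default, or round-to-odd."""
    s = x.bit_length() - p
    if s <= 0:
        return x
    q, r = x >> s, x & ((1 << s) - 1)
    if r:
        if odd:
            q |= 1                  # round-to-odd: force the last kept bit on
        elif r > 1 << (s - 1) or (r == 1 << (s - 1) and q & 1):
            q += 1                  # round-to-nearest, ties to even
    return q << s

# Pre-rounding a power of ten to 113 bits with round-to-odd, then
# rounding that to 53 bits nearest-even, matches rounding the exact
# power of ten directly to 53 bits: no double-rounding error.
for n in range(49, 200):
    assert round_bits(round_bits(10**n, 113, odd=True), 53) == round_bits(10**n, 53)
```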
Do you agree with this analysis?