This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: IEEE128 binary float to decimal float conversion routines
- From: Joseph Myers <joseph@codesourcery.com>
- To: Steven Munroe <munroesj@linux.vnet.ibm.com>
- Cc: Steve Munroe <sjmunroe@us.ibm.com>, "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, Michael R Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
- Date: Tue, 8 Dec 2015 18:25:11 +0000
- Subject: Re: IEEE128 binary float to decimal float conversion routines
- References: <564A16D5.3020105@linux.vnet.ibm.com> <alpine.DEB.2.10.1511161803500.30498@digraph.polyomino.org.uk> <564A6A90.40607@linux.vnet.ibm.com> <alpine.DEB.2.10.1511162356020.32387@digraph.polyomino.org.uk> <201511180131.tAI1Vs2L023118@d03av01.boulder.ibm.com> <alpine.DEB.2.10.1511180144150.2302@digraph.polyomino.org.uk> <201511182301.tAIN1Igc011083@d03av02.boulder.ibm.com> <alpine.DEB.2.10.1511182322260.26547@digraph.polyomino.org.uk> <1449594999.9274.45.camel@oc7878010663>
On Tue, 8 Dec 2015, Steven Munroe wrote:
> The PowerISA (2.05 and later) Decimal Floating-point "Round to Prepare
> for Shorter Precision" mode would not address the Decimal128
> convert/truncate to shorter binary floating-point (double or float).
> But it will address the Float128 convert/truncate to shorter decimal
> floating-point (_Decimal64 and _Decimal32).
Yes, if you have a conversion from _Float128 to _Decimal128 that works for
Round to Prepare for Shorter Precision then you could use that as an
intermediate step in converting to _Decimal64 and _Decimal32 (it's not the
most efficient approach, but it's certainly simpler than having multiple
variants of the full conversion code).
The hardest part is converting from _Float128 to _Decimal128. Once you
can do that (for all rounding modes and with correct exceptions),
converting to the narrower types is easy, whether you have multiple
variants of the same code or use Round to Prepare for Shorter Precision.
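For the second variant, the narrowing wrapper has roughly this shape (a
sketch only, untested; it assumes the libdfp-style fe_dec_getround /
fe_dec_setround interfaces you mention, __kf_to_td is a made-up name for
the full conversion, which must itself be correct in DEC_ROUND_05UP, and
exception handling is elided):

#include <fenv.h>  /* The libdfp variant, declaring fe_dec_getround etc.  */

extern _Decimal128 __kf_to_td (_Float128);  /* The hard conversion.  */

_Decimal64
__kf_to_dd (_Float128 x)
{
  int mode = fe_dec_getround ();      /* Save the caller's DFP rounding mode.  */
  fe_dec_setround (DEC_ROUND_05UP);   /* Round to Prepare for Shorter Precision.  */
  _Decimal128 wide = __kf_to_td (x);  /* The only hard, inexact step.  */
  fe_dec_setround (mode);
  return (_Decimal64) wide;           /* Final rounding in the caller's mode.  */
}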
Likewise for conversions in the other direction - _Decimal128 to _Float128
is the hardest part; if you can do that, then converting to the narrower
types is easy.
> So in the case of TImode or KFmode conversion to _Decimal64/_Decimal32
> we can save the current rounding mode (fe_dec_getround ()), then use
> fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
> Shorter Precision" mode before the multiply that converts the mantissa
> to the target radix. Then, just before the instruction that rounds to
> the final (_Decimal64 or _Decimal32) type, we restore the caller's
> rounding mode and execute the final conversion in the correct rounding
> mode.
> I believe that addresses your double rounding concern for these
> conversions.
For TImode it's not hard to avoid double rounding this way, by splitting
the TImode number into two numbers that are exactly convertible to
_Decimal128, so the only inexact operation is a single addition, which can
be done in the Round to Prepare for Shorter Precision mode (and then you
can convert to _Decimal64 / _Decimal32 in the original mode). [In all
cases, getting the preferred quantum for decimal results is a minor matter
to deal with at the end.]
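Concretely, that might look like (a sketch only, untested; __ti_to_dd is
a made-up name, the fe_dec_* interfaces are as in your message, and the
preferred-quantum fixup is ignored):

#include <fenv.h>

_Decimal64
__ti_to_dd (__int128 n)
{
  /* |n| < 2**127 < 10**39.  Split at 10**5 so the high part has at most
     34 significant digits (|q| < 2**127 / 10**5 < 10**34), making both
     parts exactly convertible to _Decimal128.  */
  __int128 q = n / 100000;
  int r = (int) (n % 100000);

  int mode = fe_dec_getround ();
  fe_dec_setround (DEC_ROUND_05UP);
  /* The conversions of q and r are exact, and the multiply by 1e5DL is
     a quantum shift, also exact; the addition is the single inexact
     step, done in Round to Prepare for Shorter Precision.  */
  _Decimal128 wide = (_Decimal128) q * 1e5DL + (_Decimal128) r;
  fe_dec_setround (mode);
  return (_Decimal64) wide;   /* Rounded once in the original mode.  */
}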
For _Float128, this only reduces the problem to doing a conversion of
_Float128 to _Decimal128 in that mode - which is not simply a single
multiply. Not all mantissa values for _Float128 can be represented in
_Decimal128 (2**113 > 10**34), nor can all the powers of 2 that you need
to multiply / divide by. And when you have more than one inexact
operation, the final result is generally not correctly rounded for any
rounding mode. So the complexity goes up massively (compare the fmaf
implementation with round-to-odd on double - a single inexact addition on
double done in round-to-odd, followed by converting back to float in the
original rounding mode - with the sysdeps/ieee754/dbl-64/s_fma.c code,
which also uses round-to-odd, but with far more complexity in order to
achieve the precision extension required for intermediate computations).
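(To make the representability point concrete, this self-contained check:

#include <stdio.h>

int
main (void)
{
  unsigned __int128 p2 = (unsigned __int128) 1 << 113;   /* 2**113 */
  unsigned __int128 p10 = 1;
  for (int i = 0; i < 34; i++)
    p10 *= 10;                                           /* 10**34 */
  /* 2**113 = 10384593717069655257060992658440192, a 35-digit number.  */
  printf ("2**113 %s 10**34\n", p2 > p10 ? ">" : "<=");
  return 0;
}

prints "2**113 > 10**34", so a full 113-bit mantissa does not always fit
in the 34-digit coefficient of _Decimal128.)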
You may well be able to use precision-extension techniques - so doing a
conversion that produces a sum of two or three _Decimal128 values (the
exact number needed being determined by a continued fraction analysis) and
then adding up those values in the Round to Prepare for Shorter Precision
mode. But I'd be surprised if there is a simple and correct
implementation of the conversion that doesn't involve extending
intermediate precision to have about 128 extra bits, given the complexity
and extra precision described in the papers on this subject such as the
one referenced in this thread.
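If that worked out, only the tail end of the conversion is simple
(a sketch; __td_parts_to_dd is a made-up name, and it assumes the
decomposition leaves hi + lo as the only inexact operation):

#include <fenv.h>

_Decimal64
__td_parts_to_dd (_Decimal128 hi, _Decimal128 lo)
{
  int mode = fe_dec_getround ();
  fe_dec_setround (DEC_ROUND_05UP);
  _Decimal128 sum = hi + lo;   /* The single inexact step, rounded 05UP.  */
  fe_dec_setround (mode);
  return (_Decimal64) sum;     /* One final rounding in the caller's mode.  */
}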
> My observation is that a common element of these conversions is a
> large precision multiply (to convert the radix of the mantissa) then a
> possible truncation (with rounding) to the final precision in the new
> radix.
Where large precision means about 256 bits (not simply 128 * 128 -> 256
multiplication, but also having the powers of 2 or 10 to that precision,
so more like 128 * 256 -> 384 which can be truncated to about 256).
Again, exact precisions to be determined by continued fraction analysis.
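The multiply itself is mechanical; in terms of 64-bit limbs it has this
shape (illustrative only - soft-fp would spell it with _FP_FRAC macros
rather than open-coded loops):

#include <stdint.h>

/* Multiply a 128-bit mantissa A (2 little-endian 64-bit limbs) by a
   256-bit constant B (4 limbs), giving a 384-bit result R (6 limbs).  */
static void
mul_128x256 (const uint64_t a[2], const uint64_t b[4], uint64_t r[6])
{
  for (int i = 0; i < 6; i++)
    r[i] = 0;
  for (int i = 0; i < 2; i++)
    {
      uint64_t carry = 0;
      for (int j = 0; j < 4; j++)
        {
          unsigned __int128 t = (unsigned __int128) a[i] * b[j]
                                + r[i + j] + carry;
          r[i + j] = (uint64_t) t;
          carry = (uint64_t) (t >> 64);
        }
      r[i + 4] += carry;
    }
}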
> It seems a simple effort to provide a soft-fp implementation that
> combines the multiply and truncation, without intermediate rounding.
That much is simple (the soft-fp code expects to produce a binary result,
but you could make it produce integer * power of 10 for the conversions to
decimal); cf. the _FP_FMA implementation that does a double-width multiply
plus addition before truncating. You do need to determine the right
intermediate precision and add the implementations of that
extra-precision arithmetic.
> This seems sufficient to address the issues you have raised and seems
> much simpler than wholesale additions of round to odd to the soft-fp
> code.
Adding round-to-odd would be simple enough as well (only a few places
check for particular rounding modes); I just don't think it would help
much. Anything using round-to-odd when working with separate
floating-point operations is better done in the soft-fp context by keeping
the sticky bit and avoiding intermediate roundings (and much the same
applies to Dekker-style precision extension - it makes much less sense
with soft-fp than with hardware floating point).
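That is, the usual soft-fp idiom is a sticky shift rather than a rounded
one - roughly:

#include <stdint.h>

/* Shift X right by SHIFT bits (0 < SHIFT < 64), ORing any discarded
   nonzero bits into the low bit so the single final rounding still sees
   that something was lost (cf. _FP_FRAC_SRS in soft-fp).  */
static inline uint64_t
srl_sticky (uint64_t x, int shift)
{
  uint64_t lost = x & (((uint64_t) 1 << shift) - 1);
  return (x >> shift) | (lost != 0);
}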
Joseph S. Myers