This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: IEEE128 binary float to decimal float conversion routines

On Wed, 2015-11-18 at 23:53 +0000, Joseph Myers wrote:
> On Wed, 18 Nov 2015, Steve Munroe wrote:
> > > The problem I see is with the final "result = temp;" which converts
> > double
> > > to float.
> > >
> > > The earlier steps are probably accurate to within 1ulp.  But if temp (a
> > > double) is half way between two representable float values - while the
> > > original argument is very close to that half way value, but not exact -
> > > then the final conversion will round to even, which may or may not be
> > > correct depending on which side of that double value the original
> > > _Decimal128 value was.  (Much the same applies in other rounding modes
> > > when the double value equals a float value but the original value isn't
> > > exactly that float value.)
> > >
> > Would changing the the decimal to binary conversion to be round to odd,
> > offset the following round double to float?
> > 
> >
> No, because it would just offload the problem onto getting a conversion 
> from _Decimal128 to double that is correctly rounded to odd, which is no 
> easier (indeed, requires more work, not less) than the original problem of 
> converting to float.
Joseph I think we are talking pass each other on this.

The PowerISA (2.05 and later ) Decimal Floating-point "Round to Prepare
for Shorter Precision" mode would not address the Decimal128
convert/truncate to shorter binary floating-point (double or float).

But it will address the Float128 convert/truncate to to shorter decimal
floating-pointer (_Decimal64 and _Decimal32).

> The existing code loses some of the original precision when taking just 15 
> digits of the mantissa for conversion to double (not OK when you want to 
> determine the exact value rounded to odd after further operations - in the 
> hard cases, the final decimal digit will affect the correct rounding).  
> Then the multiplications / divisions by precomputed powers of 10 use a 
> table of long double values - while that gives extra precision (though 
> probably not enough extra precision), it's also incompatible with doing 
> rounding to odd, since IBM long double doesn't give meaningful "inexact" 
> exceptions or work in non-default rounding modes, while rounding to odd 
> requires working in round-to-zero mode and then checking the "inexact" 
> flag.
So in the case of TIMode or KFmode conversion to _Decimal64/_Decimal32
we can save the current rounding mode (fe_dec_getround()) then use
fe_dec_setround (DEC_ROUND_05UP) to set the "Round to Prepare for
Shorter Precision" before the multiply that converts the mantissa to the
target radix. Then just before the the instruction that rounds to the
final (_Decimal64 or _Decimal32) type, we restore the callers rounding
more and execute the final version in the correct rounding mode.

I believe that addresses you double rounding concern for these

And now we are address the similar issues with float128 with the recent
public release of the PowerISA-3.0:

The PowerISA-3.0 document is here:

As an addition to the Vector Scalar extents we add support for IEEE
128-bit binary floating point. This architecture provides the option to
override the default round-mode and force "round to odd" on a
instruction basis for all the key operators. This obviously includes
add/subtract/multiple/divide/sqrt and convert convert quad to double.

This will be sufficient to support the _Decimal128 to double and float
conversion that you mentioned as problematic.

While we are waiting for the hardware implementation of the
PowerISA-3.0, we need to address the specifics of these conversions in
the soft-fp implementation. 

My observation is that a common element of these conversion is a large
precision multiply (to convert the radix of the mantissa) then a
possible truncation (with rounding) to the final precision in the new

It seem a simple effort to provide a soft-fp implementation that
combines the multiple and truncation, without intermediate rounding.

As the soft-fp implementation performs rounding in the FP_PACK
operation, it simple to avoid the intermediate PACK/UNPACK steps between
the FP_MUL and FP_TRUNC operations. The only trick is that arithmetic
operations user the canonical PACK/UNPACK operation while the Truncate
operations use the SEMIRAW PACK/UNPACK. Some adjustments of the exponent
are requires to flow the (unrounded) result of the FP_MUL directly into
the FP_TRUNC operation, but this does not seem to require a large
effort. The final FP_PACK operation following the FP_TRUNC will perform
the final and correct rounding to the require precision.

This seems sufficient to address the issues you have raised and seems
much simpler then wholesale additions of round to odd to the soft-fp

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]