This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: IEEE128 binary float to decimal float conversion routines

On Fri, 18 Dec 2015, Steven Munroe wrote:

> At this time I can NOT say this analysis applies to my case
> (IEEE754-2008 Densely Packed Decimal floating point) 

Since BID and DPD have exactly the same set of values, there is no 
significant difference for the purpose of determining the precision needed 
for conversions.  There's little difference in whether you extract the DFP 
mantissa by frexp / multiplication / converting to an integer type (as in 
libdfp), or directly from the binary representation (as can be done for 
BID).  Much the same applies for storing an integer mantissa when working 
in the other direction.

> > Thanks.  I'm not at all expert on decimal floating-point or on its state 
> > in the GNU toolchain (I just noted the absence of these conversions in the 
> > course of reviewing patches for __float128 libgcc support for powerpc).  
> > My general impression is that the IEEE 754 conformance state is probably 
> > similar to or worse than that for binary floating-point - that is, various 
> > miscellaneous local issues along with the same general issues of 
> > optimizations not respecting the state involved in exceptions and rounding 
> > modes (but because decimal floating-point is less widely used, such issues 
> > are less likely to have been found, especially if some code is correct for 
> > binary floating-point and no-one thought about decimal when writing an 
> > optimization, and especially when involving issues such as preferred 
> > quantum that don't exist for binary).
> > 
> Joseph,
> Please do not assume the Decimal Floating carries all the sins of the
> much maligned IBM long double. 

I'm not assuming that - I'm comparing with support for *IEEE* binary 
types, which also has plenty of deficiencies in GCC.  I'm working from, 
for example, the various open DFP bugs in GCC.

I think those illustrate how there is a range of miscellaneous local 
issues (similar to the state for binary floating-point) as well as the 
same general issues with optimizations as for binary floating-point.  Or 
cf. Fred Tydeman's comments in the October CFP minutes.

Maybe some of those bugs describe issues that are invalid or already fixed 
- but that doesn't really affect my point.

> So why would I even consider using software emulation to get one
> additional ULP of accuracy?

Well, I'd think the starting point is to implement operations following 
the correct semantics, which are quite clear in both TR 24732:2009 and TS 
18661-2: as per IEEE 754-2008, conversions are correctly rounded with 
correct exceptions.  Then, depending on how common such conversions 
between binary and decimal types actually are (I'd guess not very common), 
and which cases are more or less common, optimize them for the common 
cases, taking full advantage of hardware facilities in the process (and 
with the potential for e.g. separate -fno-trapping-math versions if 
correct exceptions involve significant performance cost).  That leaves 
software emulation, most likely, only for rare worst cases - most cases 
should be able to use fma plus Dekker-style precision extension to avoid 
software emulation, provided you take special care about exact and 
exactly-half-way cases.

If it were a function not fully bound to an IEEE 754 operation, then, yes, 
you probably wouldn't use software emulation for 1ulp extra precision.  
But that's for things such as the slow paths in dbl-64 functions that I 
recently proposed removing.  It's not for IEEE operations such as 
conversions, sqrt or fma.  (Of course the fma that Jakub implemented 
following the method of Boldo and Melquiond, to replace a particularly 
stupid fallback on processors without fma instructions, and that I fixed 
up for various exceptions and rounding modes issues, is slower than a 
fallback that's not correctly rounding.  But fma is an IEEE operation, 
which means doing the slow thing when no faster way of achieving correct 
results is available.)

glibc no longer works, as it used to, on the basis of implementing some 
vague approximation to the desired semantics with whatever deviations from 
the standard someone felt like having, although there is still plenty of 
old code like that (but I've been gradually working through libm functions 
over the past few years to ensure they follow consistent accuracy goals 
and that there is adequate test coverage for this).  For anything 
potentially controversial, especially new interfaces, we work to obtain 
consensus in the community, including a common understanding of the 
relevant standard semantics and how best to implement them in glibc (and, 
if it seems the standard may be defective, a common understanding in that 
regard - working with the relevant standards committees to resolve any 
such defects).  This means much more regard than a few years ago for 
standard semantics first, optimizations only where consistent with those 
semantics.

Of course the libdfp maintainers can do what they want in libdfp.  But 
since this discussion was started on libc-alpha, I'm considering things in 
terms of standard glibc accuracy goals (which for operations fully bound 
to IEEE 754, on IEEE types, means exact correctly-rounded results and 
exceptions).  And for any new libm functions (e.g. for float128), getting 
consensus on the design and implementation approach at an early stage, 
working with the community, and following glibc standards at least to the 
extent that the existing ldbl-128 functions follow them, would be 
particularly important.  (It would not be OK, for example, to have 
architecture-specific optimized versions that fail to follow the standard 
semantics when the architecture-independent versions do follow those 
semantics, though architecture maintainers can e.g. decide which cases are 
important to optimize for on their architecture, while still keeping the 
slow cases correct.  Note that we removed various x86 function 
implementations using x87 trig instructions a few years ago because of 
their accuracy problems.)

Joseph S. Myers
