This is the mail archive of the
mailing list for the cygwin project.
RE: Precision of doubles and stdio
- From: "Phil Betts" <Phil dot Betts at heis dot co dot uk>
- To: <cygwin-talk at cygwin dot com>
- Date: Tue, 7 Mar 2006 18:10:35 -0000
- Subject: RE: Precision of doubles and stdio
- Reply-to: The Cygwin-Talk Malingering List <cygwin-talk at cygwin dot com>
Dave Korn wrote on Monday, March 06, 2006 6:41 PM::
> On 06 March 2006 17:36, Phil Betts wrote:
>> I'm absolutely amazed that you are a professor of computer science!
> I'm not, but then again, I've been paying attention to what Roberto
> has posted before, and therefore have some idea of his level of
> expertise and experience, whereas you are assuming that you know all
> there is to know and therefore do not need to check your beliefs
> against reality.
Not true. YOU are the one doing the assuming (that you know what I am
thinking.) On the other hand I _know_ what I'm thinking. It's not for
me to judge how well I communicated those thoughts, but it would appear
I left some room for improvement ;)
> See, the problem with indulging in this sort of posturing is that
> you better be /very/ sure that you're right, or you're going to make
> a fool of yourself.
Phew! It's a good job I was /very/ sure then.
> In this case, you have not revealed your great intellect: you have
> revealed that you have failed to understand what Roberto is doing in
> his application, and why it is valid. You have mistaken your lack of
> knowledge for an insight into necessity.
> Bad call.
On the contrary, I understood immediately what Roberto was doing. I've
done similar things myself nearly 20 years ago. There is absolutely
nothing wrong in what he's doing in principle, merely the execution.
The point I was making is that he's criticising the tool (i.e. the axe),
rather than choosing the right tool (i.e. a scalpel/laser cutter) for
> Floating point numbers are not approximate anything. They are
> exactly precise representations of a subset of the rational numbers.
OK, I accept that each floating point number does exactly represent
one rational number. The point I was trying to making is that all
real numbers between two boundaries share the same representation.
>> Floating point hardware is frequently not 100% accurate in the LSB of
>> the mantissa due to internal rounding errors and the datasheets will
>> often tell you so.
> This is utter fantasy.
Then I must have had a fantastic experience! Experience rather than
theory has taught me not to rely on the accuracy of the LSB.
> Floating point hardware either conforms to
> IEEE754, which specifies the exact algorithms to be used in different
> rounding modes, or it is broken. There is no room for ambiguity in
> the standard. It specifies guard and rounding bits precisely. It
> does not say that any part of the calculation may be random.
> IEEE-754 is entirely deterministic, and every implementation should
> produce EXACTLY the same results, down to the last bits.
Deterministic != 100% accurate, but that's beside the point, we're
talking about printf & co. here, which are not (AFAIR) part of the
IEEEE standard, but rather part of the ISO C standard (see below).
>> That being said, the thing which completely floors me is that you are
>> relying on behaviour which is clearly beyond that given in the
>> language specification.
> Really? Exactly what and where? Nobody has yet pointed to a
> precise paragraph in the C language spec. If you are as gobsmacked
> as you claim to be, you should be pretty damn sure exactly what
> behaviour he is relying on and in what way the says that is
> unspecified/undefined; I'm sure you've already referred to your copy
> of the standard to refresh your memory and make sure you aren't
> mistaken, no?
A sceptical mind is good. As it happens, I didn't need to refer, as
I was already intimately aware of the passages in question, as they
have given me cause for concern before. However, since you will
continue to doubt me until I provide the evidence:
From the ISO standard (OK, I'll admit it, it's the _draft_ standard of
August 3, 1998, so if the final version differs significantly, I
This deals with floating point numbers in general:
! Section 18.104.22.168.2 Characteristics of floating types <float.h>
! 4 The accuracy of the floating-point operations (+, -, *, /) and
! of the library functions in <math.h> and <complex.h> that return
! floating-point results is implementation defined. The
! implementation may state that the accuracy is unknown.
I.e. The C standard simply dodges the issue of what precision is used.
You don't know if you're using an axe or a scalpel, therefore you
should assume the worst until PROVEN otherwise.
This next paragraph describes why printf & co. are useless for the
! Section 22.214.171.124 The fprintf function
! 13 For e, E, f, F, g, and G conversions, if the number of significant
! decimal digits is at most DECIMAL_DIG, then the result should be
! correctly rounded. If the number of significant decimal digits
! is more than DECIMAL_DIG but the source value is exactly
! representable with DECIMAL_DIG digits, then the result should be
! an exact representation with trailing zeros. Otherwise, the source
! value is bounded by two adjacent decimal strings L < U, both having
! DECIMAL_DIG significant digits; the value of the resultant decimal
! string D should satisfy L < D < U, with the extra stipulation that
! the error should have a correct sign for the current rounding
In other words, the output decimal is only guaranteed to be between the
two DECIMAL_DIG digit decimals. In practice, you do get more, but it is
wrong to rely on this behaviour, which was my point.
The definition of DECIMAL_DIG is the number of decimal digits, q, such
that any floating-point number with q decimal digits can be rounded into
a floating-point number with p radix b digits AND BACK AGAIN <my
emphasis> without change to the q decimal digits. For a 53 bit
using the calculation given in the standard, this is about 15.5. I
haven't got the IEEE spec to hand, but AFAIR, the mantissa is
54 bits (the most significant being implied). This gives the quoted 16
(actually 15.95) digits of precision.
This is MUCH lower than the number of digits needed to EXACTLY convert
the floating point number to a decimal.
What Roberto is asking printf to do is therefore way beyond the language
specification (which was my main charge). Given that he is expecting a
full 33 *extra* decimal digits of precision, my axe metaphor was
extremely charitable - unless you know of an axe many times larger than
the entire known universe!
>> This is one of the most rudimentary mistakes in programming.
>> Frankly, this is beyond belief for someone in your position.
>> C doubles are accurate to 16 or 17 decimal places, if you expect 48
>> significant figures then you deserve all the bad results you get.
> This claim (ISTM) is based on the fallacious line of reasoning that
> if you can represent an N-bit binary integer with D decimal digits,
> you can also represent an N-bit binary fraction (the mantissa) with D
> decimal digits.
> Alas, you cannot.
We'll, you *can*, but not exactly. Read on...
> A 3 bit binary number - assuming all three of those bits to be
> above the decimal place - can represent the numbers 0 to 7, and hence
> requires only one decimal sig.fig., as indeed you might expect from
> computing (log10(2) * 3) and getting 0.903. I agree with you there.
> However, a 3 bit floating point mantissa is aligned below the
> decimal place.
> It can represent the binary numbers 0.000 - 0.111, where the place
> immediately after the decimal is 0.5, the next lower place 0.25, and
> the lowest 0.125.
> The values it can represent are from the set (0, 0.125, 0.25, 0.375,
> 0.5, 0.625, 0.75, 0.875), and as you can clearly see, many of them
> require three significant figures.
> Now please explain why a 53-bit mantissa - a /fractional/ number -
> can necessarily be represented with only 16 or 17 decimal places.
Because at that number of places, changing the least significant bit
of the double _always_ changes the least significant digit of the
Look at your example - 1 decimal digit is enough to differentiate each
value. In fact, because of the normalisation of floating-point numbers,
so that there is always a 1 in the most significant bit, a 3-bit
mantissa would only be used for the binary numbers 0.100, 0.101, 0.110
The following represents (from left to right) the conversion of each
to a 1-significant-digit decimal and back to binary.
Binary Exact Rounded 8*Rounded Nearest Binary
Decimal Decimal Decimal Integer Representation
0.100 0.5 0.5 4.0 4 100
0.101 0.625 0.6 4.8 5 101
0.110 0.75 0.8 6.4 6 110
0.111 0.875 0.9 7.2 7 111
No loss of information has taken place.
>> That you should then choose a public forum (and the wrong one at
>> that!) to complain about this is astounding.
> What's even more astounding is how you choose to grandstand on this
> subject when you clearly aren't a specialist and haven't studied it
> at all.
You must be looking at a 20 year old CV. Actually, about 18 years ago,
I wrote a 128-bit-mantissa floating-point library in M68k assembler,
because no floating point hardware with the required precision was
available. In order to maintain accuracy, internal operations were
performed using 256 bit wide mantissae.
Why? To investigate the apparently chaotic behaviour of the equations
related to Mie scattering of light in the atmosphere. That code used
the notion of upper and lower bounds in much the same way as Roberto
is doing. The difference is that I had total control of the
conversion to & from decimal, rather than having to rely on a library
written to a (relatively) vague standard.
>> Ask yourself this: if your brain surgeon uses an axe, will the
>> inquest find the axe at fault or the surgeon?
> I'd rather have someone doing surgery on me who goes to university
> and studies and does research and experiments and asks people
> questions, than someone who just /thinks/ they know it all and whose
> overconfidence leads them not to bother checking their facts and to
> (mis)assume they have understood a problem based on a superficial
> once-over in which they only see whatever fits with their idea of the
> facts. You completely just didn't even see Roberto's statement that
> he was using FP numbers to represent bounds-of-uncertainty, did you?
Yes I did. However, relying on the C library to print the results
introduces more uncertainty than he started out with! And if he's
relying on the C standard to specify the precision of his maths, he
cannot be certain of the input to printf either.
I hope by now, you'll see that I was in fact correct and that your
accusations of an off-the-handle, uneducated response were unfounded.
Although you don't know me, and therefore might be forgiven for
assuming otherwise, I can assure you that I wouldn't choose to
pontificate on a subject I was unsure of in such a public forum.
I will admit that the language I used could have been more diplomatic,
and I'm sorry for any offence. I'll try not to post at the end of a
long day in future.
PS. If the moderator of the PPL-devel list is monitoring this, I hope
you're feeling ashamed of yourself.
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the H.E Information Systems Ltd.
Tel: 0161 866 9066
This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.