specfunc error analysis - generated and propagated errors.

Gerard Jungman jungman@lanl.gov
Tue Sep 7 23:15:00 GMT 2004


On Sun, 2004-09-05 at 17:39, Jonathan Underwood wrote:

> In an effort to fix up the error analysis in the wigner coefficient code I 
> posted recently, I started to look at how the error analysis is 
> implemented for GSL in the special functions (specfunc directory) and came 
> to the following conclusions and questions. The reason I'm posting these 
> is in the hope that my conclusions will be confirmed or corrected, and the 
> questions answered :).

Ahh, the joy of hackery. I'm responsible for this state of
affairs; so I'll answer as best I can.

Let me give a short history of the problem first. Don't
interpret this as a rationalization, and don't be afraid
to whack me with a large rubber mallet. I can take it.
Maybe this will motivate someone to help fix up some of this.

In the best of all possible worlds, each function would
return values which are within machine precision of the
correct answer. Input values are assumed to be exact so
that no propagation of "implicit" errors on input values
occurs; therefore, the "correct" answer is well-defined.
This is the sort of behaviour you expect for sin(x),
cos(x), etc.

There were two possible roads to take for GSL. The first
road follows the strict interpretation of the philosophy of
correctness above. The second road provides as many useful
functions as possible. In the end, the second philosophy
won out because it was decided that GSL needed at least
as much coverage as (gack) Numerical Recipes. Anyway,
since providing correctness for a wide range of functions
is going to be a life's work, I figured it was inevitable
that there would be a full spectrum of quality in the
implementation, from gold-plated to fishy-smelling.

For a long time, the error estimation code was not present
at all. When the push came for the 1.0 release we became
serious about testing, and it became clear that there
were issues with the functions. Basically, when
pushed to the edge, they were failing tests, and I needed
a way to figure out what was going on. So, I went back
and added the error estimation code to all the functions.
That way I could see that most of the failures were consistent
with back-of-the-envelope error propagation. This actually
exposed some implementation problems that were fixable,
so it contributed to the overall fight.

This also made it possible to write the tests so that they
were sensitive to "graceful" failures and loss of precision.
This is why you will occasionally see a test tolerance
bumped up when somebody complains about a test failure
on some platform. Typically, the differences in arithmetic
across platforms lead to inconsistencies in the last few
bits (usually intel vs. the rest of the world due to the
extended precision register arithmetic on x86).

In this way, the tests also document the limitations of
the functions. I think this is one of the more important
things that the tests can do. Since these limitations
were diagnosed by an iterative procedure involving the
tests and the error estimates, I felt that the error
estimation code provided a kind of runtime expression
of the limitations. With this interpretation, it became
reasonable to leave that code in the release, and that's
how the _e functions were born.

Of course, the interpretation of these error estimates
is unclear in any case. Sometimes they are provable
upper bounds to an error estimate based on interval
arithmetic. But in complicated situations it is hard
to prove any such bounds.


> (1) Generated error.
> ---------------------
> By which i mean the error from carrying out a function on a variable. For 
> most of the gsl_sf functions these are calculated and returned, however, 
> I'm a bit unsure  as to how things have been standardized for GSL for 
> addition, subtraction, multiplication and division i.e. for exact A and B, 
> what is the generated error for the operations A {+,-,*,/} B.

See comment below about A + B.

> (a) Multiplication: Looking at coupling.c, it looks like the _relative_ 
> generated error for multplication (A*B) is (2 * GSL_DBL_EPSILON). [See 
> lines 73 and 74 of coupling.c]. However, I then looked at legendre_poly.c, 
> as this contains some very simple algebra. Looking at line 84, I see that 
> my assertion must be wrong. But then, comparing line 122, which is the 
> same algebraic expression as line 84, gives an entirely different 
> generated error analysis. What gives?

Yes, these are inconsistent by a factor of 3. This is a code
duplication problem, and I have very little idea how extensive
it is. Rubber mallet time?


> (b) Addition: Again, the relative generated error seems to be (2 * 
> GSL_DBL_EPSILON) - see eg. line 177 of coupling.c
> 
> (c) Subtraction: Absolute generated error seems to be (2*GSL_DBL_EPSILON) 
> * (|A|+|B|) [this seems very odd]

The idea of the estimate on line 177 is that each of the
intermediate quantities suffers from roundoff, and a statement
like 2.0 * GSL_DBL_EPSILON * (|A| + |B|) accounts for
those. I don't think of it as generative, but as
as book-keeping (propagation) for the intermediate quantities.
The same book-keeping applies whether it is addition
or subtraction, which is the only logical way to
gracefully expose unfortunate cancellations and
the amplification of the relative error. This
is what is happening on line 177 of coupling.c.
By the way, line 178 accounts for the worst case
amplification of the error because each step of
the summation introduces another relative error
of GSL_DBL_EPSILON; it typically dwarfs the contribution
from line 177 _except_ in the case of a rare cancellation
when the minimum value expressed by line 177 dominates.

This is different from the notion of error on A+B, A-B,A*B,A/B
for exact A and B, which is one tick up or down to the next
machine representable value, by assumption on the underlying
machine arithmetic, though the book-keeping for that is done
using a factor of GSL_DBL_EPSILON as well. I think this is
what you mean by generative error, which is why line 177
is not generative by that definition.


> (d) Division: Didn't consider this, but it would be nice to know.

Really the same as any binary operation, see next comment.


> It would be really helpful if someone could clarify on the issue of 
> generated errors, with examples. This should eventually go in the design 
> document (I'm happy to collate the text and add it to the documentation).

Roundoff errors on generative operations (A op B, for exact A and B)
are codified as a relative error governed by (a fixed multiple of) 
GSL_DBL_EPSILON. As far as I know, this is always the case in the
GSL functions; at least I can't think of a reason for it to be any
other way, so if I did anything any different then it is an issue.
The fixed multiple is usually a factor of 2, by convention.
It should probably never differ from a fixed choice throughout
the code, so if you see a statement with something other than this
convention, then it probably indicates that it is doing something
different from keeping track of a generated error for a binary
operation on exact quantities.


> (2) Propagated Error
> ---------------------
> This is simpler
> (a) Addition and subtraction: Algebraically add the absolute errors in all 
> variables being added together to get the result's absolute error. eg. for 
> A with absolute error a, and B with absolute error b, the error in (A +/- 
> B) will be a+b. As Brian said, we're not adding errors in quadrature here 
> (a simplification).

Yes. This is consistent with what is said above, e.g. the
line 177 example.


> (b) Multiplication and division: the resulting relative error is the sum 
> of the relative errors of the values, i.e. for A*B, the relative error in 
> the answer is (a/A + b/B), and the absolute error in the answer is 
> |AB|*(|a/A| + |b/B|). Ignoring terms of (ab), can then write the resultant 
> absolute error as a|A| +b|B|. This is extended to A*B*C, where the 
> absolute error in the answer will be a|BC| + b|AC| + c|AB|. etc.

I agree with this as well, and I think that is what I did
everywhere. Again, if I did something fundamentally different,
then it is an issue. In some case I may do something which is
simpler but which clearly provides an upper bound, which I
consider acceptable, although non-optimal.

At this point it is worth talking about the "mysterious factors"
approach to the final returned estimate. In all cases I applied
rules as given above. However, sometimes the results were
still "too small". After tracking everything down to the point
where I felt the estimate should be good, the computed value
was still not within its own tolerance specification of the real
value. Usually the difference was a factor close to unity.
At that point I would give up and introduce a mysterious
factor, like changing a "2.0*..." to "3.0*...", or worse.

I did this because I thought it was very important for the
value reported by the function to be within its own estimated
error of the correct value. Otherwise there is an obvious
inconsistency. Of course, this is not a logical process,
so it may often just paper-over some underlying problem
which would be exposed with expanded testing.


How the test tolerances are set is another issue, which is
logically orthogonal to the question of the mysterious factors
in the returned estimate. This is why there are two distinct
failure conditions on the tests. One failure mode is
"returned value not within specified test tolerance of given test
value". The other failure mode is "returned value not within
computed error estimate of given test value".

However, the development of these estimates was iterative, so
the tolerances in the tests and the computed error estimates
were not set independently. The whole thing just sorta came
together as 1.0 was on its way out the door.


> Further points
> ---------------
> Brian in a previous email alluded to keeping the relative errors signed 
> such as to allow for error cancellation during propagation, however I 
> don't see any implementation of that.

I certainly don't do that. In order to do that correctly you would
need to prove that the estimate you obtain is an upper bound. Since
I didn't have the time to do that, I try to err on the conservative
side. Anyway, signed errors increase the complexity a great deal.


> Related to this, I wonder what the error analysis is actually trying to 
> achieve. It seems to be giving an estimation of the worst case error 
> resulting from roundoff error,

Yes. That is the governing idea.


> but I don't think it is in anyway checking 
> or estimating truncation error (is it even possible to have a general 
> pescription for estimating truncation error numerically?).

What do you mean by truncation error?


> When I say 
> "worst case", I mean it doesn't seem to allow for error cancellation, and 
> assumes all steps contribute maximum error (there will of course be a 
> distribution of error).

Right, but without a probabilistic model there is no way to compute
with a distribution of error. Anyway, I am generally suspicious
of probabilistic error models. In my opinion, an error estimate should
always produce an upper bound. And the harder you work, the less
vacuous those bounds become.


> As a final design point - heretically, I wonder about the decision to be 
> always calculating the error - this puts a big overhead on each 
> calculation as it roughly doubles the number of operations. I realize that 
> by always giving the user the option to examine the error (s)he can then 
> write new functions and propagate the error accordingly. I'd reckon that 
> >98 percent of the time, noone uses the _e functions however. Would it not 
> be better to have error calculation only done when the library is compiled 
> with a certain flag set?


I wonder about it too. I always intended on benchmarking to see if the
added operations really amounted to anything. In the end I never got the
chance. I doubt that it is a big overhead. But that is a very sweeping
statement, and I wouldn't mind seeing some actual data.

Also, if your full computation is bottle-necked because
of the cost of computing Bessel functions, then there is
a good chance that you are doing something wrong. This is
another sweeping statement, but you see the point.
Often you can use a little knowledge of the functions
(recursions, etc) to reduce the number of
independently computed values to a minimum.

As for a compilation flag, it is one idea. I would rather
see some kind of link-time or run-time solution. If we
introduced a compilation flag, the installation would have
to provide two compiled libraries, one with error
calculations and one without.
Also, the _e functions must still do something, since we
cannot force people to modify their source. That means
some policy decision has to be made about the meaning
of the _e functions when errors are "not being computed". 
In this light, we have a general configuration management
problem, where the configuration includes the following:

 - bounds checking on arrays
 - inlining optimizations
 - error estimation

With all these compilation flags (and there must be others),
configuration management becomes a serious issue. We can't
have 2^N different compiled libraries floating around.

There are also "link-time" configurations:

 - cblas implementation
 - linear algebra implementation (eventually people should be able
   to just recompile/relink their code substituting some external
   lapack-type implementation for any GSL linear algebra functionality)

Although this seems to be getting far away from the question
of error estimates, you see how a decision like this becomes
difficult.


> Finally, I apologize for all ignorance displayed by myself herein.

No problem. This is an important discussion. I still entertain the
idea of "doing everything right" in the future. The simplicity of
the first philosophy (absolute machine-precision correctness) is
very attractive, but it probably won't be possible to apply it
to every function. So I am still trying to develop a picture of
what constitutes correctness across the full suite of functions.
Maybe, after all, some functions like sin(x) are more equal
than others. It may depend on whether or not efficient
arbitrary precision algorithms are available, such as
for elliptic functions and all their friends.

By the way, lately I have been influenced by a nice
little book by Muller called "Elementary Functions: Algorithms and
Implementation". I wish I had seen it before I started working on
the special functions for GSL.


-- 
Gerard Jungman <jungman@lanl.gov>
Los Alamos National Laboratory



More information about the Gsl-discuss mailing list