This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
crlibm elementary functions for the glibc ?
- From: Florent de Dinechin <Florent dot de dot Dinechin at ens-lyon dot fr>
- To: libc-alpha at sourceware dot org
- Cc: Christoph Quirin Lauter <christoph dot lauter at ens-lyon dot fr>
- Date: Tue, 06 Feb 2007 16:00:13 +0100
- Subject: crlibm elementary functions for the glibc ?
Hello to all,
We are the maintainers of the crlibm project, which aims at developping
a modern, correctly rounded version of the libm licenced under the LGPL.
Project homepage is https://lipforge.ens-lyon.fr/projects/crlibm/
I feel that this project is now achieving sufficient maturity to be a
candidate for gradual inclusion in the glibc. I would appreciate any
feedback on this question, and probably some practical advice.
Correct rounding means that the value returned by a libm function is
fully and uniquely specified, to the best mathematical possible. Its
main advantage is therefore to improve numerical portability between
heterogeneous systems. The drawbacks are a small degradation in
performance (theoretically a few percents in average) and a larger
degradation in code size. The pdf files distributed along with CRLibm
details this issue and gives more references.
As far as I observe, the libm currently used in glibc is a derivative
from IBM libultim, at least for architectures with double-precision by
default, like powerPC and x86_64 using the sse2 FP unit.
This is a very good choice, as it was the first efficient implementation
of a double-precision correctly rounded library.
However, libultim has several drawbacks:
- impredictible worst-case performance with some calls in the million
cycle range (see below)
- obscure code,
- some large tables of precomputed values (up to 43 KB for arctan, 13KB
for exp, etc)
CRLibm has a sounder theoretical basis, which allows in particular to
provide much more acceptable worst-case execution time (several hundred
times lower, see below). We have an emphasis on the proof of the correct
rounding property and in general on documentation. We also limit the
table size to 4KB altogether for each function (whether this is a good
idea can be discussed - several years ago, it seemed so). All this has
some performance cost, which we are slowly catching up. Appended are
some performance comparisons.
Besides, CRLibm is currently actively maintained, and in any case should
be more future-proof than libultim. For instance, unlike libultim, the
CRLibm distribution includes all the files that generate the tables and
other constants, so current tradeoffs can be changed as technology
evolves. It also includes a battery of selftests. Another example is
that our code is designed to use FMA efficiently (on power/powerPC and
ia64) as soon as there is clean and efficient compiler support for it.
The code also attempts to expose parallelism (which current gcc fails to
exploit efficiently, but this is bound to improve).
We also have more correctly rounded functions that libultim (currently
log2 and log10, expm1 and log1p, and the trigpi functions mentionned in
the current draft of the revised IEEE-754 standard), and we keep adding
new ones.
There are other more prospective features, like machine-checkable proofs
of some of the code, partial support of directed rounding modes and
interval arithmetic functions, etc, all of which is probably currently
irrelevant to the glibc.
If there is interest, I would like some advice on the procedure to
follow. I appreciate that we will have to submit patches. I have many
questions on this process: should we add a separate directory and where,
which functions should we submit first, should first improve average
performance at the expense of larger tables, what should we do with the
autotest and documentation, etc. I also appreciate that it will take
some work to clean up crlibm and bring it to GNU standards.
We would also appreciate pointers on relevant discussions on the list
archive.
Thank you for your interest, and many thanks in advance,
Florent de Dinechin and Christoph Lauter
Here is a sample of performance results. The actual input values used
for the worst-case timing tests are available in the crlibm distribution.
The random generators used for average case are by definition
controversial, but they were chosen in good faith, not to distort the
comparison in favor of crlibm. For example our asin and acos
implementations are faster than libultim's on a random distribution of
the floating-point numbers between -1 and 1, but slower on a normal
distribution of the reals between -1 and 1 (used below).
Tests on an Opteron
( Linux 2.6.17-2-amd64
gcc (GCC) 4.1.2 20060901 (prerelease) (Debian 4.1.1-13) )
exp avg time max time
default libm 141 1105468
CRLibm 184 1210
log avg time max time
default libm 210 150919
CRLibm 146 903 (using double-extended arithmetic)
CRLibm 266 1459 (using standard SSE2 doubles)
sin avg time max time
default libm 147 1018622
CRLibm 171 3895
cos avg time max time
default libm 152 028302
CRLibm 171 3752
tan avg time max time
default libm 244 1101230
CRLibm 280 9949
asin avg time max time
default libm 107 1018823
CRLibm 297 2383
acos avg time max time
default libm 83 1018786
CRLibm 322 2360
atan avg time max time
default libm 144 100066
CRLibm 243 4724
log10 avg time max time
default libm 249 1061 NOT correctly rounded
CRLibm 321 1706
log2 avg time max time
default libm 174 274 NOT correctly rounded
CRLibm 320 1663
The same, on an IBM power5 server (time units are arbitrary)
log avg time max time
default libm 61 23471
CRLibm 57 307
exp avg time max time
default libm 41 25019
CRLibm 42 242
sin avg time max time
default libm 37 132435
CRLibm 43 1910
cos avg time max time
default libm 38 134045
CRLibm 44 1946
tan avg time max time
default libm 65 141885
CRLibm 72 4671
asin avg time max time
default libm 33 132912
CRLibm 50 465
acos avg time max time
default libm 34 132798
CRLibm 58 433
atan avg time max time
default libm 35 18785
CRLibm 53 2315
log10 avg time max time
default libm 65 153 NOT correctly rounded
CRLibm 65 355
log2 avg time max time
default libm 47 59 NOT correctly rounded
CRLibm 65 351
PS2 : while browsing the code I just noticed a probably harmless bug
line 27 of sysdeps/ieee754/dbl-64/sincos.tbl : 40 should be 440)