This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

crlibm elementary functions for the glibc ?


Hello to all,

We are the maintainers of the crlibm project, which aims at developping a modern, correctly rounded version of the libm licenced under the LGPL.
Project homepage is https://lipforge.ens-lyon.fr/projects/crlibm/


I feel that this project is now achieving sufficient maturity to be a
candidate for gradual inclusion in the glibc. I would appreciate any feedback on this question, and probably some practical advice.



Correct rounding means that the value returned by a libm function is fully and uniquely specified, to the best mathematical possible. Its main advantage is therefore to improve numerical portability between heterogeneous systems. The drawbacks are a small degradation in performance (theoretically a few percents in average) and a larger degradation in code size. The pdf files distributed along with CRLibm details this issue and gives more references.

As far as I observe, the libm currently used in glibc is a derivative
from IBM libultim, at least for  architectures with double-precision by
default, like powerPC and x86_64 using the sse2 FP unit.
This is a very good choice, as it was the first efficient implementation
of a double-precision correctly rounded library.

However, libultim has several drawbacks:
- impredictible worst-case performance with some calls in the million
cycle range (see below)
- obscure code,
- some large tables of precomputed values (up to 43 KB for arctan, 13KB
for exp, etc)

CRLibm has a sounder theoretical basis, which allows in particular to
provide much more acceptable worst-case execution time (several hundred
times lower, see below). We have an emphasis on the proof of the correct
rounding property and in general on documentation. We also limit the
table size to 4KB altogether for each function (whether this is a good
idea can be discussed - several years ago, it seemed so).  All this has
some performance cost, which we are slowly catching up. Appended are
some performance comparisons.

Besides, CRLibm is currently actively maintained, and in any case should
be more future-proof than libultim. For instance, unlike libultim, the
CRLibm distribution includes all the files that generate the tables and
other constants, so current tradeoffs can be changed as technology
evolves. It also includes a battery of selftests. Another example is
that our code is designed to use FMA efficiently (on power/powerPC and
ia64) as soon as there is clean and efficient compiler support for it. The code also attempts to expose parallelism (which current gcc fails to exploit efficiently, but this is bound to improve).


We also have more correctly rounded functions that libultim (currently
log2 and log10, expm1 and log1p, and the trigpi functions mentionned in
the current draft of the revised IEEE-754 standard), and we keep adding
new ones.

There are other more prospective features, like machine-checkable proofs
of some of the code, partial support of directed rounding modes and
interval arithmetic functions, etc, all of which is probably currently irrelevant to the glibc.


If there is interest, I would like some advice on the procedure to
follow. I appreciate that we will have to submit patches. I have many
questions on this process: should we add a separate directory and where,
which functions should we submit first, should first improve average
performance at the expense of larger tables, what should we do with the
autotest and documentation, etc. I also appreciate that it will take
some work to clean up crlibm and bring it to GNU standards.

We would also appreciate pointers on relevant discussions on the list archive.

Thank you for your interest, and many thanks in advance,

Florent de Dinechin and Christoph Lauter



Here is a sample of performance results. The actual input values used
for the worst-case timing tests are available in the crlibm distribution.
The random generators used for average case are by definition controversial, but they were chosen in good faith, not to distort the comparison in favor of crlibm. For example our asin and acos implementations are faster than libultim's on a random distribution of the floating-point numbers between -1 and 1, but slower on a normal distribution of the reals between -1 and 1 (used below).


Tests on an Opteron
( Linux 2.6.17-2-amd64
   gcc (GCC) 4.1.2 20060901 (prerelease) (Debian 4.1.1-13) )

     exp      avg time  max time
default libm    141    1105468
  CRLibm        184       1210

      log     avg time  max time
default libm    210     150919
  CRLibm        146        903      (using double-extended arithmetic)
  CRLibm        266       1459      (using standard SSE2 doubles)

      sin     avg time  max time
default libm    147    1018622
  CRLibm        171       3895

      cos     avg time  max time
default libm    152     028302
  CRLibm        171       3752

      tan     avg time  max time
default libm    244    1101230
  CRLibm        280       9949

     asin     avg time  max time
default libm    107    1018823
  CRLibm        297       2383

     acos     avg time  max time
default libm    83    1018786
  CRLibm        322      2360

     atan     avg time  max time
default libm    144     100066
  CRLibm        243       4724

    log10     avg time  max time
default libm    249       1061    NOT correctly rounded
  CRLibm        321       1706

     log2     avg time  max time
default libm    174        274    NOT correctly rounded
  CRLibm        320       1663



The same, on an IBM power5 server (time units are arbitrary)


log avg time max time default libm 61 23471 CRLibm 57 307

     exp      avg time  max time
default libm     41      25019
  CRLibm         42        242

      sin     avg time  max time
default libm     37     132435
  CRLibm         43       1910

      cos     avg time  max time
default libm     38     134045
  CRLibm         44       1946

      tan     avg time  max time
default libm     65     141885
  CRLibm         72       4671

     asin     avg time  max time
default libm     33     132912
  CRLibm         50        465

     acos     avg time  max time
default libm     34     132798
  CRLibm         58        433

     atan     avg time  max time
default libm     35      18785
  CRLibm         53       2315

    log10     avg time  max time
default libm     65        153	NOT correctly rounded
  CRLibm         65        355

     log2     avg time  max time
default libm     47         59  NOT correctly rounded
  CRLibm         65        351


PS2 : while browsing the code I just noticed a probably harmless bug line 27 of sysdeps/ieee754/dbl-64/sincos.tbl : 40 should be 440)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]