1. Alternate Runtimes
One way to enhance libm is to provide alternate runtimes whose guarantees more closely match user requirements.
A user would select an alternate runtime at link time via a compiler flag.
The proposed alternate runtimes are:
- Fast implementation
- Default implementation
- Precise implementation
1.1. Fast Implementation
The fast implementation would allow results to be off by several ULPs.
The results are not required to be correctly rounded.
Selected by compiling with gcc using the existing -ffast-math flag, which also enables other math-related optimizations.
Selected by compiling with gcc using the not-yet-implemented -ffast-libm flag.
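To make "off by several ULPs" concrete, the following is a minimal sketch of how the gap between a fast result and a reference result could be counted in units in the last place. The function names are hypothetical and not part of libm. The sketch assumes IEEE 754 binary64 doubles and finite, non-NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Map a double onto a signed integer scale on which adjacent
   representable doubles differ by exactly 1. Assumes IEEE 754
   binary64 and a finite, non-NaN input. */
int64_t demo_ordered_bits(double x)
{
    int64_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* Doubles are sign-magnitude: fold the negative half so that
       -0.0 maps to 0 and larger magnitudes map further below zero. */
    return bits < 0 ? INT64_MIN - bits : bits;
}

/* Number of representable steps between a and b. */
uint64_t demo_ulp_distance(double a, double b)
{
    int64_t ia = demo_ordered_bits(a);
    int64_t ib = demo_ordered_bits(b);
    /* Subtract in unsigned arithmetic to avoid signed overflow. */
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

For example, demo_ulp_distance(1.0, 0x1.0000000000001p+0) is 1, because the second argument is the next double after 1.0. A test harness for the fast runtime could compare each fast result against a correctly rounded reference in this way and check the distance against the documented bound.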
1.2. Default Implementation
The default implementation is what we currently have with the IBM multi-precision code in the library.
Selected by compiling without any special options, i.e. the default.
1.3. Precise Implementation
The precise implementation would strive for strict IEEE correct rounding despite the speed cost.
Selected by compiling with gcc using the not-yet-implemented -fprecise-libm flag.
The alternate runtimes will be implemented as alternate entry points in the same library. The fast versions of the functions will be called __fast_* and the precise versions of the functions will be called __cr_* where cr stands for "correctly rounded." The existing implementation will continue to use the existing symbols.
The compiler will automatically rewrite a call to * as a call to __fast_* if the -ffast-libm option is given, and likewise as a call to __cr_* if -fprecise-libm is given.
By using alternate entry points we ensure that the compilation unit always runs with the code it was intended to run with and that the properties of libm can't easily be changed at runtime (ignoring interposition of a libm with __fast_* symbols which aren't fast).
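The per-compilation-unit binding described above can be emulated with the preprocessor. This is not the proposed compiler mechanism, only a sketch of its intended effect. DEMO_FAST_LIBM stands in for -ffast-libm, and demo_recip and demo_fast_recip are hypothetical stand-ins for an existing symbol and its __fast_* counterpart.

```c
/* Stand-in for an existing libm symbol: double-precision reciprocal. */
double demo_recip(double x)
{
    return 1.0 / x;
}

/* Stand-in for the corresponding __fast_* symbol: computes in single
   precision, so the result is less accurate for most inputs. */
double demo_fast_recip(double x)
{
    float r = 1.0f / (float)x;
    return (double)r;
}

/* Hypothetical switch playing the role of -ffast-libm for this unit. */
#define DEMO_FAST_LIBM 1

#if DEMO_FAST_LIBM
/* From here on, calls spelled demo_recip(...) bind to the fast entry point. */
#define demo_recip(x) demo_fast_recip(x)
#endif

/* User code is written against the ordinary name only. */
double demo_user_code(double x)
{
    return demo_recip(x);
}
```

Because the choice is fixed when the unit is compiled, demo_user_code always reaches the fast entry point, regardless of how other units in the same program were built.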
Initially libm can alias all __fast_* and __cr_* symbols to the existing symbols, but we can migrate symbols to new versions when we provide fast or correctly rounded variants of the functions to the library.
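The initial aliasing step could look roughly like the sketch below, which uses the GCC/Clang alias attribute (available on ELF targets). The demo_ names are hypothetical stand-ins chosen to avoid reserved identifiers. In libm the aliases would carry the __fast_ and __cr_ prefixes and point at the existing implementation of each function.

```c
/* Stand-in for an existing libm implementation. */
double demo_cube(double x)
{
    return x * x * x;
}

/* Alternate entry points that initially resolve to the same code.
   The alias target must be defined in the same translation unit. */
double demo_fast_cube(double x) __attribute__((alias("demo_cube")));
double demo_cr_cube(double x) __attribute__((alias("demo_cube")));
```

Migrating a function later would then mean replacing one alias with a real definition, without changing callers that already reference the alternate symbol.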
2. Vector Entry Points
The math library may provide small-vector entry points to support reducing the cost of calling libm functions over multiple input arguments.
The compiler may collect several calls to a function and then call an alternate vector entry point, e.g. __vec4_fast_*.
The math library must provide small-vector entry points for certain functions.
The compiler must be able to auto-vectorize scalar calls over multiple inputs into vector calls of the same functions.
Analysis needs to be done to decide exactly what size is beneficial.
The alignment of the types must be precisely documented.
Large vectors are excluded because they can more efficiently be handled by accelerators or other APIs.
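The following sketch illustrates what a four-lane entry point and the compiler's transformation might look like. Everything here is an assumption for illustration: the demo_ names, the vector type, the lane count of 4, and the 32-byte alignment are all hypothetical, since the proposal leaves size and alignment to be decided. It requires C11 for _Alignas and _Static_assert.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_VEC_LANES 4

/* Hypothetical small-vector type with its alignment stated explicitly. */
typedef struct {
    _Alignas(32) double lane[DEMO_VEC_LANES];
} demo_vec4d;

/* Document the alignment in a form the compiler checks. */
_Static_assert(_Alignof(demo_vec4d) == 32, "demo_vec4d must be 32-byte aligned");

/* Toy scalar routine standing in for a __fast_* function. */
double demo_fast_poly(double x)
{
    return (2.0 * x + 3.0) * x + 1.0;
}

/* Stand-in for a __vec4_fast_* entry point: one call, four results.
   A real implementation would use SIMD instructions here. */
demo_vec4d demo_vec4_fast_poly(demo_vec4d in)
{
    demo_vec4d out;
    for (size_t i = 0; i < DEMO_VEC_LANES; i++)
        out.lane[i] = demo_fast_poly(in.lane[i]);
    return out;
}

/* What an auto-vectorized loop could be lowered to: groups of four
   inputs go through the vector entry point, the remainder through
   the scalar one. */
void demo_apply_vec4(const double *in, double *out, size_t n)
{
    size_t i = 0;
    for (; i + DEMO_VEC_LANES <= n; i += DEMO_VEC_LANES) {
        demo_vec4d v;
        memcpy(v.lane, in + i, sizeof v.lane);
        v = demo_vec4_fast_poly(v);
        memcpy(out + i, v.lane, sizeof v.lane);
    }
    for (; i < n; i++)
        out[i] = demo_fast_poly(in[i]);
}
```

The scalar tail loop shows why the vector and scalar variants of a function should agree on results: otherwise the output would depend on where an element falls relative to the vector width.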