openlibm and libm: mutation testing
Sat Dec 2 06:09:00 GMT 2017
Thanks for the hint about glibc's libm tests: I will see how easy it is to run
Mull on them and whether we find anything.
As for your second paragraph: I don't know how or why functions like sqrt and
acos are implemented the way they are. My goal was to see whether a test vector
of 284 inputs would yield 100% mutation testing coverage on newlib/libm's
test suite from 2002. The result of ~60% means that there is a gray area of
subtle things which I, as an outsider, cannot see through: whether the surviving
mutations are factors with a small contribution weight, like you mentioned, or
more or less serious bugs hiding around.
In the Julia/openlibm thread I shared what I found with other tools like KLEE
and libFuzzer: https://github.com/JuliaLang/openlibm/issues/172#issuecomment-348649289.
I cannot say with confidence that those are real bugs, because I am running
on a 64-bit Ubuntu Docker image with some hacks to make the library compile
and run the test suite.
But I am very sure that newlib/libm would benefit a lot from being tested
continuously with a modern test suite with Clang and memory and undefined
behaviour sanitizers enabled and also being part of
Thanks for your attention.
On Wed, Nov 22, 2017 at 12:55 AM, Joseph Myers <firstname.lastname@example.org> wrote:
> I would suggest looking at glibc's libm tests (60 MB of expectations
> generated using MPFR and MPC, plus many tests with manually maintained
> expectations for special cases of particular functions). In principle it
> should be possible to test other libm implementations using them, although
> certainly at present there are lots of dependencies on glibc-specific
> interfaces and glibc's rules for error handling and accuracy requirements.
> Where the glibc tests do not provide sufficient code coverage, whether for
> glibc's implementations (NB many functions have several different
> implementations used on different architectures, so several architectures
> would need testing to assess coverage well) or for other libms'
> implementations of the same functions, additional test inputs to improve
> coverage would make sense.
> I would point out that there are many cases in Sun fdlibm, and thus in
> other implementations derived from it, where there is a lot of
> arbitrariness in the particular value or operation used to e.g. force an
> exception; if the point of doing C * C is to generate "overflow", it's
> expected that you can change the (large) value of C with no change to the
> results of the function; if the point of doing C + x is to generate
> "inexact" (not in glibc's goals for most functions, but various such code
> has yet to be cleaned up), you can change C, or change the operation + to
> -, without any change to the results of the function being expected; if a
> function uses a polynomial approximation, many changes to low-order
> coefficients may result in only small changes to the result of the
> function, within its accuracy goals. So the fact that mutating the code
> does not affect test results does not necessarily indicate any problem
> with test coverage.
> Joseph S. Myers
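To make the point about harmless mutations concrete, here is a minimal sketch of two of the cases described in the quoted message. It is written in Python rather than C purely for brevity, and every name in it (HUGE, overflow_original, tiny_mutant and so on) is hypothetical: none of this is taken from fdlibm, newlib or glibc. It only compares return values, because plain Python cannot observe the floating-point exception flags that such code exists to raise.

```python
import math

HUGE = 1.0e300  # assumed "large" constant, chosen only for illustration


def overflow_original():
    # C * C is used only to produce an overflow; the result is +inf.
    return HUGE * HUGE


def overflow_mutant():
    # Mutation: a different (still large) constant. The product still
    # overflows, so the returned value is the same +inf.
    c = 1.0e200
    return c * c


def tiny_original(x):
    # For tiny |x| the function result is x itself; HUGE + x is evaluated
    # only for its side effect (raising "inexact" in a C implementation).
    if HUGE + x > 1.0:
        return x
    return math.nan  # not reached for the tiny inputs used here


def tiny_mutant(x):
    # Mutation: + replaced by -. The comparison is still true for tiny x,
    # so the returned value does not change.
    if HUGE - x > 1.0:
        return x
    return math.nan


def mutants_survive(inputs):
    # True if no input distinguishes the mutants from the originals.
    if overflow_original() != overflow_mutant():
        return False
    return all(tiny_original(x) == tiny_mutant(x) for x in inputs)


if __name__ == "__main__":
    print(mutants_survive([0.0, 1e-10, -1e-10, 1e-300]))
```

Under these assumptions both mutants survive any test that checks only return values, which is the situation the quoted message describes: a surviving mutant here says nothing about missing test coverage.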