[PATCH] replace tgammaf by the CORE-MATH implementation (correctly rounded)

Joseph Myers josmyers@redhat.com
Mon Jul 29 22:30:20 GMT 2024


On Mon, 29 Jul 2024, Paul Zimmermann wrote:

> The CORE-MATH implementation is correctly rounded (for any rounding mode).
> This can be checked by exhaustive tests in a few minutes since there are
> less than 2^32 values to check (against GNU MPFR for example).

What platforms have you tested on?  (That is, tested *with the glibc 
testsuite*, which covers correct exceptions and errno setting in 
accordance with glibc expectations.)  Those should include at least one 
platform without excess precision (e.g. x86_64) and one with excess 
precision (32-bit x86).

> diff --git a/math/w_tgammaf_compat.c b/math/w_tgammaf_compat.c
> index 34e0e096e0..b5208c54d5 100644
> --- a/math/w_tgammaf_compat.c
> +++ b/math/w_tgammaf_compat.c

These compat wrapper changes are suspect.  The ABI for the compat wrappers 
is that they support SVID error handling (calling __kernel_standard* on 
error in order to call a user-provided matherr function).  (This is only 
relevant for old binaries, not new ones, as the symbols required for SVID 
error handling are compat symbols so you can't link new binaries to use it 
without symbol versioning tricks that shouldn't be used outside the glibc 
testsuite.)

We don't systematically test SVID error handling for all functions in the 
glibc testsuite, only a few (see test-matherr.c and test-matherr-2.c).  
Maybe we *should* add tests of SVID error handling for all functions (and 
all the supported error cases for each function) to avoid patches 
mistakenly changing the compat wrappers in ways that break compatibility - 
if we'd had a test of SVID error handling for tgammaf, I expect it would 
have detected breakage from this change to a compat wrapper.

> +  b32u32_u t = {.f = x};
> +  uint32_t ax = t.u<<1;
> +  if(__builtin_expect(ax>=(0xffu<<24), 0)){
> +    if(ax==(0xffu<<24)){
> +      if(t.u>>31){
> +	errno = EDOM;
> +	return __builtin_nanf("12");
> +      }

I think this is about the -Inf case?

(a) Please include comments explaining what cases are being handled 
whenever the conditions are determined by bit manipulation rather than 
straight comparisons of floating-point values.  The old code has comments 
such as

/* Return value for integer x < 0 is NaN with invalid exception.  */

and

/* x == -Inf.  According to ISO this is NaN.  */

and

/* Positive infinity (return positive infinity) or NaN (return NaN).  */

and it's not appropriate to regress this by introducing less-documented 
code.  More generally, lots more comments throughout the code, explaining 
what it's doing (particular cases being handled, overall evaluation 
strategy for the function in those cases), would be helpful.  Not comments 
repeating the obvious meaning of an individual line of C code, but 
comments of the level of "handle special case X" or "evaluate tgammaf on 
interval [A, B] using a degree-C polynomial approximation" or "handle hard 
cases for correct rounding, listed in table X" or similar, so that the 
reader can quickly understand the overall intent of each part of the code.

(b) errno is handled by the wrapper (w_tgamma_template.c); you shouldn't 
need to handle it here.

(c) On the other hand, the wrapper assumes that the individual function 
implementation raised appropriate exceptions.  How does this code ensure 
that "invalid" is raised?  Have you tested with the glibc testsuite, which 
verifies the "invalid" exception for negative infinity?  I'd expect 
something like "return x - x;" to ensure an appropriate exception is 
raised.

    TEST_f_f (tgamma, 0, plus_infty, DIVIDE_BY_ZERO_EXCEPTION|ERRNO_ERANGE),

> +    return x; // nan

And this is for negative NaN.  But you need to ensure that a signaling NaN 
is converted to quiet with "invalid" raised, so something more like 
"return x - x;" would be appropriate here as well.  The glibc testsuite 
includes tests for this as well.

> +  }
> +  double z = x;
> +  if(__builtin_expect(ax<0x6d000000u, 0)){

Again, please add comments explaining any comparisons based on bit 
patterns.

> +    volatile double d = (0x1.fa658c23b1578p-1 - 0x1.d0a118f324b63p-1*z)*z - 0x1.2788cfc6fb619p-1;
> +    double f = 1.0/z + d;
> +    float r = f;
> +    if(__builtin_fabs(r)>0x1.fffffep+127f) errno = ERANGE;

Any call to a double function on a float value (__builtin_fabs here) is 
suspect, because it is usually a bug.  If it's deliberate, it should have 
a comment.  If you mean to deal with excess precision in some way, 
math_narrow_eval is the normal idiom in glibc code.

> +  float fx = __builtin_floor(x);

Likewise, if you need to use the double function rather than the float one 
then please explain why in a comment.  And normally we'd use floorf etc. 
directly rather than __builtin_*.

> +  int k = fx;
> +  if(__builtin_expect(x >= 0x1.18522p+5f, 0)){
> +    float r = 0x1p127f * 0x1p127f;
> +    if(r>0x1.fffffep+127) errno = ERANGE;
> +    return r;

Definitely a case for math_narrow_eval.  As everywhere, there should be no 
need to set errno because the wrappers handle that.

> +  }
> +  if(__builtin_expect(fx==x, 0)){

Throughout, in glibc code it's better to use GNU coding style, and glibc 
macros such as __glibc_unlikely rather than raw __builtin_expect, unless 
you expect this code to be copied verbatim from upstream in future with no 
local changes at all.

> +    if(x < 0.0f) {
> +      errno = EDOM;
> +      return __builtin_nanf("12");

As elsewhere, you need to compute a NaN in a way that raises an exception.

> +  double m = z - 0x1.7p+1, i = __builtin_roundeven(m), step = __builtin_copysign(1.0,i);

Normally the relevant functions would be used directly without __builtin.

When benchmarking, try to include platforms where roundeven is not inlined 
and so the out-of-line roundeven function in glibc is called here - 
especially where that means the C implementation rather than a 
processor-specific IFUNC / assembly implementation - as that may 
significantly affect performance.  And state explicitly what platform you 
benchmarked on (both how the compiler building glibc is configured, and 
what processor is in use at runtime as that may affect IFUNCs used).

It's likely this is substantially faster than the existing code even with 
the C roundeven implementation, but the choice of roundeven implementation 
could matter enough that it's important to account for it in benchmarking.

> +  if(__builtin_expect(((rt.u+2)&0xfffffff) < 8, 0)){
> +    for(unsigned i=0;i<sizeof(tb)/sizeof(tb[0]);i++) {
> +      if(t.u==tb[i].x.u) return tb[i].f + tb[i].df;
>      }

I don't think shadowing variables is very good style (you have a double 
variable i in scope from just above this).

> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 37d8998c71..29e3ca926a 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -2263,25 +2263,21 @@ double: 1
>  
>  Function: "tgamma":
>  double: 9
> -float: 8
>  float128: 4
>  ldouble: 5

Removing ulps entries like this is purely mechanical, so it seems 
appropriate to apply it to *all* libm-test-ulps files, not just one.  
(Cf. how, when adding the logp1 aliases, I copied ulps from log1p for all 
libm-test-ulps files.)

-- 
Joseph S. Myers
josmyers@redhat.com


