[PATCH 09/17] math: Use exp10m1f from CORE-MATH

Adhemerval Zanella Netto adhemerval.zanella@linaro.org
Thu Oct 31 13:20:08 GMT 2024



On 29/10/24 18:09, DJ Delorie wrote:
> 
> A few questions but I suspect they're easily answered.  LGTM with
> reasonable answers.  I'll elide the same questions in the other patches
> in this series with the assumption that the same answers apply ;-)
> 
> Reviewed-by: DJ Delorie <dj@redhat.com>
> 
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>> diff --git a/SHARED-FILES b/SHARED-FILES
>> +sysdeps/ieee754/flt-32/s_exp10m1f.c
>> +  (file src/binary32/exp10m1/exp10m1f.c in CORE_MATH)
> 
> I notice this file references the CORE MATH project (is it CORE_MATH or
> CORE-MATH?  The file is inconsistent) but doesn't provide an URL to the
> project; such a link would be appreciated in a future patch :-)

It is CORE-MATH, I will fix it.  And I will send patch to add a proper
link to the project.

> 
>> diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
>> index 8c333fcd29..00832393f6 100644
>> --- a/sysdeps/aarch64/libm-test-ulps
>> +++ b/sysdeps/aarch64/libm-test-ulps
>> @@ -1083,22 +1083,18 @@ ldouble: 3
>>  
>>  Function: "exp10m1":
>>  double: 4
>> -float: 2
>>  ldouble: 3
> 
> Does this mean "we're not testing it any more" or "we expect zero ULPs
> and verify that" ?  I don't see any ": 0" in any of the ulps files so I
> assume the latter.

The former, we are expecting a correctly rounded implementation for the
inputs we test.

> 
>>  Function: "exp10m1_downward":
>>  double: 3
>> -float: 3
>>  ldouble: 6
>>  
>>  Function: "exp10m1_towardzero":
>>  double: 2
>> -float: 3
>>  ldouble: 6
>>  
>>  Function: "exp10m1_upward":
>>  double: 5
>> -float: 3
>>  ldouble: 6
>>  
>>  Function: "exp2":
>> diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps
>> index d1665b9b72..ad4ae4c632 100644
>> --- a/sysdeps/arc/fpu/libm-test-ulps
>> +++ b/sysdeps/arc/fpu/libm-test-ulps
>> @@ -789,7 +789,6 @@ float: 1
>>  
>>  Function: "exp10m1":
>>  double: 4
>> -float: 2
>>  
>>  Function: "exp10m1_downward":
>>  double: 3
> 
> The other roundings still have ULPs?

I should, since arc uses the generic implementation.  I missed these entries,
thank for spotting it.

> 
>> diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps
>> index e768024eb5..4cecea6919 100644
>> --- a/sysdeps/arc/nofpu/libm-test-ulps
>> +++ b/sysdeps/arc/nofpu/libm-test-ulps
>> @@ -194,7 +194,6 @@ float: 1
>>  
>>  Function: "exp10m1":
>>  double: 3
>> -float: 2
>>  
>>  Function: "exp2":
>>  double: 1
> 
> Likewise here.

This is different because usually for soft-fp and for some hard-fp as well, 
only one rounding mode is support.  For such cases, there is no symbol_rounding
entries.

> 
>> diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps
>> diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps
>> diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
>> diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps
> 
> Ok.
> 
>> diff --git a/sysdeps/ieee754/flt-32/s_exp10m1f.c b/sysdeps/ieee754/flt-32/s_exp10m1f.c
>> new file mode 100644
>> index 0000000000..610f269b01
>> --- /dev/null
>> +++ b/sysdeps/ieee754/flt-32/s_exp10m1f.c
>> @@ -0,0 +1,225 @@
>> +/* Implementation of the exp10m1 function for binary32.
>> +
>> +Copyright (c) 2022-2024 Alexei Sibidanov. Paul Zimmermann.
>> +
>> +The original version of this file was copied from the CORE-MATH
>> +project (file src/binary32/exp10m1/exp10m1f.c, revision d680516).
>> +
>> +Permission is hereby granted, free of charge, to any person obtaining a copy
>> +of this software and associated documentation files (the "Software"), to deal
>> +in the Software without restriction, including without limitation the rights
>> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> +copies of the Software, and to permit persons to whom the Software is
>> +furnished to do so, subject to the following conditions:
>> +
>> +The above copyright notice and this permission notice shall be included in all
>> +copies or substantial portions of the Software.
>> +
>> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> +SOFTWARE.
>> + */
>> +
>> +#include <math.h>
>> +#include "math_config.h"
>> +#include <libm-alias-float.h>
> 
> Ok.
> 
>> +float
>> +__exp10m1f (float x)
>> +{
>> +  const double iln10h = 0x1.a934f09p+1 * 16;
>> +  const double iln10l = 0x1.e68dc57f2496p-29 * 16;
>> +  double z = x;
>> +  uint32_t ux = asuint (x);
>> +  uint32_t ax = ux & (~0u >> 1);
>> +  if (__glibc_unlikely (ux > 0xc0f0d2f1u))
>> +    { /* x < -7.52575 */
>> +      if (ax > (0xffu << 23))
>> +	return x + x; /* nan */
>> +      return (ux == 0xff800000) ? -0x1p+0 : -0x1p+0 + 0x1p-26f;
>> +    }
> 
> Ok.
> 
>> +  else if (__glibc_unlikely (ax > 0x421a209au))
>> +    { /* x > 38.5318 */
>> +      if (ax >= asuint (INFINITY))
>> +	return x + x; /* +Inf or NaN */
>> +      return __math_oflowf (0);
>> +    }
> 
> Ok.
> 
>> +  else if (__glibc_unlikely (ax < 0x3d89c604u))
>> +    { /* |x| < 0.1549/log(10) */
>> +      double z2 = z * z, r;
>> +      if (__glibc_unlikely (ax < 0x3d1622fbu))
>> +	{ /* |x| < 8.44e-2/log(10)  */
>> +	  if (__glibc_unlikely (ax < 0x3c8b76a3u))
>> +	    { /* |x| < 3.92e-2/log(10)  */
>> +	      if (__glibc_unlikely (ax < 0x3bcced04u))
>> +		{ /* |x| < 1.44e-2/log(10) */
>> +		  if (__glibc_unlikely (ax < 0x3acf33ebu))
>> +		    { /* |x| < 3.64e-3/log(10 */
>> +		      if (__glibc_unlikely (ax < 0x395a966bu))
>> +			{ /* |x| < 4.8e-4/log(10 */
>> +			  if (__glibc_unlikely (ax < 0x36fe4a4bu))
>> +			    { /* |x| < 1.745e-5/log(10) */
>> +			      if (__glibc_unlikely (ax < 0x32407f39u))
>> +				{ /* |x| < 2.58e-8/log(10) */
>> +				  if (__glibc_unlikely (ax < 0x245e5bd9u))
>> +				    { /* |x| < 4.82164e-17 */
>> +				      r = 0x1.26bb1bbb55516p+1;
>> +				    }
>> +				  else
>> +				    {
>> +				      if (__glibc_unlikely (ux == 0x2c994b7bu))
>> +					return 0x1.60f974p-37f - 0x1p-90f;
>> +				      r = 0x1.26bb1bbb55516p+1
>> +					  + z * 0x1.53524c73cea69p+1;
>> +				    }
>> +				}
>> +			      else
>> +				{
>> +				  if (__glibc_unlikely (ux == 0xb6fa215bu))
>> +				    return -0x1.1ff87ep-16 + 0x1p-68;
>> +				  r = 0x1.26bb1bbb55516p+1
>> +				      + z * (0x1.53524c73ea62fp+1
>> +					     + z * 0x1.0470591de2c75p+1);
>> +				}
>> +			    }
>> +			  else
>> +			    {
>> +			      static const double cp[] =
>> +				{
>> +				  0x1.26bb1bbb55515p+1, 0x1.53524c73cea69p+1,
>> +				  0x1.0470595038cc2p+1, 0x1.2bd7609fe1561p+0
>> +			        };
>> +			      r = (cp[0] + z * cp[1])
>> +				  + z2 * (cp[2] + z * cp[3]);
>> +			    }
>> +			}
>> +		      else
>> +			{
>> +			  static const double cp[] =
>> +			    {
>> +			      0x1.26bb1bbb55516p+1, 0x1.53524c73ce6dbp+1,
>> +			      0x1.0470591de3024p+1, 0x1.2bd76b79060e6p+0,
>> +			      0x1.1429ffd3a963dp-1
>> +			    };
>> +			  r = (cp[0] + z * cp[1])
>> +			      + z2 * (cp[2] + z * (cp[3] + z * cp[4]));
>> +			}
>> +		    }
>> +		  else
>> +		    {
>> +		      static const double cp[] =
>> +			{
>> +			  0x1.26bb1bbb55516p+1, 0x1.53524c73cea67p+1,
>> +			  0x1.0470591dc2953p+1, 0x1.2bd760a004d64p+0,
>> +			  0x1.142a85da6f072p-1, 0x1.a7ed70725b00ep-3
>> +			};
>> +		      r = (cp[0] + z * cp[1]) + z2
>> +			    * ((cp[2] + z * cp[3])
>> +			       + z2 * (cp[4] + z * cp[5]));
>> +		    }
>> +		}
>> +	      else
>> +		{
>> +		  static const double cp[] =
>> +		    {
>> +		      0x1.26bb1bbb55516p+1, 0x1.53524c73ceadep+1,
>> +		      0x1.0470591de2bb4p+1, 0x1.2bd76099a9d33p+0,
>> +		      0x1.1429ffd829b0bp-1, 0x1.a7f2a6a0f7dc8p-3,
>> +		      0x1.16e4dfbce0f56p-4
>> +		    };
>> +		  r = (cp[0] + z * cp[1])
>> +		      + z2 * ((cp[2] + z * cp[3])
>> +			      + z2 * (cp[4] + z * (cp[5] + z * cp[6])));
>> +		}
>> +	    }
>> +	  else
>> +	    {
>> +	      static const double cp[] =
>> +		{
>> +		  0x1.26bb1bbb55515p+1, 0x1.53524c73cea6ap+1,
>> +		  0x1.0470591de476p+1,  0x1.2bd7609fd4ee2p+0,
>> +		  0x1.1429ff70a9b48p-1, 0x1.a7ed71259ba5bp-3,
>> +		  0x1.16f3004fb3ac1p-4, 0x1.4116b0388aa9fp-6
>> +		};
>> +	      r = ((cp[0] + z * cp[1]) + z2 * (cp[2] + z * cp[3]))
>> +		  + (z2 * z2) * ((cp[4] + z * cp[5]) + z2 * (cp[6]
>> +							     + z * cp[7]));
>> +	    }
>> +	}
>> +      else
>> +	{
>> +	  static const double cp[] =
>> +	    {
>> +	      0x1.26bb1bbb55515p+1, 0x1.53524c73cea42p+1, 0x1.0470591de2d1dp+1,
>> +	      0x1.2bd760a010a53p+0, 0x1.1429ffd16170cp-1, 0x1.a7ed6b2a0d97fp-3,
>> +	      0x1.16e4e37fa51e4p-4, 0x1.4147fe4c1676fp-6, 0x1.4897c4b3e329ap-8
>> +	    };
>> +	  r = ((cp[0] + z * cp[1]) + z2 * (cp[2] + z * cp[3]))
>> +	      + (z2 * z2) * ((cp[4] + z * cp[5])
>> +			     + z2 * (cp[6] + z * (cp[7] + z * cp[8])));
>> +	}
>> +      r *= z;
>> +      return r;
>> +    }
> 
> I'm not a fan of having 99% of the code be in a hugely nested "else"
> clause, but I assume the upstream code is the same way.  Better would be
> to keep each block of code near its conditional by inverting the
> conditionals, assuming performance isn't affected by it.
> 
> Not a blocker though, just hard to understand.

Me neither, but in this case I did not change the original code (just
use glibc code style).  I will check with Paul if a code restructuring
would be worth. 

> 
>> +  else
>> +    {
>> +      /* -7.52575 < x < -0.1549/log(10) or 0.1549/log(10) < x < 38.5318  */
>> +      static const double tb[] =
>> +	{
>> +	  0x1p+0, 0x1.0b5586cf9890fp+0, 0x1.172b83c7d517bp+0,
>> +	  0x1.2387a6e756238p+0, 0x1.306fe0a31b715p+0, 0x1.3dea64c123422p+0,
>> +	  0x1.4bfdad5362a27p+0, 0x1.5ab07dd485429p+0, 0x1.6a09e667f3bcdp+0,
>> +	  0x1.7a11473eb0187p+0, 0x1.8ace5422aa0dap+0, 0x1.9c49182a3f09p+0,
>> +	  0x1.ae89f995ad3adp+0, 0x1.c199bdd85529cp+0, 0x1.d5818dcfba487p+0,
>> +	  0x1.ea4afa2a490dap+0
>> +	};
>> +      static const double c[] =
>> +	{
>> +	  0x1.62e42fefa398bp-5,  0x1.ebfbdff84555ap-11, 0x1.c6b08d4ad86d3p-17,
>> +	  0x1.3b2ad1b1716a2p-23, 0x1.5d7472718ce9dp-30, 0x1.4a1d7f457ac56p-37
>> +	};
>> +
>> +      if (__glibc_unlikely ((ux << 11) == 0))
>> +	{
>> +	  uint32_t k = (ux >> 21) - 0x1fc;
>> +	  if (k <= 0xb)
>> +	    {
>> +	      if (k == 0)
>> +		return 10.0f - 1.0f;
>> +	      if (k == 4)
>> +		return 100.0f - 1.0f;
>> +	      if (k == 6)
>> +		return 1000.0f - 1.0f;
>> +	      if (k == 8)
>> +		return 10000.0f - 1.0f;
>> +	      if (k == 9)
>> +		return 100000.0f - 1.0f;
>> +	      if (k == 10)
>> +		return 1000000.0f - 1.0f;
>> +	      if (k == 11)
>> +		return 10000000.0f - 1.0f;
>> +	    }
>> +	}
> 
> Ok.
> 
>> +      double a = iln10h * z;
>> +      double ia = floor (a);
>> +      double h = (a - ia) + iln10l * z;
>> +      int64_t i = ia;
>> +      int64_t j = i & 0xf;
>> +      int64_t e = i - j;
>> +      e >>= 4;
>> +      double s = tb[j];
>> +      s *= asdouble ((e + 0x3ffull) << 52);
>> +      double h2 = h * h;
>> +      double c0 = c[0] + h * c[1];
>> +      double c2 = c[2] + h * c[3];
>> +      double c4 = c[4] + h * c[5];
>> +      c0 += h2 * (c2 + h2 * c4);
>> +      double w = s * h;
>> +      return (s - 1.0) + w * c0;
>> +    }
>> +}
>> +libm_alias_float (__exp10m1, exp10m1)
> 
> Ok.
> 
>> diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps
>> diff --git a/sysdeps/m68k/m680x0/fpu/libm-test-ulps b/sysdeps/m68k/m680x0/fpu/libm-test-ulps
>> diff --git a/sysdeps/mips/mips32/libm-test-ulps b/sysdeps/mips/mips32/libm-test-ulps
>> diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps
>> diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
>> diff --git a/sysdeps/powerpc/nofpu/libm-test-ulps b/sysdeps/powerpc/nofpu/libm-test-ulps
>> diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps
> 
> Ok.
> 
>> index 4adad138b0..2271e1ee6c 100644
>> --- a/sysdeps/riscv/nofpu/libm-test-ulps
>> +++ b/sysdeps/riscv/nofpu/libm-test-ulps
>> @@ -974,7 +974,6 @@ ldouble: 3
>>  
>>  Function: "exp10m1":
>>  double: 2
>> -float: 1
>>  ldouble: 1
>>  
>>  Function: "exp2":
> 
> The others are missing?

Same case for arc-nofpu, riscv softfp only has support for one rounding.

> 
>> diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps
>> diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps
>> diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
>> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> 
> Ok.
> 



More information about the Libc-alpha mailing list