This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Inline C99 math functions


On Mon, Jun 15, 2015 at 09:35:22PM +0000, Joseph Myers wrote:
> On Mon, 15 Jun 2015, Ondřej Bílka wrote:
> 
> > As I wrote in other thread that gcc builtins have poor performance a
> > benchmark is tricky. Main problem is that these tests are in branch and
> > gcc will simplify them. As builtins are branchless it obviously couldn't
> > simplify them.
> 
> Even a poor benchmark, checked into the benchtests directory, would be a 
> starting point for improved benchmarks as well as for benchmarking any 
> future improvements to these functions.  Having a benchmark that everyone 
> can readily use with glibc is better than having a performance improvement 
> asserted in a patch submission without the benchmark being available at 
> all.
>
No, a poor benchmark is dangerous and much worse than none at all. With a
poor benchmark you could easily check in a performance regression that
looks like an improvement on the benchmark, and you wouldn't notice until
some developer measures poor performance of his application and finds that
the problem is on his side.

I could almost "show" that the fpclassify gcc builtin is slower than a
library call; in the benchmark below I exploit branch misprediction to get
close. If I could use the "benchtest" below, I could "improve" fpclassify by
making the zero check branchless, which would improve the benchmark numbers
enough to actually beat the call overhead. Or I could play with the
probability of subnormals to increase the running time of the gcc builtin
and decrease that of the library call. The moral is that with a poor
benchmark your implementation will be poor, as it tries to minimize the
benchmark.
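
To make the dependence on the input mix concrete, here is a minimal sketch of a helper that turns the class probabilities into explicit parameters. It is not part of the benchmark in this mail: the names fill_inputs and count_class and the percentage parameters are hypothetical, and DBL_MIN / 2 is used as a subnormal value on the assumption of IEEE 754 doubles without flush-to-zero.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original mail): fill d[0..n-1] so that
   roughly pct_zero percent of the entries are 0.0, pct_subnormal percent
   are subnormal, and the rest are the normal value 1.3.  */
static void
fill_inputs (double *d, size_t n, int pct_zero, int pct_subnormal)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      int r = rand () % 100;
      if (r < pct_zero)
        d[i] = 0.0;
      else if (r < pct_zero + pct_subnormal)
        d[i] = DBL_MIN / 2;   /* Half the smallest normal double.  */
      else
        d[i] = 1.3;
    }
}

/* Count how many entries fall into the fpclassify class cls, to check
   which mix was actually generated.  */
static size_t
count_class (const double *d, size_t n, int cls)
{
  size_t i, count = 0;
  for (i = 0; i < n; i++)
    if (fpclassify (d[i]) == cls)
      count++;
  return count;
}
```

Running the same timed loop over several mixes, for example all normal, half zeros, or a few percent subnormals, would show how much a single number depends on the chosen distribution.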

With d[i]=(rand() % 2) ? 0.0 : (rand() % 2 ? 1.3 : 1.0/0.0);

there is almost no difference

06:28:15:~$  gcc -O3  ft.c -lm 
06:28:19:~$ time ./a.out 

real	0m0.719s
user	0m0.712s
sys	0m0.004s

06:28:26:~$  gcc -O3  ft.c -lm  -DBUILTIN
06:28:39:~$ time ./a.out 

real	0m0.677s
user	0m0.676s
sys	0m0.000s

When I change that to d[i]=4.2, there is a big improvement:

06:50:55:~$ gcc -O3 ft.c -lm 
06:51:00:~$ time ./a.out 

real	0m0.624s
user	0m0.624s
sys	0m0.000s
06:51:02:~$ gcc -O3 ft.c -lm  -DBUILTIN
06:51:12:~$ time ./a.out 

real	0m0.384s
user	0m0.380s
sys	0m0.000s


> It isn't necessary to show that the use of built-in functions here is 
> optimal.  Simply provide evidence that (a) it's at least as good as the 
> existing out-of-line functions, for calls from user programs, and (b) libm 
> functions that previously used glibc-internal inlines, and would use GCC 
> built-in functions after the patch, don't suffer any significant 
> performance regression from that change.
> 
Which, as I wrote before, isn't the case, due to the poor implementation of
the gcc builtin. You would need to fix that or add inlines for optimal
performance.
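
For illustration only, an inline of the kind mentioned above could look like the following sketch. It is not the inline from the patch under discussion: the name sketch_fpclassify is hypothetical, and the code assumes IEEE 754 binary64 doubles. It tests the common normal case first with one predictable branch and only then distinguishes the rare classes.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inline (assumes IEEE 754 binary64): classify x by looking
   at its exponent and mantissa bits, handling normal numbers first.  */
static inline int
sketch_fpclassify (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);   /* Well-defined way to read the bits.  */

  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & UINT64_C (0x000fffffffffffff);

  /* Fast path: exponent neither all zeros nor all ones.  */
  if (__builtin_expect (exponent != 0 && exponent != 0x7ff, 1))
    return FP_NORMAL;
  if (exponent == 0)
    return mantissa == 0 ? FP_ZERO : FP_SUBNORMAL;
  return mantissa == 0 ? FP_INFINITE : FP_NAN;
}
```

Whether this beats the builtin or the library call would still have to be measured on representative inputs; the sketch only shows the shape of a branchy inline with a fast path for normal numbers.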

A variant of my "benchmark" is here:


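#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* noinline keeps the call to nor() in both variants, so the comparison is
   gcc builtin versus fpclassify from math.h inside the same wrapper.  */
#ifdef BUILTIN
int __attribute__((noinline)) nor(double x)
{
  return __builtin_expect (__builtin_fpclassify (FP_NAN, FP_INFINITE,
                                                 FP_NORMAL, FP_SUBNORMAL,
                                                 FP_ZERO, x), 0);
}
#else
int __attribute__((noinline)) nor(double x)
{
  return fpclassify (x);
}
#endif


int main()
{
  double ret = 0.0;
  int i, j;
  double *d = malloc (800000);

  /* Random mix of zero, normal and infinite inputs, so the branches inside
     the classification are hard to predict.  */
  for (i = 0; i < 1000; i++)
    d[i] = (rand() % 2) ? 0.0 : (rand() % 2 ? 1.3 : 1.0/0.0);

  for (j = 0; j < 100000; j++)
    for (i = 0; i < 1000; i++)
      {
        int result = nor (d[i]);
        if (result == FP_NORMAL)
          ret += 42;
      }

  /* Use ret so the loops cannot be optimized away.  */
  return ret;
}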