This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Inline C99 math functions


On Tue, Jun 16, 2015 at 09:31:02AM -0300, Adhemerval Zanella wrote:
> 
> 
> > On 16-06-2015 02:00, Ondřej Bílka wrote:
> > On Mon, Jun 15, 2015 at 09:35:22PM +0000, Joseph Myers wrote:
> >> On Mon, 15 Jun 2015, Ondřej Bílka wrote:
> >>
> >>> As I wrote in the other thread about gcc builtins having poor
> >>> performance, a benchmark is tricky. The main problem is that these tests
> >>> are in a branch and gcc will simplify them. As the builtins are
> >>> branchless, it obviously couldn't simplify them.
> >>
> >> Even a poor benchmark, checked into the benchtests directory, would be a 
> >> starting point for improved benchmarks as well as for benchmarking any 
> >> future improvements to these functions.  Having a benchmark that everyone 
> >> can readily use with glibc is better than having a performance improvement 
> >> asserted in a patch submission without the benchmark being available at 
> >> all.
> >>
> > No, a poor benchmark is dangerous and much worse than none at all. With a
> > poor benchmark you could easily check in a performance regression that
> > looks like an improvement on the benchmark, and you wouldn't notice until
> > some developer measures poor performance of his application and finds that
> > the problem is on his side.
> > 
> > I could almost "show" that the fpclassify gcc builtin is slower than a
> > library call; in the benchmark below I exploit branch misprediction to get
> > close. If I could use the "benchtest" below, I could "improve" fpclassify
> > by making the zero check branchless, which would improve the benchmark
> > numbers enough to actually beat the call overhead. Or I could play with
> > the probability of subnormals to increase the running time of the gcc
> > builtin and decrease that of the library. The moral is that with a poor
> > benchmark your implementation will be poor, as it tries to minimize the
> > benchmark.
> 
> So to make this proposal move forward, how exactly do you propose to
> create a benchtest for such a scenario? I get that this is tricky and a lot
> of variables may apply, but I do agree with Joseph that we shouldn't aim
> strictly for optimal performance; imho using compiler builtins with
> reasonable performance is a gain in code maintainability.
> 
As I said before, these are hard to measure, and I could also argue
against my own benchmark that it is inaccurate, since it doesn't measure
the effect of the cpu pipeline when the function does other computation.
The answer is: don't do a microbenchmark.

Take an existing math function that has a benchtest and see what
performance difference you gain. Most complex math functions start with
tests like

if (isfinite(x) && isfinite(y))

so there should be a measurable performance improvement; possibly add
such tests if they aren't covered.
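
For reference, here is a minimal sketch, not glibc's actual code, of what
an inlined isfinite amounts to for double: inspect the raw bits and check
that the exponent field is not all ones. The name isfinite_inline is made
up for illustration; the point is that the inline form is a couple of
integer instructions instead of an out-of-line call.

#include <stdint.h>
#include <string.h>

static inline int
isfinite_inline (double x)
{
  uint64_t u;
  /* Reinterpret the double as its IEEE 754 bit pattern.  */
  memcpy (&u, &x, sizeof u);
  /* The value is finite unless the 11-bit exponent field is all ones,
     which encodes infinities and NaNs.  */
  return ((u >> 52) & 0x7ff) != 0x7ff;
}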


> So from the various code pieces you have thrown at the mailing list, I see
> that we may focus on a benchmark that uses a random sample with different
> probability scenarios for FP number types:
> 
> 1. high prob for normal
> 2. high prob for nan
> 3. high prob for inf,
> 4. high prob for subnormal 
> 
> And 2, 3, 4 should not be the optimization focus (since they are not the
> usual case for most computations and algorithms). Do you propose
> something different?

No, I did that to show that microbenchmarks tend to be wrong; you need
to do benchmarking to rule that out. Depending on the distribution of
these cases a different implementation becomes optimal, and with some work
I could find a distribution where the cost of the gcc builtin's decision
tree is more than the saved call overhead.

That just means going and collecting data to avoid problems like this. And
you forgot case 5: high probability of zero. That could make the fpclassify
benchmark misleading when you just use a zeroed array, like I did with the
original benchmark.
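
To illustrate the distribution point, here is a hypothetical sketch (not an
actual benchtests file) of an input-sensitive measurement: time fpclassify
over arrays with different mixes of FP classes. The array size, iteration
count, and the 90% zero ratio are arbitrary assumptions for illustration.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096
#define ITERS 10000

static double buf[N];

/* Fill BUF so that roughly FRAC_ZERO of the entries are zero and the
   rest are ordinary normal numbers.  */
static void
fill (double frac_zero)
{
  for (int i = 0; i < N; i++)
    buf[i] = ((double) rand () / RAND_MAX < frac_zero) ? 0.0 : 1.5 + i;
}

/* Time ITERS passes of fpclassify over BUF and return seconds.  */
static double
bench (void)
{
  struct timespec t0, t1;
  volatile int sink = 0;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int it = 0; it < ITERS; it++)
    for (int i = 0; i < N; i++)
      sink += fpclassify (buf[i]);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int
main (void)
{
  /* An all-normal input and a mostly-zero input can rank a branchy and a
     branchless implementation differently, which is the point about
     misleading microbenchmarks.  */
  fill (0.0);
  printf ("all normal: %f s\n", bench ());
  fill (0.9);
  printf ("90%% zeros:  %f s\n", bench ());
  return 0;
}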

