This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Support separate benchmark outputs


On Tue, Apr 16, 2013 at 07:33:55PM +0530, Siddhesh Poyarekar wrote:
> On Tue, Apr 16, 2013 at 03:28:38PM +0200, Ondřej Bílka wrote:
> > I already wrote a systemwide profiler for string functions. It
> > integrates the results so you do not have to.
> > I also included a unit test there. See kam/WWW/memcpy_profile.tar.bz2
> > 
> > I plan to integrate this into the dryrun framework.
> 
> Systemwide profiling has different goals compared to microbenchmarks.
> 
> > > +      for (i = 0; i < 32; ++i)
> > > +	{
> > > +	  HP_TIMING_NOW (start);
> > > +	  CALL (impl, dst, src, len);
> > > +	  HP_TIMING_NOW (stop);
> > > +	  HP_TIMING_BEST (best_time, start, stop);
> > > +	}
> > > +
> > You simply cannot do measurements this way. They are biased, and
> > you will get a result that is about 20 cycles off because it does
> > not take branch misprediction and a thousand other factors into
> > account.
> 
> And I think that's fine because I get measurements for what I have
> defined.  
That is not fine. You could just as well place random() there and say
that you measured what you defined.
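
To make the bias concrete, here is a minimal sketch (plain C, x86
only, and not the patch itself; the now() wrapper is my stand-in for
HP_TIMING_NOW) of what the best-of-32 loop actually converges to:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for HP_TIMING_NOW: raw x86 cycle counter.  */
static inline uint64_t
now (void)
{
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
}

int
main (void)
{
  static char src[4096], dst[4096];
  uint64_t best = UINT64_MAX;

  memset (src, 1, sizeof src);

  /* Same arguments 32 times in a row: after iteration one the branch
     predictor and caches are fully warm, so only the trained best
     case survives the minimum.  */
  for (int i = 0; i < 32; ++i)
    {
      uint64_t start = now ();
      memcpy (dst, src, 100);
      uint64_t stop = now ();
      if (stop - start < best)
        best = stop - start;
    }

  printf ("trained best case: %" PRIu64 " cycles\n", best);
  return 0;
}

After the first iteration the predictor has memorized every length
check inside memcpy and both buffers sit in L1, so the "best" time is
a trained ideal case that a cold caller never sees.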

> While I agree that systemwide profiling might give a good
> overall picture about string function performance, it does not give
> any information about its performance in specific cases.  Also, the
> key factor here is the ability to compare function implementations
> side by side.  

> More than numbers, what matters here is the relative
> performance.

That is exactly my point: you must measure relative performance.
However, the code above does not measure it. In a simple test,
http://kam.mff.cuni.cz/~ondra/memcpy_test.tar.bz2
I just switched whether the measurements are done in random order or
sequentially, as you do.
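
The difference between the two runs is only the iteration order,
roughly like this (a simplified reconstruction, not the exact code
from the tarball; the sample array, the 32-repeat grouping and the
shuffle are illustrative):

#include <stdlib.h>
#include <string.h>

#define NSAMPLES 100000

/* One recorded memcpy call; in the real test the lengths come from
   traced applications.  */
struct sample { size_t len; };

static struct sample samples[NSAMPLES];
static char dst[4096], src[4096];

static void
init_samples (void)
{
  /* "seq" order repeats each length 32 times back to back, like the
     patch's inner loop, so the predictor can specialize on it.  */
  for (size_t i = 0; i < NSAMPLES; ++i)
    samples[i].len = (i / 32) % sizeof src;
}

/* Fisher-Yates shuffle; applying it is the only difference between
   the "rand" and "seq" runs.  */
static void
shuffle (struct sample *s, size_t n)
{
  for (size_t i = n - 1; i > 0; --i)
    {
      size_t j = (size_t) rand () % (i + 1);
      struct sample tmp = s[i];
      s[i] = s[j];
      s[j] = tmp;
    }
}

int
main (int argc, char **argv)
{
  init_samples ();
  if (argc > 1 && strcmp (argv[1], "rand") == 0)
    shuffle (samples, NSAMPLES);

  /* Identical total work either way; only the call order changes.  */
  for (size_t i = 0; i < NSAMPLES; ++i)
    memcpy (dst, src, samples[i].len);
  return 0;
}

Run this under time with and without the shuffle and you get numbers
like those below.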

According to the sequential measurement, the glibc implementation is
better than mine by 15%. However, when I sample randomly, my
implementation becomes 33% better than the glibc one.

              real      user      sys
seq  glibc    0m0.207s  0m0.196s  0m0.008s
rand glibc    0m0.450s  0m0.448s  0m0.000s
seq  new      0m0.215s  0m0.216s  0m0.000s
rand new      0m0.283s  0m0.280s  0m0.000s
seq  generic  0m0.218s  0m0.216s  0m0.000s
rand generic  0m0.472s  0m0.464s  0m0.004s
seq  byte     0m2.034s  0m2.028s  0m0.000s
rand byte     0m2.079s  0m2.068s  0m0.008s

> 
> In other words, it would be more productive to help enhance the data
> in the tests to increase coverage.

You will get data from the dryrun framework. Below is 20MB of memcpy
data:
http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2


