This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Support separate benchmark outputs
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 16 Apr 2013 17:50:32 +0200
- Subject: Re: [PATCH] Support separate benchmark outputs
- References: <20130416122544 dot GH3063 at spoyarek dot pnq dot redhat dot com> <20130416132838 dot GA29626 at domone dot kolej dot mff dot cuni dot cz> <20130416140355 dot GI3063 at spoyarek dot pnq dot redhat dot com>
On Tue, Apr 16, 2013 at 07:33:55PM +0530, Siddhesh Poyarekar wrote:
> On Tue, Apr 16, 2013 at 03:28:38PM +0200, Ondřej Bílka wrote:
> > I already wrote a systemwide profiler for string functions. It integrates
> > the results so you do not have to.
> > I also included a unit test there. See kam/WWW/memcpy_profile.tar.bz2
> >
> > I plan to integrate this into the dryrun framework.
>
> Systemwide profiling has different goals compared to microbenchmarks.
>
> > > + for (i = 0; i < 32; ++i)
> > > + {
> > > + HP_TIMING_NOW (start);
> > > + CALL (impl, dst, src, len);
> > > + HP_TIMING_NOW (stop);
> > > + HP_TIMING_BEST (best_time, start, stop);
> > > + }
> > > +
> > You simply cannot do measurements this way. They are biased, and you
> > will get results that are about 20 cycles off because they do not
> > account for branch misprediction and a thousand other factors.
>
> And I think that's fine because I get measurements for what I have
> defined.
That is not fine. You could just as well place random() there and claim
that you measure what you defined.
> While I agree that systemwide profiling might give a good
> overall picture about string function performance, it does not give
> any information about its performance in specific cases. Also, the
> key factor here is the ability to compare function implementations
> side by side.
> More than numbers, what matters here is the relative
> performance.
That is exactly my point: you must measure relative performance. However,
the code above does not measure performance. In a simple test
http://kam.mff.cuni.cz/~ondra/memcpy_test.tar.bz2
I just switched whether the measurements are done in random order or
sequentially, as you do.
According to the sequential measurement, the glibc implementation is
better than mine by 15%. However, when I sample randomly, my
implementation becomes 33% better than the glibc one.
              real      user      sys
seq  glibc    0m0.207s  0m0.196s  0m0.008s
rand glibc    0m0.450s  0m0.448s  0m0.000s
seq  new      0m0.215s  0m0.216s  0m0.000s
rand new      0m0.283s  0m0.280s  0m0.000s
seq  generic  0m0.218s  0m0.216s  0m0.000s
rand generic  0m0.472s  0m0.464s  0m0.004s
seq  byte     0m2.034s  0m2.028s  0m0.000s
rand byte     0m2.079s  0m2.068s  0m0.008s
>
> In other words, it would be more productive to help enhance the data
> in the tests to increase coverage.
You will get data from the dryrun framework. Below is 20 MB of memcpy data:
http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2