This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Fixes tree-loop-distribute-patterns issues


On Fri, 2013-07-05 at 09:26 +0200, Ondřej Bílka wrote:
> On Fri, Jun 21, 2013 at 12:44:03PM +0200, Torvald Riegel wrote:
> > On Fri, 2013-06-21 at 13:24 +0200, Ondřej Bílka wrote:
> > > On Fri, Jun 21, 2013 at 10:07:08AM +0200, Torvald Riegel wrote:
> > > > On Fri, 2013-06-21 at 04:00 +0200, Ondřej Bílka wrote:
> > > > > I chose -O0 as the lesser evil, compared to having a reference
> > > > > implementation that runs twice as fast depending on which
> > > > > compiler you use.
> > > > > 
> > > > > One solution is to mandate running benchmarks with a fixed
> > > > > version of gcc and fixed flags.
> > > > > 
> > > > > A second variant could be to keep the assemblies plus a
> > > > > regeneration script that would be run with a specific gcc.
> > > > 
> > > > Yes you can try to find a niche where you hope you can compare stuff.
> > > > But you can as well just get all the measurements you can from people
> > > > out there -- with whatever version of gcc is available -- and take this
> > > > into account when drawing conclusions from the data.  That is, you'd
> > > > set up your machine learning in such a way that it looks at the
> > > > data and checks whether there is high confidence for a certain
> > > > conclusion (e.g., whether the new version of the code is faster
> > > > or not).  Confidence will be lower if, for
> > > > example, we see performance vary a lot with different versions of gcc,
> > > > but remain more or less unchanged when gcc versions don't differ; but if
> > > > performance varies independently of the gcc version, that's also useful
> > > > to know because it means we draw our conclusion from a wider set of
> > > > tests.  Likewise for other properties of the test environment such as
> > > > the CPU etc.
> > > >
> > > And what will we do with this data?
> > >
> Please answer this question.

Analyze it.  What else?
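
To make that concrete, here is a minimal sketch (in Python, with
made-up numbers and field names) of the kind of analysis I described
above: group the timings by gcc version, then compare the spread
across versions with the spread within a single version.

    from collections import defaultdict
    from statistics import mean, pstdev

    # Hypothetical records collected from many machines:
    # (gcc_version, cpu, seconds).
    results = [
        ("4.7", "x86_64", 1.02), ("4.7", "x86_64", 1.05),
        ("4.8", "x86_64", 0.81), ("4.8", "x86_64", 0.79),
    ]

    by_gcc = defaultdict(list)
    for gcc, cpu, secs in results:
        by_gcc[gcc].append(secs)

    # Average spread of timings within each gcc version.
    within = mean(pstdev(v) for v in by_gcc.values() if len(v) > 1)
    # Spread of the per-version means across versions.
    across = pstdev(mean(v) for v in by_gcc.values())

    # If 'across' dwarfs 'within', the compiler version is a likely
    # cause, and confidence in any "new code is faster" conclusion
    # should drop accordingly.
    print(f"within: {within:.4f}s  across: {across:.4f}s")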

> When you make vague proposals, you risk people arguing that your
> business plan is to build a $10000 piece of machinery for a task that
> can easily be solved with a $5 screwdriver.
> 
> I sent a simple measurement. Please state what additional information
> you want.

I was arguing for getting as many measurements as we can with as much
metadata as possible, even if it might be from experiments which we
don't control tightly.  And for following up on this by trying to
extract insights from it.  That doesn't mean any tightly controlled
experiments are bad.  But it also doesn't mean that only tightly
controlled experiments are useful.
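
As a sketch of what "as much metadata as possible" could mean per
measurement, consider recording something like the following (the
field names are illustrative, not an existing glibc benchmark format):

    import json, platform, subprocess, time

    def gcc_version():
        # First line of "gcc --version" identifies the compiler.
        out = subprocess.run(["gcc", "--version"],
                             capture_output=True, text=True)
        return out.stdout.splitlines()[0]

    record = {
        "benchmark": "memcpy-bytewise",  # hypothetical benchmark name
        "seconds": 1.234,                # the measurement itself
        "timestamp": time.time(),        # catches time-dependent effects
        "gcc": gcc_version(),
        "cflags": "-O2",                 # flags actually used for the build
        "cpu": platform.processor(),
        "kernel": platform.release(),
    }
    print(json.dumps(record, indent=2))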

> You might mine gold there, but trying that on a byte-by-byte version
> with an artificial set of data is one of the least likely places to
> find it.

I don't understand this sentence.  I read this as you saying that you
don't think one will be able to analyze the data.  Is that correct?

> > > You typically use machine learning to learn trivial facts from
> > > data sets that are too vast to browse manually.
> > 
> > Is your average web search just about trivial facts?
> >
> Actually, yes. Most of my searches are copy-pasted lines with error
> messages. Over time, Google results got worse and worse as more
> sophisticated techniques only added noise here. The other case is
> searching for projects by acronym, where again you could get much
> better results if Google did not ignore casing, did not treat it as a
> misspelling, etc.

Well, you certainly got the point I was trying to make (I hope).  I
doubt Google would change their search if it didn't benefit most uses.

> > > It is faster to just browse the results, and you will train your
> > > intuition on them.
> > 
> > Manual inspection just doesn't scale to the scope we need it to scale
> > to.  We know that there are *lots* of parameters that can influence
> > performance, we cannot control all of them, and we likely don't even
> > know all of them.
> 
> Please be specific.

What I said was specific.  Do you want examples (besides the example
you gave below)?

> The most difficult part is having enough data for the causes to
> manifest as signal, not just random noise. For example, if you do not
> track the time when benchmarks were run, you could miss that those
> run in Leo and Virgo were slower.
> 
> The second difficult step, once you have the data, is to act on it.
> You need to form a model that explains what was happening so you can
> modify your code accordingly.

Coming up with a performance model is rather independent of how you
collect the data that's input to the model.  But validating the model is
usually easier if you have lots of data, and not just data from a small
set of experiments you selected.
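
As a minimal sketch of that kind of validation (the linear cost model
and the numbers are invented; the point is only the split between the
data used to fit the model and the data used to check it):

    import random

    def fit_linear(samples):
        # Least-squares fit of seconds = a * size + b.
        n = len(samples)
        sx = sum(s for s, _ in samples)
        sy = sum(t for _, t in samples)
        sxx = sum(s * s for s, _ in samples)
        sxy = sum(s * t for s, t in samples)
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        b = (sy - a * sx) / n
        return a, b

    # Invented (size, seconds) measurements with some noise.
    data = [(s, 0.5e-9 * s + 1e-7 + random.gauss(0, 2e-8))
            for s in range(64, 4096, 64)]
    random.shuffle(data)
    half = len(data) // 2
    train, held_out = data[:half], data[half:]

    # Fit on one half, then measure the error on data the fit
    # never saw; a model that only explains the training half is
    # not explaining anything.
    a, b = fit_linear(train)
    err = sum(abs(a * s + b - t) for s, t in held_out) / len(held_out)
    print(f"mean absolute error on held-out data: {err:.2e} s")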

