This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Fixes tree-loop-distribute-patterns issues


Here I answer the question of whether O0 or O2 is better for getting
consistent results. I do this mainly so that anybody browsing the
archives does not pick up and propagate errors such as the claim that
O2 works. I am now convinced that you cannot get consistent results
across all compilers and all machines.

I picked O0 mainly because the possibility of a big discrepancy looked
smaller.

As always, when several conflicting factors can affect performance, the
only way to resolve the question is to measure.

I picked simple_memcpy, simple_memset, simple_strcmp and simple_strcspn
and compiled them with gcc 4.1, 4.4, 4.7, 4.8 and 4.9.
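
For readers of the archive, here is a minimal sketch of what such a
measurement looks like. It is not the attached benchmark: the byte-wise
loop only follows the spirit of simple_memcpy, time_copy is a
hypothetical helper name, and the empty asm barrier assumes gcc or a
compatible compiler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <stddef.h>
#include <time.h>

/* Byte-at-a-time copy, in the spirit of the simple_memcpy reference. */
void *simple_memcpy(void *dst, const void *src, size_t n)
{
    char *d = dst;            /* explicit void * to char * conversion */
    const char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical helper: time `iters` copies of `n` bytes, in seconds. */
double time_copy(void *dst, const void *src, size_t n, long iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        simple_memcpy(dst, src, n);
        /* Compiler barrier (gcc syntax) so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
           + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Building the same file with each gcc version at O0 and O2 and calling
time_copy(dst, src, n, iters) gives one number per compiler and level to
compare. Be aware that with -ftree-loop-distribute-patterns in effect
gcc may replace such a loop with a call to memcpy, which is the subject
of this thread.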


Benchmark and results are attached (rep_arch files).

On Nehalem, O0 memcpy is 30% faster with gcc 4.8 and 4.9.
O2 memcpy, however, first improves by around 40% with gcc 4.7 and 4.8
and then regresses by 25%.

For memset there is a regression at O0, but O2 stays the same.
For strcmp both improve by a similar amount with gcc 4.8 and 4.9.
There is a 20% regression in O2 strcspn, but at O0 performance is roughly the same.

On Xeon, memcpy is unstable at O0 but stable at O2.
The memset results are within 20% at O0, but at O2 there is a 30%
performance regression with gcc 4.8 and 4.9.
strcmp gets a 25% boost at both O0 and O2.
strcspn is unstable at O0 and stable at O2.

On Ivy Bridge, memcpy is more stable at O2 than at O0,
memset is stable at both,
strcmp is more stable at O0 than at O2, and
strcspn is more stable at O2 than at O0.

On Opteron it is a clear win for O2, except for
strcspn, where there are two regressions, with gcc 4.4 and 4.8.

On fx10 the results are similar, with no clear leader, and sometimes noisy.

I also measured O1 and O3, with similarly chaotic results; you can see
them for yourself.

On Fri, Jun 21, 2013 at 04:52:40PM +0000, Joseph S. Myers wrote:
> On Fri, 21 Jun 2013, Ondrej Bilka wrote:
> 
> > Are you sure? Lower optimization levels keep a structure of program
> > mostly intact so a single change is unlikely to have big impact on
> > performance. If this is so then combination is likely to produce just a
> > noise.
> 
> "structure of program" can include such things as an explicit conversion 
> step from void * to char * (for example).  If the internal representation 
> of one compiler version involves an assignment between two internal 
> variables with different types to effect that conversion, and another 
> compiler version elides that assignment and uses just one internal 
> variable, you may get different code, even though all versions would elide 
> such an assignment when optimizing.
> 

> > > Any sort of performance measurement involving -O0 is extremely suspect, 
Now there are results, and O0 looks no more suspect than O2.

> > > simply because performance is essentially not a consideration at all for 
> > > -O0 code generation; other matters such as speed of the compiler itself 
> > > and debuggability are the considerations involved, and are the things 
> > > people may try to avoid regressing across compiler upgrades.
> > > 
> > Here we need it mainly reference, as in this case it is more important than
> > actual performance.
> 
> I think a comparison is only particularly meaningful when the different 
> versions of a function being compared are built with the same compiler, 
> with the same options, and run on the same hardware.
>
Already discussed, see
 http://www.sourceware.org/ml/libc-alpha/2013-06/msg00784.html

> If what you want to do is compare <optimized-memcpy-1> and 
> <optimized-memcpy-2>, it's not clear to me why a simple version is needed 
> at all; just compare the two implementations of interest directly and 
> don't involve a third implementation.  But if you choose to do the 
> comparison as <optimized-memcpy-1>/<simple-memcpy> compared to 
> <optimized-memcpy-2>/<simple-memcpy>,
> 
I also think that adding simple-memcpy was a design mistake, but as it is
there you should either make the results meaningful or write a patch to
remove simple* from the benchmarks.

> (a) it doesn't really matter how <simple-memcpy> performs, as long as the 
> two comparisons use identical <simple-memcpy>; and
> 
> (b) the ratios themselves will be more meaningful to humans if the 
> comparison is against <simple-memcpy> built with the same options used for 
> normal C code in glibc, rather than against something so stupid it would 
> never go in a glibc binary.
> 
Pick one. These two arguments contradict each other: if you want to use
the options that glibc is built with, you must use a specific set of
options and a specific gcc version.
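
The arithmetic behind this disagreement can be sketched as follows. The
function names and the timings in the note after the code are made up
for illustration; they are not taken from the attached results.

```c
/* Speed of an implementation relative to a baseline such as
   simple_memcpy; values above 1.0 mean faster than the baseline. */
double speedup(double baseline_secs, double impl_secs)
{
    return baseline_secs / impl_secs;
}

/* How much better implementation 2 looks than implementation 1 when
   each is reported as a ratio against its own baseline measurement.
   If both baselines are equal they cancel and the result is simply
   impl1_secs / impl2_secs. */
double relative_speedup(double baseline1_secs, double impl1_secs,
                        double baseline2_secs, double impl2_secs)
{
    return speedup(baseline2_secs, impl2_secs)
           / speedup(baseline1_secs, impl1_secs);
}
```

With the same baseline in both measurements,
relative_speedup(8.0, 4.0, 8.0, 2.0) is 2.0, which is just 4.0 / 2.0, so
the baseline cancels. If the baseline itself changes between runs, say
from 8.0 to 16.0 seconds because it was built by a different compiler,
the same comparison reports 4.0 although neither optimized
implementation changed.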

Attachment: byte_benchmark.tar.bz2
Description: Binary data

