This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Fixes tree-loop-distribute-patterns issues


> Actually you should split simple_* to separate files and compile them with
> O0.

__attribute__ ((optimize ("O0"))) is sufficient in compilers that support
it (GCC 4.4 and later, I believe) and is less hassle than breaking up files.
I don't think anyone does or should care about performance analysis using
compilers so old that they lack it.
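
As a rough illustration of the per-function attribute, here is a minimal
sketch. The macro name TEST_NO_OPT, the helper check_simple_memset, and the
body of simple_memset are assumptions made up for this example, not the code
in the glibc tests. The guard leaves the attribute off for compilers that
are not GCC, on the assumption that they may not accept it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro (not a glibc name): request -O0 for a single
   function where the compiler is GCC, and expand to nothing elsewhere.  */
#if defined __GNUC__ && !defined __clang__
# define TEST_NO_OPT __attribute__ ((optimize ("O0")))
#else
# define TEST_NO_OPT
#endif

/* Byte-at-a-time reference implementation.  Compiling it at -O0 keeps the
   optimizer from rewriting the loop, for example into a call to memset.  */
TEST_NO_OPT
static void *
simple_memset (void *s, int c, size_t n)
{
  unsigned char *p = s;
  while (n-- > 0)
    *p++ = (unsigned char) c;
  return s;
}

/* Small self-check: fill the middle of a buffer and verify that the bytes
   on either side are left untouched.  */
static void
check_simple_memset (void)
{
  unsigned char buf[8] = { 0 };
  void *ret = simple_memset (buf + 1, 0xab, 6);
  assert (ret == buf + 1);
  assert (buf[0] == 0 && buf[7] == 0);
  for (size_t i = 1; i <= 6; i++)
    assert (buf[i] == 0xab);
}
```

The rest of the file is still compiled with whatever flags the build uses;
only the annotated function is affected.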

> Doing otherwise makes their performance dependent on gcc version and
> this makes results even more unreliable.

Perhaps that matters for benchtests, if they are intended to use the
simple_* implementations' performance as a baseline for comparison.  The
correctness tests (i.e. all tests outside benchtests/) do not care about
that, and that's all I'm personally concerned with.

If what you want as a performance baseline is "the obvious loop handling a
byte at a time", then -O0 code can easily be substantially worse than this
and give a misleading impression of what naive code would actually do.
With -O0, the compiler is exceedingly stupid (by design), and usually every
operation has excess spill and reload operations, which could easily
dominate the performance of what would otherwise be a very tight loop.
Short of hand-coding naive assembly for each machine, I'm not sure how you
can robustly address that issue.  Perhaps -O1 is a reasonable approximation
of the assembly a human would write when not trying to be especially clever,
but that's just a shot in the dark.
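
To make the -O1 idea concrete, here is a sketch of what such a baseline
might look like. It is only an assumption about one way to do it, not
something the benchtests actually do. BASELINE_OPT is a hypothetical macro
name, and the second attribute argument is meant to correspond to GCC's
-fno-tree-loop-distribute-patterns option, so that the loop is not turned
into a library call.

```c
#include <stddef.h>

/* Hypothetical macro (not a glibc name): compile one function at -O1 and
   disable the loop-to-libcall pattern transformation.  It expands to
   nothing for compilers other than GCC.  */
#if defined __GNUC__ && !defined __clang__
# define BASELINE_OPT \
  __attribute__ ((optimize ("O1", "no-tree-loop-distribute-patterns")))
#else
# define BASELINE_OPT
#endif

/* "The obvious loop handling a byte at a time": walk the string until the
   terminating null byte and return the distance covered.  */
BASELINE_OPT
static size_t
simple_strlen (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    ++p;
  return (size_t) (p - s);
}
```

Whether the code this produces matches what a person would write by hand
still has to be checked per machine by reading the generated assembly.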


Thanks,
Roland
