This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: HWCAP is method to determine cpu features, not selection mechanism.


On Wed, 2015-06-10 at 11:21 -0300, Adhemerval Zanella wrote:
> 
> On 10-06-2015 11:16, Szabolcs Nagy wrote:
> > On 10/06/15 14:35, Adhemerval Zanella wrote:
> >> I agree that adding an API to modify the current hwcap is not a good
> >> approach. However, the costs you are assuming here are *very* x86-biased:
> >> there you need only one instruction (movl <variable>(%rip), %<destination>)
> >> to load an external variable defined in a shared library, whereas for
> >> powerpc it is more costly:
> > 
> > debian codesearch found 4 references to __builtin_cpu_supports
> > all seem to avoid using it repeatedly.
> > 
> > multiversioning dispatch only happens at startup (for a small
> > number of functions according to existing practice).
> > 
> > so why is hwcap expected to be used in hot loops?
> > 
> 
> Good question, I do not know and I believe Steve could answer this
> better than me.  I am only advocating here that assuming x86 costs
> for powerpc is not the way to evaluate this patch.
> 

The trade-off is that the dynamic solutions (platform library selection
via AT_PLATFORM, and STT_GNU_IFUNC) require a dynamic call, which in our
ABI requires an indirect branch and link via the CTR. There is also the
overhead of the TOC save/reload.
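
A minimal C sketch of the dynamic-call pattern, assuming a plain function
pointer as a stand-in for the PLT/IFUNC slot. The names add_generic,
add_tuned, add_impl and select_add are hypothetical and do not come from
glibc or the patch under discussion:

```c
/* Two hypothetical implementations of the same operation. */
int add_generic(int a, int b) { return a + b; }
int add_tuned(int a, int b)   { return a + b; }  /* would use newer instructions */

/* Stand-in for the slot the dynamic linker fills in; chosen once at startup. */
int (*add_impl)(int, int) = add_generic;

/* Hypothetical selector: pick the implementation from a feature flag. */
void select_add(int have_feature)
{
    add_impl = have_feature ? add_tuned : add_generic;
}

/* Every call site pays for an indirect call through the pointer. */
int add_via_pointer(int a, int b)
{
    return add_impl(a, b);
}
```

The selection cost is paid once, but each later call through add_impl is
still an indirect branch, which is the per-call overhead described above.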

The net is that the trade-offs are different for POWER than for other
platforms. I spend a lot of time looking at performance data from
customer applications and see these issues (as measurable additional
path length and forced hazards).

So there is a place for this proposed optimization strategy, where we can
avoid the overhead of the dynamic call and substitute the smaller, more
predictable latency of the HWCAP test: a load word, an AND-immediate in
record form, and a branch conditional (3 instructions, low cache hazard,
and a highly predictable branch).
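
A minimal C sketch of such an inline test, assuming Linux and glibc's
getauxval(). The EXAMPLE_FEATURE bit, the thread-local cached_hwcap (a
stand-in for the TCB field) and the function names are hypothetical, and
the instructions actually emitted depend on the compiler and target:

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP: Linux, glibc 2.16+ */

/* Hypothetical feature bit, for illustration only. */
#define EXAMPLE_FEATURE 0x00000080UL

/* Stand-in for the hwcap word that the proposal keeps in the TCB. */
static __thread unsigned long cached_hwcap;

/* Fill the per-thread copy once, e.g. at thread start. */
void hwcap_cache_init(void)
{
    cached_hwcap = getauxval(AT_HWCAP);
}

/* The test itself: load the word, AND it with a mask, compare. */
int hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) != 0;
}

/* Inline dispatch: both paths are reached by a direct conditional branch. */
int sum(const int *v, int n)
{
    int total = 0;
    if (hwcap_has(cached_hwcap, EXAMPLE_FEATURE)) {
        /* A feature-specific loop would go here. */
        for (int i = 0; i < n; i++)
            total += v[i];
    } else {
        /* Generic fallback. */
        for (int i = 0; i < n; i++)
            total += v[i];
    }
    return total;
}
```

Both branches are ordinary inline code, so no indirect call or TOC
save/reload is involved in choosing between them.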

The concern about the cache footprint does not apply, as these fields
share a cache line with other active TCB fields. This line will be in
L1 for any active thread.





