This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: HWCAP is method to determine cpu features, not selection mechanism.
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Wed, 10 Jun 2015 11:45:53 -0500
- Subject: Re: HWCAP is method to determine cpu features, not selection mechanism.
- Authentication-results: sourceware.org; auth=none
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <5576FC80 dot 1090806 at arm dot com> <1433862393 dot 21101 dot 9 dot camel at sjmunroe-ThinkPad-W500> <20150609154223 dot GA20028 at domone> <1433865684 dot 21101 dot 20 dot camel at sjmunroe-ThinkPad-W500> <20150610125047 dot GA10861 at domone> <55783D2A dot 8050703 at linaro dot org> <557846D9 dot 3060909 at arm dot com> <55784802 dot 8070605 at linaro dot org>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Wed, 2015-06-10 at 11:21 -0300, Adhemerval Zanella wrote:
>
> On 10-06-2015 11:16, Szabolcs Nagy wrote:
> > On 10/06/15 14:35, Adhemerval Zanella wrote:
> >> I agree that adding an API to modify the current hwcap is not a good
> >> approach. However the costs you are assuming here are *very* x86 biased,
> >> where you have only one instruction (movl <variable>(%rip), %<destiny>)
> >> to load an external variable defined in a shared library, where for
> >> powerpc it is more costly:
> >
> > debian codesearch found 4 references to __builtin_cpu_supports
> > all seem to avoid using it repeatedly.
> >
> > multiversioning dispatch only happens at startup (for a small
> > number of functions according to existing practice).
> >
> > so why is hwcap expected to be used in hot loops?
> >
>
> Good question, I do not know and I believe Steve could answer this
> better than me. I am only advocating here that assuming x86 costs
> for powerpc is not the way to evaluate this patch.
>
The trade-off is that the dynamic solutions (platform library selection
via AT_PLATFORM, and STT_GNU_IFUNC) require a dynamic call, which in our
ABI requires an indirect branch and link via the CTR. There is also the
overhead of the TOC save/reload.
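
As a rough illustration of the dynamic-call shape described above, the
sketch below routes every call through a function pointer chosen once
at startup. It is only a stand-in for what IFUNC or a PLT call amounts
to at the call site, not the glibc mechanism itself; all names in it
(count_zero, select_count_zero, the two variants) are hypothetical.

```c
#include <stddef.h>

typedef size_t (*count_fn)(const unsigned char *, size_t);

/* Baseline variant: count zero bytes one at a time. */
static size_t count_zero_generic(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] == 0);
    return n;
}

/* Stand-in for a CPU-specific variant; here it only walks the buffer
   backwards so that it is a distinct function with the same result. */
static size_t count_zero_tuned(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    while (len--)
        n += (buf[len] == 0);
    return n;
}

/* The selected implementation. Every call loads this pointer and
   branches through it (an indirect branch and link). */
static count_fn count_zero_impl = count_zero_generic;

/* Hypothetical startup hook: pick the variant once. */
void select_count_zero(int have_feature)
{
    count_zero_impl = have_feature ? count_zero_tuned : count_zero_generic;
}

/* Public entry point: always an indirect call. */
size_t count_zero(const unsigned char *buf, size_t len)
{
    return count_zero_impl(buf, len);
}
```

The selection cost is paid once, but the indirect call is paid on every
use, which is the per-call overhead at issue here.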
The net is that the trade-offs are different for POWER than for other
platforms. I spend a lot of time looking at performance data from
customer applications and see these issues (as measurable additional
path length and forced hazards).
So there is a place for this proposed optimization strategy, where we can
avoid the overhead of the dynamic call and substitute the smaller, more
predictable latency of the HWCAP test: load word, AND immediate (record
form), and branch conditional (3 instructions, low cache hazard, and a
highly predictable branch).
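
A minimal sketch of the inline test, assuming a Linux/glibc system that
provides getauxval() and AT_HWCAP in <sys/auxv.h>. The mask value and
the function names are hypothetical, and the cached word here lives in
an ordinary static variable rather than in the TCB as proposed, so the
generated code will not match the three-instruction sequence exactly.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */

/* Hypothetical mask for illustration only; real code would use the
   platform's own feature-bit constant. */
#define EXAMPLE_FEATURE_MASK 0x1UL

static unsigned long cached_hwcap;  /* copy of the AT_HWCAP word */
static int hwcap_loaded;            /* nonzero once the copy is valid */

/* Fetch AT_HWCAP from the auxiliary vector once and reuse it. */
static inline unsigned long load_hwcap(void)
{
    if (!hwcap_loaded) {
        cached_hwcap = getauxval(AT_HWCAP);
        hwcap_loaded = 1;
    }
    return cached_hwcap;
}

/* The test itself: one AND against a constant mask, then a branch. */
static inline int has_feature(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) != 0;
}

static long sum_generic(const int *v, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a feature-specific variant with the same result. */
static long sum_feature(const int *v, long n)
{
    long s = 0;
    while (n--)
        s += v[n];
    return s;
}

/* Direct calls on both paths; no indirect branch is involved. */
long sum(const int *v, long n)
{
    if (has_feature(load_hwcap(), EXAMPLE_FEATURE_MASK))
        return sum_feature(v, n);
    return sum_generic(v, n);
}
```

Both callees are reached by direct branches, so the only data-dependent
step is the conditional branch on the feature bit, which does not
change over the life of the process.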
The concern about the cache footprint does not apply, as these fields
share the cache line with other active TCB fields. This line will be in
L1 for any active thread.