This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: HWCAP is method to determine cpu features, not selection mechanism.
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: munroesj at linux dot vnet dot ibm dot com, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Szabolcs Nagy <szabolcs dot nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Thu, 11 Jun 2015 15:08:32 +0800
- Subject: Re: HWCAP is method to determine cpu features, not selection mechanism.
- Authentication-results: sourceware.org; auth=none
- References: <1433865684 dot 21101 dot 20 dot camel at sjmunroe-ThinkPad-W500> <20150610125047 dot GA10861 at domone> <55783D2A dot 8050703 at linaro dot org> <557846D9 dot 3060909 at arm dot com> <55784802 dot 8070605 at linaro dot org> <20150610150944 dot GA11504 at domone> <5578567C dot 5020504 at linaro dot org> <20150610155354 dot GA12820 at domone> <1433962707 dot 25475 dot 92 dot camel at sjmunroe-ThinkPad-W500> <CA+=Sn1nZg3K6oJsKREuC56k7M1N=nPLF735B7cT7sp3vh4EKyw at mail dot gmail dot com> <20150611065220 dot GA18868 at domone>
On Thu, Jun 11, 2015 at 2:52 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Jun 11, 2015 at 01:30:51PM +0800, Andrew Pinski wrote:
>> On Thu, Jun 11, 2015 at 2:58 AM, Steven Munroe
>> <munroesj@linux.vnet.ibm.com> wrote:
>> > On Wed, 2015-06-10 at 17:53 +0200, Ondřej Bílka wrote:
>> >> On Wed, Jun 10, 2015 at 12:23:40PM -0300, Adhemerval Zanella wrote:
>> >> >
>> >> >
>> >> > On 10-06-2015 12:09, Ondřej Bílka wrote:
>> >> > > On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
>> >> > >>
>> >> > >>
>> >> > >> On 10-06-2015 11:16, Szabolcs Nagy wrote:
>> >> > >>> On 10/06/15 14:35, Adhemerval Zanella wrote:
>> >> > >>>> I agree that adding an API to modify the current hwcap is not a good
>> >> > >>>> approach. However the costs you are assuming here are *very* x86 biased,
>> >> > >>>> where you have only one instruction (movl <variable>(%rip), %<destination>)
>> >> > >>>> to load an external variable defined in a shared library, where for
>> >> > >>>> powerpc it is more costly:
>> >> > >>>
>> >> > >>> debian codesearch found 4 references to __builtin_cpu_supports
>> >> > >>> all seem to avoid using it repeatedly.
>> >> > >>>
>> >> > >>> multiversioning dispatch only happens at startup (for a small
>> >> > >>> number of functions according to existing practice).
>> >> > >>>
>> >> > >>> so why is hwcap expected to be used in hot loops?
>> >> > >>>
>> >> > >>
>> >> snip
>> >> > And my understanding is to optimize hwcap access to provide a 'better' way
>> >> > to enable '__builtin_cpu_supports' for powerpc. IFUNC is another way to provide
>> >> > function selection, but it does not exclude that accessing hwcap through
>> >> > TLS is *faster* than current options. It is up to developer to decide to use
>> >> > either IFUNC or __builtin_cpu_supports. If the developer will use it in
>> >> > hot loops or not, it is up to them to profile and use another way.
>> >> >
>> >> > You can say the same about current x86 __builtin_cpu_supports support: you should
>> >> > not use in loops, you should use ifunc, whatever.
>> >>
>> >> Sorry but no again. We are talking here on difference between variable
>> >> access and tcb access. You forgot to count total cost. That includes
>> >> initialization overhead per thread to initialize hwcap, increased
>> >> per-thread memory usage, maintenance burden and increased cache misses.
>> >> If you access hwcap only rarely as you should then per-thread copies
>> >> would introduce a cache miss that is more costly than GOT overhead. In GOT
>> >> case it could be avoided as combined threads would access it more often.
>> >>
>> > Actually Adhemerval does have the knowledge, background, and experience
>> > to understand this difference and accurately assess the trade-offs.
>>
>> Yes and the trade-offs for Power are going to be different than the
>> trade-offs for AARCH64 and x86_64. And it gets harder for AARCH64
>> really as there are many micro-architectures and not controlled by
>> just one vendor (this is getting off topic).
>>
>>
> But I was talking about general trade off that you shouldn't do
> instruction selection frequently. You should select granularity that
> makes overhead of selection itself insignificant. If there is small
> function that requires it you should inline it or resolve which variant
> to do in caller. That stays true on all platforms.
>>
>> >
>> >> So if your multithreaded application access hwcap maybe 10 times per run
>> >> you would likely harm performance.
>> >>
>> > Sorry this is not an accurate assessment as the proposed fields are in
>> > the same cache line as other more frequently accessed fields of the TCB.
>> >
>> > The proposal will not effectively increase the cache foot-print.
>>
>> very true, it might actually decrease it :).
>>
> Are you claiming that adding unused fields between frequently used
> fields of a structure decreases cache footprint?
>
> Or are you claiming that at least 10% of applications on powerpc will
> frequently access hwcap?
>
> As I said before provide evidence. Naturally if 90% of applications
> wouldn't access hwcap then it would probably increase memory footprint
> as you add unused field per thread.
>
> I am talking about average impact. I could say about almost anything
> that in best case it decreases cache footprint. For example that by
> chance adding variable makes frequently used firefox tls structure
> aligned to 64 bytes.
>
>
>> >
>> >> I could from my head tell ten functions that with tcb entry lead to much
>> >> bigger performance gains. So if this is applicable I will submit strspn
>> >> improvement that keeps 32 bitmask and checks if second argument hasn't
>> >> changed. That would be better usage of tls than keeping hwcap data.
>> >>
>> > If you are suggesting saving results across strspn calls then a normal
>> > TLS variable would be an appropriate choice.
>> >
>> > This proposal covers a different situation.
>> >
>> >
>> > /soap box
>> > While I am no expert in all things and try not to comment on things
>> > which I really don't have the expertise (especially other platforms), I
>> > do know a lot about the POWER platform.
>> >
>> > I am responsible for the overall delivery of the open source toolchain
>> > for Linux on Power. GLIBC is just one component of many that needs to be
>> > coordinated for delivery. I also get involved directly with Linux
>> > customers and try to respond to issues they identify. As such I am in a
>> > good position to see how all the pieces (hardware, software, ABIs, ...)
>> > fit together and where they can be made better.
>> >
>> > With this larger responsibility, I don't have much time to quibble over
>> > the fine point of esoteric design. So I tend to short cut to conclusions
>> > and support my team.
>>
>> I know how it feels, I am in the same boat. Usually my suggestions
>> are more aimed at getting some free work done for myself :).
>> But I actually like this proposal and am even thinking about it for
>> AARCH64 with both hwcap and another AUXV variable.
>>
> Which ones, and why not reparse the entire AUXV to translate each
> getauxval(x) to have a static offset for each?
The one (MIDR) which is the equivalent of doing cpuid on x86. I still
need to submit the kernel patch for this but that will be next week.
HWCAP is not enough in this case as there are going to be many more
micro-architectures and even different passes (major revisions) of the
same micro-architecture might have slightly different behavior (I
already know of one but I can't say anything more than that).
Thanks,
Andrew
>
> if (__builtin_constant_p(x) && x == foo)
> &(auxval_hack_foo)
>
> That would provide faster getgid and geteuid. If you do this with
> Florian's hack it could help.
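The quoted idea of giving each constant getauxval(x) request a fixed slot can be sketched roughly as below. This is only an illustration under stated assumptions: cached_euid, cached_gid, cache_auxv_ids and fast_getauxval are hypothetical names, not glibc interfaces, and the cached values go stale if the process later changes its IDs (setuid, setgid), so a real implementation would have to handle that.

```c
#include <elf.h>        /* AT_EUID, AT_GID */
#include <sys/auxv.h>   /* getauxval (glibc >= 2.16, Linux) */

/* HYPOTHETICAL per-process cache slots, one per auxv type of interest. */
static unsigned long cached_euid;
static unsigned long cached_gid;

/* Fill the slots once, e.g. at program start. */
static void cache_auxv_ids(void)
{
    cached_euid = getauxval(AT_EUID);
    cached_gid  = getauxval(AT_GID);
}

/* When the requested type is a compile-time constant the compiler can
   fold the test away and emit a plain load of the matching slot;
   anything else falls back to the ordinary getauxval() lookup.
   __builtin_constant_p is a GCC/Clang builtin. */
#define fast_getauxval(x)                                             \
    (__builtin_constant_p(x) && (x) == AT_EUID ? cached_euid :        \
     __builtin_constant_p(x) && (x) == AT_GID  ? cached_gid  :        \
     getauxval(x))
```

With optimization enabled, fast_getauxval(AT_EUID) should reduce to a single variable load, while a runtime-valued argument still goes through getauxval().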
>
>
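The granularity point made earlier in the thread (query the feature bits once and resolve the variant in the caller, not inside the hot loop) can be sketched as follows. This is a minimal sketch, not anyone's proposed patch: DEMO_FEATURE_BIT is a made-up bit for illustration (real HWCAP bits are architecture specific), and sum_generic, sum_unrolled, select_sum and sum_rows are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (glibc >= 2.16, Linux) */

/* HYPOTHETICAL feature bit, for illustration only. */
#define DEMO_FEATURE_BIT (1UL << 0)

typedef long (*sum_fn)(const int *v, size_t n);

/* Baseline variant. */
static long sum_generic(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a variant that would use the optional CPU feature. */
static long sum_unrolled(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        total += (long)v[i] + v[i + 1];
    if (i < n)
        total += v[i];
    return total;
}

/* Selection is a pure function of the hwcap word, so it is easy to test. */
static sum_fn select_sum(unsigned long hwcap)
{
    return (hwcap & DEMO_FEATURE_BIT) ? sum_unrolled : sum_generic;
}

/* The caller reads HWCAP once and reuses the chosen variant for every
   row, so the selection cost is paid once per call, not once per row. */
static long sum_rows(const int *rows, size_t nrows, size_t ncols)
{
    sum_fn sum = select_sum(getauxval(AT_HWCAP));
    long total = 0;
    for (size_t r = 0; r < nrows; r++)
        total += sum(rows + r * ncols, ncols);
    return total;
}
```

Both variants return the same result here; only the point at which the selection happens is what the sketch is meant to show.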
- References:
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- HWCAP is method to determine cpu features, not selection mechanism.