This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is method to determine cpu features, not selection mechanism.


On Thu, Jun 11, 2015 at 2:58 AM, Steven Munroe
<munroesj@linux.vnet.ibm.com> wrote:
> On Wed, 2015-06-10 at 17:53 +0200, Ondřej Bílka wrote:
>> On Wed, Jun 10, 2015 at 12:23:40PM -0300, Adhemerval Zanella wrote:
>> >
>> >
>> > On 10-06-2015 12:09, Ondřej Bílka wrote:
>> > > On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
>> > >>
>> > >>
>> > >> On 10-06-2015 11:16, Szabolcs Nagy wrote:
>> > >>> On 10/06/15 14:35, Adhemerval Zanella wrote:
>> > >>>> I agree that adding an API to modify the current hwcap is not a good
>> > >>>> approach. However, the costs you are assuming here are *very* x86-biased,
>> > >>>> where you have only one instruction (movl <variable>(%rip), %<dest>)
>> > >>>> to load an external variable defined in a shared library, whereas for
>> > >>>> powerpc it is more costly:
>> > >>>
>> > >>> Debian Code Search found 4 references to __builtin_cpu_supports,
>> > >>> and all seem to avoid using it repeatedly.
>> > >>>
>> > >>> multiversioning dispatch only happens at startup (for a small
>> > >>> number of functions according to existing practice).
>> > >>>
>> > >>> so why is hwcap expected to be used in hot loops?
>> > >>>
>> > >>
>> snip
>> > And my understanding is that the goal is to optimize hwcap access to provide a 'better' way
>> > to enable '__builtin_cpu_supports' for powerpc.  IFUNC is another way to provide
>> > function selection, but that does not change the fact that accessing hwcap through
>> > TLS is *faster* than the current options. It is up to the developer to decide whether to use
>> > IFUNC or __builtin_cpu_supports. Whether the developer uses it in
>> > hot loops or not, it is up to them to profile and choose another way if needed.
>> >
>> > You can say the same about the current x86 __builtin_cpu_supports support: you should
>> > not use it in loops, you should use IFUNC, whatever.
>>
>> Sorry, but no again. We are talking here about the difference between variable
>> access and TCB access. You forgot to count the total cost. That includes
>> the per-thread overhead of initializing hwcap, increased
>> per-thread memory usage, maintenance burden and increased cache misses.
>> If you access hwcap only rarely, as you should, then per-thread copies
>> would introduce cache misses that are more costly than the GOT overhead. In the GOT
>> case they could be avoided, as all threads combined would access it more often.
>>
> Actually Adhemerval does have the knowledge, background, and experience
> to understand this difference and accurately assess the trade-offs.

Yes, and the trade-offs for Power are going to be different from the
trade-offs for AARCH64 and x86_64.  And it gets harder for AARCH64,
as there are many micro-architectures and they are not controlled by
just one vendor (this is getting off topic).

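The "select once, not in the hot loop" pattern raised earlier in the thread (multiversioning and IFUNC pick an implementation at startup, and the code-search hits avoid calling __builtin_cpu_supports repeatedly) can be sketched in portable C. This is a hypothetical illustration, not glibc code: getauxval(AT_HWCAP) is the real glibc interface (glibc 2.16+), but HYPOTHETICAL_FEATURE_MASK, the sum_* functions, and the use of a GCC/Clang constructor attribute as a stand-in for a real IFUNC resolver are assumptions made for the sketch.

```c
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc >= 2.16) */

/* Hypothetical feature bit: AT_HWCAP bit meanings are architecture-specific. */
#define HYPOTHETICAL_FEATURE_MASK (1UL << 0)

/* Baseline implementation. */
static size_t sum_generic(const unsigned *v, size_t n) {
    size_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a feature-specific variant (here it just unrolls by two). */
static size_t sum_optimized(const unsigned *v, size_t n) {
    size_t s0 = 0, s1 = 0, i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)
        s0 += v[i];
    return s0 + s1;
}

typedef size_t (*sum_fn)(const unsigned *, size_t);

/* Defaults to the safe baseline; overwritten once before main() runs. */
static sum_fn sum_impl = sum_generic;

/* Runs once at startup, playing the role an IFUNC resolver would. */
__attribute__((constructor))
static void select_sum(void) {
    unsigned long hwcap = getauxval(AT_HWCAP);
    sum_impl = (hwcap & HYPOTHETICAL_FEATURE_MASK) ? sum_optimized : sum_generic;
}

/* Hot-path entry point: one indirect call, no hwcap test per call. */
size_t sum(const unsigned *v, size_t n) {
    return sum_impl(v, n);
}
```

Both variants return the same result, so the feature bit only affects speed; the hwcap word is read once per process, which is why its access cost matters little for this style of dispatch.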


>
>> So if your multithreaded application accesses hwcap maybe 10 times per run
>> you would likely harm performance.
>>
> Sorry, this is not an accurate assessment, as the proposed fields are in
> the same cache line as other more frequently accessed fields of the TCB.
>
> The proposal will not effectively increase the cache foot-print.

very true, it might actually decrease it :).

>
>> Off the top of my head I could name ten functions for which a TCB entry would
>> lead to much bigger performance gains. So if this is applicable, I will submit a strspn
>> improvement that keeps a 32-byte bitmask and checks whether the second argument
>> has changed. That would be a better use of TLS than keeping hwcap data.
>>
> If you are suggesting saving results across strspn calls, then a normal
> TLS variable would be an appropriate choice.
>
> This proposal covers a different situation.
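The strspn idea above can be sketched with an ordinary C11 _Thread_local variable, the normal-TLS option Steven points to, rather than a TCB field. This is a hypothetical illustration, not a submitted patch: the name cached_strspn, the 64-byte cache limit, and comparing the accept string's contents (not just its pointer, so a caller that rewrites the same buffer still gets correct results) are assumptions made here.

```c
#include <stddef.h>
#include <string.h>

/* Longest accept set kept in the cache; chosen arbitrarily for the sketch. */
#define CACHE_MAX 64

/* Per-thread cache: last accept string and its 256-bit (32-byte) bitmask.
   Being thread-local, it needs no locking. */
static _Thread_local char cached_accept[CACHE_MAX + 1];
static _Thread_local unsigned char cached_mask[32];
static _Thread_local int cache_valid;

/* Set one bit per byte value that appears in accept. */
static void build_mask(const char *accept, unsigned char mask[32]) {
    memset(mask, 0, 32);
    for (const unsigned char *p = (const unsigned char *)accept; *p; p++)
        mask[*p >> 3] |= (unsigned char)(1u << (*p & 7));
}

/* Same contract as strspn: length of the initial segment of s that
   consists only of bytes from accept. */
size_t cached_strspn(const char *s, const char *accept) {
    unsigned char local[32];
    const unsigned char *mask;
    size_t alen = strlen(accept);

    if (alen <= CACHE_MAX) {
        /* Rebuild only when the accept set differs from the cached one. */
        if (!cache_valid || strcmp(cached_accept, accept) != 0) {
            memcpy(cached_accept, accept, alen + 1);
            build_mask(accept, cached_mask);
            cache_valid = 1;
        }
        mask = cached_mask;
    } else {
        /* Too long to cache: build a fresh table for this call. */
        build_mask(accept, local);
        mask = local;
    }

    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++, n++)
        if (!(mask[*p >> 3] & (1u << (*p & 7))))
            break;
    return n;
}
```

Whether this actually beats rebuilding the bitmask depends on how costly the content comparison is relative to the table setup; that is exactly the kind of trade-off that would need profiling.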
>
>
> /soap box
> While I am no expert in all things and try not to comment on things
> which I really don't have the expertise (especially other platforms), I
> do know a lot about the POWER platform.
>
> I am responsible for the overall delivery of the open source toolchain
> for Linux on Power. GLIBC is just one component of many that need to be
> coordinated for delivery. I also get involved directly with Linux
> customers and try to respond to issues they identify. As such I am in a
> good position to see how all the pieces (hardware, software, ABIs, ...)
> fit together and where they can be made better.
>
> With this larger responsibility, I don't have much time to quibble over
> the fine points of esoteric design. So I tend to shortcut to conclusions
> and support my team.

I know how it feels; I am in the same boat.  Usually my suggestions
are more aimed at getting some free work done for myself :).
But I actually like this proposal and am even thinking about it for
AARCH64 with both hwcap and another AUXV variable.

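For reference, this is roughly how a program reads auxiliary-vector capability words today, through a getauxval() call rather than a TCB field. Andrew does not name the second AUXV variable; AT_HWCAP2 is used below purely as an assumed example, and the struct and function names are hypothetical.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, AT_HWCAP2 */

/* Hypothetical bundle of the two capability words the kernel passes in
   the auxiliary vector; bit meanings are architecture-specific. */
struct cpu_caps {
    unsigned long hwcap;
    unsigned long hwcap2;
};

/* Each getauxval() call scans glibc's saved copy of the auxv, so a
   program that tests features often would read them once and keep them. */
static struct cpu_caps read_cpu_caps(void) {
    struct cpu_caps c;
    c.hwcap = getauxval(AT_HWCAP);
    /* An entry the kernel does not supply reads back as 0 (no bits set). */
    c.hwcap2 = getauxval(AT_HWCAP2);
    return c;
}
```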
>
> If you do catch me pontificating on some other platform, without basis
> in fact, please feel free to call me out.
>
> But lots of people seem to want to provide their opinion based on their
> experience with other platforms and point out where I might have
> strayed. Fine, but I can and do try to point out that their argument
> does not apply (to my platform).

Totally 100% agree.  Even within a single architecture there are
micro-architectural differences, and some folks don't understand
that trade-offs need to be made even for differences between
micro-architectures.

Thanks,
Andrew Pinski

>
> But recent comments and responses have gone past the normal give and
> take of a healthy community, and into accusations and attacks.
>
> That is going too far and should not be tolerated.
>
> \soap box

