This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is method to determine cpu features, not selection mechanism.


On Wed, Jun 10, 2015 at 01:58:27PM -0500, Steven Munroe wrote:
> On Wed, 2015-06-10 at 17:53 +0200, Ondřej Bílka wrote:
> > On Wed, Jun 10, 2015 at 12:23:40PM -0300, Adhemerval Zanella wrote:
> > > 
> > > 
> > > On 10-06-2015 12:09, Ondřej Bílka wrote:
> > > > On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
> > > >>
> > > >>
> > > >> On 10-06-2015 11:16, Szabolcs Nagy wrote:
> > > >>> On 10/06/15 14:35, Adhemerval Zanella wrote:
> > > >>>> I agree that adding an API to modify the current hwcap is not a good
> > > >>>> approach. However the costs you are assuming here are *very* x86 biased,
> > > >>>> where you have only one instruction (movl <variable>(%rip), %<destiny>)
> > > >>>> to load an external variable defined in a shared library, where for
> > > >>>> powerpc it is more costly:
> > > >>>
> > > >>> debian codesearch found 4 references to __builtin_cpu_supports
> > > >>> all seem to avoid using it repeatedly.
> > > >>>
> > > >>> multiversioning dispatch only happens at startup (for a small
> > > >>> number of functions according to existing practice).
> > > >>>
> > > >>> so why is hwcap expected to be used in hot loops?
> > > >>>
> > > >>
> > snip
> > > And my understanding is to optimize hwcap access to provide a 'better' way
> > > to enable '__builtin_cpu_supports' for powerpc.  IFUNC is another way to provide
> > > function selection, but it does not exclude that accessing hwcap through
> > > TLS is *faster* than current options. It is up to developer to decide to use
> > > either IFUNC or __builtin_cpu_supports. If the developer will use it in
> > > hot loops or not, it is up to them to profile and use another way.
> > > 
> > > You can say the same about current x86 __builtin_cpu_supports support: you should
> > > not use in loops, you should use ifunc, whatever.
> > 
> > Sorry, but no again. We are talking here about the difference between
> > variable access and tcb access. You forgot to count the total cost. That
> > includes per-thread overhead to initialize hwcap, increased per-thread
> > memory usage, maintenance burden and increased cache misses.
> > If you access hwcap only rarely, as you should, then per-thread copies
> > would introduce a cache miss that is more costly than the GOT overhead.
> > In the GOT case it could be avoided, as the combined threads would
> > access it more often.
> > 
> Actually Adhemerval does have the knowledge, background, and experience
> to understand this difference and accurately assess the trade-offs.
>
While he may have the background, he didn't cover the drawbacks. So I
needed to point them out to start a cost-benefit analysis instead of
looking at the proposal through rose-colored glasses.
 
> > So if your multithreaded application accesses hwcap maybe 10 times per run
> > you would likely harm performance.
> > 
> Sorry this is not an accurate assessment as the proposed fields are in
> the same cache line as other more frequently accessed fields of the TCB.
> 
> The proposal will not effectively increase the cache foot-print.
> 
It could, by displacement. What is the next field? By adding hwcap you
could shift that field to the next cache line. If it is frequently used,
you are then using two cache lines instead of one.


> > Off the top of my head I could name ten functions where a tcb entry
> > would lead to much bigger performance gains. So if this is applicable I
> > will submit a strspn improvement that keeps a 32-byte bitmask and checks
> > whether the second argument has changed. That would be a better use of
> > tls than keeping hwcap data.
> >
> If you are suggesting saving results across strspn calls then a normal
> TLS variable would be an appropriate choice.
> 
> This proposal covers a different situation.
> 
I am not saying that. I am saying that a slot in the tcb is a resource
that needs to be managed.

I am not convinced by your proposal, as it would help only your
application. The remaining applications that do not use hwcap would pay
with increased thread startup overhead and slightly higher memory
consumption.
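
For comparison, the process-wide alternative can be sketched as below:
hwcap is read once at startup into a single variable shared by all
threads, so new threads pay nothing. This is only an illustration and
assumes Linux with glibc 2.16 or later for getauxval and a GCC-compatible
compiler for the constructor attribute. The names cached_hwcap,
hwcap_init and hwcap_has are mine, not an existing API.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */

/* One process-wide copy; threads share it, so no per-thread setup. */
static unsigned long cached_hwcap;

/* Runs once before main(): read the auxiliary vector a single time. */
__attribute__((constructor))
static void hwcap_init(void)
{
    cached_hwcap = getauxval(AT_HWCAP);
}

/* Test feature bits. `mask` is a platform-specific HWCAP_* value
   supplied by the caller; no particular bit is assumed here. */
static int hwcap_has(unsigned long mask)
{
    return (cached_hwcap & mask) != 0;
}
```

A caller would test a bit once, outside any hot loop, and pick the code
path from the result.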

For example, we could decide to add a per-thread 256-byte cache to malloc
and inline small allocations to use that cache with fast access through
the tcb. That would likely benefit everybody and would be a wise thing to
do. Then there are other use cases, and we should set a threshold on how
big an average performance gain you need to show.

That is why you need to calculate the cost and show that the benefits are
bigger. It may benefit your application, which is one of a thousand. The
remaining 999 applications could each find a tcb variable that would give
them a speedup similar to yours. If we are impartial we should add them
all. That would result in each thread needing an additional 8 KB of tls
space and being slowed down by initialization. So where is your evidence
that the gains would be so widespread?

Also, I wasn't saying that strspn could benefit from a normal tls
variable. I was saying that if you do a cost-benefit analysis of which of
hwcap and the strspn optimization should use the tcb, then you should
include strspn and leave hwcap alone. Many more applications use strspn,
so the overall gain would be bigger.
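
A rough sketch of the strspn idea, to show what would be kept per thread:
a 32-byte (256-bit) bitmask with one bit per byte value, rebuilt only when
the accept string differs from the previous call. This is not a submitted
patch. It uses an ordinary C11 thread-local rather than a tcb slot, and
the name strspn_cached and the ACCEPT_MAX limit are assumptions made for
the example.

```c
#include <stddef.h>
#include <string.h>

#define ACCEPT_MAX 64   /* assumed limit on the accept sets we cache */

static _Thread_local unsigned char tl_mask[32];           /* 256-bit set */
static _Thread_local char          tl_accept[ACCEPT_MAX]; /* set it was built from */
static _Thread_local int           tl_valid;

/* strspn variant that reuses the bitmask when `accept` is unchanged. */
static size_t strspn_cached(const char *s, const char *accept)
{
    size_t alen = strlen(accept);
    if (alen >= ACCEPT_MAX)
        return strspn(s, accept);      /* too long to cache: fall back */

    if (!tl_valid || strcmp(accept, tl_accept) != 0) {
        /* Accept set changed: rebuild the bitmask and remember the set. */
        memset(tl_mask, 0, sizeof tl_mask);
        for (size_t i = 0; i < alen; i++) {
            unsigned char c = (unsigned char)accept[i];
            tl_mask[c >> 3] |= (unsigned char)(1u << (c & 7));
        }
        memcpy(tl_accept, accept, alen + 1);
        tl_valid = 1;
    }

    /* Bit 0 is never set, so the scan stops at the terminating NUL. */
    size_t n = 0;
    for (;;) {
        unsigned char c = (unsigned char)s[n];
        if (!(tl_mask[c >> 3] & (1u << (c & 7))))
            break;
        n++;
    }
    return n;
}
```

The comparison against the saved copy is there because a pointer check
alone would miss a buffer whose contents changed between calls.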


> 
> /soap box
> While I am no expert in all things and try not to comment on things
> which I really don't have the expertise (especially other platforms), I
> do know a lot about the POWER platform.
> 
> I am responsible for the overall delivery of the open source toolchain
> for Linux on Power. GLIBC is just one component of many that needs to be
> coordinated for delivery. I also get involved directly with Linux
> customers and try to respond to issues they identify. As such I am in a
> good position to see how all the pieces (hardware, software, ABIs, ...)
> fit together and where they can be made better.
> 
> With this larger responsibility, I don't have much time to quibble over
> the fine point of esoteric design. So I tend to short cut to conclusions
> and support my team.
>
That is a problem, as such shortcuts naturally lead to worse decisions.
You should delegate that responsibility to somebody who knows the details.

 
> If you do catch me pontificating on some other platform, without basis
> in fact, please feel free to call me out.
> 
> But lots of people seem to want to provide their opinion based on their
> experience with other platforms and point out where I might have
> strayed. Fine, but I can and do try to point out that their argument
> does not apply (to my platform).
> 
> But recent comments and responses have gone past the normal give and
> take of a healthy community, and into accusations and attacks.
> 
> That is going too far and should not be tolerated.
> 
> \soap box

