This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB


On Tue, 2015-06-09 at 13:42 -0400, Rich Felker wrote:
> On Tue, Jun 09, 2015 at 12:37:04PM -0500, Steven Munroe wrote:
> > On Tue, 2015-06-09 at 12:50 -0400, Rich Felker wrote:
> > > On Tue, Jun 09, 2015 at 04:48:10PM +0100, Szabolcs Nagy wrote:
> > > > >> if hwcap is a useful abi between compiler and libc,
> > > > >> then why is this done in a powerpc-specific way?
> > > > > 
> > > > > Other platforms are free to use this technique.
> > > > 
> > > > i think this is not a sustainable approach for
> > > > compiler abi extensions.
> > > > 
> > > > (it means juggling with magic offsets on the order
> > > > of compilers * libcs * targets).
> > > > 
> > > > unfortunately accessing the ssp canary is already
> > > > broken this way, i'm not sure what's a better abi,
> > > > but it's probably worth thinking about one before
> > > > the tcb code gets too messy.
> > > 
> > > For the canary I think it makes sense, even though it's ugly -- the
> > > compiler has to generate a reference in every single function (for
> > > 'all' mode, or just most non-trivial functions in 'strong' mode).
> > > That's much different from a feature (hwcap) that should only be used
> > > at init-time and where, even if programmers did abuse it and use it
> > > over and over at runtime, it's only going to be a small constant
> > > overhead in a presumably medium to large sized function, and the cost
> > > is only the need to set up the GOT register and load from the GOT,
> > > anyway.
> > 
> > You are entitled to your own opinion, but you are not accounting for the
> > aggressive out-of-order execution of the POWER processors and the
> > specifics of the PowerISA. In the time it takes to load indirectly via
> > the TOC (4 cycles minimum) and then compare/branch, we could have
> > executed 12-16 useful instructions.
> > 
> > Any indirection exposes the sequence to hazards (like cache misses),
> > which only make things worse.
> > 
> > As stated before, I have thought about this and understand the options in
> > the context of the PowerISA, POWER micro-architecture, and the PowerPC
> > ABIs. This information is publicly available (if a little hard to find)
> > but I doubt you have taken the time to study it in detail, if at all.
> > 
> > I suspect you base your opinion on other architectures and hardware
> > implementations that do not apply to this situation. 
> 
> That's nice but all theoretical. I've seen countless such theoretical
> claims from people who are coming from a standpoint of the vendor
> manuals for the ISA they're working with, and more often than not,
> they don't translate into measurable benefits. (I've been guilty of
> this myself too, going to great lengths to tweak x86 codegen or even
> write the asm by hand, only to find that the resulting code ran at
> exactly the same speed.) Creating a permanent ABI carries an extremely high cost,
> and unless you can justify the cost with actual measurements and a
> reason to believe those measurements have anything to do with
> real-world usage needs, I believe it's an unjustified cost.
> 

This is not theory; I am thinking at the level of pipeline cycle timing
for POWER7/POWER8. I have been at this so long I can do this in my head.

Now, experience does tell me that adding an indirection, and the
associated exposure to cache-miss hazards, can mean that the performance
optimization gets lost in the hazard when it is measured.

I have seen this movie before; I don't need to see it again.

