This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Carlos Eduardo Seo <cseo at linux dot vnet dot ibm dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Tue, 09 Jun 2015 12:37:04 -0500
- Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <5576FC80 dot 1090806 at arm dot com> <1433862393 dot 21101 dot 9 dot camel at sjmunroe-ThinkPad-W500> <55770ABA dot 1010205 at arm dot com> <20150609165018 dot GK17573 at brightrain dot aerifal dot cx>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Tue, 2015-06-09 at 12:50 -0400, Rich Felker wrote:
> On Tue, Jun 09, 2015 at 04:48:10PM +0100, Szabolcs Nagy wrote:
> > >> if hwcap is useful abi between compiler and libc
> > >> then why is this done in a powerpc specific way?
> > >
> > > Other platforms are free to use this technique.
> >
> > i think this is not a sustainable approach for
> > compiler abi extensions.
> >
> > (it means juggling with magic offsets on the order
> > of compilers * libcs * targets).
> >
> > unfortunately accessing the ssp canary is already
> > broken this way, i'm not sure what's a better abi,
> > but it's probably worth thinking about one before
> > the tcb code gets too messy.
>
> For the canary I think it makes sense, even though it's ugly -- the
> compiler has to generate a reference in every single function (for
> 'all' mode, or just most non-trivial functions in 'strong' mode).
> That's much different from a feature (hwcap) that should only be used
> at init-time and where, even if programmers did abuse it and use it
> over and over at runtime, it's only going to be a small constant
> overhead in a presumably medium to large sized function, and the cost
> is only the need to set up the GOT register and load from the GOT,
> anyway.
You are entitled to your own opinion, but you are not accounting for the
aggressive out-of-order execution of the POWER processors and the specifics
of the PowerISA. In the time it takes to load indirectly via the TOC (4 cycles
minimum) and then compare/branch, we could have executed 12-16 useful
instructions.
Any indirection exposes the sequence to hazards (such as a cache miss), which
only makes things worse.
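To make the two access patterns concrete, here is a minimal, portable C
sketch. It is only a model, not the patch itself: struct fake_tcb, tcb,
hwcap_ptr and the function names are hypothetical, a thread-local variable
stands in for a field at a fixed offset from the thread pointer, and a
pointer to a global stands in for data reached through the TOC/GOT. Only
getauxval() and AT_HWCAP are real glibc/Linux interfaces.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc on Linux) */

/* Hypothetical stand-in for the TCB: on the real target the hwcap word
   would sit at a fixed offset from the thread pointer register. */
struct fake_tcb { uint64_t hwcap; };
static _Thread_local struct fake_tcb tcb;

/* Stand-in for the indirect case: first load the address (as from the
   TOC/GOT), then load the value it points to. volatile keeps the compiler
   from folding the pointer load away in this small example. */
static uint64_t hwcap_global;
static uint64_t *volatile hwcap_ptr = &hwcap_global;

/* Store the same bits in both places (for the calling thread only). */
void set_hwcap(uint64_t bits)
{
    tcb.hwcap = bits;
    *hwcap_ptr = bits;
}

/* One-time setup from the kernel-provided auxiliary vector. */
void init_hwcap_from_auxv(void)
{
    set_hwcap((uint64_t)getauxval(AT_HWCAP));
}

/* Two dependent loads before the compare/branch. */
int has_feature_indirect(uint64_t mask)
{
    return (*hwcap_ptr & mask) != 0;
}

/* One load relative to the thread pointer before the compare/branch. */
int has_feature_tcb(uint64_t mask)
{
    return (tcb.hwcap & mask) != 0;
}
```

Both predicates return the same answer; the difference is the length of the
dependent load chain in front of the branch, which is where the extra cycles
and the cache-miss exposure come from. Note that in this model the
thread-local copy is set only for the thread that calls set_hwcap().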
As stated before, I have thought about this and understand the options in
the context of the PowerISA, POWER micro-architecture, and the PowerPC
ABIs. This information is publicly available (if a little hard to find)
but I doubt you have taken the time to study it in detail, if at all.
I suspect you base your opinion on other architectures and hardware
implementations that do not apply to this situation.