This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot comcom>
- To: OndÅej BÃlka <neleai at seznam dot cz>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, Carlos Eduardo Seo <cseo at linux dot vnet dot ibm dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>, Richard Henderson <rth at redhat dot com>
- Date: Tue, 07 Jul 2015 10:35:24 -0500
- Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- Authentication-results: sourceware.org; auth=none
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <559617FF dot 8010100 at redhat dot com> <20150703085542 dot GE32307 at domone> <55968AF8 dot 8060104 at redhat dot com> <20150703171121 dot GA23898 at domone>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Fri, 2015-07-03 at 19:11 +0200, OndÅej BÃlka wrote:
> On Fri, Jul 03, 2015 at 09:15:36AM -0400, Carlos O'Donell wrote:
> > On 07/03/2015 04:55 AM, OndÅej BÃlka wrote:
> > >> At the end of the day it's up to IBM to make the best use of the
> > >> tp+offset data stored in the TCB, but every byte you save is another
> > >> byte you can use later for something else.
> > >>
> > > Carlos a problem with this patch is that they ignored community
> > > feedback. Early in this thread Florian come with better idea to use
> > > GOT+offset that could be accessed as
> > > &hwcap_hack and avoids per-thread runtime overhead.
> > Steven and Carlos have not ignored the community feedback, they just
> > have a different set of priorities and requirements. There is little
> > to discuss if your priorities and requirements are different.
> > The use of tp+offset data is indeed a scarce resource that should be
> > used only when absolutely necessary or when the use case dictates.
> > It is my opinion as a developer, that Carlos' patch is flawed because
> > it uses a finite resource, namely tp+offset data, for what I perceive
> > to be a flawed design pattern that as a free software developer I don't
> > want to encourage. These are not entirely technical arguments though,
> > they are subjective and based on my desire to educate and mentor developers
> > who write such code. I don't present these arguments as sustained
> > opposition to the patch because they are not technical and Carlos
> > has a need to accelerate this use case today.
Value judgments about what is precious can vary.
On Power CPU, cycles and hazard avoidance is more precious then a double
word or two. On a machine with 64KB pages, 128-byte cache-lines, and
supported memory configs up to 32TB, this is a good trade-off.
I am not trying to impose this on any one else.
> > I have only a few substantive technical issues with the patch. Given
> > that the ABI allocates a large block of tp+offset data, I think it is
> > OK for IBM to use the data in this way. For example I think it is much
> > much more serious that such a built application will likely just crash
> > when run with an older glibc. This is a distribution maintenance issue
> > that I can't ignore and I'd like to see it solved by a dependency on a
> > versioned dummy symbol.
We agree to add the symbol check and fail the app it is loading an old
> > Lastly, the symbol address hack is an incomplete solution because Florian
> > has not provided an implementation. Depending on the implementation it
> > may require a new relocation, and that is potentially more costly to the
> > program startup than the present process for filling in HWCAP/HWCAP2.
> Thats valid concern. My idea was checking if hwcap_hack relocation exist.
> I didn't realize that it scales with number of libraries.
> One of reasons why I didn't like this proposal is that it harms linux
> ecosystem as it increases startup cost of a bit everything while its
> unlikely that cross-platform projects will use this.
> But these could be done without much of our help. We need to keep these
> writable to support this hack. I don't know exact assembly for powerpc,
> it should be similar to how do it on x64:
> int x;
> int foo()
> #ifdef SHARED
> asm ("lea x@GOTPCREL(%rip), %rax; movb $32, (%rax)");
> asm ("lea x(%rip), %rax; movb $32, (%rax)");
> return &x;
Not so simple on PowerISA as we don't have PC-relative addressing.
1) The global entry requires 2 instruction to establish the TOC/GOT
2) Medium model requires two instructions (fused) to load a pointer from
3) Finally we can load the cached hwcap.
None of this is required for the TP+offset.
Telling me how x86 does things is not much help.
> > Without a concrete implementation I can't comment on one or the other.
> > It is in my opinion overly harsh to force IBM to go implement this new
> > feature. They have space in the TCB per the ABI and may use it for their
> > needs. I think the community should investigate symbol address munging
> > as a method for storing data in addresses and make a generic API from it,
> > likewise I think the community should investigate standardizing tp+offset
> > data access behind a set of accessor macros and normalizing the usage
> > across the 5 or 6 architectures that use it.
> I would like this as with access to that I could improve performance of
> several inlines.
> > > Also I now have additional comment with api as if you want faster checks
> > > wouldn't be faster to save each bit of hwcap into byte field so you
> > > could avoid using mask at each check?
> > That is an *excellent* suggestion, and exactly the type of technical
> > feedback that we should be giving IBM, and Carlos can confirm if they've
> > tried such "unpacking" of the bits into byte fields. Such unpacking is
> > common in other machine implementations.
This does not help on Power, Any (byte, halfword, word, doubleword,
quadword) aligned load is the same performance. Splitting our bits to
bytes just slow things down. Consider:
if (__builtin_cpu_supports(ARCH_2_07) &&
This is 3 instructions (lwz, andi., bc) as packed bits, but 5 or 6 as
Again value judgements about that is fast or slow can vary by platform.