This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Carlos Eduardo Seo <cseo at linux dot vnet dot ibm dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>, Richard Henderson <rth at redhat dot com>
- Date: Fri, 3 Jul 2015 19:11:21 +0200
- Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
- Authentication-results: sourceware.org; auth=none
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <559617FF dot 8010100 at redhat dot com> <20150703085542 dot GE32307 at domone> <55968AF8 dot 8060104 at redhat dot com>
On Fri, Jul 03, 2015 at 09:15:36AM -0400, Carlos O'Donell wrote:
> On 07/03/2015 04:55 AM, Ondřej Bílka wrote:
> >> At the end of the day it's up to IBM to make the best use of the
> >> tp+offset data stored in the TCB, but every byte you save is another
> >> byte you can use later for something else.
> > Carlos, a problem with this patch is that they ignored community
> > feedback. Early in this thread Florian came up with a better idea: use
> > GOT+offset, which could be accessed as
> > &hwcap_hack and avoids per-thread runtime overhead.
> Steven and Carlos have not ignored the community feedback, they just
> have a different set of priorities and requirements. There is little
> to discuss if your priorities and requirements are different.
> The use of tp+offset data is indeed a scarce resource that should be
> used only when absolutely necessary or when the use case dictates.
> It is my opinion as a developer that Carlos' patch is flawed because
> it uses a finite resource, namely tp+offset data, for what I perceive
> to be a flawed design pattern that as a free software developer I don't
> want to encourage. These are not entirely technical arguments though,
> they are subjective and based on my desire to educate and mentor developers
> who write such code. I don't present these arguments as sustained
> opposition to the patch because they are not technical and Carlos
> has a need to accelerate this use case today.
> I have only a few substantive technical issues with the patch. Given
> that the ABI allocates a large block of tp+offset data, I think it is
> OK for IBM to use the data in this way. For example I think it is much
> much more serious that such a built application will likely just crash
> when run with an older glibc. This is a distribution maintenance issue
> that I can't ignore and I'd like to see it solved by a dependency on a
> versioned dummy symbol.
> Lastly, the symbol address hack is an incomplete solution because Florian
> has not provided an implementation. Depending on the implementation it
> may require a new relocation, and that is potentially more costly to the
> program startup than the present process for filling in HWCAP/HWCAP2.
That's a valid concern. My idea was to check whether a hwcap_hack relocation
exists. I didn't realize that this scales with the number of libraries.
One of the reasons I didn't like this proposal is that it harms the Linux
ecosystem: it slightly increases the startup cost of everything, while it is
unlikely that cross-platform projects will use it.
But this could be done without much help from us; we only need to keep the
GOT entries writable to support the hack. I don't know the exact assembly for
powerpc, but it should be similar to how it is done on x64:
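/* Overwrite the low byte of the GOT slot that holds &x.  */
asm volatile ("lea x@GOTPCREL(%%rip), %%rax; movb $32, (%%rax)" ::: "rax", "memory");
/* Overwrite the first byte of x itself.  */
asm volatile ("lea x(%%rip), %%rax; movb $32, (%%rax)" ::: "rax", "memory");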
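Florian's GOT+offset idea and the x64 fragments both keep the capability data in a GOT slot instead of in the TCB. The following is a minimal C simulation, assuming the GOT slot can be modeled as an ordinary pointer variable; `got_slot_hwcap_hack`, `loader_fill_slot`, `has_feature` and the `FEAT_*` bits are hypothetical and not part of glibc or ld.so.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GOT entry of hwcap_hack.  In the real
   hack the dynamic linker (or a later patch, as in the asm fragments)
   would write this slot.  */
static void *got_slot_hwcap_hack;

/* Hypothetical feature bits, for illustration only.  */
#define FEAT_A (1u << 3)
#define FEAT_B (1u << 7)

/* Simulated loader step: encode the hwcap word as an "address".  */
static void loader_fill_slot(uint32_t hwcap)
{
    got_slot_hwcap_hack = (void *)(uintptr_t)hwcap;
}

/* A feature check is one load of the slot plus a mask test; nothing
   is stored in per-thread data.  */
static int has_feature(uint32_t mask)
{
    return ((uintptr_t)got_slot_hwcap_hack & mask) != 0;
}
```

After loader_fill_slot(FEAT_A), has_feature(FEAT_A) returns 1 and has_feature(FEAT_B) returns 0. The value lives once per process, so nothing is copied when a thread is created.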
> Without a concrete implementation I can't comment on one or the other.
> It is in my opinion overly harsh to force IBM to go implement this new
> feature. They have space in the TCB per the ABI and may use it for their
> needs. I think the community should investigate symbol address munging
> as a method for storing data in addresses and make a generic API from it,
> likewise I think the community should investigate standardizing tp+offset
> data access behind a set of accessor macros and normalizing the usage
> across the 5 or 6 architectures that use it.
I would like this as with access to that I could improve performance of
> > I also have an additional comment on the API: if you want faster checks,
> > wouldn't it be faster to save each bit of hwcap into a byte field, so you
> > could avoid applying a mask at each check?
> That is an *excellent* suggestion, and exactly the type of technical
> feedback that we should be giving IBM, and Carlos can confirm if they've
> tried such "unpacking" of the bits into byte fields. Such unpacking is
> common in other machine implementations.
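A minimal sketch of the unpacking suggestion, assuming HWCAP and HWCAP2 are treated as two 32-bit words; `hwcap_bytes`, `unpack_hwcap`, `packed_has`, `unpacked_has` and the bit numbers are hypothetical and not the layout used by the patch. With the packed word a check is a load plus an AND with a mask; with one byte per bit it is a single byte load compared against zero.

```c
#include <stdint.h>

/* One byte per capability bit: 32 bits of HWCAP followed by 32 bits
   of HWCAP2, i.e. 64 bytes in total (hypothetical layout).  */
static uint8_t hwcap_bytes[64];

/* Expand the two packed words into the byte table.  */
static void unpack_hwcap(uint32_t hwcap, uint32_t hwcap2)
{
    for (int i = 0; i < 32; i++) {
        hwcap_bytes[i] = (hwcap >> i) & 1u;
        hwcap_bytes[32 + i] = (hwcap2 >> i) & 1u;
    }
}

/* Packed check: load the word, then apply the mask.  */
static int packed_has(uint32_t hwcap, unsigned bit)
{
    return (hwcap & (UINT32_C(1) << bit)) != 0;
}

/* Unpacked check: a single byte load, no mask.  */
static int unpacked_has(unsigned index)
{
    return hwcap_bytes[index] != 0;
}
```

The cost of the faster check is size: the unpacked table takes 64 bytes instead of 8, which matters if it is stored per thread.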
Also, with unpacking, doing this in userspace becomes more attractive, so
we don't have to copy 64 bytes for each thread.
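Doing the unpacking in userspace could look roughly like the sketch below: a process-global table filled once at load time from the auxiliary vector and shared by all threads. It assumes Linux with glibc's `getauxval` and a GCC/Clang `constructor` attribute; `cap_table`, `init_cap_table` and `cap_has` are hypothetical names, and only the low 32 bits of each word are kept to match the 64-byte figure above.

```c
#include <stdint.h>
#include <sys/auxv.h>

#ifndef AT_HWCAP2
#define AT_HWCAP2 26 /* value from <elf.h>, for older headers */
#endif

/* Process-global table, filled once; threads share it, so nothing is
   copied per thread (hypothetical name).  */
static uint8_t cap_table[64];

/* Runs once before main.  getauxval returns 0 for an absent entry,
   so missing HWCAP2 simply leaves those bytes at 0.  */
__attribute__((constructor))
static void init_cap_table(void)
{
    unsigned long hw = getauxval(AT_HWCAP);
    unsigned long hw2 = getauxval(AT_HWCAP2);
    for (int i = 0; i < 32; i++) {
        cap_table[i] = (hw >> i) & 1ul;
        cap_table[32 + i] = (hw2 >> i) & 1ul;
    }
}

/* Byte-per-bit check against the shared table.  */
static int cap_has(unsigned index)
{
    return cap_table[index] != 0;
}
```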