This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

From: "Carlos O'Donell" <carlos at redhat dot com>
To: munroesj at linux dot vnet dot ibm dot com
Cc: Rich Felker <dalias at libc dot org>, libc-alpha at sourceware dot org
Date: Wed, 08 Jul 2015 03:51:04 -0400
Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
Authentication-results: sourceware.org; auth=none
References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <20150609163835 dot GI17573 at brightrain dot aerifal dot cx> <1435777940 dot 7125 dot 132 dot camel at oc7878010663> <5596C284 dot 9070108 at redhat dot com> <1436145404 dot 10792 dot 46 dot camel at oc7878010663>

On 07/05/2015 09:16 PM, Steven Munroe wrote:
> On Fri, 2015-07-03 at 13:12 -0400, Carlos O'Donell wrote:
>> On 07/01/2015 03:12 PM, Steven Munroe wrote:
>>> If you think about the requirements for a while it becomes clear. As the
>>> HWCAP cache would have to be defined and initialized in either libgcc or
>>> libc, access will be non-local from any user library. So all the local
>>> TLS access optimizations are disallowed. Adding the requirement to support
>>> dlopen() libraries leaves the general dynamic TLS model as the ONLY
>>> safe option.
>>
>> That's not true anymore? Alan Modra added pseudo-TLS descriptors to POWER
>> just recently[1], which means the __tls_get_addr call is elided and the offset
>> returned immediately via a linker stub for use with tp+offset. However,
>> I agree that even Alan's great work here is still going to be several
>> more instructions than a raw tp+offset access. However, it would be
>> interesting to discuss with Alan if his changes are sufficiently good
>> that the out-of-order execution hides the latency of these additional
>> instructions and his methods are a sufficient win that you *can* use
>> TLS variables?
>>
> I did discuss this with Alan and he agreed that with the given
> requirements the standard TLS mechanism is always slower than my
> original TCB proposal.

Sounds good, thank you for clarifying that.

> Why would you think I had not talked to Alan?

As a reviewer I can't assume anything you don't tell me.

Let me use a Mark Mitchell anecdote: You walk into class on the first
day of class. The teacher says "What's your job?" You say "To learn!"
The teacher says "No. It's to make it easy for the grader to give you
an A."

You make it easy for the reviewer to accept your patch when the
submission answers all of the questions the reviewer would ask.

>>> Now there were a lot of suggestions to just force the HWCAP TLS
>>> variables into initial exec or local exec TLS model with an attribute.
>>> This would resolve to direct TLS offset in some special reserved TLS
>>> space?
>>
>> It does. Since libc.so is always seen by the linker it can always allocate
>> static TLS space for that library when it computes the maximum size of
>> static TLS space.
>>
>>> How does that work with a library loaded with dlopen()? How does that
>>> work with a library linked with one toolchain / GLIBC on Distro X and
>>> run on a system with a different toolchain and GLIBC on Distro Y? With
>>> different versions of GLIBC? Will HWCAP get the same TLS offset? Do we
>>> end up with .text relocations that we are also trying to avoid?
>>
>> (1) Interaction with dlopen?
>>
>> The two variables in question are always in libc.so.6, and therefore are
>> always loaded first by DT_NEEDED, and there is always static storage
>> reserved for that library.
>>
>> There are 2 scenarios which are problematic.
>>
>> (a) A static application accessing NSS / ICONV / IDN must dynamically
>>     load libc.so.6, and there must be enough reserve static TLS space
>>     for the allocated IE TLS variables or the dynamic loader will abort
>>     the load indicating that there is not enough space to load any more
>>     static TLS using DSOs. This is solved today by providing surplus
>>     static TLS storage space.
>>
>> (b) Use of dlmopen to load multiple libc.so.6's. In this case you could
>>     load libc.so.6 into alternate namespaces and eventually run out of
>>     surplus static TLS. We have never seen this in common practice because
>>     there are very few users of dlmopen, and to be honest the interface
>>     is poorly documented and fraught with problems.
>>
>> Therefore in the average case it will work to use static TLS, or
>> IE TLS variables, in glibc. I consider the above
>> cases to be outside the normal realm of user applications.
>>
>> e.g.
>> extern __thread int foo __attribute__((tls_model("initial-exec")));
>>
>> (2) Distro to distro compatibility?
>>
>> With my Red Hat on:
>>
>> Let me start by saying you have absolutely no guarantee here at all
>> provided by any distribution. As the Fedora and RHEL glibc maintainer
>> your vendor is far outside the scope of support and such a scenario is
>> never possible. You can wish it, but it's not true unless you stick to
>> very, very low-level and very, very simple interfaces. That is to say
>> that you have no guarantee that a library linked by a vendor with one
>> toolchain in distro X will work in distro Y. If you need to do that
>> then build in a container, chroot or VM with distro Y tools. No vendor
>> I've ever talked to expects or even supports such a scenario.
>>
>> With my hacker hat on:
>>
>> Generally for simple features it just works as long as both distros
>> have the same version of glibc. However, we're talking only about
>> the glibc parts of the problem. Compatibility with other libraries
>> is another issue.
>>
> No! the version of GLIBC does not matter as long as the GLIBC supports
> TLS (GLIBC-2.5?)

You are correct, the runtime glibc version does not strictly matter,
but I think it *might* matter if you use an old glibc (see discussion
about crashes).

>> (3) Different versions of glibc?
>>
>> Sure it works, as long as all the versions have the same feature and
>> are newer than the version in which you introduced the change. That's
>> what backwards compatibility is for.
>>
>> (4) Will HWCAP get the same TLS offset? 
>>
>> That's up to the static linker. You don't care anymore though, the gcc
>> builtin will reference the IE TLS variables like it would normally as
>> part of the shared implementation, and that variable is resolved to glibc
>> and normal library versioning happens. The program will now require that
>> glibc or newer and you'll get proper error messages about that.
>>
>> (5) Do we end up with .text relocations that we are also trying to avoid?
>>
>> You should not. The offset is known at link time and inserted by the
>> static linker.
>>
> To avoid the text relocation I believe there is an extra GOT load of the
> offset. If this is not true then Alan owes me an update to the ABI
> document to explain how this would work, as the current Draft ELF2 ABI
> update does not say this is supported.

Sorry, you are correct: for ppc64 there is an R_PPC64_TPREL64 relocation
on the GOT entry and an indirect load. So this doesn't work for you either,
because of the performance penalty of the indirection.
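
For concreteness, here is a minimal sketch of the IE TLS variant under discussion. The names hwcap_cache, hwcap_set and hwcap_has are hypothetical and only for illustration; in the scheme discussed above the storage would live in libc.so.6. When a variable like this is referenced from a shared library, the compiler typically loads its thread-pointer offset from a GOT entry and then adds the thread pointer, which is the extra indirect load mentioned above.

```c
#include <assert.h>   /* only needed by callers that assert on the results */

/* Hypothetical per-thread cache of the HWCAP bits, forced into the
   initial-exec TLS model so no __tls_get_addr call is required. */
__thread unsigned long hwcap_cache __attribute__((tls_model("initial-exec")));

/* Hypothetical setter; a real implementation would fill this in
   during thread setup. */
void hwcap_set(unsigned long value)
{
    hwcap_cache = value;
}

/* Hypothetical feature test: returns non-zero when `bit` is set in
   the calling thread's cached HWCAP word. */
int hwcap_has(unsigned long bit)
{
    return (hwcap_cache & bit) != 0;
}
```

This only shows the source-level shape; the actual instruction sequence depends on the target, the compiler flags, and whether the code ends up in an executable or a DSO.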

>>> Again the TCB avoids all of this as it provides a fixed offset defined
>>> by the ABI and does not require any up-calls or indirection. And also
>>> will work in any library without induced hazards. This clearly works
>>> across distros including previous versions of GLIBC as the words were
>>> previously reserved by the ABI. Application libraries that need to run
>>> on older distros can add a __builtin_cpu_init() to their library init or
>>> if threaded to their thread create function.
>>
>> You get a crash since previous glibc's don't fill in the data?
>> And that crash gives you only some information to debug the problem,
>> namely that you ran code for a processor you didn't support.
>>
> There is NO crash. There never was a crash. There is no additional
> security exposure. The only TCB fields that might be a security exposure
> were already there, on every other platform.

Sorry, I don't follow you here. Could you expand on what you mean by
"already there"? Do you mean to say that "the ABI has always specified
this space as reserved"?

> The worst that can happen is a fallback to the base implementation (the
> bit is 0 when it should be 1).

The threading support uses a stack cache that reuses stacks allocated for
earlier threads. Depending on whether a thread needs guards or other
parameters that consume stack space, I don't know that you can guarantee
the reserved space stays at zero for the lifetime of the program without
initializing it every time a thread is started. A reused stack for a newly
started thread might therefore have non-zero data in the reserved spot and
cause the code for an invalid CPU to be selected. This can't be fixed
without the per-thread initialization code in glibc?

Someone should look at this case minimally, or alternatively version
the interface and only use this support with newer glibc's that carry
out the initialization.
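
To make the concern concrete, here is a small simulation, not glibc code. The struct fake_tcb, the single cached_slot, and all function names are hypothetical stand-ins for the stack cache and the reserved TCB word; the sketch only shows why a word that starts out as zero may not stay zero across stack reuse unless something rewrites it at thread start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of a thread stack holding the TCB. */
struct fake_tcb {
    uint64_t reserved_hwcap;   /* the previously "reserved" ABI word */
};

/* One cached stack slot, standing in for the pthread stack cache.
   It is zero on first use, like a freshly allocated page. */
static struct fake_tcb cached_slot;

/* Start a "thread" on the cached slot. Only a libc that knows about
   the new field (libc_initializes != 0) rewrites the reserved word. */
struct fake_tcb *fake_thread_start(int libc_initializes, uint64_t hwcap)
{
    if (libc_initializes)
        cached_slot.reserved_hwcap = hwcap;
    return &cached_slot;
}

/* A thread that leaves stale data in the word before it exits. */
void fake_thread_scribble(struct fake_tcb *tcb)
{
    tcb->reserved_hwcap = UINT64_C(0xdeadbeef);
}

/* What a builtin reading the TCB word would effectively do. */
int has_feature(const struct fake_tcb *tcb, uint64_t bit)
{
    return (tcb->reserved_hwcap & bit) != 0;
}
```

In this model the first thread sees zero and falls back to the base implementation, but a second thread started on the same slot by an older libc sees whatever the first one left behind. Whether real stacks can reach that state is exactly the case someone should check.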

> As explained the dword is already there and initialized to 0 when the
> page is allocated. So the load will work NOW for any GLIBC since TLS was
> implemented.
> 
> As implemented by Alan and I.
 
I don't think this is true per my comments above regarding stack reuse.
 
>> It is true that you could use LD_PRELOAD to run __builtin_cpu_init()
>> on older systems, but you need to *know* that, and use that. What
>> provides this function? libgcc?
>>
> We will provide a little init routine applications can use. This is not
> hard.

I assume they have to use it in every thread before they can call any
of the builtins?
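
If that is the requirement, I would expect applications to need something like the wrapper below around every thread start routine. This is only a sketch under my own assumptions: hypothetical_cpu_init, start_trampoline, struct start_args and example_worker are made-up names, and the per-thread word is an ordinary __thread variable here rather than the TCB slot. getauxval() and AT_HWCAP are the real glibc interfaces from <sys/auxv.h>.

```c
#include <assert.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux, glibc 2.16+) */

/* Hypothetical per-thread copy of the HWCAP bits plus a "filled in" flag. */
static __thread unsigned long hwcap_word;
static __thread int hwcap_ready;

/* Hypothetical init routine: must run in each thread before any
   feature test is made from that thread. */
static void hypothetical_cpu_init(void)
{
    hwcap_word = getauxval(AT_HWCAP);
    hwcap_ready = 1;
}

/* Feature test against the calling thread's copy. */
int thread_has_hwcap(unsigned long bit)
{
    return hwcap_ready && (hwcap_word & bit) != 0;
}

/* The user's real start routine and argument, bundled for the trampoline. */
struct start_args {
    void *(*fn)(void *);
    void *arg;
};

/* Runs the init first, then hands control to the real start routine. */
void *start_trampoline(void *p)
{
    struct start_args *a = p;
    hypothetical_cpu_init();
    return a->fn(a->arg);
}

/* Example start routine: records whether init had run when it started. */
void *example_worker(void *arg)
{
    *(int *)arg = hwcap_ready;
    return arg;
}
```

An application would pass start_trampoline and a struct start_args to pthread_create() in place of its own start routine. Every library that creates threads would have to do the same, which is part of why I am asking.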

>> It is certainly a benefit to using the TCB, that this kind of use case
>> is supported. However, in doing so you adversely impact the distribution
>> maintainers for the benefit of?
>>
> I can not think of any adverse impacts on any of the other platform
> maintainers, on any of the distros.

As described above I think you can get crashes because of stack cache
reuse leaving some of these reserved words potentially non-zero.
I also think a cancelled thread (which might be in an undefined state
and have written into the TCB) can have its stack reused as well.

Cheers,
Carlos.
