This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: static TLS exhausted on ppc64le


On 9/30/19 1:50 PM, Rich Felker wrote:
> On Mon, Sep 30, 2019 at 04:13:07PM +0000, Szabolcs Nagy wrote:
>> On 30/09/2019 17:06, Rich Felker wrote:
>>> On Mon, Sep 30, 2019 at 05:47:29PM +0200, Florian Weimer wrote:
>>>> * Szabolcs Nagy:
>>>>
>>>>> On 30/09/2019 15:02, Dan Horák wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I would like to raise a problem we have already hit twice in Fedora. The
>>>>>> symptom is
>>>>>>
>>>>>> "/lib64/libgomp.so.1: cannot allocate memory in static TLS block"
>>>>>>
>>>>>> usually when loading a lot of libraries/modules into a Python
>>>>>> application. It happened on ppc64le and also on aarch64 systems.
>>>>>>
>>>>>> We have 2 reports in Fedora Bugzilla with more details:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1722181
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1738752
>>>>>>
>>>>>> We have already discussed this briefly with Florian and other members of
>>>>>> the Red Hat toolchain team, but the outcome was a recommendation
>>>>>> to reduce the usage of "static TLS" objects in the individual libraries.
>>>>>> But the open question still is - is there a fix for the TLS space
>>>>>> exhaustion? I believe it can easily become a more serious problem soon.
>>>>>
>>>>> (a workaround is preloading the problematic libs at startup time)
>>>>>
>>>>> i think it's a bug in libgomp.so.1, gcc should not build
>>>>> broken dsos by default (unless it can ensure they are never
>>>>> loaded dynamically):
>>>>>
>>>>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgomp/configure.tgt;h=b88bf72fe3de3735929635c874b8da375c841b1d;hb=HEAD#l13
>>>>
>>>> I like the simplicity of initial-exec TLS.
>>>
>>> I wouldn't really characterize it as simplicity. It's a trade of
>>> complex (at least to the user) constraints on whether or not it works
>>> for some simplicity of implementation.
>>>
>>> I guess for glibc at present, there's a lot more complexity to dynamic
>>> models because of lazy allocation and installation and generation
>>> counters, and these interact with AS-safety and failsafety in
>>> undesirable ways. I'd like to see that fixed but I know it's a big
>>> change.
>>>
>>>> I think there was a change on POWER to use the static TLS reservation
>>>> for dynamic TLS, as an optimization.  Obviously, that's going to hurt
>>>> those cases where a library with initial-exec TLS is loaded late, even
>>>> if the static TLS reservation would ordinarily be large enough.
>>>
>>> Was that because of the PLT-stub hack on powerpc done in lieu of
>>> tlsdesc? That should really be abandoned entirely IMO, since it
>>> *doesn't* give you any of the benefit of tlsdesc -- the whole point is
>>> not the short code path but avoiding register spills for the standard
>>> ABI call to __tls_get_addr, and the powerpc hack doesn't let you avoid
>>> them. Real tlsdesc should be added to powerpc.
>>
>> the problem is TRY_STATIC_TLS (defined in dynamic-link.h)
>>
>> when it is used for dynamic tls (on targets where that's
>> possible: tlsdesc or ppc tls opt hack) it will eat the
>> preallocated static tls. (that's why this affects aarch64
>> and powerpc64)
>>
>> i think that logic can be easily changed so the preallocated
>> tls area is not used for normal dynamically loaded dsos
>> (assuming the intention of the prealloc tls is purely to
>> support dsos with initial-exec tls), that's less optimal for
>> the common dynamic tls use-case, but makes libgomp etc work.
> 
> I see. That's a shame, because if you have excess static TLS reserved,
> using it for tlsdesc is actually really nice -- it makes the accesses
> just as fast as initial-exec, but opportunistically, and falls back
> gracefully if you run out. Waiting to hand it out to badly-behaved
> libraries that are using initial-exec model only serves to reinforce
> the bad behavior and discourages adoption of tlsdesc since the bad
> behavior gets preferential treatment...
> 
> I think this analysis further supports my previous remarks that
> initial-exec in dlopened libraries should be deprecated and EOL'd.

We should dig deeper into the analysis here, and I just ran readelf for
all the implicated libraries in the bug.

The only *real* problem here is the implementation; only libc and libgomp
use TLS IE in this case (and libgl in the wild).
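
For anyone following along who has not looked at the access models, a
minimal sketch of how a library opts into IE versus GD with GCC or Clang.
The variable and function names are hypothetical and are not taken from
libc, libgomp, or libgl; `__thread` and the `tls_model` attribute are
compiler extensions.

```c
/* Hypothetical variables, for illustration only. */

/* Initial-exec (IE): the offset from the thread pointer is fixed when the
   object is loaded, so the variable has to fit in the static TLS block.
   A DSO built this way can fail to dlopen once that block is full. */
__thread int ie_counter __attribute__((tls_model("initial-exec")));

/* Global-dynamic (GD): the address is resolved at run time (via
   __tls_get_addr or a TLS descriptor), so the variable does not need
   static TLS space. */
__thread int gd_counter __attribute__((tls_model("global-dynamic")));

/* Each function increments the calling thread's copy and returns it. */
int bump_ie(void) { return ++ie_counter; }
int bump_gd(void) { return ++gd_counter; }
```

Both functions behave the same from the caller's point of view; the only
difference is where the storage has to live and what the access costs.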

I think the best steps towards resolution are:

* Stop ppc64le and aarch64 from using ALL of the static TLS for tlsdesc / tls opt hack.
  - Reserve at least 128 bytes for libgomp + libgl.

* Fix lazy tls loading to stop being lazy about allocation and allocate all memory
  required up front.
  - This allows libc to use GD instead of IE and not worry about touching tls vars
    early before init or the ordering of IE vs. GD.
  - Requires a non-default dlopen flag to get back old behaviour.

* Switch glibc back to GD internally.

* Switch x86_64 to tlsdesc (can be done at any time) to get perf back.

Disallowing IE in DSOs is only going to get us angry users in a transitional period.

The above plan will benefit ppc64le and aarch64 since they continue to
have maximum performance for their usage of tlsdesc.
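
To make the first step concrete, here is a toy model of the reservation
policy. It is only a sketch of the idea, not glibc code: the struct, the
function names, and the 256-byte surplus used in the usage note are all
hypothetical. Only the 128-byte reserve comes from the plan above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in glibc. */
struct static_tls_pool {
    size_t surplus;     /* bytes of static TLS still unused */
    size_t ie_reserve;  /* bytes held back for IE DSOs loaded late;
                           must start out <= surplus */
};

/* IE TLS has to live in static TLS, so it may use the whole surplus,
   including the reserve. Failure here is a hard dlopen failure. */
bool pool_alloc_ie(struct static_tls_pool *p, size_t size)
{
    if (size > p->surplus)
        return false;   /* "cannot allocate memory in static TLS block" */
    p->surplus -= size;
    /* The reserve has served its purpose for this many bytes. */
    p->ie_reserve -= (size < p->ie_reserve) ? size : p->ie_reserve;
    return true;
}

/* Opportunistic use (tlsdesc / tls opt hack) may only take what is left
   above the reserve. Failure is soft: the caller falls back to
   dynamically allocated TLS. */
bool pool_alloc_opportunistic(struct static_tls_pool *p, size_t size)
{
    if (size > p->surplus - p->ie_reserve)
        return false;
    p->surplus -= size;
    return true;
}
```

With a surplus of 256 and a reserve of 0 (roughly today's behaviour on the
affected targets), an opportunistic request for 200 bytes succeeds and a
later 64-byte IE request fails. With a reserve of 128, the same 200-byte
request is refused and falls back, and the IE request succeeds.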

Thoughts?

-- 
Cheers,
Carlos.

