Re: static TLS exhausted on ppc64le

On 9/30/19 2:41 PM, Szabolcs Nagy wrote:
> On 30/09/2019 19:07, Carlos O'Donell wrote:
>> On 9/30/19 1:50 PM, Rich Felker wrote:
>>> I see. That's a shame, because if you have excess static TLS reserved,
>>> using it for tlsdesc is actually really nice -- it makes the accesses
>>> just as fast as initial-exec, but opportunistically, and falls back
>>> gracefully if you run out. Waiting to hand it out to badly-behaved
>>> libraries that are using initial-exec model only serves to reinforce
>>> the bad behavior and discourages adoption of tlsdesc since the bad
>>> behavior gets preferential treatment...
>>> I think this analysis further supports my previous remarks that
>>> initial-exec in dlopened libraries should be deprecated and EOL'd.
>> We should dig deeper into the analysis here, and I just ran readelf for
>> all the implicated libraries in the bug.
>> The only *real* problem here is the implementation; only libc and libgomp
>> use TLS IE in this case (and libgl in the wild).
>> I think the best steps towards resolution are:
>> * Stop ppc64le and aarch64 from using ALL of the static TLS for tlsdesc / tls opt hack.
>>   - Reserve at least 128 bytes for libgomp + libgl.
> i think this is easy and reasonable:
> _dl_try_allocate_static_tls can reserve 128 bytes
> which is only used by _dl_allocate_static_tls.
> (so the optimization is still applied to dynamically
> loaded dsos, but there is guaranteed tls for small
> number of IE libs)
>> * Fix lazy tls loading to stop being lazy about allocation and allocate all memory
>>   required up front.
>>   - This allows libc to use GD instead of IE and not worry about touching tls vars
>>     early before init or the ordering of IE vs. GD.
>>   - Requires a non-default dlopen flag to get back old behaviour.
> this sounds hard.
> (and observable to users with many threads and many dsos with tls:
> preallocation will use more memory and cause slower thread starts
> than lazy allocation)
>> * Switch glibc back to GD internally.
>> * Switch x86_64 to tlsdesc (can be done at any time) to get perf back.
>> Disallowing IE in DSOs is only going to get us angry users in a transitional period.
>> The above plan will benefit ppc64le and aarch64 since they continue to
>> have maximum performance for their usage of tlsdesc.
>> Thoughts?
> i'm ok with the plan, but i think only the first part
> is realistic in short term.
I agree the middle part is hard.

If we fix this to reserve X bytes, then we can immediately start
transitioning x86_64 to tlsdesc also without hitting these problems? :-)
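
To make the reservation idea concrete, here is a toy model of the
accounting. This is only a sketch and not glibc code: the struct, the
function name try_alloc, the "optional" flag, and IE_RESERVE as a
compile-time constant are all assumptions for illustration. Opportunistic
(tlsdesc-style) requests may not dip into the reserve, while initial-exec
requests may use everything that is left.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, not glibc code. All names here are hypothetical. */
#define IE_RESERVE 128  /* bytes held back for initial-exec (IE) users */

struct static_tls {
    size_t size;  /* total surplus static TLS, in bytes */
    size_t used;  /* bytes handed out so far */
};

/*
 * optional == true:  opportunistic request (the tlsdesc optimization).
 *                    Failing is harmless; the caller falls back to
 *                    dynamic TLS.
 * optional == false: IE request. Failing here means the dlopen fails,
 *                    so these requests may also consume the reserve.
 */
static bool try_alloc(struct static_tls *t, size_t n, bool optional)
{
    size_t free_bytes = t->size - t->used;
    size_t avail = free_bytes;

    if (optional)
        avail = free_bytes > IE_RESERVE ? free_bytes - IE_RESERVE : 0;

    if (n > avail)
        return false;

    t->used += n;
    return true;
}
```

With 256 bytes of surplus in this model, optional requests can take at
most 128 bytes in total, and the remaining 128 bytes stay available to
IE libraries no matter how many tlsdesc users were loaded first.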


