This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.

From: Rich Felker <dalias at libc dot org>
To: Carlos O'Donell <carlos at redhat dot com>
Cc: Alexandre Oliva <aoliva at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Adam Conrad <adconrad at ubuntu dot com>, Roland McGrath <roland at hack dot frob dot com>, Siddhesh Poyarekar <siddhesh at redhat dot com>
Date: Thu, 9 Oct 2014 20:15:26 -0400
Subject: Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.
References: <5432EFF9 dot 5020602 at redhat dot com> <orzjd8xv5v dot fsf at free dot home> <5436A03F dot 2050008 at redhat dot com>

On Thu, Oct 09, 2014 at 10:48:31AM -0400, Carlos O'Donell wrote:
> On 10/07/2014 02:15 AM, Alexandre Oliva wrote:
> > On Oct  6, 2014, "Carlos O'Donell" <carlos@redhat.com> wrote:
> > 
> >> This code is a *heuristic*, it basically fails the load if there
> >> are no DTV slots left, even though we can still do the following:
> > 
> >> (a) Grow the DTV dynamically as many times as we want, with the
> >>     generation counter causing other threads to update.
> > 
> > or
> > 
> >   (a)' Stop wasting DTV entries with modules assigned to static TLS.
> >        There's no reason whatsoever to do so.
> > 
> >        This optimization is even described in the GCC Summit article in
> >        which I first proposed TLS Descriptors.  Unfortunately, I never
> >        got around to implementing it.
> 
> I was not aware of this, but if possible it is a great solution.
> 
> >> and
> > 
> >> (b) Allocate from the static TLS image surplus until it is exhausted.
> > 
> > 
> >> - Remove the check above, allowing the code to grow the DTV as large
> >>   as it wants for as many STATIC_TLS modules as it wants.
> > 
> > We don't really need to grow the DTV right away.  If we have static TLS,
> > we could just leave the DTV alone.  No code will ever access the
> > corresponding DTV entry.  If any code needs to update the DTV, because
> > of some module assigned to dynamic TLS, then, and only then, should the
> > DTV grow.
> 
> I had not considered this optimization, but I guess it would work.
> 
> >> WARNING: On AArch64 or any architecture that uses the generic-ish
> >> code for TLS descriptors, you will have further problems. There
> >> the descriptors consume static TLS image greedily, which means
> >> you may find that there is zero static TLS image space when you
> >> go to dlopen an application.
> > 
> > That's perfectly valid behavior, which exposes the bug in libraries that
> > are expected to be loadable after a program starts (say, by dlopen) when
> > relocations indicate they had to be brought in by Initial Exec.
> 
> I did not argue that it was invalid behaviour. I only wished to warn
> the reader that the situation at present will result in broken applications.
> We, the tools authors, allowed this situation to get out of hand, and now we
> have both pieces when it breaks, and must do our level best to ensure
> things continue to work while providing a way out of the situation.
>  
> > That they worked was not by design; it was pretty much by accident,
> > because glibc led by (bad) example instead of coming up with a real
> > solution, and others followed suit, breaking glibc's own assumption that
> > only a very small amount of static TLS space would ever be used after
> > the program started, and that the consumer of that space would be glibc
> > itself.
> 
> I agree.
> 
> >> We need to further subdivide the static TLS image space into "reserved
> >> for general use" and "reserved for DSO load uses."  With the TLS
> >> descriptors allocating from the general use space only.
> > 
> > ?!?
> > 
> > Static TLS space grows as much as needed to fit all IE DSOs.  Some
> > excess is reserved (and this should be configurable), but if we don't
> > use it for modules that could benefit from it, what should we use it
> > for?
> 
> My apologies, let me clarify. The static TLS space that is allocated
> is only for DSOs that are known a priori to the static linker. They
> must have been specified on the command line. Unfortunately in programs
> written in interpreted languages like Python, everything is a dlopen'd
> DSO. When you use dlopen with IE you run into the problem that
> we see today with TLS descriptors. You have a desire to keep the
> application working with the existing set of ~40 DSOs on the system
> that use IE, and we have a desire to keep TLS descriptors optimal.
> If we keep TLS descriptors optimal, they may consume all static TLS
> image and result in an application crash if a dlopen'd DSO uses
> IE, and I wish to avoid that crash.
> 
> >> On Fedora for AArch64 this
> >> caused no end of headaches attempting to load TLS IE using DSOs
> >> only to find it was literally impossible because so much of the
> >> implementation used TLS descriptors that the surplus static TLS
> >> image space was gone, and while descriptors can be allocated 
> >> dynamically, the DSOs can't.
> > 
> > Err...  I get a feeling I have no idea of what you refer to as DSO.
> > From the description, it's not Dynamically-loaded Shared Object.  What
> > is it, then?
> 
> My apologies again. Given that known DSOs using IE at link time will
> have static TLS image space allocated I have stopped talking about
> those since we know they work correctly. When I speak about DSOs I
> speak singularly about those loaded via dlopen.
>  
> > I suppose you may be speaking of modules that assume IE is usable to
> > access TLS of some module (itself, or any other), even though the
> > assumption is not warranted.
> 
> Yes. We have libraries in the OS using GCC constructs to force IE for
> certain __thread variables. We need to move them away from those uses,
> but we need to ensure a good migration path e.g. same speed, continues
> to work until we migrate all DSOs etc.
> 
> > So assume you load a module A defining a TLS section, and conservatively
> > assign it to dynamic TLS, for whatever reason.  Then you load a module B
> > that expects A to be in static TLS, because it uses IE to reference its
> > TLS symbols.  Kaboom.  The “conservative” approach just broke what would
> > have worked if you hadn't gratuitously taken it out of TLS.
> 
> I don't think this scenario is supported by the present tools.
> 
> The only use I have ever seen for IE in a DSO is optimal access of local
> thread variables.
> 
> If the static linker could see B accesses A's TLS using IE (requires B to
> be listed as a dependency or in the link list) then both A and B
> would have to use static TLS, and that forces both into the static TLS
> image. It would then be wrong for the dyn loader to load A as dynamic TLS.
> 
> If you do think it can happen, please start a distinct thread and we can talk
> about it and look into the source.
> 
> > Now, of course when you load A you don't know whether module B is going
> > to be loaded, and whether it will require A to use static TLS or not, or
> > whether module C would fail to load afterwards because there's not
> > enough static TLS space for its own TLS section, and it uses IE even
> > though it's NOT being loaded as a dependency of the IE.
> > 
> > So not saving static TLS space for later use may expose breakage in
> > subsequently loaded modules, whereas saving it may equally expose
> > breakage in subsequently loaded modules, but waste static TLS space and
> > *significantly* impact performance of TLS Descriptor-using modules that
> > could have got IE-like performance.  That sounds like a losing strategy
> > to me.
> 
> The only valid sequences I know of are:
> 
> (a) Module uses static TLS and is known by the static linker and has
>     static TLS image space allocated.
> 
> (b) Module uses static TLS and is not known to the static linker, accesses
>     only its own variables with IE, and has no static TLS image space
>     reserved for it.
> 
> The optimizing use of static TLS by TLS descriptors breaks (b).
> 
> > Greedy allocation doesn't guarantee optimal results, but it won't break
> > anything that isn't already broken, and if and when such breakage is
> > exposed, switching the broken modules to TLS Descriptors will get them
> > nearly identical performance for TLS references that happen to land in
> > static TLS, but that will NOT cause the library to fail to load
> > otherwise: it will just get GD-like performance.
> 
> What if the module author can never tolerate GD-like performance and
> would rather it fail than load and run slowly e.g. MESA/OpenGL?

This is not the module author's decision to make. If the user wants to
run, the user should be able to run. And the performance difference is
not measurable anyway except in artificial benchmarks that do nothing
but hammer TLS accesses without even using the data they read.

> Remember, and keep in mind our users, we do this for them, and some
> of them have strict performance requirements. We should not lightly
> tell them what they want is wrong.

Then, like I said, you should not give the user an error just because
a hardware/driver vendor doesn't want to look bad (slow) and wrongly
thinks the dynamic model will make the driver look slow.

> For example, our work on tunables to allow users to tweak up the size
> of static TLS image surplus is one potential solution to this problem.
> 
> It might also be possible to try to make the static TLS image a single
> mapping that we might possibly be able to grow with kernel help?

The only way to make it growable is to reserve space to begin with. In
any case it's not practical for the dynamic linker to "stop the
world" and probe whether each thread would have space to grow its
static TLS mapping in-place.

> 
> > So, in addition to stopping wasting DTV entries with static TLS
> > segments, I suggest not papering over the problem in glibc, but rather
> > proactively convert dlopenable libraries that use IE to access TLS
> > symbols that are not guaranteed to have been loaded along with the IE to
> > use TLS Descriptors in General Dynamic mode.
> 
> I agree that this is the correct solution, but *today* we have problems
> loading user applications. I see no options but to follow a staggered
> strategy:
> 
> (a) Immediately increase DTV surplus size.
> 
> 	- Distribution patches are doing this already to keep applications working.

No objection.

> (b) Implement static TLS support without needing a DTV increase.
> 
> 	- Reduces memory usage of DTV. Small optimization.
> 
>     and
> 
>     Remove faulty heuristics around not wanting to increase DTV size.

Seems okay; I would defer to Alexandre's opinion.

> (c) Approach upstream projects with patches to convert to TLS descriptors.
> 
> When we do (c), can it be done on a per-variable basis?
> 
> Can I convert one variable at a time to be a TLS descriptor?
> 
> As is done currently with the gcc attributes for TLS mode?

Why would you want to? If I understand correctly, your idea is that
current libraries are using GD for most TLS, and IE for specific
variables, and they'd want to keep using the same (non-TLSDESC) GD for
most TLS but TLSDESC for specific variables. This makes no sense.
Simply using TLSDESC (which technically is GD model) for everything is
an improvement with no drawbacks.

> > In order to ease this sort of transition, I've been thinking of
> > introducing an alias access model in GCC, that would map to GD if TLS
> > Descriptors are enabled, or to the failure-prone IE with old-style TLS.
> > Then those who incorrectly use IE today can switch to that; it will be a
> > no-op on arches that don't have TLSDesc, or that don't use it by
> > default, but it will make the fix automatic as each platform switches to
> > the superior TLS design.
> 
> Oh. Right. If upstream can't use TLS descriptors everywhere, then it
> may find itself failing to compile on certain targets that don't support
> descriptors.

TLSDESC is supported for the archs where it's likely to matter. On
some of the ones where it's not, even reading the thread pointer is
normally a trap to kernelspace, so whatever userspace overhead there
is in getting the offset for a TLS variable is going to be utterly
irrelevant (dominated by the trap).

> >> In Fedora we disallow greedy consumption of TLS descriptors on any
> >> targets that have TLS descriptors on by default.
> > 
> > Oh, wow, this is such a great move that it makes TLS Descriptors'
> > performance the *worst* of all existing access models.  If we want to
> > artificially force them into their worst case, we might as well get rid
> > of them altogether!
> 
> If it doesn't work and causes applications to stop working
> I'll disable it, and I did :-)
> 
> > Whom should I thank for making my work appear to suck?  :-(
> 
> Me. I didn't do it because I think it was the right solution.
> I did it because users need working applications to do the tasks
> they chose Fedora for.
> 
> It is not sufficient for me to say: "Wait a few months while I
> fix the fundamental flaws in the education of users and the usage
> of our tools." :-}

I'm with Alexandre on this. In this case it seems like your "quick
fix" may not just be neutral but actually _discouraging_ people from
switching to the better system.

> > :-P :-)
> > 
> >> We need to turn on TLS descriptors by default on x86_64 such
> >> that we can get the benefits there, and start moving DSOs away
> >> from TLS IE.
> > 
> > Hallelujah! :-)
> 
> You know I know what the right answer is, but we have to get there
> one step at a time with working applications the whole way.
> 
> In summary, it looks like we need:
> 
> (a) Immediately increase DTV surplus size.
> (b) Implement static TLS support without needing a DTV increase.
> (c) Remove faulty heuristics around not wanting to increase DTV size.
> (d) Add __attribute__((tls_model("go-fast"))) to gcc that defaults to
>     IE if TLS Desc is not present.
> (e) Approach upstream projects with patches to convert to TLS descriptors
>     using go-fast model.
> 
> Does this plan make sense?

I think (d) should be omitted, and a step (f) should be added: patch
binutils to disallow the creation of .so files with IE TLS.

Rich

Follow-Ups:
- Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.
  - From: Alexandre Oliva

References:
- Fixing the distribution problems with TLS and DTV_SURPLUS slots.
  - From: Carlos O'Donell
- Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.
  - From: Alexandre Oliva
- Re: Fixing the distribution problems with TLS and DTV_SURPLUS slots.
  - From: Carlos O'Donell
