This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Fixing the distribution problems with TLS and DTV_SURPLUS slots.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>, Adam Conrad <adconrad at ubuntu dot com>, Alexandre Oliva <aoliva at redhat dot com>, Roland McGrath <roland at hack dot frob dot com>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Mon, 06 Oct 2014 15:39:37 -0400
- Subject: Fixing the distribution problems with TLS and DTV_SURPLUS slots.
Adam,
I'm sitting on this patch in Fedora, and you asked me to send it
upstream. Unfortunately I don't think it is the right solution for
upstream.
Firstly, please don't respond with "But DSOs using TLS IE accesses
are not allowed." It is allowed in practice, because the compiler and
linker let people use it; we should have prevented it or spent more
time educating our users. Either way there are valid uses of it, and
glibc itself, along with other core libraries, wants the speed that it
offers. In the future we want to switch them to TLS descriptors, which
give you the same speed, but the momentum is there and we'd have to
patch every user like MESA to get rid of it, so we'd only be done 10
years down the road. Please see the "WARNING:" below about TLS
descriptors and AArch64 (and likely other TLS descriptor targets).
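For readers who have not seen it, this is roughly what a TLS IE access in a DSO looks like at the source level. It is only a sketch: the variable and function names are made up for illustration, and the only real interface used is GCC's `tls_model` attribute (the same effect can be had for a whole translation unit with `-ftls-model=initial-exec`).

```c
#include <assert.h>

/* Hypothetical per-thread counter, standing in for something like a
   dispatch pointer.  The attribute forces the initial-exec TLS model,
   so each access is a fixed offset from the thread pointer instead of
   a call into the dynamic TLS machinery.  When built into a DSO that
   is later dlopen'ed, this variable needs room in the static TLS
   image.  */
static __thread int dispatch_depth
  __attribute__ ((tls_model ("initial-exec")));

/* Increment the calling thread's counter and return the new value.  */
int
bump_dispatch_depth (void)
{
  return ++dispatch_depth;
}
```

The speed comes from skipping the dynamic lookup; the cost is that the loader has to find static TLS space for the DSO at dlopen time.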
The patch in Fedora is this:
~~~
#
# This is an experimental patch that should go into rawhide and
# Fedora 21 to fix failures where python applications fail to
# load graphics applications because of the slot usages for TLS.
# This should eventually go upstream.
#
# - Carlos O'Donell
#
diff -urN glibc-2.19-886-gdd763fd/sysdeps/generic/ldsodefs.h glibc-2.19-886-gdd763fd.mod/sysdeps/generic/ldsodefs.h
--- glibc-2.19-886-gdd763fd/sysdeps/generic/ldsodefs.h 2014-08-21 01:00:55.000000000 -0400
+++ glibc-2.19-886-gdd763fd.mod/sysdeps/generic/ldsodefs.h 2014-09-04 19:29:42.929692810 -0400
@@ -388,8 +388,18 @@
have to iterate beyond the first element in the slotinfo list. */
#define TLS_SLOTINFO_SURPLUS (62)
-/* Number of additional slots in the dtv allocated. */
-#define DTV_SURPLUS (14)
+/* Number of additional allocated dtv slots. This was initially
+ 14, but problems with python, MESA, and X11's uses of static TLS meant
+ that most distributions were very close to this limit when they loaded
+ dynamically interpreted languages that used graphics. The simplest
+ solution was to roughly double the number of slots. The actual static
+ image space usage was relatively small, for example in MESA you
+ had only two dispatch pointers for a total of 16 bytes. If we hit up
+ against this limit again we should start a campaign with the
+ distributions to coordinate the usage of static TLS. Any user of this
+ resource is effectively coordinating a global resource since this
+ surplus is allocated for each thread at startup. */
+#define DTV_SURPLUS (32)
/* Initial dtv of the main thread, not allocated with normal malloc. */
EXTERN void *_dl_initial_dtv;
~~~
The error users are seeing is this:
"dlopen: cannot load any more object with static TLS"
This is triggered by this code:
elf/dl-open.c:
  /* We need a second pass for static tls data, because _dl_update_slotinfo
     must not be run while calls to _dl_add_to_slotinfo are still pending.  */
  for (unsigned int i = first_static_tls; i < new->l_searchlist.r_nlist; ++i)
    {
      struct link_map *imap = new->l_searchlist.r_list[i];

      if (imap->l_need_tls_init
          && ! imap->l_init_called
          && imap->l_tls_blocksize > 0)
        {
          /* For static TLS we have to allocate the memory here and
             now.  This includes allocating memory in the DTV.  But we
             cannot change any DTV other than our own.  So, if we
             cannot guarantee that there is room in the DTV we don't
             even try it and fail the load.

             XXX We could track the minimum DTV slots allocated in
             all threads.  */
          if (! RTLD_SINGLE_THREAD_P && imap->l_tls_modid > DTV_SURPLUS)
            _dl_signal_error (0, "dlopen", NULL, N_("\
cannot load any more object with static TLS"));

          imap->l_need_tls_init = 0;
#ifdef SHARED
          /* Update the slot information data for at least the
             generation of the DSO we are allocating data for.  */
          _dl_update_slotinfo (imap->l_tls_modid);
#endif

          GL(dl_init_static_tls) (imap);
          assert (imap->l_need_tls_init == 0);
        }
    }
This code is a *heuristic*: it fails the load if there are no
surplus DTV slots left, even though we could still do the following:
(a) Grow the DTV dynamically as many times as we want, with the
generation counter causing other threads to update.
and
(b) Allocate from the static TLS image surplus until it is exhausted.
The heuristic avoids doing (a) and (b) if all the surplus slots
were taken.
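To make the heuristic concrete, here is a stand-alone model of just the check from the dl-open.c excerpt. It is not glibc code: the function name and its parameters are hypothetical, and DTV_SURPLUS is set to the upstream value of 14.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DTV_SURPLUS 14  /* upstream value before the Fedora patch */

/* Hypothetical model of the dl-open.c check: return true if a DSO
   with static TLS and module ID TLS_MODID would be refused.
   SINGLE_THREAD stands in for RTLD_SINGLE_THREAD_P.  Note that the
   decision looks only at the module ID; it never asks how much
   static TLS image space is actually left.  */
bool
static_tls_load_refused (bool single_thread, size_t tls_modid)
{
  return !single_thread && tls_modid > DTV_SURPLUS;
}
```

In this model module ID 14 still loads in a threaded process and module ID 15 is refused, whatever the state of the static TLS image.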
A better solution would be:
- Keep the use of DTV_SURPLUS to avoid immediately having to reallocate
the DTV when you dlopen a couple of modules.
- Remove the check above, allowing the code to grow the DTV as large
as it wants for as many STATIC_TLS modules as it wants.
- Restrict only on the size of static TLS image space and error when
that is exhausted.
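A rough model of that policy, to show the shape of it rather than an implementation: every name below is hypothetical, the DTV is reduced to a slot count, and alignment of TLS blocks is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DTV_SURPLUS 14

/* Hypothetical bookkeeping; none of these fields exist in glibc.  */
struct tls_model_state
{
  size_t dtv_slots;         /* slots currently allocated in the DTV */
  size_t max_modid;         /* highest module ID handed out so far */
  size_t static_free;       /* bytes left in the static TLS surplus */
  unsigned long generation; /* bumped whenever the DTV must grow */
};

/* Try to place a static TLS module of BLOCKSIZE bytes.  The only hard
   failure is running out of static TLS image space; a full DTV is
   handled by growing it and bumping the generation so that other
   threads know to update.  */
bool
add_static_tls_module (struct tls_model_state *st, size_t blocksize)
{
  assert (st != NULL);
  if (blocksize > st->static_free)
    return false;

  size_t modid = st->max_modid + 1;
  if (modid >= st->dtv_slots)
    {
      /* Grow with fresh surplus to avoid reallocating on every load.  */
      st->dtv_slots = modid + DTV_SURPLUS;
      ++st->generation;
    }
  st->max_modid = modid;
  st->static_free -= blocksize;
  return true;
}
```

DTV_SURPLUS survives here purely as a growth increment, which matches the first bullet: it saves reallocations but no longer limits how many modules can load.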
The most common application framework to trigger this is
Python. There are more than 14 libraries in Fedora using TLS,
in fact there are ~40, which is why I raised the DTV_SURPLUS
limit to 32 in Fedora (several can't be loaded simultaneously).
This raising of the DTV_SURPLUS limit is a bandaid, with the
added effect of optimizing performance for Python, since it avoids
the DTV realloc, at the cost of 18 additional dtv_t entries
(18 * sizeof(dtv_t) bytes) per thread.
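The per-thread cost can be estimated with a few lines of C. This is a sketch under an assumption: the union below is my approximation of glibc's dtv_t layout (a counter overlaid with a pointer plus a flag) and is named differently to make clear it is a model, not the real definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout, modeled on dtv_t; not copied from glibc.  */
typedef union dtv_model
{
  size_t counter;
  struct
  {
    void *val;
    bool is_static;
  } pointer;
} dtv_model_t;

/* Extra DTV bytes each thread pays when the surplus is raised from
   OLD_SURPLUS to NEW_SURPLUS slots.  */
size_t
extra_dtv_bytes_per_thread (size_t old_surplus, size_t new_surplus)
{
  assert (new_surplus >= old_surplus);
  return (new_surplus - old_surplus) * sizeof (dtv_model_t);
}
```

With this layout an entry is 16 bytes on a typical LP64 target, so going from 14 to 32 slots would cost 18 * 16 = 288 bytes per thread.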
I'm not going to have time right now to implement the better
solution. What I'm looking for is expert advice on what to do
here.
The better solution requires considerably more testing, because
now we're doing something we've never done before: allocating
up to the limit of the surplus static TLS image.
Do we grow the DTV_SURPLUS knowing it's a bandaid?
WARNING: On AArch64 or any architecture that uses the generic-ish
code for TLS descriptors, you will have further problems. There
the descriptors consume static TLS image space greedily, which means
you may find that there is zero static TLS image space left when you
go to dlopen an application. We need to further subdivide the
static TLS image space into "reserved for general use" and
"reserved for DSO load uses", with the TLS descriptors allocating
from the general use space only. On Fedora for AArch64 this
caused no end of headaches: attempts to load DSOs that use TLS IE
failed because so much of the implementation used TLS descriptors
that the surplus static TLS image space was gone, and while
descriptors can be allocated dynamically, the DSOs can't. In Fedora
we disallow greedy consumption of static TLS space by TLS
descriptors on any targets that have TLS descriptors on by default,
which leads me to the last point.
We need to turn on TLS descriptors by default on x86_64 such
that we can get the benefits there, and start moving DSOs away
from TLS IE.
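The split of the static TLS image space proposed in the WARNING above could look something like the following. This is purely illustrative: the pool structure, the enum, and the function are all hypothetical, and I'm assuming here that TLS IE DSO loads draw only from their reserved pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical split of the static TLS surplus into two pools.  */
struct static_tls_pools
{
  size_t general_free;  /* "reserved for general use" */
  size_t dso_free;      /* "reserved for DSO load uses" */
};

enum tls_consumer
{
  TLS_DESCRIPTOR,  /* may fall back to dynamic TLS on failure */
  TLS_IE_DSO       /* has no fallback: the load fails */
};

/* Reserve BYTES for WHO.  Descriptors may only take from the general
   pool, so they can never starve a later TLS IE DSO load.  */
bool
reserve_static_tls (struct static_tls_pools *p, enum tls_consumer who,
                    size_t bytes)
{
  assert (p != NULL);
  size_t *pool = (who == TLS_DESCRIPTOR) ? &p->general_free : &p->dso_free;
  if (bytes > *pool)
    return false;
  *pool -= bytes;
  return true;
}
```

A false return means different things for the two consumers: a descriptor falls back to dynamic allocation, while a TLS IE DSO fails to load.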
Comments?
Cheers,
Carlos.