This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fixes TLS performance degradation after dlopen

On 6/7/2016 10:28 AM, Philipp Trommler wrote:
Fixes a performance degradation of TLS access in shared libraries which
can be observed after another shared library that uses TLS is loaded
via dl_open. In this case __tls_get_addr takes significant more time.
Once that shared library accesses it's TLS, the performance normalizes.

Attached is a minimal example which gives the following output:

time: 0.723744
after dlopen,
time: 1.789016
after tls access in loaded lib,
time: 0.672865

We do have a use-case where this is actually really significant. I
believe this happens for instance if libstdc++ is loaded implicitly,
but TLS features are not actively used.

I strongly suspect this is the same issue as discussed in this post on
the µClibc mailing list:

and therefore the patch provided mainly reuses the solution they've
found for µClibc.

I'm not sure this patch ever landed in uclibc; Tilera switched to using glibc
around that time and I didn't follow up with uclibc.

In a reply to that same thread on the uclibc mailing list (Oct 13 2010) I wrote:
"In my email quoted below, I found that __tls_get_addr() under the uClibc
version we were running with was always running the slow path.  We were
running an old version from the nptl branch, back at change 22990 on svn.
Since then, I've ported glibc-2.11.2 to tile.  I looked at the performance
of __tls_get_addr() under glibc, and found that the fast path is correctly
taken in that environment."

So it sounds like you may have found a case in glibc where we don't end
up on the slow path.

I admit I never felt like I fully understood what was going on in this code at
the time, and I was hoping my patch would inspire someone to jump on the
issue and either sign off on my patch, or figure out a deeper cause to the problem.
So I don't know that I feel qualified to sign off on your patch, particularly
six years after I last looked at this stuff.

One nitpick (and it's from my original code):

-  if (dtv[0].counter < listp->slotinfo[idx].gen)
-      /* The generation counter for the slot is higher than what the
-	 current dtv implements.  We have to update the whole dtv but
-	 only those entries with a generation counter <= the one for
-	 the entry we need.  */
-      size_t new_gen = listp->slotinfo[idx].gen;
+      /* Tilera: leave scope to make diffs against our baseline
easier. */
        size_t total = 0;
/* We have to look through the entire dtv slotinfo list. */

You may have intended this to help review by avoiding whitespace diffs,
as I did when I first mailed my patch out, but obviously you'll want to
remove the "Tilera" comment line, the bracket before it, and then re-indent
the whole body.

Carlos raises a concern about copyright.  I'm certainly happy to do whatever
I need to to make this code that I authored available to the FSF, but since
I am not directly involved with the testing and hacking of Philipp's use case,
it probably makes more sense for Philipp to drive the process.  So Carlos,
what is the right way to make this happen?  If this code is more or less
the right answer, and Philipp can test it and confirm that it works, and
someone else is happy to review it and sign off on it, I could resubmit it
under my name, I suppose?  Even if Philipp ends up making some minor
tweaks, that should fall under the usual "trivial work" exemption. And if
it's the wrong approach entirely, none of this matters :-)

By the way, I'm assuming my assignment on file for Tilera and/or EZchip
continues to be in force for Mellanox.  I don't know why it wouldn't,
but IANAL, so...

Chris Metcalf, Mellanox Technologies

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]