Bug 19924 - TLS performance degradation after dlopen
Summary: TLS performance degradation after dlopen
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.23
: P2 normal
Target Milestone: 2.39
Assignee: Not yet assigned to anyone
: 27248 (view as bug list)
Depends on:
Reported: 2016-04-08 10:56 UTC by Philipp Trommler
Modified: 2023-09-04 12:13 UTC (History)
13 users (show)

See Also:
Last reconfirmed:
fweimer: security-

Patch that fixes the TLS performance degradation after dlopen (1.96 KB, patch)
2016-04-08 10:56 UTC, Philipp Trommler
Details | Diff
tls_mini.c (451 bytes, text/x-csrc)
2016-04-08 10:57 UTC, Philipp Trommler
tls_access.c (87 bytes, text/x-csrc)
2016-04-08 10:57 UTC, Philipp Trommler
Makefile (200 bytes, text/plain)
2016-04-08 10:58 UTC, Philipp Trommler
dummy.c (79 bytes, text/x-csrc)
2016-04-08 10:58 UTC, Philipp Trommler
tls performace (1.43 KB, patch)
2021-02-23 02:26 UTC, ksong
Details | Diff
0001-This-fixes-a-subset-of-the-issues-described-in.patch (3.17 KB, patch)
2021-02-25 07:33 UTC, ksong
Details | Diff
0002-This-patch-is-not-necessary-for-the-bug-fix-just-mak.patch (980 bytes, patch)
2021-02-25 07:36 UTC, ksong
Details | Diff
Fix-TLS-performance-degradation-after-dlopen (1.45 KB, application/mbox)
2021-05-18 10:12 UTC, ksong

Note You need to log in before you can comment on or make changes to this bug.
Description Philipp Trommler 2016-04-08 10:56:02 UTC
Created attachment 9165 [details]
Patch that fixes the TLS performance degradation after dlopen


we have noticed a performance degradation of TLS access in shared libraries. If another shared library that uses TLS is loaded via dlopen, __tls_get_addr takes significant more time. Once that shared library accesses it's TLS, the performance normalizes.

Attached is a minimal example which gives the following output:

time: 0.723744
after dlopen,
time: 1.789016
after tls access in loaded lib,
time: 0.672865

We do have a use-case where this is actually really significant. I believe this happens for instance if libstdc++ is loaded implicitly, but TLS features are not actively used.

I strongly suspect this is the same issue as discussed in this post on the µClibc mailing list:


and therefore the patch provided mainly reuses the solution they've found for µClibc.

Main development platform information (tested on multiple platforms):

* latest glibc git version (pulled at Fri Apr  8 10:50:19 CEST 2016)
* x86_64-pc-linux-gnu (but should be independent)
* Linux vishnu 4.2.0-23-generic #28-Ubuntu SMP Sun Dec 27 17:47:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
* gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2)
* GNU ld (GNU Binutils for Ubuntu) 2.25.1

The patch provided touches code already altered (but not released) by the work on bug 19329.

Philipp Trommler.
Comment 1 Philipp Trommler 2016-04-08 10:57:40 UTC
Created attachment 9166 [details]
Comment 2 Philipp Trommler 2016-04-08 10:57:59 UTC
Created attachment 9167 [details]
Comment 3 Philipp Trommler 2016-04-08 10:58:16 UTC
Created attachment 9168 [details]
Comment 4 Philipp Trommler 2016-04-08 10:58:33 UTC
Created attachment 9169 [details]
Comment 5 Philipp Trommler 2016-05-09 19:52:13 UTC
After a month of waiting, aren't there any thoughts about this? Do you need more information? Are there any caveats preventing the inclusion of the patch?
Comment 6 Adhemerval Zanella 2018-01-05 12:53:37 UTC
(In reply to Philipp Trommler from comment #5)
> After a month of waiting, aren't there any thoughts about this? Do you need
> more information? Are there any caveats preventing the inclusion of the
> patch?

Hi, patch proposal should be discussed in libc-alpha for review.
Comment 7 Florian Weimer 2021-01-26 10:29:24 UTC
*** Bug 27248 has been marked as a duplicate of this bug. ***
Comment 8 ksong 2021-01-27 01:44:41 UTC
We encountered the same problem. After using this patch, the performance has improved, but we have seen that this case has not been updated after 2018 and has not been merged upstream. Can we continue to push forward?
Comment 9 Adhemerval Zanella 2021-01-27 11:26:22 UTC
We need someone to rebase the previous patch [1], test if it really fixes the issue, provide a testcase that reproduces the issue.  The last submission [1] still have the same 'Tilera' common that Chris Metacalf has asked to be removed and the commit message did not describe the issue not what it is aiming to fix. 

Also, as Szabolcs has added this issues related to the BZ#19329 and he suggested first track this down to fix the correctness of this issue before optimizing it.

[1] https://sourceware.org/pipermail/libc-alpha/2018-January/090810.html
Comment 10 ksong 2021-02-02 10:36:50 UTC
It is difficult to reproduce the problem of providing test cases. Besides, how can we push this problem forward? Can you give some suggestions and advance it further?
For "rebase the previous patch",From the historical information, I think we can continue to rely on "Philipp Trommler".
Comment 11 ksong 2021-02-23 02:26:40 UTC
Created attachment 13253 [details]
tls performace

Test step:
I clone newest version for glibc and compile it.

export glibc_install="$(pwd)/glibc/build/install"

git clone git://sourceware.org/git/glibc.git
cd glibc
mkdir build
cd build
../configure --prefix "$glibc_install"

Download test case and compile it. Generate tls_mini.

$LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 --list ./tls_mini
linux-vdso.so.1 (0x00007ffdf37ce000)
libtls_access.so => /home/ksong/work/test/tls/libtls_access.so (0x00007f0d812ff000)
libdl.so.2 => /home/ksong/work/package/glibc/build/dlfcn/libdl.so.2 (0x00007f0d810fb000)
libc.so.6 => /home/ksong/work/package/glibc/build/libc.so.6 (0x00007f0d80d3b000)
/lib64/ld-linux-x86-64.so.2 => /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 (0x00007f0d81704000)

Exec tls_mini.
$LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.510322
time: 1.468314
time: 0.501844

after "git am --reject 0001-Fix-TLS-performance-degradation-after-dlopen.patch"

and recompile glibc.
Exec tls_mini
$LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.585535
time: 0.555755
time: 0.554991

We can see a big change in the performance of dlopen.
Comment 12 Carlos O'Donell 2021-02-23 02:47:47 UTC
(In reply to ksong from comment #11)
> We can see a big change in the performance of dlopen.

It would be great if you could test Szabolcs series here and see if it has a positive impact for you:

My suspicion is that this bug is the same as 19329 or other related TLS races.
Comment 13 ksong 2021-02-25 07:31:51 UTC
Try backport patch form Szabolcs Nagy.

Rebuild glibc.
Exec tls_mini test case.
[ctu-ksong-d1:tls]$LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.583575
time: 1.244377
time: 0.553654

It has some performance improvements, but not as obvious as the previous solution.
Comment 14 ksong 2021-02-25 07:33:31 UTC
Created attachment 13263 [details]
Comment 15 ksong 2021-02-25 07:36:11 UTC
Created attachment 13264 [details]
Comment 16 ksong 2021-02-25 09:54:17 UTC
Refer http://patches-tcwg.linaro.org/patch/49142/
I also try backport these patch.
[01/15] aarch64: free tlsdesc data on dlclose [BZ #27403]
[02/15] elf: Fix data race in _dl_name_match_p [BZ #21349]
[03/15] Add test case for [BZ #19329]
[04/15] Add a DTV setup test [BZ #27136]
[05/15] elf: Fix a DTV setup issue [BZ #27136]
[06/15] elf: Fix comments and logic in _dl_add_to_slotinfo
[07/15] elf: Refactor _dl_update_slotinfo to avoid use after free
[08/15] elf: Fix data races in pthread_create and TLS access [BZ #19329]
[09/15] elf: Use relaxed atomics for racy accesses [BZ #19329]
[10/15] elf: Fix DTV gap reuse logic [BZ #27135] 

and try to rebuild glibc, but for our testcase tls_mini, Performance has not improved.

For below patch, 
[11/15] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
[12/15] i386: Avoid lazy relocation of tlsdesc [BZ #27137]
[13/15] x86_64: Remove lazy tlsdesc relocation related code
[14/15] i386: Remove lazy tlsdesc relocation related code
[15/15] elf: Remove lazy tlsdesc relocation related code 

I backported them,but compile failed.
In file included from dynamic-link.h:92:0,
                 from dl-load.c:60:
../sysdeps/x86_64/dl-machine.h: In function ‘elf_machine_runtime_setup’:
../sysdeps/x86_64/dl-machine.h:132:23: error: ‘_dl_tlsdesc_resolve_rela’ undeclared (first use in this function); did you mean ‘_dl_tlsdesc_return’?
       = (ElfW(Addr)) &_dl_tlsdesc_resolve_rela;
../sysdeps/x86_64/dl-machine.h:132:23: note: each undeclared identifier is reported only once for each function it appears in
Comment 17 ksong 2021-03-24 12:08:52 UTC
For the issue, any update?
Comment 18 Carlos O'Donell 2021-03-24 15:19:57 UTC
(In reply to ksong from comment #17)
> hi,@all
> For the issue, any update?
> Thanks!

None made yet. The bigger issue right now is probably reviewing Szabolc Nagy's (Arm) work to remote data races from the early TLS access in the loader. I feel like that work is a pre-requisite for even looking at this.
e.g. https://patchwork.sourceware.org/project/glibc/list/?series=1673
Comment 19 ksong 2021-05-18 02:57:28 UTC
Refer https://sourceware.org/bugzilla/show_bug.cgi?id=19329
I have seen some related modifications have been merged.
But based on the latest version, performance problems still have not improved, some related code is not merged.
Please refer test result:
LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.509420
time: 1.406026
time: 0.493889

I want to know what is next. 
Comment 20 ksong 2021-05-18 10:12:42 UTC
Created attachment 13457 [details]
Comment 21 ksong 2021-05-18 10:13:28 UTC
Based on the latest version, I made a new patch
And did the same test, and its performance is obviously better. 
LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so./ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/wd/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/buwork/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.583998
time: 0.560171
time: 0.553203
Comment 22 Adhemerval Zanella 2021-05-18 13:11:46 UTC
The recent dynamic TLS fixes pushed upstream did not try to fix this issue. Szabolcs send a RFC fix [1], and it seems to fix this issue.

[1] https://patchwork.sourceware.org/project/glibc/patch/b116855de71098ef7dd2875dd3237f8f3ecc12c2.1618301209.git.szabolcs.nagy@arm.com/
Comment 23 ksong 2021-05-19 08:07:28 UTC
I made new patch base newest version and do same test.

LANG=C /home/ksong/work/package/glibc/build/elf/ld-linux-x86-64.so.2 --library-path /home/ksong/work/test/tls:/home/ksong/work/package/glibc/build:/home/ksong/work/package/glibc/build/math:/home/ksong/work/package/glibc/build/elf:/home/ksong/work/package/glibc/build/dlfcn:/home/ksong/work/package/glibc/build/nss:/home/ksong/work/package/glibc/build/nis:/home/ksong/work/package/glibc/build/rt:/home/ksong/work/package/glibc/build/resolv:/home/ksong/work/package/glibc/build/crypt:/home/ksong/work/package/glibc/build/mathvec:/home/ksong/work/package/glibc/build/nptl:/lib64 ./tls_mini
time: 0.551995
time: 0.545402
time: 0.544759

The performance has been improved.
Comment 24 ksong 2021-06-22 11:01:24 UTC
For the patch.
When can it be merged into the main branch? 
Comment 25 pgowda 2021-12-08 09:38:50 UTC
> The recent dynamic TLS fixes pushed upstream did not try to fix this issue.
> Szabolcs send a RFC fix [1], and it seems to fix this issue.
> https://patchwork.sourceware.org/project/glibc/patch/
> b116855de71098ef7dd2875dd3237f8f3ecc12c2.1618301209.git.szabolcs.nagy@arm.
> com/


Can anyone please update regarding the above mentioned patch?
Will it be merged upstream as its fixing the issue or should it be
further modified?

Comment 26 Paul Zimmermann 2023-01-05 16:09:54 UTC
my advice is to send the patch to the libc-alpha list. This is the usual way patches are proposed and reviewed: https://sourceware.org/mailman/listinfo/libc-alpha

Please also include a small example demonstrating why the patch improves things.
Comment 27 Szabolcs Nagy 2023-01-06 10:43:54 UTC
affects sage math too
Comment 28 Sourceware Commits 2023-09-01 07:22:13 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:


commit d2123d68275acc0f061e73d5f86ca504e0d5a344
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Tue Feb 16 12:55:13 2021 +0000

    elf: Fix slow tls access after dlopen [BZ #19924]
    In short: __tls_get_addr checks the global generation counter and if
    the current dtv is older then _dl_update_slotinfo updates dtv up to the
    generation of the accessed module. So if the global generation is newer
    than generation of the module then __tls_get_addr keeps hitting the
    slow dtv update path. The dtv update path includes a number of checks
    to see if any update is needed and this already causes measurable tls
    access slow down after dlopen.
    It may be possible to detect up-to-date dtv faster.  But if there are
    many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
    least walking the slotinfo list.
    This patch tries to update the dtv to the global generation instead, so
    after a dlopen the tls access slow path is only hit once.  The modules
    with larger generation than the accessed one were not necessarily
    synchronized before, so additional synchronization is needed.
    This patch uses acquire/release synchronization when accessing the
    generation counter.
    Note: in the x86_64 version of dl-tls.c the generation is only loaded
    once, since relaxed mo is not faster than acquire mo load.
    I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
    powerpc, sparc, x86 who reported that it fixes the performance issue
    of bug 19924.
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 29 Florian Weimer 2023-09-04 12:13:48 UTC
Fixed for glibc 2.39.