Bug 27137 - lazy tlsdesc relocation has data races
Summary: lazy tlsdesc relocation has data races
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: 2.34
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-31 16:08 UTC by Szabolcs Nagy
Modified: 2021-06-10 02:30 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
simple reproducer of this problem (791.52 KB, application/x-gzip)
2021-04-09 17:53 UTC, Ben Woodard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Szabolcs Nagy 2020-12-31 16:08:37 UTC
on x86 tlsdesc can be lazily relocated, but that

(1) requires locks for _dl_make_tlsdesc_dynamic
(2) allocations in _dl_make_tlsdesc_dynamic can fail
(3) calls _dl_try_allocate_static_tls without locks

from these issues (1) and (2) applies to normal tls access
too (bug 16133 and bug 16134) but (3) is specific to lazy
tlsdesc relocations: there are a lot of accesses to global
state that was not designed to be done without holding
locks such as writing to GL(dl_tls_static_optional),
GL(dl_tls_static_used) and map->l_tls_offset.

on arm and aarch64 lazy tlsdesc reloc was removed earlier
because of data races between accesses to the same tls
variable, these are data races with concurrent thread
creation and independent tls accesses.
Comment 1 Ben Woodard 2021-04-09 17:53:04 UTC
It is hard to argue that this is the same problem but the same bit of code is to blame. The calculation for the TLSDESC relocation ends up crashing the dynamic linker when done lazily in some cases. The patch that addresses these race conditions by disabling lazy relocations also fixes this problem. In this particular case, the problem doesn't appear to be a data race, somehow some libraries are being compiled in such a way that they use the gnu2 TLS variant but they do not contain a PLT or GOT this causes the relocation calculation to go awry and execution crashes in the dynamic linker.

Steps to Reproduce:
1. compile up auditor.so and dlmain.c
2. LD_AUDIT=/path/to/auditor.so /path/to/dlmain /path/to/libmemkind.so

The patch that addresses this problem also addresses the problem I am describing. 
The key one needed to fix this problem is: https://sourceware.org/pipermail/libc-alpha/2021-February/122637.html
It simply disables lazy binding of gnu2 TLSDESC relocations. Unfortunately, there is a minor bug in that patch which the author said that he will address with the second version of the patch set. https://sourceware.org/pipermail/libc-alpha/2021-April/124889.html

It is fairly likely that a variant of this problem also exists in i386 but I have yet to see that in the wild. It therefore may be worth considering also adding patch 12 from Szabolcs Nagy as well when fixing this problem. https://sourceware.org/pipermail/libc-alpha/2021-February/122638.html

Patches 13 and 14 are not strictly necessary to address this bug but they remove the code that has the bug in it. https://sourceware.org/pipermail/libc-alpha/2021-February/122639.html https://sourceware.org/pipermail/libc-alpha/2021-February/122640.html
Comment 2 Ben Woodard 2021-04-09 17:53:42 UTC
Created attachment 13359 [details]
simple reproducer of this problem
Comment 3 Szabolcs Nagy 2021-04-12 08:36:26 UTC
(In reply to Ben Woodard from comment #1)
> somehow some libraries are being compiled in such a way that they use the
> gnu2 TLS variant but they do not contain a PLT or GOT this causes the
> relocation calculation to go awry and execution crashes in the dynamic
> linker.

that would be a linker bug as the DT_TLSDESC_PLT/GOT are needed
for lazy tlsdesc to work, however..

libmemkind.so DYNAMIC section has

 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW

i.e. it does not support lazy entry points.

The bug in this case is in ldaudit: it disables bind now and
tries to relocate everything lazily (so it can use plt hooks)
that's wrong for tlsdesc, i opened bug 27721.
Comment 4 Sourceware Commits 2021-04-15 08:48:56 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86

commit 8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Feb 11 11:29:23 2021 +0000

    x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
    
    Lazy tlsdesc relocation is racy because the static tls optimization and
    tlsdesc management operations are done without holding the dlopen lock.
    
    This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
    for aarch64, but it fixes a different race: bug 27137.
    
    Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
    relocate tlsdesc lazily, but that does not work in a BIND_NOW module
    due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
    load time fixes this bug 27721 too.
Comment 5 Sourceware Commits 2021-04-15 08:49:02 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ddcacd91cc10ff92d6201eda87047d029c14158d

commit ddcacd91cc10ff92d6201eda87047d029c14158d
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Feb 11 11:40:11 2021 +0000

    i386: Avoid lazy relocation of tlsdesc [BZ #27137]
    
    Lazy tlsdesc relocation is racy because the static tls optimization and
    tlsdesc management operations are done without holding the dlopen lock.
    
    This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
    for aarch64, but it fixes a different race: bug 27137.
    
    On i386 the code is a bit more complicated than on x86_64 because both
    rel and rela relocs are supported.
Comment 6 Szabolcs Nagy 2021-04-15 10:45:58 UTC
fixed for 2.34
Comment 7 Sourceware Commits 2021-05-11 16:17:11 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225c2222f027790e3f460e31aa5dd2c54

commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Dec 30 19:19:37 2020 +0000

    elf: Fix data races in pthread_create and TLS access [BZ #19329]
    
    DTV setup at thread creation (_dl_allocate_tls_init) is changed
    to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
    here without locks would require design changes: the map that is
    accessed for static TLS initialization here may be concurrently
    freed by dlclose.  That use after free may be solved by only
    locking around static TLS setup or by ensuring dlclose does not
    free modules with static TLS, however currently every link map
    with TLS has to be accessed at least to see if it needs static
    TLS.  And even if that's solved, still a lot of atomics would be
    needed to synchronize DTV related globals without a lock. So fix
    both bug 19329 and bug 27111 with a lock that prevents DTV setup
    running concurrently with dlopen or dlclose.
    
    _dl_update_slotinfo at TLS access still does not use any locks
    so CONCURRENCY NOTES are added to explain the synchronization.
    The early exit from the slotinfo walk when max_modid is reached
    is not strictly necessary, but does not hurt either.
    
    An incorrect acquire load was removed from _dl_resize_dtv: it
    did not synchronize with any release store or fence and
    synchronization is now handled separately at thread creation
    and TLS access time.
    
    There are still a number of racy read accesses to globals that
    will be changed to relaxed MO atomics in a followup patch. This
    should not introduce regressions compared to existing behaviour
    and avoid cluttering the main part of the fix.
    
    Not all TLS access related data races got fixed here: there are
    additional races at lazy tlsdesc relocations see bug 27137.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 8 xujing 2021-06-10 02:30:58 UTC
(In reply to Ben Woodard from comment #2)
> Created attachment 13359 [details]
> simple reproducer of this problem

Hi,how can I reproduce on i386? I try to reproduce it on x86_64 by building 32bits program. The steps are as follows:
1. add -m32 to make, like this:
    dlmain: dlmain.c
            gcc -O2 -m32 -g -o dlmain dlmain.c -ldl
    auditor.so: auditor.c
            gcc -O2 -m32 -g -fPIC -shared -o auditor.so auditor.c
2. make
3. make auditor.so
4. LD_AUDIT=/path/to/auditor.so /path/to/dlmain /path/to/libmemkind.so

And I try to build(get) libmemkind.so(ELF 32-bit LSB shared object), but I can't success.
I also try to use other shared library(libreadline.so), I can't reproduce too(on x86_64 can't reproduce too. I think any shared library can replace libmemkind.so to reproduce but not).

Is it only produce on 32bits mathine? Or other problem?

Thanks.