Bug 27111

Summary: pthread_create and tls access use link_map objects that may be concurrently freed by dlclose
Product: glibc Reporter: Szabolcs Nagy <nsz>
Component: dynamic-linkAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 2.32   
Target Milestone: 2.34   
Host: Target:
Build: Last reconfirmed:

Description Szabolcs Nagy 2020-12-24 14:39:29 UTC
concurrent pthread_create (or tls access) and dlclose are not
safe now because pthread_create can dereference link_map
pointers that may be freed.

tls access has the same problem but there this is only used for
an assertion check that is not strictly necessary so easy to fix.
pthread_create really needs to look at the link_maps in case
they have static tls that needs tls and dtv initialization at
thread creation time.

neither pthread_create nor tls access hold the dl_load_lock
that would prevent this issue.
Comment 1 Sourceware Commits 2021-05-11 16:17:10 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225c2222f027790e3f460e31aa5dd2c54

commit 1387ad6225c2222f027790e3f460e31aa5dd2c54
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Dec 30 19:19:37 2020 +0000

    elf: Fix data races in pthread_create and TLS access [BZ #19329]
    
    DTV setup at thread creation (_dl_allocate_tls_init) is changed
    to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
    here without locks would require design changes: the map that is
    accessed for static TLS initialization here may be concurrently
    freed by dlclose.  That use after free may be solved by only
    locking around static TLS setup or by ensuring dlclose does not
    free modules with static TLS, however currently every link map
    with TLS has to be accessed at least to see if it needs static
    TLS.  And even if that's solved, still a lot of atomics would be
    needed to synchronize DTV related globals without a lock. So fix
    both bug 19329 and bug 27111 with a lock that prevents DTV setup
    running concurrently with dlopen or dlclose.
    
    _dl_update_slotinfo at TLS access still does not use any locks
    so CONCURRENCY NOTES are added to explain the synchronization.
    The early exit from the slotinfo walk when max_modid is reached
    is not strictly necessary, but does not hurt either.
    
    An incorrect acquire load was removed from _dl_resize_dtv: it
    did not synchronize with any release store or fence and
    synchronization is now handled separately at thread creation
    and TLS access time.
    
    There are still a number of racy read accesses to globals that
    will be changed to relaxed MO atomics in a followup patch. This
    should not introduce regressions compared to existing behaviour
    and avoid cluttering the main part of the fix.
    
    Not all TLS access related data races got fixed here: there are
    additional races at lazy tlsdesc relocations see bug 27137.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 2 Sourceware Commits 2021-05-11 16:17:25 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572bd547d57a39b6cf0ea072545dc4048921f4c3

commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Dec 31 13:59:38 2020 +0000

    elf: Fix DTV gap reuse logic [BZ #27135]
    
    For some reason only dlopen failure caused dtv gaps to be reused.
    
    It is possible that the intent was to never reuse modids for a
    different module, but after dlopen failure all gaps are reused
    not just the ones caused by the unfinished dlopened.
    
    So the code has to handle reused modids already which seems to
    work, however the data races at thread creation and tls access
    (see bug 19329 and bug 27111) may be more severe if slots are
    reused so this is scheduled after those fixes. I think fixing
    the races are not simpler if reuse is disallowed and reuse has
    other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
    removed from the middle of the slotinfo list. The value does
    not have to be correct: incorrect true value causes the next
    modid query to do a slotinfo walk, incorrect false will leave
    gaps and new entries are added at the end.
    
    Fixes bug 27135.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 3 Szabolcs Nagy 2021-05-11 16:26:24 UTC
fixed for 2.34