Bug 27135 - empty dtv slotinfo entries are not reused after dlclose
Summary: empty dtv slotinfo entries are not reused after dlclose
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.34
Importance: P2 normal
Target Milestone: ---
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-31 12:59 UTC by Szabolcs Nagy
Modified: 2024-05-16 07:31 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Szabolcs Nagy 2020-12-31 12:59:50 UTC
assume a.so and b.so have tls, then

#include <dlfcn.h>

void f (void)
{
  void *a = dlopen ("a.so", RTLD_LAZY);
  void *b = dlopen ("b.so", RTLD_LAZY);
  for (int i = 0; i < 1000; i++)
    {
      /* The module being closed is never the most recently opened one.  */
      dlclose (a);
      a = dlopen ("a.so", RTLD_LAZY);
      dlclose (b);
      b = dlopen ("b.so", RTLD_LAZY);
    }
}

now creates 2002 entries in the dtv slotinfo list
instead of 2: unused entries are only reused if the
most recent dlopened module was closed, earlier gaps
are not reused.

there is logic to reclaim the gaps but GL(dl_tls_dtv_gaps)
is only set to true after a dlopen failure, not after a
dlclose: if a module with missing symbol references is
dlopened in the loop then the behaviour changes and gaps
are reclaimed as expected.

so in the absence of dlopen failures a dlopen/dlclose
heavy application can keep growing its dtv slotinfo
list which means growing dtv in all threads and slower
dtv management since there are many linear walks over
the slotinfo list.
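
The growth described above can be reproduced with a small stand-alone model. This is only a sketch, not glibc code: `toy_slots`, `toy_open`, `toy_close`, `toy_reopen_loop` and the fixed capacity are hypothetical names made up for illustration, and the model assumes a slot is marked used as soon as its modid is handed out. The `gaps` field plays the role of GL(dl_tls_dtv_gaps), and `mark_gap_on_close` selects whether closing a module in the middle of the list sets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SLOTS 4096 /* hypothetical fixed capacity of the model */

/* Toy stand-in for the slotinfo list; every name here is hypothetical.  */
struct toy_slots
{
  bool used[TOY_MAX_SLOTS + 1]; /* used[modid]; modids start at 1 */
  size_t max_modid;             /* highest modid currently tracked */
  bool gaps;                    /* hint: a free slot may exist below max */
};

/* Hand out a modid: walk for a free slot if the hint is set, else append.  */
static size_t
toy_open (struct toy_slots *t)
{
  if (t->gaps)
    {
      for (size_t id = 1; id <= t->max_modid; id++)
        if (!t->used[id])
          {
            t->used[id] = true;
            return id;
          }
      t->gaps = false; /* the walk found nothing, so the hint was stale */
    }
  assert (t->max_modid < TOY_MAX_SLOTS);
  t->used[++t->max_modid] = true;
  return t->max_modid;
}

/* Release a modid.  Releasing the last slot shrinks max_modid; releasing
   a slot in the middle sets the hint only if mark_gap_on_close is true.  */
static void
toy_close (struct toy_slots *t, size_t modid, bool mark_gap_on_close)
{
  assert (modid >= 1 && modid <= t->max_modid && t->used[modid]);
  t->used[modid] = false;
  if (modid != t->max_modid)
    {
      if (mark_gap_on_close)
        t->gaps = true;
      return;
    }
  while (t->max_modid > 0 && !t->used[t->max_modid])
    t->max_modid--;
}

/* Replay the loop from the report and return the final list length.  */
static size_t
toy_reopen_loop (int iterations, bool mark_gap_on_close)
{
  struct toy_slots t;
  memset (&t, 0, sizeof t);
  size_t a = toy_open (&t);
  size_t b = toy_open (&t);
  for (int i = 0; i < iterations; i++)
    {
      toy_close (&t, a, mark_gap_on_close);
      a = toy_open (&t);
      toy_close (&t, b, mark_gap_on_close);
      b = toy_open (&t);
    }
  return t.max_modid;
}
```

In this model `toy_reopen_loop (1000, false)` ends with 2002 tracked slots, matching the number in the report, while `toy_reopen_loop (1000, true)` stays at 2.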
Comment 1 Sourceware Commits 2021-04-15 08:33:53 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8f85075a2e9c26ff7486d4bbaf358999807d215c

commit 8f85075a2e9c26ff7486d4bbaf358999807d215c
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Dec 31 12:24:38 2020 +0000

    elf: Add a DTV setup test [BZ #27136]
    
    The test dlopens a large number of modules with TLS; they are reused
    from an existing test.
    
    The test relies on the reuse of slotinfo entries after dlclose; without
    bug 27135 fixed, this needs a failing dlopen. With a slotinfo list that
    has non-monotone increasing generation counters, bug 27136 can trigger.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Comment 2 Sourceware Commits 2021-05-11 16:17:26 UTC
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572bd547d57a39b6cf0ea072545dc4048921f4c3

commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Thu Dec 31 13:59:38 2020 +0000

    elf: Fix DTV gap reuse logic [BZ #27135]
    
    For some reason only dlopen failure caused dtv gaps to be reused.
    
    It is possible that the intent was to never reuse modids for a
    different module, but after dlopen failure all gaps are reused,
    not just the ones caused by the unfinished dlopen.
    
    So the code already has to handle reused modids, which seems to
    work. However, the data races at thread creation and tls access
    (see bug 19329 and bug 27111) may be more severe if slots are
    reused, so this is scheduled after those fixes. I think fixing
    the races is not simpler if reuse is disallowed, and reuse has
    other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
    removed from the middle of the slotinfo list. The value does
    not have to be correct: an incorrect true value causes the next
    modid query to do a slotinfo walk, an incorrect false value leaves
    gaps and new entries are added at the end.
    
    Fixes bug 27135.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
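
The remark that the flag "does not have to be correct" can be illustrated with a minimal sketch. It is a toy model under assumptions, not the glibc implementation: `hint_model`, `hint_next_modid` and `hint_demo` are hypothetical names, and the list is a fixed array with slots 1..3 where slot 2 may be free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HINT_SLOTS = 8 }; /* hypothetical capacity of the model */

/* Hypothetical model of a slot list guarded by a "gaps" hint.  */
struct hint_model
{
  bool used[HINT_SLOTS + 1]; /* used[modid]; modids start at 1 */
  size_t max_modid;
  bool gaps;
};

/* Walk for a free slot only when the hint says so, otherwise append.  */
static size_t
hint_next_modid (struct hint_model *m)
{
  if (m->gaps)
    {
      for (size_t id = 1; id <= m->max_modid; id++)
        if (!m->used[id])
          {
            m->used[id] = true;
            return id;
          }
      m->gaps = false; /* a wrong "true" only costs this one walk */
    }
  assert (m->max_modid < HINT_SLOTS);
  m->used[++m->max_modid] = true;
  return m->max_modid;
}

/* Slots 1 and 3 are used; slot 2 is free iff has_gap.  Returns the modid
   handed out when the hint has the given value.  */
static size_t
hint_demo (bool has_gap, bool hint)
{
  struct hint_model m = { .max_modid = 3, .gaps = hint };
  m.used[1] = true;
  m.used[2] = !has_gap;
  m.used[3] = true;
  return hint_next_modid (&m);
}
```

With a real gap and a correct hint the model returns modid 2. A wrong true hint (no gap) walks once and then appends modid 4. A wrong false hint (gap present) also appends modid 4 and simply leaves slot 2 unused. In no case is a used slot handed out twice.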
Comment 3 Szabolcs Nagy 2021-05-11 16:25:23 UTC
fixed for 2.34
Comment 5 Florian Weimer 2021-06-25 07:00:47 UTC
I'm marking this as security- because there is no known security impact after the revert (although the incorrect modid aliasing could introduce application bugs).
Comment 6 Florian Weimer 2021-06-25 08:29:53 UTC
Revert applied to glibc 2.34:

commit 40ebfd016ad284872f434bdd76dbe9c708db4d6b
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Jun 25 08:09:08 2021 +0200

    elf: Disable most of TLS modid gaps processing [BZ #27135]
    
    Revert "elf: Fix DTV gap reuse logic [BZ #27135]"
    
    This reverts commit 572bd547d57a39b6cf0ea072545dc4048921f4c3.
    
    It turns out that the _dl_next_tls_modid in _dl_map_object_from_fd keeps
    returning the same modid over and over again if there is a gap and
    more than one TLS-using module is loaded in one dlopen call.  This corrupts
    TLS data structures.  The bug is still present after a revert, but
    empirically it is much more difficult to trigger (because it involves a
    dlopen failure).
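
The repeated modid can be shown in a reduced form. The following is a hypothetical sketch rather than the actual loader code: `dup_model`, `dup_next_modid` and `dup_two_modules_collide` are invented for illustration, and the module names are placeholders. The key assumption, taken from the description above, is that the query picks a free slot but does not record an owner in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DUP_SLOTS = 8 }; /* hypothetical capacity of the model */

/* Hypothetical model: map[modid] names the owner, NULL means free.  */
struct dup_model
{
  const char *map[DUP_SLOTS + 1];
  size_t max_modid;
  bool gaps;
};

/* Query only: picks a modid but records nothing in the slot.  */
static size_t
dup_next_modid (struct dup_model *m)
{
  if (m->gaps)
    {
      for (size_t id = 1; id <= m->max_modid; id++)
        if (m->map[id] == NULL)
          return id;
      m->gaps = false;
    }
  assert (m->max_modid < DUP_SLOTS);
  return ++m->max_modid;
}

/* One load that brings in two TLS modules while slot 2 is a gap.
   Returns true if both modules were given the same modid.  */
static bool
dup_two_modules_collide (void)
{
  struct dup_model m = { .max_modid = 3, .gaps = true };
  m.map[1] = "x.so";
  m.map[3] = "y.so"; /* slot 2 is the gap */

  size_t first = dup_next_modid (&m);  /* the module itself */
  size_t second = dup_next_modid (&m); /* one of its dependencies */

  /* Owners are recorded only afterwards, too late to prevent the clash.  */
  m.map[first] = "lib.so";
  m.map[second] = "dep.so";
  return first == second;
}
```

In this model both queries return modid 2, so the second module overwrites the first one's slot.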
Comment 7 Sourceware Commits 2021-07-14 18:10:44 UTC
The master branch has been updated by Adhemerval Zanella <azanella@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba33937be210da5d07f7f01709323743f66011ce

commit ba33937be210da5d07f7f01709323743f66011ce
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Jun 25 10:54:12 2021 -0300

    elf: Fix DTV gap reuse logic (BZ #27135)
    
    This is an updated version of 572bd547d57a (reverted by 40ebfd016ad2)
    that fixes the _dl_next_tls_modid issues.
    
    The issue with the 572bd547d57a patch is that the DTV entry is only
    updated in dl_open_worker() by the update_tls_slotinfo() call, after
    all dependencies have been processed by _dl_map_object_deps().  However,
    _dl_map_object_deps() itself might call _dl_next_tls_modid(), and since
    the _dl_tls_dtv_slotinfo_list::map is not yet set the entry will be
    wrongly reused.
    
    This patch fixes this by renaming the _dl_next_tls_modid() function to
    _dl_assign_tls_modid() and by passing the link_map so it can set
    the slotinfo value, so that a subsequent _dl_next_tls_modid() call will
    see the entry as allocated.
    
    The intermediary value is cleared in remove_slotinfo() for the case
    where a library fails to load with RTLD_NOW.
    
    This patch fixes BZ #27135.
    
    Checked on x86_64-linux-gnu.
    
    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
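
The idea of the updated patch, recording the owner at the moment the modid is chosen and clearing it again if the load fails, can be sketched as follows. This is an illustrative model under assumptions, not the code from the commit: `fix_model`, `fix_assign_modid`, `fix_release_modid` and `fix_two_modules_distinct` are hypothetical names, and plain strings stand in for the link_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { FIX_SLOTS = 8 }; /* hypothetical capacity of the model */

/* Hypothetical model: map[modid] names the owner, NULL means free.  */
struct fix_model
{
  const char *map[FIX_SLOTS + 1];
  size_t max_modid;
  bool gaps;
};

/* Pick a modid and record the owner right away, so the next call
   sees the slot as allocated.  */
static size_t
fix_assign_modid (struct fix_model *m, const char *owner)
{
  size_t id = 0;
  if (m->gaps)
    {
      for (size_t i = 1; i <= m->max_modid && id == 0; i++)
        if (m->map[i] == NULL)
          id = i;
      if (id == 0)
        m->gaps = false; /* no free slot left below max_modid */
    }
  if (id == 0)
    {
      assert (m->max_modid < FIX_SLOTS);
      id = ++m->max_modid;
    }
  m->map[id] = owner;
  return id;
}

/* Undo a reservation, as cleanup after a failed load would need to.  */
static void
fix_release_modid (struct fix_model *m, size_t id)
{
  assert (id >= 1 && id <= m->max_modid);
  m->map[id] = NULL;
  if (id != m->max_modid)
    m->gaps = true;
  else
    while (m->max_modid > 0 && m->map[m->max_modid] == NULL)
      m->max_modid--;
}

/* Two modules assigned in one load get different modids, and releasing
   both restores the starting state (slot 2 free, max_modid 3).  */
static bool
fix_two_modules_distinct (void)
{
  struct fix_model m = { .max_modid = 3, .gaps = true };
  m.map[1] = "x.so";
  m.map[3] = "y.so"; /* slot 2 is the gap */

  size_t first = fix_assign_modid (&m, "lib.so");
  size_t second = fix_assign_modid (&m, "dep.so");
  bool distinct = first != second;

  fix_release_modid (&m, second);
  fix_release_modid (&m, first);
  return distinct && m.map[2] == NULL && m.max_modid == 3 && m.gaps;
}
```

Here the first module takes the gap at modid 2 and the second is appended as modid 4, since the walk no longer finds a free slot. Releasing both leaves the model as it started.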
Comment 8 Adhemerval Zanella 2021-07-14 18:11:29 UTC
Fixed on 2.34.