Bug 25112 - dlopen must not make new objects accessible when it still can fail with an error
Summary: dlopen must not make new objects accessible when it still can fail with an error
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: 2.31
Assignee: Florian Weimer
URL:
Keywords:
: 14577 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-10-17 09:29 UTC by Florian Weimer
Modified: 2024-05-07 12:43 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Weimer 2019-10-17 09:29:32 UTC
dlopen calls add_to_global (from dl_open_worker) after running ELF constructors for the new objects.  At this point, recovery from malloc failure is quite complicated: We need to run the ELF destructors and close all opened objects.  The current implementation does none of this and leaves the process in an inconsistent state.

The proper fix to deal with this is to allocate the required memory before running ELF constructors.
Comment 1 Florian Weimer 2019-10-21 11:07:13 UTC
Patch posted: https://sourceware.org/ml/libc-alpha/2019-10/msg00534.html
Comment 2 Florian Weimer 2019-10-23 14:06:12 UTC
The problem is more general.  This loop in dl_open_worker adds new maps to the scope of maps that already exist when dlopen was called:

  /* If the file is not loaded now as a dependency, add the search
     list of the newly loaded object to the scope.  */
  bool any_tls = false;
  unsigned int first_static_tls = new->l_searchlist.r_nlist;
  for (unsigned int i = 0; i < new->l_searchlist.r_nlist; ++i)
    {
      struct link_map *imap = new->l_searchlist.r_list[i];
      int from_scope = 0;

      /* If the initializer has been called already, the object has
	 not been loaded here and now.  */
      if (imap->l_init_called && imap->l_type == lt_loaded)
…
	  /* First terminate the extended list.  Otherwise a thread
	     might use the new last element and then use the garbage
	     at offset IDX+1.  */
	  imap->l_scope[cnt + 1] = NULL;
	  atomic_write_barrier ();
	  imap->l_scope[cnt] = &new->l_searchlist;


A concurrent dlsym call or lazy binding operation therefore has access to the new map at this point. But subsequent malloc calls can still fail, resulting in a corrupted overall state. This is simlar to the originally reported add_to_global issue, but it involves different allocations.

We need to make objects available in this way before calling ELF constructors because they may use dlsym or lazy binding (including on other threads). That part is unavoidable. But after bug 24304, recoverable dlopen failure is no longer possible at this point, so we will eventually reach a consistent process state, once the constructors have run.
Comment 3 Sourceware Commits 2019-11-27 20:21:07 UTC
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=440b7f8653e4ed8f6e1425145208050b795e9a6c

commit 440b7f8653e4ed8f6e1425145208050b795e9a6c
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Oct 31 18:25:39 2019 +0100

    Avoid late failure in dlopen in global scope update [BZ #25112]
    
    The call to add_to_global in dl_open_worker happens after running ELF
    constructors for new objects.  At this point, proper recovery from
    malloc failure would be quite complicated: We would have to run the
    ELF destructors and close all opened objects, something that we
    currently do not do.
    
    Instead, this change splits add_to_global into two phases,
    add_to_global_resize (which can raise an exception, called before ELF
    constructors run), and add_to_global_update (which cannot, called
    after ELF constructors).  A complication arises due to recursive
    dlopen: After the inner dlopen consumes some space, the pre-allocation
    in the outer dlopen may no longer be sufficient.  A new member in the
    namespace structure, _ns_global_scope_pending_adds keeps track of the
    maximum number of objects that need to be added to the global scope.
    This enables the inner add_to_global_resize call to take into account
    the needs of an outer dlopen.
    
    Most code in the dynamic linker assumes that the number of global
    scope entries fits into an unsigned int (matching the r_nlist member
    of struct r_scop_elem).  Therefore, change the type of
    _ns_global_scope_alloc to unsigned int (from size_t), and add overflow
    checks.
    
    Change-Id: Ie08e2f318510d5a6a4bcb1c315f46791b5b77524
Comment 4 Sourceware Commits 2019-11-27 20:21:12 UTC
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a509eb117fac1d764b15eba64993f4bdb63d7f3c

commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed Nov 27 16:37:17 2019 +0100

    Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
    
    This change splits the scope and TLS slotinfo updates in dlopen into
    two parts: one to resize the data structures, and one to actually apply
    the update.  The call to add_to_global_resize in dl_open_worker is moved
    before the demarcation point at which no further memory allocations are
    allowed.
    
    _dl_add_to_slotinfo is adjusted to make the list update optional.  There
    is some optimization possibility here because we could grow the slotinfo
    list of arrays in a single call, one the largest TLS modid is known.
    
    This commit does not fix the fatal meory allocation failure in
    _dl_update_slotinfo.  Ideally, this error during dlopen should be
    recoverable.
    
    The update order of scopes and TLS data structures is retained, although
    it appears to be more correct to fully initialize TLS first, and then
    expose symbols in the newly loaded objects via the scope update.
    
    Tested on x86_64-linux-gnu.
    
    Change-Id: I240c58387dabda3ca1bcab48b02115175fa83d6c
Comment 5 Florian Weimer 2019-11-27 21:22:33 UTC
Fixed for glibc 2.31.
Comment 6 Florian Weimer 2024-05-07 12:43:32 UTC
*** Bug 14577 has been marked as a duplicate of this bug. ***