Created attachment 15507: reproducer

When a library is loaded and relocated as part of an application, updates its TLS memory, and is then loaded again dynamically with dlopen(), the second load can change the generation counter. This leaves the library's TLS area marked as unallocated, so it is reallocated the next time it is used. Any information that had been stored in the TLS before the dlopen() of the library is lost.

This problem was reported by a customer using libomp and librocprofiler; in that case libomp loses the mappings to its threads.

This problem seems to have existed for quite some time. I have verified that it exists as far back as glibc-2.28 in RHEL8, and it still exists in the latest glibc-2.39 found in rawhide. In other words, practically all versions of glibc seem to be affected.

The sequence of operations is as follows:

- Libraries A and B are loaded and relocated.
- A's init constructor is called:
  - Inside, A calls a function that resolves to B.
  - B accesses and alters its TLS.
  - B is then dlopen()'d by "C" (which may be A or B or neither).
    - Inside, glibc advances the generation counter and marks B's TLS as "unallocated".
  - B accesses its TLS again; the changes from before are lost.

In the failing case (audit + rocprof), B is libhpcrun.so, A is libomp.so and "C" is librocprofiler.so. In the suspicious case (no audit + rocprof), B is libomp.so, and A and "C" are libomptarget.so.

Sometimes this bug is masked because B's TLS is a static block: even though its TLS gets "reallocated" in the middle, it gets the same memory back and is not reinitialized in between, so it looks like nothing happened. In other cases, the library whose TLS gets reallocated is written robustly enough that it simply reinitializes its TLS data, and the only apparent effect is a loss of allocated memory.
The reproducer is a bit dodgy because it calls B_init in libB.so before the ELF constructor of libB.so has run. This exposed a bug in dlopen because it uses l_init_called as a proxy to decide whether to do TLS processing:

    /* Call _dl_add_to_slotinfo with DO_ADD set to false, to allocate
       space in GL (dl_tls_dtv_slotinfo_list).  This can raise an
       exception.  The return value is true if any of the new objects
       use TLS.  */
    static bool
    resize_tls_slotinfo (struct link_map *new)
    {
      bool any_tls = false;
      for (unsigned int i = 0; i < new->l_searchlist.r_nlist; ++i)
        {
          struct link_map *imap = new->l_searchlist.r_list[i];

          /* Only add TLS memory if this object is loaded now and
             therefore is not yet initialized.  */
          if (! imap->l_init_called && imap->l_tls_blocksize > 0)
            {
              _dl_add_to_slotinfo (imap, false);
              any_tls = true;
            }
        }
      return any_tls;
    }

The recursive dlopen call runs this code again (before invoking the libB.so ELF constructor), overwriting the TLS slot for the object with a newer generation ID. This triggers reallocation of the TLS block in the next __tls_get_addr call.

Curiously, this is not a regression introduced by bug 20839, but maybe there are more cases now where this bug occurs.
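To make the generation mechanics concrete, here is a minimal single-file toy model of the effect described above. It is only a sketch: the toy_* types and functions are hypothetical stand-ins and not glibc's data structures, and the real __tls_get_addr slow path is considerably more involved. The assumption modelled is just that a slot stamped with a newer generation than the thread has seen causes the thread's block to be dropped and allocated fresh on the next access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: these names are hypothetical, not glibc internals. */
struct toy_slot { unsigned long gen; };               /* one slotinfo entry */
struct toy_dtv  { unsigned long gen; int *block; };   /* one thread's view  */

static unsigned long toy_generation = 1;

/* Models registering an object's TLS: bump the global generation
   counter and stamp the slot with the new value. */
static void toy_add_to_slotinfo(struct toy_slot *slot)
{
    slot->gen = ++toy_generation;
}

/* Models a TLS access: if the slot is newer than what this thread has
   seen, discard the old block and hand out a fresh, zeroed one. */
static int *toy_tls_get_addr(struct toy_dtv *dtv, const struct toy_slot *slot)
{
    if (dtv->block == NULL || dtv->gen < slot->gen) {
        free(dtv->block);
        dtv->block = calloc(1, sizeof *dtv->block);
        if (dtv->block == NULL)
            abort();
        dtv->gen = slot->gen;
    }
    return dtv->block;
}

/* Returns the value B observes after writing 42 to its TLS.  With
   reopen set, the slot is stamped a second time in between, as in the
   recursive dlopen case. */
static int toy_demo(bool reopen)
{
    struct toy_slot slot = { 0 };
    struct toy_dtv dtv = { 0, NULL };

    toy_add_to_slotinfo(&slot);              /* initial load of B      */
    *toy_tls_get_addr(&dtv, &slot) = 42;     /* B writes its TLS       */
    assert(*toy_tls_get_addr(&dtv, &slot) == 42);

    if (reopen)
        toy_add_to_slotinfo(&slot);          /* second add, newer gen  */

    int seen = *toy_tls_get_addr(&dtv, &slot);
    free(dtv.block);
    return seen;
}
```

In this model, toy_demo(false) returns the stored value, while toy_demo(true) returns 0 because the block was replaced. Note that the allocator may well hand back the same address for the new block, which is why the problem can be hard to spot by looking at pointers alone.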
Technically, this is a use-after-free bug, although in the reproducer, the TLS block comes back from malloc with the same address.
Patch posted: [PATCH] elf: Avoid re-initializing already allocated TLS in dlopen (bug 31717) <https://inbox.sourceware.org/libc-alpha/878r0dmx0d.fsf@oldenburg.str.redhat.com/>
Fixed for glibc 2.41 via:

commit 5097cd344fd243fb8deb6dec96e8073753f962f9
Author: Florian Weimer <fweimer@redhat.com>
Date:   Thu Aug 1 23:31:30 2024 +0200

    elf: Avoid re-initializing already allocated TLS in dlopen (bug 31717)

    The old code used l_init_called as an indicator for whether TLS
    initialization was complete.  However, it is possible that
    TLS for an object is initialized, written to, and then dlopen
    for this object is called again, and l_init_called is not true at
    this point.  Previously, this resulted in TLS being initialized
    twice, discarding any interim writes (technically introducing a
    use-after-free bug even).

    This commit introduces an explicit per-object flag, l_tls_in_slotinfo.
    It indicates whether _dl_add_to_slotinfo has been called for this
    object.  This flag is used to avoid double-initialization of TLS.
    In update_tls_slotinfo, the first_static_tls micro-optimization
    is removed because preserving the initialization flag for subsequent
    use by the second loop for static TLS is a bit complicated, and
    another per-object flag does not seem to be worth it.  Furthermore,
    the l_init_called flag is dropped from the second loop (for
    static TLS initialization) because l_need_tls_init on its own
    prevents double-initialization.

    The remaining l_init_called usage in resize_scopes and update_scopes
    is just an optimization due to the use of scope_has_map, so it is
    not changed in this commit.

    The isupper check ensures that libc.so.6 TLS is not reverted.
    Such a revert happens if l_need_tls_init is not cleared in
    _dl_allocate_tls_init for the main_thread case, now that
    l_init_called is not checked anymore in update_tls_slotinfo
    in elf/dl-open.c.

    Reported-by: Jonathon Anderson <janderson@rice.edu>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
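For readers who want the gist of the fix without reading the patch, the difference between the two guards can be sketched as below. This is a simplified illustration and not the committed code: struct toy_map and the toy_* functions are hypothetical, and only the field names are borrowed (minus the l_ prefix) from the commit message. The assumption is that two dlopen passes see the object before its ELF constructor has run, so init_called is still false on both passes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few link_map fields that matter here. */
struct toy_map {
    size_t tls_blocksize;     /* > 0 if the object uses TLS              */
    bool init_called;         /* set once the ELF constructor has run    */
    bool tls_in_slotinfo;     /* set once the slot has been registered   */
    unsigned long slot_gen;   /* generation stamped on the object's slot */
};

static unsigned long toy_generation = 1;

/* Old guard: uses "constructor has run" as a proxy for "TLS is set up". */
static bool toy_old_should_add(const struct toy_map *m)
{
    return !m->init_called && m->tls_blocksize > 0;
}

/* New guard: asks directly whether the slot was already registered. */
static bool toy_new_should_add(const struct toy_map *m)
{
    return !m->tls_in_slotinfo && m->tls_blocksize > 0;
}

/* Registers the slot with a fresh generation and records that fact. */
static void toy_add(struct toy_map *m)
{
    m->slot_gen = ++toy_generation;
    m->tls_in_slotinfo = true;
}

/* Counts how often the slot is stamped across two dlopen passes that
   both happen before the constructor runs (init_called stays false). */
static int toy_count_adds(bool use_flag)
{
    struct toy_map b = { .tls_blocksize = 16 };
    int adds = 0;

    for (int pass = 0; pass < 2; ++pass) {
        bool add = use_flag ? toy_new_should_add(&b)
                            : toy_old_should_add(&b);
        if (add) {
            toy_add(&b);
            ++adds;
        }
    }
    return adds;
}
```

With the old guard the slot is stamped on both passes (toy_count_adds(false) yields 2), which is what bumps the generation a second time; with the explicit flag it is stamped only once (toy_count_adds(true) yields 1), so interim TLS writes would survive.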