Many users have reported that Mesa 22 causes GTK4 programs to segfault on start with the crocus driver under X. This is caused by _dl_tlsdesc_dynamic returning a large negative value, which when added to FS results in a value slightly above zero.

Unfortunately, I have been unable to find any reduced test case, even after testing many sequences of dlopen and dlclose calls. Additionally, the issue reportedly does not affect Fedora Linux.

The shortest reproduction steps I have found are:

1. Use a GPU supported by the Mesa crocus driver; all i915 through Haswell Intel GPUs should be supported.

2. podman run -it --privileged --net=host -v /tmp/.X11-unix:/tmp/.X11-unix --rm archlinux

   It may be possible to use docker instead of podman; I did not test this. Alternatively, install Arch Linux.

3. Run:

   pacman -Syu
   pacman -U https://archive.archlinux.org/packages/m/mesa/mesa-22.0.1-2-x86_64.pkg.tar.zst
   pacman -S gnome-chess
   useradd -m -u 1000 -g 1000 user # set to your host UID/GID
   su - user
   DISPLAY=:0 gnome-chess

This should segfault in lookup_opcode_desc with a small pointer dereference which was computed in brw_opcode_desc from a call to _dl_tlsdesc_dynamic. Unfortunately, there are no symbols for Mesa, but there are dynamic symbols for glibc.

Debug output:

$ DISPLAY=:0 gdb gnome-chess
[ ... ]
(gdb) b _dl_tlsdesc_dynamic
Breakpoint 1 at 0x7ffff7fdc600
(gdb) r
[ ... ]
Thread 1 "gnome-chess" hit Breakpoint 1, 0x00007ffff7fdc600 in _dl_tlsdesc_dynamic () from /lib64/ld-linux-x86-64.so.2
(gdb) info reg
rax            0x7fffebe36898      140737150937240
rbx            0x7fffffff7e60      140737488322144
rcx            0x0                 0
rdx            0x7fffffff81a0      140737488322976
rsi            0x555556426d00      93825007774976
rdi            0x7fffffff8190      140737488322960
rbp            0x7fffffff8310      0x7fffffff8310
rsp            0x7fffffff7e58      0x7fffffff7e58
r8             0x5555561fec90      93825005513872
r9             0x1                 1
r10            0x7fffebb248c0      140737147717824
r11            0x91df835f16e6916b  -7935479573974183573
r12            0x5555564c0a60      93825008405088
r13            0x0                 0
r14            0x7fffffff87b0      140737488324528
r15            0x7fffffff82c0      140737488323264
rip            0x7ffff7fdc600      0x7ffff7fdc600 <_dl_tlsdesc_dynamic>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) fin
Run till exit from #0  0x00007ffff7fdc600 in _dl_tlsdesc_dynamic () from /lib64/ld-linux-x86-64.so.2
[Thread 0x7fffe9571640 (LWP 921) exited]
[Thread 0x7fffe9d72640 (LWP 920) exited]
0x00007fffeb3d8979 in ?? () from /usr/lib/dri/crocus_dri.so
(gdb) p $rax
$2 = -140737272884288
(gdb) p $rax+$fs_base
$3 = 0

Exporting LD_PRELOAD=/usr/local/lib/dri/crocus_dri.so avoids the crash, as static TLS is used in that case. More interestingly, exporting LD_PRELOAD=/usr/lib/libLLVM.so also avoids the crash, despite still using dynamic TLS. Exporting LD_BIND_NOW=1 does not avoid the crash.
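The two values printed at the end of the gdb session can be checked by hand. The sketch below is only an illustration, not glibc code: it assumes the dynamic TLS descriptor resolver returns the address of the variable relative to the thread pointer (FS base), and the helper names tlsdesc_dynamic_return and effective_address are hypothetical. Under that assumption, a return value equal to minus the FS base is what a NULL TLS block pointer would produce.

```python
# Values copied from the gdb session in the report.
RAX_AFTER_CALL = -140737272884288   # (gdb) p $rax, after `fin`
SUM_WITH_FS_BASE = 0                # (gdb) p $rax+$fs_base

# The thread pointer (FS base) implied by those two numbers.
fs_base = SUM_WITH_FS_BASE - RAX_AFTER_CALL


def tlsdesc_dynamic_return(block_addr: int, offset: int, thread_pointer: int) -> int:
    """Assumed model: the resolver returns (TLS block + offset) - thread pointer."""
    return block_addr + offset - thread_pointer


def effective_address(returned_offset: int, thread_pointer: int) -> int:
    """Address the caller dereferences after adding the thread pointer back."""
    return (returned_offset + thread_pointer) % 2**64


if __name__ == "__main__":
    print("implied FS base:", hex(fs_base))
    # A NULL block pointer with offset 0 reproduces the observed return value.
    print(tlsdesc_dynamic_return(0, 0, fs_base) == RAX_AFTER_CALL)
    # A NULL block pointer with a small (hypothetical) offset of 16 gives a
    # small address, which would match a "small pointer dereference".
    print(effective_address(tlsdesc_dynamic_return(0, 16, fs_base), fs_base))
```

If this model is right, the large negative return value is not random: it is what results when the thread pointer is subtracted from a TLS block pointer that was never filled in.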
Is there a way to get glibc debugging information on archlinux? I enabled debuginfod, and it downloaded some debugging information, but not for glibc. It looks like the fast path in _dl_tlsdesc_dynamic is taken, and I need to check what the data structures look like.
(In reply to Florian Weimer from comment #1)
> Is there a way to get glibc debugging information on archlinux? I enabled
> debuginfod, and it downloaded some debugging information, but not for glibc.
>
> It looks like the fast path in _dl_tlsdesc_dynamic is taken, and I need to
> check what the data structures look like.

Thanks for looking into this! As far as I know, Arch Linux currently doesn't have any public debug symbols for the distro-packaged glibc.

If you're more familiar with Ubuntu, that may be preferable. It was originally reported on Ubuntu, but I had some issues installing old Mesa packages on Ubuntu, whereas it is a single command on Arch. I think Ubuntu has debug symbols for glibc though.

If you'd like to continue using Arch, following these steps should build and install a standard glibc package with debug symbols:

sed -i -e '/^BUILDENV=/s/check/!check/' -e '/^OPTIONS=/s/!debug/debug/' -e 's/^#MAKEFLAGS="-j2"$/MAKEFLAGS="-j'$(nproc)'"/' /etc/makepkg.conf
pacman -S base-devel asp sudo
sed -i -e 's/# %wheel ALL=(ALL:ALL) NOPASSWD: ALL/%wheel ALL=(ALL:ALL) NOPASSWD: ALL/' /etc/sudoers
su - user
asp checkout glibc
cd glibc/trunk
gpg --recv-keys 16792B4EA25340F8
makepkg -si

I tested approximately this method and was able to reproduce the issue on bare metal.

Alternatively, it may be possible to manually install glibc with ./configure; make; make install. I didn't test this method; it may be necessary to run "source /etc/makepkg.conf; export CFLAGS LDFLAGS" first in order to reproduce the issue.
This has come up at https://bugzilla.redhat.com/show_bug.cgi?id=2251557 in the context of Asahi Linux (a port of Linux and its userland to Apple ARM Macs). See https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/7199 as well.
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3921c5b40f293c57cb326f58713c924b0662ef59

commit 3921c5b40f293c57cb326f58713c924b0662ef59
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
    does *not* do anything to the generation counter. The first time this
    happens, the generation is zero and map_generation() returns the current
    generation to be used during relocation processing. However, if
    a slotinfo entry is later reused, it will already have a generation
    assigned. If this generation has fallen behind the current global max
    generation, then this causes an obsolete generation to be assigned
    during relocation processing, as map_generation() returns this
    generation if nonzero. _dl_add_to_slotinfo() eventually resets the
    generation, but by then it is too late. This causes DTV updates to be
    skipped, leading to NULL or broken TLS slot pointers and segfaults.

    Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
    so it behaves the same as the first time a slot is assigned.
    _dl_add_to_slotinfo() will still assign the correct static generation
    later during module load, but relocation processing will no longer use
    an obsolete generation.

    Note that slotinfo entry (aka modid) reuse typically happens after a
    dlclose and only TLS access via dynamic tlsdesc is affected. Because
    tlsdesc is optimized to use the optional part of static TLS, dynamic
    tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
    tunable to a large enough value, or by LD_PRELOAD-ing the affected
    modules.

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
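The generation bookkeeping described in this commit message can be illustrated with a toy model. This is a rough sketch, not glibc code: the method names mirror the glibc functions mentioned above, but the ToyLoader and Slot classes, the starting generation of 0, and the exact points where the global generation is bumped are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    gen: int = 0          # generation stored in the slotinfo entry (0 = unset)
    in_use: bool = False


@dataclass
class ToyLoader:
    fix_applied: bool     # True models the patched _dl_assign_tls_modid()
    global_gen: int = 0
    slots: list = field(default_factory=list)

    def assign_tls_modid(self) -> int:
        for modid, slot in enumerate(self.slots):
            if not slot.in_use:          # reuse a gap left by dlclose
                slot.in_use = True
                if self.fix_applied:
                    slot.gen = 0         # the fix: forget the old generation
                return modid
        self.slots.append(Slot(in_use=True))
        return len(self.slots) - 1

    def map_generation(self, modid: int) -> int:
        gen = self.slots[modid].gen
        # A nonzero slot generation wins; otherwise use the upcoming one.
        return gen if gen != 0 else self.global_gen + 1

    def dlopen(self):
        modid = self.assign_tls_modid()
        tlsdesc_gen = self.map_generation(modid)   # used during relocation
        self.global_gen += 1
        self.slots[modid].gen = self.global_gen    # models _dl_add_to_slotinfo()
        return modid, tlsdesc_gen

    def dlclose(self, modid: int) -> None:
        self.global_gen += 1
        self.slots[modid] = Slot(gen=self.global_gen, in_use=False)


def reuse_scenario(fix_applied: bool):
    loader = ToyLoader(fix_applied)
    mod_a, _ = loader.dlopen()           # module A gets modid 0
    loader.dlopen()                      # module B gets modid 1
    loader.dlclose(mod_a)                # leaves a gap at modid 0
    dtv_gen = loader.global_gen          # a new thread starts with a current DTV
    modid, tlsdesc_gen = loader.dlopen() # module C reuses modid 0
    dtv_update_skipped = tlsdesc_gen <= dtv_gen
    return modid, tlsdesc_gen, dtv_gen, dtv_update_skipped


if __name__ == "__main__":
    print("without fix:", reuse_scenario(False))
    print("with fix:   ", reuse_scenario(True))
```

In this model the unpatched loader hands the reused slot's old generation to relocation processing, so the descriptor's generation is not newer than the thread's DTV and the update is skipped. With the reset in place, the descriptor gets a generation newer than the DTV and the update is triggered.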
The master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980450f12685326729d63ff72e93a996113bf073

commit 980450f12685326729d63ff72e93a996113bf073
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Nov 29 11:31:37 2023 +0000

    elf: Add TLS modid reuse test for bug 29039

    This is a minimal regression test for bug 29039 which only affects
    targets with TLSDESC and a reproducer requires that

    1) Have modid gaps (closed modules) with old generation.
    2) Update a DTV to a newer generation (needs a newer dlopen).
    3) But do not update the closed gap entry in that DTV.
    4) Reuse the modid gap for a new module (another dlopen).
    5) Use dynamic TLSDESC in that new module with old generation (bug).
    6) Access TLS via this TLSDESC and the now outdated DTV.

    However step (3) in practice rarely happens: during DTV update the
    entries for closed modids are initialized to "unallocated" and then
    dynamic TLSDESC calls __tls_get_addr independently of its generation.
    The only exception to this is DTV setup at thread creation (gaps are
    initialized to NULL instead of unallocated) or DTV resize where the
    gap entries are outside the previous DTV array (again NULL instead
    of unallocated, and this requires loading > DTV_SURPLUS modules).

    So the bug can only cause NULL (+ offset) dereference, not use after
    free. And the easiest way to get (3) is via thread creation.

    Note that step (5) requires that the newly loaded module has larger
    TLS than the remaining optional static TLS. And for (6) there cannot
    be other TLS access or dlopen in the thread that updates the DTV.

    Tested on aarch64-linux-gnu.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
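The distinction this message draws between "unallocated" and NULL gap entries can be sketched as follows. This is a simplified model rather than the glibc implementation: dynamic_tlsdesc_access is a hypothetical helper, addresses are plain integers, and the sentinel is modeled as -1 on the assumption that the fast path is only taken when the descriptor's generation is not newer than the DTV's and the entry is not the unallocated marker.

```python
TLS_DTV_UNALLOCATED = -1   # stand-in for the "unallocated" sentinel
NULL = 0


def dynamic_tlsdesc_access(dtv_entry: int, offset: int,
                           tlsdesc_gen: int, dtv_gen: int):
    """Return (path, address); address is None when the slow path is taken."""
    if tlsdesc_gen <= dtv_gen and dtv_entry != TLS_DTV_UNALLOCATED:
        # Fast path: trust the DTV entry as-is.
        return "fast", dtv_entry + offset
    # Slow path: the real code would call __tls_get_addr, which updates the
    # DTV and allocates the block; that part is not modeled here.
    return "slow", None


if __name__ == "__main__":
    stale_gen, dtv_gen, offset = 3, 3, 16   # hypothetical numbers
    # Gap entry marked unallocated by a DTV update: the stale generation is harmless.
    print(dynamic_tlsdesc_access(TLS_DTV_UNALLOCATED, offset, stale_gen, dtv_gen))
    # Gap entry left NULL by thread creation: NULL + offset is returned.
    print(dynamic_tlsdesc_access(NULL, offset, stale_gen, dtv_gen))
    # With a correct (newer) descriptor generation the slow path is taken.
    print(dynamic_tlsdesc_access(NULL, offset, dtv_gen + 1, dtv_gen))
```

Under these assumptions, a stale descriptor generation only matters when the gap entry is NULL, which is consistent with the statement that the bug causes a NULL (+ offset) dereference and not a use after free.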
Please cherry-pick to 2.38 at least (along with the test commit). Anyway, closing given this is fixed for 2.39.
The release/2.38/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ccdc4cba07684fe1397e1f5f134a0a827af98c04

commit ccdc4cba07684fe1397e1f5f134a0a827af98c04
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)
The release/2.38/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0de9082ed8d8f149ca87d569a73692046e236c18

commit 0de9082ed8d8f149ca87d569a73692046e236c18
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date:   Wed Nov 29 11:31:37 2023 +0000

    elf: Add TLS modid reuse test for bug 29039

    Tested on aarch64-linux-gnu.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit 980450f12685326729d63ff72e93a996113bf073)
The release/2.37/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=874d4186975560fb79d5ebd46a4f378a2e3f7657

commit 874d4186975560fb79d5ebd46a4f378a2e3f7657
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)
The release/2.36/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=882a991620fcf2ecb3f623e2d29ac551b33bd6ee

commit 882a991620fcf2ecb3f623e2d29ac551b33bd6ee
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)
The release/2.35/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5f08ec08d03930050befec16fcc6264fa00c66fe

commit 5f08ec08d03930050befec16fcc6264fa00c66fe
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)
The release/2.34/master branch has been updated by Szabolcs Nagy <nsz@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f95fe7060895bfe28ea5bdf8de240e01c1dea097

commit f95fe7060895bfe28ea5bdf8de240e01c1dea097
Author: Hector Martin <marcan@marcan.st>
Date:   Tue Nov 28 15:23:07 2023 +0900

    elf: Fix TLS modid reuse generation assignment (BZ 29039)

    Fixes bug 29039.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)