Originally reported downstream in Gentoo at https://bugs.gentoo.org/909544 and then to Python at https://github.com/python/cpython/issues/106428. * With GCC 12, Python 3.12 builds fine on ppc32. With GCC 13, Python 3.12 segfaults. * Python 3.12 bisect: https://github.com/python/cpython/commit/f8abfa331421e2c28388c5162f37820495e3c2ee (they started using TLS) * GCC 13 bisect: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1d561e1851c466a4952081caef17747781609b00 (ipa-visibility: Optimize TLS access) The latter commit co-author, Alexander Monakov, helpfully analysed the bug downstream in Gentoo (https://bugs.gentoo.org/909544#c12): >I can reproduce it. It's a BFD linker bug. With another linker (CC='gcc -fuse-ld=gold') it works fine. The linker seems to >mishandle one of these relocations when GOT is big: > > addis 3,3,_Py_tss_tstate@dtprel@ha > addi 3,3,_Py_tss_tstate@dtprel@l > >Single-stepping in GDB shows that the resulting value of r3 differs from actual address of _Py_tss_tstate (as computed by >GDB, which in principle could also be wrong) by 0x1000. > >You can reproduce it with any GCC version by specifying TLS model in python source explicitly: > >diff --git a/Python/pystate.c b/Python/pystate.c >index cdd975f..ef8b9f6 100644 >--- a/Python/pystate.c >+++ b/Python/pystate.c >@@ -63,6 +63,7 @@ extern "C" { > > > #ifdef HAVE_THREAD_LOCAL >+__attribute__((tls_model("local-dynamic"))) > _Py_thread_local PyThreadState *_Py_tss_tstate = NULL; > #endif > > >(or you can specify the "global-dynamic" model explicitly to paper over the linker bug with new GCC) I'll upload a (large) reproducer shortly.
[If you want to reproduce this from scratch with Python 3.12: (export CFLAGS_NODIST="-O2 -fwrapv -mcpu=powerpc" ; ./configure --enable-shared && make -j$(nproc) CPPFLAGS= CFLAGS= LDFLAGS= ) should do it, but that's not necessary here.] Reproducer tarball with ppc32 object files and a `reprod.sh` script (as well as already-broken libpython3.12.so + `python` binary): https://dev.gentoo.org/~sam/bugs/sourceware/30697/ppc-tls-bug.tar.xz. Access is possible to the machine (it's a clean test env) if needed.
Looking at the broken libpython3.12.so, I see lots of code similar to the following 2c64d4: 38 69 ff f8 addi r3,r9,-8 2c64d8: 48 0d 65 89 bl 39ca60 <00008000.got2.plt_pic32.__tls_get_addr_opt@@GLIBC_2.22+0x420> 2c64dc: 3c 63 00 00 addis r3,r3,0 2c64e0: 38 63 80 04 addi r3,r3,-32764 The addis,addi looks correct to me. _Py_tss_tstate is at offset 4 in TLS, and the dtv offset is 0x8000. 0057fff0 0 OBJECT LOCAL DEFAULT 20 _GLOBAL_OFFSET_TABLE_ and libpython3.12.so: file format elf32-powerpc Contents of section .got: 57ffdc 00000044 00000000 00000000 00000000 ...D............ 57ffec 00000000 0057fedc 00000000 00000000 .....W.......... says that tls_get_addr_opt is being called with r3 = 57ffe8. Looking at dynamic relocs shows a DTPMOD32/DTPREL32 pair at 57ffe0 (for a global dynamic access to _Py_tss_tstate as it happens), but nothing at 57ffe8. So the stub calling tls_get_addr_opt sees a zero in the first word of the tls_index pair, and returns a zero offset from r2 (the thread pointer). That's wrong (by 0x1000 in this particular testcase). The first word of the tls_index pair at 57ffe8 should have a DTPMOD32 dynamic reloc since this is in a shared library and therefore the module index is unknown. (In an executable the word could be initialised to 1.)
The master branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ae33771224660dac25e64c3f70943a17bfab7681 commit ae33771224660dac25e64c3f70943a17bfab7681 Author: Alan Modra <amodra@gmail.com> Date: Fri Aug 4 15:09:53 2023 +0930 PR30697, ppc32 mix of local-dynamic and global-dynamic TLS This fixes miscounting of dynamic relocations on GOT entries when a) there are both local-dynamic and global-dynamic tls accesss for a given symbol, and b) the symbol is global with non-default visibility, and c) the __tls_get_addr calls aren't optimised away. PR 30697 bfd/ * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic reloc count. ld/ * testsuite/ld-powerpc/tls32ldgd.d, * testsuite/ld-powerpc/tls32ldgd.s: New test. * testsuite/ld-powerpc/powerpc.exp: Run it.
Fixed mainline. I'm going to cherry-pick the fix to 2.41 and 2.40 branches after running tests.
The binutils-2_41-branch branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8c05bf16a51e413dfd3da2e018cbcd32a2a8c0f3 commit 8c05bf16a51e413dfd3da2e018cbcd32a2a8c0f3 Author: Alan Modra <amodra@gmail.com> Date: Fri Aug 4 15:09:53 2023 +0930 PR30697, ppc32 mix of local-dynamic and global-dynamic TLS This fixes miscounting of dynamic relocations on GOT entries when a) there are both local-dynamic and global-dynamic tls accesss for a given symbol, and b) the symbol is global with non-default visibility, and c) the __tls_get_addr calls aren't optimised away. PR 30697 bfd/ * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic reloc count. ld/ * testsuite/ld-powerpc/tls32ldgd.d, * testsuite/ld-powerpc/tls32ldgd.s: New test. * testsuite/ld-powerpc/powerpc.exp: Run it. (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
The binutils-2_40-branch branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=391fd4d9ee5d2b78244cbcd57fc405738359b70b commit 391fd4d9ee5d2b78244cbcd57fc405738359b70b Author: Alan Modra <amodra@gmail.com> Date: Fri Aug 4 15:09:53 2023 +0930 PR30697, ppc32 mix of local-dynamic and global-dynamic TLS This fixes miscounting of dynamic relocations on GOT entries when a) there are both local-dynamic and global-dynamic tls accesss for a given symbol, and b) the symbol is global with non-default visibility, and c) the __tls_get_addr calls aren't optimised away. PR 30697 bfd/ * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic reloc count. ld/ * testsuite/ld-powerpc/tls32ldgd.d, * testsuite/ld-powerpc/tls32ldgd.s: New test. * testsuite/ld-powerpc/powerpc.exp: Run it. (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
The binutils-2_39-branch branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a648fe307354235f2460e004c164f99e4ad297f6 commit a648fe307354235f2460e004c164f99e4ad297f6 Author: Alan Modra <amodra@gmail.com> Date: Fri Aug 4 15:09:53 2023 +0930 PR30697, ppc32 mix of local-dynamic and global-dynamic TLS This fixes miscounting of dynamic relocations on GOT entries when a) there are both local-dynamic and global-dynamic tls accesss for a given symbol, and b) the symbol is global with non-default visibility, and c) the __tls_get_addr calls aren't optimised away. PR 30697 bfd/ * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic reloc count. ld/ * testsuite/ld-powerpc/tls32ldgd.d, * testsuite/ld-powerpc/tls32ldgd.s: New test. * testsuite/ld-powerpc/powerpc.exp: Run it. (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
Very grateful Alan, thank you! Confirmed that the original unreduced testcase works well too.
The binutils-2_38-branch branch has been updated by Alan Modra <amodra@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ea5fe5d01e5a182ee7a0eddb54a702109a9f5931 commit ea5fe5d01e5a182ee7a0eddb54a702109a9f5931 Author: Alan Modra <amodra@gmail.com> Date: Fri Aug 4 15:09:53 2023 +0930 PR30697, ppc32 mix of local-dynamic and global-dynamic TLS This fixes miscounting of dynamic relocations on GOT entries when a) there are both local-dynamic and global-dynamic tls accesss for a given symbol, and b) the symbol is global with non-default visibility, and c) the __tls_get_addr calls aren't optimised away. PR 30697 bfd/ * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic reloc count. ld/ * testsuite/ld-powerpc/tls32ldgd.d, * testsuite/ld-powerpc/tls32ldgd.s: New test. * testsuite/ld-powerpc/powerpc.exp: Run it. (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)