Bug 30697 - ppc32 mix of local-dynamic and global-dynamic TLS
Summary: ppc32 mix of local-dynamic and global-dynamic TLS
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld (show other bugs)
Version: 2.40
: P2 normal
Target Milestone: ---
Assignee: Alan Modra
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-29 14:12 UTC by Sam James
Modified: 2024-07-22 15:03 UTC (History)
3 users (show)

See Also:
Host:
Target: powerpc-linux
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2023-07-29 14:12:38 UTC
Originally reported downstream in Gentoo at https://bugs.gentoo.org/909544 and then to Python at https://github.com/python/cpython/issues/106428.

* With GCC 12, Python 3.12 builds fine on ppc32. With GCC 13, Python 3.12 segfaults.
* Python 3.12 bisect: https://github.com/python/cpython/commit/f8abfa331421e2c28388c5162f37820495e3c2ee (they started using TLS)
* GCC 13 bisect: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1d561e1851c466a4952081caef17747781609b00 (ipa-visibility: Optimize TLS access)

The latter commit co-author, Alexander Monakov, helpfully analysed the bug downstream in Gentoo (https://bugs.gentoo.org/909544#c12):
>I can reproduce it. It's a BFD linker bug. With another linker (CC='gcc -fuse-ld=gold') it works fine. The linker seems to >mishandle one of these relocations when GOT is big:
> 
>        addis 3,3,_Py_tss_tstate@dtprel@ha
>        addi 3,3,_Py_tss_tstate@dtprel@l
> 
>Single-stepping in GDB shows that the resulting value of r3 differs from actual address of _Py_tss_tstate (as computed by >GDB, which in principle could also be wrong) by 0x1000.
> 
>You can reproduce it with any GCC version by specifying TLS model in python source explicitly:
> 
>diff --git a/Python/pystate.c b/Python/pystate.c
>index cdd975f..ef8b9f6 100644
>--- a/Python/pystate.c
>+++ b/Python/pystate.c
>@@ -63,6 +63,7 @@ extern "C" {
> 
> 
> #ifdef HAVE_THREAD_LOCAL
>+__attribute__((tls_model("local-dynamic")))
> _Py_thread_local PyThreadState *_Py_tss_tstate = NULL;
> #endif
> 
> 
>(or you can specify the "global-dynamic" model explicitly to paper over the linker bug with new GCC)

I'll upload a (large) reproducer shortly.
Comment 1 Sam James 2023-07-29 15:03:07 UTC
[If you want to reproduce this from scratch with Python 3.12: (export CFLAGS_NODIST="-O2 -fwrapv -mcpu=powerpc" ; ./configure --enable-shared && make -j$(nproc) CPPFLAGS= CFLAGS= LDFLAGS= ) should do it, but that's not necessary here.]

Reproducer tarball with ppc32 object files and a `reprod.sh` script (as well as already-broken libpython3.12.so + `python` binary): https://dev.gentoo.org/~sam/bugs/sourceware/30697/ppc-tls-bug.tar.xz.

Access is possible to the machine (it's a clean test env) if needed.
Comment 2 Alan Modra 2023-08-04 02:02:52 UTC
Looking at the broken libpython3.12.so, I see lots of code similar to the following
  2c64d4:       38 69 ff f8     addi    r3,r9,-8
  2c64d8:       48 0d 65 89     bl      39ca60 <00008000.got2.plt_pic32.__tls_get_addr_opt@@GLIBC_2.22+0x420>
  2c64dc:       3c 63 00 00     addis   r3,r3,0
  2c64e0:       38 63 80 04     addi    r3,r3,-32764

The addis,addi looks correct to me.  _Py_tss_tstate is at offset 4 in TLS, and the dtv offset is 0x8000.

0057fff0     0 OBJECT  LOCAL  DEFAULT   20 _GLOBAL_OFFSET_TABLE_

and

libpython3.12.so:     file format elf32-powerpc
Contents of section .got:
 57ffdc 00000044 00000000 00000000 00000000  ...D............
 57ffec 00000000 0057fedc 00000000 00000000  .....W..........

says that tls_get_addr_opt is being called with r3 = 57ffe8.  Looking at dynamic relocs shows a DTPMOD32/DTPREL32 pair at 57ffe0 (for a global dynamic access to _Py_tss_tstate as it happens), but nothing at 57ffe8.  So the stub calling tls_get_addr_opt sees a zero in the first word of the tls_index pair, and returns a zero offset from r2 (the thread pointer).  That's wrong (by 0x1000 in this particular testcase).

The first word of the tls_index pair at 57ffe8 should have a DTPMOD32 dynamic reloc since this is in a shared library and therefore the module index is unknown.  (In an executable the word could be initialised to 1.)
Comment 3 Sourceware Commits 2023-08-04 08:29:58 UTC
The master branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ae33771224660dac25e64c3f70943a17bfab7681

commit ae33771224660dac25e64c3f70943a17bfab7681
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Aug 4 15:09:53 2023 +0930

    PR30697, ppc32 mix of local-dynamic and global-dynamic TLS
    
    This fixes miscounting of dynamic relocations on GOT entries when
    a) there are both local-dynamic and global-dynamic tls accesss for a
       given symbol, and
    b) the symbol is global with non-default visibility, and
    c) the __tls_get_addr calls aren't optimised away.
    
            PR 30697
    bfd/
            * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic
            reloc count.
    ld/
            * testsuite/ld-powerpc/tls32ldgd.d,
            * testsuite/ld-powerpc/tls32ldgd.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
Comment 4 Alan Modra 2023-08-04 08:39:51 UTC
Fixed mainline.  I'm going to cherry-pick the fix to 2.41 and 2.40 branches after running tests.
Comment 5 Sourceware Commits 2023-08-04 09:06:21 UTC
The binutils-2_41-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8c05bf16a51e413dfd3da2e018cbcd32a2a8c0f3

commit 8c05bf16a51e413dfd3da2e018cbcd32a2a8c0f3
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Aug 4 15:09:53 2023 +0930

    PR30697, ppc32 mix of local-dynamic and global-dynamic TLS
    
    This fixes miscounting of dynamic relocations on GOT entries when
    a) there are both local-dynamic and global-dynamic tls accesss for a
       given symbol, and
    b) the symbol is global with non-default visibility, and
    c) the __tls_get_addr calls aren't optimised away.
    
            PR 30697
    bfd/
            * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic
            reloc count.
    ld/
            * testsuite/ld-powerpc/tls32ldgd.d,
            * testsuite/ld-powerpc/tls32ldgd.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
    
    (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
Comment 6 Sourceware Commits 2023-08-04 09:06:56 UTC
The binutils-2_40-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=391fd4d9ee5d2b78244cbcd57fc405738359b70b

commit 391fd4d9ee5d2b78244cbcd57fc405738359b70b
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Aug 4 15:09:53 2023 +0930

    PR30697, ppc32 mix of local-dynamic and global-dynamic TLS
    
    This fixes miscounting of dynamic relocations on GOT entries when
    a) there are both local-dynamic and global-dynamic tls accesss for a
       given symbol, and
    b) the symbol is global with non-default visibility, and
    c) the __tls_get_addr calls aren't optimised away.
    
            PR 30697
    bfd/
            * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic
            reloc count.
    ld/
            * testsuite/ld-powerpc/tls32ldgd.d,
            * testsuite/ld-powerpc/tls32ldgd.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
    
    (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
Comment 7 Sourceware Commits 2023-08-04 09:43:39 UTC
The binutils-2_39-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a648fe307354235f2460e004c164f99e4ad297f6

commit a648fe307354235f2460e004c164f99e4ad297f6
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Aug 4 15:09:53 2023 +0930

    PR30697, ppc32 mix of local-dynamic and global-dynamic TLS
    
    This fixes miscounting of dynamic relocations on GOT entries when
    a) there are both local-dynamic and global-dynamic tls accesss for a
       given symbol, and
    b) the symbol is global with non-default visibility, and
    c) the __tls_get_addr calls aren't optimised away.
    
            PR 30697
    bfd/
            * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic
            reloc count.
    ld/
            * testsuite/ld-powerpc/tls32ldgd.d,
            * testsuite/ld-powerpc/tls32ldgd.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
    
    (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)
Comment 8 Sam James 2023-08-04 10:23:24 UTC
Very grateful Alan, thank you!

Confirmed that the original unreduced testcase works well too.
Comment 9 Sourceware Commits 2023-08-04 11:03:46 UTC
The binutils-2_38-branch branch has been updated by Alan Modra <amodra@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ea5fe5d01e5a182ee7a0eddb54a702109a9f5931

commit ea5fe5d01e5a182ee7a0eddb54a702109a9f5931
Author: Alan Modra <amodra@gmail.com>
Date:   Fri Aug 4 15:09:53 2023 +0930

    PR30697, ppc32 mix of local-dynamic and global-dynamic TLS
    
    This fixes miscounting of dynamic relocations on GOT entries when
    a) there are both local-dynamic and global-dynamic tls accesss for a
       given symbol, and
    b) the symbol is global with non-default visibility, and
    c) the __tls_get_addr calls aren't optimised away.
    
            PR 30697
    bfd/
            * elf32-ppc.c (allocate_dynrelocs): Correct local-dynamic
            reloc count.
    ld/
            * testsuite/ld-powerpc/tls32ldgd.d,
            * testsuite/ld-powerpc/tls32ldgd.s: New test.
            * testsuite/ld-powerpc/powerpc.exp: Run it.
    
    (cherry picked from commit ae33771224660dac25e64c3f70943a17bfab7681)