Bug 26566

Summary: nptl/tst-thread-exit-clobber fails on powerpc/powerpc64 with GCC 10.2
Product: glibc Reporter: Matheus Castanho <msc>
Component: nptl Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED MOVED    
Severity: normal CC: adhemerval.zanella, bergner, carlos, drepper.fsp, fweimer, jeevitha, jskumari, mmatti, tuliom
Priority: P2    
Version: unspecified   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115242
Host: Target:
Build: Last reconfirmed: 2024-01-22 00:00:00

Description Matheus Castanho 2020-09-02 13:52:46 UTC
FAIL: nptl/tst-thread-exit-clobber
original exit status 1
info: unsigned int, direct pthread_exit call
error: tst-thread-exit-clobber.cc:125: not true: value == magic_values_double.v4
error: tst-thread-exit-clobber.cc:122: not true: value == magic_values_double.v3
error: tst-thread-exit-clobber.cc:119: not true: value == magic_values_double.v2
error: tst-thread-exit-clobber.cc:116: not true: value == magic_values_double.v1
error: tst-thread-exit-clobber.cc:113: not true: value == magic_values_double.v0
error: 5 test failures

I can only reproduce this error on ppc and ppc64, not on ppc64le. Also, it works with GCC 9.3 but fails with GCC 10.2.
Comment 1 Adhemerval Zanella 2020-09-04 22:11:49 UTC
I can't reproduce it with gcc-10 from build-many-glibcs.py on the gccfarm machine gcc203.  I also tried with -fstack-clash-protection (as indicated by the test itself, which was added due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83641), but it also works as expected.

In any case, this most likely seems to be a compiler or libgcc issue. Could you check whether that is the case?
Comment 2 Tulio Magno Quites Machado Filho 2020-09-15 15:10:39 UTC
(In reply to Adhemerval Zanella from comment #1)
> I can't reproduce it with gcc-10 from the build-many-glibcs.py on the
> gccfarm gcc203.

I reproduced on gcc203 with a normal build using:
configure --prefix=/usr --with-cpu=power8 CC=gcc-10 CXX=g++-10
Comment 3 Carlos O'Donell 2024-01-22 15:51:22 UTC
This came up again when reviewing the test results for glibc 2.39.

The problem is still present.
Comment 4 Manjunath S Matti 2024-04-24 05:49:23 UTC
I observe this failure even with GCC 13.2.0, but only when glibc is configured with either --with-cpu=power8 or --with-cpu=power9. When I configure glibc without specifying a CPU, for example

../glibc/configure --enable-debug --prefix=/usr

then I am unable to reproduce the failure.

I suspect that the generated code is causing the issue:

107 void
108 check_magic (int index, double value)
109 {
110   switch (index)
111     {
112     case 0:
113       TEST_VERIFY (value == magic_values_double.v0);

Assembly :

  5d4:  7c 69 1b 78     mr      r9,r3
- 5d8:  7f 64 db 78     mr      r4,r27
+ 5d8:  7b 64 00 20     clrldi  r4,r27,32 -->generating this code fixes the issue
  5dc:  38 60 ff ff     li      r3,-1
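
For readers unfamiliar with the two instructions in the diff above, here is a minimal C++ model of what they compute. It is only a sketch: the function names `clrldi` and `mr` mirror the mnemonics and are not taken from any library, and the report does not say what r27 holds at this point. `mr` copies all 64 bits, while `clrldi rA,rS,32` keeps only the low 32 bits, so the two differ exactly when the upper half of r27 is nonzero.

```cpp
#include <cassert>
#include <cstdint>

// Model of `clrldi rA,rS,n`: copy rS with its n high-order bits cleared.
// The guard for n >= 64 only avoids an undefined shift in C++.
constexpr std::uint64_t clrldi(std::uint64_t rs, unsigned n)
{
  return n >= 64 ? 0 : rs & (~std::uint64_t{0} >> n);
}

// Model of `mr rA,rS`: plain 64-bit register copy.
constexpr std::uint64_t mr(std::uint64_t rs)
{
  return rs;
}
```

If the upper 32 bits of the source register are already zero, both models return the same value, which is why such a difference can stay hidden in many runs.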
Comment 5 Manjunath S Matti 2024-05-08 06:15:04 UTC
I have some more info here:

I have run make check on both x86 and ppc64le. On both, this particular test case is reported as
UNSUPPORTED: nptl/tst-thread-exit-clobber


Also, this test case compares two doubles for equality. Is this OK?

106 __attribute__ ((noinline, noclone, weak))
107 void
108 check_magic (int index, double value)
109 {
110   switch (index)
111     {
112     case 0:
113       TEST_VERIFY (value == magic_values_double.v0); -> here
114       break;
Comment 6 Florian Weimer 2024-05-08 09:58:48 UTC
(In reply to Manjunath S Matti from comment #5)
> I have some more info here:
> 
> I have run make check on both x86 and PPC64le, this particular testcase is
> marked as
> UNSUPPORTED: nptl/tst-thread-exit-clobber

I think this happens because it's a C++ test, and you either do not have g++ installed, or libstdc++-static is missing (yes, we should probably fix that, and not disable all C++ tests if C++ static linking is unsupported).

> Also this testcase compares 2 doubles is this ok ?
> 
> 106 __attribute__ ((noinline, noclone, weak))
> 107 void
> 108 check_magic (int index, double value)
> 109 {
> 110   switch (index)
> 111     {
> 112     case 0:
> 113       TEST_VERIFY (value == magic_values_double.v0); -> here
> 114       break;

Are you asking because of a general rule that says not to compare floating point numbers for equality? In this case, we want to make sure that the test case preserves the bit pattern exactly.
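
To make the bit-pattern point concrete, here is a minimal sketch that is not taken from the test itself; the helper names `bits_of` and `same_bits` are hypothetical. For ordinary nonzero finite values, such as magic constants, `==` is true only when the bit patterns are identical, so a plain equality check is enough to detect a clobbered register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw 64-bit representation of a double.
static std::uint64_t bits_of(double d)
{
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// True only if both doubles have exactly the same bit pattern.
static bool same_bits(double a, double b)
{
  return bits_of(a) == bits_of(b);
}
```

The two notions diverge only for special values: `0.0 == -0.0` is true although the bits differ, and a NaN never compares equal to itself.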
Comment 7 Jeevitha P. 2024-05-17 14:08:57 UTC
This test case fails with glibc built using the following configuration:

configure --enable-debug --with-cpu=power8 --prefix=/usr

High-Level Information:

The issue lies in the callee-saved registers, which are not properly restored by `_Unwind_Resume` from `libgcc_s.so.1`.

Test Case Flow: [Skipped integer cases since we don't have issues there]

In this test case, the `threadfunc` function calls `call_pthread_exit` under certain conditions, which eventually calls `pthread_exit`.

 1. A thread is created and it calls `threadfunc` with an initial structure containing five values.
 2. In `threadfunc`, the `checker` constructor is called five times to set those initial values.
 3. Based on a condition, either `call_pthread_exit` or `pthread_exit` is called. The failure occurs when `call_pthread_exit` is called.
 4. `call_pthread_exit` creates a new structure with five new values and passes it to `call_pthread_exit_1`, where the `checker` constructor is called five more times with these new values.
 5. `pthread_exit` is then called, which tries to destroy the objects created by this thread. Since the thread called the `checker` constructor ten times, it will call the corresponding destructor ten times to destroy them:
       First, it destroys the five objects in `call_pthread_exit_1`.
       Then, it destroys the five objects in `threadfunc`.
 6. The destructors check that the values passed during the function calls match the original struct values.
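
The destruction order in steps 5 and 6 can be sketched as follows. This is a simplified illustration, not the test's code. It uses an ordinary C++ exception as a stand-in for the forced unwinding that `pthread_exit` performs, and two objects per frame instead of five. The names `checker_sketch`, `inner_frame`, `outer_frame`, `run_sketch` and `unwind_log` are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records the order in which destructors run during unwinding.
static std::vector<std::string> unwind_log;

// Hypothetical stand-in for the test's checker class: it only logs its
// name on destruction; the real test compares a stored value there.
struct checker_sketch
{
  std::string name;
  explicit checker_sketch(const char *n) : name(n) {}
  ~checker_sketch() { unwind_log.push_back(name); }
};

// Plays the role of call_pthread_exit_1: creates objects, then unwinds.
static void inner_frame()
{
  checker_sketch c0("inner0"), c1("inner1");
  throw std::runtime_error("stand-in for forced unwinding");
}

// Plays the role of threadfunc: its objects are destroyed last.
static void outer_frame()
{
  checker_sketch c0("outer0"), c1("outer1");
  inner_frame();
}

static void run_sketch()
{
  unwind_log.clear();
  try { outer_frame(); } catch (const std::runtime_error &) {}
}
```

After `run_sketch()`, the log reads inner1, inner0, outer1, outer0: the inner frame's objects go first, and the unwinder must have restored the outer frame's callee-saved registers before the outer destructors run.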

Assembly Perspective:

 1. In `threadfunc`, the structure values are stored in vector registers (vs59-vs63) during the constructor calls.
 2. When `call_pthread_exit_1` is called, the callee-saved registers (v27-v31), which overlap with vs59-vs63, are saved on the stack. The same registers (vs59-vs63) are then used to load the new structure values during the constructor calls in `call_pthread_exit_1`.
 3. After that, `pthread_exit` is called, which invokes the destructors.
 4. After all destructors in `call_pthread_exit_1` have run, `_Unwind_Resume` is called to unwind the stack and restore the callee-saved registers. However, `_Unwind_Resume` does not restore the vector registers, so they no longer hold the original values that were set in `threadfunc`.
 5. When the destructors for the checker objects in `threadfunc` run, the register values have been overwritten by `call_pthread_exit_1`, which used the same registers. The checker destructors fail because the values do not match.
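
The register overlap mentioned in step 2 follows from how the VSX register file is laid out: vs0-vs31 alias the floating-point registers and vs32-vs63 alias the AltiVec/VMX registers v0-v31. A small sketch of that numbering, with hypothetical helper names `vsr_for_vr` and `vsr_for_fpr`:

```cpp
#include <cassert>

// VSX register number that aliases AltiVec/VMX register vN (N in 0..31).
constexpr int vsr_for_vr(int vr)
{
  return 32 + vr;
}

// VSX register number whose doubleword 0 aliases FPR fN (N in 0..31).
constexpr int vsr_for_fpr(int fpr)
{
  return fpr;
}

// The registers named in this report.
static_assert(vsr_for_vr(27) == 59, "v27 aliases vs59");
static_assert(vsr_for_vr(31) == 63, "v31 aliases vs63");
```

This is why an unwinder that restores only f14-f31 leaves vs59-vs63 untouched: those live in the other half of the VSX register file.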


If we disable vector instruction generation using -mno-vsx for the test case, the issue does not occur. This is because floating-point registers are then used instead of vector registers, and those are properly restored by `_Unwind_Resume` after the destructors run.


Backtrace for the `_Unwind_Resume` call from glibc:

(gdb) bt
#2  0x00007ffff7b7132c in ._Unwind_Resume () from /lib/powerpc64-linux-gnu/libgcc_s.so.1
#3  0x00007ffff7f324c4 in ?? ()//call_pthread_exit_1
#4  0x00007ffff7f32558 in ?? ()
#5  0x00007ffff7f325e4 in ?? ()
#6  0x00007ffff79bc958 in start_thread (arg=0x7ffff784f100) at pthread_create.c:447
#7  0x00007ffff7a55dcc in .LY__clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114
Comment 8 Florian Weimer 2024-05-17 16:28:19 UTC
I suspect this happens if the libgcc unwinder has been built in such a way that it doesn't require a CPU with VSX support. In that case, it's not safe to build code with -mcpu=power8 if it catches exceptions, as the test shows.

I really don't think this is a glibc bug. The libgcc unwinder should have conditional vector register processing even if built with -mno-vsx.
Comment 9 Surya Kumari J 2024-05-19 18:21:13 UTC
Adding some more details:

  In this testcase, we have the following call chain:
threadfunc()->call_pthread_exit()->call_pthread_exit1()->pthread_exit()

  The threadfunc() function is passed a struct containing 5 values. 5 local objects of type “class checker” are created in threadfunc() and the instance variable in each object is initialized with a value from the struct passed as parameter. The values in the struct are stored in vector registers vs59-vs63 during the constructor calls.

  The call_pthread_exit1() routine too is passed a struct containing 5 values (these values are different from those in the struct passed to threadfunc()). The prolog in this routine stores the registers v27-v31 as these are non-volatile registers. Note that v27-v31 overlap with vs59-vs63. This routine creates 5 local objects of type ‘class checker’. The instance variable in each object is initialized with a value from the struct passed as parameter. The values in the struct are stored in vector registers vs59-vs63 during the constructor calls.

  The destructor for ‘class checker’ has ‘inline’ attribute and hence we have code to destroy the objects inlined into threadfunc() and call_pthread_exit1().
  The destructor checks whether the instance variable in the object is the same as what it was initialized to, and the test fails if the value is not the same. Since the destructor code is inlined, the value of the instance variable is compared with the value in the vector register.

  While the checks in the destructors pass in call_pthread_exit1(), the checks fail in threadfunc(). In threadfunc(), the vector registers do not contain the expected values.

The assembly code for call_pthread_exit1 looks as follows:

call_pthread_exit1() {
     prolog code (contains code to save non-volatile vector registers)
     code to copy values from ‘struct parameter’ to vector registers
     code to call constructors
     call to pthread_exit()
  landing pad:
     inlined destructor code which reads vector registers
     call to _Unwind_Resume()
}

Assembly code for threadfunc():

threadfunc() {
     code to copy values from ‘struct parameter’ to vector registers
     code to call constructors
  landing pad:
     inlined destructor code which reads vector registers
}

  pthread_exit() has to destroy all the objects created in the call chain. So the landing pad code in call_pthread_exit1() is executed to destroy the objects created in call_pthread_exit1(). After the destructors finish executing, _Unwind_Resume is called because we have to unwind the stack.

  When the stack is unwound and we reach the landing pad in threadfunc(), the vector registers no longer contain the correct values.  This is because the vector registers have been rewritten in call_pthread_exit1(). Of course, call_pthread_exit1()’s prolog saves these registers on stack before writing to them. When we unwind and go from call_pthread_exit1()’s frame to the previous frame (call_pthread_exit()), these vector registers have to be restored to their original values from the stack.

  The issue can be either in the unwinder (it does not restore vector registers correctly when unwinding the stack frames) or in GCC (it does not produce correct unwind info, which leads to incorrect unwinding).
Comment 10 Florian Weimer 2024-05-19 19:46:58 UTC
Please check the disassembly of _Unwind_RaiseException in libgcc_s.so.1 to see whether it contains code to save vector registers, perhaps like this:

gdb -batch -ex "disassemble/s _Unwind_RaiseException" /lib64/libgcc_s.so.1

If GCC defaults to POWER8 instructions, it looks like this:

Dump of assembler code for function _Unwind_RaiseException:
../../../libgcc/unwind.inc:
87      {
   0x000000000000c610 <+0>:     addis   r2,r12,3
   0x000000000000c614 <+4>:     addi    r2,r2,-18704
   0x000000000000c618 <+8>:     mr      r0,r1
   0x000000000000c61c <+12>:    stdu    r1,-4096(r1)
   0x000000000000c620 <+16>:    stdu    r0,-528(r1)
   0x000000000000c624 <+20>:    mflr    r0
   0x000000000000c628 <+24>:    stfd    f14,4480(r1)
   0x000000000000c62c <+28>:    stfd    f15,4488(r1)
   0x000000000000c630 <+32>:    stfd    f16,4496(r1)
   0x000000000000c634 <+36>:    stfd    f17,4504(r1)
   0x000000000000c638 <+40>:    stfd    f18,4512(r1)
[…]
   0x000000000000c6cc <+188>:   beq     0xc6d4 <_Unwind_RaiseException+196>
   0x000000000000c6d0 <+192>:   std     r2,4648(r1)

88        struct _Unwind_Context this_context, cur_context;
89        _Unwind_Reason_Code code;
90        unsigned long frames;
91      
92        /* Set up this_context to describe the current stack frame.  */
93        uw_init_context (&this_context);
   0x000000000000c6d4 <+196>:   mfcr    r0
   0x000000000000c6d8 <+200>:   mflr    r5
   0x000000000000c6dc <+204>:   addi    r27,r1,3000
   0x000000000000c6e0 <+208>:   addi    r4,r1,4624
   0x000000000000c6e4 <+212>:   mr      r28,r3
   0x000000000000c6e8 <+216>:   mr      r3,r27
   0x000000000000c6ec <+220>:   addi    r30,r1,1920
   0x000000000000c6f0 <+224>:   addi    r29,r1,32
   0x000000000000c6f4 <+228>:   stw     r0,4088(r1)
   0x000000000000c6f8 <+232>:   stw     r0,4096(r1)
   0x000000000000c6fc <+236>:   stw     r0,4104(r1)
   0x000000000000c700 <+240>:   li      r0,4144
   0x000000000000c704 <+244>:   stvx    v20,r1,r0
   0x000000000000c708 <+248>:   li      r0,4160
   0x000000000000c70c <+252>:   stvx    v21,r1,r0
[…]
Comment 11 Jeevitha P. 2024-05-21 07:43:46 UTC
I was able to run this test on a little-endian machine, and it did not fail there.

In _Unwind_Resume() from /lib/powerpc64le-linux-gnu/libgcc_s.so.1, I can see that the non-volatile vector (VSX) registers are saved and restored properly. The relevant assembly:

000000000000ce60 <_Unwind_Resume_or_Rethrow@@GCC_3.3>:
    ce60:       03 00 4c 3c     addis   r2,r12,3
    ce64:       a0 ae 42 38     addi    r2,r2,-20832
    ce68:       41 f5 21 f8     stdu    r1,-2752(r1)
    ce6c:       a6 02 08 7c     mflr    r0
    ce70:       30 0a c1 d9     stfd    f14,2608(r1)
    ce74:       38 0a e1 d9     stfd    f15,2616(r1)
    ce78:       40 0a 01 da     stfd    f16,2624(r1)
    ce7c:       48 0a 21 da     stfd    f17,2632(r1)
    ce80:       50 0a 41 da     stfd    f18,2640(r1)
    ce84:       58 0a 61 da     stfd    f19,2648(r1)
    ce88:       60 0a 81 da     stfd    f20,2656(r1)
    ce8c:       68 0a a1 da     stfd    f21,2664(r1)
    ce90:       70 0a c1 da     stfd    f22,2672(r1)
    ce94:       78 0a e1 da     stfd    f23,2680(r1)
    ce98:       80 0a 01 db     stfd    f24,2688(r1)
    ce9c:       88 0a 21 db     stfd    f25,2696(r1)
    cea0:       90 0a 41 db     stfd    f26,2704(r1)
    cea4:       98 0a 61 db     stfd    f27,2712(r1)
    cea8:       a0 0a 81 db     stfd    f28,2720(r1)
    ceac:       a8 0a a1 db     stfd    f29,2728(r1)
    ceb0:       b0 0a c1 db     stfd    f30,2736(r1)
    ceb4:       b8 0a e1 db     stfd    f31,2744(r1)
    ceb8:       a0 09 c1 f9     std     r14,2464(r1)
    cebc:       a8 09 e1 f9     std     r15,2472(r1)
    cec0:       b0 09 01 fa     std     r16,2480(r1)
    cec4:       b8 09 21 fa     std     r17,2488(r1)
    cec8:       c0 09 41 fa     std     r18,2496(r1)
    cecc:       c8 09 61 fa     std     r19,2504(r1)
    ced0:       d0 0a 01 f8     std     r0,2768(r1)
    ced4:       d0 09 81 fa     std     r20,2512(r1)
    ced8:       d8 09 a1 fa     std     r21,2520(r1)
    cedc:       e0 09 c1 fa     std     r22,2528(r1)
    cee0:       e8 09 e1 fa     std     r23,2536(r1)
    cee4:       f0 09 01 fb     std     r24,2544(r1)
    cee8:       f8 09 21 fb     std     r25,2552(r1)
    ceec:       00 0a 41 fb     std     r26,2560(r1)
    cef0:       08 0a 61 fb     std     r27,2568(r1)
    cef4:       10 0a 81 fb     std     r28,2576(r1)
    cef8:       18 0a a1 fb     std     r29,2584(r1)
    cefc:       20 0a c1 fb     std     r30,2592(r1)
    cf00:       28 0a e1 fb     std     r31,2600(r1)
    cf04:       a6 02 68 7d     mflr    r11
    cf08:       00 00 6b 81     lwz     r11,0(r11)
    cf0c:       41 e8 6b 6d     xoris   r11,r11,59457
    cf10:       18 00 0b 28     cmplwi  r11,24
    cf14:       08 00 82 41     beq     cf1c <_Unwind_Resume_or_Rethrow@@GCC_3.3+0xbc>
    cf18:       d8 0a 41 f8     std     r2,2776(r1)
    cf1c:       10 00 23 e9     ld      r9,16(r3)
    cf20:       a6 02 a8 7c     mflr    r5
    cf24:       78 1b 7f 7c     mr      r31,r3
    cf28:       ed 08 81 f6     stxv    vs52,2272(r1)
    cf2c:       fd 08 a1 f6     stxv    vs53,2288(r1)
    cf30:       0d 09 c1 f6     stxv    vs54,2304(r1)
    cf34:       1d 09 e1 f6     stxv    vs55,2320(r1)
    cf38:       2d 09 01 f7     stxv    vs56,2336(r1)
    cf3c:       26 00 00 7c     mfcr    r0
    cf40:       00 00 29 2c     cmpdi   r9,0
    cf44:       3d 09 21 f7     stxv    vs57,2352(r1)
    cf48:       4d 09 41 f7     stxv    vs58,2368(r1)
    cf4c:       5d 09 61 f7     stxv    vs59,2384(r1)
    cf50:       6d 09 81 f7     stxv    vs60,2400(r1)
    cf54:       7d 09 a1 f7     stxv    vs61,2416(r1)
    cf58:       8d 09 c1 f7     stxv    vs62,2432(r1)
    cf5c:       9d 09 e1 f7     stxv    vs63,2448(r1)
    cf60:       a8 08 01 90     stw     r0,2216(r1)
    cf64:       b0 08 01 90     stw     r0,2224(r1)
    cf68:       b8 08 01 90     stw     r0,2232(r1)
    cf6c:       10 01 82 40     bne     d07c <_Unwind_Resume_or_Rethrow@@GCC_3.3+0x21c>
    cf70:       d1 5f ff 4b     bl      2f40 <GCC_3.0@@GCC_3.0+0x2f40>
    cf74:       18 00 41 e8     ld      r2,24(r1)
    cf78:       00 00 40 39     li      r10,0
    cf7c:       a8 08 01 80     lwz     r0,2216(r1)
    cf80:       e9 08 81 f6     lxv     vs52,2272(r1)
    cf84:       d8 0a 41 e8     ld      r2,2776(r1)
    cf88:       f9 08 a1 f6     lxv     vs53,2288(r1)
    cf8c:       09 09 c1 f6     lxv     vs54,2304(r1)
    cf90:       19 09 e1 f6     lxv     vs55,2320(r1)
    cf94:       29 09 01 f7     lxv     vs56,2336(r1)
    cf98:       39 09 21 f7     lxv     vs57,2352(r1)
    cf9c:       49 09 41 f7     lxv     vs58,2368(r1)
    cfa0:       20 01 12 7c     mtocrf  32,r0
    cfa4:       b0 08 01 80     lwz     r0,2224(r1)
    cfa8:       59 09 61 f7     lxv     vs59,2384(r1)
    cfac:       c0 08 61 e8     ld      r3,2240(r1)
    cfb0:       69 09 81 f7     lxv     vs60,2400(r1)
    cfb4:       79 09 a1 f7     lxv     vs61,2416(r1)
    cfb8:       89 09 c1 f7     lxv     vs62,2432(r1)
    cfbc:       99 09 e1 f7     lxv     vs63,2448(r1)
    cfc0:       c8 08 81 e8     ld      r4,2248(r1)
    cfc4:       d0 08 a1 e8     ld      r5,2256(r1)
    cfc8:       d8 08 c1 e8     ld      r6,2264(r1)
    cfcc:       a0 09 c1 e9     ld      r14,2464(r1)
    cfd0:       20 01 11 7c     mtocrf  16,r0
    cfd4:       b8 08 01 80     lwz     r0,2232(r1)


Sample assembly on big-endian, where ._Unwind_Resume() from /lib/powerpc64-linux-gnu/libgcc_s.so.1 does not save or restore any vector registers:

0000000000011430 <._Unwind_Resume_or_Rethrow>:
   
   1143c:	f8 01 0a 40 	std     r0,2624(r1)
   11440:	d9 c1 09 a0 	stfd    f14,2464(r1)
   11444:	d9 e1 09 a8 	stfd    f15,2472(r1)
   11448:	da 01 09 b0 	stfd    f16,2480(r1)
   1144c:	da 21 09 b8 	stfd    f17,2488(r1)
   11450:	da 41 09 c0 	stfd    f18,2496(r1)
   11454:	da 61 09 c8 	stfd    f19,2504(r1)
   11458:	da 81 09 d0 	stfd    f20,2512(r1)
   1145c:	da a1 09 d8 	stfd    f21,2520(r1)
   11460:	da c1 09 e0 	stfd    f22,2528(r1)
   11464:	da e1 09 e8 	stfd    f23,2536(r1)
   11468:	db 01 09 f0 	stfd    f24,2544(r1)
   1146c:	db 21 09 f8 	stfd    f25,2552(r1)
   11470:	db 41 0a 00 	stfd    f26,2560(r1)
   11474:	db 61 0a 08 	stfd    f27,2568(r1)
   11478:	db 81 0a 10 	stfd    f28,2576(r1)
   1147c:	db a1 0a 18 	stfd    f29,2584(r1)
   11480:	db c1 0a 20 	stfd    f30,2592(r1)
   11484:	db e1 0a 28 	stfd    f31,2600(r1)
   11488:	f9 c1 09 10 	std     r14,2320(r1)
   1148c:	f9 e1 09 18 	std     r15,2328(r1)
   11490:	fa 01 09 20 	std     r16,2336(r1)
   11494:	fa 21 09 28 	std     r17,2344(r1)
   11498:	fa 41 09 30 	std     r18,2352(r1)
   1149c:	fa 61 09 38 	std     r19,2360(r1)
   114a0:	fa 81 09 40 	std     r20,2368(r1)
   114a4:	fa a1 09 48 	std     r21,2376(r1)
   114a8:	fa c1 09 50 	std     r22,2384(r1)
   114ac:	fa e1 09 58 	std     r23,2392(r1)
   114b0:	fb 01 09 60 	std     r24,2400(r1)
   114b4:	fb 21 09 68 	std     r25,2408(r1)
   114b8:	fb 41 09 70 	std     r26,2416(r1)
   114bc:	fb 61 09 78 	std     r27,2424(r1)
   114c0:	fb 81 09 80 	std     r28,2432(r1)
   114c4:	fb a1 09 88 	std     r29,2440(r1)
   114c8:	fb c1 09 90 	std     r30,2448(r1)
   114cc:	fb e1 09 98 	std     r31,2456(r1)
   114d0:	7d 68 02 a6 	mflr    r11
   114d4:	81 6b 00 00 	lwz     r11,0(r11)
   114d8:	6d 6b e8 41 	xoris   r11,r11,59457
   114dc:	28 0b 00 28 	cmplwi  r11,40
   114e0:	41 82 00 08 	beq     114e8 <._Unwind_Resume_or_Rethrow+0xb8>
   114e4:	f8 41 0a 58 	std     r2,2648(r1)
   114e8:	7c a8 02 a6 	mflr    r5
   114ec:	7c 7f 1b 78 	mr      r31,r3
   114f0:	e9 23 00 10 	ld      r9,16(r3)
   114f4:	91 81 0a 38 	stw     r12,2616(r1)
   114f8:	2c 29 00 00 	cmpdi   r9,0
   114fc:	40 82 00 d8 	bne     115d4 <._Unwind_Resume_or_Rethrow+0x1a4>
   11500:	4b ff 3e e1 	bl      53e0 <GCC_14.0.0@@GCC_14.0.0+0x53e0>
   11504:	e8 41 00 28 	ld      r2,40(r1)
   11508:	39 40 00 00 	li      r10,0
   1150c:	81 81 0a 38 	lwz     r12,2616(r1)
   11510:	e8 01 0a 40 	ld      r0,2624(r1)
   11514:	c9 c1 09 a0 	lfd     f14,2464(r1)
   11518:	c9 e1 09 a8 	lfd     f15,2472(r1)
   1151c:	ca 01 09 b0 	lfd     f16,2480(r1)
   11520:	ca 21 09 b8 	lfd     f17,2488(r1)
   11524:	e8 41 0a 58 	ld      r2,2648(r1)
   11528:	e8 61 08 f0 	ld      r3,2288(r1)
   1152c:	ca 41 09 c0 	lfd     f18,2496(r1)
   11530:	ca 61 09 c8 	lfd     f19,2504(r1)
   11534:	e8 81 08 f8 	ld      r4,2296(r1)
   11538:	e8 a1 09 00 	ld      r5,2304(r1)
   1153c:	7c 08 03 a6 	mtlr    r0
   11540:	ca 81 09 d0 	lfd     f20,2512(r1)
   11544:	e8 c1 09 08 	ld      r6,2312(r1)
   11548:	e9 c1 09 10 	ld      r14,2320(r1)
   1154c:	e9 e1 09 18 	ld      r15,2328(r1)
   11550:	7d 92 01 20 	mtocrf  32,r12
   11554:	7d 91 01 20 	mtocrf  16,r12
   11558:	ea 01 09 20 	ld      r16,2336(r1)
   1155c:	ea 21 09 28 	ld      r17,2344(r1)
   11560:	ea 41 09 30 	ld      r18,2352(r1)
   11564:	ea 61 09 38 	ld      r19,2360(r1)
   11568:	ea 81 09 40 	ld      r20,2368(r1)
   1156c:	ea a1 09 48 	ld      r21,2376(r1)
   11570:	ea c1 09 50 	ld      r22,2384(r1)
   11574:	ea e1 09 58 	ld      r23,2392(r1)
   11578:	eb 01 09 60 	ld      r24,2400(r1)
   1157c:	eb 21 09 68 	ld      r25,2408(r1)
   11580:	eb 41 09 70 	ld      r26,2416(r1)
   11584:	eb 61 09 78 	ld      r27,2424(r1)
   11588:	eb 81 09 80 	ld      r28,2432(r1)
   1158c:	eb a1 09 88 	ld      r29,2440(r1)
   11590:	eb c1 09 90 	ld      r30,2448(r1)
   11594:	eb e1 09 98 	ld      r31,2456(r1)
   11598:	ca a1 09 d8 	lfd     f21,2520(r1)
   1159c:	ca c1 09 e0 	lfd     f22,2528(r1)
   115a0:	7d 90 81 20 	mtocrf  8,r12
   115a4:	ca e1 09 e8 	lfd     f23,2536(r1)
   115a8:	cb 01 09 f0 	lfd     f24,2544(r1)
   115ac:	cb 21 09 f8 	lfd     f25,2552(r1)
   115b0:	cb 41 0a 00 	lfd     f26,2560(r1)
   115b4:	cb 61 0a 08 	lfd     f27,2568(r1)
   115b8:	cb 81 0a 10 	lfd     f28,2576(r1)
   115bc:	cb a1 0a 18 	lfd     f29,2584(r1)
   115c0:	cb c1 0a 20 	lfd     f30,2592(r1)
   115c4:	cb e1 0a 28 	lfd     f31,2600(r1)
   115c8:	38 21 0a 30 	addi    r1,r1,2608
Comment 12 Florian Weimer 2024-05-21 07:53:44 UTC
The powerpc64le-*-linux-gnu target has a baseline of POWER8, so vector support is always compiled into libgcc_s.so.1.

On powerpc64-*-linux-gnu and powerpc-*-linux-gnu, vector support is only present if GCC is built to a minimum baseline with vector support. This is a GCC bug: it needs to do run-time dispatch based on CPU capabilities and be able to save vector registers, independently of how GCC was built.
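
As a rough illustration of what run-time dispatch on CPU capabilities could look like from user space, here is a sketch that reads AT_HWCAP with `getauxval`. It is not libgcc's code and not a proposed patch. The constant names are local to the sketch (their values are copied from the powerpc definitions in the kernel's `<asm/cputable.h>`), `should_handle_vector_regs` is a hypothetical policy function, and which feature bit an unwinder would actually test is an assumption.

```cpp
#include <cassert>
#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif

// powerpc AT_HWCAP bits, repeated here so the sketch compiles elsewhere.
constexpr unsigned long kPpcFeatureHasAltivec = 0x10000000UL;
constexpr unsigned long kPpcFeatureHasVsx     = 0x00000080UL;

// Pure helper: does this AT_HWCAP word advertise the given feature?
constexpr bool hwcap_has(unsigned long hwcap, unsigned long feature)
{
  return (hwcap & feature) != 0;
}

// Hypothetical policy: decide at run time whether vector registers need
// to be saved and restored. Only meaningful on powerpc Linux; on other
// targets the sketch conservatively reports false.
static bool should_handle_vector_regs()
{
#if defined(__linux__) && defined(__powerpc__)
  return hwcap_has(getauxval(AT_HWCAP), kPpcFeatureHasAltivec);
#else
  return false;
#endif
}
```

The point of such a check is that the decision depends on the CPU the process runs on, not on the -mcpu baseline that libgcc itself was built with.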
Comment 13 Peter Bergner 2024-06-24 14:54:18 UTC
(In reply to Florian Weimer from comment #12)
> The powerpc64le-*-linux-gnu target has a baseline of POWER8, so vector
> support is always compiled into libgcc_s.so.1.
> 
> On powerpc64-*-linux-gnu and powerpc-*-linux-gnu, vector support is only
> present if GCC is built to a minimum baseline with vector support. This is a
> GCC bug: it needs to do run-time dispatch based on CPU capabilities and be
> able to save vector registers, independently of how GCC was built.

FYI, we've opened https://gcc.gnu.org/PR115242 to track this.