Bug 25672 - nptl/tst-mutex8-static and nptl/tst-mutexpi8-static failing on sparc64 on Linux
Summary: nptl/tst-mutex8-static and nptl/tst-mutexpi8-static failing on sparc64 on Linux
Status: RESOLVED DUPLICATE of bug 31244
Alias: None
Product: glibc
Classification: Unclassified
Component: ports
Version: 2.31
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: https://buildd.debian.org/status/fetc...
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-14 08:55 UTC by John Paul Adrian Glaubitz
Modified: 2024-01-18 04:59 UTC
CC List: 6 users

See Also:
Host:
Target: sparc*-*-*
Build:
Last reconfirmed:


Description John Paul Adrian Glaubitz 2020-03-14 08:55:59 UTC
With 2.31, the number of testsuite failures on Linux/sparc64 has dropped dramatically, to just three. Two of the remaining failures are nptl/tst-mutex8-static and nptl/tst-mutexpi8-static; for a full log, see: https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=sparc64&ver=2.31-0experimental0&stamp=1584003885&raw=0

Are these failures which can be safely ignored or do they indicate a larger problem?
Comment 1 Adhemerval Zanella 2020-03-17 13:07:01 UTC
The issue seems to be that libgcc is stuck in an infinite loop while trying to unwind the canceled thread:

(gdb) thread apply all bt

Thread 3 (LWP 421806):
#0  binary_search_single_encoding_fdes (pc=0x110343 <kill+35>, ob=0x2e) at /home/azanella/toolchain/src/gcc/libgcc/unwind-dw2-fde.c:936
#1  search_object (ob=ob@entry=0x2a9c18 <object>, pc=pc@entry=0x110343 <kill+35>) at /home/azanella/toolchain/src/gcc/libgcc/unwind-dw2-fde.c:1005
#2  0x0000000000183dc8 in _Unwind_Find_registered_FDE (bases=0xfff8000100806448, pc=0x110343 <kill+35>) at /home/azanella/toolchain/src/gcc/libgcc/unwind-dw2-fde.c:1054
#3  _Unwind_Find_FDE (pc=0x110343 <kill+35>, bases=bases@entry=0xfff8000100806448) at /home/azanella/toolchain/src/gcc/libgcc/unwind-dw2-fde-dip.c:458
#4  0x000000000017fd54 in uw_frame_state_for (context=context@entry=0xfff80001008060f0, fs=fs@entry=0xfff8000100805570) at /home/azanella/toolchain/src/gcc/libgcc/unwind-dw2.c:1249
#5  0x00000000001816dc in _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xfff8000100807d70, context=context@entry=0xfff80001008060f0) at /home/azanella/toolchain/src/gcc/libgcc/unwind.inc:155
#6  0x0000000000181d04 in _Unwind_ForcedUnwind (exc=0xfff8000100807d70, stop=stop@entry=0x10a7a0 <unwind_stop>, stop_argument=stop_argument@entry=0xfff8000100806a20) at /home/azanella/toolchain/src/gcc/libgcc/unwind.inc:207
#7  0x000000000010a8e8 in __pthread_unwind (buf=0xfff8000100806a20) at unwind.c:121
#8  0x00000000001097d0 in __do_cancel () at ./pthreadP.h:311
#9  sigcancel_handler (sig=<optimized out>, si=0xfff8000100806700, ctx=0xfff8000100806700) at nptl-init.c:162
#10 <signal handler called>
#11 0x000000000010709c in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x2a9c7c <c+44>) at ../sysdeps/nptl/futex-internal.h:183
#12 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7feffffeaf8, cond=0x2a9c50 <c>) at pthread_cond_wait.c:508
#13 __pthread_cond_wait (cond=cond@entry=0x2a9c50 <c>, mutex=0x7feffffeaf8) at pthread_cond_wait.c:638
#14 0x0000000000101114 in tf (arg=0x1) at ../sysdeps/pthread/tst-mutex8.c:74
#15 0x0000000000103a78 in start_thread (arg=0xfff8000100807900) at pthread_create.c:473
#16 0x000000000013666c in __thread_start () at ../sysdeps/unix/sysv/linux/sparc/sparc64/clone.S:77
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (LWP 421802):
#0  0x0000000000104ee4 in __pthread_clockjoin_ex (threadid=14, thread_return=0xe, clockid=<optimized out>, abstime=0xe, block=<optimized out>) at pthread_join_common.c:145
#1  0x0000000000000016 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The other failures (nptl/tst-cond8-static, nptl/tst-cancel24-static) seem to follow the same pattern. I am not sure whether this is a code-generation issue (since the dynamically linked tests do not fail) or a missing directive.
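The pattern in Thread 3 of the backtrace (a thread blocked in pthread_cond_wait, canceled via sigcancel_handler, then unwound by libgcc's _Unwind_ForcedUnwind) can be exercised outside the testsuite. The following is a hypothetical minimal reproducer loosely modeled on that scenario, not the actual tst-mutex8 source; tf, cleanup and cancel_waiting_thread are illustrative names. On an affected build, pthread_join would presumably never return.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reproducer: a worker blocks in pthread_cond_wait (a
   cancellation point) and the main thread cancels it.  On glibc the
   cancellation is acted on inside the wait and the thread is torn down
   by a forced unwind through libgcc.  */

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static bool waiting;      /* set by the worker, under m, just before it waits */
static bool cleanup_ran;  /* set by the cleanup handler during the unwind */

static void
cleanup (void *arg)
{
  /* POSIX: when a condition wait is canceled, the mutex is re-acquired
     before cleanup handlers run, so release it here.  */
  cleanup_ran = true;
  pthread_mutex_unlock (arg);
}

static void *
tf (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&m);
  waiting = true;
  pthread_cleanup_push (cleanup, &m);
  for (;;)
    pthread_cond_wait (&c, &m);  /* never signaled; only cancellation ends it */
  pthread_cleanup_pop (0);
  return NULL;
}

/* Returns true if the worker was canceled and its cleanup handler ran.  */
static bool
cancel_waiting_thread (void)
{
  pthread_t th;
  void *ret = NULL;
  int rc;

  waiting = false;
  cleanup_ran = false;
  rc = pthread_create (&th, NULL, tf, NULL);
  assert (rc == 0);

  /* Once we can take m and see waiting == true, the worker has released
     m inside pthread_cond_wait.  */
  for (;;)
    {
      pthread_mutex_lock (&m);
      bool w = waiting;
      pthread_mutex_unlock (&m);
      if (w)
        break;
      sched_yield ();
    }

  rc = pthread_cancel (th);
  assert (rc == 0);
  rc = pthread_join (th, &ret);
  assert (rc == 0);
  return ret == PTHREAD_CANCELED && cleanup_ran;
}
```

Building it once with `gcc -pthread` and once with `gcc -pthread -static` mirrors the dynamic/static split described in the report.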

I thought it might be related to b33e946fbb1659d2c5937 (sparc: Move sigreturn stub to assembly), due to a missing CFI directive that confuses the libgcc unwinder. I tried a C implementation built with -fexceptions and -fasynchronous-unwind-tables, but it didn't change the outcome.
Comment 2 Adhemerval Zanella 2024-01-17 13:21:46 UTC
It is the same issue as BZ#31244: the rewrite done by b33e946fbb1659d2c5937c4dd756a7c49a132dff was not fully correct regarding the CFI annotations. I will send a fix similar to the one proposed for the sparc32 issue:

diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S b/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
index 12af289375..3134337e25 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/sigreturn_stub.S
@@ -23,7 +23,10 @@

    [1] https://lkml.org/lkml/2016/5/27/465  */

-ENTRY (__rt_sigreturn_stub)
+   nop
+   nop
+
+ENTRY_NOCFI (__rt_sigreturn_stub)
        mov     __NR_rt_sigreturn, %g1
        ta      0x6d
-END (__rt_sigreturn_stub)
+END_NOCFI (__rt_sigreturn_stub)

It fixes the regression I saw on sparc64:

FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-mutexpi9
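The patch in comment 2 drops the CFI from __rt_sigreturn_stub and pads it with two nops. One plausible reading, not spelled out in the thread, is that with no FDE covering the stub, libgcc falls back to recognizing the signal trampoline from its instruction words, and the nops keep the unwinder's return-address-minus-one FDE lookup from landing in the preceding function. The sketch below shows what such an instruction-pattern check looks like; the SPARC_FMT3_IMM macro, is_rt_sigreturn_stub, and the assumption that __NR_rt_sigreturn is 101 on Linux/sparc are illustrative, and the constants are derived from the SPARC instruction format rather than copied from glibc or libgcc.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* SPARC format-3 instruction with an immediate operand (i = 1):
   op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] simm13[12:0].  */
#define SPARC_FMT3_IMM(op, rd, op3, rs1, simm13)                     \
  (((uint32_t) (op) << 30) | ((uint32_t) (rd) << 25)                 \
   | ((uint32_t) (op3) << 19) | ((uint32_t) (rs1) << 14)             \
   | (UINT32_C (1) << 13) | ((uint32_t) (simm13) & UINT32_C (0x1fff)))

/* Assumption: __NR_rt_sigreturn is 101 on Linux/sparc.  */
#define SPARC_NR_RT_SIGRETURN 101

/* mov __NR_rt_sigreturn, %g1  is  or %g0, 101, %g1  */
#define INSN_MOV_G1 SPARC_FMT3_IMM (2, 1, 0x02, 0, SPARC_NR_RT_SIGRETURN)
/* ta 0x6d: Tcc reuses the rd field for cond; cond 8 means "always".  */
#define INSN_TA_6D SPARC_FMT3_IMM (2, 8, 0x3a, 0, 0x6d)

static_assert (INSN_MOV_G1 == 0x82102065u, "mov encoding");
static_assert (INSN_TA_6D == 0x91d0206du, "ta encoding");

/* Hypothetical helper in the style of a fallback unwinder's trampoline
   check: do the two words at insn match the body of __rt_sigreturn_stub?  */
static bool
is_rt_sigreturn_stub (const uint32_t insn[2])
{
  return insn[0] == INSN_MOV_G1 && insn[1] == INSN_TA_6D;
}
```

The static_assert lines check the field layout at compile time, so a mistake in the encoding fails the build rather than silently producing a pattern that never matches.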

*** This bug has been marked as a duplicate of bug 31244 ***