Here ( https://sourceware.org/pipermail/gdb-testers/2020q3/169745.html ) we find in gdb.sum:
...
FAIL: gdb.threads/create-fail.exp: iteration 10: run till end
FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
FAIL: gdb.threads/create-fail.exp: iteration 2: run till end
FAIL: gdb.threads/create-fail.exp: iteration 3: run till end
FAIL: gdb.threads/create-fail.exp: iteration 4: run till end
FAIL: gdb.threads/create-fail.exp: iteration 5: run till end
FAIL: gdb.threads/create-fail.exp: iteration 6: run till end
FAIL: gdb.threads/create-fail.exp: iteration 7: run till end
FAIL: gdb.threads/create-fail.exp: iteration 8: run till end
FAIL: gdb.threads/create-fail.exp: iteration 9: run till end
...

In more detail, in gdb.log:
...
(gdb) run
Starting program: /home/gdb-buildbot-2/fedora-x86-64-4/fedora-x86-64/build/gdb/testsuite/outputs/gdb.threads/create-fail/create-fail
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main () at /home/gdb-buildbot-2/fedora-x86-64-4/fedora-x86-64/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/create-fail.c:69
69	  for (i = 0; i < CPU_SETSIZE; i++)
(gdb) continue
Continuing.
[New Thread 0x7ffff7c83700 (LWP 626354)]
[New Thread 0x7ffff7482700 (LWP 626355)]
[Thread 0x7ffff7c83700 (LWP 626354) exited]
[New Thread 0x7ffff6c81700 (LWP 626356)]
[Thread 0x7ffff7482700 (LWP 626355) exited]
[New Thread 0x7ffff6480700 (LWP 626357)]
[Thread 0x7ffff6c81700 (LWP 626356) exited]
[New Thread 0x7ffff5c7f700 (LWP 626358)]
[Thread 0x7ffff6480700 (LWP 626357) exited]
pthread_create: 22: Invalid argument

Thread 6 "create-fail" received signal SIG32, Real-time event 32.
[Switching to Thread 0x7ffff5c7f700 (LWP 626358)]
0x00007ffff7d87695 in clone () from /lib64/libc.so.6
(gdb) FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
...
This could have something to do with the glibc version. The Fedora version used by the buildbot is fc31, which uses glibc 2.30. I don't see the FAIL on my laptop with openSUSE Leap 15.2, which uses glibc 2.26. On openSUSE Tumbleweed, with glibc 2.31, we do see the FAIL ( https://build.opensuse.org/public/build/devel:gcc/openSUSE_Factory/x86_64/gdb/_log ).
My understanding of what happens is as follows.

Glibc internally defines SIGCANCEL:
...
./sysdeps/unix/sysv/linux/internal-signals.h:#define SIGCANCEL       __SIGRTMIN
...
which gets the value 32:
...
./bits/signum-arch.h:#define __SIGRTMIN        32
...

During the test, the signal arrives at gdb. GDB has the following behaviour:
...
$ gdb -batch -ex "info handle"
Signal        Stop   Print  Pass to program  Description
...
SIGCANCEL     No     No     Yes              LWP internal signal
SIG32         Yes    Yes    Yes              Real-time event 32
...

In gdb_signal_from_host we translate the signal with value 32 into a gdb signal. There is a bit:
...
#if defined (SIGCANCEL)
  if (hostsig == SIGCANCEL)
    return GDB_SIGNAL_CANCEL;
#endif
...
but that is not activated, because SIGCANCEL is not defined (because it's a glibc-internal signal).

This gets activated instead:
...
  else if (hostsig == 32)
    return GDB_SIGNAL_REALTIME_32;
...

So, instead of the appropriate Stop=No behaviour of SIGCANCEL, we have the Stop=Yes behaviour of SIG32, and the test-case FAILs.
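To make the fall-through above concrete, here is a minimal, self-contained sketch. It is not GDB's code: the enum and the function name sketch_signal_from_host are hypothetical stand-ins, and only the two branches quoted above are modelled.

```c
#include <signal.h>

/* Hypothetical stand-ins for GDB's signal enum; these are not the real
   definitions, just enough to model the two branches quoted above.  */
enum sketch_gdb_signal
{
  SKETCH_SIGNAL_UNKNOWN,
  SKETCH_SIGNAL_CANCEL,
  SKETCH_SIGNAL_REALTIME_32
};

/* Simplified model of the host-to-gdb signal translation.  */
enum sketch_gdb_signal
sketch_signal_from_host (int hostsig)
{
#if defined (SIGCANCEL)
  /* Only compiled when the host headers export SIGCANCEL.  glibc keeps
     the name internal, so on such hosts this branch does not exist.  */
  if (hostsig == SIGCANCEL)
    return SKETCH_SIGNAL_CANCEL;
#endif

  /* The branch that is reached instead for signal number 32.  */
  if (hostsig == 32)
    return SKETCH_SIGNAL_REALTIME_32;

  return SKETCH_SIGNAL_UNKNOWN;
}
```

On a host whose public headers do not define SIGCANCEL, calling sketch_signal_from_host (32) yields SKETCH_SIGNAL_REALTIME_32, which corresponds to the Stop=Yes entry in the "info handle" table.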
Confirmed on current trunk, in an openSUSE Tumbleweed VM bound to 1 CPU with a 75% execution cap.
https://sourceware.org/pipermail/gdb-patches/2021-February/175677.html
While investigating this, I noticed something else that may indicate a kernel-side counterpart. Although this doesn't FAIL for x86_64-linux/Ubuntu 20.04, I can trigger FAILs manually if I produce enough noise (by, for example, enabling infrun debugging) while the testcase is running.

So it seems that this test passes if GDB runs the test undisturbed (and quickly). If there is considerable noise going on during GDB's execution of the test, we will run into the SIG32.

It is almost as if the kernel misses sending the signal notification for SIG32 in some cases. Adhemerval, a glibc maintainer, mentioned there might be a race going on somewhere.
Created attachment 13202
Tentative patch

This works for me.
I get:

binutils-gdb/gdbsupport/signals.cc:283:2: error: #endif without #if
  283 | #endif
(In reply to Luis Machado from comment #7)
> I get:
>
> binutils-gdb/gdbsupport/signals.cc:283:2: error: #endif without #if
>   283 | #endif

Yeah, that's a stray endif, just remove it.
Works for me then.
While trying to reproduce, I found what currently looks like the root cause to me: since glibc 2.28, sigaddset ignores SIGCANCEL, which means lin_thread_get_thread_signals is broken. This needs to be fixed differently.
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=089436f78743628b22e87c2f8d32bd5f9d818f5a

commit 089436f78743628b22e87c2f8d32bd5f9d818f5a
Author: Tom de Vries <tdevries@suse.de>
Date:   Fri Feb 12 20:12:37 2021 +0100

    [gdb/threads] Fix lin_thread_get_thread_signals for glibc 2.28

    When running test-case gdb.threads/create-fail.exp on openSUSE Factory
    (with glibc version 2.32) I run into:
    ...
    (gdb) continue
    Continuing.
    [New Thread 0x7ffff7c83700 (LWP 626354)]
    [New Thread 0x7ffff7482700 (LWP 626355)]
    [Thread 0x7ffff7c83700 (LWP 626354) exited]
    [New Thread 0x7ffff6c81700 (LWP 626356)]
    [Thread 0x7ffff7482700 (LWP 626355) exited]
    [New Thread 0x7ffff6480700 (LWP 626357)]
    [Thread 0x7ffff6c81700 (LWP 626356) exited]
    [New Thread 0x7ffff5c7f700 (LWP 626358)]
    [Thread 0x7ffff6480700 (LWP 626357) exited]
    pthread_create: 22: Invalid argument

    Thread 6 "create-fail" received signal SIG32, Real-time event 32.
    [Switching to Thread 0x7ffff5c7f700 (LWP 626358)]
    0x00007ffff7d87695 in clone () from /lib64/libc.so.6
    (gdb) FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
    ...

    The problem is that glibc-internal signal SIGCANCEL is not recognized by
    gdb.

    There's code in check_thread_signals that is supposed to take care of that,
    but it's not working because this code in lin_thread_get_thread_signals
    has stopped working:
    ...
      /* NPTL reserves the first two RT signals, but does not provide any
         way for the debugger to query the signal numbers - fortunately
         they don't change.  */
      sigaddset (set, __SIGRTMIN);
      sigaddset (set, __SIGRTMIN + 1);
    ...

    Since glibc commit d2dc5467c6 "Filter out NPTL internal signals (BZ #22391)"
    (first released as part of glibc 2.28), a sigaddset with a glibc-internal
    signal has no other effect than setting errno to EINVAL.

    Fix this by eliminating the usage of sigset_t in check_thread_signals and
    lin_thread_get_thread_signals.

    The same problem was observed on Ubuntu 20.04.

    Tested on x86_64-linux, openSUSE Factory.
    Tested on aarch64-linux, Ubuntu 20.04 and Ubuntu 18.04.

    gdb/ChangeLog:

    2021-02-12  Tom de Vries  <tdevries@suse.de>

            PR threads/26228
            * linux-nat.c (lin_thread_get_thread_signals): Remove.
            (lin_thread_signals): New static var.
            (lin_thread_get_thread_signal_num, lin_thread_get_thread_signal):
            New function.
            * linux-nat.h (lin_thread_get_thread_signals): Remove.
            (lin_thread_get_thread_signal_num, lin_thread_get_thread_signal):
            Declare.
            * linux-thread-db.c (check_thread_signals): Use
            lin_thread_get_thread_signal_num and lin_thread_get_thread_signal.
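For readers who want a picture of the shape of this fix, the following is a rough sketch of replacing a sigset_t with a plain array of signal numbers plus two accessors. It is not the committed code: all sketch_ names are hypothetical (they only mirror the ChangeLog entries), and SKETCH_SIGRTMIN hard-codes the value 32 quoted earlier in this report.

```c
#include <assert.h>

/* Assumption: 32 is the __SIGRTMIN value quoted earlier in this report.  */
#define SKETCH_SIGRTMIN 32

/* The two RT signals reserved by NPTL, kept as plain numbers so that
   no sigaddset call (and hence no glibc filtering) is involved.  */
static const int sketch_thread_signals[] =
{
  SKETCH_SIGRTMIN,
  SKETCH_SIGRTMIN + 1
};

/* Number of entries in the table above.  */
unsigned int
sketch_thread_signal_num (void)
{
  return sizeof (sketch_thread_signals) / sizeof (sketch_thread_signals[0]);
}

/* Return the I-th reserved signal number; I must be in range.  */
int
sketch_thread_signal (unsigned int i)
{
  assert (i < sketch_thread_signal_num ());
  return sketch_thread_signals[i];
}

/* Example caller: check membership by iterating, without sigset_t.  */
int
sketch_is_thread_signal (int signo)
{
  for (unsigned int i = 0; i < sketch_thread_signal_num (); i++)
    if (sketch_thread_signal (i) == signo)
      return 1;
  return 0;
}
```

A caller in the style of check_thread_signals could then loop from 0 to sketch_thread_signal_num () and configure each returned signal individually, which sidesteps the sigaddset filtering entirely.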
Patch committed, marking resolved-fixed.