Bug 26228 - FAIL: gdb.threads/create-fail.exp: iteration x: run till end (SIG32)
Summary: FAIL: gdb.threads/create-fail.exp: iteration x: run till end (SIG32)
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: threads
Version: HEAD
Importance: P2 normal
Target Milestone: 11.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-11 08:35 UTC by Tom de Vries
Modified: 2021-02-12 19:13 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Tentative patch (978 bytes, patch)
2021-02-04 13:27 UTC, Tom de Vries

Description Tom de Vries 2020-07-11 08:35:07 UTC
Here ( https://sourceware.org/pipermail/gdb-testers/2020q3/169745.html ) we find in gdb.sum:
...
FAIL: gdb.threads/create-fail.exp: iteration 10: run till end
FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
FAIL: gdb.threads/create-fail.exp: iteration 2: run till end
FAIL: gdb.threads/create-fail.exp: iteration 3: run till end
FAIL: gdb.threads/create-fail.exp: iteration 4: run till end
FAIL: gdb.threads/create-fail.exp: iteration 5: run till end
FAIL: gdb.threads/create-fail.exp: iteration 6: run till end
FAIL: gdb.threads/create-fail.exp: iteration 7: run till end
FAIL: gdb.threads/create-fail.exp: iteration 8: run till end
FAIL: gdb.threads/create-fail.exp: iteration 9: run till end
...

In more detail, in gdb.log:
...
(gdb) run 
Starting program: /home/gdb-buildbot-2/fedora-x86-64-4/fedora-x86-64/build/gdb/testsuite/outputs/gdb.threads/create-fail/create-fail 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main () at /home/gdb-buildbot-2/fedora-x86-64-4/fedora-x86-64/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.threads/create-fail.c:69
69	  for (i = 0; i < CPU_SETSIZE; i++)
(gdb) continue
Continuing.
[New Thread 0x7ffff7c83700 (LWP 626354)]
[New Thread 0x7ffff7482700 (LWP 626355)]
[Thread 0x7ffff7c83700 (LWP 626354) exited]
[New Thread 0x7ffff6c81700 (LWP 626356)]
[Thread 0x7ffff7482700 (LWP 626355) exited]
[New Thread 0x7ffff6480700 (LWP 626357)]
[Thread 0x7ffff6c81700 (LWP 626356) exited]
[New Thread 0x7ffff5c7f700 (LWP 626358)]
[Thread 0x7ffff6480700 (LWP 626357) exited]
pthread_create: 22: Invalid argument

Thread 6 "create-fail" received signal SIG32, Real-time event 32.
[Switching to Thread 0x7ffff5c7f700 (LWP 626358)]
0x00007ffff7d87695 in clone () from /lib64/libc.so.6
(gdb) FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
...
Comment 1 Tom de Vries 2020-07-11 08:52:54 UTC
This could have something to do with glibc version.

The fedora version used by the buildbot is fc31, which uses glibc 2.30.

I don't see the FAIL on my laptop with openSUSE Leap 15.2, which uses glibc 2.26.

On openSUSE Tumbleweed we do see the FAIL ( https://build.opensuse.org/public/build/devel:gcc/openSUSE_Factory/x86_64/gdb/_log ), with glibc 2.31.
Comment 2 Tom de Vries 2020-07-11 09:11:32 UTC
My understanding of what happens is as follows:

Glibc internally defines SIGCANCEL:
...
./sysdeps/unix/sysv/linux/internal-signals.h:#define SIGCANCEL       __SIGRTMIN
...
which gets the value 32:
...
./bits/signum-arch.h:#define __SIGRTMIN 32
...

During the test, the signal arrives at gdb.

GDB has the following behaviour:
...
$ gdb -batch -ex "info handle"
Signal        Stop      Print   Pass to program Description
   ...
SIGCANCEL     No        No      Yes             LWP internal signal
SIG32         Yes       Yes     Yes             Real-time event 32
...

In gdb_signal_from_host we translate the signal with value 32 into a gdb signal.  There is a bit:
...
#if defined (SIGCANCEL)   
  if (hostsig == SIGCANCEL)
    return GDB_SIGNAL_CANCEL;
#endif
...
but that is not activated, because SIGCANCEL is not defined (because it's a glibc-internal signal).

This gets activated instead:
...
      else if (hostsig == 32)
        return GDB_SIGNAL_REALTIME_32;
...

So, instead of the appropriate Stop=No behaviour of SIGCANCEL, we have the Stop=Yes behaviour of SIG32, and the test-case FAILs.
Comment 3 Tom de Vries 2020-07-24 13:59:39 UTC
Confirmed on current trunk, in an openSUSE Tumbleweed VM bound to 1 CPU with a 75% execution cap.
Comment 5 Luis Machado 2021-02-02 11:55:49 UTC
While investigating this, I noticed something else that may indicate a kernel-side counterpart.

Although this doesn't FAIL for x86_64-linux/Ubuntu 20.04, I can trigger FAILs manually if I produce enough noise (by, for example, enabling infrun debugging) while the testcase is running.

So it seems that this test passes if GDB runs the test undisturbed (and quickly). If there is considerable noise going on during GDB's execution of the test, we will run into the SIG32.

It is almost as if the kernel misses sending the signal notification for SIG32 in some cases. Adhemerval, a glibc maintainer, mentioned there might be a race going on somewhere.
Comment 6 Tom de Vries 2021-02-04 13:27:30 UTC
Created attachment 13202
Tentative patch

This works for me.
Comment 7 Luis Machado 2021-02-04 14:11:25 UTC
I get:

binutils-gdb/gdbsupport/signals.cc:283:2: error: #endif without #if
  283 | #endif
Comment 8 Tom de Vries 2021-02-04 14:15:53 UTC
(In reply to Luis Machado from comment #7)
> I get:
> 
> binutils-gdb/gdbsupport/signals.cc:283:2: error: #endif without #if
>   283 | #endif

Yeah, that's a stray endif, just remove it.
Comment 9 Luis Machado 2021-02-04 14:24:02 UTC
Works for me then.
Comment 10 Tom de Vries 2021-02-04 17:27:41 UTC
Found while trying to reproduce: the root cause for me is that, since glibc 2.28, sigaddset ignores SIGCANCEL, which means lin_thread_get_thread_signals is broken.

This needs to be fixed differently.
Comment 11 Sourceware Commits 2021-02-12 19:12:42 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=089436f78743628b22e87c2f8d32bd5f9d818f5a

commit 089436f78743628b22e87c2f8d32bd5f9d818f5a
Author: Tom de Vries <tdevries@suse.de>
Date:   Fri Feb 12 20:12:37 2021 +0100

    [gdb/threads] Fix lin_thread_get_thread_signals for glibc 2.28
    
    When running test-case gdb.threads/create-fail.exp on openSUSE Factory
    (with glibc version 2.32) I run into:
    ...
    (gdb) continue
    Continuing.
    [New Thread 0x7ffff7c83700 (LWP 626354)]
    [New Thread 0x7ffff7482700 (LWP 626355)]
    [Thread 0x7ffff7c83700 (LWP 626354) exited]
    [New Thread 0x7ffff6c81700 (LWP 626356)]
    [Thread 0x7ffff7482700 (LWP 626355) exited]
    [New Thread 0x7ffff6480700 (LWP 626357)]
    [Thread 0x7ffff6c81700 (LWP 626356) exited]
    [New Thread 0x7ffff5c7f700 (LWP 626358)]
    [Thread 0x7ffff6480700 (LWP 626357) exited]
    pthread_create: 22: Invalid argument
    
    Thread 6 "create-fail" received signal SIG32, Real-time event 32.
    [Switching to Thread 0x7ffff5c7f700 (LWP 626358)]
    0x00007ffff7d87695 in clone () from /lib64/libc.so.6
    (gdb) FAIL: gdb.threads/create-fail.exp: iteration 1: run till end
    ...
    The problem is that glibc-internal signal SIGCANCEL is not recognized by gdb.
    
    There's code in check_thread_signals that is supposed to take care of that,
    but it's not working because this code in lin_thread_get_thread_signals has
    stopped working:
    ...
      /* NPTL reserves the first two RT signals, but does not provide any
         way for the debugger to query the signal numbers - fortunately
         they don't change.  */
      sigaddset (set, __SIGRTMIN);
      sigaddset (set, __SIGRTMIN + 1);
    ...
    
    Since glibc commit d2dc5467c6 "Filter out NPTL internal signals (BZ #22391)"
    (first released as part of glibc 2.28), a sigaddset with a glibc-internal
    signal has no other effect than setting errno to EINVAL.
    
    Fix this by eliminating the usage of sigset_t in check_thread_signals and
    lin_thread_get_thread_signals.
    
    The same problem was observed on Ubuntu 20.04.
    
    Tested on x86_64-linux, openSUSE Factory.
    Tested on aarch64-linux, Ubuntu 20.04 and Ubuntu 18.04.
    
    gdb/ChangeLog:
    
    2021-02-12  Tom de Vries  <tdevries@suse.de>
    
            PR threads/26228
            * linux-nat.c (lin_thread_get_thread_signals): Remove.
            (lin_thread_signals): New static var.
            (lin_thread_get_thread_signal_num, lin_thread_get_thread_signal):
            New function.
            * linux-nat.h (lin_thread_get_thread_signals): Remove.
            (lin_thread_get_thread_signal_num, lin_thread_get_thread_signal):
            Declare.
            * linux-thread-db.c (check_thread_signals): Use
            lin_thread_get_thread_signal_num and lin_thread_get_thread_signal.
Comment 12 Tom de Vries 2021-02-12 19:13:22 UTC
Patch committed, marking resolved-fixed.